Quantum Supremacy

An Era in its Infancy (60 - 90 mins read time)

First blog post! Hopefully it’s the first of many. I decided to go with a topic that’s interesting, presently relevant, and leaves lots of room to explore in future posts. So please tweet me @caervs to let me know what areas interests you.

Many of you will have heard recently that Google achieved quantum supremacy. Many of you also will already know the essential implications of this event, a crucial one being that classical cryptography, the basis for today’s secure internet, is on a path to obsolescence.

If you’re like me though, your curiosity doesn’t end there, and for good reason. All of us, engineers, mathematicians, analysts, data scientists, and other users of the internet, will be making big bets at some point in our careers on where to allocate our time, money, and other resources. We deserve an understanding of quantum computing that goes one step beyond the oft described and mystifying idea that qubits are “in two states at once.” This post will give you a detailed, qualitative account of quantum computing that is more advanced than a popular science article but more digested than a textbook or research paper.

I learned a lot researching this topic and I hope you will learn a lot reading about it. We will start by spending a good amount of time covering how quantum computers fit into the larger landscape of classical computing. These foundations help us to understand quantum, not as a wholy new paradigm, but as a continuation of a long tradition of how we compute. We will dive deep into how the secure internet works so that we can understand how it does not work in a post-quantum world. Then we will learn just enough quantum mechanics to fundamentally understand some important quantum algorithms, meaning I have to leave out A LOT. Still I have tried to make the reading self-contained. Finally we’ll be armed to understand who are the big players in this space today and what they have achieved. Let’s begin!

Part 1: Classical Computing

Three Theories that Shaped History

The word “computer” has for most of its life not referred to a machine. The earliest known example of its use in fact goes all the way back to 1613! Back then up throughout the 1900s (see Apollo program computers), the word referred to a professional whose work was to process information. Like any other specialists these professional computers created an abstraction for their fellow workers. For computationally intense organizations like the Census Bureau or NASA these “departments of computers” were trained in advanced rules for computation and symbol manipulation. They would take in requests for computations and follow various rules to arrive at and output the desired piece of information. For example if you work in the Census Bureau where county population tables are available you may ask your computers what the total population is per state, an early application for what would eventually become IBM.

The story of how we trained people en masse to solve analytical problems and then ultimately automated their work has unfolded over a thousand years and defined the modern world. Understanding this history gives us a better sense of what are the time-invariant truths of Computer Science and what are just fads. Those of you who are eager to dive into quantum and the secure internet are welcome to skip to Part 2.3. I won’t judge you! In this part we will study three key milestones that gave us the computer, starting with how humans learned to be computers using a program for arithmetic.

Arithmetic and Algorithms

In 8th century Arabia a new power had emerged to challenge the old worlds of Persia and Rome. During the hundred years after the death of their first leader, Muhammad, the Arab clans of Ummaya, Abbas, and Ali would form a vast empire and war amongst themselves to determine who would lead it. When the dust settled in 752 Al-Mansur the Abbasid was victorious. This new emperor declared that all books throughout his land should be collected and translated into Arabic and that all wise men of his land should congregate in a new house of wisdom in Baghdad so that they may discover knowledge to fill new books.

Among the scholars of this house was a Persian named Muhammad ibn Musa al-Khwarizmi (meaning from Khwarazm). His most famous work on “al jabr” (lit. the balancing) would become today’s algebra, but we’re more interested in his second most famous work, his Principles of Hindu Reckoning, which would introduce the West to a program for performing arithmetic, as with an abacus, but no hardware needed. It was thus that the method of al-Khwarizmi later transliterated as the “Algorithm” was born.

Khwarizmi’s Indian program is based on what today we would call a place value system. From a finite alphabet of numerals we can represent infinitely many numbers by using place to indicate successively larger groupings of units. Khwarizmi’s work is now extinct unfortunately but from later sources we can reconstruct his program for the addition of numbers x and y which should look familiar from grade school.









Arrange x and y one on top so that their digits are aligned from the right
Track your "carry" digit which is initially 0

Start from the rightmost digit
Add the carry and top and bottom digits (blank is the same as zero)
The ones place from this addition is a next digit of your result
The tens place is your new carry
If there are no more digits, carry is the last digit and you're done
Otherwise move one digit to the left and repeat from step 5

Simple enough. We’ll skip running through an example as you all have done addition of two numbers before. You may notice that al-Khwarizmi wrote his numbers with the biggest place at the left and the smallest place at the right just like we do in English despite the fact that he wrote in Persian and Arabic which are both right-to-left languages. Look carefully at the above algorithm and you will realize perhaps surprisingly that it is we who write numbers in the Perso-Arabic style of right-to-left.

We start with this simplest of algorithms to unravel the mystery of why algorithms in general work, a question at the heart of computation, quantum or otherwise. To answer this question we’ll move beyond simple pairs of numbers we want to add and start drawing out the whole space of inputs and outputs to addition. When we enumerate this space we start to make a graph like this one

Every circle represents a number that may function as an input to or an output from the algorithm. A square represents the parameters of a specific execution of the algorithm with arrows in denoting input and arrows out denoting output. Of course we have only shown six possible parameter pairs but the full graph goes on infinitely in two (countable) dimensions.

Because place value is familiar to us I have labeled the nodes with their decimal representation, but there is nothing extra special about decimal. We could have used a different base than decimal. We could have used tally marks or Roman numerals, spatial relations (brail), or electrical impulses (like computers do). Our choice of representation is somewhat arbitrary. Critically however when we remove labels the underlying structure of the graph always remains the same. It is fundamentally the structure of addition. When we use algebra in this article we will mean this type of collection of elements and their interrelationships.

Whatsmore, for any numerical representation you choose, there is exactly one way to label the elements of the graph. This is because the natural numbers with addition are self-similar in exactly one way, or stated equivalently they have no symmetries. To see this fact is quite simple. In any representation, "one’ will be the unique value that has no arrows going into it; it is never the output of our addition. “Two” is the sum of one and one. “Three” is the sum of two and one and so on. You may recognize this simpler characterization of the naturals as Peano’s axioms. In fact for any value we can enumerate all the arrows going into it because we know there are $\frac{n}{2}$ of them

A graph of numbers without any numbers written on it sounds pretty abstract so let’s remember concretely what we’re doing. In any situation where we’re adding we are taking two disjoint sets of objects (apples, houses, spears) and asking ourselves how many objects we would have if we combined the two sets. It is here that our choice of representation starts to matter because not all representations are as easy for us to manipulate. Some take more time, some more space, but all of them have the same graph. The magic of the decimal representation was that it led to a procedure for doing any conceivable addition with a sequence of simple and unambiguous symbol manipulations.

To see why this is so special let’s try to do the same thing with another operation we use a lot in cryptography, integer factorization. We’ll start by drawing out the same kind of algebraic graph but for multiplication

In our place value representation we’re able to find forward relations in our graph easily using our grade school algorithm for multiplication, but backwards relations are not as obvious. There is no clear symbolic process that will take you from a number to its multiplicative factors. Unlike addition and multiplication, there is especially no process that will let you factor one or two symbols at a time. If you find one, notify the IETF immediately!

We can however invent a representation in which backward and forward relationships are obvious. Let’s contrast our familiar place value system with another system which I will call the Abramsian system. In this system we denote numbers by the weights of the prime factors which make them up. For instance the number which in place value is 63 would in our new system be 0:2:0:1. The left most space corresponds to the power to which 2 is taken, the next space the power of 3 and so on so we have

20 \cdot 32 \cdot 50 \cdot 71 = 63

$2^0 \cdot 3^2 \cdot 5^0 \cdot 7^1 = 63$

Every natural number (ignoring 0) has a unique representation in this system. As you can see conversion from this system to place value is easy using traditional arithmetic. Additionally traditional operations like multiplication and division are easy. We merely add or subtract the factors that make up our numbers. Recreating our graph above in this representation we see such patterns

This graph is structurally the same, but the symbolic relationships between elements are more obvious. To multiply numbers we add the components that correspond to the same prime factor, similar to how you would add the components of a vector. Factoring is even easier in our system. All we do is break up numbers into their constituent units.

If we allow for negative weights we can also represent rational numbers and will always be writing them in reduced form

2 : 1 : 1 1 : 2 : 0 = 1 : - 1 : 1

$\frac{2:1:1}{1:2:0} = 1:-1:1$

will for instance represent the familiar fact that

60 18 = 10 3 = 21 \cdot 3 - 1 \cdot 51

$\frac{60}{18} = \frac{10}{3} = 2^1\cdot 3^{-1}\cdot 5^1$

When we try to add however we hit a snag. There is no obvious way to combine the symbols of two numbers to produce the symbols of their sum. This is strange because we’re so used to thinking of addition as a simple operation. So perhaps we try converting back to decimal. After all that’s the easy matter of multiplying together the factors of our number. Decimal then makes addition easy. It’s when we try converting back that we realize we’ve hit our original snag. Conversion back from decimal to Abramsian is hard. It requires that we be able to factor place value representation!

So one representation does addition and multiplication well, the other does multiplication and factoring well. Contrasting Khwarizmi’s system with our Abramsian system we learn an important lesson. The representation of your information determines which operations are easy and which are difficult. Choosing between representations you may be trading one operation for another. As we shall see, the power of quantum is that it gives us a single representation that does both addition and factoring well.

Khwarizmi’s system takes a great leap from earlier mathematics. Examples we have of Babylonian mathematics and Greek mathematics before it don’t show much separation between logical inference and numerical inference. Logic was the pattern of thought that took you from your data (lit. “that which are given”) to their natural consequences. Khwarizmi’s work demonstrated that there was a subtle difference. Unlike general logical inferences, the rules for inferences about numbers could be laid out so precisely that they could be taught to children, and evntually machines. With our second milestone we will see that logic and mathematics are not so different after all.

Algebra and Logic

The year was 1854 and Western society was rife with industry and scientific discovery. Only two decades prior Joseph Henry had demonstrated that an electromagnet can rectify the signal of a telegraph over long distances and insodoing invented the first electronic signal-controlled switch, the predecessor to today’s transistor. Three decades prior professor Charles Babbage and Ada Lovelace had begun work to mechanize the calculation of difference tables, a Victorian analog to today’s digital computer.

It was in this context that an English mathematician named George Boole took on an ambitious task. In his comprehensive Investigation of the Laws of Thought Boole laid out a new method for encoding the statements of our natural languages that would make logical reasoning as unambiguous and programmable as numerical reasoning was. He defined an algebra of propositions using the familiar operations of multiplication and addition to capture the linguistic conjunctions of “and” and “or.” His algebra only allowed for two values, true and false, with the goal of mechanizing the discovery of truth.

Boole demonstrates his method, in chpater VI of the Investigation, taking an example from Jewish law of the form “Clean beasts are those which both divide the hoof and chew the cud.” So if we have

x = clean beasts;
y = beasts dividing the hoof
z = beasts chewing the cud

we can multiply to write the full proposition as $x=yz$ . Boole’s original system conflated predicates with sets as the two had not yet been broken into separate objects of study. For a modern reader interpreting his symbols requires a little extra work, so we’ll take some liberty with his system.

We can represent each object in our world as a collection of Boolean features, predicates which are either true or false of their respective object. Say we want to extend Boole’s example of Jewish law to incorporate more requirements of foods. We represent each food by a series of values for predicates of that food like

(𝑖 𝑠 𝑚 𝑒 𝑎 𝑡, 𝑑 𝑖 𝑣 𝑖 𝑑 𝑒 𝑠 ℎ 𝑜 𝑜 𝑓, 𝑐 ℎ 𝑒 𝑤 𝑠 𝑐 𝑢 𝑑, 𝑖 𝑠 𝑓 𝑖 𝑠 ℎ, ℎ 𝑎 𝑠 𝑓 𝑖 𝑛 𝑠, ℎ 𝑎 𝑠 𝑠 𝑐 𝑎 𝑙 𝑒 𝑠)

$(is\space meat,\space divides\space hoof,\space chews \space cud,\space is \space fish, \space has \space fins, \space has \space scales)$

and construct a function for determining if a given food is kosher

𝑘 𝑜 𝑠 ℎ 𝑒 𝑟 ((𝑚, ℎ, 𝑐, 𝑓, 𝑛, 𝑠)) = (1 - 𝑚) (1 - 𝑓) + 𝑚 (ℎ \cdot 𝑐) + 𝑓 (𝑛 \cdot 𝑠)

$kosher((m,h,c,f,n,s)) = (1-m)(1-f) + m(h\cdot c) + f(n\cdot s)$

or rerendered in today’s syntax

𝑘 𝑜 𝑠 ℎ 𝑒 𝑟 ((𝑚, ℎ, 𝑐, 𝑓, 𝑛, 𝑠)) = (\neg 𝑚 \land \neg 𝑓) \lor (𝑚 \land ℎ \land 𝑐) \lor (𝑓 \land 𝑛 \land 𝑠)

$kosher((m,h,c,f,n,s)) = (\neg m \wedge \neg f) \vee (m\wedge h\wedge c) \vee (f\wedge n\wedge s)$

which encodes the statement “a food is kosher if it is neither meat nor fish or if it is meat with a split hoof that chews the cud or if it is fish with fins and scales.” We can now plug in values for any food like cow or shrimp and through basic symbol manipulation derive an answer to our question.

The true power of our new algebra comes when we combine Boole’s theoretical system with Joseph Henry’s relay as engineers began to do in the 1930s, for we are able to build systems which can automate our decision making task. Our predicate above corresponds to a system that looks like this

which we first specify using abstract logic gates, a modern schematic notation that can have many physical realizations. We can then translate each of the components into an array of relays

By setting and unsetting bits for the various features of our foods our subexpressions get evaluated and reduced to our final decision, yes or no. With more relays we could incorporate the full set of rules. With enough demand from restaurants wanting to cater to Kosher clientele we could mass produce and distribute our “kosher box.” We could sell it as a module and give assembly guides for other dietary restrictions. Do you see how commerce magnifies the effects of Boole and Henry’s discoveries?

In our previous section we noted that Khwarizmi’s program created a wall between general logic and numerical logic. Numbers were things you could reason about so precisely as to be almost mechnical. What makes Boole’s work so marvelous is that he broke that wall and once again unified logic and math. And it goes one step further.

We can take any algebra, like the numerical algebra of addition, and convert it into a binary one. So if we want the algebra of integer arithmetic modulo $2^{32}$ we can represent our number as a bit vector of length 32. When calculating the sum $C=A + B$ of two bit vectors and because every entry in $C$ is derived from a Boolean expression of the entries of $A$ and $B$ , we can compose switches together to automate our calculations.

Now imagine applying this principle to any finite algebra. If your algebra has fewer than $2^n$ elements then you can always represent an element with a bit vector of length $n$ . With the right arrangement of switches you can always automate the output of elements related to your input elements. The simpler the symbolic manipulation the fewer components you need. We will see later that typical computers have built in circuitry for integer arithmetic, floating point arithmetic, and block ciphers, all examples of finite algebras.

The relays of Henry and the gears of Babbage would eventually be usurped by more efficient switches, vacuum tubes and then transistors, in the 20th century, but the tenets of information processing remained the same. If you represent a number or proposition with concrete symbols you can assemble a system that will process those symbols in a way that mirrors human reasoning, logic, and math…

There was just one piece missing. When looking at the original program of Khwarizmi you may notice that it doesn’t specify anywhere the size of the numbers on which we’re operating. A human following this program in fact could operate on numbers of any size and the program would still work. The missing piece was not the addition of single digits for we certainly could treat that as a finite algebra and compose systems for automating that work. No this extra bit of decision making was something that would be systematically studied and mathematically formalized in our third and final milestone.

Sequential Systems

The father of Computer Science, Alan Turing, is perhaps best known for cracking the enigma during WWII and for his test for intelligence which has made its way into pop culture. The foundations of his work however lie in a paper he wrote in 1937, before the war, on computable numbers. Here Turing defined the set of all numbers a computer could possibly produce given infinite time and proved this set to encompass the familiar integer and rational numbers yet still be a subset of the continuum of real numbers.

To prove this fact Turing had to formally define what it meant to compute a number, a task for which he developed the model today known as the Turing machine. He called his devices automatic machines or a-machines for short, an appropriate name given they would autonomously perform computations and a derivative of the age old concept of an automaton. What could these machines do that our switch arrays could not? What was their secret sauce that would allow them to automate the final steps in our program for arithmetic? Remember those steps are

If there are no more digits, carry is the last digit and you're done
Otherwise move one digit to the left and repeat from step 5

We can tell there is a decision to be made, and as we saw in the previous section we can represent decisions as Boolean values. The key is that our decision is based off of the result of our computation and it is a decision of whether to stop our computation, to halt, or to continue to execute from a previous step. We call this feature of an automaton the ability to conditionally branch and it is the essence of what makes a computer more than a dumb calculator and fundamentally unpredictable with algorithms.

We will see in the next section that modern computers implement a form of conditional branching that more closely resembles the form in our algorithm. However, just as Peano reduced the natural numbers to the minimal, barest, property necessary of having a successor relationship, Turing’s version of conditional branching involved a simple, yet functionally equivalent, left/right movement shift. To get a good intuition for how this works, let’s construct a Turing machine to do addition.

Assume we start with a tape consisting of our padded decimal numbers with an alphabet of two addition characters. ■ will be a separator between numbers and □ will mark spaces we have already processed. So we start with

This tape is the starting condition for our machine to be able to add 653 and 47. The logic for our automaton to follow will be easier to specify if we have it work through the operands from the inside out which is why we will write our 47 and our sum backwards. In practice we would translate these numerals to a representation easier for machines to process. In the general formulation of the Turing machine, we want to design a (finitely) stateful automaton that follows a simple loop





Read in a character from the tape
Select an output character based on the input and the current state
Select a new state based on the input and the current state
Write the output character to the current space, overwriting what's there
Decide to move left one character, right one character, or halt

For our specific use case this will look more like





Read and mark a digit from the first operand
Read and mark a digit from the second operand
Add the two digits
Write the ones place of the sum to the first blank solution space
Divide by ten to preserve your carry bit

So after a few iterations our automaton plus tape will look like

and it should halt when it has done the full sum

What differentiates one Turing machine from another is the collection of states that it has and how it transitions between them, similar to how we defined addition and multiplication by their respective topologies before. So we need to discover the collection of states and transitions that define addition for our automaton. For starters, our automaton needs to know what direction, left or right, to go at any time. Direction, simply, is a function of where we are currently and what we are trying to do. When our automaton makes it to the sum space, it also needs to remember the values it has read so that it can write their sum into the sum space. For this purpose an accumulation of carry, digit 1, and digit 2 will suffice and the accumulation will always have a value between zero and nineteen inclusive. So our state can be a triple of the form $(accumulation,location,goal)$ with possible values

({0..19}, {𝑠 𝑢 𝑚, 𝑜 𝑝 2, 𝑜 𝑝 1}, {𝑤 𝑟 𝑖 𝑡 𝑒, 𝑟 𝑒 𝑎 𝑑 2, 𝑟 𝑒 𝑎 𝑑 1, ℎ 𝑎 𝑙 𝑡})

$(\{0..19\},\{sum,op2,op1\},\{write,read2,read1,halt\})$

This design gives us $20\cdot 3\cdot 4 = 240$ total states to deal with, but because most operations will not depend on the specific value we have accumulated we can concisely express our desired behavior with a state diagram of just our location and goal

For brevity we only include the necessary states. Transitions are labeled with triples of the form input -> output/action/direction where we use the following shorthands

k = keep the current label on the tape
a = add the current digit to the accumulator and mark the digit as seen
w = write the one’s place of the accumulator to the tape and floor divide the accumulator by 10

And voila! We have our adder. Our automaton will be smart enough to at each iteration, check to see if it is done, if not, find the next op1 digit to read, find the next op2 digit, write their sum, and repeat. So we see we’ve made the great leap we had wanted. Our automaton, with only finite memory, can perform any one of infinitely many additions.

Conditional branching is what allows us to make this jump from finite algebras like the tables for boolean operations or single digit arithmetic to potentially infinite processes like generating the digits of pi until one of them occurs three times in a row. That’s pretty much it. A classical computer sequentially executes instructions from a program that either do some form of finite algebra or conditional branching. When there is more variety of instructions we say the instruction set architecture is complex, otherwise we say it’s reduced. There are even examples of single instruction computers which combine algebra and conditional branching into a single operation. CISC machines fundamentally can’t do more things (e.g. output different numbers) than OISC ones; they just take smaller (within a constant factor) programs to do what they do.

We have now taken the thousand year journey of classical computing from Al-Khwarizmi and the birth of algorithms to Alan Turing and the formal definition of a computer. Throughout it we have seen a refinement of our understanding of what it means to manipulate symbols and a refinement of the technologies we use to automate symbol manipulation. In Part 2 we will explore instruction based computing further and understand how we take our three historical milestones and put them into practice today. We will thus be equipped to dive deep into the secure internet and understand how it relies on the principles we have learned.

Part 2: In Practice Today

A Modern Computer Architecture

So far we’ve come to understand what algorithms are and how we can get an automaton to execute them. The problem is there is a fairly long path between our algorithm in its original form, a sequence of instructions, and the final system, a collection of states and transitions. As we learned in Part 1, just because two representations of something are equally expressive doesn’t mean they are equally easy to work with. How we expect to program our computers is no exception! A natural next question for us as lovers of automation is if there is a way to short circuit this path from algorithm to automaton.

Absolutely there is! Since Turing’s time we have built machines, both physical and virtual, which are conveniently able to take in our instructions the way we like to describe them, as algorithms rather than state diagrams.

There is in fact an approach to the design of information systems that has become so pervasive, so much a part of what we normally consider to be computation today, that it’s easy to forget there are alternatives. In 1945 the great John von Neumann codified the tenets of this approach when he published the first draft architecture for the EDVAC electronic military computer. For this reason we still refer to the approach as the von Neumann architecture.

For now we’ll understand the von Neumann architecture as having three key components. There is

A common Randomly Accessible Memory module: unlike tapes and hardware stacks RAM takes equal time to access every entry, like cubbies in a mail room. Importantly we use RAM to store data as well as the programs we wish to execute.
An Airthmetic Logic Unit which is essentially a large-scale array of switches, like our kosher box before, but capable of doing lots of different types of operations
And a Control Unit: which decides which instruction to execute. Imagine for our addition algorithm it would start at line 1, get to line 9, and then decide whether to halt or to start again from line 5.

When we strip a modern computer of all the enhancements that make it perform better like caches, branch prediction, and pipelining as well as user i/o peripherals like video displays and keyboards that is all a computer really is.

Now don’t get me wrong. The architecture of a modern x86 system like most desktops and servers is quite sophisticated

without even getting into fabrication and assembly. And future us can explore all of that! But we can mostly ignore it for understanding quantum. We will however cover in some detail how the secure internet works so that we can understand how it does not work in a post-quantum world.

A Network of Networks

Our history has gotten us as far as 1944 and throughout it we have been using the term “bit” pretty liberally as a lense to understand the events. This observation shows the impact of the next turning point in our story as we introduce the man who popularized the bit as the unit of information. He is a man often described as the father of the information age, the mathematician Claude Shannon. Often solitary, Shannon spent much of his life working at the famous Bell Labs where he wrestled with the philosophical question of what it means to communicate. Shannon composed and published his thoughts in his 1948 Mathematical Theory of Communication (the same time and place that Shockley and team developed the transistor).

We’ve seen how to represent arbitrary information as bits but it was in the information revolution that followed Shannon that we really understood the implications of this abstraction. Consider you want to send a message across a long distance, say the 300 miles from Los Angeles to Palo Alto. In the long long ago you would use a horse and maybe get there in a week. Using a long electrical circuit you could shorten that time to about 2 milliseconds and indeed by the 1950s the US and Europe were criss-corssed with thousands of miles of wires to make up these circuits for telegraphy and telephony.

For some time we also no longer relied on heavily trained specialists to encode our messages over these wires using Morse code. As we’ve seen, you can encode anything including character sequences using bit vectors and deliver those bits in sequence. The modern ASCII and its extension UTF are variations on the Baudot code used to automate this process in the form of teletype devices as early as the 1870s.

From the surface using the internet will look a lot like using one of these devices. If I want to get Google’s home page for instance I can start by getting their IP address

$ nslookup www.google.com
...
Address: 142.250.68.100

and then create a session where I’ll send some ASCII encoded characters as a request and get back some ASCII encoded characters as a response

$ nc 172.217.14.68 80
GET / HTTP/1.1                          <--- type in request

HTTP/1.1 200 OK                         <--- beginning of response
Date: Sun, 28 Jun 2020 10:53:00 GMT
Expires: -1
...

the response being an HTML document that my browser can render as a web page. This process was designed to feel like making a phone call to Google. We start a session, write some characters at one end, and Google receives them in exactly the same order at the other. The devil however is in the details.

The internet never goes down, despite our uniformally growing demands to share more information in more forms since the internet’s inception. To accomplish this feat your bits could travel along any one of many paths that are available. Someone severed the direct line between LA and Palo Alto? No problem, our routers will choose to send through a redundant line, or another backbone, or detour by way of Nevada. You want to send to Europe? No problem. We’ll route your bits to New York then hand them off to another provider who has a fiber link across the Atlantic.

This is the magic of Shannon’s bits; they gave us a common currency for information between all these different and variadic media. Whatsmore we could take information in all its forms, documents, videos, GPS coordinates, and so on, and realize any of these in the channels that carry internet data.

So we certainly get availability with this approach but there is a huge drawback in terms of privacy. With our bits conceivably going every which way, via long copper or fiber optic cables, radio waves, satelite up links, we can’t possibly keep track of all the people who could see our confidential information if they wanted to. We need a way to keep information private.

If there were one central hub to the internet the solution would be easy. We could have a shared secret with that hub like the time tested one-time pad, send our encrypted requests to the hub, and let the hub then decrypt and proxy those requests using its own shared secret with Google, our peer. For good reason the internet does not have a central hub. Our encryption scheme, like our protocol for sharing documents has to be peer-to-peer.

TLS and Asymmetric Ciphers

Block ciphers like AES are fast! Moreover so far as we know they’re secure, even from quantum computers. These are methods of encryption on which governments have relied for decades. As such Intel bakes single-round AES instructions into their chips which can process gigabytes of data per second meaning encryption won’t be a bottleneck if you have a typical internet connection. If you use a technique called cipher block chaining you don’t even have to worry about sending the same message multiple times because to someone trying to snoop your bits the messages will all look different.

The problem is that AES is a so-called symmetric cipher. It relies on having a shared key between you and your peer which, like our one-time pad, would work if our network were centralized but does not work for the internet. What we need then is a way to generate and share cipher keys with friends without snoopers getting the keys as well. In practice we use a myriad of techniques to generate a shared secret with our peers (say Google) which are not as fast as block ciphers, but also don’t require us to share a key with each other beforehand. Because of this fact we call them “asymmetric ciphers.”

The first and most famous of these techniques is the RSA cryptosystem, which is quite involved, but fortunately you can find excellent resources explaining it all over the internet as well as the primary spec published by RSA Security LLC. I personally recommend this one.

The tl;dr of RSA is that you can randomly select some (usually two) prime numbers and share their product with the world. Because factorization is hard, doing this in practice gives you the holder of the factors more information than everyone else. Importantly because you know the factors, you also know the totient of your product. Others wishing to send you an encrypted number can take the number to a modular exponent and using Euler’s theorem you alone can find the exponent that will bring the number back to its original value.

Most high capacity systems I’ve maintained prefer not to use RSA and in fact prefer not to use asymmetric ciphers for encryption at all. For this reason I will stick to a more illustrative example of using asymmetric techniques, not for encryption but just for key exchange, which is simpler.

Imagine you and I are standing in a room and in between us stands our nosy friend Randall. I yell a number at you, say 5. You yell back 32. I yell 87. Amazingly, we have just exchanged a secret key with each other that Randall cannot easily figure out. We have done so using the Diffie-Hellman(-Merkle) algorithm for key exchange. It’s essentially this.

Let’s start by standardizing a modulus that everyone knows. Since ASCII is 7 bits, $2^{7}=128$ is a good choice, but to make the math work we have to pick a prime so we’ll subtract one and do 127. Additionally we’ll pick another common number as a “base”
I’ll yell out a number for us to use as our “base” – 5
You’ll pick a secret number at random, say 78, and yell out “32” after calculating $5^{78}mod127=32$
I will also pick a secret number at random, 52, and yell back “87” after calculating $5^{52}mod127=87$

Now for starters we have to note it’s hard for Randall to figure out what our respective secret numbers are. This is because much like factoring, the discrete logarithm problem of finding an exponent given its output is hard in place value. Using this fact you and I can then do something quite genius in its simplicity

I will take your public number to the power of my secret number – $32^{52}mod127=2$
You will take my public number to the power of your secret number – $87^{78}mod127=2$

And because taking exponents is commutative we’ll end up with the same value and Randall is none the wiser. Very clever, right? There is a catch.

On the internet we have to worry about more than snoopers. Another attack vector is for someone to act as a person in the middle, impersonating you to me and me to you, and getting all the information that goes between us. We can fix this problem using another one of our asymmetric techniques, namely at least one of us (say me) will cryptographically sign everything I send so that you know it’s coming from me. In RSA signing is the inverse operation of encrypting, but again in practice we may use other standards like signing based on elliptic curves. A simple and very common one published by the NSA is the Digital Signing Algorithm which works very similarly to Diffie-Hellman so I’ll let you explore how this one works.

Thus, a full cipher suite for a session between you and Google could be

TLS_DHE_DSS_WITH_AES_256_CBC_SHA256

meaning we’ll do our transport layer (fancy name for internet) by exchanging an ephemeral Diffie Hellman key, signing with DSS (code name for the Digital Signing Algorithm), all to get a 256 bit AES key. We’ll then use the key with cipher block chaining and use the 256 bit SHA-2 algorithm for checksums. That’s a mouthful! But that’s the secure internet in a nutshell. It leaves an attacker with very few attack vectors.

Breaking RSA requires fast factoring, and breaking DHE, DSS, or ECDSA requires fast discrete logarithms. The essential feature is always the same. Your private information gives you the priviledge of doing computations that other people can’t do in a reasonable amount of time… with classical computers.

With this (relatively) concise summary of classical computing in our minds and how we use it to secure the internet we are now ready to take the leap. In Part 3 we will be introduced to quantum systems which violate a lot of our intuitions about how reality works. Hopefully you will find though that the principles that guide us to get useful computations out of quantum systems do not change at all from the classical world.

Part 3: Quantum Systems, the Briefest Introduction

The Basis of Spin

All of you will know that the term “atom” is a misnomer. Thought at the turn of the 20th century to be indivisible (a tomos) atoms were later discovered to contain within them electrons, protons, and neutrons (also further divisible fyi). Using nuclear reactions, so called because they occurred in the nucleus rather than surface of atoms, we were able to smash together and break apart these components in new and interesting ways that created the modern era of nuclear bombs, nuclear power, and particle accelerators.

Let’s imagine that using one of these techniques we get a nice even beam of neutrons. Any old neutrons will do, but a common way is to smash Hydrogen isotopes (Deuterium) together to perform Helium fusion which leaves one neutron scattering out. Now imagine we take this lone neutron and direct it in between two parallel magnets, one with the north face in and one with the south face in.

If I tell you one of these three things will happen can you guess which one?

The neutron passes through undeflected
The neutron gets deflected up towards the magnet facing north
The neutron gets deflected down towards the south facing magnet

Personally I would’ve thought choice 1. Not just because there’s a symmetry that rules out 2 and 3 but because I know neutrons are, well, neutral. They have no charge and magnetism is supposed to only affect moving charges, right?

Surprisingly, when Otto Stern and Walther Gerlach did this experiment in 1921/22 they found that 50% of the time the neutron will go with option 2 and 50% of the time it will go with option 3. They used neutral silver atoms instead of neutrons but the neutron version was done a few decades later.

So let’s say we see our little neutron go up meaning scenario 2 happened. Ok. What if it goes through a second pair of magnets? Will it have the same behavior? As it turns out, yes, the neutron was deflected up to the north face so it will do it again. If the neutron had gone south, it would go south again, and again after that. Great! Our behavior is random, but at least it’s consistently random! Once a neutron picks a direction, it sticks with it. Let’s try to account for these observations by building a simple model of the neutron. Maybe we’ll imagine our neutron is itself like a tiny spherical bar magnet, essentially a tiny earth, with a north hemisphere and a south hemisphere.

This would suggest our neutron reorients itself to align with our magnets but then sticks with its orientation. Let’s keep experimenting and see how our model holds up. How about now we’ll rotate our magnets 90 degrees so they’re facing left and right instead. What should happen?

The neutron passes through undeflected
The neutron gets deflected left towards the magnet facing north
The neutron gets deflected right towards the south facing magnet

A perfectly balanced magnet would go through undeterred, but, just like we observe with life-size magnets the tiniest imbalance can push our magnet to go in one direction. By symmetry our imbalances shouldn’t favor north or south so we predict our netron will again go north 50% of the time and go south 50% of the time. And…

Success! That is indeed what happens. So far tiny bar magnet theory holds up. Now for the real test. If we rotate our apparatus arbitrarily, does our model still help us predict which way the particle will go?

After dozens of tests rotating our apparatus every which way we confirm something counter-intuitive. No matter which way we rotate, the neutron has some chance of going north and some chance of going south. We would expect for a small rotation, say 15°, that the tiny magnet and big magnets are still mostly aligned so 100% of the time the neutron will go up, but actually it’s more like 98%. For larger, but still mostly aligned, rotations like 45° the number goes as low as 85%.

Huh. Strange. If you’ve ever played with magnets you know that our theory just hit a snag. Magnets only move in a random direction when there’s an even split between the two directions and that’s because there must be a slight variation in one side that offsets the balance. So we continue to pace around and think about what’s going on and we get a new idea.

Perhaps our tiny magnet is so small, so fundamental, that any interaction with it, no matter how minute, will determine which way it orients itself. As soon as that interaction happens the magnet is either facing up or facing down, and its initial orientation just gives a probability of which interaction will happen first. Perhaps we don’t see this at the macroscopic scale because our big magnets of iron have so many of these fundamental magnets in them that statistically the more probable ones over power the less probable ones.

So, ok then, we say. Maybe the initial orientation of our tiny magent doesn’t perfectly predict the direction it will go, but is still meaningful. If we can describe that orientation maybe we can get predict with some probability where the neutron will go. To get to the bottom of this we need math! And in particular, some math that can handle the rotations of our apparatus, our neutron, and the neutron spining around, just lots of rotations. Fortunately the math of coplex numbers can handle rotations very elegantly. Remember in Part 1 how important it is to pick the right representation? We’re gonna keep coming back to that point.

For complex numbers we can perform a rotation by multiplying by yet another complex number, much simpler than remembering all your trig functions and nesting them to no end. To describe our neutron we’ll need two angles. Imagine we were to start with north perfectly aligned upwards, rotate the north point down along the sphere by an angle θ and then rotate it sideways by an angle φ. We can construct every orientation this way. Think longitude and latitude; it’s the same principle.

We might write our state then like this

𝑒 𝑖 𝜃 | ↓ ⟩ + 𝑒 𝑖 𝜙 | \to ⟩

$e^{i\theta}{\lvert\downarrow\rangle} + e^{i\phi}{\lvert\rightarrow\rangle}$

Some parts here are unfamiliar. We’re using a funky new “ket notation” to denote down and right rotations and we’re using exponentials for complex numbers which you kinda remember from college, but at least it’s simple enough what the expression as a whole means. In a moment we’ll see how we can use a state expression like this one to predict which way the neutron will go, but before that I want you to imagine that this prediction will be easier to make when our state kets are aligned with our measuring apparatus.

Now remember, up is relative, we have expressed the state of our neutron in terms of “up” and “right” but really that’s because we’re using our first apparatus as our frame of reference. Going back to our 45° offset magnets, can we re-express our state in terms of them? Maybe call them upright $\lvert\nearrow\rangle$ and downright $\lvert\searrow\rangle$ ? If we can do that we can more easily predict how the neutron will behave when it goes through our angled magnets.

In fact, any orientation we choose gives us two directions we can use to express the state of the neutron if we combine them in the right way. Each orientation in some sense gives us a distinct and complete “language” for expressing states. If we could translate between any of the two languages that would be ideal. In linear algebra we call such a complete language for expressing states a “basis” and we get a very simple method for translating between bases. First we’ll express each of our new basis states in terms of our old ones.

| ↗ ⟩ = | \to ⟩ + | ↑ ⟩ | ↘ ⟩ = | \to ⟩ - | ↑ ⟩

$\lvert\nearrow\rangle = \lvert\rightarrow\rangle + \lvert\uparrow\rangle\\[10pt] \lvert\searrow\rangle = \lvert\rightarrow\rangle - \lvert\uparrow\rangle$

And then by simply arranging our new weights into a matrix we get our translator!

(11 1 - 1)

$\begin{pmatrix} 1 & 1\\ 1 & -1\\ \end{pmatrix}$

We can input any state description in terms of up and right into this matrix and we’ll get out a new description in terms of upright and downright so

𝑠 = | \to ⟩ = (10) (11 1 - 1) (10) = (11) 𝑠 = | ↗ ⟩ + | ↘ ⟩

$s = \lvert\rightarrow\rangle = \begin{pmatrix}1\\0\end{pmatrix}\\[20pt] \\ \begin{pmatrix} 1 & 1\\ 1 & -1\\ \end{pmatrix}\begin{pmatrix}1\\0\end{pmatrix} = \begin{pmatrix}1\\1\end{pmatrix}\\[20pt] s=\lvert\nearrow\rangle + \lvert\searrow\rangle$

Great. There is all that rotating we have to do to translate between the two frames of reference, but with complex numbers we can encapsulate all of that in a simple matrix. We call the property of the neutron that we are measuring its spin as it is magnetism that was originally thought to come from the particle spinning around. So we might say the neutron has “spin up” if it goes upwards towards the north magnet and “spin down” if it goes downwards towards the south magnet. At this point we have some intuition for how spin behaves and a mathematical language to describe the neutron’s state which should predict the spin. We are ready to face three hard truths that will refine our intuition. At the end of it we will have a description that is perfectly faithful to our neutron, at least as far as quantum computing is concerned.

Hard Truth #1: for particle spins, up and down are orthogonal. We have expressed our spin state as a sum of “up” and “right” rotations because our spatial intuition tells us that those directions are perpendicular. In quantum land, perpendicular means something more abstract. Loosely it means “opposte of” – think of it like, if we definitely know that the neutron will go up then we definitely know it won’t go down, but we don’t know necessarily if it will go left or right. Our math still holds, except instead of choosing up and right as a basis it makes more sense to use up and down, left and right, or forward and backward. Importantly, any state in any one of these bases can be re-expressed in terms of any other basis.

Hard Truth #2: God plays with dice. No matter how we spin it (heh) our neutron always has some chance of going in either direction. More specifically if we write our state with non-unit weights like $ae^{i\theta}{\lvert\uparrow\rangle} + be^{i\phi}{\lvert\downarrow\rangle}$ then we have an $a^2$ probability of measuring up and a $b^2$ probability of measuring down. We use $a$ and $b$ instead of $a^2$ and $b^2$ for the states so that the math of rotations still works. Because all of our operations are linear, meaning we can undo multiplications at the end, it doesn’t matter if we scale up $a$ and $b$ together, so long as we normalize to get $a^2 + b^2 = 1$ when we want to calculate the actual probabilities.

Similar to #2, we’ll call it hard truth #2.5, we can actually multiply our state by any single complex number and that doesn’t change any of our predictions. So not only do the absolute values of $a$ and $b$ not matter that much, but the absolute values of the “phase variables” $\theta$ and $\phi$ also don’t matter. Importantly the difference between $\theta$ and $\phi$ , the “relative phase,” always stays constant.

Hard Truth #3 We can never observe a quantum state without changing it. The very act of measuring a neutron so it goes up will change its state from $ae^{i\theta}{\lvert\uparrow\rangle} + be^{i\phi}{\lvert\downarrow\rangle}$ to just plain old $\lvert\uparrow\rangle$ , hence why we keep seeing the neutron go up afterwards. The math of quantum mechanics describes all sorts of subatomic properties, spins, positions, momenta, atomic energy levels, and light polarization, making an observation of any of these features collapses our state to a single basis vector.

So now, you might say, we’ve completely thrown our tiny bar magnet model out the window. That is correct, but also that’s fine because it was really a stepping stone to the mathematical description we have now. How does the mathematical description relate to reality beyond making simple up-down, left-right predictions? Is the language of the universe truly these ket vectors permeating through space? The honest response is we don’t know. It’s still an open question in Physics and there are many competing answers. For now we will take this abstract behavior as granted and build up our computers from here to discover their new and interesting properties. We will not worry so much about how quantum states relate to reality but instead about how they relate to each other to form an algebra of states.

Entanglement and Superposition

When two neutrons love each other very much their fates become intertwined. It’s a beautiful and magical process we call entanglement. One goes up and the other goes down. One goes right and the other goes left. Our neutrons may be a world apart in space or seconds apart in time and they will still be so intimate with each other that each can predict where the other will go, without having planed it when they were together. “Spooky action at a distance”? Or perhaps proof of multiple universes. We don’t know how it works, but we do know how to use it.

For starters, let’s imagine we have two “fully entangled” particles. Here’s one way you might do this. What this entanglement means is that when one neutron has spin up the other one will definitely, not probably but definitely, have spin down. Mathematically we write this initial state as

𝑎 | ↑ ↓ ⟩ + 𝑏 | ↓ ↑ ⟩

$a\lvert\uparrow\downarrow\rangle + b\lvert\downarrow\uparrow\rangle$

For simplicity we’ll ignore phase by assuming our relative phase is 0. Thus we have an $a^2$ probability of neutron #1 going up and neutron #2 going down and a $b^2$ probability of the opposite happening. What happens if you measure only one of the neutrons? Well you collapse the state of the whole system so if you get $\lvert\uparrow\downarrow\rangle$ then neutron #1 is now up and neutron #2 will be down when you measure it. Certainly interesting, maybe a little spooky, but not usefully different from what we saw with a single neutron. But wait it gets more interesting. See the direction neutron #1 chooses to go doesn’t have to determine the direction for neutron #2 but may only give it a new probability distribution. We call this situation a partial entanglement.

Once again we have a state for our quantum system that we can express in a basis of our choosing. Like before we will choose our experimental results, namely neutron goes up or down, as our basis vectors to make the probability calculations easier. We have two neutrons each of which can go up or down so we have four possible outcomes $\lvert\downarrow\downarrow\rangle$ , $\lvert\downarrow\uparrow\rangle$ , $\lvert\uparrow\downarrow\rangle$ , and $\lvert\uparrow\uparrow\rangle$ . If we express our state as

𝑎 | ↓ ↓ ⟩ + 𝑏 | ↓ ↑ ⟩ + 𝑐 | ↑ ↓ ⟩ + 𝑑 | ↑ ↑ ⟩

$a\lvert\downarrow\downarrow\rangle + b\lvert\downarrow\uparrow\rangle + c\lvert\uparrow\downarrow\rangle + d\lvert\uparrow\uparrow\rangle$

then we know the probability of measuring neutron #1 as up. Sum the probabilities for each outcome where neutron #1 goes up, so $c^2$ + $d^2$ . Once we observe the way neutron #1 goes we also have a way to “re-normalize” that is, figure out the new probability distribution for neutron #2. Simply erase all the states that are no longer possible and then scale to make the reamining weights add to one when squared.

𝑐 | ↑ ↓ ⟩ + 𝑑 | ↑ ↑ ⟩ 𝑐 2 + 𝑑 2 ‾ ‾ ‾ ‾ ‾ ‾ ‾ \sqrt

$\frac{c\lvert\uparrow\downarrow\rangle + d\lvert\uparrow\uparrow\rangle}{\sqrt{c^2 + d^2}}$

I’ve refrained from calling this state what it is, a superposition, because that is a loaded word. Many people start explaining this concept as an example of quantum weirdness without covering the math first so have to say things like “well it’s in two states at once” and “no, it’s not that it’s in one state all along” eventually resigning to “well quantum mechanics is just weird.”

Now that we’ve covered the math, let’s be clear, a quantum system is in one state which predicts with some probability whether it will go up or down. That state belongs to a rich algebra of possible states the system could be in, and that algebra supports operations, like the change of basis we saw before or like the quantum logic operations we will cover in the next section. We can express that state using an arbitrary choice of basis vectors but as a convenience we choose the ones that correspond to the directions we’ll measure.

As we keep coming back to from part one, your choice of representation may be arbitrary, but when you have an algebra there is an underlying structure that is always the same. So what is our underlying structure here?

Let’s start with a simpler question. How many possible states are there? For an in depth look at this you can read Paul Dirac’s original Principles of Quantum Mechanics; he’s the brilliant, albeit enigmatic, mind who gave us that nifty ket notation.

The short answer is that if we ever have a basis for our system, e.g. the four experimental outcomes $\lvert\downarrow\downarrow\rangle$ , $\lvert\downarrow\uparrow\rangle$ , $\lvert\uparrow\downarrow\rangle$ , and $\lvert\uparrow\uparrow\rangle$ , then any weighted sum of these is a valid state of the system and we can express any valid state as a weighted sum. They correspond one-to-one with each other. In other words the expression

𝑎 𝑒 𝑖 𝜃 | ↓ ↓ ⟩ + 𝑏 𝑒 𝑖 𝜙 | ↓ ↑ ⟩ + 𝑐 𝑒 𝑖 𝜌 | ↑ ↓ ⟩ + 𝑑 𝑒 𝑖 𝜎 | ↑ ↑ ⟩

$ae^{i\theta}\lvert\downarrow\downarrow\rangle + be^{i\phi}\lvert\downarrow\uparrow\rangle + ce^{i\rho}\lvert\uparrow\downarrow\rangle + de^{i\sigma}\lvert\uparrow\uparrow\rangle$

encapsulates all possible states the system can be in and as we shall see we can build components to get it in any of these states. The math here nicely generalizes to larger systems. It turns out we can entangle not just two neutrons together but an arbitrary number, say $n$ . Each neutron gets its own up-down measurement so doubles the number of basis vectors we use to express our state.

𝑎 0 𝑒 𝑖 𝜃 | ↓ ↓ . . . ↓ ⟩ + 𝑎 1 𝑒 𝑖 𝜃 | ↓ ↓ . . . ↑ ⟩ + . . . + 𝑎 2 𝑛 - 2 𝑒 𝑖 𝜃 | ↑ ↑ . . . ↓ ⟩ + 𝑎 2 𝑛 - 1 𝑒 𝑖 𝜃 | ↑ ↑ . . . ↑ ⟩

$a_0e^{i\theta}\lvert\downarrow\downarrow...\downarrow\rangle + a_1e^{i\theta}\lvert\downarrow\downarrow...\uparrow\rangle + ... + a_{2^n-2}e^{i\theta}\lvert\uparrow\uparrow...\downarrow\rangle + a_{2^n-1}e^{i\theta}\lvert\uparrow\uparrow...\uparrow\rangle$

This expression should already hint to you some of the incredible power of quantum computers. A classical bit vector of length $n$ can be in any of $2^n$ possible states. Here our quantum system has the same number of components, $n$ neutrons, but it has $2^n$ basis vectors so it can be in $\mathbb{C}^{2^n}$ superposition states, an exponential improvement. Now we did say that absolute magninute and phase factor don’t matter (our hard truths) so that removes one degree of freedom. We also unfortunately don’t get the full gamut of complex weights due to noise so we discretize those just like we discretize analog signals in a classical computer. Nonetheless, exponential is exponential!

It’s Pronounced “Cue-bit”

Ok we have our $\mathbb{C}^{2^n}$ possible states, discretized somehow to account for noise. What can we do with them? It turns out quite a bit. There is a whole family of transformations to the state we can realize in the form of quantum logic gates. For now we’ll focus on the controlled-not gate, a quantum version of the classical XOR gate we saw do additions in Part 1. Every gate performs a linear, or matrix, operation on the state of our system. So imagine we start with a superposition state

1 2 ‾ \sqrt | ↓ ↓ ⟩ + 0 | ↓ ↑ ⟩ + 1 2 ‾ \sqrt | ↑ ↓ ⟩ + 0 | ↑ ↑ ⟩

$\frac{1}{\sqrt{2}}\lvert \downarrow\downarrow\rangle + 0\lvert \downarrow\uparrow\rangle + \frac{1}{\sqrt{2}}\lvert \uparrow\downarrow\rangle + 0\lvert \uparrow\uparrow\rangle$

Since we’ve chosen our basis we neatly put our weights into a column vector

⎛ ⎝ ⎜ ⎜ ⎜ ⎜ ⎜ 1 2 \sqrt 0 1 2 \sqrt 0 ⎞ ⎠ ⎟ ⎟ ⎟ ⎟ ⎟

$\begin{pmatrix} \frac{1}{\sqrt{2}}\\ 0\\ \frac{1}{\sqrt{2}}\\ 0\\ \end{pmatrix}$

Now multiply by the corresponding matrix for our gate

⎛ ⎝ ⎜ ⎜ ⎜ ⎜ 1000010000010010 ⎞ ⎠ ⎟ ⎟ ⎟ ⎟ ⎛ ⎝ ⎜ ⎜ ⎜ ⎜ ⎜ 1 2 \sqrt 0 1 2 \sqrt 0 ⎞ ⎠ ⎟ ⎟ ⎟ ⎟ ⎟ = ⎛ ⎝ ⎜ ⎜ ⎜ ⎜ ⎜ 1 2 \sqrt 00 1 2 \sqrt ⎞ ⎠ ⎟ ⎟ ⎟ ⎟ ⎟

$\begin{pmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 1\\ 0 & 0 & 1 & 0\\ \end{pmatrix} \begin{pmatrix} \frac{1}{\sqrt{2}}\\ 0\\ \frac{1}{\sqrt{2}}\\ 0\\ \end{pmatrix} = \begin{pmatrix} \frac{1}{\sqrt{2}}\\ 0\\ 0\\ \frac{1}{\sqrt{2}}\\ \end{pmatrix}$

to get the new state of our system after applying our gate

1 2 ‾ \sqrt | ↓ ↓ ⟩ + 0 | ↓ ↑ ⟩ + | ↑ ↓ ⟩ + 1 2 ‾ \sqrt | ↑ ↑ ⟩

$\frac{1}{\sqrt{2}}\lvert \downarrow\downarrow\rangle + 0\lvert \downarrow\uparrow\rangle + \lvert \uparrow\downarrow\rangle + \frac{1}{\sqrt{2}}\lvert \uparrow\uparrow\rangle$

The change is subtle. It swaps the weights for two of our coordinates. And yet we’ve taken two particles that were independent before and made them entangled! Indeed real life realizations of the CNOT gate entangle their input particles. Just as the possibilities for classical logic gates are limitless, so too are the ways we can assemble quantum logic gates to vary the state of our quantum system. At this point we’ll make an important shift away from neutron spin, a specific quantum phenomenon, to the more abstract logic of quantum bits, qubits (pronounced “cue-bits”) for short. The same way we say that an abstract bit can have all kinds of physical realizations, from voltages on a wire to imprints on wax, a qubit can have many physical realizations

System	Description	Doceherence Time $(s)$
Electron Spin	Emit and measure spin of an electron	$10^{-3}$
Ion trap ( $In^+$ )	Suspend charged molecules in free space. Change state with lasers.	$10^3$
Nuclear Spin	Emit and measure spin of a nucleus	$10^0$
Optical cavity	Confine beams of light in a tiny mirrored cavity	$10^{-6}$
Quantum Dot	Nano-particles (tens of atoms) made from semiconductive material	$10^3$

as you will see in this table. It summarizes various qubit realizations as well as their decoherence and operation times in seconds.

One way to think about decoherence is that the more massive a system is the less it exhibits quantum properties and the more it exhibits classical ones. This rule explains why multi-qubit systems are hard to build and why our measurements collapse our system states. To measure the system what we are actually doing is taking a small isolated quantum system and entangling it with the rest of the world, with our highly massive measuring apparatus as well as with us as highly massive observers. With this highly massive entanglement our superposition disappears. In a future post that explores the principles of quantum mechanics further we can see how this phenomenon is a consequence of Heisenberg’s general uncertainty principle. For now we will think of decoherence time as the amount of time we can keep a quantum system isolated so that its components interract only with each other.

What makes our components into qubits is that they all support the same algebra for doing quantum logic, namely superpositions, entanglements, and transformations like our CNOT gate. Now at long last, let’s put these fundamentals to use and see if we can build something we weren’t able to build with classical computers, a computer that can factor numbers efficiently!

Part 4: Quantum “Algorithms”

Our algorithm for efficient factoring is actually going to be a hybrid classical/quantum approach. We’ll keep the quantum part to the essentials that a quantum computer can do better and do everything else with a classical computer. This part will be a lot dryer than the other ones as it’s really a deep dive into Shor’s algorithm. Hopefully if this algorithm looked like incomprehensible vodou before you’re now well equipped to understand it.

Quantum Order Finding

Order finding is a more specific version of the discrete logarithm problem we met in Part 2. If we have two integers $b$ and $c$ , then finding the discrete logarithm of $c$ base $b$ amounts to finding an exponent $i$ such that

𝑏 𝑖 = 𝑐

$b^i = c$

You may recall that this problem has no fast classical solution and hence is the basis of many cryptographic schemes like ECDSA. For a modular arithmetic (mod m) we use different terminology and say that $i$ is the index of $c$ base $b$ when

𝑏 𝑖 \equiv 𝑚 𝑐

$b^i {\equiv_m} c$

The reason here being that special numbers are called “generators” in a modular arithemetic. If $b$ is one of these generators then you can keep incrementing $i$ and you will get a different number (mod M) every time. Eventually you will have hit every number and wrap back to $b$ . Hence $i$ is the index at which $c$ occurs in this sequence. Order finding is the special case where we’re trying to find the index of 1 given a base $b$ , or more specifically find the integer $i$ such that

𝑏 𝑖 \equiv 𝑚 1

$b^i {\equiv_m} 1$

Now let’s consider the sequence of numbers that $b$ generates as a function

𝑓 (𝑖) = 𝑏 𝑖 𝑚 𝑜 𝑑 𝑚

$f(i) = b^i\; mod\; m$

Every base $b$ will have some period $r$ after which this function cycles back and repeats itself. In other words, $r$ is the smallest value such that $f(i) = f(i + r)$ for all $i$ , but if we break up this equation we get

𝑓 (𝑖) = 𝑓 (𝑖 + 𝑟) 𝑏 𝑖 \equiv 𝑚 𝑏 𝑖 + 𝑟 \equiv 𝑚 𝑏 𝑖 \cdot 𝑏 𝑟 𝑓 (𝑖) = 𝑓 (𝑖) \cdot 𝑓 (𝑟)

$f(i) = f(i + r)\\ b^i\equiv_{m} b^{i+r}\equiv_{m} b^i\cdot b^r\\ f(i) = f(i)\cdot f(r)$

Numerically then, $r$ is the value such that

𝑏 𝑟 = 1 𝑚 𝑜 𝑑 𝑚

$b^r = 1\; mod\; m$

which tells us that order finding and period finding are the same problem. If we know how many steps it takes before $b$ wraps back around to itself, we can subtract one and get the number of steps before it wraps back to one. As you might guess, order finding just like discrete logarithms and factoring, is another one of those pesky hard problems. In fact it turns out to be a stepping stone towards factoring.

Let’s say we have a magic black box that can find the order of any $b$ efficiently. For many values of $b$ , in fact more than a third of them, $r$ will be an even number. That means we can take $b$ to the power of $\frac{r}{2}$ to get a number $s$ such that

𝑏 𝑟 2 2 = 𝑠 2 = 1 𝑚 𝑜 𝑑 𝑚

${b^{\frac{r}{2}}}^2=s^2 = 1\; mod\; m$

which tells us that $s^2 - 1$ is some multiple of $m$ . As with any number we can break $s^2-1$ into the two factors $(s - 1)(s + 1)$ and because $s\neq\pm 1\; mod\; m$ we know that neither of these is a multiple of $m$ . Therefore if we take the greatest common divisor between $m$ and either $s-1$ or $s+1$ we will get a factor of $m$ . By repeating this process we can get the prime factors!

Cool, we have a route to factoring numbers and breaking the internet. All we have to do is find the order of a random base $b$ efficiently. How can quantum computers do that you ask? They do so using a component we will explore deeply in the next section, the Quantum Fourier transform. If you think about it this makes intuitive sense. We want to figure out how many steps before our function $f$ repeats itself (the period), or inversely, how many repetitions will be in a given chunk of its values (the frequency). The regular old Fouier transform is all about determining the frequency of a function, or more accurately which frequencies are most present in a function. It does so by having all other frequencies essentially cancel themselves out when convolved with the function. For an excellent visual explanation of this process watch 3blue1brown’s video on the Fourier transform. You won’t regret it.

Before going too deep let’s try an example of order finding with the (discrete) Fourier transform so you can get a sense that it works and then we’ll generalize. Say we want to factor the number 10. Normally since we can tell 10 is even we would keep dividing by two until we got an odd factor, but 10 makes our example easy to understand so we’ll stick with it. We’ll pick a candidate base $b$ less than 10 and with some luck it will lead us to our square root $s$ . Since 10 is so small we’ll cheat a little and look at all of our possible candidates ahead of time

Candidate $b$	Sequence $b^i$	Period
2	2 4 8 6 2 4 8 6 2 4 8	4
3	3 9 7 1 3 9 7 1 3 9 7	4
4	4 6 4 6 4 6 4 6 4 6 4	2
5	5 5 5 5 5 5 5 5 5 5 5	1
6	6 6 6 6 6 6 6 6 6 6 6	1
7	7 9 3 1 7 9 3 1 7 9 7	4
8	8 4 2 6 8 4 2 6 8 4 2	4
9	9 1 9 1 9 1 9 1 9 1 9	2

Remarkably 6 out of 8 of them have even periods so would have worked, pretty good odds for us. Let’s imagine we picked the number 7 at random so we get our sequence 7 9 3 1 7.... Remember, we’re trying to figure out the value of our period, 4, but this gets really hard with large numbers. To apply the Fourier transform we start with a specific complex number, the 10th root of unity, and label it $\omega$ . We’ll construct a matrix of powers of omega $\omega^{ij}$ for the $i^{th}$ row and $j^{th}$ column and multiply by our sequence.

⎛ ⎝ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ 111 ⋮ 1 1 𝜔 𝜔 2 \dots 1 𝜔 2 𝜔 4 𝜔 𝑖 𝑗 \dots . . . . . . \dots 1 𝜔 9 𝜔 2 \cdot 9 𝜔 9 \cdot 9 ⎞ ⎠ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎛ ⎝ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ 73917 ⋮ ⎞ ⎠ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟

$\begin{pmatrix} 1 & 1 & 1 & \ldots & 1\\ 1 & \omega & \omega^2 & ... & \omega^9\\ 1 & \omega^2 & \omega^4 & ... & \omega^{2\cdot 9}\\ \vdots\\ 1 & \ldots & \omega^{ij} & \cdots & \omega^{9\cdot 9} \end{pmatrix} \begin{pmatrix} 7\\3\\9\\1\\7\\\vdots \end{pmatrix}$

It’s a bit of sorcery, but at the end of it we’ll have computed the frequency of our sequence. You can do this by hand, but I was lazy and wrote a program to do it

from cmath import rect, pi

m = 100
sequence = [7, 9, 3, 1] * (m // 4)
omega = rect(1, (2 * pi) / m)
matrix = [[omega**(i * j) for j in range(m)] for i in range(m)]
mul = lambda tup: tup[0] * tup[1]
product = [sum(map(mul, zip(matrix[i], sequence))) for i in range(m)]
squared_norms = [abs(v)**2 for v in product]

for i in range(m):
    print(i, int(squared_norms[i]))

Moreover the Fourier transform gets more accurate the larger your sequence is, e.g. the more repetitions it has. I made our sequence length 100 instead of 10. In practice our values will be MUCH larger than 10 so we won’t have to do this. Our program produces one spike at a frequency of 25 or equivalently a period of $\frac{100}{25} = 4$ . Brilliant! The Fourier transform does indeed give us our order. We try it on our own and confirm that

74 = 1 𝑚 𝑜 𝑑 10

$7^{4} = 1\; mod\; 10$

Thus we conclude that $7^2\equiv 9$ is a non-trivial square root of 1 mod 10, the value $s$ which we seek. $9+1$ is 10, our original value, so unhelpful, but $9-1$ is $8$ which has a greatest divisor in common with $10$ of $2$ . We compute this gcd and just like that we’ve found our factors 2 and 5! Convoluted? A bit, yes. For the sorts of large numbers we use to secure the internet this turns out to be the best way we know to factor them. In the next two sections we’ll diagram out some quantum circuits so we can see how we take this abstract procedure and put it into practice.

Input to the Quantum Fourier Transform

The quantum fourier transform is no different in its behavior than its classical counterpart, the Fast Fourier Transform. Both algorithms take in an input vector and use a matrix populated with the roots of unity we saw before to find the frequencies present in the vector. Where the fast fourier transform got its name from using a clever selection of frequencies that gave it a computational speed up, the quantum version goes even further by calculating the weights for all frequencies simultaneously.

There is a catch. The FFT will give you an output vector in which each entry corresponds to a frequency and has the weight for that frequency. The QFT by contrast will give you an output quantum state that is a superposition of all the frequencies weighted by the weight of that frequency. If we want to use it to get the weight of any one particular frequency we are out of luck. We have to measure the superposition at the end and what we’ll get is some frequency. The more present that frequency is in the original vector the higher our chances of getting that frequency.

If we wanted a quantum alternative to the FFT this would be a disappointing realization. However all we care about is indeed finding a single frequency, the order of our function $b^i$ . For our purposes we really don’t care. So long as we can find the fundamental frequency of our sequence we can find our order and factor our number. The QFT makes this process possible. Let’s start by building the input to our fourier transform, the sequence $b^i$ . At this point we have already selected our base $b$ and want to obtain a quantum superposition of $m$ bits weighted by our values $b^i$

𝑏 0 | 0...00 ⟩ + 𝑏 1 | 0...01 ⟩ + 𝑏 2 | 0...10 ⟩ + . . . 𝑏 2 𝑚 - 1 | 1...11 ⟩

$b^0\lvert 0...00\rangle + b^1\lvert 0...01\rangle + b^2\lvert 0...10\rangle + ... b^{2^{m}-1}\lvert 1...11\rangle$

or written more concisely

\sum 𝑖 = 0 2 𝑚 - 1 𝑏 𝑖 | 𝑖 ⟩

$\sum\limits_{i=0}^{2^{m} - 1} {b^{i}\lvert i\rangle}$

The challenge is that we only have $m$ components, our $m$ qubits, and want to stay in that same neighborhood for our overall system, but we now have to set $2^m$ independent values to get our superposition. Sounds hopeless? No it turns out to be doable using the very same trick that gave the classical FFT its speed up, the clever composing of exponents. Here’s how it works. Let’s say instead of building up all $2^m$ values of $b^i$ that we need, we only build specific values. Namely

𝑏 0 𝑏 1 𝑏 2 𝑏 4 𝑏 8 𝑏 16 . . . 𝑏 2 𝑚

$b^0\;\; b^1\;\; b^2\;\; b^4\;\; b^8\;\; b^{16}\;\; ...\;\; b^{2^m}$

one for each of the powers of two up until $2^m$ , so we’re back to only needing $~m$ components. We’ll call these our building blocks. Let’s say now we want to compute the specific weight of our quantum state $\lvert i \rangle$ which is $b^i$ . $\lvert i \rangle$ has a bit vector representation. Say for example it’s $\lvert 00101\rangle$ . Every bit here corresponds to a building block and what we can do is take the building blocks for the bits that are high and multiply them together. In this case we would get $b^{2^0}\cdot b^{2^2} = b^{1}\cdot b^{4} = b^{5}$ , since from the right the $0^{th}$ bit and the $2^{nd}$ bit are high, but we could do this for any value of $i$ . If this doesn’t make sense to you, take a moment to convince yourself that this is true.

We will keep this principle in mind as we assemble a circuit that looks like this

The components at the left are what are called Hadamard gates. Essentially each takes a qubit in the $\lvert 0 \rangle$ state and puts it in an equal superposition of $\lvert 0 \rangle$ and $\lvert 1 \rangle$ states. Think of it like the 90 degree rotations we saw in Part 3. The combination of $m$ of these on the left gives us an $m-$ qubit vector that is in an equal superposition of all of its possible states

\sum 𝑖 = 0 2 𝑚 - 1 | 𝑖 ⟩

$\sum\limits_{i=0}^{2^m-1} \lvert i \rangle$

At the bottom we then take these individual qubits and feed them into a series of controlled Z gates, a variant of the CNOT gate that conditionally multiplies its inputs by some factor, in this case the building blocks $b^{2^x}$ . So then in addition to having our first superposition of all the values $i$ in the first $m$ qubits, those qubits will be entangled with all the values $b^i$ in the next $m$ qubits. Or written out mathematically

\sum 𝑖 = 0 2 𝑚 - 1 | 𝑖, 𝑏 𝑖 ⟩

$\sum\limits_{i=0}^{2^m-1} \lvert i, b^{i} \rangle$

where for clarity the comma means I’ve written our state as split up into two quantum registers. Now, here’s the really cool part. The first thing we’re gonna do is to counter-intuitively make a measurement of our second register, the values $b^{i}$ . Doing so will collapse our second register down to a single specific power of $b$ at random, let’s say $b^4$ .

What now remains in our first register? Because the qubits were all entangled the first register has to now produce some value $i$ that would give us $b^4$ in our second register. This could be $4$ , but if the sequence is periodic with period (let’s say) $r=5$ then $i$ could also be $9$ , $14$ , $19$ , $24$ and so on. The input register actually remains in an equal superposition of all the values $i$ that would produce $b^4$ in the second register.

But this is perfect! Now the first register is in a quantum state that is periodic with exactly and only the order $r$ that we are hoping to find. Now some of you will be scratching your heads and saying that this isn’t the original sequence $b^{i}\lvert i\rangle$ that we were aiming for. That is correct. Instead we have homed in on a specific value of $b^i$ and zeroed out all of the other values. For finding the frequency it turns out not to matter. Consider that 1 2 3 4 1 2 3 4 1 2 3 4 and 1 0 0 0 1 0 0 0 1 0 0 0 both have the same frequency.

From here we’ll input our sequence to the QFT, take a measurement, and determine our order with high probability, minus some edge cases. The implementation of the QFT is not too difficult either. It follows the same rules for design and uses only two families of gates, the Hadamard gate and a series of controlled phase gates $R_m$ which is like a controlled Z gate but multiplies by a phase change matrix

(10 0 𝑒 2 𝜋 𝑖 2 𝑚)

$\begin{pmatrix} 1 & 0 \\ 0 & e^\frac{2\pi i}{2m} \end{pmatrix}$

The QFT looks like

We saw a very similar circuit to produce the input to the QFT so I will leave it as a test of comprehension for you to work out how this one works. As a final meditation here you may ponder how elegant it is that the math of complex numbers that we used for rotations also perfectly captures the wrapping around of our modular arithmetic and provides the roots of unity for our Fourier transform.

Beyond the Fourier Transform

There you have it! From the simplest imaginable components like neutrons or photons we are able to build a system that breaks centuries of work in cryptography. I hope you’re now inspired with the same level of curiosity and excitement for this new technology that I have and are well equipped to explore it further. In this last section of Part 4 we will move beyond Shor’s algorithm and survey the general principles of quantum computing. Consider this section a map that will give you a rough idea of the field that you can fill in. Afterwards, in Part 5, we will fill in some of this map by discussing what is going on in the industry at the moment.

First let’s discuss what a quantum computer can’t do. In Part 1 we made the great leap from logic circuits to sequential systems. This leap was what took us from finite arithmetics like adding single digits to performing computations that could take an unpredictable amount of steps. Today there is a quantum analog for logic circuits, the qubits and gates with which we’ve been working, but there’s no notion of a quantum conditional branch. You might imagine that as quantum systems can simultaneously explore states that have different outcomes when measured we could build a system that both branches in one direction of a program and in the other so that it explores all possible branches in a program. I suspect that if decoherence times and the no cloning theorem even allow for such a device to exist we are a long time away from having it.

For now quantum systems merely augment the behavior of classical computers. They introduce hardware that can do certain operations like order finding with much fewer resources but they don’t introduce a fundamentally new paradigm to computation. We still rely on the classical part to do the conditional branching for instance. A good analogy here is how before Intel chips had floating point operations built in you could connect them to a separate floating point co-processor. Your main processor would give an input to the co-processor by setting the co-processors input registers, then the co-processor would chug away and produce an output. Similarly a quantum circuit has input qubit registers that a CPU can set. The quantum circuit will then chug away and produce a result.

In terms of what a quantum computer can do we think of the family of matrices we can construct with logic gates and how they interract with our qubits. These matrices have the constraint that they must preserve the inner products of their inputs known as the unitary constraint. Think of our rotating magnets in Part 3 as an example. If we rotate by 90 degrees then we have a different definition of what “up” is and what “down” is, but if two states were orthogonal before the rotation, they are still orthogonal after the rotation. It turns out we can construct any such unitary matrix using Hadamard, phase, and $\frac{\pi}{8}$ gates which you can explore here more rigorously.

Using these gates experts will classify quantum algorithms into two broad categories, those based on the QFT, and those based on Grover’s algorithm for quantum search. As we have seen the QFT gives a primitive for order finding, factoring, and also gives an algorithm for discrete logarithms. Quantum search on the other hand gives a quadratic speed-up for inverting functions, a major application of which would be improving database search times and other NP problems. Quantum computers are thus not yet very general purpose, but what applications they do have show their immense promise.

Part 5: In Practice Tomorrow

Google Achieves Quantum Supremacy

A final application of quantum computers is their ability to simulate quantum systems. It makes sense that this is not something a classical computer can do well, otherwise we could simulate the QFT and find factors with a classical computer. Quantum systems have a behavior that is complex, in both senses of the word, but which can be mirrored by other programmable quantum systems. Such a speed-up in simulation could be a breakthrough in the production of pharmaceuticals or help in the quest for superconductors to prevent a future energy crisis or even help researchers at the LHC further unlock the mysteries of the universe. Not to mention we could use them to re-secure the internet in a post-quantum world.

It is here that we come full circle to the event we wanted to understand. In October of 2019, researchers at Google achieved a major milestone in quantum computing. They were able to build a 53 qubit computer that could outperform a classical comptuer in quantum simulation. There remains some contraversey from IBM, another major player in this space, as to how big the speed-up is, but this is big news.

IBM for their part has developed a fleet of 53 qubit computers and made it available to developers through their cloud offering. Those of you now wishing to get your hands dirty can dive straight in and start writing your own quantum programs. You should now be armed to explore their tutorial series.

I want to give a special shout out too to Rigetti, a company in Berkeley that is trying to build the first quantum-first cloud. They have open sourced a lot of their work for curious minds like you and me to explore. In particular you may want to check out their quantum virtual machine, an abstraction on top of their quantum hardware and quantum simulators on your laptop that understands the standard Quil Instruction Set Architecture. With the QVM there is a python wrapper library.