Nakamoto Consensus and Bitcoin
Motivation
Centralized Vs Decentralized
In the status quo, when parties A and B want to exchange value, A will authenticate themselves, and inform their bank of their desire to deliver some amount of money to party B. The bank will then take care of subtracting money from A’s account, and adding it B’s. As parties A and B both place trust in an intermediary party (the bank), this is referred to as a centralized solution to value exchange. The question then arises, is it possible to achieve value transfer in a setting in which no party trusts any other party?
In the decentralized setting, there is a set of participating nodes, all of whom are mutually distrustful. There are a number of ways to come to consensus here. Perhaps most obviously, if a majority of nodes agree that some transaction took place, then as long as there are a majority of honest parties, the system can mark that transaction as committed since an honest node knows about the transaction. However, in order to know what a majority is, the number of nodes in the system must be fixed. Therefore, we seek solutions in a dynamic setting, where nodes can be added or removed arbitrarily.
Safety
The main requirement we have for such a system is the safety guarantee that a committed transaction cannot be reverted.
The Consensus Problem
More generally, the consensus problem is given a communication graph, a device assignment for each node and an input for each node. Some fraction of the nodes are allowed to behave in an arbitrary manner. The problem is then for all nodes to decide on a particular ordering of state transitions for some state machine (or in the single-shot setting to decide on a particular transaction). In particular, all correct nodes must agree on the same ledger, and if all correct nodes have the same version of the leger, then all correct nodes must commit to that leger. In Bitcoin, the state is a tamper-evident ledger, and consensus is achieved with relation to ordering the transactions.
The Sybil Attack
As mentioned, in some settings, one approach is to ask nodes to vote for a certain version of reality, and then accept some form of majority vote (eg. a simple majority). How this vote should be given is one area of difference between different systems, with some ideas being obviously insufficient to achieve safety. For example, if a vote is given to each IP address, then any node which can spoof IP addresses will be able to achieve arbitrary power in the system. Such an attack is called a Sybil Attack.
Trust in Resources
Nakamoto’s 2008 paper presents a system in which a vote is attached to a resource, namely a unit of CPU or GPU compute power. His Proof of Work (PoW) system relies on the difficulty of finding the preimage of a hash function to ensure there is at most one leader at a time, and that anyone can verify that authenticity of a claimed leader. The focus on computation also guarantees that as long as a simple majority of the CPU power is controlled by nodes acting honestly, a malicious minority is statistically unlikely to be able to violate honesty.
Overview
Definition of a Coin
In this system, a coin is defined as a set of signed hashes which indicates ownership of the coin. So, if user A has a coin, which they wish to transfer to user B, then it hashes the previous transaction, which gave user A the coin, and user B’s public key together, and signs the result with user A’s private key. The signature ensures that the transaction was actually made by user A, and allows user B to verify that user A actually had claim to the coin. However, this does not inherently prevent double spending attacks. One common solution to this in other systems is to use a mint, or trusted central authority, which checks for double spending by being the sole issuer of coins and the return point for every coin after a single transaction. However, this destroys the advantages of a distributed system since it is reliant on a centralized entity. To prevent against this attack in a truly decentralized manner, transactions must be publicly announced, and a system to decide on a particular order of transactions must be defined.
Useful Hash Function Properties
Transactions in some order define a ledger. Nakamoto’s proposed system uses proof of work to establish trust in a particular version of the ledger. The system establishes trust in nodes (miners) using properties of cryptographically secure hash functions. Specifically, (1) For some cryptographically secure hash function $f$, finding $x\neq y$ such that $f(x) = f(y)$ is computationally infeasible. (2) For any possible output $y$, then for uniformly random $k$ it is infeasible to find $x$ such that $f(k|x) = y$.
Bitcoin uses the specific algorithm SHA-256 heavily in both the proof of work algorithm and in the creation of bitcoin addresses.
Tamper Evident Ledger
The goal here is to create a ledger, or series of transactions, such that it is easy to detect if someone has tampered with the ledger. This is achieved by hashing a block, or set of transactions. Any subsequenct block header hashes the contents of the block concatenated with the hash of the previous block, creating a chain of identifiers, each reinforcing the previous identifiers. A node $A$ maintains its known version of the head of the blockchain (a hash of the most recent block of which $A$ is aware). Thus, when node $A$ is presented with a new block, it can verify that the new block builds off of the same ledger $A$ knows since the collision free property (1) of hash functions ensures that if a block has different data, then it has a different hash. If a block includes a modified transaction then, the hash of all future blocks will also be different (as they include the hash of the previous hashes), unless the headers are also modified. However, modifying the headers in this way will eventually force the hash of the new block to be different from the hash $A$ remembers.
Wallet Addresses
In bitcoin, wallets are essentially simply public keys or hashes thereof, to which a party can lock value. User A sends bitcoins to user B’s wallet in the following way. User A uses his or her private key to generate a digital signature for a transaction which gave A the coin, thus proving ownership over the coin. User A then generates a transaction which locks the bitcoin value to user B’s public key. User B can then create an unlocking and locking script on the coin to spend the value.
Proof of Work and Longest Chain Verification
This tamper evident ledger is exactly what we’re looking for to stop double spending attacks, but in order to do so we need to introduce some sort of procedure that allows for the peer-to-peer nodes to agree that the hashes are valid and for there to not be a massive flood of potentially correct hashes. Imagine if proposing a new block was just trivial – there would be hundreds or thousands of potentially valid blocks coming off of each transaction and forks could take upwards of days to resolve (or just never, if the work was trivial) It might be the case that both forks would have blocks added to them every second - when would one become substantially longer? This is where proof of work comes in, which uses the idea of forcing block proposal to be tied to a computationally very difficult problem – thus ensuring that solutions found that extend a block make forking much more undesirable (since it requires a lot of computation) and an attack difficult once the longest chain is substantially longer than the attack chain. Proof of work is contrasted with proof of stake in the modern landscape, but proof of work fulfills Bitcoin’s goals.
The second property of hash functions motivates a particular computationally difficult puzzle, namely, find a $x$ such that $f(k|x) = y$ where $k$ is chosen uniformly at random and $y \in Y$ for some set $Y$ of output values; note that any particular strategy for achieving this is not much better than randomly guessing values for $x$. The actual proof of work puzzle thus places trust in whoever can increment the nonce in the block such that the hash value of this block begins with a specific number of zero bits (in January 2019 this number was 74). When this is found, the block with incremented nonce is published and appended to the previously confirmed block. Confirmation that this incremented nonce is correct is trivial - simply hash this value and verify that the number of zero bits is what it must be. But finding this incremented nonce can take up a lot of computation (Bitcoin aims for confirmed blocks to be on average every ten minutes, so obviously finding this nonce takes a lot of work).
Description of full proof of work process
Given these various interlocking aspects, the entire proof of work process looks something like this:
- New transactions are sent out to all the nodes on the network.
- The mining nodes coalesce multiple transactions into blocks, making sure that all the transactions within a block are consistent (ie, no attempts at double spending a coin).
- Each node attempts to find the nonce that gives the hash of the block the required number of zeros
- When a node finds a solution, it broadcasts this block and nonce to all the other nodes and attempts to attach it to the blockchain
- The other nodes will accept this block only if the transactions in the block are valid
- In other to express this acceptance, other nodes will try and coalesce transactions into a new block and attach it to the most recent block that has been added to the blockchain
In order to avoid immediate double spending problems, it is common that transactions are required to go through 6 or 7 confirmations (meaning that the block with the transaction is on the longest chain of the block-chain and there are 6 other blocks after it).
51% attack
What can an attacker do if they controlled a majority of CPU resources? What is the scope of the attack that they could muster, and how damaging would this be for the Bitcoin ecosystem?
We first note that an attacker cannot just invent Bitcoin addresses out of thin air. The only way to get more money is as a mining reward or a transaction fee in the event of hosting successful blocks on the block-chain. So if an attacker controls 51% of resources, they could singlehandedly confirm fraudulent transactions and demand high transaction fees from other participants in order to actually have the network move. But these are not actually spoofing Bitcoin, rather just rent-seeking behaviors on a legitimate process.
However, double spending would indeed be possible. After a confirmed transaction from our malicious actor to someone else, he could simply spoof a different transaction from him to someone else.
If the attacker also had a vendetta against some specific person, he could tirelessly make sure that all the blocks that he confirms do not include transactions against that person, effectively icing them out from the network.
Altering odd blocks would likely be impossible as well. Blocks that are old enough are simply hard-coded into Bitcoin software.
This 51% attack has actually been possible once before! In July 2014 the mining pool gHash.IO briefly exceeded 50%, but luckily there were enough honest people in the pool that they voluntarily reduced their compute and swore to never get to even 40% again.
Incentives
As described by Nakamoto, the very first transaction in a block creates new coin[s] that are owned by the block creator. As a result, nodes that compile new transactions into a block and extend the blockchain are rewarded for this work (which requires finding a proof of work) monetarily. This is notably the only source of new coins in the system, since there is no mint.
Furthermore, transaction fees are also used as incentives. These are added to the incentive value encoded into the confirmed block containing the transaction.
These incentives are useful because due to the statistical difficulties in trying to be malicious (discussed in later sections), it is often mutually beneficial to the Bitcoin network and malicious adversaries to work together to construct honest blockchains that record legitimate transactions - the adversaries with large compute get coins out of mining, and the Bitcoin network gets faster confirmations and continued proof of legitimacy
Space Complexity and Merkle Trees
Once the most recent transaction in a coin has been put into a block that is buried sufficiently deep, the transactions before it can be discarded. In order to do this, transactions are hashed into a data structure called a Merkle tree. Old blocks can be removed simply by cutting off branches of the tree . The structure of the Merkle tree is more explicitly described by this figure from the whitepaper (note that only the root is included in the hash of the block).
The block header is about 80 bytes - assuming block generation every 10 minutes, we see that the expected 80 * 6 * 24 * 365 = 4.2MB per year storage requirements are already not an issue especially with Moore’s law broadcasting further storage increases.
A further optimization that Satoshi makes is that verification doesn’t require a full node. Users only need to look at the longest chain and then get the Merkle branch of the tree that links that transaction. By using this link, he can trace the transaction up to a place where a network node has accepted it (and later blocks chained to it increase the chances that the network has accepted it). And so verification is reliable – but this optimization does run into the risk that if nodes rely on other nodes to verify transactions, that if an attacker temporarily controls over half the compute that this verification is not reliable. Satoshi recommends either that users accept alerts from network nodes when they see an invalid block (so that the user downloads the full block) or that businesses heavily reliant on Bitcoin payments run their nodes for independent security.
Privacy
This proof of work methodology obviously makes pure transaction info public, but privacy can still be maintained by making public keys anonymous. Through this, the history of the blockchain would see that someone is sending a specific amount of money to someone else, but only the two people who have shared their public keys would actually know who the parties are. Further optimizations include people using a newly generated key pair for every single transaction – this means that people will not be able to identify a common party in multiple transactions.
Note that there are statistical attacks which can attempt to recover an anonymized network given the identity of some key nodes and similar graphs (see random graph matching).
Statistical Analysis (and probability of reversion)
We now walk through the scenario described in the whitepaper, of an attacker trying to generate a dishonest chain of transactions faster than an honest chain.
This process can be modeled as a random walk, where success in each time-step is modeled as the honest chain extending by one block, and failure is the dishonest chain extending by one block. We see that each of these is analogous to growing/shrinking the ‘gap’ between the number of blocks in the honest chain and dishonest chain
Given this formulation, we wish to calculate the probability that an attacker can ever catch up with a starting gap of $z$ (given potentially an infinite number of time steps) We denote $p$ as the probability that the honest chain extends by one block (and likewise $q=1-p$ as the probability that the attacker finds the next block)
This is described in Section 11 of the whitepaper, but reference [3] notes some errors with the analysis. We will derive the beginning of probability using reference 3 as a guide. One clear error is that Satoshi attempts to calculate the odds that the attacker can catch up, but it is more important that the attacker actually produces a longer block because ties are broken by whichever fork had been received earlier. Whereas if the attacker produces a longer block then this is a concern because it means that miners may start trying to extend that block since it’s the longest.
The problem is first formulated in terms of a gambler’s ruin problem in [3] and then later slightly modified.
First we consider a situation where an attacker’s goal is to win $N$ dollars before going bankrupt, where he wins 1 dollar with this same probability q and loses 1 with probability p. The bankruptcy state is terminal, as is reaching N.
If we define $q_i$ as the probability of victory when initially starting with $i$ dollars, we automatically have that $q_0=0, q_N=1$
We can determine the other probabilities recursively since we have that $q_i = q q_{i+1} + pq_{i-1}$. We also have that $q_i = pq_{i}+qq_{i}$ and by substitution this gives us $pq_{i}+qq_{i}=qq_{i+1}+pq_{-1}$ which we can rearrange to yield $q_{i+1}-q_i = \frac{p}{q}(q_i-q_{i-1})$
As Reference 3 shows, we can build this up from the earliest cases (since it doesn’t make any sense when $i-1 < 0$ to see that $q_2-q_1 = \frac{p}{q} q_1$ and that $q_3-q_2 = \frac{p}{q}(q_2-q_1) = (\frac{p}{q})^2 q_1$
This can easily be extended by induction to show that $q_{i+1}-q_i= (\frac{p}{q})^i q_1$
We can write $q_{i+1}-q_i$ via telescoping as $(q_{i+1}-q_i) + (q_{i}-q_{i-1}) \dots + (q_2-q_1)$ and then reuse the inducted equation to get that this value equals $\sum_{j=1}^{i} (q_{j+1}-q_j) = q_1 \sum_{j=1}^i (p/q)^j$ We isolate $q_{i+1}$ as $q_1 \sum_{j=0}^i (p/q)^j$
We sum this geometric series to get Eq. 13 and then solve for $q_N$ since we know that equals 1, to get us a closed expression for $q_1$. From here we substitute that back into Eq 13, to produce Equation 17 in Reference 3 for the closed form expression of $q_i$
In order to produce Nakamoto’s formula written (where the attacker has infinite resources) we assume that this is the same as the Gambler’s Run problem, but the gambler starts with $y$ dollars and ends either at $0$ or at $y+z$ We substitute with these conditions and take the limit as $y \to \infty$ which implies infinite resources and this directly produces the formula that Nakomoto writes in his section.
Further math is validated in this reference, with the change of calculating $Q_{z+1}$ instead of $Q_z$ due to the error previously noted. The Poisson error rate process is fully fleshed out in the reference, with the code he used replicated for diagrammatic purposes to notate the different block depths required given the attacker’s mining power and the probability that the attacker succeeds in his overall goal. The model is also validated using Monte Carlo simluation to simulate mining and the Gambler’s ruin, not using any of the equations to predict the winner. The results match what was derived.
Conclusion
Through Nakamoto’s foundational white paper, he has constructed a proof of work scheme that allows for digital transactions that do not require a central authority. It merely requires that honest nodes comprise more than a majority of CPU power, and does this by making it computationally intractable for an attacker to change the public history of transactions (block-chain) that encodes the production and transfer of the ‘coins’ of the system. This is furthermore done with minimal coordination between nodes, allowing nodes to be offline and catch up to current history when they return. Incentives are encoded into this proof of work protocol that allow the maintenance of this network to actually be profitable for verifiers, paving the way for the adoption of Bitcoin.
References
[1] https://nakamotoinstitute.org/static/docs/bitcoin.pdf [2] https://arxiv.org/pdf/1701.03977.pdf