Understanding Blockchains (and Bitcoin) – Part 2: Technology
In our previous post in this series, we provided a conceptual view of blockchains, using their implementation in the Bitcoin ecosystem to explain what they are and how they work. In this post, we'll probe the concepts at a deeper, more technical level, again using Bitcoin's mechanisms as the example. These details will be useful when, in future posts, we look at how blockchains are used in many other domains that go beyond transacting with crypto-currencies such as bitcoins.
We’ll do this using a layered architecture to help separate the different pieces of the general system, so as to better explain how these vary when moving from one blockchain-based system to another.
Blockchain Architecture
The following picture shows a general architecture of a blockchain-based system. It isn't shown this way in any of the formal blockchain literature, but this author finds it a useful way to break down and compartmentalize the different aspects when thinking about the differences between various blockchain-based systems.
The topmost layer represents Applications that can be built using the underlying Decentralized Ledger maintained on a Peer-to-Peer network. In the case of Bitcoin, such applications are very simple, as the entire architecture supports one straightforward function: the exchange of bitcoins for commerce or speculation. An example is a Bitcoin wallet, created to represent a user's unspent bitcoins. Blockchains such as Ethereum allow for more complex applications, which we shall discuss in our next post.
The Decentralized Ledger layer consists of many functions that ensure that the single, global ledger remains consistent and tamper-proof. The fundamental structure of the ledger is the blockchain, where transactions are grouped into blocks, each block cryptographically linked to its predecessor to form a chain. Transactions represent some exchange of tokens between participants, which in Bitcoin consist of bitcoins moved between different addresses. (Ethereum's blockchain records, among other things, transfers of its native token, called Ether, which is used to perform more complex actions than merely moving crypto-currency from a sender to a receiver.) Before being considered legitimate, transactions undergo validation by all nodes.
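To make "cryptographically linked to its predecessor" concrete, here is a minimal Python sketch. It is not Bitcoin's actual block format: the `Block` class and its fields are hypothetical simplifications in which each block stores the SHA-256 hash of the previous block's contents, so altering any earlier block breaks the link recorded in the block after it.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class Block:
    """A toy block (hypothetical): a list of transaction strings plus a link to its parent."""
    prev_hash: str
    transactions: list = field(default_factory=list)

    def block_hash(self) -> str:
        # Serialize deterministically, then hash; any change alters the digest.
        payload = json.dumps({"prev": self.prev_hash, "txs": self.transactions},
                             sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()


def build_chain(batches):
    """Create one block per batch of transactions, each linked to the previous one."""
    chain = [Block(prev_hash="0" * 64, transactions=["genesis"])]
    for txs in batches:
        chain.append(Block(prev_hash=chain[-1].block_hash(), transactions=list(txs)))
    return chain


def chain_is_valid(chain) -> bool:
    """Every block must record the hash of the block immediately before it."""
    return all(cur.prev_hash == prev.block_hash() for prev, cur in zip(chain, chain[1:]))


chain = build_chain([["A->B 5"], ["B->C 2", "C->A 1"]])
print(chain_is_valid(chain))            # True
chain[1].transactions[0] = "A->B 500"   # tamper with an old transaction
print(chain_is_valid(chain))            # False: block 2's stored link no longer matches
```

A forger would have to recompute every later block's link, which is exactly the work that proof-of-work makes expensive.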
The process of grouping transactions into a block that is added to the end of the current blockchain is called mining. To ensure that all nodes agree on which blockchain is the legitimate one, Bitcoin uses a proof-of-work algorithm when mining blocks, which lets nodes determine which chain has required the greatest cumulative effort to build. (Ethereum is planning to move to an alternative consensus mechanism called proof-of-stake. We'll go into that in our next post.)
The bottom substrate of this architecture is the peer-to-peer network, with different types of nodes playing different roles in the system. To update and maintain the Decentralized Ledger, these nodes exchange various messages.
The next sections will provide details on all these aspects.
Applications
Wallet
A Bitcoin wallet is a software application that creates and stores private/public key pairs that give users control over their unspent bitcoins. It does not store bitcoins; indeed, there is no such thing as stored bitcoins. Bitcoins exist when exchanged via transactions, and a record of every transaction (essentially a record of the change of ownership) since the creation of the Bitcoin system is kept in the decentralized ledger, the Bitcoin blockchain. The wallet software uses these records to create a human-understandable facade that aggregates your unspent bitcoins so that the wallet owner can continue to use familiar expressions such as "I have 30 bitcoins" and "I got 2 bitcoins from Bob".
The preceding paragraph will make much more sense after you read the next section on Bitcoin transactions.
Distributed Ledger
Bitcoin participants and transactions
Bitcoin’s distributed ledger records the exchange of bitcoins between participants in the Bitcoin system. While such participants are ultimately individuals, Bitcoin is entirely indifferent to their identity. Bitcoins are attached to Bitcoin addresses, and bitcoins are moved from one address to another in a transaction.
An address in Bitcoin is derived from a private/public key pair. The private key is any appropriately chosen random number that software (such as that in wallets) can generate. The corresponding public key is computed from the private key by elliptic-curve multiplication on the secp256k1 curve, the curve Bitcoin uses with the well-known Elliptic Curve Digital Signature Algorithm (ECDSA). This allows the keyholder to sign something with the private key and have anyone verify the signature with the public key. A bitcoin address is obtained after further manipulations[1] of this public key.
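Footnote [1] lists the steps that turn a public key into an address. The final step, Base58Check encoding, can be sketched in Python as below. We assume the 20-byte SHA-256-then-RIPEMD-160 digest of the public key has already been computed and use a placeholder of zero bytes instead, because RIPEMD-160 is not available in every Python `hashlib` build. The function names are our own; the decoder is included only to show how the checksum catches typing mistakes.

```python
import hashlib

# Base58 alphabet: omits 0, O, I and l, which are easily confused when typed.
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def base58check_encode(version: int, payload: bytes) -> str:
    """Prefix a version byte, append a 4-byte checksum, and encode in Base58."""
    data = bytes([version]) + payload
    data += double_sha256(data)[:4]          # checksum: first 4 bytes of SHA-256(SHA-256(data))
    num = int.from_bytes(data, "big")
    encoded = ""
    while num > 0:
        num, rem = divmod(num, 58)
        encoded = ALPHABET[rem] + encoded
    # Each leading zero byte is represented by a leading '1'.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded


def base58check_decode(address: str) -> tuple:
    """Reverse the encoding and verify the checksum; return (version, payload)."""
    num = 0
    for ch in address:
        num = num * 58 + ALPHABET.index(ch)  # raises ValueError on characters outside the alphabet
    leading_ones = len(address) - len(address.lstrip("1"))
    body = num.to_bytes((num.bit_length() + 7) // 8, "big")
    data = b"\x00" * leading_ones + body
    data, checksum = data[:-4], data[-4:]
    if double_sha256(data)[:4] != checksum:
        raise ValueError("checksum mismatch: address was mistyped or corrupted")
    return data[0], data[1:]


hash160 = bytes(20)  # placeholder (assumption) for RIPEMD-160(SHA-256(public key))
address = base58check_encode(0x00, hash160)
print(address)       # a mainnet-style address beginning with '1'
```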
The net result is that the bitcoins associated with a particular address are said to be "owned" by the holder of the corresponding private key, which is used to sign any transaction that spends them. Unlike regular currency, each amount received at an address (an unspent transaction output) must be spent in its entirety. Thus, the owner may have to cobble together the outputs held at various addresses she owns to pay for something, and may have to redirect any overpayment (the "change") to another address she creates and owns.
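The all-or-nothing spending rule, with the excess returned as change, can be sketched as follows. This is a simplified illustration with hypothetical data structures: amounts are in satoshis (1 bitcoin = 100,000,000 satoshis), outputs are chosen largest-first, and fees are ignored; real wallets use more sophisticated coin-selection strategies.

```python
from typing import NamedTuple


class Utxo(NamedTuple):
    txid: str      # hypothetical identifier of the transaction that created the output
    index: int     # position of the output within that transaction
    address: str
    amount: int    # in satoshis


def balance(utxos, my_addresses) -> int:
    """A wallet's 'balance' is just the sum of unspent outputs at addresses it owns."""
    return sum(u.amount for u in utxos if u.address in my_addresses)


def select_coins(utxos, my_addresses, target: int):
    """Pick whole outputs (largest first) until they cover `target`; return (inputs, change)."""
    mine = sorted((u for u in utxos if u.address in my_addresses),
                  key=lambda u: u.amount, reverse=True)
    chosen, total = [], 0
    for u in mine:
        if total >= target:
            break
        chosen.append(u)
        total += u.amount
    if total < target:
        raise ValueError("insufficient funds")
    return chosen, total - target   # the excess goes back to an address we own as change


COIN = 100_000_000
wallet = {"addr1", "addr2"}
utxos = [Utxo("t1", 0, "addr1", 3 * COIN), Utxo("t2", 1, "addr2", 2 * COIN),
         Utxo("t3", 0, "someone_else", 7 * COIN)]
inputs, change = select_coins(utxos, wallet, 4 * COIN)
print(balance(utxos, wallet) // COIN, len(inputs), change / COIN)  # 5 2 1.0
```

Paying 4 bitcoins here consumes both the 3- and 2-bitcoin outputs in full, and 1 bitcoin comes back as change.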
Spending bitcoins involves providing proof of ownership of the amount in question, together with a receiving address and any conditions that the receiver must fulfill to spend it in a subsequent transaction. Such conditions typically require proof of possession of the private key corresponding to that address (the address being, as footnote 1 shows, the result of a sequence of hashes of the corresponding public key). This is called "Pay to Public Key Hash" (P2PKH). There can be more complex conditions, called "Pay to Script Hash" (P2SH), where the conditions are provided in a script written in an obscure scripting language with arcane syntax and limited expressiveness.
Thus, a simple transaction where A transfers a total of X bitcoins to two recipients, P bitcoins to B and Q to C, is represented schematically by the following picture.
Any node that creates a transaction broadcasts it to its peers, each of which validates the transaction by checking, among other things such as syntax, that (1) the bitcoins being spent are unspent outputs of earlier transactions; (2) they belong to the holder of the public key associated with the sender's address; and (3) the total amount spent is greater than or equal to the total amount received[2]. Only valid transactions are propagated further, which guards against spam and denial-of-service attacks.
Thus, every bitcoin in existence has a past, and its lineage can be traced through transactions recorded in the blockchain all the way back to its creation. However, unlike in a conventional banking or credit card system, a valid bitcoin transaction is not immediately recorded (or "cleared", to use banking parlance) in the decentralized ledger. Only once it is recorded is the transaction considered confirmed in the eyes of all the participants in the Bitcoin system, with each unspent transaction output (see the right-hand side of the figure above) available for spending.
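The three checks can be sketched against an in-memory set of unspent outputs. This is a deliberately simplified model: ownership is checked by comparing a hypothetical `signer` field with the output's owner rather than by verifying an ECDSA signature and script, and the fee (footnote [2]) is simply the difference between inputs and outputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TxIn:
    prev_txid: str    # transaction that created the output being spent
    prev_index: int   # which output of that transaction


@dataclass
class Tx:
    txid: str
    inputs: list      # list of TxIn
    outputs: list     # list of (address, amount_in_satoshis)
    signer: str       # hypothetical stand-in for a real signature check


def validate(tx: Tx, utxo_set: dict) -> int:
    """Return the fee if `tx` passes the checks, else raise ValueError.

    utxo_set maps (txid, index) -> (owner_address, amount_in_satoshis).
    """
    if len(set(tx.inputs)) != len(tx.inputs):
        raise ValueError("the same output is spent twice within one transaction")
    total_in = 0
    for txin in tx.inputs:
        key = (txin.prev_txid, txin.prev_index)
        if key not in utxo_set:                  # (1) must be an unspent output
            raise ValueError(f"{key} is unknown or already spent")
        owner, amount = utxo_set[key]
        if owner != tx.signer:                   # (2) must belong to the spender (simplified)
            raise ValueError(f"{key} does not belong to {tx.signer}")
        total_in += amount
    total_out = sum(amount for _, amount in tx.outputs)
    if total_in < total_out:                     # (3) inputs must cover outputs
        raise ValueError("outputs exceed inputs")
    return total_in - total_out                  # whatever is left over is the fee


utxo_set = {("t0", 0): ("A", 10_000)}
tx = Tx("t1", [TxIn("t0", 0)], [("B", 6_000), ("C", 3_500)], signer="A")
print(validate(tx, utxo_set))  # 500
```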
The next sections describe how valid transactions are recorded in the ledger.
Block structure
As there is no central point in the Bitcoin network that maintains the definitive record of all transactions, there must be a way for valid transactions to be incorporated into the decentralized ledger (the blockchain) so that the unspent transaction outputs are confirmed by all members of the Bitcoin ecosystem as spendable bitcoins associated with their respective new "owners".
The first step in recording transactions in the ledger is grouping new, valid transactions into blocks. Gathering new transactions and creating a block that can be appended to the blockchain can be attempted independently by any participant in the Bitcoin network that has sufficient computing power. The node that does this is called a miner, and the act of creating a block that meets the criteria for being attached to the end of the blockchain is called mining.
We'll describe how mining works in the next section; here we concentrate on the structure of the block. We showed a simple diagram of a block structure in our previous post. Here we flesh it out to prepare for topics to be elaborated in subsequent sections.
The figure below shows the schematic structure of a Bitcoin block.
We’ll leave the discussion of all the other fields in the Block header above to the next section (as these are used when mining) and concentrate on the Merkle root field here. A Merkle tree of all the transactions is constructed by taking pairwise hashes of the hashes of adjacent transactions and repeating this process with the results obtained in the previous step until we are left with a single hash – the Merkle root. This hash value, which effectively summarizes the entire set of transactions included in the block, is stored in the indicated field. A schematic figure showing the Merkle tree of transactions is shown below.
The Merkle root summarizes all the transactions gathered within this block and provides an efficient way to determine whether a particular transaction is included in a block. Not all nodes (for example, those that only host wallets) wish to keep a record of all transactions comprising the blockchain. However, they can readily download any block header of interest from a full node and request the intermediate hash values along the branch of the Merkle tree leading from a particular transaction to the block's Merkle root, and so verify whether that transaction is included in the block. (This is how wallets sum up the unspent outputs associated with addresses they "own" to create a bitcoin "account" artifact.)
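The Merkle root construction and the inclusion check described above can be sketched together in Python. The sketch follows Bitcoin's convention of double SHA-256 with the last hash duplicated when a level has an odd number of entries, but it treats transaction IDs as raw 32-byte values (ignoring the byte-order reversal Bitcoin uses when displaying hashes), and the transactions are made-up placeholders. The function names are our own.

```python
import hashlib


def dsha256(data: bytes) -> bytes:
    """Bitcoin's usual hash: SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root_and_branch(txids: list, index: int):
    """Full-node side: compute the Merkle root and the sibling hashes on the path
    from leaf `index` up to the root (the 'Merkle branch')."""
    if not txids:
        raise ValueError("a block always contains at least the coinbase transaction")
    level, branch = list(txids), []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])              # odd count: pair the last hash with itself
        branch.append(level[index ^ 1])          # sibling = other member of our pair
        level = [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], branch


def verify_inclusion(txid: bytes, index: int, branch: list, root: bytes) -> bool:
    """Light-client side: needs only the txid, its position, the branch and the
    Merkle root from the block header, never the other transactions."""
    h = txid
    for sibling in branch:
        # An odd position means our hash is the right-hand member of the pair.
        h = dsha256(sibling + h) if index & 1 else dsha256(h + sibling)
        index //= 2
    return h == root


txids = [dsha256(f"tx{i}".encode()) for i in range(6)]   # placeholder transactions
root, branch = merkle_root_and_branch(txids, 4)
print(len(branch), verify_inclusion(txids[4], 4, branch, root))   # 3 True
```

The branch holds only about log2(n) hashes, so for a block with a few thousand transactions a lightweight client needs a few hundred bytes of hashes rather than the whole block.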
Mining blocks and proof-of-work
Before plunging into the details of mining, it is important to understand the rationale. It goes back to the nature of a decentralized ledger in a peer-to-peer network that we described in our previous post, where no single copy can claim to be the absolute master copy. While a majority of nodes may have identical copies, there will always be others that haven’t caught up. In a peer-to-peer network, such nodes are expected to eventually catch up so that the view of the state of the ledger is consistent throughout.
However, if, in addition, the nodes are not known to each other and therefore untrusted, there may well be nodes that have a different version of the ledger for nefarious reasons – manipulated transactions, double spending of the same bitcoins, etc. How, then, can a consistent view of the state of the decentralized ledger – the blockchain – be achieved with untrusted peers?
The Bitcoin system uses a proof-of-work mechanism to achieve this common view amongst untrusted participants. In a proof-of-work system, some non-trivial computing power is expended as a means of establishing the right to do something[3]. Those unwilling to expend the effort are automatically excluded from effecting changes. Among those that do, the one that has expended the greatest proof-of-work is considered the candidate around which consensus builds. Applied to Bitcoin, this means that the blockchain created with the greatest cumulative effort is the one most nodes consider the "latest" and to which new blocks are added. In other words, the longest blockchain in the system (strictly, the one with the greatest cumulative proof-of-work, which is normally also the longest) is accepted as the state of the Bitcoin ledger, because it required the greatest effort to create. We have more on consensus in the following section.
The reader will, hopefully, now better understand the purpose of the additional fields in the block header (shown above) in the context of Bitcoin's proof-of-work. The proof-of-work consists of finding, by brute-force guessing, a value for the "nonce" field in the block header which, when combined with the other fields in the header (the Merkle root of the included transactions, the timestamp, the difficulty target and the fingerprint, i.e. the hash, of the last accepted and valid block), hashes to a value below the target encoded in the difficulty field. In effect, the hash must start with a certain number of zeroes, and the lower the target, the more zeroes are required.
The Bitcoin system is "tuned" so that a block is mined roughly every 10 minutes. Every 2016 blocks (about two weeks), the difficulty is recalculated: if blocks have been mined more slowly than intended, the difficulty is lowered (in effect, fewer leading zeros are sought in the hash output described above), and it is raised if blocks have been mined faster.
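"Longest" is really shorthand for "most cumulative work", and a sketch shows how the two can differ. It scores each block with 2**256 // (target + 1), the expected number of hashes needed to meet a target, which matches the per-block work estimate used by Bitcoin Core; a smaller target means a harder block. The target values and chain names are invented for illustration.

```python
def block_work(target: int) -> int:
    """Expected number of hashes needed to find a hash <= target."""
    return 2**256 // (target + 1)


def chain_work(targets) -> int:
    return sum(block_work(t) for t in targets)


def best_chain(chains: dict) -> str:
    """Pick the chain with the most cumulative work, not merely the most blocks."""
    return max(chains, key=lambda name: chain_work(chains[name]))


easy, hard = 2**240, 2**230           # hypothetical targets; smaller means harder
chains = {
    "longer_but_easy": [easy] * 10,   # 10 blocks, each roughly 2**16 hashes
    "shorter_but_hard": [hard] * 3,   # 3 blocks, each roughly 2**26 hashes
}
print(best_chain(chains))  # shorter_but_hard
```

Because all miners work against the same target at any given time, the chain with the most work is in practice almost always also the longest one.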
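The brute-force search can be sketched as follows. To keep it fast and simple, the "header" is a made-up byte string rather than Bitcoin's 80-byte header layout, the hash is compared as a big-endian number (Bitcoin actually compares it in little-endian order), and the difficulty is given directly as a required number of leading zero bits instead of a full target.

```python
import hashlib


def dsha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def mine(header_fields: bytes, zero_bits: int, max_nonce: int = 2**32):
    """Try nonces until the double-SHA-256 of header+nonce falls below the target."""
    target = 1 << (256 - zero_bits)      # hashes below this start with `zero_bits` zero bits
    for nonce in range(max_nonce):       # Bitcoin's nonce field is 4 bytes wide
        digest = dsha256(header_fields + nonce.to_bytes(4, "little"))
        if int.from_bytes(digest, "big") < target:
            return nonce, digest
    return None                          # exhausted: a real miner would vary other fields


# Hypothetical stand-in for the previous-block hash, Merkle root, timestamp and target.
header = b"prev-hash|merkle-root|timestamp|bits|"
nonce, digest = mine(header, zero_bits=16)
print(nonce, digest.hex())               # the hex digest begins with at least four zeros
```

Each extra zero bit doubles the expected number of attempts, which is how the difficulty target controls the cost of mining a block.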
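The retargeting arithmetic can be sketched as follows. Every 2016 blocks, Bitcoin scales the target by the ratio of the time those blocks actually took to the intended two weeks, and limits each adjustment to a factor of four in either direction. This simplified version works on the full target integer and omits the compact "bits" encoding and the cap on the maximum target.

```python
TARGET_SPACING = 10 * 60                     # seconds per block
INTERVAL = 2016                              # blocks between adjustments
TARGET_TIMESPAN = INTERVAL * TARGET_SPACING  # two weeks in seconds


def retarget(old_target: int, actual_timespan: int) -> int:
    """Scale the target so the next 2016 blocks take about two weeks."""
    # Clamp to [1/4, 4] of the intended timespan to limit how fast difficulty can swing.
    actual = max(TARGET_TIMESPAN // 4, min(actual_timespan, TARGET_TIMESPAN * 4))
    # A larger target is easier to hit: slow blocks raise the target (lower difficulty).
    return old_target * actual // TARGET_TIMESPAN


old = 2**224
print(retarget(old, TARGET_TIMESPAN // 2) == old // 2)   # blocks came twice as fast: True
print(retarget(old, TARGET_TIMESPAN * 10) == old * 4)    # clamped to a 4x easing: True
```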
Of course, not all blocks in the process of being mined get added to the blockchain. This happens if a miner is preempted during his mining computations by notification of another successfully mined block. The disappointed but ever-hopeful miner starts the process again with a new set of transactions.
Consensus
Under Bitcoin's proof-of-work scheme, miners compete with one another to gather valid, new transactions into blocks and then attempt to "solve" the straightforward but computationally expensive hashing "puzzle" related to the block, as described in the previous section. The first miner to broadcast a solution to his puzzle can expect other nodes to validate it and then add the successfully mined block to the end of their blockchain. This version of the blockchain is thus rapidly propagated throughout the peer-to-peer network, and miners then start work on new blocks to append to it. In effect, the consensus protocol is that the current longest blockchain in the system represents the state of the Bitcoin ledger.
The successful miner is rewarded for his effort with some newly minted bitcoins[4]. While this is a reward for that miner, it is also an incentive for all miners to continue to support the system by attempting to mine new blocks to add to the blockchain and possibly be rewarded in future.
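The size of the newly minted reward follows a fixed schedule built into the protocol: it started at 50 bitcoins and halves every 210,000 blocks (roughly every four years), which yields the 12.5 bitcoins mentioned in footnote [4]. A sketch in integer satoshis, mirroring the right shift used in Bitcoin Core; the function name is our own.

```python
COIN = 100_000_000            # satoshis per bitcoin
HALVING_INTERVAL = 210_000    # blocks between halvings


def block_subsidy(height: int) -> int:
    """New satoshis a miner may claim in the coinbase of the block at `height`."""
    halvings = height // HALVING_INTERVAL
    if halvings >= 64:        # mirrors Bitcoin Core's guard; the subsidy is already 0 after 33 halvings
        return 0
    return (50 * COIN) >> halvings


for h in (0, 210_000, 420_000):
    print(h, block_subsidy(h) / COIN)   # 50.0, 25.0, then 12.5 bitcoins
```

Summing this schedule over all heights gives a total just under 21 million bitcoins, which is where Bitcoin's well-known supply cap comes from.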
For a malicious actor, creating an alternative blockchain to compete with the current consensus one would mean expending the same or more effort to build a longer blockchain. So why wouldn't one expend that effort? This is best put in the words of Bitcoin's creator, the pseudonymous Satoshi Nakamoto, in his original 2008 paper: "If a greedy attacker is able to assemble more CPU power than all the honest nodes, he would have to choose between using it to defraud people by stealing back his payments, or using it to generate new coins. He ought to find it more profitable to play by the rules, such rules that favour him with more new coins than everyone else combined, than to undermine the system and the validity of his own wealth." Thus, the balance between mutual interest and the rising cost of subverting the system provides the necessary defense against misbehavior. This proposition has not been refuted in practice so far, although there are arguments both for and against it.
Given the widely dispersed nature of the Bitcoin network, far-flung miners might successfully mine blocks nearly simultaneously, temporarily producing separate competing blockchains. These quickly converge to a single chain, typically within a few newly mined blocks: news of each block propagates through the peer-to-peer network at different speeds, so different miners extend different branches at different times, and sooner or later one branch grows faster than the other. As this happens, more and more nodes treat that branch as the consensus blockchain and miners add new blocks to it, until its status as the blockchain is firmly established. (The roughly ten minutes it takes to mine a block helps ensure that such forks are infrequent and that, when they do occur, they are resolved within a few blocks.)
Transactions in blocks that are orphaned on a rejected branch are effectively returned to the pool of unconfirmed transactions to be mined again, unless they are already included in the winning branch.
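What "returned to the pool" means can be sketched as a set computation. When a node switches branches, transactions in the abandoned blocks that do not also appear in the adopted blocks go back into its pool of unconfirmed transactions (the mempool). Blocks are modeled as plain lists of transaction IDs, and the function name is hypothetical; a real node would also re-validate each returned transaction, since some may now conflict with the winning branch.

```python
def reorganize(old_branch, new_branch, mempool: set) -> set:
    """Return the mempool after abandoning `old_branch` in favour of `new_branch`.

    Each branch is a list of blocks; each block is a list of transaction IDs whose
    first entry is the coinbase, which is simply discarded if its block is orphaned.
    """
    orphaned_txs = {tx for block in old_branch for tx in block[1:]}
    confirmed_txs = {tx for block in new_branch for tx in block[1:]}
    # Orphaned transactions go back to the pool; anything now confirmed leaves it.
    return (mempool | orphaned_txs) - confirmed_txs


mempool = {"tx9"}
orphaned = [["coinbase_A", "tx1", "tx2"]]
winning = [["coinbase_B", "tx2", "tx9"], ["coinbase_C", "tx3"]]
print(sorted(reorganize(orphaned, winning, mempool)))  # ['tx1']
```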
Finally, it has by now become a convention that a transaction which is embedded in a block that is at least 6 blocks deep in the blockchain is considered confirmed and its unspent transaction outputs available for spending. The idea is that the proof-of-work effort expended to create this depth of the blockchain makes the possibility of having an alternative blockchain where this transaction is reversed or changed extremely unlikely.
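How unlikely a reversal becomes can be quantified. Section 11 of Nakamoto's 2008 paper computes the probability that an attacker controlling a fraction q of the total hashing power ever catches up from z blocks behind, and gives a short C routine for it. Below is a Python transcription of that calculation; treat it as illustrating the paper's idealized model (which assumes, for example, constant hash rates), not as a precise security guarantee.

```python
import math


def attacker_success_probability(q: float, z: int) -> float:
    """Probability that an attacker with hash-power share q overtakes an honest chain z blocks ahead."""
    p = 1.0 - q
    if q >= p:
        return 1.0                  # per the paper, a majority attacker eventually catches up
    lam = z * (q / p)               # expected attacker progress while z honest blocks are mined
    total = 1.0
    for k in range(z + 1):
        # Poisson probability that the attacker has mined exactly k blocks so far ...
        poisson = math.exp(-lam) * lam**k / math.factorial(k)
        # ... times the chance that, having done so, it never closes the remaining gap.
        total -= poisson * (1 - (q / p) ** (z - k))
    return total


for z in (0, 1, 2, 6):
    print(z, round(attacker_success_probability(0.1, z), 7))
```

With q = 0.1, the probability falls below 0.1% by five blocks, consistent with the six-block convention.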
Forking
This consensus mechanism based on the longest-chain rule also provides for decentralized governance of the Bitcoin ecosystem. As there is no central authority that governs the working of the system[5], participants can agree (or disagree) on a change to the existing protocol. If the majority of nodes were to accept such a change, new blocks would be created using the new rules and would, over time, form the longer chain. Any blocks created under the previous protocol would lose the race to mine new blocks, and the old chain would be orphaned. However, it would be misleading to think that the power to accept the change lies with miners alone. If other nodes (representing buyers and sellers of bitcoins, whether for purchasing goods or for speculation) refused the blocks mined under the new rules and did not add them to their copy of the blockchain, and there were miners who continued to mine blocks conforming to the old protocol, the original blockchain would remain the consensus choice. In such a situation, miners following the changed version would lack the economic incentive to persist, and the changed version would die a natural death. Thus, consensus among the majority of participants (not merely a majority of one type of participant) on which blockchain is correct provides the means by which differences are settled.
Of course, communities may develop and harden around each type of blockchain, with participants recognizing only their preferred blocks and the chain built from them. This results in what is called a hard fork of the crypto-currency. The imminent possibility of a hard fork for Bitcoin has just been narrowly averted. Some participants in the Bitcoin ecosystem had proposed, for reasons of scalability, increasing the block size from 1 to 2 megabytes. If a majority of miners had accepted this proposal and started mining blocks of this size, and those blocks had then been accepted by a significant population of bitcoin users, there would have been a hard fork of the "currency" around mid-November this year.
Peer to peer network
Note from our discussion thus far that no confidential information is transferred over the Bitcoin network: no account numbers, account holder names, passwords, etc., of the kind common in banking and credit card transactions. All transactions are visible to every node. Thus, the peer-to-peer network needs no additional protection; it is simply an overlay on the internet and can be built on any physical infrastructure.
Any computer can be a Bitcoin node. All that is required of a full node is sufficient storage and computational power, and access to other such nodes over high-bandwidth links. Mining nodes have additional requirements: the computing power needed for mining has grown to the point where ordinary computers are no longer sufficient, so miners have formed collectives (mining pools) that combine their resources and use special-purpose ASICs to perform the necessary proof-of-work. There are also lightweight nodes, called Simplified Payment Verification (SPV) nodes, that do not host the entire blockchain but simply check for particular transactions; such SPV nodes are typically wallets. Finally, there are payment nodes (typically hosting multiple wallets) that act as gateways to external systems, such as credit card or banking networks, through which users buy and sell bitcoins using regular currency.
At startup, a Bitcoin node running the Bitcoin Core software discovers its peers either via DNS seeds (DNS servers that return the IP addresses of well-known Bitcoin nodes) or via addresses it has been explicitly given. From these, it learns about other Bitcoin nodes, to which it also connects. As peers come and go, a node must maintain connections to several peers at any time to keep the peer-to-peer network resilient.
Once connected, a full node downloads the entire blockchain by exchanging the appropriate messages with its peers. To avoid overloading any one peer, it downloads portions from several peers and then reconstructs the entire chain. It also retrieves any blocks it missed while offline. SPV nodes download only block headers, not the transactions.
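Spreading the download across peers might be sketched as a simple scheduler that hands out contiguous ranges of block heights round-robin and then reassembles the received blocks in height order. This is only an illustration of the idea, with hypothetical peer names and chunk size; Bitcoin Core's actual logic (headers-first synchronization with a moving window of blocks in flight) is considerably more involved.

```python
from itertools import cycle


def schedule_downloads(start: int, end: int, peers: list, chunk: int = 16) -> dict:
    """Assign block heights [start, end) to peers in fixed-size chunks, round-robin."""
    plan = {peer: [] for peer in peers}
    peer_cycle = cycle(peers)
    for first in range(start, end, chunk):
        plan[next(peer_cycle)].append(range(first, min(first + chunk, end)))
    return plan


def reassemble(downloaded: dict) -> list:
    """Put blocks received from different peers (keyed by height) back into chain order."""
    return [downloaded[h] for h in sorted(downloaded)]


plan = schedule_downloads(0, 40, ["peerA", "peerB", "peerC"])
for peer, ranges in plan.items():
    print(peer, [(r.start, r.stop) for r in ranges])
# peerA [(0, 16)]
# peerB [(16, 32)]
# peerC [(32, 40)]
```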
Summary
Bitcoin has developed into a self-contained and consistent system for the exchange of bitcoins. Given its continued growth since 2009, it is far more than a proof-of-concept of how a decentralized ledger with consensus amongst untrusted peers can be achieved in practice. It has become the template from which new forms of blockchains are being developed, with ambitions far greater than Bitcoin's more modest goals. We will describe such efforts in future posts, using our architecture diagram to show where and how these differ from Bitcoin.
We have provided a technical overview of the Bitcoin system at a reasonable level of depth. It would take many more pages to delve even deeper into the formal details of Bitcoin and its blockchain. Readers whose appetite for blockchains and bitcoins has been whetted by these posts can indulge themselves in further explorations via the resources at the Bitcoin Wiki and numerous helpful articles on the Web, including the original paper that started all this.
Have a question? Ask Erik
References
[1] This involves hashing the public key with SHA-256 and then hashing the result with RIPEMD-160. A version byte of 0x00 is added as a prefix, and a checksum (the first four bytes of the result of hashing this prefixed value with SHA-256 twice) is added as a suffix. The result is then encoded as a Base58 string. This makes the bitcoin address, while not exactly memorable, more compact and resistant to typing mistakes: the Base58 alphabet omits easily confused characters, and the checksum lets software detect a mistyped address!
[2] It is typical for some portion of the difference (excluding the portion returned to the sender as "change") to be left as a transaction fee for the miner who includes this transaction in a block that is subsequently added to the blockchain. Leaving out a transaction fee may leave the transaction in limbo or delay its inclusion in a block being mined. (We discuss the process of mining a little later.)
[3] One of the earliest applications of proof-of-work was a proposal to discourage spamming by making email senders expend some computational resource for every sent message. The cost of this would be trivial for ordinary users but substantial for spammers, thereby providing the necessary deterrent.
[4] While not shown in the figure of the block structure, the first transaction in a block is a special one called a "coinbase transaction", which has no sender address. In it, the miner claims his potential reward, called a block subsidy: a specific number of new bitcoins set by the overall Bitcoin protocol, currently 12.5 bitcoins, a not inconsiderable sum at current valuations. Assuming the block is successfully incorporated into the blockchain, the miner has to wait until 100 more blocks have been added to the blockchain before being able to spend this reward.
[5] A core set of developers manages the open-source project that maintains and extends the technical aspects of the Bitcoin protocol.