Understanding Blockchains – Part 3: Ethereum, or moving beyond Bitcoin

November 20th, 2017

Readers of our previous posts in this series (part 1, part 2) should by now have a reasonable understanding of what a blockchain is – its rationale, and how it is created and maintained. For continuity, it is important to reiterate that a blockchain is a decentralized ledger (a record of any sort of information) where every node in a peer-to-peer network maintains a copy of that ledger. Even though nodes may go out of synch with their peers from time to time, the ledger is eventually made consistent so that there is a common agreement on its contents.

Blockchains differ from each other in the types of entries made to the ledger and in how such entries are made, validated and confirmed by all the participants in the system to achieve the common and tamper-proof record. For public blockchains, the mechanisms should ensure that these properties hold even when the participants are not always available, are not known to each other and, perhaps most important, may not trust each other.

We explained these points using Bitcoin as an example of a successfully deployed public blockchain that records transactions of that eponymous currency. In part 2 of our series, we also used the following architecture picture to separate out and describe the different components that make the entire Bitcoin system tick.

The Bitcoin blockchain is an example of a ledger that records transactions – the transfer of ownership of some token (e.g., bitcoins) from one participant (represented in Bitcoin by an address) to another. This is sufficient for Bitcoin as its purpose is quite limited – the need to create a record of pseudonymous ownership of the crypto-currency that is decentralized, validated and immutable. However, as can be imagined, many have come to see this as quite a limited use of the very powerful tool it created to effect this, and the infrastructure that has grown to support it.

Several attempts have been made to expand Bitcoin beyond this narrow use. Of these, the most successful approach has been Ethereum which creates a platform built on blockchain technology over which a wide range of applications can be run. Ethereum’s success has been to decouple the crypto-currency aspects of Bitcoin from the underlying technology of blockchains, thereby unleashing the enormous potential of the latter to serve many more use cases. In the remainder of this post, we’ll describe Ethereum and how it provides such a platform. In future posts in this series, we’ll show examples of how this platform is exploited by various industries.

As in part 2, we’ll describe Ethereum using the above architecture diagram as a way to separate the different aspects of the system. Comparisons with Bitcoin are inevitable and will be helpful; thus, we will assume that the reader is familiar with the concepts of blockchains (covered in part 1) as well as technical details of Bitcoin (covered in part 2). We also beg the reader’s indulgence if we postpone discussing some topics to later sections, as it is often difficult to motivate aspects in the upper “layers” of this architecture without the background that can come only after some parts of the infrastructure have been exposed.

A consequence of this is that we shall discuss Applications last, after the reader has gained a better grasp of the platform upon which applications run.

Decentralized Ledger

Participants, Tokens and Transactions in Ethereum

The participants in the Ethereum ecosystem are accounts, of which there are two types: external accounts and contract accounts. Thus, unlike Bitcoin, the principal item of record in the blockchain is the state of an account – in fact, the state of every account in the entire system. (The state of an account is changed by transactions, on which more below, which are also recorded on the blockchain for reasons which will become clear as we proceed.)

Accounts have an address and contain tokens, called ether, the native crypto-currency of this system.  While ether can be used as a crypto-currency like bitcoin for commerce or speculation, its most important use is to “fuel” the internal workings of the system, as we shall explain later. An external account address, like Bitcoin’s, is also derived from the hash of a public key (of a private/public key pair) except that the hash algorithm and subsequent steps to derive it are different.

The figure below shows the different types of accounts and their interactions in Ethereum.

External accounts (EA – at addresses governed by a private key) interact with each other by sending signed transactions to transfer ether, in much the same way as in Bitcoin (see T1 and T4 in the figure above). External accounts can also create new contract accounts (henceforth contracts, for simplicity) by sending a signed transaction (see T3) to an unspecified recipient containing some code which, when activated, represents a sequence of steps that should be executed. A contract account (CA) can be activated by a signed transaction from an external account (see T2), whereupon it executes its code. Activated contracts can “call” other contracts, if needed, via messages (see M1), but cannot call each other without a trigger from an external account (thus, M2 is not allowed).

At any point in time, the Ethereum blockchain maintains the complete state of all the accounts (external and contract) and the transactions invoked since the creation of the system. We’ll explain this in more detail in a later section. As the reader might suspect, the purpose of such an account-based system (as opposed to a purely transaction-based system such as Bitcoin) is to allow more complex interactions, where accounts (representing the end users behind them) can execute and record, via irrevocable computation steps, arrangements that mimic aspects of human constructs such as legal, financial or business agreements.

For example, an external account can create a contract to pay a certain amount to another external account when certain conditions are fulfilled. After that transaction has been confirmed and recorded in the blockchain, the counterparty can look therein to be reassured that the ether has been deposited with the contract and, indeed, that the contract was coded as agreed. Later, when the conditions are met, which may require interactions between the contract and some mutually agreed upon neutral entity (represented by another external account and coded into the contract), the ether held in “escrow” by the contract is distributed as promised.

Another example might be a simple payroll where a contract distributes the ether contained in a transaction to several accounts in some proportion. Yet another might be the crowdfunding example where different accounts send ether to a contract. This contract is written such that the collected amounts are sent to another account (that of the beneficiary) if the total reaches a certain value. If it does not within a certain time, the code ensures that contributed amounts are returned to the funders. Thus, some straightforward financial constructs such as escrows, disbursements, etc., can be carried out by Ethereum in a transparent and automated way directly between the participants without the need to involve entities outside the system. The state of any of these contracts, recorded in the Ethereum blockchain, is always visible to all the participants in the system and cannot be altered afterwards.
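To make the crowdfunding example a little more concrete, here is a minimal sketch of its logic in Python. This is only a toy model and not real contract code: the class name, the use of plain integers for ether amounts and the use of a block number as the deadline are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Crowdfund:
    """Toy model of the crowdfunding contract described above (hypothetical)."""
    beneficiary: str
    goal: int        # target amount, in some unit of ether (assumed)
    deadline: int    # block number after which funding closes (assumed)
    contributions: dict = field(default_factory=dict)
    closed: bool = False

    def contribute(self, sender: str, amount: int, now: int) -> None:
        """Record a contribution sent to the contract by `sender`."""
        if self.closed or now > self.deadline:
            raise ValueError("funding period is over")
        self.contributions[sender] = self.contributions.get(sender, 0) + amount

    def settle(self, now: int) -> dict:
        """Return the payouts {account: amount} the contract would make."""
        if self.closed:
            raise ValueError("already settled")
        total = sum(self.contributions.values())
        if total >= self.goal:
            self.closed = True
            return {self.beneficiary: total}   # goal reached: pay beneficiary
        if now > self.deadline:
            self.closed = True
            return dict(self.contributions)    # goal missed: refund funders
        return {}                              # still collecting


fund = Crowdfund(beneficiary="ben", goal=10, deadline=100)
fund.contribute("alice", 4, now=1)
fund.contribute("bob", 7, now=2)
payouts = fund.settle(now=3)   # the goal of 10 has been reached
```

In a real contract the same rules would be enforced by every node that executes the code, which is what removes the need for a trusted intermediary.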

Contracts in Ethereum are written in certain scripting languages which compile to bytecode that runs on the Ethereum Virtual Machine, the sandboxed execution environment that is present at every Ethereum node. Of these, the Solidity scripting language, which resembles JavaScript, is the most popular.

One final topic to round out this section: the format and contents of a transaction. An Ethereum transaction is shown schematically in the following picture (with certain aspects not essential to understanding at a conceptual level omitted):

All transactions include certain fields. We’ll postpone discussing the Gas Price and Start Gas fields until a later section on Mining. As for the rest, the Nonce is a per-account sequence number, incremented by one each time the sending account issues a transaction, which ensures that each transaction is processed only once. Transactions are directed to another account (To), with a certain amount of ether (Value) and attested by the sending account owner (Signature). When a transaction creates a contract, the recipient’s address (To) is not yet known and left empty. The code to be run is included (Code). Subsequently, the code in the contract might be run with some input values (Data).
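The fields above can be pictured as a simple record. The following Python sketch is only illustrative: the field names mirror the text, the signature is a placeholder string rather than a real cryptographic signature, and the nonce rule shown (a node accepts a transaction only if its nonce equals the number of transactions the sender has already issued) is our assumption of how such a counter would be checked.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Transaction:
    nonce: int             # sequence number of this transaction for the sender
    to: Optional[str]      # recipient address; None when creating a contract
    value: int             # amount of ether transferred
    gas_price: int         # discussed in the section on Mining
    start_gas: int         # discussed in the section on Mining
    data: bytes = b""      # input values for a contract's code
    code: bytes = b""      # contract code, only for contract creation
    signature: str = ""    # placeholder; a real one is derived from a private key

    def is_contract_creation(self) -> bool:
        return self.to is None


def accept_nonce(account_nonces: dict, sender: str, tx: Transaction) -> bool:
    """Accept tx only if it is the sender's next transaction in sequence."""
    expected = account_nonces.get(sender, 0)
    if tx.nonce != expected:
        return False          # a replayed or out-of-order transaction
    account_nonces[sender] = expected + 1
    return True


nonces = {}
payment = Transaction(nonce=0, to="0xB", value=5, gas_price=1, start_gas=21000)
creation = Transaction(nonce=1, to=None, value=0, gas_price=1,
                       start_gas=100000, code=b"\x60\x00")
first_try = accept_nonce(nonces, "0xA", payment)    # accepted
replay = accept_nonce(nonces, "0xA", payment)       # rejected: nonce reused
```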

Blocks in Ethereum

An Ethereum block stores not only the new transactions that have been collected since the last block in the blockchain was mined (just as with Bitcoin), but also the changes in the states of the accounts in the system affected by the included transactions. The block header includes these in a root hash value which summarizes the state changes and is calculated using a more intricate variant of the Merkle tree called a Patricia tree. Such a (non-binary) tree is needed because Ethereum’s state is quite complex, involving account states, contract states, and their stored data states (which are also represented by trees).

The following figure is a schematic representation of a block header with such a root hash value.

Looking at the above figure, note how each block contains the hash of the previous block – the key aspect of creating the blockchain. The state root in block N contains a summary of the state of all the accounts (external and contracts) in the entire Ethereum system. Let’s assume, for simplicity, that by the time the new block N+1 is mined and attached to the blockchain, there has been only one change in the system – an addition of 5 ether to account A. This is shown by the newly computed hashes (in green) that lead up to the new state root. The Patricia tree structure now shows its utility because it allows for the quick calculation of the state tree root after a change without the need to recompute the entire tree.
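The saving can be seen in a small sketch. Note the assumptions: we use an ordinary binary Merkle tree with SHA-256 as a stand-in for Ethereum’s Patricia tree and its hash function, and account states are just short byte strings. The point is only that changing one leaf requires re-hashing the nodes on the path to the root, not the whole tree.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


class MerkleTree:
    """Binary Merkle tree over a power-of-two number of leaves (simplified)."""

    def __init__(self, leaves):
        n = len(leaves)
        if n == 0 or n & (n - 1):
            raise ValueError("need a power-of-two number of leaves")
        # levels[0] holds the leaf hashes, levels[-1] holds the single root
        self.levels = [[h(leaf) for leaf in leaves]]
        while len(self.levels[-1]) > 1:
            prev = self.levels[-1]
            self.levels.append(
                [h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)]
            )

    @property
    def root(self) -> bytes:
        return self.levels[-1][0]

    def update(self, index: int, new_leaf: bytes) -> int:
        """Replace one leaf; return the number of hashes recomputed."""
        self.levels[0][index] = h(new_leaf)
        hashes = 1
        for depth in range(1, len(self.levels)):
            index //= 2
            below = self.levels[depth - 1]
            self.levels[depth][index] = h(below[2 * index] + below[2 * index + 1])
            hashes += 1
        return hashes


accounts = [b"A:10", b"B:3", b"C:7", b"D:0", b"E:1", b"F:9", b"G:2", b"H:4"]
tree = MerkleTree(accounts)
root_n = tree.root
recomputed = tree.update(0, b"A:15")   # account A receives 5 ether
root_n_plus_1 = tree.root
```

With eight accounts, only four hashes are recomputed (the leaf and the three nodes above it); building the tree from scratch would take fifteen.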

The reason for providing the state root in a block is that, as a part of accepting this new block and confirming the transactions within it, all nodes comprising the Ethereum system must arrive at the same view of the state of the entire system. Thus, each node in the system, as a part of validating a mined block before including it in the consensus blockchain, must independently run through the new transactions and computations (if some of these transactions are directed at contracts), update the individual states of all accounts and arrive at the same end result.

The Ethereum block contains more fields, some of the more interesting ones of which are shown schematically in the following figure.

We discussed the State Root earlier. Some of the other fields shown are reasonably obvious: The beneficiary address represents the account of the miner to whom the block reward for mining the block should be sent. The miner uses the difficulty target and nonce to do the proof-of-work needed to successfully mine a block.

The transactions root, just like that in the Bitcoin block, summarizes the set of transactions included in this block. The additional hash value – receipts root – contains a summary of all events that were fired and logged during the processing of a transaction. These logs can be useful when trying to search for the occurrence of particular events (e.g., how many votes have been submitted to a voting contract) without having to comb through a lot of transactions.

There are a few header fields related to gas, as well as the ommers hash, which we’ll discuss in the next section where we motivate and explain the features where these are used.

Mining in Ethereum

The time has now come to explain the concept of gas. Recall that a Bitcoin miner who groups new transactions into a block and provides the “winning” proof-of-work is rewarded with newly minted bitcoins as well as the fees from the transactions included in the block. The other equally industrious miners get nothing, and continue mining in the expectation of a future reward.

Ethereum is a much more equitable system by design. Miners in Ethereum do many more computations and require more storage than merely recording transactions, as they have to run the contracts invoked by any transaction they choose to include in the block being mined so as to ensure the appropriate state changes in the affected accounts. This is particularly important as Ethereum transactions can be arbitrarily complex[1]. Gas is the measure of computation fuel needed to run an Ethereum transaction. Payment for gas (valued in ether) by the initiator of a transaction to a miner is how the Ethereum system is kept running. Using and paying for gas, as we explain below, ensures that computations are reasonably bounded and discourages rogue behavior.

The Ethereum system was started in 2015 with a certain amount of ether. Moreover, some new ether is created through mining as a reward to the successful miner, with a smaller amount going to some other miners whose successfully mined blocks (called ommers, on which more later) are not chosen for inclusion in the consensus blockchain.

Gas

Looking at the earlier figure of a transaction, the start gas and gas price fields can now be better explained. Every computation step in the Ethereum Virtual Machine has a cost in the amount of gas consumed (based on the number and types of computation as well as storage required) and these can be used to calculate the cost for executing the transaction[2]. The gas price is the cost of a unit of gas in ether[3]. The start gas field provides the maximum amount of gas that the transaction initiator is prepared to pay for, effectively limiting the number of computation steps that can be taken. This value is presumably estimated to ensure that the transaction completes. An amount (= start gas × gas price) in ether is removed from the initiator’s account at the start of execution. If the computation steps complete with gas remaining, the value of the residue (gas_rem × gas price) is sent back to the initiator, while the miner is rewarded with [(start gas – gas_rem) × gas price] ether. If gas runs out in the middle of the computation, the steps are reverted but the miner keeps the entire amount set aside at the start of the transaction. This discourages contract initiators from low-balling their computation costs.
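The arithmetic is easy to check with a few lines of Python. This is a sketch of the accounting described above and nothing more; the function name and the example numbers are ours, and amounts are plain integers in some small unit of ether.

```python
def settle_gas(start_gas: int, gas_price: int, gas_needed: int):
    """Return (completed, refund_to_initiator, payment_to_miner)."""
    prepaid = start_gas * gas_price          # removed from the initiator up front
    if gas_needed <= start_gas:
        gas_rem = start_gas - gas_needed     # unused gas
        refund = gas_rem * gas_price
        return True, refund, prepaid - refund
    # Out of gas: the steps are reverted, but the miner keeps the whole prepayment.
    return False, 0, prepaid


# A transaction that needs 21000 units of gas, at a gas price of 2:
enough = settle_gas(start_gas=50000, gas_price=2, gas_needed=21000)
too_little = settle_gas(start_gas=20000, gas_price=2, gas_needed=21000)
```

In the first case the initiator gets back 58000 and the miner earns 42000; in the second the transaction fails and the miner keeps all 40000.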

All this complexity is motivated, apart from fairness in compensating those participants providing the most resources, by the need to protect against denial-of-service computational attacks within contracts, such as infinite loops and other methods of starving transaction execution. The need for gas (especially if the gas price is large) to execute transactions also discourages poorly written contracts, spam contracts, and contracts that have no need for the benefits of decentralized consensus and so don’t really need to be put on the blockchain.

The block header (see the preceding figure) also includes two gas-related fields. The field gas used is the sum of the gas expended by each of the transactions included in the block. This is used during the validation of the block. The gas limit field ensures that the total gas consumption of all the transactions included in a block does not exceed this value. This value is chosen to ensure that a reasonable number of transactions are included in the block and the process of validating it is bounded. (In Bitcoin, blocks are bounded by a block size limit, whereas in Ethereum it is the block gas limit.)
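As an illustration of how the block gas limit bounds a block, here is a hedged sketch of a miner filling a block from its pool of pending transactions. The greedy strategy of taking the highest gas price first is our assumption of a plausible miner policy, not something mandated by Ethereum, and the tuple layout is invented for the example.

```python
def pack_block(pending, gas_limit):
    """Choose transactions for a block without exceeding the block gas limit.

    pending: list of (tx_id, gas_used, gas_price) tuples (hypothetical layout).
    Returns (list of chosen tx_ids, total gas used by the block).
    """
    chosen, gas_used = [], 0
    # Assumed policy: consider the best-paying transactions first.
    for tx_id, gas, price in sorted(pending, key=lambda t: t[2], reverse=True):
        if gas_used + gas <= gas_limit:
            chosen.append(tx_id)
            gas_used += gas
    return chosen, gas_used


pending = [("a", 21000, 5), ("b", 60000, 9), ("c", 30000, 1), ("d", 21000, 7)]
chosen, gas_used = pack_block(pending, gas_limit=110000)
```

Here the block ends up with transactions b, d and a, using 102000 gas; the low-paying transaction c does not fit and has to wait for a later block.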

Ommers

Another fairness algorithm in Ethereum owes its presence to the fact that Ethereum blocks are chosen to be mined at approximately 15-second intervals (compare with Bitcoin’s roughly 10 minutes). This allows transactions to be confirmed faster – remember that it isn’t just the transfer of cryptocurrencies that’s being transacted, but the execution of contracts – with the side-effect that multiple miners are likely to simultaneously find valid blocks for inclusion in the Ethereum blockchain. Such valid mined blocks that don’t make it into the main consensus chain (which we’ll describe in the next section) are called ommers, supposedly the gender-neutral word for uncles and aunts. (Most dispense with this nicety and just use the more common term. Thus, you’ll mainly see the word uncle blocks used in the literature.)

The following figure shows the relationship of the ommer blocks to the blocks in the main chain.

Looking at the right-hand side of the consensus blockchain, an ommer block is one that shares the same parent as the latest block’s parent. Ommer blocks that are within six generations of the latest block in the blockchain are referenced in the ommers hash field of the block header (shown earlier). Miners of ommer blocks are paid a smaller reward (which diminishes with the distance from the head of the blockchain) for the effort of mining and aiding in the overall resilience of the consensus blockchain. (In Bitcoin, by contrast, miners of orphaned blocks are out of luck.)
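A diminishing reward of this kind can be sketched as follows. The six-generation window comes from the text above; the linear schedule, in which an ommer earns (8 − distance)/8 of the full block reward, is an assumption we make for illustration, as are the function name and the example reward of 8000 units.

```python
MAX_OMMER_DISTANCE = 6   # from the text: within six generations of the head


def ommer_reward(block_number: int, ommer_number: int, block_reward: int) -> int:
    """Reward for an ommer referenced by the block at block_number.

    Assumed schedule: (8 - distance) / 8 of the block reward, or nothing
    if the ommer is too old (or not older than the referencing block).
    """
    distance = block_number - ommer_number
    if distance < 1 or distance > MAX_OMMER_DISTANCE:
        return 0
    return block_reward * (8 - distance) // 8


near = ommer_reward(block_number=100, ommer_number=99, block_reward=8000)
far = ommer_reward(block_number=100, ommer_number=94, block_reward=8000)
stale = ommer_reward(block_number=100, ommer_number=93, block_reward=8000)
```

Under these assumptions an ommer one generation back earns 7000 of the 8000 units, one six generations back earns 2000, and anything older earns nothing.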

The payment to orphaned (ommer) blocks that didn’t make it into the consensus blockchain, which independent miners are more likely to experience, ensures that there is an economic incentive for them to participate and hope that they will likely be rewarded for their efforts. This is another way to reduce the power of mining centralization, which we discuss in the next section.

Proof-of-work and Consensus in Ethereum

Ethereum’s proof-of-work is (at this time) similar to that in Bitcoin, using brute force to solve a “puzzle” involving hashing that has a certain difficulty target, as identified in the block header. However, it has many more steps and uses a different algorithm called Ethash. The rationale for a new algorithm is to ensure that Ethereum avoids the pitfall that befell Bitcoin, whose proof-of-work hashing algorithm has become too dependent on custom-built, expensive hardware (ASICs) used by a few mining pools which run large farms of these and thereby effectively control the mining function[4].
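The brute-force nature of such a puzzle can be sketched in a few lines. We stress that this is not Ethash: SHA-256 is used purely as a stand-in, the header is an arbitrary byte string, and the convention that the hash must fall below 2^256 divided by the difficulty is an assumption chosen to keep the example small.

```python
import hashlib


def pow_hash(header: bytes, nonce: int) -> int:
    """Hash of the header plus nonce, as an integer (SHA-256 stand-in)."""
    digest = hashlib.sha256(header + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")


def mine(header: bytes, difficulty: int) -> int:
    """Try nonces one by one until the hash falls below the target."""
    target = 2**256 // difficulty
    nonce = 0
    while pow_hash(header, nonce) >= target:
        nonce += 1
    return nonce


def verify(header: bytes, nonce: int, difficulty: int) -> bool:
    """Checking a claimed solution takes a single hash."""
    return pow_hash(header, nonce) < 2**256 // difficulty


header = b"block N+1 header"
nonce = mine(header, difficulty=1000)   # roughly a thousand attempts on average
ok = verify(header, nonce, difficulty=1000)
```

The asymmetry is the point: finding the nonce takes many attempts, while any other node can check it with one hash.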

Ethereum’s algorithm is chosen to favor CPUs that can move data around faster in memory rather than those that can perform calculations faster. This swings the mining power back to ordinary miners armed with CPUs and GPUs readily found in commercial, off-the-shelf hardware that are optimized for such memory-and-bandwidth intensive tasks. The payment to orphaned blocks (described in the previous section on ommers), which came close to being included in the consensus chain, also incentivizes ordinary miners to participate in the hope of being (at least partially) rewarded for mining.

Ethereum’s consensus mechanism requires the inclusion of ommer blocks in the blockchain to add greater resilience when choosing the consensus blockchain. Bitcoin simply takes the longest chain to represent the consensus blockchain, on the grounds that the most effort has been expended to produce it. Ethereum also counts the effort expended on ommer (orphaned) blocks, whose miners are (partially) rewarded for mining off an earlier consensus chain even though their blocks are not ultimately included in the main chain. Thus, miners of ommer blocks are not tempted to continue to mine further to extend a competing chain based on their mined block. Reducing the competition from alternatives strengthens the main blockchain.

Such a consensus technique also guards against a potential attack possible in Bitcoin where a group of miners selfishly withhold mined blocks, revealing these at some particular moment when this hidden chain is longer than the current consensus chain. This replaces the current chain, with the “selfish” miners gaining all the rewards. Ethereum’s consensus mechanism, formally called Greedy Heaviest-Observed Sub-Tree (GHOST), has a game-theory-based motivation and “proof” of its validity against various attacks, and has withstood the test in practical deployments thus far.

There is, however, one aspect of mining which continues to be a source of concern. It deals with the energy required to mine blocks. Bitcoin’s mining costs are enormous, which can only be afforded by well-financed mining pools. Ethereum’s needs are less but still large, and there are concerns about the inefficiency and ecological aspects of using such massive amounts of electricity for what to many seem like useless and throw-away computations (hashing). In the coming years, Ethereum expects to move to a new “green” system to replace its current proof-of-work, called proof-of-stake.

In simple terms, proof-of-stake requires no energy-intensive computational puzzle solving for mining the next block to be added to the chain. Participants (called validators) who wish to take part in the process of progressing the blockchain send a deposit (their “stake”) in ether to a specific contract. The next block is chosen either in a round-robin fashion amongst these validators or by voting (in proportion to their stake) on what should be the next block. Rewards are distributed amongst the validators according to some formula, while badly behaved validators have their deposits taken away and are banned from further mining. Work on finalizing the details is still ongoing.
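Since the details were not final at the time of writing, the following is only a toy model of the ideas in the paragraph above: deposits, stake-weighted selection, and the loss of deposit for misbehavior. The class and method names are hypothetical, and choosing a proposer at random in proportion to stake is just one possible reading of “in proportion to their stake”.

```python
import random


class StakeRegistry:
    """Toy stand-in for the deposit contract described above (hypothetical)."""

    def __init__(self):
        self.stakes = {}      # validator -> deposited ether
        self.banned = set()   # validators that were caught misbehaving

    def deposit(self, validator: str, amount: int) -> None:
        if validator in self.banned:
            raise PermissionError("banned validators cannot rejoin")
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.stakes[validator] = self.stakes.get(validator, 0) + amount

    def pick_proposer(self, rng: random.Random) -> str:
        """Pick the next block proposer with probability proportional to stake."""
        names = sorted(self.stakes)
        if not names:
            raise RuntimeError("no validators have deposited a stake")
        weights = [self.stakes[name] for name in names]
        return rng.choices(names, weights=weights, k=1)[0]

    def slash(self, validator: str) -> int:
        """Confiscate a misbehaving validator's deposit and ban it."""
        self.banned.add(validator)
        return self.stakes.pop(validator, 0)


registry = StakeRegistry()
registry.deposit("alice", 32)
registry.deposit("bob", 64)      # twice the stake, twice the chance
rng = random.Random(0)           # seeded only to make the example repeatable
proposer = registry.pick_proposer(rng)
confiscated = registry.slash("alice")
```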

Peer-to-Peer Network

A new Ethereum client at startup discovers its peers by connecting to some hard-coded, highly available “bootstrap” nodes. The bootstrap nodes inform the client of other nodes which have connected with them in the past. The client peers with those, dropping the connection to the bootstrap nodes.

Ethereum uses an encrypted messaging protocol on top of TCP/IP called RLPx for communications between nodes, including the exchange of transactions. Thus, state-changing communications are protected against tampering. RLP, Recursive Length Prefix, is also the name of the encoding scheme for the data in the messages.
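To give a flavor of the RLP encoding, here is a small encoder sketch in Python. It follows our understanding of the published RLP rules (a single byte below 0x80 encodes as itself, short strings and lists get a one-byte length prefix, longer ones get a prefix followed by the length), handles only byte strings and nested lists, and leaves out decoding entirely. The function names are ours.

```python
def _length_prefix(length: int, offset: int) -> bytes:
    """Prefix for a payload of the given length (0x80 = string, 0xC0 = list)."""
    if length <= 55:
        return bytes([offset + length])
    length_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(length_bytes)]) + length_bytes


def rlp_encode(item) -> bytes:
    """Encode a byte string or a (possibly nested) list of byte strings."""
    if isinstance(item, bytes):
        if len(item) == 1 and item[0] < 0x80:
            return item                      # a small single byte is its own encoding
        return _length_prefix(len(item), 0x80) + item
    if isinstance(item, list):
        payload = b"".join(rlp_encode(element) for element in item)
        return _length_prefix(len(payload), 0xC0) + payload
    raise TypeError("RLP items are byte strings or lists of RLP items")


encoded_word = rlp_encode(b"dog")
encoded_list = rlp_encode([b"cat", b"dog"])
```

The word “dog” becomes the byte 0x83 followed by its three letters, and the two-element list becomes 0xC8 followed by the encodings of its elements.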

All nodes contain the Ethereum Virtual Machine, while full nodes (which include mining nodes) maintain the entire set of accounts, transactions and state changes since the start of the system. This ensures that these nodes can do the necessary computations and validations needed to all arrive at the same, consistent view of the entire system.

Applications

As Ethereum describes itself as a “blockchain application platform”, its principal purpose is to provide the infrastructure on which applications can run. Unlike Bitcoin, whose blockchain really is a single purpose ledger for recording transactions that exchange bitcoins, Ethereum allows for complex application state to be permanently and immutably recorded on its blockchain. Ethereum calls these distributed applications, or DApps. The figure below shows the Applications layer in Ethereum.

There are fairly obvious applications such as wallets for holding one’s ether. Another application, Mist, is a very popular special-purpose browser that offers a non-programmer-friendly way to browse the Ethereum blockchain and its components, create transactions and interact with contracts. Thus, in a sense, DApps are just a reproduction of the client-server paradigm, albeit with a fancy, new-fangled business logic tier and database – transactions and smart contracts recorded on the Ethereum blockchain – whose users interact with it via some Web-based front end.

DApps are, increasingly, not just contracts created for a specific purpose running on the Ethereum blockchain but those that also require their own token (or “coin”) for using them. Hence, the current popular rage (in more ways than one) about initial coin offerings (ICOs). The idea is that the application developer finds some clever and presumably compelling offering that users wish to trade or engage in. This offering can be something tangible, such as, taking recent examples, allowing others to use one’s unused data storage, or CPU, etc. Users purchase the coins developed for this DApp (with regular currency or exchanging bitcoins or ether) to be able to use the service offered. If there is a demand for this service, such coins increase in value. All DApp developers hope that they will achieve the necessary network effect to make their application, and hence the associated token, the next big thing.

The reason DApp developers take this route is to take advantage of all the built-in features that Ethereum offers – its blockchain, above all, and the way in which it is maintained and grown. If it weren’t for Ethereum, the app developer would have to define a new blockchain, the software to access it, the proof-of-work technique and the consensus mechanism needed to create and grow it. All this comes for free when piggy-backing on Ethereum.

Of course, ICOs have gained a certain amount of notoriety because in many cases the so-called DApp is merely an idea in someone’s head (if that) and the ICO is a means to raise funds towards its eventual development. This offers entrepreneurs an alternative to seeking venture capital via more traditional routes, but unfortunately also allows speculators and various hangers-on a means to exploit gullible investors using the eye-popping market valuations of bitcoins and ether as an enticing bait. The potential for a new ICO token increasing in value is what beckons.

We’ll go into various applications that use Ethereum to create interesting and potentially useful applications in our next post.

Summary

Our aim is to provide the reader with just enough details on Ethereum so that they can follow and make sense of news articles (including our next posts) on blockchain and its applications on their own. There is an enormous amount of material on Ethereum, at GitHub and numerous websites, which interested readers should consult to expand their understanding. Unlike Bitcoin, which appears to be in a maintenance mode at a technical level, Ethereum is an active topic of research and development, with a roadmap and a dedicated team for developing new features and improvements.

In our next posts, we shall explore interesting applications of the Ethereum blockchain and, specifically, those that are (or might be) proposed for use in the medical domain.

 

Notes

[1] The fancy expression for this is that the scripting languages used in Ethereum to write contracts are Turing complete.

[2] There are some fixed, baseline costs for every transaction independent of its complexity.

[3] As there is no central authority in the Ethereum ecosystem, the gas price is set by what the transaction initiator is willing to pay balanced against what the miners will accept to process it. Set too low a gas price compared to others and your transaction will remain unconfirmed for a longer time as miners will pass over it when constructing a block. An Ethereum wallet can keep track of the current gas price in the “market” of transactions and recommends an appropriate value to the user.

[4] Any form of centralization, as the alert reader will have guessed, is anathema to the blockchain purist. However, avoiding centralization is more than an academic ideology as there is a real security threat if a majority (more than 50%) of the total mining power in the network were to be used to subvert the consensus blockchain and create an alternative one that is presumably favorable to the rogue miners.