7 Innovations that Make Solana the First Web-Scale Blockchain


Understand the tech breakthroughs that make Solana’s 50,000 TPS blockchain network possible


Solana was conceived in 2017 when its founder Anatoly Yakovenko sought out a way for a decentralized network of nodes to match the performance of a single node. None of the major blockchains come close to achieving this property. Achieving this is Solana’s north star.

Proof of Work systems like Bitcoin and Ethereum support about 10 transactions per second (TPS). Practical Byzantine Fault Tolerance-based (PBFT) Proof of Stake (PoS) systems like Tendermint support about 1,000 TPS with 100–200 nodes. Solana, a PBFT-like PoS blockchain, supports upwards of 50,000 TPS with over 200 nodes on current testnet iterations, making it the most performant blockchain and the world’s first web-scale decentralized network.

Since its inception, the Solana team, composed of pioneering technologists from Qualcomm, Intel, Netscape, and Google, has focused on building the tech required for Solana to function with these groundbreaking performance standards.

In order to create a decentralized, permissionless network that matches the performance of a single node, the Solana team developed 7 key technologies:

Proof of History (PoH): a clock before consensus;
Tower BFT: a PoH-optimized version of PBFT;
Turbine: a block propagation protocol;
Gulf Stream: a mempool-less transaction forwarding protocol;
Pipeline VM: a parallel smart contracts run-time;
Cloudbreak: a horizontally scaled accounts database; and
Replicators: a distributed ledger store.

In this essay, we’ll briefly explain each of the above. If you’d like to learn more about each, we’ve also written detailed explainers that you can access by clicking the links above.

Proof of History

If a blockchain network as a whole is going to match the performance of a single node, then computation, not bandwidth, must be the bottleneck. To achieve this, we first need to optimize how the nodes in the network communicate.

Wireless cellular networks offer many similarities to blockchain-based networks, and have long been focused on optimizing network communication. At scale, no single radio tower has enough bandwidth to give each cell phone its own radio frequency to transmit on, so telecom companies needed “multiple access technologies” to cram multiple phone calls on the same frequency.

Time Division Multiple Access (TDMA) is one of the major technologies that enabled massive scalability in cellular networks. TDMA specifies that towers divide each radio frequency into time slots, and allocate these time slots to each phone call. In this manner, the cell tower provides a globally available clock for the network. This massively increases scalability of limited bandwidth by letting each frequency support multiple, simultaneous data channels, and decreasing interference from multiple phones broadcasting on the same frequency at the same time.

Today’s blockchain-based networks have a clock problem. Their clocks “tick” whenever a new block is produced. For Ethereum, that happens once every fifteen seconds, and there’s only so much information that can be fit into a single block. The TDMA-equivalent for blockchain-based networks would be a clock with sub-second granularity that all validating nodes agree on, so that they can more efficiently process transactions.

The core Solana innovation is Proof of History (PoH), a globally available, permissionless source of time in the network that works before consensus. PoH is not a consensus protocol or anti-Sybil mechanism. Rather, PoH is a solution to the clock problem.

Whereas other blockchains require validators to talk to one another in order to agree that time has passed, each Solana validator maintains its own clock by encoding the passage of time in a simple SHA-256, sequential-hashing verifiable delay function (VDF). Solana does not use the VDF for randomness; each validator uses it only to keep time. Because each validator maintains its own clock, leader selection is scheduled ahead of time for an entire epoch. As in Tendermint, the schedule for an epoch lasts for thousands of blocks. Unlike Tendermint, however, the network never waits for a failed node. Each validator runs the VDF to prove that it has acquired its slot to transmit a block, and it is compensated for doing so because the block producer receives a reward for producing a block.

Enabled by Proof of History, leaders continue to rotate and the network as a whole makes progress regardless of network conditions. This means that the network never stops. The network can make a decision to rotate validators without any of the validators talking to one another. This is a subtle but profound shift. No other blockchain has a comparable mechanism. In every other chain, validators must communicate in order to make a decision. In Solana, leader rotation decisions are made asynchronously.
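
To make the idea of a hash-based clock concrete, here is a minimal Python sketch of a sequential SHA-256 chain. It is an illustration only, not Solana’s implementation: the function names, the entry layout, and the way an event is mixed into the hash are all assumptions.

```python
import hashlib

def poh_tick(state: bytes, event: bytes = b"") -> bytes:
    """One step of the chain: hash the previous state, optionally mixed with an event."""
    return hashlib.sha256(state + event).digest()

def poh_generate(seed: bytes, num_ticks: int, events: dict) -> list:
    """Run the chain sequentially and record an (index, event, hash) entry per tick.

    `events` maps a tick index to the bytes recorded at that tick (hypothetical layout).
    """
    state = seed
    entries = []
    for i in range(1, num_ticks + 1):
        event = events.get(i, b"")
        state = poh_tick(state, event)
        entries.append((i, event, state))
    return entries

def poh_verify(seed: bytes, entries: list) -> bool:
    """Replay the chain and check that every recorded hash matches."""
    state = seed
    for _, event, expected in entries:
        state = poh_tick(state, event)
        if state != expected:
            return False
    return True

entries = poh_generate(b"genesis", 1000, {500: b"some transaction"})
print(poh_verify(b"genesis", entries))
```

Each hash depends on the one before it, so producing entry 1,000 requires 1,000 sequential hashes, which is what lets the chain stand in for elapsed time. Because every entry records its own output, a verifier can check any slice of the chain starting from the preceding entry’s hash, so separate slices can be checked independently.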

This core innovation opened up the design space going up the stack. In addition to providing a clock that can be used for timestamping, PoH allows Solana to optimize for block time (800 ms), block propagation (log_200(n)), throughput (50K-80K TPS), and ledger storage (petabytes) available on the network.

Tower BFT

On top of Proof of History, Solana runs Tower Consensus, a PBFT-like consensus algorithm specifically designed to take advantage of the synchronized clock. Unlike PBFT, Tower Consensus prefers liveness over consistency. Like PBFT, nodes exponentially increase their timeouts to come to an agreement, but because the ledger is also a trustless source of time, nodes can observe and examine timeouts of all the other validators in the network. To better understand, let’s consider an example:

Imagine that you are on an island, and a bottle floats by with a thumb drive. Inside the drive is a Solana ledger. If you just look at the ledger itself, you will see that any node can compute from it the number of validators present, the state of each validator, and, critically, the timeout each validator has committed to any block in the network. Based on the data structure alone, without any peer-to-peer messages, a validator can make the decision to vote, and the network can come to a consensus.
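
The exponentially increasing timeouts can be pictured with a small Python model. This is a simplified sketch, not the production algorithm: it assumes a stack of votes in which each vote starts with a timeout of 2 slots and the timeout doubles as later votes are stacked on top of it. The class name, the base of 2, and the expiry rule are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Vote:
    slot: int
    confirmations: int = 1  # how many times this vote has been reinforced

    @property
    def lockout(self) -> int:
        """Timeout in slots; doubles with every confirmation."""
        return 2 ** self.confirmations

    @property
    def expires_at(self) -> int:
        return self.slot + self.lockout

def apply_vote(stack: list, slot: int) -> None:
    """Record a vote for `slot`, dropping expired votes and doubling older timeouts."""
    # Votes whose timeout has already passed are removed from the top of the stack.
    while stack and stack[-1].expires_at < slot:
        stack.pop()
    stack.append(Vote(slot))
    # A vote gains a confirmation once enough newer votes sit on top of it.
    for depth_from_top, vote in enumerate(reversed(stack)):
        if depth_from_top + 1 > vote.confirmations:
            vote.confirmations += 1

stack = []
for s in (1, 2, 3, 4):
    apply_vote(stack, s)
print([(v.slot, v.lockout) for v in stack])  # [(1, 16), (2, 8), (3, 4), (4, 2)]
```

Because the votes live in the ledger, anyone replaying the ledger can rebuild this stack for every validator and see how long each one is committed to a given block, without exchanging any messages.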

Turbine

Given that the Solana consensus layer has no dependencies on peer-to-peer messages, Solana is able to optimize how blocks are transmitted through the network independently of consensus. Turbine, Solana’s block-propagation technique, borrows heavily from BitTorrent. As a block is streamed, it is broken up into small packets along with erasure codes, and then fanned out across a large set of random peers. With a fan-out of 200, the second layer of the network can cover 40,000 validators (200 × 200). Thus, validators are able to propagate blocks with a log_200(n) impact to finality. For all practical purposes, if each connection is 100 ms, replication can be achieved in 400 ms, and finality in 500 ms for a 40,000-node network.
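
The fan-out arithmetic is easy to check. The short Python sketch below counts how many layers a tree with a given fan-out needs to reach a given number of validators; the function name is hypothetical, and it counts layers only, without modelling latency or finality.

```python
def layers_needed(num_validators: int, fanout: int = 200) -> int:
    """Smallest number of fan-out layers whose last layer can reach num_validators nodes."""
    layers, reach = 0, 1
    while reach < num_validators:
        reach *= fanout  # each layer multiplies the reachable set by the fan-out
        layers += 1
    return layers

print(layers_needed(200))     # 1
print(layers_needed(40_000))  # 2
print(layers_needed(40_001))  # 3
```

Integer multiplication is used instead of a floating-point logarithm so that exact powers of the fan-out, such as 40,000, are not rounded to the wrong layer.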

The fanout mechanism must be resistant to faults. As such, validators encode data using Reed-Solomon erasure codes, providing a degree of fault tolerance.
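
To show what an erasure code buys, here is a Python sketch that uses a single XOR parity packet. XOR parity is a much simpler code than Reed-Solomon and is swapped in here only to keep the example short: it can rebuild exactly one lost packet per group, whereas Reed-Solomon can tolerate several. All names are hypothetical, and packets are assumed to be of equal length.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(packets: list) -> list:
    """Append one parity packet: the XOR of all data packets."""
    return packets + [reduce(xor_bytes, packets)]

def recover(received: list) -> list:
    """Return the data packets, rebuilding at most one packet marked as None."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can only repair a single lost packet")
    repaired = list(received)
    if missing:
        present = [p for p in received if p is not None]
        # XOR of all surviving packets (data and parity) equals the lost packet.
        repaired[missing[0]] = reduce(xor_bytes, present)
    return repaired[:-1]  # drop the parity packet

data = [b"abcd", b"efgh", b"ijkl"]
coded = encode(data)
coded[1] = None  # simulate a packet dropped in transit
print(recover(coded) == data)  # True
```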

Gulf Stream

In a high performance network, mempool management is a new class of problem that other chains don’t really have to address. Gulf Stream functions by pushing transaction caching and forwarding to the edge of the network. Since every validator knows the order of upcoming leaders in Solana architecture, clients and validators forward transactions to the expected leader ahead of time. This allows validators to execute transactions ahead of time, reduce confirmation times, switch leaders faster, and reduce the memory pressure on validators from the unconfirmed transaction pool.
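
Forwarding to the expected leader amounts to a lookup in a schedule that every node already has. The Python sketch below assumes the schedule is a simple list of validator names indexed by slot and that it wraps around at the end; both are simplifications, and the function name and the lookahead parameter are hypothetical.

```python
def upcoming_leaders(schedule: list, current_slot: int, lookahead: int = 2) -> list:
    """Distinct leaders for the `lookahead` slots that follow `current_slot`."""
    leaders = []
    for slot in range(current_slot + 1, current_slot + 1 + lookahead):
        leader = schedule[slot % len(schedule)]  # wrap-around is a simplification
        if leader not in leaders:
            leaders.append(leader)
    return leaders

schedule = ["A", "B", "C", "D"]
print(upcoming_leaders(schedule, current_slot=1))  # ['C', 'D']
```

A validator that receives a transaction during slot 1 would, in this model, forward it straight to C and D instead of holding it in a local pool.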

Clients, such as wallets, sign transactions that reference a specific block-hash. Clients select a fairly recent block-hash that has been fully confirmed by the network. Blocks are proposed roughly every 800 ms, and require an exponentially increasing timeout to unroll with every additional block. Using our default timeout curve, a fully confirmed block-hash is, in the worst case, 32 blocks old. Assuming block times of 800 ms, that equates to 25.6 seconds.

Once a transaction is forwarded to any validator, the validator forwards it to one of the upcoming leaders. Clients can subscribe to transaction confirmations from validators. Clients know that one of two things will happen in a finite period of time: either the transaction is confirmed by the network, or its block-hash expires. This allows clients to sign transactions that are guaranteed to either execute or fail. Once the network moves past the rollback point such that the block-hash the transaction references has expired, clients have a guarantee that the transaction is now invalid and will never be executed on chain.
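
From the client’s point of view, a transaction is always in one of three states. The Python sketch below models that, using the 800 ms block time and the 32-block figure quoted above. Treating 32 blocks as the expiry window is an assumption made for illustration, and the names are hypothetical.

```python
BLOCK_TIME_S = 0.8       # block time quoted in the text
MAX_BLOCKHASH_AGE = 32   # blocks; used here as the expiry window (assumption)

def tx_status(blockhash_slot: int, current_slot: int, confirmed: bool) -> str:
    """Classify a transaction by the age of the block-hash it references."""
    if confirmed:
        return "confirmed"
    if current_slot - blockhash_slot > MAX_BLOCKHASH_AGE:
        return "expired"   # can never execute; safe to re-sign and retry
    return "pending"

print(MAX_BLOCKHASH_AGE * BLOCK_TIME_S)          # 25.6 seconds
print(tx_status(100, 120, confirmed=False))      # pending
print(tx_status(100, 133, confirmed=False))      # expired
```

The useful property is that "pending" cannot last forever: after a bounded number of blocks the client knows the outcome without polling every validator.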

Pipeline VM

To take advantage of Solana’s high-performance network, we built Pipeline VM, a hyper-parallelized transaction processing engine designed to scale horizontally across GPUs and SSDs. Note that all other blockchains are single-threaded computers. Solana is the only chain to support parallel transaction execution (not just signature verification) in a single shard.

The solution to this problem borrows heavily from an operating system driver technique called scatter-gather. Transactions specify up front what state they will read and write while executing. The runtime is able to find all the non-overlapping state transition functions occurring in a block and execute them in parallel, while optimizing how reads and writes to the state are scheduled across an array of RAID 0 SSDs.

Although Pipeline itself is a VM that schedules transactions, Pipeline doesn’t actually execute transactions in the VM. Instead, Pipeline hands off transactions to be executed on hardware natively using an industry-proven bytecode called the Berkeley Packet Filter (BPF), which is designed for high-performance packet filters. This bytecode has been optimized since the early 90s, and has been deployed in production in millions of switches worldwide to handle 60 million packets per second on a 40-gigabit network in a single switch.
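
Because transactions declare their reads and writes up front, finding the ones that can run side by side is a set-intersection problem. The Python sketch below groups transactions into batches so that no two transactions in a batch touch the same state in a conflicting way. It is a minimal illustration of the idea, not the Pipeline scheduler; the `Tx` type and the greedy batching strategy are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tx:
    name: str
    reads: frozenset
    writes: frozenset

def conflicts(a: Tx, b: Tx) -> bool:
    """Two transactions conflict if either one writes state the other reads or writes."""
    return bool(a.writes & (b.reads | b.writes) or b.writes & a.reads)

def schedule_batches(txs: list) -> list:
    """Greedily place each transaction in the first batch after its last conflict.

    Transactions inside one batch are mutually non-conflicting and could run in
    parallel; batches run one after another, so conflicting transactions keep
    their original order.
    """
    batches = []
    for tx in txs:
        last_conflict = -1
        for i, batch in enumerate(batches):
            if any(conflicts(tx, other) for other in batch):
                last_conflict = i
        target = last_conflict + 1
        if target == len(batches):
            batches.append([])
        batches[target].append(tx)
    return batches

txs = [
    Tx("t1", frozenset(), frozenset({"A"})),
    Tx("t2", frozenset(), frozenset({"B"})),
    Tx("t3", frozenset({"A"}), frozenset({"C"})),
    Tx("t4", frozenset({"B"}), frozenset()),
]
print([[t.name for t in b] for b in schedule_batches(txs)])  # [['t1', 't2'], ['t3', 't4']]
```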

Every time Nvidia doubles the number of SIMD lanes available, our network will double in computational capacity. Virtually all other blockchains, which are single-threaded computers by design, can never scale in this way.

Using LLVM, the same compiler that targets WASM, we provide a great set of tools for developers to write high-performance smart contracts in C/C++ and Rust that execute on GPUs. Although Solana isn’t using WASM, developers can re-compile C and Rust code written for WASM compilers in the Solana compiler with minimal changes. Thus, developers can easily migrate their applications from other major WASM chains like Dfinity, EOS, Polkadot and Ethereum 2.0.

Ethereum has had a history of bugs resulting from the software architecture. Two relevant examples:

Multiple Parity hacks through delegatecall
The DAO reentrancy bug through ‘call’

It’s definitely possible to write safe Solidity code, just like it’s possible to write complex software in C without memory protection. But as long as unsafe behavior is easy to add and hard to detect, it becomes geometrically harder to verify the behavior of complex software. Both Solana and the Libra team recognized this problem early on and developed architectures that maintain a strict separation of state between different modules.

The Move language introduced Resources and Scripts as high-level concepts. Both fit naturally into the Solana Pipeline Runtime and into how we have been designing our native programs. Our goal is to support Move as a first-class language, such that Resources behave as native Solana programs, and can be developed and composed through Move, or through our own native Rust ABI, without any compromises to performance or security.

Cloudbreak — Horizontally Scaled Memory

It is not enough to simply scale computation. Memory that is used to keep track of accounts quickly becomes a bottleneck in both size and access speeds. For example, it’s generally understood that LevelDB, the local database engine that many modern chains use, cannot support more than about 5,000 TPS.

A naive solution is to maintain the global state in RAM. However, it’s not reasonable to expect consumer grade machines to have enough RAM to store the global state. For Solana, we designed Cloudbreak, a state architecture that’s optimized for concurrent reads and writes spread across a RAID 0 configuration of SSDs. Each additional disk adds storage capacity available to on-chain programs, as well as increasing the number of concurrent reads and writes programs can perform when executing.
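
The scaling argument can be illustrated with a toy account store in Python. In this sketch, in-memory dictionaries stand in for SSDs, and accounts are assigned to a "disk" by hashing their key, which is a stand-in for RAID 0 striping and not how Cloudbreak lays out data. The class and method names are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

class ShardedAccounts:
    """Toy account store spread over several 'disks' (plain dicts in this sketch)."""

    def __init__(self, num_disks: int):
        self.disks = [dict() for _ in range(num_disks)]

    def _disk_for(self, key: str) -> int:
        # A stable hash keeps an account on the same disk across runs.
        digest = hashlib.sha256(key.encode()).digest()
        return int.from_bytes(digest[:8], "big") % len(self.disks)

    def store(self, key: str, data: bytes) -> None:
        self.disks[self._disk_for(key)][key] = data

    def load(self, key: str):
        return self.disks[self._disk_for(key)].get(key)

    def load_many(self, keys: list) -> list:
        """Fetch many accounts concurrently, one worker per disk."""
        with ThreadPoolExecutor(max_workers=len(self.disks)) as pool:
            return list(pool.map(self.load, keys))

accounts = ShardedAccounts(num_disks=4)
for i in range(100):
    accounts.store(f"account-{i}", f"balance-{i}".encode())
print(accounts.load_many(["account-3", "account-42"]))
```

Adding a disk in this model adds both capacity and another independent lane for reads and writes, which is the property the paragraph above describes.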

Coupled with our transaction design, this architecture supports AOT (Ahead Of Time) execution of transactions. As soon as a transaction is observed by a validator, the Pipeline VM can start pre-fetching all the accounts from disk and preparing the runtime for execution. Validators and block producers can even start executing transactions before they are encoded into a block, which allows us to further optimize block time and confirmation latencies.

Replicators

At 1 Gbps, a blockchain network will generate 4 petabytes of data a year for the ledger. Storing the data would quickly become the primary centralization vector, defeating the very purpose of a blockchain in the process.

On Solana, data storage is offloaded from validators to a network of nodes called Replicators. Replicators do not participate in consensus. The history of the state is broken into many pieces and erasure coded. Replicators store small parts of the state. Every so often, the network will ask the Replicators to prove that they’re storing the data they are supposed to. Solana leverages Proofs of Replication (PoRep), which borrow heavily from Filecoin.

We are able to use Proof of History, our clock before consensus, to optimize how PoReps are created. Replicator nodes use PoH to generate lightweight proofs that pieces of the ledger have been replicated, and validators are able to verify them in very large batches across GPUs.

Replicators can be lightweight nodes (e.g. laptops). With erasure codes and redundancy, a network of Replicators can offer data availability guarantees exceeding anything AWS or GCE could ever hope to provide.
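
The 4-petabyte figure quoted at the start of this section follows from simple arithmetic, sketched here in Python. It assumes the link is saturated all year, a 365-day year, and decimal petabytes (10^15 bytes); the function name is hypothetical.

```python
def ledger_bytes_per_year(bits_per_second: float = 1e9) -> float:
    """Bytes written in one year if the ledger grows at the full line rate."""
    seconds_per_year = 365 * 24 * 60 * 60   # 31,536,000
    return bits_per_second / 8 * seconds_per_year

petabytes = ledger_bytes_per_year() / 1e15
print(round(petabytes, 2))  # 3.94
```

At 1 Gbps that is 125 MB per second, or roughly 3.94 PB per year, which rounds to the 4 petabytes cited.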
