<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[Preston Evans' Blog]]></title><description><![CDATA[Thoughts on crypto, math, and programming.]]></description><link>http://www.prestonevans.me/</link><image><url>http://www.prestonevans.me/favicon.png</url><title>Preston Evans' Blog</title><link>http://www.prestonevans.me/</link></image><generator>Ghost 2.9</generator><lastBuildDate>Tue, 09 Jan 2024 16:48:08 GMT</lastBuildDate><atom:link href="http://www.prestonevans.me/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[Nearly Optimal State Merklization]]></title><description><![CDATA[There's a meme going around Twitter that Solana is fast because it doesn't merklize its state. I think I'm having a Eureka moment in performant blockchain design. Three main points: 1. Do not Merklize chain state. This removes a huuuuge overhead in both execution and disk usage! 2. Use an SQL DB instead of a raw KV store. Instead of the chain implementing a query server,… — Larry Engineer (@larry0x) December 27, 2023 ... And to some extent, this is true. In the Sovereign SDK, for example, ]]></description><link>http://www.prestonevans.me/nearly-optimal-state-merklization/</link><guid isPermaLink="false">Ghost__Post__659737d3fded6332b69a74b2</guid><category><![CDATA[Crypto]]></category><dc:creator><![CDATA[Preston Evans]]></dc:creator><pubDate>Thu, 04 Jan 2024 23:15:28 GMT</pubDate><media:content url="https://prestonevans.me/content/images/2024/01/merkle-2.png" medium="image"/><content:encoded><![CDATA[<img src="https://prestonevans.me/content/images/2024/01/merkle-2.png" alt="Nearly Optimal State Merklization"/><p>There's a meme going around Twitter that Solana is fast because it doesn't merklize its state.</p><figure class="kg-card kg-embed-card"><div><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I think I'm having a Eureka moment in performant blockchain design. Three main points:<br><br>1. Do not Merklize chain state. This removes a huuuuge overhead in both execution and disk usage!<br><br>2. Use an SQL DB instead of a raw KV store. Instead of the chain implementing a query server,…</br></br></br></br></p>— Larry Engineer (@larry0x) <a href="https://twitter.com/larry0x/status/1739853256455582170?ref_src=twsrc%5Etfw&ref=127.0.0.1">December 27, 2023</a></blockquote> <script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"/></div></figure><p>... And to some extent, this is true. In the <a href="https://github.com/Sovereign-Labs/sovereign-sdk?ref=127.0.0.1">Sovereign SDK</a>, for example, we find that our (mostly unoptimized) merkle tree update currently takes at least 50% of our total runtime. Although exact performance numbers vary, updating the state trie is also a significant bottleneck for Ethereum and Cosmos node implementations.</p><p>The commonly cited reason for this inefficiency is that maintaining a merkle tree incurs lots of random disk accesses. And indeed, if you look at existing merkle tree implementations, you'll find this to be exactly the case. But I think this state of affairs is more an artifact of history than a fundamental limitation of merklization. In this post, I'll outline a more efficient strategy for merklizing state data, which I hope will be useful to the broader community.</p><blockquote>N.B. 
Although this post makes some original contributions, it also combines many existing optimizations into a single scheme. Where ideas are not original, I credit the original author in a block like this one.</blockquote><h2 id="background-why-is-merklization-expensive">Background: Why is Merklization Expensive?</h2><h3 id="a-very-brief-introduction-to-addressable-merkle-trees">A Very Brief Introduction To Addressable Merkle Trees</h3><blockquote>N.B. If you're totally unfamiliar with merkle trees and want a better introduction, I recommend the <a href="https://developers.diem.com/papers/jellyfish-merkle-tree/2021-01-14.pdf?ref=127.0.0.1">Jellyfish Merkle Tree paper</a></blockquote><p>Addressable merkle trees usually work by grouping intermediate hashes into sets of 16 (called a <code>node</code>) and storing them in a key-value store like LMDB or RocksDB, where the key is the node hash and the value is the node itself.</p><p>To traverse the trie, implementations do something like the following:</p><pre><code class="language-rust">#[derive(Clone, Copy)]
struct ChildInfo {
    is_leaf: bool,
    hash: H256,
    // ... some metadata omitted here
}

struct Node {
    children: [Option<ChildInfo>; 16],
}

fn get(db: TrieDb, path: &[u8]) -> Option<Leaf> {
    // Each element of `path` is a nibble (a 4-bit value) selecting one of 16 children.
    let mut current_node: Node = db.get_root_node();
    for nibble in path {
        let child: ChildInfo = current_node.children[*nibble as usize]?;
        if child.is_leaf {
            // Some checks omitted here
            return db.get_leaf(child.hash);
        }
        current_node = db.get_node(child.hash);
    }
    None
}
</code></pre><p>As you can clearly see, this algorithm requires one random disk access for each layer of the tree that needs to be traversed. A basic property of addressable merkle tries is that their <em>expected</em> <em>depth</em> is logarithmic in the number of items stored, and the base of the logarithm is the width of the intermediate nodes. So if your blockchain has five billion objects in state and uses intermediate nodes with a width of 16, each state update will need to traverse to an expected depth of <code>log_16(5,000,000,000) ~= 8</code>. Putting all of this together, we can see that traversing a naive merkle trie will require roughly 8 <em>sequential</em> database queries per entry.</p><p>Compare that to the single query required to look up a value in a "flat" (non-merklized) store like the one used in Solana, and you'll begin to see why merklization has such a big performance impact.</p><h3 id="but-wait-theres-more-overhead">But Wait, there's More (Overhead)!</h3><p>Unfortunately, this isn't the end of the story. In the previous discussion, we treated the underlying key-value store as if it could store and fetch values with no overhead. In the real world, though, lookups can be expensive. Almost all key-value stores use a tree structure of their own internally, which causes "read amplification" - multiplying the number of disk accesses for a random read by another factor of <code>log(n)</code>.</p><p>So, for those of you keeping track at home, each read against a merkle tree incurs an overhead of <code>log(n) ^ 2</code>, where <code>n</code> is the number of items in global state. On Ethereum mainnet today, this <a href="https://ethereum.karalabe.com/talks/2022-ethprague.html?ref=127.0.0.1#6">log-squared factor is roughly 50</a> - meaning that each state update takes roughly 50 sequential disk accesses. When you observe that a <a href="https://etherscan.io/tx/0x1e84e38a7bbb3eede7f65769279d71759e74c3b9120d693652825c651a65270f?ref=127.0.0.1#statechange">typical Swap</a> on Uniswap might write to 7 storage slots across 4 different contract addresses and update the balances of 3 additional accounts, you can see how this becomes an issue.</p>
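<p>To make that arithmetic concrete, here's a rough back-of-the-envelope sketch. The numbers are purely illustrative - the real constants depend on the client, the database, and how warm the caches are.</p><pre><code class="language-rust">/// Very rough estimate of the *sequential* disk accesses needed for one state access
/// against a hexary merkle trie stored in a tree-structured key-value store.
/// Illustrative only: real databases cache aggressively, so actual numbers vary.
fn sequential_accesses(items_in_state: f64, trie_arity: f64, db_fanout: f64) -> f64 {
    // Expected depth of the merkle trie: log_arity(n)
    let trie_depth = items_in_state.ln() / trie_arity.ln();
    // Read amplification inside the key-value store: roughly log_fanout(n)
    let db_read_amplification = items_in_state.ln() / db_fanout.ln();
    trie_depth * db_read_amplification
}

fn main() {
    // ~1 billion state items, a 16-ary trie, and a store with fanout ~16:
    // prints roughly 56 - the same order of magnitude as the ~50 cited above.
    println!("{:.0}", sequential_accesses(1_000_000_000.0, 16.0, 16.0));
}
</code></pre>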
<p><a href="https://ethereum.karalabe.com/talks/2022-ethprague.html?ref=127.0.0.1#8">This is the reason</a> Ethereum doesn't just "raise the gas limit". Even though all the major full node implementations could easily handle an order-of-magnitude increase in throughput <em>today</em>, the resulting state growth would eventually degrade node performance to an unacceptable degree.</p><h2 id="part-2-lets-make-it-better">Part 2: Let's make it better</h2><p>But hang on... aren't merkle trees a theoretically optimal data structure? They <em>should</em> be fast! Can we make them live up to the hype?</p><p>I think so, yes. Here's how.</p><h3 id="step-1-make-the-trie-binary">Step 1: Make the Trie <em>Binary</em></h3><p>The first thing to do is to decouple your data <em>storage</em> format from the logical structure of your merkle trie. While it's a good idea to store nodes in groups of 16 (or more!) on disk and in-memory, using a high-arity merkle tree wastes computation and bandwidth. The solution is simple. Store your nodes in power-of-two-minus-two sized groups, where each cluster contains a rootless binary subtree of some constant depth. In plain English, store your nodes as arrays of size (say) 6, where the array represents a binary tree with depth (say) 2 - with the additional wrinkle that the root of each tree is stored at the layer above. (See diagram).</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://sovlabs.notion.site/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2Fc11ef805-ecd7-483f-ae57-20337206518e%2F1219018b-710c-4367-a62c-a0bc902e9007%2FUntitled.png?table=block&id=a10fb45d-a08b-439a-82d9-a4c2fee04c06&spaceId=c11ef805-ecd7-483f-ae57-20337206518e&width=2000&userId=&cache=v2" class="kg-image" alt="Nearly Optimal State Merklization" loading="lazy"><figcaption><span style="white-space: pre-wrap;">A diagram showing the physical and logical layout of an example merkle tree during an insertion at q. Modified items are shown in green, while siblings which are needed to compute the new root are shown in blue.</span></figcaption></img></figure><p>The reason we store data this way is to ensure that all of the <em>siblings</em> needed in order to make a proof for a particular leaf are stored in the same nodes as the hashes that will be modified. If we also included the root of the binary trie in the node, then we would have to do an extra disk access each time we wanted to do an update that touched the root hash of a node (since the <em>sibling</em> of the root is stored in a different node).</p><p>Using this binary layout, we can eliminate a significant amount of overhead. For example, when we want to prove an insertion at <code>q</code>, we only need to perform four hash compressions: <code>H(a || b)</code>, <code>H(c || d)</code>, <code>H(g || h)</code>, and <code>H(q || r)</code>. This is significantly better than the number of compressions we would need to ingest a branch node in a 16-ary trie (roughly 9). Similarly, we only need to send four siblings in the merkle proof instead of the (minimum) 15 required by a 16-ary trie.</p>
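<p>Concretely, a physical node can be as simple as a small array plus a bit of index math. Here's a minimal sketch - the layout and names are mine, purely for illustration, and a real implementation tracks more metadata than this:</p><pre><code class="language-rust">type H256 = [u8; 32];

/// One *physical* node: a rootless binary subtree of depth 2, i.e. the six hashes
/// directly below a root which lives in the parent node. `None` marks an empty subtree.
struct PhysicalNode {
    /// Slots 0-1 hold the top row of the subtree; slots 2-5 hold the bottom row.
    slots: [Option<H256>; 6],
}

impl PhysicalNode {
    /// Index math for the rootless layout: the children of the hash in `slot`
    /// (valid for the top row; the children of the bottom row live one physical
    /// node further down).
    fn children_of(slot: usize) -> (usize, usize) {
        (2 * slot + 2, 2 * slot + 3) // slot 0 -> (2, 3), slot 1 -> (4, 5)
    }
}

fn main() {
    assert_eq!(PhysicalNode::children_of(0), (2, 3));
    assert_eq!(PhysicalNode::children_of(1), (4, 5));
}
</code></pre>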
<blockquote>N.B. The idea of separating disk-format from tree arity is not new to Sovereign. It is already implemented in the Jellyfish Merkle Tree pioneered by Diem, and <a href="https://ethresear.ch/t/optimizing-sparse-merkle-trees/3751?ref=127.0.0.1">was suggested by Vitalik</a> at least as early as 2018.</blockquote><h3 id="step-2-improve-the-disk-layout">Step 2: Improve the Disk Layout</h3><p>Now that we've decoupled our on-disk layout from the underlying logical structure of the merkle tree, let's design the best possible layout. </p><blockquote>N.B. Both Monad and Avalanche are doing related work at the same time this blog post is being written. Monad's work is still closed source, so it's not possible to assess how much overlap there is between our construction and theirs. Avalanche has open-sourced their FirewoodDB, but has not released much documentation, so I haven't yet had time to assess it in detail.</blockquote><p>First, we're going to increase the size of our on-disk nodes to be the size of a page on a standard SSD. SSDs can’t read or write data in units smaller than a page - so storing smaller chunks than that just wastes IOPS. While there's some variance in the page size across popular SSDs, 4096 bytes seems to be a popular one. Let's start there.</p><p>Using nodes of size 4096, we can fit a rootless binary trie of depth 6 into a single node and still have 64 bytes left over for metadata. Great.</p><p>Now, how can we use our disk most effectively? Well, SSD accesses have pretty high latency, but modern drives have lots of internal parallelism. So, we want to make sure that our lookups can be pipelined. In other words, the on-disk storage location where we keep a node should <em>not</em> depend on the specific contents of the merkle tree! That way, we can fetch all of the data at once, instead of reading one node at a time.</p><p>How can we pull that off? Simple - we give each node in our tree a unique key. Since we're using nodes of size <code>2^6</code> to store a binary tree of (logical) size <code>2^256</code>, we have <code>2^250</code> distinct logical nodes that might be stored on disk. We'll give the root node the ID <code>0</code>, its leftmost child the ID <code>1</code>, and so on. To convert between <em>the path to a node</em> and that node's ID, we can use a relatively simple formula. The <code>ith</code> child of the node with ID <code>k</code> has ID <code>k * 2^6 + i</code> (counting children from <code>1</code>).</p><p>Cool, now each node in our tree has a unique key. Using those keys, we can build a simple hashmap backed by our SSD. First, we pick some fixed number of blocks on the SSD to use for storing merkle tree nodes. Then, we assign each of those blocks a logical index. To compute the expected location of a particular node on disk, we simply use the formula <code>index = hash(node_id) % num_blocks</code>. In other words, it works exactly like a standard hashmap, just on disk instead of in memory.</p><p>The beauty of this construction is that it lets us look up all of the data on the path to a given node concurrently. Say, I want to update the merkle leaf at path <code>0x0f73f0ff947861b4af10ff94e2bdecc060a185915e6839ba3ebb86b6b2644d2f</code>. I can compute exactly which pages of the SSD are likely to contain relevant merkle nodes <em>up front,</em> and then have the SSD fetch all of those pages in parallel. In particular, I know that my tree is unlikely to store more than a few trillion items, so in general I shouldn't find any non-empty nodes beyond the first ~40 layers of the tree. So, I can just fetch the first <code>40/6 ~= 7</code> nodes along the path - well within the capability of the SSD to handle in parallel.</p>
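<p>Here's a minimal sketch of that numbering and placement scheme. Everything below is illustrative: the 6-bit chunking matches the node size we picked above, the <code>+ 1</code> keeps the root at ID <code>0</code> and its leftmost child at ID <code>1</code>, and the hash function is a placeholder rather than anything you'd use in production.</p><pre><code class="language-rust">use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Each physical node covers 6 binary levels, so it has 2^6 = 64 children.
const CHILDREN_PER_NODE: u128 = 64;

/// IDs of the physical nodes along the path to `key`, starting from the root (ID 0),
/// using the rule "child i of node k has ID k * 64 + i + 1" (children counted from 0 here).
fn node_ids_on_path(key: &[u8; 32], levels: usize) -> Vec<u128> {
    let mut ids = vec![0u128];
    let mut id: u128 = 0;
    for level in 0..levels {
        // Take the next 6 bits of the key as the child index at this level.
        let mut child: u128 = 0;
        for b in 0..6 {
            let bit = level * 6 + b;
            child = (child << 1) | ((key[bit / 8] >> (7 - (bit % 8))) & 1) as u128;
        }
        id = id * CHILDREN_PER_NODE + child + 1;
        ids.push(id);
    }
    ids
}

/// Data-independent placement: hash the node ID and take it modulo the number of SSD
/// pages reserved for the tree - exactly like an in-memory hash map, but on disk.
fn page_index(node_id: u128, num_pages: u64) -> u64 {
    let mut h = DefaultHasher::new(); // placeholder hash, not cryptographic
    node_id.hash(&mut h);
    h.finish() % num_pages
}

fn main() {
    let key = [0x0f; 32];
    // The root plus six more levels: seven pages, all of which can be handed
    // to the SSD in a single batch.
    for id in node_ids_on_path(&key, 6) {
        println!("node {id} -> page {}", page_index(id, 1 << 20));
    }
}
</code></pre>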
<p>But wait... hash maps can have collisions. How can we handle those? In the current setup, we can't. We can't even <em>detect</em> them. Thankfully, there's a simple fix that's been used in hash maps since time immemorial. If you're worried about collisions, just store the <em>key</em> inside the hash bucket alongside the value. By a lucky coincidence, our keys are just node ids - which themselves are just 250-bit integers. And as we mentioned before, each of our on-disk nodes happens to have 64 bytes of extra space. So, we can just follow the standard practice of storing the key along with its value and call it a day. With that setup in place, we can now detect collisions - so all that's left to do is figure out how to handle them. Since we still have about 32 bytes of extra space for metadata (which we can use to record the fact that a collision has occurred and link to the colliding node), this should be straightforward. There are a plethora of strategies in the literature (linear probing, quadratic probing, double-hashing, etc.), and a full discussion is out of scope for this article.</p><p>Whew, that was a lot. Let's take a moment and survey what we've built so far.</p><ul><li>We have a binary merkle-patricia tree, which gives us optimal hashing load and bandwidth consumption. This gives us performance which should be roughly equivalent to the best known constructions in terms of CPU, bandwidth, and (if we're using zk) proving cost.</li><li>We have an on-disk structure which is mostly data independent. This means that <em>in most cases</em> we can fetch all nodes on the path to a particular leaf node with a single roundtrip to our SSD - taking full advantage of the drive's internal parallelism. Compare this to the naive merkle-tree construction I outlined in the first section, which needs 7 or 8 sequential <em>database lookups,</em> and about 50 sequential disk accesses due to the nested tree structure.</li></ul><h2 id="step-3-compress-your-metadata">Step 3: Compress your Metadata</h2><p>At this stage, astute readers will have noticed a problem. Up to this point, we've made two contradictory assumptions:</p><p>Assumption 1: Every item we want to store is only 32 bytes long</p><p>Assumption 2: Operations are logarithmic in the number of items actually <em>stored</em> in the tree, rather than in the logical size of the tree.</p><p>These two assumptions can't co-exist naturally. For Assumption 2 to hold, we need to use an optimized leaf-node construction which allows us to elide empty branches of the tree (see the JMT paper for details). To make that work, we need to give leaf nodes a completely different structure from internal nodes. So we need a <em>domain separator</em> to allow us to distinguish between internal and leaf nodes. (N.B. Domain separators are a good idea for security reasons as well. Type confusion is bad!) But, we've already stipulated that each of the items in the tree is a 32-byte hash. If we increase the size to 33 bytes, our on-disk nodes will get too large to fit on a single SSD page, killing our performance.</p><p>Thankfully, there's a simple solution here too. Since all we need is a single bit of discriminant, we just borrow one bit from our hash. In other words, we mandate that all internal nodes must have the leading bit set to <code>1</code>, and all leaf nodes must have their leading bit set to <code>0</code>. (That leaves us with 255 bits of hash output, which still leaves plenty of security margin.)</p>
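<p>In code, the discriminant is just a matter of forcing the top bit of every 32-byte value. A minimal sketch:</p><pre><code class="language-rust">type H256 = [u8; 32];

/// Tag a freshly computed internal-node hash by forcing its leading bit to 1.
fn tag_internal(mut hash: H256) -> H256 {
    hash[0] |= 0b1000_0000;
    hash
}

/// Tag a leaf hash by forcing its leading bit to 0.
fn tag_leaf(mut hash: H256) -> H256 {
    hash[0] &= 0b0111_1111;
    hash
}

/// While traversing, a single bit tells us which kind of node we're looking at.
fn is_internal(hash: &H256) -> bool {
    (hash[0] & 0b1000_0000) != 0
}

fn main() {
    let h = [0u8; 32];
    assert!(is_internal(&tag_internal(h)));
    assert!(!is_internal(&tag_leaf(h)));
}
</code></pre>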
<h2 id="step-4-halve-your-compressions">Step 4: Halve your Compressions</h2><p>Thanks to our trick from the previous section, we can now guarantee the invariant that every internal node is the hash of exactly two children, and that each child has a width of exactly 32 bytes. That means that at each layer of the tree, we need our hasher to ingest <em>exactly</em> 64 bytes of data. By a shocking coincidence, 64 bytes happens to be exactly the amount of data that can be ingested by many hash "compression functions" in a single invocation. (Most notably, SHA-256).</p><p>Unfortunately, such fixed-length compression functions are insecure on their own due to length extension attacks - so they aren't exposed by common cryptographic libraries. If you invoke the <code>sha256()</code> function in your favorite programming language, the library will be secretly padding your input with 64 bytes of metadata to prevent exactly this issue. Such padding is mandated by the NIST standard, and is absolutely necessary to ensure collision resistance in the presence of arbitrary-length messages.</p><p>But in our case, the messages don't have arbitrary length. <em>Internal</em> nodes are always constructed from 64-byte inputs. And thanks to our domain separators, we can easily distinguish internal nodes (which have fixed-size inputs) from leaf nodes (which do not). By exposing the internals of our favorite hashing library, we can reduce the number of hashes required for tree traversal by a factor of 2 by using a padding-free hasher for internal nodes.</p><blockquote>(Note: Messing with hash functions should only be done with good reason, and always under the supervision of a qualified cryptographer! This optimization is most useful in contexts where you're proving statements about the tree in zero-knowledge. If you're only using the tree in "native" contexts, you may want to skip this one!)</blockquote>
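<p>To make that concrete, here's roughly what the internal-node hashing rule looks like when it's written against a raw 64-byte compression function. The <code>Compress64</code> trait below is a stand-in of my own - in Rust, for instance, you could back it with the raw <code>compress256</code> function that recent versions of the <code>sha2</code> crate expose - and, per the note above, this is a sketch to be reviewed by an actual cryptographer, not a recipe.</p><pre><code class="language-rust">type H256 = [u8; 32];

/// Stand-in for a raw 64-byte-to-32-byte compression function (for example, the
/// SHA-256 compression function applied to a single block with a fixed IV and no padding).
trait Compress64 {
    fn compress(left: &H256, right: &H256) -> H256;
}

/// Hash an internal node: exactly one compression per level, since the input is always
/// exactly two 32-byte children. The domain-separator bit from Step 3 is forced afterwards.
fn hash_internal<C: Compress64>(left: &H256, right: &H256) -> H256 {
    let mut out = C::compress(left, right);
    out[0] |= 0b1000_0000; // internal nodes carry a leading 1 bit
    out
}

/// Recompute the root from a leaf hash and its siblings, bottom-up. With the padding-free
/// compression this costs `siblings.len()` compressions; a standard padded hash of each
/// 64-byte input would cost roughly twice as many.
fn root_from_path<C: Compress64>(leaf_hash: H256, siblings: &[(H256, bool)]) -> H256 {
    let mut acc = leaf_hash;
    for (sibling, acc_is_right_child) in siblings {
        acc = if *acc_is_right_child {
            hash_internal::<C>(sibling, &acc)
        } else {
            hash_internal::<C>(&acc, sibling)
        };
    }
    acc
}
</code></pre>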
<h2 id="step-5-blake-is-cool-be-like-blake">Step 5: Blake is cool. Be like Blake.</h2><p>So far, we’ve focused exclusively on internal nodes of the tree. Now, let’s turn our attention to <em>leaf nodes.</em></p><p>The first thing to do is to recognize that, despite all of our hard work, state trees are <em>expensive</em>. While we've drastically reduced overheads, the cost for each access is still logarithmic in the number of values stored. "Normal" hashing, on the other hand, is relatively cheap. You can do a standard (linear) hash of about a kilobyte of data for the same compute cost as a single merkle tree lookup (assuming a tree depth of 32). So, if two values have even a relatively small chance of being accessed together, you want to store them in a single leaf.</p><p>But increasing leaf sizes hits an efficiency ceiling disappointingly quickly. Using a linear hasher like <code>sha256</code>, you pay roughly one compression for every 32 bytes of data - so the "breakeven" leaf size is about one kilobyte. (In other words, if you'd need to add more than a kilobyte of data to the leaf before you'd expect that two distinct values from that leaf would be accessed together, then the clustering would not be worthwhile.)</p><p>Thankfully, there's yet another trick we can use to reduce the overhead. <a href="https://twitter.com/fd_ripatel?ref=127.0.0.1">@fd_ripatel</a> recently pointed out to me that Blake3 uses an internal merkle tree instead of a linear hasher. This means that it's possible to do partial reads and incremental updates against a Blake3 hash with only logarithmic overhead. With this advancement, the breakeven size of a leaf increases from about a kilobyte to over a <em>gigabyte</em>. In other words, if you're using Blake3 as your leaf hasher then your leaves should be <em>really, really big</em>.</p><blockquote>Note: It seems like it <em>should</em> be possible to combine the padding scheme from Blake3 with the compression function from just about any secure hasher to yield an optimized leaf hasher. But IANAC! If you are a cryptographer and have thoughts on this, please reach out!</blockquote><h2 id="step-6-cache-aggressively">Step 6: Cache Aggressively</h2><p>Next, if we're going to make a performant merkle tree, we need to cache aggressively. We'll start simple - the (physical) root node of the tree should <em>always</em> live in memory. Why? Because any read or write against the tree is going to read and/or write that node. By keeping this node in memory, we can already reduce our expected SSD load from 7 pages per query to 6 pages per query.</p><p>But we can go further. If we keep the next level of the <em>physical</em> tree in memory, we can eliminate one more page from each query. This only costs us an extra 256k of memory (since there are 2^6 nodes of 4k each at this height), and we've already reduced our disk IOPS by 30%.</p><h2 id="step-7-be-lazy">Step 7: Be Lazy</h2><p>Last but not least, we can improve our efficiency even further using pretty standard tricks.</p><p>First, we should avoid reading from the merkle tree wherever possible. We can do this by storing our leaf-data in a "flat" key-value store so that we don't need to traverse the tree just to read data. This reduces disk IOPS and reduces tail latencies (since each page we avoid querying is one less opportunity for a hash collision). If the data is modified, we'll still end up needing to read the merkle path eventually - but this optimization is still well worth the effort since reads are so common.</p><blockquote>N.B. Almost all merkle tree implementations already support reading data from a flat store without traversing the trie.</blockquote><p>Second, we should always defer updates to our trie for as long as possible. The obvious reason for this is that two sequential updates might both update the same key (meaning that we can skip one trie update altogether). The less obvious reason is that a properly implemented merkle tree is asymptotically more efficient when updates are done in large batches.</p><p>The reason for this is relatively easy to understand. Suppose we have a sequence of four writes against the pictured merkle tree. Since the tree has depth 4 (excluding the root), we incur a cost of 16 hashes if we perform the updates sequentially (4 hashes per update times 4 updates). But if we perform the updates in a batch, we only need 9 hashes to compute the new root. Intuitively, this is because the hashing work for the top layers of the tree is amortized across 4 inserts.
Even though we made four modifications, we only recomputed the root hash once.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://prestonevans.me/content/images/2024/01/merkle-batch.png" class="kg-image" alt="Nearly Optimal State Merklization" loading="lazy" width="2000" height="1091" srcset="https://prestonevans.me/content/images/size/w600/2024/01/merkle-batch.png 600w, https://prestonevans.me/content/images/size/w1000/2024/01/merkle-batch.png 1000w, https://prestonevans.me/content/images/size/w1600/2024/01/merkle-batch.png 1600w, https://prestonevans.me/content/images/2024/01/merkle-batch.png 2072w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">A diagram showing batch insertion at locations q, r, t, and z.</span></figcaption></img></figure><h2 id="conclusion">Conclusion</h2><p>Using the techniques described in this post, it should be possible to reduce the disk-induced <em>latency</em> of merkle tree operations a factor of ten or more, the computational overhead by a factor of two or more, and the total disk IOPS by a factor of 25 or more. With those improvements, it’s my hope that the cost of generating state commitments should no longer be prohibitive even for high throughput chains.</p><p>Of course, this approach is not magic. In engineering, tradeoffs abound. In this case, we achieve much of our IO performance gains by sacrificing the ability to store a complete archive of the merkle tree at all points in time. Also, we require lower-level interfaces to the SSD than standard constructions - which translates to some engineering complexity. Still, we’re excited to see where this approach leads!</p><hr><h3 id="acknowledgements">Acknowledgements</h3><p>Thanks to…</p><ul><li>Cem Ozer and Patrick O'Grady for reviewing drafts of this post.</li><li>Zhenhuan Gao, Yuxuan Hu, and Qinfan Wu for the Jellyfish Merkle Tree paper</li><li>Peter Szilagyi for his excellent <a href="https://www.youtube.com/watch?v=Cmuz_Xn_YJw&ref=127.0.0.1">talk on the physical limits of Ethereum</a> (and for his tireless work on Geth)</li><li><a href="https://twitter.com/fd_ripatel?ref=127.0.0.1">Richard Patel</a> and <a href="https://twitter.com/dubbel06?ref=127.0.0.1">@dubbel06</a> for introducing me to Blake3’s secret merkle hasher</li><li>Emmanuel Goossaert for <a href="https://codecapsule.com/2014/02/12/coding-for-ssds-part-1-introduction-and-table-of-contents/?ref=127.0.0.1">his excellent series on SSDs</a>.</li></ul><h3 id="final-note">Final Note</h3><p>Since this post is now approaching Jon Charb levels of verbosity, I’ve made the editorial decision to omit a complete description of insertion, deletion, and proving algorithms compatible with the general construction described here. You can find suitable algorithms in the <a href="https://github.com/penumbra-zone/jmt?ref=127.0.0.1">Jellyfish Merkle Tree</a> repo that Sovereign co-maintains. To understand the batched algorithms, see the <a href="https://github.com/Sovereign-Labs/jellyfish-merkle-generic/blob/master/src/lib.rs?ref=127.0.0.1#L444">Aptos fork of the JMT</a>. I’ve also omitted some details related to crash recovery, but the problem is not too difficult (hint: use a WAL). Finally, I’ve left aside the question of storage for the leaf nodes of the merkle tree. 
The best store for that data will depend on the application, but either a key-value store or a custom database might be appropriate.</p></hr>]]></content:encoded></item><item><title><![CDATA[From the Archives: Pentagonal Exchange]]></title><description><![CDATA[From the archives: the original proposal for Pentagonal Exchange. Now thoroughly outdated, but maybe still interesting. ]]></description><link>http://www.prestonevans.me/pentagonal-exchange/</link><guid isPermaLink="false">Ghost__Post__60d41963870a3238dbe34230</guid><category><![CDATA[Crypto]]></category><dc:creator><![CDATA[Preston Evans]]></dc:creator><pubDate>Thu, 24 Jun 2021 05:40:07 GMT</pubDate><media:content url="https://prestonevans.me/content/images/2021/06/pentagon.jpeg" medium="image"/><content:encoded><![CDATA[<img src="https://prestonevans.me/content/images/2021/06/pentagon.jpeg" alt="From the Archives: Pentagonal Exchange"/><p><strong><strong>A Technique for Trustless Decentralized Random Number Generation</strong></strong></p><blockquote>This post was first published in 2017 - and is now thoroughly outdated. Still, at the time, it was a pretty good idea and attracted quite a bit of attention from Ethereans. Please don't try to implement this today. Just use threshold signatures. </blockquote><p><strong><strong>Introduction:</strong></strong></p><p>Cryptocurrencies are based on the idea that individuals will act in their self interest. This makes the problem of random number generation extremely difficult. To see why this is, let’s take the example of an online lottery for some extremely large sum — maybe $10,000,000,000. Every entrant in this lottery has been assigned some “ticket number,” and the winner will be the individual whose ticket number comes out of a random number generator. Let’s also say that each ticket costs a non-trivial sum — maybe $1,000.</p><p>An obvious mechanism for decentralized random number generation is for each entrant in the lottery to pick some number, and then have the smart contract combine each of these numbers to create a seed for a random number generator. The problem here is that the last participant to submit their number controls the outcome of the drawing — they can submit any number they want and thus create any outcome that they want. All they have to do is wait until everyone else submits their numbers, and then choose a secret number of their own that produces the desired result.</p><p>To avoid this kind of cheating, RANDAO (Ethereum’s current random number generation protocol) breaks the process into two steps. First, participants publish the hash of whatever secret number they plan to submit. Then, once every participant has had an opportunity to publish a hash, the contract asks for the secret numbers. As the secret numbers are submitted the network verifies that they match the hashes provided earlier. This prevents participants from changing their minds about which number to submit, removing the final participant’s ability to rig the outcome.</p><p>However, despite this precaution one crucial weakness remains — although the final participant can’t fix the outcome, he can alter it by refusing to submit his secret number. Since the smart contract is public, we have to assume that he’ll be able to calculate the outcome and make whatever decision best suits his interests. Essentially, if he won’t win by submitting his number, he won’t submit. 
Since hash functions are one-way, the smart contract has no way of knowing what his secret number is and it must either 1) proceed without his contribution, 2) try again (collect a new round of hash/secret number pairs) or 3) terminate, refunding the cost of lottery tickets to each entrant.</p><p>None of these options are desirable. If we proceed without the dishonest participant’s secret number, then a malicious participant has altered the outcome of the lottery, essentially doubling his chance to win. If we try again, we haven’t addressed the weakness of the first solution — a single bad actor has still altered the outcome — and we’ve added the increased time and computation costs of a second attempt at random number generation. In addition, there’s nothing to prevent our bad actor from causing our subsequent attempts to fail in exactly the same way. Finally, if we terminate the lottery because of a single malicious participant, we open ourselves up to denial of service attacks.</p><p>Current solutions attempts to mitigate these problems by requiring participants to put in a security deposit before they submit the hash of their secret number. If participants fail to follow through by submitting their secret number at the appropriate time, the dishonest participants will have their security deposits confiscated and split among the honest participants. This approach goes a long way toward eliminating the danger of dishonesty, but it isn’t foolproof. Continuing the example from earlier — let’s say that our participant has bought a lottery ticket for $1000. RANDAO would require him to put in a relatively large security deposit- say, an extra $2000 — to participate in the random number generation. This way, if he attempts to influence the result of the lottery by failing to submit his secret number (which, as we said earlier, effectively doubles his odds of winning), his expected reward (the value of a second $1000 ticket) is outweighed by the cost of his lost security deposit ($2000).</p><p>Unfortunately, this system of incentives has one serious limitation — it’s vulnerable to Sybil attacks. To see why this is, let’s imagine a fairly powerful malicious actor. Instead of buying one ticket, he buys 100 tickets under different identities. With ninety-nine of those tickets, he acts honestly, submitting his secret numbers at the appropriate time. Now, if he knows that submitting his last secret number will make him lose the drawing, he can fail to submit and give himself an extra 100 chances to win — one for each ticket he holds. Even though he loses $2000 from his confiscated security deposit, he comes out $98,000 in the black.</p><p>With current techniques, the only way to combat this vulnerability is to increase the size of the security deposit. However, if the incentives for dishonesty are extreme enough, the deposits eventually get so large that individual actors can no longer afford to participate, and the protocol breaks down.</p><p>It should be noted that Vitalik Buterin has proposed an improved scheme called “RANDAO++” which attempts to combat this vulnerability. Rather than security deposits, this new proposal relies on limiting the timeframe for submission and computing so many sequential hashes of each secret number (say, 1,000,000,000) that no participant can possibly know the result of tampering with the generation process until after his window for action has closed. 
This innovation does theoretically solve the problem of tampering, but it is extraordinarily computationally expensive. Most participants will lack the hardware to themselves verify the result of the lottery within a reasonable time frame, but the gas cost of using the Ethereum Virtual Machine to verify the result would be exorbitant. This will result in centralization of trust in those interested parties who also happen to have access to massive computing power.</p><p>In this paper, I propose a new technique to solve the problem of tampering in random number generation — Pentagonal Exchange.</p><p><strong><strong>Pentagonal Exchange:</strong></strong></p><p>Step 1: Separate participants into groups of five. This can be done pseudorandomly, as there’s no significant advantage to be gained by manipulating these groupings. Assign each of the five nodes within a group an index (1–5) for reference. Again, this can be done pseudorandomly — no combination of indices has any advantage over any other combination. As in RANDAO, each contributor should submit a security deposit to help ensure good faith. Each participant establishes an encrypted connection with each other participant in his group.</p><p>Step 2: Each contributor creates a secret number (or seed) for itself. At this point, here’s what each participant knows:</p><p>1 — Secret Number 1</p><p>2 — Secret Number 2</p><p>3 — Secret Number 3</p><p>4 — Secret Number 4</p><p>5 — Secret Number 5</p><p>Step 3: Each contributor generates a second seed. Each contributor should signal the network when Step 3 is completed by publishing the hash of the two components that it generated. (For example, node 1 publishes SHA3(SN1) and SHA3(SN6)). At this point, here’s what each participant knows.</p><p>1 — Secret Numbers 1 and 6</p><p>2 — Secret Numbers 2 and 7</p><p>3 — Secret Numbers 3 and 8</p><p>4 — Secret Numbers 4 and 9</p><p>5 — Secret Numbers 5 and 10</p><p>Step 3A: Once the hash of every pair has been published, take the hash of the next complete block of the Ethereum blockchain as an eleventh input into the final seed. In most circumstances (exceptions are detailed in the last section), <em><em>this prevents even a participant who controls all five nodes from rigging the outcome of the random number generation</em></em>.</p><p><em><em>UPDATE: 11/14</em>/17 <em>— Philip Daian rightly points out that when all five nodes collude, we’ve regressed to an earlier RNG proposal in which some future block is agreed on to be the sole seed. In this case, it’s possible that a dishonest miner could find a valid block, realize that he will lose the drawing if he publishes his block, and fail to publish. This increases his chance to win. I’ve added a section to the paper to address some of these concerns.</em></em></p><p>(Note: At this point, the entire network — including our five nodes — knows SHA3(block(n+1)), as well as SHA3(SN1), SHA3(SN6), SHA3(SN2), etc.)</p><p>Step 4: Each contributor sends its first secret number to the nodes which would be two spaces counterclockwise (“left”) and one space “right” of itself if the nodes were arranged in a pentagon (see diagram below). It sends its second secret number to the nodes two spaces “right” and one “left”. Each contributor should signal the network when it has received each component.</p><figure class="kg-card kg-image-card"><img src="https://miro.medium.com/max/636/0*cnw_NTu9N_wpaLwD." 
class="kg-image" alt="From the Archives: Pentagonal Exchange" loading="lazy"/></figure><p>At the end of step four, here’s what each individual node knows:</p><p>Node 1 — Secret Numbers 1, 3, 5, and 7, 9, 8</p><p>Node 2 — Secret Numbers 1, 2, 4, and 8, 10, 9</p><p>Node 3 — Secret Numbers 2, 3, 5, and 9, 6, 10</p><p>Node 4 — Secret Numbers 3, 4, 1, and 10, 7, 6</p><p>Node 5 — Secret Numbers 4, 5, 2, and 6, 8, 7</p><p>Note: Even if two nodes are controlled by a single malicious entity, that adversary still doesn’t have all ten secret numbers and cannot predict what output will be generated.</p><p>Step 5: Each contributor publishes its knowledge. At this point, even if two of the five nodes decide not to publish (losing their security deposits in the process), all ten secret numbers can be transmitted to the network. If any node tries to change the value of the secret numbers it publishes, it will be found out when the network compares the hashes of these values to the hashes published after Step 3, and the offender’s security deposit will be forfeited.</p><p>Step 6: The secret numbers, along with the hash of the most recent Ethereum block, are fed into a hash function to create a seed for the random number generator. (We’ll call this seed the primary seed. For scaling purposes, the primary seed can be hashed again to create a “secondary seed.”)</p><p><strong><strong>Scaling up and down:</strong></strong></p><p>What if there aren’t exactly five participants? If there are less than five interested parties, we simply bundle groups together until we have five participants — a group of two waits for a group of three that also needs a random number, and then the five carry out Pentagonal Exchange, and each use the seed for their own purpose. If they so desire, each group can agree ahead of time to modify the seed slightly — maybe one group will use the primary seed and one group will use the secondary seed — so that they don’t have to share a random number. In high security situations, we can require a substantial proof of work from groups before they are paired. This makes Sybil attacks difficult and limits the likelihood of dishonesty. However, we should remember that even Sybil attackers who control all five nodes in a round of pentagonal exchange are unable to do anything worse than cancel the number generation.</p><p>What if we have more than five participants? Going back to our lottery example, $10,000,000,000 in tickets divided by $1,000 per ticket gives us at least 10,000,000 participants interested in choosing a random number. If we pick five participants to act as representatives pseudorandomly, we’ve simply regressed the problem. Instead, we group all of our participants into fives, and have each group carry out Pentagonal Exchange. Based on the seed generated by each group, one participant from that group is selected to serve as the representative of that group in the next round. Those representatives are grouped into fives, and the process is repeated. In this way, we can get from ten million to five representatives in only ceil(log5 (10,000,000)) = 11 steps.</p><p>(Note that when the number of participants in a round isn’t divisible by five, we need to fill up to four extra spots. We do this by sorting primary seeds from high to low. Then, working our way along the list, we select one extra participant to move into the next round from each of the top 1–4 groups. The second representative is chosen using the secondary seed. 
No group should be allowed to furnish more than two representatives for the subsequent round, and those two representatives should not be placed in the same group of five unless absolutely necessary.)</p><p>Once we are down to only five representatives, we carry out one final round of Pentagonal Exchange to determine the winner of the lottery. (Note: The seed generated by this final group of five is used to pick the winning ticket number out of all 10,000,000 tickets. At this point, the five representatives have no greater chance to win than anyone else).</p><p>In this way, we can securely generate a random number with participation from ten million (or more) entities. Pentagonal Exchange is relatively computationally inexpensive and, given the current Ethereum block rate of approximately 30 seconds, the whole process can be completed in about six minutes.</p><p><em><em>Author’s Note: This is the end of the original publication. In response to valid concerns about attacks through dishonest Ethereum mining, I’ve added the following section. Huge thanks to Philip Daian for pointing out this potential attack vector.</em></em></p><p><strong><strong>Preventing Ethereum Mining Attacks:</strong></strong></p><p>As laid out above, Pentagonal Exchange retains one potential weakness. When all five participants are dishonest and aware that the other participants are dishonest (i.e. when all five are controlled by one attacker), they may collude to allow ETH miners to influence the drawing. If ETH miners know the secret numbers ahead of time, they can calculate the effect of publishing valid blocks and only publish those blocks which produce the desired outcome. If the stakes are high enough, it’s possible that a significant portion of ETH hashpower could act dishonestly and someone could succeed in rigging the outcome.</p><p>To mitigate this risk, we need to do one of two things — minimize the chance that someone will benefit from controlling all five nodes, or find a source of external randomness that isn’t a blockchain. I’ll attempt to do the first in this section of the paper. I’ll attempt to tackle the second solution at a later date.</p><p>First, we should ask ourselves when an attacker benefits from controlling the outcome of the drawing. The answer — only the final round of Pentagonal Exchange. In earlier rounds, an attacker who controls all five participants in a group is guaranteed that one of his participants will move on, even if he doesn’t influence the outcome of the drawing. <em><em>He has nothing to gain by sacrificing the ETH block reward</em></em>. In the final round, however, the random number is used to pick a winner out of every participant, not just a group of five. This is where attackers will focus their efforts. It’s also important to remember that this <em><em>ETH </em></em>mining attack is only possible when all five nodes collude since the exchange of secret numbers hasn’t happened by the time the ETH block is published.</p><p>These two realizations make our job significantly simpler. We don’t need to worry about preventing dishonest mining in early rounds. In those rounds, the intersection of the people who can attack (people who control all five nodes) and the people who benefit from attacking (people who don’t control all five nodes and thus need to cheat to ensure that their node moves on) is empty.
This means that even in high stakes situations, the early rounds of vanilla Pentagonal Exchange should be secure.</p><p>And, if the early rounds are secure, the chances of someone controlling all five nodes in the final round are slim. For argument’s sake, let’s imagine that our would-be attacker buys about half of the 10,000,000 tickets in our drawing from the previous paper. Now, he controls roughly 50% of the participants in our pentagonal exchange. Since our early rounds are secure, Pentagonal Exchange should guarantee that the distribution of our final representatives approximates the distribution of tickets. If our adversary’s odds of controlling any one randomly selected node are 1/2, the odds of him controlling five consecutive randomly selected nodes are 1/32 = 3.125%. So, an adversary who controls half of the tickets has about a 3% chance of being able to cheat by dishonest ETH mining.</p><p>If he controls 100% of ETH hashpower, this means that in total our adversary has a 51.5625% of winning the drawing — 48.4375 (50% chance of being selected in 96.875% of drawings in which he doesn’t control all five of the final nodes) + 3.125% (100% chance of being selected in the 3.125% of drawings in which he controls all five nodes)</p><p>We should note, however, that cheating by dishonest mining doesn’t guarantee a win for our attacker unless he controls the entirety of ETH hashpower. If he controls all five PE nodes but not all the ETH hashpower, then there are three possible scenarios. 1) Someone besides our adversary wins the race to discover the next block of the ETH block chain and publishes. In this case, the adversary has gained nothing. The odds of this happening depend on the hashpower of our adversary relative to the network. 2) Our adversary wins the race, happens to discover a block that makes him win, and publishes. In this case, no attack is made — he acts indistinguishably from an honest participant. 3) Our adversary wins the race but the block he discovers will not allow him to win the drawing, so he suppresses his knowledge.</p><p>Unfortunately, the formula for calculating our adversary’s odds of winning when he doesn’t control all of the network hashpower is recursive. If he controls p percent of the total hashpower, his odds of winning outright are 48.4375% + (50% * (100-p) * 3.125%) + (50% * p * 3.125%), and his odds of suppressing his knowledge — essentially triggering a second race — are (50% * p * 3.125%). His chance of winning in a second race is [50% * (100-p) *(50% * p * 3.125%)] + [50% * p *(50% * p * 3.125%)] and his chance of triggering a third race is [50% * p *(50% * p * 3.125%)]. His chances of winning that third race are… well, you get the idea.</p><p>What this amounts to in practice is that an adversary powerful enough to buy 50% of the tickets in our lottery has between a 50% and a 51.5625% chance of winning, depending on the fraction of ETH hashpower he controls. Since we believe that no one controls 50% of the ETH network, it’s safe to assume that the actual number is closer 50% than 51.5625%. <em><em>So, in practice, even vanilla Pentagonal Exchange is extremely robust to ETH mining attacks. </em></em>However, this is an issue that should be taken extremely seriously. 
In the coming months, I intend to propose a system of “bootstrapped randomness” that I hope will completely eliminate the possibility of mining attacks.</p><p><strong><strong>Sixty Percent Attacks, and How to Prevent Them:</strong></strong></p><p>I’ve received several good questions about what happens when an attacker controls 3/5 of the nodes in a Pentagon. I’ll attempt to answer those questions in this section.</p><p>First, it’s crucial to remember what an attacker stands to gain in the early rounds of Pentagonal Exchange - namely, a better chance for one of his nodes to be selected to move on to the next round. This limits the scope of his attack significantly. He can’t attack when he controls one or two nodes of a Pentagon, and he won’t attack when he controls three (all three of his nodes would lose their deposits and be unable to move on). When an attacker controls all five nodes, he has nothing to gain by attacking - one of his five will move on and four won’t, no matter what he does. So, the only attacks will take place when an attacker controls four fifths of a pentagon.</p><p>We should also note that when an attacker controls 4/5 of a pentagon, he only needs to attack 20% of the time (the other 80% of the time, he is selected honestly). Since attacking requires an adversary to reveal at least three of the nodes under his control, and he wouldn’t have attacked if he controlled all five, we know that exactly one of the two remaining nodes is honest and one is dishonest. For now, let’s assume that we have some way of randomly choosing between the two remaining nodes (more on this later). We have a fifty-fifty chance of choosing the honest node, which means that our adversary has essentially raised his total odds of moving on to the next round from 80% to 90%. However, this 10% increase in chance of success costs him three security deposits in the 20% (1/5) of rounds in which he needs to cheat.</p><p>So, his cost of attacking is 3/5 of a security deposit per 10% increase in chance of winning. To compare this to his cost per chance of buying tickets, let’s multiply both sides by ten. At this rate, our attacker is paying 30/5 (or 6) security deposits per additional 100% chance of winning. However, his ticket cost of a guaranteed win is only 5. So, as long as security deposits are at least 5/6 of the ticket price, our adversary really hasn’t gained anything. For extra security, let’s set them at the full price of a ticket.</p><p>Now that the early rounds are secure, how do we go about securing the final round? Well, as we discussed earlier, later rounds are composed of a probabilistically representative group of participants when the early rounds are secure. This means that in order for someone to control a 3/5 majority of the final round they (probably) need to buy about 60% of the tickets in our lottery.</p><p>But if an attacker buys 60% of the tickets, he also has to pay 60% of the total security deposits. If we go with the one-to-one ticket price to security deposit ratio, we calculate that this attacker’s initial input to the lottery was 120% of the jackpot.
This is worthwhile to the adversary because he expects to get all of his security deposits back, but we can use it to our advantage.</p><p>Since his 3/5 majority is only enough to cancel and not rig the final round of our drawing (recall that Pentagonal Exchange requires him to commit to secret numbers before he can predict who will win the drawing), our adversary has to decide what to do in the 40% of drawings that he isn’t going to win. He doesn’t want someone else to win, so he’ll be tempted to refuse to publish his secret numbers. If he does, the network is stuck several inputs short of being able to generate a random number.</p><p>How do we prevent this from happening? The good news is, we don’t have to. As long as we hold on to everyone’s security deposit until the last round comes through, our adversary will never have an incentive to attack. Recall that our adversary has bought 60% of the tickets and paid 60% of the security deposits. He’s paid more than the jackpot (and more than everyone else combined) into the system. That means that if he refuses to submit his secret numbers, he’s lost more money than the entire rest of the network has, combined. More importantly, he hasn’t changed the outcome of the drawing. If we go back to our lottery example from earlier, our adversary will have burned twelve billion dollars, and have nothing to show for it. This makes his rational response painfully clear. He has to publish his secret numbers and salvage his security deposits, and our system remains secure.</p><p>Now, there’s just one more point to touch on - if three nodes in an early pentagon fail to submit, how do we choose between the two that are left? There isn’t a perfect answer to this question, but I think that in practice, using some permutation of the next block of the ETH blockchain would turn out to be close enough to random for our purposes. Yes, a dishonest miner could theoretically withhold a valid block to give his node a better chance at being selected, but he has to burn three security deposits just to get to a point where he can mine dishonestly. Even then, he can only increase his chance of being selected from fifty-fifty to 75%. So, in practice, sacrificing the ETH block reward probably won’t be worthwhile. However, this is certainly an area where there’s room for improvement — it’s likely that some provably perfect solution exists, I just haven’t found it yet.</p><p><em><em>Author’s Note: This entire paper, and especially the final section, represents an idea in its earliest stages — please feel free to contact me with technical critiques and don’t hesitate to point out any errors in logic.</em></em></p>]]></content:encoded></item><item><title><![CDATA[Ava is no ETH-killer]]></title><description><![CDATA[If you're going to build a Dapp, don't do it on Avalanche. ]]></description><link>http://www.prestonevans.me/ava-bad/</link><guid isPermaLink="false">Ghost__Post__60ce0b5818c9b5df0d35a610</guid><category><![CDATA[Crypto]]></category><dc:creator><![CDATA[Preston Evans]]></dc:creator><pubDate>Mon, 21 Jun 2021 23:42:09 GMT</pubDate><media:content url="https://prestonevans.me/content/images/2021/06/frozen_landscape.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://prestonevans.me/content/images/2021/06/frozen_landscape.jpg" alt="Ava is no ETH-killer"/><p>If you're building a Dapp today, there can be good reasons to look at launching on a platforms besides Ethereum. ETH has high fees, it's time to finality is long, and there's MEV to worry about. 
But if you're going to build a production Dapp on another platform, please <em>please</em>, don't use Avalanche. </p><p>Avalanche has 3 significant shortcomings, any one of which could be enough to doom a platform.</p><ol><li>Even in theory, it can't maintain <strong><em>liveness</em></strong> without centralization.</li><li>In practice, it may not even guarantee <strong><em>safety</em></strong>.</li><li>It doesn't solve the <strong><em>state-growth problem</em></strong>.</li></ol><h2 id="alive-in-theory">Alive, in theory</h2><h3 id="odarn">O(<em>darn</em>)</h3><p>According to its <a href="https://assets.website-files.com/5d80307810123f5ffbb34d6e/6009805681b416f34dcae012_Avalanche%20Consensus%20Whitepaper.pdf?ref=127.0.0.1">whitepaper</a>, Avalanche provides the following guarantee about liveness: </p><blockquote>P3. Liveness (Strong Form). If f ≤ O( √n), then the Snow protocols terminate with high probability (≥ 1−ε) in O(log n) rounds. (Blogger's Note: the <em>"f"</em> in this statement refers to the number of adversarial nodes.)</blockquote><p>In vanilla distributed systems research, this is a perfectly reasonable guarantee to provide. Unfortunately, it's not a good guarantee for a cryptosystem that wants to be meaningfully decentralized. Why not? I'm so glad you asked! </p><p>Imagine an Ava network with 25 nodes. According to the whitepaper, that network can tolerate √25 = 5 malicious nodes without experiencing a liveness failure. For the mathematically inclined, that's 20% malicious nodes (or stake, or whatever). That guarantee is <em>significantly</em> worse than the one provided Bitcoin (which keeps liveness up to ~51% malicious nodes), but is sort of in the ballpark of BFT protocols which tolerate 33% byzantine nodes before experiencing a safety or liveness failure. Unfortunately, the story doesn't end here.</p><figure class="kg-card kg-embed-card"><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I've been saying some of this stuff, much more nicely, for years. <br><br>The right substrate for all PoS coins is Avalanche. It's super fast, it scales, and it enables tens of thousands to millions of nodes to participate in the protocol. Come check it out. <a href="https://t.co/mgbhV1mZtL?ref=127.0.0.1">https://t.co/mgbhV1mZtL</a></br></br></p>— Emin Gün Sirer (@el33th4xor) <a href="https://twitter.com/el33th4xor/status/1309158257055993858?ref_src=twsrc%5Etfw&ref=127.0.0.1">September 24, 2020</a></blockquote> <script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"/> </figure><p>As Ava scales up, its liveness guarantee gets proportionally worse. At 100 nodes, an Ava network can only tolerate 10% of its nodes acting maliciously. At its current scale (about 1000 nodes), only 3% can be malicious without jeopardizing liveness. At the "tens of thousands of nodes" scale that its inventors <a href="https://twitter.com/el33th4xor/status/1309158257055993858?ref=127.0.0.1">promise on Twitter</a>, an adversary controlling even 1% of stake can bring the network to a halt. And, unlike most proof-of-stake protocols, <a href="https://twitter.com/el33th4xor/status/1382721311814868996?ref=127.0.0.1">Ava doesn't have a mechanism</a> to identify and slash the adversaries. Game over.</p><figure class="kg-card kg-embed-card"><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Avalanche does not have slashing. 
<a href="https://t.co/j4GQXYn0Jm?ref=127.0.0.1">https://t.co/j4GQXYn0Jm</a></p>— Emin Gün Sirer (@el33th4xor) <a href="https://twitter.com/el33th4xor/status/1382721311814868996?ref_src=twsrc%5Etfw&ref=127.0.0.1">April 15, 2021</a></blockquote> <script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"/> </figure><h3 id="non-zero-probability-of-success">Non-zero Probability of Success</h3><p>But, as I can already hear the Ava defenders protesting, the whitepaper provides another liveness guarantee: </p><blockquote>P2. Liveness (Upper Bound). Snow protocols terminate with a strictly positive probability within \(t_max\) rounds.</blockquote><p>Unfortunately, the statement that an algorithm "terminates with strictly positive probability" is meaningless as a guarantee about a system's behavior. After all, the probability that the computer you're reading this on will spontaneously re-assemble itself into a sculpture of Satoshi Nakamoto is also "strictly positive". Non-zero probability of your Dapp working is not a guarantee you should be comfortable with. I know it seems like I'm just being pedantic here. I'm really not trying to be. This is deeply important. </p><p>As the fraction of nodes (or stake, or whatever), controlled by the attacker exceeds that √n upper bound, the performance of Avalanche gets poorer and poorer. Exactly how quickly performance degrades is not clear from the whitepaper - it says that the slowdown is polynomial when <em>f</em> exceeds √n but becomes exponential as <em>f</em> approaches \(\frac{n}{2}\) - but even a polynomial increase in communication is really bad. </p><p>Remember, we're talking about thousands of nodes here - if each node has to send and receive a small polynomial number of messages (say, \(n^2\) ), that translates to <em><strong>millions</strong></em> of messages <em><strong>per decision</strong></em>. That won't work. Even if the slowdown is polynomial, a decentralized Ava network is dead in the water. </p><h2 id="danger-close">Danger Close!</h2><p>As it turns out, there's an even more worrying development on the horizon. <a href="https://twitter.com/SarahJamieLewis?ref=127.0.0.1">Sarah Jamie Lewis</a> of Open Privacy recently published (and formally verified) <a href="https://git.openprivacy.ca/sarah/formal-verification/raw/branch/master/snowfall.pdf?ref=127.0.0.1">an attack on Snow-family consensus protocols</a> which purports to break the safety and liveness guarantees of Ava's underlying consensus algorithm. </p><blockquote>Through probabilistic modelling (sic) we formally verify an adversarial strategy that forces correct nodes to choose between safety and liveness even when f < O( √n).</blockquote><p>If you develop on Avalanche, this should make you really nervous. Avalanche claims to provide strong guarantees of safety and liveness as long as the attacker controls less than √n nodes. If those guarantees can really be broken, even in a contrived setup, then there's something wrong with the whitepaper. Remember, a single error contaminates the entire security proof. If the Ava whitepaper is 99% correct, it's 100% wrong.</p><p>I want to be clear: this doesn't necessarily mean that Ava is not secure. It means that we don't know whether Ava is secure or not. That's ok - research takes time. But it's not ok to build a production app on a system that might be fundamentally broken. Not if there's another option. 
</p><h2 id="tradeoffs-the-good-the-bad-and-the-downright-weird">Tradeoffs: The Good, the Bad, and the Downright Weird</h2><p>This brings me to my final point. When you're designing a complex system, you're often forced to accept a tradeoff between two desirable properties. Like all blockchain projects, Ava chose a set of tradeoffs that its designers found compelling - it traded away some security guarantees to secure faster consensus. But here's the rub: <em>consensus was never the bottleneck to begin with</em>! The real bottleneck in modern blockchains is state growth. </p><p>In Ethereum, each read or write into the <em>database</em> incurs seven or eight random disk accesses (because the database uses a trie structure internally), and each<em> state access</em> incurs seven or eight random <em>database</em> accesses in the course of traversing the state trie. </p><p>For those of you who haven't spent time a lot of optimizing computer programs, that's what we in the business call "really bad". Random disk accesses are <strong><em>SLOW</em></strong>. In the time it takes to do a single random read, a CPU might be able perform 10,000 computations. This is why your full node can take a long time to sync even though the CPU is mostly idle. Sync times are typically dominated by disk accesses.</p><p>But remember how I said that each database access takes seven or eight writes, and each state access takes seven or eight database accesses? It turns out that both of those numbers grow with the size of a blockchain's state. Specifically, they're each about log(s) where S is the state size. So the time it takes to process a transaction grows with \(log^2(s)\). The next time Ethereum state size doubles, the time it takes to process a given transaction will increase by roughly 30%. This is why Ethereum keeps the block gas limit low - it needs to limit both block processing times and state growth. </p><p>As long everyone can process every candidate transaction, coming to consensus is relatively easy. There are already dozens of consensus protocols which offer fast finality. But if state size grows too quickly, processing transactions on commodity hardware becomes impossible. It's processing blocks that's the bottleneck, not coming to consensus. (And in case you were wondering, the Avalanche-go client relies on Ethereum's Geth - so all of these limitations really do apply to Ava.)</p><figure class="kg-card kg-embed-card"><blockquote class="twitter-tweet"><p lang="en" dir="ltr">The discussion was raised on the context of scaling, people saying that AVAX is better than Ethereum because it scales. I pointed out that it uses Geth under the hood. From the contextual perspective, you are changing the topic.</p>— Péter Szilágyi (karalabe.eth) (@peter_szilagyi) <a href="https://twitter.com/peter_szilagyi/status/1358654546646626304?ref_src=twsrc%5Etfw&ref=127.0.0.1">February 8, 2021</a></blockquote> <script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"/> </figure><p>So, to recap, Ava trades safety and/or liveness guarantees for speed of consensus, but consensus was never the bottleneck. That's like taking the engine out of a motorcycle to give it more cargo space. It's not illegal, just... not a good tradeoff. </p><h2 id="conclusion-building-on-ava-considered-harmful">Conclusion: Building on Ava Considered Harmful</h2><p>Not everyone wants to build on Ethereum. That's good and healthy. We live in a multi-chain world. 
<p>As long as everyone can process every candidate transaction, coming to consensus is relatively easy. There are already dozens of consensus protocols which offer fast finality. But if state size grows too quickly, processing transactions on commodity hardware becomes impossible. It's processing blocks that's the bottleneck, not coming to consensus. (And in case you were wondering, the Avalanche-go client relies on Ethereum's Geth - so all of these limitations really do apply to Ava.)</p><figure class="kg-card kg-embed-card"><blockquote class="twitter-tweet"><p lang="en" dir="ltr">The discussion was raised on the context of scaling, people saying that AVAX is better than Ethereum because it scales. I pointed out that it uses Geth under the hood. From the contextual perspective, you are changing the topic.</p>— Péter Szilágyi (karalabe.eth) (@peter_szilagyi) <a href="https://twitter.com/peter_szilagyi/status/1358654546646626304?ref_src=twsrc%5Etfw&ref=127.0.0.1">February 8, 2021</a></blockquote> <script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"/> </figure><p>So, to recap, Ava trades safety and/or liveness guarantees for speed of consensus, but consensus was never the bottleneck. That's like taking the engine out of a motorcycle to give it more cargo space. It's not illegal, just... not a good tradeoff. </p><h2 id="conclusion-building-on-ava-considered-harmful">Conclusion: Building on Ava Considered Harmful</h2><p>Not everyone wants to build on Ethereum. That's good and healthy. We live in a multi-chain world. But please, for the love of all that is holy, stop treating Avalanche like a production system. Ava is a cool distributed systems research project, but it's not a good place to build mission-critical applications. Its safety and liveness guarantees are much weaker than those of other projects, and it might have a fundamental consensus flaw. Besides, even if it works, it doesn't solve the scaling problem. </p><p><em>Author's Note: Did I get some things wrong? Almost certainly. If you find one, please reach out on Twitter. I'm happy to issue corrections or retractions as necessary. </em></p><h3 id="psto-the-folks-at-ava-labs">P.S. - To the Folks at Ava Labs</h3><p>Like all blockchains, Ava has its share of evangelists - and that's ok. But <strong><u><em>please</em></u></strong> be careful making claims like <a href="https://twitter.com/el33th4xor/status/1316471519640465411?ref=127.0.0.1">this</a> on Twitter. Ava is <em>not</em> live with 49% adversarial nodes unless you define "live" as "having non-zero - but arbitrarily small - probability of advancing". At best, you're just going to confuse a lot of newcomers. Ava isn't another IOTA, but this kind of rhetoric is how you would turn it into one. </p><figure class="kg-card kg-embed-card"><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I can talk about Avalanche. Avalanche has a parameterizable decision protocol that can withstand huge attackers, even 80%, with safety, but not liveness. It requires 51% honest nodes for liveness and safety together. And its bootstrap requires 51%.</p>— Emin Gün Sirer (@el33th4xor) <a href="https://twitter.com/el33th4xor/status/1316471519640465411?ref_src=twsrc%5Etfw&ref=127.0.0.1">October 14, 2020</a></blockquote> <script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"/> </figure>]]></content:encoded></item></channel></rss>