From the Archives: Pentagonal Exchange
A Technique for Trustless Decentralized Random Number Generation
This post was first published in 2017 - and is now thoroughly outdated. Still, at the time, it was a pretty good idea and attracted quite a bit of attention from Ethereans. Please don't try to implement this today. Just use threshold signatures.
Introduction:
Cryptocurrencies are based on the idea that individuals will act in their self interest. This makes the problem of random number generation extremely difficult. To see why this is, let’s take the example of an online lottery for some extremely large sum — maybe $10,000,000,000. Every entrant in this lottery has been assigned some “ticket number,” and the winner will be the individual whose ticket number comes out of a random number generator. Let’s also say that each ticket costs a non-trivial sum — maybe $1,000.
An obvious mechanism for decentralized random number generation is for each entrant in the lottery to pick some number, and then have the smart contract combine each of these numbers to create a seed for a random number generator. The problem here is that the last participant to submit their number controls the outcome of the drawing — they can submit any number they want and thus create any outcome that they want. All they have to do is wait until everyone else submits their numbers, and then choose a secret number of their own that produces the desired result.
To avoid this kind of cheating, RANDAO (Ethereum’s current random number generation protocol) breaks the process into two steps. First, participants publish the hash of whatever secret number they plan to submit. Then, once every participant has had an opportunity to publish a hash, the contract asks for the secret numbers. As the secret numbers are submitted the network verifies that they match the hashes provided earlier. This prevents participants from changing their minds about which number to submit, removing the final participant’s ability to rig the outcome.
However, despite this precaution one crucial weakness remains — although the final participant can’t fix the outcome, he can alter it by refusing to submit his secret number. Since the smart contract is public, we have to assume that he’ll be able to calculate the outcome and make whatever decision best suits his interests. Essentially, if he won’t win by submitting his number, he won’t submit. Since hash functions are one-way, the smart contract has no way of knowing what his secret number is and it must either 1) proceed without his contribution, 2) try again (collect a new round of hash/secret number pairs) or 3) terminate, refunding the cost of lottery tickets to each entrant.
None of these options are desirable. If we proceed without the dishonest participant’s secret number, then a malicious participant has altered the outcome of the lottery, essentially doubling his chance to win. If we try again, we haven’t addressed the weakness of the first solution — a single bad actor has still altered the outcome — and we’ve added the increased time and computation costs of a second attempt at random number generation. In addition, there’s nothing to prevent our bad actor from causing our subsequent attempts to fail in exactly the same way. Finally, if we terminate the lottery because of a single malicious participant, we open ourselves up to denial of service attacks.
Current solutions attempts to mitigate these problems by requiring participants to put in a security deposit before they submit the hash of their secret number. If participants fail to follow through by submitting their secret number at the appropriate time, the dishonest participants will have their security deposits confiscated and split among the honest participants. This approach goes a long way toward eliminating the danger of dishonesty, but it isn’t foolproof. Continuing the example from earlier — let’s say that our participant has bought a lottery ticket for $1000. RANDAO would require him to put in a relatively large security deposit- say, an extra $2000 — to participate in the random number generation. This way, if he attempts to influence the result of the lottery by failing to submit his secret number (which, as we said earlier, effectively doubles his odds of winning), his expected reward (the value of a second $1000 ticket) is outweighed by the cost of his lost security deposit ($2000).
Unfortunately, this system of incentives has one serious limitation — it’s vulnerable to Sybil attacks. To see why this is, let’s imagine a fairly powerful malicious actor. Instead of buying one ticket, he buys 100 tickets under different identities. With ninety-nine of those tickets, he acts honestly, submitting his secret numbers at the appropriate time. Now, if he knows that submitting his last secret number will make him lose the drawing, he can fail to submit and give himself an extra 100 chances to win — one for each ticket he holds. Even though he loses $2000 from his confiscated security deposit, he comes out $98,000 in the black.
With current techniques, the only way to combat this vulnerability is to increase the size of the security deposit. However, if the incentives for dishonesty are extreme enough, the deposits eventually get so large that individual actors can no longer afford to participate, and the protocol breaks down.
It should be noted that Vitalik Buterin has proposed an improved scheme called “RANDAO++” which attempts to combat this vulnerability. Rather than security deposits, this new proposal relies on limiting the timeframe for submission and computing so many sequential hashes of each secret number (say, 1,000,000,000) that no participant can possibly know the result of tampering with the generation process until after his window for action has closed. This innovation does theoretically solve the problem of tampering, but it is extraordinarily computationally expensive. Most participants will lack the hardware to themselves verify the result of the lottery within a reasonable time frame, but the gas cost of using the Ethereum Virtual Machine to verify the result would be exorbitant. This will result in centralization of trust in those interested parties who also happen to have access to massive computing power.
In this paper, I propose a new technique to solve the problem of tampering in random number generation — Pentagonal Exchange.
Pentagonal Exchange:
Step 1: Separate participants into groups of five. This can be done pseudorandomly, as there’s no significant advantage to be gained by manipulating these groupings. Assign each of the five nodes within a group an index (1–5) for reference. Again, this can be done pseudorandomly — no combination of indices has any advantage over any other combination. As in RANDAO, each contributor should submit a security deposit to help ensure good faith. Each participant establishes an encrypted connection with each other participant in his group.
Step 2: Each contributor creates a secret number (or seed) for itself. At this point, here’s what each participant knows:
1 — Secret Number 1
2 — Secret Number 2
3 — Secret Number 3
4 — Secret Number 4
5 — Secret Number 5
Step 3: Each contributor generates a second seed. Each contributor should signal the network when Step 3 is completed by publishing the hash of the two components that it generated. (For example, node 1 publishes SHA3(SN1) and SHA3(SN6)). At this point, here’s what each participant knows.
1 — Secret Numbers 1 and 6
2 — Secret Numbers 2 and 7
3 — Secret Numbers 3 and 8
4 — Secret Numbers 4 and 9
5 — Secret Numbers 5 and 10
Step 3A: Once the hash of every pair has been published, take the hash of the next complete block of the Ethereum blockchain as an eleventh input into the final seed. In most circumstances (exceptions are detailed in the last section), this prevents even a participant who controls all five nodes from rigging the outcome of the random number generation.
UPDATE: 11/14/17 — Philip Daian rightly points out that when all five nodes collude, we’ve regressed to an earlier RNG proposal in which some future block is agreed on to be the sole seed. In this case, it’s possible that a dishonest miner could find a valid block, realize that he will lose the drawing if he publishes his block, and fail to publish. This increases his chance to win. I’ve added a section to the paper to address some of these concerns.
(Note: At this point, the entire network — including our five nodes — knows SHA3(block(n+1)), as well as SHA3(SN1), SHA3(SN6), SHA3(SN2), etc.)
Step 4: Each contributor sends its first secret number to the nodes which would be two spaces counterclockwise (“left”) and one space “right” of itself if the nodes were arranged in a pentagon (see diagram below). It sends its second secret number to the nodes two spaces “right” and one “left”. Each contributor should signal the network when it has received each component.
At the end of step four, here’s what each individual node knows:
Node 1 — Secret Numbers 1, 3, 5, and 7, 9, 8
Node 2 — Secret Numbers 1, 2, 4, and 8, 10, 9
Node 3 — Secret Numbers 2, 3, 5, and 9, 6, 10
Node 4 — Secret Numbers 3, 4, 1, and 10, 7, 6
Node 5 — Secret Numbers 4, 5, 2, and 6, 8, 7
Note: Even if two nodes are controlled by a single malicious entity, that adversary still doesn’t have all ten secret numbers and cannot predict what output will be generated.
Step 5: Each contributor publishes its knowledge. At this point, even if two of the five nodes decide not to publish (losing their security deposits in the process), all ten secret numbers can be transmitted to the network. If any node tries to change the value of the secret numbers it publishes, it will be found out when the network compares the hashes of these values to the hashes published after Step 3, and the offender’s security deposit will be forfeited.
Step 6: The secret numbers, along with the hash of the most recent Ethereum block, are fed into a hash function to create a seed for the random number generator. (We’ll call this seed the primary seed. For scaling purposes, the primary seed can be hashed again to create a “secondary seed.”)
Scaling up and down:
What if there aren’t exactly five participants? If there are less than five interested parties, we simply bundle groups together until we have five participants — a group of two waits for a group of three that also needs a random number, and then the five carry out Pentagonal Exchange, and each use the seed for their own purpose. If they so desire, each group can agree ahead of time to modify the seed slightly — maybe one group will use the primary seed and one group will use the secondary seed — so that they don’t have to share a random number. In high security situations, we can require a substantial proof of work from groups before they are paired. This makes Sybil attacks difficult and limits the likelihood of dishonesty. However, we should remember that even Sybil attackers who control all five nodes in a round of pentagonal exchange are unable to do anything worse than cancel the number generation.
What if we have more than five participants? Going back to our lottery example, $10,000,000,000 in tickets divided by $1,000 per ticket gives us at least 10,000,000 participants interested in choosing a random number. If we pick five participants to act as representatives pseudorandomly, we’ve simply regressed the problem. Instead, we group all of our participants into fives, and have each group carry out Pentagonal Exchange. Based on the seed generated by each group, one participant from that group is selected to serve as the representative of that group in the next round. Those representatives are grouped into fives, and the process is repeated. In this way, we can get from ten million to five representatives in only ceil(log5 (10,000,000)) = 11 steps.
(Note that when the number of participants in a round isn’t divisible by five, we need to fill up to four extra spots. We do this by sorting primary seeds from high to low. Then, working our way along the list, we select one extra participant to move into the next round from each of the top 1–4 groups. The second representative is chosen using the secondary seed. No group should be allowed to furnish more than two representatives for the subsequent round, and those two representatives should not be placed in the same group of five unless absolutely necessary.)
Once we are down to only five representatives, we carry out one final round of Pentagonal Exchange to determine the winner of the lottery. (Note: The seed generated by this final group of five is used to pick the winning ticket number out of all 10,000,000 tickets. At this point, the five representatives have no greater chance to win than anyone else).
In this way, we can securely generate a random number with participation from ten million (or more) entities. Pentagonal Exchange is relatively computationally inexpensive and, given the current Ethereum block rate of approximately 30 seconds, the whole process can be completed in about six minutes.
Author’s Note: This is the end of the original publication. In response to valid concerns about attacks through dishonest Ethereum minings, I’ve added the following section. Huge thanks to Philip Daian for pointing out this potential attack vector.
Preventing Ethereum Mining Attacks:
As laid out above, Pentagonal Exchange retains one potential weakness. When all five participants are dishonest and aware that the other participants are dishonest (i.e. when all five are controlled by one attacker), they may collude to allow ETH miners to influence the drawing. If ETH miners know that secret numbers ahead of time, they can calculate the effect of publishing valid blocks and only publish those blocks which produce the desired outcome. If the stakes are high enough, it’s possible that a significant portion of ETH hashpower could act dishonestly and someone could succeed in rigging the outcome.
To mitigate this risk, we need to do one of two things — minimize the chance that someone will benefit from controlling all five nodes, or find a source of external randomness that isn’t a blockchain. I’ll attempt to do the first in this section of the paper. I’ll attempt to tackle the second solution at a later date.
First, we should ask ourselves when an attacker benefits from controlling the outcome of the drawing. The answer — only the final round of Pentagonal Exchange. In earlier rounds, an attacker who controls all five participants in a group is guaranteed that one of his participants will move on, even if he doesn’t influence the outcome of the drawing. He has nothing to gain by sacrificing the ETH block reward. In the final round, however, the random number is used to pick a winner out of every participant, not just a group of five. This is where attackers will focus their efforts. It’s also important to remember that this ETH mining attack is only possible when all five nodes collude since the exchange of secret numbers hasn’t happened by the time the ETH block is published.
These two realizations make our job significantly simpler. We don’t need to worry about preventing dishonest mining in early rounds. In those rounds, the union of people who can attack (people who control all five nodes) and people who benefit from attacking (people who don’t control all five nodes and thus need to cheat to ensure that their node moves on) is exactly zero. This means that even in high stakes situations, the early rounds of vanilla Pentagonal Exchange should be secure.
And, if the early rounds are secure, the chances of someone controlling all five nodes in the final round are slim. For argument’s sake, let’s imagine that our would-be attacker buys about half of the 10,000,000 tickets in our drawing from the previous paper. Now, he controls roughly 50% of the participants in our pentagonal exchange. Since our early rounds are secure, Pentagonal Exchange should guarantee that the distribution of our final representatives approximates the distribution of tickets. If our adversary’s odds of controlling any one randomly selected node are 1/2, the odds of him controlling five consecutive randomly selected nodes are 1/32 = 3.125%. So, an adversary who controls half of the tickets has about a 3% chance of being able to cheat by dishonest ETH mining.
If he controls 100% of ETH hashpower, this means that in total our adversary has a 51.5625% of winning the drawing — 48.4375 (50% chance of being selected in 96.875% of drawings in which he doesn’t control all five of the final nodes) + 3.125% (100% chance of being selected in the 3.125% of drawings in which he controls all five nodes)
We should note, however, that cheating by dishonest mining doesn’t guarantee a win for our attacker unless he controls the entirety of ETH hashpower. If he controls all five PE nodes but not all the ETH hashpower, then there are three possible scenarios. 1) Someone besides our adversary wins the race to discover the next block of the ETH block chain and publishes. In this case, the adversary has gained nothing. The odds of this happening depend on the hashpower of our adversary relative to the network. 2) Our adversary wins the race, happens to discover a block that makes him win, and publishes. In this case, no attack is made — he acts indistinguishably from an honest participant. 3) Our adversary wins the race but the block he discovers will not allow him to win the drawing, so he suppresses his knowledge.
Unfortunately, the formula for calculating our adversary’s odds of winning when he doesn’t control all of the network hashpower is recursive. If he controls p percent of the total hashpower, his odds of winning outright are 48.4375% + (50% * (100-p) * 3.125%) + (50% * p * 3.125%), and his odds of suppressing his knowledge — essentially triggering a second race — are (50% * p * 3.125%). His chance of winning in a second race is [50% * (100-p) *(50% * p * 3.125%)] + [50% * p *(50% * p * 3.125%)] and his chance of triggering a third race is [50% * p *(50% * p * 3.125%)]. His chances of winning that third race are… well, you get the idea.
What this amounts to in practice is that an adversary powerful enough to buy 50% of the tickets in our lottery has between a 50% and a 51.5625% chance of winning, depending on the fraction of ETH hashpower he controls. Since we believe that no one controls 50% of the ETH network, it’s safe to assume that the actual number is closer 50% than 51.5625%. So, in practice, even vanilla Pentagonal Exchange is extremely robust to ETH mining attacks. However, this is an issue that should be taken extremely seriously. In the coming months, I intend to propose a system of “bootstrapped randomness” that I hope will completely eliminate the possibility of mining attacks.
Sixty Percent Attacks, and How to Prevent Them:
I’ve received several good questions about what happens when an attacker controls 3/5 of the nodes in a Pentagon. I’ll attempt to answer those questions in this section.
First, it’s crucial to remember what an attacker stands to gain in the early rounds of Pentagonal Exchange - namely, a better chance for one of his nodes to be selected to move on to the next round. This limits the scope of his attack significantly. He can’t attack when he controls one or two nodes of a Pentagon, and he won’t attack when he controls three (all three of his nodes would lose their deposits and be unable to move on). When an attacker controls all five nodes, he has nothing to gain by attacking - one of his five will move on and four won’t, no matter what he does. So, the only attacks will take place when an attacker controls four fifths of a pentagon.
We should also note that when an attacker controls 4/5 of a pentagon, he only needs to attack 20% of the time (the other 80% of the time, he is selected honestly). Since attacking requires an adversary to reveal at least three of the nodes under his control, and he wouldn’t have attacked if he controlled all five, we know that exactly one of the two remaining nodes is honest and one is dishonest. For now, let’s assume that we have some way of randomly choosing between the two remaining nodes (more on this later). We have a fifty-fifty chance of choosing the honest node which means means that our adversary has essentially raised his total odds of moving on to the next round from 80% to 90%. However, this 10% increase in chance of success costs him three security deposits in the 20% (1/5) of rounds in which he needs to cheat.
So, his cost of attacking is 3/5 of a security deposit per 10% increase in chance of winning. To compare this to his cost per chance of buying tickets, let’s multiply both sides by ten. At this rate, our attacker is paying 30/5 (or 6) security deposits per additional 100% chance of winning. However, his ticket cost of a guaranteed win is only 5. So, as long as security deposits are at least 5/6 of the ticket price, our adversary really hasn’t gained anything. For extra security, let’s set them at the full price of a ticket.
Now that the early rounds are secure, how do we go about securing the final round? Well, as we discussed earlier, later rounds are composed of a probabilistically representative group of participants when the early rounds are secure. This means that in order for someone to control a 3/5 majority of the final round they (probably) need to buy about 60% of the tickets in our lottery.
But if they an attacker buys 60% of the tickets, he also has to pay 60% of the total security deposits. If we go with the one-to-one ticket price to security deposit ratio, we calculate that this attackers initial input to the lottery was 120% of the jackpot. This is worthwhile to the adversary because he expects to get all of his security deposits back, but we can use it to our advantage.
Since his 3/5 majority is only enough to cancel and not rig the final round of our drawing (recall that Pentagonal Exchange requires him to commit to secret numbers before he can predict who will win the drawing), our adversary has to decide what to do in the 40% of drawings that he isn’t going to win. He doesn’t want someone else to win, so he’ll be tempted to refuse to publish his secret numbers. If he does, the network is stuck several inputs short of being able to generate a random number.
How do we prevent this from happening? The good news is, we don’t have to. As long as we hold on to everyone’s security deposit until the last round comes through, our adversary will never have an incentive to attack. Recall that our adversary has bought 60% of the tickets and paid 60% of the security deposits. He’s paid more than the jackpot (and more than everyone else combined) into the system. That means that if he refuses to submit his secret numbers, he’s lost more money than the entire rest of the network has, combined. More importantly, he hasn’t changed the outcome of the drawing. If we go back to our lottery example from earlier, our adversary will have burned twelve billion dollars, and have nothing to show for it. This makes his rational response painfully clear. He has to publish his secret numbers and salvage his security deposits, and our system remains secure.
Now, there’s just one more point to touch on - if three nodes in an early pentagon fail to submit, how do we choose between the two that are left? There isn’t a perfect answer to this question, but I think that in practice, using some permutation of the next block of the ETH blockchain would turn out to be close enough to random for our purposes. Yes, a dishonest miner could theoretically withhold a valid block to give his node a better chance at being selected, but he has to burn three security deposits just to get to a point where he can mine dishonestly. Even then, he can only increase his chance of being selected from fifty-fifty to 75%. So, in practice, sacrificing the ETH block reward probably won’t be worthwhile. However, this is certainly an area where there’s room for improvement — it’s likely that some provably perfect solution exists, I just haven’t found it yet.
Author’s Note: This entire paper, and especially the final section, represents an idea in its earliest stages — please feel free to contact me with technical critiques and don’t hesitate to point out any errors in logic.