GRANDPA Questions

Hello,

While studying the GRANDPA protocol, I have come across a few questions that I could not satisfactorily answer for myself and would appreciate some clarification on:

  1. Finalisation: Why is it explicitly stated in section 3.1 that B = g(C_{r,v}) must have a supermajority in V_{r,v}? Does a supermajority of precommits not imply a supermajority of prevotes for B? After all, any honest voter precommits on the prevote-ghost he sees and even commit messages, which offer an additional finalisation condition (see also the next question), only contain precommits as well.

  2. Finalisation: Section 3.1 mentions the optional sending of commit messages, defining the receipt of a (valid) commit message as an alternative finalisation condition. Since commit messages seem to be optional for finalisation in the core protocol (they seem to be primarily used in the accountable safety protocol, i.e. Theorem 4.1, and possibly in the context of voter set changes), why have this additional finalisation condition? Is this a case of “why not” if commit messages are sent anyway (e.g. for the accountable safety protocol) or are there other reasons?

  3. Primary Block “Hint”: At the start of a round, the primary broadcasts the last round estimate E_{r-1,v}, followed by sending its prevote for the best chain containing E_{r-1,v} after t_{r,v} + 2T. Does the primary really need to wait 2T before sending its prevote and if so, why? Relatedly, do I understand correctly that the broadcasting of E_{r-1,v} and the prevote by the primary are intentionally separate steps so that the primary can prevote for a block B > E_{r-1,v}?

  4. Primary Block “Hint”: Related to the previous point, in the Polkadot Runtime Environment Specification section 5.2.4 algorithm 5.9 line 4, the primary supposedly even broadcasts a finalisation / commit message at the beginning of the round for the last round estimate, followed by a condition in line 7 for receiving a prevote message from the primary. That seems like a mismatch, but assuming line 4 is supposed to refer to a prevote message, i.e. M^{r-1,pv}_v, then this is still a deviation from the GRANDPA paper, since the primary then always prevotes for the previous round estimate, i.e. the block “hint” and the prevote from the primary are not separated. Is that intentional?

  5. Optimised GRANDPA: Why does the algorithm wait for a child of the last finalised block at the end in step 3, delaying the precommits, instead of waiting for such a child to appear at the beginning of a round, i.e. before step 1? Or put differently, why allow prevoting on a finalised block?

  6. Finalisation & Optimised GRANDPA: Why does the “regular” GRANDPA protocol restrict finalisation for a voter v to rounds where v completed the precommit step (even for received commit messages), if that restriction is then lifted in the “optimised” GRANDPA protocol in order to allow a voter to catch up to the current round after observing two rounds of voting (which requires finalising blocks that v did not vote on)? What kind of bad things can happen if something is finalised by a voter v based on a supermajority of precommits observed in round r if v did not actually reach the precommit step of round r (yet)?

  7. Equivocation: The GRANDPA paper just says

    A voter equivocates in a set of votes S if they have more than one vote in S

where presumably S is either a set of prevotes or precommits in a particular round. On the other hand, the Polkadot Runtime Environment Specification Definition 5.17 is more specific in that it says

    Voter v equivocates if they broadcast two or more valid votes to blocks not residing on the same branch of the block tree during one voting sub-round.

Which one is the preferable definition, and if it is the latter, presumably the vote by the same voter for the block with the higher block number on the same chain is supposed to be the one that takes precedence when calculating vote weights?

  8. Finalisation: Do I understand correctly that, since finalisation of the precommit-ghost in a round r does not depend on r being completable, the finalisation condition may be satisfied multiple times for different precommit-ghosts in the same round (since as long as the round is not completable, the precommit-ghost, and hence the estimate, may still change to a later block)?

Thanks a lot for your help!

Hi Romanb. There are some subtleties here and I hope that these answers make sense.

  1. Yes, under the 2/3 honest assumption, this is redundant. At the moment, participants in the protocol as outlined here ignore finality. But this is not true in the implementation: in particular, it uses the latest finalised block as a base for computing g(). I think we would run into problems with accountable safety if we didn’t have this condition but did that. An honest validator should not be slashed by the accountable safety protocol, even if there are more than 1/3 or 2/3 dishonest validators, under the assumption that we can run the challenge protocol in a censorship-resistant way. This is related to the answer to Q6, so I’ll answer that next.

  6. So suppose that there are two blocks B and B’ that are children of the same parent. B is our estimate of the last round, r. But suppose that we see that B’ was finalised in a later round r’. If we have 2/3 honest validators, then seeing this means that we don’t need to vote on a chain including B, because the safety proof then means that we didn’t finalise B, as we later finalise B’. But if we do not have 2/3 honest validators and we can justify not voting for B’ because we saw that B was finalised later, then it is possible for 2/3 dishonest validators to create a “closed causal loop”, where they justify not voting for B because B’ was finalised later, and then use that to justify the sequence of votes that eventually finalised B’. In this way, they could finalise B in round r and B’ in round r’ and get away with it.

So we cannot let the fact that something is finalised in a later round affect our voting in an earlier round, and the way the implementation works would do that. But it would be fine if we didn’t actually vote in rounds between r and r’, which is what the skipping-ahead procedure in optimized GRANDPA achieves. If at least 2/3 of validators vote in a round, those who do vote using the estimate from the last round, and the last round is completable, then accountable safety works.

  2. It was more a question of why not. The commit message actually contains precommits, possibly with aggregated signatures, and acting on those gives the same result anyway. This section predates the “catch-up messages” in polite/optimized GRANDPA, and I just wanted to make explicit that this should happen even if the initial gossip didn’t reach us.

  3. No, the primary doesn’t need to wait, and nor does anyone else if the primary’s broadcast makes sense. But they may have to wait up to 2T even after the primary’s broadcast if their own estimate is later, because hopefully they would see the precommits that allowed the primary to move back its estimate and do that as well. Hopefully the later optimized GRANDPA will handle this more sensibly.

  4. That’s a mistake. It is not the primary’s prevote that should be here, but the broadcast from line 4.

  5. If we ran GRANDPA rounds continuously, the best time to receive a new block, in terms of time to finalise it, is clearly just before the prevote. However, the worst time is half-way through a prevote, because then we don’t get enough votes for anyone to precommit to it in this round and have to wait a whole extra round. The reason not to wait in the prevote step is that other validators may already have prevoted, and then this is the worst time to stop, rather than the best.

As long as the BFT properties we have hold, it’s probably not worth the risk of running a hacked client to delay finality by 2 seconds. So the validators are probably all honest, but we also assume that 2/3 of validators are synced and have good networking. So the up to 1/3 of Byzantine guys we need to tolerate are probably those who are honest with bad networking. And these guys definitely might be voting a block behind other people.

Suppose right now we have a block B, with a parent A, that most people see as finalised, and they wait for a child C before prevoting rather than vote for B. So we have 2/3 precommits in the last round for B. But quite possibly we’ll have had ones for A too. And someone who saw 2/3 of precommits, but not just for B, can see the last round as completable and B as the estimate, but not finalise B. If they still don’t receive enough precommits for B in another 2T, they won’t wait for C, as they don’t see B as final when they vote. By Murphy’s law, this will happen most blocks. Now, by the time 2/3 of good-network people have received C and prevoted, many validators may already be at the precommit step and have precommitted for B, as they saw some prevotes for B and some for C. Now it takes two rounds to finalise C.

I’m not sure that putting the waiting in the middle of the round is the right solution, but it should reduce the worst-case time before the new block is finalised. It also means that validators still start rounds within time T.

  7. The first one is correct right now (a minimal sketch of that reading follows after this list). We were thinking of extending votes to include multiple recent block hashes, so we could make sense of them somewhat even if we haven’t seen the latest blocks, but how to deal with impossibility then has never been specced.

  8. This is definitely true. If we need to finalise many blocks in one round, we can update the precommit-ghost many times. And we will frequently be doing that when we have already started the next round.
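To pin down the first definition: under that reading, any two distinct votes by the same voter in one voting sub-round count as an equivocation, regardless of how the voted blocks relate in the block tree. Here is a minimal sketch of that check, with purely illustrative types and names (this is not the actual implementation):

```rust
use std::collections::HashMap;

type Voter = &'static str;
type BlockHash = u64; // stand-in for a real block hash

/// Voters with more than one distinct vote in `votes` (one sub-round's
/// vote set S), per the paper's definition of equivocation.
fn equivocators(votes: &[(Voter, BlockHash)]) -> Vec<Voter> {
    let mut first_vote: HashMap<Voter, BlockHash> = HashMap::new();
    let mut eq: Vec<Voter> = Vec::new();
    for &(voter, block) in votes {
        match first_vote.get(voter) {
            // A second, different vote by the same voter equivocates,
            // even if both blocks lie on the same branch.
            Some(&b) if b != block && !eq.contains(&voter) => eq.push(voter),
            None => {
                first_vote.insert(voter, block);
            }
            _ => {}
        }
    }
    eq
}

fn main() {
    // "carol" votes for two different blocks in the same sub-round: an
    // equivocation no matter how blocks 2 and 3 relate in the block tree.
    let votes = [("alice", 1), ("bob", 2), ("carol", 2), ("carol", 3)];
    assert_eq!(equivocators(&votes), vec!["carol"]);
}
```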

Hi Alistair,

Thanks a lot for your reply. I had to digest the details of your answer for a while, trying to fill in any remaining gaps in my understanding myself, but I’m afraid there is still some confusion left on my part on some of the subtler points, namely 1, 5 and 6, on which I would greatly appreciate further clarification. See my comments below.

ad 1)

I understand that an honest validator should not be made punishable by dishonest validators, i.e. it needs to be able to validly respond to the questions arising in the accountable safety protocol, but I don’t currently see the connection of question 1 to how g is computed. However, I realised that even with all honest validators, it may well happen that a validator v sees a supermajority of precommits for a block before seeing a supermajority of prevotes for that same block.

For example, with 4 validators and blocks A < B, validator v may prevote for A while the three others prevote for B. Now v may see a supermajority of prevotes (i.e. 3) for A before seeing one for B and thus precommit on A, whilst all others may see a supermajority for B and precommit on B. Then v may see the supermajority of precommits for B before seeing the supermajority of prevotes for B, if one of the prevotes has sufficient delay. Without the requirement for a supermajority in V_{r,v}, validator v may then finalise B without (yet) having seen a supermajority of prevotes for B, and since the accountable safety protocol may ask v to present such a supermajority as justification before it eventually sees the last prevote for B, it may not be able to respond. If this scenario is valid, it certainly seems prudent to explicitly require a supermajority in V_{r,v} for any finalised block. Does that seem right?
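To make the counting in this scenario concrete, here is a minimal sketch, assuming a single chain with A < B so that a vote for B also counts towards A; the types, helper names and event ordering are illustrative assumptions, not taken from any implementation:

```rust
// The 4-validator race described above, from v's point of view.
#[derive(Clone, Copy, PartialEq)]
enum Block { A, B }

impl Block {
    // On a single chain with A < B, a vote for `self` supports `target`
    // iff `self` is `target` itself or a descendant of it.
    fn supports(self, target: Block) -> bool {
        self == target || (self == Block::B && target == Block::A)
    }
}

// Supermajority threshold: at least ceil((n + f + 1) / 2) voters.
fn threshold(n: usize, f: usize) -> usize {
    (n + f + 2) / 2
}

// Weight of the votes in favour of `target` (votes for blocks >= target).
fn weight(votes: &[Block], target: Block) -> usize {
    votes.iter().filter(|v| v.supports(target)).count()
}

fn main() {
    let (n, f) = (4, 1);
    let t = threshold(n, f); // 3 out of 4

    // What v has seen once all precommits have arrived: its own prevote
    // for A, only two of the three prevotes for B (one is delayed), and
    // all four precommits (its own for A, the other three for B).
    let prevotes_seen = [Block::A, Block::B, Block::B];
    let precommits_seen = [Block::A, Block::B, Block::B, Block::B];

    // v sees a supermajority of precommits for B...
    assert!(weight(&precommits_seen, Block::B) >= t);
    // ...but not (yet) a supermajority of prevotes for B, so without the
    // explicit condition on V_{r,v} it would finalise B with no prevote
    // justification at hand.
    assert!(weight(&prevotes_seen, Block::B) < t);
}
```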

Of course, I would still like to understand what you have in mind w.r.t. how g is computed and how that presents a problem without the requirement for a supermajority in V_{r,v}, but if the above reasoning is correct, at least I’ve already found another justification to convince myself of the necessity of this condition.

ad 6)

These may not necessarily be children but any two descendants on different branches of some common ancestor, right?

Here you are already taking for granted that g is computed based on the last finalised block, which you mentioned in the context of question 1, is that right? Because going just by the paper, the votes of honest validators in rounds between r and r’ would be unaffected by seeing B’ as finalised, i.e. they would always be for the head of the best chain including the last round estimate (B in the case of round r + 1).

At this point I am lost, because how can we see both that B’ is finalised later (from the previous sentence) and now that B is finalised later? It is not always clear to me which validator(s) (or node(s) in general) you are referring to with “we” and “they”, and further disambiguation would be great.

I think I understand what you’re trying to show, namely that in a situation with 2/3 dishonest validators, if the votes of honest validators in earlier rounds are influenced by what has already been finalised in later rounds, it can lead to different honest validators finalising blocks on different chains (which is generally possible with f > 1/3) without it being possible to identify and punish at least 1/3 of the dishonest validators (not OK). Is that what this example (which I have trouble understanding) is meant to demonstrate?

In particular, at which point exactly in the example does seeing B’ as final result in undesirable votes by some honest validator(s), which in turn allow the dishonest validators to go unnoticed in the accountable safety protocol?

ad 5)

Thanks for this example. I think I understand it, but to be clear: when you write “Now it takes two rounds to finalise C”, you mean “to finalise it across all validators”? Because C may very well get a supermajority of precommits in this round already (which are also precommits for B, since C > B) and may thus be finalised by 2/3 or more of the validators in this round. I don’t see the advantage of having the 2/3 only wait at the precommit step: sure enough, they have then already prevoted for B, but the other 1/3 are still waiting on at least 1/3 of precommits for B in order to be able to finalise it, and the other 2/3 are waiting for C before sending these. And again, despite seeing C already in that round, the 2/3 can only prevote for it in the next round, as they already prevoted for B in the current round. So the time it takes for the 1/3 to finalise B seems to be the same, regardless of whether the 2/3:

a) wait for C > B, then prevote and precommit, at which point the 1/3 who already precommitted for B can finalise it.

or

b) prevote for B, then wait for C > B, then precommit for B, at which point the 1/3 who already precommitted for B can finalise it.

The difference, though, is that with b) no validator gets the opportunity to vote for and potentially finalise block C > B in that round already. What am I missing?

You mean that all honest validators still start round r+1 within time T of the first such validator seeing round r as completable? Why is that desirable?

Thanks again for your help and patience with the clarification of these details.

I ran into another question that I would like to add to the list here:

9) In the preliminaries section of the paper, the notion of possibility for a supermajority is defined as follows:

We say that it is impossible for a set S to have a supermajority for B if at least (n + f + 1)/2 voters either vote for a block \not \geq B or equivocate in S. Otherwise it is possible for S to have a supermajority for B.

Suppose n = 5, thus f = 1 and threshold = \lceil (n + f + 1)/2 \rceil = 4. Now consider two different blocks A and B, where either A < B or A \not \sim B, with A having 3 votes in S, B having 2 votes in S and no equivocations, so |S| = 5. Then according to the definition it is possible for both blocks to have a supermajority in S, even though it is evidently impossible for B, since there can be at most f = 1 equivocation, bringing the votes for B up to at most 3 < threshold. This also seems to provide counterexamples to the claim in the next sentence, which reads:

Note that if S is tolerant, it is possible for S to have a supermajority for B if and only if there is a tolerant T \supseteq S that has a supermajority for B, […]

In the scenario given above, any tolerant superset of S can add at most one equivocatory vote, yielding at best 2 + 1 = 3 < 4 votes for B, i.e. not a supermajority.

Approaching the question of possibility for supermajority from the other direction, i.e. asking “Can B still reach supermajority?”, leads me to the following definition: Let V_S(B) denote the (non-equivocatory) set of votes for B in S and \mathbb{V}_S the set of voters in S, then it is possible for B to have supermajority in S iff

(\star) \qquad |V_S(B)| + (n - |\mathbb{V}_S|) + \min\{f, |\mathbb{V}_S| - |V_S(B)|\} \geq threshold

i.e. “(non-equivocatory) votes for B” + “missing votes” + “possible equivocations from voters \not \geq B” \geq threshold. With this definition, in the aforementioned example it is judged impossible for B to have a supermajority, since 2 + 0 + 1 = 3 \not \geq 4 = threshold. An equivalent calculation to (\star) also seems to be what is actually implemented. Since these two definitions are clearly not equivalent, that leads me to believe that either the paper or the implementation is incorrect. If it is the latter, I would really like to understand why.
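For concreteness, here is a small sketch evaluating both tests on the n = 5 example above, taking A and B to lie on different branches so that no single vote supports both; the names are purely illustrative and the vote set contains no equivocations:

```rust
#[derive(Clone, Copy, PartialEq)]
enum Block { A, B }

// A and B sit on different branches, so a vote is >= a target block
// only if it is for that exact block.
fn geq(vote: Block, target: Block) -> bool {
    vote == target
}

// Supermajority threshold: at least ceil((n + f + 1) / 2) voters.
fn threshold(n: usize, f: usize) -> usize {
    (n + f + 2) / 2
}

// Paper: it is impossible for S to have a supermajority for `target` if
// at least `threshold` voters vote for a block not >= `target` or
// equivocate in S (no equivocations occur in this example).
fn possible_paper(votes: &[Block], target: Block, n: usize, f: usize) -> bool {
    let against = votes.iter().filter(|&&v| !geq(v, target)).count();
    against < threshold(n, f)
}

// (star): votes for `target` + voters yet to vote + possible equivocations
// from voters currently voting for blocks not >= `target`.
fn possible_star(votes: &[Block], target: Block, n: usize, f: usize) -> bool {
    let for_b = votes.iter().filter(|&&v| geq(v, target)).count();
    let voters = votes.len(); // one non-equivocatory vote per voter here
    for_b + (n - voters) + f.min(voters - for_b) >= threshold(n, f)
}

fn main() {
    let (n, f) = (5, 1); // threshold = 4
    let votes = [Block::A, Block::A, Block::A, Block::B, Block::B];

    // Paper: only 3 voters vote for blocks not >= B, so B is "possible".
    assert!(possible_paper(&votes, Block::B, n, f));
    // (star): 2 + 0 + min(1, 3) = 3 < 4, so B is judged impossible.
    assert!(!possible_star(&votes, Block::B, n, f));
}
```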

Thanks for your help!