S-Kademlia and multipaths

This post assumes you know what Kademlia is, and roughly how its distance metric and routing algorithms work.

Actually, there are many possible routing algorithms based on any given distance metric, and while we know how to analyse various properties of a routing algorithm, the field has not yet devised a single objective way to compare two routing algorithms, nor even what it means to be “the best” routing algorithm. Most of the reasonable ones all follow a similar pattern though, roughly:

  1. each node maintains a routing table, which contains ~k “close” entries, ~k “further” entries, ~k “even further” entries etc, for super-linearly expanding notions of “further away”
  2. routing is done by choosing the neighbour in your routing table that is closest to the target, and then iterating to their neighbour, etc etc.

The devil is in the details of course, and one major thing is that in practise you want to do this in parallel - but how parallel.

The original paper defines a parallel routing algorithm roughly as follows:

  1. query the closest d neighbours
  2. out of the combined set of their results, select the closest next d
  3. loop until you’ve reached your target, or run out of patience

This is an insecure parallel routing algorithm - if even only 1 of the neighbours is malicious, this can trivially be attacked by giving d results that are fake nodes “very close” to the victim’s target, but are in fact all controlled by the attacker. From this point onwards the attacker can lead the victim on a wild goose chase, repeatedly giving him bad results that are again controlled by the attacker. This is an example of the general class of attacks called the “eclipse attack”.

S-Kademlia tries to protect against this, to quote: “The initiator starts a lookup by taking the k closest nodes to the destination key from his local routing table and distributes them into d independent lookup buckets. From there on the node continues with d parallel lookups similar to the traditional Kademlia lookup. The lookups are independent, except the important fact, that each node is only used once during the whole lookup process to ensure that the resulting paths are really disjoint.”

The meaning is hard to decode; worded slightly differently this means:

  1. keep track of d separate lookups for a given target lookup. each lookup is expect to hop through a bunch of different peers.
  2. when the results of the i-th lookup are returned, select the next hop for this i-th lookup out of these results, but ignore any nodes you’ve already previously queried in any of the d lookups

(To make this work and to improve redundancy, S-Kademlia also has the closest d neighbours to a given target store the value for that target, instead of just the single closest node storing it. This is not very relevant for the discussion below so we’ll ignore it from here on and just focus on the actual routing part, before we reach the destination.)

The view then, is that we have d lookups, where each lookup hops through a different sequence of peers to reach one of the target storage nodes. The S-Kademlia paper calls these paths “disjoint paths” and asserts that being disjoint is a security property, leading some other internet commenters to believe that losing disjointness would be to lose this security protection. However this view is not accurate - it purposefully discards a lot of information, omits handling some corner cases that degrade its performance in a real network, and gives an inaccurate picture of security. Below, I’ll first describe why this view is inaccurate, then describe a more complete viewpoint.

Suppose d=3, k=3, and we issue a query for target 100 to our neighbours {1, 2, 3}. They reply, in order:

  1. node 1 replies {4,5,6}
  2. node 2 replies {4,5,6}
  3. node 3 replies {1,5,6} – note that this is perfectly legitimate in the general case since node 3 can’t be expected to know that we also have node 1 as a neighbour

According to S-Kademlia, one possible choice of paths would be {1, 6} and {2, 5}. Once you have made these choices, there is nothing further to do when you get a reply back from node 3 because all its results have already been used up.

However, this effectively reduces your query from this point onwards to be 2 lookups instead of 3, which is what we specifically wanted to avoid! By the problem assumptions, we have the resources to execute 3 lookups in parallel, but we end up under-utilising this due to too-restrictive rules on S-Kademlia’s part.

Let’s take a step back however. Why did we have to choose {1, 6} and {2, 5}? If we had chosen something else, namely {1, 4} and {2, 5}, we could have then chosen {3, 6} and this would still be within S-Kademlia’s own rules.

The problem arises from arbitrarily selecting one of a set of results, and then discarding the rest too early in a way that significantly affects the remainder of the algorithm. In other words, S-Kademlia’s rules as stated are not symmetric - they give significantly different outcomes (2 lookups vs 3 lookups) depending on the internal random choices we ourselves make, regardless of what the outside world tells us.

As a general principle in designing algorithms, the results of arbitrary decisions should be indistinguishable from each other (in the long run), otherwise the design of the algorithm is questionable. In the flawed S-Kademlia algorithm, the decisions are easily distinguishable (d vs d-1 parallel lookups), but in the version below I give this invariant is preserved.

Objectively speaking, there is no such thing as “disjoint paths” - they are an artifical construct made up for convenience to help explain the S-Kademlia algorithm, but is not the whole picture of what actually happens in reality. From the example above, since we chose 6 our query will use that within one of its paths, but we “arrived at” 6 via all three nodes {1, 2, 3}, there is no reason to pretend that we only used {1, 6, …} as a “disjoint path”. The actual paths we used are already overlapping. If we want to, we could select some paths out of the ones we used, and these would be disjoint when ignoring the paths we did not select. However this is a simple logical consequence of another property we describe below, that is more directly the contributor to security, and that we consider a more natural way of thinking about the problem that does not involve selecting some paths and ignoring others.

First we start with a fuller picture. Between adjacent hops, what you actually have is a bipartite graph of hop-n query-peers to hop-n result-peers. More precisely: suppose for hop n, you have d nodes to query, and each of them gives you back up to k results. Then, you could have anywhere between 0 and d * k results, since different nodes could give you the same result. Call the query nodes Q[n] with Q[n].size = d and the result nodes R[n] with R[n].size <= d * k, then Q[n] and R[n] forms a bipartite graph with outdegree(v) = k for every v in Q[n]. To choose the query nodes for the next hop Q[n+1], we want to select up to d nodes out of R[n]. Instead of disjointness, we express our desired security property as follows:

  1. As long as R[n].size >= d, we must choose Q[n+1] such that its size = d.

    Note: if R[n].size < d then we are forced to select Q[n+1] = R[n] with size < d, since we simply don’t have enough information to continue. This could happen with a very bad or very small underlying network for example.

  2. We must also choose Q[n+1] such that the union of the sets { predecessor(v)
    | v in Q[n+1] } is equal to Q[n].

    In other words, all of the hop-n query peers “have vouched for” at least one of the hop-(n+1) query-peers, or equivalently the hop-(n+1) peers are not “vouched for by only a subset of the hop-n peers”. This latter constraint is a more complete description of the reason on why security is improved. By the pidgeonhole principle, if you do this it is always possible to select d disjoint pairs (q, r) out of your hop-n and hop-(n+1) results, such that q is one of the nodes that r came from. So it still preserves the “disjoint” property that S-Kademlia talks about. But note this is an arbitrary choice; the whole picture is the bipartite graph of our queries vs our results.

If (1) or (2) is broken, then this means our results after hop n has been given to us by fewer than d nodes, and {any other nodes that we might have previously talked to} do not have any representation in these results. Note that even a drop from d to (d-1) is bad as this affects us for all remaining hops, where the number could drop even further. We want to keep this number at d as far as we can. This is “morally speaking” what we and S-Kademlia want to avoid - for an incomplete fraction of any query set, to control the overall query.

So using the above simple example, you could do something like this:

  1. node 1 replies {4,5,6}
    a. remember that: node 4 came from 1, node 5 came from 1, node 6 came from 1
    b. arbitrarily choose any result that comes from 1, e.g. 6
  2. node 2 replies {4,5,6}
    a. remember that: node 4 came from [1,2], node 5 came from [1,2], node 6 came from [1,2]
    b. at least one previously-chosen result {6} also came from 2, so
    c. arbitrarily choose any other result that was not already chosen, e.g. 5
  3. node 3 replies {1,5,6}
    a. filter this down to {5,6} as 1 was already queried in previous hops
    b. remember that: node 4 came from [1,2], node 5 came from [1,2,3], node 6 came from [1,2,3]
    c. at least one previously-chosen result {5, 6} also came from 3, so
    d. arbitrarily choose any other result that was not already chosen, e.g. 4

Note: In (2c) (resp. (3c)) if all previously-chosen results did not come from node 2 (resp. node 3), then in the next step 2d (resp. (3d)) we would instead arbitrarily choose one of the results directly from the reply. In fact this logic is done in step (1b) but we note it explicitly here for clarity, because in (1b) there are no previously-chosen results so the condition is vacuously true.

Note: For a more complete version of this algorithm that gracefully deals with the case when R[n].size < d, see here.

To compare this with S-Kademlia, instead of choosing one result (and sending a next-hop query to it) and discarding the rest, we choose one but keep the rest for later reference. The intuition is that we might “backtrack” our choice, but since we will have already queried that node it would be inefficient to “backtrack” the query itself (by cancelling it or ignoring its result), the more efficient thing would be to locally “move” this query into a different disjoint path. However we suggest that even this way of looking at the picture is unnecessarily complex and convoluted, one should not implement the algorithm by literally reassigning queries to different lookup paths, and the more natural viewpoint and implementation is as we described previously.

Extended across a whole sequence of 0…n hops, I find it suitable to name the whole collection of bipartite graphs as “fat multipaths” - in this way you are acknowledging all of the individual paths that exist, that form the whole parallel lookup query.


Note that this principle of a fat multipath is applicable not just to Kademlia DHT routing, but in fact to all parallel routing, where you want to retain query diversity, and don’t want to let the end result be fully-controlled by an incomplete fraction of any part of the query set.

Further research to adapt this to other routing schemes would be good follow-up work. The smallest non-trivial instance of this general problem would be:

Suppose you have two query nodes, P, Q, and you make the same query to both of them:

  1. P gives you {X, A}
  2. then Q gives you {X, B}

In step (1) you have no information, so you need to arbitrarily choose one of them. Suppose you chose X, then in step (2) you are forced to choose B. So there is nothing to “optimise” here. However suppose in step (1) you chose A, then what is better in step (2) - to choose X or B? The difference is that P also gave you X previously, but B is a completely-new result. Is it better to choose the former, or the latter?

To compare, in our skad algorithm this would be the difference between choosing prioritise_unique_results is True or False.

1 Like

Thanks for the detailed write-up @infinity0!

Following up on the scenario above where there are no unused results within node-3’s result set, thus reducing d from 3 to 2. The S-Kademlia paper says:

We also see that with k = 16 there is enough redundancy in the k-buckets to actually create disjoint paths.

Thereby one could rely on the fact that this is a reasonably rare case to hit, and when hit, prioritize unique results, but fall back to any, in order to not lower d for all future rounds, correct? If I am not mistaken this is also the scenario in your first test case within your sample algorithm.

In a situation with no attackers, this would be reasonably rare for most of the path, but may well be reasonably common the closer we get to the target, since peers closer to the target have a higher chance that this part of their k-buckets overlap.

In a situation with attackers, if you accidentally select a malicious hop-2 node via an honest hop-1 node, S-Kademlia will fail to recover whereas the tweak I suggested will recover by selecting another hop-2 node out of the hop-1 results.

“Prioritising unique results” is a separate issue, it’s just a parameter that tweaks the way in which you avoid reducing d. The fix I suggested will keep d at its intended value even if you set this to false. Right now I’m unsure which value {true,false} is better. I couldn’t think of any obvious way to break the symmetry, so decided to leave this question open for the future.