This post assumes you know what Kademlia is, and roughly how its distance metric and routing algorithms work. It is a much more developed version of this post, with a newer and better algorithm, and more detailed explanations both high- and low-level. The old post may be considered obsolete.
Actually, there are many possible routing algorithms based on any given distance metric, and while we know how to analyse various properties of a routing algorithm, the field has not yet devised a single objective way to compare two routing algorithms, nor even defined what it means to be “the best” routing algorithm. Most of the reasonable ones follow a similar pattern though, roughly:
- each node maintains a routing table, which contains ~k “close” entries, ~k “further” entries, ~k “even further” entries, etc., for superlinearly expanding notions of “further away”
- routing is done by choosing the neighbour in your routing table that is closest to the target, then iterating to their neighbour, and so on.
The devil is in the details of course, and one major point is that in practice you want to do this in parallel; but how, precisely? The original paper defines a parallel routing algorithm roughly as follows:
- query the closest d neighbours
- out of the combined set of their results, select the closest next d
- loop until you’ve reached your target(s), or run out of patience
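The loop above can be sketched in Python, with a hypothetical `network` dict standing in for real RPC calls and plain integers for node IDs (all names here are ours, purely illustrative):

```python
def naive_lookup(network, start_peers, target, d=3, max_rounds=20):
    """Original Kademlia-style parallel lookup. `network` maps a node id to
    the list of node ids that node would return when queried."""
    dist = lambda n: n ^ target           # XOR distance to the lookup target
    seen = set(start_peers)
    frontier = sorted(start_peers, key=dist)[:d]
    for _ in range(max_rounds):
        # query the closest d peers and pool everyone's results together,
        # losing track of who reported what (this is the insecure part)
        seen |= {r for p in frontier for r in network.get(p, [])}
        # out of everything seen so far, select the closest d for next round
        new_frontier = sorted(seen, key=dist)[:d]
        if new_frontier == frontier:      # no progress: converged
            return frontier
        frontier = new_frontier
    return frontier
```

Pooling everything into one `seen` set discards the provenance of each result, which is exactly what the attack described next exploits.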
This is an insecure parallel routing algorithm: if even one of the neighbours is malicious, it can trivially attack the victim by giving d results that are fake nodes “very close” to the victim’s target, but are in fact all controlled by the attacker. From this point onwards the attacker can lead the victim on a wild goose chase, repeatedly giving them bad results that are again controlled by the attacker. This is an example of the general class of attacks called “eclipse attacks”.
S/Kademlia improvements
The attack described above comes from the fact that (A) we combine everyone’s results together, losing information about where each result came from, and then filter the combination using (B) a criterion that is easily gamed, i.e. XOR distance of the node ID.
(B) is partly due to the context of operating in a decentralised network, where new IDs can be generated freely. Securing this aspect of the system is a hard topic we won’t go into here, and it is best done separately from the routing algorithm. We do note that S/Kademlia also proposes an improvement in this area, but it is actually insecure and does not even improve the situation over regular Kademlia.
(A) is what we’ll focus on. S/Kademlia proposes a reasonable improvement, but as we’ll explain, this improvement still has some bad corner cases, and we take it to its logical conclusion with a fully-general version that covers all of them.
S/Kademlia makes two main modifications to the routing algorithm. First and foremost, to the lookup traversal:
To quote: “The initiator starts a lookup by taking the k closest nodes to the destination key from his local routing table and distributes them into d independent lookup buckets. From there on the node continues with d parallel lookups similar to the traditional Kademlia lookup. The lookups are independent, except the important fact, that each node is only used once during the whole lookup process to ensure that the resulting paths are really disjoint.”
In other words:
- Keep track of d separate lookups for a given target. As before, each lookup will hop through a number of different peers.
- When the results of a lookup are returned, select the next hop for this lookup out of these results, but ignore any nodes you’ve already queried in any of the other lookups.
In this way, all the lookups query entirely disjoint sets of peers, and therefore a malicious peer in one lookup cannot take over another lookup, since the other lookups will ignore that peer.
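This disjoint-lookup rule can be sketched as follows, again with a hypothetical `network` dict standing in for RPC calls (function and variable names are our own, not from the S/Kademlia paper):

```python
def disjoint_lookup(network, start_peers, target, d=3, max_rounds=20):
    """S/Kademlia-style d disjoint lookups; `network` maps a node id to the
    node ids it would return. A node used by one lookup is never reused."""
    dist = lambda n: n ^ target
    paths = [[p] for p in sorted(start_peers, key=dist)[:d]]
    used = {p for path in paths for p in path}   # shared across all lookups
    for _ in range(max_rounds):
        progressed = False
        for path in paths:
            results = network.get(path[-1], [])
            # ignore nodes any lookup has used, and only ever move closer
            options = [r for r in results
                       if r not in used and dist(r) < dist(path[-1])]
            if options:
                nxt = min(options, key=dist)
                path.append(nxt)
                used.add(nxt)
                progressed = True
        if not progressed:
            break
    return [path[-1] for path in paths]          # the d path termini
```

Note the greedy `min(options, key=dist)` choice: as discussed later, this is precisely where corner cases creep in.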
Secondly, to the end of the algorithm, the result collection part:
To quote: “the lookup doesn’t converge at a single node but terminates on d close-by neighbors, which all know the complete s siblings for the destination key. Hence a lookup is still successful even if k - 1 of the neighbors are adversarial.”
Explanation: s here refers to the storage replication factor, which S/Kademlia separates from k, the routing replication factor. In original Kademlia, s = k, and the routing algorithm returns the k closest peers observed during the routing. In S/Kademlia, the routing algorithm instead finishes with d peers, one from every disjoint path, and each “final peer” gives s results. The union of all of these results is then returned as the result of the overall lookup. In other words, the caller gets between s and d*s results, all of which must be used for any level of security.
After trying to implement these algorithms, which forced us to think through many specific cases in detail, including randomly generated test cases, we find that S/Kademlia’s specific algorithms contain corner cases that degrade its performance in a real network. Furthermore, the way it constructs its final result set is awkward to work with, in terms of concretely using its security guarantees. The details are technical and given in the appendix. To summarise here, S/Kademlia:

- is unable to backtrack properly to find global constraint solutions when the disjoint-paths constraint cannot be satisfied locally, and this is dependent on the order in which results are received. Our algorithm always finds the best global solution, regardless of the order in which results are received.
- does not allow disjoint paths to terminate in the middle of other disjoint paths; all nodes in a path are unavailable for other paths to make use of. In a non-adversarial setting this can reduce the result set too much, in a way that significantly disrupts the lookup; conversely, relaxing this in an adversarial setting does not reduce security, as we only terminate in the middle of another path if the first path has no closer options anyway.
- does not apply its disjoint-paths security logic to the result sets themselves: instead of selecting 1 result from each peer at the end of each of the d disjoint paths, it uses all s results from each peer, potentially giving s*d results in total. Callers are then forced to use all the results, even if half of them are anomalous; information about the anomalies is lost by the algorithm.
All of these corner cases came up during our attempted implementation of S/Kademlia, and they are all solved cleanly by our improved algorithm below.
Defining the security property
First we’ll describe a high-level way of thinking about the derivation of the following algorithm, then explain how the algorithm works, then explain why it works. To note: this “high-level thinking” was only our second step towards coming up with the algorithm below; our first step was realising and understanding the corner cases in the appendix, but that takes some time working through the technical details and makes for harder reading. So for brevity and structure we skip it here, and start with the second step.
Having found some issues with S/Kademlia’s specific approach, we begin to salvage it by thinking about how to separate what it does vs how it does it; in other words, the properties it is trying to achieve, vs the algorithm trying to achieve those properties. For example, “shortest path” is a property; “Dijkstra’s algorithm” is, obviously, an algorithm. In the S/Kademlia paper these two things are somewhat mixed together, and neither aspect is formalised, so some interpretive licence is required. In our view, the property that S/Kademlia wants to achieve can be described as:
- Suppose we have d initial peers and we want to traverse them to d final peers. We must ensure that no two final peers were reached only via some single common node.
We state this in its briefest form so that it is easier to refer back to later; now we explain it in more detail. As we perform a query, we make requests to nodes and they send us back a set of nodes in their response. We can model this as nodes in a graph telling us about their successors, and so during the course of our query we build up a directed graph [1] from queried peers to result peers. Importantly, a single result peer might have “come from” multiple different queried peers.
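One minimal way to keep this provenance is sketched below; the class and method names are ours and purely illustrative:

```python
class QueryGraph:
    """Records which queried peer told us about which result peers,
    for the whole duration of a lookup."""

    def __init__(self, self_id, initial_peers):
        self.self_id = self_id
        # successors[q] = set of peers that q returned to us; our own node
        # maps to the initial peers taken from our routing table
        self.successors = {self_id: set(initial_peers)}

    def add_reply(self, queried_peer, result_peers):
        # a result peer may be reported by several queried peers; we keep
        # every such edge, since this is exactly the information S/Kademlia
        # throws away during its traversal
        self.successors.setdefault(queried_peer, set()).update(result_peers)

    def add_failure(self, queried_peer):
        # failed/timed-out peers get an empty successor set
        self.successors.setdefault(queried_peer, set())

    def edges(self):
        return [(q, r) for q, succs in self.successors.items() for r in succs]
```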
In terms of this query graph, then, the S/Kademlia property is that for any pair of final peers, there must exist two disjoint paths from some two initial peers, one path to each final peer. If this were not the case, then there would be a common node that is the only node that (directly or indirectly) gave you those two peers, and this node could be an attacker that gave you fake peers. That is why this property helps with security. By induction, the pairwise form of the property implies the full form, i.e. that there must be d disjoint paths to every set of d final peers.
Note that this is different from what the S/Kademlia algorithm actually does! The algorithm attempts to satisfy the property by assigning peers to paths during the course of the traversal. This is an entirely greedy approach to solving the constraint, analogous to attempting to solve the shortest-paths problem by only following the shortest edge at each step, and it is why the algorithm fails for certain cases. It also throws away the information needed to perform the traversal correctly, namely which result peers came from which queried peers.
By thinking about the security property in more detail, and working our way through more corner cases, we eventually realised that the property can be translated into a max-flow-min-cost problem. Algorithms for solving these are well-known and efficient. The core of our work, then, was to figure out how to do the translation most appropriately, which we give below.
Max-flow-min-cost routing
Our modified S/Kademlia query algorithm covers both the lookup-traversal stage and the result-collection stage. In both cases, we translate the query graph into a flow graph (with associated costs), then run max-flow-min-cost to obtain our results. The translations are described in the next sections. Both are parameterised on the idea of a “candidate set”, and the second is further parameterised on the idea of a “query set”; these are explained afterwards. We also describe how the two fit together in the context of the whole lookup algorithm.
[1] By filtering out results that are further away than the source peer, we can remove any cycles in the graph. But in fact with our max-flow-min-cost approach, cycle removal is unnecessary, and actually undesirable for the final part of the query: the final peer in each path is supposed to give you their s neighbours closest to the lookup target, some of which may be further away than themselves. Notwithstanding, a prudent implementation should still filter out results that are “too much” further away from the target than the queried node.
Max-flow-min-cost for queries
1. Initialise an empty flow graph (with associated costs).
2. Convert every node in the query graph, including ourselves, into two nodes in the flow graph, “in” and “out”:
   a. All incoming edges instead connect to “in”
   b. All outgoing edges instead connect from “out”
   c. Connect “in” to “out”
3. Mark the “in” node of our own node as the “source” of the flow.
4. Add a new node to be the “sink” of the flow.
5. For every node in our “candidate set”, connect its “in” node to the sink.
6. Every edge has capacity 1, except:
   a. The edge from our own “in” node to our own “out” node, which has capacity d.
7. Every edge has cost 0, except:
   a. Every edge to the sink, whose cost is the distance-to-our-query-target of the source node of that edge.
8. Run max-flow-min-cost on this flow graph. From the max-flow, the nodes adjacent to the sink are the overall result of this algorithm.
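This construction can be sketched end-to-end in Python. The min-cost-flow routine below is a textbook successive-shortest-paths implementation, and all names are our own; a production implementation would likely reach for an existing graph library instead:

```python
import math

def min_cost_max_flow(nodes, edges, source, sink):
    """edges: list of (u, v, capacity, cost). Returns {(u, v): flow > 0}.
    Assumes no pair of nodes has a forward edge in both directions."""
    cap, cost, adj = {}, {}, {n: [] for n in nodes}
    for u, v, c, w in edges:
        cap[(u, v)] = cap.get((u, v), 0) + c
        cap.setdefault((v, u), 0)            # residual (reverse) edge
        cost[(u, v)], cost[(v, u)] = w, -w
        if v not in adj[u]:
            adj[u].append(v)
        if u not in adj[v]:
            adj[v].append(u)
    flow = dict.fromkeys(cap, 0)
    while True:
        # Bellman-Ford over the residual graph, shortest path by cost
        dist, prev = {n: math.inf for n in nodes}, {}
        dist[source] = 0
        for _ in range(len(nodes)):
            changed = False
            for u in nodes:
                if dist[u] == math.inf:
                    continue
                for v in adj[u]:
                    if (cap[(u, v)] > flow[(u, v)]
                            and dist[u] + cost[(u, v)] < dist[v]):
                        dist[v], prev[v], changed = dist[u] + cost[(u, v)], u, True
            if not changed:
                break
        if dist[sink] == math.inf:           # no augmenting path left
            return {e: f for e, f in flow.items() if f > 0}
        path, v = [], sink                   # augment along the cheapest path
        while v != source:
            path.append((prev[v], v))
            v = prev[v]
        aug = min(cap[e] - flow[e] for e in path)
        for e in path:
            flow[e] += aug
            flow[(e[1], e[0])] -= aug

def query_flow_results(successors, self_id, candidates, target, d):
    """successors: query-graph node -> set of result nodes (our own node maps
    to our initial peers); candidates must appear in the query graph.
    Returns the up-to-d chosen nodes, sorted."""
    sink = "sink"
    graph_nodes = set(successors) | {r for s in successors.values() for r in s}
    nodes, edges = {sink}, []
    for n in graph_nodes:
        nodes |= {("in", n), ("out", n)}
        # step 6a: our own node may carry d units of flow; everyone else 1
        edges.append((("in", n), ("out", n), d if n == self_id else 1, 0))
    for q, succs in successors.items():
        for r in succs:
            edges.append((("out", q), ("in", r), 1, 0))
    for c in candidates:
        # terminating at candidate c costs its XOR distance to the target
        edges.append((("in", c), sink, 1, c ^ target))
    flow = min_cost_max_flow(nodes, edges, ("in", self_id), sink)
    return sorted(u[1] for (u, v), f in flow.items() if v == sink)
```

The second assertion in our testing corresponds to the “redundant routes” example from the appendix: the flow solver re-routes earlier choices where a greedy traversal would get stuck.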
Max-flow-min-cost for results

1. Determine the successors of each node in our “query set” that belong to our candidate set. We call these the “relevant successors”.
2. For simplicity, assume all the successor sets are the same size; call this N. (We have a more generalised version, in the code listing.)
3. Initialise a flow graph (with associated costs) with a new source node.
4. For every node in the query set:
   a. Add an edge from the source to this query node, with capacity N.
   b. Add an edge from this query node to each of its relevant successors, with capacity 1.
   (If a query node is also a relevant successor, treat these as separate nodes in the flow graph when constructing it.)
5. For every relevant successor:
   a. Add an edge from this node to the flow sink, with capacity N.
6. As with the first algorithm, every edge has cost 0, except:
   a. Every edge to the sink, whose cost is the distance-to-our-query-target of the source node of that edge.
7. Run max-flow-min-cost on this flow graph. From the max-flow, the nodes adjacent to the sink are the overall result of this algorithm, along with their flow values (ranging from 1 to N).
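In the simplified same-size case, every query-node-to-successor edge has capacity 1 and carries its own unit of flow, so each result’s flow reduces to its in-degree among the query nodes, capped at N by its sink edge. Under that assumption the result collection can be sketched without a full flow solver (names here are ours):

```python
def collect_results(successors, query_nodes, candidates, target):
    """successors: node -> set of nodes it returned. Returns (node, flow)
    pairs sorted first by flow (descending), then by XOR distance."""
    relevant = {q: successors.get(q, set()) & set(candidates)
                for q in query_nodes}
    # N: assumed-uniform successor set size (capacity of each sink edge)
    n = max((len(s) for s in relevant.values()), default=0)
    flows = {}
    for q in query_nodes:
        for r in relevant[q]:
            # each capacity-1 edge contributes one unit of flow
            flows[r] = flows.get(r, 0) + 1
    flows = {r: min(f, n) for r, f in flows.items()}   # sink-edge cap
    return sorted(flows.items(), key=lambda rf: (-rf[1], rf[0] ^ target))
```

When the successor sets differ in size this shortcut over-weights the larger sets, which is exactly why the full max-flow-min-cost formulation is preferable in general (see the appendix).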
How it fits together
a. On query startup, and whenever we update the query graph, we run the first algorithm with:
- candidate set = nodes we haven’t got a result/failure/timeout from

This will give us up to d results. At most one of these results will be a peer we haven’t queried yet [2]; this will be the next peer to query. The rest will be peers that we are currently in the process of querying.

If a peer times out or gives an error, we add it to the failed set, add it to our query graph with an empty successor set, and run (a) as normal.
b. To check whether the query may be terminated, we run the first algorithm with:
- candidate set = nodes we haven’t got a failure/timeout from. This should be equivalent to the disjoint union of:
  a. nodes we’ve already gotten a query result from
  b. nodes we’re in the process of querying
  c. nodes we found out about and could potentially query

If the output is itself a subset of (a), then the query may be terminated, and results may be collected via the second algorithm; see (c) next. Alternatively, the caller may choose to continue with the query, in case the traversal finds even better nodes later, but this is unlikely assuming Kademlia actually works.
c. When we want to collect results from the query, we run the second algorithm with:
- query set = nodes we got from a successful run of (b)
- candidate set = nodes we haven’t got a failure/timeout from

This will give us all the results from those closest query nodes, in sorted order, first by their flow and then by their distance. Suppose the caller believes that up to a fraction f of nodes are faulty; then, roughly speaking, they can use the result nodes that have flow > f*d and ignore the rest. If an insufficient number of nodes (e.g. fewer than s, the storage replication factor) meet this condition, the caller immediately knows this and can take a higher-level action, such as alerting any relevant network operators.
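For illustration, this caller-side filtering might look as follows; the function name and the choice of raising an exception are our own invention:

```python
def trusted_results(results, f, d, s):
    """results: (node, flow) pairs from the result-collection step, where
    flow counts how many disjoint paths corroborate each node."""
    # keep only nodes corroborated by more than f*d disjoint paths; an
    # attacker controlling at most a fraction f of the d paths cannot
    # push a bogus node past this threshold
    trusted = [node for node, flow in results if flow > f * d]
    if len(trusted) < s:
        # fewer than s well-corroborated results: surface this so the
        # caller can take a higher-level action, rather than guessing
        raise RuntimeError("insufficient corroborated results")
    return trusted
```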
Why does this work? The flow graph we generate expresses the same constraints as S/Kademlia’s disjoint-paths approach:

- By (6a) we have a total in-capacity of d. Therefore the max-flow-min-cost algorithm will give us up to d results.
- In (2) we only allow each node to carry one unit of flow (unlike a typical max-flow problem, where nodes have infinite capacity, restricted only by their edges); this means that every possible max-flow consists of d disjoint paths, each with flow 1. This satisfies the S/Kademlia “disjoint paths” property.
- By adding edges from every candidate to the sink, we allow these nodes to be used as the terminus of a path even if they are part of another path. This helps with the special case we mentioned earlier and discuss in more detail in the appendix.
- By associating costs with each candidate node, we add the constraint that we want the query to progress towards our target, whilst retaining the “disjoint paths” security property.
Why does this algorithm work better than S/Kademlia?

- Iterative P2P queries are effectively graph-traversal algorithms that try to solve certain constraints. As with similar constraint-solving problems, backtracking is often required. S/Kademlia’s disjoint-paths approach adopts a particular set of constraints, but it attempts to solve those constraints by effectively running a graph-traversal algorithm that does not do any (global) backtracking; in fact it throws away the information needed to do so. And so, depending on the arbitrary choices it makes during the traversal (i.e. which peer to query next, based on the available set), it can find the best solution local to each “disjoint path”, but then be unable to find the best global solution in the next step, when more data is added to the problem context.

Our algorithm avoids this by retaining all the information and casting the problem as a global constraint-solving problem. It is thus able to always find the best solutions, even if they are not built directly on top of the best solution from the previous step.
Why do we need two separate algorithms?

- While max-flow-min-cost is efficient compared to more general NP-hard constraint-solving problems, one of its characteristics is that it sometimes gives a max-flow that is split across equally-good candidates. This is sometimes desirable and sometimes not.

In the first algorithm, we are looking for no more than d results. So we supply a limited amount of flow, and by giving every edge capacity 1 we ensure that splitting never occurs.

In the second algorithm, we are interested in comparing all our results. So we supply a lot of flow, allow splitting to happen, and use it to help our comparison.
[2] Proof to follow; as a sketch, this can be shown by comparing the differences between results on each successive call to the algorithm. Essentially, since we restrict node flows to 1, each new result set (added to the graph) can only add at most one peer to the “next peers to query” set.
Appendix: bad cases & corner cases
In the examples below, for simplicity, all lookups are for the target key 0, which makes the XOR distance of a node the same as its node ID.
Backtracking over redundant routes
Suppose d=k=s=3, and we issue a query to our neighbours {4,5,6}. They reply, in order:
- node 4 replies {1,2,3}
- node 5 replies {1,2,3}
- node 6 replies {4,3,2} (note that this is perfectly legitimate in the general case, since node 6 can’t be expected to know that we also have node 4 as a neighbour)
(click to view animation)
In S/Kademlia, one possible choice of paths would be {4,3} and {5,2}: the red and blue paths in the diagram. Once you have made these choices, there is nothing further to do when you get a reply back from node 6 (green), because all of its results have already been used up by the other paths.
This effectively reduces your query from this point onwards to 2 parallel lookups instead of 3. In other words, 2 peers now control the eventual lookup result instead of 3, which is exactly what we wanted to avoid with disjoint paths. By the input parameters, we have the resources to execute 3 lookups in parallel, but we end up underutilising them due to over-restrictive rules on S/Kademlia’s part.
Let’s take a step back, however. Why did we choose {4,3} and {5,2}? These are arbitrary choices. If we had chosen something else, namely {4,1} and {5,3}, we could then have chosen {6,2}, and this would still be within S/Kademlia’s own rules.
The problem arises from greedily traversing one part of the graph while optimising a constraint (disjoint paths + close distance), then immediately discarding the rest of the graph. This prevents us from backtracking later, when our chosen paths prevent us from satisfying our constraints. The max-flow-min-cost algorithm handles this case gracefully: going from step 2 to step 3 it recalculates what the max-flow is, finding the correct solution as shown in our diagram.
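The backtracking needed here is the classic augmenting-path mechanism from matching and flow theory. Restricted to a single hop, assigning next hops to {4, 5, 6} is just bipartite matching, which a short sketch can solve (assuming node 6’s usable, closer results are {3, 2}; the function name is ours):

```python
def match(options):
    """options: neighbour -> list of acceptable next hops.
    Returns a maximum matching as {neighbour: next hop}."""
    owner = {}  # next hop -> neighbour currently assigned to it
    def assign(u, seen):
        for v in options[u]:
            if v in seen:
                continue
            seen.add(v)
            # take v if it is free, or if its current owner can be
            # re-routed to another acceptable next hop (backtracking)
            if v not in owner or assign(owner[v], seen):
                owner[v] = u
                return True
        return False
    for u in options:
        assign(u, set())
    return {u: v for v, u in owner.items()}
```

If node 4 greedily grabs hop 1, the augmenting step re-routes it to hop 2 once node 5 also needs hop 1, so all three neighbours end up with distinct next hops; this is the same re-routing the flow solver performs over whole paths.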
Backtracking over failed routes
A second, more complex example, this one involving multiple levels of backtracking and failed queries.
Suppose d=k=s=3, and we issue a query to our neighbours {10,11,12}.
(click to view animation)
Nodes 1 and 2 both fail to respond. We then want to reassign the paths and select 7 as the next peer to query, even though it’s unrelated to the “10 > 5 > 1” path, because 6 was already selected previously.
Good results case
(click to view animation)
This demonstrates one consequence of being too strict with disjoint paths. In a real-world situation, several paths may end up wanting to traverse the same set of nodes near the target. However, the disjoint-paths rule prevents any part of a path from being traversed twice, effectively hiding those nodes from our results. Furthermore, at the end of each path, the results are typically going to be further away than the final node of the path itself; otherwise we would have queried the closer nodes as part of the path.
These aspects do not play nicely with each other, and we often find that we miss results, as in this example: node #50 gives us back #100 and #160 as results, yet node #200 knows of a closer node, #140.
Therefore we relax the S/Kademlia disjoint-paths rule slightly, by allowing paths to terminate in the middle of other paths. The other rules about single flow still apply, so two paths cannot terminate at the same node, which would give that node more weight if it were indeed malicious. And if there are closer alternatives, a path will follow those as usual, and this exception won’t be needed.
This relaxation was already deemed necessary by other implementers, before we came up with our max-flow-min-cost algorithm.
Final results
With the S/Kademlia approach: if you ask for s results, you will be given between s and d*s results. If at least one of the final queried nodes (in the paths you visited) is honest, then at least s of the results will be correct. However, we cannot perform reasoning more fine-grained than this, and so the caller must use (i.e. contact) all of these results for any minimal level of security.
In the good case, the majority of the result sets (from each final queried peer, a.k.a. terminus) should overlap, and so we’ll get around s results. In the worst case, there is no overlap and we are forced to consume d*s results; no protection can get around this, since the entire input data is faulty. However, in a more realistic attack scenario, most of your peers are honest and give you overlapping result sets, but a few of them are attackers and give you bogus results. In this case, you will be forced to consume 2*s or 3*s results.
This may not be a big deal, but with just a little more sophistication we can easily take care of it, by running the max-flow algorithm. This demonstrates that with a more theoretically-principled approach, we do not have to compromise on small-but-awkward matters like this.
(click to view animation)
As shown in the diagram, the max-flow-min-cost algorithm can distinguish between result nodes with more or less flow coming from the final queried nodes.
The example is kept simple for brevity; its result can be recreated with a much simpler algorithm that merely counts the in-degree of each result node. However, a max-flow-min-cost solution generalises better: e.g. if the final queried nodes gave different numbers of results, we don’t want the larger sets to dominate the smaller ones.