author Henk P. Penning date Sun Dec 3 13:02:06 CET 2006
- The PGP web of trust can be viewed as a directed graph where the points are the PGP keys, and the arrows (directed lines) are the signatures.
- If there is a path from key A to key B, the distance from A to B is the length of the shortest path from A to B.
- The strong set is the largest set of keys such that for any two keys in the set, there is a path from one to the other.
- The mean shortest distance (MSD) of a key is the average distance to that key.
- The avarage mean shortest distance (AMSD) is the average of the MSD's of all keys.
The size of the strong set is growing ...
... and dived down in 2004-12 ; a massive cleanup removed a lot of revoked/expired keys.
... and the average distance between the points is shrinking!
A possible reason : the average degree (number of signatures per key) is rising.
The population of the strong set changes over time, as keys enter and leave the set.
In the graph below, for a given date
About half of the keys in the first set (april 2003) have disappeared since then.
- a blue line represents the size of the strong set
- a red line represents the number of keys that disappeared from the strong set
The plot below shows the distribution of degree (the number of signatures per key).
The distribution hardly changes with time. The fraction of keys with only one signature is remarkably stable at some 25%.
The graph below shows last month's distribution of degree (number of signatures per key) using a log/log scale.
The graph is very typical for a social network exhibiting the small world phenomenon, a scale-free network.
- the red points show the raw degree/count scatter graph
- the blue points show the average degree per count (level)
- the smooth green line follows the blue points
The graph below shows the distribution of the distances between all PGP keys in the strong set.
Since the strong set grows, the number of paths grows too. If you look closely, you can see that the most common distance decreases. In april 2003 it was just below 6, now it is creeping to 5.5.
The plot below shows the same data, but the counts are normalized (divided) by the number of paths in the strong set :#paths = |G| × (|G| - 1), where |G| is the number of points in the graph
The plot is interesting, because it is so boring! The shape of the distribution curves is very stable; over time, the pattern shifts slowly toward smaller distances.
Some people are interested in key ranking ... See the top 50 and top 1000.
Following the footsie idea of Matthew Wilcox, here is the footsie for the strong set :
What happens to the size of the largest strongly connect component (scc) if we delete points ? The graph will break up and the size of the biggest component will go down. How this happens, depends on which points we remove. Below we will see what happens when we remove core, fringe or random points.
The plots below show what happens if we remove k core keys : the keys with low msd's.
To see how the size of the scc (red line) goes to zero, the second plot shows the same data on a logscale.
- the red line shows the size of the remaining largest strongly connected component
- the green line shows the number of strongly connected components
- the blue line shows (an estimate for) the average mean shortest distance in the largest strongly connected component ; the msd is computed for 10% of the points
The graph crumbles gracefully until some 10.000 keys are removed. After that, only splinters remain. As expected, the amsd shoots up, to over 15. Note that the top-1000 aren't very special. The crumbling is evenly distributed over the entire range, until there is nothing but graph dust, probably local clusters.
The plots below show what happens if we remove k fringe keys : the keys with high msd's. To get a better view of the number of components (green line), the second plot shows the same data on a logscale.
The graph remains very much intact, all the way down. The number of components remains low and very stable.
At first, the amsd decreases more than average because the the largest connected component becomes 'rounder'.
What happens if we remove random keys ?
The scc of the graph remains big, almost all the way down. The amsd (blue line) remains very stable, between 5 and 6.
This plot shows the fractal nature of the strong set. A random subgraph has the same structure as the entire graph.
- PGP Web of Trust Statistics by Neal McBurnett ; a first analysis of the PGP web of trust, 1997
- keyanalyze - Analysis of a large OpenPGP ring by M. Drew Streib ; a analysis of the PGP web of trust, starting in 2001.
- Dissecting the Leaf of Trust by Jörgen Cederlöf is an analysis of trust relations in the strongly connected component of the pgp web of trust. Data is gathered from the level of trust assigned to signatures by the signer.
- The Footsie Web of Trust analysis by Matthew Wilcox ; shows the state of the Web-of-Trust at regular points of time ; average MSD's of the top 50 and top 1000 keys, etc.
- keyanalyze reports by Jason Harris
- Betweenness Centrality reports by Matthias Bauer ; the computations are based on Ulrik Brandes, A Faster Algorithm for Betweenness Centrality, Journal of Mathematical Sociology, 25(5):163-177, 2001. The usefulness of a key is measured by the number of shortest paths that run through it.
- wikipedia.org : PGP, web of trust, graph, graph theory, scale-free network social network, small world phenomenon
- Towards a Theory of Scale-Free Graphs: Definition, Properties, and Implications, Lun Li, David Alderson, Reiko Tanaka, John C. Doyle, Walter Willinger ; Condensed Matter, cond-mat/0501169, december 2004.
The plots were done by Henk P. Penning → home → pgp → pathfinder and key statistics
The data was kindly provided by Patrick Feisthammel and Jörgen Cederlöf.