Here's a draft. I'll be off during the next 2 weeks, so don't worry if I don't post here. I hope you'll find here some useful information, even if it is not very clear for the moment. Feel free to comment and criticize this draft. Tell what you would develop, unanswered questions, what you dislike and so on...
Draft paper about ForceAtlas
I. Introduction
ForceAtlas is a force vector proposed in the Gephi software. It is appreciated for its simplicity and readibility, despite being limited to relatively small graphs.
This paper explains how it works and why such a simple algorithm is still useful to users, particularly researchers in humanities.
II. Anatomy of ForceAtlas
ForceAtlas is a force directed layout: it simulates a physical system with custom forces.
At each step forces are calculated, and a displacement of each node is given by the force and a "speed vs. precision" approximation.
II. A. Energy Model
The (attraction,repulsion)-model of ForceAtlas (1,-1), as formalised by A. Noack, is between Früchterman Rheingold (2,-1) and LinLog (0,-1).
II. B. Repulsion by degree
The repulsion between two nodes is proportional to the produce of the two degrees.
It makes the graph clearer. We can see its effect on "star" structures.
This is almost the same thing as Noack's "edge-repulsion", even if it is not how we understand this feature.
It also causes instability and side effects (with gravitation)
II. C. The formulas
attraction(n1,n2) = attraction_factor * distance(n1,n2)
repulsion(n1,n2) = repulsion_factor * (1+degree(n1)) * (1+degree(n2)) / distance(n1,n2)
II. D. Options
- Settled nodes: The force takes in account the nodes that are manually settled by the user.
- Gravity: There is also a gravity that prevents nodes from drifting away: gravity(n) = gravity_factor * distance(n)
- Weight: if edges have a weight, it is used as a factor for the attraction
- "DistributedAttracion" differenciates hubs from authorities by dividing the attraction of each edge by (1+degree(node_from)). The hub attract less and appear more peripheric than authorities.
- "AdjustBySize" prevents nodes from overlapping: nodes that overlap are repulsed by a constant force.
II. E. Optimizations
There is no sophisticated optimization such as Barnes Hut or so. Thus the algorithm is exponentially slow as the size of the graph rises (n² complexity) and is unable to deal with big graphs.
But the advantage is that it converges to a very low energy (good quality).
Two basic optimizations allow to manage convergence speed:
- the essential "speed vs. quality" application of the force, including the necessary max_displacement protection
- a dynamic freezing of nodes that works this way: nodes "freeze" when the force applying to them changes quickly. "frozen" nodes have a malus for displacement. This prevents nodes from swinging around their balance position.
III. Spatialization quality
Here we compare ForceAtlas to other layouts such as OpenOrd, Yifan Hu and Früchtermann Rheingold, on different graphs.
We do not test performance, but quality, meaning: interpretability.
III.1. Are close nodes more connected ?
III.2. Are close nodes structurally equivalent ?
III.3. Are close nodes in the same modularity clusters ?
IV. Conclusion