[SOLVED] help on OpenOrd interpretation

weedlili · Post by **weedlili** » 18 Aug 2011 10:40

Hello,

I used OpenOrd to visualize a huge co-word network (135 258 nodes and 1 254 935 links). The words were extracted from a bibliographic database of about 76 000 scientific papers. I found four main aligned clusters, each dealing with a scientific speciality.

These clusters make sense to me, but I have difficulty interpreting the whole graph, because I am not sure I understand the spatialization process. I read the paper by Martin et al., but could not find any help in it :/

Can I conclude that the scientific fields in the middle are fields acting like "bridges" for the others? How can I interpret the distance between the clusters?

Are there publications or online examples showing how to interpret OpenOrd graphs?

Any advice would be greatly appreciated! I am afraid of making a horrible, savage interpretation!

W.

seinecle · Post by **seinecle** » 19 Aug 2011 15:54

Hi!

I have exactly the same question. So if anybody out there has an "explanation of OpenOrd for laymen", that would be great!

In practice, is there a reason why you don't filter your network before importing it? In my experience, co-word networks can be reduced in size (for example, by keeping only the words that score high on this measure: http://en.wikipedia.org/wiki/Tf%E2%80%93idf , or on a similar one).

If you filtered your network down to 50,000 nodes and 200,000 edges, then the parallel Force Atlas layout would be an option, and it is much easier to interpret.

Best,

Clement

weedlili · Post by **weedlili** » 21 Aug 2011 14:23

Hello!

Thank you for your reply.

Indeed, I reduced my network after importing it, but I did not use the metric you mentioned. By the way, how do you compute it? I mean, which tool do you use? I use Sci2 and R, and I am not aware of a way to compute this metric in them.

Then I ran an analysis with ForceAtlas, but the results are quite different from the OpenOrd results. So I really need some further explanation!

Best,

W.

Post by **jacomyma** » 22 Aug 2011 12:12

Hi,

I don't know exactly how the OpenOrd layout works, but I made ForceAtlas and I'm quite experienced on the question of layouts. Here are some insights that I hope will help you figure out why OpenOrd differs from other layouts.

A classic force-driven layout follows a simple pair of rules: nodes repel each other, and edges attract the nodes they connect. There is one law for the attraction and another law for the repulsion. We could use physical laws such as magnetic repulsion and elastic attraction. The force-driven algorithm simulates the system, like a physics engine in a video game, and the nodes move according to these laws until they reach a balanced state.

A force-driven layout is not a Cartesian projection. We cannot compute the balanced state directly. That's why we use this kind of simulation. It finds an optimal configuration little by little. The very idea of a force-driven layout is to "empirically" find such an optimal configuration of the network.

I think this is well known. I wrote it to make one point clear: the "result" of a layout depends on the "physical" laws involved, the initial configuration of the network, and the moment at which you stop it.
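To make the simulation idea concrete, here is a minimal sketch of a toy force-driven layout. It is not ForceAtlas, OpenOrd, or any Gephi code: the function name `force_layout`, the force laws (repulsion proportional to 1/distance, attraction proportional to distance), the constants and the per-step movement cap are all assumptions chosen for illustration. The `seed` and `steps` parameters stand for the initial configuration and the moment at which the layout is stopped.

```python
import math
import random


def force_layout(nodes, edges, steps=200, repulsion=0.01, attraction=0.1,
                 max_move=0.1, seed=0):
    """Toy force-driven layout (illustrative laws, not those of a real tool).

    nodes: list of node ids; edges: list of (a, b) pairs.
    Returns a dict mapping each node to its [x, y] position.
    """
    rng = random.Random(seed)  # the seed fixes the initial configuration
    pos = {n: [rng.random(), rng.random()] for n in nodes}
    for _ in range(steps):  # the number of steps is "when you stop it"
        force = {n: [0.0, 0.0] for n in nodes}
        # Rule 1: every pair of nodes repels (magnitude = repulsion / distance).
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                d2 = dx * dx + dy * dy or 1e-12  # avoid division by zero
                fx, fy = repulsion * dx / d2, repulsion * dy / d2
                force[a][0] += fx
                force[a][1] += fy
                force[b][0] -= fx
                force[b][1] -= fy
        # Rule 2: every edge attracts its endpoints (magnitude = attraction * distance).
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            force[a][0] -= attraction * dx
            force[a][1] -= attraction * dy
            force[b][0] += attraction * dx
            force[b][1] += attraction * dy
        # Move each node along its net force, capped to keep the simulation stable.
        for n in nodes:
            fx, fy = force[n]
            mag = math.hypot(fx, fy)
            scale = min(1.0, max_move / mag) if mag > 0 else 0.0
            pos[n][0] += fx * scale
            pos[n][1] += fy * scale
    return pos


# Two triangles joined by one "bridge" edge.
nodes = [0, 1, 2, 3, 4, 5]
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
positions = force_layout(nodes, edges)
```

Changing `repulsion`, `attraction`, `seed` or `steps` in this sketch changes the picture, which is the point made above: the layout is the outcome of a simulation, not a direct computation.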

ForceAtlas, compared to Fruchterman Rheingold and Yifan Hu, has different laws and a different way to determine "when to stop it".

OpenOrd works another way. It has 5 different stages that use different "physical" laws and that each run for a fixed number of steps. Here is how I understand them:
1) Liquid stage: a very rough layout, which is a better starting point than a real randomization.
2) Expansion: roughly lays out the graph
3) Cooldown: refines the rough layout
4) Crunch: gathers "clusters", making them tighter (nodes overlap)
5) Simmer: gives some "air" to the dense clusters (fewer nodes overlap)

Here is why the result with OpenOrd is so different from other layouts:
- Stages 1 and 2 act as a preprocessing of the graph. Their purpose is performance. They do not play an important role in the resulting layout.
- Stage 3 is the actual layout. At the end of this stage, the result is comparable to a classic force-driven layout.
- Stages 4 and 5 follow a different principle: they specifically highlight the clusters, by actively differentiating them. These stages make a difference, as they act like a post-processing.

I'll explain now which difference it makes.

If the question asked is "Why does my network have this specific shape?":
- The classic force-driven layout answer: "The layout is the best compromise between putting connected nodes together and avoiding too much overlap, so that the image is readable". (Different algorithms have different definitions of what the "best compromise" is and of how to put nodes "together" and without "overlapping", that is, different optimizations and different laws.)
- The OpenOrd answer I would give: "The layout aims at highlighting clusters of strongly-connected nodes". I can't tell much more because it is not a compromise. The post-processing broke the compromise.

The main difference appears if the question asked is "Does the cluster I see here actually exist?":
- The classic force-driven layout answer is "Yes, the nodes are more strongly connected together than with the rest of the graph", because visual density directly reflects structural density. This has been shown by Andreas Noack.
- The OpenOrd answer is "More or less, since the nodes are strongly connected together, but might also be strongly connected to another visual cluster", because the post-processing has the side effect of cutting the clusters. The visual density then shows which clusters the algorithm has defined rather than the structural density itself. The classic case is that it separates sub-clusters of a bigger cluster (and you don't see the bigger cluster anymore).

The difference between OpenOrd and a classic force-driven layout lies in the way clusters appear. OpenOrd's visual clusters are more "artificial" than those of a classic force-driven layout.

Note: I'm not claiming that it is a problem. When a graph is very dense, you'll find it useful to highlight clusters so that you can actually read your network. With OpenOrd as with any other layout, I recommend that you test your clusters with a measure such as density or modularity.
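As a rough sketch of such a test, the snippet below computes the internal density of one cluster and the modularity of a whole partition in plain Python. It assumes an undirected, unweighted graph given as a list of unique edges with no self-loops, and it uses the standard Newman modularity formula. The function names `cluster_density` and `modularity` and the tiny example graph are made up for illustration; they are not Gephi functions.

```python
from collections import defaultdict


def cluster_density(edges, cluster):
    """Share of the possible links inside `cluster` that actually exist."""
    members = set(cluster)
    n = len(members)
    if n < 2:
        return 0.0
    internal = sum(1 for u, v in edges if u in members and v in members)
    return internal / (n * (n - 1) / 2)


def modularity(edges, partition):
    """Newman modularity Q; `partition` maps each node to a cluster label."""
    m = len(edges)
    degree_sum = defaultdict(int)  # sum of node degrees per cluster
    internal = defaultdict(int)    # number of edges inside each cluster
    for u, v in edges:
        degree_sum[partition[u]] += 1
        degree_sum[partition[v]] += 1
        if partition[u] == partition[v]:
            internal[partition[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Two triangles joined by a single edge, split into the two "visual" clusters.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
partition = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(cluster_density(edges, [0, 1, 2]))
print(modularity(edges, partition))
```

A visual cluster whose density is not clearly higher than that of the whole graph, or a partition whose modularity is close to zero, suggests that the layout is showing more separation than the structure supports.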

weedlili · Post by **weedlili** » 23 Aug 2011 10:52

Thank you very much for this long answer. I am not comfortable with the way OpenOrd cuts clusters, so I will keep using ForceAtlas.

Regards,

W.

seinecle · Post by **seinecle** » 12 Sep 2011 11:38

Hi,

On the tf-idf measure, you can check this: http://stackoverflow.com/questions/2380 ... -in-python
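For reference, here is a minimal plain-Python sketch of the idea, not the code from the linked answer. It assumes each paper is already tokenized into a list of words, uses the basic definition (term frequency times the log of the inverse document frequency), and keeps the terms with the highest score. The function names `tf_idf` and `top_terms` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: score} dict per document."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    scores = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def top_terms(docs, keep):
    """Return the `keep` terms with the highest tf-idf in any document."""
    best = {}
    for doc_scores in tf_idf(docs):
        for term, score in doc_scores.items():
            best[term] = max(best.get(term, 0.0), score)
    return sorted(best, key=best.get, reverse=True)[:keep]


docs = [
    ["network", "layout", "graph"],
    ["network", "cluster"],
    ["network", "graph", "graph"],
]
print(top_terms(docs, 2))
```

A word that appears in every document (here "network") gets a score of zero, so it is the first to be dropped when the vocabulary is filtered before building the co-word network.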

And thx to Mathieu for his explanations on OpenOrd.

Best,

Clement

weedlili · Post by **weedlili** » 13 Sep 2011 08:02

Thanks.

Gephi forums
