Grouping nodes by similarity

Post by **bonephilipp** » 22 Dec 2013 12:24

Dear Gephi users,

I have a list of strings looking like this:

1 A B C D E F G
2 B C D E F X Y
3 A B G H J I K

I would like to import lists of this style into Gephi and group the nodes by similarity. What I have tried so far is the following:

1. I wrote a Perl script that compares each line with every other line and computes a similarity value based on how many edits would be necessary to change one string into the other (using the Perl module String::Similarity).
2. The output I get looks something like this (the values don't correspond to the example above):

String1,String2,0.5
String1,String3,0.23
String2,String3,0.9

The higher the value, the higher the similarity between the first and the second item of the CSV.

3. I imported this CSV into Gephi, but the last number is interpreted as an edge weight. It only affects the edges and does nothing to group the strings. What I expected is that two strings with a high similarity value end up close to each other, and two strings with a low value end up far apart.
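
The steps above can be sketched in Python roughly as follows. This is only an illustration, not the Perl script itself: `difflib.SequenceMatcher` from the standard library stands in for String::Similarity (so the scores will differ from that module's), the function names are made up for the example, and the `Source,Target,Weight` header row is an assumption about the edge-list layout the importer expects.

```python
import csv
import io
from difflib import SequenceMatcher
from itertools import combinations

# Example rows from the post: an ID followed by the items of that string.
ROWS = """\
1 A B C D E F G
2 B C D E F X Y
3 A B G H J I K
"""


def parse_rows(text):
    """Return {row_id: [items]} for whitespace-separated rows."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        if parts:
            rows[parts[0]] = parts[1:]
    return rows


def pairwise_similarity(rows):
    """Yield (id_a, id_b, score) for every unordered pair of rows.

    The score is SequenceMatcher.ratio(), a value in [0, 1] where
    higher means more similar. It is a stand-in for String::Similarity,
    not the same measure.
    """
    for (id_a, items_a), (id_b, items_b) in combinations(rows.items(), 2):
        score = SequenceMatcher(None, items_a, items_b).ratio()
        yield id_a, id_b, round(score, 3)


def edges_to_csv(edges):
    """Serialise the pairs as a weighted edge list (header is an assumption)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(edges)
    return out.getvalue()


edges = list(pairwise_similarity(parse_rows(ROWS)))
csv_text = edges_to_csv(edges)
print(csv_text)
```

Every pair of rows becomes one weighted edge, so the result is a complete graph in which the similarity is carried only by the weight column.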

So I suspect that my overall approach is wrong. Could someone give me a few hints or point me to reading material on how to build graphs like this?

Thanks and merry Xmas,
bphil