discrepancies between methods of data importation

Get help with your data
nellymar
Posts:2
Joined:14 Aug 2014 00:04
Location:New Zealand
discrepancies between methods of data importation

Post by nellymar » 14 Aug 2014 02:40

Hi dear Gephi users,
I have been using Gephi for a while to obtain great displays of my networks. Recently, however, I came across a problem: the Modularity function gives different results depending on how I import my network into Gephi.
- One method is to import an edge list directly in CSV format (source, target) using Import Spreadsheet in the Data Laboratory.
- Another method (the only one I was using before, which is why I had not noticed the difference) is to open a .net file of the network (Pajek file format) using File>Open. (Note that to produce the .net file, I take the exact same edge list and run a little program called createpajek.exe.)
These two methods should be strictly equivalent (I think), and they do produce networks that look similar, with the same number of edges and nodes, the same average degree, average path length, etc.
However, when I use the Modularity function, I obtain different results depending on how I imported my network data (modularity of 0.529 versus 0.536, with some nodes not in the same community).
This is really intriguing to me, as I cannot explain it, and I don't know whether one method is wrong, or which one. Does anybody have an explanation as to why this problem occurs?
(I attach the two files, which describe the exact same network: one is a CSV edge list and the other is a .net file obtained from that edge list. They give me different modularity results depending on which one I import into Gephi.)
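One way to rule out the files themselves as the source of the difference is to compare the two edge sets outside Gephi. Below is a minimal sketch, not anything Gephi or Pajek provides: the function names are hypothetical, and it assumes an undirected network, a CSV with an optional "source,target" header, and a simple .net file with a *Vertices section followed by *Edges or *Arcs.

```python
import csv
import io


def read_csv_edges(text):
    """Undirected edges from 'source,target[,extra]' rows; extra columns are ignored."""
    edges = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2 or row[0].strip().lower() == "source":
            continue  # skip blank lines and the header row
        edges.append(frozenset((row[0].strip(), row[1].strip())))
    return edges


def read_pajek_edges(text):
    """Undirected edges from a simple Pajek .net file (assumed layout, see above)."""
    labels, edges, section = {}, [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if line.startswith("*"):
            section = line.split()[0].lower()
            continue
        if section == "*vertices":
            parts = line.split(None, 1)
            number = parts[0]
            rest = parts[1].strip() if len(parts) > 1 else ""
            if rest.startswith('"'):
                label = rest[1:].split('"', 1)[0]  # quoted label
            else:
                label = rest.split()[0] if rest else number
            labels[number] = label
        elif section in ("*edges", "*arcs"):
            a, b = line.split()[:2]
            # Map vertex numbers back to labels so both files use the same ids.
            edges.append(frozenset((labels.get(a, a), labels.get(b, b))))
    return edges


def compare(edges_a, edges_b):
    """Report edges present in only one list and repeated edges in each list."""
    set_a, set_b = set(edges_a), set(edges_b)
    return {
        "only_in_a": set_a - set_b,
        "only_in_b": set_b - set_a,
        "duplicates_a": len(edges_a) - len(set_a),
        "duplicates_b": len(edges_b) - len(set_b),
    }
```

Reading the two attached files as text and passing the results to compare() would show whether any edge is missing from one file, or repeated (including reversed pairs such as 1,2 and 2,1). If both differences are empty and there are no duplicates, the discrepancy presumably comes from the import settings or from the algorithm rather than from the data.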
Thanks a lot for any advice.
Attachments
movmt_edgelist_binary_1a.net (6.34 KiB)
movmt_edgelist_binary_1a.csv (8.24 KiB)

xptrxptr
Posts:31
Joined:21 Dec 2010 21:17

Re: discrepancies between methods of data importation

Post by xptrxptr » 15 Aug 2014 14:27

Thanks for providing your data. This is a nice example with clear modularity structure.

.....(modularity of 0.529 versus 0.536
.....This is really very intriguing to me

I am even more worried, since both of your reported results are wrong: they are too low.
I checked the modularity with Pajek; the right modularity is 0.564909.

The true partition into 6 clusters is:
1: 100 104 15 103 107 16 28 71 19 69 17 2 70 98 21 26 23 22 29 4 18 5 92 1 27 40
2: 49 52 87 55 51 54 3 56 58 53 88 86
3: 8 38 9 12 10 11 25 13 6 108 59 89
4: 24 101 7
5: 32 67 33 68 34 35 36 37 96 64 65 66 39 41 42 43 44 91 45 46 94 47 90 97 48 50 102 57 14 60 61 30 62 63 31
6: 72 73 74 75 76 77 78 79 80 81 82 83 20 84 85 93 95 99 105 106
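A value like this can be checked independently of either program by computing the modularity of a given partition directly from the edge list, using Q = sum over communities of (L_c / m) - (d_c / 2m)^2, where m is the number of edges, L_c the number of edges inside community c, and d_c the summed degree of its nodes. The sketch below is only an illustration under assumptions: an undirected, unweighted network without parallel edges, and hypothetical function names.

```python
from collections import defaultdict


def parse_partition(text):
    """Parse lines like '1: 100 104 15' into a {node: community} mapping."""
    community_of = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        label, members = line.split(":", 1)
        for node in members.split():
            community_of[node] = label.strip()
    return community_of


def modularity(edges, community_of):
    """Newman modularity of a partition for an undirected, unweighted edge list."""
    m = len(edges)
    degree_sum = defaultdict(int)  # d_c: summed degree of nodes in community c
    internal = defaultdict(int)    # L_c: number of edges with both ends in c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)
```

For example, two triangles joined by a single edge, split into the two triangles, give Q = 5/14. Feeding the six-cluster listing above to parse_partition() together with the edge list from the CSV file should reproduce the reported value if the partition and the edge list match.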

I hope somebody can explain why Gephi is providing wrong results.


Re: discrepancies between methods of data importation

Post by nellymar » 21 Aug 2014 00:14

Thank you very much for your reply. Could you tell me the sequence of commands to run in Pajek to obtain this result? I will go with Pajek's modularity for now. I already use Pajek (except for this), and it can sometimes be a little cryptic to figure out the right commands, so that would be really useful, thanks.
Concerning Gephi, I would like to add that I have sometimes obtained a slightly different result when analysing modularity on the exact same network (same import method) with the same procedure, so I was wondering whether there is any randomness involved in the process. Or maybe I am doing something wrong, which is also possible. As a hint, I also noticed that differences arise when I import a CSV file as an edge list and it has an additional column containing edge attributes (numbers), and this DESPITE the fact that I do not name this column WEIGHT; I therefore expected Gephi to ignore the column rather than treat it as edge weights. To be clearer, I obtain the following results with different files corresponding to the SAME network, where the only differences are whether they are valued or not and whether they have been converted to .net or not:
- edgelist_binary.csv (source,target), imported in the Data Laboratory: modularity 0.529
- edgelist_valued.csv (source,target,number), imported in the Data Laboratory with the 3rd column NOT called WEIGHT and imported as a string: modularity 0.532
- edgelist_binary.net, imported by opening the Pajek file (File>Open): modularity 0.536
- edgelist_valued.net, imported by opening the Pajek file (File>Open): modularity 0.536 as well (but I think that when the CSV edge list is converted into a Pajek file, the 3rd column containing the values is simply dropped, so it seems normal to me that the binary and valued networks, once converted to .net, are exactly the same and give the same results).
I hope someone can explain these discrepancies and reassure us as to the validity of the modularity method implemented in Gephi, or about the appropriate way to use it in case I am misusing it. I quite liked this method because it is well documented and the algorithm seems to perform well, as presented in the corresponding publication by Blondel et al. ("Fast unfolding of communities in large networks"). But until I find out what is wrong, I will try to stick with Pajek, if I can work out how to operate it.
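On the randomness question: in the Louvain method, the order in which nodes are visited can change which local optimum is reached, so run-to-run differences are plausible whenever that order is shuffled. The sketch below illustrates this with only the greedy local-moving pass (the first phase of the method, without the aggregation step), so it is neither the full algorithm nor Gephi's implementation. It assumes an undirected, unweighted network without self-loops, and all names are hypothetical.

```python
import random
from collections import defaultdict


def local_moving(edges, seed):
    """Greedy local-moving pass with a seeded, shuffled node order."""
    rng = random.Random(seed)
    neighbors = defaultdict(list)
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    nodes = sorted(neighbors)
    m = len(edges)
    degree = {n: len(neighbors[n]) for n in nodes}
    community = {n: n for n in nodes}  # start with one community per node
    total = dict(degree)               # summed degree per community
    improved = True
    while improved:
        improved = False
        rng.shuffle(nodes)             # the visiting order is the random part
        for n in nodes:
            old = community[n]
            total[old] -= degree[n]    # take n out of its community
            links = defaultdict(int)   # edges from n to each neighboring community
            for nb in neighbors[n]:
                if nb != n:
                    links[community[nb]] += 1
            # Gain (up to a constant factor) of putting n into community c.
            best, best_gain = old, links[old] - total[old] * degree[n] / (2 * m)
            for c, k in links.items():
                gain = k - total[c] * degree[n] / (2 * m)
                if gain > best_gain:
                    best, best_gain = c, gain
            community[n] = best
            total[best] += degree[n]
            if best != old:
                improved = True
    return community


def modularity(edges, community):
    """Newman modularity of a partition for an undirected, unweighted edge list."""
    m = len(edges)
    degree_sum, internal = defaultdict(int), defaultdict(int)
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)
```

Running local_moving() on the same edge list with several seeds and comparing the modularity() values shows whether the visiting order alone changes the outcome for a given network. The same seed always gives the same result, so any variation comes from the shuffled order.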
Thanks
Nelly


Re: discrepancies between methods of data importation

Post by xptrxptr » 21 Aug 2014 08:44

...Could you indicate me the sequence of commands to realise in Pajek to obtain this result?

Network/Create Partition/Communities/Louvain Method/Multi-Level Coarsening + Multi-Level Refinement

You can also check the Pajek mailing list for additional information or:

http://mrvar.fdv.uni-lj.si/pajek/commun ... xample.htm
