[SOLVED] Working with CSV's

Post by **rotten** » 07 Oct 2010 20:38

There appear to be two official documentation pages on CSVs:

The first one I found:
http://gephi.org/users/supported-graph- ... sv-format/
is really confusing because it starts off with semicolons in it, and then switches to commas.

Then I found this one:
http://wiki.gephi.org/index.php/Import_CSV_Data
which is more helpful.

I'm still a little confused. Here is what I think I understand so far:

If I want to build a graph in Gephi from CSV, I actually need two CSV files if I want extra properties (columns) on both the nodes and the edges. [ So I can use Partitioning and extra Filters. ]

One will define the nodes and the node properties. Each line in the csv represents a single node.

The other defines the edges, with each line representing an edge. Edges can have properties too.

Both imports expect the first line of the CSV to contain headers identifying the columns. If the header names match standard Gephi column names, that is where the data will be populated.

If an edge occurs more than once in the file, its weight is not incremented automatically unless you read the file in via "File/Open" instead of "Import CSV". (However, you can count the duplicate edges yourself and add a "Weight" column before reading it in.)
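
A minimal sketch of that pre-processing step in Python, assuming the edges file has a header with `Source` and `Target` columns and no existing `Weight` column (`add_weight_column` is a hypothetical helper name). Rows are counted by the exact (Source, Target) pair, so a-b and b-a stay separate, which suits directed edges; any extra columns are copied from the first occurrence of each edge.

```python
import csv
import io
from collections import Counter


def add_weight_column(edges_csv: str) -> str:
    """Collapse repeated Source,Target rows into one row with a Weight count.

    Assumes the header contains 'Source' and 'Target' and no 'Weight' column.
    Other columns are taken from the first occurrence of each edge.
    """
    reader = csv.DictReader(io.StringIO(edges_csv))
    fieldnames = list(reader.fieldnames or [])
    counts = Counter()
    first_row = {}  # (source, target) -> first full row seen for that pair
    for row in reader:
        key = (row["Source"], row["Target"])  # order matters: directed edges
        counts[key] += 1
        first_row.setdefault(key, row)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames + ["Weight"], lineterminator="\n")
    writer.writeheader()
    for key, row in first_row.items():  # dicts keep insertion order
        writer.writerow({**row, "Weight": counts[key]})
    return out.getvalue()
```

For example, two identical `Rick,Carrot,Eats` rows come out as a single `Rick,Carrot,Eats,2` row.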

After selecting "New Project", and navigating to the Data Laboratory, I should import the nodes CSV first. Once the nodes table is populated, I can import the edges.

The problem is that the edge import doesn't recognize any of the nodes I already added. My theory was that this is because when I imported the nodes, they had extra columns.

If I import the edges first, and let it seed the nodes table, the same thing happens when I go to import the nodes. The nodes that get pulled in don't map to the ones already loaded.

If I remove the extra columns, they still don't seem to map to each other. So maybe the extra-column theory is wrong. Should Source and Target in the Edge CSV be Ids, Labels, or some other value in the node table?
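
One way to narrow this down is to check which Source/Target values are missing from a given node column. A minimal Python sketch, assuming (not confirmed here) that the edge importer matches endpoints against a single node column, `Id` by default; `unmatched_endpoints` is a hypothetical helper name.

```python
import csv
import io


def unmatched_endpoints(nodes_csv: str, edges_csv: str, key: str = "Id") -> set:
    """Return Source/Target values that do not appear in the nodes file's `key` column.

    Assumption: edges are linked to existing nodes by comparing Source/Target
    against one node column (here `key`, default "Id").
    """
    nodes = csv.DictReader(io.StringIO(nodes_csv))
    if key not in (nodes.fieldnames or []):
        raise ValueError(f"nodes file has no {key!r} column: {nodes.fieldnames}")
    known = {row[key] for row in nodes}

    missing = set()
    for row in csv.DictReader(io.StringIO(edges_csv)):
        for endpoint in (row["Source"], row["Target"]):
            if endpoint not in known:
                missing.add(endpoint)
    return missing
```

Run against the sample files below with `key="Label"`, it should return an empty set, while the default `key="Id"` raises an error because `test-nodes.csv` has no Id column.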

If I load in just the edges (with properties) it will create the nodes and build me a graph with properties and I can use the Partitioning tool on the edges.

** I've also noticed that if I close the project, (don't save) and then select "New Project" and import the nodes again, the 'Id' column numbering starts up where it left off from the last project. This is a minor bug. I don't really mind exiting and restarting Gephi after each experiment with CSV imports. But I thought I should mention it.

[ I'm working in 0.7Beta. ]

Here are a couple of sample CSVs of the sort I've been experimenting with:

test-nodes.csv

Label,PPAVM
Rick,Person
Jeff,Person
Liam,Person
Farm,Place
City,Place
Cow,Animal
Dog,Animal
Carrot,Vegetable
Brocolli,Vegetable
Salt,Mineral

test-edges.csv

Source,Target,Relationship
Rick,Carrot,Eats
Dog,Brocolli,Eats
Cow,Farm,Lives
Brocolli,Farm,Grows
Dog,City,Lives
Jeff,Cow,Eats
Liam,City,Lives

Post by **admin** » 07 Oct 2010 21:37

Hi,

Currently we have a naming problem: CSV is a format that can be used for various cases, e.g. encoding an adjacency matrix, or saving an Excel-like data sheet.

The first one is only usable to import a graph topology (nodes and edges without additional information).

The second is to generate a network from a spreadsheet.

Post by **rotten** » 07 Oct 2010 22:13

If I understand you then, what I'm trying to do with comma separated text file inputs is not possible in 0.7 Beta.

Post by **admin** » 08 Oct 2010 09:28

Not this way.

However, you can create a GDF file, which is comma-separated, with a list of nodes and a list of edges.

Post by **rotten** » 08 Oct 2010 15:37

Aha! Thanks for the pointer. GDF works great.

nodedef> name VARCHAR,label VARCHAR, ppavm VARCHAR
Rick,Rick,Person
Jeff,Jeff,Person
Liam,Liam,Person
Farm,Farm,Place
City,City,Place
Cow,Cow,Animal
Dog,Dog,Animal
Carrot,Carrot,Vegetable
Brocolli,Brocolli,Vegetable
Salt,Salt,Mineral
edgedef>node1 VARCHAR,node2 VARCHAR,relationship VARCHAR
Rick,Carrot,Eats
Dog,Brocolli,Eats
Cow,Farm,Lives
Brocolli,Farm,Grows
Dog,City,Lives
Jeff,Cow,Eats
Liam,City,Lives
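
The hand-written file above can also be generated from the two original CSVs. A minimal Python sketch, assuming the nodes CSV has a `Label` column that is reused as the GDF node name, the edges CSV has `Source` and `Target` columns, every other column is declared as `VARCHAR` under a lower-cased name, and no value contains a comma (values are written unquoted). `csvs_to_gdf` is a hypothetical helper name.

```python
import csv
import io


def csvs_to_gdf(nodes_csv: str, edges_csv: str) -> str:
    """Build a GDF document (nodedef>/edgedef> sections) from two CSV strings."""
    nodes = list(csv.reader(io.StringIO(nodes_csv)))
    edges = list(csv.reader(io.StringIO(edges_csv)))
    node_header, node_rows = nodes[0], nodes[1:]
    edge_header, edge_rows = edges[0], edges[1:]

    # The Label value doubles as the GDF 'name' (the node key) and 'label'.
    label_idx = node_header.index("Label")
    node_extra = [i for i in range(len(node_header)) if i != label_idx]
    src_idx, tgt_idx = edge_header.index("Source"), edge_header.index("Target")
    edge_extra = [i for i in range(len(edge_header)) if i not in (src_idx, tgt_idx)]

    lines = ["nodedef>" + ",".join(
        ["name VARCHAR", "label VARCHAR"]
        + [f"{node_header[i].lower()} VARCHAR" for i in node_extra])]
    for row in node_rows:
        lines.append(",".join([row[label_idx], row[label_idx]] + [row[i] for i in node_extra]))

    lines.append("edgedef>" + ",".join(
        ["node1 VARCHAR", "node2 VARCHAR"]
        + [f"{edge_header[i].lower()} VARCHAR" for i in edge_extra]))
    for row in edge_rows:
        lines.append(",".join([row[src_idx], row[tgt_idx]] + [row[i] for i in edge_extra]))
    return "\n".join(lines) + "\n"
```

Fed `test-nodes.csv` and `test-edges.csv`, it should produce the same node and edge rows as the GDF above.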

Post by **eduramiba** » 08 Oct 2010 20:52

Yes, you are right that having two ways of importing CSV data can be confusing.

I designed the Data Laboratory CSV import as a simple rows-and-columns importer for spreadsheets (CSV only for now).
Since these rows are nodes or edges, the wizard offers some node- or edge-specific options, like assigning ids or creating nodes.

For your files, you would need an Id column in the nodes file with the same values as the Label column, so that the edges can be imported successfully.
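
A minimal Python sketch of that fix, assuming labels are unique (each one becomes a node Id) and the file has a `Label` header as in `test-nodes.csv`; `add_id_from_label` is a hypothetical helper name.

```python
import csv
import io


def add_id_from_label(nodes_csv: str) -> str:
    """Prepend an 'Id' column whose value equals each row's 'Label'.

    Edges whose Source/Target hold label values can then match the Id column.
    Assumes labels are unique; duplicates would produce duplicate Ids.
    """
    reader = csv.DictReader(io.StringIO(nodes_csv))
    fieldnames = ["Id"] + [f for f in (reader.fieldnames or []) if f != "Id"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["Id"] = row["Label"]
        writer.writerow(row)
    return out.getvalue()
```

With `test-nodes.csv` the first rows become `Id,Label,PPAVM` and `Rick,Rick,Person`, which the Source/Target values in `test-edges.csv` can then match.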

Post by **Gogolo** » 09 Nov 2010 22:09

I also have a lot of problems importing csv format. There seem to be a lot of bugs.

Also, GDF is not working for me. I carefully followed the instructions, but the file won't load (Java reports errors on import).

A proper XLS import of separate edge and node tables with attributes would be very nice. It should also be considered whether the source/target relation of an edge could be an attribute indicating the direction of the relation (like the Cytoscape SIF or NNF format). This would reduce the amount of data and open up other possibilities for attributes relating to both of the edges.

Olivier

Post by **rbelew** » 12 Sep 2011 19:20

Any guess why I would not have the CSV wizard button in my Data Table pane?

FYI: Gephi 0.8 alpha, System: Linux version 2.6.32-33-generic running on i386; UTF-8; en_US (gephi)

Post by **diannepat** » 15 Dec 2011 00:14

I have encountered some unexpected behavior in Gephi's "import spreadsheet" function.
I'm hoping someone can tell me whether this is a bug, or the behavior makes sense in some way.
---------------------------------------------------------
I created an Excel sheet like this and imported it:
source target type
a b undirected
a c undirected
a c undirected
b c undirected
c b undirected

Gephi correctly identifies the a-c relationship as being entered twice, and includes it only once with a weight of 2.
I have no quarrel with this behavior.

However, Gephi fails to do the same thing for the b-c relationship.
It removes c-b entirely (recognizing that the pair is repeated), BUT it does not change the weight of b-c to 2.
I could understand doing nothing (leaving b-c and c-b as separate relationships with weights of 1), OR removing c-b but giving b-c a weight of 2,
but I don't understand why you would remove c-b but fail to change the weight of b-c.
-------------------------------
Here is what Gephi imports from the above example:

source target weight type
a b 1.0 Undirected
a c 2.0 Undirected
b c 1.0 Undirected

It seems to me (naively) that if a-c is weighted as 2, then b-c should be weighted as 2, ELSE b-c and c-b should be left in place.

So, is this a bug, or is it meant to be this way?

Thanks,

Dianne

Post by **eduramiba** » 15 Dec 2011 00:20

Hi,
There cannot be an undirected edge a-b if b-a already exists (that is only possible with directed edges).
So yes, this is a bug. The importer should consider them the same edge, since no direction is specified, and sum the weights.
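
The merging described here can be sketched by keying each undirected edge on its sorted endpoint pair. This is an illustration in Python, not Gephi's importer code; `merge_undirected` is a hypothetical name, and the input is assumed to be (source, target, weight) tuples.

```python
from collections import Counter


def merge_undirected(edges):
    """Sum weights of undirected edges, treating (u, v) and (v, u) as one edge.

    The key is the sorted endpoint pair, so b-c and c-b land on the same entry;
    the first-seen orientation is kept for the output rows.
    """
    weights = Counter()
    first_seen = {}
    for source, target, weight in edges:
        key = tuple(sorted((source, target)))
        weights[key] += weight
        first_seen.setdefault(key, (source, target))
    # Counter keeps insertion order, so output follows first appearance.
    return [(*first_seen[k], w) for k, w in weights.items()]
```

On Dianne's rows (all with weight 1) this gives a-b 1, a-c 2 and b-c 2, the result she expected.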

Thanks for reporting this bug, I will fix it.

Eduardo

Gephi forums