The first one I found:
http://gephi.org/users/supported-graph- ... sv-format/
is really confusing because it starts off with semicolons in it, and then switches to commas.
Then I found this one:
which is more helpful.
I'm still a little confused. Here is what I think I understand so far:
To build a graph in Gephi from CSV with extra properties (columns) on both the nodes and the edges, I actually need two CSV files. [ So I can use Partitioning and extra Filters. ]
One will define the nodes and the node properties. Each line in the CSV represents a single node.
The other defines the edges, with each line representing an edge. Edges can have properties too.
Both imports expect the first line of the CSV to contain headers identifying the columns. If the header names match standard Gephi column names, that is where the data will be populated.
If an edge occurs more than once in the file, the weight value will not automatically be incremented unless you read the file in via "File/Open" instead of "Import CSV". (However, you can count up the duplicate edges and add a "Weight" column before trying to read it in.)
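Counting up the duplicates before the import could look something like the sketch below. It is only a minimal example, assuming the edge file has "Source" and "Target" headers and that edges are directed (Rick,Carrot and Carrot,Rick count as different edges). The function name `add_weights` is made up for illustration, and any extra columns such as Relationship are dropped in this version.

```python
import csv
import io
from collections import Counter


def add_weights(edge_csv_text):
    """Collapse duplicate (Source, Target) rows into one row with a Weight column.

    Hypothetical helper: assumes 'Source' and 'Target' headers and directed edges.
    Other columns are not carried over.
    """
    reader = csv.DictReader(io.StringIO(edge_csv_text))
    # Count how many times each (Source, Target) pair occurs.
    counts = Counter((row["Source"], row["Target"]) for row in reader)

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    # Counter keeps first-seen order, so the output order follows the input.
    for (source, target), weight in counts.items():
        writer.writerow([source, target, weight])
    return out.getvalue()


sample = "Source,Target\nRick,Carrot\nRick,Carrot\nDog,City\n"
print(add_weights(sample))
```

The resulting text can be saved to a file and imported as the edges CSV in place of the original.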
After selecting "New Project", and navigating to the Data Laboratory, I should import the nodes CSV first. Once the nodes table is populated, I can import the edges.
The problem is that the edge import doesn't recognize any of the nodes I already added. My theory was that this is because when I imported the nodes, they had extra columns.
If I import the edges first, and let it seed the nodes table, the same thing happens when I go to import the nodes. The nodes that get pulled in don't map to the ones already loaded.
If I remove the extra columns, they still don't seem to map to each other. So maybe the extra-column theory is wrong. Should Source and Target in the Edge CSV be Ids, Labels, or some other value in the node table?
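One way to test the Ids-versus-Labels question outside Gephi is to check which Source/Target values actually appear in a given column of the nodes file. The sketch below is only that, a check under assumptions: the function name `unmatched_endpoints` is made up, and the "Id" column in the sample nodes text is something I added for the test (my real nodes file only has Label and PPAVM). It does not say which column Gephi itself matches on.

```python
import csv
import io


def unmatched_endpoints(nodes_csv_text, edges_csv_text, key_column):
    """Return the edge endpoints that do not appear in key_column of the nodes table.

    Hypothetical helper: assumes the edges file has 'Source' and 'Target' headers.
    """
    nodes = csv.DictReader(io.StringIO(nodes_csv_text))
    known = {row[key_column] for row in nodes}

    edges = csv.DictReader(io.StringIO(edges_csv_text))
    missing = set()
    for row in edges:
        for endpoint in (row["Source"], row["Target"]):
            if endpoint not in known:
                missing.add(endpoint)
    return sorted(missing)


# The 'Id' column here is an assumption added for the experiment.
nodes_text = "Id,Label\nRick,Rick\nCarrot,Carrot\n"
edges_text = "Source,Target,Relationship\nRick,Carrot,Eats\nDog,City,Lives\n"
print(unmatched_endpoints(nodes_text, edges_text, "Id"))
```

Running it once with key_column set to "Id" and once with "Label" shows which endpoints would fail to match under each hypothesis.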
If I load in just the edges (with properties) it will create the nodes and build me a graph with properties and I can use the Partitioning tool on the edges.
** I've also noticed that if I close the project (without saving), then select "New Project" and import the nodes again, the "Id" column numbering picks up where it left off in the previous project. This is a minor bug. I don't really mind exiting and restarting Gephi after each experiment with CSV imports, but I thought I should mention it.
[ I'm working in 0.7Beta. ]
Here are a couple of sample CSVs of the sort I've been experimenting with:
Nodes:
Label,PPAVM
Rick,Person
Jeff,Person
Liam,Person
Farm,Place
City,Place
Cow,Animal
Dog,Animal
Carrot,Vegetable
Brocolli,Vegetable
Salt,Mineral

Edges:
Source,Target,Relationship
Rick,Carrot,Eats
Dog,Brocolli,Eats
Cow,Farm,Lives
Brocolli,Farm,Grows
Dog,City,Lives
Jeff,Cow,Eats
Liam,City,Lives