Anonymize Data

nullusadinfinitum
Posts:13
Joined:08 Jun 2011 07:31

Post by nullusadinfinitum » 01 Aug 2011 09:06

Hello,

Does anyone know of a script or application to anonymize network data (in .CSV format)? I've got some new data I've collected and I want to make it available in the public domain. I've got CSV files that look like this:

ACTORA,ACTORB
ACTORA,ACTORC
ACTORB,ACTORD,ACTORF,ACTORG,ACTORH
ACTORC,ACTORD,ACTORG,ACTORI,ACTORJ
ACTORC,ACTORE

I'm trying to convert ACTORA to 1 and ACTORB to 2, etc. I.e., I want it to look like this:

1,2
1,3
2,4,5,6,7
3,4,6,8,9
3,10

Does anyone know how to do this? Have a look at the data sets at http://snap.stanford.edu/data/index.html. That format would be perfect. I'd really appreciate your help on this, as I want to make the data available to the community. :P
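
For reference, most of the SNAP files are plain edge lists: one pair of numeric node IDs per line, separated by whitespace, with optional comment lines starting with #. Below is a minimal Python sketch of getting from numbered rows like the ones above to that shape. It assumes the first entry of each row is linked to every other entry in that row, which may not match how the data was collected, and the function name rows_to_edge_pairs is made up for illustration.

```python
def rows_to_edge_pairs(rows):
    """Expand rows like [1, 2, 3] into the pairs (1, 2) and (1, 3).

    Assumption: the first ID in a row is the source and every
    remaining ID in that row is one of its neighbours.
    """
    pairs = []
    for row in rows:
        if not row:
            # Skip empty rows instead of failing on them.
            continue
        source, *targets = row
        for target in targets:
            pairs.append((source, target))
    return pairs


# The numbered rows from the example above.
rows = [
    [1, 2],
    [1, 3],
    [2, 4, 5, 6, 7],
    [3, 4, 6, 8, 9],
    [3, 10],
]

# One tab-separated pair per line, as in a typical edge-list file.
edge_lines = ["%d\t%d" % pair for pair in rows_to_edge_pairs(rows)]
print("\n".join(edge_lines))
```

If the rows instead describe groups where everyone is connected to everyone else, the inner loop would need to pair up all members of a row rather than only the first one.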

Thank you kindly!

eduramiba
Gephi Code Manager
Posts:1064
Joined:22 Mar 2010 15:30
Location:Madrid, Spain

Re: Anonymize Data

Post by eduramiba » 01 Aug 2011 13:28

Hi, I don't know of one, but if you are using Gephi you can use the default generated Ids and remove the personal data (for example, by copying the Id column to the label column).

Eduardo

Re: Anonymize Data

Post by nullusadinfinitum » 01 Aug 2011 13:55

eduramiba wrote: Hi, I don't know of one, but if you are using Gephi you can use the default generated Ids and remove the personal data (for example, by copying the Id column to the label column).

Eduardo

Thank you for your help. The problem is that the label and Id columns are identical and both contain the personal data (I'm opening a CSV file). Any idea how I can anonymize these?

Re: Anonymize Data

Post by eduramiba » 01 Aug 2011 14:36

Oh, I see, I can't find a way to do this easily without programming.

Re: Anonymize Data

Post by nullusadinfinitum » 01 Aug 2011 17:03

eduramiba wrote: Oh, I see, I can't find a way to do this easily without programming.

Any idea where I can go to get help with writing some code for this? How much code would it be to do something like that?

Re: Anonymize Data

Post by eduramiba » 01 Aug 2011 19:09

It should only take a short piece of code. We can import the file with the Gephi Toolkit, set the node Ids to 1, 2, 3, ... and export it.

Re: Anonymize Data

Post by nullusadinfinitum » 01 Aug 2011 20:51

eduramiba wrote: It should only take a short piece of code. We can import the file with the Gephi Toolkit, set the node Ids to 1, 2, 3, ... and export it.

Hmm, can you walk me through an example?

Re: Anonymize Data

Post by eduramiba » 01 Aug 2011 22:08

Hi, I just had an idea: you could do it with this wonderful plugin, the Script Console: http://gephi.org/plugins/script-console/

It is really simple:
1. Open Gephi 0.8 alpha.
2. Go to Tools, Plugins, Available Plugins and install the Script Console plugin.
3. Restart Gephi.
4. Open your graph file, then copy and paste the following code into the Script Console:

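import java.lang.String as String

# Renumber every node: the first node gets Id "1", the second "2", and so on.
i = 0
graph = getGraph()
for n in graph.getNodes():
    i = i + 1
    # Convert the counter to a string before assigning it as the new Id.
    graph.setId(n, String.valueOf(i))
print i, "nodes"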
Then click Run.

That should be enough to anonymize the Id column. For the label column, you can copy the Id column values over it in the Data Laboratory, for example.

Eduardo

Re: Anonymize Data

Post by nullusadinfinitum » 02 Aug 2011 05:00

In addition to the solutions proposed above, I have obtained the following Python code to anonymize the data programmatically:

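# Usage: python anonymize.py input.csv > anonymized.csv (file names are examples)
import sys

# Maps each original name to its replacement number (stored as a string).
hashes = {}
count = 1
with open(sys.argv[1]) as f1:
    for line in f1:
        actors = line.strip("\n").split(',')
        hashActors = []
        for actor in actors:
            try:
                # Name seen before: reuse the number it was given.
                hashActors.append(hashes[actor])
            except KeyError:
                # New name: assign the next number and remember it.
                hashes[actor] = str(count)
                hashActors.append(str(count))
                count += 1
        # Write the converted row to standard output.
        print(",".join(hashActors))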
Thought I would post it here in case someone needs to anonymize data in the future. Thank you to everyone who assisted with this issue! Much obliged.
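
If you also want to keep a private key for mapping the numbers back to the original names, the same idea can be wrapped in a function that returns the lookup table alongside the converted rows. This is only a sketch: anonymize_rows and the sample rows are illustrative, not part of the script above, and the key should stay out of anything you publish.

```python
import csv
import io


def anonymize_rows(rows):
    """Replace each name with a number assigned in order of first appearance.

    Returns the converted rows and the name -> number lookup table.
    """
    key = {}
    converted = []
    for row in rows:
        new_row = []
        for name in row:
            # setdefault stores the next free number only for unseen names
            # and otherwise returns the number already assigned.
            new_row.append(key.setdefault(name, len(key) + 1))
        converted.append(new_row)
    return converted, key


# Sample input in the same shape as the CSV from the first post.
sample = "ACTORA,ACTORB\nACTORA,ACTORC\nACTORB,ACTORD,ACTORF\n"
rows = list(csv.reader(io.StringIO(sample)))

converted, key = anonymize_rows(rows)
for row in converted:
    print(",".join(str(n) for n in row))
```

To run it on a real file, read the rows with csv.reader from an opened file instead of the in-memory sample, and write the key to a separate file that you keep to yourself.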

Re: Anonymize Data

Post by nullusadinfinitum » 03 Aug 2011 12:49

seniyajw wrote: In the case of parallel edges, I suggest to alert the user and make the "road Import" to act as a CSV file importer, adding weight, if possible, and leave blank the other attributes. I opened a mistake.

Not quite sure I understand. Would you be able to elaborate? Are you referring to the Python code?
