ENRON email dataset visualization

Merlin Blume
Posts:6
Joined:02 Jun 2012 15:32
ENRON email dataset visualization

Post by Merlin Blume » 02 Jun 2012 15:45

Hi there,

my name is Merlin and I am studying at a university in Germany. For one class we have to visualize the Enron email dataset in some way. We are pretty much free to choose the tools. Today I found Gephi for the first time. Awesome project!

My question is how you would, in general, go about creating a highly interactive program (GUI + real-time visualization) that shows the connections between people at Enron and supports some kind of fraud detection (kind of like the "Enron corpus explorer" by J. Heer). It should also be possible to look at the actual emails. Some kind of categorization would be neat, too.

I am pretty new to this subject, and the time my group and I have left is pretty limited!

This is what we have achieved so far: we have put the raw data into an organized MySQL database to work with.

The initial plan was to use Java combined with Processing, with Eclipse as the development IDE. But using Processing would mean that we have to build pretty much everything from scratch (except the graphics library stuff). There are also some older libraries like Prefuse, but development has stopped and there is no support left.

So what do you think: is the Gephi Toolkit an easy, clean and effective way to create a tool that fulfills these criteria? Do you have any other ideas? Is there already something similar (exploratory email traffic visualization) out there that we can build on?

Thank you VERY much for your help!

Bye, Merlin.

seinecle
Gephi Community Support
Posts:546
Joined:08 Feb 2010 16:55
Location:Lyon, France

Re: ENRON email dataset visualization

Post by seinecle » 04 Jun 2012 08:41

Hi,

I am not sure... if you build a GUI on top of the Gephi Toolkit, that's pretty much reinventing the Gephi desktop application, no? But if you want to do it anyway, I suppose you would have to be familiar with the NetBeans development environment.

Does it have to be Java? For a simpler GUI in the browser, you could look at Sigma.js and build on it to suit your needs. Many other JavaScript libraries would help you build nice browser-based network apps (d3.js for one).

If it has to be Java, there is also GraphStream (http://graphstream-project.org/doc/Gallery/).
I've never used it but the graphics are outstanding. It probably does not include a ready-made API to connect to a SQL database, but hey, creating that would be your job, no?

Best,

Clement






Merlin Blume
Posts:6
Joined:02 Jun 2012 15:32

Re: ENRON email dataset visualization

Post by Merlin Blume » 04 Jun 2012 11:32

Hi Clement,

thank you for the fast answer!

It doesn't have to be Java. Pretty much anything can be used, so JavaScript is an alternative too. We just have to find a simple way (because of the lack of time) to implement an exploratory approach to visualizing the Enron email dataset. It also doesn't have to be a graph, but I guess this form of representation would make sense.

Using JavaScript and d3.js would be a great benefit on the one hand, because d3.js already comes with some predefined chart types. On the other hand, I don't know if d3.js could handle the enormous amount of data. As far as I know there is no way to use MySQL directly, so we would have to export all the emails and related data into a JSON or CSV file.
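
To make the export idea a bit more concrete, here is a minimal sketch of how rows from the database could be aggregated into a nodes-and-links structure before being written out as JSON. The row fields (`sender`, `recipient`), the function name `buildGraph` and the sample addresses are assumptions made up for illustration; they are not taken from our actual schema. Aggregating to one weighted link per sender/recipient pair is one possible way to keep the file far smaller than the raw mail table.

```javascript
// Hypothetical row shape: one object per (sender, recipient) pair of an email,
// e.g. as exported from a MySQL table. The field names are assumptions.
function buildGraph(rows) {
  const nodes = new Map(); // address -> { id, sent, received }
  const links = new Map(); // "sender\u0000recipient" -> { source, target, weight }

  // Return the node for an address, creating it on first use.
  const touch = (address) => {
    if (!nodes.has(address)) {
      nodes.set(address, { id: address, sent: 0, received: 0 });
    }
    return nodes.get(address);
  };

  for (const row of rows) {
    touch(row.sender).sent += 1;
    touch(row.recipient).received += 1;

    // Collapse repeated mails between the same pair into one weighted link.
    const key = row.sender + "\u0000" + row.recipient;
    const link = links.get(key);
    if (link) {
      link.weight += 1;
    } else {
      links.set(key, { source: row.sender, target: row.recipient, weight: 1 });
    }
  }

  return { nodes: [...nodes.values()], links: [...links.values()] };
}

// Made-up sample data, only to show the call.
const sampleRows = [
  { sender: "alice@example.com", recipient: "bob@example.com" },
  { sender: "alice@example.com", recipient: "bob@example.com" },
  { sender: "bob@example.com", recipient: "carol@example.com" },
];

console.log(JSON.stringify(buildGraph(sampleRows), null, 2));
```

The resulting object could be saved as a JSON file and loaded by a browser-side library; whether it is small enough for the full dataset would still have to be tested.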

I only have a little knowledge of JavaScript and its libraries. Is it possible to build GUIs with just some buttons, range sliders etc. and a panel that holds the visualization?


Our first idea was to work with Processing and Java, using Eclipse as the IDE. This way you can create a Swing-based GUI and place several PApplets into your program. The problem is that Processing doesn't come with any specialized InfoVis library (afaik), so everything would have to be done from scratch. What do you think?


Bye Merlin

PS: If you have a look at the attachment, you can see another way we thought of to analyse the emails. This sort of pipeline breaks the huge number of mails down with every step until only a few of them are left. What would you say is a nice and easy way to realize something like that (regarding the language, frameworks ...)? Do you think we could handle this neatly using JS and d3?
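
A rough sketch of what such a narrowing pipeline could look like in plain JavaScript, just to illustrate the idea. The stage names, the email fields (`sender`, `date`, `body`) and the sample mails are all hypothetical and not taken from the attachment. Each stage is a filter, and the number of remaining mails is recorded after every step so it could later be shown in a chart.

```javascript
// Apply the stages in order and record how many emails survive each one.
function runPipeline(emails, stages) {
  const counts = [{ stage: "all", remaining: emails.length }];
  let current = emails;
  for (const { name, keep } of stages) {
    current = current.filter(keep);
    counts.push({ stage: name, remaining: current.length });
  }
  return { emails: current, counts };
}

// Hypothetical stage builders. Dates are ISO strings (YYYY-MM-DD),
// which compare correctly as plain strings.
const inRange = (from, to) => (mail) => mail.date >= from && mail.date <= to;
const fromSender = (address) => (mail) => mail.sender === address;
const mentions = (word) => (mail) =>
  mail.body.toLowerCase().includes(word.toLowerCase());

// Made-up sample mails, only to show the call.
const emails = [
  { sender: "alice@example.com", date: "2001-03-01", body: "Quarterly numbers attached" },
  { sender: "bob@example.com", date: "2001-05-12", body: "Lunch on Friday?" },
  { sender: "alice@example.com", date: "2001-06-30", body: "Please review the memo" },
];

const result = runPipeline(emails, [
  { name: "date range", keep: inRange("2001-01-01", "2001-12-31") },
  { name: "sender", keep: fromSender("alice@example.com") },
  { name: "keyword", keep: mentions("memo") },
]);

console.log(result.counts);
```

Keeping the stages as a list of small functions means the GUI could add, remove or reorder steps without touching the pipeline itself.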

Thanks again for your help!!!
Attachments
flowChart + diagramms.pdf
(241.97KiB)Downloaded 410 times

seinecle
Gephi Community Support
Posts:546
Joined:08 Feb 2010 16:55
Location:Lyon, France

Re: ENRON email dataset visualization

Post by seinecle » 04 Jun 2012 12:20

Hi,

To make things clear: I have a self-taught, limited background in programming, so I hope other interested readers will jump into this conversation to provide a more professional point of view.

On JavaScript and GUI: yes, it is very possible to create an interactive GUI, see for example http://www.velt.info/rencnum/mentions.html (source at: https://github.com/raphv/gexf-js)

You don't have to build these UI elements from scratch. Many libs exist that are easy to use. For example, jQuery, which makes JS even easier to learn, has a dedicated collection of UI widgets: http://jqueryui.com.

One of these widgets: http://jqueryui.com/demos/slider/range.html

Other libs built on jQuery or written in plain JS will help you a lot.
For example, googling "jquery wordle" I found this: http://primegap.net/2011/03/04/jqcloud- ... rd-clouds/

Also, Processing is available in the browser as Processing.js, or you can use raphael.js, which has fewer features but is compatible with all browsers (because it draws with SVG and VML rather than the HTML5 canvas).

So if time is a constraint, using JavaScript surely provides a solution.

Finally, on the server side of things: I have no experience here, but I don't see that as a big issue. There must be easy JavaScript or jQuery libs for fetching data from a MySQL database quickly (through a server-side script), and Stack Overflow will help you solve problems along the way.

Good luck!

Best,

Clement

Merlin Blume
Posts:6
Joined:02 Jun 2012 15:32

Re: ENRON email dataset visualization

Post by Merlin Blume » 05 Jun 2012 08:03

Hi Clement,

thank you for your useful input. Today we will discuss which way we want to go: JavaScript or Java. It also depends on what the others prefer.

Bye Merlin.

PS: As soon as we start with the programming I will let you know about the progress that we make (hopefully ... :D).

Merlin Blume
Posts:6
Joined:02 Jun 2012 15:32

Re: ENRON email dataset visualization

Post by Merlin Blume » 26 Jun 2012 10:35

Hi Clement,

it has been a long time since I last wrote in this topic. Right now I am considering using the gexf-js script that you mentioned last time. Do you know if there is any way (a plugin) to encode information queried from the MySQL database using PHP into a GEXF file? We have to create the file dynamically because the user can select the time range within which he wants to see the emails visualized as a graph.
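
To show what I mean by creating the file dynamically, here is a minimal sketch of the idea. It is written in JavaScript rather than PHP, only to illustrate the steps; the same logic could be ported to a PHP script. The row fields (`sender`, `recipient`, `date`) and the function names `toGexf` and `escapeXml` are assumptions for illustration. The output follows the basic GEXF 1.2 layout (a `gexf` root with `nodes` and `edges` lists); a viewer such as gexf-js may additionally expect layout information (node positions) that this sketch does not produce.

```javascript
// Escape the five characters that are special in XML attribute values.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build a GEXF document from the rows whose date lies in [from, to].
// Dates are assumed to be ISO strings (YYYY-MM-DD), which compare as strings.
function toGexf(rows, from, to) {
  const ids = new Map();     // address -> numeric node id
  const weights = new Map(); // "sourceId-targetId" -> number of emails

  const idOf = (address) => {
    if (!ids.has(address)) ids.set(address, ids.size);
    return ids.get(address);
  };

  for (const row of rows) {
    if (row.date < from || row.date > to) continue; // outside the selection
    const key = idOf(row.sender) + "-" + idOf(row.recipient);
    weights.set(key, (weights.get(key) || 0) + 1);
  }

  const nodeLines = [...ids].map(
    ([address, id]) => `      <node id="${id}" label="${escapeXml(address)}" />`
  );
  const edgeLines = [...weights].map(([key, weight], i) => {
    const [source, target] = key.split("-");
    return `      <edge id="${i}" source="${source}" target="${target}" weight="${weight}" />`;
  });

  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">',
    '  <graph mode="static" defaultedgetype="directed">',
    "    <nodes>",
    ...nodeLines,
    "    </nodes>",
    "    <edges>",
    ...edgeLines,
    "    </edges>",
    "  </graph>",
    "</gexf>",
  ].join("\n");
}

// Made-up sample rows, only to show the call.
console.log(
  toGexf(
    [
      { sender: "alice@example.com", recipient: "bob@example.com", date: "2001-02-01" },
      { sender: "alice@example.com", recipient: "bob@example.com", date: "2001-02-05" },
      { sender: "bob@example.com", recipient: "carol@example.com", date: "2002-01-01" },
    ],
    "2001-01-01",
    "2001-12-31"
  )
);
```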

Bye Merlin

Merlin Blume
Posts:6
Joined:02 Jun 2012 15:32

Re: ENRON email dataset visualization

Post by Merlin Blume » 27 Jun 2012 17:26

... also, I can't find out how to use a sort of "click listener" in sigma.js so that I can get the node a user has selected.
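
Roughly, this is the kind of thing I am after. The sketch below assumes an API in the style of sigma.js 1.x, where an instance exposes `bind("clickNode", handler)` and the clicked node arrives as `event.data.node`; other sigma.js releases use different event names, so this is an assumption and not a confirmed answer. The helper `onNodeSelected` and the stand-in object `createFakeSigma` are made up so that the sketch runs without a browser.

```javascript
// Register a callback that receives the id and label of a clicked node.
// ASSUMPTION: the instance follows the sigma.js 1.x convention of
// bind("clickNode", handler) with the node available as event.data.node.
function onNodeSelected(sigmaInstance, callback) {
  sigmaInstance.bind("clickNode", (event) => {
    const node = event.data.node;
    callback({ id: node.id, label: node.label });
  });
}

// Minimal stand-in for a sigma instance (hypothetical, for illustration only).
function createFakeSigma() {
  const handlers = {};
  return {
    bind(name, handler) {
      (handlers[name] = handlers[name] || []).push(handler);
    },
    emit(name, data) {
      (handlers[name] || []).forEach((handler) => handler({ type: name, data }));
    },
  };
}

const fake = createFakeSigma();
const selected = [];
onNodeSelected(fake, (node) => selected.push(node));

// Simulate a user clicking a node.
fake.emit("clickNode", { node: { id: "n1", label: "alice@example.com" } });
console.log(selected);
```

With a real instance, the callback would be the place to look up and display the emails of the selected person.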

Can you help me please?

seinecle
Gephi Community Support
Posts:546
Joined:08 Feb 2010 16:55
Location:Lyon, France

Re: ENRON email dataset visualization

Post by seinecle » 03 Jul 2012 14:40

Hi,

Sorry that I can't help you out on either issue. For SQL to XML through PHP, I think you found a solution on Stack Overflow? If that's not you, maybe it could inspire you:
http://stackoverflow.com/questions/1124 ... p-too-slow

For the question on Sigma.js, I suggest you try to contact the creator, @jacomyal on Twitter; he might be able to reply to you directly.

Good luck with the deadline!

Best,

Clement
