Indexed Attributes API using Lucene

GSoC developers forum
admin
Gephi Community Manager
Posts:964
Joined:09 Dec 2009 14:41
Indexed Attributes API using Lucene

Post by admin » 19 Mar 2011 18:09

This is the thread for asking for more details about the Indexed Attributes API using Lucene proposal.

eaneiros
Gephi Plugin Developer
Posts:3
Joined:30 Mar 2011 22:32

Re: Indexed Attributes API using Lucene

Post by eaneiros » 04 Apr 2011 21:13

Hi,

I'm interested in working on the Lucene proposal over the summer and have implemented a very simple proof of concept. Anyone interested can pull the code from https://code.launchpad.net/~eaneiros/gephi/lucene1. It is not meant to showcase any design or architecture; it is just a way to get an idea of which path to follow and a feeling for how the integration between Lucene and the Attributes API might work.

To test the feature follow the steps below:

1 - Download the branch, open it in NetBeans, compile & run.
2 - Download the test case attached to this post and open it. I use this one because the node columns contain text data like country, name, programming language, etc., which are useful for testing Lucene.
3 - In the Data Laboratory Node view, you will see an "Index" button to the left of the filter textbox. Click it and Lucene will index all the nodes.
4 - You are now ready to use Lucene in Gephi!! Enter your Lucene queries in the Filter textbox, hit Enter and see the results appear in the data table.
5 - To reset the data table, clear the textbox and press Enter.
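
To make the "Index" button and the Filter textbox a bit more concrete, here is a minimal, hypothetical sketch of what indexing the nodes amounts to: a per-column inverted index mapping each word to the ids of the nodes that contain it. It does not use Lucene and is not the code in the branch; the class name ToyNodeIndex, the lowercase tokenization and the sample nodes are assumptions made for illustration only.

```java
import java.util.*;

/** Hypothetical toy index: column -> term -> node ids. Not Lucene, not the branch's code. */
class ToyNodeIndex {
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();

    /** Lowercase and split on anything that is not a letter or digit (a rough stand-in for an analyzer). */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Index every column of one node (roughly what clicking "Index" does for each node). */
    void addNode(int nodeId, Map<String, String> columns) {
        for (Map.Entry<String, String> col : columns.entrySet()) {
            Map<String, Set<Integer>> terms = postings.computeIfAbsent(col.getKey(), k -> new HashMap<>());
            for (String token : tokenize(col.getValue())) {
                terms.computeIfAbsent(token, k -> new TreeSet<>()).add(nodeId);
            }
        }
    }

    /** Look up a single column:term pair, e.g. language:lisp. Returns a sorted copy. */
    Set<Integer> search(String column, String term) {
        Map<String, Set<Integer>> terms = postings.getOrDefault(column, Collections.emptyMap());
        return new TreeSet<>(terms.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet()));
    }
}

public class ToyNodeIndexDemo {
    public static void main(String[] args) {
        ToyNodeIndex index = new ToyNodeIndex();
        // Hypothetical sample nodes, loosely modelled on the github-profiles columns.
        index.addNode(1, Map.of("location", "United States", "language", "Ruby"));
        index.addNode(2, Map.of("location", "United Kingdom", "language", "Common Lisp"));
        index.addNode(3, Map.of("location", "Spain", "language", "Javascript"));
        System.out.println(index.search("language", "lisp"));   // [2]
        System.out.println(index.search("location", "united")); // [1, 2]
    }
}
```

Lucene does much more (analyzers, scoring, on-disk segments), but this column-to-term-to-documents lookup is the core idea behind a query such as language:lisp.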

Here are some interesting queries that you can try:

language:lisp - All developers that use some flavor of Lisp
location:"United States" AND language:Ruby - All developers from the United States that use Ruby
location:United AND (language:Ruby OR language:Javascript) - All developers from either the United States or the United Kingdom that use either Ruby or Javascript
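
The boolean operators in these queries combine the per-term matches as set operations: AND intersects them and OR unites them. The sketch below is an illustration rather than Lucene code; it evaluates the last query over hypothetical posting sets (node ids). It also shows why location:United matches both countries: tokenizing "United States" and "United Kingdom" yields the term "united" for both.

```java
import java.util.*;

/** Toy evaluation of boolean operators over posting sets. Hypothetical, not Lucene. */
public class BooleanQueryDemo {
    /** AND: keep node ids present in both sets. */
    static Set<Integer> and(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    /** OR: keep node ids present in either set. */
    static Set<Integer> or(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.addAll(b);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical postings: nodes 1 and 4 are in the United States, node 2 in the United Kingdom.
        Set<Integer> locationUnited = Set.of(1, 2, 4);
        Set<Integer> languageRuby = Set.of(1, 3);
        Set<Integer> languageJavascript = Set.of(2, 5);

        // location:United AND (language:Ruby OR language:Javascript)
        Set<Integer> hits = and(locationUnited, or(languageRuby, languageJavascript));
        System.out.println(hits); // [1, 2]
    }
}
```

Phrase queries such as location:"United States" additionally need term positions, so that "united" is immediately followed by "states"; this toy does not model that.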

Note: since this is such a primitive implementation, if you run into any problems check stdout first for clues as to what the problem might be. If it persists, use the only true way of fixing bugs: restart Gephi and pray, or change your query ;)
Attachments
github-profiles.gdf
Test case for Lucene proof of concept
(2.42 MiB) Downloaded 707 times

eduramiba
Gephi Code Manager
Posts:1064
Joined:22 Mar 2010 15:30
Location:Madrid, Spain

Re: Indexed Attributes API using Lucene

Post by eduramiba » 04 Apr 2011 23:34

Hi eaneiros,

Well, this is a nice start as a proof of concept; you have already made changes to the Gephi code :)

The real implementation of indexing should be done in the Attributes API, and the implementation should be flexible enough to work either as it does now or with indexing.
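
One way to read "flexible enough to work as it does now or with indexing" is a single storage interface with two interchangeable backends. The following is a hypothetical sketch only: AttributeStore, InMemoryAttributeStore and IndexedAttributeStore are invented names, not Gephi's actual Attributes API, and the indexed backend is a toy inverted index standing in for Lucene.

```java
import java.util.*;

/** Hypothetical storage abstraction; not Gephi's actual Attributes API. */
interface AttributeStore {
    void setValue(int nodeId, String column, String value);
    String getValue(int nodeId, String column);
    /** Ids of nodes whose value in this column contains the given word. */
    Set<Integer> find(String column, String word);
}

/** "As it does now": a plain in-memory table; find() is a linear scan. */
class InMemoryAttributeStore implements AttributeStore {
    protected final Map<Integer, Map<String, String>> rows = new HashMap<>();

    public void setValue(int nodeId, String column, String value) {
        rows.computeIfAbsent(nodeId, k -> new HashMap<>()).put(column, value);
    }

    public String getValue(int nodeId, String column) {
        Map<String, String> row = rows.get(nodeId);
        return row == null ? null : row.get(column);
    }

    public Set<Integer> find(String column, String word) {
        Set<Integer> hits = new TreeSet<>();
        for (Map.Entry<Integer, Map<String, String>> e : rows.entrySet()) {
            String v = e.getValue().get(column);
            if (v != null && words(v).contains(word.toLowerCase(Locale.ROOT))) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    /** Simplistic tokenizer shared by both backends (an assumption, not Lucene's analyzer). */
    static List<String> words(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\W+"));
    }
}

/** "With indexing": same table plus a toy inverted index kept in sync on every write. */
class IndexedAttributeStore extends InMemoryAttributeStore {
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();

    @Override
    public void setValue(int nodeId, String column, String value) {
        Map<String, Set<Integer>> terms = postings.computeIfAbsent(column, k -> new HashMap<>());
        String old = getValue(nodeId, column);
        if (old != null) {
            // Remove stale postings so an overwritten value is no longer found.
            for (String w : words(old)) {
                Set<Integer> ids = terms.get(w);
                if (ids != null) {
                    ids.remove(nodeId);
                }
            }
        }
        super.setValue(nodeId, column, value);
        for (String w : words(value)) {
            terms.computeIfAbsent(w, k -> new TreeSet<>()).add(nodeId);
        }
    }

    @Override
    public Set<Integer> find(String column, String word) {
        return new TreeSet<>(postings.getOrDefault(column, Map.of())
                .getOrDefault(word.toLowerCase(Locale.ROOT), Set.of()));
    }
}

public class AttributeStoreDemo {
    /** Callers only see the interface, so the backend can be chosen when the graph is opened. */
    static AttributeStore create(boolean indexed) {
        return indexed ? new IndexedAttributeStore() : new InMemoryAttributeStore();
    }

    public static void main(String[] args) {
        for (boolean indexed : new boolean[] {false, true}) {
            AttributeStore store = create(indexed);
            store.setValue(1, "language", "Ruby");
            store.setValue(2, "language", "Emacs Lisp");
            store.setValue(1, "language", "Common Lisp"); // overwrite must update the index too
            System.out.println((indexed ? "indexed:   " : "in-memory: ") + store.find("language", "lisp"));
        }
    }
}
```

Because callers depend only on the interface, modules such as the Data Laboratory would not need to know which backend is active; the detail that matters in the indexed version is keeping the index in sync when a value is overwritten.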

For the future implementation some key features that I consider important are:
  • Ability to choose normal or indexed attributes from the start, when opening a graph file. This is important for loading graphs with very large amounts of data that can't all be stored in memory.
  • An API that other modules, such as the Data Laboratory, can use to enable/disable the index, change some behaviour (which columns to store, for example), perform a search...
  • Possible use to improve ranking/partition/filters.
  • An easy way to build queries would be nice.

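
As an illustration of the "easy way to build queries" point, a small fluent builder could assemble query strings in Lucene's classic query-parser syntax instead of having users type them by hand. QueryBuilder and its methods are hypothetical, and the escaping rules are a simplified assumption (Lucene's classic QueryParser also offers a static escape method for this purpose).

```java
import java.util.*;

/** Hypothetical fluent builder emitting Lucene classic query-parser syntax. Not an existing Gephi or Lucene API. */
final class QueryBuilder {
    // Characters the classic query parser treats specially (simplified list).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    private final String text;
    private final boolean compound; // true if the text joins sub-queries with AND/OR

    private QueryBuilder(String text, boolean compound) {
        this.text = text;
        this.compound = compound;
    }

    /** A single column:value clause. */
    static QueryBuilder field(String name, String value) {
        return new QueryBuilder(name + ":" + formatValue(value), false);
    }

    QueryBuilder and(QueryBuilder other) { return combine("AND", other); }

    QueryBuilder or(QueryBuilder other) { return combine("OR", other); }

    String build() { return text; }

    private QueryBuilder combine(String op, QueryBuilder other) {
        return new QueryBuilder(wrap(this) + " " + op + " " + wrap(other), true);
    }

    /** Parenthesize compound sub-queries so precedence is explicit. */
    private static String wrap(QueryBuilder q) {
        return q.compound ? "(" + q.text + ")" : q.text;
    }

    /** Multi-word values become phrase queries; single words get special characters backslash-escaped. */
    private static String formatValue(String value) {
        if (value.matches(".*\\s.*")) {
            return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
        }
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }
}

public class QueryBuilderDemo {
    public static void main(String[] args) {
        System.out.println(QueryBuilder.field("location", "United States")
                .and(QueryBuilder.field("language", "Ruby")).build());
        // location:"United States" AND language:Ruby
        System.out.println(QueryBuilder.field("location", "United")
                .and(QueryBuilder.field("language", "Ruby").or(QueryBuilder.field("language", "Javascript")))
                .build());
        // location:United AND (language:Ruby OR language:Javascript)
    }
}
```

Emitting the textual syntax keeps such a builder independent of Lucene's Java query classes; a real implementation could instead construct Lucene Query objects (TermQuery, BooleanQuery and so on) directly.
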
Looking forward to your proposal! Remember to include your ideas about how the architecture and API will be designed.

Also remember that this previous specification draft may be useful to you: http://wiki.gephi.org/index.php/Core_ev ... API_Future

eaneiros
Gephi Plugin Developer
Posts:3
Joined:30 Mar 2011 22:32

Re: Indexed Attributes API using Lucene

Post by eaneiros » 05 Apr 2011 21:01

Hi Eduardo,

Definitely, those requirements are an absolute must; I had already included some of them in my draft. I read and analyzed the previous proposal and found some really valuable ideas. I'm building mine with a different concept in mind, because I think that flexibility and ease of use are the two main goals the API must achieve; performance will come later, and if the design is right it shouldn't be a problem.

I'm putting the final touches on my proposal and will submit it soon!!! Thanks for the fast reply and best regards,

ernesto.
