Dynamic attributes and statistics

GSoC developers forum
Post Reply [phpBB Debug] PHP Warning: in file [ROOT]/vendor/twig/twig/lib/Twig/Extension/Core.php on line 1275: count(): Parameter must be an array or an object that implements Countable
admin
Gephi Community Manager
Posts:964
Joined:09 Dec 2009 14:41
[phpBB Debug] PHP Warning: in file [ROOT]/vendor/twig/twig/lib/Twig/Extension/Core.php on line 1275: count(): Parameter must be an array or an object that implements Countable
Dynamic attributes and statistics

Post by admin » 22 Mar 2010 09:41

This is the thread for asking more details about the Dynamic Attributes and Statistics proposal.

User avatar
cezar_1
Gephi Core Developer
Posts:20
Joined:23 Mar 2010 02:30
Contact:

Re: Dynamic attributes and statistics

Post by cezar_1 » 31 Mar 2010 20:59

Hi,

I would like to know what do you think about my idea concerning a data structure hosting dynamic attributes. I'll try to explain it briefly.

First of all I would like to use the modified PR-Tree data structure described here: http://moezchaabouni.com/Documents/PR%20Tree.pdf. The modified version is necessary due to a possibility of overlapping between time intervals.

I'll show you what I would like to do with this structure using a simple example:

Let's say we have got this graph:

Code: Select all

<?xml version="1.0" encoding="UTF-8"?>
<gexf xmlns="http://www.gephi.org/gexf/1.1draft" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.gephi.org/gexf/1.1draft http://gephi.org/gexf/1.1draft.xsd" version="1.1">
    <meta lastmodifieddate="2009-03-20">
        <creator>Gexf.net</creator>
        <description>A Web network changing over time</description>
    </meta>
    <graph mode="dynamic" defaultedgetype="directed" start="2009-01-01" end="2009-03-20">
        <attributes class="node" mode="static">
            <attribute id="0" title="url" type="string"/>
            <attribute id="1" title="frog" type="boolean">
                <default>true</default>
            </attribute>
        <attributes class="node" mode="dynamic">
            <attribute id="2" title="indegree" type="float"/>
        </attributes>
        <nodes>
            <node id="0" label="Gephi" start="2009-03-01">
                <attvalues>
                    <attvalue for="0" value="http://gephi.org"/>
                    <attvalue for="2" value="1"/>
                </attvalues>
            </node>
            <node id="1" label="Webatlas">
                <attvalues>
                    <attvalue for="0" value="http://webatlas.fr"/>
                    <attvalue for="2" value="1" end="2009-03-01"/>
                    <attvalue for="2" value="2" start="2009-03-01" end="2009-03-10"/>
                    <attvalue for="2" value="1" start="2009-03-10"/>
                </attvalues>
            </node>
            <node id="2" label="RTGI">
                <attvalues>
                    <attvalue for="0" value="http://rtgi.fr"/>
                    <attvalue for="2" value="0" end="2009-03-01"/>
                    <attvalue for="2" value="1" start="2009-03-01"/>
                </attvalues>
                <slices>
                    <slice end="2009-03-01">
                    <slice start="2009-03-05" end="2009-03-10">
                </slices>
            </node>
            <node id="2" label="BarabasiLab">
                <attvalues>
                    <attvalue for="0" value="http://barabasilab.com"/>
                    <attvalue for="1" value="false"/>
                    <attvalue for="2" value="0" end="2009-03-01"/>
                    <attvalue for="2" value="1" start="2009-03-01"/>
                </attvalues>
            </node>
        </nodes>
        <edges>
            <edge id="0" source="0" target="1" start="2009-03-01"/>
            <edge id="1" source="0" target="2" start="2009-03-01" end="2009-03-10"/>
            <edge id="2" source="1" target="0" start="2009-03-01"/>
            <edge id="3" source="2" target="1" end="2009-03-10"/>
            <edge id="4" source="0" target="3" start="2009-03-01"/>
        </edges>
    </graph>
</gexf>
We can identify a few time intervals here:
[03.10, 03.20], [03.01, 03.10], [03.01, 03.20] and [01.01, 03.01].
Let's transform them to:
A = [69, 79], B = [60, 69], C = [60, 79], D = [0, 60].

It's the PR-Tree for these intervals (an original one, not modified due to simplification):

Image

Each time interval should contain a vector of "nodes". I wrote nodes, but I mean containers of "attributes" associated with concrete nodes and time intervals. Such attributes should have got only ids and values (to decrease space usage). In our example we have got:

A: (1,2,1)
B: (1,2,2)
C: (2,2,1), (3,2,1)
D: (2,2,0), (3,2,0)

in the format of X: (node_id, atrr_id, atrr_value).

Thanks to the vector structure we can access "nodes" with constant time cost.

Now we want to return attributes values array for a given node (2 for instance) and time interval (let's say 0, 78):

Step 1. We should look for 78 in our PR-Tree because we are interested in attributes values from upper bound of the time interval (tell me if I am mistaken). We can see that the only time intervals which contain 78 are A and C.
Step 2. Now we can easily get the values we are looking for. It's only (2,2,1) from C interval.

If we want to return static graph copy for given time interval, it's even easier. We can build such graph from intervals we get in the step 2.

What do you think about it?

And another question:
Adapt Metrics framework to use Dynamic API to propose dynamic versions of existing metrics.
Does it mean that we should be able to query Metrics framework in order to get metrics for some time interval? Is it sufficient to add additional methods to its API?

User avatar
mbastian
Gephi Architect
Posts:728
Joined:10 Dec 2009 10:11
Location:San Francisco, CA
[phpBB Debug] PHP Warning: in file [ROOT]/vendor/twig/twig/lib/Twig/Extension/Core.php on line 1275: count(): Parameter must be an array or an object that implements Countable

Re: Dynamic attributes and statistics

Post by mbastian » 01 Apr 2010 13:06

Hi,

Yes the data structure for dynamic attributes is something close from this. I also found this paper when I was doing research. The problem is that the search query is a unique point, not an interval. What about Interval tree ?
Does it mean that we should be able to query Metrics framework in order to get metrics for some time interval? Is it sufficient to add additional methods to its API?
The idea is to combine dynamic filtering and metrics. The aim is to compute metrics, like "Betweeness Centrality" for the time-intervals. When topology is dynamic (nodes, edges with start/end dates), one can filter the graph and iterate over all intervals. Metrics framework would be extended to allow this.

User avatar
cezar_1
Gephi Core Developer
Posts:20
Joined:23 Mar 2010 02:30
Contact:

Re: Dynamic attributes and statistics

Post by cezar_1 » 01 Apr 2010 14:44

mbastian wrote: The problem is that the search query is a unique point, not an interval.
Hmm, so I was mistaken here:
Step 1. We should look for 78 in our PR-Tree because we are interested in attributes values from upper bound of the time interval (tell me if I am mistaken).
I just read the article http://www.cmu.edu/joss/content/article ... McFarland/ and noticed interesting sentences:
[...] Thick slices have start and end times that define an interval, and include all events that either occur within the interval or whose duration intersect the interval (see figure 4c).9 Thick slices query the data similar to the way a questionnaire might ask about ongoing or past relationships. For example, "show me all the loans among firms that took place during 1994.” It is not a picture of a network at a point in time but rather a period in time, a collection of data within a certain range, as in the bin of a histogram. Thick slices are a way of discretizing (pseudo-) continuous data, and thin slices are a way to sample continuous or real-valued versions of discrete-interval data.

A slice is a “bin” that contains a set of events in time, but for most network operations, the time information is not used and the network data must be collapsed to form a matrix or arc-list. Depending on the duration or "thickness" of the slice, it is possible that there is more than one arc between a given pair of nodes (or there may be multiple kinds of ties) so it is important to consider how these ties will be aggregated. For most variables, common operations like sum, maximum, count, and average can be used. (Non-numeric categorical attributes will need more sophisticated treatment in the future.)
This is something that should be done. All things considered Interval Tree is necessary in fact. I can implement this instead of PR-Tree and in the step in which we get intervals that overlap the interval we look for, I should do something with "ties". I think that for numbers a potential user should have got a possibility to choose what to do with them (sum, maximum, count and average as described in the article). For non-numeric attributes he should choose between something like the first occurrence, the last occurrence etc.
The idea is to combine dynamic filtering and metrics. The aim is to compute metrics, like "Betweenness Centrality" for the time-intervals. When topology is dynamic (nodes, edges with start/end dates), one can filter the graph and iterate over all intervals. Metrics framework would be extended to allow this.
So, this is "only" querying The Dynamic API to get graphs for time intervals and computing metrics of such graphs. What about efficiency? Sometimes it could take much time to compute metrics of a single "snapshot"...

EDIT: OK, I know what you would like to get :P It should be only possibility to compute metrics of a single snapshot, not computing them automatically while changing time intervals...

EDIT 2: I've been thinking about this:
Propose idea for dynamic visualization of attributes (color, size, label color, label size, edge weight).
It's a bit challenging, because it depends on abstract sense of attributes very much. Taking this into consideration I think a potential user should decide what kind of visualization should be used for each attribute. For example:

Let's say that some graph's nodes contain an attribute like priority. A potential user could choose between several possibilities like: nothing (it means this attribute shouldn't be visualized), color (for min and max value, it could be combined with "ties" problem, i.e. in case of average this color could be gradient or something like that), stroke thickness etc.

Of course this idea is connected with UI development. Is it something that should be a part of my proposal?

User avatar
mbastian
Gephi Architect
Posts:728
Joined:10 Dec 2009 10:11
Location:San Francisco, CA
[phpBB Debug] PHP Warning: in file [ROOT]/vendor/twig/twig/lib/Twig/Extension/Core.php on line 1275: count(): Parameter must be an array or an object that implements Countable

Re: Dynamic attributes and statistics

Post by mbastian » 02 Apr 2010 16:48

Yes, right!

Yes with intervals instead of points, several slices could match. It's nice to be able to get sum, average and these kind of functions for numerical indeed.
Of course this idea is connected with UI development. Is it something that should be a part of my proposal?
Yes, add all your reflexions in your proposal. Reviewing what other softwares propose could be also useful.

Cristina Grigoruta
Posts:3
Joined:08 Apr 2010 19:19
[phpBB Debug] PHP Warning: in file [ROOT]/vendor/twig/twig/lib/Twig/Extension/Core.php on line 1275: count(): Parameter must be an array or an object that implements Countable

Re: Dynamic attributes and statistics

Post by Cristina Grigoruta » 20 Apr 2010 01:12

With regarding your comment on my proposal, (link to your proposal), I would like to describe the API by listing all the functions, their uses, and a high level view of how they would be implemented. In this way I can present all the methods that other developers can utilize my segment of code. What do you think?

User avatar
mbastian
Gephi Architect
Posts:728
Joined:10 Dec 2009 10:11
Location:San Francisco, CA
[phpBB Debug] PHP Warning: in file [ROOT]/vendor/twig/twig/lib/Twig/Extension/Core.php on line 1275: count(): Parameter must be an array or an object that implements Countable

Re: Dynamic attributes and statistics

Post by mbastian » 20 Apr 2010 14:17

Sure, I agree with this approach

Cristina Grigoruta
Posts:3
Joined:08 Apr 2010 19:19
[phpBB Debug] PHP Warning: in file [ROOT]/vendor/twig/twig/lib/Twig/Extension/Core.php on line 1275: count(): Parameter must be an array or an object that implements Countable

Re: Dynamic attributes and statistics

Post by Cristina Grigoruta » 21 Apr 2010 12:27

I have posted the answer on the comment section of my proposal.

Have a nice day:)

Post Reply
[phpBB Debug] PHP Warning: in file [ROOT]/vendor/twig/twig/lib/Twig/Extension/Core.php on line 1275: count(): Parameter must be an array or an object that implements Countable
[phpBB Debug] PHP Warning: in file [ROOT]/vendor/twig/twig/lib/Twig/Extension/Core.php on line 1275: count(): Parameter must be an array or an object that implements Countable