Edges & BC Calculation: to the kind attention of Clement

Post by **vieirae** » 01 Oct 2014 13:05

Greetings Clement,

I have used Textexture.com, an online tool by Dmitry Paranyushkin, who is affiliated with Gephi. A sample GEXF file that I generated in Textexture is attached. When I open it in Gephi, it already contains a few network measures, including betweenness centrality (BC). However, when I recompute BC in Gephi from that same Textexture-generated GEXF file, the values are only approximately one third of the BC values stored in the original file. The modularity classes differ as well, though to a lesser extent.

Perhaps the way Textexture generates the initial edges between nodes (words) is responsible for the differences. However, I am unsure, because I was unable to reproduce the Textexture results manually using a short, simple text network. In any case, below are the two descriptions Textexture gives of its approach to calculating edges.

“First, the normalized text above is scanned using a 2-word gap. For each word, if it appears the first time in the text, it’s recorded as a new node with the id that equals the name of the node. When two words appear within the gap, the algorithm first checks if the pair exists already. If the pair does not exist yet, a new connection (an edge) is recorded where the first word is the source and the second word is the target, the weight equals 1. If the pair exists already, the weight of the corresponding edge is incremented by 1. This way we trace the narrative and create a concept graph from the text. Each connection is based on the words’ proximity to each other. The more frequent the combination of words, the higher is the weight of connection between them. When the scanner reaches the end of the paragraph, it jumps to the next one in order to avoid that the last word from the previous paragraph is linked to the first word from the next one. This helps us to somewhat translate the spatial structure of the text into the graph.”

“The second pass uses a 5-word gap and follows a similar procedure. For each combination of 5 words, starting from the beginning of the text, the algorithm first checks whether each word pair exists. If it does already (because of the 2-word gap pass before), the weight of the connection (or the edge) between the pair is incremented by one. If it does not exist, the new pair is recorded as the new edge (the weight equals 1), where the source is the word to the left of the gap and the target is the word to the right of the gap. The words adjacent to the words in the beginning and in the end of each paragraph will have a slightly less intense connection (as the 5-word gap starts at the first word of a paragraph and terminates when it reaches the last word of the paragraph, then jumping to the next paragraph and starting again from the first word). Such approach allows us to accommodate further for the spatial structure already utilized within the text. It also allows us to increase the intensity of connections between the words that are more proximate to each other. If the first 2-word gap scan is sketching a general structure of the text intensifying repetitions of adjacent words within the text and outlining its paragraph structure, then the 5-word gap scan is a kind of zooming in tool into the local areas of the text, which allows us to intensify the local clusters of meaning overlaying them on the general structure created before.”
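For anyone trying to reproduce the edge weights by hand, the two quoted passes can be sketched roughly as below. This is only one reading of the description, not Textexture's actual code. It assumes that an "n-word gap" means a sliding window of n words, that every earlier-to-later word pair inside a window gains weight 1, that a paragraph shorter than the window is treated as a single window, and that self-loops are skipped. The function name `scan` and the toy paragraphs are made up for illustration.

```python
from collections import Counter


def scan(paragraphs, window, edges=None):
    """Add 1 to the weight of every (earlier word, later word) pair that
    falls inside a sliding window of `window` words (assumed reading)."""
    edges = Counter() if edges is None else edges
    for words in paragraphs:
        # Windows never cross a paragraph boundary; a paragraph shorter
        # than the window is treated as one (truncated) window.
        for start in range(max(len(words) - window + 1, 1)):
            chunk = words[start:start + window]
            for i in range(len(chunk)):
                for j in range(i + 1, len(chunk)):
                    if chunk[i] != chunk[j]:  # assumption: no self-loops
                        edges[(chunk[i], chunk[j])] += 1
    return edges


# Toy input: two already-normalized paragraphs (hypothetical data).
paragraphs = [["a", "b", "c", "d", "e", "f"], ["g", "a", "b"]]

edges = scan(paragraphs, 2)         # first pass: 2-word gap
edges = scan(paragraphs, 5, edges)  # second pass: 5-word gap, same edge table

for (source, target), weight in sorted(edges.items()):
    print(source, target, weight)
```

If the weights from a sketch like this do not match the GEXF file, the difference may lie in how the window is defined (for example, linking only the first and last word of each window rather than every pair), which the quoted description leaves open.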

Attached is the article.

For my type of text study, which uses network/graph analyses, the weighted tabulation of edges from the 2- and 5-word gaps is important. However, I am concerned about the inconsistencies between Textexture and Gephi in the BC and modularity class measures. The degrees appear to be equal, I think, and the edges output by Textexture and Gephi are equal. I wonder whether the differences lie in the algorithms used, and perhaps in how weighted edges are handled. Clement, any guidance that you can provide would be deeply appreciated. Thank you very much, as usual.
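One way to probe the question about algorithms is to recompute BC on a tiny graph under different conventions and see how much the values move. The sketch below is a plain-Python version of Brandes' algorithm on unweighted shortest paths. It is not the code of either tool, it ignores edge weights entirely, and the toy edge list is invented. It only illustrates that treating the same edge list as directed or undirected already changes the raw values; it does not explain the specific one-third factor.

```python
from collections import deque


def betweenness(nodes, edges, directed=True):
    """Unnormalized betweenness centrality (Brandes), unweighted paths."""
    adj = {v: set() for v in nodes}
    for s, t in edges:
        adj[s].add(t)
        if not directed:
            adj[t].add(s)
    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        sigma = dict.fromkeys(nodes, 0)
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in nodes}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies, farthest nodes first.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    if not directed:
        # Each unordered pair was counted once from each endpoint.
        bc = {v: b / 2 for v, b in bc.items()}
    return bc


# Hypothetical toy graph: a -> b, c -> b, b -> d.
nodes = ["a", "b", "c", "d"]
edge_list = [("a", "b"), ("c", "b"), ("b", "d")]

directed_bc = betweenness(nodes, edge_list, directed=True)
undirected_bc = betweenness(nodes, edge_list, directed=False)
print(directed_bc["b"], undirected_bc["b"])
```

Normalization (dividing by the number of node pairs) and the use of edge weights as path lengths are two further conventions that can shift the values, so it may be worth checking which settings each tool applied before comparing the numbers.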

Ed