Special characters in GEXF import
Posted: 22 Apr 2012 18:54
by GapaxGermany
Hello,
I'm importing some data into Gephi via self-written GEXF files. In the GEXF files, I set the size of the nodes:
Code:
<node id="3060473429321147825814991366103" label="Mühe" >
<viz:color r="127" g="201" b="127" a="1" />
<viz:shape value="triangle" />
<viz:size value="30"></viz:size>
</node>
You see that the node has the label "Mühe" with the German special character "ü". The file itself is UTF-8 encoded and seems to be fine.
When I import these files into Gephi, all the special characters are destroyed. It seems to me that a string conversion is done somewhere with the wrong encoding. When I don't add the size information, the file is imported fine and everything looks perfect.
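For illustration, this is the kind of damage that appears when UTF-8 bytes are decoded with a single-byte charset. It is only a minimal sketch: the class name `MojibakeDemo` is made up, and ISO-8859-1 as the wrong charset is an assumption, since the thread does not say which conversion Gephi actually performs.

```java
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    // Encodes the text as UTF-8, then decodes those bytes with the wrong
    // charset. ISO-8859-1 is an assumption chosen for illustration only.
    static String decodeWrongly(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The escape \u00fc is the German "u umlaut" from the node label.
        String label = "M\u00fche";
        // The two UTF-8 bytes of the umlaut come back as two separate characters.
        System.out.println(label.length() + " chars before, "
                + decodeWrongly(label).length() + " chars after");
    }
}
```

Each non-ASCII character takes two or more bytes in UTF-8, so a single-byte decoder turns it into several unrelated characters, while plain ASCII letters survive unchanged.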
Re: Special characters in GEXF import
Posted: 24 Apr 2012 13:48
by GapaxGermany
Dear all,
I want to bring up this issue again, please forgive me.
I tried it now not only on Windows but also on Mac, and the problem is the same there. I made a screenshot which shows the problem: on the left, you can see Gephi's Data Laboratory with the wrongly displayed label. On the right is a simple text editor which has opened the same GEXF file and shows the label correctly (it should be "König").
Re: Special characters in GEXF import
Posted: 24 Apr 2012 14:51
by eduramiba
Hi,
Can you share at least some part of the file to see what can be wrong?
Eduardo
Re: Special characters in GEXF import
Posted: 24 Apr 2012 14:53
by GapaxGermany
I've uploaded the full file ...
... meanwhile, I'm already looking at the code; it seems to be a problem with the XMLStreamReader in ImporterGEXF.java ...
Re: Special characters in GEXF import
Posted: 24 Apr 2012 15:27
by eduramiba
Adding the Byte Order Mark to the file seems to make it load fine.
But I guess it should not be necessary. I'll check the code.
Eduardo
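As a workaround along these lines, the Byte Order Mark can be prepended to an existing UTF-8 file. The following is only a sketch: the class `BomWriter` and its method names are hypothetical helpers, not part of Gephi, and the file is assumed to be UTF-8 encoded already.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class BomWriter {
    // The UTF-8 encoding of U+FEFF, the Byte Order Mark.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    static boolean startsWithBom(byte[] data) {
        return data.length >= 3
                && data[0] == UTF8_BOM[0]
                && data[1] == UTF8_BOM[1]
                && data[2] == UTF8_BOM[2];
    }

    // Rewrites the file in place with a BOM in front of the content.
    // Does nothing if the file already starts with one.
    static void addBom(Path file) throws IOException {
        byte[] content = Files.readAllBytes(file);
        if (startsWithBom(content)) {
            return;
        }
        byte[] withBom = new byte[content.length + UTF8_BOM.length];
        System.arraycopy(UTF8_BOM, 0, withBom, 0, UTF8_BOM.length);
        System.arraycopy(content, 0, withBom, UTF8_BOM.length, content.length);
        Files.write(file, withBom);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java BomWriter.java path/to/graph.gexf
        addBom(Paths.get(args[0]));
    }
}
```

The whole file is read into memory, which should be fine for typical GEXF files but is not meant for very large ones.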
Re: Special characters in GEXF import
Posted: 24 Apr 2012 15:34
by GapaxGermany
Eduardo, thank you very much!
I took a look at the code in "ImporterGEXF.java". In "execute", I changed some lines:
Code:
InputStream in = new ReaderInputStream(reader);
xmlReader = inputFactory.createXMLStreamReader(in, "UTF-8");
Here, "ReaderInputStream" is a class which converts a Reader to an InputStream (quite old stuff, but it works pretty well here). Then, xmlReader can be created with any encoding (here just "UTF-8", hardcoded).
Is there probably a better way of dealing with UTF-8 files?
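A self-contained version of the same idea, outside of Gephi, might look like the sketch below. It starts directly from a byte stream instead of going through "ReaderInputStream", and the class `ExplicitEncodingDemo` and its helper method are made-up names for illustration. Only `XMLInputFactory.createXMLStreamReader(InputStream, String)` is the standard StAX call used in the change above.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class ExplicitEncodingDemo {
    // Returns the "label" attribute of the first <node> element, or null if
    // there is none. The caller states the encoding of the byte stream.
    static String firstNodeLabel(InputStream in, String encoding) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader xml = factory.createXMLStreamReader(in, encoding);
        try {
            while (xml.hasNext()) {
                int event = xml.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "node".equals(xml.getLocalName())) {
                    return xml.getAttributeValue(null, "label");
                }
            }
            return null;
        } finally {
            xml.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        // A tiny stand-in for a GEXF file, stored as UTF-8 bytes without a BOM.
        byte[] doc = "<gexf><node id=\"1\" label=\"M\u00fche\"/></gexf>"
                .getBytes(StandardCharsets.UTF_8);
        String label = firstNodeLabel(new ByteArrayInputStream(doc), "UTF-8");
        System.out.println("M\u00fche".equals(label) ? "label intact" : "label damaged");
    }
}
```

Hardcoding the encoding only helps when every input really is UTF-8, so this is more a way to confirm the diagnosis than a general fix.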
Re: Special characters in GEXF import
Posted: 26 Apr 2012 15:26
by eduramiba
Well, we should not force the charset to UTF-8, since the reader is prepared to auto-detect the charset, and it does so for most files.
I'm not an expert on this, but I guess sometimes it is just not possible to detect the charset correctly without the BOM?
Eduardo
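On the detection question: when an XML parser is handed raw bytes, it can use the BOM or the encoding named in the XML declaration, and a document with neither is treated as UTF-8 by the XML specification. When it is handed a Reader, the bytes have already been decoded, so neither mechanism can help. The sketch below shows the byte-stream case with the JDK's built-in StAX parser. The class `DeclaredEncodingDemo` is a made-up name, and whether this matches what goes wrong inside Gephi is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class DeclaredEncodingDemo {
    // Parses raw bytes without naming an encoding, so the parser has to
    // detect it. Returns the "label" of the first <node> element, or null.
    static String firstNodeLabel(byte[] document) throws XMLStreamException {
        XMLStreamReader xml = XMLInputFactory.newInstance()
                .createXMLStreamReader(new ByteArrayInputStream(document));
        try {
            while (xml.hasNext()) {
                if (xml.next() == XMLStreamConstants.START_ELEMENT
                        && "node".equals(xml.getLocalName())) {
                    return xml.getAttributeValue(null, "label");
                }
            }
            return null;
        } finally {
            xml.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String body = "<gexf><node id=\"1\" label=\"K\u00f6nig\"/></gexf>";
        // Case 1: ISO-8859-1 bytes, encoding named in the XML declaration.
        byte[] latin1 = ("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" + body)
                .getBytes(StandardCharsets.ISO_8859_1);
        // Case 2: UTF-8 bytes with neither a BOM nor a declaration.
        byte[] utf8 = body.getBytes(StandardCharsets.UTF_8);
        System.out.println("K\u00f6nig".equals(firstNodeLabel(latin1)));
        System.out.println("K\u00f6nig".equals(firstNodeLabel(utf8)));
    }
}
```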
Re: Special characters in GEXF import
Posted: 29 Apr 2012 21:54
by mbastian
Should we open an issue for that?
Re: Special characters in GEXF import
Posted: 30 Apr 2012 08:05
by admin
We should re-open this one, as it is the same bug described differently.