Special characters in GEXF import

All questions about the GEXF format (see http://gexf.net first)
GapaxGermany
Posts: 4
Joined: 11 Apr 2012 13:47

Special characters in GEXF import

Post by GapaxGermany » 22 Apr 2012 18:54

Hello,

I'm importing some data into Gephi via self-written GEXF files. In the GEXF files, I set the size of the nodes:

Code:

<node id="3060473429321147825814991366103" label="Mühe" >
<viz:color r="127" g="201" b="127" a="1" />
<viz:shape value="triangle" />
<viz:size value="30"></viz:size>
</node>

As you can see, the node has the label "Mühe", which contains the German special character "ü". The file itself is UTF-8 encoded and looks fine.

When I import these files into Gephi, all the special characters are garbled; it looks to me as if a string conversion is being done somewhere with the wrong encoding. When I leave out the size information, the file imports fine and everything looks perfect.
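For illustration only, here is a minimal Java sketch of the kind of corruption described above. It assumes the UTF-8 bytes of the label are decoded as ISO-8859-1 somewhere along the way; whether that is what actually happens inside Gephi is not confirmed in this thread. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Encodes the text as UTF-8 and then decodes those bytes with the
    // wrong charset (ISO-8859-1), which is one common way labels get garbled.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "M\u00fche" is "Mühe"; the single character "ü" is two bytes in UTF-8,
        // so the wrongly decoded label is one character longer than the original.
        String garbled = misdecode("M\u00fche");
        System.out.println(garbled.length());
    }
}
```

If the garbled labels in the Data Laboratory show two odd characters where one umlaut should be, a mismatch of this kind is a plausible suspect.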


Re: Special characters in GEXF import

Post by GapaxGermany » 24 Apr 2012 13:48

Dear all,

I want to bring up this issue again, please forgive me ;-)

I have now tried it on Mac as well as on Windows, and the problem is the same there. I made a screenshot that shows the problem: on the left you can see Gephi's Data Laboratory with the wrongly displayed label; on the right, a simple text editor has the same GEXF file open and shows the label correctly (it should be "König"):

[Screenshot: Gephi Data Laboratory (left) and a text editor (right) showing the same label]

eduramiba
Gephi Code Manager
Posts: 976
Joined: 22 Mar 2010 15:30
Location: Madrid, Spain

Re: Special characters in GEXF import

Post by eduramiba » 24 Apr 2012 14:51

Hi,
Can you share at least part of the file so we can see what might be wrong?

Eduardo


Re: Special characters in GEXF import

Post by GapaxGermany » 24 Apr 2012 14:53

I've uploaded the full file ...

... meanwhile, I'm already in the code, seems to be a problem with the XMLStreamReader in ImporterGEXF.java ... ;-)
Attachments
Accent_4.gexf
(39.54 KiB) Downloaded 190 times


Re: Special characters in GEXF import

Post by eduramiba » 24 Apr 2012 15:27

Adding the Byte Order Mark to the file seems to make it load fine.
But I guess it should not be necessary. I'll check the code.

Eduardo
Attachments
Accent_4_with_BOM.gexf
(39.54 KiB) Downloaded 207 times
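As a possible way to reproduce the workaround mentioned in this post, here is a minimal Java sketch that prepends the UTF-8 byte order mark (bytes EF BB BF) to a file that does not already start with one. The class name, method names, and command-line arguments are hypothetical, and this is only a stopgap, not a fix for the importer itself.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class BomTool {
    // The three bytes of the UTF-8 byte order mark.
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // True if the data already starts with the UTF-8 BOM.
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && data[0] == UTF8_BOM[0]
                && data[1] == UTF8_BOM[1]
                && data[2] == UTF8_BOM[2];
    }

    // Returns the data unchanged if it has a BOM, otherwise a copy with the BOM in front.
    static byte[] withUtf8Bom(byte[] data) {
        if (hasUtf8Bom(data)) {
            return data;
        }
        byte[] out = new byte[data.length + UTF8_BOM.length];
        System.arraycopy(UTF8_BOM, 0, out, 0, UTF8_BOM.length);
        System.arraycopy(data, 0, out, UTF8_BOM.length, data.length);
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Expects two arguments: the input GEXF file and the output file to write.
        Path in = Paths.get(args[0]);
        Path out = Paths.get(args[1]);
        Files.write(out, withUtf8Bom(Files.readAllBytes(in)));
    }
}
```

The sketch assumes the input really is UTF-8; adding a BOM to a file in another encoding would make things worse.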


Re: Special characters in GEXF import

Post by GapaxGermany » 24 Apr 2012 15:34

Eduardo, thank you very much!

I took a look at the code in "ImporterGEXF.java" and changed some lines in "execute":

Code:

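 // Wrap the incoming Reader so the XML parser receives a byte stream again
 InputStream in = new ReaderInputStream(reader);
 // Create the StAX reader with an explicit encoding (hardcoded here)
 xmlReader = inputFactory.createXMLStreamReader(in, "UTF-8");

"ReaderInputStream" is a class that converts a Reader into an InputStream (quite old stuff, but it works pretty well here). The xmlReader can then be created with any encoding (here "UTF-8" is simply hardcoded).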

Is there perhaps a better way of dealing with UTF-8 files?
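One alternative that may be worth considering, sketched here with the standard javax.xml.stream API only: hand the parser the raw bytes and let it pick the encoding from the XML declaration or BOM, instead of decoding to a Reader first. This is not taken from the Gephi sources; the class name, method name, and sample document are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxEncodingDemo {
    // Returns the "label" attribute of the first <node> element, or null if there is none.
    static String firstNodeLabel(InputStream in) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // No charset is passed: the parser works from the bytes and the XML declaration.
            XMLStreamReader xml = factory.createXMLStreamReader(in);
            try {
                while (xml.hasNext()) {
                    if (xml.next() == XMLStreamConstants.START_ELEMENT
                            && "node".equals(xml.getLocalName())) {
                        return xml.getAttributeValue(null, "label");
                    }
                }
                return null;
            } finally {
                xml.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // A tiny UTF-8 document without a BOM; "M\u00fche" is "Mühe".
        String doc = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<gexf><node id=\"1\" label=\"M\u00fche\"/></gexf>";
        InputStream in = new ByteArrayInputStream(doc.getBytes(StandardCharsets.UTF_8));
        System.out.println("M\u00fche".equals(firstNodeLabel(in)));
    }
}
```

Once the input has been turned into a Reader, the bytes are already decoded and the parser can no longer correct a wrong charset choice, which may be why working from an InputStream helps here.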


Re: Special characters in GEXF import

Post by eduramiba » 26 Apr 2012 15:26

Well, we should not force the charset to UTF-8, since the reader is designed to auto-detect the charset, and it does so for most files.
I'm not an expert on this, but I guess sometimes it is just not possible to detect the charset correctly without the BOM?

Eduardo

mbastian
Gephi Architect
Posts: 728
Joined: 10 Dec 2009 10:11
Location: San Francisco, CA

Re: Special characters in GEXF import

Post by mbastian » 29 Apr 2012 21:54

Should we open an issue for that?

admin
Gephi Community Manager
Posts: 964
Joined: 09 Dec 2009 14:41

Re: Special characters in GEXF import

Post by admin » 30 Apr 2012 08:05

We should re-open this one, as it is the same bug described differently.
