Saxparser doesn't work when encountering &quot [solved]


Postby jumpbug » Sun Mar 28, 2010 3:46 pm

Hello,

I'm using SAXParser to parse an XML file (with SDK 1.5).
Everything works as it should, except in the following situation in the XML:

<title>Some text &quot;some quoted text&quot;</title>

I'm using the following to parse this text:

@Override
public void characters(char ch[], int start, int length) {
    strText = (new String(ch, start, length));
}


The resulting string strText contains only a single double-quote character ("), not even the text before the quote.

I thought it might be a bug (I found a bug report similar to this) so I tried the most recent SDK but with the same results.

The problem does not exist when the text is placed within a CDATA tag, but unfortunately I don't control the XML myself.

What am I doing wrong? Is there a way around this?

Thanks in advance.
Last edited by jumpbug on Thu Apr 01, 2010 1:40 pm, edited 1 time in total.
jumpbug
Junior Developer
Posts: 12
Joined: Thu Mar 25, 2010 12:25 pm


Postby vik » Tue Mar 30, 2010 11:30 am

I had the same error with the SAX parser. I just did this:


str.append(new String(ch, start, length));
if (this.bio_value == null) {
    this.bio_value = str.toString();
} else {
    this.bio_value += str.toString();
}
vik
Senior Developer
Posts: 141
Joined: Wed Sep 09, 2009 7:32 am

Postby jumpbug » Thu Apr 01, 2010 1:40 pm

I have found the problem: it seems that the SAX parser treats every special character as a boundary and calls characters() with a new chunk of text each time.

So

This: "Is a text"

was parsed as

This:
"
Is a text
"

The string then only contained the last parsed line.

It was solved by appending the string rather than recreating a new one each time. So your hint was valuable, thanks!
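
For anyone landing here later, this is a minimal sketch of the fix described above: append every chunk to a buffer and read the buffer only in endElement(). It runs on a desktop JVM with the stock javax.xml parser rather than on Android, so it matches on qName (the default desktop factory is not namespace-aware). The class name TitleCollector and the helper parseTitle are made up for illustration and are not from my project.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text of a <title> element.
class TitleCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private String title;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Start each element with an empty buffer.
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one element, so append instead of assigning.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Only here is the element's text guaranteed to be complete.
        if ("title".equals(qName)) {
            title = text.toString();
        }
    }

    // Hypothetical helper: parses an XML string and returns the title text.
    static String parseTitle(String xml) throws Exception {
        TitleCollector handler = new TitleCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.title;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseTitle("<title>Some text &quot;some quoted text&quot;</title>"));
    }
}
```

However many pieces the parser splits the text into, the buffer ends up holding the whole title, quotes included.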

cdata

Postby frankie » Mon Apr 19, 2010 9:02 am

My XML contains a CDATA tag which, when parsed, shows a null result. Do I need to do anything else so that the parser reads through the CDATA tag and returns the value?

thanks
frankie
Junior Developer
Posts: 11
Joined: Wed Apr 07, 2010 12:49 pm
Location: india

Postby jumpbug » Mon Apr 19, 2010 9:53 am

Not with the saxparser. I don't know about DOM or Pull.

Can you post the relevant code and the XML you're trying to parse?

Postby frankie » Mon Apr 19, 2010 12:02 pm

Thanks for the swift reply.

My XML file is something like this:

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
<title> News - Top Stories </title>
<link>http://www.news.com/news/</link>
<language>en</language>
<lastBuildDate>April 19, 2010 3:15 PM</lastBuildDate>
<image>

<url></url>
<link></link>
</image>


<item>
<title><![CDATA[EU says half of normal flights may run today]]></title>
<link></link>
<guid isPermaLink="false">20352</guid>

<pubDate>April 19, 2010 3:09 PM</pubDate>
<AlsoSeeLink1>http://</AlsoSeeLink1>
<MobileText><![CDATA[Some European airports were reopening to limited traffic on Monday after volcanic ash forced their closures, a day after the European Union said that if weather forecasts confirm the skies are clearing, air traffic over the continent could return to about 50 per cent of normal levels.]]></MobileText>
<StoryImage></StoryImage>
<DateLine><![CDATA[]]></DateLine>
<description><![CDATA[Some European airports were reopening to limited traffic on Monday after volcanic ash forced their closures, a day after the European Union said that if weather forecasts confirm the skies are clearing, air traffic over the continent could return to about 50 per cent of normal levels.]]></description>
</item>

The statement to parse is:

Xml.parse(this.getInputStream(), Xml.Encoding.UTF_8, root.getContentHandler());

and my handler is:

    // (class declaration and fields were not included in the post)

    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        super.characters(ch, start, length);
        // Accumulate text; one element's text may arrive in several chunks
        builder.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String name)
            throws SAXException {
        super.endElement(uri, localName, name);
        if (this.currentMessage != null) {
            if (localName.equalsIgnoreCase(TITLE)) {
                currentMessage.setTitle(builder.toString());
            } else if (localName.equalsIgnoreCase(LINK)) {
                currentMessage.setLink(builder.toString());
            } else if (localName.equalsIgnoreCase(DESCRIPTION)) {
                currentMessage.setDescription(builder.toString());
            } else if (localName.equalsIgnoreCase(PUB_DATE)) {
                currentMessage.setDate(builder.toString());
            } else if (localName.equalsIgnoreCase(ITEM)) {
                messages.add(currentMessage);
            }
            // Reset the buffer for the next element
            builder.setLength(0);
        }
    }

    @Override
    public void startDocument() throws SAXException {
        super.startDocument();
        messages = new ArrayList<Message>();
        builder = new StringBuilder();
    }

    @Override
    public void startElement(String uri, String localName, String name,
            Attributes attributes) throws SAXException {
        super.startElement(uri, localName, name, attributes);
        if (localName.equalsIgnoreCase(ITEM)) {
            this.currentMessage = new Message();
        }
    }
}

Postby jumpbug » Mon Apr 19, 2010 5:21 pm

Hi Frankie,

I don't have much time to look at your code at the moment, but here is a snippet of the working code that I have used in my newsreader project:

This is a method to read the XML:

import java.net.URL;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import android.util.Log;

public static void ReadXML(String strSourceURL) {
    try {
        URL url = new URL(strSourceURL);

        SAXParserFactory XMLNewsItems = SAXParserFactory.newInstance();
        SAXParser sp_XMLNewsItems = XMLNewsItems.newSAXParser();
        XMLReader xr_sp_XMLNewsItems = sp_XMLNewsItems.getXMLReader();

        // Register the XMLHandler class as the content handler
        XMLHandler myXMLHandler = new XMLHandler();
        xr_sp_XMLNewsItems.setContentHandler(myXMLHandler);

        // Download the feed and parse it
        xr_sp_XMLNewsItems.parse(new InputSource(url.openStream()));
    } catch (Exception e) {
        Log.e("ERROR_CATCHER", e.toString());
    }
}



This is the XMLHandler class, referenced in the above code, which is used to parse the XML:

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import android.util.Log;

public class XMLHandler extends DefaultHandler {

    private boolean b_title = false;
    private String strTitle = null;

    @Override
    public void startDocument() throws SAXException {
        // do something
    }

    @Override
    public void endDocument() throws SAXException {
        // do something
    }

    @Override
    public void startElement(String namespaceURI, String localName,
            String qName, Attributes atts) throws SAXException {
        if (localName.equals("title")) {
            this.b_title = true;
            strTitle = "";
        }
    }

    @Override
    public void endElement(String namespaceURI, String localName, String qName)
            throws SAXException {
        if (localName.equals("title")) {
            this.b_title = false;
            Log.i("DEBUG_INFO", "Parsed title: " + strTitle);
        }
    }

    @Override
    public void characters(char ch[], int start, int length) {
        if (this.b_title) {
            // Append, because the text of one element can arrive in several calls
            strTitle += (new String(ch, start, length));
        }
    }

}


The XML I am reading contains CDATA tags in each XML element.
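
To illustrate that CDATA needs no extra handling, here is a small sketch that can be run on a desktop JVM. The text inside a CDATA section reaches characters() like any other text. The class name CdataTitles and the parse helper are hypothetical, and the handler matches on qName because the default desktop SAXParserFactory is not namespace-aware (my Android handler above uses localName).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text of every <title> element.
class CdataTitles extends DefaultHandler {
    private final List<String> titles = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private boolean inTitle = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("title".equals(qName)) {
            inTitle = true;
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // CDATA content arrives here too, without the <![CDATA[ ]]> markers.
        if (inTitle) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(qName)) {
            inTitle = false;
            titles.add(text.toString());
        }
    }

    // Hypothetical helper: parses an XML string and returns all titles in document order.
    static List<String> parse(String xml) throws Exception {
        CdataTitles handler = new CdataTitles();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.titles;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<rss><channel><title> News - Top Stories </title>"
                + "<item><title><![CDATA[EU says half of normal flights may run today]]></title></item>"
                + "</channel></rss>";
        for (String title : parse(xml)) {
            System.out.println(title);
        }
    }
}
```

If a title still comes back null with a handler like this, the cause is more likely in the surrounding code than in the CDATA sections themselves.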

I hope this gives you some insight.

Postby frankie » Tue Apr 20, 2010 8:11 am

Thanks for the reply. That really did help, but I am still not able to parse successfully. My parser works well as a standalone Java application, but when I implement it in Android it does not work. I am badly stuck; kindly help. I am getting an exception with the message "null". Can you share the project? It could help me understand better. If you want to mail me the project, my mail ID is jayanth2frankie@gmail.com

thanks again

Postby jumpbug » Tue Apr 20, 2010 9:34 pm

Unfortunately I cannot share my project as it was developed for the company I work for.

However, the code I have pasted is as good as a working project for the purpose that you need: just create your classes, change the XML handler so that it handles the tags of your XML file and you're good to go.

Can you at least tell us which part of the code you are getting a NullPointerException on? You should try adding some logging along the program's flow or setting breakpoints to see where it goes wrong.

FYI: I have found this to be a good example on how the saxparser works. On this forum is also a good example somewhere, posted by plusminus I believe.

Postby frankie » Wed Apr 21, 2010 6:47 am

Thanks again, jumpbug. I was able to solve it: I was checking whether the first element of the parsed list was null, and that was throwing the exception. I have fixed it now. Thanks again for your guidance.

Postby gioxlit » Tue Jan 10, 2012 11:28 am

I use almost identical code for my XML parser, and it works fine with most of the feeds I have tried.

The problem is that for a specific XML file located at http://www.ntua.gr/announcements/rector/an_0_1.xml I get a SAXException and cannot retrieve any items.

I suspect there is a problem with CDATA.

Since the application I am building must retrieve items from this specific feed, could you make any suggestions?
gioxlit
Freshman
Posts: 2
Joined: Tue Jan 10, 2012 11:23 am


Postby Phyll » Wed Jan 11, 2012 4:31 am

Hi gioxlit,

I see you've revived an old thread. This may not be the best approach but it seems to me that if you knew of certain deficiencies in the saxparser that cause it to fail, you might bring this document down and inspect for these known failures and fix them if possible before you parse it.

Hope this helps.

Phyll
Phyll
Master Developer
Posts: 648
Joined: Fri Oct 14, 2011 11:19 am


Postby gioxlit » Wed Jan 11, 2012 1:02 pm

I think the main problem is that each field in the XML file is wrapped in CDATA tags, and I can't find a way to process them correctly.

Postby Phyll » Wed Jan 11, 2012 2:00 pm

Hi gioxlit,

It doesn't seem like they would use some kind of non-standard xml in a feed that should decode in a standard way.

Here is an example of some of your xml from that feed:


<?xml version="1.0" encoding="ISO-8859-7"?>
<rss version="0.91">
<channel>
<title><![CDATA[RSS Ανακοινώσεων Πρυτανείας ΕΜΠ]]></title>
<link><![CDATA[http://ww



There are lots of CDATA tags (almost everything). This seems like it should be handled by the SAX parser because it's a standard part of RSS feeds.

Here is some code that also has CDATA tags but works fine:

<title><![CDATA[Build a digital book with EPUB]]></title>
      
      <description><![CDATA[Need to distribute documentation, create an eBook, or just archive your favorite blog  posts? EPUB is an open specification for digital books based on familiar technologies like  XML, CSS, and XHTML, and EPUB files can be read on portable e-ink devices, mobile phones, and desktop computers. This tutorial explains the EPUB format in detail, demonstrates EPUB validation using Java technology, and moves step-by-step through automating EPUB creation using DocBook and Python.]]></description>
      <link><![CDATA[http://www.ibm.com/developerworks/edu/x-dw-x-epubtut.html?ca=drs-]]></link>
      <pubDate>05 Feb 2009 06:00:00 +0000</pubDate>



One thing I noticed about the source in your feed: it doesn't have carriage returns and line feeds, just line feeds.
The second example does have both, but this editor adds the CR to it. I wouldn't think that would make a difference.

When I run that feed through my own implementation of the parser, it just returns a couple of empty strings and quits, on the title I think, and the log does not show any of the Greek characters, just [RSS ].
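
The missing Greek characters point to something worth ruling out, though nothing in this thread confirms it: the feed declares encoding="ISO-8859-7", and the parser may not be decoding it as such. One workaround is to decode the bytes yourself and hand the parser a Reader, since a SAX parser given a character stream does not need to work out the encoding on its own. Below is a rough sketch under that assumption, written for a desktop JVM. The class name CharsetAwareFeedReader and the method firstTitle are invented for the example, and the sample feed is a shortened stand-in, not the real one.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not taken from anyone's project in this thread.
class CharsetAwareFeedReader {

    // Decodes the stream with the given charset and returns the text of the first <title>.
    static String firstTitle(InputStream in, String charsetName) throws Exception {
        final StringBuilder text = new StringBuilder();
        final String[] result = new String[1];

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                text.setLength(0);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("title".equals(qName) && result[0] == null) {
                    result[0] = text.toString();
                }
            }
        };

        // Decode explicitly, so the parser receives characters rather than raw bytes.
        Reader reader = new InputStreamReader(in, Charset.forName(charsetName));
        SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(reader), handler);
        return result[0];
    }

    public static void main(String[] args) throws Exception {
        // Shortened stand-in for the feed; the escapes spell the three Greek capitals
        // that end the feed's channel title.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-7\"?>"
                + "<rss version=\"0.91\"><channel><title><![CDATA[RSS \u0395\u039C\u03A0]]></title>"
                + "</channel></rss>";
        byte[] bytes = xml.getBytes(Charset.forName("ISO-8859-7"));
        System.out.println(firstTitle(new ByteArrayInputStream(bytes), "ISO-8859-7"));
    }
}
```

For the real feed the stream would come from url.openStream() as in the ReadXML method earlier in the thread. Whether this fixes the SAXException on Android is untested here.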

Hope this helps.

Phyll

