Issues with Python XML parsing
I'm new to XML, but I have basic knowledge of Python. I'm facing issues while trying to parse the XML file below.
I use the BeautifulSoup library to parse the file and, for some unknown reason, I can access the different fields of entries 2 and 3 but not of entry 1, even though they are all formatted the same way. Can you tell me what I'm doing wrong, given the code and output below?
<?xml version='1.0' encoding='utf-8'?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title type="text">news</title>
  <id>1</id>
  <link href="" />
  <link href="http://192.168.1.12:8083/mywebapp/rest/listofentries/1/entries" rel="self" />
  <updated>2014-11-26T10:41:12.424Z</updated>
  <author />
  <entry xmlns:georss="http://www.georss.org/georss">
    <title type="html">test rest</title>
    <content type="html">1</content>
    <author>
      <name>user213</name>
    </author>
    <summary type="html">test put entry 3</summary>
    <id>7</id>
    <georss:point>21.94420760726878 17.44</georss:point>
    <updated>2014-11-24T09:55:31.000Z</updated>
    <link href="http://192.168.1.12:8083/mywebapp/rest/listofentries/1/7" rel="self" type="application/atom+xml" length="0" />
    <link href="http://192.168.1.12:8083/mywebapp/rest/listofentries/1/7/editentry" rel="edit" type="application/atom+xml" length="0" />
    <link href="http://192.168.1.12:8083/mywebapp/rest/listofentries/1/7/comments" rel="replies" type="application/atom+xml" length="0" />
  </entry>
  <entry xmlns:georss="http://www.georss.org/georss">
    <title type="html">test rest</title>
    <content type="html">1</content>
    <author>
      <name>user213</name>
    </author>
    <summary type="html">test put entry 8</summary>
    <id>8</id>
    <georss:point>21.94420760726878 17.44</georss:point>
    <updated>2014-11-24T13:47:09.000Z</updated>
    <link href="http://192.168.1.12:8083/mywebapp/rest/listofentries/1/8" rel="self" type="application/atom+xml" length="0" />
    <link href="http://192.168.1.12:8083/mywebapp/rest/listofentries/1/8/editentry" rel="edit" type="application/atom+xml" length="0" />
    <link href="http://192.168.1.12:8083/mywebapp/rest/listofentries/1/8/comments" rel="replies" type="application/atom+xml" length="0" />
  </entry>
  <entry xmlns:georss="http://www.georss.org/georss">
    <title type="html">test rest</title>
    <content type="html">1</content>
    <author>
      <name>user213</name>
    </author>
    <summary type="html">test post</summary>
    <id>12</id>
    <georss:point>21.94420760726878 17.44</georss:point>
    <updated>2014-11-25T14:29:02.000Z</updated>
    <link href="http://192.168.1.12:8083/mywebapp/rest/listofentries/1/12" rel="self" type="application/atom+xml" length="0" />
    <link href="http://192.168.1.12:8083/mywebapp/rest/listofentries/1/12/editentry" rel="edit" type="application/atom+xml" length="0" />
    <link href="http://192.168.1.12:8083/mywebapp/rest/listofentries/1/12/comments" rel="replies" type="application/atom+xml" length="0" />
  </entry>
</feed>
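Python code:
    #!/usr/bin/python
    # Python 2 script using BeautifulSoup 3
    from BeautifulSoup import BeautifulSoup

    handler = open("/tmp/test.xml").read()
    soup = BeautifulSoup(handler)  # parse the file contents
    results = soup.findAll('entry')
    for r in results:
        print r  # the whole entry as BeautifulSoup sees it
        print r.find('title').text
        print r.find('content').text
        print r.find('georss:point')
        print r.find('id')
        print r.find('updated')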
And the output is the following:
<entry xmlns:georss="http://www.georss.org/georss">
<title type="html">test rest</title>
<content type="html">1</content>
</entry>
test rest
1
None
None
None
<entry xmlns:georss="http://www.georss.org/georss">
<title type="html">test rest</title>
<content type="html">1</content>
<author>
<name>user213</name>
</author>
<summary type="html">test put entry 8</summary>
<id>8</id>
<georss:point>21.94420760726878 17.44</georss:point>
<updated>2014-11-24T13:47:09.000Z</updated>
<link href="http://192.168.20.223:8083/mywebapp/rest/listofentries/1/8" rel="self" type="application/atom+xml" length="0" />
<link href="http://192.168.20.223:8083/mywebapp/rest/listofentries/1/8/editentry" rel="edit" type="application/atom+xml" length="0" />
<link href="http://192.168.20.223:8083/mywebapp/rest/listofentries/1/8/comments" rel="replies" type="application/atom+xml" length="0" />
</entry>
test rest
1
<georss:point>21.94420760726878 17.44</georss:point>
<id>8</id>
<updated>2014-11-24T13:47:09.000Z</updated>
<entry xmlns:georss="http://www.georss.org/georss">
<title type="html">test rest</title>
<content type="html">1</content>
<author>
<name>user213</name>
</author>
<summary type="html">test post</summary>
<id>12</id>
<georss:point>21.94420760726878 17.44</georss:point>
<updated>2014-11-25T14:29:02.000Z</updated>
<link href="http://192.168.20.223:8083/mywebapp/rest/listofentries/1/12" rel="self" type="application/atom+xml" length="0" />
<link href="http://192.168.20.223:8083/mywebapp/rest/listofentries/1/12/editentry" rel="edit" type="application/atom+xml" length="0" />
<link href="http://192.168.20.223:8083/mywebapp/rest/listofentries/1/12/comments" rel="replies" type="application/atom+xml" length="0" />
</entry>
test rest
1
<georss:point>21.94420760726878 17.44</georss:point>
<id>12</id>
<updated>2014-11-25T14:29:02.000Z</updated>
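From what I have tested with the following code:
    #!/usr/bin/python
    # Python 2 script using BeautifulSoup 3
    from BeautifulSoup import BeautifulSoup

    handler = open("./test.xml").read()
    soup = BeautifulSoup(handler)
    print soup.prettify()  # shows the tree as BeautifulSoup parsed it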
The output:
<?xml version='1.0' encoding='utf-8'?>
<feed xmlns="http://www.w3.org/2005/Atom">
 <title type="text">
  news
 </title>
 <id>
  1
 </id>
 <link href="" />
 <link href="http://192.168.1.12:8083/mywebapp/rest/listofentries/1/entries" rel="self" />
 <updated>
  2014-11-26T10:41:12.424Z
 </updated>
 <author>
  <entry xmlns:georss="http://www.georss.org/georss">
   <title type="html">
    test rest
   </title>
   <content type="html">
    1
   </content>
  </entry>
 </author>
 <author>
  <name>
   user213
  </name>
 </author>
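If you look closely, you can see that the <author /> element in the XML is treated as an opening tag by BeautifulSoup, so the first entry ends up nested inside it and is cut off right after its content element.
That's why id, georss:point and updated are not found for entry 1: they fall outside the entry tag.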
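One possible workaround, sketched here on the assumption that switching to a namespace-aware XML parser is acceptable, is the standard library's xml.etree.ElementTree, which reads <author /> as an empty element. The feed below is a trimmed copy of the one in the question, and the names FEED, NS and parse_entries are made up for this illustration. The sketch is written for Python 3, and the namespace URIs in NS have to match the xmlns values of the actual feed exactly.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the feed from the question (assumption: only the fields
# needed for the demonstration are kept).
FEED = """<?xml version='1.0' encoding='utf-8'?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title type="text">news</title>
  <author />
  <entry xmlns:georss="http://www.georss.org/georss">
    <title type="html">test rest</title>
    <content type="html">1</content>
    <author><name>user213</name></author>
    <id>7</id>
    <georss:point>21.94420760726878 17.44</georss:point>
    <updated>2014-11-24T09:55:31.000Z</updated>
  </entry>
  <entry xmlns:georss="http://www.georss.org/georss">
    <title type="html">test rest</title>
    <content type="html">1</content>
    <author><name>user213</name></author>
    <id>8</id>
    <georss:point>21.94420760726878 17.44</georss:point>
    <updated>2014-11-24T13:47:09.000Z</updated>
  </entry>
</feed>"""

# Prefix-to-URI map used in the search paths; the prefixes are arbitrary,
# the URIs must equal the ones declared in the document.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "georss": "http://www.georss.org/georss",
}


def parse_entries(xml_text):
    """Return one dict per <entry>, with the fields the question prints."""
    # Encode to bytes so the parser honours the encoding declaration.
    root = ET.fromstring(xml_text.encode("utf-8"))
    entries = []
    for entry in root.findall("atom:entry", NS):
        entries.append({
            "title": entry.findtext("atom:title", namespaces=NS),
            "content": entry.findtext("atom:content", namespaces=NS),
            "point": entry.findtext("georss:point", namespaces=NS),
            "id": entry.findtext("atom:id", namespaces=NS),
            "updated": entry.findtext("atom:updated", namespaces=NS),
        })
    return entries


if __name__ == "__main__":
    for fields in parse_entries(FEED):
        print(fields)
```

With this approach the first entry should expose id, georss:point and updated like the others, because the empty author element no longer swallows it.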
Hope this helps.