Unable to Access Child Node in Parsing XML with Python
I am new to the Python scripting language and am working on a parser that parses a web-based XML file.
I am able to retrieve all but one of the elements using minidom in Python with no issues, but there is one node I am having trouble with. The last node I require from the XML file is the 'url' within the 'image' tag, which can be found in the following XML file example:
    <events>
      <event id="abcde01">
        <title> Name of event </title>
        <url> URL of event <- the url tag I do not need </url>
        <image>
          <url> The URL I need </url>
        </image>
      </event>
    </events>
Below I have copied the brief sections of my code that I feel may be of relevance. I would appreciate any help retrieving this last image URL node. I have also included what I tried and the error I received when I ran the code in GAE. The Python version I am using is Python 2.7, and I should point out that I am saving the values in an array (for later input into a database).
    import urllib
    import webapp2
    import xml.dom.minidom as mdom  # assumed: the imports were not shown in the question

    class XMLParser(webapp2.RequestHandler):
        def get(self):
            base_url = 'http://api.eventful.com/rest/events/search?location=dublin&date=today'
            # Downloads the data from the XML file:
            response = urllib.urlopen(base_url)
            # Converts the data to a string
            data = response.read()
            unicode_data = data.decode('utf-8')
            data = unicode_data.encode('ascii', 'ignore')
            # Closes the file
            response.close()
            # Parses the XML downloaded
            dom = mdom.parseString(data)
            node = dom.documentElement  # needed for declaration of variable
            # Print out all event names (titles) found in the Eventful XML
            event_main = dom.getElementsByTagName('event')
            # URLs list parsing - attempt -
            urls_list = []
            for im in event_main:
                image_url = im.getElementsByTagName("image")[0].childNodes[0]
                urls_list.append(image_url)
The error I receive is the following. Any help is appreciated, Karen

    image_url = im.getElementsByTagName("image")[0].childNodes[0]
    IndexError: list index out of range
First of all, do not re-encode the content. There is no need to do so; XML parsers are perfectly capable of handling encoded content.
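To illustrate the point about encodings, here is a minimal sketch that hands raw UTF-8 bytes straight to minidom without any decode or encode step. The sample document and the event title are made up for the illustration; they are not taken from the Eventful feed.

```python
# -*- coding: utf-8 -*-
from xml.dom import minidom

# Hypothetical sample: raw UTF-8 bytes, as a network response would return them.
raw = (u'<?xml version="1.0" encoding="UTF-8"?>'
       u'<events><event id="abcde01"><title>Caf\u00e9 night</title></event></events>'
       ).encode('utf-8')

# The parser reads the encoding from the XML declaration and decodes the bytes itself.
dom = minidom.parseString(raw)
title = dom.getElementsByTagName('title')[0].firstChild.data
```

The accented character survives intact, whereas encoding to ASCII with 'ignore' would silently drop it.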
Next, I'd use the ElementTree API for a task like this:
    import urllib
    from xml.etree import ElementTree as ET

    response = urllib.urlopen(base_url)
    tree = ET.parse(response)
    urls_list = []
    for event in tree.findall('.//event[image]'):
        # Find the text content of the first <image><url> tag combination:
        image_url = event.find('.//image/url')
        if image_url is not None:
            urls_list.append(image_url.text)
This only considers event elements that have a direct image child element.
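As a rough, self-contained check of that behaviour, the sketch below parses a small in-memory document instead of the live feed. The sample XML and the example.com URLs are invented for the illustration, and the extra strip() call is an addition to handle the whitespace around the URL text seen in the question's example.

```python
from xml.etree import ElementTree as ET

# Hypothetical sample data: one event with an <image> child, one without.
sample = """<events>
  <event id="abcde01">
    <title>Event with an image</title>
    <url>http://example.com/event/1</url>
    <image>
      <url> http://example.com/image/1.jpg </url>
    </image>
  </event>
  <event id="abcde02">
    <title>Event without an image</title>
    <url>http://example.com/event/2</url>
  </event>
</events>"""

root = ET.fromstring(sample)
urls_list = []
# The [image] predicate keeps only <event> elements with a direct <image> child.
for event in root.findall('.//event[image]'):
    image_url = event.find('./image/url')
    # Guard against a missing or empty <url> inside <image>.
    if image_url is not None and image_url.text:
        urls_list.append(image_url.text.strip())
```

The event without an image is skipped rather than raising an error, and the event's own url element is never picked up because the path goes through image.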