HTML parsing - XPath: exclude more than one element/tag
I am having trouble extracting the text between two div tags in XML.
Imagine I have the following XML:
<div class="default_style_wrap">
  <!-- body starts -->
  <!-- irrelevant data -->
  <div style="clear:both" />
  <!-- irrelevant data -->
  <div class="name_address">...</div>
  <!-- irrelevant data -->
  <div style="clear:both" />
  <!-- irrelevant data -->
  <span class="img_comments_right">...</span>
  <!-- text I want -->
  2 members of expedition 35 crew wrapped 6-hour, 38 minute spacewalk @ 4:41 p.m. edt friday deploy , retrieve several science experiments on exterior of international space station , install new navigational aid.
  <br /> <br />
  spacewalkers' first task install obstanovka experiment on station's zvezda service module. obstanovka study plasma waves , effect of space weather on earth's ionosphere.
  <!-- irrelevant data again -->
  <span class="img_comments_right">...</span>
  <!-- text I want -->
  after deploying pair of sensor booms obstanovka, vinogradov , romanenko retrieved biorisk experiment exterior of pirs. biorisk experiment studied effect of microbes on spacecraft structures.
  <br /> <br />
  167th spacewalk in support of space station assembly , maintenance, totaling 1,055 hours, 39 minutes. vinogradov's 7 spacewalks total 38 hours, 25 minutes. romanenko completed first spacewalk.
  <!-- body ends -->
</div>
It may not be obvious from the code, but default_style_wrap is the parent of the other, irrelevant divs and spans. The text I care about is the tag-less text, but as you can see there are other tags in between, for instance img_comments_right, and it is driving me nuts.
I tried the following, which I saw in another post:
"//div[@class='article_container']/*[not(self::div)]";
But it does not seem to return any text at all, and even if it did, I wouldn't know how to exclude the spans.
Any ideas?
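Your attempt returns no text because * matches only element nodes, never text nodes. Try the following query instead. It selects the following siblings of the <span> nodes that are text nodes: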
query = '//span[@class="img_comments_right"]/following-sibling::text()';
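For anyone who wants to try the idea without an XPath 1.0 engine, here is a rough sketch in Python using only the standard library. It is not the query above: xml.etree.ElementTree does not support the following-sibling axis or text(), so the sketch imitates it by reading each element's .tail, which is where ElementTree keeps text that follows an element. The SAMPLE markup and the function name text_after_spans are made up for illustration; they are a shortened stand-in for the XML in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened stand-in for the markup in the question.
SAMPLE = """
<div class="default_style_wrap">
  <div style="clear:both" />
  <div class="name_address">address</div>
  <span class="img_comments_right">caption one</span>
  First paragraph.
  <br />
  <br />
  Second paragraph.
  <span class="img_comments_right">caption two</span>
  Third paragraph.
</div>
"""


def text_after_spans(xml_source, span_class="img_comments_right"):
    """Collect bare text that follows a matching <span> inside its parent."""
    root = ET.fromstring(xml_source)
    results = []
    # Every element is a potential parent of a matching span.
    for parent in root.iter():
        seen_span = False
        for child in parent:
            if child.tag == "span" and child.get("class") == span_class:
                seen_span = True
            # ElementTree stores the text that follows an element in .tail.
            # Once a matching span has been seen, keep every non-blank tail.
            if seen_span and child.tail and child.tail.strip():
                results.append(child.tail.strip())
    return results


if __name__ == "__main__":
    for chunk in text_after_spans(SAMPLE):
        print(chunk)
```

Unlike the XPath query, this sketch drops whitespace-only text and trims the rest, which is usually what you want when the text sits between br tags. The text inside the spans themselves is never collected, because only .tail is read, not .text.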