ruby - Disable HTML within XML escaping with Nokogiri
I'm trying to parse an XML document from the Google Directions API.
This is what I've got so far:

    x = Nokogiri::XML(GoogleDirections.new("48170", "48104").xml)
    x.xpath("//DirectionsResponse//route//leg//step").each do |q|
      q.xpath("html_instructions").each do |h|
        puts h.inner_html
      end
    end

The output looks like this:

    Head &lt;b&gt;south&lt;/b&gt; on &lt;b&gt;Hidden Pond Dr&lt;/b&gt; toward &lt;b&gt;Ironwood Ct&lt;/b&gt;
    Turn &lt;b&gt;right&lt;/b&gt; onto &lt;b&gt;N Territorial Rd&lt;/b&gt;
    Turn &lt;b&gt;left&lt;/b&gt; onto &lt;b&gt;Gotfredson Rd&lt;/b&gt;
    ...

I want the output to be:

    Turn <b>right</b> onto <b>N Territorial Rd</b>

The problem seems to be that Nokogiri is escaping the HTML within the XML.
I trust Google, but I think I'd also like to sanitize it further to:

    Turn right onto N Territorial Rd

But I can't do that (using Sanitize, perhaps) without the raw XML. Any ideas?
Because I don't have the Google Directions API installed I can't access the XML, but I have a strong suspicion the problem is the result of telling Nokogiri you're dealing with XML. As a result, it's going to return the HTML encoded the way it should be in XML.
You can unescape the HTML using something like:

    require 'cgi'

    CGI::unescape_html('Head &lt;b&gt;south&lt;/b&gt; on &lt;b&gt;Hidden Pond Dr&lt;/b&gt; toward &lt;b&gt;Ironwood Ct&lt;/b&gt;')
    => "Head <b>south</b> on <b>Hidden Pond Dr</b> toward <b>Ironwood Ct</b>"

unescape_html is an alias of unescapeHTML:

    Unescape a string that has been HTML-escaped

    CGI::unescapeHTML("Usage: foo &quot;bar&quot; &lt;baz&gt;")
    # => "Usage: foo \"bar\" <baz>"

I had to think about this a bit more. It's something I've run into before, but it was one of those things that escaped me during the rush at work. The fix is simple: you're using the wrong method to retrieve the content. Instead of:

    puts h.inner_html

use:

    puts h.text

I proved this using:

    require 'httpclient'
    require 'nokogiri'

    # This URL comes from: https://developers.google.com/maps/documentation/directions/#xml
    url = 'http://maps.googleapis.com/maps/api/directions/xml?origin=chicago,il&destination=los+angeles,ca&waypoints=joplin,mo|oklahoma+city,ok&sensor=false'
    clnt = HTTPClient.new

    # Fetch the XML response and print the decoded text of each instruction node.
    doc = Nokogiri::XML(clnt.get_content(url))
    doc.search('html_instructions').each do |html|
      puts html.text
    end

Which outputs:

    Head <b>south</b> on <b>S Federal St</b> toward <b>W Van Buren St</b>
    Turn <b>right</b> onto <b>W Congress Pkwy</b>
    Continue onto <b>I-290 W</b>
    [...]

The difference is that inner_html reads the content of the node directly, without decoding. text decodes it for you. text, to_str, and inner_text are aliased to content internally in Nokogiri::XML::Node for our parsing pleasure.