ruby - Disable HTML within XML escaping with Nokogiri
I'm trying to parse an XML document from the Google Directions API.
This is what I've got so far:
    x = Nokogiri::XML(GoogleDirections.new("48170", "48104").xml)
    x.xpath("//DirectionsResponse//route//leg//step").each do |q|
      q.xpath("html_instructions").each do |h|
        puts h.inner_html
      end
    end
The output looks like this:
    Head &lt;b&gt;south&lt;/b&gt; on &lt;b&gt;Hidden Pond Dr&lt;/b&gt; toward &lt;b&gt;Ironwood Ct&lt;/b&gt;
    Turn &lt;b&gt;right&lt;/b&gt; onto &lt;b&gt;N Territorial Rd&lt;/b&gt;
    Turn &lt;b&gt;left&lt;/b&gt; onto &lt;b&gt;Gotfredson Rd&lt;/b&gt;
    ...
I want the output to be:
    Turn <b>right</b> onto <b>N Territorial Rd</b>
The problem seems to be Nokogiri escaping the HTML within the XML.
I trust Google, but I think I would also like to sanitize it further to:
    Turn right onto N Territorial Rd
But I can't (using Sanitize, perhaps) without the raw XML. Any ideas?
Because I don't have the Google Directions API installed I can't access the XML, but I have a strong suspicion the problem is a result of telling Nokogiri you're dealing with XML. As a result, it's going to return the HTML encoded, as it should be in XML.
You can unescape the HTML using something like:
    require 'cgi'
    CGI::unescape_html('Head &lt;b&gt;south&lt;/b&gt; on &lt;b&gt;Hidden Pond Dr&lt;/b&gt; toward &lt;b&gt;Ironwood Ct&lt;/b&gt;')
    # => "Head <b>south</b> on <b>Hidden Pond Dr</b> toward <b>Ironwood Ct</b>"
unescape_html is an alias for unescapeHTML, which the Ruby documentation describes as:
Unescape a string that has been HTML-escaped.
    CGI::unescapeHTML("Usage: foo &quot;bar&quot; &lt;baz&gt;")
    # => "Usage: foo \"bar\" <baz>"
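As a minimal sketch of the unescape step, followed by the further "sanitize" step the question asks about, the snippet below uses only the standard library. The sample string is an assumption modeled on the output shown in the question, and the regular expression is a crude tag stripper that is only reasonable for simple, trusted markup like this; it is not a substitute for a real sanitizer.

```ruby
require 'cgi'

# Sample input (assumed): one escaped instruction as it appears in the XML.
escaped = 'Turn &lt;b&gt;right&lt;/b&gt; onto &lt;b&gt;N Territorial Rd&lt;/b&gt;'

# Step 1: turn the entities back into real HTML markup.
html = CGI.unescapeHTML(escaped)

# Step 2 (crude, trusted input only): drop anything that looks like a tag.
plain = html.gsub(/<[^>]+>/, '')

puts html
puts plain
```

The first line printed keeps the bold tags, and the second is the tag-free version.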
I had to think about this a bit more. It's something I've run into before, but it was one of those things that escaped me during the rush at work. The fix is simple: you're using the wrong method to retrieve the content. Instead of:
puts h.inner_html
Use:
puts h.text
I proved it using:
    require 'httpclient'
    require 'nokogiri'

    # URL comes from: https://developers.google.com/maps/documentation/directions/#xml
    url = 'http://maps.googleapis.com/maps/api/directions/xml?origin=chicago,il&destination=los+angeles,ca&waypoints=joplin,mo|oklahoma+city,ok&sensor=false'
    clnt = HTTPClient.new
    doc = Nokogiri::XML(clnt.get_content(url))
    # text returns the decoded content of each html_instructions node
    doc.search('html_instructions').each do |html|
      puts html.text
    end
Which outputs:
    Head <b>south</b> on <b>S Federal St</b> toward <b>W Van Buren St</b>
    Turn <b>right</b> onto <b>W Congress Pkwy</b>
    Continue onto <b>I-290 W</b>
    [...]
The difference is that inner_html reads the content of the node directly, without decoding it, while text decodes it for you. text, to_str, and inner_text are aliased to content internally in Nokogiri::XML::Node for our parsing pleasure.