ruby - Disable HTML within XML escaping with Nokogiri -


I'm trying to parse an XML document from the Google Directions API.

This is what I've got so far:

    x = Nokogiri::XML(GoogleDirections.new("48170", "48104").xml)
    x.xpath("//DirectionsResponse//route//leg//step").each do |q|
      q.xpath("html_instructions").each do |h|
        puts h.inner_html
      end
    end

The output looks like this:

    Head &lt;b&gt;south&lt;/b&gt; on &lt;b&gt;Hidden Pond Dr&lt;/b&gt; toward &lt;b&gt;Ironwood Ct&lt;/b&gt;
    Turn &lt;b&gt;right&lt;/b&gt; onto &lt;b&gt;N Territorial Rd&lt;/b&gt;
    Turn &lt;b&gt;left&lt;/b&gt; onto &lt;b&gt;Gotfredson Rd&lt;/b&gt;
    ...

I would like the output to be:

    Turn <b>right</b> onto <b>N Territorial Rd</b>

The problem seems to be that Nokogiri is escaping the HTML within the XML.

I trust Google, but I think I would also like to sanitize it further to:

    Turn right onto N Territorial Rd

But I can't seem to do that (using Sanitize, perhaps) without the raw XML. Any ideas?

Because I don't have the Google Directions API installed I can't access the XML, but I have a strong suspicion that the problem is a result of telling Nokogiri you're dealing with XML. As a result, it's going to return the HTML encoded the way it should be inside XML.

You can unescape the HTML using something like:

    require 'cgi'

    CGI::unescape_html('Head &lt;b&gt;south&lt;/b&gt; on &lt;b&gt;Hidden Pond Dr&lt;/b&gt; toward &lt;b&gt;Ironwood Ct&lt;/b&gt;')
    # => "Head <b>south</b> on <b>Hidden Pond Dr</b> toward <b>Ironwood Ct</b>"

unescape_html is an alias of unescapeHTML:

    Unescape a string that has been HTML-escaped
      CGI::unescapeHTML("Usage: foo &quot;bar&quot; &lt;baz&gt;")
         # => "Usage: foo \"bar\" <baz>"

I had to think about this a bit more. It's something I've run into before, but it was one of those things that escaped me during the rush at work. The fix is simple: you're using the wrong method to retrieve the content. Instead of:

    puts h.inner_html

Use:

    puts h.text

I proved this using:

    require 'httpclient'
    require 'nokogiri'

    # URL comes from: https://developers.google.com/maps/documentation/directions/#xml
    url = 'http://maps.googleapis.com/maps/api/directions/xml?origin=chicago,il&destination=los+angeles,ca&waypoints=joplin,mo|oklahoma+city,ok&sensor=false'
    clnt = HTTPClient.new

    # Parse the response as XML and print the decoded text of every instruction.
    doc = Nokogiri::XML(clnt.get_content(url))
    doc.search('html_instructions').each do |html|
      puts html.text
    end

Which outputs:

    Head <b>south</b> on <b>S Federal St</b> toward <b>W Van Buren St</b>
    Turn <b>right</b> onto <b>W Congress Pkwy</b>
    Continue onto <b>I-290 W</b>
    [...]

The difference is that inner_html reads the content of the node directly, without decoding it. text decodes it for you. text, to_str, and inner_text are aliased to content internally in Nokogiri::XML::Node for our parsing pleasure.

