jsoup escaping ampersand in link href
jsoup is escaping the ampersand in the query portion of the URL in a link href. Given the sample below:
String l_input = "<html><body>before <a href=\"http://a.b.com/ct.html\">link text</a> after</body></html>";
org.jsoup.nodes.Document l_doc = org.jsoup.Jsoup.parse(l_input);
org.jsoup.select.Elements l_html_links = l_doc.getElementsByTag("a");
// Replace the href of every link with a URL that contains a query string
for (org.jsoup.nodes.Element l : l_html_links) {
    l.attr("href", "http://a.b.com/ct.html?a=111&b=222");
}
String l_output = l_doc.outerHtml();
The output is:
<html>
 <head></head>
 <body>
  before <a href="http://a.b.com/ct.html?a=111&amp;b=222">link text</a> after
 </body>
</html>
The single & is being escaped to &amp;. Shouldn't it stay as &?
It seems you can't do it. I went through the source and found the place where the escape happens.
It is defined in Attribute.java:
/**
 Get the HTML representation of this attribute; e.g. {@code href="index.html"}.
 @return HTML
 */
public String html() {
    return key + "=\"" + Entities.escape(value, (new Document("")).outputSettings()) + "\"";
}
There you can see that it uses Entities.java, and that jsoup takes the default OutputSettings of new Document("").
That's why you can't override these settings.
Maybe you should post a feature request for that.
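To make the effect of that method concrete, here is a minimal sketch of what escaping an attribute value amounts to. It is a simplified stand-in and not jsoup's Entities.escape: the class name AttributeEscapeSketch and both helper methods are hypothetical, and only & and " are handled.

```java
// Simplified illustration only; this is NOT jsoup's Entities.escape.
final class AttributeEscapeSketch {

    // Hypothetical helper: escapes the two characters that matter most
    // inside a double-quoted attribute value.
    static String escapeAttributeValue(String value) {
        StringBuilder out = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Mirrors the shape of Attribute.html(): key="escaped value"
    static String renderAttribute(String key, String value) {
        return key + "=\"" + escapeAttributeValue(value) + "\"";
    }

    public static void main(String[] args) {
        // prints: href="http://a.b.com/ct.html?a=111&amp;b=222"
        System.out.println(renderAttribute("href", "http://a.b.com/ct.html?a=111&b=222"));
    }
}
```

Because the escape is applied when the attribute is serialized, the stored value is untouched. Only the generated HTML text contains &amp;.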
By the way: the default escape mode is set to base.
Document.java creates the default OutputSettings object, and the escape mode is defined there. See:
/**
 * A HTML Document.
 *
 * @author Jonathan Hedley, jonathan@hedley.net
 */
public class Document extends Element {
    private OutputSettings outputSettings = new OutputSettings();
    // ...
}

/**
 * A Document's output settings control the form of the text() and html() methods.
 */
public static class OutputSettings implements Cloneable {
    private Entities.EscapeMode escapeMode = Entities.EscapeMode.base;
    // ...
}
Workaround (unescape as XML):
With StringEscapeUtils from the Apache Commons Lang project you can unescape the output easily. See:
// StringEscapeUtils comes from Apache Commons Lang
String unescapedXml = StringEscapeUtils.unescapeXml(l_output);
System.out.println(unescapedXml);
This will print:
<html>
 <head></head>
 <body>
  before <a href="http://a.b.com/ct.html?a=111&b=222">link text</a> after
 </body>
</html>
But of course, this replaces every escaped entity in the output, not only the &amp; in the href.
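If only the ampersands inside href values should be restored, a narrower post-processing step is one option. The following is a rough sketch using only the JDK. The class HrefAmpersandSketch and its method are hypothetical names, and it assumes double-quoted href attributes, as in the jsoup output shown earlier. Running a regular expression over HTML is fragile, so treat it as an illustration rather than a robust solution.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical post-processing helper; not part of jsoup or Commons Lang.
final class HrefAmpersandSketch {

    // Captures: (1) the href=" prefix, (2) the attribute value, (3) the closing quote.
    private static final Pattern HREF = Pattern.compile("(href=\")([^\"]*)(\")");

    // Replaces &amp; with & inside href values only; all other text is left as-is.
    static String unescapeHrefAmpersands(String html) {
        Matcher m = HREF.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = m.group(2).replace("&amp;", "&");
            // quoteReplacement guards against '$' or '\' occurring in the URL
            m.appendReplacement(out, Matcher.quoteReplacement(m.group(1) + value + m.group(3)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<p>a &amp; b</p> <a href=\"http://a.b.com/ct.html?a=111&amp;b=222\">link text</a>";
        // The &amp; in the paragraph text stays escaped; the one in the href becomes &.
        System.out.println(unescapeHrefAmpersands(html));
    }
}
```

Keep in mind that &amp; inside an attribute value is what HTML escaping calls for, and browsers decode it back to & when following the link. A step like this is only needed when the raw string is consumed by something that does not parse HTML.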