jsoup escaping ampersand in link href


jsoup is escaping the ampersand in the query portion of a URL in a link href. Given the sample below:

    String l_input = "<html><body>before <a href=\"http://a.b.com/ct.html\">link text</a> after</body></html>";
    org.jsoup.nodes.Document l_doc = org.jsoup.Jsoup.parse(l_input);
    org.jsoup.select.Elements l_html_links = l_doc.getElementsByTag("a");
    // Rewrite the href of every link; the new URL contains a raw '&'.
    for (org.jsoup.nodes.Element l : l_html_links) {
        l.attr("href", "http://a.b.com/ct.html?a=111&b=222");
    }
    String l_output = l_doc.outerHtml();

The output is:

    <html>
     <head></head>
     <body>
      before
      <a href="http://a.b.com/ct.html?a=111&amp;b=222">link text</a> after
     </body>
    </html>

The single & is being escaped as &amp;. Shouldn't it stay as &?
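For what it's worth, `&amp;` inside an attribute value is only the serialized form: a parser that reads the markup back decodes it to a plain `&`. The following is a minimal sketch of that round trip, assuming the JDK's built-in XML parser as a stand-in for an HTML parser (the entity handling for `&amp;` is the same in both). The class name `EscapedHrefDemo` and the helper `parsedHref` are made up for illustration and are not part of jsoup.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class EscapedHrefDemo {

    // Hypothetical helper: parses a single <a> element and returns its href
    // attribute value as the parser sees it (entities already decoded).
    static String parsedHref(String anchorMarkup) {
        try {
            Element a = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(anchorMarkup)))
                    .getDocumentElement();
            return a.getAttribute("href");
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse markup", e);
        }
    }

    public static void main(String[] args) {
        // The anchor exactly as jsoup serialized it in the output above.
        String markup = "<a href=\"http://a.b.com/ct.html?a=111&amp;b=222\">link text</a>";
        System.out.println(parsedHref(markup));
        // prints: http://a.b.com/ct.html?a=111&b=222
    }
}
```

So the escaped output should still point at the intended URL when a browser or parser consumes it; the rest of this post only matters if something downstream needs the raw `&` in the serialized text itself.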

It seems you can't do it. I went through the source and found the place where the escaping happens.

It is defined in Attribute.java:

    /**
     HTML representation of the attribute; e.g. {@code href="index.html"}.
     @return html
     */
    public String html() {
        return key + "=\"" + Entities.escape(value, (new Document("")).outputSettings()) + "\"";
    }

There you can see that, when calling into Entities.java, jsoup takes the default OutputSettings of a new Document(""). That's why you can't override these settings.

Maybe you should post a feature request for that.

BTW: the default escape mode is set to base.

Document.java creates a default OutputSettings object, and the escape mode is defined there. See:

    /**
     * A HTML Document.
     *
     * @author Jonathan Hedley, jonathan@hedley.net
     */
    public class Document extends Element {
        private OutputSettings outputSettings = new OutputSettings();
        // ...
    }

    /**
     * A Document's output settings control the form of the text() and html() methods.
     */
    public static class OutputSettings implements Cloneable {
        private Entities.EscapeMode escapeMode = Entities.EscapeMode.base;
        // ...
    }

Workaround (unescape XML):

With StringEscapeUtils from the Apache Commons Lang project you can unescape things easily. See:

    String unescapedXml = StringEscapeUtils.unescapeXml(l_output);
    System.out.println(unescapedXml);

This will print:

    <html>
     <head></head>
     <body>
      before
      <a href="http://a.b.com/ct.html?a=111&b=222">link text</a> after
     </body>
    </html>

But of course, this replaces every &amp; in the document, not only the one in the href.
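If only the href values should be touched, one option is to restrict the replacement to those attributes. Here is a minimal sketch using only java.util.regex. It assumes double-quoted href attributes, as in the output above. The class name `HrefAmpersandUnescaper` and its method are hypothetical and not part of jsoup or Commons Lang. A regex is not a real HTML parser: the pattern would also match attributes whose name merely ends in `href`, so treat this as an illustration, not a robust solution.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HrefAmpersandUnescaper {

    // Matches href="..." with double quotes; group 2 is the attribute value.
    private static final Pattern HREF = Pattern.compile("(href=\")([^\"]*)(\")");

    // Replaces &amp; with & inside href values only; all other text is untouched.
    static String unescapeHrefAmpersands(String html) {
        Matcher m = HREF.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = m.group(2).replace("&amp;", "&");
            // quoteReplacement keeps '$' and '\' in the URL from being
            // interpreted as group references.
            m.appendReplacement(out,
                    Matcher.quoteReplacement(m.group(1) + value + m.group(3)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String escaped = "<a href=\"http://a.b.com/ct.html?a=111&amp;b=222\">Q &amp; A</a>";
        System.out.println(unescapeHrefAmpersands(escaped));
        // prints: <a href="http://a.b.com/ct.html?a=111&b=222">Q &amp; A</a>
    }
}
```

The `&amp;` in the link text is left alone, which is the difference from running unescapeXml over the whole output.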

