Python 2.7.2: plistlib with iTunes XML


I'm reading an iTunes-generated XML playlist with plistlib. The XML has a UTF-8 header.

When I read the XML with plistlib, I get both unicode strings (e.g., 'name': u'don\u2019t remember') and byte strings (e.g., 'name': 'where eagles dare').

The standard advice is to decode whatever you read with the correct encoding as soon as possible, and to use unicode within the program. However,

unicode_string.decode('utf8')  

fails (as it should) with

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 3: ordinal not in range(128)
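The error mentions *encoding* even though the call was `decode` because Python 2 first converts the unicode object to bytes with the default ASCII codec before it decodes anything. A minimal sketch that makes that hidden step explicit (the variable names are only illustrative):

```python
title = u'don\u2019t remember'

error_name = None
bad_position = None

# Python 2 treats title.decode('utf8') roughly like
# title.encode('ascii').decode('utf8'); the encode step is the one that fails.
try:
    title.encode('ascii')
except UnicodeEncodeError as exc:
    error_name = type(exc).__name__
    bad_position = exc.start  # index of the first character ASCII cannot hold

print(error_name, bad_position)
```

The reported position is the index of the right single quotation mark (U+2019) in the title.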

The solution seems to be:

for name in names:
    if isinstance(name, str):
        name = name.decode('utf8')
    # etc.

Is this the correct way of dealing with the problem? Is there a better way?

I'm on Windows 7.

Edit:

The XML is read with:

import plistlib

xml = plistlib.readPlist(fn)
for track in xml['tracks']:
    info = xml['tracks'][track]
    info['name']

This produces in IDLE:

u'don\u2019t remember'
'where eagles dare'

Here's the XML file:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>major version</key><integer>1</integer>
    <key>minor version</key><integer>1</integer>
    <key>date</key><date>2013-08-14T15:04:27Z</date>
    <key>application version</key><string>10.6.3</string>
    <key>features</key><integer>5</integer>
    <key>show content ratings</key><true/>
    <key>music folder</key><string>file://localhost/c:/users/rdp/music/itunes/itunes%20media/</string>
    <key>library persistent id</key><string>fe28ccacd9a36c34</string>
    <key>tracks</key>
    <dict>
        <key>1019</key>
        <dict>
            <key>track id</key><integer>1019</integer>
            <key>name</key><string>where eagles dare</string>
            <key>artist</key><string>iron maiden</string>
            <key>album</key><string>piece of mind</string>
            <key>genre</key><string>rock</string>
            <key>kind</key><string>mpeg audio file</string>
            <key>size</key><integer>7372755</integer>
            <key>total time</key><integer>370128</integer>
            <key>track number</key><integer>1</integer>
            <key>year</key><integer>1983</integer>
            <key>date modified</key><date>2009-10-07T21:11:31Z</date>
            <key>date added</key><date>2008-02-07T16:04:15Z</date>
            <key>bit rate</key><integer>153</integer>
            <key>sample rate</key><integer>44100</integer>
            <key>play count</key><integer>4</integer>
            <key>play date</key><integer>3414416760</integer>
            <key>play date utc</key><date>2012-03-12T21:06:00Z</date>
            <key>artwork count</key><integer>1</integer>
            <key>persistent id</key><string>fe28ccacd9a383e5</string>
            <key>track type</key><string>file</string>
            <key>location</key><string>file://localhost/d:/music/iron%20maiden/piece%20of%20mind/01%20where%20eagles%20dare.mp3</string>
            <key>file folder count</key><integer>-1</integer>
            <key>library folder count</key><integer>-1</integer>
        </dict>
        <key>11559</key>
        <dict>
            <key>track id</key><integer>11559</integer>
            <key>name</key><string>don’t remember</string>
            <key>artist</key><string>adele</string>
            <key>album</key><string>21</string>
            <key>genre</key><string>pop</string>
            <key>kind</key><string>mpeg audio file</string>
            <key>size</key><integer>6120028</integer>
            <key>total time</key><integer>229511</integer>
            <key>track number</key><integer>4</integer>
            <key>track count</key><integer>11</integer>
            <key>year</key><integer>2011</integer>
            <key>date modified</key><date>2012-11-17T10:50:31Z</date>
            <key>date added</key><date>2012-12-19T16:03:46Z</date>
            <key>bit rate</key><integer>199</integer>
            <key>sample rate</key><integer>44100</integer>
            <key>artwork count</key><integer>1</integer>
            <key>persistent id</key><string>7130c888606fb153</string>
            <key>track type</key><string>file</string>
            <key>location</key><string>file://localhost/d:/music/adele/21/04%20-%20don%e2%80%99t%20you%20remember.mp3</string>
            <key>file folder count</key><integer>-1</integer>
            <key>library folder count</key><integer>-1</integer>
        </dict>
    </dict>
    <key>playlists</key>
    <array>
        <dict>
            <key>name</key><string>short</string>
            <key>playlist id</key><integer>30888</integer>
            <key>playlist persistent id</key><string>166746c6572b0005</string>
            <key>all items</key><true/>
            <key>playlist items</key>
            <array>
                <dict>
                    <key>track id</key><integer>11559</integer>
                </dict>
                <dict>
                    <key>track id</key><integer>1019</integer>
                </dict>
            </array>
        </dict>
    </array>
</dict>
</plist>

Wow, weird behaviour. I'd call this non-uniform behaviour a bug in the 2.x implementation of plistlib. plistlib in Python 3 always returns unicode strings, which is much better.
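For comparison, here is a small sketch of the Python 3 side. It assumes Python 3.4 or later, where `plistlib.loads` exists, and it parses a cut-down, hypothetical plist modelled on the file in the question rather than the full library:

```python
import plistlib  # plistlib.loads requires Python 3.4+; it is not available on 2.7

# Cut-down stand-in for the iTunes file: one ASCII name, one non-ASCII name.
PLIST_XML = u"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
    <key>tracks</key>
    <dict>
        <key>1019</key>
        <dict><key>name</key><string>where eagles dare</string></dict>
        <key>11559</key>
        <dict><key>name</key><string>don\u2019t remember</string></dict>
    </dict>
</dict>
</plist>
""".encode("utf-8")

library = plistlib.loads(PLIST_XML)
names = sorted(track["name"] for track in library["tracks"].values())

# Collect the distinct type names seen among the track names.
name_types = {type(name).__name__ for name in names}
print(names, name_types)
```

Both names come back as the same text type there, so no per-value type check is needed.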

But we have to live with it :) So the answer to your question is yes. You should protect yourself whenever you read a string from the plist:

def safe_unicode(s):
    if isinstance(s, unicode):
        return s
    return s.decode('utf-8', errors='replace')

value = safe_unicode(info['name'])

I added errors='replace' in case the string is not UTF-8 encoded. You'll get a bunch of \ufffd characters if it cannot be decoded. If you would rather get an exception, leave it out and use s.decode('utf-8').
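To make the difference between the two error modes concrete, here is a short sketch using a made-up byte string that contains `\xff`, a byte that can never occur in valid UTF-8:

```python
raw = b'where\xffeagles'  # hypothetical input with one undecodable byte

# 'replace' swaps each undecodable byte for U+FFFD and carries on.
lenient = raw.decode('utf-8', 'replace')

# The default mode ('strict') raises instead of guessing.
try:
    raw.decode('utf-8')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

print(repr(lenient), strict_failed)
```

Which mode fits better depends on whether silently damaged track names or a loud failure is the lesser evil for your program.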

Update:

When I tried it with ElementTree:

from xml.etree import ElementTree as ET

tree = ET.parse('test.plist')
map(lambda x: x.text, tree.findall('dict/dict/dict')[1].findall('string'))

That gave me:

[u'don\u2019t remember',  'adele',  '21',  'pop',  'mpeg audio file',  '7130c888606fb153',  'file',  'file://localhost/d:/music/adele/21/04%20-%20don%e2%80%99t%20you%20remember.mp3'] 

So there, too, unicode and byte strings are mixed :-/
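Since both parsers mix the two types, one option is to normalise the whole parsed structure once, right after loading, instead of checking every value at the point of use. The following is only a sketch: `to_unicode` is a hypothetical helper, the sample data is made up to mimic the mixed output above, and it assumes every byte string really is UTF-8 (it raises `UnicodeDecodeError` otherwise).

```python
def to_unicode(value, encoding='utf-8'):
    """Return a copy of value with every byte string decoded to text."""
    if isinstance(value, bytes):  # bytes is an alias of str on Python 2
        return value.decode(encoding)
    if isinstance(value, dict):
        return dict((to_unicode(k, encoding), to_unicode(v, encoding))
                    for k, v in value.items())
    if isinstance(value, list):
        return [to_unicode(item, encoding) for item in value]
    return value  # ints, bools, dates and already-decoded text pass through


# Made-up data shaped like the parsed 'tracks' dictionary.
tracks = {
    b'1019': {b'name': b'where eagles dare', b'year': 1983},
    b'11559': {b'name': u'don\u2019t remember', b'year': 2011},
}
clean = to_unicode(tracks)
print(clean)
```

After this pass every key and every string value is unicode, so the rest of the program never needs an isinstance check.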

