Python 2.7.2: plistlib with iTunes XML
I'm reading an iTunes-generated XML playlist with plistlib. The XML has a UTF-8 header.
When I read the XML with plistlib, I get both unicode strings (e.g., 'name': u'don\u2019t remember') and byte strings (e.g., 'name': 'where eagles dare').
The standard advice is to decode what you read with the correct encoding as early as possible and use unicode within the program. However,
calling unicode_string.decode('utf8') fails (as it should) with:

```
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 3: ordinal not in range(128)
```

The solution seems to be:

```python
for name in names:
    if isinstance(name, str):
        name = name.decode('utf8')
    # etc.
```

Is this the correct way of dealing with the problem? Is there a better way?
I'm on Windows 7.
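For context on the error above: in Python 2, calling .decode() on a unicode object first encodes it with the default codec (normally ASCII) and only then decodes the result, so the exception is raised by that hidden encode step, not by the UTF-8 decode. The sketch below reproduces only that implicit step explicitly; the helper name `implicit_ascii_step` is made up for illustration, and the code runs on Python 2 and 3.

```python
from __future__ import print_function


def implicit_ascii_step(text):
    """Hypothetical helper: perform the ASCII encode that Python 2's
    unicode.decode() does implicitly, returning the error (or None)."""
    try:
        text.encode('ascii')
    except UnicodeEncodeError as exc:
        return exc
    return None


err = implicit_ascii_step(u'don\u2019t remember')
print(err.reason, 'at position', err.start)
# A pure-ASCII string survives the hidden step, so no error is returned.
print(implicit_ascii_step(u'where eagles dare'))
```

This is why the error message names the 'ascii' codec and "position 3" (the U+2019 apostrophe), even though the call asked for UTF-8.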
Edit:
The XML is read with:

```python
import plistlib

xml = plistlib.readPlist(fn)
for track in xml['tracks']:
    info = xml['tracks'][track]
    info['name']  # echoed by the interactive shell
```

which produces, in IDLE:

```
u'don\u2019t remember'
'where eagles dare'
```

Here's the XML file:
```xml
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>major version</key><integer>1</integer>
  <key>minor version</key><integer>1</integer>
  <key>date</key><date>2013-08-14T15:04:27Z</date>
  <key>application version</key><string>10.6.3</string>
  <key>features</key><integer>5</integer>
  <key>show content ratings</key><true/>
  <key>music folder</key><string>file://localhost/c:/users/rdp/music/itunes/itunes%20media/</string>
  <key>library persistent id</key><string>fe28ccacd9a36c34</string>
  <key>tracks</key>
  <dict>
    <key>1019</key>
    <dict>
      <key>track id</key><integer>1019</integer>
      <key>name</key><string>where eagles dare</string>
      <key>artist</key><string>iron maiden</string>
      <key>album</key><string>piece of mind</string>
      <key>genre</key><string>rock</string>
      <key>kind</key><string>mpeg audio file</string>
      <key>size</key><integer>7372755</integer>
      <key>total time</key><integer>370128</integer>
      <key>track number</key><integer>1</integer>
      <key>year</key><integer>1983</integer>
      <key>date modified</key><date>2009-10-07T21:11:31Z</date>
      <key>date added</key><date>2008-02-07T16:04:15Z</date>
      <key>bit rate</key><integer>153</integer>
      <key>sample rate</key><integer>44100</integer>
      <key>play count</key><integer>4</integer>
      <key>play date</key><integer>3414416760</integer>
      <key>play date utc</key><date>2012-03-12T21:06:00Z</date>
      <key>artwork count</key><integer>1</integer>
      <key>persistent id</key><string>fe28ccacd9a383e5</string>
      <key>track type</key><string>file</string>
      <key>location</key><string>file://localhost/d:/music/iron%20maiden/piece%20of%20mind/01%20where%20eagles%20dare.mp3</string>
      <key>file folder count</key><integer>-1</integer>
      <key>library folder count</key><integer>-1</integer>
    </dict>
    <key>11559</key>
    <dict>
      <key>track id</key><integer>11559</integer>
      <key>name</key><string>don’t remember</string>
      <key>artist</key><string>adele</string>
      <key>album</key><string>21</string>
      <key>genre</key><string>pop</string>
      <key>kind</key><string>mpeg audio file</string>
      <key>size</key><integer>6120028</integer>
      <key>total time</key><integer>229511</integer>
      <key>track number</key><integer>4</integer>
      <key>track count</key><integer>11</integer>
      <key>year</key><integer>2011</integer>
      <key>date modified</key><date>2012-11-17T10:50:31Z</date>
      <key>date added</key><date>2012-12-19T16:03:46Z</date>
      <key>bit rate</key><integer>199</integer>
      <key>sample rate</key><integer>44100</integer>
      <key>artwork count</key><integer>1</integer>
      <key>persistent id</key><string>7130c888606fb153</string>
      <key>track type</key><string>file</string>
      <key>location</key><string>file://localhost/d:/music/adele/21/04%20-%20don%e2%80%99t%20you%20remember.mp3</string>
      <key>file folder count</key><integer>-1</integer>
      <key>library folder count</key><integer>-1</integer>
    </dict>
  </dict>
  <key>playlists</key>
  <array>
    <dict>
      <key>name</key><string>short</string>
      <key>playlist id</key><integer>30888</integer>
      <key>playlist persistent id</key><string>166746c6572b0005</string>
      <key>all items</key><true/>
      <key>playlist items</key>
      <array>
        <dict>
          <key>track id</key><integer>11559</integer>
        </dict>
        <dict>
          <key>track id</key><integer>1019</integer>
        </dict>
      </array>
    </dict>
  </array>
</dict>
</plist>
```
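A minimal way to reproduce the mixed types without the full library is to parse a two-string plist from memory. This sketch assumes Python 2.7's `plistlib.readPlistFromString` or Python 3.4+'s `plistlib.loads`, whichever exists; on 2.7 the ASCII-only value should come back as `str` and the other as `unicode`, while on Python 3 both should be `str`.

```python
from __future__ import print_function
import plistlib

# Pick whichever parser this Python provides (3.4+: loads, 2.7: readPlistFromString).
try:
    parse_plist = plistlib.loads
except AttributeError:
    parse_plist = plistlib.readPlistFromString

# Two strings: one pure ASCII, one containing U+2019 encoded as UTF-8 bytes.
PLIST = b'''<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
  <key>ascii</key><string>where eagles dare</string>
  <key>non_ascii</key><string>don\xe2\x80\x99t remember</string>
</dict>
</plist>'''

result = parse_plist(PLIST)
for key in ('ascii', 'non_ascii'):
    # Python 2.7: str, then unicode. Python 3: str for both.
    print(key, type(result[key]).__name__, repr(result[key]))
```

In Python 2 an ASCII-only `str` compares equal to the matching unicode string, which is why the mix often goes unnoticed until a non-ASCII value shows up.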
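Wow, weird behaviour. It comes from the 2.x implementation of plistlib, which returns a plain str when a string is pure ASCII and unicode otherwise (arguably a bug). plistlib in Python 3 returns unicode strings throughout, which is better.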
But we have to live with it :) To answer your question: yes, you should protect yourself when reading strings from a plist:
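```python
def safe_unicode(s):
    # Already unicode: return unchanged.
    if isinstance(s, unicode):
        return s
    # Byte string: decode, replacing undecodable bytes with U+FFFD.
    return s.decode('utf-8', errors='replace')

value = safe_unicode(info['name'])
```

I added errors='replace' in case a string is not UTF-8 encoded: you'll get a bunch of \ufffd characters if it cannot be decoded. If you'd rather get an exception, leave it out and use s.decode('utf-8').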
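If you'd rather not wrap every lookup, one option is to normalize the whole parsed plist once, right after reading it. The sketch below is a hypothetical helper (`to_text_tree` is not part of plistlib) that walks dicts and lists and decodes byte strings on Python 2 only; on Python 3, `bytes` values come from `<data>` elements, so they are deliberately left alone.

```python
import sys

PY2 = sys.version_info[0] == 2


def to_text_tree(value, encoding='utf-8'):
    """Hypothetical helper: return a copy of a plist-style structure in which
    every Python 2 byte string has been decoded to unicode."""
    if isinstance(value, dict):
        return dict((to_text_tree(k, encoding), to_text_tree(v, encoding))
                    for k, v in value.items())
    if isinstance(value, list):
        return [to_text_tree(item, encoding) for item in value]
    if PY2 and isinstance(value, str):
        # Same policy as safe_unicode: replace undecodable bytes with U+FFFD.
        return value.decode(encoding, 'replace')
    # Text, numbers, booleans, dates and <data> payloads pass through unchanged.
    return value
```

Usage would be along the lines of `xml = to_text_tree(plistlib.readPlist(fn))`. Note that the result is built from plain `dict` objects, so any behaviour specific to plistlib's own dict subclass is not preserved.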
Update:
When I tried ElementTree:

```python
from xml.etree import ElementTree as ET

tree = ET.parse('test.plist')
map(lambda x: x.text, tree.findall('dict/dict/dict')[1].findall('string'))
```

which gave me:
```
[u'don\u2019t remember', 'adele', '21', 'pop', 'mpeg audio file', '7130c888606fb153', 'file', 'file://localhost/d:/music/adele/21/04%20-%20don%e2%80%99t%20you%20remember.mp3']
```

So unicode and byte strings are mixed there too :-/
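The same normalization idea carries over to ElementTree. A minimal sketch, assuming the texts of interest are reached with a simple path as in the snippet above; `element_texts` is a made-up helper name, and it parses an inline fragment instead of `test.plist` so that it runs on its own.

```python
import sys
import xml.etree.ElementTree as ET

PY2 = sys.version_info[0] == 2


def element_texts(root, path):
    """Hypothetical helper: collect the text of every element matching path,
    always as unicode on Python 2 (str on Python 3)."""
    texts = []
    for elem in root.iterfind(path):
        text = elem.text or u''  # empty elements have text None
        if PY2 and isinstance(text, str):
            text = text.decode('utf-8')
        texts.append(text)
    return texts


track = ET.fromstring(
    b'<dict><key>name</key><string>don\xe2\x80\x99t remember</string>'
    b'<key>artist</key><string>adele</string></dict>'
)
print(element_texts(track, 'string'))
```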