urllib2 - Python: AssertionError: proxies must be a mapping
I get this error:

```
Traceback (most recent call last):
  File "script.py", line 7, in <module>
    proxy = urllib2.ProxyHandler(line)
  File "/usr/lib/python2.7/urllib2.py", line 713, in __init__
    assert hasattr(proxies, 'has_key'), "proxies must be a mapping"
AssertionError: proxies must be a mapping
```
when I run the following script:

```python
import urllib2

u = open('urls.txt')
p = open('proxies.txt')
for line in p:
    proxy = urllib2.ProxyHandler(line)
    opener = urllib2.build_opener(proxy)
    urllib2.install_opener(opener)
    for url in u:
        urllib2.urlopen(url).read()
u.close()
p.close()
```
My urls.txt file has this:

```
'www.google.com'
'www.facebook.com'
'www.reddit.com'
```
and proxies.txt has this:

```
{'https': 'https://94.142.27.4:3128'}
{'http': 'http://118.97.95.174:8080'}
{'http':'http://66.62.236.15:8080'}
```

I found them at hidemyass.com.
From the googling I have done, it seems that people who have had this problem had their proxies formatted wrong. Is that the case here?
As the documentation says:

> If proxies is given, it must be a dictionary mapping protocol names to URLs of proxies.
But in your code, it's a string. In particular, it's one line out of your proxies.txt file:

```python
p = open('proxies.txt')
for line in p:
    proxy = urllib2.ProxyHandler(line)
```
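To see the type mismatch concretely, here is a minimal sketch (independent of urllib2): every line read from a file is a plain string, not the mapping that ProxyHandler's assertion demands.

```python
# One line as read from proxies.txt, newline and all
line = "{'http': 'http://118.97.95.174:8080'}\n"

# What the loop actually passes to ProxyHandler: a string...
assert isinstance(line, str)
# ...which is not a mapping, so the "proxies must be a mapping"
# assertion inside ProxyHandler.__init__ fails.
assert not isinstance(line, dict)
```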
Looking at that file, it looks like the lines were intended to be the repr of a Python dictionary. And, given that all of the keys and values are string literals, that means you could use ast.literal_eval on them to recover the original dicts:

```python
import ast

p = open('proxies.txt')
for line in p:
    d = ast.literal_eval(line)
    proxy = urllib2.ProxyHandler(d)
```
Of course that won't work for your sample data, because one of the lines is missing a ' character. But if you fix that, it will…
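For example, a minimal sketch using the addresses from the question: literal_eval turns a well-formed line back into a dict, and refuses a line with an unterminated string literal instead of silently producing garbage.

```python
import ast

# A well-formed line from proxies.txt round-trips back to a dict
line = "{'https': 'https://94.142.27.4:3128'}\n"
d = ast.literal_eval(line)
assert d == {'https': 'https://94.142.27.4:3128'}

# A line with a missing closing quote raises SyntaxError
bad = "{'http': 'http://66.62.236.15:8080}\n"
try:
    ast.literal_eval(bad)
    raised = False
except SyntaxError:
    raised = True
assert raised
```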
However, it would be better to use a format that's actually intended for data interchange. For example, JSON is as human-readable as what you've got, and not all that different:

```
{"https": "https://94.142.27.4:3128"}
{"http": "http://118.97.95.174:8080"}
{"http": "http://66.62.236.15:8080"}
```
The advantage of using JSON is that there are plenty of tools to validate, edit, etc. JSON, and none for your custom format; the rules for what is and isn't valid are obvious, rather than something you have to guess at; and the error messages for invalid data are much more helpful (like "Expecting property name at line 1 column 10 (char 10)" as opposed to "unexpected EOF while parsing").
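A quick sketch of that difference: json.loads parses a valid line directly into a dict, and rejects a malformed one with an error that points at the exact position (the missing-colon line below is just for illustration).

```python
import json

# A valid proxies.txt line parses straight into a dict
assert json.loads('{"http": "http://118.97.95.174:8080"}') == \
    {'http': 'http://118.97.95.174:8080'}

# A malformed line (the colon is missing) is rejected with a located error
try:
    json.loads('{"http" "http://118.97.95.174:8080"}')
    raised = False
except ValueError as e:  # json's decode error subclasses ValueError
    raised = True        # str(e) names the line/column of the problem
assert raised
```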
Note that once you solve this problem, you're going to run into the same problem with the URLs. After all, 'www.google.com'\n is not what you want; it's www.google.com. So you're going to have to strip off the newline and the quotes. Again, you could use ast.literal_eval here. Or you could use JSON as the interchange format.

But really, if you're only trying to store one string per line, why not just store the strings as-is, instead of storing the string representation of those strings (quotes and all)?
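A sketch of both cleanups on a line like the ones in urls.txt:

```python
import ast

raw = "'www.google.com'\n"

# rstrip drops the newline; literal_eval then drops the quotes
host = ast.literal_eval(raw.rstrip())
assert host == 'www.google.com'

# If the file stored plain strings instead, rstrip alone would do
plain = 'www.google.com\n'
assert plain.rstrip() == 'www.google.com'
```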
There are still more problems beyond that.

Even after you get rid of the excess quotes, www.google.com isn't a URL, it's a hostname. http://www.google.com is what you want here. Unless you want https://www.google.com, or some other scheme.
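One way to patch that up (ensure_scheme is a hypothetical helper for illustration, not anything from urllib2) is to prepend a default scheme to bare hostnames:

```python
def ensure_scheme(line, default='http'):
    """Prepend a scheme if the line is a bare hostname (hypothetical helper)."""
    return line if '://' in line else '%s://%s' % (default, line)

assert ensure_scheme('www.google.com') == 'http://www.google.com'
assert ensure_scheme('https://www.google.com') == 'https://www.google.com'
```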
Also, you're trying to loop through 'urls.txt' once for each proxy. That's going to process all of the URLs with just the first proxy installed, then the remainder (which is nothing, since you already did all of them) with the first two installed, then the remainder (which is still nothing) with all three installed. Move the url loop outside of the proxy loop.
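The reason the later passes see nothing is that an open file is a one-shot iterator. A sketch, with io.StringIO standing in for urls.txt:

```python
import io

# Stand-in for the open urls.txt file object
u = io.StringIO(u'www.google.com\nwww.facebook.com\nwww.reddit.com\n')

first_pass = [line for line in u]   # first proxy: sees all three lines
second_pass = [line for line in u]  # second proxy: the file is exhausted

assert len(first_pass) == 3
assert second_pass == []
```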
Finally, while these aren't exactly problems, while we're at it… Using a with statement makes it much easier to write robust code than using manual close calls, and it makes your code shorter and more readable to boot. Also, it's usually better to wait until you need a file before you try to open it. And variable names like u and p are going to cause more confusion in the long run than they'll save in typing in the short run.

Oh, and calling urllib2.urlopen(url).read() and not doing anything with the result won't have any effect except to waste a few seconds and a bit of network bandwidth, but I assume you already knew that and just left out the details for the sake of simplicity.
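A sketch of the with behavior: the file is closed as soon as the block exits, with no explicit close call (the temp file here is just so the example is self-contained).

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, 'w') as f:
        f.write('www.google.com\n')
    # No f.close() needed: the with statement already closed it,
    # even if the body had raised an exception.
    assert f.closed
finally:
    os.remove(path)
```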
Putting it all together, and assuming you fix your two files as described above:

```python
import json
import urllib2

with open('proxies.txt') as proxies:
    for line in proxies:
        proxy = json.loads(line)
        proxy_handler = urllib2.ProxyHandler(proxy)
        opener = urllib2.build_opener(proxy_handler)
        urllib2.install_opener(opener)

with open('urls.txt') as urls:
    for line in urls:
        url = line.rstrip()
        data = urllib2.urlopen(url).read()
        # do something with data
As it turns out, you actually want to try all of the URLs through each proxy, not try all of them through all the proxies, or through the first and then the first two and so on. You could get that by indenting the second with and for under the first for. But it's simpler to just read the URLs all at once (and probably more efficient, although I doubt that matters here):

```python
import json
import urllib2

with open('urls.txt') as f:
    urls = [line.rstrip() for line in f]

with open('proxies.txt') as proxies:
    for line in proxies:
        proxy = json.loads(line)
        proxy_handler = urllib2.ProxyHandler(proxy)
        opener = urllib2.build_opener(proxy_handler)
        urllib2.install_opener(opener)
        for url in urls:
            data = urllib2.urlopen(url).read()
            # do something with data
```
Of course that means reading the whole list of URLs before doing any work. I doubt that will matter, but if it does, you can use the itertools.tee trick to avoid it.
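For reference, the tee trick here means itertools.tee: splitting one iterator into several independent copies, one per proxy. A sketch with a list iterator standing in for the file (note that tee buffers items internally, so if each copy is consumed in full you don't actually save memory; the win is avoiding an up-front read):

```python
import itertools

# Stand-in for an open urls.txt file iterator
lines = iter(['www.google.com\n', 'www.facebook.com\n'])
per_proxy = itertools.tee(lines, 3)  # one independent iterator per proxy

results = [[line.rstrip() for line in it] for it in per_proxy]
assert results == [['www.google.com', 'www.facebook.com']] * 3
```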