urllib2 - Python: AssertionError: proxies must be a mapping


I get this error:

Traceback (most recent call last):
  File "script.py", line 7, in <module>
    proxy = urllib2.ProxyHandler(line)
  File "/usr/lib/python2.7/urllib2.py", line 713, in __init__
    assert hasattr(proxies, 'has_key'), "proxies must be a mapping"
AssertionError: proxies must be a mapping

when I run the following script:

import urllib2

u=open('urls.txt')
p=open('proxies.txt')
for line in p:
    proxy = urllib2.ProxyHandler(line)
    opener = urllib2.build_opener(proxy)
    urllib2.install_opener(opener)
    for url in u:
        urllib.urlopen(url).read()

u.close()
p.close()

My urls.txt file has this:

'www.google.com'
'www.facebook.com'
'www.reddit.com'

And proxies.txt has this:

{'https': 'https://94.142.27.4:3128'}
{'http': 'http://118.97.95.174:8080'}
{'http':'http://66.62.236.15:8080'}

I found them at hidemyass.com.

From the googling I have done, most people who have had this problem have their proxies formatted wrong. Is that the case here?

As the documentation says:

If proxies is given, it must be a dictionary mapping protocol names to URLs of proxies.

But in your code, it's just a string. In particular, it's one line out of your proxies.txt file:

p=open('proxies.txt')
for line in p:
    proxy = urllib2.ProxyHandler(line)

Looking at the file, it looks like the lines are intended to be something like the repr of a Python dictionary. And, given that all of the keys and values are string literals, that means you could use ast.literal_eval on each line to recover the original dicts:

import ast

p=open('proxies.txt')
for line in p:
    # literal_eval turns the text of a dict literal back into a real dict
    d = ast.literal_eval(line)
    proxy = urllib2.ProxyHandler(d)

Of course that won't work for your sample data, because one of the lines is missing a ' character. But if you fix that, it will…
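To make the string-versus-dict difference concrete, here is a minimal sketch of parsing one line with ast.literal_eval. The helper name parse_proxy_line is made up for illustration (it is not part of urllib2), and the sample line is taken from the proxies.txt in the question.

```python
import ast


def parse_proxy_line(line):
    """Turn one line such as "{'http': 'http://host:port'}" into a dict.

    parse_proxy_line is a hypothetical helper, not a urllib2 function.
    """
    try:
        value = ast.literal_eval(line.strip())
    except (ValueError, SyntaxError) as exc:
        # A missing quote or brace in the file ends up here.
        raise ValueError("bad proxy line %r: %s" % (line, exc))
    if not isinstance(value, dict):
        # For example, a bare quoted string is a valid literal but not a mapping.
        raise ValueError("proxy line is not a dict: %r" % (line,))
    return value


proxy = parse_proxy_line("{'http': 'http://118.97.95.174:8080'}\n")
```

The result is a real dict, which is what ProxyHandler expects. A line with an unbalanced quote raises ValueError that names the offending line, instead of failing later inside urllib2.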

However, it would be better to use a format that's actually intended for data interchange. For example, JSON is just as human-readable as what you've got, and not all that different:

{"https": "https://94.142.27.4:3128"}
{"http": "http://118.97.95.174:8080"}
{"http": "http://66.62.236.15:8080"}

The advantage of using JSON is that there are plenty of tools to validate, edit, etc. JSON, and none for your custom format; the rules for what is and isn't valid are obvious, rather than something you have to guess at; and the error messages for invalid data will likely be more helpful (like "Expecting property name at line 1 column 10 (char 10)" as opposed to "unexpected EOF while parsing").
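As a rough sketch of the JSON route, assuming one JSON object per line as in the sample above: load_proxy is a hypothetical helper name, and the only library call is json.loads.

```python
import json


def load_proxy(line):
    """Parse one JSON object per line; load_proxy is a hypothetical helper."""
    try:
        return json.loads(line)
    except ValueError as exc:
        # json reports invalid input as ValueError (JSONDecodeError on
        # Python 3 is a subclass) and includes the position of the problem.
        raise ValueError("invalid JSON in proxy line %r: %s" % (line, exc))


good = load_proxy('{"http": "http://118.97.95.174:8080"}\n')
```

Note that the single-quoted lines from the original proxies.txt are not valid JSON, so they would be rejected here until the file is converted to double quotes.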


Note that once you solve this problem, you're going to run into another one with the URLs. After all, 'www.google.com'\n is not what you want, it's www.google.com. So you're going to have to strip off the newline and the quotes. Again, you could use ast.literal_eval here. Or you could use JSON as an interchange format.

But really, if you're just trying to store one string per line, why not just store the strings as-is, instead of trying to store a string representation of those strings (with the quotes on)?


There are still more problems beyond that.

Even after you get rid of the excess quotes, www.google.com isn't a URL, it's just a hostname. http://www.google.com is what you want here. Unless you want https://www.google.com, or some other scheme.
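If you would rather clean the lines up in code than edit urls.txt, something like the following sketch could work. The function name normalize_url and the choice of http as the default scheme are assumptions for illustration only.

```python
def normalize_url(line, default_scheme="http"):
    """Strip the newline and stray quotes, and add a scheme if one is missing.

    normalize_url is a hypothetical helper; default_scheme is an assumption.
    """
    text = line.strip().strip("'\"")
    if "://" not in text:
        # A bare hostname such as www.google.com is not a URL yet.
        text = "%s://%s" % (default_scheme, text)
    return text


url = normalize_url("'www.google.com'\n")
```

Lines that already carry a scheme, such as https://www.reddit.com, pass through unchanged apart from the stripped whitespace.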

You're trying to loop through 'urls.txt' once for each proxy. That's going to process all of the URLs with just the first proxy installed, and then the remainder (which is nothing, since you already did all of them) with the first two installed, and then the remainder (which is still nothing) with all three installed. Move the url loop outside of the proxy loop.
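The exhaustion is easy to reproduce without any network access. This sketch uses io.StringIO objects as stand-ins for the two open files; the names pair_up and sample_files and the placeholder contents (p1, u1, and so on) are made up for the demonstration.

```python
import io


def pair_up(proxies, urls):
    """Collect every (proxy, url) pair the nested loop actually visits."""
    pairs = []
    for proxy in proxies:
        for url in urls:
            pairs.append((proxy.strip(), url.strip()))
    return pairs


def sample_files():
    # Stand-ins for open('proxies.txt') and open('urls.txt').
    return io.StringIO(u"p1\np2\np3\n"), io.StringIO(u"u1\nu2\n")


proxies, urls = sample_files()
exhausted = pair_up(proxies, urls)       # inner file is used up on the first pass

proxies, urls = sample_files()
complete = pair_up(proxies, list(urls))  # a list can be looped over repeatedly
```

With the file-like object as the inner iterable, only the first proxy is ever paired with any URL; with a list, every proxy sees every URL.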

Finally, these aren't really problems, but while we're at it… Using a with statement makes it much easier to write robust code than using manual close calls, and it makes your code shorter and more readable to boot. Also, it's usually better to wait until you need a file before you try to open it. And variable names like u and p are just going to cause more confusion in the long run than they'll save typing in the short run.

Oh, and just calling urllib.urlopen(url).read() and not doing anything with the result won't have any effect except to waste a few seconds and a bit of network bandwidth, but I assume you already knew that, and just left out the details for the sake of simplicity.

Putting it all together, and assuming you fix the two files as described above:

import json
import urllib2

with open('proxies.txt') as proxies:
    for line in proxies:
        proxy = json.loads(line)
        proxy_handler = urllib2.ProxyHandler(proxy)
        opener = urllib2.build_opener(proxy_handler)
        urllib2.install_opener(opener)
with open('urls.txt') as urls:
    for line in urls:
        url = line.rstrip()
        # urllib2.urlopen goes through the opener installed above
        data = urllib2.urlopen(url).read()
        # do something with data

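As it turns out, you want to try all of the URLs through each proxy, not try all of them through all the proxies, or through the first and then the first two and so on.

You could do this by indenting the second with and for under the first for. But it's probably simpler to just read them all at once (and probably more efficient, although I doubt that matters):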

import json
import urllib2

with open('urls.txt') as f:
    urls = [line.rstrip() for line in f]
with open('proxies.txt') as proxies:
    for line in proxies:
        proxy = json.loads(line)
        proxy_handler = urllib2.ProxyHandler(proxy)
        opener = urllib2.build_opener(proxy_handler)
        urllib2.install_opener(opener)
        for url in urls:
            # urllib2.urlopen goes through the opener installed above
            data = urllib2.urlopen(url).read()
            # do something with data

Of course this means reading the whole list of URLs before doing any work. I doubt that will matter, but if it does, you can use the tee trick to avoid it.
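One possible variation, sketched under the assumption that you don't need a process-wide default opener: the object returned by build_opener has its own open method, so you can keep one opener per proxy instead of calling install_opener each time. The helper name build_openers is hypothetical, and the try/except import is only there so the sketch also runs on Python 3, where the same classes live in urllib.request.

```python
import json

try:
    import urllib2  # Python 2, as in the question
except ImportError:
    import urllib.request as urllib2  # Python 3 home of the same classes


def build_openers(proxy_lines):
    """Return one opener per JSON proxy line (hypothetical helper)."""
    openers = []
    for line in proxy_lines:
        proxy = json.loads(line)
        openers.append(urllib2.build_opener(urllib2.ProxyHandler(proxy)))
    return openers


openers = build_openers([
    '{"https": "https://94.142.27.4:3128"}\n',
    '{"http": "http://118.97.95.174:8080"}\n',
])
# Each opener would then be used as opener.open(url).read();
# no request is made in this sketch.
```

This keeps the proxy choice explicit at each call site, at the cost of passing the opener around rather than relying on urllib2.urlopen.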

