BadStatusLine: <html> | Python with http.client - Works for some sites but not others
    import http.client
    import csv

    def http_get(url, path, headers):
        try:
            conn = http.client.HTTPConnection(url)
            print ('connecting ' + url)
            conn.request(url, path, headers=headers)
            resp = conn.getresponse()
            if resp.status<=400:
                body = resp.read()
                print ('reading source...')
        except Exception as e:
            raise Exception('connection error: %s' % e)
            pass
        finally:
            conn.close()
            print ('connection closed')

        if resp.status >= 400:
            print (url)
            raise ValueError('response error: %s, %s, url: %s' % (resp.status, resp.reason,url))
        return body

    with open('domains.csv','r') as csvfile:
        urls = [row[0] for row in csv.reader(csvfile)]

    l = ['version 0.7','version 1.2','version 1.5','version 2.0','version 2.1','version 2.3','version 2.5','version 2.6','version 2.7','version 2.8','version 2.9','version 2.9','version 3.0','version 3.1','version 3.2','version 3.3','version 3.4','version 3.5.1','version 3.5.2']
    path = '/'
    user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
    headers = {'User-Agent': user_agent}

    for url in urls:
        host = url
        print ('testing wordpress installation on ' + url)
        http_get(host,path,headers)

I've been looking at this for a week or two, and I've found similar errors around, but I don't understand why it works for some sites in the CSV file and not others. I checked the server and saw that it was dropping ICMP packets by default, so I changed that, and traceroute and ping now both show 100% received as opposed to the previous 100% lost. I figured it was related, since the sites on that host all had the same issue. But the script is still throwing the exception:
    mud@alex-bbvm:~/desktop/scripts$ python3 httptest.py
    testing wordpress installation on xxxxx.ie
    connecting exsite.ie
    reading source...
    connection closed
    testing wordpress installation on aaaaaa.com
    connecting aaaaa.com
    reading source...
    connection closed
    testing wordpress installation on yyyyy.ie
    connecting yyyyy.ie
    reading source...
    connection closed
    testing wordpress installation on ccccc.ie
    connecting cccccc.ie
    reading source...
    connection closed
    testing wordpress installation on ddddddd.ie
    connecting ddddddd.ie
    connection closed
    Traceback (most recent call last):
      File "httptest.py", line 9, in http_get
        resp = conn.getresponse()
      File "/usr/lib/python3.2/http/client.py", line 1049, in getresponse
        response.begin()
      File "/usr/lib/python3.2/http/client.py", line 346, in begin
        version, status, reason = self._read_status()
      File "/usr/lib/python3.2/http/client.py", line 328, in _read_status
        raise BadStatusLine(line)
    http.client.BadStatusLine: <html>

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "httptest.py", line 38, in <module>
        http_get(host,path,headers)
      File "httptest.py", line 14, in http_get
        raise Exception('connection error: %s' % e)
    Exception: connection error: <html>

I've replaced the URLs with placeholders, as they are client addresses and I'd rather not post them here.
Anyway, any insights or help would be appreciated.
I've read the documentation for http.client and its relevant exceptions, but I can't seem to extract a solution from what I gleaned there.
Thanks!
First off, I suggest you read the HTTPResponse object before calling conn.close(). Even 404 responses contain a document.
I'm rather confused by your tracebacks; as far as I can see, http.client.BadStatusLine should have been hidden by your except Exception.
Typically, an except Exception clause isn't a good idea unless you re-raise the same exception (you do not), as it may be masking underlying problems. In any case, it's the first thing that should go when your code isn't working as expected.
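The two suggestions above can be sketched roughly as follows. This is a minimal illustration, not a drop-in fix: the `timeout` parameter and the tuple return value are assumptions added for the example. Note also that `HTTPConnection.request` takes the HTTP method as its first argument, so the sketch passes "GET" there; the question's code passes the host name in that position, which may be one reason some servers reply with something other than an HTTP status line.

```python
import http.client


def http_get(host, path, headers, timeout=10):
    """Fetch path from host and return (status, reason, body).

    The body is read before the connection is closed, and no broad
    `except Exception` is used, so http.client.HTTPException or OSError
    reach the caller with their original type and traceback.
    """
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        # First argument is the HTTP method, second is the path.
        conn.request("GET", path, headers=headers)
        resp = conn.getresponse()
        # Read the document even for 4xx/5xx: error pages have a body too.
        body = resp.read()
        return resp.status, resp.reason, body
    finally:
        # Runs on success and on failure alike.
        conn.close()
```

The caller can then decide what to do with a status of 400 or above, and can catch `http.client.BadStatusLine` specifically if it wants to skip misbehaving hosts.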
Additionally, the output you've provided doesn't seem to match the code you've provided.
Specifically, according to the traceback:

    connection closed
    Traceback (most recent call last):
      File "httptest.py", line 9, in http_get
        resp = conn.getresponse()

the code has print ('connecting ' + url) shortly before that call:

    print ('connecting ' + url)
    conn.request(url, path, headers=headers)
    resp = conn.getresponse()

but the line preceding the traceback in the output is connection closed.
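Update
Ignoring the confusing execution order of try / finally (the finally block runs before the exception propagates to the caller, so connection closed is printed ahead of the traceback).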
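The execution order in question can be reproduced in isolation. The sketch below is a stand-in for the question's http_get, with a made-up ValueError in place of the network failure; it records the order in which the blocks run and shows that Python 3 keeps the original exception attached to the new one, which is why both tracebacks are printed.

```python
def failing_call(log):
    """Mimic the try / except / finally layout used in the question."""
    try:
        log.append("try")
        raise ValueError("<html>")  # stand-in for BadStatusLine
    except Exception as e:
        log.append("except")
        raise Exception("connection error: %s" % e)
    finally:
        # Runs before the new exception leaves the function.
        log.append("finally")


def demo_order():
    """Return the recorded block order and the exception the caller saw."""
    log = []
    caught = None
    try:
        failing_call(log)
    except Exception as exc:
        caught = exc
        log.append("caller")
    return log, caught


if __name__ == "__main__":
    order, caught = demo_order()
    print(order)
    # The original exception is chained as __context__, so the interpreter
    # prints it first, followed by "During handling of the above exception,
    # another exception occurred:".
    print(type(caught.__context__).__name__)
```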
http.client.BadStatusLine is raised when the first line of the response is not a valid HTTP status line such as HTTP/1.1 200 OK. In this particular case, that line was <html> instead.
Either the server is returning a document without an HTTP header, or it's unexpected behaviour from your code.
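The first of those two possibilities is easy to reproduce locally. The following sketch starts a throwaway server on 127.0.0.1 that answers with bare HTML and no status line, then points http.client at it. The server and the function names are hypothetical, made up for the demonstration; only the http.client, socket and threading calls are real.

```python
import http.client
import socket
import threading


def serve_headerless_html(server_sock):
    """Accept one client and reply with bare HTML, no HTTP status line."""
    client, _ = server_sock.accept()
    with client:
        client.recv(65536)  # read and discard the request
        client.sendall(b"<html>\r\n<body>no headers here</body></html>\r\n")


def demo_bad_status_line():
    """Return the line that http.client refused to accept as a status line."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server_sock.listen(1)
    port = server_sock.getsockname()[1]
    worker = threading.Thread(
        target=serve_headerless_html, args=(server_sock,), daemon=True
    )
    worker.start()

    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", "/")
        conn.getresponse()
    except http.client.BadStatusLine as exc:
        return exc.line  # the offending first line of the reply
    finally:
        conn.close()
        worker.join(timeout=5)
        server_sock.close()
    return None


if __name__ == "__main__":
    print(repr(demo_bad_status_line()))
```

Catching `http.client.BadStatusLine` by name, as done here, keeps the offending line available for logging instead of folding it into a generic exception.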
I repeat what I've said: read the HTTPResponse object.
A packet capture would confirm what's going on on the wire with the server.
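Short of a full packet capture, a rough way to see what a server sends first is to bypass http.client and read the raw bytes from a plain socket. The helper below is a sketch under that assumption: its name, parameters and the exact request it sends are invented for illustration, and it is not a substitute for capturing the traffic the original script generates.

```python
import socket


def peek_raw_response(host, port=80, path="/", max_bytes=200, timeout=5):
    """Send a minimal GET request and return the first raw bytes of the reply."""
    request = (
        "GET %s HTTP/1.1\r\n"
        "Host: %s\r\n"
        "Connection: close\r\n"
        "\r\n" % (path, host)
    ).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        # A well-behaved server starts its reply with b"HTTP/1.x <status> ...".
        return sock.recv(max_bytes)
```

Calling it with one of the failing host names and printing the result shows whether the reply begins with an HTTP status line or goes straight to `<html>`.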