BadStatusLine: <html> | Python with http.client - Works for some sites but not others


    import http.client
    import csv

    def http_get(url, path, headers):
        try:
            conn = http.client.HTTPConnection(url)
            print ('connecting ' + url)
            conn.request(url, path, headers=headers)
            resp = conn.getresponse()
            if resp.status<=400:
                body = resp.read()
                print ('reading source...')
        except Exception as e:
            raise Exception('connection error: %s' % e)
            pass
        finally:
            conn.close()
            print ('connection closed')

        if resp.status >= 400:
            print (url)
            raise ValueError('response error: %s, %s, url: %s' % (resp.status, resp.reason,url))
        return body

    with open('domains.csv','r') as csvfile:
        urls = [row[0] for row in csv.reader(csvfile)]

    l = ['version 0.7','version 1.2','version 1.5','version 2.0','version 2.1','version 2.3','version 2.5','version 2.6','version 2.7','version 2.8','version 2.9','version 2.9','version 3.0','version 3.1','version 3.2','version 3.3','version 3.4','version 3.5.1','version 3.5.2']
    path = '/'
    user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
    headers = {'user-agent': user_agent}

    for url in urls:
        host = url
        print ('testing wordpress installation on ' + url)
        http_get(host,path,headers)

I've been looking at this for a week or two, and I've found similar errors around, but I don't understand why it works for some sites in the CSV file and not others. I checked the server and saw it was dropping ICMP packets by default, so I changed that, and now traceroute and ping both show 100% received as opposed to the previous 100% lost. I figured it was related, since the sites on that host had the same issue. The script is still throwing the exception, though:

    mud@alex-bbvm:~/desktop/scripts$ python3 httptest.py
    testing wordpress installation on xxxxx.ie
    connecting exsite.ie
    reading source...
    connection closed
    testing wordpress installation on aaaaaa.com
    connecting aaaaa.com
    reading source...
    connection closed
    testing wordpress installation on yyyyy.ie
    connecting yyyyy.ie
    reading source...
    connection closed
    testing wordpress installation on ccccc.ie
    connecting cccccc.ie
    reading source...
    connection closed
    testing wordpress installation on ddddddd.ie
    connecting ddddddd.ie
    connection closed
    Traceback (most recent call last):
      File "httptest.py", line 9, in http_get
        resp = conn.getresponse()
      File "/usr/lib/python3.2/http/client.py", line 1049, in getresponse
        response.begin()
      File "/usr/lib/python3.2/http/client.py", line 346, in begin
        version, status, reason = self._read_status()
      File "/usr/lib/python3.2/http/client.py", line 328, in _read_status
        raise BadStatusLine(line)
    http.client.BadStatusLine: <html>

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "httptest.py", line 38, in <module>
        http_get(host,path,headers)
      File "httptest.py", line 14, in http_get
        raise Exception('connection error: %s' % e)
    Exception: connection error: <html>

I've replaced the URLs with placeholders, as they are client addresses and I'd rather not post them here.

Anyway, any insights or help would be appreciated.

I've read the documentation on http.client and its relevant exceptions, but I can't seem to extract a solution from what I've gleaned from that.

Thanks!

First off, I suggest you read the HTTPResponse object before calling conn.close(). Even 404 responses contain a document.

I'm rather confused by the tracebacks; as far as I can see, http.client.BadStatusLine should have been hidden by the except Exception clause.

Typically an except Exception clause isn't a good idea unless you re-raise the same exception (you do not), because you may be masking underlying problems. In any case, it's the first thing that should go when code isn't working as expected.

Additionally, the output you've provided doesn't seem to match the code you've provided.

Specifically, according to the traceback:
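The two suggestions above can be combined in a rough sketch like the following. It is not the asker's code, only one way it could be restructured: the body is read before the connection is closed, and there is no broad except clause, so http.client.BadStatusLine and socket errors reach the caller unchanged. The port and timeout parameters and the tuple return value are assumptions added for illustration. Note also that HTTPConnection.request() takes the HTTP method as its first argument, so the sketch passes "GET" there.

```python
import http.client


def http_get(host, path, headers, port=80, timeout=10):
    """Fetch path from host and return (status, reason, body).

    port and timeout are illustrative additions, not part of the
    original function.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        # The first argument of request() is the HTTP method.
        conn.request("GET", path, headers=headers)
        resp = conn.getresponse()
        # Read the body while the connection is still open; error
        # responses such as 404 usually carry a document too.
        body = resp.read()
        return resp.status, resp.reason, body
    finally:
        # No broad except clause here: BadStatusLine and socket errors
        # propagate to the caller with their original type.
        conn.close()
```

The caller can then decide what to do with a status of 400 or above, instead of the function raising a generic exception that hides the original one.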

    connection closed
    Traceback (most recent call last):
      File "httptest.py", line 9, in http_get
        resp = conn.getresponse()

The code has print ('connecting ' + url) before:

    print ('connecting ' + url)
    conn.request(url, path, headers=headers)
    resp = conn.getresponse()

But the line preceding the traceback in the output is connection closed.


Update

Ignoring the confusing execution order of try / finally.

http.client.BadStatusLine is raised when the first line of the response is not a valid HTTP status line such as HTTP/1.1 200 OK. In this particular case, it is <html> instead.

Either the server is returning a document without an HTTP header, or it's unexpected behaviour in the code.

I repeat what I've said: read the HTTPResponse object.

A packet capture would confirm what's going on on the wire with the server.
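To see which hosts produce the bad status line and what it contains, the exception can be caught by its specific type rather than by a blanket except Exception. The sketch below is only illustrative: check_hosts and its fetch argument are hypothetical names, where fetch stands for whatever function performs the request for one host. The line attribute of BadStatusLine holds the text the server sent where the status line was expected.

```python
import http.client


def check_hosts(hosts, fetch):
    """Call fetch(host) for each host and record the outcome.

    check_hosts and fetch are hypothetical names for this sketch;
    fetch is any callable that performs the request for one host.
    """
    results = {}
    for host in hosts:
        try:
            results[host] = fetch(host)
        except http.client.BadStatusLine as exc:
            # exc.line is what the server sent in place of a status line.
            results[host] = "bad status line: %r" % exc.line
    return results
```

Because only BadStatusLine is caught, any other failure still surfaces with its own traceback, and the loop carries on to the next host after a bad status line.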
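Short of a full packet capture, a raw socket can show the first line the server actually sends. The following is a minimal sketch under some assumptions: the function names, the default port 80, and the bare GET request with only a Host header are illustrative choices, and a server may respond differently to this request than to the one http.client sends.

```python
import socket


def first_response_line(host, port=80, path="/", timeout=10):
    """Send a bare GET and return the first line the server sends back."""
    request = (
        "GET {} HTTP/1.1\r\n"
        "Host: {}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).format(path, host).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        data = b""
        # Keep reading until the first newline arrives or the server closes.
        while b"\n" not in data:
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    return data.split(b"\n", 1)[0].rstrip(b"\r")


def looks_like_status_line(line):
    """True if the line starts the way an HTTP status line should."""
    return line.startswith(b"HTTP/")
```

If first_response_line() returns something like b"<html>" for the failing host, the server is sending a document with no status line or headers, which is exactly what makes http.client raise BadStatusLine.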

