python - Quick way to extract CSS style attributes from HTML elements

January 15, 2011

For machine learning purposes, I have an HTML page as input and need to extract the style attributes of its DOM elements. Here is my preliminary code:

    import time

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    start = time.time()
    driver = webdriver.PhantomJS()
    driver.get('example page')
    # Select leaf nodes (elements with no child elements).
    elements = driver.find_elements(By.XPATH, "//*[not(child::*)]")
    l = {}
    css_properties = ("line-height", "text-align", "font-size", "font-style")

    for i in elements:
        if i.text:
            # print time.time() - end_dl
            if i.text not in l:
                l[i.text] = {}
            # Each value_of_css_property call is a separate request to the browser.
            for el in css_properties:
                l[i.text][el] = str(i.value_of_css_property(el))
            l[i.text]["text_length"] = len(i.text)

The problem is that this code takes a long time to extract the features (~8 s). Can anyone think of a faster way to do this?

Are you sure it's the parsing step that's taking so long?
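One way to check is to time each stage separately before optimizing anything. The following is a minimal sketch using only the standard library; the `timed` helper and the stage labels are hypothetical names, and the stage bodies are stand-ins for the real `driver.get(...)` call and the extraction loop.

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock seconds per stage label.
timings = {}


@contextmanager
def timed(label):
    """Add the time spent inside the with-block to timings[label]."""
    begin = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - begin
        timings[label] = timings.get(label, 0.0) + elapsed


# Stand-in workloads; replace them with the page load and the
# style-extraction loop from the question.
with timed("load"):
    time.sleep(0.01)

with timed("extract"):
    total = sum(range(1000))

for label, seconds in timings.items():
    print(f"{label}: {seconds:.4f}s")
```

If most of the time lands in the page load rather than in the loop, a faster parser will not help much.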

If so, here are a few options:

Try BeautifulSoup4 for parsing the DOM.
Deploy on a cloud server with faster hardware. Use Amazon EC2 or DigitalOcean, both of which charge by the hour.
Deploy on a distributed system.
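To illustrate the first option, here is a rough sketch of extracting styles by parsing the HTML directly instead of querying a browser. It uses the standard library's `html.parser` as a dependency-free stand-in for BeautifulSoup4, where the same attribute would be read with `tag.get("style")`. One important assumption: a plain HTML parser only sees inline `style` attributes, not the computed styles from stylesheets that `value_of_css_property` returns, so this only helps if the pages carry their styling inline. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

CSS_PROPERTIES = ("line-height", "text-align", "font-size", "font-style")
# Elements that never have a closing tag, so they must not be stacked.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def parse_inline_style(style):
    """Turn 'a: b; c: d' into {'a': 'b', 'c': 'd'}."""
    declarations = {}
    for part in style.split(";"):
        name, sep, value = part.partition(":")
        if sep:
            declarations[name.strip().lower()] = value.strip()
    return declarations


class InlineStyleExtractor(HTMLParser):
    """Map each text run to the inline styles of its enclosing element."""

    def __init__(self):
        super().__init__()
        self.stack = []     # inline-style dicts of the currently open elements
        self.features = {}  # text -> {property: value, "text_length": n}

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            style = dict(attrs).get("style") or ""
            self.stack.append(parse_inline_style(style))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or not self.stack:
            return
        style = self.stack[-1]
        # Properties without an inline declaration are recorded as "".
        entry = {prop: style.get(prop, "") for prop in CSS_PROPERTIES}
        entry["text_length"] = len(text)
        self.features[text] = entry


html = ('<div><p style="font-size: 12px; text-align: center">Hello</p>'
        '<span>World</span></div>')
extractor = InlineStyleExtractor()
extractor.feed(html)
extractor.close()
print(extractor.features["Hello"])
```

Because everything happens in-process, there is no per-property round trip to a browser, which is where the Selenium version is likely to spend much of its time.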
