python - Quick way to extract CSS style attributes from HTML elements


For machine learning purposes, I have an HTML page as input and want to extract the style attributes of its DOM elements. Here is my preliminary code:

    import time

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    start = time.time()
    driver = webdriver.PhantomJS()
    driver.get('example page')
    # select leaf nodes (elements without child elements)
    elements = driver.find_elements(By.XPATH, "//*[not(child::*)]")
    l = {}
    css_properties = ("line-height", "text-align", "font-size", "font-style")
    for i in elements:
        if i.text:
            # print time.time() - end_dl
            if i.text not in l:
                l[i.text] = {}
            for el in css_properties:
                l[i.text][el] = str(i.value_of_css_property(el))
                l[i.text]["text_length"] = len(i.text)

The problem is that the code takes a long time to parse the features (~8 s). Can anyone think of a faster way to do this?

Are you sure it's the parsing step that's taking so long?
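One way to check is to time each stage separately rather than the script as a whole. The following is a minimal sketch using only the standard library; the `timed` helper, the `timings` dict, and the stage names are hypothetical, and the two workloads are stand-ins for the page load and the CSS loop in the question.

```python
import time
from contextlib import contextmanager

timings = {}  # stage name -> accumulated elapsed seconds


@contextmanager
def timed(stage):
    """Add the wall-clock duration of the enclosed block to timings[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = timings.get(stage, 0.0) + (time.perf_counter() - start)


# Stand-in workloads (assumption): in the question's script these blocks
# would wrap driver.get(...), driver.find_elements(...) and the CSS loop.
with timed("page_load"):
    time.sleep(0.01)
with timed("css_extraction"):
    sum(range(1000))

# Print the slowest stage first.
for stage, seconds in sorted(timings.items(), key=lambda kv: -kv[1]):
    print(f"{stage}: {seconds:.3f}s")
```

If most of the time turns out to be in the page load rather than in the loop over elements, a faster parser will not help much.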

If so, here are a few options:

  1. Try BeautifulSoup4 for parsing the DOM.
  2. Deploy on a cloud server with faster hardware. You could use Amazon EC2 or DigitalOcean, which charge by the hour.
  3. Deploy on a distributed system.
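To illustrate the first option, here is a rough sketch of collecting the same features from static markup. Two caveats: it uses the standard library's `html.parser` as a stand-in for BeautifulSoup4 so that it runs without extra packages, and, like any pure HTML parser, it only sees inline `style` attributes. It does not compute styles from stylesheets the way `value_of_css_property` does in a browser. The names `parse_style` and `InlineStyleCollector` and the sample markup are hypothetical, and the sketch assumes reasonably well-formed HTML.

```python
from html.parser import HTMLParser

CSS_PROPERTIES = ("line-height", "text-align", "font-size", "font-style")
# Elements that never have a closing tag, so they must not go on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def parse_style(style):
    """Turn 'a: b; c: d' into {'a': 'b', 'c': 'd'}."""
    declarations = {}
    for part in style.split(";"):
        name, sep, value = part.partition(":")
        if sep:
            declarations[name.strip().lower()] = value.strip()
    return declarations


class InlineStyleCollector(HTMLParser):
    """Map each text snippet to the inline styles of its enclosing element."""

    def __init__(self):
        super().__init__()
        self.stack = []      # inline styles of the currently open elements
        self.features = {}   # text -> feature dict

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(parse_style(dict(attrs).get("style") or ""))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            style = self.stack[-1]
            row = {prop: style.get(prop, "") for prop in CSS_PROPERTIES}
            row["text_length"] = len(text)
            # Keep the first occurrence of each distinct text.
            self.features.setdefault(text, row)


html = ('<div><p style="font-size: 12px; text-align: center">Hello</p>'
        '<span>World</span></div>')
collector = InlineStyleCollector()
collector.feed(html)
print(collector.features["Hello"])
```

Whether this is an acceptable replacement depends on the pages: if the styles of interest come from external CSS rather than inline attributes, a browser-based approach is still needed to obtain computed values.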
