python - Quick way to extract CSS style attributes from HTML elements


For machine learning purposes, I have an HTML page as input and need to extract the style attributes of its DOM elements. Here is my preliminary code:

    import time

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    start = time.time()
    # PhantomJS was removed in Selenium 4, so this needs Selenium 3.x.
    driver = webdriver.PhantomJS()
    driver.get('example page')
    # select leaf nodes (elements without child elements)
    elements = driver.find_elements(By.XPATH, "//*[not(child::*)]")
    l = {}
    css_properties = ("line-height", "text-align", "font-size", "font-style")
    for i in elements:
        if i.text:
            if i.text not in l:
                l[i.text] = {}
            for el in css_properties:
                l[i.text][el] = str(i.value_of_css_property(el))
                l[i.text]["text_length"] = len(i.text)
    driver.quit()
    print(time.time() - start)  # total seconds

The problem is that this code takes a long time to parse the features (~8 s). Can anyone think of a faster way to do this?
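One likely cause, worth checking: in Selenium, every `.text` access and every `value_of_css_property` call on a WebElement is sent to the browser as a separate WebDriver command. The loop in the question re-reads `.text` several times per property, so the round trips multiply. The sketch below uses a hypothetical `CountingElement` stub (no browser involved; the class and helper names are made up for illustration) to count those calls for the question's access pattern versus reading `.text` once per element.

```python
CSS_PROPERTIES = ("line-height", "text-align", "font-size", "font-style")


class CountingElement:
    """Hypothetical stand-in for a Selenium WebElement.

    It counts how many calls would go to the browser; no WebDriver is used.
    """

    def __init__(self, text, styles):
        self._text = text
        self._styles = styles
        self.calls = 0

    @property
    def text(self):
        self.calls += 1  # in Selenium, each .text access is one WebDriver command
        return self._text

    def value_of_css_property(self, name):
        self.calls += 1  # likewise one command per property lookup
        return self._styles.get(name, "")


def extract_uncached(elements):
    """Mirrors the question's loop: .text is re-read at every use."""
    l = {}
    for i in elements:
        if i.text:
            if i.text not in l:
                l[i.text] = {}
            for el in CSS_PROPERTIES:
                l[i.text][el] = str(i.value_of_css_property(el))
                l[i.text]["text_length"] = len(i.text)
    return l


def extract_cached(elements):
    """Same result, but .text is read once per element."""
    features = {}
    for element in elements:
        text = element.text
        if not text:
            continue
        props = features.setdefault(text, {})
        for name in CSS_PROPERTIES:
            props[name] = str(element.value_of_css_property(name))
        props["text_length"] = len(text)
    return features


def make_elements(n):
    """Build n stub elements with distinct texts."""
    return [CountingElement(f"item {k}", {"font-size": "12px"}) for k in range(n)]


uncached = make_elements(100)
cached = make_elements(100)
uncached_result = extract_uncached(uncached)
cached_result = extract_cached(cached)
print("same result:", uncached_result == cached_result)
print("uncached calls:", sum(e.calls for e in uncached))  # uncached calls: 1900
print("cached calls:", sum(e.calls for e in cached))      # cached calls: 500
```

With four properties, the question's pattern costs 19 calls per element against 5 when the text is read once. How much wall-clock time that saves depends on the driver's per-command latency.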

Are you sure it's the parsing step that's taking so long?
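A simple way to check is to time the page load and the extraction loop separately. This is a minimal sketch using only the standard library; `timed` is a hypothetical helper, and the `time.sleep` and `sum` calls are stand-ins for `driver.get(...)` and the feature loop, which you would wrap instead.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, timings):
    """Record how long the wrapped block takes, in seconds, under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


timings = {}
with timed("load", timings):
    time.sleep(0.05)  # stand-in for driver.get(url)
with timed("extract", timings):
    sum(range(100_000))  # stand-in for the feature-extraction loop

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f} s")
```

`time.perf_counter` is used rather than `time.time` because it is a monotonic, high-resolution clock meant for measuring intervals.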

If so, here are a few options:

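  1. Try BeautifulSoup4 for parsing the DOM. Keep in mind that it only sees the raw HTML, so it can read inline style attributes but not the computed styles that value_of_css_property returns.
  2. Deploy on a cloud server with faster hardware. You can use Amazon EC2 or DigitalOcean, which charge by the hour.
  3. Deploy on a distributed system.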
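Option 1 can be sketched without a browser. To stay dependency-free, the version below swaps in the standard library's `html.parser` instead of BeautifulSoup; `LeafStyleParser` and `parse_inline_style` are hypothetical names. The big caveat: a plain HTML parser only sees `style="..."` attributes written in the markup. It does not apply stylesheets, inheritance, or browser defaults, so it cannot reproduce the computed style that `value_of_css_property` returns; properties not set inline come back as an empty string here (an arbitrary choice).

```python
from html.parser import HTMLParser

# Same properties the question extracts.
CSS_PROPERTIES = ("line-height", "text-align", "font-size", "font-style")

# Elements without a closing tag; they are never pushed onto the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def parse_inline_style(style):
    """Turn 'font-size: 12px; text-align: center' into a dict."""
    result = {}
    for declaration in style.split(";"):
        name, sep, value = declaration.partition(":")
        if sep:
            result[name.strip().lower()] = value.strip()
    return result


class LeafStyleParser(HTMLParser):
    """Collects text and inline-style values of leaf elements (no child elements)."""

    def __init__(self):
        super().__init__()
        self.stack = []     # open elements, innermost last
        self.features = {}  # text -> {property: value, ..., "text_length": n}

    def handle_starttag(self, tag, attrs):
        if self.stack:
            self.stack[-1]["has_child"] = True  # the parent is not a leaf
        if tag in VOID_TAGS:
            return
        style = parse_inline_style(dict(attrs).get("style") or "")
        self.stack.append({"tag": tag, "style": style, "text": [], "has_child": False})

    def handle_data(self, data):
        if self.stack:
            self.stack[-1]["text"].append(data)

    def handle_endtag(self, tag):
        # Ignore end tags with no matching open element (e.g. from <br/>).
        if not any(element["tag"] == tag for element in self.stack):
            return
        # Close elements up to the matching one, recording each as we go.
        while self.stack:
            element = self.stack.pop()
            self._record(element)
            if element["tag"] == tag:
                break

    def _record(self, element):
        text = "".join(element["text"]).strip()
        if element["has_child"] or not text:
            return
        props = self.features.setdefault(text, {})
        for name in CSS_PROPERTIES:
            props[name] = element["style"].get(name, "")
        props["text_length"] = len(text)


html = """<div><p style="font-size: 14px; text-align: center">Hello</p>
<span style="font-style: italic">World</span><p>Has <b>child</b></p></div>"""

parser = LeafStyleParser()
parser.feed(html)
parser.close()
for text, props in parser.features.items():
    print(text, props)
```

This avoids per-element WebDriver round trips entirely, but whether inline styles carry enough signal for the model depends on how the target pages are styled.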
