Selenium

Selenium is a browser automation tool. It drives browsers such as Chrome (including Chrome's headless mode) and PhantomJS (headless by design, and no longer developed).

github: https://github.com/SeleniumHQ/selenium

docs: https://seleniumhq.github.io/docs/index.html

Using Headless Chrome with Selenium

via: https://stackoverflow.com/questions/46920243/how-to-configure-chromedriver-to-initiate-chrome-browser-in-headless-mode-throug

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless")  # Run Chrome in headless mode
options.add_argument("--no-sandbox")  # Bypass the OS security model (often needed when running as root, e.g. in Docker)
options.add_argument("--start-maximized")
options.add_argument("--disable-infobars")  # may have no effect in recent Chrome versions
options.add_argument("--disable-extensions")
driver = webdriver.Chrome(options=options)
print("Headless Chrome Initialized on Linux OS")
driver.quit()  # Shut the browser down when finished so no Chrome process is left behind
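Two details about the options above, stated with some hedging: Chrome 109 and later offer a newer headless implementation selected with --headless=new, and a headless window is not tied to any physical screen, so start-maximized may not produce a predictable viewport; --window-size sets it explicitly. A minimal sketch; the function names and the 1920x1080 default are arbitrary choices for this example, not part of Selenium.

```python
def headless_chrome_args(width=1920, height=1080):
    """Command-line switches for a headless Chrome session (values chosen for this sketch)."""
    return [
        "--headless=new",                   # newer headless mode (Chrome 109+)
        "--no-sandbox",                     # often required when Chrome runs as root
        "--disable-extensions",
        f"--window-size={width},{height}",  # give the headless window an explicit size
    ]


def make_headless_chrome(width=1920, height=1080):
    # Selenium is imported here so headless_chrome_args() can be used without it installed
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in headless_chrome_args(width, height):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

In use, wrap the session in try/finally so the browser is always closed: driver = make_headless_chrome(), then driver.get(...) inside the try block and driver.quit() in the finally block.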

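Some answers also recommend setting up chromedriver explicitly, which is probably what you want in production. (webdriver.Chrome() already controls Chrome through the chromedriver executable; the difference is whether you manage that binary's path and version yourself.)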
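In production it usually helps to pin which chromedriver binary is used. A minimal sketch, assuming the path comes from a CHROMEDRIVER_PATH environment variable (a name invented for this example) and otherwise falls back to whatever chromedriver is on PATH. Service(executable_path=...) is how Selenium 4 takes the driver path; the helper names are hypothetical.

```python
import os
import shutil


def resolve_chromedriver(env_var="CHROMEDRIVER_PATH"):
    """Return a chromedriver path: the env var if it points to a file, else PATH, else None.

    CHROMEDRIVER_PATH is a hypothetical variable name chosen for this sketch.
    """
    path = os.environ.get(env_var)
    if path and os.path.isfile(path):
        return path
    return shutil.which("chromedriver")  # None if chromedriver is not on PATH


def make_chrome(options=None):
    # Selenium is imported lazily so resolve_chromedriver() works without it installed
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    path = resolve_chromedriver()
    # Without an explicit path, recent Selenium releases try to locate a driver themselves
    service = Service(executable_path=path) if path else Service()
    return webdriver.Chrome(service=service, options=options)
```

Reading the path from the environment keeps the script unchanged across machines while still letting a deployment pin an exact chromedriver version that matches its installed Chrome.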

Taking a Screenshot of a Single Element with Python + Selenium

via: https://www.jianshu.com/p/7ed519854be7

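from selenium import webdriver
from selenium.webdriver.common.by import By
from PIL import Image

driver = webdriver.Chrome()
driver.get('http://stackoverflow.com/')
driver.save_screenshot('screenshot.png')  # captures the visible viewport only

# Locate the element to capture; 'element-id' is a placeholder, replace it with a real selector
element = driver.find_element(By.ID, 'element-id')

# Element bounding box in CSS pixels (assumes devicePixelRatio == 1 and an unscrolled page)
left = element.location['x']
top = element.location['y']
right = left + element.size['width']
bottom = top + element.size['height']

# Crop the screenshot down to the element and save it to a separate file
im = Image.open('screenshot.png')
im = im.crop((left, top, right, bottom))
im.save('element.png')

driver.quit()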
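The crop above only lines up when the screenshot has one image pixel per CSS pixel and the element sits in the visible viewport. On a HiDPI display (devicePixelRatio of 2, for example) or after scrolling, the coordinates need adjusting. A minimal helper, as a sketch; the function name is hypothetical. In Selenium 4, element.screenshot("element.png") can also capture a single element directly and skips the manual crop entirely.

```python
def element_crop_box(location, size, device_pixel_ratio=1.0, scroll=(0, 0)):
    """Convert Selenium's element.location / element.size (CSS pixels, relative to the page)
    into a PIL crop box (left, top, right, bottom) in screenshot pixels.

    scroll is the page's (window.scrollX, window.scrollY) when the screenshot was taken,
    because save_screenshot() captures only the current viewport.
    """
    ratio = device_pixel_ratio
    # Shift from page coordinates to viewport coordinates
    x = location["x"] - scroll[0]
    y = location["y"] - scroll[1]
    # Scale CSS pixels to device pixels and round to whole pixels for PIL
    left = int(round(x * ratio))
    top = int(round(y * ratio))
    right = int(round((x + size["width"]) * ratio))
    bottom = int(round((y + size["height"]) * ratio))
    return (left, top, right, bottom)
```

In use, the ratio could come from driver.execute_script("return window.devicePixelRatio") and the offsets from driver.execute_script("return [window.scrollX, window.scrollY]"), both read right before save_screenshot(); the returned tuple goes straight into im.crop().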

Using Cookies with Selenium

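from selenium import webdriver

browser = webdriver.Chrome()
# A cookie can only be added for the domain of the page currently loaded,
# so open a taobao.com page first
browser.get('https://www.taobao.com/')
browser.add_cookie({
    "domain": ".taobao.com",
    "name": "xxx",
    "value": "xxx",
    "path": '/',
    # no "expiry" key: the cookie is treated as a session cookie
})
browser.refresh()  # reload so the page is requested with the new cookie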
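A common follow-up is to save the cookies of a logged-in session and load them again later instead of logging in every time. A minimal sketch built on the WebDriver methods get_cookies() and add_cookie(); the helper names and the JSON file layout are assumptions made for this example. As with add_cookie above, the browser must already be on a page of the cookies' domain before load_cookies() is called.

```python
import json


def save_cookies(driver, path):
    """Write every cookie visible to the current page to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(driver.get_cookies(), f, ensure_ascii=False, indent=2)


def load_cookies(driver, path):
    """Re-add cookies written by save_cookies(); returns how many were added.

    The driver must already be on a page of the same domain, otherwise the
    browser rejects the cookies.
    """
    with open(path, encoding="utf-8") as f:
        cookies = json.load(f)
    for cookie in cookies:
        driver.add_cookie(cookie)
    return len(cookies)
```

Storing the list exactly as get_cookies() returns it keeps the round trip simple; if a saved cookie is rejected, dropping its "expiry" or "sameSite" field is a reasonable first thing to try.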