Browser Automation with Selenium: Fingerprints, recognizability and traceability?


I want to use Selenium/WebDriver to simulate a browser and scrape website content with it. Even if it is not the fastest method, for me it has many advantages, such as executing scripts etc.

Many websites forbid access via automated methods, for example search engines like Google or Bing.

For one tool I need to scrape Google's estimated result stats for several keywords. It works like this: Selenium simulates a browser that visits google.com, types in a keyword, scrapes the results, then after a short pause types in the next keyword, scrapes the results, and so on...
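The keyword loop described above could be sketched roughly like this in Python. `run_keyword_searches` and the `search` callable are hypothetical names introduced for illustration: in a real script, `search` would use Selenium to load google.com, type the keyword, submit it and return the scraped result stats. Keeping the browser part behind a callable (and injecting `sleep`) is just one design choice; it lets the pacing logic be tested without a browser.

```python
import time
from typing import Callable, Dict, Iterable, List


def run_keyword_searches(
    keywords: Iterable[str],
    search: Callable[[str], List[str]],
    pause_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, List[str]]:
    """Run one search per keyword, pausing between consecutive searches.

    `search` is a hypothetical callable; with Selenium it would drive the
    browser (open the page, type the keyword, submit, scrape the results).
    """
    results: Dict[str, List[str]] = {}
    for i, keyword in enumerate(keywords):
        if i > 0:
            # Short pause before typing the next keyword.
            sleep(pause_seconds)
        results[keyword] = search(keyword)
    return results


if __name__ == "__main__":
    # Stand-in for a Selenium-backed search function.
    fake_search = lambda kw: [f"results for {kw}"]
    print(run_keyword_searches(["selenium", "webdriver"], fake_search, pause_seconds=1.0))
```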

My question is: is it possible for a website to recognize that I'm using Selenium to simulate a browser instead of operating the browser by hand? Google in particular gives me doubts, since I know Selenium was partly developed at Google, or at least by people working at Google. Does Selenium leave fingerprints, or is it impossible, even for Google, to decide whether I'm using the browser myself or simulating it with Selenium?

No, nobody can see that you're using Selenium WebDriver rather than hand-operating the browser. I'm not sure about the old Selenium RC, but it should work the same way. Here's how it works:

  1. Selenium opens the browser with a clean profile (or a profile you selected).
  2. Selenium is hooked into the browser so it can steer and control it, but the browser still does all of the work. Basically, Selenium replaces the user's input to the browser, nothing more.

You can verify this by reading the HTTP headers the browser sends.
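One way to read those headers, sketched here as an assumption rather than taken from the answer, is to point both the Selenium-driven browser and a hand-operated one at a tiny local HTTP server that prints and echoes every request header, then compare the two outputs. This uses only Python's standard `http.server`; the class name `HeaderEchoHandler`, the helper `make_server` and port 8000 are illustrative choices. Note that it only shows HTTP headers, not anything a page's JavaScript could observe.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class HeaderEchoHandler(BaseHTTPRequestHandler):
    """Answer every GET with the request headers as JSON, and print them."""

    def do_GET(self):
        headers = dict(self.headers.items())
        for name, value in headers.items():
            print(f"{name}: {value}")
        body = json.dumps(headers, indent=2).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence the default per-request access log line.
        pass


def make_server(host: str = "127.0.0.1", port: int = 8000) -> HTTPServer:
    return HTTPServer((host, port), HeaderEchoHandler)


if __name__ == "__main__":
    server = make_server()
    print("Open http://127.0.0.1:8000/ in the Selenium-driven browser and in a normal one")
    server.serve_forever()
```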

If you ever need the server to recognize Selenium, you can use browsermob-proxy and add a custom header to the requests.


All that said, there is one thing you must be aware of. While there's no way to detect Selenium directly, there can be indirect clues picked up by the website you're visiting, such as many requests made in virtually no time. That might be an issue for you, so make sure Selenium behaves like a user.
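A minimal sketch of one way to avoid a burst of instant, perfectly regular requests: generate pause lengths that vary around a base value and never drop below a floor. The function name `human_like_delays` and the default numbers (8 s base, ±4 s jitter, 2 s minimum) are arbitrary assumptions, not values from the answer, and nothing here guarantees a site won't flag the traffic.

```python
import random
from typing import List, Optional


def human_like_delays(
    count: int,
    base_seconds: float = 8.0,
    jitter_seconds: float = 4.0,
    min_seconds: float = 2.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return `count` pause lengths of base +/- jitter, clamped to min_seconds.

    The values are uniformly distributed around `base_seconds`, so consecutive
    requests are neither instantaneous nor evenly spaced.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    if rng is None:
        rng = random.Random()
    delays = []
    for _ in range(count):
        delay = base_seconds + rng.uniform(-jitter_seconds, jitter_seconds)
        delays.append(max(min_seconds, delay))
    return delays


if __name__ == "__main__":
    # Seeded generator so the printed schedule is reproducible.
    print(human_like_delays(5, rng=random.Random(1)))
```

In a scraping loop you would call `time.sleep(delay)` with one of these values before each new keyword.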


Edit 2016/04:

Apparently it is possible: https://stackoverflow.com/a/33403473/2930045 states that a company can do it. My guess, and it's nothing more than a guess, is that they run JS that detects what Selenium installs in the browser in order to operate it.

