Scrape JS-Rendered Websites with Ruby and PhantomJS
Before Angular and its trillions of lookalikes, a simple HTTP request™ worked just fine. Things have changed: more and more websites unconditionally require a working JavaScript engine before they render anything readable. That rules out a simple HTTP request™ as a way to get what you need.
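To make the problem concrete, here is a minimal sketch of what a plain request tends to see on such a page. The markup in `RAW_HTML` and the `first_h1` helper are made up for illustration; they stand in for the response body a non-JavaScript client would receive, where the template placeholders have not been filled in yet.

```ruby
# Hypothetical response body of a JS-rendered page, as a client without a
# JavaScript engine would receive it: the template is still unfilled.
RAW_HTML = <<~HTML
  <html ng-app="demo">
    <body>
      <h1>{{ headline }}</h1>
      <script src="app.js"></script>
    </body>
  </html>
HTML

# Hypothetical helper: grab the text of the first <h1> with a plain regex.
# Good enough for a demo, not a replacement for a real HTML parser.
def first_h1(html)
  html[%r{<h1>(.*?)</h1>}m, 1]
end

puts first_h1(RAW_HTML)
# prints the raw placeholder: {{ headline }}
```

The headline you wanted only exists after the page's scripts have run, which is the job of the headless browser.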
PhantomJS to the rescue
Please install PhantomJS first.
Install Watir, which lets you interact with URLs just like your favourite browser would.
$ gem install watir
In our case, the browser will be PhantomJS. Read more about the concept on Watir’s official website and see also Watir on GitHub.
OPTIONAL Download the binary from and extract the package. Then copy the /bin folder (with the file called
in it) into your Rails™ or Sinatra™ project’s /bin folder to make this work on Heroku.
Get your scrape on
require 'watir'
require 'nokogiri'
# go to url
url = ''
puts "Loading: #{url}"
# set up a watir browser session with phantomjs
browser = Watir::Browser.new :phantomjs
# make watir visit the url
browser.goto url
# this is crucial!
# give it some time to render :)
# 4 seconds might be too long - find your own sweet spot
sleep 4
# time for some Nokogiri™ magic
html_doc = Nokogiri::HTML(browser.html)
puts 'Received Website\'s rendered HTML'
puts html_doc.css('h1').text
# close watir session! be kind.
browser.close
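If guessing a fixed pause feels brittle, one alternative is to poll until the content you need has shown up. The sketch below is plain Ruby and not part of Watir; `wait_until`, its `timeout` and `interval` parameters, and the counter that simulates a slowly rendering page are all assumptions made for illustration.

```ruby
# Hypothetical helper: call the block repeatedly until it returns something
# truthy, or raise once `timeout` seconds have passed.
def wait_until(timeout: 4, interval: 0.1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "condition not met within #{timeout} seconds"
    end
    sleep interval
  end
end

# Stand-in for a page that is only "rendered" on the third check.
checks = 0
rendered = wait_until(timeout: 2, interval: 0.05) do
  checks += 1
  checks >= 3
end

puts "ready after #{checks} checks"
```

In the scraper, the block could ask the browser whether the element you are after is there yet, for example `wait_until { browser.h1.exists? }`. Watir also comes with its own waiting helpers, so check its documentation before rolling your own.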