Are there any good tools besides SeleniumRC that can fetch webpages including content post-painted by JavaScript?

One major shortcoming of curl is that more and more webpages have their main content painted by JavaScript from an AJAX response that arrives after the initial HTTP response. Because curl only retrieves that initial response and never executes JavaScript, it never picks up this post-painted content.

So to fetch these types of webpages from the command line, I’ve been reduced to writing Ruby scripts that drive Selenium RC to fire up a Firefox instance and then return the source HTML once these AJAX calls have completed.

It would be much better to have a leaner command-line solution for this type of problem. Does anyone know of any?

Answers:


Method 1

Have you considered Watir?

http://watir.com/

Once you’ve installed the gem, you can run it as a standalone script or line by line from irb after `require 'watir-webdriver'` (in current releases the gem is simply `watir`, loaded with `require 'watir'`). I’ve found it to be more responsive than selenium-webdriver, but it lacks the test-recording GUI that helps work out complex test conditions.
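A minimal Ruby sketch of the "load the page, wait for the AJAX content, return the HTML" pattern, assuming a Watir-style browser object that responds to `goto(url)` and `html` (as `Watir::Browser` does). `fetch_rendered_html` and its readiness block are hypothetical names; the polling loop is hand-rolled rather than one of Watir's own wait helpers, so it does not depend on a particular Watir version.

```ruby
# Hypothetical helper: navigate, then poll the rendered DOM until the
# caller's readiness check passes or the timeout expires.
# Assumption: `browser` responds to #goto(url) and #html, as Watir::Browser does.
def fetch_rendered_html(browser, url, timeout: 10, interval: 0.25, &ready)
  ready ||= ->(_html) { true } # no check supplied: accept the first snapshot
  browser.goto(url)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout

  loop do
    html = browser.html # current DOM, including JavaScript-painted content
    return html if ready.call(html)

    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "timed out after #{timeout}s waiting for #{url}"
    end

    sleep interval
  end
end
```

With the `watir` gem installed, the browser would come from `Watir::Browser.new(:firefox)`, the block would test for whatever element the AJAX call paints (for example `{ |html| html.include?('id="content"') }`, where the id is a placeholder), and `browser.close` shuts Firefox down afterwards. Passing the browser in, rather than creating it inside the method, keeps the waiting logic testable with a stub object.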

Method 2

I just recently started using WebDriver from Selenium 2 in Java. There is a driver called HtmlUnitDriver that supports JavaScript (once enabled, e.g. with `new HtmlUnitDriver(true)`) but does not fire up an actual browser.

It is not a light solution but it does get the job done.

I’ve designed the code to run from the command line and save the web data to files.
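The answer's Java code isn't reproduced here. The save-to-files half of that workflow does not depend on which driver renders the page, so here is a hypothetical Ruby sketch of it (Ruby to match the question's existing scripts). It uses Watir and Firefox rather than HtmlUnitDriver, which is a Java library; `output_path_for` and `save_pages` are illustrative names, and `fetch` is any callable that returns rendered HTML.

```ruby
require 'fileutils'
require 'uri'

# Hypothetical helper: map a URL to a filesystem-safe file name, e.g.
# "https://example.com/a/b?q=1" -> "<dir>/example.com_a_b_q_1.html".
def output_path_for(url, dir)
  uri = URI.parse(url)
  stem = "#{uri.host}#{uri.path}#{uri.query ? "?#{uri.query}" : ''}"
  stem = stem.gsub(/[^A-Za-z0-9.\-]+/, '_').gsub(/\A_+|_+\z/, '')
  File.join(dir, "#{stem}.html")
end

# Fetch each URL with the supplied callable and write its HTML to disk.
# Returns the list of files written.
def save_pages(urls, dir, fetch)
  FileUtils.mkdir_p(dir)
  urls.map do |url|
    path = output_path_for(url, dir)
    File.write(path, fetch.call(url))
    path
  end
end

if $PROGRAM_NAME == __FILE__
  if ARGV.length < 2
    warn "usage: ruby #{File.basename(__FILE__)} OUTPUT_DIR URL [URL ...]"
  else
    require 'watir' # assumption: the watir gem and Firefox are installed
    browser = Watir::Browser.new(:firefox)
    begin
      # Any wait-for-AJAX logic would go inside this callable.
      fetch = ->(url) { browser.goto(url); browser.html }
      save_pages(ARGV.drop(1), ARGV.first, fetch).each { |path| puts path }
    ensure
      browser.close
    end
  end
end
```

Keeping the fetcher as a plain callable means the file-handling part can be swapped onto a different renderer (or tested with a stub) without touching the command-line wrapper.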


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0
