How to trigger a JS ASP.Net next page event using scrapy?

I’m scraping content off this website. I start by sending a FormRequest that yields the search results, based on Wim Herman’s answer to my other question.

I scrape what is needed and then want to move to the next page. The next-page link does not have a URL; it is triggered by JS. Here’s what the HTML tag looks like:

<a href="javascript:__doPostBack('dgSearchResults$ctl24$ctl01','')">2</a>

I tried the following and nothing seems to work:

In [18]: fr = FormRequest.from_response(response, formdata={"__EVENTTARGET": 'dgSearchResults$ctl02$ctl03'})

In [19]: fetch(fr)                                                              
2020-08-24 16:47:06 [scrapy.core.engine] DEBUG: Crawled (200) <POST https://registers.maryland.gov/RowNetWeb/Estates/frmEstateSearch2.aspx> (referer: None)

In [20]: view(response)                                                         
Out[20]: True

and this:

In [21]: fr = FormRequest.from_response(response, formdata={"__EVENTTARGET": 'dgSearchResults$ctl02$ctl01'}, clickdata={'type': 'submit'})

In [22]: fetch(fr)                                                              
2020-08-24 16:50:24 [scrapy.core.engine] DEBUG: Crawled (200) <POST https://registers.maryland.gov/RowNetWeb/Estates/frmEstateSearch2.aspx> (referer: None)

In [23]: view(response)                                                         
Out[23]: True

When I view the response, it either lands me on the initial page (the one containing the initial form) or nothing happens and the page number is still set to 1.

Answers:

Method 1

As I mentioned in the comment, this is a pretty common issue on ASP.NET pages. As you probably know by now, the JS you mentioned triggers a POST request. The body of this POST request may contain the fields you filled in on your search form as well as several hidden inputs generated by the page instance (like __VIEWSTATE or __VIEWSTATEGENERATOR).

When you use the FormRequest.from_response() method, it searches for those inputs to fill the request body by selecting all input elements inside the //form element in the page. Sometimes that’s fine; sometimes it isn’t, and that is your case.

When the method selects all inputs, it gets an input that was meant for something else. In your case it is this input:

<input id="cmdSearchNew" value="New Search" ... />

How would you know?

If you use your browser’s dev tools and analyse the request that is made when changing from page 1 to page 2, you will see that it’s a POST request and its body is something like this:

{
    "__EVENTTARGET":"dgSearchResults$ctl24$ctl01",
    "__EVENTARGUMENT":"",
    "__VIEWSTATE":"jyAD4Bm...",
    "__VIEWSTATEGENERATOR":"11C1F95B",
    "__EVENTVALIDATION":"TmG0xFB..."
}
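The __EVENTTARGET value in that body is the first argument of the __doPostBack(...) call in the link's href, and __EVENTARGUMENT is the second. A minimal sketch for pulling both out of an href, assuming it has the same shape as the link in the question (the parse_postback helper and its regex are illustrative, not part of Scrapy):

```python
import re

# Matches __doPostBack('<target>','<argument>') as it appears in the href.
POSTBACK_RE = re.compile(r"__doPostBack\('([^']*)'\s*,\s*'([^']*)'\)")


def parse_postback(href):
    """Return (event_target, event_argument), or None if href is not a postback link."""
    match = POSTBACK_RE.search(href)
    if match is None:
        return None
    return match.group(1), match.group(2)


href = "javascript:__doPostBack('dgSearchResults$ctl24$ctl01','')"
print(parse_postback(href))
```

In a spider, the href could come from a selector on the pager link, so the target does not have to be hard-coded.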

However, if you inspect the body of your Scrapy request (you can print fr.body in the shell you are already using), you will see something like this:

{
    "__EVENTTARGET":"dgSearchResults$ctl24$ctl01",
    "cmdSearchNew": "New Search",
    "__VIEWSTATE":"jyAD4Bm...",
    "__VIEWSTATEGENERATOR":"11C1F95B",
    "__EVENTVALIDATION":"TmG0xFB..."
}

(The actual body is URL-encoded; this is a parsed view.)

That cmdSearchNew field shouldn’t be there. It’s meant for something else, but Scrapy couldn’t know that because it is inside the same form. (Also, __EVENTARGUMENT won’t be there because its value is empty, so Scrapy will ignore it.)

Once you have identified the problem, you can tell the from_response() method that you don’t want a specific field in the body by setting it to None.
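To spot such differences without reading the raw bodies by eye, both can be decoded and compared. This is a minimal sketch using only the standard library; the two bodies below are shortened placeholders, and decode_body and diff_fields are hypothetical helper names:

```python
from urllib.parse import parse_qsl


def decode_body(body):
    """Decode an application/x-www-form-urlencoded body (str or bytes) into a dict."""
    if isinstance(body, bytes):
        body = body.decode("utf-8")
    # keep_blank_values=True keeps empty fields such as __EVENTARGUMENT.
    return dict(parse_qsl(body, keep_blank_values=True))


def diff_fields(browser_body, scrapy_body):
    """Return (fields only the browser sent, fields only the Scrapy request sent)."""
    browser = decode_body(browser_body)
    ours = decode_body(scrapy_body)
    return sorted(browser.keys() - ours.keys()), sorted(ours.keys() - browser.keys())


# Placeholder bodies; the real __VIEWSTATE values are much longer.
browser_body = "__EVENTTARGET=dgSearchResults%24ctl24%24ctl01&__EVENTARGUMENT=&__VIEWSTATE=abc"
scrapy_body = b"__EVENTTARGET=dgSearchResults%24ctl24%24ctl01&cmdSearchNew=New+Search&__VIEWSTATE=abc"

missing, extra = diff_fields(browser_body, scrapy_body)
print("missing from the Scrapy request:", missing)
print("extra in the Scrapy request:", extra)
```

In the Scrapy shell, fr.body can be passed as the second argument, and the browser body can be copied from the request payload shown in the dev tools.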

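# Build the POST for page 2: set the postback target and drop the
# "New Search" button that from_response() would otherwise include.
fr = FormRequest.from_response(response, formdata={
    '__EVENTTARGET': 'dgSearchResults$ctl24$ctl01',
    'cmdSearchNew': None,  # None removes the field from the request body
})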

This should be enough for you to get the response for page 2.


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
