I’m scraping an old ASP.net website using Python’s requests module.
I’ve spent 5+ hours trying to figure out how to simulate this POST request to no avail. Doing it the way I do it below, I essentially get a message saying “No item matches this item reference.”
Any help would be deeply appreciated – here’s the request and my code, a few things are modified out of respect to brevity and/or privacy:
My own code:
import requests
# Scraping the item number from the website, I have confirmed this is working.
#Then use the newly acquired item number to request the data.
item_url = http://www.example.com/EN/items/Pages/yourrates.aspx?vr= + item_number[0]
viewstate = r'/wEPD...' # Truncated for brevity.
# Create the appropriate request and payload.
payload = {"vr": int(item_number[0])}
item_request_body = {
"__SPSCEditMenu": "true",
"MSOWebPartPage_PostbackSource": "",
"MSOTlPn_SelectedWpId": "",
"MSOTlPn_View": 0,
"MSOTlPn_ShowSettings": "False",
"MSOGallery_SelectedLibrary": "",
"MSOGallery_FilterString": "",
"MSOTlPn_Button": "none",
"__EVENTTARGET": "",
"__EVENTARGUMENT": "",
"MSOAuthoringConsole_FormContext": "",
"MSOAC_EditDuringWorkflow": "",
"MSOSPWebPartManager_DisplayModeName": "Browse",
"MSOWebPartPage_Shared": "",
"MSOLayout_LayoutChanges": "",
"MSOLayout_InDesignMode": "",
"MSOSPWebPartManager_OldDisplayModeName": "Browse",
"MSOSPWebPartManager_StartWebPartEditingName": "false",
"__VIEWSTATE": viewstate,
"keywords": "Search our site",
"__CALLBACKID": "ctl00$SPWebPartManager1$g_dbb9e9c7_fe1d_46df_8789_99a6c9db4b22",
"__CALLBACKPARAM": "startvr"
}
# Write the appropriate headers for the property information.
item_request_headers = {
"Host": home_site,
"Connection": "keep-alive",
"Content-Length": len(encoded_valuation_request),
"Cache-Control": "max-age=0",
"Origin": home_site,
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36",
"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
"Cookie": "__utma=48409910.1174413745.1405662151.1406402487.1406407024.17; __utmb=48409910.7.10.1406407024; __utmc=48409910; __utmz=48409910.1406178827.13.3.utmcsr=ratesandvallandingpage|utmccn=landingpages|utmcmd=button",
"Accept": "*/*",
"Referer": valuation_url,
"Accept-Encoding": "gzip,deflate,sdch",
"Accept-Language": "en-US,en;q=0.8"
}
response = requests.post(url=item_url, params=payload, data=item_request_body, headers=item_request_headers)
print response.text
What Chrome is telling me the request looks like:
Remote Address:202.55.96.131:80 Request URL:http://www.example.com/EN/items/Pages/yourrates.aspx?vr=123456789 Request Method:POST Status Code:200 OK Request Headers Accept:*/* Accept-Encoding:gzip,deflate,sdch Accept-Language:en-US,en;q=0.8 Cache-Control:max-age=0 Connection:keep-alive Content-Length:21501 Content-Type:application/x-www-form-urlencoded; charset=UTF-8 Cookie:__utma=48409910.1174413745.1405662151.1406402487.1406407024.17; __utmb=48409910.7.10.1406407024; __utmc=48409910; __utmz=48409910.1406178827.13.3.utmcsr=ratesandvallandingpage|utmccn=landingpages|utmcmd=button Host:www.site.com Origin:www.site.com Referer:http://www.example.com/EN/items/Pages/yourrates.aspx?vr=123456789 User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36 Query String Parameters vr:123456789 Form Data __SPSCEditMenu:true MSOWebPartPage_PostbackSource: MSOTlPn_SelectedWpId: MSOTlPn_View:0 MSOTlPn_ShowSettings:False MSOGallery_SelectedLibrary: MSOGallery_FilterString: MSOTlPn_Button:none __EVENTTARGET: __EVENTARGUMENT: MSOAuthoringConsole_FormContext: MSOAC_EditDuringWorkflow: MSOSPWebPartManager_DisplayModeName:Browse MSOWebPartPage_Shared: MSOLayout_LayoutChanges: MSOLayout_InDesignMode: MSOSPWebPartManager_OldDisplayModeName:Browse MSOSPWebPartManager_StartWebPartEditingName:false __VIEWSTATE:/wEPD...(Omitted for length) keywords:Search our site __CALLBACKID:ctl00$SPWebPartManager1$g_dbb9e9c7_fe1d_46df_8789_99a6c9db4b22 __CALLBACKPARAM:startvr
Answers:
Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.
Method 1
You have too many request parameters, and should not set the content-type, content-length, host, origin, or connection headers; leave those to requests to set.
You are also doubling up the url parameters; either add the vr parameter to the URL manually or use params, not do both.
It may well be that some of the parameters in the POST body are generated by the ASP application tied to a session. I’d use a GET request with a Session object the valuation_url, parse the form in that page to extract the __CALLBACKID parameter. The requests Session will then store any cookies the server sets and reuse those:
item_request_headers = {
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36",
"Accept": "*/*",
"Accept-Encoding": "gzip,deflate,sdch",
"Accept-Language": "en-US,en;q=0.8"
}
payload = {"vr": int(item_number[0])}
session = requests.Session(headers=item_request_headers)
# Get form page
form_response = session.get(validation_url, params=payload)
# parse form page; BeautifulSoup could do this for example
soup = BeautifulSoup(form_response.content)
callbackid = soup.select('input[name=__CALLBACKID]')[0]['value']
item_request_body = {
"__SPSCEditMenu": "true",
"MSOWebPartPage_PostbackSource": "",
"MSOTlPn_SelectedWpId": "",
"MSOTlPn_View": 0,
"MSOTlPn_ShowSettings": "False",
"MSOGallery_SelectedLibrary": "",
"MSOGallery_FilterString": "",
"MSOTlPn_Button": "none",
"__EVENTTARGET": "",
"__EVENTARGUMENT": "",
"MSOAuthoringConsole_FormContext": "",
"MSOAC_EditDuringWorkflow": "",
"MSOSPWebPartManager_DisplayModeName": "Browse",
"MSOWebPartPage_Shared": "",
"MSOLayout_LayoutChanges": "",
"MSOLayout_InDesignMode": "",
"MSOSPWebPartManager_OldDisplayModeName": "Browse",
"MSOSPWebPartManager_StartWebPartEditingName": "false",
"__VIEWSTATE": viewstate,
"keywords": "Search our site",
"__CALLBACKID": callbackid,
"__CALLBACKPARAM": "startvr"
}
item_url = 'http://www.example.com/EN/items/Pages/yourrates.aspx'
response = session.post(url=item_url, params=payload, data=item_request_body,
headers={'Referer': form_response.url})
The session handles the headers (setting a user agent, and accept parameters), only on the POST with the session do we add a referrer header as well.
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0