How do I scrape information off ASP.NET websites when paging and JavaScript links are being used?

I have been given a staff list which is supposed to be up to date but it doesn’t match an intranet People Finder which is written in ASP.NET.

As the information is sensitive I am not able to access the database the People Finder is using so the only way I can get at the information is by scraping the structure starting at the top brass at the top and then going through each tier in turn.

Each person has a Staff number which then forms the URL http://intranet/peoplefinder/index.aspx?srn=ABC1234 and then all the people who report to them are listed underneth in the format <a id="gvEmployees_ctl03_lnkFullName" href="index.aspx?srn=ABC4321" rel="nofollow noreferrer noopener" target="_self"> where each URL indicates the Staff number and provides a link to their team.

The trouble arises when the teams are big as paging is implemented in the GridView with an URL such as <a href="javascript:__doPostBack('gvEmployees','Page$2')" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener">2</a>.

How would I scrape this page, capture the SRN and other details along with the people who report to the person on all pages of the GridView then loop through each reportee and do the same process until the whole list is complete?

Example HTML of result

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" >
<head><title>
    People Finder: Name Surname
</title><link rel="stylesheet" href="/path/to/style.css" rel="nofollow noreferrer noopener" type="text/css" /><link rel="stylesheet" href="/path/to/anotherStyle.css" rel="nofollow noreferrer noopener" type="text/css" />
    <script type="text/javascript" src="/path/to/peoplefinder.js"></script>
</head>
<body>
    <form name="form1" method="post" action="/path/to/index.aspx" id="form1">
<div>
<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
<input type="hidden" name="__EVENTARGUMENT" id="__EVENTARGUMENT" value="" />
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="### ViewState ###" />
</div>

<script type="text/javascript">
<!--
var theForm = document.forms['form1'];
if (!theForm) {
    theForm = document.form1;
}
function __doPostBack(eventTarget, eventArgument) {
    if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
        theForm.__EVENTTARGET.value = eventTarget;
        theForm.__EVENTARGUMENT.value = eventArgument;
        theForm.submit();
    }
}
// -->
</script>


<script src="/path/to/WebResource.axd?d=AueXWrgAf8xSxMTAt1Q4AA2&amp;t=633311832634916698" type="text/javascript"></script>

        <div class="HP3CHeader">
            <div id="LWHPBanner">
                <h1><span id="lblName">Name Surname</span></h1>
            </div>
        </div>

        <div id='CPMain'>
            <div id="mainBox">

            <div id="pnlEmployeeDetails">

                <div id='basicData'>
                    <img id="imgPhoto" class="photo" src="/path/to/photo.jpg" style="height:69px;width:69px;border-width:0px;" />
                    <span id="lblBusinessUnit">Business Unit</span>
                    <span id="lblCostCentreName">Cost Centre</span>
                    <span id="lblLocation">Location</span>

                    <a href='/path/to/checkcontactdetails.htm' target='_blank' onclick='return OpenCheckContactDetails();' >Find out how to change your details/photo.</a>
                    <div id="manager">
        <strong>Reports to: </strong><a id="hlManager" href="/path/to/index.aspx?srn=ABC1234" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener">Name Surname</a>
    </div>
                </div>

                <div id='contactData'>

                    <div id="pnlSrn">
        <strong>Staff number:</strong> <span id="lblSrn">ABC1234</span>
    </div>


                    <div id="pnlEmailAddress">
        <strong>Email Address:</strong> <span id="lblEmailAddress">Email</span>
    </div>
                    <div style="clear: both"></div>
                </div>

</div>

            <div id="pnlGrid">

                <h3><span id="lblGridTitle">Name's team</span></h3>
            <div>
        <table class="subordinates" cellspacing="0" cellpadding="2" rules="cols" border="1" id="gvEmployees" style="border-style:None;border-collapse:collapse;">
            <tr style="color:Black;background-color:#EFF3FB;border-style:None;font-weight:bold;">
                <th scope="col"><a href="javascript:__doPostBack('gvEmployees','Sort$SRN')" rel="nofollow noreferrer noopener" style="color:Black;">SRN</a></th><th scope="col"><a href="javascript:__doPostBack('gvEmployees','Sort$FullName')" rel="nofollow noreferrer noopener" style="color:Black;">Full name</a></th><th scope="col"><a href="javascript:__doPostBack('gvEmployees','Sort$RACFID')" rel="nofollow noreferrer noopener" style="color:Black;">RACFID</a></th>
            </tr><tr class="reports" style="background-color:White;border-style:None;">
                <td style="width:70px;">ABC1234</td><td>
                            <a id="gvEmployees_ctl02_lnkFullName" href="index.aspx?srn=1K5932" rel="nofollow noreferrer noopener" target="_self">Name Surname</a> 
                        </td><td>ABCD</td>
            </tr><tr class="reports" style="background-color:#EFF3FB;border-style:None;">
                <td style="width:70px;">ABC1234</td><td>
                            <a id="gvEmployees_ctl03_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" target="_self">Name Surname</a> 
                        </td><td>ABCD</td>
            </tr><tr class="reports" style="background-color:White;border-style:None;">
                <td style="width:70px;">ABC1234</td><td>
                            <a id="gvEmployees_ctl04_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" target="_self">Name Surname</a> 
                        </td><td>ABCD</td>
            </tr><tr class="reports" style="background-color:#EFF3FB;border-style:None;">
                <td style="width:70px;">ABC1234</td><td>
                            <a id="gvEmployees_ctl05_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" target="_self">Name Surname</a> 
                        </td><td>ABCD</td>
            </tr><tr class="reports" style="background-color:White;border-style:None;">
                <td style="width:70px;">ABC1234</td><td>
                            <a id="gvEmployees_ctl06_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" target="_self">Name Surname</a> 
                        </td><td>ABCD</td>
            </tr><tr class="reports" style="background-color:#EFF3FB;border-style:None;">
                <td style="width:70px;">ABC1234</td><td>
                            <a id="gvEmployees_ctl07_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" target="_self">Name Surname</a> 
                        </td><td>ABCD</td>
            </tr><tr class="reports" style="background-color:White;border-style:None;">
                <td style="width:70px;">ABC1234</td><td>
                            <a id="gvEmployees_ctl08_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" target="_self">Name Surname</a> 
                        </td><td>ABCD</td>
            </tr><tr class="reports" style="background-color:#EFF3FB;border-style:None;">
                <td style="width:70px;">ABC1234</td><td>
                            <a id="gvEmployees_ctl09_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" target="_self">Name Surname</a> 
                        </td><td>ABCD</td>
            </tr><tr class="reports" style="background-color:White;border-style:None;">
                <td style="width:70px;">ABC1234</td><td>
                            <a id="gvEmployees_ctl10_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" target="_self">Name Surname</a> 
                        </td><td>ABCD</td>
            </tr><tr class="reports" style="background-color:#EFF3FB;border-style:None;">
                <td style="width:70px;">ABC1234</td><td>
                            <a id="gvEmployees_ctl11_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" target="_self">Name Surname</a> 
                        </td><td>ABCD</td>
            </tr><tr class="PagerStyle" style="color:#000039;border-style:None;">
                <td colspan="3"><table border="0">
                    <tr>
                        <td><span>1</span></td><td><a href="javascript:__doPostBack('gvEmployees','Page$2')" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" style="color:#000039;">2</a></td>
                    </tr>
                </table></td>
            </tr>
        </table>
    </div>

</div>
            </div>

            <div id="searchBox">
                <strong>Search People Finder:</strong>
                <br /><br />
                <span>Forename:</span><br/>
                <span><input name="txtFirstname" type="text" id="txtFirstname" /></span><br/>
                <span>Surname:</span><br/>
                <span><input name="txtSurname" type="text" id="txtSurname" /></span><br/>
                <span>RACFID:</span><br/>
                <span><input name="txtRacfid" type="text" id="txtRacfid" /></span><br/>
                <span>Staff number:</span><br/>
                <span><input name="txtSrn" type="text" id="txtSrn" /></span><br/>
                <div class="searchBoxItem" style="text-align:center;width:100%"><input type="submit" name="btnFind" value="Search" onclick="javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions(&quot;btnFind&quot;, &quot;&quot;, false, &quot;&quot;, &quot;index.aspx&quot;, false, false))" id="btnFind" title="Search for employees member" class="button" style="border-style:Outset;" /></div><br/> 
                <div>People Finder searches only UK staff.</div> 
               <!-- <div><a class="execBoardLink" href="/path/to/index.aspx?srn=ABC1234" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener">Show Executive Board</a></div> -->
                <div style="margin-top:5px;"><a href="/path/to/phonebook" rel="nofollow noreferrer noopener" target="phoneBook" onclick='return OpenPhonebook();' title="Open Phonebook in new window">Open Phonebook</a></div>
            </div>
        </div>

        <div class="contentFooter"  style="text-align:center;">
            <table width="100%" cellpadding="0" cellspacing="0" border="0" summary="Navigation layout table">
                <tr>
                    <td align="left"><span class="linkArrow">&lt;</span> <a href="javascript:history.back();" rel="nofollow noreferrer noopener">Back</a></td>
                    <td align="center"></td>
                    <td align="right"><span class="linkArrow">^ </span><a href="#top" rel="nofollow noreferrer noopener">Top</a></td>
                </tr>
            </table>
        </div> 

<div>

    <input type="hidden" name="__PREVIOUSPAGE" id="__PREVIOUSPAGE" value="vy066Txz34y1E515UsTSTDabHKEmdBRCsq7xM0lpJls1" />
    <input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEWCgKM3uTTAgLP/83pDwLfwaTTAQKNguzjCAKt98LeCwLZh62pDwKKqdGpBwLd2q7jAwKa+5aMBAL5zb65C42zY4GBEUKujhjtZ/hZ8sLESfiF" />
</div></form>
</body>
</html>

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

You could post a variable to the HTML page to go through the paging.

string lcUrl = "http://www.mysite.com/page.aspx";

HttpWebRequest loHttp =

   (HttpWebRequest) WebRequest.Create(lcUrl);


// *** Send any POST data

string lcPostData =

   "gvEmployees=" + HttpUtility.UrlEncode("Page$2");

loHttp.Method="POST";

byte [] lbPostBuffer = System.Text.           

                       Encoding.GetEncoding(1252).GetBytes(lcPostData);

loHttp.ContentLength = lbPostBuffer.Length;

Stream loPostData = loHttp.GetRequestStream();

loPostData.Write(lbPostBuffer,0,lbPostBuffer.Length);

loPostData.Close();

HttpWebResponse loWebResponse = (HttpWebResponse) loHttp.GetResponse();

Encoding enc = System.Text.Encoding.GetEncoding(1252);

StreamReader loResponseStream =

   new StreamReader(loWebResponse.GetResponseStream(),enc);

string lcHtml = loResponseStream.ReadToEnd();

loWebResponse.Close();

loResponseStream.Close();

Then parse out the data you need from the string.

–EDIT–

Here is what I would try (something similar) where all of the post data is sent:

string lcPostData =

       "__EVENTTARGET" + HttpUtility.UrlEncode("gvEmployees"); &
"__EVENTARGUMENT" + HttpUtility.UrlEncode("Page%242"); &
"__VIEWSTATE" + HttpUtility.UrlEncode("<Value of _Viewstate>");

Method 2

You open the fiddler and open the second page of asp.net website table.Go to webforms tab in Fiddler for that particular page session and check in the body what are the variables are posting.Concat all variables in same sequence format and post data using HttpWebRequest.
In my case it was:

string PostData = "__EVENTTARGET=" 
    + HttpUtility.UrlEncode("ctl00$ContentPlaceHolder2$grdDirectory") 
    + "&"
    + "__EVENTARGUMENT="+HttpUtility.UrlEncode("Page$2") 
    + "&"
    + "__VIEWSTATE="+ HttpUtility.UrlEncode(view_state)
    + "&"
    + "__VIEWSTATEGENERATOR=" 
    + HttpUtility.UrlEncode(viewstategenerator)
    + "&"
    + "__VIEWSTATEENCRYPTED=" 
    + HttpUtility.UrlEncode(viewstateencrypted) 
    + "&" 
    + "__EVENTVALIDATION=" + HttpUtility.UrlEncode(eventvalidation);

Hope It will work.


All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes
Article Rating
Subscribe
Notify of
guest

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x