Why is HTML Agility Pack HtmlDocument.DocumentNode null?

I’m using this code to change the href attributes in an HTML stream.

First I download the full HTML page using this code (URL is the web page address):

HttpWebRequest myHttpWebRequest = (HttpWebRequest)WebRequest.Create(URL);
HttpWebResponse myHttpWebResponse = 
                         (HttpWebResponse)myHttpWebRequest.GetResponse();

Stream s = myHttpWebResponse.GetResponseStream();

Then I process it:

HtmlDocument doc = new HtmlDocument();

doc.Load(s);
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("/a"))
{
    string att = link.Attributes["href"].Value;
    link.Attributes["href"].Value = "http://ahmadalli.somee.com/default.aspx?url=" + att;
}
doc.Save(s);

s is the HTML stream.

But I get an exception that says doc.DocumentNode is null!

I tried many sites, but doc.DocumentNode is null for all of them too.

Answers:


Method 1

This works for me.

using(WebClient client = new WebClient())
{
    client.Encoding = System.Text.Encoding.UTF8;
    var doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(client.DownloadString("http://www.google.com?q=stackoverflow"));
    foreach (var href in doc.DocumentNode.Descendants("a").Select(x => x.Attributes["href"]))
    {
        if (href == null) continue;
        href.Value = "http://ahmadalli.somee.com/default.aspx?url=" + HttpUtility.UrlEncode(href.Value);
    }
    StringWriter writer = new StringWriter();
    doc.Save(writer);
    var finalHtml = writer.ToString();
}

Note the call to HttpUtility.UrlEncode: it lets you get the original URL back correctly. Without it, some parameters in the original URL may cause problems, because they would be read as parameters of the wrapping URL.

Use HttpUtility.UrlDecode to decode it.
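The round trip described above can be sketched as follows. This is only an illustration: the ProxyUrl class and its Wrap and Unwrap helpers are hypothetical names, the example.com address is made up, and the prefix is the one from the question. It assumes System.Web is available for HttpUtility.

```csharp
using System;
using System.Web; // HttpUtility; on .NET Framework this needs a reference to System.Web

static class ProxyUrl
{
    // Prefix taken from the question.
    const string Prefix = "http://ahmadalli.somee.com/default.aspx?url=";

    // Hypothetical helper: encodes the target URL so it travels as one query value.
    public static string Wrap(string targetUrl)
    {
        return Prefix + HttpUtility.UrlEncode(targetUrl);
    }

    // Hypothetical helper: strips the prefix and decodes the target URL again.
    public static string Unwrap(string wrappedUrl)
    {
        return HttpUtility.UrlDecode(wrappedUrl.Substring(Prefix.Length));
    }

    static void Main()
    {
        // The target has its own query string, including '&' and a space.
        string target = "http://example.com/search?q=html agility&page=2";

        string wrapped = Wrap(target);
        Console.WriteLine(wrapped);

        // Compares the decoded value with the original target.
        Console.WriteLine(Unwrap(wrapped) == target);
    }
}
```

Because the target's own & and ? characters are percent-encoded, the wrapping URL still carries a single url parameter, and decoding it should give back the original address unchanged.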

Method 2

Try using //a instead of /a.

In XPath, //a selects every a element anywhere in the document, whereas /a selects only a elements that are direct children of the document root.
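The difference is easy to see on a small in-memory document. The sketch below is an illustration, not the asker's code: XPathDemo and CountMatches are hypothetical names, the HTML string is made up, and it assumes the HtmlAgilityPack package is referenced.

```csharp
using System;
using HtmlAgilityPack;

static class XPathDemo
{
    // Hypothetical helper: counts the nodes an XPath expression matches,
    // treating a null result from SelectNodes as zero matches.
    public static int CountMatches(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        return nodes == null ? 0 : nodes.Count;
    }

    static void Main()
    {
        const string html =
            "<html><body><p><a href=\"/one\">one</a></p><a href=\"/two\">two</a></body></html>";

        // "/a" looks only at direct children of the document root, which is <html> here.
        Console.WriteLine(CountMatches(html, "/a"));

        // "//a" looks at every depth, so it finds both links.
        Console.WriteLine(CountMatches(html, "//a"));
    }
}
```

Since the only child of the document root is the html element, the first query matches nothing, while the second finds the links nested inside body.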

Update:

The following code works fine:

        var myHttpWebRequest = (HttpWebRequest)WebRequest.Create("http://google.com");
        var myHttpWebResponse = (HttpWebResponse)myHttpWebRequest.GetResponse();

        var s = myHttpWebResponse.GetResponseStream();

        var doc = new HtmlDocument();

        doc.Load(s);
        foreach (var link in doc.DocumentNode.SelectNodes("//a"))
        {
            // Anchors without an href attribute yield null here; skip them.
            var href = link.Attributes["href"];
            if (href == null) continue;

            href.Value = "http://ahmadalli.somee.com/default.aspx?url=" + href.Value;

            Console.WriteLine(href.Value);
        }

Method 3

Here is your answer: HTML Agility Pack Null Reference.

Method 4

Try using the below code:

HtmlDocument htmlDoc = new HtmlDocument
{
    OptionAddDebuggingAttributes = false,
    OptionAutoCloseOnEnd = true,
    OptionFixNestedTags = true,
    OptionReadEncoding = true
};
try
{
    // The response stream is forward-only, so load it as-is without seeking.
    using (Stream reader = myHttpWebResponse.GetResponseStream())
    {
        htmlDoc.Load(reader, true);
    }
    HtmlNode node = htmlDoc.DocumentNode;
    if (node != null)
    {
        foreach (var href in node.Descendants("a").Select(x => x.Attributes["href"]))
        {
            if (href == null) continue; // anchor without an href attribute
            href.Value = "http://ahmadalli.somee.com/default.aspx?url=" + HttpUtility.UrlEncode(href.Value);
        }
    }
}
catch (Exception ex)
{
    // Report the failure instead of silently swallowing it.
    Console.Error.WriteLine(ex);
}

I am using HTML Agility Pack version 1.4.0.

Solved your problem? If no, please comment. Else mark as answer.

Method 5

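The XPath expression for the anchor tags is incorrect: /a matches only a elements that are direct children of the document root, while //a matches them at any depth.

...doc.DocumentNode.SelectNodes("/a")    //incorrect
...doc.DocumentNode.SelectNodes("//a")   //correct
...doc.DocumentNode.SelectNodes(@"//a")  //also correct (verbatim string, same XPath)

The original code fails to select any nodes, and SelectNodes then returns null rather than an empty collection, so the null reference comes from the result of SelectNodes and not from doc.DocumentNode itself. Check the result for null so the code does not fail on, say, a document with no links at all (however unlikely that is) 🙂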

var anchors = doc.DocumentNode.SelectNodes("//a");
if (anchors != null)
{
    foreach (HtmlNode link in anchors)
    {
        /*do stuff*/
    } 
}


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
