Fremus.co.za

Demystifying Life and Web Development

Archive for December, 2009

Interesting code with HtmlAgilityPack

Yesterday I was busy with HTML-to-PDF conversion, and for this I used the HTML Agility Pack. Everything worked great, except that IE and Firefox/Chrome seemed to produce different HTML. So today I took some fairly straightforward HTML and pushed it through the HTML Agility Pack:
[HTML sample not preserved; the only text that survives from it is “New Website Under Construction”.]

If I then use this code to loop through the child nodes:

    using System;
    using System.IO;
    using HtmlAgilityPack;

    class Program
    {
        static void Main()
        {
            // Read the saved page into a string and parse it.
            string html = File.ReadAllText(@"C:\Documents and Settings\user\Desktop\fremus.net\index.htm");
            HtmlDocument doc = new HtmlDocument();
            doc.LoadHtml(html);

            // Number of nodes directly under the document root.
            Console.WriteLine(doc.DocumentNode.ChildNodes.Count);

            // Print three levels of the tree: top-level nodes, their children and grandchildren.
            foreach (HtmlNode node in doc.DocumentNode.ChildNodes)
            {
                Console.WriteLine(node.Name);
                foreach (HtmlNode childNode in node.ChildNodes)
                {
                    Console.WriteLine("\t\t" + childNode.Name);
                    foreach (HtmlNode grandChildNode in childNode.ChildNodes)
                    {
                        Console.WriteLine("\t\t\t" + grandChildNode.Name);
                    }
                }
            }
        }
    }

I get the following result in my command line window:
[Screenshot: command-line output listing the node names]

As you can see from the output, the html node has a #text node. The head node has a #text node too, and it has 9 child nodes, 5 of which are #text nodes. The body node also has a #text node, and it has 7 child nodes: four #text nodes and three divs. So what is this #text node? If you read this article on the W3C site, you will see that it states:

A common error in DOM processing is to expect an element node to contain text. However, the text of an element node is stored in a text node.
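The quoted point, and the whitespace #text nodes in the output above, can be reproduced without HtmlAgilityPack. This is a minimal sketch that uses .NET's built-in LINQ to XML (System.Xml.Linq) instead, so it only accepts well-formed markup. The `Markup` string is a hypothetical stand-in for the sample page, not its real source. Loading with `LoadOptions.PreserveWhitespace` keeps whitespace-only text as nodes. By default LINQ to XML would discard that whitespace, whereas the Agility Pack keeps it.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class TextNodeDemo
{
    // Hypothetical markup: a title with text, and a body whose three divs
    // are separated by line breaks and indentation.
    public const string Markup =
        "<html>\n" +
        "  <head><title>New Website Under Construction</title></head>\n" +
        "  <body>\n" +
        "    <div id=\"header\"></div>\n" +
        "    <div id=\"content\"></div>\n" +
        "    <div id=\"footer\"></div>\n" +
        "  </body>\n" +
        "</html>";

    // Parses the markup and keeps whitespace-only runs as XText nodes.
    public static XElement Load(string markup) =>
        XDocument.Parse(markup, LoadOptions.PreserveWhitespace).Root;

    // DOM-style name: the tag name for elements, "#text" for text nodes,
    // and the node type for anything else (comments etc.).
    public static string NodeName(XNode node) =>
        node is XElement element ? element.Name.LocalName
        : node is XText ? "#text"
        : node.NodeType.ToString();

    // Names of the direct child nodes of an element, in document order.
    public static string[] ChildNames(XElement parent) =>
        parent.Nodes().Select(NodeName).ToArray();

    public static void Main()
    {
        XElement html = Load(Markup);

        // The title's text is held by a child #text node, not by the element itself.
        XElement title = html.Descendants("title").First();
        Console.WriteLine("title: " + string.Join(", ", ChildNames(title)));

        // The whitespace before, between and after the divs becomes #text nodes.
        Console.WriteLine("body: " + string.Join(", ", ChildNames(html.Element("body"))));
    }
}
```

Running it prints `title: #text` and `body: #text, div, #text, div, #text, div, #text`. That is the same shape the author reports for the body: seven children, four of which are #text.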

On the same page it then gives an example using a title tag. If you Google “html #text node”, the second result points to an article whose section on nodes suggests that each #text node is a child of the node that contains it. The #text nodes that appear in the body node seem to correspond to the whitespace after each div (or other element) inside the body. If I change my code slightly:

                    // Inside the childNode loop: also print whether each grandchild has child nodes.
                    Console.WriteLine("\t\t" + childNode.Name);
                    foreach (HtmlNode grandChildNode in childNode.ChildNodes)
                    {
                        Console.WriteLine("\t\t\t" + grandChildNode.Name);
                        Console.WriteLine("\t\t\t\t" + grandChildNode.HasChildNodes);
                    }

The output shows that the divs have child nodes but the #text nodes do not. So it seems that each stretch of “empty space” inside a node becomes a #text node. If I amend the HTML from earlier like this:
[Amended HTML sample not preserved.]

Then the footer div will have two #text nodes, and the paragraph node will have a #text node of its own. My problem yesterday was that IE produced different HTML from Chrome, so when I parsed each version with the HTML Agility Pack, the node counts weren’t the same. With the sample HTML I have given so far the difference is negligible, but when I went to a site like this one, saved the HTML from IE and from Chrome into separate files, and ran my code over each file, I got different node counts. Here are two screenshots that illustrate this:
[Screenshots: node output for the HTML saved from Chrome and for the HTML saved from IE]

The first screenshot shows the HTML saved from Chrome and the second the HTML saved from IE. Notice the extra #text nodes.
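Whitespace between tags is parsed into #text nodes. As a result, two saved copies of the same content can give different counts if one has more line breaks and indentation than the other. The sketch below illustrates this and one way to make counts comparable by skipping whitespace-only text nodes. It uses System.Xml.Linq rather than HtmlAgilityPack, so it needs well-formed markup. The two strings are made up. It is an assumption, not something the screenshots show, that whitespace is what differs between the IE and Chrome files.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class NodeCountDemo
{
    // Two hypothetical serializations of the same content: the second has
    // line breaks and indentation between the elements, the first has none.
    public const string Compact = "<body><div>Header</div><div>Footer</div></body>";
    public const string Indented = "<body>\n  <div>Header</div>\n  <div>Footer</div>\n</body>";

    // Parse while keeping whitespace-only text as nodes, as an HTML DOM would.
    static XElement Load(string markup) =>
        XDocument.Parse(markup, LoadOptions.PreserveWhitespace).Root;

    // True for text nodes that contain nothing but whitespace.
    public static bool IsWhitespaceText(XNode node) =>
        node is XText text && string.IsNullOrWhiteSpace(text.Value);

    // Counts every node below the root, at any depth.
    public static int CountAll(string markup) =>
        Load(markup).DescendantNodes().Count();

    // Same count, but skipping whitespace-only text nodes.
    public static int CountIgnoringWhitespace(string markup) =>
        Load(markup).DescendantNodes().Count(node => !IsWhitespaceText(node));

    public static void Main()
    {
        Console.WriteLine($"All nodes: compact {CountAll(Compact)}, indented {CountAll(Indented)}");
        Console.WriteLine($"Ignoring whitespace: compact {CountIgnoringWhitespace(Compact)}, indented {CountIgnoringWhitespace(Indented)}");
    }
}
```

The full counts come out as 4 and 7, and both filtered counts are 4. In HtmlAgilityPack the equivalent filter would presumably check `node.NodeType == HtmlNodeType.Text && string.IsNullOrWhiteSpace(node.InnerText)`. If the extra nodes really are whitespace-only, that should make the IE and Chrome counts comparable. Any other differences in the saved markup would still show up.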

posted by fr3dr1k in Browsers,C# and have No Comments

WCF – Getting the foundations right

OK, so admittedly I have been using ASMX services for too long now, and the time has come to kick them to the curb and adopt WCF. My problem of late has been that I skim through code just to get stuff done, without taking the time to understand the details.

Why would I want to adopt WCF? Well, there is the list of reasons given in articles on MSDN (one whitepaper can be found here). Of particular interest are the way it combines several technologies and the general idea that interoperability is the main goal. But those points are really just promotion, and it’s not until you understand what the technology can do that you realise what you are dealing with. To get to that point you need to work through an example: after I worked through the “Getting Started Tutorial”, a light went on and I thought, “OK, I get it”. Essentially a WCF service is made up of three parts, but in terms of C# code there are two key elements:
* An interface marked as a service contract with the ServiceContract attribute, whose methods are marked with the OperationContract attribute
* A class that implements the methods in the interface
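In code, those two elements look roughly like the sketch below. It is a minimal example loosely modelled on the calculator from the Getting Started tutorial. The ICalculator and CalculatorService names and their operations are illustrative assumptions. It also assumes a project that references System.ServiceModel, which is part of .NET Framework; on newer .NET the attributes come from a separate NuGet package. Nothing is hosted here, so Main simply calls the class directly.

```csharp
using System;
using System.ServiceModel;

// The service contract: an interface marked [ServiceContract]. Only methods
// that carry [OperationContract] are exposed to clients.
[ServiceContract]
public interface ICalculator
{
    [OperationContract]
    double Add(double n1, double n2);

    [OperationContract]
    double Multiply(double n1, double n2);
}

// The implementation: an ordinary class that implements the contract.
// When the service is hosted, WCF creates instances of it to handle calls.
public class CalculatorService : ICalculator
{
    public double Add(double n1, double n2) => n1 + n2;

    public double Multiply(double n1, double n2) => n1 * n2;
}

public static class Program
{
    public static void Main()
    {
        // Without a host the service is just a class, which keeps it easy to test.
        ICalculator calculator = new CalculatorService();
        Console.WriteLine(calculator.Add(2, 3));      // 5
        Console.WriteLine(calculator.Multiply(4, 5)); // 20
    }
}
```

The contract is just an interface and the implementation is just a class. Everything about how clients reach the service (addresses, bindings, behaviours) is left to the third part, the configuration.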

The third part of a WCF service is its configuration, found in the system.serviceModel section of web.config or app.config. Within that section you define service behaviours as well as endpoints. One of the keys to understanding WCF is knowing that a service is exposed through its endpoints. Each endpoint combines an address, a binding and a contract, and it is what a consumer actually connects to. A WCF service can be consumed by web apps, Silverlight apps and desktop apps. Endpoints have configuration settings of their own as well, notably message size limits such as maxReceivedMessageSize, which are set on the binding configuration that an endpoint uses.
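The same system.serviceModel settings can also be applied in code, which makes the address/binding/contract split easy to see. This is a minimal self-hosting sketch, assuming a .NET Framework console project that references System.ServiceModel. The IEcho contract, the port 8000 address and the 1 MB size limit are all hypothetical. On Windows, listening on an HTTP address may require running as administrator or reserving the URL with netsh.

```csharp
using System;
using System.ServiceModel;
using System.ServiceModel.Description;

// Hypothetical contract and implementation used only for this sketch.
[ServiceContract]
public interface IEcho
{
    [OperationContract]
    string Echo(string text);
}

public class EchoService : IEcho
{
    public string Echo(string text) => text;
}

public static class Program
{
    public static void Main()
    {
        // Address: the hypothetical base address the service listens on.
        var baseAddress = new Uri("http://localhost:8000/echo");

        // Binding: message size limits live here. In buffered mode,
        // MaxBufferSize has to match MaxReceivedMessageSize.
        var binding = new BasicHttpBinding
        {
            MaxReceivedMessageSize = 1024 * 1024,
            MaxBufferSize = 1024 * 1024
        };

        using (var host = new ServiceHost(typeof(EchoService), baseAddress))
        {
            // Endpoint = address ("" means the base address) + binding + contract.
            host.AddServiceEndpoint(typeof(IEcho), binding, "");

            // A service behaviour: publish metadata (WSDL) over HTTP GET.
            host.Description.Behaviors.Add(new ServiceMetadataBehavior { HttpGetEnabled = true });

            host.Open();
            Console.WriteLine("Listening on " + baseAddress + "; press Enter to stop.");
            Console.ReadLine();
        }
    }
}
```

In app.config the equivalent would be an endpoint element whose bindingConfiguration points at a basicHttpBinding binding carrying maxReceivedMessageSize. It would sit alongside a service behaviour containing `<serviceMetadata httpGetEnabled="true"/>`.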

From the tutorial I could see that you can host a WCF service and open it in a browser without having IIS running. That’s something I still need to think about, but it does raise a few interesting questions. After the tutorial I wanted to build a simple REST service; that took a few minutes, but I eventually got it sorted. StackOverflow was quite helpful, and so were several articles on MSDN, this one being the most helpful.
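A simple REST service can be self-hosted in the same way, with no IIS involved. The sketch below is one hedged way to do it and not necessarily the approach from the MSDN article mentioned above. It uses the WebGet attribute with a UriTemplate, plus WebServiceHost. It assumes a .NET Framework project referencing System.ServiceModel and System.ServiceModel.Web. The IGreetingService contract, the URL template and the port are hypothetical.

```csharp
using System;
using System.ServiceModel;
using System.ServiceModel.Web;

[ServiceContract]
public interface IGreetingService
{
    // Maps GET {base}/greetings/{name} to this operation and returns JSON.
    // UriTemplate path variables bind to string parameters by name.
    [OperationContract]
    [WebGet(UriTemplate = "greetings/{name}", ResponseFormat = WebMessageFormat.Json)]
    string GetGreeting(string name);
}

public class GreetingService : IGreetingService
{
    public string GetGreeting(string name) => "Hello, " + name;
}

public static class Program
{
    public static void Main()
    {
        var baseAddress = new Uri("http://localhost:8000/");

        // When no endpoint is configured, WebServiceHost adds a default
        // WebHttpBinding endpoint with the web behaviour, so no config is needed.
        using (var host = new WebServiceHost(typeof(GreetingService), baseAddress))
        {
            host.Open();
            Console.WriteLine("Try " + baseAddress + "greetings/world; press Enter to stop.");
            Console.ReadLine();
        }
    }
}
```

With the host running, opening http://localhost:8000/greetings/world in a browser should return the JSON string "Hello, world".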

posted by fr3dr1k in C#,WCF and have No Comments