1. Comment Feeds Without Well-formedness

    In response to the popular confusion about XML well-formedness and a recent nudging by Greg, I have upgraded Blogger Recent Comments. People who have been there before will note that there is now one fewer instruction: XML well-formedness is no longer necessary! I have tested this with a Blogger template on which I purposely broke well-formedness, and the comments still came through fine! Originally introduced on this blog, Blogger Recent Comments is a Ning app that can automatically generate RSS, JavaScript, and JSON feeds of all comments on your blog. Setup is now just three easy steps!

    Creative Commons Licence © 2006-2008 Stephen Paul Weber. Some Rights Reserved.
2. The Importance of XML Well-formedness

    XHTML validity is a buzzword around the Internet, but many people agree that it is not all that important. It has its advantages, but it is not the end of the world if you can't quite achieve it. XML well-formedness, however, is very important. Why? Because it makes server-side hackery much easier. That may not be the only reason, but it is an important one. Some people have mastered the art of screen-scraping with RegExps, but I and others like me have never quite mastered that often-complicated technique. Instead, it is much easier to parse the webpage as XML and pull out the data that way. This works especially well when the page is known to conform to some standard (as in the code addition for Blogger Recent Comments).
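
    As a rough illustration of the parse-as-XML approach, here is a minimal PHP sketch using SimpleXML and XPath. The sample markup, the "comment" and "author" class names, and the variable names are all made up for the example; they are not the actual Blogger Recent Comments code.

```php
<?php
// Hypothetical, well-formed page fragment; the class names are illustrative
// and not taken from any real Blogger template.
$page = '<html><body>'
      . '<div class="comment"><span class="author">Alice</span><p>Nice post!</p></div>'
      . '<div class="comment"><span class="author">Bob</span><p>Thanks &amp; agreed.</p></div>'
      . '</body></html>';

// Collect parser errors quietly instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Parse the markup as XML. simplexml_load_string() returns false when the
// input is not well-formed, which is why well-formedness matters here.
$xml = simplexml_load_string($page);
if ($xml === false) {
    exit("Page is not well-formed XML\n");
}

// Pull out every comment with one XPath query instead of a RegExp.
$comments = array();
foreach ($xml->xpath('//div[@class="comment"]') as $div) {
    $comments[] = array(
        'author' => (string) $div->span, // text of the first <span> child
        'text'   => (string) $div->p,    // entities are decoded on the way out
    );
}

print_r($comments);
```

    Real XHTML pages usually declare the XHTML namespace on the <html> element, in which case the XPath query needs a registered namespace prefix (SimpleXMLElement::registerXPathNamespace) before it will match anything.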

    While some leniency can be built in, here are some basic guidelines for keeping your pages well-formed and making our job that much easier:

    1. XHTML empty tags: some tags, such as <br>, <link …>, and others, used to be written in HTML as you see them there. This breaks XML well-formedness. Instead, one should use <br />, <link … /> and the like. (Note to advanced users: this can be partially overcome using a RegExp line similar to $XMLdata = preg_replace('/<(img|meta|link|hr|br)([^<>]*?)([\/]?)>/i','<$1$2 />', $XMLdata); )
    2. Escaping ampersands: many URLs contain the '&' character, and sometimes this character is used in content as well. If this character is left unescaped, it breaks XML well-formedness. Use '&amp;' instead. (Note to advanced users: this can be mostly overcome using a RegExp line similar to $XMLdata = preg_replace('/&([^;]{10})/i','&amp;$1', $XMLdata); which escapes an '&' only when the ten characters after it contain no ';'.)
    3. Escaping scripts: JavaScript code will often contain characters that must be escaped in XML, but which cannot be escaped if the script is to work. To overcome this, add '//<![CDATA[' after every <script> tag and '//]]>' before every </script> tag.
    4. Closing tags: some tags, such as <p>, are often inserted by web designers without a closing tag. Instead of '<p>text<p>more text', use '<p>text</p><p>more text</p>'. Note that XML is case-sensitive, so if you open a section with, say, <head>, you must end it with </head>, not </HEAD>.
    5. Quoting attributes: <p class="1">, not <p class=1>, etc. Quotation marks always go around attribute values, no matter what.
    6. Non-tag < >: if you reference a Blogger template tag (such as <$BlogID$>) or for some other reason need to include a < or > character in content, you must escape it with &lt; or &gt;, respectively.
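
    The two RegExp workarounds from items 1 and 2 can be combined into one cleanup pass, sketched below. The function name repairMarkup() is hypothetical, and the ampersand pattern is a swapped-in variant that uses a negative lookahead rather than the ten-character rule quoted above, so treat it as one possible approach rather than the code the app actually runs.

```php
<?php
// Best-effort cleanup of near-XHTML markup. repairMarkup() is a hypothetical
// helper name; it covers only guidelines 1 and 2 from the list above.
function repairMarkup($XMLdata)
{
    // Guideline 1: self-close empty elements such as <br> and <img ...>.
    $XMLdata = preg_replace(
        '/<(img|meta|link|hr|br)([^<>]*?)([\/]?)>/i',
        '<$1$2 />',
        $XMLdata
    );

    // Guideline 2: escape any "&" that does not already start an entity
    // (&name;) or a character reference (&#38; or &#x26;).
    $XMLdata = preg_replace(
        '/&(?!(?:[a-z][a-z0-9]*|#[0-9]+|#x[0-9a-f]+);)/i',
        '&amp;',
        $XMLdata
    );

    return $XMLdata;
}

// Made-up input with an unclosed <br> and two bare ampersands.
$broken = '<p>Q&A<br><a href="?a=1&b=2">link</a> &amp; &#38; done</p>';
$fixed = repairMarkup($broken);
echo $fixed, "\n";
```

    This only patches the first two problems; unclosed tags, unquoted attributes, and stray < characters still need fixing by hand.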

    A note about content: Blogger's post form and comment form are not very good at checking XML well-formedness. Thus, if you want to maintain an (at least mostly) well-formed page, you must follow these rules in any code entered in these forms. For example, if you enter a < character in the Blogger post form, it is not escaped for you; you must actually enter &lt;, and the same goes for the comment form. This is sometimes annoying if you are trying to maintain full XML well-formedness, because a well-meaning commenter can break it and you must then go and edit their comment. This is not usually the biggest problem, however, since it is usually one of the first two problems, which can be overcome as noted. You can check for XML well-formedness without validating XHTML using this tool.
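
    For anyone who would rather script the check than rely on a separate tool, the sketch below shows one way to test a string for well-formedness with PHP's DOM extension, and how htmlspecialchars() escapes a comment before it goes into a page. isWellFormed() and the sample comment are assumptions made for illustration.

```php
<?php
// isWellFormed() is a hypothetical helper: it reports whether a string parses
// as XML at all, without validating it against the XHTML DTD.
function isWellFormed($markup)
{
    $previous = libxml_use_internal_errors(true); // collect errors quietly
    $doc = new DOMDocument();
    $ok = $doc->loadXML($markup);                 // false on a parse error
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the old setting
    return $ok;
}

// Made-up comment text containing raw "<", "&" and ">" characters.
$comment = 'I think 1 < 2 & 3 > 2';

// Unescaped, the "<" and "&" break the enclosing page.
var_dump(isWellFormed('<p>' . $comment . '</p>')); // bool(false)

// htmlspecialchars() turns &, <, > and quotes into entities.
$escaped = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');
var_dump(isWellFormed('<p>' . $escaped . '</p>')); // bool(true)
```

    Running user-submitted text through an escaping step like this before it reaches the page avoids having to edit comments after the fact.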

3. Outline Classes

    The list of XOXO Developer's Resources has been updated to include the Outline Classes. Written in PHP4, but with compatibility built in for PHP5, this set of classes is designed to parse and create XOXO, OPML, hAtom, JSON, and arbitrary XML documents and fragments. The classes are GPL'ed.

Stephen Paul Weber