[svlug] XML vs HTML

William R Ward bill at wards.net
Wed Jul 11 13:24:01 PDT 2001

Deirdre Saoirse Moen <deirdre at deirdre.net> writes:
> HTML is an instance of an SGML-compliant markup language.

Actually, that's not quite true.  Valid HTML (HTML that conforms to
its DTD) is an application of SGML, but most HTML in actual use would
not be accepted by any SGML parser, methinks...

> So is XML. 

XML is not an *instance* of SGML; it's a separate meta-language, which
is a *subset* of SGML.  They cut out the SGML features that are hard
to parse (tag minimization, for example), so that XML parsers could be
small and fast.

> Well-formed HTML *is* XML, but HTML is not required to be well-formed.
> By well-formed, I mean a closing tag for each opening tag, like so:
> <p>This is a paragraph.</p>

Also, HTML is not XML, well-formed or not.  In XML, every element must
be closed, including ones like <img> that have no closing tag in HTML.
Because people want to use XML tools for HTML pages, we have a thing
called XHTML, where you have tags like this:
 <img src="pr0n.jpg" alt="pr0n" />
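
To see the difference for yourself, here is a rough sketch using the
XML parser that ships with Python (xml.etree.ElementTree).  The file
name photo.jpg and the helper is_well_formed() are made up for the
illustration; any strict XML parser should behave much the same way.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # e.g. "mismatched tag": </p> arrives while <img> is still open
        return False

# HTML style: <img> is never closed, so this is not XML.
html_style = '<p><img src="photo.jpg" alt="photo"></p>'

# XHTML style: the trailing / closes the element.
xhtml_style = '<p><img src="photo.jpg" alt="photo" /></p>'

print(is_well_formed(html_style))   # False
print(is_well_formed(xhtml_style))  # True
```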

In XML, a tag that ends in /> counts as both the opening and the
closing tag.  The space before the / keeps existing web browsers from
getting confused when reading XHTML.  XML parsers just ignore the
space, so everyone's happy.
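
A quick way to check that claim, again sketched with Python's bundled
xml.etree.ElementTree (the summary() helper and the photo.jpg name are
only for illustration): the three spellings below should all parse to
the same empty element.

```python
import xml.etree.ElementTree as ET

def summary(elem):
    """Reduce an element to the parts that matter for comparison."""
    return (elem.tag, dict(elem.attrib), elem.text or "", len(elem))

with_space = ET.fromstring('<img src="photo.jpg" alt="photo" />')
without_space = ET.fromstring('<img src="photo.jpg" alt="photo"/>')
explicit_close = ET.fromstring('<img src="photo.jpg" alt="photo"></img>')

# The space before the / makes no difference to the parser.
print(summary(with_space) == summary(without_space))   # True

# The /> form is equivalent to an opening tag plus a closing tag.
print(summary(with_space) == summary(explicit_close))  # True
```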


