Markup Languages

A brief history of markup languages

The first standardised structured-information technology of any importance was SGML (Standard Generalized Markup Language), which grew out of GML (Generalized Markup Language), developed at IBM. It was originally created to provide a way of formatting legal documents. It was subsequently expanded into an all-purpose information standard and in 1986 emerged as an ISO (International Organization for Standardization) standard, ISO 8879.

SGML is extremely powerful and as a consequence is also quite complex. For this reason, in the early days of the Internet the search was on for something simpler. In 1989, Tim Berners-Lee and Anders Berglund, two researchers at CERN (the European Laboratory for Particle Physics), created a tag-based language for marking up technical documents which could then be shared across the Internet. This language, which we now know as HTML (Hypertext Markup Language), was a simplified, SGML-based language.

As the popularity of the web increased, so did the demands placed on it, and HTML underwent a number of changes in order to keep up. Remember, it was originally conceived as a way of presenting static information. Increasingly, the information presented on the web needed to be supported by databases with HTML front ends.

HTML was evolving to cope with dynamic information (DHTML), supported by other technologies such as Java applets and other plug-ins. In changing, however, it started to expose its weaknesses, the most obvious being its fixed number of tags. It would be nice if custom tags could be added to suit the particular needs of an industry. SGML supported the use of custom tags and offered three significant benefits that were missing from HTML, namely:

  • Extensibility
  • Structure
  • Validation

Technically HTML does have structure, but most browsers, most notably Microsoft Internet Explorer, did not enforce it. Selling a browser that doesn't complain when users write sloppy code was good marketing perhaps, but a bad idea.
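To see the difference between enforcing structure and not enforcing it, here is a small sketch in Python. It is only an illustration: the standard library's HTMLParser stands in for a forgiving browser (an assumption made for the demonstration, not a description of how any real browser works), while the XML parser plays the strict one. The names SLOPPY, TagLogger, lenient_tags and is_well_formed are made up for this example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Sloppy markup: the <b> and <i> tags overlap and <p> is never closed.
SLOPPY = "<p>Some <b>bold <i>text</b> here</i>"


class TagLogger(HTMLParser):
    """Records every start tag it sees; it never rejects the input."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def lenient_tags(markup):
    """Parse the way a forgiving HTML reader would and list the start tags."""
    parser = TagLogger()
    parser.feed(markup)
    parser.close()
    return parser.tags


def is_well_formed(markup):
    """Return True only if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False
```

The lenient parser happily reports the tags in SLOPPY without a word of complaint, whereas is_well_formed(SLOPPY) is False. Tidy the nesting up and close the paragraph, and the strict parser accepts it too.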

In 1996 the W3C (the World Wide Web Consortium) set out to find a way of adding the benefits of SGML while retaining the simplicity of HTML. The result of their work was released two years later, in February 1998, as XML 1.0 (eXtensible Markup Language). XML is a subset of SGML; the original XML specification was about a tenth the size of the SGML specification.

So isn't XML just another version of HTML?

No, it is radically different. HTML is a specific markup language for encoding information and displaying it in a web browser. XML is a specification for designing markup languages, i.e. it's a metalanguage (meta meaning information about information); in this case, information about how to design your own markup language. So if you think about it, XML can be used to design HTML.

In fact this has already been done, and is reflected in the current HTML specifications. The new version of HTML is, naturally, called XHTML. But hold on a minute: HTML has a fixed number of tags to mark up text etc. for display, and the browser knows how to display each tag. (Which is why browsers from different manufacturers display things in slightly different ways.)

What's a browser going to do if I make up my own set of tags?

OK, remember: XML is about structuring information, not about how it should be displayed.
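A minimal sketch of what that means in practice, using Python's standard XML parser. The recipe vocabulary below is entirely invented for the illustration (none of these tags exist in HTML), and outline is a hypothetical helper, not part of any library.

```python
import xml.etree.ElementTree as ET

# A made-up vocabulary: these tags say what the content IS, not how it looks.
RECIPE = """
<recipe>
  <title>Victoria sponge</title>
  <ingredient quantity="200" unit="g">flour</ingredient>
  <ingredient quantity="4">eggs</ingredient>
</recipe>
"""


def outline(element, depth=0):
    """Return the tag structure as indented lines, ignoring display entirely."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(RECIPE)
print("\n".join(outline(root)))
```

The parser has never heard of a recipe tag, yet it can still recover the structure, and a program can ask for root.find("title").text or loop over the ingredients. Nothing in the document says which font to use or where anything goes on screen.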

There are XML viewers which let you view the structure of the XML markup, but this doesn't show how the information is to be displayed. If you have used HTML, it's a problem you are perhaps already familiar with. Along with the content of the document itself, you must embed tags to tell the browser how to display the document. Consequently, you end up with a document that's littered with <font> tags, for example; i.e. the content and format information is all mixed up together. That is, until Cascading Style Sheets (CSS) came along. Now the content and format information can be kept separate.

So you use XML to structure the content, and CSS to describe how it is to be displayed. Problem solved, then? Well, in the same way as people started to realise the limitations of HTML, the spotlight has turned on CSS. It too has its shortcomings, so along with XML comes XSL (eXtensible Stylesheet Language).

Currently either CSS or XSL can be used to display an XML document. Eventually, though, CSS may be gobbled up by XSL in the same way HTML will probably be swallowed up by XML.

Focusing on how information is to be displayed, though, is straying from the real point of XML. HTML and the Internet are currently designed for use by humans, not machines. How many times have you typed a query into your favourite search engine and got a million replies on all sorts of things you never intended?

The classic story perhaps is that of the dear old ladies of the Women's Institute in Norfolk, who got their first computer. As a group they were very keen on lace making, so naturally enough they typed lace into their search engine. What they didn't know, but very quickly found out, is that lace is American slang for prostitution, so no guesses what the majority of search results were.

Now some of the later generation of search engines will take the million or so results and categorise them for you. When you select the category you are interested in, the search engine can use this information to filter out all the unwanted stuff. What you are doing is providing metadata, i.e. information about information (your query), by putting it into context.

It is this more fundamental problem that XML is setting out to address, because once the machines understand what we are looking for, and can manipulate the data for themselves (using Artificial Intelligence techniques), maybe in the not too distant future when you enter your query you will actually get back just what you want.
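As a rough sketch of the idea, suppose the pages in a search index carried a little metadata of their own. Everything here is hypothetical: the index layout, the page tag, the category attribute and the search function are invented for the illustration and do not describe how any real search engine stores its data.

```python
import xml.etree.ElementTree as ET

# Hypothetical search index: each page is labelled with a category (metadata).
INDEX = """
<index>
  <page category="crafts">Bobbin lace patterns for beginners</page>
  <page category="fashion">Lace wedding dresses</page>
  <page category="footwear">How to lace hiking boots</page>
</index>
"""


def search(index_xml, word, category=None):
    """Return page titles containing word, optionally limited to one category."""
    root = ET.fromstring(index_xml)
    hits = []
    for page in root.iter("page"):
        text = page.text or ""
        if word.lower() not in text.lower():
            continue  # the word itself does not appear
        if category is not None and page.get("category") != category:
            continue  # right word, wrong context
        hits.append(text)
    return hits
```

A bare search(INDEX, "lace") returns all three pages, wanted or not. Adding the context with search(INDEX, "lace", category="crafts") leaves only the lace-making page, because the machine now has something to go on besides the word itself.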

Latest revisions

Title | Taxonomy terms | Updated date
Basic Webpage | page | Mon 05 Mar 2012

Web Page design

The Basic Web Page

A basic web page is shown below. You'll notice right away that the HTML page contains both readable text and different types of tags enclosed in angle brackets (< >). These HTML tags tell the browser how to display the page.

Add markup tags | markup, tag, HTML | Mon 05 Mar 2012

Adding the Markup tags

Still not clear? The best way to understand markup is a little demonstration. So open your favourite text editor and type the following four lines of text. Then save the file as "ExUnformat.htm". Keep the text editor open.

HTML tutorial
Section heading
Sub heading
Paragraph tag

If you are using Microsoft Notepad, then before saving the file, make sure the file type is set to All Files.

HTML formatting | nest, format, HTML | Thu 16 Feb 2012

HTML Formatting

The example of a basic web page introduces the <H1> and <P> tags and adds some simple formatting commands.

How HTML works | definition, HTML | Fri 10 Feb 2012

How HTML works

HTML stands for Hypertext Markup Language, so what does that mean? A dictionary definition of hypertext is:

A definition of Hypertext
(noun, Computing) a software system that allows extensive cross-referencing between related sections of text and associated graphic material.

Imagine you have looked up an article in an encyclopaedia. At the end of the article it says "see also" with a reference to another section, which you then go on and read. That's the hypertext bit. One definition of Markup is

HTML Getting Started | HTML | Thu 09 Feb 2012

Introduction to HTML

HTML is the language of the World Wide Web (WWW). This short course will enable you to read an HTML document and understand broadly what is going on. By the end of the course you will have the basic knowledge needed to create your own web pages. If it has whetted your appetite for more, there are numerous Web sources and books which cover both HTML and JavaScript. It's not possible to talk about markup languages and the Web without providing a plug for the World Wide Web Consortium (W3C), which manages the web standards and their development.