
SGML, The Forerunner of HTML: Tutorials and More

SGML (Standard Generalized Markup Language) is the forerunner of HTML and XML. Its roots go back to IBM's Generalized Markup Language (GML) of the late 1960s, and it was fully standardized by the ISO in 1986.
The markup looks a lot like XML or HTML: angle brackets are used to define opening and closing tags, which set off the various elements of a document. Like XML, SGML is open-ended: you can define any set of element tags and declare them in a Document Type Definition (DTD). SGML parsers can then validate an SGML document against the DTD.
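To make the relationship between a DTD and a document concrete, here is a minimal sketch in Python. The memo DTD and document are invented for illustration, and the check is deliberately simplified: it only confirms that every tag used in the document is declared in the DTD. It is not a real SGML parser and does not check content models, attributes, or tag omission.

```python
import re

# A tiny DTD and document, invented for this illustration.
# The "- -" after each element name says both start and end tags are required.
DTD = """
<!ELEMENT memo - - (to, from, body)>
<!ELEMENT to   - - (#PCDATA)>
<!ELEMENT from - - (#PCDATA)>
<!ELEMENT body - - (#PCDATA)>
"""

DOCUMENT = """
<memo>
<to>Alice</to>
<from>Bob</from>
<body>The SGML report is ready.</body>
</memo>
"""


def declared_elements(dtd):
    """Collect element names from <!ELEMENT ...> declarations."""
    names = re.findall(r"<!ELEMENT\s+([A-Za-z][\w.-]*)", dtd)
    # Assumption: names are treated as case-insensitive, so fold to lower case.
    return {name.lower() for name in names}


def used_elements(doc):
    """Collect element names from start tags such as <memo>."""
    names = re.findall(r"<([A-Za-z][\w.-]*)", doc)
    return {name.lower() for name in names}


def undeclared(dtd, doc):
    """Return element names used in the document but missing from the DTD."""
    return used_elements(doc) - declared_elements(dtd)


if __name__ == "__main__":
    # An empty set means every tag in the document is declared in the DTD.
    print(undeclared(DTD, DOCUMENT))
```

A real validating parser goes much further. For example, it would also reject a memo whose to, from, and body elements appear in the wrong order, because the content model (to, from, body) fixes the sequence.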
Until HTML5, HTML was considered an application of SGML — there were even SGML DTDs that defined HTML. HTML5 moved away from SGML and is now its own standard.
HTML wasn’t the only important use for SGML. Because it promised interoperability and stability, it was widely adopted by government, military, industry, and large enterprises. It has since been largely replaced by XML, which is similar in format and philosophy, but easier to work with.
SGML Tutorials
- A Brief SGML Tutorial from the W3C is mostly about the relationship of SGML to HTML.
- HTML Unleashed. SGML and the HTML DTD is a multi-part history and tutorial on SGML and how HTML grew out of it.
- A Gentle Introduction to SGML is an extensive tutorial on using SGML.
- SGML and SGML-Tools describes a bit of the history of SGML in relationship to DocBook, including why XML has become the favored format over SGML.
- SGML Document Introducing You to SGML is an explanation of SGML, contained in an SGML document. This gives you a chance to see what the markup actually looks like.
- Software Documentation in SGML or XML explains the benefits of SGML and provides examples of how it can be used in software documentation.
- EAD Application Guidelines for Version 1.0: SGML and XML Concepts is a (now deprecated) document regarding the Encoded Archival Description application from the Library of Congress. While the documentation is from the perspective of EAD, it still provides quite a bit of useful information about SGML itself.
- SGML for Windows NT (PDF) explains how to set up a free editing and publishing system for SGML and XML.
Additional Information and Reference
- This SGML FAQ from 1998 provides some interesting insight into the relationship of SGML to HTML and XML at that time.
- Using SGML as a Basis for Data-Intensive Natural Language Processing is a scholarly article that looks at corpus-processing (analyzing a large body of texts) using SGML for documents rather than storing texts in a database.
- Electronic Thesis and Dissertation Markup Language (ETD-ML) is an application of SGML. The User’s Guide provides background info on SGML, and the project as a whole is a good example of what SGML was used for.
- The Roots of SGML is a personal memoir from a lawyer-turned-programmer, about the development of computer typesetting and document authoring.
- History and Relationships of SGML, HTML and XML explains how these different standards developed together, with a handy visual.
- Comparison of SGML and XML is a note from the W3C, and is perhaps the most thorough treatment available of the differences between these two markup languages.
- Indexing SGML Files Using LT NSL is a scholarly article describing a method for indexing large SGML documents.
- Charles F Goldfarb’s SGML Source Home Page is one of the most complete resources on SGML still in existence, created by the inventor of SGML.
- ISO 8879:1986 is the official standard for SGML. This text is only available from ISO for a fee. If you are interested in reading the text, it is reproduced in its entirety by the Australian Standard for SGML (PDF).
- On SGML and HTML is a W3C publication detailing the relationship between HTML (version 4 at that time) and SGML.
- Guidelines for Writing SGML DTDs is a draft specification for the Text Encoding Initiative.
- MARC SGML Archive from the Library of Congress is an archive of documentation related to the SGML version of their MARC machine-readable bibliographic and citation system. SGML was abandoned in favor of XML in 2001, but the information here is still of historical interest.
- SGML/XML Bibliography has a list of books and articles about SGML, along with some introductory material on the language.
SGML Tools
- OpenJade is an implementation of DSSSL (Document Style Semantics and Specification Language), an ISO standard for formatting SGML documents. OpenJade includes OpenSP, which is the only viable open-source SGML parser available.
Books on SGML
- Practical SGML (2013) by Eric van Herwijnen is one of the most recent books available on SGML, which also makes it one of the most practical in terms of relevant tooling and contemporary examples.
- PARSEME.1st: SGML for Software Developers (1997) by Sean McGrath emphasizes using SGML documents in a software system, rather than primarily for human consumption. The ideas in this book prefigure much of the XML ecosystem that would develop within a few years.
- SGML and HTML Explained (1997) by Martin Bryan is the renamed second edition of SGML: An Author’s Guide. This book provides an accessible explanation of SGML’s features and capabilities, and how HTML implemented SGML.
- Practical Guide to SGML/XML Filters (1998) by Norman E Smith provides in-depth information about the differences between SGML and XML, and how to translate data between formats.
- The SGML Implementation Guide: A Blueprint for SGML Migration (1995; reprinted in 2013) by Travis and Waldt focuses on implementing SGML as an authoring and publication tool in businesses and large enterprises.
- ABCD…SGML: A User’s Guide to Structured Information (1995) by Liora Alschuler is intended as a non-technical introduction to the power and promise of SGML for document management. Of particular historical interest today is the number of case studies describing real organizations that adopted SGML.
- Developing SGML DTDs: From Text to Model to Markup (1995) by Maler and El Andaloussi is a guide to creating DTD specifications for SGML documents.
- The SGML Handbook (1991) by Charles F Goldfarb is the definitive reference book on SGML, by the language’s inventor. This book contains the entire text of the ISO specification, along with detailed annotations by Goldfarb. Not a book for beginners, but essential reading for serious SGML developers and researchers.
Summary
SGML is mostly obsolete today outside of legacy systems. It is still used in government agencies like the Library of Congress and in highly regulated international industries like aeronautics, but even there it is slowly being phased out in favor of XML.
What makes SGML really interesting is not how it is used today, but the revolution that it spawned. SGML gave birth to HTML, which had a deep impact on the evolution of the World Wide Web. It also gave rise to XML, which has made the world’s data compatible and interoperable in ways few people imagined possible.
Further Reading and Resources
We have more guides, tutorials, and infographics related to coding and website development:
- Composing Good HTML: this is a solid introduction to writing well-formed HTML and using HTML validator software.
- CSS3 — Intro, Guides & Resources: this is a great place to start learning webpage layout.
- PostScript Introduction and Resources: a popular language for describing physical page layout.
HTML for Beginners — Ultimate Guide
If you really want to learn HTML, we’ve created a book-length article, HTML for Beginners — Ultimate Guide. And it really is the ultimate guide; it will take you from the very beginning to mastery.