Introduction to semantic HTML

Fix the way you write HTML code

Paweł Dąbrowski

CTO / Software Engineer

How many div elements did you use in your last project? Chances are, quite a lot. Writing non-semantic HTML is a common sin of backend developers who occasionally write frontend code, but they are not the only ones. If you fall into this category or you don’t know what semantic HTML is, you are in the right place.

I still remember my first website. The whole structure was based on tables, and there was no formatting besides the font size and color. It feels like it was yesterday, but over 17 years have passed since that day. The technology has completely changed since then, and so have the rules for writing HTML code.

This article is a complete guide to writing semantic HTML. Semantic HTML not only presents the information correctly but is also easy for search engines to index and for various devices to interpret, including assistive devices used by people with disabilities.

Accessibility object model (AOM)

If you have worked on the frontend side of web development for a while, you most likely know what the DOM is. If not, here you are: DOM stands for Document Object Model, and it’s a tree structure that the browser creates as it parses your website. Thanks to it, you can later make your app more interactive using JavaScript.

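Besides the DOM, the browser also creates the AOM, the semantic version of the DOM. Strictly speaking, the structure the browser builds is called the accessibility tree, and the Accessibility Object Model is a proposed JavaScript API for working with it. This article uses AOM for the tree itself. Assistive technologies like screen readers or magnifiers rely on this tree. Thanks to it, the web is accessible for people with visual, cognitive, hearing, or mobility impairments.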

Basically, if you use too many div or span elements in your website structure, you make it harder for people with disabilities to access the information. But that’s not all. The W3C standards are also meant to help websites render well on various devices for people without disabilities. The goal is to render HTML correctly for everyone on every device.

Inspecting the AOM of the website

If you are curious about the structure of the Accessibility Object Model of a given website, you can access it the same way you can access the DOM tree. For example, using Google Chrome, open the developer tools panel, select the Elements tab, and in the side pane, select the Accessibility tab:

As you can see, non-semantic elements are marked as generic, and other semantic elements are identified with their names. The structure on the screenshot is mixed, but it’s hard to get rid of div elements completely.
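To make the difference concrete, here is a minimal sketch of a page where each comment names the role the element typically gets in the accessibility tree. The page content is made up for illustration, and the exact labels may differ slightly between browsers and their developer tools.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Roles example</title>
</head>
<body>
  <header>                                  <!-- banner (when not nested in article, section, etc.) -->
    <nav>                                   <!-- navigation -->
      <a href="/">Home</a>                  <!-- link -->
    </nav>
  </header>
  <main>                                    <!-- main -->
    <div>                                   <!-- generic -->
      <h1>Page title</h1>                   <!-- heading, level 1 -->
      <button type="button">Save</button>   <!-- button -->
    </div>
  </main>
  <footer>                                  <!-- contentinfo (when not nested in article, section, etc.) -->
    <p>Footer text</p>
  </footer>
</body>
</html>
```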

The accessibility audit

You don’t have to check the AOM tree by yourself. You can perform a quick audit by using the Lighthouse tool. Visit pagespeed.web.dev, type the URL you want to audit, and check the Accessibility section of the report.

Semantic and non-semantic elements

Now that you understand what AOM is and how important a role it plays in the modern internet, it is a good time to find out which HTML elements are considered non-semantic and which are semantic.

Non-semantic elements

I mentioned before that the most widely used non-semantic HTML elements are div for blocks and span for inline content. They are considered non-semantic because:

  • They don’t have any meaning - just by looking at the name of the element, you can’t tell what the element can contain.
  • They can contain anything - you can put literally anything. You can build the whole website layout or navigation by using only div and span elements.

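Besides div and span, the <br> and <hr> elements are often counted as non-semantic as well. Note, however, that HTML5 redefines <hr> as a thematic break between paragraph-level content, so it does carry some meaning.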
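As an illustration, a layout built only from div and span might look like the sketch below. The class names are hypothetical. They are the only hint about what each block is for, and assistive devices do not read class names.

```html
<!-- Non-semantic version: every part of the page is a div or span. -->
<div class="header">
  <div class="nav">
    <span class="nav-item"><a href="/">Home</a></span>
    <span class="nav-item"><a href="/blog">Blog</a></span>
  </div>
</div>
<div class="content">
  <div class="post">
    <div class="post-title">Post title</div>
    <div class="post-body">Post body...</div>
  </div>
</div>
<div class="footer">Footer text</div>
```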

Semantic elements

Now that you know what non-semantic elements are, it’s easy to tell which elements you can consider semantic:

  • They have special meanings - just by looking at the name of the element, you can tell what the element is used for.
  • They should contain specialized content - each semantic element is designed to contain a specific type of content, so it’s easy for devices to parse and easier for developers to understand.

There are way more semantic elements than non-semantic elements. The reason is simple: in modern websites, we can put a lot of different things, and to make it easy for devices to understand the structure, we have to use various elements.

The most popular semantic elements include:

  • section - a separate group of elements. For example, the pricing table, the hero section, or the part of the website where the testimonials are presented.
  • article - independent part of the content. It should be possible to distribute the content of the article element independently from the rest of the website. An example is an article, user comment, or forum post.
  • aside - as the name states, it represents the element that is indirectly related to the main content, which in most cases is just a sidebar on the left or right side of the website.
  • header - introductory content for the page or for a section. It usually contains the navigation of the website, information about the author, or headings of different levels.
  • nav - element used to present the navigation part with the set of links to different parts of the website.
  • footer - an element that usually represents the content at the bottom of the website.

Those are just a few of the most popular semantic elements. Their names are self-explanatory enough, so you should intuitively know what kind of content you should put inside. Other popular semantic elements are time, figure, main, and address.
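The elements listed above could be combined into a page skeleton like the following sketch. The links, headings, date, and e-mail address are placeholders, not taken from a real site.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Example page</title>
</head>
<body>
  <header>
    <nav>
      <ul>
        <li><a href="/">Home</a></li>
        <li><a href="/blog">Blog</a></li>
      </ul>
    </nav>
  </header>
  <main>
    <article>
      <h1>Post title</h1>
      <!-- Placeholder date; datetime gives devices a machine-readable value. -->
      <p>Published on <time datetime="2023-01-15">January 15, 2023</time></p>
      <p>Post body...</p>
    </article>
    <section>
      <h2>Testimonials</h2>
      <p>...</p>
    </section>
    <aside>
      <h2>Related posts</h2>
      <p>...</p>
    </aside>
  </main>
  <footer>
    <address>Contact: <a href="mailto:author@example.com">author@example.com</a></address>
  </footer>
</body>
</html>
```

Compared with a div-only layout, the purpose of each block is visible from the element name alone, without any class attributes.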

We can also categorize semantic elements into text, structure, and uncategorized elements.

Semantic elements for text

There is a high probability that you are already using a lot of semantic elements for text. These include the following popular elements:

  • h1, h2, h3, h4, h5, h6 - for headings of different levels and importance.
  • ul and ol - for unordered and ordered lists that include one or more li elements, each representing a single item of the list.
  • a - for links.
  • p - for paragraphs.
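A short fragment that uses all of the text elements above might look like this. The content is made up for illustration.

```html
<h1>Semantic HTML</h1>
<p>Read the <a href="/guide">guide</a> before you start.</p>

<h2>What you need</h2>
<!-- Unordered list: the order of the items does not matter. -->
<ul>
  <li>A text editor</li>
  <li>A browser</li>
</ul>

<h2>Steps</h2>
<!-- Ordered list: the sequence is part of the meaning. -->
<ol>
  <li>Write the markup</li>
  <li>Inspect the accessibility tree</li>
</ol>
```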

Semantic elements for the structure

The structure describes how the text and other website elements are located on the screen. I already listed a lot of them, for example, article or section.

Uncategorized semantic elements

There are also a few elements that can’t be categorized either as text or structure elements:

  • img - for rendering images.
  • table - for tabular data organized into rows and columns, with one or more header cells.
  • iframe - for embedding another page or document.
  • figure and figcaption - for images and other self-contained content that requires a caption.
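The sketch below shows these elements with the attributes that usually make them more useful to assistive devices: alt on img, caption and scope in the table, and title on iframe. The file name, URL, and texts are placeholders.

```html
<figure>
  <img src="accessibility-tree.png" alt="Accessibility tree shown in the browser developer tools">
  <figcaption>The accessibility tree of a sample page.</figcaption>
</figure>

<table>
  <caption>Elements and their purpose</caption>
  <thead>
    <tr>
      <th scope="col">Element</th>
      <th scope="col">Purpose</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>img</td>
      <td>Image</td>
    </tr>
    <tr>
      <td>iframe</td>
      <td>Embedded page</td>
    </tr>
  </tbody>
</table>

<iframe src="https://example.com/embed" title="Embedded example page"></iframe>
```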

Benefits of writing semantic HTML

I mentioned that when you write semantic HTML, you make the website more accessible for various devices, including assistive devices like screen readers or magnifiers that help people with disabilities. This is not the only benefit:

  • Readability - it’s easier to find a specific section of the website in the source code when semantic elements are used for the structure.
  • Simplicity - you can use fewer class and id attributes as you use fewer generic elements. By default, you can distinguish them by their names.
  • SEO improvements - web crawlers can understand and index your website more easily, which can help its position in search engines like Google, Bing, or DuckDuckGo.
  • Efficiency - the development of the website will be much faster with semantic elements as most websites will share the same skeleton, including header, footer, or main elements.
  • Preparation for the future - writing semantic HTML will help you to ensure that your website will remain compatible with future technologies and web standards.

These are the main benefits that you will get from writing HTML code that is readable and meaningful. Even if you are used to putting a lot of div elements in your structure, it is worth investing time and effort to change this habit.

Best practices with semantic HTML

Of course, the knowledge that semantic HTML elements exist is not enough. There are some good practices that we can follow to write the structure that will be easily readable not only by other developers but also by web crawlers and various devices:

  • If your website consists of distinct parts, let’s say components, it is most likely a good idea to use the section element to define them.
  • If a sentence or word in the paragraph has some special or essential meaning, use the em element. The parsers will understand that you want to emphasize this part of the text.
  • Don’t use heading elements like h1 to make the text bigger. Use styles instead.
  • Don’t use strong or em elements for text that should not be treated as important by the parsers. Apply styles if you just want the text to be bold or italic.
  • Always order heading elements by their importance, and don’t skip levels.
  • Don’t use semantic elements only to style content.
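A few of these practices can be sketched in one fragment. The class names and texts are hypothetical. The point is only that meaning comes from the element and appearance comes from CSS.

```html
<style>
  /* Presentation lives in CSS, not in the choice of element. */
  .lead { font-size: 1.5rem; }
  .brand { font-weight: bold; }
</style>

<!-- Headings follow the document outline without skipping levels. -->
<h1>Guide</h1>
<h2>Chapter</h2>
<h3>Topic</h3>

<!-- Bigger text that is not a heading: a paragraph styled with a class. -->
<p class="lead">Free shipping on all orders</p>

<!-- em carries meaning: the stress changes how the sentence reads. -->
<p>You <em>must</em> save the file before closing the editor.</p>

<!-- Bold for looks only: a styled span instead of strong. -->
<p>Welcome to <span class="brand">Example Shop</span>.</p>
```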

I believe these are the most important practices we should follow when creating the semantic structure for the HTML document. Let me know in the comments section what your good practices are.

The role attribute

Semantic HTML elements have an implicit role, which means that the browser identifies their role in the document just from the element name. With the role attribute, which is valid on all elements, we can also tell assistive devices how to treat non-semantic elements.

If you want the div to behave like a button semantically, you have to define its role:

<div role="button">content</div>

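It won’t change the style or the behavior of the element, but it will inform screen readers that the element is a button. Note that the role alone does not add the element to the document’s tab ordering sequence or make it respond to the keyboard. For that, you also need tabindex="0" and handlers for the Enter and Space keys. Of course, it’s easier just to use the button element, which provides all of this by default, but the approach with the role attribute is an alternative solution.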
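In practice, a div usually needs more than the role attribute to stand in for a button. The sketch below compares the native element with a div that gets a role, a tab stop, and keyboard handling. The ids and the save function are hypothetical and exist only for this example.

```html
<!-- Preferred: the native element is focusable and keyboard-operable by default. -->
<button type="button" id="native-save">Save</button>

<!-- Alternative: a div needs a role, a tab stop, and keyboard handling. -->
<div role="button" tabindex="0" id="custom-save">Save</div>

<script>
  // Hypothetical action shared by both controls.
  function save() {
    console.log("saved");
  }

  const custom = document.getElementById("custom-save");
  document.getElementById("native-save").addEventListener("click", save);
  custom.addEventListener("click", save);

  // A native button activates on Enter and Space; replicate that for the div.
  custom.addEventListener("keydown", (event) => {
    if (event.key === "Enter" || event.key === " ") {
      event.preventDefault(); // stop Space from scrolling the page
      save();
    }
  });
</script>
```

The amount of extra code needed for the div is a good argument for reaching for the button element first.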

If you would like to check all available roles, visit the Mozilla documentation page that explicitly mentions all of the roles.

Accessible rich internet applications

Last but not least, when talking about semantic HTML, I have to mention ARIA - Accessible Rich Internet Applications. It’s a W3C specification designed to improve keyboard accessibility and interactivity, enhance accessibility for interactive controls, and much more.

The ARIA specification has three main components, and one of them is the role - the attribute we discussed in the previous section. The other two components are properties and states.
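As a minimal sketch of how the three components can appear together, consider a collapsible menu. Here aria-label and aria-controls are properties, aria-expanded is a state that the script updates, and the button and nav elements carry implicit roles. The ids and labels are made up for the example.

```html
<nav aria-label="Main">
  <!-- aria-controls (property) points at the menu; aria-expanded (state) tracks whether it is open. -->
  <button type="button" id="menu-toggle" aria-controls="menu" aria-expanded="false">Menu</button>
  <ul id="menu" hidden>
    <li><a href="/">Home</a></li>
    <li><a href="/blog">Blog</a></li>
  </ul>
</nav>

<script>
  const toggle = document.getElementById("menu-toggle");
  const menu = document.getElementById("menu");

  toggle.addEventListener("click", () => {
    // Read the current state, then flip both the state and the visibility.
    const expanded = toggle.getAttribute("aria-expanded") === "true";
    toggle.setAttribute("aria-expanded", String(!expanded));
    menu.hidden = expanded;
  });
</script>
```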

There is much more to learn about ARIA and the way we should build web applications to make them more accessible, and it’s definitely beyond the scope of this article. You can read more about this specification here.