Introduction to semantic HTML
CTO / Software Engineer
How many div elements did you use in your last project? Chances are, quite a lot. Writing non-semantic HTML is a habit typical of backend developers who occasionally write frontend code, though they are not the only ones. If you fall into this category or you don’t know what semantic HTML is, you are in the right place.
I still remember my first website. The whole structure was based on tables, and there was no formatting besides the font size and color. It feels like it was yesterday, but over 17 years have passed since that day. The technology has completely changed since then, and so have the rules for writing HTML code.
This article is a complete guide to writing semantic HTML: code that not only presents information correctly but is also easily searchable by search engines and correctly interpreted by various devices (including those used by people with disabilities).
If you have worked on the frontend side of web development for a while, you most likely know what the DOM is. If not, here is a short explanation: DOM stands for Document Object Model, a tree structure the browser creates as it parses your website. Thanks to it, you can later make your app more interactive using JavaScript.
Besides the DOM, the browser also builds the accessibility tree, a semantic version of the DOM often discussed under the name AOM (Accessibility Object Model). It is used by assistive devices such as screen readers and screen magnifiers. Thanks to it, the web is accessible to people with visual, cognitive, hearing, or mobility impairments.
Basically, if you use too many div or span elements in your website structure, you make it harder for people with disabilities to access the information. But that is not all: semantic markup also helps various devices interpret and render websites correctly for people without disabilities. The goal is to render HTML correctly for everyone on every device.
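As a minimal illustration, here is the same piece of navigation marked up twice; the class names and the /blog URL are hypothetical. The first version relies on generic elements that tell assistive devices nothing about their purpose, while the second uses elements whose names carry the meaning.

```html
<!-- Non-semantic: generic containers and a clickable span.
     The span is not exposed as a link and cannot be reached with the keyboard. -->
<div class="header">
  <div class="nav">
    <span class="link" onclick="location.href='/blog'">Blog</span>
  </div>
</div>

<!-- Semantic: header, nav, and a are exposed with their own roles
     (banner, navigation, link), so a screen reader can announce them. -->
<header>
  <nav>
    <a href="/blog">Blog</a>
  </nav>
</header>
```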
If you are curious about the structure of the accessibility tree (AOM) of a given website, you can inspect it the same way you inspect the DOM tree. For example, in Google Chrome, open the developer tools panel, select the Elements tab, and in the right-hand pane, select the Accessibility tab:
As you can see, non-semantic elements are marked as generic, while semantic elements are identified by their names. The structure in the screenshot is mixed, but it’s hard to get rid of div elements completely.
You don’t have to inspect the AOM tree manually. You can run a quick audit with the Lighthouse tool: visit pagespeed.web.dev, enter the URL you want to audit, and check the Accessibility results.
Now that you understand what the AOM is and how important a role it plays on the modern internet, it is a good time to find out which HTML elements are considered non-semantic and which are semantic.
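I mentioned before that some of the most widely used non-semantic HTML elements are div for block-level content and span for inline content. They are considered non-semantic because their names say nothing about what they contain: neither browsers nor assistive devices can infer the purpose of div and span elements.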
Besides div and span, we can also consider the <br> and <hr> elements non-semantic.
Now that you know what non-semantic elements are, it’s easy to tell which elements you can consider semantic.
There are far more semantic elements than non-semantic ones. The reason is simple: modern websites contain many different kinds of content, and to make the structure easy for devices to understand, we need a variety of elements.
The most popular semantic elements include:
section - a separate group of related elements. For example, the pricing table, the hero section, or the part of the website where testimonials are presented.
article - an independent piece of content. It should be possible to distribute the content of an article element independently from the rest of the website. Examples include a blog post, a user comment, or a forum post.
aside - as the name suggests, content that is only indirectly related to the main content. In most cases, it is a sidebar on the left or right side of the website.
header - introductory content, which usually contains the website's navigation, information about the author, or headings of different levels.
nav - the navigation part of the website, containing a set of links to its different parts.
footer - the content at the bottom of the website (or of a section), such as copyright or contact information.
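Putting these elements together, a simple blog page skeleton could look like the sketch below. The headings, link targets, and author name are placeholders, and this is one reasonable arrangement rather than the only correct one; it also uses main, which is mentioned next.

```html
<body>
  <header>
    <h1>My Blog</h1>
    <nav>
      <ul>
        <li><a href="/">Home</a></li>
        <li><a href="/about">About</a></li>
      </ul>
    </nav>
  </header>

  <main>
    <!-- A thematic group of content with its own heading -->
    <section>
      <h2>Latest posts</h2>
      <!-- Each post makes sense on its own, so it is an article -->
      <article>
        <h3>Introduction to semantic HTML</h3>
        <p>How many div elements did you use in your last project?</p>
      </article>
    </section>
  </main>

  <!-- Indirectly related content, typically rendered as a sidebar -->
  <aside>
    <h2>Popular tags</h2>
  </aside>

  <footer>
    <p>Written by Jane Doe</p>
  </footer>
</body>
```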
These are just a few of the most popular semantic elements. Their names are self-explanatory, so you should intuitively know what kind of content to put inside them. Other popular semantic elements are time, figure, main, and address.
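For instance, time can carry a machine-readable date in its datetime attribute, and address marks up contact information for the author of the nearest article. In this sketch, the date and the e-mail address are made up:

```html
<article>
  <h2>Introduction to semantic HTML</h2>
  <!-- Readers see the human-friendly date; parsers can read the datetime value -->
  <p>Published on <time datetime="2023-01-15">January 15, 2023</time></p>
  <p>How many div elements did you use in your last project?</p>
  <!-- Contact information for the author of this article -->
  <address>
    Contact the author at <a href="mailto:author@example.com">author@example.com</a>
  </address>
</article>
```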
We can also categorize semantic elements into text, structure, and uncategorized elements.
There is a high probability that you are already using a lot of semantic elements for text. These include the following popular elements:
h1, h2, h3, h4, h5, h6 - for headings of different importance and levels.
ul and ol - for unordered and ordered lists, which contain one or more li elements, each representing a single list item.
a - for links.
p - for paragraphs.
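A short sketch using these text elements; the content itself is just a placeholder:

```html
<h2>Semantic HTML basics</h2>
<p>Start with the elements you already know:</p>

<!-- Order doesn't matter: unordered list -->
<ul>
  <li>Headings</li>
  <li>Paragraphs</li>
</ul>

<!-- Order matters: ordered list -->
<ol>
  <li>Write the content.</li>
  <li>Pick the element that matches its meaning.</li>
</ol>

<p>Learn more on <a href="https://developer.mozilla.org/">MDN</a>.</p>
```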
Structure elements describe how text and other website elements are organized on the page. I already listed many of them, for example, article or section.
There are also a few elements that can’t be categorized either as text or structure elements:
img - for rendering images.
table - for tabular data, organized in rows and columns with optional header cells.
iframe - for embedding another document, such as a video player or a map.
figure and figcaption - for self-contained content, such as images, that needs a caption.
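The sketch below shows figure with figcaption, a table with header cells and a caption, and an iframe. The file name, prices, and embed URL are hypothetical; the alt and title attributes give assistive devices a text description of the image and of the embedded frame.

```html
<figure>
  <img src="accessibility-tab.png"
       alt="Chrome DevTools Accessibility tab showing the accessibility tree">
  <figcaption>The accessibility tree in Chrome DevTools</figcaption>
</figure>

<table>
  <caption>Pricing plans</caption>
  <thead>
    <!-- th with scope="col" marks the header cell of each column -->
    <tr><th scope="col">Plan</th><th scope="col">Price per month</th></tr>
  </thead>
  <tbody>
    <tr><td>Basic</td><td>$10</td></tr>
    <tr><td>Pro</td><td>$25</td></tr>
  </tbody>
</table>

<!-- The title attribute names the embedded document for screen readers -->
<iframe src="https://example.com/embed/map" title="Office location map"></iframe>
```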
I mentioned that when you write semantic HTML, you make the website more accessible for various devices, including assistive devices like screen readers or magnifiers that help people with disabilities. This is not the only benefit:
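Your code is easier to read: you need fewer class and id attributes because you use fewer generic elements. By default, you can distinguish elements by their names.
Search engines and other parsers can understand your page structure more easily thanks to header, footer, or main elements.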
These are the main benefits of writing HTML code that is readable and meaningful. Even if you got used to putting a lot of div elements in your structure, it is worth investing time and effort to change this habit.
Of course, the knowledge that semantic HTML elements exist is not enough. There are some good practices that we can follow to write the structure that will be easily readable not only by other developers but also by web crawlers and various devices:
Divide your content into logical parts and use the section element to define them.
When you want to emphasize part of the text, use the em element. Parsers will understand that you want to emphasize it.
Don’t use heading elements such as h1 just to make text bigger. Use styles instead.
Don’t use strong or em elements for text that parsers should not treat as important. Apply styles if you just want text to be bold or italic.
I believe these are the most important practices to follow when creating the semantic structure of an HTML document. Let me know in the comments section what your good practices are.
Semantic HTML elements have an implicit role, which means the browser identifies their role in the document just by looking at their name. It is worth mentioning that with the role attribute, a global attribute valid on all elements, we can tell devices how they should interpret non-semantic elements.
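For example, nav has an implicit navigation role, so the two snippets below expose the same landmark to assistive devices. The native element is the simpler choice; the explicit role is shown only for comparison.

```html
<!-- Implicit role: nav is exposed as a navigation landmark -->
<nav>
  <a href="/">Home</a>
</nav>

<!-- Explicit role on a generic element: also a navigation landmark -->
<div role="navigation">
  <a href="/">Home</a>
</div>
```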
If you want a div to behave like a button semantically, you have to define its role:
<div role="button">content</div>
It won’t change the style of the element, but it will tell screen readers that the element is a button. Note that the role alone does not add the element to the document’s tab order or make it respond to the keyboard: you also need tabindex="0" and JavaScript that handles the Enter and Space keys. That is why it’s usually easier to just use the button element; the approach with the role attribute is an alternative solution.
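The sketch below compares three variants of the same control; the label Save is arbitrary. Only the native button gets focus and keyboard support without extra work, which is why it is usually the better choice.

```html
<!-- role only: announced as a button, but not focusable and not keyboard-operable -->
<div role="button">Save</div>

<!-- tabindex="0" adds it to the tab order; a script must still handle
     click events and the Enter and Space keys to activate it -->
<div role="button" tabindex="0">Save</div>

<!-- Native element: role, focus, and keyboard activation are built in -->
<button type="button">Save</button>
```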
If you would like to check all available roles, visit the Mozilla (MDN) documentation page that lists all of them.
Last but not least, when talking about semantic HTML, I have to mention ARIA (Accessible Rich Internet Applications). It’s a W3C specification designed to improve keyboard accessibility and interactivity, enhance the accessibility of interactive controls, and much more.
The ARIA specification has three main components. One of them is role, the attribute we discussed in the previous paragraph. The other two are properties and states.
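As a minimal sketch of how the three components work together, consider a menu toggle; the main-menu id is hypothetical. The button element already implies the button role, aria-controls is a property pointing at the element the button controls, and aria-expanded is a state that a script would be expected to switch between "true" and "false" as the menu opens and closes.

```html
<!-- Role: implied by the native button element.
     Property: aria-controls references the id of the controlled menu.
     State: aria-expanded reports whether the menu is currently open. -->
<button type="button" aria-controls="main-menu" aria-expanded="false">Menu</button>

<!-- Hidden until a script shows it and sets aria-expanded="true" -->
<ul id="main-menu" hidden>
  <li><a href="/">Home</a></li>
  <li><a href="/about">About</a></li>
</ul>
```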
There is much more to learn about ARIA and how to build more accessible web applications, but that is beyond the scope of this article. You can read more in the W3C WAI-ARIA specification.