Sitemap.xml or sitemap - a guide for beginners
When getting acquainted with a project, one of the first things an SEO specialist should do is add /sitemap.xml to the site address in the browser's address bar. Analyzing the sitemap helps you find out why certain content is not being indexed.
- What is an XML Map.
- What are the types of XML Sitemap.
- How to find XML Sitemap.
- What elements does an XML Sitemap consist of.
- How to create an XML Sitemap.
- General information and Google recommendations regarding XML-Sitemap files.
- Bing general information and recommendations regarding XML-Sitemaps.
- How to build an XML-map for multilingual sites.
- XML sitemap for images.
- XML sitemap for video.
- Sitemap for news.
- How to embed a sitemap.
- Errors in XML Sitemap.
- Life hack.
What is an XML Map
An XML sitemap is a file for search engine bots that lists the pages of the site in XML format. It helps search engines crawl and index the content of the site more efficiently.
💡 Do not confuse XML-Sitemap and html-sitemap for site users.
What are the types of XML Sitemap
There are two types of sitemaps:
- regular - contains no more than 50,000 URLs and is no larger than 50 MB;
- index - a file that combines several regular sitemaps. Designed for large or multilingual sites. An index file has the same limits: a maximum size of 50 MB and no more than 50,000 sitemap URLs.
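As a minimal sketch of the index type described above, the following Python snippet builds an index file with the standard library. The element names (sitemapindex, sitemap, loc) and the namespace come from the sitemaps.org protocol; the function name and the child sitemap URLs are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_ENTRIES = 50_000  # limit quoted above for a single file


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index document that lists the given child sitemaps."""
    if len(sitemap_urls) > MAX_ENTRIES:
        raise ValueError("an index file may list at most 50,000 sitemaps")
    ET.register_namespace("", SITEMAP_NS)  # serialize tags without a prefix
    root = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for url in sitemap_urls:
        entry = ET.SubElement(root, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical child sitemap URLs
index_xml = build_sitemap_index([
    "https://site.com/sitemap-categories.xml",
    "https://site.com/sitemap-en.xml",
])
print(index_xml)
```

The returned string should be saved as UTF-8 when it is written to disk.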
How to find an XML Sitemap
There are several ways to find a sitemap:
1. In the robots.txt file. Enter https://site.com/robots.txt in the address bar. The file will usually contain a sitemap directive in the following format: Sitemap: https://site.com/sitemap.xml
2. If there is no link to the file in robots.txt, enter the following address in the address bar: https://site.com/sitemap.xml
💡 The robots.txt file must be located at exactly /robots.txt, but the URL of the sitemap file can be anything.
/sitemap.xml is simply the most common name for an XML sitemap; it can be different, for example: /sitemap-categories.xml, /sitemap-en.xml, and so on.
3. You can also query a search engine using search operators. You need two operators:
- site: - limits the results to the specified domain;
- filetype: - looks for the required file type.
To search for an XML file, form a search query such as: site:site.com filetype:xml
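The robots.txt check from the first step can be automated. The sketch below is one possible approach: it takes the text of a robots.txt file and returns every address declared in a sitemap directive. The function name and the sample file content are hypothetical.

```python
from typing import List


def sitemap_urls_from_robots(robots_txt: str) -> List[str]:
    """Return every URL declared by a 'Sitemap:' line in robots.txt text."""
    urls = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition(":")
        # The directive name is matched case-insensitively.
        if sep and key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


# Hypothetical robots.txt content for site.com
sample = """\
User-agent: *
Disallow: /admin/
Sitemap: https://site.com/sitemap.xml
sitemap: https://site.com/sitemap-en.xml
"""

found = sitemap_urls_from_robots(sample)
# If nothing is declared, fall back to the most common location.
print(found or ["https://site.com/sitemap.xml"])
```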
What elements does an XML Sitemap consist of?
As we already know, a sitemap can be a regular file or an index file. Below we will consider what elements each of these types consists of.
Elements of a regular sitemap
- the first line specifies the XML version and the required encoding for sitemap files, UTF-8: <?xml version="1.0" encoding="UTF-8"?>
- <urlset> - a tag that indicates the standard of the current protocol. Is the parent of the tags below;
- <url> - a tag for each URL entry. Is the parent of the tags below and the child of <urlset>;
- <loc> - a tag that points to the exact URL of the page. Is a child of <url>;
- <lastmod> - a tag that indicates the date the page was last updated. Is a child of <url>. Unlike the previous tags, it is optional. Note that Google only takes the value of this tag into account if it matches the actual time the page was last updated. When writing a date in this tag, use the W3C Datetime format. In its full form, this format gives the date with hours, minutes, seconds and time zone (YYYY-MM-DDThh:mm:ssTZD). For example: 2022-05-16T19:20:30+03:00;
- <changefreq> - an optional tag that indicates the approximate frequency of page updates. Valid values: always, hourly, daily, weekly, monthly, yearly, never;
- <priority> - an optional tag that indicates the priority of the page compared to other pages of the site. The value is specified in the range from 0.0 to 1.0.
💡 According to Google Search Central, the search engine ignores the values of the <priority> and <changefreq> tags.
XML sitemap example:
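A minimal sketch of how a file with the elements listed above could be generated in Python. The tag names (urlset, url, loc, lastmod) follow the sitemaps.org protocol; the function name and the page URLs are illustrative assumptions. The optional frequency and priority tags are left out, since the note above says Google does not use them.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """pages: iterable of (url, last_modified) where last_modified is an
    aware datetime or None."""
    ET.register_namespace("", SITEMAP_NS)  # serialize tags without a prefix
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url, last_modified in pages:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
        if last_modified is not None:
            # isoformat() on an aware datetime gives YYYY-MM-DDThh:mm:ss+hh:mm
            lastmod = last_modified.replace(microsecond=0).isoformat()
            ET.SubElement(entry, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages; the date matches the W3C Datetime example above
tz = timezone(timedelta(hours=3))
pages = [
    ("https://site.com/", datetime(2022, 5, 16, 19, 20, 30, tzinfo=tz)),
    ("https://site.com/catalog/", None),
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```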
How to create an XML Sitemap
There are several ways to create a sitemap:
- using a content management system (CMS). Systems such as WordPress or Wix can generate a search-engine-friendly sitemap. Check how the sitemap is generated in the CMS you are using: whether the process is automatic or requires some manual steps;
- manually. If the site is small, you can create a sitemap yourself in a text editor, following the syntax standards;
- using third-party generators. There are many services that can generate sitemaps. Among them:
- the XML sitemap generator from https://smallseotools.com/ (the free version is limited to 500 pages);
There are many such generators, so you can pick whichever is convenient for you.
- using Netpeak Spider. In this case, follow these steps:
- Crawl as many URLs as you need.
- Open the Sitemap Generator tool.
General Information and Google Guidelines for XML Sitemaps
- Google will crawl the URLs you provide. Therefore, specify correct and accurate URLs.
- All URLs listed in the sitemap must belong to the same domain as the sitemap itself. Do not list URLs from another domain or subdomain.
- A sitemap can be placed anywhere on the site, but it only covers URLs in its own directory and below. Therefore, you should place the XML sitemap in the root directory of the site.
- A link to a regular sitemap or to an index file can be specified in the robots.txt file as follows: Sitemap: https://site.com/sitemap.xml.
- Sitemap files must be UTF-8 encoded, and the URLs in them may contain only ASCII characters.
- If page addresses contain other characters, they must be escaped: non-ASCII characters are percent-encoded, and characters that are special in XML (such as & or <) are replaced with entities. This usually happens automatically unless you create page addresses manually. If the characters in a URL are not correctly encoded and escaped, Google may report, when you add the sitemap, that no pages were found in your XML sitemap.
- Google does not guarantee to crawl every URL in a sitemap. This file only helps the system determine which pages you consider important.
- Google ignores the order of URLs in the sitemap.
- An XML sitemap file must contain no more than 50,000 URLs and be no larger than 50 MB. If either limit is exceeded, create an index sitemap that references several sitemap files.
- Include in the XML sitemap only canonical pages that are open for crawling and indexing and return a 200 response code. Exclude pagination pages.
- All URLs in the XML sitemap must be allowed for crawling in robots.txt and must not contain the "noindex" meta tag.
- The sitemap should be updated automatically whenever pages are added or deleted, or opened or closed for indexing.
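The escaping requirement from the list above can be illustrated with a short Python sketch. It assumes the host name is already ASCII and only deals with the path and query string; the function name and the sample URL are hypothetical.

```python
from urllib.parse import quote, urlsplit, urlunsplit
from xml.sax.saxutils import escape


def sitemap_safe_url(url):
    """Percent-encode non-ASCII characters, then entity-escape XML specials."""
    parts = urlsplit(url)
    # "%" stays in the safe set so already-encoded sequences are not re-encoded.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    ascii_url = urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))
    # escape() handles &, < and >; the two quote characters are added explicitly.
    return escape(ascii_url, {"'": "&apos;", '"': "&quot;"})


print(sitemap_safe_url("https://site.com/ümlat.php?q=name&lang=de"))
# prints: https://site.com/%C3%BCmlat.php?q=name&amp;lang=de
```

The entity-escaping step is only needed when the XML is assembled by hand as text. An XML library such as xml.etree.ElementTree escapes element text on its own, so escaping twice would corrupt the URL.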
Bing General Information and Best Practices for XML Sitemaps
How to build an XML map for multilingual sites
There are three main ways to tell the search engine that the language versions of a page are not duplicates:
- the rel="alternate" hreflang="x" attributes in the page code - the most common way;
- an XML sitemap;
- HTTP headers.
In most cases, one way of indicating that a site is multilingual is enough: the rel="alternate" hreflang="x" attributes.
💡 If you are building a sitemap for a large site, you can additionally declare the language versions in the XML sitemap.
To specify alternative language versions of a page in an XML sitemap, you must:
- specify a namespace in the <urlset> block: xmlns:xhtml="http://www.w3.org/1999/xhtml"
- within the <url> tag, below the <loc> tag that contains the URL of the page, add an <xhtml:link> tag for each language version of the page, with the rel="alternate" and hreflang="x" attributes and an href attribute that points to that specific language version.
For example, a page has three language versions: Russian, Ukrainian, and English. The URLs for the language versions of this page look like this:
In XML-Sitemap multilingual versions of the page will look like this:
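A minimal sketch of generating such a sitemap in Python, assuming the three language versions live under /ru/, /uk/ and /en/. These URLs and the function name are hypothetical; the xhtml namespace and the rel, hreflang and href attributes follow Google's documented format. Each language version gets its own entry that lists all versions, itself included.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_hreflang_sitemap(page_groups):
    """page_groups: list of dicts mapping an hreflang code to the URL of
    that language version of one page."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("xhtml", XHTML_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for versions in page_groups:
        for url in versions.values():
            entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
            ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
            # List every language version, including the current one.
            for lang, alt_url in versions.items():
                ET.SubElement(
                    entry,
                    f"{{{XHTML_NS}}}link",
                    {"rel": "alternate", "hreflang": lang, "href": alt_url},
                )
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical URLs for the Russian, Ukrainian and English versions of one page
hreflang_xml = build_hreflang_sitemap([
    {
        "ru": "https://site.com/ru/page/",
        "uk": "https://site.com/uk/page/",
        "en": "https://site.com/en/page/",
    }
])
print(hreflang_xml)
```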
XML sitemap for images
There are two ways to provide information about images:
- Specify links to them in a regular XML sitemap.
- Create a separate sitemap for images.
In both cases, you must specify the XML namespace that defines the image tags: xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
The following image tags are also required within the <url> tag:
- <image:image> - contains all information about one image. Up to 1,000 images can be specified for one page.
- <image:loc> - the location of the file. In some cases, the image URL may be on a different domain from the site's main domain. For the content to be crawled properly in such cases, both domains must be verified in Google Search Console.
An XML sitemap for images may also contain optional tags that, according to Google Search Central, the search engine does not take into account, namely:
- <image:caption> - image caption;
- <image:geo_location> - shooting location (country, city, and so on);
- <image:title> - image title;
- <image:license> - URL of the image license.
In addition to these tags, the sitemap for images must meet the following requirements:
- the encoding used is UTF-8;
- the XML sitemap for images should contain no more than 50,000 URLs and be no larger than 50 MB. If the sitemap exceeds these limits, you must create a sitemap index file;
- this type of sitemap should contain only canonical pages that are open for indexing and crawling and return a 200 response code;
- each URL has no more than 1,000 images;
- the XML sitemap for images should contain only full-size images, without thumbnails;
- a link to the XML sitemap for images or to the index file must be placed in robots.txt;
- the XML sitemap for images should be updated automatically and regularly.
An example XML map for images that has one page and two images:
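A minimal Python sketch that produces such a file, one page with two images. The image namespace and the image:image and image:loc tag names follow Google's image sitemap format; the function name and all URLs are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"
MAX_IMAGES_PER_PAGE = 1000  # per-page limit from the requirements above


def build_image_sitemap(pages):
    """pages: dict mapping a page URL to the list of image URLs on that page."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        if len(image_urls) > MAX_IMAGES_PER_PAGE:
            raise ValueError(f"more than 1,000 images listed for {page_url}")
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            image = ET.SubElement(entry, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical page with two full-size images
image_xml = build_image_sitemap({
    "https://site.com/product/": [
        "https://site.com/images/product-front.jpg",
        "https://site.com/images/product-back.jpg",
    ]
})
print(image_xml)
```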
XML sitemap for video
A video sitemap is a way to let the search engine know if there are videos on the page, especially if they're new or hard to find. This is an important aspect of search engine optimization, especially if you want your videos to show up in search results.
General information and recommendations from Google regarding XML sitemaps for videos:
- The encoding used is UTF-8.
- Each video sitemap file can contain up to 50,000 video elements and must not exceed 50 MB in size. If you exceed these limits, you can, as with the main sitemaps, create an index file that references several regular video sitemaps.
- You can create a separate XML sitemap for the video, or you can embed video information into a regular sitemap.
- It is allowed to specify several videos from one page.
- Do not enter information about videos that are not related to the main content of the page. Otherwise, the video may not get into the search engine index.
- Googlebot ignores the Sitemap entry if no video is found at the specified URL.
- Creating an XML Sitemap for a video does not guarantee file indexing.
- The specified pages must be canonical, open for indexing and crawling, return a 200 response code.
- Googlebot must have access to both the video file and the player. They should not be placed on pages that require authorization, are disallowed in robots.txt, or are blocked in other ways.
- Place a link to the XML sitemap or index file in robots.txt.
- The XML Sitemap for the video should be automatically updated regularly.
Let's consider what elements the XML-Sitemap for video consists of.
First, you need to specify the namespace in which the video tags are defined: xmlns:video="http://www.google.com/schemas/sitemap-video/1.1"
Also, when creating a sitemap of this type, you must specify the following required tags:
- <urlset> - a tag that specifies the standard of the current protocol. Is the parent of the tags below;
- <url> - a tag for each URL entry. Is the parent of the tags below and the child of <urlset>;
- <loc> - a tag that points to the exact URL of the page. Is a child of <url>.
You can also specify the optional tags described in Google's video sitemap documentation, such as <video:duration>.
What a video sitemap might look like:
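A minimal Python sketch that generates a video sitemap. It assumes the per-video tags that Google documents as required (video:video with video:thumbnail_loc, video:title, video:description and video:content_loc); the function name, URLs and texts are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
VIDEO_NS = "http://www.google.com/schemas/sitemap-video/1.1"
VIDEO_TAGS = ("thumbnail_loc", "title", "description", "content_loc")


def build_video_sitemap(pages):
    """pages: dict mapping a page URL to a list of video dicts, each with
    the keys named in VIDEO_TAGS."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("video", VIDEO_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, videos in pages.items():
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = page_url
        for video in videos:  # several videos per page are allowed
            node = ET.SubElement(entry, f"{{{VIDEO_NS}}}video")
            for tag in VIDEO_TAGS:
                ET.SubElement(node, f"{{{VIDEO_NS}}}{tag}").text = video[tag]
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical page with a single video
video_xml = build_video_sitemap({
    "https://site.com/videos/how-to/": [
        {
            "thumbnail_loc": "https://site.com/thumbs/how-to.jpg",
            "title": "How to create a sitemap",
            "description": "A short walkthrough of building an XML sitemap.",
            "content_loc": "https://site.com/media/how-to.mp4",
        }
    ]
})
print(video_xml)
```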
Sitemap for news
For news sites, you can create a separate map with dynamic generation and daily updates. These files will only work for resources included in the Google News listings. If the site is not in the list, you can send a request to add it.
The sitemap file should only contain the URLs of articles published in the last two days. Articles older than two days can be removed from the file and remain in the Google News index for 30 days.
This sitemap can contain no more than 1000 URLs. This limitation is due to the fact that XML sitemaps for Google News are crawled more frequently than regular sitemaps, and thus the search engine avoids excessive load. If more content appears on the site in two days, you can create a sitemap index file for several maps.
Google recommends updating the Google News XML Sitemap as new content is posted. Such a sitemap must be placed either in the root directory or in the news section of the site.
Main elements of sitemap for news:
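- namespace for news sitemaps: xmlns:news="http://www.google.com/schemas/sitemap-news/0.9"
- <news:news> - parent tag for all news tags;
- <news:publication> - the publication that published the article. Contains two required child elements:
- <news:name> - name of the publication;
- <news:language> - language code in ISO 639-1 format;
- <news:publication_date> - exact publication date in W3C format;
- <news:title> - the title of the article, which should be given exactly as it appears on the site.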
Example sitemap for Google News:
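A minimal Python sketch of dynamic generation for a news sitemap. It keeps only articles from the last two days and refuses to write more than 1,000 URLs into one file, following the rules above. The tag names follow Google's news sitemap format; the function name, publication name and article data are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"
MAX_NEWS_URLS = 1000


def build_news_sitemap(articles, publication, language, now=None):
    """articles: list of dicts with 'url', 'title' and an aware 'published'
    datetime. Only articles from the last two days are included."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=2)
    recent = [a for a in articles if a["published"] >= cutoff]
    if len(recent) > MAX_NEWS_URLS:
        raise ValueError("more than 1,000 recent articles: use several sitemaps")
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("news", NEWS_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for article in recent:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = article["url"]
        news = ET.SubElement(entry, f"{{{NEWS_NS}}}news")
        pub = ET.SubElement(news, f"{{{NEWS_NS}}}publication")
        ET.SubElement(pub, f"{{{NEWS_NS}}}name").text = publication
        ET.SubElement(pub, f"{{{NEWS_NS}}}language").text = language
        published = article["published"].replace(microsecond=0).isoformat()
        ET.SubElement(news, f"{{{NEWS_NS}}}publication_date").text = published
        ET.SubElement(news, f"{{{NEWS_NS}}}title").text = article["title"]
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical articles; "now" is fixed so the example is reproducible
now = datetime(2022, 5, 16, 12, 0, tzinfo=timezone.utc)
articles = [
    {"url": "https://site.com/news/fresh/", "title": "Fresh story",
     "published": datetime(2022, 5, 16, 9, 0, tzinfo=timezone.utc)},
    {"url": "https://site.com/news/old/", "title": "Old story",
     "published": datetime(2022, 5, 10, 9, 0, tzinfo=timezone.utc)},
]
news_xml = build_news_sitemap(articles, "Example Times", "en", now=now)
print(news_xml)
```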
How to embed a sitemap
There are several ways to point the search engine to the XML sitemap:
- using Google Search Console;
- sending a ping request - a GET request to https://www.google.com/ping?sitemap=FULL_URL_OF_SITEMAP, where FULL_URL_OF_SITEMAP is the full URL of the XML sitemap. Note that Google retired this ping endpoint in 2023, so this method no longer works for Google;
- placing the sitemap address in robots.txt - it will be detected during the next crawl of the site. Example: Sitemap: https://site.com/sitemap.xml
The XML sitemap is only parsed the first time it is encountered, not every time the site is crawled. If you have made changes to the file, let the search engine know, for example by resubmitting the sitemap in Google Search Console.
Errors in XML Sitemap
By following the instructions above, you can avoid common mistakes when creating a sitemap. If an error did occur while creating the file, you can see it in Google Search Console under the "Sitemaps" item.
You can also check a sitemap with a validator tool: insert a link to the relevant sitemap and click the "Start" button.
After scanning, the validator lists the errors in the sitemap. Clicking the "To Table" button moves the page URLs from the validator to the program's main workspace, where you can continue working with them.
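Some of the requirements listed in this guide can also be checked with a short script. The sketch below is a rough illustration, not a replacement for Search Console or a full validator: it checks well-formedness, the 50,000-URL and 50 MB limits, and that every address is absolute and on the expected host. The function name and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024


def check_sitemap(xml_bytes, expected_host):
    """Return a list of human-readable problems found in a regular sitemap."""
    problems = []
    if len(xml_bytes) > MAX_BYTES:
        problems.append("file is larger than 50 MB")
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        return problems + [f"not well-formed XML: {exc}"]
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        problems.append("root element is not <urlset> in the sitemap namespace")
    locs = root.findall(f"{{{SITEMAP_NS}}}url/{{{SITEMAP_NS}}}loc")
    if len(locs) > MAX_URLS:
        problems.append("more than 50,000 URLs")
    for loc in locs:
        url = (loc.text or "").strip()
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append(f"not an absolute URL: {url!r}")
        elif parts.netloc != expected_host:
            problems.append(f"URL on another host: {url}")
    return problems


# Hypothetical sitemap with two deliberate mistakes
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://site.com/</loc></url>
  <url><loc>https://blog.site.com/post</loc></url>
  <url><loc>/relative/path</loc></url>
</urlset>"""

for problem in check_sitemap(sample, "site.com"):
    print(problem)
```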
Life hack
Some experts argue that large sitemaps are not always crawled in full and that internal pages are not always indexed quickly. In some cases, lowering the sitemap capacity limit to 10,000 or even 1,000 pages has given better results.
We can conclude that if your site has problems with URL crawling and indexing or, for example, you need to get new product pages into the index quickly, you can try splitting your sitemap into smaller parts and adding them to the index sitemap.
Smaller lists of URLs are thought to be easier for a search engine to process. At the same time, sitemaps should not be split too finely, into tens of thousands of files: Google Search Console only shows information about 1,000 sitemap URLs in its reports, so you may not get data from GSC for all of your XML sitemaps.
Calculate the size of each sitemap based on the size of the site. Drawing on these cases, you can experiment with splitting sitemap files by section, by number of URLs, and by how new the content is.
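Splitting by number of URLs can be sketched in a few lines of Python. The 10,000-URL chunk size is the figure from the cases mentioned above; the function name, the file name pattern and the product URLs are illustrative assumptions.

```python
def split_into_sitemaps(urls, chunk_size=10_000, name_pattern="sitemap-{n}.xml"):
    """Split a URL list into chunks and give each chunk a sitemap file name."""
    if not 1 <= chunk_size <= 50_000:
        raise ValueError("chunk_size must be between 1 and 50,000")
    chunks = [urls[i:i + chunk_size] for i in range(0, len(urls), chunk_size)]
    return {name_pattern.format(n=n): chunk for n, chunk in enumerate(chunks, start=1)}


# Hypothetical site with 25,000 product pages
urls = [f"https://site.com/product/{i}" for i in range(25_000)]
files = split_into_sitemaps(urls)
for name, chunk in files.items():
    print(name, len(chunk))
# prints:
# sitemap-1.xml 10000
# sitemap-2.xml 10000
# sitemap-3.xml 5000
```

Each resulting file name would then be listed in the index sitemap, and different chunk sizes can be compared against the crawl and indexing data you see for the site.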
An XML sitemap is needed by search robots to discover and index the necessary pages of a site. It contains the URLs of the pages on the site, as well as additional data related to them, such as when they were last updated. It is very important to comply with the requirements for files of this type so that the search engine scans and indexes the necessary pages of the site in time.
Separate sitemaps can be created for images and videos. An XML sitemap can also be marked up for Google News.
Creating a map manually is only worth it if your site is small, otherwise it can take a very long time.
Use CMS tools, generators, and other software to create sitemaps, and periodically check your XML for correctness.
The sitemap should be updated automatically and regularly so that, as soon as possible after a change, the search engine bot indexes the current versions of pages and stops crawling pages whose directives and access rules have changed.
Source: Netpeak.net