XML sitemaps are files that list the most important (or all) URLs of your site and are created with search engines in mind. Creating an XML sitemap and submitting it to search engines is an SEO best practice that can help a site get indexed.
Most WordPress users leave XML sitemap generation and submission to plugins, such as Yoast SEO and All in One SEO. Every now and then, though, it becomes obvious that you cannot trust such automated solutions.
Large sites and sites with long category and tag lists in particular tend not to be handled properly by plugins, which results in unoptimized XML sitemaps and plenty of rubbish getting indexed and shown to potential site visitors.
This post demonstrates why you, as the owner of a site with an advanced structure and lots of content, cannot depend on plugins to create an optimized XML sitemap. It also guides you step by step through the process of manually creating your own optimized XML sitemap.
Unreliable Plugins: XML Sitemap with Errors
I have read a lot of posts on generating XML sitemaps for WordPress sites, and the vast majority of them recommend plugins or online sitemap generator services. Most of the time, plugins do indeed do the trick and deliver a nice dynamic sitemap that evolves along with your site.
Sometimes, however, issues occur for various reasons: plugin incompatibility, server misconfiguration, or settings from another installation affecting the current one (in the case of subdomains and subdirectories). In such cases, using a plugin to generate an XML sitemap can result in various errors, as the following example shows.
A site I am working on uses All in One SEO to generate its sitemap. All in One SEO is one of the two most popular and most reliable SEO plugins out there, so I assumed it would be fine to use it to automate sitemap creation.
I set it up to include only vital content: pages, posts, courses, products, and media attachments (see the screenshot below). Courses and products are theme-specific post types.
I instructed the plugin to ignore all other post types and taxonomies, as the screenshot shows. In addition, I reviewed all of the site’s pages and set a number of them (demo pages and pages such as the cart) to not be indexed and to be excluded from the sitemap.
However, when checking out the generated sitemap, I noticed that many of the included URLs weren’t supposed to be listed at all.
Keep in mind that these are only a part of the problematic URLs in the sitemap.
In addition, it turned out that the sitemap returns warnings upon submission to Google Search Console.
Lastly, when I checked in Google Search Console which of the site’s pages appear in search, I was alarmed by the number of resources, tags, categories, and even widgets that outperform the site’s actual pages.
While it is true that search engines only partially depend on sitemaps and only sometimes follow instructions to not index resources, the quality of the sitemap is still an important factor for the proper indexing of a site.
Manually Creating XML Sitemap
XML sitemaps can be created manually in a simple plain text editor, such as Notepad for Windows, and should be saved with an .xml extension. They follow the Sitemap protocol, which consists of XML tags, and must be UTF-8 encoded.
Each sitemap starts with an XML declaration stating the XML version and the encoding used (UTF-8), followed by an opening <urlset> tag, in which the protocol standard is declared as a namespace. The sitemap always ends with the closing </urlset> tag.
For each URL that you want to include, you must use two compulsory tags: <url> as the parent tag and, inside it, <loc> containing the address of the page.
Optional tags include <lastmod>, <changefreq>, and <priority>.
XML Sitemap Tags Explained
- URLSET: Encapsulates the file and contains information about the current protocol standard. This tag is required for all sitemap files.
- URL: URL is a parent tag, used for each URL that is added to the sitemap. All other tags are subordinate to it. This tag is required for all URL-entries.
- LOC: This tag encapsulates the actual URL of the targeted page. The URL must start with the protocol (HTTP or HTTPS) and end with a trailing slash if your web server requires one. It must be shorter than 2,048 characters. This tag is also required.
- LASTMOD: This tag provides the date on which the page was last modified. The format should be YYYY-MM-DD. It can also contain a time stamp, if necessary. This tag is optional.
- CHANGEFREQ: This tag provides search engines with general information about how often the page is likely to change. You can use the following values: always, hourly, daily, weekly, monthly, yearly, and never. This tag is optional and doesn’t guarantee that search engines will crawl the page as often as set in the sitemap.
- PRIORITY: This tag assigns a priority to a page relative to other pages on the same site, using a value between 0.0 and 1.0. You can use it to tip off search engines about which pages you consider most important, but it is only a hint and does not guarantee that those pages are crawled or indexed first. Assigning the same priority to all pages makes this tag useless. If you don’t assign a priority, the default value of 0.5 is applied. The tag is optional.
NB! Please note that every element starts with an opening tag and ends with a closing tag. Omitting either of them for any entry will result in errors.
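The rules in the tag list above are easy to get wrong when typing entries by hand, so it can help to check each entry before adding it to the file. The following is only a minimal Python sketch of such a check, not part of the sitemap protocol or of any plugin. The function name check_entry and its parameters are my own, and the lastmod check looks only at the leading YYYY-MM-DD part of the value.

```python
from datetime import datetime

# Values the protocol allows for <changefreq>.
ALLOWED_CHANGEFREQ = {
    "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
}


def check_entry(loc, lastmod=None, changefreq=None, priority=None):
    """Return a list of problems found in one sitemap entry (empty if none).

    Hypothetical helper: it mirrors the rules listed above, nothing more.
    """
    problems = []
    if not loc.startswith(("http://", "https://")):
        problems.append("loc must start with http:// or https://")
    if len(loc) >= 2048:
        problems.append("loc must be shorter than 2048 characters")
    if lastmod is not None:
        try:
            # Only the date part is checked; a trailing time stamp is ignored.
            datetime.strptime(lastmod[:10], "%Y-%m-%d")
        except ValueError:
            problems.append("lastmod must start with a YYYY-MM-DD date")
    if changefreq is not None and changefreq not in ALLOWED_CHANGEFREQ:
        problems.append("changefreq has an unsupported value")
    if priority is not None and not 0.0 <= float(priority) <= 1.0:
        problems.append("priority must be between 0.0 and 1.0")
    return problems


# A well-formed entry and a deliberately broken one (placeholder URLs).
print(check_entry("http://mysite.com/", "2024-01-15", "weekly", 0.8))
print(check_entry("mysite.com/page", "15-01-2024", "sometimes", 1.5))
```

An empty list means the entry passed all checks. The second call reports one problem per field.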
The example below shows how a manually created XML sitemap should be formatted and provides a few examples of URLs, using different combinations of optional tags. You can, however, choose to use the same combination of tags for each URL.
Keep in mind that a sitemap’s location determines which files it can include. If your sitemap is located at the root of your site, that is, http://mysite.com/sitemap.xml, it can include any files with a URL starting with http://mysite.com. If, however, your sitemap is located in a subdirectory, that is, http://mysite.com/shop/sitemap.xml, it can only contain files residing in the shop subdirectory or in further subdirectories. It cannot contain files residing in another subdirectory or its further subdirectories, for example, http://mysite.com/products or http://mysite.com/products/images.
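As an illustration of that format, here is a small Python sketch that prints a sitemap with three entries, each using a different combination of optional tags. It relies only on the standard library. The URLs, dates, and the function name build_sitemap are placeholders of mine, and the namespace is the one defined by the Sitemap protocol 0.9. You can just as well type the printed XML directly into a text editor.

```python
import xml.etree.ElementTree as ET

# Namespace that declares the Sitemap protocol standard on <urlset>.
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build a sitemap from a list of dicts and return it as UTF-8 bytes.

    Hypothetical helper: each dict needs "loc" and may carry "lastmod",
    "changefreq", and "priority".
    """
    urlset = ET.Element("urlset", xmlns=NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    if hasattr(ET, "indent"):  # pretty-printing needs Python 3.9 or later
        ET.indent(urlset)
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


# Placeholder entries with different combinations of optional tags.
entries = [
    {"loc": "http://mysite.com/", "lastmod": "2024-01-15",
     "changefreq": "weekly", "priority": "1.0"},
    {"loc": "http://mysite.com/blog/", "lastmod": "2024-01-10"},
    {"loc": "http://mysite.com/contact/"},
]

print(build_sitemap(entries).decode("utf-8"))
```

Building the file through an XML library rather than string concatenation has one practical benefit: opening and closing tags always match, and special characters in URLs, such as &, are escaped automatically.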
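The location rule can be expressed as a short check. The sketch below is my own reading of the rule as described above: the function name may_list is hypothetical, and it additionally assumes that the page and the sitemap must share the same protocol and host.

```python
from urllib.parse import urlsplit


def may_list(sitemap_url, page_url):
    """Return True if page_url may appear in the sitemap at sitemap_url."""
    sitemap, page = urlsplit(sitemap_url), urlsplit(page_url)
    # Assumption: protocol and host have to match exactly.
    if (sitemap.scheme, sitemap.netloc) != (page.scheme, page.netloc):
        return False
    # Directory that contains the sitemap file, with a trailing slash.
    base_dir = sitemap.path.rsplit("/", 1)[0] + "/"
    # Allow the directory itself (with or without slash) and anything below it.
    return page.path.startswith(base_dir) or page.path + "/" == base_dir


root_map = "http://mysite.com/sitemap.xml"
shop_map = "http://mysite.com/shop/sitemap.xml"

print(may_list(root_map, "http://mysite.com/products/images"))
print(may_list(shop_map, "http://mysite.com/shop/item-1/"))
print(may_list(shop_map, "http://mysite.com/products"))
```

With the placeholder URLs from the paragraph above, the first two calls return True and the last one returns False.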
Using Sitemap Indexes
XML sitemaps can easily get too large (over 50,000 entries or over 50 MB). In such cases, it is recommended to use multiple sitemap files, organized in a sitemap index. Each of the sitemaps in the index, as well as the index itself, must not contain more than 50,000 entries or be larger than 50 MB (when uncompressed). You can have more than one sitemap index.
Sitemap index files start with the same encoding declaration as the sitemaps and use the SITEMAPINDEX opening and closing tags.
Besides the SITEMAPINDEX tag, sitemap indexes support the obligatory SITEMAP parent tag and LOC child tag, as well as the optional LASTMOD tag. Sitemap indexes’ tags are very similar to sitemaps’ tags. The only differences are that the URL tag is replaced by the SITEMAP tag and that fewer optional tags are supported.
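To make the splitting concrete, the following Python sketch divides a long URL list into chunks of at most 50,000 entries and builds a matching sitemap index. The file names sitemap-1.xml, sitemap-2.xml, and so on, the generated URLs, and both function names are assumptions for illustration only. The sketch checks the entry count only, so the 50 MB size limit still has to be verified separately.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_ENTRIES = 50000  # entry limit per sitemap and per sitemap index


def split_urls(urls, size=MAX_ENTRIES):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_index(sitemap_urls, lastmod=None):
    """Build a sitemap index that points to the given sitemap files."""
    index = ET.Element("sitemapindex", xmlns=NS)
    for loc in sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
        if lastmod:  # optional tag, written only when provided
            ET.SubElement(sitemap, "lastmod").text = lastmod
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


# 120,000 placeholder URLs end up in three sitemap files.
urls = ["http://mysite.com/post-%d/" % n for n in range(120000)]
chunks = split_urls(urls)
names = ["http://mysite.com/sitemap-%d.xml" % (i + 1) for i in range(len(chunks))]

print([len(chunk) for chunk in chunks])
print(build_index(names, lastmod="2024-01-15").decode("utf-8"))
```

Each chunk would then be written to its own sitemap file under the name listed in the index.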
Upload of XML Sitemap
After manually creating your XML sitemap, or sitemaps and indexes, you have to upload them. Remember the rule mentioned above and upload the sitemaps to the corresponding directories. For example, a sitemap for the subdirectory http://mysite.com/blog should be located in that subdirectory. Sitemaps of the main site must be uploaded to the root of your domain, which on many hosts is the public_html directory on the server.
It is recommended to try accessing the sitemap at the intended URL upon upload to make sure that the upload is successful.
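Besides opening the URL in a browser, you can run a quick scripted check. The sketch below is one possible way to do it in Python with the standard library only. The function names fetch and count_entries are mine. The check confirms only that the response is well-formed XML with a sitemap or sitemap index root element; it is not a full validation against the protocol.

```python
import urllib.request
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def fetch(url, timeout=10):
    """Download a URL and return the raw bytes (raises on HTTP errors)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def count_entries(body):
    """Return the number of <url> or <sitemap> entries in sitemap bytes.

    Raises xml.etree.ElementTree.ParseError if the XML is not well-formed
    and ValueError if the root element is not a sitemap or sitemap index.
    """
    root = ET.fromstring(body)
    if root.tag not in (NS + "urlset", NS + "sitemapindex"):
        raise ValueError("unexpected root element: " + root.tag)
    return len(root.findall(NS + "url")) + len(root.findall(NS + "sitemap"))
```

After the upload, calling count_entries(fetch("http://mysite.com/sitemap.xml")) with your own sitemap URL should return the number of entries you expect to see in the file.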
XML Sitemap Declarations in robots.txt
Even though this step isn’t compulsory, it is a good idea to use the XML sitemap declarations in your robots.txt file to make search engines aware of your sitemaps. You can list one or multiple sitemaps, as well as one or more sitemap indexes.
This is done by adding the following lines (excluding comments and modified to fit your configuration and number of sitemaps/sitemap indexes) as the last entry in the robots.txt file:
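Each declaration is a single line of the form "Sitemap:" followed by the absolute URL of the sitemap or sitemap index. If you prefer to script the change, the following Python sketch appends such lines to the existing content of a robots.txt file and skips URLs that are already declared. The function name add_sitemap_lines, the sample rules, and the sitemap URLs are placeholders for illustration.

```python
def add_sitemap_lines(robots_txt, sitemap_urls):
    """Return robots.txt content with one Sitemap line per URL appended."""
    lines = robots_txt.rstrip("\n").split("\n") if robots_txt.strip() else []
    # URLs that already have a Sitemap declaration in the file.
    declared = {
        line.split(":", 1)[1].strip()
        for line in lines
        if line.lower().startswith("sitemap:")
    }
    for url in sitemap_urls:
        if url not in declared:
            lines.append("Sitemap: " + url)
    return "\n".join(lines) + "\n"


# Placeholder robots.txt content and sitemap URLs.
robots = "User-agent: *\nDisallow: /wp-admin/\n"
updated = add_sitemap_lines(robots, [
    "http://mysite.com/sitemap.xml",
    "http://mysite.com/sitemap_index.xml",
])
print(updated)
```

Running the function a second time on its own output leaves the file unchanged, so it is safe to use whenever you add a new sitemap.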
You can submit a manually created XML sitemap to search engines via the webmaster tools that most of the major search engines make available. To illustrate the process, a submission to Google is described step by step here:
- Log into your Google Search Console account
- Navigate to Crawl->Sitemaps
- Click on the Add/Test Sitemap button at the top right corner.
- Enter the exact URL pointing to the sitemap or sitemap index and click on Test. Clicking the Test button does not submit the sitemap; it only tests it for errors. You get a report in which any errors are listed under Error details.
- As long as no errors are found, repeat step 3 and click on Submit at the end.
- The sitemap should now be listed in the sitemap list under Crawl->Sitemaps as pending review. This means that it has successfully been submitted.
Updating the XML Sitemap
When using a manually created XML sitemap, it is important to remember to update it as soon as changes occur on your site. In practice, this means that you have to update your sitemap on every publishing event.
The update process includes the following steps:
- Download your current XML sitemap from the server
- Open it with a plain text editor and edit the file as necessary by adding new entries or deleting expired ones
- Save the file without changing the name or the extension
- Upload it to the server and overwrite the existing file
- Re-test and re-submit the XML sitemap to search engines