Six ways to check the indexing of a site or page
For a site to appear in search results, all of its pages that are open for indexing must become part of the search engine's index. If the system fails to add site pages to the database, they remain virtually invisible to potential customers.
Below, I cover six ways to check indexing, look at why search engines fail to index a site, and suggest ways to fix the most common problems.
- How does site indexing work?
- Checking in Google Search Console
- How to check the indexing of a specific page in Search Console?
- Analysis of site indexing in Google search
- Checking indexation online using services
- SEO Tools for Checking Websites
- Checking indexing with a bookmarklet
- Google Indexing API
- Why isn't Google indexing the site?
- How to speed up site indexing?
- How to manage your crawl budget?
How does site indexing work?
At first glance, it might seem that crawling and indexing are two very similar processes. They are indeed closely related, but they are different stages of processing the site by search engines.
Crawling is the process of finding pages for further processing and indexing.
Indexing is the process of adding a scanned page to the database (index) of a search engine. Thus, with a relevant user query, the page will appear in the search results:
Important. Crawling a site does not mean indexing it. And getting into the search engine's index does not guarantee top positions in the search results, since ranking determines the order in which pages are displayed for a given query.
How to check a site for indexing
Indexed means visible. It is very important that all pages meant to be shown to potential customers, users, or readers are indexed. Your product or service may be the best on the market and your article may cover the topic in full, but if your site's pages are not in the search engine's index, no one will know about it.
That is why it is important to be able to check indexing.
1. Checking in Google Search Console
This is one of the basic ways to check indexing for the site owner and webmaster.
Sign in to your account at Search Console and go to the "Pages" tab in the "Indexing" section:
In this report you will find general information about indexed and non-indexed pages displayed on the timeline:
By going to the full report, you will see detailed data on all pages checked for indexing:
To view a report on non-indexed pages and indexing errors, return to the Page Indexing section:
By clicking an entry in the "Reason" column, you can see the lists of site pages that are not available for indexing:
How to check the indexing of a specific page in Search Console?
Paste the link to the page you are interested in into the Search Console search bar:
As a result of the check, you will see one of the following messages:
- The page has been indexed successfully;
- The page is not in the index.
If you have made all the necessary changes, but the page is still not in the index, submit a rescan request yourself:
- Enter the page address in the URL Inspection tool.
- Click the Request Indexing button.
The same method can be used to speed up the indexing of new pages of the site - but no more than 10 per day.
To submit several pages at once, it is recommended to use a sitemap.
2. Analysis of site indexing in Google search
Search operators (such as "site:") help refine your search results.
To find out if your site is indexed by Google, type "site:[url of your site]" in the search bar:
To check the indexing of a particular page, use the operator in the format "site:[url of your site's page]":
If the site or page is indexed successfully, it will appear in the search results. Information about the approximate number of pages on your site that were indexed by the search engine will appear:
In addition, search Tools let you limit the results to pages indexed within a given period of time:
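The operator queries above are easy to script. Below is a minimal sketch (the helper name and example.com URLs are my own, not from the original) that builds the corresponding Google search URLs with the "site:" operator:

```python
from urllib.parse import quote_plus

def build_site_query_url(url: str) -> str:
    """Build a Google search URL using the "site:" operator (hypothetical helper)."""
    return "https://www.google.com/search?q=" + quote_plus("site:" + url)

# Check a whole site vs. a single page:
print(build_site_query_url("example.com"))
print(build_site_query_url("example.com/blog/post-1"))
```

Opening either URL in a browser shows whether the site or page appears in the index.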
3. Checking indexation online using services
There are many free online services for monitoring site indexing, such as linkody.com or smallseotools.com. Their disadvantage is the limits: from 5 to 50 pages per check.
However, if your site is connected to Google Search Console, with the help of the new tool from Serpstat, you can check the indexing of up to 100 pages in one request.
1. Go to the "Tools" section and select "Page Indexing":
2. Click the "Connect Google Search Console" button and sign in with your Google account:
3. After authorization, in the upper left part of the page you will see a list of all sites that you have access to using Google Search Console:
4. Enter up to 100 URLs to check the indexing of the site and click the Check Pages button. The results of this operation will be displayed on the right side of the service.
4. SEO Tools to Check Websites
Dedicated programs that scan and analyze the site's main parameters to identify errors can also check indexing.
When scanning a site in Netpeak Spider, you will receive a list of pages and parameter values that affect site indexing: robots.txt, canonical, meta robots, X-Robots-Tag, presence of redirects, etc.:
By uploading the resulting list to Netpeak Checker, you can check if the page or site is in the search results:
5. Checking indexing with a bookmarklet
To use the bookmarklet, open the Index Check file and drag the link to the bookmarks bar:
Then find the page or site you are interested in and simply click the bookmark. A new Google tab will open with a "site:" query for that URL and its indexing results.
6. Google Indexing API
Google Indexing API is a free indexer from Google that allows you to:
- find out when Google last received information about the requested URL;
- send a request to re-index the page;
- submit a request to remove the page from the index.
An important advantage of the Google Indexing API is the ability to batch indexing requests into a single HTTP request (up to 100 per batch). You can send no more than 200 requests per day, but if your pages use JobPosting or BroadcastEvent markup, you can apply for a quota increase.
Read more about quotas and connecting the indexer in Google Help.
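As a sketch of what a publish call looks like, here is a minimal Python example that builds the JSON body for the documented urlNotifications:publish endpoint. The helper name is my own, and obtaining an OAuth access token for the indexing scope is omitted:

```python
import json

# The two notification types documented for the Indexing API.
URL_UPDATED = "URL_UPDATED"   # ask Google to (re)crawl the URL
URL_DELETED = "URL_DELETED"   # ask Google to drop the URL from the index

def build_notification(url: str, notification_type: str = URL_UPDATED) -> str:
    """Build the JSON body for the urlNotifications:publish endpoint."""
    if notification_type not in (URL_UPDATED, URL_DELETED):
        raise ValueError("unknown notification type: " + notification_type)
    return json.dumps({"url": url, "type": notification_type})

body = build_notification("https://example.com/new-page/")

# Sending the request (sketch; ACCESS_TOKEN must come from a service
# account authorized for the https://www.googleapis.com/auth/indexing scope):
#
#   import urllib.request
#   req = urllib.request.Request(
#       "https://indexing.googleapis.com/v3/urlNotifications:publish",
#       data=body.encode(),
#       headers={"Content-Type": "application/json",
#                "Authorization": "Bearer ACCESS_TOKEN"})
#   urllib.request.urlopen(req)
```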
Why isn't Google indexing the site?
New site. Depending on the number of pages, the full indexing of a new site can take Googlebot from one week to 2-4 months.
Forced closing of pages from indexing.
The noindex directive prevents search engines from indexing pages. To avoid problems caused by its incorrect use, check the following directives:
robots meta tag
Placed in the HTML code and acts at the page level. An example of a robots meta tag that disables indexing:
<meta name="robots" content="noindex, nofollow">
X-Robots-Tag
It is an HTTP response header, usually set in the server configuration. An example of an X-Robots-Tag that disallows indexing:
X-Robots-Tag: noindex, nofollow
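Both directives are easy to check programmatically. The sketch below (function and class names are my own) scans a page's HTML for a robots meta tag and inspects an X-Robots-Tag header value for noindex:

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the content of <meta name="robots" ...> tags."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() == "robots":
            self.directives.append(a.get("content", "").lower())

def is_noindex(html: str, x_robots_tag: str = "") -> bool:
    """True if either directive blocks the page from indexing."""
    parser = RobotsMetaParser()
    parser.feed(html)
    all_directives = parser.directives + [x_robots_tag.lower()]
    return any("noindex" in d for d in all_directives)

page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(is_noindex(page))                        # blocked by the meta tag
print(is_noindex("<html></html>", "noindex"))  # blocked by the header
```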
Using robots.txt it is impossible to control indexing directly, but an incorrect configuration of this file can block the entire site from being crawled by search robots.
The directive in robots.txt that completely blocks the site from crawling:
User-agent: *
Disallow: /
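You can test robots.txt rules offline with Python's standard urllib.robotparser module; the rules and URLs below are illustrative:

```python
from urllib.robotparser import RobotFileParser

# parse() takes a list of lines, so rules can be checked
# without fetching robots.txt from a live site.
rules = """
User-agent: *
Disallow: /admin/
Allow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("Googlebot", "https://example.com/products/"))
print(parser.can_fetch("Googlebot", "https://example.com/admin/settings"))

# The blanket block from the article: everything is disallowed.
blocked = RobotFileParser()
blocked.parse(["User-agent: *", "Disallow: /"])
print(blocked.can_fetch("Googlebot", "https://example.com/"))
```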
Incorrect use of the canonical tag. The canonical tag indicates the preferred version of a page to the search robot and is used to combat duplicate content, which can itself be one of the reasons for poor site indexing. When crawling, Googlebot treats the URL specified in the canonical tag as the main version among pages with similar content.
In addition to settings that directly limit or prohibit site crawling, the following factors that negatively affect indexing speed should be noted:
- Missing sitemap.xml file. It helps search robots understand the site's structure and crawl pages faster.
- Absent or insufficient internal linking. Correct and uniform internal linking distributes link weight across the site's pages, and adding links from pages that are already in the index to new pages helps search engines find them faster.
- Slow site speed. Googlebot does not stay on the site permanently; it only crawls its pages periodically. Slow loading complicates indexing and the work of the search robot's algorithms.
- Content quality. The latest updates to Google's search engine involve improving the quality of content and creating authoritative materials - primarily for people, and not for manipulating search algorithms. When evaluating the quality of your site's content, Google takes into account spelling errors, accuracy of information, originality, and other factors. You can learn more about creating useful content in Google Help.
- Lack of mobile optimization. For new sites created after July 1, 2019, Google uses mobile-first indexing, which gives preference to the mobile version of the site. The reason: as early as 2016, more than half of Google searches were made from mobile devices.
- Ignoring errors in Google Search Console. Examine the error report in Google Search Console, see which of the errors found affect the indexing of the site, and fix them. Also check whether your site has been subject to manual sanctions: its domain name may have previously been restricted for violating Google's policies.
How to speed up site indexing?
Googlebot's ability to crawl a site is limited by the crawl budget - the limit for checking pages per visit. You can find out the approximate crawl budget of your site in the Search Console (section "Indexing") - view the report on the number of pages processed per month by the search engine:
The crawl budget is set by the search engine's algorithm and depends on the site's size, speed, and needs. If the crawler encounters many errors or junk pages, the budget is used up faster.
Factors negatively affecting the crawl budget:
- page duplicates;
- non-unique content;
- broken links;
- too many redirects;
- slow website speed.
How to manage your crawl budget?
In addition to direct crawl-budget management with the Google Indexing API, you can reduce its consumption through technical optimization and by improving content quality.
Sitemap file.
Check the content of the XML sitemap: make sure that all the pages listed in it are open for indexing and return server response code 200, and set up automatic addition of new pages (open for indexing) to the sitemap.xml file.
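A quick way to list the pages declared in a sitemap is to parse its <loc> elements. The sketch below (helper name and URLs are illustrative) extracts them so each can then be checked for a 200 response:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(xml_text: str) -> list:
    """Extract all <loc> values from a sitemap.xml document."""
    root = ET.fromstring(xml_text)
    return [loc.text for loc in root.iter(SITEMAP_NS + "loc")]

sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/blog/</loc></url>
</urlset>"""

for url in sitemap_urls(sitemap):
    print(url)
    # Each URL should return HTTP 200 and be open for indexing,
    # e.g. urllib.request.urlopen(url).status == 200
```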
The date the page was modified.
Set the Last-Modified and If-Modified-Since HTTP headers to point Googlebot to pages whose content hasn't changed since it was last visited (they don't require re-crawling).
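The logic behind these headers can be sketched as follows: the server compares the page's last modification date with the date in the crawler's If-Modified-Since header and answers 304 Not Modified when nothing has changed (function name and dates are illustrative):

```python
from email.utils import parsedate_to_datetime

def needs_recrawl(last_modified: str, if_modified_since: str) -> bool:
    """Compare HTTP date headers: True if content changed since the last visit."""
    return parsedate_to_datetime(last_modified) > parsedate_to_datetime(if_modified_since)

# Server-side sketch: respond 304 Not Modified when nothing changed,
# so the crawler does not spend budget re-downloading the page.
page_last_modified = "Wed, 01 May 2024 10:00:00 GMT"
crawler_header = "Thu, 02 May 2024 09:00:00 GMT"   # If-Modified-Since

status = 200 if needs_recrawl(page_last_modified, crawler_header) else 304
print(status)
```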
Website loading speed optimization.
The server response time should not exceed 200 ms, and the page load speed should not exceed 3-5 seconds. Check website speed with PageSpeed Insights, optimize pages based on recommendations.
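Speed checks can also be automated via the PageSpeed Insights API (v5). The sketch below (helper name is my own) builds a request URL; for regular use you would add an API key parameter:

```python
from urllib.parse import urlencode

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"

def psi_request_url(page_url: str, strategy: str = "mobile") -> str:
    """Build a PageSpeed Insights API v5 request URL."""
    return PSI_ENDPOINT + "?" + urlencode({"url": page_url, "strategy": strategy})

print(psi_request_url("https://example.com/"))

# Fetching the JSON report (sketch):
#   import json, urllib.request
#   report = json.load(urllib.request.urlopen(psi_request_url("https://example.com/")))
#   print(report["lighthouseResult"]["categories"]["performance"]["score"])
```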
Redirects and broken pages.
Do not allow a large number of redirects and broken pages on the site. Otherwise, instead of crawling new content, Googlebot will spend its crawl budget following broken links and redirect chains.
Internal linking.
Add links to new content on the main page of the site, display previews of recent articles, and show new products. Link old and new materials to each other, and post links to new pages on social networks - this will help the search robot find and crawl them faster.
Site update frequency.
Search engines prefer relevant and high-quality information. Regularly update the content of your site, add new content at least once a week, update and supplement the information on old pages.
Six ways to independently check the indexing of a site or page:
- In Google Search Console.
- With the help of search operators.
- Online services.
- SEO tools such as Netpeak Spider or Netpeak Checker.
- Using bookmarklets.
- Google Indexing API.
To improve site indexing:
- Set up and optimize the sitemap.xml file.
- Check your robots.txt file settings and use of the noindex tag.
- Keep track of the number of redirects and the presence of broken pages.
- Work on the site's internal linking.
- Improve page loading speed.
- Create quality content and work on updating and improving it regularly.
- Optimize your crawl budget.
- Don't forget about the mobile version of the site.
- Regularly review error reports in Google Search Console.
Source: Netpeak.net