What is Index Coverage? Explaining its Relationship with SEO

2024.05.29

What exactly is index coverage? It is a feature of Google Search Console that lets site managers check the indexing status of their content, which makes it an essential tool to understand. It is particularly useful when pages don’t appear in search results or when it’s unclear whether content has been indexed.

Index coverage also helps you identify why a page has or has not been indexed. It displays several status types, and each one requires its own handling to improve the situation.

In this article, I’ll explain index coverage in detail, incorporating its relevance to SEO.

What is Index Coverage?

Index coverage is a function within Google Search Console that helps examine the indexing status of site content.

For Google to show your content in search results, the content must be indexed. Unindexed content cannot appear in search results, which costs the site traffic. Index coverage lets you verify whether your content has been indexed.

*You can also use Google’s “site:” search operator (for example, site:example.com) to check whether content is indexed, but for a thorough check of unindexed pages, Google Search Console is recommended. The result count of a “site:” search is only approximate, so rely on Google Search Console for accurate index numbers.

The Relationship Between Index Coverage and SEO

Using Google Search Console, you can quickly determine whether your content is indexed. This matters because Google must crawl and index content before it can evaluate it.


If your content isn’t indexed, it won’t appear in Google’s search results and won’t be evaluated for SEO, regardless of its quality. Therefore, use index coverage to monitor page status and address errors promptly.

Meaning of Statuses Displayed in Index Coverage

The statuses displayed in index coverage include the following seven.

1. Error
2. Valid
3. Valid (with warnings)
4. Excluded
5. Alternate page (with appropriate canonical tag)
6. Duplicate page
7. Crawl error

Each situation requires a different approach. It’s important to understand the meaning of each situation and apply the appropriate solution.


Error

This status appears when a page could not be indexed for reasons such as server errors or being blocked by robots.txt. The common causes and their fixes are discussed later in this article.

Valid

This status means that the page is indexed without any issues. The page is eligible to appear in search results and can attract search traffic.

Valid (with warnings)

“Valid (with warnings)” means that the page is indexed, but something about it needs attention. For example, a page that is indexed even though robots.txt blocks it may appear in search results with the description “No information is available for this page,” which can hurt how the page performs in search.

Excluded

“Excluded” refers to pages not indexed for reasons other than errors, such as the following.

- Alternate page
- Duplicate page
- Crawl error
- Crawled – not indexed
- Discovered – not indexed

Alternate pages (with appropriate canonical tags)

These pages are recognized as alternates of a preferred page because of their canonical tags. For instance, if a desktop site and a mobile site carry the same content, a canonical tag consolidates evaluation under one URL and tells crawlers which URL you prefer. No action is required for this status.
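To confirm which URL a page declares as its canonical, you can read its `<link rel="canonical">` tag. The following is a minimal sketch using only Python’s standard `html.parser`; the helper name `find_canonical` and the URL `https://example.com/page` are hypothetical placeholders. You can compare its result with the user-declared and Google-selected canonicals shown by the URL Inspection tool.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us
        if tag != "link" or self.canonical is not None:
            return
        attr_map = {name: (value or "") for name, value in attrs}
        # rel may contain several space-separated tokens
        rel_tokens = attr_map.get("rel", "").lower().split()
        if "canonical" in rel_tokens and attr_map.get("href"):
            self.canonical = attr_map["href"]


def find_canonical(html: str) -> Optional[str]:
    """Return the declared canonical URL, or None if the page declares none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical
```

For example, `find_canonical('<link rel="canonical" href="https://example.com/page">')` returns `"https://example.com/page"`, and a page without the tag returns `None`.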

Duplicate pages (without appropriate canonical tags)

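Google has identified these pages as duplicates, but no canonical tag tells it which version you prefer. Duplicate content often receives lower evaluations, so it should be improved: add a canonical tag pointing to the preferred URL, or rewrite or restructure the content so that it no longer overlaps with other pages on your site or on competing sites.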
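As a rough first pass at finding duplicates on your own site, you can group URLs whose text is identical once case and whitespace are normalized. This is only a sketch under stated assumptions: it catches exact duplicates only, the page texts and URLs are hypothetical, and Google’s own duplicate detection is not public and also catches near-duplicates.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text: str) -> str:
    """Hash the text after removing case and whitespace differences."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_exact_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose text is identical after normalization.

    `pages` maps URL -> extracted main text (a hypothetical input you
    would build from your own crawl or CMS export).
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    # Only groups with more than one URL are duplicates
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]
```

Each group returned is a candidate for a single canonical URL or for rewriting. For instance, `/a` and `/a?ref=mail` with the same text would be reported together.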

Crawl Errors

These occur when requesting a URL returns an error. The URL Inspection tool in Google Search Console can help identify the cause. Common causes include mistyped URLs in links and error status codes returned by the server; fix the links, and respond according to the specific status code.
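Before turning to Search Console, you can check which status code your server actually returns for a URL. The sketch below uses only Python’s standard library; `classify` is a hypothetical helper whose labels simply mirror the error sections of this article. Note that `urllib` follows redirects automatically, so you normally see the final status, and some servers reject `HEAD` requests, in which case switch the method to `GET`.

```python
import urllib.error
import urllib.request


def classify(status: int) -> str:
    """Map an HTTP status code to a rough category (hypothetical labels)."""
    if 200 <= status < 300:
        return "ok"
    if 300 <= status < 400:
        return "redirect"
    if status == 403:
        return "403 forbidden"
    if status in (404, 410):
        return "not found"
    if 500 <= status < 600:
        return "server error (5XX)"
    return "other"


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the status code of a HEAD request; redirects are followed."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4XX and 5XX responses arrive as HTTPError; the code is still useful
        return err.code
    # DNS or connection failures raise URLError and are left to the caller
```

Usage: `classify(fetch_status("https://example.com/old-page"))` returns a label such as `"not found"` or `"server error (5XX)"`.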

Crawled – Not Indexed

The page has been crawled but not added to the index. Resubmitting the URL is not necessary, because resubmission does not guarantee indexing. If important content remains unindexed, it may need to be improved before it can appear in search results.

Reference: Index Coverage Report (Search Console Help)

Pages in this state are often duplicate content, RSS feeds, or later pages of paginated content. If the cause is duplicate content, the content needs to be reviewed. If the pages are RSS feeds or later pages of paginated content, no improvement is required. However, if pages you want indexed remain unindexed, they need to be improved so that they can appear in search results.

Discovered – Not Indexed

This status means Google has found the page but has not crawled it yet, so it cannot be indexed. Typically, this happens when Google expects that crawling would overload the server and postpones the crawl.

Reference: Index Coverage Report (Search Console Help)

To address this, encourage more frequent crawler visits by updating the site regularly, and make sure the server is not overloaded. If server load is the cause, Google may crawl and index the page later, when the load is lower.

Addressing Index Coverage Errors

The errors displayed in the Index Coverage report mainly fall into the following seven categories.

How to address Server Error 5XX
How to address Soft 404 Errors
How to address ‘Page Not Found (404)’
How to address issues with robots.txt
How to address noindex tags
How to address Redirect Errors
How to address 403 Errors

Because each error has a different meaning and remedy, it is worth understanding them one by one.

Dealing with Server Errors (5XX)

Errors in the 500 series indicate a problem on the server side. Causes vary and include server overload and insufficient memory. Address this by reviewing the server’s configuration and capacity; once the server responds normally, resubmitting the sitemap can prompt Google to recrawl the affected URLs.

Handling Soft 404 Errors

A soft 404 error occurs when the server returns status code 200 (meaning the page exists) for a page that does not actually exist. Non-existent pages should return a 404 (Not Found) status instead. Many soft 404 errors can delay the crawling of the content you actually want indexed.

To address this, make non-existent content return status code 404. If the error appears on a page you do want indexed, the content may be too thin, so add substantial content to the page.
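The fix for soft 404s is to send a real 404 status with your “not found” page. As a minimal sketch, assuming a tiny Python server built on the standard `http.server` module, the routing decision can be isolated in a function; `ARTICLES` is a hypothetical stand-in for your CMS or database, and most production sites would configure this in their framework or web server instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit

# Hypothetical content store standing in for a CMS or database
ARTICLES = {
    "/articles/index-coverage": "<h1>What is Index Coverage?</h1>",
}


def resolve(path: str) -> tuple[int, str]:
    """Return (status code, HTML body) for a requested path."""
    body = ARTICLES.get(path)
    if body is None:
        # Send a real 404 instead of a 200 "not found" page (a soft 404)
        return 404, "<h1>Page not found</h1>"
    return 200, body


class Handler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # self.path may include a query string; route on the path only
        status, body = resolve(urlsplit(self.path).path)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


# To try it locally: HTTPServer(("127.0.0.1", 8000), Handler).serve_forever()
```

The user still sees a friendly “not found” message, but crawlers receive the 404 status they need.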

Dealing with Page Not Found (404) Errors

A 404 (Not Found) error means the requested URL does not exist. Unlike a soft 404, where a missing page returns status code 200, a 404 correctly tells crawlers that the page is missing.

If a 404 error is intentional (the page was deleted on purpose), remove the URL from the sitemap. However, if the page should exist but returns a 404, correct the URL or set up a redirect to the right page.
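Removing intentionally deleted URLs from an XML sitemap can be scripted. Here is a minimal sketch using Python’s `xml.etree.ElementTree`, assuming a standard sitemap in the `http://www.sitemaps.org/schemas/sitemap/0.9` namespace; `prune_sitemap` is a hypothetical helper, and `removed_urls` is a set of URLs you have confirmed return 404 on purpose.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def prune_sitemap(xml_text: str, removed_urls: set[str]) -> str:
    """Drop <url> entries whose <loc> is in removed_urls."""
    ET.register_namespace("", SITEMAP_NS)  # keep the default namespace on output
    root = ET.fromstring(xml_text)
    ns = {"sm": SITEMAP_NS}
    # findall returns a list, so removing elements while looping is safe
    for url_el in root.findall("sm:url", ns):
        loc = url_el.findtext("sm:loc", default="", namespaces=ns).strip()
        if loc in removed_urls:
            root.remove(url_el)
    return ET.tostring(root, encoding="unicode")
```

After updating the file, resubmit the sitemap in Search Console so Google picks up the change.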

Handling robots.txt Issues

This error means robots.txt blocks crawlers from a page you have asked Google to index. That is not a problem for pages that shouldn’t be evaluated. However, for pages you want evaluated, edit robots.txt to allow crawling. After making the change, use the URL Inspection tool in Google Search Console to confirm that crawling is no longer blocked.
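You can also pre-check your robots.txt rules locally with Python’s standard `urllib.robotparser`. The robots.txt content below is a hypothetical example, and `is_crawlable` is a hypothetical helper. Python’s parser follows the original robots.txt conventions and does not implement every Google extension (such as `*` and `$` wildcards inside paths), so treat the result as a quick check, not as Google’s verdict.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; replace with the contents of your own file
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /blog/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, `is_crawlable(ROBOTS_TXT, "https://example.com/blog/post")` returns `False`, which is the kind of accidental block to remove for pages you want indexed.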

Addressing noindex Tags

This error appears when a page submitted in the XML sitemap carries a noindex tag, so Google does not index it. If the noindex tag is intentional, no action is necessary. However, for content you want indexed, remove the noindex tag.
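A noindex directive can come from a robots meta tag in the HTML or from an `X-Robots-Tag` HTTP header, so it helps to check both. The following minimal sketch uses Python’s standard `html.parser`; `has_noindex` is a hypothetical helper. It treats `none` as blocking because that directive is equivalent to `noindex, nofollow`, and, as a stated simplification, it does not handle header values with a user-agent prefix such as `googlebot: noindex`.

```python
from html.parser import HTMLParser

BLOCKING = {"noindex", "none"}  # "none" means noindex, nofollow


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() in ("robots", "googlebot"):
            for token in attr_map.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def has_noindex(html: str, x_robots_tag: str = "") -> bool:
    """True if the page opts out of indexing via meta tag or X-Robots-Tag header."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # Simplified: user-agent prefixes in the header value are not parsed
    header_directives = {t.strip().lower() for t in x_robots_tag.split(",")}
    return bool(BLOCKING & (parser.directives | header_directives))
```

Running this over the pages listed in your sitemap shows which ones still carry a noindex directive.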

Handling Redirect Errors

Redirect errors occur when a redirect chain is too long or forms a loop. Googlebot follows up to ten redirect hops, so a chain longer than that results in an error.

Reference: HTTP Status Codes, Network Errors, and DNS Errors (Google Search Central documentation)

To address redirect errors, reduce each redirect chain to ten hops or fewer (ideally a single redirect straight to the final URL), fix any loops in the redirect configuration, and keep URLs to 2,000 characters or less. URLs longer than 2,000 characters are rare, but if one occurs, it must be shortened.
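If you can export your redirect rules as a simple source-to-target map, a short script can flag the problems described above. In this sketch, `check_redirect_chain` and the map format are hypothetical, the ten-hop limit follows the Google documentation cited above, and the 2,000-character limit is the figure used in this article.

```python
MAX_HOPS = 10          # Googlebot follows up to ten redirect hops
MAX_URL_LENGTH = 2000  # limit used in this article


def check_redirect_chain(start: str, redirects: dict[str, str]) -> str:
    """Follow a redirect map from `start` and report the first problem found.

    `redirects` maps each redirecting URL to its target (a hypothetical
    export of your server rules); URLs not in the map are final pages.
    """
    seen = {start}
    url = start
    hops = 0
    while url in redirects:
        url = redirects[url]
        hops += 1
        if len(url) > MAX_URL_LENGTH:
            return f"URL too long after {hops} hop(s)"
        if url in seen:
            return f"redirect loop at {url}"
        if hops > MAX_HOPS:
            return f"too many redirects ({hops} hops)"
        seen.add(url)
    return f"ok: {start} -> {url} in {hops} hop(s)"
```

For example, the map `{"/a": "/b", "/b": "/c"}` yields `"ok: /a -> /c in 2 hop(s)"`; in that case, pointing `/a` directly at `/c` would remove an unnecessary hop.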

Addressing 403 Errors

A 403 error means access to the page is denied because the requester lacks the necessary permissions. A visitor can only reload the page or wait for the site administrator to resolve the issue. If the administrator restricts access on purpose, no action is needed. However, if the restriction is unintentional, take the following steps.

Check server load
Verify .htaccess file for errors
Ensure DNS is functioning correctly
Check for false positives in the Web Application Firewall (WAF)
Verify permissions settings

The most common causes are errors in the .htaccess file and high volumes of traffic. To tell them apart, first check the access logs. If high traffic is the cause, consider upgrading the server. If high traffic is unlikely to be the cause, review the contents of the .htaccess file.
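Checking the access logs can be partly automated. The sketch below assumes your server writes the common Apache/Nginx “combined” log format (an assumption; adjust the regular expression to your own format), and `summarize_403s` is a hypothetical helper. It counts all requests and 403 responses per hour, plus 403s per path.

```python
import re
from collections import Counter

# Assumes the Apache/Nginx "combined" log format
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def summarize_403s(lines):
    """Count requests and 403 responses per hour, and 403s per path."""
    requests_per_hour = Counter()
    forbidden_per_hour = Counter()
    forbidden_paths = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines in other formats
        hour = match["time"][:14]  # e.g. "10/Oct/2023:13"
        requests_per_hour[hour] += 1
        if match["status"] == "403":
            forbidden_per_hour[hour] += 1
            parts = match["request"].split()  # e.g. ["GET", "/admin", "HTTP/1.1"]
            forbidden_paths[parts[1] if len(parts) > 1 else "?"] += 1
    return requests_per_hour, forbidden_per_hour, forbidden_paths
```

As a rule of thumb, 403s that cluster in the busiest hours point toward load, while 403s that hit the same paths at all hours point toward .htaccess or permission settings.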

Verifying Fixes After Addressing Errors

After fixing errors, confirm that the fixes have taken effect. Click “Validate Fix” in Google Search Console to ask Google to recheck the affected pages. Google then checks whether the errors have been resolved and reports whether validation passed or failed; this can take up to about two weeks. If validation fails, fix the remaining issues and start validation again.

Summary

Index coverage shows the indexing status of your pages and identifies errors. To ensure content is evaluated, first confirm that it is indexed. Index coverage is especially useful when content does not appear in search results despite being published. Addressing errors promptly is essential to maintaining SEO rankings and keeping content indexed.

Author Profile

Mr. Takeshi Amano, CEO of Admano Co., Ltd.

Mr. Takeshi Amano is a graduate of the Faculty of Law at Nihon University. With 12 years of experience working in the advertising agency industry, he discovered SEO and began his research during the early days of SEO. He self-taught and conducted experiments and verifications on over 100 websites. Using this expertise, he founded Admano Co., Ltd., which is currently in its 11th year of operation. Mr. Amano handles sales, SEO consulting, web analytics (holding the Google Analytics Individual Qualification certification), coding, and website development. The company has successfully managed SEO strategies for over 2000 websites to date.