SEO Measures for Beginners 1: Checking Indexing
contents
- 1 How to Check If You’re Indexed
- 2 Reasons and Benefits for Checking Indexing
- 3 Low Content Quality
- 3.1 Cannibalization Is Occurring
- 3.2 Using Technologies That Google Cannot Read
- 3.3 Instructions That Block Indexing
- 3.4 Domain Has Been Penalized by Google
- 3.5 A Certain Amount of Time is Needed for Indexing
- 3.6 Handling Pages You Don’t Want to Be Indexed
- 3.7 Specifying Pages with noindex
- 3.8 Blocking Crawling with robots.txt
- 4 Frequently Asked Questions When Checking Indexing
- 4.0.1 Q: What exactly is indexing?
- 4.0.2 Q: Why is it important to be indexed?
- 4.0.3 Q: Is there a simple way to check if something is indexed?
- 4.0.4 Q: How long does it take for a new page to be indexed?
- 4.0.5 Q: My indexing is slow; are there any tips to speed it up?
- 4.0.6 Q: What does the number of indexes refer to?
- 4.0.7 Q: What is index coverage?
- 5 Summary
After building a website and publishing content, the next SEO measure to tackle is checking the indexing status (index/registration). Being aware of the indexing status of your website allows you to identify challenges your website or pages may face, revealing the next actions to be taken.
There are several checkpoints to consider when checking indexing, which you should familiarize yourself with by reading this article. Furthermore, if your site or pages are not indexed at all, improvements are necessary. Identify the reasons behind the lack of indexing and resolve the issues.
This article explains the first SEO measure for beginners: checking indexing. If you are a complete beginner, please read “What is SEO? The Latest Guide to SEO and Complete SEO Checklist 100 for Beginners to Advanced” before proceeding with this page.
How to Check If You’re Indexed
There are primarily two methods to check if your site is indexed.
Method 1: Check with Google Search Console
Method 2: Check with a Google search
Method 1: Checking with Google Search Console
Google Search Console, a tool for managing websites, allows you to check indexing status. As a web analytics tool provided by Google, Google Search Console offers high reliability and functionality, making it valuable for detailed investigations into indexing status.
Features for checking indexing include primarily the following.
- URL Inspection
- Page Indexing
If you have not yet used Google Search Console, start by registering your website with Google Search Console.
Related Article: What is Google Search Console? An Explanation of How to Set Up and Use Google Search Console
URL Inspection
With URL inspection, you can check detailed indexing status on a page-by-page basis. For example, in addition to whether or not a page is indexed, you can also check items such as:
- Whether the Googlebot has crawled the page
- The crawled HTML information
- Whether the page is mobile-friendly
- Information on structured data, like breadcrumbs
Page Indexing
Page indexing allows you to check the overall indexing status of the site. Specifically, you can find out the following:
- How many pages of the site are indexed
- How many pages are not indexed
- The reasons why some pages are not indexed
Method 2: Checking with Google Search
Using Google search, anyone can easily check the indexing status. By entering the following into the Google search box and searching, a list of pages indexed under the site URL will be displayed as search results:
- site:(enter the site or page URL here, with no space after the colon)
The pages displayed in the results are indexed in Google’s database.
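As a small illustration of the search described above, the following sketch builds the Google search URL for a site: query. The function name build_site_query and the example.com address are placeholders chosen for this example, not part of any Google tool.

```python
from urllib.parse import quote_plus


def build_site_query(target: str) -> str:
    """Return a Google search URL that applies the site: operator to `target`."""
    # The operator and its value are joined without a space.
    query = f"site:{target.strip()}"
    return "https://www.google.com/search?q=" + quote_plus(query)


# Hypothetical site used only for illustration.
print(build_site_query("example.com/blog/"))
```

Opening the printed URL in a browser shows the same results as typing the query into the search box by hand.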
Checkpoints When Checking Indexing
When checking the indexing status, please note the following checkpoints:
- Check the overall indexing status of the site
- Check the indexing status of individual pages
- Verify whether the pages have been crawled
Checking the Overall Index Status of the Site
Please check what percentage of the pages you’ve published are indexed. If the number of indexed pages is extremely low, it may indicate that your website is facing some issues.
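The percentage mentioned above is simple arithmetic: indexed pages divided by published pages. Here is a minimal sketch, assuming you have already read both counts from a source such as the Page Indexing report. The figures 42 and 120 are made up for illustration.

```python
def index_rate(indexed_pages: int, published_pages: int) -> float:
    """Return the share of published pages that are indexed, as a percentage."""
    if published_pages <= 0:
        raise ValueError("published_pages must be positive")
    return 100 * indexed_pages / published_pages


# Illustrative counts only; substitute the numbers for your own site.
rate = index_rate(indexed_pages=42, published_pages=120)
print(f"{rate:.1f}% of published pages are indexed")
```

What counts as "extremely low" depends on the site, so treat the result as a prompt for investigation rather than a pass/fail threshold.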
Checking the Index Status of Each Page
For each published page, check the index status in detail. Even if a page is indexed, it may still have issues. For example, if a page is indexed but not mobile-friendly, it needs to be updated.
Checking If the Page Has Been Crawled
If it’s determined that a page is not indexed, first check whether the published page has been crawled at all.
As part of the process for a page to be indexed by Google, a crawler developed by Google, known as Googlebot, first checks the page. Based on this check, Google’s system decides whether or not to index the page. If a page has not been crawled at all, it means that Google has not recognized the page.
Reasons and Benefits for Checking Indexing
Checking the indexing status offers several advantages, such as:
- Being able to check for any issues with the site structure
- Being able to check for any issues with the quality of page content
- Being able to infer Google’s evaluation of the site or pages
- Being able to understand the total number of indexed pages
- Being able to identify the reasons why pages are not indexed
Continually updating your website and providing more accurate information is crucial. This requires creating new articles and updating existing ones. Checking the indexing status can help answer questions like “What should I create?” and “What needs to be fixed?”.
Checking for Issues with the Site Structure
By checking indexing, you can determine if there are any issues with your site structure. For instance, if the number of indexed pages is low despite increasing published articles, it may indicate a problem with the site’s structure.
Here, “site structure” refers to what is known as the directory structure or the site hierarchy. An inappropriate site structure can make it difficult for Googlebot to understand the overall picture of the site. Ensure that the site structure is as simple and understandable as possible.
Checking the Quality of Page Content
Checking indexing allows you to assess if there are any issues with the quality of page content. Typically, Google deems low-quality articles as “content of little value for indexing.” Moreover, an increase in low-quality articles within a website can lead to a vicious cycle where the site’s overall rating drops, making it harder for new articles to be indexed.
Inferring Google’s Evaluation of the Site or Pages
If a page is indexed quickly after publication, it may indicate that Google highly evaluates the quality of the website or page. Conversely, if indexing does not progress swiftly, it might mean that the site or page is considered a low priority for indexing.
Causes and Solutions for Not Being Indexed
The main reasons for a page not being indexed include:
- The website is not recognized by Google.
- The content quality is low.
- Cannibalization is occurring.
- The site uses technology that Google cannot read.
- Instructions to block indexing are in place.
- The domain has been penalized by Google.
- A certain amount of time is necessary for indexing.
The website is not recognized by Google
A newly launched website is not yet recognized by Google. From Google’s perspective, it’s an unknown entity with neither positive nor negative evaluations. To make Google aware of your website, you can use the following methods:
- Send a sitemap
- Submit a crawl request
- Acquire backlinks
First, submit your sitemap to Google to make your site known. You can do this through Google Search Console.
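If your CMS does not generate a sitemap for you, a minimal one can be produced with a few lines of code. The sketch below follows the public sitemaps.org XML format and lists only page URLs. The function name build_sitemap and the example.com addresses are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: list[str]) -> str:
    """Return a minimal sitemap XML document listing the given page URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages used only for illustration.
print(build_sitemap(["https://example.com/", "https://example.com/blog/first-post/"]))
```

Save the output as sitemap.xml on your server, then submit its URL in Google Search Console.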
Additionally, you can use the URL Inspection tool within Google Search Console to request a crawl from Googlebot. Submitting a request when you create new articles can make Google aware of their existence. However, whether or not Google will actually crawl them is at Google’s discretion. Understand that it’s not guaranteed that they will be crawled.
Related Article: Mastering the Search Console’s URL Inspection Tool! A Clear Explanation on How to Promote Index Registration for SEO
Another way to make Google aware of your website is to keep creating quality content. This increases the chances of acquiring backlinks, which gradually leads Google to recognize your site and makes it easier to be indexed.
Low Content Quality
Pages with low content quality tend to be indexed less frequently. Low-quality pages are those that are not useful to site visitors and typically have characteristics such as:
- No demand for the content
- Copied from articles on other sites that are already indexed
- Overstuffed with topics, making it unclear what the message is
- Poor grammar in the text, making it unclear what is being explained
- Content is too thin for the theme
- Over-designed, making the text hard to read
If quality content is not being indexed, it’s likely not created with Googlebot’s specifications in mind. To get your pages indexed, ensure the content is accessible for Googlebot just as it is for site visitors.
If you’re not experienced in content creation, consult Google Search Central (formerly Google Webmasters) for guidance as you proceed.
Cannibalization Is Occurring
Cannibalization within a website can make it harder for pages to be indexed. The term comes from “cannibalism,” meaning the consumption of one’s own kind. In SEO, it refers to a situation where multiple pages within a website target the same keyword and end up competing with each other for it.
When cannibalization occurs, Google may struggle to decide which article to index preferentially. As a result, a page that the author did not intend may be prioritized for indexing, leading to significant delays in indexing the intended pages.
If you realize that cannibalization is occurring and it’s a significant issue, consider the following solutions:
- Merge duplicate elements between pages
- Consolidate the pages themselves
- Link the pages to each other with internal links
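Before choosing among these solutions, it helps to know which pages overlap. The following is a rough sketch that groups pages by target keyword, assuming you keep your own record of which keyword each page targets. The mapping, URLs, and function name are hypothetical.

```python
from collections import defaultdict


def find_cannibalized_keywords(page_keywords: dict[str, str]) -> dict[str, list[str]]:
    """Return keywords targeted by more than one page, with the competing URLs."""
    pages_by_keyword: dict[str, list[str]] = defaultdict(list)
    for url, keyword in page_keywords.items():
        # Normalize case and whitespace so near-identical keywords are grouped.
        pages_by_keyword[keyword.strip().lower()].append(url)
    return {kw: urls for kw, urls in pages_by_keyword.items() if len(urls) > 1}


# Hypothetical page-to-keyword records for illustration.
pages = {
    "/blog/check-indexing/": "check indexing",
    "/blog/how-to-check-index/": "Check Indexing",
    "/blog/robots-txt/": "robots.txt",
}
print(find_cannibalized_keywords(pages))
```

Exact keyword matching is a simplification. Pages can also compete on closely related phrases, which this sketch would not catch.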
Using Technologies That Google Cannot Read
Google can index certain file formats but not others. Essentially, a website is composed of a combination of files and folders placed on a server. If the file formats used here are unreadable by Google, they will not be indexed. The file formats that can be indexed include the following:
- Text (.txt, .text, and other file extensions). This includes the source code of common programming languages such as:
- Basic source code (.bas)
- C, C++ source code (.c, .cc, .cpp, .cxx, .h, .hpp)
- C# source code (.cs)
- Java source code (.java)
- Perl source code (.pl)
- Python source code (.py)
Citation: File Formats That Can Be Indexed by Google
For more details on the file formats that can be indexed, you can check on Google Search Central’s page about file formats that Google can index.
Instructions That Block Indexing
There are instructions that can block Google’s indexing, and if these instructions are set mistakenly, the page will not be indexed. There are mainly two types of instructions:
- robots.txt
- noindex
The former is a text file containing instructions that deny crawling, typically placed in the root directory of the website. The latter is a value set in an HTML meta tag that denies indexing. If either is set, the page will generally not be indexed, so be cautious.
Related Article: What is robots.txt? Explaining the Purpose and How to Write It
Domain Has Been Penalized by Google
A domain is like an address that indicates the location of a website. If there have been serious violations of Google’s policies in the past, the domain may be penalized.
Violations of the spam policies for Google web search, described on Google Search Central, are of particular concern. In some cases, no matter how high-quality your content is, it might not be indexed at all.
There are two patterns when a domain is penalized.
- The domain has been penalized for violations committed by the company or individual in the past.
- The previous owner of a second-hand domain that was acquired has been penalized for violations.
Typically, it’s rare for beginners in website operation to use second-hand domains. However, there are cases where a newly acquired domain turns out to be a second-hand domain without the buyer’s knowledge. If unsure, consider checking the domain’s history using tools like the Wayback Machine.
Related Article: What is the Wayback Machine?
If it’s determined that the domain has been penalized by Google, consider changing the domain.
A Certain Amount of Time is Needed for Indexing
Indexing inherently takes a certain amount of time, and not only for new sites. While it may happen within a day, it can take over a month in slower cases. If you’ve tried various measures and still aren’t indexed, consider waiting for a while.
Handling Pages You Don’t Want to Be Indexed
There are pages on a website that should not be indexed, such as sitemaps or community pages. Additionally, there may be low-quality pages that you wish to keep.
In cases where there are pages you don’t want to be indexed, the following methods can help avoid indexing:
- Specify pages with noindex
- Block crawling with robots.txt
However, note that neither method guarantees that indexing will be completely avoided.
Specifying Pages with noindex
noindex is a value of the robots meta tag that instructs search engines not to index a page. It is the standard choice when you want to keep a page out of the index.
To include noindex directly in HTML, write the following between the <head></head> tags:
<meta name="robots" content="noindex" />
Related Article: What is noindex? Explaining the Difference from nofollow and How to Utilize It
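To confirm that a page actually carries the noindex value, or to catch one that was set by mistake, you can scan its HTML. The sketch below uses Python’s standard html.parser module and only looks at the robots meta tag. It ignores other signals such as HTTP headers, and the class and function names are chosen for this example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Record whether a robots meta tag containing noindex appears in the HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        # The content attribute may hold several comma-separated directives.
        directives = [d.strip().lower() for d in attributes.get("content", "").split(",")]
        if "noindex" in directives:
            self.noindex = True


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.noindex


sample = '<html><head><meta name="robots" content="noindex" /></head><body></body></html>'
print(has_noindex(sample))
```

Running this over the HTML of pages that are unexpectedly missing from the index is a quick way to rule out an accidental noindex.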
Blocking Crawling with robots.txt
robots.txt is a text file placed in the root directory on the server where the website is located, instructing Googlebot whether to allow access and crawling.
By using robots.txt, you can block crawling by Googlebot. However, be aware that this is not an instruction for controlling indexing.
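To see how a set of robots.txt rules would apply to a given URL, Python’s standard urllib.robotparser module can evaluate them locally. The rules and URLs below are hypothetical, and the standard-library parser may not match Googlebot’s behavior in every edge case, so treat the result as an approximation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block every crawler from the /private/ directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawl_allowed(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt rules allow `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_crawl_allowed(ROBOTS_TXT, "https://example.com/blog/"))
print(is_crawl_allowed(ROBOTS_TXT, "https://example.com/private/draft.html"))
```

A check like this helps catch a Disallow rule that unintentionally covers pages you do want crawled.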
Related Article: What is robots.txt? Explaining the Purpose and How to Write It
Frequently Asked Questions When Checking Indexing
Here, we’ve compiled frequently asked questions and answers regarding checking indexing.
Q: What exactly is indexing?
A: Indexing refers to the registration of web pages or site information in Google’s database. Websites and pages that have been indexed appear in the search results of Google’s search engine. The search ranking at that time is determined by Google’s evaluation of the website or page.
>>https://www.switchitmaker2.com/wordpress/seo/about-index/
Q: Why is it important to be indexed?
A: Indexing is considered important because it directly relates to securing traffic to the site. Typically, the purpose of operating a website is to attract visitors. Google’s search engine is used by people worldwide. By having your website or pages indexed, you create an opportunity to attract this vast user base to your site.
Q: Is there a simple way to check if something is indexed?
A: You can easily check if a page is indexed by searching for the URL in the Google search engine like this:
site:(enter the site or page URL)
For example, to check the indexing status of Tokyo SEO Maker, you would search:
site:https://www.switchitmaker2.com/wordpress/
Q: How long does it take for a new page to be indexed?
A: There is no set time for a new page to be indexed. It could be indexed within a few hours or take several months.
Q: My indexing is slow; are there any tips to speed it up?
A: Pages that tend to be indexed more quickly include:
- New pages from domains with high ratings
- New pages considered to contain high-quality content
Therefore, one strategy to promote indexing is to continuously create high-quality articles and improve the domain’s evaluation.
Q: What does the number of indexes refer to?
A: The number of indexes refers to the count of pages within a site that have been indexed. However, having a high number of indexed pages does not necessarily translate to a higher site evaluation. What’s important for enhancing a site’s evaluation is the extent to which high-quality content is indexed.
>>https://www.switchitmaker2.com/wordpress/seo/about-index/
Q: What is index coverage?
A: Index coverage refers to a feature in Google Search Console that allows you to check the indexing status within your site. It was originally the name of a Google Search Console menu item, but as of 2023 the term “coverage” is no longer used in Google Search Console.
>>https://www.switchitmaker2.com/wordpress/seo/index-coverage/
Summary
The indexing status is a critical determinant of whether a website or page will appear in Google’s search engine results. To efficiently gather traffic to your website, it’s essential to implement SEO measures and promote indexing. Therefore, when you start operating a website, make it a habit to first understand the indexing status. If you find your pages are not being indexed, identify the causes and make the necessary adjustments.