Domain and URL Normalization Methods: Explaining Expected SEO Effects

2024.05.27

Contents

1 SEO Internal Strategy Checklist
2 Understanding URL Normalization
3 URLs we need to normalize
4 Reasons for URL Normalization
5 Points to Note in URL Normalization
6 Methods of URL Normalization
- 6.1 301 Redirect
- 6.2 Using the Canonical Tag
7 How to Confirm URL Normalization?
8 Summary


When the same content is available at different URLs, or when mobile and desktop sites use different URLs, you need to normalize those URLs. Otherwise, search engines may treat the pages as duplicate content and split their evaluation between them, which can hurt SEO.


URL normalization is done using 301 redirects or canonical tags. This article explains the reasons for normalization, methods, URLs that should be normalized, and also touches on how to confirm if normalization has been successful and important considerations.

SEO Internal Strategy Checklist

Title Length
Meta Description Tag
Heading (H Tag) Settings
HTML Tag Placement
Optimization of Internal Links
Optimization of Directory Structure
Implementation of Breadcrumb Navigation
Understanding Alt Attributes
Indexing Strategies
Implementation of Structured Data
SSL (HTTPS) Encryption
Setting Canonical Tags
Understanding Sitemaps
Setting up robots.txt
Handling URLs with and without “www”
Improving Page Loading Speed
Enhancing UI and UX
Improving Core Web Vitals
Accelerating Server Processing
Improving Mobile Friendliness
Implementing Responsive Web Design

Understanding URL Normalization

URL normalization refers to unifying multiple URLs that lead to the same content into a single URL. By consolidating URLs, you can concentrate search engine evaluations on a single URL, making it a crucial element in SEO strategies.

Without normalization, search engines may treat the URLs as duplicate content, and the version you want to rank may be left out of search results.

URLs we need to normalize

During URL normalization, there are six key items to check. If any of these apply to your company’s URLs, normalize them accordingly.

Differences in capitalization
Presence or absence of “www”
Difference between “.htm” and “.html”
Difference between “http” and “https”
Presence or absence of parameters
Presence or absence of trailing slashes

Differences in Capitalization

https://sample.com/Sample

https://sample.com/sample

Because URL paths are case-sensitive on most servers, these two URLs can be treated as different pages. Use lowercase letters only; if a URL contains uppercase letters, normalize it to lowercase.
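As a minimal sketch, the lowercase rule can be written in Python with the standard `urllib.parse` module. The function name `lowercase_url` is hypothetical, and the code assumes that every path on the site is meant to exist in lowercase, so it is only suitable for computing redirect targets.

```python
from urllib.parse import urlsplit, urlunsplit


def lowercase_url(url: str) -> str:
    """Return the URL with its host and path lowercased.

    The host is case-insensitive anyway; the path is case-sensitive on most
    servers, which is why /Sample and /sample count as separate URLs.
    The query string and fragment are left untouched.
    """
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path.lower(),
                       parts.query, parts.fragment))


print(lowercase_url("https://sample.com/Sample"))  # https://sample.com/sample
```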

Presence or absence of “www”

https://www.sample.com/sample

https://sample.com/sample

Either form is acceptable, but omitting “www” gives a cleaner URL that is easier to remember. Whichever you choose, normalize the other form to it.
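If you standardize on the form without “www,” the redirect target can be computed as in this Python sketch. `strip_www` is a hypothetical helper, not part of any library; it assumes the non-www host serves the same content.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_www(url: str) -> str:
    """Redirect target under a non-www policy: drop a leading 'www.' from the host."""
    parts = urlsplit(url)
    host = parts.netloc
    if host.lower().startswith("www."):
        host = host[4:]
    # SplitResult is a named tuple, so _replace swaps just the host.
    return urlunsplit(parts._replace(netloc=host))


print(strip_www("https://www.sample.com/sample"))  # https://sample.com/sample
```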

Difference Between .htm and .html

https://sample.com/index.htm

https://sample.com/index.html

“htm” is simply an abbreviation for “html,” and both are HTML files. However, publishing the same content with different extensions may be considered duplicate content. Be sure to standardize on one of them.

Difference Between http and https

http://sample.com/

https://sample.com/

The difference between “http” and “https” is whether communication is encrypted. “https” encrypts the traffic between the browser and the server, which prevents eavesdropping and tampering, and its certificate helps prevent impersonation of the site.

Google also recommends using “https,” but mixing “http” and “https” may be considered duplicate content. It’s necessary to standardize on “https.”
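A minimal Python sketch of the “standardize on https” rule, assuming the https version of every page serves the same content. `force_https` is a hypothetical name, and non-default ports are not handled.

```python
from urllib.parse import urlsplit, urlunsplit


def force_https(url: str) -> str:
    """Redirect target under an https-only policy: switch the scheme, keep the rest."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


print(force_https("http://sample.com/"))  # https://sample.com/
```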

Presence or Absence of Parameters

https://sample.com

https://sample.com/?gclid=ABCD

Parameters are the variables that follow “?” in a URL. They are used, for example, for ad click tracking and on e-commerce sites. Generally, normalize to the URL without parameters; if the parameterized URL must remain accessible, use a canonical tag instead.
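The following Python sketch removes ad-tracking parameters to compute the parameter-free URL. The parameter list (`gclid`, `fbclid`, and anything starting with `utm_`) is an assumption for illustration; parameters that change page content, such as a product color on an e-commerce site, should be kept or handled with a canonical tag.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these parameters only track clicks and never change page content.
TRACKING_PARAMS = {"gclid", "fbclid"}
TRACKING_PREFIXES = ("utm_",)


def strip_tracking_params(url: str) -> str:
    """Drop tracking parameters and keep any content-changing ones."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key not in TRACKING_PARAMS and not key.startswith(TRACKING_PREFIXES)]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_tracking_params("https://sample.com/?gclid=ABCD"))  # https://sample.com/
```

Note that `urlencode` re-encodes the remaining values, so spacing or percent-encoding may differ slightly from the original query string.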

Presence or Absence of Trailing Slash

Trailing slash refers to the “/” (slash) at the end of a URL.

https://sample.com/media/

https://sample.com/media

With a trailing slash, the server treats “media” as a directory and displays its index file, such as index.htm or index.php. Without the slash, the server looks for a file named “media” (depending on the configuration, a file such as media.htm). If no such file exists, it generally displays the same content as /media/.

Because the processing differs based on the presence or absence of a trailing slash, it’s essential to normalize URLs.

When the URL consists of only the domain name (https://sample.com), there’s no need to obsess over the presence or absence of a trailing slash. This is because even when displaying URLs without a trailing slash, browsers and Googlebot process them as if the trailing slash were present.
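Assuming a policy of trailing slashes on directory-style paths (paths whose last segment has no file extension), the rule can be sketched in Python as follows. `add_trailing_slash` and the “no dot means directory” heuristic are assumptions; adjust them to match how your server actually maps URLs.

```python
from urllib.parse import urlsplit, urlunsplit


def add_trailing_slash(url: str) -> str:
    """Assumed policy: directory-style paths end with '/', file paths do not."""
    parts = urlsplit(url)
    path = parts.path or "/"  # a bare domain is always requested as "/"
    last_segment = path.rsplit("/", 1)[-1]
    # "media" looks like a directory; "index.html" looks like a file.
    if last_segment and "." not in last_segment:
        path += "/"
    return urlunsplit(parts._replace(path=path))


print(add_trailing_slash("https://sample.com/media"))  # https://sample.com/media/
```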

Reasons for URL Normalization

There are three main reasons for URL normalization:

-Preventing dispersion of search engine evaluation.

-Improving crawlability.

-Reducing tracking analysis costs.

Normalizing URLs for these reasons can ultimately yield SEO benefits.

Preventing Dispersion of Search Engine Evaluation

When two pages have the same content, search engines may treat them as duplicates or split their evaluation between the two, which makes your SEO less efficient.

Google has officially addressed this issue.

When multiple URLs bring the same content to users, evaluations of that content can become dispersed across those URLs. To prevent this, use consistent URLs when linking to pages within your site.

Source: “One URL per piece of content” (Google Search Engine Optimization Starter Guide)

Improving Crawlability

Crawling refers to the process in which robots, known as crawlers, gather information on the internet. Search engines display pages in search results based on the information these crawlers collect. However, not every one of the billions of pages on the web is shown in search results.

No matter how high-quality your content is, it’s meaningless if crawlers can’t pick it up. Improving crawlability means making it easier for crawlers to find and understand your pages, which ultimately contributes to SEO.

URL normalization saves crawlers from fetching the same content at several URLs, so they can reach your important pages more efficiently. This makes it more likely that your website catches users’ attention and that search results reflect your updates quickly.

Unnormalized URLs are like hidden gems in the deep sea for crawlers. Always normalize your URLs to provide appropriate information to crawlers.

Reducing Tracking Analysis Costs

When duplicate pages exist on your website, traffic to the same content is split across several URLs, so you end up analyzing the same content more than once during tracking analysis. This costs time and effort and lowers cost-effectiveness. To avoid this, normalize your URLs.
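To see why duplicate URLs inflate analysis work, the Python sketch below groups a hypothetical pageview export by a simplified report key (host without “www,” lowercase path, query string dropped). The data and the `report_key` helper are made up for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit


def report_key(url: str) -> str:
    """Simplified key: host without 'www.', lowercase path, query and fragment dropped."""
    parts = urlsplit(url)
    host = parts.netloc.lower().removeprefix("www.")
    path = parts.path.lower() or "/"
    return f"{host}{path}"


pageviews = [  # hypothetical analytics export: (URL, pageviews)
    ("http://www.sample.com/Sample", 120),
    ("https://sample.com/sample", 300),
    ("https://sample.com/sample?gclid=ABCD", 80),
]

totals = Counter()
for url, views in pageviews:
    totals[report_key(url)] += views

print(totals)  # Counter({'sample.com/sample': 500})
```

Three rows that a report would otherwise list separately collapse into one page once the URLs are normalized.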

Points to Note in URL Normalization

When performing URL normalization, there are three points to keep in mind.

Normalization takes time.
Confirm the URLs to be normalized.
Avoid using robots.txt.

Normalization Takes Time

Even after normalizing URLs, it doesn’t mean that Google will recognize them immediately. It’s said to take about 1 to 6 months for crawlers to collect information and display the pages in search results after URL normalization.

However, requesting indexing in Google Search Console can help your pages appear in search results sooner. To do so, open the URL Inspection tool, enter the URL, and click “Request indexing.”

It’s important to note that this method is simply a request to Google and doesn’t guarantee immediate appearance in search results.

Confirming URLs for Normalization

Before normalizing URLs, double-check them. If a redirect or canonical tag points to the wrong URL, users and crawlers may land on 404 error pages, or the evaluation may be consolidated onto the wrong page.

Listing the URLs for normalization in advance and checking for any typos or errors can help avoid such issues.

Avoid Using robots.txt

Some may attempt to normalize duplicate content by preventing crawlers from recognizing it using robots.txt. However, Google doesn’t recommend this approach. Google prefers to allow crawlers to assess the content of duplicate pages and make their own judgments.

When normalizing URLs for SEO, focus on giving crawlers accurate information about which URL you prefer rather than trying to hide duplicate content.

Methods of URL Normalization

There are two methods of URL normalization.

-Use a 301 redirect

-Use a canonical tag

A 301 redirect is used to merge an old page with a new one, or when transferring to a new server. A canonical tag is used when the content of pages is similar but each has unique features, and you want to prevent search engines from recognizing them as duplicate content.

Choose between a 301 redirect and a canonical tag based on your goal.

301 Redirect

When a page moves to a new URL, for example during a server migration or a page update, search engines evaluate it from scratch. Pages that previously ranked high may therefore no longer appear in search results. A 301 redirect mitigates this risk by passing the old URL’s evaluation on to the new one.

Here’s how to do it.

Create an htaccess file.
Set the character encoding to UTF-8.
Write the necessary code according to the situation.
Save the file with a blank line at the end.
Upload it to the server.
Rename “htaccess” to “.htaccess”.

Make sure your server supports .htaccess if you plan to use 301 redirects.

Create the .htaccess file in a text editor and set the character encoding to UTF-8. On a Windows PC, save the file as “htaccess” with the file type set to “All Files.”

Also, add the dot to the file name after uploading to the server. This is because some PC environments may not allow you to create files starting with a dot, so this is a precautionary measure.

To remove “www” from URLs:

RewriteEngine on
RewriteCond %{HTTP_HOST} ^www\.sample\.com$ [NC]
RewriteRule ^(.*)$ https://sample.com/$1 [R=301,L]

Removing “index.html” from URLs

When dealing with URLs containing “index.html”, it’s beneficial to specify the URL without it as the redirect destination. This is because URLs without “index.html” are more common and easier for users to remember.

RewriteEngine on
RewriteCond %{THE_REQUEST} ^.*/index\.(html|htm|php)
RewriteRule ^(.*/)?index\.(html|htm|php)$ https://sample.com/$1 [R=301,L]
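The intended mapping, from a URL that ends in an index file to its directory URL, can be sketched in Python as follows. `strip_index` is a hypothetical helper that works on full request paths including the leading slash; it illustrates the mapping and is not a replacement for the server-side rule.

```python
import re

# Optional directory prefix, then index.html / index.htm / index.php at the end.
INDEX_RE = re.compile(r"^(.*/)?index\.(?:html|htm|php)$")


def strip_index(path: str) -> str:
    """Return the directory path that should replace an index-file path."""
    match = INDEX_RE.match(path)
    if not match:
        return path  # not an index file, leave unchanged
    return match.group(1) or "/"


print(strip_index("/blog/index.php"))  # /blog/
```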

Converting URLs from “http” to “https”

RewriteEngine on
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]

When you implement HTTPS redirects, search rankings may drop temporarily. If the transition goes smoothly, they typically recover within one to two weeks. Monitor ranking changes and check that every URL variant (http or https, with or without “www”) redirects to the intended URL.
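One way to check every case is to request each URL variant without following redirects and compare the first response with the URL you chose as canonical. The Python sketch below uses only the standard library; the helper names are hypothetical, and it assumes the server answers `HEAD` requests (some servers only answer `GET`).

```python
import http.client
from itertools import product
from urllib.parse import urlsplit


def url_variants(domain: str, path: str = "/") -> list:
    """All scheme/host combinations that should resolve to one canonical URL."""
    return [f"{scheme}://{host}{path}"
            for scheme, host in product(("http", "https"), (domain, "www." + domain))]


def first_hop(url: str, timeout: float = 10.0):
    """Send one HEAD request (redirects are NOT followed); return (status, Location)."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("HEAD", target)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def is_normalized(url: str, status: int, location, canonical: str) -> bool:
    """The canonical URL should answer 200; every other variant a single 301 to it."""
    if url == canonical:
        return status == 200
    return status == 301 and location == canonical
```

For example, with the canonical URL `https://sample.com/`, loop over `url_variants("sample.com")`, call `first_hop` on each, and pass the result to `is_normalized`; every variant except the canonical one should return a 301 pointing straight at it, with no intermediate hop.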

Using the Canonical Tag

The canonical tag tells search engines which URL is the preferred (canonical) version when several pages have identical or similar content, so that the duplicates are consolidated rather than treated as competing pages. There are two ways to use it.

Insert it in the HTML head tag.
Specify it in the HTTP header.

The former suits ordinary web pages. The latter also works for PDF files and other non-HTML formats, but it requires server configuration, so check beforehand whether your server supports .htaccess.

When using the canonical tag, it’s important to note that its role is merely to declare to Google. The actual judgment is left to Google, so it doesn’t necessarily mean that the content won’t be recognized as duplicate.

Furthermore, when using the canonical tag in the head tag, only one should be used. Using two or more can confuse search engines about which one to prioritize.

Also, when one piece of content is split across multiple pages (pagination), avoid pointing the canonical tag of every page to the first page, as this might prevent the subsequent pages from appearing in search results.

Inserting in the HTML Head Tag

To use the canonical tag in the HTML head, specify the URL you want to consolidate to in the tag’s href attribute. The tag has no effect unless it is placed inside the head tag. Include it in the HTML of both the preferred URL and the duplicate URLs.

For example, to make the version without “www” canonical for “https://www.sample.com,” use the following setting:

<head>
…
<link rel="canonical" href="https://sample.com/">
…
</head>

To make the version without the trailing slash canonical for “https://www.sample.com/,” use the following setting:

<head>
…
<link rel="canonical" href="https://www.sample.com">
…
</head>

When specifying URLs, double-check for any spelling errors.
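Typos and duplicate tags can also be caught automatically. This Python sketch uses the standard `html.parser` module to collect every `<link rel="canonical">` inside `<head>` and reports a problem unless there is exactly one and it matches the expected URL. The class and function names are hypothetical, and the parser assumes the page has an explicit `<head>` element.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect href values of <link rel="canonical"> tags found inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head:
            attributes = dict(attrs)
            rels = (attributes.get("rel") or "").lower().split()
            if "canonical" in rels:
                self.canonicals.append(attributes.get("href"))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def check_canonical(html: str, expected: str) -> list:
    """Return a list of problems; an empty list means the tag looks correct."""
    finder = CanonicalFinder()
    finder.feed(html)
    if len(finder.canonicals) != 1:
        return [f"expected exactly one canonical tag, found {len(finder.canonicals)}"]
    if finder.canonicals[0] != expected:
        return [f"canonical points to {finder.canonicals[0]!r}, expected {expected!r}"]
    return []
```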

Using the Canonical Tag in the HTTP Header

Using the canonical tag in the HTTP header allows compatibility with file formats other than HTML. Google has officially communicated the method.

Let’s look at an example of a website that provides documents in both HTML and PDF versions. The URLs for the HTML and PDF versions are as follows.

http://www.example.com/white-paper.html

http://www.example.com/white-paper.pdf

In this case, you can tell Google that the HTML document above is the preferred URL by sending the rel="canonical" HTTP header when the PDF file is requested.

Example

GET /white-paper.pdf HTTP/1.1
Host: www.example.com
(…other HTTP request headers…)

HTTP/1.1 200 OK
Content-Type: application/pdf
Link: <http://www.example.com/white-paper.html>; rel="canonical"
Content-Length: 785710
(…other HTTP response headers…)

Source: Supported rel="canonical" attribute in HTTP headers (Google Search Central)

To use this method, your server must be configured to send the header, which may require help from a developer.
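On Apache, this header is typically added with the `Header` directive of mod_headers. If your files are instead served by a Python application, a WSGI middleware along the lines of the sketch below could add it. `add_canonical_header` and the path-to-URL mapping are hypothetical; the example mapping reuses the white paper URLs above.

```python
from typing import Callable, Dict, Iterable


def add_canonical_header(app: Callable, canonical_for: Dict[str, str]) -> Callable:
    """WSGI middleware that adds a 'Link: <...>; rel="canonical"' response header.

    canonical_for maps a request path (e.g. "/white-paper.pdf") to the absolute
    URL of its preferred HTML version. Other paths pass through unchanged.
    """
    def wrapped(environ: dict, start_response: Callable) -> Iterable[bytes]:
        canonical = canonical_for.get(environ.get("PATH_INFO", ""))

        def start_with_link(status, headers, exc_info=None):
            if canonical:
                headers = list(headers) + [("Link", f'<{canonical}>; rel="canonical"')]
            return start_response(status, headers, exc_info)

        return app(environ, start_with_link)

    return wrapped


mapping = {"/white-paper.pdf": "http://www.example.com/white-paper.html"}
```

With the standard `wsgiref` module, `make_server("", 8000, add_canonical_header(pdf_app, mapping)).serve_forever()` would serve a hypothetical `pdf_app` with the header attached; the Network tab of the browser’s developer tools then shows whether the `Link` header is present.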

Using the Alternate Tag

Even if the content is the same, having different URLs for the desktop and mobile sites, or having versions in other languages, can lead to duplicate content in search results. To prevent this, use the alternate tag.

The alternate tag is used in conjunction with the canonical tag.

Assume the PC and mobile sites use the following URLs.

PC site URL: https://sample.com

Mobile site URL: https://sp.sample.com

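In the head tag of the PC site, include the following:

<link rel="alternate" media="only screen and (max-width: 640px)" href="https://sp.sample.com">

At the same time, the head tag of the mobile site should include:

<link rel="canonical" href="https://sample.com">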

When writing the tags, insert the URL of the mobile site into the desktop site, and the URL of the desktop site into the mobile site.

The alternate tag can be useful when there is a URL for the mobile site, but be aware that any errors in the description could result in it not appearing in search results. Switching to a responsive design site can alleviate this risk.
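For sites with many pages, the pair of tags can be generated per page instead of written by hand, which reduces the risk of description errors. This Python sketch is a hypothetical helper; the 640px breakpoint is an assumption that should match your mobile site’s actual breakpoint.

```python
from html import escape
from typing import Dict


def separate_url_tags(desktop_url: str, mobile_url: str, max_width: int = 640) -> Dict[str, str]:
    """Return the <link> tag each version of a page needs when served on separate URLs."""
    return {
        # The desktop page points to its mobile version.
        "desktop": (f'<link rel="alternate" media="only screen and (max-width: {max_width}px)" '
                    f'href="{escape(mobile_url)}">'),
        # The mobile page declares the desktop page as canonical.
        "mobile": f'<link rel="canonical" href="{escape(desktop_url)}">',
    }


tags = separate_url_tags("https://sample.com/", "https://sp.sample.com/")
print(tags["mobile"])  # <link rel="canonical" href="https://sample.com/">
```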

How to Confirm URL Normalization?

To confirm that URL normalization has been implemented correctly, use Google Search Console. Follow these four steps.

Log in to Google Search Console.
Select URL Inspection from the menu.
Input the URL to inspect.
Check the Coverage section.

Start by logging in to Google Search Console.

Select URL Inspection from the menu and input the URL from your website that you want to inspect.

When performing a 301 redirect, there are three key points to confirm within the displayed results.

“Redirected page” is shown in the Coverage section.
The user-specified canonical URL.
The canonical URL selected by Google.

If the 301 redirect has normalized the URL successfully, the Google-selected canonical URL will show “Same as user-specified canonical URL.” The same checks apply when you use the canonical tag, except that the Coverage section shows a different status:

-The Coverage section shows “Alternate page (with appropriate canonical tag)”

-User-specified canonical URL

-Google-selected canonical URL

Check if the Coverage section displays “Alternate page (with appropriate canonical tag)”. Also, if normalization is successful with the canonical tag, the URL selected by Google should show “Same as user-specified canonical URL”.

If the results do not appear as described after URL inspection, it indicates that normalization has not been achieved. Ensure that the .htaccess file and the URLs within the canonical tags are correct.

Summary

URL normalization, part of internal SEO, consolidates pages that have identical content but different URLs into a single URL. It reduces the risk of search engines treating the content as duplicate, keeps search engine evaluation from being split across URLs, and improves crawlability. It also lowers the cost of site tracking analysis. There are two methods: 301 redirects set up in .htaccess, and canonical tags. Note that the .htaccess method requires a server that supports .htaccess. When mobile and PC sites use different URLs, normalize them with alternate tags together with canonical tags. Keep these points in mind: normalization takes time, double-check URLs for typos, and avoid using robots.txt.

Author Profile

Mr. Takeshi Amano, CEO of Admano Co., Ltd.

Mr. Takeshi Amano is a graduate of the Faculty of Law at Nihon University. With 12 years of experience in the advertising agency industry, he discovered SEO and began researching it in its early days. Largely self-taught, he conducted experiments and verification on over 100 websites. Drawing on this expertise, he founded Admano Co., Ltd., which is currently in its 11th year of operation. Mr. Amano handles sales, SEO consulting, web analytics (he holds the Google Analytics Individual Qualification), coding, and website development. The company has managed SEO strategies for over 2,000 websites to date.
