Guide to Noindex, Nofollow, Canonical and Disallow

Do you have difficulty judging how and when to use tags, attributes and commands such as Noindex, Nofollow, Canonical or Disallow?

We’ve persuaded Henrik Bondtofte (a Danish SEO wizard) to act as your judge.

Nofollow

Nofollow was introduced in 2005 by Google, with support from Yahoo and MSN. Its purpose was to quell the growing problem of comment spam.

Nofollow is an HTML attribute which instructs most search engines not to follow a link, and thereby not to transfer value to the site being linked to.

A nofollow link does not automatically ensure that the target site will not be crawled. Google, in fact, has a bot designed exclusively for that purpose.

The attribute, however, tells the search engine that you don’t trust or cannot vouch for the content of the website being linked to.

This is also why it’s pretty dumb to use nofollow internally on your own site. It’s a weird signal to send to search engines – that your site has content you can’t vouch for. So, please, don’t do it.

Nofollow can be inserted by adding rel="nofollow" to an active link. In most content management systems (CMSs) you can set it in the WYSIWYG editor when inserting a link.

<a href="http://www.fictivepage.com/page-you-dont-want-to-show" rel="nofollow">

Nofollow can also be implemented using a meta tag. You can see an example below.

<meta name="robots" content="nofollow">

This was, in fact, the original form of Nofollow, but because it applied to every link on the page it proved to be less adaptable and useful than an attribute which can be added to links individually.
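As a sketch of how a crawler might treat the attribute form, the snippet below (Python standard library only; the class name and the fictive URLs are my own, not from any real crawler) separates links a hypothetical crawler would pass value through from those marked rel="nofollow":

```python
from html.parser import HTMLParser

class LinkAuditor(HTMLParser):
    """Collects hrefs, separating followed links from rel="nofollow" ones."""
    def __init__(self):
        super().__init__()
        self.followed, self.nofollowed = [], []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if href is None:
            return
        # rel may hold several space-separated tokens, e.g. "nofollow noopener"
        rel = (attrs.get("rel") or "").split()
        if "nofollow" in rel:
            self.nofollowed.append(href)
        else:
            self.followed.append(href)

auditor = LinkAuditor()
auditor.feed(
    '<a href="http://www.fictivepage.com/trusted-page">ok</a>'
    '<a href="http://www.fictivepage.com/page-you-dont-want-to-show"'
    ' rel="nofollow">spam</a>'
)
print(auditor.followed)     # links that may pass value
print(auditor.nofollowed)   # links flagged with rel="nofollow"
```

Real search engines are far more elaborate, of course, but the per-link decision shown here is exactly what the attribute form enables and the meta tag form does not.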

Disallow: (Robots.txt)

Robots.txt is your website’s gatekeeper.

It’s found in the root of your website, and its directives trump all others.

You should use Disallow if there are entire folders on your website that need to be screened off. And especially if they haven’t been indexed before.

When you create a Disallow in your robots.txt you tell search engines that they shouldn’t crawl the relevant page or folder.

You can even block your entire website, but this doesn’t mean that its pages won’t be indexed.

If your page continues to show up on Google’s index you can use Google Webmaster Tools to request that it be removed again. If, that is, you insist on using robots.txt for the purpose.

If you combine both Disallow and Noindex, the latter will be ignored, because your Disallow command has already told search engines that they may not look at the page.

That’s why it’s superfluous to use these two tags in conjunction. If you want to be absolutely sure that your URL isn’t indexed by search engines then use Noindex and nothing else.

Using Disallow on already indexed pages, for example those with incoming links, means you lose the value which would otherwise have been passed on to the benefit of other pages on your website.

This is why I strongly recommend you use Noindex instead, whenever possible.

If you link internally to pages which are disallowed in your robots.txt, you waste your internal PageRank.

Examples of disallow commands

Disallow: /folder-we-dont-want-to-show/
Disallow: /file-we-dont-want-to-show.html
Allow: /folder-we-dont-want-to-show/single-file-we-want-to-show-from-folder.html

(Allow is supported by Google, but not by all search engines)
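If you want to check how such rules are interpreted, Python’s standard urllib.robotparser module can evaluate paths against a robots.txt. A minimal sketch using the fictive rules above (one caveat: this parser applies the first rule that matches, so the Allow line is placed before the Disallow lines here; Google instead applies the most specific, i.e. longest, matching rule):

```python
from urllib.robotparser import RobotFileParser

# The fictive rules from the example above; Allow comes first because
# urllib.robotparser applies the first rule that matches a path.
robots_txt = """\
User-agent: *
Allow: /folder-we-dont-want-to-show/single-file-we-want-to-show-from-folder.html
Disallow: /folder-we-dont-want-to-show/
Disallow: /file-we-dont-want-to-show.html
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Anything else in the blocked folder is off-limits
print(rp.can_fetch("*", "/folder-we-dont-want-to-show/secret.html"))  # False
# The single excepted file is allowed
print(rp.can_fetch("*", "/folder-we-dont-want-to-show/single-file-we-want-to-show-from-folder.html"))  # True
# Unmatched paths are allowed by default
print(rp.can_fetch("*", "/some-other-page.html"))  # True
```

This is also a handy way to sanity-check a robots.txt before deploying it, rather than discovering a too-broad Disallow after pages drop out of the index.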

Noindex

A Noindex tag tells search engines that the relevant page should not be indexed in their search results.

If you haven’t added a Nofollow to the tag, however, search engines will still follow all the links on the page.

In other words, the page will be read by search engines but it won’t be indexed.

If you don’t want search engines to follow links on your page you need to add a Nofollow to the command.

<meta name="robots" content="noindex">

Don’t index, but follow links.

<meta name="robots" content="noindex, nofollow">

Don’t index and don’t follow links.

The advantage of not adding Nofollow to a Noindex tag is that PageRank can still flow through the noindexed page and on to the pages it links to. This can’t happen when you combine it with a Nofollow attribute.
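To illustrate how the two meta tag variants differ, here is a small Python sketch (standard library only; the class and function names are my own, hypothetical ones) that reads the robots directives from a page’s markup and reports whether a crawler honoring them may index the page and follow its links:

```python
from html.parser import HTMLParser

class RobotsMetaReader(HTMLParser):
    """Collects the directives from <meta name="robots" content="...">."""
    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives |= {d.strip().lower() for d in content.split(",")}

def crawl_policy(html):
    """Return what a directive-honoring crawler may do with this page."""
    reader = RobotsMetaReader()
    reader.feed(html)
    return {
        "index": "noindex" not in reader.directives,
        "follow": "nofollow" not in reader.directives,
    }

# Noindex alone: the page stays out of the index, but PageRank can still flow
print(crawl_policy('<meta name="robots" content="noindex">'))
# Noindex plus nofollow: neither indexed nor followed
print(crawl_policy('<meta name="robots" content="noindex, nofollow">'))
```

The second call is the combination the paragraph above warns about: once "nofollow" is present, the links on the page pass nothing on.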

Canonical tag

The canonical tag is used to nominate a primary page when you have several pages with duplicate content.

If you have five versions of the same page then four of them should be tagged with a canonical tag pointing to the primary page. Search engines can thereby see that this URL is the right one.

Canonical tags are just a guideline, though, which is why it’s not uncommon to find pages in the index bearing a canonical tag. That’s also the reason you will probably lose a little of your PageRank if you use canonical tags instead of Noindex.

Example of a canonical tag

<link rel="canonical" href="http://www.fictivepage.com/best-page" />

The code snippet above needs to be inserted in the page’s <head> section.
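As a quick sanity check, you can read the canonical URL back out of the markup. This sketch (Python standard library only; the class name is hypothetical, the URL is the fictive one from the example above) picks up the first rel="canonical" link it encounters:

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Stores the href of the first <link rel="canonical"> encountered."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            if self.canonical is None:  # keep only the first one found
                self.canonical = attrs.get("href")

finder = CanonicalFinder()
finder.feed(
    '<head><link rel="canonical" '
    'href="http://www.fictivepage.com/best-page" /></head>'
)
print(finder.canonical)  # http://www.fictivepage.com/best-page
```

A check like this is useful when a CMS or template generates the tag for you and you want to confirm every duplicate really points at the primary page.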

Summary

In conclusion, I rank these four methods in the following order:

  1. Noindex tag, without Nofollow attribute
  2. Canonical tag
  3. Robots.txt disallow command. Unless entire folders need to be screened off, in which case I recommend robots.txt above Canonical or Noindex.
  4. Nofollow – In fact, I don’t recommend you use this last one at all.

Like this article?

Please help us convince Henrik to write more articles for DashboardJunkie.com

If you like this guide, please click on the buttons below, or drop a comment.

Help yourself and your network now!


About the author

Henrik Bondtofte runs the successful Danish online marketing company Bondtofte.dk and the sports gear webshop Wolfgear.dk, and has written one of the most popular Danish books about SEO.
