SEOMoz Crawler will crawl an infinite number of pages

Topics: ASP.NET 2.0
Sep 16, 2013 at 11:35 PM
Have an odd one that's just shown up. We help clients manage about half a dozen BE installs (2.8), which are basically stock, other than changes to the css/html of the standard theme. One of the clients has come to us saying that SEOMoz has indexed 10,000 new pages on their blog, and has fired a similar number of 'duplicate content', 'duplicate title', etc. warnings, one for each page.

Logging into their account, I see that none of these pages really exist. They're all for things like:

And so on. The problem being, there isn't enough content for these pages to exist. If I visit one of the URLs above, sure enough, BE will actually return a page for it, and this happens on all 6 of the blogs I was able to test.

The strange thing is, most of these blogs have existed since at least January, but this Moz crawl fired 10,000 pages only in the last week. This implies that it's a change on SEOMoz's part, and not necessarily a flaw on BE's part, but I thought I'd ask here.

So the question is, can I stop BE from responding to these page requests? And if so, how should they be responded to instead? I can think of two ways off the top of my head:
  • Redirect to highest current page
  • 404
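
The two options above could be sketched as a small decision helper. This is only an illustration, not code from the blog engine: `PageAction`, `PagingGuard`, `LastPage`, `Resolve`, and the parameters `totalPosts` and `postsPerPage` are hypothetical names, and wiring the result into the real page (issuing the redirect or setting a 404 status) is left out.

```csharp
using System;

// Hypothetical helper (assumption, not part of the blog engine): decides how
// to answer a request for ?page=N when only a limited number of list pages exist.
public enum PageAction { Serve, RedirectToLastPage, NotFound }

public static class PagingGuard
{
    // Number of list pages needed to show totalPosts posts.
    // An empty blog still has a (blank) page 1.
    public static int LastPage(int totalPosts, int postsPerPage)
    {
        if (postsPerPage <= 0)
            throw new ArgumentOutOfRangeException("postsPerPage");
        if (totalPosts <= 0)
            return 1;
        return (totalPosts + postsPerPage - 1) / postsPerPage; // ceiling division
    }

    // Serve pages that exist; for page x+1 and beyond, either redirect to the
    // highest existing page or answer 404, depending on redirectWhenPastEnd.
    public static PageAction Resolve(int requestedPage, int totalPosts,
                                     int postsPerPage, bool redirectWhenPastEnd)
    {
        if (requestedPage < 1)
            return PageAction.NotFound;
        if (requestedPage <= LastPage(totalPosts, postsPerPage))
            return PageAction.Serve;
        return redirectWhenPastEnd ? PageAction.RedirectToLastPage : PageAction.NotFound;
    }
}
```

With 25 posts at 10 per page, pages 1 to 3 would be served and page 4 would be redirected or rejected, depending on the flag.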
Sep 17, 2013 at 1:11 AM
Sep 18, 2013 at 7:55 PM
Brandozz -

Thanks for the suggestion. I was hoping for something more robust such as BE 'knowing' that there are only x number of pages and responding appropriately when page x+1 is requested but I guess we can't have everything. I still have to pick it all apart but this ambiguously written blog posting has some good ideas:

The funny thing is I'd already written a duplicate content buster that would append unique meta info to paged files, when all along I should have been canonicalizing and no-crawling them in the first place.
Sep 19, 2013 at 8:42 AM

There is an easy solution to this :)

Just go to your default.aspx.cs page

and put this code in there:

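else if (Request.RawUrl.ToLowerInvariant().Contains("?page="))
        // Paged list views: ask crawlers not to index them or follow their links.
        // The standard meta tag name is "robots".
        base.AddMetaTag("robots", "NOINDEX, NOFOLLOW");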
You can also extend the ToLowerInvariant().Contains("?page=") check with additional filters, like


and add noindex or follow to those too.

See it in action live at:

In my opinion, pages like ?page=2, tags, and categories are really meant just for "humans" to use as tools and not for search engines to index.

The only things that really should be indexed are your /pages and /posts.

The rest have no need to be indexed.

So in your robots.txt file you can also allow /pages and /posts and then disallow all of the other folders.

This will solve the issue of duplicate content because you are only having the actual page or posts being indexed and that is it.

Having post lists, lists of posts on pages, keywords, and categories indexed will just devalue your actual posts and pages.

If users want to use those they can use the widgets on your website to get to them.

When they are directed from a search engine to one of these pages like ?page=3, there is a 90% chance they are not being directed to what they were looking for anyway, or they have to search the posts or pages in the list to get to it.

It is much easier to direct them to the actual page or post they are looking for in the first place, instead of having them play Easter egg hunt to find it. :)

Have a great Day!

Brian Davis
Sep 19, 2013 at 7:02 PM
Brian -

Thanks for your reply. It's funny you suggested that, since I'm already ring-fencing how page/category/etc. are handled so all I have to do is add the meta robot noindex directives to my existing Else Ifs, though the way I did it is:
else if (!String.IsNullOrEmpty(Request.QueryString["page"]))

I think I'll try it that way, though the big question is whether the SEOMoz robot will respect it. According to this:

It'll grab everything >up to< finding that tag, which might mean it'll cause problems regardless. Would prefer to be able to do this without having to make custom robots.txt entries for each platform.
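
For anyone comparing the two checks in this thread, here is a rough sketch of how they differ. It is an assumption-laden illustration, not code from the blog engine: `PagedUrlCheck`, `ByRawUrl`, and `ByQueryString` are made-up names, and `HttpUtility.ParseQueryString` stands in for `Request.QueryString` so the snippet can run outside a page.

```csharp
using System;
using System.Web; // HttpUtility.ParseQueryString

// Hypothetical helper (assumption): compares the two ways of spotting a paged URL.
public static class PagedUrlCheck
{
    // Substring test: only matches when "page" is the first query parameter,
    // because it looks for the literal text "?page=".
    public static bool ByRawUrl(string rawUrl)
    {
        return rawUrl.ToLowerInvariant().Contains("?page=");
    }

    // Query-string test: matches a non-empty "page" parameter in any position.
    public static bool ByQueryString(string rawUrl)
    {
        int q = rawUrl.IndexOf('?');
        if (q < 0)
            return false; // no query string at all
        var query = HttpUtility.ParseQueryString(rawUrl.Substring(q + 1));
        return !String.IsNullOrEmpty(query["page"]);
    }
}
```

The practical difference shows up on a URL such as /?tag=news&page=2: the substring test misses it, while the query-string test catches it.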

kbdavis07 wrote:

There is an easy solution to this :)

Just go to your default.aspx.cs page

and put this code in there:

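else if (Request.RawUrl.ToLowerInvariant().Contains("?page="))
        // Paged list views: ask crawlers not to index them or follow their links.
        // The standard meta tag name is "robots".
        base.AddMetaTag("robots", "NOINDEX, NOFOLLOW");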
Oct 7, 2013 at 7:39 PM
Just updating in case anyone else has this problem. The noindex, nofollow didn't work. SEOMoz is still grabbing these non-existent pages. I've made a robots.txt entry for /?page=* so we'll see how that works.
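
A robots.txt entry along those lines might look like the fragment below. It is a sketch under assumptions, not the exact entry used here: it targets all crawlers with `User-agent: *`, and whether a given crawler honors the wildcard rule is not something this thread confirms.

```text
# Sketch only: block paged list URLs for all crawlers.
User-agent: *
# Rules match by prefix, so this covers /?page=2, /?page=3, and so on.
Disallow: /?page=
# Paging under other paths (for example a category or tag URL) needs a
# wildcard, which only some crawlers support.
Disallow: /*?page=
```

Because robots.txt rules are prefix matches, the trailing * in /?page=* should not be needed for the first rule.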