error404.aspx returning invalid status code.

Jul 21, 2010 at 4:52 AM

If you check an invalid link with Googlebot, you will find that the 404 page, pretty as it is, does not tell Googlebot that the page was not found. Googlebot expects a 404 status code when the requested page does not exist. Adding the line of code below to the page load event will fix the issue.

Response.StatusCode = 404;
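
For context, here is a minimal sketch of where that line could sit in the code-behind of error404.aspx. The class name error404 is an assumption (it is not taken from the BE source), and the TrySkipIisCustomErrors line is an optional extra that only matters on IIS 7 or later, where IIS may otherwise swap in its own error page.

```csharp
using System;
using System.Web.UI;

// Hypothetical code-behind for error404.aspx; the class name is assumed.
public partial class error404 : Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        // Report "not found" to the client instead of the default 200.
        Response.StatusCode = 404;
        Response.StatusDescription = "Not Found";

        // Optional (.NET 3.5+): ask IIS 7+ to keep this page's body
        // rather than replacing it with the built-in error page.
        Response.TrySkipIisCustomErrors = true;
    }
}
```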

Comments?

Coordinator
Jul 21, 2010 at 6:15 AM

That is basically a good idea.  However, there's another problem: the two most common ways a person ends up at error404.aspx are (1) when BE code does a Response.Redirect to error404.aspx, or (2) when ASP.NET redirects the person to error404.aspx via the <customErrors> section in the web.config file (this happens when a page is genuinely not found, or sometimes when an unhandled error occurs).

In both cases, the initial response to the client (browser, search engine, etc) is 302 (redirect).  The client gets redirected to error404.aspx and that's when the client gets a 200 -- but could get a 404, like you suggest.

From what I've heard, this is a problem because the search engines are only really considering the first response (currently a 302).  If we set Response.StatusCode = 404 for error404.aspx, then the search engine is going to think that error404.aspx doesn't exist.  In fact, error404.aspx does exist -- it's the original page that doesn't exist.  The search engine is going to consider the first page as moved to another location -- rather than consider the first page to not be found.

This is a known issue with ASP.NET.

The way this could be addressed would be to not use the <customErrors> tag, and instead use an HTTP Module to capture 404 errors, and then do something like a Server.Transfer to error404.aspx (along with using your Response.StatusCode = 404).  In this case, the URL you would see in your address bar would be the original URL, and not error404.aspx.  This would be the most proper way of issuing a 404.  Similarly, the BE code should not use Response.Redirect("error404.aspx").  It should also do a Server.Transfer to error404.aspx.
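
A rough sketch of the kind of HTTP module described above, assuming a module class named NotFoundModule (a made-up name, not part of BE) and error404.aspx in the application root. It would still have to be registered in web.config, and error404.aspx would still need to set Response.StatusCode = 404 itself.

```csharp
using System;
using System.Web;

// Hypothetical module; the name and the "~/error404.aspx" path are assumptions.
public class NotFoundModule : IHttpModule
{
    public void Init(HttpApplication app)
    {
        // Runs whenever an unhandled exception reaches the application.
        app.Error += OnError;
    }

    private static void OnError(object sender, EventArgs e)
    {
        var app = (HttpApplication)sender;
        var ex = app.Server.GetLastError() as HttpException;

        // Only handle 404s here; leave every other error alone.
        if (ex == null || ex.GetHttpCode() != 404)
            return;

        // Clear the error so <customErrors> does not issue its 302,
        // then render error404.aspx under the originally requested URL.
        app.Server.ClearError();
        app.Server.Transfer("~/error404.aspx");
    }

    public void Dispose() { }
}
```

Because Server.Transfer happens inside the same request, the client only ever sees one response, for the URL it originally asked for.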

Jul 21, 2010 at 7:13 PM

Sounds good. Maybe I'm just misunderstanding, because Google has a page that says "see what Googlebot sees". When I hit a page that is not there, I get a status code of 200, which means Googlebot sees the page as 200. Am I wrong?

Coordinator
Jul 21, 2010 at 11:01 PM

You are getting a 200 for error404.aspx, not for original-page.aspx.

With a typical BE installation, when redirected to error404.aspx, there are two responses to the client.  The first response is the 302, which is associated with original-page.aspx.  The second response is the 200, which is associated with error404.aspx.  If you set the StatusCode to 404 for error404.aspx, then the first response for original-page.aspx is still going to be 302, and the second response for error404.aspx is going to be 404.
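
One way to see both responses for yourself is a small console program that does not follow redirects automatically. This is only a sketch: the class and method names are made up, and the URL to test is passed in as the first command-line argument.

```csharp
using System;
using System.Net;

class StatusCheck
{
    // Makes one request without following redirects and reports its status code.
    static HttpStatusCode GetStatus(Uri url, out string location)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.AllowAutoRedirect = false;

        HttpWebResponse response;
        try
        {
            response = (HttpWebResponse)request.GetResponse();
        }
        catch (WebException ex)
        {
            // 4xx and 5xx responses surface as exceptions; the response is still attached.
            response = ex.Response as HttpWebResponse;
            if (response == null) throw;
        }

        using (response)
        {
            location = response.Headers["Location"];
            return response.StatusCode;
        }
    }

    static void Main(string[] args)
    {
        var url = new Uri(args[0]);
        string location;
        var first = GetStatus(url, out location);
        Console.WriteLine("{0} -> {1}", url, (int)first);

        // If the first response was a redirect, request the target once as well.
        if (location != null)
        {
            var next = new Uri(url, location);
            string ignored;
            Console.WriteLine("{0} -> {1}", next, (int)GetStatus(next, out ignored));
        }
    }
}
```

Run against a missing page, the first line printed is the status for original-page.aspx and the second, if there is one, is the status for whatever it redirected to.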

I'm not an expert on this, and I'm certain this type of thing has been discussed on SEO forums.  So it might be good to ask in a place like that.

You might get lucky: when Google sees the 302 and follows it to error404.aspx, it might associate the 404 status code with the first page.  I don't think this is what happens, though.  The surest way to make it work right would be to not issue a 302 that changes the URL in the address bar to error404.aspx.  If the URL stays the same (i.e. no redirect) and a 404 is issued on the original request for original-page.aspx, this would leave no doubt that the 404 is for original-page.aspx.

Jul 21, 2010 at 11:15 PM
Edited Jul 21, 2010 at 11:17 PM

I'm not sure returning 302 is correct either. Why would BE return a 302 found status code when the content really doesn't exist? When does it finally return a true 404?

10.3.3 302 Found

The requested resource resides temporarily under a different URI. Since the redirection might be altered on occasion, the client SHOULD continue to use the Request-URI for future requests. This response is only cacheable if indicated by a Cache-Control or Expires header field.

The temporary URI SHOULD be given by the Location field in the response. Unless the request method was HEAD, the entity of the response SHOULD contain a short hypertext note with a hyperlink to the new URI(s).

If the 302 status code is received in response to a request other than GET or HEAD, the user agent MUST NOT automatically redirect the request unless it can be confirmed by the user, since this might change the conditions under which the request was issued.

      Note: RFC 1945 and RFC 2068 specify that the client is not allowed
      to change the method on the redirected request.  However, most
      existing user agent implementations treat 302 as if it were a 303
      response, performing a GET on the Location field-value regardless
      of the original request method. The status codes 303 and 307 have
      been added for servers that wish to make unambiguously clear which
      kind of reaction is expected of the client.
http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html

Jul 21, 2010 at 11:19 PM

Oh btw, do you know anything about extensions?

http://blogengine.codeplex.com/Thread/View.aspx?ThreadId=220499

Coordinator
Jul 21, 2010 at 11:33 PM

Exactly, the 302 response is not good.  I was trying to say this in my first post.

BE uses the ASP.NET built-in <customErrors> functionality to serve error404.aspx when a page is not found.  This is nice built-in functionality, but it is flawed because it handles this by 302 redirecting the person to the 404 error handler page (error404.aspx).  This is a known problem with the <customErrors> functionality.  BE code itself does the same thing by using Response.Redirect("error404.aspx").

In my first post, I was suggesting that a better way for this to happen would be for BE to not use <customErrors> and instead create an HTTP module that catches legitimate 404 errors and does a Server.Transfer to error404.aspx.  Similarly,  BE code should also use Server.Transfer instead of Response.Redirect when it cannot find something the client is requesting.

Jul 22, 2010 at 12:05 AM

Well, hmm. I dumped my site in favor of a packaged blog system so I wouldn't have to deal with coding the site. I guess BE does the job, but I'm trying to figure out how to handle search engines in the case of old URLs. I would like to return a true 404 on bad links and redirect with a 301 on certain links.

I haven't figured out either one yet...

Looks like I will have to get the BE source and fix the issue and add the feature. I'm sure I wouldn't be the only one with these issues, right?

Coordinator
Jul 22, 2010 at 12:03 PM

To do a proper 404 wouldn't require many changes.  If you're using BE 1.6, the Global.asax file has an Application_Error handler in it that catches all errors -- including 404 errors (at least for ASPX pages or other file extensions that ASP.NET handles).  If you have an older version of BE, you can use the Application_Error handler in BE 1.6's global.asax file.  The Application_Error handler is actually using a Server.Transfer for 500 internal server errors.  It filters out 404 errors.  This part of the code looks like:

if (ex == null || !(ex is HttpException) || (ex as HttpException).GetHttpCode() == 404)
	return;

If you replace that with:

if (ex == null || !(ex is HttpException))
	return;

if ((ex as HttpException).GetHttpCode() == 404)
{
	Server.Transfer("error404.aspx");
	return;
}

Then on a 404 error, it'll do a Server.Transfer to error404.aspx.  You would still want to set Response.StatusCode to 404 in error404.aspx.  The transfer happens entirely on the server side, so the client is unaware of it.  So, if you go to a fake URL, such as:

http://www.your-blog.com/non-existant-page.aspx

You should be server transferred to error404.aspx, the URL in the address bar should remain at that URL above, and the HTTP status code will be 404.

The other change to make would be to replace the Response.Redirect("error404.aspx") calls in the BE code with Server.Transfer("error404.aspx").  Doing a quick search through the code, it looks like there are about 6 files (core and non-core files) that have this.

Sep 15, 2010 at 8:38 PM
Edited Sep 15, 2010 at 8:39 PM

You may want to check out the free extension from SpiceLogic: http://blog.spicelogic.com/post/BlogEngine-Extension-Redirect-Url-Http-301-Search-Engine-Optimization.aspx

It lets you manage a list of Old URL -> New URL mapping pairs from the Admin Panel, and it redirects each old URL to its new URL with a 301 status code.
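
For anyone who would rather hand-roll this, the general idea can be sketched as a small HTTP module. This is not the extension's code: the class name, the hard-coded mapping, and the example paths are all hypothetical, and it assumes the blog runs at the site root so that Request.Path matches the keys.

```csharp
using System;
using System.Collections.Generic;
using System.Web;

// Hypothetical module; a real one would load the mapping from settings.
public class OldUrlRedirectModule : IHttpModule
{
    // Placeholder mapping of old paths to new paths.
    private static readonly Dictionary<string, string> Map =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "/old-page.aspx", "/post/new-page.aspx" }
        };

    public void Init(HttpApplication app)
    {
        app.BeginRequest += OnBeginRequest;
    }

    private static void OnBeginRequest(object sender, EventArgs e)
    {
        var context = ((HttpApplication)sender).Context;

        string target;
        if (!Map.TryGetValue(context.Request.Path, out target))
            return; // Not an old URL; let the request continue normally.

        // Issue a permanent redirect by hand, which also works before .NET 4.
        context.Response.StatusCode = 301;
        context.Response.StatusDescription = "Moved Permanently";
        context.Response.AddHeader("Location", target);
        context.Response.End();
    }

    public void Dispose() { }
}
```

Setting the status and Location header directly avoids Response.Redirect, which always sends a 302.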