Google’s Official Blog has been talking about 404 server response codes (Page Not Found errors) this week, and since I just wrote about 301 Redirect Responses last week, I might as well pick up where they left off and describe how to handle a custom 404 message with a PHP page. If you are a web developer or an SEO, it’s important to understand how web servers work, because it affects how your site performs for both your users and search engines.
Whether you are hosting your website on a Windows IIS Server or a Linux Apache Server (or Unix, or Mac, etc.), the essential job of the web server remains the same. The web server listens for a network request on port 80 and when it receives one, it sends a response. When you type a URL into your browser, let’s give a shameless plug to myself and say http://www.barrywise.com, a network request is sent on port 80 to whoever is hosting that domain. In this case it’s my URL, so my web server is sitting here all day waiting for your request. Really. All day. Before my web server actually sends my home page back to your browser for you to read, it first sends a status code (the server response code) at the top of the HTTP headers. This code tells your browser what to do with the page.
Most of the time you’ll get a 200 code back, which means all is OK, and the browser simply displays the web page. If you typed in a URL of a page which doesn’t exist, however, the web server sends back a 404 code which tells your browser that it could not find that page on the server.
But there’s a problem with the way some CMS software, and even web designers, handle Page Not Found errors. Suppose you don’t want to tell your site visitors that a page is not found. You’d rather send them a different page instead, maybe with alternate content on it or some advertising banners like Google AdSense. In this case, you may be tempted to alter the way your web server handles 404s and just redirect the request to an alternate content page with a 301 status code, or worse yet deliver a 200 code, meaning all is OK with the page. This is called a soft 404 and Google doesn’t like it.
WordPress does correctly handle 404s, but there are some other CMSs which don’t.
If you’d like to know how your server is handling them, run a few URLs through a header status code tool. It will show you the exact response code your server is returning. You want to see 404 for pages which are not there, 200 for normal OK pages, and 301 for any pages you have deliberately redirected with a rewrite rule.
Search engines hate these soft 404s because they prevent an accurate crawl of your real URL structure. The crawler gets false positives for pages which don’t actually exist: since your server returned a status code of 301 or 200, the search engine thinks the page exists, even though anyone reading the page can tell otherwise.
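If you’d rather check from your own code than from an online tool, here is a rough sketch of the same idea in PHP. The function names parseStatusCode() and fetchStatusCode() are made up for this example; get_headers() and stream_context_create() are PHP built-ins. It assumes allow_url_fopen is enabled and that the machine running it can reach your site.

```php
<?php
// Hypothetical helper names for illustration; only get_headers() and
// stream_context_create() are PHP built-ins.

/**
 * Pull the numeric status code out of a status line such as
 * "HTTP/1.1 404 Not Found". Returns null if the line is not a status line.
 */
function parseStatusCode(string $statusLine): ?int
{
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})\b#', $statusLine, $m) === 1) {
        return (int) $m[1];
    }
    return null;
}

/**
 * Request only the headers for $url and do not follow redirects, so a 301
 * is reported as 301 instead of as the status of the page it points to.
 * Needs network access; returns null if the request fails.
 */
function fetchStatusCode(string $url): ?int
{
    $context = stream_context_create([
        'http' => [
            'method'          => 'HEAD',
            'follow_location' => 0,
            'timeout'         => 10,
        ],
    ]);

    $headers = @get_headers($url, false, $context);
    if ($headers === false || !isset($headers[0])) {
        return null;
    }

    // The first header line is the status line.
    return parseStatusCode($headers[0]);
}
```

Calling fetchStatusCode('http://www.yoursite.com/pagg.php') against a correctly configured server should come back with 404 for a page that isn’t there.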
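One way to spot a soft 404 on your own site is to request a URL you know cannot exist and look at what comes back. The sketch below only covers the decision part; classifyMissingPageResponse() and makeProbePath() are names I invented for the example, and you would feed it the status code from whatever header checker you use.

```php
<?php
// Hypothetical helpers for illustration; not part of PHP or any CMS.

/**
 * Given the status code a server returned for a URL that is known NOT to
 * exist, describe whether the missing page was handled correctly.
 */
function classifyMissingPageResponse(int $statusCode): string
{
    if ($statusCode === 404) {
        return 'ok: missing page reported as 404';
    }
    if ($statusCode >= 300 && $statusCode < 400) {
        return 'soft 404: missing page is being redirected';
    }
    if ($statusCode >= 200 && $statusCode < 300) {
        return 'soft 404: missing page reported as OK';
    }
    return 'other: unexpected status ' . $statusCode;
}

/**
 * Build a random path that almost certainly does not exist on the site,
 * so the response to it shows how missing pages are treated.
 */
function makeProbePath(): string
{
    return '/' . bin2hex(random_bytes(8)) . '-does-not-exist.html';
}
```

If the probe path comes back as anything in the "soft 404" group, search engines are being told that a page exists when it doesn’t.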
Make sure you are not redirecting Page Not Found errors, and use custom 404 messages instead. With Windows you can display a custom ASP or ASP.NET page, and with Linux/Apache you can have a custom PHP page display for all 404 errors. Then you can process the requested file name/URL and generate custom messages for your 404 errors. For example, in Apache, just edit your httpd.conf file and add this:
ErrorDocument 404 /pagenotfound.php
Then create a PHP file called pagenotfound.php and use getenv("REQUEST_URI") to obtain the page which the visitor was trying to view.
For example, let’s say someone mistyped your web address and entered http://www.yoursite.com/pagg.php, instead of http://www.yoursite.com/page.php. Your web server will return a status code of 404, Page Not Found, telling search engines the page isn’t there, but the actual page displayed to the user will be the file you created called pagenotfound.php. Now in that PHP page, you can place something like this code:
Sorry, you were looking for <?php echo htmlspecialchars((string) getenv("REQUEST_URI"), ENT_QUOTES, "UTF-8"); ?>, but we couldn't find that page. Is one of these other pages what you were looking for?
To the end user, this will look like:
Sorry, you were looking for /pagg.php, but we couldn't find that page. Is one of these other pages what you were looking for?
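Since the requested path is supplied by whoever made the request, it’s safer to escape it before printing it into the page. Here is a small sketch of how the message could be wrapped in a function; renderNotFoundMessage() is a name I made up, not something PHP or Apache provides.

```php
<?php
// renderNotFoundMessage() is a hypothetical helper for illustration.

/**
 * Build the "not found" sentence for the given request path,
 * escaped so it can be echoed into an HTML page.
 */
function renderNotFoundMessage(string $requestUri): string
{
    // The path comes from the request, so escape HTML special characters.
    $safeUri = htmlspecialchars($requestUri, ENT_QUOTES, 'UTF-8');

    return 'Sorry, you were looking for ' . $safeUri
        . ", but we couldn't find that page.";
}

// In pagenotfound.php this could be used roughly like so:
//   echo renderNotFoundMessage((string) getenv('REQUEST_URI'));
```

Keeping the message in one function also makes it easy to reuse the same wording anywhere else you report a missing page.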
And then of course you can list several alternates for the visitor, be it other content pages or advertising banners, etc. You can even have the server email you every time someone gets a 404 message on your website, if you want to micromanage the whole thing.
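As a rough illustration of both ideas, the sketch below picks the closest matching pages from a list you supply and builds the text of a notification. suggestPages() and buildNotFoundReport() are hypothetical names, and the page list and email address in the comments are placeholders; levenshtein() and mail() are PHP built-ins.

```php
<?php
// Hypothetical helpers for illustration; the page list you pass in is
// assumed to be maintained by you (for example, from your sitemap).

/**
 * Return up to $limit known paths, closest to the requested path first,
 * using edit distance. Note: on PHP versions before 8.0, levenshtein()
 * only accepts strings of up to 255 characters.
 */
function suggestPages(string $requestedPath, array $knownPaths, int $limit = 3): array
{
    $distances = [];
    foreach ($knownPaths as $path) {
        $distances[$path] = levenshtein($requestedPath, $path);
    }
    asort($distances); // smallest distance first, keys preserved

    return array_slice(array_keys($distances), 0, $limit);
}

/**
 * Build a plain-text report describing the missing page.
 */
function buildNotFoundReport(string $requestedPath, string $referer = ''): string
{
    $lines = ['404 Page Not Found', 'Requested: ' . $requestedPath];
    if ($referer !== '') {
        $lines[] = 'Referer: ' . $referer;
    }
    return implode("\n", $lines);
}

// Possible use inside pagenotfound.php (address is a placeholder, and
// mail() needs a working mail setup on the server):
//   $alternates = suggestPages('/pagg.php', ['/page.php', '/about.php']);
//   mail('you@example.com', '404 on my site', buildNotFoundReport('/pagg.php'));
```

Edit distance is just one simple way to pick alternates; a hand-picked list of your most popular pages would work too.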