Tuesday, March 27, 2007

Supplemental Listings - How To Avoid Them

Simply put: 1) don't trip Google's duplicate content filter; 2) improve a page's PageRank by getting more quality, relevant inbound links.

1) Install a 301 redirect in .htaccess from non-www to www (or the other way around - your preference). You can also register your site with Google Webmaster Tools and set your preferred domain (though this feature is new, so use it at your own risk).

Installing 301 redirects will usually not make supplemental pages go away by themselves, but 1) they tell Google that two URLs refer to the same page, and 2) they consolidate PageRank onto one URL, so there's no PageRank leakage. Higher PageRank also increases the chance of pages returning to the main index.
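
For the non-www to www direction, here's a minimal .htaccess sketch (it assumes Apache with mod_rewrite enabled - swap in your own domain):

# Force the www hostname with a 301 redirect
RewriteEngine On
RewriteCond %{HTTP_HOST} ^domain\.com$ [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]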

2) Install a 301 redirect from /index.html to /. The simplest fix is not to use index.html/index.htm at all.
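
A .htaccess sketch for this one (again assuming Apache with mod_rewrite; it matches against THE_REQUEST so Apache can still serve index.html internally without a redirect loop):

# 301 any direct request for index.html/index.htm to the bare directory URL
RewriteEngine On
RewriteCond %{THE_REQUEST} ^GET\s/(.*)index\.html?[\s?] [NC]
RewriteRule . /%1 [R=301,L]
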
3) Don't link to http://www.domain.com/index.html. Instead, link only to http://www.domain.com/. Same deal with linking to index.html in directories - link to http://www.domain.com/dir/ instead of www.domain.com/dir/index.htm.

4) If you use Dreamweaver, write links to home pages by hand. Why? Because Google indexes URLs, not pages. Even though those URLs point to the same page, Google treats them as unique addresses on the Interweb.

5) Add a META ROBOTS NOINDEX tag to duplicate pages to prevent Google from indexing them (or block them with robots.txt):
<meta name="robots" content="noindex">
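
The robots.txt route looks like this (a sketch - /print/ is a hypothetical directory of duplicate printer-friendly pages):

User-agent: *
Disallow: /print/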

6) Check the web for copies of your page. You can use Copyscape. I just take a unique snippet sample and run it through Google.

7) Prevent multiple URLs from referring to the same page. For example,
www.domain.com/index.php?user=halfdeck&page=supplemental
www.domain.com/index.php?page=supplemental&user=halfdeck
www.domain.com/index.php?page=supplemental&user=halfdeck&reply=23
You'd want to noindex two of those three URLs. If other people link to those pages and/or those URLs are already in Google's index, you need to 301 redirect them to the one canonical version.
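
Here's a minimal PHP sketch of that redirect - it rewrites any non-canonical query string into one fixed parameter order and 301s to it (the parameter names are just the ones from the example above, so adjust for your own app):

<?php
// The canonical parameter order; anything not listed gets dropped
$canonical_order = array('user', 'page');

parse_str($_SERVER['QUERY_STRING'], $params);

$canonical = array();
foreach ($canonical_order as $key) {
    if (isset($params[$key])) {
        $canonical[$key] = $params[$key];
    }
}
$canonical_qs = http_build_query($canonical);

// 301 to the canonical URL unless the request already matches it
if ($_SERVER['QUERY_STRING'] !== $canonical_qs) {
    header('HTTP/1.1 301 Moved Permanently');
    header('Location: http://www.domain.com/index.php?' . $canonical_qs);
    exit;
}
?>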

8) Return a 404 for invalid URLs. For example, if your site is dynamic, make sure URLs like
http://www.domain.com/bogusurl/thatdoesnt/really/exist/
don't return a 200 status. To check HTTP status codes, I use Web Sniffer. If necessary, use a PHP header to validate URLs. For example, if your URL is /blue/car/ and you only have entries for red and green cars in your database, issue a 404.
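
Here's a sketch of that /blue/car/ example in PHP (the color array stands in for a real database lookup, and 404.php is a hypothetical error page):

<?php
// Colors that actually exist in the database
$valid_colors = array('red', 'green');
$color = isset($_GET['color']) ? $_GET['color'] : '';

// Unknown color: send a real 404 instead of a 200 with an error message
if (!in_array($color, $valid_colors)) {
    header('HTTP/1.1 404 Not Found');
    include '404.php';
    exit;
}
?>
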
9) Get rid of long session IDs:
www.domain.com/?session_id=d9j5034jkfgk94HHdfgasFG5sdf
Use cookies instead.
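
If you're on PHP, two php.ini settings keep the session ID in a cookie and out of your URLs (a sketch - it assumes your visitors accept cookies):

; Don't append session IDs to URLs; store them in a cookie only
session.use_trans_sid = 0
session.use_only_cookies = 1
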
10) Too many parameters in your URL. Things like this may change in the future, but according to Google, having too many parameters in your URL may prevent it from being listed in the main index. In Google's words: "For example, the number of parameters in a URL might exclude a site from being crawled for inclusion in our main index; however, it could still be crawled and added to our supplemental index."

11) Don't put the same content on multiple domains. Well, you can, but then expect to run into supplemental issues.
12) Use unique TITLE/META description tags to create unique SERP listings. Each page should share as few words as possible in its title/meta description with any other page. For example:

Title: SEO4FUN Search Engine Optimization - Supplemental Hell
Description: SEO4Fun search engine optimization - A checklist of things to do to avoid getting trapped in supplemental hell.

Title: SEO4FUN Search Engine Optimization - Supplemental Listings
Description: SEO4Fun search engine optimization - A checklist of things to do to stay out of the supplemental index.
These are NOT unique enough. They share too many words: "SEO4FUN", "search engine optimization", etc. You want:

Title: Google's Supplemental Index - Traffic Black Hole
Description: How do you get out of supplemental hell? Is it even possible? Who do I listen to? Nobody agrees on anything!!

Title: Squeezing Out More Traffic with Del.icio.us
Description: Not getting enough traffic off Google, Yahoo, or MSN? Try riding the social bookmarking wave.
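
In the HTML head, that first pair would look like:

<head>
<title>Google's Supplemental Index - Traffic Black Hole</title>
<meta name="description" content="How do you get out of supplemental hell? Is it even possible? Who do I listen to? Nobody agrees on anything!!">
</head>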

13) Make sure your META description tags are no shorter than 60 characters. The point of using META description tags is to keep Google from digging into your BODY HTML and fishing out some irrelevant text from your navigation bar or listbox. However, if your META descriptions are too short (and 60 chars sounds like a lot, but it really isn't), Google will still dig into your source code. Take this description:


50 characters long description tag. Make it longer.
That one is only about 50 characters. To keep Google from making a mess, make sure your META descriptions are at least 60 chars. 60 is a ballpark figure, so in reality you may be safe just going with a minimum of 50. Also, since Google constantly improves its snippet generation process, keep your eyes open for updates on this. But as they say, err on the side of caution.
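
If your pages come out of a script, here's a quick PHP sketch for flagging short descriptions (the 60-char cutoff is just the ballpark figure from above):

<?php
// Warn when a meta description falls under the ~60 char rule of thumb
function check_description($desc) {
    $len = strlen($desc);
    if ($len < 60) {
        echo "Only $len chars - pad this one out: $desc\n";
    }
}

check_description('50 characters long description tag. Make it longer.');
?>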

14) Move navigation below content in your source. Google parses HTML to construct the description snippets that show up in site: search results. The parsing algo is still somewhat simple-minded. To make sure Google finds your content, it's critical that you position content above navigational elements in your source code. Sometimes Google will fish out the right text even when it's buried deep in the source, but sometimes it won't. If it's positioned right below BODY, Google will always find the right text.
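
Here's a sketch of the idea (hypothetical markup - the nav still renders on the left visually, it just comes last in the source, with the CSS living in your stylesheet):

#content { margin-left: 200px; }
#nav { position: absolute; top: 0; left: 0; width: 180px; }

<body>
<div id="content">
<h1>Supplemental Listings - How To Avoid Them</h1>
<p>The meat of the page goes here, right below BODY...</p>
</div>
<div id="nav">...navigation links...</div>
</body>
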
15) Get rid of TABLEs (optional... then again, not really). Use CSS. As I said, Google has problems with content buried inside structurally complex HTML. That means it can choke on content hidden away in a TR. Jill Whalen will disagree, but she's flat wrong when it comes to TABLEs, since she doesn't take Google's description snippet generation into account (she also claimed PageRank doesn't matter, a statement she later retracted after Big Daddy). Remember, TABLEs are for tabular data. You don't need TABLEs to make a page look pretty - for example, look at the structure of this page.

16) Store CSS/JS in external files. Don't clutter your HTML (or XHTML, whatever the case may be). Make life easier for Google and you improve your chances of a clean listing.
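
For example (the file names are placeholders):

<link rel="stylesheet" type="text/css" href="/css/style.css" />
<script type="text/javascript" src="/js/site.js"></script>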

17) Validate your page. "Look at Google - it doesn't validate!" is a line strictly for newbs. Pros don't even need a validator to write clean code, and they don't leave a mess like amateurs do. Remember, Googlebot is NOT GOD. It's not omniscient. It doesn't know how to parse HTML perfectly. Invalid code may rank, but will all invalid HTML get indexed? At least validate your pages to the point where there are no serious validation errors. Go to w3.org to validate your page. Right now. I mean it.
18) Start off your main content with an H/P combination to make the most important part of your page easy for Google to find. This doesn't mean you should litter your page with H2 and H3 tags for purely presentational purposes or in hopes of ranking higher. H tags should be used to tell visitors (and Google) how content is organized on a page.
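
Something like this, right at the top of the content (the text here just echoes this post):

<h1>Supplemental Listings - How To Avoid Them</h1>
<p>Two things keep a page out of the supplemental index: unique content and quality, relevant inbound links...</p>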

19) Beef up your page. More content not only means your pages actually have some useful information for visitors; it also means that as you add more words, each page becomes less similar to every other page out there. Google will also be more reluctant to index pages without much content on them. Less content tends to mean less value.

Thanks,
Sonika

6 comments:

Afzal Khan said...

Good read and nice to get indepth article from your end.

Njoy!!!

ankurindia said...

extremely useful info

Prashant Vikram Singh said...

HI Sonika,
Nice post, hey try explaining 404 invalid urls

Anonymous said...

Hi sonika,

gud post just bookmarked it

ill visit soon to read few more imp posts

thanks

ankurindia said...

these are really nice tips sonika

Anonymous said...

hi Sonika,
Nice effort!
if you have time you are welcome at
http://global-outsourcing.blogspot.com