How I got more pages from my sitemap into the Google index

Written on November 3, 2011, 10:11 pm

Disclaimer: for SEO types this might all seem very obvious, but I thought someone might find it interesting.

Wow! In July this year I tweeted that I was working on improving the number of URLs I had in the index:

Working on getting more urls in my sitemap into the index. Currently at 308 of 1804 links. Hoping todays changes will improve that ratio.

Just checked my Webmaster Tools and discovered that of the 1885 pages on my site today, 1790 are in the index. So what did I do?

You have to be careful with Monetization

I have an Amazon affiliate account and when I was building my World War Two timeline project it seemed like a good idea, in addition to linking to Wikipedia articles, to link to Amazon search results for the title of each data point (using my affiliate link of course).

For something like my entry on D-Day, this kinda worked. You got a link that looked something like this:

Buy a book about 'D-Day'

This makes sense, right? I have to monetize this thing somehow, and maybe people want to buy a book about the topic they have clicked on? It turned out very few people actually did that, and in the end it was just adding a distracting link to my content.

The cartoon caricature of the Googlebot in my head saw those pages of affiliate links and frowned, so I removed them (This is the reason that in my head the Googlebot is an enthusiastic labrador puppy).

Since then I have built iPhone and Android apps to provide new ways of getting to my content and found that is a much more effective way of monetizing my work.

Spammy user profiles

Every now and then a medical supplies peddler or enthusiastic pornographer will find their way to my user signup page. Their profile descriptions are often entertaining, but perhaps something of a red flag for the G-rated Googlebot. To prevent their potentially nasty content from affecting my page rank, I use robots.txt to hide those pages until I have time to nuke them.

User-agent: *
Disallow: /user.php

(A Disallow rule only takes effect inside a User-agent group, and it matches by URL prefix, so this also covers query-string variants like /user.php?id=123.)

Titles, Titles, Titles

Thanks to the HTML suggestions section in the Webmaster tools I noticed that I had lots of duplicate titles for my content. Each historical data point page was using the generic "World War Two Timeline Project" title instead of a title that summarised the information on the page. It was a very simple change to update these pages to render a more appropriate title and suddenly the number of duplicates for my site dropped dramatically.
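The fix itself is tiny. As a sketch of the idea (the site isn't necessarily written in Python, and page_title is a hypothetical helper), the title is derived from the data point instead of being hard-coded to the site name:

```python
SITE_NAME = "World War Two Timeline Project"

def page_title(event_name=None):
    """Build a unique, descriptive <title> for each data-point page.

    Pages without an event fall back to the generic site name, which is
    exactly the duplicate-title situation Webmaster Tools flags.
    """
    if not event_name:
        return SITE_NAME
    return f"{event_name} - {SITE_NAME}"

print(page_title("D-Day"))  # "D-Day - World War Two Timeline Project"
print(page_title())         # "World War Two Timeline Project"
```

Any scheme works as long as each page's title is unique and summarises that page's content.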

Static pages for map popups

This one is actually still a work in progress.

I have lots (1128) of Info Windows on my Patrick O'Brian Mapping Project containing content that should be searchable by Google but currently isn't.

My first attempt at solving this problem was to create a whole pile of static pages that were chained together (with next and previous links) in a giant linked list. I think the Googlebot started down my list of pages and decided it was getting mired in some pathological labyrinth and maybe it would come back another day.
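One alternative to chaining the pages together is to list every static popup page in the sitemap directly, so the crawler can reach each one in a single hop rather than walking a 1128-link chain. A minimal sketch (the URL pattern here is hypothetical, not the project's real routes):

```python
def sitemap_xml(base_url, page_ids):
    """Build a sitemap listing every static popup page directly,
    so the crawler never has to traverse a next/previous chain."""
    urls = "\n".join(
        f"  <url><loc>{base_url}/place/{pid}</loc></url>" for pid in page_ids
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{urls}\n"
        "</urlset>\n"
    )

print(sitemap_xml("http://example.com", range(3)))
```

A flat sitemap gives every page the same crawl depth, instead of burying page 1128 at the bottom of a pathological labyrinth.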

I think that this is something the Googlebot could probably figure out on its own (indexing the data from the JavaScript map), so I am hoping that by the time I get around to revisiting the issue, they will have worked it out. I am looking at you, Chris Broadfoot ;)

Conclusions

So: pretty simple and hopefully intuitive changes, and it looks like they worked. A good way of getting into the mindset for this type of thing is to watch some of Matt Cutts' Webmaster Q&A videos. His even-tempered, sensible explanations of the various things Google does to improve search were enough to get me thinking about this stuff the right way.
