Robots.txt file validator

by Sapphire (December 13, 2006)

My research into whether a missing robots.txt file could really be the cause of Google shunning Project Mai Tai led me to a discovery: my robots.txt files suck. The Wayback Machine was no longer archiving any of my sites because of something in the file. I’m not sure what it was, but I copied the robots.txt file from another site that was indexed, and then ran it through this lovely robots.txt file validator. After confirming that the new file was both valid and able to play well with Wayback, I updated the file on all my sites, tweaking as needed for special folders I want the bots to avoid.

I still don’t think this is what’s keeping Project Mai Tai out of Google. But can a valid, well-written, updated robots.txt file help your site?  Well, again, I’m not entirely sure, but I think it’s worth a try.
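If you would rather sanity-check a robots.txt file locally, here is a minimal sketch using Python’s standard-library urllib.robotparser. The file contents, the folder names, and the "ia_archiver" user-agent string are assumptions made up for illustration, not the actual file from my sites. Note that this only tells you which URLs a given bot would be allowed to fetch under the rules; it is not a full syntax validator, since lines the parser doesn’t understand are silently ignored.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the folder names are invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /cgi-bin/
"""


def parse_robots(text):
    """Build a parser from robots.txt text held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = parse_robots(ROBOTS_TXT)

# A normal page should be open to any bot under these rules.
print(parser.can_fetch("Googlebot", "/index.html"))            # True

# A page inside a "special folder" should be off limits.
# "ia_archiver" is assumed here as the archive crawler's user-agent.
print(parser.can_fetch("ia_archiver", "/private/notes.html"))  # False
```

Swap in the text of your own file and the paths you care about; if a page you expect to be crawled comes back False, the rules are blocking it.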

4 Responses to “Robots.txt file validator”

  1. SarahG said:

    What was in your old robots.txt file? If it’s invalid or blocks a bot/spider, then yes, it could cause problems. Sometimes it’s safer to have no file at all (or a blank file, to prevent 404s) than to have one that could stop a crawler.

    Have you also tried Google Sitemaps? It can tell you whether there’s a reason Google isn’t visiting your site, and it will validate your robots.txt file at the same time and flag any problems that may hinder Googlebot.

  2. Sapphire said:

    I should clarify: I had several sites, including this one, with “invalid” robots.txt files, but they were all indexed and getting traffic. Additionally, I have a new site which I forgot to put a robots.txt file on - and Yahoo and MSN are indexing and sending traffic, but even though the Googlebot comes by every day, the engine has yet to index a single page.

    The other day, I thought maybe the lack of a robots.txt file was causing this, but I’ve talked to other people who never use them so I know that’s not the case.

    My current best guess is that it’s something to do with the domain not being brand new - someone owned it before in 2004. They had a simple blog with nothing that seemed likely to have tripped a Google filter… but I’m wondering if perhaps when you buy a domain that’s existed before, Google waits a while before indexing. We’re coming up on exactly three months since I bought the domain, so we’ll see if Google starts indexing.

  3. Doug Karr said:

    What does http://www.google.com/webmaster say? You can absolutely verify what Google is doing there.

  4. Sapphire said:

    Doug, it confirms that Google is crawling the site successfully every couple of days, and that most of the pages have “low pagerank” and a few have none. Everything seems to be right… it’s just not indexed.

    I bought another domain around the time I bought this one, and really did nothing but install WordPress on it, but it got indexed immediately. I really think the problem with this one domain is that the name is not totally brand-spanking new. The domain was owned by someone in 2004, but then they let it lapse. Even though they don’t appear to have done anything shady with it, I have to think maybe Google puts a hold on any previously owned domains.

    We’ll see. I’m getting traffic from other engines so I think I’ll just hold steady and hope Google gets over this.
