Googlebot, Javascript, Ajax, and Cookies – “It’s Complicated”


So, following on from the previous post, “Can the Googlebot read JavaScript? Ajax? Cookies?”, the wait is over and the results are in…
Overall, the results are quite confusing, but they lead me to some strange, tentative conclusions:

  • Googlebot CAN read JavaScript
  • Googlebot CAN execute, and read the results of, AJAX requests
  • Googlebot CAN NOT store/read Cookies

But the killer is:

  • Googlebot DOES NOT use this information in search relevancy calculations

If we take a look at the cached version of the page, it doesn’t actually tell us anything about the Googlebot. The only string that can be seen is the one that uses inline JavaScript. That is pretty much expected: the Ajax requests use relative rather than absolute URLs, so they fail when the page is served from the cache.
And, as we predicted, the cookie is not set, telling us that the Googlebot didn’t/can’t read cookies.
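
To illustrate why a relative Ajax URL would break on a cached copy, here is a minimal sketch in Python. The page URL, cache URL and file path are all made-up placeholders, not the real test locations, and it assumes the cached copy is served from a different host with nothing overriding the page's base URL.

```python
from urllib.parse import urljoin

# Hypothetical URLs, for illustration only (not the real test page).
ORIGINAL_PAGE = "http://example.com/blog/googlebot-test/"
CACHED_PAGE = "http://cache.example.net/search?q=cache:example.com/blog/googlebot-test/"

# Hypothetical relative path, standing in for the one used in the Ajax call.
AJAX_PATH = "ajax/test-string-2.txt"


def resolve(page_url: str, relative_path: str) -> str:
    """Resolve a relative path the way a browser would, against the page URL."""
    return urljoin(page_url, relative_path)


on_site = resolve(ORIGINAL_PAGE, AJAX_PATH)
on_cache = resolve(CACHED_PAGE, AJAX_PATH)

print(on_site)   # points at the original site
print(on_cache)  # points at the cache host, where the file does not exist
```

An absolute URL in the Ajax call would resolve to the same place in both cases, which is the distinction the paragraph above is drawing.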

Googlebot reading Javascript, example 1

How do they know the string exists if they didn't parse the JavaScript?

The ability of the Googlebot to read the JS and Ajax is not immediately obvious.

JavaScript

Indeed if we take the first string: and search for it in Google, we get no results. Try it yourself.

….However

A search for the line immediately before the JavaScript text (“so… Relentless Marauder becomes…”) returns the page.
When you check out the page preview, the search query “so… Relentless Marauder becomes…” is highlighted.
Right below it is our JavaScript text….

This is really interesting as it shows the JavaScript text in the right context on the page. It also shows that:

Googlebot is able to read the JavaScript as a string of text.

But the only place that you can see evidence of that is in the page preview section, which is quite strange.

What makes this more interesting is that the JavaScript string is conspicuously missing in the snippet of this search result.
Given the exact search query we are presented with a snippet that shows the context of the query on the target page, but the JavaScript text is simply missing altogether.

The JavaScript string is missing in the snippet.

This would suggest to me that there are perhaps two kinds of Googlebot: the classic text-based crawler and a more advanced, browser-type bot that can handle JavaScript, CSS, etc.

On the results page here we see Google showing us two different interpretations of the content of our page. OK, in this case it is only two words, but in theory the difference could be huge.

Ajax Requests

So we know that the Googlebot can read the JavaScript as a string of text, but chooses not to use it when calculating relevancy to a search query. But what about the Ajax requests we make?

GoogleBot reading the Ajax result, in context on the page

If we take the second test string: and run another Google search for that phrase, we see a strange result. Try it yourself.
As you can see, the search returns the file that contains that string. On the main test page we call this file with an Ajax call and show the contents. This file is not linked from any other page which would lead me to think that the GoogleBot has understood the structure of the Ajax request and sent off a spider to grab the contents of the file.

Whilst I can’t completely rule out that the GoogleBot got there via an external link, I think given the time frame and obscurity of the file location, this is very unlikely.

Again, if we search for the text immediately prior to the string we are testing, we get a results page with the string showing up in the page preview but not in the snippet. This is exactly the same as with the JavaScript text, and again we have Google showing us two different interpretations of the page content.

With the third test string we also run some referrer filtering to make sure that the text is only output when it is called via an Ajax request into the main page.
Again, this file was found and indexed. See here.
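
For anyone unfamiliar with referrer filtering, the idea can be sketched roughly as below. This is not the actual test code: the page URL, the placeholder string and the function name are all assumptions made for illustration. The one real detail is the header name, which HTTP spells `Referer`.

```python
from typing import Mapping

# Hypothetical values, for illustration only.
MAIN_PAGE_URL = "http://example.com/blog/googlebot-test/"
TEST_STRING = "placeholder for the third test string"


def ajax_response(headers: Mapping[str, str]) -> str:
    """Return the test string only when the request was referred by the main page."""
    # Header names are case-insensitive, so normalise them before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    referrer = normalised.get("referer", "")
    if referrer.startswith(MAIN_PAGE_URL):
        return TEST_STRING
    # A direct request (no referrer, or a different one) gets an empty body.
    return ""


print(repr(ajax_response({"Referer": MAIN_PAGE_URL})))  # the test string
print(repr(ajax_response({})))                          # empty
```

A request that arrives without the main page as its referrer gets nothing back, which is what makes this test stricter than the second one.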

But, again, when we run a search for the text prior to the expected string, we are presented with the Ajax text shown in the page preview but not in the snippet. Try it yourself. This is perhaps the strongest evidence I have seen yet that, in some way, the Googlebot DOES execute Ajax requests and CAN read the resulting output.

Cookies

The fourth test yielded no results whatsoever, backing up the idea that the Googlebot CAN NOT store and read Cookies (which is good, as it’s the main premise behind my other post “Faking Backlinks using the Referrer“).
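
The details of the fourth test are in the previous post rather than this one, so the following is only a guess at the general shape of such a test: set a cookie on the first request, and output the test string only if the cookie comes back on a later one. The cookie name, the placeholder string and the function are hypothetical.

```python
from http.cookies import SimpleCookie
from typing import Tuple

# Hypothetical values, for illustration only.
COOKIE_NAME = "bot_test"
TEST_STRING = "placeholder for the fourth test string"


def handle_request(cookie_header: str) -> Tuple[str, str]:
    """Return (Set-Cookie header value, response body) for one request."""
    received = SimpleCookie()
    received.load(cookie_header)
    if COOKIE_NAME in received:
        # The client stored the cookie and sent it back: show the string.
        return "", TEST_STRING
    # First visit, or a client that ignores cookies: set it and show nothing.
    outgoing = SimpleCookie()
    outgoing[COOKIE_NAME] = "1"
    return outgoing[COOKIE_NAME].OutputString(), ""


first = handle_request("")             # no cookie yet
second = handle_request("bot_test=1")  # cookie sent back
print(first)
print(second)
```

A client that never sends the cookie back stays on the first branch forever, so the string never appears for it. That would be consistent with the test yielding no results.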

Conclusion

So, in conclusion, it would seem that the Googlebot has the ability to parse JavaScript content and to read the results of Ajax requests, but for some reason these elements are not being used to calculate search relevancy in the same way that on-page text is.

Why is this?


I would guess that Google is doing this to protect the quality of its search results. At the moment the idea of reading JavaScript and Ajax seems to be new/experimental, and it could mess up a lot of stuff in the SERPs if they were to suddenly switch to valuing such text. I would guess the current algorithm for calculating search relevancy has been hammered out over the last 15 or so years and is very, very sensitive. This would also give some reasoning as to why Facebook comments are being crawled/indexed: the source can be trusted, and the possibility of ‘poisoning’ the SERPs with blackhat-style stuff must surely be minimal. But these are just my thoughts.

Finally, I would like to say a big thanks to everyone who helped spread the previous post. It was great to get feedback from other great SEOs out there.


2 Responses to Googlebot, Javascript, Ajax, and Cookies – “It’s Complicated”

  1. SEO Mofo says:

    Thanks for taking the time to run these experiments. I love reading stuff like this.

    I have a few things to add that might clarify your results.

    As you pointed out, Google seems to understand JavaScript when it renders Instant Previews, but not when calculating relevance scores or displaying SERP snippets. However, this shouldn’t be interpreted as one single entity (Google) acting differently under different circumstances–rather, it should be interpreted as two separate/independent processes.

    The process that generates Instant Previews is programmed to execute/render a web page (including JavaScript) just like a browser would. [http://support.google.com/webmasters/bin/answer.py?hl=en&answer=1062498]

    On the other hand, Google calculates relevance scores from the raw HTML code returned by the server after the initial request. External resources aren’t fetched nor executed prior to indexing. Doing so would be much too expensive in terms of memory and CPU (at least for now). This is why Instant Previews are often generated on the fly: Google doesn’t want to spend their resources on that until a User explicitly requests it.

    The text contained in the Instant Preview snippets is also calculated on the fly, since it changes depending on the search query.

    With regards to AJAX requests, I think you’re giving Google too much credit. You concluded this:

    Googlebot CAN execute, and read the results of, AJAX requests

    …and this:

    As you can see, the search returns the file that contains that string. On the main test page we call this file with an Ajax call and show the contents. This file is not linked from any other page which would lead me to think that the GoogleBot has understood the structure of the Ajax request and sent off a spider to grab the contents of the file.

    There’s an important detail here that should be emphasized: your blog post does NOT rank for phrases 2 or 3; only the external files do. This means that Google did NOT understand the nature of the AJAX request, because the content of those external files was NOT associated with the page (i.e., the blog post) that embedded them.

    In other words, all Google did was find a string that looked like a URL…tried it…saw that it worked…and treated it like a new web page. I’ve written a post about this, if you’re interested:

    http://www.seomofo.com/advanced/do-not-let-google-crawl-javascript.html

    Cheers,
    SEO Mofo

    • danclarkie says:

      Hey,
      Thanks for the comment.
      I think the line:
      “As you can see, the search returns the file that contains that string. On the main test page we call this file with an Ajax call and show the contents. This file is not linked from any other page which would lead me to think that the GoogleBot has understood the structure of the Ajax request and sent off a spider to grab the contents of the file.”
      is ambiguous. I agree it is more likely that the GoogleBot simply found a string that looks like a URL and spidered it, but in this case they were relative paths to the files, which is slightly confusing. I meant to say that the GoogleBot figured out that the content was there somehow (I guessed by quasi-interpreting the AJAX request), then spidered the external files.

      As for interpreting the GoogleBot/Crawling as being two separate/independent processes, this is kind of the conclusion I was trying to draw but I think the conclusion became a little lost in the sweeping generalisations I made.

      But when I said:
      “Googlebot CAN execute, and read the results of, AJAX requests”
      I was referring to the process used in the generating of the instant previews.
      Theoretically Google could use this data to calculate search relevancy in the future.
      The main point was that Google is able to process the AJAX and JS stuff and understand the outputs as a string of text but they don’t do this on a normal crawl of the site, only on the process used to generate the preview, and it seems that isn’t influencing the relevancy at the moment.
