If you want to gauge how healthy your site is in the eyes of search engines, one thing you can do is find out how many of your pages are indexed on the Big G. More specifically, find out how many of your site's pages show up in Google's main index as opposed to being buried in its supplemental index, where Google puts lower-ranking pages. Pages in the supplemental index tend to be crawled much less often and are usually never assigned PageRank.
So admittedly, we're a little obsessed with how our site is indexed on Google and with doing everything we can to improve that number and crack the one million mark :) With Google constantly changing its UI (especially over the last six months), we've had to rely on multiple query methods to approximate the number of indexed pages our site registers on Google. Because we often see wild weekly, or even daily, fluctuations in the number of indexed pages, using multiple methods has let us estimate our indexing more accurately.
So if you want to find out how many pages your domain has in Google's index, how do you do it?
For a long time, webmasters have relied on a simple Google site: command query, using site:<yourdomain>. Note that the "www" is left out. For example, when checking indexing for our own domain (www.ezlocal.com), we go to Google and type in site:ezlocal.com.
For today's site: query, Google estimates 485,000 indexed results for our domain. That seems about right, even though on at least one day last week we saw more than 1,000,000 results. Needless to say, we hold off on getting too excited until we can verify a count over a series of days, through multiple queries and methods, from multiple IP addresses.
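If you log your counts over several days (and from several methods or IP addresses), one simple way to keep a single-day spike like that 1,000,000+ reading from skewing your estimate is to take the median of the readings. This is a minimal Python sketch of our own, not something Google provides; the function name smoothed_index_count is hypothetical, and the sample readings below are made up for illustration, not real data.

```python
from statistics import median


def smoothed_index_count(daily_counts: list[int]) -> int:
    """Return the median of several index-count readings.

    Unlike the mean, the median barely moves when one reading is a
    wild outlier (for example, a one-day jump to 1,000,000+).
    """
    if not daily_counts:
        raise ValueError("need at least one reading")
    # For an even number of readings, median() averages the two middle
    # values; int() then truncates any .5 left over.
    return int(median(daily_counts))


if __name__ == "__main__":
    # Hypothetical readings collected over five days (illustrative only).
    readings = [480_000, 485_000, 1_000_000, 490_000, 483_000]
    print(smoothed_index_count(readings))  # prints 485000
```

The median suits this job because a single bad day (high or low) cannot drag it far, whereas the mean of the sample readings above would jump well past 500,000 because of the one spike.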
Another index query that closely mirrors the site: command is inurl:<yourdomain>. An "indexed pages" search using inurl:ezlocal.com yielded similar results.
The inurl:<yourdomain> method is really just another best guess at your total number of indexed pages on Google, because it also returns pages on other sites that include your URL (e.g., http://www.alexa.com/siteinfo/ezlocal.com). Like the site: query, this method isn't perfect. Since it picks up those third-party pages as well as your own, you would expect an inurl: query to return more results than a site: query, but we've consistently seen the opposite.
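Both queries follow the same pattern: an operator, a colon, and your bare domain without the "www". Here is a minimal Python sketch for building them consistently, assuming Google's standard https://www.google.com/search?q=... URL format. The helper names are hypothetical, and the code only builds the link; you still open it in a browser and read the estimated result count yourself.

```python
from urllib.parse import urlencode, urlsplit


def bare_domain(domain_or_url: str) -> str:
    """Strip any scheme, path and leading "www." from a domain or URL."""
    # urlsplit only fills in the host when the string has a "//" prefix,
    # so add one for inputs like "www.ezlocal.com".
    if "//" not in domain_or_url:
        domain_or_url = "//" + domain_or_url
    host = urlsplit(domain_or_url).hostname or ""  # hostname is lowercased
    return host.removeprefix("www.")


def google_query_url(operator: str, domain_or_url: str) -> str:
    """Build a Google search URL for a site: or inurl: query."""
    if operator not in ("site", "inurl"):
        raise ValueError(f"unsupported operator: {operator}")
    query = f"{operator}:{bare_domain(domain_or_url)}"
    # urlencode percent-encodes the colon, giving e.g. q=site%3Aezlocal.com
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    for op in ("site", "inurl"):
        print(google_query_url(op, "www.ezlocal.com"))
```

Normalizing the domain in one place keeps the two queries comparable from day to day, since a stray "www." or "http://" would otherwise change what each query matches.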
A third way to estimate how many pages Google indexes on your site is Google Webmaster Tools (GWT). This involves submitting XML sitemaps that list every URL on your domain. Google's Sitemaps report then shows how many of the URLs in your submitted sitemaps have been indexed.
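To give a sense of what those sitemaps contain, here is a minimal Python sketch that writes sitemap files in the sitemaps.org format (a urlset of url/loc entries) plus a sitemap index pointing to them. That protocol caps each sitemap file at 50,000 URLs, so a site chasing a million indexed pages needs many files. The function names and any example.com URLs are placeholders, not part of GWT, and optional tags such as lastmod are left out.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file limit in the sitemaps.org protocol


def chunk_urls(urls: list[str], size: int = MAX_URLS_PER_FILE) -> list[list[str]]:
    """Split a URL list into groups small enough for one sitemap file each."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def _build(root_tag: str, entry_tag: str, locs: list[str]) -> bytes:
    """Build <root_tag><entry_tag><loc>...</loc></entry_tag>...</root_tag>."""
    if len(locs) > MAX_URLS_PER_FILE:
        raise ValueError("too many entries for one file; use chunk_urls first")
    root = ET.Element(root_tag, xmlns=SITEMAP_NS)
    for loc in locs:
        # ElementTree escapes characters such as & in the text automatically.
        ET.SubElement(ET.SubElement(root, entry_tag), "loc").text = loc
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def build_sitemap(page_urls: list[str]) -> bytes:
    """Return one sitemap document listing absolute page URLs."""
    return _build("urlset", "url", page_urls)


def build_sitemap_index(sitemap_urls: list[str]) -> bytes:
    """Return a sitemap index listing the URLs of the individual sitemaps."""
    return _build("sitemapindex", "sitemap", sitemap_urls)
```

In practice you would write each build_sitemap(chunk) result to its own file (say, a hypothetical sitemap-1.xml, sitemap-2.xml, and so on), upload them, and submit the index or the individual files in GWT so the Sitemaps report has something to count against.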
On the official Google Webmaster Central forum, Google staff seem to dismiss the site: query as inaccurate.
Here's what Google employee JohnMu had to say about determining indexing:
For information about the number of URLs indexed, I'd recommend submitting a Sitemap file with the preferred URLs through Webmaster Tools. There you can see the number of URLs (based on the ones that you submitted) that were actually indexed. This number is much more realistic than the rough approximation given in a site:-query (also keep in mind that the site:-query is a restrict, so it's by design not a conclusive list of matching URLs). That said, one of the most important elements in having many URLs indexed properly is to make sure that the content returned is unique and compelling. If for instance you are hosting the exact same content on multiple URLs, this can result in us only indexing one copy. Similarly, if we determine that the content is generally the same as available elsewhere, this can result in us not indexing as much as we might otherwise. At any rate, focusing on the site:-query rough approximations will not lead to useful results.
So according to Google, the best method is its own GWT. Are we surprised? We have to agree with JohnMu's advice on indexing best practices, but we're still clinging to the site: and inurl: commands as useful "indexing approximation" tools, and we suspect most webmasters are too.
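We've found that GWT usually reports fewer indexed pages than a site: or inurl: query, mainly because the Sitemaps report only counts the URLs you submitted, and most webmasters leave pages out of their sitemaps that Google still indexes and that therefore still show up in a site: or inurl: query. In our case, the numbers happen to match fairly closely.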
Use these three methods, and you can better gauge how many pages Google indexes on your site. You can run the same site: and inurl: queries on Yahoo and compare the numbers with Yahoo's Site Explorer, but let me remind you: for at least the foreseeable future, we live in a Google world.