Google has a mission to categorize the Web - searching deep into the unknown trenches and finding that one comment you made on a forum in 2004. Post on a blog, and they know it, thanks to Web crawlers that work better than any other company's. Upload images to Flickr, maybe add a metatag or two, and Google can find them. But there are still quite a few things that Google can't find.
1. Images. If you post images and tag them, Google can search them. However, if you just upload images to a Web site without any metadata text at all, Google has a harder time. What needs to improve here is image recognition. They are working on it - there's a facial recognition system that can tell the difference between George Bush senior and W. Google has an even harder time with video unless it is tagged.
2. Public databases. Google still can't find the hairy details of life. For example, the info for your next flight is readily available (you can usually type just a few terms, such as the flight number and city names), but when a public database contains info like a flight delay due to weather, Google has a harder time. Google often can't find things like auction information or health records. The search is just not detailed enough - and some info is contained in private databases. One site that is working on this problem is called Koxmix.
3. Speech. Google is working on this one as well, but today if you record a note to yourself and post it on the Web, you may never find it with a search engine unless the audio is tagged. This will become a bigger problem as speech recognition becomes more viable, because we may start talking to our computers a lot more (a la Microsoft Sync in a car, but from your desktop PC instead).
4. Printed literature. Obviously, printed works are not typically on the Web, but there are sites like MagCloud.com and Zinio that are digitizing magazine content. More importantly, the minute I scan a page into the computer, I have made it unsearchable unless I add tags.
5. Online books. Some of these are in PDF, some are not. Those that are in proprietary formats, such as the one used for the Sony Reader, are not very searchable. Google has had a tough road with books.google.com due to some copyright issues.