- or, How Google Could Become a Really Big Company
Today I want to talk about search engines. In just five years, Google has gone from the engine behind Yahoo! to a household brand name that has become a verb (as in, “I Googled him and he collects Pez dispensers, of all things.”)
Is Google really the best? Let’s take a look.
Suppose you’re looking for an attorney who specializes in intellectual property in the San Francisco Bay Area. What would you type into Google to get the best results? No matter what you type into any search engine, you’ll be lucky to find a few attorneys at a time. Through Google, you could find a number of legal directories, or an online yellow pages, and if you spend enough time with them you’ll probably find what you’re looking for. But you won’t find a single listing of all such lawyers.
Suppose you’re looking for a zoom lens for your new camera. You type in the model number and – voila! – you get bombarded with ads from tons of retailers that don’t have the lens you’re looking for in stock. You can find a few retailers who have the lens, but you can’t find the 500 people in the country who have this lens and would like to sell it. For that, you have to go to Ebay, Craigslist, Overstock, and a hundred other web sites. It’s easy to find a retailer; it’s hard to find the lens you’re looking for at the right price within easy reach. Your neighbor could have that lens and want to sell it, but you’d never know.
Suppose you’re a watch collector looking for a Rolex Daytona with a Zenith El Primero movement. Rolex has made the Daytona since 1963, using three different underlying movements. Using Google, it took me 33 minutes (I timed it on my stopwatch) to find a table showing which movements were used in which years, so now I know which years to look for. However, Rolex uses letter-based series for their watches (“A” series, “F” series, “K” series, etc.) to identify the years. I’ve seen a table online that shows which series used which movements in which years, but I can’t find it again.
Let’s suppose I’ve decided to buy a stainless steel Rolex Daytona with a white dial and black subdials made in the 1990s. There are a lot of these on the market. How to find them? With Google? Good luck! You can try Ebay, Timezone, Ioffer.com, Timemerchants.com, Rolxwatch.com, Craigslist (one search for each city), and a dozen other sites. This way, if you work hard, you’ll probably find about 30% of all the watches available. And, as you’re using Google to find what you’re looking for, you’re going to see “relevant” ads from Shopzilla.com, Dealtime.com, Pricegrabber.com, and a dozen other shopping sites, none of which has a single Daytona from the 1990s. Furthermore, there may be a guy in Connecticut who has the exact watch in his drawer and if I offered him the right price he’d be happy to take it. How am I going to find him?
Suppose I’m looking for a new Lexus RX 400h hybrid SUV in silver with heated seats. Where to look? Google? Hardly. I have to go to different car web sites, auction sites, classifieds, community sites, and car enthusiast sites. I type my request into each one, hoping to find a Lexus that suits me and isn’t too far away. I’m going to spend all day doing this and might find some but not all of the cars that suit me. Furthermore, each site has its own form for describing a car. So a seller has to go through a maze of forms to list her car on several sites.
And many sites try to hide their information from search engines, so people looking for cars have to go to each site and type in keywords to try to find the car they are looking for.
What if I’m looking for a microwave oven, a specific power tool, a Gucci handbag, a new cell phone? Same thing. No search engine can help me find most of the products available, let alone retailers that have the product in stock. On the other side, people who want to list their products have to choose among hundreds of sites and fees for listing, with not much hope of finding the buyer that suits them best. Anyone who has ever listed something on Craigslist knows that the listings go stale after a day or two and have to be relisted.
What about services? I’m looking for a nurse who specializes in home visits for my grandmother in Denver. I’d like to find a bicycle frame builder who works with titanium. I’m looking for someone who makes custom orthotics in New York City. I’m looking for a gardener who specializes in low-maintenance gardens for my place in Flagstaff. An organic bakery in Los Angeles. A dog walker in my neighborhood. A bar of Amedei chocolate (the best in the world). Is this one-stop shopping? Hardly. I’m lucky if I can find a small fraction of what’s available online, even using the online yellow pages.
This isn’t just a problem with Google – any search engine that looks for keywords is going to be woefully inadequate for finding specific things. How many times do I search on Google, only to get a list of stores in the UK or Hong Kong? Does Google know I live in New York? Sometimes I want to find something nearby, other times I want to avoid paying sales tax.
I spend a lot of time rephrasing my search terms each day. Example: some companies, like I/PRO, offer software for measuring/auditing the performance of web sites and web traffic. Good luck finding them with a search engine! The words you need to use are just too common.
Have you tried to find a job online? How many job sites do you have to submit your resume to? Does Google help? No. Google can help you find job sites like Hotjobs.com or Salesjobs.com – that’s not much of a service. Have you looked for a person to hire? How many sites do you have to keep track of? Hundreds. Unless the job is for something extremely specific, like a pediatric oncologist, you’d be lucky to find 20% of all qualified candidates at any given time.
Contextualizing Keywords
All search engines have essentially the same approach: you
type in keywords and the search engine tries to guess what would be the best set
of documents for you based on what it knows. Google uses a popularity approach,
relying on web links and some inference rules to determine what’s relevant to
you. Like any search engine, Google tries to match ads to your search by trying
to contextualize the keywords you type and give you relevant ads that might
make them money if you click on them.
It often works quite well, and it's getting better. If Google sees a UPS tracking number in your email, it offers to make it easy for you to track your package. If I get an email talking about euros or other currencies, I often get a link to a currency converter, which is very useful.
Yet, when you're looking for something specific, Google's results can often be humorous rather than effective. A friend of mine invited me to a party, and Google's email service showed me several ads for purchasing bridal-shower invitations online. How many times have you typed in something like “Oil Changers,” only to get an ad from Amazon.com offering to search for books about “Oil Changers” in their online bookstore? How relevant is that?
How can a search engine know what I have in mind? If I type the words “Rolex Daytona” into Google, here’s what I get:
Rolex Daytona Review
danchan is a weblog with the best daily news on the net, high gear cool equipment reviews, tips on backpacking in central europe, cgi/perl/mysql tutorials ...
2005 Rolex 24 hours of Daytona
2005 24hrs of Daytona Vist us track side from Daytona Speedway. Live updates and photos from track side.
Rolex Daytona - PriceSCAN.com - Unbiased Price Comparisons ...
PriceSCAN is your unbiased guide to finding the lowest prices on Rolex Daytona Watches. Don't buy it before you PriceSCAN it!
Official ROLEX Website: the perpetual spirit watch collections
An inside view of the House of Rolex, classic or elegant watches: discover the art of precision timekeeping, and the Rolex watch collections.
eBay - Rolex, rolex submariner, rolex daytona items on eBay.com
Buy Rolex, rolex submariner, rolex daytona items on eBay. Find a huge selection of omega, breitling items and get what you want now!
MCDOWELL SUSPENDED FROM ROLEX SERIES CROWN ROYAL 250 AT THE GLEN AFTER SEVERAL ...
09/15, New Picchio Daytona Prototype Completes Successful Shakedown Runs ...
These are the first six items returned as of 9/18/05. Note that the search engine has to guess whether I’m looking for a watch, a web site with reviews of cool stuff, or a road racing association. How can the search engine guess correctly? Why don’t we give the search engine a better chance by telling more about ourselves and what we are looking for so it doesn’t have to guess?
Go to Ebay and try your luck. As an art collector, I often go to Ebay to look at art porn. Do you think you can type in the word “painting” and see all the paintings? Absolutely not. You’ll miss a lot of the watercolors because they don’t have the word “painting” in their descriptions. You have to use the Ebay categories to get anything done at all, and even then you get a lot of crossed listings and see a lot of things you don't want to see.
Google’s basic approach so far has been to be better at contextualizing keywords than anyone else. Look at their Google Desktop software – type in keywords and search your own computer – it’s all about contextualizing keywords. The exciting new GooglePrint project lets you search for any sentence in any book ever written (search for “I don’t know nuthin’ ‘bout birthin’ no babies” and you should be taken to the exact page in “Gone With the Wind” that has that sentence.) While this is helpful, especially for research, we are far from having the tools we really need.
Google probably is the best search engine online today. But we have a long way to go! Think of the 300 million web-enabled telephones that will be online in ten years and you can start to see why keyword searching may not be the final solution. Think of the billions of RFID (radio-frequency ID) tags that will be in use then, offering structured metadata in real time to any system that can take advantage of it.
Structure
Could things be different? Yes. Things could be very
different.
To generalize the problem, we could say that we’ve become used to building branded sites that serve as marketplaces, where you list and search according to the terms of each site. At Ebay you have to fill out Ebay’s form; at Cars.com you fill out the Cars.com form, and at Craigslist you fill out the Craigslist form. Each database holds its data in a proprietary format. Most databases let you search by keyword and perhaps a few other descriptors (like date, price, location, etc.). But much of the searching is by keyword. For example, gray cars aren’t marked as having a particular color. They simply have the word “gray” in their text descriptions. If a seller wrote “charcoal” in the text rather than “gray,” a search for “gray” won’t find that car.
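The gray/charcoal problem can be sketched in a few lines of Python. This is only an illustration: the listings are invented, and the `color_family` field is a hypothetical name, not part of any real site's schema.

```python
# Hypothetical listings; the field names are illustrative assumptions.
listings = [
    {"id": 1, "text": "2003 sedan, gray, low miles", "color_family": "gray"},
    {"id": 2, "text": "2004 sedan in charcoal, one owner", "color_family": "gray"},
    {"id": 3, "text": "2002 coupe, bright red", "color_family": "red"},
]


def keyword_search(items, word):
    """Return ids of listings whose free text contains the word."""
    return [item["id"] for item in items if word.lower() in item["text"].lower()]


def field_search(items, field, value):
    """Return ids of listings whose structured field equals the value."""
    return [item["id"] for item in items if item.get(field) == value]


print(keyword_search(listings, "gray"))                # misses the charcoal car
print(field_search(listings, "color_family", "gray"))  # finds both gray cars
```

The keyword search depends on the seller's choice of words; the field search depends only on everyone agreeing what the field means.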
We don’t need all these different databases to do all that work for us. We need a standard form that describes something, a form everyone uses. We could then describe everything we “have” and everything we “want,” and simply leave these descriptions on the Internet for all (especially the search engines) to see.
Suppose you want to sell your car. If we had a standard form for describing a car, you could just take a copy of the standard form for a car, fill it out, and then put it on your own personal web site. How do people find your car if it’s not in a database? Simple. Your car’s description is on the Internet. The search engines find it. You can even specify to what degree you’d be interested in selling it (so people looking for a car below a certain price don’t even see your car’s description.) In fact, you can put all your “haves” online and list everything you own, along with an indication of how interested you are in selling each item at what price. Then you can forget about Craigslist or Ebay and just wait until someone contacts you!
To find a car, you again take a copy of the standard description of the form for a car, but this time you fill it out specifying ranges and priorities for the cars that would be acceptable. You may list a range of colors, years, options, prices, and other parameters. Then you simply hit the “search” button on that page sitting on your own web site. The search engine then matches your request to all the cars on the entire Internet. You instantly see a complete list of cars available, sorted according to your priorities. You see 100% of all cars available that meet your criteria. You can even specify preferences (e.g., cars closer to me are better than cars located far away, but lower prices are also better), so cars with higher scores show up first.
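Here is a rough sketch of how such a match might work, assuming a made-up car descriptor. The class names, fields, and the scoring formula are all my own assumptions for illustration; no standard form like this exists yet, and a real one would have many more fields.

```python
from dataclasses import dataclass


@dataclass
class CarHave:
    """A seller's description of one car (hypothetical fields)."""
    model: str
    color: str
    year: int
    price: float
    distance_miles: float  # distance from the buyer, assumed precomputed


@dataclass
class CarWant:
    """A buyer's acceptable ranges plus weights for ranking."""
    model: str
    colors: set
    min_year: int
    max_price: float
    max_distance_miles: float
    price_weight: float = 0.5
    distance_weight: float = 0.5


def matches(want, have):
    """True when the car falls inside every range the buyer listed."""
    return (
        have.model == want.model
        and have.color in want.colors
        and have.year >= want.min_year
        and have.price <= want.max_price
        and have.distance_miles <= want.max_distance_miles
    )


def score(want, have):
    """Higher is better: cheaper and closer cars score higher."""
    price_part = 1 - have.price / want.max_price
    distance_part = 1 - have.distance_miles / want.max_distance_miles
    return want.price_weight * price_part + want.distance_weight * distance_part


def search(want, haves):
    """Return all matching cars, best score first."""
    hits = [h for h in haves if matches(want, h)]
    return sorted(hits, key=lambda h: score(want, h), reverse=True)


# Invented example data.
haves = [
    CarHave("RX 400h", "silver", 2006, 48000, 300),
    CarHave("RX 400h", "silver", 2006, 46000, 20),
    CarHave("RX 400h", "black", 2006, 40000, 5),
]
want = CarWant("RX 400h", {"silver", "gray"}, 2005, 50000, 500)
for car in search(want, haves):
    print(car.price, car.distance_miles, round(score(want, car), 3))
```

The black car is cheaper and closer, but it never appears because it falls outside the buyer's list of colors. The two silver cars are then ranked by the buyer's own weights.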
If, for some reason, you haven’t found your dream car yet, you just leave your “want” description on the web, and the search engines keep an eye out for any new description that matches your criteria. When a description shows up, you see it immediately. You can keep all your “want” descriptions online, until that elusive 1936 Pez dispenser that would complete your pre-war collection becomes available and you can be the first one to grab it.
The best part about this scheme is that no databases hide the data. All the information is simply sitting on the Internet, waiting for the search-engine spiders to come get it. You no longer have to go to Cars.com, Autobytel.com, Carsdirect.com, Carsbelowinvoice.com, etc. You don’t log in anywhere.
Let’s see – you don’t log in anywhere, you don’t pay any listing or membership fees, you make industries 20 times more efficient, and you see 100% of what you want and 0% of what you don’t want. Is that of any interest to you?
More examples
This approach works for many, many markets. Think about
buying a home. Again, there are many realtor web sites and real-estate markets
online, and they all use different formats for describing a home. With a single
universal descriptor, people selling homes could simply fill out the form once.
Anyone looking will be able to find them directly, without going through an
agent or searching a dozen sites. This is what the Multi-List has done for many
years, but there was a stiff charge for getting on the list. With a universal
format, anyone can list a home for free. Realtors would have to distinguish
themselves by performance and service offerings rather than access to their
list.
Think about resumes. How many resume databases are there? Not only do you have to put your resume on 100 different web sites, but they all have different forms and you still don’t know how many employers you’re reaching. Furthermore, once you find a good job, it takes weeks to “turn off” all your resume listings at all those sites! From an employer’s point of view, going to all these different web sites and paying for ads not only costs money, but if the person you’re looking for doesn’t see your ad, then you don’t exist. Why not use the search engines to sort through every single resume available online in one second? This can be done by defining a universal resume that everyone fills out once and that employers fill out with their requirements. Then the search engines simply make the match. And, when you find a good job, you just take your resume offline and – voila! – you stop getting emails from companies.
Retailers would still exist, but they would compete directly with anyone offering a similar product. So they would have to distinguish themselves by offering better service, support, and availability. They could hide their prices from the search engines, but that probably won’t get them any business.
Think about buying an airline ticket. How many discount air-travel sites are there? Thousands! And yet each flight has a certain number of seats, and those seats are the “product” being sold. Why don’t the airlines just put every seat for every flight online and let customers find those seats directly? Now you can see everything available, every individual seat, and when you purchase one then the “availability” flag for that seat turns from “available” to “taken.” If you want to change your plans, you just turn your seat availability back on and sell the seat to someone else. No more travel agents!
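The availability flag described above might look something like this in Python. It is a toy sketch: the `Seat` class, its method names, and the flight number are invented, and it ignores payment, identity, and everything else a real airline system would need.

```python
class Seat:
    """One seat on one flight, with an availability flag."""

    def __init__(self, flight, seat_number):
        self.flight = flight
        self.seat_number = seat_number
        self.status = "available"
        self.owner = None

    def purchase(self, buyer):
        """Mark the seat as taken; refuse if someone already holds it."""
        if self.status != "available":
            raise ValueError(f"seat {self.seat_number} is already taken")
        self.status = "taken"
        self.owner = buyer

    def relist(self, seller):
        """Let the current owner turn the seat's availability back on."""
        if self.status != "taken" or self.owner != seller:
            raise ValueError("only the current owner can relist a seat")
        self.status = "available"


seat = Seat("XX123", "14C")  # made-up flight number
seat.purchase("alice")
seat.relist("alice")         # Alice changes her plans
seat.purchase("bob")
print(seat.status, seat.owner)
```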
The same obviously goes for restaurant bookings, hotel rooms, performances, memberships, events, etc.
Remember my zoom lens example? How many times have you found the right product at the right price, only to learn that it’s not in stock? With structured metadata, you would know exactly how many lenses a retailer has. When your credit-card charge goes through, their system reduces inventory by one unit. This is a far cry from the way things are done today, when someone has to “go look in back” to see what’s in stock. Many retailers have no idea when they are out of a particular item, but if their system knew then the system could just re-order automatically (or re-order more if the product has been moving faster lately).
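As a small sketch of that inventory logic, assuming an invented `Inventory` class with a simple fixed reorder point (the faster-moving-product adjustment is left out to keep it short):

```python
class Inventory:
    """Tracks stock for one product and reorders when it runs low."""

    def __init__(self, on_hand, reorder_point, reorder_quantity):
        self.on_hand = on_hand
        self.reorder_point = reorder_point
        self.reorder_quantity = reorder_quantity
        self.pending_orders = []  # quantities requested from the supplier

    def record_sale(self):
        """Reduce stock by one unit after a successful charge."""
        if self.on_hand == 0:
            raise ValueError("out of stock")
        self.on_hand -= 1
        # Reorder once when stock reaches the reorder point.
        if self.on_hand <= self.reorder_point and not self.pending_orders:
            self.pending_orders.append(self.reorder_quantity)


lenses = Inventory(on_hand=3, reorder_point=1, reorder_quantity=10)
lenses.record_sale()
lenses.record_sale()
print(lenses.on_hand, lenses.pending_orders)
```

Because `on_hand` is a published number rather than a guess, a search engine could skip any retailer whose count is zero.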
Want a loan for the house you just found? You could go to a hundred different loan/mortgage web sites and fill out applications, or you could just fill out a standard form once, leave it on the Internet, and wait for the lenders to find you (yes, you can be cloaked to hide your identity, or you could allow access only to banks that meet your criteria). In fact, once you’ve filled out a single standard form (with all the information any bank could ever need), you never have to fill out a loan application again. After you find the right loan, you simply hide your application. Then, next time you’re looking for a loan, you simply update your application and “turn it on” again. In fact, I can do better than that – you could have your loan application update itself constantly.
Interoperability
What I’ve been talking about is called structured metadata. Metadata
is simply information that describes something, whether it’s a house, a pair of
jeans, or an airline seat. The more structured it is, the more precise our
searches become. It may not help you find a good babysitter or recommend a
movie you might like, but structured metadata can help you find anything that’s
unique or has a serial number or can be described objectively. Structured
metadata is being used inside of companies and inside of some industries to
streamline processes, and whenever it’s used it’s a huge win.
Believe it or not, we already have tons of structured metadata. Every database uses structure of some sort. Some, like Craigslist, are extremely light, while others, like CDW.com, are extremely heavy in their use of metadata. And yet, we’re far from where we want to be, because there are very few good universal standards for metadata.
The original idea for standardized metadata came from Tim Berners-Lee, the inventor of the Web. His vision was to use the Web to find things using the structured metadata that describes them, rather than having to look for keywords and make associations. He called his vision The Semantic Web.
Unfortunately, the Semantic Web never really took shape. But now, organically, it's starting to grow.
When an industry adopts a standard for its metadata, we get
interoperability. You see it in pockets, and even in a few entire industries.
One example is the book industry. Books are now described using a format called ONIX. An ONIX description of a book really is a universal descriptor for a book, with all the possible fields anyone could ever need to describe any book in any language. Now, publishers can put all their catalog information into ONIX format and it’s accepted by everyone – libraries, online bookstores, used bookstores, distributors, reviewers, etc.
We’re making progress in a few areas, but we're nowhere near critical mass -- the mass needed for the search engines to adopt these universal descriptors and start to make sense of them. We’ve had a language for expressing structured metadata, called XML, since the late 1990s. Many industries are now working to define XML standards for all kinds of products (ONIX, for example, is specified with what XML calls a Document Type Definition, or DTD). But it’s hardly universal. In most cases, different companies use different in-house versions of XML descriptors, and then they need databases to try to translate from one to another.
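To give a feel for what an XML descriptor looks like to a program, here is a short Python sketch using the standard library's `xml.etree.ElementTree`. The `<car>` record and its element names are made up for illustration; they are not ONIX or any published standard.

```python
import xml.etree.ElementTree as ET

# A made-up descriptor; the element names are illustrative, not a real standard.
record = """
<car>
  <make>Lexus</make>
  <model>RX 400h</model>
  <color>silver</color>
  <year>2006</year>
  <price currency="USD">46000</price>
</car>
"""

root = ET.fromstring(record)

# Each child element becomes a named field instead of a word in a sentence.
car = {child.tag: child.text for child in root}
car["year"] = int(car["year"])
car["price"] = float(car["price"])
currency = root.find("price").get("currency")

print(car["model"], car["year"], car["price"], currency)
```

Once two sites agree on the element names, neither needs a translation layer to read the other's listings.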
Very few industries have adopted universal descriptors. The “electronic patient chart” for medical informatics is a great example – there must be over a thousand different kinds of electronic patient charts out there, and none has really risen to the surface as the overall standard. It’s much better to have a flawed standard that everyone uses than to have hundreds of “dream solutions” that don’t talk to each other.
[In fact, this is how the Web came about: some scientists used a really lousy version of a document mark-up language called HTML to build the first web pages, and boy did they suck! They were so successful that the browsers adopted this lousy standard and we’re still stuck with it today, although we’ve bolted a lot of fixes onto it since then.]
The next layer on top of XML is called RDF - the Resource Description Framework. The idea is to create public standards for metadata we can all use. RDF is a strong step in the right direction and, after a few dormant years, is just now starting to pick up a bit of momentum. Anyone interested in further research on this topic should start learning about RDF and the work already being done to apply RDF to specific problems.
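RDF describes things as statements of the form subject, predicate, object, usually called triples. The sketch below imitates that idea with plain Python tuples rather than a real RDF library, and every URI in it uses the placeholder `example.org` domain, so treat the vocabulary as hypothetical.

```python
# Plain tuples standing in for RDF triples: (subject, predicate, object).
EX = "http://example.org/terms/"  # hypothetical vocabulary

triples = [
    ("http://example.org/cars/1", EX + "model", "RX 400h"),
    ("http://example.org/cars/1", EX + "color", "silver"),
    ("http://example.org/cars/2", EX + "model", "RX 400h"),
    ("http://example.org/cars/2", EX + "color", "black"),
]


def subjects_with(triples, predicate, value):
    """Return the set of subjects that have the given predicate and value."""
    return {s for s, p, o in triples if p == predicate and o == value}


silver = subjects_with(triples, EX + "color", "silver")
rx = subjects_with(triples, EX + "model", "RX 400h")
print(sorted(silver & rx))
```

Because each statement stands alone, descriptions published on different web sites can be pooled and queried together, provided they share the same predicate names.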
I hope we’ll develop universal descriptors everyone can use for every kind of product and service. Almost anything that can be described uniquely and objectively can have its own universal descriptor. It even works for commodity contracts and electronic signatures. Then we need the search engines to see them and use the structure contained within. Let’s use keywords to find articles, scientific papers, contents of books, online brochures, and other information. But when we’re looking for something specific, let’s use structured metadata.
Structured Metadata can revolutionize every industry, from health care records to restaurant reservations to delivery of fresh produce to buying music to automobile assembly. I could go on, and I will, but not here. Instead, I’m going to list a number of interesting resources and let you do further research on your own.
In my opinion, we are about 1% of the way toward having an Internet that really works. Some day, people will look back on the Google days and wonder how we could possibly have lived without structured metadata. Just type “structured metadata” into Google right now, and you’ll see why my 1% estimate may actually be high. Many books have been written about structured metadata, but the revolution has not yet started. Google could do it. They have the resources, the brand, and the right business model. If they don’t do it, the next Google will.
Resources
I have written a white paper on this topic. It contains
answers to the questions about where/how to store the metadata,
security/privacy issues, more industries revolutionized, and how to make money
in the brave new world of structured metadata. I’ll email it to anyone who is
interested. My main goal is to get people at Google to read it, so if you can
help with that I would appreciate it. Just email [email protected] and ask me for it.
Books
I wrote a book on this topic. It’s called Futurize Your
Enterprise, John Wiley & Sons, 1999. Section 4. It’s no longer in print.
Use a search engine to find a copy – if you can.
Universal Meta Data Models, by David Marco and Michael Jennings
I haven't read it but it looks interesting.
Links
The Dublin Core Metadata Initiative
A strong start in the right direction. Probably the single
best resource for bringing together disparate data standards.
By Tim Berners-Lee, James Hendler, and Ora Lassila – a good old article that
helps put the concepts into (imaginary) practice.
Metadata Demystified (PDF)
A good online treatise on metadata standards for book
publishing – read it with an eye toward other industries as well.
The Semantic Web
By Tim Berners-Lee, the original inventor of the Web. Seems
to have lost a lot of steam, but the Semantic Web promises to re-emerge as a
place of thought leadership.
An Introduction to the Semantic Web
Explains the difference between RDF and XML – a bit geeky,
but worth reading.
A primer on RDF - the Resource Description Framework
This comes from the W3C, so it's technical but a good start for any metadata geek.
Resource Description Framework FAQ
Written in reasonably plain English.
XML.com
News and current topics in the world of interchangeability.
Project Liberty
Hopes to create a standard set of identity structures for everything from digital authentication to micropayment.
Search Engine Watch
News on search engines.
Search Engines
Scirus
A fantastic place to search for science information
Scinet
Another science search engine – I love these!
Metacrawler
A search engine that searches many search engines at once.
Great for comparing results.
Dogpile
Another meta-search engine
Companies
Taxonomy Strategies
A company that consults on data interoperability