The concept of hypertext and a memory extension really came to life in July of 1945, when Vannevar Bush, after enjoying the scientific camaraderie that was a side effect of WWII, published As We May Think in The Atlantic Monthly.
He urged scientists to work together to help build a body of knowledge for all mankind. Here are a few selected sentences and paragraphs that drive his point home.
Specialization becomes increasingly necessary for progress, and the effort to bridge between disciplines is correspondingly superficial.
The difficulty seems to be, not so much that we publish unduly in view of the extent and variety of present day interests, but rather that publication has been extended far beyond our present ability to make real use of the record. The summation of human experience is being expanded at a prodigious rate, and the means we use for threading through the consequent maze to the momentarily important item is the same as was used in the days of square-rigged ships.
A record, if it is to be useful to science, must be continuously extended, it must be stored, and above all it must be consulted.
He not only was a firm believer in storing data, but he also believed that if the data source was to be useful to the human mind we should have it represent how the mind works to the best of our abilities.
Our ineptitude in getting at the record is largely caused by the artificiality of the systems of indexing. ... Having found one item, moreover, one has to emerge from the system and re-enter on a new path.
The human mind does not work this way. It operates by association. ... Man cannot hope fully to duplicate this mental process artificially, but he certainly ought to be able to learn from it. In minor ways he may even improve, for his records have relative permanency.
Presumably man's spirit should be elevated if he can better review his own shady past and analyze more completely and objectively his present problems. He has built a civilization so complex that he needs to mechanize his records more fully if he is to push his experiment to its logical conclusion and not merely become bogged down part way there by overtaxing his limited memory.
He then proposed the idea of a virtually limitless, fast, reliable, extensible, associative memory storage and retrieval system. He named this device a memex.
Gerard Salton, who died on August 28th of 1995, was the father of modern search technology. His teams at Harvard and Cornell developed the SMART information retrieval system. Salton’s Magic Automatic Retriever of Text included important concepts like the vector space model, Inverse Document Frequency (IDF), Term Frequency (TF), term discrimination values, and relevance feedback mechanisms.
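To make the vector space model a bit more concrete, here is a minimal sketch of TF-IDF weighting and cosine similarity. It is not SMART's actual code: the whitespace tokenizer, the plain tf * log(N / df) weighting, and the function names are all assumptions made for illustration.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df).

    Assumption: documents are tokenized by lowercasing and splitting on
    whitespace, which is far cruder than a real retrieval system.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A term that appears in every document gets a weight of zero, so it contributes nothing to the similarity between two documents, while rarer terms count for more.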
He authored a 56-page book called A Theory of Indexing, which does a great job explaining many of his tests, upon which search is still largely based. Tom Evslin posted a blog entry about what it was like to work with Mr. Salton.
Ted Nelson created Project Xanadu in 1960 and coined the term hypertext in 1963. His goal with Project Xanadu was to create a computer network with a simple user interface that solved many social problems like attribution.
While Ted was against complex markup code, broken links, and many other problems associated with traditional HTML on the WWW, much of the inspiration to create the WWW was drawn from Ted's work.
There is still conflict surrounding the exact reasons why Project Xanadu failed to take off.
Wikipedia offers background and many resource links about Mr. Nelson.
The first few hundred web sites began in 1993, and most of them were at colleges, but long before most of them existed came Archie. Archie, the first search engine, was created in 1990 by Alan Emtage, a student at McGill University in Montreal. The original intent of the name was "archives," but it was shortened to Archie.
Archie helped solve the problem of data scattered across FTP servers by combining a script-based data gatherer with a regular expression matcher for retrieving file names matching a user query. Essentially Archie became a database of FTP file names which it would match against users' queries.
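The matching half of that idea can be sketched in a few lines. This is only an illustration of regular expression matching over a gathered file listing, not Archie's actual code; the hosts, paths, and function name are made up.

```python
import re

# Hypothetical listing a gatherer script might have collected: (host, path).
FILE_INDEX = [
    ("ftp.example.edu", "/pub/gnu/emacs-18.59.tar.Z"),
    ("ftp.example.org", "/pub/docs/rfc959.txt"),
    ("ftp.example.net", "/mirrors/gnu/emacs-19.22.tar.gz"),
]

def search_filenames(pattern, index=FILE_INDEX):
    """Return every (host, path) whose file name matches the regex pattern."""
    regex = re.compile(pattern)
    hits = []
    for host, path in index:
        # Match against the file name only, not the directory part.
        filename = path.rsplit("/", 1)[-1]
        if regex.search(filename):
            hits.append((host, path))
    return hits
```

Calling search_filenames(r"^emacs-.*\.tar") returns the two emacs archives and the hosts they live on, which is all a user needed in order to go fetch the file.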
Bill Slawski has more background on Archie here.
As word of mouth about Archie spread, it started to become word of computer, and Archie became so popular that the University of Nevada System Computing Services group developed Veronica. Veronica served the same purpose as Archie, but it worked on plain text files. Soon another user interface named Jughead appeared with the same purpose as Veronica. Both of these were used for files sent via Gopher, which was created as an Archie alternative by Mark McCahill at the University of Minnesota in 1991.
Tim Berners-Lee existed at this point; however, there was no World Wide Web. The main way people shared data back then was via File Transfer Protocol (FTP).
If you had a file you wanted to share you would set up an FTP server. If someone was interested in retrieving the data they could do so using an FTP client. This process worked effectively in small groups, but the data became as much fragmented as it was collected.
From Wikipedia:
While an independent contractor at CERN from June to December 1980, Berners-Lee proposed a project based on the concept of hypertext, to facilitate sharing and updating information among researchers. With help from Robert Cailliau he built a prototype system named Enquire.
After leaving CERN in 1980 to work at John Poole's Image Computer Systems Ltd., he returned in 1984 as a fellow. In 1989, CERN was the largest Internet node in Europe, and Berners-Lee saw an opportunity to join hypertext with the Internet. In his words, "I just had to take the hypertext idea and connect it to the TCP and DNS ideas and — ta-da! — the World Wide Web". He used similar ideas to those underlying the Enquire system to create the World Wide Web, for which he designed and built the first web browser and editor (called WorldWideWeb and developed on NeXTSTEP) and the first Web server called httpd (short for HyperText Transfer Protocol daemon).
The first Web site built was at http://info.cern.ch/ and was first put online on August 6, 1991. It provided an explanation about what the World Wide Web was, how one could own a browser and how to set up a Web server. It was also the world's first Web directory, since Berners-Lee maintained a list of other Web sites apart from his own.
In 1994, Berners-Lee founded the World Wide Web Consortium (W3C) at the Massachusetts Institute of Technology.
Computer robots are simply programs that automate repetitive tasks at speeds impossible for humans to reproduce. The term bot on the internet is usually used to describe anything that interfaces with the user or that collects data.
Search engines use "spiders" which search (or spider) the web for information. They are software programs which request pages much like regular browsers do. In addition to reading the contents of pages for indexing, spiders also record links.
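The link recording step might look something like the sketch below, which uses only Python's standard library. The class and function names are assumptions for illustration, and a real spider would also fetch pages, queue the links it finds, and honor crawl rules.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into absolute URLs.
                self.links.append(urljoin(self.base_url, value))

def extract_links(html, base_url):
    """Return the absolute URLs linked from one page of HTML."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

The returned URLs are what a spider would add to its list of pages to request next.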
Another bot example could be Chatterbots, which are resource heavy on a specific topic. These bots attempt to act like a human and communicate with humans on said topic.
Search engines consist of 3 main parts. Search engine spiders follow links on the web to request pages that are either not yet indexed or have been updated since they were last indexed. These pages are crawled and are added to the search engine index (also known as the catalog). When you search using a major search engine you are not actually searching the web, but are searching a slightly outdated index of content which roughly represents the content of the web. The third part of a search engine is the search interface and relevancy software. For each search query search engines typically do most or all of the following
Andrei Broder authored A Taxonomy of Web Search [PDF], which notes that most searches fall into the following 3 categories: informational, navigational, and transactional.
Want to become a better searcher? Most large scale search engines offer:
Soon the web's first robot came. In June 1993 Matthew Gray introduced the World Wide Web Wanderer. He initially wanted to measure the growth of the web and created this bot to count active web servers. He soon upgraded the bot to capture actual URLs. His database became known as the Wandex.
The Wanderer was as much of a problem as it was a solution because it caused system lag by accessing the same page hundreds of times a day. It did not take long for him to fix this software, but people started to question the value of bots.
In October of 1993 Martijn Koster created Archie-Like Indexing of the Web, or ALIWEB, in response to the Wanderer. ALIWEB crawled meta information and allowed users to submit the pages they wanted indexed along with their own page description. This meant it needed no bot to collect data and was not using excessive bandwidth. The downside of ALIWEB is that many people did not know how to submit their site.
Martijn Koster also hosts the web robots page, which created standards for how search engines should index or not index content. This allows webmasters to block bots from their site on a whole site level or page by page basis.
By default, if information is on a public web server and people link to it, search engines generally will index it.
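Site level blocking is done through a robots.txt file, and Python's standard library ships a parser for it. The sketch below checks whether a bot may fetch a URL; the sample rules, the bot name ExampleBot, and the helper name are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every bot is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def is_allowed(robots_txt, user_agent, url):
    """Return True if the rules in robots_txt let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, is_allowed(ROBOTS_TXT, "ExampleBot", "http://example.com/private/page.html") is False, while a URL outside /private/ is allowed. Keep in mind that the file is only a request: a well behaved bot checks it, but nothing forces a bot to obey.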
In 2005 Google led a crusade against blog comment spam, creating a nofollow attribute that can be applied at the individual link level. After this was pushed through Google quickly changed the scope of the purpose of the link nofollow to claim it was for any link that was sold or not under editorial control.
By December of 1993, three full fledged bot fed search engines had surfaced on the web: JumpStation, the World Wide Web Worm, and the Repository-Based Software Engineering (RBSE) spider. JumpStation gathered info about the title and header from Web pages and retrieved these using a simple linear search. As the web grew, JumpStation slowed to a stop. The WWW Worm indexed titles and URLs. The problem with JumpStation and the World Wide Web Worm is that they listed results in the order that they found them, and provided no discrimination. The RBSE spider did implement a ranking system.
Since early search algorithms did not do adequate link analysis or cache full page content, if you did not know the exact name of what you were looking for it was extremely hard to find it.
Excite came from the project Architext, which was started in February 1993 by six Stanford undergrad students. They had the idea of using statistical analysis of word relationships to make searching more efficient. They were soon funded, and in mid 1993 they released copies of their search software for use on web sites.
Excite was bought by a broadband provider named @Home in January, 1999 for $6.5 billion, and was named Excite@Home. In October, 2001 Excite@Home filed for bankruptcy. InfoSpace bought Excite from bankruptcy court for $10 million.
When Tim Berners-Lee set up the web he created the Virtual Library, which became a loose confederation of topical experts maintaining relevant topical link lists.
The EINet Galaxy web directory was born in January of 1994. It was organized similarly to how web directories are today. The biggest reason the EINet Galaxy became a success was that it also contained Gopher and Telnet search features in addition to its web search feature. The web size in early 1994 did not really require a web directory; however, other directories soon did follow.
In April 1994 David Filo and Jerry Yang created the Yahoo! Directory as a collection of their favorite web pages. As their number of links grew they had to reorganize and become a searchable directory. What set the directories above The Wanderer is that they provided a human compiled description with each URL. As time passed and the Yahoo! Directory grew, Yahoo! began charging commercial sites for inclusion, and the inclusion rates for listing a commercial site increased over time. The current cost is $299 per year. Many informational sites are still added to the Yahoo! Directory for free.
On September 26, 2014, Yahoo! announced they would close the Yahoo! Directory at the end of 2014, though it was transitioned to being part of Yahoo! Small Business and remained online at business.yahoo.com.
In 1998 Rich Skrenta and a small group of friends created the Open Directory Project, a directory which anybody can download and use in whole or part. The ODP (also known as DMOZ) is the largest internet directory, almost entirely run by a group of volunteer editors. The Open Directory Project grew out of the frustration webmasters faced waiting to be included in the Yahoo! Directory. Netscape bought the Open Directory Project in November, 1998. Later that same month AOL announced the intention of buying Netscape in a $4.5 billion all stock deal.
Google offers a librarian newsletter to help librarians and other web editors help make information more accessible and categorize the web. The second Google librarian newsletter came from Karen G. Schneider, who is the director of Librarians' Internet Index. LII is a high quality directory aimed at librarians. Her article explains what she and her staff look for when looking for quality credible resources to add to the LII. Most other directories, especially those which have a paid inclusion option, hold lower standards than selected limited catalogs created by librarians.
The Internet Public Library is another well kept directory of websites.
Due to the time intensive nature of running a directory, and the general lack of scalability of the business model, the quality and size of directories sharply drops off after you get past the first half dozen or so general directories. There are also numerous smaller industry, vertical, or locally oriented directories. Business.com, for example, is a directory of business websites.
Looksmart was founded in 1995. They competed with the Yahoo! Directory by frequently increasing their inclusion rates back and forth. In 2002 Looksmart transitioned into a pay per click provider, which charged listed sites a flat fee per click. That caused the demise of any good faith or loyalty they had built up, although it allowed them to profit by syndicating those paid listings to some major portals like MSN. The problem was that Looksmart became too dependent on MSN, and in 2003, when Microsoft announced they were dumping Looksmart, that basically killed their business model.
In March of 2002, Looksmart bought a search engine by the name of WiseNut, but it never gained traction. Looksmart also owns a catalog of content articles organized in vertical sites, but due to limited relevancy Looksmart has lost most (if not all) of their momentum. In 2000 Looksmart tried to expand their directory by buying the non commercial Zeal directory for $20 million, but on March 28, 2006 Looksmart shut down the Zeal directory, hoping to drive traffic using Furl, a social bookmarking program.
All major search engines have some limited editorial review process, but the bulk of relevancy at major search engines is driven by automated search algorithms which harness the power of the link graph on the web. In fact, some algorithms, such as TrustRank, bias the web graph toward trusted seed sites without requiring a search engine to take on much of an editorial review staff. Thus, some of the more elegant search engines allow those who link to other sites to in essence vote with their links as the editorial reviewers.
Unlike highly automated search engines, directories are manually compiled taxonomies of websites. Directories are far more cost and time intensive to maintain due to their lack of scalability and the necessary human input to create each listing and periodically check the quality of the listed websites.
General directories are largely giving way to expert vertical directories, temporal news sites (like blogs), and social bookmarking sites (like del.icio.us). In addition, each of those three publishing formats I just mentioned also aids in improving the relevancy of major search engines, which further cuts at the need for (and profitability of) general directories.
Brian Pinkerton of the University of Washington released WebCrawler on April 20, 1994. It was the first crawler which indexed entire pages. Soon it became so popular that during daytime hours it could not be used. AOL eventually purchased WebCrawler and ran it on their network. Then in 1997, Excite bought out WebCrawler, and AOL began using Excite to power its NetFind. WebCrawler opened the door for many other services to follow suit. Within one year of its debut came Lycos, Infoseek, and OpenText.
Lycos was the next major search development, having been designed at Carnegie Mellon University around July of 1994. Michael Mauldin was responsible for this search engine and remains the chief scientist at Lycos Inc.
On July 20, 1994, Lycos went public with a catalog of 54,000 documents. In addition to providing ranked relevance retrieval, Lycos provided prefix matching and word proximity bonuses. But Lycos' main difference was the sheer size of its catalog: by August 1994, Lycos had identified 394,000 documents; by January 1995, the catalog had reached 1.5 million documents; and by November 1996, Lycos had indexed over 60 million documents, more than any other Web search engine. In October 1994, Lycos ranked first on Netscape's list of search engines by finding the most hits on the word ‘surf’.
Infoseek also started out in 1994, claiming to have been founded in January. They really did not bring a whole lot of innovation to the table, but they offered a few add-ons, and in December 1995 they convinced Netscape to use them as their default search, which gave them major exposure. One popular feature of Infoseek was allowing webmasters to submit a page to the search index in real time, which was a search spammer's paradise.
AltaVista's online debut came during this same month. AltaVista brought many important features to the web scene. They had nearly unlimited bandwidth (for that time), they were the first to allow natural language queries and advanced searching techniques, and they allowed users to add or delete their own URL within 24 hours. They even allowed inbound link checking. AltaVista also provided numerous search tips and advanced search features.
Due to mismanagement, a fear of result manipulation, and portal related clutter, AltaVista was largely driven into irrelevancy around the time Inktomi and Google started becoming popular. On February 18, 2003, Overture signed a letter of intent to buy AltaVista for $80 million in stock and $60 million cash. After Yahoo! bought out Overture they rolled some of the AltaVista technology into Yahoo! Search, and occasionally use AltaVista as a testing platform.
The Inktomi Corporation came about on May 20, 1996 with its search engine Hotbot. Two Cal Berkeley cohorts created Inktomi from the improved technology gained from their research. HotWired listed this site and it quickly became hugely popular.
In October of 2001 Danny Sullivan wrote an article titled Inktomi Spam Database Left Open To Public, which highlights how Inktomi accidentally allowed the public to access their database of spam sites, which listed over 1 million URLs at that time.
Although Inktomi pioneered the paid inclusion model, it was nowhere near as efficient as the pay per click auction model developed by Overture. Licensing their search results also was not profitable enough to pay for their scaling costs. They failed to develop a profitable business model, and sold out to Yahoo! for approximately $235 million, or $1.65 a share, in December of 2002.
In April of 1997 Ask Jeeves was launched as a natural language search engine. Ask Jeeves used human editors to try to match search queries. Ask was powered by DirectHit for a while, which aimed to rank results based on their popularity, but that technology proved too easy to spam as the core algorithm component. In 2000 the Teoma search engine was released, which uses clustering to organize sites by Subject Specific Popularity, which is another way of saying they tried to find local web communities. In 2001 Ask Jeeves bought Teoma to replace the DirectHit search technology.
Jon Kleinberg's Authoritative sources in a hyperlinked environment [PDF] was a source of inspiration that led to the eventual creation of Teoma. Mike Grehan's Topic Distillation [PDF] also explains how subject specific popularity works.
On March 4, 2004, Ask Jeeves agreed to acquire Interactive Search Holdings for 9.3 million shares of common stock and options and $150 million in cash. On March 21, 2005 Barry Diller's IAC agreed to acquire Ask Jeeves for $1.85 billion. IAC owns many popular websites like Match.com, Ticketmaster.com, and Citysearch.com, and is promoting Ask across their other properties. In 2006 Ask Jeeves was renamed to Ask, and they killed the separate Teoma brand.
AllTheWeb was a search technology platform launched in May of 1999 to showcase Fast's search technologies. They had a sleek user interface with rich advanced search features. On February 23, 2003, AllTheWeb was bought by Overture for $70 million. After Yahoo! bought out Overture they rolled some of the AllTheWeb technology into Yahoo! Search, and occasionally use AllTheWeb as a testing platform.
Most meta search engines draw their search results from multiple other search engines, then combine and rerank those results. This was a useful feature back when search engines were less savvy at crawling the web and each engine had a significantly unique index. As search has improved the need for meta search engines has been reduced.
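Here is a rough sketch of the combine and rerank step. It uses reciprocal rank fusion, which is simply one well known way to merge ranked lists and is not claimed to be what any particular meta search engine used; the function name and the constant k=60 are assumptions for illustration.

```python
from collections import defaultdict

def merge_results(ranked_lists, k=60):
    """Merge several ranked URL lists into one list, best first.

    Each URL scores 1 / (k + rank) for every list it appears in, so pages
    that several engines rank highly float to the top. Ties are broken
    alphabetically so the output is deterministic.
    """
    scores = defaultdict(float)
    for results in ranked_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / (k + rank)
    return sorted(scores, key=lambda url: (-scores[url], url))
```

A page returned by two engines outranks a page that only one engine returned at a similar position, which is the basic value a meta search engine added when each engine's index was quite different.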
Hotbot was owned by Wired, had funky colors, fast results, and a cool name that sounded geeky, but died off not long after Lycos bought it and ignored it. It was later reborn as a meta search engine. Unlike most meta search engines, Hotbot only pulls results from one search engine at a time, but it allows searchers to select amongst a few of the more popular search engines on the web. Currently Dogpile, owned by Infospace, is probably the most popular meta search engine on the market, but like all other meta search engines, it has limited market share.
One of the larger problems with meta search in general is that most meta search engines tend to mix pay per click ads in their organic search results, and for some commercial queries 70% or more of the search results may be paid results. I also created Myriad Search, which is a free open source meta search engine without ads.
The major search engines are fighting for content and marketshare in verticals outside of the core algorithmic search product. For example, both Yahoo and MSN have question answering services where humans answer each other's questions for free. Google has a similar offering, but question answerers are paid for their work.
Google, Yahoo, and MSN are also fighting to become the default video platform on the web, which is a vertical where an upstart named YouTube also has a strong position.
Yahoo and Microsoft are aligned on book search in a group called the Open Content Alliance. Google, going it alone in that vertical, offers a proprietary Google Book search.
All three major search engines provide a news search service. Yahoo! has partnered with some premium providers to allow subscribers to include that content in their news search results. Google has partnered with the AP and a number of other news sources to extend their news database back over 200 years. And Topix.net is a popular news service which sold 75% of its ownership to 3 of the largest newspaper companies. Thousands of weblogs are updated daily reporting the news, some of which are competing with (and beating out) the mainstream media. If that were not enough options for news, social bookmarking sites like Del.icio.us frequently update recently popular lists, there are meme trackers like Techmeme that track the spread of stories through blogs, and sites like Digg allow their users to directly vote on how much exposure each item gets.
Google also has a Scholar search program which aims to make scholarly research easier to do.
In some verticals, like shopping search, other third party players may have significant marketshare, gained through offline distribution and branding (for example, yellow pages companies), or gained largely through arbitraging traffic streams from the major search engines.
On November 15, 2005 Google launched a product called Google Base, which is a database of just about anything imaginable. Users can upload items and title, describe, and tag them as they see fit. Based on usage statistics this tool can help Google understand which vertical search products they should create or place more emphasis on. They believe that owning other verticals will allow them to drive more traffic back to their core search service. They also believe that targeted measured advertising associated with search can be carried over to other mediums. For example, Google bought dMarc, a radio ad placement firm. Yahoo! has also tried to extend their reach by buying other high traffic properties, like the photo sharing site Flickr, and the social bookmarking site del.icio.us.
After a couple years of testing, on May 5th, 2010 Google unveiled a 3 column search result layout which highlights many vertical search options in the left rail.
Search engine marketing is marketing via search engines, done through organic search engine optimization, paid search engine advertising, and paid inclusion programs.
As mentioned earlier, many general web directories charge a one time flat fee or annually recurring rate for listing commercial sites. Many shopping search engines charge a flat cost per click rate to be included in their databases.
As far as major search engines go, Inktomi popularized the paid inclusion model. They were bought out by Yahoo in December of 2002. After Yahoo dropped Google and rolled out their own search technology they continued to offer a paid inclusion program to list sites in their regular search results, but Yahoo Search Submit was ended at the end of 2009.
Pay per click ads allow search engines to sell targeted traffic to advertisers on a cost per click basis. Typically pay per click ads are keyword targeted, but in some cases, some engines may also add in local targeting, behavioral targeting, or allow merchants to bid on traffic streams based on demographics as well.
Pay per click ads are typically sold in an auction where the highest bidder ranks #1 for that keyword. Some engines, like Google and Microsoft, also factor ad clickthrough rate into the click cost. Doing so ensures their ads get clicked on more frequently, and that their advertisements are more relevant. A merchant who writes compelling ad copy and gets a high CTR will be allowed to pay less per click to receive traffic.
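To see how clickthrough rate can change both position and price, here is a toy auction. Ranking by bid times CTR and charging each ad just enough to hold its position are simplifying assumptions made for illustration, not the published formula of any engine, and the reserve price is made up.

```python
RESERVE_PRICE = 0.05  # hypothetical minimum price per click

def rank_ads(ads):
    """ads is a list of (advertiser, max_bid, ctr) tuples.

    Returns (advertiser, price_per_click) pairs in ranked order. Ads are
    ordered by max_bid * ctr, and each ad pays the smallest amount that
    would still beat the ad below it, never more than its own bid.
    """
    ranked = sorted(ads, key=lambda ad: ad[1] * ad[2], reverse=True)
    results = []
    for i, (name, bid, ctr) in enumerate(ranked):
        if i + 1 < len(ranked):
            next_score = ranked[i + 1][1] * ranked[i + 1][2]
            price = min(bid, next_score / ctr)
        else:
            # The last ad has nobody below it, so it pays the reserve.
            price = min(bid, RESERVE_PRICE)
        results.append((name, round(price, 2)))
    return results
```

With ads of ("A", 1.00, 0.02) and ("B", 0.60, 0.05), merchant B ranks above A despite the lower bid and pays less per click than A offered, because B's ad is clicked far more often.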
In 1996 an 18-year-old college dropout named Scott Banister came up with the idea of charging search advertisers by the click with ads tied to the search keyword. He promoted it to the likes of Yahoo!, but their (lack of) vision was corrupted by easy money, so they couldn't see the potential of search. The person who finally ran with Mr. Banister's idea was IdeaLab's Bill Gross.
Overture, the pioneer in paid search, was originally launched by Bill Gross under the name GoTo in 1998. His idea was to arbitrage traffic streams and sell them with a level of accountability. John Battelle's The Search has an entertaining section about Bill Gross and the formation of Overture. John also published that section on his blog.
“The more I [thought about it], the more I realized that the true value of the Internet was in its accountability,” Gross tells me. “Performance guarantees had to be the model for paying for media.”
Gross knew offering virtually risk-free clicks in an overheated and ravenous market ensured GoTo would takeoff. And while it would be easy to claim that GoTo worked because of the Internet bubble’s ouroboros-like hunger for traffic, the company managed to outlast the bust for one simple reason: it worked.
While Overture was wildly successful, it had two major downfalls which prevented them from taking Google's market position:
Those two faults meant that Overture was heavily reliant on its two largest distribution partners, Yahoo! and Microsoft. Overture bought out AltaVista and AllTheWeb to try to win some leverage, but ultimately they sold out to Yahoo! on July 14, 2003 for $1.63 billion.
Google AdWords launched in 2000. The initial version was a failure because it priced ads on a flat CPM model. Some keywords were overpriced and unaffordable, while others were sold inefficiently at too low a price. In February of 2002, Google relaunched AdWords selling the ads in an auction similar to Overture's, but also adding ad clickthrough rate in as a factor in the ad rankings.
Affiliates and other web entrepreneurs quickly took to AdWords because the precise targeting and great reach made it easy to make great profits from the comfort of your own home, while sitting in your underwear :)
Over time, as AdWords became more popular and more mainstream marketers adopted it, Google began closing some holes in their AdWords product. For example, to fight off noise and keep their ads as relevant as possible, they disallowed double serving of ads to one website. Later they started looking at landing page quality and establishing quality based minimum pricing, which squeezed the margins of many small arbitrage and affiliate players.
Google intends to take the trackable ad targeting allowed by AdWords and extend it into other mediums. Google has already tested print and newspaper ads. Google allows advertisers to buy graphic or video ads on content websites. On January 17, 2006, Google announced they bought dMarc Broadcasting, which is a company they will use to help Google sell targeted radio ads.
On September 15, 2006, Google partnered with Intuit to allow small businesses using QuickBooks to buy AdWords from within QuickBooks. The goal is to help make local ads more relevant by getting more small businesses to use AdWords.
On March 20, 2007, Google announced they were beta testing a distributed pay per action affiliate ad network. On April 13, 2007 Google announced the purchase of DoubleClick for $3.1 billion.
On March 4, 2003 Google announced their content targeted ad network. In April 2003, Google bought Applied Semantics, which had CIRCA technology that allowed them to drastically improve the targeting of those ads. Google adopted the name AdSense for the new ad program.
AdSense allows web publishers large and small to automate the placement of relevant ads on their content. Google initially started off by allowing textual ads in numerous formats, but eventually added image ads and video ads. Advertisers could choose which keywords they wanted to target and which ad formats they wanted to market.
To help grow the network and make the market more efficient Google added a link which allows advertisers to sign up for an AdWords account from content websites, and Google allowed advertisers to buy ads targeted to specific websites, pages, or demographic categories. Ads targeted on websites are sold on a cost per thousand impression (CPM) basis in an ad auction against other keyword targeted and site targeted ads.
Google also allows some publishers to place AdSense ads in their feeds, and some select publishers can place ads in emails.
To prevent the erosion of value of search ads Google allows advertisers to opt out of placing their ads on content sites, and Google also introduced what they called smart pricing. Smart pricing automatically adjusts the click cost of an ad based on what Google perceives a click from that page to be worth. A click from a digital camera review page would typically be worth more than a click from a page with pictures on it.
Google was secretive about its revenue share since the inception of AdSense, but due to a lawsuit in Italy Google feared they would be stuck disclosing their revenue share, so they decided to do so publicly for good public relations on May 24, 2010. Google keeps 32% while giving publishers 68% of contextual ad revenues. On search ads Google keeps 49% and gives publishers 51%. Some premium publishers are able to negotiate higher rates & custom integration options as well.
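To make the disclosed split concrete, here is a minimal arithmetic sketch. The percentages come from the paragraph above; the function name `split_revenue` and the $100 figure are made up for illustration, and rounding to cents is an assumption rather than anything Google has documented.

```python
# Publisher shares disclosed by Google on May 24, 2010 (from the text above).
CONTENT_PUBLISHER_SHARE = 0.68  # contextual (content) ads
SEARCH_PUBLISHER_SHARE = 0.51   # search ads


def split_revenue(gross, publisher_share):
    """Return (publisher_cut, google_cut) in dollars for a gross ad spend.

    Hypothetical helper: rounding to cents is an assumption for illustration.
    """
    publisher_cut = round(gross * publisher_share, 2)
    google_cut = round(gross - publisher_cut, 2)
    return publisher_cut, google_cut


# Illustrative figure: $100 of advertiser spend on each network.
content_split = split_revenue(100.0, CONTENT_PUBLISHER_SHARE)
search_split = split_revenue(100.0, SEARCH_PUBLISHER_SHARE)
print("content ads:", content_split)
print("search ads:", search_split)
```

Premium publishers with negotiated rates would simply use a different `publisher_share` value.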
Yahoo! Search Marketing is the rebranded name for Overture after Yahoo! bought them out. As of September 2006 their platform is generally the exact same as the old Overture platform, with the same flaws - ad CTR not factored into click cost, it's hard to run local ads, and it is just generally clunky.
In 2000 Microsoft launched a keyword driven ad program called keywords, but shut it down after 2 months because they feared it would cannibalize their banner ad revenues.
Microsoft AdCenter was launched on May 3, 2006. While Microsoft has limited marketshare, they intend to increase their marketshare by baking search into Internet Explorer 7. On the features front, Microsoft added demographic targeting and dayparting features to the pay per click mix. Microsoft's ad algorithm includes both cost per click and ad clickthrough rate.
Microsoft also created the XBox game console, and on May 4, 2006 announced they bought a video game ad targeting firm named Massive Inc. Eventually video game ads will be sold from within Microsoft AdCenter.
Search engine optimization is the art and science of publishing information in a format which will make search engines believe that your content satisfies the needs of their users for relevant search queries. SEO, like search, is a field much older than I am. In fact, it was not originally even named search engine optimization, and to this day most people are still uncertain where that phrase came from.
Early search engine optimization consisted mostly of using descriptive file names, page titles, and meta descriptions. As search advanced, on-page factors grew more important, and then people started trying to aim for specific keyword densities.
One of the big things that gave Google an advantage over their competitors was the introduction of PageRank, which graded the value of a page based on the number and quality of links pointing at it. Up until the end of 2003 search was exceptionally easy to manipulate. If you wanted to rank for something all you had to do was buy a few powerful links and place the words you wanted to rank for in the link anchor text.
On November 15, 2003 Google began to heavily introduce many more semantic elements into its search product. Researchers and SEOs alike have noticed wild changes in search relevancy during that update and many times since then, but many searchers remain clueless to the changes.
Search engines would prefer to bias search results toward informational resources to make the commercial ads on the search results appear more appealing. You can see an example of how search can be biased toward commercial or informational resources by playing with Yahoo! Mindset.
On January 18, 2005, Google, MSN, and Yahoo! announced the release of a nofollow link attribute which allows blog owners to block comment spam from passing link popularity. People continued to spam blogs and other resources, partly because search engines may still count some nofollow links, and partly because many of the pages they spammed still rank.
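In markup, nofollow is written as rel="nofollow" on a link. The sketch below uses Python's standard `html.parser` to separate the links in a blog comment that would pass link popularity from those that would not. The class name `LinkAuditor` and the sample comment HTML are invented for illustration, and how a given search engine actually treats each link is not something this code can tell you.

```python
from html.parser import HTMLParser


class LinkAuditor(HTMLParser):
    """Sorts anchor hrefs by whether the link carries rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollowed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if href is None:
            return
        # rel can hold several space-separated tokens, e.g. "nofollow noopener".
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollowed.append(href)
        else:
            self.followed.append(href)


# Hypothetical comment markup: one spammy nofollowed link, one editorial link.
comment_html = (
    '<p>Nice post! <a href="http://example.com/pills" rel="nofollow">cheap pills</a></p>'
    '<p>Source: <a href="http://example.org/study">the study</a></p>'
)

auditor = LinkAuditor()
auditor.feed(comment_html)
print("followed:", auditor.followed)
print("nofollowed:", auditor.nofollowed)
```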
Since 2003 Google has come out with many advanced filters and crawling patterns to help make quality editorial links count more and discount the value of many overtly obvious paid links or other forms of link manipulation.
Older websites may be given more trust in relevancy algorithms than newer websites (just existing for a period of time is a signal of quality). All major search engines use human editors to help review content quality and help improve their relevancy algorithms. Search engines may factor in user acceptance and other usage data to help determine if a site needs to be reviewed for editorial quality and to help determine if linkage data is legitimate.
Google has also heavily pushed giving away useful software, tools, and services which allow them to personalize search results based on the searcher's historical preferences.
In many verticals search is self reinforcing, as in a winner take most battle. Jakob Nielsen's The Power of Defaults notes that the top search result is clicked on as often as 42% of the time. Not only is the distribution and traffic stream highly disproportionate, but many people tend to link to the results that were easy to find, which makes the system even more self reinforcing, as noted in Mike Grehan's Filthy Linking Rich.
A key thing to remember if you are trying to catch up with another website is that you have to do better than what was already done, and significantly enough better that it is comment worthy or citation worthy. You have to make people want to switch their world view to seeing you as an authority on your topic. Search engines will follow what people think.
Google engineer Matt Cutts frequently comments that any paid link should have the nofollow attribute applied to it, although Google hypocritically does not place the nofollow attribute on links they buy. They also have placed their ads on the leading Warez site and continued to serve ads on sites that they banned for spamming. Yahoo! Shopping has also been known to be a big link buyer.
Much of the current search research is based upon the view that any form of marketing / promotion / SEO is spam. If that was true, it wouldn't make sense that Google is teaching SEO courses, which they do.
Google's corporate history page has a pretty strong background on Google, starting from when Larry met Sergey at Stanford right up to present day. In 1995 Larry Page met Sergey Brin at Stanford.
By January of 1996, Larry and Sergey had begun collaboration on a search engine called BackRub, named for its unique ability to analyze the "back links" pointing to a given website. Larry, who had always enjoyed tinkering with machinery and had gained some notoriety for building a working printer out of Lego™ bricks, took on the task of creating a new kind of server environment that used low-end PCs instead of big expensive machines. Afflicted by the perennial shortage of cash common to graduate students everywhere, the pair took to haunting the department's loading docks in hopes of tracking down newly arrived computers that they could borrow for their network.
A year later, their unique approach to link analysis was earning BackRub a growing reputation among those who had seen it. Buzz about the new search technology began to build as word spread around campus.
BackRub ranked pages using citation notation, a concept which is popular in academic circles. If someone cites a source they usually think it is important. On the web, links act as citations. In the PageRank algorithm links count as votes, but some votes count more than others. Your ability to rank and the strength of your ability to vote for others depends upon your authority: how many people link to you and how trustworthy those links are.
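The idea that links are votes, and that votes from authoritative pages count for more, can be sketched with a simplified power-iteration version of PageRank. This is an illustration, not Google's production algorithm: the 0.85 damping factor, the fixed iteration count, the handling of pages with no outbound links, and the four-page toy web are all assumptions made here.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with every page holding an equal share of the total score.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline score regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its vote over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Toy web: a and b both link to c, and c links to d.
toy_web = {"a": ["c"], "b": ["c"], "c": ["d"], "d": []}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy web, d has only one inbound link, yet it ends up scoring above a and b because that single vote comes from c, which is itself well linked.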
In 1998, Google was launched. Sergey tried to shop their PageRank technology, but nobody was interested in buying or licensing their search technology at that time.
Later that year Andy Bechtolsheim gave them $100,000 seed funding, and Google received $25 million from Sequoia Capital and Kleiner Perkins Caufield & Byers the following year. In 1999 AOL selected Google as a search partner, and Yahoo! followed suit a year later. In 2000 Google also launched their popular Google Toolbar. Google has gained search market share year over year ever since.
In 2000 Google relaunched their AdWords program to sell ads on a CPM basis. In 2002 they retooled the service, selling ads in an auction which would factor in bid price and ad clickthrough rate. On May 1, 2002, AOL announced they would use Google to deliver their search related ads, which was a strong turning point in Google's battle against Overture.
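Here is a rough sketch of why factoring clickthrough rate into the auction matters. The text only says the auction considers bid price and ad clickthrough rate; multiplying the two into a single score is an assumption made here for illustration, and the `Ad` class, `rank_ads` function, and sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    name: str
    bid: float  # maximum cost per click, in dollars
    ctr: float  # historical clickthrough rate, between 0 and 1


def rank_ads(ads):
    """Order ads by bid x CTR (assumed scoring), highest first."""
    return sorted(ads, key=lambda ad: ad.bid * ad.ctr, reverse=True)


# Hypothetical advertisers competing for the same keyword.
ads = [
    Ad("high_bid_low_ctr", bid=2.00, ctr=0.01),
    Ad("low_bid_high_ctr", bid=1.00, ctr=0.05),
    Ad("middle", bid=1.50, ctr=0.02),
]

ordered = [ad.name for ad in rank_ads(ads)]
print(ordered)
```

Under this assumed scoring, the advertiser bidding half as much takes the top slot because searchers click the ad five times as often. A pure bid-price auction would have ranked the same ad last.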
In 2003 Google also launched their AdSense program, which allowed them to expand their ad network by selling targeted ads on other websites.
Google used a two class stock structure, decided not to give earnings guidance, and offered shares of their stock in a Dutch auction. They received virtually limitless negative press for the perceived hubris they expressed in their "AN OWNER'S MANUAL" FOR GOOGLE'S SHAREHOLDERS. After some controversy surrounding an interview in Playboy, Google dropped their IPO offer range to $85 to $95 per share from $108 to $135. Google went public at $85 a share on August 19, 2004 and its first trade was at 11:56 am ET at $100.01.
In addition to running the world's most popular search service, Google also runs a large number of vertical search services, including:
Google's corporate mission statement is:
Google's mission is to organize the world's information and make it universally accessible and useful.
However that statement includes many things outside of the traditional mindset of search, and Google maintains that ads are a type of information. This other information includes:
In addition to having strong technology and a strong brand Google also pays for a significant portion of their search market share.
On December 20, 2005 Google invested $1 billion in AOL to continue their partnership and buy a 5% stake in AOL. In February 2006 Google agreed to pay Dell up to $1 billion for 3 years of toolbar distribution. On August 7, 2006, Google signed a 3 year deal to provide search on MySpace for $900 million. On October 9, 2006 Google bought YouTube, a leading video site, for $1.65 billion in stock.
Google also pays Mozilla and Opera hundreds of millions of dollars to be the default search provider in their browsers, bundles their Google Toolbar with software from Adobe and Sun Microsystems, and pays AdSense ad publishers $1 for Firefox + Google Toolbar installs, or up to $2 for Google Pack installs.
Google also builds brand exposure by placing Ads by Google on their AdSense ads and providing Google Checkout to commercial websites.
Google Pack is a package of useful software including a Google Toolbar and software from many other companies. At the same time Google helps ensure its toolbar is considered good and its competitors don't use sleazy distribution techniques by sponsoring StopBadware.org.
Google's distribution, vertical search products, and other portal elements give it a key advantage in best understanding our needs and wants by giving them the largest Database of Intentions.
They have moved away from a pure algorithmic approach to a hybrid editorial approach. In April of 2007, Google started mixing recent news results in their organic search results. After Google bought YouTube they started mixing videos directly in Google search results.
Since the Florida update in 2003 Google has looked much deeper into linguistics and link filtering. Google's search results are generally the hardest search results for the average webmaster to manipulate.
Matt Cutts, Google's lead engineer in charge of search quality, regularly blogs about SEO and search. Google also has an official blog and has blogs specific to many of their vertical search products.
Google also helps webmasters understand how Google is indexing their site via Google Webmaster Central. Google continues to add features and data to their webmaster console for registered webmasters while obfuscating publicly available data.
For an informal look at what working at Google looked like from the inside from 1999 to 2005 you might want to try Xooglers, a blog by former Google brand manager Doug Edwards.
In October of 2007 Google attempted to manipulate the public perception of people buying and selling links by announcing that they were going to penalize known link sellers, and then manually editing the toolbar PageRank scores of some well known blogs and other large sites. These PageRank edits did not change search engine rankings or traffic flows, as the PageRank update was entirely aesthetic.
The net effect of these new algorithms & other forms of obfuscation Google has introduced has been to make it much harder to rank independent websites owned by small companies, while making SEO easier for large companies that have significant usage signals associated with their websites. This has caused many SEO professionals to chase after servicing large corporate clients, as talent tends to follow the money.
Yahoo! was founded in 1994 by David Filo and Jerry Yang as a directory of websites. For many years they outsourced their search service to other providers, considering it secondary to their directory and other content features, but by the end of 2002 they realized the importance and value of search and started aggressively acquiring search companies.
Overture purchased AllTheWeb and AltaVista in 2003. Yahoo! purchased Inktomi in December, 2002, and then consumed Overture in July, 2003, and combined the technologies from the various search companies they bought to make a new search engine. Yahoo! dumped Google in favor of their own in house technology on February 17, 2004.
In addition to building out their core algorithmic search product, Yahoo! has largely favored the concept of social search.
On March 20, 2005 Yahoo! purchased Flickr, a popular photo sharing site. On December 9, 2005, Yahoo! purchased Del.icio.us, a social bookmarking site. Yahoo! has also made a strong push to promote Yahoo! Answers, a popular free community driven question answering service.
On July 2, 2007, Yahoo! launched their behaviorally targeted SmartAds product.
On July 29, 2009, Yahoo! decided to give up on search and signed a 10 year deal to syndicate Bing ads and algorithmic results on their website.
Yahoo! shut down their directory service in December of 2014.
In 2014 Yahoo! signed a deal to be the default search provider in Mozilla Firefox inside the United States. They also did a distribution deal with Oracle, however those revenue gains were short lived & Yahoo kept losing share in online advertising & web search.
Over the years Yahoo! not only exited the search business, but they also exited most of their other vertical businesses. The role of the general purpose web portal was relegated to irrelevancy through the combination of:
Verizon announced they were acquiring the Yahoo! operating business in July of 2016 for $4.83 billion.
In 1998 MSN Search was launched, but Microsoft did not get serious about search until after Google proved the business model. Until Microsoft saw the light they primarily relied on partners like Overture, Looksmart, and Inktomi to power their search service.
They launched their technology preview of their search engine around July 1st of 2004. They formally switched from Yahoo! organic search results to their own in house technology on January 31st, 2005. MSN announced they dumped Yahoo!'s search ad program on May 4th, 2006.
On June 1, 2009, Microsoft launched Bing, a new search service which changed the search landscape by placing inline search suggestions for related searches directly in the result set. For instance, when you search for credit cards they will suggest related phrases like
Microsoft released a Bing SEO guide for Webmasters [PDF] which claimed that the additional keyword suggestions helped pull down search demand to lower listed results when compared against the old results 6 through 10 when using a single linear search result set. Conversely, the Google format tends to concentrate attention on the top few search listings. After extensive eye tracking Gord Hotchkiss named this pattern Google's Golden Triangle.
While Yahoo! has lost much of their relevance, Bing has built a formidable Google search competitor. They have narrowed the revenue gap against Google & have built a profitable search business.
Bing is strongest in the US market, while having a lower share outside of the US, in part due to Google driving aggressive installs of Google Chrome from Flash security updates & promoting Chrome across Google properties & the AdSense ad network.
Google is more dominant in mobile search than they are in desktop due to
One would be foolish to think that there is not a better way to index the web, and a new creative idea is probably just under our noses. The fact that Microsoft is making a large investment into developing a new search technology should be some cause for concern for other major search engines.
Through this course of history many smaller search engines have come and gone, as the search industry has struggled to find a balance between profitability and relevancy. Some of the newer search engine concepts are web site clustering, semantics, and having industry specific smaller search engines / portals, but search may get attacked from entirely different angles.
On October 5, 2004 Bill Gross ( the founder of Overture and pioneer of paid search) relaunched Snap as a search engine with a completely transparent business model (showing search volumes, revenues, and advertisers). Snap has many advanced sorting features but it may be a bit more than what most searchers were looking for. People tend to like search for the perceived simplicity, even if the behind the scenes process is quite complex.
Outside of technology there are four other frontiers search is being attacked / commoditized from
Some early search pioneers have tried to reboot search, but most of these efforts have failed to gain a sustainable marketshare.
Cuil was heavily hyped but quickly went bust. Blekko launched with less hype & lasted longer, but ultimately sold to IBM. Gigablast was founded in 2000 by Matt Wells. It is an open source search engine which has quietly existed for nearly 2 decades. Gabriel Weinberg founded DuckDuckGo in 2008. It leverages the core Bing index but differentiates through the search interface & result features. It has done a great job of consistently growing off a small base & is popular with many web developers in part for its search privacy features & lack of result personalization.
Some foreign markets have dominant local search services. Yandex is big in Russia. Baidu leads China. Naver is popular in South Korea.
In 2005 the DoJ obtained search data from AOL, MSN, and Yahoo!. Google denied the request, and was sued for search data in January of 2006. Google beat the lawsuit and was only required to hand over a small sample of data.
In August of 2006 AOL Research released over 3 months worth of personal search data by 650,000 AOL users. A NYT article identified one of the searchers by name. In 2007 the European Union aggressively probed search companies aiming to limit data retention and maintain searcher privacy rights.
As more people create content attention is becoming more scarce. Due to The Tragedy of the Commons many publishing businesses and business models will die. Many traditional publishing companies enjoyed the profits enabled by running what was essentially regionally based monopolies. Search, and other forms of online media, allow for better targeting and less wasteful / more efficient business models. Due to growing irrelevancy, a fear of change, and a fear of disintermediation, many traditional publishing companies have fought search.
In an interview by Danny Sullivan, Eric Schmidt stated he thought many of the lawsuits Google faces are business deals done in a court room.
In September of 2006 some Belgian newspaper companies won a copyright lawsuit against Google News which makes Belgian judges look like they do not understand how search or the internet work. Some publisher groups are trying to create an arbitrary information access protocol, Agence France Presse (AFP) sued Google to get them to drop their news coverage, and Google paid the AP a licensing fee.
Perfect 10, a pornography company, sued Google for including cached copies of stolen content in their image index, and for allowing publishers to collect income on stolen copyright content via Google AdSense.
In May of 2000 a French judge required Yahoo! to stop providing access to auctions selling Nazi memorabilia.
In 1999 Playboy sued Excite and Netscape for selling banner ad impressions triggered by searches for Playboy.
Overture sued Google for patent infringement. Just prior to Google's IPO they settled with Yahoo! (who by then bought out Overture) by giving them 2.7 million shares of class A Google stock.
Geico took Google to court in the US for trademark violation because Google allowed Geico to be a keyword trigger to target competing ads. Geico lost this case on December 15, 2004. Around the same time Google lost a similar French trademark case filed by Louis Vuitton.
Lane's Gifts sued Google for click fraud, but did not have a strong well put together case. Google's lawyers pushed them into a class wide out of court settlement of up to $90 million in AdWords credits. The March 2006 settlement aimed to absolve Google of any clickfraud related liabilities back through 2002, when Google launched their pay per click model.
The US government requested that major search companies turn over a significant amount of search related data. Yahoo!, MSN, and AOL gave up search data. The Google blog announced that Google fought the subpoena:
In August, Google was served with a subpoena from the U.S. Department of Justice demanding disclosure of two full months’ worth of search queries that Google received from its users, as well as all the URLs in Google’s index.
A judge stated that Google did not have to turn over search usage data.
AOL not only shared information with the government, but AOL research also accidentally made search records public record.
Each search company has its own business objectives and technologies to define relevancy. The three biggest issues search engines are fighting are
In order to try to lock users in search engines offer things like free email, news search, blogging platform, content hosting, office software, calendars, and feature rich toolbars. In some cases the software or service is not only free, but it is expensive to provide. For example, Google does not profit from Google news, but they had to pay the AP content licensing fees, and hosting Google Video can't be cheap.
In an attempt to collect more data, better target ads, and improve conversion rates Google offers
The end goal of search is to commoditize the value of as many brands and markets as possible to keep adding value to the search box. They want to commoditize the value of creating content and increase the value of spreading ideas, the value of attention, and the importance of conversion.
As they make the network more and more efficient they can eat more and more of the profits, which was a large part of the reasoning behind Jakob Nielsen's Search Engines as Leeches on the Web.
Because search aims to gain distribution by virtually any means possible the search engines that can do the best job of branding and get people to believe most in their goals / ideals / ecosystem win. Search engines are fighting many ways on this front, but not all of them are even on the web. For example, search engines are trying to attract the smartest minds by sharing research. Google goes so far as offering free pizza!
Google hires people to track webmaster feedback across the web. Matt Cutts frequently blogs about search and SEO because to him it is important for others to see search, SEO, and Google from his perspective. He offers free tips on Google Video in no small part because it was important for Google Video to beat out YouTube for Google to become the default video platform on the web. Once it was clear that Google lost the video battle to YouTube Google decided to buy them.
Beyond just selling their company beliefs and ideology to get people excited about their field, acquire new workers, and get others to act in a way that benefits their business model search engines also provide APIs to make portions of their system open enough that they can leverage the free work of other smart, creative, and passionate people.
Selling search as an ecosystem goes so far that Google puts out endless betas, allowing users to become unpaid testers and advocates of their products. Even if the other search engines matched Google on relevancy they still are losing the search war due to Google's willingness to take big risks, Google's brand strength, and how much better Google sells search as an ecosystem.
Google wants to make content ad supported and freely accessible. On October 9, 2006, Google announced they were acquiring YouTube for $1.65 billion in stock. In March 2007, Viacom sued Google / YouTube for $1 billion for copyright infringement. In 2007 Microsoft pushed against Google's market position, calling Google a copyright infringer (for scanning books) and doing research stating that many of Google's blogspot hosted blogs are spam.
In 2006 and 2007 numerous social bookmarking and decentralized news sites became popular. Del.icio.us, a popular social bookmarking site, was bought out by Yahoo. Digg.com features fresh news and other items of interest on their home page based on user votes.
In 1992 TREC was launched to support research within the information retrieval community by providing the infrastructure necessary for large-scale evaluation of text retrieval methodologies. In addition to helping support the evolution of search they also create special tracks for vertical search and popular publishing models. For example, in 2006 they created a blog track. Past TREC publications are posted here.
There are a number of other popular conferences covering information retrieval.
Search Science lists a number of conferences on the right side of the Search Science blog.
There are also a number of conferences which talk about search primarily from a marketer's perspective. The three most well known conferences are
Many of the following have not been updated in years, or only cover a partial timeline of the search space, but as a collection they helped me out a lot. SearchEngineWatch is amazingly comprehensive if you piece together all of the articles Danny Sullivan has published.