eWeek reports that Google is going to add blogs to its Universal Search alongside images, news, books, maps and videos. Google has been running Google Blog Search as a separate search engine. eWeek says Google will make the move to include blogs this week (which is nearly over) or next week.
Starting this week or next, queries on the leading search engine will return links to blogs alongside the images, news, books, local maps and video, Marissa Mayer, vice president of search products and user experience, told eWEEK in a briefing at the company's headquarters here.
Blogs have been gaining significant momentum in the last couple of years, fueled by everything from fascinating news revelations to gossipy snipes. The inclusion of blogs as a genre on Universal Search is a nod to their growing number and ability to get people to go online to find content, which is what Google is all about.
Universal Search is the fruit of a five-year effort involving hundreds of engineers working to refine the company's search algorithms and add multimedia content to its search returns to give users richer results.
It's a logical move for Google or any search engine that wants to provide current and relevant information for its users. Frequently updated blogs tend to contain news about what is happening right now, and that is often just the kind of information people are searching for. Blogs have always been indexed by Google, so it will be interesting to see how much more exposure this gives blogs and what the search results will look like.
Here is a video explaining what Google's Universal Search is all about. (hat tip Jim Kukral).
The new Mahalo search engine is offering to pay people to create its search results. Guides will be paid $10 to $15 for each approved search result, according to the Mahalo Greenhouse FAQ.
The Mahalo Greenhouse is where talented part-time Guides (PTGs) help Mahalo create the best search results on the internet. PTGs create search results in the Mahalo Greenhouse for terms Mahalo has yet to cover, and if our full-time Guides approve the PTGs' search results, those results will be moved from the Mahalo Greenhouse to Mahalo.
Anyone can apply to be a PTG. PTGs are paid $10 to $15 for each search result approved and added to Mahalo.
The Mahalo Greenhouse is the place where people can build search results for Mahalo. You can also see a list of the most wanted search results pages (SERPs). Bloggers would probably be very good at creating search results for websites like Mahalo, since one thing bloggers do very well is point people toward interesting content and resources.
Mahalo is a human-powered search engine that was launched a couple weeks ago by Weblogs, Inc. founder Jason Calacanis. (via TechCrunch and Techmeme)
Google has added a search menu to its search results that gives Google users more search options. Blogs is one of the search options. You can see the menu in the screenshot below for a Google search for the keyword "test."
This should mean that more traffic will now be driven to blogs as people try Google's Blog Search.
Steve Rubel blogs about how using dashes instead of quotes in web searches can save you time. For example, if you are looking for Clifford the Big Red Dog, searching for big-red-dog instead of "big red dog" will return the same results in most search engines. It works in both Technorati and Google Blog Search. It doesn't seem like much, but the time can really add up, as Steve Rubel explains.
However, don't laugh. This tiny trick saves me lots of time over the years. Consider this. According to my Google search history I have performed a staggering 31,000 searches over the last two years since they added this feature. Many of these are phrases. Let's argue it's half of them. If I save 0.5 seconds thanks to the dashed-search technique and multiply it times 15,500, I calculate that I have saved 7,500 seconds. That adds up to 125 minutes or two hours! That means I saved at least an hour a year. Plus, that's not counting the tons of other searches on sites the Google history doesn't track.
Your mileage may vary but give it a try.
Saving an hour a year isn't much, but every little bit helps. Plus, you don't have to use the Shift key. A comment left on Rubel's post says that a period will also work instead of a dash. Also, some people are saying it may not work for all searches.
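Rubel's numbers are easy to recheck. The short Python sketch below multiplies out his own inputs: 31,000 searches, half of them phrases, and half a second saved on each phrase search. The product comes to 7,750 seconds rather than the 7,500 quoted. That is roughly 129 minutes over two years, so his conclusion of about an hour a year still holds. Every input is his rough guess, not a measurement.

```python
# Re-running the dashed-search estimate from Steve Rubel's post.
# Every input below is his own rough guess, not a measurement.
total_searches = 31_000         # searches in his Google history over two years
phrase_share = 0.5              # "Let's argue it's half of them"
seconds_saved_per_phrase = 0.5  # time saved per phrase search by skipping quotes
years = 2

phrase_searches = total_searches * phrase_share              # 15,500
seconds_saved = phrase_searches * seconds_saved_per_phrase   # 7,750
minutes_saved = seconds_saved / 60
minutes_per_year = minutes_saved / years

print(f"{phrase_searches:,.0f} phrase searches")
print(f"{seconds_saved:,.0f} seconds saved, about {minutes_saved:.0f} minutes")
print(f"about {minutes_per_year:.0f} minutes saved per year")
```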
Topix has launched the redesign of its news search website. The relaunch included a move from Topix.net to Topix.com; Topix paid $1 million for the domain last month. News.com reports that Topix has also added a citizen journalism feature that lets people submit local news by ZIP code through the website or from a cell phone.
Topix is following the user-powered models of the popular online encyclopedia Wikipedia and the Open Directory Project (ODP) of Web links in which volunteers are responsible for creating and editing entries. Topix will avoid the spam problem that sites like Digg have by requiring people to sign up with their real names, said Rich Skrenta, chief executive officer. Skrenta is co-founder of the ODP.
Anyone can submit local news by ZIP code through the Web site or from their cell phone. The citizen journalist idea came to executives after they unearthed hidden in the site's forums a posting from a Texas Minuteman of his first-person experience patrolling the U.S.-Mexico border, something that wasn't published anywhere else, Skrenta said.
Topix also continues to provide its effective news and blog search engine. As before, users can search news content with or without blogs included. News and blog searches can also be restricted by domain, country, ZIP code and source.
Google Operating System has a very interesting post about how Google Blog Search ranks search results. They found the information in a patent filed by Google. These are some of the positive things that can help your blog rank better in Google Blog Search.
links from blogrolls (especially from high-quality blogrolls or blogrolls of "trusted bloggers")
links from other sources (mail, chats)
using tags to categorize a post
PageRank
the number of feed subscriptions (from feed readers)
clicks in search results
The negative things that can hurt your blog's ranking in Google Blog Search are spam indicators such as duplicated content, spammy keywords and posts added at predictable times. Having very few feed subscribers, or little or no presence on important blogrolls, would also count against a blog.
The patent itself is worth reading. Scroll down to the part that says "Determining a Quality Score for a Blog Document." For example, Google does not just look at the number of feed subscribers. It also looks closely at how many of those subscribers are unique individuals, in an attempt to help rule out spam blogs.
The popularity of the blog document may be a positive indication of the quality of that blog document. A number of news aggregator sites (commonly called "news readers" or "feed readers") exist where individuals can subscribe to a blog document (through its feed). Such aggregators store information describing how many individuals have subscribed to given blog documents. A blog document having a high number of subscriptions implies a higher quality for the blog document. Also, subscriptions can be validated against "subscriptions spam" (where spammers subscribe to their own blog documents in an attempt to make them "more popular") by validating unique users who subscribed, or by filtering unique Internet Protocol (IP) addresses of the subscribers.
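The patent only says that subscriptions can be validated by checking for unique users or by filtering unique IP addresses; it does not say how. Here is a minimal Python sketch. It assumes a hypothetical feed reader log in which each subscription carries a user ID and an IP address. It applies both filters at once, which is our choice for illustration and not necessarily what Google does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscription:
    feed_url: str    # the blog's feed
    user_id: str     # hypothetical: account ID reported by a feed reader
    ip_address: str  # hypothetical: IP address the subscription came from


def validated_subscriber_count(subscriptions, feed_url):
    """Count subscriptions to one feed, ignoring likely duplicates.

    A subscription is counted only if neither its user ID nor its IP
    address has been seen before for this feed, so one person re-subscribing,
    or one machine creating many accounts, adds at most one subscriber.
    """
    seen_users = set()
    seen_ips = set()
    count = 0
    for sub in subscriptions:
        if sub.feed_url != feed_url:
            continue
        if sub.user_id in seen_users or sub.ip_address in seen_ips:
            continue  # repeat user or repeat IP: treat as possible subscription spam
        seen_users.add(sub.user_id)
        seen_ips.add(sub.ip_address)
        count += 1
    return count


feed = "https://blog.example/feed"
subs = [
    Subscription(feed, "alice", "192.0.2.1"),
    Subscription(feed, "bob", "192.0.2.2"),
    Subscription(feed, "spam1", "192.0.2.99"),
    Subscription(feed, "spam2", "192.0.2.99"),  # same IP as spam1
    Subscription(feed, "alice", "192.0.2.1"),   # alice again
]
print(len(subs), "raw subscriptions,", validated_subscriber_count(subs, feed), "counted")
```

The trade-off is that genuine readers sharing one IP address, such as several people behind an office router, collapse into a single subscriber. That may be why the patent presents the two checks as alternatives.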
Google Operating System says Google computes a relevance score, called an IR score, to rank search results.
To rank the search results, Google combines a quality score obtained by mixing those signals with a relevance score (IR score) that depends on the query. "The IR score may be determined based on the number of occurrences of the search terms in the document. The IR score may be determined based on where the search terms occur within the document (e.g., title, content, etc.) or characteristics of the search terms (e.g., font, size, color, etc.). A search term may be weighted differently from another search term when multiple search terms are present. The proximity of the search terms when multiple search terms are present may influence the IR score." (the quote was slightly altered for clarity)
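To make the quoted description more concrete, here is a toy Python sketch of a query-dependent IR score mixed with a query-independent quality score. The IR score counts term occurrences, weights them by where the terms appear and by optional per-term weights, and adds a bonus for proximity. The title/content weights, the 1/gap proximity bonus and the 50/50 linear mix are all invented for illustration. The patent excerpt gives no formula, and a real system would also need to put the two scores on comparable scales.

```python
import re

# Hypothetical field weights: the quoted patent text says *where* a term
# occurs matters (title vs. content) but gives no numbers.
FIELD_WEIGHTS = {"title": 3.0, "content": 1.0}


def tokenize(text):
    """Lower-case a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ir_score(query, doc, term_weights=None):
    """Toy query-dependent relevance score for one blog post (a dict)."""
    terms = tokenize(query)
    term_weights = term_weights or {}
    score = 0.0

    # Occurrences: count each query term per field, scaled by the field's
    # weight and by an optional per-term weight (default 1.0).
    for field, field_weight in FIELD_WEIGHTS.items():
        tokens = tokenize(doc.get(field, ""))
        for term in terms:
            score += field_weight * term_weights.get(term, 1.0) * tokens.count(term)

    # Proximity: for each pair of neighbouring distinct query terms, add
    # 1 / (smallest gap between them in the content), so adjacent terms add 1.
    content = tokenize(doc.get("content", ""))
    distinct = list(dict.fromkeys(terms))
    for a, b in zip(distinct, distinct[1:]):
        pos_a = [i for i, tok in enumerate(content) if tok == a]
        pos_b = [i for i, tok in enumerate(content) if tok == b]
        if pos_a and pos_b:
            score += 1.0 / min(abs(i - j) for i in pos_a for j in pos_b)
    return score


def combined_score(ir, quality, alpha=0.5):
    """Hypothetical linear mix of relevance and query-independent quality."""
    return alpha * ir + (1 - alpha) * quality


posts = [
    {"title": "Green tea basics", "content": "how to brew green tea at home", "quality": 0.9},
    {"title": "Tea", "content": "green things and then much later tea", "quality": 0.2},
]
ranked = sorted(
    posts,
    key=lambda p: combined_score(ir_score("green tea", p), p["quality"]),
    reverse=True,
)
for p in ranked:
    print(p["title"])
```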
Improving your inbound links and growing your feed subscribers will probably boost your blog's rank in Google Blog Search. People often search Google Blog Search for recent posts (which are sorted by date), so blogging frequently about current issues also helps.
Rumors are that Technorati, a blog search tool, has acquired Personal Bee, a tool that lets people create personalized news pages. Valleywag writes that the acquisition could be a sign that Technorati plans to launch "themed news pages" similar to the Techmeme memetracker.
Personal Bee's founder will come in as VP of business development at Technorati; we're not sure whether the value of the target was in engineering, where Technorati's been weak. Any significance beyond that? One person familiar with Personal Bee says Technorati -- which has in the past offered brand-tracking to marketers, ego-surfing to bloggers and search to ordinary users -- plans now to build themed news pages in the style of Techmeme.
PaidContent.org reports that Topix.net, a popular blog and news search engine, has paid $1 million for the Topix.com domain. PaidContent.org says the company is worried about how moving the site to Topix.com may affect its search engine rankings.
Topix.net, the news search site majority owned by Gannett, McClatchy and Tribune, has bought its .com domain after paying a Canadian company $1 million in January (late last year Topix received $15 million funding), and is planning to move the site onto Topix.com, reports WSJ. The story, which makes a bigger point about search engine influence on other sites, says it is worried about the effect on its Google rankings after this. About 50 percent of visits to Topix come through a search engine, and about 90 percent out of that is through Google...Even if traffic to Topix, which gets about 10 million visitors a month, dropped just 10 percent, that would essentially be a 10 percent loss in ad revenue, CEO Rick Skrenta said in the story. Topix will run its site at both Topix.net and Topix.com for awhile, in order to get over any unpredictabilities in Google and other search results.
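The WSJ figures show why Topix is nervous about Google. The Python sketch below multiplies them out. The story says "10 million visitors a month" but gives the search shares as fractions of visits, so treating the 10 million as monthly visits is an assumption made only for illustration.

```python
# Multiplying out the traffic figures from the WSJ story quoted above.
# Assumption: the story says "10 million visitors a month" but gives the
# search shares as fractions of visits; here the 10 million is treated as visits.
monthly_visits = 10_000_000
search_share = 0.50            # about 50% of visits come through a search engine
google_share_of_search = 0.90  # about 90% of those come through Google

google_share = search_share * google_share_of_search  # fraction of all visits
google_visits = monthly_visits * google_share


def total_drop(google_drop):
    """Fraction of all visits lost if Google referrals fall by google_drop."""
    return google_share * google_drop


def google_drop_needed(total):
    """Fall in Google referrals that would, by itself, cost `total` of all visits."""
    return total / google_share


print(f"Google sends about {google_share:.0%} of visits ({google_visits:,.0f} a month)")
print(f"Losing 10% of Google referrals costs {total_drop(0.10):.1%} of all visits")
print(f"A 10% overall drop needs Google referrals to fall {google_drop_needed(0.10):.0%}")
```

On these numbers, Google alone accounts for about 45 percent of Topix's traffic. Skrenta's hypothetical 10 percent overall drop would therefore take roughly a 22 percent fall in Google referrals.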
Currently Topix.com does not have the news search features from Topix.net, but it does have recent posts from Topix CEO Rich Skrenta's blog. It also has a link to this WSJ article, where Skrenta talks more about search engine influence.
Google Video has added a new feature on its homepage called Blog Buzz (hat tip Google Operating System) that lists the most discussed videos in the blogosphere. The list includes the top ten most discussed videos, links to the videos, and links to Google Blog Search results showing which blogs are discussing them.
It would be interesting if Google would tell us more about the popular items in Google Blog Search sort of like Technorati does in its Popular section. Technorati also has a list of the most popular videos.
Here are the top ten videos in the blogosphere according to Google Blog Search as of this writing.
A new search tool for Wikipedia called WikiSeek has launched. Michael Arrington at TechCrunch reports that WikiSeek also indexes websites that Wikipedia links to. If the new search engine becomes heavily used, it may give a traffic boost to websites linked from Wikipedia.
WikiSeek is a search engine that has indexed only Wikipedia sites, plus sites that are linked to from Wikipedia. It serves two purposes. First, it is a much better Wikipedia search engine than the one on Wikipedia (and has been built with Wikipedia’s assistance and permission). Second, the fact that it also indexes sites that are linked to from Wikipedia means that, presumably, it will return only very high quality results and very little spam. It won’t show every relevant result to a query, but it will certainly give a good overview of a subject without all the mess.
The search results also include a tag cloud which contains Wikipedia categories containing the search term. Results can be quickly filtered by clicking on one of those categories (see screenshot). The first three results of a query are always Wikipedia content (unless there are not three results) and are shaded blue. The remaining results are below the shaded area.
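The result layout described above can be sketched in a few lines of Python. It puts up to three Wikipedia pages in a shaded block, then lists pages that Wikipedia links to, and drops everything else. This is a hypothetical illustration, not WikiSeek's code. It assumes the results arrive already ranked by relevance and that a set of URLs cited by Wikipedia pages is available.

```python
from urllib.parse import urlparse


def is_wikipedia(url):
    """True if the URL's host is wikipedia.org or a subdomain such as en.wikipedia.org."""
    host = urlparse(url).hostname or ""
    return host == "wikipedia.org" or host.endswith(".wikipedia.org")


def wikiseek_style_layout(ranked_urls, cited_by_wikipedia, pinned=3):
    """Split relevance-ranked URLs into a shaded Wikipedia block and the rest.

    ranked_urls: candidate results, best first (hypothetical input).
    cited_by_wikipedia: set of external URLs that Wikipedia pages link to.
    """
    # The index only holds Wikipedia pages and pages Wikipedia links to.
    allowed = [u for u in ranked_urls if is_wikipedia(u) or u in cited_by_wikipedia]
    # Up to `pinned` Wikipedia results go first (fewer if fewer exist).
    shaded = [u for u in allowed if is_wikipedia(u)][:pinned]
    rest = [u for u in allowed if u not in shaded]
    return shaded, rest


ranked = [
    "https://example.com/tea",                  # cited by Wikipedia
    "https://en.wikipedia.org/wiki/Tea",
    "https://spam.example.net/cheap-tea",       # not cited, so never indexed
    "https://en.wikipedia.org/wiki/Green_tea",
    "https://tea.example.org/history",          # cited by Wikipedia
]
cited = {"https://example.com/tea", "https://tea.example.org/history"}
shaded, rest = wikiseek_style_layout(ranked, cited)
print("shaded:", shaded)
print("rest:", rest)
```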
Michael Arrington also says that WikiSeek is going to confuse some people expecting Wikiasari, a new search engine in the works from Wikipedia creator Jimmy Wales. More about Wikiasari here and here. The fact that Wikipedia has a second search engine offering a better way to search Wikipedia that is not on the Wikipedia website may also confuse people.
The Findory blog search engine and personalized news reader is no longer going to be supported. Greg Linden, the site's creator, says development of Findory will slow to a crawl but the site should keep running for a while longer during 2007.
Development on Findory now will slow to a crawl. There may be new features, but they will be rare. I no longer will spend time exploring funding, biz dev deals, or recruiting.
Findory appears to have sufficient resources to run on autopilot through most of 2007. Findory will eventually fade away, but I believe it has touched immortality through the impact it had.
It was exciting, challenging, and fun to try to build a startup. I consider myself very lucky to have had that opportunity.
It is a shame because Findory is a very useful tool for finding interesting discussions of the latest topics in some of the best written blogs. We wish we could say it isn't so but the sad truth is that Findory now appears to be:
Nielsen BuzzMetrics has released its list of the top blog posts in 2006. The top post was a petition against changes to the LiveJournal interface. The other top posts are primarily political posts and David Sifry's State of the Blogosphere posts. Here are the top ten most linked-to posts of 2006 according to BuzzMetrics.
The official press release from Nielsen BuzzMetrics can be found here. It would be interesting to compare yearly top posts lists from Google Blog Search and Technorati, but so far the two leading blog search engines have not released similar lists.
Earlier today Times Online reported that Wikipedia founder Jimmy Wales is planning a search engine called Wikiasari that will launch early next year to compete with today's search leaders such as Google, Live.com, Yahoo and Ask.com. TechCrunch followed up with a post that includes a screenshot of the search engine and writes that the first three results will be Wikipedia results.
A source tells us that the working name for the project was "WikiSearch" until recently. It's clear that Wikiasari will be focused on quality first, depth second. Search results will include tag based navigation, the top three results will be wikipedia content, and the remaining results are determined by sites wikipedia considers to be "reputable" because they are external reference links from wikipedia pages.
Since all search results will be tied to wikipedia, either directly by linking to wikipedia content or because the sites are linked to from Wikipedia, real people will eventually be determining all search results and rankings within Wikiasari. The search engine will be opensource, and the index will be available under a GFDL. Wikia will operate the master version of the index, but others are free to take it under the terms of the GFDL.
Wikiasari was originally going to be called Wikisearch. The screenshot TechCrunch posted looks a lot like Google, down to the placement of the text ads. The big question is whether people are unhappy enough with Google to leave. Google was able to grow quickly because people were unhappy with the quality of the other search engines. If people are finding what they need on Google, they may not see a reason to change, even if a rival search engine is slightly better.
Want to Share Your Life Online With a Blog? is the question Google is asking to promote its Blogger service in Google search results. Google Blogoscoped reports that Google recently started pimping its own blogging service in search results when a search for "blog" is conducted. Google's pimp for Blogger looks like this:
Even Google searches for "blogging," "blogger" and the nonsensical "bloggisaurus rex" will show you the Blogger "B" and the tempting Want to Share Your Life Online With a Blog? question. Googlified is also covering Google's Blogger pimpage.
The most popular search term that people typed into Google for 2006 was "Bebo" according to the 2006 Year-End Google Zeitgeist. Bebo? Yes, Bebo. Not eBay. Not Britney Spears.
Bebo is a popular social network for sure but it isn't even the most popular social network. Most people have probably never heard of it so how could it possibly be the most popular search term that people typed into Google in 2006? Metacafe is also on the list. Metacafe is a video sharing website that is less popular than YouTube and YouTube didn't even make the list. Here is the list of the top ten terms:
bebo
myspace
world cup
metacafe
radioblog
wikipedia
video
rebelde
mininova
wiki
Nicholas Carr has an interesting post that shows how Google's top 2006 terms differ drastically from the top search terms at Yahoo and AOL. Apparently, most people use Google to find social networks, videos and World Cup information while Yahoo users want celebrity gossip and the boring AOL users want weather information and dictionary links.
Lycos has announced that Perez Hilton's blog was the most-searched blog on Lycos in 2006. Perez Hilton is also being pursued by angry photo agencies who accuse him of constantly stealing their photos. Perez had 91% more searches than the second-most-searched blog, the Huffington Post. Other top searched blogs included TMZ, Pink is the New Blog and PostSecret.
And from the blogosphere, Perez Hilton is the most-searched blog site of 2006, generating 91 percent more search interest than the second most popular blog site with web searchers, Huffington Post. While Huffington Post provides news and opinions, three of the top five most-searched blog sites this year cater to celebrity gossip news, including Perez Hilton, a.k.a. Mario Armando Lavandeira Jr., TMZ, a.k.a. "Thirty Mile Zone" around Hollywood, and Pink is the New Blog. The fifth most popular blog site in 2006 is PostSecret, an ongoing community art project where people anonymously email their secrets on postcards.
You can see the entire Lycos 50 here. The Lycos 50 also has a blog but it could use some new posts -- the last post was in October.
Baidu, a Chinese web search company, has launched a blog search service. People's Daily Online says (via Techmeme) the search tool crunches through blogs written by 20 million Chinese bloggers.
Chinese Internet company Baidu launched its blog search service on Thursday to help Internet users navigate their way through the 20 million Chinese bloggers.
It is the first Chinese search service specifically for blogs.
Yu Jun, a senior executive with Baidu, said the service was based on a database of billions of websites, including all the blogs supported by Chinese blog service providers and individual blog websites.
The new service is expected to boost Baidu's users. Baidu started its space channel last July to provide blog services.
In case you were curious, the inspiration for the name Baidu comes from a poem written over 800 years ago during the Song Dynasty.
Many people have asked about the meaning of our name. "Baidu" was inspired by a poem written more than 800 years ago during the Song Dynasty. The poem compares the search for a retreating beauty amid chaotic glamour with the search for one's dream while confronted by life's many obstacles. ".hundreds and thousands of times, for her I searched in chaos, suddenly, I turned by chance, to where the lights were waning, and there she stood." Baidu, whose literal meaning is hundreds of times, represents persistent search for the ideal.
More about Baidu can be found on the company's About page.
TagBulb is a new search tool that lets you search for tags across multiple Web 2.0 websites. Simply type in a tag and TagBulb will return image results for that tag. You can also change the display to show videos, books, products, blogs, jobs, podcasts, bookmarks, questions, events and goals for the tag you typed into the search box. TagBulb also lets you view tags related to the tag you selected, as well as the most recent and most popular tags other people have searched for. (Via path -> Lifehacker -> Emily Chang)
Google Testing Including Blog Results in Google Web Search
Andy Boyd spotted a box listing "blog posts about tea" in Google's web search results when he was conducting a web search for tea. You can see the box in the image from his search below. The full screenshot of his Google results page can be found here.
The box does not appear if you search for "tea" on Google today, so it appears to be something Google is testing. If this feature goes live, it will definitely help boost traffic to blogs. (Via path -> Steve Rubel -> Google Operating System)
Danny Sullivan has announced that he will be launching a new search blog and website called Search Engine Land on December 11th. Sullivan founded the Search Engine Watch website and announced in August that he would be leaving it. Here are some of the features that will be available on Search Engine Land.
Original content covering developments in the search space.
Daily blog posts covering search news from across the web.
SearchCap: A daily email newsletter recapping search news from Search Engine Land and across the web. Also available by feed.
SearchCap Monthly: A monthly email newsletter recapping search news over the past month. Also available by feed.
The Search Engine Land blog will be a must-read for anyone who closely follows the search industry.
News.com reports that Topix.net has received an additional $15 million from the newspaper media companies that invested in it back in 2005. The ownership of Topix.net by news publishers is now Gannett 33.7%, Tribune 33.7% and McClatchy 11.9% for a total of 79.3%.
Topix, founded in 2002, aggregates news and categorizes it into topics. Earlier this year, the company added the ability for readers to comment on articles. Topix provides automated related links on some of the Web sites of newspapers owned by the investors and is adding reader comment capabilities to their sites as well.
Chris Tolles, vice president of sales and marketing at Topix, said the company would spend the funding on hiring and marketing. Company executives want to double the 25-person staff over the next year, he said. "For us to grow, we needed the money."
Topix.net also has a useful blog search. Blogs are automatically searched in the results but users can select the option to have only blogs searched. Topix.net also has a section showing the top stories in the blogosphere. The news of the $15 million investment was also blogged on the Topix.net weblog.
Google has launched the Google Custom Search Engine. The new service lets anyone build their own search engine from the websites they choose. You can also configure a Google custom search engine so that other people can help submit websites to it, and AdSense members can include their AdSense code. The new tool is definitely competition for Swicki and Rollyo. Search Engine Watch calls the new search tool custom search with a "social twist."
But what is perhaps most interesting about the new Custom Search is that publishers (large or small) can allow anyone or selected colleagues, friends or community members to contribute to that index. For example, if I own a site dedicated to stamp collecting and have a group of regular contributors or trusted readers I can allow those individuals to contribute their selections to this index. This gives the index the ability to evolve and grow over time -- and makes it "social."
Here are some search tests and comments about Google Custom Search from bloggers.
We quickly set up a celebrity gossip blog search as an example. Our HowToWeb.com site set up a gadget search engine. Specialized niches and networks of city blogs and newspapers would probably also work very well.
RealClimate has built a specialty search engine of top climate and global warming resources.
Google has added a blog search teaser to Google News results. At the bottom of each news search result page Google now gives you the option of trying the same keyword search with blogs. For example, if you search the keywords "Iraq Vietnam" you get the following result.
The Google News homepage also has a link to Google Blog Search near the top right of the page.
Hey, has anyone heard about this cool blog search called Technorati? The New York Times writes as if the blog search engine is new to them. Om Malik and Stowe Boyd note that the Times piece also calls Peter Hirshberg the chief executive of Technorati.
"A year ago, brands were saying, 'Oh no, not the blogosphere,'" said Peter Hirshberg, chief executive of Technorati, a blog-tracking service that last week, in partnership with Edelman, provided results of a global survey of blog use. "Now they're saying, 'Great, this is an opportunity.'"
Peter Hirshberg, who has a blog here, is Chairman of Technorati's Board of Directors, but David Sifry is still the CEO according to Technorati's management page. Technorati's Daily Vlog also calls Sifry the CEO in the latest vlog entry. The Times also puts Technorati's blog count at 55 million, 5 million more than the figure in the last State of the Blogosphere report.
Will Facebook, Technorati or YouTube be the Next Big Web 2.0 Sale?
Bloggers are discussing a possible Yahoo bid to buy Facebook. The New York Times (on News.com) reports that Yahoo's offer for Facebook was $900 million -- higher than Viacom's January offer but lower than Facebook's $2 billion goal.
When Viacom offered $750 million for Facebook in January, he asked for $2 billion and was rebuffed, according to a person involved in the negotiations. Now, he remains undecided about the latest offer, made in the last few weeks by Yahoo. That offer, first reported by The Wall Street Journal, was confirmed Thursday by two industry executives, one briefed on the deal by Facebook and the other by Yahoo. Both spoke on the condition of anonymity because the negotiations are continuing.
To woo Zuckerberg, Yahoo has offered about $900 million for Facebook and says it will keep the company somewhat independent, with Zuckerberg in charge. This has been its model with other acquisitions like Flickr, a photo-sharing site, and Del.icio.us, a social bookmarking service that lets members share lists of their favorite Web sites.
Paid Content has more about some of the rumors going on about Yahoo and Facebook. Meanwhile, a New York Post story, discussed on hundreds of tech blogs including Charlene Li, GigaOM, B2Day and Internet 2.0, puts YouTube's magic acquisition number at $1.5 billion. Less than that and they won't sell. Should Yahoo just add $600 million to their latest Facebook offer and buy YouTube instead?
Last year and earlier this year there were many sale rumors about Technorati. At one point a rumor suggested that Technorati had actually been sold and everyone was trying to find out who the buyer was. Lately there haven't been as many Technorati rumors. The rumors are primarily about YouTube, Facebook and other social networks and video sharing sites. It would be easy to speculate again that maybe Yahoo will buy Technorati since they recently mysteriously removed their blog search engine from Yahoo News. However, this would be pure speculation. It also seems unlikely they are planning on buying Technorati since they are looking to spend so much on a Facebook buy ... but if the Facebook deal doesn't pan out then maybe they will spend the money elsewhere. Many bloggers speculated in their 2006 predictions that Technorati would be sold this year -- see Blogspotting, rev2.org, Newsome.org, Blog Herald and Ruzee. We mentioned it in our predictions as well. There was also a post on TechCrunch in January called, When Will Yahoo Acquire Technorati?. So far this sale has not materialized and time is running out for an acquisition to happen in 2006.
Web search expert Danny Sullivan blogs that he is leaving Search Engine Watch. Sullivan was the founder of Search Engine Watch. He sold the site in 1997 to Mecklermedia, which later became Internet.com and then Jupitermedia. Jupitermedia has since sold the site to Incisive Media.
Back in 1997, I sold Search Engine Watch to what became Jupitermedia. That company later started the Search Engine Strategies conference series. I had a long and prosperous association with both of those properties (USA Today recently recounted the tale here). I renegotiated my contract to provide services for both of them to Jupitermedia several times without any major issues.
Last year, Jupitermedia sold the site and the series to Incisive Media. I wasn't unhappy with the sale and chose to let my contract be extended through the end of 2006 as part of it.
I was concerned about moving forward with Incisive, however. I'm far from the only reason behind the success of SEW and SES, but I've played a major role. I helped build both of those assets. Then I watched one company sell them to another without me having any formal capital stake in the sale. That left me wary of history repeating itself. I wasn't going to help this new company grow the business out of the sheer kindness of my heart.
I explained these reservations at the very beginning of my relationship with Incisive, that I needed some long-term incentive for helping them continue to grow and strengthen the site and conferences. After over a year of talks, that's failed to materialize. As a result, I'm departing.
You can keep up with Danny Sullivan on his personal blog called Daggle, which he has been running since early 2005. Sullivan also has a great post here about his decade of writing about search engines.
Yahoo has pulled the blog search from the Yahoo News homepage (thx Digital Inspiration). Yahoo merged blogs in with its regular news search last October.
Now the blogs part has been removed from the Yahoo News search page. So, what is happening with Yahoo's blog search? Some bloggers, including Digital Inspiration and Micropersuasion, are speculating that Yahoo may be preparing to launch a stand alone blog search tool. Kevin Burton finds a few blog results from a regular Yahoo News Search but most blogs are gone. Hopefully, we will get an explanation from Yahoo News soon. Yahoo did just add Flickr photos to Yahoo Search (thx AMCP Tech Blog) so they have been making changes recently.
AOL's accidental release of hundreds of thousands of AOL customers' private searches has already led to the identification of at least one specific person. The New York Times explains how 62-year-old Thelma Arnold's search keywords and phrases were revealed to all.
No. 4417749 conducted hundreds of searches over a three-month period on topics ranging from "numb fingers" to "60 single men" to "dog that urinates on everything."
And search by search, click by click, the identity of AOL user No. 4417749 became easier to discern. There are queries for "landscapers in Lilburn, Ga," several people with the last name Arnold and "homes sold in shadow lake subdivision gwinnett county georgia."
It did not take much investigating to follow that data trail to Thelma Arnold, a 62-year-old widow who lives in Lilburn, Ga., frequently researches her friends’ medical ailments and loves her three dogs. "Those are my searches," she said, after a reporter read part of the list to her.
AOL removed the search data from its site over the weekend and apologized for its release, saying it was an unauthorized move by a team that had hoped it would benefit academic researchers.
But the detailed records of searches conducted by Ms. Arnold and 657,000 other Americans, copies of which continue to circulate online, underscore how much people unintentionally reveal about themselves when they use search engines — and how risky it can be for companies like AOL, Google and Yahoo to compile such data.
Mrs. Arnold plans to dump her AOL subscription and told the New York Times, "We all have a right to privacy. Nobody should have found this all out."
Mrs. Arnold is right. The general public should never know what keywords she typed into a search engine. Internet search providers have a responsibility to keep this information private. People who use search engines should be able to trust that the list of keywords and phrases they search for is not going to be made public months or years later. Search engines that promise not to keep search data, or that vow to destroy search histories and records after a short period of time, may find themselves with some new friends as a result of the AOL search data disaster.
Update 8-9-06: Ixquick Metasearch (thx blog.v7n.com) has already jumped on the opportunity to attract more searchers by promising to delete people's IP addresses and unique user IDs.
Technorati has posted a new State of the Blogosphere report. Technorati is now tracking 50 million blogs, 100 times as many as it was tracking just 3 years ago. In July, 175,000 new weblogs were created each day, or more than 2 blogs every second. Technorati CEO David Sifry notes that this torrid growth cannot continue forever.
Technorati has been tracking the blogosphere, or world of weblogs, since November 2002, and I'm constantly amazed at the growth over the years. The blogosphere has been doubling in size every 6 months or so. It is over 100 times bigger than it was just 3 years ago.
Whenever I write about these statistics, I'm always asked by people, "Can it continue to grow this quickly?" Frankly, I can't possibly imagine it continuing to grow at this pace - after all, there are only so many human beings in the world! It has to slow down.
There are even fewer people capable of blogging, and not everyone who can blog is going to.
Things have also gotten spammier: 70% of the pings Technorati receives are now spam. Blog spam appears to be heading in the same direction email spam went, only faster. A recent study found that 95% of email is spam.
This graph provided by Technorati is always one of the most interesting in the State of the Blogosphere reports. It shows events that led to big spikes in the number of blog posts. The latest spike occurred when the Israel-Lebanon war began. There was another spike in May for the National Spelling Bee.
Here are the summary highlights of the report provided by Technorati's CEO David Sifry.
Technorati is now tracking over 50 Million Blogs.
The Blogosphere is over 100 times bigger than it was just 3 years ago.
Today, the blogosphere is doubling in size every 200 days, or about once every 6 and a half months.
From January 2004 until July 2006, the number of blogs that Technorati tracks has continued to double every 5-7 months.
About 175,000 new weblogs were created each day, which means that on average, there are more than 2 blogs created each second of each day.
About 8% of new blogs get past Technorati's filters, even if it is only for a few hours or days.
About 70% of the pings Technorati receives are from known spam sources, but we drop them before we have to send out a spider to go and index the splog.
Total posting volume of the blogosphere continues to rise, showing about 1.6 Million postings per day, or about 18.6 posts per second.
This is about double the volume of about a year ago.
The most prevalent times for English-language posting is between the hours of 10AM and 2PM Pacific time, with an additional spike at around 5PM Pacific time.
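The per-second figures and growth claims above can be sanity-checked with simple arithmetic. The short Python sketch below assumes an 86,400-second day, a 365-day year and a constant growth rate; these are simplifications for illustration, not Technorati's methodology, and the function names are hypothetical.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_second(per_day: float) -> float:
    """Convert a per-day count into an average per-second rate."""
    return per_day / SECONDS_PER_DAY


def growth_factor(days: float, doubling_days: float) -> float:
    """Growth multiple after `days` if size doubles every `doubling_days`."""
    return 2 ** (days / doubling_days)


def implied_doubling_days(factor: float, days: float) -> float:
    """Constant doubling time needed to grow by `factor` over `days`."""
    return days / math.log2(factor)


THREE_YEARS = 3 * 365  # simplification: ignores leap days

if __name__ == "__main__":
    print(f"new blogs per second: {per_second(175_000):.2f}")
    print(f"posts per second:     {per_second(1_600_000):.2f}")
    print(f"3-year growth at a 200-day doubling: {growth_factor(THREE_YEARS, 200):.1f}x")
    print(f"doubling time for 100x in 3 years:   {implied_doubling_days(100, THREE_YEARS):.0f} days")
```

The numbers work out to about 2.03 new blogs and about 18.5 posts per second (the report's 18.6 suggests a daily total slightly above 1.6 million). A steady 200-day doubling would give only about 44x growth over three years, so reaching 100x implies a doubling time of roughly 165 days, about 5.4 months, which fits the 5-7 month range reported for 2004 to 2006.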
What are they thinking at AOL corporate headquarters? Over the weekend, AOL placed an enormous amount of private customer search history onto the Internet: search records from the last three months for 650,000 customers, a total of 20 million search queries. This was a huge free gift for marketers and spammers but a big slap in the face to AOL customers. AOL usernames were replaced with numbers, but some of this information could still be traced back to the real person who made the searches. For example, people often search for their own names in search engines. Elliot Black shows that a large number of Social Security numbers were included in the AOL data. More examples of search keywords and phrases that could cause privacy problems can be found here. More bloggers covering the topic can be found here and here.
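Swapping usernames for numbers is pseudonymization rather than anonymization: every query from one account still carries the same ID, so a person's whole search history stays linked together. The minimal Python sketch below illustrates the point; the log rows and the helper name `group_by_id` are invented for illustration and are not taken from the AOL file.

```python
from collections import defaultdict

# Hypothetical, invented log rows: (numeric ID that replaced the username, query).
# They only mimic the shape of a pseudonymized search log.
LOG = [
    (1001, "landscapers in springfield"),
    (2002, "cheap flights to denver"),
    (1001, "homes sold in maple grove subdivision"),
    (2002, "weather denver"),
    (1001, "jane q. doe"),
]


def group_by_id(rows):
    """Collect every query under its pseudonymous ID.

    Replacing a username with a number does not break the link between
    queries: all of one person's searches still share the same key.
    """
    histories = defaultdict(list)
    for user_id, query in rows:
        histories[user_id].append(query)
    return dict(histories)


if __name__ == "__main__":
    for user_id, queries in group_by_id(LOG).items():
        print(user_id, queries)
```

Once the queries are grouped, a town, a neighborhood and a name search together can narrow a single ID down to one person, which is essentially the data trail the New York Times followed to Thelma Arnold.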
People should not enter their Social Security numbers into search engines, but AOL also should not be releasing information to the public that contains them. People also search online for career, financial, health and relationship information that they want kept private. This is a great way to make people afraid of using the Internet and search engines. AOL's poorly conceived data release also comes at a time when many services where privacy is a huge concern are launching: online word processors and spreadsheets, desktop search engines, instant messenger software, web-based email, etc. AOL's reckless behavior could make people less likely to use these kinds of services.
Update: Reuters reports that AOL has admitted the enormous data release was a screw-up.
"This was a screw up, and we're angry and upset about it," Andrew Weinstein, an AOL spokesman said. "It was an innocent-enough attempt to reach out to the academic community with new research tools, but it was obviously not appropriately vetted, and if it had been, it would have been stopped in an instant."
Unfortunately, since the data was released, mirror sites have popped up and the file has been downloaded countless times. It is now impossible to make this customer search data private again.
Update: 8-9-06
A CNET article provides a look at some of the more disturbing searches made by users caught in AOL's data dump. (via Search Engine Watch)
Many people publish feeds that they do not want to be public. Bloglines announced a new feed access control standard that could help solve the problem. The XML for the proposed standard can be found here. The idea could help people hide their feeds from the unwanted eyes of employers and strangers while still sharing them with friends and family members.
As we've seen more types of information get syndicated, and as feeds are becoming used for multiple purposes, we've been growing concerned about the lack of controls on the distribution of personal data, especially through RSS. For example, you may want to allow your friends and family to subscribe to your blog but you'd prefer your posts not show up in search results.
Along these lines, we recently offered a new way to claim your own feeds and indicate whether you want your feed included or excluded from Blog & Feed search on Ask.com and Bloglines (for more information, read the blog post announcing our Publisher Tools). But this method only solves the issue at Bloglines and Ask.com, and it doesn't address user-created (as opposed to publisher-created) feeds, like flickr feeds, which can't be claimed. Clearly, there is a need for an industry-wide solution.
As a result, we are proposing (and have implemented) an RSS and ATOM extension that allows publishers to indicate the distribution restrictions of a feed. Setting the access restriction to 'deny' will indicate the feed should not be re-distributed. In Bloglines, we'll use this to prevent the display of the feed information or posts in search results or any other public venue. If other readers and aggregators use the information in the same way, and publishers of feeds, including services that let users create feeds, implement this standard, we could make significant progress toward making feeds truly safe for non-public information. We think that's a pretty cool idea.
The downside is that unless other RSS aggregators adopt the standard, it will only work on Ask.com and Bloglines.com. Marshall Kirkpatrick at TechCrunch said that no "formal agreements have been made yet with any other company, but it's hard to know why they wouldn't accept the idea with enthusiasm." Unfortunately, 100% acceptance by all search engines and websites sounds a little too optimistic. Some privacy is better than none, but if you are publishing a public blog, with or without a feed, you should always expect that your content can be discovered. More thoughts on Bloglines' idea can be found at A Feed is Born, Majordojo, FuzzyBlog and Alex Barnett.
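To make the mechanism concrete, here is a minimal Python sketch of how an aggregator might honor such a flag before showing a feed in public search results. The namespace URI, the `access:restriction` element and its `relationship` attribute are assumptions modeled on the Bloglines proposal, so check the published XML before relying on them; the function name `is_redistribution_denied` is hypothetical.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: namespace and element/attribute names modeled on the Bloglines
# proposal; verify them against the published spec.
ACCESS_NS = "http://www.bloglines.com/about/specs/fac-1.0"
RESTRICTION_TAG = f"{{{ACCESS_NS}}}restriction"


def is_redistribution_denied(feed_xml: str) -> bool:
    """Return True if an RSS or Atom feed carries an access restriction of 'deny'."""
    root = ET.fromstring(feed_xml)
    # iter() walks the whole tree, so this finds the element whether it sits
    # inside an RSS <channel> or directly under an Atom <feed>.
    for restriction in root.iter(RESTRICTION_TAG):
        if restriction.get("relationship", "").strip().lower() == "deny":
            return True
    return False


PRIVATE_FEED = """<rss version="2.0" xmlns:access="http://www.bloglines.com/about/specs/fac-1.0">
  <channel>
    <title>Family photos</title>
    <access:restriction relationship="deny"/>
    <item><title>Beach trip</title></item>
  </channel>
</rss>"""

PUBLIC_FEED = """<rss version="2.0">
  <channel><title>Search news</title></channel>
</rss>"""

if __name__ == "__main__":
    for name, feed in [("private", PRIVATE_FEED), ("public", PUBLIC_FEED)]:
        status = "keep out of search" if is_redistribution_denied(feed) else "ok to index"
        print(name, status)
```

As the post points out, nothing in the feed itself enforces the restriction; it only works if each aggregator chooses to check the flag, as this function does.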
Dabble is a new video search tool that searches video data from over 300 video hosting websites including Blip.tv, Clipshack, YouTube and FrozenHippo. The launch post announcing the new service can be found here. Currently, Dabble can search over 321,000 videos.
Dabble collects video data from 240 + hosting sites that accept video uploads from people, plus tens of thousands of independent sites. Dabble also collects other sorts of media like audio for searching and organizing. And Dabblers bookmark media they find around the web. Dabble does not host media, but instead, makes media from anywhere on the web searchable, collectable through our bookmarking tool, part of a community, able to be tagged and commented upon, made into playlists and played. Dabble is an organizing tool to help people discover the value of media.
Users can also add videos to Dabble and tag and provide other information about videos already in the database. Dabble was founded by Mary Hodder. Dabble's blog can be found here. (via Laughing Squid)
Technorati has upgraded its website and a post (also here) from David Sifry explains the changes to the blog search engine. You can also view a screencast of the changes here. Here are some of the highlights:
Search: Technorati's search engine allows you to choose from posts, tags or blogs. The results page has been cleaned up and looks much better than before. Technorati also shows the fifteen most popular tags and searches instead of ten.
David Sifry says the blog inbound link counts have been updated. "In addition, our link-counting mechanisms have also been dramatically improved. If you're a blogger, you should notice that your blog is being counted much more regularly, and that your rankings and authority information is much more accurate and up-to-date."
The individual blog pages have been updated. These pages show recent posts, recent inbound links, recent outbound links, top tags, traffic history from Alexa and other information. A search box for searching the blog is also provided.
You can sort a search for who is linking to a particular post by authority or freshness. For example, here are the results for blogs linking to David Sifry's post on the Technorati blog sorted by authority. However, it doesn't look like it is working perfectly at the moment.
The Popular section was updated to show all the popularity rankings on a single page.
Technorati is also focusing more on Technorati members and including more member photos on the website.
David Sifry also said The Wall Street Journal has now integrated Technorati inbound link features onto its website just like The Washington Post, Newsweek and the Associated Press have done.
Feedster has announced a new president, Tyler Goldman, and a new round of funding.
Feedster, Inc., the leader in search and syndication of dynamic content, has announced today that Tyler Goldman will join its Board of Directors and become acting president. Mr. Goldman was previously Senior VP of Corporate & Business Development at Movielink, and founder and CEO of Broadband Sports, Inc. Former president, Chris Redlitz, left to pursue interests in earlier stage companies.
"Feedster is well positioned to expand its leadership in search and syndication." said Mr. Goldman. "With 41 million blogs and feeds being searched dynamically, Feedster provides consumers with the best way to leverage the constant flow of information that is being published on the web. Feedster is well recognized by the technology community as the leading search and syndication engine for dynamic web-published content, including blogs, news and podcasts, and is in the process of expanding this leadership position in newly emerging areas like images and video. As millions of users, publishers and other web-based entities continually generate a vast amount of dynamic content Feedster provides a comprehensive platform to search and syndicate content that is most relevant and timely."
Om Malik says sources put the funding at $1 to $5 million. Hopefully, it will be enough money to make the old Feedster subscribe links work again. Maybe someday they will even update the Feedster 500 again.