Getting More From Your Data
The growth of the Internet has led to a huge expansion of data - data on virtually everything - such that it has become nearly impossible to find a particular document without using search engines such as Google. Yu and colleagues pointed out in a recent presentation to the Mellon Foundation on Patterns in Unstructured Data that…
…for all their problems, online search engines have come a long way. Sites like Google are pioneering the use of sophisticated techniques to help distinguish content from drivel, and the arms race between search engines and the marketers who want to manipulate them has spurred innovation. But the challenge of finding relevant content online remains. Because of the sheer number of documents available, we can find interesting and relevant results for any search query at all. The problem is that those results are likely to be hidden in a mass of semi-relevant and irrelevant information, with no easy way to distinguish the good from the bad.
While Google’s algorithms are good at unearthing documents based on keywords, it will become paramount to extract more than just the documents themselves. Analysis of so-called unstructured data - information contained in emails, reports, PowerPoint presentations, voice mail, phone notes, agendas and photographs (in fact, anything less structured than database entries) - will generate true and measurable value by providing information along with its context, something that is missing from today’s search queries.
Along the same lines, AP technology writer Brian Bergstein recently discussed how Companies Are Using Tech Analysis on Themselves:
Eastman Kodak Co. uses unstructured-data analysis to spot connections in its own and its competitors’ patent filings. Government agents use it to hunt for insider trading or linkages between terrorist groups. Mayo Clinic researchers use it to scan physicians’ notes for evidence about the efficacy of treatments. The breakthrough has been in getting computers to understand the content of the documents they scan. The automated analysis of “unstructured” data is becoming remarkably agile at giving companies detailed answers to the age-old business question of “How are we doing?” For example, Intelliseek Inc. recently partnered with the Factiva information service to offer “reputation insight.”
Intelliseek scans 4 million Web logs and e-mail list servers, and Factiva — a joint venture between Dow Jones & Co. and Reuters Group PLC — combs news stories, radio transcripts and other media. Together they produce for companies a detailed analysis of how the public thinks about them at any given point.
(…) The most popular phrases relating to a company can be determined, and whether those terms are waxing or waning in significance. Comparisons with competitors can be generated — as well as to a company’s own business results. Who knows? Perhaps a seemingly unrelated bit of geopolitical news tends to boost sales. Or maybe early word can be gleaned about problems with a product that might lead to an expensive recall.
It is the extraction of this additional value from ordinary information that can provide a competitive advantage, allowing companies to discover profitable niche markets and to lower the cost of doing business by “getting more bang for your buck”. Without sophisticated software tools to cut through the chatter, we’ll likely drown in our self-created stream of information…