Today we’re lucky enough to speak with Flemming Madsen, founder and managing director of Onalytica, a firm that attempts to mine the content on the internet to get a feel for collective sentiment around topics or brands.
Flemming, thanks for talking with us.
1. Explain a bit what Onalytica does.
Thanks for having me. We provide tools and services that extract insights and trends from the millions of pages and documents on the Internet.
By statistically analysing the online content, our clients do not have to run the risk of relying on a few anecdotes from a random search.
Using some fairly advanced mathematics, we machine-analyse the content of millions of web pages and documents and extract valuable knowledge from it. Think of traditional data mining, as many corporations do on their in-house data - except that the Internet is our database.
While a search engine may find you thousands of documents about a topic, say “mobile phones”, our tools look at all those pages and try to answer a number of key business questions, for example:
- What’s the relative influence of each of the voices that talk about this issue?
- When you add it all up, and take the impact of each voice into account, what is actually being said? By consumers, media, influencers, regulators - what are they focused on? Happy/unhappy about? What gets recommended?
- What is the sentiment towards the brands and issues mentioned in the context?
- How has the focus on issues, individual brands and the sentiment towards them developed over time? Why? Who’s driving it and where is it going?
Our clients also use our service to monitor what is being said about them (and their competitors and key issues) online. Because we measure the topical influence of each voice, they get a much more accurate picture than they are used to from other services.
Our clients are typically big brand owners, PR/Advertising/Media companies and government departments; in fact, anyone with a sustained interest in an issue, market or brand. Lately we have also seen good interest from financial players such as banks and hedge funds looking for better tools.
2. As we’ve learned here at Nestoria, extracting meaning from human-entered text is exceptionally difficult given problems like typos, slang, and abbreviations (not to mention filtering out spam). What are some of the techniques Onalytica uses?
There are many challenges and there is no one way to overcome them. Our challenges are not too different from those of other companies that manipulate large datasets. They just tend to be bigger. However, now that we have overcome the very hard challenges, they actually work in our competitive favour.
Spam is not a big problem for us, mainly because spam sites always have low topical influence. It’s essentially a result of the fact that they can’t get bona fide endorsements outside their own cluster.
Spam sites have some success with search engines that use popularity as a measure of relevance. But of course popularity is not relevance (or influence) - it’s only easy for the search engine to calculate.
When considering the challenges we face, you should also take into account that our clients are mostly focused on trends, so they can understand the past and be better at predicting the future. So when we analyse, say, 100k documents on a particular topic and present the trends, the conclusions will normally not change if we include a few hundred pages or documents that shouldn’t have been included. Because we aggregate such large sets of data, our results are less sensitive to a few random errors.
Slang is positive for us and not really a challenge. We sometimes use it to further slice the data we analyse. There are certain elements of a person’s writing style that can give an indication of gender, age and educational background. Slang is one of them. Another is your appetite for hyperlinking.
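The point about aggregation being insensitive to a few stray documents can be checked with simple arithmetic. The sketch below is only an illustration and not Onalytica's method: the function name `max_share_shift` and the figure of 62,000 positive documents are assumptions made up for the example, while the 100k corpus and "a few hundred" stray pages come from the answer above.

```python
def max_share_shift(positive: int, total: int, stray: int) -> float:
    """Largest possible change in the positive share of a corpus when
    `stray` wrongly included documents are added to it.

    Two extremes are considered: every stray document is counted as
    positive, or none of them is.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    clean_share = positive / total
    all_stray_positive = (positive + stray) / (total + stray)
    no_stray_positive = positive / (total + stray)
    return max(
        abs(all_stray_positive - clean_share),
        abs(no_stray_positive - clean_share),
    )


# Hypothetical corpus: 100k documents, 62,000 of them positive,
# plus 300 documents that should not have been included.
shift = max_share_shift(positive=62_000, total=100_000, stray=300)
print(f"worst-case shift in positive share: {shift:.4f}")
```

With these made-up numbers the share moves by less than a third of a percentage point even in the worst case, which is why a trend drawn from the aggregate is unlikely to flip.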
3. What are your thoughts on the developments in the vertical search sector in the last year?
From a professional perspective, I find it interesting how many vertical search engines have appeared.
I am no expert on the business of search, but I imagine that vertical search will always provide better results than horizontal engines, simply because the engine already knows the focus of the searcher. This in turn allows the programmers to make a number of justified assumptions, all leading to better results and a better user experience.
4. How might a better ability to convert data into information play a role in property search?
Automatically analysing the online debate about areas can provide valuable information for property seekers.
It could give a fairly precise picture of what is being said about an area. Imagine you are considering moving to an area and you then find out that when people talk about this area they are, say, 200% more likely to say “traffic jam” and “noisy neighbours” than when they talk about the neighbouring area.
Similarly, it would be interesting if the debate about the neighbouring borough was consistently more positive than the debate about the borough I was considering. Taking price into account, you might be able to indicate where the cost of “quality of life” is more favourable.
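As a rough illustration of the “200% more likely” comparison, the sketch below measures how often a phrase turns up in documents about two areas. It is a minimal example under stated assumptions, not a description of how Onalytica or Nestoria do this: the helper names and the sample sentences are invented, and plain substring matching stands in for real text analysis.

```python
def mention_rate(docs: list[str], phrase: str) -> float:
    """Fraction of documents that contain the phrase (case-insensitive)."""
    if not docs:
        raise ValueError("need at least one document")
    needle = phrase.lower()
    hits = sum(1 for doc in docs if needle in doc.lower())
    return hits / len(docs)


def percent_more_likely(docs_a: list[str], docs_b: list[str], phrase: str) -> float:
    """How much more likely (in percent) the phrase is in area A than in area B."""
    rate_b = mention_rate(docs_b, phrase)
    if rate_b == 0:
        raise ValueError("phrase never appears in the comparison area")
    return (mention_rate(docs_a, phrase) / rate_b - 1) * 100


# Invented sample documents for two hypothetical neighbouring areas.
area_a = [
    "Stuck in a traffic jam on the high street again",
    "Another traffic jam by the station this morning",
    "Lovely new cafe opened near the park",
    "Traffic jam all the way to the bridge",
]
area_b = [
    "Quiet weekend at the market",
    "Traffic jam after the football match",
    "The library has reopened",
    "Great schools around here",
]

result = percent_more_likely(area_a, area_b, "traffic jam")
print(f"'traffic jam' is {result:.0f}% more likely in area A")
```

Note that “200% more likely” means the phrase appears three times as often, not twice. A real analysis would also need far more documents per area than this toy sample before the ratio meant anything.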
Thanks very much Flemming. Many interesting ideas around using technology to better extract meaning and thus be able to turn the masses of data on the internet into information. We’ll be updating our readers on some of the similar work we’ve done recently in our next post about Nestoria Rank tomorrow.
For those interested in learning more about Onalytica we recommend subscribing to their blog.
Past Nestoria interviews: Mike Price, Prashant Agarwal, Paul Carr.
Who would you like to see interviewed next? Let us know.