A Note on Data Quality

Ahoy,

In our recent post about our new co-branded search tool for webmasters, there was an interesting discussion afterwards about the quality of some of our data sources.

First of all, it’s great to see everyone getting so excited about this topic; it’s one we feel passionate about ourselves. As loyal readers of this blog will know, data quality is a major issue here at Nestoria. Here’s a post from just last week about how we’re extracting more information for property searchers from listings as a way to improve data quality.

We take it so seriously that we currently exclude 15-20% of the listings we have because they are of low quality, and we’re constantly fine-tuning our algorithms to catch more dodgy listings. Unfortunately, spam is a very real problem in the classifieds industry.

As anyone who’s ever worked with software (or had an email inbox) can imagine, catching this isn’t simple. It’s a very iterative process.

We’ve recently integrated a few new data sources, and it looks like a few bad listings slipped through.

As an example, this morning we spotted this gem:

[Image: bad listing]

(no offense Michelle, it’s nothing personal, it’s just that people come to Nestoria to look at houses, not blondes).

So, loyal readers, we can say we’re hard at work on it (you should see some of the stuff we throw away; it makes Viagra ads seem tame). Keep the passion for quality! We welcome all ideas you have on the topic. Please get in touch via our feedback form.

2 Responses to “A Note on Data Quality”


  1. Duncan@oodle

    Ed - perhaps we should work together (with Zoomf and Extate) and define some industry standards for web 2.0 property listings - it’s bound to get some press and might move the discussion forward?

  2. Nestoria Rank update - February 2007 » Nestoria Blog
