Congratulations! You have developed a classification model that correctly identifies promising pharmaceutical compounds 10% of the time! But isn’t that just a 10% success rate? Why spend the time and money developing that model when you could get a 50% success rate simply by flipping a coin? The usefulness of a classification model is sometimes underestimated because of the false assumption that if there are two possible outcomes, each has a 50% chance of occurring. In most environments in which two possible outcomes exist, the outcomes are NOT equally likely.

Consider the pharmaceutical example recently presented by Graettinger. A company has determined that 99.9% of potential compounds do not yield promising drugs. It would like to find a statistical rule that increases its chances of identifying promising compounds. In his article, Graettinger presents two models:

· Model 1 is a classification model (resulting from Graettinger’s data mining techniques) that correctly identifies promising compounds 10% of the time; that is, 10% of the compounds it labels “promising” actually are promising.

· Model 2 is a simple rule that labels ALL potential compounds as “unpromising.”

It would appear that Model 1 has a 10% success rate, while Model 2 has a 99.9% success rate. But what is the definition of success? It is NOT simply the correct identification of a compound as “promising” or “unpromising.” Rather, it is the correct identification of “promising” compounds alone. Thus Model 1 boasts a 10% success rate compared to 0% for Model 2.
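To make the difference between overall accuracy and finding promising compounds concrete, here is a minimal Python sketch on a hypothetical population. The population size (100,000 candidates) and the number of compounds Model 1 flags (500) are illustrative assumptions, not figures from Graettinger’s article; only the 0.1% base rate and the 10% hit rate come from the example above.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from applying a labeling rule to a set of compounds."""
    tp: int  # promising compounds labeled "promising"
    fp: int  # unpromising compounds labeled "promising"
    fn: int  # promising compounds labeled "unpromising"
    tn: int  # unpromising compounds labeled "unpromising"

    def accuracy(self) -> float:
        """Share of all labels that are correct, for either class."""
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total

    def hit_rate(self) -> float:
        """Share of compounds labeled "promising" that really are (precision).
        Defined as 0.0 when nothing is labeled promising."""
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0


# Hypothetical population: 100,000 candidates, 0.1% (100) of them promising.
N, PROMISING = 100_000, 100

# Model 2: label every compound "unpromising".
model2 = Confusion(tp=0, fp=0, fn=PROMISING, tn=N - PROMISING)

# Model 1 (hypothetical counts): flags 500 compounds, 10% of which are promising.
flagged = 500
tp1 = flagged // 10
fp1 = flagged - tp1
model1 = Confusion(tp=tp1, fp=fp1, fn=PROMISING - tp1, tn=N - PROMISING - fp1)

for name, m in [("Model 1", model1), ("Model 2", model2)]:
    print(f"{name}: accuracy={m.accuracy():.3f}, "
          f"hit rate={m.hit_rate():.0%}, promising found={m.tp}")
```

With these assumed counts, Model 2 scores higher on accuracy (0.999 versus 0.995) yet finds no promising compounds, while Model 1 finds 50.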

We can all agree that 10% success is better than 0% success, but is it really something to be excited about? Consider the business application for the pharmaceutical company. Without the classification model, the company would have to test every potential compound. Because 99.9% of potential compounds do not yield promising drugs, only 1 of every 1,000 tests will identify a promising compound. If the classification model is applied and only the compounds labeled “promising” are tested, the model’s 10% success rate means that 1 of every 10 tests will yield a promising compound. The model therefore allows the company to find each promising compound with only 1% of the testing resources that would be required without it (10 tests instead of 1,000).

Cortney Lantry

Director, Marketing Sciences

Resource: "Data Mining Misconceptions", Graettinger, 2008-2009, www.discoverycorpsinc.com

Name: linda
Time: Saturday, January 16, 2010

Cortney, do you know of any statistics showing that companies are X times more likely to improve profitability if they employ segmentation?

Name: Cortney
Time: Friday, January 22, 2010

Linda, it would not be feasible to calculate a generalized statistic for how much more profitable a company would be as a result of employing segmentation or classification analyses. A variety of factors play a role in how helpful such analyses can be. Here are just a few:

• Current profitability
• Success rate of the classification model
• Value/resources associated with current missed opportunities

Generally speaking, a first step might be to evaluate the current procedure and its success rate, along with estimating the potential value that is not yet being realized. That would offer an idea of how much a company stands to gain and whether that potential gain warrants investment in these types of analyses.

Thanks,
Cortney