Congratulations! You have developed a classification model that correctly identifies promising pharmaceutical compounds 10% of the time! But isn’t that just a 10% success rate? Why spend the time and money developing that model when you could get a 50% success rate simply by flipping a coin? The usefulness of a classification model is sometimes underestimated based on a false assumption that if there are two possible outcomes, then there is a 50% chance that either will occur. In most environments in which two possible outcomes exist, there is NOT an equal probability that each will occur.
Consider the pharmaceutical example presented recently by Graettinger. A company has determined that 99.9% of potential compounds do not yield promising drugs. It would like to develop a statistical rule that will increase its chances of identifying potentially promising compounds. In his article, Graettinger presents two models:
· Model 1 is a classification model (resulting from Graettinger’s data mining techniques) that correctly identifies promising compounds 10% of the time.
· Model 2 is a simple rule that labels ALL potential compounds as “unpromising.”
At first glance, Model 1 has a 10% success rate, while Model 2 has a 99.9% success rate, since labeling every compound “unpromising” is correct for 99.9% of compounds. But what is the definition of success? It is NOT simply the correct identification of a compound as “promising” or “unpromising.” Rather, it is the correct identification of “promising” compounds alone. By that definition Model 2 never succeeds, because it never labels a compound “promising.” Thus Model 1 boasts a 10% success rate compared to 0% for Model 2.
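The gap between the two definitions of success can be made concrete with a little arithmetic. The Python sketch below is only an illustration: the population of 1,000,000 compounds is a hypothetical figure, and only the 99.9% unpromising rate comes from the example.

```python
# Hypothetical population size, chosen only for illustration.
total_compounds = 1_000_000

# From the example: 99.9% of compounds are unpromising, so 1 in 1,000 is promising.
promising = total_compounds // 1000
unpromising = total_compounds - promising

# Model 2 labels every compound "unpromising".
# It is right about every unpromising compound and wrong about every promising one.
correct_labels = unpromising
promising_found = 0

accuracy = correct_labels / total_compounds

print(f"Model 2 overall accuracy: {accuracy:.1%}")
print(f"Model 2 promising compounds identified: {promising_found} of {promising}")
```

Under these assumptions Model 2 labels almost every compound correctly, yet it never surfaces a single compound worth testing.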
We can all agree that 10% success is better than 0% success, but is it really something to be excited about? Consider the business application for the pharmaceutical company. Without the classification model, the company would have to test each potential compound. Because 99.9% of the potential compounds do not yield promising drugs, only 1 of every 1,000 tests will identify a promising compound. If the classification model is applied and only those compounds labeled “promising” are tested, the 10% model success rate means that 1 of every 10 tests will yield a promising compound, a 100-fold improvement in the hit rate. For each promising compound it finds, the company needs only 1% of the testing resources that would be required without the model.
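The comparison can be checked with a short calculation. The Python sketch below assumes, as the paragraph above does, that the 10% figure means 1 in 10 compounds labeled “promising” really is promising. The variable names are illustrative, and the sketch does not account for promising compounds the model might fail to flag.

```python
# Average number of tests needed to find one promising compound.
tests_per_hit_without_model = 1000  # base rate: 1 promising compound per 1,000 tested
tests_per_hit_with_model = 10       # model: 1 promising compound per 10 flagged and tested

# How many times more productive each test becomes when the model is used.
improvement = tests_per_hit_without_model / tests_per_hit_with_model

# Share of the original testing effort needed per promising compound found.
resource_fraction = tests_per_hit_with_model / tests_per_hit_without_model

print(f"Improvement in hit rate: {improvement:.0f}x")
print(f"Testing resources needed per promising compound: {resource_fraction:.0%}")
```

Seen this way, a model that is “right only 10% of the time” still makes each test 100 times more likely to pay off than testing compounds at random.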
Director, Marketing Sciences