Insolubility classification with accurate prediction probabilities using a metaClassifier

C. Kramer, B. Beck, Tim Clark

Research output: Contribution to journal › Article › peer-review


Insolubility is a crucial issue in drug design because insoluble compounds are often measured as inactive even though they might be active if they were soluble. We provide and analyze various insolubility classification models based on a recently published data set and on compounds measured in-house at Boehringer Ingelheim. The 2D descriptor sets from pharmacophore fingerprints and MOE and the 3D descriptor sets from ParaSurf and VolSurf were examined in conjunction with support vector machines, Bayesian regularized neural networks, and random forests. We introduce a classifier-fusion strategy, called a metaclassifier, which improves upon the best single prediction and at the same time avoids descriptor selection, a potential source of overfitting. The metaclassifier strategy is compared with the simpler fusion strategies of maximum vote and highest-probability picking. The metaclassifier achieves a prediction accuracy of 72.6% on a three-class model, with nearly perfect separation of soluble and insoluble compounds and predictions as good as our calculated maximum possible agreement with experiment.
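The abstract contrasts the metaclassifier with two simpler fusion rules, maximum vote and highest-probability picking. The following is a minimal Python sketch of all three. It assumes that each base classifier (for example an SVM, a neural network, or a random forest, possibly each trained on a different descriptor set) outputs a probability vector over the same three classes. The function names, the toy probabilities, and the nearest-centroid meta-learner are illustrative assumptions, not the authors' implementation; the abstract does not say which learner the metaclassifier uses.

```python
from collections import Counter
from typing import List, Sequence

Probas = Sequence[Sequence[float]]  # one probability vector per base classifier


def argmax(p: Sequence[float]) -> int:
    """Index of the largest entry (the first one wins on ties)."""
    return max(range(len(p)), key=p.__getitem__)


def max_vote(probas: Probas) -> int:
    """Maximum vote: each base classifier votes for its top class; ties go to the lower index."""
    votes = Counter(argmax(p) for p in probas)
    top = max(votes.values())
    return min(c for c, n in votes.items() if n == top)


def highest_probability(probas: Probas) -> int:
    """Highest-probability picking: follow the single most confident base classifier."""
    most_confident = max(probas, key=max)
    return argmax(most_confident)


def stack(probas: Probas) -> List[float]:
    """Concatenate all base-classifier probability vectors into one meta-feature vector."""
    return [x for p in probas for x in p]


def fit_nearest_centroid(X: Sequence[Sequence[float]], y: Sequence[int],
                         n_classes: int) -> List[List[float]]:
    """Hypothetical stand-in meta-learner: mean meta-feature vector per class.
    Assumes every class occurs at least once in y."""
    dim = len(X[0])
    sums = [[0.0] * dim for _ in range(n_classes)]
    counts = [0] * n_classes
    for x, label in zip(X, y):
        counts[label] += 1
        for j, v in enumerate(x):
            sums[label][j] += v
    return [[s / counts[c] for s in sums[c]] for c in range(n_classes)]


def predict_nearest_centroid(centroids: Sequence[Sequence[float]],
                             x: Sequence[float]) -> int:
    """Assign x to the class whose centroid is closest in squared Euclidean distance."""
    dists = [sum((a - b) ** 2 for a, b in zip(c, x)) for c in centroids]
    return min(range(len(dists)), key=dists.__getitem__)


# Made-up probabilities from three base classifiers for one compound,
# over three hypothetical classes 0, 1, 2.
query = [[0.40, 0.35, 0.25], [0.10, 0.10, 0.80], [0.45, 0.30, 0.25]]

print(max_vote(query))             # 0: two of three classifiers favour class 0
print(highest_probability(query))  # 2: the second classifier is the most confident (0.80)

# Made-up meta-level training set: one stacked example per class.
train = [
    [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.6, 0.3, 0.1]],
    [[0.2, 0.6, 0.2], [0.3, 0.5, 0.2], [0.2, 0.7, 0.1]],
    [[0.1, 0.2, 0.7], [0.1, 0.1, 0.8], [0.2, 0.2, 0.6]],
]
labels = [0, 1, 2]
centroids = fit_nearest_centroid([stack(p) for p in train], labels, n_classes=3)
print(predict_nearest_centroid(centroids, stack(query)))  # 2
```

The two simple rules disagree on the same input, which is where a learned combination can help. The metaclassifier learns how much to trust each base model from data instead of applying a fixed rule. In a general stacking setup, the meta-level is usually trained on held-out (for example cross-validated) base-classifier outputs, so that it does not learn from overfitted in-sample probabilities.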
Original language: English
Pages (from-to): 404-414
Number of pages: 11
Journal: Journal of Chemical Information and Modeling
Issue number: 3
Publication status: Published - 2010

