Insolubility classification with accurate prediction probabilities using a metaClassifier

C. Kramer, B. Beck, Tim Clark

    Research output: Contribution to journal, Article, peer-review

    Abstract

    Insolubility is a crucial issue in drug design because insoluble compounds are often measured to be inactive although they might be active if they were soluble. We provide and analyze various insolubility classification models based on a recently published data set and compounds measured in-house at Boehringer-Ingelheim. The 2D descriptor sets from pharmacophore fingerprints and MOE and the 3D descriptor sets from ParaSurf and VolSurf were examined in conjunction with support vector machines, Bayesian regularized neural networks, and random forests. We introduce a classifier-fusion strategy, called metaclassifier, which improves upon the best single prediction and at the same time avoids descriptor selection, a potential source of overfitting. The metaclassifier strategy is compared to the simpler fusion strategies of maximum vote and highest probability picking. A prediction accuracy of 72.6% on a three-class model is achieved with the metaclassifier, with nearly perfect separation of soluble and insoluble compounds and prediction as good as our calculated maximum possible agreement with experiment.
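
The two simpler fusion strategies named in the abstract, maximum vote and highest probability picking, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the class labels, the function names, and the example probabilities are hypothetical and do not come from the paper. The abstract does not describe how the metaclassifier itself is built, so `stacked_features` only shows one common fusion idea (feeding all base-model outputs to a second-level model) and should not be read as the authors' method.

```python
from collections import Counter

# Hypothetical class labels for a three-class solubility model.
CLASSES = ["insoluble", "intermediate", "soluble"]


def argmax(values):
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def maximum_vote(probabilities):
    """Each base model votes for its most probable class; the majority wins.

    `probabilities` holds one probability vector per base model.
    Ties are broken in favour of the class listed first in CLASSES.
    """
    votes = Counter(argmax(p) for p in probabilities)
    best = max(range(len(CLASSES)), key=lambda c: votes[c])
    return CLASSES[best]


def highest_probability(probabilities):
    """Take the class predicted by the single most confident base model."""
    most_confident = max(probabilities, key=max)
    return CLASSES[argmax(most_confident)]


def stacked_features(probabilities):
    """Concatenate all base-model outputs into one flat feature vector.

    A second-level model could be trained on such vectors. This is an
    assumed, generic stacking setup, not the construction from the paper.
    """
    return [p for model_output in probabilities for p in model_output]


# Made-up outputs of three base models for one compound.
example = [
    [0.50, 0.30, 0.20],
    [0.45, 0.35, 0.20],
    [0.05, 0.05, 0.90],
]

print(maximum_vote(example))         # two of three models favour class 0
print(highest_probability(example))  # the third model is the most confident
print(len(stacked_features(example)))
```

In this made-up example the two simple strategies disagree: the majority votes for "insoluble", while the single most confident model says "soluble". Disagreements of this kind are one reason a learned fusion step may be preferable to a fixed rule.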
    Original language: English
    Pages (from-to): 404-414
    Number of pages: 11
    Journal: Journal of Chemical Information and Modeling
    Volume: 50
    Issue number: 3
    Publication status: Published - 2010
