Abstract
Symbolic regression (SR) is the machine learning (ML) method for learning functions from data. After a brief overview of the SR landscape, I will describe the two main challenges that traditional algorithms face: they have an unknown (and probably significant) probability of failing to find any given good function, and they suffer from ambiguity and poorly justified assumptions in their function-selection procedure. To address these, I propose an exhaustive search and model selection by the minimum description length (MDL) principle, which allows accuracy and complexity to be directly traded off by measuring each in units of information. I showcase the resulting publicly available Exhaustive Symbolic Regression (ESR) algorithm on three open problems in astrophysics: the expansion history of the universe, the effective behaviour of gravity in galaxies and the potential of the inflaton field. In each case, the algorithm identifies many functions superior to the literature standards. This general-purpose methodology should find widespread utility in science and beyond. This article is part of the discussion meeting issue 'Symbolic regression in the physical sciences'.
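As a rough illustration of trading accuracy against complexity in units of information, the sketch below scores a tiny, fully enumerated set of candidate functions by an approximate description length in nats. It is a toy under stated assumptions, not the ESR codelength from the article. The penalty terms (`n_nodes * log(n_basis)` for the expression structure, `0.5 * n_params * log(N)` for the fitted parameters), the Gaussian noise model with known `sigma`, the synthetic data, and all function names are hypothetical choices made for this example.

```python
import math
import random


def fit_constant(x, y):
    """Least-squares fit of y = theta0."""
    c = sum(y) / len(y)
    return lambda t: c


def fit_proportional(x, y):
    """Least-squares fit of y = theta0 * x."""
    a = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    return lambda t: a * t


def fit_linear(x, y):
    """Least-squares fit of y = theta0 + theta1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return lambda t: a + b * t


# Hypothetical candidate set: (expression, fitter, parameter count, tree node count).
CANDIDATES = [
    ("theta0", fit_constant, 1, 1),
    ("theta0 * x", fit_proportional, 1, 3),
    ("theta0 + theta1 * x", fit_linear, 2, 5),
]


def description_length(x, y, fitter, n_params, n_nodes, sigma, n_basis=4):
    """Approximate codelength in nats: misfit + structure cost + parameter cost.

    Assumption: n_basis = 4 symbols {+, *, x, theta}; this is a crude
    stand-in for a proper MDL codelength.
    """
    model = fitter(x, y)
    chi2 = sum((yi - model(xi)) ** 2 for xi, yi in zip(x, y)) / sigma**2
    # Negative log-likelihood for Gaussian noise of known width sigma.
    nll = 0.5 * chi2 + 0.5 * len(x) * math.log(2 * math.pi * sigma**2)
    structure_cost = n_nodes * math.log(n_basis)
    parameter_cost = 0.5 * n_params * math.log(len(x))
    return nll + structure_cost + parameter_cost


def select_by_mdl(x, y, sigma):
    """Score every candidate and return the one with the shortest codelength."""
    scores = {
        name: description_length(x, y, fitter, p, k, sigma)
        for name, fitter, p, k in CANDIDATES
    }
    best = min(scores, key=scores.get)
    return best, scores


# Synthetic data (an assumption for the demo): y = 1 + 2x plus Gaussian noise.
rng = random.Random(0)
sigma = 0.1
x = [i / 199 for i in range(200)]
y = [1.0 + 2.0 * xi + rng.gauss(0.0, sigma) for xi in x]

best_name, scores = select_by_mdl(x, y, sigma)
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{score:12.2f} nats  {name}")
```

Because every term is expressed in the same unit, a more complex function is preferred only when its improvement in fit exceeds the extra information needed to write it down.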
| Original language | English |
|---|---|
| Article number | 20240584 |
| Number of pages | 15 |
| Journal | Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences |
| Volume | 384 |
| Issue number | 2317 |
| Early online date | 9 Apr 2026 |
| Publication status | Published - 9 Apr 2026 |
Keywords
- Machine learning
- minimum description length
- symbolic regression