Priors for symbolic regression

Deaglan J. Bartlett, Harry Desmond, Pedro G. Ferreira

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

When choosing between competing symbolic models for a data set, a human will naturally prefer the "simpler" expression or the one which more closely resembles equations previously seen in a similar context. This suggests a non-uniform prior on functions, which is, however, rarely considered within a symbolic regression (SR) framework. In this paper we develop methods to incorporate detailed prior information on both functions and their parameters into SR. Our prior on the structure of a function is based on an n-gram language model, which is sensitive to the arrangement of operators relative to one another in addition to the frequency of occurrence of each operator. We also develop a formalism based on the Fractional Bayes Factor to treat numerical parameter priors in such a way that models may be fairly compared through the Bayesian evidence, and explicitly compare Bayesian, Minimum Description Length and heuristic methods for model selection. We demonstrate the performance of our priors relative to literature standards on benchmarks and a real-world dataset from the field of cosmology.
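As a rough illustration of the idea of a language-model prior on function structure, the sketch below fits a bigram (n = 2) model to a toy corpus of expressions written in prefix notation and uses it to score new expressions. Everything here is an assumption made for illustration: the corpus, the vocabulary, the add-alpha smoothing, and the names `fit_bigram` and `log_prior` are hypothetical, and the flat token-sequence model is a simplification rather than the authors' formulation.

```python
import math
from collections import Counter

START = "<s>"  # hypothetical start-of-expression marker


def fit_bigram(corpus, vocab, alpha=1.0):
    """Return an add-alpha smoothed bigram probability function P(cur | prev)."""
    pair_counts = Counter()
    context_counts = Counter()
    for tokens in corpus:
        padded = [START] + list(tokens)
        for prev, cur in zip(padded, padded[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1

    def prob(prev, cur):
        # Smoothing keeps unseen operator pairs at a small non-zero probability.
        numerator = pair_counts[(prev, cur)] + alpha
        denominator = context_counts[prev] + alpha * len(vocab)
        return numerator / denominator

    return prob


def log_prior(tokens, prob):
    """Log prior of an expression: sum of log bigram probabilities."""
    padded = [START] + list(tokens)
    return sum(math.log(prob(p, c)) for p, c in zip(padded, padded[1:]))


# Toy training corpus in prefix notation (illustrative only, not real data).
corpus = [
    ["*", "a", "x"],                # a * x
    ["+", "*", "a", "x", "b"],      # a * x + b
    ["*", "a", "pow", "x", "b"],    # a * x^b
    ["exp", "*", "a", "x"],         # exp(a * x)
]
vocab = ["+", "*", "pow", "exp", "sin", "x", "a", "b"]

prob = fit_bigram(corpus, vocab)
familiar = log_prior(["+", "*", "a", "x", "b"], prob)  # resembles the corpus
unusual = log_prior(["sin", "sin", "sin", "x"], prob)  # sin(sin(sin(x)))
print(familiar, unusual)
```

In this toy setting the expression that resembles the training corpus receives a higher log prior than the nested-sine expression, even though the latter has fewer tokens. A prior based only on operator frequencies would not distinguish between different arrangements of the same operators.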
Original language: English
Title of host publication: GECCO '23 Companion: Proceedings of the Companion Conference on Genetic and Evolutionary Computation
Publisher: Association for Computing Machinery
Pages: 2402–2411
ISBN (Print): 9798400701207
DOIs
Publication status: Published - 24 Jul 2023
Event: GECCO 2023 - Lisbon, Portugal
Duration: 15 Jul 2023 to 19 Jul 2023

Conference

Conference: GECCO 2023
Country/Territory: Portugal
City: Lisbon
Period: 15/07/23 to 19/07/23

Keywords

  • model selection
  • minimum description length
  • symbolic regression
  • equation learning
  • data analysis
  • cosmology
  • language model
  • UKRI
  • STFC
