Abstract
We present a method for combining the feature extraction power of neural networks with model-based dimensionality reduction to produce linguistically motivated, low-dimensional measurements of sounds. Our method works by first training a convolutional neural network (CNN) to predict linguistically relevant category labels from the spectrograms of sounds. We then define idealized models of these categories as probability distributions in a low-dimensional measurement space, with locations chosen to reproduce, as far as possible, the perceptual characteristics of the CNN. To measure a sound, we find the point in our measurement space for which the posterior probability distribution over categories in the idealized model most closely matches the category probabilities output by the CNN for that sound. In this way we are able to use the feature-learning power of the CNN to produce low-dimensional measurements. We demonstrate our method using monophthongal vowel categories to train our CNN, and produce measurements in two dimensions. We also show that the perceptual characteristics of our CNN are similar to those of human listeners.
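The measurement step described above can be illustrated with a minimal sketch. The example below assumes Gaussian idealized category models with equal priors in a two-dimensional measurement space and uses KL divergence as the matching criterion; the category means and covariances shown are placeholders rather than the values fitted in the paper, where they are chosen to reproduce the CNN's perceptual characteristics.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import multivariate_normal

# Hypothetical idealized category models: one 2-D Gaussian per vowel category.
# These parameters are illustrative placeholders only.
category_means = np.array([[0.0, 0.0], [1.0, 0.5], [-0.5, 1.0]])
category_covs = [np.eye(2) * 0.3 for _ in category_means]

def idealized_posterior(x):
    """Posterior over categories at point x, assuming equal priors."""
    likelihoods = np.array([
        multivariate_normal.pdf(x, mean=m, cov=c)
        for m, c in zip(category_means, category_covs)
    ])
    return likelihoods / likelihoods.sum()

def measure(cnn_probs):
    """Find the 2-D point whose idealized posterior best matches the CNN's
    category probabilities (KL divergence used here as an assumed criterion)."""
    def kl_to_cnn(x):
        q = idealized_posterior(x)
        return np.sum(cnn_probs * np.log(cnn_probs / (q + 1e-12) + 1e-12))
    result = minimize(kl_to_cnn, x0=np.zeros(2), method="Nelder-Mead")
    return result.x

# Example: the CNN assigns most probability to the second vowel category,
# so the measurement lands near that category's region of the space.
print(measure(np.array([0.1, 0.8, 0.1])))
```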
| Original language | English |
| --- | --- |
| Pages (from-to) | 304-315 |
| Number of pages | 12 |
| Journal | Journal of the Acoustical Society of America |
| Volume | 153 |
| DOIs | |
| Publication status | Published - 18 Jan 2023 |