Low dimensional measurement of vowels using machine perception

James Burridge, Bert Vaux

Research output: Contribution to journal › Article › peer-review

Abstract

We present a method for combining the feature extraction power of neural networks with model based dimensionality reduction, to produce linguistically motivated low dimensional measurements of sounds. Our method works by first training a convolutional neural network (CNN) to predict linguistically relevant category labels from the spectrograms of sounds. We then define idealized models of these categories as probability distributions in a low dimensional measurement space, with locations chosen to reproduce, as far as possible, the perceptual characteristics of the CNN. To measure a sound, we find the point in our measurement space for which the posterior probability distribution over categories in the idealized model most closely matches the category probabilities output by the CNN for that sound. In this way we are able to use the feature learning power of the CNN to produce low dimensional measurements. We demonstrate our method using monophthongal vowel categories to train our CNN, and produce measurements in two dimensions. We also show that the perceptual characteristics of our CNN are similar to those of human listeners.
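The measurement step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the equal-prior isotropic Gaussian categories, the KL divergence used as the matching criterion, the grid search, the category locations in `centers`, and the function names `posterior` and `measure` are all assumptions made for illustration. The CNN itself is replaced by a stand-in probability vector.

```python
import numpy as np

def posterior(points, centers, sigma=1.0):
    """Category posteriors at each 2-D point.

    Assumes equal priors and isotropic Gaussian categories of width sigma
    (an illustrative choice, not taken from the paper).
    points: (n, 2) array, centers: (k, 2) array, returns (n, k).
    """
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    logp = -d2 / (2.0 * sigma ** 2)
    logp -= logp.max(axis=1, keepdims=True)  # stabilise the exponentials
    p = np.exp(logp)
    return p / p.sum(axis=1, keepdims=True)

def measure(cnn_probs, centers, sigma=1.0, lim=3.0, steps=121):
    """Return the grid point whose model posterior best matches cnn_probs.

    The match is scored with KL(cnn_probs || posterior), an assumed criterion.
    """
    axis = np.linspace(-lim, lim, steps)
    xx, yy = np.meshgrid(axis, axis)
    grid = np.column_stack([xx.ravel(), yy.ravel()])
    post = posterior(grid, centers, sigma)
    eps = 1e-12
    q = np.clip(cnn_probs, eps, 1.0)
    kl = (q * (np.log(q) - np.log(post + eps))).sum(axis=1)
    return grid[np.argmin(kl)]

# Hypothetical category locations in the two-dimensional measurement space.
centers = np.array([[-1.0, 1.0], [1.0, 1.0], [0.0, -1.0]])

# Stand-in for CNN output: the model posterior at a known point.
true_point = np.array([0.5, 0.25])
cnn_probs = posterior(true_point[None, :], centers)[0]

estimate = measure(cnn_probs, centers)
print(estimate)
```

In this toy setting the search recovers the point that generated the probabilities, to within the grid spacing. With real CNN outputs the match would generally be approximate, and a continuous optimiser could replace the grid search.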
Original language: English
Pages (from-to): 304-315
Number of pages: 12
Journal: Journal of the Acoustical Society of America
Volume: 153
Publication status: Published - 18 Jan 2023
