Significance of novel WSD algorithms

Armin Shams Baragh, David S. Bree, Hossein Sharif-Paghaleh

Research output: Contribution to journal › Article › Peer-review

Abstract

Word sense disambiguation (WSD) is considered a difficult problem in computational linguistics. Single-approach solutions consisting of only one module are unlikely to reach high performance levels, whereas hybrid systems that combine several modules tend to perform better on WSD tasks. We propose the strong POS + Frequency baseline as an easy-to-implement platform for testing how well algorithms perform when combined with other high-accuracy modules. After an overview of the field, we present our novel WSD model, a stand-alone contribution in a field where ideas are often repeated. Under the umbrella of this model, called the Sense Space Model (SSM), we show that significant and interesting algorithms exist. Although some of the model's unsupervised offspring algorithms can have low accuracy compared with the strong POS + Frequency baseline (and with the top hybrid systems), sometimes even lower than that of a random system, such algorithms can still perform significantly better than a random system when combined with the strong baseline, at a strict 1% significance level. This challenges the practice of excluding such lower-accuracy modules from a hybrid system, an exclusion that might otherwise appear necessary. One of these significant algorithms was recently improved by introducing a threshold and was then able to outperform the implemented POS + Frequency baseline. This confirms that treating such lower-accuracy algorithms as significant is reasonable.
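Two technical ingredients of the abstract can be made concrete with a short sketch: the POS + Frequency baseline, read here as a most-frequent-sense baseline conditioned on part of speech (for each lemma and POS, pick the sense seen most often in sense-tagged training data), and a significance check at the 1% level. The Python below is a minimal illustration under stated assumptions, not the authors' implementation. The function names, the toy data, the threshold-based combination rule, and the choice of a one-sided exact sign test over discordant instances are all hypothetical; the paper's actual combination scheme and statistical test are not described in the abstract.

```python
from collections import Counter, defaultdict
from math import comb


def train_pos_frequency_baseline(tagged):
    """Build a most-frequent-sense table keyed by (lemma, POS).

    `tagged` is an iterable of (lemma, pos, sense) triples, e.g. taken from a
    sense-tagged training corpus (hypothetical format)."""
    counts = defaultdict(Counter)
    for lemma, pos, sense in tagged:
        counts[(lemma, pos)][sense] += 1
    # most_common(1) keeps the first-seen sense when counts tie.
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def predict(baseline, lemma, pos, default=None):
    """Return the baseline sense for (lemma, pos), or `default` if unseen."""
    return baseline.get((lemma, pos), default)


def combine(baseline_sense, module_sense, module_score, threshold):
    """Hypothetical hybrid rule: let the extra module override the baseline
    only when its score reaches `threshold`; otherwise keep the baseline."""
    if module_sense is not None and module_score >= threshold:
        return module_sense
    return baseline_sense


def sign_test_p_value(wins, losses):
    """One-sided exact sign test on discordant instances: the probability of
    at least `wins` successes in wins + losses fair coin flips."""
    n = wins + losses
    if n == 0:
        return 1.0
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n


def compare(gold, system_a, system_b, alpha=0.01):
    """Test whether system A beats system B on the same instances.

    Only instances where exactly one system is correct are informative."""
    triples = list(zip(gold, system_a, system_b))
    wins = sum(1 for g, a, b in triples if a == g != b)    # A right, B wrong
    losses = sum(1 for g, a, b in triples if b == g != a)  # B right, A wrong
    p = sign_test_p_value(wins, losses)
    return wins, losses, p, p < alpha


if __name__ == "__main__":
    corpus = [
        ("bank", "NOUN", "finance"),
        ("bank", "NOUN", "finance"),
        ("bank", "NOUN", "river"),
        ("bank", "VERB", "rely"),
    ]
    baseline = train_pos_frequency_baseline(corpus)
    print(predict(baseline, "bank", "NOUN"))  # finance
    print(combine("finance", "river", 0.9, threshold=0.5))  # river

    gold = ["s1"] * 12
    with_module = ["s1"] * 12                 # e.g. baseline + new module
    with_random = ["s2"] * 10 + ["s1"] * 2    # e.g. baseline + random module
    print(compare(gold, with_module, with_random))  # (10, 0, 0.0009765625, True)
```

One way to frame the abstract's comparison is to let system A be the baseline combined with a new module and system B the baseline combined with a random module. In the toy run, ten discordant wins and no losses give p = 1/1024 (about 0.001), which clears the 1% level; five wins and no losses (p = 1/32, about 0.031) would not. The sign test is used here only because it needs nothing beyond paired per-instance outcomes; McNemar's test or a binomial test against a chance rate would be other reasonable framings.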
Original language: English
Pages (from-to): 344-364
Journal: Journal of Quantitative Linguistics
Volume: 17
Issue number: 4
Publication status: Published - 2010
