HTKLM

Page last modified on August 14, 2008, at 10:27 AM

Creating an n-gram language model

At its most basic, a language model is a set of probabilities that particular words occur in a particular sequence. The bigram and trigram language models created below estimate the probability that a word occurs given the preceding word (n = 2) or the two preceding words (n = 3).
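To make the definition concrete, here is a minimal Python sketch that estimates these conditional probabilities from raw counts (a maximum-likelihood estimate). The function name `ngram_probabilities` and the toy sentence are illustrative assumptions, not part of HTK or lm.sh, and real toolkits typically add discounting and back-off, which this sketch omits.

```python
from collections import Counter


def ngram_probabilities(tokens, n):
    """Maximum-likelihood estimate of P(word | previous n-1 words)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    positions = range(len(tokens) - n + 1)
    # Count every n-gram in the token stream.
    ngrams = Counter(tuple(tokens[i:i + n]) for i in positions)
    # Count each history only where a word follows it, so the
    # probabilities for a given history sum to 1.
    histories = Counter(tuple(tokens[i:i + n - 1]) for i in positions)
    return {gram: count / histories[gram[:-1]] for gram, count in ngrams.items()}


# Toy corpus (hypothetical), already split into words.
tokens = "the dog saw the cat and the dog ran".split()

bigrams = ngram_probabilities(tokens, 2)   # n = 2: condition on one word
trigrams = ngram_probabilities(tokens, 3)  # n = 3: condition on two words

print(bigrams[("the", "dog")])          # P(dog | the)
print(trigrams[("the", "dog", "saw")])  # P(saw | the dog)
```

In the toy corpus, "the" is followed by "dog" twice and "cat" once, so the bigram estimate for "dog" after "the" is 2/3. The trigram estimate for "saw" after "the dog" is 1/2, because "the dog" is followed once by "saw" and once by "ran".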

The LMTutorial can be used to create an n-gram language model. The script below automates the model creation described in HTK's LM Tutorial. Its methods are based largely on that tutorial (hence the retention of the name 'holmes'), but the name does not affect performance.

lm.sh

Recipe

Transcriptions of the training files, properly formatted (see lm.sh)
Testing file (optional)
HTK installed with proper paths
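One way to check the last item before running the script is a short Python sketch like the one below. It assumes "proper paths" means the HTK language-modeling tools can be found on the shell's PATH. The tool list is also an assumption: these are HTK's language-modeling tools, but this page does not say which of them lm.sh calls. The helper names `find_tools` and `missing_tools` are illustrative.

```python
import shutil

# Assumed tool list: HTK's language-modeling tools. Which of these
# lm.sh actually invokes is not stated on this page.
HTK_LM_TOOLS = ["LNewMap", "LGPrep", "LGCopy", "LBuild", "LPerp"]


def find_tools(names):
    """Map each tool name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names):
    """Return the tool names that could not be found on PATH."""
    return [name for name, path in find_tools(names).items() if path is None]


missing = missing_tools(HTK_LM_TOOLS)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All listed HTK tools were found on PATH.")
```

If any tool is reported missing, adding the HTK binary directory to PATH before running lm.sh is the likely fix, although where that directory is depends on the installation.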

Alternatives (not explored)

CMU Statistical Language Modeling Toolkit
The SRI Language Modeling Toolkit