HTK (Hidden Markov Model Toolkit)

Installation

This guide assumes HTK is used on Windows.

Download

Downloading requires prior user registration.

HTK Speech Recognition Toolkit

Older releases are available at Index of /ftp/software, which also includes Windows binaries.

Compiling

The htk\README file in the downloaded archive describes how to compile; follow those instructions.

Extract the downloaded file.
Open a command prompt.
Change to the folder where the source was extracted.
Change to the htk folder.
```
cd htk
```
Create a folder for the libraries and tools.
```
mkdir bin.win32
```
Run VCVARS32. If it fails to run, check the path. For Visual Studio 2008, for example, it is in %PROGRAMFILES(X86)%\Microsoft Visual Studio 9.0\VC\bin. Its content merely runs "%VS90COMNTOOLS%vsvars32.bat", so running that file directly has the same effect.

Build the HTK libraries.

cd HTKLib
nmake /f htk_htklib_nt.mkf all
cd ..

Build the HTK tools.

cd HTKTools
nmake /f htk_htktools_nt.mkf all
cd ..
cd HLMLib
nmake /f htk_hlmlib_nt.mkf all
cd ..
cd HLMTools
nmake /f htk_hlmtools_nt.mkf all
cd ..
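
The four build steps above follow the same pattern, so they can be written down as data. The following is a dry-run sketch that only prints the commands and does not run them. The directory and makefile names are the ones used above; `BUILD_STEPS` and `build_plan` are hypothetical names, and the sketch assumes the commands would be run from the htk folder in a prompt where VCVARS32 has already been executed.

```python
# Dry-run sketch: lists the nmake commands from the steps above without running them.
# Each entry is (directory, makefile), taken from the build steps in this section.
BUILD_STEPS = [
    ("HTKLib", "htk_htklib_nt.mkf"),
    ("HTKTools", "htk_htktools_nt.mkf"),
    ("HLMLib", "htk_hlmlib_nt.mkf"),
    ("HLMTools", "htk_hlmtools_nt.mkf"),
]


def build_plan(steps=BUILD_STEPS):
    """Return the command lines in the order they would be typed."""
    lines = []
    for directory, makefile in steps:
        lines.append(f"cd {directory}")
        lines.append(f"nmake /f {makefile} all")
        lines.append("cd ..")
    return lines


# Print the plan so it can be compared with the manual steps.
for line in build_plan():
    print(line)
```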

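The tools built here are described in The HTK Tools.

Verifying the build

The HTK Speech Recognition Toolkit page that hosts the source code also provides HTK samples, which can be used to verify the build. Instructions are in samples\HTKDemo\README.NT of the extracted samples.

First, create the following empty folders under the HTKDemo folder.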

HTKDemo
- accs
- hmms
  - hmm.0
  - hmm.1
  - hmm.2
  - hmm.3
  - tmp
- proto
- test
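
Creating these folders by hand is easy to get wrong, so here is a minimal sketch that creates them in one call. The folder names are the ones listed above; `DEMO_DIRS` and `make_demo_dirs` are hypothetical names, and the HTKDemo location is passed in by the caller.

```python
import os

# Empty folders listed above, relative to the HTKDemo folder.
DEMO_DIRS = [
    "accs",
    os.path.join("hmms", "hmm.0"),
    os.path.join("hmms", "hmm.1"),
    os.path.join("hmms", "hmm.2"),
    os.path.join("hmms", "hmm.3"),
    os.path.join("hmms", "tmp"),
    "proto",
    "test",
]


def make_demo_dirs(demo_root):
    """Create the empty folders under demo_root and return their paths."""
    created = []
    for rel in DEMO_DIRS:
        path = os.path.join(demo_root, rel)
        # exist_ok=True makes the call safe to repeat.
        os.makedirs(path, exist_ok=True)
        created.append(path)
    return created
```

For the layout used in this section, the call would be `make_demo_dirs(r"C:\samples\HTKDemo")`.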

Then change to the HTKDemo folder and run runDemo.pl with one of the .dcf files in configs. For example, to use monPlainM1S3.dcf:

C:\samples>cd HTKDemo
C:\samples\HTKDemo>perl runDemo.pl configs\monPlainM1S3.dcf

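If the run fails with "'HInit' is not recognized as an internal or external command, operable program or batch file.", the folder where HTK was compiled is not on the path. In that case, add that folder (bin.win32 by default) to the path.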
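
Whether the tools can be found can be checked before running the demo. The sketch below uses Python's standard `shutil.which` to look up each name on the search path; `missing_tools` is a hypothetical helper name, and the list of tool names to check is up to the caller.

```python
import shutil


def missing_tools(names, path=None):
    """Return the names that cannot be found on the search path.

    path=None means the current PATH environment variable is used.
    """
    return [name for name in names if shutil.which(name, path=path) is None]


# Example: report which of these HTK tools are not reachable.
print(missing_tools(["HInit", "HCopy"]))
```

An empty list means every name was found; any name that is printed still needs its folder added to the path.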

If the error "Must be in directory HTKDemo to run this script" appears, the script is not being run from the HTKDemo folder. The script may also be failing to obtain the current folder correctly, so inspect runDemo.pl around the line die "Must be in directory HTKDemo to run this script\n";.

Toolkit

HTK Processing Stages

Each tool prints its help when invoked without any options.

C:\>hslab

USAGE: HSLab [options] waveformFile

 Option                                       Default

 -a      auto-increment global label          off
 -i s    Output transcriptions to MLF s       off
 …

Data Preparation Tools

Speech
- HSLab … interactive label editor
```
HSLab [options] waveformFile
```
  Contents of HSLab
- HCopy … copies files, optionally converting them to a specified format
```
HCopy [options] src [ + src …] tgt …
```
  Contents of HCopy
- HList … displays the contents of supported files
```
HList [options] file …
```
  Contents of HList
- HQuant … builds a codebook
```
HQuant [options] vqFile trainFiles…
```
  Contents of HQuant
Transcriptions
- HLEd … edits labels using a script
```
HLEd [options] edCmdFile labFiles…
```
  Contents of HLEd
- HLStats … reads labels and builds a language model for recognition
```
HLStats [options] hmmList labFile…
```
  Contents of HLStats
- HDMan … builds a pronunciation dictionary from one or more dictionaries
```
HDMan [options] newDict srcDict1 srcDict2 …
```
  Contents of HDMan

Contents of Data Preparation Tools

HDMan

Command	Effect
`AS A B …`	Append silence models A, B, etc to each pronunciation.
`CR X A Y B`	Replace phone Y in the context of A_B by X. Contexts may include an asterisk * to denote any phone, or a context set defined using the DC command.
`DC X A B …`	Define the set A, B, … as the context X.
`DD X A B …`	Delete the definition for word X starting with phones A, B, ….
`DP A B C …`	Delete any occurrences of phones A or B or C ….
`DS src`	Delete each pronunciation from source src unless it is the only one for the current word.
`DW X Y Z …`	Delete words (& definitions) X, Y, Z, ….
`FW X Y Z …`	Define X, Y, Z, … as function words and change each phone in the definition to a function word specific phone. For example, in word W phone A would become W.A.
`IR`	Set the input mode to raw. In raw mode, words are regarded as arbitrary sequences of printing chars. In the default mode, words are strings as defined in section 4.6.
`LC [X]`	Convert all phones to be left-context dependent. If X is given then the 1st phone a in each word is changed to X-a otherwise it is unchanged.
`LP`	Convert all phones to lowercase.
`LW`	Convert all words to lowercase.
`MP X A B …`	Merge any sequence of phones A B … and rename as X.
`RC [X]`	Convert all phones to be right-context dependent. If X is given then the last phone z in each word is changed to z+X otherwise it is unchanged.
`RP X A B …`	Replace all occurrences of phones A or B … by X.
`RS system`	Remove stress marking. Currently the only stress marking system supported is that used in the dictionaries produced by Carnegie Mellon University (system = cmu).
`RW X A B …`	Replace all occurrences of word A or B … by X.
`SP X A B …`	Split phone X into the sequence A B C ….
`TC [X [Y]]`	Convert phones to triphones. If X is given then the first phone a is converted to X-a+b otherwise it is unchanged. If Y is given then the last phone z is converted to y-z+Y otherwise if X is given then it is changed to y-z+X otherwise it is unchanged.
`UP`	Convert all phones to uppercase.
`UW`	Convert all words to uppercase.
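
The context-dependent conversions are easier to follow on a concrete phone sequence. The sketch below is one reading of the TC row in the table only, not HDMan's implementation. `to_triphones` is a hypothetical name, and leaving words with fewer than two phones unchanged is an assumption that the table does not address.

```python
def to_triphones(phones, x=None, y=None):
    """Convert one pronunciation (a list of phones) as the TC row describes.

    x and y play the roles of the optional X and Y arguments of TC.
    """
    n = len(phones)
    if n < 2:
        # Assumption: the table does not say what happens to such words.
        return list(phones)
    out = []
    for i, p in enumerate(phones):
        if i == 0:
            # First phone a becomes X-a+b if X is given, else unchanged.
            out.append(f"{x}-{p}+{phones[1]}" if x else p)
        elif i == n - 1:
            # Last phone z becomes y-z+Y, else y-z+X, else unchanged.
            right = y if y else x
            out.append(f"{phones[i - 1]}-{p}+{right}" if right else p)
        else:
            # Interior phones take both neighbours as context.
            out.append(f"{phones[i - 1]}-{p}+{phones[i + 1]}")
    return out


print(to_triphones(["k", "ae", "t"]))
print(to_triphones(["k", "ae", "t"], x="sil"))
```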

Contents of Function - HDMan

Training Tools

HCompV …

HCompV [options] [hmmFile] trainFiles…

Contents of HCompV

HInit …
```
HInit [options] hmmFile trainFiles…
```
Contents of HInit
HERest …
```
HERest [options] hmmList dataFiles…
```
Contents of HERest
HRest … trains HMMs with Baum-Welch
```
HRest [options] hmmFile trainFiles…
```
Contents of HRest
HHEd … manipulates HMM definitions
```
HHEd [options] editF hmmList
```
Contents of HHEd

Contents of Training Tools

HHEd

RO 100 "stats"

TR 0

QS  "R_NonBoundary"  { *+* }
QS  "R_Silence"      { *+sil }
QS  "R_Stop"         { *+p,*+pd,*+b,*+t,*+td,*+d,*+dd,*+k,*+kd,*+g }
…

TR 2

TB 350 "ST_ae_2_" {("ae","*-ae+*","ae+*","*-ae").state[2]}
TB 350 "ST_b_2_"  {("b","*-b+*","b+*","*-b").state[2]}
TB 350 "ST_ah_2_" {("ah","*-ah+*","ah+*","*-ah").state[2]}
…

TR 1

AU "./fulllist"
CO "./tiedlist"

ST "./trees"

tree.hed

RO … outlier threshold
1st "TR" … trace level
QS … question - defined by the user

2nd "TR" … enables intermediate level progress reporting
TB … clusters one specific set of states - created with the mkclscript.prl command
AU … synthesize previously unseen triphones, i.e. use the set of newly created decision trees to make all the triphones included in the list
CO … compact the model set: some state definitions will be exactly the same (same means and variances etc.). To save space, only one of these states is kept in the definition, others are added to the tiedlist.
ST … save the decision trees in a file
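
The TB lines in tree.hed are repetitive, which is why they are normally generated by a script. The following sketch reproduces the line format shown in the listing above; it is not the mkclscript.prl script itself. `tb_line` and `tb_lines` are hypothetical names, and the phones, states and threshold are supplied by the caller.

```python
def tb_line(threshold, phone, state):
    """Build one TB command in the format used in the listing above."""
    # The four patterns cover the monophone and its context-dependent forms.
    items = (phone, f"*-{phone}+*", f"{phone}+*", f"*-{phone}")
    patterns = ",".join(f'"{item}"' for item in items)
    return f'TB {threshold} "ST_{phone}_{state}_" {{({patterns}).state[{state}]}}'


def tb_lines(threshold, phones, states):
    """Build one TB command per (state, phone) pair."""
    return [tb_line(threshold, p, s) for s in states for p in phones]


for line in tb_lines(350, ["ae", "b", "ah"], [2]):
    print(line)
```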

If a phone combination in the file given to AU does not exist in the dictionary, processing aborts with an error like the one below. See speech recognition - How to solve error in HTK ERROR [+2662] FindProtoModel: no proto for ei in hSet - Stack Overflow

ERROR [+2662]  FindProtoModel: no proto for ** in hSet
FATAL ERROR - Terminating program HHEd
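
One way to look for the offending entry before running HHEd is to reduce each name in the AU list to its base phone and compare it with the phones that are known to have models. This is a sketch under assumptions: names are taken to follow the l-p+r pattern seen in the HHEd script above, `base_phone` and `phones_without_model` are hypothetical helpers, and the check HTK performs internally may differ.

```python
def base_phone(name):
    """Strip the left (l-) and right (+r) context from a model name."""
    if "-" in name:
        name = name.split("-", 1)[1]
    if "+" in name:
        name = name.split("+", 1)[0]
    return name


def phones_without_model(model_names, known_phones):
    """Return the base phones in model_names that are not in known_phones."""
    known = set(known_phones)
    return sorted({base_phone(n) for n in model_names} - known)


# Example: "ei" appears in the list but has no known model.
print(phones_without_model(["k-ae+t", "ei+t", "sil"], ["k", "ae", "t", "sil"]))
```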

References

Recognition Tools

HVite … creates phone labels
```
HVite [options] VocabFile HMMList DataFiles…
```
Contents of HVite Contents of HVITE

Contents of Recognition Tools

Analysis Tool

HResults … computes the recognition rate
```
HResults [options] labelList recFiles…
```
Contents of HResults

Contents of Analysis Tool

References

HTK 3.0 HOWTO (under construction)

Creating acoustic models

Create Acoustic Models - voxforge.org explains how to create acoustic models intended for use with Julius. The files used in that material can be downloaded from VoxForge/develop · GitHub.

References

Speech Media Laboratory: HMM creation notes (2002/10/15)

Troubleshooting

Error message	Fix
ERROR [+1452] ReadDictProns: word `WORD` out of order in dict `FILENAME`	Sort the words
ERROR [+1450] ReadCmd: Invalid Command <・ｿ`**`> in file `FILENAME`	Convert the file to an encoding without a BOM
ERROR [+8050] ReadDict: Phone or outsym expected in word `WORD`	Add a newline at the end of the file
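
The three fixes in the table are simple byte-level edits. The sketch below shows one way to apply them. All function names are hypothetical, the functions work on file contents already read as bytes, and sorting dictionary lines by the byte order of their first field is an assumption about the order HTK expects.

```python
import codecs


def strip_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 byte order mark, if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def ensure_trailing_newline(data: bytes) -> bytes:
    """Append a newline when non-empty data does not end with one."""
    if data and not data.endswith(b"\n"):
        return data + b"\n"
    return data


def sort_dict_lines(data: bytes) -> bytes:
    """Sort non-blank dictionary lines by their first field (the word).

    Assumption: plain byte order of the word is the order HTK expects.
    """
    lines = [ln for ln in data.splitlines() if ln.strip()]
    lines.sort(key=lambda ln: ln.split()[0])
    return b"".join(ln + b"\n" for ln in lines)
```

A file would be read with `open(path, "rb")`, passed through the relevant function, and written back with `open(path, "wb")`.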

References

UNDERSTANDING HTK ERROR MESSAGES - USING HTK

References

The HTKBook - HTK Speech Recognition Toolkit University of Cambridge
Note: viewing requires user registration.
- HTKBook for HTK3 (HTML version)
近藤悠介 on the Web | HTK

Tutorial

Note: viewing requires user registration. The links below open pages that do not use frames.

Tutorial Overview - Contents of Tutorial Overview
