Julius rev.4.3.1 - based on JuliusLib rev.4.3.1 (fast)
Feature Vector Input:
    [-input devname]    input source  (default = htkparam)
        htkparam/mfcfile    feature vectors in HTK parameter file format
        outprob             outprob vectors in HTK parameter file format
        vecnet              receive vectors from client (TCP/IP)
    [-filelist file]    filename of input file list
Speech Input: (Can extract MFCC/FBANK/MELSPEC features from waveform)
    [-input devname]    input source  (default = htkparam)
        file/rawfile        waveform file (RAW(BE),WAV)
        mic                 default microphone device
        adinnet             adinnet client (TCP/IP)
        stdin               standard input
    [-filelist file]    filename of input file list
    [-adport portnum]   adinnet port number to listen (5530)
    [-48]               enable 48kHz sampling with internal down sampler (OFF)
    [-zmean/-nozmean]   enable/disable DC offset removal (OFF)
    [-lvscale]          input level scaling factor (1.0: OFF) (1.0)
    [-nostrip]          disable stripping off zero samples
    [-record dir]       record triggered speech data to dir
    [-rejectshort msec] reject an input shorter than specified
    [-rejectlong msec]  reject an input longer than specified
Speech Detection: (default: on=mic/net off=files)
    [-cutsilence]       turn on (force) skipping long silence
    [-nocutsilence]     turn off (force) skipping long silence
    [-lv unsignedshort] input level threshold (0-32767) (2000)
    [-zc zerocrossnum]  zerocross num threshold per sec. (60)
    [-headmargin msec]  header margin length in msec. (300)
    [-tailmargin msec]  tail margin length in msec. (400)
    [-chunksize sample] unit length for processing (1000)
v4.5 adds -fvad:
    [-fvad]             FVAD sw (-1=off, 0-3=on / degree (-1)
    [-fvad_param i f]   FVAD parameter (dur/thres) (5 0.50)
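To make the roles of -lv, -zc, -headmargin and -tailmargin concrete, here is a minimal Python sketch of a level plus zero-cross trigger. It is a simplified illustration, not Julius's actual detector: the function names are hypothetical, and it assumes 16 kHz signed 16-bit samples and that only samples beyond ±level count toward a crossing.

```python
def zero_cross_rate(samples, level, sample_rate=16000):
    """Count sign changes among samples beyond +/-level, scaled to per-second."""
    if not samples:
        return 0.0
    crossings = 0
    last_sign = 0  # sign of the last sample that exceeded the level
    for s in samples:
        if s >= level:
            sign = 1
        elif s <= -level:
            sign = -1
        else:
            continue  # below the level threshold: ignored
        if last_sign != 0 and sign != last_sign:
            crossings += 1
        last_sign = sign
    return crossings * sample_rate / len(samples)


def is_speech(samples, level=2000, zc_threshold=60, sample_rate=16000):
    """Treat the chunk as speech when the rate exceeds the -zc style threshold."""
    return zero_cross_rate(samples, level, sample_rate) > zc_threshold


def msec_to_samples(msec, sample_rate=16000):
    """Convert a margin given in msec (e.g. -headmargin) to a sample count."""
    return msec * sample_rate // 1000


chunk = [3000, -3000] * 500             # loud synthetic chunk of 1000 samples
print(is_speech(chunk))                 # loud alternating signal triggers
print(is_speech([100, -100] * 500))     # quiet signal stays below -lv
print(msec_to_samples(300), msec_to_samples(400))
```

With the defaults above, a 300 msec head margin and a 400 msec tail margin correspond to 4800 and 6400 samples at 16 kHz.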
GMM utterance verification:
    -gmm filename       GMM definition file
    -gmmnum num         GMM Gaussian pruning num (10)
    -gmmreject string   comma-separated list of noise model name to reject
On-the-fly Decoding: (default: on=mic/net off=files)
    [-realtime]         turn on, input streamed with MAP-CMN
    [-norealtime]       turn off, input buffered with sentence CMN
Others:
    [-C jconffile]      load options from jconf file
    [-quiet]            reduce output to only word string
    [-demo]             equal to "-quiet -progout"
    [-debug]            (for debug) dump numerous log
    [-callbackdebug]    (for debug) output message per callback
    [-check (wchmm|trellis)] (for debug) check internal structure
    [-check triphone]   triphone mapping check
    [-outprobout file]  Output state probabilities to file
    [-setting]          print engine configuration and exit
    [-help]             print this message and exit
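Since -C loads options from a jconf file, a small Python sketch can show how options from this listing might be collected into one. It assumes that a jconf file takes the same option strings as the command line, one per line, with `#` starting a comment; `build_jconf` and the file name `sample.jconf` are hypothetical.

```python
def build_jconf(options):
    """Render (flag, value) pairs as jconf text, one option per line."""
    lines = ["# generated jconf (sketch)"]
    for flag, value in options:
        # Flags without an argument are written alone on their line.
        lines.append(flag if value is None else f"{flag} {value}")
    return "\n".join(lines) + "\n"


options = [
    ("-input", "mic"),
    ("-lv", 2000),
    ("-zc", 60),
    ("-headmargin", 300),
    ("-tailmargin", 400),
    ("-cutsilence", None),
]
jconf_text = build_jconf(options)
print(jconf_text)
```

The resulting text could be saved as `sample.jconf` and passed with `-C sample.jconf`.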
    [-AM]               start a new acoustic model instance
    [-LM]               start a new language model instance
    [-SR]               start a new recognizer (search) instance
    [-AM_GMM]           start an AM feature instance for GMM
    [-GLOBAL]           start a global section
    [-nosectioncheck]   disable option location check
Reference: Instance declarations for multi-model recognition - julius
Acoustic analysis:
    [-htkconf file]     load parameters from the HTK Config file
    [-smpFreq freq]     sample period (Hz) (16000)
    [-smpPeriod period] sample period (100ns) (625)
    [-fsize sample]     window size (sample) (400)
    [-fshift sample]    frame shift (sample) (160)
    [-preemph]          pre-emphasis coef. (0.97)
    [-fbank]            number of filterbank channels (24)
    [-ceplif]           cepstral liftering coef. (22)
    [-rawe] [-norawe]   toggle using raw energy (no)
    [-enormal] [-noenormal] toggle normalizing log energy (no)
    [-escale]           scaling log energy for enormal (1.0)
    [-silfloor]         energy silence floor in dB (50.0)
    [-delwin frame]     delta windows length (frame) (2)
    [-accwin frame]     accel windows length (frame) (2)
    [-hifreq freq]      freq. of upper band limit, off if <0 (-1)
    [-lofreq freq]      freq. of lower band limit, off if <0 (-1)
    [-sscalc]           do spectral subtraction (file input only)
    [-sscalclen msec]   length of head silence for SS (msec) (300)
    [-ssload filename]  load constant noise spectrum from file for SS
    [-ssalpha value]    alpha coef. for SS (2.000000)
    [-ssfloor value]    spectral floor for SS (0.500000)
    [-zmeanframe/-nozmeanframe] frame-wise DC removal like HTK (OFF)
    [-usepower/-nousepower] use power in fbank analysis (OFF)
    [-cmnload file]     load initial CMN param from file on startup
    [-cmnsave file]     save CMN param to file after each input
    [-cmnnoupdate]      not update CMN param while recog. (use with -cmnload)
    [-cmnmapweight]     weight value of initial cm for MAP-CMN (100.00)
    [-cvn]              cepstral variance normalisation (on)
    [-vtln alpha lowcut hicut] enable VTLN (1.0 to disable) (1.000000)
Rev.4.4.1 adds -cmnstatic and changes -cmnnoupdate:
    [-cmnstatic]        no MAP, use static CMN (use with -cmnload)
    [-cmnnoupdate]      not update CMN initial param while recog. (use with -cmnload)
Option | Description |
---|---|
-zmeanframe | Remove the DC component frame by frame (equivalent to ZMEANSOURCE in HTK) |
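The default acoustic analysis values are easier to read once converted to Hz and milliseconds. The sketch below does that arithmetic in Python, using the fact that -smpPeriod is given in units of 100 ns; the function name is hypothetical and the defaults are the ones shown in the help text.

```python
def analysis_summary(smp_period_100ns=625, fsize=400, fshift=160):
    """Convert -smpPeriod / -fsize / -fshift into Hz and milliseconds."""
    # One second is 10,000,000 units of 100 ns.
    sample_rate_hz = 10_000_000 / smp_period_100ns
    return {
        "sample_rate_hz": sample_rate_hz,
        "window_ms": fsize * 1000 / sample_rate_hz,
        "shift_ms": fshift * 1000 / sample_rate_hz,
        "frames_per_sec": sample_rate_hz / fshift,
    }


summary = analysis_summary()
for key, value in summary.items():
    print(f"{key}: {value}")
```

With the defaults, a period of 625 corresponds to 16000 Hz, a 400-sample window to 25 ms, and a 160-sample shift to 10 ms (100 frames per second).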
Acoustic Model:
    -h hmmdefsfile      HMM definition file name
    [-hlist HMMlistfile] HMMlist filename (must for triphone model)
    [-iwcd1 methodname] switch IWCD triphone handling on 1st pass
        best N              use N best score (default of n-gram, N=3)
        max                 use maximum score
        avg                 use average score (default of dfa)
    [-force_ccd]        force to handle IWCD
    [-no_ccd]           don't handle IWCD
    [-notypecheck]      don't check input parameter type
    [-spmodel HMMname]  name of short pause model ("sp")
    [-multipath]        switch decoding for multi-path HMM (auto)
Rev.4.4.1 adds -dnnconf:
    [-dnnconf file]     DNN configuration file
The contents of the specified file are expanded at the position where -dnnconf appears. For a sample, see dictation-kit-v*.*\doc\Sample.dnnconf
Option | Description | Value in the sample |
---|---|---|
feature_type | feature type, in HTK parameter specification format | FBANK_D_A_Z |
feature_options | julius options to configure the acoustic parameter extraction. | -htkconf model/dnn/config.lmfb.40ch.jnas -cvn -cmnload model/dnn/norm.jnas -cmnstatic |
feature_len | feature vector length (including delta or accel, before splicing) | 120 |
context_len | splicing length | 11 |
Option | Description | Value in the sample |
---|---|---|
-htkconf | HTK config file from which parameters are loaded | model/dnn/config.lmfb.40ch.jnas |
-cvn | Use CMN/CVN | |
-cmnload | File from which the cepstral mean and cepstral variance are loaded | model/dnn/norm.jnas |
-cmnstatic | Keep the cepstral mean and cepstral variance fixed; do not update them during processing | |
Option | Description | Value in the sample |
---|---|---|
input_nodes | number of input nodes (should be equal to feature_len * context_len) | 1320 |
output_nodes | number of output nodes (num and order should correspond to HMM definition) | 2004 |
hidden_nodes | number of nodes in hidden layers | 2048 |
hidden_layers | number of hidden layers (layers excluding input and output) | 5 |
state_prior | state prior in 'state_id(%d) prior(%e)' format | model/dnn/prior.dnn |
state_prior_factor | state prior factor | 1.0 |
batch_size | batch size (not used) | 64 |
W1 ~ W5 | weights W for hidden layers, in numpy np.save() format. dtype of these files should be '<f4' (32-bit float, little endian) | model/dnn/W_l1.npy … W_l5.npy |
B1 ~ B5 | biases b for hidden layers, in the same format | model/dnn/bias_l1.npy … bias_l5.npy |
output_W | weights for output layer, in the same format | model/dnn/W_output.npy |
output_B | biases for output layer, in the same format | model/dnn/bias_output.npy |
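The table states that input_nodes should equal feature_len * context_len. The following Python sketch checks that relation for the sample values and lists the layer sizes they imply. The function name is hypothetical, and the (fan-in, fan-out) pairs only describe layer sizes: the actual array orientation inside the .npy files is not given here.

```python
def dnn_layer_dims(feature_len, context_len, input_nodes,
                   hidden_nodes, hidden_layers, output_nodes):
    """Return (fan_in, fan_out) per weight matrix: W1..Wn, then output_W."""
    expected_inputs = feature_len * context_len
    if input_nodes != expected_inputs:
        raise ValueError(
            f"input_nodes={input_nodes}, expected "
            f"feature_len * context_len = {expected_inputs}")
    sizes = [input_nodes] + [hidden_nodes] * hidden_layers + [output_nodes]
    # Pair each layer size with the next one.
    return list(zip(sizes[:-1], sizes[1:]))


# Values taken from the sample column of the tables above.
dims = dnn_layer_dims(feature_len=120, context_len=11, input_nodes=1320,
                      hidden_nodes=2048, hidden_layers=5, output_nodes=2004)
names = ["W1", "W2", "W3", "W4", "W5", "output_W"]
for name, (fan_in, fan_out) in zip(names, dims):
    print(name, fan_in, fan_out)
```

For the sample, 120 * 11 gives 1320 inputs, which matches input_nodes, and five hidden layers plus the output layer give six weight matrices.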
Acoustic Model Computation Method:
    [-gprune methodname] select Gaussian pruning method:
        safe                safe pruning
        heuristic           heuristic pruning
        beam                beam pruning (default for TM/PTM)
        none                no pruning (default for non tmix models)
    [-tmix gaussnum]    Gaussian num threshold per mixture for pruning (2)
    [-gshmm hmmdefs]    monophone hmmdefs for GS
    [-gsnum N]          N-best state will be selected (24)
N-gram:
    -d file.bingram     n-gram file in Julius binary format
    -nlr file.arpa      forward n-gram file in ARPA format
    -nrl file.arpa      backward n-gram file in ARPA format
    [-lmp float float]  weight and penalty (tri: 8.0 -2.0 mono: 5.0 -1)
    [-lmp2 float float] for 2nd pass (tri: 8.0 -2.0 mono: 6.0 0)
    [-transp float]     penalty for transparent word (+0.0)
References: N-gram - julius; Language weight and insertion penalty - Chapter 8: Recognition Algorithm and Parameters
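Option | Description | Value |
---|---|---|
-lmp | Language weight and insertion penalty (1st pass, 2-gram). For example, with -lmp 8.0 7.0 the log-likelihood log p(w) of the language probability is applied as (log p(w)) * 8.0 + 7.0 | |
-lmp2 | Language weight and insertion penalty (2nd pass, 3-gram) | |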
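The weighting described for -lmp is a single linear transform of the language log-likelihood. A minimal Python sketch of that formula follows; the function name and the example value of log p(w) are made up for illustration.

```python
def weighted_lm_score(log_prob, weight, penalty):
    """Apply the language weight and insertion penalty: log p(w) * weight + penalty."""
    return log_prob * weight + penalty


log_p = -2.0  # hypothetical log p(w) for one word
print(weighted_lm_score(log_p, 8.0, 7.0))    # the -lmp 8.0 7.0 example
print(weighted_lm_score(log_p, 8.0, -2.0))   # the default listed for tri
```

Because the penalty is added once per word, a more negative penalty discourages hypotheses with many words, while a positive one favors them.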
DFA Grammar:
    -dfa file.dfa       DFA grammar file
    -gram file[,file2...] (list of) grammar prefix(es)
    -gramlist filename  filename of grammar list
    [-penalty1 float]   word insertion penalty (1st pass) (0.0)
    [-penalty2 float]   word insertion penalty (2nd pass) (0.0)
Word Dictionary for N-gram and DFA:
    -v dictfile         dictionary file name
    [-silhead wordname] (n-gram) beginning-of-sentence word (<s>)
    [-siltail wordname] (n-gram) end-of-sentence word (</s>)
    [-mapunk wordname]  (n-gram) map unknown words to this (<unk>)
    [-forcedict]        ignore error entry and keep running
    [-iwspword]         (n-gram) add short-pause word for inter-word CD sp
    [-iwspentry entry]  (n-gram) word entry for "-iwspword" (<UNK> [sp] sp sp)
    [-adddict dictfile] (n-gram) load extra dictionary
    [-addentry entry]   (n-gram) load extra word entry
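Option | Description | Value |
---|---|---|
-v | File name of the word dictionary | |
-silhead | Name of the silence word at the beginning of an utterance. Specify it either by the word's reading (its N-gram entry name) or as "#" followed by the word number (line number in the dictionary file minus 1). It is an error if the word is not in the dictionary. Reference: N-gram - julius | |
-siltail | Name of the silence word at the end of an utterance. Specified in the same way as -silhead | |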
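The "#" plus word number form for -silhead and -siltail can be derived from the dictionary line number. The Python sketch below assumes that each dictionary line holds one word, that its first whitespace-separated field is the entry name, and that there are no blank or comment lines; the helper name and the three dictionary lines are made up for illustration.

```python
def word_number_spec(dict_lines, entry_name):
    """Return '#N', where N is the 1-based line number minus 1."""
    for index, line in enumerate(dict_lines):
        fields = line.split()
        if fields and fields[0] == entry_name:
            return f"#{index}"  # enumerate() is already 0-based
    raise KeyError(entry_name)


# Hypothetical dictionary content, for illustration only.
dict_lines = [
    "<s> [] silB",
    "</s> [] silE",
    "hello [hello] h e l o",
]
print(word_number_spec(dict_lines, "<s>"))
print(word_number_spec(dict_lines, "</s>"))
```

Raising `KeyError` for a missing entry mirrors the note that a name absent from the dictionary is an error.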
Isolated Word Recognition:
    -w file[,file2...]  (list of) wordlist file name(s)
    -wlist filename     file that contains list of wordlists
    -wsil head tail sp  name of silence/pause model
        head - BOS silence model name (silB)
        tail - EOS silence model name (silE)
        sp   - their name as context or "NULL" (NULL)
Search Parameters for the First Pass:
    [-b beamwidth]      beam width (by state num) (guessed)
                        (0: full search, -1: force guess)
    [-bs score_width]   beam width (by score offset) (disabled)
                        (-1: disable)
    [-sepnum wordnum]   (n-gram) # of hi-freq word isolated from tree (150)
    [-1pass]            do 1st pass only, omit 2nd pass
    [-inactive]         recognition process not active on startup
Reference: Beam width - Chapter 8: Recognition Algorithm and Parameters
Option | Description | Value |
---|---|---|
-b | Beam width of the 1st pass (number of nodes) | |
Search Parameters for the Second Pass:
    [-b2 hyponum]       word envelope beam width (by hypo num) (30)
    [-n N]              # of sentence to find (1)
    [-output N]         # of sentence to output (1)
    [-sb score]         score beam threshold (by score) (80.0)
    [-s hyponum]        global stack size of hypotheses (500)
    [-m hyponum]        hypotheses overflow threshold num (2000)
    [-lookuprange N]    frame lookup range in word expansion (5)
    [-looktrellis]      (dfa) expand only backtrellis words
    [-[no]multigramout] (dfa) output per-grammar results
    [-oldtree]          (dfa) use old build_wchmm()
    [-oldiwcd]          (dfa) use full lcdset
    [-iwsp]             insert sp for all word end (multipath)(off)
    [-iwsppenalty]      trans. penalty for iwsp (multipath) (-1.0)
References: Second pass - Chapter 8: Recognition Algorithm and Parameters; N-best list - Chapter 8: Recognition Algorithm and Parameters
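Option | Description | Value |
---|---|---|
-sb | Score width used when computing hypothesis likelihoods in the 2nd pass | 80.0 |
-b2 | Width of the hypothesis-count beam in the 2nd pass (number of hypotheses) | 30 |
-s | Maximum stack size in the 2nd pass (number of hypotheses) | 500 |
-m | Hypothesis overflow threshold in the 2nd pass | 2000 |
-lookuprange | Width by which the trellis constraint is relaxed during word expansion in the 2nd pass (number of frames) | 5 |
-n | Number of sentences to find in the 2nd pass (number of sentences) | |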
Short-pause Segmentation:
    [-spsegment]        enable short-pause segmentation
    [-spdur]            length threshold of sp frames (10)
    [-pausemodels str]  comma-delimited list of pause models for segment
Reference: Short-pause segmentation - Chapter 8: Recognition Algorithm and Parameters
Graph Output with graph-oriented search:
    [-lattice]          enable word graph (lattice) output
    [-confnet]          enable confusion network output
    [-nolattice]][-noconfnet] disable lattice / confnet output
    [-graphrange N]     merge same words in graph (0)
        -1: not merge, leave same loc. with diff. score
         0: merge same words at same location
        >0: merge same words around the margin
    [-graphcut num]     graph cut depth at postprocess (-1: disable)(80)
    [-graphboundloop num] max. num of boundary adjustment loop (20)
    [-graphsearchdelay] inhibit search termination until 1st sent. found
    [-nographsearchdelay] disable it (default)
Forced Alignment:
    [-walign]           optionally output word alignments
    [-palign]           optionally output phoneme alignments
    [-salign]           optionally output state alignments
Added in Rev.4.2.3.
Minimum Bayes Risk Decoding:
    [-mbr]              enable rescoring sentence on MBR(WER)
    [-mbr_wwer]         enable rescoring sentence on MBR(WWER)
    [-nombr]            disable rescoring sentence on MBR
    [-mbr_weight float float] score and loss func. weight on MBR (0.1 1.0)
Minimum Bayes-Risk (Bayes risk minimization / MBR)
If the option is not recognized at run time and Julius reports 'ERROR: m_options: wrong argument: "-mbr"', define USE_MBR in the libsent config and recompile.
Confidence Score:
    [-cmalpha value]    CM smoothing factor (0.050000)
Word confidence. Reference: On the word confidence of recognition results
Message Output:
    [-fallback1pass]    use 1st pass result when search failed
    [-progout]          progressive output in 1st pass
    [-proginterval]     interval of progout in msec (300)
Option | Description |
---|---|
-progout | During the 1st pass, output the most likely hypothesis sequence at that point at fixed intervals |
-proginterval | Output interval of -progout [msec] |
The following are options of the Julius application itself and cannot be used through JuliusLib.
Additional options for application:
    [--help]            display this help
    [-help]             display this help
    [-outfile]          save result in separate .out file
    [-nolog]            not output any log
    [-logfile arg]      output log to file
    [-separatescore]    output AM and LM scores separately
    [-kanji arg]        convert character set for output
    [-nocharconv]       disable charconv
    [-charconv arg arg] convert character set for output
    [-outcode arg]      select info to output to the module: WLPSCwlps
    [-module (arg)]     run as a server module
    [-record arg]       record input waveform to file in dir