
Julius rev.4.3.1 - based on JuliusLib rev.4.3.1 (fast)

Global Options

Feature Vector Input

Feature Vector Input:
   [-input devname]      input source  (default = htkparam)
        htkparam/mfcfile    feature vectors in HTK parameter file format
        outprob             outprob vectors in HTK parameter file format
        vecnet              receive vectors from client (TCP/IP)
   [-filelist file]      filename of input file list

Speech Input

Speech Input:
   (Can extract MFCC/FBANK/MELSPEC features from waveform)
   [-input devname]    input source  (default = htkparam)
        file/rawfile      waveform file (RAW(BE),WAV)
        mic               default microphone device
        adinnet           adinnet client (TCP/IP)
        stdin             standard input
   [-filelist file]    filename of input file list
   [-adport portnum]   adinnet port number to listen         (5530)
   [-48]               enable 48kHz sampling with internal down sampler (OFF)
   [-zmean/-nozmean]   enable/disable DC offset removal      (OFF)
   [-lvscale]          input level scaling factor (1.0: OFF) (1.0)
   [-nostrip]          disable stripping off zero samples
   [-record dir]       record triggered speech data to dir
   [-rejectshort msec] reject an input shorter than specified
   [-rejectlong msec]  reject an input longer than specified

Speech Detection

Speech Detection: (default: on=mic/net off=files)
   [-cutsilence]       turn on (force) skipping long silence
   [-nocutsilence]     turn off (force) skipping long silence
   [-lv unsignedshort] input level threshold (0-32767)  (2000)
   [-zc zerocrossnum]  zerocross num threshold per sec. (60)
   [-headmargin msec]  header margin length in msec.    (300)
   [-tailmargin msec]  tail margin length in msec.      (400)
   [-chunksize sample] unit length for processing       (1000)
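The Python sketch below is a rough illustration of how -lv, -zc and -chunksize interact. It flags a chunk as speech when two conditions combine: the chunk's count of polarity changes among samples louder than the level threshold, converted to a per-second rate, exceeds -zc. This is a simplified model written for illustration, not Julius's actual detector. The function names are hypothetical, and the -headmargin/-tailmargin handling is left out.

```python
def count_level_crossings(samples, level):
    """Count polarity changes, ignoring samples whose magnitude is <= level."""
    count = 0
    last = 0  # polarity of the last significant sample: +1, -1, or 0 (none yet)
    for s in samples:
        if s > level:
            cur = 1
        elif s < -level:
            cur = -1
        else:
            continue  # too quiet to count
        if last != 0 and cur != last:
            count += 1
        last = cur
    return count


def detect_speech_chunks(samples, rate=16000, lv=2000, zc=60, chunksize=1000):
    """Return one boolean per chunk: True if the crossing rate exceeds zc per second."""
    flags = []
    for start in range(0, len(samples), chunksize):
        chunk = samples[start:start + chunksize]
        per_sec = count_level_crossings(chunk, lv) * rate / len(chunk)
        flags.append(per_sec > zc)
    return flags


# One chunk of silence followed by one chunk of a loud square wave.
silence = [0] * 1000
tone = [5000 if (i // 40) % 2 == 0 else -5000 for i in range(1000)]
print(detect_speech_chunks(silence + tone))  # [False, True]
```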


   [-fvad]             FVAD switch (-1=off, 0-3=on / degree) (-1)
   [-fvad_param i f]   FVAD parameter (dur/thres)            (5 0.50)

GMM utterance verification

GMM utterance verification:
   -gmm filename       GMM definition file
   -gmmnum num         GMM Gaussian pruning num              (10)
   -gmmreject string   comma-separated list of noise model names to reject
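-gmmreject lists the GMMs that count as noise. Below is a minimal sketch of the accept/reject decision. It assumes the per-model log-likelihoods over the utterance have already been computed. The Gaussian pruning controlled by -gmmnum is not modeled, and `verify_utterance` is a hypothetical name.

```python
def verify_utterance(gmm_scores, gmmreject):
    """Pick the best-scoring GMM and accept the input unless that model is listed in gmmreject.

    gmm_scores: model name -> total log-likelihood over the utterance.
    gmmreject: comma-separated model names, as passed to -gmmreject.
    """
    reject = {name.strip() for name in gmmreject.split(",") if name.strip()}
    best = max(gmm_scores, key=gmm_scores.get)
    return best, best not in reject


print(verify_utterance({"speech": -1200.0, "noise": -1100.0}, "noise,cough"))
# ('noise', False): the input would be rejected
```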

On-the-fly Decoding

On-the-fly Decoding: (default: on=mic/net off=files)
   [-realtime]         turn on, input streamed with MAP-CMN
   [-norealtime]       turn off, input buffered with sentence CMN
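The two modes above differ in how the cepstral mean is estimated. The sketch below contrasts them. It assumes the textbook MAP form for the running mean, (sum of frames so far + w * prior mean) / (frames so far + w), where w plays the role of -cmnmapweight. In this sketch the prior mean is simply passed in; in Julius it would come from a previous input or -cmnload. Both function names are hypothetical.

```python
def sentence_cmn(frames):
    """-norealtime style: subtract the mean computed over the whole buffered input."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[k] for f in frames) / n for k in range(dim)]
    return [[f[k] - mean[k] for k in range(dim)] for f in frames]


def map_cmn(frames, prior_mean, weight=100.0):
    """-realtime style: at each frame, estimate the mean from the frames seen so far,
    blended with a prior mean that counts as `weight` pseudo-frames."""
    dim = len(prior_mean)
    acc = [0.0] * dim
    out = []
    for t, f in enumerate(frames, start=1):
        for k in range(dim):
            acc[k] += f[k]
        mean = [(acc[k] + weight * prior_mean[k]) / (t + weight) for k in range(dim)]
        out.append([f[k] - mean[k] for k in range(dim)])
    return out


frames = [[2.0], [4.0]]
print(sentence_cmn(frames))                 # [[-1.0], [1.0]]
print(map_cmn(frames, [0.0], weight=1.0))   # [[1.0], [2.0]]
```

With a large weight, the running estimate stays close to the prior mean for the first frames. This is why MAP-CMN can be used while input is still streaming.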


   [-C jconffile]      load options from jconf file
   [-quiet]            reduce output to only word string
   [-demo]             equal to "-quiet -progout"
   [-debug]                 (for debug) dump numerous log
   [-callbackdebug]         (for debug) output message per callback
   [-check (wchmm|trellis)] (for debug) check internal structure
   [-check triphone]   triphone mapping check
   [-outprobout file]  Output state probabilities to file
   [-setting]          print engine configuration and exit
   [-help]             print this message and exit

Instance Declarations

   [-AM]               start a new acoustic model instance
   [-LM]               start a new language model instance
   [-SR]               start a new recognizer (search) instance
   [-AM_GMM]           start an AM feature instance for GMM
   [-GLOBAL]           start a global section
   [-nosectioncheck]   disable option location check
Reference: Instance declarations for multi-model recognition - julius

Acoustic Model Options (-AM)

Acoustic analysis

Acoustic analysis:
   [-htkconf file]         load parameters from the HTK Config file
   [-smpFreq freq]         sampling frequency (Hz)               (16000)
   [-smpPeriod period]     sample period (100ns)                 (625)
   [-fsize sample]         window size (sample)                  (400)
   [-fshift sample]        frame shift (sample)                  (160)
   [-preemph]              pre-emphasis coef.                    (0.97)
   [-fbank]                number of filterbank channels         (24)
   [-ceplif]               cepstral liftering coef.              (22)
   [-rawe] [-norawe]       toggle using raw energy               (no)
   [-enormal] [-noenormal] toggle normalizing log energy         (no)
   [-escale]               scaling log energy for enormal        (1.0)
   [-silfloor]             energy silence floor in dB            (50.0)
   [-delwin frame]         delta windows length (frame)          (2)
   [-accwin frame]         accel windows length (frame)          (2)
   [-hifreq freq]          freq. of upper band limit, off if <0  (-1)
   [-lofreq freq]          freq. of lower band limit, off if <0  (-1)
   [-sscalc]               do spectral subtraction (file input only)
   [-sscalclen msec]       length of head silence for SS (msec)  (300)
   [-ssload filename]      load constant noise spectrum from file for SS
   [-ssalpha value]        alpha coef. for SS                    (2.000000)
   [-ssfloor value]        spectral floor for SS                 (0.500000)
   [-zmeanframe/-nozmeanframe] frame-wise DC removal like HTK    (OFF)
   [-usepower/-nousepower] use power in fbank analysis           (OFF)
   [-cmnload file]         load initial CMN param from file on startup
   [-cmnsave file]         save CMN param to file after each input
   [-cmnnoupdate]          not update CMN param while recog. (use with -cmnload)
   [-cmnmapweight]         weight value of initial cm for MAP-CMN (100.00)
   [-cvn]                  cepstral variance normalisation       (on)
   [-vtln alpha lowcut hicut] enable VTLN (1.0 to disable)   (1.000000)


   [-cmnstatic]        no MAP, use static CMN (use with -cmnload)
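-delwin and -accwin set the regression window for delta and acceleration coefficients. The sketch below assumes the standard HTK regression formula: d_t = sum over th=1..W of th * (c[t+th] - c[t-th]), divided by 2 * sum of th^2. At the edges it replicates the first and last frames. Acceleration coefficients are obtained by applying the same function to the deltas. The function name is hypothetical.

```python
def regression_deltas(frames, win=2):
    """HTK-style regression deltas; frames is a list of equal-length feature vectors."""
    n = len(frames)
    denom = 2 * sum(th * th for th in range(1, win + 1))
    out = []
    for t in range(n):
        d = [0.0] * len(frames[t])
        for th in range(1, win + 1):
            nxt = frames[min(t + th, n - 1)]  # replicate last frame at the end
            prv = frames[max(t - th, 0)]      # replicate first frame at the start
            for k in range(len(d)):
                d[k] += th * (nxt[k] - prv[k])
        out.append([x / denom for x in d])
    return out


ramp = [[float(t)] for t in range(10)]
deltas = regression_deltas(ramp, win=2)    # -delwin 2
accels = regression_deltas(deltas, win=2)  # -accwin 2, applied to the deltas
print(deltas[5], deltas[0])  # [1.0] [0.5]
```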
Option       Description
-zmeanframe  Frame-wise DC offset removal (equivalent to HTK's ZMEANSOURCE)
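-zmean removes the DC offset of the input as a whole, whereas -zmeanframe removes it separately in every analysis frame. The sketch below shows the per-frame variant. It assumes framing with -fsize/-fshift and shows only the mean subtraction; pre-emphasis and windowing are omitted. The function name is hypothetical.

```python
def frames_zmean(samples, fsize=400, fshift=160):
    """Cut samples into frames and remove each frame's own mean (DC offset)."""
    frames = []
    for start in range(0, len(samples) - fsize + 1, fshift):
        frame = samples[start:start + fsize]
        mean = sum(frame) / fsize
        frames.append([s - mean for s in frame])
    return frames


print(frames_zmean([1.0, 3.0, 5.0, 7.0], fsize=2, fshift=2))
# [[-1.0, 1.0], [-1.0, 1.0]]
```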

Acoustic Model

Acoustic Model:
   -h hmmdefsfile       HMM definition file name
   [-hlist HMMlistfile] HMMlist filename (must for triphone model)
   [-iwcd1 methodname]  switch IWCD triphone handling on 1st pass
            best N      use N best score (default of n-gram, N=3)
            max         use maximum score
            avg         use average score (default of dfa)
   [-force_ccd]         force to handle IWCD
   [-no_ccd]            don't handle IWCD
   [-notypecheck]       don't check input parameter type
   [-spmodel HMMname]   name of short pause model             ("sp")
   [-multipath]         switch decoding for multi-path HMM    (auto)
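On the 1st pass, the following word is not yet known. A word-end phone therefore has several candidate triphones, and -iwcd1 decides how their scores are collapsed into one. The sketch below assumes that "best N" means the average of the N highest scores; check the Julius manual before relying on that reading. The function name is hypothetical.

```python
def iwcd_score(context_scores, method="avg", n=3):
    """Collapse the scores of all candidate contexts into one value."""
    if method == "max":
        return max(context_scores)
    if method == "avg":
        return sum(context_scores) / len(context_scores)
    if method == "best":
        top = sorted(context_scores, reverse=True)[:n]
        return sum(top) / len(top)
    raise ValueError(f"unknown method: {method}")


scores = [-10.0, -20.0, -30.0, -40.0]
print(iwcd_score(scores, "max"), iwcd_score(scores, "avg"), iwcd_score(scores, "best", 3))
# -10.0 -25.0 -20.0
```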



   [-dnnconf file]     DNN configuration file


Feature Extraction
Option           Description                                                        Sample value
feature_type     feature type, in HTK parameter specification format                FBANK_D_A_Z
feature_options  Julius options that configure the acoustic parameter extraction    -htkconf model/dnn/config.lmfb.40ch.jnas -cvn -cmnload model/dnn/norm.jnas -cmnstatic
feature_len      feature vector length (including delta or accel, before splicing)  120
context_len      splicing length                                                    11
Option      Description                                                                      Sample value
-htkconf    HTK config file to read parameters from                                          model/dnn/config.lmfb.40ch.jnas
-cvn        Apply CMN/CVN
-cmnload    File from which to load the cepstral mean and cepstral variance                  model/dnn/norm.jnas
-cmnstatic  Keep the cepstral mean and variance fixed; do not update them during processing
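With -cvn -cmnload ... -cmnstatic, the loaded mean and variance are applied unchanged to every frame. The sketch below shows that normalization. It assumes the values have already been parsed; the format of the norm file is not covered here. The variance floor is an added safeguard, not something the table specifies, and the function name is hypothetical.

```python
import math


def static_cmvn(frames, mean, var, var_floor=1e-10):
    """Normalize every frame with a fixed mean and variance (no per-input update)."""
    std = [math.sqrt(max(v, var_floor)) for v in var]
    return [[(f[k] - mean[k]) / std[k] for k in range(len(mean))] for f in frames]


print(static_cmvn([[3.0, 10.0]], mean=[1.0, 4.0], var=[4.0, 9.0]))  # [[1.0, 2.0]]
```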
NN Definition
Option              Description                                                                      Sample value
input_nodes         number of input nodes (must equal feature_len * context_len)                     1320
output_nodes        number of output nodes (number and order must correspond to the HMM definition)  2004
hidden_nodes        number of nodes in hidden layers                                                 2048
hidden_layers       number of hidden layers (layers excluding input and output)                      5
state_prior         state priors in 'state_id(%d) prior(%e)' format                                  model/dnn/prior.dnn
state_prior_factor  state prior factor                                                               1.0
batch_size          batch size (not used)                                                            64
W1 ~ W5             weights W and biases b of the hidden layers, in numpy np.save() format;          model/dnn/W_l1.npy … W_l5.npy
B1 ~ B5             the dtype of these files must be '<f4' (32-bit float, little-endian)             model/dnn/bias_l1.npy … bias_l5.npy
output_W            weights and biases of the output layer, in the same format                       model/dnn/W_output.npy
output_B                                                                                             model/dnn/bias_output.npy
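The table requires input_nodes = feature_len * context_len (120 * 11 = 1320). The sketch below splices frames into one DNN input vector. It also converts output posteriors into scaled log-likelihoods using the state priors. Three points are assumptions, not taken from the Julius sources: the splicing window is symmetric, edge frames are replicated, and state_prior_factor is used in the common hybrid DNN-HMM way (log posterior minus factor * log prior). Names are hypothetical.

```python
import math

FEATURE_LEN = 120   # feature_len
CONTEXT_LEN = 11    # context_len
INPUT_NODES = 1320  # input_nodes

assert FEATURE_LEN * CONTEXT_LEN == INPUT_NODES  # consistency rule from the table


def splice(frames, t, context_len=CONTEXT_LEN):
    """Concatenate frames t-5 .. t+5 (for context_len 11), clamping at the edges."""
    half = context_len // 2
    last = len(frames) - 1
    out = []
    for off in range(-half, half + 1):
        out.extend(frames[min(max(t + off, 0), last)])
    return out


def scaled_log_likelihoods(posteriors, priors, prior_factor=1.0):
    """Hybrid DNN-HMM convention: log p(x|s) ~ log p(s|x) - factor * log p(s)."""
    return [math.log(p) - prior_factor * math.log(q) for p, q in zip(posteriors, priors)]


frames = [[float(t)] * FEATURE_LEN for t in range(20)]
x = splice(frames, 10)
print(len(x), x[0], x[-1])  # 1320 5.0 15.0
```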

Acoustic Model Computation Method

Acoustic Model Computation Method:
   [-gprune methodname] select Gaussian pruning method:
            safe          safe pruning
            heuristic     heuristic pruning
            beam          beam pruning (default for TM/PTM)
            none          no pruning (default for non tmix models)
   [-tmix gaussnum]    Gaussian num threshold per mixture for pruning (2)
   [-gshmm hmmdefs]    monophone hmmdefs for GS
   [-gsnum N]          N-best state will be selected        (24)
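-gprune safe keeps the exact top -tmix Gaussians but skips work on the rest. The sketch below shows that idea for diagonal-covariance Gaussians. Each dimension only subtracts a non-negative term. A partial score that has already fallen below the current N-th best can therefore be abandoned without changing the result. It is illustrative only: it ignores codebook sharing and the heuristic/beam variants. Names are hypothetical.

```python
import math


def partial_loglik(x, mean, var, threshold):
    """Diagonal-Gaussian log-likelihood, abandoned (None) once it drops below threshold."""
    score = -0.5 * sum(math.log(2.0 * math.pi * v) for v in var)
    for xi, mi, vi in zip(x, mean, var):
        score -= 0.5 * (xi - mi) ** 2 / vi  # each step can only lower the score
        if score < threshold:
            return None
    return score


def safe_prune_topn(x, gaussians, tmix=2):
    """Return the exact top-`tmix` log-likelihoods among (mean, var) Gaussians."""
    best = []  # kept scores, sorted high to low, at most tmix entries
    for mean, var in gaussians:
        threshold = best[-1] if len(best) == tmix else -math.inf
        s = partial_loglik(x, mean, var, threshold)
        if s is not None:
            best.append(s)
            best.sort(reverse=True)
            del best[tmix:]
    return best


gaussians = [([0.0, 0.0], [1.0, 1.0]), ([1.0, 1.0], [1.0, 1.0]),
             ([5.0, 5.0], [1.0, 1.0]), ([0.5, 0.0], [1.0, 1.0])]
top = safe_prune_topn([0.0, 0.0], gaussians, tmix=2)
```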

Language Model Options (-LM)

N-gram

N-gram:
   -d file.bingram     n-gram file in Julius binary format
   -nlr file.arpa      forward n-gram file in ARPA format
   -nrl file.arpa      backward n-gram file in ARPA format
   [-lmp float float]  weight and penalty (tri: 8.0 -2.0 mono: 5.0 -1)
   [-lmp2 float float]       for 2nd pass (tri: 8.0 -2.0 mono: 6.0 0)
   [-transp float]     penalty for transparent word (+0.0)
References: N-gram - julius; Language weight and insertion penalty - Chapter 8: Recognition Algorithm and Parameters
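Option  Description
-lmp    Language weight and insertion penalty (1st pass, 2-gram)
For example, with -lmp 8.0 7.0, the language model log likelihood log p(w) is applied as (log p(w)) * 8.0 + 7.0.
  • Language weight … a coefficient applied to the language model likelihood to counter the acoustic model's influence on the hypotheses
  • Word insertion penalty … restrains transitions between words so that short word hypotheses do not follow one another in a row. A negative value suppresses word insertion; a positive value encourages it.
Settings by model type (reference: Language weight and insertion penalty - Chapter 8: Recognition Algorithm and Parameters)
  • For monophone models
    • -lmp 5.0 -1.0
    • -lmp2 6.0 0.0
  • For triphone models
    • -lmp 8.0 -2.0
    • -lmp2 8.0 -2.0
  • For triphone models (v2.1 settings: inter-word triphones not handled in the 1st pass)
    • -lmp 9.0 8.0
    • -lmp2 11.0 -2.0
-lmp2   Language weight and insertion penalty (2nd pass, 3-gram)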
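The weighting rule in the table is applied once per word and summed over a hypothesis. Because of this, a negative penalty makes every extra word cost more. The sketch below assumes the word log probabilities have already been looked up (ARPA-format n-grams store log10 probabilities). The function names are hypothetical.

```python
def lm_term(log_prob, lm_weight, penalty):
    """Weighted LM score of one word: log p(w) * weight + penalty."""
    return log_prob * lm_weight + penalty


def hypothesis_lm_score(word_log_probs, lm_weight=8.0, penalty=-2.0):
    """Sum of weighted LM terms; the penalty is paid once per word."""
    return sum(lm_term(lp, lm_weight, penalty) for lp in word_log_probs)


print(lm_term(-2.0, 8.0, 7.0))                  # -9.0 (the -lmp 8.0 7.0 example)
print(hypothesis_lm_score([-1.0, -2.0, -0.5]))  # -34.0 with -lmp 8.0 -2.0
```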

DFA Grammar

DFA Grammar:
   -dfa file.dfa       DFA grammar file
   -gram file[,file2...] (list of) grammar prefix(es)
   -gramlist filename  filename of grammar list
   [-penalty1 float]   word insertion penalty (1st pass)     (0.0)
   [-penalty2 float]   word insertion penalty (2nd pass)     (0.0)

Word Dictionary for N-gram and DFA

Word Dictionary for N-gram and DFA:
   -v dictfile         dictionary file name
   [-silhead wordname] (n-gram) beginning-of-sentence word   (<s>)
   [-siltail wordname] (n-gram) end-of-sentence word         (</s>)
   [-mapunk wordname]  (n-gram) map unknown words to this    (<unk>)
   [-forcedict]        ignore error entry and keep running
   [-iwspword]         (n-gram) add short-pause word for inter-word CD sp
   [-iwspentry entry]  (n-gram) word entry for "-iwspword" (<UNK> [sp] sp sp)
   [-adddict dictfile] (n-gram) load extra dictionary
   [-addentry entry]   (n-gram) load extra word entry
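Option    Description
-v        File name of the word dictionary
-silhead  Name of the silence word at the beginning of an utterance. Specify it by the word's N-gram entry name, or as # followed by the word number (the line number in the dictionary file minus 1). It is an error if the word is not in the dictionary. (Reference: N-gram - julius)
-siltail  Name of the silence word at the end of an utterance. Specified the same way as -silhead.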
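The two ways of naming the silence words can be illustrated with a small resolver. It assumes the dictionary has one entry per line with no blank or comment lines, so word number N is simply index N. It also assumes any value starting with "#" is a word number. `resolve_word` is a hypothetical helper, not a Julius function.

```python
def resolve_word(spec, entry_names):
    """Resolve a -silhead/-siltail value to a word index.

    spec: an N-gram entry name such as "<s>", or "#N" for word number N.
    entry_names: entry names in dictionary order, so index N is dictionary line N+1.
    """
    if spec.startswith("#"):
        index = int(spec[1:])
        if not 0 <= index < len(entry_names):
            raise ValueError(f"word number out of range: {spec}")
        return index
    if spec not in entry_names:
        raise ValueError(f"{spec!r} is not in the dictionary")
    return entry_names.index(spec)


names = ["<s>", "</s>", "hello"]
print(resolve_word("<s>", names), resolve_word("#2", names))  # 0 2
```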

Isolated Word Recognition

Isolated Word Recognition:
   -w file[,file2...]  (list of) wordlist file name(s)
   -wlist filename     file that contains list of wordlists
   -wsil head tail sp  name of silence/pause model
                         head - BOS silence model name       (silB)
                         tail - EOS silence model name       (silE)
                          sp  - their name as context or "NULL" (NULL)

Recognizer / Search Options (-SR)

Search Parameters for the First Pass

Search Parameters for the First Pass:
   [-b beamwidth]      beam width (by state num)             (guessed)
                       (0: full search, -1: force guess)
   [-bs score_width]   beam width (by score offset)          (disabled)
                       (-1: disable)
   [-sepnum wordnum]   (n-gram) # of hi-freq word isolated from tree (150)
   [-1pass]            do 1st pass only, omit 2nd pass
   [-inactive]         recognition process not active on startup
Reference: Beam width - Chapter 8: Recognition Algorithm and Parameters
Option  Description
-b      Beam width of the 1st pass (number of nodes)
  • monophone … 400
  • triphone,PTM … 800
  • triphone,PTM,engine=v2.1 … 1000
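-b limits the active nodes by rank, and -bs limits them by score offset from the best node. The sketch below applies both limits to one frame's set of active nodes. It is an illustration only: the node names are hypothetical, and a real decoder works on lexicon-tree nodes, possibly with partial selection rather than a full sort.

```python
def beam_prune(scores, b, bs=-1.0):
    """Keep the best-scoring active nodes.

    scores: node -> accumulated log score.
    b:  keep at most b nodes (0 = full search, no rank limit).
    bs: if >= 0, also drop nodes scoring more than bs below the best (-1 = disabled).
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if b > 0:
        ranked = ranked[:b]
    if bs >= 0 and ranked:
        best = ranked[0][1]
        ranked = [(node, s) for node, s in ranked if s >= best - bs]
    return dict(ranked)


active = {"n1": -10.0, "n2": -12.0, "n3": -30.0, "n4": -11.0}
print(sorted(beam_prune(active, b=2)))          # ['n1', 'n4']
print(sorted(beam_prune(active, b=0, bs=5.0)))  # ['n1', 'n2', 'n4']
```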

Search Parameters for the Second Pass

Search Parameters for the Second Pass:
   [-b2 hyponum]       word envelope beam width (by hypo num) (30)
   [-n N]              # of sentence to find                 (1)
   [-output N]         # of sentence to output               (1)
   [-sb score]         score beam threshold (by score)       (80.0)
   [-s hyponum]        global stack size of hypotheses       (500)
   [-m hyponum]        hypotheses overflow threshold num     (2000)
   [-lookuprange N]    frame lookup range in word expansion  (5)
   [-looktrellis]      (dfa) expand only backtrellis words
   [-[no]multigramout] (dfa) output per-grammar results
   [-oldtree]          (dfa) use old build_wchmm()
   [-oldiwcd]          (dfa) use full lcdset
   [-iwsp]             insert sp for all word end (multipath)(off)
   [-iwsppenalty]      trans. penalty for iwsp (multipath)   (-1.0)
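References: 2nd pass - Chapter 8: Recognition Algorithm and Parameters; N-best list - Chapter 8: Recognition Algorithm and Parameters
Option        Description                                                                 Default
-sb           Score width for hypothesis likelihood computation in the 2nd pass           80.0
-b2           Hypothesis-count beam width in the 2nd pass (hypotheses)                    30
-s            Maximum stack size in the 2nd pass (hypotheses)                             500
-m            Hypothesis overflow threshold in the 2nd pass                               2000
-lookuprange  Trellis constraint relaxation width on word expansion in the 2nd pass (frames)  5
-n            Number of sentences to find in the 2nd pass (sentences)
  • 1
  • default with the 'standard' setting … 10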
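The sketch below shows how -n, -s and -m bound a stack-based search. It is a plain best-first search on accumulated scores. The real 2nd pass is an A*-style search that uses the 1st-pass trellis as its heuristic and also applies -b2, -sb and -lookuprange, none of which are modeled here. The sketch assumes that -m caps how many hypotheses may be popped before the search gives up. All names and the toy problem are hypothetical.

```python
import heapq


class HypoStack:
    """Bounded hypothesis stack: keeps at most `size` entries, dropping the worst (-s)."""

    def __init__(self, size=500):
        self.size = size
        self.heap = []  # min-heap, so heap[0] is the worst kept hypothesis
        self.count = 0  # tie-breaker so hypotheses themselves are never compared

    def __len__(self):
        return len(self.heap)

    def push(self, score, hypo):
        self.count += 1
        item = (score, self.count, hypo)
        if len(self.heap) < self.size:
            heapq.heappush(self.heap, item)
        elif score > self.heap[0][0]:
            heapq.heapreplace(self.heap, item)  # evict the current worst

    def pop_best(self):
        best = max(self.heap)  # linear scan; fine for a sketch
        self.heap.remove(best)
        heapq.heapify(self.heap)
        return best[0], best[2]


def stack_search(start, expand, is_complete, n=1, s=500, m=2000):
    """Best-first search that stops after n complete sentences or m pops (-n, -s, -m)."""
    stack = HypoStack(s)
    stack.push(*start)
    found, popped = [], 0
    while len(stack) > 0 and len(found) < n and popped < m:
        score, hypo = stack.pop_best()
        popped += 1
        if is_complete(hypo):
            found.append((score, hypo))
            continue
        for new_score, new_hypo in expand(score, hypo):
            stack.push(new_score, new_hypo)
    return found


# Toy problem: build 3-letter strings; appending "a" costs 1, "b" costs 2.
def expand(score, hypo):
    return [(score - 1.0, hypo + "a"), (score - 2.0, hypo + "b")]


def done(hypo):
    return len(hypo) == 3


print(stack_search((0.0, ""), expand, done))  # [(-3.0, 'aaa')]
```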

Short-pause Segmentation

Short-pause Segmentation:
   [-spsegment]        enable short-pause segmentation
   [-spdur]            length threshold of sp frames         (10)
   [-pausemodels str]  comma-delimited list of pause models for segment
Reference: Short-pause segmentation - Chapter 8: Recognition Algorithm and Parameters
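The sketch below illustrates the segmentation rule. It cuts the input wherever the models listed in -pausemodels win for at least -spdur consecutive frames. It works on a precomputed per-frame sequence of best models, which is a simplification; Julius makes this decision inside the 1st-pass search. The function name and the label input are hypothetical.

```python
def sp_segments(labels, pause_models=("sp",), spdur=10):
    """Split a per-frame label sequence at pauses lasting at least spdur frames.

    labels: best-scoring model name for each frame.
    Returns (start, end) frame ranges, end exclusive.
    """
    segments = []
    start = None  # first frame of the current segment, or None between segments
    run = 0       # length of the current run of pause frames
    for t, label in enumerate(labels):
        if label in pause_models:
            run += 1
            if run == spdur and start is not None:
                segments.append((start, t - spdur + 1))  # cut where the pause began
                start = None
        else:
            if start is None:
                start = t
            run = 0
    if start is not None:
        segments.append((start, len(labels)))
    return segments


labels = ["a"] * 5 + ["sp"] * 12 + ["b"] * 4
print(sp_segments(labels, "sp,silB,silE".split(","), spdur=10))  # [(0, 5), (17, 21)]
```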

Graph Output with graph-oriented search

Graph Output with graph-oriented search:
   [-lattice]          enable word graph (lattice) output
   [-confnet]          enable confusion network output
   [-nolattice] [-noconfnet] disable lattice / confnet output
   [-graphrange N]     merge same words in graph (0)
                       -1: not merge, leave same loc. with diff. score
                        0: merge same words at same location
                       >0: merge same words around the margin
   [-graphcut num]     graph cut depth at postprocess (-1: disable)(80)
   [-graphboundloop num] max. num of boundary adjustment loop (20)
   [-graphsearchdelay] inhibit search termination until 1st sent. found
   [-nographsearchdelay] disable it (default)
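The -graphrange settings can be illustrated on a flat list of word arcs. The sketch below assumes that for N > 0 both the begin and the end frame must lie within N frames for two same-word arcs to merge, and it keeps only the best-scoring arc. Julius's post-processing also re-links arcs and adjusts boundaries (see -graphboundloop), which is not modeled. The function name and arc tuple layout are hypothetical.

```python
def merge_same_words(arcs, graphrange=0):
    """Merge word-graph arcs that carry the same word at (nearly) the same place.

    arcs: list of (word, begin_frame, end_frame, score).
    graphrange < 0: no merging; 0: identical boundaries only; > 0: boundaries
    may differ by up to graphrange frames. The best-scoring arc survives.
    """
    if graphrange < 0:
        return list(arcs)
    kept = []
    for word, begin, end, score in sorted(arcs, key=lambda a: a[3], reverse=True):
        duplicate = any(
            w == word and abs(begin - b) <= graphrange and abs(end - e) <= graphrange
            for w, b, e, _ in kept
        )
        if not duplicate:
            kept.append((word, begin, end, score))
    return kept


arcs = [("a", 0, 10, -5.0), ("a", 0, 10, -7.0), ("a", 1, 10, -6.0), ("b", 0, 10, -8.0)]
print(merge_same_words(arcs, 0))
# [('a', 0, 10, -5.0), ('a', 1, 10, -6.0), ('b', 0, 10, -8.0)]
```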

Forced Alignment

Forced Alignment:
   [-walign]           optionally output word alignments
   [-palign]           optionally output phoneme alignments
   [-salign]           optionally output state alignments
Reference: Alignment output - Chapter 8: Recognition Algorithm and Parameters

Minimum Bayes Risk Decoding


Minimum Bayes Risk Decoding:
   [-mbr]              enable rescoring sentence on MBR(WER)
   [-mbr_wwer]         enable rescoring sentence on MBR(WWER)
   [-nombr]            disable rescoring sentence on MBR
   [-mbr_weight float float] score and loss func. weight on MBR (0.1 1.0)

Minimum Bayes-Risk (MBR)

  • WER (Word Error Rate)
  • WWER (Weighted Word Error Rate)
Reference: Hiroaki Nanjo et al., "Speech retrieval using multiple hypotheses of minimum Bayes-risk speech recognition" (in Japanese)

If the option is not recognized at run time and Julius reports "ERROR: m_options: wrong argument: "-mbr"", define USE_MBR in the libsent configuration and recompile.
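MBR rescoring picks the hypothesis with the smallest expected loss under the N-best posterior, rather than the one with the highest score. The sketch below uses a WER-style loss (word edit distance). It assumes the first -mbr_weight value is the scale applied to log scores before they are normalized into posteriors. The second (loss) weight and the WWER word weighting are not modeled. Names are hypothetical; this is not Julius's implementation.

```python
import math


def edit_distance(a, b):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (wa != wb)))
        prev = cur
    return prev[-1]


def mbr_select(nbest, score_weight=0.1):
    """nbest: list of (word_list, log_score). Return the minimum expected-loss hypothesis."""
    scaled = [score_weight * s for _, s in nbest]
    top = max(scaled)
    z = sum(math.exp(x - top) for x in scaled)
    post = [math.exp(x - top) / z for x in scaled]

    def risk(cand):
        return sum(p * edit_distance(cand, h) for (h, _), p in zip(nbest, post))

    return min((h for h, _ in nbest), key=risk)


nbest = [(["w", "b"], 0.0), (["w", "x"], -0.1), (["w", "x", "y"], -0.1)]
print(mbr_select(nbest, score_weight=1.0))  # ['w', 'x'], although ['w', 'b'] scores best
```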

Confidence Score

Confidence Score:
   [-cmalpha value]    CM smoothing factor                    (0.050000)
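In the usual posterior-based formulation, a smoothing factor such as -cmalpha scales the log scores before exponentiation. A small value flattens the distribution and gives less extreme confidences. The sketch below shows that effect with position-independent word posteriors over an N-best list. Julius computes its confidence scores during search, so this is only an illustration; the function name and the N-best input are assumptions.

```python
import math


def word_confidence(nbest, alpha=0.05):
    """Smoothed posterior of each word over an N-best list.

    nbest: list of (word_list, log_score). Each hypothesis gets weight
    exp(alpha * score); a word's confidence is the normalized weight of
    the hypotheses that contain it (position is ignored in this sketch).
    """
    scaled = [alpha * score for _, score in nbest]
    top = max(scaled)
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    conf = {}
    for (words, _), w in zip(nbest, weights):
        for word in set(words):
            conf[word] = conf.get(word, 0.0) + w / total
    return conf


conf = word_confidence([(["a", "b"], 0.0), (["a", "c"], 0.0)], alpha=1.0)
```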


Message Output

Message Output:
   [-fallback1pass]    use 1st pass result when search failed
   [-progout]          progressive output in 1st pass
   [-proginterval]     interval of progout in msec           (300)
Option         Description
-progout       During the 1st pass, output the current best hypothesis at regular intervals
-proginterval  Output interval of -progout [msec]

Additional options for application


Additional options for application:
   [--help]            display this help
   [-help]             display this help
   [-outfile]          save result in separate .out file
   [-nolog]            not output any log
   [-logfile arg]      output log to file
   [-separatescore]    output AM and LM scores separately
   [-kanji arg]        convert character set for output
   [-nocharconv]       disable charconv
   [-charconv arg arg] convert character set for output
   [-outcode arg]      select info to output to the module: WLPSCwlps
   [-module (arg)]     run as a server module
   [-record arg]       record input waveform to file in dir