openSMILE Programming Notes

Introduction

Applications

Automatic speech recognition
Speaker identification
Emotion recognition
Beat tracking
Chord detection

Download

Download - audEERING | Intelligent Audio Engineering – openSMILE

For Windows, a prebuilt binary is provided at bin\Win32\SMILExtract_Release.exe, and a Visual Studio solution file at ide\vs10\openSmile.sln. The license makes openSMILE free of charge for research and personal use.

openEAR

openEAR download | SourceForge.net

Compiling

Prebuilt binaries are provided under the bin folder, so use them if they meet your needs. Otherwise, see 2.2 Compiling the openSMILE source code | openSMILE book

Troubleshooting

If compiling portaudio fails with errors such as

pa_asio.cpp(115): fatal error C1083: Cannot open include file: 'asiosys.h': No such file or directory

c1xx : fatal error C1083: Cannot open source file: '..\..\src\hostapi\asio\ASIOSDK\host\pc\asiolist.cpp': No such file or directory

download the ASIO SDK from Developers | Steinberg and place the extracted files in a folder named ASIOSDK at thirdparty\portaudio\src\hostapi\asio\ASIOSDK.

If compiling openSmileLib fails with an error such as

componentManager.obj : error LNK2001: unresolved external symbol "protected: virtual void __thiscall cSimpleMessageSender::fetchConfig(void)" (?fetchConfig@cSimpleMessageSender@@MAEXXZ)

add src\examples\simpleMessageSender.cpp to the openSmileLib project.

If a debug run fails with "Unable to start program '***\SMILExtract_Debug.exe'.", open the properties of the SMILExtract project and change Linker > Output File so that it matches General > Output Directory.

SMILExtract

SMILExtract [-option (value)] ...

Option	Function
-C or -configfile	Path to the openSMILE config file. Default: "smile.conf"
-I or -inputfile	Path to the input audio file. Only RIFF-WAVE (PCM) is supported, so files in other formats must be converted beforehand
-O or -output	Path to the output file
-h	Show usage

SMILExtract -h

Note: individual options such as -I and -O are defined as command-line options inside each configuration file.
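As a rough illustration of how these options fit together, the Python sketch below assembles the argument list for a single SMILExtract run. The helper name build_smilextract_command is hypothetical and not part of openSMILE; it assumes the chosen config file defines -I and -O, and its .wav extension check is only a cheap stand-in for the RIFF-WAVE (PCM) requirement, not a real format check.

```python
from typing import List, Optional


def build_smilextract_command(
    config: str,
    input_wav: str,
    output: Optional[str] = None,
    exe: str = "SMILExtract",
) -> List[str]:
    """Return the argv list for one SMILExtract run (hypothetical helper).

    -C is a built-in SMILExtract option; -I and -O only work if the chosen
    config file defines them as command-line options.
    """
    # Cheap stand-in for the RIFF-WAVE (PCM) requirement: it only checks the
    # file extension, not the actual audio format.
    if not input_wav.lower().endswith(".wav"):
        raise ValueError(f"convert {input_wav!r} to RIFF-WAVE (PCM) first")
    argv = [exe, "-C", config, "-I", input_wav]
    if output is not None:
        argv += ["-O", output]
    return argv
```

The list can be passed straight to subprocess.run (for example subprocess.run(build_smilextract_command(r"config\IS09_emotion.conf", "sample01.wav", "sample.arff"), check=True)), which avoids shell quoting problems with paths that contain spaces.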

For example, to extract features from sample01.wav using the configuration file IS09_emotion.conf and write them to sample.arff, run:

SMILExtract -C config\IS09_emotion.conf -I sample01.wav -O sample.arff

If you write to the same output file again, the new results are appended to the end of it.

SMILExtract -C config\IS09_emotion.conf -I sample02.wav -O sample.arff

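This makes it possible to process multiple files in one go from the Command Prompt, as follows (inside a .bat file, write %%i instead of %i; the quotes keep file names with spaces intact).

for %i in (*.wav) do SMILExtract -C config\IS09_emotion.conf -I "%i" -O sample.arff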
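The same batch run can be sketched in Python, which is handy outside the Windows Command Prompt. This is a sketch under assumptions: batch_commands and run_batch are hypothetical helpers, SMILExtract is assumed to be on PATH, and the results end up in one file only because the output is appended, as described above.

```python
import subprocess
from pathlib import Path
from typing import List


def batch_commands(wav_dir: str, config: str, output: str,
                   exe: str = "SMILExtract") -> List[List[str]]:
    """One SMILExtract argv per *.wav file in wav_dir, all sharing one output file."""
    # sorted() gives a stable processing order; note that glob("*.wav") is
    # case-sensitive on POSIX systems, unlike cmd's (*.wav).
    wavs = sorted(Path(wav_dir).glob("*.wav"))
    return [[exe, "-C", config, "-I", str(w), "-O", output] for w in wavs]


def run_batch(wav_dir: str, config: str, output: str) -> None:
    """Run the commands one after another; stops at the first failing file."""
    for argv in batch_commands(wav_dir, config, output):
        # A list argv needs no quoting, so file names with spaces are safe.
        subprocess.run(argv, check=True)
```

Keeping command construction (batch_commands) separate from execution (run_batch) makes it easy to inspect the planned commands before running anything.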

Configuration files

Configuration files bundled with openSMILE
Category	Configuration files
Chroma features for key and chord recognition	chroma_fft.conf chroma_filt.conf
MFCC for speech recognition	MFCC12_0_D_A.conf MFCC12_0_D_A_Z.conf MFCC12_E_D_A.conf MFCC12_E_D_A_Z.conf
PLP for speech recognition	PLP_0_D_A.conf PLP_0_D_A_Z.conf PLP_E_D_A.conf PLP_E_D_A_Z.conf
Prosody (Pitch and loudness)	prosodyAcf.conf … pitch (ACF) and intensity prosodyShs.conf … pitch and intensity prosodyShsViterbiLoudness.conf … pitch and loudness
The INTERSPEECH 2009 Emotion Challenge feature set (see "Feature, Classifier, and Open Performance Comparison for Non-Prototypical Spontaneous Emotion Recognition" and "The INTERSPEECH 2009 Emotion Challenge: Results and Lessons Learnt")	IS09_emotion.conf
The INTERSPEECH 2010 Paralinguistic Challenge feature set (Age, Gender and Affect; see INTERSPEECH 2010 Paralinguistic Challenge (Special Session))	IS10_paraling.conf emobase2010.conf (arff_targets.conf) IS10_paraling_compat.conf
The INTERSPEECH 2011 Speaker State Challenge feature set (Intoxication and Sleepiness)	IS11_speaker_state.conf
The INTERSPEECH 2012 Speaker Trait Challenge feature set (Personality, Likability, Pathology)	IS12_speaker_trait.conf IS12_speaker_trait_compat.conf
The INTERSPEECH 2013 ComParE feature set (Social Signals, Conflict, Emotion, Autism; ComParE = Computational Paralinguistics Challenge)	IS13_ComParE.conf IS13_ComParE_Voc.conf
The MediaEval 2012 TUM feature set for violent scenes detection
live emotion recognition (base set of 988 features, 1st level functionals of low-level descriptors such as MFCC, Pitch, LSP, ...)	emobase.conf emobase_live4.conf emobase_live4_batch.conf emobase_live4_batch_single.conf
emotion features (large set of 6552 features, 1st level functionals of low-level descriptors such as MFCC, Pitch, LSP, ...)	emo_large.conf
extract a pseudo auditory spectrum (26 mel-band spectrum with equal loudness weighting, delta and acceleration coefficients)	audspec.conf audspec_compat.conf
AVEC 2011 challenge	avec2011.conf avec2013.conf
ComParE	ComParE_2016.conf
listing audio devices	list_audio_devices.conf
speech prosody features	liveProsodyAcf.conf … pitch and loudness
SHS viterbi smoothed pitch	smileF0.conf smileF0_base.conf smileF0_mean.conf

Sections

A configuration file is divided into sections, each beginning with a header of the following form:

[sectionName:sectionType]

4.2 Understanding configuration files | openSMILE book
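To make the header format concrete, here is a small Python sketch that recognizes such header lines. It is not openSMILE's own parser; the function name parse_section_header is hypothetical, and it assumes a header occupies a whole line and that neither part contains ":" or brackets.

```python
import re
from typing import Optional, Tuple

# Matches a whole header line such as "[source1:cWaveSource]".
SECTION_RE = re.compile(r"^\s*\[([^\[\]:]+):([^\[\]:]+)\]\s*$")


def parse_section_header(line: str) -> Optional[Tuple[str, str]]:
    """Return (sectionName, sectionType), or None if the line is not a header."""
    m = SECTION_RE.match(line)
    if m is None:
        return None
    return m.group(1).strip(), m.group(2).strip()
```

For example, parse_section_header("[componentInstances:cComponentManager]") yields ("componentInstances", "cComponentManager"), while an ordinary setting line yields None.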

Components

Components are instantiated in the section that begins with the following header:

[componentInstances:cComponentManager]

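The available components can be listed with:

SMILExtract -L

The configuration options of an individual component can be shown with SMILExtract -H followed by the component name (for example, SMILExtract -H cWaveSource).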

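Configuring components

Each component instance is declared in that section with a line like the following, which gives the instance name (source1) and its component type (cWaveSource):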

instance[source1].type = cWaveSource

4.2.2 Configuring components | openSMILE book

Its settings are then written in a section named after the component instance, like this:

[source1:cWaveSource]
; the following sets the level this component writes to
; the level will be created by this component
; no other components may write to a level having the same name
writer.dmLevel = wave
filename = input.wav
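The relationship between instance declarations and per-instance sections can be sketched as a simplified Python parser. This is an illustration only, not how openSMILE reads its files: parse_components is a hypothetical name, only ";" comments are skipped, includes and \cm options are not handled, and it assumes every setting is a single "key = value" line.

```python
import re
from typing import Dict, Optional, Tuple

SECTION_RE = re.compile(r"^\[([^\[\]:]+):([^\[\]:]+)\]$")
INSTANCE_RE = re.compile(r"^instance\[([^\]]+)\]\.type\s*=\s*(\S+)$")


def parse_components(text: str) -> Dict[str, dict]:
    """Map each instance name to {"type": ..., "settings": {key: value}}."""
    components: Dict[str, dict] = {}
    current: Optional[Tuple[str, str]] = None  # (name, type) of the open section
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):  # only ';' comments handled here
            continue
        m = SECTION_RE.match(line)
        if m:
            current = (m.group(1).strip(), m.group(2).strip())
            continue
        if current is None:
            continue  # text before the first section header is ignored
        name, ctype = current
        if ctype == "cComponentManager":
            # instance[<name>].type = <componentType>
            m = INSTANCE_RE.match(line)
            if m:
                components.setdefault(m.group(1), {"type": m.group(2), "settings": {}})
            continue
        if "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            entry = components.setdefault(name, {"type": ctype, "settings": {}})
            # The section type should match the type declared for the instance.
            if entry["type"] != ctype:
                raise ValueError(f"[{name}:{ctype}] but instance declared as {entry['type']}")
            entry["settings"][key] = value
    return components
```

Applied to the example above (together with an instance[source1].type = cWaveSource declaration), this gives source1 the type cWaveSource and the settings writer.dmLevel = wave and filename = input.wav.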

Including other configuration files

Write the following line where you want the other file to be read in:

\{path/to/config.file.to.include\}

4.2.3 Including other configuration files | openSMILE book

This can be placed anywhere, as long as it is on a line of its own; that line is replaced by the contents of the included file.
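The line-replacement behavior can be sketched in Python as a recursive expansion. This is a hypothetical helper (expand_includes), not openSMILE code: how paths are resolved is left to the caller-supplied read_file function, and the circular-include check is an added safeguard, not something the openSMILE book is claimed to specify.

```python
import re
from typing import Callable, List, Optional

# A whole line of the form \{path\}
INCLUDE_RE = re.compile(r"^\s*\\\{(.+?)\\\}\s*$")


def expand_includes(text: str, read_file: Callable[[str], str],
                    _stack: Optional[List[str]] = None) -> str:
    """Replace each include line with the (recursively expanded) file contents."""
    stack = _stack or []
    out = []
    for line in text.splitlines():
        m = INCLUDE_RE.match(line)
        if m is None:
            out.append(line)
            continue
        path = m.group(1).strip()
        if path in stack:  # guard against a file including itself
            raise ValueError("circular include: " + " -> ".join(stack + [path]))
        out.append(expand_includes(read_file(path), read_file, stack + [path]))
    return "\n".join(out)
```

Passing a dictionary lookup as read_file (for example files.__getitem__) makes the behavior easy to try without touching the file system.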

Defining new command-line options

\cm[longoption(shortoption){default value}:description text]

4.2.4 Linking to command-line options | openSMILE book
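As a sketch of what such a placeholder does, the Python snippet below substitutes each \cm[...] with a value given on the command line (looked up by long or short name) or else with its default. The function name resolve_cm and the cli dictionary standing in for SMILExtract's parsed arguments are hypothetical, and the example line filename = \cm[inputfile(I){input.wav}:name of input file] is illustrative rather than quoted from a bundled config file.

```python
import re
from typing import Dict

# \cm[longoption(shortoption){default value}:description text]
CM_RE = re.compile(
    r"\\cm\[(?P<long>\w+)\((?P<short>\w+)\)\{(?P<default>[^}]*)\}:(?P<desc>[^\]]*)\]"
)


def resolve_cm(line: str, cli: Dict[str, str]) -> str:
    """Replace each \\cm[...] with its command-line value, or its default."""
    def repl(m: "re.Match[str]") -> str:
        for name in (m.group("long"), m.group("short")):
            if name in cli:
                return cli[name]
        return m.group("default")
    return CM_RE.sub(repl, line)
```

With no arguments the default (input.wav) is used; with {"I": "sample01.wav"} the line becomes filename = sample01.wav, mirroring how -I selects the input file.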

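Lines beginning with ";", "#", "//", or "%" are treated as comments. A block enclosed between a line containing only "/*" and a line containing only "*/" is also treated as a comment. 4.2.6 Comments | openSMILE book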
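A minimal Python sketch of these comment rules follows. It is not openSMILE's implementation; strip_comments is a hypothetical name, and it assumes whitespace before a comment marker is ignored and that blank lines can simply be dropped.

```python
from typing import List

LINE_COMMENT_PREFIXES = (";", "#", "//", "%")


def strip_comments(text: str) -> List[str]:
    """Return config lines with comment lines, /* ... */ blocks and blank lines removed."""
    kept: List[str] = []
    in_block = False
    for raw in text.splitlines():
        line = raw.strip()
        if in_block:
            if line == "*/":  # block ends only on a line containing just */
                in_block = False
            continue
        if line == "/*":  # block starts only on a line containing just /*
            in_block = True
            continue
        if not line or line.startswith(LINE_COMMENT_PREFIXES):
            continue
        kept.append(raw)
    return kept
```

Because the block markers must stand alone on their lines, a "/*" appearing in the middle of a setting line is left untouched by this sketch.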

References

openSMILE book
4.2 Understanding configuration files

References

Installation and Documentation - audEERING | Intelligent Audio Engineering – openSMILE
- openSMILE book
