JP4587854B2

JP4587854B2 - Emotion analysis device, emotion analysis program, program storage medium

Info

Publication number: JP4587854B2
Application number: JP2005084638A
Authority: JP
Inventors: 利明石井; 英二平田; 松美鈴木; 創鈴木; 靖吉田
Original assignee: Tokyo Electric Power Co Inc
Current assignee: Tokyo Electric Power Co Inc
Priority date: 2005-03-23
Filing date: 2005-03-23
Publication date: 2010-11-24
Anticipated expiration: 2025-03-23
Also published as: JP2006267464A

Abstract

PROBLEM TO BE SOLVED: To provide an emotion analyzer that can quantitatively evaluate the confused state of a speaker.

SOLUTION: The emotion analyzer 1 comprises: means for extracting a sound pressure level value from digital voice data obtained by sampling a person's uttered voice; means for dividing the time series of the digital voice data into measurement periods; means for computing the average sound pressure level value for each measurement period; means for recognizing, as a pause, a state in which the sound pressure level value stays at or below a reference value for a prescribed time or longer, and for obtaining the proportion of time occupied by pauses in each measurement period; means for recognizing speech states and silent states from the sound pressure level value and obtaining, as the speech rate for each measurement period, the number of occurrences per unit time of speech states delimited by silent states; and evaluation result output means for outputting, as appropriate, an evaluation score that indicates the degree of confusion and corresponds to the total of the points for the average sound pressure level value, the points for the pause time proportion, and the points for the speech rate in each measurement period.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to an emotion analysis device that analyzes a speaker's emotion based on the audio signal of the person's speech.

A technique for analyzing emotion based on uttered speech is described, for example, in Patent Document 1 below. Emotion analysis can be applied, for example, to analyzing the call audio between customers and operators at a call center so that a customer's emotions or an operator's agitation can be detected quickly and handled appropriately, and to judging an operator's aptitude or ability.
Patent Document 1: Japanese Translation of PCT Application Publication No. 2003-508805

Conventional emotion analysis has mainly aimed at extracting the emotions of joy, anger, sorrow, and pleasure. Call center operators, however, are expected to deal with customers calmly and are often trained not to show such emotions. Even when an operator feels some kind of “anxiety”, the operator cannot vent that feeling on the customer as joy, anger, sorrow, or pleasure. If the anxious state persists, the operator will sooner or later be forced to give the customer vague answers, and the conversation will eventually break down. That may infuriate the customer and cost the company its credibility.

The present inventors attempted to accurately extract “anxiety”, that is, “confusion”, an emotion far more subtle than joy, anger, sorrow, or pleasure. Recordings of telephone conversations between operators and customers at a call center were prepared for this attempt, and several people highly skilled in dealing with customers, such as managers of corporate call center departments, listened to the recordings repeatedly and pointed out the operator's mental state. In parallel, the sound feature information (sound pressure level, frequency, and so on) extracted from the sampled data of the recordings was analyzed in detail. As a result, it was found that specific changes appear in certain feature information extracted from the sampled data when the operator appears to be confused. The present invention is based on this finding, and its object is to provide an emotion analysis device that can accurately extract the emotion of “confusion” from uttered speech and quantitatively evaluate its degree.

To achieve the above object, the present invention is an emotion analysis device that quantitatively analyzes the confused state of a speaker, comprising:
sound pressure extraction means for extracting a sound pressure level value from digital voice data obtained by sampling a person's uttered voice picked up by a microphone;
measurement period setting means for dividing the time series of the digital voice data into measurement periods at predetermined time intervals;
average sound pressure acquisition means for calculating, for each measurement period, the average of the sound pressure level values extracted by the sound pressure extraction means as an average sound pressure level value;
pause ratio acquisition means for recognizing, as a pause (ma), a state in which the sound pressure level value extracted by the sound pressure extraction means stays at or below a reference value for a predetermined time or longer, and for obtaining the proportion of time occupied by pauses in each measurement period;
speech rate acquisition means for recognizing speech states and silent states according to the sound pressure level value extracted by the sound pressure extraction means, and for obtaining, as the speech rate for each measurement period, the number of occurrences per unit time of speech states delimited by silent states; and
evaluation result output means for obtaining, for each measurement period, points corresponding to the average sound pressure level value, points corresponding to the speech rate, and points corresponding to the pause time proportion, and for outputting, as appropriate, an evaluation result based on an evaluation score corresponding to the total of those points.

More preferably, the emotion analysis device comprises:
sound pressure extraction means for extracting a sound pressure level value from digital voice data obtained by sampling a person's uttered voice picked up by a microphone;
measurement period setting means for dividing the time series of the digital voice data into measurement periods at predetermined time intervals;
average sound pressure acquisition means for obtaining, for each measurement period, the average of the sound pressure level values B extracted by the sound pressure extraction means;
pause ratio acquisition means for recognizing, as a pause (ma), a state in which the sound pressure level value B from the sound pressure extraction means stays at or below a predetermined reference sound pressure level value BS1 for a predetermined time t or longer, and for obtaining the proportion E of the measurement period occupied by pauses as E = D / (C + D), where D is the total pause time within the measurement period and C is the total time of all non-pause states;
speech rate acquisition means for treating a state in which the sound pressure level value B from the sound pressure extraction means is at or above a predetermined reference sound pressure level value BS2 as a speech state, and for obtaining, as the speech rate G, the number of occurrences of the speech state per unit time detected within the measurement period;
reference value acquisition means for taking, as a reference voice, the portion of the voice data time series in the first measurement period or in a predetermined number of measurement periods from the first, and for acquiring the average sound pressure level value BS, the pause proportion ES, and the speech rate GS of that reference voice; and
evaluation result output means for obtaining, for each measurement period, points corresponding to the ratio of the average of the sound pressure level B to BS, points corresponding to the ratio of the pause proportion E to ES, and points corresponding to the ratio of the speech rate G to GS, for assigning, as the degree of confusion, a graded evaluation score according to the range in which the total of those points falls, and for outputting, as appropriate, an evaluation result based on that evaluation score.

The present invention also extends to a computer program installed on a computer. The program is an emotion analysis program that causes the computer to execute:
a voice data acquisition step of acquiring digital voice data obtained by A/D conversion of an analog audio signal;
a measurement period setting step of dividing the time series of the digital voice data into measurement periods at predetermined time intervals;
an average sound pressure acquisition step of calculating, for each measurement period, the average of the sound pressure level values extracted by the sound pressure extraction means as an average sound pressure level value;
a pause ratio acquisition step of recognizing, as a pause (ma), a state in which the sound pressure level value extracted by the sound pressure extraction means stays at or below a reference value for a predetermined time or longer, and of obtaining the proportion of time occupied by pauses in each measurement period;
a speech rate acquisition step of recognizing speech states and silent states according to the sound pressure level value extracted by the sound pressure extraction means, and of obtaining, as the speech rate for each measurement period, the number of occurrences per unit time of speech states delimited by silent states; and
an evaluation result output step of obtaining, for each measurement period, points corresponding to the average sound pressure level value, points corresponding to the speech rate, and points corresponding to the pause time proportion, of assigning, as the degree of confusion, an evaluation score corresponding to the total of those points, and of outputting, as appropriate, an evaluation result based on that evaluation score.
A program storage medium on which the emotion analysis program is recorded is also within the scope of the present invention.

According to the emotion analysis device of the present invention, the state of “confusion”, an emotion far more subtle than joy, anger, sorrow, or pleasure, can be extracted accurately and its degree can be evaluated quantitatively.

=== Quantitative evaluation of emotion ===
The emotion of a person who is speaking, and the appropriateness of the speech, can be judged by having academic experts or specialists listen to the person's speech played back as sound. Unless that judgment is actually output as a numerical value, however, it is not possible to detect the emotions of operators and customers automatically and respond appropriately in call center work, or to evaluate an operator's skill objectively.

As is well known, there are speech analysis devices that extract sound feature information from an input audio signal and visualize the speech state by graphing, for example, the time course of the fundamental frequency (pitch) and the sound pressure level, or the frequency analysis result per unit time obtained by Fourier-transforming the audio signal (the feature parameter). One example is the CSL Computerized Speech Lab Model 4500 manufactured by Kay Elemetrics Corp.

The present inventors therefore played back the speech of call center operators talking with customers on the telephone while displaying graphs of the sound characteristics produced by a speech analysis device. Academic experts and specialists examined this information, the correlation between emotion and the time variation of certain feature information was determined, and emotion and the appropriateness of speech were thereby quantified. The emotion analysis device of the present invention converts the various emotions contained in uttered speech, and the appropriateness of the speech, into numerical values based on the correlation found by this method, and outputs them.

=== Configuration of the emotion analysis device ===
The emotion analysis device of the present invention is realized, for example, by using a personal computer fitted with an audio card as the hardware and executing an emotion analysis application program (hereinafter, emotion analysis software) installed on that computer.

FIG. 1 shows the functional block configuration of the emotion analysis device in this embodiment of the present invention. In the hardware configuration described above, the audio signal processing unit 10 corresponds to the sound card installed in the computer. The audio signal processing unit 10 performs A/D conversion on the speaker's audio signal input from the microphone 11 and outputs digital voice data. Specifically, the signal is converted into digital voice data sampled at a sampling frequency of 8 kHz, with 16-bit quantization, on one channel (monaural).

The voice analysis unit 20 corresponds to the digital voice data processing performed by the emotion analysis software. It includes a function 21 that extracts sound feature information such as the fundamental frequency (pitch) and the sound pressure level from the digital voice data transferred sequentially from the audio signal processing unit 10, and a function 22 that acquires, as voice analysis parameters, the feature information together with secondary information derived from it, such as the durations of speech and pauses (ma), the pause time proportion, and the feature parameter. In this embodiment, the weighting, limiting conditions, and the like for each voice analysis parameter are treated as initial setting parameters. When these are entered through an operation input unit 41 of the user interface 40, such as a keyboard or mouse, they are stored in the external storage 50 of the computer as a file in a predetermined format (initial setting file) 52.

Based on specific voice analysis parameters and the corresponding initial setting parameters, the voice analysis unit 20 extracts or acquires the voice analysis parameters. It has a function 23 that extracts various speech states from them (emotions such as anger and confusion; appropriateness of speech, such as speech rate, pause time proportion, clarity, and pronunciation characteristics at utterance endings) and a function 24 that obtains the degree of each extracted speech state and its evaluation result. The evaluation result output from the voice analysis unit 20 is converted by the analysis result output unit 30 into data of a suitable form (text, numerical values, and so on) and output, for example, to an output device 42 (display device, printing device, and so on) of the user interface 40, or to the external storage 50 as an evaluation result file 53. The evaluation result can of course also be output to any suitable device or in any suitable form, for example as an audio signal, or by transferring the corresponding data to another computer.

In addition to the digital voice data sampled in real time by the audio signal processing unit 10, the voice analysis unit 20 can also process a recorded voice data file 51 in a predetermined format, such as WAV, stored in the external storage 50. It also has a function 25 that creates and records the speech picked up by the microphone 11 as a voice data file 51 in the external storage 50.
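As an illustration of handling a recorded voice data file in the format stated above (8 kHz, 16-bit, monaural), the following is a minimal sketch using Python's standard `wave` and `struct` modules. The function names `read_mono_16bit` and `write_mono_16bit` are hypothetical and are not taken from the embodiment; the sketch only shows one way such a file could be read into a list of samples.

```python
import io
import struct
import wave


def read_mono_16bit(source):
    """Return (sample_rate, samples) from a 16-bit mono PCM WAV file.

    `source` may be a file name or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM data")
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    # WAV PCM samples are little-endian signed 16-bit integers.
    samples = list(struct.unpack("<%dh" % (len(raw) // 2), raw))
    return rate, samples


def write_mono_16bit(target, rate, samples):
    """Write integer samples as a 16-bit mono PCM WAV file."""
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))


if __name__ == "__main__":
    buffer = io.BytesIO()
    write_mono_16bit(buffer, 8000, [0, 1000, -1000, 32767])
    buffer.seek(0)
    print(read_mono_16bit(buffer))
```

The in-memory buffer in the usage example stands in for a file in external storage; a file name can be passed instead.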

=== Voice analysis parameters ===
The voice analysis unit 20 extracts feature information from the time variation of the digital voice data acquired at each sampling period. In this embodiment, the time series of the digital voice data is divided into measurement periods, each sufficiently longer than the sampling period (for example, several seconds), and from the time series within each measurement period the pitch A, sound pressure level B, speech duration C, pause duration D, overall pause proportion E, feature parameter F, and speech rate G are acquired as voice analysis parameters. FIG. 2 shows the concept of the measurement period. The end of one measurement period and the start of the next overlap in the time series, which evens out the determination. In this embodiment, the speaker's emotion is also regarded as stable when speech first begins, so the portion of the time series corresponding to the first or an early predetermined measurement period, or to a predetermined number of early measurement periods, is taken as the reference voice. The values of the analysis parameters obtained from the reference voice are adopted as reference values. Each voice analysis parameter is described below.
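The division into overlapping measurement periods can be sketched as follows. This is only an illustration: the text gives "several seconds" for the period length and does not state the amount of overlap, so the 4-second period and 1-second overlap used here, and the function name `measurement_periods`, are assumptions.

```python
def measurement_periods(n_samples, rate, period_s=4.0, overlap_s=1.0):
    """Return (start, end) sample index pairs of overlapping periods.

    period_s and overlap_s are assumed values, not taken from the text.
    Each period ends after the next one has already started.
    """
    length = int(period_s * rate)
    step = int((period_s - overlap_s) * rate)
    if length <= 0 or step <= 0:
        raise ValueError("period must be positive and longer than overlap")
    periods = []
    start = 0
    # Only complete periods are kept; a trailing partial period is dropped.
    while start + length <= n_samples:
        periods.append((start, start + length))
        start += step
    return periods


if __name__ == "__main__":
    # Ten seconds of audio at 8 kHz.
    for start, end in measurement_periods(10 * 8000, 8000):
        print(start / 8000, end / 8000)
```

Under this sketch the first period, or the first few, would serve as the reference voice from which the reference values are taken.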

<Pitch (A)>
The fundamental frequency (Hz) of the voice, that is, its first harmonic. The average pitch of the reference voice is denoted AS. The average value of the pitch over a measurement period or a predetermined period (average pitch) is denoted A.

<Sound pressure level (B)>
The average sound pressure level of the reference voice (reference sound pressure level) is denoted BS. The unit is dB (decibels) with a reference sound pressure of 20 μPa, and sound pressure level values are expressed as a multiple of the reference sound pressure level BS. The average value of the sound pressure level over a measurement period or a predetermined period (average sound pressure level) is denoted B.
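For reference, a sound pressure level in dB relative to 20 μPa is conventionally computed as 20·log10(p_rms / p0). The sketch below applies that standard formula to a frame of samples. The calibration factor `pa_per_count`, which converts a sample value to pascals, is a hypothetical parameter: the text does not say how the sound card output is calibrated.

```python
import math

P_REF = 20e-6  # reference sound pressure in pascals (20 micropascals)


def spl_db(samples, pa_per_count):
    """Sound pressure level in dB re 20 uPa for one frame of samples.

    pa_per_count is an assumed calibration factor (pascals per sample
    unit); it depends on the microphone and sound card.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    mean_square = sum(s * s for s in samples) / len(samples)
    p_rms = math.sqrt(mean_square) * pa_per_count
    if p_rms == 0.0:
        return float("-inf")  # digital silence
    return 20.0 * math.log10(p_rms / P_REF)


if __name__ == "__main__":
    frame = [1000, -1000, 1000, -1000]
    print(spl_db(frame, 2e-7))
```

Dividing the average level of a measurement period by the reference value BS then gives the ratio used for scoring.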

<Duration (C)>
The time over which a continuous run of speech lasts. The duration of a speech group is calculated from the sound pressure threshold, using the time variation of the sound pressure level extracted from the digital voice data. After the end of a word, syllable, or sentence, silence is determined if the sound pressure level stays at or below the threshold for a predetermined time (for example, 0.3 seconds). The sound pressure level threshold can be set by user input.

<Pause duration (D)>
The time in a conversation during which nothing is being said. A state in which the sound pressure level stays at or below the user-entered sound pressure level threshold for the predetermined time or longer is treated as a “pause (ma)”, and the duration of that pause is obtained.

<Overall pause proportion (E)>
The reference value of the pause proportion is denoted ES. The overall pause proportion is calculated from the speech duration (C) and the pause duration (D) as E = D / (C + D).
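The pause detection and the formula E = D / (C + D) can be sketched as follows. The sketch assumes that the sound pressure level has already been computed for consecutive short frames of equal length; that framing, the 0.1-second frame used in the example, and the function name `pause_ratio` are assumptions, while the 0.3-second minimum follows the example value given for the duration (C).

```python
import math


def pause_ratio(levels, frame_s, threshold, min_pause_s=0.3):
    """Return (C, D, E) for one measurement period.

    levels      -- sound pressure level of each consecutive frame
    frame_s     -- frame duration in seconds (assumed framing)
    threshold   -- level at or below which a frame counts as quiet
    min_pause_s -- quiet time required before a run counts as a pause
    """
    if not levels:
        raise ValueError("levels must not be empty")
    # Smallest number of quiet frames lasting at least min_pause_s.
    min_frames = max(1, math.ceil(min_pause_s / frame_s - 1e-9))
    pause_frames = 0
    run = 0
    # The sentinel value closes a quiet run that reaches the end.
    for level in list(levels) + [float("inf")]:
        if level <= threshold:
            run += 1
        else:
            if run >= min_frames:
                pause_frames += run
            run = 0
    d = pause_frames * frame_s
    c = (len(levels) - pause_frames) * frame_s
    # Equal to D / (C + D), computed on frame counts to avoid rounding.
    e = pause_frames / len(levels)
    return c, d, e


if __name__ == "__main__":
    levels = [50] * 5 + [10] * 4 + [50] * 3 + [10] * 2 + [50] * 6
    print(pause_ratio(levels, 0.1, 20))
```

In the usage example the four-frame quiet run (0.4 s) counts as a pause, while the two-frame run (0.2 s) is too short and is counted as part of C.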

<Feature parameter (F)>
The result of analyzing the frequency content per unit time, calculated by Fourier-transforming the speaker's digital voice data. The reference value of the feature parameter is denoted FS. As is well known, the frequencies contained in the human voice extend from 60 Hz to somewhat more than 10,000 Hz. By analyzing the frequencies of a person's voice, the speaker's sex, age, height, professional attitude, physical condition, and so on can be read. When judging the aptitude of a call center operator, frequency analysis makes it possible to determine whether the voice “carries”. A voice that carries presupposes abdominal voice production, and when abdominal voice production is achieved, the frequencies of the voice concentrate around 2500 Hz to 3000 Hz. In this embodiment the sampling frequency is 8 kHz, so the upper limit of the frequencies that can be analyzed is 4000 Hz.
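One way to examine whether spectral energy concentrates around 2500 Hz to 3000 Hz is to measure the fraction of a frame's energy that falls in that band. The sketch below uses a plain discrete Fourier transform from the standard library; it is not the embodiment's feature parameter F, whose exact definition is not given, and the function name `band_energy_fraction` is hypothetical. The loop stops at half the sampling rate, which is the 4000 Hz limit mentioned above for 8 kHz sampling.

```python
import cmath
import math


def band_energy_fraction(samples, rate, lo_hz, hi_hz):
    """Fraction of non-DC spectral energy between lo_hz and hi_hz.

    Uses a direct DFT (O(n^2)), which is adequate for short frames.
    """
    n = len(samples)
    total = 0.0
    band = 0.0
    # Bin k corresponds to k * rate / n Hz; n // 2 is the Nyquist bin.
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(samples)
        )
        power = abs(coeff) ** 2
        total += power
        if lo_hz <= k * rate / n <= hi_hz:
            band += power
    return band / total if total > 0.0 else 0.0


if __name__ == "__main__":
    rate, n = 8000, 160  # 20 ms frame, 50 Hz bin spacing
    tone = [math.sin(2 * math.pi * 2750 * i / rate) for i in range(n)]
    print(band_energy_fraction(tone, rate, 2500, 3000))
```

A practical implementation would normally use a fast Fourier transform and a window function; the direct form is used here only to keep the example self-contained.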

<Speech rate (G)>
The number of word elements per unit time. The reference value of the speech rate is denoted GS. For the speech rate, each point at which the sound pressure level falls to or below the above sound pressure level threshold is taken as a boundary, and the span from one boundary to the next is defined as a “speech element”. The number of speech elements per unit time is the speech rate.
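Counting speech elements delimited by threshold crossings can be sketched as follows. As in the definition above, a frame at or below the threshold acts as a boundary, and each run of frames above the threshold is counted as one speech element. The per-frame levels, the frame length, and the function name `speech_rate` are assumptions made for the illustration.

```python
def speech_rate(levels, frame_s, threshold):
    """Number of speech elements per second in one measurement period.

    levels    -- sound pressure level of each consecutive frame
    frame_s   -- frame duration in seconds (assumed framing)
    threshold -- frames at or below this level act as boundaries
    """
    elements = 0
    in_speech = False
    for level in levels:
        if level > threshold:
            if not in_speech:
                elements += 1  # a new speech element starts here
                in_speech = True
        else:
            in_speech = False  # boundary between speech elements
    duration = len(levels) * frame_s
    return elements / duration if duration > 0 else 0.0


if __name__ == "__main__":
    levels = [10, 50, 50, 10, 50, 10, 10, 50, 50, 50]
    print(speech_rate(levels, 0.1, 20))
```

The usage example contains three runs above the threshold in one second of frames.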

=== Quantification of confusion ===
A major feature of the emotion analysis device of the present invention is that it can quantitatively analyze the extremely subtle emotion of “confusion”. FIG. 3 is a graph showing the variation of the sound pressure level during speech. The graph 60 shows the variation 61 of the sound pressure level in the normal (initial) state and the variation 62 of the sound pressure level when specialists judged the speaker to be in a confused state. In the “confused state”, the sound pressure level has fallen to about 1/9 of that in the “normal state”. The graph also shows that the speech rate has dropped and the pause proportion has increased. The emotion analysis device 1 quantitatively analyzes these changes and outputs the degree of confusion as a numerical value.

In this embodiment, predetermined points are assigned to the values of the voice analysis parameters in each measurement period, namely the sound pressure level B, the speech rate G, and the overall pause proportion E, according to the range in which each value falls. An evaluation result corresponding to the total of the points assigned to the voice analysis parameters is then output. FIG. 4 shows, for this embodiment, the correspondence between the values of these voice analysis parameters and the points, and the correspondence between the total points and the degree of confusion. For each voice analysis parameter used to extract the confused state, the voice analysis unit 20 outputs points in steps according to the ratio of the value in the measurement period to the initial value, and calculates the total of the points for the parameters. The degree of confusion is then evaluated on a five-level scale according to the range of the total. The emotion analysis device 1 accepts, by user input, the correspondence between the values of each analysis parameter and the points, and the correspondence between the total points and the evaluation score, as the initial setting parameters described above.
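The scoring scheme described above can be sketched as follows. The actual point tables are given in FIG. 4 and are not reproduced in the text, so every bound, point value, and grade cut-off in this sketch is a hypothetical placeholder chosen only to show the mechanism: ratios to the reference values are mapped to points in steps, the points are summed, and the total is mapped to one of five levels.

```python
# All tables below are HYPOTHETICAL placeholders, not the values of FIG. 4.
# A lower sound pressure level and speech rate, and a higher pause
# proportion, are treated as signs of confusion.
LEVEL_POINTS = [(0.3, 30), (0.6, 20), (0.9, 10)]  # B/BS at or below bound
RATE_POINTS = [(0.3, 30), (0.6, 20), (0.9, 10)]   # G/GS at or below bound
PAUSE_POINTS = [(2.0, 30), (1.5, 20), (1.2, 10)]  # E/ES at or above bound
GRADE_CUTS = (20, 40, 60, 80)                     # total points -> level 1..5


def points_when_low(ratio, table):
    """Points for the first bound (ascending) that the ratio does not exceed."""
    for bound, points in table:
        if ratio <= bound:
            return points
    return 0


def points_when_high(ratio, table):
    """Points for the first bound (descending) that the ratio reaches."""
    for bound, points in table:
        if ratio >= bound:
            return points
    return 0


def confusion_grade(b, e, g, bs, es, gs):
    """Return (total_points, level) with level from 1 (calm) to 5."""
    total = (
        points_when_low(b / bs, LEVEL_POINTS)
        + points_when_high(e / es, PAUSE_POINTS)
        + points_when_low(g / gs, RATE_POINTS)
    )
    level = 1 + sum(total >= cut for cut in GRADE_CUTS)
    return total, level


if __name__ == "__main__":
    # Level at 1/9 of the reference, pauses 2.5 times, speech rate halved.
    print(confusion_grade(b=0.11, e=0.5, g=1.5, bs=1.0, es=0.2, gs=3.0))
```

Because the embodiment accepts these correspondences as user-entered initial setting parameters, the tables are kept as data rather than hard-coded into the functions.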

=== Evaluation of other emotions and speech states ===
In addition to the confusion described above, the emotion analysis device of this embodiment outputs numerical values for the degree of anger, the appropriateness of the speech rate, the appropriateness of pauses in speech, the clarity of speech, the pronunciation characteristics of utterance endings, the appropriateness of tone, and the appropriateness of intonation.

<Anger>
Speech that contains anger shows a rise in pitch A, a rise in sound pressure level B, and a rise in speech rate G. FIG. 5 illustrates the correspondence between the numerical ranges of these voice analysis parameters and the points, and the correspondence between the anger score, which is the total of those points, and the graded evaluation score.

<Appropriateness of speech rate>
In this embodiment, whether the speech rate G falls within a certain range is evaluated. FIG. 6 shows the correspondence in this embodiment between the value of the speech rate and the graded evaluation score for the appropriateness of the speech rate.

<Appropriateness of pauses in speech>
In this embodiment, the appropriateness of pauses is evaluated from the overall pause proportion E and the speech rate G. FIG. 7 shows an example of the correspondence between the ranges of the overall pause proportion E and the points, the correspondence between the ranges of the speech rate G and the points, and the correspondence between the total points for these voice analysis parameters and the evaluation score for the appropriateness of pauses.

<Clarity of speech>
It was found that when speech is “clear”, the fundamental pitch of the voice varies more, and the sound pressure is higher, than when it is “unclear”. In this embodiment, the clarity of speech is evaluated from the range of the pitch, the range of the sound pressure level, and the distribution of the feature parameter. FIG. 8 shows the correspondence between the ranges of the pitch range A1 and the points, the correspondence between the ranges of the sound pressure level range B1 and the points, the correspondence between the ranges of the feature parameter F and the points, and the correspondence between the total points for these voice analysis parameters and the evaluation score for the clarity of speech. Here, the pitch range is A1 = (maximum pitch - minimum pitch) and the sound pressure level range is B1 = (maximum sound pressure level - minimum sound pressure level), and the pitch range and sound pressure level range of the reference voice are denoted A1S and B1S, respectively.

<Characteristics of word-ending pronunciation>
In the present embodiment, the interval from 2 seconds before a pause until the pause begins is defined as the “word ending”. The digital voice data in the word ending is analyzed and, based on the values of the pitch A, the sound pressure level B, and the feature parameter F there, the speaker's pronunciation is classified into one of five characteristics: “no problem”, “childish”, “businesslike”, “gloomy”, and “other”. FIG. 9 illustrates the correspondence between voice analysis parameter values and scores for each of these characteristics.

A pronunciation characteristic of “no problem (normal)” means that every parameter is within its appropriate range. “Childish” is marked by a drop in sound pressure level and by up-and-down movement of the pitch. “Businesslike” shows a constant sound pressure level with no up-and-down movement of the pitch. “Gloomy” shows a narrow frequency distribution in the feature parameter; that is, the speaker is not using abdominal vocalization and the voice does not carry. “Other” covers cases that match none of these characteristics.
In other words, the characteristic whose scores, assigned according to the values of the voice analysis parameters, give the highest total becomes the evaluation result. As the concrete procedure, a set of scores corresponding to the values of the voice analysis parameters is defined for each pronunciation characteristic. The scores are then totaled per characteristic, and the characteristic with the highest total is taken as the evaluation result. For example, suppose the average sound pressure level A at the word ending, the average pitch B at the word ending, and the average feature parameter value F at the word ending are 80%, 50%, and 20% of their respective reference values AS, BS, and FS. For “childish”, the sound pressure and pitch values fall within the numerical ranges given in the score correspondence, so the 40 points for sound pressure and the 50 points for pitch are added to give 90 points as the “childish” score.

Calculating in the same way for “businesslike”, “gloomy”, and “no problem” gives totals of 0, 120, and 50 points, respectively, so the evaluation result is “gloomy”. If the totals for all pronunciation characteristics are in the range 0 to 70 points, the result is “other”. In this example, evaluation points of 0, 1, 2, 3, and 4 are output for the characteristics “no problem”, “childish”, “businesslike”, “gloomy”, and “other”, respectively.
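The selection rule in this worked example can be sketched as follows. The sketch takes the per-characteristic totals as given (the score tables of FIG. 9 are not reproduced here) and applies the two rules stated in the text: the highest total wins, and totals of 70 points or less across the board yield the fifth category. The function name, the English label strings (illustrative renderings of the five categories), and the handling of ties are assumptions.

```python
# Evaluation points output for each pronunciation characteristic (from the text).
EVALUATION_POINTS = {
    "no problem": 0,
    "childish": 1,
    "businesslike": 2,
    "gloomy": 3,
    "other": 4,
}

# If every total is within 0 to 70 points, the result is "other" (from the text).
OTHER_CEILING = 70


def classify_word_ending(totals):
    """Pick the characteristic with the highest total score.

    totals -- dict mapping a characteristic name to its total score.
    Ties are resolved by dict order, which is an assumption; the text
    does not say how ties are handled.
    """
    best = max(totals, key=totals.get)
    if totals[best] <= OTHER_CEILING:
        best = "other"
    return best, EVALUATION_POINTS[best]
```

With the totals from the example (90, 0, 120, and 50 points), the function selects the characteristic that scored 120 points and returns evaluation point 3.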

<Appropriateness of tone>
The appropriateness of tone is evaluated from the same voice analysis parameters used for the word-ending pronunciation characteristics above. Whereas that evaluation analyzes only the word endings, tone is evaluated over all speech portions other than pauses. Changes in the up-and-down movement of the pitch, the appropriateness of the sound pressure level, and the presence or absence of abdominal vocalization as determined by frequency analysis are judged together and output as an evaluation point. FIG. 10 shows, for each of the voice analysis parameters pitch A, sound pressure level B, and feature parameter F, the correspondence between values and scores, and the correspondence between the total score and the evaluation point.

<Appropriateness of intonation>
The appropriateness of intonation is evaluated from the pitch range. FIG. 11 shows, for the present embodiment, the correspondence between value ranges of the pitch range A1 and scores, and the correspondence between scores and the evaluation point for the appropriateness of intonation. The pitch range A1 is (maximum pitch − minimum pitch).

=== User interface ===
Through a GUI, the emotion analysis device provides a setting parameter input function, which accepts user-entered evaluation criteria for emotions and utterance states, and a result presentation function, which outputs evaluation results for the various emotions and utterance states based on the entered setting parameters and the digital voice data. FIG. 12 outlines the screen for entering basic initial setting parameters, such as those for extracting feature information from the digital voice data. This screen 70 contains a group of text boxes 71 for the setting parameters needed to extract the pitch, a group of text boxes 72 for other setting parameters, and so on. In this example, the pitch extraction settings are the sampling frequency, the choice of window function, the frame length applied to that window function, and the frame period. The other setting parameters accepted include the sound pressure level threshold for judging silence when measuring the speech rate, the sound pressure level threshold for judging silence when detecting pauses, the minimum duration for which that threshold must hold for a pause to be recognized, the length of the measurement period (period), the overlap between periods, and the frame length of the Fourier transform (FFT) used to calculate the feature parameter.
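The settings listed for this screen can be pictured as a single configuration record. The sketch below is illustrative only: every field name and default value is hypothetical, and the way overlapping measurement periods are laid out (each period starting one period length minus the overlap after the previous one) is an assumption about how the period length and overlap settings interact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalysisSettings:
    # Pitch extraction settings (all defaults are hypothetical placeholders).
    sampling_hz: int = 8000
    window: str = "hamming"
    frame_len: int = 256
    frame_period: int = 128
    # Other settings (all defaults are hypothetical placeholders).
    silence_level_for_rate: float = 30.0   # silence threshold, speech rate
    silence_level_for_pause: float = 30.0  # silence threshold, pause detection
    pause_min_sec: float = 0.5             # minimum duration of a pause
    period_sec: float = 10.0               # measurement period length
    period_overlap_sec: float = 5.0        # overlap between periods
    fft_frame_len: int = 512               # FFT frame length, feature parameter


def period_bounds(total_sec, settings):
    """List (start, end) times of the measurement periods that fit in total_sec.

    Assumption: consecutive periods are offset by period_sec - period_overlap_sec.
    """
    step = settings.period_sec - settings.period_overlap_sec
    if step <= 0:
        raise ValueError("overlap must be shorter than the period")
    bounds = []
    start = 0.0
    while start + settings.period_sec <= total_sec:
        bounds.append((start, start + settings.period_sec))
        start += step
    return bounds
```

Keeping the record immutable (`frozen=True`) is a design choice for the sketch, so that one analysis run cannot change its settings partway through.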

FIG. 13 outlines the screen for accepting setting parameters for a specific emotion or utterance state. Shown here is the screen 80 for entering the parameters related to “confusion”. For each of the voice analysis parameters used to extract confusion (sound pressure level, speech rate, and pause ratio), the screen 80 includes groups of text boxes (81 to 83) that accept thresholds and weights, that is, the mapping from value ranges of the parameter to scores, as well as an input field 84 for specifying the correspondence between the total score and the evaluation result.
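The structure entered on this screen, a table of value ranges and scores per parameter plus a table from total score to evaluation result, can be sketched as a pair of range lookups. All table contents below are invented placeholders chosen only to make the example run; they do not reflect the figures in the patent, and the function and table names are hypothetical. Parameter values are assumed to be percentages of the reference-voice values.

```python
from bisect import bisect_right


def lookup(table, value):
    """Return the result of the last row whose lower bound is <= value.

    table -- list of (lower_bound, result) rows sorted by lower_bound.
    """
    bounds = [lower for lower, _ in table]
    index = bisect_right(bounds, value) - 1
    if index < 0:
        raise ValueError("value is below the first range")
    return table[index][1]


# Hypothetical placeholder tables: (lower bound in % of reference, score).
PRESSURE_SCORES = [(0, 0), (80, 10), (120, 30)]
RATE_SCORES = [(0, 0), (110, 20), (150, 40)]
PAUSE_SCORES = [(0, 0), (120, 20), (200, 40)]
# Hypothetical placeholder table: (lower bound of total score, evaluation point).
TOTAL_TO_EVALUATION = [(0, 0), (30, 1), (50, 2), (70, 3), (90, 4)]


def confusion_evaluation(pressure_pct, rate_pct, pause_pct):
    """Sum the three per-parameter scores and map the total to an evaluation point."""
    total = (
        lookup(PRESSURE_SCORES, pressure_pct)
        + lookup(RATE_SCORES, rate_pct)
        + lookup(PAUSE_SCORES, pause_pct)
    )
    return total, lookup(TOTAL_TO_EVALUATION, total)
```

Storing each table as sorted lower bounds means a user-edited threshold changes only one row, which matches the text-box-per-range layout described for the screen.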

=== Application example in a call center ===
As an application of the emotion analysis device of the present embodiment, consider its use in actual call center operations. The call audio from the microphone of the telephone used by an operator is input to the audio signal processing unit of the emotion analysis device. The device starts processing the digital voice data at the moment the operator answers an incoming call from a customer, and outputs evaluation points for the operator's state of confusion successively during the call.

If the evaluation points are output to the computer used by the call center supervisor, the supervisor can monitor each operator's state of confusion at any time from the evaluation points displayed on that computer. One conceivable configuration lets the supervisor switch the telephone line to the supervisor's own telephone on noticing that an operator is severely confused. An experienced supervisor can then take over before the operator's conversation with the customer breaks down, so that communication continues smoothly without the customer being made uncomfortable. Alternatively, when the confusion evaluation point reaches a predetermined value or higher (for example, 4 or higher), the device may notify the supervisor's computer, or automatically switch the telephone line to another operator or the like.
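The threshold-based notification described here could look like the sketch below, which scans a stream of (operator, evaluation point) pairs and reports each operator once when the confusion evaluation point reaches the threshold. The threshold of 4 is the example value from the text; the function name, the operator identifiers, and the choice to report each operator only once are assumptions.

```python
from typing import Iterable, Iterator, Tuple

ESCALATION_THRESHOLD = 4  # example value given in the text ("4 or higher")


def escalations(
    points: Iterable[Tuple[str, int]],
    threshold: int = ESCALATION_THRESHOLD,
) -> Iterator[str]:
    """Yield an operator id the first time its evaluation point reaches threshold.

    points -- successive (operator_id, confusion_evaluation_point) pairs.
    Reporting each operator only once is an assumption, made so that a
    supervisor is not notified repeatedly for the same call.
    """
    flagged = set()
    for operator_id, point in points:
        if point >= threshold and operator_id not in flagged:
            flagged.add(operator_id)
            yield operator_id
```

The caller decides what an escalation means in practice, for example showing a notice on the supervisor's computer or switching the line.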

FIG. 1 is a functional block diagram of the emotion analysis device in an embodiment of the present invention.
FIG. 2 is a conceptual diagram of the measurement periods of voice data in the emotion analysis device.
FIG. 3 is a graph of sound pressure level fluctuation in a speaker's normal state and confused state.
FIG. 4 is a schematic diagram of the scoring method used to evaluate the confused state.
FIG. 5 is a schematic diagram of the scoring method used to evaluate anger.
FIG. 6 is a schematic diagram of the scoring method used to evaluate the appropriateness of the speech rate.
FIG. 7 is a schematic diagram of the scoring method used to evaluate the appropriateness of pauses in speech.
FIG. 8 is a schematic diagram of the scoring method used to evaluate the clarity of an utterance.
FIG. 9 is a schematic diagram of the scoring method used to evaluate the characteristics of word-ending pronunciation.
FIG. 10 is a schematic diagram of the scoring method used to evaluate the appropriateness of tone.
FIG. 11 is a schematic diagram of the scoring method used to evaluate the appropriateness of intonation.
FIG. 12 is a schematic diagram of the screen for entering general setting parameters in the GUI of the emotion analysis device.
FIG. 13 is a schematic diagram of the screen for entering setting parameters related to confusion in the GUI of the emotion analysis device.

Explanation of symbols

1 Emotion analysis device
10 Audio signal processing unit
11 Microphone
20 Voice analysis unit
30 Analysis result output unit
40 User interface

Claims

An emotion analysis device that quantitatively analyzes the confused state of a speaker, comprising:
sound pressure extraction means for extracting sound pressure level values from digital voice data obtained by sampling a person's speech picked up by a microphone;
measurement period setting means for dividing the time series of the digital voice data into measurement periods at predetermined time intervals;
average sound pressure acquisition means for calculating, for each measurement period, the average of the sound pressure level values extracted by the sound pressure extraction means as an average sound pressure level value;
pause ratio acquisition means for recognizing as a pause a state in which the sound pressure level value extracted by the sound pressure extraction means remains at or below a reference value for a predetermined time or longer, and obtaining, for each measurement period, the proportion of time occupied by pauses;
speech rate acquisition means for recognizing utterance states and silent states according to the sound pressure level value extracted by the sound pressure extraction means, and obtaining, for each measurement period, the number of occurrences per unit time of utterance states delimited by silent states as the speech rate; and
evaluation result output means for obtaining, for each measurement period, a score according to the average sound pressure level value, a score according to the speech rate, and a score according to the proportion of time, assigning an evaluation point according to the total of these scores as the degree of confusion, and outputting an evaluation result based on the evaluation point as appropriate.

An emotion analysis device that quantitatively analyzes the confused state of a speaker, comprising:
sound pressure extraction means for extracting sound pressure level values from digital voice data obtained by sampling a person's speech picked up by a microphone;
measurement period setting means for dividing the time series of the digital voice data into measurement periods at predetermined time intervals;
average sound pressure acquisition means for obtaining, for each measurement period, the average value B of the sound pressure level values extracted by the sound pressure extraction means;
pause ratio acquisition means for recognizing as a pause a state in which the sound pressure level value from the sound pressure extraction means remains at or below a predetermined reference sound pressure level value BS1 for a predetermined time t or longer, and calculating the pause ratio E in the measurement period by the formula E = D / (C + D), where D is the total pause time in the measurement period and C is the total time of states other than pauses;
speech rate acquisition means for treating as an utterance state a state in which the sound pressure level value from the sound pressure extraction means is at or above a predetermined reference sound pressure level value BS2, and obtaining as the speech rate G the number of occurrences per unit time of the utterance state detected within the measurement period;
reference value acquisition means for treating as a reference voice the portion of the voice data time series in the first measurement period, or in a predetermined number of measurement periods counted from the first, and acquiring the average sound pressure level BS, the pause ratio ES, and the speech rate GS of the reference voice; and
evaluation result output means for obtaining, for each measurement period, a score according to the ratio of the average sound pressure level value B to BS, a score according to the ratio of the pause ratio E to ES, and a score according to the ratio of the speech rate G to GS, assigning a stepwise evaluation point according to the range in which the total of these scores falls as the degree of confusion, and outputting an evaluation result based on the evaluation point as appropriate.

An emotion analysis program, being a computer program installed on a computer, that causes the computer to execute:
a voice data acquisition step of acquiring digital voice data obtained by A/D converting an analog voice signal;
a measurement period setting step of dividing the time series of the digital voice data into measurement periods at predetermined time intervals;
an average sound pressure acquisition step of calculating, for each measurement period, the average of the sound pressure level values extracted by the sound pressure extraction means as an average sound pressure level value;
a pause ratio acquisition step of recognizing as a pause a state in which the sound pressure level value extracted by the sound pressure extraction means remains at or below a reference value for a predetermined time or longer, and obtaining, for each measurement period, the proportion of time occupied by pauses;
a speech rate acquisition step of recognizing utterance states and silent states according to the sound pressure level value extracted by the sound pressure extraction means, and obtaining, for each measurement period, the number of occurrences per unit time of utterance states delimited by silent states as the speech rate; and
an evaluation result output step of obtaining, for each measurement period, a score according to the average sound pressure level value, a score according to the speech rate, and a score according to the proportion of time, assigning an evaluation point according to the total of these scores as the degree of confusion, and outputting an evaluation result based on the evaluation point as appropriate.

A program storage medium in which the emotion analysis program according to claim 3 is recorded.