JP2019056879A - Voice analysis system, voice analysis method, and voice analysis program - Google Patents


Info

Publication number
JP2019056879A
Authority
JP
Japan
Prior art keywords
vector
speaker
atmosphere
emotion
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP2017182451A
Other languages
Japanese (ja)
Other versions
JP6982792B2 (en)
Inventor
Daisuke Hayashi (林 大介)
Kohei Morizono (廣平 森園)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Murata Manufacturing Co Ltd
Original Assignee
Murata Manufacturing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Murata Manufacturing Co Ltd
Priority to JP2017182451A
Publication of JP2019056879A
Application granted
Publication of JP6982792B2
Status: Active
Anticipated expiration


Landscapes

  • User Interface Of Digital Computer (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

PROBLEM TO BE SOLVED: To determine the atmosphere of a place from speakers' voices.
SOLUTION: A voice analysis system 10 comprises: a sound collecting device 11 that collects a first voice of a first speaker; a sound collecting device 12 that collects a second voice of a second speaker; a computing device 13 that computes, from the first voice, a first emotion vector, which is a vector expression whose components are a plurality of types of quantified emotions of the first speaker, computes, from the second voice, a second emotion vector, which is a vector expression whose components are a plurality of types of quantified emotions of the second speaker, and computes, using the first emotion vector and the second emotion vector, an atmosphere vector, which is a vector expression whose components are a plurality of types of quantified emotions indicating the atmosphere of the place between the first speaker and the second speaker; and a display device 14 that displays at least one component of the first emotion vector, at least one component of the second emotion vector, and the length of the atmosphere vector.
SELECTED DRAWING: Figure 2

Description

The present invention relates to a voice analysis system, a voice analysis method, and a voice analysis program.

Techniques are known that model features extracted from a speaker's voice in order to estimate emotion from the voice or to identify the speaker. For example, Patent Literature 1 describes a technique that obtains in advance the correlation between the mental activity and the emotion of two speakers in conversation and determines changes in each speaker's emotion based on the temporal change of the mental activity estimated from each speaker's voice. Patent Literature 2 describes a technique that detects a change of speaker using a feature vector whose components are feature quantities extracted from the speaker's voice.

[Patent Literature 1] Japanese Patent No. 5772448
[Patent Literature 2] Japanese Patent Laid-Open No. 2016-80916

Thus, although techniques for estimating emotion from a speaker's voice or for identifying a speaker have been established, no technique has yet been established for determining the atmosphere of a place from the voices of multiple speakers.

Therefore, an object of the present invention is to propose a voice analysis system that determines the atmosphere of a place from the voices of multiple speakers.

To solve the above problem, a voice analysis system according to the present invention comprises: a first sound collecting device that collects a first voice of a first speaker; a second sound collecting device that collects a second voice of a second speaker; a computing device that computes, from the first voice, a first emotion vector, which is a vector expression whose components are a plurality of types of quantified emotions of the first speaker, computes, from the second voice, a second emotion vector, which is a vector expression whose components are a plurality of types of quantified emotions of the second speaker, and computes, using the first emotion vector and the second emotion vector, an atmosphere vector, which is a vector expression whose components are a plurality of types of quantified emotions indicating the atmosphere of the place between the first speaker and the second speaker; and a display device that displays at least one component of the first emotion vector, at least one component of the second emotion vector, and the length of the atmosphere vector.

According to the voice analysis system of the present invention, the atmosphere of a place can be determined from the voices of multiple speakers.

FIG. 1 is an explanatory diagram outlining the analysis of the atmosphere of a place between speakers according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram showing the schematic configuration of a voice analysis system according to an embodiment of the present invention.
FIG. 3 is an explanatory diagram showing an example screen display of a display device according to an embodiment of the present invention.
FIG. 4 is an explanatory diagram of the process of determining the trend of the atmosphere of a place between speakers according to an embodiment of the present invention.
FIG. 5 is an explanatory diagram of the process of determining the trend of the atmosphere of a place between speakers according to an embodiment of the present invention.
FIG. 6 is a flowchart showing the processing flow of a voice analysis method according to an embodiment of the present invention.

Embodiments of the present invention are described below with reference to the drawings. Identical reference signs denote identical components, and duplicate description is omitted.
FIG. 1 outlines the process of analyzing the atmosphere of a place between speakers according to an embodiment of the present invention. Russell's circumplex model is known as a model of emotion. The circumplex model classifies emotions (for example, joy, anger, sorrow, and pleasure) along two axes: "pleasant-unpleasant" and "arousal-sleepiness". In the present embodiment, emotion vectors and an atmosphere vector are defined within the four quadrants of the circumplex model defined by these two axes. Here, an "emotion vector" is a vector expression whose components are a plurality of types of quantified emotions of a speaker. An "atmosphere vector" is a vector expression whose components are a plurality of types of quantified emotions indicating the atmosphere of the place between speakers.

An emotion vector defined within the four quadrants of the circumplex model has two components. One component indicates the quantified emotion related to "pleasant-unpleasant" and is referred to herein as the x component; the other indicates the quantified emotion related to "arousal-sleepiness" and is referred to herein as the y component. P1(x1, y1) denotes the emotion vector of one of the two speakers, and P2(x2, y2) denotes the emotion vector of the other. An atmosphere vector defined within the four quadrants of the circumplex model has the same kinds of components as an emotion vector. P3(x3, y3) denotes the atmosphere vector representing the atmosphere of the place between the two speakers. P3(x3, y3) can be obtained, for example, as the sum of P1(x1, y1) and P2(x2, y2), in which case x3 = x1 + x2 and y3 = y1 + y2. The length of the atmosphere vector indicates the strength of the emotion representing the atmosphere of the place between the speakers, and the trend of that atmosphere can be determined from the temporal change (time-series transition) of the length of the atmosphere vector.
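As a minimal sketch of this computation (not part of the patent text; the function names and example values are illustrative assumptions):

```python
import math

def atmosphere_vector(p1, p2):
    """Atmosphere vector P3 as the sum of two emotion vectors (x, y)."""
    return (p1[0] + p2[0], p1[1] + p2[1])

def vector_length(p):
    """Euclidean length of a vector, read as the strength of the atmosphere."""
    return math.hypot(p[0], p[1])

# Example: both speakers in the pleasant/aroused quadrant.
p1, p2 = (0.2, 0.5), (0.6, 0.4)
p3 = atmosphere_vector(p1, p2)   # (0.8, 0.9)
strength = vector_length(p3)     # ~1.20
```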

FIG. 2 is an explanatory diagram showing the schematic configuration of the voice analysis system 10 according to the embodiment of the present invention. The following description takes as an example the case of determining the atmosphere of a place between two speakers. The voice analysis system 10 comprises two sound collecting devices 11 and 12, a computing device 13, and a display device 14. The sound collecting device 11 collects the voice of one of the two speakers, and the sound collecting device 12 collects the voice of the other. The sound collecting devices 11 and 12 are, for example, microphones. Of the two speakers, one may be male and the other female. The computing device 13 is a computer system that computes the emotion vectors P1 and P2 of the two speakers and the atmosphere vector P3 between the two speakers from the voices collected by the sound collecting devices 11 and 12. As its hardware resources, the computing device 13 includes a processor and storage resources; the processor interprets and executes a voice analysis program stored in the storage resources, thereby computing the emotion vectors P1 and P2 and the atmosphere vector P3. The display device 14 displays at least one component (for example, the y component) of the emotion vector P1, at least one component (for example, the y component) of the emotion vector P2, and the length of the atmosphere vector P3.

Here, an example of a method for computing the emotion vectors P1 and P2 of each speaker from the voices of the two speakers is described. For each series of utterances by each speaker (each continuous utterance of at least a certain duration, not interrupted by a silent interval), the computing device 13 detects feature quantities of the voice collected by the respective sound collecting device 11, 12 and, from the degree of change in those feature quantities, obtains the x component and the y component indicating the plurality of types of quantified emotions of each speaker. Usable voice feature quantities include, for example, average sound pressure, sound pressure change, sound pressure distribution, average pitch (voice height), pitch change, and pitch distribution. Example methods for computing these feature quantities are given below (see the sketch after this list):
- Average sound pressure: add the absolute values of the voice signal level over a series of utterances and average the sum over the duration of that series of utterances.
- Sound pressure change: compute the difference between the maximum and minimum of the peaks of the absolute-valued sound pressure graph. The peaks can be found at the positive-to-negative zero crossings of the first derivative of the sound pressure graph.
- Sound pressure distribution: fit a histogram of the sound pressure to a normal distribution and take its standard deviation.
- Average pitch: the pitch (voice height, or fundamental frequency) can be obtained by a waveform method, a correlation method, a spectral method, or the like. Add the pitches of the voice signal over a series of utterances and average the sum over the duration of that series of utterances.
- Pitch change: compute the difference between the maximum and minimum pitch.
- Pitch distribution: fit a histogram of the pitch to a normal distribution and take its standard deviation.
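The sketch below is an illustrative assumption, not the patented implementation: it computes the six feature quantities from a mono waveform with NumPy, using a simple frame-based autocorrelation tracker for the correlation method of pitch estimation; all function names, frame sizes, and pitch bounds are assumptions, and the standard deviation of the data stands in for fitting a histogram to a normal distribution (it is the fitted normal's maximum-likelihood sigma).

```python
import numpy as np

def frames_of(x, frame_len=1024, hop=512):
    """Split a 1-D signal into overlapping frames (assumes len(x) >= frame_len)."""
    n = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop: i * hop + frame_len] for i in range(n)])

def pitch_autocorr(frame, sr, fmin=60.0, fmax=400.0):
    """Fundamental frequency of one frame via the correlation method."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = max(1, int(sr / fmax)), min(len(ac) - 1, int(sr / fmin))
    if lo >= hi or ac[0] <= 0:          # silent or too-short frame
        return np.nan
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

def voice_features(x, sr):
    """Six feature quantities over one series of utterances (one utterance run)."""
    level = np.abs(x)                   # absolute-valued sound pressure graph
    d = np.diff(level)
    # Peaks: positive-to-negative zero crossings of the first derivative.
    peaks = level[1:-1][(d[:-1] > 0) & (d[1:] <= 0)]
    pitches = np.array([pitch_autocorr(f, sr) for f in frames_of(x)])
    pitches = pitches[~np.isnan(pitches)]
    return {
        "avg_sound_pressure": float(level.mean()),
        "sound_pressure_change": float(peaks.max() - peaks.min()) if peaks.size else 0.0,
        "sound_pressure_distribution": float(level.std()),
        "avg_pitch": float(pitches.mean()) if pitches.size else float("nan"),
        "pitch_change": float(pitches.max() - pitches.min()) if pitches.size else 0.0,
        "pitch_distribution": float(pitches.std()) if pitches.size else 0.0,
    }
```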

The voice feature quantities are not limited to the above examples; for instance, dimensions 1 to 12 of the mel-frequency cepstrum coefficients (MFCC) and their variation ΔMFCC can be used as parameters.
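As an illustrative sketch only (the patent names no library; the use of librosa and the file name are assumptions), MFCC dimensions 1-12 and ΔMFCC could be extracted as follows:

```python
import librosa

# Load a mono recording and extract MFCC dimensions 1-12 plus their deltas.
y, sr = librosa.load("utterance.wav", sr=None, mono=True)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)[1:13]  # keep coefficients 1-12
delta_mfcc = librosa.feature.delta(mfcc)                  # dMFCC: temporal change
```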

FIG. 3 shows an example screen display of the display device 14 according to the embodiment of the present invention. The horizontal axis of this screen is the time axis; the vertical axis indicates the magnitude of a component of the emotion vectors P1 and P2 or the length of the atmosphere vector P3. Because the component magnitudes of the emotion vectors P1 and P2 and the length of the atmosphere vector P3 can change over time, the display device 14 displays these components and lengths at fixed intervals. Reference numeral 21 denotes a graph through the values, at each fixed interval, of at least one component of the emotion vector P1; from graph 21, the temporal change of that component can be read. Reference numeral 22 denotes the corresponding graph for at least one component of the emotion vector P2; from graph 22, the temporal change of that component can be read. Of the two components of the emotion vector P1, the component displayed on the display device 14 is, for example, the y component; the displayed component of the emotion vector P2 is, for example, the same component as that displayed for P1. Reference numeral 23 denotes a series of bar graphs whose heights correspond to the length of the atmosphere vector P3 at each fixed interval; the temporal change of the length of P3 can be read from the change in height of the bars 23. Because the length of the atmosphere vector P3 indicates the strength of the emotion representing the atmosphere of the place between the speakers, the strength of that emotion at any point in time can be read from the height of the bar 23 at that time. Reference numeral 24 denotes a graph through the values, at each fixed interval, of the moving average (simple moving average) of the length of the atmosphere vector P3; from graph 24, the temporal change of that moving average (the trend of the atmosphere of the place between the speakers) can be read.
Since it suffices to know the correspondence between each speaker's emotional trend and the trend of the atmosphere between the speakers, the length of the atmosphere vector P3 calculated from the values of graphs 21 and 22 and the height of the bars 23 need not match exactly. For example, the height of a bar 23 may be the sum of the values of graphs 21 and 22, or that sum multiplied by a correction coefficient. The correction coefficient may be varied according to the atmosphere of the place between the speakers.

As shown in FIGS. 4 and 5, the computing device 13 may determine the trend of the atmosphere of the place between the speakers from the relationship between a short-term moving average 25 and a long-term moving average 26 of the length of the atmosphere vector P3. For example, as shown in FIG. 4, the event in which the graph of the short-term moving average 25 crosses the graph of the long-term moving average 26 from below is called a golden cross and signals that the length of the atmosphere vector P3 is turning upward. When the computing device 13 detects a golden cross, it determines that the atmosphere of the place between the speakers is tending to liven up. Conversely, as shown in FIG. 5, the event in which the graph of the short-term moving average 25 crosses the graph of the long-term moving average 26 from above is called a dead cross and signals that the length of the atmosphere vector P3 is turning downward. When the computing device 13 detects a dead cross, it determines that the atmosphere of the place between the speakers is tending to cool down. This method of determining, from the relationship between the short-term moving average 25 and the long-term moving average 26, whether the atmosphere of the place between the speakers is tending to liven up or to cool down can be applied to services for the speakers. For example, when it is determined that the atmosphere of the place between the speakers is tending to liven up, the voice analysis system 10 may display on the display device 14 a message to the effect that a service should be provided (for example, serving food and drink) so that the atmosphere livens up further. Conversely, when it is determined that the atmosphere of the place between the speakers is tending to cool down, the voice analysis system 10 may display on the display device 14 a message proposing a change of speakers.
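A hedged sketch of the golden-cross / dead-cross detection (the window lengths and names are assumptions; the patent does not specify them):

```python
import numpy as np

def moving_average(values, window):
    """Simple moving average; output[i] averages the window ending at input i + window - 1."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(values, dtype=float), kernel, mode="valid")

def detect_crosses(lengths, short_window=5, long_window=20):
    """Return (sample_index, kind) events, kind being 'golden' or 'dead'."""
    short = moving_average(lengths, short_window)
    long_ = moving_average(lengths, long_window)
    short = short[long_window - short_window:]   # align both series on the time axis
    diff = short - long_
    events = []
    for i in range(1, len(diff)):
        if diff[i - 1] <= 0 < diff[i]:           # short crosses long from below
            events.append((i + long_window - 1, "golden"))
        elif diff[i - 1] >= 0 > diff[i]:         # short crosses long from above
            events.append((i + long_window - 1, "dead"))
    return events
```

A golden cross event would then trigger the service message, and a dead cross the speaker-change proposal, per the paragraph above.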

FIG. 6 is a flowchart showing the processing flow of the voice analysis method according to the embodiment of the present invention. The voice analysis program stored in the storage resources of the computing device 13 is a computer program that causes the computing device 13 to execute the voice analysis method according to the embodiment (a sketch combining these steps follows the list below).
The computing device 13 computes, from the voice of one of the two speakers, an emotion vector P1, which is a vector expression whose components are a plurality of types of quantified emotions of that speaker (step 61).
The computing device 13 computes, from the voice of the other speaker, an emotion vector P2, which is a vector expression whose components are a plurality of types of quantified emotions of that speaker (step 62).
Using the emotion vector P1 and the emotion vector P2, the computing device 13 computes an atmosphere vector P3, which is a vector expression whose components are a plurality of types of quantified emotions indicating the atmosphere of the place between the two speakers (step 63).
The computing device 13 displays at least one component of the emotion vector P1, at least one component of the emotion vector P2, and the length of the atmosphere vector P3 on the display device 14 (step 64).
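Putting steps 61-64 together as a skeletal pipeline, reusing voice_features, atmosphere_vector, and vector_length from the sketches above: emotion_vector() is a hypothetical stand-in, since the patent does not define the mapping from feature quantities to the (x, y) components; everything here is an illustrative assumption.

```python
def emotion_vector(features):
    """Hypothetical mapping from feature quantities to (x, y).
    The patent does not define this mapping; a real system would calibrate
    it against the pleasant-unpleasant and arousal-sleepiness axes."""
    x = features["avg_sound_pressure"]   # stand-in for the pleasant-unpleasant component
    y = features["pitch_change"]         # stand-in for the arousal-sleepiness component
    return (x, y)

def analyze(voice1, voice2, sr):
    """Steps 61-64: emotion vectors, atmosphere vector, values for display."""
    p1 = emotion_vector(voice_features(voice1, sr))   # step 61
    p2 = emotion_vector(voice_features(voice2, sr))   # step 62
    p3 = atmosphere_vector(p1, p2)                    # step 63
    return p1[1], p2[1], vector_length(p3)            # step 64: y components and |P3|
```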

According to the present embodiment, displaying at least one component of the emotion vector P1, at least one component of the emotion vector P2, and the length of the atmosphere vector P3 on the display device 14 makes it possible to read not only each speaker's emotion but also the atmosphere of the place between the speakers. Displaying the moving average of the length of the atmosphere vector P3 on the display device 14 makes it possible to read the trend of the atmosphere of the place between the speakers from the temporal change of that length. Furthermore, whether the atmosphere of the place between the speakers is tending to liven up or to cool down can be determined from the relationship between the short-term moving average 25 and the long-term moving average 26 of the length of the atmosphere vector P3.

Of the two speakers, one may be called the first speaker and the other the second speaker. The voice of the first speaker may be called the first voice, and the voice of the second speaker the second voice. The emotion vector P1 of the first speaker may be called the first emotion vector P1, and the emotion vector P2 of the second speaker the second emotion vector P2. The sound collecting device 11 that collects the first voice may be called the first sound collecting device 11, and the sound collecting device 12 that collects the second voice the second sound collecting device 12. The place between the two speakers may be called the place between the first speaker and the second speaker.

Although the above example determines the atmosphere of a place between two speakers, the voice analysis system 10 may determine the atmosphere of a place among three or more speakers. In that case, the voice analysis system 10 may include one sound collecting device per speaker, and the computing device 13 can obtain an atmosphere vector representing the atmosphere of the place among the three or more speakers by an operation on three or more emotion vectors (for example, adding three or more emotion vectors). The display device 14 may also display all components of the emotion vector P1 (for example, the x component and the y component), all components of the emotion vector P2 (for example, the x component and the y component), and the length of the atmosphere vector P3.

The embodiment described above is intended to facilitate understanding of the present invention and is not to be construed as limiting it. The present invention may be modified or improved without departing from its spirit, and equivalents thereof are included in the present invention. That is, design changes appropriately made to the embodiment by those skilled in the art are included in the scope of the present invention as long as they have the features of the present invention. The elements of the embodiment may be combined to the extent technically possible, and such combinations are also included in the scope of the present invention as long as they include the features of the present invention.

DESCRIPTION OF SYMBOLS: 10: voice analysis system; 11, 12: sound collecting devices; 13: computing device; 14: display device; 21, 22, 24: graphs; 23: bar graph; 25: short-term moving average; 26: long-term moving average

Claims (5)

1. A voice analysis system comprising:
a first sound collecting device that collects a first voice of a first speaker;
a second sound collecting device that collects a second voice of a second speaker;
a computing device that computes, from the first voice, a first emotion vector, which is a vector expression whose components are a plurality of types of quantified emotions of the first speaker, computes, from the second voice, a second emotion vector, which is a vector expression whose components are a plurality of types of quantified emotions of the second speaker, and computes, using the first emotion vector and the second emotion vector, an atmosphere vector, which is a vector expression whose components are a plurality of types of quantified emotions indicating the atmosphere of a place between the first speaker and the second speaker; and
a display device that displays at least one component of the first emotion vector, at least one component of the second emotion vector, and a length of the atmosphere vector.

2. The voice analysis system according to claim 1, wherein
the computing device computes a moving average of the length of the atmosphere vector, and
the display device displays the moving average.

3. The voice analysis system according to claim 2, wherein
the computing device computes a short-term moving average and a long-term moving average of the length of the atmosphere vector and determines, from the relationship between the short-term moving average and the long-term moving average, whether the atmosphere of the place is tending to liven up or to cool down.

4. A voice analysis method in which a computer system executes:
computing, from a voice of a first speaker, a first emotion vector, which is a vector expression whose components are a plurality of types of quantified emotions of the first speaker;
computing, from a voice of a second speaker, a second emotion vector, which is a vector expression whose components are a plurality of types of quantified emotions of the second speaker;
computing, using the first emotion vector and the second emotion vector, an atmosphere vector, which is a vector expression whose components are a plurality of types of quantified emotions indicating the atmosphere of a place between the first speaker and the second speaker; and
displaying at least one component of the first emotion vector, at least one component of the second emotion vector, and a length of the atmosphere vector on a display device.

5. A voice analysis program causing a computer system to execute:
computing, from a voice of a first speaker, a first emotion vector, which is a vector expression whose components are a plurality of types of quantified emotions of the first speaker;
computing, from a voice of a second speaker, a second emotion vector, which is a vector expression whose components are a plurality of types of quantified emotions of the second speaker;
computing, using the first emotion vector and the second emotion vector, an atmosphere vector, which is a vector expression whose components are a plurality of types of quantified emotions indicating the atmosphere of a place between the first speaker and the second speaker; and
displaying at least one component of the first emotion vector, at least one component of the second emotion vector, and a length of the atmosphere vector on a display device.
JP2017182451A 2017-09-22 2017-09-22 Voice analysis system, voice analysis method, and voice analysis program Active JP6982792B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2017182451A JP6982792B2 (en) 2017-09-22 2017-09-22 Voice analysis system, voice analysis method, and voice analysis program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2017182451A JP6982792B2 (en) 2017-09-22 2017-09-22 Voice analysis system, voice analysis method, and voice analysis program

Publications (2)

Publication Number Publication Date
JP2019056879A (en) 2019-04-11
JP6982792B2 (en) 2021-12-17

Family

ID=66106409

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2017182451A Active JP6982792B2 (en) 2017-09-22 2017-09-22 Voice analysis system, voice analysis method, and voice analysis program

Country Status (1)

Country Link
JP (1) JP6982792B2 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011186521A (en) * 2010-03-04 2011-09-22 Nec Corp Emotion estimation device and emotion estimation method
JP2016007363A (en) * 2014-06-25 2016-01-18 日本電信電話株式会社 Group feeling estimation device, group feeling estimation method, and group feeling estimation program

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2020184243A (en) * 2019-05-09 2020-11-12 株式会社Empath Business support apparatus, business support method, and business support program
JPWO2021019643A1 (en) * 2019-07-29 2021-02-04

Also Published As

Publication number Publication date
JP6982792B2 (en) 2021-12-17

Similar Documents

Publication Publication Date Title
US10269374B2 (en) Rating speech effectiveness based on speaking mode
JP6171617B2 (en) Response target speech determination apparatus, response target speech determination method, and response target speech determination program
US9286889B2 (en) Improving voice communication over a network
JP4851447B2 (en) Speech analysis apparatus, speech analysis method, and speech analysis program for detecting pitch frequency
CN111739559B (en) Speech early warning method, device, equipment and storage medium
JP2017508188A (en) A method for adaptive spoken dialogue
JP4746533B2 (en) Multi-sound source section determination method, method, program and recording medium thereof
JP4587854B2 (en) Emotion analysis device, emotion analysis program, program storage medium
JP2007286377A (en) Answer evaluating device and method thereof, and program and recording medium therefor
JP2011186384A (en) Noise estimation device, noise reduction system, noise estimation method and program
JPWO2020013296A1 (en) A device for estimating mental and nervous system diseases
JP6982792B2 (en) Voice analysis system, voice analysis method, and voice analysis program
Godin et al. Glottal waveform analysis of physical task stress speech
JP5803125B2 (en) Suppression state detection device and program by voice
JPH08286693A (en) Information processing device
JP6565500B2 (en) Utterance state determination device, utterance state determination method, and determination program
JP2007316330A (en) Rhythm identifying device and method, voice recognition device and method
JP2017196115A (en) Cognitive function evaluation device, cognitive function evaluation method, and program
JP2012024527A (en) Device for determining proficiency level of abdominal breathing
US11404064B2 (en) Information processing apparatus and speech analysis method
JPWO2016207951A1 (en) Shunt sound analysis device, shunt sound analysis method, computer program, and recording medium
JP2010026323A (en) Speech speed detection device
JP2007328288A (en) Rhythm identification device and method, and voice recognition device and method
JP7334467B2 (en) Response support device and response support method
JP4913666B2 (en) Cough detection device and cough detection program

Legal Events

Date Code Title Description
A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20200609

A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20210423

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20210528

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20210726

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20211022

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20211104

R150 Certificate of patent or registration of utility model

Ref document number: 6982792

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150