JP2019056879A - Voice analysis system, voice analysis method, and voice analysis program - Google Patents


Info

Publication number
JP2019056879A
Authority
JP
Japan
Prior art keywords
vector
speaker
atmosphere
emotion
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP2017182451A
Other languages
Japanese (ja)
Other versions
JP6982792B2 (en)
Inventor
Daisuke Hayashi (林 大介)
Kohei Morizono (廣平 森園)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Murata Manufacturing Co Ltd
Original Assignee
Murata Manufacturing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Murata Manufacturing Co Ltd
Priority to JP2017182451A
Publication of JP2019056879A
Application granted
Publication of JP6982792B2
Status: Active
Anticipated expiration


Landscapes

  • User Interface Of Digital Computer (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

PROBLEM TO BE SOLVED: To determine the atmosphere of a place from speakers' voices.
SOLUTION: A voice analysis system 10 comprises: a sound collecting device 11 that collects a first voice of a first speaker; a sound collecting device 12 that collects a second voice of a second speaker; a computing device 13 that computes, from the first voice, a first emotion vector, which is a vector expression whose components are a plurality of types of quantified emotions of the first speaker, computes, from the second voice, a second emotion vector, which is a vector expression whose components are a plurality of types of quantified emotions of the second speaker, and computes, using the first emotion vector and the second emotion vector, an atmosphere vector, which is a vector expression whose components are a plurality of types of quantified emotions indicating the atmosphere of the place between the first speaker and the second speaker; and a display device 14 that displays at least one component of the first emotion vector, at least one component of the second emotion vector, and the length of the atmosphere vector.
SELECTED DRAWING: Figure 2

Description

The present invention relates to a voice analysis system, a voice analysis method, and a voice analysis program.

Techniques are known that model features extracted from a speaker's voice in order to estimate emotion from the voice or to identify the speaker. For example, Patent Literature 1 describes a technique that obtains in advance the correlation between the mental activity and the emotion of two speakers in conversation and determines changes in each speaker's emotion based on the temporal change of the mental activity estimated from each speaker's voice. Patent Literature 2 describes a technique that detects a change of speaker using a feature vector whose components are feature quantities extracted from the speaker's voice.

[Patent Literature 1] Japanese Patent No. 5772448
[Patent Literature 2] Japanese Patent Laid-Open No. 2016-80916

Thus, although techniques for estimating emotion from a speaker's voice or for identifying a speaker have been established, no technique has yet been established for determining the atmosphere of a place from the voices of multiple speakers.

Therefore, an object of the present invention is to propose a voice analysis system that determines the atmosphere of a place from the voices of multiple speakers.

To solve the above problem, a voice analysis system according to the present invention comprises: a first sound collecting device that collects a first voice of a first speaker; a second sound collecting device that collects a second voice of a second speaker; a computing device that computes, from the first voice, a first emotion vector, which is a vector expression whose components are a plurality of types of quantified emotions of the first speaker, computes, from the second voice, a second emotion vector, which is a vector expression whose components are a plurality of types of quantified emotions of the second speaker, and computes, using the first emotion vector and the second emotion vector, an atmosphere vector, which is a vector expression whose components are a plurality of types of quantified emotions indicating the atmosphere of the place between the first speaker and the second speaker; and a display device that displays at least one component of the first emotion vector, at least one component of the second emotion vector, and the length of the atmosphere vector.

According to the voice analysis system of the present invention, the atmosphere of a place can be determined from the voices of multiple speakers.

FIG. 1 is an explanatory diagram outlining the analysis of the atmosphere of a place between speakers according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram showing the schematic configuration of a voice analysis system according to an embodiment of the present invention.
FIG. 3 is an explanatory diagram showing an example screen display of a display device according to an embodiment of the present invention.
FIG. 4 is an explanatory diagram of the process of determining the trend of the atmosphere of a place between speakers according to an embodiment of the present invention.
FIG. 5 is an explanatory diagram of the process of determining the trend of the atmosphere of a place between speakers according to an embodiment of the present invention.
FIG. 6 is a flowchart showing the processing flow of a voice analysis method according to an embodiment of the present invention.

Embodiments of the present invention are described below with reference to the drawings. Identical reference signs denote identical components, and duplicate description is omitted.
FIG. 1 outlines the process of analyzing the atmosphere of a place between speakers according to an embodiment of the present invention. Russell's circumplex model is known as a model of emotion. The circumplex model classifies emotions (for example, joy, anger, sorrow, and pleasure) along two axes: "pleasant-unpleasant" and "arousal-sleepiness". In the present embodiment, emotion vectors and an atmosphere vector are defined within the four quadrants of the circumplex model defined by these two axes. Here, an "emotion vector" is a vector expression whose components are a plurality of types of quantified emotions of a speaker. An "atmosphere vector" is a vector expression whose components are a plurality of types of quantified emotions indicating the atmosphere of the place between speakers.

An emotion vector defined within the four quadrants of the circumplex model has two components. One component indicates the quantified emotion related to "pleasant-unpleasant" and is referred to herein as the x component; the other indicates the quantified emotion related to "arousal-sleepiness" and is referred to herein as the y component. P1(x1, y1) denotes the emotion vector of one of the two speakers, and P2(x2, y2) denotes the emotion vector of the other. An atmosphere vector defined within the four quadrants of the circumplex model has the same kinds of components as an emotion vector. P3(x3, y3) denotes the atmosphere vector representing the atmosphere of the place between the two speakers. P3(x3, y3) can be obtained, for example, as the sum of P1(x1, y1) and P2(x2, y2), in which case x3 = x1 + x2 and y3 = y1 + y2. The length of the atmosphere vector indicates the strength of the emotion representing the atmosphere of the place between the speakers, and the trend of that atmosphere can be determined from the temporal change (time-series transition) of the length of the atmosphere vector.
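As a minimal sketch of this computation (not part of the patent text; the function names and example values are illustrative assumptions):

```python
import math

def atmosphere_vector(p1, p2):
    """Atmosphere vector P3 as the sum of two emotion vectors (x, y)."""
    return (p1[0] + p2[0], p1[1] + p2[1])

def vector_length(p):
    """Euclidean length of a vector, read as the strength of the atmosphere."""
    return math.hypot(p[0], p[1])

# Example: both speakers in the pleasant/aroused quadrant.
p1, p2 = (0.2, 0.5), (0.6, 0.4)
p3 = atmosphere_vector(p1, p2)   # (0.8, 0.9)
strength = vector_length(p3)     # ~1.20
```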

FIG. 2 is an explanatory diagram showing the schematic configuration of the voice analysis system 10 according to the embodiment of the present invention. The following description takes as an example the case of determining the atmosphere of a place between two speakers. The voice analysis system 10 comprises two sound collecting devices 11 and 12, a computing device 13, and a display device 14. The sound collecting device 11 collects the voice of one of the two speakers, and the sound collecting device 12 collects the voice of the other. The sound collecting devices 11 and 12 are, for example, microphones. Of the two speakers, one may be male and the other female. The computing device 13 is a computer system that computes the emotion vectors P1 and P2 of the two speakers and the atmosphere vector P3 between the two speakers from the voices collected by the sound collecting devices 11 and 12. As its hardware resources, the computing device 13 includes a processor and storage resources; the processor interprets and executes a voice analysis program stored in the storage resources, thereby computing the emotion vectors P1 and P2 and the atmosphere vector P3. The display device 14 displays at least one component (for example, the y component) of the emotion vector P1, at least one component (for example, the y component) of the emotion vector P2, and the length of the atmosphere vector P3.

Here, an example of a method for computing the emotion vectors P1 and P2 of each speaker from the voices of the two speakers is described. For each series of utterances by each speaker (each continuous utterance of at least a certain duration, not interrupted by a silent interval), the computing device 13 detects feature quantities of the voice collected by the respective sound collecting device 11, 12 and, from the degree of change in those feature quantities, obtains the x component and the y component indicating the plurality of types of quantified emotions of each speaker. Usable voice feature quantities include, for example, average sound pressure, sound pressure change, sound pressure distribution, average pitch (voice height), pitch change, and pitch distribution. Example methods for computing these feature quantities are given below (see the sketch after this list):
- Average sound pressure: add the absolute values of the voice signal level over a series of utterances and average the sum over the duration of that series of utterances.
- Sound pressure change: compute the difference between the maximum and minimum of the peaks of the absolute-valued sound pressure graph. The peaks can be found at the positive-to-negative zero crossings of the first derivative of the sound pressure graph.
- Sound pressure distribution: fit a histogram of the sound pressure to a normal distribution and take its standard deviation.
- Average pitch: the pitch (voice height, or fundamental frequency) can be obtained by a waveform method, a correlation method, a spectral method, or the like. Add the pitches of the voice signal over a series of utterances and average the sum over the duration of that series of utterances.
- Pitch change: compute the difference between the maximum and minimum pitch.
- Pitch distribution: fit a histogram of the pitch to a normal distribution and take its standard deviation.
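The sketch below is an illustrative assumption, not the patented implementation: it computes the six feature quantities from a mono waveform with NumPy, using a simple frame-based autocorrelation tracker for the correlation method of pitch estimation; all function names, frame sizes, and pitch bounds are assumptions, and the standard deviation of the data stands in for fitting a histogram to a normal distribution (it is the fitted normal's maximum-likelihood sigma).

```python
import numpy as np

def frames_of(x, frame_len=1024, hop=512):
    """Split a 1-D signal into overlapping frames (assumes len(x) >= frame_len)."""
    n = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop: i * hop + frame_len] for i in range(n)])

def pitch_autocorr(frame, sr, fmin=60.0, fmax=400.0):
    """Fundamental frequency of one frame via the correlation method."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = max(1, int(sr / fmax)), min(len(ac) - 1, int(sr / fmin))
    if lo >= hi or ac[0] <= 0:          # silent or too-short frame
        return np.nan
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

def voice_features(x, sr):
    """Six feature quantities over one series of utterances (one utterance run)."""
    level = np.abs(x)                   # absolute-valued sound pressure graph
    d = np.diff(level)
    # Peaks: positive-to-negative zero crossings of the first derivative.
    peaks = level[1:-1][(d[:-1] > 0) & (d[1:] <= 0)]
    pitches = np.array([pitch_autocorr(f, sr) for f in frames_of(x)])
    pitches = pitches[~np.isnan(pitches)]
    return {
        "avg_sound_pressure": float(level.mean()),
        "sound_pressure_change": float(peaks.max() - peaks.min()) if peaks.size else 0.0,
        "sound_pressure_distribution": float(level.std()),
        "avg_pitch": float(pitches.mean()) if pitches.size else float("nan"),
        "pitch_change": float(pitches.max() - pitches.min()) if pitches.size else 0.0,
        "pitch_distribution": float(pitches.std()) if pitches.size else 0.0,
    }
```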

The voice feature quantities are not limited to the above examples; for instance, dimensions 1 to 12 of the mel-frequency cepstrum coefficients (MFCC) and their variation ΔMFCC can be used as parameters.
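As an illustrative sketch only (the patent names no library; the use of librosa and the file name are assumptions), MFCC dimensions 1-12 and ΔMFCC could be extracted as follows:

```python
import librosa

# Load a mono recording and extract MFCC dimensions 1-12 plus their deltas.
y, sr = librosa.load("utterance.wav", sr=None, mono=True)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)[1:13]  # keep coefficients 1-12
delta_mfcc = librosa.feature.delta(mfcc)                  # dMFCC: temporal change
```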

FIG. 3 shows an example screen display of the display device 14 according to the embodiment of the present invention. The horizontal axis of this screen is the time axis; the vertical axis indicates the magnitude of a component of the emotion vectors P1 and P2 or the length of the atmosphere vector P3. Because the component magnitudes of the emotion vectors P1 and P2 and the length of the atmosphere vector P3 can change over time, the display device 14 displays these components and lengths at fixed intervals. Reference numeral 21 denotes a graph through the values, at each fixed interval, of at least one component of the emotion vector P1; from graph 21, the temporal change of that component can be read. Reference numeral 22 denotes the corresponding graph for at least one component of the emotion vector P2; from graph 22, the temporal change of that component can be read. Of the two components of the emotion vector P1, the component displayed on the display device 14 is, for example, the y component; the displayed component of the emotion vector P2 is, for example, the same component as that displayed for P1. Reference numeral 23 denotes a series of bar graphs whose heights correspond to the length of the atmosphere vector P3 at each fixed interval; the temporal change of the length of P3 can be read from the change in height of the bars 23. Because the length of the atmosphere vector P3 indicates the strength of the emotion representing the atmosphere of the place between the speakers, the strength of that emotion at any point in time can be read from the height of the bar 23 at that time. Reference numeral 24 denotes a graph through the values, at each fixed interval, of the moving average (simple moving average) of the length of the atmosphere vector P3; from graph 24, the temporal change of that moving average (the trend of the atmosphere of the place between the speakers) can be read.
Since it suffices to know the correspondence between each speaker's emotional trend and the trend of the atmosphere between the speakers, the length of the atmosphere vector P3 calculated from the values of graphs 21 and 22 and the height of the bars 23 need not match exactly. For example, the height of a bar 23 may be the sum of the values of graphs 21 and 22, or that sum multiplied by a correction coefficient. The correction coefficient may be varied according to the atmosphere of the place between the speakers.

As shown in FIGS. 4 and 5, the computing device 13 may determine the trend of the atmosphere of the place between the speakers from the relationship between a short-term moving average 25 and a long-term moving average 26 of the length of the atmosphere vector P3. For example, as shown in FIG. 4, the event in which the graph of the short-term moving average 25 crosses the graph of the long-term moving average 26 from below is called a golden cross and signals that the length of the atmosphere vector P3 is turning upward. When the computing device 13 detects a golden cross, it determines that the atmosphere of the place between the speakers is tending to liven up. Conversely, as shown in FIG. 5, the event in which the graph of the short-term moving average 25 crosses the graph of the long-term moving average 26 from above is called a dead cross and signals that the length of the atmosphere vector P3 is turning downward. When the computing device 13 detects a dead cross, it determines that the atmosphere of the place between the speakers is tending to cool down. This method of determining, from the relationship between the short-term moving average 25 and the long-term moving average 26, whether the atmosphere of the place between the speakers is tending to liven up or to cool down can be applied to services for the speakers. For example, when it is determined that the atmosphere of the place between the speakers is tending to liven up, the voice analysis system 10 may display on the display device 14 a message to the effect that a service should be provided (for example, serving food and drink) so that the atmosphere livens up further. Conversely, when it is determined that the atmosphere of the place between the speakers is tending to cool down, the voice analysis system 10 may display on the display device 14 a message proposing a change of speakers.
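A hedged sketch of the golden-cross / dead-cross detection (the window lengths and names are assumptions; the patent does not specify them):

```python
import numpy as np

def moving_average(values, window):
    """Simple moving average; output[i] averages the window ending at input i + window - 1."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(values, dtype=float), kernel, mode="valid")

def detect_crosses(lengths, short_window=5, long_window=20):
    """Return (sample_index, kind) events, kind being 'golden' or 'dead'."""
    short = moving_average(lengths, short_window)
    long_ = moving_average(lengths, long_window)
    short = short[long_window - short_window:]   # align both series on the time axis
    diff = short - long_
    events = []
    for i in range(1, len(diff)):
        if diff[i - 1] <= 0 < diff[i]:           # short crosses long from below
            events.append((i + long_window - 1, "golden"))
        elif diff[i - 1] >= 0 > diff[i]:         # short crosses long from above
            events.append((i + long_window - 1, "dead"))
    return events
```

A golden cross event would then trigger the service message, and a dead cross the speaker-change proposal, per the paragraph above.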

FIG. 6 is a flowchart showing the processing flow of the voice analysis method according to the embodiment of the present invention. The voice analysis program stored in the storage resources of the computing device 13 is a computer program that causes the computing device 13 to execute the voice analysis method according to the embodiment (a sketch combining these steps follows the list below).
The computing device 13 computes, from the voice of one of the two speakers, an emotion vector P1, which is a vector expression whose components are a plurality of types of quantified emotions of that speaker (step 61).
The computing device 13 computes, from the voice of the other speaker, an emotion vector P2, which is a vector expression whose components are a plurality of types of quantified emotions of that speaker (step 62).
Using the emotion vector P1 and the emotion vector P2, the computing device 13 computes an atmosphere vector P3, which is a vector expression whose components are a plurality of types of quantified emotions indicating the atmosphere of the place between the two speakers (step 63).
The computing device 13 displays at least one component of the emotion vector P1, at least one component of the emotion vector P2, and the length of the atmosphere vector P3 on the display device 14 (step 64).
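Putting steps 61-64 together as a skeletal pipeline, reusing voice_features, atmosphere_vector, and vector_length from the sketches above: emotion_vector() is a hypothetical stand-in, since the patent does not define the mapping from feature quantities to the (x, y) components; everything here is an illustrative assumption.

```python
def emotion_vector(features):
    """Hypothetical mapping from feature quantities to (x, y).
    The patent does not define this mapping; a real system would calibrate
    it against the pleasant-unpleasant and arousal-sleepiness axes."""
    x = features["avg_sound_pressure"]   # stand-in for the pleasant-unpleasant component
    y = features["pitch_change"]         # stand-in for the arousal-sleepiness component
    return (x, y)

def analyze(voice1, voice2, sr):
    """Steps 61-64: emotion vectors, atmosphere vector, values for display."""
    p1 = emotion_vector(voice_features(voice1, sr))   # step 61
    p2 = emotion_vector(voice_features(voice2, sr))   # step 62
    p3 = atmosphere_vector(p1, p2)                    # step 63
    return p1[1], p2[1], vector_length(p3)            # step 64: y components and |P3|
```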

According to the present embodiment, displaying at least one component of the emotion vector P1, at least one component of the emotion vector P2, and the length of the atmosphere vector P3 on the display device 14 makes it possible to read not only each speaker's emotion but also the atmosphere of the place between the speakers. Displaying the moving average of the length of the atmosphere vector P3 on the display device 14 makes it possible to read the trend of the atmosphere of the place between the speakers from the temporal change of that length. Furthermore, whether the atmosphere of the place between the speakers is tending to liven up or to cool down can be determined from the relationship between the short-term moving average 25 and the long-term moving average 26 of the length of the atmosphere vector P3.

Of the two speakers, one may be called the first speaker and the other the second speaker. The voice of the first speaker may be called the first voice, and the voice of the second speaker the second voice. The emotion vector P1 of the first speaker may be called the first emotion vector P1, and the emotion vector P2 of the second speaker the second emotion vector P2. The sound collecting device 11 that collects the first voice may be called the first sound collecting device 11, and the sound collecting device 12 that collects the second voice the second sound collecting device 12. The place between the two speakers may be called the place between the first speaker and the second speaker.

Although the above example determines the atmosphere of a place between two speakers, the voice analysis system 10 may determine the atmosphere of a place among three or more speakers. In that case, the voice analysis system 10 may include one sound collecting device per speaker, and the computing device 13 can obtain an atmosphere vector representing the atmosphere of the place among the three or more speakers by an operation on three or more emotion vectors (for example, adding three or more emotion vectors). The display device 14 may also display all components of the emotion vector P1 (for example, the x component and the y component), all components of the emotion vector P2 (for example, the x component and the y component), and the length of the atmosphere vector P3.

The embodiment described above is intended to facilitate understanding of the present invention and is not to be construed as limiting it. The present invention may be modified or improved without departing from its spirit, and equivalents thereof are included in the present invention. That is, design changes appropriately made to the embodiment by those skilled in the art are included in the scope of the present invention as long as they have the features of the present invention. The elements of the embodiment may be combined to the extent technically possible, and such combinations are also included in the scope of the present invention as long as they include the features of the present invention.

DESCRIPTION OF SYMBOLS: 10: voice analysis system; 11, 12: sound collecting devices; 13: computing device; 14: display device; 21, 22, 24: graphs; 23: bar graph; 25: short-term moving average; 26: long-term moving average

Claims (5)

1. A voice analysis system comprising:
a first sound collecting device that collects a first voice of a first speaker;
a second sound collecting device that collects a second voice of a second speaker;
a computing device that computes, from the first voice, a first emotion vector, which is a vector expression whose components are a plurality of types of quantified emotions of the first speaker, computes, from the second voice, a second emotion vector, which is a vector expression whose components are a plurality of types of quantified emotions of the second speaker, and computes, using the first emotion vector and the second emotion vector, an atmosphere vector, which is a vector expression whose components are a plurality of types of quantified emotions indicating the atmosphere of a place between the first speaker and the second speaker; and
a display device that displays at least one component of the first emotion vector, at least one component of the second emotion vector, and a length of the atmosphere vector.

2. The voice analysis system according to claim 1, wherein
the computing device computes a moving average of the length of the atmosphere vector, and
the display device displays the moving average.

3. The voice analysis system according to claim 2, wherein
the computing device computes a short-term moving average and a long-term moving average of the length of the atmosphere vector and determines, from the relationship between the short-term moving average and the long-term moving average, whether the atmosphere of the place is tending to liven up or to cool down.

4. A voice analysis method in which a computer system executes:
computing, from a voice of a first speaker, a first emotion vector, which is a vector expression whose components are a plurality of types of quantified emotions of the first speaker;
computing, from a voice of a second speaker, a second emotion vector, which is a vector expression whose components are a plurality of types of quantified emotions of the second speaker;
computing, using the first emotion vector and the second emotion vector, an atmosphere vector, which is a vector expression whose components are a plurality of types of quantified emotions indicating the atmosphere of a place between the first speaker and the second speaker; and
displaying at least one component of the first emotion vector, at least one component of the second emotion vector, and a length of the atmosphere vector on a display device.

5. A voice analysis program causing a computer system to execute:
computing, from a voice of a first speaker, a first emotion vector, which is a vector expression whose components are a plurality of types of quantified emotions of the first speaker;
computing, from a voice of a second speaker, a second emotion vector, which is a vector expression whose components are a plurality of types of quantified emotions of the second speaker;
computing, using the first emotion vector and the second emotion vector, an atmosphere vector, which is a vector expression whose components are a plurality of types of quantified emotions indicating the atmosphere of a place between the first speaker and the second speaker; and
displaying at least one component of the first emotion vector, at least one component of the second emotion vector, and a length of the atmosphere vector on a display device.
JP2017182451A 2017-09-22 2017-09-22 Voice analysis system, voice analysis method, and voice analysis program Active JP6982792B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2017182451A JP6982792B2 (en) 2017-09-22 2017-09-22 Voice analysis system, voice analysis method, and voice analysis program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2017182451A JP6982792B2 (en) 2017-09-22 2017-09-22 Voice analysis system, voice analysis method, and voice analysis program

Publications (2)

Publication Number Publication Date
JP2019056879A (en) 2019-04-11
JP6982792B2 (en) 2021-12-17

Family

ID=66106409

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2017182451A Active JP6982792B2 (en) 2017-09-22 2017-09-22 Voice analysis system, voice analysis method, and voice analysis program

Country Status (1)

Country Link
JP (1) JP6982792B2 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011186521A (en) * 2010-03-04 2011-09-22 Nec Corp Emotion estimation device and emotion estimation method
JP2016007363A (en) * 2014-06-25 2016-01-18 日本電信電話株式会社 Group feeling estimation device, group feeling estimation method, and group feeling estimation program

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2020184243A (en) * 2019-05-09 2020-11-12 株式会社Empath Business support apparatus, business support method, and business support program
JPWO2021019643A1 (en) * 2019-07-29 2021-02-04

Also Published As

Publication number Publication date
JP6982792B2 (en) 2021-12-17

Similar Documents

Publication Publication Date Title
US10269374B2 (en) Rating speech effectiveness based on speaking mode
JP6171617B2 (en) Response target speech determination apparatus, response target speech determination method, and response target speech determination program
US9286889B2 (en) Improving voice communication over a network
JP4851447B2 (en) Speech analysis apparatus, speech analysis method, and speech analysis program for detecting pitch frequency
CN111739559B (en) Speech early warning method, device, equipment and storage medium
JP2017508188A (en) A method for adaptive spoken dialogue
JP4746533B2 (en) Multi-sound source section determination method, method, program and recording medium thereof
JP4587854B2 (en) Emotion analysis device, emotion analysis program, program storage medium
JP2007286377A (en) Answer evaluating device and method thereof, and program and recording medium therefor
JP2011186384A (en) Noise estimation device, noise reduction system, noise estimation method and program
JPWO2020013296A1 (en) A device for estimating mental and nervous system diseases
JP6982792B2 (en) Voice analysis system, voice analysis method, and voice analysis program
Godin et al. Glottal waveform analysis of physical task stress speech
JP5803125B2 (en) Suppression state detection device and program by voice
JPH08286693A (en) Information processing device
JP6565500B2 (en) Utterance state determination device, utterance state determination method, and determination program
JP2007316330A (en) Rhythm identifying device and method, voice recognition device and method
JP2017196115A (en) Cognitive function evaluation device, cognitive function evaluation method, and program
JP2012024527A (en) Device for determining proficiency level of abdominal breathing
US11404064B2 (en) Information processing apparatus and speech analysis method
JPWO2016207951A1 (en) Shunt sound analysis device, shunt sound analysis method, computer program, and recording medium
JP2010026323A (en) Speech speed detection device
JP2007328288A (en) Rhythm identification device and method, and voice recognition device and method
JP7334467B2 (en) Response support device and response support method
JP4913666B2 (en) Cough detection device and cough detection program

Legal Events

Date Code Title Description
A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20200609

A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20210423

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20210528

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20210726

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20211022

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20211104

R150 Certificate of patent or registration of utility model

Ref document number: 6982792

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150