JP6589040B1

JP6589040B1 - Speech analysis apparatus, speech analysis method, speech analysis program, and speech analysis system

Info

Publication number: JP6589040B1
Application number: JP2018502279A
Authority: JP
Inventors: 武志水本; 哲也菅原
Original assignee: Hylable Inc
Current assignee: Hylable Inc
Priority date: 2018-01-16
Filing date: 2018-01-16
Publication date: 2019-10-09
Anticipated expiration: 2038-01-16
Also published as: WO2019142231A1; JPWO2019142231A1

Abstract

An object of the present invention is to provide a speech analysis apparatus, a speech analysis method, a speech analysis program, and a speech analysis system that can output information for performing analysis based on temporal changes in the speech amounts of participants in a discussion. A speech analysis apparatus 100 according to an embodiment of the present invention includes: a speech acquisition unit 112 that acquires speech uttered by a plurality of participants; an analysis unit 114 that identifies, for each of the plurality of participants, the speech amount over time in the speech; a section setting unit 115 that sets sections in the speech based on input from a user; and an output unit 116 that outputs a graph in which the temporal changes in the participants' speech amounts are stacked on one another, together with information indicating the sections in the graph.

Description

The present invention relates to a speech analysis apparatus, a speech analysis method, a speech analysis program, and a speech analysis system for analyzing speech.

The Harkness method is a known technique for analyzing discussions in group learning and meetings (see, for example, Non-Patent Document 1). In the Harkness method, the transitions between participants' utterances are recorded as lines, which makes it possible to analyze each participant's contribution to the discussion and their relationships with the other participants. The Harkness method can also be applied effectively to active learning, in which students learn on their own initiative.

Non-Patent Document 1: Paul Sevigny, "Extreme Discussion Circles: Preparing ESL Students for 'The Harkness Method'", Polyglossia, No. 23, pp. 181-191, Center for Language Education, Ritsumeikan Asia Pacific University, October 2012.

However, because the Harkness method captures speech tendencies over the entire period from the start to the end of a discussion, it cannot show how each participant's speech amount changes over time. This makes it difficult to perform analysis based on temporal changes in each participant's speech amount.

The present invention has been made in view of these points, and its object is to provide a speech analysis apparatus, a speech analysis method, a speech analysis program, and a speech analysis system that can output information for performing analysis based on temporal changes in the speech amounts of participants in a discussion.

A speech analysis apparatus according to a first aspect of the present invention includes: an acquisition unit that acquires speech uttered by a plurality of participants; an analysis unit that identifies a speech amount over time for each of the plurality of participants in the speech; a section setting unit that sets a section in the speech based on input from a user; and an output unit that outputs a graph in which the temporal changes in the speech amounts of the plurality of participants are stacked on one another, together with information indicating the section in the graph.

The output unit may output, as the information indicating the section, a position on the graph corresponding to the time at which one section switches to another.

The section setting unit may set the section based on at least one of an operation on a communication terminal that communicates with the speech analysis apparatus, an operation on a sound collection device that acquires the speech, and a predetermined sound included in the speech.

The output unit may output the graph in which the temporal changes in the speech amounts are stacked in ascending order of the degree of variation in the speech amount calculated for each of the plurality of participants.

The output unit may output the graph in which, for each section, the temporal changes in the speech amounts are stacked in ascending order of the degree of variation in the per-section speech amount calculated for each of the plurality of participants.
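The stacking order described in the two preceding paragraphs can be sketched as follows. The patent does not say which measure of variation is used or which end of the stack comes first; this sketch assumes population variance and places the least-variable participant at the bottom. `stack_by_variance` is a hypothetical helper, not part of the disclosure.

```python
from statistics import pvariance


def stack_by_variance(amounts):
    """Stack speech-amount series in ascending order of their variance.

    amounts: participant -> list of speech amounts (one value per time step,
    all lists the same length). Returns the stacking order (bottom first) and,
    per participant, the cumulative upper edge of that participant's band.
    """
    # Least-variable participant first (assumed to be drawn at the bottom).
    order = sorted(amounts, key=lambda p: pvariance(amounts[p]))
    length = len(next(iter(amounts.values())))
    running = [0.0] * length
    stacked = {}
    for p in order:
        # Each band sits on top of the sum of all bands below it.
        running = [base + v for base, v in zip(running, amounts[p])]
        stacked[p] = running
    return order, stacked
```

For the per-section variant, the same function could be called once per section on the slice of each participant's series that falls inside that section.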

The output unit may output a plurality of graphs for the same section set in each of a plurality of speech recordings.

In addition to the graph and the information indicating the section, the output unit may output, on the graph, information indicating an event that occurred during the time span of the speech.

The analysis unit may identify, as the speech amount, the value obtained by dividing the length of time during which a participant speaks within a predetermined time window by the length of the time window.
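Written as a formula (the symbols are introduced here for illustration; the patent states the rule only in words), with $W$ the length of the time window and $T_k(t)$ the total time participant $k$ speaks inside the window associated with time $t$:

$$
a_k(t) = \frac{T_k(t)}{W}, \qquad 0 \le a_k(t) \le 1
$$

For example, a participant who speaks for a total of 2 s inside a 5 s window has $a_k(t) = 2/5 = 0.4$.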

In a speech analysis method according to a second aspect of the present invention, a processor executes the steps of: acquiring speech uttered by a plurality of participants; identifying a speech amount over time for each of the plurality of participants in the speech; setting a section in the speech based on input from a user; and outputting a graph in which the temporal changes in the speech amounts of the plurality of participants are stacked on one another, together with information indicating the section in the graph.

A speech analysis program according to a third aspect of the present invention causes a computer to execute the steps of: acquiring speech uttered by a plurality of participants; identifying a speech amount over time for each of the plurality of participants in the speech; setting a section in the speech based on input from a user; and outputting a graph in which the temporal changes in the speech amounts of the plurality of participants are stacked on one another, together with information indicating the section in the graph.

A speech analysis system according to a fourth aspect of the present invention includes a speech analysis apparatus and a communication terminal capable of communicating with the speech analysis apparatus. The communication terminal has a display unit that displays information. The speech analysis apparatus has: an acquisition unit that acquires speech uttered by a plurality of participants; an analysis unit that identifies a speech amount over time for each of the plurality of participants in the speech; a section setting unit that sets a section in the speech based on input from a user; and an output unit that causes the display unit to display a graph in which the temporal changes in the speech amounts of the plurality of participants are stacked on one another, together with information indicating the section in the graph.

The present invention makes it possible to output the change in each participant's speech amount over the course of a discussion.

FIG. 1 is a schematic diagram of the speech analysis system according to the present embodiment.
FIG. 2 is a block diagram of the speech analysis system according to the present embodiment.
FIG. 3 is a schematic diagram of the speech analysis method performed by the speech analysis system according to the present embodiment.
FIG. 4 is a front view of the display unit of the communication terminal displaying a setting screen.
FIG. 5 is a front view of the display unit of the communication terminal displaying a speech amount screen.
FIG. 6 is a front view of the display unit of the communication terminal displaying a speech amount screen.
FIG. 7 is a front view of the display unit of the communication terminal displaying a speech amount screen.
FIG. 8 is a front view of the display unit of the communication terminal displaying a section extraction screen.
FIG. 9 is a front view of the display unit of the communication terminal displaying a speech amount screen.
FIG. 10 is a sequence diagram of the speech analysis method performed by the speech analysis system according to the present embodiment.

[Outline of Speech Analysis System S]
FIG. 1 is a schematic diagram of the speech analysis system S according to the present embodiment. The speech analysis system S includes a speech analysis apparatus 100, a sound collection device 10, and a communication terminal 20. The numbers of sound collection devices 10 and communication terminals 20 in the speech analysis system S are not limited. The speech analysis system S may also include other devices such as servers and terminals.

The speech analysis apparatus 100, the sound collection device 10, and the communication terminal 20 are connected via a network N such as a local area network or the Internet. At least some of them may instead be connected directly without going through the network N.

The sound collection device 10 has a microphone array including a plurality of sound collection units (microphones) oriented in different directions. For example, the microphone array includes eight microphones arranged at equal intervals on a single circle lying in a plane horizontal to the ground. The sound collection device 10 transmits the audio acquired with the microphone array to the speech analysis apparatus 100 as data.
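A minimal sketch of the example array geometry (eight microphones equally spaced on one horizontal circle). The radius is not given in the patent; 5 cm is an assumption, and `circular_array_positions` is a hypothetical name.

```python
import math


def circular_array_positions(num_mics=8, radius_m=0.05):
    """(x, y) positions in meters of microphones equally spaced on a circle.

    The circle is centered on the sound collection device and lies in the
    horizontal plane; microphone 0 is placed at angle 0 (an arbitrary choice).
    """
    return [
        (radius_m * math.cos(2 * math.pi * m / num_mics),
         radius_m * math.sin(2 * math.pi * m / num_mics))
        for m in range(num_mics)
    ]
```

With these values, adjacent microphones are 2r·sin(π/8), about 3.8 cm, apart.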

The communication terminal 20 is a communication device capable of wired or wireless communication, for example a portable terminal such as a smartphone or a computer terminal such as a personal computer. The communication terminal 20 accepts analysis condition settings from an analyst and displays the results of analysis by the speech analysis apparatus 100.

The speech analysis apparatus 100 is a computer that analyzes the audio acquired by the sound collection device 10 using the speech analysis method described later. The speech analysis apparatus 100 also transmits the results of the speech analysis to the communication terminal 20.

[Configuration of Speech Analysis System S]
FIG. 2 is a block diagram of the speech analysis system S according to the present embodiment. In FIG. 2, arrows indicate the main data flows; there may be data flows not shown in FIG. 2. Each block in FIG. 2 represents a functional unit rather than a hardware (device) unit, so the blocks shown in FIG. 2 may be implemented in a single device or distributed across multiple devices. Data may be exchanged between blocks via any means, such as a data bus, a network, or a portable storage medium.

The communication terminal 20 has a display unit 21 for displaying various information and an operation unit 22 for accepting operations by the analyst. The display unit 21 includes a display device such as a liquid crystal display or an organic electroluminescence (OLED: Organic Light Emitting Diode) display. The operation unit 22 includes operation members such as buttons, switches, and dials. The display unit 21 and the operation unit 22 may be integrated by using, as the display unit 21, a touch screen capable of detecting the position touched by the analyst.

The speech analysis apparatus 100 has a control unit 110, a communication unit 120, and a storage unit 130. The control unit 110 has a setting unit 111, a speech acquisition unit 112, a sound source localization unit 113, an analysis unit 114, a section setting unit 115, and an output unit 116. The storage unit 130 has a setting information storage unit 131, a speech storage unit 132, and an analysis result storage unit 133.

The communication unit 120 is a communication interface for communicating with the sound collection device 10 and the communication terminal 20 via the network N. It includes a processor, connectors, electrical circuits, and the like for performing communication. The communication unit 120 performs predetermined processing on communication signals received from outside to obtain data and inputs the obtained data to the control unit 110. It also performs predetermined processing on data input from the control unit 110 to generate communication signals and transmits them to the outside.

The storage unit 130 is a storage medium including a ROM (Read Only Memory), a RAM (Random Access Memory), a hard disk drive, and the like. It stores in advance the programs executed by the control unit 110. The storage unit 130 may be provided outside the speech analysis apparatus 100, in which case it may exchange data with the control unit 110 via the communication unit 120.

The setting information storage unit 131 stores setting information indicating the analysis conditions set by the analyst on the communication terminal 20. The speech storage unit 132 stores the audio acquired by the sound collection device 10. The analysis result storage unit 133 stores analysis results indicating the results of analyzing the audio. Each of the setting information storage unit 131, the speech storage unit 132, and the analysis result storage unit 133 may be a storage area in the storage unit 130 or a database built on the storage unit 130.

The control unit 110 is a processor such as a CPU (Central Processing Unit), and functions as the setting unit 111, the speech acquisition unit 112, the sound source localization unit 113, the analysis unit 114, the section setting unit 115, and the output unit 116 by executing the programs stored in the storage unit 130. The functions of these units are described later with reference to FIGS. 3 to 9. At least some of the functions of the control unit 110 may be performed by electrical circuits, or by a program executed via a network.

The speech analysis system S according to the present embodiment is not limited to the specific configuration shown in FIG. 2. For example, the speech analysis apparatus 100 is not limited to a single device and may consist of two or more physically separate devices connected by wire or wirelessly.

[Description of Speech Analysis Method]
FIG. 3 is a schematic diagram of the speech analysis method performed by the speech analysis system S according to the present embodiment. First, the analyst sets the analysis conditions by operating the operation unit 22 of the communication terminal 20. For example, the analysis conditions are information indicating the number of participants in the discussion to be analyzed and the direction in which each participant (that is, each of the plurality of participants) is located relative to the sound collection device 10. The communication terminal 20 accepts the analysis condition settings from the analyst and transmits them to the speech analysis apparatus 100 as setting information (a). The setting unit 111 of the speech analysis apparatus 100 acquires the setting information from the communication terminal 20 and stores it in the setting information storage unit 131.
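One way the setting information (a) could be modeled and serialized for transmission is sketched below. The class and field names, the JSON encoding, and the normalization of angles to the range [0, 360) degrees are assumptions for illustration, not the patent's data format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Participant:
    label: str            # identifier shown on the setting screen, e.g. "U1"
    direction_deg: float  # direction relative to the sound collection device

    def __post_init__(self):
        # Normalize so that, e.g., -90 and 270 degrees are stored identically.
        self.direction_deg = self.direction_deg % 360.0


@dataclass
class SettingInfo:
    participants: list = field(default_factory=list)  # list of Participant

    @property
    def num_participants(self) -> int:
        return len(self.participants)

    def to_json(self) -> str:
        # asdict() recursively converts the nested Participant objects.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> "SettingInfo":
        data = json.loads(text)
        return cls([Participant(**p) for p in data["participants"]])
```

The number of participants is derived from the list rather than stored separately, so the two cannot disagree.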

FIG. 4 is a front view of the display unit 21 of the communication terminal 20 displaying a setting screen A. The communication terminal 20 displays the setting screen A on the display unit 21 and accepts the analyst's analysis condition settings. The setting screen A includes a position setting area A1, a start button A2, and an end button A3. The position setting area A1 is an area for setting the direction in which each participant U is actually located relative to the sound collection device 10 in the discussion to be analyzed. For example, as shown in FIG. 4, the position setting area A1 displays a circle centered on the position of the sound collection device 10, with angles relative to the sound collection device 10 marked along the circle.

The analyst sets the position of each participant U in the position setting area A1 by operating the operation unit 22 of the communication terminal 20. Identification information for each participant U (here, U1 to U4) is assigned and displayed near the position set for that participant. In the example of FIG. 4, four participants U1 to U4 are set. The portions of the position setting area A1 corresponding to the individual participants U are displayed in different colors, so the analyst can easily recognize the direction set for each participant U.

The start button A2 and the end button A3 are virtual buttons displayed on the display unit 21. When the analyst presses the start button A2, the communication terminal 20 transmits a start instruction signal to the speech analysis apparatus 100. When the analyst presses the end button A3, the communication terminal 20 transmits an end instruction signal to the speech analysis apparatus 100. In the present embodiment, the period from the analyst's start instruction to the end instruction constitutes one discussion.

When the speech acquisition unit 112 of the speech analysis apparatus 100 receives the start instruction signal from the communication terminal 20, it transmits a signal instructing audio acquisition to the sound collection device 10 (b). On receiving this signal, the sound collection device 10 starts acquiring audio. Likewise, when the speech acquisition unit 112 receives the end instruction signal from the communication terminal 20, it transmits a signal instructing the end of audio acquisition to the sound collection device 10, which then stops acquiring audio.

The sound collection device 10 acquires audio with each of its sound collection units and records it internally as one channel per sound collection unit. The sound collection device 10 then transmits the acquired multichannel audio to the speech analysis apparatus 100 (c). The sound collection device 10 may transmit the audio sequentially as it is acquired, or in chunks of a predetermined amount or duration. It may also transmit all of the audio from the start to the end of acquisition at once. The speech acquisition unit 112 of the speech analysis apparatus 100 receives the audio from the sound collection device 10 and stores it in the speech storage unit 132.

The speech analysis apparatus 100 analyzes the audio acquired from the sound collection device 10 at a predetermined timing. For example, it may analyze the audio when the analyst issues an analysis instruction through a predetermined operation on the communication terminal 20. In this case, the analyst selects the audio corresponding to the discussion to be analyzed from among the audio stored in the speech storage unit 132.

The speech analysis apparatus 100 may instead analyze the audio when audio acquisition ends. In this case, the audio from the start to the end of acquisition corresponds to the discussion to be analyzed. The speech analysis apparatus 100 may also analyze the audio sequentially during acquisition (that is, in real time). In this case, the audio covering a predetermined period (for example, 30 seconds) back from the current time corresponds to the discussion to be analyzed.
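For real-time processing, keeping only the most recent 30 seconds can be sketched with a bounded deque. The frame rate (100 frames per second, that is, 10 ms frames), the frame type, and the class name are assumptions.

```python
from collections import deque


class RecentAudioBuffer:
    """Keeps only the most recent `seconds` of frames for real-time analysis."""

    def __init__(self, seconds=30.0, frames_per_second=100):
        # maxlen makes the deque discard the oldest frame automatically.
        self._frames = deque(maxlen=int(seconds * frames_per_second))

    def push(self, frame):
        self._frames.append(frame)

    def snapshot(self):
        """Return the frames currently inside the analysis span, oldest first."""
        return list(self._frames)
```

A bounded deque keeps both appending and evicting constant-time, so the buffer cost does not grow with the length of the discussion.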

To analyze the audio, the sound source localization unit 113 first performs sound source localization based on the multichannel audio acquired by the speech acquisition unit 112 (d). Sound source localization is a process of estimating, at each time step (for example, every 10 to 100 milliseconds), the direction of the sound source contained in the audio acquired by the speech acquisition unit 112. The sound source localization unit 113 associates the sound source direction estimated at each time step with the participant directions indicated by the setting information stored in the setting information storage unit 131.
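The association between estimated source directions and participant directions could be done by nearest angle with a tolerance, as in the sketch below. The nearest-neighbor rule, the 30-degree tolerance, and the function names are assumptions; the patent does not specify how the association is made.

```python
def angular_distance(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees (0..180)."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def assign_participant(source_deg, participant_dirs, tolerance_deg=30.0):
    """Return the participant whose set direction is closest to source_deg.

    participant_dirs: label -> direction in degrees (from the setting information).
    Returns None when no participant lies within tolerance_deg, e.g. for noise
    arriving from an unoccupied direction.
    """
    best = min(participant_dirs,
               key=lambda p: angular_distance(source_deg, participant_dirs[p]))
    if angular_distance(source_deg, participant_dirs[best]) <= tolerance_deg:
        return best
    return None
```

Using the wrap-around distance means a source at 350 degrees is matched to a participant set at 0 degrees.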

The sound source localization unit 113 may use any known sound source localization method, such as the MUSIC (Multiple Signal Classification) method or a beamforming method, as long as it can identify the direction of the sound source from the audio acquired from the sound collection device 10.
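As an illustration of the beamforming family mentioned above, the sketch below estimates direction with a narrowband delay-and-sum (steered response power) beamformer on a simulated single tone. It is not the MUSIC method and not the patent's implementation: the circular-array radius, the 1 kHz test tone, the speed of sound, and all function names are assumptions. Real speech is wideband, so a practical system would likely combine several frequency bins.

```python
import cmath
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound at room temperature


def mic_positions(num_mics=8, radius_m=0.05):
    """Hypothetical uniform circular array (the radius is an assumption)."""
    return [(radius_m * math.cos(2 * math.pi * m / num_mics),
             radius_m * math.sin(2 * math.pi * m / num_mics))
            for m in range(num_mics)]


def arrival_delay(pos, theta_deg):
    """Far-field arrival time at `pos` relative to the array center for a plane
    wave from direction theta_deg; microphones nearer the source hear it earlier."""
    th = math.radians(theta_deg)
    return -(pos[0] * math.cos(th) + pos[1] * math.sin(th)) / SPEED_OF_SOUND_M_S


def simulate_snapshots(theta_deg, freq_hz, positions, num_snapshots=16, fs=16000):
    """Complex narrowband snapshots of a single tone arriving from theta_deg."""
    return [[cmath.exp(2j * math.pi * freq_hz * (n / fs - arrival_delay(p, theta_deg)))
             for p in positions]
            for n in range(num_snapshots)]


def steered_power(snapshots, positions, freq_hz, theta_deg):
    """Delay-and-sum output power when the array is steered toward theta_deg."""
    total = 0.0
    for x in snapshots:
        # Undo each channel's expected delay, then sum the channels coherently.
        y = sum(xm * cmath.exp(2j * math.pi * freq_hz * arrival_delay(p, theta_deg))
                for xm, p in zip(x, positions))
        total += abs(y) ** 2
    return total / len(snapshots)


def estimate_direction(snapshots, positions, freq_hz, step_deg=1):
    """Return the grid direction (degrees) with the highest steered power."""
    return max(range(0, 360, step_deg),
               key=lambda th: steered_power(snapshots, positions, freq_hz, th))
```

When the steering direction matches the true direction, all channels add in phase and the output power peaks; scanning a grid of directions and taking the maximum yields the estimate.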

Next, the analysis unit 114 analyzes the audio based on the audio acquired by the speech acquisition unit 112 and the sound source directions estimated by the sound source localization unit 113 (e). The analysis unit 114 may analyze an entire completed discussion or, in real-time processing, part of a discussion.

Specifically, the analysis unit 114 first determines, at each time step (for example, every 10 to 100 milliseconds) in the discussion being analyzed, which participant spoke (uttered), based on the audio acquired by the speech acquisition unit 112 and the sound source directions estimated by the sound source localization unit 113. The analysis unit 114 identifies each continuous period from when a participant starts speaking until that participant stops as a speech period, and stores it in the analysis result storage unit 133. When multiple participants speak at the same time, the analysis unit 114 identifies a speech period for each of them.
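Turning per-frame speaker decisions into speech periods, including simultaneous speech, could look like the sketch below. The input format (one set of active participants per frame), the 10 ms default frame length, and the function name are assumptions.

```python
def speech_periods(active_per_frame, frame_s=0.01):
    """Convert per-frame sets of active speakers into speech periods.

    Returns participant -> list of (start_s, end_s). Overlapping speech is
    allowed: each participant's periods are tracked independently.
    """
    periods = {}
    open_start = {}  # participant -> index of the frame where the current period began
    for i, active in enumerate(active_per_frame):
        for p in active:
            open_start.setdefault(p, i)  # start a period if not already speaking
        for p in [q for q in open_start if q not in active]:
            start = open_start.pop(p)    # participant stopped: close the period
            periods.setdefault(p, []).append((start * frame_s, i * frame_s))
    end = len(active_per_frame)
    for p, start in open_start.items():  # close periods still open at the end
        periods.setdefault(p, []).append((start * frame_s, end * frame_s))
    return periods
```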

The analysis unit 114 also calculates each participant's speech amount over time and stores it in the analysis result storage unit 133. Specifically, for a given time window (for example, 5 seconds), the analysis unit 114 calculates, as the speech amount at that time (also called the activity level), the length of time during which the participant spoke divided by the length of the time window. The analysis unit 114 then repeats this calculation for each participant while shifting the time window by a predetermined step (for example, 1 second) from the start time of the discussion to the end time (the current time, in real-time processing).
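The sliding-window calculation described above can be sketched as follows, using the example values from the text (5 s window, 1 s step). The window alignment (each window starts at a multiple of the step and lies entirely within the recording) and the function name are assumptions.

```python
def speech_amounts(periods, duration_s, window_s=5.0, step_s=1.0):
    """Sliding-window speech amount: speaking time inside each window / window length.

    periods: participant -> list of non-overlapping (start_s, end_s) speech periods.
    Returns the window start times and, per participant, one value in [0, 1] per window.
    """
    n_windows = max(0, int((duration_s - window_s) // step_s) + 1)
    starts = [i * step_s for i in range(n_windows)]  # index-based, avoids float drift
    amounts = {}
    for participant, segments in periods.items():
        values = []
        for w_start in starts:
            w_end = w_start + window_s
            # Overlap of each speech period with the window [w_start, w_end].
            spoken = sum(max(0.0, min(e, w_end) - max(s, w_start)) for s, e in segments)
            values.append(spoken / window_s)
        amounts[participant] = values
    return starts, amounts
```

For example, a participant who spoke from 0 s to 2 s gets 2/5 = 0.4 in the window starting at 0 s.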

The section setting unit 115 sets one or more sections in the audio corresponding to the discussion to be analyzed, based on input from a user (a participant or the analyst). Sections may be set for each subject discussed, such as "Japanese language", "Science", and "Social studies", or for each stage of the discussion, such as "Discussion", "Idea generation", and "Summary". The section setting unit 115 stores section information indicating the sections in the analysis result storage unit 133 in association with the audio for which they were set.

The section information includes the name of each section and its time span (that is, the start time and end time of the section within the audio). The section setting unit 115 sets sections based on at least one of (1) an operation on the communication terminal 20, (2) an operation on the sound collection device 10, and (3) a predetermined sound acquired by the sound collection device 10.
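Section information (a name plus start and end times) and the lookups needed to mark section boundaries on the graph could be modeled as below. The class and function names are hypothetical, and treating end times as exclusive is an assumption; the section names are taken from the examples in the text.

```python
from dataclasses import dataclass


@dataclass
class Section:
    name: str
    start_s: float  # start time of the section within the recording
    end_s: float    # end time (exclusive)


def section_at(sections, t):
    """Name of the section containing time t, or None if t is outside all sections."""
    for s in sections:
        if s.start_s <= t < s.end_s:
            return s.name
    return None


def switch_times(sections):
    """Times at which one section switches to the next, e.g. for boundary markers."""
    ordered = sorted(sections, key=lambda s: s.start_s)
    return [s.start_s for s in ordered[1:]]
```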

To set sections based on operations on the communication terminal 20, a participant or the analyst enters the character strings and times included in the section information by operating the operation unit 22 of the communication terminal 20 (for example, a touch screen, mouse, or keyboard). The section information may be entered after the discussion ends or during the discussion. The section setting unit 115 receives the section information specified on the communication terminal 20 via the communication unit 120 and stores it in the analysis result storage unit 133.

When sections are set by operating the sound collection device 10, the participant or analyst operates an operation unit provided on the sound collection device 10, such as a switch or touch screen, at the moment the section changes. Each operation of this operation unit is associated in advance with a predetermined section transition (for example, a transition from the "Discussion" section to the "Idea generation" section). The section setting unit 115 receives information indicating the operation from the sound collection device 10 via the communication unit 120 and identifies the predetermined section transition as occurring at the time of the operation. The section setting unit 115 then stores the resulting section information in the analysis result storage unit 133.

When sections are set based on a predetermined sound acquired by the sound collection device 10, the participant or analyst uses a device capable of emitting sound (for example, a mobile terminal or a music player) to emit a predetermined switching sound that marks a section transition. The switching sound may be audible to humans or may be ultrasound inaudible to humans. It indicates the transition by, for example, a predefined frequency or on/off pattern. The switching sound may be emitted only at the moment of the transition, or continuously throughout a section.

A different switching sound can be used for each section. In this case, the section setting unit 115 detects the switching sound contained in the audio acquired by the sound collection device 10. At each moment the switching sound changes, the section setting unit 115 identifies a transition from the section corresponding to the previous switching sound to the section corresponding to the new one, and stores the resulting section information in the analysis result storage unit 133.
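The per-section tone variant can be sketched as follows, assuming each section is marked by a continuous pure tone at its own frequency. The specification does not say how the tone is detected; this sketch swaps in the Goertzel algorithm to measure the power at each candidate frequency per frame. The names `tone_map`, `frame_len`, and `min_power` are hypothetical tuning parameters.

```python
import math


def goertzel_power(frame, sample_rate, freq):
    """Power of the DFT bin nearest `freq` in `frame` (Goertzel algorithm)."""
    n = len(frame)
    k = round(n * freq / sample_rate)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in frame:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def detect_sections(samples, sample_rate, tone_map, frame_len=1024, min_power=1e3):
    """Split audio into (name, start_s, end_s) sections by switching tone.

    tone_map maps a tone frequency in Hz to a section name (hypothetical
    configuration). A frame whose strongest tone is below min_power
    leaves the current section unchanged.
    """
    sections = []
    current, start = None, 0.0
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        powers = {f: goertzel_power(frame, sample_rate, f) for f in tone_map}
        best = max(powers, key=powers.get)
        t = i / sample_rate
        if powers[best] >= min_power and tone_map[best] != current:
            if current is not None:
                sections.append((current, start, t))  # close previous section
            current, start = tone_map[best], t
    if current is not None:
        sections.append((current, start, len(samples) / sample_rate))
    return sections
```

Transitions are only resolved to frame granularity (here 1024 samples); a smaller frame gives finer timing at the cost of frequency resolution.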

Alternatively, a sound indicating a predetermined section transition (for example, from the "Discussion" section to the "Idea generation" section) can be used as the switching sound. In this case, the section setting unit 115 detects the switching sound contained in the audio acquired by the sound collection device 10, identifies the predetermined section transition as occurring at the moment the switching sound was emitted, and stores the resulting section information in the analysis result storage unit 133.

The output unit 116 transmits display information to the communication terminal 20 so that the analysis result of the analysis unit 114 is displayed on the display unit 21 (f). The output unit 116 is not limited to output on the display unit 21 and may output the analysis result by other means, such as printing on a printer or recording the data in a storage device. Methods by which the output unit 116 outputs the analysis result are described below with reference to FIGS. 5 to 9.

[Description of how to display the amount of speech for each section]
When displaying an analysis result, the output unit 116 of the speech analysis apparatus 100 reads, from the analysis result storage unit 133, the analysis result produced by the analysis unit 114 and the section information set by the section setting unit 115 for the discussion to be displayed. The discussion to be displayed may be the one whose analysis by the analysis unit 114 has just been completed, or one designated by the analyst.

FIG. 5 is a front view of the display unit 21 of the communication terminal 20 displaying a speech amount screen B. The speech amount screen B displays information indicating how the amount of speech changes over time in each section, and includes a speech amount graph B1, section names B2, and section switching lines B3.

When displaying the speech amount screen B, the output unit 116 generates, based on the analysis result and section information read from the analysis result storage unit 133, display information for showing how each participant's amount of speech changes over time in each section.

Graph B1 shows how the amount of speech of each participant U changes over time. With the amount of speech (activity level) on the vertical axis and time on the horizontal axis, the output unit 116 causes the display unit 21 to display, as a line graph, each participant U's amount of speech at each time indicated by the analysis result. The output unit 116 stacks the participants' amounts of speech on top of one another at each point in time; that is, the vertical axis shows the running total obtained by adding the participants' amounts of speech in turn.

In the example of FIG. 5, the plotted value for participant U4 is the sum of the amounts of speech of participants U3 and U4, the plotted value for participant U2 is the sum for participants U2, U3, and U4, and the plotted value for participant U1 is the sum for participants U1, U2, U3, and U4. The output unit 116 may determine the order in which the participants' amounts of speech are stacked (summed) randomly or according to a predetermined rule.
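The stacking just described amounts to a running sum taken in a chosen order. A minimal sketch, assuming each participant's amount of speech is available as a list with one value per time step (a hypothetical layout):

```python
def stack_speech_amounts(amounts, order):
    """Upper edge of each participant's band in a stacked graph.

    amounts: dict participant -> list of speech amounts per time step.
    order:   participants listed from the bottom of the stack to the top.
    Returns dict participant -> list of cumulative (stacked) values.
    """
    n = len(amounts[order[0]]) if order else 0
    running = [0.0] * n
    stacked = {}
    for p in order:
        # Add this participant's amounts on top of everyone below them.
        running = [r + a for r, a in zip(running, amounts[p])]
        stacked[p] = running
    return stacked
```

With the order U3, U4, U2, U1 used in FIG. 5, the value for U1 at each step equals the group total, which is what lets one graph show both individual and group activity.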

In this way, the output unit 116 can display the amount of speech of the whole discussion group in addition to that of each participant U. The analyst can follow how each participant U's contribution changes over time and, at the same time, how the liveliness of the group as a whole changes over time.

The output unit 116 displays the area or line representing each participant U in graph B1 in a display style, such as a color or pattern, that differs between participants. In the example of FIG. 5, each participant U is drawn with a different pattern, and a legend associating the participants U with the patterns is displayed near graph B1. This lets the analyst easily tell which part of graph B1 corresponds to which participant U.

The section name B2 is a character string representing the name of a section. The section switching line B3 is a line indicating the moment at which one section switches to the next. For each section indicated by the section information, the output unit 116 displays the section name near the part of graph B1 covering that section's time range. The output unit 116 also identifies the moment at which two sections switch, based on the section times indicated by the section information, and displays the switching line B3 at the corresponding position on the time (horizontal) axis of graph B1. The output unit 116 can thereby show which section each part of the speech amount graph B1 belongs to.
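A minimal sketch of how label positions and switching-line positions might be derived from the section times. Placing each name at the midpoint of its section and treating the start of every section after the first as a switching moment are illustrative choices, not requirements of the specification.

```python
def section_annotations(sections):
    """Positions for section names (B2) and switching lines (B3).

    sections: list of (name, start_s, end_s) tuples sorted by start time.
    Returns (labels, switch_times), where labels are (name, midpoint_s)
    pairs and switch_times are the x-axis positions of the switching lines.
    """
    labels = [(name, (start + end) / 2) for name, start, end in sections]
    # Every section after the first begins at a switching moment.
    switch_times = [start for _, start, _ in sections[1:]]
    return labels, switch_times
```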

The output unit 116 thus displays information indicating the sections set in the discussion, superimposed on the time course of each participant U's amount of speech, so the analyst can follow how each participant U's amount of speech changes within each section.

Because graph B1 stacks (sums) the participants' amounts of speech, a change in the amount of speech of a participant U placed lower in the stack also shifts the band of every participant U placed above, making their amount of speech appear to change as well. As a result, the change over time in each participant U's amount of speech can be hard to read at a glance. The output unit 116 can therefore present each participant U's amount of speech more legibly by determining the stacking order in graph B1 based on the participants' amounts of speech.

FIG. 6 is a front view of the display unit 21 of the communication terminal 20 displaying the speech amount screen B. In the speech amount screen B of FIG. 6, the stacking order is changed for each section; otherwise the screen is the same as the speech amount screen B of FIG. 5. The output unit 116 may switch between the speech amount screen B of FIG. 5 and that of FIG. 6 in response to an operation by the analyst, or may display at least one of them as determined in advance.

When changing the stacking order, the output unit 116 calculates, based on the analysis result and section information read from the analysis result storage unit 133, the degree of variation (for example, the variance or standard deviation) of each participant U's amount of speech in each section. It then generates graph B1 by stacking the participants' amounts of speech in ascending order of variation, section by section. The output unit 116 may instead determine the stacking order from the variation over all sections rather than per section.

Stacking from the bottom of graph B1 in ascending order of variation reduces the extent to which changes in the amount of speech of participants placed lower in the stack affect the apparent amount of speech of participants placed higher. In addition, because the tendency of each participant U's amount of speech differs from section to section, changing the stacking order per section displays the change in the amount of speech over time more legibly.
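The per-section ordering rule can be sketched with the standard library's population variance. The list-per-participant layout and the use of time-step indices for section bounds are assumptions for illustration; ranking by standard deviation would give the same order, since the square root is monotonic.

```python
from statistics import pvariance


def stacking_order(amounts, start, end):
    """Participants sorted by ascending variance of speech amount in one section.

    amounts: dict participant -> list of per-time-step speech amounts.
    start, end: time-step indices delimiting the section, [start, end).
    The least variable participant comes first, i.e. at the bottom of the stack.
    """
    return sorted(amounts, key=lambda p: pvariance(amounts[p][start:end]))
```

Calling this once per section yields the section-dependent order of FIG. 6; calling it once over the full range yields the single order for all sections.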

[Description of how to display events]
The output unit 116 may display, in graph B1, predetermined events that occurred during the discussion (that is, within the time span of the audio acquired by the audio acquisition unit 112). This allows the analyst to analyze how the occurrence of an event affected each participant U's amount of speech. An event is, for example, (1) a discussion assistant (a teacher, facilitator, or the like) approaching the group, or (2) a specific utterance (word) by the assistant. These events are examples; the output unit 116 may display the occurrence of any other event that the speech analysis apparatus 100 can recognize.

To detect the assistant approaching the group, the output unit 116 uses a signal exchanged between the sound collection device 10 and the assistant. In this case, the assistant carries a transmitter that emits a predetermined signal by radio communication such as Bluetooth (registered trademark), by ultrasound, or the like, and the sound collection device 10 includes a receiver for that signal. The output unit 116 determines that the assistant has approached when the receiver of the sound collection device 10 becomes able to receive the signal from the assistant's transmitter, or when the received signal strength becomes equal to or greater than a predetermined threshold. Conversely, it determines that the assistant has left when the receiver can no longer receive the signal, or when the received signal strength falls below the predetermined threshold.
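The approach/leave rule above can be sketched as a simple state machine over received-strength readings. The reading format (a time and a strength, with `None` meaning the signal was not received) and the dBm-style values in the usage are hypothetical.

```python
def presence_events(readings, threshold):
    """Approach/leave events from received signal strength readings.

    readings:  list of (time_s, strength); strength is None when the
               assistant's transmitter was not received at all.
    threshold: minimum strength regarded as "assistant nearby".
    Returns a list of (time_s, "approach" | "leave") events.
    """
    events, present = [], False
    for t, strength in readings:
        near = strength is not None and strength >= threshold
        if near and not present:
            events.append((t, "approach"))
        elif not near and present:
            events.append((t, "leave"))
        present = near
    return events
```

Because a single threshold is used for both directions, a strength hovering near it would produce rapid approach/leave toggles; using a lower threshold for "leave" than for "approach" (hysteresis) is one possible refinement, not something the specification prescribes.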

To detect the assistant approaching the group, the output unit 116 may alternatively use the assistant's voiceprint (that is, the frequency spectrum of the assistant's voice). In this case, the assistant's voiceprint is registered in advance, and the output unit 116 looks for it in the audio acquired by the sound collection device 10 during the discussion. The output unit 116 determines that the assistant has approached when the voiceprint is detected, and that the assistant has left when it can no longer be detected.

To detect specific words spoken by the assistant, the output unit 116 performs speech recognition on the assistant's voice. In this case, the assistant carries a sound collection device (for example, a lapel microphone), and the output unit 116 receives the assistant's voice acquired by that device. Using a sound collection device carried by the assistant, separate from the sound collection device 10, makes it possible to clearly distinguish the voices of the participants U from the voice of the assistant.

The output unit 116 converts the audio acquired from the assistant's sound collection device into a character string, using any known speech recognition method. The output unit 116 then detects specific words in the converted character string (for example, words related to the progress of the discussion, such as "first", "summary", and "last", or evaluative words such as "good" and "bad"). The words to be detected are set in the speech analysis apparatus 100 in advance. When a specific word is detected, the output unit 116 determines that the word has been spoken.
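Once a recognizer has produced text, the keyword check itself is simple. The sketch below assumes, hypothetically, that the recognizer returns time-stamped text segments; it performs case-insensitive substring matching, which suits Japanese text without word spacing but may need word-boundary handling for English.

```python
def detect_keyword_events(segments, keywords):
    """Find pre-registered keywords in recognized speech.

    segments: list of (time_s, text) pairs from a speech recognizer
              (hypothetical output format).
    keywords: words registered in advance, e.g. ["first", "good"].
    Returns (time_s, keyword) for every segment containing a keyword.
    """
    events = []
    for t, text in segments:
        lowered = text.lower()
        for kw in keywords:
            if kw.lower() in lowered:
                events.append((t, kw))
    return events
```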

The output unit 116 may perform speech recognition only around moments when the amount of speech changes greatly. In this case, the output unit 116 calculates, based on the analysis result read from the analysis result storage unit 133, the degree of change in the amount of speech at each time (for example, the amount or rate of change per unit time). The degree of change may be calculated for each participant U or for the total of all participants U.

The output unit 116 then performs speech recognition on the audio acquired by the assistant's sound collection device within a predetermined time range containing each moment at which the degree of change is equal to or greater than a predetermined threshold (for example, from 5 seconds before to 5 seconds after that moment). Speech recognition generally imposes a heavy processing load, so restricting it to the periods around large changes in the amount of speech reduces that load while still allowing analysis of the words that caused the changes.
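A minimal sketch of selecting the recognition windows, assuming the amount of speech is given per fixed-length time step and using the absolute change per second as the degree of change. Overlapping windows are merged so no audio is recognized twice; that merging, like the parameter names, is an illustrative choice.

```python
def recognition_windows(totals, step_s, threshold, margin_s=5.0):
    """Time windows around large changes in the amount of speech.

    totals:    speech amount per time step (e.g. summed over participants).
    step_s:    length of one time step in seconds.
    threshold: minimum absolute change per second that triggers recognition.
    Returns merged (start_s, end_s) windows spanning margin_s before and
    after each triggering moment.
    """
    windows = []
    for i in range(1, len(totals)):
        change = abs(totals[i] - totals[i - 1]) / step_s  # change per second
        if change >= threshold:
            t = i * step_s
            start, end = max(0.0, t - margin_s), t + margin_s
            if windows and start <= windows[-1][1]:
                windows[-1] = (windows[-1][0], end)  # merge with previous window
            else:
                windows.append((start, end))
    return windows
```

The end of the last window may run past the end of the recording; a caller would clamp it to the audio length before extracting samples.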

The output unit 116 then generates display information that associates information indicating each event detected by the above methods with its time in the audio. FIG. 7 is a front view of the display unit 21 of the communication terminal 20 displaying the speech amount screen B. In the speech amount screen B of FIG. 7, event information B4 is displayed on graph B1; otherwise the screen is the same as the speech amount screen B of FIG. 5. The output unit 116 may switch between the speech amount screen B of FIG. 5 and that of FIG. 7 in response to an operation by the analyst, or may display at least one of them as determined in advance.

Event information B4 indicates the content and timing of an event. It indicates the content with, for example, a character string stating that the assistant approached or left, or a character string representing the assistant's utterance detected by speech recognition. It indicates the timing with an arrow pointing to the moment on graph B1 at which the event occurred.

The output unit 116 thus displays information indicating the content and timing of events that occurred during the discussion, superimposed on the time course of each participant U's amount of speech. The analyst can therefore analyze how events during the discussion influenced the change over time in each participant U's amount of speech. For example, if the amount of speech increased when the teacher approached the group, the analyst can judge that the teacher succeeded in stimulating the discussion. Likewise, if the amount of speech increased when the teacher said a particular word, the analyst can judge that the word is effective for stimulating discussion.

[Description of how to display the amount of speech in the same section]
The output unit 116 can extract and display multiple speech amount graphs for the same section. FIG. 8 is a front view of the display unit 21 of the communication terminal 20 displaying a section extraction screen C. When, for example, the analyst selects one of the section names B2 on the speech amount screen B of FIGS. 5 to 7, the output unit 116 displays the section extraction screen C for the selected section. The section extraction screen C shows the speech amount graphs extracted for the same section and includes speech amount graphs C1, a section name C2, and group names C3.

When displaying the section extraction screen C, the output unit 116 extracts the analysis results and section information of multiple groups for the selected section from the analysis result storage unit 133. The groups to be displayed may be different groups holding discussions at the same time, or the same or different groups in past discussions. Based on the extracted analysis results and section information, the output unit 116 generates display information for showing, for each of the groups, how each participant's amount of speech changed over time in the selected section.
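The extraction step can be sketched as slicing each stored discussion to the selected section. The storage layout used here (per-group amounts plus a list of sections expressed as time-step indices) is a hypothetical stand-in for whatever the analysis result storage unit 133 actually holds.

```python
def extract_section(discussions, section_name):
    """Per-participant speech amounts within one named section, per group.

    discussions: dict group_name -> (amounts, sections), where amounts maps
                 participant -> list of per-step amounts and sections is a
                 list of (name, start_step, end_step) tuples.
    Returns dict group_name -> {participant: amounts inside the section};
    groups without that section are omitted.
    """
    result = {}
    for group, (amounts, sections) in discussions.items():
        for name, start, end in sections:
            if name == section_name:
                result[group] = {p: a[start:end] for p, a in amounts.items()}
                break  # use the first matching section only
    return result
```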

Speech amount graph C1 shows, for each of two or more groups, how each participant U's amount of speech changed over time in the selected section. Graph C1 is displayed in the same way as graph B1. The section name C2 is a character string indicating the name of the selected section.

The group name C3 identifies a displayed group; it may be set by the analyst or determined automatically by the speech analysis apparatus 100. Although the output unit 116 displays graphs C1 for two groups in the example of FIG. 8, it may display graphs C1 for three or more groups. The output unit 116 may also display the names of one or more participants U belonging to a group instead of, or in addition to, the group name C3.

The output unit 116 thus displays, for the same section, multiple graphs showing how each participant's amount of speech changed over time in different groups. This allows the analyst to compare and analyze the change over time in the amount of speech of different groups for the same section (for example, the same subject or the same stage of a discussion). For example, by comparing different groups that held discussions at the same time, the analyst can identify each group's tendencies in amount of speech. By comparing several past discussions of the same group in the same section, the analyst can also track how that group's tendencies change.

[Description of how to display a heat map of the amount of speech]
The output unit 116 is not limited to the stacked graph shown in FIG. 5 and may display a heat map indicating how each participant U's amount of speech changes over time. FIG. 9 is a front view of the display unit 21 of the communication terminal 20 displaying a speech amount screen D. The speech amount screen D includes a speech amount heat map D1, section names D2, and section switching lines D3. The section names D2 and section switching lines D3 are the same as the section names B2 and section switching lines B3 in FIG. 5.

The speech amount heat map D1 represents the amount of speech over time by color. In FIG. 9, differences in color are represented by dot density: for example, denser dots indicate a darker color and sparser dots a lighter color. With time running in a predetermined direction (for example, horizontally in FIG. 9), the output unit 116 causes the display unit 21 to display, for each participant U, regions whose color corresponds to the amount of speech at each time.
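One way to map amounts of speech to heat map colors is a linear blend from white to a dark hue, scaled by the largest amount shown. The blend, the default dark color, and the shared scale across participants are illustrative assumptions; the specification only requires that the color correspond to the amount of speech.

```python
def amount_to_color(value, peak, dark=(0, 64, 160)):
    """Blend linearly from white (no speech) to `dark` (peak speech)."""
    frac = 0.0 if peak <= 0 else max(0.0, min(1.0, value / peak))
    r, g, b = (round(255 + (c - 255) * frac) for c in dark)
    return f"#{r:02x}{g:02x}{b:02x}"


def heatmap_colors(amounts):
    """Hex color per time step for each participant's row of the heat map.

    amounts: dict participant -> list of per-time-step speech amounts.
    All rows share one scale so colors are comparable across participants.
    """
    peak = max((v for series in amounts.values() for v in series), default=0)
    return {p: [amount_to_color(v, peak) for v in series] for p, series in amounts.items()}
```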

Displaying a heat map instead of a graph also allows the analyst to follow how each participant U's amount of speech changes over time in each section. The output unit 116 may switch between the graph of FIG. 5 and the heat map of FIG. 9 in response to an operation by the analyst, or may display at least one of them as determined in advance.

[Sequence of the speech analysis method]
FIG. 10 is a sequence diagram of the speech analysis method performed by the speech analysis system S according to the present embodiment. First, the communication terminal 20 accepts analysis condition settings from the analyst and transmits them to the speech analysis apparatus 100 as setting information (S11). The setting unit 111 of the speech analysis apparatus 100 acquires the setting information from the communication terminal 20 and stores it in the setting information storage unit 131.

Next, the audio acquisition unit 112 of the speech analysis apparatus 100 transmits a signal instructing audio acquisition to the sound collection device 10 (S12). On receiving this signal from the speech analysis apparatus 100, the sound collection device 10 starts recording audio using its multiple sound collection units and transmits the recorded multi-channel audio to the speech analysis apparatus 100 (S13). The audio acquisition unit 112 of the speech analysis apparatus 100 receives the audio from the sound collection device 10 and stores it in the audio storage unit 132.

The speech analysis apparatus 100 starts analyzing the audio at any of the following times: when instructed by the analyst, when audio acquisition has finished, or while audio is still being acquired (that is, real-time processing). In the analysis, the sound source localization unit 113 first performs sound source localization based on the audio acquired by the audio acquisition unit 112 (S14).

Next, based on the audio acquired by the audio acquisition unit 112 and the sound source directions estimated by the sound source localization unit 113, the analysis unit 114 determines which participant spoke at each time, thereby identifying the speaking periods and amount of speech of each participant (S15). The analysis unit 114 stores each participant's speaking periods and amount of speech in the analysis result storage unit 133.
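Step S15 can be illustrated with a simplified sketch that attributes each voiced frame to the participant whose registered direction is closest to the localized source direction, then counts voiced frames per time step as the amount of speech. The input format, the registered azimuths, and the 30 degree acceptance limit are all hypothetical; the specification does not detail the attribution rule here.

```python
def attribute_speech(frames, participant_dirs, max_diff_deg=30.0):
    """Assign voiced frames to the nearest participant by direction.

    frames: list of (time_step, azimuth_deg) for frames in which sound
            source localization found a voice (hypothetical format).
    participant_dirs: participant -> registered azimuth in degrees.
    Returns participant -> {time_step: number of voiced frames}.
    """
    def angular_diff(a, b):
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)  # shortest way around the circle

    amounts = {p: {} for p in participant_dirs}
    for step, az in frames:
        p = min(participant_dirs, key=lambda q: angular_diff(az, participant_dirs[q]))
        # Ignore sources that are not close to any registered participant.
        if angular_diff(az, participant_dirs[p]) <= max_diff_deg:
            amounts[p][step] = amounts[p].get(step, 0) + 1
    return amounts
```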

The section setting unit 115 sets one or more sections in the audio of the discussion to be analyzed (S16), based on at least one of an operation on the communication terminal 20, an operation on the sound collection device 10, and a predetermined sound acquired by the sound collection device 10. The section setting unit 115 stores section information indicating each section in the analysis result storage unit 133, in association with the audio for which the section was set.

The output unit 116 performs control to display the analysis result on the display unit 21 of the communication terminal 20 (S17). Specifically, based on the analysis result from the analysis unit 114 and the section information from the section setting unit 115, the output unit 116 generates display information for displaying the speech amount screen B, section extraction screen C, or speech amount screen D described above, and transmits it to the communication terminal 20.

The communication terminal 20 displays the analysis result on the display unit 21 according to the display information received from the speech analysis apparatus 100 (S18).

[Effects of the present embodiment]
Because the Harkness method shows the tendency of speech over the entire period from the start to the end of a discussion, it cannot show how each participant's amount of speech changes along the timeline of the discussion. This makes analysis based on changes over time in each participant's amount of speech difficult. In contrast, the speech analysis apparatus 100 according to the present embodiment displays the change over time in each participant's amount of speech for each section, allowing the analyst to follow how each participant's amount of speech changes within each section.

In addition, the speech analysis device 100 automatically analyzes discussions among multiple participants based on audio acquired with the sound collection device 10, which has multiple sound collection units. Unlike the Harkness method described in Non-Patent Document 1, no recorder needs to monitor the discussion and no recorder needs to be assigned to each group, so the cost is low.

The present invention has been described above using embodiments, but the technical scope of the invention is not limited to the scope described in those embodiments, and various modifications and changes are possible within the scope of its gist. For example, the specific forms in which the devices are distributed or integrated are not limited to those described above. All or part of the devices may be functionally or physically distributed or integrated in any units. New embodiments arising from any combination of the embodiments are also included among the embodiments of the present invention, and such a combined embodiment also has the effects of the original embodiments.

The processors of the speech analysis device 100, the sound collection device 10, and the communication terminal 20 carry out the steps (processes) of the speech analysis method shown in FIG. 10. That is, each of these processors reads, from a storage unit, a program for executing the speech analysis method shown in FIG. 10. It then executes the program, controlling the units of the speech analysis device 100, the sound collection device 10, and the communication terminal 20 so as to perform the method. Some of the steps in the method of FIG. 10 may be omitted, the order of the steps may be changed, and multiple steps may be performed in parallel.

S Speech analysis system
100 Speech analysis device
110 Control unit
112 Audio acquisition unit
114 Analysis unit
115 Section setting unit
116 Output unit
10 Sound collection device
20 Communication terminal
21 Display unit

Claims

A speech analysis device comprising:
an acquisition unit that acquires audio uttered by a plurality of participants belonging to a group, in association with the group;
an analysis unit that identifies the speech amount of each of the plurality of participants at each time in the audio;
a section setting unit that, based on input from a user, sets at least one section in the audio and a name for the section; and
an output unit that outputs a graph in which the temporal changes in the speech amounts of the plurality of participants are stacked on one another, and information indicating the section on the graph,
wherein, for a plurality of audio recordings associated respectively with a plurality of groups, the output unit outputs the graphs of the sections to which the same name has been set, each graph associated with the group to which it corresponds.

The speech analysis device according to claim 1, wherein the output unit outputs, as the information indicating the section on the graph, the time position of the boundary between two consecutive sections.

The speech analysis device according to claim 1, wherein the section setting unit sets the section based on, as the input from the user, at least one of: an operation by the user on a communication terminal that communicates with the speech analysis device; an operation by the user on a sound collection device that acquires the audio; and the timing at which the acquisition unit acquires a predetermined sound produced by the user.

The speech analysis device according to any one of claims 1 to 3, wherein the output unit determines, based on the speech amount of each of the plurality of participants, the order in which the temporal changes in speech amount are stacked, and outputs the graph with the temporal changes stacked in the determined order.

The speech analysis device according to claim 4, wherein the output unit outputs the graph in which the temporal changes in speech amount are stacked in ascending order of the degree of variation in speech amount calculated for each of the plurality of participants.

The speech analysis device according to claim 5, wherein, for each section, the output unit outputs the graph in which the temporal changes in speech amount are stacked in ascending order of the degree of variation in speech amount within that section, calculated for each of the plurality of participants.

The speech analysis device according to any one of claims 1 to 6, wherein the output unit outputs, on the graph, information indicating an event that occurred within the time span of the audio, in addition to the graph and the information indicating the section.

The speech analysis device according to claim 1, wherein the analysis unit identifies, as the speech amount, the value obtained by dividing the length of time during which a participant spoke within a predetermined time window by the length of the time window.

A speech analysis method in which a processor executes:
a step of acquiring audio uttered by a plurality of participants belonging to a group, in association with the group;
a step of identifying the speech amount of each of the plurality of participants at each time in the audio;
a step of setting, based on input from a user, at least one section in the audio and a name for the section; and
a step of outputting a graph in which the temporal changes in the speech amounts of the plurality of participants are stacked on one another, and information indicating the section on the graph,
wherein, in the outputting step, for a plurality of audio recordings associated respectively with a plurality of groups, the graphs of the sections to which the same name has been set are output, each graph associated with the group to which it corresponds.

A speech analysis program that causes a computer to execute:
a step of acquiring audio uttered by a plurality of participants belonging to a group, in association with the group;
a step of identifying the speech amount of each of the plurality of participants at each time in the audio;
a step of setting, based on input from a user, at least one section in the audio and a name for the section; and
a step of outputting a graph in which the temporal changes in the speech amounts of the plurality of participants are stacked on one another, and information indicating the section on the graph,
wherein, in the outputting step, for a plurality of audio recordings associated respectively with a plurality of groups, the graphs of the sections to which the same name has been set are output, each graph associated with the group to which it corresponds.

A speech analysis system comprising a speech analysis device and a communication terminal capable of communicating with the speech analysis device, wherein
the communication terminal has a display unit that displays information, and
the speech analysis device has:
an acquisition unit that acquires audio uttered by a plurality of participants belonging to a group, in association with the group;
an analysis unit that identifies the speech amount of each of the plurality of participants at each time in the audio;
a section setting unit that, based on input from a user, sets at least one section in the audio and a name for the section; and
an output unit that causes the display unit to display a graph in which the temporal changes in the speech amounts of the plurality of participants are stacked on one another, and information indicating the section on the graph,
wherein, for a plurality of audio recordings associated respectively with a plurality of groups, the output unit outputs the graphs of the sections to which the same name has been set, each graph associated with the group to which it corresponds.
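The cross-group output recited in the independent claims above selects, from several groups' recordings, the section with the same name and outputs its graph for each group. It can be sketched as a selection step that runs before plotting. This is a minimal sketch assuming a hypothetical data model. Each group's recording is assumed to be stored as per-window speech amounts, with windows of `window_s` seconds, together with a list of `(name, start_s, end_s)` sections. If a name occurs more than once in one recording, only its first occurrence is used; that rule is an assumption, since the source does not address it. `slice_section` and `compare_groups` are illustrative names.

```python
from typing import Dict, List, Tuple

Amounts = Dict[str, List[float]]        # participant -> speech amount per window
SectionT = Tuple[str, float, float]     # (name, start_s, end_s)


def slice_section(amounts: Amounts, window_s: float, start_s: float, end_s: float) -> Amounts:
    """Keep only the windows whose start time lies in [start_s, end_s)."""
    return {
        p: [v for i, v in enumerate(series) if start_s <= i * window_s < end_s]
        for p, series in amounts.items()
    }


def compare_groups(
    recordings: Dict[str, Tuple[Amounts, List[SectionT]]],
    section_name: str,
    window_s: float,
) -> Dict[str, Amounts]:
    """For each group, extract the data of the first section with the given name.

    Groups whose recording has no section of that name are omitted, so the
    result maps each remaining group to the data for its own graph.
    """
    result: Dict[str, Amounts] = {}
    for group, (amounts, sections) in recordings.items():
        for name, start, end in sections:
            if name == section_name:
                result[group] = slice_section(amounts, window_s, start, end)
                break
    return result
```

Keying the result by group keeps each extracted graph associated with the group it came from, so the graphs can be laid out side by side for comparison.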