JP7414319B2

JP7414319B2 - Speech analysis device, speech analysis method, speech analysis program and speech analysis system

Info

Publication number: JP7414319B2
Application number: JP2022147338A
Authority: JP
Inventors: 武志水本; 哲也菅原
Original assignee: Hylable Inc
Current assignee: Hylable Inc
Priority date: 2021-11-08
Filing date: 2022-09-15
Publication date: 2024-01-16
Anticipated expiration: 2038-01-16
Also published as: JP2022174241A

Description

The present invention relates to a speech analysis device, a speech analysis method, a speech analysis program, and a speech analysis system for analyzing speech.

The Harkness method is known as a method for analyzing discussions in group learning and meetings (see, for example, Non-Patent Document 1). In the Harkness method, the transitions between participants' utterances are recorded as lines. This makes it possible to analyze each participant's contribution to the discussion and each participant's relationships with the others. The Harkness method can also be applied effectively to active learning, in which students take the initiative in their own learning.

Non-Patent Document 1: Paul Sevigny, "Extreme Discussion Circles: Preparing ESL Students for 'The Harkness Method'," Polyglossia, Ritsumeikan Asia Pacific University Language Education Center, October 2012, No. 23, pp. 181-191

When an analysis such as the Harkness method is carried out in a school or other organization, multiple groups often hold discussions at the same time. In that case, a discussion assistant (a teacher, facilitator, or the like) has to cover several groups at once, which makes it difficult to grasp the status of the utterances in each group's discussion.

The present invention has been made in view of these points, and its object is to provide a speech analysis device, a speech analysis method, a speech analysis program, and a speech analysis system that can output information indicating the status of utterances in a plurality of groups.

A speech analysis device according to a first aspect of the present invention includes: an acquisition unit that acquires, from a plurality of sound collection devices, voices uttered by a plurality of participants; an analysis unit that identifies the utterances of each of the plurality of participants in the voices; and an output unit that causes a display unit to display information indicating the status of the utterances in association with each of the plurality of sound collection devices.

The output unit may cause the display unit to display, as the information indicating the status of the utterances, information indicating the amount of utterance for each sound collection device or for each participant.

The output unit may cause the display unit provided in a communication terminal that communicates with the speech analysis device to display the information indicating the status of the utterances.

The output unit may display the information indicating the status of the utterances at positions on the display unit of the communication terminal that correspond to the positions of the respective sound collection devices.

When a predetermined person approaches any of the plurality of sound collection devices, the output unit may display information indicating the approach of that person at the position on the display unit of the communication terminal that corresponds to the position of that sound collection device.

The speech analysis device may further include a setting unit that sets the position of each of the plurality of sound collection devices based on signals exchanged between the plurality of sound collection devices.

The output unit may cause the display unit provided in each of the plurality of sound collection devices to display the information indicating the status of the utterances.

The output unit may cause the display unit provided in each of the plurality of sound collection devices to display information indicating the status of the utterances for that sound collection device.

The output unit may cause the display units that are provided on each of the plurality of sound collection devices so as to face the respective participants to display information indicating the status of the utterances for each participant.

In a speech analysis method according to a second aspect of the present invention, a processor executes the steps of: acquiring, from a plurality of sound collection devices, voices uttered by a plurality of participants; identifying the utterances of each of the plurality of participants in the voices; and causing a display unit to display information indicating the status of the utterances in association with each of the plurality of sound collection devices.

A speech analysis program according to a third aspect of the present invention causes a computer to execute the steps of: acquiring, from a plurality of sound collection devices, voices uttered by a plurality of participants; identifying the utterances of each of the plurality of participants in the voices; and causing a display unit to display information indicating the status of the utterances in association with each of the plurality of sound collection devices.

A speech analysis system according to a fourth aspect of the present invention includes a speech analysis device, and a communication terminal and a plurality of sound collection devices that can communicate with the speech analysis device. At least one of the communication terminal and the plurality of sound collection devices has a display unit that displays information. The speech analysis device includes: an acquisition unit that acquires, from the plurality of sound collection devices, voices uttered by a plurality of participants; an analysis unit that identifies the utterances of each of the plurality of participants in the voices; and an output unit that causes the display unit to display information indicating the status of the utterances in association with each of the plurality of sound collection devices.

According to the present invention, information indicating the status of utterances in a plurality of groups can be output.

FIG. 1 is a schematic diagram of a speech analysis system according to the present embodiment.
FIG. 2 is a block diagram of the speech analysis system according to the present embodiment.
FIG. 3 is a schematic diagram of the speech analysis method performed by the speech analysis system according to the present embodiment.
FIG. 4 is a schematic diagram of a method by which the setting unit sets group position information.
FIG. 5 is a front view of the display unit of a communication terminal displaying a participant setting screen.
FIG. 6 is a side view of a sound collection device displaying the utterance status of a group and its participants.
FIG. 7 is a front view of the display unit of a communication terminal displaying the utterance status of groups.
FIG. 8 is a front view of the display unit of a communication terminal displaying the utterance status of participants.
FIG. 9 is a sequence diagram of the speech analysis method performed by the speech analysis system according to the present embodiment.

[Overview of speech analysis system S]
FIG. 1 is a schematic diagram of a speech analysis system S according to the present embodiment. The speech analysis system S includes a speech analysis device 100, sound collection devices 10, and a communication terminal 20. The number of sound collection devices 10 and communication terminals 20 included in the speech analysis system S is not limited. The speech analysis system S may also include other equipment such as servers and terminals.

The speech analysis device 100, the sound collection devices 10, and the communication terminal 20 are connected via a network N such as a local area network or the Internet. At least some of the speech analysis device 100, the sound collection devices 10, and the communication terminal 20 may be connected directly without going through the network N.

Each sound collection device 10 includes a microphone array made up of a plurality of sound collection units (microphones) arranged in different orientations. For example, the microphone array includes eight microphones arranged at equal intervals on a single circle in a plane horizontal to the ground. The sound collection device 10 transmits the voice acquired with the microphone array to the speech analysis device 100 as data.
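As an illustration of the example layout above, the following minimal sketch computes the coordinates of microphones placed at equal intervals on one circle in the horizontal plane. The function name `microphone_positions` and the 0.05 m radius are assumptions made for the example; the description does not specify a radius.

```python
import math


def microphone_positions(n_mics: int = 8, radius_m: float = 0.05):
    """Return (x, y) coordinates of microphones evenly spaced on a circle.

    The radius is a hypothetical value; only the count of eight and the
    equal spacing come from the description.
    """
    step = 2 * math.pi / n_mics  # angular spacing between neighbours
    return [
        (radius_m * math.cos(i * step), radius_m * math.sin(i * step))
        for i in range(n_mics)
    ]


if __name__ == "__main__":
    for index, (x, y) in enumerate(microphone_positions()):
        print(f"mic {index}: x={x:+.4f} m, y={y:+.4f} m")
```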

The communication terminal 20 is a communication device capable of wired or wireless communication. The communication terminal 20 is, for example, a mobile terminal such as a smartphone, or a computer terminal such as a personal computer. The communication terminal 20 accepts analysis condition settings from an analyst and displays the results of the analysis performed by the speech analysis device 100. The communication terminal 20 may be carried by the analyst or assistant of the discussion, or may be placed near each sound collection device 10.

The speech analysis device 100 is a computer that analyzes the voice acquired by the sound collection devices 10 using the speech analysis method described later. The speech analysis device 100 also transmits the results of the speech analysis to the communication terminal 20.

[Configuration of speech analysis system S]
FIG. 2 is a block diagram of the speech analysis system S according to the present embodiment. In FIG. 2, the arrows indicate the main data flows, and there may be data flows that are not shown in FIG. 2. Each block in FIG. 2 represents a functional unit rather than a hardware (device) unit. The blocks shown in FIG. 2 may therefore be implemented within a single device or distributed across multiple devices. Data may be exchanged between blocks by any means, such as a data bus, a network, or a portable storage medium.

In addition to the sound collection units described above, each sound collection device 10 has an overall lamp 11 that displays the status of the group as a whole and individual lamps 12 that display the status of each participant. There is at least one overall lamp 11 per sound collection device 10. The number of individual lamps 12 is at least the number of participants that one sound collection device 10 can process (that is, at least the number of participants who can belong to one discussion group). The individual lamps 12 are provided so as to correspond to the participants surrounding the sound collection device 10. For example, at least one individual lamp 12 corresponding to each participant is provided on the sound collection device 10 directly in front of that participant.

The overall lamp 11 and the individual lamps 12 are display units that emit light under the control of the speech analysis device 100. For example, the overall lamp 11 and the individual lamps 12 can blink, change color, or change intensity according to signals received from the speech analysis device 100. Other display devices, such as liquid crystal displays, may be used as the overall lamp 11 and the individual lamps 12 as long as they can show the utterance status to the participants or the assistant.

The communication terminal 20 has a display unit 21 for displaying various information and an operation unit 22 for accepting operations by the analyst. The display unit 21 includes a display device such as a liquid crystal display or an organic electroluminescence (OLED: Organic Light Emitting Diode) display. The operation unit 22 includes operating members such as buttons, switches, and dials. The display unit 21 and the operation unit 22 may be integrated by using, as the display unit 21, a touch screen that can detect the position touched by the analyst.

The speech analysis device 100 has a control unit 110, a communication unit 120, and a storage unit 130. The control unit 110 has a setting unit 111, a voice acquisition unit 112, a sound source localization unit 113, an analysis unit 114, and an output unit 115. The storage unit 130 has a setting information storage unit 131, a voice storage unit 132, and an analysis result storage unit 133.

The communication unit 120 is a communication interface for communicating with the sound collection devices 10 and the communication terminal 20 via the network N. The communication unit 120 includes a processor, connectors, electric circuits, and the like for performing communication. The communication unit 120 performs predetermined processing on communication signals received from outside to obtain data, and inputs the obtained data to the control unit 110. The communication unit 120 also performs predetermined processing on data input from the control unit 110 to generate communication signals, and transmits the generated communication signals to the outside.

The storage unit 130 is a storage medium including a ROM (Read Only Memory), a RAM (Random Access Memory), a hard disk drive, and the like. The storage unit 130 stores in advance the program executed by the control unit 110. The storage unit 130 may be provided outside the speech analysis device 100, in which case it may exchange data with the control unit 110 via the communication unit 120.

The setting information storage unit 131 stores setting information indicating the analysis conditions, including the positions of the groups and participants. The voice storage unit 132 stores the voice acquired by the sound collection devices 10. The analysis result storage unit 133 stores analysis results indicating the outcome of analyzing the voice. The setting information storage unit 131, the voice storage unit 132, and the analysis result storage unit 133 may each be a storage area on the storage unit 130 or a database configured on the storage unit 130.

The control unit 110 is, for example, a processor such as a CPU (Central Processing Unit), and functions as the setting unit 111, the voice acquisition unit 112, the sound source localization unit 113, the analysis unit 114, and the output unit 115 by executing the program stored in the storage unit 130. The functions of the setting unit 111, the voice acquisition unit 112, the sound source localization unit 113, the analysis unit 114, and the output unit 115 are described later with reference to FIGS. 3 to 8. At least some of the functions of the control unit 110 may be performed by an electric circuit. At least some of the functions of the control unit 110 may also be performed by a program executed via a network.

The speech analysis system S according to the present embodiment is not limited to the specific configuration shown in FIG. 2. For example, the speech analysis device 100 is not limited to a single device and may be configured from two or more physically separate devices connected by wire or wirelessly.

[Description of speech analysis method]
FIG. 3 is a schematic diagram of the speech analysis method performed by the speech analysis system S according to the present embodiment. First, the analyst sets the analysis conditions by operating the operation unit 22 of the communication terminal 20. The communication terminal 20 accepts the analysis condition settings from the analyst and transmits them to the speech analysis device 100 as setting information (a). The setting unit 111 of the speech analysis device 100 stores, in the setting information storage unit 131, the setting information received from the communication terminal 20 or setting information that the setting unit 111 has determined itself.

The setting information includes participant position information, which indicates the position of each participant (that is, each of a plurality of participants) in the group associated with one sound collection device 10, and group position information, which indicates the position of each group (that is, each of a plurality of groups) holding a discussion at the same time. The position of each group corresponds to the position of each sound collection device 10 (that is, each of the plurality of sound collection devices 10). The participant position information is, for example, information indicating the number of participants in the discussion to be analyzed and the direction in which each participant is located relative to the sound collection device 10. The group position information is, for example, information indicating the number of groups to be analyzed and the relative or absolute position of each group.
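One way to picture the setting information is as a small nested record: groups with a position, each holding participants with a direction relative to the sound collection device. The sketch below is only an illustration; the class names, field names, and units (`ParticipantPosition`, `GroupSetting`, `SettingInfo`, `direction_deg`, `position`) are hypothetical and are not taken from the description.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ParticipantPosition:
    participant_id: str   # e.g. "U1" (identifier format assumed)
    direction_deg: float  # direction seen from the sound collection device


@dataclass
class GroupSetting:
    group_id: str                  # e.g. "G1" (identifier format assumed)
    position: Tuple[float, float]  # relative or absolute group position
    participants: List[ParticipantPosition] = field(default_factory=list)


@dataclass
class SettingInfo:
    groups: List[GroupSetting] = field(default_factory=list)

    def group_count(self) -> int:
        """Number of groups to be analyzed."""
        return len(self.groups)

    def participant_count(self, group_id: str) -> int:
        """Number of participants registered for one group."""
        for group in self.groups:
            if group.group_id == group_id:
                return len(group.participants)
        raise KeyError(group_id)
```

Keeping the participant count derived from the list, rather than stored separately, avoids the two values drifting apart when an analyst edits the settings.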

FIGS. 4(a) and 4(b) are schematic diagrams of the method by which the setting unit 111 sets the group position information. FIG. 4(a) shows the display unit 21 of the communication terminal 20 displaying a group setting screen A for setting the group position information. When the group position information is to be set, the communication terminal 20 displays the group setting screen A on the display unit 21 and accepts the analyst's settings. The group setting screen A includes a position setting area A1, a reference position A2, a completion button A3, and an automatic setting button A4.

The position setting area A1 is an area for setting the position of each group G relative to a predetermined reference position A2 for discussions held at the same time (for example, discussions held in the same room). For example, the position setting area A1 is a rectangular area that includes the reference position A2 (for example, the position of the teacher's desk), as shown in FIG. 4(a). The reference position A2 may be specified by the analyst on the position setting area A1 or may be registered in the speech analysis device 100 in advance.

The analyst sets the position of each group G in the position setting area A1 by operating the operation unit 22 of the communication terminal 20. For example, when the analyst presses a point within the position setting area A1, the setting unit 111 of the speech analysis device 100 sets the pressed point as the position of one group G. Identification information identifying each group G (here, G1 to G4) is assigned to and displayed at the position set for that group. The identification information of each group G may be entered by the analyst or determined automatically by the setting unit 111. In the example of FIG. 4(a), four groups G1 to G4 are set.

The completion button A3 and the automatic setting button A4 are virtual buttons displayed on the display unit 21. When the analyst presses the completion button A3, the setting unit 111 sets the position of each group G by storing the positions set on the group setting screen A in the setting information storage unit 131 as group position information. The setting unit 111 then causes the communication terminal 20 to display the participant setting screen C shown in FIG. 5 for each group G set on the group setting screen A.

When the analyst presses the automatic setting button A4, the setting unit 111 automatically sets the position of each group G based on information acquired from the sound collection devices 10. FIG. 4(b) is a schematic diagram of the method by which the setting unit 111 automatically sets the position of each group G. For example, the setting unit 111 automatically sets the position of each sound collection device 10, that is, the position of each group G, based on sound or radio signals exchanged between the sound collection devices 10.

For automatic setting, each sound collection device 10 includes a transmitter that emits a signal in the form of a predetermined sound (audible sound or ultrasound) or a predetermined radio wave (for example, a short-range wireless communication radio wave such as Bluetooth (registered trademark)), and a receiver that receives such a signal. The setting unit 111 causes the transmitters of the sound collection devices 10 to emit signals one after another. The setting unit 111 obtains the time and intensity at which the signal was detected by the receiver of each sound collection device 10.

Next, for each sound collection device 10, the setting unit 111 uses the time at which a signal was detected to identify the sound collection device 10 that emitted it, and uses the intensity at which the signal was detected to calculate the distance B to the emitting sound collection device 10. Based on the calculated distances B between the sound collection devices 10, the setting unit 111 then determines the position of each sound collection device 10, that is, the position of each group G, for example by triangulation. The setting unit 111 sets the position of each group G by storing the determined positions in the setting information storage unit 131 as group position information. The setting unit 111 then causes the communication terminal 20 to display the participant setting screen C shown in FIG. 5 for each group G.
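The automatic setting flow (identify the emitter from the detection time, convert intensity to distance, then derive positions from pairwise distances) can be sketched as follows. Everything beyond that flow is an assumption made for illustration: the description does not say how intensity maps to distance, so the sketch uses a log-distance path-loss model with hypothetical parameters, and it recovers relative coordinates by placing one device at the origin and a second on the x-axis. The function names are likewise hypothetical.

```python
import math


def identify_source(detect_time_s, slot_starts_s, slot_len_s):
    """Return the device whose emission slot contains the detection time.

    Assumes the devices emit in non-overlapping time slots (hypothetical).
    """
    for device_id, start in slot_starts_s.items():
        if start <= detect_time_s < start + slot_len_s:
            return device_id
    return None


def distance_from_intensity(rssi_dbm, tx_power_dbm=-59.0, path_loss_exponent=2.0):
    """Log-distance path-loss model (an assumed conversion, not from the source).

    tx_power_dbm is the intensity expected at 1 m; both parameters are
    hypothetical and would need calibration.
    """
    return 10 ** ((tx_power_dbm - rssi_dbm) / (10 * path_loss_exponent))


def positions_from_distances(dist):
    """Recover relative 2-D positions from a symmetric distance matrix.

    Device 0 is fixed at the origin and device 1 on the positive x-axis.
    Device 2 is placed on the y >= 0 side; later devices use their distance
    to device 2 to choose between the two mirror-image candidates.
    """
    n = len(dist)
    pos = [(0.0, 0.0)]
    if n > 1:
        pos.append((dist[0][1], 0.0))
    for k in range(2, n):
        d0, d1, base = dist[0][k], dist[1][k], dist[0][1]
        x = (d0 ** 2 - d1 ** 2 + base ** 2) / (2 * base)
        y = math.sqrt(max(d0 ** 2 - x ** 2, 0.0))
        if k > 2:
            x2, y2 = pos[2]
            err_pos = abs(math.hypot(x - x2, y - y2) - dist[2][k])
            err_neg = abs(math.hypot(x - x2, -y - y2) - dist[2][k])
            if err_neg < err_pos:
                y = -y
        pos.append((x, y))
    return pos
```

The result is defined only up to rotation, translation, and reflection, and the mirror choice is not resolved when device 2 is collinear with devices 0 and 1. Either limitation is one reason the manual correction step described in the text is useful.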

The method described here, in which the positions of the sound collection devices 10 are determined by exchanging sound or radio signals between them, is only an example, and the setting unit 111 may determine the positions of the sound collection devices 10 by other methods. For example, the sound or radio signals may be emitted from a device provided outside the sound collection devices 10.

The setting unit 111 may combine automatic and manual setting of the positions of the groups G. In that case, for example, the setting unit 111 displays the position of each group G determined by the automatic setting of FIG. 4(b) in the position setting area A1 of FIG. 4(a), and then accepts manual settings from the analyst. The positions determined automatically can thus be corrected manually, so that the position of each group G is set more reliably.

FIG. 5 is a front view of the display unit 21 of the communication terminal 20 displaying a participant setting screen C for setting the participant position information. When the participant position information is to be set, the communication terminal 20 displays the participant setting screen C on the display unit 21 and accepts the analyst's settings for each group set on the group setting screen A. The participant setting screen C includes a position setting area C1, a start button C2, and an end button C3. The position setting area C1 is an area for setting the direction in which each participant U is actually located relative to the sound collection device 10 in the discussion to be analyzed. For example, as shown in FIG. 5, the position setting area C1 shows a circle centered on the position of the sound collection device 10, with angles relative to the sound collection device 10 marked along the circle.

The analyst enters the position of each participant U in the position setting area C1 by operating the operation unit 22 of the communication terminal 20. The setting unit 111 sets the position of each participant U by storing the positions entered by the analyst in the setting information storage unit 131 as participant position information. Identification information identifying each participant U (here, U1 to U4) is assigned to and displayed near the position set for that participant. In the example of FIG. 5, four participants U1 to U4 are set. The part of the position setting area C1 corresponding to each participant U is displayed in a different color for each participant. This allows the analyst to see easily the direction set for each participant U.
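To show how direction-based participant positions might be used, the sketch below assigns an estimated sound direction to the participant whose configured direction is closest, taking the wrap-around at 360 degrees into account. This is an assumed use of the participant position information, not a procedure stated in this part of the description; the function names and the degree convention are hypothetical.

```python
def angular_difference(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two directions, in degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def nearest_participant(direction_deg: float, participant_dirs: dict) -> str:
    """Return the participant whose configured direction is closest.

    participant_dirs maps a participant ID to a direction in degrees
    measured around the sound collection device (convention assumed).
    """
    return min(
        participant_dirs,
        key=lambda pid: angular_difference(direction_deg, participant_dirs[pid]),
    )


if __name__ == "__main__":
    dirs = {"U1": 0.0, "U2": 90.0, "U3": 180.0, "U4": 270.0}
    print(nearest_participant(350.0, dirs))
```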

The start button C2 and the end button C3 are virtual buttons displayed on the display unit 21. When the analyst presses the start button C2, the communication terminal 20 transmits a start instruction signal to the speech analysis device 100. When the analyst presses the end button C3, the communication terminal 20 transmits an end instruction signal to the speech analysis device 100. In the present embodiment, the period from the analyst's start instruction to the end instruction is treated as one discussion.

When the audio acquisition unit 112 of the speech analysis device 100 receives the start instruction signal from the communication terminal 20, it transmits a signal instructing audio acquisition to the sound collection device 10 (b). On receiving that signal from the speech analysis device 100, the sound collection device 10 starts acquiring audio. Likewise, when the audio acquisition unit 112 receives the end instruction signal from the communication terminal 20, it transmits a signal instructing the end of audio acquisition to the sound collection device 10. On receiving that signal from the speech analysis device 100, the sound collection device 10 stops acquiring audio.

The sound collection device 10 acquires audio at each of its plurality of sound collection units and records it internally as the audio of the channel corresponding to each sound collection unit. The sound collection device 10 then transmits the acquired multi-channel audio to the speech analysis device 100 (c). The sound collection device 10 may transmit the acquired audio continuously as it is acquired, or may transmit it in units of a predetermined amount or a predetermined duration. The audio acquisition unit 112 of the speech analysis device 100 receives the audio from the sound collection device 10 and stores it in the audio storage unit 132.

The speech analysis device 100 analyzes the audio of each group acquired from the sound collection devices 10 sequentially, that is, in real time. For example, the speech analysis device 100 successively takes as its analysis target the audio of a predetermined past period (for example, 30 seconds) counted back from the current time.

When analyzing audio, the sound source localization unit 113 first performs sound source localization based on the multi-channel audio acquired by the audio acquisition unit 112 (d). Sound source localization is a process of estimating, at each time step (for example, every 10 to 100 milliseconds), the direction of each sound source contained in the audio acquired by the audio acquisition unit 112. The sound source localization unit 113 associates the sound source direction estimated at each time step with the participant directions indicated by the setting information stored in the setting information storage unit 131.
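
The association between an estimated sound source direction and the configured participant directions can be illustrated as follows. This is only a minimal sketch: the description does not state how the association is made, so the nearest-angle rule, the tolerance `tolerance_deg`, and the function names are assumptions introduced for illustration.

```python
def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def assign_participant(direction_deg, participant_angles, tolerance_deg=30.0):
    """Return the participant whose configured direction is nearest to the
    estimated direction, or None if none lies within the tolerance.

    participant_angles: dict mapping an identifier (e.g. "U1") to the angle
    set in the position setting area C1. The tolerance is a hypothetical
    parameter, not something taken from the description.
    """
    best_id, best_dist = None, tolerance_deg
    for pid, angle in participant_angles.items():
        dist = angular_distance(direction_deg, angle)
        if dist <= best_dist:
            best_id, best_dist = pid, dist
    return best_id


participants = {"U1": 0.0, "U2": 90.0, "U3": 180.0, "U4": 270.0}
nearest = assign_participant(350.0, participants)  # wraps around 0 degrees
```

Measuring the distance on the circle rather than on the raw numbers matters here, because a source estimated at 350 degrees is only 10 degrees away from a participant set at 0 degrees.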

The sound source localization unit 113 can use any known sound source localization method, such as the MUSIC (Multiple Signal Classification) method or a beamforming method, as long as the method can identify the direction of a sound source from the audio acquired from the sound collection device 10.

Next, the analysis unit 114 analyzes the audio based on the audio acquired by the audio acquisition unit 112 and the sound source directions estimated by the sound source localization unit 113 (e). Specifically, the analysis unit 114 first determines, from that audio and those directions, which participant spoke (uttered) at each time step (for example, every 10 to 100 milliseconds) of the discussion to be analyzed. The analysis unit 114 identifies the continuous period from when one participant starts speaking until that participant stops as a speech period, and stores it in the analysis result storage unit 133. When multiple participants speak at the same time, the analysis unit 114 identifies a speech period for each participant.
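
The identification of speech periods from per-time-step speaker decisions can be sketched as below. The sketch assumes that the per-step decision for one participant is already available as a list of booleans, one per frame; the frame length `frame_ms` and the function name are illustrative and not taken from the description.

```python
def speech_periods(active, frame_ms=100):
    """Merge consecutive active frames of one participant into speech periods.

    active: list of booleans, True where the participant was judged to be
    speaking in that frame. Returns a list of (start_ms, end_ms) tuples.
    """
    periods = []
    start = None
    for i, flag in enumerate(active):
        if flag and start is None:
            start = i  # a speech period begins
        elif not flag and start is not None:
            periods.append((start * frame_ms, i * frame_ms))
            start = None  # the speech period has ended
    if start is not None:
        # The participant was still speaking at the end of the data.
        periods.append((start * frame_ms, len(active) * frame_ms))
    return periods


u1_periods = speech_periods([False, True, True, False, True])
```

Because each participant has a separate flag list, overlapping speech by several participants yields independent periods for each of them, in line with the last sentence of the paragraph above.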

The analysis unit 114 also calculates the speech amount of each participant at each time and stores it in the analysis result storage unit 133. Specifically, for a given time window (for example, 5 seconds), the analysis unit 114 calculates, as the speech amount at that time, the length of time during which the participant spoke divided by the length of the time window. The analysis unit 114 then repeats this calculation for each participant while shifting the time window by a predetermined step (for example, 1 second) from the start time of the discussion to its end time (or to the present, in the case of real-time processing). The analysis unit 114 also calculates the sum, at each time, of the speech amounts of the participants belonging to a group as the group's speech amount (activity level) at that time.
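
The sliding-window calculation of the speech amount can be sketched as follows. The 5-second window and 1-second step follow the examples in the text; how the window is aligned to each time point is not specified, so the sketch assumes the window starts at the time point. All function names are illustrative.

```python
def speech_amount(periods, window_start, window_len=5.0):
    """Fraction of the window during which the participant was speaking.

    periods: list of (start_sec, end_sec) speech periods of one participant.
    """
    window_end = window_start + window_len
    spoken = 0.0
    for start, end in periods:
        overlap = min(end, window_end) - max(start, window_start)
        if overlap > 0:
            spoken += overlap
    return spoken / window_len


def speech_amount_series(periods, t_start, t_end, window_len=5.0, step=1.0):
    """Speech amount for each window position between t_start and t_end."""
    series = []
    t = t_start
    while t + window_len <= t_end:
        series.append(speech_amount(periods, t, window_len))
        t += step
    return series


def group_activity(series_by_participant):
    """Per-time sum of the participants' speech amounts (activity level)."""
    return [sum(values) for values in zip(*series_by_participant.values())]


u1 = speech_amount_series([(0.0, 2.0), (4.0, 6.0)], 0.0, 7.0)
u2 = speech_amount_series([(2.0, 3.0)], 0.0, 7.0)
activity = group_activity({"U1": u1, "U2": u2})
```

With this definition an individual speech amount lies between 0 and 1, while the group activity level can exceed 1 when several participants speak at once.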

The analysis unit 114 then uses the speech amounts of the participants and groups to identify the speech status of the participants and groups. As the speech status of a participant, the analysis unit 114 calculates, for example, the total or integral of the participant's speech amount over a predetermined period (for example, 20 seconds) counted back from the present, or the ratio of that total or integral among the participants (that is, a relative value). The speech status calculated in this way can be used as an index of each participant's degree of contribution to the discussion.

As the speech status of a group, the analysis unit 114 calculates, for example, the total or integral of the group's speech amount over a predetermined period (for example, 20 seconds) counted back from the present, or the ratio of that total or integral among the groups (that is, a relative value). The speech status of a group calculated in this way can be used as an index of how lively the discussion in each group is. The analysis unit 114 stores status information indicating the speech status of each participant and each group in the analysis result storage unit 133 as the analysis result.
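
The total over the most recent period and the relative value among participants (or among groups) can be sketched as follows. The sketch assumes a speech amount series sampled at a fixed step, and it represents the "ratio" as each member's share of the overall total, which is one possible reading; the function names are illustrative.

```python
def recent_total(series, step=1.0, period=20.0):
    """Sum of the speech amounts over the most recent `period` seconds."""
    n = int(period / step)
    return sum(series[-n:])


def relative_shares(totals):
    """Each member's share of the overall total (a relative value).

    totals: dict mapping a participant or group identifier to its total.
    Returns all zeros when nobody has spoken, to avoid dividing by zero.
    """
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {key: 0.0 for key in totals}
    return {key: value / grand_total for key, value in totals.items()}


shares = relative_shares({"U1": 3.0, "U2": 1.0})
```

The same two functions apply unchanged to groups by passing each group's activity series and per-group totals instead of per-participant values.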

The atmosphere of the group may also be used as the speech status of the group. The atmosphere of the group is information indicating whether speakers change frequently or infrequently in the discussion. Specifically, the analysis unit 114 calculates the average number of speech transitions (that is, switches from one speech period to another speech period) between the same participant U and the average number of speech transitions between different participants, and identifies the ratio between them as the atmosphere of the group. For example, when the proportion of transitions between the same participant U is large, one participant in the group tends to speak at length; when the proportion of transitions between different participants is large, multiple participants in the group tend to take turns speaking.
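
The group atmosphere can be illustrated with a simple count of transitions. The description refers to average numbers of transitions without stating what they are averaged over, so this sketch simply counts transitions in a sequence of speech periods and reports each kind as a proportion; that simplification, and the function name, are assumptions.

```python
def atmosphere(speakers):
    """Proportions of same-speaker and different-speaker transitions.

    speakers: identifiers of the speaker of each speech period, in time
    order. Returns None when there is no transition to evaluate.
    """
    pairs = list(zip(speakers, speakers[1:]))
    if not pairs:
        return None
    same = sum(1 for prev, nxt in pairs if prev == nxt)
    different = len(pairs) - same
    return {"same": same / len(pairs), "different": different / len(pairs)}


result = atmosphere(["U1", "U1", "U2", "U1", "U1"])
```

A high "same" proportion corresponds to one participant speaking at length, and a high "different" proportion corresponds to participants taking turns.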

The output unit 115 displays the speech status of each participant and each group identified by the analysis unit 114 on the sound collection device 10 and on the display unit of the communication terminal 20, in association with the sound collection device 10 corresponding to that participant and group. Specifically, the output unit 115 reads the status information stored in the analysis result storage unit 133 and transmits it to the sound collection device 10, thereby controlling the overall lamp 11 and the individual lamps 12 to display information indicating the speech status identified by the analysis unit 114 (f).

The output unit 115 also reads the status information stored in the analysis result storage unit 133 and transmits it to the communication terminal 20, thereby controlling the display unit 21 to display information indicating the speech status identified by the analysis unit 114 (g). The method by which the output unit 115 outputs the speech status is described below with reference to FIGS. 6 to 8.

[Description of how the speech status is displayed]
FIG. 6 is a side view of the sound collection device 10 displaying the speech status of the group and the participants. The output unit 115 uses the overall lamp 11 and the individual lamps 12 provided on the sound collection device 10 to display information indicating the speech status of the group and the participants. Specifically, the output unit 115 causes the sound collection device 10 to display the speech status of the group by making the overall lamp 11 emit a predetermined light corresponding to that status. For example, the output unit 115 changes the blinking speed, color, or intensity of the light according to the total or integral of the group's speech amount. This allows the participants and the assistant to easily grasp the speech status of the group as a whole.

The output unit 115 also causes the sound collection device 10 to display the speech status of each participant by making the individual lamp 12 corresponding to that participant emit a predetermined light corresponding to the participant's speech status. That is, the output unit 115 lights the individual lamp 12 provided at the position corresponding to each participant (for example, in front of each participant) according to that participant's speech status. For example, the output unit 115 changes the blinking speed, color, or intensity of the light according to the total or integral of the participant's speech amount. This allows each participant to recognize their own speech status objectively, and allows the assistant to grasp each participant's degree of contribution to the discussion.
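
As an illustration of changing the light intensity according to the total speech amount, the following sketch maps a total to a lamp intensity. The 0 to 255 intensity scale, the normalization by `max_total`, and the function name are all assumptions; the description only states that blinking speed, color, or intensity varies with the total or integral.

```python
def lamp_intensity(total, max_total):
    """Map a speech amount total to a lamp intensity from 0 to 255.

    max_total is a hypothetical normalization constant, for example the
    largest total expected over the evaluation period.
    """
    if max_total <= 0:
        return 0
    ratio = min(max(total / max_total, 0.0), 1.0)  # clamp to [0, 1]
    return int(ratio * 255)


# One intensity per individual lamp 12, keyed by participant identifier.
intensities = {pid: lamp_intensity(total, 20.0)
               for pid, total in {"U1": 10.0, "U2": 30.0}.items()}
```

The same mapping could drive the overall lamp 11 by passing the group total instead of a participant total.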

FIG. 7 is a front view of the display unit 21 of the communication terminal 20 displaying the speech status of the groups. The output unit 115 causes the display unit 21 of the communication terminal 20 to display information indicating the speech status of the groups as an overall status screen D. The overall status screen D includes circles D1, each indicating the speech status of a group, and an icon D2 indicating the position of the assistant.

The output unit 115 displays a circle D1, in a display mode corresponding to the group's speech status, at the position on the display unit 21 corresponding to the position of each group indicated by the group position information stored in the setting information storage unit 131. That is, the output unit 115 displays information indicating the speech status of each group on a map showing the position of each group (each sound collection device 10). For example, the output unit 115 changes the color, shape, pattern, or the like of the circle D1 according to the total or integral of the group's speech amount. This allows the analyst or the assistant to easily grasp the speech status of each group.

The output unit 115 also displays the icon D2 indicating the position of the assistant at the position on the display unit 21 corresponding to the position of the assistant. To detect the position of the assistant, the output unit 115 uses, for example, a signal exchanged between the sound collection device 10 and the assistant. In this case, the assistant carries a transmitter that emits a predetermined signal by, for example, radio waves of wireless communication such as Bluetooth or by ultrasonic waves, and the sound collection device 10 includes a receiver that receives the signal.

The output unit 115 detects that the assistant has approached a sound collection device 10 when the receiver of that sound collection device 10 becomes able to receive the signal from the assistant's transmitter, or when the received signal strength becomes equal to or greater than a predetermined threshold. The output unit 115 detects that the assistant has left the sound collection device 10 when the receiver can no longer receive the signal from the assistant's transmitter, or when the received signal strength falls below the predetermined threshold.
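
The threshold-based detection of the assistant approaching and leaving can be sketched as follows. The sketch assumes the received signal strength is available as a sequence of readings in dBm, with `None` meaning that no signal was received; the threshold value of -60 dBm and the function name are illustrative only.

```python
def presence_events(readings, threshold_dbm=-60.0):
    """Detect approach and leave events from signal strength readings.

    readings: received signal strengths in dBm, or None when the receiver
    got no signal. Returns a list of (index, "approach" | "leave") tuples.
    """
    events = []
    near = False
    for i, rssi in enumerate(readings):
        now_near = rssi is not None and rssi >= threshold_dbm
        if now_near and not near:
            events.append((i, "approach"))
        elif not now_near and near:
            events.append((i, "leave"))
        near = now_near
    return events


events = presence_events([None, -70.0, -55.0, -50.0, -65.0, None])
```

A practical implementation would probably smooth the readings or use separate thresholds for approaching and leaving to avoid rapid toggling near the threshold, but the description does not go into that.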

When the output unit 115 detects that the assistant has approached one of the sound collection devices 10, it displays the icon D2 near that sound collection device 10 (group) on the display unit 21. This allows the analyst to analyze how the speech status of each group changes between when the assistant is near the group and when the assistant is not, and also to evaluate the assistant.

FIG. 8 is a front view of the display unit 21 of the communication terminal 20 displaying the speech status of the participants. When the analyst or the assistant selects one of the groups on the overall status screen D, the output unit 115 causes the display unit 21 of the communication terminal 20 to display information indicating the speech status of each participant belonging to that group as an individual status screen E. The individual status screen E includes an area E1 indicating the speech status of each participant. The area E1 consists of a plurality of sub-areas corresponding to the plurality of participants.

The output unit 115 displays the area E1 in a display mode corresponding to the speech status of each participant. For example, the output unit 115 changes the color, pattern, or the like of the sub-area of the area E1 corresponding to each participant according to the total or integral of that participant's speech amount. The output unit 115 also displays identification information identifying each participant (here, U1 to U4) near the sub-area corresponding to that participant. This allows the analyst or the assistant to easily grasp the speech status of each participant belonging to one group.

The methods of outputting information indicating the speech status of the groups and participants shown in FIGS. 6 to 8 are examples, and any other output method capable of displaying the information in association with the sound collection device 10 corresponding to the group and participants may be used. The output unit 115 need not display the information indicating the speech status on both the sound collection device 10 and the communication terminal 20; it may display the information on at least one of them. The output unit 115 may also output the information indicating the speech status by other means, such as printing with a printer or recording data in a storage device.

[Sequence of the speech analysis method]
FIG. 9 is a sequence diagram of the speech analysis method performed by the speech analysis system S according to the present embodiment. First, the communication terminal 20 accepts analysis condition settings from the analyst and transmits them to the speech analysis device 100 as setting information (S11). The setting unit 111 of the speech analysis device 100 stores the setting information acquired from the communication terminal 20, or setting information identified by the setting unit 111 itself, in the setting information storage unit 131. The setting information includes participant position information indicating the position of each participant in the group associated with one sound collection device 10, and group position information indicating the position of each group (that is, each sound collection device 10) holding a discussion at the same time.

Next, the audio acquisition unit 112 of the speech analysis device 100 transmits a signal instructing audio acquisition to the sound collection device 10 (S12). On receiving that signal from the speech analysis device 100, the sound collection device 10 starts recording audio using its plurality of sound collection units and transmits the recorded multi-channel audio to the speech analysis device 100 (S13). The audio acquisition unit 112 of the speech analysis device 100 receives the audio from the sound collection device 10 and stores it in the audio storage unit 132.

The speech analysis device 100 analyzes the acquired audio sequentially, that is, in real time. When analyzing the audio, the sound source localization unit 113 first performs sound source localization based on the audio acquired by the audio acquisition unit 112 (S14).

Next, the analysis unit 114 calculates the speech amount of each participant at each time by determining which participant spoke at each time, based on the audio acquired by the audio acquisition unit 112 and the sound source directions estimated by the sound source localization unit 113. The analysis unit 114 then identifies the speech status of each participant using the calculated speech amounts (S15). The speech status of a participant is, for example, the total or integral of the participant's speech amount, or the ratio of that total or integral among the participants (that is, a relative value).

The analysis unit 114 also calculates the speech amount of each group at each time by summing the participants' speech amounts at each time for each group, and identifies the speech status of each group using the calculated speech amounts (S16). The speech status of a group is, for example, the total or integral of the group's speech amount, or the ratio of that total or integral among the groups (that is, a relative value). The analysis unit 114 stores status information indicating the speech status of each participant and each group in the analysis result storage unit 133 as the analysis result.

The output unit 115 performs control to display information indicating the speech status of each participant and each group by transmitting the status information indicating that speech status to the sound collection device 10 and the communication terminal 20 (S17).

The sound collection device 10 displays information indicating the speech status of each participant and each group by lighting the overall lamp 11 and the individual lamps 12 according to the status information received from the speech analysis device 100 (S18). The communication terminal 20 causes the display unit 21 to display the overall status screen D and the individual status screen E, which present information indicating the speech status of each participant and each group, according to the display information received from the speech analysis device 100 (S19). The speech analysis device 100 analyzes audio in real time by repeating steps S12 to S19 at predetermined time intervals.

[Effects of the present embodiment]
The speech analysis device 100 according to the present embodiment identifies the speech status in the group associated with a sound collection device 10 based on audio acquired using that sound collection device 10, which has a plurality of sound collection units, and displays the status on the sound collection device 10 or the communication terminal 20. Therefore, even when multiple groups are holding discussions at the same time, the analyst or the assistant can easily grasp the speech status in the multiple groups by referring to the display on the sound collection device 10 or the communication terminal 20.

When the speech analysis device 100 displays the speech status on the sound collection device 10, the participants, in addition to the analyst or the assistant, can objectively see the speech status of the group to which they belong. Furthermore, because the speech status is displayed on the individual lamp 12 provided for each participant on the sound collection device 10, each participant can easily distinguish their own speech status from that of the other participants.

When the speech analysis device 100 displays the speech status on the communication terminal 20, the analyst or the assistant can view the speech status of all groups at a glance. In addition, the cost is low because there is no need to provide a lamp on each sound collection device 10.

The present invention has been described above using embodiments, but the technical scope of the present invention is not limited to the scope described in those embodiments, and various modifications and changes are possible within the scope of its gist. For example, the specific form in which devices are distributed or integrated is not limited to the above embodiments, and all or part of the devices may be functionally or physically distributed or integrated in arbitrary units. New embodiments arising from any combination of multiple embodiments are also included in the embodiments of the present invention. The effects of a new embodiment arising from such a combination include the effects of the original embodiments.

In the above description, the speech analysis device 100 is used to analyze audio in a discussion held with the participants surrounding the sound collection device 10, but it can also be applied to other uses. For example, the speech analysis device 100 can be applied to a situation in which one presenter gives explanations to multiple audience members, as in a poster session.

The processors of the speech analysis device 100, the sound collection device 10, and the communication terminal 20 are the agents that carry out the steps (processes) included in the speech analysis method shown in FIG. 9. That is, the processors of the speech analysis device 100, the sound collection device 10, and the communication terminal 20 read a program for executing the speech analysis method shown in FIG. 9 from a storage unit and execute that program to control the respective units of the speech analysis device 100, the sound collection device 10, and the communication terminal 20, thereby executing the speech analysis method shown in FIG. 9. Some of the steps included in the speech analysis method shown in FIG. 9 may be omitted, the order of the steps may be changed, and multiple steps may be performed in parallel.

S Speech analysis system
100 Speech analysis device
110 Control unit
111 Setting unit
112 Audio acquisition unit
114 Analysis unit
115 Output unit
10 Sound collection device
20 Communication terminal
21 Display unit

Claims

A speech analysis device comprising:
a storage unit that stores, in association with each of a plurality of sound collection devices, information on a plurality of participants surrounding that sound collection device;
an acquisition unit that acquires, from each of the plurality of sound collection devices, voices uttered by the plurality of participants associated with that sound collection device in the storage unit;
an analysis unit that identifies, in the audio acquired by the acquisition unit from each of the plurality of sound collection devices, the utterances of each of the plurality of participants associated with that sound collection device in the storage unit;
a setting unit that sets the position of each of the plurality of sound collection devices based on signals exchanged between the plurality of sound collection devices; and
an output unit that displays, at a position on a display unit corresponding to the position of each of the plurality of sound collection devices set by the setting unit, information indicating the status of the utterances of the plurality of participants associated with that sound collection device in the storage unit.

The speech analysis device according to claim 1, wherein the output unit displays the information indicating the status of the utterances at a position corresponding to the position of each of the plurality of sound collection devices on a map that is displayed on the display unit and shows the positions of the plurality of sound collection devices.

The speech analysis device according to claim 1 or 2, wherein the output unit causes the display unit, which is provided in a communication terminal that communicates with the speech analysis device, to display the information indicating the status of the utterances.

The speech analysis device according to claim 1 or 2, wherein the output unit causes the display unit, which is provided in each of the plurality of sound collection devices, to display the information indicating the status of the utterances.

A speech analysis method in which a processor performs:
a step of acquiring, from each of a plurality of sound collection devices, voices uttered by a plurality of participants associated with that sound collection device in a storage unit that stores, in association with each of the plurality of sound collection devices, information on the plurality of participants surrounding that sound collection device;
a step of identifying, in the audio acquired in the acquiring step from each of the plurality of sound collection devices, the utterances of each of the plurality of participants associated with that sound collection device in the storage unit;
a step of setting the position of each of the plurality of sound collection devices based on signals exchanged between the plurality of sound collection devices; and
a step of displaying, at a position on a display unit corresponding to the position of each of the plurality of sound collection devices set in the setting step, information indicating the status of the utterances of the plurality of participants associated with that sound collection device in the storage unit.

A speech analysis program that causes a computer to execute:
a step of acquiring, from each of a plurality of sound collection devices, the voices uttered by the plurality of participants associated with that sound collection device in a storage unit that stores, in association with each of the plurality of sound collection devices, information on a plurality of participants surrounding that sound collection device;
a step of identifying, in the audio acquired in the acquiring step from each of the plurality of sound collection devices, the utterances of each of the plurality of participants associated with the plurality of sound collection devices in the storage unit;
a step of setting the position of each of the plurality of sound collection devices based on signals exchanged between the plurality of sound collection devices; and
a step of displaying, at a position on a display unit corresponding to the position of each of the plurality of sound collection devices set in the setting step, information indicating the status of the utterances of the plurality of participants associated with the plurality of sound collection devices in the storage unit.

A speech analysis system comprising a speech analysis device and a communication terminal capable of communicating with the speech analysis device, wherein
the communication terminal has a display unit that displays information, and
the speech analysis device includes:
a storage unit that stores, in association with each of a plurality of sound collection devices, information on a plurality of participants surrounding that sound collection device;
an acquisition unit that acquires, from each of the plurality of sound collection devices, the voices uttered by the plurality of participants associated with that sound collection device in the storage unit;
an analysis unit that identifies, in the audio acquired by the acquisition unit from each of the plurality of sound collection devices, the utterances of each of the plurality of participants associated with the plurality of sound collection devices in the storage unit;
a setting unit that sets the position of each of the plurality of sound collection devices based on signals exchanged between the plurality of sound collection devices; and
an output unit that displays, at a position on the display unit corresponding to the position of each of the plurality of sound collection devices set by the setting unit, information indicating the status of the utterances of the plurality of participants associated with the plurality of sound collection devices in the storage unit.
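
For illustration only, the storage, setting, and output elements recited above could be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation. The claims do not specify how positions are derived from the exchanged signals; the sketch assumes pairwise distances between devices have already been estimated from those signals and places the devices by the law of cosines. All names (`STORAGE`, `set_positions`, `utterance_status`, `build_display`), the device IDs, the participant names, and the use of total speaking time as the "status" are hypothetical.

```python
import math

# Hypothetical contents of the storage unit:
# sound collection device ID -> participants surrounding that device.
STORAGE = {"A": ["alice", "bob"], "B": ["carol"], "C": ["dave"]}


def set_positions(device_ids, distance):
    """Place devices on a 2D plane from pairwise distances.

    Assumption: distance[(a, b)] was estimated beforehand from signals
    exchanged between devices a and b. Needs at least two devices and a
    nonzero distance between the first two. The first device goes to the
    origin, the second onto the x-axis, and each remaining device is located
    by the law of cosines relative to those two. The mirror ambiguity is
    resolved by always choosing y >= 0, which is a simplification.
    """
    def d(a, b):
        return distance[(a, b)] if (a, b) in distance else distance[(b, a)]

    first, second = device_ids[0], device_ids[1]
    base = d(first, second)
    positions = {first: (0.0, 0.0), second: (base, 0.0)}
    for dev in device_ids[2:]:
        r0, r1 = d(first, dev), d(second, dev)
        x = (r0 ** 2 + base ** 2 - r1 ** 2) / (2 * base)
        y = math.sqrt(max(r0 ** 2 - x ** 2, 0.0))
        positions[dev] = (x, y)
    return positions


def utterance_status(segments, storage):
    """Sum speaking time per participant, grouped by device.

    Assumption: the analysis step yields (device, participant, start, end)
    tuples in seconds. Participants who never spoke keep a total of 0.0.
    """
    status = {dev: {p: 0.0 for p in people} for dev, people in storage.items()}
    for dev, participant, start, end in segments:
        status[dev][participant] += end - start
    return status


def build_display(positions, status):
    """Pair each device's position with the status shown at that position."""
    return [(dev, positions[dev], status[dev]) for dev in positions]


if __name__ == "__main__":
    pos = set_positions(
        ["A", "B", "C"],
        {("A", "B"): 4.0, ("A", "C"): 3.0, ("B", "C"): 5.0},
    )
    segs = [("A", "alice", 0.0, 2.5), ("A", "bob", 2.5, 4.0), ("B", "carol", 0.0, 1.0)]
    for item in build_display(pos, utterance_status(segs, STORAGE)):
        print(item)
```

A real display would draw each entry of `build_display` on a map at the corresponding coordinates; the list of tuples here only stands in for that rendering step.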