JP2022038498A - Selection program, selection method and selection device

Info

Publication number: JP2022038498A
Application number: JP2020143044A
Authority: JP
Inventors: Yuka Tanaka; Toshihiro Odaka; Takuya Furuta; Tomohiro Otake; Motoshi Sumioka
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2020-08-26
Filing date: 2020-08-26
Publication date: 2022-03-10

Abstract

To select a participant with high facilitation ability as the facilitator. SOLUTION: A selection device acquires voice information containing the voices of a plurality of speakers. The selection device detects utterance information that associates each utterance section, in which an utterance contained in the voice information was made, with the speaker who made the utterance in that section. The selection device performs speech recognition on the voice information and extracts the words it contains. The selection device evaluates the impressions of the plurality of speakers before and after utterance information that includes a specific word contained in the voice information. The selection device selects a facilitator based on the evaluation of the speakers' impressions. SELECTED DRAWING: Figure 1

Description

The present invention relates to a selection device and the like.

In recent years, systems that support interaction at events such as conferences and networking meetings have become known. Such systems support interaction by, for example, grouping participants who share common hobbies or selecting a facilitator.

For example, there is a technique for selecting an event organizer from among organizer candidates who have volunteered or been recommended. With this technique, preparations for an event can proceed even while the organizer has not yet been decided.

Japanese Unexamined Patent Publication No. 2018-124750
Japanese Unexamined Patent Publication No. 2019-8130
Japanese Unexamined Patent Publication No. 2019-61129
International Publication No. 2017/168663

However, with the techniques described above, it may not be possible to select a participant with high facilitation ability as the facilitator.

For example, at an event attended by many people who are meeting for the first time, when grouped participants are asked to interact, whether the group's interaction succeeds depends largely on how the facilitator is chosen. The chosen facilitator runs the session according to the agenda, but the interaction does not become lively when an inexperienced person leads it. With the techniques described above, a participant's willingness to interact can be inferred from, for example, the number of events attended, but it cannot be determined whether the chosen person is skilled at drawing out what others have to say. As a result, it may not be possible to choose a facilitator who makes the interaction lively.

In one aspect, an object of the present invention is to provide a selection program, a selection method, and a selection device that select a participant with high facilitation ability as the facilitator.

In a first proposal, a computer is caused to execute the following processing. The computer acquires voice information containing the voices of a plurality of speakers. The computer detects utterance information that associates each utterance section, in which an utterance contained in the voice information was made, with the speaker who made the utterance in that section. The computer performs speech recognition on the voice information and extracts the words contained in the voice information. The computer evaluates the impressions of the plurality of speakers before and after utterance information that includes a specific word contained in the voice information. The computer selects a facilitator based on the evaluation of the impressions of the plurality of speakers.

According to one embodiment, a participant with high facilitation ability can be selected as the facilitator.

FIG. 1 is a diagram for explaining an example of the processing of the selection device according to the first embodiment.
FIG. 2 is a diagram showing an example of the system according to the first embodiment.
FIG. 3 is a functional block diagram showing the configuration of the selection device according to the first embodiment.
FIG. 4 is a diagram showing an example of the data structure of the utterance information.
FIG. 5 is a diagram showing an example of the data structure of the utterance impression evaluation information.
FIG. 6 is a diagram showing an example of the data structure of the parroting occurrence information.
FIG. 7 is a diagram showing an example of the data structure of the facilitation ability evaluation information.
FIG. 8 is a diagram showing an example of the data structure of the participant rating information.
FIG. 9 is a flowchart showing the processing procedure of the selection device according to the first embodiment.
FIG. 10 is a subroutine showing the processing procedure for detecting utterance information.
FIG. 11 is a subroutine showing the processing procedure for identifying parroting.
FIG. 12 is a functional block diagram showing the configuration of the selection device according to the second embodiment.
FIG. 13 is a flowchart showing the processing procedure of the selection device according to the second embodiment.
FIG. 14 is a diagram showing an example of the hardware configuration of a computer that realizes the same functions as the selection device.

Hereinafter, embodiments of the selection program, selection method, and selection device disclosed in the present application are described with reference to the drawings. The present invention is not limited by these embodiments. The embodiments may be combined as appropriate to the extent that no contradiction arises.

FIG. 1 is a diagram for explaining an example of the processing of the selection device according to the first embodiment. The selection device according to the first embodiment acquires voice information containing the voices of a plurality of speakers who are participants in a conference, networking meeting, or the like. The selection device then detects utterance information that associates each utterance section, in which an utterance contained in the acquired voice information was made, with the speaker who made the utterance in that section. The utterance information shown in FIG. 1 includes an utterance ID corresponding to the utterance section, the speaker corresponding to the utterance ID, and the start time and end time of the utterance. The selection device also stores the character string generated by performing speech recognition on the voice information, in association with the utterance ID, as an utterance content character string.

Further, the selection device extracts the words contained in the utterance content character string. The selection device then identifies parroting, that is, a case where, in utterance information adjacent in chronological order, the extracted words match and the speakers differ. Parroting means that a speaker repeats, as is, a word uttered by the immediately preceding speaker. Specifically, in the utterance information of FIG. 1, the word W1 "commercial" contained in utterance ID "h11" and the word W2 "commercial" contained in utterance ID "h21" are extracted, and parroting is identified because the speakers of these two utterances differ.
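The parroting check described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Utterance` class, the `find_parroting` function, the speaker labels, and the sample words are all hypothetical, and the words are assumed to have already been extracted by speech recognition.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Utterance:
    """One row of utterance information (field names are assumptions)."""
    utterance_id: str
    speaker: str
    start: float  # start time in seconds
    end: float    # end time in seconds
    words: List[str]


def find_parroting(utterances: List[Utterance]) -> List[Tuple[str, str, List[str]]]:
    """Return (previous ID, current ID, shared words) for each pair of
    chronologically adjacent utterances that share a word and have
    different speakers."""
    ordered = sorted(utterances, key=lambda u: u.start)
    hits = []
    for prev, cur in zip(ordered, ordered[1:]):
        shared = set(prev.words) & set(cur.words)
        if shared and prev.speaker != cur.speaker:
            hits.append((prev.utterance_id, cur.utterance_id, sorted(shared)))
    return hits


# Hypothetical data modeled on the "commercial" example of FIG. 1.
utterances = [
    Utterance("h11", "A", 0.0, 3.0, ["saw", "commercial"]),
    Utterance("h21", "B", 3.2, 4.0, ["commercial"]),
    Utterance("h12", "A", 4.1, 7.0, ["yes", "funny"]),
]
parroting = find_parroting(utterances)
```

A practical system would probably restrict the comparison to content words, since function words would otherwise match between almost any two utterances; the text does not specify such filtering.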

The selection device also evaluates the participants' impressions with respect to the voice information. The utterance impression evaluation information shown in FIG. 1 includes an utterance impression ID that identifies the utterance impression evaluation information, the utterance ID corresponding to the utterance impression ID, the time corresponding to the impression evaluation value, and the impression evaluation value obtained by evaluating and quantifying the participant's impression.

Subsequently, the selection device determines whether the parroting improved the impression. Specifically, on the time axis shown in FIG. 1, the selection device determines whether the impression evaluation value V1 of utterance ID "h12", which corresponds to the utterance section immediately after utterance ID "h21" in which the parroting occurred, has risen.
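As a rough sketch of this judgment, the impression evaluation value of the utterance after the parroting can be compared with that of the utterance before it. The baseline used for the comparison is an assumption here (the utterance that was parroted, "h11"), and the function name and numeric values are hypothetical.

```python
from typing import Dict

# Hypothetical impression evaluation values keyed by utterance ID.
impression: Dict[str, float] = {"h11": 0.4, "h21": 0.5, "h12": 0.7}


def impression_rose(before_id: str, after_id: str, values: Dict[str, float]) -> bool:
    """True if the impression evaluation value after the parroting is
    higher than the value before it."""
    return values[after_id] > values[before_id]


# "h21" parrots "h11"; "h12" is the utterance section immediately after it.
rose = impression_rose("h11", "h12", impression)
```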

Thereafter, the selection device rates the participants by adding points to each participant whose parroting raised the impression evaluation value, and selects the facilitator using the rating result.

As described above, the selection device according to the first embodiment selects, as the facilitator, a participant who improved the other participants' impressions through parroting. This makes it possible to select a participant with high facilitation ability as the facilitator.

Next, the configuration of the system according to the first embodiment is described. FIG. 2 is a diagram showing an example of the system according to the first embodiment. As shown in FIG. 2, the system includes a microphone terminal 10 and a selection device 100. For example, the microphone terminal 10 and the selection device 100 are connected to each other wirelessly. The microphone terminal 10 and the selection device 100 may instead be connected by wire.

The microphone terminal 10 is a device that records voice. The microphone terminal 10 transmits voice information to the selection device 100. The voice information includes the voices of speakers A to E, who are participants in a conference, networking meeting, or the like. The microphone terminal 10 may include a plurality of microphones, in which case it transmits the voice information collected by each microphone to the selection device 100.

The selection device 100 acquires the voice information from the microphone terminal 10 and selects, from among speakers A to E, a participant determined to have high facilitation ability as the facilitator.

FIG. 3 is a functional block diagram showing the configuration of the selection device according to the first embodiment. As shown in FIG. 3, the selection device 100 includes a communication unit 110, an input unit 120, a display unit 130, a storage unit 140, and a control unit 150.

The communication unit 110 is a processing unit that performs data communication with the microphone terminal 10 wirelessly. The communication unit 110 is an example of a communication device. The communication unit 110 receives voice information from the microphone terminal 10 and outputs the received voice information to the control unit 150. The selection device 100 may instead be connected to the microphone terminal 10 by wire. The selection device 100 may also connect to a network through the communication unit 110 and exchange data with an external device (not shown).

The input unit 120 is an input device for entering various kinds of information into the selection device 100. The input unit 120 corresponds to a keyboard, a mouse, a touch panel, or the like.

The display unit 130 is a display device that displays information output from the control unit 150. The display unit 130 corresponds to a liquid crystal display, a touch panel, or the like.

The storage unit 140 holds a voice buffer 140a, learned acoustic feature information 140b, utterance information 140c, utterance impression evaluation information 140d, parroting occurrence information 140e, facilitation ability evaluation information 140f, and participant rating information 140g. The storage unit 140 corresponds to a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or to a storage device such as an HDD (Hard Disk Drive).

The voice buffer 140a is a buffer that stores the voice information transmitted from the microphone terminal 10. In the voice information, the voice signal is associated with time.

The learned acoustic feature information 140b is information on the acoustic features of the voices of speakers A to E, learned in advance. The acoustic features include pitch frequency, frame power, formant frequency, and voice arrival direction. For example, the learned acoustic feature information 140b is a vector whose elements are the values of the pitch frequency, frame power, formant frequency, and voice arrival direction.

The utterance information 140c is information that associates each utterance section, in which an utterance contained in the participants' voice information was made, with the speaker who made the utterance in that section. FIG. 4 is a diagram showing an example of the data structure of the utterance information. The utterance information 140c shown in FIG. 4 includes an utterance ID corresponding to the utterance section, the speaker corresponding to the utterance ID, and the start time and end time of the utterance. The utterance information also includes the utterance content character string generated by performing speech recognition on the voice information.

The utterance impression evaluation information 140d is information obtained by evaluating the participants' impressions. FIG. 5 is a diagram showing an example of the data structure of the utterance impression evaluation information. The utterance impression evaluation information 140d shown in FIG. 5 includes an utterance impression ID that identifies each piece of utterance impression evaluation information, the utterance section (utterance ID) corresponding to the utterance impression ID, the time corresponding to the impression evaluation value, and the impression evaluation value obtained by evaluating and quantifying the participant's impression.

The parroting occurrence information 140e is information indicating the occurrence of parroting, in which, in utterance information adjacent in chronological order, the extracted words match and the speakers differ. FIG. 6 is a diagram showing an example of the data structure of the parroting occurrence information. The parroting occurrence information 140e shown in FIG. 6 includes a parroting ID that identifies each piece of parroting occurrence information, the utterance section (utterance ID) corresponding to the parroting ID, and the speaker corresponding to the utterance ID.

The facilitation ability evaluation information 140f is information obtained by evaluating the participants' facilitation ability. FIG. 7 is a diagram showing an example of the data structure of the facilitation ability evaluation information. The facilitation ability evaluation information 140f shown in FIG. 7 includes each speaker, the evaluation information for each speaker, and the evaluation value for each speaker. A piece of evaluation information is generated each time parroting occurs and is assigned a value of 1 if the parroting improved the impression, 0 if the impression did not change, and -1 if the impression worsened. The evaluation value is the average of the evaluation information and is calculated by dividing the sum of the evaluation information by the number of times parroting occurred.
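The averaging rule above amounts to a short calculation, sketched below. The function name and the sample scores are hypothetical, and returning `None` for a speaker with no parroting is an assumption, since the text does not say how that case is handled.

```python
from typing import Dict, List, Optional


def evaluation_value(evaluations: List[int]) -> Optional[float]:
    """Average of the per-parroting scores (+1, 0, or -1).
    Returns None when the speaker never parroted (an assumption)."""
    if not evaluations:
        return None
    return sum(evaluations) / len(evaluations)


# Hypothetical evaluation information: one score per parroting occurrence.
scores: Dict[str, List[int]] = {
    "A": [1, 0, 1, -1],
    "B": [1, 1, 0],
    "C": [],
}
values = {speaker: evaluation_value(e) for speaker, e in scores.items()}
```

For speaker A the sum is 1 over 4 occurrences, giving 0.25; for speaker B it is 2 over 3 occurrences.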

The participant rating information 140g is information obtained by rating the participants. FIG. 8 is a diagram showing an example of the data structure of the participant rating information. The participant rating information 140g shown in FIG. 8 includes each speaker, participation information representing the history of events in which the speaker obtained the highest evaluation value, and the rating for each speaker. For example, the participation information for speaker E indicates that, in the third event the speaker attended (E3), the speaker obtained the highest evaluation value, 0.7. The rating is incremented by 1 for each time the speaker recorded the highest value in the participation information.
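A minimal sketch of this rating update follows. The function names and data are hypothetical. Two points are assumptions rather than statements from the text: tied speakers all receive a point, and the facilitator is taken to be the speaker with the highest rating.

```python
from typing import Dict, Optional


def update_ratings(ratings: Dict[str, int],
                   event_values: Dict[str, Optional[float]]) -> Dict[str, int]:
    """Add 1 to the rating of each speaker who obtained the highest
    evaluation value in one event. Speakers without a value are ignored."""
    scored = {s: v for s, v in event_values.items() if v is not None}
    if not scored:
        return dict(ratings)
    best = max(scored.values())
    updated = dict(ratings)
    for speaker, value in scored.items():
        if value == best:
            updated[speaker] = updated.get(speaker, 0) + 1
    return updated


def select_facilitator(ratings: Dict[str, int]) -> str:
    """Pick the speaker with the highest rating (assumed selection rule)."""
    return max(ratings, key=ratings.get)


# Hypothetical state before the event, and evaluation values from the event.
ratings = {"A": 1, "E": 2}
event_values = {"A": 0.25, "B": 0.1, "E": 0.7, "C": None}
updated = update_ratings(ratings, event_values)
facilitator = select_facilitator(updated)
```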

The control unit 150 includes an acquisition unit 150a, an utterance information detection unit 150b, a speech recognition unit 150c, an utterance impression evaluation unit 150d, an identification unit 150e, a determination unit 150f, and a selection unit 150g. The control unit 150 is realized by a CPU (Central Processing Unit) or an MPU (Micro Processing Unit), or by hard-wired logic such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).

The acquisition unit 150a is a processing unit that acquires voice information from the microphone terminal 10 via the communication unit 110. The acquisition unit 150a sequentially stores the voice information in the voice buffer 140a.

The utterance information detection unit 150b is a processing unit that acquires voice information from the voice buffer 140a and detects, from the voice information, the utterance information 140c shown in FIG. 4. The utterance information detection unit 150b performs utterance section detection processing, acoustic analysis processing, and similarity evaluation processing.

First, an example of the "utterance section detection processing" executed by the utterance information detection unit 150b is described. The utterance information detection unit 150b determines the power of the voice information and detects, as an utterance section, a section sandwiched between silent sections in which the power is below a threshold. The utterance information detection unit 150b may detect utterance sections using the technique disclosed in International Publication No. 2009/145192.
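The power-threshold rule can be sketched as a scan over per-frame power values. This is only an illustration under stated assumptions: the function name, the sample powers, and the threshold are hypothetical, and the start and end of the recording are treated as section boundaries, which the text does not specify.

```python
from typing import List, Tuple


def detect_utterance_sections(frame_powers: List[float],
                              threshold: float) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) pairs, end exclusive, for each run
    of frames whose power is at or above the threshold."""
    sections = []
    start = None
    for i, power in enumerate(frame_powers):
        if power >= threshold and start is None:
            start = i                      # a voiced run begins
        elif power < threshold and start is not None:
            sections.append((start, i))    # silence closes the run
            start = None
    if start is not None:                  # run still open at end of input
        sections.append((start, len(frame_powers)))
    return sections


# Hypothetical per-frame power values.
powers = [10, 12, 45, 50, 48, 11, 9, 40, 42, 10]
sections = detect_utterance_sections(powers, threshold=30)
```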

The utterance information detection unit 150b divides the voice information delimited by the utterance sections into fixed-length frames. The utterance information detection unit 150b assigns each frame a frame number that identifies it. The utterance information detection unit 150b executes the acoustic analysis processing and the similarity evaluation processing, described later, on each frame.

Next, an example of the "acoustic analysis processing" executed by the utterance information detection unit 150b is described. For example, the utterance information detection unit 150b calculates acoustic features based on each frame of the utterance sections contained in the voice information. As the acoustic features, the utterance information detection unit 150b calculates the pitch frequency, frame power, formant frequency, and voice arrival direction.

An example of the processing in which the utterance information detection unit 150b calculates the "pitch frequency" as an acoustic feature is described. The utterance information detection unit 150b calculates the pitch frequency p(n) of the voice signal contained in a frame using the RAPT (A Robust Algorithm for Pitch Tracking) estimation method, where "n" denotes the frame number. The utterance information detection unit 150b may calculate the pitch frequency using the technique described in D. Talkin, "A Robust Algorithm for Pitch Tracking (RAPT)," in Speech Coding & Synthesis, W. B. Kleijn and K. K. Paliwal (Eds.), Elsevier, pp. 495-518, 1995.

An example of the processing in which the utterance information detection unit 150b calculates the "frame power" as an acoustic feature is described. For example, the utterance information detection unit 150b calculates the power S(n) of a frame of predetermined length based on equation (1). In equation (1), "n" denotes the frame number, "M" denotes the time length of one frame (for example, 20 ms), "t" denotes time, and "C(t)" denotes the voice signal at time t. The utterance information detection unit 150b may also calculate, as the frame power, a power smoothed over time using a predetermined smoothing coefficient.
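Equation (1) itself is not reproduced in this text. One common definition that is consistent with the symbols n, M, t, and C(t) is the logarithmic power of the squared samples summed over the frame; the sketch below uses that form purely as an assumption. The function name, the `eps` guard, and the use of a sample count for the frame length are likewise hypothetical.

```python
import math
from typing import Sequence


def frame_power_db(signal: Sequence[float], n: int, frame_len: int,
                   eps: float = 1e-12) -> float:
    """Assumed form of the frame power:
        S(n) = 10 * log10( sum of C(t)^2 for t in frame n )
    where frame n covers samples n*frame_len to (n+1)*frame_len - 1.
    eps avoids log10(0) on an all-zero frame."""
    frame = signal[n * frame_len:(n + 1) * frame_len]
    energy = sum(c * c for c in frame)
    return 10.0 * math.log10(energy + eps)


# Hypothetical signal: ten samples of amplitude 1.0 form frame 0.
power = frame_power_db([1.0] * 10, n=0, frame_len=10)
```

At a 16 kHz sampling rate, for instance, a 20 ms frame would correspond to `frame_len=320` samples; the sampling rate is not stated in the text.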

An example of the processing in which the utterance information detection unit 150b calculates the "formant frequency" as an acoustic feature is described. The utterance information detection unit 150b performs linear prediction (Linear Prediction Coding) analysis on the voice signal C(t) contained in a frame and extracts a plurality of peaks, thereby calculating a plurality of formant frequencies. For example, the utterance information detection unit 150b calculates, in ascending order of frequency, the first formant frequency F1, the second formant frequency F2, and the third formant frequency F3. The utterance information detection unit 150b may calculate the formant frequencies using the technique disclosed in Japanese Patent Application Laid-Open No. 62-54297.

An example of the processing in which the utterance information detection unit 150b calculates the "voice arrival direction" as an acoustic feature is described. The utterance information detection unit 150b calculates the voice arrival direction based on the phase difference between the voice information recorded by two microphones.

In this case, the utterance information detection unit 150b detects utterance sections from each piece of voice information recorded by the plurality of microphones of the microphone terminal 10, and calculates the phase difference by comparing the voice information of frames at the same time in the respective utterance sections. The utterance information detection unit 150b may calculate the voice arrival direction using the technique disclosed in Japanese Patent Application Laid-Open No. 2008-175733.

By executing the acoustic analysis processing described above, the utterance information detection unit 150b calculates the acoustic features of each frame contained in the utterance sections of the voice information. The utterance information detection unit 150b may use at least one of the pitch frequency, frame power, formant frequency, and voice arrival direction as the acoustic feature, or may use a combination of several of them. In the following description, the acoustic features of each frame contained in an utterance section of the voice information are referred to as the "evaluation target acoustic features".

Next, an example of the "similarity evaluation processing" executed by the utterance information detection unit 150b is described. The utterance information detection unit 150b calculates the similarity between the evaluation target acoustic features of each frame of an utterance section and the learned acoustic feature information 140b.

For example, the utterance information detection unit 150b may calculate the Pearson product-moment correlation coefficient as the similarity, or may calculate the similarity using the Euclidean distance.

The case where the utterance information detection unit 150b calculates the Pearson product-moment correlation coefficient as the similarity is described. The Pearson product-moment correlation coefficient cor is calculated by equation (2). In equation (2), "X" is a vector whose elements are the values of the pitch frequency, frame power, formant frequency, and voice arrival direction of the acoustic features of each of speakers A to E contained in the learned acoustic feature information 140b. "Y" is a vector whose elements are the values of the pitch frequency, frame power, formant frequency, and voice arrival direction of the evaluation target acoustic features. "i" is an index denoting an element of the vector. The utterance information detection unit 150b identifies a frame whose evaluation target acoustic features give a Pearson product-moment correlation coefficient cor equal to or greater than a threshold Thc as a frame containing the voice of one of speakers A to E. For example, the threshold Thc is set to 0.7. The threshold Thc may be changed as appropriate.
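Equation (2) is not reproduced in this text. The sketch below uses the standard definition of the Pearson product-moment correlation coefficient, which is assumed to be what equation (2) denotes. The function name, the feature values, and the handling of a zero-variance vector are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Standard Pearson product-moment correlation coefficient:
        cor = sum((X_i - mean(X)) * (Y_i - mean(Y)))
              / (sqrt(sum((X_i - mean(X))^2)) * sqrt(sum((Y_i - mean(Y))^2)))
    Returns 0.0 when either vector has zero variance (an assumption)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        return 0.0
    return cov / (dev_x * dev_y)


# Hypothetical vectors: pitch, frame power, formant, arrival direction.
learned = [120.0, 60.0, 700.0, 30.0]    # X: learned features of one speaker
observed = [125.0, 58.0, 690.0, 35.0]   # Y: evaluation target features
THC = 0.7                               # threshold Thc from the text
is_match = pearson(learned, observed) >= THC
```

Because the elements have very different magnitudes, the largest one dominates the coefficient; whether the features are normalized beforehand is not stated in the text.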

The case where the utterance information detection unit 150b calculates the similarity using the Euclidean distance is described. The Euclidean distance d is calculated by equation (3), and the similarity R is calculated by equation (4). In equation (3), a_1 to a_i correspond to the values of the pitch frequency, frame power, formant frequency, and voice arrival direction of the acoustic features of each of speakers A to E contained in the learned acoustic feature information 140b, and b_1 to b_i correspond to the values of the pitch frequency, frame power, formant frequency, and voice arrival direction of the evaluation target acoustic features. The utterance information detection unit 150b identifies a frame whose evaluation target acoustic features give a similarity R equal to or greater than a threshold Thr as a frame containing the voice of one of speakers A to E. For example, the threshold Thr is set to 0.7. The threshold Thr may be changed as appropriate.

R = 1 / (1 + d) ... (4)
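The distance-based similarity can be sketched as follows. Equation (3) is not reproduced in this text, so the ordinary Euclidean distance is assumed; equation (4) is used as written. The helper `identify_speaker`, the choice of the best-scoring speaker, and the assumption that feature values are normalized to comparable scales are all hypothetical additions for illustration.

```python
import math

THR = 0.7  # example threshold from the description; may be changed as appropriate


def euclidean_distance(a, b):
    """Ordinary Euclidean distance d between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def similarity(a, b):
    """Similarity R = 1 / (1 + d), as in equation (4)."""
    return 1.0 / (1.0 + euclidean_distance(a, b))


def identify_speaker(learned_features, frame, threshold=THR):
    """Return the speaker whose learned features are most similar to the frame,
    or None when no similarity reaches the threshold.

    Assumption: feature values are normalized so that distances are comparable.
    """
    best_speaker, best_r = None, 0.0
    for speaker, features in learned_features.items():
        r = similarity(features, frame)
        if r > best_r:
            best_speaker, best_r = speaker, r
    return best_speaker if best_r >= threshold else None
```

Because R falls from 1 toward 0 as d grows, a threshold of 0.7 corresponds to a distance of at most about 0.43 in whatever units the features use.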

The utterance information detection unit 150b identifies a frame of the evaluation target acoustic feature whose similarity is equal to or greater than the threshold as a frame containing the voice of one of the speakers A to E. In other words, the utterance information detection unit 150b identifies the speakers A to E for each frame of the voice information.

発話情報検出部１５０ｂは、上記処理を繰り返し実行し、全ての発話区間について発話者を特定する。発話情報検出部１５０ｂは、発話情報に各発話区間の開始時刻および終了時刻を含めて、発話情報１４０ｃとして記憶部１４０に記憶させる。 The utterance information detection unit 150b repeatedly executes the above process to identify the speaker for all utterance sections. The utterance information detection unit 150b includes the start time and end time of each utterance section in the utterance information, and stores the utterance information 140c in the storage unit 140.

音声認識部１５０ｃは、音声情報を取得し、音声情報に対して音声認識を行い図４に示す発話内容文字列を生成する処理部である。音声認識部１５０ｃは、音声認識により生成した文字列を各発話区間と対応付けて、発話内容文字列として発話情報１４０ｃに含めて記憶部１４０に記憶させる。また、音声認識部１５０ｃは、生成した文字列から単語を抽出する処理部である。音声認識部１５０ｃは、発話内容文字列に含まれる単語を抽出する。 The voice recognition unit 150c is a processing unit that acquires voice information, performs voice recognition on the voice information, and generates an utterance content character string shown in FIG. The voice recognition unit 150c associates the character string generated by voice recognition with each utterance section, includes it in the utterance information 140c as an utterance content character string, and stores it in the storage unit 140. Further, the voice recognition unit 150c is a processing unit that extracts a word from the generated character string. The voice recognition unit 150c extracts words included in the utterance content character string.

音声認識部１５０ｃは、どのような技術を用いて、音声情報を文字列に変換してもよい。たとえば、音声認識部１５０ｃは、特開平４－２５５９００号公報に開示された技術を用いて、音声情報を文字列に変換する。 The voice recognition unit 150c may use any technique to convert the voice information into a character string. For example, the voice recognition unit 150c converts voice information into a character string by using the technique disclosed in Japanese Patent Application Laid-Open No. 4-255900.

発話印象評価部１５０ｄは、音声情報を取得し、音声情報における音声信号のピッチ周波数の上下幅に基づいて、音声情報に対する参加者の印象を評価する処理部である。発話印象評価部１５０ｄは、「発話印象評価処理」を行う。 The utterance impression evaluation unit 150d is a processing unit that acquires voice information and evaluates the participant's impression of the voice information based on the vertical width of the pitch frequency of the voice signal in the voice information. The utterance impression evaluation unit 150d performs the "utterance impression evaluation process".

An example of the "utterance impression evaluation process" executed by the utterance impression evaluation unit 150d will be described. The utterance impression evaluation unit 150d acquires the audio signal of each utterance section and calculates the range (vertical width) of the pitch frequency for each frame. Taking a normal impression as the reference value 0, the utterance impression evaluation unit 150d judges the impression to be better the larger the pitch frequency range of the audio signal is and assigns a positive impression evaluation value of correspondingly larger magnitude; it judges the impression to be worse the smaller the range is and assigns a negative impression evaluation value of correspondingly larger magnitude. The utterance impression evaluation unit 150d then stores the impression evaluation values in the storage unit 140 as the utterance impression evaluation information 140d, in association with time. The utterance impression evaluation unit 150d may also evaluate the participants' impressions using biological information such as the participants' pulse. The utterance impression evaluation unit 150d may use the average of the impressions of the speakers A to E as the impression evaluation value.
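A minimal sketch of such a signed score is shown below. The description states only that the value is 0 for a normal impression, positive for a wider pitch range, and negative for a narrower one; the linear mapping, the reference range of 50 Hz, the scale of 25 Hz, and the convention that 0 marks an unvoiced sample are assumptions made purely for illustration.

```python
def pitch_range(pitches_hz):
    """Vertical width (max - min) of the pitch frequencies observed in one frame."""
    voiced = [p for p in pitches_hz if p > 0]  # assumption: 0 marks unvoiced samples
    if not voiced:
        return 0.0
    return max(voiced) - min(voiced)


def impression_value(pitches_hz, normal_range_hz=50.0, scale_hz=25.0):
    """Signed score: 0 at the reference range, positive above it, negative below it."""
    return (pitch_range(pitches_hz) - normal_range_hz) / scale_hz


def evaluate_impressions(frames):
    """frames: list of (time_sec, pitches_hz) pairs.

    Returns (time_sec, impression_value) pairs, i.e. values associated with time.
    """
    return [(t, impression_value(p)) for t, p in frames]
```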

The specifying unit 150e is a processing unit that acquires the utterance information 140c and identifies parrot returns from the utterance content character strings. The specifying unit 150e executes the parrot return specifying process.

An example of the "parrot return specifying process" executed by the specifying unit 150e will be described. The specifying unit 150e acquires the utterance information 140c and, among pieces of utterance information that are adjacent in chronological order, identifies as a parrot return a pair whose utterance content character strings contain a matching word and whose speakers differ; that is, one speaker repeats back a word just used by another. The specifying unit 150e then stores the identified parrot return in the storage unit 140 as the parrot return generation information 140e. Here, utterance information adjacent in chronological order means pieces of utterance information that immediately precede and follow each other in the time series, but it may also cover pieces of utterance information separated by one or more intervening pieces of utterance information. In other words, the case where the same word is uttered immediately after a given utterance may be identified as a parrot return, and the case where the same word is uttered after another person's intervening utterance may also be counted as a parrot return.
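The matching rule can be sketched as below. The `Utterance` record, the `max_gap` parameter, and the reading of "words match" as "at least one shared word" are assumptions introduced for illustration; the description does not fix a data layout or a matching criterion.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    start: float      # start time of the utterance section
    end: float        # end time of the utterance section
    speaker: str
    words: frozenset  # words extracted from the utterance content character string


def find_parrot_returns(utterances, max_gap=0):
    """Return (earlier_index, later_index, shared_words) for each parrot return.

    Indices refer to the utterances sorted by start time.
    max_gap=0 compares only directly adjacent utterances; a larger value also
    allows that many intervening utterances between the pair.
    """
    ordered = sorted(utterances, key=lambda u: u.start)
    found = []
    for j, later in enumerate(ordered):
        for i in range(max(0, j - 1 - max_gap), j):
            earlier = ordered[i]
            shared = earlier.words & later.words
            if shared and earlier.speaker != later.speaker:
                found.append((i, j, shared))
    return found
```

With `max_gap=0` only an immediate repetition by a different speaker counts; raising it covers the variant in which another person's utterance comes in between.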

判定部１５０ｆは、発話印象評価情報１４０ｄおよびオウム返し発生情報１４０ｅを取得し、オウム返しにより、印象がよくなったか否かを判定する処理部である。判定部１５０ｆは、判定処理を実行する。 The determination unit 150f is a processing unit that acquires the utterance impression evaluation information 140d and the parrot return generation information 140e, and determines whether or not the impression is improved by the parrot return. The determination unit 150f executes the determination process.

An example of the "determination process" executed by the determination unit 150f will be described. The determination unit 150f selects one parrot return from the acquired parrot return generation information 140e and identifies the utterance section in which it occurred. Using the acquired utterance impression evaluation information 140d, it then determines whether the impression evaluation value has risen in the utterance section, by a different speaker, that immediately follows the selected parrot return. The determination unit 150f determines that the impression has improved if the impression evaluation value has risen, that the impression has not changed if the value has not changed, and that the impression has worsened if the value has fallen. The determination unit 150f calculates an evaluation value by averaging the evaluation information, and stores the determination results in the storage unit 140 as the evaluation information and evaluation value shown in FIG. 7.
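One way to picture this determination and averaging is the sketch below. The encoding of the three outcomes as +1, 0, and -1, the comparison of a value before the parrot return with a value after it, and the per-speaker grouping are assumptions; FIG. 7 is not reproduced here, so the actual layout of the evaluation information may differ.

```python
from collections import defaultdict


def judge(before, after, eps=1e-9):
    """+1 if the impression evaluation value rose, -1 if it fell, 0 if unchanged."""
    if after - before > eps:
        return 1
    if before - after > eps:
        return -1
    return 0


def evaluation_values(parrot_events):
    """parrot_events: iterable of (speaker, impression_before, impression_after),
    one entry per parrot return made by that speaker.

    Returns {speaker: mean of the per-event judgments}.
    """
    per_speaker = defaultdict(list)
    for speaker, before, after in parrot_events:
        per_speaker[speaker].append(judge(before, after))
    return {s: sum(v) / len(v) for s, v in per_speaker.items()}
```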

The selection unit 150g is a processing unit that acquires the facilitating ability evaluation information 140f and selects a facilitator based on the determination result of the determination unit 150f. The selection unit 150g executes the selection process.

選定部１５０ｇが実行する「選定処理」の一例について説明する。選定部１５０ｇは、取得したファシリテート力評価情報１４０ｆに基づいて、評価値が最も高い発話者のレーティングを１上げるよう参加者レーティング情報１４０ｇを更新して記憶部１４０に記憶させる。そして、選定部１５０ｇは、ファシリテーターとしてレーティングが最も高い発話者を選定する。 An example of the "selection process" executed by the selection unit 150g will be described. Based on the acquired facilitating ability evaluation information 140f, the selection unit 150g updates the participant rating information 140g so as to raise the rating of the speaker with the highest evaluation value by 1, and stores it in the storage unit 140. Then, the selection unit 150g selects the speaker with the highest rating as a facilitator.
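The rating update and selection can be sketched as follows. The dictionary layout, the starting rating of 0 for unseen speakers, and the tie-breaking behaviour (the first speaker encountered wins) are assumptions for illustration only.

```python
def update_ratings(ratings, evaluation_values):
    """Return a copy of ratings in which the speaker with the highest
    evaluation value has had their rating raised by 1."""
    updated = dict(ratings)
    if evaluation_values:
        best = max(evaluation_values, key=evaluation_values.get)
        updated[best] = updated.get(best, 0) + 1  # assumption: unseen speakers start at 0
    return updated


def select_facilitator(ratings):
    """Return the speaker with the highest rating, or None if there are no ratings."""
    return max(ratings, key=ratings.get) if ratings else None
```

Keeping the rating separate from the per-session evaluation value lets the rating accumulate over repeated sessions, which appears to be the intent of incrementing it by 1 each time.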

次に、本実施例１に係る選定装置１００の処理手順の一例について説明する。図９は、本実施例１に係る選定装置の処理手順を示すフローチャートである。図９に示すように、選定装置１００は、事前準備として交流会等の参加者である発話者Ａ～Ｅの音声データを取得し、取得した音響データを解析して各発話者の音響特徴を算出する（ステップＳ１０１）。この事前準備には、過去に行われた交流会等の音声データを用いてもよいし、発話者Ａ～Ｅを選定装置１００に登録する際に取得した音声データを用いてもよいし、交流会等の冒頭における自己紹介や雑談の際に取得した音声データを用いてもよい。 Next, an example of the processing procedure of the selection device 100 according to the first embodiment will be described. FIG. 9 is a flowchart showing a processing procedure of the selection apparatus according to the first embodiment. As shown in FIG. 9, the selection device 100 acquires the voice data of the speakers A to E who are participants of the exchange meeting or the like as a preliminary preparation, analyzes the acquired acoustic data, and determines the acoustic characteristics of each speaker. Calculate (step S101). For this advance preparation, voice data of an exchange meeting or the like held in the past may be used, or voice data acquired when the speakers A to E are registered in the selection device 100 may be used, or exchange may be performed. The voice data acquired at the time of self-introduction or chat at the beginning of a meeting or the like may be used.

続いて、選定装置１００の取得部１５０ａは、複数の発話者Ａ～Ｅの音声を含む音声情報を取得し、音声バッファ１４０ａに格納する（ステップＳ１０２）。 Subsequently, the acquisition unit 150a of the selection device 100 acquires voice information including the voices of the plurality of speakers A to E and stores the voice information in the voice buffer 140a (step S102).

その後、選定装置１００の発話情報検出部１５０ｂは、音声情報から発話情報１４０ｃを検出する（ステップＳ１０３）。図１０は、発話情報を検出する処理手順を示すサブルーチンである。図１０に示すように、発話情報検出部１５０ｂは、取得した音声情報から発話区間を検出する（ステップＳ１３０１）。続いて、発話情報検出部１５０ｂは、各発話区間に含まれるフレームごとに、音響特徴を算出する（ステップＳ１３０２）。さらに、発話情報検出部１５０ｂは、算出した評価対象音響特徴とステップＳ１０１において算出した学習音響特徴との類似度を算出し、発話者を特定する（ステップＳ１３０３）。そして、発話情報検出部１５０ｂは、発話区間と発話者とを対応付けた発話情報１４０ｃを記憶部１４０に記憶させる（ステップＳ１３０４）。その後、発話情報検出部１５０ｂは、全ての音声情報から発話区間を検出したか否かを判定する（ステップＳ１３０５）。発話情報検出部１５０ｂが、全ての音声情報から発話区間を検出していないと判定した場合（ステップＳ１３０５：Ｎｏ）、ステップＳ１３０１に戻り処理を繰り返す。一方、発話情報検出部１５０ｂが、全ての音声情報から発話区間を検出したと判定した場合（ステップＳ１３０５：Ｙｅｓ）。このサブルーチンを終了する。 After that, the utterance information detection unit 150b of the selection device 100 detects the utterance information 140c from the voice information (step S103). FIG. 10 is a subroutine showing a processing procedure for detecting utterance information. As shown in FIG. 10, the utterance information detection unit 150b detects the utterance section from the acquired voice information (step S1301). Subsequently, the utterance information detection unit 150b calculates the acoustic characteristics for each frame included in each utterance section (step S1302). Further, the utterance information detection unit 150b calculates the similarity between the calculated evaluation target acoustic feature and the learning acoustic feature calculated in step S101, and identifies the speaker (step S1303). Then, the utterance information detection unit 150b stores the utterance information 140c in which the utterance section and the speaker are associated with each other in the storage unit 140 (step S1304). After that, the utterance information detection unit 150b determines whether or not the utterance section is detected from all the voice information (step S1305). When the utterance information detection unit 150b determines that the utterance section has not been detected from all the voice information (step S1305: No), the process returns to step S1301 and the process is repeated. 
On the other hand, when the utterance information detection unit 150b determines that utterance sections have been detected from all the voice information (step S1305: Yes), this subroutine ends.

図９に戻り、選定装置１００の音声認識部１５０ｃは、音声情報に対して音声認識を行い、単語を抽出する（ステップＳ１０４）。 Returning to FIG. 9, the voice recognition unit 150c of the selection device 100 performs voice recognition on the voice information and extracts a word (step S104).

また、選定装置１００の発話印象評価部１５０ｄは、音声情報に対する参加者の印象を評価する（ステップＳ１０５）。 Further, the utterance impression evaluation unit 150d of the selection device 100 evaluates the participant's impression of the voice information (step S105).

Subsequently, the specifying unit 150e of the selection device 100 identifies parrot returns (step S106). FIG. 11 shows a subroutine for the processing procedure that identifies parrot returns. As shown in FIG. 11, the specifying unit 150e sets the speaker of the chronologically first utterance information as the last speaker (step S1601). Subsequently, the specifying unit 150e determines whether the speaker of the chronologically next utterance information matches the last speaker (step S1602).

When the specifying unit 150e determines that the speaker of the chronologically next utterance information matches the last speaker (step S1602: Yes), the process returns to step S1601. On the other hand, when the specifying unit 150e determines that the speaker of the chronologically next utterance information does not match the last speaker (step S1602: No), the specifying unit 150e sets the speaker of that next utterance information as the last speaker (step S1603).

After that, the specifying unit 150e determines whether a word matches between the utterance information whose speaker is now set as the last speaker and the immediately preceding utterance information (step S1604). When the specifying unit 150e determines that a word matches (step S1604: Yes), it identifies the occurrence of a parrot return and stores the parrot return generation information 140e in the storage unit 140 (step S1605). On the other hand, when the specifying unit 150e determines that no word matches (step S1604: No), the process proceeds to step S1606.

In step S1606, the specifying unit 150e determines whether the utterance section of the last speaker is the last one in the time series. When the specifying unit 150e determines that the utterance section of the last speaker is not the last one in the time series (step S1606: No), the process returns to step S1602 and is repeated. On the other hand, when the specifying unit 150e determines that the utterance section of the last speaker is the last one in the time series (step S1606: Yes), this subroutine ends.

Returning to FIG. 9, the determination unit 150f of the selection device 100 determines whether the parrot return has improved the impression of the speakers A to E (step S107).

Subsequently, the selection unit 150g of the selection device 100 rates the participants based on the determination result of the determination unit 150f, and stores the participant rating information 140g in the storage unit 140 (step S108). Then, the selection unit 150g selects the speaker with the highest rating as the facilitator based on the participant rating information 140g (step S109), and the series of processes ends.

Next, the effects of the selection device 100 according to the first embodiment will be described. The selection device 100 identifies parrot returns and selects, as the facilitator, a participant whose parrot returns improved the other participants' impressions. A key part of high facilitating ability is the ability to listen to others, and people who listen well are thought to use parrot returns frequently in conversation. Therefore, selecting as the facilitator a participant whose parrot returns improve the participants' impressions makes it possible to select a participant with high facilitating ability, which can liven up an exchange meeting or similar event.

選定装置１００は、オウム返しを特定するだけでなく、印象がよくなったか否かを判定してファシリテーターを選定する。これによって、オウム返しのみを特定する場合よりも精度よく、ファシリテート力が高い参加者をファシリテーターに選定することができる。 The selection device 100 not only identifies the parrot return, but also determines whether or not the impression has improved and selects the facilitator. This makes it possible to select participants with high facilitating ability as facilitators with higher accuracy than when specifying only parrot return.

次に、本実施例２に係る検出装置について説明する。本実施例２に係るシステムは、実施例１の図３で説明したシステムと同様にして、マイク端末１０に無線によって接続されているものとする。本実施例２においても、マイク端末１０は、発話者Ａ～Ｅの音声を収録して音声情報を出力する。 Next, the detection device according to the second embodiment will be described. It is assumed that the system according to the second embodiment is wirelessly connected to the microphone terminal 10 in the same manner as the system described with reference to FIG. 3 of the first embodiment. Also in the second embodiment, the microphone terminal 10 records the voices of the speakers A to E and outputs the voice information.

The selection device according to the second embodiment acquires voice information from the microphone terminal 10 and selects, as the facilitator, the participant among the speakers A to E who is determined to have high facilitating ability.

図１２は、本実施例２に係る選定装置の構成を示す機能ブロック図である。図１２に示すように、この選定装置２００は、通信部２１０と、入力部２２０と、表示部２３０と、記憶部２４０と、制御部２５０とを有する。 FIG. 12 is a functional block diagram showing the configuration of the selection device according to the second embodiment. As shown in FIG. 12, the selection device 200 includes a communication unit 210, an input unit 220, a display unit 230, a storage unit 240, and a control unit 250.

通信部２１０は、無線によって、マイク端末１０とデータ通信を実行する処理部である。通信部２１０は、通信装置の一例である。通信部２１０は、マイク端末１０から音声情報を受信し、受信した音声情報を、制御部２５０に出力する。なお、選定装置２００は、有線によって、マイク端末１０に接続してもよい。選定装置２００は、通信部２１０によってネットワークに接続し、外部装置（図示略）とデータを送受信してもよい。 The communication unit 210 is a processing unit that wirelessly executes data communication with the microphone terminal 10. The communication unit 210 is an example of a communication device. The communication unit 210 receives voice information from the microphone terminal 10 and outputs the received voice information to the control unit 250. The selection device 200 may be connected to the microphone terminal 10 by wire. The selection device 200 may be connected to the network by the communication unit 210 to transmit / receive data to / from an external device (not shown).

入力部２２０は、選定装置２００に各種の情報を入力するための入力装置である。入力部２２０は、キーボードやマウス、タッチパネル等に対応する。 The input unit 220 is an input device for inputting various information to the selection device 200. The input unit 220 corresponds to a keyboard, a mouse, a touch panel, and the like.

表示部２３０は、制御部２５０から出力される情報を表示する表示装置である。表示部２３０は、液晶ディスプレイやタッチパネル等に対応する。 The display unit 230 is a display device that displays information output from the control unit 250. The display unit 230 corresponds to a liquid crystal display, a touch panel, or the like.

The storage unit 240 has a voice buffer 240a, learning acoustic feature information 240b, utterance information 240c, utterance impression evaluation information 240d, parrot return generation information 240e, facilitating ability evaluation information 240f, participant rating information 240g, and praise word specific information 240h. The storage unit 240 corresponds to a semiconductor memory element such as a RAM or a flash memory, or to a storage device such as an HDD.

音声バッファ２４０ａは、マイク端末１０から送信される音声情報を格納するバッファである。音声情報では、音声信号と時刻とが対応付けられる。 The voice buffer 240a is a buffer for storing voice information transmitted from the microphone terminal 10. In the voice information, the voice signal and the time are associated with each other.

学習音響特徴情報２４０ｂは、予め学習される発話者Ａ～Ｅそれぞれの音声の音響特徴の情報である。音響特徴には、ピッチ周波数、フレームパワー、フォルマント周波数、音声到来方向が含まれる。たとえば、学習音響特徴情報２４０ｂは、ピッチ周波数、フレームパワー、フォルマント周波数、音声到来方向の値をそれぞれ要素とするベクトルである。 The learning acoustic feature information 240b is information on the acoustic features of the voices of the speakers A to E that are learned in advance. Acoustic features include pitch frequency, frame power, formant frequency, and voice arrival direction. For example, the learning acoustic feature information 240b is a vector having values of pitch frequency, frame power, formant frequency, and voice arrival direction as elements.

The utterance information 240c is information that associates each utterance section in which an utterance contained in the participants' voice information was made with the speaker who made the utterance in that section.

発話印象評価情報２４０ｄは、音声情報に対する参加者の印象を評価した情報である。 The utterance impression evaluation information 240d is information that evaluates the participant's impression of the voice information.

オウム返し発生情報２４０ｅは、抽出した単語が一致し、かつ発話者が異なるオウム返しの発生を示す情報である。 The parrot return generation information 240e is information indicating the occurrence of parrot return in which the extracted words match and the speakers are different.

ファシリテート力評価情報２４０ｆは、参加者のファシリテート力を評価した情報である。 The facilitating ability evaluation information 240f is information for evaluating the facilitating ability of the participants.

参加者レーティング情報２４０ｇは、参加者のレーティング（格付け）を行った情報である。 Participant rating information 240g is information obtained by rating participants.

The praise word specific information 240h is information indicating that a pre-registered praise word has been identified. It stores each identified praise word in association with the time at which the praise word was uttered.

The control unit 250 has an acquisition unit 250a, an utterance information detection unit 250b, a voice recognition unit 250c, an utterance impression evaluation unit 250d, a specifying unit 250e, a determination unit 250f, a selection unit 250g, and a praise word specifying unit 250h. The control unit 250 is realized by a CPU or an MPU, or by hard-wired logic such as an ASIC or an FPGA.

取得部２５０ａは、通信部２１０を介して、マイク端末１０から音声情報を取得する処理部である。取得部２５０ａは、音声情報を順次、音声バッファ２４０ａに格納する。 The acquisition unit 250a is a processing unit that acquires voice information from the microphone terminal 10 via the communication unit 210. The acquisition unit 250a sequentially stores the voice information in the voice buffer 240a.

発話情報検出部２５０ｂは、音声バッファ２４０ａから音声情報を取得し、音声情報から図４に示す発話情報２４０ｃを検出する処理部である。発話情報検出部２５０ｂは、発話区間検出処理、音響解析処理、類似性評価処理を行う。 The utterance information detection unit 250b is a processing unit that acquires voice information from the voice buffer 240a and detects the utterance information 240c shown in FIG. 4 from the voice information. The utterance information detection unit 250b performs utterance section detection processing, acoustic analysis processing, and similarity evaluation processing.

The utterance section detection process, the acoustic analysis process, and the similarity evaluation process executed by the utterance information detection unit 250b are the same as those of the utterance information detection unit 150b described in the first embodiment.

音声認識部２５０ｃは、音声情報を取得し、音声情報に対して音声認識を行い図４に示す発話内容文字列を生成する処理部である。音声認識部２５０ｃは、音声認識により生成した文字列を各発話区間と対応付けて、発話内容文字列として発話情報２４０ｃに含めて記憶部２４０に記憶させる。また、音声認識部２５０ｃは、生成した文字列から単語を抽出する処理部である。音声認識部２５０ｃは、発話内容文字列に含まれる単語を抽出する。 The voice recognition unit 250c is a processing unit that acquires voice information, performs voice recognition on the voice information, and generates an utterance content character string shown in FIG. The voice recognition unit 250c associates the character string generated by voice recognition with each utterance section, includes it in the utterance information 240c as an utterance content character string, and stores it in the storage unit 240. Further, the voice recognition unit 250c is a processing unit that extracts a word from the generated character string. The voice recognition unit 250c extracts words included in the utterance content character string.

発話印象評価部２５０ｄは、音声情報を取得し、音声情報における音声信号のピッチ周波数の上下幅に基づいて、音声情報に対する参加者の印象を評価する処理部である。発話印象評価部２５０ｄは、「発話印象評価処理」を行う。 The utterance impression evaluation unit 250d is a processing unit that acquires voice information and evaluates the participant's impression of the voice information based on the vertical width of the pitch frequency of the voice signal in the voice information. The utterance impression evaluation unit 250d performs the "utterance impression evaluation process".

発話印象評価部２５０ｄが実行する発話印象評価処理は、実施例１で説明した発話印象評価部１５０ｄと同様である。 The utterance impression evaluation process executed by the utterance impression evaluation unit 250d is the same as that of the utterance impression evaluation unit 150d described in the first embodiment.

The specifying unit 250e is a processing unit that acquires the utterance information 240c and identifies parrot returns from the utterance content character strings. The specifying unit 250e executes the parrot return specifying process.

The parrot return specifying process executed by the specifying unit 250e is the same as that of the specifying unit 150e described in the first embodiment.

The determination unit 250f is a processing unit that acquires the utterance impression evaluation information 240d, the parrot return generation information 240e, and the praise word specific information 240h, and determines whether the parrot return has improved the impression. The determination unit 250f executes the determination process.

An example of the "determination process" executed by the determination unit 250f will be described. The determination unit 250f selects one parrot return from the acquired parrot return generation information 240e and identifies the utterance section in which it occurred. It determines that the impression has improved when, in the utterance section by a different speaker that immediately follows the selected parrot return, the impression evaluation value in the acquired utterance impression evaluation information 240d has risen or a praise word has been identified. Alternatively, the determination unit 250f may determine that the impression has improved only when, in that utterance section, the impression evaluation value has risen and a praise word has also been identified.
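The two variants of this determination differ only in how the two conditions are combined, as the following sketch shows. The function name and the `require_both` flag are hypothetical.

```python
def impression_improved(value_rose, praise_found, require_both=False):
    """Decide whether a parrot return improved the impression.

    value_rose:   the impression evaluation value rose in the following section
    praise_found: a praise word was identified in the following section
    require_both: False gives the OR variant, True gives the stricter AND variant
    """
    if require_both:
        return value_rose and praise_found
    return value_rose or praise_found
```

The OR variant credits more parrot returns, while the AND variant credits only those with both a prosodic and a lexical sign of a good reaction.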

The selection unit 250g is a processing unit that acquires the facilitating ability evaluation information 240f and selects a facilitator based on the determination result of the determination unit 250f. The selection unit 250g executes the selection process.

選定部２５０ｇが実行する選定処理は、実施例１で説明した選定部１５０ｇと同様である。 The selection process executed by the selection unit 250g is the same as that of the selection unit 150g described in the first embodiment.

The praise word specifying unit 250h is a processing unit that acquires the utterance information 240c and identifies, from the extracted words, praise words directed at others. The praise word specifying unit 250h executes the praise word specifying process.

An example of the "praise word specifying process" executed by the praise word specifying unit 250h will be described. The praise word specifying unit 250h acquires the utterance information 240c and, from the words included in the utterance content character strings, identifies words that match pre-registered praise words. A praise word is, for example, a word such as "すごい" ("wow"). The praise word specifying unit 250h then stores each identified praise word in the storage unit 240 as the praise word specific information 240h, in association with the time at which it was uttered.
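A dictionary lookup of this kind can be sketched as below. Only "すごい" is named in the description; the other entries in `PRAISE_WORDS`, the tuple layout of the input, and the use of the utterance start time as the time of utterance are assumptions for illustration.

```python
# Hypothetical registered list; the description names only "すごい" as an example.
PRAISE_WORDS = frozenset({"すごい", "wow", "great"})


def find_praise(utterances, praise_words=PRAISE_WORDS):
    """utterances: iterable of (start_time, speaker, words) tuples.

    Returns (praise_word, start_time) pairs, i.e. each identified praise word
    associated with the time at which it was uttered.
    """
    hits = []
    for start_time, _speaker, words in utterances:
        for word in words:
            if word in praise_words:
                hits.append((word, start_time))
    return hits
```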

Next, an example of the processing procedure of the selection device 200 according to the second embodiment will be described. FIG. 13 is a flowchart showing the processing procedure of the selection device according to the second embodiment. As shown in FIG. 13, as advance preparation, the selection device 200 acquires voice data of the speakers A to E, who are participants in an exchange meeting or the like, analyzes the acquired voice data, and calculates the acoustic features of each speaker (step S201).

Subsequently, the acquisition unit 250a of the selection device 200 acquires voice information including the voices of a plurality of speakers and stores it in the voice buffer 240a (step S202).

After that, the utterance information detection unit 250b of the selection device 200 detects the utterance information 240c from the voice information (step S203).

The voice recognition unit 250c of the selection device 200 performs voice recognition on the voice information and extracts words (step S204).

The utterance impression evaluation unit 250d of the selection device 200 evaluates the participants' impressions of the voice information (step S205).

Further, the praise word identification unit 250h of the selection device 200 identifies praise words among the words included in the utterance content character string (step S206).

Subsequently, the identification unit 250e of the selection device 200 identifies parroting, that is, an utterance that repeats a word from the immediately preceding utterance by a different speaker (step S207).

The determination unit 250f of the selection device 200 determines whether or not the parroting improved the impressions of the speakers A to E (step S208).

Subsequently, the selection unit 250g of the selection device 200 rates the participants based on the determination result of the determination unit 250f and stores the participant rating information 240g in the storage unit 240 (step S209). The selection unit 250g then selects the speaker with the highest rating as the facilitator based on the participant rating information 240g (step S210), and the series of processes ends.
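Steps S207 to S210 can be sketched end to end as follows. This is only an illustration under stated assumptions: the `Utterance` record, the representation of the impression evaluation as timestamped scores, the 5-second comparison window, and the "one point per parroting that improved the impression" rating rule are all hypothetical and are not taken from the flowchart.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Utterance:
    """Hypothetical entry of the utterance information."""
    speaker: str
    start: float
    end: float
    words: List[str]


def find_parroting(utterances: List[Utterance]) -> List[Utterance]:
    """S207: utterances that share a word with the chronologically
    preceding utterance made by a different speaker."""
    ordered = sorted(utterances, key=lambda u: u.start)
    return [curr for prev, curr in zip(ordered, ordered[1:])
            if prev.speaker != curr.speaker
            and set(prev.words) & set(curr.words)]


def mean_score(samples: List[Tuple[float, float]],
               t_from: float, t_to: float) -> Optional[float]:
    """Mean impression score over [t_from, t_to), or None if no samples."""
    values = [score for t, score in samples if t_from <= t < t_to]
    return sum(values) / len(values) if values else None


def impression_improved(echo: Utterance,
                        samples: List[Tuple[float, float]],
                        window: float = 5.0) -> bool:
    """S208: compare the mean score just before and just after the echo.
    The window length is an assumed parameter."""
    before = mean_score(samples, echo.start - window, echo.start)
    after = mean_score(samples, echo.end, echo.end + window)
    return before is not None and after is not None and after > before


def select_facilitator(utterances: List[Utterance],
                       samples: List[Tuple[float, float]],
                       window: float = 5.0
                       ) -> Tuple[Optional[str], Dict[str, int]]:
    """S209-S210: rate each speaker and pick the highest-rated one."""
    ratings: Dict[str, int] = defaultdict(int)
    for utt in utterances:
        ratings[utt.speaker] += 0  # make every speaker appear in the ratings
    for echo in find_parroting(utterances):
        if impression_improved(echo, samples, window):
            ratings[echo.speaker] += 1  # assumed rule: one point per success
    if not ratings:
        return None, {}
    best = max(ratings, key=lambda s: ratings[s])
    return best, dict(ratings)
```

In this sketch a tie is resolved in favor of the speaker who appears first in the input, which is an arbitrary choice; the description does not say how ties are handled.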

Next, the effect of the selection device 200 according to the second embodiment will be described. The selection device 200 identifies parroting, determines whether the parroting improved the participants' impressions or whether a praise word was uttered together with the parroting, and selects a facilitator accordingly. As a result, a participant who can liven up the exchange not only by parroting but also by using praise words can be selected as the facilitator, which in turn helps to liven up an exchange meeting or the like.
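The determination described above, in which a parroting counts either because it improved the impression or because a praise word was uttered together with it, can be sketched as follows. The record types and the interpretation of "together with" (the same speaker utters a praise word within the parroting utterance's time span) are assumptions for illustration, not details given in the description.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PraiseEvent:
    """Hypothetical entry of the praise word identification information."""
    word: str
    time: float
    speaker: str


@dataclass
class ParrotingEvent:
    """Hypothetical record of one identified parroting utterance."""
    speaker: str
    start: float
    end: float
    impression_improved: bool  # result of the before/after comparison


def praise_accompanies(echo: ParrotingEvent,
                       praise_events: List[PraiseEvent]) -> bool:
    """True if the parroting speaker uttered a praise word during the
    parroting utterance (assumed meaning of "together with")."""
    return any(p.speaker == echo.speaker and echo.start <= p.time <= echo.end
               for p in praise_events)


def counts_toward_rating(echo: ParrotingEvent,
                         praise_events: List[PraiseEvent]) -> bool:
    """A parroting counts if it improved the impression OR came with praise."""
    return echo.impression_improved or praise_accompanies(echo, praise_events)
```

Treating the two conditions as a simple logical OR follows the wording of the paragraph; a weighted combination would be an equally plausible design, but the description does not specify one.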

The words, utterance examples, number of speakers, situations, and the like used in the above embodiments are merely examples and can be changed arbitrarily. For example, parroting was used as the example trigger for evaluating a change in the participants' impressions, but the evaluation is not limited to this; it can also be performed at the timing at which a predetermined specific word (for example, a praise word or a keyword) is uttered.

The processing procedures, control procedures, specific names, and information including various data and parameters shown in the above description and drawings can be changed arbitrarily unless otherwise specified.

Further, each component of each illustrated device is functionally conceptual and does not necessarily have to be physically configured as illustrated. That is, the specific form of distribution or integration of the devices is not limited to the one illustrated. All or part of each device can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like.

Further, all or any part of each processing function performed by each device may be realized by a CPU and a program analyzed and executed by the CPU, or may be realized as hardware using wired logic.

Next, an example of the hardware configuration of a computer that realizes the same functions as the selection device 100 (200) shown in the above embodiments will be described. FIG. 14 is a diagram showing an example of the hardware configuration of a computer that realizes the same functions as the selection device.

As shown in FIG. 14, the computer 300 has a CPU 301 that executes various arithmetic processes, an input device 302 that receives data input from a user, and a display 303. The computer 300 also has a reading device 304 that reads programs and the like from a storage medium, and an interface device 305 that acquires data from a microphone, a camera, a vibration sensor, and the like via a wired or wireless network. The computer 300 further has a RAM 306 that temporarily stores various information, and a hard disk device 307. The devices 301 to 307 are connected to a bus 308.

The hard disk device 307 holds an acquisition program 307a, an utterance information detection program 307b, a voice recognition program 307c, an utterance impression evaluation program 307d, an identification program 307e, a determination program 307f, and a selection program 307g. The CPU 301 reads out the acquisition program 307a, the utterance information detection program 307b, the voice recognition program 307c, the utterance impression evaluation program 307d, the identification program 307e, the determination program 307f, and the selection program 307g (and, in the case of the selection device 200, a praise word identification program as well) and loads them into the RAM 306.

The acquisition program 307a functions as an acquisition process 306a. The utterance information detection program 307b functions as an utterance information detection process 306b. The voice recognition program 307c functions as a voice recognition process 306c. The utterance impression evaluation program 307d functions as an utterance impression evaluation process 306d. The identification program 307e functions as an identification process 306e. The determination program 307f functions as a determination process 306f. The selection program 307g functions as a selection process 306g. In the selection device 200, the praise word identification program functions as a praise word identification process.

The processing of the acquisition process 306a corresponds to the processing of the acquisition units 150a and 250a. The processing of the utterance information detection process 306b corresponds to the processing of the utterance information detection units 150b and 250b. The processing of the voice recognition process 306c corresponds to the processing of the voice recognition units 150c and 250c. The processing of the utterance impression evaluation process 306d corresponds to the processing of the utterance impression evaluation units 150d and 250d. The processing of the identification process 306e corresponds to the processing of the identification units 150e and 250e. The processing of the determination process 306f corresponds to the processing of the determination units 150f and 250f. The processing of the selection process 306g corresponds to the processing of the selection units 150g and 250g. In the selection device 200, the processing of the praise word identification process corresponds to the processing of the praise word identification unit 250h.

Note that the programs 307a to 307g do not necessarily have to be stored in the hard disk device 307 from the beginning. For example, each program may be stored on a "portable physical medium" inserted into the computer 300, such as a flexible disk (FD), a CD-ROM, a DVD, a magneto-optical disk, or an IC card, and the computer 300 may read each of the programs 307a to 307g from the medium and execute it.

The following supplementary notes are further disclosed with respect to the embodiments including each of the above examples.

(Supplementary Note 1) A selection program that causes a computer to execute a process comprising: acquiring voice information that includes the voices of a plurality of speakers; detecting utterance information in which an utterance section, in which an utterance included in the voice information was made, is associated with the speaker who made the utterance in that utterance section; performing voice recognition on the voice information and extracting words included in the voice information; evaluating impressions of the plurality of speakers before and after the utterance information that includes a specific word included in the voice information; and selecting a facilitator based on the evaluation of the impressions of the plurality of speakers.

(Supplementary Note 2) The selection program according to Supplementary Note 1, further causing the computer to execute a process comprising: identifying parroting, in which the words match and the speakers differ in pieces of the utterance information that are adjacent in chronological order; determining whether or not the parroting improved the impressions; and selecting the facilitator based on the result of the determination.

(Supplementary Note 3) The selection program according to Supplementary Note 1 or 2, wherein the evaluating includes evaluating the impressions of the plurality of speakers based on biological information of the plurality of speakers.

(Supplementary Note 4) The selection program according to any one of Supplementary Notes 1 to 3, wherein the evaluating includes evaluating the impressions of the plurality of speakers based on the vertical width (range) of the pitch frequency of the voice signal included in the voice information.

(Supplementary Note 5) The selection program according to Supplementary Note 2, further causing the computer to identify, from the words, praise words directed at others, wherein the determining includes determining, using the praise words, whether or not the parroting improved the impressions.

(Supplementary Note 6) A selection method in which a computer executes a process comprising: acquiring voice information that includes the voices of a plurality of speakers; detecting utterance information in which an utterance section, in which an utterance included in the voice information was made, is associated with the speaker who made the utterance in that utterance section; performing voice recognition on the voice information and extracting words included in the voice information; evaluating impressions of the plurality of speakers before and after the utterance information that includes a specific word included in the voice information; and selecting a facilitator based on the evaluation of the impressions of the plurality of speakers.

(Supplementary Note 7) A selection device comprising: an acquisition unit that acquires voice information that includes the voices of a plurality of speakers; an utterance information detection unit that detects utterance information in which an utterance section, in which an utterance included in the voice information was made, is associated with the speaker who made the utterance in that utterance section; a voice recognition unit that performs voice recognition on the voice information and extracts words included in the voice information; an utterance impression evaluation unit that evaluates impressions of the plurality of speakers before and after the utterance information that includes a specific word included in the voice information; and a selection unit that selects a facilitator based on the evaluation of the impressions of the plurality of speakers.

100, 200 Selection device
110, 210 Communication unit
120, 220 Input unit
130, 230 Display unit
140, 240 Storage unit
140a, 240a Voice buffer
140b, 240b Learned acoustic feature information
140c, 240c Utterance information
140d, 240d Utterance impression evaluation information
140e, 240e Parroting identification information
140f, 240f Facilitation ability evaluation information
140g, 240g Participant rating information
240h Praise word identification information
150, 250 Control unit
150a, 250a Acquisition unit
150b, 250b Utterance information detection unit
150c, 250c Voice recognition unit
150d, 250d Utterance impression evaluation unit
150e, 250e Identification unit
150f, 250f Determination unit
150g, 250g Selection unit
250h Praise word identification unit

Claims

A selection program that causes a computer to execute a process comprising:
acquiring voice information that includes the voices of a plurality of speakers;
detecting utterance information in which an utterance section, in which an utterance included in the voice information was made, is associated with the speaker who made the utterance in that utterance section;
performing voice recognition on the voice information and extracting words included in the voice information;
evaluating impressions of the plurality of speakers before and after the utterance information that includes a specific word included in the voice information; and
selecting a facilitator based on the evaluation of the impressions of the plurality of speakers.

The selection program according to claim 1, further causing the computer to execute a process comprising:
identifying parroting, in which the words match and the speakers differ in pieces of the utterance information that are adjacent in chronological order;
determining whether or not the parroting improved the impressions; and
selecting the facilitator based on the result of the determination.

The selection program according to claim 1 or 2, wherein the evaluating includes evaluating the impressions of the plurality of speakers based on biological information of the plurality of speakers.

The selection program according to any one of claims 1 to 3, wherein the evaluating includes evaluating the impressions of the plurality of speakers based on the vertical width (range) of the pitch frequency of the voice signal included in the voice information.

The selection program according to claim 2, further causing the computer to identify, from the words, praise words directed at others,
wherein the determining includes determining, using the praise words, whether or not the parroting improved the impressions.

A selection method in which a computer executes a process comprising:
acquiring voice information that includes the voices of a plurality of speakers;
detecting utterance information in which an utterance section, in which an utterance included in the voice information was made, is associated with the speaker who made the utterance in that utterance section;
performing voice recognition on the voice information and extracting words included in the voice information;
evaluating impressions of the plurality of speakers before and after the utterance information that includes a specific word included in the voice information; and
selecting a facilitator based on the evaluation of the impressions of the plurality of speakers.

A selection device comprising:
an acquisition unit that acquires voice information that includes the voices of a plurality of speakers;
an utterance information detection unit that detects utterance information in which an utterance section, in which an utterance included in the voice information was made, is associated with the speaker who made the utterance in that utterance section;
a voice recognition unit that performs voice recognition on the voice information and extracts words included in the voice information;
an utterance impression evaluation unit that evaluates impressions of the plurality of speakers before and after the utterance information that includes a specific word included in the voice information; and
a selection unit that selects a facilitator based on the evaluation of the impressions of the plurality of speakers.