JP2022139436A

JP2022139436A - Conference support device, conference support system, conference support method, and program

Info

Publication number: JP2022139436A
Application number: JP2021039820A
Authority: JP
Inventors: 威一郎横尾; Iichiro Yokoo
Original assignee: NEC Platforms Ltd
Current assignee: NEC Platforms Ltd
Priority date: 2021-03-12
Filing date: 2021-03-12
Publication date: 2022-09-26

Abstract

To provide a conference support device, a conference support system, a conference support method, and a program that support the progress of an online conference. SOLUTION: In a conference support system 1, each of the conference support devices 10 to 30, which support the progress of a conference, includes: an emotion analysis unit that detects an emotional expression of a conference attendee; a prediction unit that predicts, on the basis of the emotional expression, whether the attendee intends to speak; and a question unit that asks the attendee for his or her opinion if the attendee is predicted to intend to speak. SELECTED DRAWING: Figure 1

Description

The present invention relates to a conference support device, a conference support system, a conference support method, and a program, and more particularly to a conference support device, a conference support system, a conference support method, and a program that support the progress of an online conference.

In recent years, working styles in which employees work individually, such as telework, remote work, and working from home, have spread rapidly. Accordingly, instead of gathering in conference rooms, employees increasingly hold online conferences (also called web meetings or remote meetings) over networks such as the Internet. For online conferences, an online conference support system (hereinafter simply referred to as a conference support system) is used that allows attendees to hold video calls using terminals equipped with microphones and cameras.

Patent Literature 1 discloses a conference support system for making online conferences more efficient and comfortable. Specifically, Patent Literature 1 describes face authentication of an attendee entering an online conference: instead of the attendee entering a user ID and password, face image data registered in advance is matched against face image data acquired from the attendee's terminal. Patent Literature 1 further describes generating an avatar that represents the attendee's alter ego in a virtual space and adding to the avatar an emotion icon that indicates the attendee's emotion or state.

Patent Literature 1: JP 2019-061594 A

Discussions can often stall, for example when attendees speak over each other or when no one speaks for a long time. In online conferences in particular, attendees find it hard to judge when to speak. There is therefore a demand for a conference support system that can facilitate an online conference so that it proceeds smoothly.

SUMMARY OF THE INVENTION
The present invention has been made in view of the above problem, and an object thereof is to support the progress of a conference.

To solve the above problem, a conference support device according to one aspect of the present invention includes: emotion analysis means for detecting an emotional expression of a conference attendee; prediction means for predicting, on the basis of the emotional expression, whether the attendee intends to speak; and questioning means for asking the attendee for an opinion when the attendee is predicted to intend to speak.

A conference support method according to one aspect of the present invention detects an emotional expression of a conference attendee, predicts on the basis of the emotional expression whether the attendee intends to speak, and, when the attendee is predicted to intend to speak, asks the attendee for an opinion.

A program according to one aspect of the present invention causes a computer to execute: a process of detecting an emotional expression of a conference attendee; a process of predicting, on the basis of the emotional expression, whether the attendee intends to speak; and a process of asking the attendee for an opinion when the attendee is predicted to intend to speak.

According to the above aspects, the progress of a conference can be supported.

FIG. 1 is a diagram schematically showing an example of the configuration of a conference support system to which the conference support devices according to Embodiments 1 to 3 can be applied.
FIG. 2 is a block diagram showing the configuration of the conference support device according to Embodiment 1 or 2.
FIG. 3 is a flowchart showing the operation of the conference support device according to Embodiment 1.
FIG. 4 is a flowchart showing the operation of the conference support device according to Embodiment 2.
FIG. 5 is a block diagram showing the configuration of the conference support device according to Embodiment 3.
FIG. 6 is a flowchart showing the operation of the conference support device according to Embodiment 3.
FIG. 7 shows an example of a database for retraining the predictor.
FIG. 8 is a block diagram showing the hardware configuration of the conference support devices according to Embodiments 1 to 3.

(Conference support system 1)
FIG. 1 is a diagram schematically showing an example of the configuration of a conference support system 1 to which any one of the conference support devices 10, 20, and 30 according to Embodiments 1 to 3 described later (hereinafter referred to as "conference support device 10 (20, 30)") can be applied. As shown in FIG. 1, the conference support system 1 includes the conference support device 10 (20, 30) and terminals 100X, 100Y, and 100Z. The conference support system 1 further includes a biometric authentication device 200, a speech recognition device 300, and an avatar generation device 400. Note that the biometric authentication device 200, the speech recognition device 300, and the avatar generation device 400 are not essential components of the conference support system 1; the conference support system 1 may omit some or all of them.

In the conference support system 1, the terminals 100X, 100Y, and 100Z of attendees x, y, and z are each connected to the conference support device 10 (20, 30) via a wide area network such as the Internet. The conference support device 10 (20, 30) is connected to the biometric authentication device 200, the speech recognition device 300, and the avatar generation device 400. These three devices may be configured in the same cloud server as the conference support device 10 (20, 30) or in a different cloud server.

Each attendee registers face image data showing his or her own face in the conference support system 1. Each attendee also registers data for an avatar, which serves as his or her alter ego in the online conference. For example, attendee x selects his or her own portrait as the avatar, attendee y selects a favorite character, and attendee z selects a caricature. During the online conference, the avatar registered by each attendee is displayed on the screen (not shown) of each attendee's terminal 100.

Each attendee also registers an identification name, such as his or her name or a nickname, in the conference support system 1. The registered name is used for speech recognition and as the display name. For example, attendee x registers his or her full name, attendee y registers a nickname, and attendee z registers his or her job title (such as "manager"). The screen (not shown) of the terminal 100 displays each attendee's avatar together with that attendee's name.
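The registration data described above can be pictured as one record per attendee. The following Python sketch is illustrative only: the class `AttendeeProfile`, its field names, and the sample values are hypothetical and do not come from the source.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class AttendeeProfile:
    """Hypothetical per-attendee registration record."""
    attendee_id: str
    face_image: bytes  # registered face image data, used for biometric authentication
    avatar: str        # identifier of the avatar the attendee chose
    name: str          # identification name: full name, nickname, or job title

    def display_label(self) -> str:
        # The registered name is displayed together with the avatar.
        return f"[{self.avatar}] {self.name}"


# Sample registrations mirroring attendees x, y, and z (placeholder values).
PROFILES: Dict[str, AttendeeProfile] = {
    "x": AttendeeProfile("x", b"<face-x>", "portrait", "Full Name X"),
    "y": AttendeeProfile("y", b"<face-y>", "favorite-character", "Kit"),
    "z": AttendeeProfile("z", b"<face-z>", "caricature", "Manager"),
}
```

Keeping the name in the same record as the face data means one lookup serves authentication, display, and the name search used later for facilitation.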

The terminals 100X, 100Y, and 100Z are user devices, such as PCs (Personal Computers), owned and managed by attendees x, y, and z, respectively. The terminals 100X, 100Y, and 100Z have the same configuration; hereinafter, any one of them is referred to simply as the terminal 100. As shown in FIG. 1, the terminal 100 includes a sound collector 110, an imaging device 120, and a conference application 130. The sound collector 110 is a device, such as a microphone, that picks up the attendee's speech and converts the vibrations produced by that speech into audio data. The imaging device 120 is a device, such as a camera, that photographs the attendee's face and thereby generates time-series face image data of the attendee's face.

The conference application 130 transmits and receives, via the network, the audio data generated by the sound collector 110 of the terminal 100 and the time-series face image data generated by the imaging device 120 to and from the conference support device 10 (20, 30). The conference application 130 controls the sound collector 110 so as to switch its input on or off at any timing. Normally, the conference application 130 keeps the input of the sound collector 110 off unless the attendee unmutes it himself or herself. The conference application 130 also generates various UIs (User Interfaces), such as the avatars and a browser for displaying conference materials, and displays them on the screen (not shown) of the terminal 100.

When the scheduled time of the conference arrives, or a predetermined time before it, the conference application 130 starts automatically and causes the imaging device 120 to photograph the person in front of the terminal 100. The conference application 130 transmits the face image data acquired by the imaging device 120 to the conference support device 10 (20, 30) and receives from it the result of authentication by the biometric authentication device 200. If biometric authentication succeeds, the conference application 130 enters the online conference automatically. If biometric authentication fails, the conference application 130 may instead request the attendee to log in manually with a user ID and password.
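The join decision can be sketched as follows, assuming the authentication result arrives as a boolean and that manual login is modeled as a hypothetical callback `manual_login`; neither name comes from the source.

```python
from enum import Enum
from typing import Callable, Optional


class JoinMethod(Enum):
    AUTO = "auto"      # entered automatically after successful face authentication
    MANUAL = "manual"  # fallback login with user ID and password
    DENIED = "denied"  # neither method succeeded


def join_conference(
    face_auth_ok: bool,
    manual_login: Optional[Callable[[], bool]] = None,
) -> JoinMethod:
    """Decide how the terminal enters the online conference.

    `manual_login` is a hypothetical callback that returns True when the
    attendee logs in with a user ID and password; it is only consulted
    when biometric authentication has failed.
    """
    if face_auth_ok:
        return JoinMethod.AUTO
    if manual_login is not None and manual_login():
        return JoinMethod.MANUAL
    return JoinMethod.DENIED
```

Passing the manual login as a callback keeps the password prompt lazy: it is shown only when face authentication has already failed.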

The biometric authentication device 200 performs biometric authentication using the face, fingerprint, iris, veins, voiceprint, or other biometric information. In one example, the biometric authentication device 200 acquires, from the conference support device 10 (20, 30), face image data generated by the imaging device 120 of the terminal 100. The biometric authentication device 200 collates this face image data with face image data registered in a DB (Database) in association with the attendees' identification information. Using a matching technique that compares features extracted from the face image data, the biometric authentication device 200 determines whether the person in front of the terminal 100 is the attendee registered in the DB, and then notifies the conference support device 10 (20, 30) of the success or failure of the authentication. Based on this result, the conference support device 10 (20, 30) decides whether to allow the terminal 100 to connect to it.
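The text does not specify the matching technique. As one common assumption, the extracted features can be treated as numeric vectors and compared by cosine similarity against the template registered for the claimed attendee (1:1 verification). The 0.8 threshold and the function names below are hypothetical.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def authenticate(
    probe: Sequence[float],
    registered: Dict[str, Sequence[float]],
    attendee_id: str,
    threshold: float = 0.8,  # hypothetical acceptance threshold
) -> bool:
    """Is the person in front of the terminal the registered attendee?"""
    template = registered.get(attendee_id)
    if template is None:
        return False  # no registered face data for this attendee
    return cosine_similarity(probe, template) >= threshold
```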

The speech recognition device 300 converts audio data into text data representing language, based on an acoustic model and a language model. The acoustic model represents the acoustic features of the audio data, while the language model represents constraints on the order of phonemes. In one example, the speech recognition device 300 acquires, from the conference support device 10 (20, 30), audio data generated by the sound collector 110 of the terminal 100, converts it into text data, and transmits the text data to the conference support device 10 (20, 30).
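In statistical speech recognition, the two models are commonly combined in the decision rule below. This is the standard textbook formulation, not something specified in the source: X is the observed acoustic feature sequence, W is a candidate sequence of linguistic units, P(X | W) is supplied by the acoustic model, and P(W) by the language model. Because P(X) does not depend on W, it drops out of the maximization.

```latex
\hat{W} = \operatorname*{arg\,max}_{W} P(W \mid X)
        = \operatorname*{arg\,max}_{W} \frac{P(X \mid W)\,P(W)}{P(X)}
        = \operatorname*{arg\,max}_{W} P(X \mid W)\,P(W)
```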

The avatar generation device 400 generates an avatar representing the attendee's alter ego in a virtual space (here, the online conference) and transmits the generated avatar data to the conference support device 10 (20, 30), which in turn transmits it to the terminal 100. The conference application 130 installed in the terminal 100 uses the avatar data received from the conference support device 10 (20, 30) to display the avatar on the screen (not shown) of the terminal 100.

In addition, like the related technique described in Patent Literature 1, the avatar generation device 400 may add to the avatar an icon representing the attendee's facial expression or emotion. In this case, the avatar generation device 400, like the conference support device 10 (20, 30) according to Embodiments 1 to 3 described later, uses the time-series face image data and audio data transmitted from the terminal 100 to detect the attendee's emotional expression. The avatar generation device 400 then determines, based on the analysis result of the emotional expression, whether the attendee intends to speak.

The conference support device 10 (20, 30) analyzes the facial expressions and movements of each attendee during the online conference and reflects the emotional expressions derived from that analysis in each attendee's avatar. In particular, when another attendee shows an emotional expression after a certain attendee has spoken, the conference support device 10 (20, 30) reflects that emotional expression in the avatar. This function recreates the atmosphere of an in-person meeting in an online conference.

The conference support device 10 (20, 30) also analyzes the text data obtained from the speech recognition device 300 and facilitates the online conference based on the analysis result. For example, when the text data contains the name of an attendee, the conference support device 10 (20, 30) asks that attendee for his or her opinion.

Embodiments 1 to 3 below describe an example in which the conference support system 1 includes three terminals 100X, 100Y, and 100Z. However, the conference support system 1 may include any number of terminals 100, as long as there are at least two.

[Embodiment 1]
Embodiment 1 will be described with reference to FIGS. 2 and 3.

(Conference support device 10)
FIG. 2 is a block diagram showing the configuration of the conference support device 10 according to Embodiment 1. As shown in FIG. 2, the conference support device 10 includes an emotion analysis unit 11, a prediction unit 12, and a question unit 13.

The emotion analysis unit 11 detects the emotional expressions of the conference attendees. The emotion analysis unit 11 is an example of emotion analysis means. In one example, the emotion analysis unit 11 receives time-series face image data from the terminal 100 and detects changes in the attendee's face appearing in that data. For example, the emotion analysis unit 11 associates pixels one-to-one between frames of the time-series face image data based on the similarity of their pixel values. The emotion analysis unit 11 then calculates how the position coordinates of the pixels corresponding to each facial part, such as the mouth, nose, and eyes, change over the time-series face image data.
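Once the pixel correspondence between frames is established, the per-part coordinate changes can be expressed as displacements from the first frame. A minimal sketch, assuming each frame has already been reduced to a mapping from a facial-part name to its (x, y) pixel position; the part names and function name are hypothetical. Image y coordinates grow downward, so an upward movement yields a negative y displacement.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Frame = Dict[str, Point]  # facial part name -> (x, y) pixel coordinates


def part_displacements(frames: List[Frame]) -> Dict[str, List[Point]]:
    """Displacement of each facial part relative to the first frame.

    Assumes the pixel correspondence between frames is already known,
    so every frame lists the same part names.
    """
    base = frames[0]
    result: Dict[str, List[Point]] = {part: [] for part in base}
    for frame in frames:
        for part, (x0, y0) in base.items():
            x, y = frame[part]
            result[part].append((x - x0, y - y0))
    return result
```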

The emotion analysis unit 11 detects a specific change in the attendee's facial expression in the time-series face image data as an emotional expression. For example, a first evaluation table (not shown) that associates changes in each facial part with emotions such as sadness, anger, and joy is prepared in advance. Suppose the first evaluation table specifies that raised eyebrows are an emotional expression associated with some emotion (for example, anger). In this case, the emotion analysis unit 11 detects the attendee's eyebrows rising in the time-series face image data as one emotional expression.
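A lookup against the first evaluation table might look like the sketch below. Only the raised-eyebrow/anger pairing comes from the text; the other table entries, the 3-pixel threshold, and all names are hypothetical. The input is a series of vertical displacements of one facial part, where negative values mean upward movement in image coordinates.

```python
from typing import List, Optional, Tuple

# Hypothetical first evaluation table: (facial part, direction of change) -> emotion.
FIRST_EVALUATION_TABLE = {
    ("brow", "up"): "anger",              # example pairing given in the text
    ("mouth_corner", "up"): "joy",        # hypothetical entry
    ("mouth_corner", "down"): "sadness",  # hypothetical entry
}


def vertical_trend(dys: List[float], min_shift: float = 3.0) -> Optional[str]:
    """Classify the latest vertical displacement (image y grows downward)."""
    if not dys:
        return None
    if dys[-1] <= -min_shift:
        return "up"
    if dys[-1] >= min_shift:
        return "down"
    return None


def detect_expression(part: str, dys: List[float]) -> Optional[Tuple[str, str, str]]:
    """Return (part, direction, emotion) if the change matches a table entry."""
    trend = vertical_trend(dys)
    if trend is None:
        return None
    emotion = FIRST_EVALUATION_TABLE.get((part, trend))
    return (part, trend, emotion) if emotion else None
```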

The emotion analysis unit 11 may further detect a specific movement of the attendee in the time-series face image data as an emotional expression. For example, a second evaluation table that associates attendee movements with emotions such as sadness, anger, and joy is prepared in advance. Suppose the second evaluation table specifies that shaking one's head is an emotional expression associated with some emotion. In this case, the emotion analysis unit 11 detects the attendee shaking his or her head in the time-series face image data as one emotional expression.
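The text does not say how head shaking is recognized. One simple assumption is to track the horizontal position of the face across frames and count direction reversals in swings of at least a minimum amplitude; the thresholds and the function name below are hypothetical.

```python
from typing import Sequence


def is_head_shake(
    xs: Sequence[float], min_amplitude: float = 5.0, min_reversals: int = 2
) -> bool:
    """Detect side-to-side head shaking from per-frame horizontal face positions.

    A reversal is counted when the motion changes direction after a swing of
    at least `min_amplitude` pixels. Both thresholds are assumptions.
    """
    if not xs:
        return False
    reversals = 0
    direction = 0   # +1 moving right, -1 moving left, 0 not yet known
    anchor = xs[0]  # extreme point of the current swing
    for x in xs[1:]:
        delta = x - anchor
        if direction != 0 and delta * direction > 0:
            anchor = x  # still moving the same way: extend the current swing
            continue
        if abs(delta) >= min_amplitude:
            if direction != 0:
                reversals += 1  # large enough swing in the opposite direction
            direction = 1 if delta > 0 else -1
            anchor = x
    return reversals >= min_reversals
```

Tracking the extreme point of each swing, rather than the previous frame, lets slow head turns accumulate into one swing instead of being ignored as small frame-to-frame steps.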

The emotion analysis unit 11 outputs the data of the emotional expressions detected from the time-series face image data as described above to the prediction unit 12.

The prediction unit 12 predicts, based on the emotional expression, whether the attendee intends to speak. The prediction unit 12 is an example of prediction means. In one example, the prediction unit 12 uses a predictor that predicts whether the attendee intends to speak. Data on the emotional expressions that attendees showed when they spoke in the past is stored in a database (not shown). The predictor is obtained by training a CNN (Convolutional Neural Network) through deep learning on this stored data. The machine learning of the predictor may be performed by a learning unit, which is not shown here (see FIG. 6 and Embodiment 3).

The prediction unit 12 inputs the emotional expression data received from the emotion analysis unit 11 into the trained predictor. The trained predictor extracts features from the input data, predicts from those features whether the attendee intends to speak, and outputs the prediction result. The prediction unit 12 outputs this prediction result, which indicates whether the attendee intends to speak, to the question unit 13.
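The source specifies a CNN trained by deep learning but gives no architecture. The sketch below shows only the forward pass of a deliberately tiny one-filter 1-D CNN (convolution, ReLU, global max pooling, sigmoid output) over an assumed per-frame intensity score of detected emotional expressions. All weights are hand-picked and untrained, so it illustrates the data flow from input features to a yes/no prediction, not a working predictor.

```python
import math
from typing import List, Sequence


def conv1d(signal: Sequence[float], kernel: Sequence[float], bias: float) -> List[float]:
    """Valid (no padding) 1-D convolution, computed as cross-correlation like CNN layers."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(signal) - k + 1)
    ]


def predict_intent_to_speak(
    scores: Sequence[float],
    kernel: Sequence[float] = (0.5, 1.0, 0.5),  # hypothetical, untrained weights
    bias: float = -1.0,
    out_weight: float = 3.0,
    out_bias: float = -1.5,
    threshold: float = 0.5,
) -> bool:
    """Toy CNN forward pass: conv -> ReLU -> global max pool -> sigmoid -> threshold."""
    if len(scores) < len(kernel):
        return False  # too few frames for one convolution window
    feature_map = [max(0.0, v) for v in conv1d(scores, kernel, bias)]  # ReLU
    pooled = max(feature_map)                                          # global max pooling
    probability = 1.0 / (1.0 + math.exp(-(out_weight * pooled + out_bias)))
    return probability >= threshold
```

In a real system the kernel, biases, and output weight would be learned from the stored past-utterance data rather than fixed by hand.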

When the attendee is predicted to intend to speak, the question unit 13 asks the attendee for his or her opinion. The question unit 13 is an example of questioning means. In one example, the question unit 13 receives from the prediction unit 12 the prediction result indicating whether the attendee intends to speak. When the attendee is predicted to intend to speak, the question unit 13 generates a voice message by, for example, combining the attendee's pre-registered identification name with a fixed sentence such as "Please share your comments." The question unit 13 then instructs the conference application 130 of the terminal 100 to output the voice message asking for the attendee's opinion from a speaker (not shown) of the terminal 100.
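Composing the message can be sketched as below. The fixed sentence is a hypothetical stand-in, and the name mapping mirrors the registered identification names; falling back to the fixed sentence alone when no name is registered is an assumption.

```python
from typing import Dict

# Hypothetical wording of the fixed sentence appended to the attendee's name.
FIXED_PHRASE = "please share your comments."


def build_prompt(attendee_id: str, names: Dict[str, str], phrase: str = FIXED_PHRASE) -> str:
    """Combine the registered identification name with a fixed sentence."""
    name = names.get(attendee_id, "")
    if not name:
        # No registered name: use the fixed sentence alone, capitalized.
        return phrase[:1].upper() + phrase[1:]
    return f"{name}, {phrase}"
```

The resulting string could then be sent to a text-to-speech engine or shown on screen; either output path is outside this sketch.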

Alternatively, the question unit 13 may instruct the conference application 130 to display a message asking for the attendee's opinion on the screen (not shown) of the terminal 100. The question unit 13 also instructs the conference application 130 to turn on the input of the sound collector 110 of the terminal 100 (that is, to unmute it).

(Operation of conference support device 10)
The operation of the conference support device 10 according to Embodiment 1 will be described with reference to FIG. 3. FIG. 3 is a flowchart showing the flow of processing executed by each unit of the conference support device 10. Hereinafter, any one of the conference attendees x, y, and z is referred to as "attendee x (y, z)".

As shown in FIG. 3, the emotion analysis unit 11 detects the emotional expression of attendee x (y, z) (S1) and outputs the data of the detected emotional expression to the prediction unit 12.

The prediction unit 12 predicts whether attendee x (y, z) intends to speak (S2) and outputs the prediction result to the question unit 13.

If attendee x (y, z) is predicted not to intend to speak (No in S3), the operation of the conference support device 10 ends.

If, on the other hand, attendee x (y, z) is predicted to intend to speak (Yes in S3), the question unit 13 asks attendee x (y, z) for his or her opinion (S4). For example, the question unit 13 instructs the conference application 130 installed in the terminal 100 to output, from a speaker (not shown) of the terminal 100, a voice message prompting attendee x (y, z) to speak. Alternatively, the question unit 13 causes the screen (not shown) of the terminal 100 to display a message asking attendee x (y, z) for his or her opinion.

The operation of the conference support device 10 then ends.
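The flow S1 to S4 can be summarized as a single function. This is a sketch under assumptions: the three units are passed in as hypothetical callables, and treating "no expression detected" (None) as an early exit is not stated in the source.

```python
from typing import Callable, Optional


def support_step(
    attendee_id: str,
    detect_expression: Callable[[str], Optional[object]],  # S1: emotion analysis unit 11
    predict: Callable[[object], bool],                    # S2: prediction unit 12
    ask: Callable[[str], None],                           # S4: question unit 13
) -> bool:
    """One pass of S1 -> S2 -> S3 -> S4. Returns True if the attendee was asked."""
    expression = detect_expression(attendee_id)  # S1
    if expression is None:
        return False                             # assumption: nothing to predict from
    if not predict(expression):                  # S2, then S3: No -> end
        return False
    ask(attendee_id)                             # S3: Yes -> S4
    return True
```

Injecting the units as callables keeps the control flow testable independently of the image analysis and the trained predictor.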

(Modifications)
In one modification, the conference support device 10 may cause the screen of the terminal 100 to display the emotion (for example, happiness, enjoyment, anger, or sadness) corresponding to an attendee's emotional expression. Alternatively, the conference support device 10 may cause the screen of the terminal 100 to display the text data obtained by speech recognition. This allows even hearing-impaired people to participate more actively in the conference.

In another modification, a specific gesture for unmuting may be defined in advance. For example, when an attendee blinks three times or raises a hand, the conference support device 10 unmutes that attendee's terminal 100. This lets attendees unmute with little effort.
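The blink gesture can be sketched by counting open-to-closed transitions of the eyes within a short observation window. The per-frame eye-openness values in [0, 1], the 0.2 threshold, and the function names are assumptions; hand-raise detection is taken as an input rather than implemented.

```python
from typing import Sequence


def count_blinks(eye_open: Sequence[float], closed_below: float = 0.2) -> int:
    """Count blinks as transitions from open to closed in an eye-openness series."""
    blinks = 0
    was_closed = False
    for value in eye_open:
        closed = value < closed_below
        if closed and not was_closed:
            blinks += 1  # eyes just closed: start of a new blink
        was_closed = closed
    return blinks


def should_unmute(eye_open: Sequence[float], hand_raised: bool, required_blinks: int = 3) -> bool:
    """Unmute gesture: at least three blinks in the window, or a raised hand."""
    return hand_raised or count_blinks(eye_open) >= required_blinks
```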

(Effects of this embodiment)
According to the configuration of this embodiment, the emotion analysis unit 11 detects the emotional expressions of the conference attendees. The prediction unit 12 predicts, based on the emotional expression, whether an attendee intends to speak. When an attendee is predicted to intend to speak, the question unit 13 asks that attendee for his or her opinion. Being asked for an opinion gives the attendee a cue, which makes it easier to speak. This supports the smooth progress of the conference.

[Embodiment 2]
Embodiment 2 will be described with reference to FIGS. 4 and 5. In Embodiment 2, the conference support device 20 performs facilitation based on the text data obtained from the speech recognition device 300 (FIG. 1). For configurations common to Embodiments 1 and 2, the description of Embodiment 1 applies and is not repeated here.

The configuration of the conference support device 20 according to Embodiment 2 is the same as that of the conference support device 10 according to Embodiment 1 (FIG. 2).

(Operation of conference support device 20)
The operation of the conference support device 20 according to Embodiment 2 will be described with reference to FIG. 4. FIG. 4 is a flowchart showing the flow of processing executed by each unit of the conference support device 20. Hereinafter, either of the conference attendees y and z is referred to as "attendee y (z)". Initially, attendee x is speaking, and attendee y (z) has turned off the input of the sound collector 110 of his or her own terminal 100.

As shown in FIG. 4, the question unit 13 first analyzes the utterance of attendee x (S201). For example, the question unit 13 receives voice data from the terminal 100X of attendee x, transmits the voice data to the speech recognition device 300, and then receives the recognized text data from the speech recognition device 300. When the question unit 13 receives null data from the speech recognition device 300 indicating that speech recognition is complete, it determines that attendee x has finished speaking.
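The end-of-utterance check in S201 can be sketched as follows. This is a minimal illustration under an assumed interface, not the device's actual implementation: the recognition results are modeled as an iterable of text chunks in which `None` stands for the null data that signals completion, and the function name `collect_utterance` is hypothetical.

```python
from typing import Iterable, Optional, Tuple


def collect_utterance(results: Iterable[Optional[str]]) -> Tuple[str, bool]:
    """Join recognized text chunks until the end-of-recognition marker.

    Assumed interface: the speech recognition device yields text chunks and
    then None (the "null data") once recognition of the utterance is complete.
    Returns (text, finished); finished is True only if the marker was seen.
    """
    parts = []
    for chunk in results:
        if chunk is None:
            # Null data received: treat attendee x as having finished speaking.
            return "".join(parts), True
        parts.append(chunk)
    # Stream ended without the marker: the utterance is still in progress.
    return "".join(parts), False


if __name__ == "__main__":
    text, finished = collect_utterance(["What do you think, ", "Suzuki?", None])
    print(text, finished)  # What do you think, Suzuki? True
```

Any chunks that arrive after the marker are ignored, because the utterance is already considered finished at that point.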

Next, the question unit 13 searches the text data received from the speech recognition device 300 for the identification name of attendee y(z). If attendee x called attendee y(z) by identification name (Yes in S202), the flow proceeds to step S206, described below.
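A minimal sketch of the identification-name search in S202, assuming the names are plain strings and that a simple substring match is sufficient (the source does not specify the matching rule). The function `find_called_attendees` and the mapping of attendee IDs to names are hypothetical.

```python
from typing import Dict, List


def find_called_attendees(text: str, id_names: Dict[str, str]) -> List[str]:
    """Return the IDs of attendees whose identification name appears in text.

    id_names maps a hypothetical attendee ID to that attendee's registered
    identification name. Pass only the attendees other than the speaker.
    Matching is a plain substring search (an assumption).
    """
    return [aid for aid, name in id_names.items() if name and name in text]


if __name__ == "__main__":
    others = {"y": "Suzuki", "z": "Manager"}
    print(find_called_attendees("Suzuki, what do you think?", others))  # ['y']
```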

In that case, the question unit 13 asks attendee y(z), whom attendee x called by identification name, for their opinion (S206). At this time, the question unit 13 may instruct the conference application 130 installed on that attendee's terminal 100 to turn on (unmute) the input of the sound collector 110. If, on the other hand, attendee x did not call attendee y(z) by identification name (No in S202), the flow proceeds to step S203.

Next, the emotion analysis unit 11 detects the emotional expression of attendee y(z) (S203) and outputs the detected emotional expression data to the prediction unit 12.

The prediction unit 12 predicts whether attendee y(z) intends to speak (S204) and outputs to the question unit 13 a prediction result indicating whether attendee y(z) intends to speak.
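The source does not specify the predictor's model. As one hedged illustration of a predictor with per-feature weighting factors, a logistic score over binary expression features could be used; the feature names, weights, bias, and 0.5 threshold below are all hypothetical.

```python
import math
from typing import Dict, Set, Tuple

# Hypothetical weighting factors for binary expression features. In the
# described system these would be obtained by machine learning.
WEIGHTS: Dict[str, float] = {
    "eyebrows_rise": 1.2,
    "nostrils_flare": 0.8,
    "cheeks_rise": 0.5,
}
BIAS = -1.5


def predict_intent(expressions: Set[str], threshold: float = 0.5) -> Tuple[bool, float]:
    """Return (intends_to_speak, probability) for a set of detected expressions."""
    # Unknown expression labels contribute nothing to the score.
    score = BIAS + sum(WEIGHTS.get(e, 0.0) for e in expressions)
    probability = 1.0 / (1.0 + math.exp(-score))  # logistic function
    return probability >= threshold, probability


if __name__ == "__main__":
    print(predict_intent({"eyebrows_rise", "nostrils_flare"}))
```

With these made-up weights, raised eyebrows plus flared nostrils push the score above the threshold, while no detected expression leaves it below.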

If attendee y(z) is predicted not to intend to speak (No in S205), the operation of the conference support device 20 ends.

If, on the other hand, attendee x called attendee y(z) by identification name (Yes in S202), or attendee y(z) is predicted to intend to speak (Yes in S205), the question unit 13 asks attendee y(z) for their opinion (S206).

For example, by instructing the conference application 130 installed on the terminal 100, the question unit 13 causes a speaker (not shown) of the terminal 100 to output a voice message prompting attendee y(z) to speak. Alternatively, the question unit 13 causes a screen (not shown) of the terminal 100 to display a message asking attendee y(z) for their opinion.

This completes the operation of the conference support device 20.
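The branching in S202 to S206 can be summarized in a short sketch. It assumes hypothetical inputs: the recognized text of attendee x's utterance, a mapping from the other attendees' IDs to their identification names, and a callable `intends_to_speak` standing in for the emotion analysis unit 11 and prediction unit 12 (S203 to S205). It returns the attendees who would be asked for an opinion in S206.

```python
from typing import Callable, Dict, List


def facilitate_after_utterance(
    text: str,
    id_names: Dict[str, str],
    intends_to_speak: Callable[[str], bool],
) -> List[str]:
    """Return the attendees to ask for an opinion once attendee x has finished.

    text: recognized text of attendee x's utterance (S201).
    id_names: hypothetical map from the other attendees' IDs to their
        identification names.
    intends_to_speak: stand-in for emotion analysis plus prediction (S203-S204).
    """
    # S202: was any other attendee called by identification name?
    named = [aid for aid, name in id_names.items() if name and name in text]
    if named:
        return named  # S202 Yes -> S206 for the named attendees
    # S202 No -> S203 to S205 for each remaining attendee.
    return [aid for aid in id_names if intends_to_speak(aid)]


if __name__ == "__main__":
    others = {"y": "Suzuki", "z": "Sato"}
    print(facilitate_after_utterance("That is all from me.", others, lambda a: a == "z"))  # ['z']
```

Passing the predictor in as a callable keeps the name-search branch testable without a camera or a trained model.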

(Modification)
In one modification, the question unit 13 of the conference support device 20 determines that discussion among the attendees has stalled when no text data has been received from the speech recognition device 300 for a certain period of time. In this case, the question unit 13 asks the attendee with the highest position to speak. For example, because the identification name of attendee z is "Manager", the question unit 13 infers that attendee z has decision-making authority in the conference. The question unit 13 therefore outputs voice data of a fixed phrase such as "Manager (identification name), please make a decision."
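A rough sketch of this modification, assuming a numeric ranking of job titles (`TITLE_RANK`) and a 30-second silence timeout; both are hypothetical, since the source only refers to "a certain period of time" and "the highest position".

```python
from typing import Dict, Optional

# Hypothetical ranking of identification names that are job titles.
TITLE_RANK: Dict[str, int] = {"Manager": 3, "Section Chief": 2, "Staff": 1}


def is_stalled(last_text_time: float, now: float, timeout_s: float = 30.0) -> bool:
    """True if no recognized text has arrived for at least timeout_s seconds."""
    return now - last_text_time >= timeout_s


def decision_prompt(id_names: Dict[str, str]) -> Optional[str]:
    """Fixed-phrase prompt for the highest-ranked attendee, or None if no title is known."""
    ranked = [(TITLE_RANK[name], name) for name in id_names.values() if name in TITLE_RANK]
    if not ranked:
        return None
    _, title = max(ranked)
    return f"{title}, please make a decision."


if __name__ == "__main__":
    attendees = {"x": "Staff", "y": "Section Chief", "z": "Manager"}
    if is_stalled(last_text_time=0.0, now=45.0):
        print(decision_prompt(attendees))  # Manager, please make a decision.
```

In a live system the two timestamps could come from `time.monotonic()`, which is unaffected by wall-clock adjustments.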

Furthermore, according to the configuration of this embodiment, the conference support device 20 performs facilitation based on text data obtained from the speech recognition device 300. Specifically, when attendee x calls attendee y(z) by name, the question unit 13 asks attendee y(z) for their opinion. This helps the conference proceed smoothly.

[Embodiment 3]
Embodiment 3 will be described with reference to FIGS. 5 to 7. In Embodiment 3, the conference support device 30 retrains the predictor based on the outcome (that is, hit or miss) of the prediction of whether an attendee intends to speak. For configurations common to Embodiment 3 and Embodiments 1 and 2, the descriptions of Embodiments 1 and 2 apply and are not repeated here.

(Conference support device 30)
FIG. 5 is a block diagram showing the configuration of the conference support device 30 according to Embodiment 3. As shown in FIG. 5, the conference support device 30 includes the emotion analysis unit 11, the prediction unit 12, and the question unit 13, and additionally includes a learning unit 34.

The learning unit 34 retrains the predictor using the outcomes of predictions of whether attendees intend to speak. The learning unit 34 is an example of learning means. In one example, the learning unit 34 feeds back to the predictor whether each prediction was correct or incorrect. Specifically, when a prediction is correct, the learning unit 34 keeps the predictor's weighting factors unchanged; when a prediction is incorrect, it reduces the weighting factor of the feature that most strongly influenced the prediction. The learning unit 34 may also correct the predictor's weighting factors by any other method.
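The hit/miss weight update can be sketched as below, assuming the predictor keeps one non-negative weighting factor per binary expression feature and that "the feature that most strongly influenced the prediction" is the active feature with the largest weight. The shrink factor of 0.9 and the function name are hypothetical. Because attendees are questioned only when predicted to speak, an incorrect prediction here is a false positive, so shrinking the largest positive contribution moves future scores away from "intends to speak".

```python
from typing import Dict, Set


def update_weights(
    weights: Dict[str, float],
    active_features: Set[str],
    hit: bool,
    shrink: float = 0.9,
) -> Dict[str, float]:
    """Return a new weight map after one prediction outcome.

    hit=True: weights are kept as they are.
    hit=False: the active feature with the largest weight (assumed to have
    influenced the prediction most) is scaled by `shrink`. Assumes weights
    are non-negative; `shrink` is a hypothetical value.
    """
    updated = dict(weights)  # never mutate the caller's map
    if hit:
        return updated
    candidates = [f for f in active_features if f in updated]
    if candidates:
        top = max(candidates, key=lambda f: updated[f])
        updated[top] *= shrink
    return updated
```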

The learning unit 34 may also retrain the predictor based on the attendee's emotional expression after the attendee has been asked for their opinion. Specifically, after the attendee is asked, the learning unit 34 causes the emotion analysis unit 11 to detect the attendee's emotional expression again, receives this second emotional expression data from the emotion analysis unit 11, and uses it to retrain the predictor, for example by deep learning.

(Operation of conference support device 30)
The operation of the conference support device 30 according to Embodiment 3 will be described with reference to FIG. 6, a flowchart showing the processing executed by each unit of the conference support device 30. In the following, any one of the attendees x, y, and z is referred to as "attendee x(y,z)".

As shown in FIG. 6, the emotion analysis unit 11 detects the first emotional expression of attendee x(y,z) (S301) and outputs the data of this first emotional expression to the prediction unit 12.

Based on the first emotional expression, the prediction unit 12 predicts whether attendee x(y,z) intends to speak (S302) and outputs to the question unit 13 a prediction result indicating whether attendee x(y,z) intends to speak.

If attendee x(y,z) is predicted not to intend to speak (No in S303), the operation of the conference support device 30 ends.

If, on the other hand, attendee x(y,z) is predicted to intend to speak (Yes in S303), the question unit 13 asks attendee x(y,z) for their opinion (S304). For example, by instructing the conference application 130 installed on the terminal 100, the question unit 13 causes a speaker (not shown) of the terminal 100 to output a voice message prompting attendee x(y,z) to speak. Alternatively, the question unit 13 causes a screen (not shown) of the terminal 100 to display a message asking attendee x(y,z) for their opinion.

The emotion analysis unit 11 then detects again the emotional expression of attendee x(y,z), who was asked for their opinion in step S304 (S305).

Based on the prediction outcome of step S303 and the second emotional expression of attendee x(y,z) detected in step S305, the learning unit 34 updates the retraining database (FIG. 7) and retrains the predictor (S306). The prediction outcome of step S303 refers to whether the prediction that attendee x(y,z) intends to speak turned out to be correct or incorrect.

If voice data is input from the terminal 100 of attendee x(y,z) within a certain period of time after step S304, the learning unit 34 determines that the prediction was correct. If no voice data is input from that terminal within that period, the learning unit 34 determines that the prediction was incorrect.

Alternatively, the learning unit 34 may analyze the text data received from the speech recognition device 300 and determine that the prediction was incorrect when it detects a negative reply such as "特にないです" ("nothing in particular"), "ありません" ("I have none"), or "ないです" ("none").
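Both judgment rules (a response window for voice data, then a check for the negative replies listed above) can be combined into one function, sketched below. Combining them, the 10-second window, substring matching, and the name `prediction_hit` are assumptions; the source presents the two rules as alternatives and gives no time value.

```python
from typing import Optional

# Negative replies named in the text: "nothing in particular", "I have none", "none".
NEGATIVE_REPLIES = ("特にないです", "ありません", "ないです")


def prediction_hit(
    first_voice_delay_s: Optional[float],
    reply_text: str = "",
    window_s: float = 10.0,
) -> bool:
    """Judge whether the prediction "attendee intends to speak" was correct.

    first_voice_delay_s: seconds from the prompt (S304) until voice data first
        arrived from the attendee's terminal, or None if none arrived.
    reply_text: recognized text of the reply, if any.
    window_s: hypothetical length of the "certain period of time".
    """
    if first_voice_delay_s is None or first_voice_delay_s > window_s:
        return False  # no speech within the window: miss
    # Speech arrived, but a negative reply still counts as a miss.
    return not any(phrase in reply_text for phrase in NEGATIVE_REPLIES)
```

Plain substring matching is coarse (for example, "ありません" also occurs in "問題ありません", "no problem"), so a real system would likely need a more careful reply classifier.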

If the emotional expression of attendee x(y,z) detected in step S305 partially overlaps the emotional expression detected in step S301, the learning unit 34 retrains the predictor using the difference between them. For example, suppose the first emotional expression detected in step S301 is "eyebrows rise" and "nostrils flare", and the second emotional expression detected in step S305 is "eyebrows rise" and "cheeks rise". In this case, the learning unit 34 uses "cheeks rise", the difference between the second and first emotional expressions, to retrain the predictor.
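The difference used for retraining is a set difference between the two detections. A minimal sketch, assuming each detection is a set of expression labels; returning an empty set when the detections do not overlap at all is an assumption, since the source only covers the partial-overlap case. The function name is hypothetical.

```python
from typing import Set


def expression_difference(first: Set[str], second: Set[str]) -> Set[str]:
    """Expressions in the second detection (S305) that were absent from the first (S301).

    Returns an empty set when the detections share nothing (assumption) or
    when the second detection adds nothing new.
    """
    if not (first & second):
        return set()
    return second - first


if __name__ == "__main__":
    first = {"eyebrows rise", "nostrils flare"}
    second = {"eyebrows rise", "cheeks rise"}
    print(expression_difference(first, second))  # {'cheeks rise'}
```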

This completes the operation of the conference support device 30.

(Example of retraining database)
FIG. 7 shows an example of a database that the learning unit 34 uses to retrain the predictor. In the example of FIG. 7, the leftmost column is a running count of the predictions made by the predictor. The second to fourth columns from the left represent the attendee's first emotional expression detected by the emotion analysis unit 11. The fifth and sixth columns indicate whether the prediction that the attendee intends to speak was correct or incorrect. The rightmost column represents the attendee's second emotional expression detected by the emotion analysis unit 11 (labeled "emotional expression (difference)" in FIG. 7).
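One row of a table laid out like FIG. 7 could be represented as below. The field names, the function `append_record`, and the use of a single boolean for the two hit/miss columns are hypothetical; the figure itself is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RetrainingRecord:
    """One row of a retraining table laid out like FIG. 7 (field names hypothetical)."""
    count: int                                            # leftmost column: prediction number
    first_expressions: List[str]                          # columns 2-4: first detection
    hit: bool                                             # columns 5-6: correct or incorrect
    difference: List[str] = field(default_factory=list)   # rightmost: second detection (difference)


def append_record(db: List[RetrainingRecord], first: List[str], hit: bool,
                  difference: List[str]) -> RetrainingRecord:
    """Append a row, numbering it after the rows already stored."""
    record = RetrainingRecord(len(db) + 1, list(first), hit, list(difference))
    db.append(record)
    return record
```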

In one example, the learning unit 34 retrains the predictor when the same emotional expression has been detected a predetermined number of times in step S305. For instance, as shown in FIG. 7, the learning unit 34 retrains the predictor when the emotional expression "cheeks rise" has been detected four times.
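The count-based trigger can be sketched with `collections.Counter`. The threshold of 4 follows the FIG. 7 example; the class name is hypothetical, and resetting the count after a trigger fires is an assumption, as the source does not say what happens afterwards.

```python
from collections import Counter
from typing import Iterable, List


class RetrainTrigger:
    """Fire when the same second-detection expression has been seen `threshold` times."""

    def __init__(self, threshold: int = 4) -> None:
        self.threshold = threshold
        self.counts: Counter = Counter()

    def observe(self, expressions: Iterable[str]) -> List[str]:
        """Record one S305 detection; return the expressions that reached the threshold."""
        ready = []
        for expression in expressions:
            self.counts[expression] += 1
            if self.counts[expression] >= self.threshold:
                ready.append(expression)
                self.counts[expression] = 0  # assumed reset after retraining
        return ready
```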

Furthermore, according to the configuration of this embodiment, the learning unit 34 retrains the predictor using the outcomes of predictions of whether attendees intend to speak. Retraining the predictor can improve the accuracy of these predictions.

(Hardware configuration)
Each component of the conference support devices 10, 20, and 30 described in Embodiments 1 to 3 represents a functional block. Some or all of these components are implemented by, for example, an information processing device 900 as shown in FIG. 8, which is a block diagram showing an example of the hardware configuration of the information processing device 900.

As shown in FIG. 8, the information processing device 900 includes, as an example, the following components:

- CPU (Central Processing Unit) 901
- ROM (Read Only Memory) 902
- RAM (Random Access Memory) 903
- Program 904 loaded into the RAM 903
- Storage device 905 that stores the program 904
- Drive device 907 that reads from and writes to a recording medium 906
- Communication interface 908 that connects to a communication network 909
- Input/output interface 910 for data input and output
- Bus 911 that connects the components
Each component of the conference support devices 10, 20, and 30 described in Embodiments 1 to 3 is implemented by the CPU 901 reading and executing the program 904 that implements its functions. The program 904 is stored in advance in, for example, the storage device 905 or the ROM 902, and the CPU 901 loads it into the RAM 903 and executes it as needed. The program 904 may instead be supplied to the CPU 901 via the communication network 909, or it may be stored in advance on the recording medium 906 and read by the drive device 907, which supplies it to the CPU 901.

With the above configuration, the conference support devices 10, 20, and 30 described in Embodiments 1 to 3 are implemented as hardware and can therefore provide the same effects as those described in the above embodiments.

(Supplementary notes)
Aspects of the present invention can also be described as in the following supplementary notes, but are not limited to them.

(Supplementary note 1)
A conference support device comprising:
emotion analysis means for detecting an emotional expression of an attendee of a conference;
prediction means for predicting, based on the emotional expression, whether the attendee intends to speak; and
questioning means for asking the attendee for an opinion when the attendee is predicted to intend to speak.

(Supplementary note 2)
The conference support device according to supplementary note 1, wherein the emotion analysis means detects the emotional expression of the attendee from face image data obtained by capturing the face of the attendee.

(Supplementary note 3)
The conference support device according to supplementary note 1 or 2, wherein the emotional expression includes a movement of the attendee and a change in the facial expression of the attendee.

(Supplementary note 4)
The conference support device according to any one of supplementary notes 1 to 3, wherein the questioning means sends out a voice message prompting the attendee to speak.

(Supplementary note 5)
The conference support device according to any one of supplementary notes 1 to 4, wherein the prediction means predicts whether the attendee intends to speak by using a predictor trained by machine learning on the attendee's emotional expressions when the attendee spoke in the past.

(Supplementary note 6)
The conference support device according to any one of supplementary notes 1 to 5, wherein
the questioning means analyzes utterances of the attendees by using a speech recognition device, and
after the identification name of the attendee is called, the emotion analysis means detects the emotional expression of the attendee.

(Supplementary note 7)
The conference support device according to supplementary note 5, further comprising learning means for retraining the predictor by using the outcome of the prediction of whether the attendee intends to speak.

(Supplementary note 8)
A conference support method comprising:
detecting an emotional expression of an attendee of a conference;
predicting, based on the emotional expression, whether the attendee intends to speak; and
asking the attendee for an opinion when the attendee is predicted to intend to speak.

(Supplementary note 9)
A program for causing a computer to execute:
a process of detecting an emotional expression of an attendee of a conference;
a process of predicting, based on the emotional expression, whether the attendee intends to speak; and
a process of asking the attendee for an opinion when the attendee is predicted to intend to speak.

(Supplementary note 10)
A conference support system comprising:
the conference support device according to any one of supplementary notes 1 to 7;
an imaging device that captures the face of the attendee; and
a sound collecting device that collects the speech of the attendee.

(Supplementary note 11)
The conference support system according to supplementary note 10, further comprising a biometric authentication device that biometrically authenticates the attendee.

(Supplementary note 12)
The conference support system according to supplementary note 11, wherein the biometric authentication device performs face authentication of the attendee by matching face image data of the attendee against face image data registered in advance.

(Supplementary note 13)
The conference support system according to any one of supplementary notes 10 to 12, further comprising an avatar generation device that generates an avatar representing the attendee in a virtual space.

(Supplementary note 14)
The conference support system according to supplementary note 13, wherein the avatar generation device changes the appearance of the avatar according to the emotional expression of the attendee.

(Supplementary note 15)
The conference support system according to any one of supplementary notes 10 to 14, further comprising a speech recognition device that performs speech recognition on voice data input from the sound collecting device.

INDUSTRIAL APPLICABILITY
The present invention can be used, for example, in a conference support system for supporting an online conference held over a network such as the Internet.

REFERENCE SIGNS LIST
1 Conference support system
10 Conference support device
11 Emotion analysis unit
12 Prediction unit
13 Question unit
20 Conference support device
30 Conference support device
34 Learning unit
110 Sound collecting device
120 Imaging device
200 Biometric authentication device
300 Speech recognition device
400 Avatar generation device

Claims

1. A conference support device comprising:
emotion analysis means for detecting an emotional expression of an attendee of a conference;
prediction means for predicting, based on the emotional expression, whether the attendee intends to speak; and
questioning means for asking the attendee for an opinion when the attendee is predicted to intend to speak.

2. The conference support device according to claim 1, wherein the emotion analysis means detects the emotional expression of the attendee from face image data corresponding to an image of the attendee's face.

3. The conference support device according to claim 1, wherein the emotional expression includes a movement of the attendee and a change in the facial expression of the attendee.

4. The conference support device according to any one of claims 1 to 3, wherein the questioning means sends out a voice message prompting the attendee to speak.

5. The conference support device according to any one of claims 1 to 4, wherein the prediction means predicts whether the attendee intends to speak by using a predictor trained by machine learning on the attendee's emotional expressions when the attendee spoke in the past.

6. The conference support device according to any one of claims 1 to 5, wherein
the questioning means analyzes utterances of the attendees by using a speech recognition device, and
after the identification name of the attendee is called, the emotion analysis means detects the emotional expression of the attendee.

7. The conference support device according to claim 5, further comprising learning means for retraining the predictor by using the outcome of the prediction of whether the attendee intends to speak.

8. A conference support method comprising:
detecting an emotional expression of an attendee of a conference;
predicting, based on the emotional expression, whether the attendee intends to speak; and
asking the attendee for an opinion when the attendee is predicted to intend to speak.

9. A program for causing a computer to execute:
a process of detecting an emotional expression of an attendee of a conference;
a process of predicting, based on the emotional expression, whether the attendee intends to speak; and
a process of asking the attendee for an opinion when the attendee is predicted to intend to speak.

10. A conference support system comprising:
the conference support device according to any one of claims 1 to 7;
an imaging device that captures the face of the attendee; and
a sound collecting device that collects the speech of the attendee.