JP2021111239A

JP2021111239A - Providing system, providing method, providing device, and computer program

Info

Publication number: JP2021111239A
Application number: JP2020003983A
Authority: JP
Inventors: 裕介本家; Yusuke Honke
Original assignee: Sumitomo Electric Industries Ltd
Current assignee: Sumitomo Electric Industries Ltd
Priority date: 2020-01-14
Filing date: 2020-01-14
Publication date: 2021-08-02

Abstract

To provide a providing system for assisting smooth communication among users. SOLUTION: A providing system comprises: a video acquisition section for acquiring a video of a second user, obtained by photographing the second user listening to the voice of a first user who is a speaker; a determination section for determining, based on the acquired video of the second user, at least one of an emotion and a concentration level of the second user; and a providing section for providing the determination result of the determination section to the first user. SELECTED DRAWING: Figure 4

Description

The present disclosure relates to a providing system, a providing method, a providing device, and a computer program.

Conventionally, electronic conference systems in which users communicate with each other via a network have been proposed (see, for example, Patent Document 1).

Patent Document 1: JP-A-2018-139652

In an electronic conference system such as that described in Patent Document 1, it is harder for participants to understand one another than in an ordinary face-to-face meeting, for reasons such as the other party's face appearing small on the screen and the difficulty of making eye contact. As a result, the productivity of the meeting suffers.

In addition, even when a listener disagrees with the speaker, it can be difficult to voice that opinion to a superior, so the discussion may fail to be constructive.

This is thought to be because the listener's emotions are not easily conveyed to the speaker in a conversation held over a network.

The present disclosure has been made in view of these circumstances, and its object is to provide a providing system, a providing method, a providing device, and a computer program that support smooth communication between users.

A providing system according to one aspect of the present disclosure includes: a video acquisition unit that acquires video of a second user, obtained by photographing the second user listening to the voice of a first user who is a speaker; a determination unit that determines at least one of an emotion and a concentration level of the second user based on the acquired video of the second user; and a providing unit that provides the determination result of the determination unit to the first user.

A providing method according to another aspect of the present disclosure includes: a step of acquiring video of a second user, obtained by photographing the second user viewing the audio and video of a first user who is a speaker; a step of determining at least one of an emotion and a concentration level of the second user based on the acquired video of the second user; and a step of providing the determination result of the determining step to the first user.

A providing device according to another aspect of the present disclosure includes: a video acquisition unit that acquires video of a second user, obtained by photographing the second user viewing the audio and video of a first user who is a speaker; a determination unit that determines at least one of an emotion and a concentration level of the second user based on the acquired video of the second user; and a providing unit that provides the determination result of the determination unit to the first user.

A computer program according to another aspect of the present disclosure causes a computer to function as: a video acquisition unit that acquires video of a second user, obtained by photographing the second user viewing the audio and video of a first user who is a speaker; a determination unit that determines at least one of an emotion and a concentration level of the second user based on the acquired video of the second user; and a providing unit that provides the determination result of the determination unit to the first user.

Needless to say, the computer program can be distributed via a computer-readable non-transitory recording medium such as a CD-ROM (Compact Disc-Read Only Memory) or via a communication network such as the Internet. The present disclosure can also be realized as a semiconductor integrated circuit that implements part or all of the providing device, or as a providing system that includes the providing device.

According to the present disclosure, smooth communication between users can be supported.

FIG. 1 is a diagram showing a schematic configuration of a providing system according to Embodiment 1 of the present disclosure.
FIG. 2 is a block diagram showing a functional configuration of a first device according to Embodiment 1 of the present disclosure.
FIG. 3 is a diagram showing an example of video displayed on a display.
FIG. 4 is a block diagram showing a functional configuration of a second device according to Embodiment 1 of the present disclosure.
FIG. 5 is a sequence diagram showing an example of a procedure by which the providing system according to Embodiment 1 of the present disclosure provides the emotion and concentration level of the first user from the first device to the second device.
FIG. 6 is a sequence diagram showing an example of a procedure by which the providing system according to Embodiment 1 of the present disclosure provides the emotion and concentration level of the second user from the second device to the first device.
FIG. 7 is a block diagram showing a functional configuration of a first device according to Embodiment 2 of the present disclosure.
FIG. 8 is a block diagram showing a functional configuration of a second device according to Embodiment 2 of the present disclosure.
FIG. 9 is a sequence diagram showing an example of a procedure by which the providing system according to Embodiment 2 of the present disclosure provides the emotion and concentration level of the first user from the first device to the second device.
FIG. 10 is a sequence diagram showing an example of a procedure by which the providing system according to Embodiment 2 of the present disclosure provides the emotion and concentration level of the second user from the second device to the first device.
FIG. 11 is a block diagram showing a functional configuration of a first device according to Embodiment 3 of the present disclosure.
FIG. 12 is a block diagram showing a functional configuration of a second device according to Embodiment 3 of the present disclosure.
FIG. 13 is a sequence diagram showing an example of a procedure by which the providing system according to Embodiment 3 of the present disclosure provides the emotion and concentration level of the first user from the first device to the second device.
FIG. 14 is a sequence diagram showing an example of a procedure by which the providing system according to Embodiment 3 of the present disclosure provides the emotion and concentration level of the second user from the second device to the first device.

[Summary of Embodiments of the present disclosure]
First, an outline of the embodiments of the present disclosure is listed and described.
(1) A providing system according to one embodiment of the present disclosure includes: a video acquisition unit that acquires video of a second user, obtained by photographing the second user listening to the voice of a first user who is a speaker; a determination unit that determines at least one of an emotion and a concentration level of the second user based on the acquired video of the second user; and a providing unit that provides the determination result of the determination unit to the first user.

With this configuration, the determination result for at least one of the emotion and concentration level of the second user, who is the listener to the first user's utterance, is provided to the first user. The first user can therefore know how the second user feels about what the first user has said, or whether the second user is listening attentively. In response, the first user can, for example, ask a question of a second user who has negative feelings about what was said, or change the topic when the second user is not concentrating. This supports smooth communication between users.

(2) Preferably, the providing system further includes an audio acquisition unit that acquires audio of the second user, and the determination unit determines at least one of the emotion and concentration level of the second user based on the acquired video and audio of the second user.

With this configuration, at least one of the emotion and concentration level of the second user can be determined with the second user's audio taken into account. The emotion or concentration level of the second user can therefore be determined more accurately than when only the second user's video is used.
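The source does not say how the video-based and audio-based determinations are combined. As a minimal sketch, assuming each modality yields a per-emotion score in [0, 1] and that a simple weighted average is acceptable, the combination might look as follows. The function name `fuse_scores` and the default weight are hypothetical and are not taken from the disclosure.

```python
from typing import Dict


def fuse_scores(video: Dict[str, float],
                audio: Dict[str, float],
                video_weight: float = 0.6) -> Dict[str, float]:
    """Combine per-emotion scores from video and audio (hypothetical scheme).

    Labels present in both inputs are averaged with `video_weight`;
    labels present in only one input are passed through unchanged.
    """
    if not 0.0 <= video_weight <= 1.0:
        raise ValueError("video_weight must be in [0, 1]")
    fused: Dict[str, float] = {}
    for label in video.keys() | audio.keys():
        if label in video and label in audio:
            fused[label] = (video_weight * video[label]
                            + (1.0 - video_weight) * audio[label])
        elif label in video:
            fused[label] = video[label]
        else:
            fused[label] = audio[label]
    return fused
```

Passing labels through when only one modality reports them keeps the sketch usable when, for example, the second user is silent and no audio score is available.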

(3) More preferably, the providing system includes a first device and a second device connected to each other via a network. The first device includes: a first acquisition unit that acquires audio and video of the first user; a first transmission unit that transmits the acquired audio and video of the first user to the second device; a first reception unit that receives audio and video of the second user from the second device; and a first output unit, serving as the providing unit, that outputs the received audio and video of the second user and the determination result of the determination unit. The second device includes: a second acquisition unit, serving as the audio acquisition unit and the video acquisition unit, that acquires the audio and video of the second user; a second transmission unit that transmits the acquired audio and video of the second user to the first device; a second reception unit that receives the audio and video of the first user from the first device; and a second output unit that outputs the received audio and video of the first user.

With this configuration, the first user and the second user can converse over the network, and the determination result for at least one of the emotion and concentration level of the second user can be provided to the first user. For example, in an electronic conference system in which the first user is the facilitator, the first user can run the meeting while keeping track of the second user's emotion or concentration level and asking the second user for opinions as appropriate. This makes the discussion constructive and enables a highly productive meeting. As with the second user, the determination unit may determine at least one of the emotion and concentration level of the first user from the first user's audio and video, and the second output unit of the second device may output that determination result. The first user and the second user can then each grasp the other's emotion or concentration level.

(4) The determination unit may be provided in the first device and may determine at least one of the emotion and concentration level of the second user based on the audio and video of the second user received by the first reception unit.

With this configuration, the first device can determine at least one of the emotion and concentration level of the second user based on the audio and video of the second user transmitted from the second device. The first device can therefore accurately synchronize the second user's audio and video with the second user's emotion or concentration level. As a result, the second user's audio and video can be provided to the first user accurately associated with the second user's emotion or concentration level.

(5) The providing system may include a first device and a second device connected to each other via a network. The first device includes: a first acquisition unit that acquires audio of the first user; a first transmission unit that transmits the acquired audio of the first user to the second device; a first reception unit that receives audio of the second user from the second device; and a first output unit, serving as the providing unit, that outputs the received audio of the second user and the determination result of the determination unit. The second device includes: a second acquisition unit, serving as the audio acquisition unit and the video acquisition unit, that acquires the audio and video of the second user; a second transmission unit that transmits the acquired audio of the second user to the first device; a second reception unit that receives the audio of the first user from the first device; and a second output unit that outputs the received audio of the first user. The determination unit is provided in the second device and determines at least one of the emotion and concentration level of the second user based on the audio and video of the second user acquired by the second acquisition unit. The second transmission unit further transmits the determination result of the determination unit, the first reception unit further receives that determination result, and the first output unit may output the determination result received by the first reception unit.

With this configuration, the second device can transmit the determination result for at least one of the emotion and concentration level of the second user to the first device together with the second user's audio, without transmitting the second user's video from the second device to the first device. The determination result can therefore be delivered to the first device while reducing the amount of data transmitted from the second device to the first device. Because no video needs to be sent from the second device to the first device, the first output unit can, for example, output to the display device an avatar of the second user whose facial expression reflects the second user's emotion, instead of the second user's video. This also protects the second user's privacy.
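As a minimal sketch of configuration (5), the data sent from the second device could carry audio and the determination result but no video, and the first device could choose an avatar expression from the strongest emotion. The class `JudgmentPacket`, its field names, and the rule of picking the highest-scoring emotion are all assumptions made for illustration; the disclosure does not specify a message format or an avatar selection rule.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class JudgmentPacket:
    """Hypothetical payload from the second device: audio plus scores, no video."""
    user_id: str
    audio_chunk: bytes
    emotions: Dict[str, float] = field(default_factory=dict)
    concentration: float = 0.0


def avatar_expression(packet: JudgmentPacket, neutral: str = "neutral") -> str:
    """Return the label of the strongest emotion, or `neutral` if none is reported."""
    if not packet.emotions:
        return neutral
    return max(packet.emotions, key=lambda label: packet.emotions[label])


if __name__ == "__main__":
    packet = JudgmentPacket("user-2", b"\x00\x01", {"joy": 0.7, "disgust": 0.1}, 0.8)
    print(avatar_expression(packet))
```

Because the packet holds only a few numeric scores alongside the audio, it is far smaller than an encoded video stream, which is the data reduction the configuration describes.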

(6) The second device may further include a speech promotion unit that prompts the second user to speak, based on the determination result for at least one of the emotion and concentration level of the second user.

With this configuration, the second user can be prompted to speak when, for example, the second user has negative feelings about what the first user has said or is not concentrating. This makes the discussion meaningful and supports smooth communication between users.
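The decision made by the speech promotion unit can be sketched as a simple threshold rule. The sketch below assumes scores in [0, 1], treats contempt and disgust as the negative emotions, and uses threshold values chosen only for illustration; none of these specifics come from the disclosure.

```python
from typing import Dict

# Emotions treated as negative in this sketch (an assumption).
NEGATIVE_EMOTIONS = ("contempt", "disgust")


def should_prompt_to_speak(emotions: Dict[str, float],
                           concentration: float,
                           negative_threshold: float = 0.6,
                           concentration_threshold: float = 0.3) -> bool:
    """Return True when the listener seems negative or inattentive."""
    strongest_negative = max(
        (emotions.get(name, 0.0) for name in NEGATIVE_EMOTIONS),
        default=0.0,
    )
    is_negative = strongest_negative >= negative_threshold
    is_distracted = concentration < concentration_threshold
    return is_negative or is_distracted
```

A caller on the second device could evaluate this once per determination cycle and show a prompt on the second user's display when it returns True.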

(7) The providing system may further include a calculation unit that calculates, based on the determination result of the determination unit, the degree of contribution of the second user to the dialogue between the first user and the second user.

With this configuration, the second user's degree of contribution to the dialogue can be calculated based on the determination result for at least one of the emotion and concentration level of the second user. For example, a higher contribution can be calculated for a second user who concentrated on the dialogue, or for a second user with low levels of contempt and disgust and high levels of joy and surprise.
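One way to read the example above is as a score that rises with concentration, joy, and surprise and falls with contempt and disgust. The following sketch averages such a score over a series of determination results. The formula, the equal weighting, and the dictionary keys are assumptions for illustration, not the calculation defined by the disclosure.

```python
from typing import Dict, List


def contribution_score(samples: List[Dict[str, float]]) -> float:
    """Average a per-sample score over one user's determination results.

    Each sample maps labels such as "concentration", "joy", "surprise",
    "contempt", and "disgust" to scores in [0, 1]; missing labels count as 0.
    """
    if not samples:
        return 0.0
    total = 0.0
    for sample in samples:
        positive = (sample.get("joy", 0.0) + sample.get("surprise", 0.0)) / 2.0
        negative = (sample.get("contempt", 0.0) + sample.get("disgust", 0.0)) / 2.0
        total += sample.get("concentration", 0.0) + positive - negative
    return total / len(samples)
```

Under this scheme an attentive, positively engaged listener scores higher than a distracted or dismissive one, which matches the direction of the example in the text.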

(8) The determination unit may further correct the determination result for at least one of the emotion and concentration level of the second user based on the history of such determination results for that second user.

With this configuration, when the determination results for emotion or concentration level are expressed as scores, the scores of a second user whose emotional swings or changes in concentration are relatively small and the scores of a second user whose swings or changes are relatively large can be normalized or standardized. This allows emotions or concentration levels to be compared accurately across second users.
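A common way to standardize scores against a per-user history is the z-score, which expresses a score as its distance from that user's mean in units of that user's standard deviation. The sketch below applies it to a list of past scores; the use of the z-score and the function name `standardize` are assumptions, since the disclosure states only that scores may be normalized or standardized.

```python
from statistics import mean, pstdev
from typing import Sequence


def standardize(score: float, history: Sequence[float]) -> float:
    """Return the z-score of `score` relative to one user's score history.

    Returns 0.0 when the history is too short or has no variation,
    since no meaningful scale can be derived in those cases.
    """
    if len(history) < 2:
        return 0.0
    sigma = pstdev(history)
    if sigma == 0.0:
        return 0.0
    return (score - mean(history)) / sigma


if __name__ == "__main__":
    calm_user = [0.4, 0.5, 0.6]        # small swings
    expressive_user = [0.1, 0.5, 0.9]  # large swings
    print(standardize(0.6, calm_user), standardize(0.6, expressive_user))
```

The same raw score of 0.6 yields a larger standardized value for the user with small swings than for the user with large swings, so a modest reaction from a normally calm user is not overlooked.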

(9) A providing method according to another embodiment of the present disclosure includes: a step of acquiring video of a second user, obtained by photographing the second user viewing the audio and video of a first user who is a speaker; a step of determining at least one of an emotion and a concentration level of the second user based on the acquired video of the second user; and a step of providing the determination result of the determining step to the first user.

This configuration includes steps corresponding to the characteristic processing units of the providing system described above. It therefore achieves the same operation and effects as the providing system described above.

(10) A providing device according to another embodiment of the present disclosure includes: a video acquisition unit that acquires video of a second user, obtained by photographing the second user viewing the audio and video of a first user who is a speaker; a determination unit that determines at least one of an emotion and a concentration level of the second user based on the acquired video of the second user; and a providing unit that provides the determination result of the determination unit to the first user.

With this configuration, the determination result for at least one of the emotion and concentration level of the second user, who is the listener to the first user's utterance, is provided to the first user. The first user can therefore know how the second user feels about what the first user has said, or whether the second user is listening attentively. In response, the first user can, for example, ask a question of a second user who has negative feelings about what was said, or change the topic when the second user is not concentrating. This supports smooth communication between users.

(11) A computer program according to another embodiment of the present disclosure causes a computer to function as: a video acquisition unit that acquires video of a second user, obtained by photographing the second user viewing the audio and video of a first user who is a speaker; a determination unit that determines at least one of an emotion and a concentration level of the second user based on the acquired video of the second user; and a providing unit that provides the determination result of the determination unit to the first user.

With this configuration, a computer can be made to function as the providing device described above. The same operation and effects as the providing device described above can therefore be achieved.

[Details of Embodiments of the present disclosure]
Hereinafter, embodiments of the present disclosure are described with reference to the drawings. Each embodiment described below shows one specific example of the present disclosure. The numerical values, shapes, materials, components, arrangement positions and connection forms of components, steps, order of steps, and the like shown in the following embodiments are examples and do not limit the present disclosure. Among the components in the following embodiments, those not recited in the independent claims are components that may be added optionally. Each figure is a schematic diagram and is not necessarily drawn precisely.

The same components are given the same reference numerals. Because their functions and names are also the same, their description is omitted where appropriate.

<Embodiment 1>
[Overall configuration of the providing system]
FIG. 1 is a diagram showing a schematic configuration of a providing system according to Embodiment 1 of the present disclosure. The providing system 1 includes a first device 2, a second device 4, and an emotion/concentration database (hereinafter, "emotion/concentration DB") 5, which are connected to one another via a network 3.

The first device 2, for example, acquires video data (hereinafter, "video") and audio data (hereinafter, "audio") of one or more first users at a first site, and determines, for each first user, at least one of the emotion and concentration level of that first user based on the acquired video and audio. The first device 2 provides the video and audio of the first user, together with at least one of the emotion and concentration level of the first user, to the second device 4 by transmitting them to the second device 4. The first device 2 is installed, for example, at a first site that is one office of a company.

The second device 4 receives the above data transmitted by the first device 2. The second device 4 displays the received video of the first user on a display and outputs the received audio of the first user from a speaker. The second device 4 also displays at least one of the received emotion and concentration level of the first user on the display. The display and the speaker may be built into the second device 4 or may be connected to it by wire or wirelessly.

The second device 4, for example, acquires video and audio of one or more second users at a second site, and determines, for each second user, at least one of the emotion and concentration level of that second user based on the acquired video and audio. The second device 4 provides the video and audio of the second user, together with at least one of the emotion and concentration level of the second user, to the first device 2 by transmitting them to the first device 2. The second device 4 is installed, for example, at a second site that is another office of the same company.

The first device 2 receives the above data transmitted by the second device 4. The first device 2 displays the received video of the second user on a display and outputs the received audio of the second user from a speaker. The first device 2 also displays at least one of the received emotion and concentration level of the second user on the display. The display and the speaker may be built into the first device 2 or may be externally connected.

The emotion/concentration DB 5 stores a history of the per-user determination results for emotion or concentration level produced by the first device 2 and the second device 4.

[Configuration of the first device 2]
FIG. 2 is a block diagram showing a functional configuration of the first device 2 according to Embodiment 1 of the present disclosure.

The first device 2 includes a video acquisition unit 21, a video coding unit 22, a video analysis unit 23, an audio acquisition unit 24, an audio coding unit 25, an audio analysis unit 26, a multiplexing unit 27, an emotion/concentration determination unit 28, a first transmission unit 29, a first reception unit 30, a separation unit 31, a video decoding unit 32, an audio decoding unit 33, an emotion/concentration processing unit 34, and a display/output unit 35.

The first device 2 can be realized by a general-purpose computer including a CPU (Central Processing Unit), a ROM (Read Only Memory), a RAM (Random Access Memory), an HDD (Hard Disk Drive), a communication interface, an input/output interface, and the like. For example, the processing units 21 to 35 are functionally realized by loading a computer program recorded on the HDD into the RAM and executing it on the CPU. However, some or all of the processing units 21 to 35 may be realized by hardware such as an LSI (Large Scale Integration), an ASIC (Application Specific Integrated Circuit), or an FPGA (Field-Programmable Gate Array).

映像取得部２１は、第１取得部として機能し、第１装置２に内蔵又は有線もしくは無線により接続されたカメラからカメラが撮影した映像を取得する。映像取得部２１は、取得した映像を、映像符号化部２２及び映像解析部２３に出力する。 The image acquisition unit 21 functions as a first acquisition unit, and acquires images taken by the camera from a camera built in the first device 2 or connected by wire or wirelessly. The video acquisition unit 21 outputs the acquired video to the video coding unit 22 and the video analysis unit 23.

The video coding unit 22 receives the video from the video acquisition unit 21 and encodes the video according to a predetermined coding method. For example, when the video is 4K video or 8K video, the video coding unit 22 encodes the video according to H.265/HEVC (High Efficiency Video Coding). The video coding unit 22 outputs the encoded video to the multiplexing unit 27.

映像解析部２３は、映像取得部２１から映像を受け、映像に映っているユーザ（以下、「第１ユーザ」という）と第１ユーザの映像中の位置とを特定する。例えば、映像解析部２３は、顔認識アルゴリズムを用いて映像中の第１ユーザを特定する。ただし、第１ユーザを識別するための情報（例えば、ユーザ名）と映像中の第１ユーザの位置を第１装置２の操作者が外部入力により指定するものであってもよい。 The video analysis unit 23 receives the video from the video acquisition unit 21 and identifies the user (hereinafter, referred to as “first user”) shown in the video and the position of the first user in the video. For example, the image analysis unit 23 identifies the first user in the image by using the face recognition algorithm. However, the information for identifying the first user (for example, the user name) and the position of the first user in the video may be specified by the operator of the first device 2 by external input.

The video analysis unit 23 determines the emotion and the degree of concentration of the first user by analyzing the video. That is, based on the input video, the video analysis unit 23 calculates, for each type of emotion, an emotion score that quantifies the degree of that emotion. The types of emotion analyzed by the video analysis unit 23 are, for example, anger, contempt, disgust, surprise, fear, joy, sadness, happiness, and discomfort. The video analysis unit 23 uses a classifier provided for each type of emotion and calculates the emotion score of the first user by inputting the video into that classifier.

As the classifier, for example, a multi-layer neural network that receives video as input and outputs an emotion score can be used. This multi-layer neural network is constructed, for example, by learning its parameters through machine learning such as deep learning, using user video and emotion scores as training data. The classifier is not limited to a multi-layer neural network; other classifiers such as a linear regression model, a logistic regression model, a support vector machine, a random forest, AdaBoost, naive Bayes, or the k-nearest neighbor method can also be used.

In addition, based on the input video, the video analysis unit 23 calculates a degree of concentration, which quantifies how concentrated the user is. That is, the video analysis unit 23 calculates the degree of concentration of the first user by inputting the video into a classifier.

As the classifier, for example, a multi-layer neural network that receives video as input and outputs a degree of concentration can be used. This multi-layer neural network is constructed, for example, by learning its parameters through machine learning such as deep learning, using user video and degrees of concentration as training data. The classifier is not limited to a multi-layer neural network; other classifiers such as a linear regression model, a logistic regression model, a support vector machine, a random forest, AdaBoost, naive Bayes, or the k-nearest neighbor method can also be used.

なお、映像解析部２３は、映像中に複数の第１ユーザが含まれる場合には、第１ユーザごとに感情スコア及び集中度を算出する。 When a plurality of first users are included in the video, the video analysis unit 23 calculates the emotion score and the degree of concentration for each first user.

映像解析部２３は、第１ユーザを識別するための情報及び第１ユーザの映像中の位置と、算出した第１ユーザの感情の種類ごとの感情スコア及び集中度とを感情・集中力判断部２８に出力する。 The video analysis unit 23 determines the emotion / concentration determination unit for the information for identifying the first user, the position in the video of the first user, and the calculated emotion score and concentration for each type of emotion of the first user. Output to 28.

音声取得部２４は、第１取得部として機能し、第１装置２に内蔵又は有線もしくは無線により接続されたマイクから第１ユーザの音声を取得する。音声取得部２４は、取得した第１ユーザの音声を、音声符号化部２５及び音声解析部２６に出力する。 The voice acquisition unit 24 functions as a first acquisition unit, and acquires the voice of the first user from a microphone built in the first device 2 or connected by wire or wirelessly. The voice acquisition unit 24 outputs the acquired voice of the first user to the voice coding unit 25 and the voice analysis unit 26.

音声符号化部２５は、音声取得部２４から音声を受け、当該音声を所定の符号化方法に従い符号化する。例えば、音声符号化部２５は、ＭＰＥＧ−４ＡＡＣに従い音声を符号化する。音声符号化部２５は、符号化済み音声を多重化部２７に出力する。 The voice coding unit 25 receives the voice from the voice acquisition unit 24 and encodes the voice according to a predetermined coding method. For example, the voice coding unit 25 encodes voice according to MPEG-4 AAC. The voice coding unit 25 outputs the coded voice to the multiplexing unit 27.

音声解析部２６は、音声取得部２４から音声を受け、音声を発している第１ユーザを特定する。第１ユーザの特定は、例えば、事前に登録された音声データに基づき、話者を識別することにより行ってもよい。音声解析部２６は、例えば、音声から話者の声紋を分析し、隠れマルコフモデル、ニューラルネットワーク、決定木などの識別手法を用いて話者を特定する。ただし、第１ユーザが発話する際に、第１ユーザを識別するための情報を第１ユーザ又は第１装置２の操作者が外部入力するものであってもよい。 The voice analysis unit 26 receives a voice from the voice acquisition unit 24 and identifies the first user who is emitting the voice. The first user may be specified, for example, by identifying the speaker based on the voice data registered in advance. The voice analysis unit 26 analyzes the voiceprint of the speaker from the voice, and identifies the speaker by using an identification method such as a hidden Markov model, a neural network, or a decision tree. However, when the first user speaks, the information for identifying the first user may be externally input by the first user or the operator of the first device 2.

音声解析部２６は、当該音声を解析することにより第１ユーザの感情及び集中度を判断する。つまり、音声解析部２６は、入力音声に基づいて、感情の種類ごとに、感情スコアを算出する。音声解析部２６が解析対象とする感情の種類は、映像解析部２３が解析対象とする感情の種類と同様である。音声解析部２６は、感情の種類ごとに設けられた識別器を用いて、当該識別器に音声を入力することにより感情スコアを算出する。 The voice analysis unit 26 determines the emotion and concentration of the first user by analyzing the voice. That is, the voice analysis unit 26 calculates the emotion score for each type of emotion based on the input voice. The type of emotion to be analyzed by the voice analysis unit 26 is the same as the type of emotion to be analyzed by the video analysis unit 23. The voice analysis unit 26 calculates an emotion score by inputting voice into the classifier using a classifier provided for each type of emotion.

As the classifier, for example, a multi-layer neural network that receives voice as input and outputs an emotion score can be used. This multi-layer neural network is constructed, for example, by learning its parameters through machine learning such as deep learning, using user voice and emotion scores as training data. The classifier is not limited to a multi-layer neural network; other classifiers such as a linear regression model, a logistic regression model, a support vector machine, a random forest, AdaBoost, naive Bayes, or the k-nearest neighbor method can also be used.

また、音声解析部２６は、入力音声に基づいて、第１ユーザの集中度を算出する。 In addition, the voice analysis unit 26 calculates the degree of concentration of the first user based on the input voice.

As the classifier, for example, a multi-layer neural network that receives voice as input and outputs a degree of concentration can be used. This multi-layer neural network is constructed, for example, by learning its parameters through machine learning such as deep learning, using user voice and degrees of concentration as training data. The classifier is not limited to a multi-layer neural network; other classifiers such as a linear regression model, a logistic regression model, a support vector machine, a random forest, AdaBoost, naive Bayes, or the k-nearest neighbor method can also be used.

なお、音声解析部２６は、音声中に複数の第１ユーザが含まれる場合には、第１ユーザごとに感情スコア及び集中度を算出する。 When a plurality of first users are included in the voice, the voice analysis unit 26 calculates the emotion score and the degree of concentration for each first user.

音声解析部２６は、第１ユーザを識別するための情報と、算出した第１ユーザの感情の種類ごとの感情スコアと、集中度とを感情・集中力判断部２８に出力する。 The voice analysis unit 26 outputs the information for identifying the first user, the calculated emotion score for each type of emotion of the first user, and the degree of concentration to the emotion / concentration determination unit 28.

多重化部２７は、映像符号化部２２及び音声符号化部２５から符号化済み映像及び符号化済み音声をそれぞれ受け、符号化済み映像及び符号化済み音声を多重化することにより、多重化データを生成する。例えば、多重化部２７は、ＭＰＥＧ−ＨＭＭＴ（MPEG Media Transport）に従って多重化を行う。多重化部２７は、生成した多重化データを第１送信部２９に出力する。 The multiplexing unit 27 receives the encoded video and the encoded audio from the video coding unit 22 and the audio coding unit 25, respectively, and multiplexes the encoded video and the encoded audio to obtain the multiplexed data. To generate. For example, the multiplexing unit 27 performs multiplexing according to MPEG-H MMT (MPEG Media Transport). The multiplexing unit 27 outputs the generated multiplexed data to the first transmission unit 29.

The emotion / concentration determination unit 28 receives, from the video analysis unit 23, the information for identifying the first user, the position of the first user in the video, the emotion score for each type of emotion of the first user, and the degree of concentration of the first user. The emotion / concentration determination unit 28 also receives, from the voice analysis unit 26, the information for identifying the first user, the emotion score for each type of emotion of the first user, and the degree of concentration of the first user.

The emotion / concentration determination unit 28 determines the emotion of the first user based on the emotion scores for each type of emotion received from the video analysis unit 23 and the voice analysis unit 26. For example, for each type of emotion, the emotion / concentration determination unit 28 calculates the emotion score of that type by simple addition or weighted addition of the first user's emotion score for that type received from the video analysis unit 23 and the first user's emotion score for that type received from the voice analysis unit 26. The weights of the weighted addition may be set in advance or may be varied according to the two emotion scores.
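The weighted addition described above can be illustrated with a minimal Python sketch. The function name `fuse_scores`, the emotion labels, and the weight values are assumptions made for illustration; the disclosure does not prescribe a particular weighting or score range.

```python
from typing import Dict


def fuse_scores(video: Dict[str, float], audio: Dict[str, float],
                w_video: float = 0.5, w_audio: float = 0.5) -> Dict[str, float]:
    """Combine video-based and voice-based scores per emotion type.

    The weights are hypothetical; w_video = w_audio = 1.0 corresponds to
    simple addition.
    """
    fused = {}
    # Only emotion types scored by both analysis units are combined.
    for emotion in video.keys() & audio.keys():
        fused[emotion] = w_video * video[emotion] + w_audio * audio[emotion]
    return fused


video_scores = {"happiness": 90.0, "discomfort": 10.0}
audio_scores = {"happiness": 70.0, "discomfort": 30.0}
fused = fuse_scores(video_scores, audio_scores)
```

With equal weights of 0.5 the result is the mean of the two scores, which keeps the fused score in the same range as its inputs; simple addition would instead double the range.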

The method of calculating the emotion score is not limited to this. For example, the emotion / concentration determination unit 28 may calculate the emotion score of the first user using a classifier provided for each type of emotion. Specifically, the emotion / concentration determination unit 28 inputs, into the classifier for each type of emotion, the first user's emotion score for that type received from the video analysis unit 23 and the first user's emotion score for that type received from the voice analysis unit 26, and thereby calculates the first user's emotion score for that type of emotion.

As the classifier, for example, a multi-layer neural network that receives the emotion score calculated from the video and the emotion score calculated from the voice as inputs and outputs an emotion score can be used. This multi-layer neural network is constructed, for example, by learning its parameters through machine learning such as deep learning, using as training data the emotion score calculated from the video, the emotion score calculated from the voice, and an emotion score judged by the designer of the neural network. The classifier is not limited to a multi-layer neural network; other classifiers such as a linear regression model, a logistic regression model, a support vector machine, a random forest, AdaBoost, naive Bayes, or the k-nearest neighbor method can also be used.

In addition, the emotion / concentration determination unit 28 determines the degree of concentration of the first user based on the degrees of concentration of the first user received from the video analysis unit 23 and the voice analysis unit 26. For example, the emotion / concentration determination unit 28 calculates the degree of concentration of the first user by simple addition or weighted addition of the degree of concentration received from the video analysis unit 23 and the degree of concentration received from the voice analysis unit 26. The weights of the weighted addition may be set in advance or may be varied according to the two degrees of concentration.

なお、集中度の算出方法はこれに限定されるものではない。例えば、感情・集中力判断部２８は、識別器を用いて、第１ユーザの集中度を算出してもよい。具体的には、感情・集中力判断部２８は、識別器に映像解析部２３から受けた第１ユーザの集中度と、音声解析部２６から受けた第１ユーザの集中度とを入力することにより、第１ユーザの集中度を算出する。 The method of calculating the degree of concentration is not limited to this. For example, the emotion / concentration determination unit 28 may calculate the concentration level of the first user by using the discriminator. Specifically, the emotion / concentration determination unit 28 inputs the concentration level of the first user received from the video analysis unit 23 and the concentration level of the first user received from the voice analysis unit 26 into the classifier. To calculate the concentration of the first user.

As the classifier, for example, a multi-layer neural network that receives the degree of concentration calculated from the video and the degree of concentration calculated from the voice as inputs and outputs a degree of concentration can be used. This multi-layer neural network is constructed, for example, by learning its parameters through machine learning such as deep learning, using as training data the degree of concentration calculated from the video, the degree of concentration calculated from the voice, and a degree of concentration judged by the designer of the neural network. The classifier is not limited to a multi-layer neural network; other classifiers such as a linear regression model, a logistic regression model, a support vector machine, a random forest, AdaBoost, naive Bayes, or the k-nearest neighbor method can also be used.

なお、感情・集中力判断部２８は、第１ユーザが複数いる場合には、第１ユーザごとに感情スコア及び集中度を算出する。 When there are a plurality of first users, the emotion / concentration determination unit 28 calculates the emotion score and the degree of concentration for each first user.

The emotion / concentration determination unit 28 writes the calculated emotion score of the first user for each type of emotion and the degree of concentration of the first user to the emotion / concentration DB 5 in association with the identifier of the first user and the calculation time. The identifier of the first user includes information for identifying the first user (for example, a user name) and information on the position of the first user in the video.
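One possible shape for such a history record is sketched below in Python. The class names `JudgmentRecord` and `HistoryStore`, the field layout, and the in-memory list standing in for the emotion / concentration DB 5 are all assumptions for illustration; the disclosure does not specify a schema or a storage engine.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


@dataclass
class JudgmentRecord:
    user_name: str                    # information identifying the user
    position: Tuple[int, int]         # hypothetical (x, y) position in the video
    computed_at: datetime             # calculation time
    emotion_scores: Dict[str, float]  # one score per emotion type
    concentration: float              # degree of concentration


class HistoryStore:
    """In-memory stand-in for the history database (illustrative only)."""

    def __init__(self) -> None:
        self._records: List[JudgmentRecord] = []

    def write(self, record: JudgmentRecord) -> None:
        self._records.append(record)

    def recent_emotion_scores(self, user_name: str, emotion: str,
                              now: datetime, window: timedelta) -> List[float]:
        """Return one user's scores for one emotion within the past window."""
        start = now - window
        return [r.emotion_scores[emotion] for r in self._records
                if r.user_name == user_name
                and start <= r.computed_at <= now
                and emotion in r.emotion_scores]


store = HistoryStore()
t0 = datetime(2020, 1, 14, 14, 0, 0)
store.write(JudgmentRecord("user_a", (120, 40), t0, {"happiness": 60.0}, 0.7))
store.write(JudgmentRecord("user_a", (120, 40), t0 + timedelta(minutes=10),
                           {"happiness": 85.0}, 0.8))
recent = store.recent_emotion_scores("user_a", "happiness",
                                     now=t0 + timedelta(minutes=12),
                                     window=timedelta(minutes=5))
```

Keying each record by user and calculation time is what allows scores from a past fixed period to be read back per user and per emotion type.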

なお、感情・集中力判断部２８は、第１ユーザの感情スコアの履歴に基づいて、算出した第１ユーザの感情スコアを補正してもよい。例えば、感情・集中力判断部２８は、感情の種類ごとに、過去一定期間の第１ユーザの感情スコアを感情・集中力ＤＢ５から読み出し、読み出した感情スコアに基づいて、感情スコアの標準偏差及び平均を算出する。感情・集中力判断部２８は、以下の式１に従い、感情の種類ごとに、算出した第１ユーザの感情スコアを、算出した感情スコアの標準偏差及び平均を用いて標準化する。これにより、第１ユーザ間で感情スコアを標準化することができる。 The emotion / concentration determination unit 28 may correct the calculated emotion score of the first user based on the history of the emotion score of the first user. For example, the emotion / concentration determination unit 28 reads the emotion score of the first user for a certain period in the past from the emotion / concentration DB 5 for each type of emotion, and based on the read emotion score, the standard deviation of the emotion score and the standard deviation of the emotion score. Calculate the average. The emotion / concentration determination unit 28 standardizes the calculated emotion score of the first user for each type of emotion using the standard deviation and average of the calculated emotion score according to the following equation 1. Thereby, the emotion score can be standardized among the first users.

Standardized emotion score = (calculated emotion score - average of emotion scores) / standard deviation of emotion scores ... (Equation 1)
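As a minimal sketch of Equation 1, the following Python function standardizes a newly calculated score against one user's past scores. The function name `standardize`, the use of the population standard deviation, and the fallback of 0.0 when all past scores are identical are assumptions; the disclosure does not state which standard deviation is meant or how a zero deviation is handled.

```python
from statistics import mean, pstdev
from typing import Sequence


def standardize(score: float, history: Sequence[float]) -> float:
    """Apply (score - average) / standard deviation using past scores."""
    mu = mean(history)
    sigma = pstdev(history)  # population standard deviation (assumption)
    if sigma == 0:
        # All past scores are identical; avoid division by zero (assumption).
        return 0.0
    return (score - mu) / sigma


past_scores = [40.0, 50.0, 60.0]
z = standardize(60.0, past_scores)
```

Because the average and standard deviation are taken from each user's own history, a user who habitually scores high and a user who habitually scores low are mapped onto a comparable scale.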

また、感情・集中力判断部２８は、感情スコアの標準化の代わりに、感情スコアの正規化を行ってもよい。例えば、感情・集中力判断部２８は、感情の種類ごとに、過去一定期間の第１ユーザの感情スコアを感情・集中力ＤＢ５から読み出し、読み出した感情スコアに基づいて、感情スコアの最大値及び最小値を算出する。感情・集中力判断部２８は、以下の式２に従い、感情の種類ごとに、算出した第１ユーザの感情スコアを、算出した感情スコアの最大値及び最小値を用いて正規化する。これにより、第１ユーザ間で感情スコアを正規化することができる。 Further, the emotion / concentration determination unit 28 may normalize the emotion score instead of standardizing the emotion score. For example, the emotion / concentration determination unit 28 reads the emotion score of the first user for a certain period in the past from the emotion / concentration DB5 for each type of emotion, and based on the read emotion score, the maximum value of the emotion score and Calculate the minimum value. The emotion / concentration determination unit 28 normalizes the calculated emotion score of the first user for each type of emotion using the calculated maximum and minimum values of the emotion score according to the following equation 2. This makes it possible to normalize the emotional score among the first users.

Normalized emotion score = (emotion score - minimum emotion score) / (maximum emotion score - minimum emotion score) ... (Equation 2)
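Equation 2 can likewise be sketched in Python. The function name `min_max_normalize` and the fallback of 0.0 when the maximum equals the minimum are assumptions made for illustration.

```python
from typing import Sequence


def min_max_normalize(score: float, history: Sequence[float]) -> float:
    """Apply (score - min) / (max - min) using past scores."""
    lowest, highest = min(history), max(history)
    if highest == lowest:
        # Degenerate history; avoid division by zero (assumption).
        return 0.0
    return (score - lowest) / (highest - lowest)


normalized = min_max_normalize(50.0, [20.0, 80.0])
```

Note that if the maximum and minimum are taken only from past scores, a new score outside that range yields a value below 0 or above 1; whether to clamp such values is left open here.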

また、感情・集中力判断部２８は、第１ユーザの集中度の履歴に基づいて、算出した第１ユーザの集中度を補正してもよい。例えば、感情・集中力判断部２８は、過去一定期間の第１ユーザの集中度を感情・集中力ＤＢ５から読み出し、読み出した集中度に基づいて、集中度の標準偏差及び平均を算出する。感情・集中力判断部２８は、以下の式３に従い、算出した第１ユーザの集中度を、算出した集中度の標準偏差及び平均を用いて標準化する。これにより、第１ユーザ間で集中度を標準化することができる。 In addition, the emotion / concentration determination unit 28 may correct the calculated concentration level of the first user based on the history of the concentration level of the first user. For example, the emotion / concentration determination unit 28 reads the concentration of the first user for a certain period in the past from the emotion / concentration DB 5, and calculates the standard deviation and the average of the concentration based on the read concentration. The emotion / concentration determination unit 28 standardizes the calculated concentration of the first user according to the following equation 3 using the standard deviation and the average of the calculated concentration. As a result, the degree of concentration can be standardized among the first users.

Standardized degree of concentration = (calculated degree of concentration - average degree of concentration) / standard deviation of degree of concentration ... (Equation 3)

Further, the emotion / concentration determination unit 28 may normalize the degree of concentration instead of standardizing it. For example, the emotion / concentration determination unit 28 reads the degrees of concentration of the first user for a certain past period from the emotion / concentration DB 5 and, based on the read values, calculates the maximum and minimum values of the degree of concentration. The emotion / concentration determination unit 28 then normalizes the calculated degree of concentration of the first user according to Equation 4 below, using the calculated maximum and minimum values. As a result, the degree of concentration can be normalized among the first users.

Normalized degree of concentration = (degree of concentration - minimum degree of concentration) / (maximum degree of concentration - minimum degree of concentration) ... (Equation 4)

感情・集中力判断部２８は、算出した感情の種類ごとの第１ユーザの感情スコアと、第１ユーザの集中度とを、第１ユーザの識別子及び算出時刻と合わせて第１送信部２９に出力する。 The emotion / concentration determination unit 28 sends the calculated emotion score of the first user for each type of emotion and the degree of concentration of the first user to the first transmission unit 29 together with the identifier of the first user and the calculated time. Output.

The first transmission unit 29 receives the multiplexed data from the multiplexing unit 27, and receives from the emotion / concentration determination unit 28 the emotion score of the first user for each type of emotion and the degree of concentration of the first user, to which the identifier of the first user and the calculation time have been added. The first transmission unit 29 transmits the received data to the second device 4.

The first receiving unit 30 receives, from the second device 4, multiplexed data in which encoded video and encoded voice are multiplexed, the emotion score of the second user for each type of emotion, and the degree of concentration of the second user. The identifier of the second user and the calculation time of the second user's emotion score and degree of concentration are added to these data. The first receiving unit 30 outputs the set of data received from the second device 4 to the separation unit 31. The identifier of the second user includes information for identifying the second user (for example, a user name) and information on the position of the second user in the video.

分離部３１は、第１受信部３０からデータセットを受け、データセットを分離する。つまり、分離部３１は、データセットに含まれる多重化データを符号化済み映像および符号化済み音声に分離し、分離した符号化済み映像および符号化済み音声を映像復号化部３２及び音声復号化部３３にそれぞれ出力する。また、分離部３１は、データセットから第２ユーザの識別子及び算出時刻が付加された感情の種類ごとの感情スコアと集中度とを分離し、分離したこれらのデータを感情・集中力処理部３４に出力する。 The separation unit 31 receives the data set from the first receiving unit 30 and separates the data set. That is, the separation unit 31 separates the multiplexed data included in the data set into the encoded video and the encoded audio, and separates the separated encoded video and the encoded audio into the video decoding unit 32 and the audio decoding. Output to each unit 33. Further, the separation unit 31 separates the emotion score and the degree of concentration for each type of emotion to which the second user's identifier and the calculated time are added from the data set, and separates these separated data into the emotion / concentration processing unit 34. Output to.

The video decoding unit 32 receives the encoded video from the separation unit 31 and decodes the video according to a predetermined decoding method. The decoding method corresponds to the video coding method used in the second device 4. For example, when the video is 4K video or 8K video and the second device 4 has encoded the video according to H.265/HEVC, the video decoding unit 32 decodes the encoded video according to H.265/HEVC. The video decoding unit 32 outputs the decoded video to the emotion / concentration processing unit 34 and the display / output unit 35.

音声復号化部３３は、分離部３１から符号化済み音声を受け、当該音声を所定の復号化方法に従い復号化する。復号化方法は、第２装置４における音声の符号化方法に対応する方法とする。例えば、第２装置４がＭＰＥＧ−４ＡＡＣに従い音声を符号化した場合には、音声復号化部３３は、ＭＰＥＧ−４ＡＡＣに従い音声を復号化する。音声復号化部３３は、復号化した音声を表示・出力部３５に出力する。 The voice decoding unit 33 receives the encoded voice from the separation unit 31 and decodes the voice according to a predetermined decoding method. The decoding method is a method corresponding to the voice coding method in the second device 4. For example, when the second device 4 encodes the voice according to the MPEG-4 AAC, the voice decoding unit 33 decodes the voice according to the MPEG-4 AAC. The voice decoding unit 33 outputs the decoded voice to the display / output unit 35.

感情・集中力処理部３４は、分離部３１から第２ユーザの識別子及び計測時刻と、第２ユーザの感情の種類ごとの感情スコア及び集中度とを受ける。また、感情・集中力処理部３４は、映像復号化部３２から映像を受ける。 The emotion / concentration processing unit 34 receives the identifier and measurement time of the second user from the separation unit 31 and the emotion score and concentration level for each type of emotion of the second user. Further, the emotion / concentration processing unit 34 receives an image from the image decoding unit 32.

感情・集中力処理部３４は、これらのデータから、ディスプレイに表示するための表示用データを作成する。例えば、感情・集中力処理部３４は、感情の種類ごとに感情スコアを所定の閾値で閾値処理することにより、感情に対応した表示用のアイコンの表示用データを作成する。例えば、感情・集中力処理部３４は、幸せな感情についての感情スコアが８０以上である第２ユーザに対して、当該第２ユーザの映像中の位置の近傍に幸せな感情に対応したアイコンを表示させるための表示用データを作成する。また、感情・集中力処理部３４は、不快な感情についての感情スコアが８０以上である第２ユーザに対して、当該第２ユーザの映像中の位置の近傍に不快な感情に対応したアイコンを表示させるための表示用データを作成する。 The emotion / concentration processing unit 34 creates display data for display on the display from these data. For example, the emotion / concentration processing unit 34 creates display data of a display icon corresponding to an emotion by performing threshold processing of an emotion score for each type of emotion with a predetermined threshold value. For example, the emotion / concentration processing unit 34 displays an icon corresponding to the happy emotion in the vicinity of the position in the video of the second user for the second user having an emotion score of 80 or more for the happy emotion. Create display data for display. In addition, the emotion / concentration processing unit 34 displays an icon corresponding to the unpleasant emotion in the vicinity of the position in the video of the second user for the second user having an emotion score of 80 or more for the unpleasant emotion. Create display data for display.
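The threshold processing described above might look like the following Python sketch. The threshold of 80 is taken from the example in the text; the function name `icons_for_user`, the emotion labels, the dictionary layout of the display data, and the (x, y) position format are assumptions for illustration.

```python
from typing import Dict, List, Tuple

THRESHOLD = 80.0  # example value from the text; other thresholds are possible


def icons_for_user(scores: Dict[str, float], position: Tuple[int, int],
                   threshold: float = THRESHOLD) -> List[dict]:
    """Return one icon entry per emotion whose score is at or above the threshold.

    Each entry carries the user's position so the icon can be drawn nearby.
    """
    return [{"emotion": emotion, "position": position}
            for emotion, score in sorted(scores.items())
            if score >= threshold]


display_data = icons_for_user({"happy": 85.0, "unpleasant": 12.0}, (120, 40))
```

A user with no score at or above the threshold produces an empty list, so no icon is shown for that user.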

In addition, the emotion / concentration processing unit 34, for example, cuts out the image of the second user from the video and creates display data for displaying, next to the cut-out image, the measurement time of the second user's emotion and degree of concentration together with the detected emotion and degree of concentration.

感情・集中力処理部３４は、作成した表示用データを表示・出力部３５に出力する。 The emotion / concentration processing unit 34 outputs the created display data to the display / output unit 35.

表示・出力部３５は、提供部及び第１出力部として機能し、音声復号化部３３から音声を受け、音声をスピーカーから出力する。 The display / output unit 35 functions as a providing unit and a first output unit, receives audio from the audio decoding unit 33, and outputs the audio from the speaker.

In addition, the display / output unit 35 receives the video from the video decoding unit 32 and the display data from the emotion / concentration processing unit 34, superimposes the display data on the video, and displays the superimposed video on the display.

図３は、ディスプレイに表示される映像の一例を示す図である。 FIG. 3 is a diagram showing an example of an image displayed on the display.

The displayed video includes a video display area 60 and an emotion history notification area 61. The video received from the video decoding unit 32 is displayed in the video display area 60. Here, the users 71A to 71C, who are second users, are displayed. In the vicinity of the users 71A to 71C, the icons 72A to 72C indicated by the display data received from the emotion / concentration processing unit 34 are displayed, respectively. The icons 72A and 72C correspond to the happy emotion, and the icon 72B corresponds to the unpleasant emotion. That is, the display shows that the emotion scores of the users 71A and 71C for the happy emotion are 80 or more, and that the emotion score of the user 71B for the unpleasant emotion is 80 or more.

感情履歴通知領域６１には、映像から切り出されたユーザ７１Ａ〜７１Ｃの映像が表示されている。また、その隣には、第２ユーザごとに判断結果７３Ａ〜７３Ｃが表示されている。判断結果７３Ａ〜７３Ｃは、ユーザ７１Ａ〜７１Ｃから検出された感情及び感情の計測時刻と、集中度とがそれぞれ示されている。例えば、判断結果７３Ａは、１４：１０：２５にユーザ７１Ａの幸せな感情についての感情スコアが８０以上になったことと、その時の集中度が８０％であることとを示している。また、判断結果７３Ｂは、１４：０８：１０にユーザ７１Ｂの不快な感情についての感情スコアが８０以上になったことと、その時の集中度が６０％であることとを示している。さらに、判断結果７３Ｃは、１４：０７：５０にユーザ７１Ｃの幸せな感情についての感情スコアが８０以上になったことと、その時の集中度が９０％であることとを示している。なお、判断結果７３Ａ〜７３Ｃは、計測時刻の集中度ではなく、現在時刻の集中度を示してもよい。 In the emotion history notification area 61, the images of the users 71A to 71C cut out from the video are displayed. Next to each image, the determination results 73A to 73C are displayed for the respective second users. The determination results 73A to 73C each indicate the emotion detected from the corresponding user 71A to 71C, the measurement time of that emotion, and the degree of concentration. For example, the determination result 73A indicates that the emotion score for the happy emotion of the user 71A reached 80 or more at 14:10:25 and that the degree of concentration at that time was 80%. The determination result 73B indicates that the emotion score for the unpleasant emotion of the user 71B reached 80 or more at 14:08:10 and that the degree of concentration at that time was 60%. Further, the determination result 73C indicates that the emotion score for the happy emotion of the user 71C reached 80 or more at 14:07:50 and that the degree of concentration at that time was 90%. The determination results 73A to 73C may indicate the degree of concentration at the current time instead of the degree of concentration at the measurement time.

〔第２装置４の構成〕 [Configuration of the second device 4]
図４は、本開示の実施形態１に係る第２装置４の機能的構成を示すブロック図である。第２装置４の構成は、第１装置２の構成と対をなす。 FIG. 4 is a block diagram showing a functional configuration of the second device 4 according to the first embodiment of the present disclosure. The configuration of the second device 4 is paired with the configuration of the first device 2.

第２装置４は、映像取得部４１と、映像符号化部４２と、映像解析部４３と、音声取得部４４と、音声符号化部４５と、音声解析部４６と、多重化部４７と、感情・集中力判断部４８と、第２送信部４９と、第２受信部５０と、分離部５１と、映像復号化部５２と、音声復号化部５３と、感情・集中力処理部５４と、表示・出力部５５とを備える。 The second device 4 includes a video acquisition unit 41, a video coding unit 42, a video analysis unit 43, a voice acquisition unit 44, a voice coding unit 45, a voice analysis unit 46, a multiplexing unit 47, an emotion / concentration determination unit 48, a second transmission unit 49, a second receiving unit 50, a separation unit 51, a video decoding unit 52, a voice decoding unit 53, an emotion / concentration processing unit 54, and a display / output unit 55.

第２装置４は、ＣＰＵ、ＲＯＭ、ＲＡＭ、ＨＤＤ、通信インタフェース、入出力インタフェース等を備える一般的なコンピュータにより実現することができる。例えば、ＨＤＤに記録されたコンピュータプログラムをＲＡＭ上に展開し、ＣＰＵ上で実行することにより、各処理部４１〜５５は機能的に実現される。ただし、各処理部４１〜５５の一部又は全部がＬＳＩ、ＡＳＩＣ、ＦＰＧＡ等のハードウェアにより実現されていてもよい。 The second device 4 can be realized by a general computer including a CPU, ROM, RAM, HDD, communication interface, input / output interface, and the like. For example, each of the processing units 41 to 55 is functionally realized by loading the computer program recorded on the HDD into the RAM and executing it on the CPU. However, some or all of the processing units 41 to 55 may be realized by hardware such as an LSI, ASIC, or FPGA.

映像取得部４１は、第２取得部として機能し、第２装置４に内蔵又は有線もしくは無線により接続されたカメラからカメラが撮影した映像を取得する。映像取得部４１は、取得した映像を、映像符号化部４２及び映像解析部４３に出力する。 The image acquisition unit 41 functions as a second acquisition unit, and acquires images taken by the camera from a camera built in the second device 4 or connected by wire or wirelessly. The video acquisition unit 41 outputs the acquired video to the video coding unit 42 and the video analysis unit 43.

映像符号化部４２は、映像取得部４１から映像を受け、当該映像を所定の符号化方法に従い符号化する。例えば、映像符号化部４２は、映像が４Ｋ映像又は８Ｋ映像の場合には、Ｈ．２６５／ＨＥＶＣ（High Efficiency Video Coding）に従って映像を符号化する。映像符号化部４２は、符号化済み映像を多重化部４７に出力する。 The video coding unit 42 receives the video from the video acquisition unit 41 and encodes the video according to a predetermined coding method. For example, when the video is 4K video or 8K video, the video coding unit 42 encodes the video according to H.265/HEVC (High Efficiency Video Coding). The video coding unit 42 outputs the encoded video to the multiplexing unit 47.

映像解析部４３は、映像取得部４１から映像を受け、映像に映っているユーザ（以下、「第２ユーザ」という）と第２ユーザの映像中の位置とを特定する。例えば、映像解析部４３は、顔認識アルゴリズムを用いて映像中の第２ユーザを特定する。ただし、第２ユーザを識別するための情報（例えば、ユーザ名）と映像中の第２ユーザの位置を第２装置４の操作者が外部入力により指定するものであってもよい。 The image analysis unit 43 receives an image from the image acquisition unit 41, and identifies a user (hereinafter, referred to as “second user”) shown in the image and a position in the image of the second user. For example, the image analysis unit 43 uses a face recognition algorithm to identify a second user in the image. However, the information for identifying the second user (for example, the user name) and the position of the second user in the video may be specified by the operator of the second device 4 by external input.

映像解析部４３は、判断部として機能し、当該映像を解析することにより第２ユーザの感情及び集中度を判断する。つまり、映像解析部４３は、入力映像に基づいて、感情の種類ごとに、感情の度合いを数値化した感情スコアを算出する。映像解析部４３が解析対象とする感情の種類は、例えば、怒り、軽蔑、嫌悪、驚き、恐怖、喜び、悲しみ、幸せ、不快などである。映像解析部４３は、感情の種類ごとに設けられた識別器を用いて、当該識別器に映像を入力することにより第２ユーザの感情スコアを算出する。 The video analysis unit 43 functions as a determination unit and determines the emotion and degree of concentration of the second user by analyzing the video. That is, based on the input video, the video analysis unit 43 calculates, for each type of emotion, an emotion score that quantifies the degree of that emotion. The types of emotion analyzed by the video analysis unit 43 are, for example, anger, contempt, disgust, surprise, fear, joy, sadness, happiness, and discomfort. The video analysis unit 43 uses a classifier provided for each type of emotion and calculates the emotion score of the second user by inputting the video into that classifier.

また、映像解析部４３は、入力映像に基づいて、ユーザの集中の度合いを数値化した集中度を算出する。つまり、映像解析部４３は、識別器に映像を入力することにより第２ユーザの集中度を算出する。 Further, the video analysis unit 43 calculates the degree of concentration in which the degree of concentration of the user is quantified based on the input video. That is, the video analysis unit 43 calculates the degree of concentration of the second user by inputting the video to the classifier.

なお、映像解析部４３は、映像中に複数の第２ユーザが含まれる場合には、第２ユーザごとに感情スコア及び集中度を算出する。 When a plurality of second users are included in the video, the video analysis unit 43 calculates the emotion score and the degree of concentration for each second user.

映像解析部４３は、第２ユーザを識別するための情報及び第２ユーザの映像中の位置と、算出した第２ユーザの感情の種類ごとの感情スコア及び集中度とを感情・集中力判断部４８に出力する。 The video analysis unit 43 outputs, to the emotion / concentration determination unit 48, the information for identifying the second user, the position of the second user in the video, and the calculated emotion score for each type of emotion and degree of concentration of the second user.

音声取得部４４は、第２取得部として機能し、第２装置４に内蔵又は有線もしくは無線により接続されたマイクから第２ユーザの音声を取得する。音声取得部４４は、取得した第２ユーザの音声を、音声符号化部４５及び音声解析部４６に出力する。 The voice acquisition unit 44 functions as a second acquisition unit, and acquires the voice of the second user from a microphone built in the second device 4 or connected by wire or wirelessly. The voice acquisition unit 44 outputs the acquired voice of the second user to the voice coding unit 45 and the voice analysis unit 46.

音声符号化部４５は、音声取得部４４から音声を受け、当該音声を所定の符号化方法に従い符号化する。例えば、音声符号化部４５は、ＭＰＥＧ−４ＡＡＣに従い音声を符号化する。音声符号化部４５は、符号化済み音声を多重化部４７に出力する。 The voice coding unit 45 receives the voice from the voice acquisition unit 44 and encodes the voice according to a predetermined coding method. For example, the voice coding unit 45 encodes voice according to MPEG-4 AAC. The voice coding unit 45 outputs the coded voice to the multiplexing unit 47.

音声解析部４６は、音声取得部４４から音声を受け、音声を発している第２ユーザを特定する。第２ユーザの特定は、例えば、事前に登録された音声データに基づき、話者を識別することにより行ってもよい。音声解析部４６は、例えば、音声から話者の声紋を分析し、隠れマルコフモデル、ニューラルネットワーク、決定木などの識別手法を用いて話者を特定する。ただし、第２ユーザが発話する際に、第２ユーザを識別するための情報を第２ユーザ又は第２装置４の操作者が外部入力するものであってもよい。 The voice analysis unit 46 receives a voice from the voice acquisition unit 44 and identifies a second user who is emitting the voice. The second user may be specified, for example, by identifying the speaker based on the voice data registered in advance. The voice analysis unit 46 analyzes the voiceprint of the speaker from the voice, and identifies the speaker by using an identification method such as a hidden Markov model, a neural network, or a decision tree. However, when the second user speaks, the information for identifying the second user may be externally input by the second user or the operator of the second device 4.

音声解析部４６は、判断部として機能し、当該音声を解析することにより第２ユーザの感情及び集中度を判断する。つまり、音声解析部４６は、入力音声に基づいて、感情の種類ごとに、感情スコアを算出する。音声解析部４６が解析対象とする感情の種類は、映像解析部４３が解析対象とする感情の種類と同様である。音声解析部４６は、感情の種類ごとに設けられた識別器を用いて、当該識別器に音声を入力することにより感情スコアを算出する。 The voice analysis unit 46 functions as a determination unit, and determines the emotion and concentration of the second user by analyzing the voice. That is, the voice analysis unit 46 calculates the emotion score for each type of emotion based on the input voice. The type of emotion to be analyzed by the voice analysis unit 46 is the same as the type of emotion to be analyzed by the video analysis unit 43. The voice analysis unit 46 calculates an emotion score by inputting voice into the classifier using a classifier provided for each type of emotion.

また、音声解析部４６は、入力音声に基づいて、第２ユーザの集中度を算出する。 In addition, the voice analysis unit 46 calculates the degree of concentration of the second user based on the input voice.

なお、音声解析部４６は、音声中に複数の第２ユーザが含まれる場合には、第２ユーザごとに感情スコア及び集中度を算出する。 When a plurality of second users are included in the voice, the voice analysis unit 46 calculates the emotion score and the degree of concentration for each second user.

音声解析部４６は、第２ユーザを識別するための情報と、算出した第２ユーザの感情の種類ごとの感情スコアと、集中度とを感情・集中力判断部４８に出力する。 The voice analysis unit 46 outputs the information for identifying the second user, the calculated emotion score for each type of emotion of the second user, and the degree of concentration to the emotion / concentration determination unit 48.

多重化部４７は、映像符号化部４２及び音声符号化部４５から符号化済み映像及び符号化済み音声をそれぞれ受け、符号化済み映像及び符号化済み音声を多重化することにより、多重化データを生成する。例えば、多重化部４７は、ＭＰＥＧ−ＨＭＭＴに従って多重化を行う。多重化部４７は、生成した多重化データを第２送信部４９に出力する。 The multiplexing unit 47 receives the encoded video and the encoded voice from the video coding unit 42 and the voice coding unit 45, respectively, and generates multiplexed data by multiplexing the encoded video and the encoded voice. For example, the multiplexing unit 47 performs multiplexing according to MPEG-H MMT. The multiplexing unit 47 outputs the generated multiplexed data to the second transmission unit 49.

感情・集中力判断部４８は、映像解析部４３から第２ユーザを識別するための情報及び第２ユーザの映像中の位置と、第２ユーザの感情の種類ごとの感情スコア及び第２ユーザの集中度を受ける。また、感情・集中力判断部４８は、音声解析部４６から第２ユーザを識別するための情報と、第２ユーザの感情の種類ごとの感情スコア及び第２ユーザの集中度を受ける。 The emotion / concentration determination unit 48 receives, from the video analysis unit 43, the information for identifying the second user, the position of the second user in the video, the emotion score for each type of emotion of the second user, and the degree of concentration of the second user. The emotion / concentration determination unit 48 also receives, from the voice analysis unit 46, the information for identifying the second user, the emotion score for each type of emotion of the second user, and the degree of concentration of the second user.

感情・集中力判断部４８は、判断部として機能し、映像解析部４３及び音声解析部４６から受けた第２ユーザの感情の種類ごとの感情スコアに基づいて、第２ユーザの感情を判断する。例えば、感情・集中力判断部４８は、感情の種類ごとに、映像解析部４３から受けた第２ユーザの当該種類に対応する感情スコアと、音声解析部４６から受けた第２ユーザの当該種類に対応する感情スコアとを単純加算又は重みづけ加算することで、当該種類の感情スコアを算出する。なお、重みづけ加算の重みは、あらかじめ設定されていてもよいし、２つの感情スコアに応じて変化させてもよい。 The emotion / concentration determination unit 48 functions as a determination unit and determines the emotion of the second user based on the emotion scores for each type of emotion of the second user received from the video analysis unit 43 and the voice analysis unit 46. For example, for each type of emotion, the emotion / concentration determination unit 48 calculates the emotion score of that type by simple addition or weighted addition of the second user's emotion score for that type received from the video analysis unit 43 and the second user's emotion score for that type received from the voice analysis unit 46. The weights of the weighted addition may be set in advance or may be changed according to the two emotion scores.

なお、感情スコアの算出方法はこれに限定されるものではない。例えば、感情・集中力判断部４８は、感情の種類ごとに設けられた識別器を用いて、第２ユーザの感情スコアを算出してもよい。具体的には、感情・集中力判断部４８は、各感情の種類の識別器に映像解析部４３から受けた第２ユーザの当該種類に対応する感情スコアと、音声解析部４６から受けた第２ユーザの当該種類に対応する感情スコアとを入力することにより、当該感情の種類に対する第２ユーザの感情スコアを算出する。 The method of calculating the emotion score is not limited to this. For example, the emotion / concentration determination unit 48 may calculate the emotion score of the second user by using a classifier provided for each type of emotion. Specifically, the emotion / concentration determination unit 48 inputs, into the classifier for each type of emotion, the second user's emotion score for that type received from the video analysis unit 43 and the second user's emotion score for that type received from the voice analysis unit 46, and thereby calculates the second user's emotion score for that type of emotion.

また、感情・集中力判断部４８は、映像解析部４３及び音声解析部４６から受けた第２ユーザの集中度に基づいて、第２ユーザの集中度を判断する。例えば、感情・集中力判断部４８は、映像解析部４３から受けた第２ユーザの集中度と、音声解析部４６から受けた第２ユーザの集中度とを単純加算又は重みづけ加算することで、第２ユーザの集中度を算出する。なお、重みづけ加算の重みは、あらかじめ設定されていてもよいし、２つの集中度に応じて変化させてもよい。 Further, the emotion / concentration determination unit 48 determines the degree of concentration of the second user based on the degrees of concentration of the second user received from the video analysis unit 43 and the voice analysis unit 46. For example, the emotion / concentration determination unit 48 calculates the degree of concentration of the second user by simple addition or weighted addition of the degree of concentration received from the video analysis unit 43 and the degree of concentration received from the voice analysis unit 46. The weights of the weighted addition may be set in advance or may be changed according to the two degrees of concentration.
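As a minimal sketch of the simple and weighted addition described above, the following Python snippet combines the video-based and voice-based values for one second user. The function names `fuse` and `fuse_emotion_scores`, the emotion keys, and the example weights of 0.5 are assumptions made for illustration; the disclosure only states that the weights may be set in advance or changed according to the two values.

```python
from typing import Dict


def fuse(video_value: float, voice_value: float,
         w_video: float = 1.0, w_voice: float = 1.0) -> float:
    """Weighted addition of two values; weights of 1.0 give simple addition."""
    return w_video * video_value + w_voice * voice_value


def fuse_emotion_scores(video: Dict[str, float], voice: Dict[str, float],
                        w_video: float = 1.0,
                        w_voice: float = 1.0) -> Dict[str, float]:
    """Fuse the scores emotion type by emotion type."""
    return {emotion: fuse(video[emotion], voice[emotion], w_video, w_voice)
            for emotion in video if emotion in voice}


# Hypothetical values for one second user (not taken from the disclosure).
video_scores = {"happiness": 90.0, "discomfort": 10.0}
voice_scores = {"happiness": 70.0, "discomfort": 30.0}

fused_scores = fuse_emotion_scores(video_scores, voice_scores, 0.5, 0.5)
fused_concentration = fuse(80.0, 60.0, 0.5, 0.5)
```

With equal weights of 0.5 the fused value is the average of the two inputs, which keeps it on the same scale as the inputs; simple addition (weights of 1.0) would instead double the range.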

なお、集中度の算出方法はこれに限定されるものではない。例えば、感情・集中力判断部４８は、識別器を用いて、第２ユーザの集中度を算出してもよい。具体的には、感情・集中力判断部４８は、識別器に映像解析部４３から受けた第２ユーザの集中度と、音声解析部４６から受けた第２ユーザの集中度とを入力することにより、第２ユーザの集中度を算出する。 The method of calculating the degree of concentration is not limited to this. For example, the emotion / concentration determination unit 48 may calculate the degree of concentration of the second user by using a classifier. Specifically, the emotion / concentration determination unit 48 calculates the degree of concentration of the second user by inputting, into the classifier, the degree of concentration of the second user received from the video analysis unit 43 and the degree of concentration of the second user received from the voice analysis unit 46.

なお、感情・集中力判断部４８は、第２ユーザが複数いる場合には、第２ユーザごとに感情スコア及び集中度を算出する。 When there are a plurality of second users, the emotion / concentration determination unit 48 calculates the emotion score and the degree of concentration for each second user.

感情・集中力判断部４８は、算出した感情の種類ごとの第２ユーザの感情スコアと、第２ユーザの集中度とを、第２ユーザの識別子及び算出時刻と対応付けて感情・集中力ＤＢ５に書き込む。なお、第２ユーザの識別子には、第２ユーザを識別するための情報（例えば、ユーザ名）と、第２ユーザの映像中の位置情報とが含まれるものとする。 The emotion / concentration determination unit 48 writes the calculated emotion score of the second user for each type of emotion and the degree of concentration of the second user to the emotion / concentration DB 5 in association with the identifier of the second user and the calculation time. The identifier of the second user includes information for identifying the second user (for example, a user name) and position information of the second user in the video.
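The record written to the emotion / concentration DB 5 could be laid out as in the following sketch, which uses Python's standard `sqlite3` module with an in-memory database. The table name, column names, and sample values are hypothetical; the disclosure does not specify a schema or a storage engine, only that scores and concentration are stored in association with the user identifier (user name and position in the video) and the calculation time.

```python
import sqlite3
from typing import Dict, List, Tuple


def open_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the emotion / concentration DB."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE emotion_concentration ("
        "user_name TEXT, pos_x INTEGER, pos_y INTEGER, calculated_at TEXT, "
        "emotion TEXT, score REAL, concentration REAL)"
    )
    return conn


def write_result(conn: sqlite3.Connection, user_name: str,
                 position: Tuple[int, int], calculated_at: str,
                 scores: Dict[str, float], concentration: float) -> None:
    """Store one row per emotion type for a single calculation time."""
    rows = [(user_name, position[0], position[1], calculated_at,
             emotion, score, concentration)
            for emotion, score in scores.items()]
    conn.executemany(
        "INSERT INTO emotion_concentration VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()


def read_history(conn: sqlite3.Connection, user_name: str,
                 emotion: str) -> List[float]:
    """Read the past scores of one user for one emotion type, oldest first."""
    cur = conn.execute(
        "SELECT score FROM emotion_concentration "
        "WHERE user_name = ? AND emotion = ? ORDER BY calculated_at",
        (user_name, emotion))
    return [row[0] for row in cur.fetchall()]


conn = open_db()
write_result(conn, "user_71A", (120, 80), "14:10:25",
             {"happiness": 82.0, "discomfort": 5.0}, 80.0)
write_result(conn, "user_71A", (120, 80), "14:10:26",
             {"happiness": 84.0, "discomfort": 4.0}, 82.0)
happiness_history = read_history(conn, "user_71A", "happiness")
```

Storing one row per emotion type makes it straightforward to read back the history of a single emotion for a single user, which is the access pattern needed when scores are later corrected using past values.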

なお、感情・集中力判断部４８は、第２ユーザの感情スコアの履歴に基づいて、算出した第２ユーザの感情スコアを補正してもよい。例えば、感情・集中力判断部４８は、感情の種類ごとに、過去一定期間の第２ユーザの感情スコアを感情・集中力ＤＢ５から読み出し、読み出した感情スコアに基づいて、感情スコアの標準偏差及び平均を算出する。感情・集中力判断部４８は、上述の式１に従い、感情の種類ごとに、算出した第２ユーザの感情スコアを、算出した感情スコアの標準偏差及び平均を用いて標準化する。これにより、第２ユーザ間で感情スコアを標準化することができる。 The emotion / concentration determination unit 48 may correct the calculated emotion score of the second user based on the history of the second user's emotion scores. For example, for each type of emotion, the emotion / concentration determination unit 48 reads the second user's emotion scores for a certain past period from the emotion / concentration DB 5 and calculates the standard deviation and the mean of the emotion score from the scores read. In accordance with Equation 1 above, the emotion / concentration determination unit 48 then standardizes, for each type of emotion, the calculated emotion score of the second user using the calculated standard deviation and mean. As a result, the emotion score can be standardized among the second users.

また、感情・集中力判断部４８は、感情スコアの標準化の代わりに、感情スコアの正規化を行ってもよい。例えば、感情・集中力判断部４８は、感情の種類ごとに、過去一定期間の第２ユーザの感情スコアを感情・集中力ＤＢ５から読み出し、読み出した感情スコアに基づいて、感情スコアの最大値及び最小値を算出する。感情・集中力判断部４８は、上述の式２に従い、感情の種類ごとに、算出した第２ユーザの感情スコアを、算出した感情スコアの最大値及び最小値を用いて正規化する。これにより、第２ユーザ間で感情スコアを正規化することができる。 Further, the emotion / concentration determination unit 48 may normalize the emotion score instead of standardizing it. For example, for each type of emotion, the emotion / concentration determination unit 48 reads the second user's emotion scores for a certain past period from the emotion / concentration DB 5 and calculates the maximum and minimum values of the emotion score from the scores read. In accordance with Equation 2 above, the emotion / concentration determination unit 48 then normalizes, for each type of emotion, the calculated emotion score of the second user using the calculated maximum and minimum values. As a result, the emotion score can be normalized among the second users.

また、感情・集中力判断部４８は、第２ユーザの集中度の履歴に基づいて、算出した第２ユーザの集中度を補正してもよい。例えば、感情・集中力判断部４８は、過去一定期間の第２ユーザの集中度を感情・集中力ＤＢ５から読み出し、読み出した集中度に基づいて、集中度の標準偏差及び平均を算出する。感情・集中力判断部４８は、上述の式３に従い、算出した第２ユーザの集中度を、算出した集中度の標準偏差及び平均を用いて標準化する。これにより、第２ユーザ間で集中度を標準化することができる。 In addition, the emotion / concentration determination unit 48 may correct the calculated concentration level of the second user based on the history of the concentration level of the second user. For example, the emotion / concentration determination unit 48 reads out the concentration of the second user in the past fixed period from the emotion / concentration DB 5, and calculates the standard deviation and the average of the concentration based on the read concentration. The emotion / concentration determination unit 48 standardizes the calculated concentration of the second user according to the above equation 3 by using the standard deviation and the average of the calculated concentration. As a result, the degree of concentration can be standardized among the second users.

また、感情・集中力判断部４８は、集中度の標準化の代わりに、集中度の正規化を行ってもよい。例えば、感情・集中力判断部４８は、過去一定期間の第２ユーザの集中度を感情・集中力ＤＢ５から読み出し、読み出した集中度に基づいて、集中度の最大値及び最小値を算出する。感情・集中力判断部４８は、上述の式４に従い、算出した第２ユーザの集中度を、算出した集中度の最大値及び最小値を用いて正規化する。これにより、第２ユーザ間で集中度を正規化することができる。 Further, the emotion / concentration determination unit 48 may normalize the degree of concentration instead of standardizing it. For example, the emotion / concentration determination unit 48 reads the second user's degrees of concentration for a certain past period from the emotion / concentration DB 5 and calculates the maximum and minimum values of the degree of concentration from the values read. In accordance with Equation 4 above, the emotion / concentration determination unit 48 then normalizes the calculated degree of concentration of the second user using the calculated maximum and minimum values. As a result, the degree of concentration can be normalized among the second users.
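The standardization and normalization described above for emotion scores and degrees of concentration can be sketched as follows. Equations 1 to 4 are not reproduced in this part of the description, so the snippet assumes the usual forms: the z-score (x - mean) / standard deviation for standardization and (x - min) / (max - min) for normalization. The use of the population standard deviation and the handling of a constant history are also assumptions.

```python
from statistics import mean, pstdev
from typing import Sequence


def standardize(value: float, history: Sequence[float]) -> float:
    """Z-score of `value` relative to the user's own past values."""
    mu = mean(history)
    sigma = pstdev(history)  # population standard deviation (assumption)
    if sigma == 0:
        return 0.0  # constant history: no spread to scale by (assumption)
    return (value - mu) / sigma


def normalize(value: float, history: Sequence[float]) -> float:
    """Min-max scaling of `value` using the user's own past range."""
    lowest, highest = min(history), max(history)
    if highest == lowest:
        return 0.0  # constant history: range is zero (assumption)
    return (value - lowest) / (highest - lowest)


# Hypothetical history of one emotion score for one second user.
history = [40.0, 50.0, 60.0]

z_at_mean = standardize(50.0, history)
scaled = normalize(55.0, history)
```

Because both functions use only the history of the same user, a user whose raw scores are habitually high and a user whose raw scores are habitually low end up on a comparable scale, which is the stated purpose of the correction.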

感情・集中力判断部４８は、算出した感情の種類ごとの第２ユーザの感情スコアと、第２ユーザの集中度とを、第２ユーザの識別子及び算出時刻と合わせて第２送信部４９に出力する。 The emotion / concentration determination unit 48 outputs the calculated emotion score of the second user for each type of emotion and the degree of concentration of the second user to the second transmission unit 49, together with the identifier of the second user and the calculation time.

第２送信部４９は、多重化部４７から多重化データを受け、感情・集中力判断部４８から第２ユーザの識別子及び算出時刻が付加された感情の種類ごとの第２ユーザの感情スコアと、第２ユーザの集中度とを受ける。第２送信部４９は、受けたこれらのデータを、第１装置２に送信する。 The second transmission unit 49 receives the multiplexed data from the multiplexing unit 47 and receives, from the emotion / concentration determination unit 48, the emotion score of the second user for each type of emotion and the degree of concentration of the second user, to which the identifier of the second user and the calculation time are added. The second transmission unit 49 transmits the received data to the first device 2.

第２受信部５０は、第１装置２から符号化済み映像及び符号化済み音声が多重化された多重化データと、感情の種類ごとの第１ユーザの感情スコアと、第１ユーザの集中度とを受信する。なお、これらのデータには、第１ユーザの識別子と、第１ユーザの感情スコア及び集中度の算出時刻とが付加されている。第２受信部５０は、第１装置２から受信したこれらのデータのセットを分離部５１に出力する。なお、第１ユーザの識別子には、第１ユーザを識別するための情報（例えば、ユーザ名）と、第１ユーザの映像中の位置情報とが含まれているものとする。 The second receiving unit 50 receives, from the first device 2, the multiplexed data in which the encoded video and the encoded voice are multiplexed, the emotion score of the first user for each type of emotion, and the degree of concentration of the first user. The identifier of the first user and the calculation time of the first user's emotion score and degree of concentration are added to these data. The second receiving unit 50 outputs the set of data received from the first device 2 to the separation unit 51. The identifier of the first user includes information for identifying the first user (for example, a user name) and position information of the first user in the video.

分離部５１は、第２受信部５０からデータセットを受け、データセットを分離する。つまり、分離部５１は、データセットに含まれる多重化データを符号化済み映像および符号化済み音声に分離し、分離した符号化済み映像および符号化済み音声を映像復号化部５２及び音声復号化部５３にそれぞれ出力する。また、分離部５１は、データセットから第１ユーザの識別子及び算出時刻が付加された感情の種類ごとの感情スコアと集中度とを分離し、分離したこれらのデータを感情・集中力処理部５４に出力する。 The separation unit 51 receives the data set from the second receiving unit 50 and separates it. That is, the separation unit 51 separates the multiplexed data included in the data set into the encoded video and the encoded voice, and outputs them to the video decoding unit 52 and the voice decoding unit 53, respectively. The separation unit 51 also separates, from the data set, the emotion score for each type of emotion and the degree of concentration to which the identifier of the first user and the calculation time are added, and outputs these data to the emotion / concentration processing unit 54.
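To make the contents of the data set and its separation concrete, the following sketch models the data set as plain Python data classes. All class, field, and function names are hypothetical, and the multiplexed stream is represented by a simple dictionary of byte strings as a stand-in; actual demultiplexing of an MPEG-H MMT stream is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class UserState:
    """Per-user determination result with its identifier and time."""
    user_name: str                    # information identifying the user
    position: Tuple[int, int]         # position of the user in the video
    calculated_at: str                # calculation time
    emotion_scores: Dict[str, float]  # score per emotion type
    concentration: float              # degree of concentration


@dataclass
class DataSet:
    """What one device sends to the other in a single transmission."""
    multiplexed: Dict[str, bytes]     # stand-in for the multiplexed stream
    states: List[UserState]


def separate(data_set: DataSet) -> Tuple[bytes, bytes, List[UserState]]:
    """Split the data set into encoded video, encoded voice, and user states."""
    video = data_set.multiplexed["video"]
    voice = data_set.multiplexed["voice"]
    return video, voice, data_set.states


# Hypothetical data set received from the other device.
received = DataSet(
    multiplexed={"video": b"encoded-video", "voice": b"encoded-voice"},
    states=[UserState("user_A", (120, 80), "14:10:25",
                      {"happiness": 82.0, "discomfort": 5.0}, 80.0)],
)
encoded_video, encoded_voice, user_states = separate(received)
```

In this layout the encoded video and voice would go on to the respective decoding units, while `user_states` would go to the emotion / concentration processing unit.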

映像復号化部５２は、分離部５１から符号化済み映像を受け、当該映像を所定の復号化方法に従い復号化する。復号化方法は、第１装置２における映像の符号化方法に対応する方法とする。例えば、映像が４Ｋ映像又は８Ｋ映像の場合であって、第１装置２がＨ．２６５／ＨＥＶＣに従って映像を符号化した場合には、映像復号化部５２は、Ｈ．２６５／ＨＥＶＣに従って符号化済み映像を復号化する。映像復号化部５２は、復号化した映像を感情・集中力処理部５４及び表示・出力部５５に出力する。 The video decoding unit 52 receives the encoded video from the separation unit 51 and decodes it according to a predetermined decoding method. The decoding method corresponds to the video coding method used in the first device 2. For example, when the video is 4K video or 8K video and the first device 2 has encoded it according to H.265/HEVC, the video decoding unit 52 decodes the encoded video according to H.265/HEVC. The video decoding unit 52 outputs the decoded video to the emotion / concentration processing unit 54 and the display / output unit 55.

音声復号化部５３は、分離部５１から符号化済み音声を受け、当該音声を所定の復号化方法に従い復号化する。復号化方法は、第１装置２における音声の符号化方法に対応する方法とする。例えば、第１装置２がＭＰＥＧ−４ＡＡＣに従い音声を符号化した場合には、音声復号化部５３は、ＭＰＥＧ−４ＡＡＣに従い音声を復号化する。音声復号化部５３は、復号化した音声を表示・出力部５５に出力する。 The voice decoding unit 53 receives the encoded voice from the separation unit 51 and decodes the voice according to a predetermined decoding method. The decoding method is a method corresponding to the voice coding method in the first device 2. For example, when the first device 2 encodes the voice according to the MPEG-4 AAC, the voice decoding unit 53 decodes the voice according to the MPEG-4 AAC. The voice decoding unit 53 outputs the decoded voice to the display / output unit 55.

感情・集中力処理部５４は、分離部５１から第１ユーザの識別子及び計測時刻と、第１ユーザの感情の種類ごとの感情スコア及び集中度とを受ける。また、感情・集中力処理部５４は、映像復号化部５２から映像を受ける。 The emotion / concentration processing unit 54 receives the identifier and measurement time of the first user from the separation unit 51, and the emotion score and concentration level for each type of emotion of the first user. Further, the emotion / concentration processing unit 54 receives an image from the image decoding unit 52.

感情・集中力処理部５４は、これらのデータから、ディスプレイに表示するための表示用データを作成する。例えば、感情・集中力処理部５４は、感情の種類ごとに感情スコアを所定の閾値で閾値処理することにより、感情に対応した表示用のアイコンの表示用データを作成する。例えば、感情・集中力処理部５４は、幸せな感情についての感情スコアが８０以上である第１ユーザに対して、当該第１ユーザの映像中の位置の近傍に幸せな感情に対応したアイコンを表示させるための表示用データを作成する。また、感情・集中力処理部５４は、不快な感情についての感情スコアが８０以上である第１ユーザに対して、当該第１ユーザの映像中の位置の近傍に不快な感情に対応したアイコンを表示させるための表示用データを作成する。 The emotion / concentration processing unit 54 creates display data for display on the display from these data. For example, the emotion / concentration processing unit 54 creates display data of a display icon corresponding to an emotion by performing threshold processing of an emotion score for each type of emotion with a predetermined threshold value. For example, the emotion / concentration processing unit 54 displays an icon corresponding to the happy emotion in the vicinity of the position in the image of the first user for the first user having an emotion score of 80 or more for the happy emotion. Create display data for display. Further, the emotion / concentration processing unit 54 displays an icon corresponding to the unpleasant emotion in the vicinity of the position in the image of the first user for the first user having an emotion score of 80 or more for the unpleasant emotion. Create display data for display.
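The threshold processing described above can be sketched as follows. The threshold of 80 and the two emotions come from the example in the text; the icon names, the dictionary layout of the display data, and the function name `icons_for_user` are assumptions for illustration.

```python
from typing import Dict, List, Tuple

THRESHOLD = 80.0  # threshold used in the example in the text

# Hypothetical mapping from emotion type to an icon resource name.
ICONS = {"happiness": "icon_happy", "discomfort": "icon_unpleasant"}


def icons_for_user(scores: Dict[str, float],
                   position: Tuple[int, int]) -> List[dict]:
    """Return display data for every emotion whose score reaches the threshold."""
    return [{"icon": ICONS[emotion], "position": position}
            for emotion in ICONS
            if scores.get(emotion, 0.0) >= THRESHOLD]


# Hypothetical scores for two users at different positions in the video.
happy_user = icons_for_user({"happiness": 85.0, "discomfort": 20.0}, (100, 50))
neutral_user = icons_for_user({"happiness": 40.0, "discomfort": 30.0}, (300, 50))
```

A user whose scores all stay below the threshold produces no display data, so no icon is drawn near that user.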

また、感情・集中力処理部５４は、例えば、映像から第１ユーザの映像を切り出し、切り出した映像の隣に、第１ユーザの感情及び集中度の計測時刻、検出した感情及び集中度を表示するための表示用データを作成する。 Further, the emotion / concentration processing unit 54, for example, cuts out the image of the first user from the video and creates display data for displaying, next to the cut-out image, the measurement time of the first user's emotion and degree of concentration together with the detected emotion and degree of concentration.

感情・集中力処理部５４は、作成した表示用データを表示・出力部５５に出力する。 The emotion / concentration processing unit 54 outputs the created display data to the display / output unit 55.

表示・出力部５５は、第２出力部として機能し、音声復号化部５３から音声を受け、音声をスピーカーから出力する。 The display / output unit 55 functions as a second output unit, receives audio from the audio decoding unit 53, and outputs the audio from the speaker.

また、表示・出力部５５は、映像復号化部５２から映像を受け、感情・集中力処理部５４から表示用データを受け、表示用データを映像に重畳させ、重畳後の映像をディスプレイに表示させる。 Further, the display / output unit 55 receives the video from the video decoding unit 52 and the display data from the emotion / concentration processing unit 54, superimposes the display data on the video, and displays the superimposed video on the display.

〔提供システム１の処理フロー〕 [Processing flow of the providing system 1]
図５は、本開示の実施形態１に係る提供システム１による、第１装置２から第２装置４への第１ユーザの感情及び集中度の提供処理の手順の一例を示すシーケンス図である。 FIG. 5 is a sequence diagram showing an example of a procedure for providing the emotion and concentration of the first user from the first device 2 to the second device 4 by the providing system 1 according to the first embodiment of the present disclosure.

第１装置２の映像取得部２１は、カメラから映像を取得する（Ｓ１）。 The image acquisition unit 21 of the first device 2 acquires an image from the camera (S1).

第１装置２の音声取得部２４は、マイクから音声を取得する（Ｓ２）。 The voice acquisition unit 24 of the first device 2 acquires voice from the microphone (S2).

第１装置２の映像解析部２３は、ステップＳ１において取得された映像を解析することにより、映像から第１ユーザを特定し、第１ユーザの位置、第１ユーザの感情の種類ごとの感情スコア及び集中度を決定する（Ｓ３）。 The video analysis unit 23 of the first device 2 analyzes the video acquired in step S1 to identify the first user in the video and to determine the position of the first user, the emotion score for each type of emotion of the first user, and the degree of concentration of the first user (S3).

第１装置２の音声解析部２６は、ステップＳ２において取得された音声を解析することにより、音声から第１ユーザを特定し、第１ユーザの感情の種類ごとの感情スコア及び集中度を決定する（Ｓ４）。 The voice analysis unit 26 of the first device 2 identifies the first user from the voice by analyzing the voice acquired in step S2, and determines the emotion score and the degree of concentration for each type of emotion of the first user. (S4).

第１装置２の感情・集中力判断部２８は、ステップＳ３において決定された第１ユーザの感情の種類ごとの感情スコア及び集中度と、ステップＳ４において決定された第１ユーザの感情の種類ごとの感情スコア及び集中度とに基づいて、第１ユーザの感情の種類ごとの感情スコア及び集中度を決定する（Ｓ５）。 The emotion / concentration determination unit 28 of the first device 2 determines the emotion score and concentration level for each type of emotion of the first user determined in step S3, and each type of emotion of the first user determined in step S4. Based on the emotional score and the degree of concentration of the first user, the emotional score and the degree of concentration for each type of emotion of the first user are determined (S5).

第１装置２の映像符号化部２２は、ステップＳ１において取得された映像を符号化する（Ｓ６）。 The video coding unit 22 of the first device 2 encodes the video acquired in step S1 (S6).

第１装置２の音声符号化部２５は、ステップＳ２において取得された音声を符号化する（Ｓ７）。 The voice coding unit 25 of the first device 2 encodes the voice acquired in step S2 (S7).

第１装置２の多重化部２７は、ステップＳ６において符号化された映像と、ステップＳ７において符号化された音声とを多重化し、多重化データを生成する（Ｓ８）。 The multiplexing unit 27 of the first device 2 multiplexes the video encoded in step S6 and the audio encoded in step S7 to generate multiplexed data (S8).

第１装置２の第１送信部２９は、ステップＳ８において生成された多重化データと、ステップＳ５において決定された第１ユーザの感情の種類ごとの感情スコア及び集中度に第１ユーザの識別子及び算出時刻が付加されたデータセットを、第２装置４に送信する。第２装置４の第２受信部５０は、当該データセットを受信する（Ｓ９）。 The first transmission unit 29 of the first device 2 transmits, to the second device 4, a data set consisting of the multiplexed data generated in step S8 and the emotion score for each type of emotion and the degree of concentration of the first user determined in step S5, to which the identifier of the first user and the calculation time are added. The second receiving unit 50 of the second device 4 receives the data set (S9).

第２装置４の分離部５１は、ステップＳ９において受信されたデータセットを、符号化済み映像、符号化済み音声、第１ユーザの識別子及び算出時刻が付加された感情の種類ごとの感情スコア及び集中度とに分離する（Ｓ１０）。 The separation unit 51 of the second device 4 separates the data set received in step S9 into the encoded video, the encoded voice, and the emotion score for each type of emotion and the degree of concentration to which the identifier of the first user and the calculation time are added (S10).

第２装置４の映像復号化部５２は、ステップＳ１０において分離された符号化済み映像を復号化する（Ｓ１１）。 The video decoding unit 52 of the second device 4 decodes the coded video separated in step S10 (S11).

第２装置４の音声復号化部５３は、ステップＳ１０において分離された符号化済み音声を復号化する（Ｓ１２）。 The voice decoding unit 53 of the second device 4 decodes the coded voice separated in step S10 (S12).

第２装置４の感情・集中力処理部５４は、ステップＳ１１において復号化された映像と、ステップＳ１０において分離された第１ユーザの識別子及び算出時刻が付加された感情の種類ごとの感情スコア及び集中度とに基づいて、ディスプレイに第１ユーザの感情及び集中度を表示するための表示用データを作成する（Ｓ１３）。 The emotion / concentration processing unit 54 of the second device 4 creates display data for displaying the emotion and degree of concentration of the first user on the display, based on the video decoded in step S11 and on the emotion score for each type of emotion and the degree of concentration separated in step S10, to which the identifier of the first user and the calculation time are added (S13).

第２装置４の表示・出力部５５は、ステップＳ１３において作成された表示用データをステップＳ１１において復号された映像に重畳させ、重畳後の映像をディスプレイに表示させる（Ｓ１４）。 The display / output unit 55 of the second device 4 superimposes the display data created in step S13 on the video decoded in step S11, and displays the superposed video on the display (S14).

第２装置４の表示・出力部５５は、ステップＳ１２において復号された音声をスピーカーから出力する（Ｓ１５）。 The display / output unit 55 of the second device 4 outputs the sound decoded in step S12 from the speaker (S15).

図５に示した処理を実行することにより、第１ユーザの感情及び集中度が第２ユーザに提供されることになる。 By executing the process shown in FIG. 5, the emotion and concentration of the first user are provided to the second user.

図６は、本開示の実施形態１に係る提供システム１による、第２装置４から第１装置２への第２ユーザの感情及び集中度の提供処理の手順の一例を示すシーケンス図である。 FIG. 6 is a sequence diagram showing an example of a procedure for providing the emotion and concentration of the second user from the second device 4 to the first device 2 by the providing system 1 according to the first embodiment of the present disclosure.

第２装置４の映像取得部４１は、カメラから映像を取得する（Ｓ２１）。 The image acquisition unit 41 of the second device 4 acquires an image from the camera (S21).

第２装置４の音声取得部４４は、マイクから音声を取得する（Ｓ２２）。 The voice acquisition unit 44 of the second device 4 acquires voice from the microphone (S22).

第２装置４の映像解析部４３は、ステップＳ２１で取得された映像を解析することにより、映像から第２ユーザを特定し、第２ユーザの位置、第２ユーザの感情の種類ごとの感情スコア及び集中度を決定する（Ｓ２３）。 The video analysis unit 43 of the second device 4 analyzes the video acquired in step S21 to identify the second user in the video, and determines the position of the second user, the emotion score for each type of emotion of the second user, and the degree of concentration of the second user (S23).

第２装置４の音声解析部４６は、ステップＳ２２において取得された音声を解析することにより、音声から第２ユーザを特定し、第２ユーザの感情の種類ごとの感情スコア及び集中度を決定する（Ｓ２４）。 The voice analysis unit 46 of the second device 4 analyzes the voice acquired in step S22 to identify the second user from the voice, and determines the emotion score for each type of emotion of the second user and the degree of concentration of the second user (S24).

第２装置４の感情・集中力判断部４８は、ステップＳ２３において決定された第２ユーザの感情の種類ごとの感情スコア及び集中度と、ステップＳ２４において決定された第２ユーザの感情の種類ごとの感情スコア及び集中度とに基づいて、第２ユーザの感情の種類ごとの感情スコア及び集中度を決定する（Ｓ２５）。 The emotion / concentration determination unit 48 of the second device 4 determines the emotion score for each type of emotion and the degree of concentration of the second user, based on the emotion scores and degree of concentration determined in step S23 and the emotion scores and degree of concentration determined in step S24 (S25).

第２装置４の映像符号化部４２は、ステップＳ２１において取得された映像を符号化する（Ｓ２６）。 The video coding unit 42 of the second device 4 encodes the video acquired in step S21 (S26).

第２装置４の音声符号化部４５は、ステップＳ２２において取得された音声を符号化する（Ｓ２７）。 The voice coding unit 45 of the second device 4 encodes the voice acquired in step S22 (S27).

第２装置４の多重化部４７は、ステップＳ２６において符号化された映像と、ステップＳ２７において符号化された音声とを多重化し、多重化データを生成する（Ｓ２８）。 The multiplexing unit 47 of the second device 4 multiplexes the video encoded in step S26 and the audio encoded in step S27 to generate multiplexed data (S28).

第２装置４の第２送信部４９は、ステップＳ２８において生成された多重化データと、ステップＳ２５において決定された第２ユーザの感情の種類ごとの感情スコア及び集中度に第２ユーザの識別子及び算出時刻が付加されたデータセットを、第１装置２に送信する。第１装置２の第１受信部３０は、当該データセットを受信する（Ｓ２９）。 The second transmission unit 49 of the second device 4 transmits, to the first device 2, a data set consisting of the multiplexed data generated in step S28 and the emotion score for each type of emotion and the degree of concentration of the second user determined in step S25, to which the identifier of the second user and the calculation time are added. The first receiving unit 30 of the first device 2 receives the data set (S29).

第１装置２の第１受信部３０は、ステップＳ２９において受信されたデータセットを、符号化済み映像、符号化済み音声、第２ユーザの識別子及び算出時刻が付加された感情の種類ごとの感情スコア及び集中度とに分離する（Ｓ３０）。 The first receiving unit 30 of the first device 2 separates the data set received in step S29 into the encoded video, the encoded audio, and the emotion score for each type of emotion and the degree of concentration to which the identifier of the second user and the calculation time are added (S30).
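The bundling and separation of the data set in steps S28 to S30 can be illustrated with a minimal sketch. The disclosure does not specify a wire format, so everything here is an assumption made for illustration: the `Judgment` class and its field names, the JSON encoding of the judgment result, and the 4-byte length prefix in front of each part.

```python
import json
import struct
from dataclasses import asdict, dataclass


@dataclass
class Judgment:
    """Hypothetical container for the judgment result of one user."""
    user_id: str            # identifier of the user
    calculated_at: float    # calculation time (epoch seconds, assumed)
    emotion_scores: dict    # emotion type -> emotion score
    concentration: float    # degree of concentration


def pack_dataset(video: bytes, audio: bytes, judgment: Judgment) -> bytes:
    """Sender side (cf. S28, S29): bundle encoded media and the judgment."""
    meta = json.dumps(asdict(judgment)).encode("utf-8")
    # Each part is preceded by its length as a 4-byte big-endian integer.
    return b"".join(struct.pack(">I", len(p)) + p for p in (video, audio, meta))


def split_dataset(blob: bytes):
    """Receiver side (cf. S30): separate the data set into its three parts."""
    parts, offset = [], 0
    for _ in range(3):
        (length,) = struct.unpack_from(">I", blob, offset)
        offset += 4
        parts.append(blob[offset:offset + length])
        offset += length
    video, audio, meta = parts
    return video, audio, Judgment(**json.loads(meta.decode("utf-8")))
```

In this sketch the encoded video and audio pass through as opaque bytes, so the receiver can hand them to the decoding units unchanged while the judgment result goes to the emotion / concentration processing unit.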

第１装置２の映像復号化部３２は、ステップＳ３０において分離された符号化済み映像を復号化する（Ｓ３１）。 The video decoding unit 32 of the first device 2 decodes the coded video separated in step S30 (S31).

第１装置２の音声復号化部３３は、ステップＳ３０において分離された符号化済み音声を復号化する（Ｓ３２）。 The voice decoding unit 33 of the first device 2 decodes the coded voice separated in step S30 (S32).

第１装置２の感情・集中力処理部３４は、ステップＳ３１において復号化された映像と、ステップＳ３０において分離された第２ユーザの識別子及び算出時刻が付加された感情の種類ごとの感情スコア及び集中度とに基づいて、ディスプレイに第２ユーザの感情及び集中度を表示するための表示用データを作成する（Ｓ３３）。 The emotion / concentration processing unit 34 of the first device 2 creates display data for displaying the emotion and concentration of the second user on the display, based on the video decoded in step S31 and on the emotion score for each type of emotion and the degree of concentration, separated in step S30, to which the identifier of the second user and the calculation time are added (S33).

第１装置２の表示・出力部３５は、ステップＳ３３において作成された表示用データをステップＳ３１において復号された映像に重畳させ、重畳後の映像をディスプレイに表示させる（Ｓ３４）。 The display / output unit 35 of the first device 2 superimposes the display data created in step S33 on the video decoded in step S31, and displays the superposed video on the display (S34).

第１装置２の表示・出力部３５は、ステップＳ３２において復号された音声をスピーカーから出力する（Ｓ３５）。 The display / output unit 35 of the first device 2 outputs the sound decoded in step S32 from the speaker (S35).

図６に示した処理を実行することにより、第２ユーザの感情及び集中度が第１ユーザに提供されることになる。 By executing the process shown in FIG. 6, the emotion and concentration of the second user are provided to the first user.

〔実施形態１の効果等〕 [Effects of Embodiment 1 and the like]
実施形態１によると、第１ユーザの発話内容の聞き手である第２ユーザの感情及び集中度の少なくとも一方の判断結果が、第１ユーザに提供される。このため、第１ユーザは、自分の発話内容に対し、第２ユーザがどのような感情を抱いているか、又は第２ユーザが集中して話を聞いているかなどを知ることができる。これに対し、第１ユーザは、例えば、発話内容に対して否定的な感情を抱く第２ユーザに対して質問を行ったり、第２ユーザが集中していない場合には話題を変えるなどの対策を行うことができる。これにより、ユーザ同士の円滑なコミュニケーションを支援することができる。 According to the first embodiment, the determination result of at least one of the emotion and the concentration of the second user, who is the listener of the utterance of the first user, is provided to the first user. Therefore, the first user can know what kind of emotion the second user has toward the content of the utterance, or whether the second user is listening attentively. In response, the first user can take measures such as asking a question of a second user who has a negative feeling about the utterance, or changing the topic when the second user is not concentrating. This makes it possible to support smooth communication between users.

同様に、第２ユーザの発話内容の聞き手である第１ユーザの感情及び集中度の少なくとも一方の判断結果が、第２ユーザに提供される。これにより、第２ユーザも、第１ユーザと同様の対策を行うことが可能である。 Similarly, the judgment result of at least one of the emotion and the concentration of the first user who is the listener of the utterance content of the second user is provided to the second user. As a result, the second user can take the same measures as the first user.

また、第２ユーザの音声を考慮して第２ユーザの感情及び集中度の少なくとも一方が判断される。このため、第２ユーザの映像だけを用いて感情及び集中度の少なくとも一方を判断する場合に比べ、第２ユーザの感情又は集中度を高精度で判断することができる。第１ユーザの感情及び集中度の判断においても同様である。 In addition, at least one of the emotions and the degree of concentration of the second user is determined in consideration of the voice of the second user. Therefore, the emotion or concentration of the second user can be determined with higher accuracy than in the case of determining at least one of the emotion and the concentration of the second user using only the video of the second user. The same applies to the judgment of the emotion and concentration of the first user.

また、第１ユーザと第２ユーザとの間でネットワーク３越しに対話を行い、第２ユーザの感情及び集中度の少なくとも一方の判断結果を第１ユーザに提供することができる。このため、例えば、第１ユーザを会議の進行役とする電子会議システムにおいて、第１ユーザが第２ユーザの感情又は集中度を把握しながら、第２ユーザに適宜意見を求めたりしながら議事を進行することができる。これにより、議論を建設的なものとし、生産性の高い会議を実現することができる。なお、第１装置２は、第２ユーザの場合と同様に、第１ユーザの音声及び映像から第１ユーザの感情及び集中度の少なくとも一方を判断し、第２装置４が、第１ユーザの感情及び集中度の少なくとも一方の判断結果をディスプレイに表示する。これにより、第１ユーザ及び第２ユーザは、相互に相手の感情又は集中度を把握することができる。 Further, a dialogue can be held between the first user and the second user over the network 3, and the determination result of at least one of the emotion and the concentration of the second user can be provided to the first user. Therefore, for example, in an electronic conference system in which the first user is the facilitator of the conference, the first user can advance the proceedings while grasping the emotion or concentration of the second user and asking the second user for opinions as appropriate. This makes the discussion constructive and enables highly productive meetings. As in the case of the second user, the first device 2 determines at least one of the emotion and the concentration of the first user from the audio and video of the first user, and the second device 4 displays the determination result of at least one of the emotion and the concentration of the first user on the display. As a result, the first user and the second user can mutually grasp the emotions or the degree of concentration of the other party.

また、第１装置２で第１ユーザの感情及び集中度を判断し、第２装置４で第２ユーザの感情及び集中度を判断している。このため、第１装置２は、第２装置４に映像を送信しないようにしてもよく、第２装置４は、第１装置２に映像を送信しないようにしてもよい。これにより、第１装置２から第２装置４への伝送データを削減しつつ、第１ユーザの感情又は集中度の判断結果を第２装置４に送信することができる。また、第２装置４から第１装置２への伝送データを削減しつつ、第２ユーザの感情又は集中度の判断結果を第１装置２に送信することができる。 Further, the first device 2 determines the emotion and concentration of the first user, and the second device 4 determines the emotion and concentration of the second user. Therefore, the first device 2 may not transmit the video to the second device 4, and the second device 4 may not transmit the video to the first device 2. As a result, the determination result of the emotion or concentration of the first user can be transmitted to the second device 4 while reducing the transmission data from the first device 2 to the second device 4. Further, the determination result of the emotion or the degree of concentration of the second user can be transmitted to the first device 2 while reducing the transmission data from the second device 4 to the first device 2.

なお、第１装置２の感情・集中力処理部３４は、第２ユーザの感情に基づく表情を有する第２ユーザのアバターを表示させるための表示用データを作成してもよい。同様に、第２装置４の感情・集中力処理部５４は、第１ユーザの感情に基づく第１ユーザのアバターを表示させるための表示用データを作成してもよい。これにより、ユーザの映像の代わりにアバターを表示させることができるため、第２ユーザ及び第１ユーザのプライバシーを保護することもできる。 The emotion / concentration processing unit 34 of the first device 2 may create display data for displaying the avatar of the second user having a facial expression based on the emotion of the second user. Similarly, the emotion / concentration processing unit 54 of the second device 4 may create display data for displaying the avatar of the first user based on the emotion of the first user. As a result, the avatar can be displayed instead of the user's image, so that the privacy of the second user and the first user can be protected.
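One way an avatar expression could be chosen from the emotion scores is sketched below. The disclosure only states that the avatar has a facial expression based on the user's emotion; selecting the highest-scoring emotion type, the `neutral` fallback and the `min_score` cut-off are assumptions made for this illustration.

```python
def avatar_expression(emotion_scores, neutral="neutral", min_score=50.0):
    """Pick the avatar's facial expression from per-type emotion scores.

    emotion_scores maps an emotion type to its score. The dominant type is
    used only if its score reaches min_score (an assumed cut-off).
    """
    if not emotion_scores:
        return neutral
    kind, score = max(emotion_scores.items(), key=lambda kv: kv[1])
    return kind if score >= min_score else neutral
```

Because only the expression label is needed to render the avatar, a display built this way never has to show the user's camera image.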

また、第１装置２の感情・集中力判断部２８及び第２装置４の感情・集中力判断部４８は、過去のユーザの感情スコア及び集中度に基づいて、感情スコア及び集中度を標準化することができる。つまり、感情の起伏や集中度の変化が相対的に小さいユーザの各スコアと、感情の起伏や集中度の変化が相対的に大きいユーザの各スコアとを標準化することができる。これにより、ユーザ間で感情又は集中度を正確に比較することができる。 Further, the emotion / concentration determination unit 28 of the first device 2 and the emotion / concentration determination unit 48 of the second device 4 can standardize the emotion score and the degree of concentration based on the user's past emotion scores and degrees of concentration. That is, the scores of a user whose emotional ups and downs and changes in concentration are relatively small and the scores of a user whose emotional ups and downs and changes in concentration are relatively large can be brought onto a common scale. This makes it possible to accurately compare emotions or concentration levels among users.
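The standardization described above can be sketched as a z-score computed against each user's own history. The disclosure does not name a specific method, so the z-score, the function name `standardize` and the fallback value of 0.0 for a short or constant history are assumptions.

```python
from statistics import mean, pstdev


def standardize(score: float, history: list) -> float:
    """Express score in standard deviations from the user's own past mean.

    history holds the user's past scores for the same quantity (one emotion
    type, or the degree of concentration).
    """
    if len(history) < 2:
        return 0.0  # not enough history to estimate spread (assumed fallback)
    sigma = pstdev(history)
    if sigma == 0:
        return 0.0  # constant history: treat the score as average
    return (score - mean(history)) / sigma
```

With this sketch, a reserved user whose score rises from a usual 50 to 52 (history 48, 50, 52) and an expressive user whose score rises to 80 (history 20, 50, 80) receive the same standardized value, so the two can be compared directly.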

＜実施形態２＞ <Embodiment 2>
実施形態１では、第１装置２が第１ユーザの感情及び集中度を判断し、第２装置４が第２ユーザの感情及び集中度を判断した。実施形態２では、第１装置２が第２ユーザの感情及び集中度を判断し、第２装置４が第１ユーザの感情及び集中度を判断する例について説明する。 In the first embodiment, the first device 2 determines the emotion and concentration of the first user, and the second device 4 determines the emotion and concentration of the second user. In the second embodiment, an example will be described in which the first device 2 determines the emotion and concentration of the second user, and the second device 4 determines the emotion and concentration of the first user.

実施形態２に係る提供システム１の構成は実施形態１と同様である。 The configuration of the provision system 1 according to the second embodiment is the same as that of the first embodiment.

〔第１装置２の構成〕 [Configuration of the first device 2]
図７は、本開示の実施形態２に係る第１装置２の機能的構成を示すブロック図である。 FIG. 7 is a block diagram showing a functional configuration of the first device 2 according to the second embodiment of the present disclosure.

第１装置２は、映像取得部２１と、映像符号化部２２と、音声取得部２４と、音声符号化部２５と、多重化部２７と、第１送信部２９と、第１受信部３０と、分離部３１と、映像復号化部３２と、音声復号化部３３と、映像解析部２３と、音声解析部２６と、感情・集中力判断部２８と、表示・出力部３５とを備える。 The first device 2 includes a video acquisition unit 21, a video coding unit 22, an audio acquisition unit 24, an audio coding unit 25, a multiplexing unit 27, a first transmission unit 29, a first receiving unit 30, a separation unit 31, a video decoding unit 32, an audio decoding unit 33, a video analysis unit 23, a voice analysis unit 26, an emotion / concentration determination unit 28, and a display / output unit 35.

映像取得部２１、映像符号化部２２、音声取得部２４、音声符号化部２５及び多重化部２７の処理は、実施形態１と同様である。 The processing of the video acquisition unit 21, the video coding unit 22, the audio acquisition unit 24, the audio coding unit 25, and the multiplexing unit 27 is the same as that of the first embodiment.

第１送信部２９は、多重化部２７から多重化データを受け、当該多重化データを第２装置４に送信する。 The first transmission unit 29 receives the multiplexing data from the multiplexing unit 27 and transmits the multiplexing data to the second device 4.

第１受信部３０は、第２装置４から符号化済み映像及び符号化済み音声が多重化された多重化データを受信する。第１受信部３０は、受信した多重化データを分離部３１に出力する。 The first receiving unit 30 receives the multiplexed data in which the encoded video and the encoded audio are multiplexed from the second device 4. The first receiving unit 30 outputs the received multiplexed data to the separating unit 31.

分離部３１は、第１受信部３０から多重化データを受け、多重化データを符号化済み映像および符号化済み音声に分離する。分離部３１は、分離した符号化済み映像および符号化済み音声を映像復号化部３２及び音声復号化部３３にそれぞれ出力する。 The separation unit 31 receives the multiplexed data from the first receiving unit 30 and separates the multiplexed data into a coded video and a coded audio. The separation unit 31 outputs the separated encoded video and encoded audio to the video decoding unit 32 and the audio decoding unit 33, respectively.

映像復号化部３２及び音声復号化部３３の処理は、実施形態１と同様である。映像復号化部３２は、映像取得部として機能し、復号化した映像を映像解析部２３及び感情・集中力判断部２８に出力し、音声復号化部３３は、音声取得部として機能し、復号化した音声を音声解析部２６に出力する。 The processing of the video decoding unit 32 and the audio decoding unit 33 is the same as that of the first embodiment. The video decoding unit 32 functions as a video acquisition unit and outputs the decoded video to the video analysis unit 23 and the emotion / concentration determination unit 28. The audio decoding unit 33 functions as an audio acquisition unit and outputs the decoded audio to the voice analysis unit 26.

映像解析部２３は、映像復号化部３２から映像を受け、映像に映っている第２ユーザと第２ユーザの映像中の位置とを特定する。また、映像解析部２３は、判断部として機能し、当該映像を解析することにより第２ユーザの感情及び集中度を判断する。映像解析部２３は、第２ユーザを識別するための情報及び第２ユーザの映像中の位置と、算出した第２ユーザの感情の種類ごとの感情スコア及び集中度とを感情・集中力判断部２８に出力する。なお、映像解析部２３の処理は、処理の対象とするユーザが第２ユーザである点を除いて実施形態１の映像解析部２３と同様である。 The video analysis unit 23 receives the video from the video decoding unit 32 and identifies the second user shown in the video and the position of the second user in the video. In addition, the video analysis unit 23 functions as a determination unit, and determines the emotion and concentration of the second user by analyzing the video. The video analysis unit 23 outputs, to the emotion / concentration determination unit 28, the information for identifying the second user, the position of the second user in the video, and the calculated emotion score for each type of emotion and degree of concentration of the second user. The processing of the video analysis unit 23 is the same as that of the video analysis unit 23 of the first embodiment except that the user to be processed is the second user.

音声解析部２６は、音声復号化部３３から音声を受け、音声を発している第２ユーザを特定する。また、音声解析部２６は、判断部として機能し、当該音声を解析することにより、第２ユーザの感情及び集中度を判断する。音声解析部２６は、第２ユーザを識別するための情報と、算出した第２ユーザの感情の種類ごとの感情スコアと、集中度とを感情・集中力判断部２８に出力する。なお、音声解析部２６の処理は、処理の対象とするユーザが第２ユーザである点を除いて実施形態１の音声解析部２６と同様である。 The voice analysis unit 26 receives a voice from the voice decoding unit 33 and identifies a second user who is emitting the voice. In addition, the voice analysis unit 26 functions as a determination unit, and determines the emotion and concentration of the second user by analyzing the voice. The voice analysis unit 26 outputs the information for identifying the second user, the calculated emotion score for each type of emotion of the second user, and the degree of concentration to the emotion / concentration determination unit 28. The processing of the voice analysis unit 26 is the same as that of the voice analysis unit 26 of the first embodiment except that the user to be processed is the second user.

感情・集中力判断部２８は、映像解析部２３から第２ユーザを識別するための情報及び第２ユーザの映像中の位置と、第２ユーザの感情の種類ごとの感情スコア及び第２ユーザの集中度を受ける。また、感情・集中力判断部２８は、音声解析部２６から第２ユーザを識別するための情報と、第２ユーザの感情の種類ごとの感情スコア及び第２ユーザの集中度を受ける。 The emotion / concentration determination unit 28 receives, from the video analysis unit 23, the information for identifying the second user, the position of the second user in the video, the emotion score for each type of emotion of the second user, and the degree of concentration of the second user. Further, the emotion / concentration determination unit 28 receives, from the voice analysis unit 26, the information for identifying the second user, the emotion score for each type of emotion of the second user, and the degree of concentration of the second user.

感情・集中力判断部２８は、判断部として機能し、映像解析部２３及び音声解析部２６から受けた第２ユーザの感情の種類ごとの感情スコアに基づいて、第２ユーザの感情を判断する。例えば、感情・集中力判断部２８は、感情の種類ごとに、映像解析部２３から受けた第２ユーザの当該種類に対応する感情スコアと、音声解析部２６から受けた第２ユーザの当該種類に対応する感情スコアとを単純加算又は重みづけ加算することで、当該種類の感情スコアを算出する。なお、重みづけ加算の重みは、あらかじめ設定されていてもよいし、２つの感情スコアに応じて変化させてもよい。 The emotion / concentration determination unit 28 functions as a determination unit, and determines the emotion of the second user based on the emotion scores for each type of emotion of the second user received from the video analysis unit 23 and the voice analysis unit 26. For example, for each type of emotion, the emotion / concentration determination unit 28 calculates the emotion score of that type by simple addition or weighted addition of the emotion score of that type for the second user received from the video analysis unit 23 and the emotion score of that type for the second user received from the voice analysis unit 26. The weights of the weighted addition may be set in advance or may be changed according to the two emotion scores.

また、感情・集中力判断部２８は、映像解析部２３及び音声解析部２６から受けた第２ユーザの感情の種類ごとの集中度に基づいて、第２ユーザの集中度を判断する。例えば、感情・集中力判断部２８は、映像解析部２３から受けた第２ユーザの集中度と、音声解析部２６から受けた第２ユーザの集中度とを単純加算又は重みづけ加算することで、第２ユーザの集中度を算出する。なお、重みづけ加算の重みは、あらかじめ設定されていてもよいし、２つの集中度に応じて変化させてもよい。 Further, the emotion / concentration determination unit 28 determines the concentration level of the second user based on the concentration level of each type of emotion of the second user received from the video analysis unit 23 and the voice analysis unit 26. For example, the emotion / concentration determination unit 28 calculates the concentration level of the second user by simple addition or weighted addition of the concentration level of the second user received from the video analysis unit 23 and the concentration level of the second user received from the voice analysis unit 26. The weights of the weighted addition may be set in advance or may be changed according to the two concentration levels.
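The simple and weighted addition of the video-based and voice-based results can be sketched as follows. The function names, the emotion type labels and the specific weights are hypothetical; the disclosure only states that the weights may be fixed in advance or varied according to the two inputs.

```python
def fuse_scores(video_scores, audio_scores, w_video=1.0, w_audio=1.0):
    """Combine per-type emotion scores from video and voice analysis.

    With both weights at 1.0 this is simple addition; other weights give
    weighted addition. A type missing from one source counts as 0.0.
    """
    kinds = video_scores.keys() | audio_scores.keys()
    return {
        kind: w_video * video_scores.get(kind, 0.0)
        + w_audio * audio_scores.get(kind, 0.0)
        for kind in kinds
    }


def fuse_concentration(video_value, audio_value, w_video=1.0, w_audio=1.0):
    """Combine the two degrees of concentration in the same way."""
    return w_video * video_value + w_audio * audio_value
```

For example, `fuse_scores({"happy": 50.0}, {"happy": 30.0})` adds the two scores, while passing `w_video=0.75, w_audio=0.25` lets the video-based result dominate. If the combined score is to stay on the same scale as its inputs, weights that sum to 1 are one possible choice.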

なお、感情・集中力判断部２８は、第２ユーザが複数いる場合には、第２ユーザごとに感情スコア及び集中度を算出する。 When there are a plurality of second users, the emotion / concentration determination unit 28 calculates the emotion score and the degree of concentration for each of the second users.

また、感情・集中力判断部２８は、ディスプレイに表示するための表示用データを作成する。例えば、感情・集中力判断部２８は、感情の種類ごとに感情スコアを所定の閾値で閾値処理することにより、感情に対応した表示用のアイコンの表示用データを作成する。例えば、感情・集中力判断部２８は、幸せな感情についての感情スコアが８０以上である第２ユーザに対して、当該第２ユーザの映像中の位置の近傍に幸せな感情に対応したアイコンを表示させるための表示用データを作成する。また、感情・集中力判断部２８は、不快な感情についての感情スコアが８０以上である第２ユーザに対して、当該第２ユーザの映像中の位置の近傍に不快な感情に対応したアイコンを表示させるための表示用データを作成する。 In addition, the emotion / concentration determination unit 28 creates display data for display on the display. For example, the emotion / concentration determination unit 28 creates display data of a display icon corresponding to an emotion by performing threshold processing of an emotion score for each type of emotion with a predetermined threshold value. For example, the emotion / concentration determination unit 28 displays an icon corresponding to the happy emotion in the vicinity of the position in the video of the second user for the second user having an emotion score of 80 or more for the happy emotion. Create display data for display. In addition, the emotion / concentration determination unit 28 displays an icon corresponding to the unpleasant emotion in the vicinity of the position in the video of the second user for the second user having an emotion score of 80 or more for the unpleasant emotion. Create display data for display.
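The threshold processing that produces icon display data can be sketched as below. The threshold of 80 comes from the example in the text; the icon names, the dictionary layout of the display data and the pixel offset used to place the icon near the user's position are assumptions.

```python
# Hypothetical mapping from emotion type to an icon resource name.
EMOTION_ICONS = {"happy": "icon_happy", "discomfort": "icon_discomfort"}


def icons_for_user(scores, position, threshold=80, offset=(0, -20)):
    """Return icon display data for every emotion type at or above threshold.

    position is the (x, y) location of the user in the video; each icon is
    placed near it, shifted by offset (an assumed layout choice).
    """
    x, y = position
    return [
        {"icon": EMOTION_ICONS[kind], "x": x + offset[0], "y": y + offset[1]}
        for kind, score in sorted(scores.items())
        if kind in EMOTION_ICONS and score >= threshold
    ]
```

A user at position (100, 200) with a happy score of 85 and a discomfort score of 79 would get only the happy icon, placed just above the user's position.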

また、感情・集中力判断部２８は、例えば、映像から第２ユーザの映像を切り出し、切り出した映像の隣に、第２ユーザの感情及び集中度の計測時刻、検出した感情及び集中度を表示するための表示用データを作成する。 Further, the emotion / concentration determination unit 28 may, for example, cut out the image of the second user from the video and create display data for displaying, next to the cut-out image, the measurement time of the second user's emotion and concentration and the detected emotion and concentration.
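Cutting out a user's image and building the accompanying text can be sketched as follows. The frame is modeled as a plain list of pixel rows to keep the example dependency-free; the function names, the UTC time format and the caption layout are assumptions.

```python
from datetime import datetime, timezone


def crop(frame, left, top, width, height):
    """Cut the rectangle occupied by one user out of a frame (list of rows)."""
    return [row[left:left + width] for row in frame[top:top + height]]


def caption(measured_at, emotion, concentration):
    """Text shown next to the cut-out image: time, emotion, concentration."""
    ts = datetime.fromtimestamp(measured_at, tz=timezone.utc).strftime("%H:%M:%S")
    return f"{ts}  {emotion}  concentration {concentration:.0f}"
```

The display data for one user would then pair the result of `crop` with the string from `caption`, leaving the actual rendering to the display / output unit.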

感情・集中力判断部２８は、作成した表示用データを表示・出力部３５に出力する。 The emotion / concentration determination unit 28 outputs the created display data to the display / output unit 35.

表示・出力部３５の処理は、実施形態１の表示・出力部３５と同様である。 The processing of the display / output unit 35 is the same as that of the display / output unit 35 of the first embodiment.

〔第２装置４の構成〕 [Configuration of the second device 4]
図８は、本開示の実施形態２に係る第２装置４の機能的構成を示すブロック図である。 FIG. 8 is a block diagram showing a functional configuration of the second device 4 according to the second embodiment of the present disclosure.

第２装置４は、映像取得部４１と、映像符号化部４２と、音声取得部４４と、音声符号化部４５と、多重化部４７と、第２送信部４９と、第２受信部５０と、分離部５１と、映像復号化部５２と、音声復号化部５３と、映像解析部４３と、音声解析部４６と、感情・集中力判断部４８と、表示・出力部５５とを備える。 The second device 4 includes a video acquisition unit 41, a video coding unit 42, an audio acquisition unit 44, an audio coding unit 45, a multiplexing unit 47, a second transmission unit 49, a second receiving unit 50, a separation unit 51, a video decoding unit 52, an audio decoding unit 53, a video analysis unit 43, a voice analysis unit 46, an emotion / concentration determination unit 48, and a display / output unit 55.

映像取得部４１、映像符号化部４２、音声取得部４４、音声符号化部４５及び多重化部４７の処理は、実施形態１と同様である。 The processing of the video acquisition unit 41, the video coding unit 42, the audio acquisition unit 44, the audio coding unit 45, and the multiplexing unit 47 is the same as that of the first embodiment.

第２送信部４９は、多重化部４７から多重化データを受け、当該多重化データを第１装置２に送信する。 The second transmission unit 49 receives the multiplexing data from the multiplexing unit 47 and transmits the multiplexing data to the first device 2.

第２受信部５０は、第１装置２から符号化済み映像及び符号化済み音声が多重化された多重化データを受信する。第２受信部５０は、受信した多重化データを分離部５１に出力する。 The second receiving unit 50 receives the multiplexed data in which the encoded video and the encoded audio are multiplexed from the first device 2. The second receiving unit 50 outputs the received multiplexed data to the separating unit 51.

分離部５１は、第２受信部５０から多重化データを受け、多重化データを符号化済み映像および符号化済み音声に分離する。分離部５１は、分離した符号化済み映像および符号化済み音声を映像復号化部５２及び音声復号化部５３にそれぞれ出力する。 The separation unit 51 receives the multiplexed data from the second receiving unit 50, and separates the multiplexed data into a coded video and a coded audio. The separation unit 51 outputs the separated encoded video and encoded audio to the video decoding unit 52 and the audio decoding unit 53, respectively.

映像復号化部５２及び音声復号化部５３の処理は、実施形態１と同様である。映像復号化部５２は、復号化した映像を映像解析部４３及び感情・集中力判断部４８に出力し、音声復号化部５３は、復号化した音声を音声解析部４６に出力する。 The processing of the video decoding unit 52 and the audio decoding unit 53 is the same as that of the first embodiment. The video decoding unit 52 outputs the decoded video to the video analysis unit 43 and the emotion / concentration determination unit 48, and the audio decoding unit 53 outputs the decoded audio to the audio analysis unit 46.

映像解析部４３は、映像復号化部５２から映像を受け、映像に映っている第１ユーザと第１ユーザの映像中の位置とを特定する。また、映像解析部４３は、当該映像を解析することにより第１ユーザの感情及び集中度を判断する。映像解析部４３は、第１ユーザを識別するための情報及び第１ユーザの映像中の位置と、算出した第１ユーザの感情の種類ごとの感情スコア及び集中度とを感情・集中力判断部４８に出力する。なお、映像解析部４３の処理は、処理の対象とするユーザが第１ユーザである点を除いて実施形態１の映像解析部４３と同様である。 The video analysis unit 43 receives the video from the video decoding unit 52 and identifies the first user shown in the video and the position of the first user in the video. In addition, the video analysis unit 43 determines the emotion and concentration of the first user by analyzing the video. The video analysis unit 43 outputs, to the emotion / concentration determination unit 48, the information for identifying the first user, the position of the first user in the video, and the calculated emotion score for each type of emotion and degree of concentration of the first user. The processing of the video analysis unit 43 is the same as that of the video analysis unit 43 of the first embodiment except that the user to be processed is the first user.

音声解析部４６は、音声復号化部５３から音声を受け、音声を発している第１ユーザを特定する。また、音声解析部４６は、当該音声を解析することにより、第１ユーザの感情及び集中度を判断する。音声解析部４６は、第１ユーザを識別するための情報と、算出した第１ユーザの感情の種類ごとの感情スコアと、集中度とを感情・集中力判断部４８に出力する。なお、音声解析部４６の処理は、処理の対象とするユーザが第１ユーザである点を除いて実施形態１の音声解析部４６と同様である。 The voice analysis unit 46 receives the voice from the voice decoding unit 53 and identifies the first user who is emitting the voice. In addition, the voice analysis unit 46 determines the emotion and concentration of the first user by analyzing the voice. The voice analysis unit 46 outputs the information for identifying the first user, the calculated emotion score for each type of emotion of the first user, and the degree of concentration to the emotion / concentration determination unit 48. The processing of the voice analysis unit 46 is the same as that of the voice analysis unit 46 of the first embodiment except that the user to be processed is the first user.

感情・集中力判断部４８は、映像解析部４３から第１ユーザを識別するための情報及び第１ユーザの映像中の位置と、第１ユーザの感情の種類ごとの感情スコア及び第１ユーザの集中度を受ける。また、感情・集中力判断部４８は、音声解析部４６から第１ユーザを識別するための情報と、第１ユーザの感情の種類ごとの感情スコア及び第１ユーザの集中度を受ける。 The emotion / concentration determination unit 48 receives, from the video analysis unit 43, the information for identifying the first user, the position of the first user in the video, the emotion score for each type of emotion of the first user, and the degree of concentration of the first user. Further, the emotion / concentration determination unit 48 receives, from the voice analysis unit 46, the information for identifying the first user, the emotion score for each type of emotion of the first user, and the degree of concentration of the first user.

感情・集中力判断部４８は、映像解析部４３及び音声解析部４６から受けた第１ユーザの感情の種類ごとの感情スコアに基づいて、第１ユーザの感情を判断する。例えば、感情・集中力判断部４８は、感情の種類ごとに、映像解析部４３から受けた第１ユーザの当該種類に対応する感情スコアと、音声解析部４６から受けた第１ユーザの当該種類に対応する感情スコアとを単純加算又は重みづけ加算することで、当該種類の感情スコアを算出する。なお、重みづけ加算の重みは、あらかじめ設定されていてもよいし、２つの感情スコアに応じて変化させてもよい。 The emotion / concentration determination unit 48 determines the emotion of the first user based on the emotion scores for each type of emotion of the first user received from the video analysis unit 43 and the voice analysis unit 46. For example, for each type of emotion, the emotion / concentration determination unit 48 calculates the emotion score of that type by simple addition or weighted addition of the emotion score of that type for the first user received from the video analysis unit 43 and the emotion score of that type for the first user received from the voice analysis unit 46. The weights of the weighted addition may be set in advance or may be changed according to the two emotion scores.

また、感情・集中力判断部４８は、映像解析部４３及び音声解析部４６から受けた第１ユーザの感情の種類ごとの集中度に基づいて、第１ユーザの集中度を判断する。例えば、感情・集中力判断部４８は、映像解析部４３から受けた第１ユーザの集中度と、音声解析部４６から受けた第１ユーザの集中度とを単純加算又は重みづけ加算することで、第１ユーザの集中度を算出する。なお、重みづけ加算の重みは、あらかじめ設定されていてもよいし、２つの集中度に応じて変化させてもよい。 Further, the emotion / concentration determination unit 48 determines the concentration level of the first user based on the concentration level of each type of emotion of the first user received from the video analysis unit 43 and the voice analysis unit 46. For example, the emotion / concentration determination unit 48 calculates the concentration level of the first user by simple addition or weighted addition of the concentration level of the first user received from the video analysis unit 43 and the concentration level of the first user received from the voice analysis unit 46. The weights of the weighted addition may be set in advance or may be changed according to the two concentration levels.

なお、感情・集中力判断部４８は、第１ユーザが複数いる場合には、第１ユーザごとに感情スコア及び集中度を算出する。 When there are a plurality of first users, the emotion / concentration determination unit 48 calculates the emotion score and the degree of concentration for each first user.

また、感情・集中力判断部４８は、ディスプレイに表示するための表示用データを作成する。例えば、感情・集中力判断部４８は、感情の種類ごとに感情スコアを所定の閾値で閾値処理することにより、感情に対応した表示用のアイコンの表示用データを作成する。例えば、感情・集中力判断部４８は、幸せな感情についての感情スコアが８０以上である第１ユーザに対して、当該第１ユーザの映像中の位置の近傍に幸せな感情に対応したアイコンを表示させるための表示用データを作成する。また、感情・集中力判断部４８は、不快な感情についての感情スコアが８０以上である第１ユーザに対して、当該第１ユーザの映像中の位置の近傍に不快な感情に対応したアイコンを表示させるための表示用データを作成する。 In addition, the emotion / concentration determination unit 48 creates display data for display on the display. For example, the emotion / concentration determination unit 48 creates display data of a display icon corresponding to an emotion by performing threshold processing of the emotion score for each type of emotion with a predetermined threshold value. For example, the emotion / concentration determination unit 48 displays an icon corresponding to the happy emotion in the vicinity of the position in the image of the first user for the first user having an emotion score of 80 or more for the happy emotion. Create display data for display. In addition, the emotion / concentration determination unit 48 displays an icon corresponding to the unpleasant emotion in the vicinity of the position in the image of the first user for the first user having an emotion score of 80 or more for the unpleasant emotion. Create display data for display.

また、感情・集中力判断部４８は、例えば、映像から第１ユーザの映像を切り出し、切り出した映像の隣に、第１ユーザの感情及び集中度の計測時刻、検出した感情及び集中度を表示するための表示用データを作成する。 Further, the emotion / concentration determination unit 48 may, for example, cut out the image of the first user from the video and create display data for displaying, next to the cut-out image, the measurement time of the first user's emotion and concentration and the detected emotion and concentration.

感情・集中力判断部４８は、作成した表示用データを表示・出力部５５に出力する。 The emotion / concentration determination unit 48 outputs the created display data to the display / output unit 55.

表示・出力部５５の処理は、実施形態１の表示・出力部５５と同様である。 The processing of the display / output unit 55 is the same as that of the display / output unit 55 of the first embodiment.

〔提供システム１の処理フロー〕 [Processing flow of the providing system 1]
図９は、本開示の実施形態２に係る提供システム１による、第１装置２から第２装置４への第１ユーザの感情及び集中度の提供処理の手順の一例を示すシーケンス図である。 FIG. 9 is a sequence diagram showing an example of a procedure for providing the emotion and concentration of the first user from the first device 2 to the second device 4 by the providing system 1 according to the second embodiment of the present disclosure.

第１装置２は、図４に示したのと同様のステップＳ１、Ｓ２、Ｓ６〜Ｓ８の処理を実行する。 The first device 2 executes the same processes of steps S1, S2, and S6 to S8 as shown in FIG.

第１装置２の第１送信部２９は、ステップＳ８において生成された多重化データを第２装置４に送信し、第２装置４の第２受信部５０は当該多重化データを受信する（Ｓ１６）。 The first transmission unit 29 of the first device 2 transmits the multiplexed data generated in step S8 to the second device 4, and the second receiving unit 50 of the second device 4 receives the multiplexed data (S16).

第２装置４の分離部５１は、ステップＳ１６において受信された多重化データを符号化済み映像及び符号化済み音声に分離する（Ｓ１０）。 The separation unit 51 of the second device 4 separates the multiplexed data received in step S16 into coded video and coded audio (S10).

第２装置４は、図４に示したのと同様のステップＳ１１及びＳ１２の処理を実行する。 The second device 4 executes the same processes of steps S11 and S12 as shown in FIG.

第２装置４の映像解析部４３は、ステップＳ１１において復号された映像を解析することにより、映像から第１ユーザを特定し、第１ユーザの位置、第１ユーザの感情の種類ごとの感情スコア及び集中度を決定する（Ｓ１７）。 The video analysis unit 43 of the second device 4 analyzes the video decoded in step S11 to identify the first user in the video, and determines the position of the first user, the emotion score for each type of emotion of the first user, and the degree of concentration of the first user (S17).

第２装置４の音声解析部４６は、ステップＳ１２において復号された音声を解析することにより、音声から第１ユーザを特定し、第１ユーザの感情の種類ごとの感情スコア及び集中度を決定する（Ｓ１８）。 The voice analysis unit 46 of the second device 4 analyzes the voice decoded in step S12 to identify the first user from the voice, and determines the emotion score for each type of emotion of the first user and the degree of concentration of the first user (S18).

第２装置４の感情・集中力判断部４８は、ステップＳ１７において決定された第１ユーザの感情の種類ごとの感情スコア及び集中度と、ステップＳ１８において決定された第１ユーザの感情の種類ごとの感情スコア及び集中度とに基づいて、第１ユーザの感情の種類ごとの感情スコア及び集中度を決定する。また、感情・集中力判断部４８は、ステップＳ１１において復号化された映像と、決定された第１ユーザの感情の種類ごとの感情スコア及び集中度とに基づいて、ディスプレイに第１ユーザの感情及び集中度を表示するための表示用データを作成する（Ｓ１９）。 The emotion / concentration determination unit 48 of the second device 4 determines the emotion score for each type of emotion and the degree of concentration of the first user, based on the per-type emotion scores and degree of concentration determined in step S17 and those determined in step S18. Further, based on the video decoded in step S11 and the determined per-type emotion scores and degree of concentration of the first user, the emotion / concentration determination unit 48 creates display data for displaying the emotion and degree of concentration of the first user on the display (S19).
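
The text does not state how the video-based result (S17) and the voice-based result (S18) are combined in S19. The following is only a minimal sketch, assuming a per-label weighted mean; the function name `fuse_scores`, the `video_weight` parameter, and the label names are hypothetical and not taken from the disclosure.

```python
from typing import Dict


def fuse_scores(video: Dict[str, float],
                audio: Dict[str, float],
                video_weight: float = 0.5) -> Dict[str, float]:
    """Blend per-label scores from video analysis and voice analysis.

    Assumption: both analyzers report scores on the same scale, and a
    weighted mean is an acceptable way to merge them. A label reported
    by only one analyzer is passed through unchanged.
    """
    if not 0.0 <= video_weight <= 1.0:
        raise ValueError("video_weight must be in [0, 1]")
    fused: Dict[str, float] = {}
    for label in video.keys() | audio.keys():
        if label in video and label in audio:
            fused[label] = (video_weight * video[label]
                            + (1.0 - video_weight) * audio[label])
        elif label in video:
            fused[label] = video[label]
        else:
            fused[label] = audio[label]
    return fused
```

Treating the degree of concentration as just another label (for example `"concentration"`) keeps the sketch short; a real system might weight emotion and concentration differently.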

第２装置４の表示・出力部５５は、ステップＳ１９において作成された表示用データをステップＳ１１において復号された映像に重畳させ、重畳後の映像をディスプレイに表示させる（Ｓ１４）。 The display / output unit 55 of the second device 4 superimposes the display data created in step S19 on the video decoded in step S11, and displays the superposed video on the display (S14).

図９に示した処理を実行することにより、第１ユーザの感情及び集中度が第２ユーザに提供されることになる。 By executing the process shown in FIG. 9, the emotion and concentration of the first user are provided to the second user.

図１０は、本開示の実施形態２に係る提供システム１による、第２装置４から第１装置２への第２ユーザの感情及び集中度の提供処理の手順の一例を示すシーケンス図である。 FIG. 10 is a sequence diagram showing an example of a procedure for providing the emotion and concentration of the second user from the second device 4 to the first device 2 by the providing system 1 according to the second embodiment of the present disclosure.

第２装置４は、図５に示したのと同様のステップＳ２１、Ｓ２２、Ｓ２６〜Ｓ２８の処理を実行する。 The second device 4 executes the same processes of steps S21, S22, and S26 to S28 as shown in FIG.

第２装置４の第２送信部４９は、ステップＳ２８において生成された多重化データを第１装置２に送信し、第１装置２の第１受信部３０は当該多重化データを受信する（Ｓ３６）。 The second transmission unit 49 of the second device 4 transmits the multiplexed data generated in step S28 to the first device 2, and the first receiving unit 30 of the first device 2 receives the multiplexed data (S36).

第１装置２の分離部３１は、ステップＳ３６において受信された多重化データを符号化済み映像及び符号化済み音声に分離する（Ｓ３０）。 The separation unit 31 of the first device 2 separates the multiplexed data received in step S36 into a coded video and a coded audio (S30).

第１装置２は、図５に示したのと同様のステップＳ３１及びＳ３２の処理を実行する。 The first device 2 executes the same processes of steps S31 and S32 as shown in FIG.

第１装置２の映像解析部２３は、ステップＳ３１において復号された映像を解析することにより、映像から第２ユーザを特定し、第２ユーザの位置、第２ユーザの感情の種類ごとの感情スコア及び集中度を決定する（Ｓ３７）。 The video analysis unit 23 of the first device 2 analyzes the video decoded in step S31 to identify the second user in the video, and determines the position of the second user, the emotion score for each type of emotion of the second user, and the degree of concentration (S37).

第１装置２の音声解析部２６は、ステップＳ３２において復号された音声を解析することにより、音声から第２ユーザを特定し、第２ユーザの感情の種類ごとの感情スコア及び集中度を決定する（Ｓ３８）。 The voice analysis unit 26 of the first device 2 identifies the second user from the voice by analyzing the voice decoded in step S32, and determines the emotion score and the degree of concentration for each type of emotion of the second user. (S38).

第１装置２の感情・集中力判断部２８は、ステップＳ３７において決定された第２ユーザの感情の種類ごとの感情スコア及び集中度と、ステップＳ３８において決定された第２ユーザの感情の種類ごとの感情スコア及び集中度とに基づいて、第２ユーザの感情の種類ごとの感情スコア及び集中度を決定する。また、感情・集中力判断部２８は、ステップＳ３１において復号化された映像と、決定された第２ユーザの感情の種類ごとの感情スコア及び集中度とに基づいて、ディスプレイに第２ユーザの感情及び集中度を表示するための表示用データを作成する（Ｓ３９）。 The emotion / concentration determination unit 28 of the first device 2 determines the emotion score for each type of emotion and the degree of concentration of the second user, based on the per-type emotion scores and degree of concentration determined in step S37 and those determined in step S38. Further, based on the video decoded in step S31 and the determined per-type emotion scores and degree of concentration of the second user, the emotion / concentration determination unit 28 creates display data for displaying the emotion and degree of concentration of the second user on the display (S39).

第１装置２の表示・出力部３５は、ステップＳ３９において作成された表示用データをステップＳ３１において復号された映像に重畳させ、重畳後の映像をディスプレイに表示させる（Ｓ３４）。 The display / output unit 35 of the first device 2 superimposes the display data created in step S39 on the video decoded in step S31, and displays the superimposed video on the display (S34).

第１装置２の表示・出力部３５は、ステップＳ３２において復号された音声をスピーカーから出力する（Ｓ３５）。 The display / output unit 35 of the first device 2 outputs the sound decoded in step S32 from the speaker (S35).

図１０に示した処理を実行することにより、第２ユーザの感情及び集中度が第１ユーザに提供されることになる。 By executing the process shown in FIG. 10, the emotion and concentration of the second user are provided to the first user.

〔実施形態２の効果等〕 [Effects of Embodiment 2 and the like]
実施形態２によると、第１装置２が、第２装置４から送信される第２ユーザの音声及び映像に基づいて、第２ユーザの感情及び集中度の少なくとも一方を判断することができる。このため、第１装置２は、第２ユーザの音声及び映像と第２ユーザの感情又は集中度との同期を正確に取ることができる。これにより、第２ユーザの音声及び映像と第２ユーザの感情又は集中度とを正確に対応付けて第１ユーザに提供することができる。 According to the second embodiment, the first device 2 can determine at least one of the emotion and the degree of concentration of the second user based on the audio and video of the second user transmitted from the second device 4. Therefore, the first device 2 can accurately synchronize the audio and video of the second user with the emotion or concentration of the second user. As a result, the audio and video of the second user can be accurately associated with the emotion or concentration of the second user and provided to the first user.

また、第２装置４が、第１装置２から送信される第１ユーザの音声及び映像に基づいて、第１ユーザの感情及び集中度の少なくとも一方を判断することができる。このため、第２装置４は、第１ユーザの音声及び映像と第１ユーザの感情又は集中度との同期を正確に取ることができる。これにより、第１ユーザの音声及び映像と第１ユーザの感情又は集中度とを正確に対応付けて第２ユーザに提供することができる。 In addition, the second device 4 can determine at least one of the emotion and the degree of concentration of the first user based on the audio and video of the first user transmitted from the first device 2. Therefore, the second device 4 can accurately synchronize the audio and video of the first user with the emotion or concentration of the first user. As a result, the audio and video of the first user can be accurately associated with the emotion or concentration of the first user and provided to the second user.

＜実施形態３＞ <Embodiment 3>
実施形態１及び２に示した提供システム１では、相手側の装置を利用するユーザの感情又は集中度をユーザに提示することはできるものの、当該感情又は集中度に基づいた処理はなされていない。 In the providing system 1 shown in the first and second embodiments, the emotion or concentration of the user who uses the device on the other side can be presented to the user, but no processing based on that emotion or concentration is performed.

実施形態３では、ユーザの感情又は集中度に基づいて、所定の処理を実行する例について説明する。具体的には、感情及び集中度の少なくとも一方の判断結果に基づいて、ユーザに対して発言を促す提供システム１について説明する。 In the third embodiment, an example of executing a predetermined process based on the emotion or the degree of concentration of the user will be described. Specifically, the providing system 1 that prompts the user to speak based on the judgment result of at least one of emotion and concentration will be described.

実施形態３に係る提供システム１の構成は実施形態１と同様である。 The configuration of the provision system 1 according to the third embodiment is the same as that of the first embodiment.

〔第１装置２の構成〕 [Structure of First Device 2]
図１１は、本開示の実施形態３に係る第１装置２の機能的構成を示すブロック図である。実施形態３に係る第１装置２の構成は、実施形態１と同様である。ただし、感情・集中力処理部３４の処理結果が第１送信部２９に入力される点が、実施形態１と異なる。 FIG. 11 is a block diagram showing a functional configuration of the first device 2 according to the third embodiment of the present disclosure. The configuration of the first device 2 according to the third embodiment is the same as that of the first embodiment. However, it differs from the first embodiment in that the processing result of the emotion / concentration processing unit 34 is input to the first transmission unit 29.

感情・集中力処理部３４は、発言促進部として機能し、分離部３１から受けた第２ユーザの感情の種類ごとの感情スコアと、集中度とに基づいて、第２ユーザに発言を促すか否かを決定する。例えば、感情・集中力処理部３４は、集中度と所定の閾値とを比較し、第２ユーザに発言を促すか否かを決定する。より具体的には、感情・集中力処理部３４は、集中度が所定の閾値（例えば、３０）未満の第２ユーザに対して発言を促すことを決定する。 The emotion / concentration processing unit 34 functions as a speech promotion unit and determines whether or not to prompt the second user to speak, based on the emotion score for each type of emotion of the second user received from the separation unit 31 and the degree of concentration. For example, the emotion / concentration processing unit 34 compares the degree of concentration with a predetermined threshold value and determines whether or not to prompt the second user to speak. More specifically, the emotion / concentration processing unit 34 determines to prompt a second user whose concentration is less than a predetermined threshold value (for example, 30) to speak.

また、感情・集中力処理部３４は、所定の種類の感情と所定の閾値とを比較し、第２ユーザに発言を促すか否かを決定してもよい。例えば、感情・集中力処理部３４は、怒りの感情が所定の閾値（例えば、９０）以上の第２ユーザに対して発言を促すことを決定する。また、感情・集中力処理部３４は、喜びの感情が所定の閾値（例えば、３０）未満の第２ユーザに対して発言を促すことを決定する。 Further, the emotion / concentration processing unit 34 may compare a predetermined type of emotion with a predetermined threshold value to determine whether or not to prompt the second user to speak. For example, the emotion / concentration processing unit 34 determines to prompt a second user whose emotion of anger is equal to or greater than a predetermined threshold value (for example, 90) to speak. The emotion / concentration processing unit 34 also determines to prompt a second user whose emotion of joy is less than a predetermined threshold value (for example, 30) to speak.
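
The threshold comparisons described above can be sketched as follows. The numeric values 30, 90, and 30 are the examples given in the text; the function name `should_prompt_speech`, the label keys `"anger"` and `"joy"`, the 0 to 100 score scale, and the choice to combine the three criteria with a logical OR are assumptions made for illustration, since the text presents them as alternatives.

```python
from typing import Mapping

# Example thresholds quoted in the text (the score scale is an assumption).
CONCENTRATION_THRESHOLD = 30
ANGER_THRESHOLD = 90
JOY_THRESHOLD = 30


def should_prompt_speech(emotion_scores: Mapping[str, float],
                         concentration: float) -> bool:
    """Return True if the user should be prompted to speak."""
    # Low concentration: below the threshold triggers a prompt.
    if concentration < CONCENTRATION_THRESHOLD:
        return True
    # Strong anger: at or above the threshold triggers a prompt.
    if emotion_scores.get("anger", 0.0) >= ANGER_THRESHOLD:
        return True
    # Weak joy: below the threshold triggers a prompt, if joy was scored.
    joy = emotion_scores.get("joy")
    if joy is not None and joy < JOY_THRESHOLD:
        return True
    return False
```

Note the boundary behavior: a concentration of exactly 30 does not trigger a prompt ("less than"), while an anger score of exactly 90 does ("equal to or greater than").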

感情・集中力処理部３４は、発言を促す第２ユーザを特定した（第２ユーザの識別子を含む）発言促進指示信号を第１送信部２９に出力する。 The emotion / concentration processing unit 34 outputs a speech promotion instruction signal (including the identifier of the second user) that identifies the second user who prompts the speech to the first transmission unit 29.

第１送信部２９は、感情・集中力処理部３４から発言促進指示信号を受け、当該発言促進指示信号を第２装置４に送信する。 The first transmission unit 29 receives a speech promotion instruction signal from the emotion / concentration processing unit 34, and transmits the speech promotion instruction signal to the second device 4.

一方、第２装置４から発言を促進する第１ユーザを特定した発言促進指示信号が送信された場合には、第１受信部３０は、当該信号を受信し、分離部３１に出力する。 On the other hand, when a speech promotion instruction signal identifying the first user who promotes speech is transmitted from the second device 4, the first reception unit 30 receives the signal and outputs it to the separation unit 31.

分離部３１は、第１受信部３０から受けたデータに発言促進指示信号が含まれている場合には、当該信号を感情・集中力処理部３４に出力する。 When the data received from the first receiving unit 30 includes a speech promotion instruction signal, the separating unit 31 outputs the signal to the emotion / concentration processing unit 34.
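
The disclosure only says that the speech promotion instruction signal contains the identifier of the target user and that the separation unit forwards it when it is present in the received data. As a rough sketch under that description, the signal could be carried as a small JSON message; the wire format, the `"speech_prompt"` type tag, and both function names below are hypothetical.

```python
import json
from typing import Optional

SIGNAL_TYPE = "speech_prompt"  # hypothetical type tag, not from the source


def encode_speech_prompt(user_id: str) -> bytes:
    """Build the signal sent by the transmission unit (assumed JSON format)."""
    return json.dumps({"type": SIGNAL_TYPE, "user_id": user_id}).encode("utf-8")


def extract_speech_prompt(payload: bytes) -> Optional[str]:
    """Return the target user id if the payload is a speech prompt signal.

    Mirrors the separation unit's check: anything that is not a
    well-formed speech prompt signal yields None and is not forwarded.
    """
    try:
        message = json.loads(payload.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None
    if isinstance(message, dict) and message.get("type") == SIGNAL_TYPE:
        user_id = message.get("user_id")
        return user_id if isinstance(user_id, str) else None
    return None
```

In the described system the signal travels alongside multiplexed audio and video, so an actual implementation would more likely tag it as a separate stream inside the multiplexed data than send free-standing JSON.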

感情・集中力処理部３４は、分離部３１から発言促進指示信号を受けた場合には、発言促進指示信号に示される第１ユーザに発言を促すための表示用データを作成し、表示・出力部３５に出力する。例えば、感情・集中力処理部３４は、第１ユーザのユーザ名が「Ａ」である場合には、「Ａさんは何か意見ありませんか？」などのようなメッセージの表示用データを作成する。 When the emotion / concentration processing unit 34 receives the speech promotion instruction signal from the separation unit 31, it creates display data for prompting the first user indicated by the signal to speak, and outputs the display data to the display / output unit 35. For example, when the user name of the first user is "A", the emotion / concentration processing unit 34 creates display data for a message such as "Does A have any opinion?".

表示・出力部３５は、感情・集中力処理部３４から表示用データを受け、当該表示用データをディスプレイに表示させる。 The display / output unit 35 receives display data from the emotion / concentration processing unit 34 and displays the display data on the display.

図１２は、本開示の実施形態３に係る第２装置４の機能的構成を示すブロック図である。実施形態３に係る第２装置４の構成は、実施形態１と同様である。ただし、感情・集中力処理部５４の処理結果が第２送信部４９に入力される点が、実施形態１と異なる。 FIG. 12 is a block diagram showing a functional configuration of the second device 4 according to the third embodiment of the present disclosure. The configuration of the second device 4 according to the third embodiment is the same as that of the first embodiment. However, it differs from the first embodiment in that the processing result of the emotion / concentration processing unit 54 is input to the second transmission unit 49.

感情・集中力処理部５４は、分離部５１から受けた第１ユーザの感情の種類ごとの感情スコアと、集中度とに基づいて、第１ユーザに発言を促すか否かを決定する。例えば、感情・集中力処理部５４は、集中度と所定の閾値とを比較し、第１ユーザに発言を促すか否かを決定する。より具体的には、感情・集中力処理部５４は、集中度が所定の閾値（例えば、３０）未満の第１ユーザに対して発言を促すことを決定する。 The emotion / concentration processing unit 54 determines whether or not to prompt the first user to speak based on the emotion score for each type of emotion of the first user received from the separation unit 51 and the degree of concentration. For example, the emotion / concentration processing unit 54 compares the degree of concentration with a predetermined threshold value and determines whether or not to prompt the first user to speak. More specifically, the emotion / concentration processing unit 54 determines to prompt the first user whose concentration is less than a predetermined threshold value (for example, 30) to speak.

また、感情・集中力処理部５４は、所定の種類の感情と所定の閾値とを比較し、第１ユーザに発言を促すか否かを決定してもよい。例えば、感情・集中力処理部５４は、怒りの感情が所定の閾値（例えば、９０）以上の第１ユーザに対して発言を促すことを決定する。また、感情・集中力処理部５４は、喜びの感情が所定の閾値（例えば、３０）未満の第１ユーザに対して発言を促すことを決定する。 Further, the emotion / concentration processing unit 54 may compare a predetermined type of emotion with a predetermined threshold value to determine whether or not to prompt the first user to speak. For example, the emotion / concentration processing unit 54 determines to prompt a first user whose emotion of anger is equal to or greater than a predetermined threshold value (for example, 90) to speak. The emotion / concentration processing unit 54 also determines to prompt a first user whose emotion of joy is less than a predetermined threshold value (for example, 30) to speak.

感情・集中力処理部５４は、発言を促す第１ユーザを特定した（第１ユーザの識別子を含む）発言促進指示信号を第２送信部４９に出力する。 The emotion / concentration processing unit 54 outputs a speech promotion instruction signal (including the identifier of the first user) that identifies the first user who prompts the speech to the second transmission unit 49.

第２送信部４９は、感情・集中力処理部５４から発言促進指示信号を受け、当該発言促進指示信号を第１装置２に送信する。 The second transmission unit 49 receives a speech promotion instruction signal from the emotion / concentration processing unit 54, and transmits the speech promotion instruction signal to the first device 2.

一方、第１装置２から発言を促進する第２ユーザを特定した発言促進指示信号が送信された場合には、第２受信部５０は、当該信号を受信し、分離部５１に出力する。 On the other hand, when a speech promotion instruction signal specifying a second user who promotes speech is transmitted from the first device 2, the second reception unit 50 receives the signal and outputs it to the separation unit 51.

分離部５１は、第２受信部５０から受けたデータに発言促進指示信号が含まれている場合には、当該信号を感情・集中力処理部５４に出力する。 When the data received from the second receiving unit 50 includes a speech promotion instruction signal, the separation unit 51 outputs the signal to the emotion / concentration processing unit 54.

感情・集中力処理部５４は、分離部５１から発言促進指示信号を受けた場合には、発言促進指示信号に示される第２ユーザに発言を促すための表示用データを作成し、表示・出力部５５に出力する。例えば、感情・集中力処理部５４は、第２ユーザのユーザ名が「Ｂ」である場合には、「Ｂさんは何か意見ありませんか？」などのようなメッセージの表示用データを作成する。 When the emotion / concentration processing unit 54 receives the speech promotion instruction signal from the separation unit 51, it creates display data for prompting the second user indicated by the signal to speak, and outputs the display data to the display / output unit 55. For example, when the user name of the second user is "B", the emotion / concentration processing unit 54 creates display data for a message such as "Does B have any opinion?".

表示・出力部５５は、感情・集中力処理部５４から表示用データを受け、当該表示用データをディスプレイに表示させる。 The display / output unit 55 receives display data from the emotion / concentration processing unit 54 and displays the display data on the display.
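
Both devices build the prompt message in the same way: the user identifier carried by the speech promotion instruction signal is resolved to a user name and inserted into a fixed phrase. A minimal sketch follows; the function name `build_prompt_message`, the `user_names` lookup table, and the fallback to the raw identifier are assumptions, while the message wording is the example given in the text.

```python
from typing import Mapping


def build_prompt_message(user_id: str, user_names: Mapping[str, str]) -> str:
    """Create the on-screen text that asks the indicated user to speak.

    If the identifier has no registered name, the identifier itself is
    shown (an assumption; the source does not cover this case).
    """
    name = user_names.get(user_id, user_id)
    return f"{name}さんは何か意見ありませんか？"
```

For a second user registered under the name 「Ｂ」, this yields the message quoted in the text, which the display / output unit then shows on the display.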

〔提供システム１の処理フロー〕 [Processing flow of providing system 1]
図１３は、本開示の実施形態３に係る提供システム１による、第１装置２から第２装置４への第１ユーザの感情及び集中度の提供処理の手順の一例を示すシーケンス図である。 FIG. 13 is a sequence diagram showing an example of a procedure for providing the emotion and concentration of the first user from the first device 2 to the second device 4 by the providing system 1 according to the third embodiment of the present disclosure.

提供システム１は、図５に示した実施形態１と同様のステップＳ１からＳ１５までの処理を実行する。 The providing system 1 executes the processes from steps S1 to S15 similar to those of the first embodiment shown in FIG.

第２装置４の感情・集中力処理部５４は、ステップＳ１０において分離された第１ユーザの感情の種類ごとの感情スコアと、集中度とに基づいて、第１ユーザに発言を促すか否かを決定し、発言を促す第１ユーザを特定した発言促進指示信号を第２送信部４９に出力する（Ｓ４１）。 Whether or not the emotion / concentration processing unit 54 of the second device 4 prompts the first user to speak based on the emotion score for each type of emotion of the first user separated in step S10 and the degree of concentration. Is determined, and a speech promotion instruction signal identifying the first user who prompts the speech is output to the second transmission unit 49 (S41).

第２装置４の第２送信部４９は、発言促進指示信号を第１装置２に送信し、第１装置２の第１受信部３０は、当該信号を受信する（Ｓ４２）。 The second transmission unit 49 of the second device 4 transmits a speech promotion instruction signal to the first device 2, and the first reception unit 30 of the first device 2 receives the signal (S42).

第１装置２の分離部３１は、ステップＳ４２において受信された発言促進指示信号を感情・集中力処理部３４に出力し、感情・集中力処理部３４は、発言促進指示信号に示される第１ユーザに発言を促すための表示用データを作成し、表示・出力部３５に出力する。表示・出力部３５は、感情・集中力処理部３４から表示用データを受け、当該表示用データをディスプレイに表示させる（Ｓ４３）。 The separation unit 31 of the first device 2 outputs the speech promotion instruction signal received in step S42 to the emotion / concentration processing unit 34, and the emotion / concentration processing unit 34 creates display data for prompting the first user indicated by the speech promotion instruction signal to speak and outputs it to the display / output unit 35. The display / output unit 35 receives the display data from the emotion / concentration processing unit 34 and displays it on the display (S43).

図１４は、本開示の実施形態３に係る提供システム１による、第２装置４から第１装置２への第２ユーザの感情及び集中度の提供処理の手順の一例を示すシーケンス図である。 FIG. 14 is a sequence diagram showing an example of a procedure for providing the emotion and concentration of the second user from the second device 4 to the first device 2 by the providing system 1 according to the third embodiment of the present disclosure.

提供システム１は、図６に示した実施形態１と同様のステップＳ２１からＳ３５までの処理を実行する。 The providing system 1 executes the same processes from steps S21 to S35 as in the first embodiment shown in FIG.

第１装置２の感情・集中力処理部３４は、ステップＳ３０において分離された第２ユーザの感情の種類ごとの感情スコアと、集中度とに基づいて、第２ユーザに発言を促すか否かを決定し、発言を促す第２ユーザを特定した発言促進指示信号を第１送信部２９に出力する（Ｓ４４）。 Whether or not the emotion / concentration processing unit 34 of the first device 2 prompts the second user to speak based on the emotion score for each type of emotion of the second user separated in step S30 and the degree of concentration. Is determined, and a speech promotion instruction signal identifying the second user who prompts the speech is output to the first transmission unit 29 (S44).

第１装置２の第１送信部２９は、発言促進指示信号を第２装置４に送信し、第２装置４の第２受信部５０は、当該信号を受信する（Ｓ４５）。 The first transmission unit 29 of the first device 2 transmits a speech promotion instruction signal to the second device 4, and the second reception unit 50 of the second device 4 receives the signal (S45).

第２装置４の分離部５１は、ステップＳ４５において受信された発言促進指示信号を感情・集中力処理部５４に出力し、感情・集中力処理部５４は、発言促進指示信号に示される第２ユーザに発言を促すための表示用データを作成し、表示・出力部５５に出力する。表示・出力部５５は、感情・集中力処理部５４から表示用データを受け、当該表示用データをディスプレイに表示させる（Ｓ４６）。 The separation unit 51 of the second device 4 outputs the speech promotion instruction signal received in step S45 to the emotion / concentration processing unit 54, and the emotion / concentration processing unit 54 creates display data for prompting the second user indicated by the speech promotion instruction signal to speak and outputs it to the display / output unit 55. The display / output unit 55 receives the display data from the emotion / concentration processing unit 54 and displays it on the display (S46).

〔実施形態３の効果等〕 [Effects of Embodiment 3 and the like]
実施形態３によると、例えば、第２ユーザが第１ユーザの発話内容に対して否定的な感情を抱いていたり、第２ユーザが集中していない場合などに、第２ユーザに発言を促すことができる。同様に、第１ユーザが第２ユーザの発話内容に対して否定的な感情を抱いていたり、第１ユーザが集中していない場合などに、第１ユーザに発言を促すことができる。これにより、議論を有意義なものとし、ユーザ同士の円滑なコミュニケーションを支援することができる。 According to the third embodiment, the second user can be prompted to speak when, for example, the second user has negative feelings about what the first user is saying or is not concentrating. Similarly, the first user can be prompted to speak when the first user has negative feelings about what the second user is saying or is not concentrating. This makes the discussion meaningful and supports smooth communication between users.

なお、第１装置２の感情・集中力処理部３４は、第２ユーザに発言を促すか否かを決定したが、第１ユーザに発言を促すか否かを決定してもよい。つまり、感情・集中力処理部３４は、感情・集中力判断部２８から、第１ユーザの感情スコア及び集中度を取得し、取得した感情スコア及び集中度に基づいて、第１ユーザに発言を促すか否かを決定する。感情・集中力処理部３４は、決定した結果に基づいて、第１ユーザに発言を促すための表示用データを作成し、表示・出力部３５に出力する。 In the above description, the emotion / concentration processing unit 34 of the first device 2 determines whether or not to prompt the second user to speak, but it may instead determine whether or not to prompt the first user to speak. That is, the emotion / concentration processing unit 34 acquires the emotion score and concentration of the first user from the emotion / concentration determination unit 28, and determines, based on the acquired emotion score and concentration, whether or not to prompt the first user to speak. Based on the result, the emotion / concentration processing unit 34 creates display data for prompting the first user to speak and outputs it to the display / output unit 35.

同様に、第２装置４の感情・集中力処理部５４は、第２ユーザに発言を促すかを決定してもよい。つまり、感情・集中力処理部５４は、感情・集中力判断部４８から、第２ユーザの感情スコア及び集中度を取得し、取得した感情スコア及び集中度に基づいて、第２ユーザに発言を促すか否かを決定する。感情・集中力処理部５４は、決定した結果に基づいて、第２ユーザに発言を促すための表示用データを作成し、表示・出力部５５に出力する。 Similarly, the emotion / concentration processing unit 54 of the second device 4 may determine whether or not to prompt the second user to speak. That is, the emotion / concentration processing unit 54 acquires the emotion score and concentration of the second user from the emotion / concentration determination unit 48, and determines, based on the acquired emotion score and concentration, whether or not to prompt the second user to speak. Based on the result, the emotion / concentration processing unit 54 creates display data for prompting the second user to speak and outputs it to the display / output unit 55.

＜実施形態の変形例＞ <Modified example of the embodiment>
上述の実施形態において、会議に参加する第１ユーザ及び第２ユーザのそれぞれについて、第１ユーザと第２ユーザとの対話における各ユーザの貢献度を算出してもよい。 In the above-described embodiment, for each of the first user and the second user who participate in the conference, the degree of contribution of each user in the dialogue between the first user and the second user may be calculated.

例えば、図２又は図１１に示した第１装置２の構成において、感情・集中力判断部２８は、第１ユーザの感情の種類ごとの感情スコアと集中度とに基づいて第１ユーザの貢献度を算出してもよい。例えば、感情・集中力判断部２８は、会議における第１ユーザの平均の集中度を算出し、平均集中度が大きい程、値が大きくなるような変換式に従い貢献度を算出してもよい。感情・集中力判断部２８は、算出した第１ユーザの貢献度を、第１ユーザの識別子とともに感情・集中力ＤＢ５に書き込む。 For example, in the configuration of the first device 2 shown in FIG. 2 or FIG. 11, the emotion / concentration determination unit 28 contributes to the first user based on the emotion score and the degree of concentration for each type of emotion of the first user. The degree may be calculated. For example, the emotion / concentration determination unit 28 may calculate the average concentration of the first user in the meeting, and calculate the contribution according to a conversion formula such that the larger the average concentration is, the larger the value is. The emotion / concentration determination unit 28 writes the calculated contribution degree of the first user in the emotion / concentration DB 5 together with the identifier of the first user.

同様に、図４又は図１２に示した第２装置４の構成において、感情・集中力判断部４８が、第２ユーザの貢献度を算出し、算出結果を感情・集中力ＤＢ５に書き込んでもよい。 Similarly, in the configuration of the second device 4 shown in FIG. 4 or FIG. 12, the emotion / concentration determination unit 48 may calculate the contribution degree of the second user and write the calculation result in the emotion / concentration DB 5.

また、図７に示した第１装置２の構成において、感情・集中力判断部２８が、第２ユーザの貢献度を算出し、算出結果を感情・集中力ＤＢ５に書き込んでもよい。 Further, in the configuration of the first device 2 shown in FIG. 7, the emotion / concentration determination unit 28 may calculate the contribution degree of the second user and write the calculation result in the emotion / concentration DB5.

また、図８に示した第２装置４の構成において、感情・集中力判断部４８が、第１ユーザの貢献度を算出し、算出結果を感情・集中力ＤＢ５に書き込んでもよい。 Further, in the configuration of the second device 4 shown in FIG. 8, the emotion / concentration determination unit 48 may calculate the contribution degree of the first user and write the calculation result in the emotion / concentration DB 5.

本変形例によると、ユーザの感情及び集中度の少なくとも一方の判断結果に基づいて、ユーザの対話における貢献度を算出することができる。例えば、対話に集中していたユーザの貢献度を高く算出したり、軽蔑や嫌悪の感情が低く、喜びや驚きの感情が高いユーザの貢献度を高く算出したりすることが可能である。 According to this modification, the degree of contribution in the user's dialogue can be calculated based on the judgment result of at least one of the user's emotion and the degree of concentration. For example, it is possible to calculate a high degree of contribution of a user who has concentrated on dialogue, or to calculate a high degree of contribution of a user who has low feelings of contempt and disgust and high feelings of joy and surprise.
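
The concentration-based contribution described in this modification can be sketched as below. The text only requires a conversion formula whose value grows with the average concentration; the linear mapping, the clamping to a 0 to 100 range, and the names `contribution_from_concentration` and `gain` are assumptions for illustration.

```python
from statistics import fmean
from typing import Sequence


def contribution_from_concentration(samples: Sequence[float],
                                    gain: float = 1.0) -> float:
    """Map a user's average concentration over a meeting to a contribution.

    The mapping is non-decreasing in the average concentration, which is
    the only property the text asks of the conversion formula.
    """
    if not samples:
        raise ValueError("at least one concentration sample is required")
    if gain <= 0:
        raise ValueError("gain must be positive so the mapping stays increasing")
    average = fmean(samples)
    # Clamp to an assumed 0-100 contribution scale.
    return max(0.0, min(100.0, gain * average))
```

An emotion-based term (for example, lowering the value for high contempt or disgust scores and raising it for joy or surprise) could be added in the same way, but its form is not specified in the text.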

［付記］ [Additional Notes]
以上、本開示の実施形態に係る提供システム１について説明したが、本開示は、この実施形態に限定されるものではない。 Although the providing system 1 according to the embodiment of the present disclosure has been described above, the present disclosure is not limited to this embodiment.

上記各装置は、複数のコンピュータにより実現されてもよい。 Each of the above devices may be realized by a plurality of computers.

上記各装置の一部又は全部の機能がクラウドコンピューティングによって提供されてもよい。つまり、各装置の一部又は全部の機能がクラウドサーバにより実現されていてもよい。 Some or all the functions of each of the above devices may be provided by cloud computing. That is, some or all the functions of each device may be realized by the cloud server.

さらに、上記実施形態及び上記変形例の少なくとも一部を任意に組み合わせてもよい。 Further, at least a part of the above-described embodiment and the above-described modification may be arbitrarily combined.

今回開示された実施形態はすべての点で例示であって制限的なものではないと考えられるべきである。本開示の範囲は、上記した意味ではなく、特許請求の範囲によって示され、特許請求の範囲と均等の意味及び範囲内でのすべての変更が含まれることが意図される。 The embodiments disclosed this time should be considered to be exemplary in all respects and not restrictive. The scope of the present disclosure is indicated by the scope of claims, not the above-mentioned meaning, and is intended to include all modifications within the meaning and scope equivalent to the scope of claims.

１提供システム
２第１装置
３ネットワーク
４第２装置
５感情・集中力ＤＢ
２１映像取得部（第１取得部）
２２映像符号化部
２３映像解析部（判断部）
２４音声取得部（第１取得部）
２５音声符号化部
２６音声解析部（判断部）
２７多重化部
２８感情・集中力判断部（判断部）
２９第１送信部（提供部）
３０第１受信部
３１分離部
３２映像復号化部（映像取得部）
３３音声復号化部（音声取得部）
３４感情・集中力処理部（発言促進部、算出部）
３５表示・出力部（提供部、第１出力部）
４１映像取得部（映像取得部、第２取得部）
４２映像符号化部
４３映像解析部（判断部）
４４音声取得部（音声取得部、第２取得部）
４５音声符号化部
４６音声解析部（判断部）
４７多重化部
４８感情・集中力判断部（判断部）
４９第２送信部（提供部）
５０第２受信部
５１分離部
５２映像復号化部
５３音声復号化部
５４感情・集中力処理部
５５表示・出力部（第２出力部）
６０映像表示領域
６１感情履歴通知領域
７１Ａユーザ
７１Ｂユーザ
７１Ｃユーザ
７２Ａアイコン
７２Ｂアイコン
７２Ｃアイコン
７３Ａ判断結果
７３Ｂ判断結果
７３Ｃ判断結果
1 Providing system
2 First device
3 Network
4 Second device
5 Emotion / concentration DB
21 Video acquisition unit (first acquisition unit)
22 Video coding unit
23 Video analysis unit (determination unit)
24 Voice acquisition unit (first acquisition unit)
25 Voice coding unit
26 Voice analysis unit (determination unit)
27 Multiplexing unit
28 Emotion / concentration determination unit (determination unit)
29 First transmission unit (providing unit)
30 First receiving unit
31 Separation unit
32 Video decoding unit (video acquisition unit)
33 Voice decoding unit (voice acquisition unit)
34 Emotion / concentration processing unit (speech promotion unit, calculation unit)
35 Display / output unit (providing unit, first output unit)
41 Video acquisition unit (video acquisition unit, second acquisition unit)
42 Video coding unit
43 Video analysis unit (determination unit)
44 Voice acquisition unit (voice acquisition unit, second acquisition unit)
45 Voice coding unit
46 Voice analysis unit (determination unit)
47 Multiplexing unit
48 Emotion / concentration determination unit (determination unit)
49 Second transmission unit (providing unit)
50 Second receiving unit
51 Separation unit
52 Video decoding unit
53 Voice decoding unit
54 Emotion / concentration processing unit
55 Display / output unit (second output unit)
60 Video display area
61 Emotion history notification area
71A User
71B User
71C User
72A Icon
72B Icon
72C Icon
73A Determination result
73B Determination result
73C Determination result

Claims

A providing system comprising: a video acquisition unit that acquires a video of a second user, the video being obtained by photographing the second user listening to the voice of a first user who is the speaker;
a determination unit that determines at least one of the emotion and the degree of concentration of the second user based on the acquired video of the second user; and
a providing unit that provides a determination result by the determination unit to the first user.

The providing system according to claim 1, further comprising a voice acquisition unit that acquires the voice of the second user,
wherein the determination unit determines at least one of the emotion and the degree of concentration of the second user based on the acquired video and voice of the second user.

The providing system includes a first device and a second device connected to each other via a network.
The first device is
The first acquisition unit that acquires the audio and video of the first user, and
A first transmission unit that transmits the acquired audio and video of the first user to the second device, and
A first receiving unit that receives audio and video of the second user from the second device, and
The first output unit as the providing unit, which outputs the received audio and video of the second user and the determination result by the determination unit, is included.
The second device is
A second acquisition unit, serving as the audio acquisition unit and the video acquisition unit, that acquires the audio and video of the second user, and
A second transmission unit that transmits the acquired audio and video of the second user to the first device, and
A second receiving unit that receives the audio and video of the first user from the first device, and
The providing system according to claim 2, further comprising a second output unit that outputs the received audio and video of the first user.

The providing system according to claim 3, wherein the determination unit is provided in the first device and determines at least one of the emotion and the degree of concentration of the second user based on the audio and video of the second user received by the first receiving unit.

The providing system includes a first device and a second device connected to each other via a network.
The first device is
The first acquisition unit that acquires the voice of the first user, and
A first transmission unit that transmits the acquired voice of the first user to the second device, and
A first receiving unit that receives the voice of the second user from the second device, and
The first output unit as the providing unit, which outputs the received voice of the second user and the determination result by the determination unit, is included.
The second device is
A second acquisition unit, serving as the audio acquisition unit and the video acquisition unit, that acquires the audio and video of the second user, and
A second transmission unit that transmits the acquired voice of the second user to the first device, and
A second receiving unit that receives the voice of the first user from the first device, and
Including a second output unit that outputs the received voice of the first user.
The determination unit determines at least one of the emotion and concentration of the second user based on the audio and video of the second user acquired by the second acquisition unit, which is provided in the second device.
The second transmission unit further transmits the determination result by the determination unit.
The first receiving unit further receives the determination result by the determination unit, and receives the determination result.
The providing system according to claim 2, wherein the first output unit outputs a determination result by the determination unit received by the first receiving unit.
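
By way of non-limiting illustration only, the data flow of the preceding claim (determination performed in the second device, with only the voice and the determination result sent to the first device) might be sketched as follows. The names Determination, Packet, determine, second_device_send, and first_device_output, the emotion label, and the 0.0 to 1.0 concentration scale are all hypothetical and do not appear in the claims; determine is a placeholder that performs no actual analysis.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Determination:
    emotion: Optional[str]          # hypothetical label, e.g. "neutral"
    concentration: Optional[float]  # hypothetical scale from 0.0 to 1.0


@dataclass
class Packet:
    audio: bytes                  # voice of the second user
    determination: Determination  # result produced on the second device


def determine(audio: bytes, video: bytes) -> Determination:
    """Placeholder for the determination unit; returns a fixed result."""
    return Determination(emotion="neutral", concentration=0.5)


def second_device_send(audio: bytes, video: bytes) -> Packet:
    # The video is used locally for the determination and is not placed
    # in the packet; only the voice and the result are transmitted.
    return Packet(audio=audio, determination=determine(audio, video))


def first_device_output(packet: Packet) -> str:
    # Stand-in for the first output unit: presents voice and result together.
    d = packet.determination
    return (f"audio={len(packet.audio)} bytes, "
            f"emotion={d.emotion}, concentration={d.concentration}")
```

In this sketch the first device never receives the video of the second user, which mirrors the claim wording that the second transmission unit transmits the voice and the determination result.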

The providing system according to any one of claims 3 to 5, wherein the second device further includes a speech promotion unit that prompts the second user to speak based on the determination result of at least one of the emotion and the degree of concentration of the second user.

The providing system according to any one of claims 3 to 6, further comprising a calculation unit that calculates the degree of contribution of the second user to the dialogue between the first user and the second user based on the determination result by the determination unit.
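
By way of non-limiting illustration only, a calculation unit of this kind might be sketched as follows. The claim does not state how the degree of contribution is computed; this sketch assumes, purely for illustration, that it is the mean of the concentration values determined during the dialogue. The function name contribution_degree and the 0.0 to 1.0 scale are hypothetical.

```python
from typing import Sequence


def contribution_degree(concentrations: Sequence[float]) -> float:
    """Assumed formula: mean of the determined concentration values.

    Returns 0.0 when no determination results are available.
    """
    if not concentrations:
        return 0.0
    return sum(concentrations) / len(concentrations)
```

Other formulas, for example ones that also weight the determined emotion, would fit the claim wording equally well.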

The determination unit further corrects the determination result of at least one of the emotion and the degree of concentration of the second user based on the history of determination results for at least one of the emotion and the degree of concentration of the second user. The providing system according to any one of claims 7.
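
By way of non-limiting illustration only, a history-based correction might be sketched as follows. The claim does not state the correction method; this sketch assumes a simple moving average over the most recent concentration values. The class name HistoryCorrector and the window parameter are hypothetical.

```python
from collections import deque


class HistoryCorrector:
    """Assumed correction: moving average over the last `window` results."""

    def __init__(self, window: int = 3) -> None:
        # A bounded deque discards the oldest result automatically.
        self._history = deque(maxlen=window)

    def correct(self, concentration: float) -> float:
        # Record the new raw result, then return the average of the
        # results currently held in the history.
        self._history.append(concentration)
        return sum(self._history) / len(self._history)
```

Averaging against the history damps momentary swings in a single determination result, which is one plausible reason for correcting against past results.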

A providing method including:
A step of acquiring video of a second user, obtained by photographing the second user who views and listens to the audio and video of a first user who is the speaker,
A step of determining at least one of the emotion and the degree of concentration of the second user based on the acquired video of the second user, and
A step of providing the determination result of the determining step to the first user.

A providing device including:
A video acquisition unit that acquires video of a second user, obtained by photographing the second user who views and listens to the audio and video of a first user who is the speaker,
A determination unit that determines at least one of the emotion and the degree of concentration of the second user based on the acquired video of the second user, and
A providing unit that provides the determination result by the determination unit to the first user.

A computer program for causing a computer to function as:
A video acquisition unit that acquires video of a second user, obtained by photographing the second user who views and listens to the audio and video of a first user who is the speaker,
A determination unit that determines at least one of the emotion and the degree of concentration of the second user based on the acquired video of the second user, and
A providing unit that provides the determination result by the determination unit to the first user.