JP2023047956A

JP2023047956A - Information processing device, information processing method, and information processing program

Info

Publication number: JP2023047956A
Application number: JP2021157164A
Authority: JP
Inventors: 征幸上村; Masayuki Kamimura
Original assignee: SoftBank Corp
Current assignee: SoftBank Corp
Priority date: 2021-09-27
Filing date: 2021-09-27
Publication date: 2023-04-06
Anticipated expiration: 2041-09-27
Also published as: JP7292343B2

Abstract

To improve usability in remote conferences. SOLUTION: An information processing program according to the present application causes a computer to execute: a calculation procedure for calculating, for each of a plurality of speakers in a remote conference in which a plurality of participants participate, a listening degree indicating the degree to which a listener hearing the speech of the plurality of speakers is listening to the speech of that speaker; and an auxiliary control procedure for providing an auxiliary function that makes the speech of a low-attentive speaker, whose listening degree is low, easier for the listener to hear compared with a high-attentive speaker, whose listening degree is high. SELECTED DRAWING: Figure 7

Description

The present invention relates to an information processing device, an information processing method, and an information processing program.

Conventionally, technologies are known for remote conference (teleconference) systems in which people at remote locations hold a conference online over a telephone line or an Internet connection. For example, in a teleconference system in which a conference is held between a plurality of sites by remote calls, a technique is known in which each conference participant's side is provided with rendering processing means for arbitrarily setting the sound image position of each speaker heard on the receiving side.

Japanese Unexamined Patent Application Publication No. 2006-279492 (JP-A-2006-279492)

However, the conventional technology described above does not necessarily improve usability in a remote conference. For example, it merely allows the voice of each speaker to be freely placed at a virtual speaker position on the receiving side. As a result, among a plurality of speakers, it may be difficult for the listener to adequately hear a speaker to whom the listener is paying relatively little attention. The conventional technology therefore cannot always be said to offer high usability in a remote conference.

An information processing program according to an embodiment causes a computer to execute: a calculation procedure for calculating, for each of a plurality of speakers in a remote conference in which a plurality of participants participate, a listening degree indicating the degree to which a listener hearing the speech of the plurality of speakers is listening to the speech of that speaker; and an auxiliary control procedure for providing an auxiliary function that makes the speech of a low-attentive speaker, whose listening degree is low, easier for the listener to hear compared with a high-attentive speaker, whose listening degree is high.

FIG. 1 is a diagram showing a configuration example of an information processing system according to an embodiment.
FIG. 2 is a diagram showing a configuration example of an information processing device according to the embodiment.
FIG. 3 is a diagram showing an example of a screen according to the embodiment.
FIG. 4 is a diagram showing an example of a three-dimensional arrangement of multiple voices according to the embodiment.
FIG. 5 is a diagram for explaining the listening degree according to the embodiment.
FIG. 6 is a diagram showing an example of an auxiliary function according to the embodiment.
FIG. 7 is a diagram showing an example of an auxiliary function according to the embodiment.
FIG. 8 is a diagram showing an information processing procedure according to the embodiment.
FIG. 9 is a diagram showing an example of an auxiliary function according to a modification.
FIG. 10 is a diagram showing an example of an auxiliary function according to a modification.
FIG. 11 is a diagram showing an example of an auxiliary function according to a modification.
FIG. 12 is a hardware configuration diagram showing an example of a computer that implements the functions of the information processing device.

Modes for carrying out the information processing device, information processing method, and information processing program according to the present application (hereinafter referred to as "embodiments") are described in detail below with reference to the drawings. These embodiments do not limit the information processing device, information processing method, and information processing program according to the present application. In the following embodiments, the same parts are denoted by the same reference numerals, and redundant description is omitted.

(Embodiment)
[1. Configuration example of information processing system]
FIG. 1 is a diagram showing a configuration example of an information processing system 1 according to the embodiment. The information processing system 1 includes an information processing device 100 used by a user of a remote conference service and a distribution server 200 that provides the remote conference service. The information processing device 100 and the distribution server 200 are communicably connected, by wire or wirelessly, via a predetermined network N. The information processing system 1 shown in FIG. 1 may include any number of information processing devices 100 and any number of distribution servers 200. In the following, a web conference (also called an online conference) is described as an example of a remote conference.

The information processing device 100 is an information processing device used by a user of the web conference service. The information processing device 100 is realized by, for example, a smartphone, a tablet terminal, a notebook PC, a desktop PC, a mobile phone, or a PDA (Personal Digital Assistant). In the following, among the users of the web conference service, a user who is participating in a given web conference is referred to as a "participant".

An application for using the web conference service (hereinafter also referred to as the "web conference application") is installed on the information processing device 100. The information processing device 100 issues participant identification information (for example, an ID) that identifies the user who installed the web conference application (who later becomes a participant). Together with the participant identification information, the information processing device 100 transmits participant information, which is basic information about the participant (the participant's personal information, the participant's role in the conference ("presenter", "participant", etc.), information about the device and application, IP address, set keywords, etc.), to the distribution server 200 as metadata.

The distribution server 200 is a server device that provides the web conference service. Specifically, the distribution server 200 receives the participant information of each of the plurality of participants from that participant's information processing device 100. The distribution server 200 then generates conference metadata that aggregates the participant information of all participants and transmits the generated conference metadata to the information processing device 100 of each participant. Whenever participant information is updated, the distribution server 200 generates updated conference metadata and transmits it to the information processing device 100 of each participant.
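The aggregation described above can be illustrated with a minimal sketch. The class name `ConferenceMetadataStore`, the `version` counter, and the dictionary layout are assumptions made for illustration; the description does not specify a data format for the conference metadata.

```python
from dataclasses import dataclass, field


@dataclass
class ConferenceMetadataStore:
    """Hypothetical server-side store; names and layout are assumptions."""

    participants: dict = field(default_factory=dict)
    version: int = 0

    def upsert(self, participant_id: str, info: dict) -> dict:
        """Register or update one participant, then rebuild the metadata."""
        self.participants[participant_id] = dict(info)
        self.version += 1  # assumed counter so clients can spot updates
        return self.snapshot()

    def snapshot(self) -> dict:
        """Conference metadata: every participant's information in one record."""
        return {
            "version": self.version,
            "participants": {
                pid: dict(info) for pid, info in sorted(self.participants.items())
            },
        }


store = ConferenceMetadataStore()
store.upsert("U1", {"role": "presenter"})
metadata = store.upsert("U2", {"role": "participant"})
```

In this sketch, each call to `upsert` returns the snapshot that would be sent to every participant's device, which mirrors the regenerate-and-resend behaviour described for updates.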

The distribution server 200 also acquires, based on the participant information of all participants, each participant's name and affiliation (company name, department, etc.) as set keywords. In addition, the distribution server 200 acquires, based on the participant information of all participants, keywords preset by the participants as set keywords. A set keyword is not limited to a single word and may be a phrase or sentence. The distribution server 200 further acquires words that occur frequently in conferences and attention-getting phrases (such as "Do you have a moment?") as set keywords. The set keywords may also be acquired from the result of machine learning in which a learning unit provided in the distribution server 200 uses keywords set in past web conferences as training data. Having acquired the set keywords, the distribution server 200 generates a keyword list that associates each participant's set keywords with that participant's identification information, and transmits the generated keyword list to the information processing device 100 of each participant.
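As a rough sketch of how such a keyword list might be assembled, the following assumes that participant information is a dictionary with the hypothetical fields `name`, `affiliations`, and `preset_keywords`. Attaching shared keywords (frequent words, attention-getting phrases) to every participant is also an assumption; the description only states that set keywords are associated with participant identification information.

```python
def build_keyword_list(participant_info, shared_keywords=()):
    """Map each participant ID to its set keywords (hypothetical layout)."""
    keyword_list = {}
    for participant_id, info in participant_info.items():
        candidates = [
            info.get("name"),
            *info.get("affiliations", []),
            *info.get("preset_keywords", []),
            *shared_keywords,  # assumption: shared phrases apply to everyone
        ]
        keywords = []
        for keyword in candidates:
            # Skip missing values and duplicates while preserving order.
            if keyword and keyword not in keywords:
                keywords.append(keyword)
        keyword_list[participant_id] = keywords
    return keyword_list


participant_info = {
    "U1": {"name": "Sato", "affiliations": ["Sales"], "preset_keywords": ["budget"]},
    "U2": {"name": "Suzuki"},
}
keyword_list = build_keyword_list(participant_info, ["Do you have a moment?"])
```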

The information processing device 100 also includes devices having various sensor functions, such as a camera, a microphone, and a speaker. In the following, the participant who is using a given information processing device 100 may be referred to as the "principal". For example, the information processing device 100 transmits audio data of the participant's (principal's) voice detected by the microphone and image data of the participant (principal) captured by the camera to the distribution server 200 together with the participant identification information. The following describes the case where the image data is video (also called a moving image). The image data may also include still images.

The distribution server 200 also receives audio data of the voice of each of the participants in the web conference from that participant's information processing device 100, and transmits the received audio data to the information processing devices 100 of the participants other than the participant (principal) who sent it. Likewise, the distribution server 200 receives image data of each participant from that participant's information processing device 100, and transmits the received image data to the information processing devices 100 of the participants other than the participant (principal) who sent it. When the image data of a participant (principal) is not distributed, the distribution server 200 transmits to the other participants' information processing devices 100 either default image data (for example, image data containing characters showing the participant's (principal's) name or initials) or image data registered through the participant's (principal's) settings.
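The relay rule and the image fallback can be sketched as follows. The function names are hypothetical, images are represented by placeholder values, and preferring a registered image over the default image is an assumption, since the description lists the two only as alternatives.

```python
def relay_targets(sender_id, participant_ids):
    """Return every participant except the one who sent the data."""
    return [pid for pid in participant_ids if pid != sender_id]


def select_outgoing_image(camera_frame, registered_image, display_name):
    """Choose the image relayed to other participants (assumed priority)."""
    if camera_frame is not None:
        return camera_frame
    if registered_image is not None:
        return registered_image
    # Fall back to a placeholder standing in for an image of the initials.
    initials = "".join(part[0].upper() for part in display_name.split())
    return f"<default image: {initials}>"


targets = relay_targets("U1", ["U1", "U2", "U3"])
fallback = select_outgoing_image(None, None, "Taro Yamada")
```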

The information processing device 100 also has a screen and displays on it the images of the other participants received from the distribution server 200. Specifically, the information processing device 100 displays on the screen a full-screen image containing the participant image of each of the participants in the web conference. The information processing device 100 also displays the participant image of each participant in a different display area of the screen.

The information processing device 100 also includes, for example, a plurality of loudspeakers, and outputs the voices of the other participants received from the distribution server 200 from the loudspeakers. Specifically, the information processing device 100 outputs the voice of each of the plurality of speakers (speaking participants) from the loudspeakers so that each voice is heard as coming from one of a plurality of sound sources placed at different positions. The number of speaking participants and the number of loudspeakers may differ. More specifically, the positional relationship between the sound images of the individual speakers is localized within the sound field produced by the loudspeakers. For example, even for a localization that is heard as coming from the right loudspeaker, the listener can be made to perceive the direction from which the sound arrives by also outputting the sound from the left loudspeaker at a greatly reduced volume or with a delay. The number of loudspeakers may therefore be smaller than the number of speaking participants, or it may be larger. By placing the sound sources of the speakers' voices at three-dimensionally different positions in this way, the information processing device 100 makes it easier for the listener to tell the voices of the individual speakers apart. Instead of a plurality of loudspeakers, the information processing device 100 may include earphones (headphones) and output the voices of the other participants received from the distribution server 200 from the earphones (headphones).
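The level-difference and delay cues mentioned above can be illustrated with a small two-channel sketch. It uses constant-power panning plus a fixed maximum inter-channel delay, which is a common textbook technique substituted here for illustration and not the rendering method of the embodiment. The function names, the azimuth convention (-90 is fully left, +90 is fully right), and the 0.6 ms maximum delay are all assumptions.

```python
import math


def pan_gains(azimuth_deg):
    """Constant-power (left, right) gains for an azimuth in [-90, +90]."""
    azimuth_deg = max(-90.0, min(90.0, azimuth_deg))
    theta = (azimuth_deg + 90.0) / 180.0 * (math.pi / 2.0)
    return math.cos(theta), math.sin(theta)


def render_stereo(mono, azimuth_deg, sample_rate=48_000, max_delay_s=0.0006):
    """Place a mono voice in the stereo field using level and time differences."""
    azimuth_deg = max(-90.0, min(90.0, azimuth_deg))
    left_gain, right_gain = pan_gains(azimuth_deg)
    # The channel farther from the virtual source is delayed by a few samples.
    delay = round(abs(azimuth_deg) / 90.0 * max_delay_s * sample_rate)
    pad = [0.0] * delay
    left = [sample * left_gain for sample in mono]
    right = [sample * right_gain for sample in mono]
    if azimuth_deg > 0:  # source on the right: the left channel arrives later
        return pad + left, right + pad
    return left + pad, pad + right  # source on the left or centred


left, right = render_stereo([1.0, 0.5], azimuth_deg=45.0, sample_rate=10_000)
```

With several speakers, one such stereo pair per voice could be summed sample by sample, so two loudspeakers can carry more than two distinct virtual positions.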

For example, at the start of a conference, the information processing device 100 determines, based on the conference metadata, the sound source position (also called localization), the volume, and whether to apply voice processing (reverberation processing, etc.) for the voice of each of the plurality of speakers. During the conference, the information processing device 100 determines the localization, volume, and presence or absence of voice processing for each speaker's voice based on a listening degree, which indicates the degree to which the listener hearing the speech of the plurality of speakers is listening to each speaker. The information processing device 100 then outputs the voice of each speaker so that it is heard from the localization determined for that voice, at the volume and with the voice processing determined for that voice. Details of the listening degree are described later. In addition, the information processing device 100 changes the degree of separation between the voices (localization separation, volume, and how pronounced the reverberation is) according to the user's application settings.

In the following, the participant identified by the participant ID "U1" may be referred to as "participant U1". Likewise, "participant U*" (where * is an arbitrary number) denotes the participant identified by the participant ID "U*". For example, "participant U2" is the participant identified by the participant ID "U2".

In the following, the information processing device 100 is also described as information processing devices 100-1 and 100-2 according to the participant who uses it. For example, the information processing device 100-1 is the information processing device 100 used by participant U1, and the information processing device 100-2 is the information processing device 100 used by participant U2. When the information processing devices 100-1 and 100-2 are described without distinction, they are referred to as the information processing device 100.

Although the embodiment above describes the case where the remote conference is a web conference, the remote conference according to this embodiment is not limited to a web conference. For example, it may be a video conference or a telephone conference.

[2. Configuration example of information processing device]
FIG. 2 is a diagram showing a configuration example of the information processing device 100 according to the embodiment. As shown in FIG. 2, the information processing device 100 includes a communication unit 110, a storage unit 120, a display unit 130, an audio output unit 140, a detection unit 150, and a control unit 160. The information processing device 100 may also have an input unit (for example, a keyboard or a mouse) that receives various operations from the user of the information processing device 100.

(Communication unit 110)
The communication unit 110 is realized by, for example, a NIC (Network Interface Card). The communication unit 110 is connected to the network N (not shown) by wire or wirelessly, and transmits and receives information to and from, for example, the distribution server 200 and other information processing devices 100.

(Storage unit 120)
The storage unit 120 is realized by, for example, a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or a storage device such as a hard disk or an optical disk. The storage unit 120 stores various programs (corresponding to an example of an information processing program). For example, the storage unit 120 stores the program of the web conference application.

The storage unit 120 also stores various data. For example, the storage unit 120 stores the conference metadata acquired by the conference control unit 161 and the proceedings progress information generated by the generation unit 163. The storage unit 120 stores the participant's audio data input to the microphone, the participant's image data captured by the camera, and recorded data of the participant's images. The storage unit 120 also stores the audio data of the other participants, the image data of the other participants, and the recorded data of the other participants' images acquired by the conference control unit 161.

(Display unit 130)
The display unit 130 is realized by an image output device such as a display. The display unit 130 displays various information under the control of the conference control unit 161 or the auxiliary control unit 165. When the information processing device 100 employs a touch panel, the input unit and the display unit 130 are integrated. In the following description, the display unit 130 may be referred to as the screen.

Specifically, the display unit 130 displays the participant image of each of the participants in the web conference in a different display area. For example, under the control of the conference control unit 161, the display unit 130 displays the participant images acquired by the conference control unit 161 in different display areas.

The display unit 130 also displays a full-screen image containing the participant image of each of the participants in the web conference. For example, under the control of the conference control unit 161, the display unit 130 displays a full-screen image containing the participant images acquired by the conference control unit 161.

(Audio output unit 140)
The audio output unit 140 is realized by an audio output device such as a loudspeaker. For example, the audio output unit 140 is realized by two loudspeakers placed to the left and right of the listener: one installed a predetermined distance to the left of the main body of the listener's information processing device 100 (hereinafter also referred to as the left speaker) and one installed a predetermined distance to the right of the main body (hereinafter also referred to as the right speaker). For example, the audio output unit 140 outputs audio in stereo through the two left and right speakers.

The audio output unit 140 also outputs the voice of each of the speakers in the web conference so that each voice is heard as coming from one of a plurality of sound sources placed at different positions. For example, under the control of the conference control unit 161, the audio output unit 140 outputs the speakers' voices acquired by the conference control unit 161 so that each is heard from a sound source at a different position. In the following description, the audio output unit 140 may be referred to as the speaker.

(Detection unit 150)
The detection unit 150 is realized by various sensor devices. For example, the detection unit 150 is realized by a sound collecting device such as a microphone, which is a sound sensor. The sound sensor collects the participant's voice and other sounds and outputs the collected audio data to the control unit 160. In the following description, the sound sensor may be referred to as the microphone.

The detection unit 150 is also realized by an imaging device such as a camera, which is an image sensor. The image sensor captures images of the participant and the like and outputs the captured image data to the control unit 160. In the following description, the image sensor may be referred to as the camera.

The detection unit 150 also detects the listener's line of sight. Specifically, the detection unit 150 detects the listener's line of sight using a known gaze detection technique. For example, the detection unit 150 identifies the direction of the listener's line of sight based on the positional relationship between the inner corner of the eye and the iris in the image of the listener acquired by the camera.

Alternatively, the detection unit 150 may include an infrared LED and an infrared camera. In that case, the detection unit 150 photographs the listener's face with the infrared camera while illuminating it with the infrared LED, and may identify the direction of the listener's line of sight based on the positional relationship between the corneal reflection and the pupil in the image of the listener acquired by the infrared camera.

(Control unit 160)
The control unit 160 is a controller. It is realized, for example, by a CPU (Central Processing Unit), an MPU (Micro Processing Unit), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array), or the like executing various programs (corresponding to an example of an information processing program) stored in a storage device inside the information processing device 100, using a storage area such as a RAM as a work area.

As shown in FIG. 2, the control unit 160 includes a conference control unit 161, a calculation unit 162, a generation unit 163, an acquisition unit 164, and an auxiliary control unit 165, and realizes or executes the information processing operations described below. The internal configuration of the control unit 160 is not limited to the configuration shown in FIG. 2 and may be any other configuration that performs the information processing described later.

（会議制御部１６１）
会議制御部１６１は、入力部を介して参加者の操作を受け付けると、Ｗｅｂ会議アプリを起動する。また、会議制御部１６１は、Ｗｅｂ会議アプリを起動すると、カメラおよびマイクを起動する。続いて、会議制御部１６１は、マイクが検出した参加者（本人）の音声に関する音声データおよびカメラが検出した参加者（本人）の画像データを参加者識別情報とともに配信サーバ２００に送信する。 (Meeting control unit 161)
The conference control unit 161 activates the web conference application upon receiving the operation of the participant via the input unit. Also, when the web conference application is activated, the conference control unit 161 activates the camera and the microphone. Subsequently, the conference control unit 161 transmits audio data relating to the voice of the participant (principal) detected by the microphone and image data of the participant (principal) detected by the camera to the distribution server 200 together with the participant identification information.

Together with the participant identification information, the conference control unit 161 also transmits participant information to the distribution server 200 as metadata. Participant information is basic information about the participant: the participant's personal information, the participant's role in the conference ("presenter", "participant", etc.), information about the device and application, the IP address, and so on.

The conference control unit 161 also acquires conference metadata from the distribution server 200. For example, it acquires the conference metadata of a web conference held by four participants: participant U11 and participants U21 to U23. On acquiring the conference metadata, the conference control unit 161 determines, based on that metadata, the layout of the participant images of the respective speakers at the start of the conference.

FIG. 3 is a diagram showing an example of a screen according to the embodiment. FIG. 3 concerns a web conference attended by four participants: participant U11 and participants U21 to U23. It shows an example of the screen of the information processing device 100-11 of participant U11 (hereinafter also referred to as listener U11), the listener who is hearing the remarks of the three speakers, participants U21 to U23 (hereinafter also referred to as speakers U21 to U23). The conference control unit 161 displays a screen such as that shown in FIG. 3 on the display unit 130-11.

In FIG. 3, the conference control unit 161 acquires from the distribution server 200 the participant images G21 to G23 of the three participants U21 to U23 in the web conference. It then displays the participant images G21 to G23 in different display areas F21 to F23 of the screen. In the example shown in FIG. 3, the conference control unit 161 determines the positions of the display areas F21 to F23, in which the participant images G21 to G23 are displayed at the start of the conference, based on the information in the conference metadata that indicates each participant's role in the conference.

For example, because the role of participant U21 is "presenter", the conference control unit 161 decides to place the participant image G21 of participant U21 in the central display area F21 at the start of the conference. The conference control unit 161 also displays, in the display area F21, an icon G211 indicating the position of the sound source of participant U21.

Likewise, because the role of the remaining participant U22 (participant U23) is "participant", the conference control unit 161 decides to place the participant image G22 (participant image G23) in the display area F22 to the right of center (the display area F23 to the left of center) at the start of the conference. It also displays, in the display area F22 (display area F23), an icon G221 (icon G231) indicating the position of the sound source of participant U22 (participant U23).

The conference control unit 161 also displays on the screen a full-screen image G11 that includes the participant images G21 to G23 of participants U21 to U23. Conference materials and the like are displayed in the areas of the full-screen image G11 other than the participant images G21 to G23, but they are not drawn in FIG. 3.

FIG. 4 is a diagram showing an example of the three-dimensional arrangement of multiple voices according to the embodiment. For the same web conference attended by participant U11 and participants U21 to U23, FIG. 4 shows an example of how the sound sources of the voices of the three speakers U21 to U23 are arranged for participant U11, the listener hearing their remarks.

In the example shown in FIG. 4, the left loudspeaker 140-11-L is placed a predetermined distance to the left of the information processing device 100-11 of listener U11, and the right loudspeaker 140-11-R is placed a predetermined distance to its right. The participant images G21 to G23 of speakers U21 to U23 are displayed in different display areas on the display unit 130-11 of the information processing device 100-11. A camera that forms part of the detection unit 150-11 is installed at the top of the display unit 130-11.

In FIG. 4, the conference control unit 161 acquires from the distribution server 200 the audio data of each of the three speakers U21 to U23 in the web conference. It then outputs the voices of speakers U21 to U23 so that each voice is heard from a separate sound source placed at a different position. In the example shown in FIG. 4, the conference control unit 161 determines three things for each speaker's voice at the start of the conference, based on the information in the conference metadata that indicates each participant's role: the position of its sound source (also called its localization), its volume, and whether audio processing (such as reverberation processing) is applied.

For example, because the role of participant U21 is "presenter", the conference control unit 161 decides to place the sound source of participant U21's voice directly in front of listener U11 (center), the position easiest for listener U11 to hear, at the start of the conference. Because the role of the remaining participant U22 (participant U23) is "participant", it decides to place the sound source of participant U22's (participant U23's) voice to the right of center (to the left of center), a position harder to hear than that of participant U21's voice.

Because the role of participant U21 is "presenter", the conference control unit 161 also decides to set the volume of participant U21's voice at the start of the conference to a high volume that is easiest for listener U11 to hear (for example, "10"). Because the role of the remaining participant U22 (participant U23) is "participant", it decides to set the volume of participant U22's (participant U23's) voice at the start of the conference lower than that of participant U21's voice (for example, "7").

It is generally known that when audio processing (for example, reverberation processing) is applied to only some of several voices, the brain finds it easier to concentrate on the voices left unprocessed. Accordingly, because the role of participant U21 is "presenter", the conference control unit 161 decides not to apply reverberation processing to participant U21's voice at the start of the conference. Because the role of the remaining participant U22 (participant U23) is "participant", it decides to apply reverberation processing to participant U22's (participant U23's) voice at the start of the conference.
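The role-based initial settings described above (sound source position, volume, and reverberation) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the names `AudioSetting` and `initial_audio_settings` are hypothetical, the volume values "10" and "7" are the examples from the text, and alternating non-presenters between right and left is an assumption made to reproduce the layout of FIG. 4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioSetting:
    position: str   # localization of the sound source: "center", "right", "left"
    volume: int     # example scale taken from the text ("10" loud, "7" quieter)
    reverb: bool    # True if reverberation processing is applied


def initial_audio_settings(roles):
    """Map each speaker ID to its initial audio setting based on role.

    `roles` maps speaker ID -> role string ("presenter" or "participant").
    Assumption: non-presenters alternate right, left, right, ...
    """
    side_positions = ["right", "left"]
    settings = {}
    side_index = 0
    for speaker_id, role in roles.items():
        if role == "presenter":
            # Presenter: front and center, loudest, no reverberation.
            settings[speaker_id] = AudioSetting("center", 10, False)
        else:
            # Other participants: off-center, quieter, with reverberation.
            position = side_positions[side_index % len(side_positions)]
            side_index += 1
            settings[speaker_id] = AudioSetting(position, 7, True)
    return settings


settings = initial_audio_settings(
    {"U21": "presenter", "U22": "participant", "U23": "participant"}
)
```

With the roles of FIG. 4, speaker U21 is placed at the center without reverberation, while speakers U22 and U23 are placed right and left with reverberation applied.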

(Calculation unit 162)
In a web conference attended by multiple participants, the calculation unit 162 calculates a listening degree for each of the speakers. The listening degree indicates how attentively the listener, who is hearing the remarks of the multiple speakers, is listening to that speaker's remarks. Specifically, the calculation unit 162 calculates the listening degree based on the gaze direction detected by the detection unit 150. More specifically, it identifies the speaker the listener is gazing at based on the detected gaze direction, and calculates a higher listening degree for the identified speaker than for the other speakers.

FIG. 5 is a diagram for explaining the listening degree according to the embodiment. FIG. 5 describes the listening degrees in the situation of FIG. 4. In FIG. 5, the calculation unit 162 identifies speaker U21 as the speaker that listener U11 is gazing at, based on the gaze direction of listener U11 detected by the detection unit 150, and calculates a higher listening degree for speaker U21 than for the other speakers U22 and U23. For example, the calculation unit 162 calculates the listening degree of speaker U21 as "100", which is higher than the listening degree "70" of the other speakers U22 and U23. Hereinafter, a speaker whose listening degree is high relative to the other speakers is referred to as a "high-attentive speaker", and a speaker whose listening degree is low relative to the other speakers is referred to as a "low-attentive speaker". In the example shown in FIGS. 4 and 5, speaker U21 is the high-attentive speaker, and speakers U22 and U23, whose listening degree is low relative to speaker U21, are the low-attentive speakers.
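The gaze-based calculation in the example of FIG. 5 can be sketched as below. This is a simplified illustration under stated assumptions: the function names are hypothetical, the values "100" and "70" are the examples from the text, and how the gazed-at speaker is derived from the raw gaze direction is left outside the sketch.

```python
def listening_degrees(speakers, gazed_speaker, high=100, low=70):
    """Assign a higher listening degree to the speaker being gazed at."""
    return {s: (high if s == gazed_speaker else low) for s in speakers}


def classify_speakers(degrees):
    """Split speakers into high-attentive and low-attentive groups.

    A speaker is treated as high-attentive when its listening degree
    equals the maximum among all speakers (a relative comparison).
    """
    top = max(degrees.values())
    high_attentive = [s for s, d in degrees.items() if d == top]
    low_attentive = [s for s, d in degrees.items() if d < top]
    return high_attentive, low_attentive


degrees = listening_degrees(["U21", "U22", "U23"], gazed_speaker="U21")
high_attentive, low_attentive = classify_speakers(degrees)
```

Here `degrees` is `{"U21": 100, "U22": 70, "U23": 70}`, so U21 is classified as high-attentive and U22 and U23 as low-attentive, matching FIG. 5.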

(Generation unit 163)
For each remark in the web conference, the generation unit 163 generates proceedings progress information, which records the following in association with one another: text information obtained by transcribing the remark, the time of the remark, and the speaker of the remark. For example, based on the audio data of the other participants acquired by the conference control unit 161 and the audio data of the participant detected by the microphone of the detection unit 150, the generation unit 163 uses a known speech recognition technique to transcribe the audio data corresponding to each remark into text information. The generation unit 163 also identifies the speaker of each remark based on the participant identification information of the other participants, which the conference control unit 161 acquires together with their audio data. In addition, the generation unit 163 identifies the time of each remark based on the time at which the conference control unit 161 acquired the other participants' audio data and the time at which the microphone of the detection unit 150 detected the participant's audio data (each acquisition time corresponds to the time of the remark). The generation unit 163 then generates the proceedings progress information and stores it in the storage unit 120.
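One possible shape for the proceedings progress information is a list of records, each tying transcribed text to a time and a speaker. The sketch below is illustrative only: `Utterance` and `ProceedingsLog` are hypothetical names, and speech recognition is assumed to have already produced the text, since the source only says that a known technique is used.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class Utterance:
    speaker_id: str      # participant identification information
    spoken_at: datetime  # acquisition time of the audio data
    text: str            # remark transcribed by speech recognition


@dataclass
class ProceedingsLog:
    utterances: list = field(default_factory=list)

    def record(self, speaker_id, spoken_at, text):
        """Append one remark with its speaker and time."""
        entry = Utterance(speaker_id, spoken_at, text)
        self.utterances.append(entry)
        return entry

    def by_speaker(self, speaker_id):
        """Return the remarks of one speaker in chronological order."""
        matches = [u for u in self.utterances if u.speaker_id == speaker_id]
        return sorted(matches, key=lambda u: u.spoken_at)


log = ProceedingsLog()
log.record("U21", datetime(2021, 9, 27, 10, 0, 0), "Let us begin.")
log.record("U22", datetime(2021, 9, 27, 10, 0, 5), "I have a question.")
```

Keeping the speaker ID on every record is what later allows a keyword search to be restricted to the remarks of a particular low-attentive speaker.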

(Acquisition unit 164)
Before the web conference, the acquisition unit 164 acquires set keywords, which are keywords, phrases, and the like set in advance. For example, the acquisition unit 164 acquires a keyword list from the distribution server 200. On acquiring the keyword list, the acquisition unit 164 acquires, based on that list, the set keywords associated with the participant's participant identification information.

(Auxiliary control unit 165)
The auxiliary control unit 165 provides an auxiliary function that makes it easier for the listener to hear the remarks of a low-attentive speaker, whose listening degree is lower than that of a high-attentive speaker. Specifically, when a character string matching a preset character string is detected in the text information obtained by transcribing a low-attentive speaker's remarks, the auxiliary control unit 165 performs emphasis processing that draws the listener's attention to those remarks.

FIG. 6 is a diagram showing an example of the auxiliary function according to the embodiment. FIG. 6 differs from FIG. 4 in that the remark of speaker U22 contains "keyword #1", a set keyword of listener U11 (for example, "keyword #1" may be the name of listener U11). The auxiliary control unit 165 refers to the storage unit 120 and detects a character string matching "keyword #1" in the text information obtained by transcribing the remark of speaker U22, who is a low-attentive speaker in FIG. 4.

When the auxiliary control unit 165 detects, in the remark of speaker U22, a character string matching "keyword #1", the set keyword of listener U11, it controls the audio output unit 140, as an example of the emphasis processing, to output the voice of speaker U22, the low-attentive speaker, at a higher volume than the voices of the other speakers U21 and U23. For example, the auxiliary control unit 165 controls the audio output unit 140 to output the voice of speaker U22 at a high volume that is easiest for listener U11 to hear (for example, "10"). It also controls the audio output unit 140 to output the voices of the other speakers U21 and U23 at a volume lower than that of speaker U22's voice (for example, "7").

As another example of the emphasis processing, the auxiliary control unit 165 controls the audio output unit 140 to output the voice of speaker U22, the low-attentive speaker, without applying reverberation processing to it. It likewise controls the audio output unit 140 to output the voices of the other speakers U21 and U23 with reverberation processing applied.
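The keyword-triggered audio emphasis described above can be sketched as follows. This is a hedged illustration: `find_keyword` and `emphasis_settings` are hypothetical names, matching is assumed to be plain substring search (the source only says a "matching character string" is detected), and the volumes "10" and "7" are the examples from the text.

```python
def find_keyword(text, set_keywords):
    """Return the first set keyword contained in the transcribed text, or None."""
    for keyword in set_keywords:
        if keyword in text:
            return keyword
    return None


def emphasis_settings(speakers, emphasized, loud=10, quiet=7):
    """Build per-speaker output settings that emphasize one speaker.

    The emphasized speaker is output louder and without reverberation;
    all other speakers are output quieter and with reverberation.
    """
    return {
        s: {"volume": loud if s == emphasized else quiet,
            "reverb": s != emphasized}
        for s in speakers
    }


remark = "I would like to hear from keyword #1 on this point."
hit = find_keyword(remark, ["keyword #1"])
if hit is not None:
    output = emphasis_settings(["U21", "U22", "U23"], emphasized="U22")
```

Note that this reverses the initial role-based settings: the presenter U21 is now the one output at the lower volume with reverberation.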

As a further example of the emphasis processing, the auxiliary control unit 165 controls the display unit 130 to visually emphasize, among the multiple participant images, the participant image G22 of speaker U22, the low-attentive speaker. For example, the auxiliary control unit 165 controls the display unit 130 to highlight the participant image G22 or make it blink.

Although not illustrated in FIG. 6, the auxiliary control unit 165 may also, as an example of the emphasis processing, control the display unit 130 to display information about the detected character string. In the example of FIG. 6, the auxiliary control unit 165 controls the display unit 130 to display the detected set keyword, "keyword #1".

FIG. 7 is a diagram showing an example of the auxiliary function according to the embodiment. FIG. 7 differs from FIG. 6 in that two sound source positions are swapped: the position corresponding to the voice of speaker U22, the low-attentive speaker in FIG. 4, and the position corresponding to the voice of speaker U21, the high-attentive speaker in FIG. 4.

When the auxiliary control unit 165 detects, in the remark of speaker U22, a character string matching "keyword #1", the set keyword of listener U11, it controls the audio output unit 140, as an example of the emphasis processing, to move the sound source corresponding to the voice of speaker U22, the low-attentive speaker. The sound source moves from its original position (the "right" position in FIGS. 4 and 6) to the position of the sound source corresponding to the voice of speaker U21, the high-attentive speaker (the "center" position in FIGS. 4 and 6).

As another example of the emphasis processing, the auxiliary control unit 165 controls the display unit 130 to move the participant image G22 of speaker U22, the low-attentive speaker, among the multiple participant images. The image moves from its original display position (the "right" position in FIGS. 4 and 6) to the display position of the participant image G21 of speaker U21, the high-attentive speaker (the "center" position in FIGS. 4 and 6).
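Since FIG. 7 shows the two positions exchanged, the relocation can be sketched as a swap over a speaker-to-position mapping. The function name `swap_positions` and the dictionary representation are assumptions for illustration; the same mapping could stand for either sound source positions or display areas.

```python
def swap_positions(positions, low_attentive, high_attentive):
    """Return a new mapping with the two speakers' positions exchanged.

    `positions` maps speaker ID -> position label. The input mapping is
    left unchanged so the original layout can be restored later.
    """
    updated = dict(positions)
    updated[low_attentive] = positions[high_attentive]
    updated[high_attentive] = positions[low_attentive]
    return updated


before = {"U21": "center", "U22": "right", "U23": "left"}
after = swap_positions(before, low_attentive="U22", high_attentive="U21")
```

After the swap, U22 is at the center and U21 is on the right, while U23 stays on the left.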

The embodiment described above covers the case in which the auxiliary control unit 165, on detecting a character string matching a set keyword in a low-attentive speaker's remark, controls the audio output unit 140 to move the sound source corresponding to that speaker's voice from its original position to the position of the sound source corresponding to the high-attentive speaker. The trigger for changing a speaker's localization position is not, however, limited to detection of a set keyword. Specifically, the detection unit 150 detects a change in the listener's gaze direction. For example, the detection unit 150 detects that the listener's gaze has moved from the high-attentive speaker to a low-attentive speaker. Based on the change in gaze direction detected by the detection unit 150, the auxiliary control unit 165 identifies the low-attentive speaker at whom the listener has begun to gaze. The auxiliary control unit 165 then determines whether the length of time for which the listener has been gazing at the identified low-attentive speaker has exceeded a predetermined threshold. If it determines that the threshold has been exceeded, the auxiliary control unit 165 controls the audio output unit 140 to move the sound source corresponding to the low-attentive speaker's voice from its original position to the position of the sound source corresponding to the high-attentive speaker.
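The gaze dwell-time check can be sketched as a small state machine that is fed the currently gazed-at speaker along with a timestamp. This is an assumption-laden illustration: the class name `GazeDwellTrigger`, the use of seconds as the unit, and the polling style are not from the source, which only specifies a "predetermined threshold".

```python
class GazeDwellTrigger:
    """Report a speaker once the listener has gazed at them long enough."""

    def __init__(self, threshold_s):
        self.threshold_s = threshold_s  # hypothetical threshold in seconds
        self.target = None              # speaker currently being gazed at
        self.since = None               # time at which the gaze moved to target

    def update(self, gazed_speaker, now_s):
        """Feed one gaze sample; return the speaker ID if the dwell time
        on that speaker exceeds the threshold, otherwise None."""
        if gazed_speaker != self.target:
            # Gaze moved to a different speaker (or away): restart the timer.
            self.target = gazed_speaker
            self.since = now_s
            return None
        if gazed_speaker is not None and now_s - self.since > self.threshold_s:
            return gazed_speaker
        return None


trigger = GazeDwellTrigger(threshold_s=2.0)
trigger.update("U22", 0.0)         # gaze moves to U22, timer starts
fired = trigger.update("U22", 2.5)  # dwell of 2.5 s exceeds 2.0 s
```

The caller would still need to confirm that the returned speaker is currently low-attentive before relocating the sound source. The sketch keeps returning the speaker on every sample after the threshold, so a real caller would likely act only on the first report.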

Alternatively, the calculation unit 162, instead of the auxiliary control unit 165, may identify the low-attentive speaker at whom the listener has begun to gaze, based on the change in the listener's gaze direction detected by the detection unit 150. The calculation unit 162 then determines whether the length of time for which the listener has been gazing at the identified low-attentive speaker has exceeded a predetermined threshold. If it determines that the threshold has been exceeded, the calculation unit 162 calculates a higher listening degree for the identified low-attentive speaker than for the other speakers. That is, the calculation unit 162 calculates a higher listening degree for a speaker whom the listener has gazed at for longer than the predetermined threshold than for the other speakers. In other words, such a speaker may be changed from a low-attentive speaker to a high-attentive speaker. Based on the listening degrees calculated by the calculation unit 162, the auxiliary control unit 165 may control the audio output unit 140 to move the sound source corresponding to the voice of the speaker newly changed to a high-attentive speaker (the former low-attentive speaker) from its original position to the position of the sound source corresponding to the former high-attentive speaker.

[3. Information processing procedure]
FIG. 8 is a diagram showing the information processing procedure according to the embodiment. As shown in FIG. 8, the detection unit 150 of the information processing device 100 detects the gaze of a listener who is hearing the remarks of multiple speakers in a remote conference attended by multiple participants (step S101).

The calculation unit 162 of the information processing device 100 calculates, for each of the speakers, a listening degree indicating how attentively the listener is listening to that speaker's remarks (step S102). For example, the calculation unit 162 identifies the speaker the listener is gazing at based on the gaze direction detected by the detection unit 150, and calculates a higher listening degree for the identified speaker than for the other speakers.

The auxiliary control unit 165 of the information processing device 100 provides an auxiliary function that makes it easier for the listener to hear the remarks of a low-attentive speaker, whose listening degree is lower than that of a high-attentive speaker (step S103). For example, when a character string matching a preset character string is detected in the text information obtained by transcribing a low-attentive speaker's remarks, the auxiliary control unit 165 performs emphasis processing that draws the listener's attention to those remarks.

[4. Modifications]
The information processing system 1 according to the embodiment described above may be implemented in various forms other than that embodiment. Other embodiments of the information processing system 1 are therefore described below. Parts identical to those of the embodiment are given the same reference numerals, and their description is omitted.

[4-1. Time-shift playback]
FIG. 9 is a diagram showing an example of an auxiliary function according to a modification. In FIG. 9, suppose that listener U11 has missed a remark by speaker U22, a low-attentive speaker. Listener U11 therefore performs an operation to rewind and play back at high speed the individual recorded image G22', a recording of the participant image of speaker U22, whose remark was missed. For example, suppose that on the screen shown in FIG. 3, listener U11 performs an operation (such as a click or tap) to select the image G22 of speaker U22. The operation by which listener U11 designates speaker U22 is not limited to selecting the image G22. For example, listener U11 may designate speaker U22 by entering information that identifies speaker U22 in a dedicated field.

During the web conference, the auxiliary control unit 165 controls the display unit 130 to play back and display the individual recorded image G22' corresponding to speaker U22, the low-attentive speaker designated by listener U11, from among multiple individual recorded images, each a recording of one participant's participant image. Specifically, on receiving from listener U11 the designation of the speaker whose individual recorded image is to be played back, the auxiliary control unit 165 acquires from the distribution server 200 the individual recorded image G22' corresponding to the designated speaker U22. It then controls the display unit 130 to play back and display the individual recorded image G22' in the display area F22, where the image of speaker U22 is displayed. The distribution server 200 stores the multiple individual recorded images, each a recording of one participant's participant image, and may stream the individual recorded image designated by the listener to the information processing device 100. For example, on receiving from listener U11 the designation of the speaker whose individual recorded image is to be played back, the auxiliary control unit 165 controls the display unit 130 to display the individual recorded image G22' streamed from the distribution server 200.

Further, the auxiliary control unit 165 controls the display unit 130 to play back and display the individually recorded image G22' at the playback speed "1.5x" designated by the listener U11. The individually recorded image G22' shown in FIG. 9 includes an icon G222 indicating the playback speed "1.5x" designated by the listener U11. It also includes an icon G223 indicating the playback speed "1.0x" and an icon G224 indicating the playback speed "2.0x", which are selectable but have not been designated by the listener U11.

Further, the auxiliary control unit 165 controls the display unit 130 to play back and display the individually recorded image G22' for the playback time "30 seconds" designated by the listener U11. The individually recorded image G22' shown in FIG. 9 includes an icon G225 indicating the playback time "30 seconds" designated by the listener U11. It also includes an icon G226 indicating the playback time "10 seconds", which is selectable but has not been designated by the listener U11. Note that the playback time becomes longer in proportion to the number of times the listener U11 selects the icon G225 or the icon G226.
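The relationship between icon selections, playback time, and playback speed can be illustrated with a minimal Python sketch. The names `ReplayRequest` and `replay_window` are hypothetical, and the sketch assumes that each selection of G225 or G226 adds 30 or 10 seconds to the rewind span and that the replay ends at the current live position; the publication states only that the playback time grows in proportion to the number of selections.

```python
from dataclasses import dataclass


@dataclass
class ReplayRequest:
    clicks_30s: int = 0  # times the "30 seconds" icon (G225) was selected
    clicks_10s: int = 0  # times the "10 seconds" icon (G226) was selected
    speed: float = 1.0   # 1.0, 1.5 or 2.0 (icons G223, G222, G224)


def replay_window(live_position_s: float, req: ReplayRequest):
    """Return (start_s, span_s, watch_s) for a rewind-and-replay request.

    Assumption: the replayed span ends at the current live position.
    """
    span = 30.0 * req.clicks_30s + 10.0 * req.clicks_10s
    span = min(span, live_position_s)  # cannot rewind past the start
    start = live_position_s - span
    watch = span / req.speed           # wall-clock time needed to watch it
    return start, span, watch
```

Under these assumptions, selecting G225 twice and G226 once at 1.5x rewinds 70 seconds of recording, which takes about 47 seconds to watch.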

The individually recorded image G22' also includes a button G227 that, when selected by the listener U11, switches from the recorded image back to the original Web conference image (live image).

The example above describes the case where the auxiliary control unit 165 controls the display unit 130 to play back and display an individually recorded image during the Web conference. The auxiliary control unit 165 may instead control the display unit 130 to play back and display, during the Web conference, an entire recorded image in which the full-screen image is recorded. Specifically, the display unit 130 displays the proceedings progress information during the Web conference in accordance with a participant's operation. When the auxiliary control unit 165 accepts, as the playback start time, the utterance time of a remark designated by the listener from the proceedings progress information, it acquires from the distribution server 200 the entire recorded image, in which the full-screen image is recorded, from the accepted playback start time onward. The auxiliary control unit 165 then controls the display unit 130 to play back and display the acquired entire recorded image. As a result, the information processing apparatus 100 can help, for example, a listener who joined 20 minutes late because a previous meeting ran over to catch up by playing back only the important parts at double speed.
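As a worked illustration of the catch-up scenario, the following Python sketch estimates how long faster-than-real-time playback takes to reach the live position. The function name `catch_up_seconds` is hypothetical, and the sketch assumes the listener watches the whole backlog continuously at one speed, whereas the text describes replaying only the important parts. While the listener watches for t seconds at speed v, the live conference also advances by t seconds, so v * t = backlog + t, which gives t = backlog / (v - 1).

```python
def catch_up_seconds(backlog_s: float, speed: float) -> float:
    """Wall-clock seconds needed to reach the live position.

    Assumption: continuous playback of the full backlog at a fixed speed.
    """
    if speed <= 1.0:
        raise ValueError("playback must be faster than real time to catch up")
    return backlog_s / (speed - 1.0)
```

For a 20-minute (1200-second) backlog, double speed catches up in 1200 seconds and 1.5x in 2400 seconds, which suggests why skipping to the important parts matters in practice.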

[4-2. Listening Cluster Representation]
FIG. 10 is a diagram illustrating an example of an auxiliary function according to a modification. In general, in a Web conference, unlike a face-to-face meeting, it is difficult to sense the other party's line of sight. Therefore, the generation unit 163 identifies, for each of a plurality of listeners, a super-listening speaker whose listening degree exceeds a predetermined threshold, classifies each listener into the cluster of the super-listening speaker identified for that listener, and generates cluster information about each cluster.

In the example shown in FIG. 9, the generation unit 163 identifies the speaker U21 as the super-listening speaker whose listening degree exceeds the predetermined threshold for each of the four listeners "ayyapan", "doigaki", "ishige", and "yamada", classifies the four listeners into the cluster CL21 of the speaker U21, and generates cluster information about the cluster CL21. For example, the generation unit 163 generates an image G31 that includes, in addition to the listening degree indicated by the cluster CL21, icons that convey the emotions of each of the four listeners, such as facial expressions and backchannel responses.

The generation unit 163 also identifies the speaker U22 as the super-listening speaker whose listening degree exceeds the predetermined threshold for each of the three listeners "iwaki", "tonoma", and "yamaoka", who are participating in the same Web conference as the participants of the cluster CL21, classifies the three listeners into the cluster CL22 of the speaker U22, and generates cluster information about the cluster CL22. For example, the generation unit 163 generates an image G32 that includes, in addition to the listening degree indicated by the cluster CL22, icons that convey the emotions of each of the three listeners, such as facial expressions and backchannel responses.

During the Web conference, the auxiliary control unit 165 controls the display unit 130 to display the image G31 and the image G32 generated by the generation unit 163.
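The classification of listeners into clusters can be sketched in Python as follows. This is a minimal illustration, not the implementation of the generation unit 163: the function name `cluster_listeners`, the dictionary layout, and the rule of choosing the single highest-scoring speaker when several exceed the threshold are assumptions.

```python
from collections import defaultdict


def cluster_listeners(listening, threshold):
    """Group listeners by their super-listening speaker.

    listening: {listener: {speaker: listening_degree}}
    Returns {speaker: [listeners]}. Listeners whose highest degree does
    not exceed the threshold are left unclassified (an assumption).
    """
    clusters = defaultdict(list)
    for listener, degrees in listening.items():
        if not degrees:
            continue
        speaker, degree = max(degrees.items(), key=lambda kv: kv[1])
        if degree > threshold:
            clusters[speaker].append(listener)
    return {spk: sorted(members) for spk, members in clusters.items()}
```

The resulting mapping corresponds to the cluster information from which images such as G31 and G32 could be rendered.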

As a result, the information processing apparatus 100 can make it possible, in a remote conference, to see where each participant's attention (line of sight) is directed. The information processing apparatus 100 also allows the participants, while taking part in the same Web conference, to split into a plurality of groups for discussion and to move freely between the groups.

[4-3. Voice Quality Health Check]
FIG. 11 is a diagram illustrating an example of an auxiliary function according to a modification. In general, in a Web conference, unlike a face-to-face meeting, a participant cannot tell whether their voice is reaching the other party properly. Therefore, in FIG. 11, the participant U1 is notified when first voice feature data, which indicates the features of the first voice data output from the information processing device 100-1 of the participant U1 (the source of the voice), does not match second voice feature data, which indicates the features of the second voice data output from the information processing device 100-2 of the participant U2, another conference participant (the destination of the voice).

Specifically, the conference control unit 161 of the information processing device 100-1 acquires the participant's own voice data in the Web conference (first voice data) and transmits it to the distribution server 200. Upon acquiring the first voice data from the information processing device 100-1, the distribution server 200 transmits it to the information processing device 100-2. The distribution server 200 also generates voice feature data indicating the features of the acquired first voice data. Here, the voice feature data is, for example, data that characterizes (quantifies) the amount of change in voice amplitude over time or the number of consecutive average-amplitude crossings.

Further, the generation unit 163 of the information processing device 100-1 generates first voice feature data indicating the features of the first voice data, based on the first voice data acquired by the conference control unit 161, and transmits it to the distribution server 200. Upon acquiring the first voice feature data from the information processing device 100-1, the distribution server 200 compares it with the voice feature data that the server itself generated from the acquired first voice data and, if the two match, transmits the first voice feature data to the information processing device 100-2.

Upon acquiring the first voice data from the distribution server 200, the conference control unit 161 of the information processing device 100-2 outputs the first voice data from the voice output unit 140. The acquisition unit 164 of the information processing device 100-2 acquires the second voice data output from the voice output unit 140. The generation unit 163 of the information processing device 100-2 generates second voice feature data indicating the features of the second voice data, based on the second voice data acquired by the acquisition unit 164. The auxiliary control unit 165 of the information processing device 100-2 also acquires the first voice feature data from the distribution server 200, and notifies the participant U1 when the second voice feature data generated by the generation unit 163 does not match the first voice feature data.

Because the comparison cannot be made if the voice feature data is lost along the network path, as voice data can be, the information processing system 1 described above has a mechanism that ensures reliable reception (TCP, with retransmission until an ACK is confirmed).

As a result, the information processing apparatus 100 makes it easier to estimate, from the match/mismatch results of the voice feature data at multiple points, where voice packets are being dropped. The mismatch result, including an indication of the suspected segment, may be shared not only with the participant concerned but with all participants.
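The comparison of voice feature data at multiple points can be sketched in Python. This is an illustrative sketch only: the function names `voice_features` and `suspect_segment` are hypothetical, and the two features computed here (total amplitude change between samples and the number of crossings of the average amplitude) are one plausible reading of the features named in the text, not a confirmed definition.

```python
def voice_features(samples):
    """Summarize a frame of audio samples as a small comparable tuple.

    Assumed features: total sample-to-sample amplitude change, and the
    number of times the signal crosses its own average amplitude.
    """
    mean = sum(samples) / len(samples)
    pairs = list(zip(samples, samples[1:]))
    delta = sum(abs(b - a) for a, b in pairs)
    crossings = sum(1 for a, b in pairs if (a - mean) * (b - mean) < 0)
    return (round(delta, 6), crossings)


def suspect_segment(sender_feat, server_feat, receiver_feat):
    """Return the path segment where features first diverge, or None."""
    if sender_feat != server_feat:
        return "sender -> server"
    if server_feat != receiver_feat:
        return "server -> receiver"
    return None
```

Here the sender corresponds to the information processing device 100-1, the server to the distribution server 200, and the receiver to the information processing device 100-2. A real system would likely compare features with a tolerance rather than exact equality, since playback and re-capture alter the signal.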

[4-4. Number and Position of Sound Sources]
The embodiment above describes the case where the information processing apparatus 100 outputs the voices of three speakers so that each voice is heard from a sound source placed at a different position to the left, right, and center of the listener, but the number of sound sources is not limited to three. Specifically, the information processing apparatus 100 may output the voices of two or fewer speakers so that each voice is heard from one of two or fewer sound sources placed at different positions according to the number of speakers. Likewise, the information processing apparatus 100 may output the voices of four or more speakers so that each voice is heard from one of four or more sound sources placed at different positions according to the number of speakers.

More specifically, the information processing apparatus 100 may output the voices of four or more speakers so that each voice is heard from one of four or more sound sources arranged on a straight line in front of the listener. For example, the higher the listening degree calculated by the calculation unit 162 for a speaker, the closer to the listener the auxiliary control unit 165 places the corresponding sound source. In other words, the lower the listening degree calculated by the calculation unit 162 for a speaker, the farther from the listener the auxiliary control unit 165 places the corresponding sound source. The sound sources may be arranged, for example, at equal intervals on the straight line in front of the listener.
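The linear arrangement can be sketched in Python as follows. The function name `place_on_line` and the `spacing` parameter are hypothetical; the sketch assumes equal intervals and measures distance from the listener along the line.

```python
def place_on_line(degrees, spacing=1.0):
    """Map each speaker to a distance in front of the listener.

    degrees: {speaker: listening_degree}. A higher degree yields a
    smaller distance; sources are spaced at equal intervals.
    """
    ranked = sorted(degrees, key=degrees.get, reverse=True)
    return {spk: spacing * (rank + 1) for rank, spk in enumerate(ranked)}
```

With four speakers, the speaker with the highest listening degree is placed one interval away and the one with the lowest four intervals away.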

The information processing apparatus 100 may also output the voices of four or more speakers so that each voice is heard from one of four or more sound sources arranged on the circumference of a circle centered on the listener. The sound sources may be arranged, for example, at equal intervals on that circumference. For example, the auxiliary control unit 165 places the sound source of the speaker with the highest listening degree calculated by the calculation unit 162 directly in front of the listener. The auxiliary control unit 165 places the sound sources of the speakers with the next highest listening degrees to the front left and front right of the listener, and places the sound sources of speakers with relatively low listening degrees behind the listener. It is also generally known that, in a surround system, sound from behind the listener is difficult for the listener to hear (almost inaudible). The auxiliary control unit 165 may therefore deliberately set the volume of the voice of a speaker with a relatively low listening degree to zero.
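The circular arrangement can be sketched in Python under stated assumptions. The function name `place_on_circle`, the angle convention (0 degrees directly in front, positive to the right), the right-before-left tie-break, and the 90-degree boundary used for the optional muting of rear sources are all assumptions for illustration.

```python
def place_on_circle(degrees, mute_rear=False):
    """Map each speaker to (angle_deg, gain) on a circle around the listener.

    Slots are equally spaced; speakers are assigned from the front slot
    outward in descending order of listening degree.
    """
    n = len(degrees)
    angles = []
    for k in range(n):
        angle = 360.0 * k / n
        if angle > 180.0:
            angle -= 360.0          # express as a signed angle
        angles.append(angle)
    # Front-most slots first; for equal distance from front, right before left.
    angles.sort(key=lambda a: (abs(a), a < 0))
    ranked = sorted(degrees, key=degrees.get, reverse=True)
    placement = {}
    for spk, angle in zip(ranked, angles):
        gain = 0.0 if mute_rear and abs(angle) > 90.0 else 1.0
        placement[spk] = (angle, gain)
    return placement
```

With four speakers this places the highest-ranked speaker at 0 degrees, the next two at 90 and -90 degrees, and the lowest-ranked at 180 degrees, where the gain can be set to zero.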

As described above, the information processing apparatus 100 outputs the voices of the plurality of speakers so that each voice is heard from one of a plurality of sound sources placed at different positions.

[4-5. Number of Loudspeakers]
The embodiment above describes the case where the information processing apparatus 100 outputs the voices of the plurality of speakers from two loudspeakers placed to the left and right of the listener, but the number of loudspeakers is not limited to two. For example, the information processing apparatus 100 may include three or more (for example, six or eight) loudspeakers arranged so as to surround the listener and, using a surround method, output the voices of the plurality of speakers from the three or more loudspeakers so that each voice is heard from one of a plurality of sound sources placed at different positions. The information processing apparatus 100 may also output the voices of the plurality of speakers from a single loudspeaker so that each voice is heard from one of a plurality of sound sources placed at different positions.

[5. Effects]
As described above, the information processing apparatus 100 according to the embodiment includes the calculation unit 162 and the auxiliary control unit 165. In a remote conference in which a plurality of participants take part, the calculation unit 162 calculates, for each of a plurality of speakers, a listening degree indicating the degree to which a listener who is listening to the remarks of the plurality of speakers is listening to the remarks of that speaker. The auxiliary control unit 165 provides an auxiliary function that makes it easier for the listener to hear the remarks of a low-attentive speaker, whose listening degree is lower than that of a high-attentive speaker, whose listening degree is high.

In this way, the information processing apparatus 100 can make it easier for the listener to hear a speaker the listener is not paying attention to. As a result, the information processing apparatus 100 enables participants to have a substantive discussion in a remote conference in which a plurality of participants take part. The information processing apparatus 100 also enables participants to take part comfortably in such a remote conference. The information processing apparatus 100 can therefore improve usability in remote conferences.

Further, the calculation unit 162 calculates the listening degree based on the direction of the line of sight detected by the detection unit 150, which detects the listener's line of sight.

As a result, the information processing apparatus 100 calculates the listening degree based on the listener's line of sight, and can therefore calculate the listening degree appropriately.

Further, the calculation unit 162 identifies the speaker being gazed at by the listener based on the direction of the line of sight detected by the detection unit 150, and calculates the listening degree of the identified speaker to be higher than the listening degrees of the other speakers.
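One way to derive listening degrees from gaze data can be sketched in Python. This is only an illustration: the names `gazed_speaker` and `listening_degrees`, the rectangular display areas, and the use of the share of gaze samples as the listening degree are assumptions, since the text states only that the gazed-at speaker receives a higher listening degree than the others.

```python
def gazed_speaker(gaze_xy, regions):
    """Return the speaker whose display area contains the gaze point.

    regions: {speaker: (left, top, width, height)} in screen pixels.
    """
    x, y = gaze_xy
    for speaker, (left, top, width, height) in regions.items():
        if left <= x < left + width and top <= y < top + height:
            return speaker
    return None


def listening_degrees(gaze_samples, regions):
    """Listening degree per speaker as the share of on-screen gaze samples."""
    counts = {speaker: 0 for speaker in regions}
    for point in gaze_samples:
        speaker = gazed_speaker(point, regions)
        if speaker is not None:
            counts[speaker] += 1
    total = sum(counts.values())
    return {spk: (c / total if total else 0.0) for spk, c in counts.items()}
```

A speaker the listener looks at more often then receives a higher degree than the other speakers, and gaze samples that fall outside every display area are ignored.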

As a result, the information processing apparatus 100 calculates the listening degree of the speaker the listener is paying attention to as higher than the listening degrees of the other speakers, and can therefore calculate the listening degree appropriately.

Further, when a character string matching a preset character string is detected in the text information obtained by converting the remarks of a low-attentive speaker into text, the auxiliary control unit 165 performs emphasis processing that draws the listener's attention to the remarks of the low-attentive speaker. For example, the preset character string is one set based on the result of machine learning that uses, as training data, character strings preset in past remote conferences.
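The keyword-triggered emphasis can be sketched in Python. The function name `emphasis_targets`, the `low_threshold` parameter, and the case-insensitive substring match are assumptions; the sketch uses a fixed keyword list and does not model the machine-learning step mentioned above.

```python
def emphasis_targets(utterances, degrees, keywords, low_threshold):
    """Find low-attentive speakers whose transcribed remarks contain a keyword.

    utterances: list of (speaker, transcribed_text)
    degrees: {speaker: listening_degree}
    Returns a list of (speaker, matched_keywords).
    """
    hits = []
    for speaker, text in utterances:
        if degrees.get(speaker, 0.0) >= low_threshold:
            continue  # the listener is already attending to this speaker
        folded = text.casefold()
        matched = [kw for kw in keywords if kw.casefold() in folded]
        if matched:
            hits.append((speaker, matched))
    return hits
```

Each returned entry could then trigger one of the emphasis processes, such as raising the volume or highlighting the participant image of that speaker.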

As a result, when a preset keyword or the like appears in the remarks of a speaker the listener is not paying attention to, the information processing apparatus 100 can prompt the listener to pay attention to that speaker. The information processing apparatus 100 can therefore make the remarks of such a speaker easier for the listener to hear only when necessary, without preventing the listener from concentrating on the remarks of the party the listener is focusing on.

Further, as the emphasis processing, the auxiliary control unit 165 controls the voice output unit 140 so that, among the voices of the plurality of speakers that the voice output unit 140 outputs so as to be heard from a plurality of sound sources placed at different positions, the voice of the low-attentive speaker is output at a higher volume than the voices of the other speakers.

As a result, the information processing apparatus 100 raises the voice of a speaker the listener is not paying attention to a volume that is easy for the listener to hear, and can thereby make that speaker easier for the listener to hear.

Further, as the emphasis processing, the auxiliary control unit 165 controls the voice output unit 140 so that, among the voices of the plurality of speakers that the voice output unit 140 outputs so as to be heard from a plurality of sound sources placed at different positions, the position of the sound source corresponding to the voice of the low-attentive speaker is changed from its original position to the position of the sound source corresponding to the voice of the high-attentive speaker.

As a result, the information processing apparatus 100 moves the sound source corresponding to the voice of a speaker the listener is not paying attention to a position that is easy for the listener to hear, and can thereby make that speaker easier for the listener to hear.

Further, as the emphasis processing, the auxiliary control unit 165 controls the voice output unit 140 so that, among the voices of the plurality of speakers that the voice output unit 140 outputs so as to be heard from a plurality of sound sources placed at different positions, the voice of the low-attentive speaker is output without reverberation processing being applied to it.

In general, it is known that when voice processing (for example, reverberation processing) is applied to some of a plurality of voices, the brain tends to concentrate on the voices to which no processing has been applied. As a result, the information processing apparatus 100 can make it easier for the listener to hear a speaker the listener is not paying attention to.

Further, as the emphasis processing, the auxiliary control unit 165 controls the display unit 130 so that, among the participant images of the plurality of participants displayed in different display areas of the display unit 130, the participant image of the low-attentive speaker is displayed with greater visual emphasis than the participant images of the other speakers.

As a result, the information processing apparatus 100 visually emphasizes the participant image of a speaker the listener is not paying attention to, and can thereby draw the listener's attention to that speaker's remarks.

Further, as the emphasis processing, the auxiliary control unit 165 controls the display unit 130 to display information about the detected character string.

As a result, the information processing apparatus 100 displays the preset keyword so as to appeal to the listener visually, and can thereby draw the listener's attention to the remarks of a speaker the listener is not paying attention to.

Further, during the remote conference, the auxiliary control unit 165 controls the display unit 130 to play back and display, from among a plurality of individually recorded images in which the participant images of the plurality of participants displayed in different display areas of the display unit 130 are respectively recorded, the individually recorded image corresponding to the low-attentive speaker designated by the listener.

As a result, even if the listener has missed a remark by a speaker the listener was not paying attention to, the information processing apparatus 100 allows the listener to catch up on the missed remark during the conference.

Further, the auxiliary control unit 165 controls the display unit 130 to play back and display the individually recorded image for the playback time designated by the listener.

As a result, the information processing apparatus 100 allows the listener to designate the playback time of the individually recorded image, and can thereby improve usability for the listener.

Further, the auxiliary control unit 165 controls the display unit 130 to play back and display the individually recorded image at the playback speed designated by the listener.

As a result, the information processing apparatus 100 allows the listener to designate the playback speed of the individually recorded image, and can thereby improve usability for the listener.

Further, the auxiliary control unit 165 controls the display unit 130 to play back and display the individually recorded image in the display area corresponding to the low-attentive speaker designated by the listener.

As a result, the information processing apparatus 100 allows the listener to hear the remark they missed while continuing to listen to the remarks of the other speakers.

Further, during the remote conference, the auxiliary control unit 165 controls the display unit 130 to play back and display an entire recorded image in which a full-screen image is recorded, the full-screen image including the participant images of the plurality of participants displayed in different display areas of the display unit 130.

As a result, the information processing apparatus 100 allows the listener to catch up, during the conference, on the overall content of, for example, a conference the listener joined late.

The information processing apparatus 100 further includes the generation unit 163. The generation unit 163 generates proceedings progress information in which, for each remark in the remote conference, text information obtained by converting the remark into text, the utterance time of the remark, and the speaker of the remark are recorded in association with one another. During the remote conference, the auxiliary control unit 165 takes the utterance time of the remark designated by the listener from the proceedings progress information displayed on the display unit 130 as the playback start time, and controls the display unit 130 to play back and display the entire recorded image from that playback start time.

これにより、情報処理装置１００は、聞き手が、例えば、遅れて参加した会議全体の内容であって、聞き手が重要だと思う発言以降の内容を会議中にキャッチアップすることを可能とすることができる。 As a result, the information processing apparatus 100 enables the listener to catch up during the conference, for example, the content of the entire conference in which the listener participates late, and the content after the statement that the listener considers important. can.
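The proceedings progress information and the playback start lookup described above can be illustrated with a minimal Python sketch. The names `Statement`, `ProceedingsLog`, and `playback_start_for` are hypothetical and do not appear in the embodiment. Speech-to-text conversion and video playback are outside the sketch, so the text and times are supplied directly, and times are assumed to be seconds from the start of the recording.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Statement:
    """One record of proceedings progress information (hypothetical layout)."""
    text: str        # character information obtained from the speech
    time_sec: float  # statement time, assumed to be seconds from recording start
    speaker: str     # speaker of the statement


@dataclass
class ProceedingsLog:
    statements: List[Statement] = field(default_factory=list)

    def record(self, text: str, time_sec: float, speaker: str) -> None:
        """Append one statement with its text, time, and speaker associated."""
        self.statements.append(Statement(text, time_sec, speaker))

    def playback_start_for(self, index: int) -> float:
        """Return the playback start time for the statement the listener picked."""
        return self.statements[index].time_sec


log = ProceedingsLog()
log.record("Let us review the budget.", 12.0, "A")
log.record("The deadline moved to Friday.", 47.5, "B")

# The listener selects the second entry; the recording would be played from here.
start = log.playback_start_for(1)
print(start)
```

Keeping the statement time in the same record as the text is what lets a selection in the displayed list map directly to a playback position in the entire recorded image.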

The information processing apparatus 100 further includes the generation unit 163. The generation unit 163 identifies, for each of a plurality of listeners, a super-listening speaker whose listening degree exceeds a predetermined threshold, classifies each listener into the cluster of the super-listening speaker identified for that listener, and generates cluster information about each resulting cluster. During the remote conference, the auxiliary control unit 165 controls the display unit 130 to display the cluster information generated by the generation unit 163.
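One way to picture the clustering step is the following Python sketch. It assumes that listening degrees are available as a nested mapping from listener to speaker to a numeric value, and that a listener who exceeds the threshold for several speakers is placed in each of those clusters; neither assumption is stated in the text. The function name `cluster_listeners` is hypothetical.

```python
from collections import defaultdict


def cluster_listeners(listening_degrees, threshold):
    """Group listeners by their super-listening speakers.

    listening_degrees maps listener -> {speaker: listening degree}.
    Returns speaker -> sorted list of listeners whose listening degree
    for that speaker exceeds the threshold.
    """
    clusters = defaultdict(list)
    for listener, degrees in listening_degrees.items():
        for speaker, degree in degrees.items():
            if degree > threshold:  # this speaker is super-listened by the listener
                clusters[speaker].append(listener)
    return {speaker: sorted(members) for speaker, members in clusters.items()}


degrees = {
    "listener1": {"A": 0.9, "B": 0.2},
    "listener2": {"A": 0.8, "B": 0.1},
    "listener3": {"A": 0.3, "B": 0.7},
}
clusters = cluster_listeners(degrees, threshold=0.6)
print(clusters)
```

The returned mapping is one possible form of cluster information: it shows which listeners are concentrating on which speaker at a glance.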

The information processing apparatus 100 further includes the acquisition unit 164 and the generation unit 163. The acquisition unit 164 acquires audio data of a participant in the remote conference. The generation unit 163 generates, based on the audio data, first audio feature data indicating the features of the audio data. The auxiliary control unit 165 notifies the participant when second audio feature data, which indicates the features of the audio data as received by another participant's information processing apparatus 100, does not match the first audio feature data.

As a result, the match and mismatch results of the audio feature data at multiple points make it easier for the information processing apparatus 100 to estimate where audio packets are being dropped.
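The comparison of audio feature data at multiple points can be sketched in Python as follows. The text does not specify how the feature data is computed, so the sketch substitutes a SHA-256 digest of the raw bytes as a stand-in feature; a real system would probably need a feature that tolerates lossy audio coding. The names `feature` and `first_mismatch`, and the observation point names, are hypothetical.

```python
import hashlib
from typing import List, Optional, Tuple


def feature(audio: bytes) -> str:
    """Stand-in audio feature: SHA-256 digest of the raw bytes (an assumption)."""
    return hashlib.sha256(audio).hexdigest()


def first_mismatch(first_feature: str,
                   observed: List[Tuple[str, str]]) -> Optional[str]:
    """Return the first observation point whose feature differs from the sender's.

    observed is an ordered list of (point name, second audio feature data),
    from the point nearest the sender to the farthest one.
    """
    for point, second_feature in observed:
        if second_feature != first_feature:
            return point  # data was altered or lost before reaching this point
    return None


sent = b"\x01\x02\x03\x04"
observed = [
    ("relay", feature(sent)),              # arrived intact
    ("participant_B", feature(sent[:2])),  # part of the audio is missing
]
suspect = first_mismatch(feature(sent), observed)
print(suspect)
```

A match at one point followed by a mismatch at the next narrows the loss to the segment between those two points, which is the kind of estimate the paragraph above describes.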

[6. Hardware configuration]
The information processing apparatus 100 according to the embodiment described above is implemented by, for example, a computer 1000 configured as shown in FIG. 12. FIG. 12 is a hardware configuration diagram showing an example of a computer that implements the functions of the information processing apparatus 100. The computer 1000 includes a CPU 1100, a RAM 1200, a ROM 1300, an HDD 1400, a communication interface (I/F) 1500, an input/output interface (I/F) 1600, and a media interface (I/F) 1700.

The CPU 1100 operates based on programs stored in the ROM 1300 or the HDD 1400 and controls each unit. The ROM 1300 stores a boot program executed by the CPU 1100 when the computer 1000 starts up, programs that depend on the hardware of the computer 1000, and the like.

The HDD 1400 stores programs executed by the CPU 1100, data used by those programs, and the like. The communication interface 1500 receives data from other devices via a predetermined communication network and sends it to the CPU 1100, and transmits data generated by the CPU 1100 to other devices via the predetermined communication network.

The CPU 1100 controls output devices such as a display and a printer, and input devices such as a keyboard and a mouse, via the input/output interface 1600. The CPU 1100 acquires data from the input devices via the input/output interface 1600. The CPU 1100 also outputs generated data to the output devices via the input/output interface 1600.

The media interface 1700 reads a program or data stored in a recording medium 1800 and provides it to the CPU 1100 via the RAM 1200. The CPU 1100 loads the program from the recording medium 1800 onto the RAM 1200 via the media interface 1700 and executes the loaded program. The recording medium 1800 is, for example, an optical recording medium such as a DVD (Digital Versatile Disc) or a PD (Phase change rewritable Disk), a magneto-optical recording medium such as an MO (Magneto-Optical disk), a tape medium, a magnetic recording medium, or a semiconductor memory.

For example, when the computer 1000 functions as the information processing apparatus 100 according to the embodiment, the CPU 1100 of the computer 1000 implements the functions of the control unit 160 by executing the programs loaded onto the RAM 1200. The CPU 1100 of the computer 1000 reads these programs from the recording medium 1800 and executes them; as another example, it may acquire these programs from another device via a predetermined communication network.

Some embodiments of the present application have been described above in detail with reference to the drawings. These are examples, and the present invention can be carried out in other forms with various modifications and improvements based on the knowledge of those skilled in the art, including the aspects described in the disclosure of the invention section.

[7. Others]
Among the processes described in the above embodiments and modifications, all or part of the processes described as being performed automatically can be performed manually, and all or part of the processes described as being performed manually can be performed automatically by known methods. In addition, the processing procedures, specific names, and information including various data and parameters shown in the above description and drawings can be changed arbitrarily unless otherwise specified. For example, the various information shown in each drawing is not limited to the illustrated information.

Each component of each illustrated device is a functional concept and does not necessarily need to be physically configured as illustrated. That is, the specific form of distribution and integration of each device is not limited to the illustrated one, and all or part of each device can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like.

For example, the above embodiment describes an example in which the information processing system 1 is a central processing (also called centralized) computing system, but the configuration of the information processing system 1 is not limited to this. For example, the information processing system 1 may be a distributed computing system composed of a plurality of information processing apparatuses 100. In this case, the plurality of information processing apparatuses 100 in the information processing system are connected to one another via a network, and the functions of the information processing apparatus 100 described with reference to FIG. 2 are implemented in each information processing apparatus 100.

The information processing apparatus 100 described above may also be implemented by a plurality of server computers, and its configuration can be changed flexibly; for example, some functions may be implemented by calling an external platform or the like through an API (Application Programming Interface), network computing, or the like.

The embodiments and modifications described above can be combined as appropriate to the extent that the processing contents do not contradict each other.

The term "unit" (section, module, unit) used above can be read as "means", "circuit", or the like. For example, the auxiliary control unit can be read as auxiliary control means or an auxiliary control circuit.

1 information processing system
100 information processing apparatus
110 communication unit
120 storage unit
130 display unit
140 audio output unit
150 detection unit
160 control unit
161 conference control unit
162 calculation unit
163 generation unit
164 acquisition unit
165 auxiliary control unit

An information processing program according to an embodiment causes a computer to execute: a calculation procedure of calculating, for each of a plurality of speakers in a remote conference in which a plurality of participants participate, a listening degree indicating the degree to which a listener who is listening to the speeches of the plurality of speakers listens to the speech of that speaker; and an auxiliary control procedure of providing an auxiliary function that makes it easier for the listener to hear the speech of a low-attentive speaker, whose listening degree is lower than that of a high-attentive speaker whose listening degree is high.
The calculation procedure calculates the listening degree based on the direction of a line of sight detected by a detection unit that detects the line of sight of the listener.
The calculation procedure identifies, based on the direction of the line of sight detected by the detection unit, the speaker being gazed at by the listener, and calculates the listening degree of the identified speaker to be higher than the listening degrees of the other speakers.
The auxiliary control procedure performs emphasis processing that draws the listener's attention to the speech of the low-attentive speaker when a character string matching a preset character string is detected in character information obtained by converting the speech of the low-attentive speaker into text.
The preset character string is a character string set based on the result of machine learning that uses, as training data, character strings preset in past remote conferences.
As the emphasis processing, the auxiliary control procedure controls an audio output unit, which outputs the voices of the plurality of speakers so that each voice is heard from a respective one of a plurality of sound sources arranged at different positions, to output the voice of the low-attentive speaker at a higher volume than the voices of the other speakers.
As the emphasis processing, the auxiliary control procedure controls the audio output unit, which outputs the voices of the plurality of speakers so that each voice is heard from a respective one of a plurality of sound sources arranged at different positions, to change the position of the sound source corresponding to the voice of the low-attentive speaker from its original position to the position of the sound source corresponding to the voice of the high-attentive speaker.
As the emphasis processing, the auxiliary control procedure controls the audio output unit, which outputs the voices of the plurality of speakers so that each voice is heard from a respective one of a plurality of sound sources arranged at different positions, to output the voice of the low-attentive speaker without applying reverberation processing to it.
As the emphasis processing, the auxiliary control procedure controls a display unit, which displays participant images of the plurality of participants in respectively different display areas, to display the participant image of the low-attentive speaker with greater visual emphasis than the participant images of the other speakers.
As the emphasis processing, the auxiliary control procedure controls the display unit to display information about the detected character string.
During the remote conference, the auxiliary control procedure controls the display unit to play back and display, from among a plurality of individually recorded images obtained by separately recording the participant images of the plurality of participants displayed in respectively different display areas of the display unit, the individually recorded image corresponding to the low-attentive speaker designated by the listener.
The auxiliary control procedure controls the display unit to play back and display the individually recorded image at the playback time designated by the listener.
The auxiliary control procedure controls the display unit to play back and display the individually recorded image at the playback speed designated by the listener.
The auxiliary control procedure controls the display unit to play back and display the individually recorded image in the display area corresponding to the low-attentive speaker designated by the listener.
During the remote conference, the auxiliary control procedure controls the display unit to play back and display an entire recorded image obtained by recording a full-screen image that includes the participant images of the plurality of participants displayed in respectively different display areas of the display unit.
The information processing program further includes a generation procedure of generating proceedings progress information in which, for each statement in the remote conference, character information obtained by converting the statement into text, the time of the statement, and the speaker of the statement are recorded in association with one another. During the remote conference, the auxiliary control procedure controls the display unit to play back and display the entire recorded image from a playback start time, the playback start time being the time of the statement designated by the listener from the proceedings progress information displayed on the display unit.
The information processing program further includes a generation procedure of identifying, for each of a plurality of the listeners, a super-listening speaker whose listening degree exceeds a predetermined threshold, classifying each listener into the cluster of the super-listening speaker identified for that listener, and generating cluster information about each resulting cluster. During the remote conference, the auxiliary control procedure controls the display unit to display the cluster information generated by the generation procedure.
The information processing program further includes an acquisition procedure of acquiring audio data of a participant in the remote conference, and a generation procedure of generating, based on the audio data, first audio feature data indicating the features of the audio data. The auxiliary control procedure notifies the participant when second audio feature data, which indicates the features of the audio data as received by another participant's information processing apparatus, does not match the first audio feature data.
An information processing method according to an embodiment is an information processing method executed by a computer, and includes: a calculation step of calculating, for each of a plurality of speakers in a remote conference in which a plurality of participants participate, a listening degree indicating the degree to which a listener who is listening to the speeches of the plurality of speakers listens to the speech of that speaker; and an auxiliary control step of providing an auxiliary function that makes it easier for the listener to hear the speech of a low-attentive speaker, whose listening degree is lower than that of a high-attentive speaker whose listening degree is high.
An information processing apparatus according to an embodiment includes: a calculation unit that calculates, for each of a plurality of speakers in a remote conference in which a plurality of participants participate, a listening degree indicating the degree to which a listener who is listening to the speeches of the plurality of speakers listens to the speech of that speaker; and an auxiliary control unit that provides an auxiliary function that makes it easier for the listener to hear the speech of a low-attentive speaker, whose listening degree is lower than that of a high-attentive speaker whose listening degree is high.
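The gaze-based calculation procedure and the keyword-triggered emphasis processing summarized above can be sketched in Python as follows. No formula for the listening degree is given in this passage, so the sketch assumes it is the fraction of gaze samples that land on each speaker's display area, which makes the gazed-at speaker's value higher than the others. The names `listening_degrees` and `should_emphasize`, and the `low_threshold` parameter, are hypothetical.

```python
def listening_degrees(gaze_samples, speakers):
    """Assumed listening degree: share of gaze samples that fall on each speaker.

    gaze_samples is a list of speaker names, one per sampled gaze direction.
    """
    total = len(gaze_samples)
    return {
        speaker: (gaze_samples.count(speaker) / total if total else 0.0)
        for speaker in speakers
    }


def should_emphasize(speaker, degrees, transcript, keywords, low_threshold):
    """True when a low-attentive speaker's text contains a preset character string."""
    is_low_attentive = degrees[speaker] < low_threshold
    return is_low_attentive and any(keyword in transcript for keyword in keywords)


samples = ["A", "A", "B", "A"]  # the listener mostly looks at speaker A
degrees = listening_degrees(samples, ["A", "B", "C"])

# Speaker B is low-attentive and says a preset keyword, so emphasis is triggered.
emphasize_b = should_emphasize("B", degrees, "please check the deadline",
                               ["deadline"], low_threshold=0.5)
print(degrees, emphasize_b)
```

What the emphasis processing then does (raising the volume, moving the sound source, skipping reverberation, or highlighting the participant image) is left out of the sketch; only the trigger condition is shown.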

Claims

An information processing program that causes a computer to execute:
a calculation procedure of calculating, for each of a plurality of speakers in a remote conference in which a plurality of participants participate, a listening degree indicating the degree to which a listener who is listening to the speeches of the plurality of speakers listens to the speech of that speaker; and
an auxiliary control procedure of providing an auxiliary function that makes it easier for the listener to hear the speech of a low-attentive speaker, whose listening degree is lower than that of a high-attentive speaker whose listening degree is high.

The information processing program according to claim 1, wherein the calculation procedure calculates the listening degree based on the direction of a line of sight detected by a detection unit that detects the line of sight of the listener.

The information processing program according to claim 2, wherein the calculation procedure identifies, based on the direction of the line of sight detected by the detection unit, the speaker being gazed at by the listener, and calculates the listening degree of the identified speaker to be higher than the listening degrees of the other speakers.

The information processing program according to any one of claims 1 to 3, wherein the auxiliary control procedure performs emphasis processing that draws the listener's attention to the speech of the low-attentive speaker when a character string matching a preset character string is detected in character information obtained by converting the speech of the low-attentive speaker into text.

The information processing program according to claim 4, wherein the preset character string is a character string set based on the result of machine learning that uses, as training data, character strings preset in past remote conferences.

The information processing program according to claim 4 or 5, wherein, as the emphasis processing, the auxiliary control procedure controls an audio output unit, which outputs the voices of the plurality of speakers so that each voice is heard from a respective one of a plurality of sound sources arranged at different positions, to output the voice of the low-attentive speaker at a higher volume than the voices of the other speakers.

The information processing program according to any one of claims 4 to 6, wherein, as the emphasis processing, the auxiliary control procedure controls an audio output unit, which outputs the voices of the plurality of speakers so that each voice is heard from a respective one of a plurality of sound sources arranged at different positions, to change the position of the sound source corresponding to the voice of the low-attentive speaker from its original position to the position of the sound source corresponding to the voice of the high-attentive speaker.

The information processing program according to any one of claims 4 to 7, wherein, as the emphasis processing, the auxiliary control procedure controls an audio output unit, which outputs the voices of the plurality of speakers so that each voice is heard from a respective one of a plurality of sound sources arranged at different positions, to output the voice of the low-attentive speaker without applying reverberation processing to it.

The information processing program according to any one of claims 4 to 8, wherein, as the emphasis processing, the auxiliary control procedure controls a display unit, which displays participant images of the plurality of participants in respectively different display areas, to display the participant image of the low-attentive speaker with greater visual emphasis than the participant images of the other speakers.

The information processing program according to claim 9, wherein, as the emphasis processing, the auxiliary control procedure controls the display unit to display information about the detected character string.

The information processing program according to any one of claims 1 to 10, wherein, during the remote conference, the auxiliary control procedure controls a display unit to play back and display, from among a plurality of individually recorded images obtained by separately recording participant images of the plurality of participants displayed in respectively different display areas of the display unit, the individually recorded image corresponding to the low-attentive speaker designated by the listener.

The information processing program according to claim 11, wherein the auxiliary control procedure controls the display unit to play back and display the individually recorded image at the playback time designated by the listener.

The information processing program according to claim 11 or 12, wherein the auxiliary control procedure controls the display unit to play back and display the individually recorded image at the playback speed designated by the listener.

The information processing program according to any one of claims 11 to 13, wherein the auxiliary control procedure controls the display unit to play back and display the individually recorded image in the display area corresponding to the low-attentive speaker designated by the listener.

The information processing program according to any one of claims 1 to 14, wherein, during the remote conference, the auxiliary control procedure controls a display unit to play back and display an entire recorded image obtained by recording a full-screen image that includes participant images of the plurality of participants displayed in respectively different display areas of the display unit.

The information processing program according to claim 15, further comprising a generation procedure of generating proceedings progress information in which, for each statement in the remote conference, character information obtained by converting the statement into text, the time of the statement, and the speaker of the statement are recorded in association with one another,
wherein, during the remote conference, the auxiliary control procedure controls the display unit to play back and display the entire recorded image from a playback start time, the playback start time being the time of the statement designated by the listener from the proceedings progress information displayed on the display unit.

The information processing program according to any one of claims 1 to 16, further comprising a generation procedure of identifying, for each of a plurality of the listeners, a super-listening speaker whose listening degree exceeds a predetermined threshold, classifying each listener into the cluster of the super-listening speaker identified for that listener, and generating cluster information about each resulting cluster,
wherein, during the remote conference, the auxiliary control procedure controls a display unit to display the cluster information generated by the generation procedure.

The information processing program according to any one of claims 1 to 17, further comprising:
an acquisition procedure of acquiring audio data of a participant in the remote conference; and
a generation procedure of generating, based on the audio data, first audio feature data indicating the features of the audio data,
wherein the auxiliary control procedure notifies the participant when second audio feature data, which indicates the features of the audio data as received by another information processing program of another participant, does not match the first audio feature data.

An information processing method executed by a computer, the method including:
a calculation step of calculating, for each of a plurality of speakers in a remote conference in which a plurality of participants participate, a listening degree indicating the degree to which a listener who is listening to the speeches of the plurality of speakers listens to the speech of that speaker; and
an auxiliary control step of providing an auxiliary function that makes it easier for the listener to hear the speech of a low-attentive speaker, whose listening degree is lower than that of a high-attentive speaker whose listening degree is high.

An information processing device comprising:
a calculation unit that calculates, for each of a plurality of speakers in a remote conference in which a plurality of participants participate, a listening degree indicating the degree to which a listener who is listening to the speeches of the plurality of speakers listens to the speech of that speaker; and
an auxiliary control unit that provides an auxiliary function that makes it easier for the listener to hear the speech of a low-attentive speaker, whose listening degree is lower than that of a high-attentive speaker whose listening degree is high.