JP2023043497A - remote conference system

Info

Publication number: JP2023043497A
Application number: JP2021151164A
Authority: JP
Inventors: 竜太田邨; Ryuta Tamura
Original assignee: Kyocera Document Solutions Inc
Current assignee: Kyocera Document Solutions Inc
Priority date: 2021-09-16
Filing date: 2021-09-16
Publication date: 2023-03-29

Abstract

PROBLEM TO BE SOLVED: To provide a remote conference system in which the content of each speaker's speech can be heard even when multiple participants speak at the same time during a remote conference. SOLUTION: A remote conference system 10 includes a remote conference server device 15 and remote conference client devices 11, 12, 13, 14. The remote conference server device 15 includes a seat determination unit 22. The remote conference server device 15 or the remote conference client devices 11, 12, 13, 14 include a sound field characteristic determination unit 23 and a speech synthesis unit 24. The sound field characteristic determination unit 23 determines the sound field characteristics of the audio to be reproduced on each of the remote conference client devices 11, 12, 13, 14 individually for each of the remote conference client devices 11, 12, 13, 14. The speech synthesis unit 24 synthesizes the audio to be reproduced on each of the remote conference client devices 11, 12, 13, 14 based on the sound field characteristics determined by the sound field characteristic determination unit 23. SELECTED DRAWING: Figure 2

Description

The present invention relates to a remote conference system.

Conventionally, remote conference systems for holding remote conferences include, for example, the multipoint video conference system disclosed in Patent Document 1. The multipoint video conference system of Patent Document 1 includes spatial information management means and video layout control means, collaborative work state detection means, and operation input management means. The spatial information management means and the video layout control means manage the layout of the face video and the collaborative work video. The collaborative work state detection means detects the processing state of the collaborative work means. When the operation input management means detects that the collaborative work means is waiting for the user to input information designating a particular terminal, it converts a pointing operation performed by the user on the face video area into terminal-designating input information for the collaborative work means.

Patent Document 1: JP-A-8-205112

However, although the multipoint video conference system of Patent Document 1 can compose the images of the participants according to their seat positions in a virtual conference space and thereby render an image of a virtual conference room, it applies no processing to the voices uttered by the participants, so the sound images of all participants' voices are concentrated at a single point. As a result, unlike in a real conference room, the multipoint video conference system of Patent Document 1 has the problem that it is difficult to make out what each participant is saying when multiple participants speak at the same time.

Accordingly, an object of the present invention is to provide a remote conference system in which the content of each speaker's speech can be heard even when multiple participants speak at the same time during a remote conference.

To achieve the above object, a remote conference system according to one aspect of the present invention includes a remote conference server device and remote conference client devices. The remote conference server device hosts the remote conference. A remote conference client device is assigned to each participant in the remote conference. The remote conference server device includes a seat determination unit. The seat determination unit determines the seats of the participants in the remote conference room in which the remote conference is held. The remote conference server device or the remote conference client devices include a sound field characteristic determination unit and a speech synthesis unit. The sound field characteristic determination unit determines the sound field characteristics between the seats determined by the seat determination unit. The speech synthesis unit synthesizes the audio to be output to the participants. The sound field characteristic determination unit determines the sound field characteristics of the audio to be reproduced on each remote conference client device individually for each remote conference client device. The speech synthesis unit synthesizes the audio to be reproduced on each of the remote conference client devices based on the sound field characteristics determined by the sound field characteristic determination unit.

According to the present invention, the content of each speaker's speech can be heard easily even when multiple participants speak at the same time.

FIG. 1 is a schematic diagram showing an overview of the remote conference system according to the present embodiment.

FIG. 2 is a system configuration diagram showing an overview of the remote conference system according to the present embodiment.

FIG. 3 is a diagram showing an example of the remote conference room displayed on the display unit of a remote conference client device in the remote conference system according to the present embodiment.

FIG. 4 is a flowchart showing the operation flow of the remote conference server device in the remote conference system according to the present embodiment.

FIG. 5 is a diagram showing an example of the remote conference room displayed on the display unit of a remote conference client device in one-to-one dialogue mode in the remote conference system according to the present embodiment.

An embodiment of the present invention will be described below with reference to the drawings. FIG. 1 is a schematic diagram showing an overview of a remote conference system 10 according to the present embodiment. FIG. 2 is a system configuration diagram showing an overview of the remote conference system 10 according to the present embodiment.

The remote conference system 10 is a communication conference system in which a plurality of participants A, B, C, and D at locations remote from one another communicate in one direction or in both directions. As shown in FIG. 1, each of the participants A, B, C, and D joins the remote conference by operating the remote conference client device 11, 12, 13, or 14 from, for example, their own home, their workplace (specifically, their own desk or a conference room at the workplace), or a location away from the office. As shown in FIG. 2, the remote conference system 10 is made up of a plurality of remote conference client devices 11, 12, 13, and 14 and a remote conference server device 15.

The remote conference client devices 11, 12, 13, and 14 are communication devices assigned to the participants A, B, C, and D, respectively, who take part in the remote conference. The remote conference client devices 11, 12, 13, and 14 are devices capable of communicating with the remote conference server device 15, such as smartphones, tablets, personal computers, and television receivers. Each of the remote conference client devices 11, 12, 13, and 14 mainly includes a communication unit 16, a speaker 17, a microphone 18, and a display unit 19.

The communication unit 16 communicates with the remote conference server device 15. That is, the communication unit 16 is audio upload means that transmits audio and images from each of the remote conference client devices 11, 12, 13, and 14 of the participants A, B, C, and D to the remote conference server device 15.

For example, the communication unit 16 of the first remote conference client device 11 transmits audio data of the first participant A's speech, generated in the first remote conference client device 11, to the remote conference server device 15. Conversely, the communication unit 16 of the first remote conference client device 11 receives from the remote conference server device 15 the audio data of the speech of the other participants B, C, and D, generated in the other remote conference client devices 12, 13, and 14, together with sound field forming data generated in the remote conference server device 15. The communication unit 16 outputs the received audio data and sound field forming data to the speaker 17. The sound field forming data is data for determining the sound field characteristics in a remote conference room 30 (the characteristics of the audio conveyed among the participants A, B, C, and D in the remote conference room 30).

Similarly, the communication unit 16 of the first remote conference client device 11 transmits image data, such as an image of the first participant A, generated in the first remote conference client device 11 to the remote conference server device 15. The communication unit 16 of the first remote conference client device 11 receives from the remote conference server device 15 image data, such as images of the other participants B, C, and D, generated in the other remote conference client devices 12, 13, and 14, together with image data generated in the remote conference server device 15. The communication unit 16 outputs the received image data to the display unit 19.

The speaker 17 outputs the audio data generated in the remote conference client devices 11, 12, 13, and 14, as well as the audio data and sound field forming data received by the communication unit 16. Specifically, for example, the speaker 17 of the first remote conference client device 11 outputs audio data generated in the first remote conference client device 11. In addition, while the remote conference system 10 is in use, the speaker 17 of the first remote conference client device 11 outputs the audio data transmitted from the remote conference server device 15, which includes the audio data generated in the other remote conference client devices 12, 13, and 14.

The speaker 17 is made up of, for example, two speakers arranged on the left and right of the remote conference client device. The speaker 17 can output different sounds from the left and right speakers. That is, the speaker 17 is capable of stereo reproduction. Note that the speaker 17 may instead be made up of a single speaker, or of three or more speakers.

The microphone 18 is used by the participants A, B, C, and D when they speak during the remote conference. When a participant A, B, C, or D speaks into the microphone 18, an audio signal is output to the communication unit 16.

The display unit 19 displays the image data generated in each of the remote conference client devices 11, 12, 13, and 14 and the image data generated in the remote conference server device 15.

The remote conference server device 15 is a host control processing device that hosts the remote conference. As shown in FIG. 2, the remote conference server device 15 mainly includes a communication unit 20, an image forming unit 21, a seat determination unit 22, a sound field characteristic determination unit 23, a speech synthesis unit 24, and an utterance count recording unit 25.

The communication unit 20 is communicably connected to the communication units 16 of the remote conference client devices 11, 12, 13, and 14. The communication unit 20 exchanges audio data, sound field forming data, and image data with the communication units 16 of the remote conference client devices 11, 12, 13, and 14. That is, the communication unit 20 is distribution means that distributes individual audio and images from the remote conference server device 15 to each of the remote conference client devices 11, 12, 13, and 14. Communication between the communication unit 20 and the communication units 16 may be wired or wireless.

FIG. 3 is a diagram showing an example of the remote conference room 30 displayed on the display unit 19 of each of the remote conference client devices 11, 12, 13, and 14 in the remote conference system 10. The image forming unit 21 forms the image data to be displayed on the display units 19 of the remote conference client devices 11, 12, 13, and 14. For example, as shown in FIG. 3, the image forming unit 21 assumes a remote conference room 30, assumes a conference table 31 inside it, and arranges as many seats as there are participants (seats 32, 33, 34, and 35) around the conference table 31. That is, the image forming unit 21 sets up a virtual remote conference room 30 and places the conference table 31 and the seats 32, 33, 34, and 35 in it. More specifically, the image forming unit 21 assumes a rectangular conference table 31, places the seats 32 and 33 on one side of the conference table 31, and places the seats 34 and 35 opposite the seats 32 and 33. Note that the shape of the remote conference room 30, the shape of the conference table 31, the arrangement of the seats 32, 33, 34, and 35, and so on may be determined freely by the image forming unit 21 or freely by the user.

The seat determination unit 22 determines the seats (seats 32, 33, 34, and 35) of the participants A, B, C, and D in the remote conference room 30. Specifically, when the remote conference system 10 is in use, the seat determination unit 22 assigns the participants A, B, C, and D to the seats (seats 32, 33, 34, and 35) of the remote conference room 30 arranged by the image forming unit 21.

More specifically, as shown in FIG. 3, the seat determination unit 22 assigns the first participant A, who is participating from the first remote conference client device 11, to the first seat 32. The seat determination unit 22 assigns the second participant B, who is participating from the second remote conference client device 12, to the second seat 33 next to the first seat 32. The seat determination unit 22 assigns the third participant C, who is participating from the third remote conference client device 13, to the third seat 34 opposite the first seat 32. The seat determination unit 22 assigns the fourth participant D, who is participating from the fourth remote conference client device 14, to the fourth seat 35 opposite the second seat 33 (next to the third seat 34). Note that the seat determination unit 22 may, for example, freely determine which of the seats 32, 33, 34, and 35 arranged by the image forming unit 21 each of the participants A, B, C, and D occupies in the remote conference room 30. Alternatively, the seat determination unit 22 may, for example, determine the arrangement of the seats 32, 33, 34, and 35 in response to requests from the participants A, B, C, and D (request signals from the remote conference client devices 11, 12, 13, and 14).

Once the seat determination unit 22 has assigned the participants A, B, C, and D to the seats (seats 32, 33, 34, and 35), the image forming unit 21 forms the images of the participants A, B, C, and D in correspondence with the assigned seats 32, 33, 34, and 35. More specifically, the image forming unit 21 forms, for example, image data that displays the image of the first participant A above the first seat 32. The image forming unit 21 forms image data that displays the image of the second participant B above the second seat 33. The image forming unit 21 forms image data that displays the image of the third participant C above the third seat 34. The image forming unit 21 forms image data that displays the image of the fourth participant D above the fourth seat 35.

The sound field characteristic determination unit 23 determines the sound field characteristics of the audio (the sound field, the sound image localization, and the sense of depth of the audio) according to the relative positional relationship of the participants A, B, C, and D in the virtual remote conference room 30 (the relative positional relationship resulting from where they are seated among the seats 32, 33, 34, and 35). Specifically, the sound field characteristic determination unit 23 generates the sound field forming data between the seats 32, 33, 34, and 35 determined by the seat determination unit 22.
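The relative positional relationship described above can be illustrated with a minimal Python sketch. The seat coordinates, the facing angles, the metre unit, and the names `Seat` and `relative_position` are all assumptions made for illustration; the specification does not define a coordinate system or a data format for the sound field forming data.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Seat:
    x: float       # position along the table (assumed unit: metres)
    y: float       # position across the table (assumed unit: metres)
    facing: float  # direction the occupant faces, radians, counter-clockwise from +x


def relative_position(listener: Seat, talker: Seat) -> Tuple[float, float]:
    """Return (distance, azimuth) of the talker as seen from the listener.

    A positive azimuth means the talker is to the listener's left,
    a negative azimuth means the talker is to the listener's right.
    """
    dx = talker.x - listener.x
    dy = talker.y - listener.y
    distance = math.hypot(dx, dy)
    azimuth = math.atan2(dy, dx) - listener.facing
    # Wrap the angle into the range (-pi, pi].
    azimuth = math.atan2(math.sin(azimuth), math.cos(azimuth))
    return distance, azimuth


# Hypothetical layout modelled on FIG. 3: seats 32 and 33 on one side of the
# table facing seats 34 and 35 on the other side.
SEATS = {
    "A": Seat(0.0, 1.0, -math.pi / 2),  # seat 32
    "B": Seat(1.0, 1.0, -math.pi / 2),  # seat 33
    "C": Seat(0.0, 0.0, math.pi / 2),   # seat 34
    "D": Seat(1.0, 0.0, math.pi / 2),   # seat 35
}
```

With this layout, participant D is farther from participant A than B or C is, and B lies to A's left, which matches the relationships the embodiment relies on.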

The speech synthesis unit 24 synthesizes the audio and images to be output to the participants A, B, C, and D. Specifically, the speech synthesis unit 24 synthesizes the audio and images to be reproduced on each of the remote conference client devices 11, 12, 13, and 14 based on the sound field characteristics determined by the sound field characteristic determination unit 23 (the sound field forming data generated by the sound field characteristic determination unit 23).

The utterance count recording unit 25 records the number of utterances made by each of the participants A, B, C, and D during the remote conference. Specifically, the utterance count recording unit 25 records the number of utterances of each participant A, B, C, and D based on the audio data transmitted from the remote conference client devices 11, 12, 13, and 14. From these utterance counts, the utterance count recording unit 25 calculates the utterance frequency of each participant A, B, C, and D taking part in the remote conference. Specifically, the utterance count recording unit 25 calculates the utterance frequency of each participant based on either a simple average of that participant's utterance counts within a fixed period (for example, the 10 minutes from the start of the remote conference) or a weighted average of the utterance counts in which recent utterances are weighted more heavily. The utterance frequency of each of the participants A, B, C, and D can therefore be calculated in fine steps at predetermined intervals from the start of the remote conference.
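The two averaging options can be sketched as follows. This is a minimal illustration that assumes the utterance counts are collected per fixed interval, oldest first; the geometric weighting with a `decay` parameter is one possible way to weight recent intervals more heavily and is not specified in the text.

```python
from typing import Sequence


def simple_average(counts_per_interval: Sequence[int]) -> float:
    """Plain mean of the utterance counts over the observed intervals."""
    if not counts_per_interval:
        raise ValueError("at least one interval is required")
    return sum(counts_per_interval) / len(counts_per_interval)


def recency_weighted_average(counts_per_interval: Sequence[int],
                             decay: float = 0.5) -> float:
    """Weighted mean in which the newest interval (last element) has weight 1
    and each older interval is weighted `decay` times the next newer one."""
    if not counts_per_interval:
        raise ValueError("at least one interval is required")
    n = len(counts_per_interval)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    weighted_sum = sum(w * c for w, c in zip(weights, counts_per_interval))
    return weighted_sum / sum(weights)
```

For a participant who was silent for two intervals and then spoke eight times in the latest one, `recency_weighted_average([0, 0, 8])` is larger than `simple_average([0, 0, 8])`, so a recent burst of speech shows up sooner in the weighted figure.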

The seat determination unit 22 adjusts the virtual seating positions of the participants A, B, C, and D in the remote conference room 30 (the positions of the seats 32, 33, 34, and 35) based on the utterance frequencies calculated by the utterance count recording unit 25. Specifically, if the utterance frequencies calculated by the utterance count recording unit 25 show that the recent utterance frequency of participant A is higher than those of the other participants B, C, and D, the seat determination unit 22 moves the virtual seating position of the first participant A in the remote conference room 30 (the position of the first seat 32) away from the virtual seating positions of the other participants B, C, and D (the positions of the seats 33, 34, and 35), so that the distance between the first participant A, who has recently been speaking a lot, and the other participants B, C, and D becomes longer. By adjusting the virtual seating positions of the participants A, B, C, and D based on their past utterance history in this way, the seat determination unit 22 makes it easy to identify a participant who speaks frequently (participant A) in the remote conference room 30.
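One way to picture this seat adjustment is the sketch below. It assumes 2D seat coordinates and moves the most frequent speaker a fixed `step` directly away from the centroid of the other seats; the direction rule, the step size, and the function name are illustrative assumptions, since the text only states that the distance to the other participants becomes longer.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def move_frequent_speaker_away(positions: Dict[str, Point],
                               frequencies: Dict[str, float],
                               step: float = 0.5) -> Dict[str, Point]:
    """Return new positions in which the participant with the strictly highest
    utterance frequency is shifted away from the centroid of the others."""
    top = max(frequencies, key=frequencies.get)
    others = [name for name in positions if name != top]
    # Do nothing when nobody else is present or when the highest frequency is tied.
    if not others or any(frequencies[n] >= frequencies[top] for n in others):
        return dict(positions)
    cx = sum(positions[n][0] for n in others) / len(others)
    cy = sum(positions[n][1] for n in others) / len(others)
    tx, ty = positions[top]
    dx, dy = tx - cx, ty - cy
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        dx, dy, norm = 0.0, 1.0, 1.0  # arbitrary direction when already at the centroid
    adjusted = dict(positions)
    adjusted[top] = (tx + step * dx / norm, ty + step * dy / norm)
    return adjusted
```

Because the sound field characteristics are derived from the seat positions, an adjustment of this kind would also change how the frequent speaker is rendered for every listener.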

Next, sound image synthesis by stereo reproduction in the remote conference system 10 will be described. In the remote conference system 10, stereo reproduction is performed so that the voice sound images of the participants A, B, C, and D are placed at positions separated from one another, which makes each participant's voice easier to hear even when several of the participants A, B, C, and D are talking at the same time. Specifically, the sound field characteristic determination unit 23 and the speech synthesis unit 24 adjust the audio for stereo reproduction according to the relative positional relationship given by the seating positions of the participants A, B, C, and D (the positions of the seats 32, 33, 34, and 35) in the virtual remote conference room 30, so that each participant A, B, C, and D is reproduced as if speaking from that position. Specifically, the following processing is performed according to the relative distance in the remote conference room 30 between the speaker and the participants other than the speaker.

The speech synthesis unit 24 processes the audio data of a distant speaker with a high-frequency filter. That is, the speech synthesis unit 24 adjusts the coefficients of the filter that processes the audio data according to the relative distance in the remote conference room 30 between the speaker and each participant other than the speaker. This gives a sense of depth to the audio output from the speaker 17. Specifically, as shown in FIG. 3, when the speech synthesis unit 24 synthesizes the audio to be reproduced on the first remote conference client device 11 (for the first participant A), it processes with the high-frequency filter the audio data of the fourth participant D, who is a more distant speaker from the first participant A than the second participant B and the third participant C are. Here, the speech synthesis unit 24 judges, based on the sound field characteristics determined by the sound field characteristic determination unit 23, that the audio data of the fourth participant D is audio data of a speaker distant from the first participant A. Taking the seating position of the first participant A (first seat 32) in the remote conference room 30 as the reference, the sound field characteristic determination unit 23 compares the seating position of the fourth participant D (fourth seat 35) with the seating position of the second participant B (second seat 33) and the seating position of the third participant C (third seat 34), and thereby determines that the seating position of the fourth participant D is far from that of the first participant A (that is, that the fourth participant D is a distant speaker for the first participant A). The sound field characteristic determination unit 23 then sends this determination result to the speech synthesis unit 24.

In addition, by adjusting the coefficients of the filter that processes the audio data, the speech synthesis unit 24 reproduces the phenomenon in which the higher-frequency components of a distant speaker's voice are absorbed more strongly by the air. Adjusting the filter coefficients in this way makes it possible to reproduce faithfully how sound travels through air, so that only sound with low-frequency components is heard at a distance.
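The distance-dependent filtering can be sketched in Python as follows. Because the text describes higher frequencies being absorbed more strongly with distance, this sketch assumes a one-pole low-pass response whose cutoff falls as the distance grows. The filter type, the cutoff law, and every constant (`near_cutoff_hz`, `halving_distance_m`, `floor_hz`, the 16 kHz sample rate) are assumptions for illustration and are not values given in the specification.

```python
import math
from typing import List, Sequence


def cutoff_for_distance(distance_m: float,
                        near_cutoff_hz: float = 8000.0,
                        halving_distance_m: float = 2.0,
                        floor_hz: float = 500.0) -> float:
    """Hypothetical rule: the cutoff halves every `halving_distance_m` metres
    and never drops below `floor_hz`."""
    cutoff = near_cutoff_hz * 0.5 ** (distance_m / halving_distance_m)
    return max(cutoff, floor_hz)


def one_pole_lowpass(samples: Sequence[float],
                     cutoff_hz: float,
                     sample_rate_hz: int = 16000) -> List[float]:
    """One-pole low-pass filter: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    out: List[float] = []
    y = 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out


def filter_for_listener(samples: Sequence[float], distance_m: float) -> List[float]:
    """Apply the distance-dependent filter to one talker's audio for one listener."""
    return one_pole_lowpass(samples, cutoff_for_distance(distance_m))
```

In this sketch the filter coefficient `alpha` is the quantity that changes with distance: a larger distance gives a lower cutoff, a smaller `alpha`, and therefore stronger attenuation of rapid (high-frequency) changes in the signal.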

Furthermore, for the audio data of a distant speaker, the speech synthesis unit 24 adjusts the attack so that the audio level rises gradually. That is, the speech synthesis unit 24 detects when each of the participants A, B, C, and D starts speaking after a silent state, and adjusts how quickly the volume of that participant's voice rises according to the relative distance in the remote conference room 30 between the speaker and each participant other than the speaker. Specifically, as shown in FIG. 3, when the speech synthesis unit 24 synthesizes the audio to be reproduced on the first remote conference client device 11 (for the first participant A), it adjusts the attack so that the volume of the voice of the fourth participant D, who is a distant speaker for the first participant A, rises more gradually than the volume of the voices of the other participants B and C.
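The attack adjustment can be pictured with the sketch below. It assumes that the sample buffer starts exactly at the detected onset of speech and applies a linear gain ramp whose length grows with distance; the onset detection itself, the linear ramp shape, and the `ms_per_metre` constant are hypothetical choices, not details from the text.

```python
from typing import List, Sequence


def attack_samples_for_distance(distance_m: float,
                                ms_per_metre: float = 10.0,
                                sample_rate_hz: int = 16000) -> int:
    """Hypothetical rule: the attack lasts `ms_per_metre` milliseconds per metre."""
    return round(distance_m * ms_per_metre / 1000.0 * sample_rate_hz)


def apply_attack(samples: Sequence[float], attack_len: int) -> List[float]:
    """Scale the first `attack_len` samples by a linear ramp that reaches 1.0.

    `samples` is assumed to begin at the onset of speech after silence.
    """
    out: List[float] = []
    for i, x in enumerate(samples):
        gain = 1.0 if attack_len <= 0 else min(1.0, (i + 1) / attack_len)
        out.append(x * gain)
    return out
```

A distant talker gets a longer ramp, so the level of that voice rises more gradually than the level of a nearby talker's voice.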

Furthermore, the speech synthesis unit 24 applies reverb (reverberation) to the audio data of a distant speaker. Applying reverb to a distant speaker's audio data gives that speaker's voice a sense of spatial depth or spaciousness.
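As a rough illustration of distance-dependent reverb, the following sketch blends the dry signal with the output of a single feedback comb filter, using a larger wet share for more distant talkers. A single comb filter is a deliberately crude stand-in for a real reverberator, and the delay, feedback, and mixing rule are all assumed values that the specification does not provide.

```python
from typing import List, Sequence


def comb_reverb(samples: Sequence[float],
                delay_samples: int,
                feedback: float) -> List[float]:
    """Feedback comb filter: y[n] = x[n] + feedback * y[n - delay_samples]."""
    out: List[float] = []
    for i, x in enumerate(samples):
        echo = out[i - delay_samples] if i >= delay_samples else 0.0
        out.append(x + feedback * echo)
    return out


def wet_mix_for_distance(distance_m: float,
                         max_distance_m: float = 5.0,
                         max_wet: float = 0.5) -> float:
    """Hypothetical rule: the wet share grows linearly with distance up to `max_wet`."""
    clamped = min(max(distance_m, 0.0), max_distance_m)
    return max_wet * clamped / max_distance_m


def apply_distance_reverb(samples: Sequence[float],
                          distance_m: float,
                          delay_samples: int = 800,
                          feedback: float = 0.5) -> List[float]:
    """Blend the dry signal with its reverberated copy according to distance."""
    wet = wet_mix_for_distance(distance_m)
    reverbed = comb_reverb(samples, delay_samples, feedback)
    return [(1.0 - wet) * dry + wet * rev for dry, rev in zip(samples, reverbed)]
```

At zero distance the wet share is zero and the audio passes through unchanged; the decaying echoes only become audible as the talker moves farther from the listener.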

In addition, the speech synthesis unit 24 adjusts the volume of the left and right speakers in the stereo reproduction of the speaker 17 according to the relative left-right positional relationship in the remote conference room 30 between the speaker and each participant other than the speaker. Specifically, as shown in FIG. 3, when the speech synthesis unit 24 synthesizes the audio to be reproduced on the first remote conference client device 11 (for the first participant A) and outputs the voice of the second participant B, who is on the left of the first participant A (on the right in FIG. 3), it sets the volume of the left speaker higher than the volume of the right speaker in the stereo reproduction of the speaker 17.
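The left-right volume adjustment can be sketched with a constant-power pan law. The text only requires that the speaker on the talker's side be louder; the constant-power law, the mapping from azimuth to a pan value, and the sign convention (positive means left of the listener) are assumptions of this sketch.

```python
import math
from typing import Tuple


def pan_from_azimuth(azimuth_rad: float) -> float:
    """Map an azimuth (positive = talker to the listener's left) to a pan value
    in [-1, 1], where +1 is fully left and -1 is fully right."""
    return max(-1.0, min(1.0, math.sin(azimuth_rad)))


def pan_gains(pan: float) -> Tuple[float, float]:
    """Constant-power pan: return (left_gain, right_gain) with
    left_gain**2 + right_gain**2 == 1."""
    angle = (1.0 - pan) * math.pi / 4.0  # 0 for fully left, pi/2 for fully right
    return math.cos(angle), math.sin(angle)


def render_stereo(sample: float, azimuth_rad: float) -> Tuple[float, float]:
    """Split one mono sample into a (left, right) pair for the two speakers."""
    left_gain, right_gain = pan_gains(pan_from_azimuth(azimuth_rad))
    return sample * left_gain, sample * right_gain
```

For a talker to the listener's left, such as participant B as heard by participant A, the left gain exceeds the right gain; a talker straight ahead is reproduced equally from both speakers.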

In this way, in the remote conference system 10, the sound field characteristic determination unit 23 and the speech synthesis unit 24 of the remote conference server device 15 generate audio data and images processed for each of the participants A, B, C, and D, and these are distributed individually to each of the participants A, B, C, and D.

FIG. 4 is a flowchart showing the operation flow of the remote conference server device 15 in the remote conference system 10. In the remote conference system 10 having the functions described above, the remote conference server device 15 performs processing according to, for example, the operation flow shown in FIG. 4.

As shown in FIG. 4, the remote conference server device 15 identifies the participants A, B, C, and D of the remote conference based on signals transmitted from the remote conference client devices 11, 12, 13, and 14 (step S1). Having identified the participants A, B, C, and D, the remote conference server device 15 uses the image forming unit 21 to assume a remote conference room 30 as shown in FIG. 3, assume a conference table 31 in the assumed remote conference room 30, and arrange seats for the number of participants (seats 32, 33, 34, and 35) around the assumed conference table 31 (step S2). Having arranged the seats 32, 33, 34, and 35 in the remote conference room 30, the remote conference server device 15 uses the seat determination unit 22 to determine the seats (seats 32, 33, 34, and 35) of the participants A, B, C, and D in the remote conference room 30 as shown in FIG. 3 (step S3). Having determined the seats of the participants A, B, C, and D, the remote conference server device 15 uses the sound field characteristic determination unit 23 to determine the sound field characteristics between the seats 32, 33, 34, and 35 individually for each of the remote conference client devices 11, 12, 13, and 14 (step S4).

After that, when the remote conference starts and any of the participants A, B, C, and D speaks, that is, when voice data is transmitted from the remote conference client devices 11, 12, 13, and 14 to the remote conference server device 15, the remote conference server device 15 identifies the speaker from the transmitted voice data (step S5). Having identified the speaker, the remote conference server device 15 uses the speech synthesis unit 24 to synthesize, for each of the remote conference client devices 11, 12, 13, and 14, the speaker's voice to be reproduced on that device (step S6). At this time, the speech synthesis unit 24 synthesizes the speaker's voice based on the sound field characteristics determined by the sound field characteristic determination unit 23. Having synthesized the speaker's voice with the speech synthesis unit 24, the remote conference server device 15 transmits the synthesized voice data to each of the remote conference client devices 11, 12, 13, and 14 via the communication unit 20 (step S7).

After transmitting the synthesized voice data to each of the remote conference client devices 11, 12, 13, and 14, the remote conference server device 15 determines whether any of the participants A, B, C, and D is speaking (step S8). That is, the remote conference server device 15 determines whether voice data is being transmitted to it from the remote conference client devices 11, 12, 13, and 14. If the remote conference server device 15 determines that a participant is speaking (step S8-Yes), it identifies the speaker from the transmitted voice data (step S5). If the remote conference server device 15 determines that no participant is speaking (step S8-No), it determines whether the remote conference has ended (step S9). At this time, the remote conference server device 15 determines that the remote conference has ended when a signal to end the remote conference is transmitted from at least one of the remote conference client devices 11, 12, 13, and 14, or when communication from at least one of the remote conference client devices 11, 12, 13, and 14 is disconnected. If the remote conference server device 15 determines that the remote conference has not ended (step S9-No), it again determines whether any of the participants A, B, C, and D is speaking (step S8). If the remote conference server device 15 determines that the remote conference has ended (step S9-Yes), it ends the host processing of the remote conference system 10. Alternatively, in step S9, the remote conference server device 15 may determine that the remote conference has ended when a signal to end the remote conference is transmitted from all of the remote conference client devices 11, 12, 13, and 14, or when communication from all of the remote conference client devices 11, 12, 13, and 14 is disconnected.
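As a non-limiting illustration, the operation flow of steps S1 to S9 can be sketched in Python as follows. The function name `host_conference`, the event tuples, the use of seat-index difference as the "sound field characteristic", and the choice not to send a speaker's own voice back to that speaker are all assumptions made for illustration, not details given in FIG. 4.

```python
def host_conference(client_ids, events, synthesize):
    """Simplified host loop following steps S1-S9.

    `events` is a sequence of ("speech", speaker_id, audio) or ("end",)
    tuples; `synthesize(audio, characteristic)` stands in for the speech
    synthesis unit. Returns the list of (listener, audio) deliveries.
    """
    participants = list(client_ids)                       # S1: identify participants
    seats = {p: i for i, p in enumerate(participants)}    # S2, S3: one seat each
    # S4: characteristic per (listener, speaker); here just seat distance.
    field = {(l, s): abs(seats[l] - seats[s])
             for l in participants for s in participants if l != s}
    sent = []
    for event in events:                                  # S8: is anyone speaking?
        if event[0] == "end":                             # S9-Yes: conference over
            break
        _, speaker, audio = event                         # S5: identify the speaker
        for listener in participants:
            if listener != speaker:
                mixed = synthesize(audio, field[(listener, speaker)])  # S6
                sent.append((listener, mixed))            # S7: transmit
    return sent


# Hypothetical run: A speaks once, then the conference ends.
sent = host_conference(["A", "B", "C"],
                       [("speech", "A", "hi"), ("end",), ("speech", "B", "x")],
                       lambda audio, d: (audio, d))
```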

Next, the one-to-one dialogue function of the remote conference system 10 will be described with reference to FIG. 5. FIG. 5 is a diagram showing an example of the remote conference room 30 displayed on the display unit 19 of the remote conference client devices 11, 12, 13, and 14 in the one-to-one dialogue mode of the remote conference system 10.

The remote conference system 10 provides a one-to-one dialogue mode in which a participant can hold a one-to-one dialogue with a specific participant chosen from among the participants A, B, C, and D taking part in the remote conference. The one-to-one dialogue mode is provided in each of the remote conference client devices 11, 12, 13, and 14. Each of the participants A, B, C, and D can enter the one-to-one dialogue mode by selecting a specific participant from the participant list displayed on the display unit 19 of the remote conference client device 11, 12, 13, or 14. For example, if the participant A wants to talk only with the specific participant C, the participant A selects the participant C from the participant list displayed on the display unit 19 of the first remote conference client device 11 and enters the one-to-one dialogue mode, and can then talk only with the participant C.

When the one-to-one dialogue mode is set on any of the remote conference client devices 11, 12, 13, and 14, the remote conference server device 15 distributes the audio transmitted from a remote conference client device set to the one-to-one dialogue mode only to the remote conference client devices set to the one-to-one dialogue mode. Because the remote conference server device 15 distributes the audio in this way, a participant can talk only with the specific participant selected.

For example, when the one-to-one dialogue mode is set between the first remote conference client device 11 (first participant A) and the third remote conference client device 13 (third participant C), the remote conference server device 15 distributes the audio transmitted from the first remote conference client device 11 only to the third remote conference client device 13, and distributes the audio transmitted from the third remote conference client device 13 only to the first remote conference client device 11. That is, the audio transmitted from the first remote conference client device 11 and the third remote conference client device 13 is not distributed to the second remote conference client device 12 (second participant B) or the fourth remote conference client device 14 (fourth participant D), for which the one-to-one dialogue mode is not set.
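As a non-limiting illustration of this distribution rule, the following Python sketch selects the devices that receive a given sender's audio. The function name `delivery_targets` and the representation of the one-to-one pair as a set are assumptions for illustration. Audio from participants outside the pair is still delivered to everyone, consistent with the reduced-volume playback described for devices in the one-to-one dialogue mode.

```python
def delivery_targets(sender, participants, one_to_one_pair=None):
    """Return the participants whose devices receive `sender`'s audio.

    If `sender` belongs to the one-to-one pair, only the partner receives
    the audio; otherwise all other participants receive it.
    """
    others = [p for p in participants if p != sender]
    if one_to_one_pair and sender in one_to_one_pair:
        return [p for p in others if p in one_to_one_pair]
    return others


# Hypothetical case: A and C are in the one-to-one dialogue mode.
everyone = ["A", "B", "C", "D"]
pair = {"A", "C"}
from_a = delivery_targets("A", everyone, pair)
from_b = delivery_targets("B", everyone, pair)
```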

In the remote conference system 10, the one-to-one dialogue mode setting can be canceled only by a remote conference client device that is set to the one-to-one dialogue mode. For example, when the one-to-one dialogue mode is set between the first remote conference client device 11 (first participant A) and the third remote conference client device 13 (third participant C), the setting can be canceled only from the first remote conference client device 11 or the third remote conference client device 13. Because only a remote conference client device set to the one-to-one dialogue mode can cancel the setting, participants who are not in the one-to-one dialogue mode are prevented from canceling it.

When the one-to-one dialogue mode is set between given remote conference client devices among the remote conference client devices 11, 12, 13, and 14, the seat determination unit 22 determines the positions of the seats 32, 33, 34, and 35 of the participants A, B, C, and D in the remote conference room 30 so that the participants joining from the remote conference client devices in the one-to-one dialogue mode are temporarily brought closer to each other in the remote conference room 30. At the same time, the seat determination unit 22 determines the positions of the seats 32, 33, 34, and 35 so that those participants are temporarily moved farther from the participants joining from the remote conference client devices that are not in the one-to-one dialogue mode.

Specifically, for example, when the one-to-one dialogue mode is set between the first remote conference client device 11 of the first participant A and the third remote conference client device 13 of the third participant C, the seat determination unit 22 moves the position of the first seat 32 and the position of the third seat 34 closer to each other, as shown in FIG. 5, so that the first participant A and the third participant C are temporarily brought closer together in the remote conference room 30. At the same time, the seat determination unit 22 moves the positions of the first seat 32 and the third seat 34 farther away from the positions of the second seat 33 and the fourth seat 35, so that the first participant A and the third participant C are temporarily distanced from the participants B and D, who join from the remote conference client devices 12 and 14 for which the one-to-one dialogue mode is not set.
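As a non-limiting illustration of this temporary seat adjustment, the following Python sketch treats seats as 2D coordinates, pulls the paired seats toward their midpoint, and spreads the remaining seats away from that midpoint. The coordinate model, the function name `reposition_for_one_to_one`, and the factors `pull` and `spread` are assumptions for illustration; the embodiment states only that the relative distances change.

```python
import math


def reposition_for_one_to_one(seats, pair, pull=0.5, spread=1.5):
    """Return new seat coordinates for a one-to-one dialogue.

    `seats` maps participant -> (x, y). Paired seats move toward the
    pair's midpoint by `pull`; all other seats are scaled away from that
    midpoint by `spread` (> 1), which increases the relative distance.
    """
    a, b = pair
    mx = (seats[a][0] + seats[b][0]) / 2
    my = (seats[a][1] + seats[b][1]) / 2
    new_seats = {}
    for who, (x, y) in seats.items():
        scale = (1 - pull) if who in pair else spread
        new_seats[who] = (mx + (x - mx) * scale, my + (y - my) * scale)
    return new_seats


# Hypothetical square layout around the table, with A and C paired.
seats = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (1.0, 1.0), "D": (0.0, 1.0)}
moved = reposition_for_one_to_one(seats, ("A", "C"))
```

Restoring the original coordinates when the mode is canceled would make the adjustment temporary, as described.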

Of the voices of the participants A, B, C, and D output from the speaker 17 of a remote conference client device set to the one-to-one dialogue mode, the speech synthesis unit 24 synthesizes the voice data of the participants whose remote conference client devices are not set to the one-to-one dialogue mode so that their voices are output from the speaker 17 at a lower volume than the voices of the participants whose remote conference client devices are set to the one-to-one dialogue mode. That is, on a remote conference client device set to the one-to-one dialogue mode, the voices of the participants whose devices are not set to the one-to-one dialogue mode are output from the speaker 17 at reduced volume.

For example, when the one-to-one dialogue mode is set between the first remote conference client device 11 of the first participant A and the third remote conference client device 13 of the third participant C, the speech synthesis unit 24 synthesizes the voice data of the participants B and D of the remote conference client devices 12 and 14 so that, of the voices of the participants A, B, C, and D output from the speakers 17 of the first remote conference client device 11 and the third remote conference client device 13, the voices of the participants B and D, whose remote conference client devices 12 and 14 are not set to the one-to-one dialogue mode, are output from the speaker 17 at a lower volume than the voices of the participants A and C, whose remote conference client devices 11 and 13 are set to the one-to-one dialogue mode.
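As a non-limiting illustration of this volume control, the following Python sketch mixes mono sample lists for one listener and attenuates the voices of participants outside the one-to-one pair. The function name `mix_for_listener`, the value of `background_gain`, and the exclusion of the listener's own voice from the mix are assumptions for illustration.

```python
def mix_for_listener(listener, frames, one_to_one_pair, background_gain=0.25):
    """Mix every other participant's samples for `listener`.

    `frames` maps speaker -> list of float samples. When the listener is
    in the one-to-one pair, speakers outside the pair are scaled by
    `background_gain` (< 1) so they play at a lower volume.
    """
    in_mode = listener in one_to_one_pair
    length = max((len(s) for s in frames.values()), default=0)
    mixed = [0.0] * length
    for speaker, samples in frames.items():
        if speaker == listener:
            continue  # assumed: a listener does not hear their own voice
        quiet = in_mode and speaker not in one_to_one_pair
        gain = background_gain if quiet else 1.0
        for i, sample in enumerate(samples):
            mixed[i] += gain * sample
    return mixed


# Hypothetical frame in which all four participants speak at once.
frames = {"A": [1.0], "B": [1.0], "C": [1.0], "D": [1.0]}
for_a = mix_for_listener("A", frames, {"A", "C"})
for_b = mix_for_listener("B", frames, {"A", "C"})
```

In a complete system, the frames passed for listener B would already exclude the audio of A and C, since that audio is not distributed outside the pair.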

Because the speech synthesis unit 24 synthesizes voice data in which the volume of the voices of the participants A, B, C, and D output from the speaker 17 is controlled in this way, the voices of the participants whose remote conference client devices are set to the one-to-one dialogue mode and the voices of the participants whose remote conference client devices are not set to the one-to-one dialogue mode can be output from the speaker 17 of a remote conference client device set to the one-to-one dialogue mode without being confused with each other, which makes the voices of the participants A, B, C, and D even easier to hear.

In the embodiment of the present invention, the volume of the voices of the participants A, B, C, and D output from the speaker 17 is controlled by the speech synthesis unit 24 (on the remote conference server device 15 side). However, the present invention is not limited to this, and the volume of the voices of the participants A, B, C, and D may instead be controlled on the side of the remote conference client devices 11, 12, 13, and 14 (for example, at the speaker 17).

As described above, according to the embodiment of the present invention, the sound image localization of the individual participants A, B, C, and D in a remote conference is separated for each of the remote conference client devices 11, 12, 13, and 14 (each of the participants A, B, C, and D). Therefore, even when a plurality of the participants A, B, C, and D speak at the same time, the content of each speaker's utterance can be heard easily.

In addition, according to the embodiment of the present invention, the sound image localization of the individual participants A, B, C, and D is adjusted according to the past utterance history and the special one-to-one dialogue mode, which makes the voices of the participants A, B, C, and D even easier to hear.

In the embodiment of the present invention, the sound field characteristic determination unit 23 and the speech synthesis unit 24 are provided in the remote conference server device 15. However, the present invention is not limited to this, and they may instead be provided in the remote conference client devices 11, 12, 13, and 14. In that case, the remote conference server device 15 broadcasts, in parallel to each of the remote conference client devices 11, 12, 13, and 14, the voice data of all participants with the voice of each of the participants A, B, C, and D carried as a separate audio track. Each of the remote conference client devices 11, 12, 13, and 14 then adjusts the distributed voice data of each of the participants A, B, C, and D with the sound field characteristic determination unit 23 and the speech synthesis unit 24, mixes the voice data of the participants A, B, C, and D, and reproduces the result in stereo.
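As a non-limiting illustration of this client-side variation, the following Python sketch takes the separate per-participant tracks received from the server, applies a per-speaker pair of channel gains, and mixes them into one stereo signal. The function name `client_stereo_mix`, the data layout, and the gain values are assumptions for illustration; in practice the gains would come from the sound field characteristics determined on the client.

```python
def client_stereo_mix(tracks, gains):
    """Mix separate mono tracks into a list of (left, right) samples.

    `tracks` maps speaker -> list of float samples; `gains` maps
    speaker -> (left_gain, right_gain). Speakers without an entry in
    `gains` are mixed at unity gain on both channels.
    """
    length = max((len(s) for s in tracks.values()), default=0)
    left = [0.0] * length
    right = [0.0] * length
    for speaker, samples in tracks.items():
        gain_l, gain_r = gains.get(speaker, (1.0, 1.0))
        for i, sample in enumerate(samples):
            left[i] += gain_l * sample
            right[i] += gain_r * sample
    return list(zip(left, right))


# Hypothetical tracks received by participant A's client device.
tracks = {"B": [1.0, 0.5], "D": [0.5, 0.5]}
gains = {"B": (1.0, 0.5), "D": (0.25, 0.25)}  # B to the left, D distant
stereo = client_stereo_mix(tracks, gains)
```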

In the present embodiment, when the one-to-one dialogue mode is set between the first participant A and the third participant C, the seat determination unit 22 moves the position of the first seat 32 and the position of the third seat 34 closer together so that the first participant A and the third participant C are temporarily brought closer in the remote conference room 30. However, the present invention is not limited to this. For example, the one-to-one dialogue mode may be set between the first participant A and the third participant C without moving the position of the first seat 32 and the position of the third seat 34 closer together. That is, the one-to-one dialogue mode may be set without moving the participants' seats in the remote conference room 30 closer to each other. In this way, the participants who have set the one-to-one dialogue mode (the first participant A and the third participant C) can hide the fact that they are using the one-to-one dialogue mode from the participants who have not set it (the second participant B and the fourth participant D).

The embodiment of the present invention has been described above with reference to the drawings. However, the present invention is not limited to the above embodiment and can be implemented in various forms without departing from its gist. For ease of understanding, the drawings schematically show mainly the respective components, and the thickness, length, number, spacing, and the like of each illustrated component differ from the actual ones for convenience in preparing the drawings. The material, shape, dimensions, and the like of each component shown in the above embodiment are examples and are not particularly limiting, and various changes are possible without substantially departing from the configuration of the present invention.

The present invention is suitable for use in a remote conference system or the like for holding remote conferences.

10 Remote conference system
11 First remote conference client device (remote conference client device)
12 Second remote conference client device (remote conference client device)
13 Third remote conference client device (remote conference client device)
14 Fourth remote conference client device (remote conference client device)
15 Remote conference server device
22 Seat determination unit
23 Sound field characteristic determination unit
24 Speech synthesis unit
32 First seat (seat)
33 Second seat (seat)
34 Third seat (seat)
35 Fourth seat (seat)
A First participant (participant)
B Second participant (participant)
C Third participant (participant)
D Fourth participant (participant)

Claims

1. A remote conference system comprising:
a remote conference server device that hosts a remote conference; and
a remote conference client device assigned to each participant participating in the remote conference, wherein
the remote conference server device includes a seat determination unit that determines the seats of the participants in a remote conference room where the remote conference is held,
the remote conference server device or the remote conference client device includes:
a sound field characteristic determination unit that determines sound field characteristics between the seats determined by the seat determination unit; and
a speech synthesis unit that synthesizes audio to be output to the participants,
the sound field characteristic determination unit determines the sound field characteristics of the audio to be reproduced on each of the remote conference client devices, and
the speech synthesis unit synthesizes the audio to be reproduced on each of the remote conference client devices based on the sound field characteristics determined by the sound field characteristic determination unit.

2. The remote conference system according to claim 1, wherein
the remote conference server device includes an utterance count recording unit that records the number of utterances of each of the participants,
the utterance count recording unit calculates the utterance frequency of each of the participants participating in the remote conference based on the number of utterances of each of the participants, and
the seat determination unit adjusts the position of the seat in the remote conference room of a participant with a high utterance frequency, based on the plurality of utterance frequencies calculated by the utterance count recording unit.

3. The remote conference system according to claim 2, wherein the utterance count recording unit calculates the utterance frequency of each of the participating participants based on a simple average of the number of utterances of each of the participants within a certain period of time, or on a weighted average of the number of utterances of each of the participants in which more recent utterances are weighted more heavily.

4. The remote conference system according to claim 2 or 3, wherein the seat determination unit adjusts the positions of the seats so that the distance between the position of the seat in the remote conference room of the participant with the high utterance frequency and the position of the seat in the remote conference room of a participant with a low utterance frequency is increased.

5. The remote conference system according to any one of claims 1 to 4, wherein
the remote conference client device has a one-to-one dialogue mode in which a specific participant and another specific participant among the plurality of participants hold a one-to-one dialogue, and
the remote conference server device distributes audio transmitted from a remote conference client device set to the one-to-one dialogue mode only to the remote conference client devices set to the one-to-one dialogue mode.

6. The remote conference system according to claim 5, wherein the setting of the one-to-one dialogue mode can be canceled only by a remote conference client device set to the one-to-one dialogue mode.

7. The remote conference system according to claim 5 or 6, wherein the seat determination unit
determines the seat positions of the participants in the remote conference room so as to reduce the relative distance in the remote conference room between the specific participant and the other specific participant, and
determines the seat positions of the participants in the remote conference room so as to increase the relative distance between the specific participant and a participant to whom a remote conference client device not in the one-to-one dialogue mode is assigned.

8. The remote conference system according to any one of claims 5 to 7, wherein the remote conference server device or the remote conference client device causes, among the voices of the participants output from a remote conference client device in the one-to-one dialogue mode, the voice of a participant different from the specific participant to be output at a lower volume than the voice of the specific participant.