JP2010219733A - Conference recording device, method of recording conference, and conference recording program

Info

Publication number: JP2010219733A
Application number: JP2009062557A
Authority: JP
Inventors: Kenji Oike; 健二尾池
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 2009-03-16
Filing date: 2009-03-16
Publication date: 2010-09-30

Abstract

PROBLEM TO BE SOLVED: To provide a conference recording device capable of recording a videoconference executed by a videoconference system so that the reactions of other users to the remarks of one user are reproduced in synchronization.

SOLUTION: In the videoconference system 1, a terminal unit 3 stores the moving image data mutually exchanged among the terminal units 3, 4, 5 in association with time data. The time data associated with the moving image data are adjusted by buffering for a recording delay time calculated from the delay time of a network 2. The terminal unit 3 composites the moving image data in time series based on the time data to create conference recording data in which the moving image data generated by the terminal unit 3 are reproduced in synchronization with the moving image data generated by the terminal units 4, 5 at the time they received that moving image data.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a conference recording device, a conference recording method, and a conference recording program for recording a remote conference held between computers provided at a plurality of locations.

Conventionally, there are video conference systems in which conference terminals provided at a plurality of locations where users are present are connected via a network and a remote conference is held using these conference terminals. Such video conference systems provide a conference recording function that records a remote conference so that it can be played back later. For example, a content processing system is known that integrates, records, and manages the moving images captured of a meeting by a plurality of video cameras while synchronizing them on a time axis (see, for example, Patent Document 1).

Patent Document 1: JP-A-2005-260512

However, in a conventional video conference system, when the conference terminals are installed at locations far apart from one another, network delay can cause timing shifts in the conference video that is output in real time. Specifically, in a remote conference, a user at another location may show some reaction (for example, nodding or tilting the head) to a remark made by a user at one location. If network delay occurs in the conference video at that time, the reaction of the user at the other location is output well after the remark of the user at the one location has been output. In that case, even if the remote conference is recorded and played back, the same timing shift is present in the conference video, so it may be impossible to determine which remark a user's reaction in the conference video was a response to.

The present invention has been made to solve the above problem, and its object is to provide a conference recording device, a conference recording method, and a conference recording program capable of recording a video conference executed by a video conference system so that the reaction of another user to the remark of one user is reproduced in synchronization.

To solve the above problem, the conference recording device of the invention according to claim 1 is used in a remote conference system in which conference terminals, each of which acquires the image and audio of a user participating in a remote conference and generates moving image data including the image and audio of the user, are installed at each of a plurality of locations where the users are present, and in which, at each of the conference terminals, the moving image data are exchanged mutually via a network and an image and audio synthesized based on the moving image data are output. The conference recording device creates conference record data recording the remote conference and comprises: moving image data exchange means for transmitting and receiving the moving image data mutually between the conference terminals to exchange them; moving image storage means for storing the moving image data acquired by the moving image data exchange means in association with time information indicating the reproduction timing of the moving image data when the conference record data is reproduced; delay time correction means for adjusting the reproduction timing of the moving image data by correcting the correspondence between the moving image data stored in the moving image storage means and the time information, based on the delay time required for the moving image data acquired by the moving image data exchange means to be transmitted between the conference terminals via the network; conference record data creation means for creating, by synthesizing the moving image data stored in the moving image storage means in time series based on the time information, the conference record data in which one piece of moving image data generated at one conference terminal and another piece of moving image data generated at another conference terminal when the one piece of moving image data was received are reproduced in synchronization; and conference record data output means for outputting the conference record data created by the conference record data creation means.

In the conference recording device of the invention according to claim 2, in addition to the configuration of the invention according to claim 1, the delay time correction means identifies the delay time based on an inquiry signal for measuring the transmission time of the network between the conference terminals, and, based on the delay time, delays the time information associated with the moving image data in the moving image storage means relative to the actual timing at which the moving image data was received.

In the conference recording device of the invention according to claim 3, in addition to the configuration of the invention according to claim 1, the time information is a time stamp indicating the time at which the moving image data was transmitted or received, and the delay time correction means adjusts the reproduction timing of the moving image data based on the time stamp associated with the moving image data stored in the moving image storage means.

The conference recording device of the invention according to claim 4, in addition to the configuration of the invention according to any one of claims 1 to 3, comprises: reaction data exchange means for transmitting and receiving, between the conference terminals via the network, reaction data including a predetermined reaction of the user detected at the conference terminal, to exchange them; and reaction storage means for storing the reaction data acquired by the reaction data exchange means in association with the time information included in the moving image data that was received at the conference terminal when the reaction data was generated. The conference record data creation means creates, by synthesizing the moving image data stored in the moving image storage means and the reaction data stored in the reaction storage means in time series based on the time information, the conference record data in which the predetermined reaction is displayed in synchronization with the moving image data when the conference record data is reproduced.

In the conference recording device of the invention according to claim 5, in addition to the configuration of the invention according to claim 4, each of the conference terminals comprises: sensor means that is worn by the user participating in the remote conference using the conference terminal and that detects, as the predetermined reaction, at least one of a nod, which is an affirmative head movement by the user, and a head tilt, which is a negative head movement by the user; and reaction data generation means for generating, when the predetermined reaction is detected by the sensor means, the reaction data including the predetermined reaction detected by the sensor means.

In the conference recording device of the invention according to claim 6, in addition to the configuration of the invention according to claim 4, each of the conference terminals comprises: imaging means for capturing an image of the user participating in the remote conference using the conference terminal; image analysis means for detecting, as the predetermined reaction, at least one of a nod, which is an affirmative head movement by the user, and a head tilt, which is a negative head movement by the user, by analyzing the image of the user captured by the imaging means; and reaction data generation means for generating, when the predetermined reaction is detected by the image analysis means, the reaction data including the predetermined reaction detected by the image analysis means.

In the conference recording device of the invention according to claim 7, in addition to the configuration of the invention according to any one of claims 4 to 6, each of the conference terminals further comprises volume determination means for analyzing the audio included in the moving image data received by the moving image data exchange means and determining whether the audio is louder than a predetermined volume, and the reaction data generation means generates the reaction data when the volume determination means determines that the audio is louder than the predetermined volume.

In the conference recording device of the invention according to claim 8, in addition to the configuration of the invention according to any one of claims 4 to 7, the conference record data creation means, based on the time information of the reaction data stored in the reaction storage means, performs the synthesis by replacing, among the moving image data stored in the moving image storage means, moving image data that does not include the predetermined reaction with moving image data that includes the predetermined reaction.

The conference recording method of the invention according to claim 9 is used in a remote conference system in which conference terminals, each of which acquires the image and audio of a user participating in a remote conference and generates moving image data including the image and audio of the user, are installed at each of a plurality of locations where the users are present, and in which, at each of the conference terminals, the moving image data are exchanged mutually via a network and an image and audio synthesized based on the moving image data are output. The conference recording method records the remote conference and comprises: a moving image data exchange step of transmitting and receiving the moving image data mutually between the conference terminals to exchange them; a moving image storage step of storing, in moving image storage means, the moving image data acquired in the moving image data exchange step in association with time information indicating the reproduction timing of the moving image data when the conference record data is reproduced; a delay time correction step of adjusting the reproduction timing of the moving image data by correcting the correspondence between the moving image data stored in the moving image storage means and the time information, based on the delay time required for the moving image data acquired in the moving image data exchange step to be transmitted between the conference terminals via the network; a conference record data creation step of creating, by synthesizing the moving image data stored in the moving image storage means in time series based on the time information, the conference record data in which one piece of moving image data generated at one conference terminal and another piece of moving image data generated at another conference terminal when the one piece of moving image data was received are reproduced in synchronization; and a conference record data output step of outputting the conference record data created in the conference record data creation step.

The conference recording program of the invention according to claim 10 is used in a remote conference system in which conference terminals, each of which acquires the image and audio of a user participating in a remote conference and generates moving image data including the image and audio of the user, are installed at each of a plurality of locations where the users are present, and in which, at each of the conference terminals, the moving image data are exchanged mutually via a network and an image and audio synthesized based on the moving image data are output. The conference recording program causes a computer to function as: moving image data exchange means for transmitting and receiving the moving image data mutually between the conference terminals to exchange them; storage execution means for storing, in moving image storage means, the moving image data acquired by the moving image data exchange means in association with time information indicating the reproduction timing of the moving image data when the conference record data is reproduced; delay time correction means for adjusting the reproduction timing of the moving image data by correcting the correspondence between the moving image data stored in the moving image storage means and the time information, based on the delay time required for the moving image data acquired by the moving image data exchange means to be transmitted between the conference terminals via the network; conference record data creation means for creating, by synthesizing the moving image data stored in the moving image storage means in time series based on the time information, the conference record data in which one piece of moving image data generated at one conference terminal and another piece of moving image data generated at another conference terminal when the one piece of moving image data was received are reproduced in synchronization; and conference record data output means for outputting the conference record data created by the conference record data creation means.

In the conference recording device of the invention according to claim 1, the moving image data exchanged mutually between the conference terminals are stored in association with time information indicating the reproduction timing of the moving image data when the conference record data is reproduced. The time information associated with the moving image data is adjusted based on the delay time required for transmission via the network. By synthesizing the moving image data in time series based on the time information, conference record data is created in which one piece of moving image data generated at one conference terminal and another piece of moving image data generated at another conference terminal when the one piece of moving image data was received are reproduced in synchronization. As a result, when the conference record data is reproduced, the reaction of another user to the remark of one user can be output in synchronization.
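
The store, correct, and synthesize flow described above can be illustrated with a minimal Python sketch. It is not the implementation of the embodiment: the `Chunk` type, the `correct` and `compose` helpers, the millisecond unit, and the choice of which stream is shifted and by how much are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from heapq import merge


@dataclass(frozen=True)
class Chunk:
    """Hypothetical unit of stored moving image data."""
    terminal: int   # terminal that generated the chunk (e.g. 3, 4, 5)
    time_ms: int    # time information: reproduction timing in the recording
    payload: str    # stand-in for the encoded image and audio


def correct(chunks, offset_ms):
    """Shift the time information of every chunk by offset_ms (assumed policy)."""
    return [replace(c, time_ms=c.time_ms + offset_ms) for c in chunks]


def compose(*streams):
    """Merge per-terminal streams, each sorted by time, into one time series."""
    return list(merge(*streams, key=lambda c: c.time_ms))


# A remark generated locally at 0 ms and a reaction received at 200 ms.
local = [Chunk(3, 0, "remark"), Chunk(3, 1000, "pause")]
remote = [Chunk(4, 200, "nod")]

# Assuming a 200 ms correction, the remark and the reaction share one timing.
record = compose(correct(local, 200), remote)
```

In this sketch the correction is applied to the locally generated stream; the text above only states that the correspondence between moving image data and time information is corrected based on the delay time, so the direction of the shift here is an illustrative choice.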

In the conference recording device of the invention according to claim 2, the delay time is identified based on an inquiry signal for measuring the transmission time of the network, and, based on that delay time, the time information is delayed relative to the actual timing at which the moving image data was received. As a result, in addition to the effect of the invention according to claim 1, delaying the time information associated with the moving image data based on the network transmission time identified by the inquiry signal allows the reaction of another user to the remark of one user to be synchronized accurately regardless of network delay.
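
As a rough sketch of this idea, the following Python fragment times an inquiry and then delays a piece of time information by the measured value. The function names, the blocking `send_inquiry` callable, and the use of the full round-trip time as the delay are assumptions for illustration; the text above does not specify how the delay time is derived from the inquiry signal.

```python
import time


def measure_delay_ms(send_inquiry, clock=time.monotonic):
    """Time one inquiry; send_inquiry is assumed to block until the reply arrives."""
    start = clock()
    send_inquiry()
    return (clock() - start) * 1000.0


def delayed_time_info(received_ms, delay_ms):
    """Delay the time information relative to the actual reception timing."""
    return received_ms + delay_ms


# Usage with a fake clock so the example is deterministic.
ticks = iter([10.0, 10.25])
delay = measure_delay_ms(lambda: None, clock=lambda: next(ticks))
adjusted = delayed_time_info(5000.0, delay)
```

Passing the clock as a parameter keeps the measurement testable without a real network; in a real terminal `send_inquiry` would wrap whatever request and reply exchange the system uses.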

In the conference recording device of the invention according to claim 3, the reproduction timing of the moving image data is adjusted based on a time stamp indicating the time at which the moving image data was transmitted or received. As a result, in addition to the effect of the invention according to claim 1, because the time at which the moving image data was transmitted or received is identified from the time stamp, the reaction of another user to the remark of one user can be synchronized accurately regardless of network delay.

In the conference recording device of the invention according to claim 4, conference record data in which a predetermined reaction is displayed in synchronization with the moving image data at reproduction is created based on reaction data including the predetermined reaction of the user detected at the conference terminal. As a result, in addition to the effect of the invention according to any one of claims 1 to 3, the reaction of another user to the remark of one user can be identified accurately when the conference record data is reproduced.
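
One way to picture reaction data that share time information with the moving image data is a lookup of the reactions that fall within the span of a given piece of moving image data. The sketch below assumes reactions are stored as `(time_ms, terminal, kind)` tuples sorted by time; the tuple layout and the function name are hypothetical.

```python
from bisect import bisect_left


def reactions_during(reactions, start_ms, end_ms):
    """Return reactions whose time information lies in [start_ms, end_ms).

    `reactions` must be sorted by time and holds (time_ms, terminal, kind).
    """
    times = [r[0] for r in reactions]
    lo = bisect_left(times, start_ms)
    hi = bisect_left(times, end_ms)
    return reactions[lo:hi]


stored = [(200, 4, "nod"), (900, 5, "head tilt"), (1500, 4, "nod")]

# Reactions to overlay while the first second of video is reproduced.
first_second = reactions_during(stored, 0, 1000)
```

Because the reactions are keyed by the same time information as the video, the player only needs the time span of the chunk being reproduced to decide which reactions to display.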

In the conference recording device of the invention according to claim 5, at each conference terminal, at least one of a nod and a head tilt is detected as the predetermined reaction by the sensor means worn by the user, and reaction data is generated when the predetermined reaction is detected. As a result, in addition to the effect of the invention according to claim 4, whether the user's reaction is affirmative or negative can be displayed when the conference record data is reproduced.

In the conference recording device of the invention according to claim 6, at each conference terminal, at least one of a nod and a head tilt is detected as the predetermined reaction by image analysis, and reaction data is generated when the predetermined reaction is detected. As a result, in addition to the effect of the invention according to claim 4, whether the user's reaction is affirmative or negative can be displayed when the conference record data is reproduced.

In the conference recording device of the invention according to claim 7, the audio included in the moving image data received by each conference terminal is analyzed, and reaction data is generated when that audio is louder than a predetermined volume. As a result, in addition to the effect of the invention according to any one of claims 4 to 6, the problem of reaction data being generated for another user even though no user is making a remark can be prevented.
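
The volume condition can be sketched as a simple gate in Python. The RMS loudness measure, the normalized sample values, the threshold, and the dictionary shape of the reaction data are all assumptions; the text above only requires a comparison against a predetermined volume.

```python
import math


def rms(samples):
    """Root-mean-square level of a block of audio samples (assumed measure)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def maybe_make_reaction(reaction, received_samples, time_ms, threshold):
    """Generate reaction data only while the received audio exceeds the threshold."""
    if reaction is None or rms(received_samples) <= threshold:
        return None
    return {"reaction": reaction, "time_ms": time_ms}


speech = [0.5, -0.5, 0.5, -0.5]   # someone is speaking
silence = [0.0, 0.0, 0.0, 0.0]    # nobody is speaking

kept = maybe_make_reaction("nod", speech, 200, threshold=0.1)
dropped = maybe_make_reaction("nod", silence, 200, threshold=0.1)
```

A nod detected during silence is discarded, which matches the stated aim of not recording reactions when no remark is being made.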

In the conference recording device of the invention according to claim 8, when the conference record data is created, moving image data that does not include the predetermined reaction is replaced, based on the reaction data, with moving image data that includes the predetermined reaction, and the result is synthesized. As a result, in addition to the effect of the invention according to any one of claims 4 to 7, the reaction of another user to the remark of one user can be synchronized accurately merely by replacing part of the moving image data in the conference record data.

In the conference recording method of the invention according to claim 9, the moving image data exchanged mutually between the conference terminals are stored in association with time information indicating the reproduction timing of the moving image data when the conference record data is reproduced. The time information associated with the moving image data is adjusted based on the delay time required for transmission via the network. By synthesizing the moving image data in time series based on the time information, conference record data is created in which one piece of moving image data generated at one conference terminal and another piece of moving image data generated at another conference terminal when the one piece of moving image data was received are reproduced in synchronization. As a result, when the conference record data is reproduced, the reaction of another user to the remark of one user can be output in synchronization.

In the conference recording program of the invention according to claim 10, the moving image data exchanged mutually between the conference terminals are stored in association with time information indicating the reproduction timing of the moving image data when the conference record data is reproduced. The time information associated with the moving image data is adjusted based on the delay time required for transmission via the network. By synthesizing the moving image data in time series based on the time information, conference record data is created in which one piece of moving image data generated at one conference terminal and another piece of moving image data generated at another conference terminal when the one piece of moving image data was received are reproduced in synchronization. As a result, when the conference record data is reproduced, the reaction of another user to the remark of one user can be output in synchronization.

A diagram showing the overall configuration of the video conference system 1 according to the first embodiment.
A block diagram showing the electrical configuration of the terminal device 3.
A diagram showing the memory configuration of the HDD 31.
A flowchart of the delay calculation process executed by the terminal device 3.
A diagram showing, in time series, the data exchanged among the terminal devices 3, 4, and 5.
A diagram showing a specific example of the video conference screen 280.
A flowchart of the conference recording process executed by the terminal device 3.
A diagram showing a specific example of the conference playback screen 290.
A diagram showing the overall configuration of the video conference system 1 according to the second embodiment.
A diagram showing the overall configuration of the MUC 7.
A diagram showing the memory configuration of the HDD 51.
A diagram showing, in time series, the data exchanged among the terminal devices 3, 4, and 5 and the MUC 7.
A flowchart of the reaction data transmission process executed by the terminal device 3.
A flowchart of the conference recording process executed by the MUC 7.
A flowchart of the reaction display processing executed by the MUC 7.
A flowchart of the reaction display addition process executed by the MUC 7.
A diagram showing a specific example of the conference playback screen 300.
A flowchart of the delay calculation process according to a modification.
A block diagram showing the electrical configuration of the terminal device 3 according to a modification.
A flowchart of the reaction data transmission process according to a modification.

Hereinafter, embodiments of the present invention will be described with reference to the drawings. The drawings referred to are used to explain technical features that the present invention may adopt; the device configurations, the flowcharts of the various processes, and the like shown in them are merely illustrative examples and are not intended to limit the invention to them.

<First Embodiment>
The overall configuration of the video conference system 1 according to the first embodiment will be described with reference to FIG. 1. As shown in FIG. 1, in the video conference system 1 according to the first embodiment, a plurality of terminal devices, each provided at one of a plurality of locations where users are present, are connected to a network 2. Although FIG. 1 shows a case in which three terminal devices 3, 4, and 5 are provided in the video conference system 1, the video conference system 1 only needs to include a plurality of terminal devices.

In the video conference system 1 according to the first embodiment, the video and audio acquired by the terminal devices 3, 4, and 5 provided at the respective locations are transmitted to and received from one another via the network 2. Each of the terminal devices 3, 4, and 5 creates and outputs conference data in which the video and audio acquired at the respective locations are synthesized. As a result, at each of the terminal devices 3, 4, and 5, the video and audio of the users at the respective locations are synthesized and output in real time, and a remote conference (here, a video conference) is held. In addition, each of the terminal devices 3, 4, and 5 records the video conference executed by the video conference system 1 as conference record data that can be reproduced after the video conference ends. The following description takes as an example the case in which the video conference executed by the video conference system 1 is recorded by the terminal device 3.

図２および図３を参照して、端末装置３の電気的構成について説明する。なお、端末装置３，４，５は全て同様の構成であるため、ここでは端末装置３の構成についてのみ説明し、他の端末装置４，５については説明を省略する。 With reference to FIG. 2 and FIG. 3, the electrical configuration of the terminal device 3 will be described. Since all the terminal devices 3, 4, and 5 have the same configuration, only the configuration of the terminal device 3 will be described here, and the description of the other terminal devices 4, 5 will be omitted.

端末装置３には、端末装置３の制御を司るコントローラとしてのＣＰＵ２０が設けられている。ＣＰＵ２０には、ＢＩＯＳ等を記憶したＲＯＭ２１と、各種データを一時的に記憶するＲＡＭ２２と、データの受け渡しの仲介を行うＩ／Ｏインタフェイス３０とが接続されている。Ｉ／Ｏインタフェイス３０には、各種記憶エリアを有するハードディスクドライブ３１（以下、ＨＤＤ３１）が接続されている。 The terminal device 3 is provided with a CPU 20 as a controller that controls the terminal device 3. Connected to the CPU 20 are a ROM 21 that stores BIOS, a RAM 22 that temporarily stores various data, and an I / O interface 30 that mediates data transfer. The I / O interface 30 is connected to a hard disk drive 31 (hereinafter referred to as HDD 31) having various storage areas.

Ｉ／Ｏインタフェイス３０には、ネットワーク２に通信接続するための通信装置２５と、マウス２７と、ビデオコントローラ２３と、キーコントローラ２４と、ユーザを撮影するためのカメラ３４と、ユーザの音声を取り込むためのマイク３５と、音声を出力するためのスピーカ３６と、ＣＤ−ＲＯＭドライブ２６とが各々接続されている。ビデオコントローラ２３には、ディスプレイ２８が接続されている。キーコントローラ２４には、キーボード２９が接続されている。 The I / O interface 30 includes a communication device 25 for communication connection to the network 2, a mouse 27, a video controller 23, a key controller 24, a camera 34 for photographing a user, and a user's voice. A microphone 35 for capturing, a speaker 36 for outputting sound, and a CD-ROM drive 26 are connected to each other. A display 28 is connected to the video controller 23. A keyboard 29 is connected to the key controller 24.

図３に示すように、ＨＤＤ３１には、各端末装置３，４，５で生成された動画データを記憶する動画データ記憶エリア３１ａと、後述の記録遅延時間を記憶する記録遅延時間記憶エリア３１ｂと、テレビ会議を記録した会議記録データを記憶する会議記録データ記憶エリア３１ｃと、端末装置３にて実行される各種プログラムを記憶するプログラム記憶エリア３１ｄと、その他の情報記憶エリア３１ｅとが設けられている。会議記録データ記憶エリア３１ｃに記憶される会議記録データは、動画データ記憶エリア３１ａに記憶される複数の動画データに基づいて作成されるが、詳細は後述する。 As shown in FIG. 3, the HDD 31 is provided with a moving image data storage area 31a for storing moving image data generated by the terminal devices 3, 4, and 5, a recording delay time storage area 31b for storing a recording delay time described later, a conference record data storage area 31c for storing conference record data in which a video conference is recorded, a program storage area 31d for storing various programs executed by the terminal device 3, and another information storage area 31e. The conference record data stored in the conference record data storage area 31c is created based on a plurality of moving image data stored in the moving image data storage area 31a, as described in detail later.

なお、プログラム記憶エリア３１ｄには、テレビ会議を実行するための会議実行プログラムや、テレビ会議を記録するための会議記録プログラムが記憶されている。ＣＤ−ＲＯＭドライブ２６に挿入されるＣＤ−ＲＯＭ１１４には、上記の会議実行プログラムおよび会議記録プログラムが記憶されている。端末装置３では、ＣＤ−ＲＯＭドライブ２６からＣＤ−ＲＯＭ１１４を読み込ませることで、これらのプログラムやデータをＨＤＤ３１にセットアップしてプログラム記憶エリア３１ｄに格納することができる。 The program storage area 31d stores a conference execution program for executing a video conference and a conference recording program for recording the video conference. The CD-ROM 114 inserted into the CD-ROM drive 26 stores the above conference execution program and conference recording program. In the terminal device 3, by reading the CD-ROM 114 from the CD-ROM drive 26, these programs and data can be set up in the HDD 31 and stored in the program storage area 31d.

図４〜図８を参照して、第１の実施形態に係るテレビ会議システム１における、テレビ会議の記録に関する処理について説明する。以下に説明する各種処理は、ＨＤＤ３１に記憶されている会議記録プログラムに基づいて、ＣＰＵ２０によって実行される。ここでは端末装置３にて実行される処理を説明するが、他の端末装置４，５にてテレビ会議を記録する場合も同様である。 With reference to FIGS. 4 to 8, the processing related to recording a video conference in the video conference system 1 according to the first embodiment will be described. The various processes described below are executed by the CPU 20 based on the conference recording program stored in the HDD 31. The processing executed by the terminal device 3 is described here, but the same applies when the video conference is recorded by the other terminal devices 4 and 5.

図４を参照して、端末装置３にて各拠点間のネットワーク遅延を計測する遅延算出処理について説明する。なお、遅延算出処理（図４）は、テレビ会議システム１にてテレビ会議が開始される前に、任意のタイミングで実行されればよい。第１の実施形態に係る端末装置３では、遅延算出処理（図４）がテレビ会議の開始が指示されたタイミングで実行され、遅延算出処理（図４）が終了したのちに各端末装置３，４，５間でテレビ会議が開始されるものとする。 With reference to FIG. 4, the delay calculation process in which the terminal device 3 measures the network delay to each site will be described. The delay calculation process (FIG. 4) may be executed at any time before a video conference is started in the video conference system 1. In the terminal device 3 according to the first embodiment, the delay calculation process (FIG. 4) is assumed to be executed when the start of a video conference is instructed, and the video conference among the terminal devices 3, 4, and 5 is assumed to start after the delay calculation process (FIG. 4) ends.

図４に示すように、遅延算出処理では、まず自拠点を基準として、各拠点との遅延時間が計測される（Ｓ１）。具体的には、自拠点の端末装置３と各拠点の端末装置３，４，５との間における、ネットワーク２を経由するデータの伝送時間（つまり、ネットワーク遅延）が、他拠点ごとに算出される。各拠点間のネットワーク遅延は公知の手法で計測されればよいが、例えばＰＩＮＧ（ＰａｃｋｅｔＩＮｔｅｒｎｅｔＧｒｏｐｅｒ）を用いてネットワーク遅延が算出されるものとする。 As shown in FIG. 4, in the delay calculation process, the delay time to each site is first measured with the local site as the reference (S1). Specifically, the data transmission time via the network 2 (that is, the network delay) between the terminal device 3 at the local site and the terminal devices 3, 4, and 5 at the respective sites is calculated for each other site. The network delay between sites may be measured by any known method; here it is assumed to be calculated using, for example, PING (Packet InterNet Groper).

具体的には、図５に示すように、端末装置３から時刻ｔ１のタイミングで送信されたＰＩＮＧが、端末装置４に時刻ｔ２のタイミングで到達し、端末装置４からの応答が時刻ｔ３のタイミングで端末装置３に到達したものとする。この場合、時刻ｔ３から時刻ｔ１を減じた時間差が、拠点１を基準とした拠点２との遅延時間Δｔ１２として算出される。また、端末装置３から時刻ｔ１のタイミングで送信されたＰＩＮＧが、端末装置５に時刻ｔ４のタイミングで到達し、端末装置５からの応答が時刻ｔ５のタイミングで端末装置３に到達したものとする。この場合、時刻ｔ５から時刻ｔ１を減じた時間差が、拠点１を基準とした拠点３との遅延時間Δｔ１３として算出される。なお、拠点１は自拠点であるため、拠点１との遅延時間Δｔ１１は「０」とされる。 Specifically, as shown in FIG. 5, suppose that a PING transmitted from the terminal device 3 at time t1 reaches the terminal device 4 at time t2, and that the response from the terminal device 4 reaches the terminal device 3 at time t3. In this case, the time difference obtained by subtracting time t1 from time t3 is calculated as the delay time Δt12 to site 2 with site 1 as the reference. Likewise, suppose that a PING transmitted from the terminal device 3 at time t1 reaches the terminal device 5 at time t4, and that the response from the terminal device 5 reaches the terminal device 3 at time t5. In this case, the time difference obtained by subtracting time t1 from time t5 is calculated as the delay time Δt13 to site 3 with site 1 as the reference. Since site 1 is the local site, the delay time Δt11 to site 1 is set to "0".

Ｓ１で算出された各拠点との遅延時間に基づいて、拠点毎の記録遅延時間が算出される（Ｓ３）。記録遅延時間は、後述する会議記録データにおける、各拠点で生成された動画データの記録位置を調整するためのデータである。Ｓ３では、Ｓ１で取得された遅延時間のうちで最長の遅延時間から拠点毎の遅延時間をそれぞれ減じることで、拠点毎の記録遅延時間が算出される。 Based on the delay time with each base calculated in S1, the recording delay time for each base is calculated (S3). The recording delay time is data for adjusting the recording position of the moving image data generated at each site in the meeting recording data described later. In S3, the recording delay time for each site is calculated by subtracting the delay time for each site from the longest delay time among the delay times acquired in S1.

具体的には、図５の例では、各拠点１〜３との遅延時間Δｔ１１，Δｔ１２，Δｔ１３のうちで最長の遅延時間は、拠点３との遅延時間Δｔ１３である。そのため、拠点２の記録遅延時間は「Δｔ１３―Δｔ１２」となる。拠点３の記録遅延時間は「Δｔ１３―Δｔ１３」、すなわち「０」となる。拠点１の記録遅延時間は「Δｔ１３―Δｔ１１」、すなわち「Δ１３」となる。Ｓ３で算出された拠点毎の記録遅延時間は、ＨＤＤ３１の記録遅延時間記憶エリア３１ｂに記憶される。 Specifically, in the example of FIG. 5, the longest of the delay times Δt11, Δt12, and Δt13 to sites 1 to 3 is the delay time Δt13 to site 3. The recording delay time of site 2 is therefore "Δt13 − Δt12". The recording delay time of site 3 is "Δt13 − Δt13", that is, "0". The recording delay time of site 1 is "Δt13 − Δt11", that is, "Δt13". The recording delay time calculated for each site in S3 is stored in the recording delay time storage area 31b of the HDD 31.
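
The S3 calculation can be illustrated with a minimal Python sketch. The function name `recording_delays`, the site labels, and the millisecond values are assumptions made for illustration only; the description gives no concrete figures.

```python
def recording_delays(delays):
    """S3: recording delay = longest measured delay - the site's own delay."""
    longest = max(delays.values())
    return {site: longest - delay for site, delay in delays.items()}


# Hypothetical delays in milliseconds measured from site 1 in S1
# (delta t11 = 0 for the local site, delta t12 = 80, delta t13 = 200).
measured = {"site1": 0, "site2": 80, "site3": 200}
print(recording_delays(measured))
# {'site1': 200, 'site2': 120, 'site3': 0}
```

With these example values the local site receives the longest recording delay and the most distant site receives none, matching the pattern stated for FIG. 5.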

全拠点について記録遅延時間が算出および記憶されると、遅延算出処理（図４）が終了する。そして、各拠点間（つまり、端末装置３，４，５）で、ネットワーク２を介してテレビ会議が実行される。端末装置３では、ＨＤＤ３１に記憶されている会議実行プログラムに基づいて、ＣＰＵ２０によって公知の手法でテレビ会議が実行される。例えば、端末装置３では、カメラ３４にて撮像される規定時間単位（例えば、１秒毎）の映像と、マイク３５にて取得される規定時間単位（例えば、１秒毎）の音声とが、拠点１の動画データとして生成される。拠点１の動画データは、ネットワーク２を介して他拠点に送信される。同様に端末装置４，５でも、それぞれ拠点２，３の動画データが生成されて、ネットワーク２を介して他拠点に送信される。端末装置３では、自拠点１で生成した動画データと、他拠点２，３から受信した動画データとを合成した会議データが作成されて、その会議データがディスプレイ２８およびスピーカ３６から出力される。同様に端末装置４，５でも、それぞれ自拠点で生成した動画データと、他拠点から受信した動画データとを合成した会議データが作成および出力される。 When the recording delay time has been calculated and stored for all sites, the delay calculation process (FIG. 4) ends. A video conference is then held among the sites (that is, among the terminal devices 3, 4, and 5) via the network 2. In the terminal device 3, the CPU 20 executes the video conference by a known method based on the conference execution program stored in the HDD 31. For example, in the terminal device 3, video captured by the camera 34 in a specified time unit (for example, every second) and audio acquired by the microphone 35 in a specified time unit (for example, every second) are generated as the moving image data of site 1. The moving image data of site 1 is transmitted to the other sites via the network 2. Similarly, the terminal devices 4 and 5 generate the moving image data of sites 2 and 3, respectively, and transmit it to the other sites via the network 2. The terminal device 3 creates conference data by combining the moving image data generated at the local site 1 with the moving image data received from the other sites 2 and 3, and outputs the conference data from the display 28 and the speaker 36. Similarly, the terminal devices 4 and 5 each create and output conference data by combining the moving image data generated at their own site with the moving image data received from the other sites.

例えば、テレビ会議システム１で実行されるテレビ会議では、拠点１の端末装置３にて作成された会議データに基づいて、図６に示すようなテレビ会議画面２８０がディスプレイ２８に表示される。テレビ会議画面２８０（図６）には、自拠点１のユーザの映像が表示されるユーザ表示領域２８１のほか、他拠点２，３のユーザの映像がユーザ表示領域２８２，２８３にそれぞれ表示される。また、テレビ会議画面２８０に同期して、スピーカ３６から各拠点１〜３のユーザの音声が出力される。同様に拠点２，３でも、端末装置４，５にて作成された会議データに基づいて、各拠点１〜３のユーザの映像および音声が合成出力される。 For example, in a video conference executed by the video conference system 1, a video conference screen 280 as shown in FIG. 6 is displayed on the display 28 based on the conference data created by the terminal device 3 at site 1. On the video conference screen 280 (FIG. 6), the video of the user at the local site 1 is displayed in the user display area 281, and the videos of the users at the other sites 2 and 3 are displayed in the user display areas 282 and 283, respectively. In addition, the voices of the users at sites 1 to 3 are output from the speaker 36 in synchronization with the video conference screen 280. Similarly, at sites 2 and 3, the video and audio of the users at sites 1 to 3 are combined and output based on the conference data created by the terminal devices 4 and 5.

ただし、各端末装置３，４，５で作成される会議データは、自拠点で生成された動画データと、他拠点から受信した動画データとがリアルタイムに合成されたものである。一方、自拠点の動画データが送信された時点から起算して、自拠点の動画データが他拠点に到達し、さらにその到達時に生成された他拠点の動画データが自拠点に到達するまでには、先述したようにネットワーク２での伝送時間分を要する。そのため、実際のテレビ会議では、例えば自拠点のユーザの発言に対する他拠点のユーザの反応が、他拠点においてユーザが現実に反応を示した時点よりも遅れてテレビ会議に出力される。 However, the conference data created by each of the terminal devices 3, 4, and 5 is a composite of the moving image data generated at its own site and the moving image data received from another site in real time. On the other hand, from the time when the video data of the local site is transmitted, until the video data of the local site reaches the other site, and the video data of the other site generated at that time reaches the local site. As described above, the transmission time in the network 2 is required. For this reason, in an actual video conference, for example, a user's reaction at another site to a user's speech at the local site is output to the video conference after a time when the user actually reacted at the other site.

図６に示す例では、上記のテレビ会議の実行時において、自拠点１のユーザが行った「私の意見に賛成ですか？」という発言に対して、他拠点２，３のユーザの反応がネットワーク遅延のために出力されていない。この拠点１のユーザの発言に対して、拠点２のユーザの反応が出力されるのは先述の遅延時間Δ１２が経過した時点となり、拠点３のユーザの反応が出力されるのは先述の遅延時間Δ１３が経過した時点となる。このようにテレビ会議では、各拠点のユーザの映像および音声がリアルタイムに合成出力されるものの、実際には各拠点とのネットワーク遅延によって出力タイミングにズレが生じることがある。そうすると、テレビ会議をそのまま記録して会議記録データを作成した場合に、その会議記録データを再生しても自拠点のユーザに発言に対する他拠点のユーザの反応が分かりにくいままである。第１の実施形態では、テレビ会議を記録する端末装置３にて以下の処理を実行することで、会議記録データの再生時に自拠点のユーザの発言と他拠点のユーザの反応とを同期して出力可能にしている。 In the example shown in FIG. 6, during the video conference the user at the local site 1 has said "Do you agree with my opinion?", but the reactions of the users at the other sites 2 and 3 have not yet been output because of the network delay. The reaction of the user at site 2 to this remark is output once the delay time Δt12 described above has elapsed, and the reaction of the user at site 3 is output once the delay time Δt13 described above has elapsed. Thus, although the video and audio of the users at the sites are combined and output in real time in a video conference, the output timing may in practice be shifted by the network delay to each site. Consequently, if the video conference is recorded as is to create conference record data, the reactions of the users at the other sites to a remark by the user at the local site remain hard to follow even when the conference record data is played back. In the first embodiment, the terminal device 3 that records the video conference executes the following processing so that, when the conference record data is played back, the remarks of the user at the local site and the reactions of the users at the other sites can be output in synchronization.

図７を参照して、端末装置３にてテレビ会議を記録するための会議記録処理について説明する。会議記録処理（図７）は、拠点１（端末装置３）のユーザが参加するテレビ会議が開始されると、ＣＰＵ２０によって開始実行される。 With reference to FIG. 7, the conference recording process by which the terminal device 3 records a video conference will be described. The conference recording process (FIG. 7) is started by the CPU 20 when a video conference in which the user at site 1 (terminal device 3) participates is started.

図７に示すように、会議記録処理では、まず各拠点から動画データが取得される（Ｓ１１）。Ｓ１１では、テレビ会議の実行中に自拠点で生成された動画データや他拠点から受信した動画データが、ＲＡＭ２２に一時的に記憶される。なお、動画データには、その動画データが生成された拠点（生成元拠点）にて取得された規定時間単位の映像および音声のほか、その生成元拠点を示す識別データが含まれている。 As shown in FIG. 7, in the conference recording process, moving image data is first acquired from each site (S11). In S11, the moving image data generated at the local site during the video conference and the moving image data received from the other sites are temporarily stored in the RAM 22. The moving image data includes, in addition to the video and audio in the specified time unit acquired at the site where the moving image data was generated (the generation source site), identification data indicating that generation source site.

ＲＡＭ２２に一時記憶された動画データは、拠点毎の記録遅延時間に応じてバッファリングされる（Ｓ１３）。Ｓ１３では、Ｓ１１で取得された動画データに示される生成元拠点を参照して、記録遅延時間記憶エリア３１ｂから生成元拠点の記録遅延時間が取得される。そして、ＲＡＭ２２に一時記憶された動画データが生成元拠点の記録遅延時間分のバッファリングが行われる。記録遅延時間分のバッファリングは、公知の遅延回路や遅延プログラムなどによって実行されればよい。 The moving image data temporarily stored in the RAM 22 is buffered according to the recording delay time for each site (S13). In S13, the recording delay time of the generation source base is acquired from the recording delay time storage area 31b with reference to the generation base indicated in the moving image data acquired in S11. The moving image data temporarily stored in the RAM 22 is buffered for the recording delay time of the generation base. The buffering for the recording delay time may be executed by a known delay circuit or delay program.

記録遅延時間分のバッファリングが行われたのち、ＲＡＭ２２に一時記憶された動画データがＨＤＤ３１の動画データ記憶エリア３１ａに記録される（Ｓ１５）。動画データ記憶エリア３１ａでは、バッファリングが終了した時点の現実の時刻を示す時刻データと対応付けて、動画データが記憶される。なお、動画データ記憶エリア３１ａに動画データが保存されると、ＲＡＭ２２に一時記憶された動画データは削除される。 After buffering for the recording delay time, the moving image data temporarily stored in the RAM 22 is recorded in the moving image data storage area 31a of the HDD 31 (S15). In the moving image data storage area 31a, moving image data is stored in association with time data indicating the actual time when the buffering is finished. When the moving image data is saved in the moving image data storage area 31a, the moving image data temporarily stored in the RAM 22 is deleted.
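
Steps S11 to S15 can be sketched in Python as follows. This is only an illustration under stated assumptions: the class `DelayBuffer`, its method names, the record layout, and the explicit `now` argument (a simulated clock in milliseconds) are hypothetical, and the description leaves the actual buffering to "a known delay circuit or delay program".

```python
import heapq
import itertools


class DelayBuffer:
    """Holds each chunk for the recording delay of its generation source site."""

    def __init__(self, recording_delays):
        self.recording_delays = recording_delays  # stands in for area 31b
        self._pending = []                        # stands in for RAM 22
        self._seq = itertools.count()             # tie-breaker for equal times
        self.stored = []                          # stands in for area 31a

    def acquire(self, site, chunk, now):
        """S11/S13: start buffering a chunk acquired at time `now`."""
        release = now + self.recording_delays[site]
        heapq.heappush(self._pending, (release, next(self._seq), site, chunk))

    def flush(self, now):
        """S15: store chunks whose buffering has ended, stamped with that end time."""
        while self._pending and self._pending[0][0] <= now:
            release, _, site, chunk = heapq.heappop(self._pending)
            self.stored.append({"time": release, "site": site, "chunk": chunk})


buf = DelayBuffer({"site1": 200, "site2": 120, "site3": 0})
buf.acquire("site1", "question", now=1000)  # generated locally at t = 1000
buf.acquire("site3", "reaction", now=1200)  # received from site 3 at t = 1200
buf.flush(now=1200)
print([(r["site"], r["time"]) for r in buf.stored])
# [('site1', 1200), ('site3', 1200)]
```

In this example the local question and the remote reaction end up with the same time data even though they were acquired 200 ms apart.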

そして、テレビ会議が終了されたか否かが判断される（Ｓ１７）。例えば端末装置３にてテレビ会議の終了指示がなされた場合、（Ｓ１７：ＹＥＳ）、動画データ記憶エリア３１ａに記憶されている複数の動画データを合成して、そのテレビ会議を記録した会議記録データが作成される（Ｓ１９）。一方、テレビ会議の終了指示がない場合（Ｓ１７：ＮＯ）、引き続き動画データの取得、バッファリング、記録が実行される（Ｓ１１〜Ｓ１５）。 It is then determined whether the video conference has ended (S17). For example, when an instruction to end the video conference is given at the terminal device 3 (S17: YES), the plurality of moving image data stored in the moving image data storage area 31a are combined to create conference record data in which the video conference is recorded (S19). On the other hand, when there is no instruction to end the video conference (S17: NO), the acquisition, buffering, and recording of moving image data continue (S11 to S15).

Ｓ１９では、動画データ記憶エリア３１ａに記憶されている複数の動画データを、各動画データに対応付けられている時刻データに基づいて時系列に合成することによって、テレビ会議が記録された会議記録データが作成される。そのため、会議記録データの再生時には、各動画データに対応付けられている時刻データが一致する動画データが同タイミングで（つまり、同期して）出力される。Ｓ１９で作成された会議記録データが、ＨＤＤ３１の会議記録データ記憶エリア３１ｃに保存されると（Ｓ２１）、会議記録処理（図７）が終了する。 In S19, conference record data in which the video conference is recorded is created by combining the plurality of moving image data stored in the moving image data storage area 31a in time series, based on the time data associated with each moving image data. Therefore, when the conference record data is played back, moving image data whose associated time data match are output at the same timing (that is, in synchronization). When the conference record data created in S19 has been saved in the conference record data storage area 31c of the HDD 31 (S21), the conference recording process (FIG. 7) ends.
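
The time-series compositing of S19 can be pictured with a short Python sketch. The function `composite` and the record layout (dictionaries with `time`, `site`, and `chunk` keys) are assumptions for illustration; the description does not specify how the moving image data are actually merged.

```python
from collections import defaultdict


def composite(records):
    """S19: group stored chunks by time data and order the groups chronologically."""
    frames = defaultdict(dict)
    for rec in records:
        frames[rec["time"]][rec["site"]] = rec["chunk"]
    return [(t, frames[t]) for t in sorted(frames)]


# Hypothetical contents of the moving image data storage area.
stored = [
    {"time": 1200, "site": "site3", "chunk": "tilts head"},
    {"time": 1200, "site": "site1", "chunk": "question"},
    {"time": 1200, "site": "site2", "chunk": "nods"},
    {"time": 2200, "site": "site1", "chunk": "next remark"},
]
for t, frame in composite(stored):
    print(t, sorted(frame))
# 1200 ['site1', 'site2', 'site3']
# 2200 ['site1']
```

Chunks sharing the same time data fall into one frame, which is what allows them to be output at the same timing during playback.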

より具体的には、会議記録処理（図７）のＳ１３では、各拠点１〜３にて生成された動画データのうち、自拠点１から取得された動画データに「Δｔ１３」秒のバッファリングが実行される。他拠点２から取得された動画データに「Δｔ１３―Δｔ１２」秒のバッファリングが実行される。他拠点３から取得された動画データに「０」秒のバッファリングが実行される。つまり、ネットワーク遅延の少ない拠点（例えば、自拠点１）で生成された動画データほど、Ｓ１３でのバッファリング時間が長くなる。そのため、Ｓ１５ではネットワーク遅延の少ない拠点で生成された動画データほど、実際に生成されたタイミングを基準として遅延幅が大きい時刻データが対応付けられる。一方、ネットワーク遅延の大きい拠点（例えば、他拠点３）で生成された動画データほど、Ｓ１３でのバッファリング時間が短くなる。そのため、Ｓ１５ではネットワーク遅延の大きい拠点で生成された動画データほど、実際に生成されたタイミングを基準として遅延幅が小さい時刻データが対応付けられる。その結果、動画データ記憶エリア３１ａでは、ネットワーク遅延の最も大きい他拠点で生成された動画データが自拠点で受信された時刻を基準として、自拠点から送信された動画データと、その動画データが到達したときに他拠点で生成された動画データとに、同一の時刻データが付与される。 More specifically, in S13 of the conference recording process (FIG. 7), among the moving image data generated at each of the sites 1 to 3, the moving image data acquired from the own site 1 is buffered for “Δt13” seconds. Executed. Buffering of “Δt13−Δt12” seconds is performed on the moving image data acquired from the other site 2. Buffering of “0” seconds is performed on the moving image data acquired from the other base 3. That is, the buffering time in S13 becomes longer as the moving image data is generated at a base with less network delay (for example, own base 1). For this reason, in S15, time data having a larger delay width is associated with the moving image data generated at the base having a smaller network delay with reference to the actually generated timing. On the other hand, the buffering time in S13 becomes shorter as the moving image data is generated at a base with a large network delay (for example, another base 3). Therefore, in S15, time data having a smaller delay width is associated with the moving image data generated at the base having the larger network delay with reference to the actually generated timing. 
As a result, in the moving image data storage area 31a, the same time data is assigned to the moving image data transmitted from the local site and to the moving image data generated at each other site at the moment that transmitted data arrived there, with the time at which the local site receives the moving image data generated at the other site with the largest network delay serving as the reference.
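
The alignment can be checked with a short worked derivation. It assumes, beyond what the description states, that a remark is generated at site 1 at time $t$, that each remote reaction is generated at the instant the remark arrives, and that the round trip of the moving image data equals the delay measured in S1. The symbols $T_1$, $T_2$, and $T_3$ (the time data stored for each site) are introduced here for illustration.

```latex
\begin{align*}
T_1 &= t + (\Delta t_{13} - \Delta t_{11}) = t + \Delta t_{13}
      && \text{(local data, buffered for } \Delta t_{13}\text{)} \\
T_2 &= (t + \Delta t_{12}) + (\Delta t_{13} - \Delta t_{12}) = t + \Delta t_{13}
      && \text{(received after } \Delta t_{12}\text{, then buffered)} \\
T_3 &= (t + \Delta t_{13}) + 0 = t + \Delta t_{13}
      && \text{(received after } \Delta t_{13}\text{, no buffering)}
\end{align*}
```

Under these assumptions $T_1 = T_2 = T_3$, so the remark and both reactions carry identical time data.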

これにより、Ｓ１９およびＳ２１では、動画データ記憶エリア３１ａに記憶された複数の動画データを時刻データに沿って合成することで、テレビ会議において自拠点１のユーザの発言と、他拠点２，３にてその発言を受けたユーザの反応とが、それぞれ同期して出力されるような会議記録データが作成および保存される。そして、テレビ会議の終了後に任意のタイミングで、ユーザが端末装置３にて保存されている会議記録データを再生すると、例えば図８に示すような会議再生画面２９０がディスプレイ２８に表示される。 Accordingly, in S19 and S21, the plurality of moving image data stored in the moving image data storage area 31a are combined according to the time data, so that conference record data is created and saved in which a remark by the user at the local site 1 and the reactions of the users at the other sites 2 and 3 who received that remark are output in synchronization. When the user plays back the conference record data saved in the terminal device 3 at any time after the video conference ends, a conference playback screen 290 such as the one shown in FIG. 8 is displayed on the display 28.

会議再生画面２９０（図８）は、基本的にはテレビ会議で表示されるテレビ会議画面２８０（図６）と同様であるが、テレビ会議とは異なりネットワーク遅延に起因する出力タイミングのズレが抑制されている。図８に示す例では、自拠点１のユーザが「私の意見に賛成ですか？」という発言に同期して、他拠点２のユーザが頷きという肯定的な反応を示し、他拠点３のユーザが首傾げという否定的な反応を示したことが表示されている。また、会議再生画面２９０（図８）に同期して、テレビ会議での実際の発話タイミングによって、スピーカ３６から各ユーザの音声が出力される。 The conference playback screen 290 (FIG. 8) is basically the same as the video conference screen 280 (FIG. 6) displayed during the video conference, but unlike the live video conference, the shift in output timing caused by network delay is suppressed. In the example shown in FIG. 8, in synchronization with the remark "Do you agree with my opinion?" by the user at the local site 1, the user at the other site 2 is shown giving a positive reaction (nodding) and the user at the other site 3 is shown giving a negative reaction (tilting the head). In addition, each user's voice is output from the speaker 36 in synchronization with the conference playback screen 290 (FIG. 8), at the actual speech timing in the video conference.

第１の実施形態に係るテレビ会議システム１によれば、端末装置３，４，５間で相互に交換される動画データが、端末装置３にて時刻データに対応付けて記憶される。動画データに対応付けられる時刻データは、ネットワーク経由で伝送されるのに要する遅延時間に基づいて算出される記録遅延時間のバッファリングによって調整される。動画データを時刻データに基づいて時系列に合成することによって、端末装置３にて生成された動画データと、その動画データの受信時に端末装置４，５にて生成された動画データとが同期して再生される会議記録データが作成される。これにより、会議記録データを再生したときに、一のユーザの発言に対する他のユーザの反応を同期して出力することができる。 According to the video conference system 1 of the first embodiment, the moving image data exchanged among the terminal devices 3, 4, and 5 is stored in the terminal device 3 in association with time data. The time data associated with the moving image data is adjusted by buffering for the recording delay time, which is calculated based on the delay time required for transmission via the network. By combining the moving image data in time series based on the time data, conference record data is created in which the moving image data generated by the terminal device 3 is played back in synchronization with the moving image data generated by the terminal devices 4 and 5 when they received that data. As a result, when the conference record data is played back, the reactions of the other users to a remark by one user can be output in synchronization.

さらに、ネットワークの伝送時間を計測する問い合わせ信号（ＰＩＮＧ）に基づいて遅延時間が特定され、その遅延時間に基づいて算出される記録遅延時間のバッファリングによって、動画データに対応付けられる時刻データが遅延される。よって、一のユーザの発言に対する他のユーザの反応を、ネットワーク遅延に関係なく正確に同期させることができる。 Furthermore, the delay time is determined based on an inquiry signal (PING) that measures the transmission time of the network, and the time data associated with the moving image data is delayed by buffering for the recording delay time calculated from that delay time. The reactions of the other users to a remark by one user can therefore be synchronized accurately regardless of the network delay.

＜第２の実施形態＞
図９を参照して、第２の実施形態に係るテレビ会議システム１の全体構成について説明する。図９に示すように、第２の実施形態に係るテレビ会議システム１は、各拠点に設けられる複数の端末装置と多拠点接続装置７とがネットワーク２に接続される。なお、図９では、テレビ会議システム１に３つの端末装置３，４，５が設けられている場合を示しているが、テレビ会議システム１には複数の端末装置が設けられていればよい。 <Second Embodiment>
With reference to FIG. 9, the overall configuration of the video conference system 1 according to the second embodiment will be described. As shown in FIG. 9, in the video conference system 1 according to the second embodiment, a plurality of terminal devices provided at each site and a multi-site connection device 7 are connected to the network 2. Although FIG. 9 shows a case where the video conference system 1 is provided with three terminal devices 3, 4, and 5, the video conference system 1 only needs to be provided with a plurality of terminal devices.

多拠点接続装置７は、ネットワーク２を介して複数の拠点に備えられたユーザ端末に接続され、映像、音声、データ等を中継することにより、多拠点間のテレビ会議を実現する装置である。以下では、多拠点接続装置（ＭｕｌｔｉｐｏｉｎｔＣｏｎｔｒｏｌＵｎｉｔ）７を、ＭＣＵ７と略称する。 The multi-site connection apparatus 7 is an apparatus that realizes a video conference between multi-sites by being connected to user terminals provided at a plurality of bases via the network 2 and relaying video, audio, data, and the like. Hereinafter, the multipoint control unit (Multipoint Control Unit) 7 is abbreviated as MCU7.

第２の実施形態に係るテレビ会議システム１では、各拠点に設けられた端末装置３，４，５で生成された動画データが、ネットワーク２を介してＭＣＵ７に送信される。ＭＣＵ７では、各拠点から受信した動画データを合成した会議データが作成されて、その会議データが端末装置３，４，５にネットワーク２を介して送信される。端末装置３，４，５では、ＭＣＵ７から受信した会議データが出力される。これにより、各端末装置３，４，５では、各拠点に存在するユーザの映像および音声がリアルタイムに合成出力されて、遠隔会議（ここでは、テレビ会議）が実施される。また、ＭＣＵ７では、テレビ会議システム１にて実行されるテレビ会議が、テレビ会議の終了後に再生可能な会議記録データとして記録される。以下では、テレビ会議システム１にて実行されるテレビ会議が、ＭＵＣ７にて記録される場合を例示して説明する。 In the video conference system 1 according to the second embodiment, the moving image data generated by the terminal devices 3, 4, and 5 provided at the respective sites is transmitted to the MCU 7 via the network 2. The MCU 7 creates conference data by combining the moving image data received from the sites and transmits the conference data to the terminal devices 3, 4, and 5 via the network 2. The terminal devices 3, 4, and 5 output the conference data received from the MCU 7. As a result, in each of the terminal devices 3, 4, and 5, the video and audio of the users at the sites are combined and output in real time, and a remote conference (here, a video conference) is held. In addition, the MCU 7 records the video conference executed by the video conference system 1 as conference record data that can be played back after the video conference ends. The following description takes as an example the case where the video conference executed by the video conference system 1 is recorded by the MCU 7.

端末装置３，４，５は、第１の実施形態（図３）と同様の構成をなすが、少なくともＭＣＵ７によって実行されるテレビ会議に参加するクライアントとしての機能（詳細には、ＭＣＵ７に映像や音声を送信する機能や、ＭＣＵ７から送信される会議データを出力する機能など）を有していればよい。そのため、先述の会議実行プログラム、会議記録プログラム、および、テレビ会議の記録に必要な各種記憶エリア（動画データ記憶エリア３１ａ、記録遅延時間記憶エリア３１ｂ、会議記録データ記憶エリア３１ｃ）を具備しない。ただし、プログラム記憶エリア３１ｄには、テレビ会議中のユーザの反応を検出するための反応検出プログラムが記憶されている。 The terminal devices 3, 4, and 5 have the same configuration as in the first embodiment (FIG. 3), but they need only have at least the functions of a client that participates in a video conference executed by the MCU 7 (specifically, a function of transmitting video and audio to the MCU 7, a function of outputting the conference data transmitted from the MCU 7, and the like). They therefore do not include the conference execution program, the conference recording program, or the various storage areas required for recording a video conference (the moving image data storage area 31a, the recording delay time storage area 31b, and the conference record data storage area 31c) described above. However, the program storage area 31d stores a reaction detection program for detecting the reactions of the user during a video conference.

図１０および図１１を参照して、ＭＵＣ７の電気的構成について説明する。図１０に示すように、ＭＵＣ７は、先述の端末装置３とほぼ同様の構成をなし、ＣＰＵ４０，ＲＯＭ４１，ＲＡＭ４２，Ｉ／Ｏインタフェイス５０，ＨＤＤ５１を有している。Ｉ／Ｏインタフェイス５０には、ネットワーク２と通信するための通信装置４５と、マウス４７と、キーボード４９でのキー入力を受け付けるキーコントローラ４４と、ディスプレイ４８の表示制御を行うビデオコントローラ４３とがそれぞれ接続されている。 The electrical configuration of the MCU 7 will be described with reference to FIGS. 10 and 11. As shown in FIG. 10, the MCU 7 has substantially the same configuration as the terminal device 3 described above, and includes a CPU 40, a ROM 41, a RAM 42, an I/O interface 50, and an HDD 51. Connected to the I/O interface 50 are a communication device 45 for communicating with the network 2, a mouse 47, a key controller 44 that accepts key input from the keyboard 49, and a video controller 43 that controls the display 48.

図１１に示すように、ＨＤＤ５１には、各端末装置３，４，５で生成された動画データを記憶する動画データ記憶エリア５１ａと、各拠点のユーザの反応を示す反応データを記憶する反応データ記憶エリア５１ｂと、テレビ会議を記録した会議記録データを記憶する会議記録データ記憶エリア５１ｃと、ＭＵＣ７にて実行される各種プログラムを記憶するプログラム記憶エリア５１ｄと、その他の情報記憶エリア５１ｅとが設けられている。会議記録データ記憶エリア５１ｃに記憶される会議記録データは、動画データ記憶エリア５１ａに記憶される動画データと、反応データ記憶エリア５１ｂに記憶されている反応データとに基づいて作成されるが、詳細は後述する。なお、プログラム記憶エリア５１ｄには、テレビ会議を実行するための会議実行プログラムや、テレビ会議を記録するための会議記録プログラムが記憶されている。 As shown in FIG. 11, the HDD 51 is provided with a moving image data storage area 51a for storing the moving image data generated by the terminal devices 3, 4, and 5, a reaction data storage area 51b for storing reaction data indicating the reactions of the users at the sites, a conference record data storage area 51c for storing conference record data in which a video conference is recorded, a program storage area 51d for storing various programs executed by the MCU 7, and another information storage area 51e. The conference record data stored in the conference record data storage area 51c is created based on the moving image data stored in the moving image data storage area 51a and the reaction data stored in the reaction data storage area 51b, as described in detail later. The program storage area 51d stores a conference execution program for executing a video conference and a conference recording program for recording the video conference.

ところで、第２の実施形態に係るテレビ会議システム１において、ＭＵＣ７で作成される会議データは、各拠点から受信した動画データがリアルタイムに合成されたものである。一方、ＭＵＣ７でテレビ会議を実行した場合、ＭＵＣ７が一の拠点の動画データを受信した時点から起算して、その動画データを含む会議データが別の拠点に到達し、さらにその到達時に生成された別の拠点の動画データをＭＵＣ７が受信するまでには、ネットワーク２での伝送時間分を要する。そのため、実際のテレビ会議では、例えば一の拠点のユーザの発言に対する別の拠点のユーザの反応が、その別の拠点においてユーザが現実に反応を示した時点よりも遅れてテレビ会議に出力される。 In the video conference system 1 according to the second embodiment, the conference data created by the MCU 7 is obtained by combining, in real time, the moving image data received from the sites. On the other hand, when a video conference is executed by the MCU 7, a transmission time over the network 2 elapses between the moment the MCU 7 receives the moving image data of one site and the moment the MCU 7 receives the moving image data that another site generated when the conference data containing the first site's data arrived there. For this reason, in an actual video conference, the reaction of a user at another site to a remark by a user at one site, for example, is output to the video conference later than the moment at which that user actually reacted.

具体的には、図１２に示すように、端末装置３から時刻ｔ１１のタイミングで送信された動画データが、ＭＵＣ７に時刻ｔ１２のタイミングで到達する。ＭＵＣ７が時刻ｔ１２のタイミングで受信した複数の動画データを合成して会議データを作成し、各端末装置３，４，５に会議データを送信する。端末装置４に時刻ｔ１３のタイミングで会議データが到達し、端末装置４から返信された動画データが時刻ｔ１４のタイミングでＭＣＵ７に到達する。この場合、時刻ｔ１４から時刻ｔ１２を減じた時間差が、例えば拠点１でのユーザの発話に対する拠点２のユーザの反応をＭＵＣ７が取得するのに必要な遅延時間Δｔ２１となる。また、端末装置５に時刻ｔ１５のタイミングで会議データが到達し、端末装置５から返信された動画データが時刻ｔ１６のタイミングでＭＣＵ７に到達する。この場合、時刻ｔ１６から時刻ｔ１２を減じた時間差が、例えば拠点１でのユーザの発話に対する拠点３のユーザの反応をＭＵＣ７が取得するのに必要な遅延時間Δｔ２２となる。 Specifically, as shown in FIG. 12, the moving image data transmitted from the terminal device 3 at time t11 reaches the MCU 7 at time t12. The MCU 7 combines the plurality of moving image data it has received at time t12 to create conference data and transmits the conference data to the terminal devices 3, 4, and 5. The conference data reaches the terminal device 4 at time t13, and the moving image data returned from the terminal device 4 reaches the MCU 7 at time t14. In this case, the time difference obtained by subtracting time t12 from time t14 is the delay time Δt21 that the MCU 7 needs in order to acquire, for example, the reaction of the user at site 2 to an utterance by the user at site 1. Likewise, the conference data reaches the terminal device 5 at time t15, and the moving image data returned from the terminal device 5 reaches the MCU 7 at time t16. In this case, the time difference obtained by subtracting time t12 from time t16 is the delay time Δt22 that the MCU 7 needs in order to acquire, for example, the reaction of the user at site 3 to an utterance by the user at site 1.
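
The two subtractions in FIG. 12 can be written as a small Python sketch. The function name `reaction_delays` and the millisecond values chosen for t12, t14, and t16 are hypothetical and serve only to make the arithmetic concrete.

```python
def reaction_delays(t_received, reply_arrivals):
    """Delay from the MCU receiving a remark to the MCU receiving each site's reply."""
    return {site: t_reply - t_received for site, t_reply in reply_arrivals.items()}


# Hypothetical times in milliseconds on the MCU's clock.
t12 = 100                                # MCU receives site 1's moving image data
replies = {"site2": 180, "site3": 260}   # t14 and t16
print(reaction_delays(t12, replies))
# {'site2': 80, 'site3': 160}
```

Here the values for site 2 and site 3 correspond to Δt21 = t14 − t12 and Δt22 = t16 − t12, respectively.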

Processing related to the recording of a video conference in the video conference system 1 according to the second embodiment will be described with reference to FIGS. 13 to 17. In the following, the processing executed by the terminal device 3 and the processing executed by the MUC 7 are described separately. Although the processing executed by the terminal device 3 is described here, the same processing is executed by the other terminal devices 4 and 5.

The reaction data transmission process executed by the terminal device 3 will be described with reference to FIG. 13. The reaction data transmission process (FIG. 13) returns to the MUC 7, as reaction data, the user's reaction at the time the conference data acquired from the MUC 7 is output. When a video conference in which the user at site 1 (terminal device 3) participates is started, the reaction data transmission process (FIG. 13) is executed by the CPU 20 based on the reaction detection program described above.

As shown in FIG. 13, in the reaction data transmission process, the conference data transmitted from the MUC 7 is first received (S51). As described above, while a video conference is in progress, moving image data including the video and audio acquired by each of the terminal devices 3, 4, and 5 is transmitted to the MUC 7, and the MUC 7 creates conference data by compositing these pieces of moving image data. The conference data transmitted from the MUC 7 is output at each of the terminal devices 3, 4, and 5. In S51, the conference data transmitted from the MUC 7 to the terminal device 3 is received.

A reaction time stamp is extracted from the conference data received in S51 (S53). The reaction time stamp indicates the actual time at which the conference data was transmitted from the MUC 7; details are given later. Next, the reaction of the user of the terminal device 3 is detected (S55). In S55, the image captured by the camera 34 is analyzed to detect the user's reaction contained in that image. As one example, when a "nod" or a "head tilt" is to be detected as the user's reaction, movement of the user's head is detected by well-known image processing. A "nod" is a state in which the listener's head swings vertically by a predetermined amount or more when the listener is convinced by what the speaker is saying. A "head tilt" is a state in which the listener's head swings horizontally by a predetermined amount or more when the listener is not convinced by what the speaker is saying. These reactions can be detected, for example, by the identification method of the state identification device described in Japanese Unexamined Patent Application Publication No. 2007-97668.
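The distinction drawn in S55 between a nod (vertical swing of a predetermined amount or more) and a head tilt (horizontal swing of a predetermined amount or more) can be sketched as a simple threshold test. This is only an illustration under stated assumptions: the function `classify_reaction`, its units, and the rule of preferring the larger swing when both exceed the threshold are hypothetical, and the identification method of the cited publication is not reproduced here.

```python
from typing import Optional


def classify_reaction(vertical_swing: float,
                      horizontal_swing: float,
                      threshold: float) -> Optional[str]:
    """Classify head movement measured over one analysis window.

    Returns "nod", "head_tilt", or None when neither swing reaches the
    predetermined amount. Units are arbitrary but must match `threshold`.
    """
    if vertical_swing >= threshold and vertical_swing >= horizontal_swing:
        return "nod"        # affirmative reaction
    if horizontal_swing >= threshold:
        return "head_tilt"  # negative reaction
    return None             # no reaction detected (S57: NO)
```

For example, `classify_reaction(3.0, 0.5, 2.0)` yields `"nod"`, while a window in which neither swing reaches the threshold yields `None`.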

When a user reaction is detected in S55 (S57: YES), reaction data relating to that reaction is generated (S59). The reaction data includes the reaction time stamp extracted in S53, the type of reaction detected in S55 (for example, a nod or a head tilt), and identification data indicating the own site. The reaction data generated in S59 is returned to the MUC 7 via the network 2 (S61). When no user reaction is detected in S55 (S57: NO), S59 and S61 are skipped, so the reaction time stamp extracted in S53 is discarded. It is then determined whether the video conference has ended (S63). For example, when the MUC 7 issues an instruction to end the video conference (S63: YES), the reaction data transmission process (FIG. 13) ends. When there is no instruction to end the video conference (S63: NO), the process returns to S51.
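The content of the reaction data generated in S59 (reaction time stamp, reaction type, and site identification) and the branch at S57 can be sketched as follows. The class name `ReactionData`, its field names, and the helper `build_reaction_data` are assumptions made for illustration; the embodiment does not specify a data layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReactionData:
    reaction_timestamp: float  # time the conference data left the MUC (S53)
    reaction_type: str         # e.g. "nod" or "head_tilt" (S55)
    site_id: int               # identification data for the own site


def build_reaction_data(reaction_timestamp: float,
                        detected: Optional[str],
                        site_id: int) -> Optional[ReactionData]:
    """Return reaction data to send to the MUC, or None if nothing was detected."""
    if detected is None:
        # S57: NO. S59 and S61 are skipped and the time stamp is discarded.
        return None
    # S57: YES. Generate the reaction data (S59); the caller sends it (S61).
    return ReactionData(reaction_timestamp, detected, site_id)
```

Returning `None` rather than an empty record mirrors the description: when no reaction is detected, nothing is sent and the extracted time stamp is simply dropped.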

The conference recording process in which the MUC 7 records a video conference will be described with reference to FIG. 14. When a video conference is started in the video conference system 1, the conference recording process (FIG. 14) is executed by the CPU 40 based on the conference recording program described above.

As shown in FIG. 14, in the conference recording process in which the MUC 7 records a video conference, when data is received from any of the terminal devices 3, 4, and 5 (S101: YES), it is determined whether the data received in S101 is moving image data (S103). When the received data is moving image data (S103: YES), the moving image data is stored in the moving image data storage area 51a in association with a recording time stamp (S105). The recording time stamp associated with the moving image data in S105 is information indicating the actual time at which the moving image data was received by the MUC 7.

Meanwhile, as described above, the MUC 7 composites in real time the moving image data received at the same timing from the terminal devices 3, 4, and 5, and transmits the resulting conference data to each of the terminal devices 3, 4, and 5. At this time, the conference data containing the moving image data stored in S105 is given a reaction time stamp and then transmitted to each of the terminal devices 3, 4, and 5 (S107). The reaction time stamp is information indicating the actual time at which the conference data is transmitted from the MUC 7. In the MUC 7 according to the second embodiment, there is almost no time lag between the moment the moving image data is received and the moment the conference data is transmitted, so the reaction time stamp given in S107 indicates the same time as the recording time stamp associated in S105. As a result, in S53 described above, each of the terminal devices 3, 4, and 5 acquires a reaction time stamp indicating the same timing as the recording time stamp managed by the MUC 7.

When the data received in S101 is not moving image data (S103: NO), it is determined whether the data received in S101 is reaction data (S109). When the received data is reaction data (S109: YES), the reaction data is stored in the reaction data storage area 51b in association with a recording time stamp and a reaction time stamp (S111). The recording time stamp associated with the reaction data in S111 indicates the actual time at which the reaction data was received by the MUC 7. The reaction time stamp associated with the reaction data in S111 is the reaction time stamp extracted at the transmitting site (that is, the reaction site) (see S53).
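The receive branch of the conference recording process (S103 to S111) can be sketched as a small dispatcher that stores each message with the time stamps described above. The class `MucRecorder`, the dictionary-based message format, and the in-memory lists standing in for the storage areas 51a and 51b are all assumptions for illustration; compositing and network transmission are omitted.

```python
class MucRecorder:
    """Toy model of the receive branch of the conference recording process."""

    def __init__(self):
        self.video_store = []     # stands in for moving image data storage area 51a
        self.reaction_store = []  # stands in for reaction data storage area 51b

    def on_receive(self, message, now):
        """Handle one received message; `now` is the MUC's clock reading."""
        if message["kind"] == "video":                      # S103: YES
            self.video_store.append(
                {"site": message["site"], "frame": message["frame"], "rec_ts": now}
            )                                               # S105
            # S107: outgoing conference data carries a reaction time stamp equal
            # to the recording time stamp (negligible processing lag assumed).
            return {"reaction_ts": now}
        if message["kind"] == "reaction":                   # S109: YES
            self.reaction_store.append(
                {
                    "site": message["site"],
                    "type": message["type"],
                    "rec_ts": now,                          # arrival time at the MUC
                    "reaction_ts": message["reaction_ts"],  # echoed from S53
                }
            )                                               # S111
        return None                                         # nothing to send back


muc = MucRecorder()
stamp = muc.on_receive({"kind": "video", "site": 1, "frame": "f0"}, now=100.0)
muc.on_receive(
    {"kind": "reaction", "site": 2, "type": "nod", "reaction_ts": stamp["reaction_ts"]},
    now=100.4,
)
```

Each stored reaction therefore carries two times: when it arrived at the MUC, and when the conference data that provoked it was sent.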

After S107 or S111 is executed, or when the received data is not reaction data (S109: NO), it is determined whether the video conference has ended (S113). For example, when an instruction to end the video conference is issued at the terminal devices 3, 4, and 5 (S113: YES), the plural pieces of moving image data stored in the moving image data storage area 51a are composited to create conference record data in which the video conference is recorded (S115). The conference record data created in S115 is saved in the conference record data storage area 51c of the HDD 51 (S117). At this time, the conference record data storage area 51c stores, in association with the conference record data, the position at which the moving image data acquired from each site is displayed when the conference record data is generated (the per-site display position). When there is no instruction to end the video conference (S113: NO), the process returns to S101.

In S115, the plural pieces of moving image data stored in the moving image data storage area 51a are composited in time series based on the recording time stamp associated with each piece of moving image data, thereby creating conference record data in which the video conference is recorded. Therefore, when the conference record data is played back, pieces of moving image data whose associated recording time stamps match are output at the same timing (that is, in synchronization). As described above, however, the recording time stamp is time data given to the moving image data with reference to the time of reception at the MUC 7. Because of network delay between the MUC 7 and each site, each piece of moving image data may therefore be output at a timing different from the moment at which it was actually generated in the video conference.
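The time-series compositing of S115 amounts to grouping the stored moving image data by recording time stamp so that entries with matching stamps play back together. The sketch below assumes one dictionary per stored piece with the hypothetical keys `site`, `frame`, and `rec_ts`; actual picture and audio mixing is outside its scope.

```python
from collections import defaultdict


def compose_timeline(video_store):
    """Group stored moving image data by recording time stamp.

    Returns a dict mapping each recording time stamp, in ascending order,
    to a dict of {site: frame} that is output at that timing.
    """
    slots = defaultdict(dict)
    for entry in video_store:
        slots[entry["rec_ts"]][entry["site"]] = entry["frame"]
    return {ts: slots[ts] for ts in sorted(slots)}


stored = [
    {"site": 2, "frame": "s2-a", "rec_ts": 1},
    {"site": 1, "frame": "s1-a", "rec_ts": 0},
    {"site": 1, "frame": "s1-b", "rec_ts": 1},
]
timeline = compose_timeline(stored)
```

Since the grouping key is the reception time at the MUC, a frame that was delayed in the network lands in a later slot than the remark it responds to, which is the misalignment the description goes on to address.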

For example, when conference record data recording a video conference such as that shown in FIG. 6 is played back, the reactions of the users at the other sites 2 and 3 to the remark "Do you agree with my opinion?" by the user at site 1 are not output at the same timing as the remark. The reaction of the user at site 2 to the remark by the user at site 1 is output when the delay time Δt21 described above has elapsed, and the reaction of the user at site 3 is output when the delay time Δt22 described above has elapsed. Thus, in conference record data created from recording time stamps that are based on the time of reception at the MUC 7, the video and audio of the users at the sites are composited and output, but the playback timing may in practice be shifted by the network delay to each site. In that case, even when the conference record data is played back, the reaction of a user at another site to a remark by a user at one site remains difficult to follow. In the second embodiment, the MUC 7 executes the reaction display processing (S119) described below so that, when the conference record data is played back, a remark by a user at one site and the reaction of a user at another site to that remark can be output in synchronization.

As shown in FIG. 15, in the reaction display processing (S119), it is first determined whether unprocessed reaction data exists in the reaction data storage area 51b (S121). When unprocessed reaction data exists (S121: YES), time stamps are extracted from that reaction data (S123). In S123, the two time stamps associated with the reaction data in the reaction data storage area 51b (the recording time stamp and the reaction time stamp) are acquired. A reaction display addition process that adds the user's reaction to the conference record data is then executed (S125).

As shown in FIG. 16, in the reaction display addition process (S125), the moving image data at the moment the user showed the reaction is first cut out from the conference record data (S131). In S131, the moving image data that matches the recording time stamp of the reaction data extracted in S123 is cut out from the conference record data saved in the conference record data storage area 51c. Each piece of moving image data composited into the conference record data occupies, for each site, a playback time of a prescribed unit (for example, one second) in the conference record data. Here, of the moving image data included in the conference record data, the moving image data that originates from the same site as the reaction data extracted in S123 (the reaction site) is cut out from the corresponding per-site display position, based on the identification data indicating each site.

Next, of the moving image data included in the conference record data, the moving image data at the time the conference data was transmitted is replaced with the moving image data cut out in S131 (S133). In S133, in the conference record data saved in the conference record data storage area 51c, the moving image data corresponding to the recording time stamp that matches the reaction time stamp of the reaction data extracted in S123 is replaced with the moving image data cut out in S131. As a result, a remark by a user at one site and the reaction of a user at another site to that remark are played back in synchronization when the conference record data is played back. Further, in S133, the replaced moving image data in the conference record data is processed so that the user's reaction indicated by the reaction data is displayed as text or a graphic.

Finally, a still image from immediately before the cut-out is inserted into the portion of the conference record data from which the moving image data was cut out in S131 (S135). That is, in the conference record data saved in the conference record data storage area 51c, the portion from which the moving image data was cut out in S131 contains no data, so that portion would show nothing when the conference record data is played back. In S135, therefore, the image displayed immediately before the cut-out portion during playback is inserted as a still image that is displayed throughout the cut-out portion. This prevents a blank from appearing as a result of cutting moving image data out of the conference record data. After S135 is executed, the process returns to the reaction display processing (FIG. 15).
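The three steps of the reaction display addition process (cut out in S131, replace and annotate in S133, fill the gap with a still image in S135) can be sketched on a toy timeline. The assumptions here are significant and are for illustration only: the timeline is a dict of `{time stamp: {site: frame}}`, frames are plain strings, the annotation is a bracketed suffix, and the function name `add_reaction` is hypothetical.

```python
def add_reaction(timeline, reaction):
    """Apply S131, S133 and S135 for one reaction record, editing `timeline` in place."""
    site = reaction["site"]
    cut_ts = reaction["rec_ts"]          # slot in which the reaction was recorded
    target_ts = reaction["reaction_ts"]  # slot of the remark that provoked it

    clip = timeline[cut_ts][site]                                # S131: cut out
    timeline[target_ts][site] = f"{clip} [{reaction['type']}]"   # S133: replace, annotate
    if cut_ts == target_ts:
        return timeline  # no delay: nothing was moved, so there is no gap to fill

    # S135: fill the vacated slot with a still of the image shown just before it.
    earlier = [ts for ts in timeline if ts < cut_ts and site in timeline[ts]]
    if earlier:
        timeline[cut_ts][site] = f"still({timeline[max(earlier)][site]})"
    else:
        del timeline[cut_ts][site]
    return timeline


# Four one-unit slots for sites 1 and 2; site 2's nod was recorded in slot 3
# in response to conference data sent in slot 1 (hypothetical values).
timeline = {ts: {1: f"s1-f{ts}", 2: f"s2-f{ts}"} for ts in range(4)}
add_reaction(timeline, {"site": 2, "type": "nod", "rec_ts": 3, "reaction_ts": 1})
```

After the call, slot 1 shows site 2's nod next to site 1's remark, and slot 3 shows a frozen copy of site 2's slot-2 image instead of a blank. Only the reacting site's entries are touched, which matches the statement that just part of the moving image data is replaced.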

Returning to FIG. 15, after the reaction display addition process (S125) is executed, the process returns to S121 and it is again determined whether unprocessed reaction data exists. In other words, S123 and S125 are repeated for each piece of unprocessed reaction data until no unprocessed reaction data remains in the reaction data storage area 51b. When no unprocessed reaction data exists (S121: NO), the processed conference record data is saved in the conference record data storage area 51c (S127), and the process returns to the conference recording process (FIG. 14). The conference recording process (FIG. 14) ends after the reaction display processing (S119) is executed.

At any time after the video conference has ended, the users of the terminal devices 3, 4, and 5 can acquire, via the network 2, the conference record data saved in the MUC 7. Specifically, when the user of the terminal device 3 plays back the conference record data acquired from the MUC 7, a conference playback screen 300 such as that shown in FIG. 17 is displayed on the display 28. The conference playback screen 300 (FIG. 17) is basically the same as the video conference screen 280 (FIG. 6) displayed during the video conference but, unlike the video conference itself, shifts in output timing caused by network delay are suppressed.

In the example shown in FIG. 17, the screen shows that, in synchronization with the remark "Do you agree with my opinion?" by the user at site 1, the user at site 2 responded positively with a nod and the user at site 3 responded negatively with a head tilt. In addition, the reaction of the user at site 2 is displayed in a pop-up reading "Nodded," and the reaction of the user at site 3 is displayed in a pop-up reading "Tilted head." In synchronization with the conference playback screen 300 (FIG. 17), each user's voice is output from the speaker 36 at the actual utterance timing in the video conference. Instead of a pop-up, a video registered in advance for each kind of user reaction, or a graph showing the strength or history of the user's reactions, may be displayed.

According to the video conference system 1 of the second embodiment, the moving image data exchanged among the terminal devices 3, 4, and 5 is stored in the MUC 7 in association with a recording time stamp. The recording time stamp associated with the moving image data is given according to the timing at which the conference data is transmitted. Conference record data recording the video conference is created by compositing the moving image data in time series based on the recording time stamps. When the conference record data is created, moving image data that does not contain a user's reaction is replaced with moving image data that contains the predetermined reaction and then composited, based on the recording time stamp and the reaction time stamp of the reaction data. Thus, merely by replacing part of the moving image data in the conference record data, the reactions of other users to one user's remark can be synchronized with that remark regardless of network delay.

Further, based on the reaction data indicating the users' reactions detected by the terminal devices 3, 4, and 5, the MUC 7 creates conference record data in which the users' reactions are displayed in synchronization with the moving image data when the conference record data is played back. This makes it possible to determine accurately, when the conference record data is played back, how other users reacted to one user's remark. It is also possible to display whether a user's reaction was positive (a nod or the like) or negative (a head tilt or the like).

In the above embodiments, the video conference system 1 corresponds to the "remote conference system" of the present invention, the terminal devices 3, 4, and 5 correspond to the "conference terminals," and the terminal device 3 and the MUC 7 each correspond to the "conference recording device" of the present invention. The CPU 20 that executes S11 and the like and the CPU 40 that executes S101 and the like each correspond to the "moving image data exchanging means" of the present invention. The moving image data storage area 31a and the moving image data storage area 51a each correspond to the "moving image storage means" of the present invention. The CPU 20 that executes S1, S3, and S13 and the CPU 40 that executes S125 each correspond to the "delay time correcting means" of the present invention. The CPU 20 that executes S19 and the CPU 40 that executes S115 and S125 each correspond to the "conference record data creating means" of the present invention. The CPU 20 that executes S21 and the CPU 40 that executes S117 and S127 each correspond to the "conference record data output means" of the present invention. The CPU 40 that executes S101 and the like corresponds to the "reaction data exchanging means" of the present invention. The reaction data storage area 51b corresponds to the "reaction storage means" of the present invention. The camera 34 corresponds to the "imaging means" of the present invention. The CPU 20 that executes S55 corresponds to the "image analysis means" of the present invention. The CPU 20 that executes S59 corresponds to the "reaction data generating means" of the present invention.

Further, S11 and the like and S101 and the like each correspond to the "moving image data exchanging step" of the present invention. S15 and S105 each correspond to the "moving image storing step" of the present invention. S1, S3, and S13, and S125, each correspond to the "delay time correcting step" of the present invention. S19, and S115 and S125, each correspond to the "conference record data creating step" of the present invention. S21, and S117 and S127, each correspond to the "conference record data output step" of the present invention. The conference recording program that causes the CPU 20 to execute the delay calculation process (FIG. 4) and the conference recording process (FIG. 7), and the conference recording program that causes the CPU 40 to execute the conference recording process (FIG. 14), correspond to the "conference recording program" of the present invention.

The present invention is not limited to the above embodiments and may be modified within a range that does not change the gist of the invention. Modifications of the present invention will be described below with reference to FIGS. 18 to 20.

For example, in the first embodiment, the delay calculation process shown in FIG. 18 may be executed instead of the delay calculation process described above (FIG. 4). The delay calculation process shown in FIG. 18 is executed when a video conference is started, and the delay time to each site is first measured in the same manner as described above (S1). The delay time to each site measured in S1 is stored as a history in the other information storage area 31e of the HDD 31. The latest delay time measured in S1 is then compared, site by site, with the previously measured delay time, and it is determined whether there is a site whose delay time has changed by a predetermined threshold (for example, one second) or more (S2). Immediately after the video conference is started, the delay time is judged to have changed by the threshold or more for all sites (S2: YES). Then, in the same manner as described above, the recording delay time for each site is calculated and stored in the recording delay time storage area 31b of the HDD 31 (S3). Thereafter, when an instruction to end the video conference is issued (S5: YES), the delay calculation process (FIG. 18) ends.

When there is no instruction to end the video conference (S5: NO), the process waits for a predetermined time (for example, five minutes) and then returns to S1. In S1 the delay time to each site is measured, and in S2 it is determined whether there is a site whose latest delay time differs from the previous delay time by the threshold or more. When there is such a site (S2: YES), the network delay conditions for that site have changed significantly. The recording delay time is therefore calculated for the site whose delay time has changed by the threshold or more, and the recording delay time of that site stored in the recording delay time storage area 31b is updated (S3). When there is no site whose delay time has changed by the threshold or more (S2: NO), the network delay conditions have changed little since the previous measurement, so the process proceeds to S5 without updating the recording delay time of any site.
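The per-site threshold check of S2 and the selective update of S3 can be sketched as below. How a recording delay time is derived from a measured delay time belongs to the first embodiment and is not restated here, so it is passed in as the hypothetical callable `to_recording_delay`; the function name, the dict-based bookkeeping, and the sample delays are likewise assumptions for illustration.

```python
def update_recording_delays(previous, latest, recording_delays,
                            to_recording_delay, threshold=1.0):
    """Update `recording_delays` in place for sites whose delay changed enough.

    previous / latest: {site: measured delay in seconds} from the last two runs of S1.
    Returns the list of sites that were updated.
    """
    changed = []
    for site, delay in latest.items():
        before = previous.get(site)
        # A site with no earlier measurement (conference just started) always updates.
        if before is None or abs(delay - before) >= threshold:   # S2: YES
            recording_delays[site] = to_recording_delay(delay)   # S3
            changed.append(site)
    return changed


recording = {}
first = {2: 0.2, 3: 0.3}    # hypothetical delays measured at conference start
second = {2: 0.25, 3: 1.5}  # hypothetical delays measured after the wait
placeholder = lambda d: d   # stand-in only; not the embodiment's calculation

started = update_recording_delays({}, first, recording, placeholder)
later = update_recording_delays(first, second, recording, placeholder)
```

In this run both sites are updated at the start, and afterwards only site 3 is updated because its delay moved by more than one second while site 2's barely changed.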

According to the delay calculation process shown in FIG. 18, even when the network delay conditions to a site change significantly while the video conference is in progress, the recording delay time is updated accordingly. The reactions of other users to one user's remark can therefore be synchronized accurately without being affected by changes in network delay conditions during the video conference.

In the first embodiment, the delay time to each site may also be measured by sending and receiving the time stamp described above instead of PING, which is an inquiry signal. In this case, in the delay calculation process (FIG. 4), the own site transmits to another site a time stamp indicating the time of transmission, and the other site returns that time stamp to the own site. The own site then calculates, as the delay time to the other site, the difference between the time at which the returned time stamp was received and the time indicated by the time stamp.

In the second embodiment, instead of detecting the user's reaction by image analysis at each of the terminal devices 3, 4, and 5, the reaction may be detected by a sensor worn by the user. Specifically, in the terminal device 3 shown in FIG. 19, a known acceleration sensor 33 is connected to the I/O interface 30 via a sensor control unit 32. The acceleration sensor 33 is built into headphones (not shown), which the user wears during the video conference. In the reaction data transmission process (FIG. 13), the user's reaction (a nod, a head tilt, or the like) is then detected from the detection signal of the acceleration sensor 33 (S55). Thus, various techniques can be applied to detect the user's reaction.

In the second embodiment, the reaction data transmission process shown in FIG. 20 may be executed instead of the reaction data transmission process described above (FIG. 13). The reaction data transmission process shown in FIG. 20 is the same as the process of FIG. 13 except that, when the user has reacted (S57: YES), it is determined whether the audio based on the conference data being output at the terminal device 3 is at or above a predetermined volume (S58). The predetermined volume may be, for example, the average volume of past user speech in the video conference. When the audio is at or above the predetermined volume (S58: YES), reaction data is generated (S59); when the audio is below the predetermined volume (S58: NO), S59 and S61 are skipped.

According to the reaction data transmission process shown in FIG. 20, when the audio output at each of the terminal devices 3, 4, and 5 based on the conference data is quiet, it is assumed that no user is speaking in the video conference. A user reaction detected in that state is treated as not being a response to another user's remark in the video conference. This prevents false detection, in which reaction data is generated at the terminal devices 3, 4, and 5 even though no user is speaking in the video conference.
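The volume check of FIG. 20 (S57 and S58) can be sketched as follows. This is an illustrative sketch, not the patented implementation: the function name `reaction_data_allowed`, the integer volume units, and the behavior when no past speech has been recorded are assumptions. The threshold uses the example given in the description, namely the average volume of past user speech.

```python
from statistics import mean


def reaction_data_allowed(reaction_detected, output_volume, past_user_volumes):
    """Decide whether reaction data should be generated (S59).

    reaction_detected: result of the reaction check (S57).
    output_volume: volume of the audio currently output from conference data.
    past_user_volumes: volumes of past user speech in the conference.
    """
    if not reaction_detected:
        return False  # S57: NO, nothing to report

    if not past_user_volumes:
        # Assumption: with no speech history there is no threshold,
        # so no reaction data is generated.
        return False

    threshold = mean(past_user_volumes)  # example threshold from the description
    # S58: generate reaction data only when the audio is at or above the threshold.
    return output_volume >= threshold


allowed = reaction_data_allowed(True, 50, [40, 60])
```

With this gate, a nod detected while the output audio is below the running average is dropped, which corresponds to skipping S59 and S61.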

1 Video conference system
2 Network
3 Terminal device
4 Terminal device
5 Terminal device
7 MUC
20 CPU
21 ROM
22 RAM
31 HDD
31a Video data storage area
31b Recording delay time storage area
31c Conference record data storage area
33 Acceleration sensor
34 Camera
35 Microphone
36 Speaker
40 CPU
41 ROM
42 RAM
51 HDD
51a Video data storage area
51b Reaction data storage area
51c Conference record data storage area
280 Video conference screen
290 Conference playback screen
300 Conference playback screen

Claims

A conference recording device used in a remote conference system in which a conference terminal, which acquires images and sounds of users participating in a remote conference and generates moving image data including those images and sounds, is installed at each of a plurality of locations where the users are present, the conference terminals exchange the moving image data with one another via a network, and images and sounds synthesized based on the moving image data are output, the conference recording device creating conference record data that records the remote conference and comprising:
moving image data exchanging means for exchanging the moving image data between the conference terminals;
moving image storage means for storing the moving image data acquired by the moving image data exchanging means in association with time information indicating the reproduction timing of that moving image data when the conference record data is reproduced;
delay time correcting means for adjusting the reproduction timing of the moving image data by correcting the correspondence between the moving image data stored in the moving image storage means and the time information, based on the delay time required for the moving image data acquired by the moving image data exchanging means to be transmitted between the conference terminals via the network;
conference record data creating means for creating, by synthesizing the moving image data stored in the moving image storage means in time series based on the time information, the conference record data in which one moving image data generated by one conference terminal is reproduced in synchronization with other moving image data generated by another conference terminal at the time the one moving image data was received; and
conference record data output means for outputting the conference record data created by the conference record data creating means.

The conference recording device according to claim 1, wherein the delay time correcting means
identifies the delay time based on an inquiry signal that measures the transmission time of the network between the conference terminals, and
delays the time information associated with the moving image data in the moving image storage means, based on the delay time, from the actual timing at which the moving image data was received.

The conference recording device according to claim 1, wherein the time information is a time stamp indicating the time at which the moving image data was transmitted or received, and
the delay time correcting means adjusts the reproduction timing of the moving image data based on the time stamp associated with the moving image data stored in the moving image storage means.

The conference recording device according to claim 1, further comprising:
reaction data exchanging means for exchanging, between the conference terminals via the network, reaction data including a predetermined reaction of the user detected at the conference terminal; and
reaction storage means for storing the reaction data acquired by the reaction data exchanging means in association with the time information included in the moving image data that was being received when the reaction data was generated at the conference terminal,
wherein the conference record data creating means creates the conference record data in which the predetermined reaction is displayed in synchronization with the moving image data during reproduction, by synthesizing the moving image data stored in the moving image storage means and the reaction data stored in the reaction storage means in time series based on the time information.

The conference recording device, wherein each of the conference terminals comprises:
sensor means, worn by the user participating in the remote conference using the conference terminal, for detecting as the predetermined reaction at least one of a nod, which is an affirmative head movement by the user, and a head tilt, which is a negative head movement by the user; and
reaction data generating means for generating, when the predetermined reaction is detected by the sensor means, the reaction data including the predetermined reaction detected by the sensor means.

The conference recording device, wherein each of the conference terminals comprises:
imaging means for imaging the user participating in the remote conference using the conference terminal;
image analysis means for detecting as the predetermined reaction, by analyzing the image of the user captured by the imaging means, at least one of a nod, which is an affirmative head movement by the user, and a head tilt, which is a negative head movement by the user; and
reaction data generating means for generating, when the predetermined reaction is detected by the image analysis means, the reaction data including the predetermined reaction detected by the image analysis means.

The conference recording device according to claim 4, wherein each of the conference terminals
further comprises volume determining means for analyzing the sound included in the moving image data received by the moving image data exchanging means and determining whether the sound is louder than a predetermined volume, and
the reaction data generating means generates the reaction data when the volume determining means determines that the sound is louder than the predetermined volume.

The conference recording device according to claim 4, wherein the conference record data creating means synthesizes the conference record data by replacing, based on the time information of the reaction data stored in the reaction storage means, moving image data stored in the moving image storage means that does not include the predetermined reaction with moving image data that includes the predetermined reaction.

A conference recording method used in a remote conference system in which a conference terminal, which acquires images and sounds of users participating in a remote conference and generates moving image data including those images and sounds, is installed at each of a plurality of locations where the users are present, the conference terminals exchange the moving image data with one another via a network, and images and sounds synthesized based on the moving image data are output, the conference recording method recording the remote conference and comprising:
a moving image data exchanging step of exchanging the moving image data between the conference terminals;
a moving image storing step of storing the moving image data acquired in the moving image data exchanging step in moving image storage means in association with time information indicating the reproduction timing of that moving image data when the conference record data is reproduced;
a delay time correcting step of adjusting the reproduction timing of the moving image data by correcting the correspondence between the moving image data stored in the moving image storage means and the time information, based on the delay time required for the moving image data acquired in the moving image data exchanging step to be transmitted between the conference terminals via the network;
a conference record data creating step of creating, by synthesizing the moving image data stored in the moving image storage means in time series based on the time information, the conference record data in which one moving image data generated by one conference terminal is reproduced in synchronization with other moving image data generated by another conference terminal at the time the one moving image data was received; and
a conference record data output step of outputting the conference record data created in the conference record data creating step.

A conference recording program used in a remote conference system in which a conference terminal, which acquires images and sounds of users participating in a remote conference and generates moving image data including those images and sounds, is installed at each of a plurality of locations where the users are present, the conference terminals exchange the moving image data with one another via a network, and images and sounds synthesized based on the moving image data are output, the conference recording program causing
a computer to function as:
moving image data exchanging means for exchanging the moving image data between the conference terminals;
storage execution means for storing the moving image data acquired by the moving image data exchanging means in moving image storage means in association with time information indicating the reproduction timing of that moving image data when the conference record data is reproduced;
delay time correcting means for adjusting the reproduction timing of the moving image data by correcting the correspondence between the moving image data stored in the moving image storage means and the time information, based on the delay time required for the moving image data acquired by the moving image data exchanging means to be transmitted between the conference terminals via the network;
conference record data creating means for creating, by synthesizing the moving image data stored in the moving image storage means in time series based on the time information, the conference record data in which one moving image data generated by one conference terminal is reproduced in synchronization with other moving image data generated by another conference terminal at the time the one moving image data was received; and
conference record data output means for outputting the conference record data created by the conference record data creating means.
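
The delay correction and time-series synthesis recited in the claims above can be illustrated with a short sketch. It is a simplified illustration under stated assumptions, not the claimed implementation: the names `Clip`, `one_way_delay_ms`, and `build_timeline` are hypothetical, the one-way delay is assumed to be half of the round-trip time of the inquiry signal, and which terminal's data is delayed, and by how much, is left to the caller through `delay_by_terminal_ms`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    terminal_id: int      # terminal that generated the moving image data
    received_at_ms: int   # time stamp when the data was stored
    payload: str          # stand-in for the actual moving image data


def one_way_delay_ms(sent_at_ms, reply_at_ms):
    """Estimate the network delay from an inquiry signal.

    Assumption: the one-way delay is half of the measured round trip.
    """
    return (reply_at_ms - sent_at_ms) // 2


def build_timeline(clips, delay_by_terminal_ms):
    """Shift each clip's time information by its terminal's delay and
    return (playback_time_ms, clip) pairs ordered in time series."""
    timed = [
        (clip.received_at_ms + delay_by_terminal_ms.get(clip.terminal_id, 0), clip)
        for clip in clips
    ]
    # Sort by corrected playback time; terminal id breaks ties deterministically.
    timed.sort(key=lambda item: (item[0], item[1].terminal_id))
    return timed


delay = one_way_delay_ms(1000, 1080)          # 40 ms each way
clips = [Clip(3, 1000, "remark"), Clip(4, 1080, "nod")]
# Delay terminal 3's own data by the round trip so the remark and the
# reaction to it share the same playback time.
timeline = build_timeline(clips, {3: 2 * delay})
```

In this example the remark from terminal 3 and the reaction from terminal 4 both land at playback time 1080 ms, so they are reproduced together even though they were stored 80 ms apart.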