JP2006304182A

JP2006304182A - Stream data generating method, video conference system, stream data generating device, and stream data compositing device

Info

Publication number: JP2006304182A
Application number: JP2005126542A
Authority: JP
Inventor: Toshihiro Takashima (高島稔弘)
Original assignee: Sumitomo Electric Industries Ltd; Sumitomo Electric Networks Inc
Current assignee: Sumitomo Electric Industries Ltd; Sumitomo Electric Networks Inc
Priority date: 2005-04-25
Filing date: 2005-04-25
Publication date: 2006-11-02

Abstract

PROBLEM TO BE SOLVED: To provide a stream data generating method with which a video conference system can be realized even for video signals that do not conform to a variable-length format.

SOLUTION: The processing executed by each terminal of the video conference system comprises the steps of: receiving a video signal from a camera (S1110); storing the video signal in a reception buffer (S1120); reading the video signal from the reception buffer (S1130); encoding every area other than the area allocated to the terminal itself (the effective area) as an ineffective area (S1140); extracting, from the compressed data generated by the encoding, the data corresponding to the effective area (S1150); storing the extracted data in a transmission buffer (S1160); and reading the compressed data from the transmission buffer and outputting it to the communication line (S1170).

COPYRIGHT: (C)2007,JPO&INPIT
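
The terminal-side steps S1110 to S1170 can be pictured as a small buffered pipeline. The Python sketch below is illustrative only: a frame is modelled as a grid of macroblock payloads, the hypothetical `encode_frame` stands in for a real MPEG encoder by marking every macroblock outside the effective area as skipped (`None`), and all function and buffer names are assumptions rather than the patent's implementation.

```python
from collections import deque

# Hypothetical model: a frame is a 2-D list of macroblock payloads, and the
# effective area is (row0, col0, rows, cols) in macroblock units.

def in_area(r, c, area):
    row0, col0, rows, cols = area
    return row0 <= r < row0 + rows and col0 <= c < col0 + cols

def encode_frame(frame, area):
    """Stand-in for S1140: keep macroblocks inside the effective area and mark
    every other macroblock as ineffective (None, standing in for 'skipped')."""
    return [[mb if in_area(r, c, area) else None for c, mb in enumerate(row)]
            for r, row in enumerate(frame)]

def extract_effective(encoded, area):
    """S1150: collect (macroblock address, payload) pairs of the effective area."""
    width = len(encoded[0])
    return [(r * width + c, mb)
            for r, row in enumerate(encoded)
            for c, mb in enumerate(row)
            if in_area(r, c, area)]

def run_terminal(frames, area, send):
    rx_buffer, tx_buffer = deque(), deque()
    for frame in frames:                     # S1110: video signal from the camera
        rx_buffer.append(frame)              # S1120: store in the reception buffer
        while rx_buffer:
            current = rx_buffer.popleft()    # S1130: read from the reception buffer
            encoded = encode_frame(current, area)               # S1140
            tx_buffer.append(extract_effective(encoded, area))  # S1150-S1160
        while tx_buffer:
            send(tx_buffer.popleft())        # S1170: output to the communication line
```

For example, running `run_terminal` on one 2 x 2-macroblock frame `[["a", "b"], ["c", "d"]]` with the effective area `(1, 1, 1, 1)` sends the single list `[(3, "d")]`: only the bottom-right macroblock, at raster address 3, leaves the terminal.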

Description

The present invention relates to video signal processing. More specifically, the present invention relates to a stream data generation method, a video conference system, a stream data generation apparatus, and a stream data synthesis apparatus.

One use of audio/video signal communication over a communication line is the so-called video conference system. In such a system, a plurality of video signals are combined, and the combined signal is distributed to each terminal constituting the system. The user of each terminal, that is, each participant in the video conference, can watch the video displayed and listen to the audio output based on that signal. Techniques for combining or encoding video signals are disclosed, for example, in Patent Documents 1 to 3 and Non-Patent Document 1.

For example, with the technique disclosed in Non-Patent Document 1, a high-speed calculation algorithm computes the format length of the display position quickly. Because the processing can be implemented in software, it can run at the PC (Personal Computer) level. No special hardware is therefore required, and a video conference system can be realized at low cost.

Patent Document 1: Japanese Patent Laid-Open No. 7-298263
Patent Document 2: Japanese Patent Laid-Open No. 9-214970
Patent Document 3: Japanese Patent Laid-Open No. 10-234043
Non-Patent Document 1: Matsushita Electric Industrial Co., Ltd., Multimedia System Laboratory, "Developing Multi-screen MPEG Compositing Technology," December 9, 1998.

Reproducing moving images involves a large amount of data, so playback is constrained by the capability of the processor in the reproducing terminal. For example, the technique disclosed in Non-Patent Document 1 can combine only video signals based on a display position format extended to a variable length, so a video conference system built on it lacks versatility. In addition, because it requires a processor capable of high-speed processing, the room for reducing hardware cost may be limited.

The present invention has been made to solve the above problems. Its object is to provide a stream data generation method that can realize a video conference system even when the video signals to be processed are not based on a variable-length format.

Another object of the present invention is to provide a stream data generation method that can prevent delays in video display processing during operation of a video conference system.

Another object of the present invention is to provide a video conference system that operates even when the video signals to be processed are not based on a variable-length format.

Another object of the present invention is to provide a stream data generation apparatus capable of generating stream data that prevents delays in video display processing during operation of a video conference system.

Another object of the present invention is to provide a stream data generation apparatus that generates stream data for realizing a video conference system even when the video signals to be processed are not based on a variable-length format.

Another object of the present invention is to provide a stream data synthesis apparatus capable of generating stream data that prevents delays in video display processing during operation of a video conference system.

To solve the above problems, according to one aspect of the present invention, a stream data generation method includes a step of generating stream data at each of a plurality of terminals electrically connected to a communication line. Each terminal includes display means for displaying video in a display area based on a video signal. The step of generating stream data includes: receiving a video signal; preparing region information that identifies a predetermined partial region of the display area; a generation step of encoding the video signal based on the region information, thereby generating video data for displaying the video of that signal in the partial region; generating data for transmitting the video data; and outputting the data for transmission to the communication line. The method further includes a step, performed in a video signal synthesis apparatus electrically connected to the communication line, of synthesizing video signals based on the data output from the terminals.

The synthesizing step includes receiving, via the communication line, the stream data output by each of the terminals. Each stream data includes video data encoded according to a position predetermined for displaying video in part of the display area, and this position differs from one stream data to another. The synthesizing step further includes: a synthesis step of generating, from the plurality of stream data, display data for displaying the video of each video data in the display area; generating data for transmitting the generated display data; and outputting the generated data for transmission to the communication line. The method further includes, at each terminal: receiving, from the video signal synthesis apparatus, data generated based on the data for transmission; generating display data by decoding the received data; and causing the display means to display video based on the display data.

Preferably, the generation step includes an encoding step of generating the video data by encoding the video signal corresponding to the partial region.

Preferably, the encoding step performs encoding with the video signal corresponding to the partial region as the target of motion compensation.
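
The text does not say how motion compensation is confined to the partial region. One plausible reading, stated here as an assumption, is that motion vectors must never point outside the terminal's own region, because after compositing those pixels belong to another terminal's video. The Python sketch below shows a full-search block matcher under that constraint; `region_limited_search`, the SAD cost, and the tie-break toward the shortest vector are illustrative choices, not the patent's method.

```python
def sad(cur, ref, bx, by, rx, ry, n):
    """Sum of absolute differences between the n x n block at (bx, by) in the
    current frame and the n x n block at (rx, ry) in the reference frame."""
    return sum(abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))

def region_limited_search(cur, ref, bx, by, n, region, search=2):
    """Full-search block matching in which every candidate reference block must
    lie entirely inside region = (x0, y0, width, height), in pixels.
    Assumes the current block itself is inside the region. Returns (dx, dy)."""
    x0, y0, w, h = region
    best = None  # (cost, vector length, dx, dy)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = bx + dx, by + dy
            if rx < x0 or ry < y0 or rx + n > x0 + w or ry + n > y0 + h:
                continue  # would reference pixels that belong to another terminal
            key = (sad(cur, ref, bx, by, rx, ry, n), abs(dx) + abs(dy), dx, dy)
            if best is None or key[:2] < best[:2]:
                best = key
    return best[2], best[3]
```

With the whole frame allowed, the matcher may follow content into a neighbouring terminal's region; with the region limit it falls back to a vector that stays inside the terminal's own area.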

Preferably, the synthesis step includes extracting each video data from each of the plurality of stream data, and generating the display data by combining the extracted video data based on the position predetermined for each of them.

According to another aspect of the present invention, a video conference system includes a generation apparatus electrically connected to a communication line. The generation apparatus includes: display means for displaying video in a display area; storage means for storing region information that specifies a predetermined partial region of the display area; input means for receiving a video signal; generation means for encoding the video signal based on the region information, thereby generating video data for displaying the video of that signal in the partial region; transmission data generation means for generating data for transmitting the video data; and output means, electrically connected to the communication line, for outputting the data for transmission to the communication line. The video conference system further includes a video signal synthesis apparatus electrically connected to the communication line. The video signal synthesis apparatus includes input means for receiving a plurality of stream data via the communication line. Each stream data includes video data encoded according to a position predetermined for displaying video in part of the video display area, and this position differs from one stream data to another. The video signal synthesis apparatus further includes: synthesis means for generating, from the plurality of stream data, display data for displaying the video of each video data in the display area; transmission data generation means for generating data for transmitting the display data generated by the synthesis means; and output means for outputting the generated data for transmission to the communication line. The generation apparatus further includes: reception means for receiving, from the communication line, data generated based on the data for transmission; decoding means for generating display data by decoding the data received by the reception means; and control means for causing the display means to display video based on the display data.

According to another aspect of the present invention, a stream data generation apparatus includes: display means for displaying video in a display area; storage means for storing region information that specifies a predetermined partial region of the display area; input means for receiving a video signal; generation means for encoding the video signal based on the region information, thereby generating video data for displaying the video of that signal in the partial region; output means, electrically connected to a communication line, for outputting the video data to the communication line; reception means for receiving, from the communication line, data generated based on data for transmission; decoding means for generating display data by decoding the data received by the reception means; and control means for causing the display means to display video based on the display data.

Preferably, the generation means includes encoding means for generating the video data by encoding the video signal corresponding to the partial region.

Preferably, the encoding means performs encoding with the video signal corresponding to the partial region as the target of motion compensation.

Preferably, the video data contains only data for displaying video in the partial region.

Preferably, the stream data generation apparatus further includes changing means for changing the region information stored in the storage means based on an external input.

Preferably, the stream data generation apparatus further includes instruction input means for receiving region information from outside. The changing means replaces the region information stored in the storage means with the region information input via the instruction input means.

Preferably, the reception means receives data containing region information from the communication line. The changing means replaces the region information stored in the storage means with the region information contained in the received data.

According to another aspect of the present invention, a stream data synthesis apparatus includes input means for receiving a plurality of stream data via a communication line. Each stream data includes video data encoded according to a position predetermined for displaying video in part of the video display area, and this position differs from one stream data to another. The stream data synthesis apparatus further includes: synthesis means for generating, from the plurality of stream data, display data for displaying the video of each video data in the display area; transmission data generation means for generating data for transmitting the display data generated by the synthesis means; and output means for outputting the generated data for transmission to the communication line.

Preferably, the synthesis means includes extraction means for extracting each video data from each of the plurality of stream data, and generation means for generating the display data by combining the extracted video data based on the position predetermined for each of them.

Preferably, each stream data includes time data specifying the time at which that stream data was generated, and the synthesis means generates the display data based on the time data of each stream data.
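
How the time data is used is left open. A minimal sketch, assuming each terminal's frames arrive as `(timestamp, payload)` pairs sorted by time, is to pick, for each output instant, the newest frame each terminal generated at or before that instant, so that the composite combines frames of matching age. The function name `frames_for_time` and the data shapes are hypothetical.

```python
from bisect import bisect_right

def frames_for_time(streams, t):
    """streams maps a terminal id to a time-sorted list of (timestamp, payload).
    Returns, per terminal, the newest payload generated at or before time t,
    or None if that terminal has produced nothing yet."""
    chosen = {}
    for terminal, frames in streams.items():
        times = [ts for ts, _ in frames]
        i = bisect_right(times, t)  # number of frames with timestamp <= t
        chosen[terminal] = frames[i - 1][1] if i else None
    return chosen
```

For instance, with `{"a": [(0, "a0"), (33, "a1")], "b": [(10, "b0"), (50, "b1")]}` and `t = 40`, the sketch selects `"a1"` from terminal a and `"b0"` from terminal b.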

Preferably, the synthesis means generates the display data without decoding the individual video data.

Preferably, each stream data includes specifying data that identifies the region in which the video of its video data is displayed.

Preferably, the synthesis means generates the display data based on the specifying data.

According to the stream data generation method of the present invention, a video conference system can be realized even when the video signals to be processed are not based on a variable-length format.

According to the stream data generation method of the present invention, delays in video display processing during operation of a video conference system can be prevented.

The video conference system according to the present invention operates even when the video signals to be processed are not based on a variable-length format.

According to the stream data generation apparatus of the present invention, delays in video display processing during operation of a video conference system can be prevented.

According to the stream data generation apparatus of the present invention, a video conference system can be realized even when the video signals to be processed are not based on a variable-length format.

According to the stream data synthesis apparatus of the present invention, it is possible to generate stream data that prevents delays in video display processing during operation of a video conference system.

Embodiments of the present invention are described below with reference to the drawings. In the following description, identical components are given identical reference numerals, and their names and functions are also identical; their detailed description is therefore not repeated.

A video conference system according to an embodiment of the present invention is described with reference to FIG. 1, which is a block diagram showing the system configuration of video conference system 10.

Video conference system 10 includes terminals 100a, 100b, 100c, and 100d and a video signal synthesis apparatus 1200. The terminals are referred to collectively as terminal 100. Video signal synthesis apparatus 1200 is connected to each terminal via a communication line. Terminal 100 includes a camera 110, a microphone 120, a monitor television 130, and a stream data generation apparatus 400. Stream data generation apparatus 400 includes an MPEG (Moving Picture Experts Group) encoder 500 and an MPEG decoder 460. The signals output from camera 110 and microphone 120 are input to MPEG encoder 500. The signal output from MPEG decoder 460 is input to monitor television 130. The signal output from MPEG encoder 500 is transmitted to video signal synthesis apparatus 1200 via the communication line, and MPEG decoder 460 receives the signal transmitted by video signal synthesis apparatus 1200.

Video signal synthesis apparatus 1200 includes a multipoint connection controller 1220, which receives signals via the communication line, and a transmission unit 1240, which can transmit signals via the communication line. The signal output from multipoint connection controller 1220 is input to transmission unit 1240.

The structure of the signals exchanged between each terminal and video signal synthesis apparatus 1200 is described with reference to FIG. 2. FIG. 2(A) conceptually shows the structure of the stream output from terminal 100a located at point A. FIG. 2(B) conceptually shows the structure of the stream output from terminal 100b located at point B. FIG. 2(C) conceptually shows the structure of the stream output from terminal 100c located at point C. FIG. 2(D) shows the structure of the stream output from terminal 100d located at point D. FIG. 2(E) conceptually shows the structure of the stream transmitted from video signal synthesis apparatus 1200 to each terminal.

As shown in FIG. 2(A), the signal output from terminal 100a includes regions 210 to 214. Region 210 stores a header. Region 212 stores video/audio data A; video/audio data consists of data for displaying video and data for outputting audio. Region 214 stores data that pads the signal output from terminal 100a to a predetermined size. The header contains data items predetermined for realizing video conference system 10 of this embodiment. Video/audio data A contains data based on the video captured by camera 110 of terminal 100a and the audio data acquired by its microphone 120. As described later, video/audio data A is compression-encoded in advance so that the video is displayed in the region allocated to that terminal among the terminals constituting video conference system 10.

As shown in FIG. 2(B), the signal output from terminal 100b includes regions 220 to 226. Region 220 stores a header containing the same data items as the header stored in region 210. Region 222 has the same data size as region 212 but stores no data for displaying video. Region 224 stores video/audio data B. Region 226 stores no data for displaying video; instead, it stores data that pads the signal output from terminal 100b to a predetermined size, which is the same size as that predetermined for the signal output from terminal 100a.

As shown in FIG. 2(C), the signal output from terminal 100c includes regions 230 to 236. Region 230 stores a header containing the same data items as the header stored in region 210. Region 232 has a size equal to the sum of the data sizes of regions 222 and 224. Region 234 stores video/audio data C. Region 236 stores no data for displaying video; instead, it stores data that pads the signal output from terminal 100c to a predetermined size, which is the same size as that predetermined for the signal output from terminal 100a.
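
The padded layouts of FIGS. 2(A) to 2(C) follow one pattern: header, empty space equal to the combined size of the slots that precede this terminal's slot, the terminal's own data, then padding up to the common predetermined size. The Python sketch below builds such a byte layout under stated assumptions: `build_terminal_stream`, `slot_sizes`, and the zero-byte filler are illustrative, since the patent does not specify the filler content, and a real MPEG stream could not simply be zero-filled.

```python
def build_terminal_stream(header: bytes, slot: int, payload: bytes,
                          slot_sizes: list, total_size: int, fill: int = 0) -> bytes:
    """Conceptual FIG. 2 layout: header, empty space for the slots of the
    terminals that precede this one (like region 222 or 232), this terminal's
    payload, then padding (like region 214, 226, or 236) up to total_size."""
    if len(payload) > slot_sizes[slot]:
        raise ValueError("payload is larger than its slot")
    lead = bytes([fill]) * sum(slot_sizes[:slot])
    body = header + lead + payload
    if len(body) > total_size:
        raise ValueError("stream exceeds the predetermined size")
    return body + bytes([fill]) * (total_size - len(body))
```

With `slot_sizes = [2, 3, 1]` and `total_size = 8`, terminal A's two-byte payload sits right after the header, terminal B's starts two bytes later, and terminal C's starts five bytes later, yet every stream is exactly eight bytes long.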

As shown in FIG. 2(D), the signal output from terminal 100d includes regions 240 to 246. Region 240 stores a header containing the same data items as the header stored in region 210. Region 242 stores no valid data for displaying video. However, the macroblocks corresponding to region 242 still have coordinates in the frame (they are encoded as if video data were present, even though that data is not valid), so these positions correspond to valid video data from the other terminals. The frame is synthesized by replacing data between identical macroblock addresses of the frames. Macroblocks are described later. Region 246 stores video/audio data D.
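
Compositing by macroblock address can be sketched as follows. Each terminal's frame is modelled (hypothetically) as a dictionary from macroblock address to a `(valid, coded_bytes)` pair covering every address of the frame; the composite starts from one frame and replaces each address with the entry from the stream in which it is valid, without decoding anything. A real MPEG bitstream splice would also have to respect slice boundaries and rewrite macroblock address increments for skipped macroblocks, which this sketch ignores; the function name `composite` is an assumption.

```python
def composite(frames):
    """frames: list of dicts mapping macroblock address -> (valid, coded_bytes),
    one dict per terminal, all covering the same addresses.
    Starts from the first frame and replaces each address with the entry from
    whichever frame marks it valid; the coded bytes are never decoded."""
    out = dict(frames[0])
    claimed = {addr for addr, (valid, _) in out.items() if valid}
    for frame in frames[1:]:
        for addr, (valid, data) in frame.items():
            if not valid:
                continue
            if addr in claimed:
                raise ValueError(f"macroblock {addr} is valid in two streams")
            out[addr] = (valid, data)
            claimed.add(addr)
    return out
```

Rejecting an address that is valid in two streams reflects the assumption that each terminal's region is disjoint, as in the four-way split of FIG. 3.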

As shown in FIG. 2(E), the signal output from video signal synthesis apparatus 1200 includes regions 250 to 258. Region 250 stores a header containing data items predetermined for realizing the video conference system. These items include data identifying the source of the signal (that is, video signal synthesis apparatus 1200) and data identifying its destinations (that is, terminals 100a, 100b, 100c, and 100d). When video signal synthesis apparatus 1200 outputs a signal with such a header to the communication line, the signal is duplicated according to the number of terminals designated as destinations, and each copy is sent to its terminal via the communication line.
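
The duplication step can be sketched as copying one composite packet into one unicast copy per destination listed in its header. The text does not say whether the apparatus or the network performs the copying; the `Packet` type and `fan_out` function below are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

@dataclass(frozen=True)
class Packet:
    source: str                    # e.g. the synthesis apparatus
    destinations: Tuple[str, ...]  # terminals named in the header
    payload: bytes                 # composite stream (regions 252 to 258)

def fan_out(packet: Packet) -> List[Packet]:
    """Duplicate one composite packet into one copy per destination terminal."""
    return [replace(packet, destinations=(dest,)) for dest in packet.destinations]
```

Every copy carries the same payload, so all four terminals decode an identical composite picture.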

Region 252 stores the video data transmitted from terminal 100a; that is, the video data stored in region 252 is identical to that stored in region 212. Region 254 stores the video data transmitted from terminal 100b, identical to the video data stored in region 224 shown in FIG. 2(B). Region 256 stores the video data transmitted from terminal 100c, identical to the video data stored in region 234 shown in FIG. 2(C). Region 258 stores the video data transmitted from terminal 100d, identical to the video data stored in region 246 shown in FIG. 2(D).

Although the signals shown in FIGS. 2(A) to 2(D) are depicted as having a predetermined data size, the structure of the signal output from each terminal is not limited to what those figures show. For example, the signal output from terminal 100a may consist of regions 210 and 212 only.

Next, the video displayed based on each of the signals shown in FIGS. 2(A) to 2(E) is described with reference to FIG. 3. FIGS. 3(A) to 3(D) show examples of the screens that terminals 100a, 100b, 100c, and 100d, respectively, display on monitor television 130, and FIG. 3(E) shows the screen each terminal displays based on the composite signal.

Referring to FIG. 3(A), when terminal 100a displays video based on the signal shown in FIG. 2(A), video A is displayed in region 310. Region 310 is set in advance when the video conference system is set up, according to the number of terminals constituting video conference system 10. For example, when four terminals participate in the video conference system, the display area of monitor television 130 is divided into four. The region in which each video is actually displayed can then be identified by associating each region with a position assigned in advance to each terminal that may participate in the system. As described later, this region can also be changed based on an external input.
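
The region assignment can be sketched in macroblock units. The Python code below assumes 16 x 16 macroblocks numbered in raster order, a frame size that is a multiple of 16, and a near-square grid (two columns by two rows for four terminals, matching the four-way split in the text). The general grid rule for other terminal counts, the terminal index standing for the pre-assigned position, and the names `region_for_terminal` and `macroblock_addresses` are assumptions.

```python
import math

MB = 16  # macroblock size in luma pixels

def region_for_terminal(index, n_terminals, width, height):
    """Return (col0, row0, cols, rows), in macroblocks, of the display region
    assigned to terminal `index` when the screen is split into a near-square grid."""
    grid_cols = math.ceil(math.sqrt(n_terminals))
    grid_rows = math.ceil(n_terminals / grid_cols)
    cell_w = (width // MB) // grid_cols
    cell_h = (height // MB) // grid_rows
    gx, gy = index % grid_cols, index // grid_cols
    return gx * cell_w, gy * cell_h, cell_w, cell_h

def macroblock_addresses(region, width):
    """Raster-order macroblock addresses covered by a region."""
    col0, row0, cols, rows = region
    mb_w = width // MB
    return [(row0 + r) * mb_w + col0 + c for r in range(rows) for c in range(cols)]
```

For a 352 x 288 frame (22 x 18 macroblocks) and four terminals, terminal index 3 gets `(11, 9, 11, 9)`, and the four regions together cover all 396 macroblock addresses exactly once.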

Referring to FIG. 3(B), when terminal 100b displays video based on the signal shown in FIG. 2(B), video B is displayed in region 320 of monitor television 130 of terminal 100b.

Referring to FIG. 3C, when terminal 100c displays video on the basis of the signal shown in FIG. 2C, video C is displayed in area 330 of the monitor television 130 of terminal 100c.

Referring to FIG. 3D, when terminal 100d displays video on the basis of the signal shown in FIG. 2D, video D is displayed in area 340 of the monitor television 130 of terminal 100d.

Referring to FIG. 3E, when each terminal 100 displays video on the basis of the signal shown in FIG. 2E, video A is displayed in area 312 of that terminal's monitor television 130. This display position is the same as the one shown in FIG. 3A (that is, area 310). Video B is displayed in area 322, at the same position as video B in FIG. 3B (area 320). Video C is displayed in area 332, at the same position as in FIG. 3C (area 330). Video D is displayed in area 342, at the same position as in FIG. 3D (area 340).

In this way, each terminal transmits to video signal synthesis device 1200 a signal for displaying video as shown in one of FIGS. 3A to 3D, and in return receives a signal that produces the display shown in FIG. 3E. The configuration that realizes this communication is described below.

The configuration of stream data generation apparatus 400 according to the present embodiment will be described with reference to FIG. 4, a block diagram of its hardware configuration. Stream data generation apparatus 400 includes a video interface 410, an audio interface 420, a memory 430, an MPEG encoder 500, a transmission unit 440, a reception unit 450, an MPEG decoder 460, a video interface 470, and an audio interface 480.

Video interface 410 is connected to camera 110. It receives the video signal output from camera 110 and sends it to MPEG encoder 500.

Audio interface 420 is connected to microphone 120. It receives the audio signal output from microphone 120 and sends it to MPEG encoder 500 as a digital signal.

The memory 430 of each terminal stores the data used to realize the display shown in the corresponding one of FIGS. 3A to 3D. This data structure is described later (FIG. 6).

On the basis of the signal from video interface 410, the signal from audio interface 420, and the data stored in memory 430, MPEG encoder 500 generates video data for displaying video at a predetermined position on monitor television 130. MPEG encoder 500 sends the generated video data to transmission unit 440.

Transmission unit 440 generates, from the video data supplied by MPEG encoder 500, the data to be transmitted to video signal synthesis device 1200. Specifically, it adds to the video data a header containing predetermined data items, for example the network address of video signal synthesis device 1200, which is the destination of the transmission data. Once the transmission data has been generated, transmission unit 440 sends it onto the communication line.
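A header of the kind added by transmission unit 440 can be illustrated with Python's `struct` and `ipaddress` modules. The layout below is entirely hypothetical: the text only says the header carries predetermined items such as the destination network address, and elsewhere mentions a time stamp and area information. Here the effective area is encoded as first/last slice and macroblock numbers, and `add_header` / `parse_header` are illustrative names.

```python
import ipaddress
import struct
from typing import Tuple

# Hypothetical header layout, network byte order, no padding:
#   4 bytes  destination IPv4 address (video signal synthesis device 1200)
#   4 bytes  time stamp in milliseconds
#   2+2 bytes first/last slice number of the effective area
#   2+2 bytes first/last macroblock number of the effective area
HEADER = struct.Struct("!4sIHHHH")


def add_header(payload: bytes, dest: str, timestamp_ms: int,
               slices: Tuple[int, int], mbs: Tuple[int, int]) -> bytes:
    """Prefix the coded video data with the hypothetical header."""
    addr = ipaddress.IPv4Address(dest).packed
    return HEADER.pack(addr, timestamp_ms, *slices, *mbs) + payload


def parse_header(packet: bytes):
    """Split a packet back into (address, time stamp, slices, mbs, payload)."""
    addr, ts, s0, s1, m0, m1 = HEADER.unpack_from(packet)
    return str(ipaddress.IPv4Address(addr)), ts, (s0, s1), (m0, m1), packet[HEADER.size:]


# 192.0.2.10 is a documentation address standing in for device 1200.
pkt = add_header(b"coded-mbs", "192.0.2.10", 1234, (1, 240), (0, 21))
print(parse_header(pkt))
```

A fixed-size binary header keeps parsing trivial for the receiver; a real system would more likely rely on an existing transport such as RTP rather than a custom layout.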

Reception unit 450 receives video data for the video conference from video signal synthesis device 1200 via the communication line. In addition to the video data generated by MPEG encoder 500, this video data includes video data generated by the other terminals. Reception unit 450 sends the data to MPEG decoder 460.

MPEG decoder 460 decodes the data received by reception unit 450. The decoded output includes a signal for displaying video (hereinafter, the video signal) and a signal for outputting sound (hereinafter, the audio signal). The video signal is sent to video interface 470, and the audio signal to audio interface 480.

Video interface 470 outputs the video signal to monitor television 130, which then displays the video composed for the video conference (FIG. 3E).

Audio interface 480 outputs the audio signal to a speaker (not shown), which reproduces the corresponding sound in step with the video output.

MPEG encoder 500 will be described with reference to FIG. 5, a block diagram of its hardware configuration.

MPEG encoder 500 includes a reception buffer 510 that receives the video signal and the audio signal, an encoding circuit 520 that applies predetermined compression coding to each input signal, an extraction circuit 530 that extracts the data to be transmitted from the compressed data, and a transmission buffer 540 that temporarily stores the extracted data.

The data structure of the terminal according to the present embodiment will be described with reference to FIG. 6, which shows one example of how data is stored in memory 430.

Memory 430 includes areas 610 to 630. Area 610 stores data identifying each display area defined on monitor television 130, area 620 stores the data (coordinates) describing each area, and area 630 stores data indicating whether video may be displayed in each area. For example, the first display area is defined by coordinates (10, 10) to (40, 50) and is set as usable at the first terminal, 100a. The second display area is defined by coordinates (10, 50) to (40, 90) and is set as unusable for video display. The coordinates of the third and fourth display areas, and whether video may be displayed in them, are set in the same way.
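The three columns of FIG. 6 map naturally onto a small record type. The sketch below is only an illustration under stated assumptions: `DisplayAreaEntry` and `find_usable_area` are hypothetical names, and only the first two rows are filled in because the text gives coordinates for those two areas alone.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DisplayAreaEntry:
    area_id: int                  # area 610: which display area this row describes
    top_left: Tuple[int, int]     # area 620: coordinates of the area
    bottom_right: Tuple[int, int]
    usable: bool                  # area 630: may this terminal display its video here?


def find_usable_area(table: List[DisplayAreaEntry]) -> Optional[DisplayAreaEntry]:
    """Return the first entry marked usable, or None if no area is assigned."""
    return next((entry for entry in table if entry.usable), None)


# Rows for terminal 100a, using the coordinates given in the text.
# The third and fourth areas are omitted because their coordinates are not stated.
table_100a = [
    DisplayAreaEntry(1, (10, 10), (40, 50), usable=True),
    DisplayAreaEntry(2, (10, 50), (40, 90), usable=False),
]

print(find_usable_area(table_100a).area_id)  # 1
```

For terminal 100b the same rows would be stored with the `usable` flags swapped, which is the only difference between the terminals' tables described in the text.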

The contents of memory 430 shown in FIG. 6 are those of terminal 100a. In terminal 100b, the second display area is set to "usable" instead of the first; in terminal 100c, the third display area is set to "usable". The way display areas on monitor television 130 are specified is not limited to that shown in FIG. 6. The data in FIG. 6 may be stored in advance when the video conference system starts, or it may be changeable through a keyboard (not shown) or another input device. Alternatively, it may be set according to the number of terminals connected to the system. In that case, the number of connected terminals is detected from the login operation of a newly connecting terminal, and the data identifying each area (for example, its coordinates) is calculated so that the display area is subdivided according to that number.

The coordinates shown in FIG. 6 will now be explained with reference to FIG. 7, which conceptually shows the coordinate system of monitor television 130.

As shown in FIG. 7, coordinates (10, 10) to (80, 90) are set in advance on monitor television 130 as the region in which video can be displayed. The area definitions in FIG. 6 use this same coordinate system. Accordingly, when each terminal performs video display processing on the basis of the coordinates in FIG. 6, monitor television 130 displays the video in the area corresponding to those coordinates.

Next, the relationship between the video data for the video conference system and the display area will be described with reference to FIG. 8, which shows, for terminal 100a, the region of the video data that corresponds to each part of the screen when video is displayed on monitor television 130.

For terminal 100a, the video display region of monitor television 130 includes area 810 and area 820. Area 810 displays video based on the video data generated by the camera 110 of terminal 100a; it is identified by data such as that shown in FIG. 6. Area 820 is reserved exclusively so that video from that camera 110 is not displayed there. Area 820 is instead used to display video based on video data transmitted from the other terminals (for example, terminals 100b to 100d).

Next, the structure of the MPEG stream transmitted from each terminal 100 to video signal synthesis device 1200 will be described with reference to FIG. 9. FIG. 9A shows the structure of the picture layer of the MPEG stream. FIG. 9B conceptually shows the structure of the slice layer contained in the picture layer. FIG. 9C conceptually shows the structure of the macroblock (MB) layer contained in the slice layer. FIG. 9D shows the relationship between the reference video area and the reference area based on the MPEG stream data.

As shown in FIG. 9A, a picture includes a picture header 910, a picture type 920, a motion vector search range 930, and a slice layer 940. As shown in FIG. 9B, slice layer 940 includes a slice header 950, a slice position 960, and an MB layer 970. As shown in FIG. 9C, MB layer 970 includes an MB address 972, an MB type 974, a motion vector 976, and MB code data 978. MB code data 978 is the video data compressed and encoded by MPEG encoder 500.

Referring to FIG. 9D, reference video area 980 includes slice 962, which is identified by slice position 960 (corresponding, for example, to a vertical line number). Slice 962 includes macroblock (MB) 964, which is identified by MB address 972. Reference area 990 concerns the references used for motion compensation and includes a valid range, in which references are permitted, and an invalid range, in which they are not. For example, in the portion where reference video area 980 and reference area 990 overlap, motion vector detection for macroblock 964 is permitted. In the invalid range of reference area 990 (the hatched portion), on the other hand, motion vector detection for macroblock 964 is prohibited.
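The restriction on motion vector detection can be pictured as a filter on candidate vectors: a vector is acceptable only if the 16 x 16 block it points to lies entirely inside the valid range. This sketch assumes full-pel vectors (real MPEG-2 encoders also use half-pel positions, which are ignored here), a rectangular valid range such as the terminal's own effective area, and hypothetical function names.

```python
from typing import Iterator, Tuple

MB_SIZE = 16  # macroblock width and height in pixels

Rect = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive


def reference_is_valid(mb_x: int, mb_y: int, mv: Tuple[int, int], valid: Rect) -> bool:
    """True if the block referenced by macroblock (mb_x, mb_y) with full-pel
    motion vector mv = (dx, dy) lies entirely inside the valid range."""
    dx, dy = mv
    left = mb_x * MB_SIZE + dx
    top = mb_y * MB_SIZE + dy
    x0, y0, x1, y1 = valid
    return x0 <= left and y0 <= top and left + MB_SIZE <= x1 and top + MB_SIZE <= y1


def candidate_vectors(mb_x: int, mb_y: int, search: int, valid: Rect) -> Iterator[Tuple[int, int]]:
    """Yield only the vectors within +/-search whose reference avoids the invalid range."""
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if reference_is_valid(mb_x, mb_y, (dx, dy), valid):
                yield (dx, dy)


# Hypothetical effective area: the upper-left 352 x 240 quadrant.
area = (0, 0, 352, 240)
print(list(candidate_vectors(0, 0, 1, area)))  # only non-negative offsets survive
```

Keeping every reference inside the effective area is what allows the synthesis device to splice coded macroblocks from different terminals without any one terminal's data depending on pixels it never transmitted.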

The stream data transmitted from each terminal to video signal synthesis device 1200 according to the present embodiment will be described with reference to FIG. 10. FIGS. 10A, 10B, 10C, and 10D conceptually show the streams transmitted to video signal synthesis device 1200 by terminals 100a, 100b, 100c, and 100d, respectively.

As shown in FIG. 10A, the stream transmitted by terminal 100a includes a picture header PH, a slice header SH, and slice layers. The slice layers are divided, for example, into 480 layers, "SL1" to "SL480". Each slice layer includes macroblocks MB0 to MB21 and macroblocks MB22 to MB43. Macroblocks MB0 to MB21 in slice layers SL1 to SL240 correspond to display data 1010, which realizes the display in area 310 shown in FIG. 3A.

As shown in FIG. 10B, the stream transmitted by terminal 100b likewise includes a picture header PH, a slice header SH, and slice layers. The slice layers are divided, for example, into 480 layers, "SL1" to "SL480", each containing macroblocks MB0 to MB21 and MB22 to MB43. Macroblocks MB22 to MB43 in slice layers SL1 to SL240 correspond to display data 1020, which realizes the display in area 320 shown in FIG. 3B.

Referring to FIG. 10C, the stream transmitted by terminal 100c likewise includes a picture header PH, a slice header SH, and display data 1030; its data structure is the same as that shown in FIG. 10A. Display data 1030 realizes the display in area 330 shown in FIG. 3C. Referring to FIG. 10D, the stream transmitted by terminal 100d likewise includes a picture header PH, a slice header SH, and display data 1040, again with the same data structure as in FIG. 10A. Display data 1040 realizes the display in area 340 shown in FIG. 3D. The form in which these streams are combined is described later (FIG. 14).
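The split of the picture into four sets of display data can be written down as a lookup table from terminal to (slice range, macroblock range). The ranges for terminals 100a and 100b follow the text; those for 100c and 100d are an assumption (the lower half of the picture), since the text gives only the display data numbers for them. The names are hypothetical.

```python
from typing import Dict, Tuple

NUM_SLICES = 480        # SL1 to SL480, as in FIG. 10
MBS_PER_SLICE = 44      # MB0 to MB43

HALF_S = NUM_SLICES // 2     # 240
HALF_M = MBS_PER_SLICE // 2  # 22

# (slice numbers, macroblock numbers) that make up each terminal's display data.
EFFECTIVE_AREA: Dict[str, Tuple[range, range]] = {
    "100a": (range(1, HALF_S + 1), range(0, HALF_M)),              # display data 1010
    "100b": (range(1, HALF_S + 1), range(HALF_M, MBS_PER_SLICE)),  # display data 1020
    # Assumed: the text does not state the slice ranges for 100c and 100d.
    "100c": (range(HALF_S + 1, NUM_SLICES + 1), range(0, HALF_M)),              # 1030
    "100d": (range(HALF_S + 1, NUM_SLICES + 1), range(HALF_M, MBS_PER_SLICE)),  # 1040
}


def owner_of(slice_no: int, mb_no: int) -> str:
    """Return the terminal whose display data covers a given (slice, MB) position."""
    for terminal, (slices, mbs) in EFFECTIVE_AREA.items():
        if slice_no in slices and mb_no in mbs:
            return terminal
    raise ValueError(f"position SL{slice_no}/MB{mb_no} is outside the picture")


print(owner_of(1, 0), owner_of(1, 22))  # 100a 100b
```

Because the four rectangles tile the picture exactly once, every coded macroblock position has a single owner, which is the property the synthesis device relies on when it joins the streams.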

The control structure of terminal 100 according to the present embodiment will be described with reference to FIG. 11, a flowchart of the processing executed by the MPEG encoder 500 of terminal 100.

In step S1110, MPEG encoder 500 receives the video signal from camera 110. In step S1120, it stores the video signal in reception buffer 510. In step S1130, it reads the video signal from reception buffer 510. In step S1140, encoding circuit 520 encodes the video signal, treating every area other than the one allocated to terminal 100 itself (the effective area) as an invalid area. While performing this encoding, encoding circuit 520 also attaches the time at which the processing was performed as a time stamp.

In step S1150, extraction circuit 530 extracts the data corresponding to the effective area from the compressed data produced by encoding circuit 520. In step S1160, MPEG encoder 500 stores the data extracted by extraction circuit 530 in transmission buffer 540. In step S1170, MPEG encoder 500 reads the data from transmission buffer 540 and sends it to transmission unit 440, which generates transmission data from it and outputs that data to the communication line. The stream data output in this way is sent to video signal synthesis device 1200.
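Steps S1110 to S1170 can be traced with a toy pipeline. This is a minimal sketch, not an MPEG encoder: a frame is modeled as a dict from (slice, macroblock) to raw bytes, "coding" just prefixes a marker, and macroblocks outside the effective area receive a placeholder code. `encode`, `extract`, `run_encoder`, and the injected `clock` are hypothetical names chosen for illustration.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterable, List, Tuple

Key = Tuple[int, int]        # (slice number, macroblock number)
Area = Tuple[range, range]   # (slice numbers, macroblock numbers)


def in_area(key: Key, area: Area) -> bool:
    return key[0] in area[0] and key[1] in area[1]


def encode(frame: Dict[Key, bytes], area: Area, timestamp: int) -> Dict:
    """S1140 stand-in: code every MB, give MBs outside the effective area
    a trivial placeholder code, and attach the time stamp."""
    mbs = {k: (b"C" + v if in_area(k, area) else b"-") for k, v in frame.items()}
    return {"timestamp": timestamp, "mbs": mbs}


def extract(picture: Dict, area: Area) -> Dict:
    """S1150: keep only the coded MBs of the effective area."""
    kept = {k: v for k, v in picture["mbs"].items() if in_area(k, area)}
    return {"timestamp": picture["timestamp"], "mbs": kept}


def run_encoder(frames: Iterable[Dict[Key, bytes]], area: Area,
                clock: Callable[[], int]) -> List[Dict]:
    rx_buffer: Deque[Dict[Key, bytes]] = deque()  # reception buffer 510
    tx_buffer: Deque[Dict] = deque()              # transmission buffer 540
    sent = []
    for frame in frames:                          # S1110: frame from camera 110
        rx_buffer.append(frame)                   # S1120: store
        current = rx_buffer.popleft()             # S1130: read back
        picture = encode(current, area, clock())  # S1140: encode with time stamp
        tx_buffer.append(extract(picture, area))  # S1150 and S1160
        sent.append(tx_buffer.popleft())          # S1170: hand to transmission unit 440
    return sent


frame = {(1, 0): b"x", (1, 1): b"y", (2, 0): b"z"}
print(run_encoder([frame], (range(1, 2), range(0, 1)), lambda: 0))
```

In the real device the buffers decouple the camera, encoder, and network rates; here they are drained immediately so the step order stays easy to follow.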

Video signal synthesis device 1200, which forms part of video conference system 10 according to the present embodiment, will be described with reference to FIG. 12, a block diagram of its hardware configuration.

Video signal synthesis device 1200 includes a reception unit 1210, a multipoint connection controller 1220, a transmission unit 1240, and an audio packet synthesis unit 1230. Multipoint connection controller 1220 includes effective stream extraction units 1222 and a stream synthesis unit 1224. Reception unit 1210 is electrically connected to the communication line and receives stream data from each terminal over it. Reception unit 1210 extracts the video signal and the audio signal from the stream data, sending the video signal to multipoint connection controller 1220 and the audio signal to audio packet synthesis unit 1230.

Within multipoint connection controller 1220, effective stream extraction unit 1222 extracts the effective stream from the input video signal. Here, the effective stream means the video data needed to display video on the monitor television 130 of each terminal. Multipoint connection controller 1220 contains as many effective stream extraction units 1222 as there are terminals that can participate in video conference system 10. The video data extracted by each effective stream extraction unit 1222 is input to stream synthesis unit 1224, which combines them. The combination is performed so that the videos do not overlap, for example on the basis of area information contained in the header of the stream data from each terminal. Stream synthesis unit 1224 sends the video data produced by this combination to transmission unit 1240.
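The combination performed by stream synthesis unit 1224 amounts to merging already-coded macroblocks by position and rejecting overlaps, with no decoding. The sketch below treats each coded macroblock as an opaque byte string; a real MPEG bitstream splicer would also have to handle headers and prediction state at the join points, which is not modeled here. `compose` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Key = Tuple[int, int]  # (slice number, macroblock number)


def compose(parts: Dict[str, Dict[Key, bytes]]) -> List[Tuple[Key, bytes]]:
    """Merge coded MBs from several terminals into one picture without
    decoding them. Raises ValueError if two terminals claim one position."""
    merged: Dict[Key, Tuple[str, bytes]] = {}
    for terminal, mbs in parts.items():
        for key, coded in mbs.items():
            if key in merged:
                raise ValueError(f"{terminal} overlaps {merged[key][0]} at {key}")
            merged[key] = (terminal, coded)
    # Emit in bitstream order: slice by slice, then macroblock by macroblock.
    return [(key, merged[key][1]) for key in sorted(merged)]


print(compose({"100a": {(1, 0): b"A"}, "100b": {(1, 1): b"B"}}))
```

The overlap check stands in for the "so that the videos do not overlap" condition; in the device that guarantee comes from the area information in each terminal's header.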

The audio signal obtained by reception unit 1210 is sent to audio packet synthesis unit 1230, which combines the audio signals from the terminals and sends the combined signal to transmission unit 1240.

Transmission unit 1240 combines the video signal from stream synthesis unit 1224 with the audio signal from audio packet synthesis unit 1230, adds to the result a header that forms a transmission packet, and sends the packet onto the communication line. The signal output from video signal synthesis device 1200 is thus delivered to each of terminals 100a to 100d.

The control structure of video signal synthesis device 1200 according to the present embodiment will be described with reference to FIG. 13, a flowchart of the processing it executes.

In step S1310, reception unit 1210 of the video signal synthesis device receives the compressed data from each terminal via the communication line. In step S1320, video signal synthesis device 1200 stores each item of data in a reception buffer (not shown). In step S1330, it reads the data from the reception buffer.

In step S1340, to synchronize the data from the terminals, video signal synthesis device 1200 refers to the time stamp contained in each item of data read and combines the data that falls within the same time window. This processing is performed in stream synthesis unit 1224. At the same time, the audio signals are combined in audio packet synthesis unit 1230. Each combined signal is sent to transmission unit 1240.
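Grouping data "in the same time window" can be sketched as bucketing packets by their time stamp. The sketch assumes integer millisecond time stamps from clocks that are already aligned across terminals, and a hypothetical fixed window of 33 ms (roughly one frame at 30 frames per second); `group_by_window` is an illustrative name.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Packet = Tuple[str, int, bytes]  # (terminal, time stamp in ms, coded data)


def group_by_window(packets: Iterable[Packet], window_ms: int) -> Dict[int, Dict[str, bytes]]:
    """Bucket packets into fixed time windows; each bucket is one set to compose.
    If a terminal sends twice within one window, the later packet wins."""
    groups: Dict[int, Dict[str, bytes]] = defaultdict(dict)
    for terminal, ts_ms, data in packets:
        groups[ts_ms // window_ms][terminal] = data
    return dict(groups)


packets = [("100a", 0, b"A0"), ("100b", 10, b"B0"), ("100a", 34, b"A1")]
print(group_by_window(packets, 33))
```

Each resulting bucket would then be passed to the stream composition step; what to do with a window in which some terminal sent nothing is a policy choice the text leaves open.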

In step S1350, transmission unit 1240 adds a transmission address to the data produced by each combination, generating stream data. The transmission address here is the network address of each terminal participating in the video conference. Because this address identifies each terminal, the video of the conference participants is delivered to every terminal. In step S1360, transmission unit 1240 transmits the stream data according to the transmission address.

The stream data transmitted from video signal synthesis device 1200 will now be described with reference to FIG. 14, which schematically shows its structure.

The stream data includes a picture header PH, a slice header SH, and display data 1010, 1020, 1030, and 1040. As shown in FIG. 10A, display data 1010 contains macroblocks based on the video data generated by terminal 100a. Display data 1020 likewise contains the macroblocks shown in FIG. 10B. Display data 1030 contains the macroblocks for the video data generated by terminal 100c, and display data 1040 those for the video data generated by terminal 100d.

The macroblocks in each slice layer of the stream data are never decoded. That is, the stream data is generated by joining, unchanged, the video data contained in the data received by video signal synthesis device 1200. Because the data for each terminal is generated directly in response to receiving video data from the terminals, the delay that decoding would otherwise introduce is avoided.

When each terminal receives the stream data shown in FIG. 14, its monitor television 130 displays, in the respective areas, the video captured by the cameras 110 of the other terminals (for example, the users participating in the video conference).

As described above, with the terminal according to the present embodiment, the data for displaying the video-conference video is compression-encoded so that it corresponds to the area predetermined for displaying the video captured by that terminal. Each terminal transmits the data generated by this compression encoding to video signal synthesis device 1200, which combines the video data from all terminals and thereby acts as a so-called multipoint connection controller. Video signal synthesis device 1200 generates the composite data for the video conference by extracting and combining the video data from each terminal. In other words, no decoding or re-compression of the video data is performed when the composite data is generated. This reduces the processing delay in generating the composite data at video signal synthesis device 1200, so that the users of the terminals, that is, the conference participants, can view the video and hear the audio smoothly.

<Modification>
A modification of the embodiment of the present invention is described below. In the embodiment above, the stream data transmitted from video signal synthesis device 1200 to every terminal contains the same video data. However, the data transmitted from video signal synthesis device 1200 is not limited to that configuration. For example, the stream data sent to each terminal may omit the video data generated by that terminal. Because each terminal already holds its own video data, generated by its MPEG encoder 500, it does not need to receive the same data back from video signal synthesis device 1200. In this case, video signal synthesis device 1200 can skip the processing of sending video data back to the terminal that transmitted it.
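The difference between the embodiment and this modification is essentially one filter applied when each recipient's stream is built. In the sketch below, `parts` maps each terminal to its already-extracted video data and `addresses` maps each terminal to its network address; both names, the flag, and the 192.0.2.x documentation addresses are hypothetical.

```python
from typing import Dict


def streams_per_recipient(parts: Dict[str, bytes], addresses: Dict[str, str],
                          exclude_own: bool = True) -> Dict[str, Dict[str, bytes]]:
    """Build the payload sent to each recipient address.

    exclude_own=True follows this modification: a terminal's own video data is
    left out of the stream sent back to it. exclude_own=False gives every
    terminal the same set, as in the base embodiment."""
    out: Dict[str, Dict[str, bytes]] = {}
    for terminal, addr in addresses.items():
        out[addr] = {src: data for src, data in parts.items()
                     if not (exclude_own and src == terminal)}
    return out


parts = {"100a": b"A", "100b": b"B", "100c": b"C"}
addresses = {"100a": "192.0.2.1", "100b": "192.0.2.2", "100c": "192.0.2.3"}
print(streams_per_recipient(parts, addresses)["192.0.2.1"])  # {'100b': b'B', '100c': b'C'}
```

With N participants, each downstream stream then carries N - 1 terminals' data instead of N, which is the traffic reduction described below for FIG. 15.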

Each terminal and the video signal synthesis device that realize the video conference system according to this modification have the same hardware configuration and the same functions as terminal 100 and video signal synthesis device 1200 described above. Their detailed description is therefore not repeated here.

The data transmitted in the video conference system according to this modification will be described with reference to FIG. 15. FIGS. 15A, 15B, 15C, and 15D show the structure of the data that video signal synthesis device 1200 according to this modification transmits to terminals 100a, 100b, 100c, and 100d, respectively.

Referring to FIG. 15A, stream data 1510 includes areas 1511 to 1514. Area 1511 stores a header. Areas 1512, 1513, and 1514 store the video data generated by terminals 100b, 100c, and 100d, respectively. The header stored in area 1511 has, for example, the same items as the header stored in area 250 (FIG. 2).

When video signal synthesis device 1200 sends this data to terminal 100a, the video data generated by terminal 100a itself is not sent. This reduces the amount of communication between terminal 100a and video signal synthesis device 1200. When terminal 100a receives stream data 1510, it uses the time stamp in the header in area 1511 to combine video data A, which it holds itself, with video data B to D stored in stream data 1510. MPEG decoder 460 then decodes the combined data. Monitor television 130 displays the video for each set of video data in its own area, based on the decoded data.
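The terminal-side step can be sketched in Python. In this step, terminal 100a pairs its own video data A with the received video data B to D using the header time stamp. Several details are assumptions: the rule of choosing the locally held frame whose time stamp is closest to the header time stamp, the dictionary inputs, the display order, and the function name `merge_for_display`. In the actual terminal, the merged data goes to MPEG decoder 460, which this sketch does not model.

```python
def merge_for_display(own_id, own_frames, stream_timestamp, received, order):
    """Combine locally held video data with video data received from the synthesizer.

    own_id:           ID of this terminal, e.g. "100a"
    own_frames:       {time stamp: encoded video data} held by this terminal
    stream_timestamp: time stamp read from the received stream's header
    received:         {terminal ID: encoded video data} from the received stream
    order:            terminal IDs in display-area order
    Returns a list of (terminal ID, encoded video data) ready for decoding.
    """
    if not own_frames:
        raise ValueError("no locally held video data")
    # Assumed rule: use the own frame whose time stamp is closest to the header's.
    best_ts = min(own_frames, key=lambda ts: abs(ts - stream_timestamp))
    parts = dict(received)
    parts[own_id] = own_frames[best_ts]
    missing = [tid for tid in order if tid not in parts]
    if missing:
        raise ValueError(f"missing video data for {missing}")
    return [(tid, parts[tid]) for tid in order]
```

Take locally held frames `{990: b"A0", 1000: b"A1", 1010: b"A2"}`, a header time stamp of 1003, and received data for `"100b"`, `"100c"`, and `"100d"`. The function returns `[("100a", b"A1"), ("100b", b"B"), ("100c", b"C"), ("100d", b"D")]`. In this case, the frame stamped 1000 is the closest match.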

As shown in FIG. 15(B), stream data 1520, which video signal synthesis device 1200 sends to terminal 100b, includes areas 1521 to 1524. Area 1521 stores the header. Area 1522 stores video data A generated by terminal 100a, area 1523 stores video data C generated by terminal 100c, and area 1524 stores video data D generated by terminal 100d. In other words, video signal synthesis device 1200 does not send video data B, which terminal 100b generated, back to terminal 100b. As a result, the amount of communication between terminal 100b and video signal synthesis device 1200 is reduced by the size of video data B.

Referring to FIG. 15(C), stream data 1530, which video signal synthesis device 1200 sends to terminal 100c, includes areas 1531 to 1534. Area 1531 stores the header. Area 1532 stores video data A generated by terminal 100a, area 1533 stores video data B generated by terminal 100b, and area 1534 stores video data D generated by terminal 100d. In other words, video signal synthesis device 1200 does not send video data C, which terminal 100c generated, back to terminal 100c. As a result, the amount of communication is reduced by the size of video data C.

Likewise, referring to FIG. 15(D), stream data 1540, which video signal synthesis device 1200 sends to terminal 100d, includes areas 1541 to 1544. Area 1541 stores the header. Area 1542 stores video data A generated by terminal 100a, area 1543 stores video data B generated by terminal 100b, and area 1544 stores video data C generated by terminal 100c. In this case, the data that video signal synthesis device 1200 sends to terminal 100d does not include video data D, which terminal 100d generated.
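FIGS. 15(A) to 15(D) all follow one rule on the synthesizer side: each recipient receives every terminal's video data except its own. The Python sketch below shows that selection. It also computes the per-recipient saving, which equals the size of the recipient's own data, as the text states for video data B and C. The function names and the use of byte counts as the measure of communication are assumptions, and header overhead is ignored.

```python
def per_recipient_streams(latest):
    """Build, for each terminal, the video data it should receive.

    latest: {terminal ID: encoded video data}, in display-area order.
    Each recipient receives every entry except its own; order is preserved,
    which reproduces the area order of FIGS. 15(A) to 15(D).
    """
    return {
        rcpt: {src: data for src, data in latest.items() if src != rcpt}
        for rcpt in latest
    }


def bytes_saved(latest):
    """Bytes saved per recipient versus sending every terminal's video data."""
    full = sum(len(d) for d in latest.values())
    streams = per_recipient_streams(latest)
    return {
        rcpt: full - sum(len(d) for d in payload.values())
        for rcpt, payload in streams.items()
    }
```

For example, with 100, 200, 150, and 50 bytes of video data from terminals 100a to 100d, terminal 100b receives the data of 100a, 100c, and 100d in that order. `bytes_saved` reports a saving of 200 bytes for 100b, which is exactly the size of video data B.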

When terminal 100d receives stream data 1540, it combines video data D, which it generated itself, with video data A to C contained in stream data 1540 to produce display data. Based on this data, monitor television 130 displays the video captured by each of terminals 100a to 100d in its own area.

In the video conference system of this modification, each terminal receives stream data from the video signal synthesis device that does not include the video the terminal generated itself. Each terminal then combines the video data from the other terminals, contained in the received stream data, with the video data it created. This reduces the amount of data carried on the communication lines of the video conference system, so video can be displayed promptly.
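A simple count of downstream data per frame period puts a number on the reduction described above. Assume N terminals, let s_k be the size of the video data produced by terminal k, and ignore headers. The baseline is a synthesis device that sends each terminal the video data of all N terminals. These are simplifying assumptions for illustration.

```latex
S = \sum_{k=1}^{N} s_k, \qquad
D_{\text{full}} = N S, \qquad
D_{\text{mod}} = \sum_{k=1}^{N} \left( S - s_k \right) = N S - S = (N-1)\, S,
\qquad
\frac{D_{\text{full}} - D_{\text{mod}}}{D_{\text{full}}} = \frac{S}{N S} = \frac{1}{N}.
```

Terminal k saves s_k, which matches the statements above for video data B and C. With the four terminals of FIG. 15 (N = 4), total downstream traffic falls by one quarter regardless of the individual sizes s_k. Upstream traffic from the terminals to the synthesis device is unchanged.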

The embodiments disclosed herein are illustrative in all respects and should not be considered restrictive. The scope of the present invention is defined by the claims rather than by the above description. It is intended to include all modifications within the meaning and scope equivalent to the claims.

FIG. 1 is a block diagram showing the system configuration of video conference system 10 according to an embodiment of the present invention.
FIG. 2 is a diagram conceptually showing the configuration of signals communicated between each terminal shown in FIG. 1 and video signal synthesis device 1200.
FIG. 3 is a diagram showing one form of the screen that terminals 100a, 100b, 100c, and 100d display on monitor television 130.
FIG. 4 is a block diagram showing the hardware configuration of stream data generating device 400 according to the embodiment.
FIG. 5 is a block diagram showing the hardware configuration of MPEG encoder 500.
FIG. 6 is a diagram showing one form of data storage in memory 430.
FIG. 7 is a diagram conceptually showing coordinates on monitor television 130.
FIG. 8 is a diagram showing the area corresponding to the video data to be displayed when terminal 100a displays video on monitor television 130.
FIG. 9 is a diagram showing the configuration of an MPEG stream transmitted from each terminal 100 to video signal synthesis device 1200.
FIG. 10 is a diagram conceptually showing the configuration of stream data transmitted from each terminal according to the embodiment to video signal synthesis device 1200.
FIG. 11 is a flowchart showing the processing performed by MPEG encoder 500 of terminal 100 according to the embodiment.
FIG. 12 is a block diagram showing the hardware configuration of video signal synthesis device 1200, which forms part of video conference system 10 according to the embodiment.
FIG. 13 is a flowchart showing the processing performed by video signal synthesis device 1200 according to the embodiment.
FIG. 14 is a diagram schematically showing the configuration of stream data transmitted from video signal synthesis device 1200.
FIG. 15 is a diagram showing the configuration of data transmitted in the video conference system according to the modification.

Explanation of symbols

10 video conference system; 100a, 100b, 100c, 100d terminal; 110 camera; 120 microphone; 130 TV monitor; 400 stream data generating device; 410, 470 video interface; 420, 480 audio interface; 430 memory; 440, 1240 transmitting unit; 450, 1210 receiving unit; 460 MPEG decoder; 500 MPEG encoder; 510 reception buffer; 520 encoding circuit; 530 extraction circuit; 540 transmission buffer; 1200 video signal synthesis device; 1220 multipoint connection controller; 1222 effective stream extraction unit; 1224 stream synthesis unit; 1230 audio packet synthesis unit.

Claims

1. A stream data generation method comprising a step in which each of a plurality of terminals electrically connected to a communication line generates stream data, each of the plurality of terminals including display means for displaying video in a display area based on a video signal, wherein the generating step comprises:
receiving an input of a video signal;
preparing area information specifying a predetermined partial area of the display area;
generating video data for displaying video based on the video signal in the partial area by encoding the video signal based on the area information;
generating data for transmission of the video data; and
outputting the data for transmission to the communication line;
the method further comprising a step in which a video signal synthesis device electrically connected to the communication line synthesizes video signals based on the data output from each of the terminals, wherein the synthesizing step comprises:
receiving, via the communication line, the stream data output from each of the plurality of terminals, each stream data including video data encoded according to a predetermined position for displaying video in a part of the display area, the position differing among the plurality of stream data;
a synthesis step of generating, based on the plurality of stream data, display data for displaying each video based on the video data in the display area;
generating data for transmission of the generated display data; and
outputting the generated data for transmission to the communication line;
the method further comprising: receiving, at each of the plurality of terminals, data generated by the video signal synthesis device based on the data for transmission;
generating, at each of the plurality of terminals, display data by performing a decoding process based on the received data; and
displaying, at each of the plurality of terminals, video on the display means based on the display data.

2. The stream data generation method according to claim 1, wherein the generating step includes an encoding step of generating the video data by encoding a video signal corresponding to the partial area.

3. The stream data generation method according to claim 2, wherein the encoding step performs the encoding with the video signal corresponding to the partial area as the target of motion compensation.

4. The stream data generation method, wherein the synthesis step includes:
extracting each of the video data from each of the plurality of stream data; and
generating the display data by combining the extracted video data, each according to the predetermined position for that video data.

5. A video conference system comprising a generating device electrically connected to a communication line, the generating device including:
display means for displaying video in a display area;
storage means for storing area information specifying a predetermined partial area of the display area;
input means for receiving an input of a video signal;
generating means for generating video data for displaying video based on the video signal in the partial area by encoding the video signal based on the area information;
transmission data generating means for generating data for transmission of the video data; and
output means, electrically connected to the communication line, for outputting the data for transmission to the communication line;
the video conference system further comprising a video signal synthesis device electrically connected to the communication line, the video signal synthesis device including:
input means for receiving inputs of a plurality of stream data via the communication line, each of the plurality of stream data including video data encoded according to a predetermined position for displaying video in a part of a video display area, the position differing among the plurality of stream data;
combining means for generating, based on the plurality of stream data, display data for displaying each video based on the video data in the display area;
transmission data generating means for generating data for transmission of the display data generated by the combining means; and
output means for outputting the generated data for transmission to the communication line;
wherein the generating device further includes:
receiving means for receiving, from the communication line, data generated based on the data for transmission;
decoding means for generating display data by performing a decoding process based on the data received by the receiving means; and
control means for displaying video on the display means based on the display data.

6. A stream data generating device comprising:
display means for displaying video in a display area;
storage means for storing area information specifying a predetermined partial area of the display area;
input means for receiving an input of a video signal;
generating means for generating video data for displaying video based on the video signal in the partial area by encoding the video signal based on the area information;
output means, electrically connected to a communication line, for outputting the video data to the communication line;
receiving means for receiving, from the communication line, data generated based on the data for transmission;
decoding means for generating display data by performing a decoding process based on the data received by the receiving means; and
control means for displaying video on the display means based on the display data.

7. The stream data generating device according to claim 6, wherein the generating means includes encoding means for generating the video data by encoding a video signal corresponding to the partial area.

8. The stream data generating device according to claim 7, wherein the encoding means performs the encoding with the video signal corresponding to the partial area as the target of motion compensation.

9. The stream data generating device according to claim 6, wherein the video data includes only data for displaying video in the partial area.

10. The stream data generating device according to claim 6, further comprising changing means for changing the area information stored in the storage means based on an external input.

11. The stream data generating device according to claim 10, further comprising instruction input means for receiving area information input from outside,
wherein the changing means changes the area information stored in the storage means to the area information input via the instruction input means.

12. The stream data generating device according to claim 10, wherein the receiving means receives data including area information from the communication line,
and the changing means changes the area information stored in the storage means to the area information included in the received data.

13. A stream data synthesizing device comprising:
input means for receiving inputs of a plurality of stream data via a communication line, each of the plurality of stream data including video data encoded according to a predetermined position for displaying video in a part of a video display area, the position differing among the plurality of stream data;
combining means for generating, based on the plurality of stream data, display data for displaying each video based on the video data in the display area;
transmission data generating means for generating data for transmission of the display data generated by the combining means; and
output means for outputting the generated data for transmission to the communication line.

14. The stream data synthesizing device according to claim 13, wherein the combining means includes:
extraction means for extracting each of the video data from each of the plurality of stream data; and
generating means for generating the display data by combining the extracted video data, each according to the predetermined position for that video data.

15. The stream data synthesizing device according to claim 13, wherein each of the plurality of stream data includes time data specifying the time at which that stream data was generated,
and the combining means generates the display data based on the time data of each of the plurality of stream data.

16. The stream data synthesizing device according to claim 13, wherein the combining means generates the display data without decoding the video data.

17. The stream data synthesizing device according to claim 13, wherein each of the plurality of stream data includes identification data specifying the area in which video based on the video data is displayed,
and the combining means generates the display data based on the identification data.