JP2005341325A

JP2005341325A - Multi-point video conference system, multi-point video conference control method, server apparatus, multi-point video conference control program, and program recording medium thereof

Info

Publication number: JP2005341325A
Application number: JP2004158494A
Authority: JP
Inventors: Junichi Nakajima; 淳一中嶋; Hisami Shinsenji; 久美秦泉寺; Kazuto Kamikura; 一人上倉
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2004-05-28
Filing date: 2004-05-28
Publication date: 2005-12-08

Abstract

PROBLEM TO BE SOLVED: To provide a multipoint video conference system capable of reducing the load imposed on a server apparatus that controls a communication conference, and of consolidating the downstream connections into one.

SOLUTION: When the server apparatus 1 receives an encoded stream in VP units from each client terminal 2, a header update processing section 12 rewrites a VOP header into a VP header and revises the macroblock number in the VP header on the basis of predetermined correspondence information between each client and the display position of its image, thereby generating an encoded stream that composites the encoded streams from a plurality of clients. Each client terminal receives the composite encoded stream from the server apparatus 1, decodes it, and displays the composited image of the plurality of clients.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to a method of multiplexing encoded video data, and in particular to a multipoint video conference system in which a distribution server receives encoded streams encoded at a plurality of client terminals having encoding and decoding functions and distributes encoded streams to each of the client terminals.

In general, in a multipoint video conference system, a distribution server receives from each client terminal an encoded stream of the video captured at that terminal, decodes the encoded streams received from the client terminals, composites the decoded images of the client terminals into a single image, re-encodes that image, and distributes the result to each client terminal (see, for example, Patent Document 1 and Patent Document 2).

FIG. 11 is a diagram showing an example of a conventional multipoint video conference system. In FIG. 11, reference numeral 100 denotes a server apparatus that controls the multipoint video conference and distributes video to each client, and 110a to 110z denote the client terminals of the clients participating in the multipoint video conference. The server apparatus 100 includes reception buffers 101a to 101z and decoding units 102a to 102z corresponding to the client terminals 110a to 110z, an image composition unit 103, a re-encoding unit 104, and a transmission buffer 105.

The server apparatus 100 is connected to the client terminals 110a to 110z via a network, receives an encoded stream from each of the client terminals 110a to 110z, and stores it in the reception buffer 101a to 101z corresponding to that client terminal. The decoding units 102a to 102z each decode the encoded stream stored in the corresponding reception buffer 101a to 101z.

The image composition unit 103 places the images decoded by the decoding units 102a to 102z at predetermined positions and composites them into a single image, and the re-encoding unit 104 re-encodes the composited image. The re-encoded composite image stream is stored in the transmission buffer 105 and distributed to each of the client terminals 110a to 110z via the network.

FIG. 12 is a diagram showing another example of a conventional multipoint video conference system. In FIG. 12, reference numeral 120 denotes a server apparatus that controls the multipoint video conference and distributes video to each client, and 130a to 130z denote the client terminals of the clients participating in the multipoint video conference. The server apparatus 120 includes reception buffers 121a to 121z and transmission buffers 122a to 122z corresponding to the client terminals 130a to 130z.

Each of the client terminals 130a to 130z includes a camera 131a to 131z that films the client, a display 132a to 132z, an encoding unit 133 that encodes the captured video, a transmission buffer 134 for transmitting the encoded stream, a plurality of reception buffers 135 for receiving the encoded streams distributed by the server apparatus 120, decoding units 136 for the encoded streams corresponding to the respective reception buffers 135, and an image composition unit 137 that composites the images decoded by the decoding units 136 with the image captured by the terminal's own camera.

In the example of FIG. 11 described above, the server apparatus 100 decodes the encoded streams of the client terminals, composites the entire image, and then re-encodes it. In the system of FIG. 12, by contrast, the server apparatus 120 distributes the encoded streams received from the client terminals 130a to 130z to the client terminals 130a to 130z via the reception buffers 121a to 121z and the transmission buffers 122a to 122z.

At each of the client terminals 130a to 130z, when the reception buffers 135 receive the encoded streams distributed from the server apparatus 120, each received encoded stream is decoded by the corresponding decoding unit 136 and the decoded image is sent to the image composition unit 137. The image composition unit 137 composites those images with the image captured at the terminal itself and displays the result on the display 132a to 132z. As a result, the video of the clients participating in the video conference is displayed on the displays 132a to 132z of the client terminals 130a to 130z.

Japanese Patent No. 3097736; JP-A-11-187372

In a conventional multipoint video conference system such as that shown in FIG. 11, the server apparatus 100 decodes the encoded streams received from the client terminals 110a to 110z, composites the entire image, and then re-encodes it. The server apparatus 100 therefore has to perform decoding, image composition, and re-encoding, which places a heavy load on it.

In a conventional multipoint video conference system such as that shown in FIG. 12, on the other hand, the load on the server apparatus 120 is small, but a separate connection must be established between the server apparatus 120 and each of the client terminals 130a to 130z for every downstream. The number of communication ports required therefore increases as the number of clients participating in the video conference increases. This is also undesirable from the viewpoints of network resources and security.

Furthermore, with the conventional techniques described above, a client terminal could not use the same decoding unit both when receiving and decoding a composite image stream from the server apparatus and when receiving and decoding only the encoded stream of one specific client's image from the server apparatus.

An object of the present invention is to solve the above problems and to realize a multipoint video conference system that reduces the load on the server apparatus and consolidates the downstream connections into one.

To solve the above problems, the present invention provides a multipoint video conference system in which a server apparatus receives encoded streams encoded at a plurality of client terminals and distributes encoded streams to each of the client terminals, characterized in that, for the encoded streams encoded by the client terminals and uploaded to the server apparatus, the server apparatus changes the macroblock number stored in the VP header of each VP (Video Packet) unit of the encoded stream so that it corresponds to the appropriate display position in the composite image formed by compositing the images from the client terminals, and then distributes the stream to each client terminal.

In addition, for the encoded stream of any client other than the one at the first display position in the composite image, the VOP (Video Object Plane) header is rewritten into a VP header. The stream with rewritten headers is output one VP unit (for example, one line of macroblocks) at a time.

Each client terminal that receives the encoded stream from the server apparatus decodes it sequentially in VP units and displays each decoded image at the display position indicated by the changed macroblock number in the VP header.

When a client terminal selects a single image area (stream), the server apparatus bypasses the header update process described above and outputs the encoded stream from the client terminal corresponding to the designated image area unchanged.
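The choice between composite mode and single-stream pass-through can be sketched as follows. This is a minimal illustration rather than the implementation of the invention: `distribute`, `rewrite_header`, and the representation of the input as `(client_id, packet)` pairs are hypothetical names introduced here, and `Packet` merely stands in for one VP unit of encoded data.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple

Packet = bytes  # hypothetical stand-in for one VP unit of encoded data


def distribute(
    packets: Iterable[Tuple[int, Packet]],
    rewrite_header: Callable[[int, Packet], Packet],
    selected_client: Optional[int] = None,
) -> Iterator[Packet]:
    """Yield the packets to send on one downstream connection."""
    for client_id, packet in packets:
        if selected_client is None:
            # Composite mode: every packet goes through the header update.
            yield rewrite_header(client_id, packet)
        elif client_id == selected_client:
            # Single-stream mode: bypass the header update entirely.
            yield packet
```

Because the selected stream is forwarded untouched, it is still an ordinary stream of VOP and VP headers, which is consistent with the same decoder handling both modes.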

According to the present invention, in a multipoint video conference system in which a server apparatus receives encoded streams encoded at a plurality of client terminals having encoding and decoding functions and distributes encoded streams to each of the client terminals, the composite stream is generated and distributed by updating the headers of each encoded stream uploaded to the server apparatus and changing the macroblock numbers in the VP headers. The server apparatus therefore does not need to decode the encoded streams received from the client terminals, composite the decoded images, or re-encode the composite image, and the load on the server apparatus is reduced.

At the client terminal, the composite stream received from the server apparatus is decoded sequentially in VP units and each decoded image is displayed at the position corresponding to the macroblock number in the VP header. No encoding synchronization among the client terminals is therefore required, and encoded streams can be generated, decoded, and displayed simply and efficiently.

Furthermore, because the server apparatus composites the encoded video data of the client terminals before distributing it, there is no need to establish as many connections as there are clients participating in the video conference. The downstream connections to each client terminal can be consolidated into one, which limits the growth in communication ports and other resources needed for communication.

Furthermore, when a client terminal selects and receives the encoded stream of one specific client instead of the composite stream, it can decode that stream with the same decoding unit used for the composite stream, which simplifies the decoding process.

Embodiments of the present invention will be described below with reference to the drawings. FIG. 1 shows a configuration example of a multipoint video conference system according to the present invention. In FIG. 1, reference numeral 1 denotes a server apparatus that controls the multipoint video conference and distributes video to each client, and 2 denotes the client terminal of a client participating in the multipoint video conference. In this embodiment it is assumed, for example, that four clients, client A to client D, participate in the multipoint video conference using client terminals 2. The number of participants in a conference according to the present invention is of course not limited to four; any plural number of participants is possible.

In the server apparatus 1, reference numeral 11 denotes reception buffers that each receive and store the encoded stream transmitted from a client terminal 2; 12 denotes a header update processing unit that sequentially receives encoded streams in VP units from the reception buffers 11, updates the contents of the VP headers, and composites the streams; and 13 denotes a transmission buffer in which the composited encoded stream is stored.

In the header update processing unit 12, reference numeral 14 denotes a stream input unit that reads the encoded streams stored in the reception buffers 11 in VP units; 15 denotes an MB number determination unit that determines the new macroblock (hereinafter, MB) number for the VP header; 16 denotes a header/MB number changing unit that rewrites specific VOP headers into VP headers and changes the MB number in each VP header to the new MB number; and 17 denotes a stream output unit that transfers the VP-unit encoded streams with changed MB numbers to the transmission buffer 13.

In the client terminal 2, reference numeral 23 denotes an encoding unit that encodes the video captured by a camera 21; 24 denotes a transmission buffer that stores the encoded stream to be transmitted to the server apparatus 1; 25 denotes a reception buffer that stores the composited encoded stream from the server apparatus 1; 26 denotes a decoding unit that decodes the encoded stream stored in the reception buffer 25; and 27 denotes a display control unit that controls the display of the decoded data on a display 22.

FIG. 2 is a diagram showing details of the encoding unit of the client terminal. In this example the MPEG-4 encoding scheme is used. The encoding unit 23 is the same as encoders in general conventional use. A subtraction unit 231 calculates, for each macroblock, the difference signal between the input image signal and the predicted image signal output by a motion compensation unit 239. A DCT unit 232 applies a discrete cosine transform (DCT) to the difference signal. A quantization unit 233 quantizes the DCT coefficients according to the quantization parameter determined by a code amount control unit 241. A variable length coding unit 240 variable-length codes the quantized DCT coefficients and outputs them to the transmission buffer 24.

The quantized DCT coefficients are also output to an inverse quantization unit 234, which inverse-quantizes them. An inverse DCT unit 235 applies an inverse discrete cosine transform to the inverse-quantized signal. An addition unit 236 adds the output signal of the motion compensation unit 239 to the signal after the inverse DCT. A frame memory 237 stores the summed signal as a reference image. A motion prediction unit 238 performs motion prediction based on the reference image stored in the frame memory 237 and the input image signal. The motion compensation unit 239 generates the predicted image signal from the reference image stored in the frame memory 237, based on the motion vector detected by the motion prediction unit 238. The motion vector output by the motion prediction unit 238 is variable-length coded by the variable length coding unit 240 and output to the transmission buffer 24.

The variable length coding unit 240 also generates a VOP (Video Object Plane) header for each frame and places it before the MB encoded information, and generates a VP header for each VP unit and places it before the MB encoded information. The VP header stores the MB number of the first MB in the MB encoded information that follows it.

In the multipoint video conference system configured as shown in FIG. 1, the encoding unit 23 of each client terminal 2 first encodes the video captured by the camera 21 and stores it in the transmission buffer 24.

The reception buffers 11 of the server apparatus 1 store the encoded stream transmitted over the network from the transmission buffer 24 of each client terminal 2, in association with the client ID of the transmitting client. The stream input unit 14 of the header update processing unit 12 receives a client ID and a VP-unit encoded stream from a reception buffer 11. The MB number determination unit 15 determines the new MB number for the VP header based on the client ID.

When the header is a VOP header, the header/MB number changing unit 16 of the header update processing unit 12 rewrites it into a VP header, depending on the client ID. It also changes the MB number in each VP header to the new MB number determined above. The stream output unit 17 sequentially transfers the VP-unit encoded streams with changed MB numbers to the transmission buffer 13.

As a result, the VP-unit encoded streams of the plurality of clients are distributed as a composite stream from the transmission buffer 13 to each client terminal 2 via the network.

At the client terminal 2, the composite stream transmitted from the server apparatus 1 is stored in the reception buffer 25. The decoding unit 26 sequentially decodes the composite stream stored in the reception buffer 25 in VP units, and the display control unit 27 displays each decoded image at the position on the screen of the display 22 that corresponds to the MB number stored in the VP header.

FIG. 3 is a diagram showing the VOP header and VP headers set by each client terminal. In general, a VP is a unit obtained by dividing a VOP (Video Object Plane) into groups of an arbitrary number of MBs. In this embodiment, for example, one VP unit is the data for one line of N MBs of encoded information, as shown in FIG. 3, preceded by a VP header (or, for the first VP, by the VOP header), and each client terminal 2 transmits a VOP consisting of M such VPs as its encoded stream.

As shown in FIG. 3, in the first VP the encoded information of the N MBs from the 0th to the (N-1)th is preceded by the VOP header, which stores the information needed to decode the entire VOP. In the second VP, the encoded information of the N MBs from the Nth to the (2N-1)th is preceded by a VP header denoted VP(N).

The number in parentheses in a VP header is the sequence number (MB number) of the first of the N MBs in that VP. Since the first MB in the second VP is the Nth MB, the VP header of the second VP stores the MB number "N". Similarly, the VP header of the Mth VP stores the MB number "(M-1)N".
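The numbering rule above, under which the k-th VP carries MB number (k-1)N, can be illustrated with a short sketch. The function name `client_vp_headers` and the string labels are hypothetical; real MPEG-4 headers are bit fields, not strings.

```python
from typing import List


def client_vp_headers(n_mbs_per_line: int, m_lines: int) -> List[str]:
    """Label the header that precedes each of the M VPs in one client's VOP."""
    headers = []
    for k in range(1, m_lines + 1):            # k is the 1-based VP index
        first_mb = (k - 1) * n_mbs_per_line    # MB number of the VP's first MB
        # The first VP is preceded by the VOP header instead of a VP header.
        headers.append("VOP" if k == 1 else f"VP({first_mb})")
    return headers
```

For example, with N = 3 and M = 4 the labels are `VOP`, `VP(3)`, `VP(6)`, `VP(9)`.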

FIG. 4 is a diagram showing an example of the decoded image of the composite stream displayed at each client terminal, and FIG. 5 is a diagram showing the MB number of each MB in the decoded image of FIG. 4. FIG. 6 is a diagram showing the header update information for the VP-unit encoded streams transmitted from each client, used to construct the decoded image with the layout shown in FIGS. 4 and 5.

The following description takes as an example the case in which the decoded images of the encoded streams transmitted from client A to client D, each containing the encoded information of N×M MBs, are arranged in the decoded image of the composite stream with the layout shown in FIGS. 4 and 5. The size of the image transmitted from each client and the size of the decoded image of the composite stream are assumed to be decided when the session between each client terminal 2 and the server apparatus 1 is established.

The server apparatus 1 also generates and holds placement position information, which associates the client ID that uniquely identifies each client with the placement position of that client's image. The placement position information consists of data items such as client name, client ID, placement position, and image size (number of MBs).

In the example shown in FIG. 4, in which the image of client A is placed at the upper left of the full display screen, the image of client B at the upper right, the image of client C at the lower left, and the image of client D at the lower right, the server apparatus 1 holds placement position information such as that shown in FIG. 7. The placement position may also be expressed as coordinate information.
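One possible in-memory shape for this placement position information is sketched below. FIG. 7 is not reproduced in the text, so the field names, the `(col, row)` encoding of the position, and the function `make_layout` are assumptions made for illustration. Client IDs 1 (client A) and 3 (client C) appear in the description; IDs 2 and 4 for clients B and D are assumed.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Placement:
    client_name: str
    client_id: int
    col: int          # 0 = left half, 1 = right half of the composite image
    row: int          # 0 = upper half, 1 = lower half
    width_mbs: int    # N: MBs per line in the client's image
    height_mbs: int   # M: MB lines in the client's image


def make_layout(n: int, m: int) -> Dict[int, Placement]:
    """Build the 2x2 layout of FIG. 4, keyed by client ID."""
    return {
        1: Placement("A", 1, 0, 0, n, m),  # upper left
        2: Placement("B", 2, 1, 0, n, m),  # upper right (ID assumed)
        3: Placement("C", 3, 0, 1, n, m),  # lower left
        4: Placement("D", 4, 1, 1, n, m),  # lower right (ID assumed)
    }
```

Keying the table by client ID matches how the header update processing unit looks up a stream's placement from the client ID delivered with each VP.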

Based on the placement position information, the header update processing unit 12 of the server apparatus 1 generates and holds header update information such as that shown in FIG. 6. The header update information is, for example, a mapping from pre-change header information to post-change header information for each client ID. The pre-change header information is either the header type of the encoded stream transmitted from the client (in the case of a VOP header) or the MB number stored in the VP header; for each such entry, the post-change header information holds the new MB number. Instead of holding the header update information as a table, it may be calculated each time using a predetermined formula.

The header update information shown in FIG. 6 will now be described in detail. The image of client A, whose client ID is 1, is placed at the upper left of the full display screen, so when the header of the encoded stream transmitted from client A is a VOP header, the post-change header information is "unchanged".

When the MB number stored in the VP header of a VP transmitted from client A is "N", the MB number of the first MB of this VP within the decoded image is "2N", as shown in FIG. 5, so the corresponding post-change header information is "2N".

Similarly, the image of client B is placed at the upper right of the full display screen, so when the header of the encoded stream transmitted from client B is a VOP header, the post-change header information is "VP(N)". The MB number stored in this VP header is "N" because, as shown in FIG. 5, the MB number of the first MB of this VP within the decoded image is "N".

In the same way, the header update processing unit 12 of the server apparatus 1 generates and holds the header update information for the encoded streams transmitted from each client.
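The text notes that the update information may be computed by a formula instead of being held as a table but does not state the formula. For the 2x2 layout of FIG. 4, one formula consistent with the values quoted above is sketched here. The names `GRID_POSITION`, `remap_mb_number`, and `build_update_table` are hypothetical, and client IDs 2 and 4 for clients B and D are assumed.

```python
from typing import Dict, Tuple

# (column, row) of each client's image in the 2x2 layout of FIG. 4.
# IDs 1 (A) and 3 (C) are given in the text; 2 (B) and 4 (D) are assumed.
GRID_POSITION: Dict[int, Tuple[int, int]] = {
    1: (0, 0), 2: (1, 0), 3: (0, 1), 4: (1, 1),
}


def remap_mb_number(client_id: int, old_mb: int, n: int, m: int) -> int:
    """Map a client-local first-MB number to its number in the composite image."""
    col, row = GRID_POSITION[client_id]
    line = old_mb // n              # MB line within the client's own image
    composite_width = 2 * n         # two client images side by side
    return (row * m + line) * composite_width + col * n


def build_update_table(client_id: int, n: int, m: int) -> Dict[int, int]:
    """Precompute old -> new first-MB numbers for every VP of one client."""
    return {k * n: remap_mb_number(client_id, k * n, n, m) for k in range(m)}
```

In symbols, this maps client A's N to 2N and 2N to 4N, client B's first VP to N, and client C's first VP to 2MN, matching the values given in the description.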

FIG. 8 is a diagram showing an example of the processing flow of the header update processing unit 12. On receiving a client ID and a VP-unit encoded stream from a reception buffer 11 (step S1), the header update processing unit 12 refers to the header update information shown in FIG. 6 and determines the corresponding post-change header information based on the client ID (step S2). Specifically, it determines the MB number to be stored in the changed VP header.

Next, the VOP header or VP header is located; a VOP header and a VP header can be distinguished by their different bit patterns (step S3). The unit determines whether the header of the encoded stream is a VOP header or a VP header (step S4). If it is a VOP header, the unit determines whether the encoded stream was transmitted by the client whose decoded image is displayed at the upper left of the full display screen (step S5), for example whether the sender is client A, with client ID 1, whose decoded image is displayed at the upper left.

If the sender is the client whose decoded image is displayed at the upper left of the full display screen, the stream is passed to the transmission buffer 13 in VP units (step S7). For example, when the sender is client A, with client ID 1, the post-change header information corresponding to the pre-change header information "VOP header" is "unchanged", as shown in FIG. 6, so the stream is passed to the transmission buffer 13 in VP units without changing the header information.

If it is determined in step S5 that the sender is not the client whose decoded image is displayed at the upper left of the full display screen, the VOP header is changed to a VP header, the new MB number determined in step S2 is stored in that VP header (step S6), and processing proceeds to step S7. For example, when the sender is client C, with client ID 3, the post-change header information corresponding to the pre-change header information "VOP header" in the header update information of FIG. 6 is "VP(2MN)", so the MB number "2MN" is stored in the VP header.

If, in step S4, the header of the encoded stream is a VP header rather than a VOP header, the MB number in the VP header is changed to the new MB number determined in step S2 (step S8), and processing proceeds to step S7.

For example, if the source is client A with client ID = 1 and the MB number in the VP header of the received VP is 2N, the header update information shown in FIG. 6 gives "4N" as the corresponding post-change header information. The MB number in the VP header of that VP is therefore changed to "4N".
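The decisions in steps S4 to S8 can be sketched in Python as follows. This is only an illustrative sketch, not the implementation described in the specification. It assumes a 2x2 composite in which every client image is N MBs wide and M MBs tall, that MB numbers count from 0 in raster order, and that client IDs 1 to 4 occupy the upper-left, upper-right, lower-left, and lower-right tiles. Only the positions of clients A (ID = 1) and C (ID = 3) are confirmed by the examples above. The names `VideoPacket`, `layout_2x2`, `remap_mb_number`, and `update_header` are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class VideoPacket:
    """One VP as received from a client (hypothetical container)."""
    client_id: int
    mb_number: Optional[int]  # None means the VP starts with a VOP header
    payload: bytes = b""


def layout_2x2(n: int, m: int) -> Dict[int, Tuple[int, int]]:
    """Assumed layout: client ID -> (column offset, row offset) in MBs."""
    return {1: (0, 0), 2: (n, 0), 3: (0, m), 4: (n, m)}


def remap_mb_number(old: int, n: int, col_off: int, row_off: int,
                    out_width: int) -> int:
    """Map an MB number of an n-MB-wide client image into the composite."""
    row, col = divmod(old, n)
    return (row_off + row) * out_width + (col_off + col)


def update_header(vp: VideoPacket, n: int, m: int) -> VideoPacket:
    """Rewrite the header of one VP so it lands in its client's tile."""
    col_off, row_off = layout_2x2(n, m)[vp.client_id]
    out_width = 2 * n  # two client images side by side (assumption)
    if vp.mb_number is None:                  # step S4: VOP header
        if (col_off, row_off) == (0, 0):      # step S5: upper-left client
            return vp                         # header left as is
        new = remap_mb_number(0, n, col_off, row_off, out_width)
        return replace(vp, mb_number=new)     # step S6: VOP -> VP header
    new = remap_mb_number(vp.mb_number, n, col_off, row_off, out_width)
    return replace(vp, mb_number=new)         # step S8: renumber VP header
```

Under these assumptions the sketch reproduces the two examples in the text: a VOP header from client C becomes a VP header with MB number 2MN, and a VP from client A with MB number 2N is renumbered to 4N. The payload is never touched, which is the point of the scheme: the server composes the picture without decoding or re-encoding.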

FIG. 9 illustrates the decoded image update processing at each client terminal. FIG. 9(A) shows an example of a composite stream, made up of multiple VPs, that each client terminal 2 receives from the server device 1, and FIG. 9(B) shows where the decoded image of each VP is placed within the whole display screen.

Each VP in the composite stream shown in FIG. 9(A) carries the header information as changed by the server device 1. In the example of FIG. 9(A), the first VP carries VOP header information, the second VP carries VP header information with MB number "2N", and the third VP carries VP header information with MB number "N".

Each client terminal 2 that receives the composite stream of FIG. 9(A) from the server device 1 decodes it sequentially in units of VP and displays each decoded VP image at the display position indicated by that VP's header information. For example, the decoded image of a VP carrying a VOP header is placed at the upper-left corner of the display screen shown in FIG. 9(B) (position (1) in the figure), and the decoded image of a VP whose VP header carries MB number "2N" is placed where its first MB is the 2N-th MB counted across all MBs (position (3) in the figure).

Likewise, the decoded image of a VP whose VP header carries MB number "N" is placed at position (2) in the figure, and the decoded image of a VP whose VP header carries MB number "3N" is placed at position (4) in the figure.
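The placement rule on the client side can be sketched as a small Python function. This is an illustration under stated assumptions rather than the decoder described here: MB numbers are taken to count from 0 in raster order, a macroblock is taken to cover 16x16 luma pixels as in MPEG-4 Visual, the composite image is taken to be 2N MBs wide, and a VP with a VOP header is treated as starting at MB 0. The name `vp_origin` is hypothetical.

```python
from typing import Optional, Tuple

MB_SIZE = 16  # a macroblock covers 16x16 luma pixels


def vp_origin(mb_number: Optional[int], width_in_mbs: int) -> Tuple[int, int]:
    """Return the (x, y) pixel position where a VP's first MB is drawn.

    mb_number is None for a VP that carries a VOP header, which starts
    at the upper-left corner of the display screen.
    """
    first_mb = 0 if mb_number is None else mb_number
    row, col = divmod(first_mb, width_in_mbs)
    return col * MB_SIZE, row * MB_SIZE
```

With N = 11 (a 176-pixel-wide client image) and a composite width of 2N = 22 MBs, MB numbers 0, N, 2N, and 3N map to (0, 0), (176, 0), (0, 16), and (176, 16), which matches the ordering of positions (1), (2), (3), and (4) in FIG. 9(B).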

If the frame rates of the encoded streams uploaded from the client terminals 2 to the server device 1 differ, some regions of the displayed decoded image may be updated frequently while others are updated less often. The display of the image as a whole is not disrupted, however, so this causes no problem.

FIG. 10 shows a configuration example of a multipoint video conference system according to another embodiment of the present invention. In addition to the configuration shown in FIG. 1, each client terminal 2 has an image selection instruction unit 28, and the server device 1 has an image selection information receiving unit 18 and a stream selection/switching unit 19.

The image selection instruction unit 28 of the client terminal 2 determines, from instruction information entered by the user, whether to receive the composite stream that combines the images of multiple clients or the image of a single specific client, and sends that image selection information to the image selection information receiving unit 18 of the server device 1. The default is, for example, the composite stream. When the user clicks the image of a specific client in the displayed image decoded from the composite stream, using a pointing device such as a mouse, the selection information for that client's image is sent from the image selection instruction unit 28 to the server device 1.

When the user clicks while the image of a specific client is displayed, the image selection instruction unit 28 sends image selection information for the composite stream (composite image) to the server device 1.
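The click behavior of the image selection instruction unit 28 can be sketched as a toggle. The sketch below is hypothetical: it assumes the composite image is a grid of equally sized tiles, that `tile_ids` lists the client ID shown in each tile, and that `None` stands for "composite stream". None of these names or encodings come from the specification.

```python
from typing import List, Optional

COMPOSITE = None  # hypothetical marker meaning "send the composite stream"


def selection_after_click(current: Optional[int], x: int, y: int,
                          tile_w: int, tile_h: int,
                          tile_ids: List[List[int]]) -> Optional[int]:
    """Return the image selection to send to the server after a click.

    current is the selection in effect before the click: COMPOSITE or
    the client ID whose single image is being displayed.
    """
    if current is not COMPOSITE:
        # A click on a single-client image switches back to the composite.
        return COMPOSITE
    # A click on the composite selects the client whose tile was hit.
    row = min(y // tile_h, len(tile_ids) - 1)
    col = min(x // tile_w, len(tile_ids[0]) - 1)
    return tile_ids[row][col]
```

For a 2x2 grid of 176x144 tiles with `tile_ids = [[1, 2], [3, 4]]`, a click at (200, 10) on the composite selects client 2, and a later click anywhere returns to the composite.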

When the composite stream is selected, the stream selection/switching unit 19 in the server device 1 delivers the composite stream, which the header update processing unit 12 generates from the encoded streams of the individual clients, to the client terminal 2 via the transmission buffer 13 and the network.

When the image selection information receiving unit 18 receives image selection information for a specific client, it controls the stream selection/switching unit 19 according to that information, and the stream selection/switching unit 19 outputs the encoded stream stored in the reception buffer 11 corresponding to that client's client ID to the transmission buffer 13 unchanged. As a result, only the image of the selected client is displayed on the client terminal 2.

In the server device 1 shown in FIG. 10, if as many sets of the image selection information receiving unit 18, the stream selection/switching unit 19, and the transmission buffer 13 are provided as there are clients participating in the conference, each client terminal 2 can individually select whether its display 22 shows the composite image or the image of a specific client.
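One way to picture the per-client selection is a selector object held for each viewing client. The sketch below is an assumption-laden illustration, not the structure of the stream selection/switching unit 19: the class name `StreamSelector`, the use of `None` for the composite stream, and the representation of streams as `bytes` are all hypothetical.

```python
from typing import Dict, Optional


class StreamSelector:
    """Per-viewer switch between the composite and one client's stream."""

    def __init__(self) -> None:
        # None means the composite stream, which is the default.
        self.selected: Optional[int] = None

    def on_selection_info(self, client_id: Optional[int]) -> None:
        """Record image selection information received from the viewer."""
        self.selected = client_id

    def choose(self, composite: bytes,
               reception_buffers: Dict[int, bytes]) -> bytes:
        """Return the data to place in this viewer's transmission buffer."""
        if self.selected is None:
            return composite
        # The selected client's stream is forwarded unchanged.
        return reception_buffers[self.selected]
```

Keeping one `StreamSelector` per participating client, for example in a dictionary keyed by viewer ID, lets one viewer watch a single participant while the others continue to receive the composite.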

The processing performed by the server device 1 and the client terminal 2 described above can be implemented not only in hardware or firmware but also by a computer and a software program. The program can be provided recorded on a computer-readable recording medium or provided over a network.

FIG. 1 is a diagram showing a configuration example of the multipoint video conference system according to the present invention.
FIG. 2 is a diagram showing details of the encoding unit of a client terminal.
FIG. 3 is a diagram showing the VOP header and VP headers set by each client terminal.
FIG. 4 is a diagram showing an example of a decoded image of the composite stream.
FIG. 5 is a diagram showing the MB number of each MB in the decoded image.
FIG. 6 is a diagram showing the header update information.
FIG. 7 is a diagram showing the arrangement position information.
FIG. 8 is a diagram showing an example of the processing flow of the header update processing unit.
FIG. 9 is a diagram explaining the decoded image generation processing at each client terminal.
FIG. 10 is a diagram showing a configuration example of another embodiment of the present invention.
FIG. 11 is a diagram showing an example of a conventional multipoint video conference system.
FIG. 12 is a diagram showing an example of another conventional multipoint video conference system.

Explanation of symbols

1, 100, 120 Server device
2, 110a to 110z, 130a to 130z Client terminal
11, 25, 101a to 101z, 121a to 121z, 135 Reception buffer
12 Header update processing unit
13, 24, 105, 122a to 122z, 134 Transmission buffer
14 Stream input unit
15 MB number determination unit
16 Header/MB number change unit
17 Stream output unit
18 Image selection information receiving unit
19 Stream selection/switching unit
21, 131a to 131z Camera
22, 132a to 132z Display
23, 133 Encoding unit
26, 102a to 102z, 136 Decoding unit
27 Display control unit
28 Image selection instruction unit
103, 137 Image composition unit
104 Re-encoding unit
231 Subtraction unit
232 DCT unit
233 Quantization unit
234 Inverse quantization unit
235 Inverse DCT unit
236 Addition unit
237 Frame memory
238 Motion prediction unit
239 Motion compensation unit
240 Variable-length encoding unit
241 Code amount control unit

Claims

A multipoint video conference system comprising a plurality of client terminals and a server device that is connected to the client terminals via a network and controls a communication conference between the client terminals, wherein
each of the client terminals comprises:
means for encoding a frame of input video at its own terminal in units of blocks and generating an encoded stream in which, for each encoded data group of a predetermined number of blocks, a first header is attached to the encoded data group at the head of the frame and a second header including information indicating the position of the block is attached to each encoded data group other than the one at the head of the frame;
means for transmitting the generated encoded stream to the server device;
means for receiving an encoded stream from the server device;
means for decoding the received encoded stream; and
means for displaying the decoded image, and
the server device comprises:
means for receiving an encoded stream from each of the client terminals;
header update processing means for analyzing the encoded stream received from each client terminal in units to which the first header or the second header is attached and, based on relationship information between a predetermined client and the display position of its image, executing a process of rewriting the first header into the second header or a process of changing the information indicating the position of the macroblock in the second header to the corresponding display position, thereby generating an encoded stream in which the encoded streams received from the client terminals are combined; and
means for transmitting the encoded stream synthesized by the header update processing means to each of the client terminals.

The multipoint video conference system according to claim 1, wherein
the client terminal comprises
means for selecting whether to display an image composed of the images of a plurality of clients or the image of a specific client, and for transmitting the resulting image selection information to the server device, and
the server device comprises:
means for receiving the image selection information; and
means for selecting, based on the received image selection information, either the encoded stream synthesized by the header update processing means or the encoded stream received from a specific client terminal, and for transmitting the selected stream to the client terminal.

A multipoint video conference control method in a system comprising a plurality of client terminals and a server device that is connected to the client terminals via a network and controls a communication conference between the client terminals, the method comprising:
a process in which each client terminal encodes a frame of input video at its own terminal in units of blocks and generates an encoded stream in which, for each encoded data group of a predetermined number of blocks, a first header is attached to the encoded data group at the head of the frame and a second header including information indicating the position of the block is attached to each encoded data group other than the one at the head of the frame;
a process in which each client terminal transmits the generated encoded stream to the server device;
a process in which the server device receives an encoded stream from each of the client terminals;
a header update process in which the server device analyzes the encoded stream received from each of the client terminals in units to which the first header or the second header is attached and, based on relationship information between a predetermined client and the display position of its image, executes a process of rewriting the first header into the second header or a process of changing the information indicating the position of the macroblock in the second header to the corresponding display position, thereby generating an encoded stream in which the encoded streams received from the client terminals are combined;
a process in which the server device transmits the encoded stream synthesized by the header update process to each client terminal;
a process in which the client terminal receives an encoded stream from the server device;
a process in which the client terminal decodes the received encoded stream; and
a process in which the client terminal displays the decoded image.

A server device in a multipoint video conference system, the server device being connected to a plurality of client terminals via a network and controlling a communication conference between the client terminals, the server device comprising:
means for receiving, from each client terminal, an encoded stream in which the client terminal has encoded a frame of input video at its own terminal in units of blocks and, for each encoded data group of a predetermined number of blocks, has attached a first header to the encoded data group at the head of the frame and a second header including information indicating the position of the block to each encoded data group other than the one at the head of the frame;
header update processing means for analyzing the encoded stream received from each client terminal in units to which the first header or the second header is attached and, based on relationship information between a predetermined client and the display position of its image, executing a process of rewriting the first header into the second header or a process of changing the information indicating the position of the macroblock in the second header to the corresponding display position, thereby generating an encoded stream in which the encoded streams received from the client terminals are combined; and
means for transmitting the encoded stream synthesized by the header update processing means to each of the client terminals.

A multipoint video conference control program to be executed by a computer of a server device in a multipoint video conference system, the server device being connected to a plurality of client terminals via a network and controlling a communication conference between the client terminals, the program causing the computer to function as:
means for receiving, from each client terminal, an encoded stream in which the client terminal has encoded a frame of input video at its own terminal in units of blocks and, for each encoded data group of a predetermined number of blocks, has attached a first header to the encoded data group at the head of the frame and a second header including information indicating the position of the block to each encoded data group other than the one at the head of the frame;
header update processing means for analyzing the encoded stream received from each client terminal in units to which the first header or the second header is attached and, based on relationship information between a predetermined client and the display position of its image, executing a process of rewriting the first header into the second header or a process of changing the information indicating the position of the macroblock in the second header to the corresponding display position, thereby generating an encoded stream in which the encoded streams received from the client terminals are combined; and
means for transmitting the encoded stream synthesized by the header update processing means to each client terminal.

A computer-readable recording medium on which is recorded a multipoint video conference control program to be executed by a computer of a server device in a multipoint video conference system, the server device being connected to a plurality of client terminals via a network and controlling a communication conference between the client terminals, wherein the recorded program causes the computer to function as:
means for receiving, from each client terminal, an encoded stream in which the client terminal has encoded a frame of input video at its own terminal in units of blocks and, for each encoded data group of a predetermined number of blocks, has attached a first header to the encoded data group at the head of the frame and a second header including information indicating the position of the block to each encoded data group other than the one at the head of the frame;
header update processing means for analyzing the encoded stream received from each client terminal in units to which the first header or the second header is attached and, based on relationship information between a predetermined client and the display position of its image, executing a process of rewriting the first header into the second header or a process of changing the information indicating the position of the macroblock in the second header to the corresponding display position, thereby generating an encoded stream in which the encoded streams received from the client terminals are combined; and
means for transmitting the encoded stream synthesized by the header update processing means to each client terminal.