JP2006279220A

JP2006279220A - Multipoint conference system

Info

Publication number: JP2006279220A
Application number: JP2005091658A
Authority: JP
Inventors: Haruhisa Kato; 晴久加藤; Yasuhiro Takishima; 康弘滝嶋
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2005-03-28
Filing date: 2005-03-28
Publication date: 2006-10-12
Anticipated expiration: 2025-03-28
Also published as: JP4693096B2

Abstract

PROBLEM TO BE SOLVED: To provide a multipoint conference system that converts resolution efficiently and freely combines video images from a plurality of terminals on one screen.

SOLUTION: A plurality of terminals 1-1 to 1-4 are connected to a server 2. The server 2 includes transmitting/receiving means 8-1 to 8-4, a synthesis control means 9, a synthesizing means 10, and processing means 11-1 to 11-4. The processing means 11-1 to 11-4 receive transform-coded, compressed moving pictures from the terminals 1-1 to 1-4 and perform resolution conversion and area extraction in the code domain. The synthesizing means 10 combines the coded moving pictures processed by the processing means 11-1 to 11-4 into one coded moving picture, which is transmitted to the terminals 1-1 to 1-4.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to a multipoint conference apparatus that realizes a multipoint video conference, and more particularly to a multipoint conference apparatus that can efficiently convert the resolution of images from a plurality of terminals and combine them.

Video conference systems take one of two forms: with a server or without one. A serverless system is easy to set up, but each time a user is added the new terminal must be connected to all the other terminals, so it is not practical as a system linking more than two points.

With a server, by contrast, a new terminal only needs to be connected to the server even as the number of users grows, so a video conference system can be realized with simple connections regardless of the number of users. For this reason, multipoint video conference systems generally use a server.
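The scaling argument above can be checked with a little arithmetic. The following sketch is an illustration only and is not part of the patent; the function names are hypothetical. It counts the links needed when every terminal connects to every other terminal and when every terminal connects only to a server.

```python
def serverless_links(terminals: int) -> int:
    """Links needed when every terminal connects directly to every other one."""
    return terminals * (terminals - 1) // 2


def server_links(terminals: int) -> int:
    """Links needed when every terminal connects only to the server."""
    return terminals


if __name__ == "__main__":
    for n in (2, 4, 10):
        print(n, serverless_links(n), server_links(n))
```

The serverless count grows quadratically with the number of terminals, while the server-based count grows linearly.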

In a server-based multipoint video conference system, the server is required not only to retransmit the images from the multiple points, but also to process those images individually and transmit them in a form suited to the connection environment.

For example, giving the server the function of combining a plurality of images from multiple points, reconstructing them into one image, and transmitting it makes it possible to reduce control-information overhead and to adapt the bit rate and frame rate to line conditions. In addition, enlarging only the speaker, or encoding only the speaker at high image quality, in the transmitted composite screen makes the video conference easier to follow and lets it proceed smoothly.

Conventional server-based multipoint video conference systems use one of two methods: selecting the image to transmit from among the images received by the server, or combining the images received by the server and transmitting them as one image.

The former method is described in Patent Document 1. The video conference system of Patent Document 1 has a management table that manages each participant's participation mode as either interlocutor mode or observer mode, and transmits data with different communication loads to the terminal devices of users in interlocutor mode and to those of users in observer mode. This realizes remote dialogue among multiple points while making maximum use of the capacity of the existing communication infrastructure.

The latter method is described in Patent Documents 2 and 3. The multipoint control apparatus of Patent Document 2 detects the picture headers of the video data in the multiplexed data from each video conference terminal, extracts only the intra-frame-coded video data, and combines the extracted intra-frame video data. This makes it possible to combine and display a plurality of video data easily.

In the multipoint control apparatus of Patent Document 3, the coded bitstream data is decoded into a bitstream having code information that includes at least motion vector information, motion information resulting from the change in video position is added to the motion vector information, and the bitstream data for one screen is then reconstructed and encoded. This enables efficient changes and rearrangement of video positions without degrading the real-time nature of the video communication.

Cited patent documents: JP 2004-7561 A; JP 2001-69474 A; JP H10-262228 A

However, with the method of selecting the image to transmit from among those received by the server, as in the video conference system of Patent Document 1, data such as the images of people not taking part in the dialogue is not transmitted. A participant must wait for a turn to speak before offering an opinion, which hinders active discussion.

The multipoint control apparatus of Patent Document 2 extracts and combines only intra-frame video data, so combining a plurality of images is simple. Because only intra-frame video data is used, however, the motion in the combined output image is not smooth.

In the multipoint control apparatus of Patent Document 3, the server concatenates the received images to reconstruct a one-screen image, so the bit rate of the image transmitted from the server is the sum of the bit rates of the received images. In a connection environment with limited bandwidth, the number of images that can be concatenated into one screen is therefore limited, and images cannot be combined freely.

An object of the present invention is to solve the above problems and to provide a multipoint conference apparatus that performs resolution conversion efficiently and can freely combine images from a plurality of terminals into one screen.

To achieve this object, the present invention is a multipoint conference apparatus that receives a plurality of pieces of transform-coded information, combines them in the code domain into one piece of transform-coded information, and transmits it. Its basic feature is that it comprises a plurality of processing means, each of which takes as input one of a plurality of coded moving pictures compressed by transform coding and processes it in the code domain, and a synthesizing means that combines the coded moving pictures processed by the processing means into one coded moving picture.

In the present invention, the coded moving pictures are processed and combined into one coded moving picture in the code domain, without being decoded to the pixel domain (baseband). Unlike composition after decoding to the pixel domain, no decoding and recompression is needed, so the apparatus configuration can be simplified. Processing that decodes the coded moving pictures to the pixel domain would require as many decoders as there are terminals.

Directly manipulating the code information also shortens the time required for processing such as resolution conversion during image composition, so the delay introduced by the processing is kept small and operation is faster.

According to the present invention, inter-frame video data can be generated as well as intra-frame video data, so the motion in the combined output image is smooth.

Performing resolution conversion at the same time as combining the images keeps the bit rate of the transmitted video constant regardless of the number of images combined, so the number of images combined into one screen is not limited and images can be combined freely. If a plurality of images were simply resolution-converted and transmitted separately, they would drift out of synchronization, and combining them afterwards would require processing to remove the drift. In the present invention the images are combined into one image before transmission, so no synchronization drift arises between them.

Furthermore, arbitrary information can be embedded at the image boundaries during composition, and choosing that information appropriately can improve compression efficiency or serve the convenience of conference users.

The present invention will now be described with reference to the drawings. FIG. 1 is a block diagram showing an embodiment of the multipoint conference apparatus according to the present invention. Clients (user terminals; hereinafter simply called terminals) 1-1, 1-2, 1-3, ... are provided for each user and are connected to a server 2 located at their center.

Each terminal 1-1, 1-2, 1-3, ... comprises: an encoder 4-1, 4-2, 4-3, ... that takes in the moving picture (input moving picture) captured by a camera 3-1, 3-2, 3-3, ..., transform-codes it, and outputs a compression-coded moving picture; a transceiver 5-1, 5-2, 5-3, ... that sends this compression-coded moving picture to the server 2 over a transmission path and receives the compression-coded moving picture distributed by the server 2; and a decoder 7-1, 7-2, 7-3, ... that decodes the received compression-coded moving picture and outputs it to a display 6-1, 6-2, 6-3, ....

The server 2 comprises: transmitting/receiving means 8-1, 8-2, 8-3, ... that receive the compression-coded moving pictures (input compression-coded moving pictures) sent from the terminals 1-1, 1-2, 1-3, ... and distribute the compression-coded moving picture combined in the server 2; a synthesis control means 9 that, when a plurality of input compression-coded moving pictures are combined into one compression-coded moving picture, specifies which input compression-coded moving pictures to use, their arrangement on the screen, and various effects; a synthesizing means 10 that combines the plurality of input compression-coded moving pictures into one compression-coded moving picture based on instructions from the synthesis control means 9; and processing means 11-1, 11-2, 11-3, ... that perform, in the code domain, the processing needed for composition by the synthesizing means 10, such as cutting out a predetermined area from an input compression-coded moving picture and resolution conversion.

The synthesis control means 9 instructs the synthesizing means 10 to combine a plurality of input compression-coded moving pictures according to parameters entered manually or automatically. The instruction covers the selection of the input compression-coded moving pictures used for composition, their arrangement on the screen, and various effects. Layouts, resolution-conversion magnifications, cut-out areas, decorative frames (described later), and other effect information can also be stored in a database and retrieved according to the manually or automatically entered parameters to instruct the synthesizing means 10.
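As a rough illustration of what such a layout database might hold, the sketch below stores a named layout and expands it into per-terminal placement instructions. Everything here is an assumption made for illustration: the `Placement` fields, the `LAYOUT_DB` name, the 8 × 8 block units, and the default 44 × 36 block frame (352 × 288 pixels) do not come from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Placement:
    """One instruction for the synthesizing means (hypothetical structure)."""
    terminal: str                 # which input picture to use
    scale: Tuple[int, int]        # magnification as (numerator, denominator)
    position: Tuple[int, int]     # top-left corner as (block_row, block_col)


def quad_layout(terminals: List[str], width_blocks: int, height_blocks: int) -> List[Placement]:
    """Four pictures, each scaled to 1/2, tiled without overlap."""
    half_w, half_h = width_blocks // 2, height_blocks // 2
    corners = [(0, 0), (0, half_w), (half_h, 0), (half_h, half_w)]
    return [Placement(t, (1, 2), pos) for t, pos in zip(terminals, corners)]


# Layouts are looked up by name, as parameters entered manually or automatically would do.
LAYOUT_DB: Dict[str, Callable[[List[str], int, int], List[Placement]]] = {
    "quad": quad_layout,
}


def instruct(layout_name: str, terminals: List[str],
             width_blocks: int = 44, height_blocks: int = 36) -> List[Placement]:
    """Retrieve a stored layout and expand it into placement instructions."""
    return LAYOUT_DB[layout_name](terminals, width_blocks, height_blocks)
```

Calling `instruct("quad", ["1-1", "1-2", "1-3", "1-4"])` yields four placements, one per quadrant, each with magnification 1/2.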

The synthesizing means 10 combines a plurality of input compression-coded moving pictures into one compression-coded moving picture in the code domain according to the instructions from the synthesis control means 9. It also instructs the processing means 11-1, 11-2, 11-3, ... to perform the cutting out of predetermined areas, resolution conversion, and other processing needed for composition.

The processing means 11-1, 11-2, 11-3, ... process the input compression-coded moving pictures according to the instructions from the synthesizing means 10, such as cutting out a predetermined area or resolution conversion. This processing is performed in the code domain by reusing the code information itself, or information obtained by variable-length decoding part of it; the input compression-coded moving pictures are never fully decoded to the pixel domain.

Resolution conversion in the code domain can be realized as a matrix operation with a precomputed conversion matrix. Specifically, when the magnification is expressed as a rational fraction, the conversion takes as input as many sets of transform coefficients as the denominator and outputs as many sets of transform coefficients as the numerator.

The conversion matrix used for resolution conversion is obtained by computing in advance the product of three matrices: a matrix that combines as many copies of the transform matrix that decodes transform coefficients as the denominator, a matrix that expresses the resolution conversion for the horizontal or vertical direction, and a matrix that combines as many copies of the transform matrix that transform-codes pixels as the numerator.
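The construction of such a conversion matrix can be sketched for one concrete case. The example below is a minimal one-dimensional sketch, assuming an orthonormal 8-point DCT as the transform and simple pairwise averaging as the resolution conversion for a magnification of 1/2; neither choice is stated in the patent, and all function names are hypothetical. With magnification 1/2, two coefficient sets go in (denominator 2) and one comes out (numerator 1).

```python
import math

N = 8  # transform block length (assumed)


def dct_matrix(n):
    """Orthonormal DCT-II matrix: coefficients = C * pixels."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def block_diag(m, count):
    """Place `count` copies of square matrix m along the diagonal."""
    size = len(m)
    out = [[0.0] * (size * count) for _ in range(size * count)]
    for b in range(count):
        for i in range(size):
            for j in range(size):
                out[b * size + i][b * size + j] = m[i][j]
    return out


def build_half_scale_matrix():
    """Precompute (forward DCT) x (pairwise average) x (two inverse DCTs)."""
    c = dct_matrix(N)
    decode_two_blocks = block_diag(transpose(c), 2)            # 16 x 16
    average_pairs = [[0.5 if j // 2 == i else 0.0 for j in range(2 * N)]
                     for i in range(N)]                         # 8 x 16
    return matmul(c, matmul(average_pairs, decode_two_blocks))  # 8 x 16


T_HALF = build_half_scale_matrix()  # computed once, reused for every block pair


def halve_in_code_domain(coeffs_a, coeffs_b):
    """Map two adjacent 8-coefficient sets to one, without visiting pixels."""
    column = [[v] for v in list(coeffs_a) + list(coeffs_b)]
    return [row[0] for row in matmul(T_HALF, column)]
```

For a two-dimensional picture the same kind of matrix would be applied once along the horizontal direction and once along the vertical direction. At run time only the single multiplication by `T_HALF` remains, which is the point of precomputing the product.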

The resolution conversion itself can be realized as a matrix operation between the conversion matrix and a matrix in which the high-frequency components of each set of transform coefficients of the input compression-coded moving picture have been forced to 0. In other words, resolution conversion in the code domain is applied to the low-frequency components of the transform coefficients of the input compression-coded moving picture. Which high-frequency components are forced to 0 can be controlled by the magnification, and this reduces the amount of computation while maintaining image quality.
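The zeroing step can be sketched as follows. The rule that ties the number of retained coefficients to the magnification (`coefficients_to_keep`) is an assumption chosen for illustration; the patent only says that the zeroed components can be controlled by the magnification. Both function names are hypothetical.

```python
def coefficients_to_keep(numerator, denominator, block=8):
    """Assumed rule: keep ceil(block * numerator / denominator) low frequencies per axis."""
    kept = -(-block * numerator // denominator)  # ceiling division
    return max(1, min(block, kept))


def keep_low_frequencies(coeff_block, keep):
    """Return a copy of a square coefficient block with all but the top-left keep x keep entries set to 0."""
    return [[value if (r < keep and c < keep) else 0
             for c, value in enumerate(row)]
            for r, row in enumerate(coeff_block)]
```

Under this assumed rule a magnification of 1/2 retains the low-frequency 4 × 4 coefficients of each 8 × 8 block, so the subsequent matrix operation only has to touch a quarter of the coefficients.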

Resolution conversion in the code domain can also use the method the present inventors proposed earlier in Japanese Patent Application No. 2003-372223 (Prior Application 1). FIG. 2 is a functional block diagram of the resolution conversion proposed in Prior Application 1. A coding coefficient acquisition unit 21 partially decodes the input compression-coded moving picture to obtain the low-frequency 4 × 4 DCT components of the 8 × 8 DCT coefficients, and outputs them to a subtractor 22 and a first reduced image generation unit 28.

The subtractor 22 takes the difference between the one frame of low-frequency 4 × 4 DCT components obtained by the coding coefficient acquisition unit 21 and one frame of motion-compensated predicted 4 × 4 DCT coefficients, and outputs prediction error information, that is, one frame of 4 × 4 DCT coefficient differences. The motion-compensated predicted 4 × 4 DCT coefficients are output from a motion compensation unit 27.

A quantization unit 23 quantizes the prediction error information obtained by the subtractor 22 and outputs the quantized prediction error information (quantized low-frequency 4 × 4 DCT coefficients) to an inverse quantization unit 24 and a basis conversion unit 31. The inverse quantization unit 24 inversely quantizes the quantized prediction error information obtained by the quantization unit 23 and outputs prediction error information that includes the quantization error.

An adder 25 adds the prediction error information including the quantization error, obtained by the inverse quantization unit 24, to the one frame of motion-compensated predicted 4 × 4 DCT coefficients to obtain one frame of reference 4 × 4 DCT coefficients, and outputs them to a frame memory 26. The motion-compensated predicted 4 × 4 DCT coefficients used here are also output from the motion compensation unit 27 described below.

The frame memory 26 stores the one frame of reference 4 × 4 DCT coefficients obtained by the adder 25 and, following the encoding order, outputs them as one frame of reference 4 × 4 DCT coefficients to the motion compensation unit 27 and a second reduced image generation unit 29.

The motion compensation unit 27 applies motion compensation, directly on 4 × 4 DCT coefficients and using the motion vector, to the one frame of reference 4 × 4 DCT coefficients obtained from the frame memory 26 to generate one frame of predicted 4 × 4 DCT coefficients from the preceding frame, and outputs them to the subtractor 22 and the adder 25.
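The loop formed by the subtractor 22, the quantization unit 23, the inverse quantization unit 24, and the adder 25 can be sketched per coefficient. This is a simplified sketch under stated assumptions: a uniform quantizer with a single step size `qstep` stands in for the unspecified quantizer, the predicted coefficients are passed in rather than produced by motion compensation, and the function name is hypothetical.

```python
def requantize_frame(low_coeffs, predicted, qstep):
    """One pass through subtract -> quantize -> dequantize -> add.

    low_coeffs : low-frequency DCT coefficients of the current frame
    predicted  : motion-compensated predicted coefficients (same length)
    qstep      : uniform quantizer step size (an assumption of this sketch)
    Returns (levels, reference): the quantized prediction error that would be
    coded, and the reference coefficients that would go to the frame memory.
    """
    residual = [c - p for c, p in zip(low_coeffs, predicted)]      # subtractor 22
    levels = [round(r / qstep) for r in residual]                  # quantization unit 23
    dequantized = [level * qstep for level in levels]              # inverse quantization unit 24
    reference = [p + d for p, d in zip(predicted, dequantized)]    # adder 25
    return levels, reference
```

Because the reference is rebuilt from the dequantized residual rather than from the original coefficients, it contains the same quantization error that a downstream decoder will see, which keeps the prediction for the next frame consistent with the decoder.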

The first reduced image generation unit 28 processes the 2 × 2 DCT coefficients within the one frame of low-frequency 4 × 4 DCT coefficients obtained by the coding coefficient acquisition unit 21 with a 2 × 2 IDCT consisting only of additions and subtractions to generate a reduced image, and outputs the reduced image to a motion prediction unit 30.

The second reduced image generation unit 29 processes the 2 × 2 DCT coefficients within the one frame of low-frequency 4 × 4 DCT coefficients obtained from the frame memory 26 with a 2 × 2 IDCT consisting only of additions and subtractions to generate a reduced image, and outputs the reduced image to the motion prediction unit 30.
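A 2 × 2 IDCT needs no multiplications because the 2-point DCT basis consists only of +1 and -1 up to a constant scale. The sketch below shows the butterfly; it omits the constant scale factor, on the assumption that a uniform scale does not affect the motion search that uses the reduced images. The function name is hypothetical.

```python
def idct2x2_butterfly(coeffs):
    """Unscaled 2 x 2 inverse DCT using only additions and subtractions.

    coeffs is [[a, b], [c, d]] with rows indexed by vertical frequency and
    columns by horizontal frequency. Returns the 2 x 2 pixel block.
    """
    (a, b), (c, d) = coeffs
    # Horizontal butterflies on each coefficient row.
    t0, t1 = a + b, a - b
    t2, t3 = c + d, c - d
    # Vertical butterflies combine the two rows.
    return [[t0 + t2, t1 + t3],
            [t0 - t2, t1 - t3]]
```

A block containing only a DC coefficient produces four equal pixels, and a block containing only the horizontal coefficient produces columns of opposite sign.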

The motion prediction unit 30 searches for motion vectors between the reduced image obtained by the first reduced image generation unit 28 and the reduced image obtained by the second reduced image generation unit 29, and outputs the motion vectors found to the motion compensation unit 27.

The basis conversion unit 31 converts the quantized prediction error information obtained by the quantization unit 23 into 8 × 8 DCT coefficients, working directly on DCT coefficients, and outputs the quantized 8 × 8 DCT coefficients to a variable-length coding unit 32.

The variable-length coding unit 32 variable-length codes the quantized 8 × 8 DCT coefficients obtained by the basis conversion unit 31 and the motion vectors obtained by the motion prediction unit 30, and outputs the result to a buffer 33.

The buffer 33 temporarily holds the data obtained by the variable-length coding unit 32 and sends it out. A rate control unit 34 determines the quantization parameter based on the amount of code held in the buffer 33 and thereby controls the amount of information generated by the quantization unit 23.

In code-domain processing such as resolution conversion, the conversion method can be switched depending on whether conversion speed or conversion accuracy has priority. In motion compensation in the code domain, the processing load depends on the values the motion information takes, so when speed has priority it is preferable to favor motion information values that carry a small processing load. To reduce the load further, the motion search can be omitted and the median of the motion information of the input compression-coded moving picture within the area being processed can be used as the motion information of the output compression-coded moving picture.
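The median shortcut can be sketched in a few lines. The sketch assumes that the median is taken separately for the horizontal and vertical components, which the patent does not specify, and uses the low median so that the result is always a value that actually occurs in the input. The function name is hypothetical.

```python
from statistics import median_low
from typing import Iterable, Tuple


def median_motion_vector(vectors: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    """Component-wise median of the input motion vectors covering one output area."""
    items = list(vectors)
    if not items:
        raise ValueError("at least one motion vector is required")
    xs = [x for x, _ in items]
    ys = [y for _, y in items]
    return median_low(xs), median_low(ys)
```

For example, the vectors (2, 0), (4, -2), and (100, 6) give (4, 0), so a single outlier does not pull the result the way an average would.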

When, in selecting or combining images, the whole area or only part of it is cut out without resolution conversion, it is enough to extract the transform coefficients of that area from the compression-coded moving picture. However, if the input compression-coded moving picture holds only motion-compensated residual components and motion information, the area may refer to data outside the cut-out area, so only the area concerned is converted to intra blocks in the code domain. That is, only the required area of the input compression-coded moving picture is converted into a different form.
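For the simple case in which the area consists of intra-coded blocks, cutting out amounts to selecting coefficient blocks. The sketch below assumes a frame held as a two-dimensional grid of coefficient blocks and a cut-out area aligned to block boundaries; the representation and the function name are assumptions for illustration.

```python
def crop_blocks(frame_blocks, top, left, rows, cols):
    """Extract a block-aligned area from a grid of intra-coded coefficient blocks.

    frame_blocks : list of rows, each a list of coefficient blocks
    top, left    : position of the area in block units
    rows, cols   : size of the area in block units
    The blocks themselves are not modified or decoded.
    """
    if top < 0 or left < 0 or top + rows > len(frame_blocks) \
            or left + cols > len(frame_blocks[0]):
        raise ValueError("cut-out area lies outside the frame")
    return [row[left:left + cols] for row in frame_blocks[top:top + rows]]
```

Inter-coded blocks cannot be handled this way, because their residuals depend on reference data that may lie outside the cut-out area; that is the case the conversion to intra blocks addresses.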

The conversion to intra blocks by inverse motion compensation in the code domain can use, for example, the method the present inventors proposed earlier in Japanese Patent Application No. 2004-5387 (Prior Application 2). FIG. 3 is a functional block diagram of this processing, and FIG. 4 is an explanatory diagram illustrating it.

In FIG. 3, a code information acquisition unit 35 partially decodes the code information of the transform-coded input compression-coded moving picture to obtain motion prediction information (motion vectors) and quantized prediction error information (differential DCT coefficients). The motion vectors extracted by the code information acquisition unit 35 are output to an inverse motion compensation unit 37, and the quantized differential DCT coefficients are output to an inverse quantization unit 36.

The inverse quantization unit 36 inversely quantizes the quantized differential DCT coefficients obtained by the code information acquisition unit 35. The differential DCT coefficients produced by the inverse quantization, which include the quantization error, are output to the inverse motion compensation unit 37.

The inverse motion compensation unit 37 includes a frame memory. Using the reference DCT coefficients of the preceding frame obtained from the frame memory, the motion vectors obtained by the code information acquisition unit 35, and the differential DCT coefficients (8 × 8 DCT coefficients) of the frame being processed obtained by the inverse quantization unit 36, it performs inverse motion compensation in the DCT coefficient domain and derives the intra DCT coefficients of the frame being processed.

FIG. 4 shows the relationship between the four sets of differential 8 × 8 DCT coefficients generated by the inverse quantization unit 36, that is, one macroblock, and the reference DCT coefficients of the preceding frame that are referenced according to the motion vector. R_00 to R_22 are each a block of 8 × 8 DCT coefficients.

In inverse motion compensation, the four sets of differential 8 × 8 DCT coefficients of one macroblock can straddle up to nine sets of 8 × 8 DCT coefficients R_00 to R_22 in the preceding frame. The four sets of differential 8 × 8 DCT coefficients must therefore be inverse motion compensated in the DCT coefficient domain from the motion vector and the nine sets of reference 8 × 8 DCT coefficients.

When both the horizontal and vertical components of the motion vector are multiples of 8 (including 0), a block (DCT coefficient block) coincides exactly with a block of the preceding frame and does not straddle several blocks, so it is enough to fetch the reference DCT coefficients at that location. Otherwise the block straddles several blocks, and its DCT coefficients must be obtained from those blocks and reconstructed.
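Which reference blocks a motion vector touches follows from integer division by the block size. The sketch below is an illustration with hypothetical names; it assumes integer-pel motion vectors and pixel coordinates measured from the top-left corner of the frame, and it does not check frame boundaries.

```python
def reference_blocks(left, top, mv_x, mv_y, size, block=8):
    """List the (row, col) indices of reference blocks covered by a displaced area.

    left, top  : pixel position of the area in the current frame
    mv_x, mv_y : motion vector in pixels
    size       : side length of the area (8 for one block, 16 for a macroblock)
    block      : side length of one DCT block
    """
    x0, y0 = left + mv_x, top + mv_y
    cols = range(x0 // block, (x0 + size - 1) // block + 1)
    rows = range(y0 // block, (y0 + size - 1) // block + 1)
    return [(r, c) for r in rows for c in cols]
```

With a motion vector whose components are multiples of 8, an 8 × 8 block maps onto exactly one reference block; with a vector such as (3, 5), a 16 × 16 macroblock covers nine reference blocks, matching the R_00 to R_22 arrangement described above.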

Prior Application 2 describes that inverse motion compensation can be performed using precomputed conversion tables, even though the computation differs for the horizontal and vertical components of the motion vector and for each value the motion vector can take. The conversion tables are derived from the expression for inverse motion compensation in the pixel domain. The method of Prior Application 1 can also be used to re-search motion information for inverse motion compensation in the code domain.

When the synthesizing means 10 (FIG. 1) combines a plurality of images, the input images, each resolution-converted at an arbitrary magnification, can be arranged so that they do not overlap, or so that some of them partly overlap. Part of an input image can also be cut out and placed without resolution conversion.

For a conference among four sites, for example, the four input images can each be scaled to 1/2 vertically and horizontally and arranged without overlap, or three input images scaled to 1/8 vertically and horizontally can be overlaid at arbitrary positions on one input image that is not resolution-converted. The overlay is realized by overwriting transform coefficients. Alternatively, a region half the height and half the width can be cut out of each of the four input images at an arbitrary position and the regions arranged without overlap.
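Overlaying by overwriting transform coefficients can be sketched as replacing blocks in a grid. The sketch assumes that each picture is held as a two-dimensional grid of intra-coded coefficient blocks and that the overlay position is aligned to block boundaries; the representation and the function name are assumptions for illustration.

```python
def paste_blocks(canvas, tile, top, left):
    """Return a copy of canvas with the blocks of tile written at (top, left).

    canvas, tile : grids (lists of rows) of coefficient blocks
    top, left    : overlay position in block units
    Blocks of the canvas under the tile are overwritten, not blended.
    """
    tile_rows = len(tile)
    tile_cols = len(tile[0]) if tile else 0
    if top < 0 or left < 0 or top + tile_rows > len(canvas) \
            or left + tile_cols > len(canvas[0]):
        raise ValueError("tile does not fit inside the canvas")
    out = [list(row) for row in canvas]
    for r, tile_row in enumerate(tile):
        for c, blk in enumerate(tile_row):
            out[top + r][left + c] = blk
    return out
```

Tiling four half-size pictures without overlap is the same operation applied four times to an empty canvas, once per quadrant.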

When resolution conversion is also used, in any of these arrangements the bit rate of the compressed encoded moving image transmitted by the server 2 can be made lower than the sum of the bit rates of the four received compressed encoded moving images.

Furthermore, the arrangement of the images can be changed not only spatially, as described above, but also over time. For example, displaying the current speaker in a larger window presents the image of interest with greater emphasis.

When the synthesizing means 10 combines input compressed encoded moving images that have been resolution-converted or obtained by cutting out a predetermined region, compression efficiency may fall because of discontinuities at the image boundaries. To improve compression efficiency, arbitrary information different from the input compressed encoded moving images can be made embeddable at the image boundaries in the code domain, and this information can be used to reduce the discontinuity there.

If the arbitrary information consists of transform coding coefficients encoded in advance and the boundary portion is replaced with them, the load of the synthesis process itself can be reduced. In addition, the arbitrary information needs to be substituted only in intra frames, and the same transform coding coefficients can be reused for all other frames, which improves compression efficiency. The resources saved by this improvement can be redirected to raising the image quality of other regions or to faster processing.

Information related to a terminal, information for displaying a decorative frame, and the like can be used as the arbitrary information. The arbitrary information can also be varied over time so that the image boundary changes. For example, giving the speaker's image a conspicuous decorative frame emphasizes the image of interest, which serves user convenience in addition to improving compression efficiency and speed.

The terminals 1-1, 1-2, 1-3, ... are usually located in different environments, in which case the images output by the cameras 3-1, 3-2, 3-3, ... differ in luminance (brightness), contrast, color tone, and so on. Combining images that differ in these respects into a single image yields a composite that lacks overall unity and is hard to view. To obtain a unified, easy-to-view image, it is preferable to correct luminance, contrast, color tone, and the like in the code domain.

When the average in-frame brightness differs among the input compressed encoded moving images, the brightness of each image can be corrected, and the brightness across the whole composite kept uniform, simply by adjusting the DC component of the transform coding coefficients in intra frames. To darken an image, its DC component is decreased; to brighten it, the DC component is increased.
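
The DC adjustment can be checked numerically. The sketch below assumes an orthonormal 2-D DCT-II on N x N blocks and unquantized coefficients, neither of which the text specifies. Under that assumption, adding `delta` to every pixel of a block changes only the DC coefficient, by `N * delta`. The function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def dct2(block: Matrix) -> Matrix:
    """Orthonormal 2-D DCT-II of an N x N pixel block (assumed transform)."""
    n = len(block)

    def scale(k: int) -> float:
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    return [
        [
            scale(u) * scale(v) * sum(
                block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                for x in range(n)
                for y in range(n)
            )
            for v in range(n)
        ]
        for u in range(n)
    ]


def shift_brightness(coeffs: Matrix, delta: float) -> Matrix:
    """Brighten (delta > 0) or darken (delta < 0) a block via its DC term only."""
    n = len(coeffs)
    out = [row[:] for row in coeffs]
    out[0][0] += n * delta  # a uniform pixel offset maps to N * delta in the DC term
    return out
```

With quantized coefficients the offset would additionally have to be divided by the DC quantization step, which limits how finely the brightness can be tuned.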

Contrast can be corrected by adjusting the AC components of the coding coefficients: the AC components are reduced to weaken contrast and increased to strengthen it.
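
As a minimal sketch of this correction, assuming a linear block transform with the DC term stored at index (0, 0) and unquantized coefficients: multiplying every AC coefficient by a gain leaves the block mean unchanged and scales each pixel's deviation from that mean by the same gain. The function name `scale_contrast` and the uniform gain are illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]


def scale_contrast(coeffs: Matrix, gain: float) -> Matrix:
    """Scale all AC coefficients by gain; the DC term at (0, 0) is kept as is.

    gain < 1 weakens contrast, gain > 1 strengthens it.
    """
    return [
        [c if (u, v) == (0, 0) else c * gain for v, c in enumerate(row)]
        for u, row in enumerate(coeffs)
    ]


# Example: halve the contrast of a 2 x 2 coefficient block.
softened = scale_contrast([[100.0, 8.0], [-4.0, 2.0]], 0.5)
```

A frequency-dependent gain could be substituted for the single factor if only certain detail levels are to be adjusted.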

Multiple images can also be combined smoothly and continuously so that they overlap semi-transparently. This is achieved by taking a weighted average, in the code domain, of the transform coding coefficients at the boundary.
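
Because a block transform such as the DCT is linear, a weighted average of two coefficient blocks equals the transform of the same weighted average of their pixels. The sketch below relies on that property. It assumes co-located blocks with unquantized (or identically quantized) coefficients, and the linear weight ramp in `crossfade` is one illustrative choice rather than a rule stated in the text.

```python
from typing import List

Block = List[float]


def blend_block(a: Block, b: Block, weight_a: float) -> Block:
    """Weighted average of two co-located coefficient blocks."""
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    return [weight_a * x + (1.0 - weight_a) * y for x, y in zip(a, b)]


def crossfade(blocks_a: List[Block], blocks_b: List[Block]) -> List[Block]:
    """Blend a run of overlapping blocks, fading from image A to image B."""
    n = len(blocks_a)
    out = []
    for i, (a, b) in enumerate(zip(blocks_a, blocks_b)):
        weight_a = 1.0 - (i + 1) / (n + 1)  # e.g. 0.75, 0.5, 0.25 for n = 3
        out.append(blend_block(a, b, weight_a))
    return out
```

Each block receives a single weight here, so the fade proceeds in block-sized steps; a smoother per-pixel ramp would need a different coefficient-domain operation.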

Brightness, contrast, color tone, and the like can also be manipulated in the code domain to emphasize only a specific region. For example, removing the color-difference components from the images of everyone except the speaker leaves only the speaker in color and so emphasizes the speaker. The amount of information that had been allocated to the removed color-difference components can be reallocated to the speaker's image to improve its quality.
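
The speaker emphasis described here can be sketched as follows. The `Macroblock` structure, the terminal identifiers used as dictionary keys, and the assumption that all-zero chroma coefficients decode to neutral gray (as with level-shifted chroma samples) are illustrative and not taken from the text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Macroblock:
    luma: List[List[int]]  # coefficient blocks of the luminance component
    cb: List[int]          # coefficients of the Cb color-difference block
    cr: List[int]          # coefficients of the Cr color-difference block


def strip_chroma(mb: Macroblock) -> Macroblock:
    """Return a copy of mb with both color-difference blocks zeroed."""
    return Macroblock(
        luma=[block[:] for block in mb.luma],
        cb=[0] * len(mb.cb),
        cr=[0] * len(mb.cr),
    )


def emphasize_speaker(
    streams: Dict[str, List[Macroblock]], speaker: str
) -> Dict[str, List[Macroblock]]:
    """Keep color only for the speaker's stream; others become monochrome."""
    return {
        terminal: mbs if terminal == speaker else [strip_chroma(mb) for mb in mbs]
        for terminal, mbs in streams.items()
    }
```

Zeroed chroma blocks cost very few bits after entropy coding, which is where the information budget for the speaker's image would come from.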

FIG. 5 shows specific examples of images composed by the synthesizing means 10; each combines four input images for a four-point conference. FIG. 5(a) is an example in which the four input images are each scaled by 1/2 vertically and horizontally and arranged without overlap. FIG. 5(b) is an example in which the four input images are each scaled by 1/4 horizontally and arranged without overlap. FIG. 5(c) is an example in which, of the four input images, an image scaled by 2/3 horizontally and images scaled by 1/3 vertically and horizontally are arranged without overlap. FIG. 5(d) is an example in which three images scaled by 1/4 vertically and horizontally are overlaid at arbitrary positions on one image that is not resolution-converted. The overlay is realized by overwriting the transform coding coefficients. In these composite images, a decorative frame, such as a striped pattern or solid black, is added at the image boundaries. The striped pattern can be scrolled along the pattern direction by means of motion vectors.

FIGS. 5(e) to 5(h) show further examples of composite images. As these show, the four images can be combined in various formats, and an arbitrary region can be cut out of an input image and combined. In these composite images, decorations are applied at the image boundaries and around the person in each image.

Although an embodiment has been described above, the present invention is not limited to that embodiment and can be modified in various ways without departing from its gist.

FIG. 1 is a block diagram showing an embodiment of the multipoint conference apparatus according to the present invention.
FIG. 2 is a functional block diagram of the previously proposed resolution conversion process in the code domain.
FIG. 3 is a functional block diagram of the previously proposed inverse motion compensation and intra-block conversion process in the code domain.
FIG. 4 is an explanatory diagram of inverse motion compensation.
FIG. 5 is a diagram showing specific examples of images composed by the synthesizing means.

Explanation of symbols

1-1 to 1-4: terminals; 2: server; 3-1 to 3-4: cameras; 4-1 to 4-4: encoders; 5-1 to 5-4: transceivers; 6-1 to 6-4: displays; 7-1 to 7-4: decoders; 8-1 to 8-4: transmitting/receiving means; 9: synthesis control means; 10: synthesizing means; 11-1 to 11-4: processing means; 21, 35: code information acquisition units; 22: subtractor; 23: quantization unit; 24, 36: inverse quantization units; 25: adder; 26: frame memory; 27: motion compensation unit; 28, 29: reduced image generation units; 30: motion prediction unit; 31: basis conversion unit; 32: variable length coding unit; 33: buffer; 34: rate control unit; 37: inverse motion compensation unit

Claims

A multipoint conference apparatus that receives a plurality of pieces of transform-coded information, combines them in the code domain into a single piece of transform-coded information, and transmits it, the apparatus comprising:
a plurality of processing means, each of which takes as input one of a plurality of encoded moving images compressed by transform coding and processes it in the code domain; and
a synthesizing means that combines the plurality of encoded moving images processed by the plurality of processing means into one encoded moving image.

The multipoint conference apparatus according to claim 1, wherein each of the plurality of processing means includes a processing unit that directly manipulates and processes the code information of an encoded moving image in the code domain.

The multipoint conference apparatus according to claim 2, wherein the processing unit converts code information of an encoded moving image of only a necessary area into another format.

The multipoint conference apparatus according to claim 2, wherein the processing unit performs at least one of a resolution conversion process in the code domain and a process of extracting a specific region to form an encoded moving image of a specified size.

The multipoint conference apparatus according to claim 2, wherein the processing unit switches its processing method according to whether conversion speed or conversion accuracy is prioritized.

The multipoint conference apparatus according to claim 1, wherein the synthesizing means includes a processing unit that directly manipulates and processes, in the code domain, the code information of the encoded moving images from the processing means.

The multipoint conference apparatus according to claim 6, wherein the synthesizing means can combine the plurality of encoded moving images processed by the processing means into one encoded moving image with an arbitrary spatial and temporal arrangement.

The multipoint conference apparatus according to claim 6 or 7, wherein the synthesizing means includes an embedding processing unit that can embed arbitrary information, in the code domain, at the image boundaries when combining the plurality of encoded moving images into one encoded moving image.

The multipoint conference apparatus according to claim 8, wherein the embedding processing unit can select, as the arbitrary information, information that minimizes the amount of code generated after embedding.

The multipoint conference apparatus according to claim 8, wherein the embedding processing unit can manipulate the arbitrary information in the code domain so that it changes temporally and spatially.

The multipoint conference apparatus according to claim 6, wherein the synthesizing means includes a correction processing unit that can correct at least one of luminance, contrast, and color tone in the code domain.