JP4222557B2

JP4222557B2 - Video composition device

Info

Publication number: JP4222557B2
Application number: JP2004005387A
Authority: JP
Inventors: 晴久加藤; 康之中島
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2004-01-13
Filing date: 2004-01-13
Publication date: 2009-02-12
Anticipated expiration: 2024-01-13
Also published as: JP2005203847A

Description

The present invention relates to a video composition device, and more particularly to a video composition device that can combine a plurality of moving images into a single image at high speed and with high accuracy in the coded domain.

There are two types of video conference systems. In one, terminals exchange images and other information directly, without a server. In the other, the information is exchanged via a server. In the serverless type, each time a user is added, a line must be connected between that user and every other user.

In the server-based type, by contrast, the system can be built simply by connecting each terminal to the server, however many users there are. Video conference systems that link multiple sites therefore generally use the server-based type.

Patent Documents 1 to 3 describe video conference systems that combine a plurality of coded images into one for distribution. The received coded images are first decoded to the pixel domain. The decoded pixels are then rearranged onto a single screen, and the resulting image is re-encoded and sent out.

Patent Documents 4 to 6 are not about video conferencing. They describe elemental technologies for a server that processes coded images in the coded domain, including approximation processing, without decoding them to the pixel domain.

Patent Document 1: JP 2003-102009 A
Patent Document 2: JP 2003-163903 A
Patent Document 3: JP 2003-219309 A
Patent Document 4: JP 2000-350207 A
Patent Document 5: JP 2001-103482 A
Patent Document 6: JP 2003-348598 A

A video conference system with no server between the terminals is easy to set up. However, each new user's terminal must be connected by a line to every other user's terminal, so the approach is impractical for linking more than two sites.

In a video conference system with a server between the terminals, the server must do more than redistribute the images received from multiple sites. It must also process each received image and distribute it in a form suited to each connection environment.

For example, combining the images received from multiple sites and reconstructing them as a single image can be expected to make the conference run more smoothly. It can also reduce control-information overhead and allow the bit rate and frame rate to be converted adaptively. Furthermore, when the images from multiple sites are combined into one, it is desirable to be able to enlarge only the current speaker within the composite screen, or to encode only the speaker at high image quality.

The techniques of Patent Documents 1 to 3 all decode the received coded images to the pixel domain, rearrange them onto one screen in the pixel domain, and then encode and send out the reconstructed image. They therefore incur large delays from two processing loads: decoding each coded image to the pixel domain, and re-encoding the image composed in the pixel domain.

Video conference systems have strict real-time requirements, and only about 0.3 seconds of one-way delay can be tolerated for comfortable use. Some delay on the transmission path is unavoidable, so the delay added by server processing must be kept as small as possible. A large server delay is a fatal problem.

Applying the techniques of Patent Documents 4 to 6 avoids decoding and re-encoding each coded image and so reduces the server's processing load. However, their processing of the coded information is an approximation. The resulting image quality is therefore worse than that of the conventional method, which first decodes to the pixel domain.

An object of the present invention is to solve the above problems of the prior art. It does so by providing a video composition device that combines a plurality of transform-coded moving images at high speed and with high accuracy to produce a single transform-coded image.

To solve the above problems, the present invention provides a video composition device that reconstructs transform-coded images. The device comprises the following units:

- a code information extraction unit that partially decodes the coded information of transform-coded moving images and extracts quantized prediction error information and motion prediction information;
- an inverse quantization unit that inversely quantizes the quantized prediction error information extracted by the code information extraction unit to generate prediction error information;
- an inverse motion compensation unit that uses the prediction error information from the inverse quantization unit and the motion prediction information from the code information extraction unit. In the coded domain, it executes, collectively per motion compensation unit, processing whose result is equivalent to inverse motion compensation in the pixel domain. It reduces the computation by exploiting the symmetry of the products of the constant matrices involved, and so reconstructs transform coding information;
- a synthesis unit that combines, in the coded domain, the transform coding information reconstructed for the individual images into one;
- a motion compensation unit that performs motion compensation in the coded domain on the combined transform coding information, using the extracted motion prediction information, to generate prediction error information and motion prediction information;
- a quantization unit that quantizes the prediction error information generated by the motion compensation unit to generate quantized prediction error information; and
- an encoding unit that encodes and outputs the quantized prediction error information from the quantization unit and the motion prediction information from the motion compensation unit.

The basic feature of the invention is that it outputs all or part of a plurality of transform-coded moving images as one transform-coded image.

According to the present invention, a plurality of coded images can be combined in the coded domain, without decoding them to the pixel domain, to reconstruct a single coded image. Decoding to the pixel domain and re-encoding are no longer needed, so the processing load is far lower than in methods that compose the image by manipulating pixels. In addition, careful design of the arithmetic in the composition process, particularly in inverse motion compensation and motion compensation, enables high-speed processing with less computation. Finally, the image quality is higher than that of methods that process information in the coded domain, such as those of Patent Documents 4 to 6, and equivalent to that of pixel-manipulation methods.

The present invention is described in detail below with reference to the drawings. FIG. 1 is a block diagram showing an embodiment of the video composition device according to the present invention. The video composition device of this embodiment comprises a code information extraction unit 1, an inverse quantization unit 2, an inverse motion compensation unit 3, a synthesis unit 4, a motion compensation unit 5, a quantization unit 6, and an encoding unit 7.

First, the code information extraction unit 1 partially decodes the coded information of a plurality of moving images compressed by transform coding. It extracts motion prediction information (motion vectors) and quantized prediction error information (differential DCT coefficients).

This extraction need not be performed on every input moving image; it is enough to perform it on the moving images that will appear in the output image. Likewise, it need not cover the whole of an input moving image. For example, extraction can be applied only to part of the image, such as the face region, so that only that part appears in the output image.

The motion vectors extracted by the code information extraction unit 1 are output to the inverse motion compensation unit 3 and the motion compensation unit 5. The quantized differential DCT coefficients are output to the inverse quantization unit 2.

The inverse quantization unit 2 inversely quantizes the quantized differential DCT coefficients obtained by the code information extraction unit 1. The resulting differential DCT coefficients, which contain quantization error, are output to the inverse motion compensation unit 3. The inverse quantization unit 2 may also select and output only the differential DCT coefficients needed for the output image.

The inverse motion compensation unit 3 includes a frame memory. It uses three inputs: the reference DCT coefficients of the previous frame, read from the frame memory; the motion vectors extracted by the code information extraction unit 1; and the differential DCT coefficients (8x8 DCT coefficients) of the current frame, obtained from the inverse quantization unit 2. From these it performs inverse motion compensation in the DCT coefficient domain and derives the complete DCT coefficients of the current frame.

The DCT coefficients of the images derived by the inverse motion compensation unit 3 are input to the synthesis unit 4, which combines them into one frame. The DCT coefficients needed for the output image may also be selected and combined at this stage.
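After inverse motion compensation, every input consists of complete DCT blocks, so the composition step itself can be a block-aligned copy. The minimal sketch below assumes that each input has already been made a whole number of 8x8 blocks and is placed at a block-aligned offset. The dictionary layout and the name `compose` are hypothetical.

```python
# Hypothetical layout: a frame maps (block_row, block_col) to an 8x8 block of
# DCT coefficients (a list of 8 rows of 8 floats).
Block = list[list[float]]
Frame = dict[tuple[int, int], Block]


def compose(sources: list[tuple[Frame, int, int]]) -> Frame:
    """Place each source's DCT blocks into one composite frame.

    sources: (frame, row_offset, col_offset), offsets in 8x8-block units, so
    every block is copied unchanged and no pixel-domain step is needed.
    """
    out: Frame = {}
    for frame, r0, c0 in sources:
        for (r, c), block in frame.items():
            key = (r + r0, c + c0)
            if key in out:
                raise ValueError(f"sources overlap at block {key}")
            out[key] = block
    return out
```

For example, two inputs of 2x2 blocks placed at column offsets 0 and 2 produce a composite of 2x4 blocks. Overlapping placements raise `ValueError`.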

The motion compensation unit 5 reuses the motion vectors obtained by the code information extraction unit 1 as the motion vectors of the output image. It performs motion compensation on the DCT coefficients obtained by the synthesis unit 4. This DCT-domain motion compensation is the inverse of the operation performed by the inverse motion compensation unit 3. The motion vectors from the motion compensation unit 5 are output to the encoding unit 7, and the motion-compensated DCT coefficients are output to the quantization unit 6.

The quantization unit 6 quantizes the DCT coefficients obtained by the motion compensation unit 5 and outputs the quantized DCT coefficients to the encoding unit 7. The encoding unit 7 encodes and outputs the quantized DCT coefficients from the quantization unit 6 and the motion vectors from the motion compensation unit 5.
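The source does not specify the quantizer. The pair below illustrates what units 2 and 6 do, using a hypothetical uniform step size; actual MPEG-family codecs define their own reconstruction rules. Requantizing with a different step is one way the output bit rate could be adapted.

```python
import math


def dequantize(levels: list[int], step: float) -> list[float]:
    """Inverse quantization (unit 2): each level maps back to level * step."""
    return [level * step for level in levels]


def quantize(coeffs: list[float], step: float) -> list[int]:
    """Quantization (unit 6): nearest multiple of `step`, ties away from zero."""
    return [int(math.copysign(math.floor(abs(c) / step + 0.5), c)) for c in coeffs]
```

For example, `quantize(dequantize([3, -2, 0, 7], 4.0), 8.0)` returns `[2, -1, 0, 4]`, which codes the same coefficients with a coarser step.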

Next, a specific example of the processing in the inverse motion compensation unit 3 is described. FIG. 2 shows the relationship between two things: the four sets of differential 8x8 DCT coefficients (that is, one macroblock) generated by the inverse quantization unit 2, and the reference DCT coefficients of the previous frame that the motion vector points to. R00 to R22 are each a block of 8x8 DCT coefficients.

In inverse motion compensation, the four sets of differential 8x8 DCT coefficients of one macroblock may straddle up to nine sets of 8x8 DCT coefficients, R00 to R22, in the previous frame. The example below therefore covers inverse motion compensation, in the DCT coefficient domain, of four sets of differential 8x8 DCT coefficients from a motion vector and nine sets of reference 8x8 DCT coefficients.

First, suppose both the horizontal and vertical components of the motion vector are multiples of 8 (including 0). Then each DCT coefficient block coincides exactly with a block of the previous frame and does not straddle several blocks. The reference DCT coefficients at the corresponding position are simply fetched, and processing ends. Otherwise the block straddles several blocks, and its DCT coefficients must be gathered from those blocks and reconstructed.
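Which reference blocks a macroblock needs follows directly from the displaced pixel position. The sketch below assumes integer-pel vectors given as (vertical, horizontal) pixel offsets and macroblock coordinates in 16-pixel units; both conventions and the function names are hypothetical. Floor division handles negative vectors. The size of R is 8 times the number of block rows by 8 times the number of block columns, so it is 16x16, 24x16, 16x24, or 24x24.

```python
N = 8  # DCT block size


def reference_blocks(mb_row: int, mb_col: int, mv_y: int, mv_x: int):
    """Block rows and columns of the previous frame overlapped by the 16x16
    macroblock at (mb_row, mb_col) displaced by (mv_y, mv_x) pixels."""
    top = 2 * N * mb_row + mv_y
    left = 2 * N * mb_col + mv_x
    # Floor division maps negative pixel positions to the correct block index.
    rows = list(range(top // N, (top + 2 * N - 1) // N + 1))
    cols = list(range(left // N, (left + 2 * N - 1) // N + 1))
    return rows, cols


def window_shape(mv_y: int, mv_x: int):
    """Shape of R: 16 along a component that is a multiple of 8, 24 otherwise."""
    return (2 * N if mv_y % N == 0 else 3 * N,
            2 * N if mv_x % N == 0 else 3 * N)
```

The remainders i and j used in the text correspond to `mv % 8`. In Python this is always in 0..7, even for negative vectors.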

If neither the horizontal nor the vertical component of the motion vector is a multiple of 8, the nine sets of reference 8x8 DCT coefficients R_mn (m, n = 0, 1, 2) that contain the region the motion vector points to are expressed as one 24x24 matrix R, as in equation (1).
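Equation (1) is not reproduced in this text. From the description, it presumably stacks the nine 8x8 blocks of FIG. 2 row by row, taking R_mn as the block in row m and column n:

$$
R = \begin{pmatrix} R_{00} & R_{01} & R_{02}\\ R_{10} & R_{11} & R_{12}\\ R_{20} & R_{21} & R_{22} \end{pmatrix}
$$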

Writing the pixel-domain (baseband) inverse motion compensation of the four sets of differential 8x8 DCT coefficients D in matrix form gives the DCT coefficient matrix Y for the motion vector (u, v), shown in equation (2). Here "0" denotes an all-zero matrix.

Here T_n denotes the n-th order DCT matrix, and the superscript t denotes transposition. i and j are the remainders of the motion vector components u and v, respectively, divided by the order of the DCT. The order in equation (2) is 8, so i and j each range from 1 to 7; a remainder of 0 is the aligned case described above. V_i and H_j are 16x24 and 24x16 shift matrices, respectively. They vary with the input motion vector so as to extract 16x16 elements, and for integer-pixel motion vectors they are given by equations (3) and (4). Applying V_i and H_j separately corresponds to extracting the region of reference DCT coefficients for each motion vector component.
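Equation (2) is not reproduced in this text. The reading below is consistent with the definitions above, under two assumptions: the DCT coefficients of an 8x8 block X are T_8 X T_8^t, and D stacks the four differential blocks into a 16x16 matrix. Working outward from R:

- the block-diagonal matrices next to R invert the DCT of the nine reference blocks;
- V_i and H_j cut out the 16x16 displaced window;
- the outer block-diagonal matrices take the DCT of the window's four 8x8 blocks;
- D adds the prediction error.

$$
Y = \begin{pmatrix} T_8 & 0\\ 0 & T_8 \end{pmatrix} V_i
\begin{pmatrix} T_8^t & 0 & 0\\ 0 & T_8^t & 0\\ 0 & 0 & T_8^t \end{pmatrix}
R
\begin{pmatrix} T_8 & 0 & 0\\ 0 & T_8 & 0\\ 0 & 0 & T_8 \end{pmatrix}
H_j
\begin{pmatrix} T_8^t & 0\\ 0 & T_8^t \end{pmatrix} + D
$$

The dimensions chain as 16x16, 16x24, 24x24, 24x24, 24x24, 24x16, 16x16, so Y is 16x16, that is, four 8x8 DCT blocks.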

Here E_n denotes the n x n identity matrix. 0_i and 0_j denote a 16 x i zero matrix and a j x 16 zero matrix, respectively. The 16x16 identity matrix E_16 inside the 16x24 matrix V_i therefore shifts left or right according to the horizontal motion vector component modulo 8. Likewise, the E_16 inside the 24x16 matrix H_j shifts up or down according to the vertical component modulo 8. Each has only seven possible nonzero remainders, so all of these matrices can be computed in advance and stored.
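Equations (3) and (4) are likewise not reproduced here. From the description of 0_i, 0_j, and E_16, they presumably read as follows. The trailing zero blocks are whatever width or height brings the total to 24.

$$
V_i = \begin{pmatrix} 0_i & E_{16} & 0_{16\times(8-i)} \end{pmatrix},\qquad
H_j = \begin{pmatrix} 0_j\\ E_{16}\\ 0_{(8-j)\times 16} \end{pmatrix}
$$

With these definitions, V_i X H_j is the 16x16 window of a 24x24 matrix X that starts at row i and column j. Also, H_j = V_j^t.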

When the motion vector has half-pixel precision, V_i and H_j are given by equations (5) and (6), respectively. They can be extended in the same way for other precisions, such as quarter-pixel motion vectors.

The half-pixel generation matrix of 0s, 1/2s, and 1s in equation (5) is a 16x24 matrix, and the one in equation (6) is a 24x16 matrix. In either case, every matrix in equation (2) other than the DCT coefficient matrix R is a constant matrix. Precomputing and storing these matrices therefore reduces the number of operations.

Furthermore, computing the products of the constant matrices in equation (2) reveals the symmetry shown in equation (7). Exploiting this symmetry reduces the computation in inverse motion compensation. Here A_i, A′_i, A_j, and A′_j are 8x8 matrices. A_i and A′_i can be precomputed and stored as conversion tables for the horizontal component, and A_j and A′_j as conversion tables for the vertical component. These tables depend on the motion vector precision.
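Equation (7) is not reproduced here either. The following plain-Python sketch reconstructs one reading of it numerically. It assumes three things:

- T_8 is the orthonormal DCT-II;
- V_i is the integer-pel shift matrix of equation (3);
- A_i and A′_i are the nonzero 8x8 blocks of diag(T_8, T_8) V_i diag(T_8^t, T_8^t, T_8^t).

All function names are hypothetical. Under these assumptions the product has the block pattern [[A_i, A′_i, 0], [0, A_i, A′_i]]. Since H_j = V_j^t, the right-hand product in (2) is the transpose of the same pattern, so the tables for j are the transposes of those for i.

```python
import math

N = 8  # DCT order


def dct_matrix(n: int) -> list[list[float]]:
    """Orthonormal DCT-II matrix T_n; coefficients of a block X are T X T^t."""
    return [[(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
             * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
             for x in range(n)] for k in range(n)]


def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def block_diag(block, count):
    """diag(block, ..., block) with `count` copies on the diagonal."""
    n = len(block)
    out = [[0.0] * (n * count) for _ in range(n * count)]
    for b in range(count):
        for r in range(n):
            out[b * n + r][b * n:(b + 1) * n] = block[r]
    return out


def shift_v(i: int) -> list[list[float]]:
    """Integer-pel V_i = (0_i  E_16  0): 16x24, row r has its 1 in column r + i."""
    return [[1.0 if c == r + i else 0.0 for c in range(3 * N)] for r in range(2 * N)]


def constant_product(i: int) -> list[list[float]]:
    """diag(T8, T8) V_i diag(T8^t, T8^t, T8^t), a 16x24 constant matrix."""
    t = dct_matrix(N)
    return matmul(matmul(block_diag(t, 2), shift_v(i)), block_diag(transpose(t), 3))


def sub(m, r0, c0):
    """The 8x8 block of m whose top-left element is (r0, c0)."""
    return [row[c0:c0 + N] for row in m[r0:r0 + N]]


def conversion_tables(i: int):
    """(A_i, A'_i): the two distinct nonzero 8x8 blocks of constant_product(i)."""
    m = constant_product(i)
    return sub(m, 0, 0), sub(m, 0, N)


# One table pair per nonzero remainder i = 1..7, computed once and stored.
TABLES = {i: conversion_tables(i) for i in range(1, N)}
```

Only 14 8x8 matrices (seven pairs) need to be stored for each direction at integer-pel precision. This is consistent with the small memory footprint claimed below.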

Applying equation (7) to equation (2) gives equation (8) for the DCT coefficient matrix Y after inverse motion compensation in the DCT coefficient domain.
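Equation (8) is not reproduced in this text. One consistent reading rests on an assumption: the constant products in (2) reduce to block matrices of the 8x8 tables, with A_i and A′_i on the left and the transposed tables for j on the right. Under that assumption, (8) and the blockwise expansion obtained by substituting (1) take this form:

$$
Y = \begin{pmatrix} A_i & A'_i & 0\\ 0 & A_i & A'_i \end{pmatrix} R
\begin{pmatrix} A_j^t & 0\\ (A'_j)^t & A_j^t\\ 0 & (A'_j)^t \end{pmatrix} + D
$$

$$
Y_{kl} = A_i\left(R_{k,l}A_j^t + R_{k,l+1}(A'_j)^t\right) + A'_i\left(R_{k+1,l}A_j^t + R_{k+1,l+1}(A'_j)^t\right) + D_{kl},\qquad k, l \in \{0, 1\}
$$

Here Y_kl and D_kl are the four 8x8 blocks of Y and D. If the tables for j are stored already transposed, the superscript t disappears.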

Substituting equation (1) for R and expanding equation (8) yields equation (9). The bracketed expressions in equation (9) share common parts, given by equations (10) and (11). Computing (10) and (11) in advance and holding the results reduces the computation. Moreover, the only conversion matrices to store are A_i, A′_i, A_j, and A′_j, so little memory is needed.
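The sharing can be checked numerically against a direct pixel-domain computation. The sketch below uses hypothetical names and plain Python. It assumes A_i = T_8 S_i T_8^t and A′_i = T_8 L_i T_8^t, where S_i has ones at (r, r + i) and L_i has ones at (r, r + i − 8); these are the 8x8 blocks that an integer-pel V_i produces. Each output block is computed as Y_kn = A_i P[k][n] + A′_i P[k+1][n] + D_kn, with P[m][n] = R_{m,n} A_j^t + R_{m,n+1} (A′_j)^t. The middle row P[1][n] is therefore computed once and used for both output rows. This text does not confirm whether these shared partial products are exactly what (10) and (11) denote.

```python
import math

N = 8


def dct_matrix(n: int) -> list[list[float]]:
    """Orthonormal DCT-II matrix T_n; coefficients of a block X are T X T^t."""
    return [[(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
             * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
             for x in range(n)] for k in range(n)]


def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def tables(i: int):
    """Hypothetical integer-pel tables A_i = T S_i T^t and A'_i = T L_i T^t."""
    t = dct_matrix(N)
    tt = transpose(t)
    s_i = [[1.0 if c == r + i else 0.0 for c in range(N)] for r in range(N)]
    l_i = [[1.0 if c == r + i - N else 0.0 for c in range(N)] for r in range(N)]
    return matmul(matmul(t, s_i), tt), matmul(matmul(t, l_i), tt)


def inverse_mc(R, D, i: int, j: int):
    """DCT-domain inverse motion compensation of one macroblock.

    R: 3x3 grid of 8x8 reference DCT blocks; D: 2x2 grid of differential blocks.
    Returns the 2x2 grid of blocks Y_kn = A_i P[k][n] + A'_i P[k+1][n] + D_kn.
    """
    a_i, a2_i = tables(i)
    a_j, a2_j = tables(j)
    a_jt, a2_jt = transpose(a_j), transpose(a2_j)
    # P[m][n] = R_{m,n} A_j^t + R_{m,n+1} A'_j^t; row m = 1 serves both k = 0 and k = 1.
    P = [[add(matmul(R[m][n], a_jt), matmul(R[m][n + 1], a2_jt)) for n in range(2)]
         for m in range(3)]
    return [[add(add(matmul(a_i, P[k][n]), matmul(a2_i, P[k + 1][n])), D[k][n])
             for n in range(2)] for k in range(2)]
```

Counting 8x8 products, the shared version needs 12 for P plus 8 for Y, or 20 in total. Forming each Y_kn independently would need 24.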

In general, the submatrices A and A′ (A_i and A′_i, or A_j and A′_j) are closely related, and computation can be reduced by exploiting the symmetry between their elements. FIGS. 3(a) and 3(b) show this relationship for i = 1 and i = 2, that is, between A_1 and A′_1 and between A_2 and A′_2. In these figures, the marks mean the following:

- "+" marks positions where the elements of A_i and A′_i are identical;
- "−" marks positions where only the sign differs;
- a blank marks positions where the elements are entirely different;
- "0" marks positions where the element is 0.

Define four matrices:

- A'_i contains only the elements that are identical in A_i and A′_i (note that A'_i, with a plain apostrophe, is distinct from A′_i);
- A"_i contains only the elements that are equal in absolute value but opposite in sign;
- 'A_i contains the remaining elements of A_i;
- "A_i contains the remaining elements of A′_i.

A_i and A′_i can then be rewritten as equation (12). Equation (12) splits A_i and A′_i into several matrices according to the symmetry captured by A'_i, A"_i, 'A_i, and "A_i, which reduces the computation.
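Equations (12) and (13) are not reproduced here. To keep the matrix of identical elements distinct from A′_i, the LaTeX below writes A^{+}_i and A^{-}_i for A'_i and A"_i, and B_i and B'_i for 'A_i and "A_i. The definitions above then give:

$$
A_i = A^{+}_i + A^{-}_i + B_i,\qquad A'_i = A^{+}_i - A^{-}_i + B'_i
$$

Assume that (10) has the form A_i X + A′_i Z for some 8x8 matrices X and Z, which fits the statement that it contains two products. Substitution then gives four products, which is presumably the shape of (13):

$$
A_i X + A'_i Z = A^{+}_i\,(X + Z) + A^{-}_i\,(X - Z) + B_i X + B'_i Z
$$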

Substituting equation (12) into equation (10) yields equation (13).

At first glance, equation (13) seems to double the number of matrix products from two to four. However, A'_i, A"_i, 'A_i, and "A_i are all sparse matrices (they contain many zero elements), so the computation actually decreases. For example, when i = 1, the matrix A'_i, built only from the elements that are identical in A_i and A′_i, has 12 nonzero elements and 52 zeros. Similarly, A"_i, 'A_i, and "A_i have only 21, 31, and 31 nonzero elements, respectively. Equation (10) then requires 1024 multiplications (two dense 8x8 products, 2 × 512). Equation (13) requires only 760 (8 × (12 + 21 + 31 + 31)), a reduction of about 25%. This example concerns equation (10), but the computation of equations (9) and (11) can be reduced in the same way. In short, equation (13) exploits the symmetry among A'_i, A"_i, 'A_i, and "A_i, derived from A_i and A′_i, to group common terms and reduce computation.
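The count can be reproduced under explicit assumptions. The sketch below builds A_i and A′_i as T_8 S_i T_8^t and T_8 L_i T_8^t, where S_i has ones at (r, r + i) and L_i has ones at (r, r + i − 8); these are the blocks produced by an integer-pel shift. It splits them into the four matrices defined above, called `plus`, `minus`, `rest`, and `rest2` here for A'_i, A"_i, 'A_i, and "A_i. It then counts 8 multiplications per nonzero entry of each 8x8 left factor. Whether its counts match FIG. 3 exactly depends on the DCT normalization and sign conventions of the original, so treat the printed numbers as illustrative.

```python
import math

N = 8
TOL = 1e-9


def dct_matrix(n: int) -> list[list[float]]:
    """Orthonormal DCT-II matrix T_n; coefficients of a block X are T X T^t."""
    return [[(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
             * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
             for x in range(n)] for k in range(n)]


def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def tables(i: int):
    """Hypothetical integer-pel tables A_i = T S_i T^t and A'_i = T L_i T^t."""
    t = dct_matrix(N)
    tt = transpose(t)
    s_i = [[1.0 if c == r + i else 0.0 for c in range(N)] for r in range(N)]
    l_i = [[1.0 if c == r + i - N else 0.0 for c in range(N)] for r in range(N)]
    return matmul(matmul(t, s_i), tt), matmul(matmul(t, l_i), tt)


def zeros():
    return [[0.0] * N for _ in range(N)]


def decompose(a, a2):
    """Split (A_i, A'_i) into plus, minus, rest, rest2 such that
    A_i = plus + minus + rest and A'_i = plus - minus + rest2."""
    plus, minus, rest, rest2 = zeros(), zeros(), zeros(), zeros()
    for r in range(N):
        for c in range(N):
            x, y = a[r][c], a2[r][c]
            if abs(x - y) < TOL:        # identical (FIG. 3 "+", or both zero)
                plus[r][c] = x
            elif abs(x + y) < TOL:      # same magnitude, opposite sign ("-")
                minus[r][c] = x
            else:                       # unrelated (blank)
                rest[r][c], rest2[r][c] = x, y
    return plus, minus, rest, rest2


def nnz(m):
    return sum(1 for row in m for v in row if abs(v) > TOL)


def sparse_mults(i: int) -> int:
    """Multiplies for plus(X+Z) + minus(X-Z) + rest X + rest2 Z when zero entries
    of the 8x8 left factors are skipped: N per nonzero entry."""
    return N * sum(nnz(m) for m in decompose(*tables(i)))


DENSE_MULTS = 2 * N ** 3  # A_i X + A'_i Z with two dense 8x8 products: 1024

if __name__ == "__main__":
    for i in range(1, N):
        print(f"i={i}: {sparse_mults(i)} vs {DENSE_MULTS}")
```

Whatever the exact numbers, the sparse count can never exceed the dense count of 1024. Adding X + Z and X − Z costs only additions.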

As FIGS. 3(a) and 3(b) show, the distribution of symmetry depends on i. Even values of i have more elements with matching absolute values than odd values, and multiples of 4 have more than other even values, so the reduction in computation is larger for them. Therefore, when i and j differ, the overall computation is reduced by performing the extraction first for the component with more factors of 2.
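The source reads this dependence on i off FIG. 3. Part of it can be derived under the following assumptions:

- A_i = T_8 S_i T_8^t and A′_i = T_8 L_i T_8^t, where S_i (ones at (r, r + i)) and L_i (ones at (r, r + i − 8)) are the integer-pel shift blocks;
- T_8 is the orthonormal DCT-II;
- J is the 8x8 exchange matrix;
- Σ = diag(1, −1, 1, −1, 1, −1, 1, −1).

Each DCT basis vector is either even or odd about the block centre, so T_8 J = Σ T_8. Also L_i = J S_{8−i} J. Together these give:

$$
A'_i = T_8 J S_{8-i} J T_8^t = \Sigma\, T_8 S_{8-i} T_8^t\, \Sigma = \Sigma A_{8-i} \Sigma
$$

For i = 4 this yields A′_4 = Σ A_4 Σ. Every element of A′_4 therefore equals the corresponding element of A_4 up to the sign (−1)^{k+l}, so 'A_4 and "A_4 vanish entirely. This is consistent with the statement that multiples of 4 benefit most. The identity alone does not explain the even-versus-odd ranking for other values of i.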

Besides reducing computation, the above processing is equivalent to inverse motion compensation in the pixel domain and gives the same result. It therefore preserves image quality equivalent to decoding to the pixel domain and performing inverse motion compensation there. In fact, pixel-domain inverse motion compensation must round its result to integers and store it after each step, such as the inverse DCT and inverse motion compensation. These repeated roundings degrade image quality slightly. The above processing is performed in a single batch with fewer rounding operations, so degradation is kept to a minimum.

When one component of the motion vector (horizontal or vertical) is a multiple of 8 and the other is not, the processing for the multiple-of-8 component can be skipped. For example, R becomes a 24x16 matrix when the horizontal component is a multiple of 8, and a 16x24 matrix when the vertical component is a multiple of 8. Either reduces the amount of computation.

The processing in the motion compensation unit 5, that is, motion compensation in the DCT coefficient domain, is realized by performing the inverse of the above inverse motion compensation for the output image, using conversion tables and the like.

An embodiment has been described above, but the present invention is not limited to it and can be modified in various ways. For example, the synthesis unit 4 can scale an image during composition, depending on the frame size of the output image and the relative frame sizes of the input images. It can reduce an image by extracting the low-frequency components of the 8x8 DCT coefficients, or enlarge one by inserting zeros as high-frequency components of the 8x8 DCT coefficients.
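Low-frequency extraction and zero insertion can be sketched per block as follows. This assumes the orthonormal DCT-II, since the source does not state a normalization. Under that assumption, the kept corner must be scaled by 1/2 and the zero-padded block by 2, so that an inverse transform of the new order returns pixels at the original amplitude. The helper names are hypothetical.

```python
import math


def dct_matrix(n: int) -> list[list[float]]:
    """Orthonormal DCT-II matrix T_n; coefficients of a block X are T X T^t."""
    return [[(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
             * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
             for x in range(n)] for k in range(n)]


def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def dct2(x):
    """Forward 2-D DCT of a square block of any order."""
    t = dct_matrix(len(x))
    return matmul(matmul(t, x), transpose(t))


def idct2(c):
    """Inverse 2-D DCT of a square coefficient block of any order."""
    t = dct_matrix(len(c))
    return matmul(matmul(transpose(t), c), t)


def shrink_block(c):
    """Keep the low-frequency half-order corner; 1/2 = (sqrt(1/2))**2 keeps amplitude."""
    h = len(c) // 2
    return [[0.5 * v for v in row[:h]] for row in c[:h]]


def enlarge_block(c):
    """Append zero high-frequency terms up to twice the order; 2 = (sqrt(2))**2."""
    n = len(c)
    out = [[0.0] * (2 * n) for _ in range(2 * n)]
    for r in range(n):
        for col in range(n):
            out[r][col] = 2.0 * c[r][col]
    return out
```

For a flat 8x8 block of value 10, `idct2(shrink_block(dct2(block)))` gives a flat 4x4 block of 10, and `idct2(enlarge_block(dct2(block)))` gives a flat 16x16 block of 10.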

When the frame size of an input image is scaled in this way, the motion vectors and DCT coefficients must be made consistent with the new size. For example, after reduction, a single motion vector is estimated for each coding unit from the several motion vectors it now covers. Motion compensation is then performed in the DCT coefficient domain on the low-order DCT coefficients. The motion vector can be estimated in several ways: as the median of the motion vectors in the region the coding unit occupies; as a weighted average using, for example, the activity of the coding units as weights; or as a weighted average using the area each occupies as weights.
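The two estimators can be sketched as follows for the 1/2 reduction case. The component-wise median and the rescaling by the reduction ratio (`scale`) are assumptions, since the source names the estimators but does not give formulas. The function names are hypothetical.

```python
from statistics import median


def median_mv(mvs, scale=0.5):
    """Component-wise median of the covered vectors, rescaled to the reduced frame."""
    return (scale * median(u for u, _ in mvs), scale * median(v for _, v in mvs))


def weighted_mv(mvs, weights, scale=0.5):
    """Weighted mean of the covered vectors; weights may be activity or covered area."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    u = sum(w * mu for w, (mu, _) in zip(weights, mvs)) / total
    v = sum(w * mv for w, (_, mv) in zip(weights, mvs)) / total
    return (scale * u, scale * v)
```

For example, `median_mv([(4, 2), (6, 2), (8, -2), (100, 2)])` returns `(3.5, 1.0)`, so the outlier 100 does not drag the estimate. A weighted mean would be pulled toward it.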

Also, after reduction only the low-order DCT coefficients have been extracted. The order n of the DCT matrix T_n used in the motion compensation unit 5 must therefore be replaced by an order that matches the reduction ratio. For example, when the frame size is halved, the fourth-order DCT matrix T_4 is used. In that case it is enough to hold only three shift matrices per direction, for shifts of 1, 2, and 3, in both the vertical and horizontal directions. The motion-compensated DCT coefficients are then converted from low-order DCT coefficients into coefficients of the order used for the output image. For this conversion, the basis conversion technique described in an earlier application (Japanese Patent Application No. 2003-372223) can be used.

The present invention is not limited to DCT coefficients; it applies equally to transform coding information produced by other transform coding schemes. It enables images to be composed in the coded domain at high speed and with high accuracy. It can therefore be applied effectively to systems that process and distribute multiple images, such as multipoint video conference systems.

FIG. 1 is a block diagram showing an embodiment of the video composition device according to the present invention.
FIG. 2 is an explanatory diagram showing the relationship between the differential DCT coefficients and the reference DCT coefficients used in inverse motion compensation.
FIG. 3 is an explanatory diagram of the submatrices used in inverse motion compensation.

Explanation of symbols

1: code information extraction unit
2: inverse quantization unit
3: inverse motion compensation unit
4: synthesis unit
5: motion compensation unit
6: quantization unit
7: encoding unit

Claims

1. A video composition device for reconstructing transform-coded images, comprising:
a code information extraction unit that partially decodes coded information of transform-coded moving images and extracts quantized prediction error information and motion prediction information;
an inverse quantization unit that inversely quantizes the quantized prediction error information extracted by the code information extraction unit to generate prediction error information;
an inverse motion compensation unit that, using the prediction error information generated by the inverse quantization unit and the motion prediction information extracted by the code information extraction unit, executes in the coded domain, collectively per motion compensation unit, processing whose result is equivalent to inverse motion compensation in the pixel domain, performs that computation with an amount of calculation reduced by the symmetry of the products of the matrices that become constant matrices, and reconstructs transform coding information;
a synthesis unit that combines the plural pieces of transform coding information reconstructed by the inverse motion compensation unit into one piece of transform coding information in the coded domain;
a motion compensation unit that performs motion compensation in the coded domain, using the transform coding information combined by the synthesis unit and the motion prediction information extracted by the code information extraction unit, to generate prediction error information and motion prediction information;
a quantization unit that quantizes the prediction error information generated by the motion compensation unit to generate quantized prediction error information; and
an encoding unit that encodes and outputs the quantized prediction error information generated by the quantization unit and the motion prediction information generated by the motion compensation unit,
wherein the video composition device outputs all or part of a plurality of transform-coded moving images as one transform-coded image.

2. The video composition device according to claim 1, wherein the coding information extraction unit partially decodes all or part of the coding information of the transform-coded moving images according to the output image, and extracts the quantized prediction error information and the motion prediction information.

3. The video composition device according to claim 1, wherein the inverse quantization unit inverse-quantizes the prediction error information required for the output image.

4. The video composition device according to claim 1, wherein the inverse motion compensation unit reconstructs transform coding information in the coded domain from the motion prediction information, the prediction error information, and the transform coding information to be referenced.

5. The video composition device according to claim 4, wherein the inverse motion compensation unit uses a different conversion table for each component of the motion prediction information and for each value that the motion prediction information can take.

6. The video composition device according to claim 5, wherein the inverse motion compensation unit extracts the region of the transform coding information to be referenced separately for each component of the motion prediction information.

7. The video composition device according to claim 6, wherein, when extracting the region of the transform coding information to be referenced for each component of the motion prediction information, the inverse motion compensation unit performs the extraction starting from the component of the motion prediction information that has the larger factor of 2.

8. The video composition device according to claim 5, wherein the conversion tables used by the inverse motion compensation unit can be extended according to the precision of the motion prediction information.

9. The video composition device according to claim 5, wherein the conversion tables used by the inverse motion compensation unit are calculated and stored in advance.

10. The video composition device according to claim 5, wherein the inverse motion compensation unit is configured to reduce the overall amount of calculation in the inverse motion compensation by calculating common terms of the calculation process in advance.

11. The video composition device according to claim 5, wherein the inverse motion compensation unit is configured to reduce the overall amount of calculation by taking into account local symmetry within the sub-elements of the conversion tables.

12. The video composition device according to claim 11, wherein the inverse motion compensation unit decomposes the conversion tables according to the local symmetry within their sub-elements and uses the decomposed tables.

13. The video composition device according to claim 12, wherein the inverse motion compensation unit is configured to reduce the number of multiplications by exploiting local symmetry within the sub-elements of the conversion tables and calculating common terms together.

14. The video composition device according to claim 1, wherein the motion compensation unit performs motion compensation in the coded domain from motion prediction information, transform coding information, and reference transform coding information to generate prediction error information.

15. The video composition device according to claim 14, wherein the motion compensation unit performs the inverse operation of the inverse motion compensation.

16. The video composition device according to claim 14, wherein the motion compensation unit uses a conversion table for the motion compensation processing and reconstructs the motion prediction information according to the output image.

17. The video composition device according to claim 1, wherein the synthesis unit combines the prediction error information required for the output image.
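Claim 8 states that the conversion tables can be extended according to the precision of the motion prediction information. One plausible reading, assumed here, is half-pel motion vectors with bilinear averaging, ignoring the integer rounding that real codecs apply. In a coded-domain formulation where a reference block is the sum of V_a B_ab V_b^T over 0/1 partial matrices V_a, linearity makes the table for a half-pel displacement o + 1/2 the average of the integer-pel tables for o and o + 1. The table therefore needs one extra integer entry, o = n, whose first matrix is zero and whose second is the identity. The sketch assumes 8x8 blocks and an orthonormal DCT-II; all names are hypothetical and it is not the patent's own table layout.

```python
import math
import random

N = 8  # assumed block size


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix C, so that T(X) = C X C^T."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def add(a, b):
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def scaled(a, s):
    return [[s * v for v in row] for row in a]


def transform(x, c):
    return matmul(matmul(c, x), transpose(c))


def selection_pair(offset, n):
    """Integer-pel partial matrices for 0 <= offset <= n; offset n selects
    the neighbouring block entirely (first = 0, second = identity)."""
    first = [[1.0 if j == i + offset else 0.0 for j in range(n)] for i in range(n)]
    second = [[1.0 if j == i + offset - n else 0.0 for j in range(n)] for i in range(n)]
    return first, second


def build_halfpel_tables(c, n):
    """Tables keyed by displacement h in half-pel units (h / 2 pixels), 0 <= h < 2n.
    Odd entries average the two neighbouring integer-pel entries."""
    integer = [[transform(m, c) for m in selection_pair(o, n)] for o in range(n + 1)]
    tables = {}
    for h in range(2 * n):
        o = h // 2
        if h % 2 == 0:
            tables[h] = integer[o]
        else:
            tables[h] = [scaled(add(p, q), 0.5) for p, q in zip(integer[o], integer[o + 1])]
    return tables


def dct_domain_reference(blocks, hy, hx, tables):
    """Same product form as the integer-pel case, with half-pel table entries."""
    rows, cols = tables[hy], tables[hx]
    n = len(blocks[0][0])
    acc = [[0.0] * n for _ in range(n)]
    for a in range(2):
        for b in range(2):
            acc = add(acc, matmul(matmul(rows[a], blocks[a][b]), transpose(cols[b])))
    return acc


def pixel_halfpel_reference(pixels, hy, hx, n):
    """Bilinear half-pel prediction taken directly from pixels, without rounding."""
    def sample(y2, x2):  # position in half-pel units
        ys = [y2 // 2] if y2 % 2 == 0 else [y2 // 2, y2 // 2 + 1]
        xs = [x2 // 2] if x2 % 2 == 0 else [x2 // 2, x2 // 2 + 1]
        return sum(pixels[y][x] for y in ys for x in xs) / (len(ys) * len(xs))
    return [[sample(hy + 2 * i, hx + 2 * j) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    c = dct_matrix(N)
    tables = build_halfpel_tables(c, N)
    rng = random.Random(3)
    pixels = [[rng.randint(0, 255) for _ in range(2 * N)] for _ in range(2 * N)]
    blocks = [[transform([row[b * N:(b + 1) * N] for row in pixels[a * N:(a + 1) * N]], c)
               for b in range(2)] for a in range(2)]
    coded = dct_domain_reference(blocks, 3, 10, tables)
    direct = transform(pixel_halfpel_reference(pixels, 3, 10, N), c)
    print(max(abs(u - v) for ru, rv in zip(coded, direct) for u, v in zip(ru, rv)))
```

Because the 2-D bilinear average is separable, averaging the row tables and the column tables independently reproduces the four-sample average when both components are half-pel.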
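Claims 10 to 13 state that the calculation is reduced by precomputing common terms and by exploiting local symmetry in the conversion tables, but this section does not say which terms or symmetries are meant. The Python sketch below shows two generic reductions that follow from a partial-matrix formulation, in which a reference block is the sum of V_a B_ab V_b^T over 0/1 selection matrices. They are illustrative assumptions, not the patent's own decomposition. First, factoring the common row and column tables turns eight matrix products into six. Second, the second selection matrix for displacement o equals the transpose of the first selection matrix for n - o, so only half of the tables need to be stored. Block size 8 and an orthonormal DCT-II are assumed; the names are hypothetical.

```python
import math
import random

N = 8  # assumed block size


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix C, so that T(X) = C X C^T."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def add(a, b):
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def transform(x, c):
    return matmul(matmul(c, x), transpose(c))


def first_selection(offset, n):
    """0/1 matrix moving rows offset..n-1 of a block to rows 0..n-1-offset."""
    return [[1.0 if j == i + offset else 0.0 for j in range(n)] for i in range(n)]


def build_half_tables(c, n):
    """Precompute T(first) for every displacement; the 'second' tables are derived."""
    return [transform(first_selection(o, n), c) for o in range(n)]


def table_pair(half, offset, n):
    """(T(first), T(second)) for one displacement. second(o) = first(n - o)^T for
    0 < o < n and is zero for o = 0, so T(second(o)) = T(first(n - o))^T."""
    if offset == 0:
        return half[0], [[0.0] * n for _ in range(n)]
    return half[offset], transpose(half[n - offset])


def factored_reference(blocks, dy, dx, half, n):
    """R0 (B00 K0 + B01 K1) + R1 (B10 K0 + B11 K1), where R_a are the row tables
    for dy and K_b the transposed column tables for dx: 6 products instead of 8."""
    r0, r1 = table_pair(half, dy, n)
    k0, k1 = (transpose(m) for m in table_pair(half, dx, n))
    upper = add(matmul(blocks[0][0], k0), matmul(blocks[0][1], k1))
    lower = add(matmul(blocks[1][0], k0), matmul(blocks[1][1], k1))
    return add(matmul(r0, upper), matmul(r1, lower))


if __name__ == "__main__":
    c = dct_matrix(N)
    half = build_half_tables(c, N)
    rng = random.Random(2)
    pixels = [[rng.randint(0, 255) for _ in range(2 * N)] for _ in range(2 * N)]
    blocks = [[transform([row[b * N:(b + 1) * N] for row in pixels[a * N:(a + 1) * N]], c)
               for b in range(2)] for a in range(2)]
    coded = factored_reference(blocks, 4, 1, half, N)
    direct = transform([r[1:1 + N] for r in pixels[4:4 + N]], c)
    print(max(abs(u - v) for ru, rv in zip(coded, direct) for u, v in zip(ru, rv)))
```

When a displacement component is 0, its second table is zero, so a real implementation could also skip those products entirely.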
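The synthesis unit combines several reconstructed inputs into one piece of transform coding information, and claim 1 allows all or part of each input to be used. This section does not describe the screen layout. Assume a block-aligned mosaic with no scaling, where every input has already been turned into self-contained per-block coefficients by inverse motion compensation. Under those assumptions, coded-domain synthesis reduces to cropping and rearranging blocks, as in this Python sketch (function names hypothetical; blocks may be any coefficient representation).

```python
from typing import List, Sequence, TypeVar

B = TypeVar("B")  # one block of transform coefficients, in any representation


def crop_grid(grid: List[List[B]], top: int, left: int,
              height: int, width: int) -> List[List[B]]:
    """Select a block-aligned part of one input (the 'part of' case)."""
    if top < 0 or left < 0 or top + height > len(grid) or left + width > len(grid[0]):
        raise ValueError("crop window lies outside the input grid")
    return [row[left:left + width] for row in grid[top:top + height]]


def compose_grids(layout: Sequence[Sequence[List[List[B]]]]) -> List[List[B]]:
    """Tile block grids into one composite grid; layout[r][c] is the input
    placed at tile row r, tile column c."""
    composite: List[List[B]] = []
    for tile_row in layout:
        height = len(tile_row[0])
        if any(len(grid) != height for grid in tile_row):
            raise ValueError("inputs in one tile row must have the same height in blocks")
        for r in range(height):
            # Concatenate block row r of every input in this tile row.
            composite.append([blk for grid in tile_row for blk in grid[r]])
    if any(len(row) != len(composite[0]) for row in composite):
        raise ValueError("composite rows must all have the same width in blocks")
    return composite


if __name__ == "__main__":
    def labelled(name, rows, cols):
        return [[f"{name}{r}{c}" for c in range(cols)] for r in range(rows)]

    a, b, c, d = (labelled(x, 2, 2) for x in "ABCD")
    for row in compose_grids([[a, b], [c, d]]):
        print(" ".join(row))
```

Run as a script, it prints a 4x4 grid of block labels whose first row is `A00 A01 B00 B01` and whose last row is `C10 C11 D10 D11`. Composing reconstructed coefficients rather than the original residuals presumably avoids having to keep each input's motion vectors valid inside the mosaic; in claim 1 the motion compensation unit then regenerates prediction error and motion prediction information for the composite.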