JP2011109350A - Stereoscopic video encoder

Info

Publication number: JP2011109350A
Application number: JP2009261487A
Authority: JP
Inventors: Shigeki Mochizuki; 成記望月
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2009-11-17
Filing date: 2009-11-17
Publication date: 2011-06-02

Abstract

PROBLEM TO BE SOLVED: To alleviate concentration of accesses on a shared memory for encoding.

SOLUTION: A left eye image encoding unit (101) encodes a left eye video signal by any one of intra-picture encoding, inter-picture forward predictive encoding, and inter-picture bidirectional predictive encoding. A right eye image encoding unit (102) encodes a right eye video signal by any one of intra-picture encoding, inter-picture forward predictive encoding, and inter-picture bidirectional predictive encoding. The left eye image encoding unit (101) and the right eye image encoding unit (102) share a memory (104) via a memory I/F (103). A picture type control unit (105) controls the picture types of encoding in the left eye image encoding unit (101) and the right eye image encoding unit (102) so that the timings of inter-picture bidirectional predictive encoding do not coincide.

COPYRIGHT: (C) 2011, JPO & INPIT

Description

The present invention relates to a stereoscopic video encoding apparatus.

Patent Document 1: JP 2002-252859 A

Conventionally, a twin-lens stereoscopic television generates a left-eye image and a right-eye image captured from two different directions by two cameras, and combines them on the same screen for display as a stereoscopic video. If the left-eye image and the right-eye image of such a stereoscopic video are each recorded separately, the amount of data is twice that of a conventional video.

As a method of reducing the overall amount of data, it has been proposed to compression-encode a sub-image (for example, the right-eye image) using parallax-compensated prediction, which exploits the inter-view redundancy between the sub-image and a main image (for example, the left-eye image) (Patent Document 1). The main image is encoded using motion-compensated prediction. Both parallax-compensated prediction and motion-compensated prediction can be applied to the sub-image: an evaluation value of the prediction accuracy of each is calculated, and one of the two is selected according to those values.

Encoding the left-eye image and the right-eye image of a twin-lens stereoscopic video requires roughly twice the processing of conventional encoding. When the memory that stores the reference images used in encoding each image is shared, the transmission bandwidth required of that memory also increases. Memory access is most concentrated when inter-picture bidirectional predictive encoding is performed on the left-eye image and the right-eye image at the same time, and the memory must then be accessed with a correspondingly high-frequency clock. This is also undesirable from the viewpoint of power consumption.

An object of the present invention is to provide a stereoscopic video encoding apparatus that alleviates this drawback.

A stereoscopic video encoding apparatus according to the present invention encodes a left-eye video signal and a right-eye video signal, and comprises: left-eye image encoding means for encoding the left-eye video signal; right-eye image encoding means for encoding the right-eye video signal; memory means, shared by the left-eye image encoding means and the right-eye image encoding means, for storing data associated with encoding; and picture type control means for controlling the picture type of encoding in the left-eye image encoding means and the right-eye image encoding means to any one of intra-picture encoding, inter-picture forward predictive encoding, and inter-picture bidirectional predictive encoding. The picture type control means controls the picture type so that the timing of inter-picture bidirectional predictive encoding does not coincide between the left-eye image encoding means and the right-eye image encoding means.

According to the present invention, controlling the picture types so that inter-picture bidirectional predictive encoding does not occur simultaneously in left-eye image encoding and right-eye image encoding eases the concentration of accesses to the memory means. For example, the memory can be driven with a lower-frequency clock, and a reduction in the power consumed by encoding can be expected.

FIG. 1 is a schematic block diagram of a stereoscopic video encoding apparatus according to a first embodiment of the present invention.
FIG. 2 is a schematic block diagram of the left-eye image encoding unit (and the right-eye image encoding unit).
FIG. 3 is a diagram showing the GOP structures of the left-eye image and the right-eye image.
FIG. 4 is a schematic block diagram of a second embodiment of the present invention.
FIG. 5 is a diagram showing an example of the encoding timing of the left-eye image and the right-eye image in the second embodiment.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

FIG. 1 is a schematic block diagram of a stereoscopic video encoding apparatus according to an embodiment of the present invention. The left-eye image encoding unit 101, which encodes the left-eye video signal, and the right-eye image encoding unit 102, which encodes the right-eye video signal, share the memory 104 via the memory I/F 103. Data associated with encoding is temporarily stored in the memory 104. The memory I/F 103 arbitrates memory accesses from the left-eye image encoding unit 101 and the right-eye image encoding unit 102, and performs reads from and writes to the memory 104. The left-eye image encoding unit 101 and the right-eye image encoding unit 102 encode the left-eye image and the right-eye image, respectively, using the picture type instructed by the picture type control unit 105, which controls the picture types used in encoding. In this embodiment there are three picture types: the intra-picture coded picture (I picture), the inter-picture forward predictive coded picture (P picture), and the inter-picture bidirectional predictive coded picture (B picture).

The left-eye image encoding unit 101 and the right-eye image encoding unit 102 conform to the MPEG-4 AVC (ISO/IEC 14496-10) standard and have the same configuration. FIG. 2 is a schematic block diagram of the left-eye image encoding unit 101 and the right-eye image encoding unit 102. The image encoding unit shown in FIG. 2 processes the picture to be encoded in units of macroblocks, each a 16×16 pixel block.

The prediction method determination unit 201 determines the prediction method for each macroblock in the picture to be encoded, according to the picture type instructed by the picture type control unit 105. Specifically, the prediction method determination unit 201 performs simplified intra prediction, or inter prediction including motion detection, using the input video signal and the encoded pixel values read from the memory 104, and determines the prediction method that gives the best coding efficiency. When the macroblock to be encoded belongs to an I slice, it determines the intra prediction pixel block size and the prediction mode. For a P slice or a B slice, it selects whichever of intra prediction and inter prediction has the higher coding efficiency. When intra prediction is selected, it determines the intra prediction encoding parameters, such as the intra prediction pixel block size and the intra prediction mode. When inter prediction is selected, it determines the inter prediction encoding parameters, such as the reference image frame, the macroblock partition pattern, and the motion vectors.

The prediction processing unit 202 generates a predicted image from the encoded image read from the memory 104, according to the prediction encoding parameters specified by the prediction method determination unit 201, and outputs the predicted image to the local decoding unit 204. The prediction processing unit 202 also generates a prediction residual signal, which is the difference between the image to be encoded (pixel block) and the predicted image, and outputs it to the orthogonal transform quantization unit 203.

The orthogonal transform quantization unit 203 performs orthogonal transform processing, consisting of an integer-precision discrete cosine transform and a discrete Hadamard transform, in units of the specified pixel block (8×8 or 4×4 pixels). The discrete Hadamard transform is applied to the direct current (DC) components obtained from certain integer-precision discrete cosine transforms. For the luminance component, it is applied to the DC components when intra prediction has been performed in units of 16×16 pixel blocks. For the color difference signals, it is applied to the DC components resulting from the integer-precision discrete cosine transform of each pixel block.
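The two-stage transform described above can be sketched as follows. This is a minimal illustration, not the encoder of the patent: the 4×4 matrices are the forward core transform and Hadamard matrices as commonly given for H.264/AVC, the function names are hypothetical, and scaling and quantization are deliberately left out.

```python
# Sketch only: matrices as commonly given for H.264/AVC; scaling and
# quantization are omitted. All names here are illustrative assumptions.
CF = [  # 4x4 integer approximation of the DCT (forward core transform)
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]
H4 = [  # 4x4 Hadamard matrix applied to the DC coefficients
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
    [1, -1, 1, -1],
]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def core_transform(block):
    """Unscaled 4x4 integer transform: CF * X * CF^T."""
    return matmul(matmul(CF, block), transpose(CF))


def hadamard_dc(dc):
    """Unscaled 4x4 Hadamard transform of a matrix of DC coefficients."""
    return matmul(matmul(H4, dc), transpose(H4))


# A flat 4x4 residual block has energy only in its DC coefficient.
flat_block = [[3] * 4 for _ in range(4)]
coeffs = core_transform(flat_block)

# A 16x16 intra-predicted luma macroblock holds sixteen 4x4 blocks; their
# DC coefficients form a 4x4 matrix that is transformed a second time.
dc_matrix = [[coeffs[0][0]] * 4 for _ in range(4)]
dc_coeffs = hadamard_dc(dc_matrix)
print(coeffs[0][0], dc_coeffs[0][0])
```

For a flat macroblock, the second stage again concentrates everything into a single coefficient, which is why the extra Hadamard pass pays off for smooth regions.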

The orthogonal transform quantization unit 203 quantizes the transform coefficients obtained by the orthogonal transform with a quantization step that corresponds to the specified quantization parameter, and supplies the quantized transform coefficient data to the entropy encoding unit 205 and the local decoding unit 204.

The local decoding unit 204 applies inverse quantization and an inverse orthogonal transform to the quantized transform coefficient data from the orthogonal transform quantization unit 203, then adds the predicted image data from the prediction processing unit 202 to perform local decoding. The image data decoded in this way is stored in the memory 104 and used for subsequent intra prediction processing. In addition, the decoded image data after deblocking filter processing is also stored in the memory 104 and used for subsequent inter prediction processing.

The entropy encoding unit 205 entropy-encodes the quantized transform coefficient data from the orthogonal transform quantization unit 203. One available entropy coding method is context-based adaptive variable length coding (CAVLC). Alternatively, context-based adaptive binary arithmetic coding (CABAC) may be employed. The data encoded by the entropy encoding unit 205 is supplied to the multiplexing processing unit 206.

The multiplexing processing unit 206 multiplexes system data (slice headers and the like, not shown) onto the encoded video data from the entropy encoding unit 205, and outputs the result as encoded data.

Of the I, P, and B pictures, encoding as a B picture, which can reference two pictures, generates the most accesses to the memory 104. For example, consider the case shown in FIG. 3(a), in which the left-eye image and the right-eye image are encoded with the same GOP (Group Of Pictures) structure. Assume further that, relative to the amount of data transferred to the memory 104 when encoding an I picture, the amount transferred when encoding a P picture is twice as large and the amount transferred when encoding a B picture is four times as large. As the lower graph of FIG. 3(a) shows, memory accesses are then heavily concentrated at the B-picture encoding timings.
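The traffic model assumed above (I = 1, P = 2, B = 4 in relative units) can be turned into a small calculation. The sketch below is illustrative only: the GOP pattern string is a hypothetical example, not the one drawn in FIG. 3(a), and each string position stands for one encoding time slot.

```python
# Relative memory traffic per picture type, as assumed in the text.
COST = {"I": 1, "P": 2, "B": 4}


def combined_load(left, right):
    """Per-slot traffic when both encoders run in lockstep on shared memory."""
    return [COST[l] + COST[r] for l, r in zip(left, right)]


# Hypothetical GOP pattern used for both eyes (not taken from FIG. 3).
same_gop = "IBPBPBPB"
load = combined_load(same_gop, same_gop)
print(load)       # per-slot combined traffic
print(max(load))  # peak that the shared memory must sustain
```

With identical patterns, every B slot costs 4 + 4 = 8 units, so the shared memory has to be provisioned for a peak eight times the traffic of a single I picture.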

Therefore, the picture type control unit 105 controls the picture types using GOP structures in which B-picture encoding does not occur simultaneously for the left-eye image and the right-eye image. FIG. 3(b) shows an example of the GOP structures of the left-eye image and the right-eye image as controlled by the picture type control unit 105. In this example, a P picture is placed at the third frame of the right-eye image so that the left-eye image and the right-eye image never reach a B-picture encoding timing at the same time.

Encoding the left-eye image and the right-eye image with different GOP structures in this way levels out the memory accesses made during encoding. That is, as the lower graph of FIG. 3(b) shows, the maximum transmission bandwidth required of the memory 104 can be kept lower than in FIG. 3(a).
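As a rough check of this leveling effect, the sketch below compares the peak combined traffic for identical and for differing GOP structures, using the relative costs assumed earlier (I = 1, P = 2, B = 4). The pattern strings and function names are hypothetical and are not a transcription of FIG. 3.

```python
# Relative memory traffic per picture type, as assumed in the text.
COST = {"I": 1, "P": 2, "B": 4}


def b_collisions(left, right):
    """Slots in which both eyes would be encoded as B pictures."""
    return [i for i, (l, r) in enumerate(zip(left, right))
            if l == "B" and r == "B"]


def peak_load(left, right):
    """Largest combined per-slot traffic on the shared memory."""
    return max(COST[l] + COST[r] for l, r in zip(left, right))


# Hypothetical patterns: the right eye gets a different GOP structure.
left = "IBPBPBPB"
right_same = "IBPBPBPB"
right_different = "IPBPBPBP"

print(b_collisions(left, right_same), peak_load(left, right_same))
print(b_collisions(left, right_different), peak_load(left, right_different))
```

Under these assumptions the peak drops from 8 units to 6, because a B picture in one stream is always paired with an I or P picture in the other.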

FIG. 4 is a schematic block diagram of a second embodiment of the present invention. The left-eye image encoding unit 401, which encodes the left-eye image, and the right-eye image encoding unit 402, which encodes the right-eye image, share the memory 404 via the memory I/F 403. The memory I/F 403 arbitrates memory accesses from the left-eye image encoding unit 401 and the right-eye image encoding unit 402, and performs reads from and writes to the memory 404. The left-eye image encoding unit 401 and the right-eye image encoding unit 402 encode the left-eye image and the right-eye image, respectively, using the picture type instructed by the picture type control unit 405, which controls the picture types used in encoding. In this embodiment there are three picture types: the intra-picture coded picture (I picture), the inter-picture forward predictive coded picture (P picture), and the inter-picture bidirectional predictive coded picture (B picture).

The details of the left-eye image encoding unit 401 and the right-eye image encoding unit 402 are the same as those of the left-eye image encoding unit 101 and the right-eye image encoding unit 102, so their description is omitted.

The picture type control unit 405 controls the left-eye image encoding unit 401 and the right-eye image encoding unit 402, as described below, so that B-picture encoding does not occur simultaneously for the left-eye image and the right-eye image.

FIG. 5 shows an example of the encoding timing of the left-eye image and the right-eye image under the control of the picture type control unit 405 when left-eye image encoding and right-eye image encoding use the same GOP structure. In this example, the encoding timing of the right-eye image is delayed by one picture. As a result, the B-picture encoding timings of the left-eye image and the right-eye image never coincide. Besides a delay of one picture, the encoding timing of either the left-eye image encoding or the right-eye image encoding may instead be delayed by another predetermined number of pictures, such as three or five.

Encoding the left-eye image and the right-eye image at different timings in this way levels out the memory accesses made during encoding. As the lower graph of FIG. 5 shows, the maximum transmission bandwidth required of the shared memory can be kept lower than in FIG. 3(a).
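The delay-based control of this embodiment can be sketched in the same style. The example below assumes a hypothetical GOP pattern in which B pictures occupy every other slot, marks idle slots with "-", and searches for the delays that avoid simultaneous B-picture encoding. None of the names or patterns come from FIG. 5.

```python
from itertools import zip_longest

# Relative memory traffic per picture type, as assumed in the text;
# "-" marks a slot in which an encoder is idle.
COST = {"I": 1, "P": 2, "B": 4, "-": 0}


def timeline(left, right, delay):
    """Pair up slots with the right-eye stream delayed by `delay` pictures."""
    return list(zip_longest(left, "-" * delay + right, fillvalue="-"))


def has_b_collision(left, right, delay):
    return any(l == "B" and r == "B" for l, r in timeline(left, right, delay))


def peak_load(left, right, delay):
    return max(COST[l] + COST[r] for l, r in timeline(left, right, delay))


gop = "IBPBPBPB"  # hypothetical GOP pattern shared by both eyes

safe_delays = [d for d in range(6) if not has_b_collision(gop, gop, d)]
print(safe_delays)
print(peak_load(gop, gop, 0), peak_load(gop, gop, 1))
```

For this alternating pattern only odd delays separate the B pictures, which is consistent with the delays of one, three, or five pictures mentioned in the text. A GOP with a different B-picture spacing would need its own set of safe delays.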

101: Left-eye image encoding unit
102: Right-eye image encoding unit
104: Memory
105: Picture type control unit

Claims

1. A stereoscopic video encoding device that encodes a left-eye video signal and a right-eye video signal, comprising:
left-eye image encoding means for encoding the left-eye video signal;
right-eye image encoding means for encoding the right-eye video signal;
memory means, shared by the left-eye image encoding means and the right-eye image encoding means, for storing data associated with encoding; and
picture type control means for controlling the picture type of encoding in the left-eye image encoding means and the right-eye image encoding means to any one of intra-picture encoding, inter-picture forward predictive encoding, and inter-picture bidirectional predictive encoding,
wherein the picture type control means controls the picture type so that the timing of inter-picture bidirectional predictive encoding does not coincide between the left-eye image encoding means and the right-eye image encoding means.

2. The stereoscopic video encoding device according to claim 1, wherein the picture type control means performs control so that the GOP structures of the encoded video data generated by the left-eye image encoding means and the right-eye image encoding means differ from each other.

3. The stereoscopic video encoding device according to claim 1, wherein, when the GOP structures of the encoded video data generated by the left-eye image encoding means and the right-eye image encoding means are the same, the picture type control means performs control so as to delay the encoding timing of one of the left-eye image encoding means and the right-eye image encoding means by a predetermined number of pictures.