JP2015033105A - Motion picture encoding device

Info

Publication number: JP2015033105A
Application number: JP2013163752A
Authority: JP
Inventors: Yukifumi Kobayashi (小林 幸史)
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2013-08-07
Filing date: 2013-08-07
Publication date: 2015-02-16

Abstract

PROBLEM TO BE SOLVED: To provide a moving image encoding device capable of effectively reducing the amount of data bus access without significantly degrading image quality. SOLUTION: A search range buffer 102 stores reference image data, read from a reference frame buffer 211, within the search range for motion vector detection. A fractional pixel position predicted image creation unit 103 creates a predicted image at a fractional pixel position using a 6-tap filter. A fractional pixel position predicted image creation unit 104 creates a predicted image at a fractional pixel position using a 2-tap filter. As the predicted image at a fractional pixel position, a selection unit 105 selects the output of the creation unit 103 when encoding I and P pictures and the output of the creation unit 104 when encoding B pictures.

Description

The present invention relates to a moving image encoding apparatus that compresses and encodes moving image data, and more specifically to a moving image encoding apparatus that performs motion vector detection in units of coding blocks.

In recent years, the resolution and frame rate of moving images have continued to increase. For example, digital video cameras that handle so-called 60P video, that is, 60 frames/second progressive video, as HD (High Definition) images of 1920 × 1080 pixels are already on the market.

This trend toward higher resolution and higher frame rates is expected to continue. In the future, devices are expected to handle 4096 × 2160 pixel images (so-called 4K images), 7680 × 4320 pixel images (so-called Super Hi-Vision), and high-frame-rate video such as 120 frames/second.

When such high-resolution and/or high-frame-rate image data is encoded, the amount of image memory access and the amount of processing increase with the number of pixels and/or the frame rate.

Various compression encoding methods have been proposed for moving image data, and motion-compensated prediction has become the mainstream approach in terms of both coding efficiency and image quality. A representative example is the H.264 encoding method. H.264 is used in AVCHD, a high-definition recording format for video cameras, and in One-Seg terrestrial digital broadcasting, and is in widespread use.

The H.264 encoding method divides an image into 16 × 16 pixel units called macroblocks and encodes the image macroblock by macroblock. That is, the main encoding processes, such as motion vector detection, frequency transform, quantization, and variable-length coding, are executed per macroblock.

Of these processes, motion vector detection involves the largest amount of processing and the largest amount of image memory access. Motion vector detection performs block matching between a reference image and the encoding target image for the macroblock being encoded, and detects the position where the two match best as the motion vector. Performing motion compensation with the detected motion vector improves coding efficiency.

In the H.264 encoding method, motion vectors are specified with 1/2-pixel and 1/4-pixel accuracy in addition to integer-pixel accuracy. Pixels at these fractional pixel positions are generated by interpolating pixels at integer pixel positions. In MPEG-2 and similar methods, a pixel at a 1/2-pixel position is generated by linearly interpolating two pixels at integer positions, whereas H.264 applies a 6-tap filter to the six integer-position pixels surrounding the 1/2-pixel position. With the 6-tap filter, the reference image must be read from image memory over a range wider than the motion search range, so the amount of data access is larger than with MPEG-2.

The increase in data access greatly increases the amount of data flowing on the data bus. For high-resolution, high-frame-rate images, for example, the bus becomes a bottleneck and the desired performance cannot be achieved. To avoid this bottleneck, access to the data bus must be reduced as far as possible.

As a method of reducing data bus access, Patent Document 1 describes reducing the amount of reference image data read during motion vector detection. Specifically, during motion vector detection, the interpolation used to create the predicted image is simplified at the edges of the search range by using a filter with fewer taps.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2007-124238 (JP 2007-124238 A)

In conventional motion-compensated predictive encoding, the predicted image used to calculate the difference from the encoding target image is generated by a predicted image creation unit, separately from the predicted image creation performed for motion vector detection. Moreover, that unit generates the predicted image with the 6-tap filter specified by the standard. Therefore, even if the motion vector detection means uses a filter with fewer taps to create its predicted image, the predicted image creation unit may still need pixels outside the search range to create its own predicted image. The reduction in data bus access is therefore limited.

An object of the present invention is to provide a moving image encoding apparatus that can effectively reduce the amount of data bus access.

A moving image encoding apparatus according to the present invention comprises: motion prediction means for performing motion prediction of a moving image with fractional pixel accuracy based on a reference image, the motion prediction means outputting a motion vector of an encoding target image in the moving image relative to the reference image, a predicted image obtained from the reference image at the position of the motion vector, and a difference image of the encoding target image with respect to the predicted image; encoding means for encoding the difference image; decoding means for decoding the output of the encoding means; addition means for adding the predicted image to the output of the decoding means; and a reference frame buffer for storing the output image of the addition means as the reference image, wherein the motion prediction means includes creation means for creating a predicted image at a fractional pixel position from the reference image with fewer taps when the encoding target image is a non-reference image than when the encoding target image is a reference image.

According to the present invention, the amount of reference image data read to create the predicted image can be reduced without significantly degrading image quality, and the amount of data access can thereby be reduced.

FIG. 1 is a schematic block diagram of the motion prediction unit of Embodiment 1 of the present invention.
FIG. 2 is a schematic block diagram of the moving image encoding apparatus of Embodiment 1 of the present invention.
FIG. 3 is an explanatory diagram of the method of creating a predicted image at a fractional pixel position with a 6-tap filter.
FIG. 4 is an explanatory diagram of the method of creating a predicted image at a fractional pixel position with a 2-tap filter.
FIG. 5 is an explanatory diagram of picture types and reference relationships.
FIG. 6 is an explanatory diagram of the range required for creating a predicted image.
FIG. 7 is a schematic block diagram of the motion prediction unit of Embodiment 2 of the present invention.
FIG. 8 is a flowchart of the predicted image selection process based on bus congestion and picture type in Embodiment 2 of the present invention.

Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 2 is a schematic block diagram of an embodiment of a moving image encoding apparatus according to the present invention applied to the H.264 encoding method, and FIG. 1 is a schematic block diagram of the motion prediction unit, which is the characteristic part of this embodiment. In this embodiment, moving image data is divided into coding blocks of a predetermined number of pixels, such as macroblocks, and is compression-encoded block by block using motion-compensated predictive encoding.

Input image data to be encoded (encoding target image data) is written to the frame buffer 201 from image input means (for example, imaging means), not shown. The motion prediction unit 202, described in detail later with reference to FIG. 1, detects the motion vector of each coding block of the encoding target image data and outputs difference image data representing the difference from the predicted image data. The motion prediction unit 202 supplies the calculated difference image data to the orthogonal transform unit 203, the predicted image data used to calculate the difference image to the adder 209, and the motion vector to the entropy encoding unit 206.

The orthogonal transform unit 203 applies a discrete cosine transform to the difference image data from the motion prediction unit 202 and supplies the transform coefficients to the quantization unit 205. The quantization unit 205 quantizes the transform coefficients with the quantization step size specified by the quantization control unit 204, and supplies the resulting quantized transform coefficients to both the entropy encoding unit 206 and the inverse quantization unit 207 of the local decoder.

The entropy encoding unit 206 scans the quantized transform coefficients from the quantization unit 205 using a zigzag scan, an alternate scan, or the like, and entropy-encodes them. It then adds encoding information, such as the motion vector from the motion prediction unit 202, the quantization step size, and coding block partition information, to the code data generated by entropy encoding to produce the encoded stream. The entropy encoding unit 206 also calculates the amount of code generated for each coding block and outputs it to the quantization control unit 204.

Based on the generated code amount information from the entropy encoding unit 206, the quantization control unit 204 determines the quantization step size so that subsequent coding blocks are encoded with the target code amount, and outputs it to the quantization unit 205.

The inverse quantization unit 207 inversely quantizes the transform coefficients quantized by the quantization unit 205 and supplies the resulting transform coefficients (representative values) to the inverse orthogonal transform unit 208. The inverse orthogonal transform unit 208 applies an inverse discrete cosine transform to the output of the inverse quantization unit 207 to restore the difference image data, and supplies it to the adder 209.

The adder 209 adds the predicted image data from the motion prediction unit 202 to the restored difference image data from the inverse orthogonal transform unit 208 to restore the image data. The restored image data is deblocked by the deblocking filter 210 and stored in the reference frame buffer 211 as a locally decoded image. The inverse quantization unit 207, the inverse orthogonal transform unit 208, the adder 209, and the deblocking filter 210 constitute the local decoder.

In this embodiment, intra-coded I pictures, forward-predictive-coded P pictures, and bidirectionally predictive-coded B pictures can be selected. Each frame of the input image is encoded as an I, P, or B picture in a predetermined order, in units of a predetermined number of frames (a so-called GOP (Group Of Pictures)). The encoding control unit 212 specifies the picture type of each frame image to the motion prediction unit 202.

The configuration and operation of the motion prediction unit 202 are described with reference to FIG. 1. The encoded image buffer 101 stores a coding block of the encoding target image data read from the frame buffer 201. The search range buffer 102 stores locally decoded image data read from the reference frame buffer 211; as described in detail later, this data covers a range slightly wider than, and including, the range required for motion vector detection.

The 6-tap fractional pixel position predicted image creation unit 103 generates predicted image data by applying a 6-tap filter to the reference image data in the search range buffer 102. The 2-tap fractional pixel position predicted image creation unit 104 generates predicted image data by applying a 2-tap filter to the same reference image data. When a motion vector is detected with integer-pixel accuracy, the selection unit 105 reads the predetermined number of pixels that serve as predicted image data from the search range buffer 102 and supplies them to the motion vector detection unit 106. When a motion vector is detected with fractional-pixel accuracy, the selection unit 105 supplies the predicted image data created by one of the fractional pixel position predicted image creation units 103 and 104 to the motion vector detection unit 106.

The motion vector detection unit 106 matches the predicted image data from the selection unit 105 against the coding block in the encoded image buffer 101 and calculates the difference. By calculating this difference while sweeping the predicted image window across the search range held in the search range buffer 102, the motion vector detection unit 106 determines the motion vector indicating the position on the reference image that is most similar to the coding block.
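The sweep described above can be sketched as an integer-pixel full search. This is only an illustration under stated assumptions: the text says a "difference" is calculated but does not name the metric, so the sum of absolute differences (SAD) is assumed here, and the function names `sad` and `full_search` are hypothetical.

```python
def sad(block, ref, top, left):
    """Sum of absolute differences between `block` and the same-sized
    window of `ref` whose top-left corner is at (top, left).
    SAD is an assumed matching cost; the source does not specify one."""
    return sum(
        abs(block[y][x] - ref[top + y][left + x])
        for y in range(len(block))
        for x in range(len(block[0]))
    )


def full_search(block, search_area, search_range):
    """Sweep every integer displacement in [-search_range, +search_range]
    and return ((dy, dx), cost) for the best match.

    `search_area` is assumed to be (N + 2 * search_range) pixels on each
    side, so the zero-displacement window starts at
    (search_range, search_range)."""
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            cost = sad(block, search_area, search_range + dy, search_range + dx)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost
```

A fractional-pixel search would repeat the same comparison against interpolated windows around the best integer position.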

As described above, the fractional pixel position predicted image creation unit 103 reads pixel values within a predetermined range of the reference image from the search range buffer 102 and applies a 6-tap filter (number of taps = 6) to generate a predicted image with 1/2-pixel accuracy. The use of a 6-tap filter is specified by the H.264 encoding method. That is, as shown in FIG. 3, to create the pixel X at a 1/2-pixel position, the fractional pixel position predicted image creation unit 103 uses the six pixels a to f at the surrounding integer positions on the same line. The value of pixel X is given by equation (1):
X = (a - 5 × b + 20 × c + 20 × d - 5 × e + f + 16) >> 5   (1)
where ">> 5" denotes a right shift by 5 bits.
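Equation (1) can be written directly in code. The following is a minimal sketch, assuming 8-bit samples; the clamp to the 0 to 255 range is not part of equation (1) as written and is added here because the negative taps can overshoot the sample range. The function names are hypothetical.

```python
def half_pel_6tap(a, b, c, d, e, f):
    """Half-pixel sample X between c and d, per equation (1)."""
    x = (a - 5 * b + 20 * c + 20 * d - 5 * e + f + 16) >> 5
    # Assumption: 8-bit samples. The clamp is an addition to equation (1).
    return max(0, min(255, x))


def half_pel_row_6tap(row):
    """Half-pixel samples along one line of integer samples.

    Each output needs two extra integer samples on either side of the
    pair (c, d), so a row of n samples yields n - 5 half-pixel values."""
    return [half_pel_6tap(*row[i:i + 6]) for i in range(len(row) - 5)]
```

The two extra samples needed on each side are what force the wider read from the search range buffer.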

The fractional pixel position predicted image creation unit 104, on the other hand, reads pixel values within a predetermined range from the search range buffer 102 and applies a 2-tap filter (number of taps = 2) to create a predicted image with 1/2-pixel accuracy. That is, as shown in FIG. 4, to create the pixel X at a 1/2-pixel position, the fractional pixel position predicted image creation unit 104 uses the two adjacent pixels c and d at integer positions on the same line. The value of pixel X is given by equation (2):
X = (c + d + 1) >> 1   (2)
where ">> 1" denotes a right shift by 1 bit.
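Equation (2) is a rounded average of the two neighbouring integer samples. A minimal sketch follows; the function names are hypothetical.

```python
def half_pel_2tap(c, d):
    """Half-pixel sample X between c and d, per equation (2)."""
    return (c + d + 1) >> 1


def half_pel_row_2tap(row):
    """Half-pixel samples along one line of integer samples.

    Only the two adjacent samples are used, so a row of n samples yields
    n - 1 half-pixel values and no samples outside the row are needed."""
    return [half_pel_2tap(row[i], row[i + 1]) for i in range(len(row) - 1)]
```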

The fractional pixel position predicted image creation unit 103 creates a predicted image with 1/2-pixel accuracy from six pixels at integer positions, whereas the fractional pixel position predicted image creation unit 104 creates a predicted image with 1/2-pixel accuracy from two pixels at integer positions. The creation unit 103 therefore requires reference image data over a range that extends two pixels further outward than the range required by the creation unit 104. In other words, when the creation unit 104 creates the predicted image, considerably fewer pixel values are read from the search range buffer 102 than when the creation unit 103 does. However, the predicted image data created by the creation unit 104 does not conform to the encoding standard, so an error arises between the image decoded on the receiving side and the locally decoded image.

When a motion vector is to be detected at an integer pixel position, the selection unit 105 selects the integer-pixel-accuracy predicted image data read from the search range buffer 102 and supplies it to the motion vector detection unit 106. When a motion vector is to be detected at a fractional pixel position, the selection unit 105 selects the fractional-pixel predicted image data from the fractional pixel position predicted image creation unit 103 or 104, according to the picture type information from the encoding control unit 212, and supplies it to the motion vector detection unit 106.

In the H.264 encoding method, some pictures are referenced when subsequent pictures are encoded, that is, they are used to generate predicted images for predictive encoding, and others are not referenced. Normally, I pictures and P pictures are referenced when subsequent pictures are encoded, whereas B pictures are not.

FIG. 5 shows an example of picture types and reference relationships. In FIG. 5, pictures 501 to 507 are arranged from left to right in display order. I, P, and B indicate the picture type, and the number following each letter indicates the encoding order. For example, picture 501 is an I picture and is encoded first.

P picture 504 is encoded with reference to I picture 501. B pictures 502 and 503 are encoded with reference to I picture 501 for forward prediction and P picture 504 for backward prediction. P picture 507 is encoded with reference to P picture 504. B pictures 505 and 506 are encoded with reference to P picture 504 for forward prediction and P picture 507 for backward prediction. Thus, I pictures and P pictures are referenced when subsequent pictures are encoded, but B pictures are not.

If an error arises between the locally decoded image of a picture that is referenced when subsequent pictures are encoded and the image decoded on the receiving side, the error accumulates through repeated referencing and degrades image quality. For a picture that is never referenced when subsequent pictures are encoded, on the other hand, an error between the locally decoded image and the image decoded on the receiving side affects only that picture, so its effect on image quality is small. This embodiment therefore adopts a simplified method of creating the fractional-pixel predicted image for pictures that are not referenced when subsequent pictures are encoded. In the following, a picture (image) that may be referenced when a subsequent picture (image) is encoded is called a reference picture (reference image), and a picture (image) that is not referenced is called a non-reference picture (non-reference image).

Accordingly, in this embodiment, when a motion vector at a fractional pixel position is detected, the selection unit 105 selects between the outputs of the fractional pixel position predicted image creation units 103 and 104 as follows, with the aim of reducing memory access.

For a reference picture, the selection unit 105 selects the output image of the fractional pixel position predicted image creation unit 103 when a motion vector at a fractional pixel position is detected. That is, when I pictures and P pictures are encoded, the 6-tap creation unit 103 is used, which is guaranteed to produce the same predicted image as the method used on the receiving side.

For a non-reference picture, on the other hand, the selection unit 105 selects the output image of the 2-tap fractional pixel position predicted image creation unit 104 when a motion vector at a fractional pixel position is detected. That is, when B pictures are encoded, the 2-tap creation unit 104 is used, which does not produce the same predicted image as the method used on the receiving side but reduces memory access.
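The selection rule of the selection unit 105 reduces to a lookup on the picture type. The sketch below assumes, as this embodiment does, that I and P pictures are reference pictures and B pictures are non-reference pictures; the names `PictureType`, `is_reference_picture`, and `select_interpolation_taps` are hypothetical.

```python
from enum import Enum


class PictureType(Enum):
    I = "I"
    P = "P"
    B = "B"


def is_reference_picture(picture_type):
    """Embodiment's assumption: I and P pictures are referenced by later
    pictures; B pictures are not."""
    return picture_type in (PictureType.I, PictureType.P)


def select_interpolation_taps(picture_type):
    """Number of filter taps used for fractional-pixel prediction:
    6 (standard-conformant) for reference pictures, 2 for non-reference."""
    return 6 if is_reference_picture(picture_type) else 2
```

Keying the decision on "is this picture referenced" rather than on the letter B keeps the rule valid for coding structures in which the mapping between picture type and reference status differs.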

The search range buffer 102 reads from the reference frame buffer 211 the reference image data of the search range that the motion vector detection unit needs for motion detection on the coding block. When the predicted image at a fractional pixel position is created with the 6-tap filter, two additional pixels of data are needed on each of the top, bottom, left, and right sides. For example, when the coding block is 16 × 16 pixels and the search range is ±3 pixels vertically and horizontally, the search range covers (16 + 6) × (16 + 6) pixels. When the predicted image is generated with the 6-tap filter, two more pixels of reference image are needed on each side outside this search range, so the search range buffer 102 ultimately needs image data covering (16 + 6 + 4) × (16 + 6 + 4) pixels. When the 2-tap filter is used, no reference image data outside the search range is needed, so the search range buffer 102 only needs image data covering (16 + 6) × (16 + 6) pixels.

FIG. 6 is a schematic diagram showing the coding block, the motion detection search range, and the range needed by the creation units 103 and 104 to create the predicted image. In FIG. 6, the search range is ±3 pixels both vertically and horizontally at integer pixel positions. FIG. 6(a) shows the range required by the 6-tap fractional pixel position predicted image creation unit 103, and FIG. 6(b) shows the range required by the 2-tap fractional pixel position predicted image creation unit 104.

In the example shown in FIG. 6, the 6-tap fractional pixel position predicted image creation unit 103 requires 26 × 26 pixels (= 676 pixels), whereas the 2-tap fractional pixel position predicted image creation unit 104 requires only 22 × 22 pixels (= 484 pixels). For a non-reference picture (B picture), data access can therefore be reduced by 192 pixels (= 676 pixels - 484 pixels) compared with a reference picture (I or P picture). As a proportion, this is a reduction of approximately 28% (= (192 / 676) × 100%).
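The figures in this example can be reproduced with a few lines of arithmetic. The sketch below assumes a square block and a symmetric search range; the helper `required_window` and its margin rule (taps // 2 - 1 extra pixels per side, which gives 2 for six taps and 0 for two taps, matching the text) are illustrative generalisations rather than something stated in the source.

```python
def required_window(block_size, search_range, taps):
    """Side length and pixel count of the reference window needed for
    fractional-pixel prediction over the whole search range."""
    margin = taps // 2 - 1  # extra integer pixels per side: 6 taps -> 2, 2 taps -> 0
    side = block_size + 2 * search_range + 2 * margin
    return side, side * side


side6, pixels6 = required_window(16, 3, 6)
side2, pixels2 = required_window(16, 3, 2)
saved = pixels6 - pixels2

print(f"6-tap: {side6} x {side6} = {pixels6} pixels")
print(f"2-tap: {side2} x {side2} = {pixels2} pixels")
print(f"saved: {saved} pixels ({100 * saved / pixels6:.1f}%)")
```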

The motion vector detection unit 106 uses block matching to detect, with fractional accuracy, the position within the search range that best approximates the coding block, and takes it as the motion vector. After detecting the motion vector, the motion vector detection unit 106 has the fractional pixel position predicted image creation units 103 and 104 and the selection unit 105 generate the predicted image at the motion vector position, and generates the difference image between that predicted image and the coding block. Finally, the motion vector detection unit 106 supplies the detected motion vector to the entropy encoding unit 206, the difference image data to the orthogonal transform unit 203, and the predicted image data at the motion vector position to the adder 209.

By using the 6-tap filter when the picture to be encoded is a reference picture and the 2-tap filter when it is a non-reference picture, the data access amount can be reduced without significantly impairing image quality.

Although an embodiment using the H.264 coding method as the image compression coding method has been described, the above embodiment is applicable to motion-compensated predictive coding methods in general.

In addition, although a method using a 2-tap filter has been described as the predicted image creation method that does not comply with the standard, this is only an illustrative example. Any other predicted image creation method can be applied, provided that it reduces the data access amount compared with the standard-compliant predicted image creation method.
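For illustration, the two interpolation filters can be sketched for a horizontal half-pixel position. The 6-tap coefficients (1, −5, 20, 20, −5, 1)/32 are those of H.264 luma half-sample interpolation. Modelling the 2-tap filter as a rounded average of the two neighbouring pixels is an assumption, since this passage does not give its coefficients, and the function names are hypothetical.

```python
from typing import Sequence

def clip8(value: int) -> int:
    """Clamp to the 8-bit sample range."""
    return max(0, min(255, value))

def half_pel_6tap(row: Sequence[int], i: int) -> int:
    """Half-pixel sample between row[i] and row[i + 1].

    Reads row[i - 2] .. row[i + 3], so i must satisfy 2 <= i <= len(row) - 4.
    """
    taps = (1, -5, 20, 20, -5, 1)
    acc = sum(t * row[i - 2 + k] for k, t in enumerate(taps))
    return clip8((acc + 16) >> 5)

def half_pel_2tap(row: Sequence[int], i: int) -> int:
    """Assumed 2-tap filter: rounded average of row[i] and row[i + 1] only."""
    return (row[i] + row[i + 1] + 1) >> 1

row = [10, 10, 10, 50, 90, 90, 90, 90]
six = half_pel_6tap(row, 3)   # uses six integer pixels
two = half_pel_2tap(row, 3)   # uses two integer pixels
print(six, two)
```

The 6-tap version reaches two pixels to the left and three to the right of the interpolated position, whereas the 2-tap version touches only the two adjacent pixels. This is why the 2-tap path needs a smaller region of reference image data.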

An embodiment modified so that the congestion degree of the data bus is also taken into account will now be described. FIG. 7 is a schematic block diagram of this modified embodiment. Components identical to those in FIG. 1 are denoted by the same reference numerals.

The motion prediction unit 701 shown in FIG. 7 is arranged in place of the motion prediction unit 202 in the encoding apparatus shown in FIG. 2. The motion prediction unit 701 includes a bus congestion degree determination unit 702. In addition to providing the functions of the encoded image buffer 101, the encoded image buffer 703 supplies the bus congestion degree determination unit 702 with information on the number of processing cycles taken to read encoded image data from the frame buffer 201. In addition to providing the functions of the search range buffer 102, the search range buffer 704 supplies the bus congestion degree determination unit 702 with information on the number of processing cycles taken to read reference image data from the reference frame buffer 211.

Because the number of processing cycles needed to read the encoded image data and the reference image data varies with the congestion degree of the data bus, the bus congestion degree can be determined by referring to these cycle counts. The bus congestion degree determination unit 702 determines that the congestion degree of the data bus is low when the read processing cycle counts for the encoded image data and for the reference image data are each smaller than their respective predetermined thresholds. On the other hand, when either or both of the read processing cycle counts for the encoded image data and the reference image data are equal to or greater than the predetermined threshold, the bus congestion degree determination unit 702 determines that the congestion degree of the data bus is high.
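The threshold comparison performed by the bus congestion degree determination unit 702 can be sketched as follows. The function name, parameter names, and the numeric threshold values are hypothetical; the text only says that the thresholds are predetermined.

```python
def bus_is_congested(enc_read_cycles: int, ref_read_cycles: int,
                     enc_threshold: int, ref_threshold: int) -> bool:
    """Return True (high congestion) if either read took at least its threshold.

    Congestion is low only when both cycle counts are below their thresholds.
    """
    return enc_read_cycles >= enc_threshold or ref_read_cycles >= ref_threshold

# Hypothetical thresholds, in processing cycles.
ENC_THRESHOLD = 400
REF_THRESHOLD = 900

low = bus_is_congested(350, 800, ENC_THRESHOLD, REF_THRESHOLD)    # both below
high = bus_is_congested(350, 900, ENC_THRESHOLD, REF_THRESHOLD)   # one at threshold
print(low, high)
```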

The determination by the bus congestion degree determination unit 702 may be made on the basis of the number of processing cycles per coding block. Alternatively, it may be made on the basis of the number of processing cycles for a group of coding blocks, for example the number of processing cycles at the time when the processing of 10 coding blocks has finished. Although the bus congestion degree is determined here from the read processing cycle counts for the encoded image data and the reference image data, any other method capable of determining the congestion degree of the data bus may be used.

The selection unit 705 selects the output of the creation unit 103 or 104 as the predicted image data at the decimal pixel position, according to the picture type and the determination result of the bus congestion degree determination unit 702, and supplies the selected output to the motion vector detection unit 106.

When the congestion degree of the data bus is low, processing finishes in time even if the data access amount is not reduced, so it is preferable to select a predicted image created by the method compliant with the coding standard. For this reason, when the bus congestion degree determination unit 702 determines that the congestion degree of the data bus is low, the selection unit 705 selects the output of the decimal pixel position predicted image creation unit 103 regardless of the picture type.

On the other hand, when the congestion degree of the data bus is high, processing may fail to finish within the predetermined time unless the data access amount is reduced. Accordingly, when the bus congestion degree determination unit 702 determines that the congestion degree of the data bus is high, the selection unit 705 selects the output of the decimal pixel position predicted image creation unit 103 or 104 according to whether the picture is a reference picture, as in the first embodiment. That is, the selection unit 705 selects the output of the decimal pixel position predicted image creation unit 103 for I and P pictures, which are reference pictures, and selects the output of the decimal pixel position predicted image creation unit 104 for B pictures, which are non-reference pictures.

FIG. 8 is a flowchart of the selection of the decimal pixel position predicted image according to the bus congestion degree and whether the picture is a reference picture or a non-reference picture.

The bus congestion degree determination unit 702 determines the congestion degree of the data bus from the number of processing cycles required to read the encoded image data and the reference image data (S801). When the bus congestion degree determination unit 702 determines that the congestion degree of the data bus is low (S802), the selection unit 705 selects the output of the decimal pixel position predicted image creation unit 103, which uses the 6-tap filter compliant with the coding standard (S803).

When the bus congestion degree determination unit 702 determines that the congestion degree of the data bus is high (S802), the selection unit 705 determines whether the encoding target picture is a reference picture (S804). When the encoding target picture is a reference picture (S804), the selection unit 705 selects the output of the decimal pixel position predicted image creation unit 103, which uses the 6-tap filter compliant with the coding standard (S805).

When the encoding target picture is a non-reference picture, the selection unit 705 selects the output of the decimal pixel position predicted image creation unit 104, which uses the 2-tap filter (S806).
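The selection flow of FIG. 8 (S802 to S806) can be sketched as a small decision function. The enum and function names are hypothetical; the enum values simply reuse the reference numerals of the two creation units, and the congestion result of S801 is passed in as a boolean.

```python
from enum import Enum

class Creator(Enum):
    SIX_TAP = 103   # standard-compliant 6-tap creation unit 103
    TWO_TAP = 104   # 2-tap creation unit 104

def select_creator(bus_congested: bool, is_reference_picture: bool) -> Creator:
    """Mirror the branches of the FIG. 8 flowchart."""
    if not bus_congested:          # S802: congestion degree is low
        return Creator.SIX_TAP     # S803
    if is_reference_picture:       # S804: I or P picture
        return Creator.SIX_TAP     # S805
    return Creator.TWO_TAP         # S806: B picture on a congested bus

for congested in (False, True):
    for is_ref in (True, False):
        print(congested, is_ref, select_creator(congested, is_ref).name)
```

Only one of the four combinations, a non-reference picture on a congested bus, selects the 2-tap path.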

As shown in FIG. 8, when the data bus is not congested, a predicted image created by the predicted image creation method compliant with the coding standard is adopted, so that encoding and decoding can be performed without impairing image quality. When the data bus is congested, the data access amount can be reduced while suppressing the deterioration of image quality, which makes it easier to satisfy the desired processing performance. In other words, the data access amount is reduced according to the congestion degree of the data bus while impairing image quality as little as possible, thereby realizing an encoding operation that satisfies the desired processing performance.

It is apparent that various modifications can be made without departing from the technical scope of the present invention as defined in the claims.

Claims

1. A moving picture encoding apparatus comprising:
motion prediction means for performing motion prediction on a moving image with decimal pixel accuracy based on a reference image, and for outputting a motion vector of an encoding target image in the moving image with respect to the reference image, a predicted image obtained from the reference image at the position of the motion vector, and a difference image of the encoding target image with respect to the predicted image;
encoding means for encoding the difference image;
decoding means for decoding the output of the encoding means;
adding means for adding the predicted image to the output of the decoding means; and
a reference frame buffer for storing an output image of the adding means as the reference image,
wherein the motion prediction means comprises creating means for creating, when the encoding target image is a non-reference image, a predicted image at a decimal pixel position from the reference image with a smaller number of taps than when the encoding target image is a reference image.

2. The moving picture encoding apparatus according to claim 1, further comprising determining means for determining the congestion degree of a data bus,
wherein, when the congestion degree of the data bus is low, the creating means creates the predicted image at the decimal pixel position from the reference image with the number of taps used when the encoding target image is a reference image, regardless of whether the encoding target image is a non-reference image or a reference image.

3. The moving picture encoding apparatus, wherein, when the encoding target image is a reference image, the creating means creates the predicted image at the decimal pixel position with a number of taps conforming to an encoding standard.

4. The moving picture encoding apparatus according to claim 3, wherein the number of taps conforming to the encoding standard is six.

5. The moving picture encoding apparatus according to claim 1, wherein the creating means creates the predicted image at the decimal pixel position from the reference image with a 2-tap filter when the encoding target image is a non-reference image, and with a 6-tap filter when the encoding target image is a reference image.