JP2017225199A

JP2017225199A - Video encoding device and video encoding program

Info

Publication number: JP2017225199A
Application number: JP2017188029A
Authority: JP
Inventors: Taku Sano; Takayuki Onishi; Kazuya Yokohari
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2017-09-28
Filing date: 2017-09-28
Publication date: 2017-12-21
Anticipated expiration: 2033-10-02
Also published as: JP6435029B2

Abstract

PROBLEM TO BE SOLVED: To provide a video encoding device that can be implemented in hardware because the search computation per coding block is made uniform, by determining the reduction ratio of the picture of the input video signal and the reference image of the decoded video signal, together with the search range, at a stage before the search processing and then performing the search.

SOLUTION: A video encoding device comprises: reduction means for reducing a picture of an input video signal and a reference image of a decoded video signal that is the motion prediction target, using a predetermined reduction ratio; and vector determination means for determining a motion vector by performing motion prediction processing using a coding block of the reduced picture of the input video signal and a search region of the reference image of the decoded video signal. The reduction means applies, to the motion vectors of the picture encoded immediately before the picture of the input video signal, scaling processing that aligns the temporal distances of the motion vectors, then calculates variance values of the motion vector components, and reduces the picture of the input video signal and the reference image of the decoded video signal at a reduction ratio corresponding to the variance values.

SELECTED DRAWING: Figure 1

Description

The present invention relates to a video encoding device and a video encoding program that perform video encoding using motion prediction.

MPEG-2, MPEG-4, and MPEG-4/AVC are widely used video coding technologies, and HEVC, the next-generation video coding standard, is currently being standardized. Video coding standards use intra-picture coding, which encodes using only information contained within a single picture, and inter-picture coding, which encodes using multiple temporally consecutive pictures.
In inter-picture coding, motion prediction is performed to reduce the difference values between pictures, and the amount of information is reduced by encoding the difference values and the motion vector information. However, capturing the true motion in the video so that the difference values to be encoded become small requires an enormous amount of computation for motion prediction.

Several methods are conceivable for reducing this enormous amount of computation. The first is to search around a prediction vector. It relies on the tendency for the motion prediction result of a coding block to correlate strongly with the results of the surrounding coding blocks. MPEG-4/AVC and HEVC, the next-generation coding standard, define how the prediction vector is calculated, and they reduce the code amount by encoding only the difference between a block's motion vector and the prediction vector. The method makes use of this prediction vector: it calculates the prediction vector from motion vectors already obtained around the block and searches the neighborhood centered on the position that the prediction vector points to.

The second is to search while efficiently reducing the number of search points. As typified by step search, instead of evaluating every search point in a search range, the method first evaluates a few widely spaced discrete points and then searches the neighboring points around the best of them, thereby determining a motion vector with little computation. In recent years a search technique called EPZS (Enhanced Predictive Zonal Search), which combines step search with full search, has come into use and achieves search accuracy nearly equal to that of full search with little computation.

The third is to search on reduced images. Before the search processing, the picture of the input video signal and the reference image of the decoded video signal are reduced at the same ratio, and the search is performed on the reduced images. This cuts both the number of search points and the computation needed to evaluate each search point, so the total computation can be reduced substantially. The method is effective for software codecs with limited computing performance and for professional hardware codecs that require a wide search range. On the other hand, as the image reduction ratio is raised, it becomes difficult to predict the fine motion of objects accurately, so the accuracy of the motion vectors deteriorates.

To solve this problem, one prior-art technique for achieving both a wide search range and high motion vector accuracy is a motion vector search method that sets a plurality of search regions, compares the evaluation value against a threshold after the motion search of the first region, and decides on that basis whether to perform a motion search of the second region (see, for example, Patent Document 1).

Patent Document 1: Japanese Patent Application Laid-Open No. 11-075192

However, with the motion vector search method described in Patent Document 1, the search computation per coding block is not uniform. This poses no problem for a software implementation, but it makes control difficult in a hardware implementation.

The present invention has been made in view of these circumstances. Its object is to provide a video encoding device and a video encoding program that can make the search computation per coding block uniform by determining the reduction ratio of the picture of the input video signal and the reference image of the decoded video signal, together with the search range, at a stage before the search processing and then performing the search.

One aspect of the present invention is a video encoding device that exploits the temporal correlation between pictures of an input video signal, performs motion prediction on each picture in units of coding blocks, and encodes the resulting difference, the device comprising: reduction means for reducing a picture of the input video signal and a reference image of a decoded video signal that is the motion prediction target, using a predetermined reduction ratio; and vector determination means for determining a motion vector by performing motion prediction processing using a coding block of the picture of the input video signal reduced by the reduction means and a search region of the reference image of the decoded video signal, wherein the reduction means applies, to the motion vectors of the picture encoded immediately before the picture of the input video signal, scaling processing that aligns the temporal distances of the motion vectors, then calculates variance values of the motion vector components, and reduces the picture of the input video signal and the reference image of the decoded video signal at a reduction ratio corresponding to the calculated variance values.

Another aspect of the present invention is a video encoding program for causing a computer to function as the above video encoding device.

According to the present invention, in a video coding scheme that encodes using motion prediction, adaptively switching the reduction ratio used for the motion search makes the amount of computation uniform and also improves the accuracy of the motion vectors.

FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of the inter prediction processing unit 102 shown in FIG. 1.
FIG. 3 is a diagram showing an example of a coded picture structure (hierarchical coding with M = 3).
FIG. 4 is a diagram showing an example of a coded picture structure (hierarchical coding with M = 8).
FIG. 5 is a diagram showing the temporal distance between the input video signal and the decoded video signal at each frame rate.

Hereinafter, a video encoding device according to an embodiment of the present invention will be described with reference to the drawings. FIG. 1 is a block diagram showing the configuration of the embodiment. The term "coding block" used below refers to a macroblock in the MPEG-2 and H.264/AVC standards, and to a coding unit (CU) or prediction unit (PU) in HEVC. In the video encoding device 100 shown in FIG. 1, the reduced image generation unit 201 in the inter prediction processing unit 102 differs from the prior art; the remaining parts are the same as the conventional, general configuration used in video encoding devices for H.264/AVC, HEVC, and the like.

The video encoding device 100 receives an input video signal to be encoded, divides each picture of the input video signal into blocks, encodes each block, and outputs the resulting bitstream as an encoded stream. For this encoding, the prediction residual signal generation unit 103 obtains the difference between the input video signal and the prediction signal output by the intra prediction processing unit 101 or the inter prediction processing unit 102, and outputs it as a prediction residual signal. The transform/quantization processing unit 104 applies an orthogonal transform such as the discrete cosine transform to the prediction residual signal, quantizes the transform coefficients, and outputs the quantized transform coefficients. The entropy encoding unit 105 entropy-encodes the quantized transform coefficients and outputs the result as the encoded stream.

The quantized transform coefficients are also input to the inverse quantization/inverse transform processing unit 106, which applies inverse quantization and an inverse orthogonal transform and outputs a prediction residual signal. The decoded video signal generation unit 107 adds this prediction residual signal to the prediction signal output by the intra prediction processing unit 101 or the inter prediction processing unit 102, and generates the decoded video signal of the coding block that has just been encoded. This decoded video signal is output to the loop filter processing unit 108 so that the inter prediction processing unit 102 can use it as a reference image. The loop filter processing unit 108 performs filtering that reduces coding distortion, and outputs the filtered image to the inter prediction processing unit 102 as the decoded video signal.

Next, the configuration of the inter prediction processing unit 102 shown in FIG. 1 will be described with reference to FIG. 2. FIG. 2 is a block diagram showing the configuration of the inter prediction processing unit 102 shown in FIG. 1. The inter prediction processing unit 102 includes a reduced image generation unit 201, a pre-search processing unit 202, an integer pixel search processing unit 203, a first mode determination unit 204, a decimal pixel search processing unit 205, a second mode determination unit 206, and a decimal image generation unit 207. The reduced image generation unit 201 receives the input video signal (original image) and the decoded video signal, reduces them, and outputs the results. The pre-search processing unit 202 receives the reduced input video signal and the reduced decoded video signal, and performs motion search processing on the reduced video data. The integer pixel search processing unit 203 receives the motion vector found by the pre-search processing unit 202. The first mode determination unit 204 receives coding mode information from the pre-search processing unit 202.

The integer pixel search processing unit 203 performs an integer-pixel search using the motion vector specified by the pre-search processing unit 202 and the coding mode specified by the first mode determination unit 204. The decimal pixel search processing unit 205 receives the motion vector found by the integer pixel search processing unit 203. The second mode determination unit 206 receives coding mode information from the integer pixel search processing unit 203.

The decimal pixel search processing unit 205 performs a decimal-pixel (fractional-pixel) search using the motion vector specified by the integer pixel search processing unit 203 and the coding mode specified by the second mode determination unit 206. The decimal image generation unit 207 generates a decimal-pixel interpolated image at the corresponding position of the decoded video signal and outputs it to the decimal pixel search processing unit 205. The prediction residual image and the motion vector information resulting from the search in the decimal pixel search processing unit 205 are output to the prediction residual signal generation unit 103.

In this embodiment, the reduced image generation unit 201 shown in FIG. 2 adaptively determines the reduction accuracy used for the search processing in the pre-search processing unit 202, thereby achieving high coding efficiency with little computation. The need for motion prediction on reduced images is explained first. Motion prediction achieves high coding efficiency by tracking the motion of objects accurately, so the wider the search range, the better the coding efficiency. However, the computation required for motion prediction grows as the search range widens, so an appropriate search range is set according to the processing capability of the hardware or software and the intended application. Searching on reduced images is one way to cover a wide search range with little computation.

As an example, suppose the coding block size is 16×16, the search range is ±8, and the motion search is a full search at single-pixel precision. The number of difference operations needed for the motion search of one block is then 16 × 16 × (8 × 2 + 1)^2 = 73984. If the input video signal and the decoded video signal are each reduced to 1/2 and the same search range is searched, the coding block size and the search range are each halved vertically and horizontally, so the number of difference operations becomes 8 × 8 × (4 × 2 + 1)^2 = 5184, which is about 7% of the original computation.
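The arithmetic above can be checked with a short sketch. The function name `sad_operations` is an assumption made for illustration and does not appear in the patent; it simply counts one difference operation per block pixel per candidate position in a full search.

```python
def sad_operations(block_size, search_range):
    """Count pixel-difference operations for a full search of one square block.

    block_size   -- block width and height in pixels
    search_range -- search range r, meaning displacements from -r to +r
    """
    candidate_positions = (2 * search_range + 1) ** 2
    return block_size * block_size * candidate_positions


full = sad_operations(16, 8)  # original resolution: 16x16 block, range +/-8
half = sad_operations(8, 4)   # 1/2-reduced images: 8x8 block, range +/-4
print(full, half, round(100 * half / full, 1))  # 73984 5184 7.0
```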

Conversely, if the same number of operations that the single-pixel-precision search required is spent on the 1/2-reduced images, a search range of ±16 pixels can be covered. The larger the reduction ratio (that is, the further the image is shrunk, to 1/2, 1/4, 1/8, and so on), the wider the search range that can be covered with little computation; however, fine textures are lost in the reduced image, so the accuracy of the motion vectors falls.
Because the search range and the motion vector accuracy of a reduced-image search are in a trade-off relationship, the reduction must be set adaptively according to the characteristics and features of the input video signal and the decoded video signal.

Next, methods for determining the reduction accuracy are described. The first method sets the reduction accuracy adaptively according to the image size and the temporal distance between the picture of the input video signal and the reference image of the decoded video signal. The larger this temporal distance, the larger the displacement of a moving object, so a wider range must be searched. The temporal distance between the picture of the input video signal and the reference image of the decoded video signal is determined by the coded picture structure and the frame rate. There are many kinds of coded picture structure; representative ones are shown in FIG. 3 and FIG. 4.

FIG. 3 shows a coded picture structure used mainly in MPEG-2 and H.264/AVC, and FIG. 4 shows one used mainly in HEVC. In FIG. 3 and FIG. 4, a picture marked P is a unidirectionally predicted picture that is encoded with reference only to past pictures, and a picture marked B is a bidirectionally predicted picture that is encoded with reference to both past and future pictures. In the coded picture structure of FIG. 3, the interval between the picture of the input video signal and the decoded video signal is three pictures for a P picture, and one or two pictures for a B picture. In FIG. 4, the interval between the picture of the input video signal and the reference image of the decoded video signal varies from one picture to eight pictures. The coded picture structure and its reference relationships are not limited to these, however, and can be set arbitrarily at actual encoding time.

The temporal distance between the picture of the input video signal and the reference image of the decoded video signal also changes with the frame rate of the video. The frame rate is the number of frames displayed per second, and various formats exist depending on the application, such as 24, 30, 60, and 120 frames per second.

FIG. 5 shows how the temporal distance between the picture of the input video signal and the reference image of the decoded video signal depends on the coded picture structure and the frame rate. FIG. 5 is a diagram showing the temporal distance between the input video signal and the decoded video signal at each frame rate. The first reduction accuracy determination method switches the reduction ratio adaptively according to the temporal distance shown in FIG. 5. For example, a threshold is set on the temporal distance between the picture of the input video signal and the reference image of the decoded video signal. When the distance is at or below the threshold, the pictures are close in time, so the search uses reduced images with a low reduction ratio and performs accurate motion prediction. When the distance exceeds the threshold, the pictures are far apart in time, so the search uses reduced images with a high reduction ratio and performs motion prediction over a wide search range.

As a result, even with little computation, the motion search can cover a wider range the farther apart in time the picture of the input video signal and the reference image of the decoded video signal are, and coding efficiency improves. More than one threshold may be used. One threshold allows switching between two reduction ratios, and two thresholds allow adaptive switching among three. Furthermore, making both the reduction ratio and the motion search range variable allows the computation required for motion prediction per picture to be kept constant. In a hardware implementation this allows the hardware to run at a high utilization rate, which makes a hardware configuration without waste possible.
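The threshold-based switching described above might be sketched as follows. This is only an illustration under stated assumptions: the function names, the threshold values (0.05 s and 0.1 s), and the set of reduction ratios (1/2, 1/4, 1/8, stored as denominators) are hypothetical and are not given in the patent.

```python
from bisect import bisect_left


def temporal_distance(picture_interval, frame_rate):
    """Temporal distance in seconds between a picture and its reference image."""
    return picture_interval / frame_rate


def select_reduction(distance, thresholds, denominators):
    """Pick a reduction ratio 1/denominator from ascending distance thresholds.

    With N thresholds there must be N + 1 candidate denominators, ordered from
    the weakest reduction to the strongest. A distance at or below a threshold
    falls into the lower (weaker reduction) class.
    """
    if len(denominators) != len(thresholds) + 1:
        raise ValueError("need exactly one more ratio than thresholds")
    return denominators[bisect_left(thresholds, distance)]


# Hypothetical settings: two thresholds give three reduction ratios.
THRESHOLDS = [0.05, 0.1]   # seconds (assumed values)
DENOMINATORS = [2, 4, 8]   # reduction ratios 1/2, 1/4, 1/8 (assumed values)

for interval, fps in [(1, 60), (4, 60), (8, 24)]:
    d = temporal_distance(interval, fps)
    print(interval, fps, select_reduction(d, THRESHOLDS, DENOMINATORS))
```

Because the thresholds would in practice also depend on the image size, a real encoder would presumably keep one such threshold table per resolution.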

In recent years, image sizes larger than HD (1920×1080 pixels), such as 4K (3840×2160 pixels) and 8K (7680×4320 pixels), have been coming into widespread use. Even for video with exactly the same content, when the image size goes from HD to 4K the displacement of an object in pixels doubles, so the motion search range must also be twice as wide. The temporal distance thresholds described above therefore need to be set individually for each image size.

Next, the second reduction accuracy determination method is described. The second method switches the reduction ratio adaptively based on the image information (luminance values) of the input video signal. For example, the inter-frame sum of absolute differences between the picture of the input video signal and the reference image of the decoded video signal is calculated, and the reduction ratio is switched accordingly. Let cur(x, y) be the pixel value of the picture of the input video signal, ref(x, y) the pixel value of the reference image of the decoded video signal, width the horizontal picture size, and height the vertical picture size. The inter-frame sum of absolute differences diff between the picture of the input video signal and the reference image of the decoded video signal can then be expressed as in equation (1).
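Equation (1) itself is not reproduced in this text. A reconstruction consistent with the definitions of cur, ref, width, and height given above would be the following; the zero-based index ranges are an assumption.

```latex
\mathrm{diff} = \sum_{y=0}^{\mathrm{height}-1} \sum_{x=0}^{\mathrm{width}-1}
  \left| \mathrm{cur}(x, y) - \mathrm{ref}(x, y) \right| \tag{1}
```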

When this inter-frame sum of absolute differences is large, the motion of objects is judged to be large, and the motion search is performed over a wide search range with a larger reduction ratio. When it is small, objects are judged to be stationary or moving only slightly, and the motion search is performed accurately with a smaller reduction ratio. In this way the amount of object motion is estimated from the image information of the picture of the input video signal, and the reduction ratio is switched adaptively.

A similar method uses the sum of absolute differences between the per-coding-block accumulated values of the picture of the input video signal and of the reference image of the decoded video signal. As an example, let the coding block be 64×64 pixels, blk_width the number of coding blocks in the horizontal direction of one picture, and blk_height the number of coding blocks in the vertical direction of one picture. The sum of absolute differences of the coding block accumulated values, diff2, can then be expressed as in equation (2).
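Equation (2) itself is not reproduced in this text. A reconstruction consistent with the definitions above (64×64 coding blocks, blk_width and blk_height blocks per picture, and cur and ref as the pixel values of the input picture and the reference image) would be the following; the block indices i and j and the zero-based ranges are assumptions introduced for the reconstruction.

```latex
\mathrm{diff2} = \sum_{j=0}^{\mathrm{blk\_height}-1} \sum_{i=0}^{\mathrm{blk\_width}-1}
  \left|
    \sum_{y=0}^{63} \sum_{x=0}^{63} \mathrm{cur}(64i + x,\; 64j + y)
    - \sum_{y=0}^{63} \sum_{x=0}^{63} \mathrm{ref}(64i + x,\; 64j + y)
  \right| \tag{2}
```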

When the sum of absolute differences of the accumulated values, diff2, is large, the motion of objects is judged to be large, and the motion search is performed over a wide search range with a larger reduction ratio. When it is small, objects are judged to be stationary or moving only slightly, and the motion search is performed accurately with a smaller reduction ratio. In this way the amount of object motion is estimated from the image information of the picture of the input video signal, and the reduction ratio is switched adaptively.
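The two image-based measures described above, the per-pixel frame difference and the per-block accumulated difference, could be computed as in the following sketch. It assumes pictures are given as lists of rows of luminance values; the function names are hypothetical, and blocks that do not fit entirely inside the picture are ignored for simplicity.

```python
def frame_sad(cur, ref):
    """Sum of absolute per-pixel differences between two same-sized pictures."""
    return sum(
        abs(c - r)
        for cur_row, ref_row in zip(cur, ref)
        for c, r in zip(cur_row, ref_row)
    )


def block_sum_sad(cur, ref, block=64):
    """Sum over blocks of |sum of cur pixels - sum of ref pixels| in that block."""
    height, width = len(cur), len(cur[0])
    total = 0
    # Step through whole blocks only; partial blocks at the edges are skipped.
    for by in range(0, height - block + 1, block):
        for bx in range(0, width - block + 1, block):
            cur_sum = sum(cur[y][x] for y in range(by, by + block)
                          for x in range(bx, bx + block))
            ref_sum = sum(ref[y][x] for y in range(by, by + block)
                          for x in range(bx, bx + block))
            total += abs(cur_sum - ref_sum)
    return total


# Tiny example with 2x2 blocks: content moves from the left block to the right.
cur = [[9, 9, 0, 0], [9, 9, 0, 0]]
ref = [[0, 0, 9, 9], [0, 0, 9, 9]]
print(frame_sad(cur, ref), block_sum_sad(cur, ref, block=2))  # 72 72
```

One design point worth noting: the block-accumulated measure is insensitive to motion that stays inside a single block, since the block sums then do not change, whereas the per-pixel measure responds to it.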

There is also a method that switches the reduction ratio adaptively according to how fine the texture of the picture of the input video signal is. To estimate the fineness of the texture, the variance of each coding block is calculated and summed over one picture. With ave denoting the mean pixel value of a coding block, the one-picture total of the coding block variances, var_pic, can be expressed as in equation (3).
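Equation (3) itself is not reproduced in this text. One reconstruction consistent with the description is given below. It assumes N×N coding blocks indexed by (i, j), writes ave_{i,j} for the mean pixel value of block (i, j), and reuses blk_width and blk_height for the block counts; whether the original normalizes each block's sum by the number of pixels in the block cannot be determined from the text, and the normalized form is shown here.

```latex
\mathrm{var\_pic} = \sum_{j=0}^{\mathrm{blk\_height}-1} \sum_{i=0}^{\mathrm{blk\_width}-1}
  \frac{1}{N^{2}} \sum_{y=0}^{N-1} \sum_{x=0}^{N-1}
  \bigl( \mathrm{cur}(Ni + x,\; Nj + y) - \mathrm{ave}_{i,j} \bigr)^{2} \tag{3}
```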

When the one-picture total of the coding block variances is large, the image can be judged to contain much fine texture, so the reduction ratio is made smaller and the motion search is performed accurately over a relatively narrow range. In this way the fineness of object texture is estimated from the image information of the picture of the input video signal, and the reduction ratio is switched adaptively.

Next, the third reduction accuracy determination method is described. The third method uses the coding result of the picture encoded immediately before in coding order. When encoding of a picture of the input video signal starts, the motion vector components of each coding block of the picture encoded immediately before in coding order are first scaled so that their temporal distances are aligned, and the variance of the scaled motion vector components is then calculated over that one picture.

Let mv_x[i] and mv_y[i] be the horizontal and vertical components of a motion vector of the picture encoded immediately before in encoding order, let t_cur be the temporal distance between the current picture and its reference image, and let t_pst be the temporal distance between the previously encoded picture and its reference image. The motion vector components scaled by temporal distance, mv_scale_x[i] and mv_scale_y[i], can then be expressed as in equation (4).
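Equation (4) itself is not reproduced in this text. One form consistent with the definitions above, assuming simple linear scaling by the ratio of the two temporal distances, would be:

```latex
\begin{align}
\mathit{mv\_scale\_x}[i] &= \mathit{mv\_x}[i] \times \frac{\mathit{t\_cur}}{\mathit{t\_pst}}, \\
\mathit{mv\_scale\_y}[i] &= \mathit{mv\_y}[i] \times \frac{\mathit{t\_cur}}{\mathit{t\_pst}}.
\end{align}
```

Under this assumption, with t_cur = 1 and t_pst = 2, a motion vector (8, -4) of the previous picture would be scaled to (4, -2).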

Let mv_num be the number of motion vectors in one picture of the picture encoded immediately before in encoding order, and let mv_x_ave and mv_y_ave be the mean values of the horizontal and vertical motion vector components. The motion vector variances mv_var_x and mv_var_y can then be expressed as in equations (5) and (6), respectively.
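Equations (5) and (6) are likewise not reproduced in this text. Assuming the usual definition of variance, taken over the scaled components with mv_x_ave and mv_y_ave as their means, they would read:

```latex
\begin{align}
\mathit{mv\_var\_x} &= \frac{1}{\mathit{mv\_num}} \sum_{i=0}^{\mathit{mv\_num}-1}
  \left( \mathit{mv\_scale\_x}[i] - \mathit{mv\_x\_ave} \right)^{2}, \\
\mathit{mv\_var\_y} &= \frac{1}{\mathit{mv\_num}} \sum_{i=0}^{\mathit{mv\_num}-1}
  \left( \mathit{mv\_scale\_y}[i] - \mathit{mv\_y\_ave} \right)^{2}.
\end{align}
```

The index range and the normalization by mv_num are assumptions; the text fixes only the symbols involved.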

Finally, as shown in equation (7), the sum mv_var of equations (5) and (6) is calculated.

mv_var is compared with a threshold. When mv_var is larger, it can be inferred that the objects are not moving uniformly: they are moving in various directions, the video is zooming in or out, or the motion is irregular, as on a sea surface. In such cases the reduction ratio is made large and the motion search is performed over a wide search range so that varied motion can be handled. Conversely, when mv_var is smaller than the threshold, it can be inferred that object motion is small or uniform, so the reduction ratio is made small and a narrow range is searched accurately.
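The third method as a whole can be sketched as below. This is an illustration under stated assumptions rather than the implementation of the embodiment: the scaling is assumed to be linear in t_cur / t_pst, the variance is assumed to be normalized by mv_num, and the function names, the threshold, and the downscale factors 2 and 4 (a larger factor meaning stronger reduction) are hypothetical.

```python
def scale_motion_vectors(mvs, t_cur, t_pst):
    """Rescale (mv_x, mv_y) pairs from temporal distance t_pst to t_cur."""
    return [(x * t_cur / t_pst, y * t_cur / t_pst) for x, y in mvs]


def motion_vector_variance(mvs):
    """Return mv_var = mv_var_x + mv_var_y over one picture."""
    mv_num = len(mvs)
    if mv_num == 0:
        # Assumption: a picture without motion vectors counts as zero variance.
        return 0.0
    mv_x_ave = sum(x for x, _ in mvs) / mv_num
    mv_y_ave = sum(y for _, y in mvs) / mv_num
    mv_var_x = sum((x - mv_x_ave) ** 2 for x, _ in mvs) / mv_num
    mv_var_y = sum((y - mv_y_ave) ** 2 for _, y in mvs) / mv_num
    return mv_var_x + mv_var_y


def choose_downscale_by_motion(mv_var, threshold, weak=2, strong=4):
    """Large mv_var (scattered motion) selects the stronger reduction."""
    return strong if mv_var > threshold else weak


# Hypothetical motion fields of the previously encoded picture.
uniform_pan = [(8, 0)] * 4                        # every block moves alike
zoom = [(-8, -8), (8, -8), (-8, 8), (8, 8)]       # blocks move apart

pan_var = motion_vector_variance(scale_motion_vectors(uniform_pan, 1, 2))
zoom_var = motion_vector_variance(scale_motion_vectors(zoom, 1, 2))
```

A uniform pan gives zero variance however large the vectors are, so the narrow, precise search is kept, while the zoom-like field exceeds a modest threshold and switches to the wider search.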

There is also a method that uses the accumulated value of the differential motion vectors of the picture encoded immediately before in encoding order. A differential motion vector is the difference between a motion vector and the prediction vector that, as specified in H.264/AVC and HEVC, is determined from the motion vectors of the blocks surrounding a coding block. The differential motion vectors of the picture encoded immediately before in encoding order are accumulated over one picture. With the differential motion vector components denoted mv_diff_x[i] and mv_diff_y[i], the one-picture accumulated value mv_diff_sum can be expressed as in equation (8).

mv_diff_sum is compared with a threshold. When mv_diff_sum is larger, the video can be inferred to show objects that are not moving uniformly, so the reduction ratio is made large and a wide range is searched. Conversely, when mv_diff_sum is smaller than the threshold, the video can be inferred to show objects that are stationary or moving uniformly, so the reduction ratio is made small and a narrow range is searched accurately.
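The differential-motion-vector variant can be sketched in the same way. Equation (8) is not reproduced in this text, so the accumulation is assumed here to sum the absolute values of both components; the function names, the threshold, and the downscale factors are hypothetical.

```python
def differential_mv_sum(mv_diffs):
    """Accumulate |mv_diff_x[i]| + |mv_diff_y[i]| over one picture (assumed form)."""
    return sum(abs(dx) + abs(dy) for dx, dy in mv_diffs)


def choose_downscale_by_mvd(mv_diff_sum, threshold, weak=2, strong=4):
    """Large mv_diff_sum (poorly predicted motion) selects the stronger reduction."""
    return strong if mv_diff_sum > threshold else weak


# Hypothetical differential motion vectors of the previously encoded picture.
well_predicted = [(0, 0), (1, -1), (0, 0), (-1, 0)]
poorly_predicted = [(6, -5), (-7, 4), (3, 9), (-8, -2)]
```

Absolute values are used so that differences of opposite sign do not cancel. A plain signed sum would report near-zero activity for a field of large but opposing residuals.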

As described above, by adaptively setting the reduction accuracy of the picture of the input video signal and of the reference image of the decoded video signal according to various conditions, a wide search range can be used when object motion is large and a fine-precision motion search can be performed when object motion is small, which improves overall coding efficiency. Methods for generating a reduced image include thinning out pixels, using the average value of pixels, and using a reduction filter; in this embodiment, any method may be used to generate the reduced image.
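Two of the reduced-image generation methods named above, pixel thinning and pixel averaging, can be sketched as follows. The function names are hypothetical, images are represented as lists of rows of integer pixel values, and the image dimensions are assumed to be multiples of the reduction factor.

```python
def reduce_by_decimation(image, factor):
    """Thin out pixels: keep every factor-th pixel in each direction."""
    return [row[::factor] for row in image[::factor]]


def reduce_by_averaging(image, factor):
    """Replace each factor x factor tile with the integer mean of its pixels."""
    height, width = len(image), len(image[0])
    reduced = []
    for top in range(0, height - factor + 1, factor):
        out_row = []
        for left in range(0, width - factor + 1, factor):
            tile = [image[y][x]
                    for y in range(top, top + factor)
                    for x in range(left, left + factor)]
            out_row.append(sum(tile) // len(tile))
        reduced.append(out_row)
    return reduced


image = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
    [130, 140, 150, 160],
]
thinned = reduce_by_decimation(image, 2)    # [[10, 30], [90, 110]]
averaged = reduce_by_averaging(image, 2)    # [[35, 55], [115, 135]]
```

Thinning is the cheapest option but discards the skipped pixels entirely, whereas averaging lets every source pixel contribute to the reduced image at the cost of a few additions per output pixel.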

In this way, in a search method that uses reduced images, an appropriate reduction ratio is set based on various information such as image size, search distance, and object motion, and the search is performed while the reduction ratio is changed adaptively for each picture. This makes it possible to achieve both a wide search range and high motion vector accuracy with a small amount of computation.

The video encoding device 100 in the embodiment described above may be implemented by a computer. In that case, a program for realizing this function may be recorded on a computer-readable recording medium, and the program recorded on this recording medium may be read into a computer system and executed. The "computer system" here includes an OS and hardware such as peripheral devices. The "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into a computer system. The "computer-readable recording medium" may also include media that hold a program dynamically for a short time, such as a communication line used when a program is transmitted over a network such as the Internet or over a communication line such as a telephone line, and media that hold a program for a certain period, such as volatile memory inside a computer system serving as the server or client in that case. The program may realize only part of the functions described above, may realize those functions in combination with a program already recorded in the computer system, or may be realized using hardware such as a PLD (Programmable Logic Device) or an FPGA (Field Programmable Gate Array).

Although embodiments of the present invention have been described above with reference to the drawings, these embodiments are merely illustrations of the invention, and it is clear that the invention is not limited to them. Components may therefore be added, omitted, replaced, or otherwise modified without departing from the technical idea and scope of the present invention.

The present invention is applicable to video encoding devices in which the amount of computation available for motion search processing is limited, and to video encoding devices and video encoding programs in which the amount of computation required for motion search processing must be uniform.

DESCRIPTION OF SYMBOLS 100 ... video encoding device, 101 ... intra prediction processing unit, 102 ... inter prediction processing unit, 103 ... prediction residual signal generation unit, 104 ... transform/quantization processing unit, 105 ... entropy encoding unit, 106 ... inverse quantization/inverse transform processing unit, 107 ... decoded video signal generation unit, 108 ... loop filter processing unit, 201 ... reduced image generation unit, 202 ... pre-search processing unit, 203 ... integer pixel search processing unit, 204 ... first mode determination unit, 205 ... fractional pixel search processing unit, 206 ... second mode determination unit, 207 ... fractional image generation unit

Claims

A video encoding device that uses the temporal correlation of pictures of an input video signal, performs motion prediction on each picture for each coding block, and encodes the difference, the device comprising:
reduction means for reducing the picture of the input video signal and the reference image of the decoded video signal that is the motion prediction destination, using a predetermined reduction ratio; and
vector determination means for determining a motion vector by performing motion prediction processing using a coding block of the picture of the input video signal reduced by the reduction means and a search region of the reference image of the decoded video signal,
wherein the reduction means calculates a variance value of motion vector components after applying, to the motion vectors of the picture encoded immediately before the picture of the input video signal, scaling processing that aligns the temporal distances of the motion vectors, and reduces the picture of the input video signal and the reference image of the decoded video signal at a reduction ratio corresponding to the calculated variance value.

A video encoding program for causing a computer to function as the video encoding device according to claim 1.