JP6495834B2

JP6495834B2 - Video encoding method, video encoding apparatus, and video encoding program

Info

Publication number: JP6495834B2
Application number: JP2016000456A
Authority: JP
Inventors: 誠之高村; 淳清水
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2016-01-05
Filing date: 2016-01-05
Publication date: 2019-04-03
Anticipated expiration: 2036-01-05
Also published as: JP2017123513A

Description

The present invention relates to a video encoding method, a video encoding device, and a video encoding program.

The international video coding standards H.264/AVC (Advanced Video Coding; see, for example, Non-Patent Document 1) and H.265/HEVC (High Efficiency Video Coding; see, for example, Non-Patent Document 2) are well known as representative video coding standards. To encode video efficiently with these standards, "entropy coding" is performed as the final stage, which outputs the compressed bit string. H.264/AVC and H.265/HEVC both use a method called CABAC (Context Adaptive Binary Arithmetic Coding) for this entropy coding.

In high-efficiency video encoding, each coding unit (a block obtained by dividing a frame of the video into a predetermined size) can be encoded with one of many options called encoding modes, and an appropriate encoding mode can be selected from among them. What matters is which encoding mode is optimal: the encoder must select the encoding mode that minimizes distortion at a given code amount or, equivalently, the encoding mode that minimizes the code amount at a given distortion. The optimal encoding mode is selected by a method based on Lagrange's method of undetermined multipliers, which is called "RD optimization" (rate-distortion optimization).

RD optimization uses three variables. The first, R, is the number of bits produced when the coding unit is encoded in the selected mode (hereinafter, the "code amount estimation value"). The second, D, is the sum of squared errors between the resulting decoded signal and the original signal. The third, λ, is the Lagrange undetermined multiplier, which the video encoding device chooses freely according to the target image quality and code amount. Among the selectable encoding modes, the one that minimizes the quantity C in equation (1), called the "cost function", is selected.
C = D + λR (1)
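The role of λ in equation (1) can be illustrated with a minimal Python sketch. The mode names and the (D, R) pairs below are hypothetical values chosen only for illustration; they do not come from any encoder.

```python
def rd_cost(distortion, rate_bits, lam):
    """Cost C = D + lambda * R, as in equation (1)."""
    return distortion + lam * rate_bits


# Hypothetical (D, R) pairs for three candidate encoding modes.
CANDIDATES = {
    "mode_a": (1200.0, 40.0),  # high distortion, few bits
    "mode_b": (900.0, 55.0),
    "mode_c": (700.0, 90.0),   # low distortion, many bits
}


def best_mode(candidates, lam):
    """Return the name of the mode with the smallest cost C."""
    return min(candidates, key=lambda name: rd_cost(*candidates[name], lam))


for lam in (5.0, 15.0, 30.0):
    print(lam, best_mode(CANDIDATES, lam))
```

With these numbers, a small λ favors the low-distortion mode and a large λ favors the low-rate mode, which is the trade-off the multiplier is meant to control.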

In practical encoding software for H.265/HEVC and H.264/AVC, the generated code amount used for RD optimization is often obtained by estimation rather than by actually performing CABAC encoding. Because CABAC encoding is relatively expensive, this avoids the large amount of processing that would otherwise be needed to obtain the code amount estimation value R for every selectable encoding mode.

As one such method of estimating the generated code amount, the method of Non-Patent Document 3, for example, obtains the estimated code amount by linear regression from several numerical values that appear in the stages preceding encoding.

A more general class of code amount estimation methods uses the occurrence probability of the signal to be encoded (see, for example, Non-Patent Documents 4, 5, and 6). These methods reduce the processing amount by 1 to 25% while estimating the CABAC generated code amount with high accuracy.

For example, the method of Non-Patent Document 6, which is used in the H.265/HEVC reference software, obtains the code amount estimation value R from the amount of bits already output, Bw, and a fixed-point number Bf by equation (2).
R = Bw + (Bf >> 15) (2)
In equation (2), ">>" denotes a right bit shift. That is, equation (2) obtains the code amount estimation value R by adding Bw to the value obtained by shifting the fixed-point number Bf right by 15 bits.
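Equation (2) can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the reference software itself; the function name and the sample values of Bw and Bf are assumptions.

```python
FRACTION_BITS = 15  # shift amount used in equation (2)


def estimate_rate_shift(bw, bf):
    """R = Bw + (Bf >> 15): the low 15 bits of Bf are discarded."""
    return bw + (bf >> FRACTION_BITS)


# Hypothetical state: 12 bits already output, and Bf representing
# 2.75 bits in fixed point with 15 fractional bits.
bw = 12
bf = int(2.75 * (1 << FRACTION_BITS))  # 90112

print(estimate_rate_shift(bw, bf))  # 14
```

The 0.75-bit fraction carried by Bf does not appear in the result, because the shift keeps only the integer part.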

Non-Patent Document 1: ISO/IEC 14496-10:2014, Information technology -- Coding of audio-visual objects -- Part 10: Advanced Video Coding.
Non-Patent Document 2: ISO/IEC 23008-2:2015, Information technology -- High efficiency coding and media delivery in heterogeneous environments -- Part 2: High efficiency video coding.
Non-Patent Document 3: S. Guo, Z. Liu, D. Wang, Q. Han and Y. Song, "Linear rate estimation model for HEVC RDO using binary classification based regression," Data Compression Conference 2014, p. 406, Snowbird, March 2014.
Non-Patent Document 4: J. Hahm and C.-M. Kyung, "Efficient CABAC rate estimation for H.264/AVC mode decision," IEEE Trans. CSVT, vol. 20, no. 2, pp. 310-316, February 2010.
Non-Patent Document 5: K. Won, J. Yang and B. Jeon, "Fast CABAC rate estimation for H.264/AVC mode decision," Electronics Letters, vol. 48, no. 19, pp. 1201-1203, September 2012.
Non-Patent Document 6: F. Bossen, "CE1: Table-based bit estimation for CABAC," JCTVC-G763, Geneva, November 2011.

Incidentally, in all of the methods described in Non-Patent Documents 3, 4, 5, and 6, a fractional estimate is obtained in the course of calculating the estimated code amount, yet it is rounded to the nearest integer or truncated so that the estimated code amount is an integer. This was considered natural because the code generated by CABAC is a sequence of bits, each 0 or 1.

However, the code amount of a signal that occurs with probability p is, to be precise, −log₂(p) bits, which is not necessarily an integer. Here log₂(p) is the base-2 logarithm of p. For example, the code amount of a signal that occurs with probability 0.6 is about 0.737 bits. The code amount is thus, in general, a non-negative fractional value.

In the conventional methods, therefore, the estimated code amount was rounded or truncated to an integer, so the code amount estimation value R contained a rounding error of less than 1 bit. Because R is an estimate to begin with, it naturally differs from the actual CABAC code amount; conventionally, however, RD optimization was performed using a value that contained this rounding error in addition to that estimation error. In other words, the cost function C was minimized on the basis of a code amount estimation value R containing a relatively large error, so the optimal encoding mode may not have been obtained, which in turn lowers the encoding efficiency.

This rounding error of less than 1 bit is negligible when the code amount estimation value R is as large as several thousand bits. When the probability is 0.6 as in the example above, however, rounding the code amount to 1 bit adds an error that amounts to 35.7% of the original code amount, as shown in equation (3), which is too large to ignore.
(1 − (−log₂(p))) / (−log₂(p)) × 100 = 35.7 [%] (3)
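The figures of about 0.737 bits and 35.7% can be checked numerically. The sketch below takes the code amount of a symbol with probability p to be −log₂(p) bits, which is the sign convention that reproduces the 0.737-bit value in the text; the function names are illustrative.

```python
import math


def ideal_code_length(p):
    """Ideal code amount, in bits, of a symbol with probability p."""
    return -math.log2(p)


def relative_rounding_error_percent(p, rounded_bits=1):
    """Error added by using rounded_bits instead of the ideal length, in percent."""
    ideal = ideal_code_length(p)
    return (rounded_bits - ideal) / ideal * 100.0


p = 0.6
print(round(ideal_code_length(p), 3))                # 0.737
print(round(relative_rounding_error_percent(p), 1))  # 35.7
```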

The present invention has been made in view of these circumstances, and its object is to provide a video encoding method, a video encoding device, and a video encoding program capable of improving the accuracy of the code amount estimation value R.

One aspect of the present invention is a video encoding method performed by a video encoding device that encodes an input video signal, the method including a code amount estimation value calculation step of calculating, using floating-point arithmetic, the code amount estimation value used in the cost function calculation for selecting an encoding mode.

One aspect of the present invention is the above video encoding method, wherein the code amount estimation value calculation step calculates an integer part Bw of the code amount of a coding unit and a fractional part Bf of the code amount, and calculates the code amount estimation value as "Bw + Bf ÷ predetermined value".

One aspect of the present invention is the above video encoding method, wherein the predetermined value is 2^n, where n is the number of bits shifted when the code amount estimation value is calculated by a bit shift operation.

One aspect of the present invention is the above video encoding method, wherein the predetermined value is 32768.

One aspect of the present invention is a video encoding device that encodes an input video signal, the device including a code amount estimation value calculation unit that calculates, using floating-point arithmetic, the code amount estimation value used in the cost function calculation for selecting an encoding mode.

One aspect of the present invention is a video encoding program for causing a computer to execute the above video encoding method.

According to the present invention, the code amount estimation value R is obtained using floating-point arithmetic instead of a fixed-point bit shift, so the accuracy of R can be improved. The improved accuracy refines RD optimization in video encoding, so a more appropriate encoding mode is selected and decoded video of higher quality is obtained with a smaller code amount.

FIG. 1 is a block diagram showing the configuration of a general video encoding device.
FIG. 2 is a flowchart showing the operation of the video encoding device shown in FIG. 1.
FIG. 3 is a flowchart showing the operation of the encoding control unit 115 shown in FIG. 1.
FIG. 4 is a flowchart showing the operation of step S16, shown in FIG. 3, which calculates the code amount estimation value R.

A video encoding device according to an embodiment of the present invention is described below with reference to the drawings. In this specification, an image means a still image or one frame of a moving image. A video means the same as a moving image and is a set of a series of images.

First, the configuration of a general video encoding device, including one for H.265/HEVC or H.264/AVC, is described. FIG. 1 is a block diagram showing the configuration of a general video encoding device 10. The video encoding device 10 shown in FIG. 1 receives a video signal 100 to be encoded, divides it into blocks, and encodes each block, thereby generating and outputting encoded data 118 that is smaller than the input.

The video encoding device 10 includes a subtraction unit 102, a transform/quantization unit 104, an inverse quantization/inverse transform unit 106, an addition unit 107, a distortion removal filter 108, a frame memory 109, an intra prediction unit 111, a motion estimation unit 112, an inter prediction unit 114, an encoding control unit 115, and an entropy encoding unit 117.

In describing the configuration of the video encoding device 10 with reference to FIG. 1, well-known functions and configurations that a video encoding device ordinarily has are neither described nor illustrated unless they are directly relevant to the description of the present invention.

Next, the operation of the video encoding device 10 shown in FIG. 1 is described with reference to FIG. 2. FIG. 2 is a flowchart showing the operation of the video encoding device 10 shown in FIG. 1. Here, the video encoding device 10 is assumed to conform to H.265/HEVC.

First, the video signal input from outside is divided into blocks, one per processing unit called a prediction unit. The subtraction unit 102 subtracts the prediction signal 101 from the video signal 100 for each prediction unit, and generates and outputs the prediction residual signal 103 from the result (step S1). The prediction signal 101 is generated from the result of either intra prediction by the intra prediction unit 111 or inter prediction by the inter prediction unit 114.

Next, the transform/quantization unit 104 transforms and quantizes the prediction residual signal 103 and outputs the result (step S2). Transformation here means converting the information to be compressed into another representation that preserves its meaning, in order to increase the compression ratio. A DCT (Discrete Cosine Transform), a DST (Discrete Sine Transform), or the like is used for this transformation.
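As a rough illustration of step S2, the following Python sketch applies a one-dimensional floating-point DCT-II followed by uniform quantization to a short residual. It is an assumption-laden toy: actual H.265/HEVC encoders use two-dimensional integer transforms and a more elaborate quantizer, and the function names, the residual values, and the step size here are hypothetical.

```python
import math


def dct_ii(samples):
    """Orthonormal 1-D DCT-II of a list of numbers."""
    n = len(samples)
    out = []
    for k in range(n):
        acc = sum(
            samples[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)
        )
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * acc)
    return out


def quantize(coeffs, step):
    """Uniform quantization: divide by the step size and round to an integer."""
    return [int(round(c / step)) for c in coeffs]


residual = [4, 4, 4, 4]  # hypothetical flat prediction residual
levels = quantize(dct_ii(residual), step=2.0)
print(levels)  # [4, 0, 0, 0]
```

A flat residual concentrates into a single DC coefficient, which is why the transformed representation is cheaper to entropy-code than the raw samples.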

次に、エントロピー符号化部１１７は、変換・量子化部１０４の結果である量子化係数１０５をエントロピー符号化して、符号化データ１１８として出力する（ステップＳ３）。 Next, the entropy encoding unit 117 entropy-encodes the quantization coefficient 105 that is the result of the transform / quantization unit 104 and outputs the result as encoded data 118 (step S3).

Meanwhile, the inverse quantization/inverse transform unit 106 applies inverse quantization and inverse transformation to the quantization coefficient 105 (step S4). Inverse transformation here means performing an inverse DCT (inverse discrete cosine transform) or an inverse DST (inverse discrete sine transform).

Next, the addition unit 107 adds the prediction signal 101 output from either the intra prediction unit 111 or the inter prediction unit 114 to the output signal of the inverse quantization/inverse transform unit 106. The distortion removal filter 108 then removes distortion from the signal output from the addition unit 107 (step S5).

The output of the distortion removal filter 108 is the reproduced decoded image, and this signal is output as the decoded video signal 110 (step S6). The decoded video signal 110 is stored in the frame memory 109 (step S7) and is also output to the encoding control unit 115.

The motion estimation unit 112 uses the decoded video signal 110 stored in the frame memory 109 and the video signal 100 to estimate how far the target block has moved over time, and generates and outputs the resulting motion information 113. Using this motion information 113 and the decoded video signal 110 stored in the frame memory 109, the inter prediction unit 114 generates the prediction signal 101 of the target block. Alternatively, the intra prediction unit 111 generates and outputs the prediction signal 101 of the target block using the decoded video signal 110 stored in the frame memory 109 (step S8).

The entropy encoding unit 117 encodes the control data 116 output from the encoding control unit 115, the quantization coefficient 105 output from the transform/quantization unit 104, and the motion information 113 output from the motion estimation unit 112, and outputs the result as the encoded data 118.

In the series of processes described above, some processing units operate according to the control data 116 output from the encoding control unit 115. The encoding control unit 115 refers to the video signal 100 and the decoded video signal 110 and outputs the control data 116. For the transform/quantization unit 104 and the inverse quantization/inverse transform unit 106, for example, the control data 116 specifies the quantization width and the size of the block to be transformed.

For the intra prediction unit 111 and the inter prediction unit 114, the encoding control unit 115 specifies whether the target block is predicted by intra prediction or by inter prediction. It also specifies, for intra prediction, which prediction method is used and in what block size prediction is performed, and, for inter prediction, whether prediction is performed from two frames or from one frame. For the motion estimation unit 112, the encoding control unit 115 outputs the size of the unit on which inter prediction is performed and the previously encoded frame to be referenced.

The information specified by the control data 116 output from the encoding control unit 115 as described here, such as the "quantization width" and the "size of the block to be transformed", is what is called a "mode". Selecting, from among the available modes, the mode that minimizes the cost function yields the best image quality at a given bit rate.

Next, the operation of the encoding control unit 115 shown in FIG. 1 is described with reference to FIG. 3. FIG. 3 is a flowchart showing the operation of the encoding control unit 115 shown in FIG. 1. First, the encoding control unit 115 receives the value of the undetermined multiplier λ, which is given separately (step S11). The encoding control unit 115 then receives the video signal 100 of the target block (step S12).

Next, the encoding control unit 115 enumerates, one after another, the modes that can be tried in the target processing unit of the video encoding device 10 and sets each as the trial mode (step S13). The encoding control unit 115 tries encoding in the trial mode that has been set (step S14) and takes the sum of the squared differences between the resulting decoded image signal and the original image signal as the distortion amount D (step S15).

The encoding control unit 115 also estimates the corresponding code amount estimation value R by a technique such as that described in Non-Patent Document 6 (step S16). The encoding control unit 115 then calculates the cost function C according to equation (1) (step S17). The encoding control unit 115 determines whether the calculated cost function C is the smallest among the trials so far (step S18) and, if it is, stores the current mode (step S19).

Next, the encoding control unit 115 determines whether all modes that can be tried in the target processing unit have been tried (step S20); if not, the process returns to step S13 and is repeated. If all have been tried, the encoding control unit 115 outputs the stored mode, for which the cost function C was smallest (step S21), and the process ends.
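The flow of steps S11 to S21 can be sketched as a simple loop. In the Python sketch below, `select_mode` mirrors the flowchart, while `toy_encode` and `toy_rate` are hypothetical stand-ins for the trial encoding and the rate estimator (a "mode" is just a quantization step size here); none of these names come from the source.

```python
def ssd(original, decoded):
    """Distortion D: sum of squared differences (step S15)."""
    return sum((o - d) ** 2 for o, d in zip(original, decoded))


def select_mode(block, modes, lam, try_encode, estimate_rate):
    """Return the mode with the smallest cost C = D + lambda * R."""
    best_mode, best_cost = None, float("inf")
    for mode in modes:                         # S13, repeated until S20 is satisfied
        decoded = try_encode(block, mode)      # S14: trial encoding
        d = ssd(block, decoded)                # S15: distortion D
        r = estimate_rate(block, mode)         # S16: code amount estimation value R
        cost = d + lam * r                     # S17: equation (1)
        if cost < best_cost:                   # S18: smallest so far?
            best_mode, best_cost = mode, cost  # S19: remember the mode
    return best_mode                           # S21: output the stored mode


def toy_encode(block, step):
    """Hypothetical trial encoding: quantize to multiples of step, then reconstruct."""
    return [step * round(s / step) for s in block]


def toy_rate(block, step):
    """Hypothetical rate model: coarser steps cost fewer bits."""
    return 8.0 * len(block) / step


block = [11, 13, 7, 21]
for lam in (0.1, 1.0, 10.0):
    print(lam, select_mode(block, [1, 4, 8], lam, toy_encode, toy_rate))
```

Passing the rate estimator in as a function makes it easy to swap an integer estimator for a fractional one without touching the loop.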

Next, the operation of step S16 shown in FIG. 3, which calculates the code amount estimation value R, is described in detail with reference to FIG. 4. FIG. 4 is a flowchart showing the operation of step S16, in which the encoding control unit 115 calculates the code amount estimation value R shown in FIG. 3.

The specific method of Non-Patent Document 6 is described in the H.265/HEVC reference software, so a detailed description is omitted. First, the encoding control unit 115 calculates the integer code amount Bw from the internal state of CABAC at the time encoding was tried in the specified mode (step S41). Next, the encoding control unit 115 calculates the fractional code amount Bf (step S42). Finally, the encoding control unit 115 calculates and outputs the code amount estimation value R according to equation (4) (step S43). Equation (4) is evaluated using floating-point arithmetic.
R = Bw + Bf ÷ 32768
  = Bw + Bf ÷ 2^15 (4)

Calculating the code amount estimation value R with floating-point arithmetic in this way gives higher calculation accuracy than calculating it by a bit shift. With this configuration, the calculation of R no longer introduces a rounding error and R can be calculated accurately, so the optimal encoding mode is more likely to be obtained and the encoding efficiency can be expected to improve.
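The difference between the bit-shift estimate of equation (2) and the floating-point estimate of equation (4) can be illustrated with a small Python sketch. The two modes, their (D, Bw, Bf) values, and λ are hypothetical numbers picked so that the discarded fraction changes the decision; they show that a flip is possible, not how often it occurs in practice.

```python
SCALE = 32768.0  # 2^15, the divisor in equation (4)


def rate_shift(bw, bf):
    """Equation (2): integer estimate, fraction discarded."""
    return bw + (bf >> 15)


def rate_float(bw, bf):
    """Equation (4): floating-point estimate, fraction kept."""
    return bw + bf / SCALE


# Hypothetical candidates: name -> (D, Bw, Bf).
MODES = {
    "mode_a": (100.0, 3, 32000),  # Bf is about 0.977 bit
    "mode_b": (104.0, 3, 1000),   # Bf is about 0.031 bit
}
LAM = 8.0


def pick_mode(rate_fn):
    """Return the mode minimizing C = D + lambda * R under the given estimator."""
    def cost(name):
        d, bw, bf = MODES[name]
        return d + LAM * rate_fn(bw, bf)
    return min(MODES, key=cost)


print(pick_mode(rate_shift))  # mode_a
print(pick_mode(rate_float))  # mode_b
```

With the shift, both modes appear to cost 3 bits and the lower-distortion mode wins; with the fraction kept, the nearly one-bit difference in rate outweighs the small difference in distortion.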

The value 32768 need not be fixed: even if it varies by about ±1% (32440 to 33095), the decoded image can remain unaffected, and the accuracy is still higher than with the right bit shift.

The description above used division by 32768 (= 2^15) as an example. If equation (2) is written as R = Bw + (Bf >> n), that is, with an n-bit right shift, equation (4) becomes equation (4)'.
R = Bw + Bf ÷ 2^n (4)'
In this case as well, the division is performed by floating-point arithmetic. Calculating 2^n in advance and substituting its value makes the calculation faster.
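Equation (4)' with a precomputed divisor might look like the following Python sketch. The factory function and its name are assumptions made for illustration; the point is only that 2^n is computed once rather than on every call.

```python
def make_rate_estimator(n=15):
    """Return a function computing R = Bw + Bf / 2^n with 2^n computed once."""
    divisor = float(1 << n)  # 2^n, evaluated a single time

    def estimate(bw, bf):
        return bw + bf / divisor  # floating-point division, as in equation (4)'

    return estimate


estimate_15 = make_rate_estimator(15)
estimate_10 = make_rate_estimator(10)

print(estimate_15(12, 90112))  # 14.75
print(estimate_10(1, 512))     # 1.5
```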

The floating-point calculation of equations (4) and (4)' can be applied to any encoding software that converts a real-valued R into an integer at the final stage of the process that outputs the code amount estimation value. For example, it can be applied to the calculations described in Non-Patent Documents 4 and 5.

As described above, the code amount estimation procedure is configured to produce an estimate with less error. This configuration reduces the likelihood of choosing the wrong encoding mode during encoding optimization, which is the key to improving image quality in video encoding. Video encoding therefore achieves higher encoding efficiency, and decoded video of higher quality can be obtained with a smaller code amount.

All or part of the video encoding device in the embodiment described above may be implemented by a computer. In that case, a program for realizing its functions may be recorded on a computer-readable recording medium, and the program recorded on the medium may be read into and executed by a computer system. The "computer system" here includes an OS and hardware such as peripheral devices. The "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into the computer system. The "computer-readable recording medium" may also include a medium that holds the program dynamically for a short time, such as a communication line used when the program is transmitted over a network such as the Internet or over a communication line such as a telephone line, and a medium that holds the program for a certain period, such as volatile memory inside the computer system that serves as the server or client in that case. The program may realize only part of the functions described above, may realize them in combination with a program already recorded in the computer system, or may be realized using hardware such as a PLD (Programmable Logic Device) or an FPGA (Field Programmable Gate Array).

An embodiment of the present invention has been described above with reference to the drawings, but the embodiment is merely an illustration of the present invention, and the present invention is clearly not limited to it. Components may therefore be added, omitted, replaced, or otherwise changed without departing from the technical idea and scope of the present invention.

The present invention is applicable to uses in lossy image and video coding where images must be encoded and decoded with the aim of improving video quality and reducing the coding bit rate.

Description of symbols: 100: video signal, 101: prediction signal, 102: subtraction unit, 103: prediction residual signal, 104: transform/quantization unit, 105: quantization coefficient, 106: inverse quantization/inverse transform unit, 107: addition unit, 108: distortion removal filter, 109: frame memory, 110: decoded video signal, 111: intra prediction unit, 112: motion estimation unit, 113: motion information, 114: inter prediction unit, 115: encoding control unit, 116: control data, 117: entropy encoding unit, 118: encoded data

Claims

A video encoding method performed by a video encoding device that encodes an input video signal, the method comprising:
a code amount estimation value calculation step of calculating, using floating-point arithmetic, a code amount estimation value used in cost function calculation for selecting an encoding mode,
wherein, in the code amount estimation value calculation step, an integer part Bw of a code amount of a coding unit and a fractional part Bf of the code amount are calculated, and the code amount estimation value is calculated as "Bw + Bf ÷ predetermined value", and
the predetermined value is either a predetermined fixed value or a value that varies within ±1% of the fixed value.

The video encoding method according to claim 1, wherein the fixed value is 32768.

A video encoding device that encodes an input video signal, the device comprising:
a code amount estimation value calculation unit that calculates, using floating-point arithmetic, a code amount estimation value used in cost function calculation for selecting an encoding mode,
wherein the code amount estimation value calculation unit calculates an integer part Bw of a code amount of a coding unit and a fractional part Bf of the code amount, and calculates the code amount estimation value as "Bw + Bf ÷ predetermined value", and
the predetermined value is either a predetermined fixed value or a value that varies within ±1% of the fixed value.

A video encoding program for causing a computer to execute the video encoding method according to claim 1.