JP6677230B2

JP6677230B2 - Video encoding device, video decoding device, video system, video encoding method, and video encoding program

Info

Publication number: JP6677230B2
Application number: JP2017179959A
Authority: JP
Inventors: 貴之石田; 慶一蝶野
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2017-09-20
Filing date: 2017-09-20
Publication date: 2020-04-08
Anticipated expiration: 2035-12-02
Also published as: JP2018014750A

Description

The present invention relates to a video encoding device, a video decoding device, a video system, a video encoding method, and a video encoding program based on an encoding scheme that divides a video picture into regions and then compresses them.

In response to demand for higher-definition video, full HD (High Definition) content of 1920 horizontal × 1080 vertical pixels is being supplied. Test and commercial broadcasting of high-definition video of 3840 horizontal × 2160 vertical pixels (hereinafter, 4K) has started. Commercial broadcasting of high-definition video of 7680 horizontal × 4320 vertical pixels (hereinafter, 8K) is also planned.

In a video content distribution system, the transmitting side generally encodes the video signal based on the H.264/AVC (Advanced Video Coding) standard or the HEVC (High Efficiency Video Coding) standard, and the receiving side reproduces the video signal through a decoding process. With 8K, the large number of pixels increases the processing load of both the encoding and decoding processes.

One way to reduce the processing load for 8K is, for example, the four-way picture division coding using slices described in Non-Patent Document 1 (see FIG. 11). As shown in FIG. 12, when four-way picture division coding is used and inter prediction is performed, Non-Patent Document 1 constrains the motion vectors used for motion compensation (MC) in blocks near a slice boundary: their component in the vertical direction of the slice must be 128 pixels or less. Blocks that do not lie near a slice boundary are not subject to this restriction on the vertical motion vector range across a slice boundary (hereinafter, the motion vector restriction).

For 4K and 8K, handling is being considered not only for standard dynamic range (hereinafter, SDR) video signals but also for high dynamic range (hereinafter, HDR) video signals such as Hybrid Log Gamma (hereinafter, HLG), the HDR scheme of the ARIB STD-B67 standard, and Perceptual Quantizer (hereinafter, PQ), the HDR scheme of the SMPTE ST.2084 standard. Switching between SDR and HDR therefore also needs to be considered.

Non-Patent Document 1: ARIB (Association of Radio Industries and Businesses) Standard STD-B32, Version 3.0, July 31, 2014.

When the motion vector restriction applies, the optimal motion vector may not be selectable at a slice boundary when encoding a scene in which an object in the picture, or the entire picture, moves quickly in the vertical direction. This can cause local image quality degradation. For fast motion, the degradation grows as the M value increases, where the M value is the reference picture interval. Here, the "optimal motion vector" means the original (regular) motion vector selected by the predictor that performs inter-picture prediction (inter prediction) in the video encoding device.

FIG. 14 illustrates the reference picture intervals for M = 4 and M = 8. In general, when the M value is small, the inter-frame distance is small, so motion vector values tend to be small. However, particularly in stationary scenes, there are fewer temporal layers, which constrains the allocation of code amount per layer and lowers the coding efficiency. Conversely, when the M value is large, the inter-frame distance is large, so motion vector values tend to be large. However, particularly in stationary scenes, there are more temporal layers, which relaxes the constraint on code amount allocation per layer and improves the coding efficiency. For example, changing the M value from 8 to 4 halves the motion vector values, and changing it from 4 to 8 doubles them.
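The proportionality in the last example can be written as a small sketch. It assumes that motion vector magnitude scales linearly with the reference picture interval, which is only the tendency the text describes; the function name `scale_motion_vector` is hypothetical.

```python
def scale_motion_vector(mv: float, m_from: int, m_to: int) -> float:
    """Estimate a motion vector value after the M value changes.

    Assumption: the value scales linearly with the reference picture
    interval (M), as in the 8 -> 4 and 4 -> 8 examples in the text.
    """
    if m_from <= 0 or m_to <= 0:
        raise ValueError("M values must be positive")
    return mv * m_to / m_from


if __name__ == "__main__":
    print(scale_motion_vector(200.0, 8, 4))  # M: 8 -> 4 halves the value
    print(scale_motion_vector(100.0, 4, 8))  # M: 4 -> 8 doubles the value
```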

Non-Patent Document 1 introduces the concept of the SOP (Set of Pictures). An SOP is the unit that describes the coding order and reference relationships of the AUs (Access Units) when temporal hierarchical coding is performed. Temporal hierarchical coding is coding that allows a subset of frames to be extracted from a multi-frame video.

The SOP structure includes the L = 0, L = 1, L = 2, and L = 3 structures. As shown in FIG. 15, Lx (x = 0, 1, 2, 3) denotes the following structures.
・L = 0 structure: an SOP structure composed only of pictures whose Temporal ID is 0 (that is, the SOP contains one layer of pictures; equivalently, L, which indicates the maximum Temporal ID, is 0).
・L = 1 structure: an SOP structure composed of pictures whose Temporal ID is 0 and pictures whose Temporal ID is 1 (that is, the SOP contains two layers of pictures; equivalently, L, which indicates the maximum Temporal ID, is 1).
・L = 2 structure: an SOP structure composed of pictures whose Temporal ID is 0, 1, and 2 (that is, the SOP contains three layers of pictures; equivalently, L, which indicates the maximum Temporal ID, is 2).
・L = 3 structure: an SOP structure composed of pictures whose Temporal ID is 0, 1, 2, and 3 (that is, the SOP contains four layers of pictures; equivalently, L, which indicates the maximum Temporal ID, is 3).

In this specification, M = 1 corresponds to the SOP with the L = 0 structure, M = 2 corresponds to the SOP with the L = 1 structure when N = 1 (see FIG. 15), M = 3 corresponds to the SOP with the L = 1 structure when N = 2 (see FIG. 15), M = 4 corresponds to the SOP with the L = 2 structure, and M = 8 corresponds to the SOP with the L = 3 structure.
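The correspondence between M values and SOP structures stated above can be captured as a lookup table. This is a minimal sketch: the names `SopStructure`, `M_TO_SOP`, and `sop_for_m` are hypothetical, and the `n` field simply carries the N value quoted in the text for M = 2 and M = 3.

```python
from typing import Dict, NamedTuple, Optional


class SopStructure(NamedTuple):
    max_temporal_id: int   # L: the maximum Temporal ID in the SOP
    n: Optional[int]       # N, only where the text specifies it

    @property
    def layers(self) -> int:
        # An Lx structure contains pictures with Temporal ID 0..x,
        # so the number of temporal layers is L + 1.
        return self.max_temporal_id + 1


# Mapping taken from the description: M value -> SOP structure.
M_TO_SOP: Dict[int, SopStructure] = {
    1: SopStructure(0, None),
    2: SopStructure(1, 1),
    3: SopStructure(1, 2),
    4: SopStructure(2, None),
    8: SopStructure(3, None),
}


def sop_for_m(m: int) -> SopStructure:
    """Return the SOP structure that corresponds to an M value."""
    try:
        return M_TO_SOP[m]
    except KeyError:
        raise ValueError(f"no SOP structure is defined for M = {m}")


if __name__ == "__main__":
    for m in sorted(M_TO_SOP):
        s = sop_for_m(m)
        print(f"M={m}: L={s.max_temporal_id}, layers={s.layers}, N={s.n}")
```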

For stationary scenes (for example, scenes in which neither an object in the picture nor the entire picture moves quickly), a larger reference picture interval (M value) gives better coding efficiency, as described above. To encode high-definition video such as 8K at a low rate, the video encoding device should therefore basically operate at M = 8.

As described above, however, increasing the M value tends to increase motion vector values. Image quality therefore degrades because of the motion vector restriction, particularly in scenes in which an object in the picture or the entire picture moves quickly in the vertical direction. This is because the motion vector restriction can prevent the optimal motion vector from being selected at a slice boundary.

An object of the present invention is to suppress image quality degradation when using an encoding scheme that divides a video picture and then compresses it and that constrains the selection of motion vectors near slice boundaries. In addition, because switching between SDR and HDR must also be considered for 4K and 8K as described above, a further object of the present invention is to suppress this image quality degradation while taking switching between SDR and HDR into account.

A video encoding method according to the present invention generates a bitstream using a plurality of SOP structures and information on the video signal relating to dynamic range. When the video signal to be encoded switches, in the time direction, from SDR to HDR or from HDR to SDR, the method sets the SOP structure of the first SOP after the switch to an SOP structure composed only of pictures whose Temporal ID is 0, so that the information on the video signal relating to dynamic range, which is used to switch the dynamic range, can be transmitted. When the signal has switched from SDR to HDR or from HDR to SDR, the method sets that information in the transfer characteristics of the bitstream.

A video encoding device according to the present invention includes coding structure control means that, when the video signal to be encoded switches, in the time direction, from SDR to HDR or from HDR to SDR, sets the SOP structure of the first SOP after the switch to an SOP structure composed only of pictures whose Temporal ID is 0, so that the information on the video signal relating to dynamic range, which is used to switch the dynamic range, can be transmitted. When the signal has switched from SDR to HDR or from HDR to SDR, the coding structure control means sets that information in the transfer characteristics of the bitstream.

A video encoding program according to the present invention runs on a computer in a video encoding device that generates a bitstream using a plurality of SOP structures and information on the video signal relating to dynamic range. When the video signal to be encoded switches, in the time direction, from SDR to HDR or from HDR to SDR, the program causes the computer to set the SOP structure of the first SOP after the switch to an SOP structure composed only of pictures whose Temporal ID is 0, so that the information on the video signal relating to dynamic range, which is used to switch the dynamic range, can be transmitted. When the signal has switched from SDR to HDR or from HDR to SDR, the program causes the computer to set that information in the transfer characteristics of the bitstream.
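The switching rule shared by the method, device, and program can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the input is taken to be one dynamic-range label per SOP, the names `SopPlan` and `plan_sops` are hypothetical, and the numeric values are the `transfer_characteristics` code points of the HEVC VUI (1 for BT.709, 16 for SMPTE ST 2084 PQ, 18 for ARIB STD-B67 HLG). Signalling SDR as BT.709 is itself an assumption.

```python
from dataclasses import dataclass
from typing import List

TRANSFER_BT709 = 1   # assumed SDR signalling
TRANSFER_PQ = 16     # SMPTE ST 2084 (HDR)
TRANSFER_HLG = 18    # ARIB STD-B67 (HDR)


@dataclass
class SopPlan:
    max_temporal_id: int           # L of the SOP structure to use
    transfer_characteristics: int  # value to set in the bitstream


def is_hdr(transfer: int) -> bool:
    return transfer in (TRANSFER_PQ, TRANSFER_HLG)


def plan_sops(transfers: List[int], default_l: int = 3) -> List[SopPlan]:
    """Choose an SOP structure for each SOP in display order.

    The first SOP after an SDR<->HDR switch is forced to the L = 0
    structure (Temporal ID 0 pictures only); other SOPs keep the
    default structure (hypothetically L = 3, i.e. M = 8).
    """
    plans: List[SopPlan] = []
    previous = None
    for transfer in transfers:
        switched = previous is not None and is_hdr(previous) != is_hdr(transfer)
        plans.append(SopPlan(0 if switched else default_l, transfer))
        previous = transfer
    return plans


if __name__ == "__main__":
    stream = [TRANSFER_BT709, TRANSFER_BT709, TRANSFER_HLG,
              TRANSFER_HLG, TRANSFER_BT709]
    for plan in plan_sops(stream):
        print(plan)
```

In this sketch a change between two HDR schemes (PQ to HLG) is not treated as a switch, because the text only speaks of SDR-to-HDR and HDR-to-SDR transitions.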

A video system according to the present invention includes the video encoding device described above, an audio encoding unit that encodes an audio signal, and a multiplexing unit that multiplexes and outputs the bitstream from the video encoding device and the bitstream from the audio encoding unit.
A video decoding device according to the present invention decodes the bitstream transmitted from the video encoding device described above.

According to the present invention, image quality degradation can be suppressed while supporting switching between SDR and HDR.

FIG. 1 is a block diagram showing a configuration example of an embodiment of the video encoding device.
FIG. 2 is a block diagram showing a configuration example of an embodiment of the video decoding device.
FIG. 3 is a flowchart showing the operation of the first embodiment of the video encoding device.
FIG. 4 is a flowchart showing the operation of the second embodiment of the video encoding device.
FIG. 5 is a flowchart showing the operation of the third embodiment of the video encoding device.
FIG. 6 is a block diagram showing an example of the video system.
FIG. 7 is a block diagram showing another example of the video system.
FIG. 8 is a block diagram showing a configuration example of an information processing system capable of implementing the functions of the video encoding device and the video decoding device.
FIG. 9 is a block diagram showing the main part of the video encoding device.
FIG. 10 is a block diagram showing the main part of the video decoding device.
FIG. 11 is an explanatory diagram showing an example of picture division.
FIG. 12 is an explanatory diagram for explaining the motion vector restriction.
FIG. 13 is an explanatory diagram showing the SOP structure.
FIG. 14 is an explanatory diagram showing an example of reference picture intervals.
FIG. 15 is an explanatory diagram showing the SOP structure.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a block diagram showing a configuration example of an embodiment of the video encoding device. The video encoding device 100 shown in FIG. 1 includes an encoding unit 101, an analysis unit 111, a determination unit 112, and an M value determination unit 113. The video encoding device 100 performs encoding based on the HEVC standard, but it may instead perform encoding based on another standard, for example the H.264/AVC standard. The following description takes the input of 8K video as an example.

The encoding unit 101 includes a picture divider 102 that divides an input image into a plurality of pictures, a frequency transformer/quantizer 103, an inverse quantizer/inverse frequency transformer 104, a buffer 105, a predictor 106, and an entropy encoder 107.

The picture divider 102 divides each picture of the input video into four pictures (see FIG. 11). The frequency transformer/quantizer 103 applies a frequency transform to the prediction error image obtained by subtracting the prediction signal from the input video signal. It then quantizes the frequency-transformed prediction error image (the frequency transform coefficients). Hereinafter, the quantized frequency transform coefficients are called transform quantization values.

The entropy encoder 107 entropy-encodes the prediction parameters and the transform quantization values and outputs a bitstream. The prediction parameters are information related to the prediction of CTUs (Coding Tree Units) and blocks, such as the prediction mode (intra prediction or inter prediction), the intra prediction block size, the intra prediction direction, the inter prediction block size, and the motion vectors.

The predictor 106 generates a prediction signal for the input video signal. The prediction signal is generated based on intra prediction or inter-frame prediction.

The inverse quantizer/inverse frequency transformer 104 inversely quantizes the transform quantization values and then applies an inverse frequency transform to the inversely quantized frequency transform coefficients. The prediction signal is added to the resulting reconstructed prediction error image, and the sum is supplied to the buffer 105. The buffer 105 stores the reconstructed image.

The analysis unit 111 analyzes coding statistical information. Based on the analysis result of the analysis unit 111, the determination unit 112 determines whether an optimal motion vector can be selected near a slice boundary under the motion vector restriction described above. The coding statistical information is information on the encoding result of a past frame (for example, the frame immediately preceding the current frame to be encoded); specific examples are described later.

The vicinity of a slice boundary is the region in which an optimal motion vector could not be selected. For convenience in implementing the control described below, however, the vicinity of a slice boundary may be defined as, for example, the range of ±128 pixels or ±256 pixels from the slice boundary. The range treated as "near the slice boundary" may also be made adjustable according to the state of the video (for example, large or small motion). For example, when motion vectors with large values occur at a high rate, the range may be set wider.
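One way to express the "near the slice boundary" test is sketched below. It assumes that slice boundaries are horizontal lines given as luma row positions; the boundary rows in the usage example, the 0.5 ratio threshold, and the function names are all hypothetical and are not taken from the specification. Only the ±128 and ±256 pixel ranges come from the text.

```python
from typing import Sequence


def choose_margin(large_mv_ratio: float, ratio_threshold: float = 0.5) -> int:
    """Widen the vicinity range when large motion vectors are frequent.

    The +/-128 and +/-256 pixel ranges come from the text; the ratio
    threshold is a hypothetical tuning parameter.
    """
    return 256 if large_mv_ratio > ratio_threshold else 128


def is_near_slice_boundary(block_top: int, block_height: int,
                           boundary_rows: Sequence[int],
                           margin: int = 128) -> bool:
    """Return True if any row of the block lies within `margin` pixels
    of a slice boundary."""
    block_bottom = block_top + block_height  # exclusive end row
    return any(block_top < row + margin and block_bottom > row - margin
               for row in boundary_rows)


if __name__ == "__main__":
    boundaries = [1088, 2176, 3264]  # illustrative rows only
    print(is_near_slice_boundary(1024, 64, boundaries))
    print(is_near_slice_boundary(0, 64, boundaries))
    print(is_near_slice_boundary(1300, 64, boundaries, choose_margin(0.8)))
```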

The M value determination unit 113 adaptively determines the M value based on the determination result of the determination unit 112. As described above, determining the M value is equivalent to determining the Lx (x = 0, 1, 2, 3) structure of the SOP structure. The coding statistical information is described later.

FIG. 2 is a block diagram showing a configuration example of an embodiment of the video decoding device. The video decoding device 200 shown in FIG. 2 includes an entropy decoder 202, an inverse quantizer/inverse frequency transformer 203, a predictor 204, and a buffer 205.

The entropy decoder 202 entropy-decodes the video bitstream and supplies the entropy-decoded transform quantization values to the inverse quantizer/inverse frequency transformer 203.

The inverse quantizer/inverse frequency transformer 203 inversely quantizes the luminance and chrominance transform quantization values using the quantization step width to obtain frequency transform coefficients. It then applies an inverse frequency transform to the inversely quantized frequency transform coefficients.

After the inverse frequency transform, the predictor 204 generates a prediction signal using the images of reconstructed pictures stored in the buffer 205 (this prediction is also called motion-compensated prediction or MC reference). The prediction signal supplied from the predictor 204 is added to the reconstructed prediction error image produced by the inverse frequency transform in the inverse quantizer/inverse frequency transformer 203, and the result is supplied to the buffer 205 as a reconstructed picture. The reconstructed pictures stored in the buffer 205 are then output as decoded video.

Next, the operations of the analysis unit 111, the determination unit 112, and the M value determination unit 113 in the video encoding device 100 are described.

Embodiment 1.
FIG. 3 is a flowchart showing the operation of the first embodiment of the video encoding device 100 shown in FIG. 1. In the first embodiment, the 8K video is divided into four (see FIG. 11), and the motion vector restriction applies near slice boundaries. A limit of ±128 is used as the example motion vector restriction. The four-way division of the 8K video and the presence of the motion vector restriction also apply to the other embodiments. The initial M value is 8 (M = 8).

The analysis unit 111 analyzes past encoding results stored in the buffer 105 (for example, the encoding result of the immediately preceding frame). Specifically, the analysis unit 111 calculates the average or median of the motion vectors in blocks other than those at the slice boundaries (hereinafter, this average or median is denoted M_avg) (step S101). In the first embodiment, the coding statistical information is the motion vector values, and the analysis result is their average or median.

The determination unit 112 determines how large M_avg is relative to the motion vector restriction of ±128 (step S102).

The M value determination unit 113 then determines the M value based on the determined magnitude of M_avg (step S103).

Based on the determination result, the M value determination unit 113 determines the M value, for example, as follows.

When the M value is some other value, the M value determination unit 113 likewise returns the M value to 8, as in cases (1) and (2) above, if it can estimate that the motion vector values near the slice boundary would fall within ±128 under the motion vector restriction with the M value set to 8. In other words, the M value determination unit 113 returns the M value to 8 when it can estimate that an optimal motion vector can be selected near the slice boundary under the motion vector restriction. In the remaining cases as well, it determines the M value according to M_avg so that the motion vector values near the slice boundary fall within ±128.

The case division above (the threshold settings) is only an example; the thresholds may be changed, or the cases may be divided more finely.
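The flow of steps S101 to S103 can be sketched as follows. Because the concrete case thresholds are not reproduced here, the sketch substitutes a simple rule that is consistent with the stated goal: pick the largest candidate M value for which the motion vector estimate stays within ±128, assuming motion vectors scale linearly with M. The candidate set, the use of vertical-component magnitudes, and the function names are assumptions made for illustration.

```python
from statistics import mean, median
from typing import Sequence

M_CANDIDATES = (8, 4, 2, 1)  # hypothetical set, tried from largest to smallest


def analyze(vertical_mvs: Sequence[float], use_median: bool = False) -> float:
    """Step S101: average (or median) motion vector magnitude, M_avg,
    over blocks away from the slice boundaries."""
    magnitudes = [abs(v) for v in vertical_mvs]
    if not magnitudes:
        return 0.0
    return float(median(magnitudes) if use_median else mean(magnitudes))


def decide_m(m_avg: float, current_m: int, mv_limit: float = 128.0) -> int:
    """Steps S102 and S103: choose the largest M whose estimated motion
    vector value stays within the limit.

    Assumption: values measured at `current_m` scale linearly with M.
    """
    for m in M_CANDIDATES:
        if m_avg * m / current_m <= mv_limit:
            return m
    return M_CANDIDATES[-1]


if __name__ == "__main__":
    m_avg = analyze([100.0, -300.0])
    print(m_avg, decide_m(m_avg, current_m=8))
    print(decide_m(60.0, current_m=4))
```

With this rule the M value returns to 8 whenever the estimate at M = 8 fits within the limit, which matches the behavior described for the M value determination unit 113.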

The control of the video encoding device of the first embodiment is based on the following reasoning.

When the video shows a scene in which the entire picture moves quickly, motion vectors with large values account for a high proportion of all generated motion vectors, both near slice boundaries and elsewhere. Because of the motion vector restriction, however, the optimal motion vector may not have been selected near a slice boundary. The determination unit 112 therefore estimates whether the picture to be encoded belongs to a fast-moving scene based on the motion vectors, used as coding statistical information, that were generated in regions other than the slice boundaries (these are not subject to the motion vector restriction, so they are regular, in other words optimal, motion vectors). When the determination unit 112 estimates that the video shows a fast-moving scene, the M value determination unit 113 changes the M value so that an optimal motion vector can be selected near the slice boundary.

When the video shows a fast-moving scene, the optimal motion vector may not have been selected near the slice boundary. Estimating that the video shows a fast-moving scene is therefore equivalent to estimating that, under the motion vector restriction, the optimal motion vector has not been selected near the slice boundary.

As described above, the M value and the SOP structure are correlated. The M value determination unit 113 determining the M value is therefore equivalent to determining the SOP structure (that is, the Lx (x = 0, 1, 2, 3) structure).

Embodiment 2.
FIG. 4 is a flowchart showing the operation of the second embodiment of the video encoding device 100 shown in FIG. 1.

The analysis unit 111 analyzes past encoding results stored in the buffer 105 (for example, the encoding result of the immediately preceding frame). Specifically, the analysis unit 111 calculates the proportion P_1 of blocks that used intra-picture prediction (intra prediction) among all blocks (for example, PUs: Prediction Units) in the range other than the slice boundaries (step S201), and calculates the proportion P_2 of blocks that used intra-picture prediction among all blocks near the slice boundaries (step S202). In the second embodiment, the coding statistical information is the prediction mode of the blocks near the slice boundaries (specifically, the number of intra-predicted blocks), and the analysis results are the proportions P_1 and P_2.
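Steps S201 and S202 amount to two ratio computations over the blocks of a previously encoded frame. The sketch below assumes each block record already carries its prediction mode and a flag saying whether it lies near a slice boundary; the `Block` type and `intra_ratios` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Block:
    is_intra: bool        # True if the block used intra prediction
    near_boundary: bool   # True if the block lies near a slice boundary


def intra_ratios(blocks: Iterable[Block]) -> Tuple[float, float]:
    """Return (P_1, P_2).

    P_1: share of intra-predicted blocks away from slice boundaries.
    P_2: share of intra-predicted blocks near slice boundaries.
    A ratio is reported as 0.0 when its region contains no blocks.
    """
    counts = {False: [0, 0], True: [0, 0]}  # near_boundary -> [intra, total]
    for block in blocks:
        counts[block.near_boundary][1] += 1
        if block.is_intra:
            counts[block.near_boundary][0] += 1

    def ratio(intra: int, total: int) -> float:
        return intra / total if total else 0.0

    return ratio(*counts[False]), ratio(*counts[True])


if __name__ == "__main__":
    frame = [Block(True, True), Block(False, True),
             Block(False, False), Block(False, False),
             Block(True, False), Block(False, False)]
    p1, p2 = intra_ratios(frame)
    print(f"P_1={p1}, P_2={p2}")
```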

The determination unit 112 compares the proportions P_1 and P_2 and determines how far they diverge. Specifically, it determines whether P_2 is considerably larger than P_1. For example, the determination unit 112 determines whether the difference between P_2 and P_1 exceeds a predetermined value (step S203).

When the difference between P_2 and P_1 exceeds the predetermined value, the M value determination unit 113 decreases the M value (step S204). A plurality of predetermined values may be provided. For example, the M value may be decreased by several steps when the difference exceeds a first predetermined value, and by one step when the difference exceeds a second predetermined value (smaller than the first predetermined value).

When the difference between P_2 and P_1 is equal to or less than the predetermined value, the M value determination unit 113 either maintains or increases the M value (step S205). For example, the M value determination unit 113 increases the M value when the difference is equal to or less than a third predetermined value (smaller than the second predetermined value) and maintains the M value when the difference exceeds the third predetermined value.
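The three-threshold rule of steps S203 to S205 can be sketched as follows. The threshold values, the ladder of M values, and the choice of two steps for "several steps" are hypothetical; the text only fixes the ordering first > second > third.

```python
M_STEPS = (1, 2, 4, 8)  # hypothetical ladder of M values, small to large


def update_m(current_m: int, p1: float, p2: float,
             t1: float = 0.30, t2: float = 0.15, t3: float = 0.05) -> int:
    """Adjust the M value from the gap between P_2 and P_1.

    diff >  t1       : decrease M by several steps (two here)
    t2 < diff <= t1  : decrease M by one step
    t3 < diff <= t2  : keep M
    diff <= t3       : increase M by one step
    """
    if not t1 > t2 > t3:
        raise ValueError("thresholds must satisfy t1 > t2 > t3")
    diff = p2 - p1
    index = M_STEPS.index(current_m)
    if diff > t1:
        index -= 2
    elif diff > t2:
        index -= 1
    elif diff <= t3:
        index += 1
    # Clamp to the ends of the ladder.
    index = max(0, min(index, len(M_STEPS) - 1))
    return M_STEPS[index]


if __name__ == "__main__":
    print(update_m(8, p1=0.10, p2=0.50))  # large gap: drop two steps
    print(update_m(4, p1=0.10, p2=0.12))  # small gap: raise one step
```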

The control of the video encoding device in the second embodiment is based on the following idea.

When encoding each block in a picture, the encoding unit 101 can use either intra-picture prediction or inter-picture prediction (inter prediction) as the prediction mode. When the video is of a scene in which the entire picture moves fast, it is considered that, were there no motion vector restriction, large motion vectors would occur at a high rate even near the slice boundaries when inter prediction is used. Because of the motion vector restriction, however, an optimal (large) motion vector cannot be generated near the slice boundaries, and as a result intra prediction is considered to be used frequently there. Away from the slice boundaries there is no motion vector restriction, so intra prediction is considered to be used less often than near the slice boundaries.

Therefore, when the ratio P_1 and the ratio P_2 diverge greatly, it is estimated that a video signal of a fast-moving scene is being input to the encoding unit 101.

Note that, in the case of a video of a fast-moving scene, the optimal motion vector may not have been selected near the slice boundaries. Hence the estimation that the video is of a fast-moving scene is equivalent to the ratio P_1 and the ratio P_2 diverging greatly under the motion vector restriction.

As the predetermined value for determining whether the divergence is large, a value is selected, for example empirically or experimentally, such that using it as a threshold makes it possible to estimate that the optimal motion vector may not have been selected near the slice boundaries.

Embodiment 3.
FIG. 5 is a flowchart showing the operation of the third embodiment of the video encoding device 100 shown in FIG. 1.

The analysis unit 111 analyzes a past encoding result (for example, the encoding result of the immediately preceding frame) stored in the buffer 105. Specifically, the analysis unit 111 calculates the generated code amount C_1 in blocks near the slice boundaries of an earlier frame (for example, the frame two frames before the current frame to be encoded) (step S301). The analysis unit 111 also calculates the generated code amount C_2 in blocks near the slice boundaries of the immediately preceding frame (step S302). In the third embodiment, the encoding statistical information is the generated code amount of blocks near the slice boundaries, and the analysis result is the generated code amounts C_1 and C_2.

The determination unit 112 compares the generated code amount C_1 with the generated code amount C_2 and determines the degree of divergence between them. Specifically, it determines whether the generated code amount C_2 is considerably larger than the generated code amount C_1. For example, the determination unit 112 determines whether the difference between the generated code amount C_2 and the generated code amount C_1 exceeds a predetermined amount (step S303).

When the difference between the generated code amount C_2 and the generated code amount C_1 exceeds the predetermined amount, the M value determination unit 113 reduces the M value (step S304). A plurality of predetermined amounts may be provided; for example, the M value may be reduced by a plurality of steps when the difference exceeds a first predetermined amount, and by one step when the difference exceeds a second predetermined amount (< the first predetermined amount).

When the difference between the generated code amount C_2 and the generated code amount C_1 is equal to or less than the predetermined amount, the M value determination unit 113 either maintains or increases the M value (step S305). For example, the M value determination unit 113 increases the M value when the difference is equal to or less than a third predetermined amount (< the second predetermined amount), and maintains the M value when the difference exceeds the third predetermined amount.
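The third embodiment differs from the second mainly in the statistic that is compared. A hedged sketch of obtaining C_1 and C_2 and of the test in step S303 follows. Representing each block as a (top line, bits) pair, treating "near" as within `margin` lines of a boundary, and the function names are assumptions made only for illustration.

```python
def boundary_code_amount(blocks, boundaries, margin=64):
    """Sum the bits of blocks whose top line lies within `margin` lines
    of any slice boundary. `blocks` is an iterable of (top_line, bits)."""
    return sum(
        bits
        for top_line, bits in blocks
        if any(abs(top_line - b) < margin for b in boundaries)
    )


def code_amount_jumped(c1, c2, threshold):
    """Step S303: True when C_2 exceeds C_1 by more than the predetermined amount."""
    return (c2 - c1) > threshold


# Example with one slice boundary at line 1080 (values are made up).
two_frames_ago = [(1024, 900), (1080, 1100), (1600, 400)]
previous_frame = [(1024, 2600), (1080, 3100), (1600, 450)]
c1 = boundary_code_amount(two_frames_ago, [1080])
c2 = boundary_code_amount(previous_frame, [1080])
```

The M value would then be lowered, maintained, or raised from `c2 - c1` in the same way as described for steps S304 and S305.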

The control of the video encoding device in the third embodiment is based on the following idea.

As described above, when the video is of a scene in which the entire picture moves fast, it is considered that, were there no motion vector restriction, large motion vectors would account for a high proportion even near the slice boundaries when inter prediction is used. Because of the motion vector restriction, however, an optimal (large) motion vector cannot be generated near the slice boundaries, and as a result intra prediction is considered to be used frequently there. In general, the generated code amount is larger when intra prediction is used than when inter prediction is used.

Therefore, when the generated code amount C_2 is considerably larger than the generated code amount C_1, it is estimated that the situation has changed to one in which a video signal of a fast-moving scene is input to the encoding unit 101.

Note that, when the video has become that of a fast-moving scene, the optimal motion vector may not be selected near the slice boundaries. Hence the estimation that the video has become that of a fast-moving scene is equivalent to the generated code amount C_2 having increased greatly under the motion vector restriction.

As the predetermined amount for determining whether the code amount has increased greatly, a value is selected, for example empirically or experimentally, such that using it as a threshold makes it possible to estimate that the optimal motion vector may not be selected near the slice boundaries.

As described above, in each of the above embodiments the M value is switched adaptively based on a past encoding result (encoding statistical information). Based on the encoding statistical information, it is estimated whether an optimal motion vector (in other words, a motion vector that falls outside the motion vector restriction) can be selected near the slice boundaries under the motion vector restriction. When it is estimated that it cannot be selected, the M value is changed to a smaller value. When it is determined that it can be selected, the optimal motion vector is considered selectable near the slice boundaries under the motion vector restriction even with the current M value, so the M value is maintained or changed to a larger value.

As a result, the state in which the optimal motion vector cannot be selected near the slice boundaries because of the motion vector restriction can be avoided as far as possible, which reduces the possibility of local image quality degradation. That is, because the M value is switched adaptively according to the speed of motion, suitable image quality can be obtained.

In addition, because the M value can be switched based on an encoding result (for example, the encoding result of the immediately preceding frame), there is no need for pre-analysis (analysis executed as preprocessing when the current frame is encoded), and the processing time for encoding is prevented from increasing as it would with pre-analysis.

In the video encoding device 100, the analysis unit 111, the determination unit 112, and the M value determination unit 113 may be configured so as to incorporate any two or all of the first to third embodiments.

Furthermore, in the video encoding device 100, the M value determination unit 113 may determine the encoding structure also using SDR/HDR switching information set from outside, and the entropy encoder 107 may transmit the SDR/HDR switching information to the video decoding device.

Specifically, the M value determination unit 113 controls the M value so that the coded video sequence (CVS) can be terminated at the SDR/HDR switching position (time position) set from outside.

For simplicity of explanation, let fNumSwitch be the number of frames from the frame at the current time position to the frame at the switching time position, and let M be the provisionally determined M value.

When fNumSwitch is 1 or more and M is larger than fNumSwitch, the M value determination unit 113 updates M to a value not exceeding fNumSwitch.

Otherwise, when fNumSwitch is 0, the M value determination unit 113 sets M to 1 so that the CVS is terminated at the previously encoded frame. That is, the video encoding device compresses the current frame as an IDR picture. Furthermore, in order to transmit the SDR/HDR switching information to the video decoding device, the entropy encoder 107 sets the information of the SDR or HDR that has been switched to in the transfer_characteristics syntax of the VUI of the SPS of the IDR picture. For example, the transfer_characteristics syntax is set to 18 when switching to HLG HDR, to 16 when switching to PQ HDR, to 14 when switching to Rec. ITU-R BT.2020 SDR, to 11 when switching to IEC 61966-2-4 SDR, and to 1 when switching to Rec. ITU-R BT.709 SDR. At this time, the entropy encoder 107 may output an EOS bit stream before the bit stream of the IDR picture.

In all other cases, the M value determination unit 113 outputs M unchanged.
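The three cases above can be sketched as a single function. The sketch is illustrative only. The text requires merely that M be updated to a value not exceeding fNumSwitch, so choosing the largest permitted value is an assumption, as are the ladder of permitted M values, the use of `None` to mean that no switch is scheduled, and the function name.

```python
from typing import Optional, Tuple

M_LADDER = (1, 2, 4, 8)  # assumed set of permitted M values


def adjust_m_for_switch(m: int, f_num_switch: Optional[int]) -> Tuple[int, bool]:
    """Return (M to use, whether the current frame is coded as an IDR picture).

    f_num_switch is the number of frames from the current frame to the
    SDR/HDR switching frame, or None when no switch is scheduled.
    """
    if f_num_switch is not None and f_num_switch >= 1 and m > f_num_switch:
        # Shrink M so that the SOP ends no later than the switching position.
        return max(v for v in M_LADDER if v <= f_num_switch), False
    if f_num_switch == 0:
        # The switch happens at the current frame: terminate the CVS and
        # code this frame as an IDR picture.
        return 1, True
    return m, False
```

For example, with a provisional M of 8 and three frames left before the switch, the sketch returns M = 2, and on the switching frame itself it returns M = 1 together with the IDR flag.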

The relationship between the transfer_characteristics syntax values corresponding to the above-mentioned SDR and HDR and their characteristics is as shown in the following table.
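The five values named in the description can be captured in a small lookup. The sketch below lists only the values stated in this document. The dictionary name, the helper names, and the SDR/HDR class labels used as tuple fields are assumptions for illustration.

```python
# transfer_characteristics value -> (dynamic range class, characteristic)
TRANSFER_CHARACTERISTICS = {
    1: ("SDR", "Rec. ITU-R BT.709"),
    11: ("SDR", "IEC 61966-2-4"),
    14: ("SDR", "Rec. ITU-R BT.2020"),
    16: ("HDR", "PQ"),
    18: ("HDR", "HLG"),
}


def dynamic_range(value):
    """Return 'SDR' or 'HDR' for a listed value, or None if it is not listed."""
    entry = TRANSFER_CHARACTERISTICS.get(value)
    return entry[0] if entry else None


def is_range_switch(previous_value, current_value):
    """True when two listed values belong to different dynamic range classes."""
    prev, cur = dynamic_range(previous_value), dynamic_range(current_value)
    return prev is not None and cur is not None and prev != cur
```

A receiver could call `is_range_switch` with the values decoded from two consecutive coded video sequences to decide whether its display settings need to change.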

The video decoding device shown in FIG. 2 decodes a bit stream encoded using an M value set within a range that satisfies the motion vector restriction, as exemplified in the first to third embodiments.

Furthermore, the video decoding device shown in FIG. 2 can also receive, by decoding the bit stream, the SDR/HDR switching information transmitted from the video encoding device.

Specifically, the entropy decoder 202 of the video decoding device can receive the SDR/HDR switching information by decoding the value of the transfer_characteristics syntax of the VUI of the SPS. For example, it receives a switch to HLG HDR when the transfer_characteristics syntax is 18, a switch to PQ HDR when it is 16, a switch to Rec. ITU-R BT.2020 SDR when it is 14, a switch to IEC 61966-2-4 SDR when it is 11, and a switch to Rec. ITU-R BT.709 when it is 1. Needless to say, the SDR/HDR switch is received (detected) merely by decoding the SPS of the IDR picture encoded with M = 1. Needless to say, an EOS bit stream may also be received (detected) before the IDR bit stream.

A receiving terminal using the above video decoding device can learn the SDR/HDR switching information and adjust the video display according to the characteristics of the video signal. That is, it can display video with suppressed image quality degradation while supporting switching between SDR and HDR display.

FIG. 6 is a block diagram showing an example of a video system. The video system shown in FIG. 6 is a system in which the video encoding device 100 of each of the above embodiments and the video decoding device 200 shown in FIG. 2 are connected by a wireless or wired transmission path 300. The video encoding device 100 is the video encoding device 100 of any one of the first to third embodiments, but the analysis unit 111, the determination unit 112, and the M value determination unit 113 of the video encoding device 100 may be configured to execute the processing of any two or all of the first to third embodiments.

In the above example, the transmission means for transmitting the SDR/HDR switching information to the video decoding side is realized by the entropy encoder 107, and the decoding means for decoding the encoded video and the SDR/HDR switching information transmitted from the video encoding side is realized by the entropy decoder 202. However, when the entropy encoder that performs entropy encoding is configured separately from a multiplexer that multiplexes the data encoded by the entropy encoder with the SDR/HDR switching information, and the entropy decoder that performs entropy decoding is configured separately from a demultiplexer that separates the SDR/HDR switching information and the video from the multiplexed bit stream, the video system may be configured as a system made up of a video encoding device consisting of the portion that does not include the multiplexer and a video decoding device consisting of the portion that does not include the demultiplexer.

FIG. 7 is a block diagram showing another example of the video system. The video system shown in FIG. 7 includes an audio encoding unit 401, a video encoding unit 402, and a multiplexing unit 403.

The audio encoding unit 401 creates and outputs an audio bit stream by encoding the audio signal of data (content) containing video and audio based on, for example, the MPEG-4 AAC (Advanced Audio Coding) standard or the MPEG-4 ALS (Audio Lossless Coding) standard specified in the ARIB STD-B32 standard.

The video encoding unit 402 is configured, for example, as shown in FIG. 1, and creates and outputs a video bit stream.

The multiplexing unit 403 creates and outputs a multiplexed bit stream by multiplexing the audio bit stream, the video bit stream, and other information based on, for example, the ARIB STD-B32 standard.

Each of the above embodiments can be configured in hardware, but can also be realized by a computer program.

The information processing system shown in FIG. 8 includes a processor 1001, a program memory 1002, a storage medium 1003 for storing video data, and a storage medium 1004 for storing a bit stream. The storage medium 1003 and the storage medium 1004 may be separate storage media or may be storage areas on the same storage medium. A magnetic storage medium such as a hard disk can be used as the storage medium.

In the information processing system shown in FIG. 8, the program memory 1002 stores a program (a video encoding program or a video decoding program) for realizing the functions of the blocks (excluding the buffer blocks) shown in FIG. 1 and FIG. 2, respectively. The processor 1001 realizes the functions of the video encoding device or the video decoding device shown in FIG. 1 and FIG. 2, respectively, by executing processing in accordance with the program stored in the program memory 1002.

FIG. 9 is a block diagram showing the main part of a video encoding device. As shown in FIG. 9, the video encoding device 10 includes: an analysis unit 11 (corresponding to the analysis unit 111 in the embodiments) that analyzes encoding statistical information; an estimation unit 12 (realized by the determination unit 112 in the embodiments) that estimates, based on the analysis result of the analysis unit 11, whether an optimal motion vector can be selected near the slice boundaries; an encoding structure determination unit 13 (realized by the M value determination unit 113 in the embodiments) that, based on the estimation result of the estimation unit 12 and the SDR/HDR switching information, adaptively determines the encoding structure to be one of an SOP structure composed only of pictures whose Temporal ID is 0, an SOP structure composed of pictures whose Temporal ID is 0 or 1, an SOP structure composed of pictures whose Temporal ID is 0, 1, or 2, and an SOP structure composed of pictures whose Temporal ID is 0, 1, 2, or 3; and a transmission means 14 (realized by the entropy encoder 107 in the embodiments) that transmits the SDR/HDR switching information to the video decoding side.

FIG. 10 is a block diagram showing the main part of a video decoding device. As shown in FIG. 10, the video decoding device 20 includes a decoding unit 21 (realized by the entropy decoder 202 in the embodiments) that decodes video encoded with one of an SOP structure composed only of pictures whose Temporal ID is 0, an SOP structure composed of pictures whose Temporal ID is 0 or 1, an SOP structure composed of pictures whose Temporal ID is 0, 1, or 2, and an SOP structure composed of pictures whose Temporal ID is 0, 1, 2, or 3, together with the SDR/HDR switching information transmitted from the video encoding side.

The decoding unit 21 can decode a bit stream encoded based on whichever of the following SOP structures was set as the encoding structure: an SOP structure composed only of pictures whose Temporal ID is 0, an SOP structure composed of pictures whose Temporal ID is 0 or 1, an SOP structure composed of pictures whose Temporal ID is 0, 1, or 2, or an SOP structure composed of pictures whose Temporal ID is 0, 1, 2, or 3.

Furthermore, the decoding unit 21 can decode a bit stream that was encoded with the picture divided into four slices as shown in FIG. 11 and with the restriction, as shown in FIG. 12, that when a PU of one slice refers to another slice for motion compensation (MC), the MC reference of that PU across the slice boundary refers only to pixels within 128 lines of the slice boundary.
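The 128-line restriction can be expressed as a simple range check on the vertical position of the reference block. The sketch below is a simplified illustration under stated assumptions: slices are taken to be horizontal strips described by their first line and one-past-last line, motion vectors are in whole lines, and interpolation filter taps and picture-edge handling are ignored. The function and parameter names are hypothetical.

```python
MAX_CROSS_LINES = 128  # lines an MC reference may reach beyond the slice


def mc_reference_allowed(pu_y, pu_h, mv_y, slice_top, slice_bottom):
    """Check the vertical extent of a PU's motion-compensated reference.

    pu_y, pu_h   : top line and height of the PU
    mv_y         : vertical motion vector in whole lines (negative = up)
    slice_top    : first line of the slice containing the PU
    slice_bottom : one past the last line of that slice
    """
    ref_top = pu_y + mv_y
    ref_bottom = ref_top + pu_h  # exclusive
    lowest_allowed = slice_top - MAX_CROSS_LINES
    highest_allowed = slice_bottom + MAX_CROSS_LINES
    return lowest_allowed <= ref_top and ref_bottom <= highest_allowed
```

For a slice spanning lines 1080 to 2159, a 16-line PU at the top of the slice may reference up to 128 lines above line 1080, so `mv_y = -128` passes the check and `mv_y = -129` does not.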

In the embodiments, when a 120P moving image is handled, the following SOP structures, shown in FIG. 13, can be used on the video encoding and decoding sides.

・L = 0 structure: an SOP structure composed only of pictures whose Temporal ID is 0 (that is, the SOP contains one level of pictures; equivalently, L, which indicates the maximum Temporal ID, is 0).
・L = 1 structure: an SOP structure composed of pictures whose Temporal ID is 0 and pictures whose Temporal ID is 1 (or M) (that is, the SOP contains two levels of pictures; equivalently, L, which indicates the maximum Temporal ID, is 1 (or M)).
・L = 2 structure: an SOP structure composed of pictures whose Temporal ID is 0, 1, and 2 (or M) (that is, the SOP contains three levels of pictures; equivalently, L, which indicates the maximum Temporal ID, is 2 (or M)).
・L = 3 structure: an SOP structure composed of pictures whose Temporal ID is 0, 1, 2, and 3 (or M) (that is, the SOP contains four levels of pictures; equivalently, L, which indicates the maximum Temporal ID, is 3 (or M)).
・L = 4 structure: an SOP structure composed of pictures whose Temporal ID is 0, 1, 2, 3, and 4 (or M) (that is, the SOP contains five levels of pictures; equivalently, L, which indicates the maximum Temporal ID, is 4 (or M)).

REFERENCE SIGNS LIST
10 Video encoding device
11 Analysis unit
12 Estimation unit
13 Encoding structure determination unit
14 Transmission unit
20 Video decoding device
21 Decoding unit
100 Video encoding device
101 Encoding unit
102 Screen divider
103 Frequency transformer/quantizer
104 Inverse quantizer/inverse frequency transformer
105 Buffer
106 Predictor
107 Entropy encoder
111 Analysis unit
112 Determination unit
113 M value determination unit
200 Video decoding device
202 Entropy decoder
203 Inverse quantizer/inverse frequency transformer
204 Predictor
205 Buffer
401 Audio encoding unit
402 Video encoding unit
403 Multiplexing unit
1001 Processor
1002 Program memory
1003, 1004 Storage medium

Claims

A video encoding method for generating a bit stream using a plurality of SOP structures, using information of a video signal relating to a dynamic range, wherein
when the video signal to be encoded switches, in the time direction, from SDR (Standard Dynamic Range) to HDR (High Dynamic Range) or from HDR to SDR, the SOP structure of the first SOP after the switch is set to an SOP structure composed only of pictures whose Temporal ID is 0, so that the information of the video signal relating to the dynamic range can be transmitted for the switching of the dynamic range, and
when switching from the SDR to the HDR or from the HDR to the SDR, the information of the video signal relating to the dynamic range is set in the transfer characteristics in the bit stream.

A video encoding device that generates a bit stream using a plurality of SOP structures, using information of a video signal relating to a dynamic range, the device comprising
encoding structure control means for setting, when the video signal to be encoded switches, in the time direction, from SDR (Standard Dynamic Range) to HDR (High Dynamic Range) or from HDR to SDR, the SOP structure of the first SOP after the switch to an SOP structure composed only of pictures whose Temporal ID is 0, so that the information of the video signal relating to the dynamic range can be transmitted for the switching of the dynamic range, wherein
the encoding structure control means sets the information of the video signal relating to the dynamic range in the transfer characteristics in the bit stream when switching from the SDR to the HDR or from the HDR to the SDR.

A video encoding program causing a computer in a video encoding device that generates a bit stream using a plurality of SOP structures, using information of a video signal relating to a dynamic range, to execute:
a process of setting, when the video signal to be encoded switches, in the time direction, from SDR (Standard Dynamic Range) to HDR (High Dynamic Range) or from HDR to SDR, the SOP structure of the first SOP after the switch to an SOP structure composed only of pictures whose Temporal ID is 0, so that the information of the video signal relating to the dynamic range can be transmitted for the switching of the dynamic range; and
a process of setting the information of the video signal relating to the dynamic range in the transfer characteristics in the bit stream when switching from the SDR to the HDR or from the HDR to the SDR.

A video system comprising:
the video encoding device according to claim 2;
an audio encoding unit that encodes an audio signal; and
a multiplexing unit that multiplexes and outputs a bit stream from the video encoding device and a bit stream from the audio encoding unit.