WO2024080216A1

WO2024080216A1 - Image decoding device and image encoding device

Info

Publication number: WO2024080216A1
Application number: PCT/JP2023/036356
Authority: WO
Inventors: 哲銘范; 知宏猪飼; 将伸八杉; 友子青野
Original assignee: シャープ株式会社
Priority date: 2022-10-11
Filing date: 2023-10-05
Publication date: 2024-04-18
Also published as: JP2024056375A

Abstract

The present invention addresses the problem that, when an intra-prediction mode is derived on the decoder side using the gradient of a pixel value of an image adjacent to a region of interest, the angular gradient of the adjacent image and the angular gradient of a block of interest do not necessarily match each other. This image decoding device comprises: a reference sample derivation unit that selects an image adjacent to a block of interest in accordance with a DIMD mode; a gradient derivation unit that derives the gradient of a pixel unit using the selected adjacent image; and an angle mode selection unit that derives an intra-prediction mode from the gradient.

Description

Image decoding device and image encoding device

Embodiments of the present invention relate to an image decoding device and an image encoding device.

In order to efficiently transmit or record video, a video encoding device is used that generates encoded data by encoding video, and a video decoding device is used that generates a decoded image by decoding the encoded data.

Specific examples of video coding methods include those proposed in H.264/AVC and HEVC (High-Efficiency Video Coding).

In such video coding methods, the images (pictures) that make up a video are managed in a hierarchical structure consisting of slices obtained by dividing the images, coding tree units (CTUs) obtained by dividing the slices, coding units (sometimes called coding units: CUs) obtained by dividing the coding tree units, and transform units (TUs) obtained by dividing the coding units, and are coded/decoded for each CU.

In addition, in such video coding methods, a predicted image is usually generated based on a locally decoded image obtained by encoding/decoding an input image, and the prediction error (sometimes called a "difference image" or "residual image") obtained by subtracting the predicted image from the input image (original image) is coded. Methods for generating predicted images include inter-frame prediction (inter prediction) and intra-frame prediction (intra prediction).

Another recent example of video encoding and decoding technology is Non-Patent Document 1. Non-Patent Document 1 discloses decoder-side intra mode derivation (DIMD) prediction, in which the decoder derives a predicted image by deriving an intra direction prediction mode number using pixels in adjacent regions.

In Non-Patent Document 1, the intra mode is derived on the decoder side using the gradient of pixel values of the image adjacent to the target area, but there is an issue that the angle gradient of the adjacent image and the angle gradient of the target block do not necessarily match.

The present invention aims to improve the accuracy of decoder-side intra mode derivation by switching intra prediction mode derivation depending on the properties of adjacent blocks and the current block.

It includes a reference sample derivation unit that selects adjacent images for the target block according to the DIMD mode, a gradient derivation unit that uses the selected adjacent images to derive pixel-level gradients, and an angle mode selection unit that derives the intra prediction mode from the gradient.

According to one aspect of the present invention, it is possible to perform suitable intra prediction without increasing the amount of calculation required for decoder-side intra mode derivation.

1 is a schematic diagram showing a configuration of an image transmission system according to an embodiment of the present invention. FIG. 2 is a diagram showing a hierarchical structure of data in an encoded stream. FIG. 13 is a schematic diagram showing types of intra-prediction modes (mode numbers). FIG. 1 is a schematic diagram showing a configuration of a video decoding device. This is an example of DIMD syntax. 13 is a diagram explaining the binarization of the syntax dimd_mode used in the DIMD prediction unit 31046. FIG. Here is another example of DIMD syntax. A diagram showing context settings in decoding syntax elements of dimd_mode. FIG. 2 is a diagram illustrating a configuration of a predicted image generating unit. FIG. 13 is a diagram showing details of a DIMD prediction unit. 13 is a diagram showing an example of a reference region referred to by a DIMD prediction unit 31046. FIG. 13 is a diagram showing a configuration for changing the number of lines in a reference region for DIMD prediction according to dimd_mode. FIG. This is an example of a spatial filter. FIG. 13 is a diagram illustrating an example of a pixel from which gradient is derived; FIG. 13 is a diagram illustrating the relationship between gradient and area. 4 is a block diagram showing a configuration of an angle mode derivation unit. FIG. 13 is a diagram showing an example of a reference range in gradient derivation by the DIMD prediction unit 31046. FIG. 4 is a functional block diagram showing an example of the configuration of an inverse quantization and inverse transform unit. FIG. FIG. 1 is a block diagram showing a configuration of a video encoding device.

First Embodiment
Hereinafter, an embodiment of the present invention will be described with reference to the drawings.

FIG. 1 is a schematic diagram showing the configuration of an image transmission system 1 according to this embodiment.

The image transmission system 1 is a system that transmits an encoded stream obtained by encoding an image to be encoded, and decodes the transmitted encoded stream to display an image. The image transmission system 1 is composed of a video encoding device (image encoding device) 11, a network 21, a video decoding device (image decoding device) 31, and a video display device (image display device) 41.

An image T is input to the video encoding device 11.

The network 21 transmits the encoded stream Te generated by the video encoding device 11 to the video decoding device 31. The network 21 is the Internet, a wide area network (WAN), a local area network (LAN), or a combination of these. The network 21 is not necessarily limited to a bidirectional communication network, and may be a unidirectional communication network that transmits broadcast waves such as terrestrial digital broadcasting and satellite broadcasting. The network 21 may also be replaced by a storage medium on which the encoded stream Te is recorded, such as a DVD (Digital Versatile Disc: registered trademark) or a BD (Blu-ray Disc: registered trademark).

The video decoding device 31 decodes each of the encoded streams Te transmitted by the network 21 and generates one or more decoded images Td.

The video display device 41 displays all or part of one or more decoded images Td generated by the video decoding device 31. The video display device 41 is equipped with a display device such as a liquid crystal display or an organic EL (Electro-luminescence) display. Display forms include stationary, mobile, HMD, etc. Furthermore, when the video decoding device 31 has high processing power, it displays high quality images, and when it has only lower processing power, it displays images that do not require high processing power or display power.

<Operator>
The operators used in this specification are listed below.

>> is a right bit shift, << is a left bit shift, & is a bitwise AND, | is a bitwise OR, ^ is a bitwise XOR, |= is the OR assignment operator, ! is logical negation (NOT), && is logical AND, and || is logical OR.

x?y:z is a ternary operator that takes y if x is true (non-zero) and z if x is false (0).

Clip3(a,b,c) is a function that clips c to a value between a and b, and returns a if c<a, returns b if c>b, and returns c in all other cases (where a<=b).

Clip1Y(c) is the operator for Clip3(a,b,c) with a=0 and b=(1<<BitDepthY)-1. BitDepthY is the luminance bit depth.

abs(a) is a function that returns the absolute value of a.

Int(a) is a function that returns the integer value of a.

Floor(a) is a function that returns the largest integer less than or equal to a.

Log2(a) is a function that returns the logarithm to the base 2.

Ceil(a) is a function that returns the smallest integer greater than or equal to a.

a/d represents the division of a by d (rounded down to nearest whole number).

Min(a,b) is a function that returns the smaller of a and b.

<Structure of the Encoded Stream Te>
Before describing in detail the video encoding device 11 and the video decoding device 31 according to this embodiment, the data structure of the encoded stream Te generated by the video encoding device 11 and decoded by the video decoding device 31 will be described.

FIG. 2 is a diagram showing the hierarchical structure of data in an encoded stream Te. The encoded stream Te illustratively includes a sequence and a number of pictures that make up the sequence. FIG. 2 shows a coded video sequence that defines a sequence SEQ, a coded picture that specifies a picture PICT, a coded slice that specifies a slice S, coded slice data that specifies slice data, a coding tree unit included in the coded slice data, and a coding unit included in the coding tree unit.

(Coded Video Sequence)
The coded video sequence defines a set of data to be referred to by the video decoding device 31 in order to decode the sequence SEQ to be processed. As shown in the coded video sequence of Fig. 2, the sequence SEQ includes a video parameter set VPS (Video Parameter Set), a sequence parameter set SPS (Sequence Parameter Set), a picture parameter set PPS (Picture Parameter Set), a picture PICT, and supplemental enhancement information SEI (Supplemental Enhancement Information).

The video parameter set VPS specifies a set of coding parameters common to multiple videos composed of multiple layers, as well as a set of coding parameters related to multiple layers and each individual layer included in the video.

The sequence parameter set SPS specifies a set of coding parameters that the video decoding device 31 references in order to decode the target sequence. For example, the width and height of a picture are specified. Note that there may be multiple SPSs. In that case, one of the multiple SPSs is selected from the PPS.

The picture parameter set PPS specifies a set of coding parameters that the video decoding device 31 references in order to decode each picture in the target sequence. For example, it includes the reference value of the quantization width used in decoding the picture (pic_init_qp_minus26) and a flag indicating the application of weighted prediction (weighted_pred_flag). Note that there may be multiple PPSs. In that case, one of the multiple PPSs is selected for each picture in the target sequence.

(Encoded Picture)
A coded picture defines a set of data to be referenced by the video decoding device 31 in order to decode a picture PICT to be processed. As shown in the coded picture of FIG. 2, the picture PICT includes slices 0 to NS-1 (NS is the total number of slices included in the picture PICT).

Note that in the following descriptions, when there is no need to distinguish between slices 0 to NS-1, the subscripts of the symbols may be omitted. The same applies to other data that are included in the encoded stream Te described below and have subscripts.

(Coding Slice)
An encoded slice defines a set of data to be referenced by the video decoding device 31 in order to decode a slice S to be processed. As shown in the encoded slice of Fig. 2, a slice includes a slice header and slice data.

The slice header includes a set of coding parameters that the video decoding device 31 refers to in order to determine the decoding method for the target slice. Slice type designation information (slice_type) that specifies the slice type is an example of a coding parameter included in the slice header.

Slice types that can be specified by the slice type specification information include (1) an I slice that uses only intra prediction when encoding, (2) a P slice that uses unidirectional prediction or intra prediction when encoding, and (3) a B slice that uses unidirectional prediction, bidirectional prediction, or intra prediction when encoding. Note that inter prediction is not limited to unidirectional or bidirectional prediction, and a predicted image may be generated using more reference pictures. Hereinafter, when referring to P or B slice, it refers to a slice that includes a block for which inter prediction can be used.

In addition, the slice header may include a reference to the picture parameter set PPS (pic_parameter_set_id).

(Encoded slice data)
The coded slice data specifies a set of data to be referenced by the video decoding device 31 in order to decode the slice data to be processed. The slice data includes a CTU, as shown in the coded slice header in Fig. 2. A CTU is a block of a fixed size (e.g., 64x64) that constitutes a slice, and is also called a Largest Coding Unit (LCU).

(coding tree unit)
In the coding tree unit in Fig. 2, a set of data that the video decoding device 31 refers to in order to decode the CTU to be processed is specified. The CTU is divided into coding units CU, which are basic units of the coding process, by recursive quad tree division (QT (Quad Tree) division), binary tree division (BT (Binary Tree) division), or ternary tree division (TT (Ternary Tree) division). BT division and TT division are collectively called multi tree division (MT (Multi Tree) division). A node of a tree structure obtained by recursive quad tree division is called a coding node. The intermediate nodes of the quad tree, binary tree, and ternary tree are coding nodes, and the CTU itself is specified as the top coding node.

(Encoding Unit)
As shown in the coding unit of Fig. 2, a set of data to be referenced by the video decoding device 31 in order to decode the coding unit to be processed is defined. Specifically, the CU is composed of a CU header CUH, prediction parameters, transformation parameters, quantization transformation coefficients, etc. The CU header defines a prediction mode, etc.

Prediction processing may be performed on a CU basis, or on a sub-CU basis, which is a further division of a CU. If the size of the CU and sub-CU are equal, there is one sub-CU in the CU. If the size of the CU is larger than the size of the sub-CU, the CU is divided into sub-CUs. For example, if the CU is 8x8 and the sub-CU is 4x4, the CU is divided into 2 parts horizontally and 2 parts vertically, into 4 sub-CUs.

There are two types of prediction (prediction modes): intra prediction and inter prediction. Intra prediction is a prediction within the same picture, while inter prediction refers to a prediction process performed between different pictures (for example, between display times or between layer images).

The transform and quantization process is performed on a CU basis, but the quantized transform coefficients may be entropy coded on a subblock basis, such as 4x4.

(Prediction parameters)
The predicted image is derived from prediction parameters associated with the block, which include intra-prediction and inter-prediction parameters.

Below, the prediction parameters for intra prediction are explained. The intra prediction parameters consist of a luminance prediction mode IntraPredModeY and a chrominance prediction mode IntraPredModeC. Figure 3 is a schematic diagram showing the types of intra prediction modes (mode numbers). As shown in the figure, there are, for example, 67 types of intra prediction modes (0 to 66). These include planar prediction (0), DC prediction (1), and angular prediction (2 to 66). In addition, linear model (LM: Linear Model) prediction such as cross component linear model (CCLM: Cross Component Linear Model) prediction and multi-mode linear model (MMLM: Multi Mode Linear Model) prediction may also be used. Furthermore, an LM mode may be added for chrominance.

(Configuration of video decoding device)
The configuration of a video decoding device 31 (FIG. 4) according to this embodiment will be described.

The video decoding device 31 includes an entropy decoding unit 301, a parameter decoding unit (prediction image decoding device) 302, a loop filter 305, a reference picture memory 306, a prediction parameter memory 307, a prediction image generating unit (prediction image generating device) 308, an inverse quantization and inverse transform unit 311, an addition unit 312, and a prediction parameter derivation unit 320. Note that, in accordance with the video encoding device 11 described below, the video decoding device 31 may also be configured not to include the loop filter 305.

In the following, an example will be described in which CTU and CU are used as processing units, but this is not limiting and processing may be performed in sub-CU units. Alternatively, CTU and CU may be read as blocks and sub-CU as sub-blocks, and processing may be performed in block or sub-block units.

The entropy decoding unit 301 performs entropy decoding on the externally input encoded stream Te and parses each code (syntax element). There are two types of entropy coding: one is to perform variable-length coding of syntax elements using a context (probability model) adaptively selected according to the type of syntax element and surrounding circumstances, and the other is to perform variable-length coding of syntax elements using a predefined table or formula. The former CABAC (Context Adaptive Binary Arithmetic Coding) stores in memory an updated probability model for each encoded or decoded picture (slice). Then, from among the probability models stored in memory, the probability model of a picture using the same slice type and quantization parameters of the same slice level is set as the initial state of the context of a P picture or B picture. This initial state is used for the encoding and decoding processes. The parsed code includes prediction information for generating a predicted image and prediction errors for generating a difference image.

The entropy decoding unit 301 may decode each bin of the syntax element using the variables ivlCurrRange, ivlOffset, valIdx, pStateIdx0, and pStateIdx1. ivlCurrRange and ivlOffset are context-independent variables. valIdx, pStateIdx0, and pStateIdx1 are context-specific variables.

(Bin Decryption with Context)
When the entropy decoding unit 301 uses a context, it obtains ivlCurrRange and ivlOffset by the following calculation.

qRangeIdx = ivlCurrRange >> 5
pState = pStateIdx1 + 16 * pStateIdx0
valMps = pState >> 14
ivlLpsRange = (qRangeIdx * ((valMps ? 32767 - pState : pState) >> 9) >> 1) + 4
ivlCurrRange = ivlCurrRange - ivlLpsRange
Next, if ivlOffset>=ivlCurrRange, the entropy decoding unit 301 derives the Bin value binVal, the variables ivlOffset and ivlCurrRange as follows.

binVal = !valMps
ivlOffset = ivlOffset - ivlCurrRange
ivlCurrRange = ivlLpsRange
Otherwise, get binVal as follows:

binVal = valMps
Furthermore, the entropy decoding unit 301 updates the state of the context by the following calculation.

shift0 = (shiftIdx >> 2) + 2
shift1 = (shiftIdx & 3) + 3 + shift0
pStateIdx0 = pStateIdx0 - (pStateIdx0 >> shift0) + (1023 * binVal >> shift0)
pStateIdx1 = pStateIdx1 - (pStateIdx1 >> shift1) + (16383 * binVal >> shift1)
(Bin decryption in case of bypass)
In the case of bypass, the entropy decoding unit 301 obtains ivlCurrRange and ivlOffset by the following calculation.

ivlCurrRange = ivlCurrRange << 1
ivlOffset = ivlOffset | read_bits(1)
Here, read_bits(1) reads one bit from the bitstream and returns that value.
Next, if ivlOffset>=ivlCurrRange, the entropy decoding unit 301 sets binVal and ivlOffset as follows.

binVal = 1
ivlOffset = ivlOffset - ivlCurrRange
Otherwise, set binVal as follows:

binVal = 0
In the case of bypass, the entropy decoding unit 301 does not update the state of the context.

The entropy decoding unit 301 outputs the parsed syntax elements to the parameter decoding unit 302. Control of which syntax elements to parse is performed based on instructions from the parameter decoding unit 302.

The entropy decoding unit 301 may parse, for example, the syntax element dimd_mode shown in the syntax table of FIG. 5 as follows. dimd_mode is a syntax element that selects the reference region of the DIMD from the encoded data.

The entropy decoding unit 301 parses dimd_mode from the encoded data. In a configuration in which the position of the DIMD reference image is changed, dimd_mode may be DIMD_MODE_TOP_LEFT mode, DIMD_MODE_TOP mode, or DIMD_MODE_LEFT mode, which may be 0, 1, or 2, respectively.

Figure 6(a) shows an example of binarization of dimd_mode. binIdx is a variable indicating the bit position, Bin0 (binIdx==0), and the syntax element Bin1 (binIdx==1) indicate the first bit and the next bit.

Bin0: Flag to select DIMD_MODE_TOP_LEFT or other. 0 indicates DIMD_MODE_TOP_LEFT, 1 indicates not DIMD_MODE_TOP_LEFT mode.

Bin1 Flag to select DIMD_MODE_TOP or DIMD_MODE_LEFT. 0 indicates DIMD_MODE_TOP, 1 indicates DIMD_MODE_LEFT.
Note that instead of constituting one syntax element with Bin0 and Bin1, a syntax element may be assigned to each of Bin0 and Bin1, and the two syntax elements may be parsed instead of dimd_mode. Here, the syntax element assigned to Bin0 is called dimd_mode_flag, and the syntax element assigned to Bin1 is called dimd_mode_dir (see, for example, FIG. 7). In this case, the entropy decoding unit 301 may derive dimd_mode from dimd_mode_flag and dimd_mode_dir using the following formula. If dimd_mode_flag==0, dimd_mode_dir is not decoded and is set to 0.

dimd_mode = ((dimd_mode_flag == 0) ? 0 : 1) + dimd_mode_dir
In this example, 1 bit (for example, "0") is assigned to DIMD_MODE_TOP_LEFT, and 1 more bit is assigned after "1" to DIMD_MODE_TOP and DIMD_MODE_LEFT. In the binarization of dimd_mode, when using left and top, which have a high selection rate, shorter bits are assigned than when using left or top, which has the effect of shortening the average code amount and improving the coding efficiency.

The entropy decoding unit 301 parses dimd_mode from the encoded data. In a configuration in which the number of lines of the DIMD reference image is changed as shown in FIG. 6(b), dimd_mode may be DIMD_LINES1 mode or DIMD_LINED2, which may be 0 or 1, respectively.

Although not shown, the binarization of dimd_mode may be Bin0 (binIdx==0).

Figure 8 shows the setting of the context (ctxInc) when parsing the syntax element of dimd_mode. A context is a variable area for holding the probability (state) of CABAC, and is identified by the value of the context index ctxIdx (0, 1, 2, ...). The case where 0 and 1 are always equally probable, in other words 0.5, 0.5, is called EP (Equal Probability) or bypass. In this case, no context is used because there is no need to hold a state for a specific syntax element. ctxIdx is derived by referencing ctxInc.

As shown in FIG. 8(a), the entropy decoding unit 301 may parse the syntax element dimd_mode using a context (ctxInc=0) for decoding the first Bin0, and bypass for Bin1. Bin0 is a syntax element indicating whether DIMD_MODE_TOP_LEFT, and Bin1 is a syntax element indicating whether DIMD_MODE_LEFT or DIMD_MODE_TOP. Bypass is a parsing method that does not use a context. dimd_mode is a syntax element that selects the DIMD reference area from the encoded data. With the above configuration, no context is used to select between DIMD_MODE_LEFT and DIMD_MODE_TOP, which has the effect of reducing memory.

As shown in FIG. 8(b), the entropy decoding unit 301 may parse dimd_mode from the encoded data by using a context (ctxInc=0) for decoding the first Bin0 and another context (ctxInc=1) for Bin1. With the above configuration, adaptive encoding is possible to use contexts for all directions, which has the effect of improving performance.

As shown in FIG. 8(c), the entropy decoding unit 301 may parse dimd_mode from the encoded data using a different context (ctxInc=1,2,3) for Bin1 depending on the shape of the target block. For example, if the width bW and height bH of the target block are equal, different context values may be assigned when the block is horizontally long and when it is vertically long, as shown below.

ctxInc = ( bW == bH ) ? 1 : ( bW < bH ) ? 2 : 3
The formulas and values are not limited to those described above, and the order of determination and values may be changed. For example, the following may be used.

ctxIdx = ( bW > bH ) ? 1 : ( bW < bH ) ? 2 : 3
According to the above configuration, it is possible to adaptively encode the block using different contexts depending on the shape of the block, for example, between horizontal and vertical, and this provides the effect of improving performance.

As shown in FIG. 8(d), the entropy decoding unit 301 may parse dimd_mode from the encoded data using a different context (ctxInc=1,2) for Bin1 depending on the shape of the target block. If the shape of the block is square, dimd_mode may be parsed using bypass.

ctxIdx = ( bW == bH ) ? bypass : ( bW < bH ) ? 1 : 2
The formula and values are not limited to the above, and the order of judgment and values may be changed. For example, ctxIdx = ( bW > bH ) ? 1 : ( bW < bH ) ? 2 : bypass
The DIMD mode (dimd_mode) is composed of a first bit and a second bit, and the first bit selects whether the reference area of the DIMD is both above and to the left of the target block, and the second bit selects whether the reference area of the DIMD is to the left or above the target block, and the above adjacent area may be selected.

As shown in Fig. 8(e), the entropy decoding unit 301 may parse dimd_mode from the encoded data using a different context (ctxInc = 1, 2) for Bin1 depending on the shape of the target block. If the shape of the block is a square, a predetermined context (e.g., 1) may be used to parse dimd_mode, and if not, a different context (e.g., 2) may be used to parse dimd_mode.

ctxIdx = ( bW == bH ) ? 1 : 2
In this case, if the dimd_mode is not a square, dimd_mode is decoded using the value obtained by swapping the binary value of Bin1 (1 to 0, 0 to 1, for example, 1 - Bin1) depending on whether bW > bH or bH < bW. In other words, dimd_mode is derived as follows.

dimd_mode = ((Bin0 == 0) ? 0 : 1) + ((bW >= bH) ? Bin1 : 1-Bin1)
When using the above two syntaxes, dimd_mode is derived as follows:

dimd_mode = ((dimd_mode_flag == 0) ? 0 : 1) + ((bW >= bH) ? dimd_mode_dir : 1-dimd_mode_dir)
Note that the above bW >= bH may be bW > bH, bW <= bH, or bW < bH.

The above configuration uses different contexts depending on the shape of the target block, for example, whether the target block is square or not (and/or whether it is horizontal or vertical), so it is possible to adaptively encode the block with a short code according to its characteristics, improving performance. Also, if no context is used when the block is square, for example, this has the effect of reducing memory usage.

Note that if the left reference position (xC - refIdxW - 1, yC) and top reference position (xC, yC - refIdxH - 1) of the block cannot reference adjacent areas such as the edge of the screen, tile, or slice, the entropy decoding unit 301 may omit decoding of dimd_mode and set dimd_mode = DIMD_MODE_TOP_LEFT. Note that xC and yC are variables related to the top left position of the block, and refIdxW and refIdxH are variables related to the number of lines in the reference area for DIMD prediction.

As an alternative configuration, if only the upper adjacent region of the block is available, dimd_mode=DIMD_MODE_TOP, and if only the left adjacent region is available, dimd_mode=DIMD_MODE_LEFT.

The parameter decoding unit 302 informs the entropy decoding unit 301 which syntax elements to parse. In addition, the entropy decoding unit 301 outputs the syntax elements parsed by the entropy decoding unit 301 to the prediction parameter derivation unit 320.

(Configuration of prediction parameter derivation unit 320)
The prediction parameter derivation unit 320 derives prediction parameters, for example, an intra-prediction mode IntraPredMode, by referring to the prediction parameters stored in the prediction parameter memory 307 based on the syntax elements input from the parameter decoding unit 302. The prediction parameter derivation unit 320 outputs the derived prediction parameters to the predicted image generation unit 308, and also stores them in the prediction parameter memory 307. The prediction parameter derivation unit 320 may derive different prediction modes for luminance and chrominance.

The prediction parameter derivation unit 320 may derive prediction parameters from syntax elements related to intra prediction such as those shown in FIG. 5.

The loop filter 305 is a filter provided in the encoding loop that removes block distortion and ringing distortion and improves image quality. The loop filter 305 applies filters such as a deblocking filter, sample adaptive offset (SAO), and adaptive loop filter (ALF) to the decoded image of the CU generated by the adder 312.

The reference picture memory 306 stores the decoded image of the CU generated by the adder 312 in a predetermined location for each target picture and target CU.

The prediction parameter memory 307 stores prediction parameters at a predetermined location for each CTU or CU to be decoded. Specifically, the prediction parameter memory 307 stores the parameters decoded by the parameter decoding unit 302 and the prediction mode predMode derived by the prediction parameter derivation unit 320.

The prediction mode predMode, prediction parameters, etc. are input to the prediction image generation unit 308. The prediction image generation unit 308 also reads a reference picture from the reference picture memory 306. The prediction image generation unit 308 generates a prediction image of a block or sub-block using the prediction parameters and the read reference picture (reference picture block). Here, a reference picture block is a set of pixels on the reference picture (usually rectangular, so called a block), and is the area referenced to generate a prediction image.

When the prediction mode predMode indicates an intra prediction mode (IntraPredMode), the predicted image generation unit 310 performs intra prediction using the intra prediction parameters input from the prediction parameter derivation unit 320 and the reference pixels read from the reference picture memory 306.

Specifically, the predicted image generation unit 308 reads adjacent blocks in a predetermined range from the target block on the target picture from the reference picture memory 306. The predetermined range refers to the adjacent blocks to the left, upper left, upper, and upper right of the target block, and the area to be referenced differs depending on the intra prediction mode.

The predicted image generation unit 308 generates a predicted image of the current block by referring to the decoded pixel values that have been read and the prediction mode indicated by IntraPredMode. The predicted image generation unit 308 outputs the generated predicted image of the block to the addition unit 312.

The generation of predicted images based on intra prediction modes is explained below. In planar prediction, DC prediction, and angular prediction, a decoded surrounding area adjacent (close) to the block to be predicted is set as reference region R. Then, a predicted image is generated by extrapolating pixels in reference region R in a specific direction. For example, reference region R may be set as an L-shaped region that includes the left and top of the block to be predicted (or further, the top left, top right, and bottom left).

(Details of predicted image generation unit)
Next, the configuration of the predicted image generation unit 308 will be described in detail with reference to Fig. 9. The predicted image generation unit 308 includes a reference sample filter unit 3103 (second reference image setting unit), a prediction unit 3104, and a predicted image correction unit 3105 (predicted image correction unit, filter switching unit, and weighting coefficient changing unit).

Based on each reference pixel (reference image) in the reference region R, a filtered reference image generated by applying a reference pixel filter (first filter), and the intra prediction mode, the prediction unit 3104 generates a prediction image (provisional predicted image, uncorrected predicted image) of the block to be predicted, and outputs it to the prediction image correction unit 3105. The prediction image correction unit 3105 corrects the provisional predicted image according to the intra prediction mode, and generates and outputs a prediction image (corrected predicted image).

The following describes each component of the predicted image generation unit 308.

(Reference sample filter unit 3103)
The reference sample filter unit 3103 derives a reference sample s[x][y] at each position (x, y) on the reference region R by referring to the reference image. In addition, the reference sample filter unit 3103 applies a reference pixel filter (first filter) to the reference sample s[x][y] according to the intra prediction mode to update the reference sample s[x][y] at each position (x, y) on the reference region R (derives a filtered reference image s[x][y]). Specifically, a low-pass filter is applied to the position (x, y) and the reference image therearound to derive a filtered reference image. Note that it is not necessary to apply a low-pass filter to all intra prediction modes, and a low-pass filter may be applied to some intra prediction modes. Note that the filter applied to the reference image on the reference region R in the reference sample filter unit 3103 is referred to as a "reference pixel filter (first filter)", whereas the filter that corrects the tentative predicted image in the prediction image correction unit 3105 described later is referred to as a "position-dependent filter (second filter)".

(Configuration of intra prediction unit)
The intra prediction unit generates a tentative predicted image (tentative predicted pixel value, pre-corrected predicted image) of a prediction target block based on an intra prediction mode, a reference image, and a filtered reference pixel value, and outputs the generated image to a prediction image correction unit 3105. The prediction unit 3104 includes a planar prediction unit 31041, a DC prediction unit 31042, an angular prediction unit 31043, an LM prediction unit 31044, a matrix-based intra prediction unit 31045, and a DIMD prediction unit 31046 (Decoder-side Intra Mode Derivation, DIMD). The prediction unit 3104 selects a specific prediction unit according to the intra prediction mode, and inputs a reference image and a filtered reference image. The relationship between the intra prediction mode and the corresponding prediction unit is as follows.
・Planar prediction ・・・Planar prediction section 31041
・DC prediction...DC prediction unit 31042
・Angular prediction ・・・・Angular prediction section 31043
LM prediction...LM prediction unit 31044
・Matrix intra prediction・・MIP part 31045
DIMD prediction: DIMD prediction unit 31046
(Planar forecast)
The planar prediction unit 31041 generates a provisional predicted image by linearly adding the reference sample s[x][y] according to the distance between the prediction target pixel position and the reference pixel position, and outputs the provisional predicted image to the predicted image correction unit 3105.

(DC forecast)
The DC prediction unit 31042 derives a DC predicted value equivalent to the average value of the reference samples s[x][y], and outputs a temporary predicted image q[x][y] whose pixel values are the DC predicted values.

(Angular predictions)
The angular prediction unit 31043 generates a temporary predicted image q[x][y] using a reference sample s[x][y] in the prediction direction (reference direction) indicated by the intra prediction mode, and outputs the temporary predicted image q[x][y] to the predicted image correction unit 3105.

(LM forecast)
The LM prediction unit 31044 predicts pixel values of chrominance based on pixel values of luminance. Specifically, this is a method of generating a predicted image of a chrominance image (Cb, Cr) using a linear model based on a decoded luminance image. One type of LM prediction is CCLM (Cross-Component Linear Model prediction). CCLM prediction is a prediction method that uses a linear model to predict chrominance from luminance for one block.

(Matrix Intra Prediction)
The MIP unit 31045 generates a temporary predicted image q[x][y] by performing a product-sum operation on the reference sample s[x][y] derived from the adjacent region and a weighting matrix, and outputs the generated image to the predicted image correction unit 3105.

(DIMD forecast)
The DIMD prediction unit 31046 is a prediction method that generates a predicted image using an intra prediction mode that is not explicitly signaled. The angle mode derivation device 310465 derives an intra prediction mode suitable for the current block using information on the neighboring region, and the DIMD prediction unit 31046 generates a temporary predicted image using this intra prediction mode. Details will be described later.

(Configuration of the predicted image correction unit 3105)
The predicted image correction unit 3105 corrects the provisional predicted image output from the prediction unit 3104 according to the intra prediction mode. Specifically, the predicted image correction unit 3105 derives a position-dependent weighting coefficient for each pixel of the provisional predicted image according to the reference region R and the position of the target predicted pixel. Then, the predicted image correction unit 3105 performs weighted addition (weighted averaging) of the reference sample s[][] and the provisional predicted image q[x][y] to derive a predicted image (corrected predicted image) Pred[][] obtained by correcting the provisional predicted image. Note that, in some intra prediction modes, the predicted image correction unit 3105 may set the provisional predicted image q[x][y] as a predicted image without correcting it.

Example 1
10 shows the configuration of the DIMD prediction unit 31046 in this embodiment. The DIMD prediction unit 31046 is composed of a reference sample derivation unit 310460, an angle mode derivation device 310465 (gradient derivation unit 310461, angle mode derivation unit 310462), an angle mode selection unit 310463, and a temporary predicted image generation unit 310464. The angle mode derivation device 310465 may include the angle mode selection unit 310463.

Figure 5 shows an example of the syntax of encoded data related to DIMD. The prediction parameter derivation unit 320 decodes a flag dimd_flag indicating whether or not to use DIMD for each block from the encoded data. If dimd_flag for the target block is 1, the parameter decoding unit 302 does not need to decode syntax elements related to intra prediction mode (intra_mip_flag, intra_luma_mpm_flag, intra_luma_mpm_idx, intra_luma_mpm_reminder) from the encoded data. intra_mip_flag is a flag indicating whether or not to perform MIP prediction. intra_luma_mpm_flag is a flag indicating whether or not to use the prediction candidate Most Probable Mode (MPM). intra_luma_mpm_idx is an index that specifies MPM when MPM is used. intra_luma_mpm_reminder is an index that selects the remaining candidate when MPM is not used. If dimd_flag is 0, intra_luma_mpm_flag is decoded, and if intra_luma_mpm_flag is 0, intra_luma_mpm_reminder is also decoded. If dimd_flag of the current block is 1, dimd_mode of the current block is also decoded. dimd_mode indicates a reference region used to derive an intra prediction mode in DIMD prediction. The meaning of dimd_mode may be as follows:

dimd_mode = 0 DIMD_MODE_TOP_LEFT (use top and left)
dimd_mode = 2 DIMD_MODE_LEFT (use left)
dimd_mode = 3 DIMD_MODE_TOP (use top)
When dimd_flag is 1, the DIMD prediction unit 31046 derives an angle indicating the texture direction in the adjacent region using pixel values. Then, a provisional predicted image is generated using an intra prediction mode corresponding to the angle. For example, (1) a gradient direction of pixel values is derived for a pixel at a predetermined position in the adjacent region. (2) The derived gradient direction is converted to a corresponding directional prediction mode (angular prediction mode). (3) A histogram of the obtained prediction direction is created for each predetermined pixel in the adjacent region. (4) A prediction mode of the most frequent value or a plurality of prediction modes including the most frequent value is selected from the histogram, and a provisional predicted image is generated using the prediction mode. The processing in each part of the DIMD prediction unit 31046 shown in FIG. 10 will be described in more detail below.

(1) Reference Sample Derivation Unit The reference sample derivation unit 310460 derives a reference sample refUnit from decoded pixels recSamples adjacent to the current block. Note that the operation of the reference sample derivation unit 310460 may be performed by the reference sample filter unit 3103.

11 is a diagram showing an example of a reference region referred to by the DIMD prediction unit 31046. The reference sample derivation unit 310460 stores adjacent images (images in the DIMD reference region) recSamples of the current block to be used by a gradient derivation unit 310461 and the predicted image generation unit 308, which will be described later, in a sample array refUnit.

(Configuration example 1 of reference area according to mode)
When dimd_mode == DIMD_MODE_TOP_LEFT, the reference sample derivation unit 310460 derives a sample array refUnit from the left and top areas of the target block as follows.

First, the following process is performed at position (x, y) in the left region RL of the target block (hereafter simply referred to as RL).

refUnit[x][y] = recSamples[xC+x][yC+y]
Here, RL is x=-1-refIdxW..-1, y=-1-refIdxH..refH-1.
(xC, yC) are the top left coordinates of the target block, and refIdxW and refIdxH are constants indicating the width of the adjacent reference area to the left and the height of the adjacent reference area above. refIdxW = 2 or 3, and refIdxH = 2 or 3 may be used. Furthermore, each may be changed according to the block size (similar below). If the pixels in the area above the target block are not available, the y coordinate range of RL is y=0..refH-1.

refW, refH indicate the width and height of the DIMD reference area, and if they are the same as the size of the target block, refW = bW, refH = bH, and if they are extended, refW = bW*2, refH = bH*2 (same below). In this case, extending refers to using adjacent images including the lower left adjacent area in addition to the left, and using adjacent images including the upper right adjacent area in addition to the top.
Next, the following process is performed on the RT in the range of pixels in the area above the target block.

refUnit[x][y] = recSamples[xC+x][yC+y]
Here, RT is x=-1-refIdxW..refW-1, y=-1-refIdxH..-1. If the pixels in the area to the left of the target block are not available, the x-coordinate range of RT is x=0..refW-1. RTL is the area that combines RL and RT.

When dimd_mode == DIMD_MODE_LEFT, the reference sample derivation unit 310460 derives refUnit from the left area of the target block, for example, the above RL.

When dimd_mode == DIMD_MODE_TOP, the reference sample derivation unit 310460 derives refUnit from the area above the target block, for example, the above RT.

(Configuration example 2 of reference area according to mode)
The reference sample derivation unit 310460 may perform the following process.

When dimd_mode == DIMD_MODE_TOP_LEFT or dimd_mode == DIMD_MODE_LEFT, the reference sample derivation unit 310460 derives refUnit from the left area of the target block, for example, the above RL.

Furthermore, when dimd_mode == DIMD_MODE_TOP_LEFT or dimd_mode == DIMD_MODE_TOP, the reference sample derivation unit 310460 derives refUnit from the area above the target block, for example, the above RT.

(Example 3 of configuration of reference area according to mode: Example of using extension area according to dimd_mode)
Figure 11(b) shows another example of the reference range in the gradient derivation of DIMD prediction. In this example, when dimd_mode == DIMD_MODE_LEFT, an extended region including the adjacent region to the left and the lower left is used, when dimd_mode == DIMD_MODE_TOP, an extended region including the adjacent region to the top and the upper right is used, and when dimd_mode == DIMD_MODE_TOP_LEFT, the left and upper regions are used without being extended.

For example, the reference sample derivation unit 310460 may perform the following processing:

When dimd_mode == DIMD_MODE_TOP_LEFT, the reference sample derivation unit 310460 derives a sample array refUnit from the left and top areas of the target block as follows.
First, the following process is carried out in the pixel range RL in the left region of the target block.

refUnit[x][y] = recSamples[xC+x][yC+y]
Here RL is x=-1-refIdxW..-1, y=-1-refIdxH..refH-1, refW = bW, refH = bH.
Next, the following process is performed on the RT in the range of pixels in the area above the target block.

refUnit[x][y] = recSamples[xC+x][yC+y]
Here, RT is x=-1-refIdxW..refW-1, y=-1-refIdxH..-1, refW = bW, refH = bH. RTL is the area (range of positions) that combines RL and RT.

When dimd_mode == DIMD_MODE_LEFT, the reference sample derivation unit 310460 derives refUnit from the left and bottom left areas of the target block, for example, RL_EXT.
RL_EXT is x=-1-refIdxW..-1, y=-1-refIdxH..refH2-1, refH2 = bH*2. If the pixels in the area above the target block are not available, set y=0..refH2-1.

When dimd_mode == DIMD_MODE_TOP, the reference sample derivation unit 310460 derives refUnit from the area above the target block, for example, RT_EXT.
RT_EXT is x=-1-refIdxW..refW2-1, y=-1-refIdxH..-1, refW2 = bW*2. If the pixels in the left area of the target block are not available, set x=0..refW2-1.

(Example 4 of configuration of reference area according to mode: Example 2 of using extension area according to dimd_mode)
The reference sample derivation unit 310460 may perform the following process.

When dimd_mode == DIMD_MODE_TOP_LEFT or dimd_mode == DIMD_MODE_LEFT, the reference sample derivation unit 310460 derives refUnit from the left and bottom left areas of the target block, for example, RL_ADAP.
RL_ADAP: x=-1-refIdxW..-1, y=-1-refIdxH..refH-1, refH = (dimd_mode == DIMD_MODE_TOP_LEFT) ? bH : bH*2. If the pixels above the target block are not available in DIMD_MODE_LEFT, y=0..refH-1.

Furthermore, when dimd_mode == DIMD_MODE_TOP_LEFT or dimd_mode == DIMD_MODE_TOP, the reference sample derivation unit 310460 derives refUnit from the upper and upper right areas of the target block, for example, RT_ADAP.
RT_ADAP: x=-1-refIdxW..refW-1, y=-1-refIdxH..-1, refW = (dimd_mode == DIMD_MODE_TOP_LEFT) ? bW : bW*2. If the pixels to the left of the target block cannot be used in DIMD_MODE_TOP, set x=0..refW-1.

Then, the reference sample derivation unit 310460 may replace the value of the area that cannot be referenced because it is outside the target picture, outside the target subpicture, or outside the target slice boundary according to refUnit[x][y] with the pixel value derived above or a predetermined fixed value, for example, 1<<(bitDepth-1).

(Configuration Example 5 According to Mode)
FIG. 12(a) shows another example of the reference range in gradient derivation for DIMD prediction. In this example, the number of lines in the reference region is changed according to dimd_lines. For example, when dimd_mode == DIMD_LINES1, M lines are referenced, and when dimd_mode == DIMD_LINES2, N lines greater than M are referenced. For example, M=3, N=4.

The reference sample derivation unit 310460 sets the reference line numbers refIdxW and refIdxH in accordance with dimd_mode.
refIdxW = (dimd_mode == DIMD_LINES1) ? M-1 : N-1
refIdxH = (dimd_mode == DIMD_LINES1) ? M-1 : N-1
The reference sample derivation unit 310460 derives a sample array refUnit from the left and upper areas of the target block as follows.

First, the following process is performed in the pixel range RL of the area to the left of the target block.

refUnit[x][y] = recSamples[xC+x][yC+y]
Here, RL is x=-1-refIdxW..-1, y=-1-refIdxH..refH-1, refW = bW, refH = bH. If the pixels in the area above the target block are not available, set y=0..refH-1.

Next, the following process is performed on the RT in the range of pixels in the area above the target block.

refUnit[x][y] = recSamples[xC+x][yC+y]
Here, RT is x=-1-refIdxW..refW-1, y=-1-refIdxH..-1, refW = bW, refH = bH. If the pixels in the left area of the target block are not available, then x=0..refW-1.

You can also select based on block size.

(Configuration example 1 according to block size)
The reference sample derivation unit 310460 sets the numbers of reference lines refIdxW and refIdxH in accordance with the block size.
refIdxW = (bW >= 8 || bH >= 8) ? N-1 : M-1
refIdxH = (bW >= 8 || bH >= 8) ? N-1 : M-1
Furthermore, the reference sample derivation unit 310460 derives refUnit[x][y] from recSamples[xC+x][yC+y] of RL in the left region of the target block and RT in the upper region.

(Configuration Example 6 According to Mode)
Fig. 12(b) shows another example of the reference range in gradient derivation for DIMD prediction. In this example, the direction of the reference region is selected according to the dimd mode, and at the same time, the number of lines of the reference region is also selected.

The reference sample derivation unit 310460 derives the number of reference lines to be M when dimd_mode refers to the left and top, and otherwise derives the number of reference lines to be N (M<N). For example, M=3, N=4.
refIdxW = (dimd_mode == DIMD_MODE_TOP_LEFT) ? M-1 : N-1
refIdxH = (dimd_mode == DIMD_MODE_TOP_LEFT) ? M-1 : N-1
Values other than M=3, N=4, such as M=3, N=5, may also be used.

When dimd_mode == DIMD_MODE_TOP_LEFT, the reference sample derivation unit 310460 derives refUnit[x][y] from recSamples[xC+x][yC+y] of the left region RL and the top region RT.

When dimd_mode == DIMD_MODE_LEFT, the reference sample derivation unit 310460 derives refUnit[x][y] from the left area of the target block, for example, recSamples[xC+x][yC+y] of RL.

When dimd_mode == DIMD_MODE_TOP, the reference sample derivation unit 310460 derives refUnit[x][y] from the area above the target block, for example, recSamples[xC+x][yC+y] of RT.
(1) Gradient Derivation Unit The gradient derivation unit 310461 derives an angle (angle information) indicating a texture direction based on pixel values of a gradient derivation target image. The angle information may be a value representing an angle with 1/36 precision, or may be another value. The gradient derivation unit 310461 derives gradients in two or more specific directions (e.g., Dx, Dy), and derives the gradient direction (angle information) from the relationship between the gradients Dx and Dy.

A spatial filter may be used to derive the gradient. For example, a 3x3 pixel Sobel filter corresponding to the horizontal and vertical directions as shown in Figures 13(a) and (b) may be used as the spatial filter. The gradient derivation unit 310461 derives the gradient for point P[x][y] (hereinafter simply P) within the sample array refUnit[x][y] referenced and derived by the reference sample derivation unit 310460 in the gradient derivation target image. Note that it is also possible to configure the system to refer to recSamples[xC+x][yC+y] as point P instead of refUnit[x][y] without copying from recSamples to the sample array refUnit[x][y].

FIG. 14 shows an example of the positions of pixels to be subjected to gradient derivation in a target block of 8x8 pixels. When the angle mode derivation device 310465 is used for intra prediction, a shaded image in an adjacent region of the target block may be the image to be subjected to gradient derivation. The image to be subjected to gradient derivation may also be a luminance image corresponding to the chrominance image of the target block. In this way, the number of pixels to be subjected to gradient derivation, the position pattern, and the reference range of the spatial filter may be changed depending on information such as the size of the target block and the intra prediction mode of the blocks included in the adjacent region.

Specifically, the gradient deriving unit 310461 derives the horizontal and vertical gradients Dx, Dy for each point P as follows:
Dx = P[x-1][y-1] + 2*P[x-1][y] + P[x-1][y+1] - P[x+1][y-1] - 2*P[x+1][y] - P[x+1][y+1]
Dy = - P[x-1][y-1] - 2*P[x][y-1] - P[x+1][y-1] + P[x-1][y+1] + 2*P[x][y+1] + P[x+1][y+1]
13(c) and (d) may be used, which are obtained by inverting the filters in Fig. 13(a) and (b) horizontally or vertically. In that case, Dx and Dy are derived using the following equations.
Dx = - P[x-1][y-1] - 2*P[x-1][y] - P[x-1][y+1] + P[x+1][y-1] + 2*P[x+1][y] + P[x+1][y+1]
Dy = P[x-1][y-1] + 2*P[x][y-1] + P[x+1][y-1] - P[x-1][y+1] - 2*P[x][y+1] - P[x+1][y+1]
The method of deriving the gradient is not limited to this, and other methods (filters, formulas, tables, etc.) may be used. For example, a Prewitt filter or a Scharr filter may be used instead of the Sobel filter, and the filter size may be 2x2 or 5x5. The gradient derivation unit 310461 derives Dx and Dy using a Prewitt filter as follows.
Dx = P[x-1][y-1] + P[x-1][y] + P[x-1][y+1] - P[x+1][y-1] - P[x+1][y] - P[x+1][y1]
Dy = - P[x-1][y-1] - P[x][y-1] - P[x+1][y-1] + P[x-1][y+1] + P[x][y+1] + P[x+1][y+1]
The following equation is an example of deriving Dx and Dy using a Scharr filter.
Dx = 3*P[x-1][y-1]+10*P[x-1][y]+3*P[x-1][1] -3*P[x+1][y-1]-10*P[x+1][0]-3*P[x+1][y+1]
Dy = -3*P[x-1][y-1]-10*P[x][y-1]-3*P[x+1][-1] +3*P[x-1][y+1]+10*P[x][1]+3*P[x+1][y+1]
The gradient derivation method may be changed for each block. For example, a Sobel filter is used for a target block of 4x4 pixels, and a Scharr filter is used for blocks larger than 4x4. In this way, by using a filter with simpler calculations for small blocks, the increase in the amount of calculations for small blocks can be suppressed.

The gradient derivation method may be changed for each position of the pixel for which the gradient is to be derived. For example, a Sobel filter is used for the pixel for which the gradient is to be derived that is in the upper or left adjacent region, and a Scharr filter is used for the pixel for which the gradient is to be derived that is in the upper left adjacent region.

The gradient derivation unit 310461 derives angle information consisting of the quadrant (hereinafter referred to as region) of the texture angle of the target block and the angle within the quadrant based on the signs and magnitude relationship of Dx and Dy. Being able to express it by region makes it possible to standardize the processing of directions that are rotationally symmetric or line symmetric. However, the angle information is not limited to the region and the angle within the quadrant. For example, the angle information may be information only about the angle, and the region may be derived as necessary. Also, in this embodiment, the intra direction prediction modes derived below are limited to directions from the bottom left to the top right (2 to 66 in Figure 3), and intra direction prediction modes for directions that are rotationally symmetric by 180 degrees are treated the same.

Fig. 15(a) is a table showing the relationship between the signs (signx, signy) of Dx and Dy, the magnitude relationship (xgty), and the region (each of Ra to Rd is a constant representing the region). Fig. 15(b) shows the quadrants indicated by the regions Ra to Rd. The gradient derivation unit 310461 derives signx, signy, and xgty as follows.
absx = abs(Dx)
absy = abs(Dy)
signx = Dx < 0 ? 1 : 0
signy = Dy < 0 ? 1 : 0
xgty = absx > absy ? 1 : 0
Here, the inequality signs (>, <) can be replaced by inequality signs with equality signs (>=, <=). The area indicates a rough angle, and can be derived only from the signs signx, signy of Dx, Dy and the magnitude relationship xgty.

The gradient derivation unit 310461 derives a region from the signs signx, signy and the magnitude relationship xgty using calculations and table references. The gradient derivation unit 310461 may derive the corresponding region by referencing the table in FIG. 15(a).

The gradient derivation unit 310461 may derive the region using a logical formula as follows.
region = xgty ? ( (signx^signy) ? 1 : 0 ) : ( (signx^signy) ? 2 : 3)
Here, ^ indicates XOR (exclusive OR). The region is expressed as a value from 0 to 3. {Ra, Rb, Rc, Rd} = {0, 1, 2, 3}. Note that the way in which the region value is assigned is not limited to the above.

The gradient derivation unit 310461 may derive the region using another logical expression and addition/multiplication as follows.
region = 2*(!xgty) + (signx^signy^!xgty)
Here the symbol ! means logical negation.

(2) Angle Mode Derivation Unit The angle mode derivation unit 310462 derives an angle mode (a prediction mode corresponding to the gradient, for example, an intra prediction mode) based on the gradient information of each point P described above.

FIG. 16 is a block diagram showing one configuration of the angle mode derivation unit 310462. As shown in FIG. 16, the angle mode mode_delta may be derived as follows using a first gradient, a second gradient, and two tables.

The angle mode derivation unit 310462 consists of an angle coefficient derivation unit 310466 and a mode conversion unit 310467. The angle coefficient derivation unit 310466 derives an angle coefficient iRatio (or v) based on two gradients. Here, it derives a slope iRatio (= absy ÷ absx) based on the absolute value absx of the first gradient and the absolute value absy of the second gradient. An integer expressing ratio in increments of 1/R_UNIT is used as iRatio.
iRatio = int(R_UNIT*absy/absx) ≒ ratio*R_UNIT
R_UNIT is an exponential power of 2 (1<<shiftR), for example, 65536 (shiftR=16).

The following is an example of how to derive iRatio, but the method is not limited to this example.

s0 = xgty ? absy : absx
s1 = xgty ? absx : absy
x = Floor( Log2( s1 ) )
norm_s1 = (s1 << 4 >> x) & 15
v = gradDivTable[norm_s1] | 8
x += (norm_s1 != 0)
shift = 13 - x
if (shift < 0) {
shift = -shift
add = (1 << (shift - 1))
iRatio = (s0 * v + add) >> shift
} else {
iRatio = (s0 * v) << shift
}
where gradDivTable = { 0, 7, 6, 5, 5, 4, 4, 3, 3, 2, 2, 1, 1, 1, 1, 0 }
Alternatively, the above formula "| 8" (OR operation with 8) may be calculated as "+8". Similarly, "| 16", "| 32", and "| 64" that appear in the following explanations can also be calculated as "+16", "+32", and "+64", respectively.

The value norm_s1 is derived by shifting the first gradient (absx or absy) at a pixel by the logarithmic value x. norm_s1 is used to reference the gradDivTable to derive the angle coefficient v. Furthermore, idx is derived by the product of v and a second gradient (s0 or s1) different from the first gradient, and shifting the above logarithmic value x. idx is used to reference a second table LUT (LUT') to derive the angle mode mode_delta.

In addition, you can clip idx as follows so that it does not exceed the range of the LUT's number of regions.

idx = min((s0 * v)<< 3 >> x, N_LUT-1)
Furthermore, it is also appropriate to clip and shift the multiplication by s0*v to a value equal to or less than a predetermined value KK so that the result does not exceed, for example, 32 bits.

s0*v = (min(s0*v, KK)<<3) >> x
For example, KK is (1<<(31-3))-1=268435455.

The mode conversion unit 310467 derives and outputs the second angle mode modeVal using mode_delta.
modeVal = base_mode[region] + direction[region] * mode_delta
The angle mode derivation unit 310462 derives a histogram (frequency HistMode) of the value modeVal of the angle mode modeVal obtained for each point P. The histogram may be obtained by incrementing the value of HistMode by 1 at each point P (hereinafter referred to as counting with a histogram).

HistMode[modeVal] += 1
cntMode += 1
(3) Angle Mode Selection Unit The angle mode selection unit 310463 derives one or more representative values dimdModeVal (dimdModeVal0, dimdModeVal1, ...) of the angle mode using modeVal (modeVal) at multiple points P included in the gradient derivation target image. The representative value of the angle mode in this embodiment is an estimated value of the directionality of the texture pattern of the target block. Here, the representative value dimdModeVal is derived from the most frequent value derived using the derived histogram. In the histogram of the angle mode modeVal value modeVal obtained for each point P, the first mode dimdModeVal0 and the second mode dimdModeVal1 are derived by selecting the most frequent mode and the second most frequent mode in the frequency, respectively.

Furthermore, HistMode[x] is scanned for x, and the value of x that gives the maximum value of HisMode is set to dimdModeVal0, and the value of x that gives the second largest value is set to dimdModeVal1.

maxVal = 0
for (x = 0; x <cntMode; x++) {
if (HistMode[x] > maxVal) {
maxVal = HistMode[x]
dimdModeVal1 = dimdModeVal0
dimdModeVal0 = x
}
}
Note that the method of deriving dimdModeVal0 or dimdModeVal1 is not limited to the histogram. For example, the angle mode selection unit 310463 may set the average value of modeVal to dimdModeVal0 or dimdModelVal1.

The angle mode selection unit 310463 sets a predetermined mode (for example, intra prediction mode or transform mode) to dimdModeVal2 as the third mode. Here, dimdModeVal2=0 (Planar) is set, but this is not limited to this. Another mode may be set adaptively, or the third mode may not be used.

The angle mode selection unit 310463 may further derive weights corresponding to the representative values of each angle mode for intra prediction in the provisional predicted image generation unit 310464 described later. For example, the weight of the third mode is set to w2=21, and the remainder is allocated to weights w0 and w1 according to the frequency ratio of the first and second modes in the histogram. The sum of the weights is set to 64. The derivation of the weights is not limited to this, and the weights w0, w1, and w2 of the first, second, and third modes may be adaptively changed. For example, w2 may be increased or decreased according to the number of the first or second mode, or the frequency or ratio thereof. Note that the angle mode selection unit sets the corresponding weight value to 0 for any of the first to third modes when that mode is not used.

The angle mode selection unit 310463 is equipped with an angle mode selection unit that selects an angle mode representative value from multiple angle modes derived for pixels in the image for which gradient is to be derived, thereby enabling the deriving of an angle mode with higher accuracy.

In this way, the angle mode selection unit 310463 selects the angle mode (representative value of the angle mode) estimated from the gradient, and outputs it together with the weight corresponding to each angle mode.

(Configuration of the adaptive gradient derivation unit 310461 and the angle mode derivation unit 310462)
As described above, in this embodiment, the region of the reference image used to derive the intra prediction mode from the reference image is changed according to dimd_mode. Specifically, the positions of points P of the gradient derivation unit 310461, the angle mode derivation unit 310462, and the angle mode selection unit 310463 are changed according to dimd_mode.

The set of position ranges (x, y) used for gradient derivation, angle derivation, and histograms may be positions within the reference regions RL, RT, and RTL. That is, the position range of gradient derivation may be 1 larger at the start point and 1 smaller at the end point to apply a 3x3 filter. That is, if the reference range of DIMD prediction is x=X0..X1, y=Y0..Y1, the array range of gradient derivation may be x=X0+1..X1-1, y=Y0+1..Y1-1. Note that the position ranges of (x, y) for gradient derivation corresponding to RL, RT, and RTL are called RDL, RDT, and RDTL.

(Configuration example 1 of reference area according to mode)
17(a) shows an example of a reference range in gradient derivation for DIMD prediction. When dimd_mode == DIMD_MODE_TOP_LEFT, the angle mode derivation device 310465 derives Dx, Dy from each point P in the left region RDL of the target block, derives modeVal, and counts it in a histogram.
The RDL is x=-refIdxW..-2, y=-refIedxH..refH-2.
Next, Dx, Dy are derived from each point P of the RDT in the range of pixels in the region above the target block, and modeVal is derived and counted in a histogram.
RDT is x=-refIdxW..refW-2, y=-refIdxH..-2. RDTL is the combined domain of RDL and RDT.

When dimd_mode == DIMD_MODE_LEFT, the gradient derivation unit 310461 and angle mode derivation unit 310462 (hereinafter referred to as the angle mode derivation device 310465) derive Dx and Dy from the left region of the target block, for example the above RDL, derive modeVal, and count it in a histogram.

When dimd_mode == DIMD_MODE_TOP, the angle mode derivation device 310465 derives Dx and Dy from the area above the target block, for example, the RDT above, derives modeVal, and counts it in a histogram.

(Configuration example 2 of reference area according to mode)
In addition, the angle mode derivation device 310465 may perform the following processing.

When dimd_mode == DIMD_MODE_TOP_LEFT or dimd_mode == DIMD_MODE_LEFT, the angle mode derivation device 310465 derives Dx and Dy from the left area of the target block, for example, the above RDL, derives modeVal, and counts it in a histogram.

Furthermore, when dimd_mode == DIMD_MODE_TOP_LEFT or dimd_mode == DIMD_MODE_TOP, the angle mode derivation device 310465 derives Dx and Dy from the area above the target block, for example, the RDT mentioned above, derives modeVal, and counts it in a histogram.

(Example 3 of configuration of reference area according to mode: Example of using extension area according to dimd_mode)
Figure 17(b) shows another example of the reference range in the gradient derivation of DIMD prediction. In this example, when dimd_mode == DIMD_MODE_LEFT, an extended region including the adjacent region on the left and the bottom left is used. When dimd_mode == DIMD_MODE_TOP, an extended region including the adjacent region on the top and the top right is used. When dimd_mode == DIMD_MODE_TOP_LEFT, the left and top regions are used without extension.

For example, the angle mode derivation device 310465 may perform the following processing:

When dimd_mode == DIMD_MODE_TOP_LEFT, the reference sample derivation unit 310460 derives Dx, Dy from the left and bottom left areas of the target block, derives modeVal, and counts it in a histogram.
First, Dx, Dy are derived at point P at position (x, y) in the left region RDL of the target block, and modeVal is derived and counted in a histogram.
Here, the RDL is x=-refIdxW..-2, y=-refIdxH..refH-2, refW = bW, refH = bH.
Next, Dx, Dy are derived at point P at position (x, y) in region RDT above the target block, and modeVal is derived and counted in a histogram.
Here, RDT is x=-refIdxW..refW-2, y=-refIdxH..-2, refW = bW, refH = bH. RDTL is the domain of RDL and RDT.

When dimd_mode == DIMD_MODE_LEFT, the angle mode derivation device 310465 derives Dx, Dy from the left and bottom left areas of the target block, for example RDL_EXT, and derives modeVal and counts it in a histogram.
RDL_EXT is x=-refIdxW..-2, y=-refIdxH..refH2-2, refH2 = bH*2.
If the pixels in the region above the target block are not available, then y=1..refH2-2.

When dimd_mode == DIMD_MODE_TOP, the angle mode derivation device 310465 derives Dx, Dy from the region above the target block, for example, the above RDT_EXT, and derives modeVal and counts it in a histogram.
RDT_EXT is x=-refIdxW..refW2-2, y=-refIdxH..-2, refW2 = bW*2.
If the pixels in the area to the left of the target block are unavailable, then x=1..refW2-2.

(Configuration Example 4: Example 2 of using the extension area according to dimd_mode)
In addition, the angle mode derivation device 310465 may perform the following processing.

When dimd_mode == DIMD_MODE_TOP_LEFT or dimd_mode == DIMD_MODE_LEFT, the reference sample derivation unit 310460 derives Dx and Dy from the left and bottom left areas of the target block, for example, RDL_ADAP, and derives modeVal and counts it in a histogram.
RDL_ADAP may use the following:
x=-refIdxW..-2, y=-refIdxH..refH-2, refH = (dimd_mode == DIMD_MODE_TOP_LEFT) ? bH : bH*2
In DIMD_MODE_LEFT, if the pixels in the area above the target block are not available, y=1..refH-2 is used.

Furthermore, if dimd_mode = DIMD_MODE_TOP_LEFT or dimd_mode == DIMD_MODE_TOP, the angle mode derivation device 310465 derives Dx and Dy from the upper and upper right regions of the target block, for example, RDT_ADAP, and derives modeVal and counts it in a histogram.
RDT_ADAP may use the following:
x=-refIdxW..refW-2, y=-refIdxH..-2, refW = (dimd_mode == DIMD_MODE_TOP_LEFT) ? bW*2 : bW
In DIMD_MODE_TOP, if pixels in the area to the left of the target block are not available, use x=1..refW-2.

The above configuration makes it possible to switch between at least the top and left, and the left and top of the target block as the adjacent images depending on the DIMD mode. Therefore, even if the characteristics of the target block differ from the characteristics of the adjacent area to the left or above, the intra prediction mode can be derived on the decoder side with high accuracy and high efficiency.

When the decoder derives the intra prediction mode using the gradient of pixel values of an image adjacent to the target area, the angle gradient of the adjacent image and the angle gradient of the target block do not necessarily match. Even in such cases, the effect of improving accuracy is achieved by switching the derivation of the intra prediction mode depending on the properties of the adjacent blocks and the target block.

Furthermore, when both the top and left are used (DIMD_MODE_TOP_LEFT), the top right and bottom left extension regions are not used, whereas when only the left (DIMD_MODE_LEFT) or only the top (DIMD_MODE_TOP) is used, the left and bottom left extension regions, and the top and top right extension regions are used, respectively, which has the effect of reducing the amount of processing required for sampling reference pixels, deriving gradients, and deriving histograms.

(Configuration Example 5 According to Mode)
Fig. 12(a) shows another example of the reference range in gradient derivation for DIMD prediction. In this example, the number of lines in the reference region is changed according to dimd_lines. For example, when dimd_mode == DIMD_LINES1, M lines are referenced, and when dimd_mode == DIMD_LINES2, N lines greater than M are referenced. M and N are, for example, M=3 and N=4.

The angle mode derivation device 310465 sets the reference line numbers refIdxW and refIdxH in accordance with dimd_mode.
refIdxW = (dimd_mode == DIMD_LINES1) ? M-1 : N-1
refIdxH = (dimd_mode == DIMD_LINES1) ? M-1 : N-1
The angle mode derivation unit 310465 derives Dx and Dy from the left region of the target block, for example, RDL, derives modeVal, and counts it in a histogram.

The angle mode derivation unit 310465 derives Dx and Dy from the area above the target block, for example, RDT, derives modeVal, and counts it in a histogram.

You can also select based on block size.

(Configuration example 1 according to block size)
The angle mode derivation device 310465 sets the reference line numbers refIdxW and refIdxH in accordance with the block size.
refIdxW = (bW >= 8 || bH >= 8) ? N-1 : M-1
refIdxH = (bW >= 8 || bH >= 8) ? N-1 : M-1
The angle mode derivation device 310465 derives Dx and Dy from the left region RDL and the top region RDT of the target block, derives modeVal, and counts it in a histogram.
(Configuration Example 6 According to Mode)
12(b) shows another example of the reference range in the gradient derivation of the DIMD prediction. In this example, the direction of the reference region is selected according to the dimd_mode, and at the same time, the number of lines of the reference region is also selected.

The angle mode derivation device 310465 derives the number of reference lines to be M if dimd_mode refers to the left and top, and otherwise derives the number of reference lines to be N (M<N), for example, M=2, N=3.
refIdxW = (dimd_mode == DIMD_MODE_TOP_LEFT) ? M-1 : N-1
refIdxH = (dimd_mode == DIMD_MODE_TOP_LEFT) ? M-1 : N-1
When dimd_mode == DIMD_MODE_TOP_LEFT, the reference sample derivation unit 310460 sets the number of reference lines to M, derives Dx and Dy from the left region RDL and the top region RDT, derives modeVal, and counts it in a histogram.

When dimd_mode == DIMD_MODE_LEFT, the angle mode derivation device 310465 sets the number of reference lines to N, derives Dx and Dy from the area to the left of the target block, for example, RDL, derives modeVal, and counts it in a histogram.

When dimd_mode == DIMD_MODE_TOP, the angle mode derivation device 310465 sets the number of reference lines to N, derives Dx and Dy from the area above the target block, for example, RDT, derives modeVal, and counts it in a histogram.

With the above configuration, in a configuration in which the intra prediction mode is derived on the decoder side using the gradient of pixel values of the reference area adjacent image, the number of lines to be referenced is switched between when both the top and left are used as the reference area, and when only the left or only the top is used. This further makes it possible to derive an intra prediction mode according to the difference in the continuity of the characteristics of the target block and the adjacent blocks, thereby improving prediction accuracy.

Furthermore, the number of reference lines when using both the top and left (DIMD_MODE_TOP_LEFT) is set to M, and the number of reference lines when using only the left (DIMD_MODE_LEFT) or only the top (DIMD_MODE_TOP) is set to N (where M<N). This configuration has the effect of reducing the amount of processing required for sampling reference pixels, deriving gradients, and deriving histograms by referring to both the top and left.

In the above configuration, the reference area to the left of the target block may be the left and bottom left reference areas, and the reference area above the target block may be the top and top right reference areas, or the top left of the target block may be referenced.

(4) Prediction Image Generation Unit The prediction image generation unit (provisional prediction image generation unit) 310464 generates a prediction image (provisional prediction image) using one or more input angle mode representative values (intra prediction modes). When there is one intra prediction mode, an intra prediction image is generated in that intra prediction mode and output as a provisional prediction image q[x][y]. When there are multiple intra prediction modes, a prediction image (pred0, pred1, pred2) is generated in each intra prediction mode. Multiple prediction images are synthesized using the corresponding weights (w0, w1, w2) and output as a prediction image q[x][y]. The prediction image q[x][y] is derived as follows.
q[x][y] = (w0 * pred0[x][y] + w1 * pred1[x][y] + w2 * pred2[x][y]) >> 6
However, if the frequency of the second mode is 0 or it is not a directional prediction mode (such as DC mode), the predicted image pred0[][] based on the first intra prediction mode is used as the predicted image q[][] (q[x][y]=pred0[0][0]).

(Dimd_mode Decryption Configuration)
The inverse quantization and inverse transform unit 311 inverse quantizes the quantized transform coefficients input from the prediction parameter derivation unit 320 to obtain transform coefficients. The quantized transform coefficients are coefficients obtained by performing frequency transform such as DCT (Discrete Cosine Transform) or DST (Discrete Sine Transform) on the prediction error in the encoding process and quantizing the transform coefficients. The inverse quantization and inverse transform unit 311 performs inverse frequency transform such as inverse DCT or inverse DST on the transform coefficients to calculate the prediction error. The inverse quantization and inverse transform unit 311 outputs the prediction error to the adder unit 312.

FIG. 18 is a block diagram showing the configuration of the inverse quantization and inverse transform unit 311 of this embodiment. The inverse quantization and inverse transform unit 311 is composed of a scaling unit 31111, an inverse non-separable transform unit 31121, and an inverse separate transform unit 31123. Note that the transform coefficients decoded from the encoded data may be transformed using the angle mode derived by the angle mode derivation device 310465.

The inverse quantization and inverse transform unit 311 obtains the transform coefficients d[][] by scaling (inverse quantization) the quantized transform coefficients qd[][] input from the prediction parameter derivation unit 320 using the scaling unit 31111. The quantized transform coefficients qd[][] are coefficients obtained by performing a transform such as DCT (Discrete Cosine Transform) or DST (Discrete Sine Transform) on the prediction error in the encoding process and quantizing it, or coefficients obtained by further performing a non-separable transform on the transformed coefficients. When the non-separable transform flag lfnst_idx!=0, the inverse quantization and inverse transform unit 31121 performs an inverse transform. Furthermore, an inverse frequency transform such as inverse DCT or inverse DST is performed on the transform coefficients to calculate the prediction error. Also, when lfnst_idx==0, the inverse non-separable transform unit 31121 does not perform processing, and the scaling unit 31111 performs inverse frequency transform such as inverse DCT and inverse DST on the scaled transform coefficients to calculate a prediction error. The inverse quantization and inverse transform unit 311 outputs the prediction error to the adder 312.

The adder 312 adds, for each pixel, the predicted image of the block input from the predicted image generation unit 308 and the prediction error input from the inverse quantization and inverse transform unit 311 to generate a decoded image of the block. The adder 312 stores the decoded image of the block in the reference picture memory 306, and also outputs it to the loop filter 305.

(Configuration of the video encoding device)
Next, the configuration of the video encoding device 11 according to this embodiment will be described. Fig. 19 is a block diagram showing the configuration of the video encoding device 11 according to this embodiment. The video encoding device 11 includes a prediction image generating unit 101, a subtraction unit 102, a transformation/quantization unit 103, an inverse quantization/inverse transformation unit 105, an addition unit 106, a loop filter 107, a prediction parameter memory (prediction parameter storage unit, frame memory) 108, a reference picture memory (reference image storage unit, frame memory) 109, an encoding parameter determining unit 110, a parameter encoding unit 111, an entropy encoding unit 104, and a prediction parameter derivation unit 120.

The predicted image generating unit 101 generates a predicted image for each CU, which is an area obtained by dividing each picture of the image T. The predicted image generating unit 101 operates in the same way as the predicted image generating unit 308 already explained, and so a description thereof will be omitted.

The subtraction unit 102 subtracts the pixel values of the predicted image of the block input from the predicted image generation unit 101 from the pixel values of image T to generate a prediction error. The subtraction unit 102 outputs the prediction error to the transformation and quantization unit 103.

The transform/quantization unit 103 calculates transform coefficients by frequency transforming the prediction error input from the subtraction unit 102, and derives quantized transform coefficients by quantizing it. The transform/quantization unit 103 outputs the quantized transform coefficients to the entropy coding unit 104, the inverse quantization/inverse transform unit 105, and the coding parameter determination unit 110.

The inverse quantization and inverse transform unit 105 is the same as the inverse quantization and inverse transform unit 311 (FIG. 4) in the video decoding device 31, and a description thereof will be omitted. The calculated prediction error is output to the addition unit 106.

The entropy coding unit 104 receives prediction parameters and quantized transform coefficients from the parameter coding unit 111. The entropy coding unit 104 entropy codes the split information, prediction parameters, quantized transform coefficients, etc. to generate and output an encoded stream Te.

The parameter coding unit 111 instructs the entropy coding unit 104 to code the prediction parameters, quantization coefficients, etc. derived by the prediction parameter derivation unit 120.

The prediction parameter derivation unit 120 derives syntax elements from the parameters input from the encoding parameter determination unit 110. The prediction parameter derivation unit 120 includes a configuration that is partially the same as the configuration of the prediction parameter derivation unit 320.

The adder 106 generates a decoded image by adding, for each pixel, the pixel values of the predicted image of the block input from the predicted image generation unit 101 and the prediction error input from the inverse quantization and inverse transform unit 105. The adder 106 stores the generated decoded image in the reference picture memory 109.

The loop filter 107 applies a deblocking filter, SAO, and ALF to the decoded image generated by the adder 106. Note that the loop filter 107 does not necessarily have to include the above three types of filters, and may be configured, for example, as only a deblocking filter.

The prediction parameter memory 108 stores the prediction parameters input from the prediction parameter derivation unit 120 in a predetermined location for each target picture and CU.

The reference picture memory 109 stores the decoded image generated by the loop filter 107 in a predetermined location for each target picture and CU.

The coding parameter determination unit 110 selects one set from among multiple sets of coding parameters. The coding parameters are the above-mentioned QT, BT or TT division information, prediction parameters, or parameters to be coded that are generated in relation to these. The predicted image generation unit 101 generates a predicted image using these coding parameters.

The coding parameter determination unit 110 calculates an RD cost value indicating the amount of information and the coding error for each of the multiple sets. The coding parameter determination unit 110 selects the set of coding parameters that minimizes the calculated cost value. As a result, the entropy coding unit 104 outputs the selected set of coding parameters as the coding stream Te. The coding parameter determination unit 110 stores the determined coding parameters in the prediction parameter memory 108.

Note that a part of the video encoding device 11 and video decoding device 31 in the above-mentioned embodiment, for example, the entropy decoding unit 301, the parameter decoding unit 302, the loop filter 305, the predicted image generating unit 308, the inverse quantization and inverse transform unit 311, the addition unit 312, the predicted image generating unit 101, the subtraction unit 102, the transform and quantization unit 103, the entropy encoding unit 104, the inverse quantization and inverse transform unit 105, the loop filter 107, the encoding parameter determination unit 110, and the parameter encoding unit 111 may be realized by a computer. In this case, a program for realizing this control function may be recorded on a computer-readable recording medium, and the program recorded on the recording medium may be read into and executed by a computer system. Note that the "computer system" referred to here is a computer system built into either the video encoding device 11 or the video decoding device 31, and includes hardware such as an OS and peripheral devices. Additionally, "computer-readable recording media" refers to portable media such as flexible disks, optical magnetic disks, ROMs, and CD-ROMs, as well as storage devices such as hard disks built into computer systems. Furthermore, "computer-readable recording media" may also include devices that dynamically store a program for a short period of time, such as a communication line when transmitting a program via a network such as the Internet or a communication line such as a telephone line, or devices that store a program for a certain period of time, such as volatile memory within a computer system that serves as a server or client in such cases. Furthermore, the above-mentioned program may be one that realizes part of the functions described above, or may be one that can realize the functions described above in combination with a program already recorded in the computer system.

Furthermore, part or all of the video encoding device 11 and video decoding device 31 in the above-mentioned embodiments may be realized as an integrated circuit such as an LSI (Large Scale Integration). Each functional block of the video encoding device 11 and video decoding device 31 may be individually made into a processor, or part or all of them may be integrated into a processor. The integrated circuit method is not limited to LSI, and may be realized by a dedicated circuit or a general-purpose processor. Furthermore, if an integrated circuit technology that can replace LSI appears due to advances in semiconductor technology, an integrated circuit based on that technology may be used.

Although one embodiment of the present invention has been described in detail above with reference to the drawings, the specific configuration is not limited to that described above, and various design changes can be made without departing from the spirit of the present invention.

(summary)
An image decoding device according to aspect 1 of the present invention includes a reference sample derivation unit that selects a neighboring image of a current block in accordance with a DIMD mode, a gradient derivation unit that derives a pixel-by-pixel gradient using the selected neighboring image, and an angle mode selection unit that derives an intra-prediction mode from the gradient.

The image decoding device according to aspect 2 of the present invention is characterized in that, in the above aspect 1, it includes an entropy decoding unit that decodes the DIMD flag and DIMD mode of the target block from the encoded data, and further includes a predicted image generating unit that, when the DIMD flag is true, further decodes the DIMD mode and generates a predicted image using the derived intra prediction mode.

The image decoding device according to aspect 3 of the present invention is characterized in that in any of

aspects

1 and 2 above, the DIMD mode switches at least between the top and left, and the left and top as the adjacent images.

The image decoding device according to aspect 4 of the present invention is characterized in that in any of aspects 1 to 3 above, the DIMD mode is composed of a first bit and a second bit, and the first bit is a choice between top and left, and the second bit is a choice between left or top to select the adjacent image.

The image decoding device according to aspect 5 of the present invention is characterized in that in any one of aspects 1 to 4, the entropy decoding unit decodes the DIMD mode using a context that holds a probability for decoding the first bit, and using an equal probability without using a context for decoding the second bit.

The image decoding device according to aspect 6 of the present invention is characterized in that in any one of aspects 1 to 5, the entropy decoding unit decodes the DIMD mode using a context that holds probabilities for decoding the first bit and the second bit.

The image decoding device according to aspect 7 of the present invention is any one of aspects 1 to 6 above, characterized in that the entropy decoding unit derives the context index using the width and height of the target block.

The image decoding device according to aspect 8 of the present invention is any one of aspects 1 to 7 above, characterized in that the entropy decoding unit derives a context index from the second bit by using a determination of whether the target block is a square or not.

The image decoding device according to aspect 9 of the present invention is characterized in that in any one of aspects 1 to 8 above, the gradient derivation unit changes the number of lines to be referenced depending on dimd_mode.

The image decoding device according to aspect 10 of the present invention is characterized in that in any one of aspects 1 to 9 above, the gradient derivation unit changes the number of lines to be referenced depending on the size of the target block.

An image decoding device according to aspect 11 of the present invention is characterized in that in any one of aspects 1 to 10 above, the gradient derivation unit changes the number of lines to be referenced and the reference direction according to dimd_mode.

The image encoding device according to aspect 12 of the present invention includes a reference sample derivation unit that selects an adjacent image of a target block according to a DIMD mode, a gradient derivation unit that uses the selected adjacent image to derive a pixel-by-pixel gradient, and an angle mode selection unit that derives an intra prediction mode from the gradient.

CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims the benefit of priority to Japanese patent application No. 2022-163200, filed on October 11, 2022, the entire contents of which are incorporated herein by reference.

Embodiments of the present invention can be suitably applied to a video decoding device that decodes coded data in which image data has been coded, and a video coding device that generates coded data in which image data has been coded. The present invention can also be suitably applied to the data structure of coded data that is generated by a video coding device and referenced by the video decoding device.

31 Image Decoding Device
301 Entropy Decoding Unit
302 Parameter Decoding Unit
308 Prediction Image Generation Unit
31046 DIMD Prediction Department
310460 Reference sample derivation part
310465 Angle mode extraction device
310461 Gradient derivation part
310462 Angle mode derivation part
310463 Angle mode selection section
310464 Temporary predicted image generation unit
311 Inverse quantization and inverse transformation unit
312 Addition section
11 Image Encoding Device
101 Prediction image generation unit
102 Subtraction section
103 Transformation and Quantization Section
104 Entropy coding unit
105 Inverse quantization and inverse transformation section
107 Loop Filter
110 Encoding parameter determination unit
111 Parameter Encoding Unit

Claims

An image decoding device that includes a reference sample derivation unit that selects adjacent images of a target block according to a DIMD mode, a gradient derivation unit that uses the selected adjacent images to derive pixel-by-pixel gradients, and an angle mode selection unit that derives an intra prediction mode from the gradient.
The image decoding device according to claim 1, further comprising an entropy decoding unit that decodes the DIMD flag and DIMD mode of the target block from the encoded data, and a predicted image generating unit that further decodes the DIMD mode when the DIMD flag is true, and further generates a predicted image using the derived intra prediction mode.
The image decoding device according to claim 1, characterized in that the DIMD mode switches at least between the top and left, and the left and top as the adjacent images.
The image decoding device according to claim 2, characterized in that the DIMD mode is composed of a first bit and a second bit, the first bit being a choice between top and left, and the second bit being a choice between left and top, to select the adjacent image.
The image decoding device according to claim 4, characterized in that the entropy decoding unit decodes the DIMD mode using a context that holds a probability for decoding the first bit, and using equal probability without using a context for decoding the second bit.
The image decoding device according to claim 4, characterized in that the entropy decoding unit decodes the DIMD mode using a context that holds a probability for decoding the first bit and the second bit.
The image decoding device according to claim 2, characterized in that the entropy decoding unit derives the context index using the width and height of the target block.
The image decoding device according to claim 4, characterized in that the entropy decoding unit derives a context index from the second bit by using a determination of whether the target block is a square or not.
The image decoding device according to claim 1, characterized in that the gradient derivation unit changes the number of lines to be referenced depending on dimd_mode.
The image decoding device according to claim 1, characterized in that the gradient derivation unit changes the number of lines to be referenced depending on the size of the target block.
The image decoding device according to claim 1, characterized in that the gradient derivation unit changes the number of lines to be referenced and the reference direction according to dimd_mode.
An image encoding device that includes a reference sample derivation unit that selects an adjacent image of a target block according to a DIMD mode, a gradient derivation unit that uses the selected adjacent image to derive a pixel-by-pixel gradient, and an angle mode selection unit that derives an intra prediction mode from the gradient.