WO2024080216A1 - Image decoding device and image encoding device

Image decoding device and image encoding device

Info

Publication number
WO2024080216A1
WO2024080216A1 (PCT/JP2023/036356)
Authority
WO
WIPO (PCT)
Prior art keywords
mode
dimd
unit
prediction
image
Prior art date
Application number
PCT/JP2023/036356
Other languages
English (en)
Japanese (ja)
Inventor
哲銘 范
知宏 猪飼
将伸 八杉
友子 青野
Original Assignee
Sharp Corporation (シャープ株式会社)
Priority date
Filing date
Publication date
Application filed by Sharp Corporation
Publication of WO2024080216A1



Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/105 — Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N19/11 — Selection of coding mode or of prediction mode among a plurality of spatial predictive coding modes
    • H04N19/176 — characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/70 — characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H04N19/91 — Entropy coding, e.g. variable length coding [VLC] or arithmetic coding

Definitions

  • Embodiments of the present invention relate to an image decoding device and an image encoding device.
  • A video encoding device generates encoded data by encoding video, and a video decoding device generates a decoded image by decoding the encoded data.
  • video coding methods include those proposed in H.264/AVC and HEVC (High Efficiency Video Coding).
  • the images (pictures) that make up a video are managed in a hierarchical structure consisting of slices obtained by dividing the images, coding tree units (CTUs) obtained by dividing the slices, coding units (CUs) obtained by dividing the coding tree units, and transform units (TUs) obtained by dividing the coding units, and are coded/decoded for each CU.
  • a predicted image is usually generated based on a locally decoded image obtained by encoding/decoding an input image, and the prediction error (sometimes called a "difference image" or "residual image") obtained by subtracting the predicted image from the input image (original image) is coded.
  • Methods for generating predicted images include inter-frame prediction (inter prediction) and intra-frame prediction (intra prediction).
  • Non-Patent Document 1 discloses decoder-side intra mode derivation (DIMD) prediction, in which the decoder derives a predicted image by deriving an intra direction prediction mode number using pixels in adjacent regions.
  • In Non-Patent Document 1, the intra mode is derived on the decoder side using the gradient of pixel values of the image adjacent to the target area; however, there is an issue that the angle gradient of the adjacent image and the angle gradient of the target block do not necessarily match.
  • the present invention aims to improve the accuracy of decoder-side intra mode derivation by switching intra prediction mode derivation depending on the properties of adjacent blocks and the current block.
  • It includes a reference sample derivation unit that selects adjacent images for the target block according to the DIMD mode, a gradient derivation unit that uses the selected adjacent images to derive pixel-level gradients, and an angle mode selection unit that derives the intra prediction mode from the gradient.
  • FIG. 1 is a schematic diagram showing a configuration of an image transmission system according to an embodiment of the present invention.
  • FIG. 2 is a diagram showing a hierarchical structure of data in an encoded stream.
  • FIG. 3 is a schematic diagram showing types of intra prediction modes (mode numbers).
  • FIG. 4 is a schematic diagram showing a configuration of a video decoding device.
  • FIG. 5 is an example of DIMD syntax.
  • FIG. 6 is a diagram explaining the binarization of the syntax dimd_mode used in the DIMD prediction unit 31046.
  • FIG. 7 is another example of DIMD syntax.
  • FIG. 8 is a diagram showing context settings in decoding syntax elements of dimd_mode.
  • FIG. 9 is a diagram illustrating a configuration of a predicted image generating unit.
  • FIG. 10 is a diagram showing details of a DIMD prediction unit.
  • FIG. 11 is a diagram showing an example of a reference region referred to by a DIMD prediction unit 31046.
  • FIG. 12 is a diagram showing a configuration for changing the number of lines in a reference region for DIMD prediction according to dimd_mode.
  • FIG. 13 is an example of a spatial filter.
  • FIG. 14 is a diagram illustrating an example of a pixel from which a gradient is derived.
  • FIG. 15 is a diagram illustrating the relationship between gradient and region.
  • FIG. 16 is a block diagram showing a configuration of an angle mode derivation unit.
  • FIG. 17 is a diagram showing an example of a reference range in gradient derivation by the DIMD prediction unit 31046.
  • FIG. 18 is a functional block diagram showing an example of the configuration of an inverse quantization and inverse transform unit.
  • FIG. 19 is a block diagram showing a configuration of a video encoding device.
  • FIG. 1 is a schematic diagram showing the configuration of an image transmission system 1 according to this embodiment.
  • the image transmission system 1 is a system that transmits an encoded stream obtained by encoding an image to be encoded, and decodes the transmitted encoded stream to display an image.
  • the image transmission system 1 is composed of a video encoding device (image encoding device) 11, a network 21, a video decoding device (image decoding device) 31, and a video display device (image display device) 41.
  • An image T is input to the video encoding device 11.
  • the network 21 transmits the encoded stream Te generated by the video encoding device 11 to the video decoding device 31.
  • the network 21 is the Internet, a wide area network (WAN), a local area network (LAN), or a combination of these.
  • the network 21 is not necessarily limited to a bidirectional communication network, and may be a unidirectional communication network that transmits broadcast waves such as terrestrial digital broadcasting and satellite broadcasting.
  • the network 21 may also be replaced by a storage medium on which the encoded stream Te is recorded, such as a DVD (Digital Versatile Disc: registered trademark) or a BD (Blu-ray Disc: registered trademark).
  • the video decoding device 31 decodes each of the encoded streams Te transmitted by the network 21 and generates one or more decoded images Td.
  • the video display device 41 displays all or part of one or more decoded images Td generated by the video decoding device 31.
  • the video display device 41 is equipped with a display device such as a liquid crystal display or an organic EL (Electro-luminescence) display. Display forms include stationary, mobile, HMD, etc. Furthermore, when the video decoding device 31 has high processing power, it displays high-quality images, and when it has only lower processing power, it displays images that do not require high processing or display capability.
  • x?y:z is a ternary operator that takes y if x is true (non-zero) and z if x is false (0).
  • BitDepthY is the luminance bit depth.
  • abs(a) is a function that returns the absolute value of a.
  • Int(a) is a function that returns the integer value of a.
  • Floor(a) is a function that returns the largest integer less than or equal to a.
  • Log2(a) is a function that returns the base-2 logarithm of a.
  • Ceil(a) is a function that returns the smallest integer greater than or equal to a.
  • a/d represents the division of a by d (rounded down to the nearest whole number).
  • Min(a,b) is a function that returns the smaller of a and b.
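  • For illustration, these operators can be written as the following helper functions (a minimal sketch in Python; the function names mirror the definitions above, and Div and Ternary are names chosen here):

    import math

    def Abs(a): return -a if a < 0 else a          # absolute value of a
    def Int(a): return int(a)                      # integer value of a
    def Floor(a): return math.floor(a)             # largest integer <= a
    def Ceil(a): return math.ceil(a)               # smallest integer >= a
    def Log2(a): return math.log2(a)               # base-2 logarithm of a
    def Min(a, b): return a if a <= b else b       # smaller of a and b
    def Div(a, d): return a // d                   # a/d, rounded down
    def Ternary(x, y, z): return y if x else z     # x ? y : z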
  • FIG. 2 is a diagram showing the hierarchical structure of data in an encoded stream Te.
  • the encoded stream Te illustratively includes a sequence and a number of pictures that make up the sequence.
  • FIG. 2 shows a coded video sequence that defines a sequence SEQ, a coded picture that specifies a picture PICT, a coded slice that specifies a slice S, coded slice data that specifies slice data, a coding tree unit included in the coded slice data, and a coding unit included in the coding tree unit.
  • the coded video sequence defines a set of data to be referred to by the video decoding device 31 in order to decode the sequence SEQ to be processed.
  • the sequence SEQ includes a video parameter set VPS (Video Parameter Set), a sequence parameter set SPS (Sequence Parameter Set), a picture parameter set PPS (Picture Parameter Set), a picture PICT, and supplemental enhancement information SEI (Supplemental Enhancement Information).
  • the video parameter set VPS specifies a set of coding parameters common to multiple videos composed of multiple layers, as well as a set of coding parameters related to multiple layers and each individual layer included in the video.
  • the sequence parameter set SPS specifies a set of coding parameters that the video decoding device 31 references in order to decode the target sequence. For example, the width and height of a picture are specified. Note that there may be multiple SPSs. In that case, one of the multiple SPSs is selected from the PPS.
  • the picture parameter set PPS specifies a set of coding parameters that the video decoding device 31 references in order to decode each picture in the target sequence. For example, it includes the reference value of the quantization width used in decoding the picture (pic_init_qp_minus26) and a flag indicating the application of weighted prediction (weighted_pred_flag). Note that there may be multiple PPSs. In that case, one of the multiple PPSs is selected for each picture in the target sequence.
  • a coded picture defines a set of data to be referenced by the video decoding device 31 in order to decode a picture PICT to be processed. As shown in the coded picture of FIG. 2, the picture PICT includes slices 0 to NS-1 (NS is the total number of slices included in the picture PICT).
  • An encoded slice defines a set of data to be referenced by the video decoding device 31 in order to decode a slice S to be processed. As shown in the encoded slice of Fig. 2, a slice includes a slice header and slice data.
  • the slice header includes a set of coding parameters that the video decoding device 31 refers to in order to determine the decoding method for the target slice.
  • Slice type designation information (slice_type) that specifies the slice type is an example of a coding parameter included in the slice header.
  • Slice types that can be specified by the slice type specification information include (1) an I slice that uses only intra prediction when encoding, (2) a P slice that uses unidirectional prediction or intra prediction when encoding, and (3) a B slice that uses unidirectional prediction, bidirectional prediction, or intra prediction when encoding.
  • inter prediction is not limited to unidirectional or bidirectional prediction, and a predicted image may be generated using more reference pictures.
  • P or B slice it refers to a slice that includes a block for which inter prediction can be used.
  • the slice header may include a reference to the picture parameter set PPS (pic_parameter_set_id).
  • the coded slice data specifies a set of data to be referenced by the video decoding device 31 in order to decode the slice data to be processed.
  • the slice data includes a CTU, as shown in the coded slice data in Fig. 2.
  • a CTU is a block of a fixed size (e.g., 64x64) that constitutes a slice, and is also called a largest coding unit (LCU).
  • In the coding tree unit in Fig. 2, a set of data that the video decoding device 31 refers to in order to decode the CTU to be processed is specified.
  • the CTU is divided into coding units CU, which are basic units of the coding process, by recursive quad tree division (QT (Quad Tree) division), binary tree division (BT (Binary Tree) division), or ternary tree division (TT (Ternary Tree) division).
  • BT division and TT division are collectively called multi tree division (MT (Multi Tree) division).
  • a node of a tree structure obtained by recursive quad tree division is called a coding node.
  • the intermediate nodes of the quad tree, binary tree, and ternary tree are coding nodes, and the CTU itself is specified as the top coding node.
  • the CU is composed of a CU header CUH, prediction parameters, transformation parameters, quantization transformation coefficients, etc.
  • the CU header defines a prediction mode, etc.
  • Prediction processing may be performed on a CU basis, or on a sub-CU basis, which is a further division of a CU. If the size of the CU and sub-CU are equal, there is one sub-CU in the CU. If the size of the CU is larger than the size of the sub-CU, the CU is divided into sub-CUs. For example, if the CU is 8x8 and the sub-CU is 4x4, the CU is divided into 2 parts horizontally and 2 parts vertically, into 4 sub-CUs.
  • Intra prediction is a prediction within the same picture
  • inter prediction refers to a prediction process performed between different pictures (for example, between display times or between layer images).
  • the transform and quantization process is performed on a CU basis, but the quantized transform coefficients may be entropy coded on a subblock basis, such as 4x4.
  • the predicted image is derived from prediction parameters associated with the block, which include intra-prediction and inter-prediction parameters.
  • the intra prediction parameters consist of a luminance prediction mode IntraPredModeY and a chrominance prediction mode IntraPredModeC.
  • Figure 3 is a schematic diagram showing the types of intra prediction modes (mode numbers). As shown in the figure, there are, for example, 67 types of intra prediction modes (0 to 66). These include planar prediction (0), DC prediction (1), and angular prediction (2 to 66).
  • linear model (LM) prediction such as cross component linear model (CCLM) prediction and multi-mode linear model (MMLM) prediction may also be used.
  • an LM mode may be added for chrominance.
  • the video decoding device 31 includes an entropy decoding unit 301, a parameter decoding unit (prediction image decoding device) 302, a loop filter 305, a reference picture memory 306, a prediction parameter memory 307, a prediction image generating unit (prediction image generating device) 308, an inverse quantization and inverse transform unit 311, an addition unit 312, and a prediction parameter derivation unit 320.
  • the video decoding device 31 may also be configured not to include the loop filter 305.
  • CTU and CU are used as processing units, but this is not limiting and processing may be performed in sub-CU units.
  • CTU and CU may be read as blocks and sub-CU as sub-blocks, and processing may be performed in block or sub-block units.
  • the entropy decoding unit 301 performs entropy decoding on the externally input encoded stream Te and parses each code (syntax element).
  • There are two types of entropy coding: one performs variable-length coding of syntax elements using a context (probability model) adaptively selected according to the type of syntax element and the surrounding circumstances, and the other performs variable-length coding of syntax elements using a predefined table or formula. The former is CABAC (Context Adaptive Binary Arithmetic Coding).
  • the probability model of a picture that uses the same slice type and the same slice-level quantization parameter is set as the initial state of the context of a P picture or B picture, and this initial state is used for the encoding and decoding processes.
  • the parsed code includes prediction information for generating a predicted image and prediction errors for generating a difference image.
  • the entropy decoding unit 301 may decode each bin of the syntax element using the variables ivlCurrRange, ivlOffset, valIdx, pStateIdx0, and pStateIdx1.
  • ivlCurrRange and ivlOffset are context-independent variables.
  • valIdx, pStateIdx0, and pStateIdx1 are context-specific variables.
  • After decoding each bin (binVal; valMps denotes the most probable symbol value), the entropy decoding unit 301 updates the state of the context by the following calculation:
    pStateIdx1 = pStateIdx1 - (pStateIdx1 >> shift1) + ((16383 * binVal) >> shift1)
  • (Bin decoding in the case of bypass)
  • the entropy decoding unit 301 obtains ivlCurrRange and ivlOffset by the following calculation.
  • ivlCurrRange = ivlCurrRange << 1
  • ivlOffset = (ivlOffset << 1) | read_bits(1)
  • read_bits(1) reads one bit from the bitstream and returns that value.
  • the entropy decoding unit 301 outputs the parsed syntax elements to the parameter decoding unit 302. Control of which syntax elements to parse is performed based on instructions from the parameter decoding unit 302.
  • the entropy decoding unit 301 may parse, for example, the syntax element dimd_mode shown in the syntax table of FIG. 5 as follows.
  • dimd_mode is a syntax element that selects the reference region of the DIMD from the encoded data.
  • dimd_mode is parsed from the encoded data.
  • dimd_mode may be DIMD_MODE_TOP_LEFT mode, DIMD_MODE_TOP mode, or DIMD_MODE_LEFT mode, which may be 0, 1, or 2, respectively.
  • Figure 6(a) shows an example of binarization of dimd_mode.
  • Bin0 is a flag that selects between DIMD_MODE_TOP_LEFT and the other modes: 0 indicates DIMD_MODE_TOP_LEFT, and 1 indicates a mode other than DIMD_MODE_TOP_LEFT.
  • the syntax element assigned to Bin0 is called dimd_mode_flag
  • the syntax element assigned to Bin1 is called dimd_mode_dir (see, for example, FIG. 7).
  • one bit (for example, "0") is assigned to DIMD_MODE_TOP_LEFT, and one further bit following "1" is assigned to distinguish DIMD_MODE_TOP and DIMD_MODE_LEFT.
  • in this way, a shorter code is assigned to the mode that uses both the left and top than to the modes that use only the left or only the top, which has the effect of shortening the average code amount and improving the coding efficiency.
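  • As an illustration, this binarization can be parsed as follows (a sketch in Python; read_bin stands for decoding one bin, and the exact mapping of the second bin value to DIMD_MODE_TOP or DIMD_MODE_LEFT is an assumption, since only the roles of Bin0 and Bin1 are given above):

    DIMD_MODE_TOP_LEFT, DIMD_MODE_TOP, DIMD_MODE_LEFT = 0, 1, 2

    def parse_dimd_mode(read_bin):
        # Bin0 (dimd_mode_flag): 0 -> DIMD_MODE_TOP_LEFT, 1 -> another mode
        if read_bin() == 0:
            return DIMD_MODE_TOP_LEFT
        # Bin1 (dimd_mode_dir): distinguishes DIMD_MODE_TOP from DIMD_MODE_LEFT
        # (which bin value maps to which mode is an assumption here)
        return DIMD_MODE_TOP if read_bin() == 0 else DIMD_MODE_LEFT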
  • In another example, dimd_mode is parsed from the encoded data and may be DIMD_LINES1 mode or DIMD_LINES2 mode, which may be 0 or 1, respectively.
  • Figure 8 shows the setting of the context (ctxInc) when parsing the syntax element of dimd_mode.
  • a context is a variable area for holding the probability (state) of CABAC, and is identified by the value of the context index ctxIdx (0, 1, 2, ...). The case where 0 and 1 are always equally probable, in other words 0.5 and 0.5, is called EP (Equal Probability) or bypass. In this case, no context is used because there is no need to hold a state for the specific syntax element.
  • ctxIdx is derived by referencing ctxInc.
  • Bin0 is a syntax element indicating whether DIMD_MODE_TOP_LEFT
  • Bin1 is a syntax element indicating whether DIMD_MODE_LEFT or DIMD_MODE_TOP.
  • Bypass is a parsing method that does not use a context.
  • dimd_mode is a syntax element that selects the DIMD reference area from the encoded data. With the above configuration, no context is used to select between DIMD_MODE_LEFT and DIMD_MODE_TOP, which has the effect of reducing memory.
  • the formula and values are not limited to the above, and the order of judgment and values may be changed.
  • ctxIdx = ( bW > bH ) ? 1 : ( bW < bH ) ? 2 : bypass
  • the DIMD mode (dimd_mode) is composed of a first bit and a second bit; the first bit selects whether the reference area of the DIMD is both above and to the left of the target block, and the second bit selects whether the left adjacent area or the above adjacent area of the target block is used.
  • dimd_mode may be decoded using the value obtained by swapping the binary value of Bin1 (1 to 0, 0 to 1, for example, 1 - Bin1) depending on whether bW > bH or bH > bW.
  • dimd_mode is derived as follows.
  • the above configuration uses different contexts depending on the shape of the target block, for example, whether the target block is square or not (and/or whether it is horizontally or vertically oblong), so the block can be adaptively encoded with a short code according to its characteristics, improving performance. Also, if no context is used when the block is square, this has the effect of reducing memory usage.
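  • A sketch of this context selection (Python; the convention that a negative return value denotes bypass is an assumption made here for illustration):

    BYPASS = -1  # stands for "no context" (EP / bypass coding)

    def ctx_for_bin1(bW: int, bH: int) -> int:
        # ctxIdx = (bW > bH) ? 1 : (bW < bH) ? 2 : bypass
        if bW > bH:
            return 1      # horizontally oblong block
        if bW < bH:
            return 2      # vertically oblong block
        return BYPASS     # square block: no context, reducing memory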
  • xC and yC are variables related to the top left position of the block
  • refIdxW and refIdxH are variables related to the number of lines in the reference area for DIMD prediction.
  • the parameter decoding unit 302 informs the entropy decoding unit 301 which syntax elements to parse.
  • the entropy decoding unit 301 outputs the parsed syntax elements to the prediction parameter derivation unit 320.
  • the prediction parameter derivation unit 320 derives prediction parameters, for example, an intra-prediction mode IntraPredMode, by referring to the prediction parameters stored in the prediction parameter memory 307 based on the syntax elements input from the parameter decoding unit 302.
  • the prediction parameter derivation unit 320 outputs the derived prediction parameters to the predicted image generation unit 308, and also stores them in the prediction parameter memory 307.
  • the prediction parameter derivation unit 320 may derive different prediction modes for luminance and chrominance.
  • the prediction parameter derivation unit 320 may derive prediction parameters from syntax elements related to intra prediction such as those shown in FIG. 5.
  • the loop filter 305 is a filter provided in the encoding loop that removes block distortion and ringing distortion and improves image quality.
  • the loop filter 305 applies filters such as a deblocking filter, sample adaptive offset (SAO), and adaptive loop filter (ALF) to the decoded image of the CU generated by the adder 312.
  • the reference picture memory 306 stores the decoded image of the CU generated by the adder 312 in a predetermined location for each target picture and target CU.
  • the prediction parameter memory 307 stores prediction parameters at a predetermined location for each CTU or CU to be decoded. Specifically, the prediction parameter memory 307 stores the parameters decoded by the parameter decoding unit 302 and the prediction mode predMode derived by the prediction parameter derivation unit 320.
  • the prediction mode predMode, prediction parameters, etc. are input to the prediction image generation unit 308.
  • the prediction image generation unit 308 also reads a reference picture from the reference picture memory 306.
  • the prediction image generation unit 308 generates a prediction image of a block or sub-block using the prediction parameters and the read reference picture (reference picture block).
  • a reference picture block is a set of pixels on the reference picture (usually rectangular, so called a block), and is the area referenced to generate a prediction image.
  • the predicted image generation unit 310 performs intra prediction using the intra prediction parameters input from the prediction parameter derivation unit 320 and the reference pixels read from the reference picture memory 306.
  • the predicted image generation unit 308 reads adjacent blocks in a predetermined range from the target block on the target picture from the reference picture memory 306.
  • the predetermined range refers to the adjacent blocks to the left, upper left, upper, and upper right of the target block, and the area to be referenced differs depending on the intra prediction mode.
  • the predicted image generation unit 308 generates a predicted image of the current block by referring to the decoded pixel values that have been read and the prediction mode indicated by IntraPredMode.
  • the predicted image generation unit 308 outputs the generated predicted image of the block to the addition unit 312.
  • A decoded surrounding area adjacent (close) to the block to be predicted is set as reference region R. Then, a predicted image is generated by extrapolating pixels in reference region R in a specific direction.
  • reference region R may be set as an L-shaped region that includes the left and top of the block to be predicted (or further, the top left, top right, and bottom left).
  • the predicted image generation unit 308 includes a reference sample filter unit 3103 (second reference image setting unit), a prediction unit 3104, and a predicted image correction unit 3105 (predicted image correction unit, filter switching unit, and weighting coefficient changing unit).
  • Based on each reference pixel (reference image) in the reference region R, a filtered reference image generated by applying a reference pixel filter (first filter), and the intra prediction mode, the prediction unit 3104 generates a predicted image (provisional predicted image, uncorrected predicted image) of the block to be predicted and outputs it to the predicted image correction unit 3105.
  • the prediction image correction unit 3105 corrects the provisional predicted image according to the intra prediction mode, and generates and outputs a prediction image (corrected predicted image).
  • the reference sample filter unit 3103 derives a reference sample s[x][y] at each position (x, y) on the reference region R by referring to the reference image.
  • the reference sample filter unit 3103 applies a reference pixel filter (first filter) to the reference sample s[x][y] according to the intra prediction mode to update the reference sample s[x][y] at each position (x, y) on the reference region R (derives a filtered reference image s[x][y]).
  • a low-pass filter is applied to the position (x, y) and the reference image therearound to derive a filtered reference image.
  • a low-pass filter may be applied to some intra prediction modes.
  • the filter applied to the reference image on the reference region R in the reference sample filter unit 3103 is referred to as a "reference pixel filter (first filter)"
  • the filter that corrects the tentative predicted image in the prediction image correction unit 3105 described later is referred to as a "position-dependent filter (second filter)”.
  • the intra prediction unit generates a tentative predicted image (tentative predicted pixel value, pre-corrected predicted image) of a prediction target block based on an intra prediction mode, a reference image, and a filtered reference pixel value, and outputs the generated image to a prediction image correction unit 3105.
  • the prediction unit 3104 includes a planar prediction unit 31041, a DC prediction unit 31042, an angular prediction unit 31043, an LM prediction unit 31044, a matrix-based intra prediction unit 31045, and a DIMD prediction unit 31046 (Decoder-side Intra Mode Derivation, DIMD).
  • the prediction unit 3104 selects a specific prediction unit according to the intra prediction mode, and inputs a reference image and a filtered reference image.
  • the relationship between the intra prediction mode and the corresponding prediction unit is as follows:
    Planar prediction ... planar prediction unit 31041
    DC prediction ... DC prediction unit 31042
    Angular prediction ... angular prediction unit 31043
    LM prediction ... LM prediction unit 31044
    Matrix intra prediction ... MIP unit 31045
    DIMD prediction ... DIMD prediction unit 31046
  • (Planar prediction)
  • the planar prediction unit 31041 generates a provisional predicted image by linearly adding the reference sample s[x][y] according to the distance between the prediction target pixel position and the reference pixel position, and outputs the provisional predicted image to the predicted image correction unit 3105.
  • the DC prediction unit 31042 derives a DC predicted value equivalent to the average value of the reference samples s[x][y], and outputs a temporary predicted image q[x][y] whose pixel values are the DC predicted values.
  • the angular prediction unit 31043 generates a temporary predicted image q[x][y] using a reference sample s[x][y] in the prediction direction (reference direction) indicated by the intra prediction mode, and outputs the temporary predicted image q[x][y] to the predicted image correction unit 3105.
  • the LM prediction unit 31044 predicts pixel values of chrominance based on pixel values of luminance. Specifically, this is a method of generating a predicted image of a chrominance image (Cb, Cr) using a linear model based on a decoded luminance image.
  • LM prediction is a prediction method that uses a linear model to predict chrominance from luminance for one block.
  • the MIP unit 31045 generates a temporary predicted image q[x][y] by performing a product-sum operation on the reference sample s[x][y] derived from the adjacent region and a weighting matrix, and outputs the generated image to the predicted image correction unit 3105.
  • the DIMD prediction unit 31046 is a prediction method that generates a predicted image using an intra prediction mode that is not explicitly signaled.
  • the angle mode derivation device 310465 derives an intra prediction mode suitable for the current block using information on the neighboring region, and the DIMD prediction unit 31046 generates a temporary predicted image using this intra prediction mode. Details will be described later.
  • the predicted image correction unit 3105 corrects the provisional predicted image output from the prediction unit 3104 according to the intra prediction mode. Specifically, the predicted image correction unit 3105 derives a position-dependent weighting coefficient for each pixel of the provisional predicted image according to the reference region R and the position of the target predicted pixel. Then, the predicted image correction unit 3105 performs weighted addition (weighted averaging) of the reference sample s[][] and the provisional predicted image q[x][y] to derive a predicted image (corrected predicted image) Pred[][] obtained by correcting the provisional predicted image. Note that, in some intra prediction modes, the predicted image correction unit 3105 may set the provisional predicted image q[x][y] as a predicted image without correcting it.
  • (Example 1) FIG. 10 shows the configuration of the DIMD prediction unit 31046 in this embodiment.
  • the DIMD prediction unit 31046 is composed of a reference sample derivation unit 310460, an angle mode derivation device 310465 (gradient derivation unit 310461, angle mode derivation unit 310462), an angle mode selection unit 310463, and a temporary predicted image generation unit 310464.
  • the angle mode derivation device 310465 may include the angle mode selection unit 310463.
  • Figure 5 shows an example of the syntax of encoded data related to DIMD.
  • the prediction parameter derivation unit 320 decodes a flag dimd_flag indicating whether or not to use DIMD for each block from the encoded data. If dimd_flag for the target block is 1, the parameter decoding unit 302 does not need to decode syntax elements related to the intra prediction mode (intra_mip_flag, intra_luma_mpm_flag, intra_luma_mpm_idx, intra_luma_mpm_remainder) from the encoded data.
  • intra_mip_flag is a flag indicating whether or not to perform MIP prediction.
  • intra_luma_mpm_flag is a flag indicating whether or not to use the prediction candidate Most Probable Mode (MPM).
  • intra_luma_mpm_idx is an index that specifies MPM when MPM is used.
  • intra_luma_mpm_remainder is an index that selects the remaining candidate when MPM is not used. If dimd_flag is 0, intra_luma_mpm_flag is decoded, and if intra_luma_mpm_flag is 0, intra_luma_mpm_remainder is also decoded. If dimd_flag of the current block is 1, dimd_mode of the current block is also decoded. dimd_mode indicates a reference region used to derive an intra prediction mode in DIMD prediction. The meaning of dimd_mode may be as follows:
  • the DIMD prediction unit 31046 derives an angle indicating the texture direction in the adjacent region using pixel values. Then, a provisional predicted image is generated using an intra prediction mode corresponding to the angle. For example, (1) a gradient direction of pixel values is derived for a pixel at a predetermined position in the adjacent region. (2) The derived gradient direction is converted to a corresponding directional prediction mode (angular prediction mode).
  • a histogram of the obtained prediction direction is created for each predetermined pixel in the adjacent region.
  • a prediction mode of the most frequent value or a plurality of prediction modes including the most frequent value is selected from the histogram, and a provisional predicted image is generated using the prediction mode.
  • the reference sample derivation unit 310460 derives a reference sample refUnit from decoded pixels recSamples adjacent to the current block. Note that the operation of the reference sample derivation unit 310460 may be performed by the reference sample filter unit 3103.
  • FIG. 11 is a diagram showing an example of a reference region referred to by the DIMD prediction unit 31046.
  • the reference sample derivation unit 310460 stores adjacent images (images in the DIMD reference region) recSamples of the current block to be used by a gradient derivation unit 310461 and the predicted image generation unit 308, which will be described later, in a sample array refUnit.
  • the reference sample derivation unit 310460 derives a sample array refUnit from the left and top areas of the target block as follows.
  • refUnit[x][y] = recSamples[xC+x][yC+y]
  • (y = -1-refIdxH..refH-1)
  • refIdxW and refIdxH are constants indicating the width of the adjacent reference area to the left and the height of the adjacent reference area above.
  • refIdxW and refIdxH are, for example, 2 or 3.
  • extending refers to using adjacent images including the lower left adjacent area in addition to the left, and using adjacent images including the upper right adjacent area in addition to the top.
  • RTL is the area that combines RL and RT.
  • the reference sample derivation unit 310460 derives refUnit from the left area of the target block, for example, the above RL.
  • the reference sample derivation unit 310460 derives refUnit from the area above the target block, for example, the above RT.
  • the reference sample derivation unit 310460 may perform the following process.
  • the reference sample derivation unit 310460 derives refUnit from the left area of the target block, for example, the above RL.
  • the reference sample derivation unit 310460 derives refUnit from the area above the target block, for example, the above RT.
  • Figure 11(b) shows another example of the reference range in the gradient derivation of DIMD prediction.
  • for DIMD_MODE_TOP, an extended region including the adjacent region to the top and the upper right is used.
  • the reference sample derivation unit 310460 may perform the following processing:
  • the reference sample derivation unit 310460 derives a sample array refUnit from the left and top areas of the target block as follows. First, the following process is carried out in the pixel range RL in the left region of the target block.
  • refUnit[x][y] = recSamples[xC+x][yC+y]
  • (y = -1-refIdxH..refH-1), where refW = bW and refH = bH
  • Next, the following process is carried out in the pixel range RT in the region above the target block:
  • refUnit[x][y] = recSamples[xC+x][yC+y]
  • (y = -1-refIdxH..-1), where refW = bW and refH = bH
  • RTL is the area (range of positions) that combines RL and RT.
  • the reference sample derivation unit 310460 derives refUnit from the left and bottom left areas of the target block, for example, RL_EXT.
  • the reference sample derivation unit 310460 may perform the following process.
  • the reference sample derivation unit 310460 derives refUnit from the left and bottom-left areas of the target block, for example, RL_ADAP (y = -1-refIdxH..refH-1).
  • the reference sample derivation unit 310460 derives refUnit from the upper and upper-right areas of the target block, for example, RT_ADAP (y = -1-refIdxH..-1).
  • the reference sample derivation unit 310460 may replace the value of an area that cannot be referenced by refUnit[x][y] because it is outside the target picture, outside the target subpicture, or outside the target slice boundary, with a pixel value derived above or a predetermined fixed value, for example, 1 << (bitDepth-1).
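  • The derivation of refUnit, including the replacement of unavailable samples, can be sketched as follows (Python; recSamples is assumed to be a mapping from (x, y) to a sample value, in_picture is a hypothetical availability test, and the x ranges of the loops are assumptions since only the y ranges are given above):

    def derive_ref_unit(recSamples, xC, yC, bW, bH, refIdxW, refIdxH,
                        bitDepth, in_picture):
        refW, refH = bW, bH
        fill = 1 << (bitDepth - 1)   # fixed value for unavailable samples
        refUnit = {}
        # Left region RL: y = -1-refIdxH..refH-1 (x range assumed)
        for x in range(-1 - refIdxW, 0):
            for y in range(-1 - refIdxH, refH):
                pos = (xC + x, yC + y)
                refUnit[(x, y)] = recSamples[pos] if in_picture(*pos) else fill
        # Top region RT: y = -1-refIdxH..-1 (x range assumed)
        for x in range(0, refW):
            for y in range(-1 - refIdxH, 0):
                pos = (xC + x, yC + y)
                refUnit[(x, y)] = recSamples[pos] if in_picture(*pos) else fill
        return refUnit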
  • FIG. 12(a) shows another example of the reference range in gradient derivation for DIMD prediction.
  • the reference sample derivation unit 310460 sets the reference line numbers refIdxW and refIdxH in accordance with dimd_mode.
  • the reference sample derivation unit 310460 derives a sample array refUnit from the left and upper areas of the target block as follows.
  • the following process is performed in the pixel range RL of the area to the left of the target block.
  • refUnit[x][y] = recSamples[xC+x][yC+y]
  • (y = -1-refIdxH..refH-1), where refW = bW
  • Next, the following process is performed in the pixel range RT of the area above the target block:
  • refUnit[x][y] = recSamples[xC+x][yC+y]
  • (y = -1-refIdxH..-1), where refW = bW
  • the reference sample derivation unit 310460 sets the numbers of reference lines refIdxW and refIdxH in accordance with the block size.
  • refIdxW = (bW > 8) ? N-1 : M-1
  • refIdxH = (bH > 8) ? N-1 : M-1
  • the reference sample derivation unit 310460 derives refUnit[x][y] from recSamples[xC+x][yC+y] of RL in the left region of the target block and RT in the upper region.
  • Fig. 12(b) shows another example of the reference range in gradient derivation for DIMD prediction.
  • the direction of the reference region is selected according to dimd_mode, and at the same time, the number of lines of the reference region is also selected.
  • the reference sample derivation unit 310460 derives refUnit[x][y] from recSamples[xC+x][yC+y] of the left region RL and the top region RT.
  • the reference sample derivation unit 310460 derives refUnit[x][y] from the left area of the target block, for example, recSamples[xC+x][yC+y] of RL.
  • the reference sample derivation unit 310460 derives refUnit[x][y] from the area above the target block, for example, recSamples[xC+x][yC+y] of RT.
  • the gradient derivation unit 310461 derives an angle (angle information) indicating a texture direction based on pixel values of a gradient derivation target image.
  • the angle information may be a value representing an angle with 1/36 precision, or may be another value.
  • the gradient derivation unit 310461 derives gradients in two or more specific directions (e.g., Dx, Dy), and derives the gradient direction (angle information) from the relationship between the gradients Dx and Dy.
  • a spatial filter may be used to derive the gradient.
  • a 3x3 pixel Sobel filter corresponding to the horizontal and vertical directions as shown in Figures 13(a) and (b) may be used as the spatial filter.
  • the gradient derivation unit 310461 derives the gradient for point P[x][y] (hereinafter simply P) within the sample array refUnit[x][y] referenced and derived by the reference sample derivation unit 310460 in the gradient derivation target image. Note that it is also possible to configure the system to refer to recSamples[xC+x][yC+y] as point P instead of refUnit[x][y] without copying from recSamples to the sample array refUnit[x][y].
  • FIG. 14 shows an example of the positions of pixels to be subjected to gradient derivation in a target block of 8x8 pixels.
  • a shaded image in an adjacent region of the target block may be the image to be subjected to gradient derivation.
  • the image to be subjected to gradient derivation may also be a luminance image corresponding to the chrominance image of the target block.
  • the number of pixels to be subjected to gradient derivation, the position pattern, and the reference range of the spatial filter may be changed depending on information such as the size of the target block and the intra prediction mode of the blocks included in the adjacent region.
  • Dx and Dy are derived using the following equations:
  • Dx = -P[x-1][y-1] - 2*P[x-1][y] - P[x-1][y+1] + P[x+1][y-1] + 2*P[x+1][y] + P[x+1][y+1]
  • Dy = P[x-1][y-1] + 2*P[x][y-1] + P[x+1][y-1] - P[x-1][y+1] - 2*P[x][y+1] - P[x+1][y+1]
  • the method of deriving the gradient is not limited to this, and other methods (filters, formulas, tables, etc.) may be used.
  • a Prewitt filter or a Scharr filter may be used instead of the Sobel filter, and the filter size may be 2x2 or 5x5.
  • the gradient derivation unit 310461 derives Dx and Dy using a Prewitt filter as follows.
  • Dx = P[x-1][y-1] + P[x-1][y] + P[x-1][y+1] - P[x+1][y-1] - P[x+1][y] - P[x+1][y+1]
  • Dy = -P[x-1][y-1] - P[x][y-1] - P[x+1][y-1] + P[x-1][y+1] + P[x][y+1] + P[x+1][y+1]
  • the following equation is an example of deriving Dx and Dy using a Scharr filter.
  • Dx = 3*P[x-1][y-1] + 10*P[x-1][y] + 3*P[x-1][y+1] - 3*P[x+1][y-1] - 10*P[x+1][y] - 3*P[x+1][y+1]
  • Dy = -3*P[x-1][y-1] - 10*P[x][y-1] - 3*P[x+1][y-1] + 3*P[x-1][y+1] + 10*P[x][y+1] + 3*P[x+1][y+1]
  • the gradient derivation method may be changed for each block. For example, a Sobel filter is used for a target block of 4x4 pixels, and a Scharr filter is used for blocks larger than 4x4. In this way, by using a filter with simpler calculations for small blocks, the increase in the amount of calculations for small blocks can be suppressed.
  • the gradient derivation method may be changed for each position of the pixel for which the gradient is to be derived.
  • a Sobel filter is used for the pixel for which the gradient is to be derived that is in the upper or left adjacent region
  • a Scharr filter is used for the pixel for which the gradient is to be derived that is in the upper left adjacent region.
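  • The gradient derivation with these filters can be sketched as follows (Python; P is assumed to be a mapping from (x, y) to a sample value, the Sobel sign convention above is used for all kernels, and pick_kernel only illustrates the per-block-size switching described in the text):

    SOBEL   = (1, 2, 1)    # 3x3 Sobel weights
    PREWITT = (1, 1, 1)    # 3x3 Prewitt weights
    SCHARR  = (3, 10, 3)   # 3x3 Scharr weights

    def gradients(P, x, y, k=SOBEL):
        a, b, c = k
        # Horizontal gradient: right column minus left column
        Dx = (-a * P[(x-1, y-1)] - b * P[(x-1, y)] - c * P[(x-1, y+1)]
              + a * P[(x+1, y-1)] + b * P[(x+1, y)] + c * P[(x+1, y+1)])
        # Vertical gradient: top row minus bottom row
        Dy = ( a * P[(x-1, y-1)] + b * P[(x, y-1)] + c * P[(x+1, y-1)]
              - a * P[(x-1, y+1)] - b * P[(x, y+1)] - c * P[(x+1, y+1)])
        return Dx, Dy

    def pick_kernel(bW, bH):
        # Per the example above: a simpler filter for small (4x4) blocks
        return SOBEL if (bW, bH) == (4, 4) else SCHARR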
  • the gradient derivation unit 310461 derives angle information consisting of the quadrant (hereinafter referred to as region) of the texture angle of the target block and the angle within the quadrant based on the signs and magnitude relationship of Dx and Dy. Being able to express it by region makes it possible to standardize the processing of directions that are rotationally symmetric or line symmetric.
  • the angle information is not limited to the region and the angle within the quadrant.
  • the angle information may be information only about the angle, and the region may be derived as necessary.
  • the intra direction prediction modes derived below are limited to directions from the bottom left to the top right (2 to 66 in Figure 3), and intra direction prediction modes for directions that are rotationally symmetric by 180 degrees are treated the same.
  • Fig. 15(a) is a table showing the relationship between the signs (signx, signy) of Dx and Dy, the magnitude relationship (xgty), and the region (each of Ra to Rd is a constant representing the region).
  • Fig. 15(b) shows the quadrants indicated by the regions Ra to Rd.
  • the area indicates a rough angle, and can be derived only from the signs signx, signy of Dx, Dy and the magnitude relationship xgty.
  • the gradient derivation unit 310461 derives a region from the signs signx, signy and the magnitude relationship xgty using calculations and table references.
  • the gradient derivation unit 310461 may derive the corresponding region by referencing the table in FIG. 15(a).
  • the gradient derivation unit 310461 may derive the region using a logical formula as follows.
  • region = xgty ? ( (signx ^ signy) ? 1 : 0 ) : ( (signx ^ signy) ? 2 : 3 )
  • Here, ^ indicates XOR (exclusive OR).
  • the region is expressed as a value from 0 to 3.
  • {Ra, Rb, Rc, Rd} = {0, 1, 2, 3}. Note that the way in which the region value is assigned is not limited to the above.
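  • A direct transcription of this region derivation (assuming signx and signy are the sign bits of Dx and Dy, and xgty means abs(Dx) > abs(Dy)):

    def derive_region(Dx: int, Dy: int) -> int:
        signx = 1 if Dx < 0 else 0
        signy = 1 if Dy < 0 else 0
        xgty = abs(Dx) > abs(Dy)
        # region = xgty ? ((signx ^ signy) ? 1 : 0) : ((signx ^ signy) ? 2 : 3)
        if xgty:
            return 1 if (signx ^ signy) else 0
        return 2 if (signx ^ signy) else 3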
  • the angle mode derivation unit 310462 derives an angle mode (a prediction mode corresponding to the gradient, for example, an intra prediction mode) based on the gradient information of each point P described above.
  • FIG. 16 is a block diagram showing one configuration of the angle mode derivation unit 310462.
  • the angle mode mode_delta may be derived as follows using a first gradient, a second gradient, and two tables.
  • the angle mode derivation unit 310462 consists of an angle coefficient derivation unit 310466 and a mode conversion unit 310467.
  • An integer expressing ratio in increments of 1/R_UNIT is used as iRatio.
  • iRatio = Int(R_UNIT * absy / absx) (≈ ratio * R_UNIT)
  • the value norm_s1 is derived by shifting the first gradient (absx or absy) at a pixel by the logarithmic value x.
  • norm_s1 is used to reference the gradDivTable to derive the angle coefficient v.
  • idx is derived by multiplying v by a second gradient (s0 or s1) different from the first gradient, and shifting by the above logarithmic value x.
  • idx is used to reference a second table LUT (LUT') to derive the angle mode mode_delta.
  • idx = min(((s0 * v) << 3) >> x, N_LUT-1). Furthermore, it is also appropriate to clip s0*v to a value equal to or less than a predetermined value KK before the shift, so that the result does not exceed, for example, 32 bits:
  • (min(s0*v, KK) << 3) >> x
  • the mode conversion unit 310467 derives and outputs the second angle mode modeVal using mode_delta.
  • modeVal = base_mode[region] + direction[region] * mode_delta
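  • The flow from the gradients to modeVal can be sketched as follows (Python; the contents of gradDivTable, LUT, base_mode, and direction, and the exact derivation of the logarithmic value x, are placeholders and assumptions — only the structure follows the text):

    N_LUT = 16
    gradDivTable = [0] * 16        # hypothetical contents
    LUT = [0] * N_LUT              # hypothetical contents
    base_mode = [18, 50, 50, 18]   # hypothetical per-region base modes
    direction = [1, -1, 1, -1]     # hypothetical per-region directions

    def derive_modeVal(s0, s1, region, KK=1 << 20):
        # x: logarithmic value of the first gradient s1; taking
        # x = bit_length(s1) - 4 keeps norm_s1 within the 16-entry table
        # (this particular choice is an assumption)
        x = max(0, s1.bit_length() - 4)
        norm_s1 = s1 >> x                    # first gradient shifted by x
        v = gradDivTable[norm_s1]            # angle coefficient
        # idx = min(((s0 * v) << 3) >> x, N_LUT - 1), with s0*v clipped to KK
        idx = min((min(s0 * v, KK) << 3) >> x, N_LUT - 1)
        mode_delta = LUT[idx]
        return base_mode[region] + direction[region] * mode_delta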
  • the angle mode derivation unit 310462 derives a histogram (frequency HistMode) of the angle mode value modeVal obtained for each point P.
  • the histogram may be obtained by incrementing the value of HistMode by 1 at each point P (hereinafter referred to as counting with a histogram).
  • the angle mode selection unit 310463 derives one or more representative values dimdModeVal (dimdModeVal0, dimdModeVal1, ...) of the angle mode using the values modeVal at multiple points P included in the gradient derivation target image.
  • the representative value of the angle mode in this embodiment is an estimated value of the directionality of the texture pattern of the target block.
  • the representative value dimdModeVal is derived from the most frequent value derived using the derived histogram.
  • the first mode dimdModeVal0 and the second mode dimdModeVal1 are derived by selecting the most frequent mode and the second most frequent mode in the frequency, respectively.
  • HistMode[x] is scanned over x, and the value of x that gives the maximum value of HistMode is set to dimdModeVal0, and the value of x that gives the second largest value is set to dimdModeVal1.
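  • A sketch of the histogram counting and of selecting the most frequent and second most frequent modes (Python; 67 modes, following Figure 3):

    def select_dimd_modes(modeVals, num_modes=67):
        HistMode = [0] * num_modes
        for m in modeVals:           # one modeVal per analysed point P
            HistMode[m] += 1         # "counting with a histogram"
        # Most frequent mode, then the second most frequent mode
        dimdModeVal0 = max(range(num_modes), key=lambda m: HistMode[m])
        HistMode[dimdModeVal0] = -1  # exclude the winner before the second scan
        dimdModeVal1 = max(range(num_modes), key=lambda m: HistMode[m])
        return dimdModeVal0, dimdModeVal1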
  • the angle mode selection unit 310463 may set the average value of modeVal to dimdModeVal0 or dimdModeVal1.
  • the angle mode selection unit 310463 sets a predetermined mode (for example, intra prediction mode or transform mode) to dimdModeVal2 as the third mode.
  • Another mode may be set adaptively, or the third mode may not be used.
  • the angle mode selection unit 310463 may further derive weights corresponding to the representative values of each angle mode for intra prediction in the provisional predicted image generation unit 310464 described later.
  • the sum of the weights is set to 64.
  • the derivation of the weights is not limited to this, and the weights w0, w1, and w2 of the first, second, and third modes may be adaptively changed. For example, w2 may be increased or decreased according to the number of the first or second mode, or the frequency or ratio thereof.
  • the angle mode selection unit sets the corresponding weight value to 0 for any of the first to third modes when that mode is not used.
  • the DIMD prediction unit 31046 is equipped with the angle mode selection unit 310463, which selects an angle mode representative value from the multiple angle modes derived for pixels in the image for which the gradient is to be derived, thereby enabling derivation of an angle mode with higher accuracy.
  • the angle mode selection unit 310463 selects the angle mode (representative value of the angle mode) estimated from the gradient, and outputs it together with the weight corresponding to each angle mode.
  • the region of the reference image used to derive the intra prediction mode from the reference image is changed according to dimd_mode.
  • the positions of points P of the gradient derivation unit 310461, the angle mode derivation unit 310462, and the angle mode selection unit 310463 are changed according to dimd_mode.
  • (Configuration example 1 of reference area according to mode) FIG. 17(a) shows an example of a reference range in gradient derivation for DIMD prediction.
  • the angle mode derivation device 310465 derives Dx, Dy from each point P in the left region RDL of the target block, derives modeVal, and counts it in a histogram.
  • Dx, Dy are derived from each point P of the RDT in the range of pixels in the region above the target block, and modeVal is derived and counted in a histogram.
  • RDTL is the combined domain of RDL and RDT.
  • the gradient derivation unit 310461 and angle mode derivation unit 310462 (hereinafter referred to as the angle mode derivation device 310465) derive Dx and Dy from the left region of the target block, for example the above RDL, derive modeVal, and count it in a histogram.
  • the angle mode derivation device 310465 derives Dx and Dy from the area above the target block, for example, the RDT above, derives modeVal, and counts it in a histogram.
  • the angle mode derivation device 310465 may perform the following processing.
  • the angle mode derivation device 310465 derives Dx and Dy from the left area of the target block, for example, the above RDL, derives modeVal, and counts it in a histogram.
  • the angle mode derivation device 310465 derives Dx and Dy from the area above the target block, for example, the RDT mentioned above, derives modeVal, and counts it in a histogram.
  • Figure 17(b) shows another example of the reference range in the gradient derivation of DIMD prediction.
  • an extended region including the adjacent region on the top and the top right is used.
  • the left and top regions are used without extension.
  • the angle mode derivation device 310465 may perform the following processing:
  • the reference sample derivation unit 310460 derives Dx, Dy from the left and bottom left areas of the target block, derives modeVal, and counts it in a histogram.
  • Dx, Dy are derived at point P at position (x, y) in the left region RDL of the target block, and modeVal is derived and counted in a histogram.
  • Dx, Dy are derived at point P at position (x, y) in region RDT above the target block, and modeVal is derived and counted in a histogram.
  • RDTL is the combined domain of RDL and RDT.
  • the angle mode derivation device 310465 derives Dx, Dy from the left and bottom left areas of the target block, for example RDL_EXT, and derives modeVal and counts it in a histogram.
  • the angle mode derivation device 310465 derives Dx, Dy from the region above the target block, for example, the above RDT_EXT, and derives modeVal and counts it in a histogram.
  • the angle mode derivation device 310465 may perform the following processing.
  • the reference sample derivation unit 310460 derives Dx and Dy from the left and bottom left areas of the target block, for example, RDL_ADAP, and derives modeVal and counts it in a histogram.
  • the angle mode derivation device 310465 derives Dx and Dy from the upper and upper right regions of the target block, for example, RDT_ADAP, and derives modeVal and counts it in a histogram.
  • (x = 1..refW-2)
  • the above configuration makes it possible to switch between at least the top and left, and the left and top of the target block as the adjacent images depending on the DIMD mode. Therefore, even if the characteristics of the target block differ from the characteristics of the adjacent area to the left or above, the intra prediction mode can be derived on the decoder side with high accuracy and high efficiency.
  • the decoder derives the intra prediction mode using the gradient of pixel values of an image adjacent to the target area
  • the angle gradient of the adjacent image and the angle gradient of the target block do not necessarily match. Even in such cases, the effect of improving accuracy is achieved by switching the derivation of the intra prediction mode depending on the properties of the adjacent blocks and the target block.
  • when both the top and left are used (DIMD_MODE_TOP_LEFT), the top-right and bottom-left extension regions are not used, whereas when only the left (DIMD_MODE_LEFT) or only the top (DIMD_MODE_TOP) is used, the left and bottom-left extension regions, or the top and top-right extension regions, are used, respectively; this has the effect of reducing the amount of processing required for sampling reference pixels, deriving gradients, and deriving histograms.
  • Fig. 12(a) shows another example of the reference range in gradient derivation for DIMD prediction.
  • the angle mode derivation device 310465 sets the reference line numbers refIdxW and refIdxH in accordance with dimd_mode.
  • the angle mode derivation device 310465 derives Dx and Dy from the left region of the target block, for example, RDL, derives modeVal, and counts it in a histogram.
  • the angle mode derivation device 310465 derives Dx and Dy from the area above the target block, for example, RDT, derives modeVal, and counts it in a histogram.
  • the angle mode derivation device 310465 sets the reference line numbers refIdxW and refIdxH in accordance with the block size.
  • refIdxW = (bW > 8 && bH > 8) ? N-1 : M-1
  • refIdxH = (bW > 8 && bH > 8) ? N-1 : M-1
  • the angle mode derivation device 310465 derives Dx and Dy from the left region RDL and the top region RDT of the target block, derives modeVal, and counts it in a histogram.
  • (Configuration Example 6: Reference Range According to Mode) Fig. 12(b) shows another example of the reference range in the gradient derivation of the DIMD prediction. In this example, the direction of the reference region is selected according to dimd_mode, and at the same time the number of lines of the reference region is also selected.
  • refIdxW = M-1, refIdxH = M-1
  • the reference sample derivation unit 310460 sets the number of reference lines to M, derives Dx and Dy from the left region RDL and the top region RDT, derives modeVal, and counts it in a histogram.
  • the angle mode derivation device 310465 sets the number of reference lines to N, derives Dx and Dy from the area to the left of the target block, for example, RDL, derives modeVal, and counts it in a histogram.
  • the angle mode derivation device 310465 sets the number of reference lines to N, derives Dx and Dy from the area above the target block, for example, RDT, derives modeVal, and counts it in a histogram.
  • the number of lines to be referenced is switched between when both the top and left are used as the reference area, and when only the left or only the top is used. This further makes it possible to derive an intra prediction mode according to the difference in the continuity of the characteristics of the target block and the adjacent blocks, thereby improving prediction accuracy.
  • the number of reference lines when using both the top and left is set to M, and the number of reference lines when using only the left or only the top is set to N (where M < N); see the sketch below.
  • This configuration has the effect of reducing the amount of processing required for sampling reference pixels, deriving gradients, and deriving histograms when both the top and left are referred to.
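  • a minimal sketch of the line-count selection, with M and N as above:

    enum DimdMode { DIMD_MODE_TOP_LEFT, DIMD_MODE_LEFT, DIMD_MODE_TOP };

    // Hypothetical per-mode reference line count: M lines when both the
    // top and left are used, N lines (M < N) when only one side is used.
    int numRefLines(DimdMode m, int M, int N)
    {
        return (m == DIMD_MODE_TOP_LEFT) ? M : N;
    }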
  • the reference area to the left of the target block may be the left and bottom left reference areas, and the reference area above the target block may be the top and top right reference areas, or the top left of the target block may be referenced.
  • the prediction image generation unit (provisional prediction image generation unit) 310464 generates a prediction image (provisional prediction image) using one or more input angle mode representative values (intra prediction modes). When there is one intra prediction mode, an intra prediction image is generated in that intra prediction mode and output as a provisional prediction image q[x][y]. When there are multiple intra prediction modes, a prediction image (pred0, pred1, pred2) is generated in each intra prediction mode. Multiple prediction images are synthesized using the corresponding weights (w0, w1, w2) and output as a prediction image q[x][y].
  • the prediction image q[x][y] is derived as follows.
  • q[x][y] = (w0 * pred0[x][y] + w1 * pred1[x][y] + w2 * pred2[x][y]) >> 6
  • when the frequency of the second mode is 0, or the second mode is not a directional prediction mode (such as DC mode), the weighted synthesis with that mode may be omitted, for example by using the first provisional prediction image as is (a sketch follows).
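  • a minimal sketch of the weighted synthesis, assuming the weights sum to 64 to match the >> 6 normalization in the formula above; the single-mode fallback follows the preceding bullet:

    // Blend up to three provisional prediction images, or pass pred0
    // through when blending is disabled (second mode absent or
    // non-directional). numSamples is the number of block samples.
    void blendProvisionalPreds(const int* pred0, const int* pred1,
                               const int* pred2, int w0, int w1, int w2,
                               bool useBlend, int* q, int numSamples)
    {
        for (int i = 0; i < numSamples; i++) {
            q[i] = useBlend
                 ? (w0 * pred0[i] + w1 * pred1[i] + w2 * pred2[i]) >> 6
                 : pred0[i];
        }
    }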
  • the inverse quantization and inverse transform unit 311 inverse quantizes the quantized transform coefficients input from the prediction parameter derivation unit 320 to obtain transform coefficients.
  • the quantized transform coefficients are coefficients obtained by performing frequency transform such as DCT (Discrete Cosine Transform) or DST (Discrete Sine Transform) on the prediction error in the encoding process and quantizing the transform coefficients.
  • the inverse quantization and inverse transform unit 311 performs inverse frequency transform such as inverse DCT or inverse DST on the transform coefficients to calculate the prediction error.
  • the inverse quantization and inverse transform unit 311 outputs the prediction error to the adder unit 312.
  • FIG. 18 is a block diagram showing the configuration of the inverse quantization and inverse transform unit 311 of this embodiment.
  • the inverse quantization and inverse transform unit 311 is composed of a scaling unit 31111, an inverse non-separable transform unit 31121, and an inverse separable transform unit 31123. Note that the transform coefficients decoded from the encoded data may be transformed using the angle mode derived by the angle mode derivation device 310465.
  • the inverse quantization and inverse transform unit 311 obtains the transform coefficients d[][] by scaling (inverse quantization) the quantized transform coefficients qd[][] input from the prediction parameter derivation unit 320 using the scaling unit 31111.
  • the quantized transform coefficients qd[][] are coefficients obtained by performing a transform such as DCT (Discrete Cosine Transform) or DST (Discrete Sine Transform) on the prediction error in the encoding process and quantizing it, or coefficients obtained by further performing a non-separable transform on the transformed coefficients.
  • the inverse non-separable transform unit 31121 performs an inverse non-separable transform.
  • an inverse frequency transform such as inverse DCT or inverse DST is performed on the transform coefficients to calculate the prediction error.
  • when the non-separable transform is not used, the inverse non-separable transform unit 31121 does not perform processing, and an inverse frequency transform such as inverse DCT or inverse DST is applied to the transform coefficients scaled by the scaling unit 31111 to calculate the prediction error.
  • the inverse quantization and inverse transform unit 311 outputs the prediction error to the adder 312.
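  • a minimal sketch of the flow through the scaling unit 31111, the inverse non-separable transform unit 31121, and the inverse separable transform unit 31123 described above; the helper bodies are trivial stand-ins for real dequantization, non-separable, and DCT/DST kernels, and the function names are hypothetical:

    #include <vector>

    static void scaleCoeffs(const int* qd, int* d, int n, int qScale) {
        for (int i = 0; i < n; i++) d[i] = qd[i] * qScale; // 31111: dequant stub
    }
    static void invNonSepTransform(int* d, int n) { /* 31121: stub */ }
    static void invSepTransform(const int* d, int* resi, int n) {
        for (int i = 0; i < n; i++) resi[i] = d[i];        // 31123: stub
    }

    // Sketch of inverse quantization and inverse transform unit 311:
    // scaling, optional inverse non-separable transform, then the
    // inverse separable (frequency) transform.
    void invQuantInvTransform(const int* qd, int* resi, int n,
                              bool nonSepUsed, int qScale)
    {
        std::vector<int> d(n);
        scaleCoeffs(qd, d.data(), n, qScale);
        if (nonSepUsed)
            invNonSepTransform(d.data(), n);
        invSepTransform(d.data(), resi, n);
    }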
  • the adder 312 adds, for each pixel, the predicted image of the block input from the predicted image generation unit 308 and the prediction error input from the inverse quantization and inverse transform unit 311 to generate a decoded image of the block.
  • the adder 312 stores the decoded image of the block in the reference picture memory 306, and also outputs it to the loop filter 305.
  • FIG. 19 is a block diagram showing the configuration of the video encoding device 11 according to this embodiment.
  • the video encoding device 11 includes a prediction image generating unit 101, a subtraction unit 102, a transformation/quantization unit 103, an inverse quantization/inverse transformation unit 105, an addition unit 106, a loop filter 107, a prediction parameter memory (prediction parameter storage unit, frame memory) 108, a reference picture memory (reference image storage unit, frame memory) 109, an encoding parameter determining unit 110, a parameter encoding unit 111, an entropy encoding unit 104, and a prediction parameter derivation unit 120.
  • the predicted image generating unit 101 generates a predicted image for each CU, which is an area obtained by dividing each picture of the image T.
  • the predicted image generating unit 101 operates in the same way as the predicted image generating unit 308 already explained, and so a description thereof will be omitted.
  • the subtraction unit 102 subtracts the pixel values of the predicted image of the block input from the predicted image generation unit 101 from the pixel values of image T to generate a prediction error.
  • the subtraction unit 102 outputs the prediction error to the transformation and quantization unit 103.
  • the transform/quantization unit 103 calculates transform coefficients by frequency transforming the prediction error input from the subtraction unit 102, and derives quantized transform coefficients by quantizing them.
  • the transform/quantization unit 103 outputs the quantized transform coefficients to the entropy coding unit 104, the inverse quantization/inverse transform unit 105, and the coding parameter determination unit 110.
  • the inverse quantization and inverse transform unit 105 is the same as the inverse quantization and inverse transform unit 311 (FIG. 4) in the video decoding device 31, and a description thereof will be omitted.
  • the calculated prediction error is output to the addition unit 106.
  • the entropy coding unit 104 receives prediction parameters and quantized transform coefficients from the parameter coding unit 111.
  • the entropy coding unit 104 entropy codes the split information, prediction parameters, quantized transform coefficients, etc. to generate and output an encoded stream Te.
  • the parameter coding unit 111 instructs the entropy coding unit 104 to code the prediction parameters, quantization coefficients, etc. derived by the prediction parameter derivation unit 120.
  • the prediction parameter derivation unit 120 derives syntax elements from the parameters input from the encoding parameter determination unit 110.
  • the prediction parameter derivation unit 120 includes a configuration that is partially the same as the configuration of the prediction parameter derivation unit 320.
  • the adder 106 generates a decoded image by adding, for each pixel, the pixel values of the predicted image of the block input from the predicted image generation unit 101 and the prediction error input from the inverse quantization and inverse transform unit 105.
  • the adder 106 stores the generated decoded image in the reference picture memory 109.
  • the loop filter 107 applies a deblocking filter, SAO, and ALF to the decoded image generated by the adder 106.
  • the loop filter 107 does not necessarily have to include the above three types of filters, and may be configured, for example, as only a deblocking filter.
  • the prediction parameter memory 108 stores the prediction parameters input from the prediction parameter derivation unit 120 in a predetermined location for each target picture and CU.
  • the reference picture memory 109 stores the decoded image generated by the loop filter 107 in a predetermined location for each target picture and CU.
  • the coding parameter determination unit 110 selects one set from among multiple sets of coding parameters.
  • the coding parameters are the above-mentioned QT, BT or TT division information, prediction parameters, or parameters to be coded that are generated in relation to these.
  • the predicted image generation unit 101 generates a predicted image using these coding parameters.
  • the coding parameter determination unit 110 calculates an RD cost value for each of the multiple sets; the RD cost combines the amount of information (code amount) and the coding error, for example as the coding error plus a coefficient λ multiplied by the code amount.
  • the coding parameter determination unit 110 selects the set of coding parameters that minimizes the calculated cost value.
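  • a minimal sketch of this selection, assuming the usual rate-distortion criterion cost = distortion + λ × rate; the exact cost form used by the encoding parameter determination unit 110 is not restated here:

    #include <vector>

    struct CodingParamSet { double distortion; double rate; };

    // Hypothetical RD selection: return the index of the parameter set
    // with the minimum cost = distortion + lambda * rate.
    // sets is assumed non-empty.
    int selectBestParamSet(const std::vector<CodingParamSet>& sets,
                           double lambda)
    {
        int best = 0;
        double bestCost = sets[0].distortion + lambda * sets[0].rate;
        for (int i = 1; i < (int)sets.size(); i++) {
            double c = sets[i].distortion + lambda * sets[i].rate;
            if (c < bestCost) { bestCost = c; best = i; }
        }
        return best;
    }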
  • the entropy coding unit 104 outputs the selected set of coding parameters as the coding stream Te.
  • the coding parameter determination unit 110 stores the determined coding parameters in the prediction parameter memory 108.
  • a part of the video encoding device 11 and the video decoding device 31 in the above-described embodiments, for example, the entropy decoding unit 301, the parameter decoding unit 302, the loop filter 305, the predicted image generating unit 308, the inverse quantization and inverse transform unit 311, the addition unit 312, the predicted image generating unit 101, the subtraction unit 102, the transform and quantization unit 103, the entropy encoding unit 104, the inverse quantization and inverse transform unit 105, the loop filter 107, the encoding parameter determination unit 110, and the parameter encoding unit 111, may be realized by a computer.
  • a program for realizing this control function may be recorded on a computer-readable recording medium, and the program recorded on the recording medium may be read into and executed by a computer system.
  • the "computer system” referred to here is a computer system built into either the video encoding device 11 or the video decoding device 31, and includes hardware such as an OS and peripheral devices.
  • "computer-readable recording media" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, as well as storage devices such as hard disks built into computer systems.
  • “computer-readable recording media” may also include devices that dynamically store a program for a short period of time, such as a communication line when transmitting a program via a network such as the Internet or a communication line such as a telephone line, or devices that store a program for a certain period of time, such as volatile memory within a computer system that serves as a server or client in such cases.
  • the above-mentioned program may be one that realizes part of the functions described above, or may be one that can realize the functions described above in combination with a program already recorded in the computer system.
  • part or all of the video encoding device 11 and video decoding device 31 in the above-mentioned embodiments may be realized as an integrated circuit such as an LSI (Large Scale Integration).
  • Each functional block of the video encoding device 11 and video decoding device 31 may be individually made into a processor, or part or all of them may be integrated into a processor.
  • the integrated circuit method is not limited to LSI, and may be realized by a dedicated circuit or a general-purpose processor. Furthermore, if an integrated circuit technology that can replace LSI appears due to advances in semiconductor technology, an integrated circuit based on that technology may be used.
  • An image decoding device includes a reference sample derivation unit that selects a neighboring image of a current block in accordance with a DIMD mode, a gradient derivation unit that derives a pixel-by-pixel gradient using the selected neighboring image, and an angle mode selection unit that derives an intra-prediction mode from the gradient.
  • the image decoding device is characterized in that, in the above aspect 1, it includes an entropy decoding unit that decodes the DIMD flag of the target block from the encoded data and, when the DIMD flag is true, further decodes the DIMD mode, and a predicted image generating unit that generates a predicted image using the derived intra prediction mode.
  • the image decoding device is characterized in that, in any of aspects 1 and 2 above, the DIMD mode switches the adjacent image at least among both the top and left, only the left, and only the top.
  • the image decoding device is characterized in that, in any of aspects 1 to 3 above, the DIMD mode is composed of a first bit and a second bit, the first bit selecting whether both the top and left are used, and the second bit selecting between the left and the top as the adjacent image.
  • the image decoding device is characterized in that in any one of aspects 1 to 4, the entropy decoding unit decodes the DIMD mode using a context that holds a probability for decoding the first bit, and using an equal probability without using a context for decoding the second bit.
  • the image decoding device is characterized in that in any one of aspects 1 to 5, the entropy decoding unit decodes the DIMD mode using a context that holds probabilities for decoding the first bit and the second bit.
  • the image decoding device is any one of aspects 1 to 6 above, characterized in that the entropy decoding unit derives the context index using the width and height of the target block.
  • the image decoding device is, in any one of aspects 1 to 7 above, characterized in that the entropy decoding unit derives the context index of the second bit using a determination of whether or not the target block is square (a sketch follows this list of aspects).
  • the image decoding device is characterized in that in any one of aspects 1 to 8 above, the gradient derivation unit changes the number of lines to be referenced depending on dimd_mode.
  • the image decoding device is characterized in that in any one of aspects 1 to 9 above, the gradient derivation unit changes the number of lines to be referenced depending on the size of the target block.
  • An image decoding device is characterized in that in any one of aspects 1 to 10 above, the gradient derivation unit changes the number of lines to be referenced and the reference direction according to dimd_mode.
  • the image encoding device includes a reference sample derivation unit that selects an adjacent image of a target block according to a DIMD mode, a gradient derivation unit that uses the selected adjacent image to derive a pixel-by-pixel gradient, and an angle mode selection unit that derives an intra prediction mode from the gradient.
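  • a minimal sketch of the context index derivations named in aspects 7 and 8 above; the concrete threshold and index values are assumptions for illustration:

    // Hypothetical context index for the first DIMD-mode bit, derived
    // from the width and height of the target block (aspect 7).
    int dimdFirstBitCtx(int bW, int bH)
    {
        return (bW * bH > 256) ? 1 : 0;   // assumed size threshold
    }

    // Hypothetical context index for the second DIMD-mode bit, derived
    // from whether the target block is square (aspect 8).
    int dimdSecondBitCtx(int bW, int bH)
    {
        return (bW == bH) ? 0 : 1;
    }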
  • Embodiments of the present invention can be suitably applied to a video decoding device that decodes coded data in which image data has been coded, and a video coding device that generates coded data in which image data has been coded.
  • the present invention can also be suitably applied to the data structure of coded data that is generated by a video coding device and referenced by the video decoding device.
  • 31 Image decoding device
  • 301 Entropy decoding unit
  • 302 Parameter decoding unit
  • 308 Predicted image generation unit
  • 31046 DIMD prediction unit
  • 310460 Reference sample derivation unit
  • 310465 Angle mode derivation device
  • 310461 Gradient derivation unit
  • 310462 Angle mode derivation unit
  • 310463 Angle mode selection unit
  • 310464 Provisional predicted image generation unit
  • 311 Inverse quantization and inverse transform unit
  • 312 Addition unit
  • 11 Image encoding device
  • 101 Predicted image generation unit
  • 102 Subtraction unit
  • 103 Transform and quantization unit
  • 104 Entropy encoding unit
  • 105 Inverse quantization and inverse transform unit
  • 107 Loop filter
  • 110 Encoding parameter determination unit
  • 111 Parameter encoding unit

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present invention addresses the problem that, when an intra prediction mode is derived on the decoder side using the gradient of the pixel values of an image adjacent to a region of interest, the angular gradient of the adjacent image and the angular gradient of a block of interest do not necessarily match each other. This image decoding device comprises: a reference sample derivation unit that selects an image adjacent to a block of interest in accordance with a DIMD mode; a gradient derivation unit that derives a pixel-by-pixel gradient using the selected adjacent image; and an angle mode selection unit that derives an intra prediction mode from the gradient.
PCT/JP2023/036356 2022-10-11 2023-10-05 Dispositif de décodage d'image et dispositif de codage d'image WO2024080216A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2022163200A JP2024056375A (ja) 2022-10-11 2022-10-11 画像復号装置および画像符号化装置
JP2022-163200 2022-10-11

Publications (1)

Publication Number Publication Date
WO2024080216A1 true WO2024080216A1 (fr) 2024-04-18

Family

ID=90669197

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/036356 WO2024080216A1 (fr) 2022-10-11 2023-10-05 Dispositif de décodage d'image et dispositif de codage d'image

Country Status (2)

Country Link
JP (1) JP2024056375A (fr)
WO (1) WO2024080216A1 (fr)

Citations (5)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012191295A (ja) * 2011-03-09 2012-10-04 Canon Inc 画像符号化装置、画像符号化方法及びプログラム、画像復号装置、画像復号方法及びプログラム
US20190166370A1 (en) * 2016-05-06 2019-05-30 Vid Scale, Inc. Method and system for decoder-side intra mode derivation for block-based video coding
JP2019535211A (ja) * 2016-10-14 2019-12-05 インダストリー アカデミー コーオペレイション ファウンデーション オブ セジョン ユニバーシティ 画像の符号化/復号化方法及び装置
WO2018110462A1 (fr) * 2016-12-16 2018-06-21 シャープ株式会社 Dispositif de decodage d'image et dispositif de codage d'image
WO2019007492A1 (fr) * 2017-07-04 2019-01-10 Huawei Technologies Co., Ltd. Harmonisation de mémoire de ligne d'outil de dérivation intramode côté décodeur avec filtre de déblocage

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Z. FAN, Y. YASUGI, T. IKAI (SHARP): "Non-EE2: Adaptive reference region DIMD", 28. JVET MEETING; 20221021 - 20221028; MAINZ; (THE JOINT VIDEO EXPLORATION TEAM OF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ), 14 October 2022 (2022-10-14), XP030304493 *

Also Published As

Publication number Publication date
JP2024056375A (ja) 2024-04-23

Similar Documents

Publication Publication Date Title
KR102636668B1 (ko) 쿼드 트리를 이용한 블록 정보 부/복호화 방법 및 이러한 방법을 사용하는 장치
CN109716771B (zh) 用于视频译码的线性模型色度帧内预测
JP5587508B2 (ja) ビデオコード化のためのイントラ平滑化フィルタ
CN105144718B (zh) 当跳过变换时用于有损译码的帧内预测模式
KR101771332B1 (ko) 스케일러블 비디오 코딩을 위한 인트라 예측 개선들
KR101855269B1 (ko) 인트라 예측 방법과 이를 이용한 부호화 장치 및 복호화 장치
KR102013561B1 (ko) 비디오 코딩에서 예측 잔차 블록들의 재배치
KR20190055113A (ko) 비디오 코딩을 위한 가변 수의 인트라 모드들
KR20190006174A (ko) 필터링 정보의 시그널링
KR20160135226A (ko) 비디오 코딩에서 인트라 블록 카피를 위한 검색 영역 결정
KR20150138308A (ko) Shvc 를 위한 다수의 기초 계층 참조 화상
WO2024080216A1 (fr) Dispositif de décodage d'image et dispositif de codage d'image
KR20230170072A (ko) 교차-컴포넌트 샘플 적응적 오프셋에서의 코딩 강화
JP2024513160A (ja) クロスコンポーネントサンプル適応型オフセットにおける符号化向上
WO2023234200A1 (fr) Dispositif de décodage vidéo, dispositif de codage vidéo et dispositif de dérivation du mode d'angle
JP2023177425A (ja) 動画像復号装置および動画像符号化装置および角度モード導出装置
WO2024127909A1 (fr) Appareil de génération d'image de prédiction, appareil de décodage vidéo, appareil de codage vidéo et procédé de génération d'image de prédiction
WO2024116691A1 (fr) Dispositif de décodage vidéo et dispositif de codage vidéo
WO2023048165A1 (fr) Dispositif de décodage vidéo et dispositif de codage vidéo
JP2023183430A (ja) 動画像復号装置および動画像符号化装置および角度モード導出装置
JP2024047922A (ja) 画像復号装置および画像符号化装置
JP2024077148A (ja) 動画像復号装置および動画像符号化装置
JP2024092612A (ja) 動画像復号装置および動画像符号化装置
JP2024047921A (ja) 画像復号装置
KR20240042646A (ko) 교차-컴포넌트 샘플 적응적 오프셋에서의 코딩 강화

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23877221

Country of ref document: EP

Kind code of ref document: A1