WO2019194425A1 - Apparatus and method for applying an artificial neural network to image encoding or decoding

Info

Publication number
WO2019194425A1
Authority
WO
WIPO (PCT)
Prior art keywords
cnn
block
current block
prediction
picture
Prior art date
Application number
PCT/KR2019/002654
Other languages
English (en)
Korean (ko)
Inventor
나태영
이선영
신재섭
손세훈
김효성
임정연
Original Assignee
에스케이텔레콤 주식회사
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from KR1020180072499A (KR102648464B1)
Priority claimed from KR1020180072506A (KR20200000548A)
Priority claimed from KR1020180081123A (KR102668262B1)
Priority claimed from KR1020180099166A (KR20190117352A)
Application filed by 에스케이텔레콤 주식회사
Publication of WO2019194425A1
Priority to US17/064,304 (US11265540B2)
Priority to US17/576,000 (US20220141462A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/105 Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/117 Filters, e.g. for pre-processing or post-processing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124 Quantisation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/513 Processing of motion vectors
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/593 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H04N19/86 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving reduction of coding artifacts, e.g. of blockiness

Definitions

  • The present disclosure relates to image encoding or decoding and, more particularly, to an apparatus and method for applying an artificial neural network (ANN) to image encoding or decoding.
  • Since moving image data is much larger in volume than audio data or still image data, storing or transmitting it in its raw form consumes substantial hardware resources such as memory. Therefore, moving image data is generally compressed by an encoder before being stored or transmitted, and the compressed moving image data is decompressed by a decoder before being reproduced.
  • It has been found that replacing the in-loop filter of a conventional video encoding or decoding apparatus with a Convolutional Neural Network (CNN) filter, a kind of artificial neural network, can achieve a Bjøntegaard-delta bit rate (BD-rate) gain of about 3.57%. Accordingly, image encoding and decoding techniques using artificial neural network technology have attracted attention as a solution to the above problem.
  • Some techniques of this disclosure relate to mitigating quantization error and blocking degradation using CNN based filters.
  • In one aspect, the present disclosure provides a method of decoding an image using a CNN-based filter, the method comprising: inputting a first picture and at least one of a quantization parameter map and a block division map to the CNN-based filter; and outputting a second picture, wherein the quantization parameter map indicates information about coding units constituting the first picture, and the block division map indicates information about divided regions constituting the first picture.
  • In another aspect, the present disclosure provides an apparatus for decoding an image, including: an input unit which receives a first picture and at least one of a quantization parameter map and a block division map; a filter unit which applies coefficients of a CNN-based filter to the first picture and to the at least one of the quantization parameter map and the block division map input to the input unit; and an output unit which outputs a second picture generated by applying the coefficients of the CNN-based filter to the first picture and the at least one of the quantization parameter map and the block division map, wherein the quantization parameter map indicates information about coding units constituting the first picture, and the block division map indicates information about divided regions constituting the first picture.
  • According to these techniques, image quality degradation, quantization error, and blocking artifacts may be addressed using a filter trained through supervised learning.
  • Some techniques of this disclosure relate to performing CNN based intra prediction.
  • An image decoding method using a CNN-based intra prediction unit comprises: decoding transform coefficients for a current block to be decoded from a bitstream; determining input data including a reference region decoded before the current block; generating prediction pixels of the current block by applying predetermined filter coefficients of a CNN to the input data; inversely transforming the transform coefficients to generate residual signals for the current block; and reconstructing the current block by using the prediction pixels and the residual signals.
  • An image decoding apparatus using a CNN-based intra prediction unit includes: a decoder which decodes transform coefficients of a current block to be decoded from a bitstream; a CNN setting unit which determines input data including a reference region decoded before the current block; a CNN execution unit which generates prediction pixels of the current block by applying predetermined filter coefficients of the CNN to the input data; an inverse transform unit which inversely transforms the transform coefficients to generate residual signals for the current block; and an adder which reconstructs the current block by using the prediction pixels and the residual signals.
  • Some techniques of this disclosure relate to performing CNN based inter prediction.
  • The method includes: setting input data including a search region in at least one reference picture; generating a motion vector of a current block or prediction pixels of the current block by applying predetermined filter coefficients of a CNN to the input data; inversely transforming transform coefficients extracted from a bitstream to generate residual signals for the current block; and reconstructing the current block by using the residual signals and the motion vector of the current block or the prediction pixels of the current block.
  • a CNN setting unit configured to set input data including a search region in at least one reference picture;
  • a CNN execution unit generating a motion vector of a current block or prediction pixels of the current block by applying a predetermined filter coefficient of the CNN to the input data;
  • an inverse transform unit which inversely transforms transform coefficients extracted from the bitstream to restore residual signals, wherein the current block is reconstructed using the residual signals and the motion vector of the current block or the prediction pixels of the current block.
  • generating a motion vector of a current block or prediction pixels of the current block by using a syntax element for an inter prediction mode extracted from a bitstream; setting input data including a search region in at least one reference picture and the motion vector of the current block or the prediction pixels of the current block; generating a redefined motion vector of the current block or redefined prediction pixels of the current block by applying predetermined filter coefficients of the CNN to the input data; generating residual signals by inversely transforming transform coefficients extracted from the bitstream; and reconstructing the current block by using the residual signals and the redefined motion vector of the current block or the redefined prediction pixels of the current block.
  • a CNN setting unit for setting a motion vector of a current block generated using a syntax element for an inter prediction mode extracted from a bitstream or prediction pixels of the current block as input data;
  • a CNN execution unit which applies a predetermined filter coefficient of the CNN to the input data to generate a redefined motion vector of the current block or redefined prediction pixels of the current block;
  • an inverse transformer configured to inversely transform transform coefficients extracted from the bitstream to generate residual signals, wherein the current block is reconstructed by using the residual signals and the redefined motion vector of the current block or the redefined prediction pixels of the current block.
  • Some techniques of this disclosure relate to filtering a reference region used for intra prediction using a CNN based filter.
  • decoding transform coefficients for a current block to be decoded from a bitstream; setting input data of a CNN by using a first reference region decoded before the current block; generating a second reference region by filtering the first reference region through application of predetermined filter coefficients of the CNN to the input data; generating a prediction block of the current block by performing intra prediction using the second reference region; inversely transforming the transform coefficients to generate residual signals for the current block; and reconstructing the current block by using the prediction block and the residual signals.
  • a decoding unit for decoding the transform coefficients for the current block to be decoded from the bitstream;
  • An intra predictor configured to generate a predictive block of the current block by performing intra prediction using a second reference region filtered from a first reference region selected from regions decoded before the current block;
  • An inverse transform unit inversely transforming the transform coefficients to generate residual signals for the current block;
  • an adder for reconstructing the current block by using the prediction block and the residual signals, wherein the second reference region is generated by filtering the first reference region through application of preset filter coefficients of the CNN to input data set by using the first reference region.
  • FIG. 1 is an exemplary block diagram of an image encoding apparatus that may implement techniques of this disclosure.
  • FIG. 2 is a diagram for explaining a method of dividing a block using a QTBTTT structure.
  • FIG. 3 is an exemplary diagram of a plurality of intra prediction modes.
  • FIG. 4 is an exemplary block diagram of an image decoding apparatus that may implement techniques of this disclosure.
  • FIG. 5 is a diagram illustrating an exemplary structure of a CNN that may be used in the techniques of this disclosure.
  • FIG. 6 illustrates a CNN-based filter according to an embodiment of the present invention.
  • FIGS. 7A to 7C are diagrams illustrating structures of a CNN having different positions of a concatenated layer according to an embodiment of the present invention.
  • FIGS. 8A to 8C illustrate data to be input to an input layer of a CNN according to an embodiment of the present invention.
  • FIGS. 9A and 9B are diagrams illustrating an example of a block partitioning map according to an embodiment of the present invention.
  • FIGS. 10A to 10C are diagrams illustrating another example of a block partitioning map according to an embodiment of the present invention.
  • FIGS. 11A to 11C illustrate block division maps for adjusting the strength of deblocking according to an embodiment of the present invention.
  • FIG. 12 illustrates a flowchart of decoding an image using a CNN-based filter according to an embodiment of the present invention.
  • FIG. 13 is a diagram schematically illustrating a configuration of an apparatus for decoding an image according to an embodiment of the present invention.
  • FIG. 14 is a block diagram illustrating a configuration of a CNN prediction unit that may be included in an image encoding apparatus according to an embodiment of the present invention.
  • FIG. 15 is an exemplary diagram of a peripheral region that can be used as input data of a CNN.
  • FIG. 16 is a diagram illustrating an example of configuring an input layer of a CNN from a plurality of neighboring blocks.
  • FIG. 17 is an exemplary diagram for describing a prediction direction suitable for a current block in view of pixel value types of neighboring blocks.
  • FIG. 18 is an exemplary diagram of a layer configuration of a CNN including hint information.
  • FIG. 19 is a block diagram illustrating a configuration of a CNN prediction unit that may be included in an image decoding apparatus according to an embodiment of the present invention.
  • FIG. 20 is a flowchart illustrating an operation of a CNN prediction unit that may be included in the image encoding apparatus illustrated in FIG. 14.
  • FIG. 21 is a flowchart illustrating an operation of a CNN prediction unit that may be included in the image decoding apparatus illustrated in FIG. 19.
  • FIG. 22 is a block diagram illustrating a configuration of a CNN prediction unit that may be included in an image encoding apparatus according to an embodiment of the present invention.
  • FIG. 23 is an exemplary diagram of a layer configuration of a CNN.
  • FIG. 24 is an exemplary diagram of time-base distance information between a current picture and a reference picture.
  • FIG. 25 is an exemplary diagram of a layer configuration of a CNN including hint information.
  • FIG. 26 is a block diagram illustrating a configuration of a CNN prediction unit that may be included in an image decoding apparatus according to an embodiment of the present invention.
  • FIGS. 27A and 27B are flowcharts illustrating a process of performing inter prediction by a CNN prediction unit included in an image encoding apparatus according to an embodiment of the present invention.
  • FIGS. 28A and 28B are flowcharts illustrating a process of performing inter prediction by a CNN prediction unit included in an image decoding apparatus according to an embodiment of the present invention.
  • FIG. 29 is a flowchart illustrating a method of calculating filter coefficients of a CNN according to an embodiment of the present invention.
  • FIG. 30 is an exemplary diagram of a peripheral region that can be used as input data of a CNN according to an embodiment of the present invention.
  • FIG. 31 is an exemplary diagram of a layer configuration of a CNN according to an embodiment of the present invention.
  • FIG. 32 is a block diagram illustrating a configuration of a CNN-based filter unit according to an embodiment of the present invention.
  • FIG. 33 is a flowchart illustrating a filtering process of a reference region according to an embodiment of the present invention.
  • FIG. 34 is a flowchart illustrating a filtering process of a reference region according to an embodiment of the present invention.
  • FIG. 1 is an exemplary block diagram of an image encoding apparatus that may implement techniques of this disclosure.
  • an image encoding apparatus and subcomponents thereof will be described with reference to FIG. 1.
  • The image encoding apparatus may include a block splitter 110, a predictor 120, a subtractor 130, a transformer 140, a quantizer 145, an encoder 150, an inverse quantization unit 160, an inverse transform unit 165, an adder 170, a filter unit 180, and a memory 190.
  • Each component of the image encoding apparatus may be implemented in hardware or software, or a combination of hardware and software.
  • Alternatively, the functions of the components may be implemented in software, and a microprocessor may be implemented so as to execute the software function corresponding to each component.
  • One image is composed of a plurality of pictures. Each picture is divided into a plurality of regions, and encoding is performed for each region. For example, one picture is divided into one or more tiles. Here, one or more tiles may be defined as a tile group. Each tile is divided into one or more coding tree units (CTUs). Each CTU is divided into one or more coding units (CUs) by a tree structure.
  • CTUs coding tree units
  • Information applied to each CU is encoded by the syntax of the CU, and information commonly applied to CUs included in one CTU is encoded by the syntax of the CTU.
  • information commonly applied to all blocks in one tile is encoded by the syntax of the tile or the syntax of the tile group to which the tile belongs.
  • Information applied to all blocks constituting one picture is encoded in a picture parameter set (PPS) or a picture header.
  • Information commonly applied to all pictures constituting one sequence is encoded in a sequence parameter set (SPS), and information commonly referred to by one or more SPSs is encoded in a video parameter set (VPS).
  • the block divider 110 determines the size of a coding tree unit (CTU).
  • Information on the size of the CTU (CTU size) is encoded in the syntax of the SPS or PPS and transmitted to the image decoding apparatus.
  • After dividing each picture constituting an image into a plurality of coding tree units (CTUs) of a predetermined size, the block divider 110 recursively splits the CTUs using a tree structure. A leaf node in the tree structure becomes a coding unit (CU), which is the basic unit of coding.
  • The tree structure may be a quad tree (QT), in which a parent node is divided into four child nodes of the same size; a binary tree (BT), in which a parent node is divided into two child nodes; a ternary tree (TT), in which a parent node is divided into three child nodes at a 1:2:1 size ratio; or a structure in which two or more of the QT structure, the BT structure, and the TT structure are mixed.
  • For example, a Quad Tree plus Binary Tree (QTBT) structure may be used, or a Quad Tree plus Binary Tree Ternary Tree (QTBTTT) structure may be used.
  • FIG. 2 is a diagram for explaining a method of dividing a block using a QTBTTT structure.
  • the CTU may first be divided into a QT structure.
  • Quad tree splitting may be repeated until the size of the split block reaches the minimum block size (MinQTSize) allowed for a leaf node of the QT. If the leaf node of the QT is not larger than the maximum block size (MaxBTSize) allowed for the root node of the BT, it may be further split into one or more of the BT structure and the TT structure. In the BT structure and/or the TT structure, there may be a plurality of splitting directions.
  • For example, the block of a node may be split horizontally (horizontal splitting) or vertically (vertical splitting). A flag indicating whether a node is split, a flag indicating the splitting direction (vertical or horizontal), and/or a flag indicating the split type (binary or ternary) may be encoded and signaled to the image decoding apparatus.
  • the asymmetric form may include a form of dividing a block of a node into two rectangular blocks having a size ratio of 1: 3, a form of dividing a block of a node in a diagonal direction, and the like.
  • the CTU is first divided into the QT structure, and then the leaf nodes of the QT may be further divided into the BT structure.
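  • As an illustration of the recursive splitting just described, the following sketch (a simplified assumption, not the patent's actual partitioning logic; the MinQTSize and MaxBTSize values and the split-decision callback are hypothetical) splits a CTU by quad tree first and lets the resulting nodes continue as binary or ternary splits.

```python
# Simplified sketch of QT-then-BT/TT recursive splitting (assumed sizes and a
# caller-supplied split decision; a real encoder chooses splits by RD search).
MIN_QT_SIZE = 8    # assumed MinQTSize
MAX_BT_SIZE = 64   # assumed MaxBTSize

def split_block(x, y, w, h, decide, leaves):
    """decide(x, y, w, h) returns 'qt', 'bt_hor', 'bt_ver', 'tt_hor', 'tt_ver' or None."""
    mode = decide(x, y, w, h)
    if mode == 'qt' and w > MIN_QT_SIZE and h > MIN_QT_SIZE:
        for dx, dy in ((0, 0), (w // 2, 0), (0, h // 2), (w // 2, h // 2)):
            split_block(x + dx, y + dy, w // 2, h // 2, decide, leaves)        # four equal child nodes
    elif mode == 'bt_hor' and h <= MAX_BT_SIZE:
        for dy, sh in ((0, h // 2), (h // 2, h // 2)):                         # two child nodes (horizontal)
            split_block(x, y + dy, w, sh, decide, leaves)
    elif mode == 'bt_ver' and w <= MAX_BT_SIZE:
        for dx, sw in ((0, w // 2), (w // 2, w // 2)):                         # two child nodes (vertical)
            split_block(x + dx, y, sw, h, decide, leaves)
    elif mode == 'tt_hor' and h <= MAX_BT_SIZE:
        for dy, sh in ((0, h // 4), (h // 4, h // 2), (3 * h // 4, h // 4)):   # 1:2:1 horizontal split
            split_block(x, y + dy, w, sh, decide, leaves)
    elif mode == 'tt_ver' and w <= MAX_BT_SIZE:
        for dx, sw in ((0, w // 4), (w // 4, w // 2), (3 * w // 4, w // 4)):   # 1:2:1 vertical split
            split_block(x + dx, y, sw, h, decide, leaves)
    else:
        leaves.append((x, y, w, h))   # leaf node: a CU, i.e. a 'current block'
    return leaves

# Split a 128x128 CTU once by quad tree; each 64x64 child becomes a CU.
cus = split_block(0, 0, 128, 128, lambda x, y, w, h: 'qt' if w == 128 else None, [])
```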
  • the CU may have various sizes depending on the QTBT or QTBTTT splitting from the CTU.
  • Hereinafter, a block corresponding to a CU (that is, a leaf node of the QTBTTT) to be encoded or decoded is referred to as a 'current block'.
  • the prediction unit 120 predicts the current block and generates a prediction block.
  • the predictor 120 may include an intra predictor 122 and an inter predictor 124.
  • current blocks within a picture may each be predictively coded. Prediction of the current block may be performed using an intra prediction technique using data of the picture including the current block or an inter prediction technique using data of a coded picture before the picture containing the current block.
  • the intra predictor 122 predicts pixels in the current block by using pixels (reference pixels) positioned around the current block in the current picture including the current block.
  • The plurality of intra prediction modes may include non-directional modes (a planar mode and a DC mode) and 65 directional modes.
  • the surrounding pixels to be used and the expressions are defined differently for each prediction mode.
  • the intra predictor 122 may predict pixels in the current block using reference pixels through a CNN-based learning and inference process.
  • The intra prediction unit 122 may operate the CNN-based intra prediction mode (hereinafter, also referred to as the "CNN mode") in parallel with the plurality of intra prediction modes illustrated in FIG. 3.
  • the intra prediction unit 122 may operate only the CNN mode.
  • the intra predictor 122 may determine an intra prediction mode to use to encode the current block.
  • The intra prediction unit 122 may encode the current block using several intra prediction modes and select an appropriate intra prediction mode to use from the tested modes. For example, the intra predictor 122 may calculate rate-distortion values through rate-distortion analysis of the several tested intra prediction modes and select the intra prediction mode having the best rate-distortion characteristics among the tested modes.
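  • The following sketch illustrates this kind of rate-distortion selection; the helper callbacks (predict, estimate_bits) and the Lagrange multiplier value are assumptions, since the patent does not specify them.

```python
# Sketch of rate-distortion based intra mode selection; predict() and
# estimate_bits() are assumed callbacks, lam is an arbitrary Lagrange multiplier.
import numpy as np

def select_intra_mode(current_block, reference_pixels, candidate_modes,
                      predict, estimate_bits, lam=10.0):
    best_mode, best_cost = None, float('inf')
    for mode in candidate_modes:
        pred = predict(mode, reference_pixels)                     # prediction block for this mode
        distortion = float(np.sum((current_block - pred) ** 2))   # SSE between current block and prediction
        rate = estimate_bits(mode, current_block - pred)           # estimated bits for the mode and residual
        cost = distortion + lam * rate                              # rate-distortion cost J = D + lambda * R
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode
```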
  • the intra predictor 122 selects one intra prediction mode from among the plurality of intra prediction modes, and predicts the current block by using a neighboring pixel (reference pixel) and an operation formula determined according to the selected intra prediction mode. As described later, in the CNN mode, the intra predictor 122 predicts the current block by using input data and coefficient values of the convolution kernel.
  • In order to efficiently encode intra prediction mode information indicating which of the plurality of intra prediction modes is used as the intra prediction mode of the current block, the intra prediction unit 122 may determine some modes that are most likely to be used as the intra prediction mode of the current block as most probable modes (MPMs).
  • the MPM list may include intra prediction modes, planar mode, and DC mode of neighboring blocks of the current block.
  • the MPM list may further include a CNN mode.
  • When the intra prediction mode of the current block is one of the MPMs, first intra identification information indicating which of the MPMs is selected as the intra prediction mode of the current block is encoded by the encoder 150 and signaled to the image decoding apparatus. Otherwise, second intra identification information indicating which mode other than the MPMs is selected as the intra prediction mode of the current block is encoded by the encoder 150 and signaled to the image decoding apparatus.
  • The inter prediction unit 124 searches for the block most similar to the current block in a reference picture encoded and decoded before the current picture through a motion estimation process, and generates a prediction block for the current block using the found block through a motion compensation process.
  • Inter prediction may be generally classified into uni-directional prediction and bi-directional prediction according to a prediction direction.
  • Unidirectional prediction refers to a method of predicting the current block using only pictures displayed before the current picture on the time axis, or only pictures displayed after it.
  • Bidirectional prediction refers to a method of predicting a current block by referring to both a picture displayed before and a picture displayed after the current picture on the time axis.
  • the inter prediction unit 124 generates a motion vector corresponding to a displacement between the current block in the current picture and the prediction block in the reference picture.
  • motion estimation is performed on a luma component, and a motion vector calculated based on the luma component is used for both the luma component and the chroma component.
  • the motion information including the information about the reference picture and the motion vector used to predict the current block is encoded by the encoder 150 and transmitted to the image decoding apparatus.
  • Various methods may be used to reduce or minimize the amount of bits required to encode motion information.
  • Representative examples of these various methods include Skip mode, Merge mode, and Adaptive (Advanced) motion vector predictor (AMVP) mode.
  • the inter prediction unit 124 constructs a merge list including candidate blocks, and selects motion information to be used as motion information of the current block among motion information of candidate blocks included in the list.
  • a merge index value for identifying the selected motion information (selected candidate block) is generated.
  • the index value of the selected motion information is encoded and signaled to the image decoding apparatus.
  • index values for Skip / Merge mode are expressed through the merge_idx syntax.
  • motion vector predictor (MVP) candidates for the motion vector of the current block are derived using neighboring blocks of the current block.
  • The inter prediction unit 124 determines a prediction motion vector (mvp) for the motion vector of the current block, and subtracts the determined prediction motion vector from the motion vector of the current block to calculate a differential motion vector (motion vector difference, mvd).
  • the calculated differential motion vector is encoded and signaled to the image decoding apparatus.
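  • A minimal sketch of this relationship (the candidate motion vectors and the median predictor below are illustrative assumptions): the encoder signals mvd = mv - mvp, and the decoder reconstructs mv = mvp + mvd.

```python
# Sketch of the AMVP differential motion vector: only mvd is signaled.
def encode_mvd(mv, mvp_candidates, select_mvp):
    mvp = select_mvp(mvp_candidates)              # e.g. median/average of candidates, or a signaled index
    return (mv[0] - mvp[0], mv[1] - mvp[1])       # differential motion vector to be encoded

def decode_mv(mvd, mvp_candidates, select_mvp):
    mvp = select_mvp(mvp_candidates)              # decoder derives the same predictor
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])     # reconstructed motion vector

# Example with a component-wise median predictor over three neighboring-block motion vectors.
median = lambda cands: tuple(sorted(c[i] for c in cands)[len(cands) // 2] for i in (0, 1))
mvd = encode_mvd((5, -3), [(4, -2), (6, -1), (3, -4)], median)
print(mvd, decode_mv(mvd, [(4, -2), (6, -1), (3, -4)], median))   # (1, -1) and (5, -3)
```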
  • The process of determining the prediction motion vector from the prediction motion vector candidates may be implemented through a predefined function (e.g., a median operation, an average value operation, etc.). In this case, the image decoding apparatus is also set to apply the same predefined function.
  • Since the neighboring blocks used to derive the prediction motion vector candidates correspond to blocks that have already been encoded and decoded, the image decoding apparatus already knows the motion vectors of those neighboring blocks. Therefore, information for identifying the prediction motion vector candidates does not need to be encoded, and the image encoding apparatus encodes only the information about the differential motion vector and the information about the reference picture used to predict the current block.
  • the process of determining the prediction motion vector from the prediction motion vector candidates may be implemented by selecting any one of the prediction motion vector candidates.
  • the information for identifying the determined prediction motion vector is additionally coded together with the information about the differential motion vector and the reference picture used for predicting the current block.
  • the inter prediction unit 124 may predict the current block through a CNN-based inference process.
  • the filter coefficients of the CNN that is, the coefficients of the convolution kernel, may be set through the supervised learning process of the CNN.
  • the inter prediction unit 124 primarily generates motion information or prediction pixels of the current block according to an existing inter prediction method (ie, motion estimation (ME) and motion compensation (MC)). Then, the generated motion information or prediction pixels may be refined through CNN-based learning and inference processes to predict the current block.
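  • The following sketch illustrates this two-stage idea under assumed shapes and an assumed network layout (it is not the patent's actual design): a conventional motion search produces an initial motion vector and prediction, and a small CNN then refines the prediction pixels.

```python
# Sketch of two-stage inter prediction: conventional ME/MC produces an initial
# prediction that a small CNN refines. Shapes, the motion_search callback, and
# the network architecture are assumptions.
import torch
import torch.nn as nn

class RefineNet(nn.Module):
    def __init__(self, channels=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels * 2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1),
        )

    def forward(self, initial_pred, reference_patch):
        x = torch.cat([initial_pred, reference_patch], dim=1)  # stack initial prediction and reference data
        return initial_pred + self.body(x)                     # refined prediction pixels

def inter_predict(current, reference, motion_search, refine_net):
    mv, initial_pred, reference_patch = motion_search(current, reference)  # stage 1: conventional ME/MC
    refined_pred = refine_net(initial_pred, reference_patch)               # stage 2: CNN-based refinement
    return mv, refined_pred
```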
  • The inter prediction unit 124 may operate the CNN-based inter prediction method (hereinafter referred to as the "CNN prediction mode") in parallel with the existing inter prediction method.
  • the inter prediction unit 124 may independently operate only the CNN prediction mode by replacing the existing inter prediction method.
  • The subtractor 130 generates a residual block by subtracting the prediction block generated by the intra predictor 122 or the inter predictor 124 from the current block, and the transform unit 140 transforms the residual signals in the residual block, which have pixel values in the spatial domain, into transform coefficients in the frequency domain.
  • The transform unit 140 may transform the residual signals in the residual block using the size of the current block as a transform unit, or may divide the residual block into a plurality of smaller subblocks and transform the residual signals using the subblock size as a transform unit.
  • The residual block may be divided into subblocks of the same size, or may be divided by a quad tree (QT) method using the residual block as a root node.
  • the quantization unit 145 quantizes the transform coefficients output from the transform unit 140, and outputs the quantized transform coefficients to the encoder 150.
  • the encoder 150 generates a bitstream by encoding the quantized transform coefficients by using an encoding method such as CABAC. Also, the encoder 150 encodes and signals information such as a CTU size, a QT split flag, a BTTT split flag, a split direction, and a split type associated with block division, so that the image decoding apparatus splits the block in the same manner as the image encoder. To be able.
  • The encoder 150 encodes information on a prediction type indicating whether the current block is encoded by intra prediction or inter prediction, and, according to the prediction type, encodes intra prediction information (that is, information about the intra prediction mode) or inter prediction information (information about the reference picture and the motion vector).
  • the encoder 150 encodes information (flag) indicating whether to use the CNN based inter prediction scheme.
  • the encoder 150 encodes the information about the reference picture and the motion vector as the inter prediction information.
  • the encoder 150 encodes information necessary for performing CNN based inter prediction as inter prediction information.
  • the encoder 150 encodes information necessary for performing CNN based inter prediction.
  • Information necessary for performing CNN-based inter prediction may include selection information on input data or filter coefficients of the CNN, which will be described later in detail with reference to FIG. 22.
  • the inverse quantizer 160 inversely quantizes the quantized transform coefficients output from the quantizer 145 to generate transform coefficients.
  • the inverse transformer 165 restores the residual block by converting the transform coefficients output from the inverse quantizer 160 from the frequency domain to the spatial domain.
  • the adder 170 reconstructs the current block by adding the reconstructed residual block and the prediction block generated by the predictor 120.
  • the pixels in the reconstructed current block are used as reference pixels for intra prediction of the next order block.
  • The filter unit 180 performs filtering to reduce blocking artifacts, ringing artifacts, blurring artifacts, and the like that occur due to block-based prediction and transform/quantization. The filter unit 180 may include a deblocking filter 182 and an SAO filter 184.
  • The deblocking filter 182 filters the boundaries between reconstructed blocks to remove blocking artifacts caused by block-by-block encoding/decoding.
  • The SAO filter 184 performs additional filtering on the deblocking-filtered image. The SAO filter 184 corresponds to a filter used to compensate for the difference between reconstructed pixels and original pixels caused by lossy coding.
  • the reconstructed blocks filtered through the deblocking filter 182 and the SAO filter 184 are stored in the memory 190.
  • the reconstructed picture is used as a reference picture for inter prediction of a block in a picture to be encoded later.
  • FIG. 4 is an exemplary block diagram of an image decoding apparatus that may implement techniques of this disclosure.
  • an image decoding apparatus and subcomponents thereof will be described with reference to FIG. 4.
  • The image decoding apparatus includes a decoder 410, an inverse quantizer 420, an inverse transformer 430, a predictor 440, an adder 450, a filter 460, and a memory 470.
  • Each component may be implemented as a hardware chip, or the functions of each component may be implemented in software with a microprocessor configured to execute the function of each software component.
  • The decoder 410 decodes the bitstream received from the image encoding apparatus, extracts information related to block division (partition information of the luma block and/or partition information of the chroma block), determines the current block to be decoded using that information, and extracts the prediction information and the information about the residual signals necessary to reconstruct the current block.
  • the decoder 410 extracts information on the CTU size from a Sequence Parameter Set (SPS) or Picture Parameter Set (PPS) to determine the size of the CTU, and divides the picture into a CTU of the determined size. In addition, the decoder 410 determines the CTU as the highest layer of the tree structure, that is, the root node, extracts partition information from the bitstream, and divides or reconstructs the block using the extracted information.
  • For example, the decoder 410 extracts, for a node corresponding to a leaf node of QT splitting, information about whether the block is split by BT and the split type (splitting direction), and splits the corresponding leaf node into the BT structure.
  • The decoder 410 extracts information (a flag) on whether to perform QT splitting and splits each node into four nodes of the lower layer. For nodes corresponding to leaf nodes of the QT splitting (nodes where QT splitting no longer occurs), it extracts information about whether the node is further split by BT or TT, information on the splitting direction, and split type information distinguishing the BT structure from the TT structure, and recursively splits the node into the BT or TT structure.
  • When splitting or reconstructing a block using the QTBTTT structure, the decoder 410 extracts information (e.g., a flag) on whether to split, and, when the corresponding block is split, extracts the split type information. If the split type is QT, the decoder 410 splits each node into four nodes of the lower layer. If the split type indicates that the node, which corresponds to a leaf node of the QT splitting (a node where QT splitting no longer occurs), is split into BT or TT, the decoder 410 additionally extracts information on the splitting direction and split type information identifying whether the structure is BT or TT, and splits the node into the BT or TT structure.
  • the decoder 410 extracts information about a prediction type indicating whether the current block is intra predicted or inter predicted.
  • the decoder 410 extracts a syntax element for intra prediction information (intra prediction mode) of the current block.
  • the decoder 410 extracts a syntax element of the inter prediction information, that is, a motion vector and information indicating the reference picture to which the motion vector refers (motion information of the current block).
  • the decoder 410 extracts information about quantized transform coefficients of the current block as information on the residual signal.
  • The inverse quantizer 420 inversely quantizes the quantized transform coefficients, and the inverse transformer 430 inversely transforms the inverse-quantized transform coefficients from the frequency domain to the spatial domain to generate a residual block for the current block.
  • the predictor 440 may include an intra predictor 442 and an inter predictor 444.
  • The intra predictor 442 is activated when the prediction type of the current block is intra prediction, and the inter predictor 444 is activated when the prediction type of the current block is inter prediction.
  • The intra prediction unit 442 determines the intra prediction mode of the current block among the plurality of intra prediction modes by using the syntax element for the intra prediction mode extracted by the decoder 410, and predicts the current block using reference pixels around the current block according to the determined intra prediction mode.
  • In the CNN mode, the intra prediction unit 442 may predict the current block by performing CNN inference using the coefficients of the convolution kernel (i.e., the filter coefficients) determined by the image encoding apparatus.
  • the inter prediction unit 444 determines a motion vector of the current block and a reference picture to which the motion vector refers by using syntax elements of the inter prediction mode extracted from the decoder 410, and then uses the motion vector and the reference picture. Predict the current block.
  • the inter prediction unit 444 may generate a motion vector of the current block or directly generate prediction pixels of the current block through a CNN-based inference process.
  • Alternatively, the inter prediction unit 444 may first generate a motion vector or prediction pixels of the current block according to the existing inter prediction scheme (i.e., motion compensation (MC)), and then refine the motion vector or the prediction pixels through a CNN-based inference process to finally generate the motion vector or the prediction pixels of the current block.
  • the inter prediction unit 444 may operate in parallel with the CNN-based inter prediction method together with the existing inter prediction method.
  • the decoder 410 further decodes information (eg, a flag) indicating whether the prediction type information indicates CNN based inter prediction.
  • The inter prediction unit 444 selectively performs the existing inter prediction method or CNN-based inter prediction according to the information, decoded by the decoder 410, indicating whether CNN-based inter prediction is used.
  • the inter prediction unit 444 may independently operate only the CNN based inter prediction scheme. In this case, the inter prediction unit 444 performs CNN based inter prediction when the prediction type information (information indicating whether intra prediction or inter prediction) decoded by the decoder 410 indicates inter prediction.
  • the adder 450 reconstructs the current block by adding the residual block output from the inverse transformer 430 and the prediction block output from the inter predictor 444 or the intra predictor 442.
  • the pixels in the reconstructed current block are used as reference pixels for intra prediction of a block to be decoded later.
  • the filter unit 460 includes a deblocking filter 462 and a SAO filter 464.
  • the deblocking filter 462 removes blocking artifacts caused by block-by-block decoding by deblocking filtering the boundary between the reconstructed blocks.
  • the SAO filter 464 performs additional filtering on the reconstructed block after the deblocking filtering to compensate for the difference between the reconstructed pixel and the original pixel resulting from lossy coding.
  • The reconstructed block filtered through the deblocking filter 462 and the SAO filter 464 is stored in the memory 470. When all the blocks in one picture are reconstructed, the reconstructed picture is used as a reference picture for inter prediction of blocks in a picture to be decoded later.
  • the techniques of this disclosure generally relate to applying artificial neural network techniques to image encoding or decoding.
  • Some techniques of this disclosure relate to a CNN-based filter capable of performing the functions of a deblocking filter and a SAO filter in an image encoding apparatus and a decoding apparatus. Some other techniques of this disclosure relate to performing CNN based intra prediction. Some other techniques of this disclosure relate to performing CNN based inter prediction. Some other techniques of this disclosure relate to performing CNN based filtering on a reference region used for intra prediction of the current block.
  • FIG. 5 is a diagram illustrating an exemplary structure of a CNN that may be used in the techniques of this disclosure.
  • a convolutional neural network is a multilayer neural network having a special connection structure designed for image processing, and may include an input layer 510, a hidden layer 530, and an output layer 550.
  • The hidden layer 530 is positioned between the input layer 510 and the output layer 550 and may include a plurality of convolution layers 531 to 539.
  • the hidden layer 530 may further include an upsampling layer or a pooling layer in order to adjust the resolution of a feature map that is a result of the convolution operation.
  • In some embodiments, the CNN may have a VDSR (Very Deep Super Resolution) structure or a structure (not shown) in which ResNet is combined with VDSR.
  • All layers constituting the CNN each include a plurality of nodes, and each node may be interconnected with nodes of other adjacent layers so as to transfer an output value to which a predetermined connection weight is applied as an input of other nodes.
  • the convolution layers 531 to 539 may generate a feature map by performing a convolution operation on image data input to each layer using a convolution kernel (ie, a filter) in the form of a 2D matrix or a 3D matrix.
  • the feature map refers to image data in which various features of image data input to each layer are expressed.
  • the number of convolution layers 531 to 539, the size of the convolution kernel, and the like may be preset before the learning process.
  • the output layer 550 may be composed of a fully connected layer.
  • the nodes of the output layer 550 may output image data by combining various features expressed in the feature map.
  • The CNN algorithm can be divided into a learning process and an inference process.
  • the learning process may be classified into supervised learning, unsupervised learning, and reinforcement learning according to a learning method.
  • Supervised learning refers to a process of calculating the coefficient values of the convolution kernel (i.e., the filter coefficients) by using an output label, which is an explicit answer for the data input to the input layer 510 (hereinafter referred to as 'input data').
  • the filter coefficient of the CNN may be calculated through repeated supervised learning using an error backpropagation algorithm for predetermined input data. Specifically, according to the error backpropagation algorithm, the error between the output data of the CNN and the output label propagates in the reverse direction from the output layer 550 through the hidden layer 530 to the input layer 510. In the process of propagation of the corresponding error, the connection weights between the nodes are updated in the direction of reducing the corresponding error. In addition, by repeating the supervised learning process of the CNN until the corresponding error is less than a preset threshold, an optimal filter coefficient to be used in the inference process of the CNN may be calculated.
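  • A minimal sketch of such a supervised learning loop follows, assuming a toy network, mean-squared error as the error measure, and arbitrary learning-rate and threshold values (none of which are specified by the patent):

```python
# Minimal sketch of supervised learning with error backpropagation, repeated
# until the error drops below a preset threshold (network and settings assumed).
import torch
import torch.nn as nn

model = nn.Sequential(                                   # stand-in for the CNN of FIG. 5
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

def train(input_data, output_label, threshold=1e-3, max_steps=10000):
    """input_data: data fed to the input layer; output_label: the explicit answer."""
    for _ in range(max_steps):
        optimizer.zero_grad()
        error = loss_fn(model(input_data), output_label)   # error between CNN output and the label
        error.backward()                                   # propagate the error back toward the input layer
        optimizer.step()                                   # update connection weights to reduce the error
        if error.item() < threshold:                       # stop once the error is below the preset threshold
            break
    return error.item()
```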
  • the CNN-based filter described below may be used in both the image encoding apparatus and the image decoding apparatus.
  • The CNN-based filter may be used in place of the deblocking filter 182 and the SAO filter 184 of the image encoding apparatus, and may be used in place of the deblocking filter 462 and the SAO filter 464 of the image decoding apparatus.
  • CNN-based filters are described using YUV as an example of information constituting pictures, but CNN-based filters may be applied to RGB, YCbCr, and the like. That is, in the following description, it should be understood that 'YUV to improve picture quality' may be 'RGB to improve picture quality' or 'YCbCr to improve picture quality'.
  • FIG. 6 illustrates a CNN-based filter according to an embodiment of the present invention.
  • In the learning process of the CNN-based filter 611, a YUV 601 whose image quality is to be improved, a quantization parameter map, and a block division map are input to the input layer, and a YUV difference 621 is output. The YUV 601 whose image quality is to be improved may be a YUV reconstructed from the bitstream received from the encoder, and refers to a YUV in which the original YUV has been degraded, whether artificially or not.
  • a hint (not shown) may also be input to the input layer.
  • The coefficients of the CNN-based filter, that is, the coefficients of the convolution kernel, are trained such that the YUV difference 621 output to the output layer becomes the difference between the original YUV and the YUV whose image quality is to be improved.
  • the convolution kernel is available in 2D and 3D forms.
  • The CNN-based filter 611 is for improving the image quality of the input YUV, and the final output of the CNN-based filter 611 is the YUV 631 with improved image quality.
  • the YUV to improve the image quality may be filtered for each channel or may be filtered at once.
  • the size of the QP map may be set to the same resolution as the input YUV to be filtered, and the value of the QP map may be filled with QP values used in coding units in the YUV plane, for example, blocks or sub-blocks.
  • the YUV is filtered for each channel, one map may be configured with the QP value of the channel to be filtered.
  • the QP values of the three channels may consist of three separate maps or one map having an average QP value.
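  • For illustration, a QP map at the same resolution as the plane to be filtered could be built as follows; the block geometry and QP values are made-up examples.

```python
# Sketch of building a quantization parameter map: each pixel is filled with the
# QP used by the coding unit (block or sub-block) covering it.
import numpy as np

def build_qp_map(plane_shape, coding_units):
    """coding_units: list of (x, y, width, height, qp) describing each block in the plane."""
    qp_map = np.zeros(plane_shape, dtype=np.float32)
    for x, y, w, h, qp in coding_units:
        qp_map[y:y + h, x:x + w] = qp          # fill the block's area with its QP value
    return qp_map

# Example: a 16x16 plane split into two 16x8 blocks with QPs 32 and 27.
qp_map = build_qp_map((16, 16), [(0, 0, 16, 8, 32), (0, 8, 16, 8, 27)])
```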
  • Information useful for the learning process, such as a block mode map, may be added to the input layer as hint information in addition to the QP map, the block division map, and the image whose quality is to be improved.
  • the block mode map may be filled with a mode value used in a coding unit, for example, a block or a sub block.
  • the information may be information for distinguishing whether a block is encoded in an intra mode or an inter mode, and the information may be represented by a number.
  • In this case, the convolution kernel coefficients resulting from the learning process are obtained by learning not only from the data of the input layer but also from the hint.
  • the input layer and output layer of the learning process and the inference process of the CNN technique should be configured identically.
  • the inference process generates the YUV with improved image quality from the YUV, the quantization parameter map, and the block partitioning map to improve the image quality by applying the coefficients of the CNN-based filter obtained in the learning process.
  • FIGS. 7A to 7C are diagrams illustrating structures of a CNN having different positions of a concatenated layer according to an embodiment of the present invention.
  • The YUV 701 whose image quality is to be improved, the quantization parameter map 703, and the block partition map 705, which are input to the input layer 710, may be concatenated through the concatenation layer 720 during the CNN process. However, the position of the concatenation layer 720 may be changed as illustrated in FIGS. 7A, 7B, and 7C.
  • FIG. 7A shows the structure of a CNN in which the concatenation layer 720 is positioned immediately after the input layer 710, so that the YUV 701 whose image quality is to be improved, the quantization parameter map 703, and the block partition map 705 input to the input layer 710 are concatenated immediately.
  • FIG. 7B illustrates the structure of the CNN in which the concatenated layer 720 is located between the convolution layers 730
  • FIG. 7C illustrates the structure of the CNN in which the concatenated layer 720 is positioned immediately before the output layer 740.
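  • A sketch of the FIG. 7A arrangement follows, assuming an arbitrary number of convolution layers and channel widths (the patent does not fix these): the degraded YUV, the QP map, and the block division map are concatenated immediately after the input layer, and the network outputs a YUV difference that is added back to produce the improved YUV.

```python
# Sketch of a CNN-based filter with concatenation right after the input layer
# (FIG. 7A style); layer counts and channel sizes are assumptions.
import torch
import torch.nn as nn

class CnnFilter(nn.Module):
    def __init__(self, yuv_channels=3, hidden=64, num_layers=5):
        super().__init__()
        layers = [nn.Conv2d(yuv_channels + 2, hidden, 3, padding=1), nn.ReLU()]
        for _ in range(num_layers - 2):
            layers += [nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU()]
        layers += [nn.Conv2d(hidden, yuv_channels, 3, padding=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, yuv, qp_map, block_map):
        x = torch.cat([yuv, qp_map, block_map], dim=1)   # concatenation immediately after the input layer
        diff = self.body(x)                              # predicted YUV difference (residual)
        return yuv + diff                                # improved-quality YUV

# Example call with a single 64x64 picture region.
model = CnnFilter()
out = model(torch.zeros(1, 3, 64, 64), torch.zeros(1, 1, 64, 64), torch.zeros(1, 1, 64, 64))
```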
  • FIGS. 8A to 8C illustrate data to be input to an input layer of a CNN according to an embodiment of the present invention.
  • FIG. 8A illustrates luma (luminance) pixel values of a Y plane (for example, a Y coding tree block (CTB)) whose image quality is to be improved, FIG. 8B shows a quantization parameter map of the Y plane, and FIG. 8C shows a block division map of the Y plane whose image quality is to be improved.
  • the block partitioning map indicates whether a block is divided or not, so that the processing of the partitioned boundary of the block and the inner region of the block may be differently performed during the CNN's learning process and inference process.
  • 9A and 9B show an example of a block partitioning map according to an embodiment of the present invention.
  • The block division map may be set to the same resolution as the YUV plane to be filtered and may be configured to indicate whether a block is partitioned.
  • the block partitioning map may represent a division boundary of the coding blocks in the coding tree block.
  • FIG. 9A illustrates a coding tree block divided by a quadtree plus binary tree (QTBT) scheme
  • FIG. 9B illustrates a block partitioning map according to the coding tree block. Referring to FIG. 9B, the boundaries of the coding blocks are indicated by '1' and the inside of the coding blocks are indicated by '0' in the block division map.
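  • For illustration, a block division map of this kind could be built as follows; the block geometry is a made-up example, and boundaries are marked with 1 and interiors with 0, as in FIG. 9B.

```python
# Sketch of a block division map: block boundaries are 1, block interiors are 0,
# at the same resolution as the plane to be filtered.
import numpy as np

def build_block_division_map(plane_shape, blocks, boundary_width=1):
    """blocks: list of (x, y, width, height); boundary_width controls deblocking strength."""
    div_map = np.zeros(plane_shape, dtype=np.uint8)
    w = boundary_width
    for x, y, bw, bh in blocks:
        div_map[y:y + bh, x:x + w] = 1               # left boundary
        div_map[y:y + bh, x + bw - w:x + bw] = 1     # right boundary
        div_map[y:y + w, x:x + bw] = 1               # top boundary
        div_map[y + bh - w:y + bh, x:x + bw] = 1     # bottom boundary
    return div_map

# Example: an 8x8 region split into one 8x4 block and two 4x4 coding blocks.
div_map = build_block_division_map((8, 8), [(0, 0, 8, 4), (0, 4, 4, 4), (4, 4, 4, 4)])
```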
  • FIGS. 10A to 10C show another example of a block partitioning map according to an embodiment of the present invention.
  • Referring to FIGS. 10A and 10B, when the YUV plane consists of coding tree blocks, blocking deterioration at the boundaries of the coding tree blocks may not be handled properly, and thus an extra area may be added to the YUV plane.
  • FIGS. 10A and 10B show an example in which 2 pixels are set as the extra area, but other values may be set.
  • The boundary of the coding tree block may be indicated using the extra area, and whether or not a block is split may also be indicated in the area outside the coding tree block. If the filtering includes the extra area, an area overlapping an adjacent coding tree block is generated after the filtering, and the overlapping area may be processed as an average value.
  • Referring to FIG. 10C, when the coding tree blocks 1001 and 1003, each including an extra area, are adjacent to each other, an overlapping area 1005 is formed.
  • the overlapping region 1005 may be set to an average value of values of adjacent coding tree blocks 1001 and 1003.
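  • A sketch of filtering with an extra area and averaging the overlapping regions of adjacent filtered coding tree blocks follows; the 2-pixel margin matches the example of FIGS. 10A and 10B, and the filter itself is just a placeholder.

```python
# Sketch of per-CTB filtering with an extra margin; pixels covered by more than
# one filtered CTB are set to the average of the overlapping results.
import numpy as np

def filter_picture_with_margin(picture, ctb_size, margin, filter_fn):
    h, w = picture.shape
    out_sum = np.zeros_like(picture, dtype=np.float64)
    out_cnt = np.zeros_like(picture, dtype=np.float64)
    for y in range(0, h, ctb_size):
        for x in range(0, w, ctb_size):
            y0, y1 = max(0, y - margin), min(h, y + ctb_size + margin)
            x0, x1 = max(0, x - margin), min(w, x + ctb_size + margin)
            filtered = filter_fn(picture[y0:y1, x0:x1])   # filter the CTB plus its extra area
            out_sum[y0:y1, x0:x1] += filtered
            out_cnt[y0:y1, x0:x1] += 1                    # count overlaps from adjacent CTBs
    return out_sum / out_cnt                              # overlapping areas become the average value

# Example: identity "filter" on an 8x8 picture with 4x4 CTBs and a 2-pixel margin.
result = filter_picture_with_margin(np.arange(64, dtype=np.float64).reshape(8, 8), 4, 2, lambda a: a)
```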
  • FIGS. 11A to 11C illustrate block division maps for adjusting the strength of deblocking according to an embodiment of the present invention.
  • In the example described above, the boundary of a coding block is distinguished by one pixel: a pixel value of 0 represents the inside of the coding block, and a value of 1 represents the boundary of the coding block.
  • the boundaries of the coding blocks are expressed by the number of pixels (or, pixel width, luma sample line, luma sample length, etc.) in order to control the strength of de-blocking.
  • The number of pixels may be determined by at least one of the size of the coding block, the value of the quantization parameter, and the encoding mode. For example, as shown in FIG. 11A, when the coding block is large, the number of pixels may be set to two, and when the coding block is small, the number of pixels may be set to one. In addition, if the value of the quantization parameter is large, the number of pixels may be increased, and if the value of the quantization parameter is small, the number of pixels may be decreased. As another example, when the encoding mode is intra, the number of pixels may be increased, and when the encoding mode is inter, the number of pixels may be set small. All of these may also be set in reverse.
  • the number of pixels may mean the number of pixels located at a block boundary to be updated by filtering. For example, when a 3 pixel value in one block located at the block boundary line is to be updated, the boundary of the block may be indicated by 3 pixels in the block division map. As another example, the number of pixels may mean the number of pixels located at a block boundary line to be referred to for filtering. For example, when filtering is performed by referring to 4 pixel values of a block located at a block boundary line, the block boundary may be indicated by 4 pixels in the block division map.
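  • The width rule mentioned above can be illustrated with the following sketch; the concrete thresholds (block size 32, QP 37) are assumptions chosen only for illustration and are not specified by the disclosure.

```python
def boundary_pixel_width(block_size, qp, is_intra):
    """Illustrative rule only: a larger coding block, a larger QP, or an
    intra-coded block widens the marked boundary so that the CNN filter
    deblocks more strongly.  Thresholds are assumed, not mandated."""
    width = 1
    if block_size >= 32:   # large coding block
        width += 1
    if qp >= 37:           # coarse quantization
        width += 1
    if is_intra:           # intra blocks tend to show stronger blocking
        width += 1
    return width
```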
  • In FIG. 11B, the boundary value of the coding block is indicated differently in order to adjust the strength of deblocking.
  • the boundary value of the coding block may be determined by at least one of a size of a coding block, a value of a quantization parameter, an encoding mode, a number of pixels to be updated, and a number of pixels to be referred to for filtering.
  • For example, when the coding block is large, the value of the quantization parameter is large, or the encoding mode is intra, the boundary value of the coding block may be set large; conversely, when the coding block is small, the value of the quantization parameter is small, or the encoding mode is inter, the boundary value may be set small. Both of these rules may also be applied in reverse.
  • FIG. 11C shows a case in which both the number of boundary pixels and the boundary value of the coding block are used to control the strength of deblocking; since these were described with reference to FIGS. 11A and 11B, further description is omitted.
  • the configured block partitioning map is used in the learning process to help the CNN filter to operate as a strong deblocking filter.
  • FIG. 12 illustrates a flowchart of decoding an image using a CNN-based filter according to an embodiment of the present invention.
  • At least one of a quantization parameter map and a block partitioning map, together with the YUV planes whose image quality is to be improved, is input to the CNN-based filter (1201).
  • the quantization parameter map may be set to the same resolution as the YUV to improve the image quality.
  • the block partitioning map may be displayed differently between the partitioned boundary of the block and the inner region of the block.
  • The number and value of the pixels representing the partitioned boundary of a block in the block partitioning map may be determined by at least one of the size of the coding block, the value of the quantization parameter, the encoding mode, the number of pixels to be updated, and the number of pixels to be referred to for filtering.
  • YUV planes with improved image quality are then output by applying the coefficients of the CNN-based filter, which were trained with the original YUV as the target output (1203); a minimal sketch of such a filter is given below.
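  • As an illustration of steps 1201 and 1203, the following PyTorch sketch concatenates the YUV planes with a quantization parameter map and a block partitioning map of the same resolution and outputs enhanced YUV planes; the network depth, channel counts, and residual-learning form are assumptions, as is the use of a single shared resolution for all planes (e.g., 4:4:4 or upsampled chroma).

```python
import torch
import torch.nn as nn

class CnnLoopFilter(nn.Module):
    """Minimal sketch of the CNN-based filter: the YUV planes to be enhanced
    are concatenated with a QP map and a block partitioning map, and the
    network predicts a correction toward the original YUV."""
    def __init__(self, hidden=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3 + 2, hidden, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 3, 3, padding=1),
        )

    def forward(self, yuv, qp_map, partition_map):
        x = torch.cat([yuv, qp_map, partition_map], dim=1)  # (N, 5, H, W)
        return yuv + self.body(x)  # residual learning toward the original YUV

# yuv: (1, 3, H, W); qp_map and partition_map: (1, 1, H, W) each.
```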
  • In addition, a hint such as a block mode map may be additionally input to the CNN-based filter, and the CNN-based filter may be trained with this additional hint as input.
  • FIG. 13 is a diagram schematically illustrating a configuration of an apparatus for decoding an image according to an embodiment of the present invention.
  • the apparatus illustrated in FIG. 13 may be, for example, a component or a module corresponding to the filter unit 460 of FIG. 4.
  • the apparatus for decoding the image may include an input unit 1301, a filter unit 1303, and an output unit 1305. Other configurations may be included, but a description of components not directly related to the present disclosure will be omitted.
  • The input unit 1301 receives at least one of a quantization parameter map and a block partitioning map, together with the YUV planes whose image quality is to be improved.
  • the quantization parameter map may be set to the same resolution as the YUV to improve the image quality, and the block partitioning map may be displayed differently between the partitioned boundary of the block and the internal region of the block.
  • the number and value of pixels representing the divided boundary of the block in the block division map may be determined by at least one of a size of a coding block, a value of a quantization parameter, and an encoding mode.
  • The filter unit 1303 applies the trained coefficients of the CNN-based filter to the YUV planes to be improved, using at least one of the quantization parameter map and the block partitioning map received by the input unit 1301.
  • The output unit 1305 outputs the YUV planes with improved image quality produced by the filter unit 1303.
  • Although the input unit 1301, the filter unit 1303, and the output unit 1305 have been described as separate components, they may be integrated into a single component, or one component may be divided into several components.
  • Some techniques of this disclosure relate to performing CNN based intra prediction.
  • a technique of performing CNN-based intra prediction will be described with reference to FIGS. 14 to 21.
  • FIG. 14 is a block diagram illustrating a configuration of a CNN prediction unit that may be included in an image encoding apparatus according to an embodiment of the present invention.
  • the CNN predictor 1400 of FIG. 14 may be, for example, an intra predictor 122 of the image encoding apparatus illustrated in FIG. 1 or a module included in the intra predictor 122.
  • The CNN prediction unit 1400 may generate a prediction block by performing CNN-based intra prediction using the image to be encoded (that is, the original image) delivered from the block splitter (e.g., 110 of FIG. 1) and the reconstructed image delivered from the adder (e.g., 170 of FIG. 1).
  • the CNN prediction unit 1400 may include a CNN setting unit 1410 and a CNN execution unit 1430.
  • the CNN setting unit 1410 may calculate filter coefficients, that is, coefficients of a convolution kernel, by performing supervised learning using a CNN composed of a plurality of layers.
  • The structure of the CNN is as described above with reference to FIG. 5, and the CNN may further include an upsampling layer or a pooling layer to adjust the size of a layer.
  • Image data input to the input layer may be configured as a reference region encoded before the current block.
  • The reference region may include at least one block (or area) among a neighboring region adjacent to the current block and, among the luma and chroma blocks constituting the current block, a block of a component encoded before the block of the component currently being encoded (hereinafter referred to as a 'current block of another channel').
  • the peripheral area may be an area of the same channel as the current block or an area of another channel.
  • the peripheral area may be configured in block units (ie, peripheral blocks) or in pixel units (ie, peripheral pixels or peripheral lines).
  • the reference region may further include a new region (ie, an average block, an average pixel, or an average line) generated by averaging pixel values of the peripheral region.
  • FIG. 15 is an exemplary diagram of a peripheral region that can be used as input data of a CNN. Specifically, FIG. 15A illustrates a peripheral area in block units, and FIG. 15B illustrates a peripheral area in pixel units.
  • The block-unit reference region, that is, the set of neighboring blocks, includes the upper-left block A, the upper block B, the left block C, the upper-right block D, and the lower-left block E adjacent to the current block X.
  • The original blocks (i.e., unencoded blocks), prediction blocks, and reconstructed blocks of the neighboring blocks are denoted differently: for a neighboring block A, for example, the original block is denoted 'Ao', the prediction block 'Ap', and the reconstructed block 'Ar'. An average block obtained by averaging the pixel values of the neighboring blocks A, B, C, D, and E is denoted 'F'.
  • The pixel-unit reference region may include 1×1 pixels adjacent to the current block X, and 1×n or n×1 lines.
  • Since the block-unit reference region gives the convolution kernel a wider application range than the pixel-unit reference region, it can improve the accuracy of the CNN's learning and inference processes.
  • the present embodiment will be described on the premise that the reference region is in block units.
  • Chroma blocks may either be used at their original size or be up-scaled using an upsampling layer to the same size as the luma block.
  • When neighboring blocks of a channel different from that of the current block are input to the input layer, one or more blocks (not shown) among the right block, the lower block, and the lower-right block of the current block X may be further input to the input layer, in addition to the neighboring blocks Ar, Br, Cr, Dr, and Er shown in FIG. 15.
  • the accuracy of intra prediction may be improved by adding one or more blocks among the right block, the lower block, and the right lower block of the current block of the luma channel, which have already been encoded, as input data.
  • FIG. 16 is a diagram illustrating examples of configuring the input layer of a CNN from a plurality of neighboring blocks.
  • The input layer may be composed of a separate layer for each of the neighboring blocks Ar, Br, Cr, Dr, and Er, as shown in FIG. 16A, or, as shown in FIG. 16B, some neighboring blocks such as Ar and Br may be integrated into one layer; a minimal construction sketch follows.
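  • The following NumPy sketch assembles such an input from reconstructed neighboring blocks, assuming the labels used above (A upper-left, B upper, C left, D upper-right, E lower-left) and that all blocks have the same size as the current block; both assumptions are for illustration only.

```python
import numpy as np

def build_intra_input(Ar, Br, Cr, Dr, Er):
    """Minimal sketch of the input-layer construction of FIG. 16A: each
    reconstructed neighbouring block becomes one input channel, and the
    pixel-wise average block 'F' is appended as an extra channel."""
    blocks = [Ar, Br, Cr, Dr, Er]
    F = np.mean(np.stack(blocks, axis=0), axis=0)  # average block 'F'
    return np.stack(blocks + [F], axis=0)          # (6, H, W) input tensor
```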
  • the image data output from the output layer may be a prediction block of the current block.
  • the output label may be composed of original blocks (ie, unencoded blocks) of the current block for supervised learning through comparison with the output data.
  • Table 1 shows some example configurations of the CNN layers.
  • These examples do not limit the embodiments to which the technology of the present disclosure may be applied.
  • Table 1: example configurations of the CNN layers.
  •   Example 1: input layer data = neighboring blocks of the same channel as the current block; output layer data = prediction block of the current block; output label = original block of the current block.
  •   Example 2: input layer data = current block of a channel different from the current block; output layer data = prediction block of the current block; output label = original block of the current block.
  •   Example 3: input layer data = neighboring blocks of the same channel as the current block, their average block, and the current block of another channel; output layer data = prediction block of the current block; output label = original block of the current block.
  •   Example 4: input layer data = neighboring blocks of the same channel as the current block and their average block, the current block of another channel, and the neighboring blocks of that other channel and their average block; output layer data = prediction block of the current block; output label = original block of the current block.
  • As shown in Table 1, the data of the input layer may be configured in various combinations, while the data of the output layer is the prediction block of the current block and the label of the output layer is the original block of the current block.
  • The configurations of the input data and the output data should be the same in the learning process and the inference process of the CNN, respectively.
  • the CNN setting unit 1410 may set the hint information to minimize the error between the output data and the output label and improve the accuracy of intra prediction.
  • The hint information may include at least one of directional information of intra prediction, a quantization parameter (QP) of the current block or the reference region, and the absolute sum of the transform coefficients or residual signals of a neighboring block (i.e., the amount of residual).
  • the hint information may be transmitted to the image decoding apparatus through the bitstream and used to decode the current block.
  • FIG. 17 is an exemplary diagram for describing a prediction direction suitable for the current block in view of the pixel value patterns of its neighboring blocks.
  • Referring to FIG. 17, the neighboring blocks of the current block X are composed of an upper-left block A, an upper block B, and a left block C. The upper-left block A is about half white, the left block C is mostly white, and the upper block B is mostly a color other than white. Considering that most of the pixel values of the current block X are a color other than white, it can be seen that performing intra prediction in the vertical direction can maximize the prediction accuracy.
  • the CNN prediction unit 1400 intends to improve the accuracy of intra prediction by using the directional information of the intra prediction as hint information of the learning process and the inference process of the CNN.
  • The directional information of intra prediction may be an intra prediction mode number indicating one of the 65 directional modes and the non-directional modes illustrated in FIG. 3.
  • the hint information including one or more prediction directional information may be encoded by the encoder 150 of the image encoding apparatus of FIG. 1 and transmitted to the image decoding apparatus of FIG. 4.
  • The CNN setting unit 1410 may select some of the 65 prediction directions (e.g., the horizontal, vertical, diagonal down-right, and diagonal up-right directions) as representative directions, and may set one of the selected representative directions as hint information for intra prediction of the current block.
  • the CNN setting unit 1410 may transmit the hint information to the image decoding apparatus in a manner similar to that of the most probable mode (MPM).
  • the hint information may include a quantization parameter (QP) indicating the strength of the quantization.
  • the QP may be a QP value applied to the quantization process of the current block or reference region.
  • the hint information may include the amount of residuals.
  • the amount of residual may be the sum of the transform coefficients of the neighboring block or the absolute value of the residual signals.
  • the hint information may be composed of one or more maps and may be concatenated with a layer of the CNN.
  • the map for the hint information may be concatenated at various locations between the input layer and the output layer.
  • the map for the hint information may be concatenated immediately after the input layer as shown in FIG. 18, or may be concatenated immediately before the output layer.
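  • The following PyTorch sketch illustrates concatenating a single-channel hint map with the feature maps produced right after the input layer, in the spirit of FIG. 18; the channel counts, the depth, and the position of the concatenation are assumptions.

```python
import torch
import torch.nn as nn

class HintConcatCNN(nn.Module):
    """Minimal sketch: reference blocks (e.g., the six channels of neighbouring
    blocks plus their average) pass through an input convolution, a hint map
    (prediction direction, QP, or residual amount broadcast over the block)
    is concatenated, and the remaining layers produce the prediction block."""
    def __init__(self, in_ch=6, hidden=32):
        super().__init__()
        self.head = nn.Conv2d(in_ch, hidden, 3, padding=1)
        self.body = nn.Sequential(
            nn.Conv2d(hidden + 1, hidden, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 1, 3, padding=1),
        )

    def forward(self, reference_blocks, hint_map):
        feat = torch.relu(self.head(reference_blocks))  # (N, hidden, H, W)
        feat = torch.cat([feat, hint_map], dim=1)        # concatenate the hint
        return self.body(feat)                           # prediction block
```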
  • the input data may be configured in various combinations according to the direction of intra prediction.
  • For example, when the directionality of the intra prediction is horizontal, the input data may be composed of one or more blocks selected from the left neighboring blocks Ar, Cr, and Er of the current block X and their average block. When the directionality is vertical, the input data may be composed of one or more blocks selected from the upper neighboring blocks Ar, Br, and Dr of the current block X and their average block.
  • the CNN setting unit 1410 may calculate filter coefficients through an iterative learning process using an error backpropagation algorithm in order to minimize an error between the output data and the output label.
  • the error between the output data and the output label may be propagated in the reverse direction from the output layer of the CNN to the input layer via the hidden layer.
  • the connection weights between nodes may be updated in a direction to reduce the corresponding error.
  • the CNN setting unit 1410 may calculate the filter coefficients by repeating the learning process of the CNN using the error backpropagation algorithm until the corresponding error is less than a predetermined threshold.
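  • A minimal sketch of this iterative learning step is given below; the loss function (MSE), optimizer, learning rate, threshold, and iteration cap are assumptions, and `model` stands for any CNN whose output is compared against the original block.

```python
import torch
import torch.nn as nn

def learn_filter_coefficients(model, input_data, original_block,
                              threshold=1e-3, max_iters=10_000, lr=1e-3):
    """Minimal sketch: the error between the output data and the output label
    (the original block) is back-propagated until it falls below a threshold,
    and the resulting parameters serve as the filter coefficients."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(max_iters):
        opt.zero_grad()
        loss = loss_fn(model(input_data), original_block)
        if loss.item() < threshold:
            break
        loss.backward()  # error back-propagation from output toward input layer
        opt.step()       # update connection weights to reduce the error
    return [p.detach().clone() for p in model.parameters()]  # filter coefficients
```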
  • the above filter coefficient calculation process may be performed in a predetermined unit (eg, CU, CTU, slice, frame, or sequence (group of frames)).
  • the CNN setting unit 1410 may calculate filter coefficients for each current block or may calculate filter coefficients for each frame.
  • When the filter coefficients are calculated in units of frames, they may be commonly used for intra prediction of a plurality of current blocks included in the frame.
  • In this case, the prediction directional information, which is one kind of hint information, may also be plural; when the intra prediction directional information is composed of one map, that single map may include a plurality of directional values.
  • the calculated information about the filter coefficients may be transmitted to the image decoding apparatus through the bitstream and used for the image decoding process.
  • the CNN setting unit 1410 may configure a filter coefficient set by previously calculating a plurality of filter coefficients using predetermined sample images.
  • the CNN setting unit 1410 may set one filter coefficient selected according to a predetermined criterion in the set as the filter coefficient for the current block. For example, the CNN setting unit 1410 may select one filter coefficient from the set based on the similarity of pixel values between the current block and the sample images. Alternatively, the CNN setting unit 1410 may select a filter coefficient closest to the filter coefficient calculated through one learning process from the set.
  • the selection information of the filter coefficient for example, the index information, may be transmitted to the image decoding apparatus through the bitstream and used for the image decoding process.
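  • The selection from a pre-computed set can be sketched as follows; the similarity measure (difference of mean pixel values) is an assumed stand-in for whatever criterion an implementation would actually use, and the returned index corresponds to the selection information signaled to the decoder.

```python
import numpy as np

def select_filter_coefficients(current_block, sample_images, coefficient_set):
    """Minimal sketch: choose the pre-computed filter-coefficient entry whose
    sample image is most similar to the current block (here, smallest absolute
    difference of mean pixel value), and return its index for signaling."""
    cur_mean = np.mean(current_block)
    diffs = [abs(np.mean(img) - cur_mean) for img in sample_images]
    index = int(np.argmin(diffs))
    return index, coefficient_set[index]
```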
  • FIG. 14 illustrates that the CNN setting unit 1410 is included in the CNN prediction unit 1400, it should be noted that this is exemplary and the present embodiment is not limited thereto. That is, the CNN setting unit 1410 may be implemented as a separate unit from the CNN prediction unit 1400, or may be integrated with the CNN execution unit 1430 and implemented as one unit.
  • The CNN execution unit 1430 performs a CNN-based inference process on the input data using the filter coefficients set by the CNN setting unit 1410 (that is, the coefficient values of the convolution kernel), thereby generating the output data, that is, a prediction block for the current block.
  • the generated prediction block may be transferred to a subtractor of the image encoding apparatus and used to generate a residual block from the current block.
  • FIG. 19 is a block diagram illustrating a configuration of a CNN prediction unit that may be included in an image decoding apparatus according to an embodiment of the present invention.
  • the CNN predictor 1900 of FIG. 19 may be, for example, an intra predictor 442 of the image decoding apparatus illustrated in FIG. 4, or a module included in the intra predictor 442.
  • The CNN predictor 1900 of FIG. 19 differs from the CNN predictor 1400 of FIG. 14 only in the method of setting the input signals and the filter coefficients (that is, the coefficient values of the convolution kernel); overlapping descriptions are therefore omitted or given only briefly.
  • the CNN prediction unit 1900 may generate a prediction block by performing CNN-based intra prediction based on a reconstructed image.
  • the CNN prediction unit 1900 may include a CNN setting unit 1910 and a CNN execution unit 1930.
  • The structure of the CNN is as described above with reference to FIG. 5, and the CNN may further include an upsampling layer or a pooling layer to adjust the size of a layer.
  • the image data input to the input layer (hereinafter referred to as 'input data') may be configured as a reference region decoded before the current block.
  • The reference region may include at least one block (or area) among a neighboring region adjacent to the current block and, among the luma and chroma blocks constituting the current block, a block of a component decoded before the block of the component currently being decoded (hereinafter referred to as a 'current block of another channel').
  • the peripheral area may be an area of the same channel as the current block or an area of another channel.
  • the peripheral area may be configured in block units (ie, peripheral blocks) or in pixel units (ie, peripheral pixels or peripheral lines).
  • the reference region may further include a new region (ie, an average block, an average pixel, or an average line) generated by averaging pixel values of the peripheral region.
  • For example, the input data may be composed of neighboring blocks of the same channel as the current block, their average block, and the current block of a channel different from the current block.
  • the input layer may be composed of a plurality of layers for each of the neighboring blocks, or a plurality of neighboring blocks may be integrated into one layer.
  • the image data output from the output layer (hereinafter, referred to as 'output data') may be a prediction block of the current block.
  • Some configuration examples of the CNN layers are as described above with reference to Table 1; however, these are exemplary and do not limit the present embodiment.
  • the CNN setting unit 1910 may configure one or more maps using the hint information transmitted from the image encoding apparatus, and then concatenate at various positions between the input layer and the output layer.
  • the hint information is information for improving the accuracy of intra prediction.
  • The hint information may comprise at least one of prediction directional information, a quantization parameter (QP) of the current block or the reference region, and the absolute sum of the transform coefficients or residual signals of a neighboring block (i.e., the amount of residual).
  • The prediction directional information included in the hint information may be an intra prediction mode number indicating one of the 65 directional modes and the non-directional modes, or may be index information indicating one or more representative directions selected from the 65 directional modes.
  • the input data may be configured in various combinations according to the direction of intra prediction.
  • When the directionality of the intra prediction is horizontal, the input data may be composed of one or more blocks selected from the left neighboring blocks of the current block and their average block; when the directionality is vertical, the input data may be composed of one or more blocks selected from the upper neighboring blocks of the current block and their average block.
  • the CNN setting unit 1910 may set the filter coefficients transmitted from the image encoding apparatus as filter coefficients for intra prediction of the current block.
  • the filter coefficient may be a value calculated by a video encoding apparatus in a predetermined unit, for example, a CU unit or a frame unit.
  • When the filter coefficients are set in units of frames, they may be commonly used for intra prediction of a plurality of current blocks included in the frame.
  • In this case, the prediction directional information, which is one kind of hint information, may also be plural; the directional information of intra prediction may be composed of one map, and that single map may include a plurality of directional values.
  • Alternatively, the CNN setting unit 1910 may set the filter coefficients for intra prediction of the current block based on index information of the filter coefficients transmitted from the image encoding apparatus.
  • FIG. 19 illustrates that the CNN setting unit 1910 is included in the CNN prediction unit 1900, it should be noted that this is exemplary and the present embodiment is not limited thereto. That is, the CNN setting unit 1910 may be implemented as a separate unit from the CNN prediction unit 1900. In addition, the CNN setting unit 1910 may be integrated with the CNN execution unit 1930 and implemented as one unit.
  • The CNN execution unit 1930 performs a CNN-based inference process on the input data using the filter coefficients set by the CNN setting unit 1910 (that is, the coefficient values of the convolution kernel), thereby generating the output data, that is, a prediction block for the current block.
  • the generated prediction block may be transferred to an adder and added to the residual block to be used to recover the current block.
  • FIG. 20 is a flowchart illustrating an operation of a CNN prediction unit that may be included in the image encoding apparatus illustrated in FIG. 14.
  • the CNN setting unit 1410 may set input data and an output label of the CNN.
  • the input data may be composed of a reference region encoded before the current block.
  • the input data may be composed of neighboring blocks of the same channel as the current block.
  • the input data may be composed of neighboring blocks of the same channel as the current block, average blocks thereof, and current blocks of channels different from the current block.
  • the data of the output layer may be a prediction block of the current block, and the label of the output layer may be composed of the original block of the current block.
  • the CNN setting unit 1410 may set the directional information of the prediction as the hint information in order to improve the accuracy of the intra prediction.
  • the set hint information may be transmitted to the image decoding apparatus through the bitstream and used to decode the current block.
  • the input data may be configured in various combinations according to the direction of intra prediction.
  • the CNN setting unit 1410 may calculate filter coefficients through a learning process.
  • the CNN setting unit 1410 may repeat the learning process using an error backpropagation algorithm to improve the accuracy of intra prediction.
  • the filter coefficient calculation process may be performed in a predetermined unit, for example, a frame unit or a block unit.
  • the CNN setting unit 1410 may configure a filter coefficient set by calculating a plurality of filter coefficients in advance by using predetermined sample images. In this case, the CNN setting unit 1410 may set one filter coefficient selected according to a predetermined criterion in the set as the filter coefficient for the current block.
  • The CNN execution unit 1430 performs the CNN-based inference process on the input data using the filter coefficients set by the CNN setting unit 1410 (that is, the coefficient values of the convolution kernel), thereby generating the output data, that is, a prediction block for the current block.
  • the generated prediction block may be transferred to a subtractor (eg, 130 of FIG. 1) of the image encoding apparatus and used to generate a residual block from the current block.
  • FIG. 21 is a flowchart illustrating an operation of a CNN predictor that may be included in the image decoding apparatus illustrated in FIG. 19.
  • the CNN setting unit 1910 may set filter coefficients for intra prediction of the current block based on information about filter coefficients transmitted from the image encoding apparatus.
  • the input data of the CNN may be composed of a reference region decoded before the current block, and the output data becomes a prediction block for the current block.
  • The CNN setting unit 1910 may configure the hint information extracted by the decoder (e.g., 410 of FIG. 4) as a map and concatenate it with a layer of the CNN.
  • the input data may be configured in various combinations according to the direction of intra prediction.
  • The CNN execution unit 1930 performs a CNN-based inference process on the input data using the filter coefficients set by the CNN setting unit 1910 (that is, the coefficient values of the convolution kernel), thereby generating the output data, that is, a prediction block for the current block.
  • the generated prediction block may be transferred to an adder (eg, 450 of FIG. 4) and added to the residual block to be used to recover the current block.
  • Some techniques of this disclosure relate to performing CNN based inter prediction.
  • a technique of performing CNN-based inter prediction will be described with reference to FIGS. 22 to 28.
  • the CNN predictor 2200 of FIG. 22 may be, for example, an inter predictor 124 of the image encoding apparatus illustrated in FIG. 1, or one module included in the inter predictor 124.
  • the CNN prediction unit 2200 may include a CNN setting unit 2210 and a CNN execution unit 2230.
  • The CNN predictor 2200 sets the image data (that is, the input data) to be input to the input layer of the CNN and the filter coefficients (that is, the coefficients of the convolution kernel), and predicts the current block by performing the inference process of the CNN.
  • the CNN setting unit 2210 may set input data.
  • the CNN setting unit 2210 may select at least one reference picture and set a search region in the selected reference picture as input data.
  • the search area in the reference picture means a specific area in the reference picture having a size equal to or larger than the size of the current block.
  • the position of the search region in the reference picture may be determined based on the position of the current block.
  • the position may be the same position as the current block in the reference picture or a position shifted from the same position by a predefined motion vector.
  • The predefined motion vector may be a motion vector (MV) of a neighboring block adjacent to the current block, an initial MV or a predicted MV shared between the video encoding apparatus and the video decoding apparatus, or a global motion vector.
  • The size of the search area in the reference picture may be the same as the size of the current block, or may be larger than the current block.
  • For example, the size of the search area may be that of an area obtained by taking a block of the same size as the current block at the position of the search area, extending it left and right by the x coordinate of the above-described predefined motion vector, and extending it up and down by the y coordinate, as in the sketch below.
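  • Under one reading of this rule, the search region can be derived as sketched below; the exact treatment of the shift by the predefined motion vector and the omission of picture-boundary clipping are assumptions.

```python
def search_region(block_x, block_y, block_w, block_h, mv_x, mv_y):
    """Minimal sketch: start from the collocated position of the current
    block, shift it by the predefined motion vector, and extend it
    horizontally by |mv_x| and vertically by |mv_y| on both sides."""
    x0 = block_x + mv_x - abs(mv_x)
    y0 = block_y + mv_y - abs(mv_y)
    width = block_w + 2 * abs(mv_x)
    height = block_h + 2 * abs(mv_y)
    return x0, y0, width, height
```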
  • the CNN setting unit 2210 may select a reference picture based on the inter prediction direction. For example, in the case of unidirectional prediction, a reference picture of a specific order of reference picture list 0 may be selected. In the case of bidirectional prediction, a reference picture of a particular order of reference picture list 0 and a reference picture of a particular order of reference picture list 1 may be selected.
  • the information on the reference picture to be used as the input data of the CNN may include selection information (eg, reference picture index value) for a reference picture of a specific order in a specific reference picture list.
  • the information about the reference picture may be transmitted to the image decoding apparatus through the encoder (for example, 190 of FIG. 1).
  • the information about the reference picture may be encoded as a syntax of the coding unit (CU) so that different reference pictures for each coding unit may be used as input data of the CNN.
  • the information about the reference picture is encoded as a syntax of a higher unit than the coding unit, for example, CTU, slice, PPS, or SPS, so that the same reference pictures are input data of the CNN for all the coding units included in the higher unit. It can also be used.
  • The CNN setting unit 2210 may select a reference picture that is predefined so that the image encoding apparatus and the image decoding apparatus share the reference picture to be used as input data of the CNN. For example, in the case of unidirectional prediction, the first reference picture (e.g., the reference picture corresponding to reference picture index 0) and the second reference picture (e.g., the reference picture corresponding to reference picture index 1) of reference picture list 0 may be selected; in the case of bidirectional prediction, the first reference picture of reference picture list 0 and the first reference picture of reference picture list 1 (e.g., the reference picture corresponding to reference picture index 0 in each list) may be selected.
  • the CNN setting unit 2210 may set the input data of the CNN by selecting the reference picture in various ways.
  • the CNN setting unit 2210 selects at least one reference picture to be used as input data of the CNN from among the plurality of reference pictures, and encodes information about the selected reference picture by the encoder (eg, 190 of FIG. 1). It can also be delivered to the image decoding apparatus through.
  • information about a reference picture to be used as an input of the CNN may be encoded as a syntax of a coding unit (CU), so that different reference pictures for each coding unit may be used as input data of the CNN.
  • the information about the reference picture to be used as the input of the CNN is encoded as a syntax of a higher unit than the coding unit, for example, CTU, slice, PPS, or SPS, so that the same reference pictures are included for all coding units included in the higher unit. It can also be used as input data for a CNN.
  • the information on the reference picture to be used as the input of the CNN may be a picture order count (POC) value of the selected picture or a difference value between the POC value of the selected picture and the POC value of the current picture.
  • the CNN setting unit 2210 may further set at least one of a reconstructed peripheral region adjacent to the current block in the current picture and a motion vector of the peripheral region as additional input data.
  • the peripheral area may be an area of the same component as the current block or may be an area of another component.
  • the peripheral area may be configured in block units (ie, peripheral blocks) or in pixel units (ie, peripheral pixels or peripheral lines).
  • The motion vectors of a plurality of peripheral regions may be set as input data individually, or may be combined into one or more motion vectors.
  • In the following description, all input data are assumed to be of the luma component; however, the input data may consist of various combinations of the three components (i.e., Y, Cb, Cr).
  • the motion vector as input data may be represented using a color code.
  • the color code is a mapping of coordinate values of a motion vector to color values.
  • the motion vector represented by the color code may be input to the CNN by configuring one or more maps. For example, the colors are mapped to two-dimensional planes of the x coordinate and the y coordinate, and a color value corresponding to the value of the motion vector (x, y) is used as the color code of the motion vector.
  • Alternatively, the motion vector may be represented by two maps: one composed of the x coordinate values of the motion vector and the other composed of its y coordinate values.
  • the map for the motion vector may have the same resolution as the area corresponding to the motion vector.
  • the map of the motion vector of the reconstructed peripheral area adjacent to the current block may have the same resolution as the size of the corresponding peripheral area.
  • the map for the motion vector may mean that a color code representing coordinate values of the motion vector is mapped to each pixel of the corresponding map.
  • the map for the motion vector may mean that the same color code is mapped for each predetermined region of the map corresponding to the unit of the motion vector. For example, when a unit of a motion vector is a region obtained by dividing a peripheral region, that is, a sub block unit, the same color code may be mapped to all pixels included in one sub block of the map.
  • Alternatively, in a map composed of the x coordinate values (or y coordinate values) of the motion vector, the x coordinate value (or y coordinate value) itself may be mapped to each pixel of the map, or to all pixels included in the unit of the motion vector.
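  • The two-map representation can be sketched as follows; replacing the raw coordinate values with a color code would only change what is written into the maps.

```python
import numpy as np

def motion_vector_maps(mv, region_h, region_w):
    """Minimal sketch: represent a motion vector by two maps at the resolution
    of the region it belongs to, one filled with its x coordinate and one
    filled with its y coordinate."""
    mv_x, mv_y = mv
    x_map = np.full((region_h, region_w), float(mv_x))
    y_map = np.full((region_h, region_w), float(mv_y))
    return x_map, y_map
```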
  • When the resolutions of the input data differ from one another, the CNN setting unit 2210 may include a pooling layer or an upsampling layer in the CNN to adjust the resolutions so that they become the same.
  • image data (ie, output data) output through the output layer of the CNN corresponding to the input data may be motion vectors or prediction pixels of the current block.
  • FIG. 23 is an exemplary diagram of a layer configuration of the CNN.
  • input data is set to a search region in two reference pictures and three peripheral regions in the current picture.
  • the peripheral areas are selected from the restored three blocks located on the upper side, the left side, and the upper left side with respect to the current block, and each is composed of a separate map and is input to the input layer.
  • the output data becomes the prediction pixels of the current block.
  • Example configurations of the CNN layers for inter prediction:
  •   Example 1: input layer data = search areas in a plurality of reference pictures; output layer data = motion vector of the current block or prediction pixels of the current block; output label = actual motion vector of the current block or original block of the current block.
  •   Example 2: input layer data = search areas in a plurality of reference pictures and motion vectors of the peripheral region; output layer data and output label as in Example 1.
  •   Example 3: input layer data = search areas in a plurality of reference pictures and the peripheral region (reconstructed pixels); output layer data and output label as in Example 1.
  • The data of the input layer can be composed of various combinations, while the data of the output layer is the motion vector or the prediction pixels of the current block, and the label of the output layer is the actual motion vector of the current block or the original block of the current block.
  • the actual motion vector of the current block may mean, for example, a motion vector calculated through motion estimation (ME) using a full search method or a motion vector obtained by refining it.
  • The configurations of the input data and the output data should basically be the same in the learning process and the inference process of the CNN, respectively.
  • the CNN setting unit 2210 may further set the hint information as additional input data in order to improve the accuracy of inter prediction.
  • the hint information may include time axis distance information between the current picture and the reference picture, for example, a difference value between a picture order count (POC) value of the current picture and a POC value of the reference picture.
  • POC picture order count
  • There are as many pieces of temporal distance information as there are reference pictures.
  • the hint information may include a quantization parameter (QP).
  • QP quantization parameter
  • the quantization parameter used as the hint information may be selected from among quantization parameter values of the current block, the surrounding area, or the search area in the reference picture, and may be a value (eg, an average value) derived from at least some of them.
  • FIG. 24 is an exemplary diagram of time-base distance information between a current picture and a reference picture. Specifically, FIG. 24A shows a case of unidirectional prediction, and FIG. 24B shows a case of bidirectional prediction.
  • In FIG. 24A, the temporal distance information is -3 and -1, from the left. Using the motion estimation result (solid line) from the search area 2411 in the t-3 picture to the search area 2413 in the t-1 picture, the motion vector from the search area 2411 in the t-3 picture to the current block 2415 in the t picture and the motion vector from the search area 2413 in the t-1 picture to the current block 2415 in the t picture can be inferred.
  • In FIG. 24B, the temporal distance information is -1 and +2, from the left. Using the motion estimation result (solid line) from the search area 2431 in the t-1 picture to the search area 2435 in the t+2 picture, the motion vector from the search area 2431 in the t-1 picture to the current block 2433 in the t picture and the motion vector from the search area 2435 in the t+2 picture to the current block 2433 in the t picture can be inferred.
  • the hint information may consist of one or more maps and may be concatenated with a layer of the CNN.
  • the map for the hint information may be concatenated at various locations between the input layer and the output layer.
  • the hint information may be transmitted to the image decoding apparatus through the bitstream and used to decode the current block.
  • FIG. 25 is an exemplary diagram of a layer configuration of a CNN including hint information.
  • input data is set to hint information including a search area and a map in two reference pictures.
  • the output data becomes the prediction pixels of the current block.
  • the CNN setting unit 2210 may calculate filter coefficients through an iterative learning process in order to minimize an error between the output data and the output label.
  • the CNN setting unit 2210 may use an error backpropagation algorithm.
  • the CNN setting unit 2210 may propagate an error between the output data and the output label in a reverse direction from the output layer to the input layer through the hidden layer in the learning process of the CNN.
  • the CNN setting unit 2210 may update the connection weights between the nodes to reduce the corresponding error.
  • the CNN setting unit 2210 may calculate filter coefficients by repeating a learning process using an error backpropagation algorithm until the corresponding error is less than a predetermined threshold.
  • the above filter coefficient calculation process may be performed in a predetermined unit (eg, CU, CTU, slice, frame, or sequence (group of frames)).
  • the CNN setting unit 2210 may calculate filter coefficients for each current block or may calculate filter coefficients for each frame.
  • When the filter coefficients are calculated in units of frames, they may be commonly used for inter prediction of a plurality of current blocks included in the frame.
  • the calculated information about the filter coefficients may be transmitted to the image decoding apparatus through the bitstream and used for the image decoding process.
  • the CNN setting unit 2210 may configure a filter coefficient set by previously calculating a plurality of filter coefficients using predetermined sample images.
  • the CNN setting unit 2210 may set one filter coefficient selected according to a predetermined criterion in the set as the filter coefficient for the current block. For example, the CNN setting unit 2210 may select one filter coefficient from the set based on the similarity of pixel values between the current block and the sample images. Alternatively, the CNN setting unit 2210 may select a filter coefficient closest to the filter coefficient calculated through one learning process from the set.
  • the selection information of the filter coefficient for example, the index information, may be transmitted to the image decoding apparatus through the bitstream and used for the image decoding process.
  • FIG. 22 illustrates that the CNN setting unit 2210 is included in the CNN predicting unit 2200, it should be noted that this is exemplary and the present embodiment is not limited thereto. That is, the CNN setting unit 2210 may be implemented as a separate unit from the CNN predicting unit 2200, or may be integrated with the CNN executing unit 2230 and implemented as one unit.
  • the CNN execution unit 2230 may execute the CNN using the input data and the filter coefficients set by the CNN setting unit 2210 to generate a motion vector of the current block or directly generate prediction pixels of the current block.
  • the generated prediction pixels may be transferred to a subtractor of the image encoding apparatus and used to generate a residual block from the current block.
  • Alternatively, the CNN prediction unit 2200 may first generate a motion vector or prediction pixels of the current block according to existing inter prediction schemes (i.e., motion estimation (ME) and motion compensation (MC)), and then finally generate the motion vector or the prediction pixels of the current block by refining them through a CNN-based inference process.
  • the CNN prediction unit 2200 determines a reference picture encoded and decoded before the current picture, and searches for reference pixels most similar to the current block in the determined reference picture.
  • the CNN predictor 2200 may generate motion vectors or prediction pixels of the current block by using the found reference pixels. In this case, the generated motion vector or prediction pixels of the current block are set as input data of the CNN by the CNN setting unit 2210.
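  • A minimal sketch of this two-stage scheme is shown below; `cnn` is a hypothetical refinement network, and feeding it the coarse prediction concatenated with the search regions of the reference pictures (and adding its output as a correction) is one assumed realization of the refinement step.

```python
import torch

def refine_inter_prediction(cnn, coarse_prediction, search_regions):
    """Minimal sketch: a conventional ME/MC stage first produces coarse
    prediction pixels for the current block; the CNN then refines them
    using the reference-picture search regions as additional input.
    `cnn` is assumed to output a correction of the same shape as the
    coarse prediction."""
    x = torch.cat([coarse_prediction] + search_regions, dim=1)
    return coarse_prediction + cnn(x)  # refined prediction pixels
```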
  • the CNN setting unit 2210 may set filter coefficients (that is, coefficient values of a convolution kernel) to be applied to the CNN.
  • the filter coefficient may be calculated by performing a CNN learning process on input data including a search region in a reference picture.
  • the input data may further include at least one of a peripheral region adjacent to the current block in the current picture and a motion vector of the peripheral region.
  • the input data may further include at least one of hint information for improving the accuracy of inter prediction, for example, time axis distance information between a current picture and a reference picture and a quantization parameter (QP).
  • the filter coefficient may be a value selected from a preset specific value or a set consisting of a plurality of preset specific values.
  • The CNN execution unit 2230 executes the CNN using the input data and the filter coefficients set by the CNN setting unit 2210 to refine the motion vector or the prediction pixels of the current block, thereby finally generating the motion vector or the prediction pixels of the current block.
  • the generated prediction pixels may be transferred to a subtractor of the image encoding apparatus and used to generate a residual block from the current block.
  • FIG. 26 is a block diagram illustrating a configuration of a CNN prediction unit that may be included in an image decoding apparatus according to an embodiment of the present invention.
  • the CNN predictor 2600 of FIG. 26 may be, for example, an inter predictor 444 of the image decoding apparatus illustrated in FIG. 4, or one module included in the inter predictor 444.
  • the CNN predictor 2600 may include a CNN setting unit 2610 and a CNN execution unit 2630.
  • The CNN prediction unit 2600 determines a reference picture based on the reference picture selection information signaled from the image encoding apparatus, and performs the inference process of the CNN using the determined reference picture to generate a motion vector or prediction pixels of the current block.
  • the CNN setting unit 2610 may set input data.
  • the CNN setting unit 2610 may select a reference picture based on the reference picture selection information signaled from the image encoding apparatus, and set a search region in the selected reference picture as input data.
  • The search area in the reference picture means a specific area in the reference picture having a size larger than or equal to that of the current block; the position and size of the search area are as described above.
  • the CNN setting unit 2610 may select the reference picture based on the information on the inter prediction direction signaled from the image encoding apparatus. For example, in the case of unidirectional prediction, similarly to a video encoding apparatus, a reference picture of a specific order of reference picture list 0 may be selected as input data. In the case of bidirectional prediction, similarly to the image encoding apparatus, a reference picture of a particular order of reference picture list 0 and a reference picture of a particular order of reference picture list 1 may be selected.
  • the selection information of the reference picture may be a value indicating a reference picture of a specific order in the selected reference picture list, for example, a reference picture index value.
  • the selection information of the reference picture may be a picture order count (POC) value of the selected picture or a difference value between the POC value of the selected picture and the POC value of the current picture.
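  • Decoder-side selection from the signaled information can be sketched as follows; the function and argument names are illustrative, and selection by POC (or POC difference) would instead search the reference picture lists for the matching value.

```python
def select_reference_pictures(ref_list0, ref_list1, ref_idx0,
                              ref_idx1=None, bidirectional=False):
    """Minimal sketch: pick the reference picture indicated by the signaled
    reference picture index in list 0 and, for bidirectional prediction,
    the one indicated in list 1."""
    selected = [ref_list0[ref_idx0]]
    if bidirectional and ref_idx1 is not None:
        selected.append(ref_list1[ref_idx1])
    return selected
```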
  • the CNN setting unit 2610 may further set at least one of a reconstructed peripheral region adjacent to the current block in the current picture and a motion vector of the peripheral region as additional input data.
  • the motion vector may consist of one or more maps expressed in color code.
  • the motion vector may be represented by one map composed of color values corresponding to the values of the corresponding vector (x, y).
  • Alternatively, the motion vector may be represented by two maps: one composed of the x coordinate values of the vector and the other composed of its y coordinate values.
  • image data (ie, output data) output through the output layer of the CNN corresponding to the input data may be motion vectors or prediction pixels of the current block.
  • the CNN setting unit 2610 may further set the hint information as additional input data in order to improve the accuracy of the inter prediction.
  • the hint information may include time-base distance information between a current picture and a reference picture, for example, a difference value between a picture order count (POC) value of the current picture and a POC value of the reference picture.
  • There are as many pieces of temporal distance information as there are reference pictures.
  • the hint information may include a quantization parameter (QP) value of a current block, a peripheral region, or a search region in a reference picture.
  • the hint information may be concatenated with the input layer or the convolutional layer of the CNN to form one concatenation layer.
  • the hint information may be composed of one or more maps having the same resolution as the layer to be concatenated.
  • The hint information may be signaled from the image encoding apparatus through the bitstream and used as input data for CNN-based inter prediction.
  • the CNN setting unit 2610 may set the filter coefficients signaled from the image encoding apparatus as filter coefficients for inter prediction of the current block.
  • Alternatively, the CNN setting unit 2610 may set filter coefficients previously stored in the image decoding apparatus as the filter coefficients for inter prediction of the current block. Alternatively, the CNN setting unit 2610 may set, as the filter coefficients for inter prediction of the current block, a specific filter coefficient selected from the set according to the selection information (for example, index information) of the filter coefficients signaled from the image encoding apparatus.
  • the CNN execution unit 2630 may generate the prediction pixels by inferring the motion information of the current block by executing the CNN using the input data and the filter coefficients set by the CNN setting unit 2610. In this case, the generated prediction pixels may be transferred to an adder and added to the residual block to be used to recover the current block.
  • Alternatively, the CNN prediction unit 2600 may first generate a motion vector or prediction pixels of the current block according to existing inter prediction schemes (i.e., motion estimation (ME) and motion compensation (MC)), and then finally generate the motion vector or the prediction pixels of the current block by refining them through a CNN-based inference process.
  • the CNN predictor 2600 may determine a motion vector of the current block and a reference picture referenced by the motion vector by using a syntax element of the inter prediction mode extracted from the decoder.
  • the CNN predictor 2600 may generate motion information or prediction pixels of the current block by predicting the current block by using the determined motion vector and the reference picture.
  • the CNN setting unit 2610 may set the generated motion information or prediction pixels of the current block as image data (that is, input data) to be input to the input layer of the CNN.
  • the CNN setting unit 2610 may set filter coefficients (that is, coefficient values of a convolution kernel) to be applied to the CNN.
  • the filter coefficient may be a value transmitted from the image encoding apparatus.
  • the filter coefficient may be a value selected by the image encoding apparatus in a set consisting of a predetermined specific value or a plurality of predetermined specific values.
  • FIG. 26 illustrates that the CNN setting unit 2610 is included in the CNN predicting unit 2600, it should be noted that this is exemplary and the present embodiment is not limited thereto. That is, the CNN setting unit 2610 may be implemented as a separate unit from the CNN prediction unit 2600, or may be integrated with the CNN execution unit 2630 and implemented as a unit.
  • The CNN execution unit 2630 executes the CNN using the input data and the filter coefficients set by the CNN setting unit 2610 to refine the motion vector or the prediction pixels of the current block, thereby finally generating the motion vector or the prediction pixels of the current block.
  • the generated prediction pixels may be transferred to an adder and added to the residual block to be used to recover the current block.
  • FIGS. 27A and 27B are flowcharts illustrating a process of performing inter prediction by a CNN predictor included in the image encoding apparatus illustrated in FIG. 22.
  • FIG. 27A illustrates a CNN based inter prediction process according to the first embodiment
  • FIG. 27B illustrates a CNN based inter prediction process according to the second embodiment.
  • the CNN setting unit 2210 may set image data (ie, input data) to be input to an input layer of the CNN in order to perform an inference process of the CNN.
  • the input data may include a search region in the reference picture.
  • the input data may further include at least one of a reconstructed peripheral region adjacent to the current block in the current picture and a motion vector of the peripheral region.
  • the input data may further include hint information, such as time axis distance information between the current picture and the reference picture, in order to improve the accuracy of inter prediction.
  • the CNN setting unit 2210 may set filter coefficients (that is, coefficient values of a convolution kernel) to be applied to the CNN.
  • the filter coefficient may be a value calculated by performing a learning process of the CNN on the input data set in step S2711.
  • the filter coefficient may be a value selected from a preset specific value or a set consisting of a plurality of preset specific values.
  • the CNN execution unit 2230 may execute the CNN using the input data and the filter coefficients set in operation S2711 to generate a motion vector of the current block or directly generate prediction pixels of the current block.
  • the generated prediction pixels may be transferred to a subtractor and used to generate a residual block from the current block.
  • In step S2731, the CNN prediction unit 2200 may generate a motion vector or prediction pixels of the current block according to existing inter prediction schemes (i.e., motion estimation (ME) and motion compensation (MC)).
  • the CNN prediction unit 2200 determines a reference picture encoded and decoded before the current picture, and searches for reference pixels most similar to the current block in the determined reference picture.
  • the CNN predictor 2200 may generate motion vectors or prediction pixels of the current block by using the found reference pixels.
  • the CNN setting unit 2210 may set the motion vector or the prediction pixels of the current block generated in operation S2731 as image data (that is, input data) to be input to the input layer of the CNN.
  • the CNN setting unit 2210 may set filter coefficients (that is, coefficient values of a convolution kernel) to be applied to the CNN.
  • the filter coefficient may be calculated by performing a learning process of the CNN on the set input data.
  • the input data may include a predetermined search area in the reference picture.
  • the input data may further include at least one of a reconstructed peripheral region adjacent to the current block in the current picture and a motion vector of the peripheral region.
  • the input data may further include hint information for improving the accuracy of inter prediction, for example, at least one of time axis distance information between a current picture and a reference picture and a quantization parameter (QP).
  • the filter coefficient may be a value selected from a preset specific value or a set consisting of a plurality of preset specific values.
  • In step S2735, the CNN execution unit 2230 executes the CNN using the input data and the filter coefficients set in step S2733 to refine the motion vector or the prediction pixels of the current block generated in step S2731, thereby finally generating the motion vector or prediction pixels of the current block.
  • the generated prediction pixels may be transferred to a subtractor of the image encoding apparatus and used to generate a residual block from the current block.
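  • As an editorial illustration of the second embodiment (assumptions: PyTorch, the name RefineCNN, and the choice of predicting a correction that is added back to the motion-compensated prediction, which is one possible realization of "refinement"), the sketch below refines a conventional ME/MC prediction with a CNN as in steps S2731 to S2735.

```python
# Hypothetical sketch of the second embodiment: a conventional ME/MC prediction
# of the current block is refined by the CNN before being passed to the subtractor.
import torch
import torch.nn as nn

class RefineCNN(nn.Module):
    def __init__(self, in_ch=2, mid_ch=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, 1, 3, padding=1),
        )

    def forward(self, mc_pred, neighbors):
        x = torch.cat([mc_pred, neighbors], dim=1)
        return mc_pred + self.body(x)            # refined prediction pixels

mc_pred = torch.rand(1, 1, 32, 32)               # prediction pixels from motion compensation (step S2731)
neighbors = torch.rand(1, 1, 32, 32)             # reconstructed peripheral region used as extra evidence
refined = RefineCNN()(mc_pred, neighbors)        # final prediction passed to the subtractor
```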
  • FIGS. 28A and 28B are flowcharts illustrating a process of performing inter prediction by a CNN predictor included in the image decoding apparatus illustrated in FIG. 26.
  • FIG. 28A illustrates a CNN based inter prediction process according to the first embodiment
  • FIG. 28B illustrates a CNN based inter prediction process according to the second embodiment.
  • the CNN setting unit 2610 may set image data (ie, input data) to be input to an input layer of the CNN to perform an inference process of the CNN.
  • the input data may include a search region in the reference picture determined based on the reference picture selection information signaled from the image encoding apparatus.
  • the input data may further include at least one of a reconstructed peripheral area of the current block in the current picture and a motion vector of the peripheral area.
  • the input data may further include hint information to improve the accuracy of inter prediction.
  • the CNN setting unit 2610 may set filter coefficients (that is, coefficient values of a convolution kernel) to be applied to the CNN.
  • the filter coefficient may be a value signaled from the image encoding apparatus.
  • the filter coefficient may be a value selected by the image encoding apparatus in a set consisting of a predetermined specific value or a plurality of predetermined specific values.
  • the CNN execution unit 2630 may generate a motion vector or prediction pixels of the current block by executing the CNN using the input data and the filter coefficients set in operation S2811.
  • the generated prediction pixels may be transferred to an adder of the image decoding apparatus and added to the residual block to be used to reconstruct the current block.
  • the CNN prediction unit 2600 may generate the motion vector or prediction pixels of the current block according to existing inter prediction schemes (i.e., motion estimation (ME) and motion compensation (MC)).
  • the CNN predictor 2600 may determine a motion vector of the current block and a reference picture referenced by the motion vector by using a syntax element for the inter prediction mode extracted from the decoder.
  • the CNN predictor 2600 may generate a motion vector or prediction pixels of the current block by predicting the current block by using the determined motion vector and the reference picture.
  • the CNN setting unit 2610 may set motion vector or prediction pixels of the current block generated in operation S2831 as image data (that is, input data) to be input to the input layer of the CNN.
  • the CNN setting unit 2610 may set filter coefficients (that is, coefficient values of a convolution kernel) to be applied to the CNN.
  • the filter coefficient may be a value signaled from the image encoding apparatus.
  • the filter coefficient may be a value selected by the image encoding apparatus in a set consisting of a predetermined specific value or a plurality of predetermined specific values.
  • In step S2835, the CNN execution unit 2630 executes the CNN using the filter coefficients set in step S2833 to refine the motion vector or prediction pixels of the current block generated in step S2831, thereby finally generating the motion vector or prediction pixels of the current block.
  • the generated prediction pixels may be transferred to an adder of the image decoding apparatus and added to the residual block to be used to reconstruct the current block.
  • Some techniques of this disclosure relate to a technique for performing CNN based filtering on a reference region in order to minimize the quantization error of the reference region used for intra prediction of the current block.
  • That is, instead of directly generating the prediction block of the current block based on the CNN, the accuracy of prediction may be greatly improved, without changing the existing intra prediction structure, by filtering the peripheral region used for intra prediction of the current block.
  • FIG. 29 is a flowchart illustrating a process of calculating a filter coefficient of a CNN according to an embodiment of the present invention.
  • the image encoding apparatus may set input data of a CNN.
  • the input data may include a reference region encoded before the current block, which is a block to be encoded.
  • the reference region may comprise at least one peripheral block that is adjacent to the current block and belongs to a component encoded before the component to be encoded among the luma block and chroma blocks constituting the current block. That is, the peripheral area may be an area of the same component as the current block or an area of another component. In addition, the reference region may further include a new region (i.e., an average block, an average line, or an average pixel) generated by averaging pixel values of the peripheral region.
  • the peripheral area may be configured in pixel units (ie, peripheral lines or peripheral pixels) or in block units (ie, peripheral blocks).
  • FIG. 30 is an exemplary diagram of a peripheral region that may be set as input data of a CNN. Specifically, FIG. 30A illustrates a peripheral region in pixel units, and FIG. 30B illustrates a peripheral region in block units.
  • the peripheral area in units of pixels (i.e., peripheral pixels or peripheral lines) may include '1×1' pixels adjacent to the current block X, and may include lines of '1×n' or 'n×1'.
  • the peripheral area in units of blocks (that is, the neighboring blocks) may include a left block (C), an upper block (B), an upper right block (D), a lower left block (E), and an upper left block (A) adjacent to the current block (X).
  • Hereinafter, the original blocks (i.e., uncoded blocks), the prediction blocks, and the reconstructed blocks of the neighboring blocks are denoted differently: for a neighboring block A, the original block is denoted by 'Ao', the prediction block by 'Ap', and the reconstructed block by 'Ar'.
  • an average block obtained by averaging pixel values of the neighboring blocks A, B, C, D, and E is denoted by 'F'. Since the periphery area of the block unit has a wider application range of the convolution kernel than the periphery area of the pixel unit, setting the periphery area of the block unit as the input data of the CNN can improve the accuracy of the output data.
  • In addition to the neighboring blocks Ar, Br, Cr, Dr, and Er shown in FIG. 30, one or more blocks (not shown) among the right block, the lower block, and the lower right block of the current block X may be further input to the CNN.
  • the accuracy of the output data may be further improved by further inputting one or more blocks among the right block, the lower block, and the right lower block of the current block of the luma component that have already been encoded to the CNN.
  • the present embodiment will be described on the premise that the reference area is composed of a peripheral area of a block unit, that is, one or more peripheral blocks.
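  • The following sketch is an editorial illustration (not part of the original disclosure; the helper name gather_neighbor_blocks is hypothetical, and all five neighbors are assumed to be available with the same size as the current block): it collects the reconstructed neighboring blocks Ar, Br, Cr, Dr and Er around the current block X and forms the average block F by averaging their pixel values.

```python
# Hypothetical helper: collect the reconstructed neighbouring blocks of the
# n x n current block X and form the average block F from their pixel values.
import numpy as np

def gather_neighbor_blocks(recon, x0, y0, n):
    """recon: reconstructed picture; (x0, y0): top-left corner of the current block X."""
    A = recon[y0 - n:y0,         x0 - n:x0]          # upper-left block (Ar)
    B = recon[y0 - n:y0,         x0:x0 + n]          # upper block (Br)
    C = recon[y0:y0 + n,         x0 - n:x0]          # left block (Cr)
    D = recon[y0 - n:y0,         x0 + n:x0 + 2 * n]  # upper-right block (Dr)
    E = recon[y0 + n:y0 + 2 * n, x0 - n:x0]          # lower-left block (Er)
    F = np.mean(np.stack([A, B, C, D, E]), axis=0)   # average block F
    return [A, B, C, D, E, F]

recon = np.random.randint(0, 256, (128, 128)).astype(np.float32)  # placeholder picture
blocks = gather_neighbor_blocks(recon, x0=64, y0=64, n=8)
print([b.shape for b in blocks])   # five 8x8 neighbours plus the 8x8 average block
```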
  • the input data may be composed of at least one layer and input to the CNN.
  • neighboring blocks Ar, Br, Cr, Dr, and Er may be configured as separate layers and input to the CNN.
  • all or part of the neighboring blocks (Ar and Br) may be integrated into a multiplier or the like and may be configured as a single layer and input to the CNN.
  • the input data may further include additional information to improve the output accuracy of the CNN.
  • the additional information may include all encoding related information that can be referred to by the image encoding / decoding apparatus.
  • the additional information may include at least one of a quantization parameter (QP) value of the surrounding area, a quantization parameter (QP) value of the current block (in the case of an image decoding apparatus), and information about a residual of the surrounding area.
  • the information about the residual of the peripheral region may include an absolute value of each of the transform coefficients of the corresponding peripheral region or an absolute sum of all the transform coefficients in the frequency domain.
  • the information on the residual of the peripheral region may include an absolute value of each of the residual signals of the corresponding peripheral region or an absolute sum of all residual signals in the spatial domain.
  • For a peripheral area in line units, a convolution kernel of 'n×1×k' or '1×n×k' may be applied to the input layer of the CNN.
  • For a peripheral area in block units, a convolution kernel of 'n×m×k' may be applied to the input layer of the CNN.
  • the chroma block may be used at its original size or may be up-scaled to the same size as the luma block by using an upsampling layer.
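  • A minimal sketch of how the input layers could be assembled is given below (an editorial illustration assuming PyTorch; bilinear interpolation is used here as one possible realization of the upsampling layer, and the sizes are placeholders): one channel per reconstructed neighboring block, a constant plane carrying the quantization parameter as additional information, and the chroma region up-scaled to the luma size.

```python
# Hypothetical composition of the CNN input: one channel per reconstructed
# neighbouring block, a constant QP plane as additional information, and a
# chroma block up-scaled to the luma size before being stacked as a channel.
import torch
import torch.nn.functional as F

luma_neighbors = [torch.rand(1, 1, 16, 16) for _ in range(5)]   # Ar, Br, Cr, Dr, Er (luma)
chroma_block = torch.rand(1, 1, 8, 8)                           # co-located chroma region (4:2:0)
qp_value = 32.0

chroma_up = F.interpolate(chroma_block, size=(16, 16), mode='bilinear',
                          align_corners=False)                  # upsampling layer
qp_plane = torch.full((1, 1, 16, 16), qp_value)                 # additional information channel

input_layers = torch.cat(luma_neighbors + [chroma_up, qp_plane], dim=1)
print(input_layers.shape)   # torch.Size([1, 7, 16, 16]); an 'n x m x k' kernel slides over this
```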
  • the apparatus for encoding an image may set an output label to be used for supervised learning of the CNN.
  • the output label means an explicit correct answer to the input data set in step S2910 and is used to calculate a squared error through comparison with the output data of the CNN.
  • the output label may be the original pixel values of the peripheral region set as input data, or the pixel values of another component of the peripheral region that was decoded using a quantization parameter (QP) value smaller than the QP value applied to the peripheral region.
  • the output data refers to data output through the output layer as a result of executing the CNN, and may be pixel values obtained by restoring the pixel values of the peripheral area set as input data to a level before quantization (hereinafter referred to as the 'restored peripheral area').
  • the input data and output data of the CNN should be basically the same in the learning process and the inference process.
  • Examples of CNN layer configurations (input layer data / output layer data / output layer label):
  • Example 1: the input layer data is the peripheral area in line units plus additional information; the output layer data is the restored peripheral area, or residual information of the restored peripheral area; the output layer label is the original pixel values of the peripheral region, or pixel values of a component previously decoded using a QP smaller than the current QP among the components of the peripheral region.
  • Example 2: the input layer data is the peripheral area in block units plus additional information; the output layer data and the output layer label are the same as in Example 1.
  • Example 3: the input layer data is the peripheral area in line or block units plus additional information; the output layer data and the output layer label are the same as in Example 1.
  • the data (input data) of the input layer may be composed of a peripheral area and additional information in line units and / or block units.
  • the data (output data) of the output layer may be a peripheral area reconstructed so as to approximate the original pixel values before quantization of the peripheral area set as the input data, or residual information (pixel values) of that peripheral area.
  • the label of the output layer may be the original pixel values of the peripheral area set as input data, or the pixel values of a component region, among the luma block and chroma blocks constituting the peripheral area, that was decoded using a quantization parameter (QP) smaller than that of the component region set as input data.
  • the image encoding apparatus may calculate the filter coefficients of the CNN by repeatedly performing the supervised learning process of the CNN using an error backpropagation algorithm, based on the input data set in operation S2910 and the output label set in operation S2920.
  • the image encoding apparatus may calculate a filter coefficient for each of the Y, Cb, and Cr components, or may calculate one filter coefficient commonly applied to all of the Y, Cb, and Cr components.
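  • The supervised learning step of FIG. 29 can be sketched as follows (an editorial illustration; the network depth, optimizer, learning rate, and the random tensors standing in for real training pairs are all assumptions): the CNN is trained by error backpropagation against the squared error between its output and the output label, and the learned convolution weights serve as the filter coefficients.

```python
# Minimal sketch of the supervised learning step (S2930): the CNN is trained so
# that its output approximates the original pixel values of the peripheral
# region; the trained kernel weights play the role of the filter coefficients.
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(16, 1, 3, padding=1),
)
optimizer = torch.optim.Adam(cnn.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()                      # squared error against the output label

for step in range(1000):                    # repeated learning iterations
    # Placeholders: in practice these come from reconstructed / original pictures.
    recon_region = torch.rand(8, 1, 16, 16)          # reconstructed peripheral area (input data)
    qp_plane = torch.full_like(recon_region, 32.0)   # additional information (QP)
    label = torch.rand(8, 1, 16, 16)                 # original pixels of the peripheral area (output label)

    output = cnn(torch.cat([recon_region, qp_plane], dim=1))
    loss = loss_fn(output, label)
    optimizer.zero_grad()
    loss.backward()                         # error backpropagation
    optimizer.step()

filter_coefficients = [p.detach() for p in cnn.parameters()]   # stored or signalled, per component or shared
```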
  • the CNN-based filter unit 3200 may include a CNN setting unit 3210 and a CNN execution unit 3230.
  • the structure of the CNN is as described above with reference to FIG. 5, and the CNN may further include an upsampling layer or a pooling layer to adjust the size of the layer.
  • the CNN setting unit 3210 may set input data of the CNN.
  • the input data may include a reference area reconstructed before the current block.
  • the reference region is a block of a component encoded before the block of the component to be encoded among the luma block and chroma blocks constituting the current block, and adjacent to the current block. May comprise at least one block). That is, the peripheral area may be an area of the same component as the current block or may be an area of another component.
  • the reference region may further include a new region (ie, an average block, an average line, or an average pixel) generated by averaging pixel values of the peripheral region.
  • the peripheral area may be configured in pixel units (ie, peripheral lines or peripheral pixels) or in block units (ie, peripheral blocks). Since the periphery area of the block unit is wider in the application range of the convolution kernel than the periphery area of the pixel unit, the accuracy of the output data can be improved by setting the periphery area of the block unit as input data of the CNN.
  • the input data may be composed of at least one layer and input to the CNN.
  • the input data may be composed of one layer for each neighboring block and input to the CNN.
  • the input data may be composed of one layer in which all or part of the neighboring blocks (Ar and Br) are integrated and input to the CNN.
  • the input data may further include additional information to improve the accuracy of the CNN learning process and the CNN inference process.
  • the additional information may include at least one of a quantization parameter (QP) value of the reference region, a quantization parameter value of the current block (in the case of an image decoding apparatus), and information about a residual of the reference region.
  • the information about the residual of the reference region may be the absolute values or the absolute sums of the transform coefficients of the reference region in the frequency domain, and may be the respective absolute values or the absolute sum of the residual signals in the spatial domain.
  • the additional information may further include intra prediction mode information (eg, directional information of intra prediction) of the current block.
  • For a peripheral area in line units, a convolution kernel of 'n×1×k' or '1×n×k' may be applied to the input layer of the CNN.
  • For a peripheral area in block units, a convolution kernel of 'n×m×k' may be applied to the input layer of the CNN.
  • the CNN setting unit 3210 may set the filter coefficient calculated by the image encoding apparatus as a filter coefficient for intra prediction of the current block.
  • the filter coefficient may be a value calculated by a video encoding apparatus in a predetermined unit, for example, a CU unit or a frame unit.
  • When the filter coefficient is set in units of frames, the filter coefficient may be commonly used for intra prediction of a plurality of current blocks included in the frame.
  • Alternatively, the CNN setting unit 3210 may set the filter coefficient for intra prediction of the current block from a filter coefficient set, based on the index information of the filter coefficient transmitted from the image encoding apparatus.
  • Although FIG. 32 illustrates that the CNN setting unit 3210 is included in the CNN-based filter unit 3200, it should be noted that this is exemplary and the present embodiment is not limited thereto. That is, the CNN setting unit 3210 may be implemented as a separate unit from the CNN-based filter unit 3200, or may be integrated with the CNN execution unit 3230 and implemented as one unit.
  • the CNN execution unit 3230 may generate the output data, that is, a prediction block of the current block, by performing a CNN-based inference process on the input data using the filter coefficients set by the CNN setting unit 3210 (that is, the coefficient values of the convolution kernel).
  • the generated prediction block may be i) delivered to the subtractor in the case of the image encoding apparatus and used to generate a residual block of the current block, or ii) delivered to the adder in the case of the image decoding apparatus and added to the residual block of the current block to be used to reconstruct the current block.
  • FIG. 33 is a flowchart illustrating a filtering process of a reference region according to an aspect of the present embodiment.
  • the intra predictor may determine an intra prediction mode to be used for encoding or decoding a current block.
  • the intra prediction mode may include a plurality of modes according to the prediction direction as described above with reference to FIG. 3.
  • the intra prediction mode may include 65 directional modes and non-directional modes including a planar mode and a DC mode.
  • the intra prediction unit may select a reference region to be used for intra prediction of the current block according to the determined intra prediction mode. That is, the reference region may be configured differently according to the intra prediction mode of the current block.
  • the reference region may include a restored peripheral region adjacent to the current block as described above with reference to FIG. 29.
  • the reference region may further include additional information related to intra prediction of the current block.
  • the intra predictor may determine whether to perform filtering on the reference region selected in operation S3310 by determining whether the preset filtering condition is satisfied.
  • the reference region selected for intra prediction of the current block may have a quantization error with original pixel values while undergoing quantization / dequantization. Since the quantization error causes a decrease in the accuracy of intra prediction, it is necessary to filter the reference region before performing the intra prediction on the current block in order to minimize the quantization error. However, filtering the reference region does not guarantee minimization of the quantization error, and there is a problem that the complexity of the image encoding and decoding process may increase due to the filtering. Therefore, the intra prediction unit according to the present embodiment may adaptively perform filtering on the reference region to be used for intra prediction of the current block only under specific conditions.
  • the filtering condition may be set based on the size of each of the reference regions selected for intra prediction of the current block.
  • the intra predictor may perform filtering on the neighbor block only when the size of the neighbor block included in the reference region is greater than or equal to '4×4'.
  • the filtering condition may be set based on the intra prediction mode of the current block and the size of the current block. For example, when the intra prediction mode is the DC mode, filtering is not performed on the reference region regardless of the size of the current block, and when the intra prediction mode is the directional mode having the prediction direction of 'Vertical-Right', filtering of the reference region may be performed only when the size of the current block is greater than or equal to '8×8'. However, it should be noted that this is exemplary and does not limit the present embodiment.
  • the intra predictor may adaptively filter the reference region selected for intra prediction of the current block, thereby minimizing the increase in the complexity of the image encoding and decoding process while improving the accuracy of the intra prediction.
  • the intra prediction unit may determine to perform filtering on the reference region selected for intra prediction of the current block, and proceed to step S3330.
  • the intra prediction unit may determine not to perform filtering on the reference region selected for intra prediction of the current block, and proceed to operation S3340.
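  • The filtering decision of step S3320 can be sketched as a simple condition check (an editorial illustration; only the '4×4' neighbor-size rule and the DC / 'Vertical-Right' examples given above are taken from the text, while the default behavior for other modes is an assumption):

```python
# Hypothetical decision function for step S3320 (and S3420): filtering of the
# reference region is applied only when the preset conditions are satisfied.
def should_filter_reference(neighbor_w, neighbor_h, intra_mode, cur_w, cur_h):
    # Condition 1: each neighbouring block must be at least 4x4.
    if neighbor_w < 4 or neighbor_h < 4:
        return False
    # Condition 2: mode- and size-dependent rules (examples from the text).
    if intra_mode == 'DC':                       # never filter in DC mode
        return False
    if intra_mode == 'Vertical-Right':           # filter only for 8x8 or larger blocks
        return cur_w >= 8 and cur_h >= 8
    return True                                  # default behaviour is an assumption

print(should_filter_reference(4, 4, 'Vertical-Right', 8, 8))   # True
print(should_filter_reference(4, 4, 'DC', 16, 16))             # False
```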
  • the intra predictor may perform filtering on the reference region selected in operation S3310 to generate a reference region whose pixel values are restored to a level before quantization (hereinafter referred to as the 'filtered reference region').
  • a CNN based filter may be used to filter the corresponding reference region.
  • a process of filtering the reference region by using the CNN-based filter will be described in detail.
  • the intra prediction unit may set the input data and the filter coefficient of the CNN to perform CNN-based filtering on the reference region.
  • the input data of the CNN is set to the reference region to be used for intra prediction of the current block, that is, the reference region selected in step S3310.
  • the filter coefficient of the CNN is set to the filter coefficient calculated through the supervised learning process of the image encoding apparatus.
  • the filter coefficients calculated by the image encoding apparatus may be signaled to the image decoding apparatus through the bitstream and used in the CNN-based filtering process of the image decoding apparatus.
  • the filter coefficient stored in each apparatus may be used in the CNN-based filtering process without additional signaling.
  • when the filter coefficients are preset with a plurality of specific values to form one set and the set is stored in both the image encoding apparatus and the image decoding apparatus, selection information of the filter coefficient selected by the image encoding apparatus (for example, index information of the filter coefficient in the set) may be signaled, and the specific filter coefficient in the set indicated by the index information may be used in the CNN-based filtering process of the image decoding apparatus.
  • the filter coefficient may be previously set to a plurality of specific values according to the quantization parameter value to configure one set, and a specific example thereof is shown in Table 4.
  • Since the contents of Table 4 are exemplary, it should be noted that the present embodiment is not limited thereto.
  • Table 4. Filter coefficient sets according to quantization parameter (QP) range:
  • Group 1 (G01): QP 0 to 10, filter coefficients W00, W01, W02, W03
  • Group 2 (G02): QP 11 to 20, filter coefficients W04, W05, W06, W07
  • Group 3 (G03): QP 21 to 30, filter coefficients W08, W09, W10, W11
  • Group 4 (G04): QP 31 to 40, filter coefficients W12, W13, W14, W15
  • Group 5 (G05): QP 41 and above, filter coefficients W16, W17, W18, W19
  • For example, when the quantization parameter value falls within the range of Group 1 (QP 0 to 10), the filter coefficients 'W00 to W03' set for that group may be used in the CNN-based filtering process for the reference region. Likewise, when the quantization parameter value falls within the range of Group 5 (QP 41 and above), the filter coefficients 'W16 to W19' set for that group may be used in the CNN-based filtering process for the reference region.
  • the group index information 'G01 to G05' corresponding to the filter coefficient selected by the image encoding apparatus may be signaled to the image decoding apparatus through a bitstream and used for the CNN based filtering process of the image decoding apparatus.
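  • The group selection of Table 4 can be sketched as a lookup shared by the encoder and the decoder (an editorial illustration; the upper QP bound assumed for Group 5 is an assumption, and the W00 to W19 entries are the placeholder names from the table):

```python
# Hypothetical lookup mirroring Table 4: the filter-coefficient set is chosen
# by the quantization parameter range, and only the group index (G01..G05) is
# signalled to the decoder, which holds the same table.
FILTER_SETS = {
    'G01': (range(0, 11),  ['W00', 'W01', 'W02', 'W03']),
    'G02': (range(11, 21), ['W04', 'W05', 'W06', 'W07']),
    'G03': (range(21, 31), ['W08', 'W09', 'W10', 'W11']),
    'G04': (range(31, 41), ['W12', 'W13', 'W14', 'W15']),
    'G05': (range(41, 64), ['W16', 'W17', 'W18', 'W19']),   # upper bound assumed
}

def select_filter_group(qp):
    for group_index, (qp_range, coeffs) in FILTER_SETS.items():
        if qp in qp_range:
            return group_index, coeffs
    raise ValueError('QP outside of the configured groups')

group_index, coeffs = select_filter_group(qp=25)
print(group_index, coeffs)   # 'G03', ['W08', ...]; the group index is signalled in the bitstream
```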
  • the intra prediction unit may execute the CNN to which the filter coefficient set in operation S3332 is applied to perform the inference process on the input data to generate output data.
  • the generated output data may be a filtered reference region in which pixel values of the reference region set as input data are restored to a level before quantization.
  • the intra predictor may perform intra prediction on the current block by using the reference region.
  • the reference region used for intra prediction may be an unfiltered reference region selected in step S3310.
  • the reference region used for intra prediction may be a reference region filtered to a level before quantization in step S3330.
  • the intra predictor may perform intra prediction on the current block using the pixel values of the filtered reference region as they are. Alternatively, the intra predictor may perform intra prediction on the current block using pixel values (that is, weighted average values) calculated by applying preset weights to the pixel values of the filtered reference region and the pixel values of the reference region before filtering.
  • the weight may be an experimentally determined value in order to improve the accuracy of the intra prediction result and the image encoding and decoding efficiency.
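  • The weighted combination described above can be sketched as follows (an editorial illustration; the weight value 0.75 is only illustrative, since the text states that the weight is determined experimentally):

```python
# Hypothetical weighted combination of step S3340: intra prediction may use a
# weighted average of the filtered reference pixels and the pixels before filtering.
import numpy as np

def blend_reference(filtered_ref, unfiltered_ref, w=0.75):
    """Returns w * filtered + (1 - w) * unfiltered, element-wise."""
    return w * filtered_ref + (1.0 - w) * unfiltered_ref

unfiltered = np.array([100., 104., 96., 90.])   # reference line before filtering
filtered = np.array([101., 102., 98., 92.])     # reference line after CNN-based filtering
print(blend_reference(filtered, unfiltered))    # pixels actually used for intra prediction
```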
  • FIG. 34 is a flowchart illustrating a filtering process of a reference region according to another aspect of the present embodiment.
  • the filtering process of FIG. 34 differs from the filtering process of FIG. 33 in a specific method of filtering the reference region, and a description of overlapping portions will be omitted or briefly described below.
  • the intra predictor may determine an intra prediction mode to be used to encode or decode a current block.
  • the intra prediction unit may select a reference region to be used for intra prediction of the current block according to the determined intra prediction mode.
  • the reference region may include a restored peripheral region adjacent to the current block.
  • the reference region may further include additional information related to intra prediction of the current block.
  • the intra predictor may determine whether to perform the filtering on the reference region selected in operation S3410 by determining whether the preset filtering condition is satisfied.
  • the reference region selected for intra prediction of the current block may have a quantization error with original pixel values while undergoing quantization / dequantization. Accordingly, in order to minimize such quantization error and not to greatly increase the complexity of the image encoding / decoding process, the intra prediction unit may adaptively filter the reference region only under specific conditions.
  • the filtering condition may be set based on the size of each of the reference regions selected for intra prediction of the current block.
  • the intra predictor may perform filtering on the neighbor block only when the size of the neighbor block included in the reference region is greater than or equal to '4×4'.
  • the filtering condition may be set based on the intra prediction mode of the current block and the size of the current block. For example, when the intra prediction mode is the DC mode, filtering is not performed on the reference region regardless of the size of the current block, and when the intra prediction mode is the directional mode having the prediction direction of 'Vertical-Right', filtering of the reference region may be performed only when the size of the current block is greater than or equal to '8×8'. However, it should be noted that this is exemplary and does not limit the present embodiment.
  • the intra prediction unit may determine to perform filtering on the reference region selected for intra prediction of the current block, and proceed to operation S3430.
  • the intra prediction unit may determine not to perform filtering on the reference region selected for intra prediction of the current block, and proceed to operation S3440.
  • the intra predictor may perform filtering on the reference region selected in operation S3410 to generate a reference region whose pixel values are restored to a level before quantization (hereinafter referred to as the 'filtered reference region').
  • a low pass filter (e.g., a 2-tap filter or a 3-tap filter) may be used to filter the corresponding reference region.
  • a CNN-based filter may be used to filter the corresponding reference region.
  • the filtering of the reference region may be performed by first filtering using a low pass filter (or a CNN-based filter) and then secondly filtering using a CNN-based filter (or a low pass filter).
  • a process of filtering the reference region using the low pass filter and the CNN-based filter will be described in detail.
  • the intra predictor may first filter the reference region using the low pass filter. Since a method for filtering the reference region using the low pass filter is obvious to those skilled in the art, a detailed description thereof will be omitted.
  • the intra predictor may secondarily filter the reference region filtered in operation S3432 using the CNN-based filter.
  • the input data of the CNN is set to the reference region to be used for intra prediction of the current block, that is, the reference region filtered in step S3432.
  • the filter coefficient of the CNN is set to the filter coefficient calculated through the supervised learning process of the image encoding apparatus.
  • the filter coefficients calculated by the image encoding apparatus may be signaled to the image decoding apparatus through the bitstream and used in the CNN-based filtering process of the image decoding apparatus.
  • the filter coefficient stored in each apparatus may be used in the CNN-based filtering process without additional signaling.
  • when the filter coefficients are preset with a plurality of specific values to form one set and the set is stored in both the image encoding apparatus and the image decoding apparatus, selection information of the filter coefficient selected by the image encoding apparatus (for example, index information of the filter coefficient in the set) may be signaled, and the specific filter coefficient in the set indicated by the index information may be used in the CNN-based filtering process of the image decoding apparatus.
  • the filter coefficient may be previously set to a plurality of specific values according to the range of the quantization parameter value to configure one set, and a specific example thereof is as described above with reference to Table 4.
  • the intra predictor may generate output data by executing an inference process on input data by executing a CNN to which a specific filter coefficient is applied.
  • the generated output data may be a filtered reference region in which pixel values of the reference region set as input data are restored to a level before quantization.
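  • The two-stage filtering of FIG. 34 can be sketched as follows (an editorial illustration assuming PyTorch; the [1 2 1]/4 kernel is a common 3-tap low pass choice, and the single untrained convolution merely stands in for the trained CNN-based filter):

```python
# Hypothetical two-stage filtering: the reference line is first smoothed with a
# 3-tap [1 2 1]/4 low pass filter (S3432) and then passed through a placeholder
# CNN-based filter (S3434); the order of the two stages may also be reversed.
import torch
import torch.nn as nn
import torch.nn.functional as F

def low_pass_3tap(ref_line):
    # ref_line: tensor of shape (1, 1, L); edge samples are replicated.
    kernel = torch.tensor([[[1.0, 2.0, 1.0]]]) / 4.0
    padded = F.pad(ref_line, (1, 1), mode='replicate')
    return F.conv1d(padded, kernel)

cnn_filter = nn.Conv1d(1, 1, kernel_size=3, padding=1)   # stands in for the trained CNN stage

ref_line = torch.rand(1, 1, 33)                 # reconstructed reference line (placeholder samples)
stage1 = low_pass_3tap(ref_line)                # first filtering with the low pass filter
stage2 = cnn_filter(stage1)                     # second, CNN-based filtering
```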
  • the intra predictor may perform intra prediction on the current block by using the reference region.
  • the reference region used for intra prediction may be an unfiltered reference region selected in step S3410.
  • the reference region used for intra prediction may be a reference region filtered to a level before quantization in step S3430.
  • the intra predictor may perform intra prediction on the current block using the pixel values of the filtered reference region as they are. Alternatively, the intra predictor may perform intra prediction on the current block using pixel values (that is, weighted average values) calculated by applying preset weights to the pixel values of the filtered reference region and the pixel values of the reference region before filtering.
  • the weight may be an experimentally determined value in order to improve the accuracy of the intra prediction result and the image encoding and decoding efficiency.
  • the computer-readable recording medium includes all kinds of recording devices in which data that can be read by a computer system is stored. That is, computer-readable recording media include storage media such as magnetic storage media (eg, ROM, floppy disk, hard disk, etc.), and optical reading media (eg, CD-ROM, DVD, etc.).
  • the computer readable recording medium can also be distributed over network coupled computer systems so that the computer readable code is stored and executed in a distributed fashion.

Abstract

The present invention relates to image encoding or decoding and, more specifically, to an apparatus and method for applying an artificial neural network (ANN) to image encoding or decoding. The apparatus and method of the present invention are characterized by applying a CNN-based filter coefficient to a first picture and to a quantization parameter map and/or a block partition map to produce a second picture.
PCT/KR2019/002654 2018-02-23 2019-03-07 Appareil et procédé destinés à appliquer un réseau neuronal artificiel à un codage ou décodage d'image WO2019194425A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US17/064,304 US11265540B2 (en) 2018-02-23 2020-10-06 Apparatus and method for applying artificial neural network to image encoding or decoding
US17/576,000 US20220141462A1 (en) 2018-02-23 2022-01-14 Apparatus and method for applying artificial neural network to image encoding or decoding

Applications Claiming Priority (10)

Application Number Priority Date Filing Date Title
KR20180040588 2018-04-06
KR10-2018-0040588 2018-04-06
KR1020180072499A KR102648464B1 (ko) 2018-06-25 2018-06-25 지도 학습을 이용한 영상 개선 방법 및 장치
KR10-2018-0072499 2018-06-25
KR10-2018-0072506 2018-06-25
KR1020180072506A KR20200000548A (ko) 2018-06-25 2018-06-25 Cnn 기반의 영상 부호화 또는 복호화 장치 및 방법
KR10-2018-0081123 2018-07-12
KR1020180081123A KR102668262B1 (ko) 2018-07-12 Cnn 기반의 영상 부호화 또는 복호화 장치 및 방법
KR1020180099166A KR20190117352A (ko) 2018-04-06 2018-08-24 영상 부호화 또는 복호화 장치 및 방법
KR10-2018-0099166 2018-08-24

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/064,304 Continuation US11265540B2 (en) 2018-02-23 2020-10-06 Apparatus and method for applying artificial neural network to image encoding or decoding

Publications (1)

Publication Number Publication Date
WO2019194425A1 true WO2019194425A1 (fr) 2019-10-10

Family

ID=68101048

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2019/002654 WO2019194425A1 (fr) 2018-02-23 2019-03-07 Appareil et procédé destinés à appliquer un réseau neuronal artificiel à un codage ou décodage d'image

Country Status (1)

Country Link
WO (1) WO2019194425A1 (fr)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20130124517A (ko) * 2010-12-21 2013-11-14 인텔 코포레이션 고효율 비디오 코딩을 위한 콘텐츠 적응적 장애 보상 필터링
WO2017036370A1 (fr) * 2015-09-03 2017-03-09 Mediatek Inc. Procédé et appareil de traitement basé sur un réseau neuronal dans un codage vidéo
WO2017178827A1 (fr) * 2016-04-15 2017-10-19 Magic Pony Technology Limited Post-filtrage en boucle destiné au codage et décodage vidéo
KR20180001428A (ko) * 2016-06-24 2018-01-04 한국과학기술원 Cnn 기반 인루프 필터를 포함하는 부호화 방법과 장치 및 복호화 방법과 장치

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LULU ZHOU: "Convolutional Neural Network Filter (CNNF) for intra frame", JVET-I0022 (VERSION 4), JOINT VIDEO EXPLORATION TEAM (JVET) OF ITU-T SG 16 WP 3, 24 January 2018 (2018-01-24), Gwangju, Korea, pages 1 - 9, XP030151131 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113052924A (zh) * 2019-12-27 2021-06-29 无锡祥生医疗科技股份有限公司 用于超声图像编解码间的画质补偿方法及其卷积神经网络
CN113259671A (zh) * 2020-02-10 2021-08-13 腾讯科技(深圳)有限公司 视频编解码中的环路滤波方法、装置、设备及存储介质
CN111432208B (zh) * 2020-04-01 2022-10-04 山东浪潮科学研究院有限公司 一种利用神经网络确定帧内预测模式的方法
CN111432208A (zh) * 2020-04-01 2020-07-17 济南浪潮高新科技投资发展有限公司 一种利用神经网络确定帧内预测模式的方法
US20210329286A1 (en) * 2020-04-18 2021-10-21 Alibaba Group Holding Limited Convolutional-neutral-network based filter for video coding
US11902561B2 (en) * 2020-04-18 2024-02-13 Alibaba Group Holding Limited Convolutional-neutral-network based filter for video coding
WO2022067806A1 (fr) * 2020-09-30 2022-04-07 Oppo广东移动通信有限公司 Procédés de codage et de décodage vidéo, codeur, décodeur et support de stockage
KR20220123102A (ko) * 2020-12-16 2022-09-05 텐센트 아메리카 엘엘씨 비디오 코딩을 위한 방법 및 장치
US11483591B2 (en) 2020-12-16 2022-10-25 Tencent America LLC Method and apparatus for video coding
WO2022132277A1 (fr) * 2020-12-16 2022-06-23 Tencent America LLC Procédé et appareil de vidéocodage
JP7449402B2 (ja) 2020-12-16 2024-03-13 テンセント・アメリカ・エルエルシー ビデオコーディングのための方法および装置
KR102647645B1 (ko) * 2020-12-16 2024-03-15 텐센트 아메리카 엘엘씨 비디오 코딩을 위한 방법 및 장치
WO2023051222A1 (fr) * 2021-09-28 2023-04-06 腾讯科技(深圳)有限公司 Procédé et appareil de filtrage, procédé et appareil de codage, procédé et appareil de décodage, support lisible par ordinateur et dispositif électronique

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19781638

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19781638

Country of ref document: EP

Kind code of ref document: A1