WO2021244884A1 - Neural network based filter in video coding - Google Patents

Neural network based filter in video coding

Info

Publication number
WO2021244884A1
Authority
WO
WIPO (PCT)
Prior art keywords
region
sample
offset
another
neural network
Prior art date
Application number
PCT/EP2021/063771
Other languages
English (en)
Inventor
Philippe Bordes
Franck Galpin
Thierry DUMAS
Pavel Nikitin
Fabrice Urban
Original Assignee
Interdigital Vc Holdings France, Sas
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Interdigital Vc Holdings France, Sas
Priority to JP2022572477A (published as JP2023528780A)
Priority to CN202180042531.5A (published as CN115943629A)
Priority to US17/925,479 (published as US20230188713A1)
Priority to EP21727169.1A (published as EP4162680A1)
Publication of WO2021244884A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/117 Filters, e.g. for pre-processing or post-processing
    • H04N19/132 Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output
    • H04N19/147 Data rate or code amount at the encoder output according to rate distortion criteria
    • H04N19/154 Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
    • H04N19/42 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
    • H04N19/80 Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
    • H04N19/82 Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop
    • H04N19/85 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H04N19/86 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving reduction of coding artifacts, e.g. of blockiness
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/048 Activation functions
    • G06N3/08 Learning methods

Definitions

  • the present embodiments generally relate to a method and an apparatus for filtering in video encoding or decoding.
  • image and video coding schemes usually employ prediction and transform to leverage spatial and temporal redundancy in the video content.
  • intra or inter prediction is used to exploit the intra or inter picture correlation, then the differences between the original block and the predicted block, often denoted as prediction errors or prediction residuals, are transformed, quantized, and entropy coded.
  • the compressed data are decoded by inverse processes corresponding to the entropy coding, quantization, transform, and prediction.
  • a method of video encoding or decoding comprising: accessing a version of reconstructed samples of a region of a picture; generating a weight for a respective sample of a plurality of samples in said region based on said version of reconstructed samples of said region, using a neural network; encoding or decoding a single offset parameter for said region; and filtering said region by adjusting said plurality of samples in said region, wherein a sample in said region is adjusted responsive to a weight for said sample and said offset for said region.
  • an apparatus for video encoding or decoding comprising one or more processors, wherein said one or more processors are configured to: access a version of reconstructed samples of a region of a picture; generate a weight for a respective sample of a plurality of samples in said region based on said version of reconstructed samples of said region, using a neural network; encode or decode a single offset parameter for said region; and filter said region by adjusting said plurality of samples in said region, wherein a sample in said region is adjusted responsive to a weight for said sample and said offset for said region.
  • an apparatus of video encoding or decoding comprising: means for accessing a version of reconstructed samples of a region of a picture; means for generating a weight for a respective sample of a plurality of samples in said region based on said version of reconstructed samples of said region, using a neural network; means for encoding or decoding a single offset parameter for said region; and means for filtering said region by adjusting said plurality of samples in said region, wherein a sample in said region is adjusted responsive to a weight for said sample and said offset for said region.
  • an apparatus of video encoding or decoding comprising: means for accessing a version of reconstructed samples of a region of a picture; means for generating a plurality of weights for a sample of a plurality of samples in said region based on said version of reconstructed samples of said region, using a plurality of neural networks; means for encoding or decoding a plurality of offset parameters for said region; and means for filtering said region by adjusting said plurality of samples in said region, wherein a sample in said region is adjusted responsive to said plurality of weights for said sample and said plurality of offsets for said region.
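The filtering step in the embodiments above can be illustrated with a minimal sketch. It assumes that "adjusted responsive to a weight and said offset" means adding the product weight × offset to each reconstructed sample and then clipping; the source does not spell out the exact combination here. The weight maps are passed in as plain lists and stand in for the neural network output, which is not modeled. The function names `nn_offset_filter` and `multi_nn_offset_filter`, and the `bit_depth` parameter, are hypothetical.

```python
def _clip(value, bit_depth):
    """Clip a value to the sample range [0, 2**bit_depth - 1]."""
    return min((1 << bit_depth) - 1, max(0, int(round(value))))


def nn_offset_filter(rec, weights, offset, bit_depth=10):
    """Single-offset case: out[y][x] = clip(rec[y][x] + weights[y][x] * offset).

    `weights` stands in for the per-sample output of the neural network
    (assumption: one weight per sample, same shape as the region).
    """
    return [
        [_clip(s + w * offset, bit_depth) for s, w in zip(rec_row, w_row)]
        for rec_row, w_row in zip(rec, weights)
    ]


def multi_nn_offset_filter(rec, weight_maps, offsets, bit_depth=10):
    """Multi-network case: K weight maps and K offsets, corrections summed
    (assumption: the per-network corrections are combined additively)."""
    out = []
    for y, rec_row in enumerate(rec):
        row = []
        for x, s in enumerate(rec_row):
            correction = sum(w[y][x] * o for w, o in zip(weight_maps, offsets))
            row.append(_clip(s + correction, bit_depth))
        out.append(row)
    return out


# Toy 2x2 region with 10-bit samples.
rec = [[100, 102], [98, 1023]]
weights = [[0.5, 0.0], [-1.0, 1.0]]
filtered = nn_offset_filter(rec, weights, offset=4)
```

Because only one offset is coded per region, the per-sample adaptation comes entirely from the weight map, which both encoder and decoder can derive from the reconstructed samples.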
  • One or more embodiments also provide a computer program comprising instructions which when executed by one or more processors cause the one or more processors to perform the encoding method or decoding method according to any of the embodiments described above.
  • One or more of the present embodiments also provide a computer readable storage medium having stored thereon instructions for encoding or decoding video data according to the methods described above.
  • One or more embodiments also provide a computer readable storage medium having stored thereon a bitstream generated according to the methods described above.
  • One or more embodiments also provide a method and apparatus for transmitting or receiving the bitstream generated according to the methods described above.
  • FIG. 1 illustrates a block diagram of a system within which aspects of the present embodiments may be implemented.
  • FIG. 2 illustrates a block diagram of an embodiment of a video encoder.
  • FIG. 3 illustrates a block diagram of an embodiment of a video decoder.
  • FIG. 4 illustrates an example of successive loop filtering.
  • FIG. 5 is a pictorial example illustrating four 1-D directional patterns for EO (Edge Offset) sample classification.
  • FIG. 6 is a pictorial example illustrating that positive offsets are used for categories 1 and 2, and negative offsets are used for categories 3 and 4 of EO classification.
  • FIG. 7 is a pictorial example illustrating BO (Band Offset) with the associated starting band position and offsets of four consecutive bands.
  • FIG. 8 illustrates an exemplary method for decoding a bitstream using SAO.
  • FIG. 9 illustrates an example of using a Convolutional Neural Network (CNN) to restore images after reconstruction.
  • FIG. 10 illustrates an encoder architecture according to an embodiment.
  • FIG. 11 illustrates a portion of a picture to be filtered, the weight mask from the NN, and the filter result.
  • FIG. 12 illustrates an example of an NN used in the filter, according to an embodiment.
  • FIG. 13 illustrates a decoder architecture according to an embodiment.
  • FIG. 14 illustrates an encoder architecture with pre-classification as input to the neural network, according to an embodiment.
  • FIG. 15 illustrates an encoder architecture with a plurality of neural network based filters, according to an embodiment.
  • FIG. 16 illustrates an encoder architecture with a plurality of neural network based filters, according to another embodiment.
  • FIG. 17 illustrates an encoder architecture with a plurality of neural network based filters, according to yet another embodiment.
  • FIG. 18 illustrates a decoder architecture with selection of one CNN among multiple CNNs, according to an embodiment.
  • FIG. 19 illustrates a decoder architecture with selection of several CNNs among multiple CNNs.
  • FIG. 20 illustrates an encoding process that uses multiple NNs for correction, according to an embodiment.
  • FIG. 21 illustrates an example of the linear combination of NN outputs.
  • FIG. 22 illustrates a decoding process that uses multiple NNs for correction, according to an embodiment.
  • FIG. 23 illustrates a method for selecting K NNs to be combined, according to an embodiment.
  • FIG. 25 illustrates that the number of actually used NNs depends on the partitioning shape, according to an embodiment.
  • FIG. 26 illustrates an example of application of 3x4 convolution layer in one direction only.
  • FIG. 27 illustrates an example of training the NNs based on datasets with different coding mode features, according to an embodiment.
  • FIG. 28 illustrates two examples of activation functions: ReLU and Leaky ReLU.
  • FIG. 1 illustrates a block diagram of an example of a system in which various aspects and embodiments can be implemented.
  • System 100 may be embodied as a device including the various components described below and is configured to perform one or more of the aspects described in this application. Examples of such devices include, but are not limited to, various electronic devices such as personal computers, laptop computers, smartphones, tablet computers, digital multimedia set-top boxes, digital television receivers, personal video recording systems, connected home appliances, and servers.
  • Elements of system 100 singly or in combination, may be embodied in a single integrated circuit, multiple ICs, and/or discrete components.
  • the processing and encoder/decoder elements of system 100 are distributed across multiple ICs and/or discrete components.
  • system 100 is communicatively coupled to other systems, or to other electronic devices, via, for example, a communications bus or through dedicated input and/or output ports.
  • system 100 is configured to implement one or more of the aspects described in this application.
  • the system 100 includes at least one processor 110 configured to execute instructions loaded therein for implementing, for example, the various aspects described in this application.
  • Processor 110 may include embedded memory, input output interface, and various other circuitries as known in the art.
  • the system 100 includes at least one memory 120 (e.g., a volatile memory device, and/or a non-volatile memory device).
  • System 100 includes a storage device 140, which may include non-volatile memory and/or volatile memory, including, but not limited to, EEPROM, ROM, PROM, RAM, DRAM, SRAM, flash, magnetic disk drive, and/or optical disk drive.
  • the storage device 140 may include an internal storage device, an attached storage device, and/or a network accessible storage device, as non-limiting examples.
  • System 100 includes an encoder/decoder module 130 configured, for example, to process data to provide an encoded video or decoded video, and the encoder/decoder module 130 may include its own processor and memory.
  • the encoder/decoder module 130 represents module(s) that may be included in a device to perform the encoding and/or decoding functions. As is known, a device may include one or both of the encoding and decoding modules. Additionally, encoder/decoder module 130 may be implemented as a separate element of system 100 or may be incorporated within processor 110 as a combination of hardware and software as known to those skilled in the art.
  • Program code to be loaded onto processor 110 or encoder/decoder 130 to perform the various aspects described in this application may be stored in storage device 140 and subsequently loaded onto memory 120 for execution by processor 110.
  • one or more of processor 110, memory 120, storage device 140, and encoder/decoder module 130 may store one or more of various items during the performance of the processes described in this application. Such stored items may include, but are not limited to, the input video, the decoded video or portions of the decoded video, the bitstream, matrices, variables, and intermediate or final results from the processing of equations, formulas, operations, and operational logic.
  • memory inside of the processor 110 and/or the encoder/decoder module 130 is used to store instructions and to provide working memory for processing that is needed during encoding or decoding.
  • a memory external to the processing device (for example, the processing device may be either the processor 110 or the encoder/decoder module 130) is used for one or more of these functions.
  • the external memory may be the memory 120 and/or the storage device 140, for example, a dynamic volatile memory and/or a non-volatile flash memory.
  • an external non-volatile flash memory is used to store the operating system of a television.
  • a fast external dynamic volatile memory such as a RAM is used as working memory for video coding and decoding operations, such as for MPEG-2, HEVC, or VVC.
  • the input to the elements of system 100 may be provided through various input devices as indicated in block 105.
  • Such input devices include, but are not limited to, (i) an RF portion that receives an RF signal transmitted, for example, over the air by a broadcaster, (ii) a Composite input terminal, (iii) a USB input terminal, and/or (iv) an HDMI input terminal.
  • the input devices of block 105 have associated respective input processing elements as known in the art.
  • the RF portion may be associated with elements suitable for (i) selecting a desired frequency (also referred to as selecting a signal, or band-limiting a signal to a band of frequencies), (ii) down converting the selected signal, (iii) band-limiting again to a narrower band of frequencies to select (for example) a signal frequency band which may be referred to as a channel in certain embodiments, (iv) demodulating the down converted and band-limited signal, (v) performing error correction, and (vi) demultiplexing to select the desired stream of data packets.
  • the RF portion of various embodiments includes one or more elements to perform these functions, for example, frequency selectors, signal selectors, band-limiters, channel selectors, filters, downconverters, demodulators, error correctors, and demultiplexers.
  • the RF portion may include a tuner that performs various of these functions, including, for example, down converting the received signal to a lower frequency (for example, an intermediate frequency or a near-baseband frequency) or to baseband.
  • the RF portion and its associated input processing element receive an RF signal transmitted over a wired (for example, cable) medium, and perform frequency selection by filtering, down converting, and filtering again to a desired frequency band.
  • the USB and/or HDMI terminals may include respective interface processors for connecting system 100 to other electronic devices across USB and/or HDMI connections. It is to be understood that various aspects of input processing, for example, Reed-Solomon error correction, may be implemented, for example, within a separate input processing IC or within processor 110 as necessary. Similarly, aspects of USB or HDMI interface processing may be implemented within separate interface ICs or within processor 110 as necessary.
  • the demodulated, error corrected, and demultiplexed stream is provided to various processing elements, including, for example, processor 110, and encoder/decoder 130 operating in combination with the memory and storage elements to process the datastream as necessary for presentation on an output device.
  • the elements of system 100 may be interconnected using a suitable connection arrangement 115, for example, an internal bus as known in the art, including the I2C bus, wiring, and printed circuit boards.
  • the system 100 includes communication interface 150 that enables communication with other devices via communication channel 190.
  • the communication interface 150 may include, but is not limited to, a transceiver configured to transmit and to receive data over communication channel 190.
  • the communication interface 150 may include, but is not limited to, a modem or network card and the communication channel 190 may be implemented, for example, within a wired and/or a wireless medium.
  • Data is streamed to the system 100, in various embodiments, using a Wi-Fi network such as IEEE 802.11.
  • the Wi-Fi signal of these embodiments is received over the communications channel 190 and the communications interface 150 which are adapted for Wi-Fi communications.
  • the communications channel 190 of these embodiments is typically connected to an access point or router that provides access to outside networks including the Internet for allowing streaming applications and other over-the-top communications.
  • Other embodiments provide streamed data to the system 100 using a set-top box that delivers the data over the HDMI connection of the input block 105.
  • Still other embodiments provide streamed data to the system 100 using the RF connection of the input block 105.
  • the system 100 may provide an output signal to various output devices, including a display 165, speakers 175, and other peripheral devices 185.
  • the other peripheral devices 185 include, in various examples of embodiments, one or more of a stand-alone DVR, a disk player, a stereo system, a lighting system, and other devices that provide a function based on the output of the system 100.
  • control signals are communicated between the system 100 and the display 165, speakers 175, or other peripheral devices 185 using signaling such as AV.Link, CEC, or other communications protocols that enable device-to-device control with or without user intervention.
  • the output devices may be communicatively coupled to system 100 via dedicated connections through respective interfaces 160, 170, and 180.
  • the output devices may be connected to system 100 using the communications channel 190 via the communications interface 150.
  • the display 165 and speakers 175 may be integrated in a single unit with the other components of system 100 in an electronic device, for example, a television.
  • the display interface 160 includes a display driver, for example, a timing controller (T Con) chip.
  • the display 165 and speaker 175 may alternatively be separate from one or more of the other components, for example, if the RF portion of input 105 is part of a separate set-top box.
  • the output signal may be provided via dedicated output connections, including, for example, HDMI ports, USB ports, or COMP outputs.
  • FIG. 2 illustrates an example video encoder 200, such as a High Efficiency Video Coding (HEVC) encoder.
  • FIG. 2 may also illustrate an encoder in which improvements are made to the HEVC standard or an encoder employing technologies similar to HEVC, such as a VVC (Versatile Video Coding) encoder under development by JVET (Joint Video Experts Team).
  • the terms “reconstructed” and “decoded” may be used interchangeably, the terms “encoded” or “coded” may be used interchangeably, the terms “pixel” or “sample” may be used interchangeably, and the terms “image,” “picture” and “frame” may be used interchangeably.
  • the term “reconstructed” is used at the encoder side while “decoded” is used at the decoder side.
  • the video sequence may go through pre-encoding processing (201), for example, applying a color transform to the input color picture (e.g., conversion from RGB 4:4:4 to YCbCr 4:2:0), or performing a remapping of the input picture components in order to get a signal distribution more resilient to compression (for instance using a histogram equalization of one of the color components).
  • Metadata can be associated with the pre-processing, and attached to the bitstream.
  • a picture is encoded by the encoder elements as described below.
  • the picture to be encoded is partitioned (202) and processed in units of, for example, CUs.
  • Each unit is encoded using, for example, either an intra or inter mode.
  • in an intra mode, intra prediction (260) is performed.
  • in an inter mode, motion estimation (275) and compensation (270) are performed.
  • the encoder decides (205) which one of the intra mode or inter mode to use for encoding the unit, and indicates the intra/inter decision by, for example, a prediction mode flag.
  • the encoder may also blend (263) intra prediction result and inter prediction result, or blend results from different intra/inter prediction methods.
  • Prediction residuals are calculated, for example, by subtracting (210) the predicted block from the original image block.
  • the motion refinement module (272) uses already available reference picture in order to refine the motion field of a block without reference to the original block.
  • a motion field for a region can be considered as a collection of motion vectors for all pixels within the region. If the motion vectors are sub-block-based, the motion field can also be represented as the collection of all sub-block motion vectors in the region (all pixels within a sub-block have the same motion vector, and the motion vectors may vary from sub-block to sub-block). If a single motion vector is used for the region, the motion field for the region can also be represented by the single motion vector (the same motion vector for all pixels in the region).
  • the prediction residuals are then transformed (225) and quantized (230).
  • the quantized transform coefficients, as well as motion vectors and other syntax elements, are entropy coded (245) to output a bitstream.
  • the encoder can skip the transform and apply quantization directly to the non-transformed residual signal.
  • the encoder can bypass both transform and quantization, i.e., the residual is coded directly without the application of the transform or quantization processes.
  • the encoder decodes an encoded block to provide a reference for further predictions.
  • the quantized transform coefficients are de-quantized (240) and inverse transformed (250) to decode prediction residuals. Combining (255) the decoded prediction residuals and the predicted block, an image block is reconstructed.
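The encoder-side reconstruction described above can be sketched as follows. This is a simplified illustration, not the codec's actual arithmetic: the transform is omitted (the transform-skip case mentioned earlier), quantization is a plain uniform scalar quantizer with a hypothetical `step` parameter, and Python's `round()` stands in for the codec's rounding rule.

```python
def quantize(residual, step):
    """Uniform scalar quantization of a residual block (flattened to a list)."""
    return [int(round(r / step)) for r in residual]


def dequantize(levels, step):
    """Inverse quantization: scale the levels back by the step size."""
    return [level * step for level in levels]


def reconstruct(predicted, levels, step):
    """Combine the predicted block with the decoded prediction residuals."""
    decoded_residual = dequantize(levels, step)
    return [p + r for p, r in zip(predicted, decoded_residual)]


original = [121, 131, 127, 91]
predicted = [118, 128, 128, 100]
residual = [o - p for o, p in zip(original, predicted)]  # subtracting (210)
levels = quantize(residual, step=4)                      # what gets entropy coded
recon = reconstruct(predicted, levels, step=4)           # reference for later prediction
```

The reconstructed block differs from the original because quantization is lossy; the encoder predicts from `recon` rather than `original` so that it stays in sync with the decoder.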
  • FIG. 3 illustrates a block diagram of an example video decoder 300.
  • a bitstream is decoded by the decoder elements as described below.
  • Video decoder 300 generally performs a decoding pass reciprocal to the encoding pass as described in FIG. 2.
  • the encoder 200 also generally performs video decoding as part of encoding video data.
  • the input of the decoder includes a video bitstream, which can be generated by video encoder 200.
  • the bitstream is first entropy decoded (330) to obtain transform coefficients, motion vectors, and other coded information.
  • the picture partition information indicates how the picture is partitioned.
  • the decoder may therefore divide (335) the picture according to the decoded picture partitioning information.
  • the transform coefficients are de-quantized (340) and inverse transformed (350) to decode the prediction residuals. Combining (355) the decoded prediction residuals and the predicted block, an image block is reconstructed.
  • the predicted block can be obtained (370) from intra prediction (360) or motion-compensated prediction (i.e., inter prediction) (375).
  • the decoder may blend (373) the intra prediction result and inter prediction result, or blend results from multiple intra/inter prediction methods.
  • the motion field may be refined (372) by using already available reference pictures.
  • In-loop filters (365) are applied to the reconstructed image.
  • the filtered image is stored at a reference picture buffer (380).
  • the decoded picture can further go through post-decoding processing (385), for example, an inverse color transform (e.g. conversion from YCbCr 4:2:0 to RGB 4:4:4) or an inverse remapping performing the inverse of the remapping process performed in the pre-encoding processing (201).
  • post-decoding processing can use metadata derived in the pre-encoding processing and signaled in the bitstream.
  • FIG. 4 illustrates an example of successive loop filtering.
  • the output is the reconstructed picture samples.
  • the input to in-loop filtering is the sum (430) of predicted samples (410) and the decoded/reconstructed prediction residuals (420), which may be clipped (440) to be within the dynamic range supported by the encoder/decoder.
  • when the prediction residuals are zero or absent (for example, for a skipped block), the input to in-loop filtering is the predicted samples directly.
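The sum-and-clip step that feeds the loop filters can be sketched as below. The function name `loop_filter_input` and the `bit_depth` parameter are hypothetical; passing `residual=None` models the case where no residual is available and the predicted samples are used directly.

```python
def loop_filter_input(predicted, residual=None, bit_depth=8):
    """Return the samples handed to the first in-loop filter.

    With a residual: predicted + residual, clipped to [0, 2**bit_depth - 1].
    Without a residual: the predicted samples, unchanged.
    """
    if residual is None:
        return list(predicted)
    max_val = (1 << bit_depth) - 1
    return [min(max_val, max(0, p + r)) for p, r in zip(predicted, residual)]


clipped = loop_filter_input([250, 3, 128], [10, -5, 2])
passthrough = loop_filter_input([1, 2])
```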
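  • Filter parameters are encoded/decoded for, e.g., the deblocking filter (DBF), sample adaptive offset (SAO), and adaptive loop filter (ALF), but not for the bilateral filter (BF).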
  • SAO is a sample-based filtering operation on a CTU (Coding Tree Unit) basis that adds offsets to some categories of reconstructed samples to reduce coding artefacts.
  • a CTU is composed of one CTB per component.
  • SAO can be activated or deactivated per CTB.
  • Two SAO modes are specified: edge offset (EO) and band offset (BO).
  • for EO, the sample classification is based on local directional structures in the picture to be filtered.
  • the parameters for EO or BO may be explicitly coded or derived from the neighborhood.
  • SAO can be applied to the luma and chroma components, where the SAO mode is the same for Cb and Cr components.
  • the SAO parameters are configured individually for each color component.
  • EO uses four 1-D directional patterns: horizontal, vertical, 135° diagonal, and 45° diagonal, as shown in FIG. 5, for sample classification, where label “p_c” represents a current sample and labels “p_0” and “p_1” represent two neighboring samples.
  • Four EO classes are specified based on the directions, and each EO class corresponds to one direction. The selected EO class is signaled in the bitstream as side information.
  • the categorization rules for a sample are summarized in TABLE 1. As also shown in FIG. 6, categories 1 and 4 are associated with a local valley and a local peak along the selected 1-D pattern, respectively, and categories 2 and 3 are associated with concave and convex corners along the selected 1-D pattern, respectively.
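The categorization described above can be sketched as follows. TABLE 1 is not reproduced in this text, so the comparison rules below follow the usual HEVC edge-offset convention and should be read as an assumption; the function name `eo_category` is hypothetical.

```python
def eo_category(p0, pc, p1):
    """Return the edge-offset category (0 to 4) of the current sample pc,
    given its two neighbors p0 and p1 along the selected 1-D pattern.
    Rules assumed from the usual HEVC SAO convention."""
    if pc < p0 and pc < p1:
        return 1  # local valley
    if (pc < p0 and pc == p1) or (pc == p0 and pc < p1):
        return 2  # concave corner
    if (pc > p0 and pc == p1) or (pc == p0 and pc > p1):
        return 3  # convex corner
    if pc > p0 and pc > p1:
        return 4  # local peak
    return 0  # flat or monotonic: no offset is applied


# Example: a sample lower than both neighbors is a local valley.
valley = eo_category(5, 3, 5)
```

Samples in category 0 are left unchanged, which matches the remark that some sample offsets may be 0.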
  • sample offsets may be 0 and the corresponding samples are not adjusted. If there are other in-loop filters after the SAO filter, the filtered samples go through more filtering. The filtered reconstructed samples are used as the final output of the decoder.
  • the encoder may perform a similar process as method 800.
  • step 810 is implemented to obtain the reconstructed samples, for example, the SAO filtering process uses the reconstructed samples from the deblocking filter as input.
  • the offset values are encoded in the bitstream.
  • the filtered reconstructed samples can be used as references for other pictures.
  • the offset can be decided by collecting for each category c of each class the sum of the difference between the original (target) sample value and the reconstructed sample value diff(c).
  • N(c) is the number of samples of the current block that belong to category c.
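A minimal sketch of this offset derivation, assuming the offset for category c is taken as the mean difference diff(c) / N(c). That choice minimizes the squared error within a category, but the rounding and rate-distortion refinements a real encoder would apply are omitted. The function name and the flat-list data layout are hypothetical.

```python
def derive_offsets(original, reconstructed, categories, num_categories=5):
    """Per-category offset as the mean of (original - reconstructed).

    original, reconstructed: flat lists of sample values
    categories: category index of each sample (0 .. num_categories - 1)
    """
    diff = [0] * num_categories   # diff(c): summed differences per category
    count = [0] * num_categories  # N(c): number of samples per category
    for s, r, c in zip(original, reconstructed, categories):
        diff[c] += s - r
        count[c] += 1
    # Categories with no samples get a zero offset.
    return [diff[c] / count[c] if count[c] else 0.0
            for c in range(num_categories)]


offsets = derive_offsets([10, 12, 8, 9], [8, 11, 8, 12], [1, 1, 0, 4])
```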
  • FIG. 9 illustrates an example of using a Convolutional Neural Network (CNN) to restore images after reconstruction.
  • An image S is first reconstructed as Ŝ.
  • the image is restored as S̃ by adding a correction R computed (inferred) by the CNN.
  • a loss function based on the error between the restored image and the original image is minimized.
  • the CNN is usually composed of several convolutional layers followed by an activation layer (for example, a sigmoid, ReLU or Leaky ReLU function) and the loss function may also consider some regularization terms to stabilize the CNN training.
  • A CNN shares weights along the spatial dimensions of its input and its intermediate representations. Given the local statistics of natural images, CNNs usually need far fewer parameters than fully-connected neural networks to achieve equivalent performance, in image restoration for instance. Moreover, the weight sharing makes the number of parameters in a CNN independent of its input size, meaning that a trained CNN can restore images of various sizes.
  • a 3-layer CNN is shared by the luma and chroma components.
  • the parameters of the CNN are trained and then encoded in the bitstreams with the first I-picture of each random-access segment (RAS).
  • the training uses only pictures of temporal levels 0 and 1.
  • a multi-level on/off control is applied at picture, coding tree block (CTB), and 32x32 block levels for each color component.
  • three different 2-layer CNNs for luma and three different 2-layer CNNs for chroma are encoded.
  • the index of the best CNN to use for each of luma and chroma is signaled per CTB.
  • the CNNs are compressed to 6 bits per weight.
  • the on/off control is performed per tile.
  • a set of bigger but fixed neural network parameters are trained once, one per QP.
  • the three input components (Y, U, V) are concatenated to be processed together by the CNN.
  • the input sample blocks are padded with a certain number of pixels corresponding to the total padding size of the CNN during training. The number of parameters may be further reduced by repeating some layers.
  • the present application proposes an in-loop filter based on neural networks (NN) that may replace one or several existing in-loop filters, or may be added to the existing in-loop filters. Since the proposed filter adjusts the samples with adaptive offsets as in HEVC or VVC SAO filters, we denote the proposed filter as an NN-based SAO (Sample Adaptive Offset) filter.
  • an NN-based filter adjusts the reconstructed samples by offsets as performed in SAO filters.
  • the NN filter determines a weight mask. A weight in this mask corresponds to either the decision of whether a sample of the reconstructed block is corrected or the strength of the correction of this sample, depending on the value of this weight.
  • the NN filter is controlled with few parameters (offset) to control the strength of the filter. These parameters are encoded in the bitstream.
  • FIG. 10 illustrates an encoder architecture (1000) according to an embodiment.
  • W represents a weight mask. If the weights are binary, i.e., either 0 or 1, the mask weight of index i decides whether the sample of Ŝ of index i is corrected. If the weights are non-binary, e.g., floats, the absolute value of the weight of index i can be viewed as the strength of the correction for the sample of Ŝ of index i.
  • the term offset represents the control parameter for the strength of the filter correction.
  • the sets of data {S, Ŝ, W} are typically blocks (or matrices) of the same size. However, they can be re-arranged into 1-D or N-D vectors. Appropriate padding may be added at the layer inputs or outputs to guarantee that the size of W is the same as the size of Ŝ. Alternatively, the input block Ŝ may be larger than W to account for the size reduction from the first layer(s).
  • the corresponding (local) reconstructed block is Ŝ.
  • the video encoder (1010) may correspond to encoder 200, except the in-loop filter (265) that is extended or replaced with the proposed filter (1040).
  • the NN (1020) is typically composed of several convolutional layers, but may be composed of fully connected and/or short cut links for example. Its input is the reconstructed block to be filtered and the output is the weight mask W.
  • the value of “offset” is encoded in the bitstream for each block (1050).
  • the value “offset” is quantized before coding.
  • the operations “X” (1060) and “+” (1070) correspond to the product of all the terms of W by the scalar value “offset” and the sum term by term respectively. Because “offset” is used to scale the weight mask W, the offset may also be considered as a scaling parameter. In a variant, the values of W are clipped, for example between -1 and 1.
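The scaling and addition steps above can be sketched as follows, assuming the weight mask has already been produced by the NN. The function name `apply_nn_offset`, the rounding to integers and the final clipping to the sample bit depth are assumptions made for the example, not details taken from the text.

```python
def apply_nn_offset(recon, weight_mask, offset, clip_weights=True, bit_depth=8):
    """Filtered block = recon + offset * W, sample by sample.

    recon, weight_mask: 2-D lists of the same size
    offset: the single scalar control parameter decoded for the block
    """
    max_val = (1 << bit_depth) - 1
    filtered = []
    for row_s, row_w in zip(recon, weight_mask):
        out_row = []
        for s, w in zip(row_s, row_w):
            if clip_weights:
                w = max(-1.0, min(1.0, w))  # variant: clip W to [-1, 1]
            corrected = s + offset * w      # "X" then "+"
            # Assumption: round and keep the result in the valid sample range.
            out_row.append(max(0, min(max_val, int(round(corrected)))))
        filtered.append(out_row)
    return filtered


block = apply_nn_offset([[100, 200], [255, 0]],
                        [[1.0, 0.5], [2.0, -1.0]], offset=4)
```

With a binary mask, the same code reduces to adding `offset` to the selected samples only.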
  • FIG. 11(a), 11(b) and 11(c) illustrate a portion of the initial reconstructed picture to be filtered, the corresponding weight masks from the NN, and the filtered result, respectively.
  • a six-layer CNN is used, as shown in FIG. 12, with ReLU activation and one final clipping layer.
  • in FIG. 11(b), different shades correspond to different weight values.
  • in a variant, the activation function is a Leaky-ReLU, as depicted in FIG. 28, with the alpha parameter equal to 0.1 for instance.
  • the Leaky-ReLU activation function has two merits. First, it facilitates the error backpropagation algorithm, and hence convergence at the NN training stage. Second, it allows negative weight mask values. In a variant, one uses Leaky-ReLU for internal layers and ReLU for the last layer only.
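For reference, the two activation functions can be written as below. The default `alpha=0.1` follows the value mentioned in the text. The point of the comparison is that the Leaky-ReLU keeps a small non-zero slope for negative inputs, which is what allows negative mask values and a non-zero gradient there.

```python
def relu(x):
    """Standard ReLU: negative inputs are mapped to zero."""
    return x if x > 0 else 0.0


def leaky_relu(x, alpha=0.1):
    """Leaky-ReLU: negative inputs are scaled by alpha instead of zeroed."""
    return x if x > 0 else alpha * x


# A negative pre-activation survives (scaled) through the Leaky-ReLU only.
negative_in = -2.0
outputs = (relu(negative_in), leaky_relu(negative_in))
```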
  • pixel values are:
  • the filtered result is
  • FIG. 13 illustrates a decoder architecture (1300) according to an embodiment.
  • the input of the decoder includes a bitstream, for example, one generated by encoder 1000.
  • the video decoder module (1310) may correspond to decoder 300, except the in-loop filter (365) that is extended or replaced with the proposed filter (1340).
  • the NN (1320) should be the same as the one used in a corresponding encoder in order to properly decode the bitstream.
  • the input to the NN (1320) is the reconstructed block to be filtered and the output is the weight mask W.
  • the output of the NN filter (1320) may be the scaled offsets (residuals) for correcting one component (1 channel) or more, e.g., luma and chroma residuals samples (3 channels) or 2 chroma residual samples (2 channels), with possibly other information.
  • the filter control parameter “offset” is decoded (1310) from the bitstream for the block.
  • the control parameter is then multiplied (1360) with the weight mask. Namely, the control parameter is scaled by a weight for each sample in the block in order to generate the scaled offset for each sample.
  • the scaled offset is then added (1370) to the corresponding sample in the initial reconstructed block.
  • the product W · offset provides the adjustment offset for each sample in the block. Note that here only a single control parameter needs to be conveyed for the block for the filtering process, as the parameters for the NN are not transmitted in the bitstream. Thus, with very little signaling overhead, the proposed filter achieves sample-wise adjustment in filtering, which can improve the compression efficiency.
  • the NN module has additional inputs such as the quantization step (QP), the image type (e.g., type I, P or B), the reconstructed residuals samples or reconstructed samples from another component.
  • the additional input is a classification module (1420) that classifies the samples of the blocks as depicted in FIG. 14.
  • the module (1420) is illustrated in dashed lines to show it is optional. While a decoder is shown in FIG. 14, a corresponding encoder can be modified accordingly.
  • This classification (1420) can be based on local gradients or other semantic classifications.
  • the classifier is the same as the one used in existing in-loop filters such as HEVC/VVC SAO, ALF classifier or deblocking filter classifier.
  • the classifier may associate to each sample of Ŝ a binary label (0: not in the class, 1: belongs to the class), one integer label among T values {c_1, c_2, ..., c_T}, or a non-integer value (e.g., floating point c_float).
  • One advantage of using pre-classifier input is that the number of layers of the NN (1430) may be reduced, since the purpose of the first layer(s) is to perform classification in general. However, the use of an a priori explicit classifier may reduce the ability of the training to learn optimal classification.
  • K NN filters (1530, 1540) are used as shown in FIG. 15.
  • the number of filters K is set to 2 in FIG. 15.
  • a set of offsets {offset_i} is derived, one per filter.
  • the best filter to use is selected (1570) as the one minimizing the distortion of S̃ with respect to S, or the one that minimizes the rate-distortion tradeoff (distortion and encoding cost of offset_i and filter index i, 1550, 1560).
  • filter index b of the selected filter and the associated offset_b are encoded in the bitstream explicitly or implicitly via prediction (using previously reconstructed parameters for instance).
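A minimal sketch of this selection, assuming the weight masks and offsets of the candidate filters are already available and that the coding cost of each candidate is supplied as a number of bits. The sum of squared errors as distortion measure, the Lagrangian form `distortion + lam * rate` and all names are assumptions for illustration.

```python
def sse(a, b):
    """Sum of squared errors between two flat sample lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_filter(original, recon, masks, offsets, rates, lam):
    """Return (best index b, filtered block) minimizing D + lam * R.

    masks[i], offsets[i], rates[i]: weight mask, offset and estimated
    coding cost in bits for candidate filter i.
    """
    best = None
    for i, (mask, off, rate) in enumerate(zip(masks, offsets, rates)):
        filtered = [s + off * w for s, w in zip(recon, mask)]
        cost = sse(original, filtered) + lam * rate
        if best is None or cost < best[0]:
            best = (cost, i, filtered)
    return best[1], best[2]


b, filtered = select_filter(original=[10, 20, 30], recon=[8, 20, 33],
                            masks=[[1, 0, 0], [0, 0, 1]],
                            offsets=[2, -3], rates=[4, 4], lam=0.0)
```

Setting `lam` to 0 gives the pure distortion criterion; a positive value gives the rate-distortion tradeoff.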
  • a classifier 1510, 1520
  • a single RDO module (1630) selects which CNN filter will be used finally. In the example depicted in FIG.
  • both offset values {offset_1, offset_2} will be encoded in the bitstream.
  • FIG. 18 illustrates a decoder architecture (1800) with multiple CNNs according to an embodiment.
  • the input of the decoder includes a bitstream.
  • the video decoder module (1810) may correspond to decoder 300, except the in-loop filter (365) that is extended or replaced with an NN based filter (1870).
  • the filter control parameters “offset” and filter index “b” are decoded (1810) from the bitstream for the block.
  • the filter index “b” controls (1840) which one of K CNNs (1820, 1830) is to be used for generating the weight mask W.
  • the control parameter “offset” is then multiplied (1850) with the weight mask.
  • the scaled offset is then added (1860) to the corresponding sample in the initial reconstructed block.
  • the CNNs are used without pre-classification.
  • the pre-classification module, as illustrated for the encoder in, for example, FIGs. 15-16, can be applied.
  • the selection of the CNN to be used for a block and the control parameter “offset” allows tailoring the filtering process to the local characteristics of the current block.
  • the choice of a single CNN may not be optimal because it may be preferable to combine the benefits of two or more CNNs in some way. This may also depend on the way the CNNs have been trained.
  • the neural network ensemble is a learning paradigm where multiple neural networks are jointly used to solve a problem.
  • FIG. 19 illustrates a decoding process (1900) that uses multiple NNs for correction according to an embodiment.
  • the decoder decodes K filter indexes {i_0, ..., i_{K-1}} that select (1940) K filters among N available NNs, and K offsets {offset_0, ..., offset_{K-1}}.
  • the inputs to the K NNs are the reconstructed block Ŝ and possibly additional information such as QP, coding mode or samples of other components.
  • the K weight masks and the K offsets are combined (mutual combination) (1950) using a weighted linear combination of the weight masks and offsets to derive the additive correction Corr(x) to be applied (1960) to the reconstructed samples Ŝ(x), where “x” denotes the sample at position “x” in the block, as follows:
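The equation itself did not survive in this text. Reading the description literally, the correction is the sum over k of offset_k times the k-th weight mask at position x, and the corrected sample is the reconstructed sample plus Corr(x). The sketch below implements that reading; the function names and the flat-list block layout are hypothetical.

```python
def combined_correction(weight_masks, offsets):
    """Corr(x) = sum over k of offsets[k] * weight_masks[k][x]."""
    num_samples = len(weight_masks[0])
    return [sum(off * mask[x] for mask, off in zip(weight_masks, offsets))
            for x in range(num_samples)]


def apply_correction(recon, corr):
    """Corrected sample = reconstructed sample + Corr(x)."""
    return [s + c for s, c in zip(recon, corr)]


corr = combined_correction([[1, 0, 0.5], [0, 1, 0.5]], offsets=[2, -4])
corrected = apply_correction([10, 10, 10], corr)
```

With K = 1 this reduces to the single-mask filter, where the correction is simply offset times W.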
  • FIG. 20 illustrates an encoding process (2000) that uses multiple NNs for correction, according to an embodiment.
  • the encoder selects K NNs among N available NNs.
  • K = 2. More generally, the process can be applied when more than one NN (2020, 2025) is used in the filter.
  • the inputs to the K NNs are the reconstructed block and possibly additional information such as QP, coding mode, reconstructed residuals or reconstructed samples of other components (2015).
  • the value of K may be different for luma and chroma NN-based filters, e.g., if luma and chroma do not share the same filter.
  • the scaling parameters offset_1 and offset_2 can be derived (2040).
  • the scaling parameters can be coded per region or per block (CTU or CU) in the bitstream.
  • the mutual combination of the K CNNs allows building the additive correction Corr(x) to be applied to the reconstructed samples Ŝ(x) through a weighted linear combination of the NN outputs (2050), where the weights in the linear combination are the scaling parameters offset_k.
  • the linear combination is illustrated in an example in FIG. 21. Mathematically, the linear combination can be expressed as Corr(x) = Σ_k offset_k · NN_k(x), where NN_k(x) denotes the output of the k-th NN at position x. Adding the correction term to the initial reconstructed block, the final reconstructed block is generated (2060).
  • the derivation of the scaling values (offset_k) can be made at the encoder side using least squares minimization (LSM) of the mean squared error (MSE):
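The closed form is not reproduced in this text. As an illustration for K = 2, the sketch below minimizes the sum over x of (S(x) - Ŝ(x) - offset_1·W_1(x) - offset_2·W_2(x))² by solving the 2x2 normal equations with Cramer's rule. The function name and the fallback used when the two masks are collinear are assumptions.

```python
def derive_two_offsets(original, recon, w1, w2):
    """Least-squares offsets (offset_1, offset_2) for two weight masks."""
    d = [s - r for s, r in zip(original, recon)]  # error to be corrected
    a11 = sum(x * x for x in w1)
    a22 = sum(x * x for x in w2)
    a12 = sum(x * y for x, y in zip(w1, w2))
    b1 = sum(x * e for x, e in zip(w1, d))
    b2 = sum(x * e for x, e in zip(w2, d))
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        # Masks are (nearly) collinear: fall back to a single offset.
        return (b1 / a11 if a11 else 0.0), 0.0
    # Cramer's rule on [[a11, a12], [a12, a22]] @ [o1, o2] = [b1, b2]
    return (b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det


o1, o2 = derive_two_offsets(original=[12, 7, 12, 7], recon=[10, 10, 10, 10],
                            w1=[1, 0, 1, 0], w2=[0, 1, 0, 1])
```

For larger K the same normal equations form a K x K linear system, which a general linear solver would handle.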
  • FIG. 22 illustrates a decoding process (2200) that uses multiple NNs for correction, according to an embodiment. Similar to method 2000, multiple NNs are used to filter the initial reconstructed samples Ŝ. At the encoder side, method 2000 derives offset_1 and offset_2 at step 2040. At the decoder side, method 2200 decodes offset_1 and offset_2 from the bitstream.
  • bestCost is set to a large value.
  • the pairs of NN indexes {k_0; k_1} are tested in turn.
  • NN(k_0) is applied (2310) and a single offset_0 is derived (2345).
  • if k_0 ≠ k_1, NN(k_1) is also applied (2310) and the scaling parameters {offset_0; offset_1} are derived (2340).
  • the correction factor is calculated (2350, 2355), and the corrected reconstructed block Rec’ is computed (2360).
  • the encoding cost is estimated (2370), with a Lagrangian multiplier for example, taking into account the distortion with respect to the original block and the coding cost of {k_0; k_1; offset_0; offset_1}. If the cost of the current pair {k_0; k_1} is smaller than bestCost (2380), bestCost is set to the current cost, and {k_0; k_1; offset_0; offset_1; Rec’} is stored (2385). After all possible pairs are tested, the Rec’ associated with bestCost is restored, and the parameters {k_0; k_1; offset_0; offset_1} are encoded (2390).
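The search loop can be sketched as below, assuming the NN outputs are precomputed weight masks, the offsets are derived by least squares, and the rate is a fixed hypothetical number of bits per coded parameter. Whether the encoder tests ordered or unordered pairs is not stated; unordered pairs are used here. All names are illustrative.

```python
from itertools import combinations_with_replacement


def fit_offsets(masks, d):
    """Least-squares offsets for one or two masks against the error d."""
    w0 = masks[0]
    a00 = sum(x * x for x in w0)
    b0 = sum(x * e for x, e in zip(w0, d))
    if len(masks) == 1:
        return [b0 / a00 if a00 else 0.0]
    w1 = masks[1]
    a11 = sum(x * x for x in w1)
    a01 = sum(x * y for x, y in zip(w0, w1))
    b1 = sum(x * e for x, e in zip(w1, d))
    det = a00 * a11 - a01 * a01
    if abs(det) < 1e-12:  # collinear masks: use the first one only
        return [b0 / a00 if a00 else 0.0, 0.0]
    return [(b0 * a11 - b1 * a01) / det, (a00 * b1 - a01 * b0) / det]


def search_pairs(original, recon, nn_masks, lam, bits_per_param=8):
    """Test every pair {k0; k1} and keep the one with the lowest cost."""
    d = [s - r for s, r in zip(original, recon)]
    best_cost, best = float("inf"), None  # bestCost starts at a large value
    for k0, k1 in combinations_with_replacement(range(len(nn_masks)), 2):
        used = [nn_masks[k0]] if k0 == k1 else [nn_masks[k0], nn_masks[k1]]
        offsets = fit_offsets(used, d)
        rec = [r + sum(o * w[x] for o, w in zip(offsets, used))
               for x, r in enumerate(recon)]
        distortion = sum((s - v) ** 2 for s, v in zip(original, rec))
        rate = bits_per_param * (2 + len(offsets))  # hypothetical rate model
        cost = distortion + lam * rate
        if cost < best_cost:
            best_cost, best = cost, (k0, k1, offsets, rec)
    return best_cost, best


cost, (k0, k1, offs, rec) = search_pairs(
    original=[12, 7, 12, 7], recon=[10, 10, 10, 10],
    nn_masks=[[1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 0, 0]], lam=0.0)
```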
  • TABLE 2 provides an example for coding syntax elements related to the various embodiments described above.
  • TABLE 2 provides an example of syntax for coding the indexes of NNs and scaling parameters to be used for correcting one reconstructed block with a mutual combination of NNs.
  • filter_luma_flag specifies whether the luma or chroma sample block is corrected (with NN filters) or not, respectively.
  • the variable cpt_scale_off corresponds to the number of non-zero scaling parameters offset_k.
  • the N possible indexes are ordered into a table which is updated before coding the NN parameters (nn_filter()) for each block.
  • the update is made by moving the most probable indexes (e.g., the indexes used by the previously coded left and top blocks) to the top of the list. In this way, the older coded indexes slowly move to the bottom of the list while the most recently used ones stay at the top of the list.
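The table update behaves like a move-to-front list. A minimal sketch, assuming the most probable indexes are passed in priority order; the function name and argument layout are hypothetical and not taken from the syntax tables.

```python
def update_index_table(table, probable_indexes):
    """Move the most probable NN indexes to the top of the table.

    table: current ordering of the N possible NN indexes
    probable_indexes: e.g. indexes used by the left and top blocks,
                      most probable first
    """
    front = []
    for idx in probable_indexes:
        if idx in table and idx not in front:
            front.append(idx)
    # Remaining indexes keep their relative order and drift downwards.
    return front + [idx for idx in table if idx not in front]


table = update_index_table([0, 1, 2, 3], probable_indexes=[2, 3])
position_of_2 = table.index(2)  # small positions are cheaper to code
```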
  • pred_scale_off[i] = (i > 0) ? pred_scale_off[i-1] : 0
  • “pred_scale_off[i]” is equal to the last decoded value of offx.
  • idx_filter_off_val_chroma allows deriving the index kc of the NN to be used for inferring the K NN outputs to be combined.
  • kc = idx_filter_off_val_chroma.
  • TABLE 3a shows the result of using the proposed Mutual Combination of NNs method for luma NN filters, compared to the NN based filter without combining NN outputs. With the proposed combination of NN outputs, about 0.89% bitrate reduction is obtained compared to 0.57% bitrate reduction for the method without combination.
  • TABLE 3b shows results obtained with the proposed Mutual Combination of NNs with another set of NNs. The results of TABLE 3a and TABLE 3b (left) have been obtained with NNs trained with ReLU activation function. The results of TABLE 3b (right) have been obtained with NNs trained with Leaky ReLU activation function. In this example, about 2.45% bitrate reduction is obtained using LeakyReLU activation function compared to 1.60% bitrate reduction with ReLU activation function.
  • NNs are combined for correcting the current reconstructed block using spatial segmentation of the block into several (K) regions, where different NNs may be used for different block partitions.
  • one scaling parameter (offsetk) is coded for each partition/region of the block.
  • TABLE 4 provides an example of syntax elements associated with this embodiment.
  • TABLE 4 provides an example of syntax for coding the index of the NN, the partition shape (dir_split) and the scaling parameters to be used for correcting one reconstructed block with a mutual combination of NNs.
  • K = 1 for the chroma component.
  • the number of actually used NNs (cpt_scale_off) depends on the partitioning shape as shown in FIG. 25.
  • the semantics of the syntax elements are the same as in TABLE 2.
  • the index or scaling parameter predictors may be the values of the previously decoded partitions.
  • TABLE 5 shows the result of using the proposed spatial combination of NNs method, compared to the NN based filter without combining NN outputs.
  • one may signal in the bitstream (e.g., slice header or picture header) how many NNs may be combined (K).
  • one may signal the set of the N NNs among a larger set of M NNs, with M > N.
  • the N NNs may be inferred from other parameters in the bitstream, such as the quantization parameter (QP), the picture size or the nature of the video (e.g., sport, game, movie).
  • the decoder may infer the N NNs from the current QP.
  • the subset can be made of the NNs that have been trained with some (e.g., 2) QP values below and some (e.g., 2) QP values above the current QP value.
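One way to infer such a subset from the current QP is sketched below, assuming each NN is identified by the single QP it was trained with. Whether an NN trained at exactly the current QP is also included is not stated in the text; the sketch includes it, which is an assumption, and the function name is hypothetical.

```python
def select_nn_subset(trained_qps, current_qp, n_below=2, n_above=2):
    """Return the training QPs of the NNs kept for the current QP."""
    below = sorted(q for q in trained_qps if q < current_qp)
    above = sorted(q for q in trained_qps if q > current_qp)
    exact = [q for q in trained_qps if q == current_qp]
    # Nearest n_below values under the QP, then nearest n_above over it.
    kept_below = below[-n_below:] if n_below > 0 else []
    return kept_below + exact + above[:n_above]


subset = select_nn_subset([22, 27, 32, 37, 42, 47], current_qp=35)
```

Because both encoder and decoder know the current QP, this subset can be derived on both sides without extra signaling.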
  • the derivation of the scaling parameters {offset_k, bias} can be made by the encoder using, for example, the LSM method, which involves a system of K+1 equations obtained by setting the partial derivatives of (eq.3) with respect to the variables {offset_k, bias} to zero.
  • Region-based NN-filter (e.g., deblocking filter)
  • when the NN-filter is a region-based filter specialized in correcting spatially located artefacts at known locations (e.g., deblocking filters), the correction may be limited to these known locations (e.g., reconstructed CU edges).
  • the NN-filter inference may be a one-direction scanning convolution, as shown in FIG. 26, in the direction of the spatially known artefact locations (e.g., horizontal or vertical CU edges).
  • the training of NNs can be made with traditional methods based on supervised learning where the output of the NN is matched with the desired output (original signal), trying to minimize a loss function such as the difference between NN output and the desired output.
  • the training of the NN parameters is performed by minimizing the loss with gradient descent algorithms.
  • the loss can be the distortion
  • the dataset contains a set of pairs {S, Ŝ} that may be block patches.
  • the classification may be done with coding modes, with datasets created from blocks coded with a range of QPs, or blocks selected from I pictures only, or P or B pictures only, as illustrated in FIG. 27. That is, one can train several NNs based on datasets with different coding mode features. Also, the training may be done in at least two passes. In the first pass, a set of NNs (NN-1) is trained (2740) with patches extracted (2730) from Intra pictures of decoded bitstreams B1 (2710, 2720), then a set of video sequences is encoded (2750) with these NN-1 filters enabled on Intra pictures only to generate bitstreams B2. In the second pass, one can extract (2770) patches from Inter pictures of decoded bitstreams B2 (2760) in order to train (2780) another set of NNs (NN-2), dedicated to filtering Inter pictures.
  • if the purpose of the NN filter is to replace existing filters (e.g., SAO, ALF) used in the bitstreams, one can select the S values with the classification existing in the bitstream. For example, considering SAO, if S has been encoded with SAO parameters EO_90, then it will be associated with the NN-filter associated with “EO_90”.
  • the classification in the bitstream may be biased by the encoder choice that may have been based on rate-distortion and other contextual considerations.
  • the coding cost (rate) depends on the CABAC contexts which depend on the history of the CABAC encoder. For the training, it may be preferable not to consider the rate cost but the distortion only.
  • One can overcome this limitation by choosing for S the CTUs coded in mode NEW only (discarding the modes merge and OFF) but the encoding bias still exists.
  • ⁇ “i” is marked as “not placed” o for each data “i” in MD: o while ( data “i” marked as not placed in one dataset )
  • the NN filtering process is performed block by block as the current video standards are usually block based.
  • the present embodiments can be applied to a region that has a shape that is different from rectangular or square, as the NN can be trained and implemented for other shapes, or can be a fully convolutional network, hence independent of the region shape or size.
  • a deep Neural Network is provided to restore images after reconstruction by a video codec, to replace or complement the SAO filter.
  • the proposed filters leverage the power of a CNN for the classification of pixels to correct, while keeping the correction “closed-loop” by computing at the encoder the optimal correction to apply.
  • the CNN can also compute the amount of correction to set on a particular pixel. It leverages the benefit of encoding a parameter to control the filter action while only requiring a small amount of data to be encoded in the bitstream.
  • because the NN can generate a pixel-wise weight mask (values in the mask may vary from pixel to pixel), the actual offset (weight * offset) to be applied to adjust the pixels in the block may vary from pixel to pixel, thus achieving a finer granularity than the SAO filter in HEVC and VVC with a lower signaling cost.
  • the NN may also produce the weights on a sub-block basis (same weight within a sub-block, but weights can vary from sub-block to sub-block in the block).
  • the proposed NN filter may be applied at some specific locations only in the picture. For example, it may be used specifically to correct blocking artefacts, which occur mainly near block boundaries, and/or at transform borders only or prediction unit borders only.
  • the methods are not limited to NN based filters but can be applied to any other or traditional filters where correction terms are added to reconstructed pictures to improve image quality and reduce coding artefacts. While in-loop filtering is described in the above examples, the proposed filtering methods can also be performed out of the coding loop, for example, as a post-processing step applied outside the decoder.
  • Various methods are described herein, and each of the methods comprises one or more steps or actions for achieving the described method. Unless a specific order of steps or actions is required for proper operation of the method, the order and/or use of specific steps and/or actions may be modified or combined. Additionally, terms such as “first”, “second”, etc. may be used in various embodiments to modify an element, component, step, operation, etc., for example, a “first decoding” and a “second decoding”. Use of such terms does not imply an ordering to the modified operations unless specifically required. So, in this example, the first decoding need not be performed before the second decoding, and may occur, for example, before, during, or in an overlapping time period with the second decoding.
  • Various methods and other aspects described in this application can be used to modify modules, for example, the motion refinement and motion compensation modules (270, 272, 372, 375), of a video encoder 200 and decoder 300 as shown in FIG. 2 and FIG. 3.
  • the present aspects are not limited to VVC or HEVC, and can be applied, for example, to other standards and recommendations, and extensions of any such standards and recommendations. Unless indicated otherwise, or technically precluded, the aspects described in this application can be used individually or in combination.
  • Decoding may encompass all or part of the processes performed, for example, on a received encoded sequence in order to produce a final output suitable for display.
  • processes include one or more of the processes typically performed by a decoder, for example, entropy decoding, inverse quantization, inverse transformation, and differential decoding.
  • Various implementations involve encoding.
  • “encoding” as used in this application may encompass all or part of the processes performed, for example, on an input video sequence in order to produce an encoded bitstream.
  • the implementations and aspects described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of features discussed may also be implemented in other forms (for example, an apparatus or program).
  • An apparatus may be implemented in, for example, appropriate hardware, software, and firmware.
  • the methods may be implemented in, for example, an apparatus, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device.
  • processors also include communication devices, for example, computers, cell phones, portable/personal digital assistants (“PDAs”), and other devices that facilitate communication of information between end-users.
  • reference to “one embodiment” or “an embodiment” or “one implementation” or “an implementation”, as well as other variations thereof, means that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment.
  • the appearances of the phrase “in one embodiment” or “in an embodiment” or “in one implementation” or “in an implementation”, as well as any other variations, appearing in various places throughout this application are not necessarily all referring to the same embodiment.
  • this application may refer to “determining” various pieces of information. Determining the information may include one or more of, for example, estimating the information, calculating the information, predicting the information, or retrieving the information from memory.
  • Accessing the information may include one or more of, for example, receiving the information, retrieving the information (for example, from memory), storing the information, moving the information, copying the information, calculating the information, determining the information, predicting the information, or estimating the information.
  • this application may refer to “receiving” various pieces of information. Receiving is, as with “accessing”, intended to be a broad term. Receiving the information may include one or more of, for example, accessing the information, or retrieving the information (for example, from memory). Further, “receiving” is typically involved, in one way or another, during operations, for example, storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.
  • such phrasing is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of the third listed option (C) only, or the selection of the first and the second listed options (A and B) only, or the selection of the first and third listed options (A and C) only, or the selection of the second and third listed options (B and C) only, or the selection of all three options (A and B and C).
  • This may be extended, as is clear to one of ordinary skill in this and related arts, for as many items as are listed.
  • the word “signal” refers to, among other things, indicating something to a corresponding decoder.
  • the encoder signals a quantization matrix for de-quantization.
  • the same parameter is used at both the encoder side and the decoder side.
  • an encoder can transmit (explicit signaling) a particular parameter to the decoder so that the decoder can use the same particular parameter.
  • signaling can be used without transmitting (implicit signaling) to simply allow the decoder to know and select the particular parameter. By avoiding transmission of any actual functions, a bit savings is realized in various embodiments.
  • signaling can be accomplished in a variety of ways. For example, one or more syntax elements, flags, and so forth are used to signal information to a corresponding decoder in various embodiments. While the preceding relates to the verb form of the word “signal”, the word “signal” can also be used herein as a noun.
  • Implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted.
  • The information may include, for example, instructions for performing a method, or data produced by one of the described implementations.
  • A signal may be formatted to carry the bitstream of a described embodiment.
  • Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal.
  • The formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream.
  • The information that the signal carries may be, for example, analog or digital information.
  • The signal may be transmitted over a variety of different wired or wireless links, as is known.
  • The signal may be stored on a processor-readable medium.
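
The distinction drawn above between explicit and implicit signaling can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the shared table `SHARED_QUANT_MATRICES`, the message dictionaries, and the function names are hypothetical and do not come from the application or from any codec syntax. The sketch assumes the decoder already holds a table of candidate parameters, so the encoder only has to indicate which one to select.

```python
# Hypothetical table of quantization matrices assumed to be known
# to both the encoder and the decoder (not taken from the application).
SHARED_QUANT_MATRICES = {
    0: [[16, 16], [16, 16]],
    1: [[16, 24], [24, 32]],
}


def encode_explicit(matrix):
    """Explicit signaling: the parameter itself is carried in the message."""
    return {"mode": "explicit", "payload": matrix}


def encode_implicit(index):
    """Implicit-style signaling: only a selector into the shared table is carried."""
    return {"mode": "implicit", "payload": index}


def decode(message):
    """Recover the quantization matrix the encoder intended."""
    if message["mode"] == "explicit":
        return message["payload"]
    return SHARED_QUANT_MATRICES[message["payload"]]


def payload_values(message):
    """Count the integers carried by the message, as a crude proxy for bit cost."""
    if message["mode"] == "explicit":
        return sum(len(row) for row in message["payload"])
    return 1


explicit_msg = encode_explicit(SHARED_QUANT_MATRICES[1])
implicit_msg = encode_implicit(1)
print(decode(explicit_msg) == decode(implicit_msg))
print(payload_values(explicit_msg), payload_values(implicit_msg))
```

Both messages lead the decoder to the same matrix, but the second carries a single selector instead of every matrix entry, which is the kind of saving the text attributes to avoiding transmission of the parameter itself.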

Abstract

In one embodiment of the invention, to perform in-loop filtering of a version of reconstructed samples of a block, only a single offset parameter needs to be signaled in the bitstream. Based on the version of reconstructed samples, a pixel-wise weight mask is generated using a neural network. Because the neural network parameters are known at both the encoder and the decoder, these parameters do not need to be signaled in the bitstream. The single offset parameter, scaled by the weight mask, is used to adjust the samples in the block. Thus, even though only a single offset parameter is used, the samples are adjusted by pixel-wise offsets. The neural network can also take other parameters, such as quantization parameters and picture types, as input. In addition, there can be multiple neural networks that generate different weight masks, in which case different offsets are signaled and one or more of the neural networks are selected for the filtering.
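
The adjustment described in the abstract (a single signaled offset, scaled per sample by a weight mask) can be sketched in a few lines of Python. This is a toy illustration, not the filter of the application: `toy_weight_mask` is a hypothetical hand-written stand-in for the neural network, and the rounding and the clipping to an assumed 8-bit range are assumptions added so that the example is complete.

```python
def toy_weight_mask(block):
    """Hypothetical stand-in for the neural network.

    Returns one weight in [0, 1] per sample. Here the weight simply grows
    with the sample's absolute deviation from the block mean; a real
    system would use a trained network shared by encoder and decoder.
    """
    flat = [s for row in block for s in row]
    mean = sum(flat) / len(flat)
    max_dev = max(abs(s - mean) for s in flat) or 1.0  # avoid division by zero
    return [[abs(s - mean) / max_dev for s in row] for row in block]


def apply_scaled_offset(block, offset, mask, bit_depth=8):
    """Add offset * weight to each reconstructed sample, then clip.

    The clipping range and the rounding are assumptions of this sketch.
    """
    max_val = (1 << bit_depth) - 1
    return [
        [min(max_val, max(0, round(s + offset * w))) for s, w in zip(row, wrow)]
        for row, wrow in zip(block, mask)
    ]


reconstructed = [[100, 100], [100, 104]]
mask = toy_weight_mask(reconstructed)   # computed identically at encoder and decoder
filtered = apply_scaled_offset(reconstructed, offset=3, mask=mask)
print(filtered)
```

Only `offset` would have to be carried in the bitstream in this sketch; the mask is recomputed from the reconstructed samples on both sides, so each sample still receives its own effective offset.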
PCT/EP2021/063771 2020-06-04 2021-05-24 Neural network based filter in video coding WO2021244884A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
JP2022572477A JP2023528780A (en) 2020-06-04 2021-05-24 Neural network based filter in video coding
CN202180042531.5A CN115943629A (en) 2020-06-04 2021-05-24 Neural network based filter in video coding
US17/925,479 US20230188713A1 (en) 2020-06-04 2021-05-24 Neural network based filter in video coding
EP21727169.1A EP4162680A1 (en) 2020-06-04 2021-05-24 Neural network based filter in video coding

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
EP20305591 2020-06-04
EP20305591.8 2020-06-04
EP20306417.5 2020-11-20
EP20306417 2020-11-20
EP20306628 2020-12-21
EP20306628.7 2020-12-21
EP21305444 2021-04-07
EP21305444.8 2021-04-07

Publications (1)

Publication Number Publication Date
WO2021244884A1 (en) 2021-12-09

Family

ID=76059905

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2021/063771 WO2021244884A1 (en) 2020-06-04 2021-05-24 Neural network based filter in video coding

Country Status (5)

Country Link
US (1) US20230188713A1 (en)
EP (1) EP4162680A1 (en)
JP (1) JP2023528780A (en)
CN (1) CN115943629A (en)
WO (1) WO2021244884A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023125231A1 (en) * 2021-12-28 2023-07-06 维沃移动通信有限公司 Loop filtering method and terminal
WO2023132765A1 (en) * 2022-01-04 2023-07-13 Telefonaktiebolaget Lm Ericsson (Publ) Filtering for image encoding and decoding
WO2023156365A1 (en) * 2022-02-15 2023-08-24 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Encoder, decoder and methods for coding a picture using an elastic classification
WO2024019343A1 (en) * 2022-07-20 2024-01-25 현대자동차주식회사 Video in-loop filter adapting to various types of noise and characteristics
WO2024025280A1 (en) * 2022-07-27 2024-02-01 Samsung Electronics Co., Ltd. Method and system for content-based scaling for artificial intelligence based in-loop filters

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220337824A1 (en) * 2021-04-07 2022-10-20 Beijing Dajia Internet Information Technology Co., Ltd. System and method for applying neural network based sample adaptive offset for video coding

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017036370A1 (en) * 2015-09-03 2017-03-09 Mediatek Inc. Method and apparatus of neural network based processing in video coding
EP3451293A1 (en) * 2017-08-28 2019-03-06 Thomson Licensing Method and apparatus for filtering with multi-branch deep learning
WO2019072097A1 (en) * 2017-10-12 2019-04-18 Mediatek Inc. Video coding method using a neural network
US20190230354A1 (en) * 2016-06-24 2019-07-25 Korea Advanced Institute Of Science And Technology Encoding and decoding methods and devices including cnn-based in-loop filter
WO2019205871A1 (en) * 2018-04-25 2019-10-31 杭州海康威视数字技术股份有限公司 Image encoding and decoding methods and apparatuses, and related device

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7591630B2 (en) * 2003-08-29 2009-09-22 Casepick Systems, Llc Materials-handling system using autonomous transfer and transport vehicles
TWI816224B (en) * 2015-06-08 2023-09-21 美商Vid衡器股份有限公司 Video decoding or encoding method and apparatus
JP7410149B2 (en) * 2018-08-24 2024-01-09 中興通訊股▲ふん▼有限公司 Planar prediction mode for visual media encoding and decoding
US10311334B1 (en) * 2018-12-07 2019-06-04 Capital One Services, Llc Learning to process images depicting faces without leveraging sensitive attributes in deep learning models
WO2020257629A1 (en) * 2019-06-19 2020-12-24 Beijing Dajia Internet Information Technology Co., Ltd. Methods and apparatuses for prediction refinement with optical flow
EP4022902A4 (en) * 2019-09-25 2022-11-23 Huawei Technologies Co., Ltd. Harmonization of triangular merge mode with weighted prediction
EP3863284A1 (en) * 2020-02-04 2021-08-11 Apple Inc. Multi-stage block coding

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LUHANG XU (FUJITSU) ET AL: "Non-CE10: A CNN based in-loop filter for intra frame", no. JVET-O0157 ; m48249, 5 July 2019 (2019-07-05), XP030218728, Retrieved from the Internet <URL:http://phenix.int-evry.fr/jvet/doc_end_user/documents/15_Gothenburg/wg11/JVET-O0157-v2.zip JVET-O0157.docx> [retrieved on 20190705] *
PARK WOON-SUNG ET AL: "CNN-based in-loop filtering for coding efficiency improvement", 2016 IEEE 12TH IMAGE, VIDEO, AND MULTIDIMENSIONAL SIGNAL PROCESSING WORKSHOP (IVMSP), IEEE, 11 July 2016 (2016-07-11), pages 1 - 5, XP032934608, DOI: 10.1109/IVMSPW.2016.7528223 *

Also Published As

Publication number Publication date
EP4162680A1 (en) 2023-04-12
US20230188713A1 (en) 2023-06-15
JP2023528780A (ja) 2023-07-06
CN115943629A (zh) 2023-04-07

Similar Documents

Publication Publication Date Title
US20230188713A1 (en) Neural network based filter in video coding
JP7425241B2 (en) Video encoding and decoding based on bidirectional optical flow
US20220141456A1 (en) Method and device for picture encoding and decoding
US20220385922A1 (en) Method and apparatus using homogeneous syntax with coding tools
CN112369025A (en) Context-based binary arithmetic encoding and decoding
CA3149102A1 (en) Secondary transform for video encoding and decoding
EP3709657A1 (en) Reducing the number of regular coded bins
CN112771874A (en) Method and apparatus for picture encoding and decoding
EP4248650A1 (en) Intra prediction with geometric partition
US20230298219A1 (en) A method and an apparatus for updating a deep neural network-based image or video decoder
US20220385917A1 (en) Estimating weighted-prediction parameters
US20220141466A1 (en) Unification of context-coded bins (ccb) count method
US11973964B2 (en) Video compression based on long range end-to-end deep learning
US20240031606A1 (en) Karhunen loeve transform for video coding
CN114127746A (en) Compression of convolutional neural networks
CN114080613A (en) System and method for encoding a deep neural network
EP3675500A1 (en) Quantization parameter prediction for video encoding and decoding
US20240031611A1 (en) Deep prediction refinement
US20230171421A1 (en) Motion refinement using a deep neural network
US20230156232A1 (en) Adaptive application of generalized sample offset
US20230156185A1 (en) Generalized sample offset
WO2024002879A1 (en) Reconstruction by blending prediction and residual
WO2023247533A1 (en) Methods and apparatuses for encoding and decoding an image or a video
TW202420823A (en) Entropy adaptation for deep feature compression using flexible networks
WO2020072397A1 (en) Block size based motion vector coding in affine mode

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21727169

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022572477

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2021727169

Country of ref document: EP

Effective date: 20230104