CN117280683A - Method and apparatus for encoding/decoding video - Google Patents

Method and apparatus for encoding/decoding video

Publication number
CN117280683A
Authority
CN
China
Prior art keywords
picture
decoding
sample
filter
decoded
Legal status
Pending
Application number
CN202280016526.1A
Other languages
Chinese (zh)
Inventor
P. Bordes
F. Galpin
K. Naser
Ya Chen
T. Dumas
A. Robert
Current Assignee
InterDigital CE Patent Holdings SAS
Original Assignee
InterDigital CE Patent Holdings SAS
Application filed by InterDigital CE Patent Holdings SAS filed Critical InterDigital CE Patent Holdings SAS
Priority claimed from PCT/EP2022/054392 (WO2022180031A1)
Publication of CN117280683A

Abstract

A method is provided for reconstructing at least a portion of a first picture from at least a portion of a second picture, the first picture and the second picture having different sizes. The reconstructing includes: decoding the second picture from the bitstream; and determining at least one first sample of the at least one portion of the first picture using at least one resampling filter applied to at least one second sample of the at least one portion of the decoded second picture. A corresponding apparatus for reconstructing at least a portion of a first picture is provided. A method for encoding/decoding video and corresponding apparatus are provided, the method comprising reconstructing at least a portion of a first picture from at least a portion of a second picture, the first picture and the second picture having different sizes.

Description

Method and apparatus for encoding/decoding video
Technical Field
Embodiments of the present invention relate generally to a method and apparatus for video encoding or decoding. Some embodiments relate to methods and apparatus for video encoding or decoding in which original pictures and reconstructed pictures are dynamically rescaled for encoding.
Background
To achieve high compression efficiency, image and video coding schemes typically employ prediction and transformation to exploit spatial and temporal redundancy in video content. Generally, intra or inter prediction is used to exploit intra or inter image correlation, and then transform, quantize, and entropy encode differences (commonly denoted as prediction errors or prediction residuals) between original blocks and predicted blocks. To reconstruct video, the compressed data is decoded by an inverse process corresponding to entropy encoding, quantization, transformation, and prediction.
Disclosure of Invention
According to an embodiment, there is provided a method for reconstructing at least a portion of a first picture from at least a portion of a second picture, wherein the first picture and the second picture have different sizes, and the reconstructing comprises: decoding the second picture from the bitstream; at least one first sample of the at least one portion of the first picture is determined using at least one resampling filter applied to at least one second sample of the at least one portion of the decoded second picture.
According to another embodiment, there is provided an apparatus for reconstructing at least a portion of a first picture from at least a portion of a second picture, the apparatus comprising one or more processors, wherein the one or more processors are configured to: decoding the second picture from the bitstream; at least one first sample of the at least one portion of a first picture is determined using at least one resampling filter applied to at least one second sample of the at least one portion of a decoded second picture, the first picture and the second picture having different sizes.
According to another embodiment, there is provided a video encoding method including: encoding a second picture in the bitstream, the second picture being a scaled-down picture of the first picture; encoding a third picture in the bitstream, the third picture having a same size as the first picture, wherein encoding the third picture comprises: reconstructing at least a portion of the first picture by upsampling at least a portion of the second picture after decoding, the upsampling comprising: at least one first sample of the at least one portion of the first picture is determined using at least one upsampling filter applied to at least one second sample of the at least one portion of the decoded second picture.
According to another embodiment, an apparatus for video encoding is provided, the apparatus comprising one or more processors, wherein the one or more processors are configured to: encoding a second picture in the bitstream, the second picture being a scaled-down picture of the first picture; encoding a third picture in the bitstream, the third picture having a same size as the first picture, wherein encoding the third picture comprises: reconstructing at least a portion of the first picture by upsampling at least a portion of the second picture after decoding, the upsampling comprising: at least one first sample of the at least one portion of the first picture is determined using at least one upsampling filter applied to at least one second sample of the at least one portion of the decoded second picture.
According to another embodiment, there is provided a video decoding method including: decoding a second picture in the bitstream, the second picture being a scaled-down picture of the first picture; decoding a third picture in the bitstream, the third picture having a same size as the first picture, wherein decoding the third picture comprises: reconstructing at least a portion of the first picture by upsampling at least a portion of the second picture after decoding, the upsampling comprising: at least one first sample of the at least one portion of the first picture is determined using at least one upsampling filter applied to at least one second sample of the at least one portion of the decoded second picture.
According to another embodiment, an apparatus for video decoding is provided, the apparatus comprising one or more processors, wherein the one or more processors are configured to: decoding a second picture in the bitstream, the second picture being a scaled-down picture of the first picture; decoding a third picture in the bitstream, the third picture having a same size as the first picture, wherein decoding the third picture comprises: reconstructing at least a portion of the first picture by upsampling at least a portion of the second picture after decoding, the upsampling comprising: at least one first sample of the at least one portion of the first picture is determined using at least one upsampling filter applied to at least one second sample of the at least one portion of the decoded second picture.
In a variant, a method for encoding/decoding video includes: the at least one reconstructed portion of the first picture is stored in a decoded picture buffer that stores a reference picture used to encode a third picture.
According to another aspect, there is provided a method for encoding video, wherein encoding video comprises: classifying samples of the first picture; determining, for at least a portion of a first picture, a first filter based on the classification, the first filter for a first encoding operation using the at least a portion of the first picture; providing a first modified portion of a first picture; a second filter is determined based on the classification, the second filter for a second encoding operation using the first modified portion of the first picture.
An apparatus for encoding video is provided. The apparatus includes one or more processors, wherein the one or more processors are configured to encode video by: classifying samples of the first picture; determining, for at least a portion of a first picture, a first filter based on the classification, the first filter for a first encoding operation using the at least a portion of the first picture; providing a first modified portion of a first picture; a second filter is determined based on the classification, the second filter for a second encoding operation using the first modified portion of the first picture.
According to another aspect, there is provided a method for decoding video, wherein decoding video comprises: classifying samples of the first picture; determining, for at least a portion of a first picture, a first filter based on the classification, the first filter for a first decoding operation using the at least a portion of the first picture; providing a first modified portion of a first picture; a second filter is determined based on the classification, the second filter for a second decoding operation using the first modified portion of the first picture.
An apparatus for decoding video is provided. The apparatus includes one or more processors, wherein the one or more processors are configured to: decoding the video, wherein decoding the video includes classifying samples of the first picture; determining, for at least a portion of a first picture, a first filter based on the classification, the first filter for a first decoding operation using the at least a portion of the first picture; providing a first modified portion of a first picture; a second filter is determined based on the classification, the second filter for a second decoding operation using the first modified portion of the first picture.
According to an embodiment of any of the above aspects, the classification is stored in a decoded picture buffer storing the reference picture, i.e. the index associated with each sample of the first picture is stored in the decoded picture buffer.
According to another aspect, there is provided another method for encoding video, wherein encoding video comprises: classifying samples of the reference picture; and for at least one block of video, determining at least a portion of the reference picture using at least one motion vector for the at least one block; determining, for at least a portion of a reference picture, at least one interpolation filter based on the classification; determining a prediction for the block based on filtering the at least a portion of a reference picture using the determined at least one interpolation filter; the block is encoded based on the prediction.
There is provided an apparatus for encoding video, the apparatus comprising one or more processors configured to encode video by: classifying samples of the reference picture; and for at least one block of video: determining at least a portion of the reference picture using at least one motion vector of the at least one block; determining, for at least a portion of a reference picture, at least one interpolation filter based on the classification; determining a prediction for the block based on filtering the at least a portion of a reference picture using the determined at least one interpolation filter; the block is encoded based on the prediction.
According to another aspect, another method for decoding video includes: classifying samples of the reference picture; and for at least one block of video: determining at least a portion of the reference picture using at least one motion vector of the at least one block; determining, for at least a portion of a reference picture, at least one interpolation filter based on the classification; determining a prediction for the block based on filtering the at least a portion of a reference picture using the determined at least one interpolation filter; the block is decoded based on the prediction.
There is provided an apparatus for decoding video, the apparatus comprising one or more processors configured to decode video by: classifying samples of the reference picture; and for at least one block of video: determining at least a portion of the reference picture using at least one motion vector of the at least one block; determining, for at least a portion of a reference picture, at least one interpolation filter based on the classification; determining a prediction for the block based on filtering the at least a portion of a reference picture using the determined at least one interpolation filter; the block is decoded based on the prediction.
One or more embodiments also provide a computer program comprising instructions that, when executed by one or more processors, cause the one or more processors to perform a reconstruction method, or an encoding method or a decoding method according to any of the embodiments described herein. One or more of the embodiments of the present invention also provides a computer-readable storage medium having stored thereon instructions for reconstructing a portion of a picture, encoding or decoding video data according to the above-described method. One or more embodiments of the present invention also provide a computer-readable storage medium having stored thereon a bitstream generated according to the above method. One or more embodiments of the present invention also provide a method and apparatus for transmitting or receiving a bitstream generated according to the above method.
Drawings
FIG. 1 illustrates a block diagram of a system in which aspects of an embodiment of the invention may be implemented.
Fig. 2 shows a block diagram of an embodiment of a video encoder.
Fig. 3 shows a block diagram of an embodiment of a video decoder.
Fig. 4 illustrates an exemplary method for encoding video according to an embodiment.
Fig. 5 illustrates an exemplary method for reconstructing video according to an embodiment.
Fig. 6 illustrates an example of motion compensation of a current block in a current picture in a reference picture when the reference picture has a different resolution from the current picture, according to an embodiment.
Fig. 7 shows an example of determining filter coefficient values according to phases of samples according to an embodiment.
Fig. 8 shows an example of two-stage motion compensation filtering according to an embodiment.
Fig. 9 shows an example of horizontal filtering in the first stage of motion compensation filtering according to an embodiment.
Fig. 10 shows an example of vertical filtering in the second stage of motion compensation filtering according to an embodiment.
Fig. 11 shows an example of a symmetric filter and a filter rotation.
Fig. 12 shows an example of a method for determining an upsampling filter according to an embodiment.
Fig. 13 illustrates an example of a method for encoding/decoding a picture according to an embodiment.
Fig. 14A shows an example of the different phases corresponding to 2x upsampling in the horizontal and vertical directions according to an embodiment.
Fig. 14B to 14I show examples of different shapes of the up-sampling filter according to the embodiment.
Fig. 15 shows an example of a method for determining upsampling filter coefficients according to an embodiment.
Fig. 16 shows an example of a method for encoding video according to an embodiment.
Fig. 17 shows an example of a method for decoding video according to an embodiment.
Fig. 18 shows an example of a method for encoding/decoding video according to an embodiment.
Fig. 19 shows an example of a method for encoding/decoding video according to another embodiment.
Fig. 20 shows an example of a method for encoding/decoding video according to another embodiment.
fig. 21 shows an example of a method for decoding video according to another embodiment.
Fig. 22 illustrates two remote devices communicating over a communication network in accordance with an example of the present principles.
Fig. 23 shows the syntax of a signal according to an example of the present principles.
Detailed Description
Various aspects are described herein, including tools, features, embodiments, models, methods, and the like. Many of these aspects are described in detail and, at least to show their individual characteristics, are often described in a manner that may sound limiting. However, this is for clarity of description and does not limit the application or scope of these aspects. Indeed, all of the different aspects may be combined and interchanged to provide further aspects. Moreover, these aspects may also be combined and interchanged with aspects described in previous filings.
The aspects described and contemplated in this patent application may be embodied in many different forms. The following figures 1, 2 and 3 provide some embodiments, but other embodiments are contemplated, and the discussion of figures 1, 2 and 3 is not limiting of the breadth of the specific implementation. At least one of these aspects generally relates to video encoding and decoding, and at least one other aspect generally relates to transmitting a generated or encoded bitstream. These and other aspects may be implemented as a method, an apparatus, a computer-readable storage medium having stored thereon instructions for encoding or decoding video data according to any of the methods, and/or a computer-readable storage medium having stored thereon a bitstream generated according to any of the methods.
In this application, the terms "reconstruct" and "decode" are used interchangeably, the terms "pixel" and "sample" are used interchangeably, and the terms "image", "picture" and "frame" are used interchangeably.
Various methods are described herein, and each method includes one or more steps or actions for achieving the method. Unless a particular order of steps or actions is required for proper operation of the method, the order and/or use of particular steps and/or actions may be modified or combined. Furthermore, terms such as "first", "second", and the like may be used in various implementations to modify elements, components, steps, operations, and the like, for example "first decoding" and "second decoding". The use of such terms does not imply an ordering of the modified operations unless specifically required. Thus, in this example, the first decoding need not be performed prior to the second decoding, and may occur, for example, before, during, or in an overlapping time period with the second decoding.
The various methods and other aspects described herein may be used to modify modules, such as the motion compensation modules (270,375) of the video encoder 200 and decoder 300, as shown in fig. 2 and 3. Furthermore, aspects of the present invention are not limited to VVC or HEVC, and may be applied to, for example, other standards and recommendations (whether pre-existing or developed in the future) and extensions of any such standards and recommendations (including VVC and HEVC). The aspects described in this application may be used alone or in combination unless otherwise indicated or technically excluded.
FIG. 1 illustrates a block diagram of an example of a system in which various aspects and embodiments may be implemented. The system 100 may be embodied as a device that includes the various components described below and is configured to perform one or more of the aspects described herein. Examples of such devices include, but are not limited to, various electronic devices such as personal computers, laptops, smartphones, tablets, digital multimedia set-top boxes, digital television receivers, personal video recording systems, connected home appliances, and servers. The elements of system 100 may be embodied in a single integrated circuit, in multiple ICs, and/or in discrete components, alone or in combination. For example, in at least one embodiment, the processing and encoder/decoder elements of system 100 are distributed across multiple ICs and/or discrete components. In various embodiments, the system 100 is communicatively coupled to other systems or to other electronic devices via, for example, a communication bus or through dedicated input and/or output ports. In various embodiments, the system 100 is configured to implement one or more of the aspects described herein.
The system 100 includes at least one processor 110 configured to execute instructions loaded therein for implementing, for example, the various aspects described herein. The processor 110 may include an embedded memory, an input-output interface, and various other circuits known in the art. The system 100 includes at least one memory 120 (e.g., volatile memory device and/or non-volatile memory device). The system 100 includes a storage device 140 that may include non-volatile memory and/or volatile memory, including but not limited to EEPROM, ROM, PROM, RAM, DRAM, SRAM, flash memory, magnetic disk drives, and/or optical disk drives. By way of non-limiting example, storage device 140 may include an internal storage device, an attached storage device, and/or a network accessible storage device.
The system 100 includes an encoder/decoder module 130 configured to, for example, process data to provide encoded video or decoded video, and the encoder/decoder module 130 may include its own processor and memory. Encoder/decoder module 130 represents a module that may be included in a device to perform encoding and/or decoding functions. As is well known, an apparatus may include one or both of an encoding module and a decoding module. In addition, the encoder/decoder module 130 may be implemented as a separate element of the system 100, or may be incorporated within the processor 110 as a combination of hardware and software as known to those skilled in the art.
Program code to be loaded onto processor 110 or encoder/decoder 130 to perform various aspects described herein may be stored in storage device 140 and subsequently loaded onto memory 120 for execution by processor 110. According to various embodiments, one or more of the processor 110, memory 120, storage 140, and encoder/decoder module 130 may store one or more of the various items during execution of the processes described in this application. Such storage items may include, but are not limited to, input video, decoded video, or portions of decoded video, bitstreams, matrices, variables, and intermediate or final results of processing equations, formulas, operations, and arithmetic logic.
In some embodiments, memory internal to the processor 110 and/or encoder/decoder module 130 is used to store instructions as well as to provide working memory for processing as needed during encoding or decoding. However, in other embodiments, memory external to the processing device (e.g., the processing device may be the processor 110 or the encoder/decoder module 130) is used for one or more of these functions. The external memory may be memory 120 and/or storage device 140, such as dynamic volatile memory and/or nonvolatile flash memory. In several embodiments, external non-volatile flash memory is used to store the operating system of the television. In at least one embodiment, a fast external dynamic volatile memory (such as RAM) is used as a working memory for video encoding and decoding operations, such as for MPEG-2 (MPEG refers to moving picture experts group, MPEG-2 is also referred to as ISO/IEC 13818, and 13818-1 is also referred to as h.222, 13818-2 is also referred to as h.262), HEVC (HEVC refers to high efficiency video coding, also referred to as h.265 and MPEG-H part 2), or VVC (general purpose video coding, a new standard developed by the joint video experts group (jfet)).
Inputs to the elements of system 100 may be provided through various input devices as indicated in block 105. Such input devices include, but are not limited to: (i) A Radio Frequency (RF) section that receives an RF signal transmitted over the air, for example, by a broadcaster; (ii) A Component (COMP) input terminal (or set of COMP input terminals); (iii) a Universal Serial Bus (USB) input terminal; and/or (iv) a High Definition Multimedia Interface (HDMI) input terminal. Other examples not shown in fig. 1 include composite video.
In various embodiments, the input devices of block 105 have associated respective input processing elements as known in the art. For example, the RF section may be associated with elements suitable for: (i) selecting a desired frequency (also referred to as selecting a signal, or band-limiting a signal to a band of frequencies); (ii) down-converting the selected signal; (iii) band-limiting again to a narrower band of frequencies to select, for example, a signal frequency band that may be referred to as a channel in some implementations; (iv) demodulating the down-converted and band-limited signal; (v) performing error correction; and (vi) demultiplexing to select a desired stream of data packets. The RF portion of the various embodiments includes one or more elements for performing these functions, such as frequency selectors, signal selectors, band limiters, channel selectors, filters, down-converters, demodulators, error correctors, and demultiplexers. The RF section may include a tuner that performs various of these functions including, for example, down-converting the received signal to a lower frequency (e.g., an intermediate or near-baseband frequency) or to baseband. In one set-top box embodiment, the RF section and its associated input processing elements receive RF signals transmitted over a wired (e.g., cable) medium and perform frequency selection by filtering, down-converting, and filtering again to a desired frequency band. Various embodiments rearrange the order of the above (and other) elements, remove some of these elements, and/or add other elements performing similar or different functions. Adding elements may include inserting elements between existing elements, such as inserting amplifiers or an analog-to-digital converter. In various embodiments, the RF section includes an antenna.
In addition, the USB and/or HDMI terminals may include respective interface processors for connecting the system 100 to other electronic devices across USB and/or HDMI connections. It should be appreciated that various aspects of the input processing (e.g., Reed-Solomon error correction) may be implemented, for example, within a separate input processing IC or within the processor 110, as desired. Similarly, aspects of the USB or HDMI interface processing may be implemented within separate interface ICs or within the processor 110, as desired. The demodulated, error-corrected and demultiplexed stream is provided to various processing elements including, for example, the processor 110 and the encoder/decoder 130, which operate in conjunction with the memory and storage elements to process the data stream as needed for presentation on an output device.
The various elements of system 100 may be disposed within an integrated housing. Within the integrated housing, the various elements may be interconnected and transmit data therebetween using a suitable connection arrangement 115 (e.g., internal buses known in the art, including I2C buses, wiring, and printed circuit boards).
The system 100 includes a communication interface 150 that allows communication with other devices via a communication channel 190. Communication interface 150 may include, but is not limited to, a transceiver configured to transmit and receive data over communication channel 190. Communication interface 150 may include, but is not limited to, a modem or network card, and communication channel 190 may be implemented within a wired and/or wireless medium or the like.
In various embodiments, the data stream is transmitted to system 100 using a Wi-Fi network, such as IEEE 802.11 (IEEE refers to institute of electrical and electronics engineers). Wi-Fi signals of these embodiments are received through a communication channel 190 and a communication interface 150 suitable for Wi-Fi communication. The communication channel 190 in these embodiments is typically connected to an access point or router that provides access to external networks, including the internet, to allow streaming applications and other OTT communications. Other embodiments provide streaming data to the system 100 using a set top box that delivers the data over an HDMI connection of the input block 105. Still other embodiments provide streaming data to the system 100 using the RF connection of the input block 105. As described above, various embodiments provide data in a non-streaming manner. In addition, various embodiments use wireless networks other than Wi-Fi, such as cellular networks or bluetooth networks.
The system 100 may provide output signals to various output devices including the display 165, speakers 175, and other peripheral devices 185. The display 165 of various embodiments includes, for example, one or more of a touch screen display, an Organic Light Emitting Diode (OLED) display, a curved display, and/or a collapsible display. The display 165 may be used in a television, tablet, laptop, cellular telephone (mobile telephone), or other device. The display 165 may also be integrated with other components (e.g., as in a smart phone), or may be a stand-alone display (e.g., an external monitor for a laptop). In various examples of implementations, other peripheral devices 185 include one or more of a standalone digital video disc (or digital versatile disc) (DVR, which may be denoted by both terms), a disc player, a stereo system, and/or a lighting system. Various embodiments use one or more peripheral devices 185 that provide functionality based on the output of the system 100. For example, a disc player performs the function of playing the output of the system 100.
In various embodiments, control signals are communicated between the system 100 and the display 165, speakers 175, or other peripheral devices 185 using signaling such as AV.Link, CEC, or other communication protocols that enable device-to-device control with or without user intervention. The output devices may be communicatively coupled to the system 100 via dedicated connections through respective interfaces 160, 170, and 180. Alternatively, the output devices may be connected to the system 100 via the communication interface 150 using the communication channel 190. The display 165 and speakers 175 may be integrated in a single unit with the other components of the system 100 in an electronic device (e.g., a television). In various embodiments, the display interface 160 includes a display driver, e.g., a timing controller (T-Con) chip.
Alternatively, for example, if the RF portion of input 105 is part of a separate set top box, display 165 and speaker 175 may be separate from one or more of the other components. In various implementations where the display 165 and speaker 175 are external components, the output signal may be provided via a dedicated output connection (including, for example, an HDMI port, a USB port, or a COMP output).
The implementation may be performed by computer software implemented by the processor 110, or by hardware, or by a combination of hardware and software. As a non-limiting example, these embodiments may be implemented by one or more integrated circuits. As a non-limiting example, the memory 120 may be of any type suitable to the technical environment and may be implemented using any suitable data storage technology, such as optical memory devices, magnetic memory devices, semiconductor-based memory devices, fixed memory, and removable memory. As a non-limiting example, the processor 110 may be of any type suitable to the technical environment, and may encompass one or more of microprocessors, general purpose computers, special purpose computers, and processors based on a multi-core architecture.
Fig. 2 shows an encoder 200. Variations of this encoder 200 are contemplated, but for clarity, the encoder 200 is described below without describing all contemplated variations.
In some embodiments, fig. 2 also shows an encoder in which the HEVC standard is modified, or an encoder employing techniques similar to HEVC, such as a VVC (Versatile Video Coding) encoder developed by JVET (Joint Video Experts Team).
Prior to encoding, the video sequence may undergo a pre-encoding process (201), such as applying a color transform to the input color picture (e.g., a conversion from RGB 4:4:4 to YCbCr 4:2:0), or performing a remapping of the input picture components in order to obtain a signal distribution that is more resilient to compression (e.g., histogram equalization of one of the color components). Metadata may be associated with the pre-processing and appended to the bitstream.
In encoder 200, pictures are encoded by encoder elements, as described below. The pictures to be encoded are partitioned (202) and processed in units such as CUs. For example, each unit is encoded using an intra mode or an inter mode. When a unit is encoded in intra mode, the encoder performs intra prediction (260). In inter mode, motion estimation (275) and motion compensation (270) are performed. The encoder decides (205) which of the intra mode or inter mode is used to encode the unit and indicates the intra/inter decision by, for example, a prediction mode flag. The encoder may also mix (263) intra-prediction results and inter-prediction results, or mix results from different intra/inter-prediction methods. For example, a prediction residual is calculated by subtracting (210) the prediction block from the original image block.
The motion correction module (272) uses the already available reference pictures to correct the motion field of the block without reference to the original block. The motion field of a region can be considered as a set of motion vectors for all pixels of the region. If the motion vector is based on sub-blocks, the motion field may also be represented as a set of all sub-block motion vectors in the region (all pixels within a sub-block have the same motion vector and the motion vectors may be different from sub-block to sub-block). If a single motion vector is used for the region, the motion field for the region may also be represented by a single motion vector (the same motion vector for all pixels in the region).
The prediction residual is then transformed (225) and quantized (230). The quantized transform coefficients, as well as motion vectors and other syntax elements, are entropy encoded (245) to output a bitstream. The encoder may skip the transform and directly apply quantization to the untransformed residual signal. The encoder may bypass both transformation and quantization, i.e. directly encode the residual without applying a transformation or quantization process.
The encoder decodes the encoded block to provide a reference for further prediction. The quantized transform coefficients are dequantized (240) and inverse transformed (250) to decode the prediction residual. The decoded prediction residual and the prediction block are combined (255), reconstructing the image block. An in-loop filter (265) is applied to the reconstructed picture to perform, for example, deblocking/SAO (sample adaptive offset) filtering to reduce coding artifacts. The filtered image is stored at a reference picture buffer (280).
Fig. 3 shows a block diagram of a video decoder 300. In decoder 300, the bit stream is decoded by a decoder element, as described below. The video decoder 300 generally performs a decoding process that is the inverse of the encoding process described in fig. 2. Encoder 200 typically also performs video decoding as part of encoding video data.
In particular, the input to the decoder comprises a video bitstream, which may be generated by the video encoder 200. First, the bitstream is entropy decoded (330) to obtain transform coefficients, motion vectors, and other encoded information. The picture partition information indicates how to partition the picture. Thus, the decoder may divide (335) the pictures according to the decoded picture partition information. The transform coefficients are dequantized (340) and inverse transformed (350) to decode the prediction residual. The decoded prediction residual and the prediction block are combined (355), reconstructing the image block.
The prediction block may be obtained (370) from intra prediction (360) or motion-compensated prediction (i.e., inter prediction) (375). The decoder may mix (373) the intra prediction result and the inter prediction result, or mix the results from multiple intra/inter prediction methods. The motion field may be modified (372) by using already-available reference pictures before motion compensation. An in-loop filter (365) is applied to the reconstructed image. The filtered image is stored in a reference picture buffer (380).
The decoded picture may further undergo post-decoding processing (385), such as an inverse color transform (e.g., a conversion from YCbCr 4:2:0 to RGB 4:4:4) or an inverse remapping that inverts the remapping process performed in the pre-encoding processing (201). The post-decoding processing may use metadata derived in the pre-encoding processing and signaled in the bitstream.
Reference picture resampling
At low bit rates, and/or when pictures contain little high-frequency content, pictures of reduced size may be encoded instead of the full-resolution images in order to obtain a better coding-efficiency trade-off; this is typically the case for 4K or 8K frames. The decoder is then responsible for upscaling the decoded picture prior to display. The principle of Reference Picture Resampling (RPR) is to dynamically rescale the pictures of a video sequence on a per-picture basis in order to obtain a better coding-efficiency trade-off.
Fig. 4 and 5 show examples of a method for encoding (400) and decoding (500) video, respectively, in which an image to be encoded may be rescaled for encoding, according to an embodiment. For example, such encoders and decoders may conform to the VVC standard.
Given an original video sequence consisting of pictures of size (picture width x picture height), the encoder selects, for each original picture, a resolution (i.e., a picture size) for encoding the frame. Different PPSs (picture parameter sets) carrying the picture sizes are encoded in the bitstream, and the slice/picture header of the picture to be decoded indicates which PPS is used on the decoder side to decode the picture.
The standard does not specify the functions of a downsampler (440) and an upsampler (540) to be used as pre-processing or post-processing, respectively.
For each frame, the encoder selects whether to encode at the original resolution or at a reduced resolution (e.g., the picture width/height divided by 2). The selection may be made, for example, by running both encoding options, or by taking into account the spatial and temporal activity in the original picture.
When the encoder chooses to encode the original picture at a reduced resolution, the original picture is downscaled (440) before being input to the core encoder (410) that produces the bitstream. The reconstructed picture at the reduced resolution is then stored (420) in a Decoded Picture Buffer (DPB) for use in encoding subsequent pictures. Thus, the Decoded Picture Buffer (DPB) may include pictures having a size different from the current picture size.
At the decoder, pictures are decoded (510) from the bitstream and the reconstructed pictures at the reduced resolution are stored (520) in a Decoded Picture Buffer (DPB) for decoding subsequent pictures. According to an embodiment, the reconstructed picture is upsampled (540) to its original resolution and transmitted, for example, to a display.
According to an embodiment, in case the current picture to be encoded uses a reference picture from the DPB having a different size than the current picture, rescaling (430/530) (upscaling or downscaling) of the reference block is performed on the fly, with separable (horizontal and vertical) interpolation filters and appropriate sampling, during the motion compensation process to construct the prediction block. Fig. 6 illustrates an example of motion compensation with implicit block resampling, which may be implemented in the rescaling (430/530) of the encoding and decoding methods described above. The selection of the filter coefficients depends on the phase (θx, θy), i.e., the position of the sample to be interpolated in the reference picture; in this case the phase depends on both the motion vector and the sizes (Equation 1) of the reference picture (SXref, SYref) (620 in fig. 6) and of the current picture (SXcur, SYcur) (610 in fig. 6).
In order to predict the current block P (610) of size (SXcur, SYcur), for each sample Xcur of P, its position (Xref, Yref) in the reference picture is determined. The value of (Xref, Yref) is a function of the motion vector (MVx, MVy) of the current block and of the scaling ratio between the current block size and the size (SXref, SYref) of the corresponding region in the reference picture (620).
As shown in fig. 6, let (θx, θy) denote the phase, that is, the non-integer part of the motion-compensated position (Xref, Yref) in the reference picture. For the horizontal direction, the position Xref and phase θx are given by:

Xref = int((Xcur + MVx) · SXref / SXcur),   θx = (Xcur + MVx) · SXref / SXcur − Xref   (Equation 1)

and analogously for (Yref, θy), where int(x) gives the integer part of x.
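As a concrete illustration, the following sketch evaluates this position/phase mapping for one sample. It is a minimal interpretation of Equation 1, using floating-point arithmetic for clarity rather than the fixed-point precision (e.g., 1/16-pel) a real codec would use.

```python
# Minimal sketch of Equation 1 (horizontal direction): map a current-picture sample
# position and motion vector to an integer reference position and a phase.

def ref_position_and_phase(x_cur: float, mv_x: float, sx_cur: int, sx_ref: int):
    """Return (Xref, theta_x) for one sample, per Equation 1."""
    x = (x_cur + mv_x) * sx_ref / sx_cur  # motion-compensated position, rescaled
    x_ref = int(x)                        # integer part: reference sample index
    theta_x = x - x_ref                   # fractional part: interpolation phase
    return x_ref, theta_x

# Example: reference picture downscaled by 2 (SXref = SXcur / 2), quarter-pel motion.
print(ref_position_and_phase(x_cur=10, mv_x=0.25, sx_cur=64, sx_ref=32))  # (5, 0.125)
```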
In an embodiment, Motion Compensation (MC) uses two separable 1D filters to reduce the computational effort (fig. 7). As shown in figs. 8, 9 and 10, the MC process is performed in two stages: first horizontal motion compensation filtering (820, 900), then vertical motion compensation filtering (840, 1000); in a variant, vertical motion compensation filtering may be performed first, followed by horizontal motion compensation filtering.
Fig. 8 shows an example of two-stage motion compensation filtering according to an embodiment. The block position (Xref, Yref) and the phase (θx, θy) in the reference picture are determined (810) from the block position (Xcur, Ycur) in the current picture and the motion vector (MVx, MVy) of the current block. According to an embodiment, horizontal filtering (shown in fig. 9) with a 1D filter is performed (820, 940) to determine motion-compensated samples upsampled in the horizontal direction.
In an implementation, since the motion vector has sub-pixel precision, there are as many 1D filters as there are sub-pixel positions (phases). Fig. 7 depicts how the coefficients w(i) of the filter are determined from the phase of the motion-compensated sample Xcur. The reconstructed sample "rec" is calculated using 1D filtering as:

rec(Xcur) = Σi w(i) · ref(Xref + i)

where the sum runs over the filter taps and ref denotes the reference samples.
The reconstructed samples are stored (830) in a temporary buffer (930) of size (SXcur, SYref). Vertical filtering with a 1D filter is then performed (840), as shown in fig. 10, using the temporary buffer as input, to determine motion-compensated samples upsampled in the vertical direction.
Note that it is also possible to perform vertical filtering first and then horizontal filtering, since the filters are separable.
The resulting prediction samples are stored (850) in a block (1050) of size (SXcur, SYcur).
In the above description, the current picture and the reference picture are assumed to correspond to the same window. This means that, if the motion is zero, the top-left and bottom-right samples of the two pictures correspond to the same two scene points. If this is not the case, a window offset parameter should be added to (Xref, Yref).
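A minimal sketch of this two-stage process is given below, assuming a placeholder 2-tap bilinear kernel in place of the phase-dependent filters w(i) of fig. 7; picture content, block sizes and the edge-padding policy are illustrative.

```python
# Two-stage separable motion-compensation filtering (figs. 8-10): horizontal 1D
# filtering into a temporary buffer of size (SXcur, SYref), then vertical 1D
# filtering into the (SXcur, SYcur) prediction block.

import numpy as np

def phase_filter(theta: float) -> np.ndarray:
    # Placeholder 2-tap bilinear kernel; a codec would select an N-tap filter per phase.
    return np.array([1.0 - theta, theta])

def mc_resample(ref: np.ndarray, mv=(0.0, 0.0), cur_size=(8, 8)) -> np.ndarray:
    sy_ref, sx_ref = ref.shape
    sy_cur, sx_cur = cur_size
    ref_p = np.pad(ref, 1, mode="edge")          # guard against tap overrun
    tmp = np.zeros((sy_ref, sx_cur))             # temporary buffer (830/930)
    for y in range(sy_ref):                      # stage 1: horizontal filtering (820/900)
        for x in range(sx_cur):
            pos = (x + mv[0]) * sx_ref / sx_cur
            xr, tx = int(pos), pos - int(pos)
            tmp[y, x] = phase_filter(tx) @ ref_p[y + 1, xr + 1:xr + 3]
    pred = np.zeros(cur_size)                    # prediction block (850/1050)
    tmp_p = np.pad(tmp, 1, mode="edge")
    for x in range(sx_cur):                      # stage 2: vertical filtering (840/1000)
        for y in range(sy_cur):
            pos = (y + mv[1]) * sy_ref / sy_cur
            yr, ty = int(pos), pos - int(pos)
            pred[y, x] = phase_filter(ty) @ tmp_p[yr + 1:yr + 3, x + 1]
    return pred

# 2x upsampling of a 4x4 reference block into an 8x8 prediction with sub-pel motion.
print(mc_resample(np.arange(16.0).reshape(4, 4), mv=(0.5, 0.25), cur_size=(8, 8)).shape)
```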
The motion compensation with implicit resampling described above allows reuse of interpolation filters designed for classical motion compensation, such as those used in the VVC standard. Moreover, this process avoids the necessity of storing the reference picture at several resolutions. However, the simplicity of the upsampling filter limits the compression efficiency of the encoder. Thus, improvements are needed.
In an embodiment, a method for reconstructing at least a portion of a first picture from at least a portion of a second picture is provided, wherein the first picture and the second picture have different sizes. For example, the second picture has a smaller resolution than the first picture. According to this embodiment, reconstructing the portion of the first picture comprises: decoding a second picture from the bitstream; and determining at least one first sample of the at least one portion of the first picture using at least one upsampling filter applied to at least one second sample of the at least one portion of the decoded second picture.
In an embodiment, a method for reconstructing includes transmitting the at least one reconstructed portion of a first picture to a display. In an embodiment, the steps of the reconstruction method provided below may be implemented in the method for decoding (510,540) described with reference to fig. 5.
According to an embodiment, the method for reconstructing may be implemented in an encoding method or a decoding method. At least a portion of the first picture is obtained by decoding the second picture and upsampling at least a portion of the second picture, as described below. At least a portion of the reconstructed first picture is then stored in a decoded picture buffer for future use as a reference picture when encoding/decoding a subsequent picture of the same size or a different size than the first picture.
In the following, some embodiments are provided in which filter parameters are determined. The filter parameters include upsampling filter coefficients, associated tap positions (shapes), and possibly an index identifying the filter. In the methods for reconstructing a picture, for encoding and/or for decoding provided above, any of the embodiments provided below may be implemented alone or in combination with any one or more of the other embodiments.
According to an embodiment, the upsampling filter is non-separable. In this embodiment, the upsampling cannot be implemented as two-stage upsampling with 1D filters. The filter F may be linear or nonlinear.
According to another embodiment, the upsampling filter coefficients are encoded in the bitstream. In a variant, the upsampling filter coefficients may be encoded even if the reference picture and the current picture have the same size. The size of the original picture (after upsampling) is encoded in the bitstream; it may be a parameter associated with the upsampling filter. The upsampling filter coefficients and/or the original size may be encoded in, for example, an APS (adaptation parameter set, used for instance to transmit adaptive loop filter coefficients in the VVC standard), a slice header, a picture header, or a PPS. There may be default values for the upsampling filter coefficients that are not encoded in the bitstream.
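To make the signalled parameters concrete, here is a hypothetical container for them; the field names and types are illustrative assumptions, not actual VVC syntax elements.

```python
# Hypothetical parameter set for the signalled upsampling-filter data described above:
# coefficients per (class, phase), the original picture size, and an optional index
# for reusing a default/previously transmitted filter.

from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class UpsamplingFilterParams:
    original_width: int                            # size of the picture after upsampling
    original_height: int
    # one coefficient list per (class, phase) pair; empty dict -> use default filters
    coeffs: Dict[Tuple[int, int], List[float]] = field(default_factory=dict)
    default_filter_idx: Optional[int] = None       # reuse a known filter instead

params = UpsamplingFilterParams(3840, 2160, coeffs={(0, 1): [0.25, 0.5, 0.25]})
print(params)
```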
The filter coefficients may be derived per picture, per region within a picture, per group of several pictures, or per several regions in different pictures.
Fig. 12 shows an example of a method 1200 for determining an upsampling filter according to an embodiment. Several upsampling filters may be used. The selection of the upsampling filter to be used may be controlled by a classification process.
According to a variant, when the upsampling is performed in the motion compensation loop for predicting the current picture, the upsampling of the reference picture to be used by the current picture is performed in response to determining (1210) that the reference picture resolution is smaller than that of the current picture.
The classification process determines (1220) a class index for each reference sample or group of reference samples (e.g., a group of 4x4 samples). One filter is associated with each class index. The example of fig. 14A shows an area to be interpolated: the black samples are reference samples for which class indexes have been determined, and samples (1, 2, 3) are examples of samples to be interpolated.
For each sample to be interpolated in the upsampled picture, a corresponding set of co-located reference samples is determined. For example, fig. 14A shows an example of the co-located reference samples (black samples in the dotted-line box) associated with sample 3 to be interpolated. The class indexes associated with the co-located reference samples of a sample to be interpolated allow a single class index value to be derived for that sample. For example, it may be the class index value of the co-located reference sample closest to the current sample to be interpolated, the class index value at a predetermined relative position, or the mean/median of the class index values of several co-located reference samples.
For each sample to be interpolated, an upsampling filter is selected (1230) based on a class index derived for the sample to be interpolated. Since classification is performed on reference samples of a reference picture to be upsampled or, in the case of upsampling for display, on a decoded picture, there is no need to encode a class index value of an upsampling filter used to determine each sample to be interpolated.
An upsampling filter is then applied (1240) to determine the value of the sample to be interpolated.
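The sketch below ties these steps together for 2x upsampling: classify the reference samples (1220), select a filter per class (1230) and apply it (1240). It is a simplified interpretation: the classifier and per-class kernels are placeholders, and a single 3x3 support is used for every phase, whereas the text associates one filter with each class and phase.

```python
# Classification-driven upsampling per method 1200, under the assumptions above.

import numpy as np

def upsample_2x(ref: np.ndarray, filters: dict, classify) -> np.ndarray:
    h, w = ref.shape
    cls = np.array([[classify(ref, y, x) for x in range(w)] for y in range(h)])  # (1220)
    out = np.zeros((2 * h, 2 * w))
    ref_p = np.pad(ref, 1, mode="edge")
    for y in range(2 * h):
        for x in range(2 * w):
            k = cls[y // 2, x // 2]                # class of nearest co-located sample
            f = filters[k]                         # filter selection (1230)
            patch = ref_p[y // 2:y // 2 + 3, x // 2:x // 2 + 3]
            out[y, x] = float(np.sum(f * patch))   # apply filter (1240)
    return out

filters = {0: np.full((3, 3), 1 / 9.0), 1: np.zeros((3, 3))}
filters[1][1, 1] = 1.0                             # identity-like filter for class 1
classify = lambda pic, y, x: int(pic[y, x] > pic.mean())
print(upsample_2x(np.arange(16.0).reshape(4, 4), filters, classify).shape)  # (8, 8)
```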
According to an embodiment, the classification process (1220) may be similar to the classification process used by the Adaptive Loop Filter (ALF) in the VVC standard. The reconstructed samples t(r) are classified into K classes (K=25 for luma samples and K=8 for chroma samples), and K different filters are determined using the samples of each class. Classification is performed using directionality and activity values derived from local gradients.
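For illustration, an ALF-like classifier might look as follows; the gradient operator, quantisation of the activity, and the 5-direction x 5-activity layout are simplified assumptions, not the exact VVC rules.

```python
# ALF-like classification sketch (1220): each 4x4 group of reconstructed samples gets
# a class index from its activity and dominant direction, derived from local gradients.

import numpy as np

def classify_block(block: np.ndarray) -> int:
    gy, gx = np.gradient(block.astype(float))        # vertical / horizontal gradients
    g_h, g_v = np.abs(gx).sum(), np.abs(gy).sum()
    g_d0 = np.abs(gx + gy).sum()                     # diagonal gradients
    g_d1 = np.abs(gx - gy).sum()
    # Simplified quantisation into 5 activity levels and 5 direction classes.
    activity = min(int((g_h + g_v) / (8 * block.size)), 4)
    direction = int(np.argmax([g_h, g_v, g_d0, g_d1, 0.0]))
    return 5 * direction + activity                  # class index in [0, 24]

rec = np.random.default_rng(0).integers(0, 255, (4, 4))
print(classify_block(rec))
```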
For example, the method 1200 described above may be applied when a picture is encoded in a reduced version, decoded in a reduced version, and up-sampled for output (e.g., for transmission to a display).
According to another embodiment, the method 1200 may also be used to determine a downsampling filter that may be used to downsample a picture. For example, when a picture is to be encoded in a reduced version, downsampling of the picture may be performed prior to its encoding.
Fig. 13 illustrates an example of a method for encoding/decoding a picture according to an embodiment. According to this embodiment, it is determined whether the current picture is to be encoded or decoded using inter prediction (1305).
When the current picture is encoded/decoded without inter prediction, the picture is encoded/decoded using, for example, intra prediction (1340).
When the current picture is encoded/decoded using inter prediction, it is determined whether the reference picture resolution is smaller than the resolution of the current picture (1310). If not, the current picture is encoded/decoded using the reference picture stored in the DPB (1340). When the reference picture has a larger size than the current picture, the downscaling is performed, while encoding/decoding the current picture, using the conventional RPR (reference picture resampling) motion interpolation process from the VVC standard.
When the reference picture has a smaller size than the current picture (1310), the upscaling is performed with an upsampling filter determined according to any of the embodiments presented herein (1320). Upsampling using the filter may be performed on the fly within the motion compensation process while the current picture is encoded/decoded (1340), or the reference picture in the DPB may be upsampled (1320) before the current frame is encoded/decoded (1340) and stored (1330) in the DPB.
In the latter case, the DPB may contain several instances of reference pictures at different resolutions, and motion compensation is unchanged compared to encoding/decoding without RPR (1340).
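A sketch of such a buffer follows, assuming it is keyed by picture order count and resolution (an illustrative choice, not a spec rule):

```python
# Decoded picture buffer that may hold several instances of the same reference
# picture at different resolutions.

import numpy as np

class DecodedPictureBuffer:
    def __init__(self):
        self._store = {}

    def insert(self, poc: int, picture: np.ndarray):
        h, w = picture.shape[:2]
        self._store[(poc, w, h)] = picture

    def get(self, poc: int, width: int, height: int):
        # Motion compensation fetches the instance matching the current resolution;
        # if absent, the caller would upsample (1320) and insert the result.
        return self._store.get((poc, width, height))

dpb = DecodedPictureBuffer()
dpb.insert(0, np.zeros((540, 960)))       # reduced-resolution reconstruction
dpb.insert(0, np.zeros((1080, 1920)))     # upsampled instance of the same picture
print(dpb.get(0, 1920, 1080) is not None) # True
```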
According to an embodiment, the upsampling filter is a Wiener-based adaptive filter (WF). For example, the coefficients are determined in a manner similar to the ALF coefficients in the VVC standard.
In VVC, the in-loop ALF filter (adaptive loop filter) is a linear filter whose purpose is to reduce coding artifacts on reconstructed samples. The coefficients cn of the filter are determined such that the mean square error between the original samples s(r) and the filtered samples f(r) is minimized, using Wiener-based adaptive filter techniques, where:
r = (x, y) is the sample position belonging to the region "R" to be filtered.
Original samples: s(r).
Samples to be filtered: t(r).
FIR filter with N coefficients: c = [c0, … cN−1]T.
Filter tap position offsets: {p0, p1, … pN−1}, where pn represents the sample position offset of the n-th filter tap relative to r. This set of tap positions may also be referred to as the filter "shape".
Filtered samples:

f(r) = Σn cn · t(r + pn)   (Equation 2)

To find the coefficients minimizing the sum of squared errors (SSE) between s(r) and f(r), the derivative of the SSE with respect to each cn is set equal to zero. The coefficient values "c" are then obtained by solving the following equation:

[Tc] · cT = vT   (Equation 3)

where Tc is the autocorrelation matrix of the samples to be filtered and v is the cross-correlation vector between the original samples and the samples to be filtered:

Tc[m][n] = Σr∈R t(r + pm) · t(r + pn),   v[n] = Σr∈R s(r) · t(r + pn)
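As a numeric sanity check of this derivation, the sketch below accumulates Tc and v over a small region and solves Equation 3 with a generic linear solver; the 3-tap horizontal shape and the test signal are illustrative.

```python
# Build Tc and v from samples t(r) and originals s(r), then solve Tc c = v.

import numpy as np

def wiener_coeffs(t: np.ndarray, s: np.ndarray, taps=((0, -1), (0, 0), (0, 1))):
    n = len(taps)
    tc = np.zeros((n, n))
    v = np.zeros(n)
    h, w = t.shape
    for y in range(1, h - 1):                 # interior of region R
        for x in range(1, w - 1):
            obs = np.array([t[y + dy, x + dx] for dy, dx in taps])
            tc += np.outer(obs, obs)          # Tc[m][n] = sum t(r+pm) t(r+pn)
            v += s[y, x] * obs                # v[n]     = sum s(r) t(r+pn)
    return np.linalg.solve(tc, v)

rng = np.random.default_rng(1)
s = rng.standard_normal((16, 16))
t = s + 0.1 * rng.standard_normal((16, 16))   # noisy reconstruction
print(wiener_coeffs(t, s))                    # coefficients approach [0, 1, 0]
```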
in VVC, the coefficients of ALF may be encoded in the bitstream so that they may be dynamically adapted to the video content. There are also some default coefficients and the encoder indicates which set of coefficients each CTU will use.
In VVC, symmetrical filters are used, as shown in the upper part of fig. 11, and some filters can be obtained from other filters by rotation, as shown in the lower part of fig. 11. Each coefficient in the filters shown in the upper part of fig. 11 is associated with one or two positions p(x, y). For example, p9 = (0, 0) and p3 = (0, −1) or (0, 1) represent the positions of c9 and c3. In the case of a diagonal transformation, the position p(x, y) is moved to p(y, x); in the case of a vertical flip transformation, the position p(x, y) is moved to p(−x, y); and in the case of a rotation, the position p(x, y) is moved to p(y, −x).
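These position transforms can be written directly; the helper below is a minimal sketch of the three mappings quoted above.

```python
# Geometric transforms of tap positions: diagonal p(x,y) -> p(y,x),
# vertical flip p(x,y) -> p(-x,y), rotation p(x,y) -> p(y,-x). This lets one set
# of coefficients serve several filter orientations without re-signalling.

def transform_taps(taps, kind: str):
    if kind == "diagonal":
        return [(y, x) for x, y in taps]
    if kind == "vflip":
        return [(-x, y) for x, y in taps]
    if kind == "rotate":
        return [(y, -x) for x, y in taps]
    raise ValueError(kind)

p3, p9 = (0, -1), (0, 0)                  # positions of c3 and c9 from the text
print(transform_taps([p3, p9], "rotate")) # [(-1, 0), (0, 0)]
```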
According to an embodiment, the above-described method for determining ALF coefficients is used for determining upsampling filter coefficients.
According to an embodiment, each upsampling stage may have at least one WF. The phase of the sample to be interpolated allows determining (1230) the upsampling filter to be used. The example depicted in fig. 14A corresponds to 2x upsampling in the horizontal and vertical directions. The black dots are reconstructed samples t(r) of the decoded picture (a reference picture, or a decoded picture to be upsampled for display), and the white dots correspond to the samples f(r') to be interpolated (missing samples). Thus, "r'" may be different from "r". In this example, there are 4 phases {0, 1, 2, 3}. Phase 0 has the same position as the reconstructed sample (r' = r). The WF corresponding to phase 0 may be omitted (inferred to be the same).
(Equation 2) is then modified as follows (1240):

f(r') = Σn cn · t(r + pn)   (Equation 4)

where r is the position in the decoded picture co-located with r'. In (Equation 3), the expression of "v" is modified as follows:

v[n] = Σr'∈R' s(r') · t(r + pn)   (Equation 5)

where r' = (x, y) is the sample position belonging to the region "R'" to be interpolated.
According to a variant, only the missing points r'(x, y) in the enlarged picture (i.e., the points having no co-located point in the reduced picture) are interpolated. In another variant, all positions r'(x, y) are interpolated, i.e., both the missing points and the points having a co-located point in the scaled-down picture.
In a variant, only the samples corresponding to some subset of the phases are interpolated using WF filters, while the other phases are interpolated using conventional separable 1D filters. For example, in fig. 14A, phases 0 and 1 are interpolated with WF in a first step, and then phases 2 and 3 are interpolated with a horizontal 1D filter using the filtered samples of phases 0 and 1. Or, conversely, phases 0 and 2 are interpolated with WF, and then phases 1 and 3 are interpolated with a vertical 1D filter.
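The sketch below derives the phase of each output position for 2x upsampling and dispatches it to a filter family per the first variant above; the phase numbering is one consistent assumption (the exact labelling of fig. 14A is not reproduced here), and the filters themselves are omitted.

```python
# Phase derivation for 2x horizontal and vertical upsampling, plus WF/1D dispatch.

def phase_2x(x: int, y: int) -> int:
    return (y % 2) + 2 * (x % 2)      # 0: co-located sample, 1/2/3: missing samples

def filter_kind(phase: int, wf_phases=frozenset({0, 1})) -> str:
    # Phases 0 and 1 are produced by WF first; phases 2 and 3 can then be produced
    # by a horizontal 1D filter applied to the already-filtered phase-0/1 columns.
    return "WF" if phase in wf_phases else "1D-horizontal"

for y in range(2):
    for x in range(2):
        p = phase_2x(x, y)
        print(f"r'=({x},{y}): phase {p} -> {filter_kind(p)}")
```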
In fig. 14A, a square filter shape of size 4x4 is shown, but the filter may have a different shape. Figs. 14B to 14E show different shapes that may be used to interpolate samples with phase 3; the filter shape is shown by the black samples, which represent the reconstructed samples used to interpolate the sample with phase 3.
Fig. 14F and 14G illustrate other examples of horizontal filter shapes that may be used to interpolate samples with phase 2. Fig. 14H shows another example of a vertical filter shape that may be used to interpolate samples with phase 1. Fig. 14I shows another example of a center filter shape that may be used to interpolate samples with phase 3.
The shape may depend on the class and/or the phase. Similar to ALF, the coefficients of some shapes/classes may be the same as those of other classes/shapes but obtained by rotation, and the coefficients of one shape may be obtained by symmetry. For example, after a 90° rotation, the coefficients of the shape of fig. 14B may be the same as those of the shape of fig. 14C.
In a variant, classification of the reference samples is performed (1220). For each class, a different upsampling WF is used. In another variant, the classification may be the same as that used by ALF.
Fig. 15 shows an example of a method 1500 for determining upsampling filter coefficients for use at the encoder side according to an embodiment.
The original picture is downscaled (1510) and encoded (1520). Reconstructed samples of the encoded picture are classified into categories (1530). A set F0 of filter coefficients is determined (1540) for a region R of the reconstructed picture (e.g., for a CTU or a group of CTUs). The set F0 includes an upsampling filter for each class and phase: F0 = {g00, g01, …, g0M}, where M is the number of classes or phases, or the number of combinations of classes and phases in the case of one filter associated with each class-and-phase combination. As described above, the filters of the set F0 are determined by equations 3 and 5.
The determined set of upsampling filters F0 is applied (1550) to the region R of the reconstructed picture to obtain, using equation 4, an upsampled region Rup whose samples are f0(r').
Other sets of upsampling filters Fi, where Fi = {gi0, gi1, …, giM} and i = {1, …, L}, are similarly applied (1555) to the region R of the reconstructed picture to determine upsampled regions Rup, where L is the number of possible filters per class and/or phase that have already been transmitted or are known by the decoder. Advantageously, the distortion can be derived directly from the filter coefficients and the values of the original samples s(r').
The selection of the filter for a class/phase s can be done by determining the best trade-off (1560) (e.g., using a rate-distortion Lagrangian cost) between coding a new upsampling filter g0s and reusing a default or previously transmitted filter gis (i = {1, …, L}) for that class/phase. The distortion is the difference (e.g., L1 or L2 norm) between the upsampled reconstructed region and the corresponding region in the original picture.
If the rate-distortion cost determined for the filter g0s of class/phase s is lower than any of the rate-distortion costs of the filters gis, then the coefficients of the filter g0s are encoded in the bitstream (1570).
For each class/phase s, the index i of the filter providing the lowest rate-distortion cost is encoded (1580) in the bitstream for the region R.
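A hedged Python sketch of the selection steps (1560)-(1580) follows; `distortion`, `coeff_bits` and `index_bits` are hypothetical callables standing in for the L1/L2 distortion measure and the coding costs, and nothing here reproduces an actual codec API.

```python
def select_filter(new_filter, stored_filters, distortion, coeff_bits, index_bits, lam):
    """Rate-distortion choice for one class/phase s: either code the new Wiener
    filter g0s explicitly (index 0) or reuse a stored/default filter gis (index i >= 1)."""
    best_index = 0
    best_cost = distortion(new_filter) + lam * coeff_bits(new_filter)   # (1560)/(1570)
    for i, f in enumerate(stored_filters, start=1):
        cost = distortion(f) + lam * index_bits(i)   # reuse costs only an index
        if cost < best_cost:
            best_index, best_cost = i, cost          # (1580)
    return best_index
```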
In some embodiments, the region R may be a region in the reconstructed picture, the entire picture, a set of several pictures, or a set of several regions in different pictures.
The method for determining the filters to be used for the region R is described above for the case where there is one filter per class and/or phase. A similar method can be applied in the case where F0 and Fi each include a single filter.
In a variant, the determination of the filter coefficients may be accomplished through machine learning using an iterative optimization algorithm (e.g., gradient descent). When R is large, this may have the advantage of allowing learning over a large number of samples/images without numerical limitations on Tc and v.
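As a minimal sketch of this variant, the coefficients of one filter can be fitted to minimize the mean square error by plain gradient descent; the array shapes and hyper-parameters are assumptions for illustration only.

```python
import numpy as np

def learn_filter(neighborhoods: np.ndarray, originals: np.ndarray,
                 lr: float = 1e-3, iters: int = 2000) -> np.ndarray:
    """neighborhoods: (N, taps) reconstructed samples around each position r';
    originals: (N,) original samples s(r'). Returns the learned tap vector g."""
    g = np.zeros(neighborhoods.shape[1])
    n = len(originals)
    for _ in range(iters):
        residual = neighborhoods @ g - originals             # f(r') - s(r')
        g -= lr * (2.0 / n) * (neighborhoods.T @ residual)   # gradient of the MSE
    return g
```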
According to an embodiment, reconstructed upsampled pictures are stored in the DPB even though the encoded pictures correspond to downsampled pictures as shown in fig. 16 and 17. According to this embodiment, the DPB includes only high resolution reference pictures.
Fig. 16 and 17 show a method 1600 for encoding video and a method 1700 for decoding video, respectively, according to an embodiment. The original picture may be encoded at a lower resolution or a higher resolution.
The original high resolution picture is downsampled (1660) by the encoder prior to encoding (1610). The upsampling filter coefficients may be derived (1640) as described above, and the reconstructed picture is upsampled (1650) before being stored in the DPB (1620). Conventional RPR motion compensation is then applied (reference picture is high resolution, current picture is low resolution) (1630).
In the decoding stage, the downscaled picture is decoded from the bitstream (1710), and if upsampling filter coefficients are present in the bitstream, they are decoded (1740). The low resolution decoded picture is upsampled (1750) and stored in the DPB (1720). Conventional RPR motion compensation is then applied (the reference picture is high resolution, the current picture is low resolution) (1730). In a variant, low resolution decoded pictures are stored in the DPB and the upsampled decoded pictures are used for display only.
If the original picture is encoded at high resolution, downsampling (1660) and upsampling (1650,1750) are bypassed.
Note that in a variant, the upsampling filter has predetermined default coefficients and steps 1640 and 1740 are not present/bypassed.
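The decoding-side flow of fig. 17 can be summarized by the following Python-style sketch, where every function name is a hypothetical placeholder for the corresponding step of method 1700, not an actual decoder API.

```python
def decode_and_store(bitstream, dpb, default_coeffs):
    low_res = decode_picture(bitstream)                  # (1710)
    if has_upsampling_coeffs(bitstream):
        coeffs = decode_upsampling_coeffs(bitstream)     # (1740)
    else:
        coeffs = default_coeffs                          # (1740) bypassed: default filter
    high_res = upsample(low_res, coeffs)                 # (1750)
    dpb.store(high_res)                                  # (1720): DPB keeps high resolution
    return high_res                                      # RPR MC then uses it (1730)
```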
Post-filtering for image restoration
In video standards (e.g., HEVC, VVC), a restoration filter is applied to reconstructed pictures to reduce coding artifacts. For example, sample Adaptive Offset (SAO) filters have been introduced in HEVC to reduce ringing and banding artifacts in reconstructed pictures, supplementing the deblocking filter (DBF), which specifically reduces artifacts at block boundaries. In VVC, an additional Adaptive Loop Filter (ALF) attempts to minimize the mean square error between the original samples and the reconstructed samples using wiener-based adaptive filter coefficients. SAO and ALF use classification of reconstructed samples to select filters to be applied.
ALF classification
As described above, ALF is a specific post-filter for reconstructed image restoration. ALF classifies samples into K classes (as an example, K=25 for luma samples) or K regions (as an example, K=8 for chroma samples), and determines K different filters using the samples of each class or region. In the case of classification, the classification of luma samples is performed using directionality and activity values derived from local gradients.
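The classification idea can be conveyed by the following toy Python sketch deriving a class index from local directionality and activity; it does not reproduce the VVC formulas, and the thresholds and bin counts are illustrative.

```python
import numpy as np

def classify_block(block: np.ndarray, num_activity_bins: int = 5) -> int:
    """Map a small luma block to a class index from gradient statistics."""
    gy, gx = np.gradient(block.astype(np.float64))
    v, h = np.abs(gy).mean(), np.abs(gx).mean()
    if h > 2.0 * v:
        direction = 1          # dominant horizontal gradient
    elif v > 2.0 * h:
        direction = 2          # dominant vertical gradient
    else:
        direction = 0          # no clear directionality
    activity = min(num_activity_bins - 1, int((h + v) / 8.0))  # quantized activity
    return direction * num_activity_bins + activity
```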
In VVC, the coefficients of ALF may be encoded in the bitstream so that they may be dynamically adapted to the video content. These coefficients may be stored for reuse in other pictures. There are also some default coefficients and the encoder indicates which set of coefficients each CTU will use.
In VVC, a symmetric filter is used (as shown in the top part of fig. 11), and some filter coefficients may be obtained from other filter coefficients by rotation (as shown in the bottom part of fig. 11).
Motion compensated filtering and SIF
In hybrid video coding, inter prediction predicts a current block using motion compensation of a reference block extracted from a previously reconstructed reference picture. The position difference between the current block and the reference block is a motion vector.
The motion vector may have sub-pixel precision (e.g., 1/16 in VVC), and the motion compensation process interpolates the reference block at the corresponding sub-pixel position (θx, θy) in the reference picture, as shown in fig. 6. Conventionally, in order to reduce implementation complexity, motion compensated interpolation filtering is performed with separable filters: a horizontal filter and a vertical filter.
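A compact sketch of such separable sub-pixel interpolation is given below; `filters` is assumed to map each sub-pixel phase to a 1D tap set, and the normalization/rounding of fixed-point arithmetic is omitted for brevity.

```python
import numpy as np

def mc_interpolate(ref: np.ndarray, theta_x: int, theta_y: int, filters) -> np.ndarray:
    """Horizontal pass with the filter for phase theta_x, then vertical pass
    with the filter for phase theta_y (e.g., theta in 0..15 for 1/16-pel VVC)."""
    hf = np.asarray(filters[theta_x], dtype=np.float64)
    vf = np.asarray(filters[theta_y], dtype=np.float64)
    tmp = np.apply_along_axis(np.convolve, 1, ref.astype(np.float64), hf, mode='same')
    return np.apply_along_axis(np.convolve, 0, tmp, vf, mode='same')
```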
To improve coding efficiency, for some sub-pixel positions, the encoder may select among several filters and signal the selection in the bitstream. For example, in the VVC standard, for the 1/2 sub-pixel position, a selection can be made between two interpolation filters (regular or Gaussian). Such a tool is also known as the switched interpolation filter (SIF) tool. A Gaussian filter is a low-pass filter that smooths high frequencies compared to the regular filter.
As with ALF post-filtering, better efficiency is obtained in the filtering process when the samples (or groups of samples) to be filtered are pre-classified and the classification is used to select a specific set of filter coefficients for each sample (or group of samples). On the encoder side, the classification can be used to determine the coefficients of the filters that minimize the mean square error between the original samples "s(r)" and the filtered samples "t(r)" using wiener-based adaptive filter techniques (as described, for example, in C.-Y. Tsai et al., "Adaptive Loop Filtering for Video Coding", IEEE Journal of Selected Topics in Signal Processing, vol. 7, no. 6, Dec. 2013).
However, classification of samples significantly increases the number of operations per sample.
In VVC, only ALF uses a classification. The SIF tool signals which filter is to be used by each CU for motion compensation, but the same filter is used to construct all prediction samples of the prediction unit. For RPR, a single set of rescaling interpolation filters is selected for each picture based on the ratio between the reference block size and the current block size, and all samples are filtered with this single set. A set of rescaling filters contains, for each phase, the coefficients of the filter to be used.
In accordance with one aspect of the present principles, there is provided a method for encoding/decoding video, wherein a sample classification of a reference picture is used to select at least one motion compensated interpolation filter when predicting a block of a picture of the video.
According to an embodiment, for each sample or group of samples from a reference picture that needs to be interpolated, the class to which the sample belongs (according to the classification performed on the reference picture) is determined. Then, interpolation filters associated with the class are selected, and the samples are filtered using coefficients of the selected filters.
In accordance with another aspect of the present principles, there is provided a method for encoding/decoding video, wherein sample classifications of reconstructed pictures are shared between different encoding/decoding modules of an encoder/decoder. For example, a reference picture is classified, and then the classification is used to select at least one filter used during an encoding/decoding operation (such as resampling filtering or motion compensated interpolation filtering) of a new picture using the reference picture.
According to another example, the reconstructed picture is classified and then the classification is used to select at least one filter that is used during encoding/decoding operations of the reconstructed picture (such as post-filtering and/or resampling for display) and/or during encoding/decoding operations of a new picture that uses the reconstructed picture as a reference picture (such as resampling filtering or motion compensated interpolation filtering). For example, this may be done for each sample (or group of samples), the classification of the sample (or group of samples) allowing the selection of a filter to be used with the sample (or group of samples).
Conventionally, a filter comprises several coefficients, each coefficient being applied to a neighboring sample of the current sample being filtered, the neighboring sample being determined according to a selected filter shape, an example of which is given in fig. 11.
According to an embodiment, in order to share the classification between any encoding/decoding modules, the result of the classification is stored in a common space accessible by any of the encoding/decoding modules, such as a Decoded Picture Buffer (DPB) storing reference pictures.
In accordance with the present principles, the ability of sample classification to drive filter selection is used for motion compensated interpolation filters and resampling filters while keeping complexity relatively low. This is done by sharing the sample classification for several filtering purposes: restoration filtering (e.g., ALF or bilateral filter), MC filtering, and resampling filtering. In an embodiment, the classification may be stored in the DPB.
At the encoder, classification of the reconstructed samples allows a special filter to be derived for each sample class. This can be done by minimizing the mean square error between the original samples and the reconstructed samples belonging to one class using, for example, wiener-based adaptive filter coefficients.
Next, on the decoder side, the selection of the filters to be used is controlled by a classification process. For example, the classification process determines a class index for each sample, and a filter is associated with one class index.
In some variants, the classification is performed per group of samples rather than per sample. For example, a group of samples is a 2x2 region.
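For instance, with a 2x2 group size the class map can be stored at quarter density and looked up as in this small sketch (an illustrative helper, not a normative structure):

```python
def class_of_sample(class_map, x: int, y: int, group_size: int = 2) -> int:
    """All samples of a group_size x group_size region share one class index,
    dividing the per-sample classification cost by group_size**2."""
    return class_map[y // group_size][x // group_size]
```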
Classification of interpolation filters
Fig. 18 shows a method 1800 for encoding or decoding video according to an embodiment. According to this embodiment, a set of interpolation filters is defined, including an interpolation filter for each class index. The interpolation filters may be determined in the same manner as the ALF filters, and when adaptation to the content is needed, the coefficients of new interpolation filters may be transmitted to the decoder.
A reference picture is input to the process. Samples of the reference picture are classified (1810). Then, at 1820, motion compensation of the block is performed to determine a prediction for the current block to be encoded or decoded.
For a video block to be encoded or decoded, a motion vector is obtained. The motion vector allows a portion or block of a reference picture to be determined for predicting the block.
When the motion vector points to a sub-sample position, as shown in fig. 6, the samples of the motion compensated portion of the reference picture must be interpolated to determine the block samples for prediction. In accordance with the present principles, an interpolation filter (1830) is determined for each sub-sample based on the classification.
Thus, the prediction of the block is determined (1840) as interpolated samples of the reference picture.
According to an embodiment, for determining the interpolation filter (1830), for each sample of the motion compensated portion of the reference picture, a class index is determined, e.g., from one or more class indices associated with one or more neighboring samples at a sample position in the reference picture. Then, an interpolation filter is selected for each sub-sample to be interpolated using the class index determined for the sub-sample. Then, a prediction of the block is generated (1840) by interpolating each sub-sample of the motion compensated portion of the reference picture using an interpolation filter selected for the sub-sample. Finally, the prediction is used to encode or decode the block (depending on whether the method is implemented at the encoder or decoder) (1850). At the time of encoding, the residual between the original block and its prediction is determined and encoded. At decoding, the residual is decoded and added to the prediction to reconstruct the block, the prediction for the block being generated using the same process as the encoder.
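A hedged end-to-end sketch of steps (1830)-(1840) follows; `apply_filter` and the 1/16-pel motion-vector layout are illustrative assumptions rather than the codec's actual interfaces.

```python
def predict_block(ref, class_map, mv_x, mv_y, filter_sets, x0, y0, w, h):
    """Build the prediction of a w x h block: one interpolation filter is chosen
    per sub-sample from the class of its neighborhood in the reference picture."""
    phase = (mv_x & 15, mv_y & 15)                 # sub-pixel position (1/16 precision)
    pred = [[0] * w for _ in range(h)]
    for j in range(h):
        for i in range(w):
            rx = x0 + i + (mv_x >> 4)              # integer-pel anchor in the reference
            ry = y0 + j + (mv_y >> 4)
            cls = class_map[ry][rx]                # class index at the reference position (1810)
            filt = filter_sets[cls][phase]         # filter selected by class + phase (1830)
            pred[j][i] = apply_filter(ref, rx, ry, filt)   # interpolated sample (1840)
    return pred
```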
In accordance with another aspect of the present principles, the same sample classification is shared between the encoding/decoding modules of the encoder or decoder. A set of filters is defined for each encoding or decoding operation (such as motion compensated interpolation, resampling, ALF) that uses filters.
Same classification for interpolation and resampling filters
Fig. 19 shows an example of a method 1900 for encoding or decoding video according to another embodiment. To exploit the ability of sample classification to drive filter selection for both the motion compensated (MC) interpolation (1940) and resampling (1930) filters, a common classification of the reference picture (1910) may be performed and used.
Advantageously, the entire reconstructed picture is classified and the classification for each sample is stored (1920) so that the classification can be used by the motion compensated interpolation filter and resampling filter processes. In the case where resampling is done implicitly within the MC process (1950), the classification is input directly to the MC.
According to an embodiment, the classification is stored in the DPB together with the reference picture so that it can be reused by other processes.
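One possible (purely illustrative) layout of such a DPB record is sketched below; the field names are assumptions chosen to show the classification travelling with the picture.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class DpbEntry:
    """Decoded picture buffer record carrying the classification with the picture,
    so MC interpolation, resampling and post-filtering can all reuse it."""
    picture: np.ndarray      # reconstructed (or upsampled) samples
    class_map: np.ndarray    # one class index per sample or per group of samples
    poc: int                 # picture order count identifying the reference
```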
Same classification for interpolation, resampling filter and post-filtering
Fig. 20 shows an example of a method 2000 for encoding or decoding video according to another embodiment. In this variant, the classification (2030) is performed on the reconstructed picture before applying a restoration filter (also referred to as a post-filter, PF), e.g., ALF (2050). The encoder may then use the classification to derive the filter coefficients (2040) of the post-filter (e.g., ALF). The classification is used to select the filter to be applied by the post-filtering (2050). Advantageously, this classification is also used by the resampling filtering or motion compensated interpolation filtering, so that only one single classification stage (2030) is performed. Note that in this variant, the other processes (e.g., resampling filtering or motion compensated interpolation filtering) use the classification performed before the restoration filter (post-filtering) is applied, while they operate on the restored picture samples (after the post-filtering is applied).
According to an embodiment, the classification may be stored in the DPB (2020) such that it may be reused by other processes. In a variant, if the picture is used only as a reference (2060), storage in the DPB is performed.
Out-of-loop resampling
In the case of RPR, the resampling process of the decoded picture may not be specified (fig. 5: 540). Fig. 21 shows an example of a method 2100 for decoding video according to another embodiment. The picture is decoded (2110), and the samples of the decoded picture are classified (2130). A post-filter is applied (2150) based on the classification, and finally the classification is stored in the DPB so that it can be used by other processes that use the decoded picture as a reference picture.
The selection of the resampling filter to be used (e.g., for upsampling) may be controlled by the classification process (2130). The classification process determines a class index for each sample (or group of samples), and one filter is associated with each class index. The class index allows the resampling filter to be selected (2160).
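The selection logic of (2130)/(2160) may be sketched as follows, assuming a hypothetical `apply_resampling_filter` helper and a per-sample class map; none of the names are normative.

```python
def upsample_for_display(decoded, class_map, filters, scale=2):
    """Each output sample is produced with the resampling filter associated with
    the class index of its co-located decoded sample."""
    h, w = len(decoded), len(decoded[0])
    out = [[0] * (w * scale) for _ in range(h * scale)]
    for y in range(h * scale):
        for x in range(w * scale):
            cls = class_map[y // scale][x // scale]                           # (2130)
            out[y][x] = apply_resampling_filter(decoded, x, y, filters[cls])  # (2160)
    return out
```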
It should be appreciated that the above-described encoding or decoding method may be implemented in the encoder 200 or decoder 300 described with respect to fig. 2 and 3 for encoding or decoding video in/from a bitstream.
In an embodiment, shown in fig. 22, in an environment where two remote devices A and B communicate over a communication network NET, device A comprises a processor associated with memory (RAM and ROM) configured to implement a method for encoding video according to any of the embodiments described with respect to figs. 1 to 21, and device B comprises a processor associated with memory (RAM and ROM) configured to implement a method for decoding video according to any of the embodiments described with respect to figs. 1 to 21.
According to an example, the network is a broadcast network adapted to broadcast/transmit encoded data representing video from device a to decoding devices including device B.
The signal intended to be transmitted by device a carries at least one bitstream comprising encoded data representative of video. The bit stream may be generated in accordance with any implementation of the present principles.
Fig. 23 shows an example of the syntax of such a signal transmitted by a packet-based transmission protocol. Each transmitted packet P includes a header H and a payload PAYLOAD. In some embodiments, the payload PAYLOAD may comprise encoded video data encoded according to any of the above embodiments. In some embodiments, the signal further includes the filter (upsampling, interpolation) coefficients determined as described above.
Various implementations involve decoding. "Decoding", as used in this application, may encompass all or part of a process performed on a received encoded sequence, for example, in order to produce a final output suitable for display. In various implementations, such processes include one or more processes typically performed by a decoder, such as entropy decoding, inverse quantization, inverse transformation, and differential decoding. In various implementations, such processes also or alternatively include processes performed by decoders of the various embodiments described herein, e.g., decoding upsampling filter coefficients, upsampling decoded pictures.
As a further example, in an embodiment, "decoding" refers only to entropy decoding, in another embodiment "decoding" refers only to differential decoding, and in yet another embodiment "decoding" refers to a combination of entropy decoding and differential decoding. Whether the phrase "decoding process" is intended to refer specifically to a subset of operations or broadly to a more general decoding process will be clear based on the context of the specific description, and is believed to be well understood by those skilled in the art.
Various implementations involve encoding. In a manner similar to the discussion above regarding "decoding", "encoding" as used in this application may encompass, for example, all or part of a process performed on an input video sequence to produce an encoded bitstream. In various implementations, such processes include one or more processes typically performed by an encoder, such as partitioning, differential encoding, transformation, quantization, and entropy encoding. In various implementations, such processes also or alternatively include processes performed by encoders of the various embodiments described herein, e.g., determining upsampling filter coefficients, upsampling decoded pictures.
As a further example, in an embodiment, "encoding" refers only to entropy encoding, in another embodiment, "encoding" refers only to differential encoding, and in yet another embodiment, "encoding" refers to a combination of differential encoding and entropy encoding. Whether the phrase "encoding process" refers specifically to a subset of operations or broadly refers to a broader encoding process will be apparent based on the context of the specific description and is believed to be well understood by those skilled in the art.
Note that syntax elements used herein are descriptive terms. Thus, they do not exclude the use of other syntax element names.
The present disclosure has described various information, such as, for example, syntax, that may be transmitted or stored. This information can be encapsulated or arranged in a variety of ways, including, for example, in a manner common in video standards, such as placing the information in SPS, PPS, NAL units, headers (e.g., NAL unit headers or slice headers), or SEI messages. Other ways are also available, including for example, a general way for system-level or application-level criteria, such as placing information into one or more of the following:
a. SDP (Session Description Protocol), a format for describing multimedia communication sessions for the purposes of session announcement and session invitation, for example as described in RFCs and used in conjunction with RTP (Real-time Transport Protocol) transmission.
b. DASH MPD (Media Presentation Description) descriptors, e.g., a descriptor associated with a representation or a collection of representations to provide additional characteristics to the content representation, as used in DASH and transmitted over HTTP.
c. RTP header extensions, e.g., as used during RTP streaming.
d. ISO Base Media File Format, for example as used in OMAF and using boxes, which are object-oriented building blocks defined by a unique type identifier and length, also referred to as "atoms" in some specifications.
e. HLS (HTTP Live Streaming) manifests transmitted over HTTP. For example, a manifest may be associated with a version or set of versions of content to provide characteristics of the version or set of versions.
When the figures are presented as flow charts, it should be understood that they also provide block diagrams of corresponding devices. Similarly, when the figures are presented as block diagrams, it should be understood that they also provide a flow chart of the corresponding method/process.
Some embodiments refer to rate distortion optimization. In particular, during the encoding process, a balance or trade-off between rate and distortion is typically considered, often taking into account constraints of computational complexity. Rate distortion optimization is typically expressed as minimizing a rate distortion function, which is a weighted sum of rate and distortion. There are different approaches to solving the rate distortion optimization problem. For example, these methods may be based on extensive testing of all coding options (including all considered modes or coding parameter values) and evaluating their coding costs and the associated distortion of the reconstructed signal after encoding and decoding completely. Faster methods may also be used to reduce coding complexity, in particular the calculation of approximate distortion based on prediction or prediction residual signals instead of reconstructed residual signals. A mix of the two methods may also be used, such as by using approximate distortion for only some of the possible coding options, and full distortion for other coding options. Other methods evaluate only a subset of the possible coding options. More generally, many methods employ any of a variety of techniques to perform the optimization, but the optimization is not necessarily a complete assessment of both coding cost and associated distortion.
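As a one-line illustration of the weighted sum just described, with purely invented numbers:

```python
def rd_cost(distortion: float, rate_bits: float, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R minimized by the encoder."""
    return distortion + lam * rate_bits

# Pick the cheaper of two hypothetical coding options under lambda = 0.85:
options = [(1200.0, 96.0), (1500.0, 40.0)]   # (distortion, bits) pairs
best = min(options, key=lambda o: rd_cost(o[0], o[1], 0.85))
```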
The specific implementations and aspects described herein may be implemented in, for example, a method or process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (e.g., discussed only as a method), the implementation of the features discussed may also be implemented in other forms (e.g., an apparatus or program). An apparatus may be implemented in, for example, appropriate hardware, software, and firmware. The methods may be implemented in, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants ("PDAs"), and other devices that facilitate communication of information between end users.
Reference to "one embodiment" or "an embodiment" or "one embodiment" or "an embodiment" and other variations thereof means that a particular feature, structure, characteristic, etc., described in connection with the embodiment is included in at least one embodiment. Thus, the appearances of the phrase "in one embodiment" or "in an embodiment" or "in one embodiment" or "in an embodiment" and any other variations that occur in various places throughout this application are not necessarily all referring to the same embodiment.
In addition, the present application may be directed to "determining" various information. The determination information may include, for example, one or more of estimation information, calculation information, prediction information, or retrieval information from memory.
Furthermore, this application may refer to "accessing" various information. Accessing the information may include, for example, one or more of receiving the information, retrieving the information (e.g., from memory), storing the information, moving the information, copying the information, calculating the information, determining the information, predicting the information, or estimating the information.
In addition, the present application may be directed to "receiving" various information. As with "access," receipt is intended to be a broad term. Receiving information may include, for example, one or more of accessing information or retrieving information (e.g., from memory). Further, during operations such as, for example, storing information, processing information, transmitting information, moving information, copying information, erasing information, computing information, determining information, predicting information, or estimating information, the "receiving" is typically engaged in one way or another.
It should be understood that, for example, in the case of "a/B", "a and/or B", and "at least one of a and B", use of any of the following "/", "and/or" and "at least one" is intended to cover selection of only the first listed option (a), or selection of only the second listed option (B), or selection of both options (a and B). As a further example, in the case of "A, B and/or C" and "at least one of A, B and C", such phrases are intended to cover selection of only the first listed option (a), or only the second listed option (B), or only the third listed option (C), or only the first and second listed options (a and B), or only the first and third listed options (a and C), or only the second and third listed options (B and C), or all three options (a and B and C). As will be apparent to one of ordinary skill in the art and related arts, this extends to as many items as are listed.
Also, as used herein, the word "signaling" refers to (among other things) indicating something to the corresponding decoder. For example, in some implementations, the encoder signals a particular one of the plurality of upsampling filter coefficients. Thus, in one embodiment, the same parameters are used on both the encoder side and the decoder side. Thus, for example, an encoder may transmit (explicit signaling) certain parameters to a decoder so that the decoder may use the same certain parameters. Conversely, if the decoder already has specific parameters, among others, signaling can be used without transmitting (implicit signaling) to simply allow the decoder to know and select the specific parameters. By avoiding transmission of any actual functions, bit savings are achieved in various embodiments. It should be appreciated that the signaling may be implemented in various ways. For example, in various implementations, information is signaled to a corresponding decoder using one or more syntax elements, flags, and the like. Although the foregoing relates to the verb form of the word "signal," the word "signal" may also be used herein as a noun.
It will be apparent to one of ordinary skill in the art that implementations may produce various signals formatted to carry, for example, storable or transmittable information. The information may include, for example, instructions for performing a method or data resulting from one of the implementations. For example, the signal may be formatted to carry the bit stream of the described embodiments. Such signals may be formatted, for example, as electromagnetic waves (e.g., using the radio frequency portion of the spectrum) or baseband signals. Formatting may include, for example, encoding the data stream and modulating the carrier with the encoded data stream. The information carried by the signal may be, for example, analog or digital information. It is known that signals may be transmitted over a variety of different wired or wireless links. The signal may be stored on a processor readable medium.
We describe a number of embodiments. The features of these embodiments may be provided separately or in any combination in the various claim categories and types. Further, embodiments may include one or more of the following features, devices, or aspects, alone or in any combination, across the various claim categories and types:
according to any of the embodiments, video is encoded/decoded, wherein the original picture can be encoded at high resolution or low resolution.
According to any of the embodiments, a picture is reconstructed from the scaled down decoded picture.
A bitstream or signal comprising one or more of the described syntax elements or variants thereof.
A bitstream or signal comprising a syntax conveying information generated according to any of the embodiments.
Creating and/or transmitting and/or receiving and/or decoding a bitstream or signal comprising one or more of the described syntax elements or variants thereof.
Creation and/or transmission and/or reception and/or decoding according to any of the embodiments.
A method, process, apparatus, medium storing instructions, medium storing data, or signal according to any one of the embodiments.
A television, set-top box, cellular telephone, tablet computer or other electronic device that performs reconstruction of pictures using upsampling according to any of the described embodiments.
Television, set-top box, mobile phone, tablet or other electronic device that performs reconstruction of a picture using upsampling and displays the resulting image (e.g., using a monitor, screen or other type of display) according to any of the embodiments.
A television, set-top box, cellular phone, tablet computer, or other electronic device that selects (e.g., using a tuner) a channel to receive a signal comprising an encoded image and performs reconstruction of the picture using upsampling according to any of the described embodiments.
Television, set-top box, cellular phone, tablet or other electronic device that receives signals over the air (e.g., using an antenna) including encoded images and performs reconstruction of the pictures using upsampling according to any of the described embodiments.
According to any of the embodiments, video is encoded/decoded, wherein the same classification of pictures is shared in the encoding process or decoding process.
According to any of the embodiments, the video is encoded/decoded, wherein when sub-samples are to be interpolated, classification is used to select interpolation filters.
A bitstream or signal comprising one or more of the described syntax elements or variants thereof.
A bitstream or signal comprising a syntax conveying information generated according to any of the embodiments.
Creating and/or transmitting and/or receiving and/or decoding a bitstream or signal comprising one or more of the described syntax elements or variants thereof.
Creation and/or transmission and/or reception and/or decoding according to any of the embodiments.
A method, process, apparatus, medium storing instructions, medium storing data, or signal according to any one of the embodiments.
A television, set-top box, mobile phone, tablet or other electronic device performing reconstruction of a picture according to any of the embodiments.
Television, set-top box, cellular telephone, tablet or other electronic device that performs reconstruction of a picture and displays the resulting image (e.g., using a monitor, screen or other type of display) according to any of the described embodiments.
A television, set-top box, cellular phone, tablet computer, or other electronic device that selects (e.g., using a tuner) a channel to receive a signal comprising an encoded image and performs reconstruction of the picture according to any of the described embodiments.
Television, set-top box, cellular telephone, tablet computer or other electronic device that receives signals over the air (e.g., using an antenna) including encoded images and performs reconstruction of pictures according to any of the described embodiments.

Claims (31)

1. A method, the method comprising:
- decoding a first picture, and
-resampling at least a portion of the first picture to reconstruct at least a portion of a second picture using at least one resampling filter, wherein the resampling filter is selected in response to a phase of a first sample of the at least a portion of the second picture, the phase of the first sample being a sub-pixel position of the first sample in the at least a portion of the first picture.
2. The method of claim 1, further comprising transmitting the at least one reconstructed portion of the second picture to a display.
3. The method of claim 1 or 2, wherein the resampling filter is selected based on a classification of the first picture.
4. A method according to any one of claims 1 to 3, further comprising:
-storing the at least one reconstructed portion of the second picture in a decoded picture buffer storing a reference picture.
5. The method of claim 4, further comprising encoding a third picture, comprising:
determining a prediction for at least one block of the third picture using the at least one reconstructed portion of the second picture,
-encoding the at least one block of the third picture using the prediction.
6. The method of claim 4, further comprising decoding a third picture, comprising:
determining a prediction for at least one block of the third picture using the at least one reconstructed portion of the second picture,
-decoding the at least one block of the third picture using the prediction.
7. A method according to any one of claims 1 to 6, comprising decoding coefficients of the resampling filter from a bitstream.
8. The method of any of claims 1 to 7, wherein the resampling filter is a non-separable filter.
9. The method of any one of claims 1 to 8, further comprising:
classifying samples of the at least a portion of the decoded first picture,
determining a class index of at least one first sample of the at least one portion of the second picture from at least one class index associated with at least one neighboring sample in the at least one portion of the decoded first picture,
-selecting the resampling filter in response to the determined class index associated with the at least one first sample.
10. The method of claim 9, wherein a different resampling filter is associated with each class.
11. The method of any of claims 1-10, wherein the resampling filter is determined based on a rate distortion cost determined between the at least one portion of the second picture and the at least one reconstructed portion of the second picture obtained from the decoded first picture.
12. An apparatus comprising one or more processors, wherein the one or more processors are configured to reconstruct at least a portion of a first picture from at least a portion of a second picture, the first picture and the second picture having different sizes, the reconstructing comprising:
decoding the second picture from the bitstream,
-determining at least one first sample of the at least one portion of the first picture using at least one resampling filter applied to at least one second sample of the at least one portion of the decoded second picture.
13. The device of claim 12, wherein the one or more processors are further configured to transmit the at least one reconstructed portion of the first picture to a display.
14. An apparatus comprising one or more processors, wherein the one or more processors are configured to encode a picture of video, the encoding comprising:
encoding a second picture in the bitstream, said second picture being a scaled-down picture of the first picture,
-encoding a third picture in the bitstream, the third picture having the same size as the first picture, wherein encoding the third picture comprises:
reconstructing at least a portion of the first picture by upsampling at least a portion of the second picture after decoding, the upsampling comprising determining at least one first sample of the at least a portion of the first picture using at least one resampling filter applied to at least one second sample of the at least a portion of the decoded second picture,
-store the at least one reconstructed portion of the first picture in a decoded picture buffer, the decoded picture buffer storing reference pictures for encoding the third picture.
15. The device of claim 14, wherein encoding the third picture further comprises:
Determining a prediction for at least one block of the third picture using the at least one reconstructed portion of the first picture,
-encoding the at least one block of the third picture using the prediction.
16. An apparatus comprising one or more processors, wherein the one or more processors are configured to decode video from a bitstream, the decoding comprising:
decoding a second picture from said bitstream, said second picture being a scaled-down picture of the first picture,
-decoding a third picture from the bitstream, the third picture having the same size as the first picture, wherein decoding the third picture comprises:
reconstructing at least a portion of the first picture by upsampling at least a portion of the decoded second picture, the upsampling comprising determining at least one first sample of the at least a portion of the first picture using at least one resampling filter applied to at least one second sample of the at least a portion of the decoded second picture,
-store the at least one reconstructed portion of the first picture in a decoded picture buffer, the decoded picture buffer storing reference pictures for decoding the third picture.
17. The device of claim 16, wherein decoding the third picture further comprises:
determining a prediction for at least one block of the third picture using the at least one reconstructed portion of the first picture,
-decoding the at least one block of the third picture using the prediction.
18. The apparatus of any of claims 12 to 17, wherein the one or more processors are further configured to decode coefficients of the resampling filter from the bitstream.
19. The apparatus of any of claims 12 to 18, wherein the resampling filter is a non-separable filter.
20. The apparatus of any of claims 12 to 19, wherein the one or more processors are further configured to:
classifying samples of the at least a portion of the decoded second picture,
determining a class index of at least one first sample of the at least one portion of the first picture from at least one class index associated with at least one neighboring sample in the at least one portion of the decoded second picture,
-selecting the resampling filter in response to the determined class index associated with the at least one first sample.
21. The apparatus of claim 20, wherein a different resampling filter is associated with each class.
22. The device of any of claims 12-21, wherein selecting the resampling filter is responsive to a phase of a first sample of the at least a portion of the first picture to be upsampled, the phase of the first sample being a sub-pixel position of the first sample in the at least a portion of the second picture.
23. The device of any of claims 14-15 or 18-22, wherein the resampling filter is determined based on a rate distortion cost determined between the at least one portion of the first picture and the at least one reconstructed portion of the first picture obtained from the decoded second picture.
24. A signal comprising a bit stream, the signal being formed by performing the method of any of claims 3 to 4, 7 to 12.
25. A computer readable medium comprising the bitstream of claim 24.
26. A computer-readable storage medium having instructions stored thereon for causing one or more processors to perform the method of any one of claims 1 to 12.
27. A computer program product comprising instructions which, when the program is executed by one or more processors, cause the one or more processors to perform the method of any of claims 1 to 12.
28. An apparatus, the apparatus comprising:
-the device according to any one of claims 12 to 23; and
-at least one of the following: (i) An antenna configured to receive a signal, the signal comprising data representative of video; (ii) A band limiter configured to limit the received signal to a frequency band including the data representing video; or (iii) a display configured to display at least a portion of the at least one first image.
29. The apparatus of claim 28, comprising a television, a cellular telephone, a tablet computer, or a set-top box.
30. An apparatus, the apparatus comprising:
- an access unit configured to access data comprising the signal according to claim 24, and
- a transmitter configured to transmit the accessed data.
31. A method, the method comprising: accessing data comprising the signal of claim 24, and transmitting the accessed data.
CN202280016526.1A 2021-02-25 2022-02-22 Method and apparatus for encoding/decoding video Pending CN117280683A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
EP21305227.7 2021-02-25
EP21305578 2021-05-05
EP21305578.3 2021-05-05
PCT/EP2022/054392 WO2022180031A1 (en) 2021-02-25 2022-02-22 Methods and apparatuses for encoding/decoding a video

Publications (1)

Publication Number Publication Date
CN117280683A true CN117280683A (en) 2023-12-22

Family

ID=75904855

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202280016945.5A Pending CN117083853A (en) 2021-02-25 2022-02-22 Method and apparatus for encoding/decoding video
CN202280016526.1A Pending CN117280683A (en) 2021-02-25 2022-02-22 Method and apparatus for encoding/decoding video

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202280016945.5A Pending CN117083853A (en) 2021-02-25 2022-02-22 Method and apparatus for encoding/decoding video

Country Status (1)

Country Link
CN (2) CN117083853A (en)

Also Published As

Publication number Publication date
CN117083853A (en) 2023-11-17

Similar Documents

Publication Publication Date Title
EP3744092A1 (en) Method and apparatus for video encoding and decoding based on a linear model responsive to neighboring samples
EP3932063A1 (en) Method and device for picture encoding and decoding
US20230095387A1 (en) Neural network-based intra prediction for video encoding or decoding
EP3703366A1 (en) Method and device for picture encoding and decoding
CN112771874A (en) Method and apparatus for picture coding and decoding
EP3641311A1 (en) Encoding and decoding methods and apparatus
CN112425162A (en) Wide-angle intra prediction and position-dependent intra prediction combination
CN115516858A (en) Zoom list control in video coding
CN117280683A (en) Method and apparatus for encoding/decoding video
US20240137504A1 (en) Methods and apparatuses for encoding/decoding a video
JP2024507791A (en) Method and apparatus for encoding/decoding video
US20240155148A1 (en) Motion flow coding for deep learning based yuv video compression
US20220224902A1 (en) Quantization matrices selection for separate color plane mode
CN117813817A (en) Method and apparatus for encoding/decoding video
US20230262268A1 (en) Chroma format dependent quantization matrices for video encoding and decoding
WO2023041317A1 (en) Method and apparatus for video encoding and decoding with chroma residuals sampling
EP4360313A1 (en) Methods and apparatuses for encoding/decoding a video
WO2023099249A1 (en) Downsample phase indication
CN115362679A (en) Method and apparatus for video encoding and decoding
WO2022073811A1 (en) Spatial resolution adaptation of in-loop and post-filtering of compressed video using metadata
WO2023222521A1 (en) Sei adapted for multiple conformance points
WO2024083500A1 (en) Methods and apparatuses for padding reference samples
EP4352959A1 (en) High-level syntax for picture resampling
WO2021110628A1 (en) Scaling process for joint chroma coded blocks
CN114270858A (en) Quantization matrix prediction for video encoding and decoding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination