CN118075459A - Video encoding and decoding method and device


Info

Publication number: CN118075459A
Application number: CN202211479126.2A
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: image, filter, type, decoding, frame
Legal status: Pending
Inventors: 林泽辉, 蔡康颖, 徐逸群, 曹潇然, 周建同, 陈焕浜
Current Assignee: Huawei Technologies Co Ltd
Original Assignee: Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd
Priority to CN202211479126.2A
Publication of CN118075459A

Classifications

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A video encoding and decoding method and device are disclosed, relating to the field of computers. The encoding end (or decoding end) performs inter-frame prediction using a first type of filter determined from rendering information: it acquires a reference frame for the data to be processed from a reconstructed image (or decoded image) of already-processed data, determines a prediction block for the data to be processed from the reference frame, and performs encoding (or decoding) based on the prediction block. The first type of filter matches the image rendering engine used to produce the data to be processed, so the time-space domain correlation between the reference frame obtained with the first type of filter and the data to be processed is improved. This improves the accuracy of the motion information that the encoding end determines for the data to be processed from the reference frame, and reduces the amount of data the images occupy in the code stream the encoding end produces based on that motion information. It also improves the accuracy with which the decoding end determines the prediction block corresponding to the data to be processed from the reference frame; when the decoding end decodes based on the prediction block, the amount of data processed is reduced and video decoding efficiency is improved.

Description

Video encoding and decoding method and device
Technical Field
The present application relates to the field of computers, and in particular, to a video encoding and decoding method and apparatus.
Background
Among video coding and decoding techniques, video compression techniques are particularly important. Video compression systems perform spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove the redundant information inherent in video sequences. In the video encoding process of video compression, the encoding end selects one or more reference frames from the already-encoded frames of the video, acquires a prediction block corresponding to the current image block from the reference frames, calculates a residual value between the prediction block and the current image block, and quantizes and encodes the residual value. In the process of obtaining the prediction block, a DCT-based interpolation filter (DCTIF) is employed to determine a motion vector for the current image block, the motion vector indicating the offset of the current image block relative to the prediction block. However, because video content varies, if the DCTIF is adopted for all image blocks in the video, the accuracy of the motion vector determined for the current image block suffers, and the difference between the current image block and the corresponding prediction block becomes large. The residual value between the prediction block and the current image block is therefore large, the data volume of the encoded video code stream is large, video compression performance is low, and video encoding efficiency is affected. Therefore, how to provide a more efficient video encoding and decoding method is a problem that needs to be solved.
Disclosure of Invention
The application provides a video encoding and decoding method and device, which address the problems of the large code stream data volume and low video compression rate obtained by video compression.
In a first aspect, the present application provides a video encoding method, which is applicable to a codec system or to an encoding end that supports the codec system in implementing the video encoding method; for example, the encoding end includes a video encoder. Here, the video encoding method provided by this embodiment is described taking execution by the encoding end as an example. The video encoding method includes: first, the encoding end obtains source data and rendering information corresponding to the source data. Second, the encoding end determines, from the set multiple types of filters, a first type of filter matched with the rendering information. Third, the encoding end uses the first type of filter to determine a reference frame of the second image from a reconstructed image. Fourth, the encoding end determines motion information of the second image relative to the first image based on the reference frame. Fifth, the encoding end encodes the second image according to the motion information of the second image.
The rendering information indicates the filter parameters used in the process by which the encoding end generates the code stream from the source data, where the source data includes a plurality of source images.
Illustratively, the reference frame is obtained by the first type of filter from a reconstructed image that results from encoding and then decoding a first image among the plurality of images, where the first image is an image, among the plurality of images, whose similarity to the second image reaches a threshold.
Because the first type of filter is determined according to the rendering information, it matches the image rendering engine used for the second image. The first type of filter improves the time-space domain correlation between the second image and the reference frame obtained by interpolating the reconstructed image, which improves the accuracy of the motion information of the second image that the encoding end determines from the reference frame; for example, the similarity between the prediction block determined by the encoding end and the corresponding image block in the second image is improved. The encoding end encodes based on the prediction block and the motion information between the two images, improving the encoding effect of images in the video, reducing the amount of data corresponding to the images in the encoded code stream, and improving the video compression rate.
In one possible implementation manner, the encoding end determining, from the set multiple types of filters, a first type of filter matched with the rendering information includes: the encoding end takes, from the multiple types of filters, the filter that is consistent with the filter parameters as the first type of filter.
Because the type of the first type of filter determined by the encoding end matches the rendering information, the first type of filter adopted by the encoding end also matches the filter in the image rendering engine used for the source data; this avoids the low time-space domain correlation caused by an interpolation filter based only on the discrete cosine transform, helps reduce the data volume the images occupy in the code stream, and improves the encoding effect of the video.
In one possible implementation, the filter parameters include one or more of the filter type, the filter coefficients, and the number of taps.
In one possible implementation, the filter type includes one of a bilinear interpolation filter, a bicubic interpolation filter, and a nearest neighbor interpolation filter.
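For intuition about how these three filter types differ, the following minimal numpy sketch (illustrative only, not code from the application) computes half-pel samples of a 1-D row of integer pixels with each type; the bicubic weights assume a Catmull-Rom kernel with a = -0.5:

```python
import numpy as np

row = np.array([10.0, 20.0, 40.0, 80.0])   # luminance of four integer pixels

# Nearest-neighbour: each half-pel sample copies its nearest integer pixel.
nearest = row[:-1]                           # taking the left neighbour here

# Bilinear (2 taps): half-pel = average of the two surrounding integer pixels.
bilinear = 0.5 * (row[:-1] + row[1:])        # -> [15., 30., 60.]

# Bicubic (4 taps): half-pel from four integer pixels; with a Catmull-Rom
# kernel (a = -0.5) the weights are (-1, 9, 9, -1) / 16.
w = np.array([-1.0, 9.0, 9.0, -1.0]) / 16.0
bicubic = np.convolve(row, w, mode="valid")  # the one interior half-pel sample

print(nearest, bilinear, bicubic)
```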
In one possible implementation, if the filter parameters include filter coefficients, the coefficients of the first type of filter are filter coefficients; if the filter parameters do not include filter coefficients, the coefficients of the first type of filter are the set filter coefficients.
The encoding end takes the filter coefficients as the coefficients of the first type of filter when the filter coefficients are available, and adopts the set filter coefficients as the coefficients of the first type of filter when they are not, so the encoding end adaptively determines the coefficients of the first type of filter according to the content indicated by the filter parameters. Secondly, because the first type of filter coefficients are consistent with or close to the filter coefficients adopted by the image rendering engine, processing with them during encoding improves the time-space domain correlation between the source image in the source data and the reference frame obtained using the first type of filter coefficients, further improving the encoding effect and reducing the amount of data corresponding to the images in the encoded code stream.
In one possible implementation, the encoding end determining the reference frame of the second image includes: first, the encoding end uses the first type of filter to interpolate, based on the luminance coefficient matrix and the chrominance coefficient matrix, the sub-pixels that lie in the same row or column as integer pixels in the reconstructed image, obtaining the corresponding luminance and chrominance values. Second, the encoding end uses the first type of filter to interpolate the remaining sub-pixels in the reconstructed image according to the luminance values of the sub-pixels in the same row or column as the integer pixels, the luminance coefficient matrix, and the chrominance coefficient matrix, obtaining the luminance and chrominance values of the remaining sub-pixels. Third, the encoding end obtains the reference frame according to the luminance and chrominance values of the sub-pixels.
Because the first type of filter in the encoding end interpolates the reconstructed image using the first type of filter coefficients, the interpolated pixel positions are fixed, so the inter predictor in the encoding end can accelerate the interpolation process based on the determined first type of filter coefficients. Moreover, because the first type of filter coefficients match the filter coefficients adopted by the image rendering engine, the time-space domain correlation between the second image and the reference frame obtained based on those coefficients is improved, the accuracy with which the inter predictor determines the prediction block and the corresponding motion information of the second image is enhanced, the encoding effect is improved, and the amount of data corresponding to the second image in the encoded code stream is reduced.
In one possible implementation, the motion information is obtained as follows: first, based on the first image block in the second image, the encoding end searches the reference frame for a second image block whose similarity to the first image block reaches a threshold, and takes the second image block as the prediction block. Second, the encoding end determines a first coordinate of the first image block in the second image and a second coordinate of the prediction block in the reference frame. Third, the encoding end obtains the motion information of the second image according to the first coordinate and the second coordinate.
Because the first type of filter is determined according to the rendering information, it matches the image rendering engine used for the second image. The encoding end interpolates with the first type of filter during inter prediction to obtain the reference frame, which improves the time-space domain correlation between the reference frame and the second image, improves the similarity between the prediction block determined from the reference frame and the corresponding image block in the second image, and improves the accuracy of the motion information of the second image determined from the reference frame. The encoding end then encodes based on the prediction block and the motion information between the two images, reducing the amount of data corresponding to the second image.
In a second aspect, the present application provides a video decoding method, which is applicable to a codec system or to a decoding end that supports the codec system in implementing the video decoding method; for example, the decoding end includes a video decoder. Here, the video decoding method provided by this embodiment is described taking execution by the decoding end as an example. The video decoding method includes: first, the decoding end obtains the code stream and rendering information corresponding to the code stream. Second, the decoding end determines, from the set multiple types of filters, a first type of filter matched with the rendering information. Third, the decoding end uses the first type of filter to determine the reference frame of the second image frame from the first decoded image. Fourth, the decoding end decodes the second image frame according to the reference frame and the motion information between the first image frame and the second image frame.
The code stream includes a plurality of image frames, and the rendering information indicates the filter parameters used in the process by which the encoding end generates the code stream from the source data.
Illustratively, the reference frame is obtained by the first type of filter from the first decoded image corresponding to the first image frame, where the first image frame is an image frame, among the plurality of image frames, whose similarity to the second image frame reaches a threshold.
Because the first type of filter is determined according to the rendering information, it matches the image rendering engine used for the source image corresponding to the second image frame. The time-space domain correlation between the source image corresponding to the second image frame and the reference frame obtained by the first type of filter interpolating the first decoded image is therefore improved, which improves the accuracy with which the decoding end determines the prediction block corresponding to the second image frame from the reference frame; for example, the similarity between the prediction block determined by the decoding end and the image block corresponding to the second image frame is improved. When the decoding end decodes the second image frame based on the prediction block, the amount of data the decoding end processes is reduced for the same processing capability, and video decoding efficiency is improved.
In one possible implementation, the decoding end decoding the second image frame according to the reference frame and the motion information between the first image frame and the second image frame includes: the decoding end determines a second position of a second image block in the reference frame according to the offset, indicated by the motion information, from the first position of the first image block in the second image frame; the decoding end then takes the second image block as the prediction block of the second image frame and decodes the second image frame based on the prediction block.
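A minimal numpy sketch of this step under simplifying assumptions (all names are hypothetical, and positions and the motion vector are expressed directly in reference-frame units, i.e. after sub-pixel interpolation):

```python
import numpy as np

def decode_block(reference_frame, first_pos, mv, residual):
    """Offset the first position by the motion vector to find the prediction
    block (second image block) in the reference frame, then add the residual."""
    h, w = residual.shape
    r, c = first_pos[0] + mv[0], first_pos[1] + mv[1]   # second position
    prediction_block = reference_frame[r:r + h, c:c + w]
    return prediction_block + residual                   # reconstructed block

reference_frame = np.arange(64.0).reshape(8, 8)          # toy reference frame
block = decode_block(reference_frame, first_pos=(2, 2), mv=(1, -1),
                     residual=np.zeros((2, 2)))
print(block)
```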
Because the motion information indicated by the code stream allows the position, in the reference frame, of the prediction block corresponding to the second image frame to be determined accurately, the decoding end can improve the decoding efficiency of the second image frame according to the motion information.
In one possible implementation manner, the decoding end determining, from the set multiple types of filters, a first type of filter matched with the rendering information includes: the decoding end takes, from the multiple types of filters, the filter that is consistent with the filter parameters as the first type of filter.
Because the type of the first type of filter determined by the decoding end matches the rendering information, the first type of filter adopted by the decoding end also matches the filter in the image rendering engine used for the source data; this avoids the low time-space domain correlation caused by an interpolation filter based only on the discrete cosine transform and enhances the accuracy with which the inter predictor in the decoding end determines the prediction block of the second image frame.
In one possible implementation, the filter parameters include one or more of the filter type, the filter coefficients, and the number of taps.
In one possible implementation, the filter type includes one of a bilinear interpolation filter, a bicubic interpolation filter, and a nearest neighbor interpolation filter.
In one possible implementation, if the filter parameters include filter coefficients, the coefficients of the first type of filter are filter coefficients; if the filter parameters do not include filter coefficients, the coefficients of the first type of filter are the set filter coefficients.
The decoding end takes the filter coefficients as the coefficients of the first type of filter when the filter coefficients are available, and adopts the set filter coefficients as the coefficients of the first type of filter when they are not, so the decoding end adaptively determines the coefficients of the first type of filter according to the content indicated by the filter parameters. Secondly, because the first type of filter coefficients are consistent with or close to the filter coefficients adopted by the image rendering engine, processing with them during decoding improves the time-space domain correlation between the source image corresponding to the second image frame in the code stream and the reference frame obtained using the first type of filter coefficients, further reducing the amount of data the decoding end processes and improving video decoding efficiency.
In one possible implementation, the decoding end determining the reference frame of the second image frame includes: first, the decoding end uses the first type of filter to interpolate, based on the luminance coefficient matrix and the chrominance coefficient matrix, the sub-pixels that lie in the same row or column as integer pixels in the reconstructed image, obtaining the corresponding luminance and chrominance values. Second, the decoding end uses the first type of filter to interpolate the remaining sub-pixels in the reconstructed image according to the luminance values of the sub-pixels in the same row or column as the integer pixels, the luminance coefficient matrix, and the chrominance coefficient matrix, obtaining the luminance and chrominance values of the remaining sub-pixels. Third, the decoding end obtains the reference frame according to the luminance and chrominance values of the sub-pixels.
Because the first type of filter in the decoding end interpolates the reconstructed image using the first type of filter coefficients, the interpolated pixel positions are fixed, so the inter predictor in the decoding end can accelerate the interpolation process based on the determined first type of filter coefficients. Moreover, because the first type of filter coefficients match the filter coefficients adopted by the image rendering engine, the time-space domain correlation between the second image frame and the reference frame obtained by the decoding end based on those coefficients is improved, the accuracy with which the inter predictor determines the prediction block of the second image frame is enhanced, the amount of data the decoding end processes is reduced for the same processing capability, and video decoding efficiency is improved.
In a third aspect, the present application provides a video encoding apparatus, applied to an encoding end and suitable for a video codec system including the encoding end, the video encoding apparatus comprising modules for performing the video encoding method of the first aspect or any of its optional implementations. Illustratively, the video encoding apparatus includes: a first acquisition module, a first interpolation module, and an encoding module. The first acquisition module is used to acquire the source data and the rendering information corresponding to the source data. The first interpolation module is used to determine, from the set multiple types of filters, a first type of filter matched with the rendering information, and to determine the reference frame of the second image. The encoding module is used to encode the second image based on the motion information of the second image determined from the reference frame.
The rendering information indicates the filter parameters used in generating the code stream from the source data, and the source data includes a plurality of source images. The reference frame is obtained by the first type of filter from a reconstructed image obtained by encoding and then decoding a first image among the plurality of images, where the first image is an image, among the plurality of images, whose similarity to the second image reaches a threshold.
In a fourth aspect, the present application provides a video decoding apparatus, applied to a decoding end and suitable for a video codec system including the decoding end, the video decoding apparatus comprising modules for performing the video decoding method of the second aspect or any of its optional implementations. Illustratively, the video decoding apparatus includes: a second acquisition module, a second interpolation module, and a decoding module. The second acquisition module is used to acquire the code stream and the rendering information corresponding to the code stream. The second interpolation module is used to determine, from the set multiple types of filters, a first type of filter matched with the rendering information, and to determine the reference frame of the second image frame. The decoding module is used to decode the second image frame according to the reference frame and the motion information between the first image frame and the second image frame.
The code stream includes a plurality of image frames, and the rendering information indicates the filter parameters used in generating the code stream from the source data. The reference frame is obtained by the first type of filter from the first decoded image corresponding to the first image frame, where the first image frame is an image frame, among the plurality of image frames, whose similarity to the second image frame reaches a threshold.
In a fifth aspect, the present application provides a chip comprising: a processor and a power supply circuit; the power supply circuit is configured to supply power to a processor configured to perform the method of any one of the possible implementations of the first aspect and the first aspect; and/or the processor is configured to perform the method according to any one of the possible implementations of the second aspect and the second aspect.
In a sixth aspect, the present application provides a codec comprising a memory for storing computer instructions and a processor; the processor, when executing computer instructions, implements the method of any one of the possible implementations of the first aspect and the first aspect; and/or the processor, when executing computer instructions, implements the method of any one of the possible implementations of the second aspect and the second aspect.
In a seventh aspect, the present application provides a codec system, including an encoding end and a decoding end; the encoding end is used for encoding the plurality of images according to rendering information corresponding to the plurality of source images to obtain code streams corresponding to the plurality of images, so as to realize the method in any one of the possible implementation manners of the first aspect and the first aspect;
The decoding end is used for decoding the code stream according to rendering information corresponding to the plurality of source images to obtain a plurality of decoded images, and the method in any one of the possible implementation manners of the second aspect and the second aspect is realized.
In an eighth aspect, the present application provides a computer readable storage medium having stored therein a computer program or instructions which, when executed by a processing device, implement the method of any of the above-mentioned first aspect and optional implementation manners of the first aspect; and/or when the computer program or instructions is executed by a processing device, implement the method of any of the above second aspect and optional implementation manners of the second aspect.
In a ninth aspect, the present application provides a computer program product comprising a computer program or instructions which, when executed by a processing device, implement the method of any of the above first aspect and optional implementations of the first aspect; and/or the computer program or instructions, when executed by a processing device, implement the method of any of the above second aspect and optional implementations of the second aspect.
For the advantages of the third to ninth aspects above, refer to the description of any implementation of the first aspect or the second aspect; they are not repeated here. The implementations provided in the above aspects may be further combined to provide further implementations.
Drawings
Fig. 1 is an exemplary block diagram of a video codec system provided by the present application;
Fig. 2 is a schematic structural diagram of a video encoder according to the present application;
Fig. 3 is a schematic diagram of a video decoder according to an embodiment of the present application;
Fig. 4 is a schematic flow chart of a video encoding method according to the present application;
FIG. 5 is a schematic diagram of inter prediction provided by the present application;
FIG. 6 is a schematic diagram of a sub-pixel interpolation provided by the present application;
fig. 7 is a schematic flow chart of a video decoding method according to the present application;
fig. 8 is a schematic structural diagram of a video encoding device according to the present application;
fig. 9 is a schematic structural diagram of a video decoding device according to the present application;
fig. 10 is a schematic structural diagram of a computer device according to the present application.
Detailed Description
An embodiment of the present application provides a video encoding method, including: the encoding end performs inter prediction on the first image and the second image using a first type of filter determined from the rendering information, and determines a reference frame of the second image from a reconstructed image (or decoded image) of the first image. Further, the encoding end encodes the second image based on the motion information between the second image and the first image determined from the reference frame. In this embodiment, because the first type of filter is determined according to the rendering information, it matches the image rendering engine used for the second image, so the time-space domain correlation between the second image and the reference frame obtained by interpolating the reconstructed image is improved. This helps improve the accuracy of the motion information of the second image that the encoding end determines from the reference frame; for example, the similarity between the prediction block determined by the encoding end and the corresponding image block in the second image is improved. The encoding end encodes based on the prediction block and the motion information between the two images, improving the encoding effect of images in the video and reducing the amount of data the images occupy in the encoded code stream.
An embodiment of the present application further provides a video decoding method, including: the decoding end determines a first type of filter using the rendering information, performs inter prediction on the first image frame and the second image frame, and acquires a reference frame of the second image frame from the first decoded image of the first image frame. Further, the decoding end decodes the second image frame based on the reference frame and the motion information between the first image frame and the second image frame indicated in the code stream. In this embodiment, because the first type of filter is determined according to the rendering information, it matches the image rendering engine used for the source image corresponding to the second image frame, so the time-space domain correlation between that source image and the reference frame obtained by the first type of filter interpolating the first decoded image is improved. This helps improve the accuracy with which the decoding end determines the prediction block corresponding to the second image frame from the reference frame; for example, the similarity between the prediction block determined by the decoding end and the image block corresponding to the second image frame is improved. When the decoding end decodes the second image frame based on the prediction block, the amount of data processed is reduced and video decoding efficiency is improved.
The embodiments of the present application are described below with reference to the accompanying drawings. First, terms related to the present application are briefly introduced.
Video encoding: the process of compressing the multiple frames of images included in a video into a code stream.
Video decoding: the process of recovering the code stream into multiple frames of reconstructed images according to specific syntax rules and processing methods.
In order to implement the encoding and decoding of video, the present application provides a video codec system. As shown in fig. 1, fig. 1 is an exemplary block diagram of the video codec system provided by the present application; the video codec system includes an encoding end 100 and a decoding end 200. The encoding end 100 generates encoded video data (also referred to as a code stream), so the encoding end 100 may be referred to as a video encoding device. The decoding end 200 may decode the code stream (e.g., video comprising one or more image frames) generated by the encoding end 100, so the decoding end 200 may be referred to as a video decoding device. Various implementations of the encoding end 100, the decoding end 200, or both may include one or more processors and memory coupled to the one or more processors. The memory may include, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, or any other medium that can be used to store the desired program code in the form of instructions or data structures accessible by a computer.
In the example of fig. 1, the encoding end 100 includes a source video 110, a video encoder 120, and an output interface 130. In some examples, the output interface 130 may include a modulator/demodulator (modem) and/or a transmitter. The source video 110 may include a video capture device (e.g., a video camera), a video archive containing previously captured video data, a video feed interface to receive video data from a video content provider, and/or a computer graphics system for generating video data, such as an image rendering engine, or a combination of the above video sources.
Video encoder 120 may encode video data from source video 110. In some examples, encoding end 100 transmits the code stream directly to decoding end 200 by output interface 130 via link 300. In other examples, the code stream may also be stored on the storage device 400 for later access by the decoding end 200 for decoding and/or playback.
In the example of fig. 1, the decoding end 200 includes an input interface 230, a video decoder 220, and a display device 210. In some examples, the input interface 230 includes a receiver and/or a modem. The input interface 230 may receive encoded video data via the link 300 and/or from the storage device 400. The display device 210 may be integrated with the decoding end 200 or may be external to the decoding end 200. In general, the display device 210 displays the decoded video data. The display device 210 may comprise a variety of display devices, such as a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display, or other types of display devices.
Based on the video codec system shown in fig. 1, the present application provides a possible video encoder. As shown in fig. 2, fig. 2 is a schematic structural diagram of the video encoder provided by the present application.
The video encoder 120 includes an inter predictor 121, an intra predictor 122, a transformer 123, a quantizer 124, an inverse quantizer 126, an inverse transformer 127, a filter unit 128, and a memory 129. The inverse quantizer 126 and the inverse transformer 127 are used for image block reconstruction. The filter unit 128 represents one or more loop filters, such as a deblocking filter, an adaptive loop filter, and a sample adaptive offset filter.
The memory 129 may store video data encoded by components of the video encoder 120. The video data stored in the memory 129 may be obtained from the source video 110. The memory 129 may be a reference image memory that stores reference video data used by the video encoder 120 to encode video data in intra or inter coding modes. The memory 129 may be a dynamic random access memory (DRAM), a magnetoresistive RAM (MRAM), a resistive RAM (RRAM), or another type of memory device.
The workflow of the video encoder in the video encoding flow is described below with reference to fig. 2.
After the inter predictor 121 and the intra predictor 122 generate a prediction block for the current image block (a block of the second image) from the source data, the video encoder 120 subtracts the prediction block from the current image block to be encoded to form a residual image block. The residual video data in the residual block may be included in one or more transform units (TUs) and applied to the transformer 123. The transformer 123 transforms the residual video data into residual transform coefficients using a transform such as the discrete cosine transform or a conceptually similar transform, converting the residual video data from the pixel value domain to a transform domain, such as the frequency domain.
The transformer 123 may send the resulting transform coefficients to the quantizer 124. The quantizer 124 quantizes the transform coefficients to further reduce the bit rate. In some examples, the quantizer 124 may then scan the matrix containing the quantized residual transform coefficients, or the entropy encoder 125 may perform the scan.
After quantization, the quantized transform coefficients are entropy encoded by the entropy encoder 125. For example, the entropy encoder 125 may perform context-adaptive variable-length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding method or technique. After entropy encoding by the entropy encoder 125, the encoded code stream may be transmitted to a video decoder, or archived for later transmission or retrieval by a video decoder. The entropy encoder 125 may also entropy encode the syntax elements of the current image block to be encoded.
The inverse quantizer 126 and the inverse transformer 127 apply inverse quantization and inverse transformation, respectively, to reconstruct the residual block in the pixel domain, e.g., for later use as part of a reference image. The video encoder 120 adds the reconstructed residual block to the prediction block generated by the inter predictor 121 or the intra predictor 122 to generate a reconstructed image or reconstructed image block. The filter unit 128 may be applied to the reconstructed image block to reduce distortion such as blocking artifacts. The reconstructed image or reconstructed image block is then stored in the memory 129 as a reference block (also referred to as a first decoded image) that the inter predictor 121 can use for inter prediction of blocks in subsequent video frames or images.
It should be appreciated that other structural variations of the video encoder 120 may be used to encode the video stream. For example, for some image blocks or image frames, the video encoder 120 may quantize the residual signal directly without processing by the transformer 123, and correspondingly without processing by the inverse transformer 127; or, for some image blocks or image frames, the video encoder 120 does not generate residual data, and accordingly no processing by the transformer 123, the quantizer 124, the inverse quantizer 126, and the inverse transformer 127 is needed; or, the video encoder 120 may store the reconstructed image block directly as a reference block without processing by the filter unit 128; or, the quantizer 124 and the inverse quantizer 126 in the video encoder 120 may be combined.
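As a toy numpy illustration of the transformer 123 / quantizer 124 path and the inverse path used for reconstruction (the 4×4 block, the flat prediction, and the quantization step size are invented for this example; an orthonormal DCT-II matrix is built by hand):

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis: C @ x applies a 1-D DCT; C.T inverts it.
    k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    c = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    c[0, :] /= np.sqrt(2.0)
    return c

C = dct_matrix(4)
current = np.array([[52, 55, 61, 66],
                    [70, 61, 64, 73],
                    [63, 59, 55, 90],
                    [67, 61, 68, 104]], dtype=float)
pred = np.full((4, 4), 64.0)               # prediction block from a predictor
residual = current - pred                  # residual image block
coeff = C @ residual @ C.T                 # transformer 123
q = 8.0                                    # arbitrary quantization step
quantized = np.round(coeff / q)            # quantizer 124 (then entropy encoder 125)
recon_res = C.T @ (quantized * q) @ C      # inverse quantizer 126 + inverse transformer 127
recon_block = pred + recon_res             # reconstructed block stored as a reference
print(np.abs(recon_block - current).max()) # small quantization error remains
```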
As shown in fig. 3, fig. 3 is a schematic structural diagram of the video decoder according to an embodiment of the present application. The video decoder 220 includes an entropy decoder 221, an inverse quantizer 222, an inverse transformer 223, a filter unit 224, a memory 225, an inter predictor 226, and an intra predictor 227. The video decoder 220 may perform a decoding process that is substantially the inverse of the encoding process described for the video encoder 120 in fig. 2. First, a residual block or residual values are obtained using the entropy decoder 221, the inverse quantizer 222, and the inverse transformer 223, and the decoded code stream determines whether intra prediction or inter prediction is used for the current image block. For intra prediction, the intra predictor 227 constructs prediction information according to the intra prediction method used, using the pixel values of pixels in the surrounding reconstructed region. For inter prediction, the inter predictor 226 parses out the motion information, determines the reference block in the reconstructed image using the parsed motion information, takes the pixel values of the pixels in that block as the prediction information, adds the residual information to the prediction information, and obtains the reconstructed image through a filtering operation.
The source data in the above examples may be obtained by the encoding end or by an image rendering engine (e.g., V-Ray, Unreal, Unity) in communication with the encoding end. The image rendering engine may interpolate the acquired images using one or a combination of the following interpolation filters to obtain the source data: a bilinear interpolation filter, a bicubic interpolation filter, a nearest-neighbour interpolation filter, etc.
Alternatively, a large amount of rendering information is generated when the image rendering engine interpolates the image, which may be used to indicate filter parameters employed by the image rendering engine, e.g., the filter parameters may include one or more of filter type, filter coefficients, and tap number.
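As a purely hypothetical illustration of the shape such rendering information could take when handed to the encoding end (the field names and values below are invented for this sketch, not taken from the application):

```python
# Hypothetical rendering-information record exported by the rendering engine;
# the field names and values are illustrative, not from the application.
rendering_info = {
    "filter_params": {
        "filter_type": "bilinear",          # bilinear / bicubic / nearest-neighbour
        "filter_coefficients": [32, 32],    # optional: 2-tap weights at a scale of 64
        "num_taps": 2,
    },
    "sampling_format": "YUV420",
    "sampling_precision": {"luma": "1/4", "chroma": "1/8"},
}
print(rendering_info["filter_params"]["filter_type"])
```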
In order to reduce the load on the terminal side (such as the display end or the decoding end), the image rendering engine may be deployed on the cloud side (such as a cloud server or the encoding end on the cloud side): the encoding end 100 performs image rendering on the plurality of images to obtain the source data, and encodes the source data to obtain a code stream. The encoding end 100 then transmits the code stream to the decoding end 200, and the decoding end 200 decodes and plays the code stream.
Illustratively, in a cloud gaming scenario, the encoding end 100 processes a plurality of images through the image rendering engine to obtain game frames and encodes the game frames to obtain a code stream, reducing the amount of game-frame data transmitted; the decoding end 200 decodes and plays the code stream, and the user operates the game on the decoding end 200 (such as a mobile device).
Since the encoding end 100 and the decoding end 200 are located on the cloud side and the terminal side, respectively, the encoding end 100 compresses the source data into a code stream and then sends the code stream to the decoding end 200 to improve data transmission efficiency. To this end, the present application provides a video encoding method. As shown in fig. 4, fig. 4 is a schematic flow chart of the video encoding method provided by the present application; the method may be applied to the video codec system shown in fig. 1 or the video encoder shown in fig. 2, and may be executed by the encoding end 100 or the video encoder 120. Here, the video encoding method provided by this embodiment is described taking the encoding end 100 as an example. As shown in fig. 4, the video encoding method provided by this embodiment includes the following steps S410 to S450.
S410, the encoding end 100 acquires the source data and rendering information corresponding to the source data.
The rendering information indicates the filter parameters used in generating the code stream from the source data, and the source data includes a plurality of source images.
In one possible scenario, the filter parameters may include one or more of a filter type, a filter coefficient, a number of taps, and the like.
The present application gives the following examples of the content indicated by the filter type and the filter coefficients in the filter parameters.
Example 1: the filter type is one of bilinear, bicubic, nearest-neighbour, etc.
Example 2: the filter coefficients may be used to indicate the interpolation coefficient matrix. The matrix given in the original filing is one possible example of an interpolation coefficient matrix.
That matrix is only an example provided by the present application and should not be construed as limiting the present application; the numerical values in it may also be changed to other numerical values.
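The matrix itself is not reproduced in this text. Purely as a hypothetical illustration of the form an interpolation coefficient matrix can take, the following numpy sketch uses 4-tap quarter-pel weights scaled to 64; the values are invented and are not the values from the application:

```python
import numpy as np

# One row per sub-pixel phase (0, 1/4, 1/2, 3/4); columns are the weights of
# four neighbouring integer pixels. Values are illustrative only.
coeff = np.array([[ 0, 64,  0,  0],
                  [-4, 54, 16, -2],
                  [-4, 36, 36, -4],
                  [-2, 16, 54, -4]])
assert (coeff.sum(axis=1) == 64).all()   # each phase's weights sum to the scale
pixels = np.array([30, 60, 90, 120])     # four integer pixels
print(coeff @ pixels // 64)              # sub-pixel samples between 60 and 90
```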
S420, the encoding end 100 determines a first type filter matched with the rendering information from the set multiple types of filters.
The encoding end 100 determines a first type of filter from among the set multiple types of filters based on the filter type indicated in the filter parameters. The encoding end 100 uses this first type of filter as an interpolation filter used by the inter predictor in fig. 2.
Illustratively, the encoding side 100 may select a filter that is identical to the filter type indicated in the above-described filter parameters as the first type of filter.
For example, if the filter type indicated in the filter parameters is the bilinear filter, the encoding end 100 selects the bilinear filter from the set multiple types of filters as the first type of filter.
Because the type of the first type of filter determined by the encoding end 100 matches the rendering information, the first type of filter adopted by the encoding end 100 also matches the filter in the image rendering engine used for the source data; this avoids the low time-space domain correlation caused by an interpolation filter based only on the discrete cosine transform, reduces the data volume the images occupy in the code stream, and improves the encoding effect of the video.
In one possible scenario, if the above-mentioned filter parameters include filter coefficients, the filter coefficients are taken as coefficients of the first type of filter.
If the filter parameters do not include the filter coefficients, the set filter coefficients are used as the coefficients of the first type of filter.
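A minimal Python sketch of this adaptive selection (the dictionary layout matches the hypothetical rendering-information record above, and the default coefficients are invented placeholders):

```python
# Hypothetical defaults per filter type (2-tap weights at a scale of 64).
DEFAULT_COEFFICIENTS = {"bilinear": [32, 32], "bicubic": [-4, 36, 36, -4]}

def select_first_type_filter(rendering_info):
    params = rendering_info["filter_params"]
    filter_type = params["filter_type"]                  # e.g. "bilinear"
    coefficients = params.get("filter_coefficients")     # may be absent
    if coefficients is None:
        coefficients = DEFAULT_COEFFICIENTS[filter_type]  # set filter coefficients
    return filter_type, coefficients

print(select_first_type_filter({"filter_params": {"filter_type": "bilinear"}}))
# -> ('bilinear', [32, 32])
```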
Optionally, the rendering information includes filter parameters that carry data such as the data sampling format, the sampling precision, the number of taps, and the interpolation mode. The encoding end can then determine the filter coefficients from these data; for example, the set filter coefficients may be determined based on the data sampling format, sampling precision, number of taps, and interpolation mode. The sampling format may be YUV420, where Y denotes the luminance component and UV denote the chrominance components; this format means every 4 luminance components share one set of chrominance components. The sampling precision is divided into luminance-component sampling precision and chrominance-component sampling precision, with corresponding values 1/M and 1/N; in H.265/HEVC, M is set to 4 and N is set to 8. The number of taps may be 2, meaning that sub-pixel interpolation is performed with the luminance and chrominance values of two integer pixels. The weights the first type of filter uses for interpolation are assigned according to the distance between the interpolation point and the sampling points, where the interpolation point is a sub-pixel and the sampling points are integer pixels. The encoding end 100 can obtain the corresponding luminance interpolation coefficient matrix and chrominance interpolation coefficient matrix according to the set filter coefficients. The luminance interpolation coefficient matrix may be as in matrix 1 below, where the values in the matrix indicate the configured weights.
The above values in the data sampling format, sampling precision, and number of taps are only examples, and should not be construed as limiting the application, and other values may be used in other embodiments of the application.
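Under the example settings above (2 taps, weights assigned by the distance between the interpolation point and the two sampling points), the luminance (1/M = 1/4) and chrominance (1/N = 1/8) bilinear coefficient matrices can be derived as in this numpy sketch; the integer scale of 64 is an assumption made here for illustration:

```python
import numpy as np

def bilinear_coeff_matrix(precision, scale=64):
    # One row per sub-pixel phase k/precision; the 2-tap weights are
    # proportional to 1 minus the distance from each surrounding integer pixel.
    phases = np.arange(precision) / precision
    return np.stack([(1 - phases) * scale, phases * scale], axis=1).astype(int)

luma_matrix = bilinear_coeff_matrix(4)     # M = 4: luminance precision 1/4
chroma_matrix = bilinear_coeff_matrix(8)   # N = 8: chrominance precision 1/8
print(luma_matrix)                         # [[64 0] [48 16] [32 32] [16 48]]
print(chroma_matrix)
```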
The encoding end 100 takes the filter coefficients as the coefficients of the first type of filter when the filter coefficients are available, and adopts the set filter coefficients as the coefficients of the first type of filter when they are not, so the encoding end 100 can adaptively determine the coefficients of the first type of filter according to the content indicated by the filter parameters. Secondly, because the first type of filter coefficients are consistent with or close to the filter coefficients adopted by the image rendering engine, processing with them during encoding improves the time-space domain correlation between the source image in the source data and the reference frame obtained using the first type of filter coefficients, further improving the encoding effect and reducing the amount of data corresponding to the images in the encoded code stream.
With continued reference to fig. 4, the video encoding method provided in the present embodiment further includes step S430.
S430, the encoding end 100 determines a reference frame of the second image from the reconstructed image by using the first type filter.
The reference frame is obtained by the first type of filter from a reconstructed image obtained by encoding and then decoding a first image among the plurality of images. The first image is encoded before the second image.
The reconstructed image corresponding to the first image is determined by the encoding end 100 from the images stored in the memory 129 in fig. 2, where the first image is an image, among the plurality of images, whose similarity to the second image reaches a threshold; accordingly, the similarity between the reconstructed image and the second image reaches the threshold.
For the description of the reconstructed image, reference is made to the content of fig. 2, and details are not repeated here.
The encoding end 100 uses this first type of filter as the interpolation filter in the inter predictor 121 in fig. 2 to assist the inter predictor 121 in performing inter prediction, achieving motion estimation and motion compensation. For example, based on the first image block in the second image, the encoding end 100 determines from the reconstructed image an image block matching the first image block under a first condition (for example, a similarity of 80% or more). To ensure matching accuracy, the encoding end 100 interpolates the integer pixels in that image block to determine the pixel values of its sub-pixels. Here, an integer pixel is an original pixel in the second image or the reconstructed image. Sub-pixels are obtained by interpolating integer pixels, and their positions fall into two classes: sub-pixels in the same row or column as integer pixels, and sub-pixels not in the same row or column as any integer pixel.
In this embodiment, the above image block carrying the pixel values of the sub-pixels may be referred to as the reference block of the first image block. Based on the first image block, the encoding end 100 determines from the reference block a prediction block that matches the first image block under a second condition (e.g., a similarity of 90% or more), completing the inter prediction process for the first image block.
The encoding end 100 performs the above-mentioned processing on all the image blocks in the second image, and correspondingly obtains a plurality of reference blocks. The reconstructed image in which the reference block is located may be referred to as a reference frame.
The second image includes M × N image blocks, where M and N are integers greater than or equal to 1, and the first image block is any one of the image blocks included in the second image; similarly, the reconstructed image includes M × N image blocks. That is, the maximum size of an image block can equal the size of the second image or the reconstructed image, and the minimum size can be 1 pixel.
For the process of interpolating the reconstructed image by the encoding end 100 according to the first type of filter, a possible example is provided in fig. 6 below, which is not described here again.
With continued reference to fig. 4, the video encoding method provided in the present embodiment further includes step S440.
S440, the encoding end 100 determines motion information of the second image compared to the first image based on the reference frame.
The inter predictor in the encoding end 100 determines, based on the first image block in the second image, a prediction block matching the first image block from the reference frame. The encoding end 100 determines the motion information of the first image block in the second image according to the positions of the first image block and the prediction block. The motion information may indicate the offset of the first image block relative to the prediction block, such as a motion vector (MV).
In one possible example, the first type of filter interpolates the reconstructed image to obtain the reference frame. As shown in fig. 5, a schematic diagram of inter prediction provided by the present application, the above motion information can be obtained in the following manner.
First, based on the first image block in the second image, the encoding end 100 searches the reference frame for a second image block whose similarity to the first image block reaches a threshold (e.g., 90%), and takes the second image block as the prediction block.
For example, based on image block A as shown in fig. 5, the encoding end 100 may search the reference frame using a full search (FS) or a fast search method, and calculate the degree of matching between a searched second image block and the first image block using metrics such as the mean squared error (MSE) or the mean absolute deviation (MAD), so as to obtain a prediction block whose similarity satisfies the threshold, such as image block B in fig. 5. In fig. 5, 3 sub-pixels are interpolated for each integer pixel, i.e., one pixel in the second image corresponds to 4 pixels in the reference frame; in this example the encoding end 100 interpolates every pixel in the reconstructed image.
In one possible scenario, the encoding end 100 performs the above search and matching only on the reference-block portion of the reference frame and determines the prediction block whose similarity to the first image block reaches the threshold. It should be noted that, in this case, the encoding end 100 interpolates only the second image block in the reconstructed image, which reduces the amount of computation and improves interpolation efficiency.
Second, the encoding end 100 determines a first coordinate of the first image block in the second image and a second coordinate of the prediction block in the reference frame.
For example, the coordinates of the upper-left and lower-right pixels of the first image block in the second image are taken as the first coordinates of the first image block; the coordinates of image block A in fig. 5 are (1, 1) and (2, 2). Likewise, the coordinates of the upper-left and lower-right pixels of the prediction block in the reference frame are taken as the second coordinates of the prediction block; in fig. 5, the upper-left coordinate of image block B is a fractional (sub-pixel) position and its lower-right coordinate is (2, 2).
Thirdly, the coding end obtains the motion information of the second image according to the first coordinate and the second coordinate.
Following the above example, the encoding end 100 determines the offset of image block A, i.e., the motion information of image block A in the second image, from the coordinates (1, 1) and (2, 2) of image block A, the fractional coordinates of image block B, and the filter coefficients. The filter coefficients indicate the correspondence by which sub-pixels are interpolated from an integer pixel; in this example, 3 sub-pixels are interpolated from each integer pixel.
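A toy numpy sketch of the three steps (a full search with a sum-of-absolute-differences criterion over a half-pel reference frame; the frames, block size, and the use of np.kron as a stand-in for real sub-pixel interpolation are all assumptions of this sketch):

```python
import numpy as np

def full_search(reference_frame, block):
    """Return the top-left (row, col) in reference_frame minimizing the SAD."""
    h, w = block.shape
    best_sad, best_pos = np.inf, (0, 0)
    for r in range(reference_frame.shape[0] - h + 1):
        for c in range(reference_frame.shape[1] - w + 1):
            sad = np.abs(reference_frame[r:r + h, c:c + w] - block).sum()
            if sad < best_sad:
                best_sad, best_pos = sad, (r, c)
    return best_pos

recon = np.arange(16.0).reshape(4, 4)            # reconstructed first image
reference = np.kron(recon, np.ones((2, 2)))      # half-pel frame: 1 pel -> 4 samples
first_block = recon[1:3, 1:3]                    # first image block, first coordinate (1, 1)
second_coord = full_search(reference, np.kron(first_block, np.ones((2, 2))))
first_coord_halfpel = (2, 2)                     # (1, 1) expressed in half-pel units
mv = (second_coord[0] - first_coord_halfpel[0],
      second_coord[1] - first_coord_halfpel[1])  # motion information, half-pel units
print(second_coord, mv)
```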
Because the first type of filter is determined according to the rendering information, it matches the image rendering engine used for the second image. The encoding end 100 interpolates with the first type of filter during inter prediction to obtain the reference frame, which improves the time-space domain correlation between the reference frame and the second image, improves the similarity between the prediction block determined from the reference frame and the corresponding image block in the second image, and improves the accuracy of the motion information of the second image determined from the reference frame. The encoding end then encodes based on the prediction block and the motion information between the two images, reducing the amount of data corresponding to the second image.
With continued reference to fig. 4, the video encoding method provided in the present embodiment further includes step S450.
S450, the encoding end 100 encodes the second image according to the motion information of the second image.
The encoding end 100 obtains a corresponding residual value (also called a residual block) based on each first image block in the second image, its corresponding prediction block, and the motion information of the first image block determined above. The encoding end 100 encodes the residual value and the motion vector, thereby encoding the second image and obtaining the corresponding code stream.
Illustratively, the encoding end 100 obtains the motion information and the prediction block of the second image through inter prediction, and determines a corresponding residual block according to the second image and the corresponding prediction block, where the residual block and the motion information are processed by the transformer 123 and the quantizer 124 in fig. 2 to obtain quantized residual transform coefficients, and then the quantized residual transform coefficients are processed by the entropy encoder 125 to obtain a corresponding code stream.
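The following sketch illustrates the residual-coding idea of S450: the block is coded as prediction error rather than raw samples. The orthonormal DCT and flat quantizer merely stand in for the transformer 123 and quantizer 124 of fig. 2; the actual transform and quantization of the codec are not specified here, and square blocks are assumed.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (stand-in for transformer 123)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2 / n)
    m[0, :] = np.sqrt(1 / n)
    return m

def encode_block(block, pred, qstep=16):
    """Residual coding sketch: the encoder codes the prediction error, not
    the block itself. The quantized coefficients (plus the motion vector)
    would then be entropy coded; qstep is an assumed flat quantizer step."""
    residual = block.astype(np.float64) - pred.astype(np.float64)
    d = dct_matrix(residual.shape[0])
    coeffs = d @ residual @ d.T      # separable 2-D DCT of the residual
    return np.round(coeffs / qstep)  # flat quantization (placeholder for quantizer 124)
```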
Because the first type of filter is determined according to the rendering information, it matches the image rendering engine used for the second image. The reference frame obtained by interpolating the reconstructed image with the first type of filter therefore has higher time-space domain correlation with the second image, which improves the accuracy of the motion information of the second image determined by the encoding end from the reference frame; for example, the similarity between the prediction block determined by the encoding end and the corresponding image block in the second image is improved. When the encoding end encodes based on the prediction block and the motion information between the two images, the encoding effect for the images in the video is improved and the amount of data corresponding to the image in the resulting code stream is reduced.
For the process by which the encoding end 100 interpolates a reconstructed image according to the first type of filter, fig. 6 is a schematic diagram of sub-pixel interpolation provided in the present application, and two possible examples are given below.
In a first possible example, the encoding end interpolates the reconstructed image according to the first type of filter determined in S420, such as the bilinear interpolation filter, and the coefficients of the first type of filter, such as the luminance bilinear interpolation coefficient matrix and the chrominance bilinear interpolation coefficient matrix, to obtain the corresponding reference frame.
For example, a luminance bilinear interpolation coefficient matrix (matrix 1) and a chrominance bilinear interpolation coefficient matrix (matrix 2) are given below; both are 2-tap filters, that is, the luminance value or chrominance value of a sub-pixel is determined from those of two whole pixels. The luminance bilinear interpolation coefficient matrix is laid out with 8 columns, i.e. in an 8-tap layout, to correspond to the filter coefficients employed in the image rendering engine. (The coefficient tables of matrix 1 and matrix 2 appear as figures in the original publication and are not reproduced in this text.)
The above matrices are only examples provided by the present application and should not be construed as limiting it; the values in the matrices may also be changed to other values.
The encoding end 100 calculates the luminance values of the individual sub-pixels shown in fig. 6 based on matrix 1 above using the first type of filter; the present application gives the following example.
First, the encoding end 100 uses the first type of filter to interpolate the sub-pixels in the same row or column as the whole pixels in the reconstructed image based on matrix 1, obtaining the corresponding luminance values.
Illustratively, the first type of filter first interpolates the sub-pixels a0,0, b0,0, c0,0, d0,0, h0,0, n0,0 that lie in the same row or column as whole pixels, calculating the luminance value of each such sub-pixel from the whole pixels in that row or column. For example, a0,0 is located at the 1/4 position between A0,0 and A1,0, so the first type of filter uses the values of the second row of the luminance bilinear interpolation coefficient matrix to calculate the luminance value of a0,0: the luminance value of a0,0 equals 48×A0,0 + 16×A1,0; b0,0 and c0,0 are obtained similarly. Since h0,0 is at the 1/2 position between A0,0 and A0,1, its luminance value is calculated with the weights of the third row of the matrix: the luminance value of h0,0 equals 32×A0,0 + 32×A0,1; d0,0 and n0,0 are obtained in the same way.
Second, the encoding end 100 uses the first type of filter to interpolate the remaining sub-pixels in the reconstructed image from the luminance values of the sub-pixels in the same row or column as the whole pixels and matrix 1, obtaining the luminance values of the remaining sub-pixels.
For example, the remaining sub-pixels e0,0, f0,0, g0,0, i0,0, j0,0, k0,0, p0,0, q0,0, r0,0, which do not lie in a whole-pixel row or column, are interpolated vertically, and the interpolation samples are the luminance values of the sub-pixels in the whole-pixel rows obtained above. For instance, p0,0 is at the 3/4 position between a0,0 and a0,1, so the luminance value of p0,0 is 16×a0,0 + 48×a0,1.
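The two luminance steps above can be summarized in code. The 2-tap weights below are reconstructed from the worked example in the text (48/16 at the 1/4 position, 32/32 at 1/2, 16/48 at 3/4); the division by 64 and the rounding offset are assumptions, since the text gives only the weighted sums, and the 8-column layout of matrix 1 is not reproduced.

```python
BILINEAR_LUMA = {1: (48, 16), 2: (32, 32), 3: (16, 48)}  # phase k -> weights at k/4

def interp_quarter(p0, p1, phase):
    """Luminance of the sub-pixel at the phase/4 position between the two
    samples p0 and p1 (whole pixels for the first step, row sub-pixels
    such as a0,0 and a0,1 for the second, vertical step). The weights sum
    to 64, so a right shift by 6 with +32 rounding is assumed here."""
    w0, w1 = BILINEAR_LUMA[phase]
    return (w0 * int(p0) + w1 * int(p1) + 32) >> 6

# e.g. a0,0 at the 1/4 position between A0,0 and A1,0:
#   a00 = interp_quarter(A[0][0], A[1][0], 1)
# and p0,0 at the 3/4 position between a0,0 and a0,1:
#   p00 = interp_quarter(a00, a01, 3)
```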
The encoding end 100 calculates the chrominance value of each sub-pixel shown in fig. 6 based on matrix 2 using the first type of filter; the steps are identical to those for calculating the luminance values of the sub-pixels. That is: first, the encoding end 100 uses the first type of filter to interpolate the sub-pixels in the same row or column as the whole pixels based on matrix 2, obtaining the corresponding chrominance values; second, the encoding end 100 uses the first type of filter to interpolate the remaining sub-pixels based on those chrominance values and matrix 2, obtaining the chrominance values of the remaining sub-pixels.
For an example of interpolation of the chrominance components of the sub-pixels by the encoding end 100 using the first type filter, reference may be made to the content of interpolation of the luminance components of the sub-pixels by the first type filter, which is not described herein.
Finally, the encoding end 100 obtains a corresponding reference frame based on the chrominance value and the luminance value of the sub-pixel.
In a second possible example, the encoding end interpolates the reconstructed image according to the determined first type filter, such as the bicubic interpolation filter, and the coefficients of the corresponding first type filter, such as the luminance bicubic interpolation coefficient matrix and the chrominance bicubic interpolation coefficient matrix, to obtain the corresponding reference frame.
For example, the following gives a luminance bicubic interpolation coefficient matrix (matrix 3) and a chrominance bicubic interpolation coefficient matrix (matrix 4), and the encoding end 100 interpolates the reconstructed image by using the two matrices to obtain the luminance value and the chrominance value of the sub-pixel under the whole pixel in the reconstructed image.
(The coefficient tables of matrix 3 and matrix 4 appear as figures in the original publication and are not reproduced in this text.)
For the process by which the encoding end 100 interpolates the reconstructed image based on matrix 3 and matrix 4 using the first type of filter, that is, the calculation of the luminance value and chrominance value of each sub-pixel under the whole pixels of the reconstructed image, reference may be made to the description of the encoding end 100 operating on matrix 1 and matrix 2, which is not repeated here.
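Since matrix 3 and matrix 4 are not reproduced here, the following sketch uses a common Catmull-Rom bicubic kernel purely as a stand-in to show the 4-tap structure of bicubic interpolation; the actual coefficient values of the filing may differ.

```python
def catmull_rom_weights(t):
    """4-tap cubic weights at fractional phase t in [0, 1). Catmull-Rom
    (a = -0.5) is an assumed stand-in kernel, not the filing's matrix."""
    w0 = -0.5 * t**3 + t**2 - 0.5 * t
    w1 = 1.5 * t**3 - 2.5 * t**2 + 1.0
    w2 = -1.5 * t**3 + 2.0 * t**2 + 0.5 * t
    w3 = 0.5 * t**3 - 0.5 * t**2
    return (w0, w1, w2, w3)

def bicubic_sample_1d(p, t):
    """Interpolate between p[1] and p[2] from four consecutive whole
    pixels p[0..3] at fractional offset t; applied once per direction
    for separable 2-D (bicubic) interpolation."""
    return sum(w * s for w, s in zip(catmull_rom_weights(t), p))
```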
In one possible scenario, the encoding end 100 interpolates only the second image block in the reconstructed image; for the determination of the second image block, reference may be made to the content of S430 above, which is not repeated here.
Since the first type of filter in the encoding end 100 interpolates the reconstructed image using the first type filter coefficients, the interpolated pixel positions are fixed, so the inter-frame predictor in the encoding end 100 can accelerate the interpolation process based on the determined first type filter coefficients. Moreover, because the first type filter coefficients match the filter coefficients adopted by the image rendering engine, the time-space domain correlation between the reference frame obtained with these coefficients and the second image is improved, the accuracy with which the inter-frame predictor determines the prediction block and the corresponding motion information of the second image is enhanced, the encoding effect is improved, and the amount of data corresponding to the second image in the encoded code stream is reduced.
After the encoding end 100 encodes the source data, or one or more source images in the source data, according to the video encoding method, a corresponding code stream is obtained. The encoding end 100 transmits the code stream to the decoding end 200, and the decoding end 200 decodes and plays the code stream. Fig. 7 is a schematic flow chart of a video decoding method according to the present application. The video decoding method may be applied to the video codec system shown in fig. 1 or the video decoder shown in fig. 3, and may be performed, for example, by the decoding end 200 or the video decoder 220; it is described here with the decoding end 200 as the executing body. As shown in fig. 7, the video decoding method provided in this embodiment includes the following steps S710 to S740.
S710, the decoding end 200 obtains the code stream and rendering information corresponding to the code stream.
Wherein the rendering information is used to indicate: filter parameters used in generating a code stream from source data, and the code stream includes a plurality of image frames.
The code stream further includes, for example, an index relationship and motion information corresponding to each image frame, where the index relationship indicates a reconstructed image that is referred to when the source image is encoded to obtain the image frame, that is, a correspondence relationship between the image frame and the reconstructed image. The rendering information may be transmitted by the encoding end 100.
The filter parameters may include one or more of filter type, filter coefficients, number of taps, etc.
The following examples are given for the content of the filter type and the filter coefficient indication in the filter parameters.
Example 1, the filter type includes one of bilinear, bicubic, nearest neighbour, etc.
Example 2, the filter coefficients may indicate an interpolation coefficient matrix. The matrix may be any one of the matrices described above.
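A hypothetical container for the filter parameters described above might look as follows; the field names and types are illustrative only, not a structure defined by the filing.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class FilterParams:
    """Filter parameters carried by the rendering information (sketch)."""
    filter_type: str                                        # "bilinear" | "bicubic" | "nearest"
    coefficients: Optional[Sequence[Sequence[int]]] = None  # interpolation coefficient matrix
    taps: Optional[int] = None                              # number of taps, e.g. 2 or 4
```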
S720, the decoding end 200 determines a first type filter matched with the rendering information from the set multiple types of filters.
The decoding end 200 determines a first type of filter from among the set multiple types of filters based on the filter type indicated in the filter parameters.
Illustratively, the decoding side 200 may select a filter that is identical to the filter type indicated in the above-described filter parameters as the first type of filter.
For example, if the filter type indicated in the filter parameters is bilinear, the decoding end 200 will select the bilinear filter from the set multiple types of filters as the first type of filter.
Since the type of the first type of filter determined by the decoding end 200 matches the rendering information, the filter adopted by the decoding end 200 also matches the filter in the image rendering engine used for the source data. This avoids the low time-space domain correlation that results from relying only on an interpolation filter based on the discrete cosine transform, and enhances the accuracy with which the inter-frame predictor in the decoding end 200 determines the prediction block of the second image frame.
In one possible scenario, if the above-mentioned filter parameters include filter coefficients, the filter coefficients are taken as coefficients of the first type of filter.
If the filter parameters do not include the filter coefficients, the set filter coefficients are used as the coefficients of the first type of filter.
The decoding end 200 takes the filter coefficients as the coefficients of the first type of filter when the filter coefficients are available, and adopts the set filter coefficients as the coefficients of the first type of filter when they are not, so the decoding end 200 can adaptively determine the coefficients of the first type of filter according to the content indicated by the filter parameters. Moreover, since the first type filter coefficients are consistent with, or close to, the filter coefficients adopted by the image rendering engine, processing with the first type filter coefficients during decoding improves the time-space domain correlation between the source image corresponding to the second image frame in the code stream and the reference frame obtained with those coefficients, which reduces the amount of data processed by the decoding end and improves video decoding efficiency.
Examples of the above cases may refer to the description of matrix 1 in the video encoding method, and are not repeated here.
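The selection-with-fallback logic of S720 and the two coefficient cases above can be sketched as below, reusing the hypothetical FilterParams container from the sketch above; the filter_bank and default_coeffs structures are assumptions.

```python
def select_filter(params, filter_bank, default_coeffs):
    """Decoder-side selection of the first type of filter (S720): pick the
    filter matching the signalled type, use signalled coefficients when
    present, and fall back to the set (preset) coefficients otherwise.
    filter_bank maps a filter type name to an implementation."""
    filt = filter_bank[params.filter_type]  # e.g. "bilinear" -> bilinear filter
    if params.coefficients is not None:
        coeffs = params.coefficients        # coefficients carried in rendering info
    else:
        coeffs = default_coeffs[params.filter_type]  # preset coefficients
    return filt, coeffs
```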
S730, the decoding end 200 determines the reference frame of the second image frame from the first decoded image by using the first type filter.
The reference frame is obtained by a first type filter according to a first decoding image corresponding to a first image frame, and the first image frame is an image frame with similarity reaching a threshold value with a second image frame in a plurality of image frames.
Illustratively, the first decoded image is the decoded image corresponding to the image frame, among the plurality of image frames, whose source image has a similarity to the source image corresponding to the second image frame that reaches the threshold.
The decoding end 200 may directly determine the first decoded image corresponding to the second image frame according to the above index relationship. The decoding end 200 interpolates the first decoded image using a first type of filter to obtain a reference frame having sub-pixels.
In one possible scenario, the decoding end 200 interpolates only the corresponding image block in the first decoded image according to the motion information included in the code stream, resulting in a reference frame with sub-pixels.
The interpolation process performed by the decoding end 200 on the first decoded image can be divided into: luminance component interpolation of the sub-pixels and chrominance component interpolation of the sub-pixels. For the interpolation performed by the decoding end 200 on the first decoded image according to the first type of filter, two possible examples are given below.
In a first possible example, the decoding end 200 interpolates the first decoded image using matrix 1 and matrix 2 above: first, the decoding end 200 uses the first type of filter to interpolate the sub-pixels in the same row or column as the whole pixels in the first decoded image based on matrix 1, obtaining the corresponding luminance values.
Second, the decoding end 200 uses the first type of filter to interpolate the remaining sub-pixels in the first decoded image from the luminance values of the sub-pixels in the same row or column as the whole pixels, obtaining the luminance values of the remaining sub-pixels in the first decoded image.
Similarly, the process by which the decoding end 200 interpolates the chrominance components of the sub-pixels of the first decoded image is identical to the luminance process, with matrix 1 replaced by matrix 2, thereby obtaining the chrominance values of the sub-pixels in the first decoded image.
The decoding end 200 obtains a corresponding reference frame according to the chrominance values and the luminance values of the sub-pixels.
In a second possible example, the decoding end 200 interpolates the first decoded image by using the matrix 3 and the matrix 4, and the content of this example may refer to the content of the first possible example, which is not described herein.
For an example of the interpolation process performed by the decoding end 200 on the first decoded image, reference may be made to the content shown in fig. 6, which is not described herein.
In one possible scenario, the decoding end 200 interpolates only the image block in the first decoded image that is indicated by the motion information.
Since the first type of filter in the decoding end 200 interpolates the first decoded image using the first type filter coefficients, the interpolated pixel positions are fixed, so the inter-frame predictor in the decoding end 200 can accelerate the interpolation process based on the determined first type filter coefficients. In addition, because the first type filter coefficients match the filter coefficients adopted by the image rendering engine, the time-space domain correlation between the reference frame obtained with these coefficients and the second image frame is improved, and the accuracy with which the inter-frame predictor determines the prediction block of the second image frame is enhanced; for the same decoding-end processing capacity, the amount of data processed is therefore reduced and video decoding efficiency is improved.
S740, the decoding end 200 decodes the second image frame according to the motion information between the first image frame and the second image frame and the reference frame.
The decoding end 200 determines, according to the motion information between the first image frame and the second image frame, the second image block in the reference frame that corresponds to each first image block in the second image frame, and decodes the second image frame based on the second image block to obtain the second decoded image. The motion information indicates the offset of the first image block of the second image frame relative to the corresponding second image block in the reference frame.
For example, the motion information between the first image frame and the second image frame refers to the motion information between the source image corresponding to the first image frame and the source image corresponding to the second image frame, and this motion information is carried with the second image frame.
The first image block is any one of a plurality of image blocks corresponding to the second image frame. This second image block may be referred to as a prediction block.
Illustratively, after determining all the prediction blocks corresponding to the second image frame, the decoding end 200 decodes the second image frame according to the residual value corresponding to the first image block and the prediction block corresponding to the first image block in the code stream, to obtain a corresponding decoded image.
In one possible example, the decoding end decodes the second image frame according to the motion information between the first image frame and the second image frame and the reference frame, including:
first, the decoding end 200 determines a second position of the second image block in the reference frame according to the offset of the first image block in the second image frame indicated by the motion information.
For example, the decoding end 200 determines the second image block at the second position in the reference frame based on the offset indicated by the motion information and the first position of the first image block in the second image frame. The offset represents the displacement from the first position to the second position.
Second, the decoding end 200 takes the second image block as a prediction block of the second image frame, and decodes the second image frame based on the prediction block.
After the decoding end 200 determines all the prediction blocks corresponding to the second image frame, the decoding end 200 decodes based on all the prediction blocks and residual blocks corresponding to the respective prediction blocks obtained from the code stream.
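Putting the two steps together, a minimal reconstruction sketch follows: the prediction block is fetched from the interpolated reference frame at the offset position and added to the residual block. The quarter-pel addressing convention and the 8-bit clipping are assumptions.

```python
import numpy as np

def decode_block(ref_frame, first_pos, mv, residual, subpel=4):
    """Reconstruct one image block: fetch the prediction block from the
    interpolated reference frame at the position given by offsetting the
    first position with the motion vector, then add the residual block.
    Assumes ref_frame is already upsampled subpel-fold in each dimension
    and that positions and mv are in 1/subpel-pixel units."""
    h, w = residual.shape
    y = first_pos[0] * subpel + mv[0]
    x = first_pos[1] * subpel + mv[1]
    pred = ref_frame[y:y + h * subpel:subpel, x:x + w * subpel:subpel]
    return np.clip(pred.astype(np.int32) + residual, 0, 255).astype(np.uint8)
```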
As shown in fig. 3, the decoding end 200 may process the code stream through the entropy decoder 221, the inverse quantizer 222 and the inverse transformer 223 to obtain a residual block corresponding to the predicted block, and the decoding end 200 performs image reconstruction according to the residual block and the predicted block, and then processes the reconstructed image through the filter unit 224, thereby obtaining a decoded image.
Because the motion information indicated by the code stream allows the position, in the reference frame, of the prediction block corresponding to the second image frame to be determined accurately, the decoding end 200 can decode the second image frame more efficiently according to the motion information.
The decoding side 200 may display the decoded image.
Because the first type of filter is determined according to the rendering information, it matches the image rendering engine used for the source image corresponding to the second image frame. The reference frame obtained by interpolating the first decoded image with the first type of filter therefore has higher time-space domain correlation with the source image corresponding to the second image frame, which improves the accuracy with which the decoding end determines the prediction block corresponding to the second image frame from the reference frame; for example, the similarity between the prediction block determined by the decoding end and the corresponding image block of the second image frame is improved. When the decoding end decodes the second image frame based on the prediction block, for the same decoding-end processing capacity, the amount of data processed is reduced and video decoding efficiency is improved.
The video encoding method provided by the present application is described in detail above with reference to fig. 4 to 6. The video encoding device provided by the present application is described below with reference to fig. 8, which is a schematic structural diagram of the device. The video encoding device 800 may be used to implement the functions of the encoding end in the above method embodiments and can therefore also achieve the beneficial effects of those embodiments.
As shown in fig. 8, the video encoding apparatus 800 includes a first acquisition module 810, a first interpolation module 820, and an encoding module 830; the video encoding apparatus 800 is configured to implement the functions of the encoding end in the method embodiments corresponding to fig. 4 to 6. In one possible example, the specific process of the video encoding apparatus 800 for implementing the video encoding method described above includes the following processes:
The first obtaining module 810 is configured to obtain source data and rendering information corresponding to the source data. The rendering information is used to indicate: the filter parameters used in the process of generating a code stream from the source data, where the source data includes a plurality of images.
The first interpolation module 820 is configured to determine, from the set multiple types of filters, a first type of filter that matches the rendering information, and to determine a reference frame of the second image. The reference frame is obtained by the first type of filter from a reconstructed image, which is obtained by encoding and then decoding a first image of the plurality of images; the first image is an image, among the plurality of images, whose similarity to the second image reaches a threshold.
The encoding module 830 is configured to encode the second image based on the motion information of the second image determined by the reference frame.
It should be understood that the encoding end of the foregoing embodiment may correspond to the video encoding apparatus 800 and may correspond to the respective main bodies corresponding to fig. 4 to 6 for executing the methods according to the embodiments of the present application, and the operations and/or functions of the respective modules in the video encoding apparatus 800 are respectively for implementing the respective flows of the respective methods corresponding to the embodiments in fig. 4 to 6, which are not repeated herein for brevity.
The video decoding method provided by the present application is described in detail above with reference to fig. 3 and 7. The video decoding device provided by the present application is described below with reference to fig. 9, which is a schematic structural diagram of the device. The video decoding apparatus 900 may be used to implement the functions of the decoding end in the above method embodiment and can therefore also achieve the beneficial effects of that embodiment.
As shown in fig. 9, the video decoding apparatus 900 includes a second acquisition module 910, a second interpolation module 920, and a decoding module 930; the video decoding apparatus 900 is configured to implement the functions of the decoding end in the method embodiments corresponding to fig. 3 and fig. 7. In one possible example, the specific process of implementing the video decoding method by the video decoding apparatus 900 includes the following processes:
The second obtaining module 910 is configured to obtain the code stream and rendering information corresponding to the code stream. The code stream includes a plurality of image frames, and rendering information is used to indicate: filter parameters used in generating the code stream from the source data.
The second interpolation module 920 is configured to determine a first type of filter that matches the rendering information from the set multiple types of filters, and determine a reference frame of the second image frame. The reference frame is obtained by a first type filter according to a first decoding image corresponding to a first image frame, and the first image frame is an image frame with similarity reaching a threshold value with a second image frame in a plurality of image frames.
The decoding module 930 is configured to decode the second image frame according to the motion information between the first image frame and the second image frame and the reference frame.
It should be understood that the decoding end according to an embodiment of the present application may correspond to the video decoding apparatus 900 in the application embodiment, and may correspond to the respective main bodies corresponding to fig. 3 and 7 performing the method according to the embodiment of the present application, and the operations and/or functions of the respective modules in the video decoding apparatus 900 are respectively for implementing the respective flows of the respective methods corresponding to the embodiment in fig. 3 and 7, which are not repeated herein for brevity.
Fig. 10 is a schematic structural diagram of a computer device according to an embodiment of the present application. The computer device may be applied to the codec system shown in fig. 1, and may be either the encoding end 100 or the decoding end 200.
The computer device 1000 may be a mobile phone, a tablet computer, a television (also referred to as a smart television, a smart screen or a large screen device), a notebook computer, an ultra-mobile personal computer (Ultra-mobile Personal Computer, UMPC), a handheld computer, a netbook, a personal digital assistant (Personal Digital Assistant, PDA), a wearable electronic device (e.g., a smart watch, a smart bracelet, smart glasses), a vehicle-mounted device, a virtual reality device, a server, or another computing device with computing capability.
As shown in fig. 10, the processing device 1000 may include a processor 1010, a memory 1020, a communication interface 1030, a bus 1040, and the like, with the processor 1010, the memory 1020, the communication interface 1030 being connected by the bus 1040.
It will be appreciated that the configuration illustrated in the embodiments of the present application does not constitute a specific limitation on the processing apparatus. In other embodiments of the application, the processing device may include more or less components than illustrated, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 1010 may include one or more processing units, for example: the processor 1010 may include an application processor (application processor, AP), a modem processor, a central processing unit (Central Processing Unit, CPU), a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a memory, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural network processor (neural-network processing unit, NPU), etc. The different processing units may be separate devices or may be integrated in one or more processors.
An internal memory may also be provided in the processor 1010 for storing instructions and data. In some embodiments, the internal memory in the processor 1010 is a cache memory. The internal memory may hold instructions or data that is just used or recycled by the processor 1010. If the processor 1010 needs to reuse the instruction or data, it can be called directly from the internal memory. Repeated accesses are avoided and the latency of the processor 1010 is reduced, thereby improving the efficiency of the system.
The memory 1020 may be used to store computer-executable program code that includes instructions. The processor 1010 performs the various functional applications and data processing of the processing device 1000 by executing the instructions stored in the memory 1020. The memory 1020 may include a program storage area and a data storage area. The program storage area may store an operating system, application programs required for at least one function (such as an encoding function or a transmitting function), and the like. The data storage area may store data created during use of the processing device 1000 (e.g., a code stream, a reference frame, etc.). In addition, the memory 1020 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, universal flash storage (universal flash storage, UFS), and the like.
The communication interface 1030 is used to enable communication of the processing device 1000 with external devices or apparatus. In this embodiment, the communication interface 1030 is used for data interaction with other processing devices.
The bus 1040 may include a path for transferring information between the aforementioned components (e.g., the processor 1010, the memory 1020, and the communication interface 1030). The bus 1040 may include a power bus, a control bus, a status signal bus, and the like, in addition to a data bus; for clarity of illustration, the various buses are all labeled as bus 1040 in the figure. The bus 1040 may be a peripheral component interconnect express (peripheral component interconnect express, PCIe) bus, an extended industry standard architecture (extended industry standard architecture, EISA) bus, a unified bus (unified bus, Ubus or UB), a compute express link (compute express link, CXL), a cache coherent interconnect for accelerators (cache coherent interconnect for accelerators, CCIX), or the like.
It should be noted that, in fig. 10, only the processing device 1000 includes 1 processor 1010 and 1 memory 1020 as an example, where the processor 1010 and the memory 1020 are used to indicate a type of device or device, respectively, and in a specific embodiment, the number of each type of device or device may be determined according to service requirements.
When the video encoding (or video decoding) apparatus is implemented by hardware, the hardware may be a processor or a chip. The following description takes the chip as an example: the chip includes a processor for implementing the functions of the encoding end and/or the decoding end in the above methods. In one possible design, the chip further includes a power supply circuit for powering the processor. The product may consist of the chip alone, or may include the chip together with other discrete devices.
In the above embodiments, the implementation may be realized in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer programs or instructions. When the computer program or instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are performed in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, a network device, a user device, or another programmable apparatus. The computer program or instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired or wireless means. The computer readable storage medium may be any available medium accessible by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, or magnetic tape), an optical medium (e.g., digital video disc (digital video disc, DVD)), or a semiconductor medium (e.g., solid state drive (solid state drive, SSD)).
While the application has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and substitutions of equivalents may be made and equivalents will be apparent to those skilled in the art without departing from the scope of the application. Therefore, the protection scope of the application is subject to the protection scope of the claims.

Claims (33)

1. A method of video decoding, the method performed by a decoder, the method comprising:
Acquiring a code stream and rendering information corresponding to the code stream, wherein the code stream comprises a plurality of image frames, and the rendering information is used for indicating: filter parameters used in the process of generating the code stream according to source data;
Determining a first type filter matched with the rendering information from the set multiple types of filters, and determining a reference frame of a second image frame; the reference frame is obtained by the first type filter according to a first decoding image corresponding to a first image frame, wherein the first image frame is an image frame with similarity reaching a threshold value with the second image frame in the plurality of image frames;
And decoding the second image frame according to the motion information between the first image frame and the second image frame and the reference frame.
2. The method of claim 1, wherein decoding the second image frame based on the motion information between the first image frame and the second image frame and the reference frame comprises:
For a plurality of image blocks corresponding to the second image frame, determining a second position of a second image block in the reference frame after a first position of a first image block in the second image frame is offset as indicated by the motion information; the first image block is any one of the plurality of image blocks;
The second image frame is decoded based on the second image block.
3. The method according to claim 1 or 2, wherein the determining a first type of filter that matches the rendering information from among the set types of filters includes:
and taking a filter which is consistent with the filter parameters from the multiple types of filters as the first type of filter.
4. A method according to any one of claims 1 to 3, wherein the filter parameters comprise one or more combinations of filter type, filter coefficients and number of taps.
5. The method of claim 4, wherein the filter type comprises one of a bilinear interpolation filter, a bicubic interpolation filter, and a nearest neighbor interpolation filter.
6. The method according to claim 4 or 5, wherein if the filter parameters include the filter coefficients, the coefficients of the first type of filter are the filter coefficients;
and if the filter parameters do not comprise the filter coefficients, the coefficients of the first type of filter are set filter coefficients.
7. The method of any of claims 4 to 6, wherein the determining a reference frame for the second image frame comprises:
interpolating the first decoded image according to the chromaticity coefficient matrix and the brightness coefficient matrix indicated by the filter coefficients to obtain a chromaticity value and a brightness value of a sub-pixel under an integral pixel in the first decoded image;
And obtaining the reference frame based on the chrominance value and the luminance value of the sub-pixel.
8. A method of video encoding, the method performed by an encoder, the method comprising:
Acquiring source data and rendering information corresponding to the source data; the rendering information is used for indicating: filter parameters used in generating a code stream from source data, the source data comprising a plurality of source images;
determining a first type filter matched with the rendering information from a plurality of types of filters, and determining a reference frame of a second image, wherein the reference frame is obtained by the first type filter according to a reconstructed image obtained by encoding and decoding a first image in the plurality of images, and the first image is an image with similarity reaching a threshold value with the second image in the plurality of images;
the second image is encoded based on motion information of the second image compared to the first image determined by the reference frame.
9. The method of claim 8, wherein the determining a first type of filter that matches the rendering information from among the set plurality of types of filters comprises:
and taking a filter which is consistent with the filter parameters from the multiple types of filters as the first type of filter.
10. The method of claim 8 or 9, wherein the filter parameters comprise one or more combinations of filter type, filter coefficients and number of taps.
11. The method of claim 10, wherein the filter type comprises one of a bilinear interpolation filter, a bicubic interpolation filter, and a nearest neighbor interpolation filter.
12. The method according to claim 10 or 11, wherein if the filter parameters include the filter coefficients, the coefficients of the first type of filter are the filter coefficients;
and if the filter parameters do not comprise the filter coefficients, the coefficients of the first type of filter are set filter coefficients.
13. The method of any of claims 10 to 12, wherein the determining the reference frame of the second image comprises:
Interpolating the reconstructed image according to the chromaticity coefficient matrix and the brightness coefficient matrix indicated by the filter coefficients to obtain chromaticity values and brightness values of sub-pixels under the whole pixels in the reconstructed image;
And obtaining the reference frame based on the chrominance value and the luminance value of the sub-pixel.
14. The method according to any one of claims 8 to 13, wherein the motion information is obtained by:
searching a second image block, which has similarity reaching a threshold value with the first image block, in the reference frame based on the first image block in the second image, and taking the second image block as a prediction block; the first image block is any one of a plurality of image blocks included in the second image;
Acquiring a first coordinate of the first image block in the second image and a second coordinate of the prediction block in the reference frame;
And obtaining the motion information of the second image according to the first coordinate and the second coordinate.
15. A video decoding device, the device comprising:
The second acquisition module is used for acquiring a code stream and rendering information corresponding to the code stream, the code stream comprises a plurality of image frames, and the rendering information is used for indicating: filter parameters used in the process of generating the code stream according to source data;
A second determining module, configured to determine a first type filter matched with the rendering information from the set multiple types of filters, and determine a reference frame of a second image frame; the reference frame is obtained by the first type filter according to a first decoding image corresponding to a first image frame, wherein the first image frame is an image frame with similarity reaching a threshold value with the second image frame in the plurality of image frames;
and the decoding module is used for decoding the second image frame according to the motion information between the first image frame and the second image frame and the reference frame.
16. The apparatus of claim 15, wherein the decoding module is further configured to determine, for a plurality of image blocks corresponding to the second image frame, a second position of a second image block in the reference frame after a first position of a first image block in the second image frame is offset as indicated by the motion information; the first image block is any one of the plurality of image blocks; and decode the second image frame based on the second image block.
17. The apparatus according to claim 15 or 16, wherein the second determining module is further configured to take as the first type of filter a filter from the plurality of types of filters that is consistent with the filter parameters.
18. The apparatus of any one of claims 15 to 17, wherein the filter parameters comprise one or more combinations of filter type, filter coefficients, and number of taps.
19. The apparatus of claim 18, wherein the filter type comprises one of a bilinear interpolation filter, a bicubic interpolation filter, and a nearest neighbor interpolation filter.
20. The apparatus according to claim 18 or 19, wherein if the filter parameters include the filter coefficients, the coefficients of the first type of filter are the filter coefficients;
and if the filter parameters do not comprise the filter coefficients, the coefficients of the first type of filter are set filter coefficients.
21. The apparatus according to any one of claims 18 to 20, wherein the decoding module is further configured to interpolate the first decoded image according to the chrominance coefficient matrix and the luminance coefficient matrix indicated by the filter coefficient to obtain chrominance values and luminance values of sub-pixels under the whole pixel in the first decoded image; and obtaining the reference frame based on the chrominance values and the luminance values of the sub-pixels.
22. A video encoding device, the device comprising:
the first acquisition module is used for acquiring source data and rendering information corresponding to the source data; the rendering information is used for indicating: filter parameters used in generating a code stream from source data, the source data comprising a plurality of source images;
The first determining module is used for determining a first type filter matched with the rendering information from the set multiple types of filters and determining a reference frame of a second image, wherein the reference frame is obtained by the first type filter according to a reconstructed image obtained by decoding after encoding a first image in the multiple images, and the first image is an image with similarity reaching a threshold value with the second image in the multiple images;
And the encoding module is used for encoding the second image based on the motion information of the second image compared with the first image, which is determined by the reference frame.
23. The apparatus of claim 22, wherein the first determining module is further configured to use, from among the plurality of types of filters, a filter that is consistent with the filter parameters as the first type of filter.
24. The apparatus of claim 22 or 23, wherein the filter parameters comprise one or more combinations of filter type, filter coefficients, and number of taps.
25. The apparatus of claim 24, wherein the filter type comprises one of a bilinear interpolation filter, a bicubic interpolation filter, and a nearest neighbor interpolation filter.
26. The apparatus according to claim 24 or 25, wherein if the filter parameters include the filter coefficients, the coefficients of the first type of filter are the filter coefficients;
and if the filter parameters do not comprise the filter coefficients, the coefficients of the first type of filter are set filter coefficients.
27. The apparatus according to any one of claims 24 to 26, wherein the encoding module is further configured to interpolate the reconstructed image according to the chrominance coefficient matrix and the luminance coefficient matrix indicated by the filter coefficients to obtain chrominance values and luminance values of sub-pixels under the whole pixel in the reconstructed image; and obtaining the reference frame based on the chrominance values and the luminance values of the sub-pixels.
28. The apparatus according to any one of claims 22 to 26, wherein the encoding module is further configured to search for a second image block in the reference frame for which a similarity to the first image block reaches a threshold, based on a first image block in the second image, and take the second image block as a prediction block; the first image block is any one of a plurality of image blocks included in the second image; acquiring a first coordinate of the first image block in the second image and a second coordinate of the prediction block in the reference frame; and obtaining the motion information of the second image according to the first coordinate and the second coordinate.
29. A chip, comprising: a processor and a power supply circuit;
The power supply circuit is used for supplying power to the processor;
the processor being for performing the method of any one of claims 1 to 7; and/or the processor is configured to perform the method of any one of claims 8 to 14.
30. A codec comprising a memory and a processor, the memory configured to store computer instructions; the processor, when executing the computer instructions, implements the method of any one of claims 1 to 7; and/or the processor, when executing the computer instructions, implements the method of any of claims 8 to 14.
31. A coding and decoding system, comprising a coding end and a decoding end;
the encoding end is used for encoding a plurality of images according to rendering information corresponding to the plurality of source images to obtain code streams corresponding to the plurality of images, and the method of any one of claims 8 to 14 is realized;
The decoding end is configured to decode the code stream according to rendering information corresponding to a plurality of source images, to obtain a plurality of decoded images, and implement the method of any one of claims 1 to 7.
32. A computer readable storage medium, characterized in that the storage medium has stored therein a computer program or instructions which, when executed by a processing device, implement the method of any one of claims 1 to 7; and/or, when executed by a processing device, implement the method of any of claims 8 to 14.
33. A computer program product comprising a computer program or instructions which, when executed by a processing device, implements the method of any one of claims 1 to 7; and/or, when executed by a processing device, implement the method of any of claims 8 to 14.
CN202211479126.2A 2022-11-23 2022-11-23 Video encoding and decoding method and device Pending CN118075459A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211479126.2A CN118075459A (en) 2022-11-23 2022-11-23 Video encoding and decoding method and device

Publications (1)

Publication Number Publication Date
CN118075459A (en)

Family

ID=91110269



Legal Events

Date Code Title Description
PB01 Publication