WO2019223480A1

WO2019223480A1 - Video data decoding method and device

Info

Publication number: WO2019223480A1
Application number: PCT/CN2019/083848
Authority: WO
Inventors: 赵寅; 杨海涛; 陈建乐
Original assignee: 华为技术有限公司
Priority date: 2018-05-24
Filing date: 2019-04-23
Publication date: 2019-11-28
Also published as: CN110536133B; CN110536133A

Abstract

The present application discloses a method and device for acquiring a residual during video decoding. The method comprises: parsing a bitstream to obtain a transform coefficient of a block to be processed; converting the transform coefficient into a first residual of the block to be processed; determining an adjustment factor of the block to be processed according to pixel information in a preset spatial neighborhood of the block; and adjusting the first residual on the basis of the adjustment factor to obtain a second residual of the block. During video decoding, the application allows residual processing to be adjusted flexibly while the slice bit rate is kept stable, so that the acquired residual better matches human visual perception, thereby improving encoding and decoding performance.

Description

Video data decoding method and device

This application claims priority to Chinese patent application No. 201810508090.3, entitled "Video Data Decoding Method and Device", filed with the State Intellectual Property Office of China on May 24, 2018, the entire contents of which are incorporated herein by reference.

Technical field

The present application relates to the technical field of video coding and decoding, and in particular, to a method and a device for acquiring residuals.

Background technique

Current video coding technologies include a variety of video coding standards, such as H.264/AVC, H.265/HEVC, and the Audio Video Coding Standard (AVS). These standards usually use a hybrid coding framework, which may include stages such as prediction, transform, quantization, and entropy coding. The prediction stage uses reconstructed pixels of the already encoded region to generate predicted pixels for the original pixels of the current image block being encoded. The difference in pixel value between an original pixel and its predicted pixel is called the residual. To improve the coding efficiency of the residuals, they are usually first transformed into transform coefficients, and the transform coefficients are then quantized. The quantized transform coefficients and syntax elements (for example, indication information such as the coded image block size, prediction mode, and motion vector) are then converted into a bitstream through an entropy coding process.

Video decoding is the process of converting a bitstream into video images, and may include stages such as entropy decoding, prediction, inverse quantization, and inverse transform. First, the bitstream is parsed through an entropy decoding process to obtain syntax elements and quantized transform coefficients. Then, on the one hand, predicted pixels are obtained based on the syntax elements and previously decoded reconstructed pixels; on the other hand, the quantized transform coefficients are inverse quantized to obtain inverse-quantized transform coefficients, which are then inverse transformed to obtain reconstructed residuals. Finally, the reconstructed residuals and the predicted pixels are added to obtain reconstructed pixels, thereby recovering the video image.

For lossy coding, the reconstructed pixel may be different from the original pixel, and the difference in value between the two is called distortion. Due to the existence of various visual masking effects, such as the brightness masking effect and the contrast masking effect, the intensity of distortion observed by the human eye is closely related to the characteristics of the background in which the distortion is located.

Summary of the Invention

The embodiments of the present application use the spatial neighborhood pixel information of the current block to be processed (i.e., the block to be decoded, or transform block) to simulate the original pixel information corresponding to that block. An adjustment factor for the current block to be processed is derived adaptively from the spatial neighborhood pixel information, and the residual block corresponding to the current block to be processed is adjusted based on the adaptively derived adjustment factor. During encoding or decoding, the residual bits of blocks with strong visual masking effects are reduced and the residual bits of blocks with weak visual masking effects are increased, which makes the coding of the actual residuals more consistent with human visual perception and thereby improves codec performance.

A first aspect of the embodiments of the present application provides a method for acquiring residuals in video decoding, including: parsing a bitstream to obtain a transform coefficient of a block to be processed; converting the transform coefficient into a first residual of the block to be processed; determining an adjustment factor of the block to be processed according to pixel information in a preset spatial neighborhood of the block to be processed; and adjusting the first residual based on the adjustment factor to obtain a second residual of the block to be processed.

The spatial neighborhood pixel information of the current block to be processed is used to simulate the original pixel information corresponding to that block, the adjustment factor for the block is derived adaptively, and the residual block corresponding to the block is adjusted based on the adaptively derived adjustment factor. This makes the actual residuals more consistent with human visual perception, thereby improving encoding and decoding performance.

In a feasible implementation manner of the first aspect, before the determining an adjustment factor of the block to be processed according to pixel information in a preset spatial neighborhood of the block to be processed, the method further includes: calculating the pixel information in the preset spatial neighborhood of the block to be processed based on pixel values in the preset spatial neighborhood of the block to be processed.

In a feasible implementation manner of the first aspect, the calculating pixel information in a preset spatial neighborhood of the block to be processed includes: obtaining one or more pixel sets in the preset spatial neighborhood; and calculating a mean and/or a dispersion of the pixels in the one or more pixel sets to obtain the pixel information in the preset spatial neighborhood.

The pixel information of the pixels around the block to be processed is used in place of the pixel information of the block itself, so that the decoding end can derive the pixel information adaptively. This saves the bits that would otherwise be spent transmitting pixel information and improves coding efficiency.

In a feasible implementation manner of the first aspect, the dispersion includes: a mean square error sum, an average absolute error sum, a variance or a standard deviation.

Based on different scenarios and requirements for implementation complexity, different indicators can be selected as the representations of the dispersion to achieve a balance between performance and complexity.
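As an illustration of the mean and dispersion measures mentioned above, the following is a minimal sketch and not the implementation of this application. The function name `pixel_statistics` and the dictionary keys are hypothetical, and the mean absolute deviation is used here as one possible reading of the "average absolute error" measure.

```python
from math import sqrt


def pixel_statistics(pixels):
    """Return the mean and several dispersion measures of one pixel set.

    `pixels` is a non-empty sequence of sample values. All names here are
    hypothetical and chosen only for illustration.
    """
    if not pixels:
        raise ValueError("pixel set must not be empty")
    n = len(pixels)
    mean = sum(pixels) / n
    # Variance: mean of the squared deviations from the mean.
    variance = sum((p - mean) ** 2 for p in pixels) / n
    # Mean absolute deviation: cheaper to compute, no squaring needed.
    mean_abs_dev = sum(abs(p - mean) for p in pixels) / n
    return {
        "mean": mean,
        "variance": variance,
        "std_dev": sqrt(variance),
        "mean_abs_dev": mean_abs_dev,
    }


stats = pixel_statistics([100, 102, 98, 100])
```

The absolute-deviation measure avoids multiplications and square roots, which is the kind of complexity trade-off the preceding paragraph refers to.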

In a feasible implementation manner of the first aspect, before the obtaining one or more pixel sets in the preset spatial neighborhood, the method further includes: determining that all pixels in each pixel set of the one or more pixel sets have been reconstructed.

Selecting the reconstructed pixels to calculate the pixel information ensures the accuracy of the pixel information used for the adjustment factor calculation.
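The availability check can be sketched as follows, under stated assumptions: pixel positions are represented as hypothetical `(row, column)` tuples, the set of already reconstructed positions is tracked by the caller, and the function name `usable_pixel_sets` is illustrative only.

```python
def usable_pixel_sets(candidate_sets, reconstructed):
    """Keep only the pixel sets whose pixels have all been reconstructed.

    candidate_sets: list of lists of (row, col) positions (hypothetical layout).
    reconstructed: set of (row, col) positions that are already reconstructed.
    """
    return [
        pixel_set
        for pixel_set in candidate_sets
        if pixel_set and all(pos in reconstructed for pos in pixel_set)
    ]


# Hypothetical example: the row above and the column to the left of a 4x4
# block whose top-left pixel is at (4, 4).
above = [(3, c) for c in range(4, 8)]
left = [(r, 3) for r in range(4, 8)]
# Only part of the left column has been decoded so far.
reconstructed = set(above) | {(4, 3), (5, 3)}
usable = usable_pixel_sets([above, left], reconstructed)
```

In this example only the row above the block qualifies, so the pixel information would be computed from that set alone.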

In a feasible implementation manner of the first aspect, the pixel information is the mean, and the determining an adjustment factor of the block to be processed according to pixel information in a preset spatial neighborhood of the block to be processed includes: determining the adjustment factor according to the mean and a first mapping relationship between the mean and the adjustment factor, where the first mapping relationship satisfies one or more of the following conditions: when the mean is less than a first threshold, the adjustment factor decreases as the mean increases; when the mean is greater than a second threshold, the adjustment factor increases as the mean increases, where the first threshold is less than or equal to the second threshold; when the mean is greater than or equal to the first threshold and less than or equal to the second threshold, the adjustment factor is a first preset constant.
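One mapping that satisfies all three conditions is a piecewise-linear function. The sketch below is only an assumption for illustration: the thresholds, slopes, base constant, and the function name `factor_from_mean` are hypothetical and are not values given by this application.

```python
def factor_from_mean(mean, t1=60.0, t2=170.0, base=1.0,
                     slope_low=0.005, slope_high=0.003):
    """Piecewise-linear example of a first mapping relationship.

    All default values are hypothetical. The only properties taken from the
    text are the three monotonicity conditions and t1 <= t2.
    """
    if t1 > t2:
        raise ValueError("first threshold must not exceed second threshold")
    if mean < t1:
        # Dark neighborhood: factor falls toward `base` as the mean rises.
        return base + slope_low * (t1 - mean)
    if mean > t2:
        # Bright neighborhood: factor rises as the mean rises.
        return base + slope_high * (mean - t2)
    # Between the thresholds the factor is the first preset constant.
    return base
```

For instance, `factor_from_mean(100)` returns the constant `base`, while a mean of 20 yields a larger factor than a mean of 40.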

In a feasible implementation manner of the first aspect, the pixel information is the dispersion, and the determining an adjustment factor of the block to be processed according to pixel information in a preset spatial neighborhood of the block to be processed includes:

Determining the adjustment factor according to the dispersion and a second mapping relationship between the dispersion and the adjustment factor, wherein the second mapping relationship satisfies one or more of the following conditions:

When the dispersion is greater than a third threshold, the adjustment factor increases as the dispersion increases;

When the dispersion is less than or equal to the third threshold, the adjustment factor is a second preset constant.
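The two conditions above can be met, for example, by a constant segment followed by a linear ramp. This is a sketch under assumptions: the threshold, slope, base constant, and the name `factor_from_dispersion` are hypothetical.

```python
def factor_from_dispersion(dispersion, t3=10.0, base=1.0, slope=0.02):
    """Example of a second mapping relationship.

    Default values are hypothetical. Above the third threshold the factor
    grows with the dispersion; at or below it the factor is constant.
    """
    if dispersion > t3:
        return base + slope * (dispersion - t3)
    return base
```

A flat neighborhood (low dispersion) therefore keeps the second preset constant, while a textured neighborhood receives a larger factor.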

In a feasible implementation manner of the first aspect, the pixel information is the mean and the dispersion, and the determining an adjustment factor of the block to be processed according to pixel information in a preset spatial neighborhood of the block to be processed includes:

Determining a first parameter according to the mean and the first mapping relationship;

Determining a second parameter according to the dispersion and the second mapping relationship;

A product or a weighted sum of the first parameter and the second parameter is used as the adjustment factor.

Based on different scenarios and implementation complexity requirements, different indicators can be selected to determine the adjustment factor, achieving a balance between performance and complexity.
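The combination of the two parameters can be sketched as follows. The function name `combine_parameters`, the `mode` argument, and the default weights are assumptions made for illustration; the text only states that a product or a weighted sum is used.

```python
def combine_parameters(first, second, mode="product", w1=0.5, w2=0.5):
    """Combine the mean-based and dispersion-based parameters.

    `mode`, `w1`, and `w2` are hypothetical illustration parameters.
    """
    if mode == "product":
        return first * second
    if mode == "weighted_sum":
        return w1 * first + w2 * second
    raise ValueError("mode must be 'product' or 'weighted_sum'")


factor_product = combine_parameters(1.2, 1.5)
factor_sum = combine_parameters(1.2, 1.5, mode="weighted_sum")
```

The product amplifies the factor when both masking effects are strong, whereas the weighted sum keeps the result between the two parameters when the weights sum to one.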

In a feasible implementation manner of the first aspect, after the product or weighted sum of the first parameter and the second parameter is used as the adjustment factor, the method further includes: performing weighting adjustment on the adjustment factor to obtain an adjusted adjustment factor; correspondingly, the determining an adjustment factor of the block to be processed includes: using the adjusted adjustment factor as the adjustment factor of the block to be processed.

By further weighting the adjustment factors, the adjustment factors are optimized and the encoding efficiency is further improved.

In a feasible implementation manner of the first aspect, after the determining an adjustment factor of the block to be processed, the method further includes: updating the adjustment factor according to a quantization parameter of the block to be processed; correspondingly, the adjusting the first residual based on the adjustment factor to obtain a second residual of the block to be processed includes: adjusting the first residual based on the updated adjustment factor to obtain the second residual of the block to be processed.

In a feasible implementation manner of the first aspect, the adjustment factor is adjusted in the following manner:

QC represents the adjustment factor, QP represents the quantization parameter, and N, M, and X are preset constants.

Introducing the quantization parameter further optimizes the adjustment factor and further improves coding efficiency.

In a feasible implementation manner of the first aspect, the number of obtained transform coefficients of the block to be processed is the same as the number of pixels of the block to be processed, and after the transform coefficients of the block to be processed are obtained, the method further includes: arranging the transform coefficients of the block to be processed into a transform coefficient block according to a preset positional relationship; correspondingly, the converting the transform coefficient into a first residual of the block to be processed includes: converting the transform coefficient block into a first residual block of the block to be processed; correspondingly, the adjusting the first residual based on the adjustment factor to obtain a second residual of the block to be processed includes: adjusting the first residual block based on the adjustment factor to obtain a second residual block of the block to be processed.

In a feasible implementation manner of the first aspect, the first residual block includes a first luminance residual block of a luminance component of the block to be processed, and the luminance residual pixels in the first luminance residual block correspond one-to-one to the pixels of the luminance component of the block to be processed; correspondingly, the second residual block includes a second luminance residual block of the luminance component of the block to be processed, and the adjusting the first residual block based on the adjustment factor to obtain a second residual block of the block to be processed includes: adjusting the luminance residual pixels in the first luminance residual block based on the adjustment factor to obtain the luminance residual pixels in the second luminance residual block of the block to be processed.

In a feasible implementation manner of the first aspect, the luminance residual pixels in the second luminance residual block are obtained in the following manner:

Res2_Y(i) = (Res1_Y(i) × QC + offset_Y) >> shift_Y

Here, QC represents the adjustment factor, Res1_Y(i) represents the i-th luminance residual pixel in the first luminance residual block, Res2_Y(i) represents the i-th luminance residual pixel in the second luminance residual block, offset_Y and shift_Y are preset constants, and i is a natural number.
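The scaling formula can be sketched in integer arithmetic as follows. Two points are assumptions rather than statements of this application: QC is treated as a fixed-point integer in which `1 << shift` represents a factor of 1.0, and the default offset is chosen as half of `1 << shift` so that the right shift rounds to nearest. The function name `scale_residuals` is hypothetical.

```python
def scale_residuals(res1, qc, shift, offset=None):
    """Apply Res2(i) = (Res1(i) * QC + offset) >> shift to each residual.

    Assumption: `qc` is a fixed-point integer, so qc == 1 << shift leaves the
    residuals unchanged. Assumption: the default offset rounds to nearest.
    """
    if shift < 0:
        raise ValueError("shift must be non-negative")
    if offset is None:
        offset = (1 << (shift - 1)) if shift > 0 else 0
    # Python's >> on negative integers is an arithmetic (floor) shift.
    return [(r * qc + offset) >> shift for r in res1]


# QC = 20 with shift = 4 corresponds to a scale of 20 / 16 = 1.25.
res2 = scale_residuals([10, -10, 0, 3], qc=20, shift=4)
```

With these inputs `res2` is `[13, -12, 0, 4]`, and passing `qc=16` returns the input residuals unchanged. The same function applies unchanged to chrominance residuals with their own offset and shift constants.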

In a feasible implementation manner of the first aspect, the first residual block includes a first chrominance residual block of a chrominance component of the block to be processed, and the chrominance residual pixels in the first chrominance residual block correspond one-to-one to the pixels of the chrominance component of the block to be processed; correspondingly, the second residual block includes a second chrominance residual block of the chrominance component of the block to be processed, and the adjusting the first residual block based on the adjustment factor to obtain a second residual block of the block to be processed includes: adjusting the chrominance residual pixels in the first chrominance residual block based on the adjustment factor to obtain the chrominance residual pixels in the second chrominance residual block of the block to be processed.

In a feasible implementation manner of the first aspect, the chroma residual pixels in the second chroma residual block are obtained in the following manner:

Res2_C(i) = (Res1_C(i) × QC + offset_C) >> shift_C

Here, QC represents the adjustment factor, Res1_C(i) represents the i-th chrominance residual pixel in the first chrominance residual block, Res2_C(i) represents the i-th chrominance residual pixel in the second chrominance residual block, offset_C and shift_C are preset constants, and i is a natural number.

Processing the luminance residual and the chrominance residual separately further balances performance against complexity.

In a feasible implementation manner of the first aspect, the bit-width precision of the luminance residual pixels in the first luminance residual block is higher than the bit-width precision of the luminance residual pixels in the second luminance residual block.

In a feasible implementation manner of the first aspect, the bit-width precision of the chrominance residual pixels in the first chrominance residual block is higher than the bit-width precision of the chrominance residual pixels in the second chrominance residual block.

Using a higher bit-width precision for the first residual, which is generated as an intermediate result, improves the accuracy of the computation and thus the coding efficiency.

In a feasible implementation manner of the first aspect, the converting the transform coefficient block into a first residual block of the block to be processed includes: performing inverse quantization on each transform coefficient in the transform coefficient block to obtain an inverse-quantized transform coefficient block; and performing an inverse transform on the inverse-quantized transform coefficient block to obtain the first residual block of the block to be processed.

In a feasible implementation manner of the first aspect, after the adjusting the first residual based on the adjustment factor to obtain a second residual of the block to be processed, the method further includes: adding the residual pixels in the second residual to the predicted pixels at the corresponding positions in the block to be processed to obtain reconstructed pixels at the corresponding positions in the block to be processed.

The above two steps are, respectively, a step preceding and a step following the acquisition of the residuals, so that the benefit of adjusting the residuals can be combined with other prediction, transform, and quantization techniques.
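The reconstruction step can be sketched as follows. Clipping the sum to the valid sample range is an assumption added here because it is common practice in video codecs; the text itself only states that residual and predicted pixels are added. The function name `reconstruct` and the `bit_depth` parameter are hypothetical.

```python
def reconstruct(pred, res2, bit_depth=8):
    """Add second-residual pixels to co-located predicted pixels.

    Assumption: results are clipped to [0, 2**bit_depth - 1].
    """
    if len(pred) != len(res2):
        raise ValueError("prediction and residual must have the same length")
    max_val = (1 << bit_depth) - 1
    return [min(max(p + r, 0), max_val) for p, r in zip(pred, res2)]


recon = reconstruct([120, 250, 3], [13, 10, -12])
```

Here `recon` is `[133, 255, 0]`: the second and third samples are clipped to the 8-bit range.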

A second aspect of the embodiments of the present application discloses a device for acquiring residuals in video decoding, including: a parsing module, configured to parse a bitstream to obtain a transform coefficient of a block to be processed; a conversion module, configured to convert the transform coefficient into a first residual of the block to be processed; a calculation module, configured to determine an adjustment factor of the block to be processed according to pixel information in a preset spatial neighborhood of the block to be processed; and an adjustment module, configured to adjust the first residual based on the adjustment factor to obtain a second residual of the block to be processed.

In a feasible implementation manner of the second aspect, the calculation module is further configured to calculate the pixel information in the preset spatial neighborhood of the block to be processed based on pixel values in the preset spatial neighborhood of the block to be processed.

In a feasible implementation manner of the second aspect, the calculation module is specifically configured to: obtain one or more pixel sets in the preset spatial neighborhood; and calculate a mean and/or a dispersion of the pixels in the one or more pixel sets to obtain the pixel information in the preset spatial neighborhood.

In a feasible implementation manner of the second aspect, the dispersion includes: a mean square error sum, an average absolute error sum, a variance or a standard deviation.

In a feasible implementation manner of the second aspect, the calculation module is further configured to determine that all pixels in each pixel set in the one or more pixel sets have been reconstructed.

In a feasible implementation manner of the second aspect, the pixel information is the mean, and the calculation module is specifically configured to determine the adjustment factor according to the mean and a first mapping relationship between the mean and the adjustment factor, where the first mapping relationship satisfies one or more of the following conditions: when the mean is less than a first threshold, the adjustment factor decreases as the mean increases; when the mean is greater than a second threshold, the adjustment factor increases as the mean increases, where the first threshold is less than or equal to the second threshold; when the mean is greater than or equal to the first threshold and less than or equal to the second threshold, the adjustment factor is a first preset constant.

In a feasible implementation manner of the second aspect, the calculation module is specifically configured to determine the adjustment factor according to the dispersion and a second mapping relationship between the dispersion and the adjustment factor, where the second mapping relationship satisfies one or more of the following conditions: when the dispersion is greater than a third threshold, the adjustment factor increases as the dispersion increases; when the dispersion is less than or equal to the third threshold, the adjustment factor is a second preset constant.

In a feasible implementation manner of the second aspect, the pixel information is the mean and the dispersion, and the calculation module is specifically configured to: determine a first parameter according to the mean and the first mapping relationship; determine a second parameter according to the dispersion and the second mapping relationship; and use the product or weighted sum of the first parameter and the second parameter as the adjustment factor.

In a feasible implementation manner of the second aspect, the calculation module is further configured to: perform weighting adjustment on the adjustment factor to obtain an adjusted adjustment factor; and use the adjusted adjustment factor as the adjustment factor of the block to be processed.

In a feasible implementation manner of the second aspect, the calculation module is further configured to update the adjustment factor according to the quantization parameter of the block to be processed; correspondingly, the adjustment module is specifically configured to adjust the first residual based on the updated adjustment factor to obtain the second residual of the block to be processed.

In a feasible implementation manner of the second aspect, the adjustment factor is adjusted in the following manner:

In a feasible implementation manner of the second aspect, the number of obtained transform coefficients of the block to be processed is the same as the number of pixels of the block to be processed, and the conversion module is further configured to: arrange the transform coefficients of the block to be processed into a transform coefficient block according to a preset positional relationship; and convert the transform coefficient block into a first residual block of the block to be processed; correspondingly, the adjustment module is specifically configured to adjust the first residual block based on the adjustment factor to obtain a second residual block of the block to be processed.

In a feasible implementation manner of the second aspect, the first residual block includes a first luminance residual block of a luminance component of the block to be processed, and the luminance residual pixels in the first luminance residual block correspond one-to-one to the pixels of the luminance component of the block to be processed; correspondingly, the second residual block includes a second luminance residual block of the luminance component of the block to be processed, and the adjustment module is specifically configured to adjust the luminance residual pixels in the first luminance residual block based on the adjustment factor to obtain the luminance residual pixels in the second luminance residual block of the block to be processed.

In a feasible implementation manner of the second aspect, the luminance residual pixels in the second luminance residual block are obtained in the following manner:

Res2_Y(i) = (Res1_Y(i) × QC + offset_Y) >> shift_Y

Here, QC represents the adjustment factor, Res1_Y(i) represents the i-th luminance residual pixel in the first luminance residual block, Res2_Y(i) represents the i-th luminance residual pixel in the second luminance residual block, offset_Y and shift_Y are preset constants, and i is a natural number.

In a feasible implementation manner of the second aspect, the first residual block includes a first chrominance residual block of a chrominance component of the block to be processed, and the chrominance residual pixels in the first chrominance residual block correspond one-to-one to the pixels of the chrominance component of the block to be processed; correspondingly, the second residual block includes a second chrominance residual block of the chrominance component of the block to be processed, and the adjustment module is specifically configured to adjust the chrominance residual pixels in the first chrominance residual block based on the adjustment factor to obtain the chrominance residual pixels in the second chrominance residual block of the block to be processed.

In a feasible implementation manner of the second aspect, the chroma residual pixels in the second chroma residual block are obtained in the following manner:

Res2_C(i) = (Res1_C(i) × QC + offset_C) >> shift_C

Here, QC represents the adjustment factor, Res1_C(i) represents the i-th chrominance residual pixel in the first chrominance residual block, Res2_C(i) represents the i-th chrominance residual pixel in the second chrominance residual block, offset_C and shift_C are preset constants, and i is a natural number.

In a feasible implementation manner of the second aspect, the bit-width precision of the luminance residual pixels in the first luminance residual block is higher than the bit-width precision of the luminance residual pixels in the second luminance residual block.

In a feasible implementation manner of the second aspect, the bit-width precision of the chrominance residual pixels in the first chrominance residual block is higher than the bit-width precision of the chrominance residual pixels in the second chrominance residual block.

In a feasible implementation manner of the second aspect, the conversion module is specifically configured to: perform inverse quantization on each transform coefficient in the transform coefficient block to obtain an inverse-quantized transform coefficient block; and perform an inverse transform on the inverse-quantized transform coefficient block to obtain the first residual block of the block to be processed.

In a feasible implementation manner of the second aspect, the device further includes a reconstruction unit, configured to add the residual pixels in the second residual to the predicted pixels at the corresponding positions in the block to be processed to obtain reconstructed pixels at the corresponding positions in the block to be processed.

A third aspect of the present application provides a device for acquiring a residual. The device may be applied to an encoding side or a decoding side. The device includes a processor and a memory, and the processor and the memory are connected (for example, connected to each other through a bus). In a possible implementation manner, the device may further include a transceiver, which is connected to the processor and the memory and is used to receive and send data. The memory is used to store program code and video data. The processor may be configured to read the program code stored in the memory and execute the method described in the first aspect.

A fourth aspect of the present application provides a video codec system, which includes a source device and a destination device. The source device can communicate with the destination device. The source device generates encoded video data; therefore, the source device may be referred to as a video encoding apparatus or a video encoding device. The destination device can decode the encoded video data generated by the source device; therefore, the destination device may be referred to as a video decoding apparatus or a video decoding device. The source device and the destination device may be examples of a video codec apparatus or a video codec device. The method described in the first aspect is applied to the video codec apparatus or video codec device.

A fifth aspect of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores instructions that, when run on a computer, cause the computer to execute the method described in the first aspect above.

A sixth aspect of the present application provides a computer program product containing instructions that, when run on a computer, causes the computer to perform the method described in the first aspect above.

It should be understood that the second to sixth aspects of the present application share the inventive purpose of the first aspect, have technical features similar to those of the corresponding embodiments of the first aspect, and achieve similar beneficial technical effects.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to more clearly explain the technical solutions in the embodiments of the present application or the background art, the drawings that are needed in the embodiments of the present application or the background art will be described below.

FIG. 1 is an exemplary block diagram of a video encoding and decoding system that can be configured for use in an embodiment of the present application;

FIG. 2 is an exemplary system block diagram of a video encoder that can be configured for use in an embodiment of the present application;

FIG. 3 is an exemplary system block diagram of a video decoder that can be configured for use in an embodiment of the present application;

FIG. 4 is a schematic flowchart of an exemplary method for acquiring residuals for video data decoding according to an embodiment of the present application;

FIG. 5 is a schematic diagram of pixels in a spatial neighborhood of a block to be processed in an exemplary embodiment of the present application;

FIG. 6 is a system block diagram of an exemplary hardware pipeline design in an embodiment of the present application;

FIG. 7 is a system block diagram of a residual acquisition device for video data decoding according to an exemplary embodiment of the present application;

FIG. 8 is a system block diagram of a residual acquisition device for video data decoding according to an exemplary embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application.

FIG. 1 is a schematic block diagram of a video encoding and decoding system 10 according to an embodiment of the present application. As shown in FIG. 1, the system 10 includes a source device 12 that generates encoded video data to be decoded by the destination device 14 at a later time. Source device 12 and destination device 14 may include any of a wide range of devices including desktop computers, notebook computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, so-called "smart" "Touchpads, TVs, cameras, displays, digital media players, video game consoles, video streaming devices, or the like. In some applications, the source device 12 and the destination device 14 may be equipped for wireless communication.

The destination device 14 may receive the encoded video data to be decoded via the link 16. The link 16 may include any type of media or device capable of moving the encoded video data from the source device 12 to the destination device 14. In one possible implementation, the link 16 may include a communication medium that enables the source device 12 to directly transmit the encoded video data to the destination device 14 in real time. The encoded video data may be modulated according to a communication standard (eg, a wireless communication protocol) and transmitted to the destination device 14. Communication media may include any wireless or wired communication media, such as a radio frequency spectrum or one or more physical transmission lines. Communication media may form part of a packet-based network, such as a global network of a local area network, a wide area network, or the Internet. The communication medium may include a router, a switch, a base station, or any other equipment that may be used to facilitate communication from the source device 12 to the destination device 14.

Alternatively, the encoded data may be output from the output interface 22 to a storage device 24. Similarly, the encoded data can be accessed from the storage device 24 by an input interface. The storage device 24 may include any of a variety of distributed or locally accessed data storage media, such as a hard drive, Blu-ray disc, DVD, CD-ROM, flash memory, volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded video data. In another possible implementation, the storage device 24 may correspond to a file server or another intermediate storage device that may hold the encoded video produced by the source device 12. The destination device 14 may access the stored video data from the storage device 24 via streaming or download. The file server may be any type of server capable of storing encoded video data and transmitting this encoded video data to the destination device 14. Possible implementations of the file server include a web server, a file transfer protocol server, a network attached storage device, or a local disk drive. The destination device 14 may access the encoded video data via any standard data connection, including an Internet connection. This data connection may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., a cable modem), or a combination of both, suitable for accessing encoded video data stored on a file server. The transmission of the encoded video data from the storage device 24 may be a streaming transmission, a download transmission, or a combination of the two.

The techniques of this application are not necessarily limited to wireless applications or settings. The techniques can be applied to video decoding in support of any of a variety of multimedia applications, such as over-the-air television broadcasting, cable television transmission, satellite television transmission, streaming video transmission (e.g., via the Internet), encoding of digital video for storage on a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some feasible implementations, the system 10 may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and / or video telephony.

In the feasible embodiment of FIG. 1, the source device 12 includes a video source 18, a video encoder 20, and an output interface 22. In some applications, the output interface 22 may include a modulator / demodulator (modem) and / or a transmitter. In the source device 12, the video source 18 may include a source such as a video capture device (e.g., a video camera), a video archive containing previously captured video, a video feed interface to receive video from a video content provider, and / or a computer graphics system for generating computer graphics data as source video, or a combination of these sources. As a feasible implementation, if the video source 18 is a video camera, the source device 12 and the destination device 14 may form a so-called camera phone or video phone. The techniques described in this application are applicable, by way of example, to video decoding and may be applied to wireless and / or wired applications.

Captured, pre-captured, or computer-generated video may be encoded by video encoder 20. The encoded video data may be transmitted directly to the destination device 14 via the output interface 22 of the source device 12. The encoded video data may also (or alternatively) be stored on the storage device 24 for later access by the destination device 14 or other device for decoding and / or playback.

The destination device 14 includes an input interface 28, a video decoder 30, and a display device 32. In some applications, the input interface 28 may include a receiver and / or a modem. The input interface 28 of the destination device 14 receives the encoded video data via the link 16. The encoded video data communicated via the link 16 or provided on the storage device 24 may include various syntax elements generated by the video encoder 20 for use by a video decoder, such as the video decoder 30, to decode the video data. These syntax elements may be included with the encoded video data transmitted on a communication medium, stored on a storage medium, or stored on a file server.

The display device 32 may be integrated with or external to the destination device 14. In some possible implementations, the destination device 14 may include an integrated display device and also be configured to interface with an external display device. In other feasible implementations, the destination device 14 may be a display device. Generally, the display device 32 displays the decoded video data to a user, and may include any of a variety of display devices, such as a liquid crystal display, a plasma display, an organic light emitting diode display, or another type of display device.

Video encoder 20 and video decoder 30 may operate according to, for example, the next-generation video compression standard (H.266) currently under development and may conform to the H.266 test model (JEM). Alternatively, the video encoder 20 and video decoder 30 may operate according to other proprietary or industry standards, for example the ITU-T H.265 standard, also referred to as the High Efficiency Video Coding (HEVC) standard, or the ITU-T H.264 standard, or extensions of these standards. The ITU-T H.264 standard is alternatively referred to as MPEG-4 Part 10, Advanced Video Coding (AVC). However, the techniques of this application are not limited to any particular coding standard. Other possible examples of video compression standards include MPEG-2 and ITU-T H.263.

Although not shown in FIG. 1, in some aspects, video encoder 20 and video decoder 30 may each be integrated with an audio encoder and decoder, and may include an appropriate multiplexer-demultiplexer (MUX-DEMUX) unit or other hardware and software to handle encoding of both audio and video in a common data stream or in separate data streams. If applicable, in some feasible implementations, the MUX-DEMUX unit may conform to the ITU-T H.223 multiplexer protocol or other protocols such as the User Datagram Protocol (UDP).

Video encoder 20 and video decoder 30 may each be implemented as any of a variety of suitable encoder circuits, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware, or any combination thereof. When the techniques are partially implemented in software, the device may store the software's instructions in a suitable non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the techniques of this application. Each of the video encoder 20 and the video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder / decoder (CODEC) in a corresponding device.

This application may refer, by way of example, to the video encoder 20 "signaling" certain information to another device such as the video decoder 30. It should be understood, however, that video encoder 20 may signal information by associating specific syntax elements with various encoded portions of the video data. That is, video encoder 20 may "signal" the data by storing specific syntax elements in the header information of various encoded portions of the video data. In some applications, these syntax elements may be encoded and stored (e.g., stored to storage system 34 or file server 36) before being received and decoded by video decoder 30. Thus, the term "signaling" may refer to the transmission of syntax or other data used to decode compressed video data, regardless of whether this transmission occurs in real time, in near real time, or over a span of time, such as may occur when syntax elements are stored to a medium at encoding time and then retrieved by a decoding device at any time after being stored on this medium.

JCT-VC has developed the H.265 (HEVC) standard. The HEVC standardization is based on an evolving model of a video decoding device called the HEVC test model (HM). The latest standard document of H.265 can be obtained from http://www.itu.int/rec/T-REC-H.265. The latest version of the standard document is H.265 (12/16), which is incorporated herein by reference in its entirety. HM assumes that video decoding devices have several additional capabilities relative to the existing algorithms of ITU-T H.264 / AVC. For example, H.264 provides 9 intra-prediction coding modes, while HM provides up to 35 intra-prediction coding modes.

JVET is committed to developing the H.266 standard. The H.266 standardization process is based on an evolving model of a video decoding device called the H.266 test model. The algorithm description of H.266 can be obtained from http://phenix.int-evry.fr/jvet. The latest algorithm description is included in JVET-G1001-v1, which is incorporated herein by reference in its entirety. In addition, reference software for the JEM test model can be obtained from https://jvet.hhi.fraunhofer.de/svn/svn_HMJEMSoftware/, which is also incorporated herein by reference in its entirety.

Generally speaking, the working model description of HM can divide a video frame or image into a sequence of tree blocks or largest coding units (LCUs) containing both luminance and chrominance samples. LCUs are also called coding tree units (CTUs). A tree block has a purpose similar to that of a macroblock in the H.264 standard. A slice contains several consecutive tree blocks in decoding order. A video frame or image can be split into one or more slices. Each tree block can be split into coding units (CUs) according to a quadtree. For example, a tree block that is the root node of a quadtree can be split into four child nodes, and each child node can in turn be a parent node and be split into another four child nodes. The final, unsplit child nodes, which are the leaf nodes of the quadtree, include decoding nodes, such as decoded video blocks. The syntax data associated with the decoded codestream can define the maximum number of times a tree block can be split, and can also define the minimum size of a decoding node.
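The recursive quadtree splitting of a tree block described above can be sketched as follows. This is only an illustrative sketch: the function names, the (x, y, size) leaf representation, and the split decision rule are assumptions made for the example and are not taken from HM or from this application.

```python
def split_quadtree(x, y, size, min_size, should_split):
    """Return the leaf blocks (x, y, size) of a quadtree rooted at (x, y)."""
    # A node becomes a leaf when it reaches the minimum size or is not split.
    if size <= min_size or not should_split(x, y, size):
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves.extend(
                split_quadtree(x + dx, y + dy, half, min_size, should_split)
            )
    return leaves


def split_top_left(x, y, size):
    """Hypothetical decision rule: keep splitting the top-left block above 16."""
    return x == 0 and y == 0 and size > 16


# A 64 x 64 tree block with a minimum leaf size of 8.
leaves = split_quadtree(0, 0, 64, 8, split_top_left)
```

With this hypothetical rule, the 64 × 64 tree block ends up as four 16 × 16 leaves in its top-left quadrant and three 32 × 32 leaves elsewhere. In a real encoder the split decision would typically come from a cost evaluation rather than from a fixed rule.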

A coding unit (CU) includes a decoding node and the prediction units (PUs) and transform units (TUs) associated with the decoding node. The size of the CU corresponds to the size of the decoding node, and its shape must be square. The size of the CU can range from 8 × 8 pixels up to the size of the tree block, with a maximum of 64 × 64 pixels or larger. Each CU may contain one or more PUs and one or more TUs. For example, the syntax data associated with a CU may describe how the CU is partitioned into one or more PUs. The partition mode may differ depending on whether the CU is skipped or coded in direct mode, intra prediction mode, or inter prediction mode. A PU can be partitioned into a non-square shape. The syntax data associated with a CU may also describe, for example, how the CU is partitioned into one or more TUs according to a quadtree. The shape of a TU can be square or non-square.

The HEVC standard allows transformation according to TUs, which can be different for different CUs. A TU is usually sized based on the size of the PUs within a given CU defined for the partitioned LCU, but this may not always be the case. The size of a TU is usually the same as or smaller than that of a PU. In some feasible implementations, a quadtree structure called a "residual quadtree" (RQT) can be used to subdivide the residual samples corresponding to the CU into smaller units. The leaf nodes of the RQT may be called TUs. The pixel difference values associated with a TU may be transformed to produce transform coefficients, which may be quantized.

Generally speaking, a PU contains data related to the prediction process. For example, when a PU is intra-mode encoded, the PU may include data describing the intra prediction mode of the PU. As another feasible implementation, when the PU is inter-mode encoded, the PU may include data defining a motion vector of the PU. For example, the data defining the motion vector of the PU may describe the horizontal component of the motion vector, the vertical component of the motion vector, the resolution of the motion vector (e.g., quarter-pixel accuracy or eighth-pixel accuracy), the reference image to which the motion vector points, and / or the reference image list of the motion vector (e.g., list 0, list 1, or list C).

Generally, a TU is used for the transform and quantization processes. A given CU with one or more PUs may also contain one or more TUs. After prediction, video encoder 20 may calculate the residual values corresponding to the PU. The residual values include pixel differences, which can be transformed into transform coefficients, quantized, and scanned using the TUs to generate serialized transform coefficients for entropy coding. This application generally uses the term "video block" to refer to the decoding node of a CU. In some specific applications, the term "video block" may also be used in this application to refer to a tree block, such as an LCU or a CU, that includes a decoding node and PUs and TUs.

A video sequence usually contains a series of video frames or images. A group of pictures (GOP) generally includes a series of one or more video pictures. The GOP may include syntax data in the header information of the GOP, in the header information of one or more of the pictures, or elsewhere, and the syntax data describes the number of pictures included in the GOP. Each slice of an image may contain slice syntax data describing the coding mode of the corresponding slice. Video encoder 20 typically operates on video blocks within individual video slices to encode the video data. A video block may correspond to a decoding node within a CU. Video blocks may have fixed or varying sizes, and may differ in size according to a specified coding standard.

As a feasible implementation, HM supports prediction with various PU sizes. Assuming that the size of a specific CU is 2N × 2N, HM supports intra prediction with PU sizes of 2N × 2N or N × N, and inter prediction with symmetric PU sizes of 2N × 2N, 2N × N, N × 2N, or N × N. HM also supports asymmetric partitioning for inter prediction with PU sizes of 2N × nU, 2N × nD, nL × 2N, and nR × 2N. In asymmetric partitioning, one direction of the CU is not partitioned, and the other direction is partitioned into 25% and 75%. The portion of the CU corresponding to the 25% partition is indicated by "n" followed by "Up", "Down", "Left", or "Right". Therefore, for example, "2N × nU" refers to a horizontally partitioned 2N × 2N CU with a 2N × 0.5N PU at the top and a 2N × 1.5N PU at the bottom.
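As a hedged illustration of the partition modes listed above, the following sketch tabulates the PU sizes of a 2N × 2N CU for each mode. The function name, the mode strings, and the (width, height) tuple convention are assumptions chosen for the example, and N is assumed to be even so that 0.5N is an integer.

```python
def pu_sizes(mode, n):
    """Return the (width, height) of each PU of a 2N x 2N CU, top/left first."""
    full, half = 2 * n, n
    quarter, three_quarters = n // 2, 3 * n // 2  # 25% and 75% of 2N
    table = {
        # Symmetric partitions
        "2Nx2N": [(full, full)],
        "2NxN": [(full, half)] * 2,
        "Nx2N": [(half, full)] * 2,
        "NxN": [(half, half)] * 4,
        # Asymmetric partitions: the 25% part is Up, Down, Left, or Right
        "2NxnU": [(full, quarter), (full, three_quarters)],
        "2NxnD": [(full, three_quarters), (full, quarter)],
        "nLx2N": [(quarter, full), (three_quarters, full)],
        "nRx2N": [(three_quarters, full), (quarter, full)],
    }
    return table[mode]


# For a 32 x 32 CU (N = 16), 2N x nU gives a 32 x 8 PU above a 32 x 24 PU.
example = pu_sizes("2NxnU", 16)
```

In every mode the PU areas add up to the area of the CU, which is a convenient sanity check on such a table.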

In this application, “N × N” and “N by N” are used interchangeably to refer to the pixel size of a video block in terms of its vertical and horizontal dimensions, for example, 16 × 16 pixels or 16 by 16 pixels. In general, a 16 × 16 block has 16 pixels in the vertical direction (y = 16) and 16 pixels in the horizontal direction (x = 16). Similarly, an N × N block has N pixels in the vertical direction and N pixels in the horizontal direction, where N represents a non-negative integer value. The pixels in a block can be arranged in rows and columns. In addition, a block does not necessarily need to have the same number of pixels in the horizontal direction as in the vertical direction. For example, a block may include N × M pixels, where M is not necessarily equal to N.

After intra-predictive or inter-predictive coding using the PUs of a CU, the video encoder 20 may calculate the residual data of the TUs of the CU. A PU may include pixel data in the spatial domain (also referred to as the pixel domain), and a TU may include coefficients in the transform domain after a transform (e.g., a discrete cosine transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform) is applied to the residual video data. The residual data may correspond to the pixel differences between the pixels of the uncoded image and the prediction values corresponding to the PUs. Video encoder 20 may form TUs containing the residual data of the CU, and then transform the TUs to generate transform coefficients for the CU.

After any transform to generate transform coefficients, video encoder 20 may perform quantization of the transform coefficients. Quantization exemplarily refers to the process of quantizing coefficients to possibly reduce the amount of data used to represent the coefficients to provide further compression. The quantization process may reduce the bit depth associated with some or all of the coefficients. For example, n-bit values may be rounded down to m-bit values during quantization, where n is greater than m.
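The bit-depth reduction mentioned above can be illustrated with a minimal sketch. The function name and the use of a right shift to discard low-order bits are assumptions made for the example; practical quantizers divide by a step size and are not limited to powers of two.

```python
def round_down_to_bits(value, n_bits, m_bits):
    """Round a non-negative n-bit value down to an m-bit value (n > m)."""
    if not 0 <= value < (1 << n_bits):
        raise ValueError("value does not fit in n_bits")
    if m_bits >= n_bits:
        raise ValueError("m_bits must be smaller than n_bits")
    # Dropping the (n - m) least significant bits rounds toward zero.
    return value >> (n_bits - m_bits)


# A 10-bit value reduced to 8 bits: 517 becomes 129 and 1023 becomes 255.
reduced = [round_down_to_bits(v, 10, 8) for v in (517, 1023)]
```

The discarded low-order bits cannot be recovered by the decoder, which is why quantization is the lossy step of the hybrid coding framework.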

The JEM model further improves the coding structure of video images. Specifically, a block coding structure called "quadtree and binary tree" (QTBT) is introduced. The QTBT structure abandons the concepts of CU, PU, and TU in HEVC, and supports more flexible CU division shapes. A CU can be square or rectangular. A CTU first performs a quadtree partition, and the leaf nodes of the quadtree further perform a binary tree partition. At the same time, there are two partitioning modes in binary tree partitioning, symmetrical horizontal partitioning and symmetrical vertical partitioning. The leaf nodes of a binary tree are called CUs. JEM's CUs cannot be further divided during the prediction and transformation process, which means that JEM's CU, PU, and TU have the same block size. In the current JEM, the maximum size of the CTU is 256 × 256 luminance pixels.

In some feasible implementations, video encoder 20 may utilize a predefined scan order to scan the quantized transform coefficients to generate a serialized vector that can be entropy encoded. In other possible implementations, video encoder 20 may perform adaptive scanning. After scanning the quantized transform coefficients to form a one-dimensional vector, video encoder 20 may perform context-adaptive variable-length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding method to entropy encode the one-dimensional vector. Video encoder 20 may also entropy encode syntax elements associated with the encoded video data for use by video decoder 30 in decoding the video data.
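To illustrate how a predefined scan order serializes a block of quantized coefficients, the sketch below uses a classic zigzag scan. The choice of zigzag is an assumption for the example only; the text above does not specify which scan order is used, and the function names and the sample block are hypothetical.

```python
def zigzag_order(n):
    """Return the (row, col) positions of an n x n block in zigzag order."""
    positions = [(r, c) for r in range(n) for c in range(n)]
    # Walk the anti-diagonals in turn, alternating direction on each one:
    # odd diagonals run top-right to bottom-left, even ones the other way.
    return sorted(
        positions,
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]),
    )


def scan(block):
    """Serialize a square block of coefficients into a one-dimensional list."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


# Hypothetical quantized block: energy concentrated at low frequencies.
quantized = [
    [9, 3, 0, 0],
    [2, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
serialized = scan(quantized)
```

For this block the non-zero values are gathered at the front of the vector and followed by a long run of zeros, which is the property that makes the serialized form convenient for entropy coding.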

To perform CABAC, video encoder 20 may assign a context within a context model to a symbol to be transmitted. The context may relate to, for example, whether the neighboring values of the symbol are non-zero. To perform CAVLC, video encoder 20 may select a variable-length code for a symbol to be transmitted. Codewords in variable-length coding (VLC) can be constructed such that relatively short codes correspond to more probable symbols and longer codes correspond to less probable symbols. In this way, the use of VLC can save bit rate relative to using equal-length codewords for each symbol to be transmitted. The probability determination in CABAC can be based on the context assigned to the symbol.
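The principle that shorter codewords go to more probable symbols can be sketched with a Huffman construction. This is a stand-in chosen for illustration, not the CAVLC code tables of any standard, and the symbol names and probabilities are hypothetical.

```python
import heapq


def huffman_code_lengths(probabilities):
    """Return {symbol: codeword length in bits} for a Huffman code."""
    # Heap entries are (probability, tie_breaker, symbols_in_subtree).
    heap = [(p, i, [sym]) for i, (sym, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in probabilities}
    counter = len(heap)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every symbol beneath them.
        for sym in syms1 + syms2:
            lengths[sym] += 1
        heapq.heappush(heap, (p1 + p2, counter, syms1 + syms2))
        counter += 1
    return lengths


# Hypothetical symbol statistics.
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
lengths = huffman_code_lengths(probs)
average_bits = sum(probs[s] * lengths[s] for s in probs)
```

For these probabilities the codeword lengths are 1, 2, 3, and 3 bits, giving an average of 1.75 bits per symbol compared with 2 bits for equal-length codewords.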

FIG. 2 is a schematic block diagram of a video encoder 20 according to an embodiment of the present application. Video encoder 20 may perform intra coding and inter coding of video blocks within a video slice. Intra coding relies on spatial prediction to reduce or remove the spatial redundancy of the video within a given video frame or image. Inter coding relies on temporal prediction to reduce or remove the temporal redundancy of the video within adjacent frames or images of a video sequence. The intra mode (I mode) may refer to any of several spatial-based compression modes. Inter modes such as unidirectional prediction (P mode) or bidirectional prediction (B mode) may refer to any of several temporal-based compression modes.

In the feasible embodiment of FIG. 2, the video encoder 20 includes a segmentation unit 35, a prediction unit 41, a reference image memory 64, a summer 50, a transform processing unit 52, a quantization unit 54, and an entropy encoding unit 56. The prediction unit 41 includes a motion estimation unit 42, a motion compensation unit 44, and an intra prediction unit 46. For video block reconstruction, the video encoder 20 also includes an inverse quantization unit 58, an inverse transform unit 60, and a summer 62. A deblocking filter (not shown in FIG. 2) may also be included to filter block boundaries in order to remove blocking artifacts from the reconstructed video. When needed, the deblocking filter will typically filter the output of the summer 62. In addition to the deblocking filter, additional loop filters (in-loop or post-loop) can be used.

As shown in FIG. 2, the video encoder 20 receives video data, and the segmentation unit 35 divides the data into video blocks. This segmentation may also include segmentation into slices, image blocks, or other larger units, as well as video block segmentation, for example, according to the quadtree structure of LCUs and CUs. Video encoder 20 generally illustrates the components that encode video blocks within a video slice to be encoded. In general, a slice can be divided into multiple video blocks (and possibly into collections of video blocks called image blocks).

The prediction unit 41 may select one of a plurality of possible coding modes for the current video block, such as one of a plurality of intra coding modes or one of a plurality of inter coding modes, based on a calculation of coding quality and cost (for example, the rate-distortion cost, RD cost). The prediction unit 41 may provide the resulting intra-coded or inter-coded block to the summer 50 to generate residual block data, and to the summer 62 to reconstruct the encoded block for use as a reference image.
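A common way to express a rate-distortion cost is the Lagrangian form J = D + λ·R, where D is the distortion, R is the number of bits, and λ is a multiplier. The sketch below selects a mode on that basis. The cost formula, the candidate names, and all numbers are assumptions for illustration; the text above does not specify how the cost is computed.

```python
def select_mode(candidates, lagrange_multiplier):
    """Pick the candidate with the lowest cost J = D + lambda * R."""
    return min(
        candidates,
        key=lambda c: c["distortion"] + lagrange_multiplier * c["bits"],
    )


# Hypothetical measurements for three candidate modes of one block.
candidates = [
    {"mode": "intra_dc", "distortion": 900.0, "bits": 20},
    {"mode": "intra_angular", "distortion": 400.0, "bits": 45},
    {"mode": "inter", "distortion": 250.0, "bits": 80},
]

best_at_high_lambda = select_mode(candidates, lagrange_multiplier=10.0)
best_at_low_lambda = select_mode(candidates, lagrange_multiplier=1.0)
```

With λ = 10 the costs are 1100, 850, and 1050, so the angular intra candidate is chosen; with λ = 1 the bit count matters less and the inter candidate is chosen instead.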

The motion estimation unit 42 and the motion compensation unit 44 within the prediction unit 41 perform inter-predictive coding of the current video block relative to one or more predictive blocks in one or more reference images to provide temporal compression. The motion estimation unit 42 may be configured to determine the inter prediction mode of a video slice according to a predetermined pattern of the video sequence. The predetermined pattern may designate the video slices in the sequence as P slices, B slices, or GPB slices. The motion estimation unit 42 and the motion compensation unit 44 may be highly integrated, but are described separately for conceptual purposes. The motion estimation performed by the motion estimation unit 42 is the process of generating motion vectors that estimate the motion of video blocks. For example, a motion vector may indicate the displacement of a PU of a video block within the current video frame or image relative to a predictive block within a reference image.

A predictive block is a block that is found to closely match the PU of the video block to be coded in terms of pixel difference. The pixel difference can be determined by the sum of absolute differences (SAD), the sum of squared differences (SSD), or other difference metrics. In some feasible implementations, the video encoder 20 may calculate values for sub-integer pixel positions of a reference image stored in the reference image memory 64. For example, video encoder 20 may interpolate values of quarter-pixel positions, eighth-pixel positions, or other fractional pixel positions of the reference image. Therefore, the motion estimation unit 42 may perform a motion search relative to the full pixel positions and the fractional pixel positions and output a motion vector with fractional pixel accuracy.
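The SAD and SSD metrics and a full-pixel block-matching search can be sketched as follows. This is a simplified illustration: the exhaustive search strategy, the function names, and the test image are assumptions, and fractional-pixel interpolation is omitted.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def ssd(a, b):
    """Sum of squared differences between two equally sized blocks."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def crop(image, top, left, height, width):
    return [row[left:left + width] for row in image[top:top + height]]


def full_search(block, reference, top, left, search_range, metric=sad):
    """Return (cost, (dy, dx)) of the best full-pixel match around (top, left)."""
    h, w = len(block), len(block[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > len(reference) or x + w > len(reference[0]):
                continue  # candidate falls outside the reference image
            cost = metric(block, crop(reference, y, x, h, w))
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    return best


# Hypothetical 6 x 6 reference image; the current block is a copy of the
# 2 x 2 region at row 2, column 3, and is searched for around (1, 1).
reference = [[10 * r + c for c in range(6)] for r in range(6)]
block = crop(reference, 2, 3, 2, 2)
match = full_search(block, reference, top=1, left=1, search_range=2)
```

Here the search finds a zero-cost match at a displacement of one row down and two columns right, which would be the full-pixel motion vector for this block.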

The motion estimation unit 42 calculates the motion vector of a PU of a video block in an inter-coded slice by comparing the position of the PU with the position of a predictive block of a reference image. The reference image can be selected from a first reference image list (list 0) or a second reference image list (list 1), each of which identifies one or more reference images stored in the reference image memory 64. The motion estimation unit 42 sends the calculated motion vector to the entropy encoding unit 56 and the motion compensation unit 44.

Motion compensation performed by the motion compensation unit 44 may involve extracting or generating a predictive block based on the motion vector determined by motion estimation, possibly performing interpolation to sub-pixel accuracy. After receiving the motion vector of the PU of the current video block, the motion compensation unit 44 can locate the predictive block to which the motion vector points in one of the reference image lists. Video encoder 20 forms a residual video block by subtracting the pixel values of the predictive block from the pixel values of the current video block being coded, thereby forming pixel difference values. The pixel difference values form the residual data of the block, and may include both luminance and chrominance difference components. The summer 50 represents the one or more components that perform this subtraction operation. Motion compensation unit 44 may also generate syntax elements associated with the video blocks and the video slice for use by video decoder 30 in decoding the video blocks of the video slice.
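The subtraction attributed to the summer 50 can be written as a short sketch. The function name and the sample values are hypothetical, and blocks are represented as plain lists of rows for simplicity.

```python
def residual_block(current, prediction):
    """Subtract the predictive block from the current block, pixel by pixel."""
    return [
        [c - p for c, p in zip(cur_row, pred_row)]
        for cur_row, pred_row in zip(current, prediction)
    ]


# Hypothetical 2 x 2 luminance blocks.
current = [[52, 55], [61, 59]]
prediction = [[50, 50], [60, 60]]
residual = residual_block(current, prediction)  # [[2, 5], [1, -1]]
```

Note that residual values can be negative, so they need a signed representation even when the pixel values themselves are unsigned.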

If the PU is located in a B slice, the image containing the PU can be associated with two reference image lists called "list 0" and "list 1". In some possible implementations, an image containing B slices may be associated with a list combination that is a combination of list 0 and list 1.

In addition, if the PU is located in a B slice, the motion estimation unit 42 may perform unidirectional prediction or bidirectional prediction for the PU. In some feasible implementations, bidirectional prediction is prediction based on the images of the reference image lists list 0 and list 1, respectively; in other feasible implementations, bidirectional prediction is prediction based on a reconstructed future frame and a reconstructed past frame, respectively, relative to the current frame in display order. When the motion estimation unit 42 performs unidirectional prediction for a PU, the motion estimation unit 42 may search the reference images of list 0 or list 1 for a reference block for the PU. The motion estimation unit 42 may then generate a reference index indicating the reference image in list 0 or list 1 that contains the reference block, and a motion vector indicating the spatial displacement between the PU and the reference block. The motion estimation unit 42 may output the reference index, a prediction direction identifier, and the motion vector as the motion information of the PU. The prediction direction identifier may indicate whether the reference index indicates a reference image in list 0 or in list 1. The motion compensation unit 44 may generate a predictive image block of the PU based on the reference block indicated by the motion information of the PU.

When the motion estimation unit 42 performs bidirectional prediction for a PU, the motion estimation unit 42 may search the reference images in list 0 for a reference block for the PU and may also search the reference images in list 1 for another reference block for the PU. The motion estimation unit 42 may then generate reference indexes indicating the reference images in list 0 and list 1 that contain the reference blocks, and motion vectors indicating the spatial displacements between the reference blocks and the PU. The motion estimation unit 42 may output the reference indexes and the motion vectors of the PU as the motion information of the PU. The motion compensation unit 44 may generate a predictive image block of the PU based on the reference blocks indicated by the motion information of the PU.

In some feasible implementations, the motion estimation unit 42 does not output the complete set of motion information for the PU to the entropy encoding unit 56. Instead, the motion estimation unit 42 may signal the motion information of the PU with reference to the motion information of another PU. For example, the motion estimation unit 42 may determine that the motion information of the PU is sufficiently similar to the motion information of a neighboring PU. In this embodiment, the motion estimation unit 42 may indicate, in the syntax structure associated with the PU, an indication value that indicates to the video decoder 30 that the PU has the same motion information as the neighboring PU or has motion information that can be derived from the neighboring PU. In another embodiment, the motion estimation unit 42 may identify, in the syntax structure associated with the PU, a candidate prediction motion vector associated with a neighboring PU and a motion vector difference (MVD). The MVD indicates the difference between the motion vector of the PU and the indicated candidate prediction motion vector associated with the neighboring PU. Video decoder 30 may use the indicated candidate prediction motion vector and the MVD to determine the motion vector of the PU.
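The relationship between a motion vector, a candidate prediction motion vector, and the MVD can be illustrated with a small sketch. The candidate selection rule (smallest sum of absolute MVD components) and all names and values are assumptions for the example; an actual encoder may choose the candidate differently.

```python
def encode_mvd(mv, candidates):
    """Return (candidate index, MVD) for the candidate closest to mv."""
    def cost(i):
        return abs(mv[0] - candidates[i][0]) + abs(mv[1] - candidates[i][1])

    index = min(range(len(candidates)), key=cost)
    mvp = candidates[index]
    # MVD = motion vector of the PU minus the candidate prediction motion vector
    return index, (mv[0] - mvp[0], mv[1] - mvp[1])


def decode_mv(index, mvd, candidates):
    """Decoder side: motion vector = indicated candidate + MVD."""
    mvp = candidates[index]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])


# Hypothetical candidates taken from neighboring PUs, as (x, y) components.
candidates = [(4, -2), (10, 3)]
index, mvd = encode_mvd((9, 4), candidates)
recovered = decode_mv(index, mvd, candidates)
```

In this example the second candidate is selected, the MVD is (-1, 1), and the decoder recovers the original motion vector (9, 4) from the candidate index and the MVD alone.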

As described above, the prediction unit 41 may generate a list of candidate prediction motion vectors for each PU of the CU. One or more of the candidate prediction motion vector lists may include one or more original candidate prediction motion vectors and one or more additional candidate prediction motion vectors derived from the original candidate prediction motion vectors.

The intra prediction unit 46 within the prediction unit 41 may perform intra-predictive coding of the current video block relative to one or more neighboring blocks in the same image or slice as the current block to be coded, to provide spatial compression. Therefore, as an alternative to the inter prediction performed by the motion estimation unit 42 and the motion compensation unit 44 (as described above), the intra prediction unit 46 may intra-predict the current block. In particular, the intra prediction unit 46 may determine the intra prediction mode to use to encode the current block. In some feasible implementations, the intra prediction unit 46 may, for example, encode the current block using various intra prediction modes during separate encoding passes, and the intra prediction unit 46 (or, in some feasible implementations, the mode selection unit 40) may select an appropriate intra prediction mode to use from the tested modes.

After the prediction unit 41 generates a predictive block for the current video block via inter prediction or intra prediction, the video encoder 20 forms a residual video block by subtracting the predictive block from the current video block. The residual video data in the residual block may be included in one or more TUs and applied to the transform processing unit 52. The transform processing unit 52 transforms the residual video data into residual transform coefficients using a transform such as a discrete cosine transform (DCT) or a conceptually similar transform (e.g., a discrete sine transform, DST). The transform processing unit 52 may transform the residual video data from the pixel domain to a transform domain (for example, the frequency domain).
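As an illustration of moving residual data from the pixel domain to the frequency domain, the sketch below applies a separable, orthonormal floating-point DCT-II to a small block. This is an assumption chosen for clarity; practical codecs use integer approximations of such transforms, and the function names are hypothetical.

```python
import math


def dct_1d(values):
    """Orthonormal DCT-II of a one-dimensional sequence."""
    n = len(values)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(
            v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, v in enumerate(values)
        ))
    return out


def dct_2d(block):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


# A flat 4 x 4 residual block: all of its energy lands in the DC coefficient.
flat_residual = [[3] * 4 for _ in range(4)]
coefficients = dct_2d(flat_residual)
```

For the flat block, the top-left (DC) coefficient is approximately 12 and every other coefficient is approximately zero, which shows how a transform can compact smooth residuals into a few coefficients.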

The transform processing unit 52 may send the resulting transform coefficients to the quantization unit 54. The quantization unit 54 quantizes the transform coefficients to further reduce the bit rate. The quantization process may reduce the bit depth associated with some or all of the coefficients. The degree of quantization can be modified by adjusting the quantization parameter. In some feasible implementations, the quantization unit 54 may then perform a scan of the matrix containing the quantized transform coefficients. Alternatively, the entropy encoding unit 56 may perform the scan.
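The effect of the quantization parameter can be sketched under the assumption of an H.264 / HEVC-style relation in which the quantization step size doubles for every increase of 6 in the quantization parameter (QP). The formula, the function names, and the coefficient values are assumptions for illustration and are not stated in this application; real codecs implement the division with integer scaling tables.

```python
def quantization_step(qp):
    """Assumed step size: doubles every 6 QP, equal to 1 at QP = 4."""
    return 2.0 ** ((qp - 4) / 6.0)


def quantize(coefficients, qp):
    """Uniform scalar quantization: divide by the step and round."""
    step = quantization_step(qp)
    return [int(round(c / step)) for c in coefficients]


# Hypothetical transform coefficients quantized at two different QPs.
coefficients = [101.0, -37.0, 5.0, 1.0]
levels_qp22 = quantize(coefficients, 22)  # step 8  -> [13, -5, 1, 0]
levels_qp34 = quantize(coefficients, 34)  # step 32 -> [3, -1, 0, 0]
```

Raising the QP from 22 to 34 makes the levels smaller and turns more of them into zeros, which lowers the bit rate at the expense of a larger reconstruction error.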

After quantization, the entropy encoding unit 56 may entropy encode the quantized transform coefficients. For example, the entropy encoding unit 56 may perform context adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), syntax-based context adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding method or technique. The entropy encoding unit 56 may also entropy encode the motion vectors and other syntax elements of the current video slice being coded. After entropy encoding by the entropy encoding unit 56, the encoded code stream may be transmitted to the video decoder 30, or archived for later transmission or retrieval by the video decoder 30.

The entropy encoding unit 56 may encode information indicating a selected intra prediction mode according to the technique of the present application. The video encoder 20 may include, in the transmitted code stream, configuration data that may include multiple intra prediction mode index tables and multiple modified intra prediction mode index tables (also known as codeword mapping tables), definitions of the encoding contexts of various blocks, and, for each of the contexts, an indication of the MPM, the intra prediction mode index table, and the modified intra prediction mode index table to use.

The inverse quantization unit 58 and the inverse transform unit 60 respectively apply inverse quantization and inverse transform to reconstruct a residual block in the pixel domain for later use as a reference block of a reference image. The motion compensation unit 44 may calculate a reference block by adding a residual block to a predictive block of one of the reference pictures within one of the reference picture lists. The motion compensation unit 44 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for motion estimation. The summer 62 adds the reconstructed residual block and the motion-compensated prediction block generated by the motion compensation unit 44 to generate a reference block for storage in the reference image memory 64. The reference block may be used by the motion estimation unit 42 and the motion compensation unit 44 as a reference block to inter-predict a block in a subsequent video frame or image.

In the embodiment of the present application, after the inverse transform unit 60 obtains residual data, a scaling factor may be calculated according to the reconstructed spatial neighborhood pixel information around the current block to be encoded, and the obtained residual is scaled using the scaling factor to obtain residual data for subsequent reconstruction of reference blocks or reference pixels.

FIG. 3 is a schematic block diagram of a video decoder 30 in an embodiment of the present application. In the feasible embodiment of FIG. 3, the video decoder 30 includes an entropy decoding unit 80, a prediction unit 81, an inverse quantization unit 86, an inverse transform unit 88, a summer 90, and a reference image memory 92. The prediction unit 81 includes a motion compensation unit 82 and an intra prediction unit 84. In some feasible implementations, the video decoder 30 may perform a decoding flow that is generally reciprocal to the encoding flow described with respect to the video encoder 20 of FIG. 2.

During the decoding process, the video decoder 30 receives from the video encoder 20 an encoded video code stream representing the video blocks of an encoded video slice and the associated syntax elements. The entropy decoding unit 80 of the video decoder 30 entropy decodes the code stream to generate quantized coefficients, motion vectors, and other syntax elements. The entropy decoding unit 80 forwards the motion vectors and other syntax elements to the prediction unit 81. The video decoder 30 may receive syntax elements at the video slice level and/or the video block level.

When a video slice is coded as an intra-coded (I) slice, the intra prediction unit 84 of the prediction unit 81 may generate the prediction data of a video block of the current video slice based on a signaled intra prediction mode and data from previously decoded blocks of the current frame or image.

When the video image is coded as an inter-coded (e.g., B, P, or GPB) slice, the motion compensation unit 82 of the prediction unit 81 generates a predictive block of a video block of the current video image based on the motion vectors and other syntax elements received from the entropy decoding unit 80. The predictive block may be generated from one of the reference pictures within one of the reference picture lists. The video decoder 30 may construct the reference image lists (List 0 and List 1) using a default construction technique based on the reference images stored in the reference image memory 92.

The motion compensation unit 82 determines the prediction information of the video block of the current video slice by parsing the motion vectors and other syntax elements, and uses the prediction information to generate the predictive block of the current video block being decoded. For example, the motion compensation unit 82 uses some of the received syntax elements to determine the prediction mode (e.g., intra prediction or inter prediction) used to code the video blocks of the video slice, the inter prediction slice type (e.g., B slice, P slice, or GPB slice), construction information for one or more of the reference image lists of the slice, the motion vector of each inter-coded video block of the slice, the inter prediction state of each inter-coded video block of the slice, and other information used to decode the video blocks in the current video slice.

The motion compensation unit 82 may also perform interpolation based on the interpolation filter. The motion compensation unit 82 may calculate an interpolation value of the sub-integer pixels of the reference block using an interpolation filter as used by the video encoder 20 during encoding of the video block. In this application, the motion compensation unit 82 may determine an interpolation filter used by the video encoder 20 from the received syntax elements and use the interpolation filter to generate a predictive block.

If the PU is encoded using inter prediction, the motion compensation unit 82 may generate a list of candidate prediction motion vectors for the PU. The codestream may include data identifying the position of the selected candidate prediction motion vector in the candidate prediction motion vector list of the PU. After generating a list of candidate prediction motion vectors for the PU, the motion compensation unit 82 may generate a predictive image block for the PU based on one or more reference blocks indicated by the motion information of the PU. A reference block of a PU may be in a temporal image different from the PU. The motion compensation unit 82 may determine the motion information of the PU based on the selected motion information in the candidate prediction motion vector list of the PU.

The inverse quantization unit 86 performs inverse quantization (i.e., dequantization) on the quantized transform coefficients provided in the code stream and decoded by the entropy decoding unit 80. The inverse quantization process may include using the quantization parameter calculated by the video encoder 20 for each video block in the video slice to determine the degree of quantization and, likewise, the degree of inverse quantization that should be applied. The inverse transform unit 88 applies an inverse transform (for example, an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process) to the transform coefficients to generate a residual block in the pixel domain. Corresponding to the encoding end, in the embodiment of the present application, after the inverse transform unit 88 obtains residual data, a scaling factor can be calculated according to the reconstructed spatial neighborhood pixel information around the current block to be decoded, and the obtained residual is scaled using the scaling factor to obtain residual data for subsequent reconstruction of the block to be decoded.
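The scaling of the residual described above can be sketched as follows. This passage does not state how the factor is applied, so the sketch assumes plain multiplication followed by rounding to the nearest integer; the function name `scale_residual` and the sample values are hypothetical.

```python
def scale_residual(residual, factor):
    """Scale each residual sample by `factor`.

    Assumption: the scaling is a multiplication followed by rounding.
    Rounding is applied to the magnitude (half away from zero) so that
    positive and negative samples are treated symmetrically.
    """
    scaled = []
    for row in residual:
        out_row = []
        for r in row:
            mag = int(abs(r) * factor + 0.5)
            out_row.append(mag if r >= 0 else -mag)
        scaled.append(out_row)
    return scaled


# A 2x2 first residual block scaled by a factor of 1.5.
second_residual = scale_residual([[4, -4], [1, 0]], 1.5)
```

Because the encoder and decoder would derive the same factor from the same reconstructed neighborhood, both ends would obtain the same scaled residual under this assumption.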

After the motion compensation unit 82 generates a predictive block of the current video block based on the motion vectors and other syntax elements, the video decoder 30 forms a decoded video block by summing the residual block from the inverse transform unit 88 with the corresponding predictive block generated by the motion compensation unit 82. The summer 90 represents the one or more components that perform this summing operation. When needed, a deblocking filter may also be applied to filter the decoded blocks in order to remove blocking artifacts. Other loop filters (in or after the decoding loop) can also be used to smooth pixel transitions or otherwise improve video quality. The decoded video blocks in a given frame or image are then stored in the reference image memory 92, which stores reference images for subsequent motion compensation. The reference image memory 92 also stores decoded video for later presentation on a display device such as the display device 32 of FIG. 1.
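The summing operation attributed to the summer 90 can be illustrated with a short sketch. Clipping the result to the valid sample range is an assumption added here (it is common practice but is not stated in this passage), and `reconstruct_block` is a hypothetical name.

```python
def reconstruct_block(prediction, residual, bit_depth=8):
    """Add a residual block to a predictive block, sample by sample.

    Assumption: the sum is clipped to [0, 2^bit_depth - 1].
    """
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]


decoded = reconstruct_block([[100, 250], [5, 128]], [[10, 10], [-10, 0]])
```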

It should be understood that the techniques of this application may be performed by any of the video coders described in this application, including, for example, the video encoder 20 and the video decoder 30 shown and described with respect to FIGS. 1-3. That is, in one feasible implementation, the inverse transform unit 60 described with respect to FIG. 2, or another newly added functional unit, may perform the specific techniques described below after the inverse transform is performed during encoding of a block of video data. In another feasible implementation, the inverse transform unit 88 described with respect to FIG. 3, or another newly added functional unit, may perform the specific techniques described below during decoding of a block of video data. Thus, a reference to a generic "video encoder" or "video decoder" may include the video encoder 20, the video decoder 30, or another video encoding or decoding unit.

FIG. 4 schematically illustrates a flowchart of a method for acquiring a residual according to an embodiment of the present application. For example, the method may be performed by the video decoder 30. The video decoding method is described as a series of steps or operations. It should be understood that the steps may be performed in various orders and/or simultaneously, and are not limited to the execution order shown in FIG. 4. Assume that a video decoder is being used on a video data stream having multiple video frames; the following steps are then performed to decode the current image block to be processed in the current video frame. It should also be understood that, in the embodiment of the present application, the adjustment factor is derived from the pixel information in the preset spatial neighborhood of the current block to be processed. Because this pixel information is the same at the encoding end and the decoding end, the adjustment factors are the same and the residual data is adjusted correspondingly at both ends. Those skilled in the art can understand that, in general, encoding is the inverse process of decoding. Therefore, the technical solution of the embodiment of the present application can also be executed by the video encoder 20 at the encoding end, and is not described again.

S401. Parse a bitstream to obtain a transform coefficient of a block to be processed.

This step belongs to entropy decoding technology. Specifically, according to a preset parsing rule, a syntax element represented in bit form (binary values) in the code stream is parsed into the actual value of that syntax element. Here the parsing concerns the transform coefficients: the binary representation of the transform coefficients in the code stream is parsed into specific transform coefficient values according to the parsing rules for transform coefficients. It should be understood that the multiple transform coefficients of the block to be processed are parsed sequentially. Generally, the number of transform coefficients acquired for the block to be processed equals the number of pixels of the block, and the parsed transform coefficients are arranged into a transform coefficient block according to a preset positional relationship. This process is often called inverse scanning (or scanning). The preset positional relationship may be a fixed, preset mapping of transform coefficient positions, or a mapping determined according to a preset rule, for example a mapping determined according to the intra prediction mode (also referred to as the intra-prediction-mode-dependent scan mode of the transform coefficients).
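Arranging sequentially parsed coefficients into a transform coefficient block can be sketched as follows. This is a minimal illustration that assumes a simple up-right diagonal order as the preset positional relationship; the names `diagonal_scan_order` and `inverse_scan` are hypothetical, and a real codec defines its own scan tables.

```python
def diagonal_scan_order(width, height):
    """Return (row, col) positions in an up-right diagonal order.

    Assumption: this order merely stands in for the preset positional
    relationship between parsing order and block position.
    """
    order = []
    for d in range(width + height - 1):
        # Walk each anti-diagonal from bottom-left to top-right.
        for row in range(min(d, height - 1), -1, -1):
            col = d - row
            if col < width:
                order.append((row, col))
    return order


def inverse_scan(coeffs, width, height):
    """Place sequentially parsed coefficients into a height x width block."""
    order = diagonal_scan_order(width, height)
    if len(coeffs) != len(order):
        raise ValueError("expected one coefficient per pixel of the block")
    block = [[0] * width for _ in range(height)]
    for value, (row, col) in zip(coeffs, order):
        block[row][col] = value
    return block


block = inverse_scan([9, 5, 4, 3, 2, 1, 0, 0, 0], width=3, height=3)
```

A mode-dependent scan would simply swap in a different order table for the same placement loop.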

For typical entropy decoding techniques, including the CABAC mentioned above, refer to the H.265 standard (Rec. ITU-T H.265 v4), pages 201 to 243, section 9.3. JEM has also improved the CABAC technology; for details, refer to JVET-G1001-v1, pages 41 to 43, section 2.6. The embodiment of the present application does not limit which entropy decoding technology is used.

S402. Transform the transform coefficient into a first residual of the block to be processed.

It should be understood that when the transform coefficients of the block to be processed are arranged into a transform coefficient block, correspondingly, converting the transform coefficients into the first residual of the block to be processed includes: converting the transform coefficient block into a first residual block of the block to be processed; and, correspondingly, adjusting the first residual based on the adjustment factor to obtain a second residual of the block to be processed includes: adjusting the first residual block based on the adjustment factor to obtain a second residual block of the block to be processed.

Generally, this step can be divided into two sub-steps:

S4021: Inverse quantize each transform coefficient in the transform coefficient block to obtain an inverse quantized transform coefficient block.

In one example, inverse quantization is performed on the quantized transform coefficient A(i) to obtain a reconstructed transform coefficient R(i), which can be described as:

R(i) = sign{A(i)} · round{A(i) · Qs(i) + o2(i)}

The quantization step size Qs(i) can be a floating-point number, and o2(i) is a rounding offset. In some feasible implementations, to avoid floating-point operations, integer addition and shifting are used to approximate the floating-point multiplication. For example, H.265/HEVC approximates the inverse quantization process expressed by the above formula as:

R(i) = sign{A(i)} · (A(i) · Qs'(i) + (1 << (bdshift - 1))) >> bdshift

where bdshift is the shift parameter, Qs'(i) is an integer, and Qs'(i) / 2^bdshift approximates the quantization step size Qs(i) in the above formula. In this case o2(i) = 0.5, and the rounding mode is rounding down.

In one example, Qs'(i) is determined by the level scale l(i) and the scaling factor m(i):

Qs'(i) = m(i) · l(i)

Here l(i) is a function of the quantization parameter (QP), that is,

l(i) = levelScale[QP(i) % 6] << ⌊QP(i) / 6⌋

where the level scale table is levelScale[k] = {40, 45, 51, 57, 64, 72}, k = 0, 1, ..., 5; ⌊QP(i) / 6⌋ denotes QP(i) divided by 6 and rounded down; and % is the remainder operation.

In particular, when the product of the width and height of a transform block is an odd power of 2, R(i) can also be obtained by the following formula:

R(i) = sign{A(i)} · (A(i) · Qs'(i) · a + (1 << (bdshift - 1 + s))) >> (bdshift + s)

where a and s are preset constants with a / 2^s ≈ 1/√2; for example, a = 181 and s = 8.

It should be understood that, in this text, the symbol << indicates a left shift operation, and the symbol >> indicates a right shift operation, which will not be described again.
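The integer inverse quantization described in this step can be sketched as follows. This is a minimal sketch under stated assumptions: the level scale is computed in the HEVC manner from the table levelScale and the quantization parameter, the defaults m = 16 and bdshift = 10 are illustrative values only, and the sign is applied separately to the magnitude of A(i). The function names are hypothetical.

```python
LEVEL_SCALE = [40, 45, 51, 57, 64, 72]  # levelScale[k], k = 0..5


def level_scale(qp):
    """l(i): table entry for QP % 6, shifted left by floor(QP / 6)."""
    return LEVEL_SCALE[qp % 6] << (qp // 6)


def dequantize(a, qp, m=16, bdshift=10, odd_power_of_two=False,
               a_const=181, s=8):
    """Integer inverse quantization of one quantized coefficient `a`.

    Assumptions: m and bdshift are illustrative defaults, and the shift
    formulas operate on |a| with the sign restored afterwards.
    """
    sign = -1 if a < 0 else 1
    qs = m * level_scale(qp)  # Qs'(i) = m(i) * l(i)
    if odd_power_of_two:
        # Extra factor a_const / 2^s; 181 / 256 is close to 1 / sqrt(2).
        mag = (abs(a) * qs * a_const + (1 << (bdshift - 1 + s))) >> (bdshift + s)
    else:
        mag = (abs(a) * qs + (1 << (bdshift - 1))) >> bdshift
    return sign * mag


r = dequantize(3, qp=22)
r_odd = dequantize(3, qp=22, odd_power_of_two=True)
```

With these illustrative parameters, the second result is roughly the first divided by √2, which is the purpose of the constants a and s.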

This step is generally called inverse quantization or scaling. In the H.265 standard, inverse quantization is performed using scalar quantization. For details, see JCTVC-M1002-v1 (available from http://phenix.int-evry.fr/jct/), page 20, section 3.5.5, or the H.265 standard, pages 173 to 177, section 8.6; the details are not repeated here. It should also be understood that vector quantization can be used for inverse quantization as well. The embodiment of the present application does not limit which inverse quantization technology is used.

S4022. Inverse transform the inverse-quantized transform coefficient block to obtain a first residual block of the block to be processed.

This step is generally called an inverse transform. Typical inverse transform techniques include the inverse discrete cosine transform (IDCT) and the inverse discrete sine transform (IDST) in H.265, more specifically, for example, a DCT-II type inverse transform or a DST-VII type inverse transform; a DCT-VIII type inverse transform or a DST-I type inverse transform may also be used. As another example, an inverse transform may be determined from the transform mode information of the transform block and then used for the inverse transform processing, as in the Adaptive Multiple Core Transform (AMT) in JEM. The inverse transform processing may also include first applying a non-separable secondary transform to part of the inverse-quantized transform coefficients to obtain a new set of transform coefficients, such as the NSST (Non-Separable Secondary Transform) processing in JEM, and then applying an inverse transform based on the discrete cosine transform or the discrete sine transform to this new set of transform coefficients. For details of the transform technology, refer to JCTVC-M1002-v1, pages 18 to 20, section 3.5. JEM has also improved the transform and inverse transform technologies; for details, refer to JVET-G1001-v1, pages 28 to 35, section 2.4, which are not repeated here. The embodiment of the present application does not limit which inverse transform technology is used.
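As an illustration of the simplest case named above, the following sketch applies a separable inverse DCT-II to a small coefficient block. It uses the orthonormal floating-point definition for clarity; practical codecs use integer approximations of the transform, and the names `idct_1d` and `idct_2d` are hypothetical.

```python
import math


def idct_1d(coeffs):
    """Orthonormal inverse DCT-II of a 1-D list of coefficients."""
    n = len(coeffs)
    samples = []
    for x in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * c * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
        samples.append(total)
    return samples


def idct_2d(block):
    """Separable 2-D inverse: transform the rows, then the columns."""
    rows = [idct_1d(row) for row in block]
    cols = [idct_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


# A block holding only a DC coefficient yields a flat residual block.
residual = idct_2d([[8.0, 0.0], [0.0, 0.0]])
```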

It should be understood that, in some feasible implementations, corresponding to the processing at the encoding end, the process of converting the transform coefficients into the first residual of the block to be processed may involve only inverse quantization (in which case the transform coefficients are actually quantized residual values) or only an inverse transform, which is not limited in the embodiment of the present application.

S403. Calculate pixel information in a preset spatial neighborhood of the block to be processed based on pixel values in the preset spatial neighborhood of the block to be processed.

It should be understood that this step only needs to use the pixel information in the preset spatial neighborhood of the block to be processed, and does not need to wait for the completion of steps S401 and S402. Similarly, steps S401 and S402 need not wait for the completion of step S403. That is, there is no sequential relationship.

Specifically, this step can be divided into two sub-steps:

S4031. Obtain one or more pixel sets in the preset spatial neighborhood.

First, the concept of a spatial neighborhood is introduced: the pixels in the spatial neighborhood of the current to-be-processed (to-be-decoded) image block are pixels in the same frame as the current to-be-processed image block. With reference to FIG. 5, the spatial neighborhood pixels of the current image block to be processed may include reconstructed values of at least one pixel in the spatial neighborhood Y of the image block X (also referred to as the image region X). Specifically, the spatial neighborhood pixels may include M pixels, where M is a positive integer. Several alternative examples of the spatial neighborhood Y include:

As shown in FIGS. 5(a)-5(d), the image block X (indicated by the solid line) corresponds to a w × h coding unit (that is, the coding unit is w pixels wide and h pixels high; at the decoding end, the coding unit can also be called a decoding unit, a decoding block, etc.), and the spatial neighborhood Y (indicated by a dotted line) is constituted in one of the following four ways:

1) Method 1: w × n pixels above X, m × h pixels to the left of X, and m × n pixels to the upper left of X, as shown in FIG. 5(a); in this case M = w × n + m × h + m × n.

2) Method 2: w × n pixels above X and m × h pixels to the left of X, as shown in FIG. 5(b).

3) Method 3: w × n pixels above X, m × h pixels to the left of X, and m × h pixels to the right of X, as shown in FIG. 5(c).

4) Method 4: w × n pixels above X, w × n pixels below X, m × h pixels to the left of X, and m × h pixels to the right of X, as shown in FIG. 5(d).

As shown in FIGS. 5(e)-5(f), the image block X corresponds to a w × h region in a wc × hc coding unit C (indicated by a dotted line), and the spatial neighborhood Y is constituted, for example, in one of the following two ways:

1) Method 1: wc × n pixels above the coding unit C to which X belongs, and m × hc pixels to the left of C, as shown in FIG. 5(e).

2) Method 2: wc × n pixels above the coding unit C to which X belongs, m × hc pixels to the left of C, and m × hc pixels to the right of C, as shown in FIG. 5(f).

Here, m and n are preset constants, for example, m = n = 1, or m = n = 2, or m = 2, n = 1, or m = 1, n = 2. m and n may also depend on the size of the image block X; for example, when the width of the image block X is less than or equal to a first threshold (for example, 8), n = 2, and when the width of the image block X is greater than the first threshold (for example, 8), n = 1. The spatial neighborhood pixels may be all pixels in the spatial neighborhood Y, or may be a subset of pixels sampled from the spatial neighborhood Y, which is not limited in the present invention.
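The constructions of FIG. 5(a) and FIG. 5(b) can be sketched as a function that lists the pixel positions of the neighborhood Y. This is an illustrative sketch only: the coordinate convention (x to the right, y downward, block X with its top-left pixel at (x0, y0)) and the names `neighborhood_coords` and `choose_n` are assumptions, and positions that fall outside the frame are simply skipped.

```python
def choose_n(block_width, first_threshold=8):
    """Pick n from the block width, following the example in the text."""
    return 2 if block_width <= first_threshold else 1


def neighborhood_coords(x0, y0, w, h, m=1, n=1, include_corner=False):
    """Coordinates (x, y) of the spatial neighborhood Y of block X.

    Covers the w x n strip above X and the m x h strip to its left
    (as in FIG. 5(b)); with include_corner=True the m x n upper-left
    corner is added (as in FIG. 5(a)).
    """
    coords = []
    for y in range(y0 - n, y0):
        for x in range(x0, x0 + w):      # strip above X
            coords.append((x, y))
    for y in range(y0, y0 + h):
        for x in range(x0 - m, x0):      # strip to the left of X
            coords.append((x, y))
    if include_corner:
        for y in range(y0 - n, y0):
            for x in range(x0 - m, x0):  # upper-left corner
                coords.append((x, y))
    # Skip positions with negative coordinates (outside the frame).
    return [(x, y) for (x, y) in coords if x >= 0 and y >= 0]


y_region = neighborhood_coords(8, 8, w=4, h=4, m=2, n=1, include_corner=True)
```

For the call shown, the list length equals M = w × n + m × h + m × n = 14.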

In some feasible implementations, before acquiring the one or more pixel sets in the preset spatial neighborhood, the method further includes: determining that all pixels in each of the one or more pixel sets have been reconstructed.

Specifically, it is checked whether the pixels in the spatial neighborhood have been reconstructed, and the brightness values of the reconstructed pixels in the spatial neighborhood are obtained. For example, for the spatial neighborhood structure shown in FIG. 5(b), the pixels on the left and upper sides of the image region X are each checked for reconstruction, and the brightness values of the reconstructed pixels in these regions are obtained. As another example, for the spatial neighborhood structure shown in FIG. 5(c), the pixels on the left, upper, and right sides of the image region X are each checked for reconstruction, and the brightness values of the reconstructed pixels in these regions are obtained. As yet another example, for the spatial neighborhood structure shown in FIG. 5(c), the pixels on the left, upper, and right sides of the image region X are each checked for reconstruction: if the pixels on the left and right sides have been reconstructed but the pixels on the upper side have not, the brightness values of the pixels on the left and right sides are obtained; if the pixels on all three sides have been reconstructed, the brightness values of the pixels on the left and upper sides are obtained; and if the pixels on the left and upper sides have been reconstructed but the pixels on the right side have not, the brightness values of the pixels on the left and upper sides are obtained.

Regarding acquiring one or more pixel sets in the preset spatial neighborhood: the entire region Y can be understood as one preset spatial neighborhood in which the pixels on the left, upper, and right sides of X each constitute a pixel set; alternatively, the left side, upper side, and right side of X can each be regarded as a preset spatial neighborhood. It should be understood that a pixel set may include only one pixel, or may include all pixels in a preset spatial neighborhood.

Optionally, if the number of reconstructed pixels in the spatial neighborhood Y is less than a threshold, the adjustment factor may be set to a preset constant, and S4032 and S404 need not be performed. The threshold is, for example, 16, or, as another example, 1/4 of the number of pixels included in the spatial neighborhood Y.
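The availability check and the fallback to a preset constant can be sketched as follows. The data layout (a 2-D list of brightness values plus a 2-D list of booleans marking reconstructed pixels), the default constant 1.0, and the function names are assumptions made for illustration.

```python
def collect_reconstructed(luma, reconstructed, coords):
    """Return brightness values of neighborhood pixels already reconstructed.

    `luma[y][x]` holds brightness values and `reconstructed[y][x]` is True
    once the pixel at (x, y) has been reconstructed.
    """
    values = []
    for x, y in coords:
        if reconstructed[y][x]:
            values.append(luma[y][x])
    return values


def adjustment_factor(values, compute, threshold=16, default=1.0):
    """Use the preset constant when too few reconstructed pixels exist.

    `compute` is any callable that maps the pixel values to a factor;
    it is skipped entirely when the fallback applies.
    """
    if len(values) < threshold:
        return default
    return compute(values)


luma = [[10, 20], [30, 40]]
done = [[True, False], [True, True]]
vals = collect_reconstructed(luma, done, [(0, 0), (1, 0), (0, 1)])
qc = adjustment_factor(vals, compute=lambda v: 2.0)
```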

S4032. Calculate the mean and / or dispersion of pixels in the one or more pixel sets to obtain pixel information in the preset spatial neighborhood.

In the embodiment of the present invention, in order to adjust the residual adaptively, the spatial neighborhood pixel information of the current block to be processed (i.e., the transform block) is used to approximate the original pixel information corresponding to the current block to be processed. The statistical characteristics of the spatial neighborhood pixel information are numerical results obtained by statistically analyzing the pixel values of multiple pixels in the spatial neighborhood, and may include at least the pixel mean P_avg and/or the pixel dispersion P_con. These statistical characteristics reflect, to some extent, the characteristics of the background area in which the current image block is located (such as background brightness and background contrast).

The average of the brightness values (i.e., the luminance components) of K1 pixels in the spatial neighborhood pixel information is P_avg, referred to simply as the pixel mean, that is:

P_avg = (1 / K1) · Σ_{k=1}^{K1} P(k)

where P(k) is the brightness value (i.e., the luminance component) of a pixel in the spatial neighborhood, and K1 is a positive integer less than or equal to M, for example K1 = M/2 or K1 = M, M being the number of spatial neighborhood pixels.

The mean absolute difference (MAD) between the brightness values of K2 pixels in the spatial neighborhood pixel information and the pixel mean P_avg can be used as a representation of the dispersion P_con, that is:

P_con = (1 / K2) · Σ_{k=1}^{K2} |P(k) - P_avg|

where K1 and K2 are both positive integers less than or equal to M. K1 may be equal to K2, or K1 > K2; for example, K1 = M/2 or M, and K2 = M/4 or M.

It should be understood that the dispersion can also be expressed by other means such as the mean square error sum, the variance or standard deviation, and the correlation between pixels, without limitation. At the same time, the pixel information in the preset spatial neighborhood can also be represented by other physical quantities related to the pixel values in the spatial neighborhood besides the mean and the dispersion, without limitation.
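The mean and the mean absolute difference can be computed as in the sketch below. Which K1 and K2 pixels are selected is not specified in the text, so the sketch assumes the first K1 and K2 values of the list; `neighborhood_stats` is a hypothetical name.

```python
def neighborhood_stats(values, k1=None, k2=None):
    """Return (P_avg, P_con) for a list of brightness values.

    P_avg is the mean of the first k1 values and P_con is the mean
    absolute difference of the first k2 values from P_avg. Taking the
    first k1 / k2 values is an assumption made for this sketch.
    """
    k1 = len(values) if k1 is None else k1
    k2 = len(values) if k2 is None else k2
    if not (1 <= k2 <= k1 <= len(values)):
        raise ValueError("expected 1 <= K2 <= K1 <= M")
    p_avg = sum(values[:k1]) / k1
    p_con = sum(abs(v - p_avg) for v in values[:k2]) / k2
    return p_avg, p_con


p_avg, p_con = neighborhood_stats([10, 20, 30, 60])
```

For the four sample values, the mean is 30 and the absolute differences 20, 10, 0, and 30 average to 15.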

S404. Determine an adjustment factor of the block to be processed according to pixel information in a preset spatial neighborhood of the block to be processed.

It should be understood that the specific representation manner of the pixel information calculated in step S403 is consistent with step S404. Exemplarily, when only the pixel average value is used to determine the adjustment factor of the block to be processed, in step S403, only the pixel average value needs to be calculated, and the pixel dispersion does not need to be calculated.

In a feasible implementation, the pixel information is the average value, and determining the adjustment factor of the block to be processed according to the pixel information in the preset spatial neighborhood of the block to be processed includes: determining the adjustment factor according to the average value and a first mapping relationship between the average value and the adjustment factor, where the first mapping relationship satisfies one or more of the following conditions: when the average value is less than a first threshold, the adjustment factor decreases as the average value increases; when the average value is greater than a second threshold, the adjustment factor increases as the average value increases, where the first threshold is less than or equal to the second threshold; and when the average value is greater than or equal to the first threshold and less than or equal to the second threshold, the adjustment factor is a first preset constant.

Specifically, the adjustment factor is calculated according to a first piecewise function f₁(P_avg) of the pixel mean P_avg, where the pixel mean P_avg is obtained in step S403.

The adjustment factor QC is determined by the first piecewise function f₁(P_avg) of the pixel mean P_avg, that is, QC = f₁(P_avg)^β, where β > 0, for example β = 1 or 0.5. f₁(P_avg) is a U-shaped function of P_avg: when P_avg is less than the threshold T1, the first derivative of f₁(P_avg) is less than 0; when P_avg is greater than the threshold T2, the first derivative of f₁(P_avg) is greater than 0; and when P_avg is between the thresholds T1 and T2, f₁(P_avg) is equal to the constant C0. Here T1 ≥ 0, T2 ≥ 0, and T2 ≥ T1; T1 is, for example, 0, 60, 64, or 128; T2 is, for example, 0, 80, 128, or 170; C0 is a positive real number, for example, 0.5, 0.75, 1, 1.5, 8, 16, 256, or 1024. More specifically, the function f₁(P_avg) is, for example,

where η₁ is a positive real number, such as η₁ = 150 or 200.8, and η₂ is a positive real number, such as η₂ = 425 or 485.5. As another example, the function f₁(P_avg) is

where η₃ is a positive real number, such as η₃ = 425, 256, or 135.1.

It should be understood that the first mapping relationship may be the first piecewise function f₁(P_avg) described above, with the mean as the independent variable and the adjustment factor as the dependent variable, or may be a preset correspondence between the mean and the adjustment factor. Specifically, the preset correspondence between the mean and the adjustment factor can be hard-coded at both the encoding end and the decoding end; once the mean is obtained, the corresponding adjustment factor can be determined by table lookup. The table lookup method reduces computational complexity and is better suited to hardware implementation, whereas obtaining the adjustment factor by calculation gives more accurate results and does not require storing the correspondence table.
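A U-shaped mapping with the stated properties can be sketched as follows, both as a direct calculation and as a lookup table. The linear ramps below are a hypothetical form chosen only because they satisfy the conditions in the text (decreasing below T1, constant C0 between T1 and T2, increasing above T2); they are not claimed to be the function used in this application. The parameter values are taken from the examples in the text, and 8-bit brightness values are assumed for the table.

```python
def f1(p_avg, t1=60, t2=170, c0=1.0, eta1=150.0, eta2=425.0):
    """Hypothetical U-shaped piecewise function of the pixel mean."""
    if p_avg < t1:
        return c0 * (1.0 + (t1 - p_avg) / eta1)   # decreasing branch
    if p_avg > t2:
        return c0 * (1.0 + (p_avg - t2) / eta2)   # increasing branch
    return c0                                      # flat middle segment


def adjustment_from_mean(p_avg, beta=1.0):
    """QC = f1(P_avg) ** beta, computed directly."""
    return f1(p_avg) ** beta


# Table-lookup variant: one precomputed entry per 8-bit mean value.
F1_TABLE = [f1(p) for p in range(256)]


def adjustment_from_table(p_avg):
    """Look up the factor for the mean rounded to the nearest integer."""
    index = min(max(int(round(p_avg)), 0), 255)
    return F1_TABLE[index]


qc_dark = adjustment_from_mean(0)
qc_mid = adjustment_from_table(99.6)
```

The table variant trades 256 stored entries for the removal of per-block divisions, which mirrors the trade-off between lookup and calculation described above.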

In a feasible implementation, the pixel information is the dispersion, and determining the adjustment factor of the block to be processed according to the pixel information in the preset spatial neighborhood of the block to be processed includes: determining the adjustment factor according to the dispersion and a second mapping relationship between the dispersion and the adjustment factor, where the second mapping relationship satisfies one or more of the following conditions: when the dispersion is greater than a third threshold, the adjustment factor increases as the dispersion increases; and when the dispersion is less than or equal to the third threshold, the adjustment factor is a second preset constant.

Specifically, the adjustment factor Dispersion P _con second piecewise function f (P _con) is calculated according to _2; wherein the dispersion obtained from step P _con S403.

The adjustment factor QC is determined by the second piecewise function f₂(P_con) of the dispersion P_con, that is, QC = f₂(P_con)^γ, where γ > 0, for example, γ = 1 or 0.8. f₂(P_con) is a monotonic function of P_con: when (P_con)^α is less than the threshold T3, f₂(P_con) is a constant C3, and when (P_con)^α is greater than or equal to the threshold T3, the first derivative of f₂(P_con) is greater than zero. Here, T3 ≥ 0, for example, T3 = 0, 3, 5, or 10; α > 0, for example, α = 1/2 or 1; and C3 is a positive real number, for example, 0.5, 0.8, 1, 16, 32, or 256. More specifically, the f₂(P_con) function is, for example,

where η₄ is a positive real number, for example, η₄ = 10, 20, 35.5, 80, or 100.
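The constraints on f₂ can be made concrete with a short sketch. The linear form used above the threshold, and the use of η₄ as a slope divisor, are assumptions made purely for illustration; any function that is constant at C3 below T3 and strictly increasing above it would satisfy the stated conditions equally well.

```python
def f2_example(p_con: float, t3: float = 5.0, c3: float = 1.0,
               alpha: float = 0.5, eta4: float = 20.0) -> float:
    """Hypothetical second piecewise function of the dispersion."""
    x = p_con ** alpha
    if x < t3:
        return c3                   # constant C3 below the threshold T3
    return c3 + (x - t3) / eta4     # positive first derivative at or above T3


def qc_from_dispersion(p_con: float, gamma: float = 1.0) -> float:
    """QC = f2(P_con)^gamma, with gamma > 0."""
    return f2_example(p_con) ** gamma
```

With these example values, a dispersion of 16 stays at the constant, while a dispersion of 100 gives a larger factor.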

It should be understood that the second mapping relationship may be the second piecewise function f₂(P_con) described above, with the dispersion as the independent variable and the adjustment factor as the dependent variable, or may be a preset correspondence between the dispersion and the adjustment factor. Specifically, the preset correspondence between the dispersion and the adjustment factor can be hard-coded at both the encoding end and the decoding end, so that once the dispersion is obtained, the corresponding adjustment factor can be determined by table lookup.

In a feasible implementation manner, the pixel information is the mean and the dispersion, and determining the adjustment factor of the block to be processed according to the pixel information in a preset spatial neighborhood of the block to be processed includes: determining a first parameter according to the mean and the first mapping relationship; determining a second parameter according to the dispersion and the second mapping relationship; and using a product or a weighted sum of the first parameter and the second parameter as the adjustment factor.

Specifically, the adjustment factor QC is jointly determined by the first piecewise function f₁(P_avg) of the pixel mean P_avg and the second piecewise function f₂(P_con) of the dispersion P_con. For example, QC = f₁(P_avg)^β · f₂(P_con)^γ, where β, γ > 0, such as β = 1, γ = 1, or β = 0.5, γ = 1.5, or β = 2, γ = 1. As another example, QC = f₁(P_avg) · k1 + f₂(P_con) · k2, where k1 and k2 are positive real numbers, such as k1 = k2 = 0.5, or k1 = 0.25, k2 = 0.75, or k1 = 0.2, k2 = 0.7.
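The two combination rules can be written out directly. In this sketch the inputs `f1_val` and `f2_val` stand for already-evaluated values of f₁(P_avg) and f₂(P_con); the function names are illustrative and not taken from this application.

```python
def combine_product(f1_val: float, f2_val: float,
                    beta: float = 1.0, gamma: float = 1.0) -> float:
    """QC = f1^beta * f2^gamma, with beta, gamma > 0."""
    return (f1_val ** beta) * (f2_val ** gamma)


def combine_weighted_sum(f1_val: float, f2_val: float,
                         k1: float = 0.5, k2: float = 0.5) -> float:
    """QC = f1 * k1 + f2 * k2, with k1, k2 positive real numbers."""
    return f1_val * k1 + f2_val * k2
```

The product form lets either term amplify or attenuate the other, whereas the weighted sum bounds the influence of each term by its weight.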

It should be noted that the above parameters T1, T2, T3, C0, C3, C4, η₁, η₂, η₃, and η₄ may be preset constants, may be calculated adaptively according to the statistical characteristics of the video image, or may be extracted from the video bitstream.

In a feasible implementation manner, after the product of the first parameter and the second parameter is used as the adjustment factor, the method further includes: weighting the adjustment factor to obtain an adjusted adjustment factor; correspondingly, determining the adjustment factor of the block to be processed includes: using the adjusted adjustment factor as the adjustment factor of the block to be processed.

Specifically, the adjustment factor QC is jointly determined by the first piecewise function f₁(P_avg) of the pixel mean P_avg, the second piecewise function f₂(P_con) of the dispersion P_con, and a weighting coefficient s. For example, QC = (f₁(P_avg)^β · f₂(P_con)^γ · s + offset) >> shift, where offset and shift are preset constants, such as offset = 1 << (shift - 1) and shift = 8, 12, or 16. The weighting coefficient s may be obtained by parsing a Sequence Parameter Set (SPS), or may be obtained by parsing a slice header.
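The weighted form is a fixed-point multiply with rounding. The following sketch assumes β = γ = 1 and that f₁, f₂, and s are already integers (for example, fixed-point values), since the right shift only applies to integers; these assumptions and the function name are illustrative.

```python
def weighted_adjustment_factor(f1_val: int, f2_val: int, s: int,
                               shift: int = 8) -> int:
    """QC = (f1 * f2 * s + offset) >> shift, with offset = 1 << (shift - 1)."""
    offset = 1 << (shift - 1)       # adds one half before the shift, so it rounds
    return (f1_val * f2_val * s + offset) >> shift
```

Adding `offset` before shifting rounds to the nearest integer instead of truncating, which avoids a systematic downward bias in QC.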

In a feasible implementation manner, after determining the adjustment factor of the block to be processed, the method further includes: updating the adjustment factor according to a quantization parameter of the block to be processed; correspondingly, adjusting the first residual based on the adjustment factor to obtain the second residual of the block to be processed includes: adjusting the first residual based on the updated adjustment factor to obtain the second residual of the block to be processed.

Specifically, the adjustment factor is adjusted in the following manner:

QC represents the adjustment factor, QP represents the quantization parameter, and N, M, and X are preset constants, for example, N = 256 or 128, M = 30 or 32, and X = 32 or 24.

S405. Adjust the first residual based on the adjustment factor to obtain a second residual of the block to be processed.

It should be understood that, generally, a video image includes a luminance component (Y) and chrominance components (Cb and Cr, or U and V). Correspondingly, the block to be processed includes a luminance component and a chrominance component, the first residual block includes a first luminance residual block and a first chrominance residual block, and the second residual block includes a second luminance residual block and a second chrominance residual block. In some embodiments, the chrominance residual block can be further divided into a residual block of the Cb component and a residual block of the Cr component, or a residual block of the U component and a residual block of the V component.

In a feasible implementation manner, this step includes: adjusting only the first luminance residual based on the adjustment factor to obtain a second luminance residual of the block to be processed. In this case, no adjustment is made to the chrominance residual, that is, for the chrominance component of the block to be processed, the second residual is the first residual.

In another feasible implementation manner, this step includes: adjusting only the first chrominance residual based on the adjustment factor to obtain a second chrominance residual of the block to be processed. In this case, no adjustment is made to the luminance residual, that is, for the luminance component of the block to be processed, the second residual is the first residual.

In another feasible implementation manner, this step includes: adjusting the first luminance residual based on the adjustment factor to obtain a second luminance residual of the block to be processed, and adjusting the first chrominance residual based on the adjustment factor to obtain a second chrominance residual of the block to be processed. It should be understood that the adjustment factor used for the luminance residual and the adjustment factor used for the chrominance residual may be the same or different. The adjustment factor used for the chrominance residual can be calculated from the luminance pixel information of the preset spatial neighborhood of the block to be processed, can be calculated in a similar way from the chrominance pixel information of that neighborhood, or can be obtained by jointly considering the luminance and chrominance pixel information of that neighborhood; this is not limited.

The following describes the cases of luminance residuals and chrominance residuals:

The first residual block includes a first luminance residual block of the luminance component of the block to be processed, and the luminance residual pixels in the first luminance residual block correspond one-to-one to the pixels of the luminance component of the block to be processed. Correspondingly, the second residual block includes a second luminance residual block of the luminance component of the block to be processed, and adjusting the first residual block based on the adjustment factor to obtain the second residual block of the block to be processed includes: adjusting the luminance residual pixels in the first luminance residual block based on the adjustment factor to obtain the luminance residual pixels in the second luminance residual block of the block to be processed. The luminance residual pixels in the second luminance residual block are obtained in the following manner:

Res2_Y(i) = (Res1_Y(i) × QC + offset_Y) >> shift_Y

Here, QC represents the adjustment factor, Res1_Y(i) represents the i-th luminance residual pixel in the first luminance residual block, Res2_Y(i) represents the i-th luminance residual pixel in the second luminance residual block, offset_Y and shift_Y are preset constants, and i is a natural number. Exemplarily, shift_Y = 8, 10, or 12, and offset_Y = 1 << (shift_Y - 1).
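The per-pixel scaling can be sketched as follows, assuming QC is an integer in fixed-point form so that QC = 1 << shift_Y represents a factor of 1.0. The function name and the flat-list representation of a residual block are illustrative assumptions. The same code applies to the chrominance formula with offset_C and shift_C.

```python
def scale_residual_block(res1: list[int], qc: int, shift: int = 8) -> list[int]:
    """Res2(i) = (Res1(i) * QC + offset) >> shift for every residual pixel."""
    offset = 1 << (shift - 1)
    # Python's >> on negative integers is an arithmetic (floor) shift.
    return [(r * qc + offset) >> shift for r in res1]
```

For example, with shift = 8 a QC of 256 leaves the residual unchanged, while a QC of 384 scales it by 1.5 with rounding.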

The first residual block includes a first chrominance residual block of the chrominance component of the block to be processed, and the chrominance residual pixels in the first chrominance residual block correspond one-to-one to the pixels of the chrominance component of the block to be processed. Correspondingly, the second residual block includes a second chrominance residual block of the chrominance component of the block to be processed, and adjusting the first residual block based on the adjustment factor to obtain the second residual block of the block to be processed includes: adjusting the chrominance residual pixels in the first chrominance residual block based on the adjustment factor to obtain the chrominance residual pixels in the second chrominance residual block of the block to be processed. The chrominance residual pixels in the second chrominance residual block are obtained in the following manner:

Res2_C(i) = (Res1_C(i) × QC + offset_C) >> shift_C

Here, QC represents the adjustment factor, Res1_C(i) represents the i-th chrominance residual pixel in the first chrominance residual block, Res2_C(i) represents the i-th chrominance residual pixel in the second chrominance residual block, offset_C and shift_C are preset constants, and i is a natural number. Exemplarily, shift_C = 8, 10, or 12, and offset_C = 1 << (shift_C - 1).

It should be understood that in this step the first residual is scaled to obtain the second residual. To improve calculation accuracy, the first residual, as an intermediate processing result, may be represented and stored with a higher-precision bit width (also known as bit depth). For example, when the bit width of the pixels of the block to be processed is D, the first residual can be stored with D + E bits, where D can be, for example, 8, 10, or 12, and E can be 1, 2, 3, or 4. Generally, the bit width of the second residual is made the same as the bit width of the pixels of the block to be processed, so the bit width precision is reduced in this step when the second residual is obtained. In terms of the examples in this paragraph, the above right shift by shift_Y or shift_C includes, for example, a right shift by E bits. Thus, the bit width precision of the residual pixels in the first residual block is higher than that of the residual pixels in the second residual block. For the luminance and chrominance components, this can be stated as follows: the bit width precision of the luminance residual pixels in the first luminance residual block is higher than that of the luminance residual pixels in the second luminance residual block, and the bit width precision of the chrominance residual pixels in the first chrominance residual block is higher than that of the chrominance residual pixels in the second chrominance residual block. Clearly, when the chrominance component is not adjusted, the first chrominance residual is generally not stored with a higher-precision bit width, and there is no bit width reduction step when the second chrominance residual is obtained.
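One possible reading of the bit width reduction is that the total right shift is the sum of the fractional bits of QC and the E extra precision bits of the first residual. The sketch below follows that reading; the split into `qc_frac_bits` and `extra_bits` is an assumption for illustration and is not stated in this form in the application.

```python
def scale_and_reduce_precision(res1_hp: list[int], qc: int,
                               qc_frac_bits: int = 8,
                               extra_bits: int = 2) -> list[int]:
    """Scale a (D + E)-bit residual by QC and return it at D-bit precision."""
    shift = qc_frac_bits + extra_bits   # the shift includes the E-bit reduction
    offset = 1 << (shift - 1)
    return [(r * qc + offset) >> shift for r in res1_hp]
```

Here a stored value of 40 with E = 2 represents 10 pixel units, and 13 represents 3.25 pixel units before scaling.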

S406. Add the residual pixels in the second residual and predicted pixels at corresponding positions in the block to be processed to obtain reconstructed pixels at the corresponding positions in the block to be processed.

The predicted pixels are generally generated by an intra prediction technique or an inter prediction technique. For typical intra prediction and inter prediction techniques, refer to the H.265 standard (Rec. ITU-T H.265 v4), pages 125 to 172, namely the introduction to intra prediction in section 8.4 and to inter prediction in section 8.5. JEM has also made many improvements to intra prediction and inter prediction techniques; for details, see JVET-G1001-v1, pages 6 to 28, namely the improvements to intra prediction in section 2.2 and to inter prediction in section 2.3. Details are not repeated here. The embodiments of the present application do not limit which prediction technique is used.

In addition, in some feasible implementation manners, after the residual pixels in the second residual and the predicted pixels at corresponding positions in the block to be processed are added, the sum is further limited to an interval, for example, the allowed value range of the pixels of the block to be processed. Correspondingly, the limited sum is used as the reconstructed pixel at the corresponding position in the block to be processed.

In some feasible implementation manners, after the reconstructed pixels at the corresponding positions in the block to be processed are obtained, the method may further include filtering the reconstructed pixels, for example, with the bilateral filtering proposed in JEM. In some feasible implementation manners, whether a block to be processed needs to be filtered is determined by syntax elements obtained through decoding.
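Step S406 together with the optional range limit can be sketched as follows. The function name, the flat-list block representation, and the choice of [0, 2^D - 1] as the allowed range for D-bit pixels are illustrative assumptions.

```python
def reconstruct_block(pred: list[int], res2: list[int],
                      bit_depth: int = 8) -> list[int]:
    """Add residual to prediction and clip to the valid pixel range."""
    max_val = (1 << bit_depth) - 1
    return [min(max(p + r, 0), max_val) for p, r in zip(pred, res2)]
```

Clipping matters near the ends of the range: a prediction of 250 plus a residual of 10 would otherwise exceed the 8-bit maximum of 255.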

An encoding process corresponding to the present invention is, for example, as follows: for a block to be encoded, an adjustment factor is calculated according to pixels in the spatial neighborhood, the prediction residual of the block is scaled by the reciprocal of the adjustment factor, the scaled prediction residual is transformed and quantized to obtain quantized transform coefficients, and the quantized transform coefficients are encoded into a code stream by an entropy coding unit. It should be understood that, as mentioned above, encoding and decoding are in general mutually inverse processes. Therefore, when the decoding end uses the adjustment factor for residual processing, the encoding end correspondingly uses the reciprocal of the adjustment factor for residual processing.
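The inverse relationship between the two ends can be illustrated with a deliberately simplified sketch that omits transform, quantization, and entropy coding, and uses floating-point values. The function names are illustrative only.

```python
def encoder_scale(prediction_residual: list[float], qc: float) -> list[float]:
    """Encoder side: scale the prediction residual by the reciprocal of QC."""
    return [r / qc for r in prediction_residual]


def decoder_scale(first_residual: list[float], qc: float) -> list[float]:
    """Decoder side: scale the first residual by QC."""
    return [r * qc for r in first_residual]
```

Because both ends derive QC from the same reconstructed neighborhood pixels, the factor does not need to be transmitted for the two scalings to cancel.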

In another encoding process corresponding to the present invention, for a block to be encoded, an adjustment factor is calculated according to pixels in the spatial neighborhood, the adjustment factor is used to scale the quantization step size of the block, the prediction residual is transformed, and the transform coefficients are quantized using the scaled quantization step size to obtain quantized transform coefficients. The quantized transform coefficients are encoded into a code stream by the entropy coding unit.
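The second encoder variant can be sketched as a plain scalar quantizer whose step size is multiplied by QC. This is a simplified illustration: real codecs use integer quantization with rounding offsets, and the function name and floating-point form here are assumptions.

```python
def quantize_with_scaled_step(coeffs: list[float], qstep: float,
                              qc: float) -> list[int]:
    """Quantize transform coefficients with step size qstep * QC."""
    scaled_step = qstep * qc
    return [round(c / scaled_step) for c in coeffs]
```

A larger QC gives a coarser step and smaller quantized levels; the decoder dequantizes with the unscaled step and then multiplies the residual by QC, which restores the magnitude because the inverse transform is linear.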

In the solution provided in the embodiments of the present application, at the decoding end, the spatial neighborhood pixel information of the current block to be processed (that is, the block to be decoded, or transform block) is used to approximate the original pixel information of the current block to be processed. Based on the spatial neighborhood pixel information, an adjustment factor for the current block to be processed is adaptively derived; this factor reflects the strength of the visual masking effect produced by the background area of the current block. The residual block corresponding to the current block to be processed is then adjusted based on the adaptively derived adjustment factor, which, during video encoding or decoding, reduces the residual bits of processing blocks with a strong visual masking effect and increases the residual bits of processing blocks with a weak visual masking effect. As a result, the encoding of the actual residuals better matches human visual perception, thereby improving encoding and decoding performance.

At the same time, the embodiments of the present application also show significant beneficial effects in terms of pipeline design.

First, a brief introduction to pipeline design. Pipeline design is a method of systematically dividing combinational logic into parts (stages), inserting registers between the parts, and temporarily storing intermediate data. The purpose is to decompose a large operation into several small operations. Because each small operation takes less time, the clock frequency can be increased, and because the small operations can be performed in parallel, the data throughput (that is, the processing speed) can be improved. Generally, each small operation is called a pipeline stage.

In a typical pipeline design of a hardware decoder, steps S402 and S406 belong to different pipeline stages, which can be called, for example, the "inverse quantization and inverse transform pipeline stage" and the "reconstruction pipeline stage". There is a buffer between the two pipeline stages, and the "inverse quantization and inverse transform pipeline stage" does not depend on data produced by the "reconstruction pipeline stage".

In the embodiments of the present application, because the adjustment of the residual block can be completed in the reconstruction pipeline stage, as shown in FIG. 6, the original pipeline design is not disrupted. In addition, because the complexity of calculating the adjustment factor and of scaling the residual is relatively low, the complexity of the reconstruction pipeline stage is not significantly increased. This improves the parallelism between decoder modules and is conducive to the implementation of high-performance decoders.

FIG. 7 schematically illustrates a block diagram of a residual acquisition apparatus according to an embodiment of the present application.

An apparatus 700 for acquiring residuals in video decoding includes: an analysis module 701, configured to parse a bitstream to obtain transform coefficients of a block to be processed; a conversion module 702, configured to convert the transform coefficients into a first residual of the block to be processed; a calculation module 703, configured to determine an adjustment factor of the block to be processed according to pixel information in a preset spatial neighborhood of the block to be processed; and an adjustment module 704, configured to adjust the first residual based on the adjustment factor to obtain a second residual of the block to be processed.

In a feasible implementation manner, the calculation module 703 is further configured to calculate pixel information in a preset spatial neighborhood of the block to be processed based on pixel values in a preset spatial neighborhood of the block to be processed.

In a feasible implementation manner, the calculation module 703 is specifically configured to: obtain one or more pixel sets in the preset spatial neighborhood; and calculate the mean and/or the dispersion of the pixels in the one or more pixel sets to obtain the pixel information in the preset spatial neighborhood.

In a feasible implementation manner, the dispersion includes: a mean square error sum, an average absolute error sum, a variance or a standard deviation.
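Several of the listed statistics can be computed from a pixel set as in the following sketch. It covers the mean, the variance, the standard deviation, and a mean absolute deviation as one plausible reading of "average absolute error"; the exact definitions intended by this application may differ, and the function names are illustrative.

```python
import math


def pixel_mean(pixels: list[int]) -> float:
    return sum(pixels) / len(pixels)


def pixel_variance(pixels: list[int]) -> float:
    m = pixel_mean(pixels)
    return sum((p - m) ** 2 for p in pixels) / len(pixels)


def pixel_std_dev(pixels: list[int]) -> float:
    return math.sqrt(pixel_variance(pixels))


def pixel_mean_abs_deviation(pixels: list[int]) -> float:
    m = pixel_mean(pixels)
    return sum(abs(p - m) for p in pixels) / len(pixels)
```

The mean absolute deviation avoids multiplications and a square root, which can make it the cheaper choice in hardware.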

In a feasible implementation manner, the calculation module 703 is further configured to determine that all pixels in each pixel set in the one or more pixel sets have completed reconstruction.

In a feasible implementation manner, the pixel information is the mean, and the calculation module 703 is specifically configured to determine the adjustment factor according to the mean and a first mapping relationship between the mean and the adjustment factor, wherein the first mapping relationship satisfies one or more of the following conditions: when the mean is less than a first threshold, the adjustment factor decreases as the mean increases; when the mean is greater than a second threshold, the adjustment factor increases as the mean increases, wherein the first threshold is less than or equal to the second threshold; when the mean is greater than or equal to the first threshold and less than or equal to the second threshold, the adjustment factor is a first preset constant.

In a feasible implementation manner, the calculation module 703 is specifically configured to determine the adjustment factor according to the dispersion and a second mapping relationship between the dispersion and the adjustment factor, wherein the second mapping relationship satisfies one or more of the following conditions: when the dispersion is greater than a third threshold, the adjustment factor increases as the dispersion increases; when the dispersion is less than or equal to the third threshold, the adjustment factor is a second preset constant.

In a feasible implementation manner, the pixel information is the mean and the dispersion, and the calculation module 703 is specifically configured to: determine a first parameter according to the mean and the first mapping relationship; determine a second parameter according to the dispersion and the second mapping relationship; and use a product or a weighted sum of the first parameter and the second parameter as the adjustment factor.

In a feasible implementation manner, the calculation module 703 is further configured to: weight the adjustment factor to obtain an adjusted adjustment factor; and use the adjusted adjustment factor as the adjustment factor of the block to be processed.

In a feasible implementation manner, the calculation module 703 is further configured to update the adjustment factor according to the quantization parameter of the block to be processed; correspondingly, the adjustment module 704 is specifically configured to adjust the first residual based on the updated adjustment factor to obtain a second residual of the block to be processed.

In a feasible implementation manner, the adjustment factor is adjusted in the following manner:

In a feasible implementation manner, the number of obtained transform coefficients of the block to be processed is the same as the number of pixels of the block to be processed, and the conversion module 702 is further configured to: arrange the transform coefficients of the block to be processed into a transform coefficient block according to a preset positional relationship; and convert the transform coefficient block into a first residual block of the block to be processed; correspondingly, the adjustment module 704 is specifically configured to adjust the first residual block based on the adjustment factor to obtain a second residual block of the block to be processed.

In a feasible implementation manner, the first residual block includes a first luminance residual block of the luminance component of the block to be processed, and the luminance residual pixels in the first luminance residual block correspond one-to-one to the pixels of the luminance component of the block to be processed. Correspondingly, the second residual block includes a second luminance residual block of the luminance component of the block to be processed. The adjustment module 704 is specifically configured to adjust the luminance residual pixels in the first luminance residual block based on the adjustment factor to obtain the luminance residual pixels in the second luminance residual block of the block to be processed.

In a feasible implementation manner, the luminance residual pixels in the second luminance residual block are obtained in the following manner:

Res2_Y(i) = (Res1_Y(i) × QC + offset_Y) >> shift_Y

In a feasible implementation manner, the first residual block includes a first chrominance residual block of the chrominance component of the block to be processed, and the chrominance residual pixels in the first chrominance residual block correspond one-to-one to the pixels of the chrominance component of the block to be processed. Correspondingly, the second residual block includes a second chrominance residual block of the chrominance component of the block to be processed. The adjustment module 704 is specifically configured to adjust the chrominance residual pixels in the first chrominance residual block based on the adjustment factor to obtain the chrominance residual pixels in the second chrominance residual block of the block to be processed.

In a feasible implementation manner, the chroma residual pixels in the second chroma residual block are obtained in the following manner:

Res2_C(i) = (Res1_C(i) × QC + offset_C) >> shift_C

In a feasible implementation manner, the bit width accuracy of the luminance residual pixels in the first luminance residual block is higher than the bit width accuracy of the luminance residual pixels in the second luminance residual block.

In a feasible implementation manner, the bit width precision of the chrominance residual pixels in the first chrominance residual block is higher than the bit width precision of the chrominance residual pixels in the second chrominance residual block.

In a feasible implementation manner, the conversion module 702 is specifically configured to: inverse quantize each transform coefficient in the transform coefficient block to obtain an inverse quantized transform coefficient block; and inverse transform the inverse quantized transform coefficient block to obtain a first residual block of the block to be processed.

In a feasible implementation manner, the apparatus 700 further includes a reconstruction unit 705, configured to add the residual pixels in the second residual and the predicted pixels at corresponding positions in the block to be processed to obtain reconstructed pixels at the corresponding positions in the block to be processed.

In the solution provided in the embodiments of the present application, at the decoding end, the spatial neighborhood pixel information of the current block to be processed (that is, the block to be decoded, or transform block) is used to approximate the original pixel information of the current block to be processed. Based on the spatial neighborhood pixel information, an adjustment factor for the current block to be processed is adaptively derived; this factor reflects the strength of the visual masking effect produced by the background area of the current block. The residual block corresponding to the current block to be processed is then adjusted based on the adaptively derived adjustment factor, which, during video encoding or decoding, reduces the residual bits of processing blocks with a strong visual masking effect and increases the residual bits of processing blocks with a weak visual masking effect. As a result, the encoding of the actual residuals better matches human visual perception, thereby improving encoding and decoding performance.

At the same time, in the embodiments of the present application, because the adjustment of the residual block can be completed in the reconstruction pipeline stage, the original pipeline design is not disrupted. In addition, because the complexity of calculating the adjustment factor and of scaling the residual is relatively low, the complexity of the reconstruction pipeline stage is not significantly increased. This improves the parallelism between decoder modules and facilitates the implementation of high-performance decoders.

FIG. 8 is a schematic block diagram of a video decoding device according to an embodiment of the present application. The device 800 may be applied to an encoding side or a decoding side. The device 800 includes a processor 801 and a memory 802, which are connected to each other (for example, through a bus 804). In a possible implementation manner, the device 800 may further include a transceiver 803, which is connected to the processor 801 and the memory 802 and is configured to receive and transmit data.

The memory 802 includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), or compact disc read-only memory (CD-ROM). The memory 802 is used to store related program code and video data.

The processor 801 may be one or more central processing units (CPUs). When the processor 801 is a CPU, the CPU may be a single-core CPU or a multi-core CPU. The processor 801 is configured to read the program code stored in the memory 802 and execute operations of the implementation manner corresponding to FIG. 4 and various feasible implementation manners thereof.

Exemplarily, an embodiment of the present application further provides a computer-readable storage medium that stores instructions which, when run on a computer, cause the computer to execute the operations of the implementation manner corresponding to FIG. 4 and of its various feasible implementation manners.

Exemplarily, the embodiment of the present application further provides a computer program product containing instructions, which when executed on a computer, causes the computer to execute the operations corresponding to the implementation manner shown in FIG. 4 and various feasible implementation manners thereof.

Those of ordinary skill in the art may realize that the units and algorithm steps of each example described in combination with the embodiments disclosed herein can be implemented by electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. A professional technician can use different methods to implement the described functions for each specific application, but such implementation should not be considered to be beyond the scope of this application.

Those skilled in the art can clearly understand that, for the convenience and brevity of description, the specific working processes of the systems, devices, and units described above can refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present invention are wholly or partially generated. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one network site, computer, server, or data center to another network site, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line) or wireless means (such as infrared or microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or may be a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (such as a floppy disk, a hard disk, or a magnetic tape), an optical medium (such as a DVD), or a semiconductor medium (such as a solid state drive).

In the above embodiments, the description of each embodiment has its own emphasis. For a part that is not described in detail in one embodiment, reference may be made to related descriptions in other embodiments.

The above are only specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

A method for acquiring residuals in video decoding, characterized by comprising:

parsing a code stream to obtain transform coefficients of a block to be processed;

converting the transform coefficients into a first residual of the block to be processed;

determining an adjustment factor of the block to be processed according to pixel information in a preset spatial neighborhood of the block to be processed;

adjusting the first residual based on the adjustment factor to obtain a second residual of the block to be processed.
The method according to claim 1, wherein before the determining the adjustment factor of the block to be processed according to pixel information in a preset spatial neighborhood of the block to be processed, the method further comprises:

calculating the pixel information in the preset spatial neighborhood of the block to be processed based on pixel values in the preset spatial neighborhood of the block to be processed.
The method according to claim 2, wherein the calculating pixel information in a preset spatial neighborhood of the block to be processed comprises:

Acquiring one or more pixel sets in the preset spatial neighborhood;

calculating the mean and/or dispersion of pixels in the one or more pixel sets to obtain the pixel information in the preset spatial neighborhood.
The method according to claim 3, wherein the dispersion comprises: a sum of mean square errors, a sum of mean absolute errors, a variance, or a standard deviation.
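As an illustration of the mean and dispersion measures named above, the following Python sketch computes them for a single pixel set. It is only a sketch: the function name `pixel_set_stats` and the exact definitions used (deviations measured from the set's own mean, population variance) are assumptions made for this example and are not taken from the claims.

```python
from math import sqrt
from statistics import fmean


def pixel_set_stats(pixels):
    """Return (mean, mean absolute deviation, variance, standard deviation).

    `pixels` is a non-empty list of reconstructed pixel values from one
    pixel set in the spatial neighborhood. All deviations are measured
    from the mean of the set itself (an assumption of this sketch).
    """
    mean = fmean(pixels)
    mean_abs_dev = fmean([abs(p - mean) for p in pixels])
    variance = fmean([(p - mean) ** 2 for p in pixels])
    return mean, mean_abs_dev, variance, sqrt(variance)


mean, mad, var, std = pixel_set_stats([100, 102, 98, 100])
```

Any one of the returned dispersion values could serve as the dispersion input of a mapping to the adjustment factor; which one is used is a design choice that this sketch leaves open.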
The method according to claim 3 or 4, wherein before the acquiring one or more pixel sets in the preset spatial neighborhood, the method further comprises:

determining that all pixels in each of the one or more pixel sets have been reconstructed.
The method according to any one of claims 3 to 5, wherein the pixel information is the mean, and the determining the adjustment factor of the block to be processed according to pixel information in a preset spatial neighborhood of the block to be processed comprises:

determining the adjustment factor according to the mean and a first mapping relationship between the mean and the adjustment factor, wherein the first mapping relationship satisfies one or more of the following conditions:

when the mean is less than a first threshold, the adjustment factor decreases as the mean increases;

when the mean is greater than a second threshold, the adjustment factor increases as the mean increases, wherein the first threshold is less than or equal to the second threshold;

when the mean is greater than or equal to the first threshold and less than or equal to the second threshold, the adjustment factor is a first preset constant.
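One possible shape of such a first mapping relationship is a piecewise-linear function, sketched below in Python. The claim only fixes the monotonic behavior in each range; the function name `factor_from_mean`, the linear form, the threshold values, and the slopes are all hypothetical values chosen for illustration.

```python
def factor_from_mean(mean, t1=60.0, t2=170.0, base=1.0,
                     slope_low=0.005, slope_high=0.003):
    """Map a neighborhood mean to an adjustment factor.

    t1, t2 (with t1 <= t2), base, and the slopes are hypothetical
    constants; only the three-range behavior follows the claim.
    """
    if mean < t1:
        # Below the first threshold: factor decreases as the mean increases.
        return base + slope_low * (t1 - mean)
    if mean > t2:
        # Above the second threshold: factor increases as the mean increases.
        return base + slope_high * (mean - t2)
    # Between the thresholds (inclusive): a preset constant.
    return base
```

With these assumed constants, dark and bright neighborhoods both yield a factor above the constant used for mid-range means.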
The method according to any one of claims 3 to 6, wherein the pixel information is the dispersion, and the determining the adjustment factor of the block to be processed according to pixel information in a preset spatial neighborhood of the block to be processed comprises:

Determining the adjustment factor according to the dispersion and a second mapping relationship between the dispersion and the adjustment factor, wherein the second mapping relationship satisfies one or more of the following conditions:

When the dispersion is greater than a third threshold, the adjustment factor increases as the dispersion increases;

When the dispersion is less than or equal to the third threshold, the adjustment factor is a second preset constant.
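The second mapping relationship can be sketched in the same way. In the Python example below, the function name `factor_from_dispersion`, the linear growth above the threshold, and the numeric constants are assumptions for illustration; the claim itself only requires a constant at or below the third threshold and an increasing factor above it.

```python
def factor_from_dispersion(dispersion, t3=4.0, base=1.0, slope=0.02):
    """Map a neighborhood dispersion to an adjustment factor.

    t3, base, and slope are hypothetical constants.
    """
    if dispersion > t3:
        # Above the third threshold: factor increases with the dispersion.
        return base + slope * (dispersion - t3)
    # At or below the third threshold: a preset constant.
    return base
```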
The method according to claim 7, wherein the pixel information is the mean and the dispersion, and the determining the adjustment factor of the block to be processed according to pixel information in a preset spatial neighborhood of the block to be processed comprises:

Determining a first parameter according to the mean and the first mapping relationship;

Determining a second parameter according to the dispersion and the second mapping relationship;

A product or a weighted sum of the first parameter and the second parameter is used as the adjustment factor.
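The two ways of combining the parameters can be sketched as follows. The function name `combine_parameters`, the `mode` switch, and the default weights are hypothetical; the claim does not specify the weights of the weighted sum.

```python
def combine_parameters(p_mean, p_disp, mode="product",
                       w_mean=0.5, w_disp=0.5):
    """Combine the first parameter (from the mean) and the second
    parameter (from the dispersion) into one adjustment factor.

    The weights are hypothetical illustration values.
    """
    if mode == "product":
        return p_mean * p_disp
    if mode == "weighted_sum":
        return w_mean * p_mean + w_disp * p_disp
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, with a first parameter of 1.2 and a second parameter of 1.5, the product gives approximately 1.8, while the equally weighted sum gives approximately 1.35.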
The method according to claim 8, wherein after the using a product or a weighted sum of the first parameter and the second parameter as the adjustment factor, the method further comprises:

weighting the adjustment factor to obtain an adjusted adjustment factor;

correspondingly, the determining the adjustment factor of the block to be processed comprises:

using the adjusted adjustment factor as the adjustment factor of the block to be processed.
The method according to any one of claims 1 to 9, wherein after the determining the adjustment factor of the block to be processed, the method further comprises:

updating the adjustment factor according to the quantization parameter of the block to be processed;

Correspondingly, the adjusting the first residual based on the adjustment factor to obtain a second residual of the block to be processed includes:

Adjusting the first residual based on the updated adjustment factor to obtain a second residual of the block to be processed.
The method according to claim 10, wherein the adjustment factor is adjusted in the following manner:

QC represents the adjustment factor, QP represents the quantization parameter, and N, M, and X are preset constants.
The method according to any one of claims 1 to 11, wherein the number of obtained transform coefficients of the block to be processed is the same as the number of pixels of the block to be processed, and after the obtaining the transform coefficients of the block to be processed, the method further comprises: arranging the transform coefficients of the block to be processed into a transform coefficient block according to a preset position relationship;

correspondingly, the converting the transform coefficients into a first residual of the block to be processed comprises: converting the transform coefficient block into a first residual block of the block to be processed;

correspondingly, the adjusting the first residual based on the adjustment factor to obtain the second residual of the block to be processed comprises: adjusting the first residual block based on the adjustment factor to obtain a second residual block of the block to be processed.
The method according to claim 12, wherein the first residual block comprises a first luminance residual block of a luminance component of the block to be processed, and luminance residual pixels in the first luminance residual block are in one-to-one correspondence with pixels of the luminance component of the block to be processed; correspondingly, the second residual block comprises a second luminance residual block of the luminance component of the block to be processed, and the adjusting the first residual block based on the adjustment factor to obtain a second residual block of the block to be processed comprises:

Adjusting the luminance residual pixels in the first luminance residual block based on the adjustment factor to obtain the luminance residual pixels in the second luminance residual block of the block to be processed.
The method according to claim 13, wherein the luminance residual pixels in the second luminance residual block are obtained in the following manner:

Res2_Y(i) = (Res1_Y(i) × QC + offset_Y) >> shift_Y

where QC represents the adjustment factor, Res1_Y(i) represents the i-th luminance residual pixel in the first luminance residual block, Res2_Y(i) represents the i-th luminance residual pixel in the second luminance residual block, offset_Y and shift_Y are preset constants, and i is a natural number.
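The integer scaling formula above can be illustrated with a short Python sketch. It assumes that QC is stored as a fixed-point integer with `shift` fractional bits and that the offset is half of the scaling step, so that the right shift rounds rather than truncates; neither assumption is stated in the claim, which only says that the offset and shift are preset constants. The function name `scale_residuals` is hypothetical.

```python
def scale_residuals(res1, qc, shift=6):
    """Apply Res2(i) = (Res1(i) * QC + offset) >> shift to each residual.

    qc is the adjustment factor as a fixed-point integer with `shift`
    fractional bits (an assumption), e.g. qc = 80 with shift = 6
    represents 80 / 64 = 1.25. The offset 1 << (shift - 1) is an assumed
    rounding constant. Python's >> on negative integers is an arithmetic
    (floor) shift.
    """
    offset = 1 << (shift - 1)
    return [(r * qc + offset) >> shift for r in res1]


# Scaling by 1.25: 8 -> 10, -8 -> -10, 3 -> 4, 0 -> 0
res2 = scale_residuals([8, -8, 3, 0], qc=80, shift=6)
```

The same function applies unchanged to chroma residuals when offset_C and shift_C are used in place of the luminance constants.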
The method according to claim 12, wherein the first residual block comprises a first chroma residual block of a chroma component of the block to be processed, and chroma residual pixels in the first chroma residual block are in one-to-one correspondence with pixels of the chroma component of the block to be processed; correspondingly, the second residual block comprises a second chroma residual block of the chroma component of the block to be processed, and the adjusting the first residual block based on the adjustment factor to obtain a second residual block of the block to be processed comprises:

Adjusting the chroma residual pixels in the first chroma residual block based on the adjustment factor to obtain the chroma residual pixels in the second chroma residual block of the block to be processed.
The method according to claim 15, wherein the chroma residual pixels in the second chroma residual block are obtained in the following manner:

Res2_C(i) = (Res1_C(i) × QC + offset_C) >> shift_C

where QC represents the adjustment factor, Res1_C(i) represents the i-th chroma residual pixel in the first chroma residual block, Res2_C(i) represents the i-th chroma residual pixel in the second chroma residual block, offset_C and shift_C are preset constants, and i is a natural number.
The method according to claim 13 or 14, wherein the bit-width precision of the luminance residual pixels in the first luminance residual block is higher than the bit-width precision of the luminance residual pixels in the second luminance residual block.
The method according to any one of claims 15 to 17, wherein the bit-width precision of the chroma residual pixels in the first chroma residual block is higher than the bit-width precision of the chroma residual pixels in the second chroma residual block.
The method according to any one of claims 12 to 18, wherein the converting the transform coefficient block into a first residual block of the block to be processed comprises:

Performing inverse quantization on each transform coefficient in the transform coefficient block to obtain an inverse quantized transform coefficient block;

Performing inverse transform on the inverse-quantized transform coefficient block to obtain a first residual block of the block to be processed.
The method according to any one of claims 1 to 19, wherein after the adjusting the first residual based on the adjustment factor to obtain a second residual of the block to be processed, the method further comprises:

Adding the residual pixel in the second residual and the predicted pixel at a corresponding position in the block to be processed to obtain a reconstructed pixel at the corresponding position in the block to be processed.
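The reconstruction step can be sketched as an element-wise addition. In the Python example below, the clipping of the result to the valid sample range for a given bit depth is an assumption added for this sketch (it is common practice in video decoders but is not recited in the claim), and the function name `reconstruct` is hypothetical.

```python
def reconstruct(pred, res2, bit_depth=8):
    """Add each second-residual pixel to the predicted pixel at the
    same position to obtain the reconstructed pixel.

    Clipping to [0, 2**bit_depth - 1] is an assumption of this sketch.
    """
    if len(pred) != len(res2):
        raise ValueError("prediction and residual must have the same size")
    max_val = (1 << bit_depth) - 1
    return [min(max(p + r, 0), max_val) for p, r in zip(pred, res2)]


recon = reconstruct([100, 250, 3], [10, 10, -10])  # [110, 255, 0]
```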
A device for acquiring residuals in video decoding, which includes:

A parsing module for parsing a code stream to obtain a transform coefficient of a block to be processed;

A conversion module, configured to convert the transform coefficient into a first residual of the block to be processed;

A calculation module, configured to determine an adjustment factor of the block to be processed according to pixel information in a preset spatial neighborhood of the block to be processed;

An adjustment module, configured to adjust the first residual based on the adjustment factor to obtain a second residual of the block to be processed.
The device according to claim 21, wherein the calculation module is further configured to:

calculate the pixel information in the preset spatial neighborhood of the block to be processed based on pixel values in the preset spatial neighborhood of the block to be processed.
The device according to claim 22, wherein the calculation module is specifically configured to:

acquire one or more pixel sets in the preset spatial neighborhood;

calculate the mean and/or dispersion of pixels in the one or more pixel sets to obtain the pixel information in the preset spatial neighborhood.
The device according to claim 23, wherein the dispersion comprises: a sum of mean square errors, a sum of mean absolute errors, a variance, or a standard deviation.
The device according to claim 23 or 24, wherein the calculation module is further configured to:

determine that all pixels in each of the one or more pixel sets have been reconstructed.
The device according to any one of claims 23 to 25, wherein the pixel information is the mean, and the calculation module is specifically configured to:

determine the adjustment factor according to the mean and a first mapping relationship between the mean and the adjustment factor, wherein the first mapping relationship satisfies one or more of the following conditions:

when the mean is less than a first threshold, the adjustment factor decreases as the mean increases;

when the mean is greater than a second threshold, the adjustment factor increases as the mean increases, wherein the first threshold is less than or equal to the second threshold;

when the mean is greater than or equal to the first threshold and less than or equal to the second threshold, the adjustment factor is a first preset constant.
The device according to any one of claims 23 to 26, wherein the calculation module is specifically configured to:

determine the adjustment factor according to the dispersion and a second mapping relationship between the dispersion and the adjustment factor, wherein the second mapping relationship satisfies one or more of the following conditions:

When the dispersion is greater than a third threshold, the adjustment factor increases as the dispersion increases;

When the dispersion is less than or equal to the third threshold, the adjustment factor is a second preset constant.
The device according to claim 27, wherein the pixel information is the mean and the dispersion, and the calculation module is specifically configured to:

determine a first parameter according to the mean and the first mapping relationship;

determine a second parameter according to the dispersion and the second mapping relationship;

use a product or a weighted sum of the first parameter and the second parameter as the adjustment factor.
The device according to claim 28, wherein the calculation module is further configured to:

weight the adjustment factor to obtain an adjusted adjustment factor;

use the adjusted adjustment factor as the adjustment factor of the block to be processed.
The device according to any one of claims 21 to 29, wherein the calculation module is further configured to:

update the adjustment factor according to the quantization parameter of the block to be processed;

correspondingly, the adjustment module is specifically configured to:

adjust the first residual based on the updated adjustment factor to obtain a second residual of the block to be processed.
The device according to claim 30, wherein the adjustment factor is adjusted in the following manner:

QC represents the adjustment factor, QP represents the quantization parameter, and N, M, and X are preset constants.
The device according to any one of claims 21 to 31, wherein the number of obtained transform coefficients of the block to be processed is the same as the number of pixels of the block to be processed, and the conversion module is further configured to: arrange the transform coefficients of the block to be processed into a transform coefficient block according to a preset position relationship; and

convert the transform coefficient block into a first residual block of the block to be processed;

correspondingly, the adjustment module is specifically configured to adjust the first residual block based on the adjustment factor to obtain a second residual block of the block to be processed.
The device according to claim 32, wherein the first residual block comprises a first luminance residual block of a luminance component of the block to be processed, and luminance residual pixels in the first luminance residual block are in one-to-one correspondence with pixels of the luminance component of the block to be processed; correspondingly, the second residual block comprises a second luminance residual block of the luminance component of the block to be processed, and the adjustment module is specifically configured to:

adjust the luminance residual pixels in the first luminance residual block based on the adjustment factor to obtain the luminance residual pixels in the second luminance residual block of the block to be processed.
The device according to claim 33, wherein the luminance residual pixels in the second luminance residual block are obtained in the following manner:

Res2_Y(i) = (Res1_Y(i) × QC + offset_Y) >> shift_Y

where QC represents the adjustment factor, Res1_Y(i) represents the i-th luminance residual pixel in the first luminance residual block, Res2_Y(i) represents the i-th luminance residual pixel in the second luminance residual block, offset_Y and shift_Y are preset constants, and i is a natural number.
The device according to claim 32, wherein the first residual block comprises a first chroma residual block of a chroma component of the block to be processed, and chroma residual pixels in the first chroma residual block are in one-to-one correspondence with pixels of the chroma component of the block to be processed; correspondingly, the second residual block comprises a second chroma residual block of the chroma component of the block to be processed, and the adjustment module is specifically configured to:

adjust the chroma residual pixels in the first chroma residual block based on the adjustment factor to obtain the chroma residual pixels in the second chroma residual block of the block to be processed.
The device according to claim 35, wherein the chroma residual pixels in the second chroma residual block are obtained in the following manner:

Res2_C(i) = (Res1_C(i) × QC + offset_C) >> shift_C

where QC represents the adjustment factor, Res1_C(i) represents the i-th chroma residual pixel in the first chroma residual block, Res2_C(i) represents the i-th chroma residual pixel in the second chroma residual block, offset_C and shift_C are preset constants, and i is a natural number.
The device according to claim 33 or 34, wherein the bit-width precision of the luminance residual pixels in the first luminance residual block is higher than the bit-width precision of the luminance residual pixels in the second luminance residual block.
The device according to any one of claims 35 to 37, wherein the bit-width precision of the chroma residual pixels in the first chroma residual block is higher than the bit-width precision of the chroma residual pixels in the second chroma residual block.
The device according to any one of claims 32 to 38, wherein the conversion module is specifically configured to:

perform inverse quantization on each transform coefficient in the transform coefficient block to obtain an inverse-quantized transform coefficient block;

perform an inverse transform on the inverse-quantized transform coefficient block to obtain the first residual block of the block to be processed.
The device according to any one of claims 21 to 39, further comprising:

A reconstruction unit, configured to add a residual pixel in the second residual and a predicted pixel at a corresponding position in the block to be processed to obtain a reconstructed pixel at the corresponding position in the block to be processed.