CN114079782A - Video image reconstruction method and device, computer equipment and storage medium - Google Patents


Info

Publication number: CN114079782A
Authority: CN (China)
Prior art keywords: coding unit, target coding unit, residual, target, coefficient
Legal status: Pending (an assumption by Google Patents, not a legal conclusion)
Application number: CN202010844965.4A
Other languages: Chinese (zh)
Inventors: 王海鑫, 王力强, 许晓中, 刘杉
Current Assignee: Tencent Technology (Shenzhen) Co., Ltd.
Original Assignee: Tencent Technology (Shenzhen) Co., Ltd.
Application filed by Tencent Technology (Shenzhen) Co., Ltd.
Priority to CN202010844965.4A

Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/593 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The application provides a video image reconstruction method and device, a computer device, and a storage medium, relating to the technical field of video coding and decoding. The method comprises the following steps: in response to coding and decoding being performed in a string prediction mode, obtaining the residual coding of a coded target coding unit, where the residual coding is obtained by coding the residual coefficients of the target coding unit; decoding the residual coding of the target coding unit to obtain the residual coefficients of the target coding unit; acquiring the residual signal of the target coding unit based on its residual coefficients; and obtaining the reconstructed signal of the target coding unit based on its residual signal and its prediction signal. The scheme performs complete residual signal processing for string prediction, improving the quality of the reconstructed signal of a coding unit coded in string prediction mode and, in turn, the coding and decoding performance.

Description

Video image reconstruction method and device, computer equipment and storage medium
Technical Field
The embodiments of the present application relate to the technical field of video coding and decoding, and in particular to a video image reconstruction method and device, a computer device, and a storage medium.
Background
In current video compression technologies such as VVC (Versatile Video Coding) and AVS3 (Audio Video coding Standard 3), a coding and decoding method called string prediction is introduced.
In the related art, for a coding unit coded with string prediction, the prediction signal of the coding unit consists directly of the searched reference pixels, which is equivalent to using the prediction signal directly as the reconstructed signal. However, a certain amount of information is usually lost during the coding of a coding unit, so a difference exists between the prediction signal and the original signal, which degrades coding and decoding performance.
Disclosure of Invention
The embodiments of the present application provide a video image reconstruction method and device, a computer device, and a storage medium, which reconstruct an image by combining the residual signal and the prediction signal of a coded coding unit when coding and decoding are performed in a string prediction mode, thereby improving coding and decoding performance in the string prediction mode. The technical scheme is as follows:
according to an aspect of an embodiment of the present application, there is provided a video image reconstruction method, including:
obtaining residual coding of a coded target coding unit in response to coding and decoding being performed in a string prediction mode, the residual coding being obtained by coding the residual coefficients of the target coding unit;
decoding residual coding of the target coding unit to obtain a residual coefficient of the target coding unit;
acquiring a residual signal of the target coding unit based on the residual coefficient of the target coding unit;
and obtaining a reconstructed signal of the target coding unit based on the residual signal of the target coding unit and the prediction signal of the target coding unit.
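The four decoding-side steps above can be sketched as follows. This is a minimal illustration, not the normative process: the function name, the unit quantization step, and the omission of the inverse transform are assumptions made for readability.

```python
def reconstruct_unit(residual_coeffs, prediction, q_step=1, bit_depth=8):
    """Sketch of the claimed flow: residual coefficients -> residual signal
    (simple inverse quantization; the inverse transform is omitted here),
    then reconstruction = prediction + residual, clipped to the sample range."""
    max_val = (1 << bit_depth) - 1
    residual = [c * q_step for c in residual_coeffs]   # residual signal from coefficients
    return [min(max(p + r, 0), max_val)                # reconstructed signal, clipped
            for p, r in zip(prediction, residual)]
```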
According to an aspect of an embodiment of the present application, there is provided a video image encoding method, including:
acquiring an original signal of an uncoded target coding unit;
performing prediction on the original signal of the target coding unit in a string prediction mode based on a reference signal, to obtain a prediction signal of the target coding unit and a residual signal of the target coding unit;
obtaining a residual coding of the target coding unit based on a residual signal of the target coding unit;
and adding motion information corresponding to the prediction signal of the target coding unit and residual coding of the target coding unit to the coded video code stream.
According to an aspect of the embodiments of the present application, there is provided a video image reconstruction apparatus, including:
a residual coding acquisition module, configured to acquire residual coding of a coded target coding unit in response to coding and decoding being performed in a string prediction mode, the residual coding being obtained by coding the residual coefficients of the target coding unit;
a coefficient decoding module, configured to decode residual coding of the target coding unit to obtain a residual coefficient of the target coding unit;
a residual signal obtaining module, configured to obtain a residual signal of the target coding unit based on a residual coefficient of the target coding unit;
and a signal reconstruction module, configured to obtain a reconstructed signal of the target coding unit based on the residual signal of the target coding unit and the prediction signal of the target coding unit.
According to an aspect of embodiments of the present application, there is provided a computer device, comprising a processor and a memory, wherein at least one instruction, at least one program, a code set, or a set of instructions is stored in the memory, and the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by the processor to implement the above-mentioned video image reconstruction method.
According to an aspect of embodiments of the present application, there is provided a computer-readable storage medium having at least one instruction, at least one program, a set of codes, or a set of instructions stored therein, which is loaded and executed by a processor to implement the above-mentioned video image reconstruction method.
In yet another aspect, embodiments of the present application provide a computer program product or a computer program, where the computer program product or the computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to execute the above-mentioned video image reconstruction method.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
When the encoder/decoder codes or decodes in string prediction mode, residual information is introduced into the string prediction process and complete residual signal processing is performed for string prediction, which improves the quality of the reconstructed signal of a coding unit coded in string prediction mode and thus the coding and decoding performance.
Drawings
FIG. 1 is a basic flow diagram of a video encoding process as exemplarily shown herein;
FIG. 2 is a diagram illustrating inter prediction modes according to an embodiment of the present application;
FIG. 3 is a diagram illustrating candidate motion vectors according to an embodiment of the present application;
FIG. 4 is a diagram of an intra block copy mode, as provided by one embodiment of the present application;
FIG. 5 is a diagram illustrating an intra-string copy mode according to an embodiment of the present application;
FIG. 6 is a simplified block diagram of a communication system provided by one embodiment of the present application;
FIG. 7 is a schematic diagram of the placement of a video encoder and a video decoder in a streaming environment as exemplary illustrated herein;
FIG. 8 is a flowchart of a video image reconstruction method according to an embodiment of the present application;
FIG. 9 is a schematic diagram of coefficient decoding according to the embodiment shown in FIG. 8;
FIG. 10 is a schematic diagram of an image reconstruction framework in a coding/decoding process according to an embodiment of the present application;
fig. 11 is a block diagram of a video image reconstruction apparatus according to an embodiment of the present application;
fig. 12 is a block diagram of a computer device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Before describing the embodiments of the present application, a brief description of the video encoding technique will be provided with reference to fig. 1. Fig. 1 illustrates a basic flow diagram of a video encoding process.
A video signal refers to an image sequence comprising a plurality of frames. A frame (frame) is the representation of the spatial information of a video signal. Taking the YUV format as an example, one frame includes one luminance sample matrix (Y) and two chrominance sample matrices (Cb and Cr). In terms of how the video signal is acquired, two cases can be distinguished: content captured by a camera and content generated by a computer. Because their statistical characteristics differ, the corresponding compression coding modes may also differ.
In mainstream video coding technologies, such as the H.265/HEVC (High Efficiency Video Coding) and H.266/VVC (Versatile Video Coding) standards and AVS (Audio Video coding Standard, e.g., AVS3), a hybrid coding framework is adopted, and the following series of operations and processes is performed on the input original video signal:
1. Block Partition Structure: the input image is divided into several non-overlapping processing units, each of which undergoes a similar compression operation. Such a processing unit is called a CTU (Coding Tree Unit) or LCU (Largest Coding Unit). A CTU may be further partitioned into one or more basic coding units, called CUs (Coding Units). Each CU is the most basic element of the coding process. Described below are the various possible coding schemes for each CU.
2. Predictive Coding (Predictive Coding): the method comprises the modes of intra-frame prediction, inter-frame prediction and the like, and residual video signals are obtained after the original video signals are predicted by the selected reconstructed video signals. The encoding side needs to decide for the current CU the most suitable one among the many possible predictive coding modes and inform the decoding side. The intra-frame prediction means that the predicted signal comes from an already encoded and reconstructed region in the same image. Inter-prediction means that the predicted signal is from a picture (called a reference picture) that has already been coded and is different from the current picture.
3. Transform coding and Quantization (Transform &amp; Quantization): the residual video signal is converted into a transform domain by a transform operation such as a DFT (Discrete Fourier Transform) or DCT (Discrete Cosine Transform); the resulting values are referred to as transform coefficients. The signal in the transform domain then undergoes a lossy quantization operation, which discards some information so that the quantized signal lends itself to compact representation. In some video coding standards there may be more than one selectable transform, so the encoding side must also select one transform for the current CU and inform the decoding side. The fineness of quantization is generally determined by the quantization parameter (QP). With a larger QP value, coefficients spanning a larger value range are quantized to the same output, which usually causes larger distortion and a lower bit rate; conversely, with a smaller QP value, coefficients spanning a smaller value range are quantized to the same output, which usually causes less distortion and corresponds to a higher bit rate.
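The quantization trade-off described above can be illustrated with a toy uniform scalar quantizer (an illustration only, not the normative HEVC/VVC quantizer): a larger step size maps a wider range of coefficients onto one level, so reconstruction error grows as the rate shrinks.

```python
def quantize(coeff, q_step):
    # uniform scalar quantization: a larger q_step maps a wider
    # range of coefficient values to the same output level
    return round(coeff / q_step)

def dequantize(level, q_step):
    # inverse quantization: rescale the level back to the coefficient domain
    return level * q_step
```

For example, a coefficient of 10 survives a step of 3 with error 1 (reconstructed as 9), but a coarser step of 8 (akin to a higher QP) leaves error 2 (reconstructed as 8).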
4. Entropy Coding or statistical coding: the quantized transform-domain signal is statistically compressed according to the frequency of occurrence of each value, and a binarized (0 or 1) compressed code stream is finally output. Encoding also generates other information, such as the selected mode and motion vectors, which likewise needs entropy coding to reduce the bit rate. Statistical coding is a lossless coding mode that can effectively reduce the bit rate required to express the same signal. Common statistical coding methods include Variable Length Coding (VLC) and Context-based Adaptive Binary Arithmetic Coding (CABAC).
5. Loop Filtering: the coded image is subjected to inverse quantization, inverse transformation and prediction compensation (the reverse of operations 2 to 4 above) to obtain a reconstructed decoded image. Compared with the original image, part of the reconstructed image's information differs from the original because of quantization, i.e., it exhibits distortion. Applying filtering operations to the reconstructed image, such as deblocking, SAO (Sample Adaptive Offset), ALF (Adaptive Loop Filter) or other filters, can effectively reduce the degree of distortion introduced by quantization. Since these filtered reconstructed images serve as references for subsequently coded images, i.e., for predicting future signals, the above filtering operation is also called loop filtering: a filtering operation inside the coding loop.
According to the above coding process, at the decoding end, for each CU the decoder obtains the compressed code stream and performs entropy decoding to obtain the various mode information and the quantized transform coefficients. Each coefficient is inverse-quantized and inverse-transformed to obtain the residual signal. Separately, the prediction signal of the CU is derived from the known coding mode information; the residual signal and the prediction signal are then added to obtain the reconstructed signal. Finally, the reconstructed values of the decoded image undergo loop filtering to produce the final output signal.
Some mainstream video coding standards, such as HEVC, VVC, AVS3, etc., adopt a hybrid coding framework based on blocks. The original video data are divided into a series of coding blocks, and the compression of the video data is realized by combining video coding methods such as prediction, transformation, entropy coding and the like. Motion compensation is a type of prediction method commonly used in video coding, and the motion compensation derives a prediction value of a current coding block from a coded area based on the redundancy characteristic of video content in a time domain or a space domain. Such prediction methods include: inter prediction, intra block copy prediction, intra string copy prediction, etc., which may be used alone or in combination in a particular coding implementation. For coding blocks using these prediction methods, it is generally necessary to encode, either explicitly or implicitly in the code stream, one or more two-dimensional displacement vectors indicating the displacement of the current block (or of a co-located block of the current block) with respect to its reference block or blocks.
It should be noted that the displacement vector may have different names in different prediction modes and different implementations, and is described herein in the following manner: 1) the displacement Vector in the inter prediction mode is called Motion Vector (MV for short); 2) a displacement Vector in an IBC (Intra Block Copy) prediction mode is called a Block Vector (BV); 3) the displacement Vector in the ISC (Intra String Copy) prediction mode is called a String Vector (SV). Intra-frame string replication is also referred to as "string prediction" or "string matching," etc.
MV refers to a displacement vector for inter prediction mode, pointing from the current picture to a reference picture, whose value is the coordinate offset between the current block and the reference block, where the current block and the reference block are in two different pictures. In the inter-frame prediction mode, motion vector prediction can be introduced, a prediction motion vector corresponding to the current block is obtained by predicting the motion vector of the current block, and the difference value between the prediction motion vector corresponding to the current block and the actual motion vector is coded and transmitted. In the embodiment of the present application, predicting a motion vector refers to obtaining a prediction value of a motion vector of a current block by a motion vector prediction technique.
BV refers to a displacement vector for IBC prediction mode, whose value is the coordinate offset between the current block and the reference block, both in the current picture. In the IBC prediction mode, block vector prediction may be introduced, a prediction block vector corresponding to the current block is obtained by predicting the block vector of the current block, and a difference between the prediction block vector corresponding to the current block and the actual block vector is encoded and transmitted. In the embodiment of the present application, the prediction block vector refers to a prediction value of a block vector of the current block obtained by a block vector prediction technique.
SV is a displacement vector for ISC prediction mode, and its value is the coordinate offset between the current string and the reference string, both in the current picture. In the ISC prediction mode, string vector prediction can be introduced, a predicted string vector corresponding to the current string is obtained by predicting the string vector of the current string, and the difference value between the predicted string vector corresponding to the current string and the actual string vector is coded and transmitted. In the embodiment of the present application, the predicted string vector is a predicted value of a string vector of a current string obtained by a string vector prediction technique.
Several different prediction modes are described below:
one, inter prediction mode
As shown in fig. 2, inter prediction exploits the temporal correlation of video, using the pixels of neighboring coded images to predict the pixels of the current image, which effectively removes temporal redundancy in the video and saves bits for coding the residual data. Here, P is the current frame, Pr is the reference frame, B is the current block to be coded, and Br is the reference block of B. B' is the block in the reference frame co-located with B: the coordinate of B (and of B') is (x, y), and the coordinate of Br is (xr, yr). The displacement between the current block to be coded and its reference block is called the Motion Vector (MV):
MV=(xr-x,yr-y)。
the bits required to encode MVs can be further reduced by using MV prediction techniques, considering that temporal or spatial neighboring blocks have strong correlation. In h.265/HEVC, inter Prediction includes two MV Prediction techniques, Merge and AMVP (Advanced Motion Vector Prediction).
The Merge mode establishes an MV candidate list for the current PU (Prediction Unit), containing 5 candidate MVs (and their corresponding reference pictures). The 5 candidate MVs are traversed and the one with the minimum rate-distortion cost is selected as the optimal MV. Because the encoder and the decoder build the candidate list in the same way, the encoder only needs to transmit the index of the optimal MV in the candidate list. Note that the MV prediction of HEVC also has a skip mode, which is a special case of the Merge mode: after the optimal MV is found in Merge mode, if the current block and the reference block are substantially identical, no residual data need be transmitted, only the index of the MV and a skip flag.
The MV candidate list established in Merge mode includes both spatial and temporal candidates, and for B Slice (B-frame images) additionally a combined-list mode. The spatial domain provides at most 4 candidate MVs, established as shown in part (a) of fig. 3. The spatial list is built in the order A1 → B1 → B0 → A0 → B2, where B2 is a fallback: the motion information of B2 is used only when one or more of A1, B1, B0 and A0 are unavailable. The temporal domain provides at most 1 candidate MV, established as shown in part (b) of fig. 3, obtained by scaling the MV of the co-located PU according to the following formula:
curMV=td*colMV/tb;
wherein, curMV represents the MV of the current PU, colMV represents the MV of the co-located PU, td represents the distance between the current picture and the reference picture, and tb represents the distance between the co-located picture and the reference picture. If the PU at the D0 position on the co-located block is not available, the co-located PU at the D1 position is used for replacement. For a PU in B Slice, since there are two MVs, its MV candidate list also needs to provide two MVPs (Motion Vector Predictor). HEVC generates a combined list for B Slice by pairwise combining the first 4 candidate MVs in the MV candidate list.
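The temporal scaling formula above can be sketched per MV component as follows; truncation toward zero is an assumption of this sketch, not the normative rounding rule of the standard:

```python
def scale_colocated_mv(col_mv, td, tb):
    # curMV = td * colMV / tb, applied to each component of the
    # co-located PU's MV; td is the current-picture-to-reference
    # distance, tb the co-located-picture-to-reference distance
    return tuple(int(td * c / tb) for c in col_mv)
```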
Similarly, the AMVP mode uses the MV correlation of spatial and temporal neighboring blocks to build an MV candidate list for the current PU. Different from the Merge mode, in AMVP mode the optimal predicted MV (MVP, Motion Vector Predictor) is selected from the MV candidate list, and the MV obtained by motion search for the current block to be coded is differentially coded against it, i.e., the coded value is MVD = MV − MVP, where MVD is the Motion Vector Difference. By building the same list, the decoding end can compute the MV of the current decoding block from only the MVD and the index of the MVP in the list. The MV candidate list of AMVP mode also includes both spatial and temporal candidates, but its length is only 2.
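The AMVP differential coding just described amounts to a simple round trip (a sketch; candidate-list construction and index signalling are omitted):

```python
def encode_mvd(mv, mvp):
    # encoder writes only MVD = MV - MVP to the code stream
    return (mv[0] - mvp[0], mv[1] - mvp[1])

def decode_mv(mvd, mvp):
    # decoder, having built the same candidate list and picked the
    # signalled MVP, recovers MV = MVP + MVD
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])
```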
As described above, in the AMVP mode of HEVC the MVD needs to be coded. In HEVC, the resolution of the MVD is controlled by use_integer_mv_flag in the slice_header: when the value of this flag is 0, the MVD is coded at 1/4 (luminance) pixel resolution; when the value is 1, the MVD is coded at full (luminance) pixel resolution. VVC uses a method of Adaptive Motion Vector Resolution (AMVR), which allows the resolution of the coded MV to be selected adaptively per CU. In normal AMVP mode, the selectable resolutions include 1/4, 1/2, 1 and 4 pixels. For a CU with at least one non-zero MVD component, a first flag is coded to indicate whether quarter-luma-sample MVD precision is used for the CU. If this flag is 0, the MVD of the current CU is coded at 1/4 pixel resolution. Otherwise, a second flag is coded to indicate whether the CU uses 1/2 pixel resolution or some other MVD resolution. If not 1/2 pixel, a third flag is coded to indicate whether 1-pixel or 4-pixel resolution is used for the CU.
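The cascaded AMVR flags described above can be read as follows. The flag ordering follows the text; the function shape and the value-to-resolution mapping of the third flag are illustrative assumptions, not the normative VVC syntax:

```python
def amvr_mvd_resolution(first_flag, second_flag=None, third_flag=None):
    """Return the MVD resolution in luma pixels from the cascaded flags."""
    if first_flag == 0:
        return 0.25        # quarter-luma-sample MVD precision
    if second_flag == 1:
        return 0.5         # half-pixel resolution
    # third flag picks 1-pixel vs 4-pixel (mapping assumed here)
    return 1 if third_flag == 0 else 4
```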
Two, IBC prediction mode
IBC is an intra coding tool adopted in the HEVC Screen Content Coding (SCC) extension that significantly improves the coding efficiency of screen content. IBC techniques have also been adopted in AVS3 and VVC to improve screen content coding performance. IBC uses the spatial correlation of screen content video to predict the pixels of the current block to be coded from already-coded pixels in the current picture, effectively saving the bits needed to code those pixels. As shown in fig. 4, the displacement between the current block and its reference block in IBC is called the BV (Block Vector). H.266/VVC employs a BV prediction technique similar to inter prediction to further save the bits needed to code the BV: it predicts the BV using an AMVP-like mode and allows the BVD (block vector difference) to be coded at 1- or 4-pixel resolution.
Third, ISC prediction mode
ISC techniques divide an encoded block into a series of pixel strings or unmatched pixels in some scan order (e.g., raster scan, round-trip scan, and Zig-Zag scan). Similar to IBC, each string finds a reference string of the same shape in the coded region of the current picture and derives the prediction value of the current string from it; coding the residual between the pixel values of the current string and the prediction, instead of the pixel values directly, can effectively save bits. Fig. 5 shows a schematic diagram of intra string copy, where the dark gray regions are coded regions, the 28 pixels in white are string 1, the 35 pixels in light gray are string 2, and the 1 pixel in black represents an unmatched pixel. The displacement between string 1 and its reference string is string vector 1 in fig. 5; the displacement between string 2 and its reference string is string vector 2 in fig. 5.
The intra-frame string copy technique needs to encode the SV corresponding to each string in the current coding block, the string length, and a flag indicating whether there is a matching string. Where SV represents the displacement of the string to be encoded to its reference string. The string length represents the number of pixels that the string contains. In different implementations, there are many ways to encode the string length, and several examples are given below (some of which may be used in combination):
1) directly coding the length of the string in the code stream;
2) encoding, in the code stream, the number of pixels still to be processed after the current string; the decoding end then computes the length of the current string as L = N − N1 − N2, where N is the size of the current block, N1 the number of already-processed pixels, and N2 the decoded number of pixels to be processed;
3) encoding a flag in the code stream indicating whether the string is the last one; if it is the last string, the length of the current string is computed as L = N − N1 from the size N of the current block and the number N1 of processed pixels. If a pixel finds no corresponding reference in the referenceable region, the value of that unmatched pixel is coded directly.
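The decoder-side length derivations of options 2) and 3) above can be sketched as follows (parameter names are illustrative):

```python
def string_length(n_total, n_processed, is_last, n_to_process=0):
    # option 3): the last string covers everything left, L = N - N1
    if is_last:
        return n_total - n_processed
    # option 2): L = N - N1 - N2, with N2 decoded from the code stream
    return n_total - n_processed - n_to_process
```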
The decoding flow of string prediction is shown in Table 1 below:
TABLE 1
(The syntax table of Table 1 is rendered as images in the original publication and is not reproduced here.)
The associated semantics are described as follows:
the matching type flag isc _ match _ type _ flag [ i ] of the intra prediction of the string prediction is a binary variable. A value of '1' indicates that the ith part of the current coding unit is a string; a value of '0' indicates that the i-th part of the current coding unit is an unmatched pixel. The analytical process is shown in 8.3. IscMatchTypeFlag [ i ] is equal to the value of isc _ match _ type _ flag [ i ]. If isc _ match _ type _ flag [ i ] is not present in the bitstream, the value of IscMatchTypeFlag [ i ] is 0.
The last flag isc_last_flag[i] of string prediction is a binary variable. A value of '1' indicates that the ith part of the current coding unit is the last part of the current coding unit, and the length of this part, StrLen[i], is equal to NumTotalPixel − NumCodedPixel; a value of '0' indicates that the ith part of the current coding unit is not the last part of the current coding unit, and the length of this part, StrLen[i], is equal to NumTotalPixel − NumCodedPixel − NumRemainingPixelMinus1[i] − 1. The parsing process is described in clause 8.3. IscLastFlag[i] is equal to the value of isc_last_flag[i].
The next remaining pixel number next_remaining_pixel_in_cu[i] represents the number of pixels in the current coding unit that remain undecoded after decoding of the ith part of the current coding unit is completed. The value of NextRemainingPixelInCu[i] is equal to the value of next_remaining_pixel_in_cu[i].
Unmatched-pixel Y component value for string prediction: isc_unmatched_pixel_y[i]
Unmatched-pixel U component value for string prediction: isc_unmatched_pixel_u[i]
Unmatched-pixel V component value for string prediction: isc_unmatched_pixel_v[i]
These three values are 10-bit unsigned integers representing the values of the Y, Cb and Cr components of the unmatched pixels of the i-th part of the current coding unit. IscUnmatchedPixelY[i], IscUnmatchedPixelU[i] and IscUnmatchedPixelV[i] are equal to the values of isc_unmatched_pixel_y[i], isc_unmatched_pixel_u[i] and isc_unmatched_pixel_v[i], respectively.
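The string-length derivation implied by the isc_last_flag semantics above can be sketched as follows; the function name and calling convention are illustrative, but the two formulas mirror the StrLen[i] definitions given in the text.

```python
def derive_str_len(isc_last_flag, num_total_pixel, num_coded_pixel,
                   num_remaining_pixel_minus1=0):
    """Sketch of StrLen[i] derivation (variable names mirror the spec;
    the helper itself is hypothetical, not normative decoder code)."""
    if isc_last_flag == 1:
        # last part: all pixels not yet coded belong to this string
        return num_total_pixel - num_coded_pixel
    # not the last part: subtract the pixels that remain after it
    return num_total_pixel - num_coded_pixel - num_remaining_pixel_minus1 - 1
```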
As shown in fig. 6, a simplified block diagram of a communication system provided by one embodiment of the present application is shown. Communication system 200 includes a plurality of devices that may communicate with each other over, for example, network 250. By way of example, the communication system 200 includes a first device 210 and a second device 220 interconnected by a network 250. In the embodiment of fig. 6, the first device 210 and the second device 220 perform unidirectional data transfer. For example, the first apparatus 210 may encode video data, such as a video picture stream captured by the first apparatus 210, for transmission over the network 250 to the second apparatus 220. The encoded video data is transmitted in the form of one or more encoded video streams. The second device 220 may receive the encoded video data from the network 250, decode the encoded video data to recover the video data, and display a video picture according to the recovered video data. Unidirectional data transmission is common in applications such as media services.
In another embodiment, the communication system 200 includes a third device 230 and a fourth device 240 that perform bi-directional transmission of encoded video data, which may occur, for example, during a video conference. For bi-directional data transfer, each of the third device 230 and the fourth device 240 may encode video data (e.g., a stream of video pictures captured by the devices) for transmission over the network 250 to the other of the third device 230 and the fourth device 240. Each of third apparatus 230 and fourth apparatus 240 may also receive encoded video data transmitted by the other of third apparatus 230 and fourth apparatus 240, and may decode the encoded video data to recover the video data, and may display video pictures on an accessible display device according to the recovered video data.
In the embodiment of fig. 6, the first device 210, the second device 220, the third device 230, and the fourth device 240 may be computer devices such as a server, a personal computer, and a smart phone, but the principles disclosed herein may not be limited thereto. The embodiment of the application is suitable for a Personal Computer (PC), a mobile phone, a tablet Computer, a media player and/or a special video conference device. Network 250 represents any number of networks that communicate encoded video data between first device 210, second device 220, third device 230, and fourth device 240, including, for example, wired and/or wireless communication networks. The communication network 250 may exchange data in circuit-switched and/or packet-switched channels. The network may include a telecommunications network, a local area network, a wide area network, and/or the internet. For purposes of this application, the architecture and topology of network 250 may be immaterial to the operation of the present disclosure, unless explained below.
By way of example, fig. 7 illustrates the placement of a video encoder and a video decoder in a streaming environment. The subject matter disclosed herein is equally applicable to other video-enabled applications including, for example, video conferencing, Digital TV (television), storing compressed video on Digital media including CD (Compact Disc), DVD (Digital Versatile Disc), memory stick, and the like.
The streaming system may include an acquisition subsystem 313, which may include a video source 301, such as a digital camera, that creates an uncompressed video picture stream 302. In an embodiment, the video picture stream 302 includes samples taken by a digital camera. The video picture stream 302 is depicted as a thick line to emphasize its high data volume compared to the encoded video data 304 (or encoded video code stream). The video picture stream 302 may be processed by an electronic device 320, which comprises a video encoder 303 coupled to the video source 301. The video encoder 303 may comprise hardware, software, or a combination of hardware and software to implement or embody aspects of the disclosed subject matter as described in greater detail below. The encoded video data 304 (or encoded video code stream 304) is depicted as a thin line compared to the video picture stream 302 to emphasize its lower data volume, and may be stored on the streaming server 305 for future use. One or more streaming client subsystems, such as client subsystem 306 and client subsystem 308 in fig. 7, may access the streaming server 305 to retrieve copies 307 and 309 of the encoded video data 304. The client subsystem 306 may include, for example, a video decoder 310 in an electronic device 330. The video decoder 310 decodes the incoming copy 307 of the encoded video data and generates an output video picture stream 311 that may be presented on a display 312, such as a display screen, or another presentation device (not depicted). In some streaming systems, the encoded video data 304, 307 and 309 (e.g., video code streams) may be encoded according to certain video encoding/compression standards.
It should be noted that electronic devices 320 and 330 may include other components (not shown). For example, the electronic device 320 may include a video decoder (not shown), and the electronic device 330 may also include a video encoder (not shown). Wherein the video decoder is configured to decode the received encoded video data; a video encoder is used to encode video data.
It should be noted that the technical solution provided in the embodiment of the present application may be applied to the h.266/VVC standard, the h.265/HEVC standard, the AVS (e.g., AVS3), or the next-generation video codec standard, and the embodiment of the present application does not limit this.
In the current AVS3 standard, for a coding unit coded in string prediction mode, the prediction signal of the coding unit consists directly of the searched reference pixels, and the transform, quantization and coefficient-coding processes are skipped. This is equivalent to using the prediction signal directly as the reconstructed signal, without any contribution from a residual signal; using the prediction signal alone, without residual information, introduces a large difference from the original signal and thereby degrades coding performance.
The present application provides an image reconstruction method that introduces residual information into string prediction. When a CU is coded in string prediction mode, the residual signal of the CU is coded as well; when the string-predicted CU undergoes signal reconstruction, complete residual-signal processing is applied, i.e., the residual signal of the CU is combined with the prediction signal for signal reconstruction. This improves the quality of the reconstructed signal of a coding unit coded in string prediction mode, and thus improves coding and decoding performance.
In the method provided by the embodiment of the present application, the execution main body of each step may be a decoding-side device or an encoding-side device. In the process of video decoding and video encoding, the technical scheme provided by the embodiment of the application can be adopted to reconstruct the image. The decoding end device and the encoding end device can be computer devices, and the computer devices refer to electronic devices with data calculation, processing and storage capabilities, such as PCs, mobile phones, tablet computers, media players, special video conference devices, servers and the like.
In addition, the methods provided herein may be used alone or combined with other methods in any order. Encoders and decoders based on the methods provided herein may be implemented by one or more processors or one or more integrated circuits.
Referring to fig. 8, a flowchart of a video image reconstruction method according to an embodiment of the present application is shown. For convenience of explanation, only the steps executed by the computer device will be described. The method may include the steps of:
step 801, in response to encoding and decoding being performed in string prediction mode, acquiring the residual coding of an encoded target coding unit; the residual coding is obtained by coding the residual coefficients of the target coding unit.
In the embodiment of the present application, when the encoder performs string-prediction-based encoding of the target coding unit, it may obtain the original, unencoded signal of the target coding unit; perform prediction on the original signal of the target coding unit in string prediction mode based on a reference signal, obtaining the prediction signal of the target coding unit and the residual signal of the target coding unit; obtain the residual coding of the target coding unit based on the residual signal of the target coding unit; and add the motion information corresponding to the prediction signal of the target coding unit, together with the residual coding of the target coding unit, to the encoded video code stream.
In one possible implementation, when encoding and decoding are performed in string prediction mode, the encoder/decoder directly obtains the residual coding of the target coding unit in order to perform coefficient decoding of the residual for the encoded target coding unit.
The residual coding is obtained by transforming, quantizing, and entropy-coding/statistically-coding the residual signal of the target coding unit during encoding of the target coding unit by the encoder.
In one possible implementation, when the encoder/decoder performs image reconstruction on an encoded target coding unit, it first determines whether decoding of the residual coding is required for the target coding unit.
For example, the encoder/decoder, in response to encoding and decoding being performed in string prediction mode, acquires residual decoding indication information of the target coding unit; the residual decoding indication information indicates whether the residual of the target coding unit is to be decoded;
and the encoder/decoder, in response to the residual decoding indication information indicating that the residual of the target coding unit is to be decoded, acquires the residual coding of the target coding unit.
In one possible implementation, the residual decoding indication information includes at least one of the following information:
1) A first index decoded from the sequence header corresponding to the target coding unit, the first index indicating whether the residual coefficients of string-predicted coding units in the corresponding sequence are to be decoded.
In the embodiment of the present application, when the encoder performs video encoding in string prediction mode, an index may be added to the sequence header, the index indicating whether, for all CUs of the sequence, the residual coefficients of the string prediction technique need to be decoded.
For example, when the encoder/decoder decodes the first index from the sequence header and determines from it that residual coefficients of the target coding unit exist, it determines that decoding of the residual coefficients is required for the target coding unit.
2) A second index decoded from the picture header corresponding to the target coding unit, the second index indicating whether the residual coefficients of string-predicted coding units in the corresponding picture are to be decoded.
In the embodiment of the present application, when the encoder performs video coding in string prediction mode, an index may be added to the picture header, the index indicating whether, for all CUs of the picture, the residual coefficients of the string prediction technique need to be decoded.
For example, when the encoder/decoder decodes the second index from the picture header and determines from it that residual coefficients of the target coding unit exist, it determines that decoding of the residual coefficients is required for the target coding unit.
3) A third index decoded from the slice header of the target coding unit, the third index indicating whether the residual coefficients of string-predicted coding units in the corresponding slice are to be decoded.
In the embodiment of the present application, when the encoder performs video encoding in string prediction mode, an index may be added to the slice (slice/patch) header, the index indicating whether, for all CUs of the slice, the residual coefficients of the string prediction technique need to be decoded.
For example, when the encoder/decoder decodes the third index from the slice header and determines from it that residual coefficients of the target coding unit exist, it determines that decoding of the residual coefficients is required for the target coding unit.
4) A fourth index decoded in the largest coding unit (LCU) containing the target coding unit, the fourth index indicating whether the residual coefficients of string-predicted coding units in the corresponding LCU are to be decoded.
In the embodiment of the present application, when the encoder performs video coding in string prediction mode, an index may be added to the LCU, the index indicating whether, for all CUs of the LCU, the residual coefficients of the string prediction technique need to be decoded.
For example, when the encoder/decoder decodes the fourth index in the LCU and determines from it that residual coefficients of the target coding unit exist, it determines that decoding of the residual coefficients is required for the target coding unit.
5) A fifth index decoded in the target coding unit, the fifth index indicating whether the residual coefficients of the target coding unit are to be decoded.
In the embodiment of the present application, when the encoder performs video coding in string prediction mode, an index may be added to the target coding unit itself, the index indicating whether the residual coefficients of the current CU need to be decoded.
For example, when the encoder/decoder decodes the fifth index in the target coding unit and determines from it that residual coefficients of the target coding unit exist, it determines that decoding of the residual coefficients is required for the target coding unit.
6) The component type of the target coding unit, the component type being a luminance component or a chrominance component.
For example, if the component of the CU currently being processed is the luminance component, this indicates that the CU needs to decode the residual coefficients of the string prediction technique;
alternatively, if the component currently being processed is a chrominance component, this indicates that the CU needs to decode the residual coefficients of the string prediction technique.
That is, depending on whether a luminance component or a chrominance component of the target coding unit is being processed, the encoder/decoder determines that decoding of the residual coefficients is required for the target coding unit.
7) The coded block flag (cbf) of each color component of the target coding unit, the flag indicating whether the corresponding component has non-zero coefficients.
In the present embodiment, for the current CU, the encoder/decoder decodes the cbf of each color component from the code stream (identifying whether that component has non-zero coefficients) to determine whether the CU needs to decode the coefficients of the string prediction technique. For example, when a cbf indicates that non-zero coefficients exist, the encoder/decoder determines that decoding of the residual coefficients is required for the target coding unit; alternatively, when the cbf indicates that no non-zero coefficients exist, the encoder/decoder determines that decoding of the residual coefficients is not required for the target coding unit.
8) The size of the target coding unit.
In the embodiment of the present application, the encoder/decoder may determine whether the CU needs to decode the coefficients of the string prediction technique according to the size of the decoded block (measured by its width, height, long-side size, short-side size, etc.); for example, when the size of a CU exceeds a threshold, the encoder/decoder determines that the residual coefficients of the string prediction technique need to be decoded for that CU.
9) Coefficient indication information within the target coding unit, the coefficient indication information indicating whether all coefficients of the target coding unit are zero.
In the embodiment of the present application, the encoder/decoder may determine whether the CU needs to decode the coefficients of the string prediction technique depending on whether all the coefficients within the CU are 0.
For example, if the coefficients in the target coding unit are not all 0, it is determined that the residual coefficients of the string prediction technique need to be decoded for the target coding unit;
alternatively, if all of the coefficients in the target coding unit are 0, it is determined that the residual coefficients of the string prediction technique do not need to be decoded for the target coding unit.
10) Whether the target coding unit contains isolated points.
In an embodiment of the present application, the encoder/decoder may indicate that the CU needs to decode the coefficients of the string prediction technique according to whether the string prediction content contains isolated points. For example, when the isolated point is included, it is determined that the CU requires a residual coefficient of the decoded string prediction technique, or when the isolated point is not included, it is determined that the CU requires a residual coefficient of the decoded string prediction technique.
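As a sketch of how a decoder might consult several of the signals listed above, the following hypothetical helper checks a few of the items; the key names, the precedence order, and the default size threshold are all assumptions chosen for illustration, since the text enumerates independent alternatives rather than a fixed combination.

```python
def residual_decoding_required(indication):
    """Hypothetical check over a dict of residual decoding indication
    signals; keys and precedence are illustrative only."""
    # per-CU flag (item 5) takes precedence when present
    if "cu_flag" in indication:
        return bool(indication["cu_flag"])
    # coded block flags of the colour components (item 7)
    if "cbf" in indication:
        return any(indication["cbf"])
    # size threshold on the long side of the CU (item 8)
    if "long_side" in indication:
        return indication["long_side"] >= indication.get("size_threshold", 8)
    # all-coefficients-zero indication (item 9)
    if "all_zero" in indication:
        return not indication["all_zero"]
    return False
```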
Step 802, decoding the residual coding of the target coding unit to obtain the residual coefficient of the target coding unit.
In one possible implementation, the encoder/decoder may decode the residual coding into the residual coefficients by means of scan-region-based coefficient coding (SRCC).
Fig. 9 is a schematic diagram illustrating coefficient decoding according to an embodiment of the present application. As shown in fig. 9, the SRCC technique uses (SRx, SRy) to determine the quantized coefficient region to be scanned in a transform unit (TU), where SRx is the abscissa of the right-most non-zero coefficient in the coefficient matrix and SRy is the ordinate of the bottom-most non-zero coefficient in the coefficient matrix. Only the coefficients within the scanning area determined by (SRx, SRy) need to be encoded. The scanning order is a reverse zig-zag scan from the lower-right corner to the upper-left corner, coding each coefficient in turn.
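The scan-region derivation and a reverse zig-zag traversal can be sketched as follows; the helper names are hypothetical, and the direction in which each anti-diagonal is traversed is illustrative rather than the normative AVS3 order.

```python
def srcc_scan_region(coeffs):
    """(SRx, SRy) as described above: SRx is the column of the right-most
    non-zero coefficient, SRy the row of the bottom-most one."""
    srx = sry = -1
    for y, row in enumerate(coeffs):
        for x, c in enumerate(row):
            if c != 0:
                srx = max(srx, x)
                sry = max(sry, y)
    return srx, sry

def reverse_zigzag(srx, sry):
    """Visit the (SRx+1) x (SRy+1) scan region anti-diagonal by
    anti-diagonal, from the bottom-right corner towards the top-left."""
    positions = []
    for d in range(srx + sry, -1, -1):  # last anti-diagonal first
        diag = [(x, d - x) for x in range(srx + 1) if 0 <= d - x <= sry]
        positions.extend(diag if d % 2 == 0 else diag[::-1])
    return positions
```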
In one possible implementation, the residual coding of the target coding unit includes at least two sub-coding blocks;
when decoding the residual coding of the target coding unit to obtain its residual coefficients, the encoder/decoder may skip the scanning of the all-zero sub-coding blocks among the at least two sub-coding blocks and perform scan-region-based coefficient decoding only on the non-all-zero sub-coding blocks, to obtain the residual coefficients of the target coding unit.
In one possible implementation, the encoder/decoder skips scanning of an all-zero sub-coding block of the at least two sub-coding blocks, and performs scanning-region-based coefficient decoding on the non-all-zero sub-coding block in a scanning manner along a horizontal direction with a width of the target coding block as a basic unit to obtain a residual coefficient of the target coding unit.
In one possible implementation, the encoder/decoder skips scanning of an all-zero sub-coding block of the at least two sub-coding blocks, and performs scanning-region-based coefficient decoding on the non-all-zero sub-coding block in a scanning manner along a vertical direction with a height of the target coding block as a basic unit to obtain a residual coefficient of the target coding unit.
In the embodiment of the present application, the encoder/decoder may skip unnecessary scanning regions by marking the non-all-zero sub-blocks: when decoding coefficients, it decodes, for each NxN sub-block in the CU, a flag indicating whether all coefficients of that sub-block are 0; if so, the scan of this all-zero region is skipped. N may take one or more preset values, such as 1, 2, 4, …, w×h/2, or the width (w) or height (h) of the current CU.
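A minimal sketch of the sub-block skipping just described; here the all-zero flag is recomputed from the coefficient matrix rather than decoded from a bitstream, and the function name is hypothetical.

```python
def subblocks_to_decode(coeffs, n):
    """Return the (x, y) origins of the N x N sub-blocks that are not
    all-zero, i.e. the only regions whose scan cannot be skipped."""
    h, w = len(coeffs), len(coeffs[0])
    to_decode = []
    for y0 in range(0, h, n):
        for x0 in range(0, w, n):
            block = [coeffs[y][x0:x0 + n] for y in range(y0, min(y0 + n, h))]
            if any(c != 0 for row in block for c in row):
                to_decode.append((x0, y0))  # non-all-zero: scan this region
    return to_decode
```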
In the embodiment of the present application, when decoding residual coding, the encoder/decoder may also change the scanning order: for example, when a scanning manner along the horizontal direction is adopted (including raster-scan, reverse-scan, and the like), the CU block width is used as the basic unit; when a scanning manner along the vertical direction is adopted (including raster-scan, reverse-scan, and the like), the CU block height is used as the basic unit.
In another possible implementation, the encoder/decoder uses a coefficient decoding approach other than SRCC, such as the coefficient decoding approach of the first phase of AVS3.
Step 803, a residual signal of the target coding unit is obtained based on the residual coefficient of the target coding unit.
In one possible implementation manner, the obtaining a residual signal of the target coding unit based on the residual coefficient of the target coding unit includes:
carrying out inverse quantization on the residual coefficient of the target coding unit to obtain a transformation coefficient of the target coding unit;
and performing inverse transformation skipping processing on the transformation coefficient of the target coding unit, or performing inverse transformation processing on the transformation coefficient of the target coding unit to obtain a residual signal of the target coding unit.
Transform Skip (TS) is a technique in which the residual video signal is used directly as transform coefficients, without converting it into the transform domain, i.e., without applying a DFT or DCT to the residual signal. Then, as with other residual blocks that do use a transform, a further lossy quantization operation is performed; some information is lost, so that the quantized signal is amenable to compact representation. The subsequent operations may include encoding whether each position of the residual block is 0, and correspondingly the magnitude and sign of the non-zero coefficients.
In some video coding standards, such as HEVC and VVC, the encoder explicitly informs the decoding end whether the transform skip mode is selected for the currently coded CU. In the AVS video coding standard, however, an implicitly signalled transform skip (ISTS) scheme is used, applied only to screen content, as shown in Table 2 below:
TABLE 2
(Table 2 appears only as an image in the original publication.)
Based on the above Table 2, the encoding side selects either the DCT-2 transform mode or the Transform Skip (TS) mode by cost comparison. To signal the choice implicitly, it is stipulated that when the encoding end selects the DCT-2 mode, the number of non-zero coefficients of the current block is even; if the actual number of non-zero coefficients is odd, the encoding end adjusts the transform coefficients to make the non-zero count even. Similarly, when the encoder selects the transform skip mode, the number of non-zero coefficients is odd; if the actual non-zero count is even, the encoding end adjusts the transform coefficients to make it odd.
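The parity rule can be sketched on a flat coefficient list as follows; this shows only the parity mechanism, while real ISTS applies further conditions, and the encoder-side adjustment here (flipping the last zero coefficient to 1, assuming one exists) is an arbitrary illustrative choice.

```python
def infer_transform_mode(coeffs):
    """Decoder-side inference: an even count of non-zero coefficients
    signals the DCT-2 mode, an odd count signals transform skip."""
    nonzero = sum(1 for c in coeffs if c != 0)
    return "DCT2" if nonzero % 2 == 0 else "TS"

def enforce_parity(coeffs, mode):
    """Encoder-side adjustment: make the non-zero count carry the parity
    that signals 'mode' (assumes at least one zero coefficient exists)."""
    want_even = (mode == "DCT2")
    nonzero = sum(1 for c in coeffs if c != 0)
    if (nonzero % 2 == 0) != want_even:
        for i in range(len(coeffs) - 1, -1, -1):
            if coeffs[i] == 0:
                coeffs[i] = 1  # illustrative: promote one zero to non-zero
                break
    return coeffs
```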
In one possible implementation, the inverse transform skip processing of the transform coefficients of the target coding unit includes:
performing inverse transform skip processing on the transform coefficients of the target coding unit in response to the number of non-zero transform coefficients of the target coding unit satisfying a first condition;
alternatively,
performing inverse transform skip processing on the transform coefficients of the target coding unit in response to a first flag bit of the target coding unit, decoded from the code stream, indicating that the inverse transform is to be skipped;
alternatively,
performing inverse transform skip processing on the transform coefficients of the target coding unit in response to isolated points existing in the target coding unit;
alternatively,
performing inverse transform skip processing on the transform coefficients of the target coding unit in response to no isolated points existing in the target coding unit.
In this embodiment, before performing the inverse transform skip, the encoder/decoder may determine whether inverse transform skip is required; the determination may be made in any of the following ways:
1) determining whether to perform inverse transform skip through the implicitly signalled transform-skip-mode decision;
2) decoding a flag bit from the code stream and using it to determine whether to perform inverse transform skip;
3) applying the transform skip mode directly, without any selection, i.e. always performing inverse transform skip;
4) determining whether to perform inverse transform skip according to whether isolated points exist in the string prediction of the current CU.
In one possible implementation, the inverse transform processing of the transform coefficients of the target coding unit includes:
performing inverse transform processing on the transform coefficients of the target coding unit, through the transform kernel indicated by a second condition, in response to the number of non-zero transform coefficients of the target coding unit satisfying the second condition;
alternatively,
performing inverse transform processing on the transform coefficients of the target coding unit through the transform kernel indicated by a second flag bit of the target coding unit decoded from the code stream;
alternatively,
performing inverse transform processing on the transform coefficients of the target coding unit through a specified transform kernel.
In the embodiment of the present application, before performing the inverse transform, the encoder/decoder may determine whether an inverse transform is required and, if so, which transform kernel to use; the inverse transform method may include the following:
1) performing the inverse transform with an implicitly signalled transform kernel;
2) decoding from the code stream whether DCT-2 or another transform kernel is to be used for the inverse transform;
3) directly using DCT-2 or some other transform kernel, without selection.
In one possible implementation manner, the inverse quantizing the residual coefficient of the target coding unit to obtain the transform coefficient of the target coding unit includes:
adjusting the quantization parameter according to a specified quantization-parameter adjustment mode, the adjustment mode comprising increasing or decreasing the quantization parameter;
and performing inverse quantization on the residual coefficient of the target coding unit based on the adjusted quantization parameter to obtain the transform coefficient of the target coding unit.
In this embodiment of the present application, when the above method is performed by an encoder, the encoder may, before inverse quantization and according to service or scene requirements, enlarge the QP relative to the QP used in the quantization step when encoding the target coding unit, so as to allow larger residual signals to appear, thereby extending coefficient coding to the string copy technique and enlarging the applicable range of coding; alternatively, it may reduce the QP to improve coding precision.
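The QP adjustment can be illustrated with a simple uniform dequantizer; the "step doubles every 6 QP" relation below is the familiar H.26x convention, used here only for illustration, since AVS3 defines its own dequantization tables.

```python
def quantization_step(qp):
    """Illustrative QP-to-step mapping (H.26x-style, not the AVS3 one)."""
    return 2.0 ** ((qp - 4) / 6.0)

def dequantize(levels, qp, qp_delta=0):
    """Inverse-quantize residual levels with an optionally adjusted QP:
    qp_delta > 0 enlarges the step (coarser, larger residuals become
    representable), qp_delta < 0 refines it for higher precision."""
    step = quantization_step(qp + qp_delta)
    return [round(level * step) for level in levels]
```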
Step 804, obtaining a reconstructed signal of the target coding unit based on the residual signal of the target coding unit and the prediction signal of the target coding unit.
In one possible implementation, the encoder/decoder superimposes the residual signal and the prediction signal of the target coding unit to obtain a reconstructed signal of the target coding unit.
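The superposition in step 804 amounts to a sample-wise addition; a minimal sketch follows, with clipping to the sample range added as standard practice even though the text does not spell it out.

```python
def reconstruct(prediction, residual, bit_depth=8):
    """Superimpose the decoded residual on the string-prediction signal
    and clip each sample to the valid range for the bit depth."""
    lo, hi = 0, (1 << bit_depth) - 1
    return [min(hi, max(lo, p + r)) for p, r in zip(prediction, residual)]
```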
In this embodiment of the present application, when the above method is executed by an encoder, the encoder obtains an adjusted matching threshold based on the adjusted quantization parameter before deriving the reconstructed signal of the target coding unit from the residual signal and the prediction signal of the target coding unit; the matching threshold is the threshold used to decide whether pixels match during string prediction. The prediction signal of the target coding unit is then obtained by string prediction based on the adjusted matching threshold.
In another exemplary scheme of the embodiment of the present application, the encoder uses an unadjusted matching threshold to obtain a prediction signal of the target coding unit by a string prediction manner in combination with motion information (such as an SV corresponding to each string in the target coding unit, a string length, a flag indicating whether there is a matching string, and the like) and a reference signal that has already been reconstructed.
That is to say, in the scheme shown in the embodiment of the present application, the encoder may directly obtain the prediction signal in string prediction mode from the motion information and superimpose it on the decoded residual signal to derive the reconstructed signal; or, based on the above QP adjustment, the threshold for determining whether pixels in string prediction match may be adjusted, so that the adjusted matching threshold corresponds to the adjusted QP, after which the prediction signal is obtained in string prediction mode from the motion information and superimposed on the decoded residual signal to derive the reconstructed signal.
In summary, according to the scheme shown in the embodiment of the present application, when the encoder/decoder performs encoding and decoding in the string prediction mode, the residual information is introduced into the string prediction process, and complete residual signal processing is performed on the string prediction, so that the reconstructed signal quality of the encoding unit based on the string prediction mode is improved, and the encoding and decoding performance is further improved.
In addition, according to the scheme shown in the embodiment of the application, in the image reconstruction process at the encoder side, on the basis of the QP used for encoding, the QP used for inverse quantization is enlarged or reduced, and correspondingly, when the prediction signal is obtained, the matching threshold is also adjusted, so that the image reconstruction accuracy can be flexibly adjusted according to business or scene requirements, that is, the flexibility of image reconstruction accuracy control is improved.
In addition, according to the scheme shown in the embodiment of the application, when the residual coefficients are decoded in the image reconstruction process, the all-zero sub-regions in the TU are skipped during scanning, and only the non-all-zero sub-regions in the TU are scanned and decoded, so that the decoding efficiency of the residual coefficients is improved, and the image reconstruction efficiency is improved.
The image reconstruction method provided by the application can be applied to an encoder and can also be applied to a decoder. Wherein, in the encoder, the reconstructed signal of the encoded CU is used as a reference signal for encoding a subsequent CU, and in the decoder, the reconstructed signal of the encoded CU is used for decoding the subsequent CU and playing video.
Please refer to fig. 10, which shows a schematic diagram of an image reconstruction framework in a coding/decoding process according to an embodiment of the present application. As shown in fig. 10, in one possible application example of the above scheme of the present application, the encoder 101 divides an image to obtain an original signal 1011 of the coding unit CU1, and performs predictive coding on the original signal 1011 of the coding unit CU1 in a string prediction manner, using the reconstructed signals in a reconstructed signal buffer 1012 as reference, to obtain a prediction signal 1013 and a residual signal 1014 between the original signal 1011 and the prediction signal 1013; the encoder 101 performs transform and quantization operations on the residual signal 1014 to obtain a residual coefficient 1015, and performs entropy coding or statistical coding on the residual coefficient 1015 to obtain a residual coding 1016; the residual coding 1016 and the motion information 1017 corresponding to the prediction signal 1013 (the SV corresponding to each string, the string length, a flag indicating whether there is a matching string, and the like) are added to the coded code stream as the coding result of the CU1, and are transmitted to the decoder 102.
After the CU1 is encoded in the encoder 101, the encoder 101 further obtains the prediction signal 1013 through prediction by using the motion information 1017, performs coefficient decoding and inverse quantization & inverse transform/inverse transform skip processing on the residual coding 1016 to obtain the residual coefficient 1015 and the residual signal 1014 in turn, superimposes the residual signal 1014 on the prediction signal 1013 to obtain a reconstructed signal of the CU1, and adds the reconstructed signal of the CU1 to the reconstructed signal buffer 1012 to be used as a reference signal for encoding a subsequent CU.
In the decoder 102, when the CU1 is decoded in the string prediction mode and the decoder 102 determines that the residual coefficients of the CU1 need to be decoded, the motion information 1017 and the residual coding 1016 of the CU1 are obtained from the code stream, the prediction signal 1013 is obtained from the motion information 1017 by using the signals reconstructed in a reconstructed signal buffer 1021 as reference signals, the residual coding 1016 is subjected to coefficient decoding and inverse quantization & inverse transform/inverse transform skip processing to obtain the residual coefficient 1015 and the residual signal 1014 in sequence, the residual signal 1014 and the prediction signal 1013 are fused to obtain a reconstructed signal of the CU1, and the reconstructed signal of the CU1 is used for video playback on the one hand and is also used as a reference signal for decoding a subsequent CU on the other hand.
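The superposition step shared by the encoder 101 and the decoder 102 can be sketched as follows; this is a minimal illustration assuming 8-bit samples and simple range clipping, not the normative reconstruction process of any particular standard:

```python
def reconstruct(prediction, residual, bit_depth=8):
    """Superimpose a decoded residual signal on a prediction signal and
    clip each sample to the valid range for the given bit depth."""
    max_val = (1 << bit_depth) - 1
    return [min(max(p + r, 0), max_val) for p, r in zip(prediction, residual)]
```

The resulting samples would then be written into the reconstructed signal buffer so that they can serve as reference for subsequent CUs.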
The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.
Referring to fig. 11, a block diagram of a video image reconstruction apparatus according to an embodiment of the present application is shown. The device has the functions of realizing the method examples, and the functions can be realized by hardware or by hardware executing corresponding software. The device may be the computer device described above, or may be provided on a computer device. The apparatus 1100 may include:
a residual coding obtaining module 1101, configured to obtain residual coding of the coded target coding unit in response to encoding and decoding being performed in a string prediction mode; the residual coding is obtained by coding the residual coefficient of the target coding block;
a coefficient decoding module 1102, configured to decode residual coding of the target coding unit to obtain a residual coefficient of the target coding unit;
a residual signal obtaining module 1103, configured to obtain a residual signal of the target coding unit based on a residual coefficient of the target coding unit;
a signal reconstructing module 1104, configured to obtain a reconstructed signal of the target coding unit based on the residual signal of the target coding unit and the prediction signal of the target coding unit.
In one possible implementation, the residual coding obtaining module 1101 is configured to,
responding to encoding and decoding in a string prediction mode, and acquiring residual decoding indication information of the target encoding unit; the residual decoding indication information indicates whether to decode the residual of the target coding block;
and in response to the residual decoding indication information indicating that the residual of the target coding block is to be decoded, acquiring the residual coding of the target coding unit.
In one possible implementation, the residual decoding indication information includes at least one of the following information:
a first index obtained by decoding in a sequence header corresponding to the target coding unit, wherein the first index is used for indicating a residual coefficient of a coding unit decoded in a string prediction manner in a corresponding sequence;
a second index obtained by decoding in an image header corresponding to the target coding unit, wherein the second index is used for indicating a residual coefficient of a coding unit decoded in a string prediction manner in a corresponding image;
a third index obtained by decoding in a slice header corresponding to the target coding unit, wherein the third index is used for indicating a residual coefficient of a coding unit decoded in a string prediction manner in a corresponding slice;
a fourth index obtained by decoding in a largest coding unit LCU corresponding to the target coding unit, wherein the fourth index is used for indicating a residual coefficient of a coding unit decoded in a string prediction manner in a corresponding LCU;
a fifth index decoded in the target coding unit, the fifth index being used to indicate residual coefficients of the target coding unit;
a component type of the target coding unit, the component type comprising a luma component or a chroma component;
a coding block flag of the color component of each pixel in the target coding unit, the coding block flag being used for indicating whether the color component of the corresponding pixel is non-zero;
a size of the target coding unit;
coefficient indication information within the target coding unit, the coefficient indication information indicating whether coefficients of the target coding unit are all zero;
and whether the target coding unit contains isolated points.
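As a rough illustration of how a decoder might combine a few of these signals before parsing residual coding (the field names below are hypothetical, not standard syntax elements):

```python
def need_residual_decoding(seq_flag, pic_flag, cu_cbf):
    """Decide whether residual coefficients must be parsed for a
    string-predicted CU.

    seq_flag / pic_flag: residual decoding enabled at the sequence /
                         picture level (first and second indexes above).
    cu_cbf: coded-block flag of this CU (True if any color component
            has a non-zero coefficient).
    """
    if not (seq_flag and pic_flag):
        return False  # residual decoding disabled at a higher level
    return cu_cbf     # parse residual only if the CU has non-zero coefficients
```

A real decoder would also consult the slice-level and LCU-level indexes, the CU size, and the isolated-point information in the same cascaded manner.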
In one possible implementation, the residual signal obtaining module 1103 is configured to,
carrying out inverse quantization on the residual coefficient of the target coding unit to obtain a transformation coefficient of the target coding unit;
and performing inverse transformation skipping processing on the transformation coefficient of the target coding unit, or performing inverse transformation processing on the transformation coefficient of the target coding unit to obtain a residual signal of the target coding unit.
In one possible implementation, the residual signal obtaining module 1103 is configured to,
in response to the number of non-zero transform coefficients of the target coding unit satisfying a first condition, performing inverse transform skip processing on the transform coefficients of the target coding unit;
or,
in response to a first flag bit of the target coding unit, decoded from a code stream, indicating that inverse transform skip is performed, performing inverse transform skip processing on the transform coefficients of the target coding unit;
or,
in response to isolated points existing in the target coding unit, performing inverse transform skip processing on the transform coefficients of the target coding unit;
or,
in response to no isolated point existing in the target coding unit, performing inverse transform skip processing on the transform coefficients of the target coding unit.
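The alternative triggers above (which the embodiments present as separate implementations) can be sketched as one decision function; the threshold value and flag semantics are placeholders chosen for illustration:

```python
def use_transform_skip(nonzero_count, ts_flag, has_isolated_point,
                       nonzero_threshold=4):
    """Decide whether to apply inverse-transform skip to a CU's coefficients."""
    if ts_flag:                             # explicit flag decoded from the code stream
        return True
    if nonzero_count <= nonzero_threshold:  # first condition on the non-zero count
        return True
    return has_isolated_point               # isolated-point based implementation
```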
In one possible implementation, the residual signal obtaining module 1103 is configured to,
in response to the number of non-zero transform coefficients of the target coding unit satisfying a second condition, performing inverse transform processing on the transform coefficients of the target coding unit through a transform kernel indicated by the second condition;
or,
in response to a second flag bit of the target coding unit decoded from a code stream, performing inverse transform processing on the transform coefficients of the target coding unit through a transform kernel indicated by the second flag bit;
or,
performing inverse transform processing on the transform coefficients of the target coding unit through a specified transform kernel.
In one possible implementation, the residual signal obtaining module 1103 is configured to,
adjusting a quantization parameter according to a specified quantization parameter adjustment manner, wherein the quantization parameter adjustment manner includes increasing or decreasing;
and performing inverse quantization on the residual coefficient of the target coding unit based on the adjusted quantization parameter to obtain a transform coefficient of the target coding unit.
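A simplified sketch of inverse quantization with an adjusted QP follows, assuming the common model in which the quantization step size doubles for every 6 QP units; actual scaling lists and rounding rules are codec-specific, so this is illustrative only:

```python
def dequantize(coeffs, qp, qp_offset):
    """Inverse-quantize residual coefficients using a QP shifted by a
    configured offset (the 'increase or decrease' adjustment manner)."""
    adjusted_qp = max(qp + qp_offset, 0)   # enlarge or reduce the coding QP
    step = 2 ** (adjusted_qp / 6)          # assumed step-size model
    return [round(c * step) for c in coeffs]
```

A positive `qp_offset` coarsens reconstruction (larger step), while a negative offset refines it, which is what allows the accuracy to be traded off per business or scene requirement.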
In one possible implementation, the apparatus further includes:
a matching threshold adjustment module, configured to, before the signal reconstruction module 1104 obtains the reconstructed signal of the target coding unit based on the residual signal of the target coding unit and the prediction signal of the target coding unit, obtain an adjusted matching threshold based on the adjusted quantization parameter, where the matching threshold is a threshold used to indicate whether pixels match in a string prediction process;
and the prediction signal acquisition module is used for acquiring the prediction signal of the target coding unit in a string prediction mode based on the adjusted matching threshold.
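The link between the adjusted QP and the matching threshold can be illustrated as follows; the linear model and its constants are assumptions for illustration only — the embodiments only require that the threshold correspond to the adjusted QP:

```python
def match_threshold(adjusted_qp, base_threshold=2, slope=0.5):
    """A larger quantization step tolerates a larger sample difference
    when deciding whether two pixels 'match' during string prediction."""
    return base_threshold + slope * adjusted_qp

def pixels_match(a, b, threshold):
    """Threshold test used when extending a string in string prediction."""
    return abs(a - b) <= threshold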
In one possible implementation, the residual coding of the target coding unit includes at least two sub-coding blocks;
the coefficient decoding module 1102 is configured to skip scanning of an all-zero sub-coding block of the at least two sub-coding blocks, and perform scanning area-based coefficient decoding on a non-all-zero sub-coding block of the at least two sub-coding blocks to obtain a residual coefficient of the target coding unit.
In a possible implementation manner, the coefficient decoding module 1102 is configured to skip scanning of an all-zero sub-coding block of the at least two sub-coding blocks, and perform scanning area-based coefficient decoding on the non-all-zero sub-coding block in a scanning manner along a horizontal direction with a width of the target coding block as a basic unit to obtain a residual coefficient of the target coding unit.
In a possible implementation manner, the coefficient decoding module 1102 is configured to skip scanning of an all-zero sub-coding block of the at least two sub-coding blocks, and perform scanning area-based coefficient decoding on the non-all-zero sub-coding block in a scanning manner along a vertical direction with a height of the target coding block as a basic unit to obtain a residual coefficient of the target coding unit.
In summary, according to the scheme shown in the embodiment of the present application, when the encoder/decoder performs encoding and decoding in the string prediction mode, the residual information is introduced into the string prediction process, and complete residual signal processing is performed on the string prediction, so that the reconstructed signal quality of the encoding unit based on the string prediction mode is improved, and the encoding and decoding performance is further improved.
In addition, according to the scheme shown in the embodiment of the application, in the image reconstruction process at the encoder side, on the basis of the QP used for encoding, the QP used for inverse quantization is enlarged or reduced, and correspondingly, when the prediction signal is obtained, the matching threshold is also adjusted, so that the image reconstruction accuracy can be flexibly adjusted according to business or scene requirements, that is, the flexibility of image reconstruction accuracy control is improved.
In addition, according to the scheme shown in the embodiment of the application, when the residual error coefficient is decoded in the image reconstruction process, all-zero sub-regions in the TU are skipped for searching, and non-all-zero sub-regions in the TU are searched and decoded, so that the decoding efficiency of the residual error coefficient is improved, and the image reconstruction efficiency is improved.
Referring to fig. 12, a block diagram of a computer device according to an embodiment of the present application is shown. The computer device may be the encoding side device described above, or may be the decoding side device described above. The computer device 150 may include: processor 151, memory 152, communication interface 153, encoder/decoder 154, and bus 155.
The processor 151 includes one or more processing cores, and the processor 151 executes various functional applications and information processing by executing software programs and modules.
The memory 152 may be used to store a computer program that the processor 151 is used to execute in order to implement the above-described video image reconstruction method.
The communication interface 153 may be used for communicating with other devices, such as for transmitting and receiving audio and video data.
The encoder/decoder 154 may be used to perform encoding and decoding functions, such as encoding and decoding audio-visual data.
The memory 152 is coupled to the processor 151 via a bus 155.
Further, the memory 152 may be implemented by any type or combination of volatile or non-volatile storage devices, including, but not limited to: magnetic or optical disk, EEPROM (Electrically Erasable Programmable Read-Only Memory), EPROM (Erasable Programmable Read-Only Memory), SRAM (Static Random-Access Memory), ROM (Read-Only Memory), magnetic Memory, flash Memory, PROM (Programmable Read-Only Memory).
Those skilled in the art will appreciate that the configuration shown in FIG. 12 is not intended to be limiting of the computer device 150 and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be used.
In an exemplary embodiment, a computer readable storage medium is also provided, in which at least one instruction, at least one program, a set of codes, or a set of instructions is stored, which when executed by a processor implements the above-mentioned video image reconstruction method.
In an exemplary embodiment, a computer program product or computer program is also provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to execute the above-mentioned video image reconstruction method.
It should be understood that reference to "a plurality" herein means two or more. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
The above description is only exemplary of the present application and should not be taken as limiting the present application, and any modifications, equivalents, improvements and the like that are made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (15)

1. A method for reconstructing a video image, the method comprising:
obtaining residual coding of a coded target coding unit in response to coding and decoding in a string prediction mode; the residual coding is obtained by coding the residual coefficient of the target coding block;
decoding residual coding of the target coding unit to obtain a residual coefficient of the target coding unit;
acquiring a residual signal of the target coding unit based on the residual coefficient of the target coding unit;
and obtaining a reconstructed signal of the target coding unit based on the residual signal of the target coding unit and the prediction signal of the target coding unit.
2. The method of claim 1, wherein obtaining the residual coding of the encoded target coding unit in response to the encoding and decoding by means of the serial prediction comprises:
responding to encoding and decoding in a string prediction mode, and acquiring residual decoding indication information of the target encoding unit; the residual decoding indication information indicates whether to decode the residual of the target coding block;
and responding to the residual indication information to indicate residual decoding of the target coding block, and acquiring residual coding of the target coding unit.
3. The method of claim 2, wherein the residual decoding indication information comprises at least one of the following information:
a first index obtained by decoding in a sequence header corresponding to the target coding unit, wherein the first index is used for indicating a residual coefficient of a coding unit which is decoded in a decoding mode of serial prediction in a corresponding sequence;
a second index obtained by decoding in the image header corresponding to the target coding unit, wherein the second index is used for indicating a residual coefficient of a coding unit which is decoded in a decoding mode of serial prediction in a corresponding image;
a third index decoded in the slice header of the target coding unit, the third index referencing a residual coefficient indicating a coding unit decoded by a decoding manner of a string prediction in a corresponding slice;
a fourth index decoded in a largest coding unit LCU of the target coding unit, where the fourth index is used to indicate a residual coefficient of a coding unit decoded in a decoding manner of string prediction in a corresponding LCU;
a fifth index decoded in the target coding unit, the fifth index being used to indicate residual coefficients of the target coding unit;
a component type of the target coding unit, the component type comprising a luma component or a chroma component;
the coding block identification of the color component of each pixel in the target coding unit is used for indicating whether the color component on the corresponding pixel is nonzero or not;
a size of the target coding unit;
coefficient indication information within the target coding unit, the coefficient indication information indicating whether coefficients of the target coding unit are all zero;
and whether the target coding unit contains isolated points.
4. The method of claim 1, wherein obtaining the residual signal of the target coding unit based on the residual coefficients of the target coding unit comprises:
carrying out inverse quantization on the residual coefficient of the target coding unit to obtain a transformation coefficient of the target coding unit;
and performing inverse transformation skipping processing on the transformation coefficient of the target coding unit, or performing inverse transformation processing on the transformation coefficient of the target coding unit to obtain a residual signal of the target coding unit.
5. The method according to claim 4, wherein said performing inverse transform skip processing on the transform coefficients of the target coding unit comprises:
in response to a non-zero number of transform coefficients of the target coding unit satisfying a first condition, inverse transform skip processing is performed on the transform coefficients of the target coding unit;
alternatively, the first and second electrodes may be,
responding to a first flag bit of the target coding unit obtained by decoding from a code stream and indicating that inverse transformation skipping is carried out, and carrying out inverse transformation skipping processing on a transformation coefficient of the target coding unit;
alternatively, the first and second electrodes may be,
in response to the existence of the isolated point in the target coding unit, performing inverse transform skipping processing on the transform coefficient of the target coding unit;
alternatively, the first and second electrodes may be,
in response to the absence of isolated points in the target coding unit, inverse transform skipping processing is performed on the transform coefficients of the target coding unit.
6. The method according to claim 4, wherein said inverse transforming the transform coefficients of the target coding unit comprises:
in response to a second condition being satisfied by non-zero numbers of transform coefficients of the target coding unit, inverse transform processing is performed on the transform coefficients of the target coding unit through a transform kernel indicated by the second condition;
alternatively, the first and second electrodes may be,
in response to a second flag bit of the target coding unit obtained by decoding from a code stream, performing inverse transformation skipping processing on a transformation coefficient of the target coding unit through a transformation core indicated by the second flag bit;
alternatively, the first and second electrodes may be,
and performing inverse transformation skipping processing on the transformation coefficient of the target coding unit through a specified transformation core.
7. The method of claim 4, wherein the inverse quantizing the residual coefficients of the target coding unit to obtain the transform coefficients of the target coding unit, comprises:
adjusting the quantization parameters according to a specified quantization parameter adjustment mode, wherein the quantization parameter adjustment mode comprises increasing or reducing;
and performing inverse quantization on the residual coefficient of the target coding unit based on the adjusted quantization parameter to obtain a transform coefficient of the target coding unit.
8. The method according to claim 7, wherein before obtaining the reconstructed signal obtained by decoding the target coding unit based on the residual signal of the target coding unit and the prediction signal of the target coding unit, the method further comprises:
obtaining an adjusted matching threshold value based on the adjusted quantization parameter, wherein the matching threshold value is a threshold value used for indicating whether pixels are matched in a string prediction process;
and acquiring a prediction signal of the target coding unit in a string prediction mode based on the adjusted matching threshold.
9. The method according to any of claims 1 to 8, wherein the residual coding of the target coding unit comprises at least two sub-coded blocks;
the decoding the residual coding of the target coding unit to obtain the residual coefficient of the target coding unit includes:
and skipping scanning of all-zero sub-coding blocks in the at least two sub-coding blocks, and performing scanning area-based coefficient decoding on non-all-zero sub-coding blocks in the at least two sub-coding blocks to obtain residual coefficients of the target coding unit.
10. The method of claim 9, wherein skipping scanning of all-zero sub-coding blocks of the at least two sub-coding blocks, performing scanning-region-based coefficient decoding on non-all-zero sub-coding blocks of the at least two sub-coding blocks, and obtaining residual coefficients of the target coding unit comprises:
and skipping scanning of all-zero sub-coding blocks in the at least two sub-coding blocks, and performing coefficient decoding based on a scanning area on the non-all-zero sub-coding blocks in a scanning mode along the horizontal direction by taking the width of the target coding block as a basic unit to obtain a residual coefficient of the target coding unit.
11. The method of claim 9, wherein skipping scanning of all-zero sub-coding blocks of the at least two sub-coding blocks, performing scanning-region-based coefficient decoding on non-all-zero sub-coding blocks of the at least two sub-coding blocks, and obtaining residual coefficients of the target coding unit comprises:
and skipping the scanning of all-zero sub-coding blocks in the at least two sub-coding blocks, and performing coefficient decoding based on a scanning area on the non-all-zero sub-coding blocks in a scanning mode along the vertical direction by taking the height of the target coding block as a basic unit to obtain a residual coefficient of the target coding unit.
12. A method for encoding a video image, the method comprising:
acquiring an original signal of an uncoded target coding unit;
based on a reference signal, carrying out prediction-based on the original signal of the target coding unit in a serial prediction mode to obtain a prediction signal of the target coding unit and a residual signal of the target coding unit;
obtaining a residual coding of the target coding unit based on a residual signal of the target coding unit;
and adding motion information corresponding to the prediction signal of the target coding unit and residual coding of the target coding unit to the coded video code stream.
13. A video image reconstruction apparatus, characterized in that the apparatus comprises:
a residual coding acquisition module, configured to respond to coding and decoding performed in a string prediction manner, and acquire a residual coding of a coded target coding unit; the residual coding is obtained by coding the residual coefficient of the target coding block;
a coefficient decoding module, configured to decode residual coding of the target coding unit to obtain a residual coefficient of the target coding unit;
a residual signal obtaining module, configured to obtain a residual signal of the target coding unit based on a residual coefficient of the target coding unit;
and a signal reconstruction module, configured to obtain a reconstructed signal of the target coding unit based on the residual signal of the target coding unit and the prediction signal of the target coding unit.
14. A computer device comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by the processor to implement the method of any one of claims 1 to 12.
15. A computer readable storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement the method of any of claims 1 to 12.
CN202010844965.4A 2020-08-20 2020-08-20 Video image reconstruction method and device, computer equipment and storage medium Pending CN114079782A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010844965.4A CN114079782A (en) 2020-08-20 2020-08-20 Video image reconstruction method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010844965.4A CN114079782A (en) 2020-08-20 2020-08-20 Video image reconstruction method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114079782A true CN114079782A (en) 2022-02-22

Family

ID=80282147

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010844965.4A Pending CN114079782A (en) 2020-08-20 2020-08-20 Video image reconstruction method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114079782A (en)

Similar Documents

Publication Publication Date Title
CN111866512B (en) Video decoding method, video encoding method, video decoding apparatus, video encoding apparatus, and storage medium
CN111819852B (en) Method and apparatus for residual symbol prediction in the transform domain
US11910014B2 (en) Image encoding method using a skip mode, and a device using the method
JP7275270B2 (en) Corresponding methods of boundary strength derivation for encoders, decoders, and deblocking filters
CN113748677A (en) Encoder, decoder and corresponding intra prediction method
CN111801944B (en) Video image encoder, decoder and corresponding motion information encoding method
CN111837389A (en) Block detection method and device suitable for multi-sign bit hiding
CN112543332B (en) Video decoding method, video encoding method and related equipment
CN113545063A (en) Method and apparatus for intra prediction using linear model
US20230068657A1 (en) Selecting a coding method for suffix values for displacement vector differences based on value intervals
CN113170202A (en) Encoder, decoder and corresponding methods for constructing an MPM list of blocks applying multi-hypothesis prediction
CN115243048A (en) Video image decoding and encoding method and device
CN115426494A (en) Encoder, decoder and corresponding methods using compressed MV storage
CN114913249A (en) Encoding method, decoding method and related devices
CN114827618B (en) Video encoding method, video decoding method and related equipment
CN112565767B (en) Video decoding method, video encoding method and related equipment
CN114390289A (en) Reference pixel candidate list construction method, device, equipment and storage medium
CN114079782A (en) Video image reconstruction method and device, computer equipment and storage medium
RU2783348C1 (en) Encoder, decoder and corresponding methods for obtaining the boundary power of the debloking filter
TWI843809B (en) Signalling for merge mode with motion vector differences in video coding
CN110944177B (en) Video decoding method, video decoder, video encoding method and video encoder
CN114979628A (en) Image block prediction sample determining method and coding and decoding equipment
CN113766227A (en) Quantization and inverse quantization method and apparatus for image encoding and decoding
CN114079788A (en) Motion information list construction method, device and equipment in video coding and decoding
CN114079780A (en) Video decoding method, video encoding method, video decoding apparatus, video encoding apparatus, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40065637

Country of ref document: HK