CN117426088A - Video encoding and decoding method, device, system and storage medium - Google Patents


Info

Publication number
CN117426088A
Authority
CN
China
Prior art keywords
current block
intra
characteristic information
prediction mode
encoder
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202180098997.7A
Other languages
Chinese (zh)
Inventor
戴震宇 (Dai Zhenyu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Publication of CN117426088A


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124: Quantisation
    • H04N19/134: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/157: Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H04N19/159: Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction

Abstract

The application provides a video encoding and decoding method, device, system and storage medium. If the intra-frame prediction mode of the current block is the self-encoder-based intra-frame prediction mode, the code stream is decoded to obtain the characteristic information of the current block (S402); pixel values of reconstructed pixel points around the current block are acquired (S403); and the characteristic information of the current block and the pixel values of the reconstructed pixel points around the current block are input into the decoding network of the self-encoder corresponding to the current block to obtain the predicted block of the current block output by the decoding network (S404). That is, the present application adds a self-encoder-based intra prediction mode, providing more options for intra prediction. If the intra-frame prediction mode of the current block is the self-encoder-based intra-frame prediction mode, the prediction block of the current block is determined according to the characteristic information of the current block and the pixel values of the reconstructed pixel points around the current block. In this way, even when the correlation between the original value of the current block and the reconstructed values around it is weak, the prediction considers not only the pixel values of the surrounding reconstructed pixel points but also the characteristic information of the current block itself, so the current block can be predicted accurately and the accuracy of intra-frame prediction is improved.

Description

Video encoding and decoding method, device, system and storage medium
Technical Field
The present disclosure relates to the field of video encoding and decoding technologies, and in particular, to a video encoding and decoding method, apparatus, system, and storage medium.
Background
Digital video technology may be incorporated into a variety of video devices, such as digital televisions, smartphones, computers, electronic readers, or video players. With the development of video technology, the amount of video data grows ever larger, and in order to facilitate its transmission, video devices apply video compression techniques to make the transmission or storage of video data more efficient.
The compression of video data is currently achieved by reducing or eliminating redundant information in the video data by spatial prediction or temporal prediction. The prediction method includes inter prediction and intra prediction, wherein the intra prediction predicts a current block based on neighboring blocks already decoded in the same frame image.
Current intra prediction modes derive the predicted value of the current block from the reconstructed values around the current block; if the correlation between the original value of the current block and the reconstructed values around it is weak, the prediction is inaccurate.
Disclosure of Invention
The embodiment of the application provides a video coding and decoding method, device, system and storage medium, and provides a self-encoder-based intra-frame prediction mode, which can realize accurate prediction of the current block even when the correlation between the original value of the current block and the reconstructed values around it is weak.
In a first aspect, the present application provides a video decoding method, including:
decoding the code stream, and determining an intra-frame prediction mode of the current block;
if the intra-frame prediction mode of the current block is the self-encoder-based intra-frame prediction mode, decoding the code stream to obtain the characteristic information of the current block;
acquiring pixel values of reconstructed pixel points around the current block;
and inputting the characteristic information of the current block and the pixel values of the reconstructed pixel points around the current block into a decoding network of a self-encoder corresponding to the current block to obtain a predicted block of the current block output by the decoding network.
In a second aspect, an embodiment of the present application provides a video encoding method, including:
determining an intra-frame prediction mode of a current block from preset N first intra-frame prediction modes, wherein N is a positive integer, and the N first intra-frame prediction modes comprise a self-encoder-based intra-frame prediction mode;
if the intra-frame prediction mode of the current block is the self-encoder-based intra-frame prediction mode, acquiring the self-encoder corresponding to the current block, wherein the self-encoder comprises an encoding network and a decoding network;
inputting an original value of the current block into the coding network to obtain first characteristic information of the current block output by the coding network;
and inputting the first characteristic information of the current block and pixel values of reconstructed pixel points around the current block into the decoding network to obtain a predicted block of the current block output by the decoding network.
In a third aspect, the present application provides a video encoder for performing the method of the second aspect or implementations thereof. In particular, the encoder comprises functional units for performing the method of the second aspect described above or its various implementations.
In a fourth aspect, the present application provides a video decoder for performing the method of the first aspect or implementations thereof. In particular, the decoder comprises functional units for performing the method of the first aspect described above or its various implementations.
In a fifth aspect, a video encoder is provided that includes a processor and a memory. The memory is for storing a computer program, and the processor is for calling and running the computer program stored in the memory to perform the method of the second aspect or implementations thereof.
In a sixth aspect, a video decoder is provided that includes a processor and a memory. The memory is for storing a computer program, and the processor is for calling and running the computer program stored in the memory to perform the method of the first aspect or implementations thereof.
In a seventh aspect, a video codec system is provided that includes a video encoder and a video decoder. The video encoder is for performing the method of the second aspect described above or its various implementations, and the video decoder is for performing the method of the first aspect described above or its various implementations.
An eighth aspect provides a chip for implementing the method of any one of the first to second aspects or each implementation thereof. Specifically, the chip includes: a processor for calling and running a computer program from a memory, causing a device on which the chip is mounted to perform the method as in any one of the first to second aspects or implementations thereof described above.
In a ninth aspect, a computer-readable storage medium is provided for storing a computer program for causing a computer to perform the method of any one of the above first to second aspects or implementations thereof.
In a tenth aspect, there is provided a computer program product comprising computer program instructions for causing a computer to perform the method of any one of the first to second aspects or implementations thereof.
In an eleventh aspect, there is provided a computer program which, when run on a computer, causes the computer to perform the method of any one of the above-described first to second aspects or implementations thereof.
In a twelfth aspect, there is provided a code stream generated by the method of the second aspect.
Based on the technical scheme, in the intra-frame prediction process of video encoding and decoding, the decoding end determines the intra-frame prediction mode of the current block by decoding the code stream; if the intra-frame prediction mode of the current block is the self-encoder-based intra-frame prediction mode, the code stream is decoded to obtain the characteristic information of the current block; pixel values of reconstructed pixel points around the current block are acquired; and the characteristic information of the current block and the pixel values of the reconstructed pixel points around the current block are input into the decoding network of the self-encoder corresponding to the current block to obtain the predicted block of the current block output by the decoding network. That is, the present application adds a self-encoder-based intra prediction mode, providing more options for intra prediction. If the intra-frame prediction mode of the current block is the self-encoder-based intra-frame prediction mode, the prediction block of the current block is determined according to the characteristic information of the current block and the pixel values of the reconstructed pixel points around the current block. Even when the correlation between the original value of the current block and the reconstructed values around it is weak, the prediction considers not only the pixel values of the surrounding reconstructed pixel points but also the characteristic information of the current block itself, so accurate prediction of the current block can be realized and the accuracy of intra-frame prediction is improved.
Drawings
Fig. 1 is a schematic block diagram of a video codec system according to an embodiment of the present application;
Fig. 2 is a schematic block diagram of a video encoder according to an embodiment of the present application;
Fig. 3 is a schematic block diagram of a video decoder according to an embodiment of the present application;
Fig. 4 is a schematic diagram of reference pixels according to an embodiment of the present application;
Fig. 5 is a schematic diagram of the 35 intra prediction modes of HEVC;
Fig. 6 is a schematic diagram of the 67 intra prediction modes of VVC;
Fig. 7 is a schematic diagram of the MIP intra prediction mode;
Fig. 8 is a schematic diagram of a network structure of a self-encoder according to an embodiment of the present application;
Fig. 9A is a graphical representation of an activation function according to an embodiment of the present application;
Fig. 9B is another graphical representation of an activation function according to an embodiment of the present application;
Fig. 10 is a schematic flow chart of a video decoding method according to an embodiment of the present application;
Fig. 11 is a schematic diagram of prediction according to an embodiment of the present application;
Fig. 12 is another schematic flow chart of a video decoding method according to an embodiment of the present application;
Fig. 13 is another schematic diagram of prediction according to an embodiment of the present application;
Fig. 14 is another schematic flow chart of a video decoding method according to an embodiment of the present application;
Fig. 15 is another schematic diagram of prediction according to an embodiment of the present application;
Fig. 16 is another schematic flow chart of a video decoding method according to an embodiment of the present application;
Fig. 17 is another schematic diagram of prediction according to an embodiment of the present application;
Fig. 18 is a schematic flow chart of a video encoding method according to an embodiment of the present application;
Fig. 19 is a schematic block diagram of a video decoder provided by an embodiment of the present application;
Fig. 20 is a schematic block diagram of a video encoder provided by an embodiment of the present application;
Fig. 21 is a schematic block diagram of an electronic device provided by an embodiment of the present application;
Fig. 22 is a schematic block diagram of a video codec system provided by an embodiment of the present application.
Detailed Description
The method and the device can be applied to the fields of image encoding and decoding, video encoding and decoding, hardware video encoding and decoding, special-purpose-circuit video encoding and decoding, real-time video encoding and decoding, and the like. For example, the schemes of the present application may be incorporated into the audio video coding standard (audio video coding standard, AVS for short), the H.264/advanced video coding (advanced video coding, AVC for short) standard, the H.265/high efficiency video coding (high efficiency video coding, HEVC for short) standard, and the H.266/versatile video coding (versatile video coding, VVC for short) standard. Alternatively, the schemes of the present application may operate in conjunction with other proprietary or industry standards, including ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its scalable video coding (SVC) and multiview video coding (MVC) extensions. It should be understood that the techniques of this application are not limited to any particular codec standard or technique.
For ease of understanding, a video codec system according to an embodiment of the present application will be described first with reference to fig. 1.
Fig. 1 is a schematic block diagram of a video codec system according to an embodiment of the present application. It should be noted that fig. 1 is only an example, and the video codec system of the embodiment of the present application includes, but is not limited to, the one shown in fig. 1. As shown in fig. 1, the video codec system 100 includes an encoding device 110 and a decoding device 120. Wherein the encoding device is arranged to encode (which may be understood as compressing) the video data to generate a code stream and to transmit the code stream to the decoding device. The decoding device decodes the code stream generated by the encoding device to obtain decoded video data.
The encoding device 110 of the present embodiment may be understood as a device having a video encoding function, and the decoding device 120 may be understood as a device having a video decoding function; that is, the encoding device 110 and the decoding device 120 each cover a broad range of devices, including smartphones, desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, televisions, cameras, display devices, digital media players, video game consoles, vehicle-mounted computers, and the like.
In some embodiments, the encoding device 110 may transmit the encoded video data (e.g., a bitstream) to the decoding device 120 via the channel 130. Channel 130 may include one or more media and/or devices capable of transmitting encoded video data from encoding device 110 to decoding device 120.
In one example, channel 130 includes one or more communication media that enable encoding device 110 to transmit encoded video data directly to decoding device 120 in real time. In this example, the encoding device 110 may modulate the encoded video data according to a communication standard and transmit the modulated video data to the decoding device 120. The communication media include wireless communication media, such as the radio frequency spectrum, and may optionally also include wired communication media, such as one or more physical transmission lines.
In another example, channel 130 includes a storage medium that may store video data encoded by encoding device 110. Storage media include a variety of locally accessed data storage media such as compact discs, DVDs, flash memory, and the like. In this example, the decoding device 120 may obtain encoded video data from the storage medium.
In another example, channel 130 may comprise a storage server that may store video data encoded by encoding device 110. In this example, the decoding device 120 may download the stored encoded video data from the storage server. Alternatively, the storage server may store the encoded video data and transmit it to the decoding device 120; such a server may be, for example, a web server (e.g., for a website) or a file transfer protocol (FTP) server.
In some embodiments, the encoding apparatus 110 includes a video encoder 112 and an output interface 113. Wherein the output interface 113 may comprise a modulator/demodulator (modem) and/or a transmitter.
In some embodiments, the encoding device 110 may include a video source 111 in addition to the video encoder 112 and the output interface 113.
Video source 111 may include at least one of a video capture device (e.g., a video camera), a video archive, a video input interface for receiving video data from a video content provider, a computer graphics system for generating video data.
The video encoder 112 encodes video data from the video source 111 to produce a bitstream. The video data may include one or more pictures (pictures) or sequences of pictures (sequence of pictures). The code stream contains encoded information of the image or image sequence in the form of a bit stream. The encoded information may include encoded image data and associated data. The associated data may include a sequence parameter set (sequence parameter set, SPS for short), a picture parameter set (picture parameter set, PPS for short), and other syntax structures. An SPS may contain parameters that apply to one or more sequences. PPS may contain parameters that apply to one or more pictures. A syntax structure refers to a set of zero or more syntax elements arranged in a specified order in a bitstream.
The video encoder 112 directly transmits the encoded video data to the decoding apparatus 120 via the output interface 113. The encoded video data may also be stored on a storage medium or storage server for subsequent reading by the decoding device 120.
In some embodiments, decoding apparatus 120 includes an input interface 121 and a video decoder 122.
In some embodiments, decoding apparatus 120 may include a display device 123 in addition to input interface 121 and video decoder 122.
Wherein the input interface 121 comprises a receiver and/or a modem. The input interface 121 may receive encoded video data through the channel 130.
The video decoder 122 is configured to decode the encoded video data to obtain decoded video data, and transmit the decoded video data to the display device 123.
The display device 123 displays the decoded video data. The display device 123 may be integral with the decoding apparatus 120 or external to the decoding apparatus 120. The display device 123 may include a variety of display devices, such as a Liquid Crystal Display (LCD), a plasma display, an Organic Light Emitting Diode (OLED) display, or other types of display devices.
In addition, fig. 1 is merely an example, and the technical solution of the embodiment of the present application is not limited to fig. 1, for example, the technology of the present application may also be applied to single-side video encoding or single-side video decoding.
The following describes a video coding framework according to an embodiment of the present application.
Fig. 2 is a schematic block diagram of a video encoder according to an embodiment of the present application. It should be appreciated that the video encoder 200 may be used for lossy compression of images as well as for lossless compression of images. The lossless compression may be visually lossless compression or mathematically lossless compression.
The video encoder 200 may be applied to image data in luminance-chrominance (YCbCr, YUV) format. For example, the YUV sampling ratio may be 4:2:0, 4:2:2, or 4:4:4, where Y represents luminance (Luma), Cb (U) represents blue chrominance, and Cr (V) represents red chrominance; U and V together represent chrominance (Chroma), describing color and saturation. For example, 4:2:0 means each 4 pixels have 4 luminance components and 2 chrominance components (YYYYCbCr), 4:2:2 means each 4 pixels have 4 luminance components and 4 chrominance components (YYYYCbCrCbCr), and 4:4:4 means full-resolution chrominance (YYYYCbCrCbCrCbCrCbCr).
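As an aside, the subsampling ratios above fully determine the chroma plane dimensions relative to the luma plane. The following Python sketch (an illustrative aid, not part of the described codec) computes them:

```python
def chroma_plane_size(luma_w, luma_h, chroma_format):
    """Return (width, height) of one chroma plane for a given sampling format."""
    if chroma_format == "4:2:0":   # half resolution horizontally and vertically
        return luma_w // 2, luma_h // 2
    if chroma_format == "4:2:2":   # half resolution horizontally only
        return luma_w // 2, luma_h
    if chroma_format == "4:4:4":   # full resolution
        return luma_w, luma_h
    raise ValueError("unsupported chroma format: " + chroma_format)

# Example: a 1920x1080 frame in 4:2:0 carries two 960x540 chroma planes.
assert chroma_plane_size(1920, 1080, "4:2:0") == (960, 540)
```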
For example, the video encoder 200 reads video data and, for each frame of image in the video data, divides the frame into a number of coding tree units (CTUs). In some examples, CTUs may be referred to as "tree blocks", "largest coding units" (largest coding unit, LCU) or "coding tree blocks" (coding tree block, CTB). Each CTU may be associated with a block of pixels of equal size within the image. Each pixel may correspond to one luminance (luma) sample and two chrominance (chroma) samples. Thus, each CTU may be associated with one block of luma samples and two blocks of chroma samples. The size of one CTU is, for example, 128×128, 64×64, or 32×32. One CTU may be further divided into several coding units (CUs), where a CU may be a rectangular block or a square block. A CU may be further divided into a prediction unit (PU) and a transform unit (TU), so that coding, prediction and transform are separated and processing is more flexible. In one example, CTUs are divided into CUs in a quadtree manner, and CUs are divided into TUs and PUs in a quadtree manner.
Video encoders and video decoders may support various PU sizes. Assuming that the size of a particular CU is 2N×2N, video encoders and video decoders may support PU sizes of 2N×2N or N×N for intra prediction, and symmetric PUs of 2N×2N, 2N×N, N×2N, N×N or similar sizes for inter prediction. Video encoders and video decoders may also support asymmetric PUs of 2N×nU, 2N×nD, nL×2N and nR×2N for inter prediction.
In some embodiments, as shown in fig. 2, the video encoder 200 may include: a prediction unit 210, a residual unit 220, a transform/quantization unit 230, an inverse transform/quantization unit 240, a reconstruction unit 250, a loop filtering unit 260, a decoded image buffer 270, and an entropy encoding unit 280. It should be noted that video encoder 200 may include more, fewer, or different functional components.
Optionally, in this application, the current block (current block) may be referred to as the current coding unit (CU) or the current prediction unit (PU), or the like. The prediction block may also be referred to as a prediction image block or an image prediction block, and the reconstructed image block may also be referred to as a reconstructed block or a reconstructed image block.
In some embodiments, prediction unit 210 includes an inter prediction unit 211 and an intra estimation unit 212. Because of the strong correlation between adjacent pixels in a frame of video, intra-prediction methods are used in video coding techniques to eliminate spatial redundancy between adjacent pixels. Because of the strong similarity between adjacent frames in video, the inter-frame prediction method is used in the video coding and decoding technology to eliminate the time redundancy between adjacent frames, thereby improving the coding efficiency.
The inter prediction unit 211 may be used for inter prediction. Inter prediction may refer to image information of different frames, using motion information to find a reference block in a reference frame and generating a prediction block from the reference block, so as to eliminate temporal redundancy. The frames used for inter prediction may be P frames, which are forward predicted frames, and/or B frames, which are bi-directionally predicted frames. The motion information includes the reference frame list in which the reference frame is located, a reference frame index, and a motion vector. The motion vector may be of integer-pixel or sub-pixel precision; if the motion vector is of sub-pixel precision, interpolation filtering in the reference frame is needed to generate the required sub-pixel block. Here, the integer-pixel or sub-pixel block in the reference frame found according to the motion vector is called the reference block. Some techniques use the reference block directly as the prediction block, while other techniques reprocess the reference block to generate the prediction block. Reprocessing the reference block to generate a prediction block can also be understood as taking the reference block as a prediction block and then processing it to generate a new prediction block.
The intra estimation unit 212 predicts the pixel information within the current image block by referring only to information of the same frame image, so as to eliminate spatial redundancy. The frame used for intra prediction may be an I frame. For example, as shown in fig. 4, the white 4×4 block is the current block, and the gray pixels in the row above and the column to the left of the current block are the reference pixels of the current block, which intra prediction uses to predict the current block. These reference pixels may all be available, i.e., all already encoded and decoded. Some of them may also be unavailable; for example, if the current block is at the leftmost edge of the whole frame, the reference pixels to the left of the current block are not available. Or, when the current block is being encoded or decoded, the part below and to its left may not yet have been encoded or decoded, so the lower-left reference pixels are not available. For the case where reference pixels are unavailable, the positions may be filled using the available reference pixels or certain default values or methods, or left unfilled.
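A rough illustration of this reference-pixel gathering follows; the helper below is hypothetical and uses a constant fill where real codecs typically repeat the nearest available reference sample:

```python
import numpy as np

def gather_reference_pixels(recon, x0, y0, w, h, pad_value=512):
    """Collect the row above and the column left of the current block.

    `recon` is the reconstructed frame so far; positions outside the frame
    (or not yet reconstructed) are filled with `pad_value`, e.g. 512 for
    10-bit content. A constant fill keeps this sketch short.
    """
    top = np.full(w, pad_value, dtype=recon.dtype)
    left = np.full(h, pad_value, dtype=recon.dtype)
    if y0 > 0:                                  # a row above exists
        n = min(w, recon.shape[1] - x0)
        top[:n] = recon[y0 - 1, x0:x0 + n]
    if x0 > 0:                                  # a column to the left exists
        n = min(h, recon.shape[0] - y0)
        left[:n] = recon[y0:y0 + n, x0 - 1]
    return top, left
```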
There are multiple prediction modes for intra prediction, taking the international digital video coding standard H series as an example, the h.264/AVC standard has 8 angular prediction modes and 1 non-angular prediction mode, and the h.265/HEVC is extended to 33 angular prediction modes and 2 non-angular prediction modes.
As shown in fig. 5, the intra prediction modes used in HEVC are a Planar mode (Planar), DC, and 33 angular modes, for a total of 35 prediction modes.
As shown in fig. 6, the intra modes used by VVC are Planar, DC, and 65 angular modes, for a total of 67 prediction modes. In addition, there is a trained matrix-based intra prediction (matrix based intra prediction, MIP) mode for the luma component and a CCLM prediction mode for the chroma component.
As shown in fig. 7, in the MIP technique, for a rectangular prediction block with width W and height H, MIP selects the W reconstructed pixels in the row above the block and the H reconstructed pixels in the column to its left as inputs. If the pixels at these positions have not been reconstructed, they are set to a default value, e.g., 512 for 10-bit pixels. MIP generates the predictor in three steps: reference pixel averaging, matrix-vector multiplication, and linear interpolation upsampling.
MIP acts on blocks of sizes 4×4 to 64×64, and for a rectangular prediction block, MIP mode selects the appropriate prediction matrix according to the rectangle's side lengths: for a rectangle with a short side of 4, 16 sets of matrix parameters are available; for a rectangle with a short side of 8, 8 sets of matrix parameters are available; for other rectangles, 6 sets of matrix parameters are available. MIP performs prediction with the candidate matrices, and the index of the least-cost matrix is coded into the code stream so that the decoding end can read that matrix's parameters for prediction.
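The three MIP steps can be sketched as follows in Python; the shapes, the grouping in the averaging step, and the pixel-repetition upsampling are simplifications of the standardized procedure:

```python
import numpy as np

def mip_predict(top, left, matrix, small_h, small_w, block_h, block_w):
    """Illustrative MIP pipeline: averaging, matrix-vector product, upsampling.

    `matrix` stands in for one of the trained prediction matrices; its row
    count must equal small_h * small_w and its column count 8 in this sketch.
    """
    # 1) Reference pixel averaging: compress the W top and H left samples
    #    into a short boundary vector (8 averaged samples here).
    refs = np.concatenate([top, left]).astype(np.float64)
    group = max(1, refs.size // 8)
    reduced = refs[:group * 8].reshape(8, group).mean(axis=1)

    # 2) Matrix-vector multiplication yields a downsampled prediction block.
    small = (matrix @ reduced).reshape(small_h, small_w)

    # 3) Upsample to the full block size. Pixel repetition via np.kron is a
    #    crude stand-in for the standard's linear interpolation.
    return np.kron(small, np.ones((block_h // small_h, block_w // small_w)))
```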
It should be noted that as the number of angular modes increases, intra prediction becomes more accurate and better meets the demands of high-definition and ultra-high-definition digital video.
Residual unit 220 may generate a residual block of the CU based on the pixel block of the CU and the prediction block of the PU of the CU. For example, residual unit 220 may generate a residual block of the CU such that each sample in the residual block has a value equal to the difference between: samples in pixel blocks of a CU, and corresponding samples in prediction blocks of PUs of the CU.
The transform/quantization unit 230 may quantize the transform coefficients. Transform/quantization unit 230 may quantize transform coefficients associated with TUs of a CU based on Quantization Parameter (QP) values associated with the CU. The video encoder 200 may adjust the degree of quantization applied to the transform coefficients associated with the CU by adjusting the QP value associated with the CU.
The inverse transform/quantization unit 240 may apply inverse quantization and inverse transform, respectively, to the quantized transform coefficients to reconstruct a residual block from the quantized transform coefficients.
The reconstruction unit 250 may add samples of the reconstructed residual block to corresponding samples of one or more prediction blocks generated by the prediction unit 210 to generate a reconstructed image block associated with the TU. By reconstructing the sample blocks of each TU of the CU in this way, video encoder 200 may reconstruct the pixel block of the CU.
Loop filtering unit 260 may perform a deblocking filtering operation to reduce blocking artifacts of pixel blocks associated with the CU.
In some embodiments, the loop filtering unit 260 includes a deblocking filtering unit for deblocking artifacts and a sample adaptive compensation/adaptive loop filtering (SAO/ALF) unit for removing ringing effects.
The decoded image buffer 270 may store reconstructed pixel blocks. Inter prediction unit 211 may use the reference image containing the reconstructed pixel block to perform inter prediction on PUs of other images. In addition, intra estimation unit 212 may use the reconstructed pixel blocks in decoded image buffer 270 to perform intra prediction on other PUs in the same image as the CU.
The entropy encoding unit 280 may receive the quantized transform coefficients from the transform/quantization unit 230. Entropy encoding unit 280 may perform one or more entropy encoding operations on the quantized transform coefficients to generate entropy encoded data.
Fig. 3 is a schematic block diagram of a video decoder according to an embodiment of the present application.
As shown in fig. 3, the video decoder 300 includes: an entropy decoding unit 310, a prediction unit 320, an inverse quantization/transformation unit 330, a reconstruction unit 340, a loop filtering unit 350, and a decoded image buffer 360. It should be noted that the video decoder 300 may include more, fewer, or different functional components.
The video decoder 300 may receive the bitstream. The entropy decoding unit 310 may parse the bitstream to extract syntax elements from the bitstream. As part of parsing the bitstream, the entropy decoding unit 310 may parse entropy-encoded syntax elements in the bitstream. The prediction unit 320, the inverse quantization/transformation unit 330, the reconstruction unit 340, and the loop filtering unit 350 may decode video data according to syntax elements extracted from a bitstream, i.e., generate decoded video data.
In some embodiments, prediction unit 320 includes an intra-estimation unit 321 and an inter-prediction unit 322.
The intra estimation unit 321 may perform intra prediction to generate a prediction block of the PU. Intra-estimation unit 321 may use an intra-prediction mode to generate a prediction block for a PU based on pixel blocks of spatially neighboring PUs. The intra-estimation unit 321 may also determine an intra-prediction mode of the PU according to one or more syntax elements parsed from the bitstream.
The inter prediction unit 322 may construct a first reference picture list (list 0) and a second reference picture list (list 1) according to syntax elements parsed from the bitstream. Furthermore, if the PU uses inter prediction encoding, entropy decoding unit 310 may parse the motion information of the PU. Inter prediction unit 322 may determine one or more reference blocks for the PU based on the motion information of the PU. Inter prediction unit 322 may generate a prediction block for the PU based on one or more reference blocks for the PU.
The inverse quantization/transform unit 330 may inverse quantize (i.e., dequantize) transform coefficients associated with the TUs. Inverse quantization/transform unit 330 may determine the degree of quantization using QP values associated with the CUs of the TUs.
After inverse quantizing the transform coefficients, inverse quantization/transform unit 330 may apply one or more inverse transforms to the inverse quantized transform coefficients in order to generate a residual block associated with the TU.
Reconstruction unit 340 uses the residual blocks associated with the TUs of the CU and the prediction blocks of the PUs of the CU to reconstruct the pixel blocks of the CU. For example, the reconstruction unit 340 may add samples of the residual block to corresponding samples of the prediction block to reconstruct a pixel block of the CU, resulting in a reconstructed image block.
Loop filtering unit 350 may perform a deblocking filtering operation to reduce blocking artifacts of pixel blocks associated with the CU.
The video decoder 300 may store the reconstructed image of the CU in a decoded image buffer 360. The video decoder 300 may use the reconstructed image in the decoded image buffer 360 as a reference image for subsequent prediction or may transmit the reconstructed image to a display device for presentation.
The basic flow of video encoding and decoding is as follows. At the encoding end, a frame of image is divided into blocks, and for the current block, the prediction unit 210 generates a prediction block of the current block using intra prediction or inter prediction. The residual unit 220 may calculate a residual block, also referred to as residual information, based on the difference between the prediction block and the original block of the current block. Through transform and quantization by the transform/quantization unit 230, information insensitive to human eyes can be removed to eliminate visual redundancy. Optionally, the residual block before transform and quantization by the transform/quantization unit 230 may be referred to as a time-domain residual block, and the residual block after transform and quantization may be referred to as a frequency residual block or frequency-domain residual block. The entropy encoding unit 280 receives the quantized transform coefficients output by the transform/quantization unit 230 and may entropy encode them to output a code stream. For example, the entropy encoding unit 280 may eliminate character redundancy according to the target context model and the probability information of the binary code stream.
At the decoding end, the entropy decoding unit 310 parses the code stream to obtain the prediction information, quantized coefficient matrix, etc. of the current block, and the prediction unit 320 generates a prediction block of the current block using intra prediction or inter prediction based on the prediction information. The inverse quantization/transform unit 330 performs inverse quantization and inverse transform on the quantized coefficient matrix obtained from the code stream to obtain a residual block. The reconstruction unit 340 adds the prediction block and the residual block to obtain a reconstructed block. The reconstructed blocks constitute a reconstructed image, and the loop filtering unit 350 performs loop filtering on the reconstructed image, either per image or per block, to obtain a decoded image. The encoding side needs to obtain the decoded image through operations similar to those of the decoding side. The decoded image may also be referred to as a reconstructed image, and it may serve as a reference frame for inter prediction of subsequent frames.
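The core reconstruction step at the decoding end can be illustrated with a short sketch (illustrative only; `pred` and `residual` are assumed to be same-shaped integer arrays):

```python
import numpy as np

def reconstruct_block(pred, residual, bit_depth=8):
    """Decoder-side reconstruction: prediction plus inverse-transformed
    residual, clipped to the valid sample range for the given bit depth."""
    hi = (1 << bit_depth) - 1
    return np.clip(pred.astype(np.int32) + residual.astype(np.int32), 0, hi)
```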
The block division information determined by the encoding end, as well as mode information or parameter information for prediction, transform, quantization, entropy coding, loop filtering, etc., are carried in the code stream when necessary. The decoding end parses the code stream and, from the parsed and existing information, determines the same block division information and the same prediction, transform, quantization, entropy coding, loop filtering and other mode or parameter information as the encoding end, thereby ensuring that the decoded image obtained by the encoding end is identical to the decoded image obtained by the decoding end.
The foregoing is a basic flow of a video codec under a block-based hybrid coding framework, and as technology advances, some modules or steps of the framework or flow may be optimized.
Current intra prediction modes derive the predicted value of the current block from the reconstructed values around the current block; if the correlation between the original value of the current block and the reconstructed values around it is weak, the prediction is inaccurate.
In order to solve the above technical problem, embodiments of the present application provide a self-encoder-based intra prediction mode, which can realize accurate prediction of the current block even when the correlation between the original value of the current block and the reconstructed values around it is weak.
Fig. 8 is a schematic diagram of a network structure of a self-encoder according to an embodiment of the present application, and as shown in fig. 8, the self-encoder includes an encoding network and a decoding network.
As shown in fig. 8, the encoding network includes 4 fully connected layers (fully connected layer, FCL for short) and 4 activation functions, where each fully connected layer is followed by one activation function, so that the characteristic information output by the fully connected layer undergoes a nonlinear transformation before being input into the next fully connected layer. The output of the last activation function in the encoding network is the output of the encoding network. Each fully connected layer includes 128 nodes; the first 3 activation functions are LeakyReLU activation functions, and the last activation function is a sigmoid activation function. It should be noted that fig. 8 is only an example of an encoding network, and the encoding network of the present application includes, but is not limited to, the one shown in fig. 8; for example, it may include more or fewer fully connected layers and activation functions than fig. 8, the number of nodes in each fully connected layer is not limited to 128, and different layers may include the same or different numbers of nodes. Optionally, the fully connected layers in fig. 8 may be replaced with convolutional layers, and the LeakyReLU activation function may be replaced with other activation functions, e.g., ReLU, ELU, etc.
As shown in fig. 8, the decoding network includes 4 fully connected layers and 3 activation functions; each of the first 3 fully connected layers is followed by an activation function that nonlinearly transforms the characteristic information output by the fully connected layer before it is input into the next fully connected layer. The output of the last fully connected layer in the decoding network serves as the output of the decoding network. The network structure of the decoding network is symmetrical to that of the encoding network: each fully connected layer in the decoding network includes 128 nodes, and the 3 activation functions are LeakyReLU activation functions. It should be noted that fig. 8 is only an example of a decoding network, and the decoding network of the present application includes, but is not limited to, the one shown in fig. 8; for example, it may include more or fewer fully connected layers and activation functions than fig. 8, the number of nodes in each fully connected layer is not limited to 128, and different layers may include the same or different numbers of nodes. Optionally, the fully connected layers in fig. 8 may be replaced with convolutional layers, and the LeakyReLU activation function may be replaced with other activation functions, e.g., ReLU, ELU, etc.
With continued reference to fig. 8, in actual prediction, the original pixel values of the current block are input into the encoding network, and the last activation function of the encoding network outputs the characteristic information of the current block, denoted side information (SI), which is a low-dimensional feature map. Then, the characteristic information of the current block output by the last activation function of the encoding network, together with the pixel values of the reconstructed pixel points around the current block, is input into the decoding network, and after processing by the decoding network the prediction block of the current block is output.
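A minimal PyTorch sketch of the fig. 8 structure is given below, with two assumptions: the block and reference pixels are flattened into vectors, and the bottleneck dimension `feat_dim` is set to 2 in line with the 1×2 / 1×3 feature vectors mentioned later in the text:

```python
import torch
import torch.nn as nn

class IntraAutoencoder(nn.Module):
    """4 FC layers plus activations per side, mirroring the fig. 8 description."""

    def __init__(self, block_pixels, ref_pixels, feat_dim=2, width=128):
        super().__init__()
        # Encoder: LeakyReLU after the first three layers, sigmoid last,
        # so the feature (side information) lies in [0, 1].
        self.encoder = nn.Sequential(
            nn.Linear(block_pixels, width), nn.LeakyReLU(),
            nn.Linear(width, width), nn.LeakyReLU(),
            nn.Linear(width, width), nn.LeakyReLU(),
            nn.Linear(width, feat_dim), nn.Sigmoid(),
        )
        # Decoder: takes the feature concatenated with the neighbouring
        # reconstructed pixels; the last layer has no activation.
        self.decoder = nn.Sequential(
            nn.Linear(feat_dim + ref_pixels, width), nn.LeakyReLU(),
            nn.Linear(width, width), nn.LeakyReLU(),
            nn.Linear(width, width), nn.LeakyReLU(),
            nn.Linear(width, block_pixels),
        )

    def forward(self, block, refs):
        side_info = self.encoder(block)      # SI, to be rounded for the stream
        pred = self.decoder(torch.cat([side_info, refs], dim=-1))
        return pred, side_info
```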
As can be seen from fig. 8, in the self-encoder-based intra prediction mode, when predicting the prediction block of the current block, not only the pixel values of the reconstructed pixels around the current block are considered but also the original pixel values of the current block; thus, even when the correlation between the original value of the current block and the reconstructed values around it is weak, accurate prediction of the current block can be realized, thereby improving the accuracy of intra prediction.
As shown in fig. 8, when determining the prediction block of the current block, the encoding end inputs the characteristic information of the current block output by the encoding network and the pixel values of the reconstructed pixels around the current block into the decoding network, and the decoding network outputs the prediction block of the current block. For the decoding end to accurately determine the prediction block of the current block, the encoding end needs to carry the characteristic information of the current block output by the encoding network in the code stream and send it to the decoding end. The decoding end parses the characteristic information of the current block from the code stream, inputs the characteristic information of the current block and the pixel values of the reconstructed pixel points around the current block into the decoding network of the self-encoder to obtain the prediction block of the current block, and then obtains the reconstructed block of the current block according to the prediction block of the current block and the residual block of the current block parsed from the code stream.
That is, the characteristic information output by the encoding network needs to be written into the code stream. In order to reduce the number of bits occupied by the characteristic information, the characteristic information output by the encoding network is rounded to discrete values, and the rounded characteristic information is written into the code stream.
In order to reduce the number of bits occupied by the rounded characteristic information, the values of the characteristic information output by the encoding network should be continuously distributed within a limited range. For example, the element values in the characteristic information output by the last-layer activation function of the encoding network lie in the range [a, b], where a and b are integers.
In example one, a is 0 and b is 1; that is, the element values in the characteristic information output by the last-layer activation function of the encoding network lie in the range [0, 1], so that when the characteristic information output by the last-layer activation function is rounded, the rounded result equals 0 or 1 and can be represented with 1 bit; for example, bit 0 represents the value 0 and bit 1 represents the value 1.
Optionally, the expression of the last-layer activation function of the encoding network is the sigmoid shown in formula (1):
S(x) = 1 / (1 + e^(-x))    (1)
wherein x is the input of the last-layer activation function, i.e., the feature value output by the layer preceding the last-layer activation function in the encoding network, and S(x) is the characteristic information output by the last-layer activation function.
When the last-layer activation function of the encoding network adopts the above formula (1), as shown in fig. 9A, each element value in the characteristic information output by the encoding network is limited to between 0 and 1, and after rounding becomes 0 or 1, so that the characteristic information can be conveniently encoded into the code stream.
It should be noted that in this example one, the expression of the last-layer activation function includes, but is not limited to, the above formula (1); it may be any other expression that constrains the output of the encoding network to within [0, 1].
In example two, a is -1 and b is 1; that is, the element values in the characteristic information output by the last-layer activation function of the encoding network lie in the range [-1, 1], so that when the characteristic information output by the last-layer activation function is rounded, the rounded result is -1, 0 or 1. For the rounded number, a one-bit binary symbol may first be used to indicate whether it is 0; if it is not 0, another one-bit binary symbol indicates its sign. That is, bit 0 is written or read to represent 0, bits 10 are written or read to represent -1, and bits 11 are written or read to represent 1.
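A sketch of this sign-magnitude style binarization follows (illustrative Python; the actual entropy coding in the codec may differ):

```python
def encode_ternary(value):
    """Map a rounded feature value in {-1, 0, 1} to bits:
    0 -> '0', -1 -> '10', 1 -> '11' (first bit: nonzero?, second: sign)."""
    if value == 0:
        return "0"
    return "10" if value == -1 else "11"

def decode_ternary(bits):
    """Inverse of encode_ternary; returns (value, bits consumed)."""
    if bits[0] == "0":
        return 0, 1
    return (-1, 2) if bits[1] == "0" else (1, 2)

assert decode_ternary(encode_ternary(-1)) == (-1, 2)
```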
Optionally, the expression of the last-layer activation function of the encoding network is shown in formula (2), wherein x is the input of the last-layer activation function, i.e., the feature value output by the layer preceding the last-layer activation function in the encoding network, S(x) is the characteristic information output by the last-layer activation function, and n is a positive integer, optionally n = 10.
When the last-layer activation function of the encoding network adopts the above formula (2), as shown in fig. 9B, each element value in the characteristic information output by the encoding network is limited to between -1 and 1, and after rounding the result is -1, 0 or 1, so that the characteristic information can be conveniently encoded into the code stream.
It should be noted that in this example two, the expression of the last-layer activation function includes, but is not limited to, the above formula (2); it may be any other expression that constrains the output of the encoding network to within [-1, 1].
In some embodiments, in order to make the output of the last-layer activation function of the encoding network suit different coding methods, the characteristic information it outputs may be further enlarged or reduced. For example, by multiplying the result of formula (1) by 2, the limiting range of the characteristic information output by the encoding network changes from [0, 1] to [0, 2]; the characteristic information in [0, 2] can then be rounded, giving rounded results of 0, 1 and 2. Alternatively, the result of formula (1) may be multiplied by 2 and then reduced by 1, so that the limiting range changes from [0, 1] to [-1, 1]; when rounding, the results are -1, 0 and 1.
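These rescalings are simple affine maps of the formula-(1) output; a short sketch with illustrative names:

```python
def rescale_feature(s):
    """Affine remappings of the sigmoid output s in [0, 1] described above."""
    to_0_2 = 2.0 * s           # range [0, 2]; rounds to 0, 1 or 2
    to_m1_1 = 2.0 * s - 1.0    # range [-1, 1]; rounds to -1, 0 or 1
    return to_0_2, to_m1_1
```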
By limiting the element values in the characteristic information output by the last-layer activation function of the encoding network to a certain range, for example [0, 1] or [-1, 1], the self-encoder in the embodiment of the present application makes the characteristic information easy to encode after rounding. In addition, the embodiment of the application provides two different expressions for the last-layer activation function; their operations are simple and the computation cost is small, while it is guaranteed that the characteristic information output by the last-layer activation function is constrained within a certain range, improving the coding efficiency of the encoding network.
The following describes the training process of the self-encoder.
The encoding network and the decoding network of the self-encoder in the embodiment of the present application are trained jointly. The specific process is as follows: a training coding block is input into the encoding network, and the encoding network outputs the characteristic information of the training coding block; after the characteristic information is rounded, it is input into the decoding network together with the pixel values of the reconstructed pixel points around the training coding block, and the decoding network outputs a prediction block of the training coding block. A loss is computed from the prediction block output by the decoding network and the true value of the training coding block, and the weights of each layer in the self-encoder are updated backward according to the loss, completing the training of the self-encoder. The weight update is realized by differentiating each layer's output; however, as described above, the characteristic information output by the encoding network needs to be rounded, and the rounding process is not differentiable. The embodiment of the present application solves this technical problem in the following two ways:
In mode one, uniformly distributed noise is added to the characteristic information output by the last-layer activation function of the encoding network; the noise may range over [-0.5, 0.5], so as to simulate the rounding process. Taking the last-layer activation function shown in formula (1) as an example, noise is added to the characteristic information in the range 0 to 1 output by the last-layer activation function; the value range of the characteristic information after adding noise is -0.5 to 1.5, and this is used as the input of the decoding network.
In mode two, during training of the self-encoder, the rounded discrete values are directly used as the input of the decoding network in the forward computation; that is, the characteristic information output by the last-layer activation function is rounded to the nearest integer and then directly input into the decoding network, as shown in formula (3):
B(x) = round(S(x))    (3)
wherein B(x) is the rounded characteristic information, S(x) is the characteristic information output by the last-layer activation function, and round(·) denotes rounding to the nearest integer.
When the gradient is computed in the backward pass, the derivative of the characteristic information output by the last-layer activation function is used directly for backpropagation, as shown in formula (4):
B′(x)=S′(x) (4)
wherein B′(x) represents the derivative assigned to the rounded characteristic information, which is set equal to the derivative S′(x) of the characteristic information S(x) output by the last-layer activation function; in other words, the rounding step is treated as the identity in the backward pass.
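Both training workarounds can be written compactly; the sketch below uses PyTorch (an assumed framework, not named in the text), with mode two being the classic straight-through trick:

```python
import torch

def quantize_for_training(side_info, mode="ste"):
    """Make the rounding step trainable.

    mode "noise": add uniform noise in [-0.5, 0.5] to simulate rounding
    (formula-(1) features in [0, 1] then span [-0.5, 1.5]).
    mode "ste": use the rounded value in the forward pass but let the
    gradient pass through unchanged, i.e. B'(x) = S'(x) as in formula (4).
    """
    if mode == "noise":
        return side_info + (torch.rand_like(side_info) - 0.5)
    # Straight-through: detach() removes round() from the backward graph,
    # so the forward value is rounded while the gradient is the identity.
    return side_info + (torch.round(side_info) - side_info).detach()
```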
In the training process of the self-encoder, the embodiment of the present application processes the rounding step with either of the above two modes, which improves the training accuracy of the self-encoder and thus guarantees the accuracy of intra prediction when the self-encoder is used for intra prediction.
In some embodiments, the present application further trains a plurality of self-encoders for prediction blocks of different sizes; for example, self-encoders are trained for blocks of shapes 32×32, 32×16, 32×8, 32×4, 16×16, 16×8, 16×4, 8×8, 8×4, 4×4, etc., obtaining the self-encoder corresponding to each block shape.
Alternatively, the blocks of each shape may be luminance blocks including luminance components, and the self-encoder may be trained to obtain prediction values for predicting the luminance blocks of different shapes.
Alternatively, the blocks of each shape may be chroma blocks including chroma components, and then a self-encoder for obtaining prediction values for predicting the chroma blocks of different shapes is trained.
Optionally, the blocks of each shape include a luminance component and a chrominance component, and the self-encoder is further trained to obtain a luminance prediction value and a chrominance prediction value for predicting the blocks of different shapes.
The present application does not limit the training data used to train the self-encoders; optionally, the DIV2K data set may be used as the training set to train the self-encoders corresponding to blocks of different shapes.
The present application does not limit the dimension of the characteristic information output by the encoding network of the self-encoder. Optionally, the characteristic information output by the encoding network is an N×M feature vector; for example, a 1×2 vector such as (0, 1), or, as another example, a 1×3 vector.
In some embodiments, for blocks of shapes a×b and b×a, only the self-encoder corresponding to the a×b block may be trained, and it may be reused for b×a blocks during intra prediction. Specifically, a b×a block is transposed into a block of size a×b, the transposed a×b block is input into the self-encoder to obtain an a×b prediction block output by the self-encoder, and the a×b prediction block is transposed into a b×a prediction block, which serves as the prediction block of the b×a block.
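This transposition reuse might look as follows; `predict_with_autoencoder` is a hypothetical helper standing in for the full encode/decode pass of the a×b self-encoder:

```python
import numpy as np

def predict_transposed(block_ba, top, left, predict_with_autoencoder):
    """Predict a b-by-a block using the a-by-b self-encoder.

    The block is transposed to a-by-b, the roles of the top row and left
    column swap accordingly, and the predicted block is transposed back.
    """
    block_ab = block_ba.T                                   # b x a -> a x b
    pred_ab = predict_with_autoencoder(block_ab, top=left, left=top)
    return pred_ab.T                                        # a x b -> b x a
```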
In summary, the embodiment of the application can train to obtain the self-encoders corresponding to the blocks with different shapes, and can select the self-encoders corresponding to the blocks with different shapes when carrying out intra-frame prediction on the blocks with different shapes, thereby ensuring the accuracy of intra-frame prediction on the blocks with different shapes.
The network structure and training process of the self-encoder are described above, and based on this, the video decoding method provided in the embodiment of the present application is described below with reference to fig. 10 by taking the decoding end as an example.
Fig. 10 is a schematic flow chart of a video decoding method according to an embodiment of the present application, which is applied to the video decoder shown in fig. 1 and fig. 2. As shown in fig. 10, the method of the embodiment of the present application includes:
S401, decoding the code stream, and determining an intra-frame prediction mode of the current block.
In some embodiments, the current block is also referred to as a current decoded block, a current decoding unit, a decoded block, a block to be decoded, a current block to be decoded, and the like.
The implementation manner of determining the intra prediction mode of the current block in S401 includes, but is not limited to, the following:
In a first aspect, if the code stream includes the first flag, S401 includes the following S401-A1 and S401-A2:
S401-A1, a decoding end decodes a code stream to obtain a first mark, wherein the first mark is used for indicating whether a current sequence allows using an intra-frame prediction mode based on a self-encoder;
S401-A2, determining the intra prediction mode of the current block according to the first mark.
For example, if the value of the first flag is equal to a second value (e.g., 0), indicating that the current sequence does not allow the use of the intra-prediction mode based on the self-encoder, at this time, it may be determined that the intra-prediction mode of the current block is not the intra-prediction mode based on the self-encoder.
For another example, if the value of the first flag is equal to the first value (e.g. 1), it indicates that the current sequence allows the intra prediction mode based on the self-encoder, and at this time, the implementation manner of determining the intra prediction mode of the current block according to the first flag in S401-A2 includes, but is not limited to, the following:
in a first implementation, if the value of the first flag is equal to the first value, it is determined that the intra prediction mode of the current block is a self-encoder based intra prediction mode. That is, in this implementation, if the value of the first flag is equal to the first value, it indicates that the current sequence allows the use of the intra-prediction mode based on the self-encoder, and the intra-prediction modes of the decoded blocks in the current sequence are all intra-prediction modes based on the self-encoder.
In a second implementation manner, the code stream includes a first flag and a second flag. In this case, if the value of the first flag is equal to the first value, the decoding end decodes the code stream to obtain the second flag, where the second flag is used to indicate whether the current block uses the intra-frame prediction mode based on the self-encoder, and determines the intra-frame prediction mode of the current block according to the second flag. For example, when the value of the second flag is a preset value (e.g., 1), it indicates that the intra-prediction mode of the current block is the intra-prediction mode based on the self-encoder; when the value of the second flag is not equal to the preset value, it indicates that the intra-prediction mode of the current block is an intra-prediction mode other than the intra-prediction mode based on the self-encoder.
In the first aspect, the decoding end may determine the intra-prediction mode of the current block according to the first flag decoded from the code stream, or determine the intra-prediction mode of the current block according to the first flag and the second flag.
In a second aspect, if the code stream includes the second flag but does not include the first flag, the S401 includes the following S401-B1 and S401-B2:
S401-B1, decoding the code stream to obtain a second mark;
S401-B2, determining the intra prediction mode of the current block according to the second mark.
In the second aspect, the code stream includes a second flag that directly indicates whether the current block uses the intra-frame prediction mode based on the self-encoder, and does not include the first flag. The decoding end decodes the code stream to obtain the second flag and determines the intra-frame prediction mode of the current block directly according to the second flag. For example, when the value of the second flag is a preset value (e.g., 1), the intra-frame prediction mode of the current block is the intra-frame prediction mode based on the self-encoder; when the value of the second flag is not equal to the preset value, the intra-frame prediction mode of the current block is an intra-frame prediction mode other than the intra-frame prediction mode based on the self-encoder.
In the second aspect, the first flag is not written into the code stream; instead, the second flag is directly written into the code stream to indicate whether the current block uses the intra-frame prediction mode based on the self-encoder, thereby saving codewords and reducing the decoding burden of the decoding end.
Optionally, the second flag may be a flag newly added in the code stream.
Optionally, the second flag is an indication flag already present in the code stream for indicating the intra-prediction mode. In this embodiment, the set of values of this indication flag is extended by adding a value that indicates the intra-prediction mode based on the self-encoder: when the indication flag of the intra prediction mode takes this newly added value, it indicates that the intra prediction mode of the current block is the intra prediction mode based on the self-encoder. This mode does not require an additional field in the code stream for representing the second flag, thereby saving codewords and improving decoding efficiency.
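For illustration only, the flag-based decision of S401 may be sketched in Python as follows; the bitstream object `bs` and its `read_flag()` helper are assumptions, not syntax defined by this application:

```python
def uses_ae_mode(bs) -> bool:
    """Return True if the current block uses the intra prediction mode
    based on the self-encoder, following the first flag / second flag
    logic described above."""
    sps_ae_enabled_flag = bs.read_flag()   # first flag (sequence level)
    if sps_ae_enabled_flag == 0:
        return False                       # mode disallowed for the sequence
    intra_ae_flag = bs.read_flag()         # second flag (unit level)
    return intra_ae_flag == 1
```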
The embodiment of the application does not limit the specific writing position of the first mark and the second mark in the code stream.
Optionally, the first flag is included in a sequence level parameter syntax element.
For example, using VVC Draft 10 as an example, a first flag is added to the sequence level parameter syntax, and the modification of the sequence level parameter syntax (Sequence parameter set RBSP syntax) is as shown in table 1:
TABLE 1
Wherein sps_ae_enabled_flag represents a first flag.
Optionally, the second flag is included in an encoding unit syntax element.
According to the above manner, the code stream is decoded, and the intra prediction mode of the current block is obtained, and then the following step S402 is performed.
S402, if the intra-frame prediction mode of the current block is based on the intra-frame prediction mode of the self-encoder, decoding the code stream to obtain the characteristic information of the current block.
When the coding end uses the self-encoder to perform intra-frame prediction on the current block, the coding network of the self-encoder outputs the first characteristic information of the current block; the first characteristic information is then rounded to obtain the characteristic information of the current block, and the characteristic information of the current block is encoded into the code stream, so that the decoding end can determine the predicted block of the current block according to the characteristic information of the current block.
Based on this, the decoding end executes S401 above, and if it is determined that the intra-frame prediction mode of the current block is the intra-frame prediction mode based on the self-encoder, continues decoding the code stream, and obtains the feature information of the current block.
The embodiment of the application does not limit the specific position of the characteristic information of the current block in the code stream, and optionally, the characteristic information of the current block is positioned in the syntax element of the coding unit, and the decoding end decodes the syntax element of the coding unit to obtain the characteristic information of the current block.
S403, obtaining pixel values of reconstructed pixel points around the current block.
As can be seen from the working principle of the self-encoder, the rounded characteristic information of the current block and the pixel values of the reconstructed pixels around the current block are input into the decoding network of the self-encoder, and the decoding network outputs the predicted block of the current block.
Based on this, when the decoding end predicts the prediction block of the current block based on the intra prediction mode of the self-encoder, it is also necessary to acquire the pixel values of the reconstructed pixels around the current block.
For example, the reconstructed pixel points around the current block include n rows of pixel points above the current block and/or m columns of pixel points to the left of the current block, where n and m are positive integers, and n and m may be equal or unequal. The n rows of pixel points may be continuous or discontinuous, and the m columns of pixel points may be continuous or discontinuous. Optionally, the n rows of pixel points may or may not be adjacent to the current block, and the m columns of pixel points may or may not be adjacent to the current block.
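For illustration only, gathering these reference pixels may be sketched as follows, assuming adjacent, continuous rows and columns; the array names and coordinate convention are assumptions:

```python
import numpy as np

def gather_neighbors(recon: np.ndarray, x0: int, y0: int,
                     w: int, h: int, n: int = 1, m: int = 1):
    """Collect n rows of reconstructed pixels directly above the current
    block and m columns directly to its left; (x0, y0) is the block's
    top-left corner in the reconstructed picture `recon`."""
    above = recon[y0 - n:y0, x0:x0 + w]    # n rows above the block
    left = recon[y0:y0 + h, x0 - m:x0]     # m columns to the left
    return above, left
```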
S404, inputting the characteristic information of the current block and the pixel values of the reconstructed pixel points around the current block into a decoding network of a self-encoder corresponding to the current block to obtain a prediction block of the current block output by the decoding network.
In some embodiments, the self-encoders corresponding to the blocks of different shapes may be different, so the decoding end may select the self-encoder corresponding to the current block from the plurality of self-encoders according to the size of the current block. Next, as shown in fig. 11, the obtained feature information of the current block and the pixel values of the reconstructed pixels around the current block are input into a decoding network of a self-encoder corresponding to the current block, so as to obtain a predicted block of the current block output by the decoding network.
After obtaining the predicted block of the current block according to the above method, the decoding end decodes the code stream to obtain the residual block of the current block, and adds the predicted block and the residual block to obtain the reconstructed block of the current block.
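Putting S401 to S404 and the reconstruction step together, a minimal decoder-side sketch might read as follows, reusing the helpers sketched above; the `decoders` map and the `read_side_info`/`read_residual` parsers are assumptions:

```python
def decode_ae_block(bs, decoders, recon, x0, y0, w, h):
    """Sketch of S401-S404 plus reconstruction for one block."""
    if not uses_ae_mode(bs):                             # S401
        return None                                      # another mode applies
    side_info = bs.read_side_info()                      # S402
    above, left = gather_neighbors(recon, x0, y0, w, h)  # S403
    dec_net = decoders[(w, h)]          # self-encoder chosen by block size
    pred = dec_net(side_info, above, left)               # S404
    residual = bs.read_residual(w, h)   # decode the residual block
    return pred + residual              # reconstructed block
```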
According to the above decoding method, the intra-frame prediction mode of the current block is determined by decoding the code stream; if the intra-frame prediction mode of the current block is the intra-frame prediction mode based on the self-encoder, the code stream is decoded to obtain the characteristic information of the current block; the pixel values of the reconstructed pixel points around the current block are acquired; and the characteristic information of the current block and the pixel values of the reconstructed pixel points around the current block are input into the decoding network of the self-encoder corresponding to the current block, to obtain the predicted block of the current block output by the decoding network. That is, the embodiments of the present application add an intra prediction mode based on the self-encoder, providing more options for intra prediction. If the intra-frame prediction mode of the current block is determined to be the intra-frame prediction mode based on the self-encoder, the prediction block of the current block is determined according to the characteristic information of the current block and the pixel values of the reconstructed pixel points around the current block. Even when the correlation between the original value of the current block and the reconstructed values around the current block is weak, the prediction considers not only the pixel values of the reconstructed pixel points around the current block but also the characteristic information of the current block, so that accurate prediction of the current block can be realized and the accuracy of intra-frame prediction is improved.
In some embodiments, the second flag is used to indicate whether the luma component and/or the chroma component of the current block uses the intra-prediction mode based on the self-encoder. The decoding method of the embodiment of the present application is described below with respect to different components of the current block.
Fig. 12 is another flowchart of a video decoding method according to an embodiment of the present application, where the second flag is used to indicate whether the luminance component of the current block uses the intra prediction mode based on the self-encoder. As shown in fig. 12, includes:
S501, decoding the code stream, and determining an intra-frame prediction mode of a brightness component of the current block.
The implementation manner of determining the intra prediction mode of the luminance component of the current block in S501 includes, but is not limited to, the following:
In a first aspect, if the code stream includes the first flag, S501 includes the following S501-A1 and S501-A2:
S501-A1, a decoding end decodes a code stream to obtain a first mark, wherein the first mark is used for indicating whether a current sequence allows using an intra-frame prediction mode based on a self-encoder;
S501-A2, determining an intra prediction mode of a brightness component of the current block according to the first mark.
For example, if the value of the first flag is equal to a second value (e.g., 0), indicating that the current sequence does not allow the use of the intra prediction mode based on the self-encoder, at this time, it may be determined that the intra prediction mode of the luminance component of the current block is not the intra prediction mode based on the self-encoder.
For another example, if the value of the first flag is equal to the first value (e.g. 1), it indicates that the current sequence allows the intra prediction mode based on the self-encoder, and at this time, the implementation manner of determining the intra prediction mode of the luminance component of the current block according to the first flag in S501-A2 includes, but is not limited to, the following:
in a first implementation, if the value of the first flag is equal to the first value and the luma component of the current block has an intra-prediction mode based on the self-encoder, the intra-prediction mode of the luma component of the current block is determined to be the intra-prediction mode based on the self-encoder.
In a second implementation manner, the code stream includes a first flag and a second flag. In this case, if the value of the first flag is equal to the first value, the decoding end decodes the code stream to obtain the second flag, where the second flag is used to indicate whether the luminance component of the current block uses the intra-prediction mode based on the self-encoder, and determines the intra-prediction mode of the luminance component of the current block according to the second flag. For example, when the value of the second flag is a preset value (e.g., 1), the intra-prediction mode of the luminance component of the current block is the intra-prediction mode based on the self-encoder; when the value of the second flag is not equal to the preset value, the intra-prediction mode of the luminance component of the current block is an intra-prediction mode other than the intra-prediction mode based on the self-encoder.
In the first aspect, the decoding end may determine the intra prediction mode of the luminance component of the current block according to the first flag decoded from the code stream, or determine the intra prediction mode of the luminance component of the current block according to the first flag and the second flag.
In a second aspect, if the code stream includes the second flag but does not include the first flag, the S501 includes the following S501-B1 and S501-B2:
S501-B1, decoding the code stream to obtain a second mark;
S501-B2, determining whether the luminance component of the current block uses a self-encoder intra-prediction mode according to the second flag.
In the second aspect, the code stream includes a second flag that directly indicates whether the luminance component of the current block uses the intra-frame prediction mode based on the self-encoder, and does not include the first flag. The decoding end decodes the code stream to obtain the second flag and determines the intra-frame prediction mode of the luminance component of the current block directly according to the second flag. For example, when the value of the second flag is a preset value (e.g., 1), the intra-prediction mode of the luminance component of the current block is the intra-prediction mode based on the self-encoder; when the value of the second flag is not equal to the preset value, the intra-prediction mode of the luminance component of the current block is an intra-prediction mode other than the intra-prediction mode based on the self-encoder.
In the second aspect, the first flag is not written into the code stream; instead, the second flag is directly written into the code stream to indicate whether the luminance component of the current block uses the intra-frame prediction mode based on the self-encoder, thereby saving codewords and reducing the decoding burden of the decoding end.
The embodiment of the application does not limit the specific writing position of the first mark and the second mark in the code stream.
Optionally, the first flag is included in a sequence level parameter syntax element.
Optionally, the second flag is included in an encoding unit syntax element.
S502, if the luminance component of the current block uses the intra-frame prediction mode based on the self-encoder, decoding the code stream to obtain the luminance characteristic information of the current block.
In this embodiment of the present application, the specific writing position of the luminance feature information of the current block in the code stream is not limited, and may be carried at any position in the code stream, for example.
In one example, the luminance characteristic information of the current block is carried in the syntax element corresponding to the luminance component of the current block. In this case, the decoding end decodes the syntax element corresponding to the luminance component of the current block and obtains the luminance characteristic information of the current block from that syntax element.
In another example, the luma feature information of the current block is carried in the coding unit syntax element.
Optionally, a second flag intra_ae_flag is also included in the coding unit syntax element.
Specifically, the decoding end decodes the coding unit syntax (Coding unit syntax) in the bitstream to obtain the coding unit syntax elements, and reads the second flag intra_ae_flag therefrom, which is a unit-level control flag indicating whether the luminance component of the current block uses the intra prediction mode based on the self-encoder. If intra_ae_flag is 1, indicating that the luminance component of the current block uses the intra prediction mode based on the self-encoder, the luminance feature information side_info[ ] of the current block is further read.
In one example, taking VVC Draft 10 as an example, the changes in the associated syntax table of Coding unit syntax are shown in table 2:
TABLE 2
In Table 2 above, si_size indicates how many characteristic information elements need to be coded. When the value of side_info[ ] is 0 or 1, side_info[ ] is coded using a 1-bit fixed-length code (u(1)); alternatively, a context model may be used for the coding.
Alternatively, if the value range of side_info[ ] is larger, multi-bit codewords may be used for decoding. For example, when the value range of side_info[ ] is {-1, 0, 1}, the corresponding modification of the coding unit syntax (Coding unit syntax) table is shown in Table 3:
TABLE 3
In Table 3 above, abs_side_info[ ] is the absolute value of side_info[ ]; when it is not 0, its sign is further decoded. As before, a context model may be used in addition to fixed-length-code decoding.
Alternatively, when the value range of side_info[ ] is larger still, codewords with more bits may be used to represent the luminance feature information side_info[ ] of the current block.
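For illustration only, the Table 3 style parsing may be sketched as follows; the `bs.read_bit()` reader and the helper name are assumptions, and a context model could replace the fixed-length reads as noted above:

```python
def parse_side_info(bs, si_size: int):
    """Parse si_size feature elements in the style of Table 3: read the
    absolute value of each element and, when it is nonzero, a sign bit."""
    side_info = []
    for _ in range(si_size):
        val = bs.read_bit()              # abs_side_info[i], here 0 or 1
        if val != 0 and bs.read_bit():   # sign bit: 1 means negative
            val = -val
        side_info.append(val)            # values in {-1, 0, 1}
    return side_info
```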
S503, obtaining brightness values of reconstructed pixel points around the current block.
For example, the reconstructed pixel points around the current block include n rows of pixel points above the current block and/or m columns of pixel points to the left of the current block, where n and m are positive integers, and n and m may be equal or unequal. The n rows of pixel points may be continuous or discontinuous, and the m columns of pixel points may be continuous or discontinuous. Optionally, the n rows of pixel points may or may not be adjacent to the current block, and the m columns of pixel points may or may not be adjacent to the current block.
S504, inputting the brightness characteristic information of the current block and the brightness values of the reconstructed pixel points around the current block into a decoding network to obtain a brightness prediction block of the current block.
In some embodiments, the self-encoders corresponding to blocks of different shapes may be different, and the self-encoders corresponding to the chrominance component and the luminance component may also be different; therefore, the decoding end may select the self-encoder corresponding to the luminance component of the current block from the plurality of self-encoders according to the size of the luminance component of the current block. Next, as shown in fig. 13, the luminance characteristic information of the current block and the luminance values of the reconstructed pixels around the current block are input into the decoding network of the self-encoder corresponding to the luminance component of the current block, to obtain the luminance prediction block of the current block output by the decoding network.
According to the above decoding method, the intra-frame prediction mode of the luminance component of the current block is determined by decoding the code stream; if the intra-frame prediction mode of the luminance component of the current block is the intra-frame prediction mode based on the self-encoder, the code stream is decoded to obtain the luminance characteristic information of the current block; the luminance values of the reconstructed pixel points around the current block are acquired; and the luminance characteristic information of the current block and the luminance values of the reconstructed pixel points around the current block are input into the decoding network of the self-encoder corresponding to the luminance component of the current block, to obtain the luminance prediction block of the current block output by the decoding network. That is, the embodiments of the present application add an intra-frame prediction mode based on the self-encoder for luminance intra prediction, enriching the intra-frame prediction modes of the luminance component. If the intra-frame prediction mode of the luminance component of the current block is determined to be the intra-frame prediction mode based on the self-encoder, the luminance prediction block of the current block is determined according to the luminance characteristic information of the current block and the luminance values of the reconstructed pixels around the current block. Even when the correlation between the original value of the luminance component of the current block and the reconstructed values around the current block is weak, the prediction considers not only the pixel values of the reconstructed pixels around the current block but also the luminance characteristic information of the current block, so that accurate prediction of the luminance component of the current block can be realized and the accuracy of intra-frame prediction of the luminance component is improved.
Fig. 14 is another flowchart of a video decoding method according to an embodiment of the present application, where the second flag is used to indicate whether the chroma component of the current block uses the intra prediction mode based on the self-encoder. As shown in fig. 14, includes:
S601, decoding the code stream, and determining an intra-frame prediction mode of a chroma component of the current block.
The implementation manner of determining the intra prediction mode of the chroma component of the current block in S601 includes, but is not limited to, the following:
In a first aspect, if the code stream includes the first flag, S601 includes the following S601-A1 and S601-A2:
S601-A1, a decoding end decodes a code stream to obtain a first mark, wherein the first mark is used for indicating whether a current sequence allows an intra-frame prediction mode based on a self-encoder to be used;
S601-A2, determining an intra prediction mode of a chroma component of the current block according to the first mark.
For example, if the value of the first flag is equal to a second value (e.g., 0), indicating that the current sequence does not allow the use of the intra-prediction mode based on the self-encoder, at this time, it may be determined that the intra-prediction mode of the chroma component of the current block is not the intra-prediction mode based on the self-encoder.
For another example, if the value of the first flag is equal to the first value (e.g. 1), it indicates that the current sequence allows the intra prediction mode based on the self-encoder, and at this time, the implementation manner of determining the intra prediction mode of the chroma component of the current block according to the first flag in S601-A2 includes, but is not limited to, the following several modes:
in a first implementation, if the value of the first flag is equal to the first value and the chroma component of the current block has a self-encoder based intra prediction mode, the intra prediction mode of the chroma component of the current block is determined to be the self-encoder based intra prediction mode.
In a second implementation manner, the code stream includes a first flag and a second flag. In this case, if the value of the first flag is equal to the first value, the decoding end decodes the code stream to obtain the second flag, where the second flag is used to indicate whether the chroma component of the current block uses the intra-prediction mode based on the self-encoder, and determines the intra-prediction mode of the chroma component of the current block according to the second flag. For example, when the value of the second flag is a preset value (e.g., 1), the intra-prediction mode of the chroma component of the current block is the intra-prediction mode based on the self-encoder; when the value of the second flag is not equal to the preset value, the intra-prediction mode of the chroma component of the current block is an intra-prediction mode other than the intra-prediction mode based on the self-encoder.
In the first aspect, the decoding end may determine the intra prediction mode of the chroma component of the current block according to the first flag decoded from the code stream, or determine the intra prediction mode of the chroma component of the current block according to the first flag and the second flag.
In a second aspect, if the code stream includes the second flag but does not include the first flag, the S601 includes the following S601-B1 and S601-B2:
S601-B1, decoding the code stream to obtain a second mark;
S601-B2, determining whether the chroma component of the current block uses a self-encoder based intra prediction mode according to the second flag.
In the second aspect, the code stream includes a second flag that directly indicates whether the chroma component of the current block uses the intra-frame prediction mode based on the self-encoder, and does not include the first flag. The decoding end decodes the code stream to obtain the second flag and determines the intra-frame prediction mode of the chroma component of the current block directly according to the second flag. For example, when the value of the second flag is a preset value (e.g., 1), the intra-prediction mode of the chroma component of the current block is the intra-prediction mode based on the self-encoder; when the value of the second flag is not equal to the preset value, the intra-prediction mode of the chroma component of the current block is an intra-prediction mode other than the intra-prediction mode based on the self-encoder.
In the second aspect, the first flag is not written into the code stream; instead, the second flag is directly written into the code stream to indicate whether the chroma component of the current block uses the intra-frame prediction mode based on the self-encoder, thereby saving codewords and reducing the decoding burden of the decoding end.
The embodiment of the application does not limit the specific writing position of the first mark and the second mark in the code stream.
Optionally, the first flag is included in a sequence level parameter syntax element.
Optionally, the second flag is included in an encoding unit syntax element.
S602, if the chroma component of the current block uses an intra-frame prediction mode based on a self-encoder, decoding the code stream to obtain the chroma characteristic information of the current block.
In this embodiment of the present application, the specific writing position of the chroma feature information of the current block in the code stream is not limited, and may be carried at any position in the code stream, for example.
In one example, the chroma characteristic information of the current block is carried in the syntax element corresponding to the chroma component of the current block. In this case, the decoding end decodes the syntax element corresponding to the chroma component of the current block and obtains the chroma characteristic information of the current block from that syntax element.
In another example, the chroma feature information of the current block is carried in the coding unit syntax element.
Optionally, a second flag intra_ae_flag is also included in the coding unit syntax element.
Specifically, the decoding end decodes the coding unit syntax (Coding unit syntax) in the bitstream to obtain the coding unit syntax elements, and reads the second flag intra_ae_flag therefrom, which is a unit-level control flag indicating whether the chroma component of the current block uses the intra prediction mode based on the self-encoder. If intra_ae_flag is 1, indicating that the chroma component of the current block uses the intra prediction mode based on the self-encoder, the chroma feature information side_info_cb[ ] and side_info_cr[ ] of the current block is further read.
In one example, taking VVC Draft 10 as an example, the changes in the associated syntax table of Coding unit syntax are shown in table 4:
TABLE 4
In Table 4 above, si_size indicates how many characteristic information elements need to be coded. When the values of side_info_cb[ ] and side_info_cr[ ] are 0 or 1, they are coded using a 1-bit fixed-length code (u(1)); alternatively, a context model may be used for the coding.
Alternatively, if the value ranges of side_info_cb[ ] and side_info_cr[ ] are larger, multi-bit codewords may be used for decoding, as exemplified by Table 3 above.
S603, obtaining chromaticity values of reconstructed pixel points around the current block.
For example, the reconstructed pixel points around the current block include n rows of pixel points above the current block and/or m columns of pixel points to the left of the current block, where n and m are positive integers, and n and m may be equal or unequal. The n rows of pixel points may be continuous or discontinuous, and the m columns of pixel points may be continuous or discontinuous. Optionally, the n rows of pixel points may or may not be adjacent to the current block, and the m columns of pixel points may or may not be adjacent to the current block.
S604, inputting the chromaticity characteristic information of the current block and chromaticity values of reconstructed pixel points around the current block into a decoding network to obtain a chromaticity prediction block of the current block.
In some embodiments, the self-encoders corresponding to blocks of different shapes may be different, and the self-encoders corresponding to the chrominance component and the luminance component may also be different; therefore, the decoding end may select the self-encoder corresponding to the chroma component of the current block from the plurality of self-encoders according to the size of the chroma component of the current block. Next, as shown in fig. 15, the chroma characteristic information of the current block and the chroma values of the reconstructed pixels around the current block are input into the decoding network of the self-encoder corresponding to the chroma component of the current block, to obtain the chroma prediction block of the current block output by the decoding network.
According to the above decoding method, the intra-frame prediction mode of the chroma component of the current block is determined by decoding the code stream; if the intra-frame prediction mode of the chroma component of the current block is the intra-frame prediction mode based on the self-encoder, the code stream is decoded to obtain the chroma characteristic information of the current block; the chroma values of the reconstructed pixel points around the current block are acquired; and the chroma characteristic information of the current block and the chroma values of the reconstructed pixel points around the current block are input into the decoding network of the self-encoder corresponding to the chroma component of the current block, to obtain the chroma prediction block of the current block output by the decoding network. That is, the embodiments of the present application add an intra-frame prediction mode based on the self-encoder for chroma intra prediction, enriching the intra-frame prediction modes of the chroma component. If the intra-frame prediction mode of the chroma component of the current block is determined to be the intra-frame prediction mode based on the self-encoder, the chroma prediction block of the current block is determined according to the chroma characteristic information of the current block and the chroma values of the reconstructed pixels around the current block. Even when the correlation between the original value of the chroma component of the current block and the reconstructed values around the current block is weak, the prediction considers not only the pixel values of the reconstructed pixels around the current block but also the chroma characteristic information of the current block, so that accurate prediction of the chroma component of the current block can be realized and the accuracy of intra-frame prediction of the chroma component is improved.
Fig. 16 is another flowchart of a video decoding method according to an embodiment of the present application, where the second flag is used to indicate whether the intra prediction mode based on the self-encoder is used for the luminance component and the chrominance component of the current block. As shown in fig. 16, includes:
S701, decoding the code stream, and determining the intra prediction modes of the luminance component and the chrominance component of the current block.
The implementation manner of determining the intra prediction modes of the luminance component and the chrominance component of the current block in S701 described above includes, but is not limited to, the following:
In a first aspect, if the code stream includes the first flag, S701 includes the following S701-A1 and S701-A2:
S701-A1, a decoding end decodes a code stream to obtain a first mark, wherein the first mark is used for indicating whether a current sequence allows using an intra-frame prediction mode based on a self-encoder;
S701-A2, determining an intra prediction mode of a luminance component and a chrominance component of the current block according to the first flag.
For example, if the value of the first flag is equal to the second value (e.g., 0), indicating that the current sequence does not allow the use of the intra prediction mode based on the self-encoder, then it may be determined that neither the intra prediction mode of the luma component nor the chroma component of the current block is the intra prediction mode based on the self-encoder.
For another example, if the value of the first flag is equal to the first value (e.g., 1), it indicates that the current sequence allows the intra prediction mode based on the self-encoder, and at this time, the implementation manner of determining the intra prediction modes of the luminance component and the chrominance component of the current block according to the first flag in S701-A2 includes, but is not limited to, the following:
in a first implementation, if the value of the first flag is equal to the first value and the luma component and the chroma component of the current block have intra-prediction modes based on the self-encoder, the intra-prediction modes of the luma component and the chroma component of the current block are determined to be intra-prediction modes based on the self-encoder.
In a second implementation manner, the code stream includes a first flag and a second flag. In this case, if the value of the first flag is equal to the first value, the decoding end decodes the code stream to obtain the second flag, where the second flag is used to indicate whether the luminance component and the chrominance component of the current block use the intra prediction mode based on the self-encoder, and determines the intra prediction modes of the luminance component and the chrominance component of the current block according to the second flag. For example, when the value of the second flag is a preset value (e.g., 1), the intra prediction modes of the luminance component and the chrominance component of the current block are both the intra prediction mode based on the self-encoder; when the value of the second flag is not equal to the preset value, the intra prediction modes of the luminance component and the chrominance component of the current block are both intra prediction modes other than the intra prediction mode based on the self-encoder.
In the first aspect, the decoding end may determine the intra prediction modes of the luminance component and the chrominance component of the current block according to the first flag decoded from the code stream, or determine the intra prediction modes of the luminance component and the chrominance component of the current block according to the first flag and the second flag.
In a second aspect, if the code stream includes the second flag but does not include the first flag, the S701 includes the following S701-B1 and S701-B2:
S701-B1, decoding the code stream to obtain a second mark;
S701-B2, determining whether the luminance component and the chrominance component of the current block use the self-encoder-based intra prediction mode according to the second flag.
In the second aspect, the code stream includes a second flag that directly indicates whether the luminance component and the chrominance component of the current block use the intra-frame prediction mode based on the self-encoder, and does not include the first flag. The decoding end decodes the code stream to obtain the second flag and determines directly according to the second flag whether the luminance component and the chrominance component of the current block use the intra-frame prediction mode based on the self-encoder. For example, when the value of the second flag is a preset value (e.g., 1), the intra prediction modes of the luminance component and the chrominance component of the current block are both the intra prediction mode based on the self-encoder; when the value of the second flag is not equal to the preset value, the intra prediction modes of the luminance component and the chrominance component of the current block are both intra prediction modes other than the intra prediction mode based on the self-encoder.
In the second aspect, the first flag is not written into the code stream; instead, the second flag is directly written into the code stream to indicate whether the luminance component and the chrominance component of the current block use the intra-frame prediction mode based on the self-encoder, thereby saving codewords and reducing the decoding burden of the decoding end.
The embodiment of the application does not limit the specific writing position of the first mark and the second mark in the code stream.
Optionally, the first flag is included in a sequence level parameter syntax element.
Optionally, the second flag is included in an encoding unit syntax element.
S702, if the luminance component and the chrominance component of the current block use the intra-frame prediction mode based on the self-encoder, the code stream is decoded, and the luminance characteristic information and the chrominance characteristic information of the current block are obtained.
In this embodiment of the present application, specific writing positions of luminance feature information and chrominance feature information of a current block in a code stream are not limited, and may be carried at any position in the code stream, for example.
In one example, the luminance characteristic information and the chrominance characteristic information of the current block are carried in the syntax elements corresponding to the luminance component and the chrominance component of the current block. In this case, the decoding end decodes the syntax elements corresponding to the luminance component and the chrominance component of the current block and obtains the luminance characteristic information and the chrominance characteristic information of the current block from those syntax elements.
In another example, luminance characteristic information and chrominance characteristic information of the current block are carried in the coding unit syntax element.
Optionally, a second flag intra_ae_flag is also included in the coding unit syntax element.
Specifically, the decoding end decodes the coding unit syntax (Coding unit syntax) in the bitstream to obtain the coding unit syntax elements, and reads the second flag intra_ae_flag therefrom, which is a unit-level control flag indicating whether the luminance component and the chrominance component of the current block use the intra prediction mode based on the self-encoder. If intra_ae_flag is 1, indicating that the luminance component and the chrominance component of the current block use the intra prediction mode based on the self-encoder, the luminance feature information and the chroma feature information of the current block are further read.
S703, obtaining pixel values of reconstructed pixel points around the current block.
For example, the reconstructed pixel points around the current block include n rows of pixel points above the current block and/or m columns of pixel points to the left of the current block, where n and m are positive integers, and n and m may be equal or unequal. The n rows of pixel points may be continuous or discontinuous, and the m columns of pixel points may be continuous or discontinuous. Optionally, the n rows of pixel points may or may not be adjacent to the current block, and the m columns of pixel points may or may not be adjacent to the current block.
The pixel values of the reconstructed pixel points around the current block include a chrominance value and a luminance value.
S704, inputting the luminance characteristic information and the chrominance characteristic information of the current block and the pixel values of the reconstructed pixel points around the current block into the decoding network to obtain a luminance prediction block and a chrominance prediction block of the current block.
In some embodiments, the self-encoders corresponding to blocks of different shapes may be different, and the self-encoders corresponding to the luminance component and the chrominance component may also be different; therefore, the decoding end may select the self-encoder corresponding to the luminance component and the self-encoder corresponding to the chrominance component of the current block from the plurality of self-encoders according to the sizes of the luminance component and the chrominance component of the current block. Next, as shown in fig. 17, the chroma characteristic information of the current block and the chroma values of the reconstructed pixels around the current block are input into the decoding network of the self-encoder corresponding to the chroma component of the current block, to obtain the chroma prediction block of the current block output by the decoding network; and the luminance characteristic information of the current block and the luminance values of the reconstructed pixel points around the current block are input into the decoding network of the self-encoder corresponding to the luminance component of the current block, to obtain the luminance prediction block of the current block output by the decoding network.
Optionally, the chroma component and the luma component of the current block correspond to the same self-encoder; in this case, the chroma characteristic information and the luma characteristic information of the current block, together with the pixel values of the reconstructed pixel points around the current block, may be input into the decoding network of the self-encoder corresponding to the current block, to obtain the chroma prediction block and the luma prediction block of the current block output by the decoding network.
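For illustration only, S704 with separate per-component networks may be sketched as follows; `dec_y` and `dec_c` are assumed callables standing in for the two decoding networks (a shared self-encoder would instead take both feature sets in a single call):

```python
def predict_luma_and_chroma(side_info_y, side_info_c,
                            neighbors_y, neighbors_c, dec_y, dec_c):
    """Run the luminance and chrominance decoding networks separately;
    a shared self-encoder would take both feature sets and both
    neighbor sets in one call and return both prediction blocks."""
    pred_y = dec_y(side_info_y, neighbors_y)   # luminance prediction block
    pred_c = dec_c(side_info_c, neighbors_c)   # chrominance prediction block
    return pred_y, pred_c
```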
According to the above decoding method, the intra-frame prediction modes of the luminance component and the chrominance component of the current block are determined by decoding the code stream; if the intra-frame prediction modes of the luminance component and the chrominance component of the current block are both the intra-frame prediction mode based on the self-encoder, the code stream is decoded to obtain the luminance characteristic information and the chrominance characteristic information of the current block; and the chrominance characteristic information, the luminance characteristic information and the pixel values of the reconstructed pixel points around the current block are input into the decoding network of the self-encoder, to obtain the luminance prediction block and the chrominance prediction block of the current block output by the decoding network. That is, the embodiments of the present application can indicate and predict the luminance component and the chrominance component of the current block at the same time, which can improve the prediction efficiency of the current block.
The decoding method according to the embodiment of the present application is described above, and on this basis, the encoding method provided by the embodiment of the present application is described below.
Fig. 18 is a schematic flow chart of a video encoding method according to an embodiment of the present application, which is applied to the video encoder shown in fig. 1 and fig. 2. As shown in fig. 18, the method of the embodiment of the present application includes:
S801, determining an intra-frame prediction mode of a current block from preset N first intra-frame prediction modes, wherein N is a positive integer, and the N first intra-frame prediction modes comprise intra-frame prediction modes based on a self-encoder.
In the video coding process, a video encoder receives a video stream composed of a series of image frames, performs video coding for each frame image in the video stream, and performs block division on the image frames to obtain a current block.
In some embodiments, the current block is also referred to as a current encoded block, a current image block, an encoded block, a current encoding unit, a current block to be encoded, an image block to be encoded, and the like.
In block division, a block divided by the conventional method contains both the chrominance component and the luminance component at the current block position. A split-tree (dual tree) technique, by contrast, may divide separate component blocks, such as a separate luminance block, which can be understood to contain only the luminance component at the current block position, and a separate chrominance block, which can be understood to contain only the chrominance component at the current block position. Thus, the luminance component and the chrominance component at the same position may belong to different blocks, and the division can have greater flexibility. If a split tree is used in CU partitioning, some CUs contain both luminance and chrominance components, some CUs contain only the luminance component, and some CUs contain only the chrominance component.
In some embodiments, the current block of the embodiments of the present application includes only a chrominance component, which may be understood as a chrominance block.
In some embodiments, the current block of the embodiments of the present application includes only a luminance component, which may be understood as a luminance block.
In some embodiments, the current block includes both a luma component and a chroma component.
When the video encoder performs Intra prediction on the current block, at least one Intra prediction mode of the N first Intra prediction modes is tried, for example, intra prediction mode based on the self-encoder, DM mode, DC mode (intra_chroma_dc), horizontal mode (intra_chroma_horizontal), vertical mode (intra_chroma_vertical), bilinear (Bilinear) mode, PCM mode, and cross-component prediction mode (TSCPM, PMC, CCLM in VVC), etc.
In this embodiment, the video encoder determines the intra prediction mode of the current block from the N first intra prediction modes, including but not limited to the following:
In a first mode, the video encoder determines the intra-prediction mode of the current block from the N first intra-prediction modes according to the characteristics of the current block. For example, if the pixel values of the current block have little correlation with the pixel values of the surrounding reconstructed pixels, the intra-prediction mode based on the self-encoder may be determined as the intra-prediction mode of the current block.
In a second mode, the video encoder determines an intra prediction mode of the current block from among N first intra prediction modes in the following manner S8011.
S8011, determining the intra-frame prediction mode of the current block from N first intra-frame prediction modes according to the rate distortion cost.
For example, a rate distortion cost corresponding to each of the N first intra-prediction modes is calculated, and the first intra-prediction mode with the smallest rate distortion cost is determined as the intra-prediction mode of the current block.
In this embodiment, the rate-distortion cost corresponding to each first intra-frame prediction mode may be calculated using an existing rate-distortion cost calculation method. Optionally, to reduce the amount of calculation, the intra prediction mode of the current block may be determined from the N first intra prediction modes by coarse screening, or by coarse screening plus fine screening; the implementation manners include, but are not limited to, the following Example 1 and Example 2:
example 1, the step S8011 includes the steps of S8011-A1 to S8011-A3 as follows:
S8011-A1, determining a predicted value corresponding to a first intra-frame prediction mode when the current block is encoded by using the first intra-frame prediction mode;
S8011-A2, determining a first rate distortion cost of the first intra-frame prediction mode according to the distortion between the predicted value and the original value of the current block and the number of bits consumed when the flag bit of the first intra-frame prediction mode is encoded;
S8011-A3, determining the intra-frame prediction mode of the current block from N first intra-frame prediction modes according to the first rate distortion cost.
Specifically, for each of the N first intra-frame prediction modes, the current block is predicted using the first intra-frame prediction mode to obtain a predicted value of the current block, which is the predicted value corresponding to the first intra-frame prediction mode. Then, the predicted value corresponding to the first intra prediction mode is compared with the original value of the current block, the distortion D1 between the predicted value corresponding to the first intra prediction mode and the original value of the current block is calculated, and the number of bits R1 consumed in encoding the flag bit of the first intra prediction mode is counted. A first rate-distortion cost J1 corresponding to the first intra-prediction mode is determined according to the distortion D1 and the number of bits R1, for example, J1 = D1 + R1. In the above manner, the first rate-distortion cost J1 corresponding to each of the N first intra prediction modes can be determined. Finally, the intra-frame prediction mode of the current block is determined from the N first intra-frame prediction modes according to the first rate-distortion cost J1 corresponding to each first intra-frame prediction mode.
In the embodiments of the present application, the first rate-distortion cost is determined from the distortion between the predicted value and the original value and the number of bits consumed in encoding the flag bit. Compared with determining the cost from the distortion between the reconstructed value and the original value and the total number of coded bits for the whole intra-frame prediction mode, this avoids calculating the reconstructed value and counting the bits of the whole encoding process, which greatly reduces the amount of calculation and improves the calculation speed of the first rate-distortion cost. When the intra-frame prediction mode of the current block is selected based on the first rate-distortion cost, the selection speed of the intra-frame prediction mode can be effectively improved.
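For illustration only, the coarse cost of S8011-A1 and S8011-A2 may be sketched as follows; SAD is assumed here as the distortion measure, which this application does not mandate:

```python
import numpy as np

def coarse_rd_cost(pred: np.ndarray, orig: np.ndarray,
                   flag_bits: int) -> float:
    """First rate-distortion cost J1 = D1 + R1: D1 is the distortion
    between the predicted value and the original value (SAD here), R1
    the bits consumed by the mode's flag bit; no reconstruction and no
    full entropy coding is performed."""
    d1 = float(np.abs(pred.astype(np.int64) - orig.astype(np.int64)).sum())
    return d1 + flag_bits   # J1 = D1 + R1
```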
The implementation manner of determining the intra prediction mode of the current block from the N first intra prediction modes according to the first rate-distortion cost in S8011-A3 includes, but is not limited to, the following:
in one possible implementation 1, according to the first rate-distortion cost, the intra-prediction mode of the current block is determined by the first intra-prediction mode with the smallest first rate-distortion cost among the N first intra-prediction modes. The determining method has the advantages of simple process, small calculated amount and high determining speed.
In one possible implementation 2, the video encoder determines the intra prediction mode of the current block by means of the following steps S8011-A31 to S8011-A34:
S8011-A31, selecting M second intra-frame prediction modes from the N first intra-frame prediction modes according to the first rate-distortion cost, where M is a positive integer smaller than N. This process can be understood as coarse screening, i.e., according to the first rate-distortion cost, M second intra-frame prediction modes are roughly selected from the N first intra-frame prediction modes as candidates for fine screening.
S8011-A32, determining a reconstruction value corresponding to the second intra-frame prediction mode when the current block is encoded by using the second intra-frame prediction mode.
S8011-A33, determining a second rate distortion cost of a second intra-frame prediction mode according to the distortion between the reconstruction value and the original value of the current block and the number of bits consumed when the current block is encoded by using the second intra-frame prediction mode;
and S8011-A34, determining a second intra-frame prediction mode with the minimum second rate distortion cost from the M second intra-frame prediction modes as the intra-frame prediction mode of the current block.
Specifically, according to the first rate-distortion cost, M second intra-frame prediction modes are roughly selected from the N first intra-frame prediction modes, and the intra-frame prediction mode of the current block is then finely selected from the M second intra-frame prediction modes. For each of the M second intra-frame prediction modes, the predicted value corresponding to the second intra-frame prediction mode is added to the residual value to obtain a reconstructed value of the current block, which is recorded as the reconstructed value corresponding to the second intra-frame prediction mode. The distortion D2 between the reconstructed value and the original value of the current block is calculated, the number of bits R2 consumed when the current block is encoded using the second intra-frame prediction mode is counted, and a second rate-distortion cost J2 corresponding to the second intra-frame prediction mode is determined according to the distortion D2 and the bit number R2, for example, J2 = D2 + R2. In the above manner, the second rate-distortion cost J2 corresponding to each of the M second intra-frame prediction modes can be determined. Finally, the second intra-frame prediction mode with the smallest second rate-distortion cost J2 among the M second intra-frame prediction modes is determined as the intra-frame prediction mode of the current block.
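A hedged sketch of this two-stage search follows; encode() is a hypothetical helper that actually codes the block with a given mode and returns the reconstruction together with the total bit count, and SSE is again assumed as the distortion.

```python
import numpy as np

def select_mode_two_stage(orig, modes, predict, flag_bits, encode, m):
    """Coarse screening by J1, then fine screening by J2 = D2 + R2
    (steps S8011-A31 to S8011-A34)."""
    orig = orig.astype(np.float64)
    # S8011-A31: keep the M modes with the smallest first rate-distortion cost.
    j1 = {mode: float(np.sum((orig - predict(mode)) ** 2)) + flag_bits(mode)
          for mode in modes}
    candidates = sorted(modes, key=lambda mode: j1[mode])[:m]

    # S8011-A32 to S8011-A34: fine screening over the M candidates.
    best_mode, best_j2 = None, float("inf")
    for mode in candidates:
        recon, r2 = encode(mode)      # reconstructed value and full bit count
        d2 = float(np.sum((orig - recon) ** 2))
        j2 = d2 + r2                  # J2 = D2 + R2
        if j2 < best_j2:
            best_mode, best_j2 = mode, j2
    return best_mode
```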
In the first example above, the N first intra-frame prediction modes are coarse-screened and fine-screened together to determine the intra-frame prediction mode of the current block.
In some embodiments, S8011 may also be implemented according to the method of example two below.
In example two, the first intra-frame prediction modes other than the intra-frame prediction mode based on the self-encoder are first coarse-screened, and the intra-frame prediction mode based on the self-encoder is fine-screened together with the coarse-screened first intra-frame prediction modes, so as to increase the use probability of the intra-frame prediction mode based on the self-encoder. That is, the above S8011 includes the following steps S8011-B1 to S8011-B4:
S8011-B1, determining a first rate distortion cost of a third intra-frame prediction mode according to the distortion between a predicted value corresponding to the third intra-frame prediction mode and an original value of the current block and the number of bits consumed when a flag bit of the third intra-frame prediction mode is encoded, wherein the third intra-frame prediction mode is a first intra-frame prediction mode except for an intra-frame prediction mode based on a self-encoder in N first intra-frame prediction modes;
S8011-B2, selecting Q third intra-frame prediction modes from the N-1 third intra-frame prediction modes according to the first rate-distortion cost, where Q is a positive integer smaller than N-1.
Specifically, for convenience of description, a first intra-frame prediction mode other than the intra-frame prediction mode based on the self-encoder among the N first intra-frame prediction modes is denoted as a third intra-frame prediction mode, so there are N-1 third intra-frame prediction modes. The first rate-distortion cost corresponding to each of the N-1 third intra-frame prediction modes is calculated according to the same method for calculating the first rate-distortion cost described above. Then, according to the first rate-distortion cost, the Q third intra-frame prediction modes with the smallest first rate-distortion cost are selected from the N-1 third intra-frame prediction modes.
S8011-B3, according to a preset rounding range of the first characteristic information, determining P predicted values corresponding to the intra-frame prediction mode based on the self-encoder, and selecting R predicted values from the P predicted values, where P and R are positive integers and R is smaller than or equal to P.
In order to reduce the amount of calculation, the forward pass of the encoding network and the rounding process are skipped for the intra-frame prediction mode based on the self-encoder. Instead, the P possible values of the rounded first characteristic information are enumerated according to the preset rounding range of the first characteristic information, and the P predicted values corresponding to the intra-frame prediction mode based on the self-encoder are determined according to these P possible values of the first characteristic information.
Alternatively, the preset rounding range of the first characteristic information may be {0, 1}, or {-1, 0, 1}, or the like.
In one example, determining the P predicted values corresponding to the intra-frame prediction mode based on the self-encoder according to the preset rounding range of the first characteristic information may be achieved as follows: predicting the P values of the first characteristic information output by the encoding network according to the preset rounding range of the first characteristic information; inputting the characteristic information under the P values and the pixel values of the reconstructed pixel points around the current block into the decoding network to obtain the predicted values under the P values output by the decoding network; and determining the predicted values under the P values as the P predicted values corresponding to the intra-frame prediction mode based on the self-encoder.
For example, assuming that the preset rounding range of the first characteristic information is {0, 1} and the first characteristic information is a 1x2 characteristic vector, the P possible values of the first characteristic information are {0,0}, {0,1}, {1,0} and {1,1}, i.e., P = 2^2 = 4. More generally, if the first characteristic information is a 1xn characteristic vector, P = 2^n, where n is a positive integer greater than or equal to 1. For each of the 4 values, the first characteristic information under that value and the pixel values of the reconstructed pixel points around the current block are input into the decoding network to obtain the predicted value under that value output by the decoding network, thereby obtaining the predicted values under the P values.
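A minimal sketch of this enumeration, assuming the decoding network is callable as decode_net(features, neighbors); for the range {0, 1} this yields P = 2^n candidates, and 3^n for {-1, 0, 1}.

```python
import itertools
import numpy as np

def enumerate_ae_predictions(decode_net, neighbors, rounding_range, n):
    """Enumerate all P possible rounded feature vectors and decode each one
    into a candidate predicted value (S8011-B3, first part)."""
    predictions = []
    for values in itertools.product(rounding_range, repeat=n):
        feat = np.asarray(values, dtype=np.float32)   # one possible value
        pred = decode_net(feat, neighbors)            # decoder pass only; the
        predictions.append((feat, pred))              # encoding network is skipped
    return predictions  # P pairs of (feature vector, predicted value)
```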
In one example, R = P, and the predicted values under all the determined P values are added to the fine screening process to determine the intra-frame prediction mode of the current block.
In one example, R is smaller than P, and selecting the R predicted values from the P predicted values in S8011-B3 includes the following ways:
In the first way, R predicted values are randomly selected from the P predicted values.
In the second way, the R predicted values closest to the original value of the current block are selected from the P predicted values.
In the third way, a fourth rate-distortion cost corresponding to each of the P predicted values is determined according to the distortion between the P predicted values and the original value of the current block, and the R predicted values with the smallest fourth rate-distortion cost are selected from the P predicted values. Optionally, the fourth rate-distortion cost corresponding to each of the P predicted values may be equal to the distortion D1 between the predicted value and the original value of the current block. Optionally, the fourth rate-distortion cost corresponding to the predicted value is equal to the sum of the distortion D1 between the predicted value and the original value of the current block and the number of bits R1 consumed in encoding the flag bit of the intra-frame prediction mode based on the self-encoder. That is, after the R predicted values are coarsely screened from the P predicted values in this way (see the sketch below), the following S8011-B4 is performed.
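A sketch of the third way, under the same SSE assumption; flag_bits_ae is the optional bit cost of the self-encoder mode flag (0 reduces the cost to the distortion alone).

```python
import numpy as np

def select_r_predictions(orig, predictions, r, flag_bits_ae=0):
    """Keep the R candidates with the smallest fourth rate-distortion cost
    (S8011-B3, second part)."""
    orig = orig.astype(np.float64)
    def j4(item):
        _feat, pred = item
        d1 = float(np.sum((orig - pred) ** 2))  # distortion against original
        return d1 + flag_bits_ae                # optionally plus flag bits R1
    return sorted(predictions, key=j4)[:r]
```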
S8011-B4, determining the intra-frame prediction mode of the current block from the N first intra-frame prediction modes according to the Q predicted values corresponding to the Q third intra-frame prediction modes and the R predicted values corresponding to the intra-frame prediction mode based on the self-encoder.
Implementations of S8011-B4 described above include, but are not limited to, the following:
In the first way, the Q predicted values corresponding to the Q third intra-frame prediction modes and the R predicted values corresponding to the intra-frame prediction mode based on the self-encoder are each compared with the original value of the current block, and the intra-frame prediction mode corresponding to the predicted value closest to the original value is determined as the intra-frame prediction mode of the current block.
In the second way, the intra-frame prediction mode of the current block is selected by fine screening, that is, the above S8011-B4 includes the following steps S8011-B41 to S8011-B43:
S8011-B41, Q reconstruction values corresponding to the Q predicted values and R reconstruction values corresponding to the R predicted values are determined.
Specifically, a residual value corresponding to each of the Q predicted values is determined, and the residual value and the predicted value are added to obtain the reconstructed value corresponding to that predicted value, thereby obtaining Q reconstructed values. Similarly, a residual value corresponding to each of the R predicted values is determined, and the residual value and the predicted value are added to obtain the reconstructed value corresponding to that predicted value, thereby obtaining R reconstructed values.
S8011-B42, determining a third rate distortion cost according to the distortion between the Q+R reconstruction values and the original value of the current block respectively and the number of bits consumed when the current block is encoded by using the first intra prediction mode corresponding to the Q+R reconstruction values.
According to the above steps, Q+R reconstructed values can be obtained. For each of the Q+R reconstructed values, the distortion D3 between the reconstructed value and the original value of the current block is calculated, the number of bits R3 consumed when the current block is encoded using the first intra-frame prediction mode corresponding to the reconstructed value is counted, and the sum of D3 and R3 is determined as the third rate-distortion cost corresponding to the reconstructed value.
S8011-B43, determining the first intra-frame prediction mode with the minimum third rate distortion cost from the N first intra-frame prediction modes as the intra-frame prediction mode of the current block.
In this implementation, adding the R predicted values of the intra-frame prediction mode based on the self-encoder to the fine screening increases the probability that the intra-frame prediction mode based on the self-encoder is selected.
According to the above steps, after determining the intra prediction mode of the current block, the following step S802 is performed.
S802, if the intra-frame prediction mode of the current block is based on the intra-frame prediction mode of the self-encoder, acquiring the self-encoder corresponding to the current block, wherein the self-encoder comprises an encoding network and a decoding network.
As can be seen from the above, the self-encoders corresponding to the blocks of different shapes and sizes may be different, and thus, the video encoder selects the self-encoder corresponding to the current block from among the different self-encoders according to the size of the current block.
S803, the original value of the current block is input into the coding network, and the first characteristic information of the current block output by the coding network is obtained.
S804, inputting the first characteristic information of the current block and the pixel values of the reconstructed pixel points around the current block into a decoding network to obtain a predicted block of the current block output by the decoding network.
Specifically, the original value (i.e., the original pixel value) of the current block is input into the encoding network of the self-encoder to obtain the first characteristic information of the current block output by the encoding network. Then, the first characteristic information and the pixel values of the reconstructed pixel points around the current block are input into the decoding network to obtain the predicted block of the current block output by the decoding network.
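As a concrete illustration of S802 to S804, the PyTorch sketch below shows one possible shape of such a self-encoder; the fully connected layers, their sizes, and the concatenation of the neighboring reconstructed pixels with the features are illustrative assumptions, not a structure mandated by this application.

```python
import torch
import torch.nn as nn

class IntraAutoencoder(nn.Module):
    """Sketch of the self-encoder: block pixels -> features -> prediction.
    Layer sizes and the use of the neighbor pixels are assumptions."""
    def __init__(self, block_px, neigh_px, feat_dim):
        super().__init__()
        self.encode_net = nn.Sequential(
            nn.Linear(block_px, 64), nn.ReLU(),
            nn.Linear(64, feat_dim), nn.Sigmoid())   # features in [0, 1]
        self.decode_net = nn.Sequential(
            nn.Linear(feat_dim + neigh_px, 64), nn.ReLU(),
            nn.Linear(64, block_px))

    def forward(self, orig_block, neighbors):
        feat = self.encode_net(orig_block)           # S803: first characteristic info
        x = torch.cat([feat, neighbors], dim=-1)     # S804: features + neighbor pixels
        return self.decode_net(x), feat              # predicted block, features
```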
In some embodiments, the current block includes a luminance component and/or a chrominance component, and at this time, S803 includes the following ways:
In one mode, if it is determined that the luminance component of the current block uses the intra-frame prediction mode based on the self-encoder, the original luminance value of the current block is input into the encoding network to obtain the first luminance characteristic information of the current block.
In a second mode, if it is determined that the chroma component of the current block uses the intra-frame prediction mode based on the self-encoder, the original chroma value of the current block is input into the encoding network, and the first chroma characteristic information of the current block is obtained.
In a third mode, if it is determined that both the luminance component and the chrominance component of the current block use the intra-frame prediction mode based on the self-encoder, the original luminance value and the original chrominance value of the current block are input into the encoding network to obtain the first luminance characteristic information and the first chrominance characteristic information of the current block. Optionally, the original luminance value and the original chrominance value of the current block may be input into the encoding network simultaneously, or may be input into the encoding network separately.
On this basis, the above S804 includes the following modes:
In one mode, if the first characteristic information of the current block includes the first luminance characteristic information, the first luminance characteristic information and the luminance values of the reconstructed pixel points around the current block are input into a decoding network to obtain a luminance prediction block of the current block.
In the second mode, if the first characteristic information of the current block includes the first chrominance characteristic information, the first chrominance characteristic information and the chrominance values of the reconstructed pixel points around the current block are input into a decoding network to obtain a chrominance prediction block of the current block.
In a third mode, if the first characteristic information of the current block includes the first luminance characteristic information and the first chrominance characteristic information, the first luminance characteristic information, the first chrominance characteristic information, and the pixel values of the reconstructed pixel points around the current block are input into a decoding network to obtain a luminance prediction block and a chrominance prediction block of the current block.
In some embodiments, as can be seen from the above, since the characteristic information output by the encoding network needs to be written into the code stream, it needs to be rounded. On this basis, the method includes the following S803-A1 and S803-A2:
S803-A1, rounding the first characteristic information of the current block to obtain second characteristic information of the current block;
S803-A2, inputting the second characteristic information and the pixel values of the reconstructed pixel points around the current block into a decoding network to obtain a predicted block of the current block output by the decoding network.
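Reusing the IntraAutoencoder sketch above, S803-A1 and S803-A2 amount to a single rounding step between the two networks; torch.round is an illustrative stand-in for whatever rounding rule the codec actually specifies.

```python
import torch

def predict_with_rounding(model, orig_block, neighbors):
    """S803-A1: round the first characteristic information to obtain the
    second characteristic information; S803-A2: decode the rounded features
    together with the neighboring reconstructed pixels."""
    feat = model.encode_net(orig_block)            # first characteristic info
    feat_rounded = torch.round(feat)               # second characteristic info
    x = torch.cat([feat_rounded, neighbors], dim=-1)
    pred_block = model.decode_net(x)
    return pred_block, feat_rounded                # features go into the stream
```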
In some embodiments, if the current block includes a luminance component and/or a chrominance component, the above S803-A1 includes the following modes:
In one mode, if the first characteristic information of the current block includes the first luminance characteristic information, the first luminance characteristic information is rounded to obtain the second luminance characteristic information of the current block.
In the second mode, if the first characteristic information of the current block includes the first chrominance characteristic information, the first chrominance characteristic information is rounded to obtain the second chrominance characteristic information of the current block.
In a third mode, if the first characteristic information of the current block includes the first luminance characteristic information and the first chrominance characteristic information, the first luminance characteristic information and the first chrominance characteristic information are respectively rounded to obtain the second luminance characteristic information and the second chrominance characteristic information of the current block.
Correspondingly, the above S803-A2 includes the following modes:
In one mode, if the second characteristic information of the current block includes the second luminance characteristic information, the second luminance characteristic information and the luminance values of the reconstructed pixel points around the current block are input into a decoding network to obtain a luminance prediction block of the current block.
In the second mode, if the second characteristic information of the current block includes the second chrominance characteristic information, the second chrominance characteristic information and the chrominance values of the reconstructed pixel points around the current block are input into a decoding network to obtain a chrominance prediction block of the current block.
In a third mode, if the second characteristic information of the current block includes the second luminance characteristic information and the second chrominance characteristic information, the second luminance characteristic information, the second chrominance characteristic information, and the pixel values of the reconstructed pixel points around the current block are input into a decoding network to obtain a luminance prediction block and a chrominance prediction block of the current block.
In some embodiments, the video encoding apparatus writes second characteristic information of the current block to the bitstream.
Optionally, if the second characteristic information of the current block includes second luminance characteristic information, writing the second luminance characteristic information into the code stream;
optionally, if the second characteristic information of the current block includes second chroma characteristic information, writing the second chroma characteristic information into the code stream;
optionally, if the second characteristic information of the current block includes second luminance characteristic information and second chrominance characteristic information, the second luminance characteristic information and the second chrominance characteristic information are written into the code stream.
In some embodiments, the video encoding device writes a first flag in the bitstream that indicates whether the current sequence allows use of the self-encoder based intra prediction mode.
In some embodiments, the video encoding apparatus further writes a second flag in the bitstream if the value of the first flag is a first value, the second flag indicating whether the current block uses the self-encoder based intra-prediction mode, the first value indicating that the current sequence allows use of the self-encoder based intra-prediction mode.
In some embodiments, the video encoding device writes the second flag directly in the bitstream without writing the first flag.
Optionally, the first flag is included in a sequence-level parameter syntax element.
Optionally, the second flag is included in the coding unit syntax element.
Optionally, if the current block includes chrominance information and luminance information, the second flag is used to indicate whether the luminance component and/or the chrominance component of the current block uses the intra-prediction mode based on the self-encoder.
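As a hedged illustration of this signalling, the sketch below writes the two flags in the order just described; writer and its write_bit() method are hypothetical stand-ins for an entropy coder, not a real codec API.

```python
def write_ae_flags(writer, seq_allows_ae, block_uses_ae):
    """Write the sequence-level first flag and, when it takes the first
    value (use allowed), the CU-level second flag."""
    writer.write_bit(1 if seq_allows_ae else 0)       # first flag
    if seq_allows_ae:                                 # first value: allowed
        writer.write_bit(1 if block_uses_ae else 0)   # second flag
```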
According to the above encoding method, the intra-frame prediction mode of the current block is determined from N preset first intra-frame prediction modes, where N is a positive integer and the N first intra-frame prediction modes include an intra-frame prediction mode based on a self-encoder; if the intra-frame prediction mode of the current block is the intra-frame prediction mode based on the self-encoder, the self-encoder corresponding to the current block is acquired, the self-encoder including an encoding network and a decoding network; the original value of the current block is input into the encoding network to obtain the first characteristic information of the current block output by the encoding network; and the first characteristic information of the current block and the pixel values of the reconstructed pixel points around the current block are input into the decoding network to obtain the predicted block of the current block output by the decoding network. That is, the present application adds the intra-frame prediction mode based on the self-encoder, enriching the available intra-frame prediction modes. If the intra-frame prediction mode of the current block is determined to be the intra-frame prediction mode based on the self-encoder, the predicted block of the current block is determined according to the characteristic information of the current block and the pixel values of the reconstructed pixel points around the current block. In the case where the correlation between the original value of the current block and the reconstructed values around the current block is weak, the prediction takes into account not only the pixel values of the reconstructed pixel points around the current block but also the characteristic information of the current block, so that accurate prediction of the current block can be realized and the accuracy of intra-frame prediction is improved.
It should be understood that fig. 10-18 are only examples of the present application and should not be construed as limiting the present application.
The preferred embodiments of the present application have been described in detail above with reference to the accompanying drawings, but the present application is not limited to the specific details of the above embodiments, and various simple modifications may be made to the technical solutions of the present application within the scope of the technical concept of the present application, and all the simple modifications belong to the protection scope of the present application. For example, the specific features described in the above embodiments may be combined in any suitable manner, and in order to avoid unnecessary repetition, various possible combinations are not described in detail. As another example, any combination of the various embodiments of the present application may be made without departing from the spirit of the present application, which should also be considered as disclosed herein.
It should be further understood that, in the various method embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic of the processes, and should not constitute any limitation on the implementation process of the embodiments of the present application. In addition, in the embodiment of the present application, the term "and/or" is merely an association relationship describing the association object, which means that three relationships may exist. Specifically, a and/or B may represent: a exists alone, A and B exist together, and B exists alone. In addition, the character "/" herein generally indicates that the front and rear associated objects are an "or" relationship.
Method embodiments of the present application are described in detail above in connection with fig. 10-18, and apparatus embodiments of the present application are described in detail below in connection with fig. 19-21.
Fig. 19 is a schematic block diagram of a video decoder provided in an embodiment of the present application.
As shown in fig. 19, the video decoder 10 includes:
a mode determining unit 11 for decoding the code stream and determining an intra prediction mode of the current block;
a feature determining unit 12, configured to decode the code stream if the intra-prediction mode of the current block is an intra-prediction mode based on a self-encoder, and obtain feature information of the current block;
an obtaining unit 13, configured to obtain pixel values of reconstructed pixel points around the current block;
and the prediction unit 14 is configured to input the characteristic information of the current block and pixel values of reconstructed pixel points around the current block into a decoding network of a self-encoder corresponding to the current block, so as to obtain a predicted block of the current block output by the decoding network.
In some embodiments, the mode determining unit 11 is specifically configured to decode the code stream to obtain a first flag, where the first flag is used to indicate whether the current sequence allows use of an intra prediction mode based on a self-encoder; and determining an intra-frame prediction mode of the current block according to the first mark.
In some embodiments, the mode determining unit 11 is specifically configured to decode the code stream to obtain a second flag if the value of the first flag is a first value, where the second flag is used to indicate whether the current block uses the intra-prediction mode based on the self-encoder, and the first value is used to indicate that the current sequence allows using the intra-prediction mode based on the self-encoder; and determining an intra-frame prediction mode of the current block according to the second mark.
In some embodiments, the mode determining unit 11 is specifically configured to decode the code stream to obtain a second flag, where the second flag is used to indicate whether the current block uses the intra prediction mode based on the self-encoder; and determining an intra-frame prediction mode of the current block according to the second mark.
Optionally, the first flag is included in a sequence level parameter syntax element.
Optionally, the second flag is included in an encoding unit syntax element.
Optionally, the second flag is used to indicate whether the luminance component and/or the chrominance component of the current block uses an intra prediction mode based on a self-encoder.
In some embodiments, the mode determining unit 11 is specifically configured to determine, according to the second flag, whether the luminance component and the chrominance component of the current block use the intra-prediction mode based on the self-encoder if the second flag is used to indicate whether the luminance component and the chrominance component of the current block use the intra-prediction mode based on the self-encoder;
If the second flag is used for indicating whether the luminance component of the current block uses the intra-frame prediction mode based on the self-encoder, determining whether the luminance component of the current block uses the intra-frame prediction mode based on the self-encoder according to the second flag;
if the second flag is used to indicate whether the chroma component of the current block uses the self-encoder based intra-prediction mode, determining whether the chroma component of the current block uses the self-encoder based intra-prediction mode based on the second flag.
In some embodiments, the feature determining unit 12 is specifically configured to decode the code stream to obtain luminance characteristic information and chrominance characteristic information of the current block if both the luminance component and the chrominance component of the current block use intra-frame prediction modes based on a self-encoder;
if the brightness component of the current block uses an intra-frame prediction mode based on a self-encoder, decoding the code stream to obtain brightness characteristic information of the current block;
and if the chroma component of the current block uses the intra-frame prediction mode based on the self-encoder, decoding the code stream to obtain the chroma characteristic information of the current block.
In some embodiments, the feature determining unit 12 is specifically configured to decode a syntax element corresponding to a luminance component of the current block, to obtain the syntax element corresponding to the luminance component of the current block; and obtaining the brightness characteristic information of the current block in the syntax element corresponding to the brightness component of the current block.
In some embodiments, the feature determining unit 12 is specifically configured to decode a syntax element corresponding to a chroma component of the current block, to obtain the syntax element corresponding to the chroma component of the current block; and obtaining the chromaticity characteristic information of the current block in the syntax element corresponding to the chromaticity component of the current block.
In some embodiments, the prediction unit 14 is specifically configured to, if the characteristic information of the current block includes luminance characteristic information and chrominance characteristic information of the current block, input the luminance characteristic information of the current block and luminance values of reconstructed pixels around the current block into the decoding network to obtain a luminance prediction block of the current block, and input the chrominance characteristic information of the current block and chrominance values of reconstructed pixels around the current block into the decoding network to obtain a chrominance prediction block of the current block;
if the characteristic information of the current block comprises the brightness characteristic information of the current block, inputting the brightness characteristic information of the current block and brightness values of reconstructed pixel points around the current block into the decoding network to obtain a brightness prediction block of the current block;
And if the characteristic information of the current block comprises the chromaticity characteristic information of the current block, inputting the chromaticity characteristic information of the current block and chromaticity values of reconstructed pixel points around the current block into the decoding network to obtain a chromaticity prediction block of the current block.
Optionally, the element value in the characteristic information of the current block is an integer.
Optionally, the characteristic information of the current block is obtained by rounding the characteristic information output by the last layer activation function of the coding network of the self-encoder.
Optionally, the value range of the element value in the characteristic information output by the last layer of activation function of the coding network is [ a, b ], and a and b are integers.
Optionally, a is 0, and b is 1.
Illustratively, the expression of the last-layer activation function of the encoding network is:
S(x) = 1/(1 + e^(-x))
where x is the input of the last-layer activation function, and S(x) is the characteristic information output by the last-layer activation function.
Optionally, a is -1, and b is 1.
Illustratively, the expression of the last-layer activation function of the encoding network is:
S(x) = 2/(1 + e^(-nx)) - 1
where x is the input of the last-layer activation function, S(x) is the characteristic information output by the last-layer activation function, and n is a positive integer.
Optionally, n is 10.
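A small Python sketch of these two last-layer activations follows; the closed forms used are assumptions consistent with the expressions and ranges above (a sigmoid for [0, 1] and a scaled sigmoid for [-1, 1]).

```python
import numpy as np

def s_unit(x):
    """Maps to (0, 1), consistent with a = 0, b = 1 (sigmoid form assumed)."""
    return 1.0 / (1.0 + np.exp(-x))

def s_signed(x, n=10):
    """Maps to (-1, 1), consistent with a = -1, b = 1; a larger n (e.g. the
    optional n = 10) pushes outputs toward the rounding targets -1 and 1
    (scaled-sigmoid form assumed)."""
    return 2.0 / (1.0 + np.exp(-n * x)) - 1.0
```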
In some embodiments, during training, the self-encoder adds noise to the original characteristic information output by the encoding network before inputting it into the decoding network.
In some embodiments, in forward propagation, the self-encoder rounds the original characteristic information output by the encoding network before inputting it into the decoding network, and in backward propagation, the derivative is taken with respect to the original characteristic information output by the encoding network so as to update the weight parameters in the encoding network.
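The two training variants just described, noise substitution and rounding with a straight-through backward pass, can be sketched as follows; the MSE loss and the optimizer interface are illustrative assumptions.

```python
import torch

def train_step(model, orig_block, neighbors, optimizer, use_noise=False):
    """One training step of the self-encoder sketch above."""
    feat = model.encode_net(orig_block)
    if use_noise:
        # Noise-adding variant: uniform noise in [-0.5, 0.5] stands in for
        # rounding and keeps the encoding network differentiable.
        feat_q = feat + torch.empty_like(feat).uniform_(-0.5, 0.5)
    else:
        # Straight-through variant: the forward pass uses round(feat); the
        # backward pass treats rounding as the identity, so gradients reach
        # the encoding network's weight parameters.
        feat_q = feat + (torch.round(feat) - feat).detach()
    pred = model.decode_net(torch.cat([feat_q, neighbors], dim=-1))
    loss = torch.mean((pred - orig_block) ** 2)   # distortion-only loss assumed
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```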
It should be understood that apparatus embodiments and method embodiments may correspond with each other and that similar descriptions may refer to the method embodiments. To avoid repetition, no further description is provided here. Specifically, the video decoder 10 shown in fig. 19 may perform the decoding method according to the embodiment of the present application, and the foregoing and other operations and/or functions of each unit in the video decoder 10 are respectively for implementing corresponding flows in each method such as the foregoing decoding method, which are not described herein for brevity.
Fig. 20 is a schematic block diagram of a video encoder provided in an embodiment of the present application.
As shown in fig. 20, the video encoder 20 may include:
A mode determining unit 21, configured to determine an intra prediction mode of a current block from preset N first intra prediction modes, where N is a positive integer, and the N first intra prediction modes include intra prediction modes based on a self-encoder;
an obtaining unit 22, configured to obtain a self-encoder corresponding to the current block if the intra-prediction mode of the current block is an intra-prediction mode based on the self-encoder, where the self-encoder includes an encoding network and a decoding network;
a feature determining unit 23, configured to input an original value of the current block into the coding network, to obtain first characteristic information of the current block output by the coding network;
and the prediction unit 24 is configured to input the first characteristic information of the current block and pixel values of reconstructed pixel points around the current block into the decoding network, so as to obtain a predicted block of the current block output by the decoding network.
In some embodiments, the prediction unit 24 is specifically configured to round the first characteristic information of the current block to obtain the second characteristic information of the current block; and input the second characteristic information and the pixel values of the reconstructed pixel points around the current block into the decoding network to obtain the predicted block of the current block output by the decoding network.
In some embodiments, the prediction unit 24 is further configured to write second characteristic information of the current block into a code stream.
In some embodiments, the prediction unit 24 is further configured to write a first flag in the code stream, where the first flag is used to indicate whether the current sequence allows use of the intra prediction mode based on the self-encoder.
In some embodiments, the prediction unit 24 is further configured to write a second flag in the code stream if the value of the first flag is a first value, where the second flag is used to indicate whether the current block uses a self-encoder based intra prediction mode, and the first value is used to indicate that the current sequence allows use of the self-encoder based intra prediction mode.
In some embodiments, the prediction unit 24 is further configured to write a second flag in the bitstream, where the second flag is used to indicate whether the current block uses a self-encoder based intra prediction mode.
Optionally, the first flag is included in a sequence level parameter syntax element.
Optionally, the second flag is included in an encoding unit syntax element.
Optionally, the second flag is used to indicate whether the luminance component and/or the chrominance component of the current block uses an intra prediction mode based on a self-encoder.
In some embodiments, the mode determining unit 21 is specifically configured to determine the intra-prediction mode of the current block from the N first intra-prediction modes according to a rate-distortion cost.
In some embodiments, the mode determining unit 21 is specifically configured to determine a prediction value corresponding to the first intra-prediction mode when the current block is encoded using the first intra-prediction mode; determining a first rate distortion cost of the first intra-prediction mode according to the distortion between the predicted value and the original value of the current block and the number of bits consumed in encoding the flag bit of the first intra-prediction mode; and determining the intra-frame prediction mode of the current block from the N first intra-frame prediction modes according to the first rate distortion cost.
In some embodiments, the mode determining unit 21 is specifically configured to select M second intra-prediction modes from the N first intra-prediction modes according to the first rate-distortion cost, where M is a positive integer less than N; determining a reconstruction value corresponding to the second intra-frame prediction mode when the current block is encoded by using the second intra-frame prediction mode; determining a second rate-distortion cost of the second intra-prediction mode according to the distortion between the reconstructed value and the original value of the current block and the number of bits consumed in encoding the current block using the second intra-prediction mode; and determining a second intra-frame prediction mode with the minimum second rate distortion cost from the M second intra-frame prediction modes as the intra-frame prediction mode of the current block.
In some embodiments, the mode determining unit 21 is specifically configured to determine a first rate-distortion cost of a third intra-prediction mode according to a distortion between a prediction value corresponding to the third intra-prediction mode and an original value of the current block, and a number of bits consumed when encoding a flag bit of the third intra-prediction mode, where the third intra-prediction mode is a first intra-prediction mode other than an intra-prediction mode based on a self-encoder among the N first intra-prediction modes; according to the first rate distortion cost, selecting Q third intra-frame prediction modes from N-1 third intra-frame prediction modes, wherein Q is a positive integer smaller than N-1; according to the preset rounding range of the first characteristic information, P predicted values corresponding to an intra-frame prediction mode based on a self-encoder are determined, R predicted values are selected from the P predicted values, P, R is a positive integer, and R is smaller than or equal to P; and determining the intra-frame prediction mode of the current block from the N first intra-frame prediction modes according to the Q prediction values corresponding to the Q third intra-frame prediction modes and the R prediction values corresponding to the intra-frame prediction modes based on the self-encoder.
In some embodiments, the mode determining unit 21 is specifically configured to determine Q reconstructed values corresponding to the Q predicted values and R reconstructed values corresponding to the R predicted values; determining a third rate-distortion cost according to the distortion between the Q+R reconstruction values and the original value of the current block and the number of bits consumed when the current block is encoded by using a first intra-frame prediction mode corresponding to the Q+R reconstruction values; and determining a first intra-frame prediction mode with the minimum third rate distortion cost from the N first intra-frame prediction modes as the intra-frame prediction mode of the current block.
In some embodiments, the mode determining unit 21 is specifically configured to predict P values of the first characteristic information output by the encoding network according to a preset rounding range of the first characteristic information; inputting the characteristic information under the P values and the pixel values of the reconstructed pixel points around the current block into the decoding network to obtain predicted values under the P values output by the decoding network; and determining the predicted values under the P values as P predicted values corresponding to the intra-frame prediction mode based on the self-encoder.
In some embodiments, if the R is smaller than the P, the mode determining unit 21 is specifically configured to determine a fourth rate-distortion cost corresponding to the P predicted values according to the distortion between the P predicted values and the original value of the current block; and selecting R predicted values with the fourth rate distortion cost being the smallest from the P predicted values.
In some embodiments, the feature determining unit 23 is specifically configured to, if it is determined that the luminance component of the current block uses the intra prediction mode based on the self-encoder, input the original luminance value of the current block into the encoding network, to obtain the first luminance characteristic information of the current block;
if the chroma component of the current block is determined to use an intra-frame prediction mode based on a self-encoder, inputting an original chroma value of the current block into the coding network to obtain first chroma characteristic information of the current block;
if it is determined that both the luminance component and the chrominance component of the current block use the intra-frame prediction mode based on the self-encoder, the original luminance value and the original chrominance value of the current block are input into the encoding network, and first luminance characteristic information and first chrominance characteristic information of the current block are obtained.
In some embodiments, the feature determining unit 23 is specifically configured to, if the first characteristic information of the current block includes the first luminance characteristic information, input the first luminance characteristic information and luminance values of reconstructed pixels around the current block into the decoding network to obtain a luminance prediction block of the current block;
If the first characteristic information of the current block comprises the first chrominance characteristic information, inputting the first chrominance characteristic information and chrominance values of reconstructed pixel points around the current block into the decoding network to obtain a chrominance prediction block of the current block;
and if the first characteristic information of the current block comprises the first brightness characteristic information and the first chrominance characteristic information, inputting the first brightness characteristic information, the first chrominance characteristic information and pixel values of reconstructed pixel points around the current block into the decoding network to obtain a brightness prediction block and a chrominance prediction block of the current block.
In some embodiments, the feature determining unit 23 is specifically configured to, if the first characteristic information of the current block includes the first luminance characteristic information, round the first luminance characteristic information to obtain the second luminance characteristic information of the current block;
if the first characteristic information of the current block includes the first chrominance characteristic information, round the first chrominance characteristic information to obtain the second chrominance characteristic information of the current block;
and if the first characteristic information of the current block includes the first luminance characteristic information and the first chrominance characteristic information, respectively round the first luminance characteristic information and the first chrominance characteristic information to obtain the second luminance characteristic information and the second chrominance characteristic information of the current block.
In some embodiments, the prediction unit 24 is specifically configured to write the second luminance characteristic information into the code stream if the second characteristic information of the current block includes the second luminance characteristic information;
if the second characteristic information of the current block comprises the second chromaticity characteristic information, writing the second chromaticity characteristic information into the code stream;
and if the second characteristic information of the current block comprises the second brightness characteristic information and the second chromaticity characteristic information, writing the second brightness characteristic information and the second chromaticity characteristic information into the code stream.
Optionally, the value range of the element value in the first characteristic information output by the last layer of activation function of the coding network is [ a, b ], and a and b are integers.
Optionally, a is 0, and b is 1.
Illustratively, the expression of the last-layer activation function of the encoding network is:
S(x) = 1/(1 + e^(-x))
where x is the input information of the last-layer activation function, and S(x) is the first characteristic information output by the last-layer activation function.
Optionally, a is -1, and b is 1.
Illustratively, the expression of the last-layer activation function of the encoding network is:
S(x) = 2/(1 + e^(-nx)) - 1
where x is the input information of the last-layer activation function, S(x) is the first characteristic information output by the last-layer activation function, and n is a positive integer.
Optionally, n is 10.
Optionally, during training, the self-encoder adds noise to the first characteristic information output by the encoding network before inputting it into the decoding network.
Optionally, in the training process of the self-encoder, the first characteristic information output by the encoding network is rounded and then input into the decoding network in forward propagation, and in backward propagation, the derivative is taken with respect to the first characteristic information output by the encoding network so as to update the weight parameters in the encoding network.
It should be understood that apparatus embodiments and method embodiments may correspond with each other and that similar descriptions may refer to the method embodiments. To avoid repetition, no further description is provided here. Specifically, the video encoder 20 shown in fig. 20 may correspond to the body that performs the encoding method of the embodiments of the present application, and the foregoing and other operations and/or functions of each unit in the video encoder 20 are respectively for implementing the corresponding flows of the encoding method and the like, which are not described herein for brevity.
The apparatus and system of embodiments of the present application are described above in terms of functional units in conjunction with the accompanying drawings. It should be understood that the functional units may be implemented in hardware, by instructions in software, or by a combination of hardware and software units. Specifically, each step of the method embodiments in the embodiments of the present application may be completed by an integrated logic circuit of hardware in a processor and/or instructions in the form of software, and the steps of the methods disclosed in connection with the embodiments of the present application may be directly embodied as being performed by a hardware decoding processor, or by a combination of hardware and software units in a decoding processor. Alternatively, the software units may reside in a storage medium that is mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps in the above method embodiments in combination with its hardware.
Fig. 21 is a schematic block diagram of an electronic device provided in an embodiment of the present application.
As shown in fig. 21, the electronic device 30 may be a video encoder or a video decoder according to the embodiments of the present application, and the electronic device 30 may include:
A memory 33 and a processor 32, the memory 33 being adapted to store a computer program 34 and to transmit its program code to the processor 32. In other words, the processor 32 may call and run the computer program 34 from the memory 33 to implement the methods in the embodiments of the present application.
For example, the processor 32 may be configured to perform the steps of the method 200 described above in accordance with instructions in the computer program 34.
In some embodiments of the present application, the processor 32 may include, but is not limited to:
a general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like.
In some embodiments of the present application, the memory 33 includes, but is not limited to:
volatile memory and/or nonvolatile memory. The nonvolatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable ROM (Programmable ROM, PROM), an erasable programmable ROM (Erasable PROM, EPROM), an electrically erasable programmable ROM (Electrically EPROM, EEPROM), or a flash memory. The volatile memory may be a random access memory (Random Access Memory, RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DR RAM).
In some embodiments of the present application, the computer program 34 may be partitioned into one or more units that are stored in the memory 33 and executed by the processor 32 to perform the methods provided herein. The one or more elements may be a series of computer program instruction segments capable of performing the specified functions, which instruction segments describe the execution of the computer program 34 in the electronic device 30.
As shown in fig. 21, the electronic device 30 may further include:
a transceiver 33, the transceiver 33 being connectable to the processor 32 or the memory 33.
The processor 32 may control the transceiver 33 to communicate with other devices, and in particular, may send information or data to other devices or receive information or data sent by other devices. The transceiver 33 may include a transmitter and a receiver. The transceiver 33 may further include antennas, the number of which may be one or more.
It will be appreciated that the various components in the electronic device 30 are connected by a bus system that includes, in addition to a data bus, a power bus, a control bus, and a status signal bus.
Fig. 22 is a schematic block diagram of a video codec system provided in an embodiment of the present application.
As shown in fig. 22, the video codec system 40 may include: a video encoder 41 and a video decoder 42, wherein the video encoder 41 is used for executing the video encoding method according to the embodiment of the present application, and the video decoder 42 is used for executing the video decoding method according to the embodiment of the present application.
The present application also provides a computer storage medium having stored thereon a computer program which, when executed by a computer, enables the computer to perform the method of the above method embodiments. In addition, embodiments of the present application also provide a computer program product comprising instructions which, when executed by a computer, cause the computer to perform the method of the above method embodiments.
The application also provides a code stream which is generated according to the coding method, and optionally, the code stream comprises the first mark or comprises the first mark and the second mark.
When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, produces, in whole or in part, a flow or function consistent with embodiments of the present application. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by a wired (e.g., coaxial cable, fiber optic, digital subscriber line (digital subscriber line, DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a digital video disc (digital video disc, DVD)), or a semiconductor medium (e.g., a Solid State Disk (SSD)), or the like.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment. For example, functional units in various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
The foregoing is merely specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (55)

  1. A video decoding method, comprising:
    decoding the code stream, and determining an intra-frame prediction mode of the current block;
    if the intra-frame prediction mode of the current block is based on the intra-frame prediction mode of the self-encoder, decoding the code stream to obtain the characteristic information of the current block;
    acquiring pixel values of reconstructed pixel points around the current block;
    and inputting the characteristic information of the current block and the pixel values of the reconstructed pixel points around the current block into a decoding network of a self-encoder corresponding to the current block to obtain a predicted block of the current block output by the decoding network.
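By way of illustration, the decoding flow of claim 1 can be sketched in a few lines of PyTorch-style Python. Everything below is an assumption made for the example: the network architecture, the feature dimension, the 8x8 block size and the 17 context pixels are illustrative choices, not the structure disclosed for the self-encoder.

    import torch
    import torch.nn as nn

    class DecodingNetwork(nn.Module):
        # Maps (characteristic information, reconstructed neighbours) to a predicted block.
        def __init__(self, feat_dim=16, ctx_pixels=17, block=8):
            super().__init__()
            self.block = block
            self.mlp = nn.Sequential(
                nn.Linear(feat_dim + ctx_pixels, 256),
                nn.ReLU(),
                nn.Linear(256, block * block),
            )

        def forward(self, feat, ctx):
            # feat: characteristic information of the current block, parsed from the code stream
            # ctx:  pixel values of reconstructed pixel points around the current block
            return self.mlp(torch.cat([feat, ctx], dim=-1)).reshape(-1, self.block, self.block)

    decoder_net = DecodingNetwork()
    feat = torch.randint(0, 2, (1, 16)).float()  # integer characteristic information (see claim 12)
    ctx = torch.rand(1, 17)                      # one row above, one column left, one corner, normalised
    pred_block = decoder_net(feat, ctx)          # predicted block of the current block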
  2. The method of claim 1, wherein the decoding the code stream and determining the intra-frame prediction mode of the current block comprises:
    decoding the code stream to obtain a first flag, wherein the first flag is used for indicating whether the current sequence allows using an intra-frame prediction mode based on a self-encoder;
    and determining the intra-frame prediction mode of the current block according to the first flag.
  3. The method of claim 2, wherein the determining the intra-frame prediction mode of the current block according to the first flag comprises:
    if the value of the first flag is a first value, decoding the code stream to obtain a second flag, wherein the second flag is used for indicating whether the current block uses the intra-frame prediction mode based on the self-encoder, and the first value is used for indicating that the current sequence allows using the intra-frame prediction mode based on the self-encoder;
    and determining the intra-frame prediction mode of the current block according to the second flag.
  4. The method of claim 2, wherein the first flag is included in a sequence level parameter syntax element.
  5. The method of claim 3, wherein the second flag is included in a coding unit syntax element.
  6. The method of claim 3, wherein the second flag is used to indicate whether the luma component and/or the chroma component of the current block uses a self-encoder based intra prediction mode.
  7. The method of claim 6, wherein the determining the intra-frame prediction mode of the current block according to the second flag comprises:
    if the second flag is used to indicate whether the luma component and the chroma component of the current block use the intra-frame prediction mode based on the self-encoder, determining, according to the second flag, whether the luma component and the chroma component of the current block use the intra-frame prediction mode based on the self-encoder;
    if the second flag is used to indicate whether the luma component of the current block uses the intra-frame prediction mode based on the self-encoder, determining, according to the second flag, whether the luma component of the current block uses the intra-frame prediction mode based on the self-encoder;
    and if the second flag is used to indicate whether the chroma component of the current block uses the intra-frame prediction mode based on the self-encoder, determining, according to the second flag, whether the chroma component of the current block uses the intra-frame prediction mode based on the self-encoder.
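The flag hierarchy of claims 2 to 7 amounts to a two-level parse: a sequence-level switch followed by a coding-unit-level choice that may cover the luma component, the chroma component or both. A hypothetical sketch follows; the BitReader stub and the 2-bit encoding of the second flag are assumptions, not the actual syntax of the code stream.

    class BitReader:
        # Minimal stub standing in for an entropy decoder.
        def __init__(self, bits):
            self.bits, self.pos = bits, 0

        def read_bits(self, n):
            val = int(self.bits[self.pos:self.pos + n], 2)
            self.pos += n
            return val

    def parse_ae_intra_mode(reader):
        sps_ae_flag = reader.read_bits(1)   # first flag, sequence-level parameter
        if not sps_ae_flag:
            return None                     # the self-encoder mode is not allowed for the sequence
        cu_ae_mode = reader.read_bits(2)    # second flag, coding unit level (2 bits assumed)
        # 0: not used, 1: luma only, 2: chroma only, 3: luma and chroma
        return {0: None, 1: "luma", 2: "chroma", 3: "luma+chroma"}[cu_ae_mode]

    print(parse_ae_intra_mode(BitReader("111")))  # -> luma+chroma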
  8. The method of claim 7, wherein said decoding the code stream to obtain the characteristic information of the current block comprises:
    if the luma component and the chroma component of the current block both use the intra-frame prediction mode based on the self-encoder, decoding the code stream to obtain luma characteristic information and chroma characteristic information of the current block;
    if the luma component of the current block uses the intra-frame prediction mode based on the self-encoder, decoding the code stream to obtain the luma characteristic information of the current block;
    and if the chroma component of the current block uses the intra-frame prediction mode based on the self-encoder, decoding the code stream to obtain the chroma characteristic information of the current block.
  9. The method of claim 8, wherein the decoding the code stream to obtain the luma characteristic information of the current block comprises:
    decoding a syntax element corresponding to the luma component of the current block;
    and obtaining the luma characteristic information of the current block from the syntax element corresponding to the luma component of the current block.
  10. The method of claim 8, wherein the decoding the code stream to obtain the chroma characteristic information of the current block comprises:
    decoding a syntax element corresponding to the chroma component of the current block;
    and obtaining the chroma characteristic information of the current block from the syntax element corresponding to the chroma component of the current block.
  11. The method according to claim 8, wherein the inputting the characteristic information of the current block and the pixel values of the reconstructed pixel points around the current block into the decoding network of the self-encoder corresponding to the current block to obtain the predicted block of the current block output by the decoding network comprises:
    if the characteristic information of the current block comprises the luma characteristic information and the chroma characteristic information of the current block, inputting the luma characteristic information of the current block and luma values of reconstructed pixel points around the current block into the decoding network to obtain a luma prediction block of the current block, and inputting the chroma characteristic information of the current block and chroma values of reconstructed pixel points around the current block into the decoding network to obtain a chroma prediction block of the current block;
    if the characteristic information of the current block comprises the luma characteristic information of the current block, inputting the luma characteristic information of the current block and luma values of reconstructed pixel points around the current block into the decoding network to obtain a luma prediction block of the current block;
    and if the characteristic information of the current block comprises the chroma characteristic information of the current block, inputting the chroma characteristic information of the current block and chroma values of reconstructed pixel points around the current block into the decoding network to obtain a chroma prediction block of the current block.
  12. The method according to any one of claims 1-11, wherein the element values in the characteristic information of the current block are integers.
  13. The method of claim 12, wherein the characteristic information of the current block is obtained by rounding the characteristic information output by the last layer activation function of the encoding network of the self-encoder.
  14. The method of claim 13, wherein a value range of the element values in the characteristic information output by the last layer activation function of the encoding network is [a, b], where a and b are integers.
  15. The method of claim 14, wherein a is 0 and b is 1.
  16. The method of claim 15, wherein the last layer activation function of the encoding network has the expression:
    S(x) = 1/(1 + e^(-x))
    wherein x is the input of the last layer activation function, and S(x) is the characteristic information output by the last layer activation function.
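Assuming the activation in claim 16 is the standard logistic sigmoid, which is consistent with the [0, 1] range stated in claims 14 and 15, rounding its output yields the integer element values required by claim 12:

    import torch

    x = torch.tensor([-3.2, -0.1, 0.4, 2.7])  # input of the last layer activation function
    s = torch.sigmoid(x)                       # S(x) = 1/(1 + e^(-x)), range (0, 1)
    feat = torch.round(s)                      # integer element values: tensor([0., 0., 1., 1.])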
  17. The method of claim 14, wherein a is-1 and b is 1.
  18. The method of claim 17, wherein the last layer activation function of the encoding network has the expression:
    S(x) = 2/(1 + e^(-n·x)) - 1
    wherein x is the input of the last layer activation function, S(x) is the characteristic information output by the last layer activation function, and n is a positive integer.
  19. The method of claim 18, wherein n is 10.
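For the [-1, 1] range of claims 17 to 19, a scaled sigmoid of the form S(x) = 2/(1 + e^(-n·x)) - 1 matches the stated range; the exact form is an assumption here. With n = 10 the transition around x = 0 is steep, so most rounded values land on -1 or 1:

    import torch

    def s_scaled(x, n=10):
        return 2.0 * torch.sigmoid(n * x) - 1.0  # maps the real line into (-1, 1)

    x = torch.tensor([-0.5, -0.05, 0.05, 0.5])
    feat = torch.round(s_scaled(x))              # element values in {-1, 0, 1}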
  20. The method of claim 13, wherein during training, the self-encoder performs noise-adding processing on the original characteristic information output by the encoding network and then inputs the result into the decoding network.
  21. The method of claim 13, wherein during training, the self-encoder rounds the original characteristic information output by the encoding network and inputs the rounded values into the decoding network during forward propagation, and skips the rounding operation when taking derivatives of the original characteristic information output by the encoding network during backward propagation, so as to update the weight parameters in the encoding network.
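Claims 20 and 21 describe two standard ways of training through the non-differentiable rounding step: replacing rounding with additive noise, or rounding in the forward pass while skipping the rounding when differentiating (a straight-through estimator). The sketch below assumes uniform noise in (-0.5, 0.5), a common choice that the claims do not specify:

    import torch

    class RoundSTE(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x):
            return torch.round(x)    # forward propagation: integer characteristic information (claim 21)

        @staticmethod
        def backward(ctx, grad_output):
            return grad_output       # backward propagation: the rounding is skipped (identity gradient)

    def quantize_for_training(feat, use_noise=True):
        if use_noise:                # claim 20: noise-added characteristic information
            return feat + torch.empty_like(feat).uniform_(-0.5, 0.5)
        return RoundSTE.apply(feat)  # claim 21: straight-through rounding

    feat = torch.tensor([0.2, 0.7, 0.9], requires_grad=True)
    quantize_for_training(feat, use_noise=False).sum().backward()
    print(feat.grad)                 # tensor([1., 1., 1.]): gradients pass through the round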
  22. A video encoding method, comprising:
    determining an intra-frame prediction mode of a current block from preset N first intra-frame prediction modes, wherein N is a positive integer, and the N first intra-frame prediction modes comprise intra-frame prediction modes based on a self-encoder;
    if the intra-frame prediction mode of the current block is based on the intra-frame prediction mode of the self-encoder, acquiring the self-encoder corresponding to the current block, wherein the self-encoder comprises an encoding network and a decoding network;
    inputting an original value of the current block into the encoding network to obtain first characteristic information of the current block output by the encoding network;
    and inputting the first characteristic information of the current block and pixel values of reconstructed pixel points around the current block into the decoding network to obtain a predicted block of the current block output by the decoding network.
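On the encoder side (claim 22), the original block passes through the encoding network, and the resulting characteristic information, together with the reconstructed neighbours, drives the same decoding network as on the decoder side. The sketch below reuses the DecodingNetwork from the claim 1 example; all layer sizes are assumptions:

    import torch
    import torch.nn as nn

    class EncodingNetwork(nn.Module):
        def __init__(self, block=8, feat_dim=16):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(block * block, 256), nn.ReLU(),
                nn.Linear(256, feat_dim), nn.Sigmoid(),  # last layer activation with range (0, 1)
            )

        def forward(self, block):
            return self.mlp(block.flatten(start_dim=1))

    enc = EncodingNetwork()
    orig_block = torch.rand(1, 8, 8)       # original value of the current block
    first_feat = enc(orig_block)           # first characteristic information (claim 22)
    second_feat = torch.round(first_feat)  # second characteristic information (claim 23), written to the code stream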
  23. The method according to claim 22, wherein inputting the first characteristic information of the current block and the pixel values of the reconstructed pixels around the current block into the decoding network to obtain the predicted block of the current block output by the decoding network includes:
    rounding the first characteristic information of the current block to obtain second characteristic information of the current block;
    and inputting the second characteristic information and the pixel values of the reconstructed pixel points around the current block into the decoding network to obtain a predicted block of the current block output by the decoding network.
  24. The method of claim 23, wherein the method further comprises:
    and writing the second characteristic information of the current block into a code stream.
  25. The method of claim 22, wherein the method further comprises:
    writing a first flag into the code stream, wherein the first flag is used to indicate whether the current sequence allows using the self-encoder based intra prediction mode.
  26. The method of claim 25, wherein the method further comprises:
    and if the value of the first flag is a first value, writing a second flag in the code stream, wherein the second flag is used for indicating whether the current block uses the intra-frame prediction mode based on the self-encoder, and the first value is used for indicating that the current sequence allows using the intra-frame prediction mode based on the self-encoder.
  27. The method of claim 25, wherein the first flag is included in a sequence level parameter syntax element.
  28. The method of claim 26, wherein the second flag is included in a coding unit syntax element.
  29. The method of claim 26, wherein the second flag is used to indicate whether a luma component and/or a chroma component of the current block uses a self-encoder based intra prediction mode.
  30. The method according to any one of claims 22-29, wherein determining the intra prediction mode of the current block from among the preset N first intra prediction modes comprises:
    and determining the intra-frame prediction mode of the current block from the N first intra-frame prediction modes according to the rate distortion cost.
  31. The method of claim 30, wherein the determining the intra-prediction mode of the current block from the N first intra-prediction modes according to the rate-distortion cost comprises:
    determining a predicted value corresponding to the first intra-frame prediction mode when the current block is encoded by using the first intra-frame prediction mode;
    determining a first rate distortion cost of the first intra-prediction mode according to the distortion between the predicted value and the original value of the current block and the number of bits consumed in encoding the flag bit of the first intra-prediction mode;
    and determining the intra-frame prediction mode of the current block from the N first intra-frame prediction modes according to the first rate distortion cost.
  32. The method of claim 31, wherein the determining the intra-prediction mode of the current block from the N first intra-prediction modes according to a first rate-distortion cost comprises:
    selecting M second intra-frame prediction modes from the N first intra-frame prediction modes according to the first rate distortion cost, wherein M is a positive integer smaller than N;
    determining a reconstruction value corresponding to the second intra-frame prediction mode when the current block is encoded by using the second intra-frame prediction mode;
    determining a second rate-distortion cost of the second intra-prediction mode according to the distortion between the reconstructed value and the original value of the current block and the number of bits consumed in encoding the current block using the second intra-prediction mode;
    and determining a second intra-frame prediction mode with the minimum second rate distortion cost from the M second intra-frame prediction modes as the intra-frame prediction mode of the current block.
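Claims 31 and 32 describe a two-stage search: a cheap first rate-distortion cost (prediction distortion plus mode-flag bits) shortlists M modes, and a full second cost (reconstruction distortion plus total coding bits) picks the winner among them. A schematic sketch with stand-in cost values:

    import random
    from dataclasses import dataclass

    @dataclass
    class ModeCost:
        name: str
        pred_d: float    # distortion between the predicted value and the original value
        flag_bits: int   # bits consumed encoding the mode's flag bit
        recon_d: float   # distortion between the reconstructed value and the original value
        total_bits: int  # bits consumed encoding the current block with this mode

    def rd_search(modes, lam=10.0, M=3):
        # stage 1 (claim 31): shortlist by the first rate-distortion cost
        shortlist = sorted(modes, key=lambda m: m.pred_d + lam * m.flag_bits)[:M]
        # stage 2 (claim 32): minimise the second rate-distortion cost over the shortlist
        return min(shortlist, key=lambda m: m.recon_d + lam * m.total_bits)

    random.seed(0)
    modes = [ModeCost(f"mode{i}", random.uniform(0, 100), 2, random.uniform(0, 80), 50)
             for i in range(8)]
    print(rd_search(modes).name)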
  33. The method of claim 30, wherein the determining the intra-prediction mode of the current block from the N first intra-prediction modes according to the rate-distortion cost comprises:
    determining a first rate-distortion cost of a third intra-frame prediction mode according to the distortion between a prediction value corresponding to the third intra-frame prediction mode and an original value of the current block and the number of bits consumed in encoding a flag bit of the third intra-frame prediction mode, wherein the third intra-frame prediction mode is a first intra-frame prediction mode other than the intra-frame prediction mode based on the self-encoder among the N first intra-frame prediction modes;
    selecting Q third intra-frame prediction modes from N-1 third intra-frame prediction modes according to the first rate distortion cost, wherein Q is a positive integer smaller than N-1;
    determining, according to the preset rounding range of the first characteristic information, P predicted values corresponding to the intra-frame prediction mode based on the self-encoder, and selecting R predicted values from the P predicted values, wherein P and R are positive integers, and R is less than or equal to P;
    and determining the intra-frame prediction mode of the current block from the N first intra-frame prediction modes according to the Q prediction values corresponding to the Q third intra-frame prediction modes and the R prediction values corresponding to the intra-frame prediction mode based on the self-encoder.
  34. The method of claim 33, wherein the determining the intra prediction mode of the current block from the N first intra prediction modes based on the Q prediction values corresponding to the Q third intra prediction modes and the R prediction values corresponding to the intra prediction mode based on the self-encoder comprises:
    determining Q reconstruction values corresponding to the Q predicted values and R reconstruction values corresponding to the R predicted values;
    determining a third rate-distortion cost according to the distortion between the Q+R reconstruction values and the original value of the current block and the number of bits consumed when the current block is encoded by using a first intra-frame prediction mode corresponding to the Q+R reconstruction values;
    and determining a first intra-frame prediction mode with the minimum third rate distortion cost from the N first intra-frame prediction modes as the intra-frame prediction mode of the current block.
  35. The method according to claim 33, wherein the determining, according to the preset rounding range of the first characteristic information, P prediction values corresponding to the intra prediction mode based on the self-encoder comprises:
    determining P possible values of the first characteristic information output by the encoding network according to the preset rounding range of the first characteristic information;
    inputting the characteristic information under each of the P values and the pixel values of the reconstructed pixel points around the current block into the decoding network to obtain predicted values under the P values output by the decoding network;
    and determining the predicted values under the P values as P predicted values corresponding to the intra-frame prediction mode based on the self-encoder.
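Claim 35 enumerates the candidate values of the characteristic information over its rounding range; with the binary range {0, 1} and feat_dim elements this gives P = 2^feat_dim candidates, so exhaustive enumeration is only practical for small feature dimensions. A sketch under that assumption:

    import itertools
    import torch

    a, b, feat_dim = 0, 1, 4   # assumed rounding range [a, b] and feature length
    candidates = [torch.tensor(v, dtype=torch.float32)
                  for v in itertools.product(range(a, b + 1), repeat=feat_dim)]
    print(len(candidates))     # P = (b - a + 1) ** feat_dim = 16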
  36. The method of claim 33, wherein if R is less than P, the selecting R predicted values from the P predicted values comprises:
    determining a fourth rate distortion cost corresponding to the P predicted values according to the distortion between the P predicted values and the original value of the current block;
    and selecting, from the P predicted values, the R predicted values with the smallest fourth rate distortion cost.
  37. The method according to any one of claims 23-29, wherein the inputting the original value of the current block into the encoding network to obtain the first characteristic information output by the encoding network comprises:
    if it is determined that the luma component of the current block uses the intra-frame prediction mode based on the self-encoder, inputting an original luma value of the current block into the encoding network to obtain first luma characteristic information of the current block;
    if it is determined that the chroma component of the current block uses the intra-frame prediction mode based on the self-encoder, inputting an original chroma value of the current block into the encoding network to obtain first chroma characteristic information of the current block;
    and if it is determined that both the luma component and the chroma component of the current block use the intra-frame prediction mode based on the self-encoder, inputting the original luma value and the original chroma value of the current block into the encoding network to obtain first luma characteristic information and first chroma characteristic information of the current block.
  38. The method of claim 37, wherein the rounding the first characteristic information of the current block to obtain the second characteristic information of the current block comprises:
    if the first characteristic information of the current block comprises the first luma characteristic information, rounding the first luma characteristic information to obtain second luma characteristic information of the current block;
    if the first characteristic information of the current block comprises the first chroma characteristic information, rounding the first chroma characteristic information to obtain second chroma characteristic information of the current block;
    and if the first characteristic information of the current block comprises the first luma characteristic information and the first chroma characteristic information, respectively rounding the first luma characteristic information and the first chroma characteristic information to obtain second luma characteristic information and second chroma characteristic information of the current block.
  39. The method according to claim 38, wherein inputting the second characteristic information of the current block and the pixel values of the reconstructed pixels around the current block into the decoding network to obtain the predicted block of the current block output by the decoding network includes:
    if the second characteristic information of the current block comprises the second luma characteristic information, inputting the second luma characteristic information and luma values of reconstructed pixel points around the current block into the decoding network to obtain a luma prediction block of the current block;
    if the second characteristic information of the current block comprises the second chroma characteristic information, inputting the second chroma characteristic information and chroma values of reconstructed pixel points around the current block into the decoding network to obtain a chroma prediction block of the current block;
    and if the second characteristic information of the current block comprises the second luma characteristic information and the second chroma characteristic information, inputting the second luma characteristic information, the second chroma characteristic information and pixel values of reconstructed pixel points around the current block into the decoding network to obtain a luma prediction block and a chroma prediction block of the current block.
  40. The method of claim 38 or 39, wherein the writing the second characteristic information of the current block into the code stream comprises:
    if the second characteristic information of the current block comprises the second luma characteristic information, writing the second luma characteristic information into the code stream;
    if the second characteristic information of the current block comprises the second chroma characteristic information, writing the second chroma characteristic information into the code stream;
    and if the second characteristic information of the current block comprises the second luma characteristic information and the second chroma characteristic information, writing the second luma characteristic information and the second chroma characteristic information into the code stream.
  41. The method according to any one of claims 22-29, wherein a value range of the element values in the first characteristic information output by the last layer activation function of the encoding network is [a, b], where a and b are integers.
  42. The method of claim 41, wherein a is 0 and b is 1.
  43. The method of claim 42, wherein the last layer activation function of the encoding network has the expression:
    S(x) = 1/(1 + e^(-x))
    wherein x is the input of the last layer activation function, and S(x) is the first characteristic information output by the last layer activation function.
  44. The method of claim 41, wherein a is-1 and b is 1.
  45. The method of claim 44, wherein the last layer activation function of the encoding network has the expression:
    S(x) = 2/(1 + e^(-n·x)) - 1
    wherein x is the input of the last layer activation function, S(x) is the first characteristic information output by the last layer activation function, and n is a positive integer.
  46. The method of claim 45, wherein n is 10.
  47. The method according to any one of claims 22-29, wherein during training, the self-encoder performs noise-adding processing on the first characteristic information output by the encoding network and then inputs the result into the decoding network.
  48. The method according to any one of claims 22-29, wherein during training, the self-encoder rounds the first characteristic information output by the encoding network and inputs the rounded values into the decoding network during forward propagation, and skips the rounding operation when taking derivatives of the first characteristic information output by the encoding network during backward propagation, so as to update the weight parameters in the encoding network.
  49. A video decoder, comprising:
    a mode determining unit, configured to decode the code stream and determine an intra prediction mode of the current block;
    a feature determining unit, configured to decode the code stream if the intra-frame prediction mode of the current block is an intra-frame prediction mode based on a self-encoder, and obtain feature information of the current block;
    An obtaining unit, configured to obtain pixel values of reconstructed pixel points around the current block;
    and a prediction unit, configured to input the characteristic information of the current block and the pixel values of the reconstructed pixel points around the current block into a decoding network of a self-encoder corresponding to the current block to obtain a prediction block of the current block output by the decoding network.
  50. A video encoder, comprising:
    a mode determining unit, configured to determine an intra-prediction mode of a current block from preset N first intra-prediction modes, where N is a positive integer, and the N first intra-prediction modes include intra-prediction modes based on a self-encoder;
    an obtaining unit, configured to obtain a self-encoder corresponding to the current block if the intra-frame prediction mode of the current block is a self-encoder-based intra-frame prediction mode, where the self-encoder includes an encoding network and a decoding network;
    a characteristic determining unit, configured to input the original value of the current block into the encoding network to obtain first characteristic information of the current block output by the encoding network;
    and a prediction unit, configured to input the first characteristic information of the current block and the pixel values of the reconstructed pixel points around the current block into the decoding network to obtain a prediction block of the current block output by the decoding network.
  51. A video decoder comprising a processor and a memory;
    the memory is configured to store a computer program;
    the processor is configured to invoke and run the computer program stored in the memory to implement the method of any one of claims 1 to 21.
  52. A video encoder comprising a processor and a memory;
    the memory is configured to store a computer program;
    the processor is configured to invoke and run the computer program stored in the memory to implement the method of any one of claims 22 to 48.
  53. A video codec system, comprising:
    a video encoder according to claim 52;
    and a video decoder according to claim 51.
  54. A computer-readable storage medium storing a computer program;
    the computer program causes a computer to perform the method of any one of claims 1 to 21 or 22 to 48.
  55. A code stream generated based on the method of any one of claims 22 to 48.

Applications Claiming Priority (1)

PCT/CN2021/119164 (WO2023039859A1), priority date 2021-09-17, filing date 2021-09-17: Video encoding method, video decoding method, and device, system and storage medium

Publications (1)

CN117426088A, published 2024-01-19

Family

ID=85602328

Family Applications (1)

CN202180098997.7A (CN117426088A, pending): Video encoding and decoding method, device, system and storage medium

Country Status (2)

CN (1): CN117426088A
WO (1): WO2023039859A1

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116456102B * 2023-06-20 2023-10-03 Shenzhen Transsion Holdings Co., Ltd. Image processing method, processing apparatus, and storage medium
CN116760976B * 2023-08-21 2023-12-08 Tencent Technology (Shenzhen) Co., Ltd. Affine prediction decision method, device, equipment and storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103096051B * 2011-11-04 2017-04-12 Huawei Technologies Co., Ltd. Image block signal component sampling point intra-frame decoding method and device thereof
US20180332292A1 * 2015-11-18 2018-11-15 Mediatek Inc. Method and apparatus for intra prediction mode using intra prediction filter in video and image compression
KR20180014675A * 2016-08-01 2018-02-09 Electronics and Telecommunications Research Institute Method and apparatus for encoding/decoding image and recording medium for storing bitstream
KR20200128175A * 2018-04-01 2020-11-11 Kim Ki-baek Method and apparatus of encoding/decoding an image using intra prediction
KR20200028856A * 2018-09-07 2020-03-17 Kim Ki-baek A method and an apparatus for encoding/decoding video using intra prediction
CN112840649A * 2018-09-21 2021-05-25 LG Electronics Inc. Method for decoding image by using block division in image coding system and apparatus therefor
CA3114816C * 2018-10-12 2023-08-22 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Video image component prediction method and apparatus, and computer storage medium

Also Published As

WO2023039859A1, published 2023-03-23


Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination