WO2023044919A1 - Video encoding and decoding method, device, system, and storage medium - Google Patents

Video encoding and decoding method, device, system, and storage medium

Info

Publication number
WO2023044919A1
Authority: WIPO (PCT)
Prior art keywords: block, current block, transformation, points, prediction
Application number: PCT/CN2021/121047
Other languages: English (en), French (fr)
Inventor: 王凡
Original Assignee: Oppo广东移动通信有限公司
Application filed by Oppo广东移动通信有限公司
Priority to PCT/CN2021/121047
Priority to CN202180102575.2A (published as CN117981320A)
Publication of WO2023044919A1


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10: using adaptive coding
    • H04N 19/169: characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N 19/17: the unit being an image region, e.g. an object
    • H04N 19/176: the region being a block, e.g. a macroblock
    • H04N 19/60: using transform coding
    • H04N 19/61: using transform coding in combination with predictive coding

Definitions

  • the present application relates to the technical field of video coding and decoding, and in particular to a video coding and decoding method, device, system, and storage medium.
  • Digital video technology can be incorporated into a variety of video devices, such as digital televisions, smartphones, computers, e-readers, or video players, among others.
  • video devices implement video compression technology to enable more effective transmission or storage of video data.
  • Video is compressed through encoding, and the encoding process includes prediction, transformation, and quantization. For example, intra-frame prediction and/or inter-frame prediction determines the prediction block of the current block; the prediction block is subtracted from the current block to obtain a residual block; the residual block is transformed to obtain transform coefficients; the transform coefficients are quantized to obtain quantized coefficients; and the quantized coefficients are encoded to form a code stream.
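  • As a rough illustration of this chain (hypothetical 4×4 values; the DCT-II stand-in and the scalar quantizer are simplifications of what a real codec does), a minimal sketch in Python:

```python
import numpy as np

def encode_block_sketch(current_block, prediction_block, qstep=8.0):
    """Toy predict -> residual -> transform -> quantize chain."""
    residual = current_block.astype(np.float64) - prediction_block  # residual block
    # Stand-in transform: orthonormal 2-D DCT-II (real codecs use integer kernels).
    n = residual.shape[0]
    k = np.arange(n)
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * np.outer(k, 2 * k + 1) / (2 * n))
    basis[0] *= np.sqrt(0.5)
    coeffs = basis @ residual @ basis.T          # transform coefficients
    return np.round(coeffs / qstep).astype(int)  # quantized coefficients

block = np.full((4, 4), 130)   # hypothetical original block
pred = np.full((4, 4), 128)    # hypothetical prediction block
print(encode_block_sketch(block, pred))  # would then be entropy-coded
```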
  • the transformation is to transform the residual block from the spatial domain to the frequency domain to remove the correlation between the residuals.
  • the current transformation method has poor transformation effect, resulting in low video compression efficiency.
  • Embodiments of the present application provide a video encoding and decoding method, device, system, and storage medium, so as to improve transformation effects and further improve video compression efficiency.
  • the present application provides a video decoding method, including:
  • the embodiment of the present application provides a video encoding method, including:
  • the residual block is transformed according to the transform kernel, and the transformed coefficients are coded to obtain a code stream.
  • the present application provides a video encoder, configured to execute the method in the above first aspect or various implementations thereof.
  • the encoder includes a functional unit configured to execute the method in the above first aspect or its implementations.
  • the present application provides a video decoder, configured to execute the method in the above second aspect or various implementations thereof.
  • the decoder includes a functional unit configured to execute the method in the above second aspect or its various implementations.
  • a video encoder including a processor and a memory.
  • the memory is used to store a computer program, and the processor is used to call and run the computer program stored in the memory, so as to execute the method in the above first aspect or its various implementations.
  • a sixth aspect provides a video decoder, including a processor and a memory.
  • the memory is used to store a computer program
  • the processor is used to invoke and run the computer program stored in the memory, so as to execute the method in the above second aspect or its various implementations.
  • a video codec system including a video encoder and a video decoder.
  • the video encoder is configured to execute the method in the above first aspect or its various implementations
  • the video decoder is configured to execute the method in the above second aspect or its various implementations.
  • the chip includes: a processor, configured to call and run a computer program from the memory, so that the device installed with the chip executes the method in any one of the above-mentioned first to second aspects or any implementation thereof.
  • a computer-readable storage medium for storing a computer program, and the computer program causes a computer to execute any one of the above-mentioned first to second aspects or the method in each implementation manner thereof.
  • a computer program product including computer program instructions, the computer program instructions cause a computer to execute any one of the above first to second aspects or the method in each implementation manner.
  • a computer program which, when running on a computer, causes the computer to execute any one of the above-mentioned first to second aspects or the method in each implementation manner thereof.
  • the decoding end obtains the target transformation coefficient of the current block by decoding the code stream; predicts the current block to obtain the prediction block of the current block; and determines the transformation kernel corresponding to the current block according to the prediction block.
  • this application exploits the correlation between the residual texture and the texture of the prediction block itself, using the characteristics of the prediction block to determine, guide, or assist the selection of the transformation kernel. This reduces the transmission of transformation-kernel selection information in the code stream, lowering the transform overhead in the code stream while improving the compression efficiency of the current block.
  • FIG. 1 is a schematic block diagram of a video encoding and decoding system involved in an embodiment of the present application
  • Fig. 2 is a schematic block diagram of a video encoder involved in an embodiment of the present application
  • Fig. 3 is a schematic block diagram of a video decoder involved in an embodiment of the present application.
  • Figure 4A is a schematic diagram of an original image
  • FIG. 4B is a frequency domain diagram of FIG. 4A after DCT transformation
  • Fig. 5 is the schematic diagram of LFNST transformation
  • Fig. 6 is a schematic diagram of a base image corresponding to a transformation kernel
  • FIG. 7 is a schematic diagram of an intra prediction mode
  • FIG. 8 is a schematic diagram of another intra prediction mode
  • FIG. 9 is a schematic flowchart of a video decoding method provided in an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a video decoding process involved in an embodiment of the present application.
  • FIG. 11 is another schematic flowchart of a video decoding method provided by an embodiment of the present application.
  • FIG. 12 is a schematic diagram of a video decoding process involved in an embodiment of the present application.
  • FIG. 13 is a schematic flowchart of a video encoding method provided by an embodiment of the present application.
  • FIG. 14 is a schematic diagram of the video encoding process involved in the embodiment of the present application.
  • FIG. 15 is a schematic flowchart of a video encoding method provided by an embodiment of the present application.
  • FIG. 16 is a schematic diagram of the video encoding process involved in the embodiment of the present application.
  • Fig. 17 is a schematic block diagram of a video decoder provided by an embodiment of the present application.
  • Fig. 18 is a schematic block diagram of a video encoder provided by an embodiment of the present application.
  • Fig. 19 is a schematic block diagram of an electronic device provided by an embodiment of the present application.
  • Fig. 20 is a schematic block diagram of a video encoding and decoding system provided by an embodiment of the present application.
  • the application can be applied to the field of image codec, video codec, hardware video codec, dedicated circuit video codec, real-time video codec, etc.
  • the solution of the present application can be combined with audio and video coding standards (AVS), for example the H.264/advanced video coding (AVC) standard, the H.265/high efficiency video coding (HEVC) standard, and the H.266/versatile video coding (VVC) standard.
  • the solutions of the present application may operate in conjunction with other proprietary or industry standards, including ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its scalable video coding (SVC) and multi-view video coding (MVC) extensions.
  • FIG. 1 is a schematic block diagram of a video encoding and decoding system involved in an embodiment of the present application. It should be noted that FIG. 1 is only an example, and the video codec system in the embodiment of the present application includes but is not limited to what is shown in FIG. 1 .
  • the video codec system 100 includes an encoding device 110 and a decoding device 120 .
  • the encoding device is used to encode (can be understood as compression) the video data to generate a code stream, and transmit the code stream to the decoding device.
  • the decoding device decodes the code stream generated by the encoding device to obtain decoded video data.
  • the encoding device 110 in the embodiment of the present application can be understood as a device having a video encoding function
  • the decoding device 120 can be understood as a device having a video decoding function; that is, the embodiment of the present application covers a wide range of devices for the encoding device 110 and the decoding device 120, including smartphones, desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, televisions, cameras, display devices, digital media players, video game consoles, vehicle-mounted computers, and the like.
  • the encoding device 110 may transmit the encoded video data (such as code stream) to the decoding device 120 via the channel 130 .
  • Channel 130 may include one or more media and/or devices capable of transmitting encoded video data from encoding device 110 to decoding device 120 .
  • channel 130 includes one or more communication media that enable encoding device 110 to transmit encoded video data directly to decoding device 120 in real-time.
  • encoding device 110 may modulate the encoded video data according to a communication standard and transmit the modulated video data to decoding device 120 .
  • the communication medium includes a wireless communication medium, such as a radio frequency spectrum.
  • the communication medium may also include a wired communication medium, such as one or more physical transmission lines.
  • the channel 130 includes a storage medium that can store video data encoded by the encoding device 110 .
  • the storage medium includes a variety of local access data storage media, such as optical discs, DVDs, flash memory, and the like.
  • the decoding device 120 may acquire encoded video data from the storage medium.
  • channel 130 may include a storage server that may store video data encoded by encoding device 110 .
  • the decoding device 120 may download the stored encoded video data from the storage server.
  • the storage server may store the encoded video data and may transmit it to the decoding device 120; it may be, for example, a web server (e.g., for a website), a file transfer protocol (FTP) server, or the like.
  • the encoding device 110 includes a video encoder 112 and an output interface 113 .
  • the output interface 113 may include a modulator/demodulator (modem) and/or a transmitter.
  • the encoding device 110 may include a video source 111 in addition to the video encoder 112 and the output interface 113.
  • the video source 111 may include at least one of a video capture device (for example, a video camera), a video archive, a video input interface, and a computer graphics system, wherein the video input interface is used to receive video data from a video content provider, and the computer graphics system is used to generate video data.
  • the video encoder 112 encodes the video data from the video source 111 to generate a code stream.
  • Video data may include one or more pictures or a sequence of pictures.
  • the code stream contains the encoding information of an image or image sequence in the form of a bit stream.
  • Encoding information may include encoded image data and associated data.
  • the associated data may include a sequence parameter set (SPS for short), a picture parameter set (PPS for short) and other syntax structures.
  • An SPS may contain parameters that apply to one or more sequences.
  • a PPS may contain parameters applied to one or more images.
  • the syntax structure refers to a set of zero or more syntax elements arranged in a specified order in the code stream.
  • the video encoder 112 directly transmits encoded video data to the decoding device 120 via the output interface 113 .
  • the encoded video data can also be stored on a storage medium or a storage server for subsequent reading by the decoding device 120 .
  • the decoding device 120 includes an input interface 121 and a video decoder 122 .
  • the decoding device 120 may include a display device 123 in addition to the input interface 121 and the video decoder 122 .
  • the input interface 121 includes a receiver and/or a modem.
  • the input interface 121 can receive encoded video data through the channel 130 .
  • the video decoder 122 is used to decode the encoded video data to obtain decoded video data, and transmit the decoded video data to the display device 123 .
  • the display device 123 displays the decoded video data.
  • the display device 123 may be integrated with the decoding device 120 or external to the decoding device 120 .
  • the display device 123 may include various display devices, such as a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or other types of display devices.
  • FIG. 1 is only an example, and the technical solutions of the embodiments of the present application are not limited to FIG. 1 .
  • the technology of the present application may also be applied to one-sided video encoding or one-sided video decoding, that is, to encoding only or to decoding only.
  • Fig. 2 is a schematic block diagram of a video encoder involved in an embodiment of the present application. It should be understood that the video encoder 200 can be used to perform lossy compression on images, and can also be used to perform lossless compression on images.
  • the lossless compression may be visually lossless compression or mathematically lossless compression.
  • the video encoder 200 can be applied to image data in luminance-chrominance (YCbCr, YUV) format.
  • the YUV ratio can be 4:2:0, 4:2:2 or 4:4:4, where Y denotes brightness (Luma), Cb (U) denotes blue chroma, and Cr (V) denotes red chroma; U and V are the chroma (Chroma) components describing color and saturation.
  • 4:2:0 means that every 4 pixels have 4 luminance components and 2 chroma components (YYYYCbCr); 4:2:2 means that every 4 pixels have 4 luminance components and 4 chroma components (YYYYCbCrCbCr); 4:4:4 means full pixel display (YYYYCbCrCbCrCbCrCbCr).
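  • To make the sampling ratios concrete, a small sketch (the helper name and per-frame accounting are illustrative, not part of this application):

```python
def plane_sizes(width, height, chroma_format):
    """Samples per frame for the Y, Cb and Cr planes under each YUV format."""
    luma = width * height
    if chroma_format == "4:2:0":    # chroma halved horizontally and vertically
        chroma = (width // 2) * (height // 2)
    elif chroma_format == "4:2:2":  # chroma halved horizontally only
        chroma = (width // 2) * height
    elif chroma_format == "4:4:4":  # no chroma subsampling
        chroma = width * height
    else:
        raise ValueError(chroma_format)
    return luma, chroma, chroma

print(plane_sizes(1920, 1080, "4:2:0"))  # (2073600, 518400, 518400)
```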
  • the video encoder 200 reads video data, and divides a frame of image into several coding tree units (coding tree units, CTUs) for each frame of image in the video data.
  • a CTU may also be called a "tree block", "largest coding unit" (LCU for short) or "coding tree block" (CTB for short).
  • Each CTU may be associated with a pixel block of equal size within the image. Each pixel may correspond to one luminance (luma) sample and two chrominance (chrominance or chroma) samples.
  • each CTU may be associated with one block of luma samples and two blocks of chroma samples.
  • a CTU size is, for example, 128 ⁇ 128, 64 ⁇ 64, 32 ⁇ 32 and so on.
  • a CTU can be further divided into several coding units (Coding Unit, CU) for coding, and the CU can be a rectangular block or a square block.
  • the CU can be further divided into a prediction unit (PU for short) and a transform unit (TU for short), so that coding, prediction, and transformation are separated, and processing is more flexible.
  • a CTU is divided into CUs in a quadtree manner, and a CU is divided into TUs and PUs in a quadtree manner.
  • the video encoder and video decoder can support various PU sizes. Assuming that the size of a specific CU is 2N×2N, video encoders and video decoders may support 2N×2N or N×N PU sizes for intra prediction, and support 2N×2N, 2N×N, N×2N, N×N or similarly sized symmetric PUs for inter prediction.
  • the video encoder and video decoder can also support 2N×nU, 2N×nD, nL×2N and nR×2N asymmetric PUs for inter-frame prediction, where U, D, L and R represent up, down, left and right respectively. For example, 2N×nU divides the 2N×2N CU into two PUs with a 1:3 top-to-bottom ratio, 2N×nD divides it with a 3:1 top-to-bottom ratio, nL×2N divides it with a 1:3 left-to-right ratio, and nR×2N divides it with a 3:1 left-to-right ratio; a sketch of these splits follows.
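  • A minimal sketch of those asymmetric splits (function name and return convention are illustrative):

```python
def asymmetric_pus(cu_size, mode):
    """Return the (width, height) of the two PUs for a 2Nx2N CU of side cu_size."""
    quarter, three_quarters = cu_size // 4, 3 * cu_size // 4
    if mode == "2NxnU":   # top : bottom = 1 : 3
        return (cu_size, quarter), (cu_size, three_quarters)
    if mode == "2NxnD":   # top : bottom = 3 : 1
        return (cu_size, three_quarters), (cu_size, quarter)
    if mode == "nLx2N":   # left : right = 1 : 3
        return (quarter, cu_size), (three_quarters, cu_size)
    if mode == "nRx2N":   # left : right = 3 : 1
        return (three_quarters, cu_size), (quarter, cu_size)
    raise ValueError(mode)

for m in ("2NxnU", "2NxnD", "nLx2N", "nRx2N"):
    print(m, asymmetric_pus(32, m))
```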
  • the video encoder 200 may include: a prediction unit 210, a residual unit 220, a transform/quantization unit 230, an inverse transform/quantization unit 240, a reconstruction unit 250, a loop filter unit 260, a decoded image cache 270 and an entropy encoding unit 280. It should be noted that the video encoder 200 may include more, fewer or different functional components.
  • the current block may be called a current coding unit (CU) or a current prediction unit (PU).
  • a predicted block may also be called a predicted image block or an image predicted block, and a reconstructed image block may also be called a reconstructed block or an image reconstructed image block.
  • the prediction unit 210 includes an inter prediction unit 211 and an intra estimation unit 212. Because there is a strong correlation between adjacent pixels within a video frame, intra-frame prediction is used in video coding and decoding technology to eliminate the spatial redundancy between adjacent pixels. Because adjacent frames in a video are strongly similar, inter-frame prediction is used in video coding and decoding technology to eliminate the temporal redundancy between adjacent frames, thereby improving coding efficiency.
  • the inter-frame prediction unit 211 can be used for inter-frame prediction.
  • the inter-frame prediction can refer to image information of different frames.
  • the inter-frame prediction uses motion information to find a reference block from the reference frame, and generates a prediction block according to the reference block to eliminate temporal redundancy;
  • Frames used for inter-frame prediction may be P frames and/or B frames, P frames refer to forward predictive frames, and B frames refer to bidirectional predictive frames.
  • the motion information includes the reference frame list where the reference frame is located, the reference frame index, and the motion vector.
  • the motion vector can have integer-pixel or sub-pixel precision. If the motion vector has sub-pixel precision, interpolation filtering must be applied in the reference frame to produce the required sub-pixel block.
  • the block of whole pixels or sub-pixels found in the reference frame according to the motion vector is called the reference block.
  • Some technologies will directly use the reference block as the prediction block, while others will further process the reference block to generate the prediction block. Further processing a reference block to generate a prediction block can also be understood as taking the reference block as a prediction block and then processing it to generate a new prediction block.
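  • As a toy illustration of sub-pixel interpolation (bilinear here; real codecs such as HEVC/VVC use longer interpolation filters, e.g. 8-tap):

```python
import numpy as np

def sub_pel_sample(ref, y, x):
    """Bilinear sample of a reference frame at a fractional position (y, x)."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    fy, fx = y - y0, x - x0
    return ((1 - fy) * (1 - fx) * ref[y0, x0] + (1 - fy) * fx * ref[y0, x0 + 1]
            + fy * (1 - fx) * ref[y0 + 1, x0] + fy * fx * ref[y0 + 1, x0 + 1])

ref = np.arange(16, dtype=float).reshape(4, 4)
print(sub_pel_sample(ref, 1.5, 2.5))  # half-pel position -> 8.5
```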
  • the intra-frame estimation unit 212 (also referred to as an intra-frame prediction unit) refers only to information within the same frame image to predict the pixel information in the current coded image block, in order to eliminate spatial redundancy.
  • a frame used for intra prediction may be an I frame.
  • the pixels in the column to the left of the current block and in the row above it are the reference pixels of the current block, and intra prediction uses these reference pixels to predict the current block.
  • These reference pixels may all be available, that is, all already encoded and decoded. Some may also be unavailable; for example, if the current block is at the leftmost edge of the whole frame, the reference pixels to the left of the current block are unavailable.
  • when the lower-left part of the current block has not yet been encoded and decoded, the reference pixels at the lower left are likewise unavailable.
  • in that case, the unavailable positions can be filled from the available reference pixels, or with some value or by some method, or no filling is performed.
  • the intra prediction modes used by HEVC include planar mode (Planar), DC and 33 angle modes, a total of 35 prediction modes.
  • the intra-frame modes used by VVC include Planar, DC and 65 angle modes, with a total of 67 prediction modes.
  • in addition, there are matrix-based intra prediction (MIP) and the cross-component linear model (CCLM) prediction mode.
  • with more prediction modes, intra-frame prediction becomes more accurate and better meets the development requirements of high-definition and ultra-high-definition digital video.
  • the residual unit 220 may generate a residual block of the CU based on the pixel block of the CU and the prediction blocks of the PUs of the CU. For example, residual unit 220 may generate the residual block such that each sample in the residual block has a value equal to the difference between a sample in the pixel block of the CU and the corresponding sample in a prediction block of a PU of the CU.
  • Transform/quantization unit 230 may quantize the transform coefficients. Transform/quantization unit 230 may quantize transform coefficients associated with TUs of a CU based on quantization parameter (QP) values associated with the CU. Video encoder 200 may adjust the degree of quantization applied to transform coefficients associated with a CU by adjusting the QP value associated with the CU.
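  • In HEVC/VVC-style codecs the quantization step size roughly doubles for every increase of 6 in QP; a simplified scalar-quantizer sketch of this relationship (ignoring the integer scaling tables real codecs use):

```python
import numpy as np

def qstep_from_qp(qp):
    """Approximate step size: doubles every 6 QP, about 1.0 at QP = 4."""
    return 2.0 ** ((qp - 4) / 6.0)

def quantize(coeffs, qp):
    return np.round(coeffs / qstep_from_qp(qp)).astype(int)

def dequantize(levels, qp):
    return levels * qstep_from_qp(qp)

c = np.array([100.0, -37.0, 8.0, 1.5])
levels = quantize(c, qp=22)
print(levels, dequantize(levels, qp=22))  # coarser as qp grows
```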
  • Inverse transform/quantization unit 240 may apply inverse quantization and inverse transform to the quantized transform coefficients, respectively, to reconstruct a residual block from the quantized transform coefficients.
  • the reconstruction unit 250 may add samples of the reconstructed residual block to corresponding samples of one or more prediction blocks generated by the prediction unit 210 to generate a reconstructed image block associated with the TU. By reconstructing the sample blocks of each TU of the CU in this way, the video encoder 200 can reconstruct the pixel blocks of the CU.
  • Loop filtering unit 260 may perform deblocking filtering operations to reduce blocking artifacts of pixel blocks associated with a CU.
  • the loop filtering unit 260 includes a deblocking filtering unit and a sample adaptive offset/adaptive loop filtering (SAO/ALF) unit, wherein the deblocking filtering unit is used for deblocking, and the SAO/ALF unit is used to remove ringing effects.
  • the decoded image buffer 270 may store reconstructed pixel blocks.
  • Inter prediction unit 211 may use reference pictures containing reconstructed pixel blocks to perform inter prediction on PUs of other pictures.
  • intra estimation unit 212 may use the reconstructed pixel blocks in decoded picture cache 270 to perform intra prediction on other PUs in the same picture as the CU.
  • Entropy encoding unit 280 may receive the quantized transform coefficients from transform/quantization unit 230 . Entropy encoding unit 280 may perform one or more entropy encoding operations on the quantized transform coefficients to generate entropy encoded data.
  • Fig. 3 is a schematic block diagram of a decoder involved in an embodiment of the present application.
  • the video decoder 300 includes: an entropy decoding unit 310, a prediction unit 320, an inverse quantization transformation unit 330, a reconstruction unit 340, a loop filter unit 350 and a decoded image buffer 360. It should be noted that the video decoder 300 may include more, fewer or different functional components.
  • the video decoder 300 can receive code streams.
  • the entropy decoding unit 310 may parse the codestream to extract syntax elements from the codestream. As part of parsing the codestream, the entropy decoding unit 310 may parse the entropy-encoded syntax elements in the codestream.
  • the prediction unit 320 , the inverse quantization transformation unit 330 , the reconstruction unit 340 and the loop filter unit 350 can decode video data according to the syntax elements extracted from the code stream, that is, generate decoded video data.
  • the prediction unit 320 includes an intra estimation unit 321 and an inter prediction unit 322 .
  • Intra estimation unit 321 may perform intra prediction to generate a predictive block for a PU.
  • Intra estimation unit 321 may use an intra prediction mode to generate a prediction block for a PU based on pixel blocks of spatially neighboring PUs.
  • Intra estimation unit 321 may also determine the intra prediction mode of the PU from one or more syntax elements parsed from the codestream.
  • the inter prediction unit 322 may construct a first reference picture list (list 0) and a second reference picture list (list 1) according to the syntax elements parsed from the codestream. Furthermore, if the PU is encoded using inter prediction, entropy decoding unit 310 may parse the motion information for the PU. Inter prediction unit 322 may determine one or more reference blocks for the PU according to the motion information of the PU. Inter prediction unit 322 may generate a predictive block for the PU from one or more reference blocks for the PU.
  • Inverse quantization transform unit 330 may inverse quantize (ie, dequantize) the transform coefficients associated with a TU. Inverse quantization transform unit 330 may use a QP value associated with a CU of a TU to determine the degree of quantization.
  • inverse quantized transform unit 330 may apply one or more inverse transforms to the inverse quantized transform coefficients in order to generate a residual block associated with the TU.
  • Reconstruction unit 340 uses the residual blocks associated with the TUs of the CU and the prediction blocks of the PUs of the CU to reconstruct the pixel blocks of the CU. For example, the reconstruction unit 340 may add the samples of the residual block to the corresponding samples of the prediction block to reconstruct the pixel block of the CU to obtain the reconstructed image block.
  • Loop filtering unit 350 may perform deblocking filtering operations to reduce blocking artifacts of pixel blocks associated with a CU.
  • Video decoder 300 may store the reconstructed picture of the CU in decoded picture cache 360 .
  • the video decoder 300 may use the reconstructed picture in the decoded picture buffer 360 as a reference picture for subsequent prediction, or transmit the reconstructed picture to a display device for presentation.
  • the basic flow of video encoding and decoding is as follows: at the encoding end, a frame of image is divided into blocks, and for the current block, the prediction unit 210 uses intra-frame prediction or inter-frame prediction to generate the prediction block of the current block .
  • the residual unit 220 may calculate a residual block based on the predicted block and the original block of the current block, for example, subtract the predicted block from the original block of the current block to obtain a residual block, which may also be referred to as residual information.
  • the residual block can be transformed and quantized by the transformation/quantization unit 230 to remove information that is not sensitive to human eyes, so as to eliminate visual redundancy.
  • the residual block before being transformed and quantized by the transform/quantization unit 230 may be called a time-domain residual block, and after being transformed and quantized it may be called a frequency-domain residual block.
  • the entropy encoding unit 280 receives the quantized transform coefficients output by the transform and quantization unit 230 , may perform entropy encoding on the quantized transform coefficients, and output a code stream.
  • the entropy coding unit 280 can eliminate character redundancy according to the target context model and the probability information of the binary code stream.
  • the entropy decoding unit 310 can analyze the code stream to obtain the prediction information of the current block, the quantization coefficient matrix, etc., and the prediction unit 320 uses intra prediction or inter prediction for the current block based on the prediction information to generate a prediction block of the current block.
  • the inverse quantization transformation unit 330 uses the quantization coefficient matrix obtained from the code stream to perform inverse quantization and inverse transformation on the quantization coefficient matrix to obtain a residual block.
  • the reconstruction unit 340 adds the predicted block and the residual block to obtain a reconstructed block.
  • the reconstructed blocks form a reconstructed image, and the loop filtering unit 350 performs loop filtering on the reconstructed image based on the image or based on the block to obtain a decoded image.
  • the encoding end also needs similar operations to the decoding end to obtain the decoded image.
  • the decoded image may also be referred to as a reconstructed image, and the reconstructed image may serve as a reference frame for inter-frame prediction of subsequent frames.
  • the block division information determined by the encoder as well as mode information or parameter information such as prediction, transformation, quantization, entropy coding, and loop filtering, etc., are carried in the code stream when necessary.
  • the decoding end parses the code stream and analyzes the available information to determine the same block division information and the same prediction, transformation, quantization, entropy coding, loop filtering and other mode or parameter information as the encoding end, so as to ensure that the decoded image obtained by the encoding end is the same as the decoded image obtained by the decoding end.
  • the current block may be the current coding unit (CU) or the current prediction unit (PU).
  • the above is the basic process of the video codec under the block-based hybrid coding framework. With the development of technology, some modules or steps of the framework or process may be optimized. This application applies to the basic process of the video codec under the block-based hybrid coding framework, but is not limited to that framework and process.
  • the general hybrid encoding framework first performs prediction, which uses spatial or temporal correlation to obtain an image that is the same as or similar to the current block.
  • ideally, the prediction block is exactly the same as the current block, but it is difficult to guarantee this for all blocks in a video, especially for natural videos or videos captured by cameras, because of the presence of noise.
  • irregular movements, distortions, occlusions, changes in brightness, etc. in the video are difficult to fully predict. Therefore, the hybrid coding framework will subtract the predicted image from the original image of the current block to obtain a residual image, or subtract the predicted block from the current block to obtain a residual block.
  • Residual blocks are usually much simpler than the original image, so prediction can significantly improve compression efficiency.
  • the residual block is not directly encoded, but usually transformed first.
  • the transformation is to transform the residual image from the spatial domain to the frequency domain, and remove the correlation of the residual image. After the residual image is transformed into the frequency domain, since most of the energy is concentrated in the low-frequency region, most of the transformed non-zero coefficients are concentrated in the upper left corner. Quantization is then used for further compression. And because the human eye is not sensitive to high frequencies, a larger quantization step size can be used in high frequency areas.
  • Fig. 4A is a schematic diagram of the original image
  • Fig. 4B is a frequency domain diagram of Fig. 4A after discrete cosine transform (Discrete Cosine Transform, DCT for short).
  • as shown in FIG. 4B, the original image of FIG. 4A has non-zero coefficients only in the upper-left corner region after the DCT transform.
  • in FIG. 4B the DCT is performed on the entire image, but in video coding and decoding an image is divided into blocks for processing, so the transform is also performed block by block.
  • The most commonly used transforms in video compression standards include DCT-II, DCT-VIII and DST-VII (DST: discrete sine transform).
  • the transform basis functions are: DCT-II: T_i(j) = ω_0 · √(2/N) · cos(π·i·(2j+1) / (2N)); DCT-VIII: T_i(j) = √(4/(2N+1)) · cos(π·(2i+1)·(2j+1) / (4N+2)); DST-VII: T_i(j) = √(4/(2N+1)) · sin(π·(2i+1)·(j+1) / (2N+1)); where T_i(j) is the transformed coefficient, N is the point number of the original signal, i, j = 0, 1, …, N−1, and ω_0 is the compensation coefficient, equal to √(1/2) for i = 0 and to 1 otherwise.
  • DCT-II, DCT-VIII and DST-VII are all applied separably: the two-dimensional transform is split into a horizontal direction and a vertical direction and performed as two one-dimensional transforms, for example first the horizontal transform and then the vertical transform, or first the vertical transform and then the horizontal transform.
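  • A sketch of this separable application using the orthonormal DCT-II basis from the formulas above (real codecs use scaled integer approximations of these matrices):

```python
import numpy as np

def dct2_basis(n):
    """Rows T_i(j) of the orthonormal N-point DCT-II basis."""
    i = np.arange(n).reshape(-1, 1)
    j = np.arange(n).reshape(1, -1)
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * i * (2 * j + 1) / (2 * n))
    basis[0, :] *= np.sqrt(0.5)  # omega_0 compensation for i = 0
    return basis

def separable_transform(residual):
    a = dct2_basis(residual.shape[0])
    tmp = residual @ a.T  # 1-D transform along the horizontal direction
    return a @ tmp        # then along the vertical direction

res = np.arange(16, dtype=float).reshape(4, 4)   # hypothetical residual block
coeff = separable_transform(res)
a = dct2_basis(4)
print(np.allclose(a.T @ coeff @ a, res))         # orthonormal basis inverts exactly
```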
  • the above transformation method is more effective for horizontal and vertical textures and is very useful for improving compression efficiency.
  • the effect of the above transformation method on oblique textures is worse. Therefore, as the demand for compression efficiency continues to increase, compression efficiency can be further improved if oblique textures can be processed more effectively.
  • a secondary transformation is therefore used: after the above-mentioned DCT-II, DCT-VIII, DST-VII or other basic transformation (primary transform), the frequency-domain signal is subjected to a second transformation that converts the signal from one transform domain to another, after which quantization, entropy coding and other operations are performed; the purpose is to further remove statistical redundancy.
  • the low frequency non-separable transform (LFNST for short) is a reduced secondary transform.
  • LFNST is used after the primary transform and before quantization.
  • LFNST is used after inverse quantization and before inverse basis transform.
  • LFNST performs a secondary transformation on the low-frequency coefficients in the upper left corner after the basic transformation.
  • the base transform concentrates energy to the upper left corner by decorrelating the image.
  • the secondary transform then decorrelates the low-frequency coefficients of the base transform.
  • 16 coefficients are input to the 4x4 LFNST transformation kernel, and the output is 8 coefficients; 64 coefficients are input to the 8x8 LFNST transformation kernel, and the output is 16 coefficients.
  • 8 coefficients are input to the 4x4 inverse LFNST transform kernel, and the output is 16 coefficients; 16 coefficients are input to the 8x8 inverse LFNST transform kernel, and the output is 64 coefficients.
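  • This dimension change can be pictured as a plain matrix multiplication on flattened coefficients. A toy sketch with a random stand-in kernel (the standard's actual LFNST kernels are trained matrices with near-orthonormal rows, which is why the decoder can apply the transpose):

```python
import numpy as np

rng = np.random.default_rng(0)
kernel_8x16 = rng.standard_normal((8, 16))    # stand-in for a 4x4 LFNST kernel

low_freq = rng.standard_normal((4, 4))        # top-left base-transform coefficients
forward = kernel_8x16 @ low_freq.flatten()    # encoder side: 16 in -> 8 out
inverse = kernel_8x16.T @ forward             # decoder side: 8 in -> 16 out
print(forward.shape, inverse.reshape(4, 4).shape)  # (8,) (4, 4)
```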
  • the LFNST includes 4 sets of transformation kernels.
  • the base images corresponding to these 4 sets of transformation kernels are shown in FIG. 6 , and some obvious oblique textures can be seen.
  • Intra prediction uses the reconstructed pixels around the current block as a reference to predict the current block. Since current video is coded from left to right and from top to bottom, the reference pixels available to the current block are usually on the left and the top.
  • the intra-frame prediction modes are shown, among which, apart from mode 0 (Planar) and mode 1 (DC), there are 65 angle prediction modes.
  • Planar usually handles some gradient textures
  • DC usually handles some flat areas
  • blocks with obvious angle textures usually use intra-frame angle prediction.
  • the wide-angle prediction mode can also be used for non-square blocks, and the wide-angle prediction mode makes the angle of prediction wider than that of square blocks.
  • 2 to 66 are angle prediction modes corresponding to the prediction mode of a square block, and -1 to -14 and 67 to 80 represent extended angle prediction modes in the wide angle prediction mode.
  • Angle prediction tiles the reference pixels onto the current block as the prediction value according to the specified angle, which means that the prediction block will have an obvious directional texture, and the residual of the current block after angle prediction will also statistically reflect obvious angular characteristics. Therefore, the transformation kernel selected by LFNST can be bound to the intra prediction mode; that is, once the intra prediction mode is determined, LFNST can only use the set of transformation kernels corresponding to that intra prediction mode.
  • LFNST has a total of 4 groups of transformation kernels, each group containing 2 transformation kernels, and the correspondence between the intra prediction mode and the transformation kernel group is shown in Table 1, where IntraPredMode represents the intra prediction mode and Tr. set index represents the index of the transformation kernel group:
  • Table 1:
    IntraPredMode < 0          -> Tr. set index 1
    0 <= IntraPredMode <= 1    -> Tr. set index 0
    2 <= IntraPredMode <= 12   -> Tr. set index 1
    13 <= IntraPredMode <= 23  -> Tr. set index 2
    24 <= IntraPredMode <= 44  -> Tr. set index 3
    45 <= IntraPredMode <= 55  -> Tr. set index 2
    56 <= IntraPredMode <= 80  -> Tr. set index 1
    81 <= IntraPredMode <= 83  -> Tr. set index 0
  • cross-component prediction modes used for chroma intra prediction are 81 to 83, and these modes are not available for luma intra prediction.
  • the transformation kernel of LFNST can be transposed to handle more angles with one transformation kernel group.
  • for example, modes 13 to 23 and modes 45 to 55 both correspond to transformation kernel group 2, but 13 to 23 are clearly close to the horizontal mode while 45 to 55 are clearly close to the vertical mode; the transformation and inverse transformation corresponding to modes 45 to 55 therefore need to be matched by transposition.
  • LFNST has 4 groups of transformation kernels in total, and which group LFNST uses is specified by the intra prediction mode, as sketched below. In this way, the correlation between the intra-frame prediction mode and the transform kernel of the LFNST is exploited, reducing the transmission of LFNST kernel selection information in the code stream and thereby saving bits. Whether the current block uses LFNST at all, and, if LFNST is used, whether the first or the second kernel in the group is used, is determined from the code stream or from other conditions.
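  • A sketch of that selection, assuming the VVC-style mapping of Table 1 (the function name and the exact transpose rule, applied to angular modes above the diagonal mode 34, are illustrative):

```python
def lfnst_tr_set(intra_pred_mode):
    """Return (transform kernel group index, transpose flag) per Table 1."""
    transpose = 35 <= intra_pred_mode <= 80   # angular modes above the diagonal
    if intra_pred_mode in (0, 1) or 81 <= intra_pred_mode <= 83:
        return 0, False                       # Planar, DC, cross-component modes
    if intra_pred_mode < 2 or intra_pred_mode >= 56:
        return 1, transpose                   # wide angles and modes 56..80
    if 2 <= intra_pred_mode <= 12:
        return 1, False
    if 13 <= intra_pred_mode <= 23 or 45 <= intra_pred_mode <= 55:
        return 2, transpose                   # near-horizontal / near-vertical
    return 3, transpose                       # modes 24..44 around the diagonal

print(lfnst_tr_set(18), lfnst_tr_set(50))     # (2, False) (2, True)
```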
  • LFNST can improve the compression efficiency of the residual of oblique textures, but at present it can only be applied to intra-coded blocks. This is because oblique textures have many possible orientations, and LFNST uses multiple transformation kernels to handle them. Although LFNST applies some clustering, that is, textures with similar angles share one transformation kernel, it still requires multiple sets of transformation kernels. By using the correlation between the intra prediction mode and the residual, the transformation kernel of LFNST can be determined directly from the intra prediction mode, which reduces the transmission of LFNST kernel selection information in the code stream, thereby saving bits.
  • the embodiment of the present application determines the transformation kernel corresponding to the current block according to the prediction block of the current block, and uses that transformation kernel to perform an inverse transformation on the decoded target transformation coefficient of the current block, reducing the transmission of transformation-kernel selection information in the code stream, lowering the transform overhead in the code stream and improving the compression efficiency of the current block.
  • the video decoding method provided in the embodiment of the present application is introduced by taking the decoding end as an example.
  • FIG. 9 is a schematic flowchart of a video decoding method provided by an embodiment of the present application
  • FIG. 10 is a schematic diagram of a video decoding process involved in an embodiment of the present application.
  • the embodiment of the present application is applied to the video decoder shown in FIG. 1 and FIG. 2 .
  • the method of the embodiment of the present application includes:
  • the current block may also be referred to as a current decoding block, a current decoding unit, a decoding block, a block to be decoded, a current block to be decoded, and the like.
  • the current block when the current block includes a chroma component but does not include a luma component, the current block may be called a chroma block.
  • the current block when the current block includes a luma component but does not include a chroma component, the current block may be called a luma block.
  • the above-mentioned target transformation coefficient may be a basic transformation coefficient, and the basic transformation coefficient is also referred to as an initial transformation coefficient or a first transformation coefficient, and the like.
  • the above-mentioned target transformation coefficient is a transformation coefficient formed by the encoding end through secondary transformation on the residual block of the current block, specifically, the encoding end performs basic transformation on the residual block of the current block to obtain the basic transformation coefficient, Then perform secondary transformation on the basic transformation coefficients to obtain the target transformation coefficients of the current block.
  • the target transform coefficient at this time is also referred to as a secondary transform coefficient or a second transform coefficient.
  • the ways in which the decoding end decodes the code stream in S401 above to obtain the target transform coefficient of the current block include but are not limited to the following:
  • Mode 1: the encoder does not quantize the target transform coefficients during encoding, but directly encodes them to obtain the code stream. In this case, the decoding end decodes the code stream and can obtain the target transformation coefficient of the current block directly from it.
  • Mode 2: during encoding, the encoding end quantizes the target transform coefficients to obtain quantized coefficients, and then encodes the quantized coefficients to obtain the code stream. In this case, the decoding end decodes the code stream to obtain the quantization coefficient of the current block, and dequantizes the quantization coefficient to obtain the target transformation coefficient of the current block; a sketch of both cases follows.
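  • A minimal sketch of the two decoding cases (the function and the toy QP-to-step rule are illustrative, matching the encoder sketch earlier):

```python
import numpy as np

def decode_target_coeffs(parsed_values, quantized, qp=None):
    """Mode 1: parsed values are the coefficients themselves.
    Mode 2: parsed values are quantized levels needing inverse quantization."""
    if not quantized:
        return np.asarray(parsed_values, dtype=float)
    qstep = 2.0 ** ((qp - 4) / 6.0)  # same toy step size as the encoder
    return np.asarray(parsed_values) * qstep

print(decode_target_coeffs([12, -3, 0, 1], quantized=True, qp=22))
```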
  • intra-frame prediction is performed on the current block to obtain a predicted block of the current block.
  • the prediction mode used for the intra prediction of the current block is not limited here, and is specifically determined according to actual conditions.
  • inter-frame prediction is performed on the current block to obtain a prediction block of the current block.
  • the prediction mode used for the inter-frame prediction of the current block is not limited here, and is specifically determined according to actual conditions.
  • the decoder can determine the prediction block of the current block at least through the following several examples.
  • the code stream includes the indication information of the inter-frame prediction mode corresponding to the current block; according to this indication information, the corresponding inter-frame mode is selected to perform inter-frame prediction on the current block and obtain the prediction block of the current block.
  • the decoding end determines the inter-frame prediction mode corresponding to the current block according to the same inter-frame mode determination rule as the encoding end, and uses the determined inter-frame prediction mode to perform inter-frame prediction on the current block to obtain the prediction block of the current block.
  • an autoencoder is used to perform inter-frame prediction to obtain a prediction block of the current block.
  • the autoencoder is a neural network model trained on prediction blocks obtained through inter-frame prediction, and can implement inter-frame prediction.
  • the autoencoder includes an encoding network and a decoding network.
  • the encoding end inputs the current block into the encoding network to obtain the characteristic information of the current block, and then inputs the characteristic information of the current block into the decoding network to obtain the prediction block output by the decoding network.
  • the residual block is obtained by subtracting the prediction block from the original value of the current block, and the residual block is transformed twice and then quantized to form a code stream.
  • the encoding end also writes the characteristic information of the current block into the code stream.
  • the decoding end decodes the code stream, obtains the characteristic information of the current block, and takes the same measures as the encoding end: it inputs the characteristic information of the current block into the decoding network of the autoencoder to obtain the prediction block of the current block, which is the prediction block obtained by performing inter-frame prediction on the current block.
  • the inter-frame prediction is performed on the current block to obtain the prediction block of the current block, including but not limited to the above several methods.
  • the inter-frame prediction mode is taken as an example above.
  • the method shown in the above examples may also be used to determine the prediction block of the current block.
  • the texture of the residual block has a certain correlation with the texture of the prediction block. Taking inter-frame coding as an example, for a block using inter-frame coding, the texture of the residual block is correlated with the texture of the prediction block produced by inter-frame prediction. For example, residuals usually appear at the edges of objects, and object edges show obvious gradient features in the prediction block. As another example, for gradually changing textures, such as the folds of clothes, the texture of the residual often has the same or a similar direction as the texture in the prediction block. Therefore, the embodiment of the present application determines or guides or assists the selection of the transformation kernel according to the characteristics of the prediction block.
  • the transformation kernel corresponding to the current block determined in the embodiment of the present application may be one transformation kernel or a group of transformation kernels, where a group of transformation kernels includes at least two transformation kernels, for example, 2, 3 or more transformation kernels.
  • the selection of the transformation kernel is related to the texture information of the prediction block, so that the transformation kernel corresponding to the current block can be determined by determining the texture information of the prediction block.
  • S403 includes the following S403-A and S403-B:
  • the texture information of the prediction block includes any information that can represent the texture feature of the prediction block, such as the texture direction of the prediction block, the texture size of the prediction block, and the like.
  • the spatial gray level co-occurrence matrix is used to represent the texture information of the prediction block.
  • the gradient information of the prediction block is used to represent the texture information of the prediction block, where the gradient information of the prediction block may indicate the texture change trend of the prediction block, that is, the texture change direction.
  • the above S403-A includes S403-A1:
  • S403-B includes S403-B1:
  • the gradient information of the prediction block includes at least one of a gradient direction and a gradient magnitude of the prediction block.
  • the ways of determining the gradient information of the prediction block in the above S403-A1 include but are not limited to the following:
  • the neural network model takes an image block as an input and is trained with the gradient information of the image block as a constraint, and can be used to predict the gradient information of the image block.
  • the present application does not limit the specific network structure of the neural network model, which can be determined according to actual needs, for example, it can be an image convolutional neural network, an adversarial neural network, and the like.
  • the prediction block is regarded as a kind of image block, and the decoding end can input the prediction block into the neural network model to obtain the gradient information of the prediction block output by the neural network model.
  • the gradient information of the predicted block may be determined by the gradient of all or part of the pixels in the predicted block.
  • the above S403-A1 includes the following S403-A11 and S403-A12:
  • S403-A12. Determine the gradient information of the prediction block according to the gradients of the N points.
  • the foregoing N points may be all pixel points in the prediction block.
  • the above N points may be some pixels in the prediction block.
  • for example, the N points selected for determining the gradient information of the prediction block are the pixel points in the prediction block other than the outermost layer of pixel points.
  • alternatively, the above N points are pixels obtained by sampling the pixels in the prediction block with a certain sampling method, for example, sampling every other pixel; a sketch of these choices follows.
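  • A sketch of these three ways of choosing the N points (the mode names are illustrative):

```python
import numpy as np

def gradient_sample_points(pred_block, mode="interior"):
    """Choose the N points used for the block's gradient statistics."""
    h, w = pred_block.shape
    if mode == "all":            # every pixel in the prediction block
        ys, xs = np.mgrid[0:h, 0:w]
    elif mode == "interior":     # drop the outermost layer of pixels
        ys, xs = np.mgrid[1:h - 1, 1:w - 1]
    elif mode == "subsample":    # keep every other pixel
        ys, xs = np.mgrid[0:h:2, 0:w:2]
    else:
        raise ValueError(mode)
    return list(zip(ys.ravel(), xs.ravel()))

blk = np.zeros((4, 4))
for m in ("all", "interior", "subsample"):
    print(m, len(gradient_sample_points(blk, m)))  # 16, 4, 4
```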
  • the gradient of at least one of the N points includes a horizontal gradient and/or a vertical gradient, where the horizontal gradient can be understood as the gradient of the point in the horizontal direction, and the vertical gradient as the gradient of the point in the vertical direction. The two gradients can be calculated separately or together.
  • the ways of determining the gradients of N points in the prediction block in the above S403-A11 include but are not limited to the following ways:
  • Way 1 for each of the N points, calculate the gradient of the prediction block at that point, such as the gradient magnitude and gradient direction, where the gradient magnitude is also referred to as the magnitude.
  • the gradient of the prediction block at this point is recorded as the gradient of this point.
  • the prediction block is treated as an image block whose image function is f(x, y).
  • at the i-th point (x, y), the gradient of the image function f(x, y) is a vector with a definite magnitude and direction, with components Gx and Gy, where Gx represents the gradient of the image function f(x, y) in the x direction (i.e., the horizontal gradient), and Gy represents the gradient in the y direction (i.e., the vertical gradient).
  • the gradient vector can be expressed as shown in formula (4): ∇f(x, y) = (Gx, Gy) = (∂f/∂x, ∂f/∂y).
  • the magnitude of the gradient vector, calculated by formula (5): |∇f(x, y)| = √(Gx² + Gy²), is recorded as the gradient magnitude of the i-th point.
  • the direction angle of the gradient vector, calculated by formula (6): θ = arctan(Gy/Gx), is recorded as the gradient direction of the i-th point.
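  • A sketch of formulas (4) to (6) on a hypothetical prediction block (np.gradient computes per-pixel finite differences, vertical axis first):

```python
import numpy as np

pred = np.arange(16, dtype=float).reshape(4, 4)  # hypothetical prediction block
gy, gx = np.gradient(pred)                       # vertical (Gy) and horizontal (Gx)
magnitude = np.sqrt(gx ** 2 + gy ** 2)           # formula (5)
direction = np.arctan2(gy, gx)                   # formula (6), in radians
print(magnitude[1, 1], direction[1, 1])
```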
  • the gradient of each of the N points can be calculated, for example, the magnitude and direction of the gradient of each point can be calculated.
  • the second way is to use the neural network model to determine the gradient of each point in the N points.
  • the neural network model is trained with the original values of multiple pixels in an image as input, constrained by the ground-truth gradients of those pixels, so that the neural network model can predict the gradients of the pixels in an image.
  • the present application does not limit the specific network structure of the neural network model, which can be determined according to actual needs, for example, it can be an image convolutional neural network, an adversarial neural network, and the like.
  • the above N points are input into the neural network model to obtain the gradient of each of the N points output by the neural network model, where the obtained gradient of each of the N points includes at least one of the horizontal gradient and the vertical gradient.
  • Way 3: determine the gradient of each of the N points according to the pixel values of its adjacent points.
  • the horizontal gradient and the vertical gradient are calculated separately; that is, if the gradients of the N points include the horizontal gradient and the vertical gradient, the above S403-A11 includes:
  • S403-A111. For the i-th point among the N points, determine the horizontal gradient of the i-th point according to the pixel values of adjacent points of the i-th point in the horizontal direction of the prediction block, where i is a positive integer less than or equal to N.
  • S403-A112. Determine the vertical gradient of the i-th point according to the pixel values of adjacent points of the i-th point in the vertical direction of the prediction block.
  • the following introduces the process of determining the horizontal gradient and the vertical gradient of the i-th point; the gradient determination process for the other points among the N points is the same as that for the i-th point.
  • in the process of determining the horizontal gradient of the i-th point, the pixel values of the adjacent points of the i-th point in the horizontal direction are obtained first; there may be two or more such adjacent points.
  • among these adjacent points, the point nearest to the i-th point may or may not be immediately adjacent to the i-th point.
  • the above-mentioned adjacent points are all located on the left side of the i-th point, or all are located on the right side of the i-th point.
  • alternatively, some adjacent points are located on the left side of the i-th point and some on the right side, and the numbers of adjacent points on the two sides can be the same or different; for example, if the i-th point has 4 adjacent points in the horizontal direction, 3 of them may be located on its left side and 1 on its right side, or 2 on its left side and 2 on its right side.
  • the gradient of the i-th point in the horizontal direction is determined according to the variation of the pixel values of its adjacent points in the horizontal direction. For example, if the pixel values of the adjacent points in the horizontal direction differ little from the pixel value of the i-th point, the horizontal texture of the prediction block does not change abruptly at the i-th point, and the i-th point is determined to have a smaller horizontal gradient.
  • conversely, if the difference is large, the horizontal texture of the prediction block changes abruptly at the i-th point, and the i-th point is determined to have a larger horizontal gradient.
  • the determination process of the vertical gradient at the i-th point is basically the same as that of the above-mentioned horizontal gradient at the i-th point.
  • the adjacent points can be two or more; among these adjacent points, the point nearest to the i-th point may or may not be immediately adjacent to the i-th point.
  • the above-mentioned adjacent points are all located on the upper side of the i-th point, or all are located on the lower side of the i-th point.
  • alternatively, some adjacent points are located on the upper side of the i-th point and some on the lower side, and the numbers of adjacent points on the two sides can be the same or different; for example, if the i-th point has 4 adjacent points in the vertical direction, 3 of them may be located on its upper side and 1 on its lower side, or 2 on its upper side and 2 on its lower side.
  • the gradient of the i-th point in the vertical direction is determined according to the variation of the pixel values of its adjacent points in the vertical direction. For example, if the pixel values of the adjacent points in the vertical direction differ little from the pixel value of the i-th point, the vertical texture of the prediction block does not change abruptly at the i-th point, and the vertical gradient of the i-th point is determined to be smaller.
  • conversely, if the difference is large, the vertical texture of the prediction block changes abruptly at the i-th point, and the vertical gradient of the i-th point is determined to be larger.
  • implementations of the above S403-A111 include but are not limited to the following:
  • Method 1: for each adjacent point of the i-th point in the horizontal direction of the prediction block, calculate the difference between the pixel value of that adjacent point and the pixel value of the i-th point, and determine the sum of the differences as the horizontal gradient of the i-th point; alternatively, determine the average of the differences as the horizontal gradient of the i-th point.
  • Method 2: if the adjacent points of the i-th point in the horizontal direction of the prediction block include the left adjacent point and the right adjacent point of the i-th point, the ratio of the difference between the pixel value of the right adjacent point and the pixel value of the left adjacent point to 2 is determined as the horizontal gradient of the i-th point.
  • implementations of the above S403-A112 include but are not limited to the following:
  • Method 1: for each adjacent point of the i-th point in the vertical direction of the prediction block, calculate the difference between the pixel value of that adjacent point and the pixel value of the i-th point, and determine the sum of the differences as the vertical gradient of the i-th point; alternatively, determine the average of the differences as the vertical gradient of the i-th point.
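  • as a sketch of the central-difference variants (Method 2 above for the horizontal gradient, and its vertical counterpart described later in this application), assuming the N points are the interior pixels of the prediction block so that every point has the required neighbours; names are illustrative:

```python
import numpy as np

def central_difference_gradients(pred_block):
    f = pred_block.astype(np.float64)
    # Interior points only: the outermost layer of pixels is excluded,
    # matching the option described earlier.
    grad_hor = (f[1:-1, 2:] - f[1:-1, :-2]) / 2.0   # (right - left) / 2
    grad_ver = (f[2:, 1:-1] - f[:-2, 1:-1]) / 2.0   # (lower - upper) / 2
    return grad_hor, grad_ver
```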
  • the gradients of the N points in the prediction block are determined.
  • then S403-A12 is executed: determine the gradient information of the prediction block according to the gradients of the N points.
  • the ways of determining the gradient information of the predicted block according to the gradients of the N points in the above S403-A12 include but are not limited to the following:
  • Way 1: the gradients of the N points are determined as the gradient information of the prediction block; that is, the gradient information of the prediction block includes the gradients of the N points, the texture direction in the prediction block is estimated according to the gradients of the N points, and the transformation kernel is then selected according to the texture direction. For example, if the horizontal gradient and the vertical gradient of all or most of the N points are the same or approximately the same, it can be estimated that the texture in the prediction block tends to 45°, and the transformation kernel or transformation kernel group that is most effective for processing 45° texture can be selected accordingly.
  • Way 2: determine the sum grad_hor_sum of the horizontal gradients of the N points and the sum grad_ver_sum of the vertical gradients of the N points, and determine the gradient information grad_para of the prediction block according to grad_hor_sum and grad_ver_sum.
  • specifically, the horizontal gradients of the N points determined in the above S403-A11 are summed to obtain grad_hor_sum; the vertical gradients of the N points determined in the above S403-A11 are summed to obtain grad_ver_sum; and grad_para is then determined according to grad_hor_sum and grad_ver_sum.
  • in one way, grad_hor_sum and grad_ver_sum are determined as grad_para; that is, the gradient information grad_para of the prediction block includes the sum grad_hor_sum of the horizontal gradients of the N points and the sum grad_ver_sum of the vertical gradients of the N points, and the transformation kernel corresponding to the current block is determined according to the magnitudes of grad_hor_sum and grad_ver_sum. For example, when grad_hor_sum and grad_ver_sum are equal or approximately equal, it can be estimated that the texture in the prediction block tends to 45°, and the transformation kernel or transformation kernel group that is most effective for processing 45° texture can be selected accordingly.
  • when at least one of grad_hor_sum and grad_ver_sum is relatively small, i.e., less than a certain value, it can be estimated that the texture in the prediction block tends to be horizontal or vertical, and a transformation kernel or transformation kernel group suited to textures with no obvious direction can be selected accordingly.
  • in another way, the gradient information grad_para of the prediction block is determined as the ratio of grad_hor_sum to grad_ver_sum; if the sum grad_ver_sum of the vertical gradients of the N points is equal to 0, the gradient information grad_para of the prediction block is determined to be 0, as in the sketch below.
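  • a minimal sketch of this ratio-based form of Way 2, assuming the signed gradients are summed directly (the negative intervals in Table 2 below suggest grad_para keeps its sign):

```python
def gradient_info(grad_hor, grad_ver):
    # grad_hor and grad_ver are the per-point gradient arrays of the N points.
    grad_hor_sum = float(grad_hor.sum())
    grad_ver_sum = float(grad_ver.sum())
    if grad_ver_sum == 0:
        return 0.0                        # grad_para is defined as 0 here
    return grad_hor_sum / grad_ver_sum    # grad_para
```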
  • S403-B1 is executed, that is, the transform kernel corresponding to the current block is determined according to the gradient information of the predicted block.
  • the above S403-B1 includes the following steps from S403-B11 to S403-B13:
  • a correspondence between prediction block gradients and transformation kernels is constructed in advance, and the correspondence includes the transformation kernels or transformation kernel groups corresponding to the gradients of prediction blocks of different sizes, for example as shown in Table 2 below.
  • Table 2
    grad_para        Tr.set index
    (-1/2, 1/2)      2
    [1/2, 2)         3
    [2, +∞)          2
    (-∞, -2]         2
    (-2, -1/2]       1
  • Tr.set index represents the transformation kernel group index
  • grad_para represents the gradient information of the prediction block.
  • as shown in Table 2, when the gradient information grad_para of the prediction block falls in (-1/2, 1/2), [2, +∞) or (-∞, -2], it corresponds to transformation kernel group 2; when grad_para falls in [1/2, 2), it corresponds to transformation kernel group 3; and when grad_para falls in (-2, -1/2], it corresponds to transformation kernel group 1.
  • the above Table 2 is only an example; in practical applications, the correspondence between the gradient information grad_para of the prediction block and the transformation kernel group includes but is not limited to the above Table 2.
  • optionally, a transposition operation may also be used. For example, both (-1/2, 1/2) and [2, +∞) in Table 2 correspond to transformation kernel group 2, but (-1/2, 1/2) additionally requires a transposition in actual use: the transform coefficient 0 corresponding to (-1/2, 1/2) is inversely transformed using transformation kernel group 2 to obtain transform coefficient 1, and transform coefficient 1 is then transposed to obtain transform coefficient 2.
  • whether to perform the transposition may be agreed in advance.
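  • combining Table 2 with the transposition convention just described gives the following sketch, which assumes the transposition applies only to the (-1/2, 1/2) interval and is purely illustrative:

```python
def select_kernel_group(grad_para):
    """Return (Tr.set index, needs_transpose) per the Table 2 example."""
    if -0.5 < grad_para < 0.5:
        return 2, True          # shares group 2 with [2, +inf), used transposed
    if 0.5 <= grad_para < 2:
        return 3, False
    if grad_para >= 2 or grad_para <= -2:
        return 2, False
    return 1, False             # remaining interval (-2, -1/2]
```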
  • optionally, the transformation kernel (such as the LFNST transformation kernel) corresponding to inter-frame coding in the present application can reuse the transformation kernel (such as the LFNST transformation kernel) corresponding to intra-frame coding; for example, the transformation kernels in the above Table 2 reuse the transformation kernels in the above Table 1. In this way, there is no need for additional storage space for transformation kernels or additional logic.
  • optionally, the transformation kernel (such as the LFNST transformation kernel) corresponding to inter-frame coding in the present application may partially reuse the transformation kernel (such as the LFNST transformation kernel) corresponding to intra-frame coding; for example, the transformation kernels in the above Table 2 partially reuse the transformation kernels in the above Table 1.
  • optionally, the transformation kernel (such as the LFNST transformation kernel) corresponding to inter-frame coding in the present application does not reuse the transformation kernel (such as the LFNST transformation kernel) corresponding to intra-frame coding; for example, the transformation kernels in the above Table 2 do not reuse the transformation kernels in the above Table 1.
  • in the above correspondence, the transformation kernel group corresponding to the gradient information of the prediction block of the current block can be queried.
  • for example, if the gradient information of the prediction block of the current block determined according to the above method is 1, then since 1 falls in the interval [1/2, 2) and [1/2, 2) corresponds to transformation kernel group 3, the transformation kernel corresponding to the current block is determined to be transformation kernel group 3.
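  • with the lookup sketched after Table 2 above, select_kernel_group(1.0) would return (3, False), matching this example: gradient information 1 falls in [1/2, 2), which maps to transformation kernel group 3 without transposition.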
  • optionally, the sizes of the transformation kernels corresponding to blocks of different sizes are also different; for example, the size of the transformation kernel corresponding to an 8×8 block is 4×4, and the size of the transformation kernel corresponding to a 16×16 block is 8×8.
  • optionally, transformation kernels of different sizes correspond to different preset correspondences between prediction block gradients and transformation kernels; for example, the 8×8 transformation kernel corresponds to preset correspondence 1 between prediction block gradients and transformation kernels, and the 4×4 transformation kernel corresponds to preset correspondence 2 between prediction block gradients and transformation kernels.
  • the above S403-B12 includes:
  • This application does not limit the size of the transformation kernel corresponding to the current block. For example, when at least one of the width and height of the current block is less than 8, the size of the transformation kernel corresponding to the current block is determined to be 4×4; when the width and height of the current block are both greater than 4, the size of the transformation kernel corresponding to the current block is determined to be 8×8; and when at least one of the width and height of the current block is less than 8 while both are greater than 4, the size of the transformation kernel corresponding to the current block can be either 4×4 or 8×8.
  • statistically, the transformed and quantized residual coefficients of intra coding are more numerous than those of inter coding.
  • the intra-coded residual is statistically more complex than the inter-coded residual.
  • this is determined by the difference between the intra-frame prediction method and the inter-frame prediction method.
  • on the one hand, intra-frame prediction uses spatial correlation while inter-frame prediction uses temporal correlation; on the other hand, in the most commonly used Random Access configuration, the intra frame is usually used as the bottom-layer reference frame of the GOP (Group of Pictures) structure, so its quality requirements are relatively high, while some inter frames have relatively low quality requirements. Therefore, the usage conditions of the transformation kernel for inter-frame transformation and the transformation kernel for intra-frame transformation can be differentiated.
  • for example, for intra-frame coding, the transformation kernel of the 4×4 transform (such as LFNST) is used for small blocks, that is, blocks for which at least one of the width and height is less than 8, and the transformation kernel of the 8×8 transform (such as LFNST) is used for larger blocks, that is, blocks whose width and height are both greater than 4.
  • for inter-frame coding, the transformation kernel of the 4×4 transform (such as LFNST) can instead be used for blocks for which at least one of the width and height is less than 16, and the transformation kernel of the 8×8 transform (such as LFNST) for larger blocks, such as blocks whose width and height are both greater than 8.
  • for example, if at least one of the width and height of the current block is less than 16, it is determined that the size of the transformation kernel corresponding to the current block is 4×4, and the preset correspondence between prediction block gradients and transformation kernels corresponding to the 4×4 transformation kernel is obtained; then, in the correspondence corresponding to the 4×4 transformation kernel, the target transformation kernel corresponding to the gradient information of the prediction block is determined, and the target transformation kernel is determined as the transformation kernel corresponding to the current block.
  • if the width and height of the current block are both greater than 8, it is determined that the size of the transformation kernel corresponding to the current block is 8×8, and the preset correspondence between prediction block gradients and transformation kernels corresponding to the 8×8 transformation kernel is obtained; then, in the correspondence corresponding to the 8×8 transformation kernel, the target transformation kernel corresponding to the gradient information of the prediction block is determined, and the target transformation kernel is determined as the transformation kernel corresponding to the current block (see the sketch below).
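  • the size-dependent selection just described can be sketched as follows, assuming the inter-frame thresholds above and two correspondence callables (e.g. two variants of select_kernel_group) standing in for preset correspondences 1 and 2; a block such as 12×12 satisfies both conditions, and this sketch arbitrarily prefers 4×4 there:

```python
def select_transform_kernel(width, height, grad_para, corr_4x4, corr_8x8):
    # 4x4 kernel when at least one of width/height is less than 16,
    # otherwise (both >= 16 > 8) the 8x8 kernel.
    corr = corr_4x4 if (width < 16 or height < 16) else corr_8x8
    return corr(grad_para)   # target transformation kernel (group)
```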
  • the above describes the process of determining the transform kernel corresponding to the current block according to the texture information of the predicted block.
  • the transformation kernel corresponding to the current block may also be determined according to the neural network model.
  • the above S403 includes the following S403-1 and S403-2:
  • S403-1. Input the prediction block into a pre-trained model to obtain the transformation kernel indication information corresponding to the current block output by the model, where the transformation kernel indication information is used to indicate the transformation kernel corresponding to the current block, and the transformation kernel corresponding to the current block can be one transformation kernel or a group of transformation kernels.
  • S403-2. Determine the transformation kernel corresponding to the current block according to the transformation kernel indication information.
  • the model used here is pre-trained: the prediction blocks of image blocks are used as input, and the model is trained against the true values of the transformation kernel indication information corresponding to those image blocks, so that the trained model can predict transformation kernels.
  • in actual use, the decoding end can input the prediction block into the trained model to obtain the transformation kernel indication information corresponding to the current block output by the model, where the transformation kernel indication information is used to indicate the transformation kernel corresponding to the current block, and then determine the transformation kernel corresponding to the current block according to the transformation kernel indication information.
  • the embodiment of the present application does not limit the specific network structure of the model; it can be, for example, any image recognition neural network, such as an image convolutional neural network or an adversarial neural network.
  • in some embodiments, in order to reduce the amount of calculation and the complexity of the model, the prediction block is first down-sampled before being input into the model, so as to reduce the data volume and complexity of the prediction block; the down-sampled prediction block is then input into the pre-trained model, which can improve the efficiency with which the model predicts the transformation kernel indication information corresponding to the current block. A sketch follows.
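  • a sketch of this down-sampling step, assuming simple 2×2 average pooling and a hypothetical pre-trained callable model (neither of which is mandated by the application):

```python
import numpy as np

def predict_kernel_indication(pred_block, model):
    f = pred_block.astype(np.float64)
    h, w = f.shape
    # 2x2 average pooling halves each dimension, reducing the data volume
    # and complexity of the prediction block fed to the model.
    pooled = f[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return model(pooled)   # transformation kernel indication information
```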
  • the above steps introduce the process of determining the transformation kernel corresponding to the current block according to the prediction block of the current block. After the transformation kernel corresponding to the current block is determined, the following steps are performed.
  • in one possible case, the target transform coefficient is a basic transform coefficient, and the target transform coefficient is inversely transformed using the transformation kernel to obtain the residual block of the current block.
  • the above-mentioned S404 includes the following steps:
  • where T is the transformation kernel, i.e., the transformation matrix; based on the transformation kernel T, inverse secondary transformation can be performed on the target transform coefficients to obtain the basic transform coefficients of the current block.
  • the basic transformation coefficients are subjected to inverse basic transformation to obtain the residual block of the current block.
  • in one way, the decoding end uses the DCT-II transformation mode shown in the above formula (1) to perform inverse basic transformation on the above basic transform coefficients to obtain the residual block of the current block.
  • in another way, the decoding end uses the DCT-VIII transformation mode shown in the above formula (2) to perform inverse basic transformation on the above basic transform coefficients to obtain the residual block of the current block.
  • in another way, the decoding end uses the DST-VII transformation mode shown in the above formula (3) to perform inverse basic transformation on the above basic transform coefficients to obtain the residual block of the current block.
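  • as a minimal sketch, the inverse secondary transformation can be viewed as a matrix product, assuming the kernel T is an orthogonal transformation matrix applied to a flattened coefficient vector (as is typical of LFNST-style transforms); the inverse basic transform (DCT-II/DCT-VIII/DST-VII) is then applied to the result:

```python
import numpy as np

def inverse_secondary_transform(target_coeff_vec, T):
    # If the forward secondary transform is target = T @ basic and T is
    # orthogonal (an assumption of this sketch), then basic = T^T @ target.
    return T.T @ target_coeff_vec
```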
  • the prediction block is added to the residual block to obtain the reconstructed block of the current block.
  • in the embodiment, the target transform coefficients of the current block are obtained by decoding the code stream; the prediction block of the current block is obtained by predicting the current block; the transformation kernel corresponding to the current block is determined according to the prediction block; the target transform coefficients are inversely transformed according to the transformation kernel, and the residual block of the current block is obtained according to the result of the inverse transformation. That is, based on the correlation between the residual texture and the texture of the prediction block itself, this application determines, or guides, or assists the selection of the transformation kernel through the characteristics of the prediction block, which reduces the transmission of transformation kernel selection information in the code stream and reduces the transformation overhead in the code stream while improving the compression efficiency of the current block.
  • Fig. 11 is another schematic flowchart of the video decoding method provided by the embodiment of the present application
  • Fig. 12 is a schematic diagram of the video decoding process involved in the embodiment of the present application, as shown in Fig. 11 and Fig. 12 , including:
  • the encoding end quantizes the target transform coefficients after the secondary transformation to form quantized coefficients, and encodes the quantized coefficients to form a code stream.
  • the decoder decodes the code stream to obtain the quantization coefficient of the current block.
  • a quantization mode is determined, and the quantization coefficient is dequantized using the determined quantization mode to obtain a target transform coefficient of the current block.
  • the method for determining the quantization method at the decoder can be:
  • Method 1: the decoding end obtains the indication information of the quantization mode by decoding the code stream, and determines the quantization mode of the current block according to the indication information.
  • Method 2 The decoder adopts the default quantization method.
  • Method 3: the decoding end determines the quantization mode of the current block in the same manner as the encoding end.
  • the aforementioned preset conditions include at least one of the following conditions:
  • Condition 1: at least one of the sum of the gradients in the horizontal direction and the sum of the gradients in the vertical direction of the N points is greater than or equal to the first preset value.
  • Condition 2: at least one of the horizontal gradient and the vertical gradient of at least M of the N points is greater than or equal to a second preset value, where M is a positive integer less than or equal to N.
  • if the gradients of the N points meet at least one of the above Condition 1 and Condition 2, for example, if at least one of the sum of the gradients in the horizontal direction and the sum of the gradients in the vertical direction of the N points is greater than or equal to the first preset value, and/or at least one of the horizontal gradient and the vertical gradient of at least M of the N points is greater than or equal to the second preset value, it indicates that the prediction block has obvious texture; and when the prediction block has obvious texture, the residual block also has obvious texture. A sketch of this test follows.
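  • a sketch of the preset-condition test, with illustrative parameter names and with absolute values as an assumption of the sketch:

```python
import numpy as np

def meets_preset_conditions(grad_hor, grad_ver, first_preset, second_preset, M):
    # Condition 1: either gradient sum reaches the first preset value.
    cond1 = (abs(grad_hor.sum()) >= first_preset or
             abs(grad_ver.sum()) >= first_preset)
    # Condition 2: at least M points have a horizontal or vertical gradient
    # reaching the second preset value.
    strong = (np.abs(grad_hor) >= second_preset) | (np.abs(grad_ver) >= second_preset)
    cond2 = int(strong.sum()) >= M
    return cond1 or cond2
```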
  • in this case, the decoding end performs the following steps S504, S505, S506, S509 and S511: calculate the gradient information of the prediction block, determine the transformation kernel corresponding to the current block according to the gradient information of the prediction block, perform inverse secondary transformation on the target transform coefficients according to the transformation kernel to obtain the basic transform coefficients of the current block, and then perform inverse basic transformation on the basic transform coefficients to obtain the residual block of the current block.
  • if the gradients of the N points do not meet the preset conditions, the decoding method of the decoding end includes at least the following two modes:
  • Mode 1: the decoding end executes S508, S509 and S511; that is, the decoding end determines that the transformation kernel corresponding to the current block is the first transformation kernel, where the first transformation kernel is, among the multiple preset transformation kernels for secondary transformation, the transformation kernel corresponding to the weakest directional texture, for example, transformation kernel group 0.
  • using the first transformation kernel, inverse secondary transformation is performed on the target transform coefficients to obtain the basic transform coefficients of the current block, and inverse basic transformation is then performed on the basic transform coefficients to obtain the residual block of the current block.
  • Mode 2: the decoding end performs the steps S510 and S511; that is, the decoding end skips the inverse secondary transformation operation on the current block, directly uses the above target transform coefficients as the basic transform coefficients of the current block, and performs inverse basic transformation on the basic transform coefficients to obtain the residual block of the current block.
  • the first transformation kernel is transformation kernel group 0 in Table 1.
  • the decoding method of the present application is described above by taking the inter-frame secondary transform as an example.
  • in other scenarios, the above method can also be used for decoding; for the specific process, refer to the above-mentioned steps, which will not be repeated here.
  • in the embodiment, the quantized coefficients of the current block are obtained by decoding the code stream; the quantized coefficients are dequantized to obtain the target transform coefficients of the current block; inter-frame prediction is performed on the current block to obtain the prediction block of the current block; the gradients of N points in the prediction block are determined; and it is judged whether the gradients of the N points meet the preset conditions. If the gradients of the N points meet the preset conditions, the gradient information of the prediction block is determined according to the gradients of the N points, the transformation kernel corresponding to the current block is determined according to the gradient information of the prediction block, inverse secondary transformation is performed on the target transform coefficients according to the transformation kernel to obtain the basic transform coefficients of the current block, and finally inverse basic transformation is performed on the basic transform coefficients to obtain the residual block of the current block.
  • or, if the gradients of the N points do not meet the preset conditions, it is determined that the transformation kernel corresponding to the current block is the first transformation kernel, where the first transformation kernel is the transformation kernel corresponding to the weakest directional texture among the plurality of preset transformation kernels; the first transformation kernel is used to perform inverse secondary transformation on the target transform coefficients to obtain the basic transform coefficients of the current block, and finally inverse basic transformation is performed on the basic transform coefficients to obtain the residual block of the current block.
  • or, if the gradients of the N points do not meet the preset conditions, the inverse secondary transformation operation on the current block is skipped, the target transform coefficients are directly used as the basic transform coefficients, and inverse basic transformation is performed on the basic transform coefficients to obtain the residual block of the current block.
  • that is, this application judges the texture strength of the prediction block of the current block according to the gradients of the N points. If the texture of the prediction block is strong, the residual block has obvious texture; in this case, determining the transformation kernel corresponding to the current block from the prediction block and inversely transforming the target transform coefficients enables accurate decoding of the current block. If the texture of the prediction block is weak, the residual block has no obvious texture; in this case, the inverse secondary transformation is skipped, or a transformation kernel suited to textures with no obvious direction is assigned to the current block, which avoids an unsuitable inverse secondary transformation of the current block, thereby improving the accuracy of the inverse transformation of the current block and improving the decoding efficiency.
  • FIG. 13 is a schematic flowchart of a video encoding method provided by an embodiment of the present application
  • FIG. 14 is a schematic diagram of a video encoding process involved in an embodiment of the present application.
  • the embodiment of the present application is applied to the video encoder shown in FIG. 1 and FIG. 2 .
  • the method of the embodiment of the present application includes:
  • the video encoder receives a video stream, which is composed of a series of image frames, performs video encoding for each frame of image in the video stream, and divides the image frames into blocks to obtain the current block.
  • the current block is also referred to as a current coding block, a current image block, a coding block, a current coding unit, a current block to be coded, a current image block to be coded, and the like.
  • the block divided by the traditional method includes not only the chrominance component of the current block position, but also the luminance component of the current block position.
  • the separation tree technology can divide separate component blocks, such as a separate luma block and a separate chroma block, where the luma block can be understood as containing only the luma component of the current block position, and the chroma block as containing only the chroma component of the current block position. In this way, the luma component and the chroma component at the same position can belong to different blocks, which gives the division greater flexibility. If the separation tree is used in CU partitioning, some CUs contain both luma and chroma components, some CUs contain only the luma component, and some CUs contain only the chroma component.
  • the current block in the embodiment of the present application only includes chroma components, which may be understood as a chroma block.
  • the current block in this embodiment of the present application only includes a luma component, which may be understood as a luma block.
  • the current block includes both luma and chroma components.
  • intra-frame prediction is performed on the current block to obtain a prediction block of the current block.
  • this application does not limit the prediction mode used for the intra prediction of the current block, which is specifically determined according to actual conditions.
  • inter-frame prediction is performed on the current block to obtain a prediction block of the current block.
  • this application does not limit the prediction mode used for the inter-frame prediction of the current block, which is specifically determined according to actual conditions. For example, when the video encoder performs inter-frame prediction on the current block, it tries at least one of multiple inter-frame prediction modes, selects the inter-frame prediction mode with the smallest rate-distortion cost as the target inter-frame prediction mode, and uses the target inter-frame prediction mode to inter-code the current block to obtain the prediction block of the current block.
  • S602. Determine a transform kernel corresponding to the current block according to the prediction block.
  • the texture of the residual block has a certain correlation with the texture of the prediction block. Taking inter-frame coding as an example, for a block coded using inter-frame coding, the texture of the residual block and the texture of the prediction block obtained by inter-frame prediction have a certain correlation. For example, residuals usually appear at the edges of objects, and the edges of objects show obvious gradient features in the prediction block. As another example, for gradually changing textures, such as the folds of clothes, the texture of the residual often has the same or a similar direction as the texture in the prediction block. Therefore, the embodiment of the present application determines, or guides, or assists the selection of the transformation kernel according to the characteristics of the prediction block.
  • the transform kernel corresponding to the current block determined in this embodiment of the present application may be one transform kernel or a group of transform kernels, where a group of transform kernels includes at least two transform kernels.
  • the above S602 includes the following S602-A and S602-B:
  • the texture information of the prediction block includes any information that can represent the texture feature of the prediction block, such as the texture direction of the prediction block, the texture size of the prediction block, and the like.
  • the above S602-A includes S602-A1:
  • S602-B includes S602-B1:
  • the gradient information of the prediction block includes at least one of a gradient direction and a gradient magnitude of the prediction block.
  • the methods for determining the gradient information of the prediction block in the above S602-A1 include but are not limited to the following:
  • the gradient information of the predicted block may be determined by the gradient of all or part of the pixels in the predicted block.
  • the above S602-A1 includes the following S602-A11 and S602-A12:
  • S602-A11. Determine the gradients of N points in the prediction block, where N is a positive integer.
  • S602-A12. Determine the gradient information of the prediction block according to the gradients of the N points.
  • the foregoing N points may be all pixel points in the prediction block.
  • the above N points may be some pixels in the prediction block.
  • optionally, the N points selected to determine the gradient information of the prediction block are the pixels in the prediction block other than the outermost layer of pixels.
  • the above N points are pixels obtained by sampling pixels in the prediction block by using a certain sampling method, for example, sampling every other pixel.
  • the gradient of at least one point among the N points includes a horizontal gradient and/or a vertical gradient.
  • the above S602-A11 includes:
  • S602-A111. For the i-th point among the N points, determine the horizontal gradient of the i-th point according to the pixel values of adjacent points of the i-th point in the horizontal direction of the prediction block, where i is a positive integer less than or equal to N.
  • S602-A112. Determine the vertical gradient of the i-th point according to the pixel values of adjacent points of the i-th point in the vertical direction of the prediction block.
  • for example, if the adjacent points of the i-th point in the horizontal direction of the prediction block include the left adjacent point and the right adjacent point of the i-th point, the ratio of the difference between the pixel value of the right adjacent point and the pixel value of the left adjacent point to 2 is determined as the horizontal gradient of the i-th point.
  • for example, if the adjacent points of the i-th point in the vertical direction of the prediction block include the upper adjacent point and the lower adjacent point of the i-th point, the ratio of the difference between the pixel value of the lower adjacent point and the pixel value of the upper adjacent point to 2 is determined as the vertical gradient of the i-th point.
  • the gradients of the N points in the prediction block are determined.
  • then S602-A12 is executed: determine the gradient information of the prediction block according to the gradients of the N points.
  • the ways of determining the gradient information of the predicted block according to the gradients of the N points in the above S602-A12 include but are not limited to the following:
  • Way 1: the gradients of the N points are determined as the gradient information of the prediction block.
  • Way 2: determine the sum grad_hor_sum of the horizontal gradients of the N points and the sum grad_ver_sum of the vertical gradients of the N points, and determine the gradient information grad_para of the prediction block according to grad_hor_sum and grad_ver_sum.
  • in one way, the gradient information grad_para of the prediction block is determined as the ratio of grad_hor_sum to grad_ver_sum; if the sum grad_ver_sum of the vertical gradients of the N points is equal to 0, the gradient information grad_para of the prediction block is determined to be 0.
  • the above S602-B1 includes the following steps from S602-B11 to S602-B13:
  • the sizes of the transform kernels corresponding to blocks of different sizes are also different. Based on this, the above S602-B11 includes:
  • the above S602-B12 includes:
  • considering that inter-frame coded residuals are statistically simpler than intra-frame coded residuals, the usage conditions for inter frames may differ from those for intra frames.
  • for example, the transformation kernel of the 4×4 secondary transform (such as LFNST) is applied to blocks for which at least one of the width and height is less than 16.
  • the transformation kernel of the 8×8 secondary transform (such as LFNST) is applied to larger blocks, such as blocks whose width and height are both greater than 8.
  • for example, if at least one of the width and height of the current block is less than 16, it is determined that the size of the transformation kernel corresponding to the current block is 4×4, and the preset correspondence between prediction block gradients and transformation kernels corresponding to the 4×4 transformation kernel is obtained; then, in the correspondence corresponding to the 4×4 transformation kernel, the target transformation kernel corresponding to the gradient information of the prediction block is determined, and the target transformation kernel is determined as the transformation kernel corresponding to the current block.
  • if the width and height of the current block are both greater than 8, it is determined that the size of the transformation kernel corresponding to the current block is 8×8, and the preset correspondence between prediction block gradients and transformation kernels corresponding to the 8×8 transformation kernel is obtained; then, in the correspondence corresponding to the 8×8 transformation kernel, the target transformation kernel corresponding to the gradient information of the prediction block is determined, and the target transformation kernel is determined as the transformation kernel corresponding to the current block.
  • the above describes the process of determining the transform kernel corresponding to the current block according to the texture information of the predicted block.
  • the transformation kernel corresponding to the current block may also be determined according to the neural network model.
  • the above S602 includes the following S602-1 and S602-2:
  • the model used here is pre-trained: the prediction blocks of image blocks are used as input, and the model is trained against the true values of the transformation kernel indication information corresponding to those image blocks, so that the trained model can predict transformation kernels.
  • in actual use, the encoding end can input the prediction block into the trained model to obtain the transformation kernel indication information corresponding to the current block output by the model, where the transformation kernel indication information is used to indicate the transformation kernel corresponding to the current block, and then determine the transformation kernel corresponding to the current block according to the transformation kernel indication information.
  • the embodiment of the present application does not limit the specific network structure of the model; it can be, for example, any image recognition neural network, such as an image convolutional neural network or an adversarial neural network.
  • in some embodiments, in order to reduce the amount of calculation and the complexity of the model, the prediction block is first down-sampled before being input into the model, so as to reduce the data volume and complexity of the prediction block; the down-sampled prediction block is then input into the pre-trained model, which can improve the efficiency with which the model predicts the transformation kernel indication information corresponding to the current block.
  • the above steps introduce the process of determining the transformation kernel corresponding to the current block according to the prediction block of the current block. After the transformation kernel corresponding to the current block is determined, the following steps are performed.
  • the pixel value of the current block is subtracted from the pixel value of the predicted block to obtain the residual block of the current block.
  • there is no fixed order between the above S603 and the above S602; that is, the above S603 can be executed before the above S602, after the above S602, or synchronously with the above S602, which is not limited in this application.
  • in one way, the residual block is transformed according to the transformation kernel to obtain transform coefficients, and the transform coefficients are encoded to obtain the code stream.
  • the residual block is transformed according to the transformation kernel to obtain transformed coefficients, the transformed coefficients are quantized, and the quantized coefficients are encoded to obtain a code stream.
  • the above S604 includes the following steps:
  • the coding end uses the DCT-II transformation method shown in the above formula (1) to perform basic transformation on the residual block of the current block to obtain the basic transformation coefficient of the current block.
  • the coding end uses the DCT-VIII transformation method shown in the above formula (2) to perform basic transformation on the residual block of the current block to obtain the basic transformation coefficient of the current block.
  • the encoding end uses the DST-VII transformation method shown in the above formula (3) to perform basic transformation on the residual block of the current block to obtain the basic transformation coefficient of the current block.
  • then the basic transform coefficients are secondary transformed to obtain the target transform coefficients of the current block.
  • specifically, the transformation kernel is used to perform secondary transformation on the basic transform coefficients to obtain the target transform coefficients of the current block; that is, the product of the transformation kernel and the basic transform coefficients is used as the target transform coefficients of the current block, as sketched below.
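  • mirroring the inverse sketch on the decoding side, the forward secondary transformation can be sketched as the product of the kernel and the basic transform coefficients, under the same matrix-form assumption:

```python
import numpy as np

def forward_secondary_transform(basic_coeff_vec, T):
    # target transform coefficients = kernel @ basic transform coefficients
    return T @ basic_coeff_vec
```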
  • the target transform coefficient of the current block is directly encoded without being quantized to obtain a code stream.
  • the target transformation coefficient of the current block is quantized to obtain a quantized coefficient, and the quantized coefficient is encoded to obtain a code stream.
  • optionally, if the transformation kernel corresponding to the current block is a group of transformation kernels, the encoding end can indicate to the decoding end which transformation kernel in the group of transformation kernels is used by the encoding end, and the indication information can be carried in the code stream.
  • optionally, the encoding end may also carry, in the code stream, the indication information that the current block adopts the secondary transformation, so that the decoding end performs the decoding method of the embodiment of the present application when it determines, according to the indication information, that the current block uses the secondary transformation.
  • in the embodiment, the prediction block of the current block is obtained by predicting the current block; the transformation kernel corresponding to the current block is determined according to the prediction block; the residual block of the current block is obtained according to the prediction block and the current block; the residual block is transformed according to the transformation kernel, and the transformed coefficients are encoded to obtain the code stream. That is, based on the correlation between the residual texture and the texture of the prediction block itself, this application determines, or guides, or assists the selection of the transformation kernel through the characteristics of the prediction block, which reduces the transmission of transformation kernel selection information in the code stream and reduces the transformation overhead in the code stream while improving the compression efficiency of the current block.
  • FIG. 15 is a schematic flowchart of a video encoding method provided by an embodiment of the present application
  • FIG. 16 is a schematic diagram of a video encoding process involved in an embodiment of the present application.
  • the embodiment of the present application is applied to the video encoder shown in FIG. 1 and FIG. 2 .
  • the method of the embodiment of the present application includes:
  • the pixel value of the current block is subtracted from the pixel value of the predicted block to obtain the residual block of the current block.
  • there is no fixed order between the above S703 and the above S702; that is, the above S703 can be executed before the above S702, after the above S702, or synchronously with the above S702, which is not limited in this application.
  • the aforementioned preset conditions include at least one of the following conditions:
  • Condition 1: at least one of the sum of the gradients in the horizontal direction and the sum of the gradients in the vertical direction of the N points is greater than or equal to the first preset value.
  • Condition 2: at least one of the horizontal gradient and the vertical gradient of at least M of the N points is greater than or equal to a second preset value, where M is a positive integer less than or equal to N.
  • if the gradients of the N points meet at least one of the above Condition 1 and Condition 2, for example, if at least one of the sum of the gradients in the horizontal direction and the sum of the gradients in the vertical direction of the N points is greater than or equal to the first preset value, and/or at least one of the horizontal gradient and the vertical gradient of at least M of the N points is greater than or equal to the second preset value, it indicates that the prediction block has obvious texture; and when the prediction block has obvious texture, the residual block also has obvious texture.
  • in this case, the encoding end performs the following steps S706, S707, S709, S711 and S712: calculate the gradient information of the prediction block, determine the transformation kernel corresponding to the current block according to the gradient information of the prediction block, perform secondary transformation on the basic transform coefficients according to the transformation kernel to obtain the target transform coefficients of the current block, then quantize the target transform coefficients to obtain the quantized coefficients, and finally encode the quantized coefficients to obtain the code stream.
  • if the gradients of the N points do not meet the preset conditions, the encoding method of the encoding end includes at least the following two modes:
  • Mode 1: the encoding end executes S708, S709, S711 and S712; that is, the encoding end determines that the transformation kernel corresponding to the current block is the first transformation kernel, where the first transformation kernel is the transformation kernel corresponding to the weakest directional texture among the multiple preset transformation kernels, for example, transformation kernel group 0.
  • using the first transformation kernel, the basic transform coefficients are secondary transformed to obtain the target transform coefficients of the current block; the target transform coefficients are then quantized to obtain the quantized coefficients, and finally the quantized coefficients are encoded to obtain the code stream.
  • Mode 2: the encoding end performs the steps S710, S711 and S712; that is, the encoding end skips the secondary transformation operation on the basic transform coefficients, directly uses the above basic transform coefficients as the target transform coefficients of the current block, quantizes the target transform coefficients to obtain the quantized coefficients, and finally encodes the quantized coefficients to obtain the code stream.
  • the first transformation kernel is transformation kernel group 0 in VVC.
  • the encoding method of the present application is described above by taking the inter-frame secondary transform as an example; in other scenarios, the above method can also be used for encoding, which will not be repeated here.
  • in the embodiment, the prediction block of the current block is obtained by performing inter-frame prediction on the current block; the residual block of the current block is obtained according to the prediction block of the current block and the current block; the residual block is basic transformed to obtain the basic transform coefficients of the current block; the gradients of N points in the prediction block are determined; and it is judged whether the gradients of the N points meet the preset conditions. If they meet the preset conditions, the gradient information of the prediction block is determined according to the gradients of the N points, the transformation kernel corresponding to the current block is determined according to the gradient information of the prediction block, secondary transformation is performed on the basic transform coefficients according to the transformation kernel to obtain the target transform coefficients of the current block, and finally the target transform coefficients of the current block are quantized to obtain the quantized coefficients, and the quantized coefficients are encoded to obtain the code stream.
  • or, if the gradients of the N points do not meet the preset conditions, it is determined that the transformation kernel corresponding to the current block is the first transformation kernel, where the first transformation kernel is the transformation kernel corresponding to the weakest directional texture among the plurality of preset transformation kernels; the first transformation kernel is used to perform secondary transformation on the basic transform coefficients to obtain the target transform coefficients of the current block, the target transform coefficients of the current block are then quantized to obtain the quantized coefficients, and the quantized coefficients are encoded to obtain the code stream.
  • that is, this application judges the texture strength of the prediction block of the current block according to the gradients of the N points. If the texture of the prediction block is strong, the residual block has obvious texture; in this case, the transformation kernel corresponding to the current block is determined from the prediction block, and the transformation kernel is used to perform secondary transformation on the basic transform coefficients, which improves the compression efficiency of the image. If the texture of the prediction block is weak, the residual block has no obvious texture; in this case, the secondary transformation is skipped, or a transformation kernel suited to textures with no obvious direction is assigned to the current block, which prevents the current block from being over-transformed, thereby improving the transformation accuracy of the current block and improving the coding efficiency.
  • FIGS. 9 to 16 are only examples of the present application, and should not be construed as limiting the present application.
  • the sequence numbers of the above-mentioned processes do not imply an order of execution; the order of execution of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • the term "and/or" is only an association relationship describing associated objects, indicating that there may be three relationships. Specifically, A and/or B may mean: A exists alone, A and B exist simultaneously, and B exists alone.
  • the character "/" in this article generally indicates that the contextual objects are an "or" relationship.
  • Fig. 17 is a schematic block diagram of a video decoder provided by an embodiment of the present application.
  • the video decoder 10 includes:
  • Decoding unit 11 configured to decode the code stream and determine the target transform coefficient of the current block
  • a prediction unit 12 configured to predict the current block to obtain a prediction block of the current block
  • a determining unit 13 configured to determine a transform kernel corresponding to the current block according to the predicted block
  • the inverse transform unit 14 is configured to perform inverse transformation on the target transform coefficients according to the transformation kernel, and obtain the residual block of the current block according to the result of the inverse transformation.
  • the prediction unit 12 is configured to perform inter-frame prediction on the current block to obtain a prediction block of the current block;
  • in some embodiments, the inverse transformation unit 14 is specifically configured to perform inverse secondary transformation on the target transform coefficients according to the transformation kernel to obtain the basic transform coefficients of the current block, and perform inverse basic transformation on the basic transform coefficients to obtain the residual block of the current block.
  • the determining unit 13 is specifically configured to determine texture information of the prediction block; and determine a transform kernel corresponding to the current block according to the texture information of the prediction block.
  • the texture information of the prediction block includes gradient information of the prediction block
  • the determining unit 13 is specifically configured to determine the gradient information of the prediction block, and determine the transformation kernel corresponding to the current block according to the gradient information of the prediction block.
  • the determining unit 13 is specifically configured to determine gradients of N points in the prediction block, where N is a positive integer; and determine gradient information of the prediction block according to the gradients of the N points.
  • the gradients of the N points include horizontal gradients and/or vertical gradients.
  • in some embodiments, the determining unit 13 is specifically configured to: for the i-th point among the N points, determine the horizontal gradient of the i-th point according to the pixel values of adjacent points of the i-th point in the horizontal direction of the prediction block, where i is a positive integer less than or equal to N; and determine the vertical gradient of the i-th point according to the pixel values of adjacent points of the i-th point in the vertical direction of the prediction block.
  • in some embodiments, the determining unit 13 is specifically configured to: if the adjacent points of the i-th point in the horizontal direction of the prediction block include the left adjacent point and the right adjacent point of the i-th point in the horizontal direction of the prediction block, determine the ratio of the difference between the pixel value of the right adjacent point and the pixel value of the left adjacent point to 2 as the horizontal gradient of the i-th point.
  • in some embodiments, the determining unit 13 is specifically configured to: if the adjacent points of the i-th point in the vertical direction of the prediction block include the upper adjacent point and the lower adjacent point of the i-th point in the vertical direction of the prediction block, determine the ratio of the difference between the pixel value of the lower adjacent point and the pixel value of the upper adjacent point to 2 as the vertical gradient of the i-th point.
  • the determining unit 13 is specifically configured to: determine the sum of the horizontal gradients of the N points; determine the sum of the vertical gradients of the N points; and determine the gradient information of the prediction block according to the sum of the horizontal gradients and the sum of the vertical gradients of the N points.
  • the determining unit 13 is specifically configured to determine the gradient information of the prediction block as the ratio of the sum of the horizontal gradients of the N points to the sum of the vertical gradients of the N points.
  • the determining unit 13 is specifically configured to determine that the gradient information of the prediction block is 0 if the sum of the vertical gradients of the N points is equal to 0.
  • the determining unit 13 is specifically configured to: judge whether the gradients of the N points meet the preset conditions; and if the gradients of the N points meet the preset conditions, determine the gradient information of the prediction block according to the gradients of the N points.
  • the preset conditions include at least one of the following conditions:
  • Condition 1: at least one of the sum of the gradients in the horizontal direction and the sum of the gradients in the vertical direction is greater than or equal to a first preset value;
  • Condition 2: at least one of the horizontal gradient and the vertical gradient of each of at least M points is greater than or equal to a second preset value, where M is a positive integer less than or equal to N.
  • if the gradients of the N points do not meet the preset conditions, the inverse transform unit 14 is configured to skip the inverse secondary transform operation on the current block; or to determine that the transform kernel corresponding to the current block is a first transform kernel, where the first transform kernel is the transform kernel, among a plurality of preset transform kernels, corresponding to the weakest directional texture.
  • the N points are the pixels in the prediction block other than the outermost layer of pixels; or, the N points are pixels obtained by sampling the pixels in the prediction block.
  • the determining unit 13 is specifically configured to obtain a preset correspondence between prediction-block gradients and transform kernels; determine, in the correspondence, the target transform kernel corresponding to the gradient information of the prediction block; and determine the target transform kernel as the transform kernel corresponding to the current block.
  • the determining unit 13 is specifically configured to determine, according to the size of the current block, the transform kernel size corresponding to the current block; obtain the preset correspondence between prediction-block gradients and transform kernels for that transform kernel size; and determine, in that correspondence, the target transform kernel corresponding to the gradient information of the prediction block.
  • the determining unit 13 is specifically configured to determine that the transform kernel size corresponding to the current block is 4×4 if at least one of the width and the height of the current block is less than 16, and to determine that the transform kernel size corresponding to the current block is 8×8 if both the width and the height of the current block are greater than 8.
  • the determining unit 13 is specifically configured to input the prediction block into a pre-trained model to obtain transform-kernel indication information, output by the model, corresponding to the current block, where the transform-kernel indication information is used to indicate the transform kernel corresponding to the current block; and to determine the transform kernel corresponding to the current block according to the transform-kernel indication information.
  • the determining unit 13 is specifically configured to downsample the prediction block, input the downsampled prediction block into a pre-trained model, and obtain the transform-kernel indication information, output by the model, corresponding to the current block.
  • the decoding unit 11 is further configured to decode the code stream to obtain the quantized coefficient of the current block; perform inverse quantization on the quantized coefficient to obtain the target transform coefficient of the current block.
  • the device embodiment and the method embodiment may correspond to each other, and similar descriptions may refer to the method embodiment. To avoid repetition, details are not repeated here.
  • the video decoder 10 shown in FIG. 17 can execute the decoding method of the embodiments of the present application, and the foregoing and other operations and/or functions of the units in the video decoder 10 respectively implement the corresponding processes of the above decoding method; for the sake of brevity, they are not repeated here.
  • Fig. 18 is a schematic block diagram of a video encoder provided by an embodiment of the present application.
  • the video encoder 20 may include:
  • a prediction unit 21 configured to predict the current block to obtain a prediction block of the current block;
  • a determining unit 22 configured to determine a transform kernel corresponding to the current block according to the prediction block;
  • a residual unit 23 configured to obtain a residual block of the current block according to the predicted block and the current block;
  • the transformation unit 24 is configured to transform the residual block according to the transformation kernel, and encode the transformed coefficients to obtain a code stream.
  • the prediction unit 21 is configured to perform inter-frame prediction on the current block to obtain a prediction block of the current block.
  • the transform unit 24 is specifically configured to perform basic transform on the residual block to obtain the basic transform coefficients of the current block; perform secondary transform on the basic transform coefficients according to the transform kernel to obtain the target transform coefficients of the current block; and encode the target transform coefficients to obtain a code stream.
  • the determining unit 22 is specifically configured to determine texture information of the prediction block; and determine a transform kernel corresponding to the current block according to the texture information of the prediction block.
  • the texture information of the prediction block includes gradient information of the prediction block;
  • the determining unit 22 is specifically configured to determine the gradient information of the prediction block, and to determine the transform kernel corresponding to the current block according to the gradient information of the prediction block.
  • the determining unit 22 is specifically configured to determine gradients of N points in the prediction block, where N is a positive integer; and determine gradient information of the prediction block according to the gradients of the N points.
  • the gradients of the N points include horizontal gradients and/or vertical gradients.
  • the determining unit 22 is specifically configured to, for the i-th point among the N points, determine the horizontal gradient of the i-th point according to the pixel values of the adjacent points of the i-th point in the horizontal direction of the prediction block, where i is a positive integer less than or equal to N; and determine the vertical gradient of the i-th point according to the pixel values of the adjacent points of the i-th point in the vertical direction of the prediction block.
  • the determining unit 22 is specifically configured to, if the adjacent points of the i-th point in the horizontal direction of the prediction block include a left adjacent point and a right adjacent point of the i-th point, determine the ratio of the difference between the pixel value of the right adjacent point and the pixel value of the left adjacent point to 2 as the horizontal gradient of the i-th point.
  • the determining unit 22 is specifically configured to, if the adjacent points of the i-th point in the vertical direction of the prediction block include an upper adjacent point and a lower adjacent point of the i-th point, determine the ratio of the difference between the pixel value of the lower adjacent point and the pixel value of the upper adjacent point to 2 as the vertical gradient of the i-th point.
  • the determining unit 22 is specifically configured to determine the sum of the horizontal gradients of the N points and the sum of the vertical gradients of the N points, and to determine the gradient information of the prediction block according to the sum of the horizontal gradients and the sum of the vertical gradients of the N points.
  • the determining unit 22 is specifically configured to determine the gradient information of the prediction block as the ratio of the sum of the horizontal gradients of the N points to the sum of the vertical gradients of the N points.
  • the determining unit 22 is specifically configured to determine that the gradient information of the prediction block is 0 if the sum of the vertical gradients of the N points is equal to 0.
  • the determining unit 22 is specifically configured to determine whether the gradients of the N points meet preset conditions, and if the gradients of the N points meet the preset conditions, to determine the gradient information of the prediction block according to the gradients of the N points.
  • the preset conditions include at least one of the following conditions:
  • Condition 1: at least one of the sum of the gradients in the horizontal direction and the sum of the gradients in the vertical direction is greater than or equal to a first preset value;
  • Condition 2: at least one of the horizontal gradient and the vertical gradient of each of at least M points is greater than or equal to a second preset value, where M is a positive integer less than or equal to N.
  • if the gradients of the N points do not meet the preset conditions, the transform unit 24 is configured to skip the secondary transform operation on the basic transform coefficients; or to determine that the transform kernel corresponding to the current block is a first transform kernel, where the first transform kernel is the transform kernel, among a plurality of preset transform kernels for the secondary transform, corresponding to the weakest directional texture.
  • the N points are the pixels in the prediction block other than the outermost layer of pixels; or, the N points are pixels obtained by sampling the pixels in the prediction block.
  • the determining unit 22 is specifically configured to obtain a preset correspondence between prediction-block gradients and transform kernels; determine, in the correspondence, the target transform kernel corresponding to the gradient information of the prediction block; and determine the target transform kernel as the transform kernel corresponding to the current block.
  • the determining unit 22 is specifically configured to determine, according to the size of the current block, the transform kernel size corresponding to the current block; obtain the preset correspondence between prediction-block gradients and transform kernels for that transform kernel size; and determine, in that correspondence, the target transform kernel corresponding to the gradient information of the prediction block.
  • the determining unit 22 is specifically configured to determine that the transform kernel size corresponding to the current block is 4×4 if at least one of the width and the height of the current block is less than 16, and to determine that the transform kernel size corresponding to the current block is 8×8 if both the width and the height of the current block are greater than 8.
  • the determining unit 22 is specifically configured to input the prediction block into a pre-trained model to obtain transform-kernel indication information, output by the model, corresponding to the current block, where the transform-kernel indication information is used to indicate the transform kernel corresponding to the current block; and to determine the transform kernel corresponding to the current block according to the transform-kernel indication information.
  • the determining unit 22 is specifically configured to downsample the prediction block, input the downsampled prediction block into a pre-trained model, and obtain the transform-kernel indication information, output by the model, corresponding to the current block.
  • the transform unit 24 is specifically configured to quantize the target transform coefficient to obtain the quantized coefficient of the current block; and encode the quantized coefficient to obtain the code stream.
  • the device embodiment and the method embodiment may correspond to each other, and similar descriptions may refer to the method embodiment. To avoid repetition, details are not repeated here.
  • the video encoder 20 shown in FIG. 18 may correspond to the subject that executes the encoding method of the embodiments of the present application, and the foregoing and other operations and/or functions of the units in the video encoder 20 respectively implement the corresponding processes of the encoding method; for the sake of brevity, they are not repeated here.
  • the functional unit may be implemented in the form of hardware, may also be implemented by instructions in the form of software, and may also be implemented by a combination of hardware and software units.
  • each step of the method embodiments in the embodiments of the present application may be completed by an integrated logic circuit of hardware in a processor and/or by instructions in the form of software; the steps of the methods disclosed in the embodiments of the present application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software units in a decoding processor.
  • the software unit may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register.
  • the storage medium is located in the memory, and the processor reads the information in the memory, and completes the steps in the above method embodiments in combination with its hardware.
  • Fig. 19 is a schematic block diagram of an electronic device provided by an embodiment of the present application.
  • the electronic device 30 may be the video encoder or video decoder described in the embodiment of the present application, and the electronic device 30 may include:
  • a memory 33 and a processor 32, where the memory 33 is used to store a computer program 34 and to transmit the program code of the computer program 34 to the processor 32.
  • the processor 32 can call and run the computer program 34 from the memory 33 to implement the method in the embodiment of the present application.
  • the processor 32 can be used to execute the steps in the above-mentioned method 200 according to the instructions in the computer program 34 .
  • the processor 32 may include, but is not limited to: a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), and so on.
  • the memory 33 includes but is not limited to:
  • the non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory.
  • the volatile memory can be Random Access Memory (RAM), which acts as external cache memory.
  • by way of example but not limitation, many forms of RAM are available, such as: static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synch-link dynamic random access memory (SLDRAM), and direct Rambus random access memory (DR RAM).
  • the computer program 34 can be divided into one or more units, and the one or more units are stored in the memory 33 and executed by the processor 32 to complete the methods provided by the present application.
  • the one or more units may be a series of computer program instruction segments capable of accomplishing specific functions, and the instruction segments are used to describe the execution process of the computer program 34 in the electronic device 30 .
  • the electronic device 30 may also include:
  • a transceiver 33, where the transceiver 33 can be connected to the processor 32 or the memory 33.
  • the processor 32 can control the transceiver 33 to communicate with other devices, specifically, can send information or data to other devices, or receive information or data sent by other devices.
  • Transceiver 33 may include a transmitter and a receiver.
  • the transceiver 33 may further include antennas, and the number of antennas may be one or more.
  • the components of the electronic device 30 are connected by a bus system, where the bus system includes a power bus, a control bus, and a status signal bus in addition to a data bus.
  • Fig. 20 is a schematic block diagram of a video encoding and decoding system provided by an embodiment of the present application.
  • the video codec system 40 may include: a video encoder 41 and a video decoder 42, wherein the video encoder 41 is used to execute the video encoding method involved in the embodiment of the present application, and the video decoder 42 is used to execute The video decoding method involved in the embodiment of the present application.
  • the present application also provides a computer storage medium, on which a computer program is stored, and when the computer program is executed by a computer, the computer can execute the methods of the above method embodiments.
  • the embodiments of the present application further provide a computer program product including instructions, and when the instructions are executed by a computer, the computer executes the methods of the foregoing method embodiments.
  • the computer program product includes one or more computer instructions.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wire (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wirelessly (such as infrared, radio, or microwave).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center integrated with one or more available media.
  • the available medium may be a magnetic medium (such as a floppy disk, a hard disk, or a magnetic tape), an optical medium (such as a digital video disc (digital video disc, DVD)), or a semiconductor medium (such as a solid state disk (solid state disk, SSD)), etc.
  • the disclosed systems, devices and methods may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • a unit described as a separate component may or may not be physically separated, and a component displayed as a unit may or may not be a physical unit, that is, it may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.


Abstract

The present application provides a video encoding/decoding method, device, system, and storage medium. Inter prediction is performed on a current block to obtain a prediction block of the current block; a transform kernel corresponding to the current block is determined according to the prediction block; inverse transform is performed on target transform coefficients according to the transform kernel, and a residual block of the current block is obtained from the result of the inverse transform. That is, on the basis of the correlation between the residual texture and the texture of the prediction block itself, the present application uses the features of the prediction block to determine, guide, or assist the selection of the transform kernel, which reduces the transmission of transform-kernel selection information in the code stream and lowers the transform overhead in the code stream while improving the compression efficiency of the current block.

Description

Video encoding/decoding method, device, system, and storage medium

Technical Field

The present application relates to the technical field of video encoding and decoding, and in particular to a video encoding/decoding method, device, system, and storage medium.

Background

Digital video technology can be incorporated into a variety of video devices, such as digital televisions, smartphones, computers, e-readers, and video players. With the development of video technology, video data involves a large amount of data; to facilitate its transmission, video devices apply video compression technology so that video data can be transmitted or stored more efficiently.

Video is compressed by encoding, and the encoding process includes prediction, transform, quantization, and other processes. For example, a prediction block of the current block is determined by intra prediction and/or inter prediction; the prediction block is subtracted from the current block to obtain a residual block; the residual block is transformed to obtain transform coefficients; the transform coefficients are quantized to obtain quantized coefficients; and the quantized coefficients are encoded to form a code stream.

The transform converts the residual block from the spatial domain to the frequency domain to remove the correlation within the residual. However, current transform methods have a poor transform effect, resulting in low video compression efficiency.

Summary

The embodiments of the present application provide a video encoding/decoding method, device, system, and storage medium, to improve the transform effect and thereby improve video compression efficiency.
In a first aspect, the present application provides a video decoding method, including:
decoding a code stream to obtain target transform coefficients of a current block, where the target transform coefficients are transform coefficients formed by the encoding side applying a secondary transform to the residual block of the current block;
predicting the current block to obtain a prediction block of the current block;
determining, according to the prediction block, a transform kernel corresponding to the current block;
performing inverse transform on the target transform coefficients according to the transform kernel, and obtaining the residual block of the current block from the result of the inverse transform.

In a second aspect, an embodiment of the present application provides a video encoding method, including:
predicting a current block to obtain a prediction block of the current block;
determining, according to the prediction block, a transform kernel corresponding to the current block;
obtaining a residual block of the current block according to the prediction block and the current block;
transforming the residual block according to the transform kernel, and encoding the transformed coefficients to obtain a code stream.

In a third aspect, the present application provides a video encoder for executing the method in the above second aspect or any of its implementations. Specifically, the encoder includes functional units for executing the method in the above second aspect or any of its implementations.

In a fourth aspect, the present application provides a video decoder for executing the method in the above first aspect or any of its implementations. Specifically, the decoder includes functional units for executing the method in the above first aspect or any of its implementations.

In a fifth aspect, a video encoder is provided, including a processor and a memory. The memory is used to store a computer program, and the processor is used to call and run the computer program stored in the memory to execute the method in the above second aspect or any of its implementations.

In a sixth aspect, a video decoder is provided, including a processor and a memory. The memory is used to store a computer program, and the processor is used to call and run the computer program stored in the memory to execute the method in the above first aspect or any of its implementations.

In a seventh aspect, a video encoding/decoding system is provided, including a video encoder and a video decoder. The video encoder is used to execute the method in the above second aspect or any of its implementations, and the video decoder is used to execute the method in the above first aspect or any of its implementations.

In an eighth aspect, a chip is provided for implementing the method in any one of the above first to second aspects or their implementations. Specifically, the chip includes a processor configured to call and run a computer program from a memory, so that a device on which the chip is installed executes the method in any one of the above first to second aspects or their implementations.

In a ninth aspect, a computer-readable storage medium is provided for storing a computer program that causes a computer to execute the method in any one of the above first to second aspects or their implementations.

In a tenth aspect, a computer program product is provided, including computer program instructions that cause a computer to execute the method in any one of the above first to second aspects or their implementations.

In an eleventh aspect, a computer program is provided which, when run on a computer, causes the computer to execute the method in any one of the above first to second aspects or their implementations.

Based on the above technical solutions, in the prediction process of video encoding/decoding, the decoding side decodes the code stream to obtain the target transform coefficients of the current block; predicts the current block to obtain the prediction block of the current block; determines, according to the prediction block, the transform kernel corresponding to the current block; performs inverse transform on the target transform coefficients according to the transform kernel; and obtains the residual block of the current block from the result of the inverse transform. That is, based on the correlation between the residual texture and the texture of the prediction block itself, the present application uses the features of the prediction block to determine, guide, or assist the selection of the transform kernel, which reduces the transmission of transform-kernel selection information in the code stream and lowers the transform overhead in the code stream while improving the compression efficiency of the current block.
Brief Description of the Drawings

FIG. 1 is a schematic block diagram of a video encoding/decoding system according to an embodiment of the present application;
FIG. 2 is a schematic block diagram of a video encoder according to an embodiment of the present application;
FIG. 3 is a schematic block diagram of a video decoder according to an embodiment of the present application;
FIG. 4A is a schematic diagram of an original image;
FIG. 4B is the frequency-domain map of FIG. 4A after the DCT transform;
FIG. 5 is a schematic diagram of the LFNST transform;
FIG. 6 is a schematic diagram of the base images corresponding to transform kernels;
FIG. 7 is a schematic diagram of intra prediction modes;
FIG. 8 is a schematic diagram of further intra prediction modes;
FIG. 9 is a schematic flowchart of a video decoding method according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a video decoding process according to an embodiment of the present application;
FIG. 11 is another schematic flowchart of a video decoding method according to an embodiment of the present application;
FIG. 12 is a schematic diagram of a video decoding process according to an embodiment of the present application;
FIG. 13 is a schematic flowchart of a video encoding method according to an embodiment of the present application;
FIG. 14 is a schematic diagram of a video encoding process according to an embodiment of the present application;
FIG. 15 is a schematic flowchart of a video encoding method according to an embodiment of the present application;
FIG. 16 is a schematic diagram of a video encoding process according to an embodiment of the present application;
FIG. 17 is a schematic block diagram of a video decoder according to an embodiment of the present application;
FIG. 18 is a schematic block diagram of a video encoder according to an embodiment of the present application;
FIG. 19 is a schematic block diagram of an electronic device according to an embodiment of the present application;
FIG. 20 is a schematic block diagram of a video encoding/decoding system according to an embodiment of the present application.
Detailed Description

The present application can be applied to the fields of image encoding/decoding, video encoding/decoding, hardware video encoding/decoding, dedicated-circuit video encoding/decoding, real-time video encoding/decoding, and so on. For example, the solutions of the present application may be combined with audio video coding standards (AVS), such as the H.264/audio video coding (AVC) standard, the H.265/high efficiency video coding (HEVC) standard, and the H.266/versatile video coding (VVC) standard. Alternatively, the solutions of the present application may operate in combination with other proprietary or industry standards, including ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its scalable video coding (SVC) and multi-view video coding (MVC) extensions. It should be understood that the techniques of the present application are not limited to any particular codec standard or technique.

For ease of understanding, the video encoding/decoding system according to the embodiments of the present application is first introduced with reference to FIG. 1.

FIG. 1 is a schematic block diagram of a video encoding/decoding system according to an embodiment of the present application. It should be noted that FIG. 1 is only an example, and the video encoding/decoding system of the embodiments of the present application includes but is not limited to what is shown in FIG. 1. As shown in FIG. 1, the video encoding/decoding system 100 includes an encoding device 110 and a decoding device 120. The encoding device is used to encode (which can be understood as compressing) video data to generate a code stream and to transmit the code stream to the decoding device. The decoding device decodes the code stream generated by the encoding device to obtain decoded video data.

The encoding device 110 of the embodiments of the present application can be understood as a device with a video encoding function, and the decoding device 120 can be understood as a device with a video decoding function; that is, the encoding device 110 and the decoding device 120 cover a wide range of apparatuses, including, for example, smartphones, desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, televisions, cameras, display devices, digital media players, video game consoles, and in-vehicle computers.

In some embodiments, the encoding device 110 may transmit the encoded video data (such as the code stream) to the decoding device 120 via a channel 130. The channel 130 may include one or more media and/or apparatuses capable of transmitting the encoded video data from the encoding device 110 to the decoding device 120.

In one example, the channel 130 includes one or more communication media that enable the encoding device 110 to transmit the encoded video data directly to the decoding device 120 in real time. In this example, the encoding device 110 may modulate the encoded video data according to a communication standard and transmit the modulated video data to the decoding device 120. The communication media include wireless communication media, such as the radio frequency spectrum; optionally, the communication media may also include wired communication media, such as one or more physical transmission lines.

In another example, the channel 130 includes a storage medium that can store the video data encoded by the encoding device 110. The storage media include a variety of locally accessible data storage media, such as optical discs, DVDs, and flash memory. In this example, the decoding device 120 may obtain the encoded video data from the storage medium.

In another example, the channel 130 may include a storage server that can store the video data encoded by the encoding device 110. In this example, the decoding device 120 may download the stored encoded video data from the storage server. Optionally, the storage server may store the encoded video data and transmit it to the decoding device 120; it may be, for example, a web server (e.g., for a website) or a file transfer protocol (FTP) server.

In some embodiments, the encoding device 110 includes a video encoder 112 and an output interface 113. The output interface 113 may include a modulator/demodulator (modem) and/or a transmitter.

In some embodiments, in addition to the video encoder 112 and the output interface 113, the encoding device 110 may further include a video source 111.

The video source 111 may include at least one of a video capture device (e.g., a video camera), a video archive, a video input interface, or a computer graphics system, where the video input interface is used to receive video data from a video content provider and the computer graphics system is used to generate video data.

The video encoder 112 encodes the video data from the video source 111 to generate a code stream. The video data may include one or more pictures or a sequence of pictures. The code stream contains the encoding information of the picture or picture sequence in the form of a bitstream. The encoding information may include encoded picture data and associated data. The associated data may include a sequence parameter set (SPS), a picture parameter set (PPS), and other syntax structures. An SPS may contain parameters applied to one or more sequences. A PPS may contain parameters applied to one or more pictures. A syntax structure is a set of zero or more syntax elements arranged in a specified order in the code stream.

The video encoder 112 transmits the encoded video data directly to the decoding device 120 via the output interface 113. The encoded video data may also be stored on a storage medium or storage server for subsequent reading by the decoding device 120.

In some embodiments, the decoding device 120 includes an input interface 121 and a video decoder 122.

In some embodiments, in addition to the input interface 121 and the video decoder 122, the decoding device 120 may further include a display device 123.

The input interface 121 includes a receiver and/or a modem. The input interface 121 may receive the encoded video data through the channel 130.

The video decoder 122 is used to decode the encoded video data to obtain decoded video data, and to transmit the decoded video data to the display device 123.

The display device 123 displays the decoded video data. The display device 123 may be integrated with the decoding device 120 or external to it. The display device 123 may include a variety of display devices, such as a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display, or other types of display devices.

In addition, FIG. 1 is only an example, and the technical solutions of the embodiments of the present application are not limited to FIG. 1; for example, the techniques of the present application may also be applied to one-sided video encoding or one-sided video decoding.
The video encoding framework according to the embodiments of the present application is introduced below.

FIG. 2 is a schematic block diagram of a video encoder according to an embodiment of the present application. It should be understood that the video encoder 200 can be used for lossy compression of pictures as well as for lossless compression of pictures. The lossless compression may be visually lossless compression or mathematically lossless compression.

The video encoder 200 can be applied to picture data in luma-chroma (YCbCr, YUV) format. For example, the YUV ratio may be 4:2:0, 4:2:2, or 4:4:4, where Y denotes luma, Cb (U) denotes blue chroma, Cr (V) denotes red chroma, and U and V jointly denote chroma, used to describe color and saturation. In terms of color format, 4:2:0 means that every 4 pixels have 4 luma components and 2 chroma components (YYYYCbCr); 4:2:2 means that every 4 pixels have 4 luma components and 4 chroma components (YYYYCbCrCbCr); and 4:4:4 means full-pixel display (YYYYCbCrCbCrCbCrCbCr).

For example, the video encoder 200 reads video data and, for each frame of the video data, partitions the frame into several coding tree units (CTUs). In some examples, a CTB may be called a "tree block", a "largest coding unit" (LCU), or a "coding tree block" (CTB). Each CTU may be associated with a pixel block of equal size within the picture. Each pixel may correspond to one luminance (luma) sample and two chrominance (chroma) samples, so each CTU may be associated with one luma sample block and two chroma sample blocks. A CTU size is, for example, 128×128, 64×64, or 32×32. A CTU may be further partitioned into several coding units (CUs) for encoding; a CU may be a rectangular or a square block. A CU may be further partitioned into prediction units (PUs) and transform units (TUs), which decouples encoding, prediction, and transform and makes processing more flexible. In one example, a CTU is partitioned into CUs in a quadtree manner, and a CU is partitioned into TUs and PUs in a quadtree manner.

A video encoder and a video decoder may support various PU sizes. Assuming the size of a particular CU is 2N×2N, they may support PU sizes of 2N×2N or N×N for intra prediction, and symmetric PUs of 2N×2N, 2N×N, N×2N, N×N, or similar sizes for inter prediction. They may also support asymmetric PUs of 2N×nU, 2N×nD, nL×2N, and nR×2N for inter prediction, where U, D, L, and R denote up, down, left, and right respectively. For example, 2N×nU partitions a 2N×2N CU into two PUs with a top-to-bottom ratio of 1:3; 2N×nD with a top-to-bottom ratio of 3:1; nL×2N with a left-to-right ratio of 1:3; and nR×2N with a left-to-right ratio of 3:1.

In some embodiments, as shown in FIG. 2, the video encoder 200 may include a prediction unit 210, a residual unit 220, a transform/quantization unit 230, an inverse transform/quantization unit 240, a reconstruction unit 250, an in-loop filtering unit 260, a decoded picture buffer 270, and an entropy encoding unit 280. It should be noted that the video encoder 200 may include more, fewer, or different functional components.

Optionally, in the present application, the current block may be called the current coding unit (CU) or the current prediction unit (PU), among others. A prediction block may also be called a predicted image block or image prediction block, and a reconstructed image block may also be called a reconstruction block.

In some embodiments, the prediction unit 210 includes an inter prediction unit 211 and an intra estimation unit 212. Because there is a strong correlation between adjacent pixels within a frame of a video, intra prediction is used in video encoding/decoding to eliminate spatial redundancy between adjacent pixels. Because there is a strong similarity between adjacent frames of a video, inter prediction is used in video encoding/decoding to eliminate temporal redundancy between adjacent frames, thereby improving encoding efficiency.

The inter prediction unit 211 can be used for inter prediction. Inter prediction may refer to picture information of different frames; it uses motion information to find a reference block in a reference frame and generates a prediction block from the reference block, to eliminate temporal redundancy. The frames used for inter prediction may be P frames and/or B frames, where P frames are forward-predicted frames and B frames are bidirectionally predicted frames. The motion information includes the reference frame list in which the reference frame is located, a reference frame index, and a motion vector. The motion vector may be of integer-pixel or sub-pixel precision; if the motion vector is of sub-pixel precision, interpolation filtering in the reference frame is needed to produce the required sub-pixel block. Here, the integer- or sub-pixel block in the reference frame found according to the motion vector is called the reference block. Some techniques use the reference block directly as the prediction block, while others further process the reference block to generate the prediction block; the latter can also be understood as taking the reference block as the prediction block and then processing it to generate a new prediction block.

The intra estimation unit 212 (also called the intra prediction unit) refers only to information of the same picture to predict the pixel information within the current picture block, to eliminate spatial redundancy. The frames used for intra prediction may be I frames. For example, for a 4×4 current block, the pixels in the row to its left and the column above it are the reference pixels of the current block, and intra prediction uses these reference pixels to predict the current block. These reference pixels may all be available, i.e., all already encoded/decoded; or some may be unavailable, for example if the current block is at the leftmost edge of the frame, in which case the reference pixels to its left are unavailable. Or, when encoding/decoding the current block, the lower-left part may not yet have been encoded/decoded, so the lower-left reference pixels are also unavailable. When reference pixels are unavailable, available reference pixels, certain values, or certain methods can be used for padding, or no padding is performed.

Intra prediction has multiple prediction modes. Taking the international digital video coding standard H series as an example, the H.264/AVC standard has 8 angular prediction modes and 1 non-angular prediction mode, and H.265/HEVC extends this to 33 angular prediction modes and 2 non-angular prediction modes. The intra prediction modes used by HEVC are Planar, DC, and 33 angular modes — 35 prediction modes in total. The intra modes used by VVC are Planar, DC, and 65 angular modes — 67 prediction modes in total. For the luma component there is a matrix-based intra prediction (MIP) mode obtained by training, and for the chroma component there is the CCLM prediction mode.

It should be noted that as the number of angular modes increases, intra prediction becomes more accurate and better meets the needs of the development of high-definition and ultra-high-definition digital video.

The residual unit 220 may generate the residual block of a CU based on the pixel block of the CU and the prediction blocks of the PUs of the CU. For example, the residual unit 220 may generate the residual block of the CU such that each sample in the residual block has a value equal to the difference between a sample in the pixel block of the CU and the corresponding sample in the prediction block of a PU of the CU.

The transform/quantization unit 230 may quantize the transform coefficients. It may quantize the transform coefficients associated with the TUs of a CU based on a quantization parameter (QP) value associated with the CU. The video encoder 200 may adjust the degree of quantization applied to the transform coefficients associated with the CU by adjusting the QP value associated with the CU.

The inverse transform/quantization unit 240 may apply inverse quantization and inverse transform, respectively, to the quantized transform coefficients to reconstruct the residual block from them.

The reconstruction unit 250 may add the samples of the reconstructed residual block to the corresponding samples of the one or more prediction blocks generated by the prediction unit 210 to generate the reconstructed image block associated with the TU. By reconstructing the sample blocks of every TU of the CU in this way, the video encoder 200 can reconstruct the pixel block of the CU.

The in-loop filtering unit 260 may perform a deblocking filtering operation to reduce the blocking artifacts of the pixel block associated with the CU.

In some embodiments, the in-loop filtering unit 260 includes a deblocking filtering unit and a sample adaptive offset / adaptive loop filtering (SAO/ALF) unit, where the deblocking filtering unit is used to remove blocking artifacts and the SAO/ALF unit is used to remove ringing artifacts.

The decoded picture buffer 270 may store the reconstructed pixel blocks. The inter prediction unit 211 may use reference pictures containing reconstructed pixel blocks to perform inter prediction on PUs of other pictures. In addition, the intra estimation unit 212 may use the reconstructed pixel blocks in the decoded picture buffer 270 to perform intra prediction on other PUs in the same picture as the CU.

The entropy encoding unit 280 may receive the quantized transform coefficients from the transform/quantization unit 230 and perform one or more entropy encoding operations on them to generate entropy-encoded data.
FIG. 3 is a schematic block diagram of a decoder according to an embodiment of the present application.

As shown in FIG. 3, the video decoder 300 includes an entropy decoding unit 310, a prediction unit 320, an inverse quantization/transform unit 330, a reconstruction unit 340, an in-loop filtering unit 350, and a decoded picture buffer 360. It should be noted that the video decoder 300 may include more, fewer, or different functional components.

The video decoder 300 may receive a code stream. The entropy decoding unit 310 may parse the code stream to extract syntax elements from it, including entropy-encoded syntax elements. The prediction unit 320, the inverse quantization/transform unit 330, the reconstruction unit 340, and the in-loop filtering unit 350 may decode the video data, i.e., generate decoded video data, according to the syntax elements extracted from the code stream.

In some embodiments, the prediction unit 320 includes an intra estimation unit 321 and an inter prediction unit 322.

The intra estimation unit 321 (also called the intra prediction unit) may perform intra prediction to generate the prediction block of a PU. It may use an intra prediction mode to generate the prediction block of the PU based on the pixel blocks of spatially neighboring PUs, and may determine the intra prediction mode of the PU from one or more syntax elements parsed from the code stream.

The inter prediction unit 322 may construct a first reference picture list (list 0) and a second reference picture list (list 1) according to the syntax elements parsed from the code stream. In addition, if the PU is encoded using inter prediction, the entropy decoding unit 310 may parse the motion information of the PU. The inter prediction unit 322 may determine one or more reference blocks of the PU according to its motion information and generate the prediction block of the PU from the reference block(s).

The inverse quantization/transform unit 330 (also called the inverse transform/quantization unit) may inverse-quantize (i.e., dequantize) the transform coefficients associated with a TU, using the QP value associated with the CU of the TU to determine the degree of quantization.

After inverse-quantizing the transform coefficients, the inverse quantization/transform unit 330 may apply one or more inverse transforms to the inverse-quantized transform coefficients to generate the residual block associated with the TU.

The reconstruction unit 340 uses the residual blocks associated with the TUs of a CU and the prediction blocks of the PUs of the CU to reconstruct the pixel block of the CU. For example, the reconstruction unit 340 may add the samples of the residual block to the corresponding samples of the prediction block to reconstruct the pixel block of the CU, obtaining the reconstructed image block.

The in-loop filtering unit 350 may perform a deblocking filtering operation to reduce the blocking artifacts of the pixel block associated with the CU.

The video decoder 300 may store the reconstructed picture of the CU in the decoded picture buffer 360. The video decoder 300 may use the reconstructed picture in the decoded picture buffer 360 as a reference picture for subsequent prediction, or transmit it to a display device for presentation.

From FIG. 2 and FIG. 3 above, the basic flow of video encoding/decoding is as follows. At the encoding side, a frame is partitioned into blocks, and for the current block, the prediction unit 210 uses intra prediction or inter prediction to generate the prediction block of the current block. The residual unit 220 may compute the residual block based on the prediction block and the original block of the current block, for example by subtracting the prediction block from the original block; the residual block is also called residual information. Through the transform, quantization, and other processes of the transform/quantization unit 230, information the human eye is insensitive to can be removed from the residual block, eliminating visual redundancy. Optionally, the residual block before transform and quantization by the transform/quantization unit 230 may be called the time-domain residual block, and after transform and quantization it may be called the frequency residual block or frequency-domain residual block. The entropy encoding unit 280 receives the quantized transform coefficients output by the transform/quantization unit 230, may perform entropy encoding on them, and outputs the code stream. For example, the entropy encoding unit 280 may eliminate character redundancy according to a target context model and the probability information of the binary code stream.

At the decoding side, the entropy decoding unit 310 may parse the code stream to obtain the prediction information, quantized coefficient matrix, and so on of the current block, and the prediction unit 320 uses intra prediction or inter prediction on the current block based on the prediction information to generate the prediction block of the current block. The inverse quantization/transform unit 330 performs inverse quantization and inverse transform on the quantized coefficient matrix obtained from the code stream to obtain the residual block. The reconstruction unit 340 adds the prediction block and the residual block to obtain the reconstruction block. The reconstruction blocks make up the reconstructed picture, and the in-loop filtering unit 350 performs in-loop filtering on the reconstructed picture on a picture or block basis to obtain the decoded picture. The encoding side likewise needs operations similar to the decoding side to obtain the decoded picture. The decoded picture may also be called the reconstructed picture, and the reconstructed picture can serve as a reference frame of inter prediction for subsequent frames.

It should be noted that the block partition information determined at the encoding side, as well as mode or parameter information for prediction, transform, quantization, entropy encoding, in-loop filtering, and so on, is carried in the code stream when necessary. The decoding side parses the code stream and analyzes the available information to determine the same block partition information and the same mode or parameter information for prediction, transform, quantization, entropy encoding, in-loop filtering, and so on as the encoding side, thereby ensuring that the decoded picture obtained at the encoding side is the same as that obtained at the decoding side.

The current block may be the current coding unit (CU), the current prediction unit (PU), or the like.

The above is the basic flow of a video codec under the block-based hybrid coding framework. With the development of technology, some modules or steps of this framework or flow may be optimized. The present application applies to the basic flow of a video codec under the block-based hybrid coding framework, but is not limited to this framework or flow.

From the above, when encoding, the general hybrid coding framework performs prediction first, which exploits spatial or temporal correlation to obtain a picture identical or similar to the current block. It is possible for a prediction block to be exactly identical to the current block, but it is hard to guarantee this for all blocks in a video, especially for natural video — that is, camera-captured video — because of noise. Moreover, irregular motion, distortion and deformation, occlusion, brightness changes, and the like in a video are hard to predict fully. So the hybrid coding framework subtracts the predicted image from the original image of the current block to obtain the residual image; in other words, it subtracts the prediction block from the current block to obtain the residual block. The residual block is usually much simpler than the original image, so prediction can significantly improve compression efficiency. The residual block is not encoded directly either, but is usually transformed first. The transform converts the residual image from the spatial domain to the frequency domain, removing the correlation of the residual image. After the residual image is transformed to the frequency domain, since most of the energy is concentrated in the low-frequency region, most of the non-zero transformed coefficients are concentrated in the upper-left corner. Quantization is then used for further compression; and since the human eye is insensitive to high frequencies, a larger quantization step can be used in the high-frequency region.

FIG. 4A is a schematic diagram of an original image, and FIG. 4B is the frequency-domain map of FIG. 4A after the discrete cosine transform (DCT). As shown in FIG. 4B, after the DCT, non-zero coefficients of the original image in FIG. 4A exist only in the upper-left region. It should be noted that in this example the DCT was applied to the whole image, whereas in video encoding/decoding the image is processed in blocks, so the transform is also performed on a block basis.

The most commonly used transforms in video compression standards include the DCT-II, DCT-VIII, and DST-VII (discrete sine transform, type VII) types.
The basis function of the DCT-II transform is given by formula (1):

$$T_i(j)=\omega_0\cdot\sqrt{\frac{2}{N}}\cdot\cos\!\left(\frac{\pi\,i\,(2j+1)}{2N}\right),\qquad \omega_0=\begin{cases}\sqrt{\tfrac{1}{2}}, & i=0\\[2pt] 1, & i\neq 0\end{cases}\tag{1}$$

where $T_i(j)$ is the transformed coefficient, $N$ is the number of points of the original signal, $i,j=0,1,\ldots,N-1$, and $\omega_0$ is a compensation coefficient.

The basis function of the DCT-VIII transform is given by formula (2):

$$T_i(j)=\sqrt{\frac{4}{2N+1}}\cdot\cos\!\left(\frac{\pi\,(2i+1)(2j+1)}{4N+2}\right)\tag{2}$$

The basis function of the DST-VII transform is given by formula (3):

$$T_i(j)=\sqrt{\frac{4}{2N+1}}\cdot\sin\!\left(\frac{\pi\,(2i+1)(j+1)}{2N+1}\right)\tag{3}$$
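As a minimal illustration only (not part of the original patent text; the function names are placeholders), the following Python sketch builds the N-point DCT-II and DST-VII basis matrices directly from formulas (1) and (3):

```python
import numpy as np

def dct2_basis(n: int) -> np.ndarray:
    """N x N DCT-II basis matrix T per formula (1); row i, column j."""
    t = np.zeros((n, n))
    for i in range(n):
        w0 = np.sqrt(0.5) if i == 0 else 1.0  # compensation coefficient
        for j in range(n):
            t[i, j] = w0 * np.sqrt(2.0 / n) * np.cos(np.pi * i * (2 * j + 1) / (2 * n))
    return t

def dst7_basis(n: int) -> np.ndarray:
    """N x N DST-VII basis matrix per formula (3)."""
    t = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            t[i, j] = np.sqrt(4.0 / (2 * n + 1)) * np.sin(
                np.pi * (2 * i + 1) * (j + 1) / (2 * n + 1))
    return t

# A 1D forward transform is then y = T @ x; since the rows of T are
# orthonormal, the inverse transform is x = T.T @ y.
```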
Since images are two-dimensional and the computation and memory cost of a direct two-dimensional transform was unacceptable for the hardware conditions of the time, the above DCT-II, DCT-VIII, and DST-VII transforms are all split into a horizontal direction and a vertical direction and performed as two one-dimensional transforms: first horizontal then vertical, or first vertical then horizontal.

These transform methods are effective for horizontal and vertical textures and are very useful for improving compression efficiency, but they are less effective for diagonal textures. As the demand for compression efficiency keeps increasing, compression efficiency could be further improved if diagonal textures were handled more effectively.

To handle the residual of diagonal textures more effectively, a secondary transform is currently used: after the primary transform (DCT-II, DCT-VIII, DST-VII, etc.), a second transform is applied to the frequency-domain signal, converting the signal from one transform domain to another, followed by quantization, entropy encoding, and other operations; its purpose is to further remove statistical redundancy.

The low frequency non-separable transform (LFNST) is a reduced secondary transform. At the encoding side, LFNST is applied after the primary transform and before quantization. At the decoding side, LFNST is applied after inverse quantization and before the inverse primary transform.

As shown in FIG. 5, at the encoding side, LFNST applies a secondary transform to the low-frequency coefficients in the upper-left corner after the primary transform. The primary transform decorrelates the image and concentrates the energy in the upper-left corner, and the secondary transform further decorrelates the low-frequency coefficients of the primary transform. At the encoding side, 16 coefficients are input to the 4×4 LFNST kernel and 8 coefficients are output; 64 coefficients are input to the 8×8 LFNST kernel and 16 coefficients are output. At the decoding side, 8 coefficients are input to the 4×4 inverse LFNST kernel and 16 coefficients are output; 16 coefficients are input to the 8×8 inverse LFNST kernel and 64 coefficients are output.

Optionally, LFNST includes 4 sets of transform kernels; the base images corresponding to these 4 sets are shown in FIG. 6, in which some clearly diagonal textures can be seen.

Currently, LFNST is applied only to intra-coded blocks. Intra prediction uses already-reconstructed pixels around the current block as references to predict the current block; since current video is encoded from left to right and from top to bottom, the reference pixels available to the current block are usually on its left and top.

As shown in FIG. 7, 67 intra prediction modes are given, of which, besides mode 0 (Planar) and mode 1 (DC), there are 65 angular prediction modes. Planar usually handles gradually changing textures, DC usually handles flat regions, and blocks with obvious angular textures usually use intra angular prediction. For non-square blocks, wide-angle prediction modes may also be used; wide-angle prediction allows a larger range of prediction angles than that of square blocks. As shown in FIG. 8, modes 2 to 66 are the angular prediction modes for square blocks, and modes -1 to -14 and 67 to 80 represent the extended angular prediction modes under wide-angle prediction.

Angular prediction tiles the reference pixels into the current block along a specified angle as the prediction values, which means the prediction block will have an obvious directional texture, and the residual of the current block after angular prediction will statistically also exhibit obvious angular characteristics. Therefore the transform kernel chosen for LFNST can be bound to the intra prediction mode: once the intra prediction mode is determined, LFNST can only use the set of transform kernels corresponding to that intra prediction mode.
Specifically, in one embodiment, LFNST has 4 sets of transform kernels in total, with 2 kernels in each set. The correspondence between intra prediction modes and transform kernel sets is shown in Table 1:

Table 1

    IntraPredMode                 Tr. set index
    IntraPredMode < 0                   1
    0 <= IntraPredMode <= 1             0
    2 <= IntraPredMode <= 12            1
    13 <= IntraPredMode <= 23           2
    24 <= IntraPredMode <= 44           3
    45 <= IntraPredMode <= 55           2
    56 <= IntraPredMode <= 80           1
    81 <= IntraPredMode <= 83           0

In Table 1, IntraPredMode denotes the intra prediction mode, and Tr. set index denotes the index of the transform kernel set.

Note that the cross-component prediction modes used by chroma intra prediction are modes 81 to 83; luma intra prediction does not have these modes.

The LFNST transform kernels can use transposition so that one kernel set correspondingly handles more angles. For example, in Table 1, modes 13 to 23 and modes 45 to 55 both correspond to kernel set 2, but modes 13 to 23 are clearly close to horizontal while modes 45 to 55 are clearly close to vertical; the transform and inverse transform for modes 45 to 55 need to be matched by transposition.

In some embodiments, LFNST has 4 kernel sets in total, and which set LFNST uses is specified by the intra prediction mode. This exploits the correlation between the intra prediction mode and the LFNST transform kernel, reducing the transmission of LFNST kernel-selection information in the code stream and thus saving bits. Whether the current block uses LFNST, and if so whether the first or the second kernel of a set is used, needs to be determined from the code stream or other conditions.
From the above, LFNST can improve the compression efficiency of residuals with diagonal textures, but it can currently only be applied to intra-coded blocks. This is because diagonal textures have many possible directions, and LFNST has to use multiple transform kernels to handle them. Although LFNST applies some clustering — textures of several similar angles use one kernel — it still needs multiple kernel sets. By exploiting the correlation between the intra prediction mode and the residual, the LFNST kernel can be determined directly from the intra prediction mode, reducing the transmission of LFNST kernel-selection information in the code stream and thus saving bits.

In inter-coded blocks, a large proportion of residual textures also exhibit obvious angles, and it has been shown that a secondary transform can, on top of the primary transform, handle angular residual textures better. However, there is no obvious correlation between the inter prediction mode or inter motion information and the direction of the residual texture, and transmitting the overhead of selecting a kernel set in the code stream would be detrimental to compression efficiency. LFNST is therefore not used in inter-coded blocks, resulting in low compression efficiency for inter-coded blocks.

To solve the above technical problem, the embodiments of the present application determine the transform kernel corresponding to the current block according to the prediction block of the current block, and use this kernel to inverse-transform the decoded target transform coefficients of the current block, which reduces the transmission of kernel-selection information in the code stream and lowers the transform overhead in the code stream while improving the compression efficiency of the current block.

The video encoding/decoding methods provided by the embodiments of the present application are introduced below with specific embodiments.

First, taking the decoding side as an example, the video decoding method provided by the embodiments of the present application is introduced with reference to FIG. 9.
FIG. 9 is a schematic flowchart of a video decoding method provided by an embodiment of the present application, and FIG. 10 is a schematic diagram of the video decoding process involved. This embodiment applies to the video decoder shown in FIG. 1 and FIG. 3. As shown in FIG. 9 and FIG. 10, the method of this embodiment includes:

S401: Decode the code stream to obtain the target transform coefficients of the current block.

In some embodiments, the current block may also be called the current decoding block, current decoding unit, decoding block, block to be decoded, current block to be decoded, and so on.

In some embodiments, when the current block includes a chroma component but no luma component, it may be called a chroma block. In some embodiments, when the current block includes a luma component but no chroma component, it may be called a luma block.

In some embodiments, the target transform coefficients may be basic transform coefficients, also called initial transform coefficients, primary transform coefficients, or first transform coefficients.

In some embodiments, the target transform coefficients are transform coefficients formed by the encoding side applying a secondary transform to the residual block of the current block; specifically, the encoding side applies a basic transform to the residual block of the current block to obtain basic transform coefficients, and then applies a secondary transform to the basic transform coefficients to obtain the target transform coefficients of the current block. In some embodiments, the target transform coefficients in this case are also called secondary transform coefficients or second transform coefficients.

In this embodiment, the ways in which the decoding side decodes the code stream in S401 to obtain the target transform coefficients of the current block include but are not limited to the following:

Way 1: if the encoding side did not quantize the target transform coefficients but encoded them directly to obtain the code stream, the decoding side can obtain the target transform coefficients of the current block directly from the code stream.

Way 2: the encoding side quantized the target transform coefficients to obtain quantized coefficients and encoded the quantized coefficients to obtain the code stream. The decoding side then decodes the code stream to obtain the quantized coefficients of the current block and inverse-quantizes them to obtain the target transform coefficients of the current block.

S402: Predict the current block to obtain the prediction block of the current block.

In some embodiments, if the method of this embodiment is applied to intra prediction, intra prediction is performed on the current block to obtain its prediction block. The embodiments of the present application do not limit the prediction mode used for intra prediction of the current block, which is determined according to the actual situation.

In some embodiments, if the method of this embodiment is applied to inter prediction, inter prediction is performed on the current block to obtain its prediction block. The prediction mode used for inter prediction is likewise not limited and is determined according to the actual situation.

Taking the determination of the inter prediction manner as an example, the decoding side can determine the prediction block of the current block in at least the following exemplary ways.

In one example, the code stream includes indication information of the inter prediction mode corresponding to the current block; according to this indication information, the corresponding inter mode is selected to inter-predict the current block and obtain its prediction block.

In one example, the decoding side determines the inter prediction mode corresponding to the current block according to the same inter-mode determination rule as the encoding side, and uses the determined inter prediction mode to inter-predict the current block and obtain its prediction block.

In one example, inter prediction is performed by an autoencoder to obtain the prediction block of the current block. The autoencoder is a neural network model, trained with prediction blocks obtained by inter prediction, that can implement inter prediction. The autoencoder includes an encoding network and a decoding network. During inter encoding, the encoding side inputs the current block into the encoding network to obtain the feature information of the current block, then inputs this feature information into the decoding network to obtain the prediction block of the current block output by the decoding network. The original values of the current block minus the prediction block give the residual block; the residual block is transformed twice and then quantized to form the code stream. Meanwhile, the encoding side writes the feature information of the current block into the code stream. The decoding side decodes the code stream to obtain the feature information of the current block and takes the same measure as the encoding side: it inputs the feature information into the decoding network of the autoencoder to obtain the prediction block of the current block, which is the prediction block obtained by the decoding side through inter prediction of the current block.

It should be noted that the ways of inter-predicting the current block to obtain its prediction block in this step include but are not limited to the above. The above takes inter prediction modes as an example; for intra prediction modes, the prediction block of the current block can also be determined by methods like those shown above.
S403: Determine, according to the prediction block, the transform kernel corresponding to the current block.

The texture of the residual block has a certain correlation with the texture of the prediction block. Taking inter coding as an example, for an inter-coded block, the texture of the residual block is correlated with the texture of the inter-predicted prediction block itself. For example, residuals usually appear at the edges of objects, and object edges show obvious gradient features in the prediction block. As another example, for gradually changing textures, such as the folds of clothing, the residual texture often has the same or a similar direction as the texture in the prediction block. Therefore, the embodiments of the present application determine, guide, or assist the selection of the transform kernel according to the features of the prediction block.

The transform kernel corresponding to the current block determined in the embodiments of the present application may be one transform kernel or one set of transform kernels, where a set of transform kernels includes at least two kernels, for example 2, 3, or more kernels.

In some embodiments, the selection of the transform kernel is related to the texture information of the prediction block, so the transform kernel corresponding to the current block can be determined by determining the texture information of the prediction block.

Based on this, the above S403 includes the following S403-A and S403-B:

S403-A: Determine the texture information of the prediction block;

S403-B: Determine, according to the texture information of the prediction block, the transform kernel corresponding to the current block.

The texture information of the prediction block includes any information that can represent the texture features of the prediction block, such as the texture direction and texture strength of the prediction block.

In some embodiments, a spatial gray-level co-occurrence matrix is used to represent the texture information of the prediction block. Starting from a pixel with gray level i in an N×N image f(x, y), the gray-level co-occurrence matrix records the probability P(i, j, δ, θ) that a pixel with gray level j appears simultaneously at distance δ = (dx² + dy²)^(1/2) from it, where θ is the angle between dx and δ.

In some embodiments, the gradient information of the prediction block is used to represent its texture information, where the gradient information can indicate the texture-change trend of the prediction block, i.e., the direction of texture change. In this case, the above S403-A includes S403-A1:

S403-A1: Determine the gradient information of the prediction block.

Correspondingly, the above S403-B includes S403-B1:

S403-B1: Determine, according to the gradient information of the prediction block, the transform kernel corresponding to the current block.
The process of determining the gradient information of the prediction block in S403-A1 is described in detail below.

The gradient information of the prediction block includes at least one of the gradient direction and the gradient magnitude of the prediction block.

The ways of determining the gradient information of the prediction block in S403-A1 include but are not limited to the following:

Way 1: determine the gradient information of the prediction block with a neural network model.

For example, the neural network model is trained with image blocks as input and the gradient information of the image blocks as constraint, and can be used to predict the gradient information of an image block. The present application does not limit the specific network structure of the neural network model, which is determined according to actual needs; it may be, for example, an image convolutional neural network or an adversarial neural network. In actual use, treating the prediction block as an image block, the decoding side can input the prediction block into the neural network model to obtain the gradient information of the prediction block output by the model.

Way 2: determine the gradient information of the prediction block from the gradients of all or some of the pixels in the prediction block. In Way 2, the above S403-A1 includes the following S403-A11 and S403-A12:

S403-A11: Determine the gradients of N points in the prediction block, where N is a positive integer;

S403-A12: Determine the gradient information of the prediction block according to the gradients of the N points.

Optionally, the N points may be all the pixels in the prediction block.

Optionally, the N points may be some of the pixels in the prediction block.

In one example, since the outermost layer of pixels in the prediction block is strongly affected by other image blocks and has poor stability, to improve the accuracy of the determined gradient information, the N points used to determine the gradient information are chosen to be the pixels in the prediction block other than the outermost layer of pixels.

In one example, the N points are pixels obtained by sampling the pixels in the prediction block with some sampling scheme, for example sampling every other pixel.

The gradient of at least one of the N points includes a horizontal gradient and/or a vertical gradient, where the horizontal gradient can be understood as the gradient of the point in the horizontal direction, and the vertical gradient as its gradient in the vertical direction. The two gradients can be computed separately or together.

The ways of determining the gradients of the N points in the prediction block in S403-A11 include but are not limited to the following:

Way 1: for each of the N points, compute the gradient of the prediction block at that point, for example the gradient magnitude and gradient direction, where the gradient magnitude is also called the amplitude. The gradient of the prediction block at the point is recorded as the gradient of the point.
Specifically, the prediction block is an image block. Taking the i-th of the N points as an example, let the image function of the prediction block be f(x, y). The gradient of f(x, y) at the i-th point (x, y) is a vector with a specific magnitude and direction; let its components be Gx and Gy, where Gx denotes the gradient of f(x, y) in the x direction (the horizontal gradient) and Gy the gradient in the y direction (the vertical gradient). This gradient vector can be written as formula (4):

$$\nabla f(x,y)=\begin{bmatrix}G_x & G_y\end{bmatrix}^{T}\tag{4}$$

where T denotes transposition.

The magnitude of the above gradient vector is computed by formula (5):

$$g(x,y)=\sqrt{G_x^{2}+G_y^{2}}\tag{5}$$

The direction angle of the above gradient vector is computed by formula (6):

$$\theta(x,y)=\arctan\!\left(\frac{G_y}{G_x}\right)\tag{6}$$

The magnitude of the gradient vector computed by formula (5) is recorded as the gradient magnitude of the i-th point, and the direction angle computed by formula (6) as the gradient direction of the i-th point.
Following the above way, the gradient of each of the N points can be computed, for example the gradient magnitude and gradient direction of each point.

Way 2: use a neural network model to determine the gradient of each of the N points.

For example, the model is trained with the original values of multiple pixels in an image as input and the true gradients of those pixels as constraint, so that the model can predict the gradients of pixels in an image. The present application does not limit the specific network structure of the model, which is determined according to actual needs; it may be, for example, an image convolutional neural network or an adversarial neural network. At prediction time, the N points are input into the model to obtain the gradient of each of the N points output by the model, where the obtained gradient of each point includes at least one of a horizontal gradient and a vertical gradient.

Way 3: determine the gradient of each of the N points from the pixel values of its neighboring points.

In Way 3, the horizontal and vertical gradients are computed separately; that is, if the gradients of the N points include horizontal and vertical gradients, the above S403-A11 includes:

S403-A111: For the i-th of the N points, determine the horizontal gradient of the i-th point according to the pixel values of its neighboring points in the horizontal direction of the prediction block;

S403-A112: Determine the vertical gradient of the i-th point according to the pixel values of its neighboring points in the vertical direction of the prediction block.

In Way 3, the process of determining the horizontal and vertical gradients is described taking the i-th of the N points as an example; the gradients of the other points are determined with reference to the i-th point.

To determine the horizontal gradient of the i-th point, first obtain the pixel values of its neighboring points in the horizontal direction. There may be several such neighboring points or two, and the nearest of them may or may not be adjacent to the i-th point. Optionally, the neighboring points are all on the left of the i-th point, or all on its right. Optionally, some of the neighboring points are on the left and some on the right, and the numbers on the two sides may be the same or different; for example, the i-th point has 4 horizontal neighboring points, of which 3 are on its left and 1 on its right, or 2 on its left and 2 on its right. The embodiments of the present application do not limit the selection of the horizontal neighboring points of the i-th point. Then, the horizontal gradient of the i-th point is determined according to how the pixel values of its horizontal neighboring points change. For example, if the difference between the pixel values of the horizontal neighboring points and that of the i-th point is small, the horizontal texture of the prediction block has no abrupt change at the i-th point, i.e., the horizontal gradient of the i-th point is determined to be small. Conversely, if the difference is large, the horizontal texture changes abruptly at the i-th point, i.e., the horizontal gradient of the i-th point is determined to be large.

The process of determining the vertical gradient of the i-th point is essentially the same. First obtain the pixel values of its neighboring points in the vertical direction; there may be several or two, and the nearest of them may or may not be adjacent to the i-th point. Optionally, the neighboring points are all above the i-th point, or all below it. Optionally, some are above and some below, in equal or different numbers; for example, 3 above and 1 below, or 2 above and 2 below. The embodiments of the present application do not limit the selection of the vertical neighboring points. Then the vertical gradient of the i-th point is determined according to how the pixel values of its vertical neighboring points change: a small difference means the vertical texture has no abrupt change there, i.e., the vertical gradient is small; a large difference means the vertical texture changes abruptly there, i.e., the vertical gradient is large.

The embodiments of the present application do not limit how S403-A111 determines the horizontal gradient of the i-th point from the pixel values of its horizontal neighboring points.

In some embodiments, implementations of S403-A111 include but are not limited to the following:

Way 1: for each horizontal neighboring point of the i-th point, compute the difference between the pixel value of the neighboring point and that of the i-th point, and determine the sum of the differences, or their average, as the horizontal gradient of the i-th point.

Way 2: if the horizontal neighboring points of the i-th point include its left adjacent point and right adjacent point in the horizontal direction of the prediction block, determine the ratio of the difference between the pixel value of the right adjacent point and that of the left adjacent point to 2 as the horizontal gradient of the i-th point.

The embodiments of the present application do not limit how S403-A112 determines the vertical gradient of the i-th point from the pixel values of its vertical neighboring points.

In some embodiments, implementations of S403-A112 include but are not limited to the following:

Way 1: for each vertical neighboring point of the i-th point, compute the difference between the pixel value of the neighboring point and that of the i-th point, and determine the sum of the differences, or their average, as the vertical gradient of the i-th point.

Way 2: if the vertical neighboring points of the i-th point include its upper adjacent point and lower adjacent point in the vertical direction of the prediction block, determine the ratio of the difference between the pixel value of the lower adjacent point and that of the upper adjacent point to 2 as the vertical gradient of the i-th point.
In the above ways, the gradients of the N points in the prediction block are determined. Then S403-A12 is executed: determine the gradient information of the prediction block according to the gradients of the N points.

In some embodiments, the ways of determining the gradient information of the prediction block from the gradients of the N points in S403-A12 include but are not limited to the following:

Way 1: determine the gradients of the N points as the gradient information of the prediction block. That is, the gradient information of the prediction block includes the gradients of the N points; from these gradients the texture direction in the prediction block is estimated, and the transform kernel is then selected according to the texture direction. For example, if the horizontal and vertical gradients of all or most of the N points are the same or approximately the same, it can be estimated that the texture in the prediction block tends toward 45°, and the transform kernel or kernel set most effective for 45° texture can be selected accordingly.

Way 2: determine the sum of the horizontal gradients of the N points, grad_hor_sum, and the sum of their vertical gradients, grad_ver_sum; and determine the gradient information grad_para of the prediction block from the sum of the horizontal gradients and the sum of the vertical gradients.

In Way 2, the horizontal gradients of the N points determined in S403-A11 are summed to obtain grad_hor_sum, and their vertical gradients are summed to obtain grad_ver_sum; grad_para is then determined from grad_hor_sum and grad_ver_sum.

In one example, grad_hor_sum and grad_ver_sum are together determined as grad_para; that is, the gradient information grad_para of the prediction block includes both sums, and the transform kernel corresponding to the current block is determined according to their magnitudes. For example, when grad_hor_sum and grad_ver_sum are equal or approximately equal, the texture of the prediction block can be estimated to tend toward 45°, and the kernel or kernel set most effective for 45° texture can be selected accordingly. As another example, when at least one of grad_hor_sum and grad_ver_sum is small — less than some value — the texture of the prediction block can be estimated to tend toward horizontal or vertical, and a kernel or kernel set without an obvious directional texture can be selected accordingly.

In another example, the ratio of the sum of the horizontal gradients grad_hor_sum to the sum of the vertical gradients grad_ver_sum is determined as the gradient information of the prediction block, i.e., grad_para = grad_hor_sum / grad_ver_sum.

In some embodiments, if the sum of the vertical gradients grad_ver_sum equals 0, the gradient information grad_para of the prediction block is determined to be 0.
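Purely as an illustrative sketch (the names grad_hor_sum, grad_ver_sum, and grad_para follow the text; the choice of evaluating the central differences on the inner pixels is one of the options the text allows, and everything else is an assumption of this example), the Way-2 computation could look like:

```python
import numpy as np

def prediction_block_gradient_info(pred: np.ndarray) -> float:
    """Compute grad_para for a prediction block.

    Uses the central-difference rules from the text: horizontal gradient
    (right - left) / 2 and vertical gradient (down - up) / 2, evaluated at
    the N inner pixels (the outermost layer of pixels is excluded).
    """
    p = pred.astype(np.int64)
    grad_hor = (p[1:-1, 2:] - p[1:-1, :-2]) / 2.0   # (right - left) / 2
    grad_ver = (p[2:, 1:-1] - p[:-2, 1:-1]) / 2.0   # (down - up) / 2

    grad_hor_sum = float(np.sum(grad_hor))
    grad_ver_sum = float(np.sum(grad_ver))

    # Zero-guard from the text: grad_para is 0 when grad_ver_sum == 0.
    if grad_ver_sum == 0:
        return 0.0
    return grad_hor_sum / grad_ver_sum
```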
After the gradient information of the prediction block is determined by the above methods, S403-B1 is executed: determine, according to the gradient information of the prediction block, the transform kernel corresponding to the current block.

In some embodiments, the above S403-B1 includes the following steps S403-B11 to S403-B13:

S403-B11: Obtain the preset correspondence between prediction-block gradients and transform kernels;

S403-B12: Determine, in the correspondence, the target transform kernel corresponding to the gradient information of the prediction block;

S403-B13: Determine the target transform kernel as the transform kernel corresponding to the current block.

In the present application, a correspondence between prediction-block gradients and transform kernels is constructed in advance; it includes the transform kernels or kernel sets corresponding to the gradients of prediction blocks of different magnitudes.

In one example, the correspondence between the gradient information grad_para of the prediction block and the transform kernel sets is shown in Table 2:

Table 2

    grad_para          Tr. set index
    (-1/2, 1/2)              2
    [1/2, 2)                 3
    [2, +∞)                  2
    (-∞, -2]                 2
    (-2, -1/2]               1

Here Tr. set index (transform set index) denotes the transform-kernel-set index and grad_para denotes the gradient information of the prediction block. As shown in Table 2, when grad_para lies in (-1/2, 1/2), [2, +∞), or (-∞, -2], it corresponds to kernel set 2; when grad_para lies in [1/2, 2), it corresponds to kernel set 3; and when grad_para lies in (-2, -1/2], it corresponds to kernel set 1. It should be noted that Table 2 is only an example, and in practical applications the correspondence between grad_para and kernel sets includes but is not limited to Table 2.

Optionally, in actual use, a transposition operation may also be needed. In Table 2, although both (-1/2, 1/2) and [2, +∞) correspond to kernel set 2, the (-1/2, 1/2) case requires transposition in actual use: for example, transform coefficients 0 corresponding to (-1/2, 1/2) are inverse-transformed with kernel set 2 to obtain transform coefficients 1, and transform coefficients 1 are transposed to obtain transform coefficients 2. Whether to transpose can be agreed in advance.

In some embodiments, when the embodiments of the present application are applied to inter coding, the transform kernels for inter coding in the present application (e.g., LFNST kernels) may reuse the kernels for intra coding (e.g., LFNST kernels); for example, the kernels in Table 2 reuse those in Table 1. This avoids additional storage space for kernels and additional logic.

In some embodiments, the inter-coding kernels of the present application (e.g., LFNST kernels) may partially reuse the intra-coding kernels (e.g., LFNST kernels); for example, the kernels in Table 2 partially reuse those in Table 1.

In some embodiments, the inter-coding kernels of the present application (e.g., LFNST kernels) do not reuse the intra-coding kernels (e.g., LFNST kernels); for example, the kernels in Table 2 do not reuse those in Table 1.

In Table 2, the kernel set corresponding to the gradient information of the prediction block of the current block can be looked up. For example, if the gradient information of the prediction block determined by the above method is 1, then 1 lies in the interval [1/2, 2), and [1/2, 2) corresponds to kernel set 3; therefore the transform kernel corresponding to the current block is determined to be kernel set 3.
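As a sketch of the Table 2 lookup only (the interval boundaries are those of the example table; the function name is illustrative, and treating the other set-2 intervals as non-transposed is an assumption of this sketch, following the transpose note above):

```python
def kernel_set_for_grad_para(grad_para: float) -> tuple[int, bool]:
    """Map grad_para to (Tr. set index, transpose flag) per the example Table 2."""
    if -0.5 < grad_para < 0.5:
        return 2, True           # near-horizontal/vertical interval uses set 2 with transposition
    if 0.5 <= grad_para < 2.0:
        return 3, False
    if grad_para >= 2.0 or grad_para <= -2.0:
        return 2, False
    return 1, False              # remaining interval (-2, -1/2]
```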
In some embodiments, blocks of different sizes correspond to transform kernels of different sizes; for example, an 8×8 block corresponds to a 4×4 kernel while a 16×16 block corresponds to an 8×8 kernel. Kernels of different sizes also correspond to different preset correspondences between prediction-block gradients and transform kernels; for example, the 8×8 kernel corresponds to a preset correspondence 1 while the 4×4 kernel corresponds to a preset correspondence 2.

Based on this, the above S403-B11 includes:

S403-B111: Determine, according to the size of the current block, the transform kernel size corresponding to the current block;

S403-B112: Obtain, according to the transform kernel size corresponding to the current block, the preset correspondence between prediction-block gradients and transform kernels for that kernel size.

Correspondingly, the above S403-B12 includes:

S403-B121: Determine, in the correspondence for that kernel size, the target transform kernel corresponding to the gradient information of the prediction block.

The present application does not limit the kernel size corresponding to the current block. For example, if at least one of the width and height of the current block is less than 8, the kernel size corresponding to the current block is determined to be 4×4; if both the width and height are greater than 4, the kernel size is determined to be 8×8. If at least one of the width and height is less than 8 and both are greater than 4, the kernel size may be either 4×4 or 8×8.

In some embodiments, the quantized coefficients of intra-coded residuals after transform are statistically more numerous than those of inter coding. In other words, intra-coded residuals are statistically more complex than inter-coded residuals. On one hand this is determined by the difference between intra and inter prediction methods: intra prediction exploits spatial correlation while inter prediction exploits temporal correlation. On the other hand, in the random access configuration most commonly used in broadcasting, intra frames are usually used as the bottom-layer reference frames of the GOP (group of pictures) structure and usually have higher quality requirements, while some inter frames have relatively lower quality requirements. The usage conditions of inter transform kernels and intra transform kernels can therefore be differentiated.

Currently, the 4×4 transform (e.g., LFNST) kernels are used for small blocks, i.e., blocks with at least one of width and height less than 8, while the 8×8 transform (e.g., LFNST) kernels are used for larger blocks, i.e., blocks whose width and height are both greater than 4. But considering that inter-coded residuals are statistically simpler than intra-coded residuals, the inter application conditions can differ from the intra ones. For example, the 4×4 transform (e.g., LFNST) kernels are used for blocks with at least one of width and height less than 16, while the 8×8 transform (e.g., LFNST) kernels are applied to larger blocks, for example blocks whose width and height are both greater than 8.

Based on the above description, in one possible implementation of this embodiment, if at least one of the width and height of the current block is less than 16, the kernel size corresponding to the current block is determined to be 4×4, and the preset correspondence between prediction-block gradients and transform kernels for the 4×4 kernel is obtained. Then, in the correspondence for the 4×4 kernel, the target transform kernel corresponding to the gradient information of the prediction block is determined and taken as the transform kernel corresponding to the current block.

In another possible implementation of this embodiment, if both the width and height of the current block are greater than 8, the kernel size corresponding to the current block is determined to be 8×8, and the preset correspondence for the 8×8 kernel is obtained. Then, in the correspondence for the 8×8 kernel, the target transform kernel corresponding to the gradient information of the prediction block is determined and taken as the transform kernel corresponding to the current block.
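A minimal sketch of this inter-block size rule (the function name is illustrative; note that any block failing the "both > 8" test necessarily has a dimension less than 16, so the two rules above partition all block sizes):

```python
def inter_lfnst_kernel_size(width: int, height: int) -> tuple[int, int]:
    """Pick the secondary-transform kernel size for an inter block,
    per the example rule in the text."""
    if width > 8 and height > 8:
        return (8, 8)   # both dimensions > 8
    return (4, 4)       # at least one dimension <= 8, hence < 16
```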
The process of determining the transform kernel corresponding to the current block according to the texture information of the prediction block has been introduced above.

In some embodiments, the transform kernel corresponding to the current block may also be determined with a neural network model.

That is, the above S403 includes the following S403-1 and S403-2:

S403-1: Input the prediction block into a pre-trained model to obtain the transform-kernel indication information, output by the model, corresponding to the current block, where the indication information is used to indicate the transform kernel corresponding to the current block, and the kernel may be one kernel or one set of kernels;

S403-2: Determine, according to the transform-kernel indication information, the transform kernel corresponding to the current block.

In this way, the model used is pre-trained: during training, the prediction blocks of image blocks are taken as input and the true transform-kernel indication information of the image blocks as constraint, so the trained model can predict transform kernels. The decoding side can then input the prediction block into the trained model to obtain the transform-kernel indication information corresponding to the current block and determine the corresponding transform kernel according to it. It should be noted that the embodiments of the present application do not limit the specific network structure of the model; it may be any image-recognition neural network, such as an image convolutional neural network or an adversarial neural network.

In some embodiments, to reduce the amount of computation and the complexity of the model, the prediction block is first downsampled before being input to the model, to reduce its data volume and complexity; the downsampled prediction block is then input to the pre-trained model, which can improve the efficiency with which the model predicts the transform-kernel indication information corresponding to the current block.

The above steps have introduced the process of determining the transform kernel corresponding to the current block according to its prediction block. After the kernel is determined, the following steps are executed.
S404: Perform inverse transform on the target transform coefficients according to the transform kernel, and obtain the residual block of the current block from the result of the inverse transform.

In some embodiments, if the target transform coefficients are basic transform coefficients, the kernel is used to inverse-transform the target transform coefficients to obtain the residual block of the current block.

In some embodiments, if the target transform coefficients are coefficients after a secondary transform, the above S404 includes the following steps:

S404-A1: Perform inverse secondary transform on the target transform coefficients according to the transform kernel to obtain the basic transform coefficients of the current block.

S404-A2: Perform inverse basic transform on the basic transform coefficients to obtain the residual block of the current block.

The embodiments of the present application do not limit the way of obtaining the basic transform coefficients of the current block by inverse-transforming the target transform coefficients according to the kernel.

For example, the basic transform coefficients of the current block are obtained according to formula (7):

$$\mathbf{F}_{\text{basic}}=T^{\mathsf{T}}\,\mathbf{F}_{\text{target}}\tag{7}$$

where F_target denotes the target transform coefficients of the current block, T is the transform kernel — a transform matrix, so that at the encoding side the target transform coefficients are the product of the transform kernel and the basic transform coefficients — and F_basic denotes the basic transform coefficients of the current block.

According to formula (7), the target transform coefficients can be inverse-secondary-transformed based on the transform kernel to obtain the basic transform coefficients of the current block.

It should be noted that formula (7) is only an example, and the implementation of the above S404 includes but is not limited to formula (7).
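To make the matrix view concrete, here is a small sketch (the shapes follow the LFNST description above, where the 4×4 inverse kernel maps 8 coefficients back to 16; the variable names and the random stand-in kernel are illustrative assumptions):

```python
import numpy as np

def inverse_secondary_transform(coeff_target: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Inverse secondary transform per formula (7): coeff_basic = kernel^T @ coeff_target.

    kernel is the forward secondary-transform matrix of shape (out, in),
    e.g., (8, 16) for the 4x4 LFNST case.
    """
    return kernel.T @ coeff_target

kernel_4x4 = np.random.randn(8, 16)    # stand-in for a trained 4x4 kernel
coeff_target = np.random.randn(8)      # 8 decoded secondary coefficients
coeff_basic = inverse_secondary_transform(coeff_target, kernel_4x4)
assert coeff_basic.shape == (16,)      # 16 basic transform coefficients
```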
Next, inverse basic transform is performed on the basic transform coefficients to obtain the residual block of the current block.

Specifically, the inverse basic transform is performed on the basic transform coefficients according to the way in which the basic transform was performed, to obtain the residual block of the current block.

In one example, if the encoding side used the DCT-II transform of formula (1) to perform the basic transform on the residual block of the current block, the decoding side uses the DCT-II transform of formula (1) to perform the inverse basic transform on the basic transform coefficients to obtain the residual block of the current block.

In another example, if the encoding side used the DCT-VIII transform of formula (2) for the basic transform, the decoding side uses the DCT-VIII transform of formula (2) for the inverse basic transform to obtain the residual block of the current block.

In another example, if the encoding side used the DST-VII transform of formula (3) for the basic transform, the decoding side uses the DST-VII transform of formula (3) for the inverse basic transform to obtain the residual block of the current block.

After the residual block of the current block is obtained by the above steps, the prediction block and the residual block are added to obtain the reconstruction block of the current block.

With the decoding method of the embodiments of the present application, the code stream is decoded to obtain the target transform coefficients of the current block; the current block is predicted to obtain its prediction block; the transform kernel corresponding to the current block is determined according to the prediction block; inverse transform is performed on the target transform coefficients according to the kernel; and the residual block of the current block is obtained from the result of the inverse transform. That is, based on the correlation between the residual texture and the texture of the prediction block itself, the present application determines, guides, or assists the selection of the transform kernel through the features of the prediction block, reducing the transmission of kernel-selection information in the code stream and lowering the transform overhead in the code stream while improving the compression efficiency of the current block.
The decoding method of the embodiments of the present application is further described below with reference to FIG. 11 and FIG. 12, taking the inter secondary transform as an example.

FIG. 11 is another schematic flowchart of the video decoding method provided by an embodiment of the present application, and FIG. 12 is a schematic diagram of the video decoding process involved. As shown in FIG. 11 and FIG. 12, the method includes:

S501: Decode the code stream to obtain the quantized coefficients of the current block.

If the encoding side quantized the target transform coefficients after the secondary transform to form quantized coefficients and encoded the quantized coefficients to form the code stream, then correspondingly, after receiving the code stream, the decoding side decodes it to obtain the quantized coefficients of the current block.

S502: Inverse-quantize the quantized coefficients to obtain the target transform coefficients of the current block.

Specifically, a quantization manner is determined, and the determined quantization manner is used to inverse-quantize the quantized coefficients to obtain the target transform coefficients of the current block.

The decoding side may determine the quantization manner as follows:

Way 1: if the code stream includes indication information of the quantization manner, the decoding side obtains this indication information by decoding the code stream and determines the quantization manner of the current block according to it.

Way 2: the decoding side uses a default quantization manner.

Way 3: the decoding side determines the quantization manner of the current block in the same way as the encoding side.

S503: Perform inter prediction on the current block to obtain its prediction block.

For the implementation of S503, refer to the description of S402 above, which is not repeated here.

S504: Determine the gradients of N points in the prediction block.

For the implementation of S504, refer to the description of S403-A11 above, which is not repeated here.

S505: Judge whether the gradients of the N points meet the preset conditions.

In practical applications, only when there is an obvious directional texture in the prediction block will there be an obvious directional texture in the residual block; if there is no obvious directional texture in the prediction block, most likely there will be none in the residual block either. Based on this, to reduce unnecessary computation of the gradient information of the prediction block, after the gradients of the N points are determined, it is first judged whether they meet the preset conditions. If the gradients of the N points meet the preset conditions, S506 is executed to compute the gradient information of the prediction block. If they do not, the gradient information of the prediction block is not computed; instead, S508, S509, and S511 are executed, or S510 and S511 are executed. A sketch of this gating check is given after the two ways below.

In some embodiments, the preset conditions include at least one of the following:

Condition 1: at least one of the sum of the gradients in the horizontal direction and the sum of the gradients in the vertical direction is greater than or equal to a first preset value;

Condition 2: at least one of the horizontal gradient and the vertical gradient of each of at least M points is greater than or equal to a second preset value, where M is a positive integer less than or equal to N.

Specifically, if the gradients of the N points meet at least one of Conditions 1 and 2 — for example, if at least one of the sum of the horizontal gradients and the sum of the vertical gradients of the N points is greater than or equal to the first preset value, and/or at least one of the horizontal and vertical gradients of each of at least M of the N points is greater than or equal to the second preset value — the prediction block has an obvious texture, and when the prediction block has an obvious texture the residual block does too. In this case, the decoding side executes the steps S506, S507, S509, and S511 below: it computes the gradient information of the prediction block, determines the transform kernel corresponding to the current block according to it, performs inverse secondary transform on the target transform coefficients according to the kernel to obtain the basic transform coefficients of the current block, and then performs inverse basic transform on the basic transform coefficients to obtain the residual block of the current block.

If the sum of the horizontal gradients and the sum of the vertical gradients of the N points are both less than the first preset value, and/or the horizontal and vertical gradients of at least M of the N points are all less than the second preset value, the prediction block has no obvious texture, and when the prediction block has no obvious texture the residual block has none either. In this case, the decoding side decodes in at least the following two ways:

Way 1: the decoding side executes S508, S509, and S511; that is, it determines the transform kernel corresponding to the current block to be a first transform kernel, which is the kernel, among the multiple preset kernels of the secondary transform, corresponding to the weakest directional texture, for example kernel set 0. Then, according to the first transform kernel, it performs inverse secondary transform on the target transform coefficients to obtain the basic transform coefficients of the current block, and performs inverse basic transform on them to obtain the residual block.

Way 2: the decoding side executes S510 and S511; that is, it skips the inverse secondary transform operation on the current block, directly takes the above target transform coefficients as the basic transform coefficients of the current block, and performs inverse basic transform on them to obtain the residual block.
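A compact sketch of this decoder-side gating (the threshold names t1, t2 and the point count m stand for the first/second preset values and M; taking absolute values of the gradients and sums is an assumption of this sketch, as the text compares the quantities directly):

```python
def meets_preset_conditions(grad_hor, grad_ver, t1: float, t2: float, m: int) -> bool:
    """Conditions 1 and 2 from the text, over per-point gradient lists."""
    cond1 = abs(sum(grad_hor)) >= t1 or abs(sum(grad_ver)) >= t1
    strong_points = sum(1 for gh, gv in zip(grad_hor, grad_ver)
                        if abs(gh) >= t2 or abs(gv) >= t2)
    cond2 = strong_points >= m
    return cond1 or cond2
```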
S506: Determine the gradient information of the prediction block according to the gradients of the N points.

S507: Determine, according to the gradient information of the prediction block, the transform kernel corresponding to the current block.

For the implementation of S506 and S507, refer to the implementation of S403-A12 and S403-B1 above, which is not repeated here.

S508: Determine the transform kernel corresponding to the current block to be the first transform kernel, which is the kernel among the multiple preset kernels corresponding to the weakest directional texture.

For example, the first transform kernel is kernel set 0 in Table 1.

S509: Perform inverse secondary transform on the target transform coefficients according to the transform kernel to obtain the basic transform coefficients of the current block.

For the implementation of S509, refer to the implementation of S404-A1 above, which is not repeated here.

S510: Skip the inverse secondary transform operation on the current block.

S511: Perform inverse basic transform on the basic transform coefficients to obtain the residual block of the current block.

For the implementation of S511, refer to the implementation of S404-A2 above, which is not repeated here.

It should be noted that the above describes the decoding method of the present application taking the inter secondary transform as an example; for the intra secondary transform, decoding can also be performed in the above manner, with specific reference to the above steps, which are not repeated here.

With the decoding method of this embodiment, the code stream is decoded to obtain the quantized coefficients of the current block; the quantized coefficients are inverse-quantized to obtain the target transform coefficients of the current block; inter prediction is performed on the current block to obtain its prediction block; the gradients of N points in the prediction block are determined; and whether they meet the preset conditions is judged. If the gradients of the N points meet the preset conditions, the gradient information of the prediction block is determined from them, the transform kernel corresponding to the current block is determined from the gradient information, the target transform coefficients are inverse-secondary-transformed according to the kernel to obtain the basic transform coefficients of the current block, and finally the basic transform coefficients are inverse-basic-transformed to obtain the residual block of the current block. If the gradients of the N points do not meet the preset conditions, the transform kernel corresponding to the current block is determined to be the first transform kernel — the one among the multiple preset kernels corresponding to the weakest directional texture — and is used to inverse-secondary-transform the target transform coefficients to obtain the basic transform coefficients, which are then inverse-basic-transformed to obtain the residual block. Or, if the gradients of the N points do not meet the preset conditions, the inverse secondary transform operation on the current block is skipped, the target transform coefficients are taken directly as the basic transform coefficients, and these are inverse-basic-transformed to obtain the residual block. That is, the present application judges the texture strength of the prediction block of the current block from the gradients of the N points. If the texture of the prediction block is strong, the residual block has an obvious texture; in this case, determining the transform kernel of the current block from the prediction block and inverse-transforming the target transform coefficients achieves accurate decoding of the current block. If the texture of the prediction block is weak, the residual block has no obvious texture; in this case, skipping the inverse secondary transform, or assigning the current block a kernel without an obvious directional texture, prevents over-applying the inverse secondary transform to the current block, thereby improving the accuracy of the inverse transform of the current block and the decoding efficiency.
The decoding method of the embodiments of the present application has been introduced above; on this basis, the encoding method provided by the embodiments of the present application is introduced below.

FIG. 13 is a schematic flowchart of the video encoding method provided by an embodiment of the present application, and FIG. 14 is a schematic diagram of the video encoding process involved. This embodiment applies to the video encoder shown in FIG. 1 and FIG. 2. As shown in FIG. 13 and FIG. 14, the method of this embodiment includes:

S601: Predict the current block to obtain the prediction block of the current block.

In the video encoding process, the video encoder receives a video stream composed of a series of image frames, performs video encoding for each frame of the video stream, and partitions the frame into blocks, obtaining the current block.

In some embodiments, the current block is also called the current encoding block, current image block, encoding block, current coding unit, current block to be encoded, image block to be encoded, and so on.

In block partitioning, a block partitioned by the traditional method contains both the chroma components and the luma components at the position of the current block. The dual-tree technique can partition separate component blocks, such as a separate luma block and a separate chroma block, where the luma block can be understood as containing only the luma components at the position of the current block and the chroma block as containing only the chroma components at that position. In this way, luma and chroma components at the same position can belong to different blocks, and the partitioning has greater flexibility. If the dual tree is used in CU partitioning, some CUs contain both luma and chroma components, some CUs contain only luma components, and some contain only chroma components.

In some embodiments, the current block of the embodiments of the present application includes only chroma components and can be understood as a chroma block. In some embodiments, the current block includes only luma components and can be understood as a luma block. In some embodiments, the current block includes both luma and chroma components.

In some embodiments, if the method of this embodiment is applied to intra prediction, intra prediction is performed on the current block to obtain its prediction block; the prediction mode used is not limited and is determined according to the actual situation.

In some embodiments, if the method of this embodiment is applied to inter prediction, inter prediction is performed on the current block to obtain its prediction block; the prediction mode used is likewise not limited. For example, when inter-predicting the current block, the video encoder tries at least one of multiple inter prediction modes, selects the inter prediction mode with the smallest rate-distortion cost as the target inter prediction mode, and uses it to inter-encode the current block to obtain its prediction block.

S602: Determine, according to the prediction block, the transform kernel corresponding to the current block.

The texture of the residual block has a certain correlation with the texture of the prediction block. Taking inter coding as an example, for an inter-coded block, the texture of the residual block is correlated with the texture of the inter-predicted prediction block itself: residuals usually appear at the edges of objects, which show obvious gradient features in the prediction block, and for gradually changing textures, such as the folds of clothing, the residual texture often has the same or a similar direction as the texture in the prediction block. Therefore, the embodiments of the present application determine, guide, or assist the selection of the transform kernel according to the features of the prediction block.

The transform kernel corresponding to the current block determined in the embodiments of the present application may be one transform kernel or one set of transform kernels, where a set includes at least two kernels.

In some embodiments, the above S602 includes the following S602-A and S602-B:

S602-A: Determine the texture information of the prediction block;

S602-B: Determine, according to the texture information of the prediction block, the transform kernel corresponding to the current block.

The texture information of the prediction block includes any information that can represent its texture features, such as the texture direction and texture strength of the prediction block.

In some embodiments, the above S602-A includes S602-A1:

S602-A1: Determine the gradient information of the prediction block.

Correspondingly, the above S602-B includes S602-B1:

S602-B1: Determine, according to the gradient information of the prediction block, the transform kernel corresponding to the current block.

The process of determining the gradient information of the prediction block in S602-A1 is described below. The gradient information of the prediction block includes at least one of its gradient direction and gradient magnitude. The ways of determining it include but are not limited to the following:

Way 1: determine the gradient information of the prediction block with a neural network model.

Way 2: determine the gradient information of the prediction block from the gradients of all or some of the pixels in the prediction block. In Way 2, the above S602-A1 includes the following S602-A11 and S602-A12:

S602-A11: Determine the gradients of N points in the prediction block, where N is a positive integer;

S602-A12: Determine the gradient information of the prediction block according to the gradients of the N points.

Optionally, the N points may be all the pixels in the prediction block. Optionally, the N points may be some of the pixels in the prediction block.

In one example, since the outermost layer of pixels in the prediction block is strongly affected by other image blocks and has poor stability, to improve the accuracy of the determined gradient information, the N points are chosen to be the pixels in the prediction block other than the outermost layer of pixels.

In one example, the N points are pixels obtained by sampling the pixels in the prediction block with some sampling scheme, for example sampling every other pixel.

Optionally, the gradient of at least one of the N points includes a horizontal gradient and/or a vertical gradient.

In some embodiments, the above S602-A11 includes:

S602-A111: For the i-th of the N points, determine the horizontal gradient of the i-th point according to the pixel values of its neighboring points in the horizontal direction of the prediction block;

S602-A112: Determine the vertical gradient of the i-th point according to the pixel values of its neighboring points in the vertical direction of the prediction block.

In some embodiments, if the horizontal neighboring points of the i-th point include its left adjacent point and right adjacent point in the horizontal direction of the prediction block, the ratio of the difference between the pixel value of the right adjacent point and that of the left adjacent point to 2 is determined as the horizontal gradient of the i-th point.

The embodiments of the present application do not limit how S602-A112 determines the vertical gradient of the i-th point from the pixel values of its vertical neighboring points.

In one way, if the vertical neighboring points of the i-th point include its upper adjacent point and lower adjacent point in the vertical direction of the prediction block, the ratio of the difference between the pixel value of the lower adjacent point and that of the upper adjacent point to 2 is determined as the vertical gradient of the i-th point.

In the above ways, the gradients of the N points in the prediction block are determined. Then S602-A12 is executed: determine the gradient information of the prediction block according to the gradients of the N points.

In some embodiments, the ways of doing so in S602-A12 include but are not limited to the following:

Way 1: determine the gradients of the N points as the gradient information of the prediction block.

Way 2: determine the sum of the horizontal gradients of the N points, grad_hor_sum, and the sum of their vertical gradients, grad_ver_sum, and determine the gradient information grad_para of the prediction block from the sum of the horizontal gradients and the sum of the vertical gradients.

For example, the ratio of the sum of the horizontal gradients grad_hor_sum to the sum of the vertical gradients grad_ver_sum is determined as the gradient information of the prediction block, i.e., grad_para = grad_hor_sum / grad_ver_sum.

In some embodiments, if the sum of the vertical gradients grad_ver_sum equals 0, the gradient information grad_para of the prediction block is determined to be 0.

After the gradient information of the prediction block is determined by the above methods, S602-B1 is executed: determine, according to the gradient information of the prediction block, the transform kernel corresponding to the current block.

In some embodiments, the above S602-B1 includes the following steps S602-B11 to S602-B13:

S602-B11: Obtain the preset correspondence between prediction-block gradients and transform kernels;

S602-B12: Determine, in the correspondence, the target transform kernel corresponding to the gradient information of the prediction block;

S602-B13: Determine the target transform kernel as the transform kernel corresponding to the current block.

In some embodiments, blocks of different sizes correspond to transform kernels of different sizes. Based on this, the above S602-B11 includes:

S602-B111: Determine, according to the size of the current block, the transform kernel size corresponding to the current block;

S602-B112: Obtain, according to the transform kernel size corresponding to the current block, the preset correspondence between prediction-block gradients and transform kernels for that kernel size.

Correspondingly, the above S602-B12 includes:

S602-B121: Determine, in the correspondence for that kernel size, the target transform kernel corresponding to the gradient information of the prediction block.

In some embodiments, considering that inter-coded residuals are statistically simpler than intra-coded residuals, the inter application conditions can differ from the intra ones. For example, the 4×4 secondary-transform (e.g., LFNST) kernels are used for blocks with at least one of width and height less than 16, while the 8×8 secondary-transform (e.g., LFNST) kernels are applied to larger blocks, for example blocks whose width and height are both greater than 8.

Based on the above description, in one possible implementation of this embodiment, if at least one of the width and height of the current block is less than 16, the kernel size corresponding to the current block is determined to be 4×4, and the preset correspondence between prediction-block gradients and transform kernels for the 4×4 kernel is obtained; then, in that correspondence, the target transform kernel corresponding to the gradient information of the prediction block is determined and taken as the transform kernel corresponding to the current block.

In another possible implementation of this embodiment, if both the width and height of the current block are greater than 8, the kernel size corresponding to the current block is determined to be 8×8, and the preset correspondence for the 8×8 kernel is obtained; then, in that correspondence, the target transform kernel corresponding to the gradient information of the prediction block is determined and taken as the transform kernel corresponding to the current block.

The process of determining the transform kernel corresponding to the current block according to the texture information of the prediction block has been introduced above.

In some embodiments, the transform kernel corresponding to the current block may also be determined with a neural network model.

That is, the above S602 includes the following S602-1 and S602-2:

S602-1: Input the prediction block into a pre-trained model to obtain the transform-kernel indication information, output by the model, corresponding to the current block, where the indication information is used to indicate the transform kernel corresponding to the current block;

S602-2: Determine, according to the transform-kernel indication information, the transform kernel corresponding to the current block.

In this way, the model used is pre-trained: during training, the prediction blocks of image blocks are taken as input and the true transform-kernel indication information of the image blocks as constraint, so the trained model can predict transform kernels. The encoding side can then input the prediction block into the trained model to obtain the transform-kernel indication information corresponding to the current block and determine the corresponding transform kernel according to it. It should be noted that the embodiments of the present application do not limit the specific network structure of the model; it may be any image-recognition neural network, such as an image convolutional neural network or an adversarial neural network.

In some embodiments, to reduce the amount of computation and the complexity of the model, the prediction block is first downsampled before being input to the model, to reduce its data volume and complexity; the downsampled prediction block is then input to the pre-trained model, which can improve the efficiency of predicting the transform-kernel indication information.

It should be noted that the implementation of S602 is essentially the same as that of S403 above; refer to the description of S403, which is not repeated here.

The above steps have introduced the process of determining the transform kernel corresponding to the current block according to its prediction block. After the kernel is determined, the following steps are executed.

S603: Obtain the residual block of the current block according to the prediction block and the current block.

For example, the pixel values of the prediction block are subtracted from the pixel values of the current block to obtain the residual block of the current block.

It should be noted that there is no fixed order between S603 and S602; S603 may be executed before S602, after S602, or in parallel with S602, which the present application does not limit.

S604: Transform the residual block according to the transform kernel, and encode the transformed coefficients to obtain the code stream.

In some embodiments, the residual block is transformed according to the kernel to obtain transformed coefficients, and the transformed coefficients are encoded to obtain the code stream.

In some embodiments, the residual block is transformed according to the kernel to obtain transformed coefficients, the transformed coefficients are quantized, and the quantized coefficients are encoded to obtain the code stream.

In some embodiments, the above S604 includes the following steps:

S604-A1: Perform basic transform on the residual block to obtain the basic transform coefficients of the current block;

S604-A2: Perform secondary transform on the basic transform coefficients according to the transform kernel to obtain the target transform coefficients of the current block;

S604-A3: Encode the target transform coefficients to obtain the code stream.

In one example, the encoding side uses the DCT-II transform of formula (1) to perform the basic transform on the residual block of the current block to obtain its basic transform coefficients. In another example, the encoding side uses the DCT-VIII transform of formula (2) for the basic transform. In another example, the encoding side uses the DST-VII transform of formula (3) for the basic transform.

Next, the basic transform coefficients are secondary-transformed according to the transform kernel to obtain the target transform coefficients of the current block.

The embodiments of the present application do not limit the way of performing the secondary transform on the basic transform coefficients according to the kernel.

For example, in the manner of formula (7), the kernel is used to secondary-transform the basic transform coefficients to obtain the target transform coefficients of the current block; that is, the product of the transform kernel and the basic transform coefficients is taken as the target transform coefficients of the current block.
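Mirroring the decoder sketch above, the forward side could look like this (again only a sketch; the kernel shape and random stand-in follow the 4×4 LFNST example and are assumptions):

```python
import numpy as np

def forward_secondary_transform(coeff_basic: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Forward secondary transform: the product of the transform kernel
    and the basic transform coefficients gives the target coefficients."""
    return kernel @ coeff_basic

kernel_4x4 = np.random.randn(8, 16)   # stand-in for a trained 4x4 kernel
coeff_basic = np.random.randn(16)     # 16 low-frequency primary coefficients
coeff_target = forward_secondary_transform(coeff_basic, kernel_4x4)
assert coeff_target.shape == (8,)     # 8 secondary coefficients to encode
```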
The target transform coefficients are encoded to obtain the code stream.

In one example, the target transform coefficients of the current block are encoded directly without quantization to obtain the code stream.

In one example, the target transform coefficients of the current block are quantized to obtain quantized coefficients, and the quantized coefficients are encoded to obtain the code stream.

In some embodiments, if the transform kernel corresponding to the current block is a set of kernels, the encoding side may indicate to the decoding side which kernel in the set was actually used; optionally, this indication information may be carried in the code stream.

In some embodiments, the encoding side may also carry in the code stream indication information that the current block uses the secondary transform, so that the decoding side, according to this indication information, executes the decoding method of the embodiments of the present application when it determines that the current block uses the secondary transform.

With the encoding method of the embodiments of the present application, the current block is predicted to obtain its prediction block; the transform kernel corresponding to the current block is determined according to the prediction block; the residual block of the current block is obtained according to the prediction block and the current block; the residual block is transformed according to the kernel; and the transformed coefficients are encoded to obtain the code stream. That is, based on the correlation between the residual texture and the texture of the prediction block itself, the present application determines, guides, or assists the selection of the transform kernel through the features of the prediction block, reducing the transmission of kernel-selection information in the code stream and lowering the transform overhead in the code stream while improving the compression efficiency of the current block.
The encoding method of the embodiments of the present application is further described below with reference to FIG. 15 and FIG. 16, taking the inter secondary transform as an example.

FIG. 15 is a schematic flowchart of the video encoding method provided by an embodiment of the present application, and FIG. 16 is a schematic diagram of the video encoding process involved. This embodiment applies to the video encoder shown in FIG. 1 and FIG. 2. As shown in FIG. 15 and FIG. 16, the method includes:

S701: Perform inter prediction on the current block to obtain its prediction block.

The implementation of S701 is the same as S601 above; refer to the description of S601, which is not repeated here.

S702: Obtain the residual block of the current block according to the prediction block and the current block.

For example, the pixel values of the prediction block are subtracted from those of the current block to obtain the residual block of the current block.

S703: Perform basic transform on the residual block to obtain the basic transform coefficients of the current block.

The implementation of S703 is the same as S604 above; refer to the description of S604, which is not repeated here.

S704: Determine the gradients of N points in the prediction block.

For details, refer to the description of S602-A11 above, which is not repeated here.

S705: Judge whether the gradients of the N points meet the preset conditions.

In practical applications, only when there is an obvious directional texture in the prediction block will there be an obvious directional texture in the residual block; if not, most likely there will be none in the residual block either. Based on this, to reduce unnecessary computation of the gradient information of the prediction block, after the gradients of the N points are determined, it is first judged whether they meet the preset conditions. If they do, S706 is executed to compute the gradient information of the prediction block. If they do not, the gradient information is not computed; instead, S708, S709, S711, and S712 are executed, or S710, S711, and S712 are executed.

In some embodiments, the preset conditions include at least one of the following:

Condition 1: at least one of the sum of the gradients in the horizontal direction and the sum of the gradients in the vertical direction is greater than or equal to a first preset value;

Condition 2: at least one of the horizontal gradient and the vertical gradient of each of at least M points is greater than or equal to a second preset value, where M is a positive integer less than or equal to N.

Specifically, if the gradients of the N points meet at least one of Conditions 1 and 2 — for example, if at least one of the sum of the horizontal gradients and the sum of the vertical gradients of the N points is greater than or equal to the first preset value, and/or at least one of the horizontal and vertical gradients of each of at least M of the N points is greater than or equal to the second preset value — the prediction block has an obvious texture, and so does the residual block. In this case, the encoding side executes the steps S706, S707, S709, S711, and S712 below: it computes the gradient information of the prediction block, determines the transform kernel corresponding to the current block according to it, performs secondary transform on the basic transform coefficients according to the kernel to obtain the target transform coefficients of the current block, quantizes the target transform coefficients to obtain the quantized coefficients, and finally encodes the quantized coefficients to obtain the code stream.

If the sum of the horizontal gradients and the sum of the vertical gradients of the N points are both less than the first preset value, and/or the horizontal and vertical gradients of at least M of the N points are all less than the second preset value, the prediction block has no obvious texture, and neither does the residual block. In this case, the encoding side encodes in at least the following two ways:

Way 1: the encoding side executes S708, S709, S711, and S712; that is, it determines the transform kernel corresponding to the current block to be a first transform kernel, which is the kernel among the multiple preset kernels corresponding to the weakest directional texture, for example kernel set 0. Then, according to the first transform kernel, it secondary-transforms the basic transform coefficients to obtain the target transform coefficients of the current block, quantizes them to obtain the quantized coefficients, and finally encodes the quantized coefficients to obtain the code stream.

Way 2: the encoding side executes S710, S711, and S712; that is, it skips the secondary transform operation on the basic transform coefficients, directly takes the basic transform coefficients as the target transform coefficients of the current block, quantizes them to obtain the quantized coefficients, and finally encodes the quantized coefficients to obtain the code stream.

S706: Determine the gradient information of the prediction block according to the gradients of the N points.

S707: Determine, according to the gradient information of the prediction block, the transform kernel corresponding to the current block.

For the implementation of S706 and S707, refer to the implementation of S602-A12 and S602-B1 above, which is not repeated here.

S708: Determine the transform kernel corresponding to the current block to be the first transform kernel, which is the kernel among the multiple preset kernels corresponding to the weakest directional texture.

For example, the first transform kernel is kernel set 0 in VVC.

S709: Perform secondary transform on the basic transform coefficients according to the transform kernel to obtain the target transform coefficients of the current block.

For the implementation of S709, refer to the implementation of S604-A2 above, which is not repeated here.

S710: Skip the secondary transform operation on the basic transform coefficients, and take the basic transform coefficients as the target transform coefficients.

S711: Quantize the target transform coefficients of the current block to obtain the quantized coefficients.

S712: Encode the quantized coefficients to obtain the code stream.

It should be noted that the above describes the encoding method of the present application taking the inter secondary transform as an example; for the intra secondary transform, encoding can also be performed in the above manner, with specific reference to the above steps, which are not repeated here.

With the video encoding method provided by the embodiments of the present application, inter prediction is performed on the current block to obtain its prediction block; the residual block of the current block is obtained from the prediction block and the current block; basic transform is performed on the residual block to obtain the basic transform coefficients of the current block; the gradients of N points in the prediction block are determined; and whether they meet the preset conditions is judged. If they meet the preset conditions, the gradient information of the prediction block is determined from the gradients of the N points, the transform kernel corresponding to the current block is determined from the gradient information, the basic transform coefficients are secondary-transformed according to the kernel to obtain the target transform coefficients of the current block, and finally the target transform coefficients are quantized to obtain the quantized coefficients, which are encoded to obtain the code stream. If the gradients of the N points do not meet the preset conditions, the transform kernel corresponding to the current block is determined to be the first transform kernel — the one among the multiple preset kernels corresponding to the weakest directional texture — and is used to secondary-transform the basic transform coefficients to obtain the target transform coefficients, which are then quantized and encoded to obtain the code stream. Or, if the gradients of the N points do not meet the preset conditions, the secondary transform operation on the basic transform coefficients is skipped, the basic transform coefficients are taken as the target transform coefficients, and these are quantized and encoded to obtain the code stream. That is, the present application judges the texture strength of the prediction block of the current block from the gradients of the N points. If the texture of the prediction block is strong, the residual block has an obvious texture; in this case, the transform kernel of the current block is determined from the prediction block and the basic transform coefficients are secondary-transformed with it, improving the compression efficiency of the image. If the texture of the prediction block is weak, the residual block has no obvious texture; in this case, the secondary transform is skipped, or the current block is assigned a kernel without an obvious directional texture, preventing over-transforming the current block, thereby improving the transform accuracy of the current block and the encoding efficiency.
It should be understood that FIG. 9 to FIG. 16 are only examples of the present application and should not be construed as limiting it.

The preferred implementations of the present application have been described in detail above with reference to the accompanying drawings; however, the present application is not limited to the specific details of the above implementations. Within the scope of the technical concept of the present application, various simple variations can be made to the technical solutions of the present application, and these simple variations all fall within the protection scope of the present application. For example, the specific technical features described in the above specific implementations can, where not contradictory, be combined in any suitable manner; to avoid unnecessary repetition, the present application does not further describe the various possible combinations. As another example, the various implementations of the present application can also be combined arbitrarily, and as long as they do not violate the idea of the present application, they should likewise be regarded as content disclosed by the present application.

It should also be understood that in the various method embodiments of the present application, the magnitudes of the sequence numbers of the above processes do not imply an order of execution; the execution order of each process should be determined by its function and internal logic and should not constitute any limitation on the implementation process of the embodiments of the present application. In addition, in the embodiments of the present application, the term "and/or" merely describes an association relationship between associated objects, indicating that three relationships may exist: specifically, A and/or B may mean A alone, both A and B, or B alone. The character "/" herein generally indicates an "or" relationship between the associated objects before and after it.

The method embodiments of the present application have been described in detail above with reference to FIG. 9 to FIG. 16; the apparatus embodiments of the present application, described with reference to FIG. 17 to FIG. 20, are set out in the apparatus description given earlier in this document.
图17是本申请一实施例提供的视频解码器的示意性框图。
如图17所示,视频解码器10包括:
解码单元11,用于解码码流,确定当前块的目标变换系数;
预测单元12,用于对当前块进行预测,得到所述当前块的预测块;
确定单元13,用于根据所述预测块,确定所述当前块对应的变换核;
反变换单元14,用于根据所述变换核,对所述目标变换系数进行反变换,根据反变换的变换结果得到所述当前快的残差块。
在一些实施例中,预测单元12,用于对所述当前块进行帧间预测,得到所述当前块的预测块;
在一些实施例中,反变换单元14,用于根据所述变换核,对所述目标变换系数进行反二次变换,得到所述当前块 的基础变换系数;对所述基础变换系数进行反基础变换,得到所述当前块的残差块。
在一些实施例中,确定单元13,具体用于确定所述预测块的纹理信息;根据所述预测块的纹理信息,确定所述当前块对应的变换核。
在一些实施例中,所述预测块的纹理信息包括所述预测块的梯度信息,确定单元13,具体用于确定所述预测块的梯度信息;根据所述预测块的梯度信息,确定所述当前块对应的变换核。
在一些实施例中,确定单元13,具体用于确定所述预测块中N个点的梯度,所述N为正整数;根据所述N个点的梯度,确定所述预测块的梯度信息。
可选的,所述N个点的梯度包括水平梯度和/或竖直梯度。
在一些实施例中,若所述N个点的梯度包括水平梯度和竖直梯度,确定单元13,具体用于对于所述N个点中的第i个点,根据所述第i个点在所述预测块的水平方向上的邻近点的像素值,确定所述第i个点的水平梯度,所述i为小于或等于N的正整数;根据所述第i个点在所述预测块的竖直方向上的邻近点的像素值,确定所述第i个点的竖直梯度。
在一些实施例中,确定单元13,具体用于若所述第i个点在所述预测块的水平方向上的邻近点包括所述第i个点在所述预测块的水平方向上的左相邻点和右相邻点,则将所述右相邻点的像素值与所述左相邻点的像素值的差值与2的比值,确定为所述第i个点的水平梯度。
在一些实施例中,确定单元13,具体用于若所述第i个点在所述预测块的竖直方向上的邻近点包括所述第i个点在所述预测块的竖直方向上的上相邻点和下相邻点,则将所述下相邻点的像素值与所述上相邻点的像素值的差值与2的比值,确定为所述第i个点的竖直梯度。
在一些实施例中,确定单元13,具体用于确定所述N个点的水平梯度之和;确定所述N个点的竖直梯度之和;根据所述N个点的水平梯度之和与竖直梯度之和,确定所述预测块的梯度信息。
在一些实施例中,确定单元13,具体用于将所述N个点的水平梯度之和与竖直梯度之和的比值,确定所述预测块的梯度信息。
在一些实施例中,确定单元13,具体用于若所述N个点的竖直梯度之和等于0,则确定所述预测块的梯度信息为0。
In some embodiments, the determining unit 13 is specifically configured to determine whether the gradients of the N points satisfy a preset condition, and, if the gradients of the N points satisfy the preset condition, to determine the gradient information of the prediction block according to the gradients of the N points.
In some embodiments, the preset condition includes at least one of the following conditions:
Condition 1: at least one of the sum of the gradients in the horizontal direction and the sum of the gradients in the vertical direction is greater than or equal to a first preset value;
Condition 2: for at least M points, at least one of the gradient in the horizontal direction and the gradient in the vertical direction is greater than or equal to a second preset value, M being a positive integer less than or equal to N.
In some embodiments, if it is determined that the gradients of the N points do not satisfy the preset condition, the inverse transform unit 14 is configured to skip the inverse secondary transform of the current block; or to determine that the transform kernel corresponding to the current block is the first transform kernel, the first transform kernel being, among the multiple preset transform kernels, the one corresponding to the weakest directional texture.
Optionally, the N points are the pixels of the prediction block other than the outermost layer of pixels; or, the N points are pixels obtained by sampling the pixels of the prediction block.
In some embodiments, the determining unit 13 is specifically configured to obtain a preset correspondence between prediction-block gradients and transform kernels, determine, in the correspondence, the target transform kernel corresponding to the gradient information of the prediction block, and determine the target transform kernel as the transform kernel corresponding to the current block.
In some embodiments, the determining unit 13 is specifically configured to determine, according to the size of the current block, the transform kernel size corresponding to the current block; obtain, according to the transform kernel size corresponding to the current block, the preset correspondence between prediction-block gradients and transform kernels for that transform kernel size; and determine, in the correspondence for that transform kernel size, the target transform kernel corresponding to the gradient information of the prediction block.
In some embodiments, the determining unit 13 is specifically configured to determine that the transform kernel size corresponding to the current block is 4x4 if at least one of the width and the height of the current block is less than 16, and to determine that the transform kernel size corresponding to the current block is 8x8 if both the width and the height of the current block are greater than 8.
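For illustration only, the following sketch combines the size rule quoted above with a table lookup. The actual gradient-to-kernel correspondences are not given in this section, so the tables below are placeholders keyed on coarse gradient-information intervals; note also that the two size clauses overlap (for example, a 12x12 block satisfies both), and giving the 4x4 clause precedence here is an interpretation.

    def kernel_size(width, height):
        # Literal transcription of the quoted rule, with the 4x4 clause taking precedence.
        if width < 16 or height < 16:
            return (4, 4)
        return (8, 8)   # here width and height are both >= 16, hence both > 8

    def lookup_kernel(grad_info, size, tables):
        # Walk the per-size correspondence and return the kernel id whose
        # gradient-information interval contains |grad_info|.
        for (lo, hi), kernel_id in tables[size]:
            if lo <= abs(grad_info) < hi:
                return kernel_id
        return 0   # fallback: kernel group 0

    # Placeholder correspondences; the real tables are defined elsewhere in the disclosure.
    tables = {
        (4, 4): [((0.0, 0.5), 1), ((0.5, 2.0), 2), ((2.0, float("inf")), 3)],
        (8, 8): [((0.0, 0.5), 1), ((0.5, 2.0), 2), ((2.0, float("inf")), 3)],
    }

    size = kernel_size(10, 24)               # at least one side < 16, so a 4x4 kernel
    print(lookup_kernel(1.2, size, tables))  # -> 2, from the 4x4 table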
In some embodiments, the determining unit 13 is specifically configured to input the prediction block into a pre-trained model to obtain transform kernel indication information, output by the model, corresponding to the current block, the transform kernel indication information being used to indicate the transform kernel corresponding to the current block; and to determine, according to the transform kernel indication information, the transform kernel corresponding to the current block.
In some embodiments, the determining unit 13 is specifically configured to downsample the prediction block, and to input the downsampled prediction block into the pre-trained model to obtain the transform kernel indication information, output by the model, corresponding to the current block.
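For illustration only, the following sketch shows this model-based variant: the prediction block is downsampled and fed to a pre-trained model whose output indicates the transform kernel. The 2x2 average-pooling downsampler and the tiny linear "model" are assumptions; the disclosure does not fix a model architecture or a downsampling method here.

    import numpy as np

    def downsample_2x2(pred):
        # Assumed downsampler: 2x2 average pooling.
        p = pred.astype(np.float64)
        return (p[0::2, 0::2] + p[0::2, 1::2] + p[1::2, 0::2] + p[1::2, 1::2]) / 4.0

    class KernelIndicationModel:
        """Stand-in for a pre-trained model; a real deployment would load trained weights."""
        def __init__(self, n_kernels, in_dim, seed=0):
            rng = np.random.default_rng(seed)
            self.W = rng.normal(size=(n_kernels, in_dim))
        def __call__(self, x):
            logits = self.W @ x.reshape(-1)
            return int(np.argmax(logits))   # kernel indication information: a kernel index

    pred = np.random.default_rng(2).integers(0, 256, (8, 8))  # prediction block
    x = downsample_2x2(pred)                                  # 4x4 model input
    model = KernelIndicationModel(n_kernels=4, in_dim=16)
    kernel_id = model(x)   # used to select the secondary-transform kernel for the block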
In some embodiments, the decoding unit 11 is further configured to decode the bitstream to obtain the quantized coefficients of the current block, and to dequantize the quantized coefficients to obtain the target transform coefficients of the current block.
It should be understood that the apparatus embodiments correspond to the method embodiments, and similar descriptions may refer to the method embodiments; to avoid repetition, they are not repeated here. Specifically, the video decoder 10 shown in Fig. 17 can perform the decoding method of the embodiments of the present application, and the foregoing and other operations and/or functions of the units in the video decoder 10 respectively implement the corresponding flows in the decoding method and the other methods described above; for brevity, they are not repeated here.
Fig. 18 is a schematic block diagram of a video encoder provided by an embodiment of the present application.
As shown in Fig. 18, the video encoder 20 may include:
a prediction unit 21, configured to predict the current block to obtain a prediction block of the current block;
a determining unit 22, configured to determine, according to the prediction block, the transform kernel corresponding to the current block;
a residual unit 23, configured to obtain the residual block of the current block from the prediction block and the current block; and
a transform unit 24, configured to transform the residual block according to the transform kernel, and encode the transformed coefficients to obtain the bitstream.
In some embodiments, the prediction unit 21 is configured to perform inter prediction on the current block to obtain the prediction block of the current block.
In some embodiments, the transform unit 24 is specifically configured to perform a base transform on the residual block to obtain the base transform coefficients of the current block; perform a secondary transform on the base transform coefficients according to the transform kernel, to obtain the target transform coefficients of the current block; and encode the target transform coefficients to obtain the bitstream.
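For illustration only, the following sketch shows this forward order: a base transform of the residual block, then a secondary transform of the low-frequency region with the selected kernel. The orthonormal DCT-II basis and the identity kernel are again placeholders, not details from the disclosure.

    import numpy as np

    def dct_matrix(n):
        k = np.arange(n)
        C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
        C[0, :] *= 1.0 / np.sqrt(2.0)
        return C * np.sqrt(2.0 / n)

    def base_transform(residual):
        # 2-D forward DCT-II for a square block: Y = C X C^T.
        C = dct_matrix(residual.shape[0])
        return C @ residual @ C.T

    def secondary_transform(base, kernel):
        out = base.copy()
        out[:4, :4] = (kernel @ out[:4, :4].reshape(16)).reshape(4, 4)
        return out

    residual = np.random.default_rng(3).normal(size=(8, 8))
    target = secondary_transform(base_transform(residual), np.eye(16))
    # `target` is then quantized and entropy-coded into the bitstream.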
In some embodiments, the determining unit 22 is specifically configured to determine texture information of the prediction block, and to determine, according to the texture information of the prediction block, the transform kernel corresponding to the current block.
In some embodiments, the texture information of the prediction block includes gradient information of the prediction block, and the determining unit 22 is specifically configured to determine the gradient information of the prediction block, and to determine, according to the gradient information of the prediction block, the transform kernel corresponding to the current block.
In some embodiments, the determining unit 22 is specifically configured to determine the gradients of N points in the prediction block, N being a positive integer, and to determine the gradient information of the prediction block according to the gradients of the N points.
Optionally, the gradients of the N points include horizontal gradients and/or vertical gradients.
In some embodiments, if the gradients of the N points include horizontal gradients and vertical gradients, the determining unit 22 is specifically configured to, for the i-th point of the N points, determine the horizontal gradient of the i-th point according to the pixel values of the neighboring points of the i-th point in the horizontal direction of the prediction block, i being a positive integer less than or equal to N, and to determine the vertical gradient of the i-th point according to the pixel values of the neighboring points of the i-th point in the vertical direction of the prediction block.
In some embodiments, the determining unit 22 is specifically configured to, if the neighboring points of the i-th point in the horizontal direction of the prediction block include the left neighboring point and the right neighboring point of the i-th point in the horizontal direction of the prediction block, determine the difference between the pixel value of the right neighboring point and the pixel value of the left neighboring point, divided by 2, as the horizontal gradient of the i-th point.
In some embodiments, the determining unit 22 is specifically configured to, if the neighboring points of the i-th point in the vertical direction of the prediction block include the upper neighboring point and the lower neighboring point of the i-th point in the vertical direction of the prediction block, determine the difference between the pixel value of the lower neighboring point and the pixel value of the upper neighboring point, divided by 2, as the vertical gradient of the i-th point.
In some embodiments, the determining unit 22 is specifically configured to determine the sum of the horizontal gradients of the N points, determine the sum of the vertical gradients of the N points, and determine the gradient information of the prediction block according to the sum of the horizontal gradients and the sum of the vertical gradients of the N points.
In some embodiments, the determining unit 22 is specifically configured to determine the ratio of the sum of the horizontal gradients of the N points to the sum of the vertical gradients as the gradient information of the prediction block.
In some embodiments, the determining unit 22 is specifically configured to determine that the gradient information of the prediction block is 0 if the sum of the vertical gradients of the N points equals 0.
In some embodiments, the determining unit 22 is specifically configured to determine whether the gradients of the N points satisfy a preset condition, and, if the gradients of the N points satisfy the preset condition, to determine the gradient information of the prediction block according to the gradients of the N points.
In some embodiments, the preset condition includes at least one of the following conditions:
Condition 1: at least one of the sum of the gradients in the horizontal direction and the sum of the gradients in the vertical direction is greater than or equal to a first preset value;
Condition 2: for at least M points, at least one of the gradient in the horizontal direction and the gradient in the vertical direction is greater than or equal to a second preset value, M being a positive integer less than or equal to N.
In some embodiments, if it is determined that the gradients of the N points do not satisfy the preset condition, the transform unit 24 is configured to skip the secondary transform of the base transform coefficients; or to determine that the transform kernel corresponding to the current block is the first transform kernel, the first transform kernel being, among the multiple preset transform kernels of the secondary transform, the one corresponding to the weakest directional texture.
Optionally, the N points are the pixels of the prediction block other than the outermost layer of pixels; or, the N points are pixels obtained by sampling the pixels of the prediction block.
In some embodiments, the determining unit 22 is specifically configured to obtain a preset correspondence between prediction-block gradients and transform kernels, determine, in the correspondence, the target transform kernel corresponding to the gradient information of the prediction block, and determine the target transform kernel as the transform kernel corresponding to the current block.
In some embodiments, the determining unit 22 is specifically configured to determine, according to the size of the current block, the transform kernel size corresponding to the current block; obtain, according to the transform kernel size corresponding to the current block, the preset correspondence between prediction-block gradients and transform kernels for that transform kernel size; and determine, in the correspondence for that transform kernel size, the target transform kernel corresponding to the gradient information of the prediction block.
In some embodiments, the determining unit 22 is specifically configured to determine that the transform kernel size corresponding to the current block is 4x4 if at least one of the width and the height of the current block is less than 16, and to determine that the transform kernel size corresponding to the current block is 8x8 if both the width and the height of the current block are greater than 8.
In some embodiments, the determining unit 22 is specifically configured to input the prediction block into a pre-trained model to obtain transform kernel indication information, output by the model, corresponding to the current block, the transform kernel indication information being used to indicate the transform kernel corresponding to the current block; and to determine, according to the transform kernel indication information, the transform kernel corresponding to the current block.
In some embodiments, the determining unit 22 is specifically configured to downsample the prediction block, and to input the downsampled prediction block into the pre-trained model to obtain the transform kernel indication information, output by the model, corresponding to the current block.
In some embodiments, the transform unit 24 is specifically configured to quantize the target transform coefficients to obtain the quantized coefficients of the current block, and to encode the quantized coefficients to obtain the bitstream.
It should be understood that the apparatus embodiments correspond to the method embodiments, and similar descriptions may refer to the method embodiments; to avoid repetition, they are not repeated here. Specifically, the video encoder 20 shown in Fig. 18 may correspond to the entity performing the encoding method of the embodiments of the present application, and the foregoing and other operations and/or functions of the units in the video encoder 20 respectively implement the corresponding flows in the encoding method and the other methods; for brevity, they are not repeated here.
The apparatus and system of the embodiments of the present application have been described above from the perspective of functional units with reference to the accompanying drawings. It should be understood that the functional units may be implemented in hardware, by instructions in software, or by a combination of hardware and software units. Specifically, the steps of the method embodiments of the present application may be completed by integrated logic circuits of hardware in a processor and/or by instructions in software; the steps of the methods disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software units in a decoding processor. Optionally, the software unit may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method embodiments in combination with its hardware.
Fig. 19 is a schematic block diagram of an electronic device provided by an embodiment of the present application.
As shown in Fig. 19, the electronic device 30 may be the video encoder or the video decoder described in the embodiments of the present application, and the electronic device 30 may include:
a memory 33 and a processor 32, the memory 33 being configured to store a computer program 34 and transmit the program code 34 to the processor 32. In other words, the processor 32 can call and run the computer program 34 from the memory 33 to implement the methods in the embodiments of the present application.
For example, the processor 32 may be configured to perform the steps of the above method 200 according to the instructions in the computer program 34.
In some embodiments of the present application, the processor 32 may include, but is not limited to:
a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and the like.
In some embodiments of the present application, the memory 33 includes, but is not limited to:
volatile memory and/or non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synch-link DRAM (SLDRAM), and direct Rambus RAM (DR RAM).
In some embodiments of the present application, the computer program 34 may be divided into one or more units, which are stored in the memory 33 and executed by the processor 32 to complete the methods provided by the present application. The one or more units may be a series of computer program instruction segments capable of accomplishing specific functions, the instruction segments describing the execution process of the computer program 34 in the electronic device 30.
As shown in Fig. 19, the electronic device 30 may further include:
a transceiver 33, which may be connected to the processor 32 or the memory 33.
The processor 32 may control the transceiver 33 to communicate with other devices; specifically, it may send information or data to other devices, or receive information or data sent by other devices. The transceiver 33 may include a transmitter and a receiver, and may further include one or more antennas.
It should be understood that the components of the electronic device 30 are connected by a bus system, where the bus system includes, in addition to a data bus, a power bus, a control bus, and a status signal bus.
Fig. 20 is a schematic block diagram of a video encoding/decoding system provided by an embodiment of the present application.
As shown in Fig. 20, the video encoding/decoding system 40 may include a video encoder 41 and a video decoder 42, where the video encoder 41 is configured to perform the video encoding method involved in the embodiments of the present application, and the video decoder 42 is configured to perform the video decoding method involved in the embodiments of the present application.
The present application further provides a computer storage medium on which a computer program is stored; when executed by a computer, the computer program enables the computer to perform the methods of the above method embodiments. In other words, the embodiments of the present application further provide a computer program product containing instructions; when executed by a computer, the instructions cause the computer to perform the methods of the above method embodiments.
When implemented in software, the above may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wirelessly (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or a data center integrating one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, a solid state disk (SSD)), and the like.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the specific application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of units is merely a division by logical function, and in actual implementation there may be other divisions; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. Furthermore, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment. For example, the functional units in the embodiments of the present application may be integrated into one processing unit, or the units may exist physically separately, or two or more units may be integrated into one unit.
The above is merely the specific implementation of the present application, but the protection scope of the present application is not limited thereto. Any person skilled in the art can easily conceive of changes or substitutions within the technical scope disclosed in the present application, and they shall all be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (52)

  1. A video decoding method, comprising:
    decoding a bitstream to determine target transform coefficients of a current block;
    predicting the current block to obtain a prediction block of the current block;
    determining, according to the prediction block, a transform kernel corresponding to the current block; and
    inverse-transforming the target transform coefficients according to the transform kernel, and obtaining a residual block of the current block from a result of the inverse transform.
  2. The method according to claim 1, wherein the predicting the current block to obtain a prediction block of the current block comprises:
    performing inter prediction on the current block to obtain the prediction block of the current block.
  3. The method according to claim 1 or 2, wherein the inverse-transforming the target transform coefficients according to the transform kernel and obtaining a residual block of the current block from a result of the inverse transform comprises:
    performing an inverse secondary transform on the target transform coefficients according to the transform kernel, to obtain base transform coefficients of the current block; and
    performing an inverse base transform on the base transform coefficients, to obtain the residual block of the current block.
  4. The method according to claim 1 or 2, wherein the determining, according to the prediction block, a transform kernel corresponding to the current block comprises:
    determining texture information of the prediction block; and
    determining, according to the texture information of the prediction block, the transform kernel corresponding to the current block.
  5. The method according to claim 4, wherein the texture information of the prediction block comprises gradient information of the prediction block, and the determining texture information of the prediction block comprises:
    determining the gradient information of the prediction block;
    and the determining, according to the texture information of the prediction block, the transform kernel corresponding to the current block comprises:
    determining, according to the gradient information of the prediction block, the transform kernel corresponding to the current block.
  6. The method according to claim 5, wherein the determining the gradient information of the prediction block comprises:
    determining gradients of N points in the prediction block, N being a positive integer; and
    determining the gradient information of the prediction block according to the gradients of the N points.
  7. The method according to claim 6, wherein the gradients of the N points comprise horizontal gradients and/or vertical gradients.
  8. The method according to claim 7, wherein, if the gradients of the N points comprise horizontal gradients and vertical gradients, determining the gradients of the N points in the prediction block comprises:
    for an i-th point of the N points, determining a horizontal gradient of the i-th point according to pixel values of neighboring points of the i-th point in a horizontal direction of the prediction block, i being a positive integer less than or equal to N; and
    determining a vertical gradient of the i-th point according to pixel values of neighboring points of the i-th point in a vertical direction of the prediction block.
  9. The method according to claim 8, wherein the determining a horizontal gradient of the i-th point according to pixel values of neighboring points of the i-th point in a horizontal direction of the prediction block comprises:
    if the neighboring points of the i-th point in the horizontal direction of the prediction block include a left neighboring point and a right neighboring point of the i-th point in the horizontal direction of the prediction block, determining the difference between the pixel value of the right neighboring point and the pixel value of the left neighboring point, divided by 2, as the horizontal gradient of the i-th point.
  10. The method according to claim 8, wherein the determining a vertical gradient of the i-th point according to pixel values of neighboring points of the i-th point in a vertical direction of the prediction block comprises:
    if the neighboring points of the i-th point in the vertical direction of the prediction block include an upper neighboring point and a lower neighboring point of the i-th point in the vertical direction of the prediction block, determining the difference between the pixel value of the lower neighboring point and the pixel value of the upper neighboring point, divided by 2, as the vertical gradient of the i-th point.
  11. The method according to claim 8, wherein the determining the gradient information of the prediction block according to the gradients of the N points comprises:
    determining a sum of the horizontal gradients of the N points;
    determining a sum of the vertical gradients of the N points; and
    determining the gradient information of the prediction block according to the sum of the horizontal gradients and the sum of the vertical gradients of the N points.
  12. The method according to claim 11, wherein the determining the gradient information of the prediction block according to the sum of the horizontal gradients and the sum of the vertical gradients of the N points comprises:
    determining the ratio of the sum of the horizontal gradients of the N points to the sum of the vertical gradients as the gradient information of the prediction block.
  13. The method according to claim 11, wherein the method further comprises:
    if the sum of the vertical gradients of the N points equals 0, determining that the gradient information of the prediction block is 0.
  14. The method according to any one of claims 7-13, wherein the determining the gradient information of the prediction block according to the gradients of the N points comprises:
    determining whether the gradients of the N points satisfy a preset condition; and
    if the gradients of the N points satisfy the preset condition, determining the gradient information of the prediction block according to the gradients of the N points.
  15. The method according to claim 14, wherein the preset condition comprises at least one of the following conditions:
    condition 1: at least one of the sum of the gradients in the horizontal direction and the sum of the gradients in the vertical direction is greater than or equal to a first preset value;
    condition 2: for at least M points, at least one of the gradient in the horizontal direction and the gradient in the vertical direction is greater than or equal to a second preset value, M being a positive integer less than or equal to N.
  16. The method according to claim 14, wherein, if it is determined that the gradients of the N points do not satisfy the preset condition, the method further comprises:
    skipping the inverse secondary transform of the current block; or,
    determining that the transform kernel corresponding to the current block is a first transform kernel, the first transform kernel being, among multiple preset transform kernels, the transform kernel corresponding to the weakest directional texture.
  17. The method according to any one of claims 6-13, wherein the N points are pixels of the prediction block other than the outermost layer of pixels; or, the N points are pixels obtained by sampling the pixels of the prediction block.
  18. The method according to any one of claims 5-13, wherein the determining, according to the gradient information of the prediction block, the transform kernel corresponding to the current block comprises:
    obtaining a preset correspondence between prediction-block gradients and transform kernels;
    determining, in the correspondence, a target transform kernel corresponding to the gradient information of the prediction block; and
    determining the target transform kernel as the transform kernel corresponding to the current block.
  19. The method according to claim 18, wherein the obtaining a preset correspondence between prediction-block gradients and transform kernels comprises:
    determining, according to the size of the current block, a transform kernel size corresponding to the current block;
    obtaining, according to the transform kernel size corresponding to the current block, the preset correspondence between prediction-block gradients and transform kernels for that transform kernel size;
    and the determining, in the correspondence, a target transform kernel corresponding to the gradient information of the prediction block comprises:
    determining, in the correspondence for that transform kernel size, the target transform kernel corresponding to the gradient information of the prediction block.
  20. The method according to claim 19, wherein the determining, according to the size of the current block, a transform kernel size corresponding to the current block comprises:
    if at least one of the width and the height of the current block is less than 16, determining that the transform kernel size corresponding to the current block is 4x4;
    if both the width and the height of the current block are greater than 8, determining that the transform kernel size corresponding to the current block is 8x8.
  21. The method according to claim 1 or 2, wherein the determining, according to the prediction block, a transform kernel corresponding to the current block comprises:
    inputting the prediction block into a pre-trained model to obtain transform kernel indication information, output by the model, corresponding to the current block, the transform kernel indication information being used to indicate the transform kernel of the secondary transform corresponding to the current block; and
    determining, according to the transform kernel indication information, the transform kernel corresponding to the current block.
  22. The method according to claim 21, wherein the inputting the prediction block into a pre-trained model to obtain transform kernel indication information, output by the model, corresponding to the current block comprises:
    downsampling the prediction block; and
    inputting the downsampled prediction block into the pre-trained model to obtain the transform kernel indication information, output by the model, corresponding to the current block.
  23. The method according to claim 1 or 2, wherein the decoding a bitstream to obtain target transform coefficients of the current block comprises:
    decoding the bitstream to obtain quantized coefficients of the current block; and
    dequantizing the quantized coefficients to obtain the target transform coefficients of the current block.
  24. A video encoding method, comprising:
    predicting a current block to obtain a prediction block of the current block;
    determining, according to the prediction block, a transform kernel corresponding to the current block;
    obtaining a residual block of the current block from the prediction block and the current block; and
    transforming the residual block according to the transform kernel, and encoding the transformed coefficients to obtain a bitstream.
  25. The method according to claim 24, wherein the predicting a current block to obtain a prediction block of the current block comprises:
    performing inter prediction on the current block to obtain the prediction block of the current block.
  26. The method according to claim 24 or 25, wherein the transforming the residual block according to the transform kernel and encoding the transformed coefficients to obtain a bitstream comprises:
    performing a base transform on the residual block to obtain base transform coefficients of the current block;
    performing a secondary transform on the base transform coefficients according to the transform kernel, to obtain target transform coefficients of the current block; and
    encoding the target transform coefficients to obtain the bitstream.
  27. The method according to claim 24 or 25, wherein the determining, according to the prediction block, a transform kernel corresponding to the current block comprises:
    determining texture information of the prediction block; and
    determining, according to the texture information of the prediction block, the transform kernel corresponding to the current block.
  28. The method according to claim 27, wherein the texture information of the prediction block comprises gradient information of the prediction block, and the determining texture information of the prediction block comprises:
    determining the gradient information of the prediction block;
    and the determining, according to the texture information of the prediction block, the transform kernel corresponding to the current block comprises:
    determining, according to the gradient information of the prediction block, the transform kernel corresponding to the current block.
  29. The method according to claim 28, wherein the determining the gradient information of the prediction block comprises:
    determining gradients of N points in the prediction block, N being a positive integer; and
    determining the gradient information of the prediction block according to the gradients of the N points.
  30. The method according to claim 29, wherein the gradients of the N points comprise horizontal gradients and/or vertical gradients.
  31. The method according to claim 30, wherein, if the gradients of the N points comprise horizontal gradients and vertical gradients, determining the gradients of the N points in the prediction block comprises:
    for an i-th point of the N points, determining a horizontal gradient of the i-th point according to pixel values of neighboring points of the i-th point in a horizontal direction of the prediction block, i being a positive integer less than or equal to N; and
    determining a vertical gradient of the i-th point according to pixel values of neighboring points of the i-th point in a vertical direction of the prediction block.
  32. The method according to claim 31, wherein the determining a horizontal gradient of the i-th point according to pixel values of neighboring points of the i-th point in a horizontal direction of the prediction block comprises:
    if the neighboring points of the i-th point in the horizontal direction of the prediction block include a left neighboring point and a right neighboring point of the i-th point in the horizontal direction of the prediction block, determining the difference between the pixel value of the right neighboring point and the pixel value of the left neighboring point, divided by 2, as the horizontal gradient of the i-th point.
  33. The method according to claim 31, wherein the determining a vertical gradient of the i-th point according to pixel values of neighboring points of the i-th point in a vertical direction of the prediction block comprises:
    if the neighboring points of the i-th point in the vertical direction of the prediction block include an upper neighboring point and a lower neighboring point of the i-th point in the vertical direction of the prediction block, determining the difference between the pixel value of the lower neighboring point and the pixel value of the upper neighboring point, divided by 2, as the vertical gradient of the i-th point.
  34. The method according to claim 31, wherein the determining the gradient information of the prediction block according to the gradients of the N points comprises:
    determining a sum of the horizontal gradients of the N points;
    determining a sum of the vertical gradients of the N points; and
    determining the gradient information of the prediction block according to the sum of the horizontal gradients and the sum of the vertical gradients of the N points.
  35. The method according to claim 34, wherein the determining the gradient information of the prediction block according to the sum of the horizontal gradients and the sum of the vertical gradients of the N points comprises:
    determining the ratio of the sum of the horizontal gradients of the N points to the sum of the vertical gradients as the gradient information of the prediction block.
  36. The method according to claim 34, wherein the method further comprises:
    if the sum of the vertical gradients of the N points equals 0, determining that the gradient information of the prediction block is 0.
  37. The method according to any one of claims 30-36, wherein the determining the gradient information of the prediction block according to the gradients of the N points comprises:
    determining whether the gradients of the N points satisfy a preset condition; and
    if the gradients of the N points satisfy the preset condition, determining the gradient information of the prediction block according to the gradients of the N points.
  38. The method according to claim 37, wherein the preset condition comprises at least one of the following conditions:
    condition 1: at least one of the sum of the gradients in the horizontal direction and the sum of the gradients in the vertical direction is greater than or equal to a first preset value;
    condition 2: for at least M points, at least one of the gradient in the horizontal direction and the gradient in the vertical direction is greater than or equal to a second preset value, M being a positive integer less than or equal to N.
  39. The method according to claim 38, wherein, if it is determined that the gradients of the N points do not satisfy the preset condition, the method further comprises:
    skipping the secondary transform of the base transform coefficients; or,
    determining that the transform kernel corresponding to the current block is a first transform kernel, the first transform kernel being, among multiple preset transform kernels, the transform kernel corresponding to the weakest directional texture.
  40. The method according to any one of claims 29-36, wherein the N points are pixels of the prediction block other than the outermost layer of pixels; or, the N points are pixels obtained by sampling the pixels of the prediction block.
  41. The method according to any one of claims 28-36, wherein the determining, according to the gradient information of the prediction block, the transform kernel corresponding to the current block comprises:
    obtaining a preset correspondence between prediction-block gradients and transform kernels;
    determining, in the correspondence, a target transform kernel corresponding to the gradient information of the prediction block; and
    determining the target transform kernel as the transform kernel corresponding to the current block.
  42. The method according to claim 41, wherein the obtaining a preset correspondence between prediction-block gradients and transform kernels comprises:
    determining, according to the size of the current block, a transform kernel size corresponding to the current block;
    obtaining, according to the transform kernel size corresponding to the current block, the preset correspondence between prediction-block gradients and transform kernels for that transform kernel size;
    and the determining, in the correspondence, a target transform kernel corresponding to the gradient information of the prediction block comprises:
    determining, in the correspondence for that transform kernel size, the target transform kernel corresponding to the gradient information of the prediction block.
  43. The method according to claim 42, wherein the determining, according to the size of the current block, a transform kernel size corresponding to the current block comprises:
    if at least one of the width and the height of the current block is less than 16, determining that the transform kernel size corresponding to the current block is 4x4;
    if both the width and the height of the current block are greater than 8, determining that the transform kernel size corresponding to the current block is 8x8.
  44. The method according to claim 24 or 25, wherein the determining, according to the prediction block, a transform kernel corresponding to the current block comprises:
    inputting the prediction block into a pre-trained model to obtain transform kernel indication information, output by the model, corresponding to the current block, the transform kernel indication information being used to indicate the transform kernel corresponding to the current block; and
    determining, according to the transform kernel indication information, the transform kernel corresponding to the current block.
  45. The method according to claim 44, wherein the inputting the prediction block into a pre-trained model to obtain transform kernel indication information, output by the model, corresponding to the current block comprises:
    downsampling the prediction block; and
    inputting the downsampled prediction block into the pre-trained model to obtain the transform kernel indication information, output by the model, corresponding to the current block.
  46. The method according to claim 24 or 25, wherein obtaining the bitstream according to the target transform coefficients of the current block comprises:
    quantizing the target transform coefficients to obtain quantized coefficients of the current block; and
    encoding the quantized coefficients to obtain the bitstream.
  47. A video decoder, comprising:
    a decoding unit, configured to decode a bitstream to obtain target transform coefficients of a current block;
    a prediction unit, configured to predict the current block to obtain a prediction block of the current block;
    a determining unit, configured to determine, according to the prediction block, a transform kernel corresponding to the current block; and
    a transform unit, configured to inverse-transform the target transform coefficients according to the transform kernel, and obtain a residual block of the current block from a result of the inverse transform.
  48. A video encoder, comprising:
    a prediction unit, configured to predict a current block to obtain a prediction block of the current block;
    a determining unit, configured to determine, according to the prediction block, a transform kernel corresponding to the current block;
    a residual unit, configured to obtain a residual block of the current block from the prediction block and the current block; and
    a transform unit, configured to transform the residual block according to the transform kernel, and encode the transformed coefficients to obtain a bitstream.
  49. A video decoder, comprising a processor and a memory;
    the memory being configured to store a computer program;
    the processor being configured to call and run the computer program stored in the memory, to implement the method according to any one of claims 1 to 23.
  50. A video encoder, comprising a processor and a memory;
    the memory being configured to store a computer program;
    the processor being configured to call and run the computer program stored in the memory, to implement the method according to any one of claims 24 to 46.
  51. A video encoding/decoding system, comprising:
    the video decoder according to claim 49;
    and the video encoder according to claim 50.
  52. A computer-readable storage medium, configured to store a computer program;
    the computer program causing a computer to perform the method according to any one of claims 1 to 23 or 24 to 46.
PCT/CN2021/121047 2021-09-27 2021-09-27 Video encoding and decoding method, device, system, and storage medium WO2023044919A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2021/121047 WO2023044919A1 (zh) 2021-09-27 2021-09-27 Video encoding and decoding method, device, system, and storage medium
CN202180102575.2A CN117981320A (zh) 2021-09-27 2021-09-27 Video encoding and decoding method, device, system, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/121047 WO2023044919A1 (zh) 2021-09-27 2021-09-27 Video encoding and decoding method, device, system, and storage medium

Publications (1)

Publication Number Publication Date
WO2023044919A1 (zh) 2023-03-30

Family

ID=85719920

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/121047 WO2023044919A1 (zh) 2021-09-27 2021-09-27 视频编解码方法、设备、系统、及存储介质

Country Status (2)

Country Link
CN (1) CN117981320A (zh)
WO (1) WO2023044919A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017061671A1 (ko) * 2015-10-08 2017-04-13 LG Electronics Inc. Image coding method and apparatus based on adaptive transform in an image coding system
CN109922340A (zh) * 2017-12-13 2019-06-21 Huawei Technologies Co., Ltd. Image encoding and decoding method, apparatus, system, and storage medium
CN110268715A (zh) * 2017-02-28 2019-09-20 Google LLC Transform kernel selection and entropy coding
WO2021137445A1 (ko) * 2019-12-31 2021-07-08 Humax Co., Ltd. Method for determining a transform kernel for video signal processing and apparatus therefor
WO2021139572A1 (zh) * 2020-01-08 2021-07-15 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Encoding method, decoding method, encoder, decoder, and storage medium

Also Published As

Publication number Publication date
CN117981320A (zh) 2024-05-03

Similar Documents

Publication Publication Date Title
TWI745594B (zh) Intra filtering applied together with transform processing in video coding
CN113411577B (zh) Encoding method and apparatus
TW202005399A (zh) Design and signaling of block-based adaptive loop filter (ALF)
CN111327904B (zh) Image reconstruction method and apparatus
CN114885159B (zh) Method and apparatus for mode-dependent and size-dependent block-level restriction of position-dependent prediction combination
CN113545063A (zh) Method and apparatus for intra prediction using a linear model
WO2023044868A1 (zh) Video encoding and decoding method, device, system, and storage medium
WO2023044919A1 (zh) Video encoding and decoding method, device, system, and storage medium
WO2023173255A1 (zh) Image encoding and decoding method, apparatus, device, system, and storage medium
WO2022155922A1 (zh) Video encoding and decoding method and system, and video encoder and video decoder
WO2023197229A1 (zh) Video encoding and decoding method, apparatus, device, system, and storage medium
WO2023122969A1 (zh) Intra prediction method, device, system, and storage medium
WO2022116054A1 (zh) Image processing method and system, video encoder, and video decoder
WO2023122968A1 (zh) Intra prediction method, device, system, and storage medium
WO2023184747A1 (zh) Video encoding and decoding method, apparatus, device, system, and storage medium
CN116760976B (zh) Affine prediction decision method, apparatus, device, and storage medium
WO2023220970A1 (zh) Video encoding method, apparatus, device, system, and storage medium
WO2023236113A1 (zh) Video encoding and decoding method, apparatus, device, system, and storage medium
WO2022179394A1 (zh) Method for determining prediction samples of an image block, and encoding/decoding device
WO2024050723A1 (zh) Image prediction method, apparatus, and computer-readable storage medium
WO2023220946A1 (zh) Video encoding and decoding method, apparatus, device, system, and storage medium
WO2023220969A1 (zh) Video encoding and decoding method, apparatus, device, system, and storage medium
WO2022116105A1 (zh) Video encoding and decoding method and system, and video encoder and video decoder
WO2023000182A1 (zh) Image encoding/decoding and processing method, apparatus, and device
EP4383709A1 (en) Video encoding method, video decoding method, device, system, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21958037

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 202180102575.2

Country of ref document: CN

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2021958037

Country of ref document: EP

Effective date: 20240429