WO2022116085A1 - Encoding method, decoding method, encoder, decoder, and electronic device - Google Patents

Encoding method, decoding method, encoder, decoder, and electronic device

Info

Publication number
WO2022116085A1
WO2022116085A1 (application PCT/CN2020/133597)
Authority
WO
WIPO (PCT)
Prior art keywords
target
block
prediction
neural network
prediction mode
Prior art date
Application number
PCT/CN2020/133597
Other languages
English (en)
French (fr)
Inventor
戴震宇 (DAI Zhenyu)
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp., Ltd. (Oppo广东移动通信有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp., Ltd. (Oppo广东移动通信有限公司)
Priority to PCT/CN2020/133597 priority Critical patent/WO2022116085A1/zh
Priority to CN202080065143.4A priority patent/CN114868386B/zh
Publication of WO2022116085A1 publication Critical patent/WO2022116085A1/zh

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/102: Methods or arrangements using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/103: Selection of coding mode or of prediction mode
    • H04N 19/11: Selection of coding mode or of prediction mode among a plurality of spatial predictive coding modes
    • H04N 19/132: Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H04N 19/70: Methods or arrangements characterised by syntax aspects related to video coding, e.g. related to compression standards

Definitions

  • the embodiments of the present application relate to the technical field of image encoding and decoding, and more particularly, to an encoding method, a decoding method, an encoder, a decoder, and an electronic device.
  • Digital video compression technology mainly compresses the huge amount of digital video data so as to facilitate transmission and storage.
  • Although existing digital video compression standards can realize video decompression, it is still necessary to pursue better digital video compression technology to improve compression performance.
  • Embodiments of the present application provide an encoding method, a decoding method, an encoder, a decoder, and an electronic device, which can improve compression performance.
  • an embodiment of the present application provides an encoding method, including:
  • dividing a target image frame into a plurality of image blocks, where a target image block among the plurality of image blocks includes a target chrominance block;
  • if the target chrominance block can be intra-predicted using the neural network-based chrominance prediction mode, selecting an optimal prediction mode from the neural network-based chrominance prediction mode and the traditional prediction mode;
  • obtaining a target prediction block using the optimal prediction mode, and obtaining a target residual block based on the target prediction block;
  • encoding the target residual block, the permission flag, and the control flag to obtain a code stream, where the permission flag is used to identify whether the target chroma block is allowed to be intra-predicted using the neural network-based chroma prediction mode, and the control flag is used to identify whether the neural network-based chroma prediction mode is used to perform intra-frame prediction on the target chroma block.
  • an embodiment of the present application provides a decoding method, including:
  • parsing a code stream to obtain a target residual block, a permission flag, and a control flag;
  • if the permission flag indicates that the target chroma block can be intra-predicted using the neural network-based chroma prediction mode, and the control flag indicates that the neural network-based chroma prediction mode is used to perform intra-frame prediction on the target chroma block, performing intra-frame prediction on the target chroma block using the neural network-based chroma prediction mode to obtain a target prediction block;
  • obtaining a target image frame based on the target residual block and the target prediction block.
  • an embodiment of the present application provides an encoder for executing the method in the first aspect or each of its implementations.
  • the encoder includes a functional unit for executing the method in the above-mentioned first aspect or each of its implementations.
  • an embodiment of the present application provides a decoder for executing the method in the second aspect or each of its implementations.
  • the decoder includes functional units for performing the methods in the second aspect or the respective implementations thereof.
  • an electronic device including:
  • a processor adapted to implement computer instructions
  • a computer-readable storage medium storing computer instructions adapted to be loaded by a processor and executed to perform the method in any one of the above-mentioned first to second aspects or implementations thereof.
  • an embodiment of the present application provides a computer-readable storage medium, where computer instructions are stored in the computer-readable storage medium, and when the computer instructions are read and executed by a processor of a computer device, the computer device is made to execute the method in any one of the above-mentioned first to second aspects or implementations thereof.
  • an embodiment of the present application provides a computer program product or computer program, where the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, causing the computer device to perform the method in any one of the above-mentioned first to second aspects or implementations thereof.
  • By introducing the neural network-based chrominance prediction mode, an optimal prediction mode is selected from the neural network-based chrominance prediction mode and the traditional prediction mode; the target prediction block is then obtained based on the optimal prediction mode, which can improve compression performance.
  • FIG. 1 is a schematic block diagram of a coding framework provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of specific directions of 33 angle prediction modes provided by an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of a MIP mode provided by an embodiment of the present application.
  • FIG. 4 is a schematic block diagram of a decoding framework provided by an embodiment of the present application.
  • FIG. 5 is a schematic flowchart of an extended coding framework based on the coding framework shown in FIG. 1 .
  • FIG. 6 is a schematic flowchart of an encoding method provided by an embodiment of the present application.
  • FIG. 7 and FIG. 8 are schematic structural diagrams of the input to a neural network-based chrominance prediction mode when the input video format is YUV420, provided by an embodiment of the present application.
  • FIG. 9 is a schematic flowchart of a first training strategy provided by an embodiment of the present application.
  • FIG. 10 is a schematic flowchart of a decoding method according to an embodiment of the present application.
  • FIG. 11 is a schematic block diagram of an encoder according to an embodiment of the present application.
  • FIG. 12 is a schematic block diagram of a decoder according to an embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the solutions provided by the embodiments of the present application can be applied to the technical field of digital video coding, for example, the field of image coding and decoding, the field of video coding and decoding, the field of hardware video coding and decoding, the field of dedicated circuit video coding and decoding, and the field of real-time video coding and decoding.
  • the solutions provided in the embodiments of the present application may be combined with the Audio Video Coding Standard (AVS), the second-generation AVS standard (AVS2), or the third-generation AVS standard (AVS3), and may also be combined with H.264/Advanced Video Coding (AVC), H.265/High Efficiency Video Coding (HEVC), or H.266/Versatile Video Coding (VVC).
  • the solutions provided by the embodiments of the present application can be used to perform lossy compression on images, and can also be used to perform lossless compression on images.
  • the lossless compression may be visually lossless compression or mathematically lossless compression.
  • For original video sequences of different color formats, the encoder reads the pixels of the luminance component and/or the pixels of the chrominance components; that is, the encoder reads a black-and-white image or a color image and then encodes it.
  • the black and white image may include pixels of luminance component
  • the color image may include pixels of chrominance component
  • the color image may further include pixels of luminance component.
  • the color format of the original video sequence may be a luminance chrominance (YCbCr, YUV) format or a red-green-blue (Red-Green-Blue, RGB) format, or the like.
  • After the encoder reads a black-and-white image or a color image, it divides the image into block data and encodes the block data.
  • The block data can be a coding tree unit (Coding Tree Unit, CTU) or a coding unit (Coding Unit, CU).
  • A coding tree unit can be further divided into several CUs, and a CU can be a rectangular block or a square block. That is, the encoder can encode based on CTUs or CUs.
  • Today's encoders usually adopt a hybrid coding framework, which generally includes operations such as intra-frame and inter-frame prediction, transform and quantization, inverse transform and inverse quantization, loop filtering, and entropy coding.
  • Intra-frame prediction refers only to information within the same frame to predict the pixel information in the current divided block, so as to eliminate spatial redundancy; inter-frame prediction can refer to image information of different frames and use motion estimation to search for the motion vector information that best matches the current divided block, so as to eliminate temporal redundancy; the transform converts the predicted image block to the frequency domain and redistributes its energy, and combined with quantization can remove information to which the human eye is not sensitive, so as to eliminate visual redundancy; entropy coding can eliminate character redundancy based on the current context model and the probability information of the binary code stream.
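  • As an illustration of the pipeline described above, the following is a minimal sketch of the prediction/residual/transform/quantization path, assuming a 2-D DCT as the transform and a single flat quantization step; the 8x8 block size, the quantization step, and the toy flat predictor are illustrative assumptions rather than values from the present application:
```python
# Minimal sketch of the hybrid-coding residual path described above
# (prediction -> residual -> transform -> quantization) and its inverse.
import numpy as np
from scipy.fft import dctn, idctn

def encode_block(block, prediction, qstep=16.0):
    residual = block.astype(np.float64) - prediction      # remove spatial redundancy
    coeffs = dctn(residual, norm="ortho")                 # to frequency domain
    levels = np.round(coeffs / qstep)                     # quantization (the lossy step)
    return levels

def reconstruct_block(levels, prediction, qstep=16.0):
    coeffs = levels * qstep                               # inverse quantization
    residual = idctn(coeffs, norm="ortho")                # inverse transform
    return np.clip(prediction + residual, 0, 255)         # superimpose the prediction

block = np.random.randint(0, 256, (8, 8))
prediction = np.full((8, 8), block.mean())                # toy intra predictor
recon = reconstruct_block(encode_block(block, prediction), prediction)
```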
  • FIG. 1 is a schematic block diagram of an encoding framework 100 provided by an embodiment of the present application.
  • the encoding framework 100 may include an intra prediction unit 180, an inter prediction unit 170, a residual unit 110, a transform and quantization unit 120, an entropy encoding unit 130, an inverse transform and inverse quantization unit 140, and a loop filtering unit 150.
  • the encoding framework 100 may further include a decoded image buffer unit 160 .
  • This coding framework 100 may also be referred to as a hybrid coding framework.
  • intra-prediction unit 180 or inter-prediction unit 170 may predict an image block to be encoded to output a predicted block.
  • the residual unit 110 may calculate a residual block, that is, a difference between the predicted block and the to-be-encoded image block, based on the predicted block and the to-be-encoded image block.
  • the residual block is transformed and quantized by the transform and quantization unit 120 to remove information that is not sensitive to human eyes, so as to eliminate visual redundancy.
  • the residual block before transformation and quantization by the transform and quantization unit 120 may be referred to as a time domain residual block;
  • the time domain residual block after transformation and quantization by the transform and quantization unit 120 may be referred to as a frequency domain residual block.
  • The entropy encoding unit 130 may output a code stream based on the transformed and quantized coefficients. For example, the entropy encoding unit 130 may eliminate character redundancy according to the target context model and the probability information of the binary code stream. For example, the entropy encoding unit 130 may be used for context-based adaptive binary arithmetic coding (CABAC).
  • the entropy encoding unit 130 may also be referred to as a header information encoding unit.
  • the image block to be encoded may also be referred to as an original image block or a target image block
  • a prediction block may also be referred to as a predicted image block or an image prediction block, and may also be referred to as a prediction signal or prediction information
  • Reconstruction blocks may also be referred to as reconstructed image blocks or image reconstruction blocks, and may also be referred to as reconstruction signals or reconstruction information.
  • For the encoding end, the image block to be encoded may also be referred to as an encoding block or an encoded image block.
  • For the decoding end, the image block to be decoded may also be referred to as a decoding block or a decoded image block.
  • the image block to be encoded may be a CTU or a CU.
  • In other words, the encoding framework 100 calculates the residual between the prediction block and the image block to be encoded to obtain a residual block, and transmits the residual block to the decoding end through processes such as transform and quantization. After the decoding end receives and parses the code stream, it obtains the residual block through inverse transform and inverse quantization, and superimposes the prediction block predicted at the decoding end onto the residual block to obtain the reconstructed block.
  • the inverse transform and inverse quantization unit 140, the loop filtering unit 150 and the decoded image buffer unit 160 in the encoding framework 100 can be used to form a decoder.
  • the intra-frame prediction unit 180 or the inter-frame prediction unit 170 can predict the to-be-coded image block based on the existing reconstructed block, so as to ensure that the encoding end and the decoding end have the same understanding of the reference frame.
  • the encoder can duplicate the decoder's processing loop, which in turn can produce the same predictions as the decoder.
  • the quantized transform coefficients are inversely transformed and inversely quantized by the inverse transform and inverse quantization unit 140 to replicate the approximate residual block at the decoding end.
  • the in-loop filtering unit 150 can be used to smooth out blocking artifacts and other distortions caused by block-based processing and quantization.
  • the image blocks output by the loop filtering unit 150 may be stored in the decoded image buffer unit 160 for use in prediction of subsequent images.
  • the intra-frame prediction unit 180 can be used for intra-frame prediction, and the intra-frame prediction only refers to the information of the same frame image, and predicts the pixel information in the image block to be encoded, so as to eliminate spatial redundancy;
  • the frame used for the intra-frame prediction can be an I frame.
  • For example, the image block to be coded can use the upper-left image block, the upper image block, and the left image block as reference information for its prediction, and the predicted block in turn serves as reference information for the next image block, so the whole image can be predicted.
  • If the digital video is in the YUV420 format, every 4 pixels of each image frame are composed of 4 Y components and 2 UV components, and the encoding framework 100 can encode the Y component (i.e., the luminance block) and the UV components (i.e., the chrominance blocks) separately.
  • the decoding end can also perform corresponding decoding according to the format.
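  • As a concrete illustration of the 4:2:0 sampling described above (an illustrative example, not part of the present application), the plane sizes of a YUV420 frame can be computed as follows:
```python
# Illustrative only: plane sizes for a YUV420 frame. For every 2x2 group of
# 4 luma (Y) samples there is 1 U and 1 V sample (4 Y components + 2 UV components).
def yuv420_plane_sizes(width: int, height: int):
    y_size = width * height                 # full-resolution luma plane
    u_size = (width // 2) * (height // 2)   # chroma subsampled 2x in each direction
    v_size = (width // 2) * (height // 2)
    return y_size, u_size, v_size

print(yuv420_plane_sizes(1920, 1080))       # (2073600, 518400, 518400)
```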
  • the inter-frame prediction unit 170 can be used for inter-frame prediction, and the inter-frame prediction can refer to image information of different frames, and use motion estimation to search for motion vector information that best matches the image block to be encoded, so as to eliminate temporal redundancy;
  • the frames may be P frames and/or B frames, where P frames refer to forward predicted frames and B frames refer to bidirectional predicted frames.
  • the intra-frame prediction can use the angular prediction mode and the non-angle prediction mode to predict the to-be-coded image block to obtain the predicted block.
  • The encoding end selects the optimal prediction mode of the image block, and the prediction mode is transmitted to the decoding end through the code stream.
  • The decoding end parses out the prediction mode, predicts the prediction block of the target decoding block, and superimposes the time domain residual block obtained through code stream transmission onto it to obtain the reconstructed block.
  • The number of non-angular modes remains relatively stable, comprising the DC (average) mode and the Planar mode, while the number of angular modes keeps increasing with the evolution of digital video codec standards.
  • The H.264/AVC standard has only 8 angular prediction modes and 1 non-angular prediction mode; H.265/HEVC extends this to 33 angular prediction modes and 2 non-angular prediction modes.
  • In H.266/VVC, the intra-frame prediction modes are further expanded: for luminance blocks there are 67 traditional prediction modes as well as non-traditional prediction modes such as the matrix weighted intra prediction (MIP) mode.
  • The traditional prediction modes include the Planar mode with mode number 0, the DC mode with mode number 1, and the angular prediction modes with mode numbers 2 to 66.
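  • The mode numbering described above can be summarized in a small helper function (illustrative only; the grouping follows the VVC numbering in the preceding paragraph):
```python
# Illustrative mapping of VVC traditional luma intra mode numbers to categories,
# following the numbering above (0 = Planar, 1 = DC, 2..66 = angular).
def vvc_intra_mode_category(mode: int) -> str:
    if mode == 0:
        return "Planar"
    if mode == 1:
        return "DC"
    if 2 <= mode <= 66:
        return "angular"
    raise ValueError(f"not a traditional VVC intra mode: {mode}")
```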
  • FIG. 2 is a schematic diagram of the specific directions of the 33 angular prediction modes provided by an embodiment of the present application. As shown in FIG. 2, the 33 angular prediction modes are divided into horizontal modes and vertical modes: the horizontal modes include H+32 (mode No. 2) to H-32 (mode No. 17), and the vertical modes include V-32 (mode No. 18) to V+32 (mode No. 34).
  • V0 (mode number 26) and H0 (mode number 10) represent the vertical and horizontal directions respectively, and the prediction directions of the remaining angle prediction modes can be regarded as an angular offset in the vertical or horizontal direction.
  • In VVC's reference software test platform (VVC Test Model, VTM), the chrominance prediction modes further include the cross-component linear model (CCLM) prediction mode.
  • The MIP mode is currently unique to VVC, while the CCLM mode has counterparts in other advanced standards, such as AV1's Chroma from Luma (CfL) mode and AVS3's Two-Step Cross-Component Prediction Mode (TSCPM).
  • MIP is derived from neural network-based prediction technology, which uses a fully connected neural network.
  • FIG. 3 is a schematic flowchart of a MIP mode provided by an embodiment of the present application.
  • As shown in FIG. 3, the MIP mode takes the reconstructed pixels of the upper K rows and the left K columns as input and, after 3 fully connected layers with nonlinear activation functions and 1 linear fully connected layer, outputs the predicted pixel values of the image block to be encoded, that is, the predicted block of the image block to be encoded.
  • Reconstructed pixels may also be referred to as reconstructed pixel values or reconstructed pixels.
  • During encoding, rate-distortion screening is performed over the parameters (namely the network weights) of multiple sets of fully connected neural networks; an optimal set of network weights is selected for prediction, and the index of this set of parameters is encoded into the code stream.
  • Network weights may include parameters such as matrices and biases.
  • Compared with the original neural network, MIP has undergone many simplifications, including of its network parameters and input points, and finally completes the prediction in the form of a vector multiplied by a matrix.
  • MIP selects W reconstructed pixels in the upper row of the block and H reconstructed pixels in the left column as input. If the pixels at these locations have not been reconstructed, they can be processed like traditional prediction methods.
  • MIP generates the predicted value mainly based on three steps, namely, the average of reference pixels, matrix-vector multiplication, and linear interpolation upsampling. MIP acts on blocks of size from 4x4 to 32x32.
  • For a rectangular block, if the short side of the rectangle is 4, the optimal set is selected from 16 pre-trained matrices and biases (i.e., network weights) of size 16x4; if the short side is 8, the optimal set is selected from 8 pre-trained matrices and biases of size 16x8; if the short side is 16, the optimal set is selected from 6 pre-trained sets of matrices and biases of size 64x8. The above-mentioned multiple sets of matrices and biases corresponding to blocks of a specific size can be obtained by merging the network weights of multiple trained neural networks.
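  • The following is a minimal sketch of the three MIP steps named above (reference-pixel averaging, matrix-vector multiplication, and linear-interpolation upsampling) for the 16x4 matrix case; the random weights are placeholders for the pre-trained matrices and biases:
```python
# Sketch of MIP-style prediction for a 4x4 block (the 16x4 matrix case above).
# The weights here are random placeholders for the pre-trained matrices/biases.
import numpy as np

def mip_predict_4x4(top_ref, left_ref, matrix, bias):
    # 1) average the reference pixels down to 2 values per side (4 inputs total)
    avg_top = top_ref.reshape(2, 2).mean(axis=1)
    avg_left = left_ref.reshape(2, 2).mean(axis=1)
    inputs = np.concatenate([avg_top, avg_left])          # 4 averaged samples
    # 2) matrix-vector multiplication plus bias -> 16 predicted samples
    pred = matrix @ inputs + bias                         # (16,4) @ (4,) + (16,)
    # 3) for 4x4 blocks the 16 outputs already fill the block; larger blocks
    #    would be linearly interpolated (upsampled) at this step
    return pred.reshape(4, 4)

rng = np.random.default_rng(0)
pred = mip_predict_4x4(top_ref=rng.integers(0, 256, 4).astype(float),
                       left_ref=rng.integers(0, 256, 4).astype(float),
                       matrix=rng.standard_normal((16, 4)) * 0.1,
                       bias=np.full(16, 128.0))
```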
  • The MIP tool mentioned above is derived from intra-frame prediction based on a fully connected neural network, but it is limited to predicting luminance blocks only. Moreover, there are too many kinds of MIP models, which makes training complicated. At present, there is no good scheme for predicting chroma blocks based on a fully connected neural network.
  • FIG. 1 to FIG. 3 are only examples of the present application, and should not be construed as a limitation of the present application.
  • the loop filtering unit 150 in the encoding framework 100 may include a deblocking filter (DBF) and a sample adaptive compensation filter (SAO).
  • the coding framework 100 may use a neural network-based loop filtering algorithm to improve video compression efficiency.
  • the coding framework 100 may be a video coding hybrid framework based on a deep learning neural network.
  • a model based on a convolutional neural network may be used to calculate the result of filtering the pixels based on the deblocking filter and the sample adaptive compensation filtering.
  • the network structure of the in-loop filtering unit 150 on the luminance component and the chrominance component may be the same or different. Considering that the luminance component contains more visual information, the luminance component can also be used to guide the filtering of the chrominance component, so as to improve the reconstruction quality of the chrominance component.
  • FIG. 4 is a schematic block diagram of a decoding framework 200 provided by an embodiment of the present application.
  • the decoding framework 200 may include an entropy decoding unit 210, an inverse transform and inverse quantization unit 220, a residual unit 230, an intra prediction unit 240, an inter prediction unit 250, a loop filtering unit 260, and a decoded image buffer unit 270.
  • After the entropy decoding unit 210 receives and parses the code stream, it obtains the prediction information and the frequency domain residual block. The inverse transform and inverse quantization unit 220 performs steps such as inverse transform and inverse quantization on the frequency domain residual block to obtain the time domain residual block, and the residual unit 230 superimposes the prediction block predicted by the intra prediction unit 240 or the inter prediction unit 250 onto that time domain residual block to obtain the reconstructed block. For example, the intra prediction unit 240 or the inter prediction unit 250 may obtain the prediction block by decoding the header information of the code stream.
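  • A sketch of this decoding flow, reusing the encode_block()/reconstruct_block() helpers from the earlier encoding sketch; the (levels, prediction) pairs stand in for the outputs of the entropy decoding unit 210 and the prediction units 240/250:
```python
# Sketch of the decoding flow described above: for each block, the entropy
# decoding and prediction units yield the quantized levels and the prediction,
# and reconstruct_block() (from the earlier sketch) performs inverse
# quantization, inverse transform, and superimposes the prediction.
def decode_blocks(parsed_blocks, qstep=16.0):
    # parsed_blocks: iterable of (levels, prediction) pairs
    return [reconstruct_block(levels, prediction, qstep)
            for levels, prediction in parsed_blocks]

frame = decode_blocks([(encode_block(block, prediction), prediction)])
```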
  • FIG. 5 is a schematic flowchart of an encoding framework 100-1 provided by an embodiment of the present application.
  • the coding framework 100-1 may include a neural network-based chroma intra prediction unit 190, and the neural network-based chroma intra prediction unit 190 may predict the to-be-predicted chroma block.
  • the coding framework 100-1 is an extension of the coding framework 100, that is, other units in the coding framework 100-1 can be referred to the relevant descriptions in the coding framework 100, which are not repeated here to avoid repetition.
  • FIG. 6 is a schematic flowchart of an encoding method 300 provided by an embodiment of the present application. It should be understood that the encoding method 300 can be performed by the encoding end, for example, applied to the encoding framework 100-1 shown in FIG. 5. For ease of description, the following takes the encoding end as an example.
  • the encoding method 300 may include:
  • S310: Divide a target image frame into a plurality of image blocks, where a target image block among the plurality of image blocks includes a target chrominance block;
  • S320: If the neural network-based chrominance prediction mode can be used to perform intra-frame prediction on the target chrominance block, select the optimal prediction mode from the neural network-based chrominance prediction mode and the traditional prediction mode;
  • S330: Obtain a target prediction block using the optimal prediction mode, and obtain a target residual block based on the target prediction block;
  • S340: Encode the target residual block, the permission flag, and the control flag to obtain a code stream, where the permission flag is used to identify whether the target chroma block is allowed to be intra-predicted using the neural network-based chroma prediction mode, and the control flag is used to identify whether the neural network-based chroma prediction mode is used to perform intra-frame prediction on the target chroma block.
  • In other words, in the case where the neural network-based chroma prediction mode can be used to perform intra-frame prediction on the target chroma block, the optimal prediction mode is selected between the neural network-based chroma prediction mode and the traditional prediction mode, and the identifier of the optimal prediction mode is encoded into the code stream as the control flag for the decoding end to read.
  • By introducing the neural network-based chroma prediction mode, in the case where the target chroma block can be intra-predicted using the neural network-based chroma prediction mode, the optimal prediction mode is selected from the neural network-based chroma prediction mode and the traditional prediction mode; the target prediction block is then obtained based on the optimal prediction mode, which can improve compression performance.
  • the parameters of the reference software VTM-10.0 are set as follows:
  • the number of reference rows of the target chroma block is set to K.
  • Each fully connected model contains only one set of network weights.
  • The training set used consists of some frames of the 8-bit videos in Classes B, C, D, E, and F of the VVC test sequences.
  • Each fully connected model contains only one set of network weights.
  • the training set used is the DIV2K image set.
  • That is, DIV2K is a training set that does not come from the test sequence videos.
  • the S320 may include:
  • Use the neural network-based chroma prediction mode to perform intra-frame prediction on the target chroma block to obtain a first prediction block, and use the traditional prediction mode to perform intra-frame prediction on the target chroma block to obtain a second prediction block. If the rate-distortion cost of the first prediction block is lower than the rate-distortion cost of the second prediction block, the neural network-based chroma prediction mode is determined as the optimal prediction mode; if the rate-distortion cost of the first prediction block is higher than the rate-distortion cost of the second prediction block, the traditional prediction mode is determined as the optimal prediction mode.
  • whether the chroma intra prediction mode based on the neural network is selected is determined by the encoder.
  • The neural network-based chroma intra prediction mode and the traditional prediction modes jointly undergo rate-distortion screening: if the cost of the traditional mode is lower, the traditional prediction mode is selected; if the cost of the neural network-based chroma prediction mode is lower, the neural network-based prediction mode is selected.
  • the selected mode will be encoded into the code stream for the decoder to read.
  • At the decoding end, if the parsed chroma intra prediction mode is the neural network-based chroma prediction mode, the corresponding neural network model is used for prediction; if a traditional mode is parsed, the corresponding traditional mode is used for prediction.
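  • A sketch of the rate-distortion screening just described; the SSE-plus-rate Lagrangian cost and its lambda are illustrative stand-ins for the encoder's actual cost measure:
```python
# Illustrative rate-distortion screening between the NN-based chroma mode and a
# traditional mode. The Lagrangian lambda and the rate estimates are placeholders.
import numpy as np

def rd_cost(original, prediction, rate_bits, lam=10.0):
    distortion = float(np.sum((original.astype(float) - prediction) ** 2))  # SSE
    return distortion + lam * rate_bits

def select_chroma_mode(original, nn_pred, trad_pred, nn_rate, trad_rate):
    cost_nn = rd_cost(original, nn_pred, nn_rate)        # first prediction block
    cost_trad = rd_cost(original, trad_pred, trad_rate)  # second prediction block
    # chroma_nn_flag mirrors the control flag written into the code stream
    chroma_nn_flag = 1 if cost_nn < cost_trad else 0
    return chroma_nn_flag, (nn_pred if chroma_nn_flag else trad_pred)
```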
  • the first prediction block may be obtained in the following manner:
  • taking the reconstruction part adjacent to the target chroma block, the reconstruction part of the target luminance block, and the reconstruction part adjacent to the target luminance block as input, predict the target chroma block to obtain the first prediction block.
  • the reconstruction part adjacent to the target chroma block may include the reconstruction reference row adjacent to the target chroma block and the reconstruction reference column adjacent to the target chroma block; or, the reconstruction part adjacent to the target chroma block may include the upper reconstruction reference row of the target chroma block and the left reconstruction reference column of the target chroma block.
  • the reconstruction part adjacent to the target luminance block may include a reconstruction reference row above the target luminance block and a reconstruction reference column on the left side of the target luminance block.
  • the reconstruction part of the target luminance block may also be referred to as the reconstruction block of the target luminance block.
  • In other words, in addition to the reconstructed pixels adjacent to the chroma block, the reconstructed block of the luminance block at the corresponding position can also be used as input, so as to assist the prediction and improve the prediction effect.
  • FIG. 7 and FIG. 8 are schematic structural diagrams of the input to the neural network-based chrominance prediction mode when the input video format is YUV420, provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a pixel point of a chrominance block as an input in a chrominance prediction mode based on a neural network.
  • FIG. 8 is a schematic structural diagram of the pixels of the luminance block used as input in the neural network-based chrominance prediction mode.
  • As shown in FIG. 7 and FIG. 8, the input of the neural network-based chrominance prediction mode may include the reconstructed pixels in the M rows adjacent to the target chroma block N, the reconstructed pixels in the M columns adjacent to the target chroma block N, the reconstruction part 2N of the target luminance block, the 2M rows of reconstructed pixels adjacent to the reconstruction part 2N of the target luminance block, and the 2M columns of reconstructed pixels adjacent to the reconstruction part 2N of the target luminance block, where M is greater than or equal to 1.
  • the target chroma block may be rectangular, N equals 8, and M equals 2.
  • FIG. 7 and FIG. 8 are only examples of the present application, and the specific numerical values of M and N are not limited in the embodiments of the present application.
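  • A sketch of assembling this input for YUV420, flattening the chroma neighborhood and the co-located luma reconstruction into one vector; the exact shapes (including whether the top-left corner samples are included) are illustrative assumptions, with N and M following the example above:
```python
# Illustrative input assembly for the NN-based chroma prediction mode (YUV420).
# For an NxN chroma block: M neighbouring rows/columns of reconstructed chroma,
# plus the 2Nx2N co-located luma reconstruction with its 2M rows/columns.
import numpy as np

def build_nn_input(chroma_top, chroma_left, luma_block, luma_top, luma_left):
    # chroma_top: (M, N+M) incl. top-left corner; chroma_left: (N, M)
    # luma_block: (2N, 2N); luma_top: (2M, 2N+2M); luma_left: (2N, 2M)
    parts = [chroma_top, chroma_left, luma_block, luma_top, luma_left]
    return np.concatenate([p.reshape(-1) for p in parts])  # 1-D vector input

N, M = 8, 2
rng = np.random.default_rng(1)
x = build_nn_input(rng.integers(0, 256, (M, N + M)),
                   rng.integers(0, 256, (N, M)),
                   rng.integers(0, 256, (2 * N, 2 * N)),
                   rng.integers(0, 256, (2 * M, 2 * N + 2 * M)),
                   rng.integers(0, 256, (2 * N, 2 * M)))
```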
  • For other video formats, the ratio between the luminance and chrominance components changes, and the input size of the neural network-based chrominance prediction mode should be adjusted accordingly.
  • the neural network involved in this application can be a multi-layer fully connected neural network.
  • The fully connected neural network consists of K (K greater than or equal to 1) fully connected layers. Each fully connected layer is provided with a nonlinear activation function such as ReLU or another activation function, and each fully connected layer contains M nodes.
  • The input reconstructed pixels are arranged in the form of a 1-D vector and used as the input to the first layer of the fully connected network.
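  • A minimal sketch of such a K-layer fully connected predictor (the layer count K, node count M, and output layer sizes are illustrative; in practice the weights would come from the trained network-weight sets described below, and x is the input vector from the assembly sketch above):
```python
# Sketch of the K-layer fully connected chroma predictor: K hidden layers with
# ReLU followed by a linear output layer producing the NxN predicted chroma block.
import numpy as np

def fc_chroma_predict(x, hidden_weights, out_weight, out_bias, n=8):
    h = x.astype(float)
    for W, b in hidden_weights:          # K fully connected layers, M nodes each
        h = np.maximum(W @ h + b, 0.0)   # nonlinear activation (ReLU)
    pred = out_weight @ h + out_bias     # final linear fully connected layer
    return pred.reshape(n, n)            # N x N prediction block

# Toy instantiation: K = 3 hidden layers of M = 64 nodes for an 8x8 chroma block.
rng = np.random.default_rng(2)
d_in = x.size                            # input vector from the assembly sketch
hidden = [(rng.standard_normal((d1, d0)) * 0.05, np.zeros(d1))
          for d0, d1 in [(d_in, 64), (64, 64), (64, 64)]]
pred_block = fc_chroma_predict(x, hidden,
                               rng.standard_normal((64, 64)) * 0.05, np.zeros(64))
```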
  • a chroma block of a specific shape can be predicted through its corresponding neural network, that is, a chroma block of a specific shape can be predicted through its corresponding network weight.
  • the embodiment of the present application does not specifically limit the correspondence between the chroma blocks and the network, or the correspondence between the chroma blocks and the network weights.
  • a chroma block of a specific shape may correspond to one neural network or multiple neural networks, that is, it may correspond to one set of network weights or multiple sets of network weights, which is not specifically limited in this embodiment of the present application.
  • the neural network based chrominance prediction mode has a set of network weights.
  • the target chrominance block may be intra-frame predicted by using the set of network weights to obtain the first predicted block.
  • the neural network-based chrominance prediction mode includes multiple sets of network weights obtained by training the neural network according to the first training strategy; at this time, the first prediction block can be obtained in the following manner:
  • Intra-frame prediction is performed on the target chrominance block with each of the multiple sets of network weights to obtain multiple prediction blocks, and the prediction block with the smallest rate-distortion cost is determined as the first prediction block.
  • the first training strategy refers to training a neural network in the following manner to obtain the multiple network weights:
  • obtain a training set, where the training set includes a plurality of training samples;
  • train the neural network on the training set, and if the neural network converges, test the neural network on the training set to obtain the test results of the plurality of training samples; based on the test results of the plurality of training samples, reorder the plurality of training samples from large to small and divide them into two sub-training sets, and use the two sub-training sets as training sets to retrain the neural network, until multiple sub-training sets are obtained, where the number of the multiple sub-training sets is equal to the number of the multiple sets of network weights;
  • train the neural network on the multiple sub-training sets to obtain the multiple sets of network weights.
  • In other words, the test results on the previous training set can be used as a classifier: the training samples in the training set are reordered according to their cost and divided into several sub-classes, so as to further train multiple sets of network weights.
  • the test result includes at least one of the following: peak signal-to-noise ratio (Peak Signal to Noise Ratio, PSNR), sum of absolute differences (Sum of Absolute Differences, SAD), or sum of absolute transformed differences after the Hadamard transform (SATD).
  • FIG. 9 is a schematic flowchart of a first training strategy provided by an embodiment of the present application.
  • As shown in FIG. 9, the network is first trained on the generated training set. After convergence, the network is tested on the training set to obtain the PSNR of each training sample. The training set is then reordered by PSNR: samples with larger PSNR generally form sub-training set 1, and samples with smaller PSNR generally form sub-training set 2.
  • The indicator can also be other than PSNR, for example the sum of absolute differences (SAD) or the sum of absolute transformed differences after the Hadamard transform (SATD).
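  • A compact sketch of this iterative splitting; train_fn and score_fn are placeholders for training the network to convergence and for computing the per-sample test result (e.g., PSNR):
```python
# Sketch of the first training strategy: train, score each sample (e.g. PSNR),
# split the training set by score, and recurse until the desired number of
# sub-training sets (= number of network-weight sets) is reached.
import numpy as np

def split_by_score(samples, scores):
    order = np.argsort(scores)[::-1]               # reorder from large to small
    half = len(order) // 2
    return [[samples[i] for i in order[:half]],    # sub-training set 1 (larger scores)
            [samples[i] for i in order[half:]]]    # sub-training set 2 (smaller scores)

def first_training_strategy(samples, train_fn, score_fn, num_weight_sets):
    subsets = [samples]
    while len(subsets) < num_weight_sets:          # keep splitting until enough subsets
        new_subsets = []
        for s in subsets:
            weights = train_fn(s)                  # train until convergence
            scores = [score_fn(weights, x) for x in s]  # test on the training set
            new_subsets.extend(split_by_score(s, scores))
        subsets = new_subsets
    return [train_fn(s) for s in subsets]          # one weight set per subset
```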
  • If the optimal prediction mode is the neural network-based chrominance prediction mode, the S340 may include:
  • encoding the target residual block, the permission flag, the control flag, and the index flag to obtain the code stream, where the index flag is used to identify the index of the target network weight used when performing intra-frame prediction on the target chroma block using the neural network-based chrominance prediction mode, and the multiple sets of network weights include the target network weight.
  • the target image block includes a target luminance block
  • the neural network-based chrominance prediction mode includes multiple sets of network weights obtained by training the neural network according to the second training strategy; at this time, the first prediction block can be obtained in the following manner:
  • Intra-frame prediction is performed on the target chrominance block by using the network weight corresponding to the target luminance prediction mode in the multiple sets of network weights to obtain the first prediction block.
  • the second training strategy refers to training the neural network in the following manner to obtain the multiple network weights:
  • the training set is divided into multiple types of training sets corresponding to the multiple traditional prediction modes respectively;
  • a neural network is trained on the multi-class training set to obtain the plurality of sets of network weights.
  • the training set can be divided by the intra-frame prediction mode selected by the luminance blocks in the training set, and then multiple sets of network weights can be obtained based on the divided training set.
  • the plurality of traditional prediction modes includes multiple of the following: the Planar or matrix weighted intra prediction (MIP) mode, the DC mode, the angular mode, and the wide-angle mode.
  • the angular modes can be used for square chroma blocks, while the wide-angle modes can be used for non-square chroma blocks.
  • Non-square chroma blocks may be chroma blocks whose width and height are unequal.
  • In other words, the second training strategy classifies the training set according to the various traditional prediction modes selected by the luminance blocks in the training set; that is, one traditional prediction mode can correspond to one class of training set. Assuming the traditional prediction modes selected by the encoder are divided into N classes, N sets of network weights can be trained on the N classes of training sets.
  • For example, N can be set to 6, namely the Planar or MIP modes, the DC mode, angular modes 2-17, angular modes 18-33, angular modes 34-49, and angular modes 50-66.
  • When the encoder and decoder use the network weights obtained under the second training strategy, the encoder only needs to look up the mode selected by the luma block corresponding to the current chroma block, and then select the corresponding set from the multiple sets of network weights according to that luma mode, as sketched below.
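  • A sketch of that lookup; the six class boundaries follow the N = 6 example given above, with MIP treated like Planar:
```python
# Map a luma intra prediction mode to one of the 6 network-weight classes
# described above: {Planar/MIP, DC, angular 2-17, 18-33, 34-49, 50-66}.
def weight_class_for_luma_mode(mode: int, is_mip: bool = False) -> int:
    if is_mip or mode == 0:      # Planar or MIP mode
        return 0
    if mode == 1:                # DC mode
        return 1
    for cls, (lo, hi) in enumerate([(2, 17), (18, 33), (34, 49), (50, 66)], start=2):
        if lo <= mode <= hi:
            return cls
    raise ValueError(f"unexpected luma intra mode: {mode}")

# e.g. a luma block coded with angular mode 45 selects weight class 4
assert weight_class_for_luma_mode(45) == 4
```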
  • the method 300 may further include:
  • According to the size of the target chroma block, it is determined whether the target chroma block can be intra-predicted using the neural network-based chroma prediction mode.
  • the determining, according to the size of the target chroma block, whether intra-frame prediction can be performed on the target chroma block using the neural network-based chroma prediction mode includes:
  • if the size of the target chroma block is a size supported by the neural network, determining that the neural network-based chroma prediction mode can be used to perform intra-frame prediction on the target chroma block.
  • the embodiments of the present application do not limit the structure of the fully connected neural network, including the number of nodes in the fully connected layers and specific implementation details such as the nonlinear activation functions.
  • The number of columns of the reference columns or the number of rows of the reference rows is also not particularly limited.
  • the embodiments of the present application do not specifically limit the size of the chroma block applicable to the fully connected neural network, the structure and type of the neural network.
  • the embodiment of the present application does not limit the specific number of network weights that can be trained by the second training strategy.
  • It should be understood that the size of the sequence numbers of the above-mentioned processes does not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
  • the encoding method according to the embodiment of the present application is described in detail from the perspective of the encoding end, and the decoding method according to the embodiment of the present application will be described below with reference to FIG. 10 from the perspective of the decoding end.
  • FIG. 10 shows a schematic flowchart of a decoding method 400 according to an embodiment of the present application.
  • the method 400 may be performed by a decoding framework comprising a neural network-based chroma intra prediction unit.
  • a neural network based chroma intra prediction unit can be extended into the decoding framework described in FIG. 4 to perform the decoding method 400 .
  • the method 400 may include:
  • S410: Parse the code stream to obtain a target residual block, a permission flag, and a control flag;
  • S420: If the permission flag indicates that the neural network-based chroma prediction mode can be used to perform intra-frame prediction on the target chroma block, and the control flag indicates that the neural network-based chroma prediction mode is used to perform intra-frame prediction on the target chroma block, perform intra-frame prediction on the target chroma block using the neural network-based chroma prediction mode to obtain a target prediction block;
  • S430: Obtain a target image frame based on the target residual block and the target prediction block.
  • the S420 may include:
  • taking the reconstruction part adjacent to the target chroma block, the reconstruction part of the target luminance block, and the reconstruction part adjacent to the target luminance block as input, predicting the target chroma block to obtain the target prediction block.
  • the neural network-based chrominance prediction mode has a set of network weights; the S420 may include:
  • Intra-frame prediction is performed on the target chrominance block using the set of network weights to obtain the target prediction block.
  • the neural network-based chrominance prediction mode includes multiple sets of network weights obtained by training the neural network according to the first training strategy; the S420 may include:
  • parsing the code stream to obtain an index flag, where the index flag is used to identify the index of the target network weight used when performing intra-frame prediction on the target chroma block using the neural network-based chroma prediction mode, and the multiple sets of network weights include the target network weight; and performing intra-frame prediction on the target chroma block using the target network weight to obtain the target prediction block.
  • the first training strategy refers to training a neural network in the following manner to obtain the multiple network weights:
  • obtain a training set, where the training set includes a plurality of training samples;
  • train the neural network on the training set, and if the neural network converges, test the neural network on the training set to obtain the test results of the plurality of training samples; based on the test results of the plurality of training samples, reorder the plurality of training samples from large to small and divide them into two sub-training sets, and use the two sub-training sets as training sets to retrain the neural network, until multiple sub-training sets are obtained, where the number of the multiple sub-training sets is equal to the number of the multiple sets of network weights;
  • train the neural network on the multiple sub-training sets to obtain the multiple sets of network weights.
  • the test result includes at least one of the following: peak signal-to-noise ratio (PSNR), sum of absolute differences (SAD), or sum of absolute transformed differences after the Hadamard transform (SATD).
  • the target image block includes a target luminance block
  • the neural network-based chrominance prediction mode includes multiple sets of network weights obtained by training the neural network according to the second training strategy
  • the S420 may include:
  • Intra-frame prediction is performed on the target chrominance block by using the network weight corresponding to the target luminance prediction mode in the multiple sets of network weights to obtain the target prediction block.
  • the second training strategy refers to training a neural network in the following manner to obtain the multiple network weights:
  • the training set is divided into multiple classes of training sets corresponding to the multiple types of luma prediction modes respectively, and the neural network is trained on the multiple classes of training sets to obtain the multiple sets of network weights.
  • the multiple types of luma prediction modes include multiple of the following: the Planar or matrix weighted intra prediction (MIP) mode, the DC mode, the angular mode, and the wide-angle mode.
  • the process of the decoding method 400 is the inverse process of the encoding method 300, that is, the steps in the decoding method 400 may refer to the corresponding steps in the encoding method 300, which are not repeated here for brevity.
  • intra_bdpcm_chroma_flag indicates whether to adopt the BDPCM chroma prediction mode; intra_bdpcm_chroma_dir_flag indicates whether to adopt the vertical-direction BDPCM chroma prediction mode.
  • ChromaNNEnabled denotes the permission flag of the neural network-based chroma prediction mode. The permission flag can be a binary variable: a value of '1' indicates that the neural network-based chroma prediction mode can be used, and a value of '0' indicates that it cannot be used.
  • The permission flag is derived according to the size and color component of the current block.
  • chroma_nn_flag denotes the control flag of the neural network-based chroma prediction mode, which can be a binary variable: a value of '1' means the neural network-based chroma prediction mode is used, and a value of '0' means it is not used.
  • chroma_nn_idx denotes the index flag of the network weights; its value range depends on the number of sets of network weights of the neural network, and its value represents the index of the target network weight used.
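  • A sketch of how a decoder might act on these three syntax elements; the parsing interface (read_flag(), read_index(), derive_chroma_nn_enabled()) is hypothetical, and only the flag semantics follow the text above:
```python
# Hypothetical decoder-side use of the syntax elements described above.
# read_flag()/read_index() stand in for the real entropy-decoding calls.
def decode_chroma_prediction(block, bitstream, nn_predict, traditional_predict):
    if not derive_chroma_nn_enabled(block):        # ChromaNNEnabled from size/component
        return traditional_predict(block)
    chroma_nn_flag = bitstream.read_flag()         # control flag
    if chroma_nn_flag == 0:
        return traditional_predict(block)
    chroma_nn_idx = bitstream.read_index()         # weight index (multi-set case)
    return nn_predict(block, weight_set=chroma_nn_idx)
```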
  • At the encoding end, the encoder performs rate-distortion selection over the possible prediction modes for the current block.
  • a) ChromaNNEnabled determines whether the neural network-based chroma prediction mode can be used for the current block. If ChromaNNEnabled is '1', try neural network-based chroma block prediction for the current block; if ChromaNNEnabled is '0', skip to c).
  • b) If the number of sets of network weights defined in advance is one, that set of weights can be directly selected for prediction.
  • The prediction result then undergoes a rate-distortion screening operation: if the cost of the result in this prediction mode is determined to be lower than that of the other prediction modes, chroma_nn_flag is set to '1', otherwise to '0'.
  • If the number of sets of network weights defined in advance is multiple, there are two cases according to the weight training strategy. If the multiple sets of network weights are trained according to the first training strategy, the weights are tried for prediction one by one, and the rate-distortion screening operation is further performed to determine the set with the smallest cost among the multiple sets of network weights; if this cost is also smaller than that of the other traditional prediction modes, chroma_nn_flag is set to '1' (otherwise '0'), and chroma_nn_idx is set to the index of the currently used weight (i.e., the target network weight).
  • If the multiple sets of network weights are trained according to the second training strategy, the encoder needs to find the prediction mode selected by the luminance block (i.e., the target luminance block) corresponding to the current chroma block and, according to the classification principle of the network weights during training, select the network weight corresponding to the prediction mode selected by the target luminance block. This network weight is further used for prediction and rate-distortion screening; if the cost of the result in this prediction mode is smaller than that of the other prediction modes, chroma_nn_flag is set to '1', otherwise to '0'.
  • c) If the current block has completed the intra-frame prediction search, load the next prediction block to search for its intra-frame mode and jump to step a).
  • At the decoding end, the decoder obtains the code stream, parses it, and performs inverse transform, inverse quantization, and block-by-block prediction on the obtained residual information. If the block is an intra-frame prediction block and the current color component is the chrominance component, the following steps are performed:
  • a) ChromaNNEnabled determines whether the neural network-based chroma prediction mode can be used for the current block. If ChromaNNEnabled is '0', skip to c).
  • b) If ChromaNNEnabled is '1', chroma_nn_flag determines whether the current block uses the neural network-based chroma prediction mode for prediction.
  • If the number of sets of network weights defined in advance is one, that set of network weights can be directly selected for prediction. If the number of sets of network weights defined in advance is multiple, there are two cases according to the weight training strategy: if the multiple sets of network weights are trained according to the first training strategy mentioned above, the decoder further obtains the weight index chroma_nn_idx and selects the corresponding network weight for prediction according to the index.
  • If the multiple sets of network weights are trained according to the second training strategy, the decoder needs to find the prediction mode selected by the luma block (that is, the target luma block) at the position corresponding to the current chroma block and, according to the classification principle of the network weights during training, select the network weight corresponding to the prediction mode selected by the target luminance block to predict the current block.
  • c) If the current block has completed intra-frame reconstruction, load the next prediction block to perform intra-frame prediction and jump to step a).
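  • A sketch tying the two weight-selection cases together on the decoder side; the helper names are hypothetical, and weight_class_for_luma_mode() is the mapping sketched earlier:
```python
# Hypothetical decoder-side selection of the network weight set, covering both
# training strategies described above.
def select_weight_set(weight_sets, strategy, bitstream=None, luma_mode=None):
    if len(weight_sets) == 1:
        return weight_sets[0]                       # single set: use it directly
    if strategy == "first":                         # index signalled in the stream
        chroma_nn_idx = bitstream.read_index()
        return weight_sets[chroma_nn_idx]
    if strategy == "second":                        # derived from the luma mode
        return weight_sets[weight_class_for_luma_mode(luma_mode)]
    raise ValueError(f"unknown training strategy: {strategy}")
```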
  • FIG. 11 is a schematic block diagram of an encoder 500 according to an embodiment of the present application.
  • the encoder 500 may include:
  • a dividing unit 510 configured to divide the target image frame into multiple image blocks, where the target image blocks in the multiple image blocks include target chrominance blocks;
  • a selection unit 520 configured to: select an optimal prediction mode from the neural network-based chrominance prediction mode and the traditional prediction mode if the target chrominance block can be intra-predicted by using the neural network-based chrominance prediction mode;
  • a first processing unit 530, configured to obtain a target prediction block using the optimal prediction mode, and obtain a target residual block based on the target prediction block;
  • a second processing unit 540, configured to encode the target residual block, the permission flag, and the control flag to obtain a code stream, where the permission flag is used to identify whether the neural network-based chrominance prediction mode is allowed to be used to perform intra-frame prediction on the target chroma block, and the control flag is used to identify whether the neural network-based chroma prediction mode is used to perform intra-frame prediction on the target chroma block.
  • the selection unit 520 is specifically configured to:
  • use the neural network-based chroma prediction mode to perform intra-frame prediction on the target chroma block to obtain a first prediction block, and use the traditional prediction mode to obtain a second prediction block; if the rate-distortion cost of the first prediction block is lower than the rate-distortion cost of the second prediction block, determine the neural network-based chroma prediction mode as the optimal prediction mode; if the rate-distortion cost of the first prediction block is higher than the rate-distortion cost of the second prediction block, determine the traditional prediction mode as the optimal prediction mode.
  • the target image block includes a target luminance block
  • the selection unit 520 is specifically configured to:
  • take the reconstruction part adjacent to the target chroma block, the reconstruction part of the target luminance block, and the reconstruction part adjacent to the target luminance block as input, and predict the target chroma block to obtain the first prediction block.
  • In some embodiments, the neural network-based chrominance prediction mode has a set of network weights; the selection unit 520 is specifically configured to perform intra-frame prediction on the target chrominance block using the set of network weights to obtain the first prediction block.
  • the neural network-based chroma prediction mode includes multiple sets of network weights obtained by training the neural network according to the first training strategy; the selection unit 520 is specifically configured to:
  • perform intra-frame prediction on the target chrominance block with each of the multiple sets of network weights to obtain multiple prediction blocks, and determine the prediction block with the smallest rate-distortion cost as the first prediction block.
  • the first training strategy refers to training a neural network in the following manner to obtain the multiple network weights:
  • obtain a training set, where the training set includes a plurality of training samples;
  • train the neural network on the training set, and if the neural network converges, test the neural network on the training set to obtain the test results of the plurality of training samples; based on the test results of the plurality of training samples, reorder the plurality of training samples from large to small and divide them into two sub-training sets, and use the two sub-training sets as training sets to retrain the neural network, until multiple sub-training sets are obtained, where the number of the multiple sub-training sets is equal to the number of the multiple sets of network weights;
  • train the neural network on the multiple sub-training sets to obtain the multiple sets of network weights.
  • the test result includes at least one of the following: peak signal-to-noise ratio (PSNR), sum of absolute differences (SAD), or sum of absolute transformed differences after the Hadamard transform (SATD).
  • If the optimal prediction mode is the neural network-based chrominance prediction mode, the second processing unit 540 is specifically configured to:
  • encode the target residual block, the permission flag, the control flag, and the index flag to obtain the code stream, where the index flag is used to identify the index of the target network weight used when performing intra-frame prediction on the target chroma block using the neural network-based chrominance prediction mode, and the multiple sets of network weights include the target network weight.
  • the target image block includes a target luminance block
  • In some embodiments, the neural network-based chrominance prediction mode includes multiple sets of network weights obtained by training the neural network according to the second training strategy; the selection unit 520 is specifically configured to:
  • perform intra-frame prediction on the target chrominance block using the network weight corresponding to the target luminance prediction mode among the multiple sets of network weights to obtain the first prediction block.
  • the second training strategy refers to training the neural network in the following manner to obtain the multiple network weights:
  • the training set is divided into multiple types of training sets corresponding to the multiple traditional prediction modes respectively;
  • a neural network is trained on the multi-class training set to obtain the plurality of sets of network weights.
  • the multiple conventional prediction modes include multiple of the following types: the planar mode or the matrix-weighted intra prediction (MIP) mode, the DC mode, the angular modes, and the wide-angle modes.
  • before selecting the optimal prediction mode from the neural network-based chroma prediction mode and the conventional prediction mode, the selection unit 520 is further configured to: determine, according to the size of the target chroma block, whether the neural network-based chroma prediction mode can be used to perform intra prediction on the target chroma block; specifically, if the width of the target chroma block is 4, 8, or greater than or equal to 16, or if the height of the target chroma block is 4, 8, or greater than or equal to 16, it is determined that the neural network-based chroma prediction mode can be used to perform intra prediction on the target chroma block.
  • FIG. 12 is a schematic block diagram of a decoder 600 according to an embodiment of the present application.
  • the decoder 600 may include:
  • the parsing unit 610 is configured to parse the code stream to obtain a target residual block, a permission flag, and a control flag, where the permission flag indicates whether the target chroma block is allowed to be intra-predicted using the neural network-based chroma prediction mode, and the control flag indicates whether the neural network-based chroma prediction mode is used to perform intra prediction on the target chroma block;
  • the first processing unit 620 is configured to: if the permission flag indicates that the neural network-based chroma prediction mode can be used to perform intra prediction on the target chroma block, and the control flag indicates that the neural network-based chroma prediction mode is used, perform intra prediction on the target chroma block using the neural network-based chroma prediction mode to obtain a target prediction block;
  • the second processing unit 630 is configured to obtain a target image frame based on the target residual block and the target prediction block.
  • the first processing unit 620 is specifically configured to: predict the target chroma block using, as input, the reconstructed part adjacent to the target chroma block, the reconstructed part of the target luma block, and the reconstructed part adjacent to the target luma block, to obtain the target prediction block.
  • the neural network-based chroma prediction mode has one set of network weights, and the first processing unit 620 is specifically configured to: perform intra prediction on the target chroma block using the set of network weights to obtain the target prediction block.
  • the neural network-based chroma prediction mode includes multiple sets of network weights obtained by training the neural network according to the first training strategy, and the first processing unit 620 is specifically configured to: parse the code stream to obtain an index flag, which identifies the index of the target network weights used when performing intra prediction on the target chroma block with the neural network-based chroma prediction mode, the multiple sets of network weights including the target network weights; and perform intra prediction on the target chroma block using the target network weights indicated by the index flag to obtain the target prediction block.
  • the first training strategy refers to training a neural network in the following manner to obtain the multiple sets of network weights: obtain a training set including multiple training samples; train the neural network on the training set, and once the neural network converges, test it on the training set to obtain a test result for each of the training samples; based on these test results, reorder the training samples in descending and ascending order respectively to obtain two sub-training sets, and use the two sub-training sets as training sets to retrain the neural network, repeating until multiple sub-training sets are obtained, where the number of sub-training sets equals the number of sets of network weights; train the neural network on the multiple sub-training sets to obtain the multiple sets of network weights.
  • the test result includes at least one of the following: peak signal-to-noise ratio (PSNR), sum of absolute differences (SAD), or sum of absolute transformed differences after the Hadamard transform (SATD).
  • the target image block includes a target luma block, and the neural network-based chroma prediction mode includes multiple sets of network weights obtained by training the neural network according to the second training strategy; the first processing unit 620 is specifically configured to: determine the target luma prediction mode used by the target luma block, and perform intra prediction on the target chroma block using the set of network weights corresponding to the target luma prediction mode among the multiple sets of network weights, to obtain the target prediction block.
  • the second training strategy refers to training a neural network in the following manner to obtain the multiple sets of network weights: based on the multiple classes of luma prediction modes available to chroma blocks, obtain multiple classes of training sets corresponding to the multiple classes of luma prediction modes respectively, the multiple classes of luma prediction modes including the target luma prediction mode; train the neural network on the multiple classes of training sets to obtain the multiple sets of network weights.
  • the multiple classes of luma prediction modes include multiple of the following types: the planar mode or the matrix-weighted intra prediction (MIP) mode, the DC mode, the angular modes, and the wide-angle modes.
  • the apparatus embodiments and the method embodiments may correspond to each other, and similar descriptions may refer to the method embodiments. To avoid repetition, details are not repeated here.
  • the encoder 500 shown in FIG. 11 may correspond to the corresponding subject executing the method 300 of the embodiments of the present application; that is, the aforementioned and other operations and/or functions of the units in the encoder 500 are respectively intended to implement the corresponding flows of the method 300 and the other methods.
  • the units in the encoder 500 or the decoder 600 involved in the embodiments of the present application may be separately or wholly merged into one or several other units, or one or more of them may be further split into multiple functionally smaller units, which can implement the same operations without affecting the technical effects of the embodiments of the present application.
  • the above-mentioned units are divided based on logical functions. In practical applications, the function of one unit may also be implemented by multiple units, or the functions of multiple units may be implemented by one unit. In other embodiments of the present application, the encoder 500 or the decoder 600 may also include other units.
  • the encoder 500 or the decoder 600 involved in the embodiments of the present application may be constructed by running a computer program (including program code) capable of executing the steps of the corresponding method on a general-purpose computing device that includes processing elements and storage elements such as a central processing unit (CPU), a random access storage medium (RAM), and a read-only storage medium (ROM), so as to implement the encoding or decoding methods of the embodiments of the present application.
  • the computer program may be recorded on, for example, a computer-readable storage medium, loaded into an electronic device through the computer-readable storage medium, and executed in the electronic device, so as to implement the corresponding methods of the embodiments of the present application.
  • the units mentioned above may be implemented in hardware, by instructions in the form of software, or by a combination of software and hardware.
  • the steps of the method embodiments of the present application may be completed by integrated logic circuits of hardware in a processor and/or by instructions in the form of software; the steps of the methods disclosed in the embodiments of the present application may be directly embodied as being executed by a hardware decoding processor, or by a combination of hardware and software in a decoding processor.
  • the software may be located in random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, and other storage media mature in the art.
  • the storage medium is located in the memory, and the processor reads the information in the memory, and completes the steps in the above method embodiments in combination with its hardware.
  • FIG. 13 is a schematic structural diagram of an electronic device 700 provided by an embodiment of the present application.
  • the electronic device 700 includes at least a processor 710 and a computer-readable storage medium 720 .
  • the processor 710 and the computer-readable storage medium 720 may be connected through a bus or other means.
  • the computer-readable storage medium 720 is used for storing a computer program 721
  • the computer program 721 includes computer instructions
  • the processor 710 is used for executing the computer instructions stored in the computer-readable storage medium 720 .
  • the processor 710 is the computing core and the control core of the electronic device 700, which is suitable for implementing one or more computer instructions, and is specifically suitable for loading and executing one or more computer instructions to implement corresponding method processes or corresponding functions.
  • the processor 710 may also be referred to as a central processing unit (Central Processing Unit, CPU).
  • the processor 710 may include, but is not limited to: a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a Field Programmable Gate Array (Field Programmable Gate Array, FPGA) Or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and the like.
  • the computer-readable storage medium 720 may be a high-speed RAM memory or a non-volatile memory (Non-Volatile Memory), such as at least one disk memory; optionally, it may also be at least one computer-readable storage medium located away from the aforementioned processor 710.
  • the computer-readable storage medium 720 includes, but is not limited to, volatile memory and/or non-volatile memory.
  • the non-volatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable ROM, PROM), an erasable programmable read-only memory (Erasable PROM, EPROM), an electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or a flash memory.
  • Volatile memory may be Random Access Memory (RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synch-link DRAM (SLDRAM), and Direct Rambus RAM (DR RAM).
  • the electronic device 700 may be an encoding terminal, an encoder, or an encoding framework involved in the embodiments of the present application;
  • the computer-readable storage medium 720 stores first computer instructions, which are loaded and executed by the processor 710 to implement the corresponding steps of the encoding method provided by the embodiments of the present application; in other words, the first computer instructions in the computer-readable storage medium 720 are loaded by the processor 710 to execute the corresponding steps, which are not repeated here to avoid repetition.
  • the electronic device 700 may be a decoding end, a decoder, or a decoding framework involved in the embodiments of the present application;
  • the computer-readable storage medium 720 stores second computer instructions, which are loaded and executed by the processor 710 to implement the corresponding steps of the decoding method provided by the embodiments of the present application; in other words, the second computer instructions in the computer-readable storage medium 720 are loaded by the processor 710 to execute the corresponding steps, which are not repeated here to avoid repetition.
  • an embodiment of the present application further provides a computer-readable storage medium (Memory), where the computer-readable storage medium is a memory device in the electronic device 700 for storing programs and data.
  • the computer-readable storage medium 720 may include a built-in storage medium in the electronic device 700, and may also include an extended storage medium supported by the electronic device 700.
  • the computer-readable storage medium provides storage space in which the operating system of the electronic device 700 is stored.
  • one or more computer instructions suitable for being loaded and executed by the processor 710 are also stored in the storage space, and these computer instructions may be one or more computer programs 721 (including program codes).
  • a computer program product or computer program comprising computer instructions stored in a computer readable storage medium.
  • the data processing device 700 may be a computer; the processor 710 reads the computer instructions from the computer-readable storage medium 720 and executes them, so that the computer performs the encoding method or the decoding method provided in the various optional manners above.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means.

Abstract

Embodiments of the present application provide an encoding method, a decoding method, an encoder, a decoder, and an electronic device, which can improve compression performance. The encoding method includes: dividing a target image frame into a plurality of image blocks, where a target image block among the plurality of image blocks includes a target chroma block; if a neural network-based chroma prediction mode can be used to perform intra prediction on the target chroma block, selecting an optimal prediction mode from the neural network-based chroma prediction mode and a conventional prediction mode; obtaining a target residual block based on a target prediction block obtained with the optimal prediction mode; and encoding the target residual block, a permission flag, and a control flag to obtain a code stream.

Description

Encoding Method, Decoding Method, Encoder, Decoder, and Electronic Device
Technical Field
The embodiments of the present application relate to the technical field of image encoding and decoding, and more particularly, to an encoding method, a decoding method, an encoder, a decoder, and an electronic device.
Background
Digital video compression technology mainly compresses huge digital image and video data to facilitate transmission and storage. With the surge of Internet video and people's ever-increasing requirements for video definition, although existing digital video compression standards can achieve video decompression, it is still necessary to pursue better digital video compression techniques to improve compression performance.
Summary
Embodiments of the present application provide an encoding method, a decoding method, an encoder, a decoder, and an electronic device, which can improve compression performance.
In one aspect, an embodiment of the present application provides an encoding method, including:
dividing a target image frame into a plurality of image blocks, where a target image block among the plurality of image blocks includes a target chroma block;
if a neural network-based chroma prediction mode can be used to perform intra prediction on the target chroma block, selecting an optimal prediction mode from the neural network-based chroma prediction mode and a conventional prediction mode;
obtaining a target residual block based on a target prediction block obtained with the optimal prediction mode;
encoding the target residual block, a permission flag, and a control flag to obtain a code stream, where the permission flag indicates whether the neural network-based chroma prediction mode is allowed to be used to perform intra prediction on the target chroma block, and the control flag indicates whether the neural network-based chroma prediction mode is used to perform intra prediction on the target chroma block.
In another aspect, an embodiment of the present application provides a decoding method, including:
parsing a code stream to obtain a target residual block, a permission flag, and a control flag, where the permission flag indicates whether a neural network-based chroma prediction mode is allowed to be used to perform intra prediction on the target chroma block, and the control flag indicates whether the neural network-based chroma prediction mode is used to perform intra prediction on the target chroma block;
if the permission flag indicates that the neural network-based chroma prediction mode can be used to perform intra prediction on the target chroma block, and the control flag indicates that the neural network-based chroma prediction mode is used to perform intra prediction on the target chroma block, performing intra prediction on the target chroma block using the neural network-based chroma prediction mode to obtain a target prediction block;
obtaining a target image frame based on the target residual block and the target prediction block.
In another aspect, an embodiment of the present application provides an encoder for performing the method of the above first aspect or its implementations; specifically, the encoder includes functional units for performing the method of the above first aspect or its implementations.
In another aspect, an embodiment of the present application provides a decoder for performing the method of the above second aspect or its implementations; specifically, the decoder includes functional units for performing the method of the above second aspect or its implementations.
In another aspect, an embodiment of the present application provides an electronic device, including:
a processor adapted to implement computer instructions; and
a computer-readable storage medium storing computer instructions adapted to be loaded by the processor to execute the method of any one of the above first to second aspects or their implementations.
In another aspect, an embodiment of the present application provides a computer-readable storage medium storing computer instructions that, when read and executed by a processor of a computer device, cause the computer device to execute the method of any one of the above first to second aspects or their implementations.
In another aspect, an embodiment of the present application provides a computer program product or computer program, comprising computer instructions stored in a computer-readable storage medium; a processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to execute the method of any one of the above first to second aspects or their implementations.
In the embodiments of the present application, by introducing the neural network-based chroma prediction mode, selecting, when that mode can be used to intra-predict the target chroma block, the optimal prediction mode from the neural network-based chroma prediction mode and the conventional prediction mode, and then obtaining the target prediction block based on the optimal prediction mode, compression performance can be improved.
Brief Description of the Drawings
FIG. 1 is a schematic block diagram of an encoding framework provided by an embodiment of the present application.
FIG. 2 is a schematic diagram of the specific directions of the 33 angular prediction modes provided by an embodiment of the present application.
FIG. 3 is a schematic flowchart of the MIP mode provided by an embodiment of the present application.
FIG. 4 is a schematic block diagram of a decoding framework provided by an embodiment of the present application.
FIG. 5 is a schematic flowchart of an extended encoding framework based on the encoding framework shown in FIG. 1.
FIG. 6 is a schematic flowchart of an encoding method provided by an embodiment of the present application.
FIG. 7 and FIG. 8 are schematic structural diagrams of the input of the neural network-based chroma prediction mode when the input video format is YUV420, provided by an embodiment of the present application.
FIG. 9 is a schematic flowchart of the first training strategy provided by an embodiment of the present application.
FIG. 10 is a schematic flowchart of a decoding method of an embodiment of the present application.
FIG. 11 is a schematic block diagram of an encoder of an embodiment of the present application.
FIG. 12 is a schematic block diagram of a decoder of an embodiment of the present application.
FIG. 13 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part, not all, of the embodiments of the present application. Based on the embodiments of the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present application.
The solutions provided by the embodiments of the present application can be applied to the technical field of digital video coding, for example, image coding and decoding, video coding and decoding, hardware video coding and decoding, dedicated-circuit video coding and decoding, and real-time video coding and decoding. The solutions provided by the embodiments of the present application can be combined with the Audio Video coding Standard (AVS), the second-generation AVS standard (AVS2), or the third-generation AVS standard (AVS3), including but not limited to the H.264/Audio Video coding (AVC) standard, the H.265/High Efficiency Video Coding (HEVC) standard, and the H.266/Versatile Video Coding (VVC) standard. The solutions provided by the embodiments of the present application can be used for lossy compression of images as well as for lossless compression; the lossless compression may be visually lossless compression or mathematically lossless compression.
In the digital video encoding process, an encoder reads unequal numbers of luma-component pixels and chroma-component pixels from an original video sequence in a given color format; that is, the encoder reads a monochrome or color image and then encodes it. A monochrome image may include luma-component pixels, while a color image may include chroma-component pixels and, optionally, luma-component pixels as well. The color format of the original video sequence may be a luma-chroma (YCbCr, YUV) format or a red-green-blue (RGB) format, among others. After reading a monochrome or color image, the encoder partitions it into block data and encodes the block data. The block data may be a coding tree unit (CTU) or a coding unit (CU); a CTU may be further partitioned into several CUs, which may be rectangular or square blocks. That is, the encoder may encode on a CTU or CU basis. Today's encoders generally adopt a hybrid coding framework, which typically includes operations such as intra and inter prediction, transform and quantization, inverse transform and inverse quantization, in-loop filtering, and entropy coding. Intra prediction refers only to information of the same image frame and predicts the pixel information within the current partition block to eliminate spatial redundancy; inter prediction may refer to image information of different frames and uses motion estimation to search for the motion vector information that best matches the current partition block, eliminating temporal redundancy; the transform converts the predicted image block into the frequency domain, redistributing its energy, and, combined with quantization, removes information to which the human eye is insensitive, eliminating visual redundancy; entropy coding eliminates character redundancy based on the current context model and the probability information of the binary code stream.
For ease of understanding, the encoding framework provided by the present application is briefly introduced first.
FIG. 1 is a schematic block diagram of an encoding framework 100 provided by an embodiment of the present application.
As shown in FIG. 1, the encoding framework 100 may include an intra prediction unit 180, an inter prediction unit 170, a residual unit 110, a transform and quantization unit 120, an entropy coding unit 130, an inverse transform and inverse quantization unit 140, and a loop filtering unit 150. Optionally, the encoding framework 100 may further include a decoded picture buffer unit 160. The encoding framework 100 may also be called a hybrid coding framework.
In the encoding framework 100, the intra prediction unit 180 or the inter prediction unit 170 predicts the image block to be encoded and outputs a prediction block. The residual unit 110 computes a residual block based on the prediction block and the image block to be encoded, i.e., the difference between the prediction block and the image block to be encoded. Through transform and quantization in the transform and quantization unit 120, the residual block is stripped of information insensitive to the human eye, eliminating visual redundancy. Optionally, the residual block before transform and quantization may be called a time-domain residual block, and the time-domain residual block after transform and quantization may be called a frequency residual block or frequency-domain residual block. After receiving the transformed and quantized coefficients output by the transform and quantization unit 120, the entropy coding unit 130 outputs a code stream based on them. For example, the entropy coding unit 130 may eliminate character redundancy according to a target context model and the probability information of the binary code stream; for instance, it may perform context-based adaptive binary arithmetic coding (CABAC). The entropy coding unit 130 may also be called a header information coding unit. Optionally, in the present application, the image block to be encoded may also be called an original image block or target image block; a prediction block may also be called a predicted image block or image prediction block, or a prediction signal or prediction information; a reconstructed block may also be called a reconstructed image block or image reconstruction block, or a reconstruction signal or reconstruction information. In addition, at the encoding end the image block to be encoded may also be called a coding block or coded image block, and at the decoding end a decoding block or decoded image block. The image block to be encoded may be a CTU or a CU.
In short, the encoding framework 100 computes the residual between the prediction block and the image block to be encoded to obtain a residual block, which, after transform, quantization, and other processes, is transmitted to the decoding end. After receiving and parsing the code stream, the decoding end obtains the residual block through inverse transform, inverse quantization, and other steps, and superimposes it on the prediction block obtained by decoder-side prediction to obtain the reconstructed block.
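As a minimal illustration of this residual mechanism (a sketch under simplified assumptions — integer pixel arrays and 8-bit clipping — not the codec's actual implementation):

    import numpy as np

    def compute_residual(original_block, prediction_block):
        # Encoder side: the residual block is the difference between the
        # image block to be encoded and its prediction block.
        return original_block.astype(np.int32) - prediction_block.astype(np.int32)

    def reconstruct(prediction_block, residual_block):
        # Decoder side: superimpose the decoded residual onto the prediction
        # block to obtain the reconstructed block (clipped to the 8-bit range).
        return np.clip(prediction_block.astype(np.int32) + residual_block,
                       0, 255).astype(np.uint8)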
It should be noted that the inverse transform and inverse quantization unit 140, the loop filtering unit 150, and the decoded picture buffer unit 160 in the encoding framework 100 can be used to form a decoder. In effect, the intra prediction unit 180 or the inter prediction unit 170 can predict the image block to be encoded based on existing reconstructed blocks, which ensures that the encoding end and the decoding end share a consistent understanding of the reference frames. In other words, the encoder replicates the decoder's processing loop and can therefore produce the same prediction as the decoding end. Specifically, the quantized transform coefficients are inverse-transformed and inverse-quantized by the inverse transform and inverse quantization unit 140 to replicate the decoder-side approximate residual block. The approximate residual block, added to the prediction block, may pass through the loop filtering unit 150 to smooth out blocking artifacts and other effects caused by block-based processing and quantization. The image blocks output by the loop filtering unit 150 may be stored in the decoded picture buffer unit 160 for prediction of subsequent images.
The intra prediction unit 180 can be used for intra prediction, which refers only to information of the same image frame and predicts the pixel information within the image block to be encoded, eliminating spatial redundancy; the frames used for intra prediction may be I-frames. For example, following a left-to-right, top-to-bottom coding order, the image block to be encoded may use the top-left, top, and left image blocks as reference information for its prediction, and in turn serves as reference information for the next image block; in this way the whole image can be predicted. If the input digital video is in a color format, e.g., YUV 4:2:0, every 4 pixels of each image frame consist of 4 Y components and 2 UV components, and the encoding framework 100 may encode the Y components (i.e., luma blocks) and the UV components (i.e., chroma blocks) separately. Similarly, the decoding end may decode accordingly by format. The inter prediction unit 170 can be used for inter prediction, which may refer to image information of different frames and uses motion estimation to search for the motion vector information that best matches the image block to be encoded, eliminating temporal redundancy; the frames used for inter prediction may be P-frames and/or B-frames, where P-frames are forward-predicted frames and B-frames are bi-directionally predicted frames.
For intra prediction, the image block to be encoded can be predicted with angular and non-angular prediction modes to obtain a prediction block; based on rate-distortion information computed from the prediction block and the image block to be encoded, the optimal prediction mode for the block is selected and transmitted to the decoding end in the code stream. The decoding end parses out the prediction mode, predicts the prediction block of the target decoding block, and superimposes the time-domain residual block obtained from the code stream to obtain the reconstructed block. Through successive generations of digital video coding standards, the non-angular modes have remained relatively stable, consisting of a mean (DC) mode and a planar mode, while the angular modes have kept increasing with the evolution of the standards. Taking the international H-series standards as an example, H.264/AVC has only 8 angular prediction modes and 1 non-angular prediction mode; H.265/HEVC extends these to 33 angular and 2 non-angular prediction modes. In H.266/VVC, intra prediction is further extended: for luma blocks there are 67 traditional prediction modes plus the non-traditional matrix weighted intra prediction (MIP) mode; the traditional modes comprise the planar mode (mode 0), the DC mode (mode 1), and the angular modes (modes 2 to 66). FIG. 2 is a schematic diagram of the specific directions of the 33 angular prediction modes provided by an embodiment of the present application. As shown in FIG. 2, the 33 angular modes are divided into a horizontal class and a vertical class: the horizontal class covers H+32 (mode 2) to H-32 (mode 17), and the vertical class covers V-32 (mode 18) to V+32 (mode 34). V0 (mode 26) and H0 (mode 10) represent the vertical and horizontal directions respectively, and the prediction direction of every other angular mode can be regarded as an angular offset from the vertical or horizontal direction. For chroma blocks, the VVC reference software test platform (VVC TEST MODEL, VTM) provides, in addition to the planar, DC, and angular modes, the cross-component linear model prediction (CCLM) mode. The MIP mode is currently unique to VVC, while CCLM-like modes also exist in other advanced standards, such as AV1's Chroma from Luma (CfL) mode and AVS3's Two Step Cross-component Prediction Mode (TSCPM).
MIP derives from neural network-based prediction technology, which employs a fully connected neural network.
FIG. 3 is a schematic flowchart of the MIP mode provided by an embodiment of the present application.
As shown in FIG. 3, using the K columns of reconstructed pixels to the left of the prediction block, the K rows above it, and the K columns at its top-left as input, the predicted pixel values of the image block to be encoded, i.e., its prediction block, are obtained through 3 fully connected layers with nonlinear activation functions and 1 linear fully connected layer. Reconstructed pixels may also be called reconstructed pixel values or reconstructed pixel points. For blocks to be encoded of different shapes, rate-distortion screening is performed over multiple sets of fully connected network parameters, i.e., network weights; the optimal set is selected for prediction, and the index of this set is written into the code stream. Network weights may include parameters such as matrices and biases. Compared with a full neural network, MIP has been simplified in many respects, including the network parameters and the number of input samples, and ultimately performs prediction as a vector-matrix multiplication. In MIP, for a block to be encoded of width N and height M, MIP selects the W reconstructed pixels in the row above the block and the H reconstructed pixels in the column to its left as input. If the pixels at these positions have not yet been reconstructed, they can be handled as in traditional prediction methods. MIP generates predicted values mainly in three steps: averaging of the reference pixels, matrix-vector multiplication, and linear interpolation upsampling. MIP applies to blocks from 4x4 to 32x32 in size. For a rectangular block whose short side is 4, the optimum is chosen from 16 pre-trained sets of 16-column, 4-row matrices and biases (i.e., network weights); when the short side is 8, from 8 pre-trained sets of 16-column, 8-row matrices and biases; and when the short side is 16, from 6 pre-trained sets of 64-column, 8-row matrices and biases. The multiple sets of matrices and biases corresponding to blocks of the specific sizes above can be obtained by merging the network weights of multiple trained neural networks. It should be noted that the MIP tool described above derives from intra prediction based on fully connected neural networks, but it is limited to predicting luma blocks only. Moreover, MIP has too many model variants, which makes training rather complex. At present there is no good fully-connected-network-based scheme for predicting chroma blocks.
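The three MIP steps just described can be sketched as follows (a simplified illustration: the matrix A, bias b, and a 4x4 reduced grid are placeholder assumptions, not the trained VVC parameters):

    import numpy as np

    def mip_predict(top_refs, left_refs, A, b, out_w, out_h):
        # Step 1: reference averaging - downsample each boundary to 4 samples.
        top = top_refs.reshape(4, -1).mean(axis=1)
        left = left_refs.reshape(4, -1).mean(axis=1)
        x = np.concatenate([top, left])              # 8-entry input vector

        # Step 2: matrix-vector multiplication plus bias gives a coarse
        # prediction on a reduced 4x4 grid (A assumed to be 16x8, b 16-entry).
        coarse = (A @ x + b).reshape(4, 4)

        # Step 3: bilinear interpolation up to the full block size.
        ys, xs = np.linspace(0, 3, out_h), np.linspace(0, 3, out_w)
        pred = np.empty((out_h, out_w))
        for i, y in enumerate(ys):
            y0, y1, wy = int(y), min(int(y) + 1, 3), y - int(y)
            row = (1 - wy) * coarse[y0] + wy * coarse[y1]
            for j, xv in enumerate(xs):
                x0, x1, wx = int(xv), min(int(xv) + 1, 3), xv - int(xv)
                pred[i, j] = (1 - wx) * row[x0] + wx * row[x1]
        return pred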
It should be understood that FIG. 1 to FIG. 3 are only examples of the present application and should not be construed as limiting the present application.
For example, the loop filtering unit 150 in the encoding framework 100 may include a deblocking filter (DBF) and sample adaptive offset (SAO) filtering; DBF removes blocking artifacts, and SAO removes ringing artifacts. In other embodiments of the present application, the encoding framework 100 may employ a neural network-based loop filtering algorithm to improve video compression efficiency; in other words, the encoding framework 100 may be a hybrid video coding framework based on deep-learning neural networks. In one implementation, on top of the deblocking filter and sample adaptive offset, a convolutional neural network-based model may be used to compute the filtered result of the pixels. The network structure of the loop filtering unit 150 may be the same or different for the luma and chroma components. Considering that the luma component carries more visual information, the luma component may also be used to guide the filtering of the chroma components, improving the reconstruction quality of the chroma components.
FIG. 4 is a schematic block diagram of a decoding framework 200 provided by an embodiment of the present application.
As shown in FIG. 4, the decoding framework 200 may include an entropy decoding unit 210, an inverse transform and inverse quantization unit 220, a residual unit 230, an intra prediction unit 240, an inter prediction unit 250, a loop filtering unit 260, and a decoded picture buffer unit 270.
After the entropy decoding unit 210 receives and parses the code stream to obtain the prediction block and the frequency-domain residual block, the inverse transform and inverse quantization unit 220 applies inverse transform and inverse quantization to the frequency-domain residual block to obtain the time-domain residual block; the residual unit 230 superimposes the prediction block obtained by the intra prediction unit 240 or the inter prediction unit 250 on the time-domain residual block output by the inverse transform and inverse quantization unit 220 to obtain the reconstructed block. For example, the intra prediction unit 240 or the inter prediction unit 250 may obtain the prediction block by decoding the header information of the code stream.
The embodiments of the present application provide a fully-connected-neural-network-based scheme for predicting chroma blocks, which makes use of both the chroma reconstructed part and the luma reconstructed part to predict the chroma block to be predicted. FIG. 5 is a schematic flowchart of an encoding framework 100-1 provided by an embodiment of the present application. As shown in FIG. 5, the encoding framework 100-1 may include a neural network-based chroma intra prediction unit 190, which can predict the chroma block to be predicted. It should be noted that the encoding framework 100-1 is an extension of the encoding framework 100; for the other units in the encoding framework 100-1, reference may be made to the related description of the encoding framework 100, which is not repeated here to avoid repetition.
FIG. 6 is a schematic flowchart of an encoding method 300 provided by an embodiment of the present application. It should be understood that the encoding method 300 may be performed by an encoding end, for example applied to the encoding framework 100-1 shown in FIG. 5. For ease of description, the encoding end is taken as an example below.
As shown in FIG. 6, the encoding method 300 may include:
S310: dividing a target image frame into a plurality of image blocks, where a target image block among the plurality of image blocks includes a target chroma block;
S320: if a neural network-based chroma prediction mode can be used to perform intra prediction on the target chroma block, selecting an optimal prediction mode from the neural network-based chroma prediction mode and a conventional prediction mode;
S330: obtaining a target residual block based on a target prediction block obtained with the optimal prediction mode;
S340: encoding the target residual block, a permission flag, and a control flag to obtain a code stream, where the permission flag indicates whether the neural network-based chroma prediction mode is allowed to be used to perform intra prediction on the target chroma block, and the control flag indicates whether the neural network-based chroma prediction mode is used to perform intra prediction on the target chroma block.
In short, when performing chroma block prediction, if the neural network-based chroma prediction mode can be used to perform intra prediction on the target chroma block, the encoding end selects the optimal prediction mode from the neural network-based chroma prediction mode and the conventional prediction mode, and the identifier of the optimal prediction mode is encoded into the code stream as the control flag for the decoding end to read.
By introducing the neural network-based chroma prediction mode, selecting, when that mode can be used to intra-predict the target chroma block, the optimal prediction mode from the neural network-based chroma prediction mode and the conventional prediction mode, and then obtaining the target prediction block based on the optimal prediction mode, compression performance can be improved.
The performance-improvement effect is illustrated below with experimental results.
In one experiment, the parameters of the reference software VTM-10.0 were set as follows:
1. The number of reference lines of the target chroma block is set to K. The fully connected model applies to three shapes of chroma blocks, i.e., three fully connected models: for chroma blocks of shape 4xN and Nx4 (N>=4), of shape 8xN and Nx8 (N>=8), and of shape 16xN and Nx16 (N>=16), respectively.
2. Each fully connected model contains only one set of network weights.
3. The training set used consists of partial frames of the 8-bit videos in Class B, C, D, E, and F of the VVC test sequences.
Based on the above settings, the test results under the All-intra configuration are shown in Table 1 below:
Table 1
Video class  Video name  Y-component gain  U-component gain  V-component gain
Class B MarketPlace -0.17% -0.64% -0.30%
  RitualDance -0.16% -0.24% -0.29%
  Cactus -2.61% -8.44% -10.07%
  BasketballDrive -0.62% -1.20% -1.52%
  BQTerrace -0.09% -0.72% 0.32%
Class C BasketballDrill -3.29% -23.81% -16.74%
  BQMall -0.42% -0.74% 0.32%
  PartyScene -0.36% -0.15% 0.84%
  RaceHorses -0.53% -0.09% -0.17%
Class D BasketballPass -0.58% -1.07% -0.40%
  BQSquare -0.02% -0.57% 0.01%
  BlowingBubbles -0.36% -0.35% -0.22%
  RaceHorses -0.61% -0.19% 0.22%
Class E FourPeople -1.53% -8.55% -8.04%
  Johnny -2.12% -12.42% -12.61%
  KristenAndSara -2.38% -15.96% -13.98%
Class F BasketballDrillText -4.01% -23.63% -16.80%
  ArenaOfValor -2.71% -6.87% -11.36%
  SlideEditing -4.66% -18.57% -14.08%
  SlideShow -0.73% -1.77% -0.50%
Here, "-" indicates a decrease in BD-rate, i.e., a performance improvement.
In another experiment, the parameters of the reference software VTM-10.0 were set as follows:
1. The number of reference lines of the target chroma block is set to K. The fully connected model applies to three shapes of chroma blocks, i.e., three fully connected models: for chroma blocks of shape 4xN and Nx4 (N>=4), of shape 8xN and Nx8 (N>=8), and of shape 16xN and Nx16 (N>=16), respectively.
2. Each fully connected model contains only one set of network weights.
3. The training set used is the DIV2K image set; DIV2K is a training set that does not come from the test-sequence videos.
Based on the above settings, the test results under the All-intra configuration are shown in Table 2 below:
Table 2
Video class  Video name  Y-component gain  U-component gain  V-component gain
Class C BasketballDrill -0.39% -0.73% 0.20%
  BQMall -0.32% -0.56% 0.03%
  PartyScene -0.29% -0.72% 0.10%
  RaceHorses -0.51% -0.25% -0.34%
Class D BasketballPass -0.64% -1.46% -0.66%
  BQSquare -0.07% -0.81% -0.20%
  BlowingBubbles -0.34% -0.87% -0.74%
  RaceHorses -0.64% -0.67% -0.20%
Class E FourPeople -0.17% -0.60% -0.36%
  Johnny -0.20% -1.09% -0.79%
  KristenAndSara -0.26% -0.95% -0.62%
Here, "-" indicates a decrease in BD-rate, i.e., a performance improvement.
As can be seen from Table 1 and Table 2, the solution provided by the embodiments of the present application can improve compression performance.
In some embodiments of the present application, S320 may include:
performing intra prediction on the target chroma block using the neural network-based chroma prediction mode to obtain a first prediction block;
performing intra prediction on the target chroma block using the conventional prediction mode to obtain a second prediction block;
if the rate-distortion cost of the first prediction block is lower than that of the second prediction block, determining the neural network-based chroma prediction mode as the optimal prediction mode; if the rate-distortion cost of the first prediction block is higher than that of the second prediction block, determining the conventional prediction mode as the optimal prediction mode.
In other words, whether the neural network-based chroma intra prediction mode is selected is decided by the encoding end. At the encoding end, when screening intra prediction modes, the neural network-based chroma intra prediction and the conventional prediction modes undergo rate-distortion screening together. If a conventional mode has the lower cost, the conventional prediction mode is selected; if the neural network-based chroma prediction mode has the lower cost, the neural network-based prediction mode is selected. The selected mode is encoded into the code stream for the decoding end to read. At the decoding end, when the parsed chroma intra prediction mode is the neural network-based chroma prediction mode, the corresponding neural network model is used for prediction; when a conventional mode is parsed, the corresponding conventional mode is used for prediction.
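A sketch of this joint rate-distortion screening (the mode objects and the rd_cost function are illustrative placeholders, not part of the reference software):

    def select_optimal_mode(target_block, nn_mode, conventional_modes, rd_cost):
        # The NN-based chroma mode competes with every conventional mode;
        # the prediction with the lowest rate-distortion cost wins.
        best_mode = nn_mode
        best_cost = rd_cost(target_block, nn_mode.predict(target_block), nn_mode)
        for mode in conventional_modes:
            cost = rd_cost(target_block, mode.predict(target_block), mode)
            if cost < best_cost:
                best_mode, best_cost = mode, cost
        # The control flag written into the code stream records the outcome.
        chroma_nn_flag = 1 if best_mode is nn_mode else 0
        return best_mode, chroma_nn_flag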
In some embodiments of the present application, the first prediction block may be obtained in the following manner:
predicting the target chroma block using, as input, the reconstructed part adjacent to the target chroma block, the reconstructed part of the target luma block, and the reconstructed part adjacent to the target luma block, to obtain the first prediction block.
In other words, for the neural network-based chroma prediction mode, the input is the reconstructed part adjacent to the target chroma block, the reconstructed part of the target luma block, and the reconstructed part adjacent to the target luma block. The reconstructed part may be at least one row or at least one column of reconstructed pixels. The reconstructed part adjacent to the target chroma block may include the reconstructed reference rows and the reconstructed reference columns adjacent to the target chroma block, or the reconstructed reference row above the target chroma block and the reconstructed reference column to its left. The reconstructed part adjacent to the target luma block may include the reconstructed reference row above the target luma block and the reconstructed reference column to its left. The reconstructed part of the target luma block may also be called the reconstructed block of the target luma block. In the embodiments of the present application, when a fully connected neural network is used to intra-predict the target chroma block, besides feeding the reference pixels around the chroma block into the network, the reconstructed block of the luma block at the corresponding position can also be fed in to assist the prediction and improve the prediction effect.
FIG. 7 and FIG. 8 are schematic structural diagrams of the input of the neural network-based chroma prediction mode when the input video format is YUV420, provided by an embodiment of the present application. Specifically, FIG. 7 is a schematic structural diagram of the chroma-block pixels used as input under the neural network-based chroma prediction mode, and FIG. 8 is a schematic structural diagram of the luma-block pixels used as input. As shown in FIG. 7, the input of the neural network-based chroma prediction mode may include the M rows of reconstructed pixels adjacent to the target chroma block N, the M columns of reconstructed pixels adjacent to the target chroma block N, the reconstructed part 2N of the target luma block, the 2M rows of reconstructed pixels adjacent to the reconstructed part 2N of the target luma block, and the 2M columns of reconstructed pixels adjacent to the reconstructed part 2N of the target luma block, where M is greater than or equal to 1. As an example, in FIG. 7 and FIG. 8, the target chroma block may be rectangular, with N equal to 8 and M equal to 2. Of course, FIG. 7 and FIG. 8 are only examples of the present application, and the embodiments do not limit the specific values of M and N. In other embodiments of the present application, when the input video format is YUV444 or YUV422, the ratio between luma and chroma changes, and the input size of the neural network-based chroma prediction mode should be adjusted accordingly.
In addition, the neural network involved in the present application may be a multi-layer fully connected neural network. The fully connected neural network consists of K (K greater than or equal to 1) fully connected layers, each followed by a nonlinear activation function such as ReLU or another activation function, and each containing M nodes. In one implementation, to meet the requirements of the fully connected neural network, the reconstructed reference rows above the target chroma block, the reconstructed reference columns to its left, the reconstructed reference rows above the target luma block, the reconstructed reference columns to its left, and the reconstructed block of the target luma block need to be arranged in order into a 1-dimensional vector, which serves as the input of the first layer of the fully connected network.
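A minimal sketch of flattening these inputs into the 1-dimensional first-layer vector, in the order described above (the array shapes assume the YUV420 layout of FIG. 7 and FIG. 8; all names are illustrative):

    import numpy as np

    def build_fc_input(chroma_top, chroma_left, luma_top, luma_left, luma_block):
        # Order: chroma top reference rows, chroma left reference columns,
        # luma top reference rows, luma left reference columns, and the
        # reconstructed luma block at the co-located position.
        parts = [chroma_top, chroma_left, luma_top, luma_left, luma_block]
        return np.concatenate([p.reshape(-1) for p in parts])

    # For an 8x8 chroma block with M=2 reference lines in YUV420, the
    # co-located luma block is 16x16 with 2M=4 reference lines:
    x = build_fc_input(np.zeros((2, 8)), np.zeros((8, 2)),
                       np.zeros((4, 16)), np.zeros((16, 4)), np.zeros((16, 16)))
    # x.shape == (16 + 16 + 64 + 64 + 256,) == (416,)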
Furthermore, in the embodiments of the present application, a chroma block of a particular shape can be predicted by its corresponding neural network, i.e., with its corresponding network weights. However, the embodiments do not specifically limit the correspondence between chroma blocks and neural networks, or between chroma blocks and network weights. For example, a chroma block of a particular shape may correspond to one neural network or to multiple neural networks, i.e., to one set of network weights or to multiple sets of network weights, which is not specifically limited in the embodiments of the present application.
The scheme for obtaining the first prediction block is described below for specific cases.
Case 1:
The neural network-based chroma prediction mode has one set of network weights.
In this case, the set of network weights can be used to perform intra prediction on the target chroma block to obtain the first prediction block.
Case 2:
The neural network-based chroma prediction mode includes multiple sets of network weights obtained by training the neural network according to the first training strategy. In this case, the first prediction block may be obtained as follows:
performing intra prediction on the target chroma block with each of the multiple sets of network weights to obtain multiple prediction blocks;
selecting, among the multiple prediction blocks, the prediction block with the smallest rate-distortion cost;
determining the prediction block with the smallest rate-distortion cost as the first prediction block.
In one implementation, the first training strategy refers to training the neural network in the following manner to obtain the multiple sets of network weights:
obtaining a training set, the training set including multiple training samples;
training the neural network on the training set, and if the neural network converges, testing the neural network on the training set to obtain the test results of the multiple training samples; based on the test results of the multiple training samples, reordering the multiple training samples in descending order and in ascending order respectively to obtain two sub-training sets, and using the two sub-training sets as training sets to retrain the neural network, until multiple sub-training sets are obtained, where the number of the multiple sub-training sets is equal to the number of the multiple sets of network weights;
training the neural network on the multiple sub-training sets to obtain the multiple sets of network weights.
In short, the test results on the previous training set can serve as a classifier: the training samples in the training set are reordered by cost and divided into several subclasses, from which multiple sets of network weights are further trained.
In one implementation, the test result includes at least one of the following: peak signal-to-noise ratio (Peak Signal to Noise Ratio, PSNR), sum of absolute differences (Sum of Absolute Differences, SAD), or sum of absolute transformed differences after the Hadamard transform (SATD).
FIG. 9 is a schematic flowchart of the first training strategy provided by an embodiment of the present application. As shown in FIG. 9, the network is first trained on the generated training set; when the network converges, it is tested on the training set to obtain the PSNR of every training sample, and the training set is reordered by per-sample PSNR: the half with the larger PSNR becomes sub-training set 1, and the half with the smaller PSNR becomes sub-training set 2. Of course, PSNR could be replaced by other metrics, such as the sum of absolute differences (SAD) or the sum of absolute transformed differences after the Hadamard transform (SATD). The above steps are repeated, further dividing sub-training set 1 into sub-training sets 1-1 and 1-2, and sub-training set 2 into sub-training sets 2-1 and 2-2, until the number of sub-training sets equals N; training on the N subsets then yields N sets of network weights. When the network weights under this training strategy are used at the encoding and decoding ends, the encoding end decides, through rate-distortion selection during prediction-mode screening, which specific set of network weights is selected.
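This recursive splitting can be sketched as follows (assuming PSNR as the metric and a power-of-two number of weight sets; train and evaluate are placeholder functions):

    def train_split(samples, train, evaluate, num_sets):
        # Train on the current set; when only one set of weights is wanted,
        # the trained weights form a leaf of the recursion.
        weights = train(samples)
        if num_sets == 1:
            return [weights]
        # Score every sample with the converged network, sort by PSNR, and
        # split into a higher-PSNR half and a lower-PSNR half.
        ranked = sorted(samples, key=lambda s: evaluate(weights, s), reverse=True)
        half = len(ranked) // 2
        return (train_split(ranked[:half], train, evaluate, num_sets // 2) +
                train_split(ranked[half:], train, evaluate, num_sets // 2))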
In one implementation, the optimal prediction mode is the neural network-based chroma prediction mode, and S340 may include:
encoding the target residual block, the permission flag, the control flag, and an index flag to obtain the code stream, where the index flag identifies the index of the target network weights used when performing intra prediction on the target chroma block with the neural network-based chroma prediction mode, the multiple sets of network weights including the target network weights.
Case 3:
The target image block includes a target luma block, and the neural network-based chroma prediction mode includes multiple sets of network weights obtained by training the neural network according to the second training strategy. In this case, the first prediction block may be obtained as follows:
determining the target luma prediction mode used by the target luma block;
performing intra prediction on the target chroma block using the set of network weights corresponding to the target luma prediction mode among the multiple sets of network weights, to obtain the first prediction block.
In one implementation, the second training strategy refers to training the neural network in the following manner to obtain the multiple sets of network weights:
dividing the training set, based on the multiple conventional prediction modes selected by the luma blocks in the training set, into multiple classes of training sets corresponding to the multiple conventional prediction modes respectively;
training the neural network on the multiple classes of training sets to obtain the multiple sets of network weights.
In short, the training set can be divided according to the intra prediction modes selected by the luma blocks in the training set, and multiple sets of network weights are then obtained on the divided training sets.
In one implementation, the multiple conventional prediction modes include multiple of the following types:
the planar mode or the matrix-weighted intra prediction (MIP) mode, the DC mode, the angular modes, and the wide-angle modes. The angular modes may apply to square chroma blocks, while the wide-angle modes may apply to non-square chroma blocks, i.e., chroma blocks whose width and height are unequal.
In other words, the second training strategy classifies the training set on the basis of the multiple conventional prediction modes selected by the luma blocks in the training set, i.e., one conventional prediction mode corresponds to one class of training set. Assuming the training set is divided into N classes according to the conventional prediction mode selected by the encoder, N sets of network weights can be trained on the N classes of training sets. In one implementation, N may be set to 6 classes: the Planar or MIP modes, the DC mode, angular modes 2-17, angular modes 18-33, angular modes 34-49, and angular modes 50-66. Since luma and chroma blocks have similar texture characteristics, when the encoding and decoding ends use the network weights under the second training strategy, the encoding end only needs to directly find the mode selected by the luma block at the position corresponding to the current chroma block, and can then select the corresponding set from the multiple sets of network weights according to the luma block's mode.
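Under this six-class grouping, the mapping from the co-located luma block's mode to a set of network weights could look like the following sketch (VVC luma mode numbers; how MIP is signalled is an assumption):

    def weight_set_index(luma_mode, is_mip=False):
        # Classes: {Planar or MIP}, {DC}, {2-17}, {18-33}, {34-49}, {50-66}.
        if is_mip or luma_mode == 0:     # Planar (mode 0) or MIP
            return 0
        if luma_mode == 1:               # DC
            return 1
        if 2 <= luma_mode <= 17:
            return 2
        if 18 <= luma_mode <= 33:
            return 3
        if 34 <= luma_mode <= 49:
            return 4
        return 5                         # angular modes 50-66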
In some embodiments of the present application, before S320, the method 300 may further include:
determining, according to the size of the target chroma block, whether the neural network-based chroma prediction mode can be used to perform intra prediction on the target chroma block.
In some embodiments of the present application, the determining, according to the size of the target chroma block, whether the neural network-based chroma prediction mode can be used to perform intra prediction on the target chroma block includes:
if the width of the target chroma block is 4, 8, or greater than or equal to 16, or if the height of the target chroma block is 4, 8, or greater than or equal to 16, determining that the neural network-based chroma prediction mode can be used to perform intra prediction on the target chroma block.
It should be noted that the above embodiments are only examples of the present application and should not be construed as limiting it.
For example, the embodiments of the present application do not limit the specific implementation of the fully connected neural network, including the number of nodes of the fully connected layers and the nonlinear activation functions; likewise, they do not specifically limit the number of reference columns or reference rows in the input part of the fully connected neural network; nor the sizes of the chroma blocks to which the fully connected neural network applies, or the structure and type of the neural network; nor the specific number of network weights that the second training strategy can train.
The preferred implementations of the present application have been described in detail above with reference to the accompanying drawings; however, the present application is not limited to the specific details of the above implementations. Within the scope of the technical concept of the present application, various simple variants of the technical solution of the present application are possible, and these simple variants all fall within the protection scope of the present application. For example, the specific technical features described in the above specific implementations may be combined in any suitable manner as long as there is no contradiction; to avoid unnecessary repetition, the various possible combinations are not described separately in the present application. For another example, the various implementations of the present application may also be combined arbitrarily, and as long as they do not depart from the idea of the present application, they should likewise be regarded as content disclosed by the present application.
It should also be understood that, in the various method embodiments of the present application, the magnitude of the sequence numbers of the above processes does not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
The encoding method according to the embodiments of the present application has been described above in detail from the encoding end's perspective with reference to FIG. 6 to FIG. 9; the decoding method according to the embodiments of the present application is described below from the decoding end's perspective with reference to FIG. 10.
FIG. 10 shows a schematic flowchart of a decoding method 400 according to an embodiment of the present application. The method 400 may be performed by a decoding framework that includes a neural network-based chroma intra prediction unit. In one implementation, a neural network-based chroma intra prediction unit may be added to the decoding framework described in FIG. 4 to perform the decoding method 400.
As shown in FIG. 10, the method 400 may include:
S410: parsing a code stream to obtain a target residual block, a permission flag, and a control flag, where the permission flag indicates whether a neural network-based chroma prediction mode is allowed to be used to perform intra prediction on the target chroma block, and the control flag indicates whether the neural network-based chroma prediction mode is used to perform intra prediction on the target chroma block;
S420: if the permission flag indicates that the neural network-based chroma prediction mode can be used to perform intra prediction on the target chroma block, and the control flag indicates that the neural network-based chroma prediction mode is used to perform intra prediction on the target chroma block, performing intra prediction on the target chroma block using the neural network-based chroma prediction mode to obtain a target prediction block;
S430: obtaining a target image frame based on the target residual block and the target prediction block.
In some embodiments of the present application, S420 may include:
predicting the target chroma block using, as input, the reconstructed part adjacent to the target chroma block, the reconstructed part of the target luma block, and the reconstructed part adjacent to the target luma block, to obtain the target prediction block.
In some embodiments of the present application, the neural network-based chroma prediction mode has one set of network weights, and S420 may include:
performing intra prediction on the target chroma block using the set of network weights to obtain the target prediction block.
In some embodiments of the present application, the neural network-based chroma prediction mode includes multiple sets of network weights obtained by training the neural network according to the first training strategy, and S420 may include:
parsing the code stream to obtain an index flag, the index flag identifying the index of the target network weights used when performing intra prediction on the target chroma block with the neural network-based chroma prediction mode, the multiple sets of network weights including the target network weights;
performing intra prediction on the target chroma block using the target network weights indicated by the index flag to obtain the target prediction block.
In some embodiments of the present application, the first training strategy refers to training the neural network in the following manner to obtain the multiple sets of network weights:
obtaining a training set, the training set including multiple training samples;
training the neural network on the training set, and if the neural network converges, testing the neural network on the training set to obtain the test results of the multiple training samples; based on the test results of the multiple training samples, reordering the multiple training samples in descending order and in ascending order respectively to obtain two sub-training sets, and using the two sub-training sets as training sets to retrain the neural network, until multiple sub-training sets are obtained, where the number of the multiple sub-training sets is equal to the number of the multiple sets of network weights;
training the neural network on the multiple sub-training sets to obtain the multiple sets of network weights.
In some embodiments of the present application, the test result includes at least one of the following: peak signal-to-noise ratio (PSNR), sum of absolute differences (SAD), or sum of absolute transformed differences after the Hadamard transform (SATD).
In some embodiments of the present application, the target image block includes a target luma block, and the neural network-based chroma prediction mode includes multiple sets of network weights obtained by training the neural network according to the second training strategy; S420 may include:
determining the target luma prediction mode used by the target luma block;
performing intra prediction on the target chroma block using the set of network weights corresponding to the target luma prediction mode among the multiple sets of network weights, to obtain the target prediction block.
In some embodiments of the present application, the second training strategy refers to training the neural network in the following manner to obtain the multiple sets of network weights:
based on the multiple classes of luma prediction modes available to chroma blocks, obtaining multiple classes of training sets corresponding to the multiple classes of luma prediction modes respectively, the multiple classes of luma prediction modes including the target luma prediction mode;
training the neural network on the multiple classes of training sets to obtain the multiple sets of network weights.
In some embodiments of the present application, the multiple classes of luma prediction modes include multiple of the following types:
the planar mode or the matrix-weighted intra prediction (MIP) mode, the DC mode, the angular modes, and the wide-angle modes.
It should be understood that the process of the decoding method 400 is the inverse of the encoding method 300; the steps of the decoding method 400 may refer to the corresponding steps of the encoding method 300 and, for brevity, are not repeated here.
The solution of the embodiments of the present application is described below with reference to specific syntax.
Table 3
(The syntax table of Table 3 is reproduced in the original publication as image PCTCN2020133597-appb-000001.)
Here, intra_bdpcm_chroma_flag indicates whether the chroma prediction mode is used, and intra_bdpcm_chroma_dir_flag indicates whether the vertical chroma prediction mode is used. ChromaNNEnabled is the permission flag of the neural network-based chroma prediction mode and may be a binary variable: a value of '1' means the neural network-based chroma prediction mode can be used, and a value of '0' means it cannot. The permission flag is derived from the size and color component of the current block: when the current block is a chroma block, it is '1' if any one of the following conditions is met: the width or height of the current block is 4, the width or height of the current block is 8, or the width or height of the current block is greater than or equal to 16. chroma_nn_flag is the control flag of the neural network-based chroma prediction mode and may be a binary variable: a value of '1' means the neural network-based chroma prediction mode is used, and a value of '0' means it is not used. chroma_nn_idx is the index flag of the network weights; the variable's value range depends on the number of sets of network weights of the neural network, and its value represents the index of the target network weights used.
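The derivation of ChromaNNEnabled just described can be written out directly (a sketch; the surrounding parsing context is omitted):

    def chroma_nn_enabled(width, height, is_chroma_block):
        # '1' if the current block is a chroma block and its width or height
        # is 4, is 8, or is greater than or equal to 16.
        if not is_chroma_block:
            return 0
        allowed = lambda d: d == 4 or d == 8 or d >= 16
        return 1 if (allowed(width) or allowed(height)) else 0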
Table 4
(The syntax table of Table 4 is reproduced in the original publication as image PCTCN2020133597-appb-000002.)
The semantics of the syntax in Table 4 are the same as in Table 3 and are not repeated here to avoid repetition.
Embodiment:
The intra prediction flow of this technical solution at the encoding end is as follows:
When the encoding end enters the intra mode search, it performs rate-distortion selection over the possible prediction modes; the encoder screens the prediction modes possible for the current block, and during this screening:
a) First, judge according to the ChromaNNEnabled condition whether the neural network chroma prediction mode can be used for the current block. If ChromaNNEnabled is '1', neural network-based chroma block prediction is attempted for the current block; if ChromaNNEnabled is '0', skip to c).
b) Select the corresponding predefined network weights according to the size of the current block.
If one set of network weights is predefined, that set can be used directly for prediction, and the prediction result further undergoes the rate-distortion screening operation; if the cost of the result under this prediction mode is determined to be smaller than that of the other prediction modes, chroma_nn_flag is set to '1', otherwise to '0'. If multiple sets of network weights are predefined, two cases arise depending on the training strategy of the network weights. If the multiple sets of network weights were trained according to the first training strategy described above, the encoder needs to attempt prediction with each of the sets one by one and further perform the rate-distortion screening operation to determine the set with the smallest cost among the multiple sets; if this cost is also smaller than that of the other conventional prediction modes, chroma_nn_flag is set to '1', otherwise to '0', and chroma_nn_idx is set to the index of the current weights (i.e., the target network weights). If the multiple sets of network weights were obtained according to the second training strategy described above, the encoder needs to find the prediction mode selected by the luma block at the position corresponding to the current chroma block (i.e., the target luma block), select, according to the classification principle of the network weights in training, the network weights corresponding to the prediction mode selected by the target luma block, and further use those weights for prediction and rate-distortion screening; if the cost of the result under this prediction mode is determined to be smaller than that of the other prediction modes, chroma_nn_flag is set to '1', otherwise to '0'.
c) If the intra prediction search for the current block has been completed, load the next prediction block for the intra mode search and jump to step a).
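The encoder-side flow in steps a) to c) could be sketched as follows (predict_with, rd_cost, and the weight-selection helpers are illustrative placeholders):

    def encode_chroma_nn(block, weight_sets, strategy, luma_mode,
                         rd_cost, best_conventional_cost):
        if not chroma_nn_enabled(block.width, block.height, True):
            return {"chroma_nn_flag": 0}                 # step a): skip to c)
        if len(weight_sets) == 1:
            candidates = [(0, weight_sets[0])]           # single predefined set
        elif strategy == "first":
            candidates = list(enumerate(weight_sets))    # try every set
        else:                                            # "second": by luma mode
            idx = weight_set_index(luma_mode)
            candidates = [(idx, weight_sets[idx])]
        best_idx, best_cost = None, float("inf")
        for idx, w in candidates:                        # step b)
            cost = rd_cost(block, predict_with(w, block))
            if cost < best_cost:
                best_idx, best_cost = idx, cost
        flags = {"chroma_nn_flag": 1 if best_cost < best_conventional_cost else 0}
        if flags["chroma_nn_flag"] and strategy == "first" and len(weight_sets) > 1:
            flags["chroma_nn_idx"] = best_idx            # index of target weights
        return flags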
The intra prediction flow of this technical solution at the decoding end is as follows:
The decoder obtains the code stream, parses it, and applies inverse transform, inverse quantization, and block-by-block prediction to the obtained residual information. If the block is an intra prediction block and the current color component is a chroma component:
a) First, judge according to the ChromaNNEnabled condition whether the neural network chroma prediction mode can be used for the current block. If ChromaNNEnabled is '0', skip to c).
b) When ChromaNNEnabled is '1', determine from the decoded chroma_nn_flag whether the current block is predicted using the neural network-based chroma prediction mode.
If one set of network weights is predefined, that set can be used directly for prediction. If multiple sets of network weights are predefined, two cases arise depending on the training strategy of the weights. If the multiple sets of network weights were trained according to the first training strategy described above, the decoder further obtains the weight index chroma_nn_idx and selects the corresponding network weights for prediction according to the index. If the multiple sets of network weights were trained according to the second training strategy described above, the decoder needs to find the prediction mode selected by the luma block at the position corresponding to the current chroma block (i.e., the target luma block), and select, according to the classification principle of the network weights in training, the network weights corresponding to the prediction mode selected by the target luma block to predict the current block.
c) If the intra reconstruction of the current block has been completed, load the next prediction block for intra mode prediction and jump to step a).
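And the mirrored decoder-side flow (again a sketch with the same placeholder helpers):

    def decode_chroma_nn(bitstream, block, weight_sets, strategy, luma_mode):
        if not chroma_nn_enabled(block.width, block.height, True):
            return None                                  # conventional path
        if bitstream.read_flag("chroma_nn_flag") == 0:
            return None                                  # conventional path
        if len(weight_sets) == 1:
            w = weight_sets[0]                           # single predefined set
        elif strategy == "first":
            w = weight_sets[bitstream.read_index("chroma_nn_idx")]
        else:                                            # "second": by luma mode
            w = weight_sets[weight_set_index(luma_mode)]
        return predict_with(w, block)                    # target prediction block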
The method embodiments of the present application have been described in detail above; the apparatus embodiments of the present application are described in detail below with reference to FIG. 11 to FIG. 13.
FIG. 11 is a schematic block diagram of an encoder 500 according to an embodiment of the present application.
As shown in FIG. 11, the encoder 500 may include:
a dividing unit 510, configured to divide a target image frame into a plurality of image blocks, where a target image block among the plurality of image blocks includes a target chroma block;
a selection unit 520, configured to: if a neural network-based chroma prediction mode can be used to perform intra prediction on the target chroma block, select an optimal prediction mode from the neural network-based chroma prediction mode and a conventional prediction mode;
a first processing unit 530, configured to obtain a target residual block based on a target prediction block obtained with the optimal prediction mode;
a second processing unit 540, configured to encode the target residual block, a permission flag, and a control flag to obtain a code stream, where the permission flag indicates whether the neural network-based chroma prediction mode is allowed to be used to perform intra prediction on the target chroma block, and the control flag indicates whether the neural network-based chroma prediction mode is used to perform intra prediction on the target chroma block.
In some embodiments of the present application, the selection unit 520 is specifically configured to:
perform intra prediction on the target chroma block using the neural network-based chroma prediction mode to obtain a first prediction block;
perform intra prediction on the target chroma block using the conventional prediction mode to obtain a second prediction block;
if the rate-distortion cost of the first prediction block is lower than that of the second prediction block, determine the neural network-based chroma prediction mode as the optimal prediction mode; if the rate-distortion cost of the first prediction block is higher than that of the second prediction block, determine the conventional prediction mode as the optimal prediction mode.
In some embodiments of the present application, the target image block includes a target luma block, and the selection unit 520 is specifically configured to:
predict the target chroma block using, as input, the reconstructed part adjacent to the target chroma block, the reconstructed part of the target luma block, and the reconstructed part adjacent to the target luma block, to obtain the first prediction block.
In some embodiments of the present application, the neural network-based chroma prediction mode has one set of network weights, and the selection unit 520 is specifically configured to:
perform intra prediction on the target chroma block using the set of network weights to obtain the first prediction block.
In some embodiments of the present application, the neural network-based chroma prediction mode includes multiple sets of network weights obtained by training the neural network according to the first training strategy, and the selection unit 520 is specifically configured to:
perform intra prediction on the target chroma block with each of the multiple sets of network weights to obtain multiple prediction blocks;
select, among the multiple prediction blocks, the prediction block with the smallest rate-distortion cost;
determine the prediction block with the smallest rate-distortion cost as the first prediction block.
In some embodiments of the present application, the first training strategy refers to training the neural network in the following manner to obtain the multiple sets of network weights:
obtaining a training set, the training set including multiple training samples;
training the neural network on the training set, and if the neural network converges, testing the neural network on the training set to obtain the test results of the multiple training samples; based on the test results of the multiple training samples, reordering the multiple training samples in descending order and in ascending order respectively to obtain two sub-training sets, and using the two sub-training sets as training sets to retrain the neural network, until multiple sub-training sets are obtained, where the number of the multiple sub-training sets is equal to the number of the multiple sets of network weights;
training the neural network on the multiple sub-training sets to obtain the multiple sets of network weights.
In some embodiments of the present application, the test result includes at least one of the following: peak signal-to-noise ratio (PSNR), sum of absolute differences (SAD), or sum of absolute transformed differences after the Hadamard transform (SATD).
In some embodiments of the present application, the optimal prediction mode is the neural network-based chroma prediction mode, and the second processing unit 540 is specifically configured to:
encode the target residual block, the permission flag, the control flag, and an index flag to obtain the code stream, the index flag identifying the index of the target network weights used when performing intra prediction on the target chroma block with the neural network-based chroma prediction mode, the multiple sets of network weights including the target network weights.
In some embodiments of the present application, the target image block includes a target luma block, and the neural network-based chroma prediction mode includes multiple sets of network weights obtained by training the neural network according to the second training strategy; the selection unit 520 is specifically configured to:
determine the target luma prediction mode used by the target luma block;
perform intra prediction on the target chroma block using the set of network weights corresponding to the target luma prediction mode among the multiple sets of network weights, to obtain the first prediction block.
In some embodiments of the present application, the second training strategy refers to training the neural network in the following manner to obtain the multiple sets of network weights:
dividing the training set, based on the multiple conventional prediction modes selected by the luma blocks in the training set, into multiple classes of training sets corresponding to the multiple conventional prediction modes respectively;
training the neural network on the multiple classes of training sets to obtain the multiple sets of network weights.
In some embodiments of the present application, the multiple conventional prediction modes include multiple of the following types:
the planar mode or the matrix-weighted intra prediction (MIP) mode, the DC mode, the angular modes, and the wide-angle modes.
In some embodiments of the present application, before selecting the optimal prediction mode from the neural network-based chroma prediction mode and the conventional prediction mode, the selection unit 520 is further configured to:
determine, according to the size of the target chroma block, whether the neural network-based chroma prediction mode can be used to perform intra prediction on the target chroma block.
In some embodiments of the present application, the selection unit 520 is specifically configured to:
if the width of the target chroma block is 4, 8, or greater than or equal to 16, or if the height of the target chroma block is 4, 8, or greater than or equal to 16, determine that the neural network-based chroma prediction mode can be used to perform intra prediction on the target chroma block.
FIG. 12 is a schematic block diagram of a decoder 600 according to an embodiment of the present application.
As shown in FIG. 12, the decoder 600 may include:
a parsing unit 610, configured to parse a code stream to obtain a target residual block, a permission flag, and a control flag, where the permission flag indicates whether a neural network-based chroma prediction mode is allowed to be used to perform intra prediction on the target chroma block, and the control flag indicates whether the neural network-based chroma prediction mode is used to perform intra prediction on the target chroma block;
a first processing unit 620, configured to: if the permission flag indicates that the neural network-based chroma prediction mode can be used to perform intra prediction on the target chroma block, and the control flag indicates that the neural network-based chroma prediction mode is used to perform intra prediction on the target chroma block, perform intra prediction on the target chroma block using the neural network-based chroma prediction mode to obtain a target prediction block;
a second processing unit 630, configured to obtain a target image frame based on the target residual block and the target prediction block.
In some embodiments of the present application, the first processing unit 620 is specifically configured to:
predict the target chroma block using, as input, the reconstructed part adjacent to the target chroma block, the reconstructed part of the target luma block, and the reconstructed part adjacent to the target luma block, to obtain the target prediction block.
In some embodiments of the present application, the neural network-based chroma prediction mode has one set of network weights, and the first processing unit 620 is specifically configured to:
perform intra prediction on the target chroma block using the set of network weights to obtain the target prediction block.
In some embodiments of the present application, the neural network-based chroma prediction mode includes multiple sets of network weights obtained by training the neural network according to the first training strategy, and the first processing unit 620 is specifically configured to:
parse the code stream to obtain an index flag, the index flag identifying the index of the target network weights used when performing intra prediction on the target chroma block with the neural network-based chroma prediction mode, the multiple sets of network weights including the target network weights;
perform intra prediction on the target chroma block using the target network weights indicated by the index flag to obtain the target prediction block.
In some embodiments of the present application, the first training strategy refers to training the neural network in the following manner to obtain the multiple sets of network weights:
obtaining a training set, the training set including multiple training samples;
training the neural network on the training set, and if the neural network converges, testing the neural network on the training set to obtain the test results of the multiple training samples; based on the test results of the multiple training samples, reordering the multiple training samples in descending order and in ascending order respectively to obtain two sub-training sets, and using the two sub-training sets as training sets to retrain the neural network, until multiple sub-training sets are obtained, where the number of the multiple sub-training sets is equal to the number of the multiple sets of network weights;
training the neural network on the multiple sub-training sets to obtain the multiple sets of network weights.
In some embodiments of the present application, the test result includes at least one of the following: peak signal-to-noise ratio (PSNR), sum of absolute differences (SAD), or sum of absolute transformed differences after the Hadamard transform (SATD).
In some embodiments of the present application, the target image block includes a target luma block, and the neural network-based chroma prediction mode includes multiple sets of network weights obtained by training the neural network according to the second training strategy; the first processing unit 620 is specifically configured to:
determine the target luma prediction mode used by the target luma block;
perform intra prediction on the target chroma block using the set of network weights corresponding to the target luma prediction mode among the multiple sets of network weights, to obtain the target prediction block.
In some embodiments of the present application, the second training strategy refers to training the neural network in the following manner to obtain the multiple sets of network weights:
based on the multiple classes of luma prediction modes available to chroma blocks, obtaining multiple classes of training sets corresponding to the multiple classes of luma prediction modes respectively, the multiple classes of luma prediction modes including the target luma prediction mode;
training the neural network on the multiple classes of training sets to obtain the multiple sets of network weights.
In some embodiments of the present application, the multiple classes of luma prediction modes include multiple of the following types:
the planar mode or the matrix-weighted intra prediction (MIP) mode, the DC mode, the angular modes, and the wide-angle modes.
It should be understood that the apparatus embodiments and the method embodiments may correspond to each other, and similar descriptions may refer to the method embodiments; to avoid repetition, details are not repeated here. Specifically, the encoder 500 shown in FIG. 11 may correspond to the corresponding subject executing the method 300 of the embodiments of the present application, and the aforementioned and other operations and/or functions of the units in the encoder 500 are respectively intended to implement the corresponding flows of the method 300 and the other methods; the decoder 600 shown in FIG. 12 may correspond to the corresponding subject executing the method 400 of the embodiments of the present application, and the aforementioned and other operations and/or functions of the units in the decoder 600 are respectively intended to implement the corresponding flows of the method 400 and the other methods.
It should also be understood that the units of the encoder 500 or the decoder 600 involved in the embodiments of the present application may be separately or wholly merged into one or several other units, or one or more of them may be further split into multiple functionally smaller units, which can implement the same operations without affecting the technical effects of the embodiments of the present application. The above units are divided based on logical functions; in practical applications, the function of one unit may also be implemented by multiple units, or the functions of multiple units by one unit. In other embodiments of the present application, the encoder 500 or the decoder 600 may also include other units; in practical applications these functions may also be implemented with the assistance of other units and may be implemented cooperatively by multiple units. According to another embodiment of the present application, the encoder 500 or the decoder 600 involved in the embodiments of the present application may be constructed, and the encoding or decoding method of the embodiments of the present application implemented, by running a computer program (including program code) capable of executing the steps of the corresponding method on a general-purpose computing device of a general-purpose computer that includes processing elements and storage elements such as a central processing unit (CPU), a random access storage medium (RAM), and a read-only storage medium (ROM). The computer program may be recorded on, for example, a computer-readable storage medium, loaded into an electronic device through the computer-readable storage medium, and run therein to implement the corresponding methods of the embodiments of the present application.
In other words, the units mentioned above may be implemented in hardware, by instructions in the form of software, or by a combination of software and hardware. Specifically, the steps of the method embodiments of the present application may be completed by integrated logic circuits of hardware in a processor and/or by instructions in the form of software; the steps of the methods disclosed in the embodiments of the present application may be directly embodied as being executed by a hardware decoding processor, or by a combination of hardware and software in a decoding processor. Optionally, the software may reside in a storage medium mature in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method embodiments in combination with its hardware.
FIG. 13 is a schematic structural diagram of an electronic device 700 provided by an embodiment of the present application.
As shown in FIG. 13, the electronic device 700 includes at least a processor 710 and a computer-readable storage medium 720, which may be connected by a bus or in other ways. The computer-readable storage medium 720 is used for storing a computer program 721, which includes computer instructions, and the processor 710 is used for executing the computer instructions stored in the computer-readable storage medium 720. The processor 710 is the computing core and control core of the electronic device 700; it is adapted to implement one or more computer instructions, and specifically adapted to load and execute one or more computer instructions to implement the corresponding method flows or corresponding functions.
As an example, the processor 710 may also be called a central processing unit (Central Processing Unit, CPU). The processor 710 may include, but is not limited to, a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on.
As an example, the computer-readable storage medium 720 may be a high-speed RAM memory or a non-volatile memory (Non-Volatile Memory), such as at least one disk memory; optionally, it may also be at least one computer-readable storage medium located away from the aforementioned processor 710. Specifically, the computer-readable storage medium 720 includes, but is not limited to, volatile memory and/or non-volatile memory. The non-volatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable ROM, PROM), an erasable programmable read-only memory (Erasable PROM, EPROM), an electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or a flash memory. The volatile memory may be a random access memory (Random Access Memory, RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synch-link DRAM (SLDRAM), and direct Rambus RAM (DR RAM).
In one implementation, the electronic device 700 may be the encoding end, encoder, or encoding framework involved in the embodiments of the present application; the computer-readable storage medium 720 stores first computer instructions, which are loaded and executed by the processor 710 to implement the corresponding steps of the encoding method provided by the embodiments of the present application; in other words, the first computer instructions in the computer-readable storage medium 720 are loaded by the processor 710 to execute the corresponding steps, which are not repeated here to avoid repetition.
In one implementation, the electronic device 700 may be the decoding end, decoder, or decoding framework involved in the embodiments of the present application; the computer-readable storage medium 720 stores second computer instructions, which are loaded and executed by the processor 710 to implement the corresponding steps of the decoding method provided by the embodiments of the present application; in other words, the second computer instructions in the computer-readable storage medium 720 are loaded by the processor 710 to execute the corresponding steps, which are not repeated here to avoid repetition.
According to another aspect of the present application, an embodiment of the present application further provides a computer-readable storage medium (Memory), which is a memory device in the electronic device 700 for storing programs and data, for example, the computer-readable storage medium 720. It can be understood that the computer-readable storage medium 720 here may include both a built-in storage medium in the electronic device 700 and an extended storage medium supported by the electronic device 700. The computer-readable storage medium provides storage space that stores the operating system of the electronic device 700; in addition, one or more computer instructions adapted to be loaded and executed by the processor 710 are also stored in the storage space, and these computer instructions may be one or more computer programs 721 (including program code).
According to another aspect of the present application, a computer program product or computer program is provided, comprising computer instructions stored in a computer-readable storage medium, for example, the computer program 721. In this case, the data processing device 700 may be a computer: the processor 710 reads the computer instructions from the computer-readable storage medium 720 and executes them, so that the computer performs the encoding method or the decoding method provided in the various optional manners above.
In other words, when implemented in software, it may be implemented in whole or in part in the form of a computer program product comprising one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the flows of the embodiments of the present application are run, or their functions implemented, in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means.
Those of ordinary skill in the art will appreciate that the units and process steps of the examples described in the embodiments disclosed herein can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present application.
Finally, it should be noted that the above is only the specific implementation of the present application, but the protection scope of the present application is not limited thereto; any person skilled in the art can easily think of changes or substitutions within the technical scope disclosed in the present application, and they should all be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (26)

  1. An encoding method, comprising:
    dividing a target image frame into a plurality of image blocks, wherein a target image block among the plurality of image blocks includes a target chroma block;
    if a neural network-based chroma prediction mode can be used to perform intra prediction on the target chroma block, selecting an optimal prediction mode from the neural network-based chroma prediction mode and a conventional prediction mode;
    obtaining a target residual block based on a target prediction block obtained with the optimal prediction mode;
    encoding the target residual block, a permission flag, and a control flag to obtain a code stream, wherein the permission flag indicates whether the neural network-based chroma prediction mode is allowed to be used to perform intra prediction on the target chroma block, and the control flag indicates whether the neural network-based chroma prediction mode is used to perform intra prediction on the target chroma block.
  2. The method according to claim 1, wherein the selecting an optimal prediction mode from the neural network-based chroma prediction mode and the conventional prediction mode comprises:
    performing intra prediction on the target chroma block using the neural network-based chroma prediction mode to obtain a first prediction block;
    performing intra prediction on the target chroma block using the conventional prediction mode to obtain a second prediction block;
    if the rate-distortion cost of the first prediction block is lower than the rate-distortion cost of the second prediction block, determining the neural network-based chroma prediction mode as the optimal prediction mode; if the rate-distortion cost of the first prediction block is higher than the rate-distortion cost of the second prediction block, determining the conventional prediction mode as the optimal prediction mode.
  3. The method according to claim 2, wherein the target image block includes a target luma block, and the performing intra prediction on the target chroma block using the neural network-based chroma prediction mode to obtain a first prediction block comprises:
    predicting the target chroma block using, as input, the reconstructed part adjacent to the target chroma block, the reconstructed part of the target luma block, and the reconstructed part adjacent to the target luma block, to obtain the first prediction block.
  4. The method according to claim 2, wherein the neural network-based chroma prediction mode has one set of network weights, and the performing intra prediction on the target chroma block using the neural network-based chroma prediction mode to obtain a first prediction block comprises:
    performing intra prediction on the target chroma block using the set of network weights to obtain the first prediction block.
  5. The method according to claim 2, wherein the neural network-based chroma prediction mode includes multiple sets of network weights obtained by training a neural network according to a first training strategy, and the performing intra prediction on the target chroma block using the neural network-based chroma prediction mode to obtain a first prediction block comprises:
    performing intra prediction on the target chroma block with each of the multiple sets of network weights to obtain multiple prediction blocks;
    selecting, among the multiple prediction blocks, the prediction block with the smallest rate-distortion cost;
    determining the prediction block with the smallest rate-distortion cost as the first prediction block.
  6. The method according to claim 5, wherein the first training strategy refers to training the neural network in the following manner to obtain the multiple sets of network weights:
    obtaining a training set, the training set including multiple training samples;
    training the neural network on the training set, and if the neural network converges, testing the neural network on the training set to obtain test results of the multiple training samples; based on the test results of the multiple training samples, reordering the multiple training samples in descending order and in ascending order respectively to obtain two sub-training sets, the two sub-training sets serving as training sets for retraining the neural network, until multiple sub-training sets are obtained, the number of the multiple sub-training sets being equal to the number of the multiple sets of network weights;
    training the neural network on the multiple sub-training sets to obtain the multiple sets of network weights.
  7. The method according to claim 6, wherein the test result includes at least one of the following: peak signal-to-noise ratio (PSNR), sum of absolute differences (SAD), or sum of absolute transformed differences after the Hadamard transform (SATD).
  8. The method according to claim 5, wherein the optimal prediction mode is the neural network-based chroma prediction mode, and the encoding the target residual block, the permission flag, and the control flag to obtain a code stream comprises:
    encoding the target residual block, the permission flag, the control flag, and an index flag to obtain the code stream, the index flag identifying the index of the target network weights used when performing intra prediction on the target chroma block with the neural network-based chroma prediction mode, the multiple sets of network weights including the target network weights.
  9. The method according to claim 2, wherein the target image block includes a target luma block, the neural network-based chroma prediction mode includes multiple sets of network weights obtained by training a neural network according to a second training strategy, and the performing intra prediction on the target chroma block using the neural network-based chroma prediction mode to obtain a first prediction block comprises:
    determining the target luma prediction mode used by the target luma block;
    performing intra prediction on the target chroma block using the set of network weights corresponding to the target luma prediction mode among the multiple sets of network weights, to obtain the first prediction block.
  10. The method according to claim 9, wherein the second training strategy refers to training the neural network in the following manner to obtain the multiple sets of network weights:
    dividing a training set, based on the multiple conventional prediction modes selected by the luma blocks in the training set, into multiple classes of training sets corresponding to the multiple conventional prediction modes respectively;
    training the neural network on the multiple classes of training sets to obtain the multiple sets of network weights.
  11. The method according to claim 10, wherein the multiple conventional prediction modes include multiple of the following types:
    the planar mode or the matrix-weighted intra prediction (MIP) mode, the DC mode, the angular modes, and the wide-angle modes.
  12. The method according to any one of claims 1 to 11, wherein before the selecting an optimal prediction mode from the neural network-based chroma prediction mode and the conventional prediction mode, the method further comprises:
    determining, according to the size of the target chroma block, whether the neural network-based chroma prediction mode can be used to perform intra prediction on the target chroma block.
  13. The method according to claim 12, wherein the determining, according to the size of the target chroma block, whether the neural network-based chroma prediction mode can be used to perform intra prediction on the target chroma block comprises:
    if the width of the target chroma block is 4, 8, or greater than or equal to 16, or if the height of the target chroma block is 4, 8, or greater than or equal to 16, determining that the neural network-based chroma prediction mode can be used to perform intra prediction on the target chroma block.
  14. A decoding method, comprising:
    parsing a code stream to obtain a target residual block, a permission flag, and a control flag, the permission flag indicating whether a neural network-based chroma prediction mode is allowed to be used to perform intra prediction on a target chroma block, the control flag indicating whether the neural network-based chroma prediction mode is used to perform intra prediction on the target chroma block;
    if the permission flag indicates that the neural network-based chroma prediction mode can be used to perform intra prediction on the target chroma block, and the control flag indicates that the neural network-based chroma prediction mode is used to perform intra prediction on the target chroma block, performing intra prediction on the target chroma block using the neural network-based chroma prediction mode to obtain a target prediction block;
    obtaining a target image frame based on the target residual block and the target prediction block.
  15. The method according to claim 14, wherein the performing intra prediction on the target chroma block using the neural network-based chroma prediction mode to obtain a target prediction block comprises:
    predicting the target chroma block using, as input, the reconstructed part adjacent to the target chroma block, the reconstructed part of the target luma block, and the reconstructed part adjacent to the target luma block, to obtain the target prediction block.
  16. The method according to claim 14, wherein the neural network-based chroma prediction mode has one set of network weights, and the performing intra prediction on the target chroma block using the neural network-based chroma prediction mode to obtain a target prediction block comprises:
    performing intra prediction on the target chroma block using the set of network weights to obtain the target prediction block.
  17. The method according to claim 14, wherein the neural network-based chroma prediction mode includes multiple sets of network weights obtained by training a neural network according to a first training strategy, and the performing intra prediction on the target chroma block using the neural network-based chroma prediction mode to obtain a target prediction block comprises:
    parsing the code stream to obtain an index flag, the index flag identifying the index of the target network weights used when performing intra prediction on the target chroma block with the neural network-based chroma prediction mode, the multiple sets of network weights including the target network weights;
    performing intra prediction on the target chroma block using the target network weights indicated by the index flag to obtain the target prediction block.
  18. The method according to claim 17, wherein the first training strategy refers to training the neural network in the following manner to obtain the multiple sets of network weights:
    obtaining a training set, the training set including multiple training samples;
    training the neural network on the training set, and if the neural network converges, testing the neural network on the training set to obtain test results of the multiple training samples; based on the test results of the multiple training samples, reordering the multiple training samples in descending order and in ascending order respectively to obtain two sub-training sets, the two sub-training sets serving as training sets for retraining the neural network, until multiple sub-training sets are obtained, the number of the multiple sub-training sets being equal to the number of the multiple sets of network weights;
    training the neural network on the multiple sub-training sets to obtain the multiple sets of network weights.
  19. The method according to claim 18, wherein the test result includes at least one of the following: peak signal-to-noise ratio (PSNR), sum of absolute differences (SAD), or sum of absolute transformed differences after the Hadamard transform (SATD).
  20. The method according to claim 14, wherein the target image block includes a target luma block, the neural network-based chroma prediction mode includes multiple sets of network weights obtained by training a neural network according to a second training strategy, and the performing intra prediction on the target chroma block using the neural network-based chroma prediction mode to obtain a target prediction block comprises:
    determining the target luma prediction mode used by the target luma block;
    performing intra prediction on the target chroma block using the set of network weights corresponding to the target luma prediction mode among the multiple sets of network weights, to obtain the target prediction block.
  21. The method according to claim 20, wherein the second training strategy refers to training the neural network in the following manner to obtain the multiple sets of network weights:
    based on the multiple classes of luma prediction modes available to chroma blocks, obtaining multiple classes of training sets corresponding to the multiple classes of luma prediction modes respectively, the multiple classes of luma prediction modes including the target luma prediction mode;
    training the neural network on the multiple classes of training sets to obtain the multiple sets of network weights.
  22. The method according to claim 21, wherein the multiple classes of luma prediction modes include multiple of the following types:
    the planar mode or the matrix-weighted intra prediction (MIP) mode, the DC mode, the angular modes, and the wide-angle modes.
  23. An encoder, comprising:
    a dividing unit, configured to divide a target image frame into a plurality of image blocks, wherein a target image block among the plurality of image blocks includes a target chroma block;
    a selection unit, configured to: if a neural network-based chroma prediction mode can be used to perform intra prediction on the target chroma block, select an optimal prediction mode from the neural network-based chroma prediction mode and a conventional prediction mode;
    a first processing unit, configured to obtain a target residual block based on a target prediction block obtained with the optimal prediction mode;
    a second processing unit, configured to encode the target residual block, a permission flag, and a control flag to obtain a code stream, the permission flag indicating whether the neural network-based chroma prediction mode is allowed to be used to perform intra prediction on the target chroma block, the control flag indicating whether the neural network-based chroma prediction mode is used to perform intra prediction on the target chroma block.
  24. A decoder, comprising:
    a parsing unit, configured to parse a code stream to obtain a target residual block, a permission flag, and a control flag, the permission flag indicating whether a neural network-based chroma prediction mode is allowed to be used to perform intra prediction on a target chroma block, the control flag indicating whether the neural network-based chroma prediction mode is used to perform intra prediction on the target chroma block;
    a first processing unit, configured to: if the permission flag indicates that the neural network-based chroma prediction mode can be used to perform intra prediction on the target chroma block, and the control flag indicates that the neural network-based chroma prediction mode is used to perform intra prediction on the target chroma block, perform intra prediction on the target chroma block using the neural network-based chroma prediction mode to obtain a target prediction block;
    a second processing unit, configured to obtain a target image frame based on the target residual block and the target prediction block.
  25. An electronic device, comprising:
    a processor adapted to execute a computer program; and
    a computer-readable storage medium storing a computer program that, when executed by the processor, implements the encoding method according to any one of claims 1 to 13, or implements the decoding method according to any one of claims 14 to 22.
  26. A computer-readable storage medium, comprising computer instructions adapted to be loaded by a processor to execute the encoding method according to any one of claims 1 to 13 or the decoding method according to any one of claims 14 to 22.
PCT/CN2020/133597 2020-12-03 2020-12-03 编码方法、解码方法、编码器、解码器以及电子设备 WO2022116085A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2020/133597 WO2022116085A1 (zh) 2020-12-03 2020-12-03 编码方法、解码方法、编码器、解码器以及电子设备
CN202080065143.4A CN114868386B (zh) 2020-12-03 2020-12-03 编码方法、解码方法、编码器、解码器以及电子设备

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/133597 WO2022116085A1 (zh) 2020-12-03 2020-12-03 编码方法、解码方法、编码器、解码器以及电子设备

Publications (1)

Publication Number Publication Date
WO2022116085A1 (zh)

Family

ID=81853780

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/133597 WO2022116085A1 (zh) 2020-12-03 2020-12-03 编码方法、解码方法、编码器、解码器以及电子设备

Country Status (2)

Country Link
CN (1) CN114868386B (zh)
WO (1) WO2022116085A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024137862A1 (en) * 2022-12-22 2024-06-27 Bytedance Inc. Method, apparatus, and medium for video processing
CN115988223A (zh) * 2022-12-26 2023-04-18 阿里巴巴(中国)有限公司 帧内预测模式的确定、图像编码以及图像解码方法
CN115834897B (zh) * 2023-01-28 2023-07-25 深圳传音控股股份有限公司 处理方法、处理设备及存储介质
CN115955574B (zh) * 2023-03-10 2023-07-04 宁波康达凯能医疗科技有限公司 一种基于权重网络的帧内图像编码方法、装置及存储介质

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5026092B2 (ja) * 2007-01-12 2012-09-12 三菱電機株式会社 動画像復号装置および動画像復号方法
JP5393573B2 (ja) * 2010-04-08 2014-01-22 株式会社Nttドコモ 動画像予測符号化装置、動画像予測復号装置、動画像予測符号化方法、動画像予測復号方法、動画像予測符号化プログラム、及び動画像予測復号プログラム
WO2019115865A1 (en) * 2017-12-13 2019-06-20 Nokia Technologies Oy An apparatus, a method and a computer program for video coding and decoding
US20210056390A1 (en) * 2018-01-26 2021-02-25 Mediatek Inc. Method and Apparatus of Neural Networks with Grouping for Video Coding
US20190289327A1 (en) * 2018-03-13 2019-09-19 Mediatek Inc. Method and Apparatus of Loop Filtering for VR360 Videos
CN108900838B (zh) * 2018-06-08 2021-10-15 宁波大学 一种基于hdr-vdp-2失真准则的率失真优化方法
US10499081B1 (en) * 2018-06-19 2019-12-03 Sony Interactive Entertainment Inc. Neural network powered codec
CN110971897B (zh) * 2018-09-28 2021-06-29 杭州海康威视数字技术股份有限公司 色度分量的帧内预测模式的编码、解码方法、设备和系统
US10999606B2 (en) * 2019-01-08 2021-05-04 Intel Corporation Method and system of neural network loop filtering for video coding
CN111294602B (zh) * 2019-03-14 2022-07-08 北京达佳互联信息技术有限公司 一种帧内预测模式编解码方法和装置及设备
CN110991346A (zh) * 2019-12-04 2020-04-10 厦门市美亚柏科信息股份有限公司 一种疑似吸毒人员识别的方法、装置及存储介质
CN111105035A (zh) * 2019-12-24 2020-05-05 西安电子科技大学 基于稀疏学习与遗传算法相结合的神经网络剪枝方法

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107925762A (zh) * 2015-09-03 2018-04-17 联发科技股份有限公司 基于神经网络的视频编解码处理方法和装置
US20200252654A1 (en) * 2017-10-12 2020-08-06 Mediatek Inc. Method and Apparatus of Neural Network for Video Coding
CN110677644A (zh) * 2018-07-03 2020-01-10 北京大学 一种视频编码、解码方法及视频编码帧内预测器
US10771807B1 (en) * 2019-03-28 2020-09-08 Wipro Limited System and method for compressing video using deep learning
CN110519595A (zh) * 2019-08-08 2019-11-29 浙江大学 一种基于频域量化损失估计的jpeg压缩图像复原方法
CN110602491A (zh) * 2019-08-30 2019-12-20 中国科学院深圳先进技术研究院 帧内色度预测方法、装置、设备及视频编解码系统

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024016156A1 (zh) * 2022-07-19 2024-01-25 Oppo广东移动通信有限公司 滤波方法、编码器、解码器、码流以及存储介质

Also Published As

Publication number Publication date
CN114868386A (zh) 2022-08-05
CN114868386B (zh) 2024-05-28

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20963932

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20963932

Country of ref document: EP

Kind code of ref document: A1