WO2022155923A1 - Encoding method, decoding method, encoder, decoder, and electronic device - Google Patents

Encoding method, decoding method, encoder, decoder, and electronic device

Info

Publication number
WO2022155923A1
Authority
WO
WIPO (PCT)
Prior art keywords
block
target
prediction
prediction mode
network weights
Prior art date
Application number
PCT/CN2021/073410
Other languages
English (en)
French (fr)
Inventor
戴震宇
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp., Ltd. filed Critical Guangdong Oppo Mobile Telecommunications Corp., Ltd.
Priority to PCT/CN2021/073410 priority Critical patent/WO2022155923A1/zh
Priority to CN202180083611.5A priority patent/CN116686288A/zh
Publication of WO2022155923A1 publication Critical patent/WO2022155923A1/zh

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103Selection of coding mode or of prediction mode
    • H04N19/105Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction

Definitions

  • the embodiments of the present application relate to the technical field of image encoding and decoding, and more particularly, to an encoding method, a decoding method, an encoder, a decoder, and an electronic device.
  • Digital video compression technology mainly compresses the huge amount of digital video data to facilitate transmission and storage.
  • Although digital video compression standards can realize video compression and decompression, it is still necessary to pursue better digital video compression technology to improve compression performance.
  • Embodiments of the present application provide an encoding method, a decoding method, an encoder, a decoder, and an electronic device, which can improve compression performance.
  • an encoding method comprising:
  • the permission flag is used to identify whether to allow intra-frame prediction of image blocks in the target image sequence using a first prediction mode, where the first prediction mode refers to a network weight based on online training a prediction mode for intra-frame prediction of image blocks;
  • a decoding method comprising:
  • the permission flag is used to identify whether to use the first prediction mode to perform intra-frame prediction on the image blocks in the target image sequence;
  • the first prediction mode refers to a prediction mode for performing intra-frame prediction on image blocks based on network weights trained online;
  • the target image frame is obtained.
  • an embodiment of the present application provides an encoder for executing the method in the first aspect or each of its implementations.
  • the encoder includes a functional unit for executing the method in the above-mentioned first aspect or each of its implementations.
  • an embodiment of the present application provides a decoder for executing the method in the second aspect or each of its implementations.
  • the decoder includes functional units for performing the methods in the second aspect or the respective implementations thereof.
  • an electronic device including:
  • a processor adapted to implement computer instructions
  • a computer-readable storage medium storing computer instructions adapted to be loaded by a processor and executed to perform the method in any one of the above-mentioned first to second aspects or implementations thereof.
  • an embodiment of the present application provides a computer-readable storage medium, where computer instructions are stored in the computer-readable storage medium, and when the computer instructions are read and executed by a processor of a computer device, the computer device is made to execute the method in any one of the above-mentioned first to second aspects or implementations thereof.
  • an embodiment of the present application provides a computer program product or computer program, where the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, causing the computer device to perform the method in any one of the above-mentioned first to second aspects or implementations thereof .
  • By means of the permission flag, the optimal prediction mode can be selected from the first prediction mode and the traditional prediction mode; the target prediction block is then obtained based on the optimal prediction mode, and further, the compression performance can be improved.
  • FIG. 1 is a schematic block diagram of a coding framework provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of specific directions of 33 angle prediction modes provided by an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of a MIP mode provided by an embodiment of the present application.
  • FIG. 4 is an example of a block diagram of a decoder-side reconstruction process provided by an embodiment of the present application.
  • FIG. 5 is a schematic block diagram of a decoding framework provided by an embodiment of the present application.
  • FIG. 6 is another schematic block diagram of an encoding framework provided by an embodiment of the present application.
  • FIG. 7 is a schematic flowchart of an encoding method provided by an embodiment of the present application.
  • FIG. 8 is a schematic block diagram of a network structure for training network weights provided by an embodiment of the present application.
  • FIG. 9 is another schematic block diagram of a decoding framework provided by an embodiment of the present application.
  • FIG. 10 is a schematic flowchart of a decoding method provided by an embodiment of the present application.
  • FIG. 11 is a schematic block diagram of an encoder according to an embodiment of the present application.
  • FIG. 12 is a schematic block diagram of a decoder according to an embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the solutions provided by the embodiments of the present application can be applied to the technical field of digital video coding, for example, the field of image coding and decoding, the field of video coding and decoding, the field of hardware video coding and decoding, the field of dedicated circuit video coding and decoding, and the field of real-time video coding and decoding.
  • the solutions provided in the embodiments of the present application may be combined with the Audio Video Coding Standard (AVS), the second-generation AVS standard (AVS2), or the third-generation AVS standard (AVS3).
  • AVS Audio Video Coding Standard
  • AVS2 second-generation AVS standard
  • AVS3 third-generation AVS standard
  • AVC H.264/Advanced Video Coding
  • HEVC H.265/High Efficiency Video Coding
  • VVC H.266/Versatile Video Coding
  • the solutions provided by the embodiments of the present application can be used to perform lossy compression (lossy compression) on images, and can also be used to perform lossless compression (lossless compression) on images.
  • the lossless compression may be visually lossless compression (visually lossless compression) or mathematically lossless compression (mathematically lossless compression).
  • For original video sequences of different color formats, the encoder reads unequal numbers of luminance-component pixels and chrominance-component pixels; that is, the encoder reads a black-and-white image or a color image, and then encodes the black-and-white image or the color image.
  • the black and white image may include pixels of luminance component
  • the color image may include pixels of chrominance component
  • the color image may further include pixels of luminance component.
  • the color format of the original video sequence may be a luminance chrominance (YCbCr, YUV) format or a red-green-blue (Red-Green-Blue, RGB) format, or the like.
  • Y represents luminance (Luma)
  • Cb (U) represents blue color difference
  • Cr (V) represents red color difference
  • U and V represent chroma (Chroma) for describing color difference information.
  • After the encoder reads a black-and-white or color image, it divides the image into block data and encodes the block data.
  • the block data can be a coding tree unit (Coding Tree Unit, CTU) or a coding unit block (Coding Unit, CU).
  • a coding tree unit can be further divided into several CUs, and a CU can be a rectangular block or a square block. That is, the encoder can encode based on CTUs or CUs.
  • Intra-frame prediction only refers to the information of the same frame image and predicts the pixel information in the current divided block, so as to eliminate spatial redundancy;
  • inter-frame prediction can refer to image information of different frames and use motion estimation to search for the motion vector information that best matches the current divided block, so as to eliminate temporal redundancy;
  • transformation converts the predicted image block to the frequency domain and redistributes its energy; combined with quantization, it can remove information to which the human eye is not sensitive, eliminating visual redundancy;
  • entropy coding can eliminate character redundancy based on the current context model and the probability information of the binary code stream.
  • FIG. 1 is a schematic block diagram of an encoding framework 100 provided by an embodiment of the present application.
  • the encoding framework 100 may include an intra prediction unit 180 , a residual unit 110 , a transform and quantization unit 120 , an entropy encoding unit 130 , an inverse transform and inverse quantization unit 140 , and a loop filtering unit 150 .
  • the encoding framework 100 may further include a decoded image buffer unit 160 and/or an inter-frame prediction unit 170 .
  • This coding framework 100 may also be referred to as a hybrid coding framework.
  • intra-prediction unit 180 or inter-prediction unit 170 may predict an image block to be encoded to output a predicted block.
  • the residual unit 110 may calculate a residual block, that is, a difference between the predicted block and the to-be-encoded image block, based on the predicted block and the to-be-encoded image block.
  • the residual block is transformed and quantized by the transform and quantization unit 120 to remove information insensitive to human eyes, so as to eliminate visual redundancy.
  • the residual block before transformation and quantization by the transform and quantization unit 120 may be referred to as a time-domain residual block
  • the time-domain residual block after transformation and quantization by the transform and quantization unit 120 may be referred to as a frequency-domain residual block.
  • the entropy encoding unit 130 may output a code stream based on the transformed and quantized coefficients. For example, the entropy encoding unit 130 may eliminate character redundancy according to the target context model and the probability information of the binary code stream. For example, the entropy encoding unit 130 may be used for context-based adaptive binary arithmetic coding (CABAC).
  • CABAC context-based adaptive binary arithmetic coding
  • the entropy encoding unit 130 may also be referred to as a header information encoding unit.
  • the image block to be encoded may also be referred to as an original image block or a target image block
  • a prediction block may also be referred to as a predicted image block or an image prediction block, and may also be referred to as a prediction signal or prediction information
  • Reconstruction blocks may also be referred to as reconstructed image blocks or image reconstruction blocks, and may also be referred to as reconstruction signals or reconstruction information.
  • the image block to be encoded may also be referred to as an encoding block or an encoded image block
  • the image block to be encoded may also be referred to as a decoding block or a decoded image block.
  • the image block to be encoded may be a CTU or a CU.
  • the encoding framework 100 calculates the residual between the prediction block and the image block to be encoded to obtain the residual block, and transmits the residual block to the decoding end through processes such as transformation and quantization. After the decoding end receives and parses the code stream, it obtains the residual block through steps such as inverse transformation and inverse quantization, and superimposes the prediction block predicted by the decoding end onto the residual block to obtain the reconstructed block.
  • the inverse transform and inverse quantization unit 140, the loop filtering unit 150 and the decoded image buffer unit 160 in the encoding framework 100 can be used to form a decoder.
  • the intra-frame prediction unit 180 or the inter-frame prediction unit 170 can predict the to-be-coded image block based on the existing reconstructed block, so as to ensure that the encoding end and the decoding end have the same understanding of the reference frame.
  • the encoder can duplicate the decoder's processing loop, which in turn can produce the same predictions as the decoder.
  • the quantized transform coefficients are inversely transformed and inversely quantized by the inverse transform and inverse quantization unit 140 to replicate the approximate residual block at the decoding end.
  • the in-loop filtering unit 150 can be used to smoothly filter out the effects of blockiness and other effects caused by block-based processing and quantization.
  • the image blocks output by the loop filtering unit 150 may be stored in the decoded image buffer unit 160 for use in prediction of subsequent images.
  • the intra-frame prediction unit 180 can be used for intra-frame prediction, and the intra-frame prediction only refers to the information of the same frame image, and predicts the pixel information in the image block to be encoded, so as to eliminate spatial redundancy;
  • the frame used for the intra-frame prediction can be an I frame.
  • For example, the image block to be coded can use the upper-left image block, the upper image block, and the left image block as reference information to predict the image block to be coded, and the coded image block in turn serves as reference information for the next image block, so the whole image can be predicted.
  • If the digital video is in color format, every 4 pixels of each image frame are composed of 4 Y components and 2 UV components (as in the YUV 4:2:0 format), and the encoding framework 100 can encode the Y component (i.e., the luma block) and the UV components (i.e., the chroma blocks) separately.
  • the decoding end can also perform corresponding decoding according to the format.
  • the inter-frame prediction unit 170 can be used for inter-frame prediction, and the inter-frame prediction can refer to image information of different frames, and use motion estimation to search for motion vector information that best matches the image block to be encoded, so as to eliminate temporal redundancy;
  • the frames may be P frames and/or B frames, where P frames refer to forward predicted frames and B frames refer to bidirectional predicted frames.
  • the intra-frame prediction can use the angular prediction mode and the non-angle prediction mode to predict the to-be-coded image block to obtain the predicted block.
  • The encoding end selects the optimal prediction mode for the image block and transmits the prediction mode to the decoding end through the code stream.
  • the decoding end parses out the prediction mode, predicts the prediction block of the target decoding block, and superimposes the time-domain residual block obtained through code stream transmission to obtain the reconstructed block.
  • the non-angular modes remain relatively stable, comprising the mean (DC) mode and the planar mode, while the number of angular modes continues to increase with the evolution of digital video codec standards.
  • the H.264/AVC standard has only 8 angular prediction modes and 1 non-angular prediction mode; H.265/HEVC extends this to 33 angular prediction modes and 2 non-angular prediction modes.
  • in H.266/VVC, the intra prediction modes are further expanded, and for luma blocks there are 67 traditional prediction modes as well as non-traditional prediction modes.
  • Non-traditional prediction modes may include Matrix weighted intra-frame prediction (MIP) modes.
  • the conventional prediction modes include: a planar mode of mode number 0, a DC mode of mode number 1, and angular prediction modes of mode number 2 to mode number 66.
  • FIG. 2 is a schematic diagram of the specific directions of the 33 angular prediction modes provided by an embodiment of the present application.
  • As shown in FIG. 2, the 33 angular prediction modes are divided into horizontal-class modes and vertical-class modes; the horizontal-class modes include H+32 (mode No. 2) to H-32 (mode No. 17), and the vertical-class modes include V-32 (mode No. 18) to V+32 (mode No. 34).
  • V0 (mode number 26) and H0 (mode number 10) represent the vertical and horizontal directions respectively, and the prediction directions of the remaining angle prediction modes can be regarded as an angular offset in the vertical or horizontal direction.
  • VTM VVC's reference software test platform (VVC Test Model)
  • CCLM cross-component linear model prediction
  • MIP mode is currently unique to VVC, while CCLM mode also exists in other advanced standards, such as AV1's Chroma from Luma (CfL) mode and AVS3's Two-Step Cross-component Prediction Mode (TSCPM).
  • CfL Chroma from Luma
  • TSCPM Two Step Cross-Component prediction mode
  • FIG. 3 is a schematic flowchart of a MIP mode provided by an embodiment of the present application.
  • As shown in FIG. 3, for an image block to be coded, the reconstructed pixels in the K rows above it and the K columns to its left are used as input, and a fully connected neural network is used to predict the image block to be coded, thereby obtaining the predicted pixels of the image block to be coded, that is, the prediction block of the image block to be encoded.
  • Reconstructed pixel points may also be referred to as reconstructed pixel values, and predicted pixel points may also be referred to as predicted pixel values.
  • the reference points around the image block to be encoded are used as inputs, and the fully connected neural network is used to predict the image block to be encoded. Then, the prediction block of the image block to be encoded is obtained.
  • the reference points around the image block to be encoded may be composed of K upper reference rows with a width of N+K and K left reference columns with a height of M around the image block to be encoded.
  • Rate-distortion screening needs to be performed over the multiple sets of parameters of the fully connected neural network, that is, the multiple sets of network weights; the optimal set of network weights is selected for prediction, and the index of this set of parameters is encoded into the code stream.
  • Network weights may include parameters such as matrices and biases.
  • the MIP mode is derived from a prediction mode based on a neural network, specifically from an intra-frame prediction mode based on a fully connected neural network.
  • the prediction mode based on neural network refers to using neural network to perform intra-frame prediction on image blocks.
  • the neural network-based prediction mode may include a nonlinear neural network-based prediction mode or a linear network-based prediction mode.
  • one or more sets of network weights are trained based on the pre-prepared training set.
  • one or more sets of network weights that have been pre-trained are read.
  • Compared with the neural-network-based prediction mode, the MIP mode has undergone many simplifications, including reductions in the network parameters and the number of input points, and finally completes the prediction in the form of a vector multiplied by a matrix.
  • For an image block to be encoded with a width of N and a height of M, the MIP mode selects the W reconstructed pixels in the row above the block and the H reconstructed pixels in the left column as input. If the pixels at these locations have not been reconstructed, they can be processed as in traditional prediction methods.
  • the prediction value generated by the MIP mode is mainly based on three steps, namely, averaging the reference pixels, matrix-vector multiplication, and linear-interpolation upsampling.
  • MIP mode works on blocks of 4x4 to 32x32 size.
  • For a rectangular block, if the short side of the rectangle is 4, the optimum is selected from 16 pre-trained sets of 16-column, 4-row matrices and biases (i.e., network weights); if the short side of the rectangle is 8, the optimum is selected from 8 pre-trained sets of matrices and biases with 16 columns and 8 rows; if the short side of the rectangle is 16, the optimum is selected from the pre-trained matrices.
  • the above-mentioned multiple sets of matrices and biases corresponding to blocks of a specific size can be obtained by combining the network weights of multiple trained neural networks.
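  • As a rough illustration of the three MIP steps (reference averaging, matrix-vector multiplication, and interpolation upsampling), the following is a minimal sketch; the matrix A and bias b stand in for one pre-trained set of network weights, and the final upsampling uses simple replication in place of the normative linear interpolation.

```python
import numpy as np

def mip_predict(top_ref, left_ref, A, b, out_w, out_h):
    """Schematic MIP-style prediction; shapes must be chosen consistently."""
    refs = np.concatenate([top_ref, left_ref]).astype(np.float32)
    # step 1: average neighbouring reference pairs to shorten the input vector
    avg = refs.reshape(-1, 2).mean(axis=1)
    # step 2: vector times matrix, plus bias, gives a reduced prediction block
    reduced = A @ avg + b                       # e.g. 16 values for a 4x4 core
    side = int(np.sqrt(reduced.size))
    core = reduced.reshape(side, side)
    # step 3: upsample the reduced block to the full block size (replication
    # here stands in for the linear-interpolation upsampling of the standard)
    return np.kron(core, np.ones((out_h // side, out_w // side), np.float32))
```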
  • FIG. 4 is an example of a block diagram of decoder-side reconstruction process provided by an embodiment of the present application.
  • the decoding end can first perform the mode-list derivation process. Specifically, the reference points are first input to a neural network including a hidden layer and an output layer, which outputs a sorted list of possible modes (sorted mode probability list); then, based on the prediction index obtained by parsing the code stream, the prediction mode to be used is selected from the mode list determined based on the list of possible modes, and the prediction of the image block is performed based on the selected prediction mode.
  • Alternatively, the reference points can first be input to a neural network including hidden layers 1 to 3 and an output layer, and the neural network outputs the prediction block.
  • the neural network in the embodiment of the present application may be composed of several layers of fully connected hidden layers and nonlinear activation functions.
  • the number of hidden layers included in FIG. 4 is only an example, and should not be construed as a limitation of the present application.
  • intra-frame prediction based on a convolutional neural network (CNN) or intra-frame prediction based on a recurrent neural network (RNN), etc. may also be used, which is not specifically limited in this embodiment of the present application.
  • FIG. 1 to FIG. 4 are only examples of the present application, and should not be construed as a limitation of the present application.
  • the loop filtering unit 150 in the encoding framework 100 may include a deblocking filter (DBF) and a sample adaptive offset (SAO) filter.
  • DBF deblocking filter
  • SAO sample adaptive offset filter
  • the coding framework 100 may use a neural network-based loop filtering algorithm to improve video compression efficiency.
  • the coding framework 100 may be a video coding hybrid framework based on a deep learning neural network.
  • a model based on a convolutional neural network may be used to calculate the result of filtering the pixels based on the deblocking filter and the sample adaptive compensation filtering.
  • the network structure of the in-loop filtering unit 150 on the luminance component and the chrominance component may be the same or different. Considering that the luminance component contains more visual information, the luminance component can also be used to guide the filtering of the chrominance component, so as to improve the reconstruction quality of the chrominance component.
  • FIG. 5 is a schematic block diagram of a decoding framework 200 provided by an embodiment of the present application.
  • the decoding framework 200 may include an entropy decoding unit 210, an inverse transform and inverse quantization unit 220, a residual unit 230, an intra prediction unit 240, an inter prediction unit 250, a loop filtering unit 260, and a decoded image buffer unit 270.
  • After the entropy decoding unit 210 receives and parses the code stream, it obtains the prediction block and the frequency-domain residual block. For the frequency-domain residual block, the inverse transform and inverse quantization unit 220 performs steps such as inverse transformation and inverse quantization to obtain the time-domain residual block, and the residual unit 230 superimposes the prediction block predicted by the intra-frame prediction unit 240 or the inter-frame prediction unit 250 onto the time-domain residual block output by the inverse transform and inverse quantization unit 220 to obtain the reconstructed block. For example, the intra prediction unit 240 or the inter prediction unit 250 may obtain the prediction block by decoding the header information of the code stream.
  • the embodiment of the present application provides an encoding method based on an online training neural network.
  • FIG. 6 is a schematic block diagram of an encoding framework 100-1 provided by an embodiment of the present application.
  • the encoder can use the original image frame to be encoded to generate a training set, and input the training set into the network structure to obtain the network weight of the neural network.
  • the network structure is a predefined structure.
  • multiple groups of network structures can be defined according to whether luminance or chrominance is predicted and according to the size and shape of the image blocks to be encoded.
  • the network can be a nonlinear network or a linear network. Based on this, the residual corresponding to the prediction block obtained by using the first prediction mode or the traditional prediction mode is transformed, quantized, and entropy-encoded into the code stream.
  • the control flag of the first prediction mode and the network weights used by the neural network also need to be written into the code stream.
  • the encoding framework 100-1 may include an online neural network unit for executing the first prediction mode. It should be noted that the encoding framework 100-1 is an extension of the encoding framework 100; that is, for the other units in the encoding framework 100-1, reference may be made to the relevant descriptions in the encoding framework 100, which are not repeated here.
  • Parameters used when training the neural network include, but are not limited to, the batch size and hyperparameters such as the learning rate or the choice of optimizer.
  • hyperparameters are parameters whose values are set before starting the learning process, not parameter data obtained through training. Usually, hyperparameters need to be optimized, and a set of optimal hyperparameters is selected for the learning machine to improve the performance and effect of learning.
  • the batch size is used to define the number of samples taken for a training session, or the batch size is used to define the number of samples taken for each iteration of training.
  • the value of batch size affects the optimization degree and speed of the model, and also directly affects the usage of processor memory.
  • the smaller the processor memory, the smaller the value of the batch size should be.
  • the loss function used in training the neural network includes, but is not limited to, the L1 function, the L2 function, or a discrete cosine transform (DCT)-domain function, etc.
  • DCT discrete cosine transform
  • the loss function also known as the cost function, is used for parametric estimation of the model.
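  • As a hedged illustration of the online-training setup described above, the following sketch shows a single training step; the placeholder linear model, the Adam optimizer, and the batch size and learning rate values are assumptions for illustration, not values fixed by the text.

```python
import torch

model = torch.nn.Linear(16 * 16, 8 * 8 * 2)   # placeholder prediction network
loss_fn = torch.nn.L1Loss()                   # L1 function; MSELoss() would be L2
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(data, labels):
    """data: (batch_size, 256) inputs; labels: (batch_size, 128) targets."""
    optimizer.zero_grad()
    loss = loss_fn(model(data), labels)
    loss.backward()                           # backpropagate the loss
    optimizer.step()                          # update the online-trained weights
    return loss.item()
```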
  • the encoding end trains one or more sets of network weights online, and the one or more sets of network weights are used to construct the first prediction mode.
  • the encoder may quantize the set or sets of network weights.
  • the encoder selects the optimal prediction mode from the first prediction mode and the legacy prediction mode.
  • the encoder encodes the selected mode into the code stream, that is, adds syntax elements.
  • the encoder encodes intra-prediction network weights trained online.
  • FIG. 7 is a schematic flowchart of an encoding method 300 provided by an embodiment of the present application. It should be understood that the encoding method 300 can be performed by the encoding end. For example, it is applied to the encoding frame 100-1 shown in FIG. 6 . For ease of description, the following description takes the encoding end as an example.
  • the encoding method 300 may include:
  • S310 Acquire a target image sequence and a permission flag, where the permission flag is used to identify whether to allow intra-frame prediction of image blocks in the target image sequence by using a first prediction mode, where the first prediction mode refers to an online training-based The prediction mode for intra-frame prediction of image blocks by network weights;
  • S350 Encode the permission flag and the target residual block to obtain a code stream.
  • By means of the permission flag, the optimal prediction mode can be selected from the first prediction mode and the traditional prediction mode; the target prediction block is then obtained based on the optimal prediction mode, and further, the compression performance can be improved.
  • Online training can effectively improve coding performance and reduce the complexity of intra-frame prediction based on neural network.
  • the amount of parameters of the online training network is much smaller than the pre-trained fixed model, and online training can reduce the computational complexity.
  • the intra-frame chroma prediction mode trained online is integrated into the VTM-10.0 test software, and the test results obtained are shown in Table 1. The prediction performance of the first prediction mode is described below with reference to Table 1.
  • Class-B is a 1080p test video
  • Class-C is a 480p test video
  • Class-E is a 720p test video.
  • BD-rate Bjøntegaard delta bit rate (a negative value indicates a bit-rate saving)
  • PSNR Peak Signal to Noise Ratio
  • if the code rate decreases and the PSNR increases, the new method has better performance.
  • the PSNR reflects the quality of the video.
  • BD-rate can be used to measure the performance of the encoding algorithm.
  • other parameters can also be used to measure the performance of the encoding algorithm to characterize the changes in the bit rate and PSNR of the video obtained by the new method compared to the video obtained by the original method. This is not specifically limited.
  • BD-PSNR Bjøntegaard delta peak signal-to-noise ratio
  • the training set used for the online training of the coding end may be reconstructed images.
  • the coding end needs to perform two passes of intra-frame coding: the reconstructed image obtained by the first pass of coding is used as the training set for the online training of the network weights, and in the second pass of coding the prediction mode obtained by online training competes as an alternative mode with the traditional intra-frame modes and the result is encoded. A sketch of this flow follows.
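  • The two-pass flow can be sketched as follows; the callables passed in are hypothetical stand-ins for the encoder's actual routines, which the text does not name.

```python
from typing import Callable, List

def encode_frame_two_pass(frame,
                          intra_encode: Callable,  # returns (bitstream, reconstruction)
                          train_online: Callable,  # trains weights on a reconstruction
                          make_mode: Callable,     # builds the first prediction mode
                          traditional_modes: List):
    _, recon = intra_encode(frame, traditional_modes)  # pass 1: traditional intra only
    weights = train_online(recon)                      # online training on reconstruction
    modes = traditional_modes + [make_mode(weights)]   # first prediction mode joins in
    return intra_encode(frame, modes)                  # pass 2: modes compete per block
```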
  • the embodiments of the present application do not specifically limit the network structure used in the first prediction mode.
  • the network structure used in the first prediction mode may be a nonlinear network structure, such as an RNN, LSTM, or CNN, or a linear network in which the predicted value is obtained by weighting several adjacent pixel values.
  • the sequence parameter set may include an enable flag (sps_aip_enabled_flag), which is a sequence-level control switch used to control whether to enable the first prediction mode in the current sequence.
  • a value of 1 means that the current sequence turns on the first prediction mode, and 0 means that the first prediction mode is turned off.
  • sps_aip_enabled_flag a sequence-level control switch used to control whether to enable the first prediction mode in the current sequence.
  • a value of 1 means that the current sequence turns on the first prediction mode, and 0 means that the first prediction mode is turned off.
  • the encoder can obtain the specific value of the permission flag, that is, whether the flag is 1 or 0, by querying the configuration file configured by the user.
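  • A minimal sketch of signalling this sequence-level switch follows; the BitWriter is a toy writer for illustration, not an API of any codec implementation.

```python
class BitWriter:
    """Toy bit writer used in the sketches of this description."""
    def __init__(self):
        self.bits = []

    def write_flag(self, value):
        self.bits.append(1 if value else 0)

def write_sps(writer, config):
    # sps_aip_enabled_flag: 1 = first prediction mode enabled for the sequence
    writer.write_flag(config.get("aip_enabled", False))
```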
  • In some embodiments, the permission flag is used to identify permission to use the first prediction mode to perform intra-frame prediction on image blocks in the target image sequence; the method 300 may further include:
  • An optimal prediction mode is selected from the first prediction mode and the traditional prediction mode, and intra-frame prediction is performed on the target image block by using the optimal prediction mode to obtain the target prediction block.
  • the selecting an optimal prediction mode from the first prediction mode and the traditional prediction mode includes:
  • If the rate-distortion cost of the first prediction block is lower than the rate-distortion cost of the second prediction block, the first prediction mode is determined as the optimal prediction mode; if the rate-distortion cost of the first prediction block is higher than the rate-distortion cost of the second prediction block, the traditional prediction mode is determined as the optimal prediction mode.
  • the encoder determines whether the first prediction mode is selected.
  • rate-distortion screening is performed jointly over the first prediction mode and the traditional prediction modes. If the cost of a traditional mode is lower, the traditional prediction mode is selected; if the cost of the first prediction mode is lower, the first prediction mode is selected. The selected mode is encoded into the code stream for the decoder to read.
  • When the parsed prediction mode is the first prediction mode, the first prediction mode is used for prediction; when the parsed prediction mode is a traditional mode, the corresponding traditional mode is used for prediction.
  • the prediction mode selection selects the prediction mode with the least cost among the traditional intra prediction modes and the intra prediction mode trained online, and the cost can be measured based on the following formula:
  • J = D + λ × R
  • where R represents the number of bits consumed by the current encoding method, D represents the distortion between the reconstructed block and the original block caused by the current encoding method, and λ is a variable coefficient.
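  • A small sketch of this rate-distortion selection follows; the candidate tuples and the λ value are illustrative only.

```python
def select_mode(candidates, lam=10.0):
    """candidates: iterable of (mode, distortion_D, bits_R); returns the mode
    minimising the cost J = D + lam * R."""
    return min(candidates, key=lambda c: c[1] + lam * c[2])[0]

# e.g. traditional: D=120, R=30 -> J=420; first mode: D=90, R=34 -> J=430,
# so the traditional mode is selected here.
best = select_mode([("traditional", 120, 30), ("first", 90, 34)])
```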
  • the target image block includes a target luminance block
  • the S330 may include:
  • the brightness control flag is used to identify whether to use the first prediction mode to perform intra-frame prediction on the target brightness block.
  • the encoding end needs to write the brightness control flag intra_aip_flag[x0][y0] into the code stream; the brightness control flag is an image-block-level flag that controls whether the first prediction mode is applied to the current brightness block. If the luminance control flag is 1, it means that the first prediction mode is applied to the current luminance block, and if it is 0, it means that the first prediction mode is not applied to the current luminance block.
  • the luminance control identifier is used to identify using the first prediction mode to perform intra-frame prediction on the target luminance block, where the first prediction mode includes multiple network weights;
  • the S330 may include:
  • the brightness index is used to indicate the target network weight in the multiple network weights
  • the target network weight is the network weight used by the first prediction mode when performing intra-frame prediction on the target luminance block.
  • the syntax elements can be as shown in the following table.
  • the brightness control flag and the brightness index are both image-block-level flags; the brightness control flag controls whether the first prediction mode is applied to the current brightness block. If the brightness control flag is 1, it means that the first prediction mode is applied to the current brightness block, and 0 means that it is not applied.
  • the luma index only exists when the current prediction block has multiple sets of network weights; if there are not multiple prediction weights, this syntax element does not exist.
  • the target image block includes a target chrominance block
  • the S330 may include:
  • the chroma control flag is used to identify whether to use the first prediction mode to perform intra-frame prediction on the target chroma block.
  • the syntax elements can be as shown in the following table.
  • the coding end needs to write the chroma control flag intra_chroma_aip_flag in the code stream, and the chroma control flag is an image block level flag to control whether the current chroma block is suitable for the first prediction mode. If the chrominance control flag is 1, it means that the first prediction mode is applicable to the current chroma block, and if it is 0, it means that the first prediction mode is not applicable to the current chroma block.
  • the chrominance control identifier is used to identify using the first prediction mode to perform intra-frame prediction on the target chrominance block, and the first prediction mode includes a plurality of network weights;
  • the S330 may include:
  • the chrominance index is used to indicate the target network weight among the multiple network weights, where the target network weight is the network weight used by the first prediction mode when performing intra-frame prediction on the target chroma block.
  • the syntax elements can be as shown in the following table.
  • the coding end needs to write not only the chroma control flag intra_chroma_aip_flag but also the chroma index intra_chroma_aip_mode into the code stream.
  • Both the chroma control flag and the chroma index are image block-level flags
  • the chrominance control flag controls whether the first prediction mode is applied to the current chrominance block: if the chrominance control flag is 1, it means that the first prediction mode is applied to the current chrominance block, and if it is 0, it means that it is not applied.
  • the chroma index exists only when the current prediction block has multiple sets of network weights. If there are no multiple prediction weights, this syntax element does not exist.
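  • Decoder-side reading of these block-level syntax elements can be sketched as follows; the luma index name intra_aip_mode is an assumption (the text names the element explicitly only for chroma), and read_flag/read_index are hypothetical callables pulling values from the code stream.

```python
def parse_aip_block_syntax(read_flag, read_index, is_chroma, num_weight_sets):
    syntax = {}
    flag_name = "intra_chroma_aip_flag" if is_chroma else "intra_aip_flag"
    syntax[flag_name] = read_flag()                     # 1: first prediction mode used
    if syntax[flag_name] and num_weight_sets > 1:       # index only with multiple sets
        idx_name = "intra_chroma_aip_mode" if is_chroma else "intra_aip_mode"  # luma name assumed
        syntax[idx_name] = read_index(num_weight_sets)  # target network weight index
    return syntax
```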
  • the method 300 may further include:
  • Acquire training data based on a target object, where the target object is the target image sequence, a slice in the target image sequence, the target image frame, or the target image block; train at least one set of network weights based on the training data, where the at least one set of network weights are the network weights that can be used when the first prediction mode is used to perform intra-frame prediction on image blocks in the target object.
  • the encoder can train a set of network weights online for each image frame, share one or several sets of network weights for each sequence, share one or several sets of network weights for each slice, or share one or several sets of network weights for each CTU; this embodiment of the present application does not specifically limit this.
  • the at least one set of network weights includes network weights for luma blocks
  • the training data includes training luma blocks for training the network weights for luma blocks
  • the reconstructed parts adjacent to the training luma blocks are the input for training the network weights for the luma blocks.
  • the at least one set of network weights includes network weights for chroma blocks
  • the training data includes training chroma blocks for training network weights for chroma blocks
  • the reconstructed part adjacent to the chroma block, the reconstructed part of the training chroma block, and the reconstructed part adjacent to the training chroma block are the input for training the network weights for the chroma block.
  • the training data includes training image blocks used to train the network weights of the first prediction mode; the average of the pixel values of a training image block is subtracted from the pixel values of that training image block to obtain a training set, and the at least one set of network weights is trained based on the training set.
  • the training set for online training is the data of the current image frame.
  • the shape of the training data can be different depending on the network structure.
  • the training data can choose the original pixel value or the reconstructed pixel value.
  • if the training data selects the reconstructed pixel values, the current frame needs to undergo a conventional intra-frame compression in advance to generate the reconstructed data.
  • Before encoding each image frame, taking a to-be-compressed video in the YUV420 format as an example, the original pixel values of the Y component, the Cb component, and the Cr component are obtained. The Y component is divided into several 16x16 blocks, and the Cb and Cr components are divided into 8x8 blocks, respectively. The divided Y, Cb, and Cr blocks are preprocessed by subtracting the mean value of the current block, and several training sets are generated by pairing according to the corresponding positions in the original image.
  • the training data of the training set are the de-averaged Y blocks
  • the shape is an array of Nx16x16x1, where N represents the number of training-data samples in the training set, 16x16 represents the block size of the training data, and 1 represents the number of channels, that is, the Y component;
  • the training labels are the de-averaged Cb blocks and Cr blocks, with shape Nx8x8x2, where N represents the number of training-label samples in the training set, 8x8 represents the block size of the training labels, and 2 represents the number of channels, that is, the Cb component and the Cr component.
  • the training data in the training set can be used as input data
  • the training labels in the training set can be used as output data.
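  • The training-set construction just described can be sketched with NumPy as follows, assuming YUV420 planes are already available as arrays; this is an illustrative helper, not code from the reference software.

```python
import numpy as np

def build_training_set(y, cb, cr):
    """y: (H, W) luma plane; cb, cr: (H//2, W//2) chroma planes (YUV420)."""
    data, labels = [], []
    for by in range(0, y.shape[0] - 15, 16):
        for bx in range(0, y.shape[1] - 15, 16):
            yb = y[by:by + 16, bx:bx + 16].astype(np.float32)
            cbb = cb[by // 2:by // 2 + 8, bx // 2:bx // 2 + 8].astype(np.float32)
            crb = cr[by // 2:by // 2 + 8, bx // 2:bx // 2 + 8].astype(np.float32)
            data.append((yb - yb.mean())[..., None])              # 16x16x1 input
            labels.append(np.stack([cbb - cbb.mean(),
                                    crb - crb.mean()], axis=-1))  # 8x8x2 label
    return np.array(data), np.array(labels)   # (N,16,16,1), (N,8,8,2)
```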
  • FIG. 8 is a schematic block diagram of a network structure for training network weights provided by an embodiment of the present application.
  • the network structure is a 3-layer convolutional neural network.
  • the first convolutional layer includes four 3x3x1 convolution kernels and a nonlinear activation function ReLU;
  • the second convolutional layer includes four 3x3x4 convolution kernels;
  • the third convolutional layer includes two 3x3x4 convolution kernels.
  • the input to the network is a de-averaged 16x16 luminance block, and the output is two 8x8 blocks representing the de-averaged Cb and Cr components, respectively.
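  • A PyTorch rendering of this 3-layer network might look as follows; note that the text does not state how the 16x16 input is reduced to the 8x8 outputs, so the stride-2 final convolution here is an assumption.

```python
import torch.nn as nn

class ChromaPredNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 4, kernel_size=3, padding=1)  # four 3x3x1 kernels
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2d(4, 4, kernel_size=3, padding=1)  # four 3x3x4 kernels
        self.conv3 = nn.Conv2d(4, 2, kernel_size=3, padding=1,
                               stride=2)      # two 3x3x4 kernels (stride 2 assumed)

    def forward(self, y_block):               # y_block: (N, 1, 16, 16), de-averaged
        x = self.relu(self.conv1(y_block))
        x = self.conv2(x)
        return self.conv3(x)                  # (N, 2, 8, 8): de-averaged Cb and Cr
```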
  • a neural network is trained based on the training data to obtain at least one floating point number; each of the at least one floating point number is quantized to obtain the at least one set of network weights in integer form.
  • quantizing the network weights can reduce the hardware complexity; after quantization, each network weight is changed from a 32-bit floating-point number to an integer with a low bit depth.
  • a neural network computed with integers is friendlier to hardware implementation. It should be understood that the embodiments of the present application do not limit the number of bits of the integer.
  • a set of network weights may include weight matrix parameters and/or bias parameters; based on this, the at least one floating-point number may include at least one matrix parameter in floating-point form and/or at least one bias parameter in floating-point form.
  • each of the at least one floating-point number is quantized based on the following formula:
  • w_i = round(w_f × 2^bitdepth)
  • where w_f represents the floating-point number, w_i represents the quantized integer, and bitdepth represents the quantization parameter used to convert the floating-point number into integer form; floating-point numbers are used in computers to approximate real numbers.
  • round() returns the result of rounding to the specified number of decimal places.
  • bitdepth = bitlength − 1, where bitlength represents the bit length of the quantized integer. For example, if bitlength is 6, then bitdepth is 5.
  • alternatively, each of the at least one floating-point number may be quantized based on the following formula:
  • w_i = round(w_f × Scale) − ((2^bitlength)/2) × ((W_fmax + W_fmin)/(W_fmax − W_fmin))
  • where w_f represents the floating-point number among the at least one floating-point number, round() represents the rounding operation to the specified number of decimal places, bitlength represents the bit length of the quantized integer, Scale represents the quantization scale, and W_fmax and W_fmin are respectively the largest and the smallest floating-point numbers among the at least one floating-point number.
  • correspondingly, the offset used in the quantization may be determined based on the following formula:
  • Offset = Scale × (W_fmax + W_fmin)/2
  • where Offset represents the offset used in the quantization process to shift the network weights in integer form into the range from −2^bitdepth to 2^bitdepth (note that with Scale = (2^bitlength)/(W_fmax − W_fmin), the subtracted term in the preceding formula equals Offset).
  • in this way, the quantized network weights can be guaranteed to lie within the range from −2^bitdepth to 2^bitdepth.
  • In some embodiments, intra-frame prediction is performed on the target image block using the first prediction mode based on the target network weights in integer form among the at least one set of network weights, so as to obtain a quantized prediction block; inverse quantization processing is then performed on the quantized prediction block to obtain the target prediction block.
  • since the network weights are enlarged by Scale times, that is, 2^bitdepth times, during the quantization process, after the convolution of each layer is completed the convolution result needs to be scaled back by 2^bitdepth times; this scaling can be completed by a shift operation. For example, inverse quantization processing is performed on the convolution result of each layer of the quantized prediction block based on the following formula to obtain the target prediction block:
  • O = O′ >> bitdepth
  • where O′ represents the convolution result of a convolutional layer of the neural network before inverse quantization processing, and O represents the corresponding result after inverse quantization processing;
  • bitdepth represents the quantization parameter used to quantize floating-point numbers into integer form.
  • the result of the inverse quantization process can be understood as the result of the scaling or shifting operation.
  • the bitdepth is 6.
  • inverse quantization processing is performed on the convolution result of each convolution layer, but the embodiment of the present application is not limited to this.
  • inverse quantization processing may also be performed on the final result of the neural network; that is, the quantized prediction block may be the output result of the neural network before inverse quantization processing, and the target prediction block may be the output result of the neural network after inverse quantization processing.
  • the bitdepth is related to the number of convolutional layers in the neural network. The larger the number of convolutional layers in the neural network, the larger the numerical value of bitdepth.
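  • The two quantization formulas and the shift-based inverse quantization can be sketched as follows; the numeric defaults follow the bitlength-6 example above, and the integer rounding of Offset is an assumption.

```python
import numpy as np

def quantize_simple(w_f, bitdepth=5):
    # w_i = round(w_f * 2^bitdepth)
    return np.round(w_f * (1 << bitdepth)).astype(np.int32)

def quantize_minmax(w_f, bitlength=6):
    # w_i = round(w_f * Scale) - Offset, with Scale = 2^bitlength / (Wmax - Wmin)
    # and Offset = Scale * (Wmax + Wmin) / 2
    w_max, w_min = float(w_f.max()), float(w_f.min())
    scale = (1 << bitlength) / (w_max - w_min)
    offset = scale * (w_max + w_min) / 2
    return (np.round(w_f * scale) - np.round(offset)).astype(np.int32)

def dequantize_conv_output(o, bitdepth=5):
    # scale each layer's convolution result back by 2^bitdepth via a shift
    return np.right_shift(o.astype(np.int64), bitdepth)
```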
  • the S330 may include:
  • the code stream is obtained by encoding the permission identifier, the at least one set of network weights and the target residual block.
  • the at least one set of network weights is located at the head of the code stream of the target object.
  • the at least one set of network weights may also be stored in other locations of the code stream, which is not specifically limited in this embodiment of the present application.
  • the S330 may include:
  • the at least one set of network weights is located at the head of the code stream of the target image frame, and the syntax elements may be as shown in the following table.
  • the weight control flag aip_info_ph_flag represents whether the current image frame includes network weights
  • the weight control flag of 1 represents that the current image frame includes network weights
  • the weight control flag of 0 represents that the current image frame does not include network weights.
  • the at least one set of network weights may be the parameter aip_params.
  • the network weight sign aip_param_sign represents whether the network weight is positive or negative: a sign of 0 represents positive, and a sign of 1 represents negative.
  • the S330 may include:
  • the residuals of the at least one set of network weights include, for each set of network weights in the at least one set, the residual obtained after subtracting the pre-trained network weights from that set of network weights.
  • the encoder uses one or several sets of pre-trained networks as the basic network weights.
  • the basic network weights are subtracted from the network weights trained online to obtain the residuals of the network weights; by saving the residuals of the network weights, the network weights obtained from online training are saved indirectly.
  • the encoding method of the at least one set of network weights is a fixed-length code encoding method or a variable-length code encoding method.
  • network weight compression may be used to further compress the network weight and store it in the code stream, which is not specifically limited in this embodiment of the present application.
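  • As an illustration of serializing quantized weights with the sign convention above, the following sketch writes each weight as an aip_param_sign bit followed by a fixed-length magnitude, reusing the toy BitWriter from the earlier sketch; a real codec would use the fixed- or variable-length codes mentioned above.

```python
def write_weights(writer, weights, bitlength=6):
    for w in weights:
        writer.write_flag(int(w) < 0)            # aip_param_sign: 0 positive, 1 negative
        mag = abs(int(w))
        for i in reversed(range(bitlength)):     # fixed-length code for the magnitude
            writer.write_flag((mag >> i) & 1)
```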
  • the method 300 may further include:
  • the network weights for the first prediction mode are determined based on pre-trained network weights.
  • the encoder uses one or several sets of pre-trained network weights as the basic network weights, and the network weights trained online are all obtained by fine-tuning the basic network weights.
  • the permission flag is used to identify that the first prediction mode is not allowed to perform intra-frame prediction on the image blocks in the target image sequence
  • the S330 may include:
  • Intra-frame prediction is performed on the target image block using a traditional prediction mode to obtain the target prediction block.
  • the encoding method according to the embodiment of the present application is described in detail above from the perspective of the encoding end, and the decoding method according to the embodiment of the present application will be described below from the perspective of the decoding end with reference to FIG. 10 .
  • the embodiment of the present application also provides a decoding method based on an online training neural network.
  • the decoding end can obtain one or more sets of network weights from the code stream, load the obtained one or more sets of weights into the network structure, and reconstruct the first prediction mode obtained by online training; then, for an image block for which the decoder parses from the code stream that the first prediction mode is selected, the decoder predicts that image block using the first prediction mode.
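  • A decoder-side sketch of loading the parsed weight sets into the known network structure and predicting a block follows; the helper names are illustrative, and the weight sets are assumed to have been turned into state dictionaries matching the network structure.

```python
import torch

def reconstruct_first_mode(weight_sets, model_factory):
    models = []
    for state in weight_sets:          # one model instance per set of weights
        m = model_factory()            # e.g. ChromaPredNet from the sketch above
        m.load_state_dict(state)       # weights trained online at the encoder
        m.eval()
        models.append(m)
    return models

def predict_block(models, index, inputs):
    """index: weight-set index parsed from the code stream."""
    with torch.no_grad():
        return models[index](inputs)   # prediction block
```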
  • FIG. 9 is a schematic block diagram of a decoding framework 200-1 provided by an embodiment of the present application.
  • the decoding framework 200-1 may include an online neural network unit for executing the first prediction mode. For the intra prediction mode trained online, the decoding part includes: the decoding end obtains the network weights of the current image frame from the code stream, and obtains from the code stream whether the intra prediction mode trained online is selected; based on this, when it is determined that the intra prediction mode trained online is used to perform intra-frame prediction on the current image block, the current image block is intra-predicted using the intra prediction mode trained online.
  • the decoding framework 200-1 is an extension of the decoding framework 200, that is, for other units in the decoding framework 200-1, reference may be made to the relevant descriptions in the decoding framework 200, which will not be repeated here to avoid repetition.
  • FIG. 10 shows a schematic flowchart of a decoding method 400 according to an embodiment of the present application. It should be understood that the decoding method 400 can be performed by a decoding end. For example, it is applied to the decoding framework 200-1 shown in FIG. 9 .
  • the decoding method 400 may include:
  • the first prediction mode refers to a prediction mode for performing intra-frame prediction on image blocks based on network weights trained online;
  • the permission flag is used to identify permission to use the first prediction mode to perform intra prediction on image blocks in the target image sequence;
  • the target image block includes a target luminance block, and the target prediction block includes the prediction block of the target luminance block;
  • the S420 may include:
  • the brightness control identifier is used to identify whether to use the first prediction mode to perform intra-frame prediction on the target brightness block;
  • intra-frame prediction is performed on the target luminance block to obtain a prediction block of the target luminance block.
  • the luminance control identifier is used to identify using the first prediction mode to perform intra-frame prediction on the target luminance block, and the first prediction mode includes multiple network weights;
  • the S420 may include:
  • the brightness index is used to indicate a target network weight in the multiple network weights
  • the first prediction mode is used to perform intra-frame prediction on the target luminance block to obtain a prediction block of the target luminance block.
  • the permission flag is used to identify permission to use the first prediction mode to perform intra-frame prediction on image blocks in the target image sequence;
  • the target image blocks include target chrominance blocks, the The target prediction block includes the prediction block of the target chroma block;
  • the S420 may include:
  • the chrominance control identifier is used to identify whether to use the first prediction mode to perform intra-frame prediction on the target chrominance block;
  • intra-frame prediction is performed on the target chrominance block to obtain a prediction block of the target chrominance block.
  • the chrominance control identifier is used to identify using the first prediction mode to perform intra-frame prediction on the target chrominance block, where the first prediction mode includes a plurality of network weights;
  • the S420 may include:
  • the chroma index is used to indicate the target network weight in the multiple network weights
  • the first prediction mode is used to perform intra-frame prediction on the target chroma block to obtain a prediction block of the target chroma block.
  • the method 400 may further include:
  • the at least one set of network weights are the network weights that can be used when using the first prediction mode to perform intra-frame prediction on image blocks in the target object, where the target object is the target image sequence, a slice in the target image sequence, the target image frame, or the target image block.
  • the encoding method of the at least one set of network weights is a fixed-length code encoding method or a variable-length code encoding method.
  • the method 400 may further include:
  • the at least one set of network weights are the network weights that can be used when using the first prediction mode to perform intra-frame prediction on image blocks in the target object, where the target object is the target image sequence, a slice in the target image sequence, the target image frame, or the target image block, and the residuals of the at least one set of network weights include, for each set of network weights in the at least one set, the residual of that set of network weights minus the pre-trained network weights.
In some embodiments, the method 400 may further include: if the permission flag is used to identify that the first prediction mode is not allowed to be used to perform intra prediction on the image blocks in the target image sequence, performing intra prediction on the target image block with the traditional prediction mode, to obtain the target prediction block.
It should be understood that the decoding method 400 is the inverse process of the encoding method 300; for the steps of the decoding method 400, reference may be made to the corresponding steps of the encoding method 300, and details are not repeated here for brevity. It should also be understood that the sequence numbers of the above processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
The encoding method according to the embodiments of the present application has thus been described in detail from the perspective of the encoding end, and the decoding method 400, shown in the schematic flowchart of FIG. 10, from the perspective of the decoding end. The method 400 may be performed by a decoding framework that includes an intra prediction unit based on an online-trained neural network; in one implementation, such an intra prediction unit may be added to the decoding framework described above to perform the decoding method 400. The method embodiments of the present application having been described in detail above, the apparatus embodiments are described below with reference to FIG. 11 to FIG. 13.
FIG. 11 is a schematic block diagram of an encoder 500 according to an embodiment of the present application. As shown in FIG. 11, the encoder 500 may include:

an obtaining unit 510, configured to obtain a target image sequence and a permission flag, where the permission flag is used to identify whether a first prediction mode is allowed to be used to perform intra prediction on image blocks in the target image sequence, and the first prediction mode refers to a prediction mode in which intra prediction is performed on an image block based on network weights trained online;

a dividing unit 520, configured to divide a target image frame in the target image sequence into multiple image blocks, the multiple image blocks including a target image block;

a prediction unit 530, configured to perform intra prediction on the target image block based on the permission flag, to obtain a target prediction block;

a residual unit 540, configured to obtain a target residual block based on the target prediction block of the target image; and

an encoding unit 550, configured to encode the permission flag and the target residual block, to obtain a code stream.
In some embodiments, the permission flag is used to identify that the first prediction mode is allowed to be used to perform intra prediction on the image blocks in the target image sequence, and the prediction unit 530 is further configured to: select an optimal prediction mode from the first prediction mode and the traditional prediction mode, and perform intra prediction on the target image block with the optimal prediction mode, to obtain the target prediction block.
In some embodiments, the prediction unit 530 is specifically configured to: perform intra prediction on the target image block with the first prediction mode, to obtain a first prediction block; perform intra prediction on the target image block with the traditional prediction mode, to obtain a second prediction block; if the rate-distortion cost of the first prediction block is lower than the rate-distortion cost of the second prediction block, determine the first prediction mode as the optimal prediction mode; and if the rate-distortion cost of the first prediction block is higher than the rate-distortion cost of the second prediction block, determine the traditional prediction mode as the optimal prediction mode.
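As an informal illustration of the rate-distortion comparison just described, the sketch below scores each candidate with a Lagrangian cost of the form L = λ×R + D and keeps the cheaper mode. The block contents, the bit estimates, and the λ value are arbitrary assumptions of this sketch; the actual cost computation of an encoder is not specified here.

    import numpy as np

    def rd_cost(orig, pred, bits, lam=0.85):
        # D: sum of squared differences between original block and prediction;
        # R: bits consumed by signalling this mode; L = lam * R + D.
        d = float(np.sum((orig.astype(np.int64) - pred.astype(np.int64)) ** 2))
        return lam * bits + d

    def select_mode(orig_block, first_pred, legacy_pred, first_bits, legacy_bits):
        cost_first = rd_cost(orig_block, first_pred, first_bits)
        cost_legacy = rd_cost(orig_block, legacy_pred, legacy_bits)
        # The lower-cost candidate becomes the optimal prediction mode.
        if cost_first < cost_legacy:
            return "first_prediction_mode", first_pred
        return "traditional_mode", legacy_pred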
In some embodiments, the target image block includes a target luminance block, and the encoding unit 550 is specifically configured to: encode the permission flag, a luminance control flag, and the target residual block, to obtain the code stream, where the luminance control flag is used to identify whether the first prediction mode is used to perform intra prediction on the target luminance block.
In some embodiments, the luminance control flag is used to identify that the first prediction mode is used to perform intra prediction on the target luminance block, and the first prediction mode includes multiple network weights; the encoding unit 550 is specifically configured to: encode the permission flag, the luminance control flag, a luminance index, and the target residual block, to obtain the code stream, where the luminance index is used to indicate a target network weight among the multiple network weights, and the target network weight is the network weight used by the first prediction mode when intra prediction is performed on the target luminance block.
In some embodiments, the target image block includes a target chrominance block, and the encoding unit 550 is specifically configured to: encode the permission flag, a chrominance control flag, and the target residual block, to obtain the code stream, where the chrominance control flag is used to identify whether the first prediction mode is used to perform intra prediction on the target chrominance block.
In some embodiments, the chrominance control flag is used to identify that the first prediction mode is used to perform intra prediction on the target chrominance block, and the first prediction mode includes multiple network weights; the encoding unit 550 is specifically configured to: encode the permission flag, the chrominance control flag, a chrominance index, and the target residual block, to obtain the code stream, where the chrominance index is used to indicate a target network weight among the multiple network weights, and the target network weight is the network weight used by the first prediction mode when intra prediction is performed on the target chrominance block.
In some embodiments, the prediction unit 530 is further configured to: obtain training data based on a target object, where the target object is the target image sequence, a slice in the target image sequence, the target image frame, or the target image block; and train at least one set of network weights based on the training data, where the at least one set of network weights are network weights usable when the first prediction mode is used to perform intra prediction on image blocks in the target object.
In some embodiments, the at least one set of network weights includes network weights for luminance blocks, and the training data includes training luminance blocks used to train the network weights for luminance blocks; the prediction unit 530 is specifically configured to: train the network weights for luminance blocks using the reconstructed region adjacent to the training luminance block as input.
In some embodiments, the at least one set of network weights includes network weights for chrominance blocks, and the training data includes training chrominance blocks used to train the network weights for chrominance blocks; the prediction unit 530 is specifically configured to: train the network weights for chrominance blocks using, as input, the reconstructed part corresponding to the training chrominance block and the reconstructed region adjacent to the training chrominance block.
In some embodiments, the training data includes training image blocks used to train the network weights of the first prediction mode; the prediction unit 530 is specifically configured to: subtract the average of the pixel values of the training image block from the pixel values of the training image block, to obtain a training set; and train the at least one set of network weights based on the training set.
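A minimal sketch of the mean-removal step described above, assuming, purely for illustration, 4:2:0 content in which each 16x16 luminance block is paired with the co-located 8x8 Cb and Cr blocks; the block sizes, the frame layout, and the array shapes are assumptions of this sketch.

    import numpy as np

    def make_training_set(luma, cb, cr, n=16):
        xs, ys = [], []
        for i in range(0, luma.shape[0] - n + 1, n):
            for j in range(0, luma.shape[1] - n + 1, n):
                y = luma[i:i+n, j:j+n].astype(np.float32)
                u = cb[i//2:i//2+n//2, j//2:j//2+n//2].astype(np.float32)
                v = cr[i//2:i//2+n//2, j//2:j//2+n//2].astype(np.float32)
                # Subtract each block's own mean, as described above.
                xs.append((y - y.mean())[..., None])
                ys.append(np.stack([u - u.mean(), v - v.mean()], axis=-1))
        # Training data: N x 16 x 16 x 1; training labels: N x 8 x 8 x 2.
        return np.stack(xs), np.stack(ys)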
In some embodiments, the prediction unit 530 is specifically configured to: train a neural network based on the training data, to obtain at least one floating-point number; and quantize each of the at least one floating-point number, to obtain the at least one set of network weights in integer form.
In some embodiments, the prediction unit 530 is specifically configured to quantize each of the at least one floating-point number based on the following formula:

w_i = round(w_f × 2^bitdepth);

where w_f represents a floating-point number among the at least one floating-point number, round() represents rounding to a specified number of decimal places, w_i represents the network weight in integer form, and bitdepth represents the quantization parameter used to quantize a floating-point number into integer form.
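A direct transcription of this formula into Python, for illustration only; the weight list and the bitdepth value are assumed inputs of this sketch.

    def quantize_weights(ws_float, bitdepth):
        # w_i = round(w_f * 2^bitdepth): each float weight is amplified by
        # 2^bitdepth and rounded to the nearest integer.
        return [int(round(w * (1 << bitdepth))) for w in ws_float]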
In some embodiments, the prediction unit 530 is specifically configured to quantize each of the at least one floating-point number based on the following formulas:

w_i = round(w_f × Scale) - ((2^bitlength)/2) × ((W_fmax + W_fmin)/(W_fmax - W_fmin));

Scale = (2^bitlength)/(W_fmax - W_fmin);

where w_f represents a floating-point number among the at least one floating-point number, round() represents rounding to a specified number of decimal places, bitlength represents the bit length of the quantized integer, Scale represents the quantization scale, and W_fmax and W_fmin are respectively the largest floating-point number and the smallest floating-point number among the at least one floating-point number.
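Again for illustration only, the min-max variant can be transcribed as follows; the sketch assumes W_fmax differs from W_fmin and that the caller supplies the floating-point weights and the bit length.

    def quantize_weights_minmax(ws_float, bitlength):
        w_max, w_min = max(ws_float), min(ws_float)
        scale = (1 << bitlength) / (w_max - w_min)
        offset = ((1 << bitlength) / 2) * ((w_max + w_min) / (w_max - w_min))
        out = []
        for w in ws_float:
            # round(w_f * Scale) - Offset keeps the result roughly within
            # [-2^(bitlength-1), 2^(bitlength-1)].
            q = round(w * scale) - offset
            out.append(int(round(q)))
        return out, scale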
In some embodiments, the prediction unit 530 is specifically configured to: perform inverse quantization processing on the quantized prediction block, to obtain the target prediction block; since the network weights were amplified by 2^bitdepth during quantization, the convolution result of each layer needs to be scaled down by 2^bitdepth.
In some embodiments, the prediction unit 530 is specifically configured to perform inverse quantization processing on the quantized prediction block based on the following formula, to obtain the target prediction block:

O' = (O + (2^bitdepth)/2) >> bitdepth;

where, for each convolutional layer in the neural network, O represents the convolution result in the computation of the quantized prediction block before inverse quantization, O' represents the corresponding result after inverse quantization in the computation of the target prediction block, and bitdepth represents the quantization parameter used to quantize a floating-point number into integer form.
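The per-layer rescaling can be illustrated with a single integer layer; note that (2^bitdepth)/2 equals 1 << (bitdepth - 1). The use of a matrix product in place of a convolution is an assumption of this sketch.

    import numpy as np

    def integer_layer(x, w_int, bitdepth):
        # x: integer activations; w_int: weights amplified by 2^bitdepth
        # during quantization.
        o = x.astype(np.int64) @ w_int.astype(np.int64)
        # O' = (O + (2^bitdepth)/2) >> bitdepth: rounding right-shift that
        # undoes the 2^bitdepth amplification after each layer.
        return (o + (1 << (bitdepth - 1))) >> bitdepth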
In some embodiments, the encoding unit 550 is specifically configured to: encode the permission flag, the at least one set of network weights, and the target residual block, to obtain the code stream. In some embodiments, the at least one set of network weights is located at the head of the code stream of the target object.
In some embodiments, the encoding unit 550 is specifically configured to: encode the permission flag, the at least one set of network weights, a weight control flag, and the target residual block, to obtain the code stream, where the weight control flag is used to identify whether the code stream of the target object includes the at least one set of network weights.

In some embodiments, the encoding unit 550 is specifically configured to: encode the permission flag, the at least one set of network weights, a weight control flag, a network weight sign flag, and the target residual block, to obtain the code stream, where the weight control flag is used to identify whether the code stream of the target object includes the at least one set of network weights, and the network weight sign flag is used to identify whether a network weight in the at least one set of network weights is positive or negative.

In some embodiments, the encoding unit 550 is specifically configured to: encode the permission flag, residuals of the at least one set of network weights, and the target residual block, to obtain the code stream, where the residuals of the at least one set of network weights include, for each set of network weights in the at least one set, the residual obtained by subtracting the pre-trained network weights from that set of network weights.

In some embodiments, the at least one set of network weights is coded with a fixed-length code or a variable-length code.
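As an informal sketch of how integer weights and their sign flags might be serialized with fixed-length codes, consider the following; the one-bit sign convention (0 positive, 1 negative) follows the description above, while the BitWriter class and everything else are assumptions of this sketch.

    class BitWriter:
        def __init__(self):
            self.bits = []
        def put_bits(self, value, n):
            self.bits.extend((value >> (n - 1 - k)) & 1 for k in range(n))

    def write_weights(bw, int_weights, bitlength):
        for w in int_weights:
            bw.put_bits(1 if w < 0 else 0, 1)   # network weight sign flag
            bw.put_bits(abs(w), bitlength)      # fixed-length magnitude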
In some embodiments, the prediction unit 530 is further configured to: obtain one or more sets of pre-trained network weights; and determine, based on the one or more sets of network weights, the network weights used for the first prediction mode.
In some embodiments, the permission flag is used to identify that the first prediction mode is not allowed to be used to perform intra prediction on the image blocks in the target image sequence, and the prediction unit 530 is specifically configured to: perform intra prediction on the target image block with the traditional prediction mode, to obtain the target prediction block.
FIG. 12 is a schematic block diagram of a decoder 600 according to an embodiment of the present application. As shown in FIG. 12, the decoder 600 may include:

a parsing unit 610, configured to obtain, by parsing a code stream, a permission flag and a target residual block of a target image block in a target image sequence, where the permission flag is used to identify whether a first prediction mode is allowed to be used to perform intra prediction on image blocks in the target image sequence, and the first prediction mode refers to a prediction mode in which intra prediction is performed on an image block based on network weights trained online;

a prediction unit 620, configured to perform intra prediction on the target image block based on the permission flag, to obtain a target prediction block; and

a processing unit 630, configured to obtain the target image frame based on the target residual block and the target prediction block.
In some embodiments, the permission flag is used to identify that the first prediction mode is allowed to be used to perform intra prediction on the image blocks in the target image sequence; the target image block includes a target luminance block, and the target prediction block includes the prediction block of the target luminance block. The prediction unit 620 is specifically configured to: parse the code stream to obtain a luminance control flag, where the luminance control flag is used to identify whether the first prediction mode is used to perform intra prediction on the target luminance block; and perform intra prediction on the target luminance block based on the luminance control flag, to obtain the prediction block of the target luminance block.
In some embodiments, the luminance control flag is used to identify that the first prediction mode is used to perform intra prediction on the target luminance block, and the first prediction mode includes multiple network weights. The prediction unit 620 is specifically configured to: parse the code stream to obtain a luminance index, where the luminance index is used to indicate a target network weight among the multiple network weights; and perform intra prediction on the target luminance block with the first prediction mode based on the target network weight, to obtain the prediction block of the target luminance block.
In some embodiments, the permission flag is used to identify that the first prediction mode is allowed to be used to perform intra prediction on the image blocks in the target image sequence; the target image block includes a target chrominance block, and the target prediction block includes the prediction block of the target chrominance block. The prediction unit 620 is specifically configured to: parse the code stream to obtain a chrominance control flag, where the chrominance control flag is used to identify whether the first prediction mode is used to perform intra prediction on the target chrominance block; and perform intra prediction on the target chrominance block based on the chrominance control flag, to obtain the prediction block of the target chrominance block.
In some embodiments, the chrominance control flag is used to identify that the first prediction mode is used to perform intra prediction on the target chrominance block, and the first prediction mode includes multiple network weights. The prediction unit 620 is specifically configured to: parse the code stream to obtain a chrominance index, where the chrominance index is used to indicate a target network weight among the multiple network weights; and perform intra prediction on the target chrominance block with the first prediction mode based on the target network weight, to obtain the prediction block of the target chrominance block.
In some embodiments, the parsing unit 610 is further configured to: parse the code stream to obtain at least one set of network weights, where the at least one set of network weights are network weights usable when the first prediction mode is used to perform intra prediction on image blocks in a target object, and the target object is the target image sequence, a slice in the target image sequence, the target image frame, or the target image block. In some embodiments, the at least one set of network weights is coded with a fixed-length code or a variable-length code.
In some embodiments, the parsing unit 610 is further configured to: parse the code stream to obtain residuals of at least one set of network weights, where the at least one set of network weights are network weights usable when the first prediction mode is used to perform intra prediction on image blocks in the target object, the target object is the target image sequence, a slice in the target image sequence, the target image frame, or the target image block, and the residuals of the at least one set of network weights include, for each set of network weights in the at least one set, the residual obtained by subtracting the pre-trained network weights from that set of network weights.
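A one-line illustration of how a decoder could recover the online-trained weights from the parsed residuals, assuming a pre-trained base set; the flat list representation is an assumption of this sketch.

    def reconstruct_weights(pretrained, residuals):
        # Adding each parsed residual back to the pre-trained base recovers
        # the online-trained network weights on the decoder side.
        return [p + r for p, r in zip(pretrained, residuals)]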
In some embodiments, the prediction unit 620 is specifically configured to: if the permission flag is used to identify that the first prediction mode is not allowed to be used to perform intra prediction on the image blocks in the target image sequence, perform intra prediction on the target image block with the traditional prediction mode, to obtain the target prediction block.
It should be understood that the apparatus embodiments and the method embodiments may correspond to each other, and similar descriptions may refer to the method embodiments; to avoid repetition, details are not repeated here. Specifically, the encoder 500 shown in FIG. 11 may correspond to the corresponding subject that executes the method 300 of the embodiments of the present application, and the foregoing and other operations and/or functions of the units in the encoder 500 are respectively intended to implement the corresponding flows of the method 300 and the other methods; likewise, the decoder 600 shown in FIG. 12 may correspond to the corresponding subject that executes the method 400, and the operations and/or functions of its units are respectively intended to implement the corresponding flows of the method 400.

It should also be understood that the units of the encoder 500 or the decoder 600 involved in the embodiments of the present application may be separately or wholly merged into one or several other units, or one or more of the units may be further split into multiple functionally smaller units; this can realize the same operations without affecting the technical effects of the embodiments of the present application. The above units are divided based on logical functions; in practical applications, the function of one unit may also be realized by multiple units, or the functions of multiple units may be realized by one unit. In other embodiments of the present application, the encoder 500 or the decoder 600 may also include other units; in practical applications, these functions may also be realized with the assistance of other units, and may be realized by multiple units in cooperation.
According to another embodiment of the present application, the encoder 500 or the decoder 600 involved in the embodiments of the present application may be constructed by running a computer program (including program code) capable of executing the steps of the corresponding method on a general-purpose computing device, such as a general-purpose computer that includes processing elements such as a central processing unit (CPU) and storage elements such as a random access storage medium (RAM) and a read-only storage medium (ROM), so as to implement the encoding method or the decoding method of the embodiments of the present application. The computer program may be recorded on, for example, a computer-readable storage medium, loaded into an electronic device through the computer-readable storage medium, and run therein, so as to implement the corresponding methods of the embodiments of the present application.
In other words, the units mentioned above may be implemented in the form of hardware, by instructions in the form of software, or by a combination of software and hardware. Specifically, the steps of the method embodiments in the embodiments of the present application may be completed by integrated logic circuits of hardware in a processor and/or instructions in the form of software; the steps of the methods disclosed in the embodiments of the present application may be directly embodied as being executed by a hardware decoding processor, or by a combination of hardware and software in a decoding processor. Optionally, the software may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method embodiments in combination with its hardware.
FIG. 13 is a schematic structural diagram of an electronic device 700 provided by an embodiment of the present application. As shown in FIG. 13, the electronic device 700 includes at least a processor 710 and a computer-readable storage medium 720, which may be connected by a bus or in other ways. The computer-readable storage medium 720 is configured to store a computer program 721, the computer program 721 includes computer instructions, and the processor 710 is configured to execute the computer instructions stored in the computer-readable storage medium 720. The processor 710 is the computing core and control core of the electronic device 700; it is adapted to implement one or more computer instructions, and is specifically adapted to load and execute one or more computer instructions, so as to implement a corresponding method flow or a corresponding function.

As an example, the processor 710 may also be referred to as a central processing unit (CPU). The processor 710 may include, but is not limited to, a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and the like.
As an example, the computer-readable storage medium 720 may be a high-speed RAM memory, or a non-volatile memory, for example at least one disk memory; optionally, it may also be at least one computer-readable storage medium located away from the foregoing processor 710. Specifically, the computer-readable storage medium 720 includes, but is not limited to, a volatile memory and/or a non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache.
By way of example but not limitation, many forms of RAM are available, such as a static RAM (SRAM), a dynamic RAM (DRAM), a synchronous DRAM (SDRAM), a double data rate SDRAM (DDR SDRAM), an enhanced SDRAM (ESDRAM), a synch-link DRAM (SLDRAM), and a direct Rambus RAM (DR RAM).
In one implementation, the electronic device 700 may be the encoding end, the encoder, or the encoding framework involved in the embodiments of the present application; the computer-readable storage medium 720 stores first computer instructions, and the processor 710 loads and executes the first computer instructions stored in the computer-readable storage medium 720, so as to implement the corresponding steps of the encoding method provided by the embodiments of the present application; to avoid repetition, details are not repeated here.

In one implementation, the electronic device 700 may be the decoding end, the decoder, or the decoding framework involved in the embodiments of the present application; the computer-readable storage medium 720 stores second computer instructions, and the processor 710 loads and executes the second computer instructions stored in the computer-readable storage medium 720, so as to implement the corresponding steps of the decoding method provided by the embodiments of the present application; to avoid repetition, details are not repeated here.
According to another aspect of the present application, an embodiment of the present application further provides a computer-readable storage medium (memory), which is a memory device in the electronic device 700 and is configured to store programs and data, for example the computer-readable storage medium 720. It can be understood that the computer-readable storage medium 720 here may include a built-in storage medium of the electronic device 700, and certainly may also include an extended storage medium supported by the electronic device 700. The computer-readable storage medium provides storage space that stores the operating system of the electronic device 700; one or more computer instructions adapted to be loaded and executed by the processor 710 are also stored in the storage space, and these computer instructions may be one or more computer programs 721 (including program code).
According to another aspect of the present application, a computer program product or computer program is provided, including computer instructions stored in a computer-readable storage medium, for example the computer program 721. In this case, the data processing device 700 may be a computer; the processor 710 reads the computer instructions from the computer-readable storage medium 720 and executes them, so that the computer executes the encoding method or the decoding method provided in the above various optional manners. In other words, when implemented in software, all or part may be realized in the form of a computer program product, which includes one or more computer instructions; when the computer program instructions are loaded and executed on a computer, all or part of the flows of the embodiments of the present application are run, or the functions of the embodiments of the present application are implemented. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, by coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, by infrared, radio, or microwave).

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Embodiments of the present application provide an encoding method, a decoding method, an encoder, a decoder, and an electronic device. The encoding method includes: obtaining a target image sequence and a permission flag, where the permission flag is used to identify whether a first prediction mode is allowed to be used to perform intra prediction on image blocks in the target image sequence, and the first prediction mode refers to a prediction mode in which intra prediction is performed on an image block based on network weights trained online; dividing a target image frame in the target image sequence into multiple image blocks, the multiple image blocks including a target image block; performing intra prediction on the target image block based on the permission flag, to obtain a target prediction block; obtaining a target residual block based on the target prediction block; and encoding the permission flag and the target residual block, to obtain a code stream. By introducing the first prediction mode, when the first prediction mode can be used to perform intra prediction on the target image block, the optimal prediction mode selected from the first prediction mode and the traditional prediction mode can be used to obtain the target prediction block, which can improve compression performance.


Claims (37)

  1. 一种编码方法,其特征在于,包括:
    获取目标图像序列和允许标识,所述允许标识用于标识是否允许使用第一预测模式对所述目标图像序列中的图像块进行帧内预测,所述第一预测模式指基于在线训练的网络权重对图像块进行帧内预测的预测模式;
    将所述目标图像序列中的目标图像帧划分为多个图像块,所述多个图像块包括目标图像块;
    基于所述允许标识,对所述目标图像块进行帧内预测,得到目标预测块;
    基于所述目标预测块,得到目标残差块;
    对所述允许标识以及所述目标残差块进行编码,得到码流。
  2. 根据权利要求1所述的方法,其特征在于,所述允许标识用于标识允许使用第一预测模式对所述目标图像序列中的图像块进行帧内预测;
    所述基于所述允许标识,对所述目标图像块进行帧内预测,得到目标预测块,包括:
    在所述第一预测模式和传统预测模式中选择最优预测模式;
    利用所述最优预测模式对所述目标残差块进行帧内预测,得到所述目标预测块。
  3. 根据权利要求2所述的方法,其特征在于,所述在所述第一预测模式和传统预测模式中选择最优预测模式,包括:
    使用所述第一预测模式对所述目标图像块进行帧内预测,得到第一预测块;
    使用所述传统预测模式对所述目标图像块进行帧内预测,得到第二预测块;
    若所述第一预测块的率失真代价低于所述第二预测块的率失真代价,将所述第一预测模式确定为所述最优预测模式;若所述第一预测块的率失真代价高于所述第二预测块的率失真代价,将所述传统预测模式确定为所述最优预测模式。
  4. 根据权利要求1至3中任一项所述的方法,其特征在于,所述目标图像块包括目标亮度块,所述对所述允许标识以及所述目标残差块进行编码,得到码流,包括:
    对所述允许标识、亮度控制标识以及所述目标残差块进行编码,得到所述码流;所述亮度控制标识用于标识是否使用所述第一预测模式对所述目标亮度块进行帧内预测。
  5. 根据权利要求4所述的方法,其特征在于,所述亮度控制标识用于标识使用所述第一预测模式对所述目标亮度块进行帧内预测,所述第一预测模式包括多个网络权重;
    所述对所述允许标识、亮度控制标识以及所述目标残差块进行编码,得到所述码流,包括:
    对所述允许标识、所述亮度控制标识、亮度索引以及所述目标残差块进行编码,得到所述码流;所述亮度索引用于指示所述多个网络权重中的目标网络权重,所述目标网络权重为对所述目标亮度块进行帧内预测时所述第一预测模式使用的网络权重。
  6. 根据权利要求1至3中任一项所述的方法,其特征在于,所述目标图像块包括目标色度块,所述对所述允许标识以及所述目标残差块进行编码,得到码流,包括:
    对所述允许标识、色度控制标识以及所述目标残差块进行编码,得到所述码流;所述色度控制标识用于标识是否使用所述第一预测模式对所述目标色度块进行帧内预测。
  7. 根据权利要求6所述的方法,其特征在于,所述色度控制标识用于标识使用所述第一预测模式对所述色度亮度块进行帧内预测,所述第一预测模式包括多个网络权重;
    所述对所述允许标识、色度控制标识以及所述目标残差块进行编码,得到所述码流,包括:
    对所述允许标识、所述色度控制标识、色度索引以及所述目标残差块进行编码,得到所述码流;所述色度索引用于指示所述多个网络权重中的目标网络权重,所述目标网络权重为对所述目标色度块进行帧内预测时所述第一预测模式使用的网络权重。
  8. 根据权利要求1至7中任一项所述的方法,其特征在于,所述方法还包括:
    基于目标对象获取训练数据,所述目标对象为所述目标图像序列、所述目标图像序列中的切片、所述目标图像帧或所述目标图像块;
    基于所述训练数据训练至少一套网络权重,所述至少一套网络权重为使用所述第一预测模式对所述目标对象中的图像块进行帧内预测时可使用的网络权重。
  9. 根据权利要求8所述的方法,其特征在于,所述至少一套网络权重包括用于亮度块的网络权重,所述训练数据包括用于训练用于亮度块的网络权重的训练亮度块;所述基于所述训练数据训练至少一套网络权重,包括:
    以所述训练亮度块相邻的重建部分为输入,训练用于亮度块的网络权重。
  10. 根据权利要求8所述的方法,其特征在于,所述至少一套网络权重包括用于色度块的网络权重,所述训练数据包括用于训练用于色度块的网络权重的训练色度块;所述基于所述训练数据训练至 少一套网络权重,包括:
    以所述训练色度块相邻的重建部分,所述训练色度块的重建部分和所述训练色度块相邻的重建部分为输入,训练用于色度块的网络权重。
  11. 根据权利要求8所述的方法,其特征在于,所述训练数据包括用于训练所述第一预测模式的网络权重的训练图像块;所述基于所述训练数据训练至少一套网络权重,包括:
    利用所述训练图像块的像素值减去所述训练图像块的像素值的平均值,得到训练集;
    基于所述训练集训练所述至少一套网络权重。
  12. 根据权利要求8至11中任一项所述的方法,其特征在于,所述基于所述训练数据训练至少一套网络权重,包括:
    基于所述训练数据训练神经网络,得到至少一个浮点数;
    量化所述至少一个浮点数中的每一个浮点数,以得到整数形式的所述至少一套网络权重。
  13. 根据权利要求12所述的方法,其特征在于,所述量化所述至少一个浮点数中的每一个浮点数,包括:
    基于以下公式量化所述至少一个浮点数中的每一个浮点数:
    w i=round(w f×2^bitdepth);
    其中,w f表示所述至少一个浮点数中的浮点数,round()表示按照指定的小数位数进行四舍五入的运算,w i表示整数形式的网络权重,bitdepth表示用于将浮点数量化为整数形式的量化参数。
  14. 根据权利要求12所述的方法,其特征在于,所述量化所述至少一个浮点数中的每一个浮点数,包括:
    基于以下公式量化所述至少一个浮点数中的每一个浮点数:
    w i=round(w f×Scale)-((2^bitlength)/2)×((W fmax+W fmin)/(W fmax-W fmin));
    Scale=(2^bitlength)/(W fmax-W fmin);
    其中,w f表示所述至少一个浮点数中的浮点数,round()表示按照指定的小数位数进行四舍五入的运算,bitlength表示量化后的整数的比特长度,Scale表示量化尺度,W fmax和W fmin分别为所述至少一个浮点数中的最大浮点数和最小浮点数。
  15. 根据权利要求12所述的方法,其特征在于,所述基于对所述目标图像块预测得到的目标预测块,包括:
    若确定使用所述第一预测模式对所述目标图像块进行帧内预测,基于所述至少一套网络权重中的整数形式的目标网络权重,使用所述第一预测模式对所述目标图像块进行帧内预测,得到量化预测块;
    对所述量化预测块进行反量化处理,以得到所述目标预测块。
  16. 根据权利要求15所述的方法,其特征在于,所述对所述量化预测块进行反量化处理,得到所述目标预测块,包括:
    基于以下公式对所述量化预测块进行反量化处理,以得到所述目标预测块:
    O'=(O+(2^bitdepth)/2)>>bitdepth;
    其中,O'表示所述目标预测块的计算过程中的、针对所述神经网络中的每个卷积层的、且未经过反量化处理的结果,O表示所述量化预测块的计算过程中的、针对所述神经网络中的每个卷积层的、且经过反量化处理的结果,bitdepth表示用于将浮点数量化为整数形式的量化参数。
  17. 根据权利要求8至16中任一项所述的方法,其特征在于,所述对所述允许标识以及所述目标残差块进行编码,得到码流,包括:
    对所述允许标识、所述至少一套网络权重以及所述目标残差块进行编码,得到所述码流。
  18. 根据权利要求17所述的方法,其特征在于,所述至少一套网络权重位于所述目标对象的码流的头部。
  19. 根据权利要求17所述的方法,其特征在于,所述对所述允许标识、所述至少一套网络权重以及所述目标残差块进行编码,得到所述码流,包括:
    对所述允许标识、所述至少一套网络权重、权重控制标识以及所述目标残差块进行编码,得到所述码流,所述权重控制标识用于标识所述目标对象的码流中是否包括所述至少一套网络权重。
  20. 根据权利要求17所述的方法,其特征在于,所述对所述允许标识、所述至少一套网络权重以及所述目标残差块进行编码,得到所述码流,包括:
    对所述允许标识、所述至少一套网络权重、权重控制标识、网络权重符号标识以及所述目标残差块进行编码,得到所述码流,所述权重控制标识用于标识所述目标对象的码流中是否包括所述至少一套网络权重,所述网络权重符号标识用于标识所述至少一套网络权重中的网络权重为正的或负的。
  21. The method according to claim 17, wherein the encoding the permission flag, the at least one set of network weights and the target residual block to obtain the bitstream comprises:
    encoding the permission flag, residuals of the at least one set of network weights and the target residual block to obtain the bitstream, where the residuals of the at least one set of network weights comprise, for each set of network weights among the at least one set, the residual obtained by subtracting pre-trained network weights from that set of network weights.
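    For illustration only: in claim 21 only the difference between the online-trained weights and the pre-trained weights is coded, and the decoder adds it back. A NumPy sketch; array-shaped inputs are an assumption:

        import numpy as np

        # Encoder side: residual = trained weights - pre-trained weights.
        def weight_residual(trained, pretrained):
            return np.asarray(trained) - np.asarray(pretrained)

        # Decoder side: trained weights = pre-trained weights + residual.
        def reconstruct_weights(residual, pretrained):
            return np.asarray(pretrained) + np.asarray(residual)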
  22. The method according to claim 17, wherein the at least one set of network weights is coded with fixed-length coding or variable-length coding.
  23. The method according to any one of claims 1 to 7, wherein the method further comprises:
    acquiring one or more sets of pre-trained network weights;
    determining, based on the one or more sets of network weights, the network weights used for the first prediction mode.
  24. The method according to claim 1, wherein the permission flag is used to identify that intra-frame prediction of image blocks in the target image sequence using the first prediction mode is not allowed, and the performing intra-frame prediction on the target image block based on the permission flag to obtain a target prediction block comprises:
    performing intra-frame prediction on the target image block using a traditional prediction mode, to obtain the target prediction block.
  25. A decoding method, comprising:
    obtaining, by parsing a bitstream, a permission flag and a target residual block of a target image block in a target image sequence, where the permission flag is used to identify whether intra-frame prediction of image blocks in the target image sequence using a first prediction mode is allowed, the first prediction mode referring to a prediction mode that performs intra-frame prediction on image blocks based on network weights trained online;
    performing intra-frame prediction on the target image block based on the permission flag, to obtain a target prediction block;
    obtaining the target image frame based on the target residual block and the target prediction block.
  26. The method according to claim 25, wherein the permission flag is used to identify that intra-frame prediction of image blocks in the target image sequence using the first prediction mode is allowed, the target image block comprises a target luma block, and the target prediction block comprises a prediction block of the target luma block;
    the performing intra-frame prediction on the target image block based on the permission flag to obtain a target prediction block comprises:
    obtaining a luma control flag by parsing the bitstream, where the luma control flag is used to identify whether to perform intra-frame prediction on the target luma block using the first prediction mode;
    performing intra-frame prediction on the target luma block based on the luma control flag, to obtain the prediction block of the target luma block.
  27. The method according to claim 26, wherein the luma control flag is used to identify that intra-frame prediction is performed on the target luma block using the first prediction mode, and the first prediction mode comprises a plurality of network weights;
    the performing intra-frame prediction on the target luma block based on the luma control flag to obtain the prediction block of the target luma block comprises:
    obtaining a luma index by parsing the bitstream, where the luma index is used to indicate target network weights among the plurality of network weights;
    performing intra-frame prediction on the target luma block using the first prediction mode based on the target network weights, to obtain the prediction block of the target luma block.
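    For illustration only: the parsed luma index of claim 27 simply selects one of the signalled weight sets. A sketch; weight_sets, context and predict are hypothetical names:

        # The luma index picks the target network weights among the
        # plurality of weight sets carried in the bitstream.
        def predict_luma_block(weight_sets, luma_index, context, predict):
            target_weights = weight_sets[luma_index]
            return predict(context, target_weights)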
  28. The method according to claim 25, wherein the permission flag is used to identify that intra-frame prediction of image blocks in the target image sequence using the first prediction mode is allowed, the target image block comprises a target chroma block, and the target prediction block comprises a prediction block of the target chroma block;
    the performing intra-frame prediction on the target image block based on the permission flag to obtain a target prediction block comprises:
    obtaining a chroma control flag by parsing the bitstream, where the chroma control flag is used to identify whether to perform intra-frame prediction on the target chroma block using the first prediction mode;
    performing intra-frame prediction on the target chroma block based on the chroma control flag, to obtain the prediction block of the target chroma block.
  29. The method according to claim 28, wherein the chroma control flag is used to identify that intra-frame prediction is performed on the target chroma block using the first prediction mode, and the first prediction mode comprises a plurality of network weights;
    the performing intra-frame prediction on the target chroma block based on the chroma control flag to obtain the prediction block of the target chroma block comprises:
    obtaining a chroma index by parsing the bitstream, where the chroma index is used to indicate target network weights among the plurality of network weights;
    performing intra-frame prediction on the target chroma block using the first prediction mode based on the target network weights, to obtain the prediction block of the target chroma block.
  30. The method according to any one of claims 25 to 29, wherein the method further comprises:
    obtaining at least one set of network weights by parsing the bitstream, where the at least one set of network weights is network weights usable when performing intra-frame prediction on image blocks in a target object using the first prediction mode, the target object being the target image sequence, a slice in the target image sequence, the target image frame, or the target image block.
  31. The method according to claim 30, wherein the at least one set of network weights is coded with fixed-length coding or variable-length coding.
  32. The method according to any one of claims 25 to 29, wherein the method further comprises:
    obtaining residuals of at least one set of network weights by parsing the bitstream, where the at least one set of network weights is network weights usable when performing intra-frame prediction on image blocks in a target object using the first prediction mode, the target object being the target image sequence, a slice in the target image sequence, the target image frame, or the target image block, and the residuals of the at least one set of network weights comprise, for each set of network weights among the at least one set, the residual obtained by subtracting pre-trained network weights from that set of network weights.
  33. The method according to any one of claims 25 to 32, wherein the method further comprises:
    if the permission flag is used to identify that intra-frame prediction of image blocks in the target image sequence using the first prediction mode is not allowed, performing intra-frame prediction on the target image block using a traditional prediction mode, to obtain a target prediction block.
  34. An encoder, comprising:
    an acquisition unit, configured to acquire a target image sequence and a permission flag, where the permission flag is used to identify whether intra-frame prediction of image blocks in the target image sequence using a first prediction mode is allowed, the first prediction mode referring to a prediction mode that performs intra-frame prediction on image blocks based on network weights trained online;
    a partitioning unit, configured to partition a target image frame in the target image sequence into a plurality of image blocks, the plurality of image blocks comprising a target image block;
    a prediction unit, configured to perform intra-frame prediction on the target image block based on the permission flag, to obtain a target prediction block;
    a residual unit, configured to obtain a target residual block based on the target prediction block of the target image block;
    an encoding unit, configured to encode the permission flag and the target residual block to obtain a bitstream.
  35. A decoder, comprising:
    a parsing unit, configured to obtain, by parsing a bitstream, a permission flag and a target residual block of a target image block in a target image sequence, where the permission flag is used to identify whether intra-frame prediction of image blocks in the target image sequence using a first prediction mode is allowed, the first prediction mode referring to a prediction mode that performs intra-frame prediction on image blocks based on network weights trained online;
    a prediction unit, configured to perform intra-frame prediction on the target image block based on the permission flag, to obtain a target prediction block;
    a processing unit, configured to obtain the target image frame based on the target residual block and the target prediction block.
  36. An electronic device, comprising:
    a processor, adapted to execute a computer program;
    a computer-readable storage medium having a computer program stored therein, where the computer program, when executed by the processor, implements the encoding method according to any one of claims 1 to 24, or implements the decoding method according to any one of claims 25 to 33.
  37. A computer-readable storage medium, comprising computer instructions adapted to be loaded by a processor to execute the encoding method according to any one of claims 1 to 24 or the decoding method according to any one of claims 25 to 33.
PCT/CN2021/073410 2021-01-22 2021-01-22 Encoding method, decoding method, encoder, decoder, and electronic device WO2022155923A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2021/073410 WO2022155923A1 (zh) 2021-01-22 2021-01-22 Encoding method, decoding method, encoder, decoder, and electronic device
CN202180083611.5A CN116686288A (zh) Encoding method, decoding method, encoder, decoder, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/073410 WO2022155923A1 (zh) Encoding method, decoding method, encoder, decoder, and electronic device

Publications (1)

Publication Number Publication Date
WO2022155923A1 (zh)

Family

ID=82548402

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/073410 WO2022155923A1 (zh) Encoding method, decoding method, encoder, decoder, and electronic device

Country Status (2)

Country Link
CN (1) CN116686288A (zh)
WO (1) WO2022155923A1 (zh)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109842799A (zh) * 2017-11-29 2019-06-04 杭州海康威视数字技术股份有限公司 Intra-frame prediction method and device for colour components
CN109996083A (zh) * 2017-12-29 2019-07-09 杭州海康威视数字技术股份有限公司 Intra-frame prediction method and device
CN110677644A (zh) * 2018-07-03 2020-01-10 北京大学 Video encoding and decoding method and intra-frame predictor for video encoding
US20200329233A1 (en) * 2019-04-12 2020-10-15 Frank Nemirofsky Hyperdata Compression: Accelerating Encoding for Improved Communication, Distribution & Delivery of Personalized Content
CN111432208A (zh) * 2020-04-01 2020-07-17 济南浪潮高新科技投资发展有限公司 Method for determining an intra-frame prediction mode by using a neural network
CN111800642A (zh) * 2020-07-02 2020-10-20 中实燃气发展(西安)有限公司 HEVC intra-frame angular mode selection method, apparatus, device and readable storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115604477A (zh) * 2022-12-14 2023-01-13 广州波视信息科技股份有限公司 Ultra-high-definition video distortion optimization encoding method
CN115955574A (zh) * 2023-03-10 2023-04-11 宁波康达凯能医疗科技有限公司 Intra-frame image encoding method, apparatus and storage medium based on a weight network

Also Published As

Publication number Publication date
CN116686288A (zh) 2023-09-01

Similar Documents

Publication Publication Date Title
CN103782598A (zh) Fast encoding method for lossless encoding
CN104041035A (zh) Lossless coding and associated signaling methods for compound video
KR20130045154A (ko) Image decoding apparatus
WO2021185008A1 (zh) Encoding method, decoding method, encoder, decoder, and electronic device
US20220116664A1 (en) Loop filtering method and device
WO2022155923A1 (zh) Encoding method, decoding method, encoder, decoder, and electronic device
KR20190122615A (ko) Video encoding method and apparatus
CN111741299B (zh) Intra-frame prediction mode selection method, apparatus, device, and storage medium
WO2021134706A1 (zh) Loop filtering method and apparatus
WO2022052533A1 (zh) Encoding method, decoding method, encoder, decoder, and encoding system
WO2022116085A1 (zh) Encoding method, decoding method, encoder, decoder, and electronic device
JP2022538968A (ja) Transform-skip residual coding of video data
WO2021134635A1 (zh) Transform method, encoder, decoder, and storage medium
US20230042484A1 (en) Decoding method and coding method for unmatched pixel, decoder, and encoder
WO2022178686A1 (zh) Encoding/decoding method, encoding/decoding device, encoding/decoding system, and computer-readable storage medium
WO2022165763A1 (zh) Encoding method, decoding method, encoder, decoder, and electronic device
JP2022548204A (ja) Method and apparatus for encoding video data in transform-skip mode
KR20170058870A (ko) Method and apparatus for image encoding/decoding via nonlinear mapping
WO2023123398A1 (zh) Filtering method, filtering apparatus, and electronic device
WO2021134303A1 (zh) Transform method, encoder, decoder, and storage medium
WO2022188239A1 (zh) Coefficient encoding/decoding method, encoder, decoder, and computer storage medium
WO2023193254A1 (zh) Decoding method, encoding method, decoder, and encoder
WO2022193394A1 (zh) Coefficient encoding/decoding method, encoder, decoder, and computer storage medium
WO2023193253A1 (zh) Decoding method, encoding method, decoder, and encoder
WO2024007116A1 (zh) Decoding method, encoding method, decoder, and encoder

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21920321

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 202180083611.5

Country of ref document: CN

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21920321

Country of ref document: EP

Kind code of ref document: A1