WO2024012474A1 - Neural network-based image decoding and encoding method, apparatus, and device - Google Patents

Neural network-based image decoding and encoding method, apparatus, and device

Info

Publication number
WO2024012474A1
Authority
WO
WIPO (PCT)
Prior art keywords
network
layer
neural network
image
decoding
Prior art date
Application number
PCT/CN2023/106899
Other languages
English (en)
French (fr)
Inventor
陈方栋
叶宗苗
武晓阳
Original Assignee
杭州海康威视数字技术股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 杭州海康威视数字技术股份有限公司 (Hangzhou Hikvision Digital Technology Co., Ltd.)
Publication of WO2024012474A1

Classifications

    • H04N21/234309: Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements, by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4 or from Quicktime to Realvideo
    • H04N21/2343: Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/4402: Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440218: Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display, by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4
    • H04N21/466: Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N21/4666: Learning process for intelligent management characterized by learning algorithms using neural networks, e.g. processing the feedback provided by the user

Definitions

  • the present disclosure relates to the field of coding and decoding technology, and in particular to a neural network-based image decoding and encoding method, device and equipment.
  • Video images are transmitted after encoding.
  • Complete video encoding can include prediction, transformation, quantization, entropy coding, filtering and other processes.
  • the prediction process can include intra-frame prediction and inter-frame prediction.
  • Inter-frame prediction refers to using the correlation in the video time domain to predict the current pixel using pixels adjacent to the encoded image to effectively remove video time domain redundancy.
  • Intra-frame prediction refers to using the correlation in the video spatial domain to predict the current pixel using pixels of already-encoded blocks in the current frame image, so as to remove video spatial domain redundancy.
  • Deep learning has been successful in many high-level computer vision problems, such as image classification, target detection, etc.
  • Deep learning has also gradually begun to be applied in the field of encoding and decoding, that is, neural networks can be used to encode and decode images.
  • Although encoding and decoding methods based on neural networks have shown great performance potential, they still have problems such as poor stability, poor generalization, and high complexity.
  • In view of this, the present disclosure provides an image decoding and encoding method, device, and equipment based on neural networks, which improves the encoding performance and decoding performance, and solves the problems of poor stability, poor generalization, and high complexity.
  • the present disclosure provides an image decoding method based on neural network, which is applied to the decoding end.
  • the method includes:
  • the input features corresponding to the decoding processing unit are determined based on the image information, and the input features are processed based on the decoding neural network to obtain the output features corresponding to the decoding processing unit.
  • the present disclosure provides an image coding method based on neural network, which is applied to the coding end.
  • the method includes:
  • the input features corresponding to the coding processing unit are determined based on the current block, the input features are processed based on the coding neural network corresponding to the coding processing unit to obtain the output features corresponding to the coding processing unit, and the image information corresponding to the current block is determined based on the output features;
  • control parameters corresponding to the current block are obtained, where the control parameters include neural network information corresponding to the decoding processing unit, and the neural network information is used to determine the decoding neural network corresponding to the decoding processing unit;
  • the image information and control parameters corresponding to the current block are encoded in the code stream.
  • the present disclosure provides an image decoding device based on a neural network, which device includes:
  • memory configured to store video data
  • a decoder configured to:
  • the input features corresponding to the decoding processing unit are determined based on the image information, and the input features are processed based on the decoding neural network to obtain the output features corresponding to the decoding processing unit.
  • the present disclosure provides an image coding device based on neural network, the device includes:
  • memory configured to store video data
  • An encoder configured to:
  • the input features corresponding to the coding processing unit are determined based on the current block, the input features are processed based on the coding neural network corresponding to the coding processing unit to obtain the output features corresponding to the coding processing unit, and the image information corresponding to the current block is determined based on the output features;
  • control parameters corresponding to the current block are obtained, where the control parameters include neural network information corresponding to the decoding processing unit, and the neural network information is used to determine the decoding neural network corresponding to the decoding processing unit;
  • the image information and control parameters corresponding to the current block are encoded in the code stream.
  • the present disclosure provides a decoding end device, including: a processor and a machine-readable storage medium, where the machine-readable storage medium stores machine-executable instructions that can be executed by the processor;
  • the processor is configured to execute machine-executable instructions to implement the above neural network-based image decoding method.
  • the present disclosure provides an encoding end device, including: a processor and a machine-readable storage medium, where the machine-readable storage medium stores machine-executable instructions that can be executed by the processor;
  • the processor is configured to execute machine-executable instructions to implement the above neural network-based image coding method.
  • control parameters corresponding to the current block can be decoded from the code stream
  • the neural network information corresponding to the decoding processing unit can be obtained from the control parameters
  • the decoding neural network corresponding to the decoding processing unit can be generated based on the neural network information, and image decoding can then be implemented based on the decoding neural network to improve decoding performance.
  • Image coding can be implemented based on the coding neural network corresponding to the coding processing unit to improve coding performance.
  • Neural networks (such as decoding neural networks and encoding neural networks) can be used to encode and decode images, the neural network information is transmitted through the code stream, and the decoding neural network corresponding to the decoding processing unit is then generated based on the neural network information. This addresses the problems of poor stability, poor generalization, and high complexity, that is, it provides better stability, better generalization, and lower complexity, and it offers a solution for dynamic adjustment of encoding and decoding complexity, with better encoding performance and decoding performance than a single neural network framework.
  • The neural network information obtained from the control parameters is the neural network information for the current block, so that a decoding neural network is generated for each current block; that is, the decoding neural networks for different current blocks may be the same or different, so the block-level decoding neural network can be changed and adjusted.
  • Figure 1 is a schematic diagram of the video coding framework
  • FIGS. 2A-2C are schematic diagrams of the video coding framework
  • Figure 3 is a flow chart of a neural network-based image decoding method in an embodiment of the present disclosure
  • Figure 4 is a flow chart of a neural network-based image coding method in an embodiment of the present disclosure
  • Figures 5A and 5C are schematic diagrams of an image encoding method and an image decoding method in an embodiment of the present disclosure
  • Figures 5B and 5D are schematic diagrams of boundary filling in an embodiment of the present disclosure.
  • Figure 5E is a schematic diagram of image domain transformation of an original image in an embodiment of the present disclosure.
  • Figures 6A and 6B are schematic structural diagrams of the decoding end in an embodiment of the present disclosure.
  • Figure 6C is a schematic structural diagram of a coefficient hyperparameter feature generation unit in an embodiment of the present disclosure.
  • Figure 6D is a schematic structural diagram of an image feature inverse transformation unit in an embodiment of the present disclosure.
  • Figure 7A, Figure 7B and Figure 7C are schematic structural diagrams of the encoding end in an embodiment of the present disclosure
  • Figure 8A is a hardware structure diagram of a decoding terminal device in an embodiment of the present disclosure.
  • Figure 8B is a hardware structure diagram of an encoding terminal device in an embodiment of the present disclosure.
  • the first information may also be called second information, and similarly, the second information may also be called first information, depending on the context.
  • depending on the context, the word "if" as used herein may be interpreted as "when" or "in response to determining".
  • a neural network-based image decoding and encoding method is proposed, which may involve the following concepts:
  • Neural Network refers to artificial neural network, not biological neural network.
  • Neural network is a computing model composed of a large number of nodes (or neurons) connected to each other.
  • neuron processing units can represent different objects, such as features, letters, concepts, or some meaningful abstract patterns.
  • the types of processing units in neural networks are divided into three categories: input units, output units and hidden units.
  • the input unit receives signals and data from the external world; the output unit realizes the output of processing results; the hidden unit is a unit that is between the input and output units and cannot be observed from outside the system.
  • the connection weights between neurons reflect the connection strength between units, and the representation and processing of information are reflected in the connection relationships of processing units.
  • A neural network is a non-programmed, brain-like information processing method; its essence is to obtain a parallel, distributed information processing capability through the transformation and dynamic behavior of the network, imitating, to varying degrees and at various levels, the information processing functions of the human brain and nervous system.
  • commonly used neural networks can include but are not limited to: convolutional neural network (CNN), recurrent neural network (RNN), fully connected network, etc.
  • Convolutional neural network is a feedforward neural network and one of the most representative network structures in deep learning technology.
  • the artificial neurons of a convolutional neural network can respond to surrounding units within part of their coverage range, which gives excellent performance for large-scale image processing.
  • The basic structure of a convolutional neural network consists of two kinds of layers. The first is the feature extraction layer (also called the convolution layer): the input of each neuron is connected to the local receptive field of the previous layer, and local features are extracted; once a local feature is extracted, its positional relationship with other features is also determined. The second is the feature mapping layer (also called the activation layer): each computation layer of the network is composed of multiple feature maps.
  • Each feature map is a plane, and the weights of all neurons on the plane are equal.
  • the feature mapping structure can use Sigmoid function, ReLU function, Leaky-ReLU function, PReLU function, GDN function, etc. as the activation function of the convolutional network.
  • neurons on a mapping surface share weights, the number of free parameters of the network is reduced.
  • Compared with traditional image processing algorithms, convolutional neural networks avoid complex pre-processing of images (such as extracting hand-crafted features) and can directly take the original image as input for end-to-end learning.
  • Compared with convolutional neural networks, ordinary neural networks adopt a fully connected approach, that is, all neurons from the input layer to the hidden layer are connected; doing so results in a huge number of parameters, making network training time-consuming or even infeasible. Convolutional neural networks avoid this difficulty through local connections, weight sharing, and other methods.
  • the deconvolution layer is also called the transposed convolution layer.
  • The working processes of the deconvolution layer and the convolution layer are very similar; the main difference is that the deconvolution layer uses padding so that the output is larger than the input (or kept the same size). If the stride is 1, the output size equals the input size; if the stride (step size) is N, the width of the output feature is N times the width of the input feature, and the height of the output feature is N times the height of the input feature.
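  • To make the stride/output-size relationship above concrete, the following is a minimal sketch using PyTorch (an assumed framework; the kernel size and padding values chosen here are illustrative, not taken from the disclosure):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 8, 16, 16)  # (N, C, H, W) input feature map

# stride = 1: output spatial size equals input size (kernel 3, padding 1)
same_size = nn.ConvTranspose2d(8, 8, kernel_size=3, stride=1, padding=1)
print(same_size(x).shape)      # torch.Size([1, 8, 16, 16])

# stride = 2: output width and height are 2x the input (kernel 4, padding 1)
upsample_2x = nn.ConvTranspose2d(8, 8, kernel_size=4, stride=2, padding=1)
print(upsample_2x(x).shape)    # torch.Size([1, 8, 32, 32])
```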
  • Generalization ability can refer to the adaptability of a machine learning algorithm to fresh samples.
  • The purpose of learning is to learn the rules hidden behind the data, so that for data beyond the learning set that follows the same rules, the trained network can still give appropriate outputs; this ability is called generalization ability.
  • Rate-Distortion Optimization (RDO): there are two major indicators for evaluating coding efficiency, bit rate and PSNR (Peak Signal to Noise Ratio); the smaller the bit stream, the greater the compression ratio, and the greater the PSNR, the better the reconstructed image quality. In mode decision, the discriminant formula is essentially a combined evaluation of the two, where SAD is the sum of the absolute values of the differences between the reconstructed image block and the source image, λ is the Lagrange multiplier, and R is the actual number of bits required for encoding the image block in this mode, including the total number of bits required for encoding mode information, motion information, residuals, and so on.
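  • The relationship implied by the surrounding description is the standard Lagrangian rate-distortion cost; a generic form (reconstructed here for clarity, not quoted verbatim from the disclosure) is:

```latex
J(\text{mode}) = \mathrm{SAD} + \lambda \cdot R
```

  • The candidate tool or mode with the smallest cost J is selected, balancing distortion against the bits it consumes.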
  • Video coding framework Refer to Figure 1, which is a schematic diagram of the video coding framework of the encoding end. This video coding framework can be used to implement the encoding end processing flow of the embodiment of the present disclosure.
  • The schematic diagram of the video decoding framework can be similar to Figure 1 and will not be repeated here; the video decoding framework can be used to implement the decoding-end processing flow of the embodiments of the present disclosure.
  • the video coding framework may include modules such as prediction, transformation, quantization, entropy encoder, inverse quantization, inverse transformation, reconstruction, and filtering. On the coding side, through the cooperation between these modules, the processing flow of the coding side can be realized.
  • the video decoding framework can include prediction, transformation, quantization, entropy decoder, inverse quantization, inverse transformation, reconstruction, filtering and other modules. On the decoding end, through the cooperation between these modules, the processing flow of the decoding end can be realized.
  • Through rate-distortion optimization (RDO), the optimal tool or mode is determined, and the decision information of the tool or mode is transmitted by encoding flag information in the bit stream.
  • The encoding end includes a prediction processing unit, a residual calculation unit, a transform processing unit, a quantization unit, an encoding unit, an inverse quantization unit (which may also be called a dequantization unit), an inverse transform processing unit (which may also be called an inverse transform unit), a reconstruction unit, and a filter unit.
  • the encoding end may also include a buffer and a decoded image buffer, wherein the buffer is used to cache the reconstructed image blocks output by the reconstruction unit, and the decoded image buffer is used to cache the filtered image blocks output by the filter unit.
  • the input of the encoding end is the image block of the image (which can be called the image to be encoded).
  • the image block can also be called the current block or the block to be encoded.
  • The encoding end can also include a segmentation unit (not shown in the figure), which is used to divide the image to be encoded into multiple image blocks.
  • the encoding end is used for block-by-block encoding to complete the encoding of the image to be encoded, for example, performing the encoding process on each image block.
  • The prediction processing unit is used to receive or obtain an image block (the current image block to be encoded of the current image to be encoded, which can also be called the current block; the image block can be understood as the true value of the image block) and reconstructed image data, and to predict the current block based on the relevant data in the reconstructed image data to obtain a prediction block of the current block.
  • the prediction processing unit may include an inter prediction unit, an intra prediction unit and a mode selection unit.
  • The mode selection unit is used to select an intra prediction mode or an inter prediction mode; if the intra prediction mode is selected, the intra prediction unit performs the prediction process, and if the inter prediction mode is selected, the inter prediction unit performs the prediction process.
  • the residual calculation unit is used to calculate the residual between the true value of the image block and the prediction block of the image block to obtain the residual block.
  • For example, the residual calculation unit can subtract the pixel values of the prediction block from the pixel values of the image block pixel by pixel.
  • the transform processing unit is used to transform the residual block, such as discrete cosine transform (DCT) or discrete sine transform (DST), to obtain transform coefficients in the transform domain.
  • The transform coefficients can also be called transform residual coefficients, which represent the residual block in the transform domain.
  • the quantization unit is used to quantize the transform coefficient by applying scalar quantization or vector quantization to obtain a quantized transform coefficient, which may also be called a quantized residual coefficient.
  • the quantization process can reduce the bit depth associated with some or all transform coefficients. For example, n-bit transform coefficients may be rounded down to m-bit transform coefficients during quantization, where n is greater than m.
  • the degree of quantization can be modified by adjusting the quantization parameter (QP). For example, with scalar quantization, different scales can be applied to achieve finer or coarser quantization. A smaller quantization step size corresponds to finer quantization, while a larger quantization step size corresponds to coarser quantization.
  • the appropriate quantization step size can be indicated by the quantization parameter (QP).
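  • As a rough illustration of scalar quantization with a step size, the sketch below (in Python, with illustrative values; the actual mapping between QP and quantization step size is codec-specific and not specified here) shows how a smaller step preserves coefficients more finely than a larger one:

```python
import numpy as np

def quantize(coeffs: np.ndarray, qstep: float) -> np.ndarray:
    """Scalar quantization: map transform coefficients to integer levels."""
    return np.round(coeffs / qstep).astype(np.int32)

def dequantize(levels: np.ndarray, qstep: float) -> np.ndarray:
    """Inverse quantization: reconstruct approximate coefficient values."""
    return levels.astype(np.float64) * qstep

coeffs = np.array([12.7, -3.2, 0.4, 25.0])
print(dequantize(quantize(coeffs, 1.0), 1.0))  # fine step   -> [13. -3.  0. 25.]
print(dequantize(quantize(coeffs, 8.0), 8.0))  # coarse step -> [16.  0.  0. 24.]
```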
  • The encoding unit is used to encode the above-mentioned quantized residual coefficients and output the encoded image data in the form of an encoded bit stream (that is, the encoding result of the current image block to be encoded); the encoded bit stream is then transmitted to the decoder, or stored for subsequent transmission to the decoder or for retrieval.
  • the coding unit can also be used to encode other syntax elements of the current image block, such as encoding the prediction mode to the code stream.
  • Coding algorithms include but are not limited to: the variable length coding (VLC) algorithm, the context-adaptive VLC (CAVLC) algorithm, the arithmetic coding algorithm, the context-adaptive binary arithmetic coding (CABAC) algorithm, the syntax-based context-adaptive binary arithmetic coding (SBAC) algorithm, and the probability interval partitioning entropy (PIPE) algorithm.
  • the inverse quantization unit is used to perform inverse quantization on the above-mentioned quantized coefficients to obtain the inverse quantized coefficients.
  • the inverse quantization is a reverse application of the above-mentioned quantization unit.
  • For example, the inverse quantization scheme corresponding to the applied quantization scheme is used, based on or using the same quantization step size as the quantization unit; the resulting coefficients may also be called inverse quantized residual coefficients.
  • the inverse transformation processing unit is used to inversely transform the above-mentioned inverse quantization coefficient. It should be understood that the inverse transformation is an inverse application of the above-mentioned transformation processing unit.
  • the inverse transformation may include inverse discrete cosine transform (IDCT) or Inverse discrete sine transform (IDST) to obtain the inverse transform block in the pixel domain (or sample domain).
  • the inverse transform block may also be called an inverse transform inverse quantized block or an inverse transform residual block.
  • the reconstruction unit is used to add the inverse transform block (ie, the inverse transform residual block) to the prediction block to obtain the reconstructed block in the sample domain.
  • For example, the reconstruction unit may be a summer that adds the sample values (i.e., pixel values) of the inverse transform residual block to the sample values of the prediction block.
  • the reconstructed block output by the reconstruction unit can be subsequently used to predict other image blocks, for example, in intra prediction mode.
  • The filter unit (or simply "filter") is used to filter the reconstructed block to obtain a filtered block, so as to smooth pixel transitions or otherwise improve image quality.
  • the filter unit may be a loop filter unit, intended to represent one or more loop filters, for example, the filter unit may be a deblocking filter, sample adaptive offset (sample-adaptive offset, SAO) filter or other filter, such as a bilateral filter, an adaptive loop filter (ALF), or a sharpening or smoothing filter, or a collaborative filter.
  • the filtered blocks output by the filtering unit can be subsequently used to predict other image blocks, for example, in inter prediction mode, without limitation.
  • Referring to FIG. 2B, a schematic block diagram of an example of a decoding end (which may also be referred to as a decoder) for implementing an embodiment of the present disclosure is shown.
  • the decoder is configured to receive encoded image data (ie, an encoded bitstream, e.g., an encoded bitstream including image blocks and associated syntax elements), eg, encoded by the encoder, to obtain a decoded image.
  • the decoder includes a decoding unit, an inverse quantization unit, an inverse transform processing unit, a prediction processing unit, a reconstruction unit, and a filter unit.
  • the decoder may perform a decoding pass that is generally reciprocal to the encoding pass described for the encoder of Figure 2A.
  • the decoder may further include a buffer and a decoded image buffer, wherein the buffer is used to cache the reconstructed image block output by the reconstruction unit, and the decoded image buffer is used to cache the filtered image block output by the filter unit.
  • The decoding unit is configured to perform decoding on the encoded image data to obtain quantized coefficients and/or decoded coding parameters (for example, the decoded coding parameters may include any or all of inter prediction parameters, intra prediction parameters, filter parameters, and/or other syntax elements).
  • the decoding unit is also configured to forward the decoded encoding parameters to the prediction processing unit, so that the prediction processing unit performs a prediction process according to the encoding parameters.
  • the inverse quantization unit may have the same function as the inverse quantization unit of the encoder, for inverse quantization (ie, inverse quantization) of the quantized coefficients decoded by the decoding unit.
  • the function of the inverse transform processing unit may be the same as the function of the inverse transform processing unit of the encoder, and the function of the reconstruction unit (such as a summer) may be the same as the function of the reconstruction unit of the encoder.
  • The inverse quantized coefficients are inversely transformed (for example, by an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process) to obtain an inverse transform block (which can also be called an inverse transform residual block); this inverse transform block is the residual block of the current image block in the pixel domain.
  • the prediction processing unit is configured to receive or obtain encoded image data (such as the encoded bit stream of the current image block) and reconstructed image data.
  • The prediction processing unit may also receive or obtain prediction-related parameters and/or information about the selected prediction mode (i.e., the decoded coding parameters), for example from the decoding unit.
  • the current image block is predicted based on the relevant data in the reconstructed image data and the decoded coding parameters to obtain a prediction block of the current image block.
  • the prediction processing unit may include an inter prediction unit, an intra prediction unit and a mode selection unit.
  • The mode selection unit is used to select an intra prediction mode or an inter prediction mode; if the intra prediction mode is selected, the intra prediction unit performs the prediction process, and if the inter prediction mode is selected, the inter prediction unit performs the prediction process.
  • the reconstruction unit is used to add the inverse transform block (ie, the inverse transform residual block) to the prediction block to obtain the reconstructed block in the sample domain.
  • For example, the sample values of the inverse transform residual block can be added to the sample values of the prediction block.
  • the filter unit is used to filter the reconstructed block to obtain a filtered block, which is a decoded image block.
  • The processing result of a certain stage can also be further processed, and the further processed result can be output to the next stage; for example, after interpolation filtering, motion vector derivation, or filtering, the processing results of the corresponding stages are further subjected to operations such as clip or shift.
  • Figure 2C is a schematic flow chart of encoding and decoding provided by the embodiment of the present disclosure.
  • The encoding and decoding implementation includes Process 1 to Process 5, which can be executed by the above-mentioned encoder and decoder.
  • Process 1 Divide a frame of image into one or more parallel coding units that do not overlap with each other. There is no dependency between the one or more parallel coding units, and they can be encoded and decoded completely in parallel and independently of each other, such as the parallel coding unit 1 and the parallel coding unit 2 shown in Figure 2C.
  • Process 2 Each parallel coding unit can be divided into one or more independent coding units that do not overlap with each other.
  • the independent coding units may not depend on each other, but they can share some parallel coding unit header information.
  • the width of an independent coding unit is w_lcu and the height is h_lcu. If the parallel coding unit is divided into an independent coding unit, the size of the independent coding unit is exactly the same as that of the parallel coding unit; otherwise, the width of the independent coding unit should be greater than the height (unless it is an edge area).
  • The independent coding unit can be a fixed w_lcu × h_lcu, where w_lcu and h_lcu are both 2 to the power of N (N ≥ 0).
  • For example, the size of the independent coding unit can be 128×4, 64×4, 32×4, 16×4, 8×4, 32×2, 16×2, 8×2, etc.
  • For example, the independent coding unit may be fixed at 128×4. If the size of the parallel coding unit is 256×8, the parallel coding unit can be equally divided into 4 independent coding units; if the size of the parallel coding unit is 288×10, the parallel coding unit can be divided as follows: the first row and the second row each consist of two 128×4 independent coding units and one 32×4 independent coding unit, and the third row consists of two 128×2 independent coding units and one 32×2 independent coding unit.
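  • The fixed-size tiling above can be illustrated with a small sketch (a hypothetical helper, not part of the disclosure) that reproduces the 256×8 and 288×10 examples, letting edge areas keep the leftover width and height:

```python
def split_parallel_unit(width, height, w_lcu=128, h_lcu=4):
    """Tile a parallel coding unit into independent coding units of at most w_lcu x h_lcu."""
    units, y = [], 0
    while y < height:
        h, x = min(h_lcu, height - y), 0
        while x < width:
            w = min(w_lcu, width - x)
            units.append((x, y, w, h))  # (x, y, width, height) of one independent coding unit
            x += w
        y += h
    return units

print(split_parallel_unit(256, 8))   # four 128x4 units
print(split_parallel_unit(288, 10))  # rows 1-2: 128x4, 128x4, 32x4; row 3: 128x2, 128x2, 32x2
```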
  • The independent coding unit can include the three components luma Y, chroma Cb, and chroma Cr, or the three components red (R), green (G), and blue (B), or the three components luma Y, chroma Co, and chroma Cg, or it may include only one of them. If the independent coding unit contains three components, the sizes of the three components can be exactly the same or different, depending on the input format of the image.
  • Process 3 Each independent coding unit can be divided into one or more sub-coding units that do not overlap with each other. The sub-coding units within an independent coding unit can depend on each other; for example, multiple sub-coding units can refer to each other for prediction during encoding and decoding.
  • If the size of the sub-coding unit is the same as that of the independent coding unit (that is, the independent coding unit is divided into only one sub-coding unit), its size can be any of the sizes described in Process 2.
  • If the independent coding unit is divided into multiple non-overlapping sub-coding units, feasible divisions include: horizontal division (the height of the sub-coding unit is the same as that of the independent coding unit, but the width is different and can be 1/2, 1/4, 1/8, 1/16, etc.), vertical division (the width of the sub-coding unit is the same as that of the independent coding unit, but the height is different and can be 1/2, 1/4, 1/8, 1/16, etc.), and horizontal-and-vertical equal division (quadtree division), with horizontal equal division preferred.
  • the width of the sub-coding unit is w_cu and the height is h_cu.
  • the width should be greater than the height (unless it is an edge area).
  • The sub-coding unit can be a fixed w_cu × h_cu, where w_cu and h_cu are both 2 to the power of N (N ≥ 0).
  • For example, the sub-coding unit may be fixed at 16×4. If the size of the independent coding unit is 64×4, the independent coding unit is equally divided into 4 sub-coding units; if the size of the independent coding unit is 72×4, it is divided into four 16×4 sub-coding units and one 8×4 sub-coding unit.
  • the sub-coding unit can include three components of brightness Y, chroma Cb, and chroma Cr (or three components of red R, green G, and blue B, or brightness Y, chroma Co, and chroma Cg) , or it can contain only one of its components. If it contains three components, the sizes of the components can be exactly the same or different, depending on the image input format.
  • Process 3 can be an optional step in the encoding and decoding method; in that case, the encoder/decoder can directly encode and decode the residual coefficients (or residual values) of the independent coding units obtained by Process 2.
  • Process 4 Each sub-coding unit can be further divided into one or more non-overlapping prediction groups (PGs), where a PG is also referred to as a Group. Each PG is encoded and decoded according to the selected prediction mode to obtain the predicted value of the PG, and the predicted values together constitute the predicted value of the entire sub-coding unit; based on the predicted value and the original value of the sub-coding unit, the residual value of the sub-coding unit is obtained.
  • Process 5 Based on the residual values of the sub-coding units, the residual values are grouped to obtain one or more non-overlapping residual blocks (RBs).
  • The residual coefficients of each RB are encoded and decoded according to the selected mode to form a residual coefficient stream; specifically, the modes can be divided into two categories, namely those that transform the residual coefficients and those that do not.
  • The mode selected for residual coefficient encoding and decoding in Process 5 may include, but is not limited to, any of the following: semi-fixed-length coding, exponential Golomb coding, Golomb-Rice coding, truncated unary coding, run-length coding, direct encoding of the original residual value, etc.
  • the encoder may directly encode the coefficients within the RB.
  • the encoder can also transform the residual block (such as DCT, DST, Hadamard transform, etc.), and then encode the transformed coefficients.
  • the encoder can directly quantize each coefficient in the RB uniformly, and then perform binary encoding.
  • The RB can be further divided into multiple coefficient groups (CGs), and each CG can then be uniformly quantized and binary encoded. A coefficient group (CG) and a quantization group (QG) may be the same, or they may be different.
  • the maximum value of the absolute value of the residual within an RB block is defined as the modified maximum (MM).
  • Based on the MM, the number of coded bits used for the residual coefficients in the RB block is determined (the number of coded bits for residual coefficients within the same RB block is consistent). For example, if the critical limit (CL) of the current RB block is 2 and the current residual coefficient is 1, then 2 bits are needed to encode the residual coefficient 1, which is expressed as 01; if the CL of the current RB block is 7, an 8-bit residual coefficient and a 1-bit sign bit are encoded.
  • The CL is determined by finding the minimum value of M such that all residuals of the current sub-block lie within the range [-2^(M-1), 2^(M-1)]. If both boundary values -2^(M-1) and 2^(M-1) are present, M is increased by 1, that is, M+1 bits are needed to encode all the residuals of the current RB block; if only one of the two boundary values -2^(M-1) and 2^(M-1) is present, a Trailing bit is encoded to determine whether the boundary value is -2^(M-1) or 2^(M-1); if neither -2^(M-1) nor 2^(M-1) is present among the residuals, the Trailing bit does not need to be encoded. For some special cases, the encoder can directly encode the original value of the image instead of the residual value.
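  • One way to read the CL rule above is sketched below (an assumed interpretation in Python, not normative syntax); it finds the minimum M covering all residuals and then applies the boundary-value adjustments described in the text:

```python
def critical_limit(residuals):
    """Return (number of coded bits, whether a Trailing bit is encoded) for an RB block."""
    m = 1
    while not all(-(1 << (m - 1)) <= r <= (1 << (m - 1)) for r in residuals):
        m += 1
    lo, hi = -(1 << (m - 1)), 1 << (m - 1)
    has_lo, has_hi = lo in residuals, hi in residuals
    if has_lo and has_hi:
        return m + 1, False   # both boundary values present: M + 1 bits, no Trailing bit
    if has_lo or has_hi:
        return m, True        # exactly one boundary value present: encode a Trailing bit
    return m, False           # neither boundary value present: no Trailing bit

print(critical_limit([1, 0]))          # (1, True): only the boundary value 1 is present
print(critical_limit([1, -1, 0, 2]))   # (2, True): 2 is the upper boundary value for M = 2
print(critical_limit([1, -1]))         # (2, False): both boundary values present, so M + 1 bits
```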
  • Neural networks can adaptively construct feature descriptions driven by training data, with higher flexibility and universality; this has made deep learning successful in many high-level computer vision problems, such as image classification and target detection, and deep learning has gradually begun to be applied in the field of encoding and decoding, that is, using neural networks to encode and decode images.
  • One example is the convolutional neural network VRCNN (Variable Filter Size Convolutional Network).
  • Neural networks can also be applied to intra-frame prediction; for example, an intra-frame prediction mode based on block down-sampling and up-sampling has been proposed, in which intra-frame prediction blocks are first down-sampled and encoded, and the reconstructed pixels are then up-sampled through a neural network, yielding a performance gain of up to 9.0% for ultra-high-definition sequences. Clearly, neural networks can effectively overcome the limitations of hand-designed models: networks that meet actual needs can be obtained in a data-driven manner, significantly improving coding performance.
  • Although the encoding and decoding methods based on neural networks show great performance potential, they still have problems such as poor stability, poor generalization, and high complexity.
  • Neural networks are still iterating rapidly, and new network structures keep emerging; even for some common problems, there is still no definite answer as to which network structure is optimal, let alone for the specific problems of particular encoder modules. Therefore, for a given module of the encoder, using a single fixed neural network is risky.
  • Moreover, the training of neural networks is highly dependent on training data; if the training data does not contain a certain feature of the actual problem, poor performance is likely to occur when dealing with that problem.
  • In encoding and decoding applications, a mode usually uses only one neural network; if that neural network has insufficient generalization ability, the mode will cause a reduction in coding performance.
  • the video encoding standard technology has extremely strict requirements on complexity, especially decoding complexity.
  • the parameter amount of the neural network used for encoding is often very large (such as more than 1M).
  • The average number of multiply-add operations incurred by applying a neural network can exceed 100K per pixel, and even simplified networks still have a relatively large number of layers or parameters.
  • an embodiment of the present disclosure proposes an image decoding method and an image encoding method based on neural networks.
  • Neural networks such as decoding neural networks and encoding neural networks, etc.
  • The optimization idea is that, in addition to focusing on encoding performance and decoding performance, attention must also be paid to complexity (especially the parallelism of coefficient encoding and decoding) and to application functionality (supporting variable and finely adjustable code rates, that is, controllable code rates).
  • Embodiment 1 In this embodiment of the present disclosure, an image decoding method based on neural network is proposed. See Figure 3, which is a schematic flow chart of the method. This method can be applied to the decoding end (also called a video decoder). The method may include steps 301-303.
  • Step 301 Decode the control parameters and image information corresponding to the current block from the code stream.
  • Step 302 Obtain the neural network information corresponding to the decoding processing unit from the control parameter, and generate the decoding neural network corresponding to the decoding processing unit based on the neural network information.
  • Step 303 Determine the input features corresponding to the decoding processing unit based on the image information, process the input features based on the decoding neural network, and obtain the output features corresponding to the decoding processing unit.
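  • Purely as a structural illustration of how Steps 301-303 fit together (every name below is a hypothetical placeholder, and the network is stubbed as an identity so the snippet is self-contained and runnable):

```python
def parse_code_stream(code_stream):
    # Step 301: decode the control parameters and image information of the current block
    return code_stream["control_parameters"], code_stream["image_information"]

def build_decoding_network(nn_info):
    # Step 302: generate the decoding neural network from the signalled neural network
    # information (e.g. base layer plus enhancement layer); stubbed as an identity here
    return lambda features: features

def decode_block(code_stream):
    control_params, image_info = parse_code_stream(code_stream)
    network = build_decoding_network(control_params["neural_network_info"])
    input_features = image_info["features"]        # Step 303: derive input features ...
    return network(input_features)                 # ... and produce the output features

print(decode_block({"control_parameters": {"neural_network_info": {}},
                    "image_information": {"features": [0.1, 0.2]}}))
```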
  • If the neural network information includes base layer information and enhancement layer information, the base layer corresponding to the decoding processing unit can be determined based on the base layer information, and the enhancement layer corresponding to the decoding processing unit can be determined based on the enhancement layer information; then, the decoding neural network corresponding to the decoding processing unit is generated based on the base layer and the enhancement layer.
  • If the base layer information includes the base layer uses default network flag, and this flag indicates that the base layer uses the default network, then the base layer of the default network structure is obtained.
  • If the base layer information includes the base layer uses prefabricated network flag and the base layer prefabricated network index number, and the flag indicates that the base layer uses a prefabricated network, then the base layer of the prefabricated network structure corresponding to the base layer prefabricated network index number can be selected from the prefabricated neural network pool.
  • the prefabricated neural network pool may include at least one network layer of a prefabricated network structure.
  • If the enhancement layer information includes the enhancement layer uses default network flag, and this flag indicates that the enhancement layer uses the default network, then the enhancement layer of the default network structure is obtained.
  • If the enhancement layer information includes the enhancement layer uses prefabricated network flag and the enhancement layer prefabricated network index number, and the flag indicates that the enhancement layer uses a prefabricated network, then the enhancement layer of the prefabricated network structure corresponding to the enhancement layer prefabricated network index number can be selected from the prefabricated neural network pool.
  • the prefabricated neural network pool includes at least one network layer of a prefabricated network structure.
  • If the enhancement layer information includes network parameters used to generate the enhancement layer, the enhancement layer corresponding to the decoding processing unit is generated based on the network parameters; the network parameters may include at least one of the following: the number of neural network layers, a deconvolution layer flag, the number of deconvolution layers, the quantization step size of each deconvolution layer, the number of channels of each deconvolution layer, the convolution kernel size, the number of filters, a filter size index, a filter coefficient zero flag, filter coefficients, an activation layer flag, and the activation layer type.
  • the above are just a few examples and are not limiting.
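  • As one possible reading of how such signalled parameters could drive network construction, here is a hedged PyTorch sketch (the dictionary keys, stride, and padding below are illustrative assumptions, not syntax defined by the disclosure):

```python
import torch.nn as nn

def build_enhancement_layer(params):
    """Assemble an enhancement layer from signalled network parameters (illustrative only)."""
    layers = []
    for i in range(params["num_deconv_layers"]):
        layers.append(nn.ConvTranspose2d(
            in_channels=params["channels"][i],
            out_channels=params["channels"][i + 1],
            kernel_size=params["kernel_size"],
            stride=2, padding=1))  # stride/padding assumed here, not signalled in this sketch
        if params.get("activation_flag"):
            layers.append(nn.ReLU() if params.get("activation_type") == "relu" else nn.LeakyReLU())
    return nn.Sequential(*layers)

enhancement = build_enhancement_layer({
    "num_deconv_layers": 2,
    "channels": [64, 32, 3],   # per-layer channel counts (illustrative)
    "kernel_size": 4,
    "activation_flag": True,
    "activation_type": "relu",
})
print(enhancement)
```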
  • the image information may include coefficient hyperparameter feature information and image feature information.
  • the input features corresponding to the decoding processing unit are determined based on the image information.
  • The input features are processed based on the decoding neural network to obtain the output features corresponding to the decoding processing unit. This may include, but is not limited to, the following: when performing the decoding process of coefficient hyperparameter feature generation, the coefficient hyperparameter feature coefficient reconstruction value can be determined based on the coefficient hyperparameter feature information, and an inverse transformation operation is performed on the coefficient hyperparameter feature coefficient reconstruction value based on the decoding neural network to obtain the coefficient hyperparameter feature values, where the coefficient hyperparameter feature values can be used to decode the image feature information from the code stream.
  • When performing the decoding process of image feature inverse transformation, the image feature reconstruction value can be determined based on the image feature information, and the image feature reconstruction value is inversely transformed based on the decoding neural network to obtain the image low-order feature values, where the image low-order feature values can be used to obtain the reconstructed image block corresponding to the current block.
  • Determining the coefficient hyperparameter feature coefficient reconstruction value based on the coefficient hyperparameter feature information may include, but is not limited to: if the control parameters include first enabling information, and the first enabling information indicates enabling the first inverse quantization operation, the coefficient hyperparameter feature information can be inverse-quantized to obtain the coefficient hyperparameter feature coefficient reconstruction value.
  • Determining the image feature reconstruction value based on the image feature information may include, but is not limited to: if the control parameters include second enabling information, and the second enabling information indicates enabling the second inverse quantization operation, the image feature information can be inverse-quantized to obtain the image feature reconstruction value.
  • If the control parameters include third enabling information, and the third enabling information indicates enabling the quality enhancement operation, the decoding neural network is used to enhance the image low-order feature values to obtain the reconstructed image block corresponding to the current block.
  • the decoding end device may include a control parameter decoding unit, a first feature decoding unit, a second feature decoding unit, a coefficient hyperparameter feature generation unit and an image feature inverse transformation unit;
  • The image information may include coefficient hyperparameter feature information and image feature information.
  • the control parameter decoding unit can decode the control parameters from the code stream
  • the first feature decoding unit can decode the coefficient hyperparameter feature information from the code stream
  • the second feature decoding unit can decode the image feature information from the code stream.
  • When the coefficient hyperparameter feature generation unit serves as a decoding processing unit, the coefficient hyperparameter feature coefficient reconstruction value can be determined based on the coefficient hyperparameter feature information; the coefficient hyperparameter feature generation unit can perform an inverse transformation operation on the coefficient hyperparameter feature coefficient reconstruction value based on the decoding neural network to obtain the coefficient hyperparameter feature values, where the coefficient hyperparameter feature values are used to enable the second feature decoding unit to decode the image feature information from the code stream.
  • When the image feature inverse transformation unit serves as a decoding processing unit, the image feature reconstruction value is determined based on the image feature information, and the image feature inverse transformation unit can perform an inverse transformation operation on the image feature reconstruction value based on the decoding neural network to obtain the image low-order feature values, where the image low-order feature values are used to obtain the reconstructed image block corresponding to the current block.
  • The decoding end device further includes a first inverse quantization unit and a second inverse quantization unit. The control parameters may include first enabling information of the first inverse quantization unit; if the first enabling information indicates enabling the first inverse quantization unit, the first inverse quantization unit can obtain the coefficient hyperparameter feature information from the first feature decoding unit, inverse-quantize the coefficient hyperparameter feature information to obtain the coefficient hyperparameter feature coefficient reconstruction value, and provide the coefficient hyperparameter feature coefficient reconstruction value to the coefficient hyperparameter feature generation unit. The control parameters may also include second enabling information of the second inverse quantization unit; if the second enabling information indicates enabling the second inverse quantization unit, the second inverse quantization unit can obtain the image feature information from the second feature decoding unit, inverse-quantize the image feature information to obtain the image feature reconstruction value, and provide the image feature reconstruction value to the image feature inverse transformation unit.
  • The decoding end device may also include a quality enhancement unit. The control parameters may include third enabling information of the quality enhancement unit; if the third enabling information indicates enabling the quality enhancement unit, then, when the quality enhancement unit serves as a decoding processing unit, the quality enhancement unit can obtain the image low-order feature values from the image feature inverse transformation unit and enhance the image low-order feature values based on the decoding neural network to obtain the reconstructed image block corresponding to the current block.
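  • The conditional wiring of these units can be summarised with a small data-flow sketch (all functions are stubs and all names are placeholders; only the ordering and the enable flags follow the description above):

```python
def decode_pipeline(code_stream):
    ctrl = code_stream["control_parameters"]             # control parameter decoding unit
    hyper = code_stream["hyper_feature_info"]            # first feature decoding unit
    if ctrl.get("enable_first_inverse_quantization"):
        hyper = [v * ctrl["qstep1"] for v in hyper]      # first inverse quantization unit
    hyper_features = hyper                               # coefficient hyperparameter feature
                                                         # generation unit (stubbed as identity)
    image = code_stream["image_feature_info"]            # second feature decoding unit
                                                         # (would use hyper_features here)
    if ctrl.get("enable_second_inverse_quantization"):
        image = [v * ctrl["qstep2"] for v in image]      # second inverse quantization unit
    low_order = image                                    # image feature inverse transformation unit
    if ctrl.get("enable_quality_enhancement"):
        low_order = list(low_order)                      # quality enhancement unit (stub)
    return low_order                                     # reconstructed image block (stub)

print(decode_pipeline({
    "control_parameters": {"enable_first_inverse_quantization": True, "qstep1": 2.0},
    "hyper_feature_info": [1.0],
    "image_feature_info": [0.5],
}))
```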
  • the above execution order is only an example given for convenience of description. In actual applications, the execution order between steps can also be changed, and there is no limit to the execution order. Moreover, in other embodiments, the steps of the corresponding method are not necessarily performed in the order shown and described in this specification, and the method may include more or fewer steps than described in this specification. In addition, a single step described in this specification may be broken down into multiple steps for description in other embodiments; multiple steps described in this specification may also be combined into a single step for description in other embodiments.
  • control parameters corresponding to the current block can be decoded from the code stream
  • the neural network information corresponding to the decoding processing unit can be obtained from the control parameters
  • the decoding neural network corresponding to the decoding processing unit can be generated based on the neural network information, and image decoding can then be implemented based on the decoding neural network to improve decoding performance.
• image coding can be implemented based on the coding neural network corresponding to the coding processing unit to improve coding performance.
• deep learning networks (such as the decoding neural network and the encoding neural network) can be used to encode and decode images, with the neural network information transmitted through the code stream, and the decoding neural network corresponding to the decoding processing unit generated based on that neural network information; this addresses problems such as poor stability, poor generalization and high complexity, that is, it achieves better stability, better generalization and lower complexity, and provides a solution for dynamic adjustment of encoding and decoding complexity with better encoding performance than a framework with a single deep learning network.
• the neural network information obtained from the control parameters is the neural network information for the current block, so a decoding neural network is generated for each current block; that is, the decoding neural networks for different current blocks may be the same or different, and the block-level decoding neural network can therefore be changed and adjusted.
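• As an illustration only (not part of the disclosed syntax), the following Python sketch shows how a block-level decoding neural network could be assembled from signaled base-layer and enhancement-layer information; the field names, the prefabricated pool contents and the use of PyTorch are assumptions made for this sketch:

    import torch
    import torch.nn as nn

    # Hypothetical prefabricated neural network pool, indexable by a signaled id.
    PREFAB_POOL = {
        0: nn.ConvTranspose2d(64, 64, kernel_size=3, stride=1, padding=1),
        1: nn.ConvTranspose2d(64, 3, kernel_size=5, stride=2, padding=2, output_padding=1),
    }
    # Hypothetical default network structures agreed between encoder and decoder.
    DEFAULT_BASE = [nn.ConvTranspose2d(192, 64, kernel_size=5, stride=2, padding=2, output_padding=1),
                    nn.ReLU()]
    DEFAULT_ENH = []  # the enhancement layer may be empty

    def build_decoding_network(nn_info):
        # Assemble base layer + enhancement layer for one current block.
        base = DEFAULT_BASE if nn_info.get("base_use_default", True) \
            else [PREFAB_POOL[i] for i in nn_info["base_pool_ids"]]
        enh = DEFAULT_ENH if nn_info.get("enh_use_default", True) \
            else [PREFAB_POOL[i] for i in nn_info["enh_pool_ids"]]
        return nn.Sequential(*base, *enh)

    # Different current blocks may carry different neural network information,
    # so their decoding networks may be the same or different.
    net_block_1 = build_decoding_network({"base_use_default": True, "enh_use_default": True})
    net_block_2 = build_decoding_network({"base_use_default": True,
                                          "enh_use_default": False, "enh_pool_ids": [0]})
    y = net_block_2(torch.randn(1, 192, 8, 8))  # 1 x 64 x 16 x 16 output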
  • Embodiment 2 In this embodiment of the present disclosure, an image coding method based on neural network is proposed. See Figure 4, which is a schematic flow chart of the method. This method can be applied to the encoding end (also called a video encoder). The method may include steps 401-403.
• Step 401 Determine the input features corresponding to the coding processing unit based on the current block, process the input features based on the coding neural network corresponding to the coding processing unit, obtain the output features corresponding to the coding processing unit, and determine the image information corresponding to the current block based on the output features, the image information being, for example, coefficient hyperparameter feature information and image feature information.
  • Step 402 Obtain the control parameters corresponding to the current block.
  • the control parameters may include neural network information corresponding to the decoding processing unit.
  • the neural network information is used to determine the decoding neural network corresponding to the decoding processing unit.
  • Step 403 Encode the image information and control parameters corresponding to the current block in the code stream.
  • the neural network information may include base layer information and enhancement layer information
  • the decoding neural network includes a base layer determined based on the base layer information and an enhancement layer determined based on the enhancement layer information.
  • the decoding neural network adopts the base layer of the default network structure.
  • the decoding neural network can be selected from the prefabricated neural network pool.
  • the prefabricated neural network pool may include at least one network layer of a prefabricated network structure.
• the enhancement layer information includes an enhancement-layer-uses-default-network flag
• if the enhancement-layer-uses-default-network flag indicates that the enhancement layer uses the default network
  • the decoding neural network uses the enhancement layer of the default network structure.
  • the decoding neural network can be selected from the prefabricated neural network pool.
  • the prefabricated neural network pool includes at least one network layer of a prefabricated network structure.
• the decoding neural network can use an enhancement layer generated based on the network parameters; wherein the network parameters may include, but are not limited to, at least one of the following: number of neural network layers, deconvolution layer flag bit, number of deconvolution layers, quantization step size of each deconvolution layer, number of channels of each deconvolution layer, convolution kernel size, number of filters, filter size index,
• filter-coefficients-all-zero flag, filter coefficients, activation layer flag, and activation layer type.
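• A minimal Python sketch of how an enhancement layer could be generated from such signaled network parameters (the dictionary keys and the PyTorch layer choices are illustrative assumptions, not the normative syntax):

    import torch
    import torch.nn as nn

    def build_enhancement_layer(params):
        # Build deconvolution layers and optional activation layers from the signaled parameters.
        layers = []
        for i in range(params["num_deconv_layers"]):
            layers.append(nn.ConvTranspose2d(
                in_channels=params["channels"][i],
                out_channels=params["channels"][i + 1],
                kernel_size=params["kernel_size"][i],
                stride=params["stride"][i],
                padding=params["kernel_size"][i] // 2))
            if params["activation_flag"][i]:
                # the activation layer type selects the non-linearity (assumption: 0 = ReLU)
                layers.append(nn.ReLU() if params["activation_type"][i] == 0 else nn.LeakyReLU())
        return nn.Sequential(*layers)

    enh = build_enhancement_layer({
        "num_deconv_layers": 2,
        "channels": [64, 32, 3],
        "kernel_size": [3, 5],
        "stride": [1, 2],
        "activation_flag": [1, 0],
        "activation_type": [0, 0],
    })
    y = enh(torch.randn(1, 64, 16, 16))  # the stride-2 deconvolution increases the spatial size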
  • the image information may include coefficient hyperparameter feature information and image feature information
  • the input features corresponding to the encoding processing unit are determined based on the current block
  • the input features are processed based on the encoding neural network corresponding to the encoding processing unit
  • Obtaining the output features corresponding to the encoding processing unit, and determining the image information corresponding to the current block based on the output features may include but is not limited to:
• feature transformation of the current block may be performed based on the coding neural network to obtain the image feature value corresponding to the current block, where the image feature value is used to determine the image feature information.
• coefficient hyperparameter feature transformation may be performed on the image feature value based on the coding neural network to obtain the coefficient hyperparameter feature coefficient value, which is used to determine the coefficient hyperparameter feature information.
• the coefficient hyperparameter feature coefficient value can be quantized to obtain the coefficient hyperparameter feature coefficient quantized value, and the coefficient hyperparameter feature information can be determined based on the coefficient hyperparameter feature coefficient quantized value.
  • the control parameters may also include first enabling information, and the first enabling information is used to indicate that the first quantization operation has been enabled.
• the image feature value may be quantized to obtain the image feature quantized value, and the image feature information may be determined based on the image feature quantized value.
  • the control parameter may also include second enabling information, and the second enabling information is used to indicate that the second quantization operation has been enabled.
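• A minimal Python sketch of the quantization operations described above (uniform rounding and a single scalar step size are assumptions for illustration; the actual quantization scheme is not restricted here):

    import numpy as np

    def quantize(values, qstep):
        # scalar quantization used by the first/second quantization operations
        return np.round(values / qstep).astype(np.int32)

    def dequantize(quantized, qstep):
        # the corresponding inverse quantization at the decoding end
        return quantized.astype(np.float32) * qstep

    image_feature_values = np.random.randn(192, 16, 16).astype(np.float32)
    qstep = 0.5
    F_q = quantize(image_feature_values, qstep)   # used to determine the image feature information
    F_rec = dequantize(F_q, qstep)                # image feature reconstruction value at the decoder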
• obtaining the control parameters corresponding to the current block may include, but is not limited to: based on the network structure of the coding neural network used in the feature transformation encoding process, determining the neural network information used in the image feature inverse transformation decoding process of the decoding end device, which neural network information is used to determine the decoding neural network corresponding to the image feature inverse transformation decoding process of the decoding end device; and/or, based on the network structure of the coding neural network used in the coefficient hyperparameter feature transformation encoding process, determining the neural network information used in the coefficient hyperparameter feature generation decoding process of the decoding end device.
  • the neural network information is used to determine the decoding neural network corresponding to the decoding process of generating the coefficient hyperparameter features of the decoding end device.
  • the encoding end device may include a control parameter encoding unit, a first feature encoding unit, a second feature encoding unit, a feature transformation unit, and a coefficient hyperparameter feature transformation unit;
• the image information may include coefficient hyperparameter feature information and image feature information.
  • the control parameter encoding unit encodes control parameters in the code stream
  • the first feature encoding unit encodes coefficient hyperparameter feature information in the code stream
  • the second feature encoding unit encodes image feature information in the code stream
• when the feature transformation unit is an encoding processing unit
• the feature transformation unit can perform feature transformation on the current block based on the coding neural network to obtain the image feature value corresponding to the current block; wherein the image feature value is used to determine the image feature information
  • the coefficient hyperparameter feature transformation unit can perform coefficient hyperparameter feature transformation on the image feature value based on the coding neural network to obtain the coefficient hyperparameter feature coefficient value, which is used to determine the coefficient hyperparameter feature information.
• the encoding end device may also include a first quantization unit and a second quantization unit; wherein the first quantization unit may obtain the coefficient hyperparameter feature coefficient value from the coefficient hyperparameter feature transformation unit, quantize the coefficient hyperparameter feature coefficient value to obtain the coefficient hyperparameter feature coefficient quantized value, and determine the coefficient hyperparameter feature information based on the coefficient hyperparameter feature coefficient quantized value.
  • the control parameter may also include first enabling information of the first quantization unit, and the first enabling information is used to indicate that the first quantization unit has been enabled.
• the second quantization unit can obtain the image feature value from the feature transformation unit, quantize the image feature value to obtain the image feature quantized value, and determine the image feature information based on the image feature quantized value; wherein the control parameters may also include second enabling information of the second quantization unit, and the second enabling information is used to indicate that the second quantization unit has been enabled.
• obtaining the control parameters corresponding to the current block may include, but is not limited to: based on the network structure of the coding neural network of the feature transformation unit, determining the neural network information corresponding to the image feature inverse transformation unit of the decoding end device, which neural network information is used to determine the decoding neural network corresponding to the image feature inverse transformation unit of the decoding end device; and/or, based on the network structure of the coding neural network of the coefficient hyperparameter feature transformation unit, determining the neural network information corresponding to the coefficient hyperparameter feature generation unit of the decoding end device, which neural network information is used to determine the decoding neural network corresponding to the coefficient hyperparameter feature generation unit of the decoding end device.
  • the image information may include coefficient hyperparameter feature information and image feature information.
• the coefficient hyperparameter feature coefficient reconstruction value may be determined based on the coefficient hyperparameter feature information; an inverse transformation operation may be performed on the coefficient hyperparameter feature coefficient reconstruction value based on the decoding neural network to obtain the coefficient hyperparameter feature value; the coefficient hyperparameter feature value can be used to decode the image feature information from the code stream.
• the image feature reconstruction value can be determined based on the image feature information; the image feature reconstruction value can be inversely transformed based on the decoding neural network to obtain the image low-order feature value; the image low-order feature value can be used to obtain the reconstructed image block corresponding to the current block.
  • determining the coefficient hyperparameter characteristic coefficient reconstruction value based on the coefficient hyperparameter characteristic information may include but is not limited to: the coefficient hyperparameter characteristic information may be inversely quantized to obtain the coefficient hyperparameter characteristic coefficient reconstruction value.
  • Determining the image feature reconstruction value based on the image feature information may include but is not limited to: the image feature information may be inversely quantized to obtain the image feature reconstruction value.
  • low-order feature values of the image can be obtained, and the low-order feature values of the image can be enhanced based on the decoding neural network to obtain a reconstructed image block corresponding to the current block.
• obtaining the control parameters corresponding to the current block may include, but is not limited to, at least one of the following: based on the network structure of the decoding neural network used in the coefficient hyperparameter feature generation encoding process of the encoding end device, determining the neural network information used in the coefficient hyperparameter feature generation decoding process of the decoding end device; this neural network information is used to determine the decoding neural network used in the coefficient hyperparameter feature generation decoding process of the decoding end device; based on the network structure of the decoding neural network used in the image feature inverse transformation encoding process of the encoding end device, determining the neural network information used in the image feature inverse transformation decoding process of the decoding end device
• this neural network information is used to determine the decoding neural network used in the image feature inverse transformation decoding process of the decoding end device
• based on the network structure of the decoding neural network used in the quality enhancement encoding process of the encoding end device, determining the neural network information used in the quality enhancement decoding process of the decoding end device
• this neural network information is used to determine the decoding neural network used in the quality enhancement decoding process of the decoding end device.
• the encoding end device may include a first feature decoding unit, a second feature decoding unit, a coefficient hyperparameter feature generation unit and an image feature inverse transformation unit; the image information may include coefficient hyperparameter feature information and image feature information.
  • the first feature decoding unit can decode the coefficient hyperparameter feature information from the code stream
  • the second feature decoding unit can decode the image feature information from the code stream
• the coefficient hyperparameter feature generation unit can perform an inverse transformation operation on the coefficient hyperparameter feature coefficient reconstruction value based on the decoding neural network to obtain the coefficient hyperparameter feature value; wherein the coefficient hyperparameter feature value is used to enable the second feature decoding unit to decode the image feature information from the code stream, and is used to enable the second feature encoding unit to encode the image feature information in the code stream.
• the image feature inverse transformation unit can perform an inverse transformation operation on the image feature reconstruction value based on the decoding neural network to obtain the image low-order feature value; wherein the image low-order feature value is used to obtain the reconstructed image block corresponding to the current block.
• the encoding end device may also include a first inverse quantization unit and a second inverse quantization unit; wherein the first inverse quantization unit may obtain the coefficient hyperparameter feature information from the first feature decoding unit and inversely quantize the coefficient hyperparameter feature information to obtain the coefficient hyperparameter feature coefficient reconstruction value; the second inverse quantization unit can obtain the image feature information from the second feature decoding unit and inversely quantize the image feature information to obtain the image feature reconstruction value.
  • the encoding end device also includes a quality enhancement unit; the quality enhancement unit obtains low-order feature values of the image from the image feature inverse transformation unit, and performs enhancement processing on the low-order feature values of the image based on the decoding neural network to obtain reconstructed image blocks.
• obtaining the control parameters corresponding to the current block may include, but is not limited to, at least one of the following: based on the network structure of the decoding neural network of the coefficient hyperparameter feature generation unit of the encoding end device, determining the neural network information corresponding to the coefficient hyperparameter feature generation unit of the decoding end device, which is used to determine the decoding neural network corresponding to the coefficient hyperparameter feature generation unit of the decoding end device
• based on the network structure of the decoding neural network of the image feature inverse transformation unit of the encoding end device, the neural network information corresponding to the image feature inverse transformation unit of the decoding end device can be determined, which is used to determine the decoding neural network corresponding to the image feature inverse transformation unit of the decoding end device
• based on the network structure of the decoding neural network of the quality enhancement unit of the encoding end device, the neural network information corresponding to the quality enhancement unit of the decoding end device can be determined
• this neural network information is used to determine the decoding neural network corresponding to the quality enhancement unit of the decoding end device.
• before determining the input features corresponding to the encoding processing unit based on the current block, the current image can also be divided into N image blocks that do not overlap with each other, N being a positive integer; boundary filling can be performed on each image block to obtain a boundary-filled image block; wherein, when performing boundary filling on each image block, the filling value does not depend on the reconstructed pixel values of adjacent image blocks; N current blocks can be generated based on the boundary-filled image blocks.
• before determining the input features corresponding to the encoding processing unit based on the current block, the current image can also be divided into multiple basic blocks, each basic block including at least one image block; boundary filling can be performed on each image block to obtain a boundary-filled image block; wherein, when performing boundary filling on each image block, the filling value of the image block does not depend on the reconstructed pixel values of other image blocks in the same basic block, but is allowed to depend on the reconstructed pixel values of image blocks in different basic blocks; multiple current blocks can be generated based on the boundary-filled image blocks.
  • the above execution order is only an example given for convenience of description. In actual applications, the execution order between steps can also be changed, and there is no limit to the execution order. Moreover, in other embodiments, the steps of the corresponding method are not necessarily performed in the order shown and described in this specification, and the method may include more or fewer steps than described in this specification. In addition, a single step described in this specification may be broken down into multiple steps for description in other embodiments; multiple steps described in this specification may also be combined into a single step for description in other embodiments.
  • control parameters corresponding to the current block can be decoded from the code stream
  • the neural network information corresponding to the decoding processing unit can be obtained from the control parameters
• the decoding neural network corresponding to the decoding processing unit can be generated based on the neural network information, and image decoding can then be implemented based on the decoding neural network to improve decoding performance.
  • Image coding can be implemented based on the coding neural network corresponding to the coding processing unit to improve coding performance.
• deep learning networks (such as the decoding neural network and the encoding neural network) can be used to encode and decode images, with the neural network information transmitted through the code stream, and the decoding neural network corresponding to the decoding processing unit generated based on that neural network information; this addresses problems such as poor stability, poor generalization and high complexity, that is, it achieves better stability, better generalization and lower complexity, and provides a solution for dynamic adjustment of encoding and decoding complexity with better encoding performance than a framework with a single deep learning network.
• the neural network information obtained from the control parameters is the neural network information for the current block, so a decoding neural network is generated for each current block; that is, the decoding neural networks for different current blocks may be the same or different, and the block-level decoding neural network can therefore be changed and adjusted.
  • Embodiment 3 In this embodiment of the present disclosure, a neural network-based image encoding method and image decoding method with a variable and adjustable code rate are proposed, which can achieve high parallelism of image blocks, and the code rate is controllable and adjustable.
  • Figure 5A is a schematic diagram of an image encoding method and an image decoding method, showing the encoding process of the image encoding method and the decoding process of the image decoding method.
  • the encoding process may include the following steps S11-S15.
• Step S11 The blocking unit divides the current image (i.e., the original image) into N image blocks that do not overlap with each other (i.e., the original image blocks, which can be denoted as original image block 1, original image block 2, ..., original image block N), where N is a positive integer.
• Step S12 Perform boundary filling on each original image block to obtain a boundary-filled image block, and generate N current blocks based on the boundary-filled image blocks. That is to say, boundary filling is performed on the N original image blocks respectively to obtain N boundary-filled image blocks, and the N boundary-filled image blocks are used as the N current blocks.
• when performing boundary filling on each original image block, the filling value may not depend on the reconstructed pixel values of adjacent image blocks, thereby ensuring that each original image block can be independently encoded in parallel, improving encoding performance.
  • Step S13 The coding unit determines the coding parameters of the current block based on the information of the coded block.
  • the coding parameters are used to control the coding rate of the current block (such as quantization step size and other parameters). There is no restriction on this coding parameter.
  • Step S14 The control unit writes the control parameters that are needed by the decoding end and cannot be obtained by derivation into the code stream.
  • Step S15 Input the filled image block (i.e., the current block) into the coding unit based on the neural network.
  • the coding unit codes the current block based on the coding parameters and outputs the code stream of the current block. For example, when the coding unit codes the current block based on the coding parameters, the coding unit may use a neural network to code the current block.
  • the decoding process may include the following steps S21-S24.
  • Step S21 The decoding end decodes the control parameters that are needed for the current block from the code stream and cannot be derived.
• Step S22 Based on the control parameters and the code stream of the current block, obtain the reconstructed image block corresponding to the current block through the neural-network-based decoding unit, that is, decode the current block to obtain the reconstructed image block, such as reconstructed image block 1 corresponding to original image block 1, reconstructed image block 2 corresponding to original image block 2, ..., reconstructed image block N corresponding to original image block N.
  • a neural network may be used to decode the current block.
  • Step S23 based on the control parameters, determine whether to perform a filtering process on a certain current block. If so, the filtering merging unit performs a filtering process based on the information of the current block and at least one adjacent reconstructed image block to obtain a filtered image block.
  • Step S24 combine the filtered image blocks to obtain a reconstructed image.
• when performing boundary filling on the original image block, the filling value may be a filling preset value, and the filling preset value may be a default value agreed upon by the encoder and decoder (such as 0, or 1 << (depth - 1), etc., where depth is the bit depth, such as 8, 10, 12, etc.), or it can be a value passed to the decoder through high-level syntax encoding, or it can be obtained by mirroring, nearest-neighbor copying and other operations based on the pixels of the current block; there are no restrictions on how this filling value is obtained.
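• A minimal Python sketch of the boundary filling options listed above (preset value, mirroring, nearest-neighbour copy); the mode names, the use of numpy padding, and the interpretation of the default preset value as 1 << (depth - 1) are assumptions for illustration:

    import numpy as np

    def pad_block(block, pad, mode="preset", depth=8):
        if mode == "preset":
            preset = 1 << (depth - 1)   # e.g. 128 for 8-bit content
            return np.pad(block, pad, mode="constant", constant_values=preset)
        if mode == "mirror":
            return np.pad(block, pad, mode="reflect")
        return np.pad(block, pad, mode="edge")  # nearest-neighbour copy of the block's own border pixels

    block = np.full((64, 64), 100, dtype=np.uint8)
    padded = pad_block(block, pad=2, mode="preset")  # 68 x 68 boundary-filled image block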
• the surrounding block filling expansion size may be a default value agreed upon by the encoder and decoder (such as 1, 2, 4, etc.), may be a value related to the current block size, or may be a value passed to the decoder through high-level syntax encoding; there is no limit on the surrounding block filling expansion size.
• Embodiment 4 In this embodiment of the present disclosure, a neural network-based image coding method and image decoding method with a variable and adjustable code rate are proposed, which can achieve high parallelism of image blocks with a controllable and adjustable code rate; on this basis, information between adjacent blocks (such as reconstructed pixels of adjacent blocks) can also be used. See Figure 5C, which is a schematic diagram of the image encoding method and the image decoding method, showing the encoding process of the image encoding method and the decoding process of the image decoding method.
  • the encoding process may include the following steps S31-S35.
  • Step S31 The blocking unit divides the current image (i.e., the original image) into multiple basic blocks.
  • Each basic block includes at least one image block.
• for example, the current image is divided into M basic blocks, where M is a positive integer
  • the M basic blocks include N image blocks that do not overlap each other (i.e. original image blocks, denoted as original image block 1, original image block 2, ..., original image block N), N is a positive integer.
• Step S32 Perform boundary filling on each original image block to obtain a boundary-filled image block, and generate N current blocks based on the boundary-filled image blocks. That is to say, boundary filling is performed on the N original image blocks respectively to obtain N boundary-filled image blocks, and the N boundary-filled image blocks are used as the N current blocks.
• when performing boundary filling on each original image block, the filling value of the original image block does not depend on the reconstructed pixel values of other original image blocks in the same basic block, but is allowed to depend on the reconstructed pixel values of original image blocks in different basic blocks.
• each basic block includes at least one image block (i.e., original image block).
• image blocks within the same basic block do not refer to each other, but an image block can refer to the reconstruction information of image blocks located in basic blocks different from its own.
• for example, if the image block to the left of image block 1 (that is, the adjacent block to the left of image block 1) is in the same basic block as image block 1, its reconstruction information is not used as the filling value of image block 1, and the filling preset value is used to fill that boundary instead
• if the image block above image block 1 (that is, the adjacent block above image block 1) is in a different basic block from image block 1, the reconstruction information of that image block can be used as the filling value of image block 1.
• when the reconstruction value of an image block is used, the reconstruction value of the image block before filtering may be used (a minimal sketch of this boundary-filling rule follows below).
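• A minimal Python sketch of the Embodiment 4 filling rule for one boundary (whether the neighbour lies in the same basic block decides between the preset value and the neighbour's pre-filtering reconstruction); the function and argument names are illustrative assumptions:

    import numpy as np

    def left_pad(block, left_neighbor_rec, same_basic_block, depth=8, pad=2):
        if same_basic_block or left_neighbor_rec is None:
            # neighbours in the same basic block are not referenced: use the filling preset value
            fill = np.full((block.shape[0], pad), 1 << (depth - 1), dtype=block.dtype)
        else:
            # neighbours in a different basic block may be referenced: copy their rightmost columns
            fill = left_neighbor_rec[:, -pad:]
        return np.concatenate([fill, block], axis=1)

    block = np.full((4, 4), 100, dtype=np.uint8)
    neighbor = np.full((4, 4), 50, dtype=np.uint8)
    padded_same_bb = left_pad(block, neighbor, same_basic_block=True)    # preset value used
    padded_diff_bb = left_pad(block, neighbor, same_basic_block=False)   # neighbour reconstruction used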
  • Step S33 The encoding unit determines the encoding parameters of the current block based on the information of the encoded block.
  • the encoding parameters are used to control the encoding rate of the current block (such as quantization step size and other parameters). There are no restrictions on the encoding parameters.
  • Step S34 The control unit writes the control parameters required by the decoding end and cannot be obtained by derivation into the code stream.
  • Step S35 Input the filled image block (ie, the current block) into the coding unit based on the neural network.
  • the coding unit codes the current block based on the coding parameters and outputs the code stream of the current block. For example, when the coding unit codes the current block based on the coding parameters, the coding unit may use a neural network to code the current block.
  • the decoding process may include the following steps S41-S44.
  • Step S41 The decoding end decodes the control parameters that are needed for the current block from the code stream and cannot be derived.
• Step S42 Based on the control parameters and the code stream of the current block, obtain the reconstructed image block corresponding to the current block through the neural-network-based decoding unit, that is, decode the current block to obtain the reconstructed image block, such as reconstructed image block 1 corresponding to original image block 1, reconstructed image block 2 corresponding to original image block 2, ..., reconstructed image block N corresponding to original image block N.
  • Step S43 based on the control parameters, determine whether to perform a filtering process on a certain current block. If so, the filtering merging unit performs a filtering process based on the information of the current block and at least one adjacent reconstructed image block to obtain a filtered image block.
  • Step S44 combine the filtered image blocks to obtain a reconstructed image.
• when performing boundary filling on the original image block, the filling value may be a filling preset value, and the filling preset value may be a default value agreed upon by the encoder and decoder (such as 0, or 1 << (depth - 1), etc., where depth is the bit depth, such as 8, 10, 12, etc.), or it can be a value passed to the decoder through high-level syntax encoding, or it can be obtained by mirroring, nearest-neighbor copying and other operations based on the pixels of the current block; there are no restrictions on how this filling value is obtained.
• the surrounding block filling expansion size may be a default value agreed upon by the encoder and decoder (such as 1, 2, 4, etc.), may be a value related to the current block size, or may be a value passed to the decoder through high-level syntax encoding; there is no limit on the surrounding block filling expansion size.
• the number of image blocks included in a basic block may be a default value agreed upon by the codec (such as 1, 4, 16, etc.), a value related to the size of the current image, or a value passed to the decoder through high-level syntax encoding; there is no limit to the number of image blocks in a basic block, and it can be selected according to actual needs.
• in Embodiment 3 and Embodiment 4, for the blocking process of the original image (step S11 and step S31), it is also possible to first perform an image domain transformation on the original image and then divide the transformed image into blocks.
• correspondingly, the filtered image blocks can be merged into one image, an image domain inverse transformation can be applied, and the reconstructed image is obtained based on the inversely transformed image, as shown in Figure 5E.
• the image domain transformation of the original image can be a transformation process from an RGB domain image to a YUV domain image (the corresponding image domain inverse transformation is the inverse transformation process from a YUV domain image to an RGB domain image), or processes such as a wavelet transform or Fourier transform can be introduced to generate an image in a new domain (the corresponding image domain inverse transformation is then the inverse wavelet transform or inverse Fourier transform process).
  • the image domain transformation process can be implemented using a neural network or a non-neural network, and there is no restriction on this image domain transformation process.
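• A minimal Python sketch of one possible image domain transformation and its inverse (full-range RGB to YUV with BT.601-style coefficients; these coefficients are one common convention and only illustrate the idea, since any transform such as a wavelet could be used instead):

    import numpy as np

    def rgb_to_yuv(rgb):
        m = np.array([[ 0.299,  0.587,  0.114],
                      [-0.169, -0.331,  0.500],
                      [ 0.500, -0.419, -0.081]], dtype=np.float32)
        return rgb @ m.T

    def yuv_to_rgb(yuv):
        # corresponding image domain inverse transformation
        m_inv = np.array([[1.0,  0.000,  1.402],
                          [1.0, -0.344, -0.714],
                          [1.0,  1.772,  0.000]], dtype=np.float32)
        return yuv @ m_inv.T

    image = np.random.rand(256, 256, 3).astype(np.float32)
    restored = yuv_to_rgb(rgb_to_yuv(image))   # approximately equal to the original image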
  • Embodiment 5 In this embodiment of the present disclosure, a neural network-based image decoding method is proposed, which can be applied to the decoding end (also called a video decoder). See Figure 6A, which is a schematic structural diagram of the decoding end.
• the decoding end may include a control parameter decoding unit, a first feature decoding unit, a second feature decoding unit, a coefficient hyperparameter feature generation unit, an image feature inverse transformation unit, a first inverse quantization unit, a second inverse quantization unit, and a quality enhancement unit.
• the first inverse quantization unit, the second inverse quantization unit, and the quality enhancement unit are optional units; in some scenarios, the processing of these optional units can be turned off or skipped.
• the code stream corresponding to the current block includes three parts: code stream 0 (the code stream containing the control parameters), code stream 1 (the code stream containing the coefficient hyperparameter feature information), and code stream 2 (the code stream containing the image feature information).
  • Coefficient hyperparameter feature information and image feature information can be collectively referred to as image information.
  • the image decoding method based on neural network in this embodiment may include the following steps S51-S58.
  • Step S51 Decode the code stream 0 corresponding to the current block and obtain the control parameters corresponding to the current block.
  • the control parameter decoding unit can decode the code stream 0 corresponding to the current block and obtain the control parameters corresponding to the current block. That is, the control parameter decoding unit can decode the control parameters from the code stream 0.
• the control parameters may include enabling information such as the first enabling information of the first inverse quantization unit; for the content of the control parameters, please refer to the subsequent embodiments, which will not be described in detail here.
  • Step S52 Decode the code stream 1 corresponding to the current block and obtain the coefficient hyperparameter feature information corresponding to the current block.
  • the first feature decoding unit can decode the code stream 1 corresponding to the current block and obtain the coefficient hyperparameter feature information corresponding to the current block. That is, the first feature decoding unit can decode the coefficient hyperparameter feature information from the code stream 1.
  • Step S53 Determine the coefficient hyperparameter characteristic coefficient reconstruction value based on the coefficient hyperparameter characteristic information.
• the first enabling information may indicate enabling the first inverse quantization unit (that is, enabling the first inverse quantization unit to perform the first inverse quantization operation), or the first enabling information may indicate disabling the first inverse quantization unit. For example, if the first enabling information is a first value, it means that the first inverse quantization unit is enabled; if the first enabling information is a second value, it means that the first inverse quantization unit is not enabled.
• if the first enabling information indicates enabling the first inverse quantization unit, the coefficient hyperparameter feature information may be the coefficient hyperparameter feature coefficient quantized value C_q, and the first inverse quantization unit may inversely quantize the coefficient hyperparameter feature coefficient quantized value C_q to obtain the coefficient hyperparameter feature coefficient reconstruction value C'. If the first enabling information indicates that the first inverse quantization unit is not enabled, the coefficient hyperparameter feature information may be the coefficient hyperparameter feature coefficient reconstruction value C', that is, the coefficient hyperparameter feature coefficient reconstruction value C' is directly decoded from code stream 1.
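• A minimal Python sketch of step S53 (whether to inversely quantize the decoded values depends on the first enabling information); the field names such as first_enable_info and hyper_qstep are illustrative assumptions, not normative syntax elements:

    import numpy as np

    def reconstruct_hyper_coeffs(decoded_values, control_params):
        if control_params.get("first_enable_info", 0) == 1:   # first value: unit enabled
            qstep = control_params.get("hyper_qstep", 1.0)
            return decoded_values * qstep                     # C' = C_q * qstep
        return decoded_values                                 # C' is decoded directly from code stream 1

    C_q = np.array([2, -1, 0, 3], dtype=np.float32)
    C_rec = reconstruct_hyper_coeffs(C_q, {"first_enable_info": 1, "hyper_qstep": 0.5})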
  • Step S54 Perform an inverse transformation operation on the coefficient hyperparameter characteristic coefficient reconstruction value C' to obtain the coefficient hyperparameter characteristic value P.
  • the coefficient hyperparameter feature generation unit performs an inverse transformation operation on the coefficient hyperparameter feature coefficient reconstruction value C' to obtain the coefficient hyperparameter feature value P.
• for example, the coefficient hyperparameter feature generation unit performs an inverse transformation operation on the coefficient hyperparameter feature coefficient reconstruction value C' based on the decoding neural network to obtain the coefficient hyperparameter feature value P.
  • the coefficient hyperparameter feature generation unit can obtain the neural network information 1 corresponding to the coefficient hyperparameter feature generation unit from the control parameters, and generate the decoding corresponding to the coefficient hyperparameter feature generation unit based on the neural network information 1 Neural Network 1.
• the coefficient hyperparameter feature generation unit can determine the input features corresponding to the coefficient hyperparameter feature generation unit (such as the coefficient hyperparameter feature coefficient reconstruction value C'), and process the input features based on the decoding neural network 1 (such as performing an inverse transformation operation on the coefficient hyperparameter feature coefficient reconstruction value C') to obtain the output features corresponding to the coefficient hyperparameter feature generation unit (such as the coefficient hyperparameter feature value P).
  • the neural network information 1 may include basic layer information and enhancement layer information.
• the coefficient hyperparameter feature generation unit may determine the basic layer corresponding to the coefficient hyperparameter feature generation unit based on the basic layer information, and determine the enhancement layer corresponding to the coefficient hyperparameter feature generation unit based on the enhancement layer information.
  • the coefficient hyperparameter feature generation unit can generate the decoding neural network 1 corresponding to the coefficient hyperparameter feature generation unit based on the base layer and the enhancement layer. For example, the base layer and the enhancement layer can be combined to obtain the decoding neural network 1.
• the coefficient hyperparameter feature generation unit can perform an inverse transformation operation on the coefficient hyperparameter feature coefficient reconstruction value C' through the decoding neural network 1 to obtain the coefficient hyperparameter feature value P; there is no restriction on this inverse transformation operation process.
  • Step S55 Decode the code stream 2 corresponding to the current block and obtain the image feature information corresponding to the current block.
  • the second feature decoding unit can decode the code stream 2 corresponding to the current block and obtain the image feature information corresponding to the current block. That is, the second feature decoding unit can decode the image feature information from the code stream 2.
  • the second feature decoding unit can use the coefficient hyperparameter feature value P to decode the code stream 2 corresponding to the current block, and there is no restriction on this decoding process.
  • Step S56 Determine the image feature reconstruction value based on the image feature information.
• the second enabling information may indicate enabling the second inverse quantization unit (that is, enabling the second inverse quantization unit to perform the second inverse quantization operation), or the second enabling information may indicate disabling the second inverse quantization unit. For example, if the second enabling information is the first value, it means that the second inverse quantization unit is enabled; if the second enabling information is the second value, it means that the second inverse quantization unit is not enabled.
• if the second enabling information indicates enabling the second inverse quantization unit, the image feature information may be the image feature quantized value F_q, and the second inverse quantization unit may obtain the image feature quantized value F_q and inversely quantize it to obtain the image feature reconstruction value F'.
• if the second enabling information indicates that the second inverse quantization unit is not enabled, the image feature information may be the image feature reconstruction value F', that is, the image feature reconstruction value F' is directly decoded from the code stream 2.
  • Step S57 Perform an inverse transformation operation on the image feature reconstruction value F' to obtain the image low-order feature value LF.
  • the image feature inverse transformation unit performs an inverse transformation operation on the image feature reconstruction value F' to obtain the image low-order feature value LF.
• for example, the image feature reconstruction value F' is inversely transformed based on the decoding neural network to obtain the image low-order feature value LF.
  • the image feature inverse transformation unit can obtain the neural network information 2 corresponding to the image feature inverse transformation unit from the control parameters, and generate the decoding neural network 2 corresponding to the image feature inverse transformation unit based on the neural network information 2 .
  • the image feature inverse transformation unit can determine the input feature corresponding to the image feature inverse transformation unit (such as the image feature reconstruction value F'), and process the input feature based on the decoding neural network 2 (such as inverse transformation of the image feature reconstruction value F' Operation) to obtain the output features corresponding to the image feature inverse transformation unit (such as the image low-order feature value LF).
  • the neural network information 2 may include basic layer information and enhancement layer information.
• the image feature inverse transformation unit may determine the basic layer corresponding to the image feature inverse transformation unit based on the basic layer information, and determine the enhancement layer corresponding to the image feature inverse transformation unit based on the enhancement layer information; the image feature inverse transformation unit can then generate the decoding neural network 2 corresponding to the image feature inverse transformation unit based on the base layer and the enhancement layer. For example, the base layer and the enhancement layer can be combined to obtain the decoding neural network 2.
  • the image feature inverse transformation unit can perform an inverse transformation operation on the image feature reconstruction value F' through the decoding neural network 2 to obtain the image low-order feature value LF. There is no limit to the inverse transformation operation process.
  • Step S58 Determine the reconstructed image block I corresponding to the current block based on the image low-order feature value LF.
• the third enabling information may indicate enabling the quality enhancement unit (i.e., enabling the quality enhancement unit to perform a quality enhancement operation), or the third enabling information may indicate that the quality enhancement unit is not enabled. For example, if the third enabling information is a first value, it may indicate that the quality enhancement unit is enabled, and if the third enabling information is a second value, it may indicate that the quality enhancement unit is not enabled.
• if the third enabling information indicates enabling the quality enhancement unit, the quality enhancement unit obtains the image low-order feature value LF, performs enhancement processing on the image low-order feature value LF, and obtains the reconstructed image block I corresponding to the current block. If the third enabling information indicates that the quality enhancement unit is not enabled, the image low-order feature value LF is used as the reconstructed image block I corresponding to the current block.
• when the quality enhancement unit enhances the image low-order feature value LF, the quality enhancement unit can enhance the image low-order feature value LF based on the decoding neural network to obtain the reconstructed image block I corresponding to the current block.
  • the quality enhancement unit can obtain the neural network information 3 corresponding to the quality enhancement unit from the control parameters, and generate the decoding neural network 3 corresponding to the quality enhancement unit based on the neural network information 3.
• the quality enhancement unit can determine the input features corresponding to the quality enhancement unit (such as the image low-order feature value LF), and process the input features based on the decoding neural network 3 (such as enhancing the image low-order feature value LF) to obtain the output features corresponding to the quality enhancement unit (such as the reconstructed image block I corresponding to the current block).
  • the neural network information 3 may include basic layer information and enhancement layer information.
  • the quality enhancement unit may determine the basic layer corresponding to the quality enhancement unit based on the basic layer information, and determine the enhancement layer corresponding to the quality enhancement unit based on the enhancement layer information.
  • the quality enhancement unit can generate the decoding neural network 3 corresponding to the quality enhancement unit based on the base layer and the enhancement layer. For example, the base layer and the enhancement layer can be combined to obtain the decoding neural network 3.
  • the quality enhancement unit can enhance the low-order feature value LF of the image through the decoding neural network 3 to obtain the reconstructed image block I corresponding to the current block. There is no restriction on this enhancement process.
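• A very condensed Python sketch of the decoding flow of steps S51-S58 for one current block; the toy stand-in functions below only mark where each unit acts and are assumptions made for this sketch (in particular, the entropy decoding and the decoding neural networks are replaced by placeholders):

    import numpy as np

    def entropy_decode(stream):            # stand-in for the first/second feature decoding units
        return np.asarray(stream, dtype=np.float32)

    def inverse_quantize(x, qstep):        # stand-in for the first/second inverse quantization units
        return x * qstep

    def inverse_transform(x):              # stand-in for decoding neural network 1 / 2
        return x * 2.0

    def enhance(x):                        # stand-in for decoding neural network 3 (quality enhancement)
        return np.clip(x, 0.0, 255.0)

    def decode_block(ctrl, stream1, stream2):
        C = entropy_decode(stream1)                                                    # S52
        C = inverse_quantize(C, ctrl["qstep1"]) if ctrl["first_enable_info"] else C    # S53
        P = inverse_transform(C)                                                       # S54 (P would drive the decoding of stream2)
        F = entropy_decode(stream2)                                                    # S55
        F = inverse_quantize(F, ctrl["qstep2"]) if ctrl["second_enable_info"] else F   # S56
        LF = inverse_transform(F)                                                      # S57
        return enhance(LF) if ctrl["third_enable_info"] else LF                        # S58

    ctrl = {"first_enable_info": 1, "second_enable_info": 1, "third_enable_info": 0,
            "qstep1": 0.5, "qstep2": 1.0}          # decoded from code stream 0 in step S51
    rec_block = decode_block(ctrl, [2, 4, 6], [100, 120, 140])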
  • Embodiment 6 In this embodiment of the present disclosure, a neural network-based image decoding method is proposed, which can be applied to the decoding end. See Figure 6B, which is a schematic structural diagram of the decoding end.
• the decoding end may include a control parameter decoding unit, a first feature decoding unit, a second feature decoding unit, a coefficient hyperparameter feature generation unit, and an image feature inverse transformation unit.
• the code stream corresponding to the current block includes three parts: code stream 0 (the code stream containing the control parameters), code stream 1 (the code stream containing the coefficient hyperparameter feature information), and code stream 2 (the code stream containing the image feature information).
  • the image decoding method based on neural network in this embodiment may include the following steps S61-S66.
  • Step S61 The control parameter decoding unit decodes the code stream 0 corresponding to the current block and obtains the control parameters corresponding to the current block.
  • Step S62 The first feature decoding unit decodes the code stream 1 corresponding to the current block and obtains the coefficient hyperparameter feature information corresponding to the current block.
  • the coefficient hyperparameter feature information may be the coefficient hyperparameter feature coefficient reconstruction value C'.
  • Step S63 The coefficient hyperparameter feature generation unit performs an inverse transformation operation on the coefficient hyperparameter feature coefficient reconstruction value C' to obtain the coefficient hyperparameter feature value P.
• for example, an inverse transformation operation is performed on the coefficient hyperparameter feature coefficient reconstruction value C' based on the decoding neural network to obtain the coefficient hyperparameter feature value P.
  • Step S64 The second feature decoding unit decodes the code stream 2 corresponding to the current block and obtains the image feature information corresponding to the current block.
  • the image feature information may be the image feature reconstruction value F'.
  • the second feature decoding unit can obtain the coefficient hyperparameter feature value P, and use the coefficient hyperparameter feature value P to decode the code stream 2 corresponding to the current block to obtain the image feature reconstruction value F'.
  • Step S65 The image feature inverse transformation unit performs an inverse transformation operation on the image feature reconstruction value F' to obtain the image low-order feature value LF. For example, based on the decoding neural network, the image feature reconstruction value F' is inversely transformed to obtain the image low-order feature value. LF.
  • Step S66 Determine the reconstructed image block I corresponding to the current block based on the image low-order feature value LF.
  • the low-order feature value LF of the image can be directly used as the reconstructed image block I.
• alternatively, the quality enhancement unit performs enhancement processing on the image low-order feature value LF to obtain the reconstructed image block I corresponding to the current block; for example, the image low-order feature value LF is enhanced based on the decoding neural network to obtain the reconstructed image block I.
  • the first feature decoding unit can decode the code stream 1 corresponding to the current block and obtain the coefficient hyperparameter feature information corresponding to the current block.
  • the first feature decoding unit includes at least one coefficient decoding module.
• for example, the coefficient decoding module can use an entropy decoding method to decode the coefficients, that is, use the entropy decoding method to decode the code stream 1 corresponding to the current block to obtain the coefficient hyperparameter feature information corresponding to the current block.
• the entropy decoding method may include but is not limited to CAVLC (Context-Adaptive Variable Length Coding, content-based adaptive variable length coding) or CABAC (Context-based Adaptive Binary Arithmetic Coding, context-based adaptive binary arithmetic coding) and other entropy decoding methods; there is no restriction on this.
  • the probability model of the entropy decoding can be a preset probability model.
  • the preset probability model can be configured according to actual needs, and there is no restriction on the preset probability model.
• the first inverse quantization unit can inversely quantize the coefficient hyperparameter feature coefficient quantized value C_q (i.e., the coefficient hyperparameter feature information) to obtain the coefficient hyperparameter feature coefficient reconstruction value C'.
• the first inverse quantization unit may not exist, or, if the first inverse quantization unit exists, it may be selectively skipped based on control parameters (such as high-level syntax, for example the first enabling information), or the first inverse quantization unit is determined to be enabled based on the control parameters.
• if the first inverse quantization unit does not exist, or the first inverse quantization unit is selectively skipped based on the control parameters, the coefficient hyperparameter feature coefficient reconstruction value C' is the same as the coefficient hyperparameter feature coefficient quantized value C_q, that is, there is no need to inversely quantize the coefficient hyperparameter feature coefficient quantized value C_q.
• if the step parameter qstep corresponding to the coefficient hyperparameter feature coefficient quantized value C_q is 1, then the coefficient hyperparameter feature coefficient reconstruction value C' is the same as the coefficient hyperparameter feature coefficient quantized value C_q, that is, there is no need to inversely quantize the coefficient hyperparameter feature coefficient quantized value C_q.
• the first inverse quantization unit can inversely quantize the coefficient hyperparameter feature coefficient quantized value C_q based on the control parameters (such as quantization-related parameters) to obtain the coefficient hyperparameter feature coefficient reconstruction value C'.
• for example, the first inverse quantization unit performs the following operation: it obtains the quantization-related parameters corresponding to the coefficient hyperparameter feature coefficient quantized value C_q from the control parameters (the control parameters are included in the code stream and may include quantization-related parameters), such as the step parameter qstep or the quantization parameter qp.
• the quantization-related parameters (such as the step parameter qstep) corresponding to the coefficient hyperparameter feature coefficient quantized value C_q can be organized as follows: 1) the quantized values of all coefficient hyperparameter feature coefficients of all feature channels adopt the same step parameter qstep; 2) the coefficient hyperparameter feature coefficient quantized values of different feature channels adopt different step parameters qstep, but the quantized values of the coefficient hyperparameter feature coefficients within one feature channel adopt the same step parameter qstep; 3) each coefficient hyperparameter feature coefficient quantized value of each feature channel adopts a different step parameter qstep (a sketch of these three granularities follows below).
  • the step parameter qstep can also be called the quantization step.
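• A minimal Python sketch of the three qstep granularities listed above for a (channels, height, width) tensor; only the broadcasting behaviour is the point, and the mode names are illustrative assumptions:

    import numpy as np

    def inverse_quantize_hyper(C_q, qstep, mode):
        if mode == "per_tensor":       # 1) one qstep shared by all feature channels and coefficients
            return C_q * float(qstep)
        if mode == "per_channel":      # 2) one qstep per feature channel, shared within the channel
            return C_q * np.asarray(qstep, dtype=np.float32).reshape(-1, 1, 1)
        return C_q * np.asarray(qstep, dtype=np.float32)   # 3) one qstep per coefficient

    C_q = np.ones((4, 2, 2), dtype=np.float32)
    a = inverse_quantize_hyper(C_q, 0.5, "per_tensor")
    b = inverse_quantize_hyper(C_q, [0.25, 0.5, 1.0, 2.0], "per_channel")
    c = inverse_quantize_hyper(C_q, np.full((4, 2, 2), 0.5), "per_coefficient")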
  • the coefficient hyperparameter feature generation unit can perform an inverse transformation operation on the coefficient hyperparameter feature coefficient reconstruction value C' based on the decoding neural network to obtain the coefficient hyperparameter feature value P.
  • the coefficient hyperparameter feature generation unit may include a decoding neural network 1.
• the decoding neural network 1 may include a basic layer and an enhancement layer, the coefficient hyperparameter feature coefficient reconstruction value C' is used as the input feature of the decoding neural network 1, the coefficient hyperparameter feature value P is used as the output feature of the decoding neural network 1, and the decoding neural network 1 is used to perform an inverse transformation operation on the coefficient hyperparameter feature coefficient reconstruction value C'.
  • the decoding neural network 1 is divided into a basic layer and an enhancement layer.
  • the basic layer may include at least one network layer, or the basic layer may include no network layer, that is, the basic layer may be empty.
  • the enhancement layer may include at least one network layer, or the enhancement layer may include no network layer, that is, the enhancement layer may be empty. It should be noted that the multiple network layers in the decoding neural network 1 can be divided into a basic layer and an enhancement layer according to actual needs.
  • for example, the first M1 network layers may serve as the basic layer and the remaining network layers as the enhancement layer; or the first M2 network layers may serve as the enhancement layer and the remaining network layers as the basic layer; or the last M3 network layers may serve as the basic layer and the remaining network layers as the enhancement layer; or the last M4 network layers may serve as the enhancement layer and the remaining network layers as the basic layer; or the odd-numbered network layers may serve as the basic layer and the remaining network layers as the enhancement layer; or the even-numbered network layers may serve as the basic layer and the remaining network layers as the enhancement layer.
  • the above are just a few examples, and there are no restrictions on this division method.
  • the network layer with a fixed network structure can be used as the basic layer
  • the network layer with an unfixed network structure can be used as the enhancement layer.
  • regarding the network layer with a fixed network structure: if a network layer uses the same network structure when decoding multiple image blocks, the network layer is regarded as a network layer with a fixed network structure and is treated as part of the basic layer.
  • otherwise, if a network layer may use different network structures when decoding different image blocks, the network layer is regarded as a network layer with an unfixed network structure and is treated as part of the enhancement layer.
  • the size of the output feature may be larger than the size of the input feature, or the size of the output feature may be equal to the size of the input feature, or the size of the output feature may be smaller than the size of the input feature.
  • the base layer may include at least one deconvolution layer
  • the enhancement layer may include at least one deconvolution layer, or may not include a deconvolution layer.
  • the enhancement layer may include at least one deconvolution layer
  • the base layer may include at least one deconvolution layer, or may not include a deconvolution layer.
  • the decoding neural network 1 may include, but is not limited to, deconvolution layers, activation layers, and the like.
  • the decoding neural network 1 consists of a deconvolution layer with a stride of 2, an activation layer, a deconvolution layer with a stride of 1, and an activation layer. All the above network layers can be regarded as the basic layer, that is, the basic layer includes a deconvolution layer with a stride of 2, an activation layer, a deconvolution layer with a stride of 1, and an activation layer.
  • the enhancement layer is empty.
  • part of the network layer can also be used as an enhancement layer, and there is no restriction on this.
  • the decoding neural network 1 includes a deconvolution layer with a stride of 2 and a deconvolution layer with a stride of 1. All the above network layers can be regarded as the basic layer, that is, the basic layer includes a deconvolution layer with a stride of 2 and a deconvolution layer with a stride of 1. In this case, the enhancement layer is empty. Of course, part of these network layers can also be used as the enhancement layer, and there is no restriction on this.
  • the decoding neural network 1 includes a deconvolution layer with a stride of 2, an activation layer, a deconvolution layer with a stride of 2, an activation layer, a deconvolution layer with a stride of 1, and an activation layer.
  • the basic layer includes a deconvolution layer with a stride of 2, an activation layer, a deconvolution layer with a stride of 2, an activation layer, a deconvolution layer with a stride of 1, and an activation layer.
  • the enhancement layer is empty.
  • some network layers can also be used as enhancement layers, and there is no restriction on this.
  • the above are just a few examples and are not limiting.
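  • For illustration, a minimal sketch of a decoding neural network 1 built from the first example above (a stride-2 deconvolution layer, an activation layer, a stride-1 deconvolution layer, and an activation layer, all placed in the basic layer, with an empty enhancement layer) is shown below; the channel counts, kernel sizes, and the use of ReLU as the activation type are illustrative assumptions.

```python
import torch
import torch.nn as nn

basic_layer = nn.Sequential(
    nn.ConvTranspose2d(64, 64, kernel_size=3, stride=2, padding=1, output_padding=1),  # stride-2 deconvolution
    nn.ReLU(),                                                                          # activation layer
    nn.ConvTranspose2d(64, 64, kernel_size=3, stride=1, padding=1),                     # stride-1 deconvolution
    nn.ReLU(),                                                                          # activation layer
)
enhancement_layer = nn.Sequential()  # empty enhancement layer
decoding_neural_network_1 = nn.Sequential(basic_layer, enhancement_layer)

C_prime = torch.randn(1, 64, 8, 8)      # coefficient hyperparameter feature coefficient reconstruction value C'
P = decoding_neural_network_1(C_prime)  # coefficient hyperparameter feature value P (spatially upsampled by 2)
```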
  • the network layer of the default network structure can be configured for the coefficient hyperparameter feature generation unit (the network layer of the default network structure can be composed of at least one network layer), and the network parameters related to the network layer of the default network structure are all fixed.
  • the network parameters may include, but are not limited to, at least one of the following: number of neural network layers, deconvolution layer flag, number of deconvolution layers, quantization step size of each deconvolution layer, number of channels of each deconvolution layer, size of the convolution kernel, number of filters, filter size index, filter coefficient zero flag, filter coefficients, number of activation layers, activation layer flag, and activation layer type; that is to say, all of the above network parameters are fixed.
  • for example, the number of deconvolution layers is fixed, the number of activation layers is fixed, the number of channels of each deconvolution layer is fixed, the size of the convolution kernel is fixed, the filter coefficients are fixed, and so on; for example, the number of channels of a deconvolution layer is 4, 8, 16, 32, 64, 128, or 256, etc., and the size of the convolution kernel is 1×1, 3×3, or 5×5, etc.
  • a prefabricated neural network pool can be configured for the coefficient hyperparameter feature generation unit.
  • the prefabricated neural network pool can include at least one network layer of a prefabricated network structure (the prefabricated network structure can be composed of at least one network layer), and the network parameters related to the network layer of the prefabricated network structure can be configured according to actual needs.
  • the network parameters can include, but are not limited to, at least one of the following: number of neural network layers, deconvolution layer flag, number of deconvolution layers, quantization step size of each deconvolution layer, number of channels of each deconvolution layer, size of the convolution kernel, number of filters, filter size index, filter coefficient zero flag, filter coefficients, number of activation layers, activation layer flag, and activation layer type; that is to say, the above network parameters can be configured according to actual needs.
  • the prefabricated neural network pool may include the network layer of the prefabricated network structure s1, the network layer of the prefabricated network structure s2, and the network layer of the prefabricated network structure s3.
  • the number of deconvolution layers, the number of activation layers, the number of channels of each deconvolution layer, the size of the convolution kernel, filter coefficients and other network parameters can be pre-configured. After all network parameters are configured, the network layer of the prefabricated network structure s1 can be obtained. In the same way, the network layer of the prefabricated network structure s2 and the network layer of the prefabricated network structure s3 can be obtained, which will not be repeated here.
  • a network layer with a variable network structure can be dynamically generated for the coefficient hyperparameter feature generation unit based on network parameters (the network layer with a variable network structure can be composed of at least one network layer), and the network parameters related to the network layer with a variable network structure are dynamically generated by the encoding end rather than preconfigured.
  • the encoding end can send the network parameters corresponding to the coefficient hyperparameter feature generation unit to the decoding end.
  • the network parameters can include, but are not limited to, at least one of the following: number of neural network layers, deconvolution layer flag, number of deconvolution layers, quantization step size of each deconvolution layer, number of channels of each deconvolution layer, size of the convolution kernel, number of filters, filter size index, filter coefficient zero flag, filter coefficients, number of activation layers, activation layer flag, and activation layer type.
  • the decoder dynamically generates a network layer with a variable network structure based on the above network parameters. For example, the encoding end encodes network parameters such as the number of deconvolution layers, the number of activation layers, the number of channels of each deconvolution layer, the size of the convolution kernel, and the filter coefficients into the code stream; the decoding end parses the above network parameters from the code stream and generates a network layer with a variable network structure based on these network parameters (see the sketch below), and there is no restriction on this generation process.
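  • For illustration, a minimal sketch of how a decoder might dynamically assemble a network layer with a variable network structure from such parsed parameters is shown below; the parameter dictionary and its field names are illustrative assumptions about what was parsed from the code stream, not normative syntax.

```python
import torch.nn as nn

def build_variable_layers(params: dict) -> nn.Sequential:
    """Assemble deconvolution/activation layers from parsed network parameters."""
    layers = []
    in_ch = params["input_channels"]
    for i in range(params["deconv_layer_num"]):
        stride = params["stride"][i]
        kernel = params["kernel_size"][i]
        layers.append(nn.ConvTranspose2d(in_ch, params["channels"][i], kernel,
                                         stride=stride, padding=kernel // 2,
                                         output_padding=stride - 1))
        if params["activation_type"] == "relu":
            layers.append(nn.ReLU())
        in_ch = params["channels"][i]
    return nn.Sequential(*layers)

# Example parameter set as it might be parsed from the code stream (values are illustrative).
parsed = {"input_channels": 64, "deconv_layer_num": 2, "channels": [64, 32],
          "kernel_size": [3, 3], "stride": [2, 1], "activation_type": "relu"}
variable_structure_layer = build_variable_layers(parsed)
```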
  • the decoding neural network 1 can be divided into a basic layer and an enhancement layer.
  • the combination of the basic layer and the enhancement layer can include, but is not limited to, the following methods. Method 1: the basic layer adopts the network layer of the default network structure, the enhancement layer adopts the network layer of the default network structure, and the basic layer and the enhancement layer form the decoding neural network 1.
  • Method 2: the basic layer adopts the network layer of the default network structure, the enhancement layer adopts the network layer of the prefabricated network structure, and the basic layer and the enhancement layer form the decoding neural network 1.
  • Method 3: the basic layer adopts the network layer of the default network structure, the enhancement layer adopts the network layer of the variable network structure, and the basic layer and the enhancement layer form the decoding neural network 1.
  • Method 4: the basic layer adopts the network layer of the prefabricated network structure, the enhancement layer adopts the network layer of the default network structure, and the basic layer and the enhancement layer form the decoding neural network 1.
  • Method 5: the basic layer adopts the network layer of the prefabricated network structure, the enhancement layer adopts the network layer of the prefabricated network structure, and the basic layer and the enhancement layer form the decoding neural network 1.
  • Method 6: the basic layer adopts the network layer of the prefabricated network structure, the enhancement layer adopts the network layer of the variable network structure, and the basic layer and the enhancement layer form the decoding neural network 1.
  • the control parameters may include neural network information 1 corresponding to the coefficient hyperparameter feature generation unit.
  • the coefficient hyperparameter feature generation unit may parse the neural network information 1 from the control parameters and generate the decoding neural network 1 based on the neural network information 1.
  • the neural network information 1 can include basic layer information and enhancement layer information.
  • the basic layer can be determined based on the basic layer information, and the enhancement layer can be determined based on the enhancement layer information.
  • the basic layer and the enhancement layer can be combined to obtain the decoding neural network 1. For example, the following cases can be used by the coefficient hyperparameter feature generation unit to obtain the decoding neural network 1 based on the neural network information 1:
  • the base layer information includes the base layer using the default network flag, and the base layer using the default network flag indicates that the base layer uses the default network.
  • the coefficient hyperparameter feature generation unit learns, based on the base layer information, that the base layer adopts the network layer of the default network structure, and therefore obtains the base layer of the default network structure (i.e., the network layer of the default network structure).
  • the enhancement layer information includes the enhancement layer using the default network flag, and the enhancement layer using the default network flag indicates that the enhancement layer uses the default network.
  • the coefficient hyperparameter feature generation unit learns, based on the enhancement layer information, that the enhancement layer adopts the network layer of the default network structure, and therefore obtains the enhancement layer of the default network structure (i.e., the network layer of the default network structure).
  • the basic layer of the default network structure and the enhancement layer of the default network structure can be combined to obtain the decoding neural network 1.
  • the base layer information includes the base layer using the default network flag, and the base layer using the default network flag indicates that the base layer uses the default network.
  • the coefficient hyperparameter feature generation unit learns, based on the base layer information, that the base layer adopts the network layer of the default network structure, and therefore obtains the base layer of the default network structure (i.e., the network layer of the default network structure).
  • the enhancement layer information includes the enhancement layer using prefabricated network flag and the enhancement layer prefabricated network index number, and the enhancement layer using prefabricated network flag indicates that the enhancement layer uses a prefabricated network.
  • the coefficient hyperparameter feature generation unit learns, based on the enhancement layer information, that the enhancement layer adopts the network layer of a prefabricated network structure, and therefore selects from the prefabricated neural network pool the enhancement layer of the prefabricated network structure corresponding to the enhancement layer prefabricated network index number (for example, when the enhancement layer prefabricated network index number is 0, the network layer of the prefabricated network structure s1 is used as the enhancement layer; when the enhancement layer prefabricated network index number is 1, the network layer of the prefabricated network structure s2 is used as the enhancement layer).
  • the basic layer of the default network structure and the enhancement layer of the prefabricated network structure can be combined to obtain the decoding neural network 1.
  • the base layer information includes the base layer using the default network flag, and the base layer using the default network flag indicates that the base layer uses the default network.
  • the coefficient hyperparameter feature generation unit learns, based on the base layer information, that the base layer adopts the network layer of the default network structure, and therefore obtains the base layer of the default network structure (i.e., the network layer of the default network structure).
  • the enhancement layer information includes network parameters used to generate the enhancement layer.
  • the coefficient hyperparameter feature generation unit can parse the network parameters from the control parameters and generate an enhancement layer with a variable network structure (i.e., the network layer with a variable network structure) based on the network parameters.
  • the network parameters may include, but are not limited to, at least one of the following: number of neural network layers, deconvolution layer flag, number of deconvolution layers, quantization step size of each deconvolution layer, number of channels of each deconvolution layer, size of the convolution kernel, number of filters, filter size index, filter coefficient zero flag, filter coefficients, number of activation layers, activation layer flag, and activation layer type; the coefficient hyperparameter feature generation unit can generate an enhancement layer with a variable network structure based on the above network parameters.
  • for example, the coefficient hyperparameter feature generation unit parses, from the control parameters, network parameters such as the number of deconvolution layers, the quantization step (stride) of each deconvolution layer, the number of channels of each deconvolution layer, the size of the convolution kernel, and the activation layer type, and generates an enhancement layer with a variable network structure based on these network parameters.
  • the coefficient hyperparameter feature generation unit can combine the basic layer of the default network structure and the enhancement layer of the variable network structure to obtain the decoding neural network 1.
  • the basic layer information includes the basic layer using prefabricated network flag and the basic layer prefabricated network index number, and the basic layer using prefabricated network flag indicates that the basic layer uses prefabricated network.
  • the coefficient hyperparameter feature generation unit learns, based on the basic layer information, that the basic layer adopts the network layer of a prefabricated network structure, and therefore selects from the prefabricated neural network pool the basic layer of the prefabricated network structure corresponding to the basic layer prefabricated network index number (for example, when the basic layer prefabricated network index number is 0, the network layer of the prefabricated network structure s1 is used as the basic layer; when the basic layer prefabricated network index number is 1, the network layer of the prefabricated network structure s2 is used as the basic layer).
  • the enhancement layer information includes the enhancement layer using the default network flag, and the enhancement layer using the default network flag indicates that the enhancement layer uses the default network.
  • the coefficient hyperparameter feature generation unit learns, based on the enhancement layer information, that the enhancement layer adopts the network layer of the default network structure, and therefore obtains the enhancement layer of the default network structure (i.e., the network layer of the default network structure).
  • the basic layer of the prefabricated network structure and the enhancement layer of the default network structure can be combined to obtain the decoding neural network 1.
  • the basic layer information includes the basic layer using prefabricated network flag and the basic layer prefabricated network index number, and the basic layer using prefabricated network flag indicates that the basic layer uses prefabricated network.
  • the coefficient hyperparameter feature generation unit learns, based on the basic layer information, that the basic layer adopts the network layer of a prefabricated network structure, and therefore can select from the prefabricated neural network pool the basic layer of the prefabricated network structure corresponding to the basic layer prefabricated network index number (such as the network layer of the prefabricated network structure s1, etc.).
  • the enhancement layer information includes the enhancement layer using prefabricated network flag and the enhancement layer prefabricated network index number, and the enhancement layer using prefabricated network flag indicates that the enhancement layer uses a prefabricated network.
  • the coefficient hyperparameter feature generation unit learns, based on the enhancement layer information, that the enhancement layer adopts the network layer of a prefabricated network structure, and therefore can select from the prefabricated neural network pool the enhancement layer of the prefabricated network structure corresponding to the enhancement layer prefabricated network index number (such as the network layer of the prefabricated network structure s1, etc.). On this basis, the basic layer of the prefabricated network structure and the enhancement layer of the prefabricated network structure can be combined to obtain the decoding neural network 1.
  • the basic layer information includes the basic layer using prefabricated network flag and the basic layer prefabricated network index number, and the basic layer using prefabricated network flag indicates that the basic layer uses prefabricated network.
  • the coefficient hyperparameter feature generation unit learns, based on the basic layer information, that the basic layer adopts the network layer of a prefabricated network structure, and therefore can select from the prefabricated neural network pool the basic layer of the prefabricated network structure corresponding to the basic layer prefabricated network index number (such as the network layer of the prefabricated network structure s1, etc.).
  • the enhancement layer information includes network parameters used to generate the enhancement layer.
  • the coefficient hyperparameter feature generation unit can parse the network parameters from the control parameters and generate an enhancement layer with a variable network structure (i.e., the network layer with a variable network structure) based on the network parameters. For example, the coefficient hyperparameter feature generation unit can parse, from the control parameters, network parameters such as the number of deconvolution layers, the quantization step (stride) of each deconvolution layer, the number of channels of each deconvolution layer, the size of the convolution kernel, and the activation layer type, and generate an enhancement layer with a variable network structure based on these network parameters. On this basis, the coefficient hyperparameter feature generation unit can combine the basic layer of the prefabricated network structure and the enhancement layer of the variable network structure to obtain the decoding neural network 1.
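  • For illustration, a minimal sketch of the decoder-side selection logic described in the cases above is shown below: the base layer and the enhancement layer are each resolved from flag bits (use default network / use prefabricated network) or generated from parsed network parameters, and then combined into the decoding neural network 1. The helper functions and field names are illustrative assumptions, not normative syntax.

```python
def resolve_layer(layer_info, default_layer, prefabricated_pool, build_from_params):
    """Pick the default, prefabricated, or dynamically generated network layer."""
    if layer_info.get("use_default_para_flag"):
        return default_layer                                # default network structure
    if layer_info.get("use_predesigned_para_flag"):
        return prefabricated_pool[layer_info["prefab_id"]]  # prefabricated network structure
    return build_from_params(layer_info["network_params"])  # variable network structure

def build_decoding_network_1(nn_info_1, default_base, default_enh,
                             base_pool, enh_pool, build_from_params):
    base = resolve_layer(nn_info_1["base_layer"], default_base, base_pool, build_from_params)
    enh = resolve_layer(nn_info_1["enhance_layer"], default_enh, enh_pool, build_from_params)
    return [base, enh]  # basic layer followed by enhancement layer
```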
  • the above network structures for the base layer and the enhancement layer can be determined through decoded control parameters.
  • the control parameters can include neural network information 1, and the neural network information 1 can include base layer information and enhancement layer information; an example of the neural network information 1 for the coefficient hyperparameter feature generation unit can be seen in Table 1.
  • u(n) represents the n-bit fixed-length code encoding method
  • ae(v) represents the variable-length coding method.
  • hyper_basic_layer_use_default_para_flag is the default network flag for the basic layer of the coefficient hyperparameter feature generation unit
  • hyper_basic_layer_use_default_para_flag is a binary variable
  • a value of 1 indicates that the basic layer of the coefficient hyperparameter feature generation unit uses the default network
  • a value of 0 indicates that the basic layer of the coefficient hyperparameter feature generation unit does not use the default network
  • the value of HyperBasicLayerUseDefaultParaFlag is equal to the value of hyper_basic_layer_use_default_para_flag.
  • hyper_basic_layer_use_predesigned_para_flag is the prefabricated network flag used for the basic layer of the coefficient hyperparameter feature generation unit
  • hyper_basic_layer_use_predesigned_para_flag is a binary variable.
  • when the value of this binary variable is 1, it means that the basic layer of the coefficient hyperparameter feature generation unit uses a prefabricated network; when the value of this binary variable is 0, it means that the basic layer of the coefficient hyperparameter feature generation unit does not use a prefabricated network.
  • the value of HyperBasicLayerUsePredesignedParaFlag can be equal to the value of hyper_basic_layer_use_predesigned_para_flag.
  • hyper_basic_id is the base layer prefabricated network index number for the coefficient hyperparameter feature generation unit. It can be a 32-bit unsigned integer, indicating the index number of the neural network used in the base layer in the prefabricated neural network pool.
  • hyper_enhance_layer_use_default_para_flag is the default network flag used for the enhancement layer of the coefficient hyperparameter feature generation unit
  • hyper_enhance_layer_use_default_para_flag is a binary variable.
  • when the value of this binary variable is 1, it means that the enhancement layer of the coefficient hyperparameter feature generation unit uses the default network; when the value of this binary variable is 0, it means that the enhancement layer of the coefficient hyperparameter feature generation unit does not use the default network.
  • the value of HyperEnhanceLayerUseDefaultParaFlag can be equal to the value of hyper_enhance_layer_use_default_para_flag.
  • hyper_enhance_layer_use_predesigned_para_flag is the prefabricated network flag used for the enhancement layer of the coefficient hyperparameter feature generation unit
  • hyper_enhance_layer_use_predesigned_para_flag is a binary variable. When the value of this binary variable is 1, it indicates that the enhancement layer of the coefficient hyperparameter feature generation unit uses the prefabricated network. When the value of this binary variable is 0, it indicates that the enhancement layer of the coefficient hyperparameter feature generation unit does not use the prefabricated network.
  • the value of HyperEnhanceLayerUsePredesignedParaFlag can be equal to the value of hyper_enhance_layer_use_predesigned_para_flag.
  • hyper_enhance_id is the pre-made network index number of the enhancement layer for the coefficient hyperparameter feature generation unit. It can be a 32-bit unsigned integer, indicating the index number of the neural network used in the enhancement layer in the pre-made neural network pool.
  • the range of hyper_basic_id is [id_min, id_max]; id_min is preferably 0, id_max is preferably 2^32-1, and the [a, b] interval is a reserved interval for later expansion of the prefabricated neural network pool.
  • the prefabricated neural network pool of the basic layer can include several basic layer prefabricated networks, such as 2, 3, or 4, or dozens of basic layer prefabricated networks, or even more basic layer prefabricated networks, without restriction.
  • the preferred value of id_max of 2^32-1 is just an example, and the value of id_max can be dynamically adjusted under different circumstances.
  • the range of hyper_enhance_id is [id_min, id_max]; id_min is preferably 0, id_max is preferably 2^32-1, and the [a, b] interval is a reserved interval for later expansion of the prefabricated neural network pool.
  • the prefabricated neural network pool can include several enhancement layer prefabricated networks, such as 2, 3, or 4, or dozens of enhancement layer prefabricated networks, or even more enhancement layer prefabricated networks, without restriction.
  • the preferred value of id_max of 2^32-1 is just an example, and the value of id_max can be dynamically adjusted under different circumstances.
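  • For illustration, a minimal sketch of parsing the Table 1 syntax elements for the coefficient hyperparameter feature generation unit is shown below. BitReader is a hypothetical helper offering read_bits(n) for u(n) fixed-length codes, and the conditional ordering of the syntax elements is an assumption, since Table 1 itself is not reproduced here.

```python
class BitReader:
    """Hypothetical fixed-length-code reader over a string of '0'/'1' bits."""
    def __init__(self, bits):
        self.bits, self.pos = bits, 0
    def read_bits(self, n):
        val = int(self.bits[self.pos:self.pos + n], 2)
        self.pos += n
        return val

def parse_hyper_nn_info_1(r: BitReader) -> dict:
    info = {"HyperBasicLayerUseDefaultParaFlag": r.read_bits(1)}
    if not info["HyperBasicLayerUseDefaultParaFlag"]:
        info["HyperBasicLayerUsePredesignedParaFlag"] = r.read_bits(1)
        if info["HyperBasicLayerUsePredesignedParaFlag"]:
            info["hyper_basic_id"] = r.read_bits(32)      # 32-bit unsigned index into the pool
    info["HyperEnhanceLayerUseDefaultParaFlag"] = r.read_bits(1)
    if not info["HyperEnhanceLayerUseDefaultParaFlag"]:
        info["HyperEnhanceLayerUsePredesignedParaFlag"] = r.read_bits(1)
        if info["HyperEnhanceLayerUsePredesignedParaFlag"]:
            info["hyper_enhance_id"] = r.read_bits(32)
    return info
```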
  • the above network structure for the enhancement layer can be determined through decoded control parameters.
  • the control parameters can include neural network information 1, and the neural network information 1 can include enhancement layer information.
  • an example of the neural network information 1 for the coefficient hyperparameter feature generation unit can be seen in Table 2 and Table 3.
  • layer_num represents the number of neural network layers, that is, the number of network layers of the neural network. If an activation layer is included in a certain network structure, it is not counted as an additional layer. The value of LayerNum is equal to layer_num.
  • deconv_layer_flag represents the deconvolution layer flag
  • deconv_layer_flag is a binary variable.
  • when the value of the binary variable is 1, it means that the current layer is a deconvolution layer; when the value of the binary variable is 0, it means that the current layer is not a deconvolution layer.
  • the value of DeconvLayerFlag is equal to the value of deconv_layer_flag.
  • stride_num represents the quantization step size of the deconvolution layer.
  • filter_num represents the number of filters, that is, the number of filters in the current layer.
  • filter_size_index represents the filter size index, that is, the current filter size index value.
  • filter_coeff_zero_flag[i][j] is the filter coefficient zero flag and is a binary variable. When the value of the binary variable is 1, it means that the current filter coefficient is 0; when the value of the binary variable is 0, it means that the current filter coefficient is not 0.
  • the value of FilterCoeffZeroFlag[i][j] is equal to the value of filter_coeff_zero_flag[i][j].
  • filter_coeff[i][j] represents the filter coefficient, that is, the current filter coefficient value.
  • activation_layer_flag represents the activation layer flag
  • activation_layer_flag is a binary variable.
  • when the value of the binary variable is 1, it means that the current layer is an activation layer; when the value of the binary variable is 0, it means that the current layer is not an activation layer.
  • the value of ActivationLayerFlag is equal to the value of activation_layer_flag.
  • activation_layer_type indicates the activation layer type, that is, the specific type of the activation layer of the current layer.
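  • For illustration, a minimal sketch of parsing per-layer network parameters along the lines of the syntax elements above (layer_num, deconv_layer_flag, stride_num, filter_num, filter_size_index, filter_coeff_zero_flag, filter_coeff, activation_layer_flag, activation_layer_type) is shown below. The loop structure, bit widths, and the simplified one-dimensional filter coefficient list are assumptions, since Table 2 and Table 3 are not reproduced here; read_bits and read_ae stand in for the u(n) and ae(v) decoding processes.

```python
def parse_variable_layers(r, read_ae):
    """Parse a variable network structure description layer by layer."""
    layers = []
    layer_num = r.read_bits(8)                          # LayerNum
    for _ in range(layer_num):
        layer = {"deconv_layer_flag": r.read_bits(1)}   # DeconvLayerFlag
        if layer["deconv_layer_flag"]:
            layer["stride_num"] = r.read_bits(2)        # quantization step size (stride)
            layer["filter_num"] = r.read_bits(8)
            layer["filter_size_index"] = r.read_bits(3)
            layer["filter_coeff"] = []
            for _ in range(layer["filter_num"]):
                if r.read_bits(1):                      # filter coefficient zero flag
                    layer["filter_coeff"].append(0)
                else:
                    layer["filter_coeff"].append(read_ae())  # ae(v)-coded coefficient
        layer["activation_layer_flag"] = r.read_bits(1)
        if layer["activation_layer_flag"]:
            layer["activation_layer_type"] = r.read_bits(2)
        layers.append(layer)
    return layers
```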
  • the second feature decoding unit can decode the code stream 2 corresponding to the current block and obtain the image feature information corresponding to the current block.
  • the second feature decoding unit includes at least one coefficient decoding module and a probability model acquisition module.
  • the coefficient decoding module can use the entropy decoding method to decode the coefficients, that is, use the entropy decoding method to decode the code stream 2 corresponding to the current block and obtain the image feature information corresponding to the current block.
  • the entropy decoding method may include but is not limited to CAVLC or CABAC, etc., and is not limited thereto.
  • the way in which the features generated by the coefficient hyperparameter feature generation unit are used in the second feature decoding unit may include:
  • Method 1: the probability model acquisition module is used to acquire the probability model for entropy decoding. For example, the probability model acquisition module acquires the coefficient hyperparameter feature value P from the coefficient hyperparameter feature generation unit, the coefficient decoding module obtains the coefficient hyperparameter feature value P from the probability model acquisition module, and the coefficient decoding module then uses the entropy decoding method to decode the coefficients.
  • Method 2: the coefficient parsing process (such as the CABAC or CAVLC decoding process) does not rely on the features generated by the coefficient hyperparameter feature generation unit, and the coefficient values can be parsed directly (this can guarantee the parsing throughput or rate); the parsed coefficient values are then converted based on the features generated by the coefficient hyperparameter feature generation unit to obtain the image feature quantized value F_q.
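  • For illustration, a minimal sketch contrasting the two methods above is shown below: in Method 1 the probability model used by entropy decoding is derived from the coefficient hyperparameter feature value P, while in Method 2 the coefficient values are parsed without P and converted afterwards. The helper functions (entropy_decode, probability_model_from, convert_with_features) are purely illustrative placeholders, not the normative CABAC/CAVLC process.

```python
def decode_image_features_method1(stream, P, entropy_decode, probability_model_from):
    model = probability_model_from(P)         # probability model acquisition module
    return entropy_decode(stream, model)      # coefficient decoding module outputs F_q

def decode_image_features_method2(stream, P, entropy_decode, convert_with_features):
    coeffs = entropy_decode(stream, None)     # parse coefficient values directly (no dependence on P)
    return convert_with_features(coeffs, P)   # convert to the image feature quantized value F_q
```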
  • the second inverse quantization unit can inversely quantize the image feature quantized value F_q (ie, image feature information) to obtain the image feature reconstruction value F'.
  • the second inverse quantization unit may not exist; or, if the second inverse quantization unit exists, the second inverse quantization unit may be selectively skipped based on control parameters (such as high-level syntax, e.g., second enable information), or the second inverse quantization unit may be determined to be enabled based on the control parameters.
  • if the second inverse quantization unit is selectively skipped based on the control parameters, the image feature reconstruction value F' is the same as the image feature quantized value F_q, that is, there is no need to inverse quantize the image feature quantized value F_q. If the second inverse quantization unit is enabled based on the control parameters but the step parameter qstep corresponding to the image feature quantized value F_q is 1, the image feature reconstruction value F' is likewise the same as the image feature quantized value F_q, that is, there is no need to inverse quantize the image feature quantized value F_q.
  • the second inverse quantization unit can inversely quantize the image feature quantized value F_q based on the control parameters (such as quantization-related parameters) to obtain the image feature reconstruction value F'.
  • the second inverse quantization unit performs the following operation: obtain, from the control parameters (the control parameters are carried in the code stream and may include quantization-related parameters), the quantization-related parameters corresponding to the image feature quantized value F_q, such as the step parameter qstep or the quantization parameter qp.
  • the quantization-related parameters (such as the step parameter qstep) corresponding to the image feature quantization value F_q include: 1) Each image feature quantization value of each feature channel uses the same step parameter qstep; 2) The image feature quantization value of each feature channel uses a different step size parameter qstep, but each image feature quantization value within the feature channel uses the same step size parameter qstep; 3) Each image feature quantization value of each feature channel All use different step size parameters qstep.
  • the image feature inverse transformation unit can perform an inverse transformation operation on the image feature reconstruction value F' based on the decoding neural network to obtain the image low-order feature value LF.
  • the image feature inverse transformation unit may include a decoding neural network 2.
  • the decoding neural network 2 may include a basic layer and an enhancement layer, and the image feature reconstruction value F' is used as the input feature of the decoding neural network 2.
  • the image low-order feature value LF is used as the output feature of the decoding neural network 2
  • the decoding neural network 2 is used to perform an inverse transformation operation on the image feature reconstruction value F'.
  • the decoding neural network 2 can be divided into a basic layer and an enhancement layer.
  • the basic layer may include at least one network layer, or the basic layer may include no network layer, that is, the basic layer may be empty.
  • the enhancement layer may include at least one network layer, or the enhancement layer may not include a network layer, that is, the enhancement layer may be empty.
  • the multiple network layers in the decoding neural network 2 can be divided into basic layers and enhancement layers according to actual needs. For example, the network layer with a fixed network structure can be used as the basic layer, and the network layer with an unfixed network structure can be used as the enhancement layer.
  • the size of the output feature may be larger than the size of the input feature, or the size of the output feature may be equal to the size of the input feature, or the size of the output feature may be smaller than the size of the input feature.
  • the base layer may include at least one deconvolution layer
  • the enhancement layer may include at least one deconvolution layer, or may not include a deconvolution layer.
  • the enhancement layer may include at least one deconvolution layer
  • the base layer may include at least one deconvolution layer, or may not include a deconvolution layer.
  • the base layer and the enhancement layer include at least one residual structure layer.
  • the base layer includes at least one residual structure layer, and the enhancement layer may include a residual structure layer or not.
  • the enhancement layer includes at least one residual structure layer, and the base layer may include a residual structure layer or not.
  • the decoding neural network 2 may include, but is not limited to, deconvolution layers, activation layers, and the like.
  • the decoding neural network 2 includes a deconvolution layer with a stride of 2, an activation layer, a deconvolution layer with a stride of 1, and an activation layer. All the above network layers can be used as basic layers, and the enhancement layer can be empty.
  • the decoding neural network 2 includes a deconvolution layer with a stride of 2 and a deconvolution layer with a stride of 1. All the above network layers can be used as basic layers, and the enhancement layer can be empty.
  • the decoding neural network 2 includes a deconvolution layer with a stride of 2, an activation layer, a deconvolution layer with a stride of 2, an activation layer, a deconvolution layer with a stride of 1, and an activation layer. All the above network layers can be used as the basic layer, and the enhancement layer can be empty.
  • the output feature number (number of filters) of the last network layer of the image feature inverse transformation unit is 1 or 3. Specifically, if the output has only one channel (such as a grayscale image), the number of output features of the last network layer of the image feature inverse transformation unit is 1; if the output has three channels (such as RGB or YUV format), the number of output features of the last network layer of the image feature inverse transformation unit is 3.
  • the number of output features (number of filters) of the last network layer of the image feature inverse transformation unit can be 1 or 3, or other values, without limitation.
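  • For illustration, a minimal sketch of a decoding neural network 2 built from the third example above (two stride-2 deconvolution layers and one stride-1 deconvolution layer, each followed by an activation layer), with the last layer producing 1 or 3 output features depending on the output format, is shown below; the channel counts and kernel sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

def make_decoding_network_2(output_channels: int) -> nn.Sequential:
    """output_channels = 1 for a grayscale output, 3 for an RGB/YUV output."""
    return nn.Sequential(
        nn.ConvTranspose2d(192, 128, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
        nn.ConvTranspose2d(128, 64, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
        nn.ConvTranspose2d(64, output_channels, 3, stride=1, padding=1), nn.ReLU(),
    )

F_prime = torch.randn(1, 192, 16, 16)     # image feature reconstruction value F'
LF = make_decoding_network_2(3)(F_prime)  # image low-order feature value LF, here 3 x 64 x 64
```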
  • the image feature inverse transformation unit can be configured with a network layer of a default network structure, and the network parameters related to the network layer of the default network structure are all fixed.
  • for the network layer of the default network structure, the number of deconvolution layers is fixed, the number of activation layers is fixed, the number of channels of each deconvolution layer is fixed, the size of the convolution kernel is fixed, and the filter coefficients are fixed.
  • since the network parameters in the network layer of the default network structure are all fixed and known, the network layer of the default network structure can be obtained directly.
  • a prefabricated neural network pool can be configured for the image feature inverse transformation unit.
  • the prefabricated neural network pool includes at least one network layer of a prefabricated network structure, and the network parameters related to the network layer of the prefabricated network structure can be configured according to actual needs.
  • the prefabricated neural network pool includes the network layer of the prefabricated network structure t1, the network layer of the prefabricated network structure t2, and the network layer of the prefabricated network structure t3.
  • the number of deconvolution layers, the number of activation layers, the number of channels of each deconvolution layer, the size of the convolution kernel, filter coefficients and other network parameters can be pre-configured. After all network parameters are configured, the network layer of the prefabricated network structure t1 can be obtained, and so on.
  • a network layer with a variable network structure can be dynamically generated for the image feature inverse transformation unit based on network parameters.
  • the network parameters related to the network layer with a variable network structure are dynamically generated by the encoding end rather than preconfigured. For example, the encoding end can encode network parameters such as the number of deconvolution layers, the number of activation layers, the number of channels of each deconvolution layer, the size of the convolution kernel, and the filter coefficients into the code stream; the decoding end can then parse the above network parameters from the code stream and generate a network layer with a variable network structure based on these network parameters.
  • control parameters may include neural network information 2 corresponding to the image feature inverse transformation unit.
  • the image feature inverse transformation unit may parse the neural network information 2 from the control parameters and generate the decoding neural network 2 based on the neural network information 2.
  • the neural network information 2 can include basic layer information and enhancement layer information.
  • the basic layer can be determined based on the basic layer information, and the enhancement layer can be determined based on the enhancement layer information.
  • the basic layer and the enhancement layer can be combined to obtain the decoding neural network 2. For example, the following cases can be used to obtain the decoding neural network 2 based on the neural network information 2:
  • the base layer information includes the base layer uses default network flag, and the base layer uses default network flag indicates that the base layer uses the default network.
  • the image feature inverse transformation unit learns, based on the base layer information, that the base layer adopts the network layer of the default network structure, and therefore obtains the base layer of the default network structure.
  • the enhancement layer information includes the enhancement layer uses default network flag bit, which indicates that the enhancement layer uses the default network. In this case, it is learned based on the enhancement layer information that the enhancement layer adopts the network layer of the default network structure, and therefore the enhancement layer of the default network structure is obtained. Combining the basic layer of the default network structure and the enhancement layer of the default network structure, the decoding neural network 2 is obtained.
  • the base layer information includes the base layer uses default network flag, and the base layer uses default network flag indicates that the base layer uses the default network.
  • the image feature inverse transformation unit learns, based on the base layer information, that the base layer adopts the network layer of the default network structure, and therefore obtains the base layer of the default network structure.
  • the enhancement layer information includes the enhancement layer using prefabricated network flag and the enhancement layer prefabricated network index number, and the enhancement layer using prefabricated network flag indicates that the enhancement layer uses a prefabricated network. In this case, it is learned based on the enhancement layer information that the enhancement layer adopts the network layer of a prefabricated network structure; therefore, the enhancement layer of the prefabricated network structure corresponding to the enhancement layer prefabricated network index number is selected from the prefabricated neural network pool.
  • the basic layer of the default network structure and the enhancement layer of the prefabricated network structure can be combined to obtain the decoding neural network 2.
  • the base layer information includes the base layer uses default network flag, and the base layer uses default network flag indicates that the base layer uses the default network.
  • the image feature inverse transformation unit learns, based on the base layer information, that the base layer adopts the network layer of the default network structure, and therefore obtains the base layer of the default network structure.
  • the enhancement layer information includes network parameters used to generate the enhancement layer. In this case, the network parameters are parsed from the control parameters, and the enhancement layer of the variable network structure is generated based on the network parameters.
  • the decoding neural network 2 is obtained by combining the basic layer of the default network structure and the enhancement layer of the variable network structure.
  • the basic layer information includes the basic layer using prefabricated network flag and the basic layer prefabricated network index number, and the basic layer using prefabricated network flag indicates that the basic layer uses prefabricated networks.
  • the image feature inverse transformation unit learns, based on the basic layer information, that the basic layer adopts the network layer of a prefabricated network structure, and therefore selects from the prefabricated neural network pool the basic layer of the prefabricated network structure corresponding to the basic layer prefabricated network index number.
  • the enhancement layer information includes the enhancement layer uses default network flag bit, which indicates that the enhancement layer uses the default network. In this case, it is learned based on the enhancement layer information that the enhancement layer adopts the network layer of the default network structure, and therefore the enhancement layer of the default network structure is obtained.
  • the basic layer of the prefabricated network structure and the enhancement layer of the default network structure can be combined to obtain the decoding neural network 2.
  • the base layer information includes the base layer using prefabricated network flag and the base layer prefabricated network index number, and the base layer using prefabricated network flag indicates that the base layer uses prefabricated networks.
  • the image feature inverse transformation unit learns, based on the base layer information, that the base layer adopts the network layer of a prefabricated network structure, and therefore can select from the prefabricated neural network pool the base layer of the prefabricated network structure corresponding to the base layer prefabricated network index number.
  • the enhancement layer information includes the enhancement layer using prefabricated network flag and the enhancement layer prefabricated network index number, and the enhancement layer using prefabricated network flag indicates that the enhancement layer uses a prefabricated network. In this case, it is learned based on the enhancement layer information that the enhancement layer adopts the network layer of a prefabricated network structure; therefore, the enhancement layer of the prefabricated network structure corresponding to the enhancement layer prefabricated network index number can be selected from the prefabricated neural network pool.
  • the basic layer of the prefabricated network structure and the enhancement layer of the prefabricated network structure can be combined to obtain the decoding neural network 2.
  • the basic layer information includes the basic layer using prefabricated network flag and the basic layer prefabricated network index number, and the basic layer using prefabricated network flag indicates that the basic layer uses prefabricated networks.
  • the image feature inverse transformation unit learns, based on the basic layer information, that the basic layer adopts the network layer of a prefabricated network structure, and therefore can select from the prefabricated neural network pool the basic layer of the prefabricated network structure corresponding to the basic layer prefabricated network index number.
  • the enhancement layer information includes network parameters used to generate the enhancement layer. In this case, the network parameters are parsed from the control parameters, and the enhancement layer of the variable network structure is generated based on the network parameters.
  • the decoding neural network 2 is obtained by combining the basic layer of the prefabricated network structure and the enhancement layer of the variable network structure.
  • the above network structures for the basic layer and the enhancement layer can be determined through decoded control parameters.
  • the control parameters can include neural network information 2, and the neural network information 2 can include basic layer information and enhancement layer information. The neural network information 2 for the image feature inverse transformation unit is similar to Table 1, Table 2, and Table 3, except that the relevant information is for the image feature inverse transformation unit rather than the coefficient hyperparameter feature generation unit, and is not repeated here.
  • Embodiment 13: For Embodiment 5 and Embodiment 6, the quality enhancement unit can obtain the image low-order feature value LF, perform enhancement processing on the image low-order feature value LF based on the decoding neural network, and obtain the reconstructed image block I corresponding to the current block.
  • the quality enhancement unit may not exist; or, if the quality enhancement unit exists, the quality enhancement unit may be selectively skipped based on control parameters (such as high-level syntax, e.g., third enable information), or the quality enhancement unit may be determined to be enabled based on the control parameters. For example, if the quality enhancement unit is determined to be enabled based on the control parameters, the quality enhancement unit can be used to remove image quality degradation problems such as block effects between blocks and quantization distortion; for example, the quality enhancement unit can perform enhancement processing on the image low-order feature value LF based on the decoding neural network to obtain the reconstructed image block I corresponding to the current block.
  • the quality enhancement unit may include a decoding neural network 3.
  • the decoding neural network 3 may include a basic layer and an enhancement layer.
  • the image low-order feature value LF is used as the input feature of the decoding neural network 3, and the reconstructed image block I is used as the output of the decoding neural network 3.
  • the decoding neural network 3 is used to enhance the low-order feature value LF of the image.
  • the decoding neural network 3 can be divided into a basic layer and an enhancement layer.
  • the basic layer may include at least one network layer, or the basic layer may include no network layer, that is, the basic layer may be empty.
  • the enhancement layer may include at least one network layer, or the enhancement layer may not include a network layer, that is, the enhancement layer may be empty.
  • the multiple network layers in the decoding neural network 3 can be divided into basic layers and enhancement layers according to actual needs. For example, the network layer with a fixed network structure can be used as the basic layer, and the network layer with an unfixed network structure can be used as the enhancement layer.
  • the size of the output feature may be larger than the size of the input feature, or the size of the output feature may be equal to the size of the input feature, or the size of the output feature may be smaller than the size of the input feature.
  • At least one deconvolution layer is included in the base layer and enhancement layer.
  • the base layer may include at least one deconvolution layer
  • the enhancement layer may include at least one deconvolution layer, or may not include a deconvolution layer.
  • the enhancement layer may include at least one deconvolution layer
  • the base layer may include at least one deconvolution layer, or may not include a deconvolution layer.
  • the base layer and enhancement layer include at least one residual structure layer.
  • the base layer includes at least one residual structure layer, and the enhancement layer may include a residual structure layer or not.
  • the enhancement layer includes at least one residual structure layer, and the base layer may include a residual structure layer or not.
  • the decoding neural network 3 may include, but is not limited to, deconvolution layers, activation layers, and the like.
  • the decoding neural network 3 includes a deconvolution layer with a stride of 2, an activation layer, a deconvolution layer with a stride of 1, and an activation layer. All the above network layers can be used as basic layers, and the enhancement layer can be empty.
  • the decoding neural network 3 sequentially includes a deconvolution layer with a stride of 2 and a deconvolution layer with a stride of 1. All the above network layers can be used as basic layers, and the enhancement layer can be empty.
  • the decoding neural network 3 includes a deconvolution layer with a stride of 2, an activation layer, a deconvolution layer with a stride of 2, an activation layer, a deconvolution layer with a stride of 1, and an activation layer. All the above network layers can be used as the basic layer, and the enhancement layer can be empty.
  • the output feature number (number of filters) of the last network layer of the quality enhancement unit is 1 or 3. Specifically, if the output has only one channel (such as a grayscale image), the number of output features of the last network layer is 1; if the output has three channels (such as RGB or YUV format), the number of output features of the last network layer is 3.
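  • For illustration, a minimal sketch of a quality enhancement network (decoding neural network 3) that contains a residual structure layer, as mentioned above, and outputs 3 features for an RGB/YUV reconstructed block is shown below; the block layout, channel count, and the use of ordinary convolutions inside the residual branch are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ResidualStructureLayer(nn.Module):
    """A simple residual structure layer: x + branch(x)."""
    def __init__(self, channels: int):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )
    def forward(self, x):
        return x + self.branch(x)  # residual connection preserves the input size

decoding_neural_network_3 = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
    ResidualStructureLayer(32),
    nn.Conv2d(32, 3, kernel_size=3, padding=1),  # 3 output features for the reconstructed block
)

LF = torch.randn(1, 3, 64, 64)         # image low-order feature value LF
I_rec = decoding_neural_network_3(LF)  # reconstructed image block I
```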
  • the quality enhancement unit can be configured with a network layer of a default network structure, and network parameters related to the network layer of the default network structure are all fixed.
  • a prefabricated neural network pool can be configured for the quality enhancement unit.
  • the prefabricated neural network pool includes at least one network layer of a prefabricated network structure.
  • Network parameters related to the network layer of the prefabricated network structure can be configured according to actual needs.
  • the network layer of the variable network structure can be dynamically generated for the quality enhancement unit based on the network parameters.
  • the network parameters related to the network layer of the variable network structure are dynamically generated by the encoding end rather than preconfigured.
  • the control parameters may include neural network information 3 corresponding to the quality enhancement unit.
  • the quality enhancement unit may parse the neural network information 3 from the control parameters and generate the decoding neural network 3 based on the neural network information 3 .
  • the neural network information 3 can include basic layer information and enhancement layer information.
  • the basic layer can be determined based on the basic layer information, and the enhancement layer can be determined based on the enhancement layer information.
  • the basic layer and the enhancement layer can be combined to obtain the decoding neural network 3. For example, the quality enhancement unit can use the following cases to obtain the decoding neural network 3 based on the neural network information 3:
  • the base layer information includes the base layer uses default network flag, and the base layer uses default network flag indicates that the base layer uses the default network.
  • the quality enhancement unit learns, based on the base layer information, that the base layer adopts the network layer of the default network structure, and therefore obtains the base layer of the default network structure.
  • the enhancement layer information includes the enhancement layer uses default network flag bit, which indicates that the enhancement layer uses the default network. In this case, it is learned based on the enhancement layer information that the enhancement layer adopts the network layer of the default network structure, and therefore the enhancement layer of the default network structure is obtained. Combining the basic layer of the default network structure and the enhancement layer of the default network structure, the decoding neural network 3 is obtained.
  • the base layer information includes the base layer uses default network flag, and the base layer uses default network flag indicates that the base layer uses the default network.
  • the quality enhancement unit learns from the base layer information that the base layer adopts the network layer of the default network structure, and therefore obtains the base layer of the default network structure.
  • the enhancement layer information includes the enhancement layer using prefabricated network flag and the enhancement layer prefabricated network index number, and the flag indicates that the enhancement layer uses a prefabricated network. In this case, the quality enhancement unit learns from the enhancement layer information that the enhancement layer adopts the network layer of a prefabricated network structure, and therefore selects, from the prefabricated neural network pool, the enhancement layer of the prefabricated network structure corresponding to the enhancement layer prefabricated network index number.
  • the basic layer of the default network structure and the enhancement layer of the prefabricated network structure can be combined to obtain the decoding neural network 3.
  • the base layer information includes the base layer uses default network flag, and the base layer uses default network flag indicates that the base layer uses the default network.
  • the quality enhancement unit learns from the base layer information that the base layer adopts the network layer of the default network structure, and therefore obtains the base layer of the default network structure.
  • the enhancement layer information includes network parameters used to generate the enhancement layer. In this case, the network parameters are parsed from the control parameters, and the enhancement layer of the variable network structure is generated based on the network parameters.
  • the decoding neural network 3 is obtained by combining the basic layer of the default network structure and the enhancement layer of the variable network structure.
  • the basic layer information includes the basic layer using prefabricated network flag and the basic layer prefabricated network index number, and the basic layer using prefabricated network flag indicates that the basic layer uses prefabricated network.
  • the quality enhancement unit learns from the basic layer information that the basic layer adopts the network layer of a prefabricated network structure, and therefore selects, from the prefabricated neural network pool, the basic layer of the prefabricated network structure corresponding to the basic layer prefabricated network index number.
  • the enhancement layer information includes the enhancement layer uses default network flag bit, and this flag bit indicates that the enhancement layer uses the default network. In this case, it is learned from the enhancement layer information that the enhancement layer adopts the network layer of the default network structure, and the enhancement layer of the default network structure is therefore obtained.
  • the basic layer of the prefabricated network structure and the enhancement layer of the default network structure can be combined to obtain the decoding neural network 3.
  • the basic layer information includes the basic layer using prefabricated network flag and the basic layer prefabricated network index number, and the basic layer using prefabricated network flag indicates that the basic layer uses prefabricated network.
  • the quality enhancement unit learns from the basic layer information that the basic layer adopts the network layer of a prefabricated network structure, and therefore the basic layer of the prefabricated network structure corresponding to the basic layer prefabricated network index number can be selected from the prefabricated neural network pool.
  • the enhancement layer information includes the enhancement layer using prefabricated network flag and the enhancement layer prefabricated network index number, and the enhancement layer using prefabricated network flag indicates that the enhancement layer uses a prefabricated network. In this case, it is learned based on the enhancement layer information that the enhancement layer adopts a prefabricated network structure.
  • the enhancement layer of the prefabricated network structure corresponding to the prefabricated network index number of the enhancement layer can be selected from the prefabricated neural network pool.
  • the basic layer of the prefabricated network structure and the enhancement layer of the prefabricated network structure can be combined to obtain the decoding neural network 3.
  • the basic layer information includes the basic layer using prefabricated network flag and the basic layer prefabricated network index number, and the basic layer using prefabricated network flag indicates that the basic layer uses prefabricated network.
  • the quality enhancement unit learns from the basic layer information that the basic layer adopts the network layer of a prefabricated network structure, and therefore the basic layer of the prefabricated network structure corresponding to the basic layer prefabricated network index number can be selected from the prefabricated neural network pool.
  • the enhancement layer information includes network parameters used to generate the enhancement layer. In this case, the network parameters are parsed from the control parameters, and the enhancement layer of the variable network structure is generated based on the network parameters.
  • the decoding neural network 3 is obtained by combining the basic layer of the prefabricated network structure and the enhancement layer of the variable network structure.
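  • The situations above can be summarized in the sketch below, which shows how a decoder might assemble decoding neural network 3 from the signalled base layer information and enhancement layer information. The dictionary field names (use_default_flag, use_prefab_flag, prefab_index, network_params) and the pool contents are illustrative assumptions, not syntax elements defined by this disclosure.

```python
import torch.nn as nn

# Hypothetical default layers and prefabricated neural network pool.
DEFAULT_BASE_LAYER = nn.Identity()
DEFAULT_ENH_LAYER = nn.Identity()
PREFAB_POOL = {0: nn.Identity(), 1: nn.Identity()}

def generate_variable_layer(params: dict) -> nn.Module:
    """Placeholder for dynamically generating a variable-structure layer from
    decoded network parameters (layer counts, kernel sizes, filter coefficients, ...)."""
    return nn.Conv2d(params.get("in_ch", 3), params.get("out_ch", 3),
                     kernel_size=params.get("kernel", 3), padding=1)

def build_layer(info: dict, default_layer: nn.Module, pool: dict) -> nn.Module:
    """Resolve one layer (base or enhancement) from its signalled information."""
    if info.get("use_default_flag"):
        return default_layer                      # default network structure
    if info.get("use_prefab_flag"):
        return pool[info["prefab_index"]]         # prefabricated network structure
    if "network_params" in info:
        return generate_variable_layer(info["network_params"])  # variable structure
    return nn.Identity()                          # the layer is allowed to be empty

def build_decoding_neural_network_3(nn_info3: dict) -> nn.Sequential:
    base = build_layer(nn_info3["base_layer_info"], DEFAULT_BASE_LAYER, PREFAB_POOL)
    enh = build_layer(nn_info3["enhancement_layer_info"], DEFAULT_ENH_LAYER, PREFAB_POOL)
    return nn.Sequential(base, enh)               # combine base layer and enhancement layer
```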
  • the above network structures for the base layer and enhancement layer can be determined by decoded control parameters.
  • the control parameters can include neural network information 3, and the neural network information 3 can include base layer information and enhancement layer information. The neural network information 3 for the quality enhancement unit is similar to Table 1, Table 2, and Table 3, except that the relevant information is for the quality enhancement unit rather than the coefficient hyperparameter feature generation unit, and will not be repeated here.
  • Embodiment 14 In this embodiment of the present disclosure, an image coding method based on neural network is proposed. This method can be applied to the encoding end (also called a video encoder). See Figure 7A, which is a schematic structural diagram of the encoding end.
  • the encoding end may include a control parameter coding unit, a feature transformation unit, a coefficient hyperparameter feature transformation unit, a first quantization unit, a second quantization unit, a first feature coding unit, a second feature coding unit, a first feature decoding unit, a second feature decoding unit, a coefficient hyperparameter feature generation unit, an image feature inverse transformation unit, a first inverse quantization unit, a second inverse quantization unit, and a quality enhancement unit.
  • the first quantization unit, the second quantization unit, the first inverse quantization unit, the second inverse quantization unit, and the quality enhancement unit are optional units. In some scenarios, the processing of these optional units can be turned off or skipped.
  • the code stream corresponding to the current block includes three parts: code stream 0 (the code stream containing the control parameters), code stream 1 (the code stream containing the coefficient hyperparameter feature information), and code stream 2 (the code stream containing the image feature information).
  • Coefficient hyperparameter feature information and image feature information can be collectively referred to as image information.
  • the neural network-based image coding method in this embodiment may include the following steps S71-S79.
  • Step S71 Perform feature transformation on the current block I to obtain the image feature value F corresponding to the current block I.
  • the feature transformation unit can perform feature transformation on the current block I to obtain the image feature value F corresponding to the current block I.
  • the feature transformation unit can perform feature transformation on the current block I based on the coding neural network to obtain the image feature value F corresponding to the current block I.
  • the current block I is used as the input feature of the coding neural network
  • the image feature value F is used as the output feature of the coding neural network.
  • Step S72 Determine image feature information based on the image feature value F.
  • the second quantization unit can obtain the image feature value F from the feature transformation unit, quantize the image feature value F to obtain the image feature quantized value F_q, and determine the image feature information based on the image feature quantized value F_q, that is, the image feature information can be the image feature quantized value F_q.
  • the control parameter encoding unit may encode the second enabling information of the second quantization unit in code stream 0, that is, the control parameters include the second enabling information of the second quantization unit, and the second enabling information is used to indicate that the second quantization unit is enabled.
  • the control parameter encoding unit may also encode the quantization-related parameters corresponding to the second quantization unit in code stream 0, such as the step parameter qstep or the quantization parameter qp.
  • the image feature information is determined based on the image feature value F, that is, the image feature information may be the image feature value F.
  • the control parameter encoding unit may encode the second enabling information of the second quantization unit in the code stream 0, and the second enabling information is used to indicate that the second quantization unit is not enabled.
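  • As a hedged sketch of step S72, assuming simple uniform rounding quantization with a step parameter qstep (this disclosure does not mandate this particular quantizer):

```python
import torch

def quantize(feature: torch.Tensor, qstep: float) -> torch.Tensor:
    """Second quantization unit: image feature value F -> image feature quantized value F_q."""
    return torch.round(feature / qstep)

def dequantize(feature_q: torch.Tensor, qstep: float) -> torch.Tensor:
    """Second inverse quantization unit: F_q -> image feature reconstruction value F'."""
    return feature_q * qstep

F = torch.randn(1, 64, 16, 16)   # image feature value from the feature transformation unit
qstep = 0.5                      # quantization-related parameter signalled in code stream 0
F_q = quantize(F, qstep)         # image feature information when the second quantization unit is enabled
```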
  • Step S73 Perform coefficient hyperparameter feature transformation on the image feature value F to obtain the coefficient hyperparameter feature coefficient value C.
  • the coefficient hyperparameter feature transformation unit performs coefficient hyperparameter feature transformation on the image feature value F to obtain the coefficient hyperparameter feature coefficient value C.
  • for example, the coefficient hyperparameter feature transformation is performed on the image feature value F based on a coding neural network to obtain the coefficient hyperparameter feature coefficient value C.
  • the image feature value F is used as the input feature of the coding neural network
  • the coefficient hyperparameter feature coefficient value C is used as the output feature of the coding neural network.
  • Step S74 Determine the coefficient hyperparameter characteristic information based on the coefficient hyperparameter characteristic coefficient value C.
  • the first quantization unit can obtain the coefficient hyperparameter feature coefficient value C from the coefficient hyperparameter feature transformation unit, and quantize the coefficient hyperparameter feature coefficient value C to obtain the coefficient hyperparameter feature coefficient quantized value C_q. The coefficient hyperparameter feature information is then determined based on the coefficient hyperparameter feature coefficient quantized value C_q, that is, the coefficient hyperparameter feature information can be the coefficient hyperparameter feature coefficient quantized value C_q.
  • the control parameter encoding unit may encode the first enabling information of the first quantization unit in code stream 0, that is, the control parameters include the first enabling information of the first quantization unit, and the first enabling information is used to indicate that the first quantization unit is enabled.
  • the control parameter encoding unit may also encode the quantization related parameters corresponding to the first quantization unit in code stream 0.
  • the coefficient hyperparameter feature information is determined based on the coefficient hyperparameter feature coefficient value C, that is, the coefficient hyperparameter feature information may be the coefficient hyperparameter feature coefficient value C.
  • the control parameter encoding unit may encode the first enabling information of the first quantization unit in the code stream 0, and the first enabling information is used to indicate that the first quantization unit is not enabled.
  • Step S75 Encode the coefficient hyperparameter feature information (such as the coefficient hyperparameter feature coefficient quantized value C_q or the coefficient hyperparameter feature coefficient value C) to obtain code stream 1.
  • the first feature encoding unit may encode the coefficient hyperparameter feature information in the code stream corresponding to the current block.
  • the code stream including the coefficient hyperparameter feature information is recorded as code stream 1.
  • Step S76 Decode the code stream 1 corresponding to the current block, obtain the coefficient hyperparameter feature information (such as the coefficient hyperparameter feature coefficient quantized value C_q or the coefficient hyperparameter feature coefficient value C), and determine the coefficient hyperparameter based on the coefficient hyperparameter feature information Characteristic coefficient reconstruction value.
  • the first feature decoding unit may decode code stream 1 corresponding to the current block to obtain coefficient hyperparameter feature information.
  • the coefficient hyperparameter feature information is the coefficient hyperparameter feature coefficient quantized value C_q.
  • the first inverse quantization unit can inversely quantize the coefficient hyperparameter feature coefficient quantized value C_q to obtain the coefficient hyperparameter feature coefficient reconstruction value C'.
  • the coefficient hyperparameter characteristic coefficient reconstruction value C' and the coefficient hyperparameter characteristic coefficient value C can be the same.
  • the coefficient hyperparameter feature information is the coefficient hyperparameter feature coefficient value C.
  • the coefficient hyperparameter feature coefficient value C can be used as the coefficient hyperparameter feature coefficient reconstruction value C', that is, the coefficient hyperparameter feature coefficient reconstruction value C' is obtained directly.
  • when the coefficient hyperparameter feature coefficient reconstruction value C' is the same as the coefficient hyperparameter feature coefficient value C, the process of step S76 can also be omitted, and the coefficient hyperparameter feature coefficient value C is used directly as the coefficient hyperparameter feature coefficient reconstruction value C'. In this case, the encoding end structure shown in Figure 7A can be simplified to the encoding end structure shown in Figure 7B.
  • Step S77 Perform an inverse transformation operation on the coefficient hyperparameter characteristic coefficient reconstruction value C' to obtain the coefficient hyperparameter characteristic value P.
  • the coefficient hyperparameter feature generation unit performs an inverse transformation operation on the coefficient hyperparameter feature coefficient reconstruction value C’ to obtain the coefficient hyperparameter feature value P.
  • Step S78 Encode the image feature information (such as the image feature quantized value F_q or the image feature value F) based on the coefficient hyperparameter feature value P to obtain code stream 2.
  • the second feature encoding unit may encode image feature information in the code stream corresponding to the current block.
  • the code stream including the image feature information is recorded as code stream 2.
  • Step S79 The control parameter encoding unit obtains the control parameters corresponding to the current block, which may include neural network information, and encodes the control parameters corresponding to the current block in the code stream, and records the code stream including the control parameters as code stream 0.
  • the feature transformation unit can use a coding neural network to perform feature transformation on the current block I
  • the control parameter coding unit can determine, based on the network structure of the coding neural network, the neural network information 2 corresponding to the image feature inverse transformation unit of the decoding end device. This neural network information 2 is used to determine the decoding neural network 2 corresponding to the image feature inverse transformation unit, and the neural network information 2 corresponding to the image feature inverse transformation unit is encoded in code stream 0.
  • the coefficient hyperparameter feature transformation unit can use a coding neural network to perform coefficient hyperparameter feature transformation on the image feature value F, and the control parameter coding unit can determine, based on the network structure of the coding neural network, the neural network information 1 corresponding to the coefficient hyperparameter feature generation unit of the decoding end device. This neural network information 1 is used to determine the decoding neural network 1 corresponding to the coefficient hyperparameter feature generation unit, and the neural network information 1 corresponding to the coefficient hyperparameter feature generation unit is encoded in code stream 0.
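  • A compact sketch of steps S71-S79 is given below. The units object and its method names are placeholders used only to show the data flow between the units of Figure 7A; they are not an API defined by this disclosure.

```python
def encode_current_block(block_I, units, quantization_enabled=True):
    """Data flow of steps S71-S79 for one block (placeholder unit methods)."""
    F = units.feature_transform(block_I)                        # S71: image feature value F
    F_q = units.quantize2(F) if quantization_enabled else F     # S72: image feature information
    C = units.hyper_transform(F)                                # S73: coefficient hyperparameter feature coefficient value C
    C_q = units.quantize1(C) if quantization_enabled else C     # S74: coefficient hyperparameter feature information
    stream1 = units.encode_features1(C_q)                       # S75: code stream 1
    C_rec = units.dequantize1(units.decode_features1(stream1))  # S76: reconstruction value C'
    P = units.hyper_generate(C_rec)                             # S77: coefficient hyperparameter feature value P
    stream2 = units.encode_features2(F_q, P)                    # S78: code stream 2 (encoded with the help of P)
    stream0 = units.encode_control_parameters()                 # S79: control parameters / neural network information
    return stream0, stream1, stream2
```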
  • steps S80-S86 may also be included.
  • Step S80 Decode the code stream 1 corresponding to the current block and obtain the coefficient hyperparameter feature information corresponding to the current block.
  • the first feature decoding unit can decode the code stream 1 corresponding to the current block and obtain the coefficient hyperparameter feature information corresponding to the current block.
  • Step S81 Determine the coefficient hyperparameter characteristic coefficient reconstruction value based on the coefficient hyperparameter characteristic information.
  • the coefficient hyperparameter characteristic information is the coefficient hyperparameter characteristic coefficient quantized value C_q.
  • the first inverse quantization unit can inversely quantize the coefficient hyperparameter feature coefficient quantized value C_q to obtain the coefficient hyperparameter feature coefficient reconstruction value C'.
  • the coefficient hyperparameter feature information is the coefficient hyperparameter feature coefficient value C.
  • the coefficient hyperparameter feature coefficient value C can be used as the coefficient hyperparameter feature coefficient reconstruction value C' .
  • Step S82 Perform an inverse transformation operation on the coefficient hyperparameter characteristic coefficient reconstruction value C' to obtain the coefficient hyperparameter characteristic value P.
  • the coefficient hyperparameter feature generation unit performs an inverse transformation operation on the coefficient hyperparameter feature coefficient reconstruction value C' to obtain the coefficient hyperparameter feature value P.
  • for example, the coefficient hyperparameter feature generation unit performs an inverse transformation operation on the coefficient hyperparameter feature coefficient reconstruction value C' based on the decoding neural network to obtain the coefficient hyperparameter feature value P.
  • Step S83 Decode the code stream 2 corresponding to the current block and obtain the image feature information corresponding to the current block.
  • the second feature decoding unit can decode the code stream 2 corresponding to the current block and obtain the image feature information corresponding to the current block.
  • the second feature decoding unit may use the coefficient hyperparameter feature value P to decode the code stream 2 corresponding to the current block.
  • Step S84 Determine the image feature reconstruction value based on the image feature information.
  • the image feature information is the image feature quantized value F_q
  • the second inverse quantization unit can inversely quantize the image feature quantized value F_q to obtain the image feature reconstruction value F'.
  • the image feature information is the image feature value F, and the image feature value F can be used as the image feature reconstruction value F'.
  • Step S85 Perform an inverse transformation operation on the image feature reconstruction value F' to obtain the image low-order feature value LF.
  • the image feature inverse transformation unit performs an inverse transformation operation on the image feature reconstruction value F' to obtain the image low-order feature value LF.
  • for example, the image feature inverse transformation unit performs the inverse transformation operation on the image feature reconstruction value F' based on a decoding neural network to obtain the image low-order feature value LF.
  • Step S86 Determine the reconstructed image block I corresponding to the current block based on the image low-order feature value LF. For example, if the quality enhancement unit is enabled, the quality enhancement unit performs enhancement processing on the image low-order feature value LF to obtain the reconstructed image block I corresponding to the current block; for example, the image low-order feature value LF is enhanced based on the decoding neural network to obtain the reconstructed image block I corresponding to the current block. If the quality enhancement unit is not enabled, the image low-order feature value LF is used as the reconstructed image block I.
  • for steps S80 to S86, please refer to Embodiment 5; details will not be repeated here. A sketch of the data flow follows.
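  • A corresponding sketch of steps S80-S86 (the decoding loop run inside the encoder) is shown below; as before, the units object and its method names are placeholders for the units of Figure 7A, not an API defined by this disclosure.

```python
def reconstruct_current_block(stream1, stream2, units, quality_enhancement_enabled=True):
    """Data flow of steps S80-S86 for one block (placeholder unit methods)."""
    C_q = units.decode_features1(stream1)        # S80: coefficient hyperparameter feature information
    C_rec = units.dequantize1(C_q)               # S81: reconstruction value C'
    P = units.hyper_generate(C_rec)              # S82: coefficient hyperparameter feature value P
    F_q = units.decode_features2(stream2, P)     # S83: image feature information
    F_rec = units.dequantize2(F_q)               # S84: image feature reconstruction value F'
    LF = units.feature_inverse_transform(F_rec)  # S85: image low-order feature value LF
    if quality_enhancement_enabled:              # S86: reconstructed image block I
        return units.quality_enhance(LF)
    return LF
```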
  • the coefficient hyperparameter feature generation unit can use a decoding neural network to perform an inverse transformation operation on the coefficient hyperparameter feature coefficient reconstruction value C' to obtain the coefficient hyperparameter feature value P.
  • the control parameter encoding unit can determine, based on the network structure of this decoding neural network, the neural network information 1 corresponding to the coefficient hyperparameter feature generation unit of the decoding end device. The neural network information 1 is used to determine the decoding neural network 1 corresponding to the coefficient hyperparameter feature generation unit, and the neural network information 1 is encoded in code stream 0.
  • the image feature inverse transformation unit can use a decoding neural network to perform an inverse transformation operation on the image feature reconstruction value F' to obtain the image low-order feature value LF.
  • the control parameter encoding unit can determine, based on the network structure of this decoding neural network, the neural network information 2 corresponding to the image feature inverse transformation unit of the decoding end device. The neural network information 2 is used to determine the decoding neural network 2 corresponding to the image feature inverse transformation unit, and the neural network information 2 is encoded in code stream 0.
  • the quality enhancement unit can use a decoding neural network to enhance the low-order feature value LF of the image to obtain the reconstructed image block I corresponding to the current block.
  • the control parameter encoding unit can determine, based on the network structure of this decoding neural network, the neural network information 3 corresponding to the quality enhancement unit of the decoding end device. The neural network information 3 is used to determine the decoding neural network 3 corresponding to the quality enhancement unit, and the neural network information 3 is encoded in code stream 0.
  • Embodiment 15 This embodiment of the present disclosure proposes an image coding method based on neural networks. This method can be applied to the encoding end. See Figure 7C, which is a schematic structural diagram of the encoding end.
  • the encoding end can include a control parameter encoding unit, a feature transformation unit, a coefficient hyperparameter feature transformation unit, a first feature coding unit, a second feature coding unit, a first feature decoding unit, a second feature decoding unit, a coefficient hyperparameter feature generation unit, an image feature inverse transformation unit, and a quality enhancement unit.
  • the image coding method based on neural network in this embodiment may include the following steps S91-S99.
  • Step S91 Perform feature transformation on the current block I to obtain the image feature value F corresponding to the current block I.
  • the feature transformation unit can perform feature transformation on the current block I based on the coding neural network to obtain the image feature value F corresponding to the current block I.
  • Step S92 Perform coefficient hyperparameter feature transformation on the image feature value F to obtain the coefficient hyperparameter feature coefficient value C.
  • the coefficient hyperparameter feature transformation unit performs coefficient hyperparameter feature transformation on the image feature value F based on the coding neural network, and obtains the coefficient hyperparameter feature coefficient value C.
  • Step S93 Encode the coefficient hyperparameter characteristic coefficient value C to obtain code stream 1.
  • the first feature encoding unit can encode the coefficient hyperparameter feature coefficient value C in the code stream corresponding to the current block to obtain code stream 1.
  • Step S94 Determine the coefficient hyperparameter feature coefficient reconstruction value C' based on the coefficient hyperparameter feature coefficient value C, where the coefficient hyperparameter feature coefficient reconstruction value C' is the same as the coefficient hyperparameter feature coefficient value C, and perform an inverse transformation operation on the coefficient hyperparameter feature coefficient reconstruction value C' to obtain the coefficient hyperparameter feature value P. For example, the coefficient hyperparameter feature generation unit performs an inverse transformation operation on the coefficient hyperparameter feature coefficient reconstruction value C' to obtain the coefficient hyperparameter feature value P.
  • Step S95 Encode the image feature value F based on the coefficient hyperparameter feature value P to obtain code stream 2.
  • the second feature encoding unit encodes the image feature value F in the code stream corresponding to the current block to obtain code stream 2.
  • Step S96 The control parameter encoding unit obtains the control parameters corresponding to the current block, which may include neural network information, and encodes the control parameters corresponding to the current block in the code stream, and records the code stream including the control parameters as code stream 0.
  • Step S97 Decode the code stream 1 corresponding to the current block, and obtain the coefficient hyperparameter characteristic coefficient value C corresponding to the current block.
  • the first feature decoding unit can decode the code stream 1 corresponding to the current block and obtain the coefficient hyperparameter feature coefficient value C corresponding to the current block.
  • Step S98 Perform an inverse transformation operation on the coefficient hyperparameter characteristic coefficient reconstruction value C' to obtain the coefficient hyperparameter characteristic value P.
  • the coefficient hyperparameter feature generation unit performs an inverse transformation operation on the coefficient hyperparameter feature coefficient reconstruction value C’ based on the decoding neural network to obtain the coefficient hyperparameter feature value P.
  • Decode the code stream 2 corresponding to the current block to obtain the image feature value F corresponding to the current block, and use the image feature value F as the image feature reconstruction value F'.
  • for example, the second feature decoding unit uses the coefficient hyperparameter feature value P to decode the code stream 2 corresponding to the current block and obtain the image feature value F.
  • Step S99 Perform an inverse transformation operation on the image feature reconstruction value F' to obtain the image low-order feature value LF.
  • for example, the image feature inverse transformation unit performs an inverse transformation operation on the image feature reconstruction value F' based on the decoding neural network to obtain the image low-order feature value LF.
  • the reconstructed image block I corresponding to the current block is determined based on the image low-order feature value LF.
  • the quality enhancement unit enhances the low-order feature value LF of the image based on the decoding neural network to obtain the reconstructed image block I corresponding to the current block.
  • the first feature encoding unit may encode the coefficient hyperparameter feature information in the code stream 1 corresponding to the current block.
  • the encoding process of the first feature encoding unit corresponds to the decoding process of the first feature decoding unit.
  • the first feature encoding unit may use an entropy coding method (such as CAVLC or CABAC) to encode the coefficient hyperparameter feature information, which will not be repeated here.
  • the first feature decoding unit can decode the code stream 1 corresponding to the current block and obtain the coefficient hyperparameter feature information corresponding to the current block.
  • the second feature encoding unit may encode the image feature information based on the coefficient hyperparameter feature value P to obtain code stream 2.
  • the encoding process of the second feature encoding unit corresponds to the decoding process of the second feature decoding unit; see Embodiment 10.
  • the second feature encoding unit may use an entropy coding method (such as CAVLC or CABAC) to encode the image feature information, which will not be repeated here.
  • the second feature decoding unit can decode the code stream 2 corresponding to the current block and obtain the image feature information corresponding to the current block.
  • for details, please refer to Embodiment 10.
  • the first quantization unit may quantize the coefficient hyperparameter characteristic coefficient value C to obtain the coefficient hyperparameter characteristic coefficient quantized value C_q.
  • the quantization process of the first quantization unit is the inverse of the inverse quantization process of the first inverse quantization unit; for the inverse quantization process, see Embodiment 8.
  • the first quantization unit quantizes the coefficient hyperparameter characteristic coefficient value C based on the quantization related parameters, which will not be repeated here.
  • It should be noted that, for the quantization step size: 1) each feature value of each feature channel adopts the same quantization step size; 2) each feature channel adopts a different quantization step size, but each feature value within a feature channel adopts the same quantization step size; 3) each feature value of each feature channel adopts a different quantization step size.
  • the first inverse quantization unit at the encoding end can inversely quantize the coefficient hyperparameter characteristic coefficient quantized value C_q to obtain the coefficient hyperparameter characteristic coefficient reconstruction value C'.
  • for the processing process of the first inverse quantization unit at the encoding end, please refer to Embodiment 10.
  • the second quantization unit can quantize the image feature value F to obtain the image feature quantization value F_q.
  • the quantization process of the second quantization unit corresponds to the inverse quantization process of the second inverse quantization unit. See Embodiment 11.
  • the second quantization unit quantizes the image feature value F based on the quantization related parameters, which will not be repeated here. It should be noted that, for the quantization step size: 1) each feature value of each feature channel adopts the same quantization step size; 2) each feature channel adopts a different quantization step size, but each feature value within a feature channel adopts the same quantization step size; 3) each feature value of each feature channel adopts a different quantization step size.
  • the second inverse quantization unit at the encoding end can inversely quantize the image feature quantized value F_q to obtain the image feature reconstruction value F'.
  • for the processing process of the second inverse quantization unit at the encoding end, please refer to Embodiment 11.
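  • The three quantization step-size granularities described above can be illustrated with broadcasting, as in the sketch below (rounding quantization is assumed, and the step values are arbitrary examples):

```python
import torch

def quantize_with_step(feature: torch.Tensor, qstep) -> torch.Tensor:
    """qstep may be a scalar, a per-channel tensor broadcast over H and W, or a per-element tensor."""
    return torch.round(feature / qstep)

C = torch.randn(1, 32, 8, 8)                  # coefficient hyperparameter feature coefficient value C
same_step = 0.5                               # 1) one step for every value of every channel
per_channel = torch.full((1, 32, 1, 1), 0.5)  # 2) one step per channel, shared within the channel
per_element = torch.full_like(C, 0.5)         # 3) a separate step for every feature value
C_q = quantize_with_step(C, per_channel)
C_rec = C_q * per_channel                     # inverse quantization back to the reconstruction value C'
```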
  • Embodiment 17 For Embodiments 14 and 15, the feature transformation unit can perform feature transformation on the current block I based on the coding neural network to obtain the image feature value F corresponding to the current block I.
  • the neural network information 2 corresponding to the image feature inverse transformation unit of the decoding end device can be determined based on the network structure of the encoding neural network, and the neural network information 2 corresponding to the image feature inverse transformation unit is encoded in code stream 0.
  • the decoding end device generates the decoding neural network 2 corresponding to the image feature inverse transformation unit based on the neural network information 2; please refer to Embodiment 12, and the details will not be repeated here.
  • the feature transformation unit may include a coding neural network 2.
  • the coding neural network 2 may include a base layer and an enhancement layer.
  • the coding neural network 2 may be divided into a base layer and an enhancement layer.
  • the base layer may include at least one network layer, or the base layer may not include a network layer, that is, the base layer is empty.
  • the enhancement layer may include at least one network layer, or the enhancement layer may not include a network layer, that is, the enhancement layer may be empty.
  • for the multiple network layers in the coding neural network 2, the multiple network layers can be divided into a base layer and an enhancement layer according to actual needs. For example, network layers with a fixed network structure can be used as the base layer, and network layers with an unfixed network structure can be used as the enhancement layer.
  • the size of the output feature may be smaller than the size of the input feature, or the size of the output feature may be equal to the size of the input feature, or the size of the output feature may be larger than the size of the input feature.
  • the base layer and the enhancement layer include at least one convolutional layer.
  • the base layer may include at least one convolutional layer
  • the enhancement layer may include at least one convolutional layer, or may not include a convolutional layer.
  • the enhancement layer may include at least one convolutional layer
  • the base layer may include at least one convolutional layer, or may not include a convolutional layer.
  • the base layer and enhancement layer include at least one residual structure layer.
  • the base layer includes at least one residual structure layer, and the enhancement layer may include a residual structure layer or not.
  • the enhancement layer includes at least one residual structure layer, and the base layer may include a residual structure layer or not.
  • the coding neural network 2 may include, but is not limited to, convolutional layers, activation layers, and the like.
  • the coding neural network 2 includes a convolutional layer with a stride of 2, an activation layer, a convolutional layer with a stride of 1, and an activation layer in sequence, and all the above network layers are used as basic layers.
  • the coding neural network 2 sequentially includes a convolutional layer with a stride of 2 and a convolutional layer with a stride of 1, and all the above network layers are used as basic layers.
  • the coding neural network 2 sequentially includes a convolutional layer with a stride of 2, an activation layer, a convolutional layer with a stride of 2, an activation layer, a convolutional layer with a stride of 1, and an activation layer, and all the above network layers are used as basic layers.
  • the number of input channels of the first network layer of the feature transformation unit is 1 or 3. If the input image block contains only one channel (such as a grayscale image), the number of input channels of the first network layer of the feature transformation unit is 1; if the input image block contains three channels (such as RGB or YUV format), the number of input channels of the first network layer of the feature transformation unit is 3. A sketch of one such layout follows.
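  • A minimal PyTorch sketch of the last layout above (convolution with stride 2, activation, convolution with stride 2, activation, convolution with stride 1, activation); the number of filters, kernel sizes, and the ReLU activation are illustrative assumptions not fixed by this disclosure. Such a stride-2 convolutional stack mirrors the stride-2 deconvolutional stack of the decoding neural network 2, which is what the symmetry described below refers to.

```python
import torch
import torch.nn as nn

FEAT_CH = 64  # hypothetical number of filters

def build_coding_network_2(in_channels: int = 3) -> nn.Sequential:
    """One possible layout of coding neural network 2 of the feature transformation unit,
    with all layers used as the basic layer and an empty enhancement layer."""
    return nn.Sequential(
        nn.Conv2d(in_channels, FEAT_CH, kernel_size=4, stride=2, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(FEAT_CH, FEAT_CH, kernel_size=4, stride=2, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(FEAT_CH, FEAT_CH, kernel_size=3, stride=1, padding=1),
        nn.ReLU(inplace=True),
    )

block_I = torch.randn(1, 3, 64, 64)      # current block I (3 input channels for RGB/YUV, 1 for grayscale)
F = build_coding_network_2(3)(block_I)   # image feature value F
```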
  • the network structure of the encoding neural network 2 of the feature transformation unit can be symmetrical to the network structure of the decoding neural network 2 of the image feature inverse transformation unit (see Embodiment 12), and the network parameters of the encoding neural network 2 of the feature transformation unit may be the same as or different from the network parameters of the decoding neural network 2 of the image feature inverse transformation unit.
  • the feature transformation unit can be configured with a network layer of a default network structure, and network parameters related to the network layer of the default network structure are all fixed.
  • the number of deconvolution layers is fixed
  • the number of activation layers is fixed
  • the number of channels of each deconvolution layer is fixed
  • the size of the convolution kernel is fixed
  • the filter coefficients are fixed.
  • the network layer of the default network structure configured for the feature transformation unit and the network layer of the default network structure configured for the image feature inverse transformation unit may have symmetrical structures.
  • a prefabricated neural network pool (corresponding to the prefabricated neural network pool of the decoding end device) can be configured for the feature transformation unit.
  • the prefabricated neural network pool includes at least one network layer of a prefabricated network structure, and the network parameters related to the network layer of the prefabricated network structure can be configured according to actual needs.
  • the prefabricated neural network pool may include the network layer of the prefabricated network structure t1', the network layer of the prefabricated network structure t2', the network layer of the prefabricated network structure t3', etc.
  • the network layer of the prefabricated network structure t1' and the network layer of the prefabricated network structure t1 may be symmetrical structures.
  • the network layer of the prefabricated network structure t2' and the network layer of the prefabricated network structure t2 can be symmetrical structures.
  • the network layer of the prefabricated network structure t3' and the network layer of the prefabricated network structure t3 can be symmetrical structures.
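  • As a sketch of how symmetric prefabricated neural network pools might be held on the two sides (the concrete layer choices below, e.g. 3x3 and 5x5 stride-2 layers, are illustrative assumptions; the disclosure only requires that entry t1' mirrors t1, t2' mirrors t2, and so on):

```python
import torch.nn as nn

# Encoder-side pool: network layers of prefabricated structures t1', t2', ...
ENCODER_PREFAB_POOL = {
    0: nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1),   # t1'
    1: nn.Conv2d(64, 64, kernel_size=5, stride=2, padding=2),   # t2'
}
# Decoder-side pool: the symmetric network layers t1, t2, ...
DECODER_PREFAB_POOL = {
    0: nn.ConvTranspose2d(64, 64, kernel_size=3, stride=2, padding=1, output_padding=1),  # t1
    1: nn.ConvTranspose2d(64, 64, kernel_size=5, stride=2, padding=2, output_padding=1),  # t2
}

def select_prefab_layer(pool: dict, index: int) -> nn.Module:
    """The signalled prefabricated network index number selects one entry from a pool."""
    return pool[index]
```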
  • a network layer with a variable network structure can be dynamically generated for the feature transformation unit based on network parameters, and the network parameters related to the network layer with a variable network structure are dynamically generated by the encoding end rather than preconfigured.
  • the encoding end can determine the neural network information 2 corresponding to the image feature inverse transformation unit based on the network structure of the encoding neural network 2, and encode the neural network information 2 corresponding to the image feature inverse transformation unit in code stream 0. For example, the encoding end can determine the neural network information 2 in any of the following situations:
  • the encoding end encodes the basic layer information and the enhancement layer information in code stream 0, and the basic layer information and the enhancement layer information constitute the neural network information 2 corresponding to the image feature inverse transformation unit.
  • the base layer information includes the base layer uses default network flag, and the base layer uses default network flag indicates that the base layer uses the default network.
  • the enhancement layer information includes the enhancement layer uses default network flag, and the enhancement layer uses default network flag indicates that the enhancement layer uses the default network.
  • the encoding end encodes the basic layer information and the enhancement layer information in code stream 0, and the basic layer information and the enhancement layer information constitute the neural network information 2 corresponding to the image feature inverse transformation unit.
  • the base layer information includes the base layer uses default network flag, and the base layer uses default network flag indicates that the base layer uses the default network.
  • the enhancement layer information includes the enhancement layer using prefabricated network flag and the enhancement layer prefabricated network index number.
  • the enhancement layer using prefabricated network flag indicates that the enhancement layer uses a prefabricated network
  • the enhancement layer prefabricated network index number indicates the corresponding index of the network layer of the prefabricated network structure in the prefabricated neural network pool. For example, if the enhancement layer of the coding neural network 2 adopts the network layer of the prefabricated network structure t1', then the enhancement layer prefabricated network index number indicates the network layer of the first prefabricated network structure t1' in the prefabricated neural network pool.
  • the encoding end encodes the basic layer information and the enhancement layer information in code stream 0, and the basic layer information and the enhancement layer information constitute the neural network information 2 corresponding to the image feature inverse transformation unit.
  • the base layer information includes the base layer uses default network flag, and the base layer uses default network flag indicates that the base layer uses the default network.
  • the enhancement layer information includes network parameters used to generate the enhancement layer.
  • the network parameters may include but are not limited to at least one of the following: the number of neural network layers, the deconvolution layer flag, the number of deconvolution layers, the quantization step size of each deconvolution layer, the number of channels of each deconvolution layer, the size of the convolution kernel, the number of filters, the filter size index, the filter coefficient zero flag, the filter coefficients, the activation layer flag, and the activation layer type.
  • the network parameters in the enhancement layer information can be different from the network parameters used in the enhancement layer of the coding neural network 2. That is to say, for each network parameter used in the enhancement layer, a network parameter symmetrical to it can be generated (there is no restriction on this generation process), and this symmetrical network parameter is transmitted to the decoding end as the enhancement layer information.
  • the basic layer information and the enhancement layer information are encoded in code stream 0, and the basic layer information and the enhancement layer information constitute the neural network information 2 corresponding to the image feature inverse transformation unit.
  • the basic layer information includes the basic layer using prefabricated network flag and the basic layer prefabricated network index number.
  • the basic layer using prefabricated network flag indicates that the basic layer uses a prefabricated network.
  • the basic layer prefabricated network index number indicates the corresponding index of the network layer of the prefabricated network structure in the prefabricated neural network pool.
  • the enhancement layer information includes the enhancement layer use default network flag, and the enhancement layer use default network flag indicates that the enhancement layer uses the default network.
  • if the basic layer of the coding neural network 2 adopts the network layer of a prefabricated network structure, and the enhancement layer adopts the network layer of a prefabricated network structure, then the encoding end encodes the basic layer information and the enhancement layer information in code stream 0.
  • the basic layer information and the enhancement layer information constitute the neural network information 2 corresponding to the image feature inverse transformation unit.
  • the basic layer information may include a basic layer using prefabricated network flag bit and a basic layer prefabricated network index number.
  • the basic layer using prefabricated network flag bit is used to indicate that the basic layer uses a prefabricated network.
  • the basic layer prefabricated network index number is used to indicate the corresponding index of the network layer of the prefabricated network structure in the prefabricated neural network pool.
  • the enhancement layer information may include a prefabricated network flag bit for the enhancement layer and a prefabricated network index number for the enhancement layer, and the prefabricated network flag bit for the enhancement layer is used to indicate that the enhancement layer uses a prefabricated network, and the prefabricated network index number for the enhancement layer is used to indicate The corresponding index of the network layer of the prefabricated network structure in the prefabricated neural network pool.
  • the basic layer information and the enhancement layer information are encoded in code stream 0, and the basic layer information and the enhancement layer information constitute the neural network information 2 corresponding to the image feature inverse transformation unit.
  • the basic layer information includes the basic layer using prefabricated network flag and the basic layer prefabricated network index number.
  • the basic layer using prefabricated network flag indicates that the basic layer uses a prefabricated network.
  • the basic layer prefabricated network index number indicates the corresponding index of the network layer of the prefabricated network structure in the prefabricated neural network pool.
  • the enhancement layer information includes network parameters used to generate the enhancement layer.
  • the network parameters may include but are not limited to at least one of the following: the number of neural network layers, the deconvolution layer flag, the number of deconvolution layers, the quantization step size of each deconvolution layer, the number of channels of each deconvolution layer, the size of the convolution kernel, the number of filters, the filter size index, the filter coefficient zero flag, the filter coefficients, the activation layer flag, and the activation layer type.
  • the network parameters in the enhancement layer information can be different from the network parameters used in the enhancement layer of the coding neural network 2. That is to say, for each network parameter used in the enhancement layer, a network parameter symmetrical to it can be generated (there is no restriction on this generation process), and this symmetrical network parameter is transmitted to the decoding end as the enhancement layer information. A signalling sketch follows.
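  • The signalling situations above boil down to choosing, per layer, between a default-network flag, a prefabricated-network flag plus index, or transmitted network parameters. The sketch below illustrates this; the field names are illustrative assumptions and do not reproduce the actual syntax elements (which follow the style of Tables 1 to 3).

```python
def write_nn_info_2(base_layer_cfg: dict, enh_layer_cfg: dict) -> dict:
    """Assemble neural network information 2 for code stream 0 (illustrative field names)."""
    info = {"base_layer_info": {}, "enhancement_layer_info": {}}
    for cfg, key in ((base_layer_cfg, "base_layer_info"),
                     (enh_layer_cfg, "enhancement_layer_info")):
        layer_info = info[key]
        if cfg["mode"] == "default":
            layer_info["use_default_flag"] = 1            # layer uses the default network
        elif cfg["mode"] == "prefab":
            layer_info["use_prefab_flag"] = 1             # layer uses a prefabricated network
            layer_info["prefab_index"] = cfg["index"]     # index into the prefabricated neural network pool
        else:
            layer_info["network_params"] = cfg["params"]  # variable structure: transmit generating parameters
    return info

# Example: base layer uses the default network, enhancement layer uses prefabricated entry t1' (index 0).
nn_info_2 = write_nn_info_2({"mode": "default"}, {"mode": "prefab", "index": 0})
```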
  • Embodiment 18 Regarding Embodiments 14 and 15, the image feature inverse transformation unit at the encoding end can perform an inverse transformation operation on the image feature reconstruction value F' based on the decoding neural network to obtain the image low-order feature value LF.
  • the neural network information 2 corresponding to the image feature inverse transformation unit of the decoding end device can also be determined based on the network structure of the decoding neural network, and the neural network information 2 corresponding to the image feature inverse transformation unit is encoded in the code stream 0 .
  • the image feature inverse transformation unit at the encoding end may include a decoding neural network 2.
  • the decoding neural network 2 may include a basic layer and an enhancement layer.
  • the network structure of the decoding neural network 2 at the encoding end is the same as the network structure of the decoding neural network 2 at the decoding end. Please refer to Embodiment 12, which will not be described again here.
  • the encoding end can configure a network layer of a default network structure for the image feature inverse transformation unit.
  • the network layer of this default network structure is the same as the network layer of the default network structure of the decoding end.
  • the encoding end can configure a prefabricated neural network pool for the image feature inverse transformation unit.
  • the prefabricated neural network pool can include at least one network layer of a prefabricated network structure. This prefabricated neural network pool is the same as the prefabricated neural network pool at the decoding end.
  • the encoding end can dynamically generate a network layer with a variable network structure for the image feature inverse transformation unit based on network parameters.
  • the encoding end can determine the neural network information 2 corresponding to the image feature inverse transformation unit based on the network structure of the decoding neural network 2, and encode the neural network information 2 in the code stream 0.
  • the base layer information includes the base layer uses the default network flag, and the base layer uses the default network flag bit indicates that the base layer uses the default network.
  • the enhancement layer information includes the enhancement layer uses default network flag, and the enhancement layer uses default network flag indicates that the enhancement layer uses the default network.
  • the basic layer information includes the basic layer using the default network flag, and the basic layer using the default network flag bit indicates that the base layer uses the default network.
  • the enhancement layer information includes the enhancement layer using prefabricated network flag and the enhancement layer prefabricated network index number.
  • the enhancement layer using prefabricated network flag indicates that the enhancement layer uses a prefabricated network
  • the enhancement layer prefabricated network index number indicates the corresponding index of the network layer of the prefabricated network structure in the prefabricated neural network pool.
  • the basic layer information includes the base layer uses default network flag, and the base layer uses default network flag indicates that the base layer uses the default network.
  • the enhancement layer information includes network parameters used to generate the enhancement layer.
  • the network parameters in the enhancement layer information may be the same as the network parameters used in the enhancement layer of the decoding neural network 2 .
  • the basic layer information includes the basic layer use prefabricated network flag bit and the basic layer prefabricated network index number.
  • the basic layer uses prefabricated network flag bit indicates that the basic layer uses a prefabricated network
  • the base layer prefabricated network index number indicates the corresponding index of the network layer of the prefabricated network structure in the prefabricated neural network pool.
  • the enhancement layer information includes the enhancement layer use default network flag, and the enhancement layer use default network flag indicates that the enhancement layer uses the default network.
  • the basic layer information includes the basic layer using prefabricated network flag bit and the basic layer prefabricated network index number.
  • the basic layer uses prefabricated network flag bit indicates that the basic layer uses a prefabricated network
  • the base layer prefabricated network index number indicates the corresponding index of the network layer of the prefabricated network structure in the prefabricated neural network pool.
  • the enhancement layer information includes the enhancement layer using prefabricated network flag and the enhancement layer prefabricated network index number.
  • the enhancement layer using prefabricated network flag indicates that the enhancement layer uses a prefabricated network.
  • the enhancement layer prefabricated network index number indicates the corresponding index of the network layer of the prefabricated network structure in the prefabricated neural network pool.
  • for another example, if the basic layer of the decoding neural network 2 adopts a network layer of a prefabricated network structure, and the enhancement layer adopts an enhancement layer of a variable network structure, then the basic layer information includes the basic layer using prefabricated network flag and the basic layer prefabricated network index number; the basic layer using prefabricated network flag indicates that the basic layer uses a prefabricated network, and the basic layer prefabricated network index number indicates the corresponding index of the network layer of the prefabricated network structure in the prefabricated neural network pool.
  • the enhancement layer information includes network parameters, and the network parameters in the enhancement layer information are the same as the network parameters used in the enhancement layer of the decoding neural network 2.
  • Embodiment 19 For Embodiments 14 and 15, the coefficient hyperparameter feature transformation unit can perform coefficient hyperparameter feature transformation on the image feature value F based on the coding neural network to obtain the coefficient hyperparameter feature coefficient value C.
  • the neural network information 1 corresponding to the coefficient hyperparameter feature generation unit of the decoding end device can be determined based on the network structure of the encoding neural network, and the neural network information 1 corresponding to the coefficient hyperparameter feature generation unit can be encoded in code stream 0.
  • the decoding end device generates the decoding neural network 1 corresponding to the coefficient hyperparameter feature generation unit based on the neural network information 1, please refer to Embodiment 9, and the details will not be repeated here.
  • the coefficient hyperparameter feature transformation unit may include a coding neural network 1.
  • the coding neural network 1 may include a basic layer and an enhancement layer.
  • the coding neural network 1 may be divided into a basic layer and an enhancement layer.
  • the basic layer may include at least one network layer, or the basic layer may not include a network layer, that is, the basic layer is empty.
  • the enhancement layer may include at least one network layer, or the enhancement layer may not include a network layer, that is, the enhancement layer may be empty.
  • the network structure of the encoding neural network 1 of the coefficient hyperparameter feature transformation unit and the network structure of the decoding neural network 1 of the coefficient hyperparameter feature generation unit can be symmetrical, and the network parameters of the encoding neural network 1 of the coefficient hyperparameter feature transformation unit may be the same as or different from the network parameters of the decoding neural network 1 of the coefficient hyperparameter feature generation unit.
  • the network structure of the encoding neural network 1 will not be described again.
  • the network layer of the default network structure can be configured for the coefficient hyperparameter feature transformation unit, and the network parameters related to the network layer of the default network structure are all fixed.
  • the network layer of the default network structure configured for the coefficient hyperparameter feature transformation unit and the network layer of the default network structure configured for the coefficient hyperparameter feature generation unit can be symmetrical structures.
  • a prefabricated neural network pool (corresponding to the prefabricated neural network pool of the decoding end device) can be configured for the coefficient hyperparameter feature transformation unit.
  • the prefabricated neural network pool includes at least one network layer of a prefabricated network structure, and the network parameters related to the network layer of the prefabricated network structure can be configured according to actual needs.
  • a network layer with a variable network structure can be dynamically generated for the coefficient hyperparameter feature transformation unit based on network parameters. Network parameters related to the network layer with a variable network structure are dynamically generated by the encoding end.
  • the encoding end can determine the neural network information 1 corresponding to the coefficient hyperparameter feature generation unit based on the network structure of the encoding neural network 1, and encode the neural network information 1 in the code stream 0.
  • the neural network information 1 may include basic layer information and enhancement layer information.
  • the base layer information includes the base layer uses default network flag bit, and this flag bit indicates that the base layer uses the default network.
  • the enhancement layer information includes the enhancement layer uses default network flag bit, and this flag bit indicates that the enhancement layer uses the default network.
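  • As an illustrative, non-normative sketch of how such neural network information could be organised, the following Python snippet represents each layer description as either a default-network flag, a prefabricated-network flag plus pool index, or explicit network parameters, and serialises it into a simple symbol list. The field names and the flag-then-index layout are assumptions made for illustration only, not the syntax defined by this disclosure.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LayerInfo:
    # Exactly one of the three options below is expected to be used.
    use_default_network: bool = False      # "uses default network" flag bit
    prefab_index: Optional[int] = None     # index into the prefabricated neural network pool
    network_params: Optional[dict] = None  # explicit parameters for a variable-structure layer

def write_layer_info(info: LayerInfo) -> list:
    """Serialise one layer description into a list of symbols (illustrative only)."""
    if info.use_default_network:
        return [1]                          # default-network flag set
    if info.prefab_index is not None:
        return [0, 1, info.prefab_index]    # prefab flag set, followed by the pool index
    return [0, 0, info.network_params]      # variable structure: parameters follow

def read_layer_info(symbols: list) -> LayerInfo:
    """Inverse of write_layer_info."""
    if symbols[0] == 1:
        return LayerInfo(use_default_network=True)
    if symbols[1] == 1:
        return LayerInfo(prefab_index=symbols[2])
    return LayerInfo(network_params=symbols[2])

# Example: base layer uses the default network, enhancement layer uses prefab entry 2.
base_symbols = write_layer_info(LayerInfo(use_default_network=True))
enh_symbols = write_layer_info(LayerInfo(prefab_index=2))
assert read_layer_info(enh_symbols).prefab_index == 2
```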
  • Embodiment 20 Regarding Embodiments 14 and 15, the coefficient hyperparameter feature generation unit at the encoding end can perform an inverse transformation operation on the coefficient hyperparameter feature coefficient reconstruction value C' based on the decoding neural network to obtain the coefficient hyperparameter feature value P.
  • the neural network information 1 corresponding to the coefficient hyperparameter feature generation unit of the decoding end device can also be determined based on the network structure of the decoding neural network, and the neural network information 1 corresponding to the coefficient hyperparameter feature generation unit is encoded in code stream 0.
  • the coefficient hyperparameter feature generation unit at the encoding end may include a decoding neural network 1.
  • the decoding neural network 1 may include a basic layer and an enhancement layer.
  • the network structure of the decoding neural network 1 at the encoding end is the same as the network structure of the decoding neural network 1 at the decoding end. Please refer to Embodiment 9, which will not be described again here.
  • the encoding end can configure a network layer of a default network structure for the coefficient hyperparameter feature generation unit, and the network layer of this default network structure is the same as the network layer of the default network structure of the decoding end.
  • the encoding end can configure a prefabricated neural network pool for the coefficient hyperparameter feature generation unit.
  • the prefabricated neural network pool can include at least one network layer of a prefabricated network structure. This prefabricated neural network pool is the same as the prefabricated neural network pool on the decoding end.
  • the encoding end can dynamically generate a network layer with a variable network structure for the coefficient hyperparameter feature generation unit based on the network parameters.
  • the encoding end can determine the neural network information 1 corresponding to the coefficient hyperparameter feature generation unit based on the network structure of the decoding neural network 1, and encode the neural network information 1 in the code stream 0.
  • the neural network information 1 may include basic layer information and enhancement layer information.
  • the base layer information includes the base layer uses default network flag bit, and this flag bit indicates that the base layer uses the default network.
  • the enhancement layer information includes network parameters used to generate the enhancement layer.
  • the network parameters in the enhancement layer information can be the same as the network parameters used in the enhancement layer of the decoding neural network 1 (that is, the enhancement layer used at the encoding end).
  • the quality enhancement unit at the encoding end can enhance the low-order feature value LF of the image based on the decoding neural network to obtain the reconstructed image block I corresponding to the current block.
  • the neural network information 3 corresponding to the quality enhancement unit of the decoding end device may also be determined based on the network structure of the decoding neural network, and the neural network information 3 corresponding to the quality enhancement unit may be encoded in code stream 0.
  • the quality enhancement unit at the encoding end may include a decoding neural network 3, and the decoding neural network 3 may include a basic layer and an enhancement layer.
  • the network structure of the decoding neural network 3 at the encoding end is the same as the network structure of the decoding neural network 3 at the decoding end; see Embodiment 13, which will not be described again here.
  • the encoding end may configure the network layer of a default network structure for the quality enhancement unit, and the network layer of this default network structure is the same as the network layer of the default network structure of the decoding end.
  • the encoding end may configure a prefabricated neural network pool for the quality enhancement unit.
  • the prefabricated neural network pool may include at least one network layer of a prefabricated network structure.
  • the prefabricated neural network pool is the same as the prefabricated neural network pool at the decoding end.
  • the encoding end can dynamically generate network layers with variable network structures for the quality enhancement unit based on network parameters.
  • the encoding end determines the neural network information 3 corresponding to the quality enhancement unit based on the network structure of the decoding neural network 3, and encodes the neural network information 3 in the code stream 0.
  • the neural network information 3 may include basic layer information and enhancement layer information.
  • the method of encoding the neural network information 3 is similar to the method of encoding the neural network information 2; please refer to Embodiment 18, which will not be repeated here.
  • the base layer information includes the base layer uses default network flag bit, and this flag bit indicates that the base layer uses the default network.
  • the enhancement layer information includes network parameters used to generate the enhancement layer.
  • the network parameters in the enhancement layer information may be the same as the network parameters used in the enhancement layer of the decoding neural network 3 .
  • the network parameters in Embodiment 1 to Embodiment 21 can all be fixed-point network parameters.
  • the filter weights in the network parameters can be represented with a bit width of 4, 8, 16, 32, or 64; the feature values output by the network can also be limited in bit width, for example to a 4, 8, 16, 32, or 64 bit representation.
  • for example, the bit width of the network parameters can be limited to 8 bits with values restricted to [-127, 127], and the bit width of the feature values output by the network can likewise be limited to 8 bits with values restricted to [-127, 127].
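  • A minimal Python sketch of this fixed-point constraint is given below, assuming signed values clipped to the symmetric range implied by the bit width (for 8 bits, [-127, 127]); the function name is illustrative.

```python
import numpy as np

def clip_fixed_point(values: np.ndarray, bit_width: int = 8) -> np.ndarray:
    """Round and clip network weights or feature maps to the signed range implied
    by bit_width; for bit_width = 8 the allowed range is [-127, 127]."""
    limit = (1 << (bit_width - 1)) - 1
    return np.clip(np.round(values), -limit, limit).astype(np.int32)

# Example: a floating-point feature map is constrained to an 8-bit representation.
features = np.array([[135.7, -200.2], [3.4, -0.6]])
print(clip_fixed_point(features))  # [[ 127 -127] [   3   -1]]
```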
  • Embodiment 22 For the syntax related to the image header, see Table 4, which provides the syntax information related to the image header, that is, the image-level syntax.
  • in Table 4, u(n) is used to represent an n-bit fixed-length code.
  • pic_width represents the image width
  • pic_height represents the image height
  • pic_format represents the image format, such as RGB 444, YUV444, YUV420, YUV422 and other image formats.
  • bu_width represents the width of the basic block
  • bu_height represents the height of the basic block
  • block_width represents the width of the image block
  • block_height represents the height of the image block
  • bit_depth represents the image bit depth
  • pic_qp represents the quantization parameter in the current image
  • lossless_flag represents the lossless coding flag of the current image, i.e., whether the current image is coded losslessly.
  • feature_map_max_bit_depth represents the maximum bit depth of the feature map, which is used to limit the maximum and minimum values of the network input or output feature map.
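  • The following Python sketch illustrates how image-level syntax elements of the kind listed above could be read as n-bit fixed-length codes u(n). The bit widths chosen for each field are assumptions for illustration only; the normative widths would be fixed by the actual Table 4.

```python
class BitReader:
    """Reads big-endian n-bit fixed-length codes u(n) from a byte string."""
    def __init__(self, data: bytes):
        self.data, self.pos = data, 0

    def u(self, n: int) -> int:
        value = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            value = (value << 1) | ((byte >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return value

def parse_picture_header(reader: BitReader) -> dict:
    # Field names follow the list above; the bit widths are illustrative assumptions.
    return {
        "pic_width": reader.u(16),
        "pic_height": reader.u(16),
        "pic_format": reader.u(4),   # e.g. RGB444 / YUV444 / YUV422 / YUV420
        "bit_depth": reader.u(4),
        "pic_qp": reader.u(8),
        "lossless_flag": reader.u(1),
        "feature_map_max_bit_depth": reader.u(4),
    }

# Example: parse a (zero-filled) header buffer.
header = parse_picture_header(BitReader(bytes(16)))
print(header["pic_width"], header["lossless_flag"])  # 0 0
```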
  • each of the above embodiments can be implemented individually or in combination.
  • each embodiment in Embodiment 1 to Embodiment 22 can be implemented individually, and at least two of Embodiment 1 to Embodiment 22 can be implemented in combination.
  • the content on the encoding side can also be applied to the decoding side, that is, the decoding side can be processed in the same way, and the content on the decoding side can also be applied to the encoding side, that is, the encoding side can be processed in the same way.
  • an embodiment of the present disclosure also proposes an image decoding device based on a neural network.
  • the device is applied to the decoding end.
  • the device includes: a memory configured to store video data; and a decoder configured to implement the decoding method in the above-mentioned Embodiment 1 to Embodiment 22, that is, the processing flow of the decoding end.
  • specifically, the decoder is configured to:
  • the control parameters and image information corresponding to the current block are decoded from the code stream; the neural network information corresponding to the decoding processing unit is obtained from the control parameters, and the decoding neural network corresponding to the decoding processing unit is generated based on the neural network information; the input features corresponding to the decoding processing unit are determined based on the image information, and the input features are processed based on the decoding neural network to obtain the output features corresponding to the decoding processing unit.
  • an embodiment of the present disclosure also proposes an image encoding device based on a neural network.
  • the device is applied to the encoding end.
  • the device includes: a memory configured to store video data; and an encoder configured to implement the encoding method in the above-mentioned Embodiment 1 to Embodiment 22, that is, the processing flow of the encoding end.
  • specifically, the encoder is configured to:
  • the input features corresponding to the coding processing unit are determined based on the current block, the input features are processed based on the coding neural network corresponding to the coding processing unit to obtain the output features corresponding to the coding processing unit, and the image information corresponding to the current block is determined based on the output features;
  • the control parameters corresponding to the current block are obtained, where the control parameters include neural network information corresponding to the decoding processing unit, and the neural network information is used to determine the decoding neural network corresponding to the decoding processing unit;
  • the image information and control parameters corresponding to the current block are encoded in the code stream.
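  • As a high-level, non-normative sketch of this encoder-side flow, the following Python outline assumes callable stand-ins for the coding neural network, the control-parameter derivation, and the entropy coder; none of these names are taken from the disclosure itself.

```python
def encode_block(current_block, coding_network, derive_control_params, entropy_encode):
    """Illustrative encoder flow: current block -> features -> image information -> code stream.
    All four arguments are assumed to be objects/callables supplied by the caller."""
    # 1. Determine the input features of the coding processing unit from the current block.
    input_features = current_block
    # 2. Process the input features with the coding neural network to obtain the output features.
    output_features = coding_network(input_features)
    # 3. Determine the image information (e.g. quantized feature coefficients) from the output features.
    image_information = output_features
    # 4. Obtain the control parameters, including the neural network information the decoder
    #    will use to build the decoding neural network of its decoding processing unit.
    control_params = derive_control_params(current_block)
    # 5. Encode the image information and control parameters of the current block in the code stream.
    return entropy_encode(control_params, image_information)
```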
  • the decoding end device (which may also be called a video decoder) provided by the embodiment of the present disclosure, from a hardware level, its hardware architecture schematic diagram can be seen in FIG. 8A . It includes: a processor 811 and a machine-readable storage medium 812.
  • the machine-readable storage medium 812 stores machine-executable instructions that can be executed by the processor 811; the processor 811 is configured to execute the machine-executable instructions to implement the decoding method of Embodiments 1-22 of the present disclosure.
  • the encoding end device (which may also be called a video encoder) provided by the embodiment of the present disclosure, from a hardware level, its hardware architecture schematic diagram can be seen in FIG. 8B . It includes: a processor 821 and a machine-readable storage medium 822.
  • the machine-readable storage medium 822 stores machine-executable instructions that can be executed by the processor 821; the processor 821 is configured to execute the machine-executable instructions to implement the encoding method of Embodiments 1-22 of the present disclosure.
  • embodiments of the present disclosure also provide a machine-readable storage medium.
  • Several computer instructions are stored on the machine-readable storage medium.
  • when the computer instructions are executed by a processor, the methods disclosed in the above examples of the present disclosure can be realized, such as the decoding method or the encoding method in the above embodiments.
  • embodiments of the present disclosure also provide a computer application program.
  • when the computer application program is executed by a processor, it can implement the decoding method or encoding method disclosed in the above examples of the present disclosure.
  • the embodiment of the present disclosure also proposes an image decoding device based on a neural network.
  • the device is applied to the decoding end.
  • the device includes: a decoding module for decoding the control parameters and image information corresponding to the current block from the code stream;
  • the acquisition module is used to obtain the neural network information corresponding to the decoding processing unit from the control parameters, and generate the decoding neural network corresponding to the decoding processing unit based on the neural network information;
  • a processing module configured to determine input features corresponding to the decoding processing unit based on the image information, process the input features based on the decoding neural network, and obtain output features corresponding to the decoding processing unit.
  • when the acquisition module generates the decoding neural network corresponding to the decoding processing unit based on the neural network information, it is specifically configured to: determine the base layer corresponding to the decoding processing unit based on the base layer information; determine the enhancement layer corresponding to the decoding processing unit based on the enhancement layer information; and generate the decoding neural network corresponding to the decoding processing unit based on the base layer and the enhancement layer.
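  • A minimal sketch, assuming Python dictionaries and lists, of how the acquisition module could assemble a decoding neural network from the base layer and enhancement layer information; the key names, the pool representation, and the list concatenation of layers are assumptions for illustration.

```python
def make_layers(network_params):
    # Placeholder for dynamically generating network layers from decoded network parameters.
    return [("generated_layer", network_params)]

def build_decoding_network(base_info, enh_info, default_layers, prefab_pool):
    """Resolve each of the two parts (base / enhancement) and concatenate them."""
    def resolve(info, part):
        if info.get("use_default_network"):
            return default_layers[part]               # fixed, pre-agreed network layers
        if "prefab_index" in info:
            return prefab_pool[info["prefab_index"]]  # network layers from the prefabricated pool
        return make_layers(info["network_params"])    # variable structure built from parameters

    return resolve(base_info, "base") + resolve(enh_info, "enhancement")

# Example: base layer from the default structure, enhancement layer from prefab entry 1.
defaults = {"base": [("deconv", 2)], "enhancement": [("deconv", 1)]}
pool = [[("deconv", 2), ("relu",)], [("deconv", 1), ("relu",)]]
net = build_decoding_network({"use_default_network": True}, {"prefab_index": 1}, defaults, pool)
print(net)  # [('deconv', 2), ('deconv', 1), ('relu',)]
```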
  • when the acquisition module determines the base layer corresponding to the decoding processing unit based on the base layer information, it is specifically configured to: if the base layer information includes the base layer uses default network flag bit, and the flag bit indicates that the base layer uses the default network, obtain the base layer of the default network structure.
  • when the acquisition module determines the base layer corresponding to the decoding processing unit based on the base layer information, it is specifically configured to: if the base layer information includes a base layer uses prefabricated network flag bit and a base layer prefabricated network index number, and the flag bit indicates that the base layer uses a prefabricated network, select from the prefabricated neural network pool the base layer of the prefabricated network structure corresponding to the base layer prefabricated network index number; wherein the prefabricated neural network pool includes at least one network layer of a prefabricated network structure.
  • when the acquisition module determines the enhancement layer corresponding to the decoding processing unit based on the enhancement layer information, it is specifically configured to: if the enhancement layer information includes the enhancement layer uses default network flag bit, and the flag bit indicates that the enhancement layer uses the default network, obtain the enhancement layer of the default network structure.
  • when the acquisition module determines the enhancement layer corresponding to the decoding processing unit based on the enhancement layer information, it is specifically configured to: if the enhancement layer information includes the enhancement layer uses prefabricated network flag bit and the enhancement layer prefabricated network index number, and the flag bit indicates that the enhancement layer uses a prefabricated network, select from the prefabricated neural network pool the enhancement layer of the prefabricated network structure corresponding to the enhancement layer prefabricated network index number; wherein the prefabricated neural network pool includes at least one network layer of a prefabricated network structure.
  • when the acquisition module determines the enhancement layer corresponding to the decoding processing unit based on the enhancement layer information, it is specifically configured to: if the enhancement layer information includes network parameters used to generate the enhancement layer, generate the enhancement layer corresponding to the decoding processing unit based on the network parameters; wherein the network parameters include at least one of the following: the number of neural network layers, the deconvolution layer flag bit, the number of deconvolution layers, the quantization step size (stride) of each deconvolution layer, the number of channels of each deconvolution layer, the size of the convolution kernel, the number of filters, the filter size index, the filter coefficient zero flag bit, the filter coefficients, the activation layer flag bit, and the activation layer type (a construction sketch in code is given below).
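  • The sketch below illustrates how an enhancement layer of variable structure could be instantiated from such decoded network parameters, assuming PyTorch purely for illustration; the parameter dictionary keys mirror the list above but are otherwise assumptions, and mapping the "quantization step size" to the deconvolution stride follows the description of the deconvolution layer elsewhere in this disclosure.

```python
import torch.nn as nn

def build_enhancement_layer(params: dict) -> nn.Sequential:
    """Build a variable-structure enhancement layer from decoded network parameters."""
    layers = []
    in_channels = params["input_channels"]
    for i in range(params["num_deconv_layers"]):
        out_channels = params["channels"][i]
        layers.append(nn.ConvTranspose2d(
            in_channels, out_channels,
            kernel_size=params["kernel_size"][i],
            stride=params["stride"][i]))  # "quantization step size" of the deconvolution layer
        if params.get("activation_flags") and params["activation_flags"][i]:
            layers.append(nn.ReLU() if params["activation_type"] == "relu" else nn.PReLU())
        in_channels = out_channels
    return nn.Sequential(*layers)

# Example: two deconvolution layers (stride 2 then stride 1), each followed by ReLU.
enh = build_enhancement_layer({
    "input_channels": 64, "num_deconv_layers": 2, "channels": [32, 3],
    "kernel_size": [3, 3], "stride": [2, 1],
    "activation_flags": [True, True], "activation_type": "relu",
})
print(enh)
```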
  • the image information includes coefficient hyperparameter feature information and image feature information.
  • when the processing module determines the input features corresponding to the decoding processing unit based on the image information and processes the input features based on the decoding neural network, it is specifically configured to: when executing the decoding process of coefficient hyperparameter feature generation, determine the coefficient hyperparameter feature coefficient reconstruction value based on the coefficient hyperparameter feature information, and perform an inverse transformation operation on the coefficient hyperparameter feature coefficient reconstruction value based on the decoding neural network to obtain the coefficient hyperparameter feature value, wherein the coefficient hyperparameter feature value is used to decode the image feature information from the code stream; when executing the decoding process of image feature inverse transformation, determine the image feature reconstruction value based on the image feature information, and perform an inverse transformation operation on the image feature reconstruction value based on the decoding neural network to obtain the low-order feature value of the image, wherein the low-order feature value of the image is used to obtain the reconstructed image block corresponding to the current block.
  • when the processing module determines the coefficient hyperparameter feature coefficient reconstruction value based on the coefficient hyperparameter feature information, it is specifically configured to: if the control parameters include first enabling information, and the first enabling information indicates that the first inverse quantization operation is enabled, inversely quantize the coefficient hyperparameter feature information to obtain the coefficient hyperparameter feature coefficient reconstruction value.
  • when the processing module determines the image feature reconstruction value based on the image feature information, it is specifically configured to: if the control parameters include second enabling information, and the second enabling information indicates that the second inverse quantization operation is enabled, inversely quantize the image feature information to obtain the image feature reconstruction value.
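  • A minimal sketch of the enabling-flag-gated inverse quantization described above, assuming a single scalar quantization step per value; in the disclosure the reconstruction may instead use a multiply-and-shift form, so the exact formula below is an illustrative simplification.

```python
def dequantize(values, enable_flag: bool, qstep: float):
    """Inverse quantization gated by an enabling flag from the control parameters.
    When the flag is off (or qstep == 1) the values pass through unchanged."""
    if not enable_flag or qstep == 1:
        return list(values)
    return [v * qstep for v in values]

# Example: coefficient hyperparameter feature information with the first inverse quantization enabled.
print(dequantize([3, -5, 0], enable_flag=True, qstep=2.0))  # [6.0, -10.0, 0.0]
```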
  • the processing module is further configured to: if the control parameters include third enabling information and the third enabling information indicates that the quality enhancement operation is enabled, when performing the quality enhancement decoding process, obtain the low-order feature values of the image and enhance them based on the decoding neural network to obtain the reconstructed image block corresponding to the current block.
  • an embodiment of the present disclosure also proposes an image coding device based on a neural network.
  • the device is applied to the encoding end.
  • the device includes: a processing module, configured to determine the input features corresponding to the coding processing unit based on the current block, process the input features based on the coding neural network corresponding to the coding processing unit to obtain the output features corresponding to the coding processing unit, and determine the image information corresponding to the current block based on the output features;
  • an acquisition module, configured to obtain the control parameters corresponding to the current block, where the control parameters include neural network information corresponding to the decoding processing unit, and the neural network information is used to determine the decoding neural network corresponding to the decoding processing unit; and an encoding module, configured to encode the image information and control parameters corresponding to the current block in the code stream.
  • the neural network information includes basic layer information and enhancement layer information
  • the decoding neural network includes a base layer determined based on the basic layer information and an enhancement layer determined based on the enhancement layer information.
  • if the base layer information includes the base layer uses default network flag bit, and the flag bit indicates that the base layer uses the default network, the decoding neural network adopts the base layer of the default network structure.
  • if the base layer information includes the base layer uses prefabricated network flag bit and the base layer prefabricated network index number, and the flag bit indicates that the base layer uses a prefabricated network, the decoding neural network adopts the base layer of the prefabricated network structure corresponding to the base layer prefabricated network index number selected from the prefabricated neural network pool; wherein the prefabricated neural network pool includes at least one network layer of a prefabricated network structure.
  • if the enhancement layer information includes the enhancement layer uses default network flag bit, and the flag bit indicates that the enhancement layer uses the default network, the decoding neural network adopts the enhancement layer of the default network structure.
  • if the enhancement layer information includes the enhancement layer uses prefabricated network flag bit and the enhancement layer prefabricated network index number, and the flag bit indicates that the enhancement layer uses a prefabricated network, the decoding neural network adopts the enhancement layer of the prefabricated network structure corresponding to the enhancement layer prefabricated network index number selected from the prefabricated neural network pool; wherein the prefabricated neural network pool includes at least one network layer of a prefabricated network structure.
  • if the enhancement layer information includes network parameters used to generate the enhancement layer, the decoding neural network adopts the enhancement layer generated based on the network parameters; wherein the network parameters include at least one of the following: the number of neural network layers, the deconvolution layer flag bit, the number of deconvolution layers, the quantization step size (stride) of each deconvolution layer, the number of channels of each deconvolution layer, the size of the convolution kernel, the number of filters, the filter size index, the filter coefficient zero flag bit, the filter coefficients, the activation layer flag bit, and the activation layer type.
  • the processing module is also configured to: divide the current image into N image blocks that do not overlap each other, N being a positive integer; perform boundary filling on each image block to obtain a boundary-filled image block, where, when performing boundary filling on each image block, the filling value does not depend on the reconstructed pixel values of adjacent image blocks; and generate N current blocks based on the boundary-filled image blocks.
  • the processing module is also configured to: divide the current image into multiple basic blocks, each basic block including at least one image block; perform boundary filling on each image block to obtain a boundary-filled image block, where, when performing boundary filling on each image block, the filling value of the image block does not depend on the reconstructed pixel values of other image blocks in the same basic block but is allowed to depend on the reconstructed pixel values of image blocks in different basic blocks; and generate multiple current blocks based on the boundary-filled image blocks (a padding sketch is given below).
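  • A minimal Python sketch of the block partitioning and boundary filling described above, assuming a preset padding value of 0 so that no block depends on the reconstructed pixels of its neighbours; the block size and padding width are illustrative.

```python
import numpy as np

def pad_and_split(image: np.ndarray, block: int, pad: int):
    """Split an image into non-overlapping blocks and pad each block's border with a preset value."""
    h, w = image.shape
    blocks = []
    for y in range(0, h, block):
        for x in range(0, w, block):
            tile = image[y:y + block, x:x + block]
            blocks.append(np.pad(tile, ((pad, pad), (pad, pad)), constant_values=0))
    return blocks

# Example: a 16x16 image split into four 8x8 blocks, each padded by 2 pixels on every side.
tiles = pad_and_split(np.zeros((16, 16), dtype=np.uint8), block=8, pad=2)
print(len(tiles), tiles[0].shape)  # 4 (12, 12)
```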
  • embodiments of the present disclosure may be provided as methods, systems, or computer program products.
  • the disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment that combines software and hardware aspects.
  • Embodiments of the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

Abstract

本公开提供一种基于神经网络的图像解码、编码方法、装置及其设备,该方法包括:从码流中解码当前块对应的控制参数和图像信息;从所述控制参数中获取解码处理单元对应的神经网络信息,基于所述神经网络信息生成解码处理单元对应的解码神经网络;基于所述图像信息确定所述解码处理单元对应的输入特征,基于所述解码神经网络对所述输入特征进行处理,得到所述解码处理单元对应的输出特征。通过本公开能够提高编码性能和解码性能。

Description

一种基于神经网络的图像解码、编码方法、装置及其设备 技术领域
本公开涉及编解码技术领域,尤其是涉及一种基于神经网络的图像解码、编码方法、装置及其设备。
背景技术
为了达到节约空间的目的,视频图像都是经过编码后才传输的,完整的视频编码可以包括预测、变换、量化、熵编码、滤波等过程。针对预测过程,预测过程可以包括帧内预测和帧间预测,帧间预测是指利用视频时间域的相关性,使用邻近已编码图像的像素预测当前像素,以达到有效去除视频时域冗余的目的。帧内预测是指利用视频空间域的相关性,使用当前帧图像的已编码块的像素预测当前像素,以达到去除视频空域冗余的目的。
随着深度学习的迅速发展,深度学习在许多高层次的计算机视觉问题上取得成功,如图像分类、目标检测等,深度学习也逐渐在编解码领域开始应用,即可以采用神经网络对图像进行编码和解码。虽然基于神经网络的编解码方法展现出巨大性能潜力,但是,基于神经网络的编解码方法仍然存在稳定性较差、泛化性较差和复杂度较高等问题。
发明内容
有鉴于此,本公开提供了一种基于神经网络的图像解码、编码方法、装置及其设备,提高了编码性能和解码性能,解决稳定性较差、泛化性较差和复杂度较高等问题。
本公开提供一种基于神经网络的图像解码方法,应用于解码端,所述方法包括:
从码流中解码当前块对应的控制参数和图像信息;
从所述控制参数中获取解码处理单元对应的神经网络信息,并基于所述神经网络信息生成所述解码处理单元对应的解码神经网络;
基于所述图像信息确定所述解码处理单元对应的输入特征,基于所述解码神经网络对所述输入特征进行处理,得到所述解码处理单元对应的输出特征。
本公开提供一种基于神经网络的图像编码方法,应用于编码端,所述方法包括:
基于当前块确定编码处理单元对应的输入特征,基于所述编码处理单元对应的编码神经网络对所述输入特征进行处理,得到所述编码处理单元对应的输出特征,并基于所述输出特征确定所述当前块对应的图像信息;
获取当前块对应的控制参数,所述控制参数包括解码处理单元对应的神经网络信息,所述神经网络信息用于确定所述解码处理单元对应的解码神经网络;
在码流中编码所述当前块对应的图像信息和控制参数。
本公开提供一种基于神经网络的图像解码装置,所述装置包括:
存储器,其经配置以存储视频数据;
解码器,其经配置以实现:
从码流中解码当前块对应的控制参数和图像信息;
从所述控制参数中获取解码处理单元对应的神经网络信息,并基于所述神经网络信息生成所述解码处理单元对应的解码神经网络;
基于所述图像信息确定所述解码处理单元对应的输入特征,基于所述解码神经网络对所述输入特征进行处理,得到所述解码处理单元对应的输出特征。
本公开提供一种基于神经网络的图像编码装置,所述装置包括:
存储器,其经配置以存储视频数据;
编码器,其经配置以实现:
基于当前块确定编码处理单元对应的输入特征,基于所述编码处理单元对应的编码神经网络对所述输入特征进行处理,得到所述编码处理单元对应的输出特征,并基于所述输出特征确定所述当前块对应的图像信息;
获取当前块对应的控制参数,所述控制参数包括解码处理单元对应的神经网络信息,所述神经网络信息用于确定所述解码处理单元对应的解码神经网络;
在码流中编码所述当前块对应的图像信息和控制参数。
本公开提供一种解码端设备,包括:处理器和机器可读存储介质,所述机器可读存储介质存储有能够被所述处理器执行的机器可执行指令;
所述处理器用于执行机器可执行指令,以实现上述的基于神经网络的图像解码方法。
本公开提供一种编码端设备,包括:处理器和机器可读存储介质,所述机器可读存储介质存储有能够被所述处理器执行的机器可执行指令;
所述处理器用于执行机器可执行指令,以实现上述的基于神经网络的图像编码方法。
由以上技术方案可见,本公开实施例中,可以从码流中解码当前块对应的控制参数,从控制参数中获取解码处理单元对应的神经网络信息,并基于神经网络信息生成解码处理单元对应的解码神经网络,继而基于解码神经网络实现图像解码,提高解码性能。可以基于编码处理单元对应的编码神经网络实现图像编码,提高编码性能。能够采用神经网络(如解码神经网络和编码神经网络等)对图像进行编码和解码,且通过码流传输神经网络信息,继而基于神经网络信息生成解码处理单元对应的解码神经网络,从而解决稳定性较差、泛化性较差和复杂度较高等问题,即稳定性较好,泛化性较好,复杂度较低。可以提供编解码复杂度动态调整的方案,相比单一神经网络的框架有更优的编码性能和解码性能。由于是每个当前块对应控制参数,那么,从控制参数获取的神经网络信息就是针对当前块的神经网络信息,从而对每个当前块分别生成解码神经网络,即不同当前块的解码神经网络可能相同,也可能不同,从而块级别的解码神经网络,即解码神经网络是可以变动可以调整的。
附图说明
图1是视频编码框架的示意图;
图2A-图2C是视频编码框架的示意图;
图3是本公开一种实施方式中的基于神经网络的图像解码方法的流程图;
图4是本公开一种实施方式中的基于神经网络的图像编码方法的流程图;
图5A和图5C是本公开一种实施方式中的图像编码方法和图像解码方法的示意图;
图5B和图5D是本公开一种实施方式中的边界填充的示意图;
图5E是本公开一种实施方式中的原始图像的图像域变换的示意图;
图6A和图6B是本公开一种实施方式中的解码端的结构示意图;
图6C是本公开一种实施方式中的系数超参特征生成单元的结构示意图;
图6D是本公开一种实施方式中的图像特征反变换单元的结构示意图;
图7A、图7B和图7C是本公开一种实施方式中的编码端的结构示意图;
图8A是本公开一种实施方式中的解码端设备的硬件结构图;
图8B是本公开一种实施方式中的编码端设备的硬件结构图。
具体实施方式
在本公开实施例使用的术语仅仅是出于描述特定实施例的目的,而非用于限制本公开。本公开实施例和权利要求书中所使用的单数形式的“一种”、“所述”和“该”也旨在包括多数形式,除非上下文清楚地表示其它含义。还应当理解,本文中使用的术语“和/或”是指包含一个或多个相关联的列出项目的任何或所有可能组合。应当理解,尽管在本公开实施例可能采用术语第一、第二、第三等来描述各种信息,但是,这些信息不应限于这些术语。这些术语仅用来将同一类型的信息彼此区分开。例如,在不脱离本公开实施例范围的情况下,第一信息也可以被称为第二信息,类似地,第二信息也可以被称为第一信息,取决于语境。此外,所使用的词语“如果”可以被解释成为“在......时”,或“当......时”,或“响应于确定”。
本公开实施例中提出一种基于神经网络的图像解码、编码方法、可以涉及如下概念:
神经网络(Neural Network,NN):神经网络是指人工神经网络,而不是生物神经网络,神经网络是一种运算模型,由大量节点(或称为神经元)之间相互联接构成。在神经网络中,神经元处理单元可表示不同的对象,如特征、字母、概念,或者一些有意义的抽象模式。神经网络中处理单元的类型分为三类:输入单元、输出单元和隐单元。输入单元接受外部世界的信号与数据;输出单元实现处理结果的输出;隐单元是处在输入和输出单元之间,不能由系统外部观察的单元。神经元之间的连接权值反映了单元间的连接强度,信息的表示和处理体现在处理单元的连接关系中。神经网络是一种非程序化、类大脑风格的信息处理方式,其本质是通过神经网络的变换和动力学行为得到一种并行分布式的信息处理功能,并在不同程度和层次上模仿人脑神经系统的信息处理功能。在视频处理领域,常用的神经网络可以包括但不限于:卷积神经网络(CNN)、循环神经网络(RNN)、全连接网络等。
卷积神经网络(Convolutional Neural Network,CNN):卷积神经网络是一种前馈神经网络,是深度学习技术中极具代表的网络结构之一,卷积神经网络的人工神经元可以响应一部分覆盖范围内的周围单元,对于大型图像处理有出色表现。卷积神经网络的基本结构包括两层,其一为特征提取层(也称卷积层),每个神经元的输入与前一层的局部接受域相连,并提取该局部的特征。一旦该局部特征被提取后,它与其它特征间的位置关系也随之确定下来。其二是特征映射层(也称激活层),神经网络的每个计算层由多个特征映射组成,每个特征映射是一个平面,平面上所有神经元的权值相等。特征映射结构可采用Sigmoid函数、ReLU函数、Leaky-ReLU函数、PReLU函数、GDN函数等作为卷积网络的激活函数。此外,由于一个映射面上的神经元共享权值,因而减少了网络自由参数的个数。
卷积神经网络相较于图像处理算法的优点之一在于,避免了对图像复杂的前期预处理过程(提取人工特征等),可以直接输入原始图像,进行端到端的学习。卷积神经网络相较于普通神经网络的优点之一在于,普通神经网络都是采用全连接的方式,即输入层到隐藏层的神经元都是全部连接的,这样做将导致参数量巨大,使得网络训练耗时甚至难以训练,而卷积神经网络则通过局部连接、权值共享等方法避免了这一困难。
反卷积层(Deconvolution):反卷积层又称为转置卷积层,反卷积层和卷积层的工作过程很相似,主要区别在于,反卷积层会通过padding(填充),使得输出大于输入(当然也可以保持相同)。若stride为1,则表示输出尺寸等于输入尺寸;若stride(步长)为N,则表示输出特征的宽为输入特征的宽的N倍,输出特征的高为输入特征的高的N倍。
泛化能力(Generalization Ability):泛化能力可以是指机器学习算法对新鲜样本的适应能力,学习的目的是学到隐含在数据对背后的规律,对具有同一规律的学习集以外的数据,经过训练的网络也能够给出合适的输出,该能力就可以称为泛化能力。
率失真原则(Rate-Distortion Optimized):评价编码效率的有两大指标:码率和PSNR(Peak Signal to Noise Ratio,峰值信噪比),比特流越小,则压缩率越大,PSNR越大,则重建图像质量越好,在模式选择时,判别公式实质上也就是对二者的综合评价。例如,模式对应的代价:J(mode)=D+λ*R,其中,D表示Distortion(失真),通常可以使用SSE指标来进行衡量,SSE是指重建图像块与源图像的差值的均方和,为了实现代价考虑,也可以使用SAD指标,SAD是指重建图像块与源图像的差值绝对值之和;λ是拉格朗日乘子,R就是该模式下图像块编码所需的实际比特数,包括编码模式信息、运动信息、残差等所需的比特总和。在模式选择时,若使用率失真原则去对编码模式做比较决策,通常可以保证编码性能最佳。
视频编码框架:参见图1所示,为编码端的视频编码框架的示意图,可以使用该视频编码框架实现本公开实施例的编码端处理流程,视频解码框架的示意图可以与图1类似,在此不再重复赘述,可以使用视频解码框架实现本公开实施例的解码端处理流程。
示例性的,参见图1所示,视频编码框架可以包括预测、变换、量化、熵编码器、反量化、反变换、重建、滤波等模块。在编码端,通过这些模块之间的配合,可以实现编码端的处理流程。此外,视频解码框架可以包括预测、变换、量化、熵解码器、反量化、反变换、重建、滤波等模块,在解码端,通过这些模块之间的配合,可以实现解码端的处理流程。
针对编码端的各个模块,提出了非常多的编码工具,而每个工具往往又有多种模式。对于不同视频序列,能获得最优编码性能的编码工具往往不同。因此,在编码过程中,通常采用RDO(Rate-Distortion Optimize)比较不同工具或模式的编码性能,以选择最佳模式。在确定最优工具或模式后,再通过在比特流中编码标记信息的方法传递工具或模式的决策信息。这种方法虽然带来了较高的编码复杂度,但可以针对不同内容,自适 应选择最优的模式组合,获得最优的编码性能。解码端可通过直接解析标志信息获得相关模式信息,复杂度影响较小。
下面对编码端和解码端的结构进行简单介绍。参见图2A所示,示出用于实现本公开实施例的编码端的实例的示意性框图。在图2A中,编码端包括预测处理单元、残差计算单元、变换处理单元、量化单元、编码单元、反量化单元(也可以称为逆量化单元)、反变换处理单元(也可以称为逆变换处理单元)、重构单元(或者称为重建单元)以及滤波器单元。在一个例子中,编码端还可以包括缓冲器、经解码图像缓冲器,其中,缓冲器用于缓存重构单元输出的重构图像块,经解码图像缓冲器用于缓存滤波器单元输出的滤波后的图像块。
编码端(也称为编码器)的输入为图像(可以称为待编码图像)的图像块,图像块也可以称为当前块或待编码块,编码端中还可以包括分割单元(图中未示出),该分割单元用于将待编码图像分割成多个图像块。编码端用于逐块编码从而完成对待编码图像的编码,例如,对每个图像块执行编码过程。预测处理单元用于接收或获取图像块(当前待编码图像的当前待编码图像块,也可以称为当前块,该图像块可以理解为图像块的真实值)和已重构图像数据,基于已重构图像数据中的相关数据对当前块进行预测,得到当前块的预测块。在一个例子中,预测处理单元可以包含帧间预测单元、帧内预测单元和模式选择单元,模式选择单元用于选择帧内预测模式或者帧间预测模式,若选择帧内预测模式,则由帧内预测单元执行预测过程,若选择帧间预测模式,则可以由帧间预测单元执行预测过程。
残差计算单元用于计算图像块的真实值和该图像块的预测块之间的残差,得到残差块,例如,残差计算单元可以通过逐像素将图像块的像素值减去预测块的像素值。
变换处理单元用于对残差块进行例如离散余弦变换(discrete cosine transform,DCT)或离散正弦变换(discrete sine transform,DST)的变换,以在变换域中获取变换系数,变换系数也可以称为变换残差系数,该变换残差系数可以在变换域中表示残差块。
量化单元用于通过应用标量量化或向量量化来量化变换系数,以获取经量化变换系数,经量化变换系数也可以称为经量化残差系数。量化过程可以减少与部分或全部变换系数有关的位深度。例如,可在量化期间将n位变换系数向下舍入到m位变换系数,其中n大于m。可通过调整量化参数(quantization parameter,QP)修改量化程度。例如,对于标量量化,可以应用不同的标度来实现较细或较粗的量化。较小量化步长对应较细量化,而较大量化步长对应较粗量化。可以通过量化参数(quantization parameter,QP)指示合适的量化步长。
编码单元用于对上述经量化残差系数进行编码,以经编码比特流的形式输出的经编码图像数据(即当前待编码图像块的编码结果),然后将经编码比特流传输到解码器,或将其存储起来,后续传输至解码器或用于检索。编码单元还可用于对当前图像块的其它语法元素进行编码,例如将预测模式编码至码流等。编码算法包括但不限于可变长度编码(variable length coding,VLC)算法、上下文自适应VLC(context adaptive VLC,CAVLC)算法、算术编码算法、上下文自适应二进制算术编码(context adaptive binary arithmetic coding,CABAC)算法、基于语法的上下文自适应二进制算术编码(syntax-based context-adaptive binary arithmetic coding,SBAC)算法、概率区间分割熵(probability interval partitioning entropy,PIPE)算法。
反量化单元用于对上述经量化系数进行反量化,以获取经反量化系数,该反量化是上述量化单元的反向应用,例如,基于或使用与量化单元相同的量化步长,应用量化单元应用的量化方案的逆量化方案。经反量化系数也可以称为经反量化残差系数。
反变换处理单元用于对上述反量化系数进行反变换,应该理解,该反变换是上述变换处理单元的反向应用,例如,反变换可以包括逆离散余弦变换(inverse discrete cosine transform,IDCT)或逆离散正弦变换(inverse discrete sine transform,IDST),以在像素域(或者称为样本域)中获取逆变换块。逆变换块也可以称为逆变换经反量化块或逆变换残差块。
重构单元用于将逆变换块(即逆变换残差块)添加至预测块,以在样本域中获取经重构块,重构单元可以为求和器,例如,将残差块的样本值(即像素值)与预测块的样本值相加。该重构单元输出的重构块可以后续用于预测其他图像块,例如,在帧内预测模式下使用。
滤波器单元(或简称“滤波器”)用于对经重构块进行滤波以获取经滤波块,从而顺利进行像素转变或提高图像质量。滤波器单元可以为环路滤波器单元,旨在表示一个或多个环路滤波器,例如,滤波器单元可以为去块滤波器、样本自适应偏移 (sample-adaptive offset,SAO)滤波器或其它滤波器,例如双边滤波器、自适应环路滤波器(adaptive loop filter,ALF),或锐化或平滑滤波器,或协同滤波器。在一个例子中,该滤波单元输出的经滤波块可以后续用于预测其他图像块,例如,在帧间预测模式下使用,对此不做限制。
参见图2B所示,示出用于实现本公开实施例的编码端(也可以称为解码器)的实例的示意性框图。解码器用于接收例如由编码器编码的经编码图像数据(即经编码比特流,例如,包括图像块的经编码比特流及相关联的语法元素),以获取经解码图像。解码器包括解码单元、反量化单元、反变换处理单元、预测处理单元、重构单元、滤波器单元。在一些实例中,解码器可执行大体上与图2A的编码器描述的编码遍次互逆的解码遍次。在一个例子中,解码器还可以包括缓冲器、经解码图像缓冲器,其中,缓冲器用于缓存重构单元输出的重构图像块,经解码图像缓冲器用于缓存滤波器单元输出的滤波后的图像块。
解码单元用于对经编码图像数据执行解码,以获取经量化系数和/或经解码的编码参数(例如,解码参数可以包括帧间预测参数、帧内预测参数、滤波器参数和/或其它语法元素中的任意一个或全部)。解码单元还用于将上述经解码的编码参数转发至预测处理单元,以供预测处理单元根据编码参数执行预测过程。反量化单元的功能可与编码器的反量化单元的功能相同,用于反量化(即,逆量化)由解码单元解码的经量化系数。
反变换处理单元的功能可与编码器的反变换处理单元的功能可以相同,重构单元(例如求和器)的功能可与编码器的重构单元的功能可以相同,用于对上述经量化系数进行逆变换(例如,逆DCT、逆整数变换或概念上类似的逆变换过程),从而得到逆变换块(也可以称为逆变换残差块),该逆变换块即为当前图像块在像素域中的残差块。
预测处理单元,用于接收或获取经编码图像数据(例如当前图像块的经编码比特流)和已重构图像数据,预测处理单元还可以从例如解码单元接收或获取预测相关参数和/或关于所选择的预测模式的信息(即经解码的编码参数),并且基于已重构图像数据中的相关数据和经解码的编码参数对当前图像块进行预测,得到当前图像块的预测块。
在一个例子中,预测处理单元可以包含帧间预测单元、帧内预测单元和模式选择单元,模式选择单元用于选择帧内预测模式或者帧间预测模式,若选择帧内预测模式,则由帧内预测单元执行预测过程,若选择帧间预测模式,则由帧间预测单元执行预测过程。
重构单元用于将逆变换块(即逆变换残差块)添加到预测块,以在样本域中获取经重构块,例如,可以将逆变换残差块的样本值与预测块的样本值相加。
滤波器单元用于对经重构块进行滤波以获取经滤波块,该经滤波块即为经解码图像块。
应当理解的是,在本公开实施例的编码器和解码器中,针对某个环节的处理结果也可以对其进行进一步处理后,将进一步处理后的处理结果输出到下一个环节,例如,在插值滤波、运动矢量推导或滤波等环节之后,对相应环节的处理结果进一步进行Clip或移位shift等操作。
在编码器以及解码器的基础上,本公开实施例提供一种可能的编/解码实现方式,如图2C所示,图2C为本公开实施例提供的一种编码和解码的流程示意图,该编码和解码实现方式包括过程①至过程⑤,过程①至过程⑤可以由上述的解码器和编码器执行。过程①:将一帧图像分成一个或多个互相不重叠的并行编码单元。该一个或多个并行编码单元之间无依赖关系,可以完全并行且相互独立地编码和解码,如图2C所示出的并行编码单元1和并行编码单元2。
过程②:对于每个并行编码单元,可再将其分成一个或多个互相不重叠的独立编码单元,各个独立编码单元间可相互不依赖,但可以共用一些并行编码单元头信息。例如,独立编码单元的宽为w_lcu,高为h_lcu。若并行编码单元划分成一个独立编码单元,则独立编码单元的尺寸与并行编码单元完全相同;否则,则独立编码单元的宽应大于高(除非是边缘区域)。
通常的,独立编码单元可为固定的w_lcu×h_lcu,w_lcu和h_lcu均为2的N次方(N≥0),如独立编码单元的尺寸为:128×4,64×4,32×4,16×4,8×4,32×2,16×2或8×2等。
作为一种可能的示例,独立编码单元可以为固定的128×4。若并行编码单元的尺寸为256×8,则可以将并行编码单元等分为4个独立编码单元;若并行编码单元的尺寸为288×10,则并行编码单元以划分为:第一行和第二行分别为2个128×4和1个32×4的独立编码单元;第三行为2个128×2和1个32×2的独立编码单元。值得注意的是, 独立编码单元既可以包括亮度Y、色度Cb、色度Cr三个分量,或红(red,R)、绿(green,G)、蓝(blue,B)三个分量,或亮度Y、色度Co、色度Cg三个分量,也可以仅包含其中的某一个分量。若独立编码单元包含三个分量,则这三个分量的尺寸可以完全一样,也可不一样,具体与图像的输入格式相关。
过程③:对于每个独立编码单元,可再将其分成一个或多个互相不重叠的子编码单元,独立编码单元内的各个子编码单元可相互依赖,如多个子编码单元可以进行相互参考预编解码。
若子编码单元与独立编码单元尺寸相同(即独立编码单元仅分成一个子编码单元),则其尺寸可为过程②所述的所有尺寸。若独立编码单元分成多个互相不重叠的子编码单元,则其可行划分例子有:水平等分(子编码单元的高与独立编码单元相同,但宽不同,可为其1/2,1/4,1/8,1/16等),垂直等分(子编码单元的宽与独立编码单元相同,高不同,可为其1/2,1/4,1/8,1/16等),水平和垂直等分(四叉树划分)等,优选为水平等分。
子编码单元的宽为w_cu,高为h_cu,宽应大于高(除非是边缘区域)。通常的,子编码单元为固定的w_cu x h_cu,w_cu和h_cu均为2个N次方(N大于等于0),如16x4,8x4,16x2,8x2,8x1,4x1等。如子编码单元为固定的16x4。若独立编码单元的尺寸为64x4,则将独立编码单元等分为4个子编码单元;若独立编码单元的尺寸为72x4,则子编码单元划分为:4个16x4+1个8x4。值得注意的是,子编码单元既可以包括亮度Y、色度Cb、色度Cr三个分量(或红R、绿G、蓝B三分量,或,亮度Y、色度Co、色度Cg),也可以仅包含其中的某一个分量。若包含三个分量,几个分量的尺寸可以完全一样,也可以不一样,具体与图像输入格式相关。
值得注意的是,过程③可以是编解码方法中一个可选的步骤,编码器/解码器可以对过程②获得的独立编码单元进行残差系数(或残差值)进行编码和解码。
过程④:对于子编码单元,可以再将其分成一个或多个互相不重叠的预测组(Prediction Group,PG),PG也简称为Group,各PG按照选定预测模式进行编解码,得到PG的预测值,组成整个子编码单元的预测值,基于子编码单元的预测值和原始值,获得子编码单元的残差值。
过程⑤:基于子编码单元的残差值,对子编码单元进行分组,获得一个或多个相不重叠的残差小块(residual block,RB),各个RB的残差系数按照选定模式进行编解码,形成残差系数流。具体的,可分为对残差系数进行变换和不进行变换两类。
其中,过程⑤中残差系数编解码方法的选定模式可以包括,但不限于下述任一种:半定长编码方式、指数哥伦布(Golomb)编码方法、Golomb-Rice编码方法、截断一元码编码方法、游程编码方法、直接编码原始残差值等。例如,编码器可直接对RB内的系数进行编码。又例如,编码器也可以对残差块进行变换(如DCT、DST、Hadamard变换等),再对变换后的系数进行编码。作为一种可能的示例,当RB较小时,编码器可以直接对RB内的各个系数进行统一量化,再进行二值化编码。若RB较大,可以进一步划分为多个系数组(coefficient group,CG),再对各个CG进行统一量化,再进行二值化编码。在本公开的一些实施例中,系数组(CG)和量化组(QG)可以相同,当然,系数组和量化组也可以不同。
下面以半定长编码方式对残差系数编码的部分进行示例性说明。首先,将一个RB块内残差绝对值的最大值定义为修整最大值(modified maximum,MM)。其次,确定该RB块内残差系数的编码比特数(同一RB块内残差系数的编码比特数一致)。例如,若当前RB块的关键限值(critical limit,CL)为2,当前残差系数为1,则编码残差系数1需要2个比特,表示为01。若当前RB块的CL为7,则表示编码8-bit的残差系数和1-bit的符号位。CL的确定是去找满足当前子块所有残差都在[-2^(M-1),2^(M-1)]范围之内的最小M值。若同时存在-2^(M-1)和2^(M-1)两个边界值,则M增加1,即需要M+1个比特编码当前RB块的所有残差;若仅存在-2^(M-1)和2^(M-1)两个边界值中的一个,则编码一个Trailing位来确定该边界值是-2^(M-1)还是2^(M-1);若所有残差均不存在-2^(M-1)和2^(M-1)中的任何一个,则无需编码该Trailing位。对于某些特殊的情况,编码器可以直接编码图像的原始值,而不是残差值。
随着深度学习的迅速发展,神经网络可在训练数据驱动下自适应地构建特征描述,具有更高的灵活性和普适性,使得深度学习在许多高层次的计算机视觉问题上取得成功,如图像分类、目标检测等,且深度学习也逐渐在编解码领域开始应用,即采用神经网络对图像编码和解码。例如,使用卷积神经网络VRCNN(Variable Filter Size Convolutional Neural Network)替代去块滤波技术和自适应样点补偿技术,对帧内编码后的图像进行后处理滤波,使得重建图像主、客观质量获得大幅提升。又例如,可以将神经网络应用 于帧内预测,提出基于块上、下采样的帧内预测模式,对于帧内预测的块,先进行下采样编码,再通过神经网络对重建像素进行上采样,对于超高清序列可获得高达9.0%的性能增益。显然,神经网络可以有效摆脱人为设定模式的局限性,通过数据驱动获得满足实际需求的神经网络,显著提升编码性能。
示例性的,虽然基于神经网络的编解码方法展现出巨大的性能潜力,但是,基于神经网络的编解码方法仍然存在稳定性较差、泛化性较差和复杂度较高等问题。首先,神经网络仍处于快速迭代的过程,新的网络结构层出不穷,即使对于某些常见问题,何种网络结构最优仍未有定数,更不用说对于编码器某些模块的特定问题,因此,对于编码器的某个模块,采用单一固定的神经网络,风险性较高。其次,由于神经网络的形成高度依赖于训练数据,若训练数据中不包含实际问题的某个特征,则在处理该问题时,容易出现性能不佳的情况。相关技术中,一个模式通常仅采用一个神经网络,当该神经网络出现泛化能力不足时,该模式将带来编码性能的降低。再次,由于视频编码需要处理的数据密集度高,又有低延时的要求,因此,视频编码标准技术对于复杂度,尤其是解码复杂度的要求格外严苛,而为了获得较好的编码性能,用于编码的神经网络的参数量往往非常大(如1M以上),应用一次神经网络产生的乘加次数平均到每个像素上可达100K次以上,而简化后的网络层数或参数较少的神经网络,虽然无法获得最优的编码性能,但采用此类神经网络带来的编解码复杂度可大大减少。另外,针对图像编码方案,多数方案是以整帧图像作为输入(缓存开销极大),同时无法有效控制输出码率,而在实际应用中,往往需要获得任意码率的图像。
针对上述发现,本公开实施例中提出一种基于神经网络的图像解码方法和图像编码方法,可以采用神经网络(如解码神经网络和编码神经网络等)对图像进行编码和解码,本实施例中,优化思路是:除了关注编码性能和解码性能,还需要关注复杂度(特别是系数编解码的并行度)与应用功能性(支持码率可变且可细调,即码率可控制)。基于上述优化思路,本公开实施例中给出几个可行方式:a)固定神经网络的结构,但不限定神经网络的网络参数(即权重参数),网络参数可以通过ID方式索引。b)神经网络的结构弹性,可以通过语法编码传递相关结构参数,网络参数可以通过ID方式索引(部分参数可以通过语法编码传递,增加一定码率代价)。c)神经网络的结构弹性(高层语法配置)、部分网络(如浅层网络)的网络参数固定(节省码率),剩余网络的网络参数通过编码语法传递(保持性能优化空间)。
以下结合几个具体实施例,对本公开实施例中的解码方法和编码方法进行详细说明。
实施例1:本公开实施例中提出一种基于神经网络的图像解码方法,参见图3所示,为该方法的流程示意图,该方法可以应用于解码端(也称为视频解码器),该方法可以包括步骤301-303。
步骤301、从码流中解码当前块对应的控制参数和图像信息。
步骤302、从该控制参数中获取解码处理单元对应的神经网络信息,并基于神经网络信息生成该解码处理单元对应的解码神经网络。
步骤303、基于该图像信息确定解码处理单元对应的输入特征,基于解码神经网络对该输入特征进行处理,得到该解码处理单元对应的输出特征。
在一种可能的实施方式中,若神经网络信息包括基本层信息和增强层信息,则可以基于基本层信息确定解码处理单元对应的基本层,并基于增强层信息确定解码处理单元对应的增强层;然后,基于基本层和增强层生成解码处理单元对应的解码神经网络。
示例性的,针对确定基本层的过程,若基本层信息包括基本层使用默认网络标志位,且基本层使用默认网络标志位表示基本层使用默认网络,则获取默认网络结构的基本层。
示例性的,针对确定基本层的过程,若基本层信息包括基本层使用预制网络标志位和基本层预制网络索引号,且基本层使用预制网络标志位表示基本层使用预制网络,则可以从预制神经网络池中选取基本层预制网络索引号对应的预制网络结构的基本层。
其中,预制神经网络池可以包括至少一个预制网络结构的网络层。
示例性的,针对确定增强层的过程,若增强层信息包括增强层使用默认网络标志位,且增强层使用默认网络标志位表示增强层使用默认网络,则获取默认网络结构的增强层。
示例性的,针对确定增强层的过程,若增强层信息包括增强层使用预制网络标志位和增强层预制网络索引号,且增强层使用预制网络标志位表示增强层使用预制网络,则可以从预制神经网络池中选取增强层预制网络索引号对应的预制网络结构的增强层。
其中,预制神经网络池包括至少一个预制网络结构的网络层。
示例性的,针对确定增强层的过程,若增强层信息包括用于生成增强层的网络参数,则基于该网络参数生成解码处理单元对应的增强层;其中,该网络参数可以包括但不限于以下至少一种:神经网络层数、反卷积层标志位、反卷积层的数量、每个反卷积层的量化步长、每个反卷积层的通道数、卷积核的尺寸、滤波数量、滤波尺寸索引、滤波系数为零标志位、滤波系数、激活层标志位、激活层类型。当然,上述只是几个示例,对此不作限制。
在一种可能的实施方式中,图像信息可以包括系数超参特征信息和图像特征信息,基于图像信息确定解码处理单元对应的输入特征,基于解码神经网络对输入特征进行处理,得到解码处理单元对应的输出特征,可以包括但不限于:在执行系数超参特征生成的解码过程时,可以基于该系数超参特征信息确定系数超参特征系数重建值;基于解码神经网络对该系数超参特征系数重建值进行反变换操作,得到系数超参特征值;其中,该系数超参特征值可以用于从码流中解码图像特征信息。在执行图像特征反变换的解码过程时,可以基于该图像特征信息确定图像特征重建值;基于解码神经网络对该图像特征重建值进行反变换操作,得到图像低阶特征值;其中,该图像低阶特征值可以用于获得当前块对应的重建图像块。
示例性的,基于系数超参特征信息确定系数超参特征系数重建值,可以包括但不限于:若该控制参数包括第一使能信息,且第一使能信息表示使能第一反量化操作,则可以对系数超参特征信息进行反量化,得到系数超参特征系数重建值。基于图像特征信息确定图像特征重建值,可以包括但不限于:若该控制参数包括第二使能信息,且第二使能信息表示使能第二反量化操作,则可以对图像特征信息进行反量化,得到图像特征重建值。
示例性的,若控制参数包括第三使能信息,且第三使能信息表示使能质量增强操作,在执行质量增强的解码过程时,可以获取图像低阶特征值,基于解码神经网络对该图像低阶特征值进行增强处理,得到当前块对应的重建图像块。
在一种可能的实施方式中,解码端设备可以包括控制参数解码单元、第一特征解码单元、第二特征解码单元、系数超参特征生成单元和图像特征反变换单元;图像信息可以包括系数超参特征信息和图像特征信息。其中:控制参数解码单元可以从码流中解码控制参数,第一特征解码单元可以从码流中解码系数超参特征信息,第二特征解码单元可以从码流中解码图像特征信息。在系数超参特征生成单元为解码处理单元时,可以基于系数超参特征信息确定系数超参特征系数重建值;系数超参特征生成单元可以基于解码神经网络对系数超参特征系数重建值进行反变换操作,得到系数超参特征值;其中,系数超参特征值用于使第二特征解码单元从码流中解码图像特征信息。在图像特征反变换单元为解码处理单元时,基于图像特征信息确定图像特征重建值;图像特征反变换单元可以基于解码神经网络对图像特征重建值进行反变换操作,得到图像低阶特征值;其中,图像低阶特征值用于获得当前块对应的重建图像块。
示例性的,解码端设备还包括第一反量化单元和第二反量化单元;其中:控制参数可以包括第一反量化单元的第一使能信息,若第一使能信息表示使能第一反量化单元,第一反量化单元可以从第一特征解码单元获取系数超参特征信息,并对系数超参特征信息进行反量化,得到系数超参特征系数重建值,将系数超参特征系数重建值提供给系数超参特征生成单元;控制参数可以包括第二反量化单元的第二使能信息,若第二使能信息表示使能第二反量化单元,所二反量化单元可以从第二特征解码单元获取图像特征信息,对图像特征信息进行反量化,得到图像特征重建值,并将图像特征重建值提供给图像特征反变换单元。
示例性的,解码端设备还可以包括质量增强单元;其中:控制参数可以包括质量增强单元的第三使能信息,若第三使能信息表示使能质量增强单元,那么,在质量增强单元为解码处理单元时,质量增强单元可以从图像特征反变换单元获取图像低阶特征值,并基于解码神经网络对图像低阶特征值进行增强处理,得到当前块对应的重建图像块。
示例性的,上述执行顺序只是为了方便描述给出的示例,在实际应用中,还可以改变步骤之间的执行顺序,对此执行顺序不做限制。而且,在其它实施例中,并不一定按照本说明书示出和描述的顺序来执行相应方法的步骤,其方法所包括的步骤可以比本说明书所描述的更多或更少。此外,本说明书中所描述的单个步骤,在其它实施例中可能被分解为多个步骤进行描述;本说明书中所描述的多个步骤,在其它实施例也可能被合并为单个步骤进行描述。
由以上技术方案可见,本公开实施例中,可以从码流中解码当前块对应的控制参数,从控制参数中获取解码处理单元对应的神经网络信息,并基于神经网络信息生成解码处理单元对应的解码神经网络,继而基于解码神经网络实现图像解码,提高解码性能。可 以基于编码处理单元对应的编码神经网络实现图像编码,提高编码性能。能够采用深度学习网络(如解码神经网络和编码神经网络等)对图像进行编码和解码,且通过码流传输神经网络信息,继而基于神经网络信息生成解码处理单元对应的解码神经网络,从而解决稳定性较差、泛化性较差和复杂度较高等问题,即稳定性较好,泛化性较好,复杂度较低。可以提供编解码复杂度动态调整的方案,相比单一深度学习网络的框架有更优的编码性能。由于是每个当前块对应控制参数,那么,从控制参数获取的神经网络信息就是针对当前块的神经网络信息,从而对每个当前块分别生成解码神经网络,即不同当前块的解码神经网络可能相同,也可能不同,从而块级别的解码神经网络,即解码神经网络是可以变动可以调整的。
实施例2:本公开实施例中提出一种基于神经网络的图像编码方法,参见图4所示,为该方法的流程示意图,该方法可以应用于编码端(也称为视频编码器),该方法可以包括步骤401-403。
步骤401、基于当前块确定编码处理单元对应的输入特征,基于编码处理单元对应的编码神经网络对输入特征进行处理,得到编码处理单元对应的输出特征,并基于输出特征确定当前块对应的图像信息,如系数超参特征信息和图像特征信息等图像信息。
步骤402、获取当前块对应的控制参数,该控制参数可以包括解码处理单元对应的神经网络信息,该神经网络信息用于确定解码处理单元对应的解码神经网络。
步骤403、在码流中编码当前块对应的图像信息和控制参数。
在一种可能的实施方式中,神经网络信息可以包括基本层信息和增强层信息,解码神经网络包括基于基本层信息确定的基本层、及基于增强层信息确定的增强层。
示例性的,若基本层信息包括基本层使用默认网络标志位,且基本层使用默认网络标志位表示基本层使用默认网络,则解码神经网络采用默认网络结构的基本层。
示例性的,若基本层信息包括基本层使用预制网络标志位和基本层预制网络索引号,且基本层使用预制网络标志位表示基本层使用预制网络,则解码神经网络可以采用从预制神经网络池中选取的基本层预制网络索引号对应的预制网络结构的基本层。
其中,预制神经网络池可以包括至少一个预制网络结构的网络层。
示例性的,若增强层信息包括增强层使用默认网络标志位,且增强层使用默认网络标志位表示增强层使用默认网络,则解码神经网络采用默认网络结构的增强层。
示例性的,若增强层信息包括增强层使用预制网络标志位和增强层预制网络索引号,且增强层使用预制网络标志位表示增强层使用预制网络,则解码神经网络可以采用从预制神经网络池中选取的增强层预制网络索引号对应的预制网络结构的增强层。
其中,预制神经网络池包括至少一个预制网络结构的网络层。
示例性的,若增强层信息包括用于生成增强层的网络参数,则解码神经网络可以采用基于网络参数生成的增强层;其中,该网络参数可以但不限于包括以下至少一种:神经网络层数、反卷积层标志位、反卷积层的数量、每个反卷积层的量化步长、每个反卷积层的通道数、卷积核的尺寸、滤波数量、滤波尺寸索引、滤波系数为零标志位、滤波系数、激活层标志位、激活层类型。当然,上述只是网络参数的几个示例,对此不作限制。
在一种可能的实施方式中,图像信息可以包括系数超参特征信息和图像特征信息,基于当前块确定编码处理单元对应的输入特征,基于编码处理单元对应的编码神经网络对输入特征进行处理,得到编码处理单元对应的输出特征,并基于输出特征确定当前块对应的图像信息,可以包括但不限于:在执行特征变换的编码过程中,可以基于编码神经网络对当前块进行特征变换,得到当前块对应的图像特征值;其中,图像特征值用于确定图像特征信息。在执行系数超参特征变换的编码过程中,可以基于编码神经网络对图像特征值进行系数超参特征变换,得到系数超参特征系数值,该系数超参特征系数值用于确定系数超参特征信息。
示例性的,在基于系数超参特征系数值确定系数超参特征信息的过程,可以对系数超参特征系数值进行量化,得到系数超参特征系数量化值,并基于系数超参特征系数量化值确定系数超参特征信息。其中,控制参数还可以包括第一使能信息,且第一使能信息用于表示已使能第一量化操作。
示例性的,在基于图像特征值确定图像特征信息的过程,可以对图像特征值进行量化,得到图像特征量化值,并基于图像特征量化值确定图像特征信息。其中,控制参数还可以包括第二使能信息,且第二使能信息用于表示已使能第二量化操作。
示例性的,获取当前块对应的控制参数,可以包括但不限于:基于特征变换的编码过程中采用的编码神经网络的网络结构,确定解码端设备的图像特征反变换的解码过程中采用的神经网络信息,该神经网络信息用于确定解码端设备的图像特征反变换的解码过程对应的解码神经网络;和/或,基于系数超参特征变换的编码过程中采用的编码神经网络的网络结构,确定解码端设备的系数超参特征生成的解码过程中采用的神经网络信息,该神经网络信息用于确定解码端设备的系数超参特征生成的解码过程对应的解码神经网络。
在一种可能的实施方式中,编码端设备可以包括控制参数编码单元、第一特征编码单元、第二特征编码单元、特征变换单元、系数超参特征变换单元;图像信息可以包括系数超参特征信息和图像特征信息。其中:控制参数编码单元在码流中编码控制参数,第一特征编码单元在码流中编码系数超参特征信息,第二特征编码单元在码流中编码图像特征信息;在特征变换单元为编码处理单元时,特征变换单元可以基于编码神经网络对当前块进行特征变换,得到当前块对应的图像特征值;其中,图像特征值用于确定图像特征信息;在系数超参特征变换单元为编码处理单元时,系数超参特征变换单元可以基于编码神经网络对图像特征值进行系数超参特征变换,得到系数超参特征系数值,系数超参特征系数值用于确定系数超参特征信息。
示例性的,编码端设备还可以包括第一量化单元和第二量化单元;其中:第一量化单元可以从系数超参特征变换单元获取系数超参特征系数值,对系数超参特征系数值进行量化,得到系数超参特征系数量化值,并基于系数超参特征系数量化值确定系数超参特征信息。其中,控制参数还可以包括第一量化单元的第一使能信息,且第一使能信息用于表示已使能第一量化单元。以及,第二量化单元可以从特征变换单元获取图像特征值,对图像特征值进行量化,得到图像特征量化值,并基于图像特征量化值确定图像特征信息;其中,控制参数还可以包括第二量化单元的第二使能信息,且第二使能信息用于表示已使能第二量化单元。
示例性的,获取当前块对应的控制参数,可以包括但不限于:基于特征变换单元的编码神经网络的网络结构,确定解码端设备的图像特征反变换单元对应的神经网络信息,该神经网络信息用于确定解码端设备的图像特征反变换单元对应的解码神经网络;和/或,基于系数超参特征变换单元的编码神经网络的网络结构,确定解码端设备的系数超参特征生成单元对应的神经网络信息,该神经网络信息用于确定解码端设备的系数超参特征生成单元对应的解码神经网络。
在一种可能的实施方式中,图像信息可以包括系数超参特征信息和图像特征信息,在执行系数超参特征生成的解码过程时,可以基于该系数超参特征信息确定系数超参特征系数重建值;基于解码神经网络对该系数超参特征系数重建值进行反变换操作,得到系数超参特征值;该系数超参特征值可以用于从码流中解码图像特征信息。在执行图像特征反变换的解码过程时,可以基于该图像特征信息确定图像特征重建值;基于解码神经网络对该图像特征重建值进行反变换操作,得到图像低阶特征值;该图像低阶特征值可以用于获得当前块对应的重建图像块。
示例性的,基于系数超参特征信息确定系数超参特征系数重建值,可以包括但不限于:可以对系数超参特征信息进行反量化,得到系数超参特征系数重建值。基于图像特征信息确定图像特征重建值,可以包括但不限于:可以对图像特征信息进行反量化,得到图像特征重建值。
示例性的,在执行质量增强的解码过程时,可以获取图像低阶特征值,基于解码神经网络对该图像低阶特征值进行增强处理,得到当前块对应的重建图像块。
示例性的,获取当前块对应的控制参数,可以包括但不限于以下至少一种:基于编码端设备的系数超参特征生成的编码过程中采用的解码神经网络的网络结构,确定解码端设备的系数超参特征生成的解码过程中采用的神经网络信息,该神经网络信息用于确定解码端设备的系数超参特征生成的解码过程中采用的解码神经网络。基于编码端设备的图像特征反变换的编码过程中采用的解码神经网络的网络结构,确定解码端设备的图像特征反变换的解码过程中采用的神经网络信息,该神经网络信息用于确定解码端设备的图像特征反变换的解码过程中采用的解码神经网络。基于编码端设备的质量增强的编码过程中采用的解码神经网络的网络结构,确定解码端设备的质量增强的解码过程中采用的神经网络信息,该神经网络信息用于确定解码端设备的质量增强的解码过程中采用的解码神经网络。
在一种可能的实施方式中,编码端设备可以包括第一特征解码单元、第二特征解码单元、系数超参特征生成单元和图像特征反变换单元;图像信息可以包括系数超参特征信息和图像特征信息。其中:第一特征解码单元可以从码流中解码系数超参特征信息, 第二特征解码单元可以从码流中解码图像特征信息;在基于系数超参特征信息确定系数超参特征系数重建值之后,系数超参特征生成单元可以基于解码神经网络对系数超参特征系数重建值进行反变换操作,得到系数超参特征值;其中,系数超参特征值用于使第二特征解码单元从码流中解码图像特征信息;以及,用于使第二特征编码单元在码流中编码图像特征信息;在基于图像特征信息确定图像特征重建值之后,图像特征反变换单元可以基于解码神经网络对图像特征重建值进行反变换操作,得到图像低阶特征值;其中,图像低阶特征值用于获得当前块对应的重建图像块。
示例性的,编码端设备还可以包括第一反量化单元和第二反量化单元;其中:第一反量化单元可以从第一特征解码单元获取系数超参特征信息,并对该系数超参特征信息进行反量化,得到系数超参特征系数重建值;第二反量化单元可以从第二特征解码单元获取图像特征信息,并对该图像特征信息进行反量化,得到图像特征重建值。
示例性的,编码端设备还包括质量增强单元;质量增强单元从图像特征反变换单元获取图像低阶特征值,基于解码神经网络对图像低阶特征值进行增强处理,得到重建图像块。
示例性的,获取当前块对应的控制参数,可以包括但不限于以下至少一种:可以基于编码端设备的系数超参特征生成单元的解码神经网络的网络结构,确定解码端设备的系数超参特征生成单元对应的神经网络信息,该神经网络信息用于确定解码端设备的系数超参特征生成单元对应的解码神经网络;可以基于编码端设备的图像特征反变换单元的解码神经网络的网络结构,确定解码端设备的图像特征反变换单元对应的神经网络信息,该神经网络信息用于确定解码端设备的图像特征反变换单元对应的解码神经网络;可以基于编码端设备的质量增强单元的解码神经网络的网络结构,确定解码端设备的质量增强单元对应的神经网络信息,该神经网络信息用于确定解码端设备的质量增强单元对应的解码神经网络。
在一种可能的实施方式中,基于当前块确定编码处理单元对应的输入特征之前,还可以将当前图像划分为N个互相不重合的图像块,N为正整数;可以对每个图像块进行边界填充,得到边界填充后的图像块;其中,在对每个图像块进行边界填充时,填充值可以不依赖相邻图像块的重建像素值;可以基于边界填充后的图像块生成N个当前块。
在一种可能的实施方式中,基于当前块确定编码处理单元对应的输入特征之前,还可以将当前图像划分为多个基本块,且每个基本块包括至少一个图像块;可以对每个图像块进行边界填充,得到边界填充后的图像块;其中,在对每个图像块进行边界填充时,该图像块的填充值不依赖同一基本块内其它图像块的重建像素值,且允许依赖不同基本块内图像块的重建像素值;可以基于边界填充后的图像块生成多个当前块。
示例性的,上述执行顺序只是为了方便描述给出的示例,在实际应用中,还可以改变步骤之间的执行顺序,对此执行顺序不做限制。而且,在其它实施例中,并不一定按照本说明书示出和描述的顺序来执行相应方法的步骤,其方法所包括的步骤可以比本说明书所描述的更多或更少。此外,本说明书中所描述的单个步骤,在其它实施例中可能被分解为多个步骤进行描述;本说明书中所描述的多个步骤,在其它实施例也可能被合并为单个步骤进行描述。
由以上技术方案可见,本公开实施例中,可以从码流中解码当前块对应的控制参数,从控制参数中获取解码处理单元对应的神经网络信息,并基于神经网络信息生成解码处理单元对应的解码神经网络,继而基于解码神经网络实现图像解码,提高解码性能。可以基于编码处理单元对应的编码神经网络实现图像编码,提高编码性能。能够采用深度学习网络(如解码神经网络和编码神经网络等)对图像进行编码和解码,且通过码流传输神经网络信息,继而基于神经网络信息生成解码处理单元对应的解码神经网络,从而解决稳定性较差、泛化性较差和复杂度较高等问题,即稳定性较好,泛化性较好,复杂度较低。可以提供编解码复杂度动态调整的方案,相比单一深度学习网络的框架有更优的编码性能。由于是每个当前块对应控制参数,那么,从控制参数获取的神经网络信息就是针对当前块的神经网络信息,从而对每个当前块分别生成解码神经网络,即不同当前块的解码神经网络可能相同,也可能不同,从而块级别的解码神经网络,即解码神经网络是可以变动可以调整的。
实施例3:本公开实施例中提出一种码率可变可调的基于神经网络的图像编码方法和图像解码方法,能够实现图像块的高并行,且码率可控可调,参见图5A所示,为图像编码方法和图像解码方法的示意图,示出了图像编码方法的编码过程和图像解码方法的解码过程。
示例性的,对于基于神经网络的图像编码方法,其编码过程可以包括以下步骤S11-S15。
步骤S11、分块单元将当前图像(即原始图像)划分为N个互相不重合的图像块(即原始图像块,可以记为原始图像块1、原始图像块2、...、原始图像块N),N为正整数。
步骤S12、对每个原始图像块进行边界填充,得到边界填充后的图像块,并基于边界填充后的图像块生成N个当前块,也就是说,分别对N个原始图像块进行边界填充,得到N个边界填充后的图像块,并将N个边界填充后的图像块作为N个当前块。
示例性的,参见图5B所示,在对每个原始图像块进行边界填充时,填充值可以不依赖相邻图像块的重建像素值,从而能够保证每个原始图像块可以独立并行编码,提高编码性能。
步骤S13、编码单元基于已编码块的信息确定当前块的编码参数,该编码参数用于控制当前块的编码码率大小(如量化步长等参数),对此编码参数不做限制。
步骤S14、控制单元将解码端需要且无法推导获得的控制参数写入码流。
步骤S15、将填充后的图像块(即当前块)输入基于神经网络的编码单元,由编码单元基于编码参数对当前块进行编码,并输出当前块的码流。示例性的,在编码单元基于编码参数对当前块进行编码时,编码单元可以采用神经网络对当前块进行编码。
示例性的,对于基于神经网络的图像解码方法,其解码过程可以包括以下步骤S21-S24。
步骤S21、解码端从码流中解码当前块需要且无法推导获得的控制参数。
步骤S22,基于控制参数和当前块的码流,通过基于神经网络的解码单元,获得当前块对应的重建图像块,即对当前块进行解码,得到重建图像块,如原始图像块1对应的重建图像块1、原始图像块2对应的重建图像块2、...、原始图像块N对应的重建图像块N。
示例性的,在解码单元对当前块进行解码时,可以采用神经网络对当前块进行解码。
步骤S23,基于控制参数,确定是否对某个当前块进行滤波过程,如果是,则滤波合并单元基于当前块和至少1个相邻重建图像块的信息进行滤波过程,得到滤波后的图像块。
步骤S24,将滤波后的图像块合并获得重建图像。
在一种可能的实施方式中,在对原始图像块进行边界填充时,填充值可以是填充预设值,填充预设值可以是编解码约定的默认值(如0,或1<<(1-depth)等,depth为比特深度,如8,10,12等),也可以是通过高层语法编码传递给解码端的值,还可以是基于当前块的像素进行镜像、最近邻复制等操作获得,对此填充值的获取方式不做限制。
在一种可能的实施方式中,在对原始图像块进行边界填充时,周围块填充拓展的尺寸,可以是编解码约定的默认值(如1、2、4等),也可以是与当前块的尺寸相关的值,还可以是通过高层语法编码传递给解码端的值,对此周围块填充拓展的尺寸不做限制。
实施例4:本公开实施例中提出一种码率可变可调的基于神经网络的图像编码方法和图像解码方法,能够实现图像块的高并行,且码率可控可调,在此基础上,还可以利用相邻块之间的信息(如相邻块的重建像素等),参见图5C所示,为图像编码方法和图像解码方法的示意图,示出了图像编码方法的编码过程和图像解码方法的解码过程。
示例性的,对于基于神经网络的图像编码方法,其编码过程可以包括以下步骤S31-S35。
步骤S31、分块单元将当前图像(即原始图像)划分为多个基本块,每个基本块包括至少一个图像块,以M个基本块为例,M为正整数,M个基本块共包括N个互相不重合的图像块(即原始图像块,记为原始图像块1、原始图像块2、...、原始图像块N),N为正整数。
步骤S32、对每个原始图像块进行边界填充,得到边界填充后的图像块,并基于边界填充后的图像块生成N个当前块,也就是说,分别对N个原始图像块进行边界填充,得到N个边界填充后的图像块,并将N个边界填充后的图像块作为N个当前块。
示例性的,在对每个原始图像块进行边界填充时,该原始图像块的填充值不依赖同一基本块内其它原始图像块的重建像素值,且允许依赖不同基本块内原始图像块的重建像素值。
本实施例中引入基本块的概念,参见图5D所示,每个基本块包括至少1个图像块 (即原始图像块),基本块内的各个图像块之间互相不参考,但可以参考与图像块处于不同基本块的其它图像块的重建信息。比如说,参见图5D所示,对于图像块1,其左侧的图像块(即图像块1左侧的相邻块)与图像块1处于同一个基本块中,因此,不能采用该图像块的重建信息作为图像块1的填充值,而是采用填充预设值对图像块进行填充。对于图像块1,其上侧的图像块(即图像块1上侧的相邻块)与图像块1处于不同的基本块中,因此,可以采用该图像块的重建信息作为图像块1的填充值。在采用该图像块的重建信息作为图像块1的填充值时,可以采用该图像块的重建值,且可以采用该图像块的滤波前的重建值。
显然,通过引入基本块,从而既能够保证并行度(基本块内各个图像块可以并行编解码),同时也有助于提升性能(可以利用相邻基本块内图像块的重建信息)。
步骤S33、编码单元基于已编码块的信息确定当前块的编码参数,该编码参数用于控制当前块的编码码率大小(如量化步长等参数),对此编码参数不做限制。
步骤S34、控制单元将解码端需要且无法推导获得的控制参数写入码流。
步骤S35、将填充后的图像块(即当前块)输入基于神经网络的编码单元,由编码单元基于编码参数对当前块进行编码,并输出当前块的码流。示例性的,在编码单元基于编码参数对当前块进行编码时,编码单元可以采用神经网络对当前块进行编码。
示例性的,对于基于神经网络的图像解码方法,其解码过程可以包括以下步骤S41-S44。
步骤S41、解码端从码流中解码当前块需要且无法推导获得的控制参数。
步骤S42,基于控制参数和当前块的码流,通过基于神经网络的解码单元,获得当前块对应的重建图像块,即对当前块进行解码,得到重建图像块,如原始图像块1对应的重建图像块1、原始图像块2对应的重建图像块2、...、原始图像块N对应的重建图像块N。
步骤S43,基于控制参数,确定是否对某个当前块进行滤波过程,如果是,则滤波合并单元基于当前块和至少1个相邻重建图像块的信息进行滤波过程,得到滤波后的图像块。
步骤S44,将滤波后的图像块合并获得重建图像。
在一种可能的实施方式中,在对原始图像块进行边界填充时,填充值可以是填充预设值,填充预设值可以是编解码约定的默认值(如0,或1<<(1-depth)等,depth为比特深度,如8,10,12等),也可以是通过高层语法编码传递给解码端的值,还可以是基于当前块的像素进行镜像、最近邻复制等操作获得,对此填充值的获取方式不做限制。
在一种可能的实施方式中,在对原始图像块进行边界填充时,周围块填充拓展的尺寸,可以是编解码约定的默认值(如1、2、4等),也可以是与当前块的尺寸相关的值,还可以是通过高层语法编码传递给解码端的值,对此周围块填充拓展的尺寸不做限制。
在一种可能的实施方式中,基本块内包括的图像块个数,可以是编解码约定的默认值(如1、4、16等),也可以是与当前图像的尺寸相关的值,还可以是通过高层语法编码传递给解码端的值,对此基本块内的图像块个数不做限制,可以根据实际需要进行选择。
在一种可能的实施方式中,在实施例3和实施例4中,对于原始图像的分块过程(步骤S11和步骤S31),也可以是对原始图像进行图像域变换,对图像域变换后的图像进行分块,相应的,可以将滤波后的图像块合并为一个图像,对这个图像进行图像域反变换,基于图像域反变换后图像获得重建图像,参见图5E所示。示例性的,对原始图像进行图像域变换,可以是RGB域图像到YUV域图像的变换过程(对应的图像域反变换就是YUV域图像到RGB域图像的反变换过程),也可以是引入小波变换或傅里叶变换等过程生成新域的图像(对应的图像域反变换就是逆小波变换或逆傅里叶变换过程)。图像域变换过程可以采用神经网络实现,也可以采用非神经网络实现,对此图像域变换过程不做限制。
实施例5:本公开实施例中提出一种基于神经网络的图像解码方法,可以应用于解码端(也称为视频解码器),参见图6A所示,为解码端的结构示意图,解码端可以包括控制参数解码单元、第一特征解码单元、第二特征解码单元、系数超参特征生成单元、图像特征反变换单元、第一反量化单元、第二反量化单元、质量增强单元。第一反量化单元、第二反量化单元、质量增强单元为可选单元,在某些场景下可以选择关闭或跳过这些可选单元的过程。
本实施例中,对于每个当前块(即图像块)来说,当前块对应的码流包括三部分:码流0(含有控制参数的码流)、码流1(含有系数超参特征信息的码流)、码流2(含有图像特征信息的码流)。系数超参特征信息和图像特征信息可以统称为图像信息。
示例性的,本实施例中的基于神经网络的图像解码方法,可以包括以下步骤S51-S58。
步骤S51、解码当前块对应的码流0,获得当前块对应的控制参数。比如说,控制参数解码单元可以解码当前块对应的码流0,获得当前块对应的控制参数,即控制参数解码单元可以从码流0中解码控制参数,该控制参数可以包括第一反量化单元的控制参数、第二反量化单元的控制参数、系数超参特征生成单元的控制参数、图像特征反变换单元的控制参数、质量增强单元的控制参数,关于控制参数的内容,可以参见后续实施例,在此不再赘述。
步骤S52、解码当前块对应的码流1,获得当前块对应的系数超参特征信息。比如说,第一特征解码单元可以解码当前块对应的码流1,获得当前块对应的系数超参特征信息,即第一特征解码单元可以从码流1中解码系数超参特征信息。
步骤S53、基于该系数超参特征信息确定系数超参特征系数重建值。
示例性的,若该控制参数包括第一反量化单元对应的第一使能信息,该第一使能信息可以表示使能第一反量化单元(即使能第一反量化单元执行第一反量化操作),该第一使能信息也可以表示不使能第一反量化单元。比如说,若第一使能信息为第一取值,则表示使能第一反量化单元,若第一使能信息为第二取值,则表示不使能第一反量化单元。
示例性的,若第一使能信息表示使能第一反量化单元,则系数超参特征信息可以为系数超参特征系数量化值C_q,第一反量化单元可以对系数超参特征系数量化值C_q进行反量化,得到系数超参特征系数重建值C’。若第一使能信息表示不使能第一反量化单元,则系数超参特征信息可以为系数超参特征系数重建值C’,即直接从码流1中解码出系数超参特征系数重建值C’。
步骤S54、对系数超参特征系数重建值C’进行反变换操作,得到系数超参特征值P。比如说,系数超参特征生成单元对系数超参特征系数重建值C’进行反变换操作,得到系数超参特征值P,如系数超参特征生成单元基于解码神经网络对系数超参特征系数重建值C’进行反变换操作,得到系数超参特征值P。
在一种可能的实施方式中,系数超参特征生成单元可以从控制参数中获取系数超参特征生成单元对应的神经网络信息1,并基于神经网络信息1生成系数超参特征生成单元对应的解码神经网络1。
此外,系数超参特征生成单元可以确定系数超参特征生成单元对应的输入特征(如系数超参特征系数重建值C’),并基于解码神经网络1对该输入特征进行处理(如对系数超参特征系数重建值C’进行反变换操作),得到系数超参特征生成单元对应的输出特征(如系数超参特征值P)。
示例性的,神经网络信息1可以包括基本层信息和增强层信息,系数超参特征生成单元可以基于该基本层信息确定系数超参特征生成单元对应的基本层,并基于该增强层信息确定系数超参特征生成单元对应的增强层。系数超参特征生成单元可以基于基本层和增强层生成系数超参特征生成单元对应的解码神经网络1,比如说,可以将基本层和增强层组合起来,得到解码神经网络1。
在得到解码神经网络1之后,系数超参特征生成单元就可以通过解码神经网络1对系数超参特征系数重建值C’进行反变换操作,得到系数超参特征值P,对此反变换操作过程不做限制。
步骤S55、解码当前块对应的码流2,获得当前块对应的图像特征信息。比如说,第二特征解码单元可以解码当前块对应的码流2,获得当前块对应的图像特征信息,即第二特征解码单元可以从码流2中解码图像特征信息。在解码当前块对应的码流2时,第二特征解码单元可以利用系数超参特征值P解码当前块对应的码流2,对此解码过程不做限制。
步骤S56、基于该图像特征信息确定图像特征重建值。
示例性的,若该控制参数包括第二反量化单元对应的第二使能信息,该第二使能信息可以表示使能第二反量化单元(即使能第二反量化单元执行第二反量化操作),该第二使能信息也可以表示不使能第二反量化单元。比如说,若第二使能信息为第一取值,则表示使能第二反量化单元,若第二使能信息为第二取值,则表示不使能第二反量化单 元。
示例性的,若第二使能信息表示使能第二反量化单元,则图像特征信息可以为图像特征量化值F_q,第二反量化单元可以获取图像特征量化值F_q,并对图像特征量化值F_q进行反量化,得到图像特征重建值F’。若第二使能信息表示不使能第二反量化单元,则图像特征信息可以为图像特征重建值F’,即直接从码流2中解码出图像特征重建值F’。
步骤S57、对图像特征重建值F’进行反变换操作,得到图像低阶特征值LF。比如说,图像特征反变换单元对图像特征重建值F’进行反变换操作,得到图像低阶特征值LF,如基于解码神经网络对图像特征重建值F’进行反变换操作,得到图像低阶特征值LF。
在一种可能的实施方式中,图像特征反变换单元可以从控制参数中获取图像特征反变换单元对应的神经网络信息2,并基于神经网络信息2生成图像特征反变换单元对应的解码神经网络2。图像特征反变换单元可以确定图像特征反变换单元对应的输入特征(如图像特征重建值F’),并基于解码神经网络2对该输入特征进行处理(如对图像特征重建值F’进行反变换操作),得到图像特征反变换单元对应的输出特征(如图像低阶特征值LF)。
示例性的,神经网络信息2可以包括基本层信息和增强层信息,图像特征反变换单元可以基于基本层信息确定图像特征反变换单元对应的基本层,基于增强层信息确定图像特征反变换单元对应的增强层。图像特征反变换单元可以基于基本层和增强层生成图像特征反变换单元对应的解码神经网络2,比如说,可以将基本层和增强层组合起来,得到解码神经网络2。
在得到解码神经网络2之后,图像特征反变换单元就可以通过解码神经网络2对图像特征重建值F’进行反变换操作,得到图像低阶特征值LF,对此反变换操作过程不做限制。
步骤S58、基于图像低阶特征值LF确定当前块对应的重建图像块I。
示例性的,若该控制参数包括质量增强单元对应的第三使能信息,该第三使能信息可以表示使能质量增强单元(即使能质量增强单元执行质量增强操作),该第三使能信息也可以表示不使能质量增强单元。比如说,若第三使能信息为第一取值,则可以表示使能质量增强单元,若第三使能信息为第二取值,则可以表示不使能质量增强单元。
示例性的,若第三使能信息表示使能质量增强单元,则质量增强单元获取图像低阶特征值LF,对图像低阶特征值LF进行增强处理,得到当前块对应的重建图像块I。若第三使能信息表示不使能质量增强单元,则将图像低阶特征值LF作为当前块对应的重建图像块I。
示例性的,质量增强单元对图像低阶特征值LF进行增强处理时,质量增强单元可以基于解码神经网络对图像低阶特征值LF进行增强处理,得到当前块对应的重建图像块I。
在一种可能的实施方式中,质量增强单元可以从控制参数中获取质量增强单元对应的神经网络信息3,并基于神经网络信息3生成质量增强单元对应的解码神经网络3。
质量增强单元可以确定质量增强单元对应的输入特征(如图像低阶特征值LF),并基于解码神经网络3对该输入特征进行处理(如对图像低阶特征值LF进行增强处理),得到质量增强单元对应的输出特征(如当前块对应的重建图像块I)。
示例性的,神经网络信息3可以包括基本层信息和增强层信息,质量增强单元可以基于基本层信息确定质量增强单元对应的基本层,基于增强层信息确定质量增强单元对应的增强层。质量增强单元可以基于基本层和增强层生成质量增强单元对应的解码神经网络3,比如说,可以将基本层和增强层组合起来,得到解码神经网络3。
在得到解码神经网络3之后,质量增强单元就可以通过解码神经网络3对图像低阶特征值LF进行增强处理,得到当前块对应的重建图像块I,对此增强处理过程不做限制。
实施例6:本公开实施例中提出一种基于神经网络的图像解码方法,可以应用于解码端,参见图6B所示,为解码端的结构示意图,解码端可以包括控制参数解码单元、第一特征解码单元、第二特征解码单元、系数超参特征生成单元、图像特征反变换单元。本实施例中,对于每个当前块(即图像块)来说,当前块对应的码流包括三部分:码流0(含有控制参数的码流)、码流1(含有系数超参特征信息的码流)、码流2(含有图像特征信息的码流)。
示例性的,本实施例中的基于神经网络的图像解码方法,可以包括以下步骤S61-S66。
步骤S61、控制参数解码单元解码当前块对应的码流0,获得当前块对应的控制参数。
步骤S62、第一特征解码单元解码当前块对应的码流1,获得当前块对应的系数超参特征信息,该系数超参特征信息可以为系数超参特征系数重建值C’。
步骤S63、系数超参特征生成单元对系数超参特征系数重建值C’进行反变换操作,得到系数超参特征值P,如基于解码神经网络对系数超参特征系数重建值C’进行反变换操作,得到系数超参特征值P。
步骤S64、第二特征解码单元解码当前块对应的码流2,获得当前块对应的图像特征信息,该图像特征信息可以为图像特征重建值F’。示例性的,第二特征解码单元可以获取系数超参特征值P,并利用系数超参特征值P解码当前块对应的码流2,得到图像特征重建值F’。
步骤S65、图像特征反变换单元对图像特征重建值F’进行反变换操作,得到图像低阶特征值LF,如基于解码神经网络对图像特征重建值F’进行反变换操作得到图像低阶特征值LF。
步骤S66、基于图像低阶特征值LF确定当前块对应的重建图像块I。
比如说,可以直接将图像低阶特征值LF作为重建图像块I。或者,若解码端设备还包括质量增强单元,则由质量增强单元对图像低阶特征值LF进行增强处理,得到当前块对应的重建图像块I,如基于解码神经网络对图像低阶特征值LF进行增强处理,得到重建图像块I。
实施例7:针对实施例5和实施例6,第一特征解码单元可以解码当前块对应的码流1,获得当前块对应的系数超参特征信息。比如说,第一特征解码单元至少包括一个系数解码模块,在一种可能的实施方式中,系数解码模块可以采用熵解码方法进行系数解码,即采用熵解码方法解码当前块对应的码流1,获得当前块对应的系数超参特征信息。
示例性的,熵解码方法可以包括但不限于CAVLC(Context-Adaptive Varialbe Length Coding,基于内容的自适应变长编码)或CABAC(Context-based Adaptive Binary Arithmetic Coding,基于上下文的自适应二进制算术编码)等熵解码方法,对此不做限制。
示例性的,在采用熵解码方法进行系数解码时,熵解码的概率模型可以采用预设概率模型,预设概率模型可以根据实际需要进行配置,对此预设概率模型不做限制。比如说,基于预设概率模型,系数解码模块可以采用熵解码方法进行系数解码。
实施例8:针对实施例5,第一反量化单元可以对系数超参特征系数量化值C_q(即系数超参特征信息)进行反量化,得到系数超参特征系数重建值C’。比如说,第一反量化单元可以不存在,或者,若第一反量化单元存在,则可以基于控制参数(如高层语法,如第一使能信息等)选择性跳过第一反量化单元,或基于控制参数确定使能第一反量化单元。
示例性的,若第一反量化单元不存在,则系数超参特征系数重建值C’与系数超参特征系数量化值C_q相同,即不需要对系数超参特征系数量化值C_q进行反量化。若基于控制参数选择性跳过第一反量化单元,则系数超参特征系数重建值C’与系数超参特征系数量化值C_q相同,即不需要对系数超参特征系数量化值C_q进行反量化。若基于控制参数确定使能第一反量化单元,但是,系数超参特征系数量化值C_q对应的步长参数qstep为1,则系数超参特征系数重建值C’与系数超参特征系数量化值C_q相同,即不需要对系数超参特征系数量化值C_q进行反量化。
示例性的,若基于控制参数确定使能第一反量化单元,且系数超参特征系数量化值C_q对应的步长参数qstep不为1,则第一反量化单元可以基于控制参数(如量化相关参数)对系数超参特征系数量化值C_q进行反量化,得到系数超参特征系数重建值C’。比如说,第一反量化单元进行如下操作:从控制参数中获取系数超参特征系数量化值C_q对应的量化相关参数(在码流中包括控制参数,且该控制参数可以包括量化相关参数),如步长参数qstep或量化参数qp。基于步长参数qstep或量化参数qp确定系数超参特征系数量化值C_q对应的乘法因子mult和移位因子shift;假设系数超参特征系数量化值C_q为Coff_hyper,系数超参特征系数重建值C’为Coff_hyper_rec,则Coff_hyper_rec=(Coff_hyper*mult)<<shift。综上所述,在对系数超参特征系数量化值C_q进行反量化时,可以采用上述公式得到系数超参特征系数重建值C’。
需要说明的是,针对系数超参特征系数量化值C_q对应的量化相关参数(如步长参数qstep),可以包括:1)每个特征通道的每个系数超参特征系数量化值都采用相同的步长参数qstep;2)每个特征通道的系数超参特征系数量化值采用不同的步长参数qstep,但是,特征通道内的每个系数超参特征系数量化值采用相同的步长参数qstep;3)每个特征通道的每个系数超参特征系数量化值都采用不同的步长参数qstep。在上述过程中,步长参数qstep也可以称为量化步长。
实施例9:针对实施例5和实施例6,系数超参特征生成单元可以基于解码神经网络对系数超参特征系数重建值C’进行反变换操作,得到系数超参特征值P。在一种可能的实施方式中,参见图6C所示,系数超参特征生成单元可以包括解码神经网络1,解码神经网络1可以包括基本层和增强层,系数超参特征系数重建值C’作为解码神经网络1的输入特征,系数超参特征值P作为解码神经网络1的输出特征,解码神经网络1用于对系数超参特征系数重建值C’进行反变换操作。
本实施例中,将解码神经网络1划分为基本层和增强层,基本层可以包括至少一个网络层,基本层也可以不包括网络层,即基本层为空。增强层可以包括至少一个网络层,增强层也可以不包括网络层,即增强层为空。需要注意的是,对于解码神经网络1中的多个网络层来说,可以根据实际需要将多个网络层划分为基本层和增强层,比如说,将前面M1个网络层作为基本层,将剩余网络层作为增强层,或者,将前面M2个网络层作为增强层,将剩余网络层作为基本层,或者,将后面M3个网络层作为基本层,将剩余网络层作为增强层,或者,将后面M4个网络层作为增强层,将剩余网络层作为基本层,或者,将第奇数个网络层作为基本层,将剩余网络层作为增强层,或者,将第偶数个网络层作为基本层,将剩余网络层作为增强层。当然,上述只是几个示例,对此划分方式不做限制。
比如说,可以将网络结构固定的网络层作为基本层,将网络结构不固定的网络层作为增强层。例如,对于解码神经网络1中某个网络层,若对多个图像块进行解码时,该网络层会采用相同网络结构,则将该网络层作为网络结构固定的网络层,将该网络层作为基本层。又例如,对于解码神经网络1中某个网络层,若对多个图像块进行解码时,该网络层会采用不同网络结构,则将该网络层作为网络结构不固定的网络层,将该网络层作为增强层。
示例性的,对于解码神经网络1来说,输出特征的尺寸可以大于输入特征的尺寸,或者,输出特征的尺寸可以等于输入特征的尺寸,或者,输出特征的尺寸可以小于输入特征的尺寸。
对于解码神经网络1来说,基本层和增强层中至少包括1个反卷积层。比如说,基本层至少包括1个反卷积层,增强层可以包括至少包括1个反卷积层,或不包括反卷积层。或者,增强层至少包括1个反卷积层,基本层可以包括至少包括1个反卷积层,或不包括反卷积层。
示例性的,对于解码神经网络1来说,可以包括但不限于反卷积层和激活层等,对此不做限制。比如说,解码神经网络1依次包括1个stride为2的反卷积层、1个激活层、1个stride为1的反卷积层、1个激活层。可以将上述所有网络层均作为基本层,即基本层包括1个stride为2的反卷积层、1个激活层、1个stride为1的反卷积层、1个激活层,在该情况下,增强层为空,当然,也可以将部分网络层作为增强层,对此不做限制。又例如,解码神经网络1依次包括1个stride为2的反卷积层、1个stride为1的反卷积层。可以将上述所有网络层均作为基本层,即基本层包括1个stride为2的反卷积层、1个stride为1的反卷积层,在该情况下,增强层为空,当然,也可以将部分网络层作为增强层,对此不做限制。又例如,解码神经网络1依次包括1个stride为2的反卷积层、1个激活层、1个stride为2的反卷积层、1个激活层、1个stride为1的反卷积层、1个激活层。可以将上述所有网络层均作为基本层,即基本层包括1个stride为2的反卷积层、1个激活层、1个stride为2的反卷积层、1个激活层、1个stride为1的反卷积层、1个激活层,在该情况下,增强层为空,当然,也可以将部分网络层作为增强层,对此不做限制。当然,上述只是几个示例,对此不做限制。
在一种可能的实施方式中，可以为系数超参特征生成单元配置默认网络结构的网络层（可以由至少一个网络层组成默认网络结构的网络层），与默认网络结构的网络层有关的网络参数均为固定。比如说，该网络参数可以包括但不限于以下至少一种：神经网络层数、反卷积层标志位、反卷积层的数量、每个反卷积层的量化步长、每个反卷积层的通道数、卷积核的尺寸、滤波数量、滤波尺寸索引、滤波系数为零标志位、滤波系数、激活层的数量、激活层标志位、激活层类型，也就是说，上述网络参数均为固定。比如说，在默认网络结构的网络层中，反卷积层的数量为固定，激活层的数量为固定，每个反卷积层的通道数为固定，卷积核的尺寸为固定，滤波系数为固定等，如反卷积层的通道数为4、8、16、32、64、128或256等，卷积核的尺寸为1*1、3*3或5*5等。显然，由于默认网络结构的网络层中的网络参数均为固定，且已知这些网络参数，因此，可以直接获得默认网络结构的网络层。
在一种可能的实施方式中,可以为系数超参特征生成单元配置预制神经网络池,该预制神经网络池可以包括至少一个预制网络结构的网络层(可以由至少一个网络层组成预制网络结构的网络层),与预制网络结构的网络层有关的网络参数均可以根据实际需要进行配置,比如说,该网络参数可以包括但不限于以下至少一种:神经网络层数、反卷积层标志位、反卷积层的数量、每个反卷积层的量化步长、每个反卷积层的通道数、卷积核的尺寸、滤波数量、滤波尺寸索引、滤波系数为零标志位、滤波系数、激活层的数量、激活层标志位、激活层类型,也就是说,上述网络参数均可以根据实际需要进行配置。
比如说,预制神经网络池可以包括预制网络结构s1的网络层、预制网络结构s2的网络层、预制网络结构s3的网络层。其中,对于预制网络结构s1的网络层,可以预先配置反卷积层的数量、激活层的数量、每个反卷积层的通道数、卷积核的尺寸、滤波系数等网络参数,在所有网络参数均配置完成后,就可以获得预制网络结构s1的网络层。同理,可以获得预制网络结构s2的网络层和预制网络结构s3的网络层,在此不再重复赘述。
在一种可能的实施方式中,可以基于网络参数为系数超参特征生成单元动态生成可变网络结构的网络层(可以由至少一个网络层组成可变网络结构的网络层),与可变网络结构的网络层有关的网络参数是编码端动态生成的,而不是预先配置的,比如说,编码端可以将系数超参特征生成单元对应的网络参数发送给解码端,该网络参数可以包括但不限于以下至少一种:神经网络层数、反卷积层标志位、反卷积层的数量、每个反卷积层的量化步长、每个反卷积层的通道数、卷积核的尺寸、滤波数量、滤波尺寸索引、滤波系数为零标志位、滤波系数、激活层的数量、激活层标志位、激活层类型。解码端基于上述网络参数动态生成可变网络结构的网络层。比如说,编码端在码流中编码反卷积层的数量、激活层的数量、每个反卷积层的通道数、卷积核的尺寸、滤波系数等网络参数,解码端可以从码流中解析出上述网络参数,并基于这些网络参数生成可变网络结构的网络层,对此生成过程不做限制。
在一种可能的实施方式中,可以将解码神经网络1划分为基本层和增强层,关于基本层和增强层的组合方式,可以包括但不限于如下方式:方式1、基本层采用默认网络结构的网络层,增强层采用默认网络结构的网络层,将该基本层和该增强层组成解码神经网络1。方式2、基本层采用默认网络结构的网络层,增强层采用预制网络结构的网络层,将该基本层和该增强层组成解码神经网络1。方式3、基本层采用默认网络结构的网络层,增强层采用可变网络结构的网络层,将该基本层和该增强层组成解码神经网络1。方式4、基本层采用预制网络结构的网络层,增强层采用默认网络结构的网络层,将该基本层和该增强层组成解码神经网络1。方式5、基本层采用预制网络结构的网络层,增强层采用预制网络结构的网络层,将该基本层和该增强层组成解码神经网络1。方式6、基本层采用预制网络结构的网络层,增强层采用可变网络结构的网络层,将该基本层和该增强层组成解码神经网络1。
在一种可能的实施方式中,控制参数可以包括系数超参特征生成单元对应的神经网络信息1,系数超参特征生成单元可以从控制参数中解析出神经网络信息1,并基于神经网络信息1生成解码神经网络1。比如说,神经网络信息1可以包括基本层信息和增强层信息,可以基于该基本层信息确定基本层,并基于该增强层信息确定增强层,并将基本层和增强层组合起来,得到解码神经网络1。比如说,可以采用如下情况得到系数超参特征生成单元对应的神经网络信息1:
情况1、基本层信息包括基本层使用默认网络标志位,且基本层使用默认网络标志位表示基本层使用默认网络,在该情况下,系数超参特征生成单元基于基本层信息获知基本层采用默认网络结构的网络层,因此,获取默认网络结构的基本层(即默认网络结构的网络层)。增强层信息包括增强层使用默认网络标志位,且增强层使用默认网络标志位表示增强层使用默认网络,在该情况下,系数超参特征生成单元基于增强层信息获知增强层采用默认网络结构的网络层,因此,获取默认网络结构的增强层(即默认网络结构的网络层)。在此基础上,可以将默认网络结构的基本层和默认网络结构的增强层组合起来,得到解码神经网络1。
情况2、基本层信息包括基本层使用默认网络标志位,且基本层使用默认网络标志位表示基本层使用默认网络,在该情况下,系数超参特征生成单元基于基本层信息获知基本层采用默认网络结构的网络层,因此,获取默认网络结构的基本层(即默认网络结构的网络层)。增强层信息包括增强层使用预制网络标志位和增强层预制网络索引号, 且增强层使用预制网络标志位表示增强层使用预制网络,在该情况下,系数超参特征生成单元基于增强层信息获知增强层采用预制网络结构的网络层,因此,从预制神经网络池中选取增强层预制网络索引号对应的预制网络结构的增强层(如增强层预制网络索引号为0时,预制网络结构s1的网络层作为增强层,增强层预制网络索引号为1时,预制网络结构s2的网络层作为增强层)。在此基础上,可以将默认网络结构的基本层和预制网络结构的增强层组合起来,得到解码神经网络1。
情况3、基本层信息包括基本层使用默认网络标志位,且基本层使用默认网络标志位表示基本层使用默认网络,在该情况下,系数超参特征生成单元基于基本层信息获知基本层采用默认网络结构的网络层,因此,获取默认网络结构的基本层(即默认网络结构的网络层)。增强层信息包括用于生成增强层的网络参数,在该情况下,系数超参特征生成单元可以从控制参数中解析出网络参数,并基于该网络参数生成可变网络结构的增强层(即可变网络结构的网络层)。其中,该网络参数可以包括但不限于以下至少一种:神经网络层数、反卷积层标志位、反卷积层的数量、每个反卷积层的量化步长、每个反卷积层的通道数、卷积核的尺寸、滤波数量、滤波尺寸索引、滤波系数为零标志位、滤波系数、激活层的数量、激活层标志位、激活层类型,系数超参特征生成单元可以基于上述网络参数生成可变网络结构的增强层。比如说,系数超参特征生成单元从控制参数中解析出反卷积层的数量、每个反卷积层的量化步长(stride)、每个反卷积层的通道数、卷积核的尺寸、激活层类型等网络参数,系数超参特征生成单元基于这些网络参数生成可变网络结构的增强层。在此基础上,系数超参特征生成单元可以将默认网络结构的基本层和可变网络结构的增强层组合起来,得到解码神经网络1。
情况4、基本层信息包括基本层使用预制网络标志位和基本层预制网络索引号,且基本层使用预制网络标志位表示基本层使用预制网络,在该情况下,系数超参特征生成单元基于基本层信息获知基本层采用预制网络结构的网络层,因此,从预制神经网络池中选取基本层预制网络索引号对应的预制网络结构的基本层(如基本层预制网络索引号为0时,预制网络结构s1的网络层作为基本层,基本层预制网络索引号为1时,预制网络结构s2的网络层作为基本层)。增强层信息包括增强层使用默认网络标志位,且增强层使用默认网络标志位表示增强层使用默认网络,在该情况下,系数超参特征生成单元基于增强层信息获知增强层采用默认网络结构的网络层,因此,获取默认网络结构的增强层(即默认网络结构的网络层)。在此基础上,可以将预制网络结构的基本层和默认网络结构的增强层组合起来,得到解码神经网络1。
情况5、基本层信息包括基本层使用预制网络标志位和基本层预制网络索引号,且基本层使用预制网络标志位表示基本层使用预制网络,在该情况下,系数超参特征生成单元基于基本层信息获知基本层采用预制网络结构的网络层,因此,可以从预制神经网络池中选取基本层预制网络索引号对应的预制网络结构的基本层(如预制网络结构s1的网络层等)。增强层信息包括增强层使用预制网络标志位和增强层预制网络索引号,且增强层使用预制网络标志位表示增强层使用预制网络,在该情况下,系数超参特征生成单元基于增强层信息获知增强层采用预制网络结构的网络层,因此,可以从预制神经网络池中选取增强层预制网络索引号对应的预制网络结构的增强层(如预制网络结构s1的网络层等)。在此基础上,可以将预制网络结构的基本层和预制网络结构的增强层组合起来,得到解码神经网络1。
情况6、基本层信息包括基本层使用预制网络标志位和基本层预制网络索引号,且基本层使用预制网络标志位表示基本层使用预制网络,在该情况下,系数超参特征生成单元基于基本层信息获知基本层采用预制网络结构的网络层,因此,可以从预制神经网络池中选取基本层预制网络索引号对应的预制网络结构的基本层(如预制网络结构s1的网络层等)。增强层信息包括用于生成增强层的网络参数,在该情况下,系数超参特征生成单元可以从控制参数中解析出网络参数,并基于该网络参数生成可变网络结构的增强层(即可变网络结构的网络层)。比如说,系数超参特征生成单元可以从控制参数中解析出反卷积层的数量、每个反卷积层的量化步长(stride)、每个反卷积层的通道数、卷积核的尺寸、激活层类型等网络参数,系数超参特征生成单元基于这些网络参数生成可变网络结构的增强层。在此基础上,系数超参特征生成单元可以将预制网络结构的基本层和可变网络结构的增强层组合起来,得到解码神经网络1。
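示例性的，针对上述情况1至情况6，下面给出一个示意性的Python草图，说明解码端如何依据基本层信息与增强层信息选择默认网络、预制网络或基于网络参数动态生成可变网络；其中的字段名以及build_layer_from_params等函数均为说明用的假设接口。

```python
def build_layer(layer_info, default_layer, predesigned_pool, build_layer_from_params):
    """根据某一层（基本层或增强层）的信息确定该层的网络结构（示意）。"""
    if layer_info.get("use_default_flag"):
        # 使用默认网络标志位有效：采用默认网络结构的网络层
        return default_layer
    if layer_info.get("use_predesigned_flag"):
        # 使用预制网络标志位有效：按预制网络索引号从预制神经网络池中选取
        return predesigned_pool[layer_info["predesigned_id"]]
    # 否则：基于码流中解析出的网络参数动态生成可变网络结构的网络层
    return build_layer_from_params(layer_info["network_params"])

def build_decode_network1_from_info(nn_info1, ctx):
    basic = build_layer(nn_info1["basic_layer"], ctx["default_basic"],
                        ctx["basic_pool"], ctx["build_layer_from_params"])
    enhance = build_layer(nn_info1["enhance_layer"], ctx["default_enhance"],
                          ctx["enhance_pool"], ctx["build_layer_from_params"])
    # 将基本层和增强层组合起来，得到解码神经网络1
    return basic, enhance
```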
在一种可能的实施方式中,针对基本层和增强层的上述网络结构,均可以通过解码的控制参数确定,控制参数可以包括神经网络信息1,且神经网络信息1可以包括基本层信息和增强层信息,针对系数超参特征生成单元的神经网络信息1的一个示例,可以参见表1所示,在表1中,u(n)表示n位定长码编码方法,ae(v)表示变长编码方法。
表1
在表1中,hyper_basic_layer_use_default_para_flag为针对系数超参特征生成单元的基本层使用默认网络标志位,hyper_basic_layer_use_default_para_flag为二值变量,值为1表示系数超参特征生成单元的基本层使用默认网络,值为0表示系数超参特征生成单元的基本层不使用默认网络,HyperBasicLayerUseDefaultParaFlag的值等于hyper_basic_layer_use_default_para_flag的值。
在表1中,hyper_basic_layer_use_predesigned_para_flag为针对系数超参特征生成单元的基本层使用预制网络标志位,hyper_basic_layer_use_predesigned_para_flag为一个二值变量。该二值变量的值为1时,表示系数超参特征生成单元的基本层使用预制网络,而该二值变量的值为0时,表示系数超参特征生成单元的基本层不使用预制网络,HyperBasicLayerUsePredesignedParaFlag的值可以等于hyper_basic_layer_use_predesigned_para_flag的值。
在表1中,hyper_basic_id为针对系数超参特征生成单元的基本层预制网络索引号,可以为32位的无符号整数,表示基本层采用的神经网络在预制神经网络池中的索引号。
在表1中,hyper_enhance_layer_use_default_para_flag为针对系数超参特征生成单元的增强层使用默认网络标志位,hyper_enhance_layer_use_default_para_flag为一个二值变量。该二值变量的值为1时,则表示系数超参特征生成单元的增强层使用默认网络,而该二值变量的值为0时,则表示系数超参特征生成单元的增强层不使用默认网络,HyperEnhanceLayerUseDefaultParaFlag的值可以等于hyper_enhance_layer_use_default_para_flag的值。
在表1中,hyper_enhance_layer_use_predesigned_para_flag为针对系数超参特征生成单元的增强层使用预制网络标志位,hyper_enhance_layer_use_predesigned_para_flag为二值变量。该二值变量的值为1时,表示系数超参特征生成单元的增强层使用预制网络,该二值变量的值为0时,表示系数超参特征生成单元的增强层不使用预制网络,HyperEnhanceLayerUsePredesignedParaFlag的值可以等于hyper_enhance_layer_use_predesigned_para_flag的值。
在表1中,hyper_enhance_id为针对系数超参特征生成单元的增强层预制网络索引号,可以为32位的无符号整数,表示增强层采用的神经网络在预制神经网络池中的索引号。
在上述过程中，hyper_basic_id的范围是[id_min,id_max]，id_min优选为0，id_max优选为2^32-1，[a,b]段为保留段，用于后期预制神经网络池的扩充。需要说明的是，针对基本层的预制神经网络池，该预制神经网络池可以包括几个基本层预制网络，如2个、3个、4个等，可以包括几十个基本层预制网络，还可以包括更多的基本层预制网络，对此不做限制。显然，id_max优选为2^32-1只是示例，不同情况下，id_max的取值可以动态调整。
在上述过程中,hyper_enhance_id的范围是[id_min,id_max],id_min优选为0,id_max优选为2^32-1,[a,b]段为保留段,用于后期预制神经网络池的扩充。需要说明的是,针对增强层的预制神经网络池,该预制神经网络池可以包括几个增强层预制网络,如2个、3个、4个等,可以包括几十个增强层预制网络,还可以包括更多的增强层预制网络,对此不做限制。显然,id_max优选为2^32-1只是示例,不同情况下,id_max的取值可以动态调整。
在一种可能的实施方式中,针对增强层的上述网络结构,可以通过解码的控制参数确定,控制参数可以包括神经网络信息1,且神经网络信息1可以包括增强层信息,针对系数超参特征生成单元的神经网络信息1的一个示例,可以参见表2和表3所示。
表2
表3
在表2中，layer_num表示神经网络层数，用于表示神经网络的网络层数，若激活层包含在某个网络结构中，则不额外计算层数，LayerNum的值等于layer_num。
在表3中,deconv_layer_flag表示反卷积层标志位,deconv_layer_flag是一个二值变量。当该二值变量的值为1时,表示当前层是反卷积层网络,当该二值变量的值为0时,表示当前层不是反卷积层网络,DeconvLayerFlag的值等于deconv_layer_flag的值。
在表3中,stride_num表示反卷积层的量化步长。
在表3中,filter_num表示滤波数量,即当前层的滤波数量。
在表3中,filter_size_index表示滤波尺寸索引,即当前滤波尺寸索引值。
在表3中,filter_coeff_zero_flag[i][j]表示滤波系数为零标志位,是一个二值变量。当该二值变量的值为1时,表示当前滤波系数为0,当该二值变量的值为0时,表示当前滤波系数不为0,FilterCoeffZeroFlag[i][j]的值等于filter_coeff_zero_flag[i][j]的值。
在表3中，filter_coeff[i][j]表示滤波系数，即当前滤波系数值。
在表3中,activation_layer_flag表示激活层标志位,activation_layer_flag是一个二值变量。当该二值变量的值为1时,表示当前层是激活层,当该二值变量的值为0时,表示当前层不是激活层,ActivationLayerFlag的值等于activation_layer_flag的值。
在表3中,activation_layer_type表示激活层类型,即当前层的激活层的具体类型。
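示例性的，结合表2与表3中layer_num、deconv_layer_flag、stride_num、filter_num、filter_size_index、filter_coeff_zero_flag、filter_coeff、activation_layer_flag、activation_layer_type等语法元素，下面给出一个示意性的解析草图；其中read_bits、read_ae等读取接口、各语法元素的位宽以及滤波尺寸索引到实际尺寸的映射均为说明用的假设，并非对表2、表3的精确还原。

```python
FILTER_SIZES = {0: 1, 1: 3, 2: 5}  # 假设：filter_size_index到卷积核尺寸的映射

def parse_variable_layers(reader):
    """按表2/表3的思路逐层解析用于生成可变网络结构的网络参数（示意）。"""
    layers = []
    layer_num = reader.read_bits(8)                 # layer_num：神经网络层数
    for _ in range(layer_num):
        layer = {"is_deconv": reader.read_bits(1)}  # deconv_layer_flag
        if layer["is_deconv"]:
            layer["stride"] = reader.read_bits(2)          # stride_num
            layer["filter_num"] = reader.read_bits(8)      # filter_num
            size = FILTER_SIZES[reader.read_bits(2)]       # filter_size_index
            coeffs = []
            for i in range(layer["filter_num"]):
                row = []
                for j in range(size * size):
                    if reader.read_bits(1):                # filter_coeff_zero_flag[i][j]
                        row.append(0)                      # 当前滤波系数为0
                    else:
                        row.append(reader.read_ae())       # filter_coeff[i][j]
                coeffs.append(row)
            layer["filter_size"], layer["filter_coeff"] = size, coeffs
        layer["is_activation"] = reader.read_bits(1)        # activation_layer_flag
        if layer["is_activation"]:
            layer["activation_type"] = reader.read_bits(2)  # activation_layer_type
        layers.append(layer)
    return layers
```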
实施例10:针对实施例5和实施例6,第二特征解码单元可以解码当前块对应的码流2,获得当前块对应的图像特征信息,比如说,第二特征解码单元至少包括一个系数解码模块和一个概率模型获取模块,在一种可能的实施方式中,系数解码模块可以采用熵解码方法进行系数解码,即采用熵解码方法解码当前块对应的码流2,获得当前块对应的图像特征信息。
示例性的,熵解码方法可以包括但不限于CAVLC或CABAC等,对此不做限制。
示例性的,系数超参特征生成单元生成的特征用于第二特征解码单元的方式,可以包括:
方式1、概率模型获取模块用于获取熵解码的概率模型,比如说,概率模型获取模块从系数超参特征生成单元获取系数超参特征值P。在此基础上,系数解码模块可以从概率模型获取模块获取系数超参特征值P,基于系数超参特征值P,系数解码模块可以采用熵解码方法进行系数解码。
方式2、系数解析过程（如CABAC或CAVLC解码过程）并不依赖于系数超参特征生成单元生成的特征，可直接解析得到系数值（这样可以保证解析吞吐率或速率）。基于系数超参特征生成单元生成的特征再对解析出的系数值进行转换，以得到图像特征量化值F_q。一个例子为，如系数解析过程得到的系数值是0，而系数超参特征生成单元生成的对应特征值为u，则F_q=u；如系数解析过程得到的系数值是1，而系数超参特征生成单元生成的对应特征值为u，则F_q=u+x，x为对应的系数方差。
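示例性的，针对上述方式2中先独立解析系数值、再利用系数超参特征进行转换的思路，下面给出一个示意性的Python草图；其中系数值到图像特征量化值F_q的映射仅按文中举例（系数为0时F_q=u，系数为1时F_q=u+x）给出，其它系数值的外推方式为本文为说明而作的假设。

```python
def convert_parsed_coeff(parsed_coeff, u, x):
    """方式2的示意：熵解析不依赖超参特征，转换阶段才使用超参特征。

    parsed_coeff: 熵解码直接解析得到的系数值（保证解析吞吐率）
    u: 系数超参特征生成单元给出的对应特征值
    x: 对应的系数方差
    """
    if parsed_coeff == 0:
        return u            # F_q = u
    if parsed_coeff == 1:
        return u + x        # F_q = u + x
    # 其它系数值按同样思路线性外推（假设），具体映射方式由实现约定
    return u + parsed_coeff * x
```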
实施例11:针对实施例5,第二反量化单元可以对图像特征量化值F_q(即图像特征信息)进行反量化,得到图像特征重建值F’。比如说,第二反量化单元可以不存在,或者,若第二反量化单元存在,则可以基于控制参数(如高层语法,如第二使能信息等)选择性跳过第二反量化单元,或基于控制参数确定使能第二反量化单元。示例性的,若第二反量化单元不存在,则图像特征重建值F’与图像特征量化值F_q相同,即不需要对图像特征量化值F_q进行反量化。若基于控制参数选择性跳过第二反量化单元,则图像特征重建值F’与图像特征量化值F_q相同,即不需要对图像特征量化值F_q进行反量化。若基于控制参数确定使能第二反量化单元,但是,图像特征量化值F_q对应的步长参数qstep为1,则图像特征重建值F’与图像特征量化值F_q相同,即不需要对图像特征量化值F_q进行反量化。
示例性的,若基于控制参数确定使能第二反量化单元,且图像特征量化值F_q对应的步长参数qstep不为1,则第二反量化单元可以基于控制参数(如量化相关参数)对图像特征量化值F_q进行反量化,得到图像特征重建值F’。比如说,第二反量化单元进行如下操作:从控制参数中获取图像特征量化值F_q对应的量化相关参数(在码流中包括控制参数,且该控制参数可以包括量化相关参数),如步长参数qstep或量化参数qp。基于步长参数qstep或量化参数qp确定图像特征量化值F_q对应的乘法因子mult和移位因子shift;假设图像特征量化值F_q为Coif_hyper,图像特征重建值F’为Coif_hyper_rec,则Coif_hyper_rec=(Coif_hyper*mult)<<shift,也就是说,在进行反量化时,可以采用上述公式得到图像特征重建值F’。
需要说明的是,针对图像特征量化值F_q对应的量化相关参数(如步长参数qstep),包括:1)每个特征通道的每个图像特征量化值都采用相同的步长参数qstep;2)每个特征通道的图像特征量化值采用不同的步长参数qstep,但是,特征通道内的每个图像特征量化值采用相同的步长参数qstep;3)每个特征通道的每个图像特征量化值都采用不同的步长参数qstep。
实施例12:针对实施例5和实施例6,图像特征反变换单元可以基于解码神经网络对图像特征重建值F’进行反变换操作,得到图像低阶特征值LF。在一种可能的实施方式中,参见图6D所示,图像特征反变换单元可以包括解码神经网络2,解码神经网络2可以包括基本层和增强层,图像特征重建值F’作为解码神经网络2的输入特征,图像低阶特征值LF作为解码神经网络2的输出特征,解码神经网络2用于对图像特征重建值F’进行反变换操作。
本实施例中,可以将解码神经网络2划分为基本层和增强层,基本层可以包括至少一个网络层,基本层也可以不包括网络层,即基本层为空。增强层可以包括至少一个网络层,增强层也可以不包括网络层,即增强层为空。需要注意的是,对于解码神经网络2中的多个网络层来说,可以根据实际需要将多个网络层划分为基本层和增强层。比如说,可以将网络结构固定的网络层作为基本层,可以将网络结构不固定的网络层作为增强层。
示例性的,对于解码神经网络2来说,输出特征的尺寸可以大于输入特征的尺寸,或者,输出特征的尺寸可以等于输入特征的尺寸,或者,输出特征的尺寸可以小于输入特征的尺寸。
对于解码神经网络2来说，基本层和增强层中至少包括1个反卷积层。比如说，基本层至少包括1个反卷积层，增强层可以包括至少1个反卷积层，或不包括反卷积层。或者，增强层至少包括1个反卷积层，基本层可以包括至少1个反卷积层，或不包括反卷积层。
对于解码神经网络2来说,基本层和增强层中至少包括1个残差结构层。比如说,基本层至少包括1个残差结构层,增强层可以包括残差结构层或不包括残差结构层。或者,增强层至少包括1个残差结构层,基本层可以包括残差结构层或不包括残差结构层。
示例性的,对于解码神经网络2来说,可以包括但不限于反卷积层和激活层等,对此不做限制。比如说,解码神经网络2依次包括1个stride为2的反卷积层、1个激活层、1个stride为1的反卷积层、1个激活层。可以将上述所有网络层均作为基本层,增强层为空。又例如,解码神经网络2依次包括1个stride为2的反卷积层、1个stride为1的反卷积层。可以将上述所有网络层均作为基本层,增强层为空。又例如,解码神经网络2依次包括1个stride为2的反卷积层、1个激活层、1个stride为2的反卷积层、1个激活层、1个stride为1的反卷积层、1个激活层。可以将上述所有网络层均作为基本层,增强层为空。
示例性的,若不存在质量增强单元,则图像特征反变换单元的最后一个网络层的输出特征数(滤波器数量)为1或3。具体的,若输出仅为一个通道的值(如灰度图),则图像特征反变换单元的最后一个网络层的输出特征数为1;若输出仅为三个通道的值(如RGB或YUV格式),则图像特征反变换单元的最后一个网络层的输出特征数为3。
示例性的,若存在质量增强单元,则图像特征反变换单元的最后一个网络层的输出特征数(滤波器数量)可以为1或3,也可以是其它数值,对此不做限制。
在一种可能的实施方式中,可以为图像特征反变换单元配置默认网络结构的网络层,与默认网络结构的网络层有关的网络参数均为固定。比如说,在默认网络结构的网络层中,反卷积层的数量为固定,激活层的数量为固定,每个反卷积层的通道数为固定,卷积核的尺寸为固定,滤波系数为固定等。显然,由于默认网络结构的网络层中的网络参数均为固定,且已知这些网络参数,因此,可以直接获得默认网络结构的网络层。
在一种可能的实施方式中,可以为图像特征反变换单元配置预制神经网络池,预制神经网络池包括至少一个预制网络结构的网络层,与预制网络结构的网络层有关的网络参数均可以根据实际需要配置。比如说,预制神经网络池包括预制网络结构t1的网络层、预制网络结构t2的网络层、预制网络结构t3的网络层。其中,对于预制网络结构t1的网络层,可以预先配置反卷积层的数量、激活层的数量、每个反卷积层的通道数、卷积核的尺寸、滤波系数等网络参数,在所有网络参数均配置完成后,可以获得预制网络结构t1的网络层,以此类推。
在一种可能的实施方式中，可以基于网络参数为图像特征反变换单元动态生成可变网络结构的网络层，与可变网络结构的网络层有关的网络参数是编码端动态生成的，而不是预先配置的。比如说，编码端可以在码流中编码反卷积层的数量、激活层的数量、每个反卷积层的通道数、卷积核的尺寸、滤波系数等网络参数，因此，解码端可以从码流中解析出上述网络参数，并基于这些网络参数生成可变网络结构的网络层。
在一种可能的实施方式中,控制参数可以包括图像特征反变换单元对应的神经网络信息2,图像特征反变换单元可以从控制参数中解析出神经网络信息2,并基于神经网络信息2生成解码神经网络2。比如说,神经网络信息2可以包括基本层信息和增强层信息,可以基于该基本层信息确定基本层,并基于该增强层信息确定增强层,并将基本层和增强层组合起来,得到解码神经网络2。比如说,可以采用如下情况得到神经网络信息2:
情况1、基本层信息包括基本层使用默认网络标志位,且基本层使用默认网络标志位表示基本层使用默认网络,在该情况下,图像特征反变换单元基于基本层信息获知基本层采用默认网络结构的网络层,因此,获取默认网络结构的基本层。增强层信息包括增强层使用默认网络标志位,且增强层使用默认网络标志位表示增强层使用默认网络,在该情况下,基于增强层信息获知增强层采用默认网络结构的网络层,因此,获取默认网络结构的增强层。将默认网络结构的基本层和默认网络结构的增强层组合起来,得到解码神经网络2。
情况2、基本层信息包括基本层使用默认网络标志位,且基本层使用默认网络标志位表示基本层使用默认网络,在该情况下,图像特征反变换单元基于基本层信息获知基本层采用默认网络结构的网络层,因此,获取默认网络结构的基本层。增强层信息包括增强层使用预制网络标志位和增强层预制网络索引号,且增强层使用预制网络标志位表示增强层使用预制网络,在该情况下,基于增强层信息获知增强层采用预制网络结构的网络层,因此,从预制神经网络池中选取增强层预制网络索引号对应的预制网络结构的增强层。可以将默认网络结构的基本层和预制网络结构的增强层组合起来,得到解码神经网络2。
情况3、基本层信息包括基本层使用默认网络标志位,且基本层使用默认网络标志位表示基本层使用默认网络,图像特征反变换单元基于基本层信息获知基本层采用默认网络结构的网络层,因此,获取默认网络结构的基本层。增强层信息包括用于生成增强层的网络参数,在该情况下,从控制参数中解析出网络参数,并基于该网络参数生成可变网络结构的增强层。将默认网络结构的基本层和可变网络结构的增强层组合起来,得到解码神经网络2。
情况4、基本层信息包括基本层使用预制网络标志位和基本层预制网络索引号,且基本层使用预制网络标志位表示基本层使用预制网络,在该情况下,图像特征反变换单元基于基本层信息获知基本层采用预制网络结构的网络层,因此,从预制神经网络池中选取基本层预制网络索引号对应的预制网络结构的基本层。增强层信息包括增强层使用默认网络标志位,且增强层使用默认网络标志位表示增强层使用默认网络,在该情况下,基于增强层信息获知增强层采用默认网络结构的网络层,因此,获取默认网络结构的增强层。可以将预制网络结构的基本层和默认网络结构的增强层组合起来,得到解码神经网络2。
情况5、基本层信息包括基本层使用预制网络标志位和基本层预制网络索引号,且基本层使用预制网络标志位表示基本层使用预制网络,在该情况下,图像特征反变换单元基于基本层信息获知基本层采用预制网络结构的网络层,因此,可以从预制神经网络池中选取基本层预制网络索引号对应的预制网络结构的基本层。增强层信息包括增强层使用预制网络标志位和增强层预制网络索引号,且增强层使用预制网络标志位表示增强层使用预制网络,在该情况下,基于增强层信息获知增强层采用预制网络结构的网络层,因此,可以从预制神经网络池中选取增强层预制网络索引号对应的预制网络结构的增强层。可以将预制网络结构的基本层和预制网络结构的增强层组合起来,得到解码神经网络2。
情况6、基本层信息包括基本层使用预制网络标志位和基本层预制网络索引号,且基本层使用预制网络标志位表示基本层使用预制网络,在该情况下,图像特征反变换单元基于基本层信息获知基本层采用预制网络结构的网络层,因此,可以从预制神经网络池中选取基本层预制网络索引号对应的预制网络结构的基本层。增强层信息包括用于生成增强层的网络参数,在该情况下,从控制参数中解析出网络参数,并基于该网络参数生成可变网络结构的增强层。将预制网络结构的基本层和可变网络结构的增强层组合起来,得到解码神经网络2。
在一种可能的实施方式中，针对基本层和增强层的上述网络结构，均可以通过解码的控制参数确定，控制参数可以包括神经网络信息2，且神经网络信息2可以包括基本层信息和增强层信息，针对图像特征反变换单元的神经网络信息2与表1、表2、表3类似，只是相关信息是针对图像特征反变换单元，而不是针对系数超参特征生成单元，在此不再重复赘述。
实施例13:针对实施例5和实施例6,质量增强单元可以获取图像低阶特征值LF,基于解码神经网络对图像低阶特征值LF进行增强处理,得到当前块对应的重建图像块I。
比如说,质量增强单元可以不存在,或者,若质量增强单元存在,则可以基于控制参数(如高层语法,如第三使能信息等)选择性跳过质量增强单元,或基于控制参数确定使能质量增强单元。示例性的,若基于控制参数确定使能质量增强单元,则该质量增强单元可以用于去除块间的块效应、量化失真等图像质量劣化问题,比如说,质量增强单元可以基于解码神经网络对图像低阶特征值LF进行增强处理,得到当前块对应的重建图像块I。
在一种可能的实施方式中,质量增强单元可以包括解码神经网络3,解码神经网络3可以包括基本层和增强层,图像低阶特征值LF作为解码神经网络3的输入特征,重建图像块I作为解码神经网络3的输出特征,解码神经网络3用于对图像低阶特征值LF进行增强处理。
本实施例中,可以将解码神经网络3划分为基本层和增强层,基本层可以包括至少一个网络层,基本层也可以不包括网络层,即基本层为空。增强层可以包括至少一个网络层,增强层也可以不包括网络层,即增强层为空。需要注意的是,对于解码神经网络3中的多个网络层来说,可以根据实际需要将多个网络层划分为基本层和增强层。比如说,可以将网络结构固定的网络层作为基本层,可以将网络结构不固定的网络层作为增强层。
示例性的,对于解码神经网络3来说,输出特征的尺寸可以大于输入特征的尺寸,或者,输出特征的尺寸可以等于输入特征的尺寸,或者,输出特征的尺寸可以小于输入特征的尺寸。
对于解码神经网络3来说，基本层和增强层中至少包括1个反卷积层。比如说，基本层至少包括1个反卷积层，增强层可以包括至少1个反卷积层，或不包括反卷积层。或者，增强层至少包括1个反卷积层，基本层可以包括至少1个反卷积层，或不包括反卷积层。
对于解码神经网络3来说,基本层和增强层中至少包括1个残差结构层。比如说,基本层至少包括1个残差结构层,增强层可以包括残差结构层或不包括残差结构层。或者,增强层至少包括1个残差结构层,基本层可以包括残差结构层或不包括残差结构层。
示例性的,对于解码神经网络3来说,可以包括但不限于反卷积层和激活层等,对此不做限制。比如说,解码神经网络3依次包括1个stride为2的反卷积层、1个激活层、1个stride为1的反卷积层、1个激活层。可以将上述所有网络层均作为基本层,增强层为空。又例如,解码神经网络3依次包括1个stride为2的反卷积层、1个stride为1的反卷积层。可以将上述所有网络层均作为基本层,增强层为空。又例如,解码神经网络3依次包括1个stride为2的反卷积层、1个激活层、1个stride为2的反卷积层、1个激活层、1个stride为1的反卷积层、1个激活层。可以将上述所有网络层均作为基本层,增强层为空。
示例性的,质量增强单元的最后一个网络层的输出特征数(滤波器数量)为1或3。具体的,若输出仅为一个通道的值(如灰度图),则最后一个网络层的输出特征数为1;若输出仅为三个通道的值(如RGB或YUV格式),则最后一个网络层的输出特征数为3。
在一种可能的实施方式中,可以为质量增强单元配置默认网络结构的网络层,与默认网络结构的网络层有关的网络参数均为固定。可以为质量增强单元配置预制神经网络池,预制神经网络池包括至少一个预制网络结构的网络层,与预制网络结构的网络层有关的网络参数均可以根据实际需要配置。可以基于网络参数为质量增强单元动态生成可变网络结构的网络层,与可变网络结构的网络层有关的网络参数是编码端动态生成的,而不是预先配置的。
在一种可能的实施方式中，控制参数可以包括质量增强单元对应的神经网络信息3，质量增强单元可以从控制参数中解析出神经网络信息3，并基于神经网络信息3生成解码神经网络3。比如说，神经网络信息3可以包括基本层信息和增强层信息，可以基于该基本层信息确定基本层，并基于该增强层信息确定增强层，并将基本层和增强层组合起来，得到解码神经网络3。比如说，质量增强单元可以采用如下情况得到神经网络信息3：
情况1、基本层信息包括基本层使用默认网络标志位,且基本层使用默认网络标志位表示基本层使用默认网络,在该情况下,质量增强单元基于基本层信息获知基本层采用默认网络结构的网络层,因此,获取默认网络结构的基本层。增强层信息包括增强层使用默认网络标志位,且增强层使用默认网络标志位表示增强层使用默认网络,在该情况下,基于增强层信息获知增强层采用默认网络结构的网络层,因此,获取默认网络结构的增强层。将默认网络结构的基本层和默认网络结构的增强层组合起来,得到解码神经网络3。
情况2、基本层信息包括基本层使用默认网络标志位,且基本层使用默认网络标志位表示基本层使用默认网络,在该情况下,质量增强单元基于基本层信息获知基本层采用默认网络结构的网络层,因此,获取默认网络结构的基本层。增强层信息包括增强层使用预制网络标志位和增强层预制网络索引号,且增强层使用预制网络标志位表示增强层使用预制网络,在该情况下,基于增强层信息获知增强层采用预制网络结构的网络层,因此,从预制神经网络池中选取增强层预制网络索引号对应的预制网络结构的增强层。可以将默认网络结构的基本层和预制网络结构的增强层组合起来,得到解码神经网络3。
情况3、基本层信息包括基本层使用默认网络标志位,且基本层使用默认网络标志位表示基本层使用默认网络,质量增强单元基于基本层信息获知基本层采用默认网络结构的网络层,因此,获取默认网络结构的基本层。增强层信息包括用于生成增强层的网络参数,在该情况下,从控制参数中解析出网络参数,并基于该网络参数生成可变网络结构的增强层。将默认网络结构的基本层和可变网络结构的增强层组合起来,得到解码神经网络3。
情况4、基本层信息包括基本层使用预制网络标志位和基本层预制网络索引号,且基本层使用预制网络标志位表示基本层使用预制网络,在该情况下,质量增强单元基于基本层信息获知基本层采用预制网络结构的网络层,因此,从预制神经网络池中选取基本层预制网络索引号对应的预制网络结构的基本层。增强层信息包括增强层使用默认网络标志位,且增强层使用默认网络标志位表示增强层使用默认网络,在该情况下,基于增强层信息获知增强层采用默认网络结构的网络层,因此,获取默认网络结构的增强层。可以将预制网络结构的基本层和默认网络结构的增强层组合起来,得到解码神经网络3。
情况5、基本层信息包括基本层使用预制网络标志位和基本层预制网络索引号,且基本层使用预制网络标志位表示基本层使用预制网络,在该情况下,质量增强单元基于基本层信息获知基本层采用预制网络结构的网络层,因此,可以从预制神经网络池中选取基本层预制网络索引号对应的预制网络结构的基本层。增强层信息包括增强层使用预制网络标志位和增强层预制网络索引号,且增强层使用预制网络标志位表示增强层使用预制网络,在该情况下,基于增强层信息获知增强层采用预制网络结构的网络层,因此,可以从预制神经网络池中选取增强层预制网络索引号对应的预制网络结构的增强层。可以将预制网络结构的基本层和预制网络结构的增强层组合起来,得到解码神经网络3。
情况6、基本层信息包括基本层使用预制网络标志位和基本层预制网络索引号,且基本层使用预制网络标志位表示基本层使用预制网络,在该情况下,质量增强单元基于基本层信息获知基本层采用预制网络结构的网络层,因此,可以从预制神经网络池中选取基本层预制网络索引号对应的预制网络结构的基本层。增强层信息包括用于生成增强层的网络参数,在该情况下,从控制参数中解析出网络参数,并基于该网络参数生成可变网络结构的增强层。将预制网络结构的基本层和可变网络结构的增强层组合起来,得到解码神经网络3。
在一种可能的实施方式中,针对基本层和增强层的上述网络结构,均可以通过解码的控制参数确定,控制参数可以包括神经网络信息3,且神经网络信息3可以包括基本层信息和增强层信息,针对质量增强单元的神经网络信息3与表1、表2、表3类似,只是相关信息是针对质量增强单元,而不是针对系数超参特征生成单元,在此不再重复赘述。
实施例14:本公开实施例中提出一种基于神经网络的图像编码方法,该方法可以应用于编码端(也称为视频编码器),参见图7A所示,为编码端的结构示意图,编码端可以包括控制参数编码单元、特征变换单元、系数超参特征变换单元、第一量化单元、第二量化单元、第一特征编码单元、第二特征编码单元、第一特征解码单元、第二特征解码单元、系数超参特征生成单元、图像特征反变换单元、第一反量化单元、第二反量化单元、质量增强单元。
示例性的,第一量化单元、第二量化单元、第一反量化单元、第二反量化单元、质量增强单元为可选单元,在某些场景下可以选择关闭或跳过这些可选单元的过程。
本实施例中,对于每个当前块(即图像块)来说,当前块对应的码流包括三部分:码流0(含有控制参数的码流)、码流1(含有系数超参特征信息的码流)、码流2(含有图像特征信息的码流)。系数超参特征信息和图像特征信息可以统称为图像信息。
示例性的,本实施例中的基于神经网络的图像编码方法,可以包括以下步骤S71-S79。
步骤S71、对当前块I进行特征变换,得到当前块I对应的图像特征值F。比如说,特征变换单元可以对当前块I进行特征变换,得到当前块I对应的图像特征值F。示例性的,特征变换单元可以基于编码神经网络对当前块I进行特征变换,得到当前块I对应的图像特征值F。其中,当前块I作为编码神经网络的输入特征,图像特征值F作为编码神经网络的输出特征。
步骤S72、基于图像特征值F确定图像特征信息。
示例性的,若使能第二量化单元,则第二量化单元可以从特征变换单元获取图像特征值F,并对图像特征值F进行量化,得到图像特征量化值F_q,并基于图像特征量化值F_q确定图像特征信息,即图像特征信息可以为图像特征量化值F_q。在该情况下,控制参数编码单元可以在码流0中编码第二量化单元的第二使能信息,即控制参数包括第二量化单元的第二使能信息,且第二使能信息用于表示已使能第二量化单元。控制参数编码单元还可以在码流0中编码第二量化单元对应的量化相关参数,如步长参数qstep或量化参数qp等。
示例性的,若未使能第二量化单元,则基于图像特征值F确定图像特征信息,即图像特征信息可以为图像特征值F。在该情况下,控制参数编码单元可以在码流0中编码第二量化单元的第二使能信息,且第二使能信息用于表示未使能第二量化单元。
步骤S73、对图像特征值F进行系数超参特征变换,得到系数超参特征系数值C。比如说,系数超参特征变换单元对图像特征值F进行系数超参特征变换,得到系数超参特征系数值C,如基于编码神经网络对图像特征值F进行系数超参特征变换,得到系数超参特征系数值C。其中,图像特征值F作为编码神经网络的输入特征,系数超参特征系数值C作为编码神经网络的输出特征。
步骤S74、基于系数超参特征系数值C确定系数超参特征信息。
示例性的,若使能第一量化单元,则第一量化单元可以从系数超参特征变换单元获取系数超参特征系数值C,并对系数超参特征系数值C进行量化,得到系数超参特征系数量化值C_q,并基于系数超参特征系数量化值C_q确定系数超参特征信息,即系数超参特征信息可以为系数超参特征系数量化值C_q。在该情况下,控制参数编码单元可以在码流0中编码第一量化单元的第一使能信息,即控制参数包括第一量化单元的第一使能信息,且第一使能信息用于表示已使能第一量化单元。控制参数编码单元还可以在码流0中编码第一量化单元对应的量化相关参数。
示例性的,若未使能第一量化单元,则基于系数超参特征系数值C确定系数超参特征信息,即系数超参特征信息可以为系数超参特征系数值C。在该情况下,控制参数编码单元可以在码流0中编码第一量化单元的第一使能信息,且第一使能信息用于表示未使能第一量化单元。
步骤S75、对系数超参特征信息(如系数超参特征系数量化值C_q或者系数超参特征系数值C)进行编码,得到码流1。比如说,第一特征编码单元可以在当前块对应的码流中编码系数超参特征信息,为了区分方便,将包括系数超参特征信息的码流记为码流1。
步骤S76、解码当前块对应的码流1,获得系数超参特征信息(如系数超参特征系数量化值C_q或者系数超参特征系数值C),并基于该系数超参特征信息确定系数超参特征系数重建值。
示例性的,第一特征解码单元可以解码当前块对应的码流1,获得系数超参特征信息。
示例性的,若使能第一量化单元,则系数超参特征信息为系数超参特征系数量化值C_q,在该情况下,第一反量化单元可以对系数超参特征系数量化值C_q进行反量化,得到系数超参特征系数重建值C’,显然,系数超参特征系数重建值C’与系数超参特征系数值C可以相同。若未使能第一量化单元,则系数超参特征信息为系数超参特征系数值C,在该情况下,可以将系数超参特征系数值C作为系数超参特征系数重建值C’。综上所述,可以得到系数超参特征系数重建值C’。
综上可以看出，系数超参特征系数重建值C’与系数超参特征系数值C相同，因此，还可以省略步骤S76的过程，直接将系数超参特征系数值C作为系数超参特征系数重建值C’。在该情况下，还可以对图7A所示的编码端结构进行改进，得到图7B所示的编码端结构。
步骤S77、对系数超参特征系数重建值C’进行反变换操作,得到系数超参特征值P。比如说,系数超参特征生成单元对系数超参特征系数重建值C’进行反变换操作,得到系数超参特征值P。
步骤S78、基于系数超参特征值P,对图像特征信息(如图像特征量化值F_q或者图像特征值F)进行编码,得到码流2。比如说,第二特征编码单元可以在当前块对应的码流中编码图像特征信息,为了区分方便,将包括图像特征信息的码流记为码流2。
步骤S79、控制参数编码单元获取当前块对应的控制参数,该控制参数可以包括神经网络信息,并在码流中编码当前块对应的控制参数,将包括控制参数的码流记为码流0。
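示例性的，下面给出编码端步骤S71-步骤S79的一个示意性流程草图（Python伪实现），第一量化单元和第二量化单元是否使能由控制参数决定；其中各单元对象及其方法名均为说明用的假设接口。

```python
def encode_current_block(block, units, enable_quant1=True, enable_quant2=True):
    # 步骤S71：特征变换，得到图像特征值F
    f = units["feat_transform"].transform(block)

    # 步骤S72：基于F确定图像特征信息（使能第二量化单元时为F_q）
    feat_info = units["quant2"].quantize(f) if enable_quant2 else f

    # 步骤S73：系数超参特征变换，得到系数超参特征系数值C
    c = units["hyper_transform"].transform(f)

    # 步骤S74：基于C确定系数超参特征信息（使能第一量化单元时为C_q）
    hyper_info = units["quant1"].quantize(c) if enable_quant1 else c

    # 步骤S75：编码系数超参特征信息，得到码流1
    bitstream1 = units["feat_encoder1"].encode(hyper_info)

    # 步骤S76-S77：得到系数超参特征系数重建值C'并反变换，得到系数超参特征值P
    c_rec = units["dequant1"].dequantize(hyper_info) if enable_quant1 else hyper_info
    p = units["hyper_generator"].inverse_transform(c_rec)

    # 步骤S78：基于P编码图像特征信息，得到码流2
    bitstream2 = units["feat_encoder2"].encode(feat_info, hyper=p)

    # 步骤S79：编码控制参数（含神经网络信息、量化相关参数、使能信息等），得到码流0
    bitstream0 = units["ctrl_encoder"].encode(units["control_params"])
    return bitstream0, bitstream1, bitstream2
```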
在一种可能的实施方式中,特征变换单元可以采用编码神经网络对当前块I进行特征变换,控制参数编码单元可以基于该编码神经网络的网络结构确定解码端设备的图像特征反变换单元对应的神经网络信息2,该神经网络信息2用于确定图像特征反变换单元对应的解码神经网络2,并在码流0中编码图像特征反变换单元对应的神经网络信息2。
在一种可能的实施方式中,系数超参特征变换单元可以采用编码神经网络对图像特征值F进行系数超参特征变换,控制参数编码单元可以基于该编码神经网络的网络结构确定解码端设备的系数超参特征生成单元对应的神经网络信息1,该神经网络信息1用于确定系数超参特征生成单元对应的解码神经网络1,并在码流0中编码系数超参特征生成单元对应的神经网络信息1。
在一种可能的实施方式中,继续参见图7A和图7B所示,还可以包括以下步骤S80-S86。
步骤S80、解码当前块对应的码流1,获得当前块对应的系数超参特征信息。比如说,第一特征解码单元可以解码当前块对应的码流1,获得当前块对应的系数超参特征信息。
步骤S81、基于该系数超参特征信息确定系数超参特征系数重建值。
比如说,若使能第一量化单元,则系数超参特征信息为系数超参特征系数量化值C_q,在该情况下,第一反量化单元可以对系数超参特征系数量化值C_q进行反量化,得到系数超参特征系数重建值C’。或者,若未使能第一量化单元,则系数超参特征信息为系数超参特征系数值C,在该情况下,可以将系数超参特征系数值C作为系数超参特征系数重建值C’。
步骤S82、对系数超参特征系数重建值C’进行反变换操作,得到系数超参特征值P。比如说,系数超参特征生成单元对系数超参特征系数重建值C’进行反变换操作,得到系数超参特征值P,如系数超参特征生成单元基于解码神经网络对系数超参特征系数重建值C’进行反变换操作,得到系数超参特征值P。
步骤S83、解码当前块对应的码流2,获得当前块对应的图像特征信息。比如说,第二特征解码单元可以解码当前块对应的码流2,获得当前块对应的图像特征信息。在解码当前块对应的码流2时,第二特征解码单元可以利用系数超参特征值P解码当前块对应的码流2。
步骤S84、基于该图像特征信息确定图像特征重建值。
比如说,若使能第二量化单元,则图像特征信息为图像特征量化值F_q,第二反量化单元可以对图像特征量化值F_q进行反量化,得到图像特征重建值F’。若未使能第二量化单元,则图像特征信息为图像特征值F,可以将图像特征值F作为图像特征重建值F’。
步骤S85、对图像特征重建值F’进行反变换操作,得到图像低阶特征值LF。比如说,图像特征反变换单元对图像特征重建值F’进行反变换操作,得到图像低阶特征值LF,如基于解码神经网络对图像特征重建值F’进行反变换操作,得到图像低阶特征值LF。
步骤S86、基于图像低阶特征值LF确定当前块对应的重建图像块I。示例性的，若使能质量增强单元，则质量增强单元对图像低阶特征值LF进行增强处理，得到当前块对应的重建图像块I，如基于解码神经网络对图像低阶特征值LF进行增强处理，得到当前块对应的重建图像块I。若不使能质量增强单元，则图像低阶特征值LF作为重建图像块I。
示例性的,关于步骤S80-步骤S86,可以参见实施例5,在此不再重复赘述。
在一种可能的实施方式中,系数超参特征生成单元可以采用解码神经网络对系数超参特征系数重建值C’进行反变换操作,得到系数超参特征值P,控制参数编码单元可以基于该解码神经网络的网络结构确定解码端设备的系数超参特征生成单元对应的神经网络信息1,该神经网络信息1用于确定该系数超参特征生成单元对应的解码神经网络1,并在码流0中编码神经网络信息1。
在一种可能的实施方式中,图像特征反变换单元可以采用解码神经网络对图像特征重建值F’进行反变换操作,得到图像低阶特征值LF,控制参数编码单元可以基于该解码神经网络的网络结构确定解码端设备的图像特征反变换单元对应的神经网络信息2,该神经网络信息2用于确定该图像特征反变换单元对应的解码神经网络2,并在码流0中编码神经网络信息2。
在一种可能的实施方式中,质量增强单元可以采用解码神经网络对图像低阶特征值LF进行增强处理,得到当前块对应的重建图像块I,控制参数编码单元可以基于该解码神经网络的网络结构确定解码端设备的质量增强单元对应的神经网络信息3,该神经网络信息3用于确定该质量增强单元对应的解码神经网络3,并在码流0中编码神经网络信息3。
实施例15:本公开实施例中提出一种基于神经网络的图像编码方法,该方法可以应用于编码端,参见图7C所示,为编码端的结构示意图,编码端可以包括控制参数编码单元、特征变换单元、系数超参特征变换单元、第一特征编码单元、第二特征编码单元、第一特征解码单元、第二特征解码单元、系数超参特征生成单元、图像特征反变换单元、质量增强单元。
示例性的,本实施例中的基于神经网络的图像编码方法,可以包括以下步骤S91-S99。
步骤S91、对当前块I进行特征变换,得到当前块I对应的图像特征值F。比如说,特征变换单元可以基于编码神经网络对当前块I进行特征变换,得到当前块I对应的图像特征值F。
步骤S92、对图像特征值F进行系数超参特征变换,得到系数超参特征系数值C。比如说,系数超参特征变换单元基于编码神经网络对图像特征值F进行系数超参特征变换,得到系数超参特征系数值C。
步骤S93、对系数超参特征系数值C进行编码,得到码流1。比如说,第一特征编码单元可以在当前块对应的码流中编码系数超参特征系数值C,得到码流1。
步骤S94、基于系数超参特征系数值C确定系数超参特征系数重建值C’,如系数超参特征系数重建值C’与系数超参特征系数值C相同,对系数超参特征系数重建值C’进行反变换操作,得到系数超参特征值P。比如说,系数超参特征生成单元对系数超参特征系数重建值C’进行反变换操作,得到系数超参特征值P。
步骤S95、基于系数超参特征值P,对图像特征值F进行编码,得到码流2。比如说,第二特征编码单元在当前块对应的码流中编码图像特征值F,得到码流2。
步骤S96、控制参数编码单元获取当前块对应的控制参数,该控制参数可以包括神经网络信息,并在码流中编码当前块对应的控制参数,将包括控制参数的码流记为码流0。
步骤S97、解码当前块对应的码流1,获得当前块对应的系数超参特征系数值C。比如说,第一特征解码单元可以解码当前块对应的码流1,获得当前块对应的系数超参特征系数值C。
以及,基于系数超参特征系数值C确定系数超参特征系数重建值C’。
步骤S98、对系数超参特征系数重建值C’进行反变换操作,得到系数超参特征值P。比如说,系数超参特征生成单元基于解码神经网络对系数超参特征系数重建值C’进行反变换操作,得到系数超参特征值P。解码当前块对应的码流2,获得当前块对应的图像特征值F,将图像特征值F作为图像特征重建值F’,如第二特征解码单元利用系数超参特征值P解码当前块对应的码流2。
步骤S99、对图像特征重建值F’进行反变换操作，得到图像低阶特征值LF。比如说，图像特征反变换单元基于解码神经网络对图像特征重建值F’进行反变换操作，得到图像低阶特征值LF。基于图像低阶特征值LF确定当前块对应的重建图像块I。比如说，质量增强单元基于解码神经网络对图像低阶特征值LF进行增强处理，得到当前块对应的重建图像块I。
实施例16:针对实施例14和实施例15,第一特征编码单元可以在当前块对应的码流1中编码系数超参特征信息,第一特征编码单元的编码过程与第一特征解码单元的解码过程对应,参见实施例7,比如说,第一特征编码单元可以采用熵编码方法(如CAVLC或CABAC等熵编码方法)对系数超参特征信息进行编码,在此不再重复赘述。第一特征解码单元可以解码当前块对应的码流1,获得当前块对应的系数超参特征信息,该过程可以参见实施例7。
针对实施例14和实施例15,第二特征编码单元可以基于系数超参特征值P对图像特征信息进行编码,得到码流2,第二特征编码单元的编码过程与第二特征解码单元的解码过程对应,参见实施例10,比如说,第二特征编码单元可以采用熵编码方法(如CAVLC或CABAC等熵编码方法)对图像特征信息进行编码,在此不再重复赘述。第二特征解码单元可以解码当前块对应的码流2,获得当前块对应的图像特征信息,该过程可以参见实施例10。
针对实施例14和实施例15，第一量化单元可以对系数超参特征系数值C进行量化，得到系数超参特征系数量化值C_q，第一量化单元的量化过程与第一反量化单元的反量化过程对应，参见实施例8，比如说，第一量化单元基于量化相关参数对系数超参特征系数值C进行量化，在此不再重复赘述。需要说明的是，对于量化相关参数（如步长参数qstep，也称为量化步长）来说，1)每个特征通道的每个特征值都采用相同的量化步长；2)每个特征通道采用不同的量化步长，但特征通道内的每个特征值采用相同的量化步长；3)每个特征通道的每个特征值都采用不同的量化步长。编码端的第一反量化单元可以对系数超参特征系数量化值C_q进行反量化，得到系数超参特征系数重建值C’，编码端的第一反量化单元的处理过程可以参见实施例8。
第二量化单元可以对图像特征值F进行量化,得到图像特征量化值F_q,第二量化单元的量化过程与第二反量化单元的反量化过程对应,参见实施例11,比如说,第二量化单元基于量化相关参数对图像特征值F进行量化,在此不再重复赘述。需要说明的是,对于量化步长来说,1)每个特征通道的每个特征值都采用相同的量化步长;2)每个特征通道采用不同的量化步长,但特征通道内的每个特征值采用相同的量化步长;3)每个特征通道的每个特征值都采用不同的量化步长。编码端的第二反量化单元可以对图像特征量化值F_q进行反量化,得到图像特征重建值F’,编码端的第二反量化单元的处理过程可以参见实施例11。
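示例性的，与上述反量化过程相对应，下面给出编码端量化的一个示意性草图，演示三种量化步长粒度（整体共用、按特征通道、按特征值）；其中采用四舍五入取整仅为举例。

```python
import numpy as np

def quantize(feature, qstep):
    """feature: 形状为(C, H, W)的特征值；qstep: 标量、(C,)数组或(C, H, W)数组。"""
    qstep = np.asarray(qstep, dtype=np.float64)
    if qstep.ndim == 1:
        qstep = qstep.reshape(-1, 1, 1)   # 每个特征通道采用不同的量化步长
    return np.round(np.asarray(feature, dtype=np.float64) / qstep).astype(np.int64)
```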
实施例17:针对实施例14和实施例15,特征变换单元可以基于编码神经网络对当前块I进行特征变换,得到当前块I对应的图像特征值F。在此基础上,可以基于该编码神经网络的网络结构确定解码端设备的图像特征反变换单元对应的神经网络信息2,并在码流0中编码图像特征反变换单元对应的神经网络信息2。关于解码端设备如何根据神经网络信息2生成图像特征反变换单元对应的解码神经网络2,可以参见实施例12,在此不再重复赘述。
在一种可能的实施方式中,特征变换单元可以包括编码神经网络2,编码神经网络2可以包括基本层和增强层,可以将编码神经网络2划分为基本层和增强层,基本层可以包括至少一个网络层,基本层也可以不包括网络层,即基本层为空。增强层可以包括至少一个网络层,增强层也可以不包括网络层,即增强层为空。需要注意的是,对于编码神经网络2中的多个网络层来说,可以根据实际需要将多个网络层划分为基本层和增强层。比如说,可以将网络结构固定的网络层作为基本层,可以将网络结构不固定的网络层作为增强层。
示例性的,对于编码神经网络2来说,输出特征的尺寸可以小于输入特征的尺寸,或者,输出特征的尺寸可以等于输入特征的尺寸,或者,输出特征的尺寸可以大于输入特征的尺寸。
示例性的，对于编码神经网络2来说，基本层和增强层中至少包括1个卷积层。比如说，基本层至少包括1个卷积层，增强层可以包括至少1个卷积层，或不包括卷积层。或者，增强层至少包括1个卷积层，基本层可以包括至少1个卷积层，或不包括卷积层。
对于编码神经网络2来说,基本层和增强层中至少包括1个残差结构层。比如说,基本层至少包括1个残差结构层,增强层可以包括残差结构层或不包括残差结构层。或者,增强层至少包括1个残差结构层,基本层可以包括残差结构层或不包括残差结构层。
对于编码神经网络2来说,可以包括但不限于卷积层和激活层等,对此不做限制。比如说,编码神经网络2依次包括1个stride为2的卷积层、1个激活层、1个stride为1的卷积层、1个激活层,将上述所有网络层均作为基本层。又例如,编码神经网络2依次包括1个stride为2的卷积层、1个stride为1的卷积层,将上述所有网络层均作为基本层。又例如,编码神经网络2依次包括1个stride为2的卷积层、1个激活层、1个stride为2的卷积层、1个激活层、1个stride为1的卷积层、1个激活层,将上述所有网络层均作为基本层。
示例性的,特征变换单元的第一个网络层的输入通道数为1或3。若输入图像块仅含有一个通道(如灰度图),则特征变换单元的第一个网络层的输入通道数为1;若输入图像块含有三个通道(如RGB或YUV格式),则特征变换单元的第一个网络层的输入通道数为3。
需要注意的是,特征变换单元的编码神经网络2的网络结构,与图像特征反变换单元的解码神经网络2的网络结构(参见实施例12)可以对称,特征变换单元的编码神经网络2的网络参数,与图像特征反变换单元的解码神经网络2的网络参数,可以相同或者不同。
在一种可能的实施方式中，可以为特征变换单元配置默认网络结构的网络层，与默认网络结构的网络层有关的网络参数均为固定。比如说，在默认网络结构的网络层中，卷积层的数量为固定，激活层的数量为固定，每个卷积层的通道数为固定，卷积核的尺寸为固定，滤波系数为固定等。需要注意的是，为特征变换单元配置的默认网络结构的网络层，与为图像特征反变换单元配置的默认网络结构的网络层（参见实施例12），二者可以为对称结构。
在一种可能的实施方式中,可以为特征变换单元配置预制神经网络池(与解码端设备的预制神经网络池对应),预制神经网络池包括至少一个预制网络结构的网络层,与预制网络结构的网络层有关的网络参数均可以根据实际需要配置。比如说,预制神经网络池可以包括预制网络结构t1’的网络层、预制网络结构t2’的网络层、预制网络结构t3’的网络层等。需要注意的是,预制网络结构t1’的网络层与预制网络结构t1的网络层(参见实施例12),二者可以为对称结构。预制网络结构t2’的网络层与预制网络结构t2的网络层,二者可以为对称结构。预制网络结构t3’的网络层与预制网络结构t3的网络层,二者可以为对称结构。
在一种可能的实施方式中,可以基于网络参数为特征变换单元动态生成可变网络结构的网络层,与可变网络结构的网络层有关的网络参数是编码端动态生成的,而不是预先配置的。
在一种可能的实施方式中,编码端可以基于编码神经网络2的网络结构确定图像特征反变换单元对应的神经网络信息2,并在码流0中编码图像特征反变换单元对应的神经网络信息2。比如说,编码端可以采用如下情况确定神经网络信息2:
情况1、若编码神经网络2的基本层采用默认网络结构的网络层,且增强层采用默认网络结构的网络层,则编码端在码流0中编码基本层信息和增强层信息,该基本层信息和该增强层信息组成图像特征反变换单元对应的神经网络信息2。其中,基本层信息包括基本层使用默认网络标志位,且基本层使用默认网络标志位表示基本层使用默认网络。增强层信息包括增强层使用默认网络标志位,且增强层使用默认网络标志位表示增强层使用默认网络。
情况2、若编码神经网络2的基本层采用默认网络结构的网络层,且增强层采用预制网络结构的网络层,则编码端在码流0中编码基本层信息和增强层信息,该基本层信息和该增强层信息组成图像特征反变换单元对应的神经网络信息2。其中,基本层信息包括基本层使用默认网络标志位,且基本层使用默认网络标志位表示基本层使用默认网络。增强层信息包括增强层使用预制网络标志位和增强层预制网络索引号,且增强层使用预制网络标志位表示增强层使用预制网络,且增强层预制网络索引号表示该预制网络结构的网络层在预制神经网络池中对应的索引。例如,若编码神经网络2的增强层采用预制网络结构t1’的网络层,则增强层预制网络索引号表示预制神经网络池中的第一个预制网络结构t1’的网络层。
情况3、若编码神经网络2的基本层采用默认网络结构的网络层,且增强层采用可变网络结构的增强层,则编码端在码流0中编码基本层信息和增强层信息,该基本层信息和该增强层信息组成图像特征反变换单元对应的神经网络信息2。其中,基本层信息包括基本层使用默认网络标志位,且基本层使用默认网络标志位表示基本层使用默认网络。增强层信息包括用于生成增强层的网络参数,该网络参数可以包括但不限于以下至少一种:神经网络层数、反卷积层标志位、反卷积层的数量、每个反卷积层的量化步长、 每个反卷积层的通道数、卷积核的尺寸、滤波数量、滤波尺寸索引、滤波系数为零标志位、滤波系数、激活层标志位、激活层类型。需要注意的是,增强层信息中的网络参数与编码神经网络2的增强层采用的网络参数可以不同,也就是说,针对增强层采用的每个网络参数,可以生成与该网络参数对称的网络参数,对此生成过程不做限制,将这个对称的网络参数作为增强层信息传输给解码端。
情况4、若编码神经网络2的基本层采用预制网络结构的网络层,且增强层采用默认网络结构的网络层,则在码流0中编码基本层信息和增强层信息,基本层信息和增强层信息组成图像特征反变换单元对应的神经网络信息2。其中,基本层信息包括基本层使用预制网络标志位和基本层预制网络索引号,基本层使用预制网络标志位表示基本层使用预制网络,基本层预制网络索引号表示该预制网络结构的网络层在预制神经网络池中对应的索引。增强层信息包括增强层使用默认网络标志位,增强层使用默认网络标志位表示增强层使用默认网络。
情况5、若编码神经网络2的基本层采用预制网络结构的网络层,且增强层采用预制网络结构的网络层,则编码端在码流0中编码基本层信息和增强层信息,该基本层信息和该增强层信息组成图像特征反变换单元对应的神经网络信息2。其中,该基本层信息可以包括基本层使用预制网络标志位和基本层预制网络索引号,该基本层使用预制网络标志位用于表示基本层使用预制网络,该基本层预制网络索引号用于表示该预制网络结构的网络层在预制神经网络池中对应的索引。该增强层信息可以包括增强层使用预制网络标志位和增强层预制网络索引号,且该增强层使用预制网络标志位用于表示增强层使用预制网络,且该增强层预制网络索引号用于表示该预制网络结构的网络层在预制神经网络池中对应的索引。
情况6、若编码神经网络2的基本层采用预制网络结构的网络层,且增强层采用可变网络结构的增强层,则在码流0中编码基本层信息和增强层信息,基本层信息和增强层信息组成图像特征反变换单元对应的神经网络信息2。其中,基本层信息包括基本层使用预制网络标志位和基本层预制网络索引号,基本层使用预制网络标志位表示基本层使用预制网络,基本层预制网络索引号表示该预制网络结构的网络层在预制神经网络池中对应的索引。增强层信息包括用于生成增强层的网络参数,该网络参数可以包括但不限于以下至少一种:神经网络层数、反卷积层标志位、反卷积层的数量、每个反卷积层的量化步长、每个反卷积层的通道数、卷积核的尺寸、滤波数量、滤波尺寸索引、滤波系数为零标志位、滤波系数、激活层标志位、激活层类型。增强层信息中的网络参数与编码神经网络2的增强层采用的网络参数可以不同,也就是说,针对增强层采用的每个网络参数,可以生成与该网络参数对称的网络参数,对此生成过程不做限制,将这个对称的网络参数作为增强层信息传输给解码端。
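示例性的，对应上述情况1至情况6，下面给出编码端在码流0中写入神经网络信息2（基本层信息与增强层信息）的一个示意性草图；其中writer.write_bits等写入接口、各标志位的位宽以及write_network_params辅助函数均为说明用的假设。

```python
def write_network_params(writer, params):
    # 假设的辅助函数：按表2/表3的思路写入用于生成可变网络结构的网络参数
    for value, nbits in params:
        writer.write_bits(value, nbits)

def write_layer_info(writer, layer):
    """按默认网络/预制网络/可变网络三种情况写入某一层的信息（示意）。"""
    if layer["kind"] == "default":
        writer.write_bits(1, 1)                      # 使用默认网络标志位 = 1
    elif layer["kind"] == "predesigned":
        writer.write_bits(0, 1)                      # 使用默认网络标志位 = 0
        writer.write_bits(1, 1)                      # 使用预制网络标志位 = 1
        writer.write_bits(layer["pool_id"], 32)      # 预制网络索引号（32位无符号整数）
    else:                                            # 可变网络结构
        writer.write_bits(0, 1)
        writer.write_bits(0, 1)
        write_network_params(writer, layer["params"])

def write_nn_info2(writer, basic_layer, enhance_layer):
    write_layer_info(writer, basic_layer)            # 基本层信息
    write_layer_info(writer, enhance_layer)          # 增强层信息
```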
实施例18:针对实施例14和实施例15,编码端的图像特征反变换单元可以基于解码神经网络对图像特征重建值F’进行反变换操作,得到图像低阶特征值LF。在此基础上,也可以基于该解码神经网络的网络结构确定解码端设备的图像特征反变换单元对应的神经网络信息2,并在码流0中编码图像特征反变换单元对应的神经网络信息2。
在一种可能的实施方式中,编码端的图像特征反变换单元可以包括解码神经网络2,解码神经网络2可以包括基本层和增强层,编码端的解码神经网络2的网络结构与解码端的解码神经网络2的网络结构相同,可以参见实施例12,在此不再赘述。
在一种可能的实施方式中,编码端可以为图像特征反变换单元配置默认网络结构的网络层,这个默认网络结构的网络层与解码端的默认网络结构的网络层相同。编码端可以为图像特征反变换单元配置预制神经网络池,该预制神经网络池可以包括至少一个预制网络结构的网络层,这个预制神经网络池与解码端的预制神经网络池相同。编码端可以基于网络参数为图像特征反变换单元动态生成可变网络结构的网络层。编码端可以基于解码神经网络2的网络结构确定图像特征反变换单元对应的神经网络信息2,并在码流0中编码神经网络信息2。
比如说,若解码神经网络2的基本层采用默认网络结构的网络层,且增强层采用默认网络结构的网络层,则基本层信息包括基本层使用默认网络标志位,且基本层使用默认网络标志位表示基本层使用默认网络。增强层信息包括增强层使用默认网络标志位,且增强层使用默认网络标志位表示增强层使用默认网络。又例如,若解码神经网络2的基本层采用默认网络结构的网络层,且增强层采用预制网络结构的网络层,则基本层信息包括基本层使用默认网络标志位,且基本层使用默认网络标志位表示基本层使用默认网络。增强层信息包括增强层使用预制网络标志位和增强层预制网络索引号,且增强层使用预制网络标志位表示增强层使用预制网络,且增强层预制网络索引号表示该预制网络结构的网络层在预制神经网络池中对应的索引。又例如,若解码神经网络2的基本层采用默认网络结构的网络层,且增强层采用可变网络结构的增强层,则基本层信息包括 基本层使用默认网络标志位,且基本层使用默认网络标志位表示基本层使用默认网络。增强层信息包括用于生成增强层的网络参数,增强层信息中的网络参数与解码神经网络2的增强层采用的网络参数可以相同。
比如说,若解码神经网络2的基本层采用预制网络结构的网络层,增强层采用默认网络结构的网络层,则基本层信息包括基本层使用预制网络标志位和基本层预制网络索引号,基本层使用预制网络标志位表示基本层使用预制网络,基本层预制网络索引号表示该预制网络结构的网络层在预制神经网络池中对应的索引。增强层信息包括增强层使用默认网络标志位,增强层使用默认网络标志位表示增强层使用默认网络。又例如,若解码神经网络2的基本层采用预制网络结构的网络层,增强层采用预制网络结构的网络层,则基本层信息包括基本层使用预制网络标志位和基本层预制网络索引号,基本层使用预制网络标志位表示基本层使用预制网络,基本层预制网络索引号表示该预制网络结构的网络层在预制神经网络池中对应的索引。增强层信息包括增强层使用预制网络标志位和增强层预制网络索引号,增强层使用预制网络标志位表示增强层使用预制网络,增强层预制网络索引号表示该预制网络结构的网络层在预制神经网络池中对应的索引。又例如,若解码神经网络2的基本层采用预制网络结构的网络层,增强层采用可变网络结构的增强层,则基本层信息包括基本层使用预制网络标志位和基本层预制网络索引号,基本层使用预制网络标志位表示基本层使用预制网络,基本层预制网络索引号表示该预制网络结构的网络层在预制神经网络池中对应的索引。增强层信息包括网络参数,增强层信息中的网络参数与解码神经网络2的增强层采用的网络参数相同。
实施例19:针对实施例14和实施例15,系数超参特征变换单元可以基于编码神经网络对图像特征值F进行系数超参特征变换,得到系数超参特征系数值C。在此基础上,可以基于该编码神经网络的网络结构确定解码端设备的系数超参特征生成单元对应的神经网络信息1,并在码流0中编码系数超参特征生成单元对应的神经网络信息1。关于解码端设备如何根据神经网络信息1生成系数超参特征生成单元对应的解码神经网络1,可以参见实施例9,在此不再重复赘述。
在一种可能的实施方式中,系数超参特征变换单元可以包括编码神经网络1,编码神经网络1可以包括基本层和增强层,可以将编码神经网络1划分为基本层和增强层,基本层可以包括至少一个网络层,基本层也可以不包括网络层,即基本层为空。增强层可以包括至少一个网络层,增强层也可以不包括网络层,即增强层为空。需要注意的是,系数超参特征变换单元的编码神经网络1的网络结构,与系数超参特征生成单元的解码神经网络1的网络结构(参见实施例9)可以对称,系数超参特征变换单元的编码神经网络1的网络参数,与系数超参特征生成单元的解码神经网络1的网络参数,可以相同或者不同,对此编码神经网络1的网络结构不再赘述。
在一种可能的实施方式中,可以为系数超参特征变换单元配置默认网络结构的网络层,与默认网络结构的网络层有关的网络参数均为固定。需要注意的是,为系数超参特征变换单元配置的默认网络结构的网络层,与为系数超参特征生成单元配置的默认网络结构的网络层,二者可以为对称结构。可以为系数超参特征变换单元配置预制神经网络池(与解码端设备的预制神经网络池对应),预制神经网络池包括至少一个预制网络结构的网络层,与预制网络结构的网络层有关的网络参数均可以根据实际需要配置。可以基于网络参数为系数超参特征变换单元动态生成可变网络结构的网络层,与可变网络结构的网络层有关的网络参数是编码端动态生成的。
在一种可能的实施方式中,编码端可以基于编码神经网络1的网络结构确定系数超参特征生成单元对应的神经网络信息1,并在码流0中编码神经网络信息1。其中,神经网络信息1可以包括基本层信息和增强层信息,关于编码神经网络信息1的方式,与编码神经网络信息2的方式类似,可以参见实施例17,在此不再重复赘述。比如说,若编码神经网络1的基本层采用默认网络结构的网络层,且增强层采用默认网络结构的网络层,则基本层信息包括基本层使用默认网络标志位,且基本层使用默认网络标志位表示基本层使用默认网络。增强层信息包括增强层使用默认网络标志位,且增强层使用默认网络标志位表示增强层使用默认网络。
实施例20:针对实施例14和实施例15,编码端的系数超参特征生成单元可以基于解码神经网络对系数超参特征系数重建值C’进行反变换操作,得到系数超参特征值P。在此基础上,也可以基于该解码神经网络的网络结构确定解码端设备的系数超参特征生成单元对应的神经网络信息1,并在码流0中编码系数超参特征生成单元对应的神经网络信息1。
在一种可能的实施方式中,编码端的系数超参特征生成单元可以包括解码神经网络1,解码神经网络1可以包括基本层和增强层,编码端的解码神经网络1的网络结构与解码端的解码神经网络1的网络结构相同,可以参见实施例9,在此不再赘述。
在一种可能的实施方式中,编码端可以为系数超参特征生成单元配置默认网络结构的网络层,这个默认网络结构的网络层与解码端的默认网络结构的网络层相同。编码端可以为系数超参特征生成单元配置预制神经网络池,该预制神经网络池可以包括至少一个预制网络结构的网络层,这个预制神经网络池与解码端的预制神经网络池相同。编码端可以基于网络参数为系数超参特征生成单元动态生成可变网络结构的网络层。编码端可以基于解码神经网络1的网络结构确定系数超参特征生成单元对应的神经网络信息1,并在码流0中编码神经网络信息1。其中,神经网络信息1可以包括基本层信息和增强层信息,关于编码神经网络信息1的方式,与编码神经网络信息2的方式类似,可以参见实施例18,在此不再重复赘述。比如说,若解码神经网络1(编码端采用)的基本层采用默认网络结构的网络层,且增强层采用可变网络结构的增强层,则基本层信息包括基本层使用默认网络标志位,且基本层使用默认网络标志位表示基本层使用默认网络。增强层信息包括用于生成增强层的网络参数,增强层信息中的网络参数与解码神经网络1的增强层(即编码端采用的增强层)采用的网络参数可以相同。
实施例21:针对实施例14和实施例15,编码端的质量增强单元可以基于解码神经网络对图像低阶特征值LF进行增强处理,得到当前块对应的重建图像块I。在此基础上,也可以基于该解码神经网络的网络结构确定解码端设备的质量增强单元对应的神经网络信息3,并在码流0中编码质量增强单元对应的神经网络信息3。比如说,编码端的质量增强单元可以包括解码神经网络3,解码神经网络3可以包括基本层和增强层,编码端的解码神经网络3的网络结构与解码端的解码神经网络3的网络结构相同,可以参见实施例13,在此不再赘述。
示例性的,编码端可以为质量增强单元配置默认网络结构的网络层,这个默认网络结构的网络层与解码端的默认网络结构的网络层相同。编码端可以为质量增强单元配置预制神经网络池,该预制神经网络池可以包括至少一个预制网络结构的网络层,这个预制神经网络池与解码端的预制神经网络池相同。编码端可以基于网络参数为质量增强单元动态生成可变网络结构的网络层。编码端基于解码神经网络3的网络结构确定质量增强单元对应的神经网络信息3,并在码流0中编码神经网络信息3。其中,神经网络信息3可以包括基本层信息和增强层信息,关于编码神经网络信息3的方式,与编码神经网络信息2的方式类似,可以参见实施例18,在此不再重复赘述。比如说,若解码神经网络3的基本层采用默认网络结构的网络层,且增强层采用可变网络结构的增强层,则基本层信息包括基本层使用默认网络标志位,且基本层使用默认网络标志位表示基本层使用默认网络。增强层信息包括用于生成增强层的网络参数,增强层信息中的网络参数与解码神经网络3的增强层采用的网络参数可以相同。
在一种可能的实施方式中,对于实施例1-实施例21中的网络参数,可以均为采用定点化后的网络参数,如网络参数中的滤波权重采用4、8、16、32、64比特位宽表示,对于网络输出的特征值,也可以进行位宽限制,如限制到4、8、16、32、64比特位宽表示。
示例性的,可以将网络参数的位宽限制在8比特,其值大小限制到[-127,127]之间;可以将网络输出的特征值的位宽限制在8比特,其值大小限制到[-127,127]之间。
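示例性的，针对上述定点化与位宽限制，下面给出一个示意性的Python草图，将网络参数或特征值限制到指定比特位宽（如8比特时限制到[-127,127]）；其中浮点值乘以缩放因子后取整的定点化方式仅为举例。

```python
import numpy as np

def fixpoint_clip(values, bit_depth=8, scale=1.0):
    """将滤波权重或网络输出特征值定点化并限制位宽（示意）。"""
    limit = (1 << (bit_depth - 1)) - 1               # bit_depth为8时limit为127
    fixed = np.round(np.asarray(values, dtype=np.float64) * scale)
    return np.clip(fixed, -limit, limit).astype(np.int64)
```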
实施例22:对于图像头的相关语法表,参见表4所示,给出与图像头有关的语法信息,即图像级语法。在表4中,u(n)用于表示n位定长码编码方法。
表4

在表4中，pic_width表示图像宽，pic_height表示图像高，pic_format表示图像格式，如RGB444、YUV444、YUV420、YUV422等图像格式。bu_width表示基本块的宽，bu_height表示基本块的高，block_width表示图像块的宽，block_height表示图像块的高，bit_depth表示图像比特深度，pic_qp表示当前图像中的量化参数，lossless_flag表示当前图像是否进行无损编码标记，feature_map_max_bit_depth表示特征图最大比特深度，特征图最大比特深度用于限制网络输入或输出特征图的最大值和最小值。
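示例性的，针对表4中的图像级语法，下面给出一个示意性的头信息数据结构草图；各字段与表4中语法元素一一对应，但字段类型与取值范围仅为说明用的假设，并非对表4的精确还原。

```python
from dataclasses import dataclass

@dataclass
class PictureHeader:
    pic_width: int                  # 图像宽
    pic_height: int                 # 图像高
    pic_format: int                 # 图像格式（RGB444、YUV444、YUV420、YUV422等的枚举值）
    bu_width: int                   # 基本块的宽
    bu_height: int                  # 基本块的高
    block_width: int                # 图像块的宽
    block_height: int               # 图像块的高
    bit_depth: int                  # 图像比特深度
    pic_qp: int                     # 当前图像中的量化参数
    lossless_flag: int              # 当前图像是否进行无损编码标记
    feature_map_max_bit_depth: int  # 特征图最大比特深度
```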
示例性的,上述各实施例可以单独实现,也可以组合实现,比如说,实施例1-实施例22中的每个实施例,可以单独实现,实施例1-实施例22中的至少两个实施例,可以组合实现。
示例性的,上述各实施例中,编码端的内容也可以应用到解码端,即解码端可以采用相同方式处理,解码端的内容也可以应用到编码端,即编码端可以采用相同方式处理。
基于与上述方法同样的申请构思,本公开实施例中还提出一种基于神经网络的图像解码装置,所述装置应用于解码端,所述装置包括:存储器,其经配置以存储视频数据;解码器,其经配置以实现上述实施例1-实施例22中的解码方法,即解码端的处理流程。
比如说,在一种可能的实施方式中,解码器,其经配置以实现:
从码流中解码当前块对应的控制参数和图像信息;
从所述控制参数中获取解码处理单元对应的神经网络信息,并基于所述神经网络信息生成所述解码处理单元对应的解码神经网络;
基于所述图像信息确定所述解码处理单元对应的输入特征,基于所述解码神经网络对所述输入特征进行处理,得到所述解码处理单元对应的输出特征。
基于与上述方法同样的申请构思,本公开实施例中还提出一种基于神经网络的图像编码装置,所述装置应用于编码端,所述装置包括:存储器,其经配置以存储视频数据;编码器,其经配置以实现上述实施例1-实施例22中的编码方法,即编码端的处理流程。
比如说,在一种可能的实施方式中,编码器,其经配置以实现:
基于当前块确定编码处理单元对应的输入特征,基于所述编码处理单元对应的编码神经网络对所述输入特征进行处理,得到所述编码处理单元对应的输出特征,并基于所述输出特征确定所述当前块对应的图像信息;
获取当前块对应的控制参数,所述控制参数包括解码处理单元对应的神经网络信息,所述神经网络信息用于确定所述解码处理单元对应的解码神经网络;
在码流中编码所述当前块对应的图像信息和控制参数。
基于与上述方法同样的申请构思，本公开实施例提供的解码端设备（也可以称为视频解码器），从硬件层面而言，其硬件架构示意图具体可以参见图8A所示，包括：处理器811和机器可读存储介质812，机器可读存储介质812存储有能够被处理器811执行的机器可执行指令；处理器811用于执行机器可执行指令，以实现本公开上述实施例1-22的解码方法。
基于与上述方法同样的申请构思，本公开实施例提供的编码端设备（也可以称为视频编码器），从硬件层面而言，其硬件架构示意图具体可以参见图8B所示，包括：处理器821和机器可读存储介质822，机器可读存储介质822存储有能够被处理器821执行的机器可执行指令；处理器821用于执行机器可执行指令，以实现本公开上述实施例1-22的编码方法。
基于与上述方法同样的申请构思，本公开实施例还提供一种机器可读存储介质，所述机器可读存储介质上存储有若干计算机指令，所述计算机指令被处理器执行时，能够实现本公开上述示例公开的方法，如上述各实施例中的解码方法或者编码方法。
基于与上述方法同样的申请构思，本公开实施例还提供一种计算机应用程序，所述计算机应用程序被处理器执行时，能够实现本公开上述示例公开的解码方法或者编码方法。
基于与上述方法同样的申请构思,本公开实施例中还提出一种基于神经网络的图像解码装置,所述装置应用于解码端,所述装置包括:解码模块,用于从码流中解码当前块对应的控制参数和图像信息;获取模块,用于从所述控制参数中获取解码处理单元对应的神经网络信息,并基于所述神经网络信息生成所述解码处理单元对应的解码神经网络;处理模块,用于基于所述图像信息确定所述解码处理单元对应的输入特征,基于所述解码神经网络对所述输入特征进行处理,得到所述解码处理单元对应的输出特征。
示例性的,若所述神经网络信息包括基本层信息和增强层信息,所述获取模块基于所述神经网络信息生成所述解码处理单元对应的解码神经网络时具体用于:基于所述基本层信息确定所述解码处理单元对应的基本层;基于所述增强层信息确定所述解码处理单元对应的增强层;基于所述基本层和所述增强层生成所述解码处理单元对应的解码神经网络。
示例性的,所述获取模块基于所述基本层信息确定所述解码处理单元对应的基本层时具体用于:若所述基本层信息包括基本层使用默认网络标志位,且所述基本层使用默认网络标志位表示基本层使用默认网络,则获取默认网络结构的基本层。
示例性的,所述获取模块基于所述基本层信息确定所述解码处理单元对应的基本层时具体用于:若所述基本层信息包括基本层使用预制网络标志位和基本层预制网络索引号,且所述基本层使用预制网络标志位表示基本层使用预制网络,则从预制神经网络池中选取所述基本层预制网络索引号对应的预制网络结构的基本层;其中,所述预制神经网络池包括至少一个预制网络结构的网络层。
示例性的,所述获取模块基于所述增强层信息确定所述解码处理单元对应的增强层时具体用于:若所述增强层信息包括增强层使用默认网络标志位,且所述增强层使用默认网络标志位表示增强层使用默认网络,则获取默认网络结构的增强层。
示例性的,所述获取模块基于所述增强层信息确定所述解码处理单元对应的增强层时具体用于:若所述增强层信息包括增强层使用预制网络标志位和增强层预制网络索引号,且所述增强层使用预制网络标志位表示增强层使用预制网络,则从预制神经网络池中选取所述增强层预制网络索引号对应的预制网络结构的增强层;其中,所述预制神经网络池包括至少一个预制网络结构的网络层。
示例性的,所述获取模块基于所述增强层信息确定所述解码处理单元对应的增强层时具体用于:若所述增强层信息包括用于生成增强层的网络参数,则基于所述网络参数生成所述解码处理单元对应的增强层;其中,所述网络参数包括以下至少一种:神经网络层数、反卷积层标志位、反卷积层的数量、每个反卷积层的量化步长、每个反卷积层的通道数、卷积核的尺寸、滤波数量、滤波尺寸索引、滤波系数为零标志位、滤波系数、激活层标志位、激活层类型。
示例性的,所述图像信息包括系数超参特征信息和图像特征信息,所述处理模块基于所述图像信息确定所述解码处理单元对应的输入特征,基于所述解码神经网络对所述输入特征进行处理,得到所述解码处理单元对应的输出特征时具体用于:在执行系数超参特征生成的解码过程时,基于所述系数超参特征信息确定系数超参特征系数重建值;基于所述解码神经网络对所述系数超参特征系数重建值进行反变换操作,得到系数超参特征值;其中,所述系数超参特征值用于从码流中解码所述图像特征信息;在执行图像特征反变换的解码过程时,基于所述图像特征信息确定图像特征重建值;基于所述解码神经网络对所述图像特征重建值进行反变换操作,得到图像低阶特征值;其中,所述图像低阶特征值用于获得所述当前块对应的重建图像块。
示例性的,所述处理模块基于所述系数超参特征信息确定系数超参特征系数重建值时具体用于:若所述控制参数包括第一使能信息,且所述第一使能信息表示使能第一反量化操作,则对所述系数超参特征信息进行反量化,得到系数超参特征系数重建值。
示例性的,所述处理模块基于所述图像特征信息确定图像特征重建值时具体用于:若所述控制参数包括第二使能信息,且所述第二使能信息表示使能第二反量化操作,则对所述图像特征信息进行反量化,得到图像特征重建值。
示例性的,所述处理模块还用于:若所述控制参数包括第三使能信息,所述第三使能信息表示使能质量增强操作,在执行质量增强的解码过程时,获取所述图像低阶特征值,基于所述解码神经网络对所述图像低阶特征值进行增强处理,得到所述当前块对应的重建图像块。
基于与上述方法同样的申请构思，本公开实施例中还提出一种基于神经网络的图像编码装置，所述装置应用于编码端，所述装置包括：处理模块，用于基于当前块确定编码处理单元对应的输入特征，基于所述编码处理单元对应的编码神经网络对所述输入特征进行处理，得到所述编码处理单元对应的输出特征，并基于所述输出特征确定所述当前块对应的图像信息；获取模块，用于获取当前块对应的控制参数，所述控制参数包括解码处理单元对应的神经网络信息，所述神经网络信息用于确定所述解码处理单元对应的解码神经网络；编码模块，用于在码流中编码所述当前块对应的图像信息和控制参数。
示例性的,所述神经网络信息包括基本层信息和增强层信息,所述解码神经网络包括基于所述基本层信息确定的基本层、及基于所述增强层信息确定的增强层。
示例性的,若所述基本层信息包括基本层使用默认网络标志位,且所述基本层使用默认网络标志位表示基本层使用默认网络,所述解码神经网络采用默认网络结构的基本层。
示例性的,若所述基本层信息包括基本层使用预制网络标志位和基本层预制网络索引号,且所述基本层使用预制网络标志位表示基本层使用预制网络,则所述解码神经网络采用从预制神经网络池中选取的所述基本层预制网络索引号对应的预制网络结构的基本层;其中,所述预制神经网络池包括至少一个预制网络结构的网络层。
示例性的,若所述增强层信息包括增强层使用默认网络标志位,且所述增强层使用默认网络标志位表示增强层使用默认网络,所述解码神经网络采用默认网络结构的增强层。
示例性的,若所述增强层信息包括增强层使用预制网络标志位和增强层预制网络索引号,且所述增强层使用预制网络标志位表示增强层使用预制网络,则所述解码神经网络采用从预制神经网络池中选取的所述增强层预制网络索引号对应的预制网络结构的增强层;其中,所述预制神经网络池包括至少一个预制网络结构的网络层。
示例性的,若所述增强层信息包括用于生成增强层的网络参数,所述解码神经网络采用基于所述网络参数生成的增强层;其中,所述网络参数包括以下至少一种:神经网络层数、反卷积层标志位、反卷积层的数量、每个反卷积层的量化步长、每个反卷积层的通道数、卷积核的尺寸、滤波数量、滤波尺寸索引、滤波系数为零标志位、滤波系数、激活层标志位、激活层类型。
示例性的,所述处理模块还用于:将当前图像划分为N个互相不重合的图像块,N为正整数;对每个图像块进行边界填充,得到边界填充后的图像块;其中,在对每个图像块进行边界填充时,填充值不依赖相邻图像块的重建像素值;基于边界填充后的图像块生成N个当前块。
示例性的,所述处理模块还用于:将当前图像划分为多个基本块,且每个基本块包括至少一个图像块;对每个图像块进行边界填充,得到边界填充后的图像块;其中,在对每个图像块进行边界填充时,该图像块的填充值不依赖同一基本块内其它图像块的重建像素值,且允许依赖不同基本块内图像块的重建像素值;基于边界填充后的图像块生成多个当前块。
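示例性的，针对上述填充值不依赖相邻图像块重建像素值的划分与边界填充方式，下面给出一个示意性的Python草图；其中采用边缘复制方式填充、且假设输入为单通道图像，均仅为说明用的举例。

```python
import numpy as np

def pad_block(block, pad):
    """对单个图像块做边界填充（示意）：只使用本块内的像素进行边缘复制，
    不依赖相邻图像块的重建像素值。"""
    return np.pad(block, ((pad, pad), (pad, pad)), mode="edge")

def split_and_pad(image, block_h, block_w, pad):
    """将当前图像划分为互不重合的图像块并分别做边界填充，生成N个当前块（示意）。"""
    blocks = []
    h, w = image.shape[:2]
    for y in range(0, h, block_h):
        for x in range(0, w, block_w):
            blocks.append(pad_block(image[y:y + block_h, x:x + block_w], pad))
    return blocks
```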
本领域内的技术人员应明白,本公开实施例可提供为方法、系统、或计算机程序产品。本公开可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。本公开实施例可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。
以上所述仅为本公开的实施例而已,并不用于限制本公开。对于本领域技术人员来说,本公开可以有各种更改和变化。凡在本公开的精神和原理之内所作的任何修改、等同替换、改进等,均应包含在本公开的权利要求范围之内。

Claims (23)

  1. 一种基于神经网络的图像解码方法,其特征在于,所述方法包括:
    从码流中解码当前块对应的控制参数和图像信息;
    从所述控制参数中获取解码处理单元对应的神经网络信息,并基于所述神经网络信息生成所述解码处理单元对应的解码神经网络;
    基于所述图像信息确定所述解码处理单元对应的输入特征,基于所述解码神经网络对所述输入特征进行处理,得到所述解码处理单元对应的输出特征。
  2. 根据权利要求1所述的方法,其特征在于,
    若所述神经网络信息包括基本层信息和增强层信息,所述基于所述神经网络信息生成所述解码处理单元对应的解码神经网络,包括:
    基于所述基本层信息确定所述解码处理单元对应的基本层;
    基于所述增强层信息确定所述解码处理单元对应的增强层;
    基于所述基本层和所述增强层生成所述解码处理单元对应的解码神经网络。
  3. 根据权利要求2所述的方法,其特征在于,
    所述基于所述基本层信息确定所述解码处理单元对应的基本层,包括:
    若所述基本层信息包括基本层使用默认网络标志位,且所述基本层使用默认网络标志位表示基本层使用默认网络,则获取默认网络结构的基本层。
  4. 根据权利要求2所述的方法,其特征在于,
    所述基于所述基本层信息确定所述解码处理单元对应的基本层,包括:
    若所述基本层信息包括基本层使用预制网络标志位和基本层预制网络索引号,且所述基本层使用预制网络标志位表示基本层使用预制网络,则从预制神经网络池中选取所述基本层预制网络索引号对应的预制网络结构的基本层;
    其中,所述预制神经网络池包括至少一个预制网络结构的网络层。
  5. 根据权利要求2所述的方法,其特征在于,
    所述基于所述增强层信息确定所述解码处理单元对应的增强层,包括:
    若所述增强层信息包括增强层使用默认网络标志位,且所述增强层使用默认网络标志位表示增强层使用默认网络,则获取默认网络结构的增强层。
  6. 根据权利要求2所述的方法,其特征在于,
    所述基于所述增强层信息确定所述解码处理单元对应的增强层,包括:
    若所述增强层信息包括增强层使用预制网络标志位和增强层预制网络索引号,且所述增强层使用预制网络标志位表示增强层使用预制网络,则从预制神经网络池中选取所述增强层预制网络索引号对应的预制网络结构的增强层;
    其中,所述预制神经网络池包括至少一个预制网络结构的网络层。
  7. 根据权利要求2所述的方法,其特征在于,
    所述基于所述增强层信息确定所述解码处理单元对应的增强层,包括:
    若所述增强层信息包括用于生成增强层的网络参数,则基于所述网络参数生成所述解码处理单元对应的增强层;其中,所述网络参数包括以下至少一种:
    神经网络层数、反卷积层标志位、反卷积层的数量、每个反卷积层的量化步长、每个反卷积层的通道数、卷积核的尺寸、滤波数量、滤波尺寸索引、滤波系数为零标志位、滤波系数、激活层标志位、激活层类型。
  8. 根据权利要求1-7任一所述的方法,其特征在于,所述图像信息包括系数超参特征信息和图像特征信息,所述基于所述图像信息确定所述解码处理单元对应的输入特征,基于所述解码神经网络对所述输入特征进行处理,得到所述解码处理单元对应的输出特征,包括:
    在执行系数超参特征生成的解码过程时,基于所述系数超参特征信息确定系数超参特征系数重建值;基于所述解码神经网络对所述系数超参特征系数重建值进行反变换操作,得到系数超参特征值;其中,所述系数超参特征值用于从码流中解码所述图像特征信息;
    在执行图像特征反变换的解码过程时,基于所述图像特征信息确定图像特征重建值;基于所述解码神经网络对所述图像特征重建值进行反变换操作,得到图像低阶特征值;其中,所述图像低阶特征值用于获得所述当前块对应的重建图像块。
  9. 根据权利要求8所述的方法,其特征在于,
    所述基于所述系数超参特征信息确定系数超参特征系数重建值,包括:
    若所述控制参数包括第一使能信息,且所述第一使能信息表示使能第一反量化操作,则对所述系数超参特征信息进行反量化,得到系数超参特征系数重建值;
    所述基于所述图像特征信息确定图像特征重建值,包括:
    若所述控制参数包括第二使能信息,且所述第二使能信息表示使能第二反量化操作,则对所述图像特征信息进行反量化,得到图像特征重建值。
  10. 根据权利要求8所述的方法,其特征在于,所述方法还包括:
    若所述控制参数包括第三使能信息,且所述第三使能信息表示使能质量增强操作, 在执行质量增强的解码过程时,获取所述图像低阶特征值,基于所述解码神经网络对所述图像低阶特征值进行增强处理,得到所述当前块对应的重建图像块。
  11. 一种基于神经网络的图像编码方法,其特征在于,所述方法包括:
    基于当前块确定编码处理单元对应的输入特征,基于所述编码处理单元对应的编码神经网络对所述输入特征进行处理,得到所述编码处理单元对应的输出特征,并基于所述输出特征确定所述当前块对应的图像信息;
    获取当前块对应的控制参数,所述控制参数包括解码处理单元对应的神经网络信息,所述神经网络信息用于确定所述解码处理单元对应的解码神经网络;
    在码流中编码所述当前块对应的图像信息和控制参数。
  12. 根据权利要求11所述的方法,其特征在于,
    所述神经网络信息包括基本层信息和增强层信息,所述解码神经网络包括基于所述基本层信息确定的基本层、及基于所述增强层信息确定的增强层。
  13. 根据权利要求12所述的方法,其特征在于,
    若所述基本层信息包括基本层使用默认网络标志位,且所述基本层使用默认网络标志位表示基本层使用默认网络,所述解码神经网络采用默认网络结构的基本层。
  14. 根据权利要求12所述的方法,其特征在于,
    若所述基本层信息包括基本层使用预制网络标志位和基本层预制网络索引号,且所述基本层使用预制网络标志位表示基本层使用预制网络,则所述解码神经网络采用从预制神经网络池中选取的所述基本层预制网络索引号对应的预制网络结构的基本层;
    其中,所述预制神经网络池包括至少一个预制网络结构的网络层。
  15. 根据权利要求12所述的方法,其特征在于,
    若所述增强层信息包括增强层使用默认网络标志位,且所述增强层使用默认网络标志位表示增强层使用默认网络,所述解码神经网络采用默认网络结构的增强层。
  16. 根据权利要求12所述的方法,其特征在于,
    若所述增强层信息包括增强层使用预制网络标志位和增强层预制网络索引号,且所述增强层使用预制网络标志位表示增强层使用预制网络,则所述解码神经网络采用从预制神经网络池中选取的所述增强层预制网络索引号对应的预制网络结构的增强层;
    其中,所述预制神经网络池包括至少一个预制网络结构的网络层。
  17. 根据权利要求12所述的方法,其特征在于,
    若所述增强层信息包括用于生成增强层的网络参数,所述解码神经网络采用基于所述网络参数生成的增强层;其中,所述网络参数包括以下至少一种:
    神经网络层数、反卷积层标志位、反卷积层的数量、每个反卷积层的量化步长、每个反卷积层的通道数、卷积核的尺寸、滤波数量、滤波尺寸索引、滤波系数为零标志位、滤波系数、激活层标志位、激活层类型。
  18. 根据权利要求11-17任一所述的方法,其特征在于,
    所述基于当前块确定编码处理单元对应的输入特征之前,所述方法还包括:
    将当前图像划分为N个互相不重合的图像块,N为正整数;
    对每个图像块进行边界填充,得到边界填充后的图像块;其中,在对每个图像块进行边界填充时,填充值不依赖相邻图像块的重建像素值;
    基于边界填充后的图像块生成N个当前块。
  19. 根据权利要求11-17任一所述的方法,其特征在于,
    所述基于当前块确定编码处理单元对应的输入特征之前,所述方法还包括:
    将当前图像划分为多个基本块,且每个基本块包括至少一个图像块;
    对每个图像块进行边界填充,得到边界填充后的图像块;其中,在对每个图像块进行边界填充时,该图像块的填充值不依赖同一基本块内其它图像块的重建像素值,且允许依赖不同基本块内图像块的重建像素值;
    基于边界填充后的图像块生成多个当前块。
  20. 一种基于神经网络的图像解码装置,其特征在于,所述装置包括:
    存储器,其经配置以存储视频数据;
    解码器,其经配置以实现:
    从码流中解码当前块对应的控制参数和图像信息;
    从所述控制参数中获取解码处理单元对应的神经网络信息,并基于所述神经网络信息生成所述解码处理单元对应的解码神经网络;
    基于所述图像信息确定所述解码处理单元对应的输入特征,基于所述解码神经网络对所述输入特征进行处理,得到所述解码处理单元对应的输出特征。
  21. 一种基于神经网络的图像编码装置,其特征在于,所述装置包括:
    存储器,其经配置以存储视频数据;
    编码器,其经配置以实现:
    基于当前块确定编码处理单元对应的输入特征,基于所述编码处理单元对应的编码神经网络对所述输入特征进行处理,得到所述编码处理单元对应的输出特征,并基于所述输出特征确定所述当前块对应的图像信息;
    获取当前块对应的控制参数,所述控制参数包括解码处理单元对应的神经网络信息,所述神经网络信息用于确定所述解码处理单元对应的解码神经网络;
    在码流中编码所述当前块对应的图像信息和控制参数。
  22. 一种解码端设备,其特征在于,包括:处理器和机器可读存储介质,所述机器可读存储介质存储有能够被所述处理器执行的机器可执行指令;
    所述处理器用于执行机器可执行指令,以实现权利要求1-10任一项所述的方法步骤。
  23. 一种编码端设备,其特征在于,包括:处理器和机器可读存储介质,所述机器可读存储介质存储有能够被所述处理器执行的机器可执行指令;
    所述处理器用于执行机器可执行指令,以实现权利要求11-19任一项所述的方法步骤。