WO2023040745A1 - Feature map encoding and decoding method and apparatus - Google Patents

Feature map encoding and decoding method and apparatus (特征图编解码方法和装置)

Info

Publication number
WO2023040745A1
WO2023040745A1 (PCT/CN2022/117819)
Authority
WO
WIPO (PCT)
Prior art keywords
feature
probability
elements
feature element
estimation result
Prior art date
Application number
PCT/CN2022/117819
Other languages
English (en)
French (fr)
Inventor
师一博
葛运英
王晶
毛珏
赵寅
杨海涛
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to AU2022348742A1
Priority to CA3232206A1
Publication of WO2023040745A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/13Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136Incoming video signal characteristics or properties
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/44Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/593Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques

Definitions

  • the embodiments of the present application relate to the technical field of audio, video or image compression based on artificial intelligence (AI), and in particular, to a feature map encoding and decoding method and device.
  • Image compression refers to technology that exploits characteristics of image data, such as spatial redundancy, visual redundancy and statistical redundancy, to represent the original image pixel matrix with fewer bits, either lossily or losslessly, so as to achieve efficient transmission and storage of image information. Image compression is divided into lossless compression and lossy compression: lossless compression causes no loss of image detail, whereas lossy compression achieves a larger compression ratio at the cost of a certain reduction in image quality.
  • the present application provides a feature map encoding and decoding method and device, which can improve encoding and decoding performance while reducing encoding and decoding complexity.
  • the present application provides a feature map decoding method, the method comprising: obtaining a code stream of a feature map to be decoded, where the feature map to be decoded includes a plurality of feature elements; obtaining, based on the code stream of the feature map to be decoded, a first probability estimation result corresponding to each of the plurality of feature elements, the first probability estimation result including a first peak probability; determining a set of first feature elements and a set of second feature elements from the plurality of feature elements based on a first threshold and the first peak probability corresponding to each feature element; and obtaining the decoded feature map based on the set of first feature elements and the set of second feature elements.
  • by using the first threshold together with the peak probability corresponding to each feature element, the present application determines the first feature elements and the second feature elements more accurately, thereby improving the accuracy of the obtained decoded feature map and the performance of data decoding.
  • the first probability estimation result is a Gaussian distribution
  • the first peak probability is a mean probability of the Gaussian distribution
  • the first probability estimation result is a mixed Gaussian distribution
  • the mixed Gaussian distribution is composed of a plurality of Gaussian distributions
  • the first peak probability is the maximum value among the mean probabilities of the individual Gaussian distributions; or, the first peak probability is calculated from the mean probability of each Gaussian distribution and the weight of each Gaussian distribution in the mixed Gaussian distribution.
  • the values of the decoded feature map are composed of values of all first feature elements in the set of first feature elements and values of all second feature elements in the set of second feature elements.
  • the set of the first feature elements is an empty set, or the set of the second feature elements is an empty set.
  • the first probability estimation result further includes the feature value corresponding to the first peak probability; further, entropy decoding may be performed on each first feature element based on its corresponding first probability estimation result to obtain the value of the first feature element, and the value of each second feature element is obtained based on the feature value corresponding to the first peak probability of that second feature element.
  • compared with assigning an uncoded feature element (i.e., a second feature element) a fixed value, the present application assigns it the feature value corresponding to the first peak probability of that second feature element, which improves the accuracy of the second feature elements' values in the decoded feature map and thereby improves the performance of data decoding.
  • before the set of first feature elements and the set of second feature elements are determined from the plurality of feature elements based on the first threshold and the first peak probability corresponding to each feature element, the first threshold may also be obtained based on the code stream of the feature map to be decoded.
  • each feature map to be decoded corresponds to its own first threshold, which increases the flexibility of the first threshold, thereby reducing the gap between the substituted value and the true value of an uncoded feature element (i.e., a second feature element) and improving the accuracy of the decoded feature map.
  • the first peak probability of the first feature element is less than or equal to a first threshold, and the first peak probability of the second feature element is greater than the first threshold.
  • the first probability estimation result is a Gaussian distribution
  • the first probability estimation result also includes a first probability variance value; in this case, the first probability variance value of the first feature element is greater than or equal to the first threshold, and the first probability variance value of the second feature element is smaller than the first threshold.
  • when the probability estimation result is a Gaussian distribution, determining the first feature elements and the second feature elements via the probability variance value has lower time complexity than determining them via the peak probability, which can improve the speed of data decoding.
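  • To make the decision rules above concrete, the following sketch (illustrative only, not part of the patent; entropy_decode is a hypothetical per-element entropy decoder supplied by the caller) shows how a decoder might partition feature elements into first feature elements (entropy-decoded) and second feature elements (replaced by the feature value at the first peak probability):

```python
def decode_feature_map(prob_estimates, first_threshold, entropy_decode, bitstream):
    """Hedged sketch of the decode-side decision described above.

    prob_estimates: per-element dicts with 'peak_prob' (first peak probability)
    and 'peak_value' (feature value corresponding to that peak).
    entropy_decode: hypothetical callable that decodes one element from the
    bitstream using the element's probability model.
    """
    decoded = []
    for est in prob_estimates:
        if est['peak_prob'] <= first_threshold:
            # First feature element: its value was entropy-encoded, so decode it.
            decoded.append(entropy_decode(bitstream, est))
        else:
            # Second feature element: encoding was skipped; substitute the
            # feature value corresponding to the first peak probability.
            decoded.append(est['peak_value'])
    return decoded
```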
  • side information corresponding to the feature map to be decoded is obtained based on a code stream of the feature map to be decoded; based on the side information, a first probability estimation result corresponding to each feature element is obtained.
  • the side information corresponding to the feature map to be decoded is obtained; for each feature element in the feature map to be decoded, the first probability estimation result of the feature element is estimated based on the side information and first context information, where the first context information consists of feature elements within a preset region of the feature map to be decoded.
  • the present application provides a feature map encoding method, the method comprising: obtaining a first feature map to be encoded, where the first feature map to be encoded includes a plurality of feature elements; determining, based on the first feature map to be encoded, a first probability estimation result of each of the plurality of feature elements, the first probability estimation result including a first peak probability; for each feature element in the first feature map to be encoded, determining, based on the first peak probability of the feature element, whether the feature element is a first feature element; and performing entropy encoding on the feature element only if it is a first feature element.
  • the reliability of the judgment result (i.e., whether a feature element needs entropy coding) is improved by using the probability peak value of each feature element, and the coding process of more feature elements can be skipped, further improving the coding speed and thereby the coding performance.
  • the first probability estimation result is a Gaussian distribution
  • the first peak probability is the mean probability of the Gaussian distribution
  • the first probability estimation result is a mixed Gaussian distribution
  • the mixed Gaussian distribution is composed of a plurality of Gaussian distributions
  • the first peak probability is the maximum value among the mean probabilities of the individual Gaussian distributions; or, the first peak probability is calculated from the mean probability of each Gaussian distribution and the weight of each Gaussian distribution in the mixed Gaussian distribution.
  • for each feature element in the first feature map to be encoded, it is determined whether the feature element is a first feature element based on the first threshold and the first peak probability of the feature element.
  • the second probability estimation result of each feature element in the plurality of feature elements is determined, and the second probability estimation result includes a second peak probability;
  • a third feature element set is determined from the plurality of feature elements;
  • a first threshold is determined based on the second peak probability of each feature element in the third feature element set; entropy encoding is performed on the first threshold.
  • the feature map to be encoded can determine its first threshold according to its own feature elements, so that the first threshold is better adapted to the feature map to be encoded; as a result, the judgment result determined based on the first threshold and the first peak probability of a feature element (i.e., whether the feature element needs entropy coding) has higher reliability.
  • the first threshold is a maximum second peak probability of the second peak probabilities corresponding to each feature element in the third feature element set.
  • the first peak probability of the first feature element is less than or equal to the first threshold.
  • the second probability estimation result is a Gaussian distribution
  • the second probability estimation result also includes a second probability variance value
  • the first threshold is the minimum second probability variance value among the second probability variance values corresponding to the feature elements in the third feature element set.
  • the first probability estimation result is also a Gaussian distribution
  • the first probability estimation result further includes a first probability variance value
  • the first probability variance value of the first feature element is greater than or equal to a first threshold.
  • the second probability estimation result also includes the feature value corresponding to the second peak probability; further, the set of third feature elements is determined from the plurality of feature elements based on a preset error, the value of each feature element, and the feature value corresponding to the second peak probability of each feature element.
  • the feature elements in the set of third feature elements satisfy |ŷ(x,y,i) − p(x,y,i)| > TH_2, where ŷ(x,y,i) denotes the value of the feature element, p(x,y,i) is the feature value corresponding to the second peak probability of the feature element, and TH_2 is the preset error.
  • the first probability estimation result is the same as the second probability estimation result.
  • the side information of the first feature map to be encoded is obtained, and probability estimation is performed based on the side information to obtain the first probability estimation result of each feature element.
  • the first probability estimation result is different from the second probability estimation result.
  • the second context information consists of feature elements within a preset region of the first feature map to be encoded; based on the side information and the second context information, the second probability estimation result of each feature element is obtained.
  • the side information of the first feature map to be encoded is obtained; for any feature element in the first feature map to be encoded, the first probability estimation result of the feature element is determined based on the first context information and the side information, where the first probability estimation result also includes the feature value corresponding to the first peak probability, and the first context information consists of feature elements within a preset region of the second feature map to be encoded.
  • the value of the second feature map to be encoded is composed of the values of the first feature elements and the feature values corresponding to the first peak probability of the second feature elements, where the second feature elements are the feature elements in the first feature map to be encoded other than the first feature elements.
  • the entropy coding results of all the first feature elements are written into the coded code stream.
  • the present application provides a feature map decoding device, including:
  • An acquisition module configured to acquire a code stream of a feature map to be decoded, where the feature map to be decoded includes a plurality of feature elements; and based on the code stream of the feature map to be decoded, to obtain each of the plurality of feature elements A first probability estimation result corresponding to the feature element, the first probability estimation result including a first peak probability;
  • a decoding module configured to determine a set of first feature elements and a set of second feature elements from the plurality of feature elements based on the first threshold and the first peak probability corresponding to each feature element, and to obtain a decoded feature map based on the set of first feature elements and the set of second feature elements.
  • the present application provides a feature map encoding device, including:
  • An acquisition module configured to acquire a first feature map to be encoded, where the first feature map to be encoded includes a plurality of feature elements
  • An encoding module configured to determine, based on the first feature map to be encoded, a first probability estimation result for each of the plurality of feature elements, where the first probability estimation result includes a first peak probability; for the For each feature element in the first feature map to be encoded, based on the first peak probability of the feature element, determine whether the feature element is the first feature element; only if the feature element is the first feature element , performing entropy encoding on the first feature element.
  • the present application provides a decoder, including a processing circuit configured to perform the method according to the first aspect or any implementation of the first aspect.
  • the present application provides an encoder, including a processing circuit configured to perform the method according to the second aspect or any implementation of the second aspect.
  • the present application provides a computer program product, including program code which, when executed, performs the method according to the first aspect or any implementation of the first aspect, or the method according to the second aspect or any implementation of the second aspect.
  • the present application provides a decoder, including: one or more processors; and a non-transitory computer-readable storage medium coupled to the processors and storing a program for execution by the processors, wherein the program, when executed by the processors, causes the decoder to perform the method according to the first aspect or any implementation of the first aspect.
  • the present application provides an encoder, including: one or more processors; and a non-transitory computer-readable storage medium coupled to the processors and storing a program for execution by the processors, wherein the program, when executed by the processors, causes the encoder to perform the method according to the second aspect or any implementation of the second aspect.
  • the present application provides a non-transitory computer-readable storage medium, including program code which, when executed, performs the method according to the first aspect or any implementation of the first aspect, or the second aspect or any implementation of the second aspect.
  • the present invention relates to a decoding device, which has the function of realizing the behavior in the first aspect or any one of the method embodiments of the first aspect.
  • These functions can be implemented by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • the present invention relates to an encoding device, which has the function of implementing the actions in the second aspect or any one of the method embodiments of the second aspect.
  • These functions can be implemented by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • FIG. 1 is a schematic diagram of a data decoding system architecture provided by an embodiment of the present application
  • Fig. 2a is a schematic diagram of an output result of a probability estimation module 103 provided by an embodiment of the present application.
  • Fig. 2b is a schematic diagram of a probability estimation result provided by the embodiment of the present application.
  • Fig. 3 is a schematic flow chart of a feature map encoding method provided by an embodiment of the present application.
  • Fig. 4a is a schematic diagram of the input and output results of a probability estimation module 103 provided by the embodiment of the present application;
  • Fig. 4b is a schematic structural diagram of a probability estimation network provided by an embodiment of the present application.
  • Fig. 4c is a schematic flowchart of a method for determining a first threshold provided by an embodiment of the present application
  • FIG. 5 is a schematic flowchart of a feature map decoding method provided in an embodiment of the present application.
  • Fig. 6a is a schematic flowchart of another feature map encoding method provided by the embodiment of the present application.
  • Fig. 6b is a schematic diagram of the input and output results of another probability estimation module 103 provided by the embodiment of the present application.
  • Fig. 7a is a schematic flowchart of another feature map decoding method provided by the embodiment of the present application.
  • Figure 7b is a schematic diagram of the experimental results of a compression performance comparison test provided in the embodiment of the present application.
  • Figure 7c is a schematic diagram of the experimental results of another compression performance comparison test provided in the embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a feature map encoding device provided by an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a feature map decoding device provided by an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • the feature map decoding method and feature map encoding method provided in the embodiments of the present application can be applied in the field of data coding (including audio coding, video coding and image coding); specifically, they can be applied to album management, human-computer interaction, audio compression or transmission, video compression or transmission, image compression or transmission, and data compression or transmission. It should be noted that, for ease of understanding, the embodiments of the present application only use the application of the feature map decoding method and feature map encoding method to the field of image coding for schematic illustration, which should not be regarded as a limitation on the methods provided in the present application.
  • the end-to-end image feature map encoding and decoding system includes two parts: image encoding and image decoding.
  • Image encoding is performed at the source and typically involves processing (e.g., compressing) raw video images to reduce the amount of data required to represent them (and thus to store and/or transmit them more efficiently).
  • Image decoding is performed at the destination and usually involves processing inverse to that of the encoder, to reconstruct the image.
  • in an end-to-end image feature map encoding and decoding system, the feature map decoding and feature map encoding methods provided in this application make it possible to determine, for each feature element in the feature map to be encoded, whether entropy coding is required, thereby skipping the encoding process of some feature elements, reducing the number of elements on which entropy encoding is performed, and reducing the complexity of entropy encoding. Moreover, the reliability of the judgment result (whether entropy coding needs to be performed on a feature element) is improved by using the probability peak value of each feature element, thereby improving the performance of image compression.
  • Entropy coding is coding that loses no information, in accordance with the principle of entropy. Entropy coding algorithms or schemes are applied to quantized coefficients and other syntax elements to obtain coded data that can be output through the output terminal in the form of a coded bitstream, so that a decoder or the like can receive and use the parameters for decoding.
  • the encoded bitstream can be transmitted to the decoder, or it can be stored in memory for later transmission or retrieval by the decoder.
  • entropy coding algorithms or schemes include but are not limited to: variable length coding (VLC) schemes, context adaptive VLC (CAVLC) schemes, arithmetic coding schemes, binarization algorithms, context adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or other entropy coding methods or techniques.
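  • As a rough illustration (not from the patent) of why skipping feature elements with a very high peak probability saves bits: under ideal entropy coding, a symbol of probability p costs about −log2(p) bits, so a near-certain element contributes almost nothing even when it is coded, and skipping it avoids the remaining overhead entirely:

```python
import math

def ideal_code_length_bits(p: float) -> float:
    """Approximate cost, in bits, of entropy-coding a symbol whose probability is p."""
    return -math.log2(p)

# An almost-certain feature element is nearly free to code (and is a natural
# candidate for being skipped), while an uncertain one is comparatively costly.
print(ideal_code_length_bits(0.99))  # ~0.014 bits
print(ideal_code_length_bits(0.50))  # 1.0 bit
```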
  • the neural network may be composed of neural units, and a neural unit may refer to an operation unit that takes x_s and an intercept of 1 as inputs; the output of the operation unit may be as shown in formula (1): h_{W,b}(x) = f(Σ_s W_s·x_s + b).
  • W_s is the weight of x_s
  • b is the bias of the neural unit.
  • f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal. The output signal of this activation function can be used as the input of the next convolutional layer.
  • the activation function may be a sigmoid function.
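  • A minimal sketch (illustrative, not from the patent) of the single neural unit just described, using a sigmoid activation for f:

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def neural_unit(xs, ws, b):
    """Output of one neural unit: f(sum_s W_s * x_s + b), with f = sigmoid."""
    z = sum(w * x for w, x in zip(ws, xs)) + b
    return sigmoid(z)

print(neural_unit(xs=[0.5, -1.0, 2.0], ws=[0.1, 0.4, 0.3], b=0.2))
```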
  • a neural network is a network formed by connecting many of the above-mentioned single neural units, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the features of the local receptive field.
  • the local receptive field can be an area composed of several neural units.
  • DNN Deep neural network
  • DNN is also called multi-layer neural network, which can be understood as a neural network with multiple hidden layers.
  • DNN is divided according to the position of different layers, and the neural network inside DNN can be divided into three categories: input layer, hidden layer, and output layer.
  • the first layer is the input layer
  • the last layer is the output layer
  • the layers in the middle are all hidden layers.
  • the layers are fully connected, that is, any neuron in the i-th layer must be connected to any neuron in the i+1-th layer.
  • the linear relationship of each layer can be expressed as y = α(Wx + b), where x is the input vector, y is the output vector, b is the offset vector, W is the weight matrix (also called the coefficients), and α() is the activation function.
  • Each layer is just such a simple operation on the input vector x to get the output vector y. Due to the large number of DNN layers, the number of coefficients W and offset vector b is also relatively large.
  • The definitions of these parameters in a DNN are as follows. Taking the coefficient W as an example: assume that in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as W^3_24, where the superscript 3 represents the layer index of the coefficient W, and the subscripts correspond to the output index 2 of the third layer and the input index 4 of the second layer.
  • In summary, the coefficient from the k-th neuron of layer L−1 to the j-th neuron of layer L is defined as W^L_jk.
  • the input layer has no W parameter.
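  • The layer-wise relationship y = α(Wx + b) can be sketched as a forward pass through fully connected layers (illustrative only; the shapes and the sigmoid activation are arbitrary choices, not taken from the patent):

```python
import numpy as np

def dnn_forward(x, layers):
    """Forward pass of a fully connected DNN: each layer computes y = alpha(W @ x + b)."""
    y = x
    for W, b in layers:
        y = 1.0 / (1.0 + np.exp(-(W @ y + b)))  # sigmoid activation alpha
    return y

rng = np.random.default_rng(0)
layers = [(rng.normal(size=(8, 4)), np.zeros(8)),   # hidden layer
          (rng.normal(size=(2, 8)), np.zeros(2))]   # output layer
print(dnn_forward(rng.normal(size=4), layers))
```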
  • more hidden layers make the network more capable of describing complex situations in the real world. Theoretically speaking, a model with more parameters has a higher complexity and a greater "capacity", which means that it can complete more complex learning tasks.
  • Training the deep neural network is the process of learning the weight matrix, and its ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (the weight matrix formed by the vector W of many layers).
  • CNN Convolutional neural network
  • CNN is a deep neural network with convolutional structure.
  • a convolutional neural network consists of a feature extractor consisting of a convolutional layer and a subsampling layer.
  • the feature extractor can be seen as a filter, and the convolution process can be seen as using a trainable filter to convolve with an input image or convolutional feature map.
  • the convolutional layer refers to the neuron layer that performs convolution processing on the input signal in the convolutional neural network.
  • a neuron can only be connected to some adjacent neurons.
  • a convolutional layer usually contains several feature planes, and each feature plane can be composed of some rectangularly arranged neural units.
  • Neural units of the same feature plane share weights, and the shared weights here are convolution kernels.
  • Shared weights can be understood as a way to extract image information that is independent of location. The underlying principle is that the statistical information of a certain part of the image is the same as that of other parts. That means that the image information learned in one part can also be used in another part. So for all positions on the image, the same learned image information can be used.
  • multiple convolution kernels can be used to extract different image information. Generally, the more the number of convolution kernels, the richer the image information reflected by the convolution operation.
  • the convolution kernel can be initialized in the form of a matrix of random size, and the convolution kernel can obtain reasonable weights through learning during the training process of the convolutional neural network.
  • the direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.
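  • A small sketch (illustrative) of weight sharing: a single convolution kernel is slid over the whole input, so the same weights are applied at every spatial position:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2D convolution (cross-correlation) with a single shared kernel."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # The same kernel weights are reused at every (i, j) position.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])
print(conv2d_valid(image, kernel))
```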
  • RNN Recurrent neural networks
  • RNN is used to process sequence data, that is, the current output of a sequence is also related to the previous output, that is, the output of RNN needs to depend on the current input information and historical memory information.
  • the specific manifestation is that the network remembers previous information and applies it to the calculation of the current output; that is, the nodes within the hidden layer are no longer unconnected but connected, and the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous moment.
  • RNN can process sequence data of any length.
  • the training of RNN is the same as that of traditional CNN or DNN.
  • the error backpropagation algorithm is also used, but with a difference: if the RNN is unrolled, the parameters (such as W) in it are shared, which is not the case with the traditional neural networks described above. Moreover, in the gradient descent algorithm, the output of each step depends not only on the network of the current step but also on the network states of several previous steps. This learning algorithm is called back propagation through time (BPTT).
  • BPTT back propagation through time
  • the convolutional neural network can use the error back propagation (back propagation, BP) algorithm to correct the size of the parameters in the initial super-resolution model during the training process, so that the reconstruction error loss of the super-resolution model becomes smaller and smaller. Specifically, passing the input signal forward until the output will generate an error loss, and updating the parameters in the initial super-resolution model by backpropagating the error loss information, so that the error loss converges.
  • the backpropagation algorithm is a backpropagation movement dominated by error loss, aiming to obtain the parameters of the optimal super-resolution model, such as the weight matrix.
  • GAN Generative adversarial networks
  • the model includes at least two modules: one module is a Generative Model, and the other is a Discriminative Model. These two modules learn from each other through games to produce better output.
  • Both the generative model and the discriminative model can be neural networks, specifically deep neural networks or convolutional neural networks.
  • the basic principle of GAN is as follows: Taking the GAN that generates pictures as an example, suppose there are two networks, G (Generator) and D (Discriminator), where G is a network that generates pictures, which receives a random noise z, and passes this noise Generate a picture, denoted as G(z); D is a discriminant network, used to determine whether a picture is "real".
  • Its input parameter is x
  • x represents a picture
  • the output D(x) represents the probability that x is a real picture: if it is 1, it means the picture is 100% real; if it is 0, it means the picture cannot be real.
  • the goal of the generation network G is to generate real pictures as much as possible to deceive the discriminant network D
  • the goal of the discriminant network D is to distinguish the pictures generated by G from real pictures as far as possible. In this way, G and D constitute a dynamic "game" process, which is the "adversarial" part of the "generative adversarial network".
  • the pixel value of the image can be a red-green-blue (RGB) color value, and the pixel value can be a long integer representing the color.
  • for example, the pixel value is 256*Red+100*Green+76*Blue, where Blue represents the blue component, Green represents the green component, and Red represents the red component. In each color component, a smaller value means lower brightness and a larger value means higher brightness.
  • the pixel values may be grayscale values.
  • FIG. 1 is a data decoding system architecture provided by an embodiment of the present application.
  • the data decoding system architecture includes a data acquisition module 101, a feature extraction module 102, a probability estimation module 103, a data encoding module 104, a data decoding module 105, a data reconstruction module 106 and a display module 107, wherein:
  • the data collection module 101 is used for collecting original images.
  • the data acquisition module 101 may include or be any type of image capture device, for example for capturing real-world images, and/or any type of image generation device, such as a computer graphics processor or any type of device for acquiring and/or providing real-world images or computer-generated images (e.g., screen content, virtual reality (VR) images), and/or any combination thereof (e.g., augmented reality (AR) images).
  • the data collection module 101 can also be any type of internal memory or memory that stores the above-mentioned images.
  • the feature extraction module 102 is used to receive the original image from the data acquisition module 101, preprocess the original image, and further use the feature extraction network to extract a feature map (i.e., a feature map to be encoded) from the preprocessed image.
  • the aforementioned preprocessing of the original image includes, but is not limited to: trimming, color format conversion (for example, conversion from RGB to YCbCr), color correction, denoising or normalization, and the like.
  • the feature extraction network may be one or a variant of the aforementioned neural network, DNN, CNN or RNN, and the specific form of the feature extraction network is not specifically limited here.
  • the feature extraction module 102 is also configured to round the feature map (ie, the feature map to be encoded) by scalar quantization or vector quantization, for example. It should be known that the feature map includes multiple feature elements, and the value of the feature map consists of the value of each feature element.
  • the feature extraction module 102 also includes a side information extraction network, that is, the feature extraction module 102 not only outputs the feature map output by the feature extraction network, but also outputs the side information extracted from the feature map through the side information extraction network.
  • the side information extraction network may be one or a variant of the aforementioned neural network, DNN, CNN or RNN, and the specific form of the feature extraction network is not specifically limited here.
  • the probability estimation module 103 is used for estimating the probability of the corresponding value of each feature element among the multiple feature elements of the feature map (ie, the feature map to be encoded).
  • the feature map to be encoded includes m feature elements, where m is a positive integer, as shown in Figure 2a
  • the probability estimation module 103 outputs the probability estimation result of each feature element in the m feature elements
  • the probability estimation result of a feature element can be shown in Figure 2b
  • the horizontal axis of Figure 2b represents the possible values of the feature element
  • the vertical axis represents the probability of each possible value
  • point P indicates that the probability of the feature element taking a value in [a−0.5, a+0.5] is p.
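  • For a Gaussian probability model, the probability that the quantized feature element takes the value a (i.e., falls in [a−0.5, a+0.5], as at point P above) can be computed from the Gaussian CDF. A minimal sketch, assuming the mean mu and standard deviation sigma come from the probability estimation module:

```python
import math

def gaussian_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def value_probability(a, mu, sigma):
    """P(element == a) for an integer-quantized element under N(mu, sigma^2)."""
    return gaussian_cdf(a + 0.5, mu, sigma) - gaussian_cdf(a - 0.5, mu, sigma)

# The peak probability is attained at the integer value closest to the mean.
mu, sigma = 2.3, 0.8
peak_value = round(mu)
print(peak_value, value_probability(peak_value, mu, sigma))
```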
  • the data encoding module 104 is used to perform entropy encoding according to the feature map from the feature extraction module 102 (i.e., the feature map to be encoded) and the probability estimation result of each feature element from the probability estimation module 103, to generate a coded code stream (also referred to herein as the code stream of the feature map to be decoded).
  • the data decoding module 105 is configured to receive the encoded code stream from the data encoding module 104, further perform entropy decoding according to the encoded code stream and the probability estimation result of each feature element from the probability estimation module 103, and obtain the decoded feature map (or understood as the value of the decoded feature map).
  • the data reconstruction module 106 is configured to post-process the decoded image feature map from the data decoding module 105, and use an image reconstruction network to perform image reconstruction on the post-processed decoded image feature map to obtain a decoded image.
  • post-processing operations include but are not limited to color format conversion (for example, from YCbCr to RGB), color correction, pruning or resampling, etc.; the image reconstruction network can be one of the aforementioned neural networks, DNN, CNN or RNN
  • the specific form of the image reconstruction network is not specifically limited here.
  • the display module 107 is configured to display the decoded image from the data reconstruction module 106, so as to display the image to a user or a viewer.
  • the display module 107 may be or include any type of player or display for presenting reconstructed audio or images, e.g., an integrated or external display or monitor.
  • the display screen may include a liquid crystal display (LCD), an organic light emitting diode (OLED) display, a plasma display, a projector, a micro LED display, a liquid crystal on silicon (LCoS) display, a digital light processor (DLP), or any other type of display.
  • the data decoding system architecture can be a functional module of a single device; the data decoding system architecture can also be an end-to-end data decoding system, that is, the architecture includes two devices, a source device and a destination device, where the source device may include: the data acquisition module 101, the feature extraction module 102, the probability estimation module 103 and the data encoding module 104; and the destination device may include: the data decoding module 105, the data reconstruction module 106 and the display module 107.
  • Method 1 for the source device to provide the coded stream to the destination device: the source device may send the coded stream to the destination device through a communication interface, and the communication interface may be a direct communication link between the source device and the destination device, such as a direct wired or wireless connection, or any type of network, such as a wired network, a wireless network or any combination thereof, or any type of private network and public network or any combination thereof.
  • Method 2 for the source device to provide the coded stream to the destination device: the source device may also store the coded stream in a storage device, and the destination device may obtain the coded stream from the storage device.
  • the feature map encoding method mentioned in this application can be mainly executed by the probability estimation module 103 and the data encoding module 104 in Figure 1, and the feature map decoding method mentioned in this application can be mainly executed by the probability estimation module 103 and the data decoding module 105.
  • the method execution body of the feature map encoding method provided in this application is the encoding side device, and the encoding side device may mainly include the probability estimation module 103 and the data encoding module 104 in FIG. 1 .
  • the method performed by the encoding side device may include the following steps, Step 11 to Step 14, wherein:
  • Step 11 The encoding side device acquires a first feature map to be encoded, and the first feature map to be encoded includes a plurality of feature elements.
  • Step 12 the probability estimation module 103 in the encoding side device determines a first probability estimation result of each feature element in the plurality of feature elements based on the first feature map to be encoded, and the first probability estimation result includes a first peak probability.
  • Step 13 For each feature element in the first feature map to be encoded, the encoding side device determines whether the feature element is the first feature element based on the first peak probability of the feature element.
  • Step 14 only if the feature element is the first feature element, the data encoding module 104 in the encoding side device performs entropy encoding on the first feature element.
  • the feature map decoding method provided in this application is executed by a decoding device, and the decoding device mainly includes the probability estimation module 103 and the data decoding module 105 in FIG. 1 .
  • the method performed by the decoding side device may include the following steps, Step 21 to Step 24, wherein:
  • Step 21 The device on the decoding side acquires the code stream of the feature map to be decoded, and the feature map to be decoded includes a plurality of feature elements.
  • Step 22 The probability estimation module 103 in the decoding side device obtains a first probability estimation result corresponding to each of the multiple feature elements based on the code stream of the feature map to be decoded, and the first probability estimation result includes the first peak probability .
  • Step 23 The decoding side device determines a set of first feature elements and a set of second feature elements from multiple feature elements based on the first threshold and the first peak probability corresponding to each feature element.
  • Step 24 the data decoding module 105 in the decoding side device obtains the decoding feature map based on the set of the first feature elements and the set of the second feature elements.
  • the schematic diagram of the execution flow of the coding side shown in Figure 3 and the schematic diagram of the execution flow of the decoding side shown in Figure 5 can be regarded as a schematic flow diagram of a set of feature map encoding and decoding methods;
  • the schematic diagram of the execution flow of the encoding side shown in Fig. 6a and the schematic diagram of the execution flow of the decoding side shown in Fig. 7a can be regarded as a schematic flow diagram of another set of feature map encoding and decoding methods.
  • FIG. 3 is a schematic flowchart of a feature map encoding method provided by an embodiment of the present application.
  • the feature map encoding method includes steps S301 to S306, wherein:
  • the feature map y to be encoded is obtained. Further, the feature map y to be encoded is quantized, that is, the floating-point feature values are rounded to integer feature values, and the quantized feature map to be encoded (i.e., the first feature map to be encoded) is obtained; the feature elements in this quantized feature map may be written as ŷ(x,y,i). In a specific example, referring to the description of FIG. 1, the original image is collected by the data collection module 101, and the feature map to be encoded is obtained through the feature extraction module 102.
  • the side information can be understood as a feature map obtained by further feature extraction of the feature map to be encoded, and the number of feature elements contained in the side information is less than the number of feature elements in the feature map to be encoded.
  • side information of the first feature map to be encoded may be obtained through a side information extraction network.
  • the side information extraction network may use RNN, CNN, deformation of RNN, deformation of CNN or other deep neural network (or deformation of other deep neural network), which is not specifically limited in this application.
  • the side information is used as the input of the probability estimation module 103 in FIG. 1, and the output of the probability estimation module 103 is the first probability estimation result of each feature element.
  • the probability estimation module 103 may be a probability estimation network, and the probability estimation network may use RNN, CNN, a deformation of RNN, a deformation of CNN or other deep neural networks (or deformations of other deep neural networks).
  • Figure 4b is a schematic structural diagram of a probability estimation network.
  • the probability estimation network is a convolutional network, which includes 5 network layers: 3 convolutional layers and 2 nonlinear activation layers.
  • the probability estimation module 103 can also be realized by a non-network, traditional probability estimation method. Probability estimation methods include, but are not limited to, statistical methods such as maximum likelihood estimation and maximum a posteriori estimation.
  • the first probability estimation result of a feature element ŷ(x,y,i) is: the probability of each possible value of the feature element.
  • the horizontal axis represents each possible value of the feature element
  • the vertical axis represents the probability of each possible value.
  • the first peak probability is the maximum probability in the first probability estimation result, and can also be called the probability peak of the first probability estimation result; as shown in Figure 2b, the ordinate value p of point P is the first peak probability of the first probability estimation result.
  • the first probability estimation result is a Gaussian distribution
  • the first peak probability is a mean probability of the Gaussian distribution.
  • the first probability estimation result is a Gaussian distribution as shown in FIG. 2 b
  • the first peak value is the mean probability in the Gaussian distribution, that is, the probability p corresponding to the mean value a.
  • the first probability estimation result is a mixed Gaussian distribution
  • the mixed Gaussian distribution is composed of multiple Gaussian distributions.
  • the mixed Gaussian distribution can be obtained by multiplying each Gaussian distribution by its weight and summing the weighted results.
  • the first peak probability is the maximum value among the mean probabilities of the respective Gaussian distributions.
  • the first peak probability is calculated from the average probability of each Gaussian distribution and the weight of each Gaussian distribution in the mixed Gaussian distribution.
  • the first probability estimation result is a mixed Gaussian distribution
  • the mixed Gaussian distribution is obtained by weighting Gaussian distribution 1 , Gaussian distribution 2 and Gaussian distribution 3 .
  • the weight of Gaussian distribution 1 is w_1, the weight of Gaussian distribution 2 is w_2, and the weight of Gaussian distribution 3 is w_3
  • the mean probability of Gaussian distribution 1 is p_1, the mean probability of Gaussian distribution 2 is p_2, the mean probability of Gaussian distribution 3 is p_3, and p_1 > p_2 > p_3.
  • if the first peak probability is the maximum value among the mean probabilities of the Gaussian distributions, then the first peak probability is p_1 (the mean probability of Gaussian distribution 1).
  • the first peak probability is as shown in formula (2).
  • the weights corresponding to each Gaussian distribution in the mixed Gaussian distribution can be obtained and output by the probability estimation network (such as the aforementioned probability estimation module 103 ).
  • the probability estimation network obtains the first probability estimation result of each feature element (that is, the mixed Gaussian distribution)
  • it also obtains the weights corresponding to the Gaussian distributions that make up the mixed Gaussian distribution.
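  • A sketch (illustrative; the weighted-sum form of the second option is an assumption about formula (2), not a quotation of it) of the two ways of obtaining the first peak probability for a mixed Gaussian model:

```python
def peak_prob_max(mean_probs):
    """Option 1: the maximum of the component mean probabilities (p_1 in the example)."""
    return max(mean_probs)

def peak_prob_weighted(mean_probs, weights):
    """Option 2 (assumed form): combine the mean probabilities with the component weights."""
    return sum(w * p for w, p in zip(weights, mean_probs))

mean_probs = [0.6, 0.3, 0.1]   # p_1 > p_2 > p_3
weights = [0.5, 0.3, 0.2]      # w_1, w_2, w_3
print(peak_prob_max(mean_probs))                 # 0.6
print(peak_prob_weighted(mean_probs, weights))   # 0.41
```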
  • a set of third feature elements is determined from multiple feature elements in the first feature map to be encoded. Further, the first threshold is determined based on the first probability estimation result of each feature element in the third feature element set.
  • the process of determining the first threshold can be divided into two steps. Specifically, a schematic flow chart of determining the first threshold is shown in FIG. 4c, including steps S401-S402, wherein:
  • S401. A set of third feature elements is determined from the plurality of feature elements in the first feature map to be encoded; the set of third feature elements can be understood as a collection of feature elements used to determine the first threshold.
  • the first feature element can be determined from the plurality of feature elements.
  • the feature value corresponding to the first peak probability of a feature element refers to: in the first probability estimation result of the feature element, the possible value of the feature element that corresponds to the first peak probability, for example the abscissa value a of point P in Fig. 2b.
  • the preset error value can be understood as the tolerable error of the feature map encoding method, which can be determined according to empirical values or algorithms.
  • the determined feature elements in the set of third feature elements have the feature shown in formula (3): |ŷ(x,y,i) − p(x,y,i)| > TH_2, i.e., the absolute difference between the value of the feature element and the feature value corresponding to its first peak probability is greater than the preset error TH_2.
  • a plurality of feature elements constituting the first feature map to be encoded are: feature element 1 , feature element 2 , feature element 3 , feature element 4 and feature element 5 .
  • the first probability estimation result of each of the multiple feature elements of the first feature map to be encoded has been obtained through the probability estimation module.
  • based on the value of each feature element and the first peak probability of each feature element in its first probability estimation result (hereinafter referred to as the first peak probability of the feature element), the feature elements satisfying formula (3) are selected from feature element 1, feature element 2, feature element 3, feature element 4 and feature element 5 to form the set of third feature elements.
  • For example: the absolute difference between the value of feature element 1 and the feature value corresponding to the first peak probability of feature element 1 is greater than TH_2, so feature element 1 satisfies formula (3); the absolute difference between the value of feature element 2 and the feature value corresponding to the first peak probability of feature element 2 is greater than TH_2, so feature element 2 satisfies formula (3); the absolute difference between the value of feature element 3 and the feature value corresponding to the first peak probability of feature element 3 is less than TH_2, so feature element 3 does not satisfy formula (3); the absolute difference between the value of feature element 4 and the feature value corresponding to the first peak probability of feature element 4 is equal to TH_2, so feature element 4 does not satisfy formula (3); the absolute difference between the value of feature element 5 and the feature value corresponding to the first peak probability of feature element 5 is greater than TH_2, so feature element 5 satisfies formula (3).
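  • The selection of the third feature element set (step S401) can be sketched as follows; the element values and peak values are hypothetical and only illustrate the strict-inequality test of formula (3):

```python
def select_third_elements(values, peak_values, th_2):
    """Keep indices whose |value - peak value| is strictly greater than TH_2."""
    return [i for i, (v, p) in enumerate(zip(values, peak_values))
            if abs(v - p) > th_2]

values      = [3.0, -1.0, 0.0, 2.0, 5.0]   # feature elements 1..5
peak_values = [1.0, -3.0, 0.4, 1.0, 2.0]   # feature value at each first peak probability
print(select_third_elements(values, peak_values, th_2=1.0))
# [0, 1, 4] -> feature elements 1, 2 and 5 (element 4 is excluded because its
# difference equals TH_2 exactly)
```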
  • S402. Determine a first threshold based on a first probability estimation result of each feature element in the third feature element set.
  • the first threshold is determined according to the form of the first probability estimation result of each feature element in the third feature element set.
  • the form of the first probability estimation result includes Gaussian distribution or other forms of probability distribution (including but not limited to Laplace distribution or mixed Gaussian distribution, etc.).
  • The manner of determining the first threshold is described below based on the form of the first probability estimation result.
  • Manner 1: The first threshold is the maximum first peak probability among the first peak probabilities corresponding to the feature elements in the set of third feature elements. In this case, the form of the first probability estimation result may be a Gaussian distribution or another form of probability distribution (including but not limited to a Laplace distribution or a mixed Gaussian distribution).
  • feature element 1, feature element 2, and feature element 5 are determined as the third feature element, forming a set of the third feature elements.
  • the first peak probability of feature element 1 is 70%
  • the first peak probability of feature element 2 is 65%
  • the first peak probability of feature element 5 is 75%
  • The maximum first peak probability (that is, 75%, the first peak probability of feature element 5) is determined as the first threshold.
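  • The two-step procedure of S401-S402 under Manner 1 can be illustrated with a small numerical sketch. The Python snippet below is only an illustrative reconstruction of the logic described above; the array contents, variable names, and toy numbers are assumptions made for the example and are not part of the embodiment.

```python
import numpy as np

# Toy data for five feature elements: their quantized values, the feature value at
# which each element's estimated probability peaks, and that peak probability.
values      = np.array([ 3.0, -3.0, 1.0, 2.0, 5.0])     # values of feature elements 1..5
peak_values = np.array([ 0.0,  0.0, 1.0, 0.0, 0.0])     # feature value at the probability peak
peak_probs  = np.array([0.70, 0.65, 0.80, 0.60, 0.75])  # first peak probabilities
TH_2 = 2.0                                               # preset (tolerable) error

# S401: the third feature element set contains the elements whose value deviates from
# the feature value at the peak by more than the preset error (formula (3)).
third_set_mask = np.abs(values - peak_values) > TH_2

# S402, Manner 1: the first threshold is the maximum first peak probability over that set.
first_threshold = peak_probs[third_set_mask].max()

print("third feature element set:", np.nonzero(third_set_mask)[0] + 1)  # elements 1, 2, 5
print("first threshold:", first_threshold)                              # 0.75
```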
  • Manner 2: When the first probability estimation result is a Gaussian distribution, the first probability estimation result also includes a first probability variance value. In this case, the first threshold is the minimum first probability variance value among the first probability variance values corresponding to the feature elements in the set of third feature elements.
  • The mathematical characteristics of the Gaussian distribution can be summarized as follows: in a Gaussian distribution, the larger the first probability variance value, the smaller the first peak probability; moreover, when the first probability estimation result is a Gaussian distribution, the first probability variance value can be obtained from the first probability estimation result faster than the first peak probability can. It can be seen that, when the first probability estimation result is a Gaussian distribution, determining the first threshold based on the first probability variance value may be more efficient than determining it based on the first peak probability, as illustrated by the sketch below.
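  • The monotonic relationship between the variance and the peak probability of a Gaussian can be checked numerically. The sketch below assumes integer-quantized feature elements whose probability is the Gaussian density integrated over a unit interval around each value, which is one common convention in learned compression; it is not asserted to be the exact probability model of the embodiment.

```python
import math

def peak_probability(sigma: float) -> float:
    """Probability of the value at the Gaussian mean for an integer-quantized element:
    the Gaussian density integrated over [-0.5, 0.5] around the mean."""
    # Phi(0.5 / sigma) - Phi(-0.5 / sigma) = erf(0.5 / (sigma * sqrt(2)))
    return math.erf(0.5 / (sigma * math.sqrt(2.0)))

for sigma in (0.5, 0.6, 0.7):
    print(f"sigma = {sigma:.1f}  ->  peak probability = {peak_probability(sigma):.3f}")

# Larger variance -> smaller peak probability, so taking the minimum first probability
# variance value over the third feature element set orders the elements in the same way
# as taking the maximum first peak probability, while being cheaper to obtain.
```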
  • feature element 1, feature element 2, and feature element 5 are determined as the third feature element to form a set of the third feature elements.
  • the first probability variance value ⁇ of feature element 1 is 0.6
  • the first probability variance value ⁇ of feature element 2 is 0.7
  • the first probability variance value ⁇ of feature element 5 is 0.5
  • The minimum first probability variance value σ corresponding to the feature elements in the set of third feature elements (that is, 0.5, the first probability variance value of feature element 5) is determined as the first threshold.
  • In one implementation, the first threshold is determined according to the feature elements in the first feature map to be encoded, that is, the first threshold corresponds to the first feature map to be encoded. In this case, entropy coding is performed on the first threshold, and the entropy coding result is written into the code stream of the first feature map to be encoded.
  • For each of the plurality of feature elements in the first feature map to be encoded, whether the feature element is a first feature element may be determined based on the first threshold and the first probability estimation result of the feature element. It can be seen that an important condition for judging whether a feature element is a first feature element is the first threshold. The manner of determining whether a feature element is a first feature element is discussed in detail below, based on the specific manners of determining the first threshold described above.
  • Method 1: When the first threshold is the maximum first peak probability among the first peak probabilities corresponding to the feature elements in the set of third feature elements, the first feature element determined according to the first threshold satisfies the condition that the first peak probability of the first feature element is less than or equal to the first threshold.
  • a plurality of feature elements constituting the first feature map to be encoded are: feature element 1 , feature element 2 , feature element 3 , feature element 4 and feature element 5 .
  • characteristic element 1, characteristic element 2, and characteristic element 5 form a set of third characteristic elements
  • the first threshold value is determined to be 75% according to the set of third characteristic elements.
  • the first peak probability of feature element 1 is 70%, less than the first threshold
  • the first peak probability of feature element 2 is 65%, less than the first threshold
  • the first peak probability of feature element 3 is 80%, greater than the first threshold
  • the first peak probability of feature element 4 is 60%, less than the first threshold
  • the first peak probability of feature element 5 is 75%, equal to the first threshold.
  • feature element 1, feature element 2, feature element 4, and feature element 5 are determined as the first feature element.
  • Method 2: When the first threshold is the minimum first probability variance value among the first probability variance values corresponding to the feature elements in the set of third feature elements, the first feature element determined according to the first threshold satisfies the condition that the first probability variance value of the first feature element is greater than or equal to the first threshold.
  • a plurality of feature elements constituting the first feature map to be encoded are: feature element 1 , feature element 2 , feature element 3 , feature element 4 and feature element 5 .
  • feature element 1, feature element 2, and feature element 5 form a set of third feature elements
  • the first threshold value is determined to be 0.5 according to the set of third feature elements.
  • the first probability variance value of feature element 1 is 0.6, greater than the first threshold
  • the first probability variance value of feature element 2 is 0.7, greater than the first threshold
  • the first probability variance value of feature element 3 is 0.4, less than the first threshold
  • the first probability variance value of feature element 4 is 0.75, greater than the first threshold
  • the first probability variance value of feature element 5 is 0.5, equal to the first threshold.
  • feature element 1, feature element 2, feature element 4, and feature element 5 are determined as the first feature element.
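  • Both judgment conditions amount to a single comparison per feature element. The following sketch merely restates Method 1 and Method 2 in code; the variable names and the toy numbers (taken from the examples above) are illustrative assumptions.

```python
import numpy as np

def is_first_element_by_peak(peak_probs, threshold):
    """Method 1: a feature element is a first feature element (must be entropy encoded)
    when its first peak probability is less than or equal to the first threshold."""
    return peak_probs <= threshold

def is_first_element_by_variance(variances, threshold):
    """Method 2 (Gaussian case): a feature element is a first feature element
    when its first probability variance value is greater than or equal to the threshold."""
    return variances >= threshold

peak_probs = np.array([0.70, 0.65, 0.80, 0.60, 0.75])
variances  = np.array([0.60, 0.70, 0.40, 0.75, 0.50])

print(is_first_element_by_peak(peak_probs, 0.75))      # [ True  True False  True  True]
print(is_first_element_by_variance(variances, 0.50))   # [ True  True False  True  True]
```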
  • a plurality of feature elements constituting the first feature map to be encoded are: feature element 1 , feature element 2 , feature element 3 , feature element 4 and feature element 5 .
  • Feature element 1, feature element 2, feature element 4, and feature element 5 are determined as the first feature element.
  • Entropy encoding is not performed on feature element 3; entropy encoding is performed on feature element 1, feature element 2, feature element 4, and feature element 5, and the entropy encoding results of all first feature elements are written into the encoded code stream.
  • If the judgment result for a feature element in S305 is that it is not the first feature element, entropy encoding is not performed on that feature element. If the judgment result for a feature element in S305 is that it is the first feature element, entropy encoding is performed on that feature element, and the entropy encoding result of that feature element is written into the encoded code stream.
  • FIG. 5 is a schematic flowchart of a feature map decoding method provided by an embodiment of the present application.
  • The feature map decoding method includes steps S501 to S504, in which:
  • The code stream of the feature map to be decoded can be understood as the encoded code stream obtained in S306.
  • the feature map to be decoded is a feature map obtained after data decoding of the code stream.
  • the feature map to be decoded includes a plurality of feature elements, and the feature elements are divided into two parts: a set of first feature elements and a set of second feature elements.
  • the first set of feature elements is the set of feature elements that have been entropy encoded in the feature map encoding stage of Figure 3
  • the second set of feature elements is the feature that has not been entropy encoded in the feature map encoding stage of Figure 3 A collection of elements.
  • the set of the first feature elements is an empty set, or, the set of the second feature elements is an empty set.
  • The set of first feature elements being an empty set means that, in the feature map encoding stage of Figure 3, none of the feature elements was entropy encoded; the set of second feature elements being an empty set means that, in the feature map encoding stage of Figure 3, every feature element was entropy encoded.
  • Entropy decoding is performed on the code stream of the feature map to be decoded; further, a first probability estimation result corresponding to each feature element in the plurality of feature elements may be obtained according to the entropy decoding result.
  • the first probability estimate includes a first peak probability.
  • side information corresponding to the feature map to be decoded is obtained based on a code stream of the feature map to be decoded; based on the side information, a first probability estimation result corresponding to each feature element is obtained.
  • the code stream of the feature map to be decoded includes the entropy coding result of the side information, so entropy decoding can be performed on the code stream of the feature map to be decoded, and the obtained entropy decoding result includes the side information corresponding to the feature map to be decoded.
  • The side information is used as the input of the probability estimation module 103 in FIG. 1, and the output of the probability estimation module 103 is the first probability estimation result corresponding to each feature element (including the first probability estimation results of the feature elements in the set of first feature elements and of the feature elements in the set of second feature elements).
  • The first probability estimation result of a certain feature element is shown in Figure 2b: the horizontal axis represents the possible values of the feature element, and the vertical axis represents the likelihood of each possible value.
  • The first peak probability is the maximum probability value in the first probability estimation result, and it can also be called the probability peak of the first probability estimation result; as shown in Figure 2b, the ordinate value p of point P is the first peak probability in the first probability estimation result. It should be noted that, when the first probability estimation result is a Gaussian distribution, the first peak probability is the mean probability of the Gaussian distribution.
  • When the first probability estimation result is a mixed Gaussian distribution, the mixed Gaussian distribution is composed of a plurality of Gaussian distributions, and the first peak probability is the maximum value among the mean probabilities of the component Gaussian distributions; alternatively, the first peak probability is calculated from the mean probabilities of the component Gaussian distributions and the weight of each Gaussian distribution in the mixed Gaussian distribution.
  • the specific implementation manner of obtaining the first peak probability based on the first probability estimation result please refer to the related description of the first probability estimation result and the first peak probability in S303 above, and the repetition will not be repeated.
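  • For the mixed Gaussian case, the first peak probability can be taken either as the maximum of the component mean probabilities or as a weighted combination, as described above. The snippet below sketches both options under the same integer-quantization assumption used earlier; the component parameters are illustrative only.

```python
import math

def gauss_prob(x: float, mu: float, sigma: float) -> float:
    """Probability of value x under one Gaussian component, integrated over a unit interval."""
    z1 = (x + 0.5 - mu) / (sigma * math.sqrt(2.0))
    z0 = (x - 0.5 - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (math.erf(z1) - math.erf(z0))

def peak_probability_gmm(mus, sigmas, weights, option="max"):
    """First peak probability of a Gaussian mixture.
    option='max'      -> maximum of the mean probabilities of the components
    option='weighted' -> computed from the mean probabilities and the component weights"""
    mean_probs = [gauss_prob(round(mu), mu, s) for mu, s in zip(mus, sigmas)]
    if option == "max":
        return max(mean_probs)
    return sum(w * p for w, p in zip(weights, mean_probs))

print(peak_probability_gmm([0.0, 3.0], [0.5, 1.0], [0.6, 0.4], option="max"))
print(peak_probability_gmm([0.0, 3.0], [0.5, 1.0], [0.6, 0.4], option="weighted"))
```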
  • the probability estimation module 103 may be a probability estimation network, and the probability estimation network may use RNN, CNN, a deformation of RNN, a deformation of CNN or other deep neural networks (or deformations of other deep neural networks).
  • Figure 4b is a schematic structural diagram of a probability estimation network.
  • The probability estimation network is a convolutional network, which includes 5 network layers: 3 convolutional layers and 2 nonlinear activation layers.
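  • A minimal PyTorch sketch with the layer pattern described for Figure 4b (three convolutional layers interleaved with two nonlinear activation layers) is given below. The channel counts, kernel sizes, and the choice of outputting a mean and a scale per feature element are assumptions made for illustration and are not the exact configuration of Figure 4b.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProbabilityEstimationNet(nn.Module):
    """Toy probability estimation network: 3 convolutional layers and 2 nonlinear
    activation layers, mapping side information to per-element Gaussian parameters."""
    def __init__(self, side_channels: int = 32, feat_channels: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(side_channels, feat_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_channels, feat_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_channels, 2 * feat_channels, kernel_size=3, padding=1),
        )

    def forward(self, side_info: torch.Tensor):
        mu, sigma = self.net(side_info).chunk(2, dim=1)
        return mu, F.softplus(sigma) + 1e-6   # keep the variance parameter positive

mu, sigma = ProbabilityEstimationNet()(torch.randn(1, 32, 16, 16))
print(mu.shape, sigma.shape)   # torch.Size([1, 64, 16, 16]) for both outputs
```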
  • the probability estimation module 103 can also be realized by a non-network traditional probability estimation method.
  • Such probability estimation methods include, but are not limited to, statistical methods such as maximum likelihood estimation and maximum a posteriori estimation.
  • a set of first feature elements and a set of second feature elements are determined from the multiple feature elements of the feature map to be decoded.
  • the first threshold may be determined through negotiation between the device corresponding to the feature map encoding method and the device corresponding to the feature map decoding method; it may also be set according to empirical values; the first threshold may also be based on the The code stream of the feature map is obtained.
  • In one manner, the first threshold may be the maximum first peak probability in the set of third feature elements described in the first manner of S402.
  • Further, for each feature element in the feature map to be decoded, if the first peak probability of the feature element is greater than the first threshold, the feature element is determined as a second feature element (that is, a feature element in the set of second feature elements); if the first peak probability of the feature element is less than or equal to the first threshold, the feature element is determined as a first feature element (that is, a feature element in the set of first feature elements).
  • the first threshold is 75%
  • The multiple feature elements of the feature map to be decoded are feature element 1, feature element 2, feature element 3, feature element 4, and feature element 5, where the first peak probability of feature element 1 is 70%, less than the first threshold; the first peak probability of feature element 2 is 65%, less than the first threshold; the first peak probability of feature element 3 is 80%, greater than the first threshold; the first peak probability of feature element 4 is 60%, less than the first threshold; and the first peak probability of feature element 5 is 75%, equal to the first threshold.
  • feature element 1, feature element 2, feature element 4, and feature element 5 are determined as the first feature element.
  • Feature element 1, feature element 2, feature element 4, and feature element 5 are determined as feature elements in the set of first feature elements; feature element 3 is determined as a feature element in the set of second feature elements.
  • the form of the first probability estimation result is a Gaussian distribution
  • the first probability estimation result further includes a first probability variance value.
  • In this case, an optional implementation of S503 is: based on the first threshold and the first probability variance value of each feature element, the set of first feature elements and the set of second feature elements are determined from the multiple feature elements.
  • In this manner, the first threshold may be the minimum first probability variance value in the set of third feature elements described in the second manner of S402. Further, for each feature element in the feature map to be decoded, if the first probability variance value of the feature element is less than the first threshold, the feature element is determined as a second feature element (that is, a feature element in the set of second feature elements); if the first probability variance value of the feature element is greater than or equal to the first threshold, the feature element is determined as a first feature element (that is, a feature element in the set of first feature elements).
  • the first threshold is 0.5
  • The feature elements constituting the feature map to be decoded are: feature element 1, feature element 2, feature element 3, feature element 4, and feature element 5.
  • the first probability variance value of feature element 1 is 0.6, greater than the first threshold
  • the first probability variance value of feature element 2 is 0.7, greater than the first threshold
  • the first probability variance value of feature element 3 is 0.4, less than the first threshold
  • the first probability variance value of feature element 4 is 0.75, greater than the first threshold
  • the first probability variance value of feature element 5 is 0.5, equal to the first threshold.
  • Feature element 1, feature element 2, feature element 4, and feature element 5 are determined as feature elements in the set of first feature elements; feature element 3 is determined as a feature element in the set of second feature elements.
  • the value of the decoded feature map is obtained according to the value of each feature element in the first set of feature elements and the first probability estimation result of each feature element in the second set of feature elements.
  • Specifically, entropy decoding is performed on each first feature element (understood as a collective name for the feature elements in the set of first feature elements) based on the corresponding first probability estimation result to obtain the value of the first feature element; the first probability estimation result includes the first peak probability and the feature value corresponding to the first peak probability, and the value of each second feature element (understood as a collective name for the feature elements in the set of second feature elements) is obtained based on the feature value corresponding to the first peak probability of that second feature element.
  • That is, entropy decoding is performed on all feature elements in the set of first feature elements based on the corresponding first probability estimation results to obtain the values of all feature elements in the set of first feature elements; and the values of all feature elements in the set of second feature elements are obtained based on the feature value corresponding to the first peak probability of each feature element in the set of second feature elements, without performing entropy decoding on any feature element in the set of second feature elements.
  • to perform data decoding on the feature map to be decoded is to obtain the value of each feature element.
  • the multiple feature elements of the feature map to be decoded are feature element 1 , feature element 2 , feature element 3 , feature element 4 and feature element 5 .
  • Feature element 1, feature element 2, feature element 4, and feature element 5 are determined as feature elements in the set of first feature elements; feature element 3 is determined as a feature element in the set of second feature elements.
  • The code stream and the first probability estimation results corresponding to the first feature elements are input into the data decoding module 105 shown in FIG. 1 to obtain the values of the first feature elements.
  • When the set of first feature elements is an empty set (that is, no feature element was entropy encoded), the value of the decoded feature map can be obtained directly from the feature value corresponding to the first peak probability in the first probability estimation result of each feature element.
  • When the set of second feature elements is an empty set (that is, entropy coding was performed on every feature element), entropy decoding is performed based on the first probability estimation result corresponding to each feature element to obtain the value of the decoded feature map.
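  • The decoding steps S503-S504 can therefore be summarized as: partition the elements by the first threshold, entropy decode only the first feature elements, and fill every second feature element with the feature value at its first peak probability. The sketch below shows that control flow only; the entropy decoder is replaced by a placeholder, and all names and numbers are illustrative assumptions.

```python
import numpy as np

def decode_feature_map(peak_probs, peak_values, first_threshold, entropy_decode_next):
    """peak_probs[i]  : first peak probability of feature element i
    peak_values[i] : feature value corresponding to that peak probability
    entropy_decode_next(i) : placeholder standing in for entropy decoding one
    first feature element from the code stream."""
    decoded = np.empty_like(peak_values, dtype=float)
    for i, (p, v) in enumerate(zip(peak_probs, peak_values)):
        if p <= first_threshold:
            decoded[i] = entropy_decode_next(i)   # first feature element: entropy decode
        else:
            decoded[i] = v                        # second feature element: use the peak value
    return decoded

stored = {0: 3.0, 1: -3.0, 3: 2.0, 4: 5.0}        # values carried by the toy code stream
result = decode_feature_map(
    peak_probs=np.array([0.70, 0.65, 0.80, 0.60, 0.75]),
    peak_values=np.array([0.0, 0.0, 1.0, 0.0, 0.0]),
    first_threshold=0.75,
    entropy_decode_next=lambda i: stored[i],
)
print(result)   # element 3 (index 2) is filled with its peak feature value 1.0
```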
  • In summary, the method provided in Figure 3, which determines whether a feature element can skip the entropy encoding process based on the peak probability of its probability estimation result, improves the reliability of the judgment result (whether the feature element needs to be entropy encoded), significantly reduces the number of elements on which entropy encoding is performed, and reduces the complexity of entropy encoding. Moreover, the method provided in Figure 5, which uses the feature value corresponding to the first peak probability of a second feature element (that is, a feature element that is not entropy encoded) as the value of that second feature element, yields a more reliable value of the feature map to be decoded than the traditional approach of substituting a fixed value for the second feature element, which further improves the accuracy of data decoding and the performance of the data encoding and decoding method.
  • FIG. 6 a is a schematic flowchart of another feature map encoding method provided by an embodiment of the present application.
  • The process of the feature map encoding method includes S601 to S607, in which:
  • a manner of obtaining the second context may be to obtain the second context information from the first feature map to be encoded through a network module, wherein the network module may be an RNN or a network variant of the RNN.
  • The second context information can be understood as the feature elements (or the values of the feature elements) within a preset area range around the feature element in the first feature map to be encoded.
  • the side information and the second context information are used as the input of the probability estimation module 103 in FIG. 1 , and the output of the probability estimation module 103 is the second probability estimation result of each feature element.
  • the form of the second probability estimation result includes Gaussian distribution or other forms of probability distribution (including but not limited to Laplace distribution or mixed Gaussian distribution, etc.).
  • the schematic diagram of the second probability result of a feature element is the same as the schematic diagram of the first probability result shown in FIG. 2 b above, and will not be explained in detail here.
  • a set of third feature elements is determined from multiple feature elements in the first feature map to be encoded.
  • The first threshold is determined based on the second probability estimation result of each feature element in the set of third feature elements. For the specific manner of determining the first threshold, refer to the manner, shown in FIG. 4c, of determining the first threshold from the probability estimation results of the feature elements in the set of third feature elements; details are not repeated here.
  • The first context information consists of the feature elements within a preset area range around the feature element in the second feature map to be encoded; the value of the second feature map to be encoded is composed of the values of the first feature elements and the feature values corresponding to the first peak probabilities of the second feature elements, where a second feature element is a feature element in the first feature map to be encoded other than the first feature elements.
  • the second feature map to be encoded can be understood as a feature map after decoding the first feature map to be encoded (ie, the feature map to be decoded in this application).
  • the first context information describes the relationship between each feature element in the second feature map to be encoded
  • the second context information describes the relationship between each feature element in the first feature map to be encoded.
  • the feature elements in the first feature map to be encoded include: feature element 1, feature element 2, feature element 3, . . . , feature element m.
  • After the first threshold is obtained based on the description in S604, probability estimation and entropy encoding are performed alternately on feature element 1, feature element 2, feature element 3, ..., feature element m. That is, probability estimation and entropy encoding are performed on feature element 1 first. Since feature element 1 is the first feature element to be processed, its first context information is empty; at this time, probability estimation is performed on feature element 1 based only on the side information to obtain the first probability estimation result corresponding to feature element 1. Further, whether feature element 1 is a first feature element is determined according to the first probability estimation result and the first threshold, entropy encoding is performed on feature element 1 only when it is a first feature element, and the value of feature element 1 in the second feature map to be encoded is determined.
  • Next, the first probability estimation result of feature element 2 is estimated according to the side information and the first context information (which, at this time, can be understood as the value of feature element 1 in the second feature map to be encoded); further, whether feature element 2 is a first feature element is determined according to the first probability estimation result and the first threshold, entropy encoding is performed on feature element 2 only when feature element 2 is a first feature element, and the value of feature element 2 in the second feature map to be encoded is determined.
  • Then, the first probability estimation result of feature element 3 is estimated according to the side information and the first context information (which, at this time, can be understood as the values of feature element 1 and feature element 2 in the second feature map to be encoded, that is, the values of the first feature elements and the values of the second feature elements determined so far); further, whether feature element 3 is a first feature element is determined according to the first probability estimation result and the first threshold, entropy encoding is performed on feature element 3 only when feature element 3 is a first feature element, and the value of feature element 3 in the second feature map to be encoded is determined. The process continues in this way until the probability estimation of all feature elements in the first feature map to be encoded is completed, as sketched below.
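  • The alternating probability-estimation and entropy-encoding procedure of S605-S607 is essentially a sequential loop in which the context for element k is built from the values already placed into the second feature map to be encoded. The sketch below captures only that structure; `estimate_probability` and `entropy_encode` are placeholders, and all names are assumptions made for illustration.

```python
def encode_with_context(values, side_info, first_threshold,
                        estimate_probability, entropy_encode):
    """values[k]            : value of feature element k in the first feature map to be encoded
    estimate_probability : placeholder returning (peak_prob, peak_value) for element k
                           from the side information and the context built so far
    entropy_encode       : placeholder writing one first feature element to the code stream"""
    second_map = []      # values of the second feature map to be encoded (the context)
    bitstream = []
    for k, value in enumerate(values):
        peak_prob, peak_value = estimate_probability(side_info, second_map, k)
        if peak_prob <= first_threshold:
            bitstream.append(entropy_encode(value))  # first feature element: encode its value
            second_map.append(value)
        else:
            second_map.append(peak_value)            # skipped element: context uses the peak value
    return bitstream, second_map

bits, ctx = encode_with_context(
    values=[3, -3, 1, 2, 5],
    side_info=None,
    first_threshold=0.75,
    estimate_probability=lambda side, ctx, k: (0.80, 1) if k == 2 else (0.70, 0),
    entropy_encode=lambda v: ("enc", v),
)
print(bits)   # four elements are entropy encoded; the third element is skipped
print(ctx)    # [3, -3, 1, 2, 5]: true values for encoded elements, peak value for the skipped one
```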
  • S606. Determine whether the feature element is the first feature element according to the first probability estimation result of the feature element and the first threshold.
  • It should be noted that the probability estimation result used to judge whether a feature element is a first feature element is recorded as the first probability estimation result of the feature element, and the probability estimation result used to determine the first threshold is recorded as the second probability estimation result.
  • In some implementations, the first probability estimation result of a feature element is different from the second probability estimation result of the feature element; in other implementations, the first probability estimation result of a feature element is the same as the second probability estimation result of the feature element.
  • FIG. 7 a is a schematic flowchart of a feature map decoding method provided by an embodiment of the present application.
  • The feature map decoding method includes steps S701 to S706, in which:
  • side information corresponding to the feature map to be decoded is obtained based on a code stream of the feature map to be decoded; based on the side information, a first probability estimation result corresponding to each feature element is obtained.
  • the code stream of the feature map to be decoded includes the entropy coding result of the side information, so entropy decoding can be performed on the code stream of the feature map to be decoded, and the obtained entropy decoding result includes the side information corresponding to the feature map to be decoded.
  • the first context information is a feature element within a preset area range of the feature element in the feature map to be decoded (that is, the second feature map to be encoded in S605).
  • the feature elements of the feature map to be decoded include feature element 1, feature element 2, feature element 3, . . . , feature element m.
  • First, probability estimation and entropy decoding are performed on feature element 1. Since feature element 1 is the first feature element to be processed, its first context information is empty; at this time, probability estimation only needs to be performed on feature element 1 according to the side information to obtain the first probability estimation result corresponding to feature element 1. Further, it is determined (or judged) whether feature element 1 is a first feature element or a second feature element, and the value of feature element 1 in the feature map to be decoded is determined according to the judgment result.
  • Next, the first probability estimation result of feature element 2 is estimated according to the side information and the first context information (which, at this time, can be understood as the value of feature element 1 in the feature map to be decoded); further, it is determined (or judged) whether feature element 2 is a first feature element or a second feature element, and the value of feature element 2 in the feature map to be decoded is determined according to the judgment result.
  • Then, the first probability estimation result of feature element 3 is estimated according to the side information and the first context information (which, at this time, can be understood as the values of feature element 1 and feature element 2 in the feature map to be decoded); further, it is determined whether feature element 3 is a first feature element or a second feature element, and the value of feature element 3 in the feature map to be decoded is determined according to the judgment result. The process continues in this way until the probability estimation of all feature elements is completed.
  • If the feature element is a first feature element, entropy decoding is performed based on the first probability estimation result of the first feature element and the code stream of the feature map to be decoded to obtain the value of the first feature element.
  • That is, when the judgment result for a feature element is that the feature element is a first feature element, entropy decoding is performed on the first feature element to obtain the decoded value of the first feature element.
  • the value of the first feature element in the decoded feature map is the same as the value of the first feature element in the feature map to be encoded.
  • If the feature element is a second feature element, the value of the second feature element is obtained based on the first probability estimation result of the second feature element.
  • That is, when the feature element is a second feature element, the feature value corresponding to the first peak probability of the second feature element is determined as the value of the second feature element. In other words, the second feature element does not need to be entropy decoded, and the value of the second feature element in the decoded feature map may be the same as or different from the value of the second feature element in the feature map to be encoded. Based on the values of all the second feature elements and the values of the first feature elements, the value of the decoded feature map is determined to obtain the decoded feature map.
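  • The decoder of Figure 7a mirrors the encoding loop above: for each feature element it first estimates the probability from the side information and the values already reconstructed, and then either entropy decodes the element (first feature element) or substitutes the feature value at its first peak probability (second feature element). Again, the callables and numbers are placeholders and assumptions for illustration.

```python
def decode_with_context(num_elements, side_info, first_threshold,
                        estimate_probability, entropy_decode_next):
    """Sequential decoding sketch for the method of Figure 7a."""
    decoded = []
    for k in range(num_elements):
        peak_prob, peak_value = estimate_probability(side_info, decoded, k)
        if peak_prob <= first_threshold:
            decoded.append(entropy_decode_next(k))  # first feature element: read from code stream
        else:
            decoded.append(peak_value)              # second feature element: no entropy decoding
    return decoded

stored = {0: 3, 1: -3, 3: 2, 4: 5}                  # values carried by the toy code stream
print(decode_with_context(
    num_elements=5,
    side_info=None,
    first_threshold=0.75,
    estimate_probability=lambda side, ctx, k: (0.80, 1) if k == 2 else (0.70, 0),
    entropy_decode_next=lambda k: stored[k],
))   # [3, -3, 1, 2, 5]
```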
  • Compared with the feature map encoding method provided in Figure 3, the feature map encoding method provided in Figure 6a combines context information for probability estimation, which improves the accuracy of the probability estimation result corresponding to each feature element, thereby increasing the number of feature elements that skip the encoding process and further improving the data encoding efficiency.
  • Similarly, the feature map decoding method provided in Figure 7a combines context information for probability estimation, which improves the accuracy of the probability estimation result corresponding to each feature element and thus improves the reliability of the values of the feature elements that are not entropy decoded (that is, the second feature elements) in the feature map to be decoded, improving the performance of data decoding.
  • The applicant used a feature map encoding and decoding method without skip encoding (that is, when entropy encoding is performed on the feature map to be encoded, entropy encoding is performed on all the feature elements in the feature map to be encoded) as the baseline method, and carried out comparative experiments on the feature map encoding and decoding method provided in Fig. 6a and Fig. 7a (denoted as the feature map encoding and decoding method with dynamic peak skipping) and a feature map encoding and decoding method that skips a feature element according to the probability corresponding to a certain fixed value in the probability estimation result of that feature element (denoted as the feature map encoding and decoding method with fixed peak skipping). The comparative experimental results are shown in Table 1.
  • Compared with the baseline method, the feature map encoding and decoding method with fixed peak skipping reduces the amount of data at the same image quality by 0.11%, whereas the scheme of this application reduces the amount of data at the same image quality by 1%.
  • the applicant also conducted a comparative experiment between the feature map encoding and decoding method provided in Figure 6a and Figure 7a and the feature map encoding and decoding method with fixed peak skipping, and the results of the comparison experiment are shown in Figure 7b and Figure 7c.
  • the vertical axis can be understood as the quality of the reconstructed image
  • the horizontal axis is the image compression ratio. Generally, as the image compression ratio increases, the quality of the reconstructed image will become better.
  • FIG. 8 is a schematic structural diagram of a feature map encoding device provided by the present application.
  • the feature map encoding device may be an integration of the probability estimation module 103 and the data encoding module 104 in FIG. 1 .
  • The feature map encoding device includes:
  • an acquisition module 80, configured to acquire a first feature map to be encoded, where the first feature map to be encoded includes a plurality of feature elements; and an encoding module 81, configured to: determine, based on the first feature map to be encoded, a first probability estimation result of each feature element in the plurality of feature elements, where the first probability estimation result includes a first peak probability; for each feature element in the first feature map to be encoded, determine, based on the first peak probability of the feature element, whether the feature element is a first feature element; and perform entropy encoding on the first feature element only if the feature element is a first feature element.
  • In a possible implementation, the first probability estimation result is a Gaussian distribution, and the first peak probability is the mean probability of the Gaussian distribution; or, the first probability estimation result is a mixed Gaussian distribution composed of multiple Gaussian distributions, and the first peak probability is the maximum value among the mean probabilities of the component Gaussian distributions, or is calculated from the mean probabilities of the component Gaussian distributions and the weights of the Gaussian distributions in the mixed Gaussian distribution.
  • the encoding module 81 is specifically configured to determine whether the feature element is the first feature element based on the first threshold and the first peak probability of the feature element.
  • In a possible implementation, the encoding module 81 is further configured to: determine, based on the first feature map to be encoded, a second probability estimation result of each feature element in the plurality of feature elements, where the second probability estimation result includes a second peak probability; determine a set of third feature elements from the plurality of feature elements based on the second probability estimation result of each feature element; determine a first threshold based on the second peak probability of each feature element in the set of third feature elements; and perform entropy coding on the first threshold.
  • the first threshold is a maximum second peak probability of the second peak probabilities corresponding to each feature element in the third feature element set.
  • the first peak probability of the first feature element is less than or equal to the first threshold.
  • In a possible implementation, the second probability estimation result is a Gaussian distribution and further includes a second probability variance value, and the first threshold is the minimum second probability variance value among the second probability variance values corresponding to the feature elements in the set of third feature elements.
  • In this case, the first probability estimation result is also a Gaussian distribution and further includes a first probability variance value, and the first probability variance value of the first feature element is greater than or equal to the first threshold.
  • In a possible implementation, the second probability estimation result further includes the feature value corresponding to the second peak probability, and the encoding module 81 is specifically configured to determine the set of third feature elements from the plurality of feature elements based on the preset error, the value of each feature element, and the feature value corresponding to the second peak probability of each feature element.
  • The feature elements in the set of third feature elements satisfy formula (3): |ŷ(x, y, i) − p(x, y, i)| > TH_2, where ŷ(x, y, i) is the value of the feature element, p(x, y, i) is the feature value corresponding to the second peak probability of the feature element, and TH_2 is the preset error.
  • In a possible implementation, the first probability estimation result is the same as the second probability estimation result, and the encoding module 81 is specifically configured to obtain side information of the first feature map to be encoded based on the first feature map to be encoded, and perform probability estimation on the side information to obtain the first probability estimation result of each feature element.
  • In a possible implementation, the first probability estimation result is different from the second probability estimation result, and the encoding module 81 is further configured to: obtain, based on the first feature map to be encoded, side information of the first feature map to be encoded and second context information of each feature element, where the second context information consists of the feature elements within a preset area range around the feature element in the first feature map to be encoded; and obtain the second probability estimation result of each feature element based on the side information and the second context information.
  • In a possible implementation, the encoding module 81 is specifically configured to: obtain side information of the first feature map to be encoded based on the first feature map to be encoded; and for any feature element in the first feature map to be encoded, determine a first probability estimation result of the feature element based on first context information and the side information, where the first probability estimation result further includes the feature value corresponding to the first probability peak, the first context information consists of the feature elements within a preset area range around the feature element in the second feature map to be encoded, the value of the second feature map to be encoded is composed of the values of the first feature elements and the feature values corresponding to the first peak probabilities of the second feature elements, and a second feature element is a feature element in the first feature map to be encoded other than the first feature elements.
  • the encoding module 81 is further configured to write entropy encoding results of all the first feature elements into the encoded code stream.
  • FIG. 9 is a schematic structural diagram of a feature map decoding device provided in the present application.
  • the feature map decoding device may be an integration of the probability estimation module 103 and the data decoding module 105 in FIG. 1 .
  • the feature map decoding device includes:
  • the obtaining module 90 is configured to obtain a code stream of a feature map to be decoded, the feature map to be decoded includes a plurality of feature elements; and is used to obtain each of the multiple feature elements based on the code stream of the feature map to be decoded A first probability estimation result corresponding to feature elements, the first probability estimation result including a first peak probability;
  • a decoding module 91, configured to determine a set of first feature elements and a set of second feature elements from the plurality of feature elements based on a first threshold and the first peak probability corresponding to each feature element, and obtain the decoded feature map based on the set of first feature elements and the set of second feature elements.
  • In a possible implementation, the first probability estimation result is a Gaussian distribution, and the first peak probability is the mean probability of the Gaussian distribution; or, the first probability estimation result is a mixed Gaussian distribution composed of a plurality of Gaussian distributions, and the first peak probability is the maximum value among the mean probabilities of the component Gaussian distributions, or is calculated from the mean probabilities of the component Gaussian distributions and the weights of the Gaussian distributions in the mixed Gaussian distribution.
  • the value of the feature map to be decoded is composed of values of all first feature elements in the set of first feature elements and values of all second feature elements in the set of second feature elements .
  • the set of the first feature elements is an empty set, or, the set of the second feature elements is an empty set.
  • In a possible implementation, the first probability estimation result further includes the feature value corresponding to the first peak probability, and the decoding module 91 is further configured to: perform entropy decoding on the first feature element based on the first probability estimation result corresponding to the first feature element to obtain the value of the first feature element; and obtain the value of the second feature element based on the feature value corresponding to the first peak probability of the second feature element.
  • the decoding module 91 is further configured to obtain the first threshold based on the code stream of the feature map to be decoded.
  • the first peak probability of the first feature element is less than or equal to the first threshold, and the first peak probability of the second feature element is greater than the first threshold.
  • In a possible implementation, the first probability estimation result is a Gaussian distribution and further includes a first probability variance value; the first probability variance value of the first feature element is greater than or equal to the first threshold, and the first probability variance value of the second feature element is less than the first threshold.
  • the acquisition module 90 is further configured to obtain side information corresponding to the feature map to be decoded based on the code stream of the feature map to be decoded; based on the side information, obtain each The first probability estimation result corresponding to the feature element.
  • In a possible implementation, the decoding module 91 is further configured to: obtain side information corresponding to the feature map to be decoded based on the code stream of the feature map to be decoded; and for each feature element in the feature map to be encoded, estimate the first probability estimation result of the feature element based on the side information and the first context information, where the first context information consists of the feature elements within a preset area range around the feature element in the feature map to be decoded.
  • FIG. 10 is a schematic diagram of a hardware structure of a feature map encoding device or a feature map decoding device provided in an embodiment of the present application.
  • the apparatus shown in FIG. 10 includes a memory 1001 , a processor 1002 , a communication interface 1003 and a bus 1004 .
  • the memory 1001 , the processor 1002 , and the communication interface 1003 are connected to each other through a bus 1004 .
  • the memory 1001 may be a read-only memory (Read Only Memory, ROM), a static storage device, a dynamic storage device or a random access memory (Random Access Memory, RAM).
  • the memory 1001 can store a program. When the program stored in the memory 1001 is executed by the processor 1002, each step of the feature map encoding method provided by the embodiment of the present application is executed, or each step of the feature map decoding method provided by the embodiment of the present application is executed. .
  • The processor 1002 may be a general-purpose central processing unit (Central Processing Unit, CPU), a microprocessor, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a graphics processing unit (graphics processing unit, GPU), or one or more integrated circuits, and is configured to execute related programs to implement the functions required by the units in the feature map encoding device or the feature map decoding device of the embodiments of the present application, or to perform the steps of the feature map encoding method or the steps of the feature map decoding method provided in the method embodiments of the present application.
  • the processor 1002 may also be an integrated circuit chip with signal processing capability.
  • each step of the feature map encoding method or each step of the feature map decoding method of the present application may be completed by an integrated logic circuit of hardware in the processor 1002 or instructions in the form of software.
  • The above-mentioned processor 1002 may also be a general-purpose processor, a digital signal processor (Digital Signal Processing, DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • Various methods, steps, and logic block diagrams disclosed in the embodiments of the present application may be implemented or executed.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, and the like.
  • the steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a mature storage medium in the field such as random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, register.
  • the storage medium is located in the memory 1001, and the processor 1002 reads the information in the memory 1001, and combines its hardware to complete the functions required by the units included in the feature map encoding device or feature map decoding device of the embodiment of the present application, or execute the The feature map encoding method or the feature map decoding method of the method embodiment.
  • the communication interface 1003 implements communication between the computer device 1000 and other devices or communication networks by using a transceiver device such as but not limited to a transceiver.
  • Bus 1004 may include pathways for transferring information between various components of computer device 1000 (eg, memory 1001 , processor 1002 , communication interface 1003 ).
  • the acquisition module 80 in the feature map encoding device in FIG. 8 is equivalent to the communication interface 1003 in the computer device 1000
  • the encoding module 81 is equivalent to the processor 1002 in the computer device 1000
  • the acquiring module 90 in the feature map decoding device in FIG. 9 is equivalent to the communication interface 1003 in the computer device 1000
  • the decoding module 91 is equivalent to the processor 1002 in the computer device 1000 .
  • the embodiment of the present application also provides a computer-readable storage medium, on which a computer program is stored.
  • When the program is executed by a processor, some or all of the steps described in any one of the above method embodiments can be implemented, and the functions of any one of the functional modules described above can be realized.
  • the embodiment of the present application also provides a computer program product, which, when running on a computer or a processor, causes the computer or processor to execute one or more steps in any one of the above methods. If each component module of the above-mentioned device is implemented in the form of a software function unit and sold or used as an independent product, it can be stored in the above-mentioned computer-readable storage medium.
  • Computer-readable media may include computer-readable storage media, which correspond to tangible media, such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another (eg, according to a communication protocol) .
  • a computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium, or (2) a communication medium, such as a signal or carrier wave.
  • Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this application.
  • a computer program product may include a computer readable medium.
  • Such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in a form accessible by a computer.
  • Also, any connection is properly termed a computer-readable medium. For example, when instructions are transmitted using a coaxial cable, a fiber optic cable, a twisted pair, a digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, the coaxial cable, fiber optic cable, twisted pair, DSL, or the wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
  • Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
  • The instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry.
  • processors may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein.
  • The functionality described by the various illustrative logical blocks, modules, and steps described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec.
  • the techniques may be fully implemented in one or more circuits or logic elements.
  • The techniques of the present application may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (for example, a chipset).
  • Various components, modules, or units are described in this application to emphasize functional aspects of the devices configured to perform the disclosed techniques, but they do not necessarily require realization by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit, or provided by a collection of interoperating hardware units (including one or more processors as described above) in conjunction with suitable software and/or firmware.

Abstract

This application provides a feature map encoding and decoding method and apparatus, relating to the technical field of artificial intelligence (AI) based data encoding and decoding, and in particular to the technical field of neural-network-based data encoding and decoding. The feature map decoding method includes: obtaining a code stream of a feature map to be decoded, where the feature map to be decoded includes a plurality of feature elements; obtaining, based on the code stream, a first probability estimation result corresponding to each feature element, where the first probability estimation result includes a first peak probability; determining a set of first feature elements and a set of second feature elements from the plurality of feature elements based on a first threshold and the first peak probability corresponding to each feature element; and obtaining the decoded feature map based on the set of first feature elements and the set of second feature elements. By determining the decoding manner of each feature element from the probability estimation result and the first peak probability corresponding to the feature element, encoding and decoding performance can be improved while encoding and decoding complexity is reduced.

Description

Feature Map Encoding and Decoding Method and Apparatus
This application claims priority to Chinese Patent Application No. 202111101920.9, filed with the Chinese Patent Office on September 18, 2021 and entitled "Feature Map Encoding and Decoding Method and Apparatus", which is incorporated herein by reference in its entirety. This application also claims priority to Chinese Patent Application No. 202210300566.0, filed with the Chinese Patent Office on March 25, 2022 and entitled "Feature Map Encoding and Decoding Method and Apparatus", which is incorporated herein by reference in its entirety.
Technical Field
Embodiments of this application relate to the technical field of artificial intelligence (AI) based audio, video, or image compression, and in particular, to a feature map encoding and decoding method and apparatus.
Background
Image compression refers to a technique that exploits characteristics of image data such as spatial redundancy, visual redundancy, and statistical redundancy to represent the original image pixel matrix, lossily or losslessly, with fewer bits, so as to achieve effective transmission and storage of image information. Image compression is divided into lossless compression and lossy compression: lossless compression does not cause any loss of image detail, whereas lossy compression achieves a larger compression ratio at the cost of a certain degree of image quality degradation. In lossy image compression algorithms, many techniques are commonly used to remove redundant information from image data. For example, quantization is used to eliminate the spatial redundancy caused by the correlation between adjacent pixels in an image and the visual redundancy determined by the perception of the human visual system, and entropy coding and transform techniques are used to eliminate the statistical redundancy of image data. After decades of research and optimization by those skilled in the relevant fields, traditional lossy image compression technology has produced mature lossy image compression standards such as JPEG and BPG.
However, if an image compression technique cannot guarantee image compression quality while improving compression efficiency, it cannot meet the demands of an era in which multimedia application data keeps increasing.
Summary
This application provides a feature map encoding and decoding method and apparatus, which can improve encoding and decoding performance while reducing encoding and decoding complexity.
According to a first aspect, this application provides a feature map decoding method. The method includes: obtaining a code stream of a feature map to be decoded, where the feature map to be decoded includes a plurality of feature elements; obtaining, based on the code stream of the feature map to be decoded, a first probability estimation result corresponding to each feature element in the plurality of feature elements, where the first probability estimation result includes a first peak probability; determining a set of first feature elements and a set of second feature elements from the plurality of feature elements based on a first threshold and the first peak probability corresponding to each feature element; and obtaining the decoded feature map based on the set of first feature elements and the set of second feature elements.
Compared with a method that determines the first feature elements and the second feature elements from the plurality of feature elements according to the first threshold and the probability of each feature element taking a fixed value, the method of this application, which determines the first feature elements and the second feature elements according to the first threshold and the peak probability corresponding to each feature element, is more precise, which in turn improves the accuracy of the obtained decoded feature map and improves the performance of data decoding.
In a possible implementation, the first probability estimation result is a Gaussian distribution, and the first peak probability is the mean probability of the Gaussian distribution;
or, the first probability estimation result is a mixed Gaussian distribution composed of a plurality of Gaussian distributions, and the first peak probability is the maximum value among the mean probabilities of the Gaussian distributions; or, the first peak probability is calculated from the mean probabilities of the Gaussian distributions and the weight of each Gaussian distribution in the mixed Gaussian distribution.
In a possible implementation, the value of the decoded feature map is composed of the values of all first feature elements in the set of first feature elements and the values of all second feature elements in the set of second feature elements.
In a possible implementation, the set of first feature elements is an empty set, or the set of second feature elements is an empty set.
In a possible implementation, the first probability estimation result further includes the feature value corresponding to the first peak probability; further, entropy decoding may be performed on a first feature element based on the first probability estimation result corresponding to the first feature element to obtain the value of the first feature element, and the value of a second feature element is obtained based on the feature value corresponding to the first peak probability of the second feature element. By implementing this possible implementation, compared with assigning a fixed value to a feature element that is not encoded (that is, a second feature element), this application assigns the feature value corresponding to the first peak probability of the second feature element as its value, which improves the accuracy of the values of the second feature elements in the value of the decoded feature map and thereby improves the performance of data decoding.
In a possible implementation, before the set of first feature elements and the set of second feature elements are determined from the plurality of feature elements based on the first threshold and the first peak probability corresponding to each feature element, the first threshold may be obtained based on the code stream of the feature map to be decoded. By implementing this possible implementation, compared with a method in which the first threshold is an empirical preset value, the feature map to be decoded corresponds to its own first threshold, which increases the flexibility of the first threshold, thereby reducing the gap between the substituted values and the true values of the feature elements that are not encoded (that is, the second feature elements) and improving the accuracy of the decoded feature map.
In a possible implementation, the first peak probability of the first feature element is less than or equal to the first threshold, and the first peak probability of the second feature element is greater than the first threshold.
In a possible implementation, the first probability estimation result is a Gaussian distribution and further includes a first probability variance value; in this case, the first probability variance value of the first feature element is greater than or equal to the first threshold, and the first probability variance value of the second feature element is less than the first threshold. Based on this possible implementation, when the probability estimation result is a Gaussian distribution, the time complexity of determining the first feature elements and the second feature elements from the probability variance values is lower than that of determining them from the peak probabilities, so the speed of data decoding can be improved.
In a possible implementation, side information corresponding to the feature map to be decoded is obtained based on the code stream of the feature map to be decoded, and the first probability estimation result corresponding to each feature element is obtained based on the side information.
In a possible implementation, side information corresponding to the feature map to be decoded is obtained based on the code stream of the feature map to be decoded; for each feature element in the feature map to be encoded, the first probability estimation result of the feature element is estimated based on the side information and first context information, where the first context information consists of the feature elements within a preset area range around the feature element in the feature map to be decoded. By implementing this possible implementation, obtaining the probability estimation result of each feature element based on the side information and the context information can improve the accuracy of the probability estimation result, thereby improving the encoding and decoding performance.
第二方面,本申请提供一种特征图编码方法,该方法包括:获取第一待编码特征图,该第一待编码特征图包括多个特征元素;基于第一待编码特征图,确定多个特征元素中每个特征元素的第一概率估计结果,该第一概率估计结果包括第一峰值概率;针对第一待编码特征图中的每个特征元素,基于该特征元素的第一峰值概率,确定该特征元素是否为第一特征元素;仅在特征元素为第一特征元素的情况下,对第一特征元素进行熵编码。
基于第二方面的方法,对待编码特征图中每个特征元素进行判定是否需要执行熵编码,从而可以跳过该待编码特征图中部分特征元素的编码过程,可以显著减少执行熵编码的元素个数,降低熵编码复杂度。并且相比于通过每个特征元素对应概率估计结果中某个固定值对应的概率来确定该特征元素是否需要执行编码,通过每个特征元素的概率峰值来提升了判断 结果(该特征元素是否需要执行熵编码)的可靠性,并且会跳过更多的特征元素的编码过程,进一步提升了编码速度,从而提升了编码的性能。
在一个可能的实现中,第一概率估计结果为高斯分布,第一峰值概率为该高斯分布的均值概率;
或者,第一概率估计结果为混合高斯分布,该混合高斯分布由多个高斯分布组成,第一峰值概率为各个高斯分布的均值概率中的最大值;或者,第一峰值概率由各个高斯分布的均值概率和各个高斯分布在该混合高斯分布中的权重计算得到。
在一个可能的实现中,针对第一待编码特征图中的每个特征元素,基于第一阈值和该特征元素的第一峰值概率,确定该特征元素是否为第一特征元素。
在一个可能的实现中,基于第一待编码特征图,确定多个特征元素中每个特征元素的第二概率估计结果,该第二概率估计结果包括第二峰值概率;基于每个特征元素的第二概率估计结果,从多个特征元素中确定第三特征元素的集合;基于第三特征元素的集合中各个特征元素的第二峰值概率,确定第一阈值;对第一阈值进行熵编码。通过实施该可能的实现方式,待编码特征图可以根据自身的特征元素确定该待编码特征图的第一阈值,使第一阈值与待编码特征图之间具有更好的适配性,从而提升基于第一阈值和特征元素的第一峰值概率确定出的判断结果(即该特征元素是否需要执行熵编码)具有更高的可靠性。
在一个可能的实现方式中,该第一阈值为第三特征元素集合中各个特征元素对应的第二峰值概率的最大第二峰值概率。
在一个可能的实现方式中,第一特征元素的第一峰值概率小于或等于该第一阈值。
在一个可能的实现方式中,第二概率估计结果为高斯分布,该第二概率估计结果还包括第二概率方差值,第一阈值为第三特征元素集合中各个特征元素对应的第二概率方差值的最小第二概率方差值。在这种情况下,第一概率估计结果也为高斯分布,该第一概率估计结果还包括第一概率方差值,该第一特征元素的第一概率方差值大于或等于第一阈值。通过实现该可能的实现方式,在概率估计结果为高斯分布的情况下,通过概率方差值确定第一特征元素的时间复杂度,小于通过峰值概率确定第一特征元素的时间复杂度,从而可以提升数据编码的速度。
在一个可能的实现方式中,第二概率估计结果还包括第二峰值概率对应的特征值,进一步地,基于预设误差、每个特征元素的数值和每个特征元素的第二峰值概率对应的特征值,从多个特征元素中确定第三特征元素的集合。
在一个可能的实现方式中,第三特征元素的集合中的特征元素具有以下特征:
|ŷ(x,y,i)-p(x,y,i)|>TH_2
其中，ŷ(x,y,i)为特征元素的数值，p(x,y,i)为特征元素的第二峰值概率对应的特征值，TH_2为预设误差。
在一个可能的实现中,第一概率估计结果和第二概率估计结果相同,在这种情况下,基于第一待编码特征图,获取第一待编码特征图的边信息;对边信息进行概率估计,得到每个特征元素的第一概率估计结果。
在一个可能的实现中,第一概率估计结果和第二概率估计结果不同,在这种情况下,基于第一待编码特征图,获取第一待编码特征图的边信息和每个特征元素的第二上下文信息,该第二上下文信息为特征元素在第一待编码特征图中预设区域范围内的特征元素;基于所述边信息和第二上下文信息,得到所述每个特征元素的第二概率估计结果。
在一个可能的实现中,基于第一待编码特征图,获取第一待编码特征图的边信息;针对第一待编码特征图中的任一特征元素,基于第一上下文信息和边信息,确定所述特征元素的 第一概率估计结果;其中,第一概率估计结果还包括第一概率峰值对应的特征值,第一上下文信息为该特征元素在第二待编码特征图中预设区域范围内的特征元素,该第二待编码特征图的值由第一特征元素的数值和所述第二特征元素的第一峰值概率对应的特征值组成,该第二特征元素为第一待编码特征图中除第一特征元素之外的特征元素。通过这样的方式,结合边信息和上下文信息得到每个特征元素的概率估计结果,相较于仅利用边信息得到每个特征元素的概率估计结果的方式,提升了每个特征元素的概率估计结果的准确性。
在一个可能的实现中,将所有第一特征元素的熵编码结果,写入编码码流。
第三方面,本申请提供了一种特征图解码装置,包括:
获取模块,用于获取待解码特征图的码流,所述待解码特征图包括多个特征元素;以及用于基于所述待解码特征图的码流,获得所述多个特征元素中每个特征元素对应的第一概率估计结果,所述第一概率估计结果包括第一峰值概率;
解码模块,用于基于第一阈值和所述每个特征元素对应的第一峰值概率,从所述多个特征元素中确定第一特征元素的集合和第二特征元素的集合;基于所述第一特征元素的集合和所述第二特征元素的集合,得到解码特征图。
上述获取模块、解码模块的进一步实现功能可以参考第一方面或者第一方面的任意一种实现方式，此处不再赘述。
第四方面,本申请提供了一种特征图编码装置,包括:
获取模块,用于获取第一待编码特征图,所述第一待编码特征图包括多个特征元素;
编码模块,用于基于所述第一待编码特征图,确定所述多个特征元素中每个特征元素的第一概率估计结果,所述第一概率估计结果包括第一峰值概率;针对所述第一待编码特征图中的每个特征元素,基于所述特征元素的第一峰值概率,确定所述特征元素是否为第一特征元素;仅在所述特征元素为第一特征元素的情况下,对所述第一特征元素进行熵编码。
上述获取模块、编码模块的进一步实现功能可以参考第二方面或者第二方面的任意一种实现方式，此处不再赘述。
第五方面，本申请提供一种解码器，包括处理电路，用于执行上述第一方面及第一方面任一项所述的方法。
第六方面，本申请提供一种编码器，包括处理电路，用于执行上述第二方面及第二方面任一项所述的方法。
第七方面，本申请提供一种计算机程序产品，包括程序代码，当其在计算机或处理器上运行时，用于执行上述第一方面及第一方面任一项所述的方法、或执行上述第二方面及第二方面任一项所述的方法。
第八方面，本申请提供一种解码器，包括：一个或多个处理器；非瞬时性计算机可读存储介质，耦合到所述处理器并存储由所述处理器执行的程序，其中所述程序在由所述处理器执行时，使得所述解码器执行上述第一方面及第一方面任一项所述的方法。
第九方面，本申请提供一种编码器，包括：一个或多个处理器；非瞬时性计算机可读存储介质，耦合到所述处理器并存储由所述处理器执行的程序，其中所述程序在由所述处理器执行时，使得所述编码器执行上述第二方面及第二方面任一项所述的方法。
第十方面，本申请提供一种非瞬时性计算机可读存储介质，包括程序代码，当其由计算机设备执行时，用于执行上述第一方面及第一方面任一项、上述第二方面及第二方面任一项所述的方法。
第十一方面，本发明涉及解码装置，具有实现上述第一方面或第一方面任一项的方法实施例中行为的功能。所述功能可以通过硬件实现，也可以通过硬件执行相应的软件实现。所述硬件或软件包括一个或多个与上述功能相对应的模块。
第十二方面，本发明涉及编码装置，具有实现上述第二方面或第二方面任一项的方法实施例中行为的功能。所述功能可以通过硬件实现，也可以通过硬件执行相应的软件实现。所述硬件或软件包括一个或多个与上述功能相对应的模块。
附图说明
图1是本申请实施例提供的一种数据译码系统架构示意图;
图2a是本申请实施例提供的一种概率估计模块103的输出结果示意图;
图2b是本申请实施例提供的一种概率估计结果的示意图;
图3是本申请实施例提供的一种特征图编码方法的流程示意图;
图4a是本申请实施例提供的一种概率估计模块103的输入和输出结果的示意图;
图4b是本申请实施例提供的一种概率估计网络的结构示意图;
图4c是本申请实施例提供的一种确定第一阈值方法的流程示意图;
图5是本申请实施例提供的一种特征图解码方法的流程示意图;
图6a是本申请实施例提供的另一种特征图编码方法的流程示意图;
图6b是本申请实施例提供的另一种概率估计模块103的输入和输出结果的示意图;
图7a是本申请实施例提供的另一种特征图解码方法的流程示意图;
图7b是本申请实施例提供的一种压缩性能对比试验的实验结果示意图;
图7c是本申请实施例提供的另一种压缩性能对比试验的实验结果示意图;
图8是本申请实施例提供的一种特征图编码装置的结构示意图;
图9是本申请实施例提供的一种特征图解码装置的结构示意图;
图10是本申请实施例提供的一种计算机设备的结构示意图。
具体实施方式
下面结合附图对本申请实施例中的技术方案进行清楚、完整的描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。
需要说明的是,本申请的说明书以及附图中的术语“第一”和“第二”等是用于区分不同的对象,或者用于区别对同一对象的不同处理,而不是用于描述对象的特定顺序。此外,本申请的描述中所提到的术语“包括”和“具有”以及它们的任何变形,意图在于覆盖不排他的包含。例如包含了一些列步骤或单元的过程、方法、系统、产品或设备没有限定于已列出的步骤或单元,而是可选地还包括其他没有列出的步骤或单元,或可选地还包括对于这些过程、方法、产品或设备固有的其他步骤或单元。需要说明的是,本申请实施例中,“示例性地”或者“例如”等词用于表示作例子、例证或说明。本申请实施例中被描述为“示例性地”或者“例如”的任何实施例或设计方法不应被解释为比其他实施例或设计方案更优地或更具优势。确切而言,使用“示例性地”或者“例如”等词旨在以具体方式呈现相关概念。在本申请实施例中,“A和/或B”表示A和B,A或B两个含义。“A,和/或B,和/或C”表示A、B、C中的任一个,或者,表示A、B、C中的任两个,或者,表示A和B和C。下面结合附图,对本申请中的技术方案进行描述。
本申请实施例提供的特征图解码方法和特征图编码方法能应用在数据编码领域(包括音频编码领域、视频编码领域和图像编码领域),具体地,该特征图解码方法和特征图编码方法可以应用在相册管理、人机交互,音频压缩或传输、视频压缩或传输、图像压缩或传输、数 据压缩或传输的场景中。需要说明的是,为了便于理解,本申请实施例中仅以特征图解码方法和特征图编码方法应用于图像编码领域进行示意性说明,并不能视为对本申请所提供方法的限定。
具体地，以特征图编码方法和特征图解码方法应用于端到端的图像特征图编解码系统为例，在该端到端的图像特征图编解码系统中，包括图像编码和图像解码两部分。图像编码在源端执行，通常包括处理（例如，压缩）原始视频图像以减少表示该视频图像所需的数据量（从而更高效存储和/或传输）。图像解码在目的地端执行，通常包括相对于编码器作逆处理，以重建图像。在端到端的图像特征图编解码系统中，通过本申请所提供的特征图解码和特征图编码方法，可以对待编码特征图中每个特征元素进行判定是否需要执行熵编码，从而跳过部分特征元素的编码过程，减少执行熵编码的元素个数，降低熵编码复杂度。并且通过每个特征元素的概率峰值来提升判断结果（该特征元素是否需要执行熵编码）的可靠性，从而提升图像压缩的性能。
由于本申请实施例涉及大量神经网络的应用,为了便于理解,下面先对本申请实施例涉及的相关术语及神经网络等相关概念进行介绍。
1、熵编码
熵编码即编码过程中按熵原理不丢失任何信息的编码。熵编码用于将熵编码算法或方案应用于量化系数、其它语法元素,得到可以通过输出端以编码比特流等形式输出的编码数据,使得解码器等可以接收并使用用于解码的参数。可将编码比特流传输到解码器,或将其保存在存储器中稍后由解码器传输或检索。其中,熵编码算法或方案包括但不限于:可变长度编码(variable length coding,VLC)方案、上下文自适应VLC方案(context adaptive VLC,CALVC)、算术编码方案、二值化算法、上下文自适应二进制算术编码(context adaptive binary arithmetic coding,CABAC)、基于语法的上下文自适应二进制算术编码(syntax-based context-adaptive binary arithmetic coding,SBAC)、概率区间分割熵(probability interval partitioning entropy,PIPE)编码或其它熵编码方法或技术。
2、神经网络
神经网络可以是由神经单元组成的，神经单元可以是指以x_s和截距1为输入的运算单元，该运算单元的输出可以为公式(1)所示。
输出 = f(∑_{s=1}^{n} W_s·x_s + b)   (1)
其中，s=1、2、……n，n为大于1的自然数，W_s为x_s的权重，b为神经单元的偏置。f为神经单元的激活函数（activation functions），用于将非线性特性引入神经网络中，来将神经单元中的输入信号转换为输出信号。该激活函数的输出信号可以作为下一层卷积层的输入。激活函数可以是sigmoid函数。神经网络是将许多个上述单一的神经单元联结在一起形成的网络，即一个神经单元的输出可以是另一个神经单元的输入。每个神经单元的输入可以与前一层的局部接受域相连，来提取局部接受域的特征，局部接受域可以是由若干个神经单元组成的区域。
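为便于理解公式(1)，下面给出一个最小的示意性代码片段（Python，仅作说明，并非对本申请实现方式的限定；其中的输入、权重与偏置均为假设的示例数值，激活函数以sigmoid为例）：

```python
import numpy as np

def sigmoid(z):
    # 激活函数f，这里以sigmoid为例
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(x, w, b):
    # 对应公式(1)：输出 = f(∑ W_s·x_s + b)
    return sigmoid(np.dot(w, x) + b)

x = np.array([0.5, -1.0, 2.0])   # 输入x_s（假设值）
w = np.array([0.2, 0.4, -0.1])   # 权重W_s（假设值）
b = 0.1                          # 偏置b（假设值）
print(neuron_output(x, w, b))    # 输出一个标量
```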
3、深度神经网络(deep neural network,DNN)
DNN也称多层神经网络,可以理解为具有多层隐含层的神经网络。按照不同层的位置对DNN进行划分,DNN内部的神经网络可以分为三类:输入层,隐含层,输出层。一般来说第一层是输入层,最后一层是输出层,中间的层数都是隐含层。层与层之间是全连接的,也就是说,第i层的任意一个神经元一定与第i+1层的任意一个神经元相连。
虽然DNN看起来很复杂,但是就每一层的工作来说,其实并不复杂,简单来说就是如下线性关系表达式:y=α(Wx+b),其中,x是输入向量,y是输出向量,b是偏移向量,W是权重矩阵(也称系数),α()是激活函数。每一层仅仅是对输入向量x经过如此简单的操作得到输出向量y。由于DNN层数多,系数W和偏移向量b的数量也比较多。这些参数在DNN中的定义如下所述:以系数W为例:假设在一个三层的DNN中,第二层的第4个神经元到第三层的第2个神经元的线性系数定义为
W^3_{24}。
上标3代表系数W所在的层数,而下标对应的是输出的第三层索引2和输入的第二层索引4。
综上,第L-1层的第k个神经元到第L层的第j个神经元的系数定义为
W^L_{jk}。
需要注意的是,输入层是没有W参数的。在深度神经网络中,更多的隐含层让网络更能够刻画现实世界中的复杂情形。理论上而言,参数越多的模型复杂度越高,“容量”也就越大,也就意味着它能完成更复杂的学习任务。训练深度神经网络的也就是学习权重矩阵的过程,其最终目的是得到训练好的深度神经网络的所有层的权重矩阵(由很多层的向量W形成的权重矩阵)。
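上述线性关系表达式y=α(Wx+b)逐层串联即构成深度神经网络的前向计算。下面给出一个两层前向计算的示意性片段（网络规模、随机初始化方式与激活函数均为假设，仅作说明）：

```python
import numpy as np

def dnn_forward(x, layers):
    # 每一层按 y = α(W·x + b) 计算，α此处以ReLU示意
    for W, b in layers:
        x = np.maximum(0.0, W @ x + b)
    return x

rng = np.random.default_rng(0)
layers = [(rng.normal(size=(8, 4)), np.zeros(8)),   # 第一层：4维输入 -> 8维隐含层
          (rng.normal(size=(2, 8)), np.zeros(2))]   # 第二层：8维隐含层 -> 2维输出
print(dnn_forward(rng.normal(size=4), layers))
```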
4、卷积神经网络(convolutional neuron network,CNN)
CNN是一种带有卷积结构的深度神经网络。卷积神经网络包含了一个由卷积层和子采样层构成的特征抽取器。该特征抽取器可以看作是滤波器,卷积过程可以看作是使用一个可训练的滤波器与一个输入的图像或者卷积特征平面(feature map)做卷积。卷积层是指卷积神经网络中对输入信号进行卷积处理的神经元层。在卷积神经网络的卷积层中,一个神经元可以只与部分邻层神经元连接。一个卷积层中,通常包含若干个特征平面,每个特征平面可以由一些矩形排列的神经单元组成。同一特征平面的神经单元共享权重,这里共享的权重就是卷积核。共享权重可以理解为提取图像信息的方式与位置无关。这其中隐含的原理是:图像的某一部分的统计信息与其他部分是一样的。即意味着在某一部分学习的图像信息也能用在另一部分上。所以对于图像上的所有位置,都能使用同样的学习得到的图像信息。在同一卷积层中,可以使用多个卷积核来提取不同的图像信息,一般地,卷积核数量越多,卷积操作反映的图像信息越丰富。
卷积核可以以随机大小的矩阵的形式初始化,在卷积神经网络的训练过程中卷积核可以通过学习得到合理的权重。另外,共享权重带来的直接好处是减少卷积神经网络各层之间的连接,同时又降低了过拟合的风险。
5、循环神经网络(recurrent neural networks,RNN)
在现实世界中,很多元素都是有序的、相互连接的,为了让机器像人一样拥有记忆的能力,会根据上下文的内容进行推断,RNN就应运而生了。
RNN是用来处理序列数据的,即一个序列当前的输出与前面的输出也有关,即RNN的输出就需要依赖当前的输入信息和历史的记忆信息。具体的表现形式为网络会对前面的信息进行记忆并应用于当前输出的计算中,即隐含层本层之间的节点不再无连接而是有连接的,并且隐含层的输入不仅包括输入层的输出还包括上一时刻隐含层的输出。理论上,RNN能够对任何长度的序列数据进行处理。对于RNN的训练和对传统的CNN或DNN的训练一样。同样使用误差反向传播算法,不过有一点区别:即,如果将RNN进行网络展开,那么其中的参数(如W)是共享的;而如上举例上述的传统神经网络却不是这样。并且在使用梯度下降算法中,每一步的输出不仅依赖当前步的网络,还依赖前面若干步网络的状态。该学习算法称为基于时间的反向传播算法(back propagation through time,BPTT)。
6、损失函数
在训练深度神经网络的过程中,因为希望深度神经网络的输出尽可能的接近真正想要预测的值,所以可以通过比较当前网络的预测值和真正想要的目标值,再根据两者之间的差异情况来更新每一层神经网络的权重向量(当然,在第一次更新之前通常会有初始化的过程,即为深度神经网络中的各层预先配置参数),比如,如果网络的预测值高了,就调整权重向量让它预测低一些,不断的调整,直到深度神经网络能够预测出真正想要的目标值或与真正想要的目标值非常接近的值。因此,就需要预先定义“如何比较预测值和目标值之间的差异”,这便是损失函数(loss function)或目标函数(objective function),它们是用于衡量预测值和目标值的差异的重要方程。其中,以损失函数举例,损失函数的输出值(loss)越高表示差异越大,那么深度神经网络的训练就变成了尽可能缩小这个loss的过程。
7、反向传播算法
卷积神经网络可以采用误差反向传播(back propagation,BP)算法在训练过程中修正初始的超分辨率模型中参数的大小,使得超分辨率模型的重建误差损失越来越小。具体地,前向传递输入信号直至输出会产生误差损失,通过反向传播误差损失信息来更新初始的超分辨率模型中参数,从而使误差损失收敛。反向传播算法是以误差损失为主导的反向传播运动,旨在得到最优的超分辨率模型的参数,例如权重矩阵。
8、生成式对抗网络
生成式对抗网络(generative adversarial networks,GAN)是一种深度学习模型。该模型中至少包括两个模块:一个模块是生成模型(Generative Model),另一个模块是判别模型(Discriminative Model),通过这两个模块互相博弈学习,从而产生更好的输出。生成模型和判别模型都可以是神经网络,具体可以是深度神经网络,或者卷积神经网络。GAN的基本原理如下:以生成图片的GAN为例,假设有两个网络,G(Generator)和D(Discriminator),其中G是一个生成图片的网络,它接收一个随机的噪声z,通过这个噪声生成图片,记做G(z);D是一个判别网络,用于判别一张图片是不是“真实的”。它的输入参数是x,x代表一张图片,输出D(x)代表x为真实图片的概率,如果为1,就代表100%是真实的图片,如果为0,就代表不可能是真实的图片。在对该生成式对抗网络进行训练的过程中,生成网络G的目标就是尽可能生成真实的图片去欺骗判别网络D,而判别网络D的目标就是尽量把G生成的图片和真实的图片区分开来。这样,G和D就构成了一个动态的“博弈”过程,也即“生成式对抗网络”中的“对抗”。最后博弈的结果,在理想的状态下,G可以生成足以“以假乱真”的图片G(z),而D难以判定G生成的图片究竟是不是真实的,即D(G(z))=0.5。这样就得到了一个优异的生成模型G,它可以用来生成图片。
9、像素值
图像的像素值可以是一个红绿蓝(RGB)颜色值,像素值可以是表示颜色的长整数。例如,像素值为256*Red+100*Green+76Blue,其中,Blue代表蓝色分量,Green代表绿色分量,Red代表红色分量。各个颜色分量中,数值越小,亮度越低,数值越大,亮度越高。对于灰度图像来说,像素值可以是灰度值。
下面介绍本申请实施例提供的系统架构。请参加图1,图1为本申请实施例提供的一种数据译码系统架构。在该数据译码系统架构中包括数据采集模块101、特征提取模块102、概率估计模块103、数据编码模块104、数据解码模块105、数据重建模块106、显示模块107。其中:
数据采集模块101,用于采集原始图像。该数据采集模块101可包括或可以为任意类型 的用于捕获现实世界图像等的图像捕获设备,和/或任意类型的图像生成设备,例如用于生成计算机动画图像的计算机图形处理器或任意类型的用于获取和/或提供现实世界图像、计算机生成图像(例如,屏幕内容、虚拟现实(virtual reality,VR)图像和/或其任意组合(例如增强现实(augmented reality,AR)图像)的设备。所述数据采集模块101还可以为存储上述图像的任意类型的内存或存储器。
特征提取模块102,用于从数据采集模块101接收原始图像,对原始图像进行预处理,进一步地采用特征提取网络从预处理之后的图像中提取出特征图(即待编码特征图),该特征图(即待编码特征图)包括多个特征元素。具体地,前述对原始图像进行预处理包括但不限于:修剪、颜色格式转换(例如从RGB转换为YCbCr)、调色、去噪或归一化等。该特征提取网络可以是前述介绍的神经网络、DNN、CNN或RNN中的一种或变形,在此对该特征提取网络的具体形式不做具体限定。该特征提取模块102可选地,还用于通过例如标量量化或矢量量化对该特征图(即待编码特征图)进行取整。需要知晓的是,特征图包括多个特征元素,该特征图的值由每个特征元素的数值组成。可选地,该特征提取模块102中还包括边信息提取网络,即该特征提取模块102除了输出由特征提取网络输出的特征图之外,还输出通过边信息提取网络提取该特征图的边信息。其中,该边信息提取网络可以是前述介绍的神经网络、DNN、CNN或RNN中的一种或变形,在此对该特征提取网络的具体形式不做具体限定。
概率估计模块103,用于估计特征图(即待编码特征图)的多个特征元素中每个特征元素的对应取值的概率。例如,该待编码特征图中包括m个特征元素,其中m为正整数,如图2a所示,该概率估计模块103输出m个特征元素中的每个特征元素的概率估计结果,示例性地,一个特征元素的概率估计结果可以如图2b所示,图2b的横轴坐标为特征元素的可能数值(或称为该特征元素可能的取值),纵轴坐标则代表每个可能数值(或称为该特征元素可能的取值)的可能性,例如点P即表明该特征元素取值为[a-0.5,a+0.5]的概率为p。
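作为一个示意（并非对概率估计模块103具体实现的限定），假设某特征元素的概率估计结果为高斯分布，则该特征元素取整数值a（即取值落在区间[a-0.5,a+0.5]内）的概率可以用高斯累积分布函数之差近似计算：

```python
import math

def gaussian_cdf(x, mu, sigma):
    # 高斯分布的累积分布函数
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def prob_of_integer_value(a, mu, sigma):
    # 特征元素取值落在[a-0.5, a+0.5]内的概率
    return gaussian_cdf(a + 0.5, mu, sigma) - gaussian_cdf(a - 0.5, mu, sigma)

# 假设该特征元素的概率估计结果为均值mu=2.0、标准差sigma=1.0的高斯分布
p = prob_of_integer_value(2, mu=2.0, sigma=1.0)
print(p)  # 当a等于均值时，该概率即为图2b中点P处的峰值概率p
```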
数据编码模块104,用于根据来自特征提取模块102的特征图(即待编码特征图)和来自概率估计模块103的每个特征元素的概率估计结果进行熵编码,生成编码码流(本文也称为待解码特征图的码流)。
数据解码模块105,用于接收来自数据编码模块104的编码码流,进一步地根据该编码码流和来自概率估计模块103的每个特征元素的概率估计结果进行熵解码,得到解码特征图(或理解为解码特征图的值)。
数据重建模块106,用于对来自数据解码模块105的解码图像特征图进行后处理,和采用图像重建网络对后处理后的解码图像特征图进行图像重建,得到解码图像。其中,后处理的操作包括但不限于颜色格式转换(例如从YCbCr转换为RGB)、调色、修剪或重采样等;图像重建网络可以是前述介绍的神经网络、DNN、CNN或RNN中的一种或变形,在此对该特征提取网络的具体形式不做具体限定。
显示模块107,用于显示来自数据重建模块106的解码图像,以向用户或观看者等显示图像。该显示模块107可以为或包括任意类型的用于表示重建后音频或图像的播放器或显示器,例如,集成或外部显示屏或显示器。例如,显示屏可包括液晶显示器(liquid crystal display,LCD)、有机发光二极管(organic light emitting diode,OLED)显示器、等离子显示器、投影仪、微型LED显示器、硅基液晶显示器(liquid crystal on silicon,LCoS)、数字光处理器(digital light processor,DLP)或任意类型的其它显示屏。
需要说明的是,该数据译码系统架构可以是一个设备的功能模块;该数据译码系统架构 也可以是端到端的数据译码系统,即该数据译码系统架构中包括两个设备:源端设备和目的端设备,其中,源端设备可以包括:数据采集模块101、特征提取模块102、概率估计模块103和数据编码模块104;目的端设备可以包括:数据解码模块105、数据重建模块106和显示模块107。源端设备用于向目的端设备提供编码码流的方式一:可以是源端设备通过通信接口向目的端设备发送该编码码流,该通信接口可以是源端设备与目的端设备之间的直连通信链路,例如直接有线或无线连接等,或者通过任意类型的网络,例如有线网络、无线网络或其任意组合、任意类型的私网和公网或其任意类型的组合;源端设备用于向目的端设备提供编码码流的方式二:还可以是源端设备将编码码流存储于一个存储设备中,目的端设备可以从存储设备中获取该编码码流。
需要说明的是,本申请所提及的特征图编码方法主要可以由图1中概率估计模块103和数据编码模块104来执行,本申请所提及的特征图解码方法主要可以由图1中概率估计模块103和数据解码模块105来执行。
在一个示例中,本申请所提供的特征图编码方法的方法执行主体为编码侧设备,该编码侧设备主要可以包括图1中概率估计模块103和数据编码模块104。针对本申请所提供的特征图编码方法,编码侧设备可以包括如下几个步骤:步骤11~步骤14。其中:
步骤11、编码侧设备获取第一待编码特征图,该第一待编码特征图包括多个特征元素。
步骤12、编码侧设备中的概率估计模块103基于第一待编码特征图,确定多个特征元素中每个特征元素的第一概率估计结果,该第一概率估计结果包括第一峰值概率。
步骤13、针对第一待编码特征图中的每个特征元素,编码侧设备基于特征元素的第一峰值概率,确定特征元素是否为第一特征元素。
步骤14、仅在特征元素为第一特征元素的情况下,编码侧设备中的数据编码模块104对第一特征元素进行熵编码。
在另一个示例中,本申请所提供的特征图解码方法的方法执行主体为解码侧设备,该解码侧设备主要包括图1中概率估计模块103和数据解码模块105。针对本申请所提供的特征图解码方法中,解码侧设备可以包括如下几个步骤:步骤21~步骤24。其中:
步骤21、解码侧设备获取待解码特征图的码流,该待解码特征图包括多个特征元素。
步骤22、解码侧设备中的概率估计模块103基于待解码特征图的码流,获得多个特征元素中每个特征元素对应的第一概率估计结果,该第一概率估计结果包括第一峰值概率。
步骤23、解码侧设备基于第一阈值和每个特征元素对应的第一峰值概率,从多个特征元素中确定第一特征元素的集合和第二特征元素的集合。
步骤24、解码侧设备中的数据解码模块105基于第一特征元素的集合和第二特征元素的集合,得到解码特征图。
下面结合附图,对本申请提供的特征图解码方法和特征图编码方法的具体实施方式进行详细的描述。下文中,图3所示编码侧的执行流程示意图与图5所示解码侧的执行流程示意图,可视为一套特征图编解码方法的流程示意图;图6a所示编码侧的执行流程示意图与图7a所示解码侧的执行流程示意图,可视为一套特征图编解码方法的流程示意图。
编码侧:请参见图3,图3为本申请实施例提供的一种特征图编码方法的流程示意图,该特征图编码方法的流程包括S301~S306。其中:
S301、获取第一待编码特征图,该第一待编码特征图包括多个特征元素。
原始数据经过特征提取之后，得到待编码特征图y，进一步地，对该待编码特征图y进行量化，即将浮点数的特征值进行四舍五入得到整数特征值，得到量化后的待编码特征图ŷ（即第一待编码特征图），将该特征图ŷ中的特征元素表示为ŷ(x,y,i)。
在一个具体的示例中,可参见前述图1所示的数据采集模块101采集的原始图像的具体描述,通过特征提取模块102获得待编码特征图的具体描述。
S302、基于该第一待编码特征图,获得该第一待编码特征图的边信息。
其中,边信息可以理解为对待编码特征图进行进一步特征提取得到的特征图,边信息包含的特征元素的个数比待编码特征图中的特征元素个数少。
在一个可能的实现方式中,可以通过边信息提取网络获取第一待编码特征图的边信息。该边信息提取网络可以使用RNN、CNN、RNN的变形、CNN的变形或其他深度神经网络(或其他深度神经网络的变形),本申请对此不进行具体限定。
S303、基于边信息,得到每个特征元素的第一概率估计结果,该第一概率估计结果包括第一峰值概率。
如图4a所示,将边信息作为图1中概率估计模块103的输入,该概率估计模块103的输出即为各个特征元素的第一概率估计结果。其中该概率估计模块103可以为概率估计网络,该概率估计网络可以使用RNN、CNN、RNN的变形、CNN的变形或其他深度神经网络(或其他深度神经网络的变形)。请参见图4b,图4b为一种概率估计网络的结构示意图,在图4b中该概率估计网络为卷积网络,该卷积网络中包括了5个网络层:3个卷积层以及2个非线性激活层。该概率估计模块103还可以采用非网络的传统概率估计方法实现。概率估计方法包括且不限于等最大似然估计、最大后验估计、极大似然估计等统计方法。
其中，针对第一待编码特征图中的任一特征元素ŷ(x,y,i)而言，该特征元素的第一概率估计结果为：该特征元素的每个可能取值（或称为可能数值）的概率。例如，请参见图2b，横轴表示该特征元素ŷ(x,y,i)的每个可能取值（或称为可能数值），纵轴表示该每个可能取值（或称为可能数值）的可能性。第一峰值概率即为该第一概率估计结果中概率最大值，也可称为第一概率估计结果中的概率峰值，如图2b中点P纵坐标的数值p即为该第一概率估计结果中的第一峰值概率。
在一个可能的实施方式中,第一概率估计结果为高斯分布,则第一峰值概率为该高斯分布的均值概率。例如第一概率估计结果如图2b所示为高斯分布,则第一峰值为该高斯分布中的均值概率,即均值a对应的概率p。
在另一个可能的实施方式中,第一概率估计结果为混合高斯分布,该混合高斯分布由多个高斯分布组成,换言之,该混合高斯分布可以由各个高斯分布乘以各个高斯分布的权重,加权得到。在一个可能的情况下,第一峰值概率为各个高斯分布的均值概率中的最大值。或者,在另一个可能的情况下,第一峰值概率由各个高斯分布的均值概率和各个高斯分布在该混合高斯分布中的权重计算得到。
例如第一概率估计结果为混合高斯分布，该混合高斯分布由高斯分布1、高斯分布2和高斯分布3加权得到。其中，高斯分布1的权重为w_1、高斯分布2的权重为w_2和高斯分布3的权重为w_3，高斯分布1的均值概率为p_1、高斯分布2的均值概率为p_2和高斯分布3的均值概率为p_3，并且p_1大于p_2大于p_3。在第一峰值概率为该各个高斯分布的均值概率中的最大值的情况下，第一峰值概率为各个高斯分布的均值概率中的最大值（即高斯分布1的均值概率p_1）。在第一峰值概率由各个高斯分布的均值概率和各个高斯分布在该混合高斯分布中的权重计算得到的情况下，第一峰值概率如公式(2)所示。
第一峰值概率=p_1×w_1+p_2×w_2+p_3×w_3   (2)
需要知晓的是,当第一概率估计结果为混合高斯分布时,该混合高斯分布中各个高斯分布对应的权重可以由概率估计网络(例如前述概率估计模块103)得到并输出。换言之,概率估计网络在得到每个特征元素的第一概率估计结果(即混合高斯分布)的同时,也会得到组成该混合高斯分布的各个高斯分布对应的权重。
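下面给出按上述两种方式计算第一峰值概率的示意性片段（其中各高斯分布的均值概率与权重均为假设的示例数值，实际应由概率估计网络输出）：

```python
def peak_probability(mean_probs, weights=None):
    # mean_probs: 各个高斯分布的均值概率[p_1, p_2, ...]
    # weights:    各个高斯分布在混合高斯分布中的权重[w_1, w_2, ...]
    if weights is None:
        # 情况一：取各高斯分布均值概率中的最大值
        return max(mean_probs)
    # 情况二：按公式(2)加权求和
    return sum(p * w for p, w in zip(mean_probs, weights))

# 假设p_1>p_2>p_3，权重w_1、w_2、w_3为示例值
print(peak_probability([0.40, 0.30, 0.20]))                   # 0.4
print(peak_probability([0.40, 0.30, 0.20], [0.5, 0.3, 0.2]))  # 约0.33
```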
S304、基于每个特征元素的第一概率结果,确定第一阈值。
在一个可能的实现中,基于第一待编码特征图中每个特征元素的第一概率估计结果,从第一待编码特征图中的多个特征元素中确定第三特征元素的集合。进一步地,基于该第三特征元素的集合中各个特征元素的第一概率估计结果,确定第一阈值。
换言之,可将确定第一阈值的流程分为两个步骤,具体地确定第一阈值的流程示意图如图4c所示,包括步骤S401~S402。其中:
S401、从第一待编码特征图包括的多个特征元素中,确定第三特征元素的集合。
基于第一待编码特征图中每个特征元素的第一概率估计结果,从第一待编码特征图的多个特征元素中确定第三特征元素的集合,该第三特征元素的集合可以理解为用于确定第一阈值的特征元素的集合。
在一个可能的实现中，可以基于预设误差、第一待编码特征图中每个特征元素的数值和每个特征元素的第一峰值概率对应的特征值，从该多个特征元素中确定第三特征元素的集合。其中，每个特征元素的第一峰值概率对应的特征值是指：该特征元素的第一概率估计结果中，该第一峰值概率对应的该特征元素的可能取值（或可能数值），例如图2b中点P的横轴坐标数值a。预设误差值可理解为该特征图编码方法的可容忍误差，可以根据经验值或算法确定。
具体地,确定出的第三特征元素的集合中特征元素具有公式(3)所示的特征。
|ŷ(x,y,i)-p(x,y,i)|>TH_2   (3)
其中，ŷ(x,y,i)为该特征元素的数值，p(x,y,i)为该特征元素的第一峰值概率对应的特征值，TH_2为预设误差。
示例性地,组成第一待编码特征图的多个特征元素为:特征元素1、特征元素2、特征元素3、特征元素4和特征元素5。已经通过概率估计模块获取到了该第一待编码特征图的多个特征元素中每个特征元素的第一概率估计结果。在这种情况下,基于预设误差e,每个特征元素的数值和每个特征元素对应第一概率估计结果的第一峰值概率(后文简称为特征元素的第一峰值概率),从特征元素1、特征元素2、特征元素3、特征元素4和特征元素5中,筛选出满足公式(3)的特征元素组成第三特征元素的集合。其中,特征元素1的数值与特征元素1对应第一峰值概率的特征值之间的绝对差值大于TH_2,则特征元素1满足公式(3);特征元素2的数值与特征元素2对应第一峰值概率的特征值之间的绝对差值大于TH_2,则特征元素2满足公式(3);特征元素3的数值与特征元素3对应第一峰值概率的特征值之间的绝对差值小于TH_2,则特征元素3不满足公式(3);特征元素4的数值与特征元素4对应第一峰值概率的特征值之间的绝对差值等于TH_2,则特征元素4不满足公式(3);特征元素5的数值与特征元素5对应第一峰值概率的特征值之间的绝对差值大于TH_2,则特征元素5满足公式(3)。综上所述,从特征元素1、特征元素2、特征元素3、特征元素4和特征元素5中,确定特征元素1、特征元素2、特征元素5确定为第三特征元素,组成第三特征元素的集合。
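上述筛选过程可用如下示意性片段概括（其中特征元素的数值、峰值概率对应的特征值以及预设误差TH_2均为假设的示例数据，下标从0开始计数）：

```python
import numpy as np

def select_third_elements(values, peak_values, th_2):
    # values:      每个特征元素的数值 ŷ(x,y,i)
    # peak_values: 每个特征元素第一峰值概率对应的特征值 p(x,y,i)
    # 满足公式(3) |ŷ(x,y,i) - p(x,y,i)| > TH_2 的元素进入第三特征元素的集合
    values = np.asarray(values, dtype=float)
    peak_values = np.asarray(peak_values, dtype=float)
    mask = np.abs(values - peak_values) > th_2
    return np.nonzero(mask)[0]  # 返回第三特征元素的下标

# 对应上文示例：下标0、1、4（即特征元素1、2、5）满足公式(3)，其余不满足
values      = [3.0, 1.0, 2.0, 0.0, -2.0]
peak_values = [1.0, 3.0, 1.5, 1.0,  0.5]
print(select_third_elements(values, peak_values, th_2=1.0))  # [0 1 4]
```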
S402、基于第三特征元素的集合中各个特征元素的第一概率估计结果,确定第一阈值。
根据第三特征元素的集合中各个特征元素的第一概率估计结果的形式,确定第一阈值。其中,该第一概率估计结果的形式包括高斯分布或其他形式概率分布(包括但不限于拉普拉 斯分布或混合高斯分布等)。
下面基于该第一概率分布结果的形式,对确定第一阈值的方式进行展开叙述。
方式一:该第一阈值为第三特征元素集合中各个特征元素对应的第一峰值概率的最大第一峰值概率。
需要知晓的是,在此种方式中,该第一概率分布结果的形式可以为高斯分布,也可以为其他形式概率分布(包括但不限于拉普拉斯分布或混合高斯分布等)。
示例性地,特征元素1、特征元素2、特征元素5确定为第三特征元素,组成第三特征元素的集合。特征元素1的第一峰值概率为70%,特征元素2的第一峰值概率为65%,特征元素5的第一峰值概率为75%,则将第三特征元素的集合中各特征元素对应的最大第一峰值概率(即特征元素5的第一峰值概率75%)确定为第一阈值。
方式二:该第一概率估计结果为高斯分布,该第一概率估计结果还包括第一概率方差值,该第一阈值为第三特征元素的集合中各个特征元素对应的第一概率方差值的最小第一概率方差值。
需要知晓的是,高斯分布的数学特征可概括为:在一个高斯分布中第一概率方差值越大则第一峰值概率越小;并且在第一概率估计结果的高斯分布的情况下,从该第一概率估计结果中获取第一概率方差值的速度,快于从该第一概率估计结果中获取第一峰值概率的速度。可见,在第一概率估计结果为高斯分布时,可基于第一概率方差值确定第一阈值的效率可优于基于第一峰值概率确定第一阈值的效率。
例如,特征元素1、特征元素2、特征元素5确定为第三特征元素,组成第三特征元素的集合。特征元素1的第一概率方差值σ为0.6,特征元素2的第一概率方差值σ为0.7,特征元素5的第一概率方差值σ为0.5,则将第三特征元素的集合中各特征元素对应的最小第一概率方差值σ(即特征元素5的概率方差值0.5)确定为第一阈值。
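在得到第三特征元素的集合后，上述两种确定第一阈值的方式可示意如下（示例数值沿用上文，仅作说明）：

```python
def threshold_from_peak_probs(third_set_peak_probs):
    # 方式一：取第三特征元素集合中最大的第一峰值概率作为第一阈值
    return max(third_set_peak_probs)

def threshold_from_variances(third_set_variances):
    # 方式二（高斯分布情形）：取第三特征元素集合中最小的第一概率方差值作为第一阈值
    return min(third_set_variances)

print(threshold_from_peak_probs([0.70, 0.65, 0.75]))  # 0.75
print(threshold_from_variances([0.6, 0.7, 0.5]))      # 0.5
```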
需要知晓的是,由于第一阈值是根据该第一待编码特征图中的特征元素确定的,即第一阈值与该第一待编码特征图相对应,为了方便数据解码,可以将该第一阈值进行熵编码,并将该熵编码的结果写入第一待编码特征图的编码码流。
S305、针对每个特征元素,基于第一阈值和该特征元素的第一概率估计结果,确定该特征元素是否为第一特征元素。
针对第一待编码特征图中多个特征元素的每个特征元素而言,可以基于第一阈值和该特征元素的第一概率估计结果,确定该特征元素是否为第一特征元素。可见,判断特征元素是否为第一特征元素的一个重要判断条件是第一阈值,下面根据前述确定第一阈值的具体方式,对确定特征元素是否为第一特征元素的方式进行具体讨论。
方式一：第一阈值为第三特征元素集合中各个特征元素对应的第一峰值概率的最大第一峰值概率的情况下，根据该第一阈值确定的第一特征元素满足条件：第一特征元素的第一峰值概率小于或等于该第一阈值。
示例性地,组成第一待编码特征图的多个特征元素为:特征元素1、特征元素2、特征元素3、特征元素4和特征元素5。其中,特征元素1、特征元素2、特征元素5组成第三特征元素的集合,根据该第三特征元素的集合确定第一阈值为75%。在这种情况下,若特征元素1的第一峰值概率为70%,小于第一阈值;特征元素2的第一峰值概率为65%,小于第一阈值;特征元素3的第一峰值概率为80%,大于第一阈值;特征元素4的第一峰值概率为60%,小于第一阈值;和特征元素5的第一峰值概率为75%,等于第一阈值。综上,将特征元素1、特征元素2、特征元素4和特征元素5确定为第一特征元素。
方式二:第一阈值为第三特征元素的集合中各个特征元素对应的第一概率方差值的最小第一概率方差值的情况下,根据该第一阈值确定的第一特征元素满足条件:第一特征元素的第一概率方差值大于或等于该第一阈值。
示例性地，组成第一待编码特征图的多个特征元素为：特征元素1、特征元素2、特征元素3、特征元素4和特征元素5。其中，特征元素1、特征元素2、特征元素5组成第三特征元素的集合，根据该第三特征元素的集合确定第一阈值为0.5。在这种情况下，若特征元素1的第一概率方差值为0.6，大于第一阈值；特征元素2的第一概率方差值为0.7，大于第一阈值；特征元素3的第一概率方差值为0.4，小于第一阈值；特征元素4的第一概率方差值为0.75，大于第一阈值；和特征元素5的第一概率方差值为0.5，等于第一阈值。综上，将特征元素1、特征元素2、特征元素4和特征元素5确定为第一特征元素。
S306、仅在该特征元素为第一特征元素的情况下,对第一特征元素进行熵编码。
对第一待编码特征图中每个特征元素进行判断,判断该特征元素是否是第一特征元素,若是,则对该第一特征元素进行编码,并将该第一特征元素的编码结果写入编码码流。即可以理解对特征图中所有的第一特征元素进行了熵编码,并将所有第一特征元素的熵编码结果写入编码码流。
示例性地，组成第一待编码特征图的多个特征元素为：特征元素1、特征元素2、特征元素3、特征元素4和特征元素5。特征元素1、特征元素2、特征元素4和特征元素5确定为第一特征元素。则不对特征元素3进行熵编码，将特征元素1、特征元素2、特征元素4和特征元素5进行熵编码，将全部第一特征元素的熵编码结果写入编码码流。
需要说明的是,若每个特征元素在S305的判断结果均是:不为第一特征元素,则对每个特征元素都不进行熵编码。若每个特征元素在S305的判断结果均是:为第一特征元素,则对每一个特征元素都进行熵编码,并将每一个特征元素的熵编码结果写入编码码流。
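S305~S306的判定与跳过逻辑可用如下示意性片段概括（对码流的写入仅以列表记录作占位示意，实际应由图1中的数据编码模块104完成熵编码；阈值与各数值均为假设的示例数据）：

```python
def encode_feature_map(values, peak_probs, threshold):
    # values:     第一待编码特征图中各特征元素的数值
    # peak_probs: 各特征元素的第一峰值概率
    # threshold:  第一阈值
    bitstream = []  # 占位：此处仅记录被熵编码的元素，实际应写入熵编码后的码流
    for idx, (v, p) in enumerate(zip(values, peak_probs)):
        if p <= threshold:
            bitstream.append((idx, v))  # 第一特征元素：执行熵编码
        # 否则为第二特征元素：跳过熵编码
    return bitstream

# 沿用上文示例：第一阈值为75%，特征元素3（下标2）的峰值概率为80%，被跳过
values = [5, -1, 0, 2, 7]
peak_probs = [0.70, 0.65, 0.80, 0.60, 0.75]
print(encode_feature_map(values, peak_probs, threshold=0.75))
```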
在一个可能的实现中,还可以将第一待编码特征图的边信息进行熵编码,并将该边信息的熵编码结果写入码流。也可以是将该第一待编码特征图的边信息发送至解码端,以便后续进行数据解码。
解码侧:请参见图5,图5为本申请实施例提供的一种特征图解码方法的流程示意图,该特征解码方法的流程包括S501~S504。其中:
S501、获取待解码特征图的码流,该待解码特征图包括多个特征元素。
该待编码特征图的码流可以理解为S306得到的编码码流。该待解码特征图为对该码流进行数据解码后得到的特征图。该待解码特征图包括多个特征元素,该多个特征元素被分为两部分:第一特征元素的集合和第二特征元素的集合。其中,第一特征元素的集合为在图3的特征图编码阶段,进行了熵编码的特征元素的集合;第二特征元素的集合为在图3的特征图编码阶段,未进行熵编码的特征元素的集合。
在一个可能的实现中,该第一特征元素的集合为空集,或者,该第二特征元素的集合为空集。其中,第一特征元素的集合为空集,即是指在图3的特征图编码阶段,每个特征元素均未进行熵编码;第二特征元素的集合为空集,即是指在图3的特征图编码阶段,每个特征元素均进行了熵编码。
S502、基于该待解码特征图的码流,获得该多个特征元素中每个特征元素对应的第一概率估计结果,该第一概率估计结果包括第一峰值概率。
对该待解码特征图的码流,进行熵解码;进一步地,可以根据熵解码结果,获得该多个特征元素中每个特征元素对应的第一概率估计结果。该第一概率估计结果包括第一峰值概率。
在一个可能的实现中,基于待解码特征图的码流,获得该待解码特征图对应的边信息;基于该边信息,获得每个特征元素对应的第一概率估计结果。
具体地,该待解码特征图的码流中包括边信息的熵编码结果,因此可以对待解码特征图的码流进行熵解码,得到的熵解码结果中包括该待解码特征图对应的边信息。进一步地,如图4a所示,将边信息作为图1中概率估计模块103的输入,该概率估计模块103的输出即为各个特征元素(包括第一特征元素的集合中的特征元素和第二特征元素的集合中的特征元素)的第一概率估计结果。
示例性地，某个特征元素的第一概率估计结果请参见图2b所示，横轴表示该特征元素ŷ(x,y,i)的每个可能取值（或称为可能数值），纵轴表示该每个可能取值（或称为可能数值）的可能性。第一峰值概率即为该第一概率估计结果中概率最大值，也可称为第一概率估计结果中的概率峰值，如图2b中点P纵坐标的数值p即为该第一概率估计结果中的第一峰值概率。需要知晓的是，第一概率估计结果为高斯分布，该第一峰值概率为所述高斯分布的均值概率。或者，第一概率估计结果为混合高斯分布，该混合高斯分布由多个高斯分布组成，第一峰值概率为各个高斯分布的均值概率中的最大值；或者，第一峰值概率由各个高斯分布的均值概率和各个高斯分布在该混合高斯分布中的权重计算得到。其中，基于第一概率估计结果获得第一峰值概率的具体实施方式，可参见前述S303中对第一概率估计结果和第一峰值概率的相关描述，重复之处不再赘述。
其中该概率估计模块103可以为概率估计网络,该概率估计网络可以使用RNN、CNN、RNN的变形、CNN的变形或其他深度神经网络(或其他深度神经网络的变形)。请参见图4b,图4b为一种概率估计网络的结构示意图,在图4b中该概率估计网络为卷积网络,该卷积网络中包括了5个网络层:3个卷积层以及2个非线性激活层。该概率估计模块103还可以采用非网络的传统概率估计方法实现。概率估计方法包括且不限于等最大似然估计、最大后验估计、极大似然估计等统计方法。
S503、基于第一阈值和每个特征元素对应的第一峰值概率,从多个特征元素中确定第一特征元素的集合和第二特征元素的集合。
基于第一阈值和每个特征元素对应的第一峰值概率之间的数量关系,从待解码特征图的多个特征元素中确定第一特征元素的集合和第二特征元素的集合。其中,该第一阈值可以是由特征图编码方法对应的设备和特征图解码方法对应的设备之间协商确定的;也可以是根据经验值设定的;该第一阈值还可以是基于待解码特征图的码流得到的。
具体的,该第一阈值可以为前述S402的方式一中:第三特征元素的集合中的最大第一峰值概率。在这种情况下,针对该待解码特征图中的每个特征元素而言,若该特征元素的第一峰值概率大于该第一阈值,则将该特征元素确定为第二特征元素(即第二特征元素的集合中的特征元素);若该特征元素的第一峰值概率小于或等于(包括小于或小于等于)该第一阈值,则将该特征元素确定为第一特征元素(即第一特征元素的集合中的特征元素)。
示例性地，该第一阈值为75%，待解码特征图的多个特征元素为特征元素1、特征元素2、特征元素3、特征元素4和特征元素5，其中，特征元素1的第一峰值概率为70%，小于第一阈值；特征元素2的第一峰值概率为65%，小于第一阈值；特征元素3的第一峰值概率为80%，大于第一阈值；特征元素4的第一峰值概率为60%，小于第一阈值；和特征元素5的第一峰值概率为75%，等于第一阈值。综上所述，将特征元素1、特征元素2、特征元素4和特征元素5确定为第一特征元素的集合中的特征元素；将特征元素3确定为第二特征元素的集合中的特征元素。
在一个情形中，该第一概率估计结果的形式为高斯分布，则该第一概率估计结果还包括第一概率方差值。在这种情况下，对于S503的一个可选的实施方式为：基于第一阈值和每个特征元素的第一概率方差值，从多个特征元素中确定第一特征元素的集合和第二特征元素的集合。具体地，该第一阈值可以为前述S402的方式二中：第三特征元素的集合中的最小第一概率方差值；进一步地，针对该待解码特征图中的每个特征元素而言，若该特征元素的第一概率方差值小于该第一阈值，则将该特征元素确定为第二特征元素（即第二特征元素的集合中的特征元素）；若该特征元素的第一概率方差值大于等于该第一阈值，则将该特征元素确定为第一特征元素（即第一特征元素的集合中的特征元素）。
示例性地，该第一阈值为0.5，待解码特征图的多个特征元素为：特征元素1、特征元素2、特征元素3、特征元素4和特征元素5。其中，特征元素1的第一概率方差值为0.6，大于第一阈值；特征元素2的第一概率方差值为0.7，大于第一阈值；特征元素3的第一概率方差值为0.4，小于第一阈值；特征元素4的第一概率方差值为0.75，大于第一阈值；和特征元素5的第一概率方差值为0.5，等于第一阈值。综上所述，将特征元素1、特征元素2、特征元素4和特征元素5确定为第一特征元素的集合中的特征元素；将特征元素3确定为第二特征元素的集合中的特征元素。
S504、基于第一特征元素的集合和第二特征元素的集合,得到解码特征图。
换言之,根据第一特征元素的集合中每个特征元素的数值和第二特征元素的集合中每个特征元素的第一概率估计结果,得到解码特征图的值。
在一个可能的实现中,对第一特征元素对应第一概率估计结果进行熵解码,得到该第一特征元素(理解为第一特征元素的集合中的特征元素的统称)的数值;第一概率估计结果包括第一峰值概率和该第一峰值概率对应的特征值,进一步地,基于第二特征元素(理解为第二特征元素的集合中的特征元素的统称)的第一峰值概率对应的特征值,得到第二特征元素的数值。即可以理解为,对该第一特征元素的集合中所有特征元素对应第一概率估计结果进行熵解码,得到第一特征元素的集合中所有特征元素的数值;并基于第二特征元素中各个特征元素的第一峰值概率对应的特征值,得到第二特征元素的集合中所有特征元素的数值,不需要对第二特征元素的集合中的任一特征元素进行熵解码。
示例性地，对待解码特征图进行数据解码，即是想要得到各个特征元素的数值。待解码特征图的多个特征元素为特征元素1、特征元素2、特征元素3、特征元素4和特征元素5。其中，特征元素1、特征元素2、特征元素4和特征元素5确定为第一特征元素的集合中的特征元素；将特征元素3确定为第二特征元素的集合中的特征元素。进一步地，将码流和第一特征元素对应的第一概率估计结果作为输入，输入至图1所示的数据解码模块105中，得到特征元素1的数值、特征元素2的数值、特征元素4的数值和特征元素5的数值；并将特征元素3的第一概率估计结果中第一峰值概率对应的特征值，确定为在待解码特征图中特征元素3的数值；进而将特征元素1的数值、特征元素2的数值、特征元素3的数值、特征元素4的数值和特征元素5的数值组合为待解码特征图的值。
需要说明的是,若第一特征元素的集合为空集(即每个特征元素均未进行熵编码),则根据每个特征元素的第一概率估计结果(此处指该第一概率估计结果中第一峰值概率对应的特征值),即可得到解码特征图的值。若第二特征元素的集合为空集(即每个特征元素均进行熵编码),则对每个特征元素对应第一概率估计结果进行熵解码,即可得到解码特征图的值。
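与编码侧对应，S503~S504的解码逻辑可示意如下（entropy_decode为假设的占位回调，代表图1中数据解码模块105根据码流和第一概率估计结果恢复第一特征元素数值的过程；示例数值仅作说明）：

```python
def decode_feature_map(num_elements, peak_probs, peak_values, threshold, entropy_decode):
    # peak_probs:  每个特征元素的第一峰值概率
    # peak_values: 每个特征元素的第一峰值概率对应的特征值
    # entropy_decode(idx): 占位回调，返回对第一特征元素idx熵解码得到的数值
    decoded = [None] * num_elements
    for idx in range(num_elements):
        if peak_probs[idx] <= threshold:
            decoded[idx] = entropy_decode(idx)   # 第一特征元素：熵解码
        else:
            decoded[idx] = peak_values[idx]      # 第二特征元素：直接取峰值概率对应的特征值
    return decoded

# 沿用上文示例：特征元素3（下标2）未被编码，其数值用峰值概率对应的特征值替代
peak_probs  = [0.70, 0.65, 0.80, 0.60, 0.75]
peak_values = [4.0, -1.0, 0.0, 2.0, 6.0]
fake_decoded = {0: 5, 1: -1, 3: 2, 4: 7}         # 假设的熵解码结果
print(decode_feature_map(5, peak_probs, peak_values, 0.75, fake_decoded.get))
```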
相比于通过每个特征元素对应概率估计结果中某个固定值对应的概率来确定该特征元素是否需要执行编码，图3所提供的根据特征元素对应概率估计结果的峰值概率来确定该特征元素是否跳过熵编码过程的方法，可以提升判断结果（该特征元素是否需要执行熵编码）的可靠性，并且可以显著减少执行熵编码的元素个数，降低熵编码复杂度。并且，图5所提供的采用未进行熵编码的特征元素（即第二特征元素）的第一峰值概率对应的特征值作为第二特征元素的数值来构成待解码特征图的值的方式，其可靠性优于传统的采用固定值替换第二特征元素的数值的方式，进一步提升了数据解码的准确性，提升了数据编解码方法的性能。
编码侧:请参见图6a,图6a为本申请实施例提供的另一种特征图编码方法的流程示意图,该特征图编码方法的流程包括S601~S607。其中:
S601、获取第一待编码特征图,该第一待编码特征图包括多个特征元素。
其中,S601的具体实现方式可参见前述S301的具体实现方式的描述,在此不再进行过多赘述。
S602、基于该第一待编码特征图,获得该第一待编码特征图的边信息和每个特征元素的第二上下文信息。
其中,获取第一待编码特征图边信息的具体实现方式可参见前述S302的具体实现方式的描述,在此不再进行过多赘述。
获取第二上下文信息的方式可以是，通过网络模块从第一待编码特征图中获取该第二上下文信息，其中该网络模块可以是RNN或RNN的网络变形。第二上下文信息可以理解为该特征元素在该第一待编码特征图中预设区域范围内的特征元素（或特征元素的数值）。
S603、基于边信息和第二上下文信息,得到每个特征元素的第二概率估计结果。
如图6b所示,将边信息和第二上下文信息作为图1中概率估计模块103的输入,该概率估计模块103的输出即为各个特征元素的第二概率估计结果。该概率估计模块103的具体描述可参见前述S303中所示。该第二概率估计结果的形式包括高斯分布或其他形式概率分布(包括但不限于拉普拉斯分布或混合高斯分布等)。一个特征元素的第二概率结果的示意图同前述图2b所示的第一概率结果示意图,在此不再进行详细讲解。
S604、基于每个特征元素的第二概率结果,确定第一阈值。
在一个可能的实现中,基于第一待编码特征图中每个特征元素的第二概率估计结果,从第一待编码特征图中的多个特征元素中确定第三特征元素的集合。进一步地,基于该第三特征元素的集合中各个特征元素的第二概率估计结果,确定第一阈值。具体地,根据第三特征元素的集合中各个特征元素的第二概率估计结果,确定第一阈值的具体方式可参见图4c所示的根据第三特征元素的集合中各个特征元素的第一概率估计结果确定第一阈值的具体方式,在此不再进行赘述。
S605、针对该第一待编码特征图中每个特征元素而言,基于边信息和该特征元素的第一上下文信息,确定该特征元素的第一概率估计结果。
其中,该第一上下文信息为该特征元素在第二待编码特征图中预设区域范围内的特征元素,该第二待编码特征图的值由第一特征元素的数值和第二特征元素的第一峰值概率对应的特征值组成,该第二特征元素为第一待编码特征图中除第一特征元素之外的特征元素。需要理解的是,第一待编码特征图包括的特征元素的数量和第二待编码特征图包括的特征元素的数量相同,第一待编码特征图的值与第二待编码特征图的值不同,第二待编码特征图可以理解为对第一待编码特征图进行解码后的特征图(即本申请中的待解码特征图)。第一上下文信 息描述了第二待编码特征图中各个特征元素之间的关系,第二上下文信息描述了第一待编码特征图中各个特征元素之间的关系。
示例性地,第一待编码特征图中的特征元素有:特征元素1、特征元素2、特征元素3、……、特征元素m。基于S604的具体描述方式得到第一阈值之后,对特征元素1、特征元素2、特征元素3、特征元素4和特征元素5进行交替概率估计和熵编码。即可以理解为首先对特征元素1进行概率估计和熵编码,由于特征元素1是第一个执行熵编码的特征元素,因此该特征元素1的第一上下文信息为空,此时仅需要根据边信息对特征元素1进行概率估计,得到特征元素1对应的第一概率估计结果;进一步地根据该第一概率估计结果和第一阈值,确定特征元素1是否为第一特征元素,仅在特征元素1为第一特征元素时对特征元素1进行熵编码;确定特征元素1在第二待编码特征图中的数值。接下来,针对特征元素2而言,根据边信息和第一上下文信息(此时可以理解为第一特征元素在第二待编码特征图中的数值),估计特征元素2的第一概率估计结果,进一步地,根据该第一概率估计结果和第一阈值,确定特征元素2是否为第一特征元素,仅在特征元素2为第一特征元素时对特征元素2进行熵编码;确定特征元素2在第二待编码特征图中的数值。然后针对特征元素3而言,根据边信息和第一上下文信息(此时可以理解为第一特征元素在第二待编码特征图中的数值,和第二特征元素在第二待编码特征图中的数值),估计特征元素3的第一概率估计结果,进一步地,根据该第一概率估计结果和第一阈值,确定特征元素3是否为第一特征元素,仅在特征元素3为第一特征元素时对特征元素3进行熵编码;确定特征元素3在第二待编码特征图中的数值。依次类推,直到对第一待编码特征图中的所有特征元素概率估计完毕为止。
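上述逐元素交替进行概率估计、判定与熵编码的流程可用如下示意性片段概括（estimate_prob与entropy_encode均为假设的占位函数，分别示意图1中的概率估计模块103与数据编码模块104，并非对本申请实现方式的限定）：

```python
def encode_with_context(values, side_info, threshold, estimate_prob, entropy_encode):
    # values: 第一待编码特征图中各特征元素的数值
    # estimate_prob(side_info, context) -> (第一峰值概率, 峰值概率对应的特征值)
    context = []        # 第一上下文信息：已得到的第二待编码特征图中的数值
    reconstructed = []  # 第二待编码特征图的值
    for v in values:
        peak_prob, peak_value = estimate_prob(side_info, context)
        if peak_prob <= threshold:
            entropy_encode(v)                 # 第一特征元素：熵编码其数值
            reconstructed.append(v)
        else:
            reconstructed.append(peak_value)  # 第二特征元素：以峰值概率对应的特征值代替
        context = reconstructed               # 更新上下文，供下一个特征元素的概率估计使用
    return reconstructed

# 假设的占位实现：概率估计恒返回(0.8, 0.0)，熵编码仅打印；阈值0.75时两个元素均被跳过
out = encode_with_context([1, 2], side_info=None, threshold=0.75,
                          estimate_prob=lambda s, c: (0.8, 0.0),
                          entropy_encode=print)
print(out)  # [0.0, 0.0]
```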
S606、根据该特征元素的第一概率估计结果和第一阈值,确定该特征元素是否为第一特征元素。
S607、仅在该特征元素为第一特征元素的情况下,对第一特征元素进行熵编码。
其中,S606-S607的具体实现方式可参见前述S305~S306的具体实现方式的相关描述,在此不再进行赘述。
需要理解的是,针对特征图中任一特征元素而言,将用于判断该特征元素是否为第一特征元素(即需要熵编码的特征元素)的概率估计结果记为该特征元素的第一概率估计结果,将用于确定第一阈值的概率结果记为第二概率估计结果。在图6a所示的特征图编码方法中,特征元素的第一概率估计结果不同于该特征元素的第二概率估计结果。但在图3所示的特征图编码方法中,由于未引入上下文特征进行概率估计,故特征元素的第一概率估计结果和该特征元素的第二概率估计结果相同。
解码侧:请参见图7a,图7a为本申请实施例提供的一种特征图解码方法的流程示意图,该特征图解码方法的流程包括S701~S706。其中:
S701、获取待解码特征图的码流,该待解码特征图包括多个特征元素。
其中,S701的具体实现方式可参见前述S501的具体实现方式的描述,在此不再进行赘述。
S702、基于该待解码特征图的码流,获得该待解码特征图对应的边信息。
在一个可能的实现中,基于待解码特征图的码流,获得该待解码特征图对应的边信息;基于该边信息,获得每个特征元素对应的第一概率估计结果。
具体地,该待解码特征图的码流中包括边信息的熵编码结果,因此可以对待解码特征图的码流进行熵解码,得到的熵解码结果中包括该待解码特征图对应的边信息。
S703、针对每个特征元素而言,基于该边信息和第一上下文信息估计该特征元素的第一 概率估计结果。
其中,该第一上下文信息为该特征元素在待解码特征图(即S605的第二待编码特征图)中预设区域范围内的特征元素。需要知晓的是,在此种情况下,对待解码特征图中的特征元素依次交替进行概率估计和熵解码。
示例性地,待解码特征图的特征元素有特征元素1、特征元素2、特征元素3、……、特征元素m。首先对特征元素1进行概率估计和熵解码,由于特征元素1是第一个执行熵解码的特征元素,因此该特征元素1的第一上下文信息为空,此时仅需要根据边信息对特征元素1进行概率估计,得到特征元素1对应的第一概率估计结果;进一步地,确定(或判断)该特征元素1为第一特征元素或第二特征元素,根据判断结果确定在待解码特征图中该特征元素1的数值。接下来,针对特征元素2而言,根据边信息和第一上下文信息(此时可以理解为第一特征元素在待解码特征图中的数值),估计特征元素2的第一概率估计结果;进一步地,确定(或判断)特征元素2为第一特征元素还是第二特征元素;根据判断结果,确定特征元素2在待解码特征图中的数值。然后针对特征元素3而言,根据边信息和第一上下文信息(此时可以理解为第一特征元素在待解码特征图中的数值,和第二特征元素在待解码特征图中的数值),估计特征元素3的第一概率估计结果;进一步地,确定特征元素3为第一特征元素还是第二特征元素;根据判断结果,确定特征元素3在待解码特征图中的数值。依次类推,直到对所有特征元素概率估计完毕为止。
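上述逐元素交替进行概率估计、判定与解码的流程可用如下示意性片段概括（estimate_prob与entropy_decode同样为假设的占位函数，仅用于说明逐元素交替进行概率估计与解码的顺序，并非对数据解码模块105实现的限定）：

```python
def decode_with_context(num_elements, side_info, threshold, estimate_prob, entropy_decode):
    # estimate_prob(side_info, context) -> (第一峰值概率, 峰值概率对应的特征值)
    # entropy_decode() -> 从码流中熵解码出下一个第一特征元素的数值（占位）
    decoded = []
    for _ in range(num_elements):
        peak_prob, peak_value = estimate_prob(side_info, decoded)
        if peak_prob <= threshold:
            decoded.append(entropy_decode())   # 第一特征元素：熵解码
        else:
            decoded.append(peak_value)         # 第二特征元素：取峰值概率对应的特征值
    return decoded

# 假设前两个元素为第一特征元素、第三个为第二特征元素
vals = iter([5, 2])
print(decode_with_context(3, None, 0.75,
                          lambda s, c: (0.70 if len(c) < 2 else 0.90, 0.0),
                          lambda: next(vals)))   # [5, 2, 0.0]
```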
S704、根据该特征元素的第一概率估计结果和第一阈值,确定该特征元素为第一特征元素或第二特征元素。
其中,S704的具体实现方式可参见S503的具体实现方式的描述,在此不再进行赘述。
S705、在该特征元素为第一特征元素的情况下,基于该第一特征元素的第一概率估计结果和待解码特征图的码流进行熵解码,得到第一特征元素的数值。
若针对该特征元素的判断结果为:该特征元素为第一特征元素,则基于该第一特征元素的第一概率估计结果,对第一特征元素进行熵解码,得到第一特征元素在解码特征图的数值。该第一特征元素在解码特征图的数值与该第一特征元素的待编码特征图的数值相同。
S706、在该特征元素为第二特征元素的情况下,基于该第二特征元素的第一概率估计结果,得到第二特征元素的数值。
若针对该特征元素的判断结果为:该特征元素为第二特征元素,则将该第二特征元素的第一峰值概率对应的特征值,确定为第二特征元素的数值。即该第二特征元素不需要进行熵解码,且该第二特征元素在解码特征图的数值与该第二特征元素的待编码特征图的数值可以相同或不同。基于所有第二特征元素的数值和所述有第一特征元素的数值共同,确定解码特征图的值,得到解码特征图。
相比于图3所提供的特征图编码方法,图6a所提供的特征图编码方法结合了上下文信息进行概率估计,提升了每个特征元素对应概率估计结果的准确性,从而增加了跳过编码过程的特征元素的数量,进一步提升了数据编码效率。相比于图5所提供的特征图解码方法,图7a所提供的特征图解码方法集合上下文信息进行概率估计,提升了每个特征元素对应概率估计结果的准确性,从而提升了未进行熵编码的特征元素(即第二特征元素)在待解码特征图中的可靠性,提升了数据解码的性能。
申请人将无跳过编码（即在对待编码特征图进行熵编码时，对待编码特征图中的所有特征元素执行熵编码过程）的特征图编解码方法记为基线方法，将图6a和图7a所提供的特征图编解码方法（记为动态峰值跳过的特征图编解码方法），与通过每个特征元素对应概率估计结果中某个固定值对应的概率跳过特征元素进行特征图编码的方法（记为固定峰值跳过的特征图编解码方法）进行对比实验，其对比实验结果请参见表1所示。其中，固定峰值跳过的特征图解码方法相比于基线方法，其得到相同图像质量的数据量减少了0.11%，而本方案得到相同图像质量的数据量减少了1%。
表1
方法 节省的数据量
基线方法 0%
固定峰值跳过的特征图解码方法 -0.11%
动态峰值跳过的特征图解码方法 -1%
可见,在保证解码图像质量的情况下,采用本申请所提供的技术方法,能节省较多的数据量,提升数据压缩的性能(包括但不限于压缩率)。
申请人还将图6a和图7a所提供的特征图编解码方法与固定峰值跳过的特征图编解码方法进行对比实验,其对比实验结果图如图7b和图7c所示。在图7b中,纵轴可以理解为重建图像的质量,横轴为图像压缩率,通常随着图像压缩率的增大,重建图像的质量将会变得更好。从图7b中可见,动态峰值跳过的特征图编解码方法(即图7b中记为动态峰值)与固定峰值跳过的特征图编码方法(即图7b中记为固定峰值)的曲线几乎重合,即表明在相同的重建图像质量(即纵坐标的数值相同)的情况下,动态峰值跳过的特征图编解码方法(即图7b中记为动态峰值)略优于固定峰值跳过的特征图编码方法(即图7b中记为固定峰值)。在图7c中,纵轴为跳过特征元素的占比,横轴为图像压缩率,通常随着图像压缩率的增大,可跳过特征元素的占比将逐渐变低。从图7c中可见,动态峰值跳过的特征图编解码方法(即图7c中记为动态峰值)的曲线,在固定峰值跳过的特征图编码方法(即图7c中记为固定峰值)曲线的上方,即表明在相同的图像压缩率(即横坐标的数值相同)的情况下,动态峰值跳过的特征图编解码方法(即图7c中记为动态峰值)可跳过编码过程的特征元素,多于固定峰值跳过的特征图编码方法(即图7c中记为固定峰值)。
请参见图8，图8为本申请提供的一个特征图编码装置的结构示意图。该特征图编码装置可以为图1的概率估计模块103和数据编码模块104的集成。该装置包括：
获取模块80,用于获取第一待编码特征图,所述第一待编码特征图包括多个特征元素;编码模块81,用于基于所述第一待编码特征图,确定所述多个特征元素中每个特征元素的第一概率估计结果,所述第一概率估计结果包括第一峰值概率;针对所述第一待编码特征图中的每个特征元素,基于所述特征元素的第一峰值概率,确定所述特征元素是否为第一特征元素;仅在所述特征元素为第一特征元素的情况下,对所述第一特征元素进行熵编码。
在一个可能的实现中,第一概率估计结果为高斯分布,第一峰值概率为该高斯分布的均值概率;
或者,第一概率估计结果为混合高斯分布,混合高斯分布由多个高斯分布组成,第一峰值概率为各个高斯分布的均值概率中的最大值;或者,第一峰值概率由各个高斯分布的均值概率和各个高斯分布在该混合高斯分布中的权重计算得到。
在一个可能的实现中,编码模块81,具体用于基于第一阈值和所述特征元素的第一峰值概率,确定所述特征元素是否为第一特征元素。
在一个可能的实现中,编码模块81,还用于基于所述第一待编码特征图,确定所述多个特征元素中每个特征元素的第二概率估计结果,所述第二概率估计结果包括第二峰值概率; 基于所述每个特征元素的第二概率估计结果,从所述多个特征元素中确定第三特征元素的集合;基于所述第三特征元素的集合中各个特征元素的第二峰值概率,确定第一阈值;对所述第一阈值进行熵编码。
在一个可能的实现中,所述第一阈值为所述第三特征元素集合中各个特征元素对应的第二峰值概率的最大第二峰值概率。
在一个可能的实现中,所述第一特征元素的第一峰值概率小于或等于所述第一阈值。
在一个可能的实现中,所述第二概率估计结果为高斯分布,所述第二概率估计结果还包括第二概率方差值,所述第一阈值为所述第三特征元素集合中各个特征元素对应的第二概率方差值的最小第二概率方差值。
在一个可能的实现中,所述第一概率估计结果为高斯分布,所述第一概率估计结果还包括第一概率方差值,所述第一特征元素的第一概率方差值大于或等于所述第一阈值。
在一个可能的实现中,所述第二概率估计结果还包括所述第二峰值概率对应的特征值,编码模块81,具体用于基于预设误差、所述每个特征元素的数值和所述每个特征元素的第二峰值概率对应的特征值,从所述多个特征元素中确定第三特征元素的集合。
在一个可能的实现中,所述第三特征元素的集合中的特征元素具有以下特征:
|ŷ(x,y,i)-p(x,y,i)|>TH_2
其中，ŷ(x,y,i)为所述特征元素的数值，p(x,y,i)为所述特征元素的第二峰值概率对应的特征值，TH_2为所述预设误差。
在一个可能的实现中,所述第一概率估计结果和所述第二概率估计结果相同,编码模块81,具体用于基于第一待编码特征图,获取所述第一待编码特征图的边信息;对所述边信息进行概率估计,得到所述每个特征元素的第一概率估计结果。
在一个可能的实现中,所述第一概率估计结果和所述第二概率估计结果不同,编码模块81,还用于基于第一待编码特征图,获取所述第一待编码特征图的边信息和所述每个特征元素的第二上下文信息,所述第二上下文信息为所述特征元素在所述第一待编码特征图中预设区域范围内的特征元素;基于所述边信息和第二上下文信息,得到所述每个特征元素的第二概率估计结果。
在一个可能的实现中,编码模块81,具体用于基于第一待编码特征图,获取所述第一待编码特征图的边信息;针对所述第一待编码特征图中的任一特征元素,基于第一上下文信息和所述边信息,确定所述特征元素的第一概率估计结果;其中,所述第一概率估计结果还包括所述第一概率峰值对应的特征值,所述第一上下文信息为所述特征元素在第二待编码特征图中预设区域范围内的特征元素,所述第二待编码特征图的值由所述第一特征元素的数值和第二特征元素的第一峰值概率对应的特征值组成,所述第二特征元素为所述第一待编码特征图中除所述第一特征元素之外的特征元素。
在一个可能的实现中,编码模块81,还用于将所有所述第一特征元素的熵编码结果,写入编码码流。
请参见图9,图9为本申请提供的一个特征图解码装置的结构示意图。该特征图解码装置可以为图1的概率估计模块103和数据解码模块105的集成。该特征图解码装置,包括:
获取模块90,用于获取待解码特征图的码流,所述待解码特征图包括多个特征元素;以及用于基于所述待解码特征图的码流,获得所述多个特征元素中每个特征元素对应的第一概率估计结果,所述第一概率估计结果包括第一峰值概率;
解码模块91,用于基于第一阈值和所述每个特征元素对应的第一峰值概率,从所述多个 特征元素中确定第一特征元素的集合和第二特征元素的集合;基于所述第一特征元素的集合和所述第二特征元素的集合,得到所述待解码特征图。
在一个可能的实现中,第一概率估计结果为高斯分布,第一峰值概率为该高斯分布的均值概率;
或者,第一概率估计结果为混合高斯分布,该混合高斯分布由多个高斯分布组成,第一峰值概率为各个高斯分布的均值概率中的最大值;或者,第一峰值概率由各个高斯分布的均值概率和各个高斯分布在该混合高斯分布中的权重计算得到。
在一个可能的实现中,所述待解码特征图的值由所述第一特征元素的集合中所有第一特征元素的数值和所述第二特征元素的集合中所有第二特征元素的数值组成。
在一个可能的实现中,该第一特征元素的集合为空集,或者,该第二特征元素的集合为空集。
在一个可能的实现中,所述第一概率估计结果还包括所述第一峰值概率对应的特征值,解码模块91,还用于基于所述第一特征元素对应的第一概率估计结果,对所述第一特征元素进行熵解码,得到所述第一特征元素的数值;基于所述第二特征元素的第一峰值概率对应的特征值,得到所述第二特征元素的数值。
在一个可能的实现中,所述解码模块91,还用于基于所述待解码特征图的码流,得到所述第一阈值。
在一个可能的实现中,所述第一特征元素的第一峰值概率小于或等于所述第一阈值,所述第二特征元素的第一峰值概率大于所述第一阈值。
在一个可能的实现中,所述第一概率估计结果为高斯分布,所述第一概率估计结果还包括第一概率方差值,所述第一特征元素的第一概率方差值大于或等于所述第一阈值,所述第二特征元素的第一概率方差值小于所述第一阈值。
在一个可能的实现中,所述获取模块90,还用于基于所述待解码特征图的码流,获得所述待解码特征图对应的边信息;基于所述边信息,获得所述每个特征元素对应的第一概率估计结果。
在一个可能的实现中，所述解码模块91，还用于基于所述待解码特征图的码流，获得所述待解码特征图对应的边信息；针对所述待解码特征图中每个特征元素，基于所述边信息和第一上下文信息估计所述每个特征元素的第一概率估计结果；其中，所述第一上下文信息为所述特征元素在所述待解码特征图中预设区域范围内的特征元素。
图10是本申请实施例提供的一种特征图编码装置或特征图解码装置的硬件结构示意图。图10所示的装置(该装置具体可以是一种计算机设备1000)包括存储器1001、处理器1002、通信接口1003以及总线1004。其中,存储器1001、处理器1002、通信接口1003通过总线1004实现彼此之间的通信连接。
存储器1001可以是只读存储器(Read Only Memory,ROM),静态存储设备,动态存储设备或者随机存取存储器(Random Access Memory,RAM)。存储器1001可以存储程序,当存储器1001中存储的程序被处理器1002执行时,执行本申请实施例提供的特征图编码方法的各个步骤,或执行本申请实施例提供的特征图解码方法的各个步骤。
处理器1002可以采用通用的中央处理器(Central Processing Unit,CPU),微处理器,应用专用集成电路(Application Specific Integrated Circuit,ASIC),图形处理器(graphics processing unit,GPU)或者一个或多个集成电路,用于执行相关程序,以实现本申请实施例 的特征图编码装置或特征图解码装置中的单元所需执行的功能,或者执行本申请方法实施例的特征图编码方法的各个步骤,或执行本申请实施例提供的特征图解码方法的各个步骤。
处理器1002还可以是一种集成电路芯片,具有信号的处理能力。在实现过程中,本申请的特征图编码方法的各个步骤或特征图解码方法的各个步骤可以通过处理器1002中的硬件的集成逻辑电路或者软件形式的指令完成。上述的处理器1002还可以是通用处理器、数字信号处理器(Digital Signal Processing,DSP)、专用集成电路(ASIC)、现成可编程门阵列(Field Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器1001,处理器1002读取存储器1001中的信息,结合其硬件完成本申请实施例的特征图编码装置或特征图解码装置中包括的单元所需执行的功能,或者执行本申请方法实施例的特征图编码方法或特征图解码方法。
通信接口1003使用例如但不限于收发器一类的收发装置,来实现计算机设备1000与其他设备或通信网络之间的通信。
总线1004可包括在计算机设备1000各个部件(例如,存储器1001、处理器1002、通信接口1003)之间传送信息的通路。
应理解,图8中特征图编码装置中的获取模块80相当于计算机设备1000中的通信接口1003,编码模块81相当于计算机设备1000中的处理器1002。或者,图9中特征图解码装置中获取模块90相当于计算机设备1000中的通信接口1003、解码模块91相当于计算机设备1000中的处理器1002。
需要说明的是,本申请实施例中所描述的计算机设备1000中各功能单元的功能可参见上述方法实施例中相关步骤的描述,此处不再赘述。
本申请实施例还提供一种计算机可读存储介质,其上存储有计算机程序,该程序被处理器执行时,可以实现上述方法实施例中记载的任意一种的部分或全部步骤,以及实现上述图10所描述的任意一个功能模块的功能。
本申请实施例还提供了一种计算机程序产品,当其在计算机或处理器上运行时,使得计算机或处理器执行上述任一个方法中的一个或多个步骤。上述所涉及的设备的各组成模块如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在上述计算机可读取存储介质中。
在上述实施例中,对各个实施例的描述各有侧重,某个实施例中没有详述的部分,可以参见其它实施例的相关描述。应理解,在本申请的各种实施例中,上述各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。
本领域技术人员能够领会，结合本文公开描述的各种说明性逻辑框、模块和算法步骤所描述的功能可以硬件、软件、固件或其任何组合来实施。如果以软件来实施，那么各种说明性逻辑框、模块、和步骤描述的功能可作为一或多个指令或代码在计算机可读媒体上存储或传输，且由基于硬件的处理单元执行。计算机可读媒体可包含计算机可读存储媒体，其对应于有形媒体，例如数据存储媒体，或包括任何促进将计算机程序从一处传送到另一处的媒体（例如，根据通信协议）的通信媒体。以此方式，计算机可读媒体大体上可对应于（1）非暂时性的有形计算机可读存储媒体，或（2）通信媒体，例如信号或载波。数据存储媒体可为可由一或多个计算机或一或多个处理器存取以检索用于实施本申请中描述的技术的指令、代码和/或数据结构的任何可用媒体。计算机程序产品可包含计算机可读媒体。
作为实例而非限制,此类计算机可读存储媒体可包括RAM、ROM、EEPROM、CD-ROM或其它光盘存储装置、磁盘存储装置或其它磁性存储装置、快闪存储器或可用来存储指令或数据结构的形式的所要程序代码并且可由计算机存取的任何其它媒体。并且,任何连接被恰当地称作计算机可读媒体。举例来说,如果使用同轴缆线、光纤缆线、双绞线、数字订户线(DSL)或例如红外线、无线电和微波等无线技术从网站、服务器或其它远程源传输指令,那么同轴缆线、光纤缆线、双绞线、DSL或例如红外线、无线电和微波等无线技术包含在媒体的定义中。但是,应理解,所述计算机可读存储媒体和数据存储媒体并不包括连接、载波、信号或其它暂时媒体,而是实际上针对于非暂时性有形存储媒体。如本文中所使用,磁盘和光盘包含压缩光盘(CD)、激光光盘、光学光盘、数字多功能光盘(DVD)和蓝光光盘,其中磁盘通常以磁性方式再现数据,而光盘利用激光以光学方式再现数据。以上各项的组合也应包含在计算机可读媒体的范围内。
可通过例如一或多个数字信号处理器（DSP）、通用微处理器、专用集成电路（ASIC）、现场可编程逻辑阵列（FPGA）或其它等效集成或离散逻辑电路等一或多个处理器来执行指令。因此，如本文中所使用的术语“处理器”可指前述结构或适合于实施本文中所描述的技术的任一其它结构中的任一者。另外，在一些方面中，本文中所描述的各种说明性逻辑框、模块、和步骤所描述的功能可以提供于经配置以用于编码和解码的专用硬件和/或软件模块内，或者并入在组合编解码器中。而且，所述技术可完全实施于一或多个电路或逻辑元件中。
本申请的技术可在各种各样的装置或设备中实施，包含无线手持机、集成电路（IC）或一组IC（例如，芯片组）。本申请中描述各种组件、模块或单元是为了强调用于执行所揭示的技术的装置的功能方面，但未必需要由不同硬件单元实现。实际上，如上文所描述，各种单元可结合合适的软件和/或固件组合在编码解码器硬件单元中，或者通过互操作硬件单元（包含如上文所描述的一或多个处理器）来提供。
以上所述,仅为本申请示例性的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到的变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应该以权利要求的保护范围为准。

Claims (50)

  1. 一种特征图解码方法,其特征在于,所述方法包括:
    获取待解码特征图的码流,所述待解码特征图包括多个特征元素;
    基于所述待解码特征图的码流,获得所述多个特征元素中每个特征元素对应的第一概率估计结果,所述第一概率估计结果包括第一峰值概率;
    基于第一阈值和所述每个特征元素对应的第一峰值概率,从所述多个特征元素中确定第一特征元素的集合和第二特征元素的集合;
    基于所述第一特征元素的集合和所述第二特征元素的集合,得到解码特征图。
  2. 根据权利要求1所述方法,其特征在于,所述第一概率估计结果为高斯分布,所述第一峰值概率为所述高斯分布的均值概率;
    或者,所述第一概率估计结果为混合高斯分布,所述混合高斯分布由多个高斯分布组成,所述第一峰值概率为各个高斯分布的均值概率中的最大值;或者,所述第一峰值概率由各个高斯分布的均值概率和所述各个高斯分布在所述混合高斯分布中的权重计算得到。
  3. 根据权利要求1或2所述方法,其特征在于,所述解码特征图的值由所述第一特征元素的集合中所有第一特征元素的数值和所述第二特征元素的集合中所有第二特征元素的数值组成。
  4. 根据权利要求3所述方法,其特征在于,所述第一特征元素的集合为空集,或者,所述第二特征元素的集合为空集。
  5. 根据权利要求3或4所述方法,其特征在于,所述第一概率估计结果还包括所述第一峰值概率对应的特征值,所述方法还包括:
    基于所述第一特征元素对应的第一概率估计结果,对所述第一特征元素进行熵解码,得到所述第一特征元素的数值;
    基于所述第二特征元素的第一峰值概率对应的特征值,得到所述第二特征元素的数值。
  6. 根据权利要求1-5中任一项所述方法,其特征在于,所述基于第一阈值和所述每个特征元素对应的第一峰值概率,从所述多个特征元素中确定第一特征元素的集合和第二特征元素的集合之前,所述方法还包括:
    基于所述待解码特征图的码流,得到所述第一阈值。
  7. 根据权利要求1-6中任一项所述方法,其特征在于,所述第一特征元素的第一峰值概率小于或等于所述第一阈值,所述第二特征元素的第一峰值概率大于所述第一阈值。
  8. 根据权利要求1-7中任一项所述方法,其特征在于,所述基于所述待解码特征图的码流,获得所述多个特征元素中每个特征元素对应的第一概率估计结果,包括:
    基于所述待解码特征图的码流,获得所述待解码特征图对应的边信息;
    基于所述边信息,获得所述每个特征元素对应的第一概率估计结果。
  9. 根据权利要求1-7中任一项所述方法,其特征在于,所述基于所述待解码特征图的码流,获得所述多个特征元素中每个特征元素对应的第一概率估计结果,包括:
    基于所述待解码特征图的码流,获得所述待解码特征图对应的边信息;
    针对所述待解码特征图中每个特征元素，基于所述边信息和第一上下文信息估计所述每个特征元素的第一概率估计结果；其中，所述第一上下文信息为所述特征元素在所述待解码特征图中预设区域范围内的特征元素。
  10. 一种特征图编码方法,其特征在于,所述方法包括:
    获取第一待编码特征图,所述第一待编码特征图包括多个特征元素;
    基于所述第一待编码特征图,确定所述多个特征元素中每个特征元素的第一概率估计结果,所述第一概率估计结果包括第一峰值概率;
    针对所述第一待编码特征图中的每个特征元素,基于所述特征元素的第一峰值概率,确定所述特征元素是否为第一特征元素;
    仅在所述特征元素为第一特征元素的情况下,对所述第一特征元素进行熵编码。
  11. 根据权利要求10所述方法,其特征在于,所述第一概率估计结果为高斯分布,所述第一峰值概率为所述高斯分布的均值概率;
    或者,所述第一概率估计结果为混合高斯分布,所述混合高斯分布由多个高斯分布组成,所述第一峰值概率为各个高斯分布的均值概率中的最大值;或者,所述第一峰值概率由各个高斯分布的均值概率和所述各个高斯分布在所述混合高斯分布中的权重计算得到。
  12. 根据权利要求10或11所述方法,其特征在于,所述针对所述第一待编码特征图中的每个特征元素,基于所述特征元素的第一峰值概率,确定所述特征元素是否为第一特征元素,包括:
    基于第一阈值和所述特征元素的第一峰值概率,确定所述特征元素是否为第一特征元素。
  13. 根据权利要求12所述方法,其特征在于,所述方法还包括:
    基于所述第一待编码特征图,确定所述多个特征元素中每个特征元素的第二概率估计结果,所述第二概率估计结果包括第二峰值概率;
    基于所述每个特征元素的第二概率估计结果,从所述多个特征元素中确定第三特征元素的集合;
    基于所述第三特征元素的集合中各个特征元素的第二峰值概率,确定第一阈值;
    对所述第一阈值进行熵编码。
  14. 根据权利要求13所述方法,其特征在于,所述第一阈值为所述第三特征元素集合中各个特征元素对应的第二峰值概率的最大第二峰值概率。
  15. 根据权利要求14所述方法,其特征在于,所述第一特征元素的第一峰值概率小于或等于所述第一阈值。
  16. 根据权利要求13-15中任一项所述方法,所述第二概率估计结果还包括所述第二峰值 概率对应的特征值,所述基于所述每个特征元素的第二概率估计结果,从所述多个特征元素中确定第三特征元素集合,包括:
    基于预设误差、所述每个特征元素的数值和所述每个特征元素的第二峰值概率对应的特征值,从所述多个特征元素中确定第三特征元素的集合。
  17. 根据权利要求16所述方法,其特征在于,所述第三特征元素的集合中的特征元素具有以下特征:
    |ŷ(x,y,i)-p(x,y,i)|>TH_2
    其中，ŷ(x,y,i)为所述特征元素的数值，p(x,y,i)为所述特征元素的第二峰值概率对应的特征值，TH_2为所述预设误差。
  18. 根据权利要求13-17中任一项所述方法,其特征在于,所述第一概率估计结果和所述第二概率估计结果相同,所述基于所述第一待编码特征图,确定所述多个特征元素中每个特征元素的第一概率估计结果,包括:
    基于所述第一待编码特征图,获取所述第一待编码特征图的边信息;
    对所述边信息进行概率估计,得到所述每个特征元素的第一概率估计结果。
  19. 根据权利要求13-17中任一项所述方法,其特征在于,所述第一概率估计结果和所述第二概率估计结果不同,所述基于所述第一待编码特征图,确定所述多个特征元素中每个特征元素的第二概率估计结果,包括:
    基于所述第一待编码特征图,获取所述第一待编码特征图的边信息和所述每个特征元素的第二上下文信息,所述第二上下文信息为所述特征元素在所述第一待编码特征图中预设区域范围内的特征元素;
    基于所述边信息和所述第二上下文信息,得到所述每个特征元素的第二概率估计结果。
  20. 根据权利要求19所述方法,其特征在于,所述基于所述第一待编码特征图,确定所述多个特征元素中每个特征元素的第一概率估计结果,包括:
    基于第一待编码特征图,获取所述第一待编码特征图的边信息;
    针对所述第一待编码特征图中的任一特征元素,基于第一上下文信息和所述边信息,确定所述特征元素的第一概率估计结果;其中,所述第一概率估计结果还包括所述第一概率峰值对应的特征值,所述第一上下文信息为所述特征元素在第二待编码特征图中预设区域范围内的特征元素,所述第二待编码特征图的值由所述第一特征元素的数值和所述第二特征元素的第一峰值概率对应的特征值组成,所述第二特征元素为所述第一待编码特征图中除所述第一特征元素之外的特征元素。
  21. 根据权利要求10-20中任一项所述方法,其特征在于,所述方法还包括:
    将所有所述第一特征元素的熵编码结果,写入编码码流。
  22. 一种特征图解码装置,其特征在于,包括:
    获取模块,用于获取待解码特征图的码流,所述待解码特征图包括多个特征元素;以及用于基于所述待解码特征图的码流,获得所述多个特征元素中每个特征元素对应的第一概率 估计结果,所述第一概率估计结果包括第一峰值概率;
    解码模块,用于基于第一阈值和所述每个特征元素对应的第一峰值概率,从所述多个特征元素中确定第一特征元素的集合和第二特征元素的集合;基于所述第一特征元素的集合和所述第二特征元素的集合,得到所述待解码特征图。
  23. 根据权利要求22所述装置,其特征在于,所述第一概率估计结果为高斯分布,所述第一峰值概率为所述高斯分布的均值概率;
    或者,所述第一概率估计结果为混合高斯分布,所述混合高斯分布由多个高斯分布组成,所述第一峰值概率为各个高斯分布的均值概率中的最大值;或者,所述第一峰值概率由各个高斯分布的均值概率和所述各个高斯分布在所述混合高斯分布中的权重计算得到。
  24. 根据权利要求22或23所述装置,其特征在于,所述待解码特征图的值由所述第一特征元素的集合中所有第一特征元素的数值和所述第二特征元素的集合中所有第二特征元素的数值组成。
  25. 根据权利要求24所述装置,其特征在于,所述第一特征元素的集合为空集,或者,所述第二特征元素的集合为空集。
  26. 根据权利要求24或25所述装置,其特征在于,所述第一概率估计结果还包括所述第一峰值概率对应的特征值,所述解码模块还用于:
    基于所述第一特征元素对应的第一概率估计结果,对所述第一特征元素进行熵解码,得到所述第一特征元素的数值;
    基于所述第二特征元素的第一峰值概率对应的特征值,得到所述第二特征元素的数值。
  27. 根据权利要求22-26中任一项所述装置,其特征在于,所述基于第一阈值和所述每个特征元素对应的第一峰值概率,从所述多个特征元素中确定第一特征元素的集合和第二特征元素的集合之前,所述解码模块还用于:
    基于所述待解码特征图的码流,得到所述第一阈值。
  28. 根据权利要求22-27中任一项所述装置,其特征在于,所述第一特征元素的第一峰值概率小于或等于所述第一阈值,所述第二特征元素的第一峰值概率大于所述第一阈值。
  29. 根据权利要求22-28中任一项所述装置,其特征在于,所述基于所述待解码特征图的码流,获得所述多个特征元素中每个特征元素对应的第一概率估计结果,包括:
    基于所述待解码特征图的码流,获得所述待解码特征图对应的边信息;
    基于所述边信息,获得所述每个特征元素对应的第一概率估计结果。
  30. 根据权利要求22-28中任一项所述装置,其特征在于,所述基于所述待解码特征图的码流,获得所述多个特征元素中每个特征元素对应的第一概率估计结果,包括:
    基于所述待解码特征图的码流,获得所述待解码特征图对应的边信息;
    针对所述待解码特征图中每个特征元素，基于所述边信息和第一上下文信息估计所述每个特征元素的第一概率估计结果；其中，所述第一上下文信息为所述特征元素在所述待解码特征图中预设区域范围内的特征元素。
  31. 一种特征图编码装置,其特征在于,包括:
    获取模块,用于获取第一待编码特征图,所述第一待编码特征图包括多个特征元素;
    编码模块,用于基于所述第一待编码特征图,确定所述多个特征元素中每个特征元素的第一概率估计结果,所述第一概率估计结果包括第一峰值概率;针对所述第一待编码特征图中的每个特征元素,基于所述特征元素的第一峰值概率,确定所述特征元素是否为第一特征元素;仅在所述特征元素为第一特征元素的情况下,对所述第一特征元素进行熵编码。
  32. 根据权利要求31所述装置,其特征在于,所述第一概率估计结果为高斯分布,所述第一峰值概率为所述高斯分布的均值概率;
    或者,所述第一概率估计结果为混合高斯分布,所述混合高斯分布由多个高斯分布组成,所述第一峰值概率为各个高斯分布的均值概率中的最大值;或者,所述第一峰值概率由各个高斯分布的均值概率和所述各个高斯分布在所述混合高斯分布中的权重计算得到。
  33. 根据权利要求31或32所述装置,其特征在于,所述针对所述第一待编码特征图中的每个特征元素,基于所述特征元素的第一峰值概率,确定所述特征元素是否为第一特征元素,包括:
    基于第一阈值和所述特征元素的第一峰值概率,确定所述特征元素是否为第一特征元素。
  34. 根据权利要求33所述装置,其特征在于,所述装置还包括:
    基于所述第一待编码特征图,确定所述多个特征元素中每个特征元素的第二概率估计结果,所述第二概率估计结果包括第二峰值概率;
    基于所述每个特征元素的第二概率估计结果,从所述多个特征元素中确定第三特征元素的集合;
    基于所述第三特征元素的集合中各个特征元素的第二峰值概率,确定第一阈值;
    对所述第一阈值进行熵编码。
  35. 根据权利要求34所述装置,其特征在于,所述第一阈值为所述第三特征元素集合中各个特征元素对应的第二峰值概率的最大第二峰值概率。
  36. 根据权利要求35所述装置,其特征在于,所述第一特征元素的第一峰值概率小于或等于所述第一阈值。
  37. 根据权利要求34-36中任一项所述装置,所述第二概率估计结果还包括所述第二峰值概率对应的特征值,所述基于所述每个特征元素的第二概率估计结果,从所述多个特征元素中确定第三特征元素集合,包括:
    基于预设误差、所述每个特征元素的数值和所述每个特征元素的第二峰值概率对应的特征值,从所述多个特征元素中确定第三特征元素的集合。
  38. 根据权利要求37所述装置,其特征在于,所述第三特征元素的集合中的特征元素具有以下特征:
    |ŷ(x,y,i)-p(x,y,i)|>TH_2
    其中，ŷ(x,y,i)为所述特征元素的数值，p(x,y,i)为所述特征元素的第二峰值概率对应的特征值，TH_2为所述预设误差。
  39. 根据权利要求34-38中任一项所述装置,其特征在于,所述第一概率估计结果和所述第二概率估计结果相同,所述基于所述第一待编码特征图,确定所述多个特征元素中每个特征元素的第一概率估计结果,包括:
    基于第一待编码特征图,获取所述第一待编码特征图的边信息;
    对所述边信息进行概率估计,得到所述每个特征元素的第一概率估计结果。
  40. 根据权利要求34-38中任一项所述装置,其特征在于,所述第一概率估计结果和所述第二概率估计结果不同,所述基于所述第一待编码特征图,确定所述多个特征元素中每个特征元素的第二概率估计结果,包括:
    基于所述第一待编码特征图,获取所述第一待编码特征图的边信息和所述每个特征元素的第二上下文信息,所述第二上下文信息为所述特征元素在所述第一待编码特征图中预设区域范围内的特征元素;
    基于所述边信息和第二上下文信息,得到所述每个特征元素的第二概率估计结果。
  41. 根据权利要求40所述装置,其特征在于,所述基于所述第一待编码特征图,确定所述多个特征元素中每个特征元素的第一概率估计结果,包括:
    基于第一待编码特征图,获取所述第一待编码特征图的边信息;
    针对所述第一待编码特征图中的任一特征元素,基于第一上下文信息和所述边信息,确定所述特征元素的第一概率估计结果;其中,所述第一概率估计结果还包括所述第一概率峰值对应的特征值,所述第一上下文信息为所述特征元素在第二待编码特征图中预设区域范围内的特征元素,所述第二待编码特征图的值由所述第一特征元素的数值和第二特征元素的第一峰值概率对应的特征值组成,所述第二特征元素为所述第一待编码特征图中除所述第一特征元素之外的特征元素。
  42. 根据权利要求31-41中任一项所述装置,其特征在于,所述装置还包括:
    将所有所述第一特征元素的熵编码结果,写入编码码流。
  43. 一种解码器,其特征在于,包括处理电路,用于执行权利要求1-9中任一项所述方法。
  44. 一种编码器,其特征在于,包括处理电路,用于执行权利要求10-21中任一项所述方法。
  45. 一种计算机程序产品，其特征在于，包括程序代码，当其在计算机或处理器上运行时，用于执行权利要求1-9中任一项所述方法，或，用于执行权利要求10-21中任一项所述方法。
  46. 一种非瞬时性计算机可读存储介质,其特征在于,包括根据权利要求21所述的编码方法获得的码流。
  47. 一种解码器,其特征在于,包括:
    一个或多个处理器;
    非瞬时性计算机可读存储介质，耦合到所述处理器并存储由所述处理器执行的程序，其中所述程序在由所述处理器执行时，使得所述解码器执行根据权利要求1-9中任一项所述的方法。
  48. 一种编码器,其特征在于,包括:
    一个或多个处理器;
    非瞬时性计算机可读存储介质，耦合到所述处理器并存储由所述处理器执行的程序，其中所述程序在由所述处理器执行时，使得所述编码器执行根据权利要求10-21任一项所述的方法。
  49. 一种数据处理器,其特征在于,包括处理电路,用于执行根据权利要求1-9中任一项所述方法,或,用于执行权利要求10-21中任一项所述的方法。
  50. 一种非瞬时性计算机可读存储介质，其特征在于，包括程序代码，当其由计算机设备执行时，用于执行根据权利要求1-9中任一项所述方法，或，用于执行权利要求10-21中任一项所述的方法。
PCT/CN2022/117819 2021-09-18 2022-09-08 特征图编解码方法和装置 WO2023040745A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
AU2022348742A AU2022348742A1 (en) 2021-09-18 2022-09-08 Feature map encoding and decoding method and apparatus
CA3232206A CA3232206A1 (en) 2021-09-18 2022-09-08 Feature map encoding and decoding method and apparatus

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202111101920.9 2021-09-18
CN202111101920 2021-09-18
CN202210300566.0 2022-03-25
CN202210300566.0A CN115834888A (zh) 2021-09-18 2022-03-25 特征图编解码方法和装置

Publications (1)

Publication Number Publication Date
WO2023040745A1 true WO2023040745A1 (zh) 2023-03-23

Family

ID=85522485

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/117819 WO2023040745A1 (zh) 2021-09-18 2022-09-08 特征图编解码方法和装置

Country Status (4)

Country Link
CN (1) CN115834888A (zh)
AU (1) AU2022348742A1 (zh)
CA (1) CA3232206A1 (zh)
WO (1) WO2023040745A1 (zh)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019147020A1 (ko) * 2018-01-23 2019-08-01 주식회사 날비컴퍼니 이미지의 품질 향상을 위하여 이미지를 처리하는 방법 및 장치
CN111641832A (zh) * 2019-03-01 2020-09-08 杭州海康威视数字技术股份有限公司 编码方法、解码方法、装置、电子设备及存储介质
CN111818346A (zh) * 2019-04-11 2020-10-23 富士通株式会社 图像编码方法和装置、图像解码方法和装置


Also Published As

Publication number Publication date
CN115834888A (zh) 2023-03-21
AU2022348742A1 (en) 2024-04-04
CA3232206A1 (en) 2023-03-23

Similar Documents

Publication Publication Date Title
WO2018121670A1 (zh) 压缩/解压缩的装置和系统、芯片、电子装置
US11861809B2 (en) Electronic apparatus and image processing method thereof
WO2021155832A1 (zh) 一种图像处理方法以及相关设备
WO2022021938A1 (zh) 图像处理方法与装置、神经网络训练的方法与装置
US20210400277A1 (en) Method and system of video coding with reinforcement learning render-aware bitrate control
US20240105193A1 (en) Feature Data Encoding and Decoding Method and Apparatus
WO2018120019A1 (zh) 用于神经网络数据的压缩/解压缩的装置和系统
WO2023279961A1 (zh) 视频图像的编解码方法及装置
WO2021249290A1 (zh) 环路滤波方法和装置
US20230396810A1 (en) Hierarchical audio/video or picture compression method and apparatus
TWI826160B (zh) 圖像編解碼方法和裝置
WO2023193629A1 (zh) 区域增强层的编解码方法和装置
JP2023543520A (ja) 機械学習を基にしたピクチャコーディングにおけるクロマサブサンプリングフォーマット取り扱いのための方法
WO2023174256A1 (zh) 一种数据压缩方法以及相关设备
WO2023040745A1 (zh) 特征图编解码方法和装置
WO2022100173A1 (zh) 一种视频帧的压缩和视频帧的解压缩方法及装置
TW202318265A (zh) 基於注意力的圖像和視訊壓縮上下文建模
WO2024007820A1 (zh) 数据编解码方法及相关设备
WO2022194137A1 (zh) 视频图像的编解码方法及相关设备
WO2024078252A1 (zh) 特征数据编解码方法及相关装置
US20240078414A1 (en) Parallelized context modelling using information shared between patches
WO2023010981A1 (zh) 编解码方法及装置
WO2023279968A1 (zh) 视频图像的编解码方法及装置
WO2023165487A1 (zh) 特征域光流确定方法及相关设备
CN116797674A (zh) 图像编解码方法和装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 22869114; Country of ref document: EP; Kind code of ref document: A1)
WWE Wipo information: entry into national phase (Ref document number: 2401001721; Country of ref document: TH)
WWE Wipo information: entry into national phase (Ref document number: 3232206; Country of ref document: CA)
WWE Wipo information: entry into national phase (Ref document number: 2022348742; Country of ref document: AU; Ref document number: AU2022348742; Country of ref document: AU)
WWE Wipo information: entry into national phase (Ref document number: 2022869114; Country of ref document: EP)
REG Reference to national code (Ref country code: BR; Ref legal event code: B01A; Ref document number: 112024005313; Country of ref document: BR)
ENP Entry into the national phase (Ref document number: 2022869114; Country of ref document: EP; Effective date: 20240321)
ENP Entry into the national phase (Ref document number: 2022348742; Country of ref document: AU; Date of ref document: 20220908; Kind code of ref document: A)