CN116778003A - Feature map encoding and decoding method and device - Google Patents

Feature map encoding and decoding method and device

Info

Publication number
CN116778003A
CN116778003A (application CN202210234510.XA)
Authority
CN
China
Prior art keywords
value
code rate
target
probability distribution
reference value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210234510.XA
Other languages
Chinese (zh)
Inventor
崔泽
王晶
张恋
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202210234510.XA priority Critical patent/CN116778003A/en
Priority to PCT/CN2023/079510 priority patent/WO2023169319A1/en
Priority to TW112108485A priority patent/TW202403660A/en
Publication of CN116778003A publication Critical patent/CN116778003A/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 9/00 Image coding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a feature map encoding method and device and a feature map decoding method and device, and relates to the technical field of image processing. When encoding a feature map, the encoder can acquire a first feature map of the input image, where the first feature map includes first feature values; determine, according to a target code rate and a reference value of a model parameter of a probability estimation model, a target value of the model parameter used for estimating the probability distribution of the first feature values; determine the probability distribution of the first feature values through the target value; perform entropy encoding on the first feature values according to their probability distribution to obtain a compressed code stream; and transmit the compressed code stream. The target value of the model parameter of the probability estimation model can be flexibly adjusted according to the requirement of the target code rate, the probability estimation model is relatively simple to learn, and the requirement of multi-code-rate encoding can be met while the target value of the model parameter changes flexibly, so the compressed code stream is reduced.

Description

Feature map encoding and decoding method and device
Technical Field
The embodiments of the application relate to the technical field of image processing, and in particular to a feature map encoding method and device and a feature map decoding method and device.
Background
With the wide application of deep learning, researchers have proposed image coding schemes based on deep learning. In a typical codec framework, at the encoding end, an original image x is input to a feature extraction module to output a feature map y, and the feature map y is then input to a quantization module to obtain a quantized feature map ŷ. The feature map y also outputs side information z through a side information extraction module, and the side information z is input to the quantization module for quantization to obtain ẑ. ẑ is input to a probability estimation unit to obtain the probability distribution of ẑ; entropy encoding is performed on ẑ according to this probability distribution, and the entropy-encoded result is written into the code stream.
At the decoding end, the entropy decoding unit decodes the side information ẑ from the code stream according to the probability distribution obtained by the probability estimation unit, and then obtains, through the side information ẑ, the probability distribution N(μ, σ) of the symbols ŷ to be decoded. The entropy decoding module performs entropy decoding on each feature element of the feature map ŷ according to its probability distribution to obtain the feature map ŷ, and the feature map ŷ is input to an image reconstruction module to obtain a reconstructed image.
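As a concrete illustration of this pipeline, the following is a minimal Python sketch of the quantization step and the ideal entropy-coding cost under a discretized Gaussian model N(μ, σ). The function names and toy values are illustrative assumptions, not part of the described framework.

```python
import math

def quantize(y):
    # Quantization module Q: round each feature value to the nearest integer.
    return [round(v) for v in y]

def discretized_gaussian_prob(v, mu, sigma):
    # Probability mass assigned to the quantized symbol v under N(mu, sigma),
    # i.e. the Gaussian probability of the interval [v - 0.5, v + 0.5].
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    return cdf(v + 0.5) - cdf(v - 0.5)

def ideal_code_length_bits(y_hat, mu, sigma):
    # An ideal entropy coder spends about -log2 p(v) bits per symbol.
    return sum(-math.log2(max(discretized_gaussian_prob(v, mu, sigma), 1e-12))
               for v in y_hat)

y = [0.3, -1.7, 2.2, 0.1]   # toy feature-map values
y_hat = quantize(y)         # -> [0, -2, 2, 0]
bits = ideal_code_length_bits(y_hat, mu=0.0, sigma=1.0)
```

An arithmetic coder driven by the same probabilities would produce a bitstream of roughly `bits` length, which is why a more accurate probability model directly reduces the compressed code stream.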
In practical applications, in order to match different code rate requirements, the feature values of the feature map y are often scaled with different gain vectors. However, the scaled feature values of y have a larger variation range, which in turn enlarges the range of values the network model must handle. As a result, the network model of such a multi-code-rate scheme is difficult to learn, and the rate-distortion performance of image compression is low.
Disclosure of Invention
The application provides a feature map encoding method and device and a feature map decoding method and device, which are used to meet the requirement of multi-code-rate encoding and improve the rate-distortion performance of image compression.
In a first aspect, the present application provides a feature map encoding method, which may be performed by an image encoding apparatus. The image encoding apparatus may generally be carried in an electronic device, and the electronic device may be a device carrying a camera or a processor, such as a video camera, a mobile phone, a tablet computer, a notebook computer, or a television; the application is not specifically limited herein.
The image encoding apparatus acquires a first feature map of an input image, where the first feature map includes first feature values; determines, according to a target code rate and a reference value of a model parameter of a probability estimation model, a target value of the model parameter used for estimating the probability distribution of the first feature values; determines the probability distribution of the first feature values through the target value; performs entropy encoding on the first feature values according to their probability distribution to obtain a compressed code stream; and transmits the compressed code stream.
It should be noted that, in general, the first feature map is extracted by a feature extraction module of the image encoding apparatus, and a feature map generally includes information in three dimensions: width, height, and channel. The first feature map includes a plurality of first feature values; by encoding the plurality of first feature values into a compressed code stream, the image encoding apparatus can reduce the time delay and bandwidth occupied by data transmission and reduce device power consumption.
When the first feature values of the first feature map are encoded, the target value of the model parameter is determined based on the target code rate and the reference value of the model parameter of the probability estimation model. The target value is not a fixed number: it can be flexibly adjusted as the target code rate and the reference value change, so the requirement of multiple code rates can be met and the compressed code stream can be reduced. In addition, the rate-distortion performance of image compression can be improved.
In an alternative embodiment, the first feature map is determined by extracting gain values for second feature values of a second feature map of the input image, the first feature map being different from the second feature map. The image encoding apparatus performs entropy encoding on the first feature values according to their probability distribution to obtain a first compressed code stream; performs entropy decoding on the first compressed code stream to determine estimated values of the first feature values; determines the probability distribution of the second feature values according to the estimated values of the first feature values; performs entropy encoding on the second feature values according to their probability distribution to obtain a second compressed code stream; and transmits the first compressed code stream and the second compressed code stream.
That is, after determining the probability distribution of the first feature values using the target value, the image encoding apparatus performs entropy encoding on the first feature values according to that probability distribution to obtain the first compressed code stream. It then performs entropy decoding on the first compressed code stream to determine the estimated values of the first feature values, determines the probability distribution of the second feature values based on those estimated values, and performs entropy encoding on the second feature values according to their probability distribution to obtain the second compressed code stream.
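The two-pass scheme above can be sketched as follows. The entropy coder is idealized (a real arithmetic coder is abstracted away as a lossless record plus an ideal bit count), and the conditioning rule, in which the decoded estimates of the first feature values supply the means of the second feature values' distributions, is a hypothetical stand-in for the embodiment's probability estimation.

```python
import math

def discretized_gauss(v, mu, sigma):
    cdf = lambda x: 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))
    return max(cdf(v + 0.5) - cdf(v - 0.5), 1e-12)

def entropy_encode(symbols, dists):
    # Stand-in for a real arithmetic coder: record the symbols and the
    # ideal code length implied by their probability distributions.
    bits = sum(-math.log2(discretized_gauss(v, mu, s))
               for v, (mu, s) in zip(symbols, dists))
    return {"symbols": list(symbols), "bits": bits}

def entropy_decode(stream):
    # Ideal decoder: recovers the symbols losslessly.
    return stream["symbols"]

# First feature map: encoded with the distribution obtained from the
# target value of the model parameters (here simply N(0, 1)).
first_vals = [0, -2, 1]
first_dists = [(0.0, 1.0)] * len(first_vals)
first_stream = entropy_encode(first_vals, first_dists)

# Decode it, then use the estimates to condition the second feature
# map's distributions (hypothetical rule: mean = decoded estimate).
est = entropy_decode(first_stream)
second_vals = [1, -1, 2]
second_dists = [(float(m), 1.0) for m in est]
second_stream = entropy_encode(second_vals, second_dists)
```

Both streams would then be transmitted; the decoder repeats the same conditioning, which is why the encoder must decode its own first stream rather than reuse the unquantized values.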
In an alternative embodiment, the image coding device determines the weight coefficient of the reference value according to the target code rate; the target value is determined based on the weight coefficient of the reference value and the reference value.
By determining the target value in the mode, the requirement of multiple code rates can be met, the compressed code stream can be reduced, and the rate distortion performance of image compression can be improved.
In an alternative embodiment, the image encoding apparatus multiplies or divides the reference value by the corresponding weight coefficient to determine the target value.
It should be noted that, in actual application, the target value may be determined by multiplying the reference value by the weight coefficient or by dividing the reference value by the weight coefficient, according to a rule set by the user or the actual requirement; it may also be determined in other ways, which is not specifically limited herein.
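Under the assumption that one weight coefficient is stored per supported code rate, the combination of reference value and weight coefficient can be sketched as below; the rate-to-weight table and values are purely illustrative.

```python
def target_value(reference, weight, mode="multiply"):
    # Combine a stored reference value of a model parameter with the
    # weight coefficient selected for the target code rate.
    # `mode` reflects that the embodiment allows either operation.
    if mode == "multiply":
        return reference * weight
    if mode == "divide":
        return reference / weight
    raise ValueError(mode)

# Hypothetical weight-coefficient set, one entry per supported code rate
# (in practice these would be obtained by training the probability model).
weights_by_rate = {0.25: 0.8, 0.5: 1.0, 1.0: 1.3}
ref = 2.0
tv = target_value(ref, weights_by_rate[0.5])  # -> 2.0
```

Because only the scalar weight changes with the code rate, the probability estimation model itself stays fixed, which is the source of the "relatively simple to learn" property claimed above.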
In an alternative embodiment, the weight coefficient of the reference value is selected from a corresponding set of weight coefficients according to the target code rate, and the weight coefficients in the set of weight coefficients are obtained by training a probability estimation model.
It should be noted that, in practical application, a code rate need not correspond to a single set of weight coefficients; a code rate may instead correspond to a plurality of candidate weight coefficients, and the weight coefficient of the reference value may be selected from among those candidates.
In an alternative embodiment, there are M reference values, where M is a positive integer greater than or equal to 2, and the image encoding apparatus selects the target value from the M reference values according to the target code rate.
It should be noted that, in practical application, the probability estimation model may pre-store a plurality of reference values of the model parameters. After determining the target code rate, the image encoding apparatus directly selects one of the plurality of reference values as the target value based on the requirement of the target code rate, which can improve data processing efficiency.
In an alternative embodiment, the M reference values correspond to M code rates; the image encoding apparatus determines, among the M code rates, a code rate α that has an association relationship with the target code rate. The association relationship includes one of the following: the target code rate is the same as the value of α, or the value of the target code rate is in the neighborhood of α. The reference value corresponding to α is then taken as the target value.
It should be noted that, when the probability estimation model pre-stores reference values of a plurality of model parameters, the reference values may be associated with code rates. If the target code rate is associated with a reference value, that reference value is directly used as the target value; if not, the code rate closest to the target code rate, namely the code rate α, is determined from the code rates associated with reference values, and the reference value corresponding to α is taken as the target value. A target value determined in this way can adapt to the requirement of multiple code rates.
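A sketch of this selection rule follows, with a hypothetical neighborhood radius `tol` standing in for the unspecified neighborhood range of α; the rate-to-reference table is illustrative.

```python
def select_reference(target_rate, rate_to_ref, tol=0.05):
    # M reference values keyed by their associated code rates.
    # Use an exact match, or the nearest rate alpha within the
    # neighborhood `tol` (the radius is an assumed illustration).
    if target_rate in rate_to_ref:
        return rate_to_ref[target_rate]
    alpha = min(rate_to_ref, key=lambda r: abs(r - target_rate))
    if abs(alpha - target_rate) <= tol:
        return rate_to_ref[alpha]
    return None  # no associated rate: a fallback such as interpolation applies

refs = {0.25: 1.5, 0.5: 2.0, 1.0: 3.0}
select_reference(0.5, refs)   # exact match -> 2.0
select_reference(0.52, refs)  # neighborhood of 0.5 -> 2.0
```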
In an alternative embodiment, the target code rate is greater than the first code rate and less than the second code rate, the first code rate corresponds to a first reference value of the model parameter of the probability estimation model, the second code rate corresponds to a second reference value of the model parameter of the probability estimation model, and the target value is obtained through interpolation operation of the first reference value and the second reference value.
The application can expand the value range of the target value through interpolation operation and can adapt to the requirement of more code rates.
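The interpolation of the two reference values can be sketched as follows. Both nearest-neighbor and weighted variants are shown; the linear formula for the weighted case is an assumption, since the embodiment does not fix a concrete interpolation formula.

```python
def interpolate_target(rate, r1, ref1, r2, ref2, mode="weighted"):
    # The target rate lies strictly between the two stored code rates
    # r1 and r2; derive the target value from their reference values.
    assert r1 < rate < r2
    if mode == "neighbor":
        # nearest-neighbor interpolation: pick the closer rate's reference
        return ref1 if rate - r1 <= r2 - rate else ref2
    # weighted (linear) interpolation between the two references
    t = (rate - r1) / (r2 - r1)
    return (1 - t) * ref1 + t * ref2

interpolate_target(0.75, 0.5, 2.0, 1.0, 3.0)             # -> 2.5
interpolate_target(0.6, 0.5, 2.0, 1.0, 3.0, "neighbor")  # -> 2.0
```

The same two variants apply when interpolating the probability distributions themselves rather than the reference values, as the next embodiments note.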
In an alternative embodiment, the first probability distribution is obtained by a first reference value, the second probability distribution is obtained by a second reference value, and the probability distribution of the first feature value is obtained by interpolation of the first probability distribution and the second probability distribution.
The probability distribution of the first characteristic value is determined through interpolation operation, so that the method can adapt to the requirement of more code rates.
In an alternative embodiment, the interpolation operation includes one of the following: neighbor interpolation and weighted interpolation.
It should be noted that, in practical application, the interpolation method may be selected flexibly according to the user's needs; besides the above interpolation methods, other interpolation methods may also be used, and the present application is not specifically limited herein.
In an alternative embodiment, the compressed code stream further includes a target code rate.
In an alternative embodiment, the target code rate is indicated by one of the following information: modulation factor of target code rate, quantization parameter of target code rate.
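The embodiment does not fix a bitstream syntax for this indication. The following sketch assumes a hypothetical five-byte header (a one-byte flag plus a 32-bit float) merely to illustrate signalling the target code rate through either a quantization parameter or a modulation factor.

```python
import struct

def write_header(payload, qp=None, mod_factor=None):
    # Hypothetical layout: flag byte (1 = quantization parameter,
    # 0 = modulation factor) followed by a big-endian float32 value.
    if qp is not None:
        return struct.pack(">Bf", 1, float(qp)) + payload
    return struct.pack(">Bf", 0, float(mod_factor)) + payload

def read_header(stream):
    # Parse the flag and value, and return the remaining payload.
    flag, value = struct.unpack(">Bf", stream[:5])
    kind = "qp" if flag == 1 else "modulation_factor"
    return kind, value, stream[5:]

blob = write_header(b"\x01\x02", qp=22)
kind, value, payload = read_header(blob)  # ("qp", 22.0, b"\x01\x02")
```

Carrying the indication in the compressed code stream lets the decoder derive the same target value, and hence the same probability distributions, without any out-of-band signalling.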
It should be noted that, in practical application, the target code rate may be indicated by a modulation factor of the target code rate, by a quantization parameter of the target code rate, or by other parameters.
In a second aspect, the present application provides a feature map decoding method, which may be performed by an image decoding apparatus. The image decoding apparatus may generally be carried in an electronic device, and the electronic device may be a device carrying a camera or a processor, such as a video camera, a mobile phone, a tablet computer, a notebook computer, or a television; the application is not specifically limited herein.
The image decoding apparatus acquires a compressed code stream of a first feature map of an image to be decoded, where the first feature map includes first feature values; determines, according to a target code rate and a reference value of a model parameter of a probability estimation model, a target value of the model parameter used for estimating the probability distribution of the first feature values; determines the probability distribution of the first feature values through the target value; performs entropy decoding on the compressed code stream according to the probability distribution of the first feature values to obtain estimated values of the first feature values; and reconstructs the image according to the estimated values of the first feature values.
In an alternative embodiment, the first feature map is determined by extracting gain values for second feature values of a second feature map of the input image; the compressed code stream comprises a first compressed code stream and a second compressed code stream; the image decoding device performs entropy decoding on the first compressed code stream to determine an estimated value of the first characteristic value; determining probability distribution of the second characteristic value according to the estimated value of the first characteristic value; entropy decoding is carried out on the second compressed code stream according to probability distribution of the second characteristic value so as to obtain an estimated value of the second characteristic value; determining a second feature map according to the estimated value of the second feature value; and reconstructing the image based on the second feature map to obtain a reconstructed image.
In an alternative embodiment, the image decoding device determines the weight coefficient of the reference value according to the target code rate; the target value is determined based on the weight coefficient of the reference value and the reference value.
In an alternative embodiment, the image decoding device multiplies or divides the reference value by the corresponding weight coefficient to determine the target value.
In an alternative embodiment, the weight coefficient of the reference value is selected from a corresponding set of weight coefficients according to the target code rate, and the weight coefficients in the set of weight coefficients are obtained by training a probability estimation model.
In an alternative embodiment, there are M reference values, where M is a positive integer greater than or equal to 2, and the image decoding apparatus selects the target value from the M reference values according to the target code rate.
In an alternative embodiment, the M reference values correspond to M code rates; the image decoding apparatus determines, among the M code rates, a code rate α that has an association relationship with the target code rate. The association relationship includes one of the following: the target code rate is the same as the value of α, or the value of the target code rate is in the neighborhood of α. The reference value corresponding to α is then taken as the target value.
In an alternative embodiment, the target code rate is greater than the first code rate and less than the second code rate, the first code rate corresponds to a first reference value of the model parameter of the probability estimation model, the second code rate corresponds to a second reference value of the model parameter of the probability estimation model, and the target value is obtained through interpolation operation of the first reference value and the second reference value.
In an alternative embodiment, the first probability distribution is obtained by a first reference value, the second probability distribution is obtained by a second reference value, and the probability distribution of the first feature value is obtained by interpolation of the first probability distribution and the second probability distribution.
In an alternative embodiment, the interpolation operation includes one of the following: neighbor interpolation and weighted interpolation.
In an alternative embodiment, the compressed code stream further includes a target code rate.
In an alternative embodiment, the target code rate is indicated by one of the following information: modulation factor of target code rate, quantization parameter of target code rate.
In a third aspect, an embodiment of the present application provides an image processing apparatus, which may be an image encoding apparatus (or a chip disposed inside the image encoding apparatus), or an image decoding apparatus (or a chip disposed inside the image decoding apparatus). The image processing apparatus has the function of implementing any one of the first aspect to the second aspect; for example, the image processing apparatus includes a module, unit, or means corresponding to the steps of any one of the first aspect to the second aspect. The function, unit, or means may be implemented by software, by hardware, or by hardware executing corresponding software.
In one possible design, the image processing apparatus includes a processing unit and a transceiver unit. The transceiver unit may be configured to transmit and receive signals to enable communication between the image processing apparatus and other devices; for example, the transceiver unit is configured to transmit the compressed code stream. The processing unit may be configured to perform some internal operations of the image processing apparatus. The transceiver unit may be referred to as an input-output unit, a communication unit, etc., and may be a transceiver; the processing unit may be a processor. When the image processing apparatus is a module (e.g., a chip) in a communication device, the transceiver unit may be an input/output interface, an input/output circuit, an input/output pin, or the like, and may also be referred to as an interface, a communication interface, an interface circuit, or the like; the processing unit may be a processor, a processing circuit, a logic circuit, or the like.
In yet another possible design, the image processing apparatus includes a processor, and may further include a transceiver for transmitting and receiving signals; the processor executes program instructions to complete the method in any possible design or implementation of the first aspect to the second aspect. The image processing apparatus may further include one or more memories coupled to the processor, and the memories may hold the computer programs or instructions necessary to implement the functions referred to in any of the first aspect to the second aspect. The processor may execute the computer programs or instructions stored by the memories; when they are executed, the image processing apparatus is caused to implement the method in any possible design or implementation of the first aspect to the second aspect.
In yet another possible design, the image processing device includes a processor that may be used to couple with the memory. The memory may hold the necessary computer programs or instructions to implement the functions referred to in any of the above first to second aspects. The processor may execute a computer program or instructions stored by the memory, which when executed, cause the image processing apparatus to implement the method in any of the possible designs or implementations of the first to second aspects described above.
In yet another possible design, the image processing device includes a processor and an interface circuit, wherein the processor is configured to communicate with other devices through the interface circuit and perform the method of any of the possible designs or implementations of the first to second aspects above.
It will be appreciated that in the above third aspect, the processor may be implemented by hardware or may be implemented by software, and when implemented by hardware, the processor may be a logic circuit, an integrated circuit, or the like; when implemented in software, the processor may be a general purpose processor, implemented by reading software code stored in a memory. Further, the above processor may be one or more, and the memory may be one or more. The memory may be integral to the processor or separate from the processor. In a specific implementation process, the memory and the processor may be integrated on the same chip, or may be respectively disposed on different chips.
In a fourth aspect, an embodiment of the present application provides an image processing system, which includes the image encoding device of the first aspect and the image decoding device of the second aspect.
In a fifth aspect, the present application provides a chip system comprising a processor and possibly a memory for implementing the method described in any one of the possible designs of the first to second aspects. The chip system may be formed of a chip or may include a chip and other discrete devices.
In a sixth aspect, the application also provides a computer readable storage medium having stored therein computer readable instructions which when run on a computer cause the computer to perform the method as in any of the possible designs of the first to second aspects.
In a seventh aspect, the present application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of embodiments of the first to second aspects described above.
For the technical effects achieved by the second aspect to the seventh aspect, refer to the descriptions of the corresponding possible designs in the first aspect; they are not repeated here.
Drawings
Fig. 1 shows a schematic view of an image processing scene provided by an embodiment of the present application;
FIG. 2 shows a schematic diagram of a codec framework;
FIG. 3A shows a schematic diagram of a codec framework provided by an embodiment of the present application;
FIG. 3B shows a schematic diagram of a codec framework provided by an embodiment of the present application;
FIG. 3C shows a schematic diagram of a codec framework provided by an embodiment of the present application;
FIG. 3D shows a schematic diagram of a codec framework provided by an embodiment of the present application;
FIG. 3E shows a schematic diagram of a codec framework provided by an embodiment of the present application;
FIG. 4 is a schematic flow chart of feature map encoding and feature map decoding according to an embodiment of the present application;
fig. 5 is a schematic diagram showing the structure of an image processing apparatus according to an embodiment of the present application;
fig. 6 is a schematic diagram showing the structure of another image processing apparatus according to the embodiment of the present application;
fig. 7 is a schematic diagram showing a structure of still another image processing apparatus according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present application clearer, the present application is further described in detail below with reference to the accompanying drawings. The specific operation methods in the method embodiments may also be applied to the apparatus embodiments or system embodiments; accordingly, the implementations of the apparatus and the method may refer to each other, and repeated descriptions are omitted. In the description of the present application, unless otherwise indicated, "a plurality" means two or more.
Fig. 1 shows a schematic diagram of an image processing scenario to which the present application may be applied. The scenario may involve video encoding and decoding as well as image encoding and decoding, and includes an image encoding device and an image decoding device, both of which may be deployed in an electronic device. The electronic device may be understood as a video camera, a mobile phone, a tablet computer, a notebook computer, a television, a vehicle-mounted device, an internet-of-things device, etc.; the application is not specifically limited herein. In addition, although only one image encoding device and one image decoding device are illustrated in fig. 1, in actual application the image processing scenario may include a plurality of image encoding devices and image decoding devices, and the present application is not specifically limited herein.
To illustrate the image processing scenario of the present application more clearly, the image encoding device shown in fig. 1 includes an acquisition unit, an encoding unit, and a sending unit, and the image decoding device includes a receiving unit, a decoding unit, and a rendering unit. The image acquisition device (for example, a video camera or a camera) acquires video data or image data and transmits it to the acquisition unit of the image encoding device for preprocessing (cutting, filling, etc.). The preprocessed video data or image data is then input to the encoding unit for encoding (extracting feature values, determining the distribution of the feature values, etc.) to obtain a compressed code stream, which is finally transmitted by the sending unit to the receiving unit of the image decoding device. The receiving unit of the image decoding device receives the compressed code stream and passes it to the decoding unit for decoding; the decoding result is then passed to the rendering unit, which, through rendering, can recover the video data or image data acquired by the image acquisition device.
It should be noted that, considering code stream compression, network transmission delay, interference, and similar factors, the video data or image data obtained after encoding and decoding may not be completely identical to the data acquired by the image acquisition device, but the visual effect is indistinguishable.
The transmitting module of the image encoding apparatus and the receiving module of the image decoding apparatus may transmit data via a wire or wirelessly; the present application is not specifically limited herein. Furthermore, in order to facilitate subsequent data transmission, the image encoding apparatus may save the compressed code stream for later transmission to other devices.
In the present application, "and/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A alone, both A and B, and B alone, where A and B may be singular or plural. Moreover, unless specified to the contrary, ordinal words such as "first" and "second" in embodiments of the present application are used for distinguishing between multiple objects and are not used for limiting the order, timing, priority, or importance of the multiple objects.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
In general, as shown in fig. 2, at the encoding end, an original image x is input to a feature extraction module Ga to output a feature map y, and the feature map y is then input to a quantization module Q (the quantization operation performed by Q may be understood as rounding y) to obtain a quantized feature map ŷ. On one side, the feature map y is passed through the side information extraction module Ha to output side information z, and z is input to the quantization module Q for quantization to obtain ẑ. ẑ is input to the probability estimation unit C2 to obtain the probability distribution of ẑ; ẑ is entropy encoded according to this probability distribution and the result is written into the bit stream (Bitstream), where the entropy encoding module AE2 obtains the probability distribution corresponding to ẑ from C2. The entropy decoding unit AD2 performs entropy decoding according to the probability distribution corresponding to ẑ to recover ẑ, and ẑ is then input to the probability estimation module Hs to obtain the probability distribution N(μ, σ) of each feature element of the feature map ŷ. On the other side, the entropy encoding module AE1 entropy encodes each feature element of the feature map ŷ according to its probability distribution N(μ, σ) and writes the encoding result into the bit stream.
At the decoding end, the entropy decoding unit AD2 obtains the probability distribution of ẑ from the probability estimation unit C2 to decode the side information ẑ; the side information ẑ is then input to the Hs module to obtain the probability distribution N(μ, σ) of the symbols to be decoded ŷ. The entropy decoding module AD1 entropy decodes each feature element of the feature map ŷ according to its probability distribution to obtain the feature map ŷ, and ŷ is input to the image reconstruction module Gs to obtain the reconstructed image.
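As a concrete illustration of how the estimated distribution N(μ, σ) drives the entropy coding above, the sketch below (a simplification with hypothetical helper names; the actual arithmetic coder AE1/AD1 is omitted) computes the probability mass assigned to each rounded feature value and the resulting theoretical bit cost:

```python
import math

def gaussian_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def bin_prob(k, mu, sigma):
    """Probability mass assigned to the rounded (quantized) value k."""
    return gaussian_cdf(k + 0.5, mu, sigma) - gaussian_cdf(k - 0.5, mu, sigma)

def estimated_bits(symbols, mu, sigma):
    """Theoretical entropy-coding cost: sum of -log2(p) over symbols."""
    return sum(-math.log2(max(bin_prob(k, mu, sigma), 1e-12)) for k in symbols)

# A symbol near the mean of the distribution is cheap to code;
# an unlikely symbol far in the tail is expensive.
cheap = estimated_bits([0], mu=0.0, sigma=1.0)
dear = estimated_bits([5], mu=0.0, sigma=1.0)
```

An accurate probability model thus directly lowers the size of the compressed code stream, which is why the rest of the application focuses on adapting the model parameters to the code rate.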
As user demand and network bandwidth capacity increase, different requirements are imposed on the code stream of a video or image. To match the different code rate requirements of different devices, a feature gain module and a feature inverse gain module are introduced into the multi-code-rate image processing architecture of fig. 3A to adapt to multiple code rates. As shown in fig. 3A, the feature gain module Gu1 and the feature inverse gain module IGU1 are introduced into the framework shown in fig. 2, so that variation of the compressed image code stream can be realized on a single-code-rate model; the feature gain module Gu2 and the feature inverse gain module IGU2 may be introduced optionally. The feature gain module applies a gain to each feature value in the feature map y to obtain a gain feature map yₛ, and the quantization module Q rounds each feature value in yₛ to obtain the feature map ŷ. The gain module contains a gain vector set {q₁, q₂, …, qₙ}; in actual use, a modulation factor i, where i ∈ [1, n], is found according to the required code rate, and the gain vector qᵢ is selected to apply a gain to the feature map y. The feature inverse gain module applies the inverse gain to the feature map ŷ.
The gain operation generally refers to using a gain vector qᵢ to scale the feature values of y, whose value range is [a, b], so that the value range is adjusted to [a/qᵢ, b/qᵢ]; the value range of yₛ may thus differ greatly from that of y. For example, when the gain step is 2, i.e., q is 2, and y lies in [0, 100], yₛ lies in the interval [0, 50].
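The gain-then-quantize step can be sketched as follows (a minimal illustration; division by the gain is assumed here because the example above maps [0, 100] to [0, 50] for q = 2, and the function names are not from the application):

```python
def gain_and_quantize(y, q_i):
    """Scale each feature value of y by the gain q_i, then round,
    mimicking the feature gain module followed by Q. Division is
    assumed, matching the example where q = 2 maps [0, 100] to [0, 50]."""
    return [round(v / q_i) for v in y]

def inverse_gain(y_hat, q_i):
    """Approximate inverse of the gain operation at the decoder side."""
    return [v * q_i for v in y_hat]

y = [0.0, 37.0, 100.0]
y_hat = gain_and_quantize(y, 2.0)   # values now lie in [0, 50]
```

A larger gain step shrinks the quantized value range, giving a coarser (lower-rate) representation; a smaller step preserves more detail at a higher rate.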
In a multi-code-rate scenario, the input image must match different code rates, so different gain vectors usually need to be used; the value range of yₛ therefore varies significantly, which in turn causes significant variation in the value range of ẑ. The probability estimation unit C2, whose model parameters are fixed, must nevertheless perform probability estimation on ẑ, so such a multi-code-rate implementation scheme is difficult for the network to learn, and a network model with good rate-distortion performance is hard to train.
In view of the above, the present application provides a feature map encoding and decoding method to adapt to the requirements of multi-code-rate coding while guaranteeing rate-distortion performance. In actual implementation, feature map encoding may be performed by the image encoding device and feature map decoding by the image decoding device; the present application is described here taking the interaction between the image encoding device and the image decoding device as an example. The scheme may be applied to the multi-code-rate processing architectures illustrated in fig. 3A and figs. 3B to 3E described below, and may specifically proceed with reference to fig. 4 as follows:
In step 401, the image encoding apparatus acquires a first feature map of an input image, the first feature map including first feature values.
It should be noted that, in general, the first feature map is extracted by a feature extraction module of the image encoding apparatus, and the feature map generally includes information of three dimensions of width, height and channel. The first characteristic diagram comprises a plurality of first characteristic values, and the image coding device codes the plurality of first characteristic values of the first characteristic diagram into a compressed code stream, so that time delay and bandwidth occupied by data transmission can be reduced, and power consumption of equipment can be reduced.
It is further noted that the first feature map described in step 401 corresponds to ŷ in figs. 2 and 3A-3E; the first feature map is determined by gain values of second feature values obtained by extracting a second feature map (i.e., y) of the input image, and the first feature map is different from the second feature map. Of course, in practical application, the first feature map may also be determined in other ways, and the present application is not particularly limited herein.
The image coding device may be implemented using a neural network; for example, a neural network formed by interleaving and cascading four convolutional layers and three activation layers may implement the image encoding function. For example, the convolution kernel size of each convolutional layer of the feature extraction module may be set to 5x5, the number of channels of the output feature map to M, and each convolutional layer may downsample by a factor of 2 in width and height; this is described here only as an example and is not a particular limitation. The convolution kernel size, number of feature map channels, downsampling factor, number of downsampling operations, number of convolutional layers, and number of activation layers are all adjustable, and the present application is not particularly limited herein.
A corresponding gain vector qᵢ is selected according to the code rate requirement to apply a gain to the feature values in the second feature map y, and the gain values of the second feature values can be obtained by quantization and rounding. The gain applied by qᵢ to each feature value in the second feature map y may be different or the same; for example, the same gain step may be used for feature values in the same channel, or the same gain step may be used for feature values of different channels at the same spatial location, and the present application is not particularly limited herein.
In step 402, the image encoding apparatus determines a target value of a model parameter for estimating a probability distribution of the first feature value based on the target code rate and a reference value of the model parameter of the probability estimation model.
The probability estimation model may use a network model based on deep learning, such as a recurrent neural network (recurrent neural network, RNN) and a convolutional neural network (convolutional neural network, CNN), and the like, which are not limited herein.
Typically, the probability distribution model is a single Gaussian model (GSM) or a Gaussian mixture model (GMM), whose model parameters are the mean μ and variance σ. The probability distribution model may also be a Laplace distribution model, whose model parameters are the location parameter μ and the scale parameter b. Taking the Gaussian model as an example, the probability distribution function of the first feature value x can be shown in formula 1.
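For reference, the standard densities behind the two parameterizations named above can be sketched as follows (these are the textbook Gaussian and Laplace densities, not the application's formula 1 verbatim):

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of the single Gaussian model with mean mu and std sigma."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def laplace_pdf(x, mu, b):
    """Density of the Laplace model with location mu and scale b."""
    return math.exp(-abs(x - mu) / b) / (2.0 * b)
```

Both families are fully described by two parameters per feature element, which is why adjusting a small set of model parameters suffices to retarget the probability estimate to a new code rate.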
In addition, the reference value of the model parameters of the probability estimation model can be obtained by training the model. For example, with labeled sample data, the sample data is input into the probability estimation model for processing to obtain an output result; the output result is compared with the labels of the sample data to determine a loss value, and the probability estimation model is adjusted based on the loss value. When training reaches the maximum number of iterations, or the difference between the output result of the probability estimation model and the labels is smaller than a preset threshold, training is considered complete and the model parameter values are recorded; at this point, the recorded values of the model parameters are the reference values. In practical applications, multiple sets of reference values may be trained to adapt to the requirements of the image encoding or decoding task.
In the application, the target value of the model parameter is determined according to the reference value and the target code rate, and the target code rate can be understood as the memory capacity allowed by the user.
The target code rate can also be indicated by the modulation factor i of the target code rate and the quantization parameter QP (or the target gain vector qᵢ). In practical application, the target code rate may be indicated by its modulation factor, by its quantization parameter, or by other parameters such as its gain vector.
The modulation factor i of the target code rate can correspond to the gain vector qᵢ of the target code rate. In one example, the code rate modulation factor may be equivalent to the quantization parameter in the high efficiency video coding (HEVC) standard scheme, i.e., i may be equivalent to QP in the present application.
Hereinafter, how the target value is determined in step 402 is specifically described based on the following 3 ways:
mode 1, determining a weight coefficient of a reference value according to a target code rate; the target value is determined based on the weight coefficient of the reference value and the reference value.
It should be noted that there is a correspondence between the code rate and the weight coefficient of the reference value; generally, different code rates correspond to different weight coefficients. The weight coefficient corresponding to the target code rate may be determined based on this correspondence, and the target value may then be determined based on the weight coefficient and the reference value. For example, code rate 1 corresponds to weight coefficient a, and code rate 2 corresponds to weight coefficient b. If the target code rate is 1, the weight coefficient obtained by direct matching is a, and the target value is determined based on a and the reference value; if the target code rate is 1.8, which is close to code rate 2, the weight coefficient obtained by matching is b, and the target value is determined based on b and the reference value.
Optionally, the weight coefficient of the reference value is selected from a corresponding weight coefficient set according to the target code rate; the weight coefficients in the set are obtained by training the probability estimation model, and generally different code rates correspond to different weight coefficients. When the code rate corresponds to a weight coefficient set, the weight coefficients a and b in the above example are elements of that set. The code rate is continuously varied during training to obtain a set of weight coefficients adapted to different code rates.
Alternatively, in practical application, the code rate may correspond not to a weight coefficient set but to a plurality of candidate weight coefficients. In that case, the weight coefficients a and b in the above example are among the candidate weight coefficients. The candidate weight coefficients can likewise be obtained by training the probability estimation model, continuously varying the code rate during training to obtain candidates adapted to different code rates. The present application does not particularly limit the source of the reference value weight coefficient, which can be flexibly determined according to actual requirements.
During training, for different modulation factors i of the target code rate, or quantization parameters QP of the target code rate (or target gain vectors qᵢ), the model parameters of the probability estimation model adjusted by the correspondingly selected weight coefficient are used, and probability estimation is performed on the first feature value based on those model parameters and the probability estimation model. When participating in the training of the multi-code-rate model, the modulation factor i of the target code rate or the quantization parameter QP (or gain vector qᵢ) is continuously varied during training to obtain a weight coefficient set or candidate weight coefficients that adapt to different code rates. The modulation factor i can be used to indicate code rates of different sizes, e.g., i ∈ [1, n] (n is the number of discrete code rate points set by the multi-code-rate model, typically 6); different values of i, for example i=2 and i=3, indicate different code rates.
Of course, in practical application, the reference value can also be directly multiplied or divided by the corresponding weight coefficient to determine the target value. Whether the target value is determined by multiplying the weight coefficient by the reference value, by dividing, or in some other manner can follow the user's setting rules or actual requirements; the application is not limited in detail herein.
The image coding device can select a corresponding weight coefficient x according to different modulation factors i of the target code rate; the probability estimation unit uses x to adjust the reference value ε of the probability estimation model parameter, i.e., multiplies or divides ε by x, so as to obtain a target value of the model parameter adapted to the current feature value range. The probability distribution of the first feature value can then be obtained from the target value, achieving more accurate probability estimation. For example, a weight coefficient c is determined according to the target code rate, and the value obtained by multiplying or dividing the reference value 3 by c is taken as the target value. In practical applications, the present application does not particularly limit how the target value is determined based on the weight coefficient and the reference value; this is merely an example.
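Mode 1 can be sketched as a nearest-rate lookup followed by scaling (a toy illustration; the rate-to-coefficient table values and the choice of multiplication over division are assumptions made for this example only):

```python
def target_value_mode1(target_rate, rate_to_weight, epsilon):
    """Pick the weight coefficient whose code rate is nearest the target,
    then scale the reference value epsilon by it (multiplication assumed;
    division is equally permissible per the text)."""
    nearest_rate = min(rate_to_weight, key=lambda r: abs(r - target_rate))
    return rate_to_weight[nearest_rate] * epsilon

# Example mirroring the text: rate 1 -> coefficient a, rate 2 -> coefficient b.
table = {1.0: 0.8, 2.0: 1.3}               # hypothetical coefficients
t1 = target_value_mode1(1.0, table, 3.0)   # target rate 1 matches rate 1 exactly
t2 = target_value_mode1(1.8, table, 3.0)   # 1.8 is closer to rate 2, so b is used
```

The table here stands in for the trained weight coefficient set or candidate coefficients described above.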
It should be noted that, when mode 1 above is used to determine the target value and the target code rate is indicated by its modulation factor, the codec structure may be as shown in fig. 3B: the modulation factor of the target code rate adjusts the feature gain module Gu1 and the feature inverse gain module IGU1, and in addition, the probability estimation unit C2 may multiply the reference value of the model parameter by the gain coefficient q(i) determined by the modulation factor i of the target code rate to obtain the target value εᵢ of the model parameter, and determine the probability distribution of ẑ based on εᵢ.
By determining the target value in the mode, the method can adapt to the requirement of multiple code rates, reduce compressed code streams and improve the rate distortion performance of image compression.
Mode 2: the reference values include M values, where M is a positive integer greater than or equal to 2; the image encoding device selects the target value from the M reference values according to the target code rate.
It should be noted that, in practical application, the probability estimation model may pre-store reference values of a plurality of model parameters, and after determining the target code rate, one of the plurality of reference values is directly selected as the target value based on the requirement of the target code rate, so that the data processing efficiency can be improved.
Optionally, the M reference values correspond to M code rates. The image coding device determines a code rate α that has an association relationship with the target code rate (when the target code rate is indicated by a modulation factor, the target code rate can be understood as the target modulation factor; when it is indicated by a target quantization parameter or target gain vector, it can be understood as the target quantization parameter or target gain vector). The association relationship includes one of the following: the target code rate is the same as the value of α, or the value of the target code rate is in the neighborhood of α. The reference value corresponding to α is taken as the target value.
It should be noted that, when the probability estimation model pre-stores reference values of a plurality of model parameters, the reference values may have an association relationship with code rates. If the target code rate has an association relationship with a reference value, that reference value is directly used as the target value; if not, the code rate closest to the target code rate, namely code rate α, is determined from the code rates associated with reference values, and the reference value corresponding to α is taken as the target value.
Specifically, taking the case where the target code rate is indicated by its modulation factor as an example, the image encoding device may construct a candidate probability estimation model parameter set {ε₁, …, εₙ} (i.e., a set of reference values of the model parameters). During training, the candidate model parameter εᵢ corresponding to each modulation factor i is selected to participate in the training of the multi-code-rate model (the target code rate modulation factor i is consistent with the code rate modulation factor used for selecting the gain step in normal multi-code-rate model training); by continuously varying the modulation factor i during training, a candidate parameter set that adapts to different code rates on a single model is obtained. During encoding, according to the current code rate, the model parameter corresponding to the code rate closest to the current code rate can be used as the target value of the model parameter for probability estimation of the current feature value.
For the modulation factor i of the target code rate, the model parameters ε₁, …, εₙ are obtained by training for i = 1, 2, …, n respectively; thus the reference value set of the model parameters is εᵢ ∈ {ε₁, …, εₙ}, where each εᵢ corresponds to a different probability distribution. If the modulation factor of the current target code rate is i = 1, the model parameter ε₁ is obtained by indexing into the reference value set, and ε₁ is taken as the target value. If the modulation factor of the current target code rate is not in the value range of i, the code rate modulation factor closest to it within the range is selected, and the target value is indexed from the reference value set according to that closest factor. For example, code rate modulation factor 1 corresponds to reference value ε₁, and code rate modulation factor 2 corresponds to ε₂. If the modulation factor of the target code rate is 1, the directly indexed target value is ε₁; if it is 2.1, which is close to factor 2, ε₂ can be taken as the target value. The present application is illustrated here by way of example and is not limited in practice.
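Mode 2 thus reduces to a table lookup with snapping to the nearest valid modulation factor, which can be sketched as (the ε placeholders and helper name are illustrative only):

```python
def target_value_mode2(i, reference_values):
    """reference_values holds [eps_1, ..., eps_n], one per modulation
    factor; a fractional or out-of-range i is snapped to the nearest
    valid integer factor before indexing."""
    n = len(reference_values)
    idx = min(max(int(round(i)), 1), n)
    return reference_values[idx - 1]

eps = ["eps1", "eps2", "eps3", "eps4", "eps5", "eps6"]  # n = 6 as in the text
r1 = target_value_mode2(1, eps)     # direct index: eps1
r2 = target_value_mode2(2.1, eps)   # 2.1 is nearest factor 2: eps2
```

Because no arithmetic on the reference value is needed at coding time, this mode trades a little storage (M stored parameter sets) for faster selection.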
It should be noted that, when mode 2 above is used to determine the target value and the target code rate is indicated by its modulation factor, the codec structure may be as shown in fig. 3C: the modulation factor of the target code rate adjusts the feature gain module Gu1 and the feature inverse gain module IGU1, and the probability estimation unit C2 selects the target value εᵢ of the model parameter from the multiple reference values of the model parameter based on the modulation factor i of the target code rate, and determines the probability distribution of ẑ based on εᵢ.
By determining the target value in the mode, the method can adapt to the requirement of multiple code rates, reduce compressed code streams and improve the rate distortion performance of image compression.
Mode 3, the target code rate is greater than the first code rate and less than the second code rate, the first code rate corresponds to a first reference value of the model parameter of the probability estimation model, the second code rate corresponds to a second reference value of the model parameter of the probability estimation model, and the target value is obtained through interpolation operation of the first reference value and the second reference value.
The code rate mentioned in mode 3 above may likewise be understood as a modulation factor, quantization parameter, or gain vector: the target code rate being greater than the first code rate may be understood as the target modulation factor being greater than the first modulation factor, the target quantization parameter being greater than the first quantization parameter, or the target gain vector being greater than the first gain vector, and so on. These are only examples; other occurrences of "code rate" may be understood similarly and are not repeated.
For example, the target code rate is greater than code rate 1 and less than code rate 2; code rate 1 corresponds to reference value 1, code rate 2 corresponds to reference value 2, and the target value is determined by interpolating reference value 1 and reference value 2, where the interpolation may be nearest-neighbor interpolation or weighted interpolation. If the interpolation is weighted, the target value may be determined based on a weighted calculation of reference value 1 and reference value 2. In practical application, the interpolation mode can be flexibly selected according to user requirements, and interpolation modes other than the above may also be adopted; the present application is not particularly limited herein.
In practical application, the number of target values determined directly based on the target code rate (i.e., mode 1 or mode 2) may be limited, but once interpolation is adopted, the value range of the target values can be expanded to adapt to more code rate requirements.
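The weighted variant of mode 3 can be sketched as a linear blend of the two bracketing reference values (linear weighting is an assumption; the text also permits nearest-neighbor interpolation, and the numbers below are hypothetical):

```python
def target_value_mode3(target_rate, rate1, eps1, rate2, eps2):
    """Linearly interpolate between the reference values of the two
    bracketing code rates, assuming rate1 < target_rate < rate2."""
    w = (target_rate - rate1) / (rate2 - rate1)
    return (1.0 - w) * eps1 + w * eps2

# A target rate halfway between the two bracketing rates
# yields the mean of the two reference values.
mid = target_value_mode3(1.5, 1.0, 2.0, 2.0, 4.0)
```

Any target rate between two trained rates now maps to some target value, rather than only the discrete trained points.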
In step 403, the image encoding device determines a probability distribution of the first feature value from the target value.
Optionally, the target code rate is greater than the first code rate and less than the second code rate, the first code rate corresponds to a first reference value of a model parameter of the probability estimation model, the first probability distribution is obtained through the first reference value, the second code rate corresponds to a second reference value of the model parameter of the probability estimation model, the second probability distribution is obtained through the second reference value, and the probability distribution of the first characteristic value is obtained through interpolation operation of the first probability distribution and the second probability distribution.
The parameters involved in the interpolation operation may include the target code rate modulation factor i, the interpolation parameter ft ∈ [0, 1] used in training, the gain parameters {q(1), q(2), …, q(n)}, and the Gaussian probability distribution model F(x; μ, σ). Assume the modulation form of the gain parameter on the Gaussian distribution parameters is f1(μ, q(i)) = μ × q(i) and f2(σ, q(i)) = σ × q(i) (this formula is only an example; the modulation form of the gain parameter and the distribution parameters may be set as needed), so the modulated Gaussian probability distribution is F(x; f1(μ, q(i)), f2(σ, q(i))).
If nearest-neighbor interpolation is applied to the probability distribution, the probability distribution of the first feature value is determined by comparing the interpolation parameter with a neighbor decision threshold t: when 0 <= ft < t, the probability distribution corresponding to the first feature value is F(x; f1(μ, q(i)), f2(σ, q(i))); when t <= ft <= 1, it is F(x; f1(μ, q(i+1)), f2(σ, q(i+1))).
If weighted interpolation is applied to the probability distribution, the probability distributions obtained through the gain parameters are weighted, and the probability distribution corresponding to the first feature value is (1-ft) × F(x; f1(μ, q(i)), f2(σ, q(i))) + ft × F(x; f1(μ, q(i+1)), f2(σ, q(i+1))).
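The two interpolation rules can be sketched together, using the assumed modulation forms f1(μ, q) = μ × q and f2(σ, q) = σ × q stated above (a simplification; function names are illustrative):

```python
import math

def gauss(x, mu, sigma):
    """Gaussian density F(x; mu, sigma)."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def modulated(x, mu, sigma, q):
    """F(x; f1(mu, q), f2(sigma, q)) with f1 = mu*q, f2 = sigma*q."""
    return gauss(x, mu * q, sigma * q)

def interp_prob(x, mu, sigma, q, i, ft, t=0.5, weighted=True):
    """Probability of x interpolated between modulation factors i and i+1.
    q is the gain-parameter list {q(1), ..., q(n)}; i is 1-indexed,
    so q(i) is q[i-1] and q(i+1) is q[i]."""
    p_lo = modulated(x, mu, sigma, q[i - 1])
    p_hi = modulated(x, mu, sigma, q[i])
    if weighted:
        return (1.0 - ft) * p_lo + ft * p_hi
    return p_lo if ft < t else p_hi   # nearest-neighbor rule with threshold t
```

Setting ft = 0 or ft = 1 recovers the two trained distributions exactly; intermediate ft values cover the rates between them.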
It should be noted that, when determining the probability distribution corresponding to the first feature value in this way and the target code rate is indicated by its modulation factor i, the codec structure may be as shown in fig. 3D: the modulation factor i adjusts the feature gain module Gu1 and the feature inverse gain module IGU1; in addition, the probability estimation unit C2 determines, based on the modulation factor i, the two gain parameters q(i) and q(i+1) to use, multiplies q(i) and q(i+1) by the reference value of the model parameter to obtain εᵢ and εᵢ₊₁, determines a probability distribution based on εᵢ and a probability distribution based on εᵢ₊₁, and interpolates between the two based on the interpolation parameter ft to obtain the probability distribution of ẑ.
Alternatively, the interpolation parameter ft can be used to interpolate the trained gain parameters, giving a gain parameter (1-ft) × q(i) + ft × q(i+1) suited to the current requirement, and the probability distribution of the first feature value is obtained from the interpolated gain parameter as F(x; f1(μ, (1-ft) × q(i) + ft × q(i+1)), f2(σ, (1-ft) × q(i) + ft × q(i+1))).
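This parameter-level variant (interpolating the gain first, then modulating a single distribution) can be sketched as follows, again under the assumed modulation forms f1 = μ × q and f2 = σ × q:

```python
import math

def interp_gain(q_i, q_next, ft):
    """q(i, ft) = (1 - ft) * q(i) + ft * q(i+1)."""
    return (1.0 - ft) * q_i + ft * q_next

def interp_modulated_pdf(x, mu, sigma, q_i, q_next, ft):
    """F(x; f1(mu, q(i,ft)), f2(sigma, q(i,ft))) using the interpolated gain."""
    q = interp_gain(q_i, q_next, ft)
    m, s = mu * q, sigma * q
    return math.exp(-((x - m) ** 2) / (2.0 * s ** 2)) / (s * math.sqrt(2.0 * math.pi))
```

Unlike the distribution-level interpolation, this evaluates only one Gaussian per feature element, at the cost of interpolating in parameter space rather than probability space; the two variants generally give different (both valid) distributions.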
It should be noted that, when determining the probability distribution corresponding to the first feature value in this way and the target code rate is indicated by its modulation factor, the codec structure may be as shown in fig. 3E: the modulation factor i adjusts the feature gain module Gu1 and the feature inverse gain module IGU1; the probability estimation unit C2 determines the two gain parameters q(i) and q(i+1) based on the modulation factor i, interpolates them based on the interpolation parameter ft to obtain q(i, ft), multiplies q(i, ft) by the reference value of the model parameter to obtain ε(i, ft), and determines the probability distribution of ẑ based on ε(i, ft).
The probability distribution of the first characteristic value is determined through interpolation operation, so that the method can adapt to the requirement of more code rates, and the interpolation operation mode can be flexibly selected according to the requirement of a user in actual application.
In step 404, the image encoding device performs entropy encoding on the first feature value according to the probability distribution of the first feature value, so as to obtain a compressed code stream.
Optionally, the image encoding device performs entropy encoding on the first feature value according to its probability distribution to obtain a first compressed code stream (i.e., the compressed code stream transmitted from AE2 to AD2 in figs. 3A-3E); performs entropy decoding on the first compressed code stream to determine an estimated value of the first feature value; determines the probability distribution of the second feature value according to the estimated value of the first feature value; and performs entropy encoding on the second feature value according to its probability distribution to obtain a second compressed code stream (i.e., the compressed code stream transmitted from AE1 to AD1 in figs. 3A-3E). That is, after the probability distribution of the first feature value is determined using the target value, the image coding device entropy encodes the first feature value according to that distribution to obtain the first compressed code stream, then entropy decodes the first compressed code stream to determine the estimated value of the first feature value, determines the probability distribution of the second feature value based on that estimated value, and entropy encodes the second feature value according to the probability distribution of the second feature value to obtain the second compressed code stream.
In step 405, the image encoding device transmits the compressed code stream to the image decoding device.
Accordingly, the image decoding apparatus receives the compressed code stream. In the case where the image encoding apparatus obtains the first compressed code stream and the second compressed code stream through compression, the image encoding apparatus may transmit both the first compressed code stream and the second compressed code stream. The compressed code stream may include the target code rate, so that the image decoding apparatus can quickly reconstruct the image.
Steps 401 to 405 are performed by the image encoding device, which may transmit the compressed code stream to the image decoding device after determining it; the image decoding device and the image encoding device are usually decoupled and not located in the same device. Next, the operation steps of the image decoding apparatus are described through steps 501 to 504.
In step 501, the image decoding apparatus determines a target value of a model parameter for estimating a probability distribution of a first feature value according to a target code rate and a reference value of the model parameter of the probability estimation model.
In step 502, the image decoding apparatus determines a probability distribution of the first feature value from the target value.
Steps 501 and 502 described above may be understood with reference to steps 402 and 403 of the image encoding device part, and are not described again here.
In step 503, the image decoding apparatus performs entropy decoding on the compressed code stream according to the probability distribution of the first feature value, so as to obtain an estimated value of the first feature value.
In step 504, the image decoding apparatus reconstructs an image based on the estimated value of the first feature value.
When the image decoding device receives two compressed code streams, it may perform entropy decoding on the first compressed code stream to determine an estimated value of the first feature value; determine the probability distribution of the second feature value according to the estimated value of the first feature value; perform entropy decoding on the second compressed code stream according to the probability distribution of the second feature value to obtain an estimated value of the second feature value; determine the second feature map according to the estimated value of the second feature value; and reconstruct the image based on the second feature map to obtain a reconstructed image.
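The decoding order above — first stream, then model prediction, then second stream — can be sketched as follows; the lambda stand-ins, the conditioning function, and the final inverse-gain step (assuming the first feature map carries gain values, as in the optional design) are illustrative assumptions rather than the embodiment's definition:

```python
def decode_two_streams(decode_first, predict_params, decode_second):
    """Two-stream decoding order: the first stream must be fully decoded
    before the probability model of the second stream can be built."""
    z_hat = decode_first()            # estimated first feature values
    params = predict_params(z_hat)    # probability distribution parameters
    y_hat = decode_second(params)     # estimated second feature values
    return z_hat, y_hat

# toy stand-ins: entropy decoding of quantized symbols is lossless, so the
# "decoders" simply return the transmitted integers
z_sent, y_sent = [2, 4], [8, 12]
z_hat, y_hat = decode_two_streams(
    decode_first=lambda: list(z_sent),
    predict_params=lambda z: [(0.0, 1.0 + g) for g in z],  # hypothetical
    decode_second=lambda params: list(y_sent),
)
# hypothetical inverse-gain step to recover the second feature map
second_feature_map = [y / g for y, g in zip(y_hat, z_hat)]
```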
When the first feature value of the first feature map is encoded, the target value of the model parameter is determined based on the target code rate and the reference value of the model parameter of the probability estimation model. The target value is not a fixed number: it can be flexibly adjusted as the target code rate and the reference value change, so the requirements of multiple code rates can be met and the size of the compressed code stream can be reduced. In addition, the rate-distortion performance of image compression can be improved.
The solution provided by the embodiments of the present application has been introduced above mainly from the perspective of interaction between devices. It will be appreciated that, in order to achieve the above-described functionality, each device may comprise corresponding hardware structures and/or software modules that perform each function. Those of skill in the art will readily appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is implemented as hardware or computer-software-driven hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The embodiments of the present application may divide the device into functional units according to the above method examples; for example, each functional unit may be divided corresponding to each function, or two or more functions may be integrated in one unit. The integrated units may be implemented in hardware or as software functional units.
In case of using integrated units, fig. 5 shows a possible exemplary block diagram of an image processing apparatus involved in an embodiment of the present application. As shown in fig. 5, the image processing apparatus 600 may include: a processing unit 601 and a transceiver unit 602. The processing unit 601 is used for controlling and managing the operation of the image processing apparatus 600. The transceiver unit 602 is used to support communication of the image processing apparatus 600 with other devices. Alternatively, the transceiver unit 602 may include a receiving unit and/or a transmitting unit for performing receiving and transmitting operations, respectively. Optionally, the image processing apparatus 600 may further comprise a storage unit for storing program code and/or data of the image processing apparatus 600. The transceiver unit may be referred to as an input-output unit, a communication unit, etc., and may be a transceiver; the processing unit may be a processor. When the image processing apparatus is a module (e.g., a chip) in a communication device, the transceiver unit may be an input/output interface, an input/output circuit, an input/output pin, or the like, and may also be referred to as an interface, a communication interface, or an interface circuit, or the like; the processing unit may be a processor, a processing circuit, a logic circuit, or the like. Specifically, the image processing apparatus may be the above-described image encoding apparatus, image decoding apparatus, or the like.
In one embodiment, the transceiver 602 is configured to obtain a first feature map of the input image, where the first feature map includes a first feature value; a processing unit 601, configured to determine a target value of a model parameter for estimating a probability distribution of the first feature value according to a target code rate and a reference value of the model parameter of the probability estimation model; determining a probability distribution of the first characteristic value through the target value; entropy coding is carried out on the first characteristic value according to probability distribution of the first characteristic value so as to obtain a compressed code stream; the transceiver 602 is further configured to transmit the compressed code stream.
In an alternative way, the first feature map is determined by extracting gain values of second feature values of a second feature map of the input image; the first feature map is different from the second feature map; the processing unit 601 is specifically configured to entropy encode the first feature value according to a probability distribution of the first feature value, so as to obtain a first compressed code stream; entropy decoding is carried out on the first compressed code stream to determine an estimated value of the first characteristic value; determining probability distribution of the second characteristic value according to the estimated value of the first characteristic value; entropy coding the second characteristic value according to the probability distribution of the second characteristic value to obtain a second compressed code stream; the transceiver 602 is specifically configured to transmit the first compressed code stream and the second compressed code stream.
In an alternative manner, the processing unit 601 is specifically configured to determine a weight coefficient of the reference value according to the target code rate; the target value is determined based on the weight coefficient of the reference value and the reference value.
In an alternative manner, the processing unit 601 is specifically configured to multiply or divide the reference value by the corresponding weight coefficient to determine the target value.
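A minimal sketch of this multiply-or-divide derivation (the weight table, code rates, and reference value below are hypothetical, not values from the embodiment):

```python
def derive_target_value(reference, target_rate, weight_table, mode="multiply"):
    """Scale the shared reference parameter by the weight coefficient that
    the (trained) weight set associates with the target code rate."""
    weight = weight_table[target_rate]
    if mode == "multiply":
        return reference * weight
    if mode == "divide":
        return reference / weight
    raise ValueError(f"unknown mode: {mode}")

weights = {0.25: 0.5, 0.5: 1.0, 1.0: 2.0}   # hypothetical trained coefficients
sigma_ref = 3.0                              # hypothetical reference value
sigma_low = derive_target_value(sigma_ref, 0.25, weights)             # 1.5
sigma_high = derive_target_value(sigma_ref, 1.0, weights, "divide")   # 1.5
```

One weight set per code rate keeps the probability estimation model itself unchanged; only the lightweight coefficients vary with the rate.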
In an alternative manner, the weight coefficient of the reference value is selected from the corresponding weight coefficient set according to the target code rate, and the weight coefficient in the weight coefficient set is obtained by training the probability estimation model.
In an optional manner, the reference values include M, where M is a positive integer greater than or equal to 2; the processing unit 601 is specifically configured to select a target value from M reference values according to a target code rate.
In an alternative manner, the M reference values correspond to M code rates; the processing unit 601 is specifically configured to determine a code rate α having an association relationship with a target code rate in the M code rates; the association relationship includes one of the following: the target code rate is the same as the value of alpha, and the value of the target code rate is in the neighborhood range of alpha; and taking the reference value corresponding to alpha as a target value.
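The selection of a reference value by its associated code rate — an exact match, or the target rate falling within a neighborhood of a trained rate — might look like the following sketch; the rates, reference values, and neighborhood width are hypothetical:

```python
def select_reference(target_rate, rates, references, neighborhood=0.1):
    """Return the reference value whose code rate equals the target rate, or
    in whose neighborhood (here: +/- `neighborhood`) the target rate falls."""
    for rate, ref in zip(rates, references):
        if rate == target_rate or abs(rate - target_rate) <= neighborhood:
            return ref
    raise ValueError("no code rate is associated with the target code rate")

rates = [0.25, 0.5, 1.0]   # hypothetical M = 3 trained code rates
refs = [1.5, 3.0, 6.0]     # corresponding reference values
exact = select_reference(0.5, rates, refs)    # same value as a trained rate
near = select_reference(0.95, rates, refs)    # within the neighborhood of 1.0
```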
In an alternative manner, the target code rate is greater than the first code rate and less than the second code rate, the first code rate corresponds to a first reference value of the model parameter of the probability estimation model, the second code rate corresponds to a second reference value of the model parameter of the probability estimation model, and the target value is obtained through interpolation operation of the first reference value and the second reference value.
In an alternative way, the first probability distribution is obtained by the first reference value, the second probability distribution is obtained by the second reference value, and the processing unit 601 is further configured to obtain the probability distribution of the first feature value by interpolation operation of the first probability distribution and the second probability distribution.
In an alternative, the interpolation operation includes one of the following: neighbor interpolation and weighted interpolation.
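The alternative in which the target value itself is interpolated between the first and second reference values can be sketched as below, covering both listed modes; all rates and reference values are hypothetical:

```python
def interpolate_reference(target_rate, rate1, ref1, rate2, ref2, mode="weighted"):
    """Derive the target value from the reference values of the nearest lower
    (rate1) and higher (rate2) code rates."""
    assert rate1 < target_rate < rate2
    if mode == "neighbor":
        # neighbor interpolation: take the reference value of the closer rate
        return ref1 if target_rate - rate1 <= rate2 - target_rate else ref2
    # weighted interpolation: linear blend by distance to the two rates
    t = (target_rate - rate1) / (rate2 - rate1)
    return (1.0 - t) * ref1 + t * ref2

mid = interpolate_reference(0.75, 0.5, 3.0, 1.0, 6.0)             # -> 4.5
near = interpolate_reference(0.6, 0.5, 3.0, 1.0, 6.0, "neighbor") # -> 3.0
```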
In an alternative way, the compressed code stream further comprises a target code rate.
In an alternative way, the target code rate is indicated by one of the following information:
modulation factor of target code rate, quantization parameter of target code rate.
In one embodiment, the transceiving unit 602 is configured to obtain a compressed code stream of a first feature map of an image to be decoded, where the first feature map includes first feature values; a processing unit 601, configured to determine a target value of a model parameter for estimating a probability distribution of the first feature value according to a target code rate and a reference value of the model parameter of the probability estimation model; determining a probability distribution of the first characteristic value through the target value; entropy decoding is carried out on the compressed code stream according to the probability distribution of the first characteristic value so as to obtain an estimated value of the first characteristic value; and reconstructing an image according to the estimated value of the first characteristic value.
In an alternative way, the first feature map is determined by extracting gain values of second feature values of a second feature map of the input image; the compressed code stream comprises a first compressed code stream and a second compressed code stream; the processing unit 601 is specifically configured to perform entropy decoding on the first compressed code stream to determine an estimated value of the first feature value; determining probability distribution of the second characteristic value according to the estimated value of the first characteristic value; entropy decoding is carried out on the second compressed code stream according to probability distribution of the second characteristic value so as to obtain an estimated value of the second characteristic value; determining a second feature map according to the estimated value of the second feature value; and reconstructing the image based on the second feature map to obtain a reconstructed image.
As shown in fig. 6, the present application further provides an image processing apparatus 700. The image processing apparatus 700 may be a chip or a chip system. The image processing apparatus may be located in the device according to any of the above-described method embodiments, for example, an image encoding apparatus, an image decoding apparatus, or the like, to execute an operation corresponding to the device.
Alternatively, the chip system may be constituted by a chip, and may also include a chip and other discrete devices.
The image processing apparatus 700 includes a processor 710.
Processor 710, configured to execute a computer program stored in memory 720 to implement the actions of the respective devices in any of the method embodiments described above.
The image processing apparatus 700 may further comprise a memory 720 for storing a computer program.
Optionally, a memory 720 is coupled to the processor 710. The coupling is an indirect coupling or communication connection between devices, units, or modules; it may be electrical, mechanical, or in another form, and is used for information exchange between the devices, units, or modules. Optionally, the memory 720 is integrated with the processor 710.
The processor 710 and the memory 720 may each be one or more, and are not limited.
Optionally, in practical applications, the image processing apparatus 700 may or may not include the transceiver 730, which is illustrated by a dashed box, and the image processing apparatus 700 may perform information interaction with other devices through the transceiver 730. Transceiver 730 may be a circuit, bus, transceiver, or any other device that may be used to interact with information.
The specific connection medium between the transceiver 730, the processor 710, and the memory 720 is not limited in the embodiments of the present application. In fig. 6, the memory 720, the processor 710, and the transceiver 730 are connected by a bus, shown as a bold line; the manner of connection between other components is merely illustrative and not limiting. The bus may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in fig. 6, but this does not mean that there is only one bus or one type of bus.

In the embodiments of the present application, the processor may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logic blocks disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, any conventional processor, or the like. The steps of a method disclosed in connection with the embodiments of the present application may be executed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
In the embodiments of the present application, the memory may be a nonvolatile memory, such as a hard disk drive (HDD) or a solid-state drive (SSD), or a volatile memory, such as a random access memory (RAM). The memory may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory in the embodiments of the present application may also be a circuit or any other device capable of implementing a storage function, for storing computer programs, program instructions, and/or data.
Based on the above embodiments, referring to fig. 7, an embodiment of the present application further provides another image processing apparatus 800, including: interface circuitry 810 and logic circuitry 820; interface circuit 810, which may be understood as an input/output interface, may be configured to perform the steps of transceiving the respective devices in any of the method embodiments described above, e.g., step 405 described above, the step of transmitting the compressed code stream; logic 820 may be used to execute code or instructions to perform the methods performed by the devices in any of the above embodiments, and will not be described in detail.
Based on the above embodiments, the embodiments of the present application also provide a computer-readable storage medium storing instructions that, when executed, cause the methods performed by the respective devices in any of the above method embodiments to be implemented, for example, the methods performed by the image encoding apparatus and the image decoding apparatus in the embodiment shown in fig. 4. The computer-readable storage medium may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disk.
Based on the above embodiments, the present application provides an image processing system including the image encoding apparatus and the image decoding apparatus mentioned in any of the above method embodiments, which may be used to perform the methods performed by the respective devices in any of the above method embodiments.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable image processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Claims (29)

1. A feature map encoding method, comprising:
acquiring a first feature map of an input image, wherein the first feature map comprises first feature values;
determining a target value of a model parameter for estimating probability distribution of the first characteristic value according to a target code rate and a reference value of the model parameter of a probability estimation model;
determining a probability distribution of the first characteristic value through the target value;
entropy coding is carried out on the first characteristic value according to probability distribution of the first characteristic value so as to obtain a compressed code stream;
and transmitting the compressed code stream.
2. The method of claim 1, wherein the first feature map is determined by extracting gain values of second feature values of a second feature map of the input image; and the entropy encoding the first feature value according to the probability distribution of the first feature value to obtain a compressed code stream comprises:
Entropy coding is carried out on the first characteristic value according to probability distribution of the first characteristic value so as to obtain a first compressed code stream;
entropy decoding the first compressed code stream to determine an estimated value of the first eigenvalue;
determining a probability distribution of the second characteristic value according to the estimated value of the first characteristic value;
entropy coding the second characteristic value according to the probability distribution of the second characteristic value to obtain a second compressed code stream;
the transmitting the compressed code stream includes:
and transmitting the first compressed code stream and the second compressed code stream.
3. The method according to claim 1 or 2, wherein said determining a target value of a model parameter for estimating a probability distribution of said first eigenvalue from a target code rate and a reference value of said model parameter of a probability estimation model comprises:
determining a weight coefficient of the reference value according to the target code rate;
the target value is determined based on the weight coefficient of the reference value and the reference value.
4. A method according to claim 3, wherein said determining said target value based on a weight coefficient of said reference value and said reference value comprises:
And multiplying or dividing the reference value by a corresponding weight coefficient to determine the target value.
5. The method according to claim 3 or 4, wherein the weight coefficients of the reference value are selected from corresponding weight coefficient sets according to the target code rate, the weight coefficients in the weight coefficient sets being obtained by training the probability estimation model.
6. The method according to claim 1 or 2, wherein the reference values include M, the M being a positive integer of 2 or more; the determining the target value of the model parameter for estimating the probability distribution of the first characteristic value according to the target code rate and the reference value of the model parameter of the probability estimation model comprises the following steps:
and selecting the target value from the M reference values according to the target code rate.
7. The method of claim 6, wherein the M reference values correspond to M code rates; the selecting the target value from the M reference values according to the target code rate includes:
determining a code rate alpha which has an association relation with the target code rate in the M code rates; the association relationship comprises one of the following: the target code rate is the same as the value of alpha, and the value of the target code rate is in the neighborhood range of the alpha;
And taking the reference value corresponding to the alpha as the target value.
8. The method according to any one of claims 1-7, wherein the target code rate is greater than a first code rate and less than a second code rate, the first code rate corresponding to a first reference value of a model parameter of the probabilistic estimation model and the second code rate corresponding to a second reference value of the model parameter of the probabilistic estimation model, the target value being obtained by an interpolation operation of the first reference value and the second reference value.
9. The method of claim 8, wherein a first probability distribution is obtained from the first reference value and a second probability distribution is obtained from the second reference value, the determining the probability distribution of the first feature value further comprising:
and obtaining the probability distribution of the first characteristic value through interpolation operation of the first probability distribution and the second probability distribution.
10. The method of claim 8 or 9, wherein the interpolation operation comprises one of:
neighbor interpolation and weighted interpolation.
11. The method according to any of claims 1-10, wherein the compressed code stream further comprises the target code rate.
12. The method according to any of claims 1-11, wherein the target code rate is indicated by one of the following information:
the modulation factor of the target code rate and the quantization parameter of the target code rate.
13. A feature map decoding method, comprising:
obtaining a compressed code stream of a first feature map of an image to be decoded, wherein the first feature map comprises first feature values;
determining a target value of a model parameter for estimating probability distribution of the first characteristic value according to a target code rate and a reference value of the model parameter of a probability estimation model;
determining a probability distribution of the first characteristic value through the target value;
entropy decoding is carried out on the compressed code stream according to the probability distribution of the first characteristic value so as to obtain an estimated value of the first characteristic value;
reconstructing the image according to the estimated value of the first characteristic value.
14. The method of claim 13, wherein the first feature map is determined by extracting gain values for second feature values of a second feature map of the input image, and the compressed code stream comprises a first compressed code stream and a second compressed code stream; the entropy decoding the compressed code stream according to the probability distribution of the first feature value to obtain an estimated value of the first feature value comprises:
entropy decoding the first compressed code stream to determine the estimated value of the first feature value;
and the reconstructing the image according to the estimated value of the first feature value comprises:
determining a probability distribution of the second feature value according to the estimated value of the first feature value;
entropy decoding the second compressed code stream according to the probability distribution of the second feature value to obtain an estimated value of the second feature value;
determining the second feature map according to the estimated value of the second feature value; and
reconstructing the image based on the second feature map to obtain a reconstructed image.
15. The method according to claim 13 or 14, wherein the determining the target value of the model parameter for estimating the probability distribution of the first feature value according to the target code rate and the reference value of the model parameter of the probability estimation model comprises:
determining a weight coefficient of the reference value according to the target code rate; and
determining the target value according to the weight coefficient of the reference value and the reference value.
16. The method of claim 15, wherein the determining the target value according to the weight coefficient of the reference value and the reference value comprises:
multiplying or dividing the reference value by the corresponding weight coefficient to determine the target value.
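The multiply-or-divide derivation in claim 16 can be sketched as below. The function name `scale_reference` and the `WEIGHTS` table are hypothetical, and a single scalar stands in for the model parameter; claim 17 says the weight coefficients would come from training the probability estimation model.

```python
def scale_reference(reference, weight, mode="multiply"):
    """Derive the target model parameter from a reference value and a
    code-rate-dependent weight coefficient; claim 16 allows either
    multiplication or division."""
    if mode == "multiply":
        return reference * weight
    if mode == "divide":
        return reference / weight
    raise ValueError(f"unknown mode: {mode}")

# Hypothetical trained weight-coefficient set, indexed by target code rate.
WEIGHTS = {0.25: 1.6, 0.50: 1.0, 1.00: 0.7}

sigma_ref = 1.5
sigma_target = scale_reference(sigma_ref, WEIGHTS[1.00])  # approximately 1.05
```

The same reference value can thus serve several target code rates, with only the small weight table varying per rate.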
17. The method according to claim 15 or 16, wherein the weight coefficient of the reference value is selected from a corresponding set of weight coefficients according to the target code rate, the weight coefficients in the set being obtained by training the probability estimation model.
18. The method according to claim 13 or 14, wherein the reference values comprise M reference values, M being a positive integer greater than or equal to 2; and the determining the target value of the model parameter for estimating the probability distribution of the first feature value according to the target code rate and the reference value of the model parameter of the probability estimation model comprises:
selecting the target value from the M reference values according to the target code rate.
19. The method of claim 18, wherein the M reference values correspond to M code rates; and the selecting the target value from the M reference values according to the target code rate comprises:
determining a code rate α, among the M code rates, that has an association relationship with the target code rate, wherein the association relationship comprises one of the following: the target code rate is equal to α, or the value of the target code rate is within a neighborhood of α; and
taking the reference value corresponding to α as the target value.
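The selection rule of claims 18 and 19 can be sketched as follows. The name `select_reference` and the fixed `tolerance` are assumptions: the claims leave the exact neighborhood definition open, so a symmetric ±tolerance band is used here purely for illustration.

```python
def select_reference(target_rate, rate_to_reference, tolerance=0.05):
    """Select the reference value whose stored code rate equals the target
    code rate, or falls within an assumed neighborhood (+/- tolerance) of
    it; returns None if no stored rate is associated with the target."""
    for rate, reference in rate_to_reference.items():
        if abs(rate - target_rate) <= tolerance:
            return reference
    return None

refs = {0.25: "params_low", 0.50: "params_mid", 1.00: "params_high"}
select_reference(0.50, refs)   # exact match -> "params_mid"
select_reference(0.48, refs)   # within the neighborhood -> "params_mid"
```

When no stored rate is associated with the target (the `None` case), claim 20's interpolation between the two surrounding rates would apply instead.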
20. The method according to any one of claims 13-19, wherein the target code rate is greater than a first code rate and less than a second code rate, the first code rate corresponding to a first reference value of a model parameter of the probability estimation model and the second code rate corresponding to a second reference value of the model parameter of the probability estimation model, the target value being obtained by an interpolation operation on the first reference value and the second reference value.
21. The method of claim 20, wherein a first probability distribution is obtained from the first reference value and a second probability distribution is obtained from the second reference value, and the determining the probability distribution of the first feature value further comprises:
obtaining the probability distribution of the first feature value through an interpolation operation on the first probability distribution and the second probability distribution.
22. The method of claim 20 or 21, wherein the interpolation operation comprises one of:
nearest-neighbor interpolation and weighted interpolation.
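The two interpolation options of claims 20-22 can be sketched as below, with a scalar standing in for the model parameter (or, per claim 21, the same scheme could be applied to the probability values themselves). The name `interpolate_reference` is hypothetical, and linear interpolation is assumed as the "weighted" variant.

```python
def interpolate_reference(target_rate, rate1, ref1, rate2, ref2, mode="weighted"):
    """Derive the target value for a rate strictly between two stored rates.
    'weighted' does linear interpolation; 'neighbor' snaps to the reference
    of the closer rate -- the two options named in claim 22."""
    assert rate1 < target_rate < rate2
    if mode == "neighbor":
        return ref1 if (target_rate - rate1) <= (rate2 - target_rate) else ref2
    t = (target_rate - rate1) / (rate2 - rate1)
    return (1.0 - t) * ref1 + t * ref2

interpolate_reference(0.75, 0.50, 1.5, 1.00, 1.0)              # weighted -> 1.25
interpolate_reference(0.60, 0.50, 1.5, 1.00, 1.0, "neighbor")  # snaps to 1.5
```

Weighted interpolation gives a smooth rate sweep between trained operating points, while nearest-neighbor reuses an existing reference value unchanged.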
23. The method according to any of claims 13-22, wherein the compressed code stream further comprises the target code rate.
24. The method according to any of claims 13-23, characterized in that the target code rate is indicated by one of the following information:
the modulation factor of the target code rate and the quantization parameter of the target code rate.
25. An image processing apparatus, comprising: a functional module configured to implement the method according to any one of claims 1-12 or 13-24.
26. An image processing apparatus, comprising: at least one processor and a memory;
the memory is configured to store a computer program or instructions; and
the at least one processor is configured to execute the computer program or instructions to cause the method of any one of claims 1-12 or 13-24 to be performed.
27. A chip system, comprising: a processing circuit coupled with a storage medium;
the processing circuit is configured to execute part or all of a computer program or instructions in the storage medium, so as to implement the method of any one of claims 1-12 or 13-24 when the part or all of the computer program or instructions is executed.
28. A computer readable storage medium storing instructions which, when executed by a computer, cause the method of any one of claims 1-12 or any one of claims 13-24 to be performed.
29. A computer program product comprising a computer program or instructions which, when run on a computer, causes the method of any one of the preceding claims 1-12 or 13-24 to be performed.
CN202210234510.XA 2022-03-10 2022-03-10 Feature map encoding and decoding method and device Pending CN116778003A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202210234510.XA CN116778003A (en) 2022-03-10 2022-03-10 Feature map encoding and decoding method and device
PCT/CN2023/079510 WO2023169319A1 (en) 2022-03-10 2023-03-03 Feature map coding method, feature map decoding method, and apparatus
TW112108485A TW202403660A (en) 2022-03-10 2023-03-08 A method and apparatus for feature map encoding and decoding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210234510.XA CN116778003A (en) 2022-03-10 2022-03-10 Feature map encoding and decoding method and device

Publications (1)

Publication Number Publication Date
CN116778003A true CN116778003A (en) 2023-09-19

Family

ID=87937242

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210234510.XA Pending CN116778003A (en) 2022-03-10 2022-03-10 Feature map coding and feature map coding method and device

Country Status (3)

Country Link
CN (1) CN116778003A (en)
TW (1) TW202403660A (en)
WO (1) WO2023169319A1 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10652581B1 (en) * 2019-02-27 2020-05-12 Google Llc Entropy coding in image and video compression using machine learning
CN110717948A (en) * 2019-04-28 2020-01-21 合肥图鸭信息科技有限公司 Image post-processing method, system and terminal equipment
CN113132723B (en) * 2019-12-31 2023-11-14 武汉Tcl集团工业研究院有限公司 Image compression method and device
CN113259665B (en) * 2020-02-07 2022-08-09 华为技术有限公司 Image processing method and related equipment

Also Published As

Publication number Publication date
TW202403660A (en) 2024-01-16
WO2023169319A1 (en) 2023-09-14

Similar Documents

Publication Publication Date Title
CN110494892B (en) Method and apparatus for processing multi-channel feature map images
US10623775B1 (en) End-to-end video and image compression
CN111641832B (en) Encoding method, decoding method, device, electronic device and storage medium
US11503295B2 (en) Region-based image compression and decompression
KR102299958B1 (en) Systems and methods for image compression at multiple, different bitrates
CN110300301B (en) Image coding and decoding method and device
US11012718B2 (en) Systems and methods for generating a latent space residual
US10812790B2 (en) Data processing apparatus and data processing method
CN111641826B (en) Method, device and system for encoding and decoding data
CN111314709A (en) Video compression based on machine learning
US10972749B2 (en) Systems and methods for reconstructing frames
CN113994348A (en) Linear neural reconstruction for deep neural network compression
CN116582685A (en) AI-based grading residual error coding method, device, equipment and storage medium
KR102113904B1 (en) Encoder, decoder and method of operation using interpolation
KR20160078984A (en) Method and apparatus for building an estimate of an original image from a low-quality version of the original image and an epitome
KR102245682B1 (en) Apparatus for compressing image, learning apparatus and method thereof
CN116778003A (en) Feature map encoding and decoding method and device
CN116886922A (en) Video processing method, video processing device, electronic equipment and computer readable storage medium
US8306341B2 (en) Image data compression apparatus and decoding apparatus
WO2023024115A1 (en) Encoding method, decoding method, encoder, decoder and decoding system
US20220321879A1 (en) Processing image data
WO2023092388A1 (en) Decoding method, encoding method, decoder, encoder, and encoding and decoding system
CN115243043A (en) Depth image coding and decoding method and system with controllable entropy decoding complexity
CN117813634A (en) Method and apparatus for encoding/decoding image or video
CN116112673A (en) Encoding and decoding method and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication