WO2022253249A1 - Method and apparatus for encoding and decoding feature data - Google Patents

Method and apparatus for encoding and decoding feature data

Info

Publication number
WO2022253249A1
WO2022253249A1 (PCT/CN2022/096510, CN2022096510W)
Authority
WO
WIPO (PCT)
Prior art keywords
feature element
feature
value
probability
threshold
Prior art date
Application number
PCT/CN2022/096510
Other languages
English (en)
French (fr)
Inventor
毛珏
赵寅
闫宁
杨海涛
张恋
王晶
师一博
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. filed Critical Huawei Technologies Co., Ltd.
Priority to JP2023574690A priority Critical patent/JP2024520151A/ja
Priority to CA3222179A priority patent/CA3222179A1/en
Priority to BR112023025167A priority patent/BR112023025167A2/pt
Priority to EP22815293.0A priority patent/EP4336829A1/en
Priority to KR1020237045517A priority patent/KR20240016368A/ko
Priority to AU2022286517A priority patent/AU2022286517A1/en
Publication of WO2022253249A1 publication Critical patent/WO2022253249A1/zh
Priority to US18/526,406 priority patent/US20240105193A1/en

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/13Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/3068Precoding preceding compression, e.g. Burrows-Wheeler transformation
    • H03M7/3079Context modeling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0475Generative networks
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16Vocoder architecture
    • G10L19/18Vocoders using multiple modes
    • G10L19/22Mode decision, i.e. based on audio signal content versus external parameters
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/40Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136Incoming video signal characteristics or properties
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/44Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/91Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/0017Lossless audio signal coding; Perfect reconstruction of coded audio signal by transmission of coding error
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/60General implementation details not specific to a particular type of compression
    • H03M7/6064Selection of Compressor
    • H03M7/6082Selection strategies

Definitions

  • Embodiments of the present invention relate to the technical field of image or audio compression based on artificial intelligence (AI), in particular to a method and device for encoding and decoding feature data.
  • Image or audio encoding and decoding (referred to as codec) is widely used in digital image or audio applications, such as broadcasting digital TV, image or audio transmission on the Internet and mobile networks, video or voice chat, and real-time conversations such as video or voice conferencing applications, DVD and Blu-ray discs, image or audio content capture and editing systems, and security applications for camcorders.
  • A video is composed of multiple frames of images, so the images in this application may be individual images or images within a video.
  • image (or audio) data is usually compressed before being transmitted in modern telecommunication networks.
  • Image (or audio) size may also be an issue when storing video on a storage device, as memory resources may be limited.
  • Image (or audio) compression equipment typically uses software and/or hardware at the source to encode image (or audio) data prior to transmission or storage, thereby reducing the amount of data required to represent a digital image (or audio).
  • the compressed data is then received at the destination side by an image (or audio) decompression device.
  • Formulation of the VVC video standard was completed in June 2020, and the standard includes almost all technical algorithms that can bring significant improvements in compression efficiency. It is therefore difficult to achieve major technological breakthroughs in a short period of time by continuing to study new compression coding algorithms along the traditional signal-processing path.
  • Unlike traditional image algorithms, which optimize each module of image compression through manual design, end-to-end AI image compression is optimized as a whole, so the compression effect of the AI image compression scheme is better.
  • The variational autoencoder (Variational Autoencoder, VAE) method is the mainstream technical solution of current AI lossy image compression technology.
  • The current mainstream technical solution is to obtain the image feature map of the image to be encoded through an encoding network and then perform entropy encoding on the image feature map; however, the entropy encoding process suffers from high complexity.
  • the present application provides a feature data encoding and decoding method and device, which can reduce encoding and decoding complexity without affecting encoding and decoding performance.
  • a method for encoding characteristic data including:
  • the feature data to be encoded includes a plurality of feature elements, and the plurality of feature elements include a first feature element;
  • Entropy coding is performed on the first feature element only when it is determined that entropy coding needs to be performed on the first feature element.
  • the feature data includes an image feature map, or an audio feature variable, or an image feature map and an audio feature variable. It can be one-dimensional, two-dimensional or multi-dimensional data output by the encoding network, where each data is a feature element. It should be noted that the meanings of the feature point and the feature element in this application are the same.
  • the first feature element is any feature element to be encoded in the feature data to be encoded.
  • The probability estimation process of obtaining the probability estimation result of the first feature element can be realized through a probability estimation network; alternatively, the probability estimation process can use a traditional non-network probability estimation method to perform probability estimation on the feature data.
  • When the input of probability estimation does not include context information, the probability estimation results of the feature elements can be output in parallel; when the input of probability estimation includes context information, the probability estimation results of the feature elements must be output serially.
  • The side information is feature information obtained by inputting the feature data into a neural network for further extraction; the number of feature elements contained in the side information is less than that of the feature data.
  • the side information of the feature data can be encoded into the code stream.
  • entropy coding does not need to be performed on the first feature element of the feature data.
  • When the current first feature element is the Pth feature element of the feature data, the judgment of the (P+1)th feature element of the feature data, and the execution or non-execution of entropy coding according to the judgment result, are then started, where P is a positive integer, P is less than M, and M is the number of feature elements in the entire feature data.
  • Judging whether to perform entropy coding on the first feature element includes: when the probability estimation result of the first feature element satisfies a preset condition, judging that entropy coding needs to be performed on the first feature element; or, when the probability estimation result of the first feature element does not satisfy the preset condition, judging that entropy coding does not need to be performed on the first feature element.
  • The probability estimation result of the first feature element is the probability value that the first feature element takes a value k, and the preset condition is that this probability value is less than or equal to the first threshold, where k is an integer.
  • k is a certain value in the possible value range of the above-mentioned first characteristic element.
  • the value range that the first feature element can take is [-255, 255].
  • For example, k may be set to 0 and the first threshold to 0.5: entropy coding is performed on a first feature element whose probability value is less than or equal to 0.5, and is not performed on a first feature element whose probability value is greater than 0.5.
  • the probability value that the first feature element takes a value of k is the maximum probability value among the probability values of all possible values of the first feature element.
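As an illustration of the judgment described above, the following sketch decides per feature element whether entropy coding is needed, assuming k = 0 and a first threshold of 0.5. All function and variable names are our own, not taken from the patent, and the probabilities are invented example values:

```python
import numpy as np

K = 0                   # the value k whose probability is estimated
FIRST_THRESHOLD = 0.5   # illustrative first threshold

def needs_entropy_coding(prob_k: float, threshold: float = FIRST_THRESHOLD) -> bool:
    """Entropy-code the element only when P(element == k) <= threshold."""
    return prob_k <= threshold

def encode_feature_data(values: np.ndarray, prob_k: np.ndarray) -> list:
    """Return the subset of feature elements that must actually be entropy coded."""
    coded = []
    for value, p in zip(values.ravel(), prob_k.ravel()):
        if needs_entropy_coding(p):
            coded.append(value)  # would be passed on to the entropy coder
        # otherwise the element is skipped; the decoder reconstructs it as k
    return coded

values = np.array([3, 0, -1, 0])
probs = np.array([0.10, 0.92, 0.40, 0.85])  # P(value == 0) for each element
print(encode_feature_data(values, probs))   # -> [3, -1]
```

Only the elements whose most likely value is uncertain (probability of k at most 0.5) are entropy coded; the rest contribute no bits to the stream.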
  • The first threshold selected for a coded stream at a low code rate is smaller than the first threshold selected for a coded stream at a high code rate.
  • The specific bit rate is related to the resolution and content of the image. Taking the public Kodak dataset as an example, a bit rate lower than 0.5 bpp is considered low; otherwise it is considered high.
  • the first threshold may be configured according to actual needs, which is not limited here.
  • the entropy encoding complexity can be flexibly reduced according to requirements through a flexible first threshold setting manner.
  • the probability estimation result of the first feature element includes a first parameter and a second parameter of a probability distribution of the first feature element.
  • the first parameter of the probability distribution of the first feature element is the mean value of the Gaussian distribution of the first feature element
  • the second parameter of the probability distribution of the first feature element is the variance of the Gaussian distribution of the first feature element; or, when the probability distribution is a Laplace distribution, the first parameter of the probability distribution of the first feature element is the location parameter of the Laplace distribution of the first feature element, and the second parameter of the probability distribution of the first feature element is the scale parameter of the Laplace distribution of the first feature element.
  • the absolute value of the difference between the first parameter of the probability distribution of the first feature element and the value k of the first feature element is greater than or equal to the second threshold;
  • a second parameter of the probability distribution of the first feature element is greater than or equal to a third threshold
  • the sum of the absolute value of the difference between the first parameter of the probability distribution of the first feature element and the value k of the first feature element and the second parameter of the probability distribution of the first feature element is greater than or equal to a fourth threshold .
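The three parameter-based conditions above (for a Gaussian distribution with mean mu and standard deviation sigma) can be sketched as follows. The names t2, t3 and t4 stand for the second, third and fourth thresholds; their values here are purely illustrative assumptions:

```python
def needs_entropy_coding_gaussian(mu: float, sigma: float, k: int,
                                  t2: float, t3: float, t4: float) -> bool:
    """Entropy-code when any one of the three preset conditions holds:
    |mu - k| >= t2, or sigma >= t3, or |mu - k| + sigma >= t4."""
    return (abs(mu - k) >= t2) or (sigma >= t3) or (abs(mu - k) + sigma >= t4)

# A distribution sharply peaked at k can be skipped (all conditions fail):
print(needs_entropy_coding_gaussian(mu=0.01, sigma=0.05, k=0,
                                    t2=0.3, t3=0.2, t4=0.4))  # False
# A mean far from k triggers the first condition:
print(needs_entropy_coding_gaussian(mu=1.2, sigma=0.05, k=0,
                                    t2=0.3, t3=0.2, t4=0.4))  # True
```

Intuitively, an element is skipped only when its distribution is both centred near k and narrow, so that the value k is almost certain.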
  • the first parameter of the probability distribution of the first feature element is the mean value of the mixed Gaussian distribution of the first feature element
  • the second parameter of the probability distribution of the first feature element is The variance of the mixed Gaussian distribution of the first feature element
  • the sum of the absolute values of the differences between each mean value of the mixed Gaussian distribution of the first feature element and the value k of the first feature element, plus any variance of the mixed Gaussian distribution of the first feature element, is greater than or equal to the fifth threshold;
  • the difference between any mean value of the mixed Gaussian distribution of the first feature element and the value k of the first feature element is greater than or equal to the sixth threshold;
  • Any variance of the mixed Gaussian distribution of the first feature element is greater than or equal to the seventh threshold.
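One possible reading of the mixed-Gaussian conditions above is sketched below. The exact combination of means and variances is our interpretation of the text, and the threshold names t5, t6 and t7 (fifth to seventh thresholds) carry illustrative values only:

```python
def needs_entropy_coding_gmm(means, variances, k, t5, t6, t7):
    """Entropy-code when any one of the three mixed-Gaussian conditions holds."""
    # sum over all means of |mean - k|, plus any single variance, >= t5
    cond_sum = any(sum(abs(m - k) for m in means) + v >= t5 for v in variances)
    # any mean differs from k by at least t6
    cond_mean = any(abs(m - k) >= t6 for m in means)
    # any variance is at least t7
    cond_var = any(v >= t7 for v in variances)
    return cond_sum or cond_mean or cond_var

# Both components peaked near k with small variances: element can be skipped.
print(needs_entropy_coding_gmm([0.01, -0.02], [0.05, 0.04], 0, 0.5, 0.3, 0.2))  # False
# One component mean far from k: element must be coded.
print(needs_entropy_coding_gmm([0.8, 0.1], [0.05, 0.04], 0, 0.5, 0.3, 0.2))     # True
```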
  • the first parameter of the probability distribution of the first feature element is the mean value of the asymmetric Gaussian distribution of the first feature element
  • the second parameter of the probability distribution of the first feature element is the first variance and the second variance of the asymmetric Gaussian distribution of the first feature element
  • the absolute value of the difference between the mean value of the asymmetric Gaussian distribution of the first feature element and the value k of the first feature element is greater than or equal to the eighth threshold;
  • a first variance of the asymmetric Gaussian distribution of the first feature element is greater than or equal to a ninth threshold
  • a second variance of the asymmetric Gaussian distribution of the first feature element is greater than or equal to a tenth threshold.
  • The probability distribution of the first feature element is a mixed Gaussian distribution, and a judgment value range of the first feature element is determined; the condition is satisfied when the multiple mean values of the probability distribution of the first feature element are not within the judgment value range of the first feature element.
  • The probability distribution of the first feature element is a Gaussian distribution, and a judgment value range of the first feature element is determined; the condition is satisfied when the mean value of the probability distribution of the first feature element is not within the judgment value range of the first feature element.
  • The probability distribution of the first feature element is a Gaussian distribution, and a judgment value range of the first feature element is determined, where the judgment value range includes a plurality of possible values of the first feature element; the condition is satisfied when the absolute value of the difference between the mean parameter of the Gaussian distribution of the first feature element and each value in the judgment value range is greater than or equal to the eleventh threshold, or when the variance of the probability distribution of the first feature element is greater than or equal to the twelfth threshold.
  • the value of the first feature element is not within the value range of the first feature element.
  • the probability value corresponding to the value of the first characteristic element is less than or equal to the thirteenth threshold.
  • The method further includes: constructing a threshold candidate list for the first threshold, putting the first threshold into the threshold candidate list, and writing the index number corresponding to the first threshold into the encoded code stream, where the length of the threshold candidate list of the first threshold can be set to T, and T is an integer greater than or equal to 1.
  • Threshold candidate lists for the other thresholds can be constructed in the same manner as for the first threshold, and index numbers corresponding to those thresholds are likewise written into the encoded code stream.
  • The index number is written into the code stream and can be stored in the sequence header, picture header, slice header or SEI (supplemental enhancement information) and sent to the decoding end; other methods may also be used, which are not limited here.
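The candidate-list signalling above can be illustrated as follows: the encoder transmits only an index into an agreed list rather than the threshold value itself. The list contents and its length T = 4 are illustrative assumptions, not values from the patent:

```python
# Agreed-upon threshold candidate list of length T = 4 (illustrative values).
THRESHOLD_CANDIDATES = [0.3, 0.5, 0.7, 0.9]

def threshold_index(first_threshold: float) -> int:
    """Encoder side: index written into the code stream (e.g. in a header)."""
    return THRESHOLD_CANDIDATES.index(first_threshold)

def threshold_from_index(idx: int) -> float:
    """Decoder side: recover the first threshold from the decoded index."""
    return THRESHOLD_CANDIDATES[idx]

print(threshold_index(0.5))      # -> 1
print(threshold_from_index(1))   # -> 0.5
```

Signalling a small index instead of a full value keeps the header overhead low while still letting the encoder adapt the threshold to the code rate.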
  • the decision information is obtained by inputting the probability estimation result into the generation network.
  • the generation network may be a convolutional network, which may include multiple network layers, and any network layer may be a convolutional layer, a normalization layer, a nonlinear activation layer, and the like.
  • The probability estimation result of the feature data is input into the generation network to obtain the decision information of the first feature element, and the decision information is used to indicate whether to perform entropy coding on the first feature element.
  • The decision information of the feature data is a decision map.
  • The decision map is preferably a binary map.
  • The value of the decision information of a feature element in the binary map is usually 0 or 1. Therefore, when the value corresponding to the position of the first feature element in the decision map is a preset value, entropy coding needs to be performed on the first feature element; when the value corresponding to the position of the first feature element in the decision map is not the preset value, entropy coding does not need to be performed on the first feature element.
  • the decision information of the feature elements in the feature data is a preset value.
  • The preset value of the decision information is usually 1, so when the decision information is the preset value, entropy coding needs to be performed on the first feature element; when the decision information is not the preset value, entropy coding does not need to be performed on the first feature element.
  • The decision information can be a flag or the value of a flag; whether to perform entropy coding on the first feature element depends on whether the flag or its value is a preset value.
  • The decision information of each feature element in the feature data can also be a floating-point number, that is, its value can be other than 0 or 1.
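The role of a binary decision map can be illustrated with the following sketch. Here the map is produced by simple thresholding of the per-element probability of k, which is only a stand-in for the trained generation network described above; names and values are our own:

```python
import numpy as np

def make_decision_map(prob_k: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Binary map, same shape as the feature data: 1 marks elements to be
    entropy coded (P(k) <= threshold), 0 marks elements to be skipped."""
    return (prob_k <= threshold).astype(np.uint8)

prob_k = np.array([[0.9, 0.3],
                   [0.6, 0.1]])   # P(element == k) per position (invented)
print(make_decision_map(prob_k))  # 1 where P(k) <= 0.5, else 0
```

Both encoder and decoder can derive the same map from the shared probability estimates, so the map itself need not be transmitted.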
  • The method further includes: passing the image to be encoded through an encoding network to obtain the feature data; or passing the image to be encoded through the encoding network and then rounding to obtain the feature data; or passing the image to be encoded through the encoding network and then quantizing and rounding to obtain the feature data.
  • the encoding network can adopt an autoencoder structure.
  • the encoding network can be a convolutional neural network.
  • An encoding network can include multiple subnetworks, each containing one or more convolutional layers. The network structures among the sub-networks may be the same or different from each other.
  • the image to be encoded can be an original image or a residual image.
  • the image to be encoded can be in RGB format or YUV, RAW and other representation formats, and the image to be encoded can be pre-processed before being input into the encoding network.
  • the pre-processing operation can include operations such as conversion, block division, filtering, and pruning.
  • a method for decoding feature data including:
  • the characteristic data to be decoded includes a plurality of characteristic elements, and the plurality of characteristic elements include a first characteristic element;
  • Entropy decoding is performed on the first feature element only when it is determined that entropy decoding needs to be performed on the first feature element.
  • The first feature element is any feature element in the feature data to be decoded; when all the feature elements in the feature data to be decoded have completed the judgment, and entropy decoding has been performed or not performed according to the judgment result, the decoded feature data is obtained.
  • the feature data to be decoded may be one-dimensional, two-dimensional or multi-dimensional data, each of which is a feature element. It should be noted that the meanings of the feature point and the feature element in this application are the same.
  • the first feature element is any feature element to be decoded in the feature data to be decoded.
  • The probability estimation process of obtaining the probability estimation result of the first feature element can be realized through a probability estimation network; alternatively, the probability estimation process can use a traditional non-network probability estimation method to perform probability estimation on the feature data.
  • When the input of probability estimation does not include context information, the probability estimation results of the feature elements can be output in parallel; when the input of probability estimation includes context information, the probability estimation results of the feature elements must be output serially. The number of feature elements included in the side information is less than that of the feature data.
  • The code stream contains side information, and decoding the code stream requires decoding the side information.
  • the process of judging each feature element in the feature data includes conditional judgment and deciding whether to perform entropy decoding according to the result of the conditional judgment.
  • entropy decoding can be implemented by means of neural networks.
  • entropy decoding can be implemented by conventional entropy decoding.
  • When the current first feature element is the Pth feature element of the feature data, the judgment of the (P+1)th feature element of the feature data, and the execution or non-execution of entropy decoding according to the judgment result, are then started, where P is a positive integer, P is less than M, and M is the number of feature elements in the entire feature data.
  • Judging whether to perform entropy decoding on the first feature element of the feature data includes: when the probability estimation result of the first feature element satisfies a preset condition, judging that entropy decoding needs to be performed on the first feature element; or, when the probability estimation result of the first feature element does not satisfy the preset condition, judging that entropy decoding of the first feature element is not required and setting the feature value of the first feature element to k, where k is an integer.
  • The probability estimation result of the first feature element is the probability value that the first feature element takes a value k, and the preset condition is that this probability value is less than or equal to the first threshold, where k is an integer.
  • the first feature element is set to k when the preset condition is not satisfied.
  • the value range that the first feature element can take is [-255, 255].
  • For example, k may be set to 0 and the first threshold to 0.5: entropy decoding is performed on a first feature element whose probability value is less than or equal to 0.5, and is not performed on a first feature element whose probability value is greater than 0.5.
  • the value of the first characteristic element is determined through a list when the preset condition is not satisfied.
  • the first feature element is set to a fixed integer value when the preset condition is not satisfied.
  • k is a certain value in the possible value range of the above-mentioned first characteristic element.
  • k is the value corresponding to the maximum probability among all possible value ranges of the above-mentioned first feature element.
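The decoder-side behaviour above can be sketched as follows: elements whose preset condition is not satisfied are never entropy decoded, and their values are set directly to k. The `decoded_stream` list stands in for the output of an actual entropy decoder and, like all names here, is an assumption for illustration:

```python
import numpy as np

K = 0                   # value assigned to skipped elements
FIRST_THRESHOLD = 0.5   # illustrative first threshold

def decode_feature_data(prob_k: np.ndarray, decoded_stream: list) -> np.ndarray:
    """Reconstruct the feature data: entropy-decode an element only when
    P(element == k) <= threshold, otherwise set it to k directly."""
    out = np.empty(prob_k.shape, dtype=np.int64)
    stream = iter(decoded_stream)
    for idx, p in np.ndenumerate(prob_k):
        if p <= FIRST_THRESHOLD:       # condition satisfied: entropy decode
            out[idx] = next(stream)
        else:                          # condition not satisfied: set to k
            out[idx] = K
    return out

probs = np.array([0.10, 0.92, 0.40, 0.85])        # same estimates as encoder
print(decode_feature_data(probs, decoded_stream=[3, -1]))  # -> [ 3  0 -1  0]
```

Because the decoder derives the same probability estimates as the encoder, it knows which positions were skipped without any extra signalling, and the reconstruction matches the encoder's feature data exactly.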
  • the first threshold selected for the decoded code stream in the case of a low code rate is smaller than the first threshold selected for the decoded code stream in the case of a high code rate.
  • The specific bit rate is related to the resolution and content of the image. Taking the public Kodak dataset as an example, a bit rate lower than 0.5 bpp is considered low; otherwise it is considered high.
  • the first threshold may be configured according to actual needs, which is not limited here.
  • the flexible first threshold setting method enables the entropy decoding complexity to be flexibly reduced according to requirements.
  • the probability estimation result of the first feature element includes a first parameter and a second parameter of a probability distribution of the first feature element.
  • the first parameter of the probability distribution of the first feature element is the mean value of the Gaussian distribution of the first feature element
  • the second parameter of the probability distribution of the first feature element is the variance of the Gaussian distribution of the first feature element; or, when the probability distribution is a Laplace distribution, the first parameter of the probability distribution of the first feature element is the location parameter of the Laplace distribution of the first feature element, and the second parameter of the probability distribution of the first feature element is the scale parameter of the Laplace distribution of the first feature element.
  • the absolute value of the difference between the first parameter of the probability distribution of the first feature element and the value k of the first feature element is greater than or equal to a second threshold
  • a second parameter of the first feature element is greater than or equal to a third threshold
  • the sum of the absolute value of the difference between the first parameter of the probability distribution of the first feature element and the value k of the first feature element, and the second parameter of the probability distribution of the first feature element, is greater than or equal to a fourth threshold.
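The three Gaussian/Laplace conditions above can be sketched as a single predicate. The function name, the convention that satisfying any configured condition means entropy coding is required, and the thresholds t2-t4 are illustrative assumptions, not values from the application:

```python
def entropy_coding_required(mu: float, sigma: float, k: int,
                            t2: float = None, t3: float = None,
                            t4: float = None) -> bool:
    """mu/sigma: first/second parameter of the element's probability
    distribution (mean/variance for Gaussian, position/scale for Laplace);
    k: the candidate value of the first feature element.
    Any configured condition that holds triggers entropy coding."""
    if t2 is not None and abs(mu - k) >= t2:          # |mean - k| >= second threshold
        return True
    if t3 is not None and sigma >= t3:                # second parameter >= third threshold
        return True
    if t4 is not None and abs(mu - k) + sigma >= t4:  # combined sum >= fourth threshold
        return True
    return False
```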
  • the first parameter of the probability distribution of the first feature element is the mean value of the mixed Gaussian distribution of the first feature element
  • the second parameter of the probability distribution of the first feature element is the variance of the mixed Gaussian distribution of the first feature element.
  • the sum of the absolute values of the differences between each mean value of the mixed Gaussian distribution of the first feature element and the value k of the first feature element, plus any variance of the mixed Gaussian distribution of the first feature element, is greater than or equal to the fifth threshold;
  • the difference between any mean value of the mixed Gaussian distribution of the first feature element and the value k of the first feature element is greater than the sixth threshold;
  • Any variance of the mixed Gaussian distribution of the first feature element is greater than or equal to the seventh threshold.
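A sketch of the three mixed-Gaussian conditions listed above. Function and parameter names are hypothetical, and any condition holding is taken to mean entropy coding is required:

```python
def mixed_gaussian_coding_required(means, variances, k, t5, t6, t7):
    """means/variances: parameters of the element's mixed Gaussian
    distribution; k: candidate value; t5-t7: fifth to seventh thresholds."""
    # Sum of |mean - k| over all means, plus any one variance, >= t5
    abs_sum = sum(abs(m - k) for m in means)
    cond_sum = any(abs_sum + v >= t5 for v in variances)
    # Any |mean - k| greater than t6
    cond_mean = any(abs(m - k) > t6 for m in means)
    # Any variance >= t7
    cond_var = any(v >= t7 for v in variances)
    return cond_sum or cond_mean or cond_var
```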
  • the first parameter of the probability distribution of the first feature element is the mean value of the asymmetric Gaussian distribution of the first feature element
  • the second parameters of the probability distribution of the first feature element are the first variance and the second variance of the asymmetric Gaussian distribution of the first feature element.
  • the absolute value of the difference between the mean parameter of the asymmetric Gaussian distribution of the first feature element and the value k of the first feature element is greater than the eighth threshold;
  • a first variance of the asymmetric Gaussian distribution of the first feature element is greater than or equal to a ninth threshold
  • a second variance of the asymmetric Gaussian distribution of the first feature element is greater than or equal to a tenth threshold.
  • the probability distribution of the first feature element is a mixed Gaussian distribution, and a judgment value range of the first feature element is determined; the condition is satisfied when none of the mean values of the probability distribution of the first feature element is within the judgment value range of the first feature element.
  • the probability distribution of the first feature element is a Gaussian distribution, and a judgment value range of the first feature element is determined; the condition is satisfied when the mean value of the probability distribution of the first feature element is not within the judgment value range of the first feature element.
  • the probability distribution of the first feature element is a Gaussian distribution
  • the judgment value range of the first feature element is determined
  • the judgment value range includes a plurality of possible values of the first feature element; the condition is satisfied when the absolute value of the difference between the mean parameter of the Gaussian distribution of the first feature element and each value in the judgment value range of the first feature element is greater than or equal to an eleventh threshold, or the variance of the probability distribution of the first feature element is greater than or equal to a twelfth threshold.
  • the value k of the first feature element is not within the judgment value range of the first feature element.
  • the probability value corresponding to the value k of the first feature element is less than or equal to the thirteenth threshold.
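The probability-based condition (comparison against the thirteenth threshold) can be illustrated by estimating the probability mass that a quantized Gaussian assigns to the value k. The unit-wide integration interval and all names are illustrative assumptions:

```python
import math

def gaussian_prob_of_value(mu: float, sigma: float, k: int) -> float:
    """Probability that a Gaussian quantized to integers takes the value k,
    approximated by integrating the density over [k - 0.5, k + 0.5]."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    return cdf(k + 0.5) - cdf(k - 0.5)

def coding_required_by_probability(mu, sigma, k, t13):
    # Entropy coding is needed when P(k) is small (<= thirteenth threshold);
    # otherwise the element can be skipped and its value set to k.
    return gaussian_prob_of_value(mu, sigma, k) <= t13
```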
  • a threshold candidate list of the first threshold is constructed; the code stream is decoded to obtain an index number into the threshold candidate list of the first threshold, and the value at the list position corresponding to that index number is used as the value of the first threshold, wherein the length of the threshold candidate list of the first threshold can be set to T; T is an integer greater than or equal to 1.
  • any other threshold can adopt the same threshold candidate list construction method as the first threshold: the index number corresponding to that threshold is decoded, and the value at that index in the constructed list is selected as the threshold.
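A minimal sketch of the candidate-list lookup. The list contents are hypothetical placeholders; the index is what the decoder obtains from the code stream:

```python
def threshold_from_index(candidate_list, index):
    """Select a threshold from its candidate list using a decoded index.
    The list length T must be an integer >= 1, matching the description."""
    if not 0 <= index < len(candidate_list):
        raise ValueError("decoded index out of range for candidate list")
    return candidate_list[index]

# Illustrative candidate list of length T = 4 (values are placeholders).
FIRST_THRESHOLD_CANDIDATES = [0.01, 0.02, 0.05, 0.10]
```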
  • the decision information is obtained by inputting the probability estimation result into the generation network.
  • the generation network may be a convolutional network, which may include multiple network layers, and any network layer may be a convolutional layer, a normalization layer, a nonlinear activation layer, and the like.
  • the probability estimation result of the feature data is input into the generation network to obtain the decision information of the first feature element, and the decision information is used to indicate whether to perform entropy decoding on the first feature element.
  • the decision information of each feature element in the feature data forms a decision graph, which may also be called a decision map.
  • the decision graph is preferably a binary graph, and the binary graph may also be called a binary map.
  • the value of the decision information of a feature element in the binary graph is usually 0 or 1. Therefore, when the value at the position corresponding to the first feature element in the decision graph is a preset value, entropy decoding needs to be performed on the first feature element; when the value at the position corresponding to the first feature element in the decision graph is not the preset value, entropy decoding does not need to be performed on the first feature element.
  • the set of decision information of each feature element in the feature data can also be floating-point numbers; that is to say, the values can be other values than 0 and 1.
  • when the value of the decision information of the first feature element is equal to or greater than a preset value, it is determined that entropy decoding needs to be performed on the first feature element; or when the value of the decision information of the first feature element is less than the preset value, it is determined that entropy decoding does not need to be performed on the first feature element.
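The two decision-information interpretations above, a binary map compared for equality with a preset value and a floating-point value compared against a preset value, can be sketched as follows. The concrete preset values are illustrative choices:

```python
def should_decode_binary(decision_value: int, preset: int = 1) -> bool:
    """Binary decision map: decode iff the value equals the preset value.
    The preset value of 1 is an illustrative choice."""
    return decision_value == preset

def should_decode_float(decision_value: float, preset: float = 0.5) -> bool:
    """Floating-point decision info: decode iff the value is >= the preset.
    The preset of 0.5 is an illustrative choice."""
    return decision_value >= preset
```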
  • the feature data is passed through a decoding network to obtain a reconstructed image.
  • the feature data is passed through a decoding network to obtain machine-oriented task data
  • the feature data is passed through a machine-oriented task module to obtain machine-oriented task data
  • the machine-oriented task module includes a target recognition network, a classification network or a semantic segmentation network.
  • a feature data encoding device, including:
  • An obtaining module, configured to obtain feature data to be encoded, where the feature data to be encoded includes a plurality of feature elements and the plurality of feature elements include a first feature element, and to obtain a probability estimation result of the first feature element;
  • An encoding module, configured to determine, according to the probability estimation result of the first feature element, whether to perform entropy encoding on the first feature element, and to perform entropy encoding on the first feature element only when it is determined that entropy encoding needs to be performed on the first feature element.
  • a feature data decoding device, including:
  • An obtaining module, configured to obtain a code stream of feature data to be decoded, where the feature data to be decoded includes a plurality of feature elements and the plurality of feature elements include a first feature element, and to obtain a probability estimation result of the first feature element;
  • A decoding module, configured to determine, according to the probability estimation result of the first feature element, whether to perform entropy decoding on the first feature element, and to perform entropy decoding on the first feature element only when it is determined that entropy decoding needs to be performed on the first feature element.
  • the present application provides an encoder, including a processing circuit configured to perform the method according to the first aspect or any implementation of the first aspect.
  • the present application provides a decoder, including a processing circuit configured to perform the method according to the second aspect or any implementation of the second aspect.
  • the present application provides a computer program product, including program code which, when executed on a computer or a processor, is used to perform the method according to the first aspect or any implementation of the first aspect, or the second aspect or any implementation of the second aspect.
  • the present application provides an encoder, including: one or more processors; and a non-transitory computer-readable storage medium, coupled to the processors and storing a program for execution by the processors, wherein the program, when executed by the processors, enables the encoder to perform the method according to the first aspect or any implementation of the first aspect.
  • the present application provides a decoder, including: one or more processors; and a non-transitory computer-readable storage medium, coupled to the processors and storing a program for execution by the processors, wherein the program, when executed by the processors, enables the decoder to perform the method according to the second aspect or any implementation of the second aspect.
  • the present application provides a non-transitory computer-readable storage medium, including program code which, when executed, is used to perform the method according to the first aspect or any implementation of the first aspect, or the second aspect or any implementation of the second aspect.
  • the present invention relates to an encoding device, having a function of implementing the behaviors in the first aspect or any one of the method embodiments of the first aspect.
  • Said functions can be implemented by hardware, and can also be implemented by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • the encoding device includes: an obtaining module, configured to transform the original image or the residual image into a feature space through an encoding network, and extract feature data for compression.
  • probability estimation is performed on the feature data to obtain the probability estimation result of each feature element of the feature data; the encoding module uses the probability estimation result of each feature element to determine whether to perform entropy encoding on each feature element in the feature data, and completes the encoding process of all the feature elements in the feature data to obtain the code stream of the feature data.
  • These modules can perform the corresponding functions in the first aspect or any method example of the first aspect. For details, refer to the detailed description in the method examples; details are not repeated here.
  • the present invention relates to a decoding device, which has the function of realizing the actions in the second aspect or any one of the method embodiments of the second aspect.
  • Said functions can be implemented by hardware, and can also be implemented by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • the decoding device includes: an obtaining module, configured to obtain a code stream of the feature data to be decoded, and to perform probability estimation according to the code stream of the feature data to be decoded to obtain a probability estimation result of each feature element of the feature data; and a decoding module, which uses the probability estimation result of each feature element to determine, according to certain conditions, whether to perform entropy decoding on each feature element in the feature data, completes the decoding process of all feature elements in the feature data to obtain the feature data, and decodes the feature data to obtain a reconstructed image or machine-oriented task data.
  • These modules can perform the corresponding functions in the above-mentioned second aspect or any method example of the second aspect. For details, refer to the detailed description in the method examples; details are not repeated here.
  • a method for encoding feature data, including:
  • the feature data includes a plurality of feature elements, and the plurality of feature elements include a first feature element
  • Entropy coding is performed on the first feature element only when it is determined that entropy coding needs to be performed on the first feature element.
  • the feature data is one-dimensional, two-dimensional or multi-dimensional data output by the encoding network, where each data point is a feature element.
  • the side information of the characteristic data is encoded into the code stream.
  • the side information is feature information obtained by inputting the feature data into a neural network for further extraction, and the number of feature elements included in the side information is less than that of the feature data.
  • the first feature element is any feature element in the feature data.
  • the set of decision information of each feature element of the feature data may be represented by a decision diagram or the like.
  • the decision graph is one-dimensional, two-dimensional or multi-dimensional image data and is consistent with the size of the feature data.
  • the joint network also outputs the probability estimation result of the first feature element, and the probability estimation result of the first feature element includes the probability value of the first feature element, and/or the first parameter and the second parameter of the probability distribution of the first feature element.
  • when the value at the position corresponding to the first feature element in the decision diagram is a preset value, entropy coding needs to be performed on the first feature element; when the value at the position corresponding to the first feature element in the decision diagram is not the preset value, entropy coding does not need to be performed on the first feature element.
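Putting the encoder-side rule together, a sketch of the per-element loop. The list append stands in for a real entropy coder such as an arithmetic coder, and all names are illustrative:

```python
def encode_feature_elements(feature_elements, decision_map, preset=1):
    """Entropy-encode only the elements whose decision-map value equals
    the preset value; all other elements are skipped entirely."""
    symbols_to_code = []
    for value, decision in zip(feature_elements, decision_map):
        if decision == preset:
            symbols_to_code.append(value)  # stand-in for entropy_encode(value)
    return symbols_to_code
```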
  • a method for decoding feature data, including:
  • the feature data to be decoded includes a plurality of feature elements, and the plurality of feature elements include a first feature element;
  • Entropy decoding is performed on the first feature element only when it is determined that entropy decoding needs to be performed on the first feature element.
  • the code stream of the feature data to be decoded is decoded to obtain side information.
  • the number of feature elements contained in the side information is less than that of feature data.
  • the first feature element is any feature element in the feature data.
  • the decision information of each characteristic element of the characteristic data may be expressed in a manner such as a decision diagram.
  • the decision graph is one-dimensional, two-dimensional or multi-dimensional image data and is consistent with the size of the feature data.
  • the joint network also outputs the probability estimation result of the first feature element, and the probability estimation result of the first feature element includes the probability value of the first feature element, and/or the first parameter and the second parameter of the probability distribution of the first feature element.
  • when the value at the position corresponding to the first feature element in the decision diagram is a preset value, entropy decoding needs to be performed on the first feature element; when the value at the position corresponding to the first feature element in the decision diagram is not the preset value, entropy decoding does not need to be performed on the first feature element, and the feature value of the first feature element is set to k, where k is an integer.
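The decoder-side counterpart of the rule above: decoded symbols are consumed only at positions whose decision value is the preset value, and every skipped position is filled with the fixed integer k. Names are illustrative, and the decoded symbol list stands in for entropy decoding from the code stream:

```python
def decode_feature_elements(decoded_symbols, decision_map, k=0, preset=1):
    """Reconstruct the feature elements: entropy-decoded values where the
    decision map has the preset value, the fixed integer k elsewhere."""
    symbols = iter(decoded_symbols)
    return [next(symbols) if decision == preset else k
            for decision in decision_map]
```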
  • This application utilizes information about the probability distribution of the feature elements in the feature data to be encoded or decoded to determine whether entropy encoding or decoding is required for each feature element, thereby skipping the entropy encoding and decoding process of some feature elements.
  • the number of elements to be encoded and decoded can be significantly reduced, reducing the complexity of encoding and decoding.
  • the threshold value can be flexibly set to control the code rate of the generated code stream.
  • FIG. 1A is an exemplary block diagram of an image decoding system
  • Fig. 1B is the realization of the processing circuit of the image decoding system
  • Fig. 1C is a schematic block diagram of an image decoding device
  • Figure 1D is a diagram of the implementation of the device of the embodiment of the present application.
  • FIG. 2A is a system architecture diagram of a possible scenario of the present application.
  • FIG. 2B is a system architecture diagram of a possible scenario of the present application.
  • FIGS. 3A-3D are schematic block diagrams of encoders
  • FIG. 4A is a schematic diagram of an encoding network unit
  • Figure 4B is a schematic diagram of the network structure of the encoding network
  • Fig. 5 is a structural schematic diagram of a coding decision-making realization unit
  • Figure 6 is an example diagram of joint network output
  • Fig. 7 is an example diagram of generating network output
  • Fig. 8 is a schematic diagram of realization of decoding decision
  • Fig. 9 is an example diagram of a network structure of a decoding network
  • FIG. 10A is an example diagram of a decoding method in an embodiment of the present application.
  • FIG. 10B is a schematic block diagram of an image feature map decoder according to an embodiment of the present application.
  • Figure 11A is an example diagram of the decoding method of the embodiment of the present application.
  • Fig. 12 is an example diagram of the network structure of the side information extraction module
  • Fig. 13A is an example diagram of the decoding method of the embodiment of the present application.
  • FIG. 13B is a schematic block diagram of an image feature map decoder according to an embodiment of the present application.
  • FIG. 14 is an example diagram of a decoding method in an embodiment of the present application.
  • Fig. 15 is an example diagram of a network structure of a joint network
  • FIG. 16 is a schematic block diagram of an image feature map decoder according to an embodiment of the present application.
  • Fig. 17 is an example diagram of the decoding method of the embodiment of the present application.
  • FIG. 18 is a schematic structural diagram of an exemplary encoding device of the present application.
  • FIG. 19 is a schematic structural diagram of an exemplary decoding device of the present application.
  • At least one (item) means one or more, and “multiple” means two or more.
  • “And/or” is used to describe the association relationship of associated objects, indicating that there can be three types of relationships; for example, “A and/or B” can mean: only A exists, only B exists, or both A and B exist, where A and B can be singular or plural.
  • the character “/” generally indicates that the contextual objects are an “or” relationship.
  • “At least one of the following” or similar expressions refer to any combination of these items, including any combination of single or plural items.
  • “At least one item (piece) of a, b or c” can mean: a, b, c, “a and b”, “a and c”, “b and c”, or “a and b and c”, where a, b and c can be single or multiple.
  • the embodiment of the present application provides an AI-based feature data encoding and decoding technology, especially a neural network-based image feature map and/or audio feature variable encoding and decoding technology, and specifically provides an end-to-end image feature map-based and/or the codec system of the audio characteristic variable.
  • Image coding (or commonly referred to as coding) includes two parts: image encoding and image decoding. Video is composed of multiple images and is a representation of continuous images.
  • Image encoding is performed on the source side and typically involves processing (e.g., compressing) raw video images to reduce the amount of data required to represent the video images (and thus store and/or transmit them more efficiently).
  • Image decoding is performed at the destination side and usually involves inverse processing relative to the encoder to reconstruct the image.
  • the "decoding" of images or audios involved in the embodiments should be understood as “encoding” or “decoding” of images or audios.
  • the encoding part and the decoding part are also collectively referred to as codec (encoding and decoding, CODEC).
  • the original image can be reconstructed, i.e. the reconstructed image has the same quality as the original image (assuming no transmission loss or other data loss during storage or transmission).
  • the amount of data required to represent the video image is reduced by further compression through quantization and other operations, and the decoder side cannot completely reconstruct the video image; that is, the quality of the reconstructed video image is lower or worse than that of the original video image.
  • the neural network can be composed of neural units, and a neural unit can refer to an operation unit that takes inputs xs and an intercept 1 as input; the output of the operation unit can be f(∑s Ws·xs + b).
  • Ws is the weight of xs
  • b is the bias of the neural unit.
  • f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal.
  • the output signal of the activation function can be used as the input of the next convolutional layer, and the activation function can be a sigmoid function.
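A single neural unit as described above, with the sigmoid activation, can be sketched as follows. This is a minimal illustration, not the application's implementation:

```python
import math

def sigmoid(x: float) -> float:
    """Sigmoid activation: maps any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def neural_unit(xs, ws, b):
    """Output of one unit: f(sum_s w_s * x_s + b), with f = sigmoid.
    xs are the inputs, ws their weights, b the bias."""
    return sigmoid(sum(w * x for w, x in zip(ws, xs)) + b)
```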
  • a neural network is a network formed by connecting multiple above-mentioned single neural units, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the features of the local receptive field.
  • the local receptive field can be an area composed of several neural units.
  • Deep neural network also known as multi-layer neural network
  • DNN can be understood as a neural network with multiple hidden layers.
  • According to the positions of the different layers, the neural network inside a DNN can be divided into three categories: input layer, hidden layer, and output layer.
  • the first layer is the input layer
  • the last layer is the output layer
  • the layers in the middle are all hidden layers.
  • the layers are fully connected, that is, any neuron in the i-th layer must be connected to any neuron in the i+1-th layer.
  • Although a DNN looks complicated, the work of each layer is actually not complicated.
  • each layer computes the following linear relationship expression: y = α(W·x + b), where x is the input vector, y is the output vector, b is the offset vector, W is the weight matrix (also called coefficients), and α() is the activation function.
  • Each layer simply performs this operation on the input vector x to obtain the output vector y. Because the number of DNN layers is large, the number of coefficients W and offset vectors b is also large.
  • The definition of these parameters in a DNN is as follows, taking the coefficient W as an example: assume that in a three-layer DNN, the linear coefficient from the fourth neuron of the second layer to the second neuron of the third layer is defined as W^3_24, where the superscript 3 represents the layer number of the coefficient W, and the subscripts correspond to the output index 2 of the third layer and the input index 4 of the second layer.
  • In summary, the coefficient from the k-th neuron of the (L-1)-th layer to the j-th neuron of the L-th layer is defined as W^L_jk.
  • the input layer has no W parameter.
  • more hidden layers make the network more capable of describing complex situations in the real world. Theoretically speaking, a model with more parameters has a higher complexity and a greater "capacity", which means that it can complete more complex learning tasks.
  • Training the deep neural network is the process of learning the weight matrix, and its ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (the weight matrix formed by the vector W of many layers).
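One fully connected DNN layer, computing y = α(Wx + b) as described above, can be sketched as follows; ReLU is used here as an illustrative activation α:

```python
def dense_layer(x, W, b, act=lambda v: max(v, 0.0)):
    """One fully connected layer: y_i = act(sum_j W[i][j] * x[j] + b[i]).
    W has one row per output neuron; act defaults to ReLU."""
    return [act(sum(wij * xj for wij, xj in zip(row, x)) + bi)
            for row, bi in zip(W, b)]
```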
  • Convolutional neural network is a deep neural network with a convolutional structure.
  • the convolutional neural network contains a feature extractor composed of a convolutional layer and a subsampling layer, which can be regarded as a filter.
  • the convolutional layer refers to the neuron layer that performs convolution processing on the input signal in the convolutional neural network.
  • a neuron can only be connected to some adjacent neurons.
  • a convolutional layer usually contains several feature planes, and each feature plane can be composed of some rectangularly arranged neural units. Neural units of the same feature plane share weights, and the shared weights here are convolution kernels.
  • Shared weights can be understood as a way to extract image information that is independent of location.
  • the convolution kernel can be initialized in the form of a matrix of random size, and the convolution kernel can obtain reasonable weights through learning during the training process of the convolutional neural network.
  • the direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.
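Weight sharing in a convolutional layer means the same small kernel is applied at every spatial position of the feature plane. A minimal "valid" 2-D convolution sketch (kernel flipping is omitted, as is common in deep-learning usage):

```python
def conv2d_valid(image, kernel):
    """Apply one shared kernel at every position of a 2-D input
    (no padding), illustrating weight sharing across the feature plane."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]
```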
  • Entropy coding applies an entropy coding algorithm or scheme (for example, a variable length coding (VLC) scheme, a context adaptive VLC (CAVLC) scheme, an arithmetic coding scheme, a binarization algorithm, context adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding method or technique) to quantized coefficients and other syntax elements to obtain encoded data that can be output through the output terminal in the form of an encoded bit stream, so that a decoder or the like can receive and use the parameters for decoding.
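The link between probability estimation and entropy coding is that an ideal entropy coder spends about -log2(p) bits on a symbol of probability p, so near-certain symbols cost almost nothing; this is what makes skipping such elements attractive. A one-line illustration:

```python
import math

def ideal_code_length_bits(p: float) -> float:
    """Ideal entropy-coding cost, in bits, of a symbol with probability p."""
    return -math.log2(p)
```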
  • the encoded bitstream can be transmitted to the decoder, or it can be stored in memory for later transmission or retrieval by the decoder.
  • the encoder 20A and the decoder 30A are described with reference to FIGS. 1A to 15 .
  • FIG. 1A is a schematic block diagram of an exemplary decoding system 10 , such as an image (or audio) decoding system 10 (or simply referred to as the decoding system 10 ) that can utilize the technology of the present application.
  • the encoder 20A and the decoder 30A in the image decoding system 10 represent devices that can be used to perform the various techniques described in the examples of this application.
  • a decoding system 10 includes a source device 12 , and the source device 12 is configured to provide coded streams 21 such as coded images (or audio) to a destination device 14 for decoding the coded streams 21 .
  • the source device 12 includes an encoder 20A, and optionally an image source 16 , a preprocessor (or preprocessing unit) 18 , a communication interface (or communication unit) 26 and a probability estimation (or probability estimation unit) 40 .
  • Image (or audio) source 16 may comprise or be any type of image capture device for capturing real-world images (or audio), etc., and/or any type of image generation device, such as a computer graphics processor for generating computer-animated images, or any type of device for acquiring and/or providing real-world images or computer-generated images (e.g., screen content, virtual reality (VR) images, and/or any combination thereof (e.g., augmented reality (AR) images)).
  • the audio or image source can be any type of memory or storage that stores any of the above audio or images.
  • the image or audio (image or audio data) 17 may also be referred to as original image or audio (original image data or audio data) 17 .
  • the preprocessor 18 is used to receive (original) image (or audio) data 17 and preprocess the image (or audio) data 17 to obtain preprocessed image or audio (or preprocessed image or audio data) 19 .
  • preprocessing performed by the preprocessor 18 may include cropping, color format conversion (e.g., from RGB to YCbCr), color grading, or denoising. It can be understood that the preprocessing unit 18 can be an optional component.
  • the encoder 20A includes an encoding network 20 , an entropy encoding 24 and, optionally, a preprocessor 22 .
  • Image (or audio) encoding network (or encoding network) 20 is used to receive preprocessed image (or audio) data 19 and provide encoded image (or audio) data 21 .
  • the preprocessor 22 is used to receive the feature data 21 to be encoded, and perform preprocessing on the feature data 21 to be encoded to obtain the preprocessed feature data 23 to be encoded.
  • the preprocessing performed by the preprocessor 22 may include cropping, color format conversion (e.g., from RGB to YCbCr), color correction, or denoising. It can be understood that the preprocessing unit 22 can be an optional component.
  • the entropy coding 24 is used to receive the feature data to be coded (or preprocess the feature data to be coded) 23 and generate the code stream 25 according to the probability estimation result 41 provided by the probability estimation 40 .
  • the communication interface 26 in the source device 12 can be used to receive the coded code stream 25 and send the coded code stream 25 (or any other processed version thereof) through the communication channel 27 to another device such as the destination device 14 or any other device, for storage or direct reconstruction.
  • the destination device 14 includes a decoder 30A, and may additionally and optionally include a communication interface (or communication unit) 28 , a post-processor (or post-processing unit) 36 and a display device 38 .
  • the communication interface 28 in the destination device 14 is used to receive the coded code stream 25 (or any other processed version) directly from the source device 12 or from any other source device such as a storage device, for example, the storage device is a coded code stream storage device, And the encoded code stream 25 is provided to the decoder 30A.
  • the communication interface 26 and the communication interface 28 can be used to send or receive the coded code stream (or coded code stream data) 25 through a direct communication link between the source device 12 and the destination device 14, such as a direct wired or wireless connection, or through any type of network, such as a wired network, a wireless network, or any combination thereof, or any type of private network and public network or any combination thereof.
  • the communication interface 26 can be used to encapsulate the coded code stream 25 into a suitable format such as a message, and/or use any type of transmission coding or processing to process the coded code stream for transmission over a communication link or communication network.
  • the communication interface 28 corresponds to the communication interface 26 , for example, can be used to receive transmission data, and use any type of corresponding transmission decoding or processing and/or decapsulation to process the transmission data to obtain the encoded code stream 25 .
  • Both the communication interface 26 and the communication interface 28 can be configured as one-way communication interfaces, as indicated by the arrow from the source device 12 to the destination device 14 over the corresponding communication channel 27 in FIG. 1A, or as two-way communication interfaces, and can be used to send and receive messages and the like, to establish the connection, and to confirm and exchange any other information related to the communication link and/or data transmission, such as encoded image data transmission.
  • the decoder 30A comprises a decoding network 34 , an entropy decoding 30 and, optionally, a post-processor 32 .
  • the entropy decoding 30 is used to receive the encoded code stream 25 and provide the decoding characteristic data 31 according to the probability estimation result 42 provided by the probability estimation 40 .
  • the post-processor 32 is used to post-process the decoded feature data 31 to obtain post-processed decoded feature data 33 .
  • the post-processing performed by the post-processing unit 32 may include, for example, color format conversion (for example, conversion from YCbCr to RGB), color correction, trimming, or resampling. It can be understood that the post-processing unit 32 may be an optional component.
  • the decoding network 34 is used to receive the decoded characteristic data 31 or post-processed decoded characteristic data 33 and provide reconstructed image data 35 .
  • the post-processor 36 is used for post-processing the reconstructed image data 35 to obtain post-processed reconstructed image data 37 .
  • the post-processing performed by the post-processing unit 36 may include, for example, color format conversion (for example, conversion from YCbCr to RGB), color correction, trimming, or resampling. It is understood that the post-processing unit 36 may be an optional component.
  • the display device 38 is used to receive the reconstructed image data 35 or post-processed reconstructed image data 37 to display the image to a user or a viewer.
  • Display device 38 may be or include any type of player or display for representing reconstructed audio or images, e.g., an integrated or external display screen or display.
  • the display screen may include a liquid crystal display (LCD), an organic light emitting diode (OLED) display, a plasma display, a projector, a micro LED display, a liquid crystal on silicon (LCoS) display, a digital light processor (DLP), or any other type of display.
  • Although FIG. 1A shows the source device 12 and the destination device 14 as independent devices, device embodiments may also include both the source device 12 and the destination device 14, or the functionality of both, that is, both the source device 12 (or its corresponding functionality) and the destination device 14 (or its corresponding functionality). In these embodiments, the source device 12 (or its corresponding functionality) and the destination device 14 (or its corresponding functionality) may be implemented using the same hardware and/or software, by separate hardware and/or software, or by any combination thereof.
  • a feature data encoder 20A (such as an image feature map encoder or an audio feature variable encoder) or a feature data decoder 30A (such as an image feature map decoder or an audio feature variable decoder) or both can be implemented by processing circuits, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, hardware, dedicated processors for image encoding, or any combination thereof.
  • Feature data encoder 20A can be implemented by processing circuit 56 and feature data decoder 30A can be implemented by processing circuit 56 .
  • the processing circuitry 56 may be used to perform the various operations discussed below. If part of the technology is implemented in software, the device can store software instructions in a suitable non-transitory computer-readable storage medium, and use one or more processors to execute the instructions in hardware, thereby performing the techniques of the present invention.
  • One of the feature data encoder 20A and the feature data decoder 30A may be integrated in a single device as part of a combined codec (encoder/decoder, CODEC), as shown in FIG. 1B .
  • Source device 12 and destination device 14 may comprise any of a variety of devices, including any type of handheld or stationary device, such as a notebook or laptop computer, cell phone, smartphone, tablet or tablet computer, camera, desktop computer, set-top box, television, display device, digital media player, video game console, video streaming device (such as a content service server or content distribution server), broadcast receiving device, broadcast transmitting device, etc., and may use no operating system or any type of operating system.
  • source device 12 and destination device 14 may be equipped with components for wireless communication. Accordingly, source device 12 and destination device 14 may be wireless communication devices.
  • the decoding system 10 shown in FIG. 1A is merely exemplary, and the techniques provided herein can be applied to image feature map or audio feature variable encoding settings (e.g., image feature map encoding or image feature map decoding); these settings do not necessarily include any data communication between the encoding device and the decoding device.
  • data is retrieved from local storage, sent over a network, and so on.
  • the image feature map or audio feature variable encoding device may encode data and store the data in memory, and/or the image feature map or audio feature variable decoding device may retrieve data from memory and decode the data.
  • encoding and decoding are performed by devices that do not communicate with each other but simply encode data to memory and/or retrieve and decode data from memory.
  • FIG. 1B is an illustrative diagram of an example of a coding system 50 including feature data encoder 20A of FIG. 1A and/or feature data decoder 30A of FIG. 1B , according to an example embodiment.
  • the decoding system 50 may include an imaging (or audio generating) device 51, an encoder 20A, a decoder 30A (and/or a feature data encoder/decoder implemented by a processing circuit 56), an antenna 52, one or more processors 53, one or more memory stores 54, and/or a display (or audio playback) device 55.
  • the imaging (or audio producing) device 51, the antenna 52, the processing circuit 56, the encoder 20A, the decoder 30A, the processor 53, the memory storage 54, and/or the display (or audio playback) device 55 can communicate with each other.
  • coding system 50 may include only encoder 20A or only decoder 30A.
  • antenna 52 may be used to transmit or receive an encoded bitstream of characteristic data.
  • a display (or audio playback) device 55 may be used to present image (or audio) data.
  • the processing circuit 56 may include application-specific integrated circuit (ASIC) logic, a graphics processor, a general-purpose processor, and the like.
  • the decoding system 50 can also include an optional processor 53, which can similarly include application-specific integrated circuit (ASIC) logic, a graphics processor, an audio processor, a general-purpose processor, etc.
  • the memory storage 54 can be any type of memory, such as volatile memory (for example, static random access memory (SRAM), dynamic random access memory (DRAM), etc.) or non-volatile memory (for example, flash memory, etc.).
  • memory storage 54 may be implemented by cache memory.
  • processing circuitry 56 may include memory (eg, cache, etc.) for implementing an image buffer or the like.
  • encoder 20A implemented with logic circuitry may include an image buffer (eg, implemented with processing circuitry 56 or memory storage 54 ) and a graphics processing unit (eg, implemented with processing circuitry 56 ).
  • a graphics processing unit may be communicatively coupled to the image buffer.
  • Graphics processing unit may include encoder 20A implemented by processing circuitry 56 .
  • Logic circuits may be used to perform the various operations discussed herein.
  • decoder 30A may be implemented by processing circuitry 56 in a similar manner to implement the various modules discussed with reference to decoder 30 of FIG. 1B and/or any other decoder system or subsystem described herein.
  • logic circuit implemented decoder 30A may include an image buffer (implemented by processing circuit 56 or memory storage 54 ) and a graphics processing unit (eg, implemented by processing circuit 56 ).
  • a graphics processing unit may be communicatively coupled to the image buffer.
  • Graphics processing unit may include image decoder 30A implemented by processing circuitry 56 .
  • antenna 52 may be used to receive an encoded bitstream of image data.
  • an encoded bitstream may contain data related to encoding audio or video frames, indicators, index values, mode selection data, etc., as discussed herein, such as data related to encoding partitions.
  • Coding system 50 may also include decoder 30A coupled to antenna 52 and used to decode the encoded bitstream.
  • a display (or audio playback) device 55 is used to present images (or audio).
  • the decoder 30A may be used to perform the reverse process.
  • the decoder 30A may be configured to receive and parse such syntax elements, decoding the associated image data accordingly.
  • encoder 20A may entropy encode the syntax elements into an encoded bitstream. In such instances, decoder 30A may parse such syntax elements and decode the associated image data accordingly.
  • FIG. 1C is a schematic diagram of a decoding device 400 provided by an embodiment of the present invention.
  • the decoding device 400 is suitable for implementing the disclosed embodiments described herein.
  • the decoding device 400 may be a decoder, such as the image feature map decoder 30A in FIG. 1A , or an encoder, such as the image feature map encoder 20A in FIG. 1A .
  • the image decoding device 400 includes: an ingress port 410 (or input port 410) and a receiving unit (receiver unit, Rx) 420 for receiving data; a processor, logic unit, or central processing unit (CPU) 430 for processing data, where the processor 430 here can be a neural network processor 430; a sending unit (transmitter unit, Tx) 440 and an egress port 450 (or output port 450) for transmitting data; and a memory 460.
  • the image (or audio) decoding device 400 may also include optical-to-electrical (OE) components and electrical-to-optical (EO) components for the egress or ingress of optical or electrical signals.
  • the processor 430 is realized by hardware and software.
  • Processor 430 may be implemented as one or more processor chips, cores (eg, multi-core processors), FPGAs, ASICs, and DSPs.
  • the processor 430 is in communication with the ingress port 410 , the receiving unit 420 , the transmitting unit 440 , the egress port 450 and the memory 460 .
  • the processor 430 includes a decoding module 470 (eg, a neural network NN based decoding module 470 ).
  • the decoding module 470 implements the embodiments disclosed above. For example, the decoding module 470 performs, processes, prepares, or provides for various encoding operations.
  • the decoding module 470 is implemented with instructions stored in the memory 460 and executed by the processor 430 .
  • Memory 460 includes one or more magnetic disks, tape drives, and solid-state drives, and can be used as an overflow data storage device for storing programs when such programs are selected for execution, and for storing instructions and data that are read during program execution.
  • Memory 460 may be volatile and/or non-volatile, and may be read-only memory (ROM), random access memory (RAM), ternary content-addressable memory (TCAM), and/or static random-access memory (SRAM).
  • FIG. 1D is a simplified block diagram of an apparatus 500 provided in an exemplary embodiment.
  • the apparatus 500 may be used as either or both of the source device 12 and the destination device 14 in FIG. 1A .
  • Processor 502 in apparatus 500 may be a central processing unit.
  • processor 502 may be any other type of device or devices, existing or to be developed in the future, capable of manipulating or processing information. Although the disclosed implementations can be implemented using a single processor, such as the processor 502 shown, using more than one processor is faster and more efficient.
  • memory 504 in apparatus 500 may be a read only memory (ROM) device or a random access memory (RAM) device. Any other suitable type of storage device may be used as memory 504 .
  • Memory 504 may include code and data 506 accessed by processor 502 via bus 512 .
  • Memory 504 may also include an operating system 508 and application programs 510, including at least one program that allows processor 502 to execute methods described herein.
  • application program 510 may include applications 1 through N, and may also include an image decoding application that performs the methods described herein.
  • Apparatus 500 may also include one or more output devices, such as display 518 .
  • display 518 may be a touch-sensitive display that combines the display with touch-sensitive elements that may be used to sense touch input.
  • Display 518 may be coupled to processor 502 via bus 512 .
  • Although the bus 512 in the device 500 is described herein as a single bus, the bus 512 may include multiple buses. Additionally, secondary storage may be directly coupled to the other components of the device 500 or accessed over a network, and may comprise a single integrated unit such as a memory card or multiple units such as multiple memory cards. Accordingly, the apparatus 500 may have a wide variety of configurations.
  • FIG. 2A shows a system architecture 1800 in a possible image feature map or audio feature variable encoding and decoding scenario, including:
  • Collection device 1801: the video collection device performs the original video (or audio) collection;
  • Pre-collection processing 1802: the collected original video (or audio) is passed through a series of pre-processing steps to obtain video (or audio) data;
  • Encoding 1803: video (or audio) encoding is performed to reduce coding redundancy and reduce the amount of data to be transmitted when compressing image feature maps or audio feature variables;
  • Sending 1804: the compressed code stream data obtained after encoding is sent through the sending module;
  • Receiving 1805: the compressed code stream data is received by the receiving module through network transmission;
  • Code stream decoding 1806: code stream decoding is performed on the code stream data;
  • Rendering and displaying (or playing) 1807: the decoded data is rendered and displayed (or played).
  • FIG. 2B shows a possible image feature map (or audio feature variable) oriented system architecture 1900 in a machine task scenario, including:
  • Feature extraction 1901: feature extraction is performed on the image (or audio) source;
  • Side information extraction 1902: side information is extracted from the feature extraction data;
  • Probability estimation 1903: the side information is used as the input of probability estimation, and probability estimation is performed on the feature map (or feature variable) to obtain the probability estimation result;
  • Encoding 1904: entropy encoding is performed on the feature extraction data in combination with the probability estimation result to obtain a code stream;
  • a quantization or rounding operation is performed on the feature extraction data before encoding, and then the quantized or rounded feature extraction data is encoded.
  • entropy coding is performed on the side information, so that the code stream includes side information data.
  • Decoding 1905: entropy decoding is performed on the code stream in combination with the probability estimation result to obtain the image feature map (or audio feature variable);
  • the coded stream includes side information coded data
  • entropy decoding is performed on the side information coded data
  • the decoded side information data is used as an input of probability estimation to obtain a probability estimation result.
  • when the input of probability estimation does not include context information, the probability estimation results of the feature elements can be output in parallel; when the input of probability estimation includes context information, the probability estimation results of the feature elements need to be output serially.
  • the side information is feature information obtained by inputting the image feature map or the audio feature variable into a neural network for further extraction; the number of feature elements contained in the side information is smaller than that of the image feature map or the audio feature variable.
  • side information of image feature maps or audio feature variables can be encoded into the code stream.
  • Machine vision task 1906: a machine vision (or hearing) task is performed on the decoded feature map (or feature variable).
  • the decoded feature data is input into the machine vision (or auditory) task network, and the network output is one-dimensional, two-dimensional or multi-dimensional data related to visual (or auditory) tasks such as classification, target recognition, semantic segmentation and other tasks.
  • the feature extraction and encoding processes are implemented on the terminal, and the decoding and execution of machine vision tasks are implemented on the cloud.
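The 1901-1906 flow above can be sketched end to end with stand-in functions. This is a toy illustration only: the real feature extraction, side information extraction, and probability estimation are neural networks, whereas here each stage is replaced by a trivially simple hypothetical function, and the "encoding" simply records which elements differ from the most probable value.

```python
def feature_extraction(signal):
    # Stand-in for a learned encoder network: scale and round (1901).
    return [round(x * 4) for x in signal]

def side_info(features):
    # Stand-in for further extraction (1902): fewer elements than the map.
    return [sum(features) // max(len(features), 1)]

def probability_estimation(side):
    # Stand-in (1903): side information parameterizes one shared model.
    return {"most_probable": side[0]}

def encode(features, probs):
    # Stand-in (1904): only elements differing from the most probable
    # value are carried in the "code stream".
    mp = probs["most_probable"]
    return [(i, v) for i, v in enumerate(features) if v != mp]

def decode(stream, probs, n):
    # Stand-in (1905): fill with the most probable value, then patch.
    mp = probs["most_probable"]
    out = [mp] * n
    for i, v in stream:
        out[i] = v
    return out

features = feature_extraction([0.1, 0.2, 0.1, 0.9])
probs = probability_estimation(side_info(features))
stream = encode(features, probs)
rec = decode(stream, probs, len(features))
```

Both ends derive the same probability model from the side information, so the decoder reconstructs the feature map exactly; this mirrors the requirement that the side information be available to probability estimation on both sides.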
  • the encoder 20A is operable to receive images (or image data) or audio (or audio data) 17 via an input 202 or the like.
  • the received image, image data, audio, and audio data may also be preprocessed image (or preprocessed image data) or audio (or preprocessed audio data) 19 .
  • Image (or audio) 17 may also be referred to as the current image or the image to be encoded (in particular when distinguishing the current image from other images in video encoding, such as previously encoded and/or decoded images of the same video sequence, that is, the video sequence that also includes the current image), or as the current audio or the audio to be encoded.
  • a (digital) image is or can be viewed as a two-dimensional array or matrix of pixel points with intensity values. A pixel point in the array may also be referred to as a pixel (pixel or pel, short for picture element). The number of pixels in the array or image in the horizontal and vertical directions (or axes) determines the size and/or resolution of the image. In order to represent a color, three color components are usually used, that is, an image can be represented as or include three pixel arrays. In the RGB format or color space, an image includes corresponding red, green, and blue pixel arrays.
  • each pixel can be expressed in a luminance/chroma format or color space, such as YCbCr, including a luminance component indicated by Y (also denoted by L sometimes) and two chrominance components indicated by Cb and Cr.
  • the luminance (luma) component Y represents brightness or grayscale level intensity (e.g., both are the same in a grayscale image), while the two chrominance (chroma) components Cb and Cr represent the chrominance or color information components.
  • an image in the YCbCr format includes a luminance pixel point array of luminance pixel point values (Y) and two chrominance pixel point arrays of chrominance values (Cb and Cr).
  • Images in RGB format can be converted or transformed to YCbCr format and vice versa; this process is also known as color transformation or conversion. If an image is black and white, the image may only include an array of luminance pixels. Correspondingly, an image can be, for example, an array of luma pixels in monochrome format, or an array of luma pixels and two corresponding arrays of chrominance pixels in the 4:2:0, 4:2:2, or 4:4:4 color format.
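The RGB-to-YCbCr transformation mentioned above is a linear combination of the color components. As a concrete example (one common variant, not necessarily the one used by the encoder described here), the full-range BT.601 conversion for 8-bit samples is:

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 RGB -> YCbCr for 8-bit samples (one common
    variant of the color conversion; other standards use other matrices)."""
    y  =  0.299    * r + 0.587    * g + 0.114    * b
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128
    return y, cb, cr
```

For a neutral gray (R = G = B), Cb and Cr both come out at the midpoint 128, which is why a black-and-white image can be represented by the luma array alone.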
  • the image encoder 20A places no limitation on the color space of the image.
  • an embodiment of the encoder 20A may include an image (or audio) segmentation unit (not shown in FIG. 1A or 1B) for segmenting the image (or audio) 17 into multiple (typically non-overlapping) image blocks 203 or audio segments.
  • image blocks can also be called root blocks, macro blocks (H.264/AVC), or, in the H.265/HEVC and VVC standards, coding tree blocks (Coding Tree Block, CTB) or coding tree units (Coding Tree Unit, CTU).
  • the segmentation unit can be used to use the same block size for all images in a video sequence and to use a corresponding grid that defines the block size, or to vary the block size between images or subsets or groups of images and segment each image into corresponding blocks.
  • the encoder can be adapted to directly receive blocks 203 of an image 17 , for example one, several or all blocks making up said image 17 .
  • the image block 203 may also be referred to as a current image block or an image block to be encoded.
  • the image block 203 is also or can be regarded as a two-dimensional array or matrix composed of pixels with intensity values (pixel values), but the image block 203 is smaller in size than the image 17 .
  • block 203 may comprise one pixel point array (for example, a luminance array in the case of a monochrome image 17, or a luminance array or a chrominance array in the case of a color image), or three pixel point arrays (for example, one luma array and two chrominance arrays in the case of a color image 17), or any other number and/or type of arrays depending on the color format employed.
  • a block may be an array of M ⁇ N (M columns ⁇ N rows) pixel points, or an array of M ⁇ N transform coefficients, and the like.
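Treating the image as a two-dimensional pixel array, the segmentation into non-overlapping M x N blocks described above can be sketched as follows (a generic illustration of block partitioning, not the specific CTU derivation of any standard):

```python
def partition_into_blocks(img, bh, bw):
    """Split a 2-D pixel array (list of rows) into non-overlapping
    bh x bw blocks in raster order; edge blocks are smaller when the
    image size is not a multiple of the block size."""
    h, w = len(img), len(img[0])
    blocks = []
    for top in range(0, h, bh):
        for left in range(0, w, bw):
            blocks.append([row[left:left + bw] for row in img[top:top + bh]])
    return blocks

# A 4x4 image split into four 2x2 blocks.
img = [[r * 4 + c for c in range(4)] for r in range(4)]
blocks = partition_into_blocks(img, 2, 2)
```

Each resulting block is itself a small two-dimensional array, matching the description of block 203 as a smaller matrix of pixel values than the image 17.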
  • the encoder 20A shown in FIGS. 1A-1B or 3A-3D is used to encode the image 17 block by block.
  • the encoder 20A shown in FIGS. 1A-1B or 3A-3D is used to encode the image 17 .
  • the encoder 20A shown in FIGS. 1A-1B or 3A-3D can also be used to partition the coded picture using slices (also called video slices), where the picture can be split or encoded using one or more (usually non-overlapping) slices.
  • Each slice may include one or more blocks (for example, coding tree units, CTUs) or one or more block groups (for example, tiles in the H.265/HEVC/VVC standards and subpictures in the VVC standard).
  • the encoder 20A shown in FIGS. 1A-1B or 3A-3D can also be used to partition and/or encode an image using slices/coding block groups (also called video coding block groups) and/or coding blocks (also called video coding blocks), where an image can be partitioned or encoded using one or more (usually non-overlapping) slices/coding block groups; each slice/coding block group may include one or more blocks (such as CTUs) or one or more coding blocks, etc., wherein each coding block may be rectangular or the like and may include one or more complete or partial blocks (such as CTUs).
  • the coding network 20 is used to obtain image feature maps or audio feature variables according to the input data through the coding network.
  • the encoding network 20 is as shown in FIG. 4A .
  • the encoding network 20 includes multiple network layers, and any network layer may be a convolutional layer, a normalization layer, a nonlinear activation layer, and the like.
  • the input of the encoding network 20 is at least one image to be encoded or at least one image block to be encoded.
  • the image to be encoded can be an original image, a lossy image or a residual image.
  • an example of the network structure of the encoding network 20 is shown in FIG. 4B. It can be seen that the encoding network in the example includes 5 network layers, specifically three convolutional layers and two nonlinear activation layers.
  • the rounding is used to round the image feature map or the audio feature variable by, for example, scalar quantization or vector quantization, to obtain the rounded image feature map or audio feature variable.
  • the encoder 20A can be used to output the rounding parameter (quantization parameter, QP), for example, directly or after encoding or compression by the encoding decision realization unit, so that, for example, the decoder 30A can receive and use the quantization parameter for decoding.
  • the output feature map or feature audio feature variables are preprocessed before being rounded, and the preprocessing may include trimming, color format conversion (for example, from RGB to YCbCr), color correction, or denoising.
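The scalar-quantization form of the rounding step above can be sketched as follows: the encoder scales each feature element by the reciprocal of a quantization step and rounds to the nearest integer, and the decoder reconstructs using the same step (derived from the signaled QP). The step value here is illustrative only.

```python
def quantize(features, qstep):
    """Scalar quantization: scale by 1/qstep and round to the nearest
    integer, producing the integer levels that are entropy-encoded."""
    return [round(x / qstep) for x in features]

def dequantize(levels, qstep):
    """Decoder-side reconstruction from the received quantization step."""
    return [lv * qstep for lv in levels]
```

A smaller `qstep` preserves more precision at the cost of a larger range of levels to encode; this is the usual rate-distortion trade-off controlled by the QP.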
  • the probability estimation is based on the input feature map or feature variable information to obtain the probability estimation result of the image feature map or audio feature variable.
  • Probability estimation is used to perform probability estimation on rounded image feature maps or audio feature variables.
  • the probability estimation may be a probability estimation network; the probability estimation network is a convolutional network, and the convolutional network includes convolutional layers and nonlinear activation layers. Taking FIG. 4B as an example, the probability estimation network includes 5 network layers, specifically three convolutional layers and two nonlinear activation layers. Probability estimation can also be realized by non-network traditional probability estimation methods. Probability estimation methods include, but are not limited to, statistical methods such as maximum likelihood estimation and maximum a posteriori estimation.
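One common way such a probability estimation produces a probability for a rounded (integer-valued) feature element is to model the element with a Gaussian whose mean and scale come from the estimation network, and integrate the density over a unit-wide bin around each integer. This is a widely used formulation in learned compression, shown here as an assumption rather than the patent's specific model:

```python
import math

def gaussian_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def symbol_probability(k, mu, sigma):
    """Probability that a rounded feature element equals integer k under
    the estimated Gaussian: P(k) = CDF(k + 0.5) - CDF(k - 0.5)."""
    return gaussian_cdf(k + 0.5, mu, sigma) - gaussian_cdf(k - 0.5, mu, sigma)
```

The resulting per-integer probabilities sum to (approximately) one over the symbol range and can be fed directly to an arithmetic coder as the probability estimation result.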
  • the implementation of coding decision includes coding element judgment and entropy coding.
  • the image feature map or audio feature variable is one-dimensional, two-dimensional or multi-dimensional data output by the encoding network, where each data element is a feature element.
  • Coding element judgment 261: the coding element judgment judges each feature element in the image feature map or audio feature variable according to the probability estimation result information of the probability estimation, and decides which feature elements to entropy-encode according to the judgment result.
  • After the element judgment process of the Pth feature element of the image feature map or audio feature variable is completed, the element judgment process of the P+1th feature element begins, where P is a positive integer and P is less than M.
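One simple form of such an element-by-element judgment is a threshold test on the probability estimate: if the estimated probability of a feature element's most likely value is high enough, encoding that element contributes almost no information and it can be skipped. The threshold value and the exact criterion below are illustrative assumptions, not values taken from the patent:

```python
def judge_elements(prob_most_likely, threshold=0.99):
    """Per-element coding decision: True means the element is entropy-
    encoded; False means it is skipped because the probability estimate
    already pins down its value (probability above the threshold)."""
    return [p <= threshold for p in prob_most_likely]

# Elements whose most likely value has probability 0.999, 0.5 and 0.98.
flags = judge_elements([0.999, 0.5, 0.98], threshold=0.99)
```

Because the decoder runs the same probability estimation, it can reproduce the same flags and knows which elements to expect in the code stream.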
  • Entropy coding can use various public entropy coding algorithms, such as a variable length coding (VLC) scheme, a context adaptive VLC (CAVLC) scheme, an arithmetic coding scheme, a binarization algorithm, context adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or other entropy encoding methods or techniques.
  • the encoded image data 25, which can be output through the output terminal 212 in the form of an encoded bitstream 25 or the like, is obtained so that the decoder 30A or the like can receive and use the parameters for decoding.
  • Encoded bitstream 25 may be transmitted to decoder 30A, or stored in memory for later transmission or retrieval by decoder 30A.
  • the entropy coding can be coded by using an entropy coding network, for example by using a convolutional network.
  • since the entropy coding does not know the true symbol probabilities of the rounded feature map, this or related information can be added to the entropy coding and passed to the decoding end.
  • the joint network is based on the input side information to obtain the probability estimation results and decision information of image feature maps or audio feature variables.
  • the joint network is a multi-layer network, and the joint network may be a convolutional network, which includes a convolutional layer and a nonlinear activation layer. Any network layer of the joint network can be a convolutional layer, a normalization layer, a nonlinear activation layer, etc.
  • the decision information may be one-dimensional, two-dimensional or multi-dimensional data, and the size of the decision information may be consistent with the size of the image feature map.
  • the decision information can be output after any network layer in the joint network.
  • the probability estimation result can be output after any network layer in the joint network.
  • Figure 6 is an example of the output of the network structure of the joint network.
  • the network structure includes 4 network layers, in which the decision information is output after the fourth network layer, and the probability estimation result is output after the second network layer.
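The idea of tapping different outputs from different depths of the joint network, as in the FIG. 6 example (probability estimation result after layer 2, decision information after layer 4), can be sketched with stand-in layer functions. The layers below are trivial hypothetical transforms, not the patent's actual network:

```python
def joint_network(side_information, layers):
    """Run a stack of layer functions over the side information; tap the
    probability estimate after layer 2 and the decision information after
    the final layer, mirroring the example structure in FIG. 6."""
    x, taps = side_information, {}
    for i, layer in enumerate(layers, start=1):
        x = layer(x)
        if i == 2:
            taps["probability"] = x
        if i == len(layers):
            taps["decision"] = x
    return taps

# Hypothetical stand-in layers (scale, bias, ReLU-like, binarize).
layers = [lambda v: [e * 2 for e in v],
          lambda v: [e + 1 for e in v],
          lambda v: [max(e, 0) for e in v],
          lambda v: [1 if e > 2 else 0 for e in v]]
taps = joint_network([1, -3], layers)
```

The two taps share the early layers, which is the point of a joint network: the probability estimation result and the decision information are computed from common intermediate features rather than by two separate networks.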
  • the generation network is to obtain the decision information of each feature element in the image feature map according to the input probability estimation result.
  • the generation network is a multi-layer network, and the generation network may be a convolutional network, which includes a convolutional layer and a nonlinear activation layer. Any network layer of the generated network can be a convolutional layer, a normalization layer, a nonlinear activation layer, etc.
  • the decision information can be output after generating any network layer in the network.
  • the decision information may be one-dimensional, two-dimensional or multi-dimensional data.
  • Fig. 7 is an example of output decision information of the network structure of the generating network, and the network structure includes 4 network layers.
  • the decoding decision implementation includes element judgment and entropy decoding.
  • the image feature map or audio feature variable is one-dimensional, two-dimensional or multi-dimensional data output by decoding decision-making, where each data is a feature element.
  • the decoding element judgment judges each feature element in the image feature map or audio feature variable according to the probability estimation result of the probability estimation, and decides, according to the judgment result, which feature elements to entropy-decode.
  • the decoding element judgment judges each feature element in the image feature map or audio feature variable and decides, according to the judgment result, which feature elements to entropy-decode; it corresponds to the coding element judgment performed for each feature element in the image feature map.
  • Entropy decoding may use various public entropy coding algorithms, such as variable length coding (VLC) schemes, context-adaptive VLC (CAVLC) schemes, arithmetic coding schemes, binarization-based algorithms, context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or other entropy coding methods or techniques.
  • VLC: variable length coding
  • CAVLC: context-adaptive VLC
  • SBAC: syntax-based context-adaptive binary arithmetic coding
  • PIPE: probability interval partitioning entropy
  • the encoded image (or audio) data can be output through the output terminal 212 in the form of an encoded bitstream 25 or the like, so that the decoder 30A or the like can receive it and use its parameters for decoding.
  • Encoded bitstream 25 may be transmitted to decoder 30A, or stored in memory for later transmission or retrieval by decoder 30A.
  • entropy decoding can be performed using an entropy decoding network, such as a convolutional network.
  • the decoding network 34 is used to pass the decoded image feature map or audio feature variable 31, or the post-processed decoded image feature map or audio feature variable 33, through it to obtain reconstructed image (or audio) data 35 in the pixel domain or machine-task-oriented data.
  • the decoding network contains multiple network layers, and any network layer can be a convolutional layer, a normalization layer, a nonlinear activation layer, etc. Operations such as concatenation (concat), addition, and subtraction may exist in the decoding network unit 306.
  • the network layer structures in the decoding network may be the same or different from each other.
  • the decoding network in the example includes 5 network layers, including a normalization layer, two convolutional layers, and two nonlinear activation layers.
  • the decoding network outputs the reconstructed image (or audio), or outputs machine-oriented task data.
  • the decoding network may include an object recognition network, a classification network or a semantic segmentation network.
  • the processing result of the current step can be further processed, and then output to the next step.
  • further operations or processing may be performed on the processing results of the encoder unit or the decoder unit, such as clipping or shifting operations or filtering processing.
  • the first feature element or the second feature element is, for example, the current feature element to be encoded or the current feature element to be decoded.
  • a decision map may also be called a decision graph.
  • the decision map is preferably a binary map, and the binary map may also be called a binary graph.
  • Figure 10A shows a specific implementation process 1400, and the operation steps are as follows:
  • Step 1401 Get the feature map of the image
  • This step is specifically implemented by the encoding network 204 in FIG. 3A , and for details, reference may be made to the above description of the encoding network 20 .
  • the image is input into the feature extraction module, which outputs the feature map y of the image; the feature map y can be three-dimensional data whose dimensions are w×h×c.
  • the feature extraction module can be implemented using an existing neural network, which is not limited here. This step is prior art.
  • the feature quantization module quantizes each feature value in the feature map y, rounding the floating-point feature values to obtain integer feature values, and obtains the quantized feature map.
  • for details, refer to the description of the rounding 24 in the foregoing embodiment.
  • Step 1402: Perform probability estimation on the feature map to obtain the probability estimation result of each feature element, that is, the probability distribution of each feature element in the feature map:
  • the parameters x, y, and i are positive integers
  • the coordinates (x, y, i) indicate the position of the current feature element to be encoded.
  • the coordinates (x, y, i) indicate the position of the current feature element to be encoded in the current three-dimensional feature map relative to the upper-left vertex.
  • a probability distribution model can be used to obtain the probability distribution, for example modeling with a single Gaussian model (GSM) or a Gaussian mixture model (GMM). First, the side information and the context information are input into the probability estimation network, which performs probability estimation on each feature element in the feature map to obtain the probability distribution of each feature element.
  • the probability estimation network can be based on a deep learning network, such as a recurrent neural network (RNN) or a convolutional neural network (e.g., PixelCNN), which is not limited here. The model parameters are substituted into the probability distribution model to obtain the probability distribution.
  • Step 1403: Perform entropy coding on the feature map to generate a compressed code stream.
  • the probability P that the current feature element to be encoded takes the value k is obtained. When the probability estimation result P of the current feature element to be encoded does not meet the preset condition, i.e., when P is greater than (or equal to) the first threshold T0, skip performing the entropy encoding process on the current feature element to be encoded; otherwise, when the probability estimation result P of the current feature element to be encoded satisfies the preset condition, i.e., when P is smaller than the first threshold T0, perform entropy coding on the current feature element to be encoded and write it into the code stream.
  • k can be any integer, such as 0, 1, -1, 2, 3 and so on.
  • the first threshold T0 is any number satisfying 0 < T0 < 1, such as 0.99, 0.98, 0.97, 0.95 and so on. (The threshold can be considered the same for every feature element.)
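  • the skip decision of step 1403 can be sketched as follows. This is an illustrative sketch, not the patent's implementation: the function name, the example probabilities, and the stand-in list for the entropy coder are all assumptions; only the comparison of P against T0 comes from the text.

```python
def should_encode(p_k: float, t0: float = 0.99) -> bool:
    """Return True when the feature element must be entropy-encoded.

    p_k is the estimated probability that the element takes the value k;
    t0 is the first threshold T0 (0 < T0 < 1).
    """
    # P >= T0: the element is almost surely k, so entropy coding is skipped.
    return p_k < t0

# Illustrative pass over four feature elements as (value, P(value == k)).
elements = [(0, 0.995), (3, 0.42), (0, 0.991), (-1, 0.60)]
coded = [v for v, p in elements if should_encode(p)]
# Only the two low-probability elements are written to the code stream.
```

  Because the high-probability elements never reach the entropy coder, the code stream shrinks without any signaling overhead, which is the point of the judgment step.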
  • Step 1404 the encoder sends or stores the compressed code stream.
  • Step 1411: Obtain the code stream of the image feature map to be decoded.
  • Step 1412: Perform probability estimation according to the code stream to obtain the probability estimation result of each feature element.
  • perform probability estimation on each feature element in the feature map to be decoded to obtain the probability distribution of the feature element to be decoded.
  • the feature map to be decoded includes a plurality of feature elements, and the plurality of feature elements include the feature element currently to be decoded.
  • the probability estimation network structure diagram used at the decoding end is the same as the probability estimation network structure at the encoding end in this embodiment.
  • Step 1413: Perform entropy decoding on the feature map to be decoded.
  • this step is specifically implemented by the decoding decision implementation 304 in FIG. 10B; for details, refer to the above description of the decoding decision implementation 30.
  • the probability P that the current feature element to be decoded takes the value k is obtained, that is, the probability estimation result P of the current feature element to be decoded.
  • the first threshold T0 can be obtained by parsing the code stream to get an index number; the decoding end constructs the threshold candidate list in the same way as the encoding end, and then obtains the corresponding threshold from the threshold candidate list according to the preset correspondence between thresholds and index numbers.
  • obtaining the index number from the code stream means obtaining the index number from the sequence header, image header, Slice/strip or SEI.
  • the code stream may be directly parsed, and the threshold value may be obtained from the code stream, specifically, the threshold value may be obtained from a sequence header, a picture header, a Slice/strip, or an SEI.
  • Step 1414: Reconstruct the decoded feature map, or input it to the machine vision task module to perform corresponding machine tasks.
  • This step can be specifically implemented by the decoding network 306 in FIG. 10B , and for details, reference can be made to the above description of the decoding network 34 .
  • Case 1: the entropy-decoded feature map is input into the image reconstruction module, and the neural network outputs the reconstructed image.
  • the neural network can adopt any structure, such as a fully connected network, a convolutional neural network, a recurrent neural network, and the like.
  • the neural network can adopt a multi-layer deep neural network structure to achieve better estimation results.
  • Case 2: the entropy-decoded feature map is input into the machine-vision-oriented task module to perform corresponding machine tasks.
  • complete machine vision tasks such as object classification, recognition, and segmentation.
  • the above k value at the decoding end is set corresponding to the k value at the encoding end.
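  • the decoder-side counterpart of step 1413 can be sketched as follows. The helper names are hypothetical, and the ordered list of coded values is a stand-in for calls into a real entropy decoder; the source only specifies the comparison against T0 and the substitution of k for skipped elements.

```python
def decode_elements(probs, coded_values, k=0, t0=0.99):
    """Reconstruct feature elements from their P(value == k) estimates.

    probs: per-element probability estimation results P.
    coded_values: values actually present in the code stream, in order
    (a stand-in for a real entropy decoder).
    """
    it = iter(coded_values)
    out = []
    for p in probs:
        if p >= t0:          # preset condition not met: skip entropy decoding
            out.append(k)    # the element's value is set directly to k
        else:                # preset condition met: entropy-decode the value
            out.append(next(it))
    return out

# Mirrors the encoder side: elements with P >= T0 were never written,
# so the decoder regenerates them as k and consumes no bits for them.
restored = decode_elements([0.995, 0.42, 0.991, 0.60], [3, -1])
```

  Because both ends run the same probability estimation, the decoder reaches the identical skip/decode decision for every element without any extra flag in the code stream.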
  • FIG. 11A shows a specific implementation process 1500 of Embodiment 2 of the present application, and the operation steps are as follows:
  • the probability estimation results include the first parameter and the second parameter; when the probability distribution is a Gaussian distribution, the first parameter is the mean μ and the second parameter is the variance σ; when the probability distribution is a Laplace distribution, the first parameter is the location parameter μ and the second parameter is the scale parameter b.
  • Step 1501 Get the feature map of the image
  • This step is specifically implemented by the encoding network 204 in FIG. 3B , and for details, reference may be made to the above description of the encoding network 20 .
  • the image is input into the feature extraction module, which outputs the feature map y of the image; the feature map y can be three-dimensional data whose dimensions are w×h×c.
  • the feature extraction module can be implemented using an existing neural network, which is not limited here. This step is prior art.
  • the feature quantization module quantizes each feature value in the feature map y, rounding the floating-point feature values to obtain integer feature values, and obtains the quantized feature map.
  • Step 1502: Input the feature map of the image into the side information extraction module, which outputs the side information.
  • This step is specifically implemented by the side information extraction unit 214 in FIG. 3B .
  • the side information extraction module can be implemented using the network shown in Figure 12. The side information can be understood as a feature map obtained by further extraction from the feature map, containing fewer feature elements than the feature map.
  • entropy coding may be performed on the side information here and written into the code stream, or the side information may instead be entropy-coded and written into the code stream in subsequent step 1504, which is not limited here.
  • Step 1503: Perform probability estimation on the feature map to obtain the probability estimation result of each feature element.
  • Probability distribution models can be used to obtain probability estimates and probability distributions.
  • the probability distribution model may be: a single Gaussian model (Gaussian single model, GSM) or an asymmetric Gaussian model or a mixed Gaussian model (Gaussian mixture model, GMM) or a Laplace distribution model (Laplace distribution).
  • the probability distribution model is a Gaussian model (single Gaussian model or asymmetric Gaussian model or mixed Gaussian model)
  • the side information or the context information is input into the probability estimation network, which performs probability estimation on each feature element in the feature map to obtain the values of the mean parameter μ and the variance σ.
  • the mean parameter μ and the variance σ are input into the probability distribution model used, to obtain a probability distribution.
  • the probability estimation result is the mean parameter μ and the variance σ.
  • when the probability distribution model is the Laplace distribution model:
  • the side information or the context information is input into the probability estimation network, which performs probability estimation on each feature element in the feature map to obtain the values of the location parameter μ and the scale parameter b.
  • the location parameter μ and the scale parameter b are input into the probability distribution model used to obtain a probability distribution.
  • the probability estimation result is the location parameter μ and the scale parameter b.
  • alternatively, the probability estimation network performs probability estimation on the feature map to be encoded, and the probability P that the current feature element to be encoded takes the value m is obtained; the probability estimation result is then the probability P that the current feature element to be encoded takes the value m.
  • the probability estimation network can be based on a deep learning network, such as a recurrent neural network (RNN) or a convolutional neural network (e.g., PixelCNN), which is not limited here.
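  • the probability P that an element takes a given integer value m can be derived from the estimated Gaussian parameters. The discretization below (integrating the density over [m − 0.5, m + 0.5]) is a common convention in learned compression and is an assumption here, not something the text mandates:

```python
import math

def gaussian_prob(m: int, mu: float, sigma: float) -> float:
    """P(element == m) for a Gaussian N(mu, sigma^2) discretized to
    integers: the probability mass falling in [m - 0.5, m + 0.5]."""
    def cdf(x: float) -> float:
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    return cdf(m + 0.5) - cdf(m - 0.5)

# A sharply peaked distribution around 0 gives P(0) close to 1, so such an
# element would be skipped under the threshold tests of step 1504.
p0 = gaussian_prob(0, mu=0.0, sigma=0.05)
```

  This shows why the parameter-based conditions of the methods below (small |μ − k| together with small σ or b) are proxies for "P is above T0": a narrow distribution centered near k concentrates almost all mass on k.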
  • Step 1504: Judge, according to the probability estimation result, whether entropy coding needs to be performed on the current feature element to be encoded, and, according to the judgment result, either perform entropy coding and write into the compressed code stream (encoded code stream) or skip entropy coding. Entropy coding is performed on the feature element currently to be encoded only when it is determined that entropy coding needs to be performed on the first feature element currently to be encoded.
  • This step is specifically implemented by the encoding decision implementation 208 in FIG. 3B , and for details, refer to the description of the above encoding decision implementation 26 .
  • one or more of the following methods can be used to judge, according to the probability estimation result, whether entropy coding needs to be performed on the current feature element to be encoded.
  • the parameters x, y, and i are positive integers
  • the coordinates (x, y, i) indicate the position of the current feature element to be encoded.
  • the coordinates (x, y, i) indicate the position of the current feature element to be encoded in the current three-dimensional feature map relative to the upper-left vertex.
  • Method 1: When the probability distribution model is a Gaussian distribution, judge whether to perform entropy coding on the current feature element to be encoded according to the probability estimation result of the first feature element. When the values of the mean parameter μ and the variance σ of the Gaussian distribution of the current feature element to be encoded do not meet the preset condition, i.e., when the absolute value of the difference between the mean μ and k is less than the second threshold T1 and the variance σ is less than the third threshold T2, there is no need to execute the entropy encoding process for the current feature element to be encoded; otherwise, when the preset condition is met, i.e., when the absolute value of the difference between the mean μ and k is greater than or equal to the second threshold T1 or the variance σ is greater than or equal to the third threshold T2, perform entropy encoding on the current feature element to be encoded and write it into the code stream.
  • k is any integer, such as 0, 1, -1, 2, 3 and so on.
  • the value of T2 is any number satisfying 0 < T2 < 1, such as 0.2, 0.3, 0.4, etc.
  • T1 is a number greater than or equal to 0 and less than 1, such as 0.01, 0.02, 0.001, 0.002.
  • when the value of k is 0 (the optimal value), it can be directly judged: when the absolute value of the mean parameter μ of the Gaussian distribution is less than T1 and the variance σ of the Gaussian distribution is less than T2, skip executing the entropy encoding process for the feature element to be encoded; otherwise, perform entropy encoding on the current feature element to be encoded and write it into the code stream.
  • the value of T2 is any number satisfying 0 < T2 < 1, for example 0.2, 0.3, 0.4 and so on.
  • T1 is a number greater than or equal to 0 and less than 1, such as 0.01, 0.02, 0.001, 0.002.
  • Method 2: When the probability distribution is a Gaussian distribution, obtain, according to the probability estimation result, the values of the mean parameter μ and the variance σ of the Gaussian distribution of the current feature element to be encoded. When the relationship between the mean μ, the variance σ and k satisfies abs(μ-k)+σ < T3 (the preset condition is not met), skip executing the entropy encoding process for the current feature element to be encoded, where abs(μ-k) means calculating the absolute value of the difference between the mean μ and k; otherwise, when the probability estimation result of the current feature element to be encoded satisfies abs(μ-k)+σ ≥ T3 (the preset condition), perform entropy encoding on the current feature element to be encoded and write it into the code stream.
  • k is any integer, such as 0, 1, -1, -2, 3 and so on.
  • the fourth threshold T3 is a number greater than or equal to 0 and less than 1, such as 0.2, 0.3
  • Method 3: When the probability distribution is a Laplace distribution, obtain, according to the probability estimation result, the values of the location parameter μ and the scale parameter b of the Laplace distribution of the current feature element to be encoded. When abs(μ-k)+b < T4 (the preset condition is not met), skip executing the entropy encoding process for the current feature element to be encoded, where abs(μ-k) means calculating the absolute value of the difference between the location parameter μ and k; otherwise, when the probability estimation result of the current feature element to be encoded satisfies abs(μ-k)+b ≥ T4 (the preset condition), perform entropy encoding on the current feature element to be encoded and write it into the code stream.
  • k is any integer, such as 0, 1, -1, -2, 3 and so on.
  • the fourth threshold T4 is a number greater than or equal to 0 and less than 0.5, such as 0.05, 0.09, 0.17
  • Method 4: When the probability distribution is a Laplace distribution, obtain, according to the probability estimation result, the values of the location parameter μ and the scale parameter b of the Laplace distribution of the current feature element to be encoded. When the absolute value of the difference between the location parameter μ and k is less than the second threshold T5 and the scale parameter b is less than the third threshold T6 (the preset condition is not met), skip executing the entropy encoding process for the current feature element to be encoded; otherwise, when the absolute value of the difference between the location parameter μ and k is greater than or equal to the second threshold T5 or the scale parameter b is greater than or equal to the third threshold T6 (the preset condition), perform entropy encoding on the current feature element to be encoded and write it into the code stream.
  • k is any integer, such as 0, 1, -1, -2, 3 and so on.
  • the value of T5 is 1e-2
  • the value of T6 is any number satisfying T6 < 0.5, such as 0.05, 0.09, 0.17, etc.
  • when the value of k is 0 (the optimal value), it can be directly judged: when the absolute value of the location parameter μ is less than T5 and the scale parameter b is less than T6, skip executing the entropy encoding process for the feature element to be encoded; otherwise, perform entropy encoding on the current feature element to be encoded and write it into the code stream. The value of the threshold T5 is 1e-2, and the value of T6 is any number satisfying T6 < 0.5, such as 0.05, 0.09, 0.17 and so on.
  • Method 5: When the probability distribution is a mixed Gaussian distribution, obtain, according to the probability estimation result, the values of all mean parameters μi and variances σi of the mixed Gaussian distribution of the current feature element to be encoded. When the sum of the absolute values of the differences between all the means of the mixed Gaussian distribution and k, plus any variance of the mixed Gaussian distribution, is less than the fifth threshold T7 (the preset condition is not met), skip executing the entropy encoding process for the current feature element to be encoded; otherwise, when that sum plus any variance of the mixed Gaussian distribution is greater than or equal to the fifth threshold T7 (the preset condition), perform entropy encoding on the current feature element to be encoded and write it into the code stream.
  • k is any integer, such as 0, 1, -1, -2, 3 and so on.
  • T7 is a number greater than or equal to 0 and less than 1, such as 0.2, 0.3, 0.4, etc. (It can be considered that the threshold value of each feature element is the same)
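  • under one reading of Method 5 (the "any variance" phrasing is ambiguous; the sketch uses the largest variance so that the check holds for every mixture component), the condition can be sketched as:

```python
def skip_gmm(means, sigmas, k: int = 0, t7: float = 0.3) -> bool:
    """Method 5 sketch: skip entropy coding when the sum of |mu_i - k|
    over all mixture means, plus the largest variance (one reading of
    the 'any variance' wording), stays below the fifth threshold T7."""
    return sum(abs(mu - k) for mu in means) + max(sigmas) < t7
```

  The function name, the default T7, and the max-over-variances interpretation are all assumptions for illustration; only the sum-of-absolute-differences structure comes from the text.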
  • Method 6: Obtain, according to the probability distribution, the probability P that the current feature element to be encoded takes the value k. When the probability estimation result P of the current feature element to be encoded does not meet the preset condition, i.e., when P is greater than (or equal to) the first threshold T0, skip performing the entropy encoding process on the current feature element to be encoded; otherwise, when the probability estimation result P of the current feature element to be encoded satisfies the preset condition, i.e., when P is smaller than the first threshold T0, perform entropy encoding on the current feature element to be encoded and write it into the code stream.
  • k can be any integer, such as 0, 1, -1, 2, 3 and so on.
  • the first threshold T0 is any number satisfying 0 < T0 < 1, such as 0.99, 0.98, 0.97, 0.95 and so on. (The threshold can be considered the same for every feature element.)
  • the thresholds T1, T2, T3, T4, T5 and T6 can be integerized, that is, shifted and amplified into integers.
  • Method 1: Take the threshold T1 as an example: take any value within the value range of T1 as the threshold T1, and write the threshold T1 into the code stream. Specifically, the threshold written into the code stream can be stored in the sequence header, image header, Slice/strip or SEI and sent to the decoding end; other methods can also be used, which are not limited here. Similar methods can also be used for the remaining thresholds T0, T2, T3, T4, T5 and T6.
  • Method 2: The encoding end adopts a fixed threshold agreed with the decoding end, and there is no need to write it into the code stream or transmit it to the decoding end. For example, taking the threshold T1 as an example, any value within the value range of T1 is directly taken as the value of T1. Similar methods can also be used for the remaining thresholds T0, T2, T3, T4, T5 and T6.
  • Method 3: Build a threshold candidate list, put the most likely values within the value range of T1 into the threshold candidate list (each threshold corresponds to a threshold index number), determine an optimal threshold, use the optimal threshold as T1, use the index number of the optimal threshold as the threshold index number of T1, and write the threshold index number of T1 into the code stream.
  • the threshold index number written into the code stream can be stored in the sequence header, image header, Slice/strip or SEI and sent to the decoding end; other methods can also be used, which are not limited here. Similar methods can also be used for the remaining thresholds T0, T2, T3, T4, T5 and T6.
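  • the index signaling of Method 3 can be sketched as follows. The candidate values are purely illustrative, and both ends must build the identical list for the index to be meaningful:

```python
THRESHOLD_CANDIDATES = [0.0, 0.001, 0.002, 0.01, 0.02]  # illustrative list

def threshold_to_index(t: float) -> int:
    """Encoder side: choose the candidate closest to the desired threshold
    and signal only its index number in the code stream."""
    return min(range(len(THRESHOLD_CANDIDATES)),
               key=lambda i: abs(THRESHOLD_CANDIDATES[i] - t))

def index_to_threshold(idx: int) -> float:
    """Decoder side: recover the threshold from the parsed index number."""
    return THRESHOLD_CANDIDATES[idx]
```

  Signaling a small index instead of a full-precision threshold keeps the per-sequence or per-picture overhead to a few bits.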
  • Step 1505 the encoder sends or stores the compressed code stream.
  • Step 1511 Obtain the code stream of the feature map of the image to be decoded
  • Step 1512 Obtain the probability estimation results of each feature element
  • This step is specifically implemented by the probability estimation unit 302 in FIG. 11A , and for details, refer to the above description of the probability estimation 40 .
  • entropy decoding is performed on the side information code stream to obtain the side information; combined with the side information, probability estimation is performed on each feature element in the feature map to be decoded to obtain the probability estimation result of the current feature element to be decoded.
  • the probability estimation method used by the decoder is the same as the probability estimation method of the encoder in this embodiment, and the probability estimation network structure diagram is the same as the probability estimation network structure of the encoder in this embodiment, so details are not repeated here.
  • Step 1513: Judge, according to the probability estimation result, whether entropy decoding needs to be performed on the current feature element to be decoded, perform or skip entropy decoding according to the judgment result, and obtain the decoded feature map. This step is specifically implemented by the decoding decision implementation 304 in FIG. 11A; for details, refer to the above description of the decoding decision implementation 30.
  • one or more of the following methods can be used to judge, according to the probability estimation result, whether entropy decoding needs to be performed on the current feature element to be decoded.
  • Method 1: When the probability distribution model is a Gaussian distribution, obtain, according to the probability estimation result, the values of the mean parameter μ and the variance σ of the current feature element to be decoded. When the absolute value of the difference between the mean μ and k is less than the second threshold T1 and the variance σ is less than the third threshold T2 (the preset condition is not satisfied), the value of the current feature element to be decoded is set to k and the entropy decoding process for it is skipped; otherwise, when the absolute value of the difference between the mean μ and k is greater than or equal to the second threshold T1 or the variance σ is greater than or equal to the third threshold T2 (the preset condition), perform entropy decoding to obtain the value of the current feature element to be decoded.
  • when the value of k is 0 (the optimal value), it can be directly judged: when the absolute value of the mean parameter μ of the Gaussian distribution is less than T1 and the variance σ of the Gaussian distribution is less than T2, the value of the current feature element to be decoded is set to k and the entropy decoding process for it is skipped; otherwise, perform entropy decoding to obtain the value of the current feature element to be decoded.
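  • decoder-side Method 1 can be sketched as a single per-element function; the hypothetical read_symbol callable stands in for a real entropy decoder, and the default thresholds are illustrative:

```python
def decode_gaussian_element(mu: float, sigma: float, read_symbol,
                            k: int = 0, t1: float = 0.01,
                            t2: float = 0.2):
    """Decoder Method 1: if |mu - k| < T1 and sigma < T2, the element is
    set to k without touching the code stream; otherwise its value is
    entropy-decoded via read_symbol()."""
    if abs(mu - k) < t1 and sigma < t2:
        return k
    return read_symbol()
```

  Since μ and σ come from the shared probability estimation network, the decoder makes exactly the same skip decision the encoder made, bit-for-bit, with no extra signaling.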
  • Method 2: When the probability distribution is a Gaussian distribution, obtain, according to the probability estimation result, the values of the mean parameter μ and the variance σ of the current feature element to be decoded. When the relationship between the mean μ, the variance σ and k satisfies abs(μ-k)+σ < T3 (the preset condition is not satisfied), where T3 is the fourth threshold, the value of the current feature element to be decoded is set to k and the entropy decoding process for it is skipped; otherwise, when the probability estimation result of the current feature element to be decoded satisfies abs(μ-k)+σ ≥ T3 (the preset condition), perform entropy decoding to obtain the value of the current feature element to be decoded.
  • Method 3: When the probability distribution is a Laplace distribution, obtain the values of the location parameter μ and the scale parameter b according to the probability estimation result. When abs(μ-k)+b < T4 (the preset condition is not satisfied), where T4 is the fourth threshold, the value of the current feature element to be decoded is set to k and the entropy decoding process for it is skipped; otherwise, when the probability estimation result of the current feature element to be decoded satisfies abs(μ-k)+b ≥ T4 (the preset condition), perform entropy decoding to obtain the value of the current feature element to be decoded.
  • Method 4: When the probability distribution is a Laplace distribution, obtain the values of the location parameter μ and the scale parameter b according to the probability estimation result. When the absolute value of the difference between the location parameter μ and k is less than the second threshold T5 and the scale parameter b is less than the third threshold T6 (the preset condition is not satisfied), the value of the current feature element to be decoded is set to k and the entropy decoding process for it is skipped; otherwise, when the absolute value of the difference between the location parameter μ and k is greater than or equal to the second threshold T5 or the scale parameter b is greater than or equal to the third threshold T6 (the preset condition), perform entropy decoding to obtain the value of the current feature element to be decoded.
  • when the value of k is 0 (the optimal value), it can be directly judged: when the absolute value of the location parameter μ is less than T5 and the scale parameter b is less than T6, the value of the current feature element to be decoded is set to k and the entropy decoding process for it is skipped; otherwise, perform entropy decoding to obtain the value of the current feature element to be decoded.
  • Method 5: When the probability distribution is a mixed Gaussian distribution, obtain, according to the probability estimation result, the values of all mean parameters μi and variances σi of the mixed Gaussian distribution of the current feature element to be decoded. When the sum of the absolute values of the differences between all the means of the mixed Gaussian distribution and k, plus any variance of the mixed Gaussian distribution, is less than the fifth threshold T7 (the preset condition is not satisfied), the value of the current feature element to be decoded is set to k and the entropy decoding process for it is skipped; otherwise, when that sum plus any variance of the mixed Gaussian distribution is greater than or equal to the fifth threshold T7 (the preset condition), perform entropy decoding to obtain the value of the current feature element to be decoded.
• Method 6: According to the probability distribution of the current feature element to be decoded, obtain the probability P that the value of the current feature element to be decoded is k, that is, the probability estimation result P of the current feature element to be decoded.
• When the probability estimation result P does not meet the preset condition, that is, P is greater than the first threshold T0, there is no need to perform entropy decoding on the current feature element to be decoded, and its value is set to k; otherwise, when the preset condition is met, that is, P is less than or equal to the first threshold T0, entropy decoding is performed on the code stream to obtain the value of the feature element currently to be decoded.
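Method 6 needs the probability P that the element equals k. Under a Gaussian model that probability is commonly taken as the probability mass of the interval [k−0.5, k+0.5]; the sketch below assumes that convention, and the thresholds used in the test are illustrative:

```python
import math

def prob_value_is_k(k: int, mu: float, sigma: float) -> float:
    """P(element == k) taken as Gaussian mass over [k - 0.5, k + 0.5]."""
    cdf = lambda x: 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    return cdf(k + 0.5) - cdf(k - 0.5)

def must_entropy_decode(k: int, mu: float, sigma: float, t0: float) -> bool:
    """Preset condition of Method 6: decode only when P <= T0."""
    return prob_value_is_k(k, mu, sigma) <= t0
```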
  • the above k value at the decoding end is set corresponding to the k value at the encoding end.
  • the method of obtaining thresholds T0, T1, T2, T3, T4, T5, T6 and T7 corresponds to the encoding end, and one of the following methods can be used:
  • Method 1 Obtain the threshold value from the code stream, specifically, obtain the threshold value from the sequence header, image header, slice/strip or SEI.
  • Method 2 The decoder adopts the fixed threshold agreed with the encoder.
  • Method 3 Obtain the threshold index number from the code stream, specifically, obtain the threshold index number from the sequence header, image header, Slice/strip or SEI. Then, the decoder constructs a threshold candidate list in the same way as the encoder, and obtains the corresponding threshold in the threshold candidate list according to the threshold index number.
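Threshold Method 3 can be sketched as follows; the candidate list contents are illustrative, the only requirement being that encoder and decoder construct the same list:

```python
# Both sides build the identical candidate list; only the index is signalled.
THRESHOLD_CANDIDATES = [0.90, 0.95, 0.98, 0.99]

def threshold_index_for_stream(t0: float) -> int:
    # Encoder side: the index is written into the sequence header,
    # image header, slice/strip or SEI.
    return THRESHOLD_CANDIDATES.index(t0)

def threshold_from_stream(index: int) -> float:
    # Decoder side: the index is parsed from the same syntax location
    # and looked up in the identically constructed list.
    return THRESHOLD_CANDIDATES[index]
```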
• The thresholds T1, T2, T3, T4, T5 and T6 can be integerized, that is, shifted and scaled into integers.
• Step 1514: Same as step 1414.
  • FIG. 13A shows a specific implementation process 1600 provided by Embodiment 3 of the present application, and the operation steps are as follows:
  • Step 1601 Same as step 1501, this step is specifically implemented by the coding network 204 in FIG. 3C , for details, please refer to the above description of the coding network 20;
  • Step 1602 Same as step 1502, this step is specifically implemented by side information extraction 214 in FIG. 3C;
• Step 1603: Perform probability estimation on the feature map to obtain the probability estimation result of each feature element.
  • Probability distribution models can be used to obtain probability estimates.
  • the probability distribution model may be: a single Gaussian model or an asymmetric Gaussian model or a mixed Gaussian model or a Laplace distribution model.
• When the probability distribution model is a Gaussian model (single Gaussian model, asymmetric Gaussian model or mixed Gaussian model), the side information or the context information is input into the probability estimation network, probability estimation is performed on each feature element in the feature map, and the values of the model parameters, the mean parameter μ and the variance σ, are obtained, that is, the probability estimation result.
• When the probability distribution model is the Laplace distribution model, the side information or the context information is input into the probability estimation network, probability estimation is performed on each feature element in the feature map, and the values of the model parameters, the position parameter μ and the scale parameter b, are obtained, that is, the probability estimation result.
  • the probability estimation result is input into the used probability distribution model to obtain the probability distribution.
• Alternatively, the probability estimation network performs probability estimation on the feature map to be encoded, and the probability P that the current feature element to be encoded takes the value m is obtained.
  • m is any integer, such as 0, 1, -1, -2, 3 and so on.
  • the probability estimation network may use a network based on deep learning, such as a recurrent neural network and a convolutional neural network, etc., which are not limited here.
• Step 1604: Determine, according to the probability estimation result, whether to perform entropy coding on the current feature element to be coded; according to the judgment result, either perform entropy coding on the current feature element to be coded and write it into the coded stream, or skip entropy coding. Entropy encoding is performed on the current feature element to be encoded only when it is determined to be necessary.
• The probability estimation result 211 is input into the judgment module, which outputs decision information 217 with the same dimensions as the feature map.
  • the decision information 217 may be a three-dimensional decision map.
  • the judging module can be realized by using a network method, that is, the probability estimation result or probability distribution is input into the generation network shown in FIG. 7 , and the network outputs a decision map map.
• When the decision map map[x][y][i] is a preset value, it indicates that entropy coding is required for the current feature element to be encoded at the corresponding position, and entropy coding is performed on the current feature element to be coded according to the probability distribution.
• When the decision map map[x][y][i] is not a preset value, it indicates that the high-probability value of the current feature element to be encoded at the corresponding position is k and entropy encoding is not required, that is, the entropy encoding process is skipped.
• The decision information is a decision map map with the same dimensions as the feature map.
  • the decision map map[x][y][i] represents the value at the coordinate position (x, y, i) in the decision map map.
• The preset value is a specific value. For example, when the optional values of the feature element to be encoded are 0 and 1, the preset value is 0 or 1; when the feature element to be encoded has multiple optional values, the preset value is some specific value; for example, when the optional values of the current feature element to be encoded are 0-255, the preset value is a proper subset of 0-255.
  • the probability estimation result or probability distribution of the current feature element to be encoded is input into the judgment module, and the judgment module directly outputs decision information on whether the current feature element to be encoded needs to perform entropy coding.
• When the decision information output by the judging module is a preset value, entropy coding needs to be performed on the current feature element to be encoded; when the decision information output by the judging module is not a preset value, entropy coding does not need to be performed on the current feature element to be encoded.
  • the judging module can be implemented by means of a network, that is, the probability estimation result or probability distribution is input into the generation network shown in FIG. 7 , and the network outputs decision information, ie, a preset value.
• Method 1: The decision information is a decision map map with the same dimensions as the feature map. When the decision map map[x][y][i] is a preset value, it indicates that entropy coding is required for the current feature element to be encoded at the corresponding position, and entropy coding is performed on the current feature element to be coded according to the probability distribution.
• When the decision map map[x][y][i] is not a preset value, it indicates that the high-probability value of the current feature element to be encoded at the corresponding position is k; for example, when the decision map map[x][y][i] is 0, it indicates that entropy encoding is not required for the current feature element to be encoded at the corresponding position, that is, the entropy encoding process is skipped.
• The preset value is a specific value. For example, when the optional values of the feature element are 0 and 1, the preset value is 0 or 1; when the feature element in the decision map map has multiple optional values, the preset value is some specific value; for example, when the optional values of the feature element are 0-255, the preset value is a proper subset of 0-255.
• Method 2: The decision information is a decision map map with the same dimensions as the feature map. When the decision map map[x][y][i] is greater than or equal to the threshold T0, it indicates that entropy coding is required for the current feature element to be encoded at the corresponding position, and entropy coding is performed on the current feature element to be coded according to the probability distribution.
• When the decision map map[x][y][i] is less than the threshold T0, it indicates that the high-probability value of the current feature element to be encoded at the corresponding position is k and entropy encoding is not required, that is, the entropy encoding process is skipped.
  • T0 can be the mean value within the numerical range.
  • the decision information can also be the identifier or the value of the identifier directly output by the joint network.
• When the decision information is a preset value, it means that entropy encoding needs to be performed on the current feature element to be encoded; when the decision information output by the judgment module is not the preset value, it means that entropy encoding does not need to be performed on the current feature element to be encoded.
• When the optional values of the identifier or of the value of the identifier are 0 and 1, the preset value is 0 or 1 accordingly.
• The identifier or the value of the identifier can also have multiple optional values; in that case the preset value is some specific value. For example, when the optional values of the identifier or of the value of the identifier are 0-255, the preset value is a proper subset of 0-255.
• Here, high probability means that the probability that the current feature element to be encoded takes a given value is greater than the threshold P, where P can be a number such as 0.9, 0.95 or 0.98.
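Decision-information Method 2 above amounts to thresholding a decision map of the same dimensions as the feature map against T0; a minimal sketch, with purely illustrative shapes and values:

```python
def entropy_coding_mask(decision_map, t0):
    """map[x][y][i] >= T0 -> entropy-code the element at that position;
    otherwise skip it (its high-probability value is k)."""
    return [[[v >= t0 for v in inner] for inner in plane] for plane in decision_map]
```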
  • Step 1605 The encoder sends or stores the compressed code stream.
• The above steps 1601 to 1604 are executed for at least one of the feature elements of the feature map to obtain the compressed code stream, and the compressed code stream is transmitted to the decoding end.
  • Step 1611 Obtain the compressed code stream to be decoded
• Step 1612: Perform probability estimation on the feature map to be decoded to obtain the probability estimation result of each feature element.
  • This step can be specifically implemented by the probability estimation 302 in FIG. 13B , and for details, refer to the above description of the probability estimation 40 .
• Side information is obtained from the code stream, and the method in step 1603 is used to obtain the probability estimation result of the feature element currently to be decoded.
  • Step 1613 Obtain decision information, and judge whether to perform entropy decoding according to the decision information.
  • This step can be specifically implemented by the generation network 310 and the decoding decision implementation 304 in FIG. 13B , and for details, refer to the above description of the generation network 46 and the decoding decision implementation 30 .
  • the decision information 311 is acquired using the same method as that of the encoder in this embodiment.
• When the decision map map[x][y][i] is a preset value, it indicates that entropy decoding is required for the current feature element to be decoded at the corresponding position, and entropy decoding is performed on the current feature element to be decoded according to the probability distribution.
• When the decision map map[x][y][i] is not a preset value, it indicates that entropy decoding is not required for the current feature element to be decoded at the corresponding position, and the feature element at the corresponding position is set to a specific value k.
  • the probability estimation result or probability distribution of the feature element to be decoded is input into a judgment module, and the judgment module directly outputs decision information on whether the feature element to be decoded currently needs to perform entropy decoding.
• When the decision information output by the judging module is a preset value, entropy decoding needs to be performed on the current feature element to be decoded; when the decision information output by the judging module is not a preset value, it means that entropy decoding does not need to be performed on the current feature element to be decoded.
  • the judging module can be implemented by means of a network, that is, the probability estimation result or probability distribution is input into the generating network shown in FIG. 8 , and the network outputs decision information, ie, a preset value.
  • the decision information is used to indicate whether to perform entropy decoding on the feature element currently to be decoded, and the decision information may include a decision map.
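On the decoder side the same decision information drives reconstruction: entropy-decode where the decision map holds the preset value, otherwise write in the known value k. A hypothetical sketch (`preset`, `k`, and the decode callback are illustrative stand-ins):

```python
def reconstruct_feature_element(map_value, preset, k, entropy_decode):
    """Entropy-decode only where the decision map marks the element;
    otherwise the element takes the high-probability value k directly."""
    if map_value == preset:
        return entropy_decode()
    return k
```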
• Step 1614: Same as step 1414.
  • the above k value at the decoding end is set corresponding to the k value at the encoding end.
  • Figure 14 shows a specific implementation process 1700 of Embodiment 4 of the present application, and the operation steps are as follows:
  • Step 1701 Same as step 1501, this step can be specifically implemented by the encoding network 204 in FIG. 3D , and can refer to the above description of the encoding network 20 for details;
  • Step 1702 Same as step 1502, this step is specifically implemented by side information extraction 214 in FIG. 3D;
• Step 1703: Obtain the probability estimation result and decision information of each feature element in the feature map.
• This step can be specifically implemented by the joint network 218 in FIG. 3D; for details, reference can be made to the above description of the joint network 34.
• The side information and/or context information is input into the joint network, and the joint network outputs, for the feature map to be encoded, the decision information and the probability distribution and/or probability estimation result; the network structure shown in Figure 15 can be used.
• Decision information, probability distribution and/or probability estimation results can all be output from different layers of the joint network. For example: case 1) the middle layer of the network outputs decision information, and the last layer outputs the probability distribution and/or probability estimation result; case 2) the middle layer of the network outputs the probability distribution and/or probability estimation result, and the last layer outputs decision information; case 3) the final layer of the network outputs decision information together with the probability distribution and/or probability estimation result.
• When the probability distribution model is a Gaussian model (single Gaussian model, asymmetric Gaussian model or mixed Gaussian model), the side information or the context information is input into the joint network to obtain the values of the model parameters, the mean parameter μ and the variance σ, that is, the probability estimation result.
  • the probability estimation result is input into the Gaussian model to obtain the probability distribution.
• When the probability distribution model is the Laplace distribution model, the side information or the context information is input into the joint network to obtain the values of the model parameters, the position parameter μ and the scale parameter b, that is, the probability estimation result. Further, the probability estimation result is input into the Laplace distribution model to obtain the probability distribution.
• Alternatively, the side information and/or context information is input into the joint network to obtain the probability distribution of the current feature element to be encoded, and the probability P that the current feature element to be encoded takes the value m is obtained as the probability estimation result.
  • m is any integer, such as 0, 1, -1, -2, 3 and so on.
• Step 1704: Judge whether to perform entropy coding according to the decision information; according to the judgment result, either perform entropy coding and write into the compressed code stream (encoded code stream) or skip entropy coding. Entropy encoding is performed on the current feature element to be encoded only when it is determined to be necessary.
  • This step can be specifically implemented by the encoding decision implementation 208 in FIG. 3D , and for details, refer to the description of the above encoding decision implementation 26 .
• Method 1: The decision information is a decision map map with the same dimensions as the feature map. When the decision map map[x][y][i] is a preset value, it indicates that entropy coding is required for the current feature element to be encoded at the corresponding position, and entropy coding is performed on the current feature element to be coded according to the probability distribution.
• When the decision map map[x][y][i] is not a preset value, it indicates that the high-probability value of the current feature element to be encoded at the corresponding position is k; for example, when the decision map map[x][y][i] is 0, it indicates that entropy encoding is not required for the current feature element to be encoded at the corresponding position, that is, the entropy encoding process is skipped.
• The preset value is a specific value. For example, when the optional values of the current feature element to be encoded are 0 and 1, the preset value is 0 or 1; when the current feature element to be encoded in the decision map map has multiple optional values, the preset value is some specific value; for example, when the optional values of the current feature element to be encoded are 0-255, the preset value is a proper subset of 0-255.
• Method 2: The decision information is a decision map map with the same dimensions as the feature map. When the decision map map[x][y][i] is greater than or equal to the threshold T0, it indicates that entropy coding is required for the current feature element to be encoded at the corresponding position, and entropy coding is performed on the current feature element to be coded according to the probability distribution.
• When the decision map map[x][y][i] is less than the threshold T0, it indicates that the high-probability value of the current feature element to be encoded at the corresponding position is k and entropy encoding is not required, that is, the entropy encoding process is skipped.
  • T0 can be the mean value within the numerical range.
  • the decision information can also be the identifier or the value of the identifier directly output by the joint network.
• When the decision information is a preset value, it means that entropy encoding needs to be performed on the current feature element to be encoded; when the decision information output by the judgment module is not the preset value, it means that entropy encoding does not need to be performed on the current feature element to be encoded.
• The preset value is a specific value. For example, when the optional values of the current feature element to be encoded are 0 and 1, the preset value is 0 or 1; when the current feature element to be encoded in the decision map map output by the joint network has multiple optional values, the preset value is some specific value; for example, when the optional values of the current feature element to be encoded are 0~255, the preset value is a proper subset of 0~255.
• Here, high probability means that the probability that the current feature element to be encoded takes the value k is greater than the threshold P, where P can be a number such as 0.9, 0.95 or 0.98.
  • Step 1705 The encoder sends or stores the compressed code stream.
  • Step 1711 Obtain the code stream of the feature map of the image to be decoded, and obtain side information from the code stream
• Step 1712: Obtain the probability estimation result and decision information of each feature element in the feature map.
• This step can be specifically implemented by the joint network 312 in FIG. 16; for details, refer to the above description of the joint network 34.
• The method of obtaining the probability estimation result and decision information of each feature element in the feature map is the same as in step 1703.
  • Step 1713 Determine whether to perform entropy decoding according to the decision information; perform or not perform entropy decoding according to the judgment result. This step can be specifically implemented by the decoding decision implementation 304 in FIG.
• Method 1: The decision information is the decision map map. When the decision map map[x][y][i] is a preset value, it indicates that entropy decoding is required for the current feature element to be decoded at the corresponding position, and entropy decoding is performed on the current feature element to be decoded according to the probability distribution.
• When the decision map map[x][y][i] is not a preset value, it indicates that entropy decoding is not required for the current feature element to be decoded at the corresponding position, and the feature element at the corresponding position is set to a specific value k.
• Method 2: The decision information is a decision map map with the same dimensions as the feature map. When the decision map map[x][y][i] is greater than or equal to the threshold T0, it indicates that entropy decoding is required for the current feature element to be decoded at the corresponding position.
• When the decision map map[x][y][i] is less than the threshold T0, it indicates that the high-probability value of the current feature element to be decoded at the corresponding position is k and entropy decoding is not required, and the feature element at the corresponding position is set to the specific value k.
  • the value of T0 is the same as that of the encoding end.
  • the decision information can also be the identifier or the value of the identifier directly output by the joint network.
• When the decision information is a preset value, it means that entropy decoding needs to be performed on the current feature element to be decoded; when the decision information output by the judgment module is not a preset value, it means that entropy decoding does not need to be performed, and the value of the current feature element to be decoded is set to k.
• The preset value is a specific value. For example, when the optional values of the current feature element to be decoded are 0 and 1, the preset value is 0 or 1; when the current feature element to be decoded in the decision map map output by the joint network has multiple optional values, the preset value is some specific value; for example, when the optional values of the current feature element to be decoded are 0~255, the preset value is a proper subset of 0~255.
  • Step 1714 Same as step 1414, this step can be specifically implemented by the decoding network unit 306 in the decoder 9C of the above-mentioned embodiment, and details can refer to the description of the decoding network unit 306 in the above-mentioned embodiment.
  • the above k value at the decoding end is set corresponding to the k value at the encoding end.
  • FIG. 17 shows the specific implementation process 1800 of Embodiment 5 of the present application, and the operation steps are as follows:
  • Step 1801 Obtain the characteristic variables of the audio data to be encoded
• The audio signal to be encoded can be a time-domain audio signal; it can also be a frequency-domain signal obtained after the time-domain signal undergoes time-frequency transformation, for example the frequency-domain signal obtained by applying the MDCT to the time-domain audio signal, or the frequency-domain signal obtained by applying the FFT to the time-domain audio signal; the signal to be encoded can also be a signal after QMF filtering; the signal to be encoded can also be a residual signal, such as the residual signal of another codec or the residual signal after LPC filtering.
• Obtaining the feature variable of the audio data to be encoded may consist of extracting a feature vector from the audio signal to be encoded, for example extracting Mel cepstral coefficients from the audio signal to be encoded, quantizing the extracted feature vector, and using the quantized feature vector as the feature variable of the audio data to be encoded.
• Alternatively, the audio signal to be encoded is processed by the encoding neural network to obtain a latent variable, the latent variable output by the neural network is quantized, and the quantized latent variable is used as the feature variable of the audio data to be encoded.
• The encoding neural network is pre-trained, and the present invention does not limit the specific network structure and training method of the encoding neural network.
  • the encoding neural network can choose a fully connected network or a CNN network.
  • the present invention also does not limit the number of layers included in the coding neural network and the number of nodes in each layer.
  • the form of latent variables output by encoding neural networks with different structures may be different.
  • the encoding neural network is a fully connected network, and the output latent variable is a vector.
  • the encoding neural network is a CNN network, and the output latent variable is an N*M dimensional matrix, where N is the number of channels (channels) of the CNN network, and M is the size (latent size) of each channel latent variable of the CNN network, such as
  • a specific method for quantizing the latent variable output by the neural network may be to perform scalar quantization on each element of the latent variable, and the quantization step size of the scalar quantization may be determined according to different encoding rates. There may also be a bias in scalar quantization, for example, the latent variable to be quantized is biased and then scalar quantized according to the determined quantization step size.
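The scalar quantisation just described might look like the following sketch; the step size and bias are illustrative, chosen only to show the round trip, and in practice the step size would be derived from the target encoding rate:

```python
def scalar_quantize(latent, step, bias=0.0):
    # Offset each element, divide by the rate-dependent step size, and round.
    return [round((v - bias) / step) for v in latent]

def scalar_dequantize(indices, step, bias=0.0):
    # Inverse mapping used when reconstructing the latent variable.
    return [q * step + bias for q in indices]
```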
• The quantization of latent variables can also be implemented using other existing quantization techniques, which is not limited in the present invention.
• The quantized feature vector or the quantized latent variable is the feature variable of the audio data to be encoded.
• Step 1802: The feature variable of the audio data to be encoded is input into the side information extraction module, which outputs the side information.
• The side information extraction module can be implemented using the network shown in Figure 12. The side information can be understood as a feature variable obtained by further extraction, which contains fewer feature elements than the feature variable itself.
• Entropy encoding is performed on the side information and it is written into the code stream; alternatively, the side information may be entropy-encoded and written into the code stream in subsequent step 1804, which is not limited here.
• Step 1803: Perform probability estimation on the feature variable to obtain the probability estimation result of each feature element.
  • Probability distribution models can be used to obtain probability estimates and probability distributions.
  • the probability distribution model may be: a single Gaussian model (Gaussian single model, GSM) or an asymmetric Gaussian model or a mixed Gaussian model (Gaussian mixture model, GMM) or a Laplace distribution model (Laplace distribution).
• The following description takes the feature variable as an N*M dimensional matrix as an example, and considers the feature elements in the current feature variable to be encoded.
• When the probability distribution model is a Gaussian model (single Gaussian model, asymmetric Gaussian model or mixed Gaussian model), probability estimation is performed on each feature element in the feature variable to obtain the values of the mean parameter μ and the variance σ.
• The mean parameter μ and the variance σ are input into the probability distribution model used to obtain a probability distribution. The probability estimation result is the mean parameter μ and the variance σ.
• Alternatively, only the value of the variance may be estimated.
• When the probability distribution model is a Gaussian model (single Gaussian model, asymmetric Gaussian model or mixed Gaussian model), the side information or context information is input into the probability estimation network, and probability estimation is performed on each feature element in the feature variable to obtain the value of the variance σ.
• The variance σ is input into the probability distribution model used to obtain a probability distribution. The probability estimation result is the variance σ.
• When the probability distribution model is the Laplace distribution model, the side information or context information is input into the probability estimation network, and probability estimation is performed on each feature element in the feature variable to obtain the values of the position parameter μ and the scale parameter b.
• The position parameter μ and the scale parameter b are input into the probability distribution model used to obtain a probability distribution. The probability estimation result is the position parameter μ and the scale parameter b.
• Alternatively, the probability estimation network performs probability estimation on the feature variable to be encoded, and the probability P that the current feature element to be encoded takes the value m is obtained; the probability estimation result is this probability P.
  • the probability estimation network can use a network based on deep learning, such as a recurrent neural network (Recurrent Neural Network, RNN) and a convolutional neural network (Convolutional Neural Network, PixelCNN), etc., which are not limited here.
• Step 1804: Judge, according to the probability estimation result, whether entropy coding is required for the current feature element to be coded, and, according to the judgment result, either perform entropy coding and write into the compressed code stream (coded code stream) or skip entropy coding.
• Judging whether entropy coding needs to be performed on the current feature element to be encoded according to the probability estimation result can use one or more of the following methods.
  • the parameters j, i are positive integers, and the coordinates (j, i) indicate the current position of the feature element to be encoded.
• Judging whether entropy coding needs to be performed on the current feature element to be encoded according to the probability estimation result can likewise use one or more of the following methods.
• The parameter i is a positive integer, and the coordinate i represents the current position of the feature element to be encoded.
• The following takes judging, according to the probability estimation result, whether entropy coding needs to be performed on the current feature element to be encoded as an example for illustration; the judgment for the other form of feature element is similar and will not be repeated here.
• Method 1: When the probability distribution model is a Gaussian distribution, judge whether to perform entropy coding on the current feature element to be encoded according to the probability estimation result of the first feature element. When the values of the mean parameter μ and the variance σ of the Gaussian distribution of the current feature element to be encoded satisfy the second condition, that is, the absolute value of the difference between the mean value μ and k is less than the second threshold T1 and the variance σ is less than the third threshold T2, there is no need to execute the entropy encoding process for the current feature element to be encoded; otherwise, when the first condition is met, that is, the absolute value of the difference between the mean value μ and k is greater than or equal to the second threshold T1 or the variance σ is greater than or equal to the third threshold T2, entropy encoding is performed on the current feature element to be encoded and written into the code stream.
  • k is any integer, such as 0, 1, -1, 2, 3 and so on.
  • the value of T2 is any number satisfying 0 < T2 < 1, such as 0.2, 0.3, 0.4, etc.
  • T1 is a number greater than or equal to 0 and less than 1, such as 0.01, 0.02, 0.001, 0.002.
  • When k takes the value 0 (the optimal value), it can be directly judged that when the absolute value of the mean parameter μ of the Gaussian distribution is less than T1 and the variance σ of the Gaussian distribution is less than T2, the entropy encoding process is skipped for the feature element to be encoded; otherwise, entropy encoding is performed on the current feature element to be encoded and written into the code stream.
  • the value of T2 is any number satisfying 0 < T2 < 1, for example, 0.2, 0.3, 0.4 and so on.
  • T1 is a number greater than or equal to 0 and less than 1, such as 0.01, 0.02, 0.001, 0.002.
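Method 1 above can be sketched as a small decision function. This is an illustrative sketch, not the actual encoder implementation; the function name and the default thresholds (T1 = 0.01, T2 = 0.2, taken from the example values in the text) are assumptions for demonstration.

```python
def should_encode_gaussian(mu, sigma, k=0, t1=0.01, t2=0.2):
    # Second condition: the Gaussian is sharply peaked at k, so the
    # element is almost surely k and entropy coding can be skipped.
    skip = abs(mu - k) < t1 and sigma < t2
    # First condition (the complement): encode and write the code stream.
    return not skip

# A near-certain zero element is skipped; a wide distribution is coded.
print(should_encode_gaussian(0.001, 0.1))  # skip -> False
print(should_encode_gaussian(0.5, 1.2))    # encode -> True
```

The decoder applies the identical test, so both sides agree on which elements are present in the code stream without any extra signaling per element.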
  • Method 2: When the probability distribution is a Gaussian distribution, obtain, according to the probability estimation result, the values of the mean parameter μ and variance σ of the Gaussian distribution of the current feature element to be encoded. When the relationship between the mean μ, the variance σ, and k satisfies abs(μ-k)+σ < T3 (the second condition), the entropy encoding process is skipped for the current feature element to be encoded, where abs(μ-k) denotes the absolute value of the difference between the mean μ and k; otherwise, when the probability estimation result of the current feature element to be encoded satisfies abs(μ-k)+σ ≥ T3 (the first condition), entropy encoding is performed on the current feature element to be encoded and written into the code stream.
  • k is any integer, such as 0, 1, -1, -2, 3 and so on.
  • the fourth threshold T3 is a number greater than or equal to 0 and less than 1, such as 0.2, 0.3, 0.4 and so on.
  • Alternatively, when the probability distribution is a Gaussian distribution, probability estimation may be performed on each feature element in the feature variable to obtain only the value of the variance σ of the Gaussian distribution of the current feature element to be encoded. When the variance σ satisfies σ < T3 (the second condition), the entropy coding process is skipped for the current feature element to be encoded; otherwise, when the probability estimation result of the current feature element to be encoded satisfies σ ≥ T3 (the first condition), entropy encoding is performed on the current feature element to be encoded and written into the code stream.
  • the fourth threshold T3 is a number greater than or equal to 0 and less than 1, such as 0.2, 0.3, 0.4 and so on.
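Method 2 and its variance-only variant replace the two-threshold test of Method 1 with a single combined threshold T3. A minimal sketch, with illustrative function names and the example threshold T3 = 0.2 from the text:

```python
def should_encode_sum(mu, sigma, k=0, t3=0.2):
    # Method 2: skip entropy coding when abs(mu - k) + sigma < T3.
    return not (abs(mu - k) + sigma < t3)

def should_encode_var_only(sigma, t3=0.2):
    # Variant: only the variance is estimated; skip when sigma < T3.
    return not (sigma < t3)
```

The single-sum form trades off mean offset against spread: an element may be skipped either because its mean sits very close to k or because its distribution is extremely narrow.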
  • Method 3: When the probability distribution is a Laplace distribution, obtain, according to the probability estimation result, the values of the location parameter μ and scale parameter b of the Laplace distribution of the current feature element to be encoded.
  • When abs(μ-k)+b < T4 (the second condition), the entropy coding process is skipped for the current feature element to be encoded, where abs(μ-k) denotes the absolute value of the difference between the location parameter μ and k;
  • otherwise, when abs(μ-k)+b ≥ T4 (the first condition), entropy encoding is performed on the current feature element to be encoded and written into the code stream.
  • k is any integer, such as 0, 1, -1, -2, 3 and so on.
  • the fourth threshold T4 is a number greater than or equal to 0 and less than 0.5, such as 0.05, 0.09, 0.17 and so on.
  • Method 4: When the probability distribution is a Laplace distribution, obtain, according to the probability estimation result, the values of the location parameter μ and scale parameter b of the Laplace distribution of the current feature element to be encoded.
  • When the absolute value of the difference between the location parameter μ and k is less than the second threshold T5 and the scale parameter b is less than the third threshold T6 (the second condition),
  • the entropy encoding process is skipped for the current feature element to be encoded; otherwise, when the absolute value of the difference between the location parameter μ and k is greater than or equal to the second threshold T5 or the scale parameter b is greater than or equal to the third threshold T6 (the first condition),
  • entropy encoding is performed on the current feature element to be encoded and written into the code stream.
  • k is any integer, such as 0, 1, -1, -2, 3 and so on.
  • the value of T5 is 1e-2
  • the value of T6 is any number satisfying 0 < T6 < 0.5, such as 0.05, 0.09, 0.17, etc.
  • When k takes the value 0 (the optimal value), it can be directly judged that when the absolute value of the location parameter μ is less than T5 and the scale parameter b is less than T6, the entropy encoding process is skipped for the feature element to be encoded; otherwise, entropy encoding is performed on the current feature element to be encoded and written into the code stream.
  • the value of the threshold T5 is 1e-2
  • the value of T6 is any number satisfying 0 < T6 < 0.5, such as 0.05, 0.09, 0.17 and so on.
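Methods 3 and 4 apply the same two styles of test to a Laplace distribution, using the location parameter μ and scale parameter b in place of the Gaussian mean and variance. A sketch under assumed example thresholds (T4 = 0.09, T5 = 1e-2, T6 = 0.05, taken from the text's examples); the function names are illustrative:

```python
def should_encode_laplace_sum(mu, b, k=0, t4=0.09):
    # Method 3: skip entropy coding when abs(mu - k) + b < T4.
    return not (abs(mu - k) + b < t4)

def should_encode_laplace_pair(mu, b, k=0, t5=1e-2, t6=0.05):
    # Method 4: skip when abs(mu - k) < T5 and b < T6 hold jointly.
    return not (abs(mu - k) < t5 and b < t6)
```

Note that the Laplace thresholds are smaller than their Gaussian counterparts because the scale parameter b of a Laplace distribution with the same spread is numerically smaller than the Gaussian σ.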
  • Method 5: When the probability distribution is a mixed Gaussian distribution, obtain, according to the probability estimation result, the values of all mean parameters μi and variances σi of the mixed Gaussian distribution of the current feature element to be encoded. When the sum of the absolute values of the differences between all mean values of the mixed Gaussian distribution and k, plus any variance of the mixed Gaussian distribution, is less than the fifth threshold T7 (the second condition), the entropy encoding process is skipped for the current feature element to be encoded; otherwise, when the sum of the absolute values of the differences between all mean values of the mixed Gaussian distribution and k, plus any variance of the mixed Gaussian distribution, is greater than or equal to the fifth threshold T7 (the first condition), entropy encoding is performed on the current feature element to be encoded and written into the code stream.
  • k is any integer, such as 0, 1, -1, -2, 3 and so on.
  • T7 is a number greater than or equal to 0 and less than 1, such as 0.2, 0.3, 0.4, etc. (It can be considered that the threshold value of each feature element is the same)
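One reading of the Method 5 condition (the phrasing "plus any variance" is ambiguous) is that the summed mean offsets plus each individual variance must all stay below T7 before the element may be skipped. A hedged sketch under that interpretation, with an illustrative function name and the example threshold T7 = 0.2:

```python
def should_encode_gmm(means, sigmas, k=0, t7=0.2):
    # Sum of abs(mean_i - k) over every mixture component.
    mean_term = sum(abs(m - k) for m in means)
    # Skip only when the sum plus each single variance stays below T7,
    # i.e. every component is concentrated near k.
    skip = all(mean_term + s < t7 for s in sigmas)
    return not skip

# Three components all tightly centered on 0 -> skip.
print(should_encode_gmm([0.01, 0.02, -0.01], [0.05, 0.04, 0.06]))
```

If instead "any" is read as "at least one variance", the `all` above would become `min(sigmas)`; which reading applies is a design choice the text leaves open.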
  • Method 6: Obtain, according to the probability distribution, the probability P that the current feature element to be encoded takes the value k. When the probability estimation result P of the current feature element to be encoded satisfies the second condition, that is, P is greater than (or equal to) the first threshold T0, the entropy encoding process is skipped for the current feature element to be encoded; otherwise, when the probability estimation result P of the current feature element to be encoded satisfies the first condition, that is, P is less than the first threshold T0, entropy encoding is performed on the current feature element to be encoded and written into the code stream.
  • k can be any integer, such as 0, 1, -1, 2, 3 and so on.
  • the first threshold T0 is any number satisfying 0 < T0 < 1, such as 0.99, 0.98, 0.97, 0.95 and so on. (The threshold can be considered the same for every feature element.)
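Method 6 works directly on the probability mass P that the quantized element equals k. For a Gaussian model, that mass is the density integrated over [k - 0.5, k + 0.5]. A sketch assuming a Gaussian; the function names and the example threshold T0 = 0.99 are illustrative:

```python
import math

def prob_of_k(mu, sigma, k=0):
    # Probability that the integer-quantized Gaussian element equals k,
    # obtained by integrating the density over [k - 0.5, k + 0.5].
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    return cdf(k + 0.5) - cdf(k - 0.5)

def should_encode_prob(p, t0=0.99):
    # Skip when P >= T0 (the element is almost surely k); else encode.
    return p < t0
```

This is the most general of the six tests, since it applies to any probability model; Methods 1 to 5 can be seen as cheaper approximations of it that avoid evaluating the CDF.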
  • The thresholds T1, T2, T3, T4, T5 and T6 can be integerized, that is, shifted and scaled into integers.
  • Method 1: Taking the threshold T1 as an example, any value within the value range of T1 is taken as the threshold T1, and the threshold T1 is written into the code stream. Specifically, the threshold written into the code stream may be stored in the sequence header, picture header, slice/strip header or SEI and sent to the decoding end; other methods may also be used, which are not limited here. Similar methods can also be used for the remaining thresholds T0, T2, T3, T4, T5 and T6.
  • Method 2: The encoding end adopts a fixed threshold agreed with the decoding end, so there is no need to write it into the code stream or transmit it to the decoding end. For example, taking the threshold T1 as an example, any value within the value range of T1 is directly taken as the value of T1. Similar methods can also be used for the remaining thresholds T0, T2, T3, T4, T5 and T6.
  • Method 3: Build a threshold candidate list, put the most likely values within the value range of T1 into the threshold candidate list, with each threshold corresponding to a threshold index number; determine an optimal threshold, use the optimal threshold as T1, use the index number of the optimal threshold as the threshold index number of T1, and write the threshold index number of T1 into the code stream.
  • The threshold index number written into the code stream may be stored in the sequence header, picture header, slice/strip header or SEI and sent to the decoding end; other methods may also be used, which are not limited here. Similar methods can also be used for the remaining thresholds T0, T2, T3, T4, T5 and T6.
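Method 3's candidate-list signaling can be sketched as follows. The candidate values and list length here are hypothetical; in practice both ends must build the identical list so that an index written in the sequence header, picture header, slice header or SEI resolves to the same threshold:

```python
# Hypothetical candidate list for threshold T1 (list length T = 4).
T1_CANDIDATES = [0.001, 0.002, 0.01, 0.02]

def encode_threshold_index(chosen):
    # The encoder writes only this small index into the code stream.
    return T1_CANDIDATES.index(chosen)

def decode_threshold(index):
    # The decoder rebuilds the same list and looks the threshold up.
    return T1_CANDIDATES[index]
```

Signaling an index costs only log2(T) bits rather than a full fixed-point threshold value, at the price of restricting T1 to the listed candidates.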
  • Step 1805 The encoder sends or stores the compressed code stream.
  • Step 1811 Obtain the code stream of the audio feature variable to be decoded
  • Step 1812 Obtain the probability estimation result of each feature element
  • Entropy decoding is performed on the side information code stream to obtain the side information; combined with the side information, probability estimation is performed on each feature element of the audio feature variable to be decoded to obtain the probability estimation result of the current feature element to be decoded.
  • the parameters j, i are positive integers, and the coordinates (j, i) indicate the current position of the feature element to be decoded.
  • Alternatively, entropy decoding is performed on the side information code stream to obtain the side information; combined with the side information, probability estimation is performed on each feature element [i] of the audio feature variable to be decoded to obtain the probability estimation result of the current feature element to be decoded.
  • the parameter i is a positive integer
  • the coordinate i represents the current position of the feature element to be decoded.
  • the probability estimation method used by the decoder is the same as the probability estimation method of the encoder in this embodiment, and the probability estimation network structure diagram is the same as the probability estimation network structure of the encoder in this embodiment, so details are not repeated here.
  • Step 1813: According to the probability estimation result, judge whether entropy decoding needs to be performed for the current feature element to be decoded, perform or skip entropy decoding according to the judgment result, and obtain the decoded feature variable.
  • One or more of the following methods may be used to judge, according to the probability estimation result, whether entropy decoding needs to be performed on the current feature element to be decoded.
  • Method 1: When the probability distribution model is a Gaussian distribution, obtain, according to the probability estimation result, the values of the mean parameter μ and variance σ of the current feature element to be decoded. When the absolute value of the difference between the mean μ and k is less than the second threshold T1 and the variance σ is less than the third threshold T2 (the second condition), the value of the current feature element to be decoded is set to k, and the entropy decoding process is skipped for the current feature element to be decoded; otherwise, when the absolute value of the difference between the mean μ and k is greater than or equal to the second threshold T1 or the variance σ is greater than or equal to the third threshold T2 (the first condition), entropy decoding is performed to obtain the value of the current feature element to be decoded.
  • When k takes the value 0 (the optimal value), it can be directly judged that when the absolute value of the mean parameter μ of the Gaussian distribution is less than T1 and the variance σ of the Gaussian distribution is less than T2, the value of the current feature element to be decoded is set to k and the entropy decoding process is skipped; otherwise, entropy decoding is performed to obtain the value of the current feature element to be decoded.
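The decoder-side mirror of Method 1 can be sketched as below. `read_symbol` is a stand-in for a real arithmetic decoder driven by the same (μ, σ); the function name and default thresholds are illustrative:

```python
def decode_element(read_symbol, mu, sigma, k=0, t1=0.01, t2=0.2):
    # Same test as the encoder's Method 1: when the skip condition
    # holds, no bits are consumed and the element is reconstructed as k.
    if abs(mu - k) < t1 and sigma < t2:
        return k
    # Otherwise fall through to actual entropy decoding of the stream.
    return read_symbol()
```

Because the probability estimation at the decoder is identical to the encoder's, the skip decision is reproduced bit-exactly on both sides; no per-element flag is needed in the code stream.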
  • Method 2: When the probability distribution is a Gaussian distribution, obtain, according to the probability estimation result, the values of the mean parameter μ and variance σ of the current feature element to be decoded. When the relationship between the mean μ, the variance σ, and k satisfies abs(μ-k)+σ < T3 (the second condition), where T3 is the fourth threshold, the value of the current feature element to be decoded is set to k, and the entropy decoding process is skipped; otherwise, when the probability estimation result of the current feature element to be decoded satisfies abs(μ-k)+σ ≥ T3 (the first condition), entropy decoding is performed to obtain the value of the current feature element to be decoded.
  • Alternatively, when the probability distribution is a Gaussian distribution and only the variance σ is obtained, when σ < T3 (the second condition), where T3 is the fourth threshold, the value of the current feature element to be decoded is set to 0, and the entropy decoding process is skipped; otherwise, when the probability estimation result of the current feature element to be decoded satisfies σ ≥ T3 (the first condition), entropy decoding is performed to obtain the value of the current feature element to be decoded.
  • Method 3 When the probability distribution is a Laplace distribution, obtain the values of the position parameter ⁇ and the scale parameter b according to the probability estimation result.
  • When abs(μ-k)+b < T4 (the second condition), where T4 is the fourth threshold, the value of the current feature element to be decoded is set to k, and the entropy decoding process is skipped for the feature element; otherwise, when the probability estimation result of the current feature element to be decoded satisfies abs(μ-k)+b ≥ T4 (the first condition), entropy decoding is performed to obtain the value of the feature element.
  • Method 4 When the probability distribution is a Laplace distribution, obtain the values of the position parameter ⁇ and the scale parameter b according to the probability estimation result.
  • When the absolute value of the difference between the location parameter μ and k is less than the second threshold T5 and the scale parameter b is less than the third threshold T6 (the second condition), the value of the current feature element to be decoded is set to k, and the entropy decoding process is skipped; otherwise, when the absolute value of the difference between the location parameter μ and k is greater than or equal to the second threshold T5 or the scale parameter b is greater than or equal to the third threshold T6 (the first condition), entropy decoding is performed to obtain the value of the current feature element to be decoded.
  • When k takes the value 0 (the optimal value), it can be directly judged that when the absolute value of the location parameter μ is less than T5 and the scale parameter b is less than T6, the value of the current feature element to be decoded is set to k and the entropy decoding process is skipped; otherwise, entropy decoding is performed to obtain the value of the current feature element to be decoded.
  • Method 5: When the probability distribution is a mixed Gaussian distribution, obtain, according to the probability estimation result, the values of all mean parameters μi and variances σi of the mixed Gaussian distribution of the current feature element to be decoded.
  • When the sum of the absolute values of the differences between all mean values of the mixed Gaussian distribution and k, plus any variance of the mixed Gaussian distribution, is less than the fifth threshold T7 (the second condition), the value of the current feature element to be decoded is set to k, and the entropy decoding process is skipped; otherwise, when that sum is greater than or equal to the fifth threshold T7 (the first condition), entropy decoding is performed to obtain the value of the current feature element to be decoded.
  • Method 6: According to the probability distribution of the current feature element to be decoded, obtain the probability P that its value is k, that is, the probability estimation result P of the current feature element to be decoded.
  • When the probability estimation result P satisfies the second condition, that is, P is greater than (or equal to) the first threshold T0, the value of the current feature element to be decoded is set to k; otherwise, when the current feature element to be decoded satisfies the first condition, that is, P is less than the first threshold T0, entropy decoding is performed on the code stream to obtain the value of the first feature element.
  • the above k value at the decoding end is set corresponding to the k value at the encoding end.
  • the method of obtaining thresholds T0, T1, T2, T3, T4, T5, T6 and T7 corresponds to the encoding end, and one of the following methods can be used:
  • Method 1 Obtain the threshold value from the code stream, specifically, obtain the threshold value from the sequence header, image header, slice/strip or SEI.
  • Method 2 The decoder adopts the fixed threshold agreed with the encoder.
  • Method 3 Obtain the threshold index number from the code stream, specifically, obtain the threshold index number from the sequence header, image header, Slice/strip or SEI. Then, the decoder constructs a threshold candidate list in the same way as the encoder, and obtains the corresponding threshold in the threshold candidate list according to the threshold index number.
  • The thresholds T1, T2, T3, T4, T5 and T6 can be integerized, that is, shifted and scaled into integers.
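The "shifted and scaled into integers" note on both the encoder and decoder side can be sketched as a fixed-point conversion. The shift amount is a hypothetical example; the point is that after integerization the skip comparisons run in integer arithmetic and therefore match bit-exactly on both ends:

```python
def integerize_threshold(t, shift=16):
    # Scale a fractional threshold by 2**shift and round, so that the
    # skip test can be evaluated with integers on encoder and decoder.
    return int(round(t * (1 << shift)))
```

For example, the Gaussian test abs(μ-k) < T1 would then be evaluated as `abs_fixed(mu - k) < integerize_threshold(t1)` on fixed-point values of μ, avoiding platform-dependent floating-point behavior.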
  • Step 1814: The decoded feature variable is reconstructed, or input into the machine-oriented auditory task module to perform corresponding machine tasks.
  • This step can be specifically implemented by the decoding network 306 in FIG. 10B , and for details, reference can be made to the above description of the decoding network 34 .
  • Case 1: The entropy-decoded feature variable is input into the audio reconstruction module, and the output of the neural network is the reconstructed audio.
  • the neural network can adopt any structure, such as a fully connected network, a convolutional neural network, a recurrent neural network, and the like.
  • the neural network can adopt a multi-layer deep neural network structure to achieve better estimation results.
  • Case 2: The entropy-decoded feature variable is input into the machine-oriented auditory task module to perform corresponding machine tasks, completing machine auditory tasks such as audio classification and recognition.
  • FIG. 18 is a schematic structural diagram of an exemplary encoding device of the present application.
  • the device in this example may correspond to an encoder 20A.
  • the apparatus may include: an obtaining module 2001 and an encoding module 2002 .
  • Obtaining module 2001 may include encoding network 204, rounding 206 (optional), probability estimation 210, side information extraction 214, generation network 216 (optional) and joint network 218 (optional) in the foregoing embodiments.
  • the encoding module 2002 includes the encoding decision implementation 208 in the previous embodiments. Among them:
  • The obtaining module 2001 is configured to acquire feature data to be encoded, where the feature data to be encoded includes a plurality of feature elements including a first feature element, and to acquire a probability estimation result of the first feature element; the encoding module 2002 is configured to judge, according to the probability estimation result of the first feature element, whether to perform entropy coding on the first feature element, and to perform entropy coding on the first feature element only when it is judged that entropy coding needs to be performed.
  • Judging whether to perform entropy coding on the first feature element of the feature data includes: when the probability estimation result of the first feature element of the feature data satisfies a preset condition, entropy encoding needs to be performed on the first feature element of the feature data; when the probability estimation result of the first feature element of the feature data does not meet the preset condition, entropy encoding of the first feature element of the feature data is not required.
  • the encoding module is further configured to judge according to the probability estimation result of the feature data: the probability estimation result of the feature data is input into a generation network, and the network outputs decision information.
  • When the value of the decision information of the first feature element is 1, the first feature element of the feature data needs to be encoded; when the value of the decision information of the first feature element is not 1, the first feature element of the feature data does not need to be encoded.
  • the preset condition is that the probability that the first feature element takes a value of k is less than or equal to a first threshold, where k is an integer.
  • the preset condition is that the absolute value of the difference between the mean value of the probability distribution of the first feature element and the value k of the first feature element is greater than or equal to the second threshold or the first The variance of the feature elements is greater than or equal to a third threshold, where k is an integer.
  • Or, the preset condition is that the sum of the absolute value of the difference between the mean value of the probability distribution of the first feature element and the value k of the first feature element, and the variance of the probability distribution of the first feature element, is greater than or equal to the fourth threshold, where k is an integer.
  • the probability value that the first feature element takes a value of k is the maximum probability value among the probability values of all possible values of the first feature element.
  • probability estimation is performed on the feature data to obtain a probability estimation result of each feature element in the feature data, wherein the probability estimation result of the first feature element includes the first feature element The probability value of , and/or the first parameter of the probability distribution and the second parameter of the probability distribution.
  • a probability estimation result of the feature data is input into a generation network to obtain decision information of the first feature element. According to the decision information of the first feature element, it is judged whether to perform entropy coding on the first feature element.
  • the decision information of the characteristic data is a decision diagram
  • the value corresponding to the position of the first characteristic element in the decision diagram is a preset value
  • the value corresponding to the position of the first feature element in the decision diagram is not a preset value
  • the encoding module is further configured to construct a threshold candidate list of the first threshold, put the first threshold into the threshold candidate list of the first threshold and correspond to the first threshold An index number of a threshold, writing the index number of the first threshold into the encoded code stream, wherein the length of the threshold candidate list of the first threshold can be set to T; T is an integer greater than or equal to 1.
  • the device of this embodiment can be used in the technical solutions implemented by the encoder in the method embodiments shown in FIGS. 3A-3D , and its implementation principles and technical effects are similar, and will not be repeated here.
  • FIG. 19 is a schematic structural diagram of an exemplary decoding device of the present application. As shown in FIG. 19 , the device in this example may correspond to a decoder 30 .
  • the apparatus may include: an obtaining module 2101 and a decoding module 2102 .
  • Obtaining module 2101 may include probability estimation 302 , generation network 310 (optional) and joint network 312 in the foregoing embodiments.
  • the decoding module 2102 includes the decoding decision implementation 304 and the decoding network 306 in the foregoing embodiments. Among them:
  • Obtaining module 2101 configured to obtain a code stream of feature data to be decoded, the feature data to be decoded includes a plurality of feature elements, and the plurality of feature elements include a first feature element; acquire a probability estimation result of the first feature element ;
  • the decoding module 2102 is configured to judge whether to perform entropy decoding on the first feature element according to the probability estimation result of the first feature element; only when it is determined that entropy decoding needs to be performed on the first feature element, the The first feature element performs entropy decoding.
  • Judging whether to entropy decode the first feature element of the feature data includes: when the probability estimation result of the first feature element of the feature data satisfies a preset condition, decoding the first feature element of the feature data; or, when the probability estimation result of the first feature element of the feature data does not meet the preset condition, not decoding the first feature element of the feature data, and setting the feature value of the first feature element to k, where k is an integer.
  • the decoding module is further configured to judge according to the probability estimation result of the characteristic data: the probability estimation result of the characteristic data is input into a judgment network module, and the network outputs decision information.
  • When the value at the position corresponding to the first feature element of the feature data in the decision information is 1, the first feature element of the feature data is decoded; when the value at the position corresponding to the first feature element is not 1, the first feature element of the feature data is not decoded, and the feature value of the first feature element is set to k, where k is an integer.
  • the preset condition is that the probability that the first feature element takes a value of k is less than or equal to a first threshold, where k is an integer.
  • the preset condition is that the absolute value of the difference between the mean value of the probability distribution of the first feature element and the value k of the first feature element is greater than or equal to the second threshold or the The variance of the probability distribution of the first feature element is greater than or equal to the third threshold.
  • the preset condition is that the absolute value of the difference between the mean value of the probability distribution of the first feature element and the value k of the first feature element and the A sum of variances of the probability distributions is greater than or equal to a fourth threshold.
  • probability estimation is performed on the feature data to obtain a probability estimation result of each feature element in the feature data, wherein the probability estimation result of the first feature element includes the first feature element The probability value of , and/or the first parameter of the probability distribution and the second parameter of the probability distribution.
  • the probability value that the first feature element takes a value of k is the maximum probability value among the probability values of all possible values of the first feature element.
  • the probability estimation result of the Nth feature element includes at least one of the following items: the probability value of the Nth feature element, the first parameter of the probability distribution, and the second parameter of the probability distribution and decision information.
  • The probability estimation result of the feature data is input into the generation network to obtain the decision information of the first feature element; when the value of the decision information of the first feature element is a preset value, it is judged that entropy decoding needs to be performed on the first feature element; otherwise, the feature value of the first feature element is set to k, where k is an integer and one of multiple candidate values of the first feature element.
  • the obtaining module is further configured to construct a threshold candidate list of the first threshold, and obtain an index number of the threshold candidate list of the first threshold by decoding the code stream, and The value of the threshold candidate list position of the first threshold corresponding to the index number of the first threshold is used as the value of the first threshold, wherein the length of the threshold candidate list of the first threshold can be set to T; T is An integer greater than or equal to 1.
  • the device of this embodiment can be used in the technical solutions implemented by the decoder in the method embodiments shown in FIGS. 10B , 13B, and 16 , and its implementation principles and technical effects are similar, and will not be repeated here.
  • Computer-readable media may include computer-readable storage media, which correspond to tangible media, such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another (eg, according to a communication protocol) .
  • a computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium, or (2) a communication medium, such as a signal or carrier wave.
  • Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this application.
  • a computer program product may include a computer readable medium.
  • Such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store the desired program code in the form of instructions or data structures and that can be accessed by a computer.
  • Also, any connection is properly termed a computer-readable medium. When instructions are transmitted using coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
  • Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
  • DSPs (digital signal processors)
  • ASICs (application-specific integrated circuits)
  • FPGAs (field-programmable gate arrays)
  • the term "processor" may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein.
  • the functionality described by the various illustrative logical blocks, modules, and steps described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated into a combined codec.
  • the techniques may be fully implemented in one or more circuits or logic elements.
  • the techniques of the present application may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chipset).
  • Various components, modules, or units are described in this application to emphasize functional aspects of devices for performing the disclosed techniques, but they do not necessarily require realization by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit or provided by a collection of interoperating hardware units (including one or more processors as described above) in conjunction with suitable software and/or firmware.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

This application provides image or audio encoding and decoding methods and apparatuses, relating to artificial-intelligence (AI)-based image or audio coding, and in particular to neural-network-based coding of image feature maps or audio feature variables. The encoding method includes: obtaining a to-be-encoded target comprising multiple feature elements, the multiple feature elements including a first feature element; obtaining a probability estimation result of the first feature element; determining, based on the probability estimation result of the first feature element, whether to perform entropy encoding on the first feature element; and performing entropy encoding on the first feature element only when it is determined that entropy encoding needs to be performed. This application decides whether to encode a feature element based on its probability estimation result, which can reduce coding complexity without affecting coding performance. The to-be-encoded target includes an image feature map or an audio feature variable.

Description

特征数据编解码方法和装置 技术领域
本发明实施例涉及基于人工智能(AI)的图像或音频压缩技术领域,尤其涉及一种特征数据编解码方法及装置。
背景技术
图像或音频编码和解码(简称为编解码)广泛用于数字图像或音频应用,例如广播数字电视、互联网和移动网络上的图像或音频传输、视频或语音聊天、和视频或语音会议等实时会话应用、DVD和蓝光光盘、图像或音频内容采集和编辑系统以及可携式摄像机的安全应用。视频由多帧图像组成,因此本申请中的图像可以是单独的图像,也可以为视频中的图像。
即使在影片较短的情况下也需要对大量的视频数据进行描述,当数据要在带宽容量受限的网络中发送或以其它方式传输时,这样可能会造成困难。因此,图像(或音频)数据通常要先压缩然后在现代电信网络中传输。由于内存资源可能有限,当在存储设备上存储视频时,图像(或音频)的大小也可能成为问题。图像(或音频)压缩设备通常在信源侧使用软件和/或硬件,以在传输或存储之前对图像(或音频)数据进行编码,从而减少用来表示数字图像(或音频)所需的数据量。然后,压缩的数据在目的地侧由图像(或音频)解压缩设备接收。在有限的网络资源以及对更高图像(或音频)质量的需求不断增长的情况下,需要改进压缩和解压缩技术,这些改进的技术能够提高压缩率而几乎不影响图像(或音频)质量。
近年来,将深度学习应用于在图像(或音频)编解码领域逐渐成为一种趋势。如谷歌已连续几年在CVPR(IEEE Conference on Computer Vision and Pattern Recognition)会议上组织CLIC(Challenge on Learned Image Compression)专题竞赛,CLIC专注使用深度神经网络来提升图像的压缩效率,在2020年CLIC中还加入了图像挑战类别。基于竞赛方案的性能评估,当前基于深度学习技术的图像编解码方案的综合压缩效率已经与最新一代视频图像编解码标准VVC(Versatile Video Coding)相当,而且在提升用户感知质量方面有独特优势。
VVC的视频标准制定工作已于2020年6月完成，标准收纳几乎所有能够带来显著压缩效率提升的技术算法。因此沿传统信号处理路径继续研究新型的压缩编码算法在短时间内难以获得大的技术突破。区别于传统图像算法通过人工设计来对图像压缩的各模块进行优化，端到端AI的图像压缩是作为一个整体共同进行优化，因此AI图像压缩方案的压缩效果更好。变分自编码器（Variational Autoencoder，VAE）方法是当前AI图像有损压缩技术的主流技术方案。目前的主流技术方案是待编码图像通过编码网络获得图像特征图，并进一步对图像特征图执行熵编码，但是熵编码过程存在着复杂度过高的问题。
发明内容
本申请提供一种特征数据的编解码方法和装置,能够在不影响编解码性能情况下降低编解码复杂度。
第一方面,提供了一种特征数据的编码方法,包括:
获取待编码特征数据,所述待编码特征数据包括多个特征元素,所述多个特征元素包括第一特征元素;
获取所述第一特征元素的概率估计结果;
根据所述第一特征元素的概率估计结果,判断是否对所述第一特征元素执行熵编码;
仅当判断出需要对所述第一特征元素执行熵编码时,对所述第一特征元素执行熵编码。
其中,所述特征数据包括图像特征图,或音频特征变量,或图像特征图和音频特征变量。可以为编码网络所输出的一维、二维或多维数据,其中每个数据均为特征元素。需要说明的是,本申请中特征点和特征元素的含义相同。
具体的,所述第一特征元素为待编码特征数据中的任意一待编码特征元素。
一种可能性中,获取所述第一特征元素的概率估计结果的概率估计过程可以通过概率估计网络实现;在另一种可能性中,概率估计过程可以采用传统非网络的概率估计方法对特征数据进行概率估计。
需要说明的是,当只有边信息作为概率估计的输入时,可以并行输出各特征元素的概率估计结果;当概率估计的输入包括有上下文信息时,需要串行输出各特征元素概率估计结果。其中所述边信息为特征数据输入神经网络进一步提取得到的特征信息,所述边信息包含的特征元素的个数比特征数据的特征元素少。可选地,可以将特征数据的边信息编入码流。
一种可能性中,当所述特征数据的第一特征元素不满足预设条件下,不需要对所述特征数据的第一特征元素执行熵编码。
具体的,假如当前的第一特征元素为特征数据的第P个特征元素,则完成第P个特征元素的判断和根据判断结果执行或不执行熵编码后,开始特征数据的第P+1个特征元素的判断和根据判断结果执行或不执行熵编码过程,其中P为正整数且P小于M,其中M为整个的特征数据中特征元素的数量。比如对第二特征元素,当判断不需要对所述第二特征元 素执行熵编码时,则对所述第二特征元素跳过执行熵编码。
上述技术方案中,通过对每个待编码的特征元素进行判定是否需要执行熵编码,从而跳过某些特征元素的熵编码过程,可以显著减少需执行熵编码的元素个数。这样,可以降低熵编码复杂度。
在一种可能的实现方式中,判断是否对所述第一特征元素执行熵编码包括:当所述第一特征元素的概率估计结果满足预设条件时,判断需要对所述第一特征元素执行熵编码;或当所述第一特征元素的概率估计结果不满足预设条件时,判断不需要对所述第一特征元素执行熵编码。
在一种可能的实现方式中,所述第一特征元素的概率估计结果为所述第一特征元素取值为k的概率值,所述预设条件为第一特征元素取值为k的概率值小于或等于第一阈值,其中k为整数。
k为上述第一特征元素取值所可能的取值范围中的某一取值。比如,第一特征元素可以取值的范围为【-255,255】。k可以设置为0,则对概率值小于或者等于0.5的第一特征元素,执行熵编码。对概率值大于0.5的第一特征元素,不执行熵编码。
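上述按概率值与第一阈值比较的判断逻辑，可用如下示意性草图表达（Python；函数名与阈值取值均为本文示意所设，并非本申请限定的实现）：

```python
def should_encode(p_k: float, t0: float = 0.5) -> bool:
    """当特征元素取值为k的概率p_k小于或等于第一阈值t0时，才执行熵编码。"""
    return p_k <= t0

# 示例：k取0，第一阈值取0.5
assert should_encode(0.3) is True    # 概率较低，需执行熵编码
assert should_encode(0.9) is False   # 概率很高，跳过熵编码
```

对整个特征数据逐元素调用该判断，即可跳过大量高概率元素的熵编码过程。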
在一种可能的实现方式中,所述第一特征元素取值为k的概率值为所述第一特征元素的所有可能的取值的概率值中的最大概率值。
其中,编码码流在低码率情况所选定的第一阈值小于编码码流在高码率情况所选定的第一阈值。具体码率高低与图像的分辨率以及图像内容相关,以公开的Kodak数据集为例,低于0.5bpp为低码率,反之为高码率。
在某一码率情况下,所述第一阈值可以根据实际需要进行配置,此处不做限定。
上述技术方案中,通过灵活的第一阈值设定方式使得产生可以根据要求灵活的降低熵编码复杂度。
在一种可能的实现方式中,所述第一特征元素的概率估计结果包括所述第一特征元素概率分布的第一参数和第二参数。
则当所述概率分布为高斯分布,所述第一特征元素概率分布的第一参数为所述第一特征元素高斯分布的均值,所述第一特征元素概率分布的第二参数为所述第一特征元素高斯分布的方差;或当所述概率分布为拉普拉斯分布,所述第一特征元素概率分布的第一参数为所述第一特征元素拉普拉斯分布的位置参数,所述第一特征元素概率分布的第二参数为所述第一特征元素拉普拉斯分布的尺度参数。则所述预设条件可以为以下任意一种:
所述第一特征元素的概率分布的第一参数与所述第一特征元素取值k的差的绝对值 大于或等于第二阈值;或
所述第一特征元素的概率分布的第二参数大于或等于第三阈值;或
所述第一特征元素的概率分布的第一参数与所述第一特征元素取值k的差的绝对值与所述第一特征元素的概率分布的第二参数的和大于或等于第四阈值。
则当所述概率分布为混合高斯分布时,所述第一特征元素概率分布的第一参数为所述第一特征元素混合高斯分布的均值,所述第一特征元素概率分布的第二参数为所述第一特征元素混合高斯分布的方差,则所述预设条件可以为以下任意一种:
所述第一特征元素的混合高斯分布的所有均值与所述第一特征元素的取值为k的差的绝对值之和与所述第一特征元素的混合高斯分布的任一方差的和大于或等于第五阈值;或
所述第一特征元素的混合高斯分布的任一均值与所述第一特征元素的取值为k的差大于或等于第六阈值;或
所述第一特征元素的混合高斯分布的任一方差大于或等于第七阈值。
则当所述概率分布为非对称高斯分布时,所述第一特征元素概率分布的第一参数为所述第一特征元素非对称高斯分布的均值,所述第一特征元素概率分布的第二参数为所述第一特征元素非对称高斯分布的第一方差和第二方差,则所述预设条件可以为以下任意一种:
所述第一特征元素的非对称高斯分布的均值与所述第一特征元素的取值为k的差的绝对值大于或等于第八阈值;或
所述第一特征元素的非对称高斯分布的第一方差大于或等于第九阈值;
所述第一特征元素的非对称高斯分布的第二方差大于或等于第十阈值。
所述第一特征元素的概率分布为混合高斯分布情况,确定所述第一特征元素的判断取值范围,当所述第一特征元素的概率分布的多个均值均不在所述第一特征元素的判断取值范围。
所述第一特征元素的概率分布为高斯分布情况,确定所述第一特征元素的判断取值范围,当所述第一特征元素的概率分布的均值不在所述第一特征元素的判断取值范围。
所述第一特征元素的概率分布为高斯分布情况,确定所述第一特征元素的判断取值范围,判断取值范围中包括了多个所述第一特征元素可能的取值,当所述第一特征元素的高斯分布的均值参数与所述第一特征元素的判断取值范围中的每个取值的差的绝对值大于或等于第十一阈值,或所述第一特征元素的概率分布的方差大于或等于第十二阈值。
所述第一特征元素的取值不在所述第一特征元素的判断取值范围。
所述第一特征元素的取值对应的概率值小于或等于第十三阈值。
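以高斯分布情况为例，上述基于分布参数的预设条件可草拟如下（仅为示意：实际方案从三种条件中任选其一作为预设条件，此处为便于演示将三者合并判断，阈值取值均为假设）：

```python
def gaussian_preset_condition(mu, sigma, k, t2, t3, t4):
    """满足以下任一条件时，需对该特征元素执行熵编码：
    1) |mu - k| >= 第二阈值 t2；
    2) sigma >= 第三阈值 t3；
    3) |mu - k| + sigma >= 第四阈值 t4。
    """
    return (abs(mu - k) >= t2) or (sigma >= t3) or (abs(mu - k) + sigma >= t4)

# 均值接近k且方差很小：分布高度集中在k附近，可跳过熵编码
assert gaussian_preset_condition(mu=0.01, sigma=0.05, k=0, t2=0.2, t3=0.3, t4=0.4) is False
# 方差较大：取值不确定性高，需要执行熵编码
assert gaussian_preset_condition(mu=0.01, sigma=0.8, k=0, t2=0.2, t3=0.3, t4=0.4) is True
```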
在一种可能的实现方式中,所述方法还包括:构建第一阈值的阈值候选列表,将所述第一阈值放入所述第一阈值的阈值候选列表中且对应有所述第一阈值的索引号,将所述第一阈值的索引号写入编码码流,其中所述第一阈值的阈值候选列表的长度可以设置为T;T为大于或等于1的整数。可以理解的,所述其他的阈值可以采用如第一阈值的阈值候选列表构建方式,且有对应阈值的索引号和写入编码码流中。
具体地,将所述索引号写入码流,可将其保存在序列头(sequence header)、图像头(picture header)、Slice/条带(slice header)或SEI(suplemental enhancement information)中传送到解码端,还可以使用其他方法,在此不做限定。构建候选列表的方式不做限定。
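阈值候选列表与索引号的对应关系可草拟如下（候选列表取值与函数名均为假设；码流的实际读写此处以返回/传入索引号代替）：

```python
# 长度 T=4 的第一阈值候选列表（取值为假设）
threshold_candidates = [0.95, 0.97, 0.98, 0.99]

def encode_threshold_index(t0: float) -> int:
    """编码端：将选定的第一阈值映射为候选列表中的索引号，供写入码流。"""
    return threshold_candidates.index(t0)

def decode_threshold(index: int) -> float:
    """解码端：由解码出的索引号在候选列表中取回第一阈值。"""
    return threshold_candidates[index]

idx = encode_threshold_index(0.98)
assert idx == 2
assert decode_threshold(idx) == 0.98
```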
另一种可能性中,根据概率估计结果输入生成网络得到所述决策信息。所述生成网络可以为卷积网络,可以包括多个网络层,任意一网络层可以为卷积层、归一化层、非线性激活层等。
在一种可能的实现方式中,将所述特征数据的概率估计结果输入生成网络以得到所述第一特征元素的决策信息,所述决策信息用于指示是否对所述第一特征元素执行熵编码。
在一种可能的实现方式中,所述特征数据的决策信息为决策图,决策图也可以称为决策图map。决策图优选的为二元图,二元图也可以称为二元图map。二元图中特征元素的决策信息取值通常为0或1。因此当所述决策图中对应所述第一特征元素所在位置的值为预设值时,需要对所述第一特征元素执行熵编码;当所述决策图中对应所述第一特征元素所在位置的值不为预设值时,不需要对所述第一特征元素执行熵编码。
在一种可能的实现方式中,所述特征数据中的特征元素的决策信息为预设值。所述决策信息的预设值取值通常为1,因此当所述决策信息为预设值时,需要对所述第一特征元素执行熵编码;当所述决策信息不为预设值时,不需要对所述第一特征元素执行熵编码。决策信息可以为标识或者标识的值。判断是否对所述第一特征元素执行熵编码取决于所述标识或者标识的值是否为预设值,为预设值时,需要对所述第一特征元素执行熵编码;不为预设值时,不需要对所述第一特征元素执行熵编码。所述特征数据中各特征元素的决策信息的集合也可以为浮点数,也就是说取值可以为除0和1外的其他值。这时候,可以通过设置预设值,当所述第一特征元素的决策信息的值等于或者大于预设值时,判断需要对所述第一特征元素执行熵编码;或当所述第一特征元素的决策信息的值小于预设值时,判断不需要对所述第一特征元素执行熵编码。
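决策信息为浮点数时按预设值进行判断的过程可草拟如下（决策图取值与预设值均为假设）：

```python
def need_entropy_coding(decision_value: float, preset: float = 0.5) -> bool:
    """决策信息为浮点数时：其值大于或等于预设值则需执行熵编码。"""
    return decision_value >= preset

decision_map = [[0.9, 0.1], [0.4, 0.7]]  # 假设的 2x2 决策图
mask = [[need_entropy_coding(v) for v in row] for row in decision_map]
assert mask == [[True, False], [False, True]]
```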
在一种可能的实现方式中,所述方法还包括:待编码图像经过编码网络获取所述特征数据;待编码图像经过编码网络后经过取整获取所述特征数据;或待编码图像经过编码网络后经过量化和取整获取所述特征数据。
其中编码网络可以采用自编码器结构。编码网络可以为卷积神经网络。编码网络可以包括多个子网络,每个子网络包含一个或多个卷积层。子网络间的网络结构可以互为相同或不同。
其中待编码图像可以是原始图像,也可以是残差图像。
应理解,待编码图像可以为RGB格式或YUV、RAW等表示格式,待编码图像在输入编码网络前可以进行预处理操作,预处理操作可以包括转换、块划分、滤波、剪枝等操作。
应理解,允许在同一时间戳内或同一时刻将多个待编码图像或多个待编码图像块输入编解码网络进行处理以得到特征数据。
第二方面,提供了一种特征数据的解码方法,包括:
获取待解码特征数据的码流;
所述待解码特征数据包括多个特征元素,所述多个特征元素包括第一特征元素;
获取所述第一特征元素的概率估计结果;
根据所述第一特征元素的概率估计结果,判断是否对所述第一特征元素执行熵解码;
仅当判断出需要对所述第一特征元素执行熵解码时,对所述第一特征元素执行熵解码。
可以理解,所述第一特征元素为待解码特征数据中的任意特征元素,当待解码特征数据中所有特征元素完成了所述判断并根据判断结果执行或不执行熵解码后得到了解码特征数据。
其中,所述待解码特征数据可以为一维、二维或多维数据,其中每个数据均为特征元素。需要说明的是,本申请中特征点和特征元素的含义相同。
具体的,所述第一特征元素为待解码特征数据中的任意一待解码特征元素。
一种可能性中,获取所述第一特征元素的概率估计结果的概率估计过程可以通过概率估计网络实现;在另一种可能性中,概率估计过程可以采用传统非网络的概率估计方法对特征数据进行概率估计。
需要说明的是,当只有边信息作为概率估计的输入时,可以并行输出各特征元素的概率估计结果;当概率估计的输入包括有上下文信息时,需要串行输出各特征元素概率估计结果。其中,所述边信息包含的特征元素的个数比特征数据的特征元素少。
一种可能性中,码流中包含了边信息,解码码流过程需要对边信息进行解码。
具体的,特征数据中的每个特征元素的判断过程包括了条件判断以及根据条件判断结果决定是否执行熵解码。
一种可能性中,熵解码可以通过神经网络方式实现。
在另一种可能性中,熵解码可以通过传统熵解码方式实现。
具体的,假如当前的第一特征元素为特征数据的第P个特征元素,则完成第P个特征元素的判断和根据判断结果执行或不执行熵解码后,开始特征数据的第P+1个特征元素的判断和根据判断结果执行或不执行熵解码过程,其中P为正整数且P小于M,其中M为整个的特征数据中特征元素的数量。比如对第二特征元素,当判断不需要对所述第二特征元素执行熵解码时,则对所述第二特征元素跳过执行熵解码。
上述技术方案中,通过对每个待解码的特征元素进行判定是否需要执行熵解码,从而跳过某些特征元素的熵解码过程,可以显著减少需执行熵解码的元素个数。这样,可以降低熵解码复杂度。
在一种可能的实现方式中,所述判断是否对所述特征数据的第一特征元素执行熵解码包括:当所述特征数据的第一特征元素的概率估计结果满足预设条件,判断需要对所述第一特征元素执行熵解码;或当所述第一特征元素的概率估计结果不满足预设条件时,判断不需要对所述第一特征元素熵解码,将所述第一特征元素的特征值设置为k;其中k为整数。
在一种可能的实现方式中,所述第一特征元素的概率估计结果为所述第一特征元素取值为k的概率值,所述预设条件为第一特征元素取值为k的概率值小于或等于第一阈值,其中k为整数。
一种可能性中,所述第一特征元素在不满足所述预设条件下设置为k。比如,第一特征元素可以取值的范围为【-255,255】。k可以设置为0,则对概率值小于或者等于0.5的第一特征元素,执行熵编码。对概率值大于0.5的第一特征元素,不执行熵编码。
在另一种可能性中,所述第一特征元素在不满足所述预设条件下通过列表确定取值。
在另一种可能性中,所述第一特征元素在不满足所述预设条件下设置为固定整数值。
k为上述第一特征元素取值所可能的取值范围中的某一取值。
在一种可能性中,k为上述第一特征元素中所有可能的取值范围中最大概率所对应的值。
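解码端满足预设条件才执行熵解码、否则将特征值直接置为k的逻辑，可草拟如下（熵解码器以回调函数代替，各取值均为假设）：

```python
def reconstruct_element(p_k, t0, k, decode_from_bitstream):
    """解码端：满足预设条件（p_k <= t0）时才执行熵解码；否则直接将特征值设置为k。"""
    if p_k <= t0:
        return decode_from_bitstream()   # 执行熵解码，从码流得到取值
    return k                             # 跳过熵解码，特征值置为k

# 高概率元素被跳过，直接重建为 k=0
assert reconstruct_element(0.99, 0.5, 0, lambda: 7) == 0
# 低概率元素从码流中解码
assert reconstruct_element(0.2, 0.5, 0, lambda: 7) == 7
```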
其中,解码码流在低码率情况所选定的第一阈值小于解码码流在高码率情况所选定的第一阈值。具体码率高低与图像的分辨率以及图像内容相关,以公开的Kodak数据集为例, 低于0.5bpp为低码率,反之为高码率。
在某一码率情况下,所述第一阈值可以根据实际需要进行配置,此处不做限定。
上述技术方案中,通过灵活的第一阈值设定方式使得产生可以根据要求灵活的降低熵解码复杂度。
在一种可能的实现方式中,所述第一特征元素的概率估计结果包括所述第一特征元素概率分布的第一参数和第二参数。
则当所述概率分布为高斯分布,所述第一特征元素概率分布的第一参数为所述第一特征元素高斯分布的均值,所述第一特征元素概率分布的第二参数为所述第一特征元素高斯分布的方差;或当所述概率分布为拉普拉斯分布,所述第一特征元素概率分布的第一参数为所述第一特征元素拉普拉斯分布的位置参数,所述第一特征元素概率分布的第二参数为所述第一特征元素拉普拉斯分布的尺度参数。则所述预设条件可以为以下任意一种:
所述第一特征元素的概率分布的第一参数与所述第一特征元素取值k的差的绝对值大于或等于第二阈值;或
所述第一特征元素的第二参数大于或等于第三阈值;或
所述第一特征元素的概率分布的第一参数与所述第一特征元素取值k的差的绝对值与所述第一特征元素的概率分布的第二参数的和大于或等于第四阈值。
则当所述概率分布为混合高斯分布时,所述第一特征元素概率分布的第一参数为所述第一特征元素混合高斯分布的均值,所述第一特征元素概率分布的第二参数为所述第一特征元素混合高斯分布的方差,则所述预设条件可以为以下任意一种:
所述第一特征元素的混合高斯分布的所有均值与所述第一特征元素的取值为k的差的绝对值之和与所述第一特征元素的混合高斯分布的任一方差的和大于或等于第五阈值;或
所述第一特征元素的混合高斯分布的任一均值与所述第一特征元素的取值为k的差大于第六阈值;或
所述第一特征元素的混合高斯分布的任一方差大于或等于第七阈值。
则当所述概率分布为非对称高斯分布时,所述第一特征元素概率分布的第一参数为所述第一特征元素非对称高斯分布的均值,所述第一特征元素概率分布的第二参数为所述第一特征元素非对称高斯分布的第一方差和第二方差,则所述预设条件可以为以下任意一种:
所述第一特征元素的非对称高斯分布的均值参数与所述第一特征元素的取值为k的差的绝对值大于第八阈值;或
所述第一特征元素的非对称高斯分布的第一方差大于或等于第九阈值;
所述第一特征元素的非对称高斯分布的第二方差大于或等于第十阈值。
所述第一特征元素的概率分布为混合高斯分布情况,确定所述第一特征元素的判断取值范围,当所述第一特征元素的概率分布的多个均值均不在所述第一特征元素的判断取值范围。
所述第一特征元素的概率分布为高斯分布情况,确定所述第一特征元素的判断取值范围,当所述第一特征元素的概率分布的均值不在所述第一特征元素的判断取值范围。
所述第一特征元素的概率分布为高斯分布情况,确定所述第一特征元素的判断取值范围,判断取值范围中包括了多个所述第一特征元素可能的取值,当所述第一特征元素的高斯分布的均值参数与所述第一特征元素的判断取值范围中的每个取值的差的绝对值大于或等于第十一阈值,或所述第一特征元素的概率分布的方差大于或等于第十二阈值。
所述第一特征元素取值为k不在所述第一特征元素的判断取值范围。
所述第一特征元素取值为k对应的概率值小于或等于第十三阈值。
在一种可能的实现方式中,构建第一阈值的阈值候选列表,通过对所述码流进行解码以得到所述第一阈值的阈值候选列表的索引号,将所述第一阈值的索引号所对应所述第一阈值的阈值候选列表位置的值作为所述第一阈值的值,其中所述第一阈值的阈值候选列表的长度可以设置为T;T为大于或等于1的整数。可以理解的,所述其他任意的阈值可以采用如第一阈值的阈值候选列表构建方式,且可以解码对应阈值的索引号,并根据所述索引号选取构建列表里的值作为阈值。
另一种可能性中,根据概率估计结果输入生成网络得到所述决策信息。所述生成网络可以为卷积网络,可以包括多个网络层,任意一网络层可以为卷积层、归一化层、非线性激活层等。
在一种可能的实现方式中,将所述特征数据的概率估计结果输入生成网络以得到所述第一特征元素的决策信息,所述决策信息用于指示是否对所述第一特征元素执行熵解码。
在一种可能的实现方式中,所述特征数据中各特征元素的决策信息为决策图,决策图也可以称为决策图map。决策图优选的为二元图,二元图也可以称为二元图map。二元图中特征元素的决策信息取值通常为0或1。因此当所述决策图中对应所述第一特征元素所在位置的值为预设值时,需要对所述第一特征元素执行熵解码;当所述决策图中对应所述第一特征元素所在位置的值不为预设值时,不需要对所述第一特征元素执行熵解码。
所述特征数据中各特征元素的决策信息的集合也可以为浮点数,也就是说取值可以为 除0和1外的其他值。这时候,可以通过设置预设值,当所述第一特征元素的决策信息的值等于或者大于预设值时,判断需要对所述第一特征元素执行熵解码;或当所述第一特征元素的决策信息的值小于预设值时,判断不需要对所述第一特征元素执行熵解码。
在一种可能的实现方式中,所述特征数据经过解码网络以得到重建图像。
在另一种可能的实现方式中,所述特征数据经过解码网络以得到面向机器任务数据,具体的,所述特征数据经过面向机器任务模块以得到面向机器任务数据,所述面向机器模块包括目标识别网络,分类网络或者语义分割网络。
第三方面,提供了一种特征数据编码装置,包括:
获得模块,用于获取待编码特征数据,所述待编码特征数据包括多个特征元素,所述多个特征元素包括第一特征元素,以及用于获取所述中第一特征元素的概率估计结果;
编码模块,用于根据所述第一特征元素的概率估计结果,判断是否对所述第一特征元素执行熵编码;仅当判断出需要对所述第一特征元素执行熵编码时,对所述第一特征元素执行熵编码。
上述获得模块、编码模块的进一步实现功能可以参考第一方面或者第一方面的任意一种实现方式,此处不再赘述。
第四方面,提供了一种特征数据解码装置,包括:
获得模块,用于获取待解码特征数据的码流,所述待解码特征数据包括多个特征元素,所述多个特征元素包括第一特征元素;获取所述第一特征元素的概率估计结果;
解码模块,用于根据所述第一特征元素的概率估计结果,判断是否对所述第一特征元素执行熵解码;仅当判断出需要对所述第一特征元素执行熵解码时,对所述第一特征元素执行熵解码。
上述获得模块、解码模块的进一步实现功能可以参考第二方面或者第二方面的任意一种实现方式,此处不再赘述。
第五方面,本申请提供一种编码器,包括处理电路,用于判断根据上述第一方面及第一方面任一项所述的方法。
第六方面,本申请提供一种解码器,包括处理电路,用于判断上述第二方面及第二方面任一项所述的方法。
第七方面,本申请提供一种计算机程序产品,包括程序代码,当其在计算机或处理器上判断时,用于判断上述第一方面及第一方面任一项、上述第二方面及第二方面任一项所述的方法。
第八方面,本申请提供一种编码器,包括:一个或多个处理器;非瞬时性计算机可读存储介质,耦合到所述处理器并存储由所述处理器判断的程序,其中所述程序在由所述处理器判断时,使得所述解码器判断上述第一方面及第一方面任一项所述的方法。
第九方面,本申请提供一种解码器,包括:一个或多个处理器;非瞬时性计算机可读存储介质,耦合到所述处理器并存储由所述处理器判断的程序,其中所述程序在由所述处理器判断时,使得所述编码器判断上述第二方面及第二方面任一项所述的方法所述的方法。
第十方面,本申请提供一种非瞬时性计算机可读存储介质,包括程序代码,当其由计算机设备判断时,用于判断上述第一方面及第一方面任一项、上述第二方面及第二方面任一项所述的方法。
第十一方面,本发明涉及编码装置,具有实现上述第一方面或第一方面任一项的方法实施例中行为的功能。所述功能可以通过硬件实现,也可以通过硬件判断相应的软件实现。所述硬件或软件包括一个或多个与上述功能相对应的模块。在一个可能的设计中,所述编码装置包括:获得模块,用于将原始图像或残差图像经过编码网络变换到特征空间,提取特征数据用来压缩。另外对特征数据进行概率估计获取特征数据各特征元素的概率估计结果;编码模块,利用特征数据各特征元素的概率估计结果,用于通过一定条件来判断特征数据中各特征元素是否执行熵编码并完成所述特征数据中所有特征元素的编码过程以得到特征数据的编码码流。这些模块可以判断上述第一方面或第一方面任一项方法示例中的相应功能,具体参见方法示例中的详细描述,此处不做赘述。
第十二方面,本发明涉及解码装置,具有实现上述第二方面或第二方面任一项的方法实施例中行为的功能。所述功能可以通过硬件实现,也可以通过硬件判断相应的软件实现。所述硬件或软件包括一个或多个与上述功能相对应的模块。在一个可能的设计中,所述解码装置包括:获得模块,用于获取待解码特征数据的码流,并根据待解码特征数据的码流,进行概率估计以得到特征数据各特征元素的概率估计结果;解码模块,利用特征数据各特征元素的概率估计结果,通过一定条件来判断特征数据中各特征元素是否执行熵解码并完成所述特征数据中所有特征元素的解码过程以得到所述特征数据,并对所述特征数据进行解码以得到重建图像或面向机器任务数据。这些模块可以判断上述第二方面或第二方面任一项方法示例中的相应功能,具体参见方法示例中的详细描述,此处不做赘述。
第十三方面,提供了一种特征数据的编码方法,包括:
获取待编码特征数据,所述特征数据包括多个特征元素,所述多个特征元素包括第一特征元素;
获取所述特征数据的边信息,对所述特征数据的边信息输入联合网络以得到所述第一特征元素的决策信息;
根据所述第一特征元素的决策信息,判断是否对所述第一特征元素执行熵编码;
仅当判断出需要对所述第一特征元素执行熵编码时,对所述第一特征元素执行熵编码。
所述特征数据为编码网络所输出的一维、二维或多维数据,其中每个数据均为特征元素。
一种可能性中,将特征数据的边信息编入码流。所述边信息为特征数据输入神经网络进一步提取得到的特征信息,所述边信息包含的特征元素的个数比特征数据的特征元素少。
所述第一特征元素为所述特征数据中的任意特征元素。
一种可能性中,所述特征数据各特征元素的决策信息的集合可以以决策图等方式来进行表示。其中决策图为一维、二维或多维图像数据且与所述特征数据的尺寸一致。
一种可能性中,联合网络还输出所述第一特征元素的概率估计结果,所述第一特征元素的概率估计结果包括所述第一特征元素的概率值,和/或所述概率分布的第一参数和所述概率分布的第二参数。
上述技术方案中,通过对每个待编码的特征元素进行判定是否需要执行熵编码,从而跳过某些特征元素的熵编码过程,可以显著减少需执行熵编码的元素个数。这样,可以降低熵编码复杂度。
在一种可能性中,当所述决策图中对应第一特征元素位置的值为预设值时,需要对所述第一特征元素执行熵编码;当所述决策图中对应第一特征元素位置的值不为预设值时,不需要对所述第一特征元素执行熵编码。
第十四方面,提供了一种特征数据的解码方法,包括:
获取待解码特征数据的码流和所述待解码特征数据的边信息;
所述待解码特征数据包括多个特征元素,所述多个特征元素包括第一特征元素;
对所述待解码特征数据的边信息输入联合网络以得到所述第一特征元素的决策信息;
根据所述第一特征元素的决策信息,判断是否对所述第一特征元素执行熵解码;
仅当判断出需要对所述第一特征元素执行熵解码时,对所述第一特征元素执行熵解码。
一种可能性中,解码待解码特征数据的码流以得到边信息。所述边信息包含的特征元素的个数比特征数据的特征元素少。
所述第一特征元素为所述特征数据中的任意特征元素。
一种可能性中,所述特征数据各特征元素的决策信息可以以决策图等方式来进行表示。 其中决策图为一维、二维或多维图像数据且与所述特征数据的尺寸一致。
一种可能性中,联合网络还输出所述第一特征元素的概率估计结果,所述第一特征元素的概率估计结果包括所述第一特征元素的概率值,和/或所述概率分布的第一参数和所述概率分布的第二参数。
在一种可能性中,当所述决策图中对应第一特征元素位置的值为预设值时,需要对所述第一特征元素执行熵解码;当所述决策图中对应第一特征元素位置的值不为预设值时,不需要对所述第一特征元素执行熵解码,并将所述第一特征元素的特征值设置为k,其中k为整数。
上述技术方案中,通过对每个待编码的特征元素进行判定是否需要执行熵解码,从而跳过某些特征元素的熵解码过程,可以显著减少需执行熵解码的元素个数。这样,可以降低熵解码复杂度。
在现有端到端特征数据编解码主流方案中,在熵编解码或算术编解码过程中存在着复杂度过高问题。本申请利用了待编码特征数据中特征点概率分布相关信息,对每个待编解码特征数据中的特征元素进行判定是否需要进行熵编解码,从而跳过某些特征元素的熵编解码过程,可以显著减少需进行编解码的元素个数,降低了编解码复杂度。另一方面,可以根据码流的码率实际大小要求,灵活的对阈值大小进行设置以控制生成码流的码率大小。
附图及以下说明中将详细描述一个或多个实施例。其它特征、目的和优点在说明、附图以及权利要求中是显而易见的。
附图说明
下面对本申请实施例用到的附图进行介绍。
图1A为图像译码系统示例性框图;
图1B为图像译码系统处理电路实现;
图1C为图像译码设备示意性框图;
图1D为本申请实施例装置实现图;
图2A为本申请一种可能场景的系统架构图;
图2B为本申请一种可能场景的系统架构图;
图3A-3D为编码器示意性框图;
图4A为编码网络单元示意图;
图4B为编码网络的网络结构示意图;
图5为编码决策实现单元结构示意图;
图6为联合网络输出示例图;
图7为生成网络输出示例图;
图8为解码决策实现实现示意图;
图9为解码网络的网络结构示例图;
图10A为本申请实施例译码方法的一个示例图;
图10B为本申请实施例图像特征图解码器示意性框图;
图11A为本申请实施例译码方法的一个示例图;
图12为边信息提取模块网络结构示例图;
图13A为本申请实施例译码方法的一个示例图;
图13B为本申请实施例图像特征图解码器示意性框图;
图14为本申请实施例译码方法的一个示例图;
图15为联合网络的网络结构示例图;
图16为本申请实施例图像特征图解码器示意性框图;
图17为本申请实施例译码方法的一个示例图;
图18为本申请编码装置的一个示例性的结构示意图;
图19为本申请解码装置的一个示例性的结构示意图。
具体实施方式
本申请实施例所涉及的术语“第一”、“第二”等仅用于区分描述的目的,而不能理解为指示或暗示相对重要性,也不能理解为指示或暗示顺序。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,例如,包含了一系列步骤或单元。方法、系统、产品或设备不必限于清楚地列出的那些步骤或单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或单元。
应当理解,在本申请中,“至少一个(项)”是指一个或者多个,“多个”是指两个或两个以上。“和/或”,用于描述关联对象的关联关系,表示可以存在三种关系,例如,“A和/或B”可以表示:只存在A,只存在B以及同时存在A和B三种情况,其中A,B可以是单数或者复数。字符“/”一般表示前后关联对象是一种“或”的关系。“以下至少一项(个)”或其类似表达,是指这些项中的任意组合,包括单项(个)或复数项(个)的任意组合。例如,a,b或c中的至少一项(个),可以表示:a,b,c,“a和b”,“a和c”,“b和c”,或“a和b和c”,其中a,b,c可以是单个,也可以是多个。
本申请实施例提供一种基于AI的特征数据编解码技术,尤其是提供一种基于神经网 络的图像特征图和/或音频特征变量的编解码技术,具体提供一种基于端到端的图像特征图和/或音频特征变量的编解码系统。
在图像编码领域,术语“图像(picture)”、或“图片(image)”,可以用作同义词。图像编码(或通常称为编码)包括图像编码和图像解码两部分,其中视频由多个图像所组成,是连续图像的表示方式。图像编码在源侧判断,通常包括处理(例如,压缩)原始视频图像以减少表示该视频图像所需的数据量(从而更高效存储和/或传输)。图像解码在目的地侧判断,通常包括相对于编码器作逆处理,以重建图像。实施例涉及的图像或音频的“译码”应理解为图像或音频的“编码”或“解码”。编码部分和解码部分也合称为编解码(编码和解码,CODEC)。
在无损图像编码情况下,可以重建原始图像,即重建的图像与原始图像具有相同的质量(假设存储或传输期间没有传输损耗或其它数据丢失)。在传统有损图像编码情况下,通过量化等判断进一步压缩,来减少表示视频图像所需的数据量,而解码器侧无法完全重建视频图像,即重建的视频图像的质量比原始视频图像的质量较低或较差。
由于本申请实施例涉及大量神经网络的应用,为了便于理解,下面先对本申请实施例可能涉及的神经网络的相关术语和概念进行介绍。
(1)神经网络
神经网络可以是由神经单元组成的,神经单元可以是指以xs和截距1为输入的运算单元,该运算单元的输出可以为:
$$h_{W,b}(x)=f\left(\sum_{s=1}^{n}W_{s}x_{s}+b\right)$$
其中,s=1、2、……n,n为大于1的自然数,Ws为xs的权重,b为神经单元的偏置。f为神经单元的激活函数(activation functions),用于将非线性特性引入神经网络中,来将神经单元中的输入信号转换为输出信号。该激活函数的输出信号可以作为下一层卷积层的输入,激活函数可以是sigmoid函数。神经网络是将多个上述单一的神经单元联结在一起形成的网络,即一个神经单元的输出可以是另一个神经单元的输入。每个神经单元的输入可以与前一层的局部接受域相连,来提取局部接受域的特征,局部接受域可以是由若干个神经单元组成的区域。
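上述神经单元的运算可用如下草图示意（激活函数此处假设取sigmoid，权重与输入均为示意取值）：

```python
import math

def neuron_output(xs, ws, b):
    """单个神经单元：f(sum(Ws*xs) + b)，此处激活函数f取sigmoid。"""
    s = sum(w * x for w, x in zip(ws, xs)) + b
    return 1.0 / (1.0 + math.exp(-s))    # sigmoid激活

out = neuron_output([1.0, 2.0], [0.5, -0.25], 0.0)
assert abs(out - 0.5) < 1e-9  # 0.5*1 - 0.25*2 + 0 = 0，sigmoid(0)=0.5
```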
(2)深度神经网络
深度神经网络(deep neural network,DNN),也称多层神经网络,可以理解为具有多层隐含层的神经网络。按照不同层的位置对DNN进行划分,DNN内部的神经网络可以分为三类:输入层,隐含层,输出层。一般来说第一层是输入层,最后一层是输出层,中间 的层数都是隐含层。层与层之间是全连接的,也就是说,第i层的任意一个神经元一定与第i+1层的任意一个神经元相连。
虽然DNN看起来很复杂,但是就每一层的工作来说,其实并不复杂,简单来说就是如下线性关系表达式:
$$\vec{y}=\alpha(W\vec{x}+\vec{b})$$
其中，$\vec{x}$是输入向量，$\vec{y}$是输出向量，$\vec{b}$是偏移向量，$W$是权重矩阵（也称系数），$\alpha()$是激活函数。每一层仅仅是对输入向量$\vec{x}$经过如此简单的操作得到输出向量$\vec{y}$。由于DNN层数多，系数$W$和偏移向量$\vec{b}$的数量也比较多。这些参数在DNN中的定义如下所述：以系数$W$为例：假设在一个三层的DNN中，第二层的第4个神经元到第三层的第2个神经元的线性系数定义为$W_{24}^{3}$，上标3代表系数$W$所在的层数，而下标对应的是输出的第三层索引2和输入的第二层索引4。综上，第L-1层的第k个神经元到第L层的第j个神经元的系数定义为$W_{jk}^{L}$。
需要注意的是,输入层是没有W参数的。在深度神经网络中,更多的隐含层让网络更能够刻画现实世界中的复杂情形。理论上而言,参数越多的模型复杂度越高,“容量”也就越大,也就意味着它能完成更复杂的学习任务。训练深度神经网络的也就是学习权重矩阵的过程,其最终目的是得到训练好的深度神经网络的所有层的权重矩阵(由很多层的向量W形成的权重矩阵)。
(3)卷积神经网络
卷积神经网络(convolutional neuron network,CNN)是一种带有卷积结构的深度神经网络。卷积神经网络包含了一个由卷积层和子采样层构成的特征抽取器,该特征抽取器可以看作是滤波器。卷积层是指卷积神经网络中对输入信号进行卷积处理的神经元层。在卷积神经网络的卷积层中,一个神经元可以只与部分邻层神经元连接。一个卷积层中,通常包含若干个特征平面,每个特征平面可以由一些矩形排列的神经单元组成。同一特征平面的神经单元共享权重,这里共享的权重就是卷积核。共享权重可以理解为提取图像信息的方式与位置无关。卷积核可以以随机大小的矩阵的形式初始化,在卷积神经网络的训练过程中卷积核可以通过学习得到合理的权重。另外,共享权重带来的直接好处是减少卷积神经网络各层之间的连接,同时又降低了过拟合的风险。
(4)熵编码
熵编码用于将熵编码算法或方案(例如,可变长度编码(variable length coding,VLC)方案、上下文自适应VLC方案(context adaptive VLC,CALVC)、算术编码方案、 二值化算法、上下文自适应二进制算术编码(context adaptive binary arithmetic coding,CABAC)、基于语法的上下文自适应二进制算术编码(syntax-based context-adaptive binary arithmetic coding,SBAC)、概率区间分割熵(probability interval partitioning entropy,PIPE)编码或其它熵编码方法或技术)应用于量化系数、其它语法元素,得到可以通过输出端以编码比特流等形式输出的编码数据,使得解码器等可以接收并使用用于解码的参数。可将编码比特流传输到解码器,或将其保存在存储器中稍后由解码器传输或检索。
在以下译码系统10的实施例中,编码器20A和解码器30A根据图1A至图15进行描述。
图1A为示例性译码系统10的示意性框图,例如可以利用本申请技术的图像(或音频)译码系统10(或简称为译码系统10)。图像译码系统10中的编码器20A和解码器30A代表可用于根据本申请中描述的各种示例判断各技术的设备等。
如图1A所示,译码系统10包括源设备12,源设备12用于将编码图像(或音频)等编码码流21提供给用于对编码码流21进行解码的目的设备14。
源设备12包括编码器20A,另外即可选地,图像源16、预处理器(或预处理单元)18、通信接口(或通信单元)26和概率估计(或概率估计单元)40。
图像(或音频)源16可包括或可以为任意类型的用于捕获现实世界图像(或音频)等的图像捕获设备,和/或任意类型的图像生成设备,例如用于生成计算机动画图像的计算机图形处理器或任意类型的用于获取和/或提供现实世界图像、计算机生成图像(例如,屏幕内容、虚拟现实(virtual reality,VR)图像和/或其任意组合(例如增强现实(augmented reality,AR)图像)的设备。所述音频或图像源可以为存储上述任意音频或图像的任意类型的内存或存储器。
为了区分预处理器(或预处理单元)18判断的处理,图像或音频(图像或音频数据)17也可称为原始图像或音频(原始图像数据或音频数据)17。
预处理器18用于接收(原始)图像(或音频)数据17,并对图像(或音频)数据17进行预处理,得到预处理图像或音频(或预处理图像或音频数据)19。例如,预处理器18判断的预处理可包括修剪、颜色格式转换(例如从RGB转换为YCbCr)、调色或去噪。可以理解的是,预处理单元18可以为可选组件。
编码器20A包括编码网络20、熵编码24,另外即可选地,预处理器22。
图像(或音频)编码网络(或编码网络)20用于接收预处理图像(或音频)数据19 并提供编码图像(或音频)数据21。
预处理器22用于接收待编码特征数据21,并对待编码特征数据21进行预处理,得到预处理待编码特征数据23。例如,预处理器22判断的预处理可包括修剪、颜色格式转换(例如从RGB转换为YCbCr)、调色或去噪。可以理解的是,预处理单元22可以为可选组件。
熵编码24用于接收待编码特征数据(或预处理待编码特征数据)23并根据概率估计40提供的概率估计结果41生成编码码流25。
源设备12中的通信接口26可用于:接收编码码流25并通过通信信道27向目的设备14等另一设备或任何其它设备发送编码码流25(或其它任意处理后的版本),以便存储或直接重建。
目的设备14包括解码器30A,另外即可选地,可包括通信接口(或通信单元)28、后处理器(或后处理单元)36和显示设备38。
目的设备14中的通信接口28用于直接从源设备12或从存储设备等任意其它源设备接收编码码流25(或其它任意处理后的版本),例如,存储设备为编码码流存储设备,并将编码码流25提供给解码器30A。
通信接口26和通信接口28可用于通过源设备12与目的设备14之间的直连通信链路,例如直接有线或无线连接等,或者通过任意类型的网络,例如有线网络、无线网络或其任意组合、任意类型的私网和公网或其任意类型的组合,发送或接收编码码流(或编码码流数据)25。
例如,通信接口26可用于将编码码流25封装为报文等合适的格式,和/或使用任意类型的传输编码或处理来处理所述编码码流,以便在通信链路或通信网络上进行传输。
通信接口28与通信接口26对应,例如,可用于接收传输数据,并使用任意类型的对应传输解码或处理和/或解封装对传输数据进行处理,得到编码码流25。
通信接口26和通信接口28均可配置为如图1A中从源设备12指向目的设备14的对应通信信道27的箭头所指示的单向通信接口,或双向通信接口,并且可用于发送和接收消息等,以建立连接,确认并交换与通信链路和/或例如编码后的图像数据传输等数据传输相关的任何其它信息,等等。
解码器30A包括解码网络34、熵解码30,另外即可选地,后处理器32。
熵解码30用于接收编码码流25并根据概率估计40提供的概率估计结果42提供解码特征数据31。
后处理器32用于对解码特征数据31进行后处理,得到后处理后的解码特征数据33。后处理单元32判断的后处理可以包括例如颜色格式转换(例如从YCbCr转换为RGB)、调色、修剪或重采样,可以理解的是,后处理单元32可以为可选组件。
解码网络34用于接收解码特征数据31或后处理后的解码特征数据33并提供重建图像数据35。
后处理器36用于对重建图像数据35进行后处理,得到后处理后的重建图像数据37。后处理单元36判断的后处理可以包括例如颜色格式转换(例如从YCbCr转换为RGB)、调色、修剪或重采样,可以理解的是,后处理单元36可以为可选组件。
显示设备38用于接收重建图像数据35或后处理后的重建图像数据37,以向用户或观看者等显示图像。显示设备38可以为或包括任意类型的用于表示重建后音频或图像的播放器或显示器,例如,集成或外部显示屏或显示器。例如,显示屏可包括液晶显示器(liquid crystal display,LCD)、有机发光二极管(organic light emitting diode,OLED)显示器、等离子显示器、投影仪、微型LED显示器、硅基液晶显示器(liquid crystal on silicon,LCoS)、数字光处理器(digital light processor,DLP)或任意类型的其它显示屏。
尽管图1A示出了源设备12和目的设备14作为独立的设备,但设备实施例也可以同时包括源设备12和目的设备14或同时包括源设备12和目的设备14的功能,即同时包括源设备12或对应功能和目的设备14或对应功能。在这些实施例中,源设备12或对应功能和目的设备14或对应功能可以使用相同硬件和/或软件或通过单独的硬件和/或软件或其任意组合来实现。
根据描述,图1A所示的源设备12和/或目的设备14中的不同单元或功能的存在和(准确)划分可能根据实际设备和应用而有所不同,这对技术人员来说是显而易见的。
特征数据编码器20A(例如图像特征图编码器或音频特征变量编码器)或特征数据解码器30A(例如图像特征图解码器或音频特征变量解码器)或两者都可通过如图1B所示的处理电路实现,例如一个或多个微处理器、数字信号处理器(digital signal processor,DSP)、专用集成电路(application-specific integrated circuit,ASIC)、现场可编程门阵列(field-programmable gate array,FPGA)、离散逻辑、硬件、图像编码专用处理器或其任意组合。特征数据编码器20A可以通过处理电路56实现,特征数据解码器30A可以通过处理电路56实现。所述处理电路56可用于判断下文论述的各种操作。如果部分技术在软件中实施,则设备可以将软件的指令存储在合适的非瞬时性计算机可读存储 介质中,并且使用一个或多个处理器在硬件中判断指令,从而判断本发明技术。特征数据编码器20A和特征数据解码器30A中的其中一个可作为组合编解码器(encoder/decoder,CODEC)的一部分集成在单个设备中,如图1B所示。
源设备12和目的设备14可包括各种设备中的任一种,包括任意类型的手持设备或固定设备,例如,笔记本电脑或膝上型电脑、手机、智能手机、平板或平板电脑、相机、台式计算机、机顶盒、电视机、显示设备、数字媒体播放器、视频游戏控制台、视频流设备(例如,内容业务服务器或内容分发服务器)、广播接收设备、广播发射设备,等等,并可以不使用或使用任意类型的操作系统。在一些情况下,源设备12和目的设备14可配备用于无线通信的组件。因此,源设备12和目的设备14可以是无线通信设备。
在一些情况下,图1A所示的译码系统10仅仅是示例性的,本申请提供的技术可适用于图像特征图或音频特征变量编码设置(例如,图像特征图编码或图像特征图解码),这些设置不一定包括编码设备与解码设备之间的任何数据通信。在其它示例中,数据从本地存储器中检索,通过网络发送,等等。图像特征图或音频特征变量编码设备可以对数据进行编码并将数据存储到存储器中,和/或图像特征图或音频特征变量解码设备可以从存储器中检索数据并对数据进行解码。在一些示例中,编码和解码由相互不通信而只是编码数据到存储器和/或从存储器中检索并解码数据的设备来判断。
图1B是根据一示例性实施例包含图1A的特征数据编码器20A和/或图1B的特征数据解码器30A的译码系统50的实例的说明图。译码系统50可以包含成像(或产生音频)设备51、编码器20A、解码器30A(和/或藉由处理电路56实施的特征数据编/解码器)、天线52、一个或多个处理器53、一个或多个内存存储器54和/或显示(或音频播放)设备55。
如图1B所示,成像(或产生音频)设备51、天线52、处理电路56、编码器20A、解码器30A、处理器53、内存存储器54和/或显示(或音频播放)设备55能够互相通信。在不同实例中,译码系统50可以只包含编码器20A或只包含解码器30A。
在一些实例中,天线52可以用于传输或接收特征数据的经编码比特流。另外,在一些实例中,显示(或音频播放)设备55可以用于呈现图像(或音频)数据。处理电路56可以包含专用集成电路(application-specific integrated circuit,ASIC)逻辑、图形处理器、通用处理器等。译码系统50也可以包含可选的处理器53,该可选处理器53类似地可以包含专用集成电路(application-specific integrated circuit,ASIC)逻辑、图形处理器、音频处理器、通用处理器等。另外,内存存储器54可以是任何类型的 存储器,例如易失性存储器(例如,静态随机存取存储器(static random access memory,SRAM)、动态随机存储器(dynamic random access memory,DRAM)等)或非易失性存储器(例如,闪存等)等。在非限制性实例中,内存存储器54可以由超速缓存内存实施。在其它实例中,处理电路56可以包含存储器(例如,缓存等)用于实施图像缓冲器等。
在一些实例中,通过逻辑电路实施的编码器20A可以包含(例如,通过处理电路56或内存存储器54实施的)图像缓冲器和(例如,通过处理电路56实施的)图形处理单元。图形处理单元可以通信耦合至图像缓冲器。图形处理单元可以包含通过处理电路56实施的编码器20A。逻辑电路可以用于判断本文所论述的各种操作。
在一些实例中,解码器30A可以以类似方式通过处理电路56实施,以实施参照图1B的解码器30和/或本文中所描述的任何其它解码器系统或子系统所论述的各种模块。在一些实例中,逻辑电路实施的解码器30A可以包含(通过处理电路56或内存存储器54实施的)图像缓冲器和(例如,通过处理电路56实施的)图形处理单元。图形处理单元可以通信耦合至图像缓冲器。图形处理单元可以包含通过处理电路56实施的图像解码器30A。
在一些实例中,天线52可以用于接收图像数据的经编码比特流。如所论述,经编码比特流可以包含本文所论述的与编码音频或视频帧相关的数据、指示符、索引值、模式选择数据等,例如与编码分割相关的数据。译码系统50还可包含耦合至天线52并用于解码经编码比特流的解码器30A。显示(或音频播放)设备55用于呈现图像(或音频)。
应理解,本申请实施例中对于参考编码器20A所描述的实例,解码器30A可以用于判断相反过程。关于信令语法元素,解码器30A可以用于接收并解析这种语法元素,相应地解码相关图像数据。在一些例子中,编码器20A可以将语法元素熵编码成经编码比特流。在此类实例中,解码器30A可以解析这种语法元素,并相应地解码相关图像数据。
图1C为本发明实施例提供的译码设备400的示意图。译码设备400适用于实现本文描述的公开实施例。在一个实施例中,译码设备400可以是解码器,例如图1A中的图像特征图解码器30A,也可以是编码器,例如图1A中的图像特征图编码器20A。
图像译码设备400包括:用于接收数据的入端口410(或输入端口410)和接收单元(receiver unit,Rx)420;用于处理数据的处理器、逻辑单元或中央处理器(central processing unit,CPU)430;例如,这里的处理器430可以是神经网络处理器430;用于传输数据的发送单元(transmitter unit,Tx)440和出端口450(或输出端口450);用于存储数据的存储器460。图像(或音频)译码设备400还可包括耦合到入端口410、接收单元420、发送单元440和出端口450的光电(optical-to-electrical,OE)组件 和电光(electrical-to-optical,EO)组件,用于光信号或电信号的出口或入口。
处理器430通过硬件和软件实现。处理器430可实现为一个或多个处理器芯片、核(例如,多核处理器)、FPGA、ASIC和DSP。处理器430与入端口410、接收单元420、发送单元440、出端口450和存储器460通信。处理器430包括译码模块470(例如,基于神经网络NN的译码模块470)。译码模块470实施上文所公开的实施例。例如,译码模块470判断、处理、准备或提供各种编码操作。因此,通过译码模块470为译码设备400的功能提供了实质性的改进,并且影响了译码设备400到不同状态的切换。或者,以存储在存储器460中并由处理器430判断的指令来实现译码模块470。
存储器460包括一个或多个磁盘、磁带机和固态硬盘,可以用作溢出数据存储设备,用于在选择判断程序时存储此类程序,并且存储在程序判断过程中读取的指令和数据。存储器460可以是易失性和/或非易失性的,可以是只读存储器(read-only memory,ROM)、随机存取存储器(random access memory,RAM)、三态内容寻址存储器(ternary content-addressable memory,TCAM)和/或静态随机存取存储器(static random-access memory,SRAM)。
图1D为示例性实施例提供的装置500的简化框图,装置500可用作图1A中的源设备12和目的设备14中的任一个或两个。
装置500中的处理器502可以是中央处理器。或者,处理器502可以是现有的或今后将研发出的能够操控或处理信息的任何其它类型设备或多个设备。虽然可以使用如图所示的处理器502等单个处理器来实施已公开的实现方式,但使用一个以上的处理器速度更快和效率更高。
在一种实现方式中,装置500中的存储器504可以是只读存储器(ROM)设备或随机存取存储器(RAM)设备。任何其它合适类型的存储设备都可以用作存储器504。存储器504可以包括处理器502通过总线512访问的代码和数据506。存储器504还可包括操作系统508和应用程序510,应用程序510包括允许处理器502判断本文所述方法的至少一个程序。例如,应用程序510可以包括应用1至N,还包括判断本文所述方法的图像译码应用。
装置500还可以包括一个或多个输出设备,例如显示器518。在一个示例中,显示器518可以是将显示器与可用于感测触摸输入的触敏元件组合的触敏显示器。显示器518可以通过总线512耦合到处理器502。
虽然装置500中的总线512在本文中描述为单个总线,但是总线512可以包括多个总 线。此外,辅助储存器可以直接耦合到装置500的其它组件或通过网络访问,并且可以包括存储卡等单个集成单元或多个存储卡等多个单元。因此,装置500可以具有各种各样的配置。
图2A示出了一种可能的图像特征图或音频特征变量编解码场景下的系统架构1800,包括:
采集设备1801:视频采集设备完成原始视频(或音频)采集;
采集前处理1802:原始视频(或音频)采集经过一系列的前处理得到视频(或音频)数据;
编码1803:视频(或音频)编码用于降低编码冗余,降低图像特征图或音频特征变量压缩过程中的数据传输量;
发送1804:将编码后得到的压缩编码码流数据通过发送模块进行数据发送;
接收1805:压缩编码码流数据经过网络传输被接收模块所接收;
码流解码1806:对码流数据进行码流解码;
渲染显示(或播放)1807:对解码后的数据进行渲染显示(或播放);
图2B示出了一种可能的图像特征图(或音频特征变量)面向机器任务场景下的系统架构1900,包括:
特征提取1901:对图像(或音频)源进行特征提取;
边信息提取1902:对特征提取数据进行边信息提取;
概率估计1903:边信息作为概率估计的输入,对特征图(或特征变量)进行概率估计以得到概率估计结果;
编码1904:结合概率估计结果对特征提取数据执行熵编码以得到码流;
可选的,在进行编码以前对特征提取数据执行量化或取整操作,然后对量化或取整后的特征提取数据进行编码。
可选的,对边信息执行熵编码,使得码流中包括了边信息数据。
解码1905:结合概率估计结果对码流执行熵解码得到图像特征图(或音频特征变量);
其中可选的,如码流中包括了边信息编码数据,对边信息编码数据执行熵解码,并结合解码的边信息数据作为概率估计的输入以得到概率估计结果。
需要说明的是,当只有边信息作为概率估计的输入时,可以并行输出各特征元素的概率估计结果;当概率估计的输入包括有上下文信息时,需要串行输出各特征元素概率估计结果。其中所述边信息为图像特征图或音频特征变量输入神经网络进一步提取得到的特征 信息,所述边信息包含的特征元素的个数比图像特征图或音频特征变量的特征元素少。可选地,可以将图像特征图或音频特征变量的边信息编入码流。
机器视觉任务1906:对解码特征图(或特征变量)执行机器视觉(或听觉)任务。
具体的,将解码特征数据输入值机器视觉(或听觉)任务网络,网路输出为视觉(或听觉)任务相关如分类、目标识别、语义分割等任务的一维、二维或多维数据。
在一种可能实现上,在系统架构1900实现过程中,特征提取、编码过程在终端上实现,对解码以及执行机器视觉任务在云端上实现。
编码器20A可用于通过输入端202等接收图像(或图像数据)或音频(或音频数据)17。接收图像、图像数据、音频、音频数据也可以是预处理后的图像(或预处理后的图像数据)或音频(或预处理后的音频数据)19。为简单起见,以下描述使用图像(或音频)17。图像(或音频)17也可称为当前图像或待编码的图像(尤其是在视频编码中将当前图像与其它图像区分开时,其它图像例如同一视频序列,即也包括当前图像的视频序列中的之前编码后图像和/或解码后图像)或当前音频或待编码的音频。
(数字)图像为或可以视为具有强度值的像素点组成的二维阵列或矩阵。阵列中的像素点也可以称为像素(pixel或pel)(图像元素的简称)。阵列或图像在水平方向和垂直方向(或轴线)上的像素点数量决定了图像的大小和/或分辨率。为了表示颜色,通常采用三个颜色分量,即图像可以表示为或包括三个像素点阵列。在RBG格式或颜色空间中,图像包括对应的红色、绿色和蓝色像素点阵列。同样,每个像素可以以亮度/色度格式或颜色空间表示,例如YCbCr,包括Y指示的亮度分量(有时也用L表示)以及Cb、Cr表示的两个色度分量。亮度(luma)分量Y表示亮度或灰度水平强度(例如,在灰度等级图像中两者相同),而两个色度(chrominance,简写为chroma)分量Cb和Cr表示色度或颜色信息分量。相应地,YCbCr格式的图像包括亮度像素点值(Y)的亮度像素点阵列和色度值(Cb和Cr)的两个色度像素点阵列。RGB格式的图像可以转换或变换为YCbCr格式,反之亦然,该过程也称为颜色变换或转换。如果图像是黑白的,则该图像可以只包括亮度像素点阵列。相应地,图像可以为例如单色格式的亮度像素点阵列或4:2:0、4:2:2和4:4:4彩色格式的亮度像素点阵列和两个相应的色度像素点阵列。图像编码器20A对图像的色彩空间不做限制。
在一个可能性中,编码器20A的实施例可包括图像(或音频)分割单元(图1A或图1B中未示出),用于将图像(或音频)17分割成多个(通常不重叠)图像块203或音频 段。这些图像块在H.265/HEVC和VVC标准中也可以称为根块、宏块(H.264/AVC)或编码树块(Coding Tree Block,CTB),或编码树单元(Coding Tree Unit,CTU)。分割单元可用于对视频序列中的所有图像使用相同的块大小和使用限定块大小的对应网格,或在图像或图像子集或图像组之间改变块大小,并将每个图像分割成对应块。
在另一个可能性中,编码器可用于直接接收图像17的块203,例如,组成所述图像17的一个、几个或所有块。图像块203也可以称为当前图像块或待编码图像块。
与图像17一样,图像块203同样是或可认为是具有强度值(像素点值)的像素点组成的二维阵列或矩阵,但是图像块203的比图像17的小。换句话说,块203可包括一个像素点阵列(例如,单色图像17情况下的亮度阵列或彩色图像情况下的亮度阵列或色度阵列)或三个像素点阵列(例如,彩色图像17情况下的一个亮度阵列和两个色度阵列)或根据所采用的颜色格式的任何其它数量和/或类型的阵列。块203的水平方向和垂直方向(或轴线)上的像素点数量限定了块203的大小。相应地,块可以为M×N(M列×N行)个像素点阵列,或M×N个变换系数阵列等。
在另一个可能性中,图1A-1B或图3A-3D所示的编码器20A用于逐块对图像17进行编码。
在另一个可能性中,图1A-1B或图3A-3D所示的编码器20A用于对图像17进行编码。
在另一个可能性中,图1A-1B或图3A-3D所示的编码器20A还可以用于使用片(也称为视频片)分割编码图像,其中图像可以使用一个或多个片(通常为不重叠的)进行分割或编码。每个片可包括一个或多个块(例如,编码树单元CTU)或一个或多个块组(例如H.265/HEVC/VVC标准中的编码区块(tile)和VVC标准中的子图像(subpicture)。
在另一个可能性中,图1A-1B或图3A-3D所示的编码器20A还可以用于使用片/编码区块组(也称为视频编码区块组)和/或编码区块(也称为视频编码区块)对图像进行分割和/或编码,其中图像可以使用一个或多个片/编码区块组(通常为不重叠的)进行分割或编码,每个片/编码区块组可包括一个或多个块(例如CTU)或一个或多个编码区块等,其中每个编码区块可以为矩形等形状,可包括一个或多个完整或部分块(例如CTU)。
编码网络20
编码网络20用于通过编码网络根据输入数据来得到图像特征图或音频特征变量。
在一个可能性中,编码网络20如图4A所示,编码网络20包含多个网络层,任意一网络层可以为卷积层、归一化层、非线性激活层等。
在一个可能性中,编码网络20输入为至少一张待编码图像或至少一个待编码图像块。 待编码图像为可以为原始图像,有损图像或者为残差图像。
在一个可能性中,编码网络20中的编码网络的网络结构示例如图4B所示,可见示例中编码网络包含了5个网络层,具体包括了三个卷积层以及两个非线性激活层。
取整24
取整用于通过例如标量量化或矢量量化对图像特征图或音频特征变量进行取整,得到取整后的图像特征图或音频特征变量。
在一个可能性中,编码器20A可用于输出取整参数(quantization parameter,QP),例如,直接输出或由编码决策实现单元进行编码或压缩后输出,例如使得解码器30A可接收并使用量化参数进行解码。
在一个可能性中,输出特征图或特征音频特征变量在进行取整前进行预处理,预处理可包括修剪、颜色格式转换(例如从RGB转换为YCbCr)、调色或去噪等。
概率估计40
概率估计根据的输入特征图或特征变量信息以得到图像特征图或音频特征变量的概率估计结果。
概率估计用于对取整后的图像特征图或音频特征变量进行概率估计。
概率估计可以为概率估计网络,概率估计网络为卷积网络,卷积网络中包括了卷积层和非线性激活层。以图4B为例,概率估计网络包含了5个网络层,具体包括了三个卷积层以及两个非线性激活层。概率估计可以采用非网络的传统概率估计方法实现。概率估计方法包括且不限于等最大似然估计、最大后验估计、极大似然估计等统计方法。
编码决策实现26
如图5所示，编码决策实现包括了编码元素判断以及熵编码。所述图像特征图或音频特征变量为编码网络所输出的一维、二维或多维数据，其中每个数据均为特征元素。
编码元素判断261
编码元素判断是根据概率估计的概率估计结果信息来对图像特征图或音频特征变量中的每一个特征元素进行判断并根据判断结果决定具体对哪些特征元素执行熵编码。
图像特征图或音频特征变量的第P个特征元素的元素判断过程完成后,开始图像特征图的第P+1个特征元素的元素判断过程,其中P为正整数且P小于M。
熵编码262
熵编码可以采用各种公开的熵编码算法进行编码,譬如采用方案例如,可变长度编码(variable length coding,VLC)方案、上下文自适应VLC方案(context adaptive VLC, CAVLC)、熵编码方案、二值化算法、上下文自适应二进制熵编码(context adaptive binary arithmetic coding,CABAC)、基于语法的上下文自适应二进制熵编码(syntax-based context-adaptive binary arithmetic coding,SBAC)、概率区间分割熵(probability interval partitioning entropy,PIPE)编码或其它熵编码方法或技术。得到可以通过输出端212以编码比特流25等形式输出的编码图像数据25,使得解码器30A等可以接收并使用用于解码的参数。可将编码比特流25传输到解码器30A,或将其保存在存储器中稍后由解码器30A传输或检索。
另一种可能性中,熵编码可以采用熵编码网络进行编码,譬如采用卷积网络实现。
在一个可能性中,熵编码由于不知道取整特征图的真实字符概率,可以统计这些或相关的信息添加至熵编码中,把这些信息传到解码端。
联合网络44
联合网络是根据输入边信息得到图像特征图或音频特征变量的概率估计结果和决策信息。联合网络为多层网络,联合网络可以为卷积网络,卷积网络中包括了卷积层和非线性激活层。联合网络任意一网络层可以为卷积层、归一化层、非线性激活层等。
其中所述决策信息可以为一维、二维或多维数据,所述决策信息尺寸可以与图像特征图尺寸一致。
所述决策信息可以在联合网络中任意一网络层后输出。
所述概率估计结果可以在联合网络中任意一网络层后输出。
图6为联合网络的网络结构输出示例,网络结构包括4个网络层,其中决策信息在第四网络层后进行输出,概率估计结果在第二网络层后输出。
生成网络46
生成网络是根据输入概率估计结果得到图像特征图中各特征元素的决策信息。生成网络为多层网络,生成网络可以为卷积网络,卷积网络中包括了卷积层和非线性激活层。生成网络任意一网络层可以为卷积层、归一化层、非线性激活层等。
所述决策信息可以在生成网络中任意一网络层后输出。所述决策信息可以为一维、二维或多维数据。
图7为生成网络的网络结构输出决策信息示例,网络结构包括4个网络层。
解码决策实现30
如图8所示,解码决策实现包括了元素判断以及熵解码。所述图像特征图或音频特征变量为解码决策实现所输出的一维、二维或多维数据,其中每个数据均为特征元素。
解码元素判断301
解码元素判断根据概率估计的概率估计结果来对图像特征图或音频特征变量中的每一个特征元素进行判断并根据判断结果决定具体对哪些特征元素执行熵解码。解码元素判断对图像特征图或音频特征变量中的每一个特征元素进行判断并根据判断结果决定具体对哪些特征元素执行熵解码,可以看作是编码元素判断对图像特征图中的每一个特征元素进行判断并根据判断结果决定具体对哪些特征元素执行熵编码的逆过程。
熵解码302
熵解码可以采用各种公开的熵解码算法进行编码,譬如采用方案例如,可变长度编码(variable length coding,VLC)方案、上下文自适应VLC方案(context adaptive VLC,CAVLC)、熵解码方案、二值化算法、上下文自适应二进制熵解码(context adaptive binary arithmetic coding,CABAC)、基于语法的上下文自适应二进制熵解码(syntax-based context-adaptive binary arithmetic coding,SBAC)、概率区间分割熵(probability interval partitioning entropy,PIPE)编码或其它熵编码方法或技术。得到可以通过输出端212以编码比特流25等形式输出的编码图像(或音频)数据25,使得解码器30A等可以接收并使用用于解码的参数。可将编码比特流25传输到解码器30A,或将其保存在存储器中稍后由解码器30A传输或检索。
另一种可能性中,熵解码可以采用熵解码网络进行解码,譬如采用卷积网络实现。
解码网络34
解码网络用于将解码图像特征图或音频特征变量31或后处理解码图像特征图或音频特征变量33通过解码网络34以在像素域中得到重建图像(或音频)数据35或面向机器任务数据。
解码网络包含多个网络层,任意一网络层可以为卷积层、归一化层、非线性激活层等。解码网络单元306中可以存在包括叠加(concat)、相加、相减等操作。
在一个可能性中,解码网络中各网络层结构可以互为相同或者不同。
解码网络的结构示例如图9所示,可见示例中解码网络包含了5个网络层,具体包括了一个归一化层、两个卷积层以及两个非线性激活层。
解码网络输出重建图像(或音频),或者输出得到面向机器任务数据。具体的,所述解码网络可以包括目标识别网络,分类网络或者语义分割网络。
应理解,在编码器20A和解码器30A中,可以对当前步骤的处理结果进一步处理,然后输出到下一步骤。例如,在编码器单元或解码器单元之后,可以对编码器单元或解码器 单元的处理结果进行进一步的运算或处理,例如裁剪(clip)或移位(shift)运算或滤波处理。
基于上文的描述,下面给出本申请实施例提供的一些图像特征图或音频特征变量的编解码方法。对于下文描述的各方法实施例,为了方便起见,将其都表述为一系列的动作步骤的组合,但是本领域技术人员应该知悉,本申请技术方案的具体实现并不受所描述的一系列的动作步骤的顺序的限制。
下面结合附图,对本申请的流程进行详细的描述。需要说明的是,流程图中的编码端过程具体可以由上述的编码器20A来执行,流程图中的解码端过程具体可以由上述的解码器30A来执行。
实施例一至实施例五中，第一特征元素或第二特征元素即为当前待编码特征元素或当前待解码特征元素，比如$\hat{y}(x,y,i)$。
决策图也可以称为决策图map。决策图优选的为二元图,二元图也可以称为二元图map。
在本申请实施例一中,图10A示出了具体实现流程1400,运行步骤如下:
编码端:
步骤1401:获取图像的特征图
本步骤具体由图3A中的编码网络204来实现,具体可以参照上述对编码网络20的描述。将图像输入特征提取模块,输出图像的特征图y,特征图y可以是维度为w×h×c的三维数据。具体的,特征提取模块可以使用现有的神经网络来实现,在此不做限定。本步骤为现有技术。
特征量化模块对特征图y中的每个特征值进行量化,将浮点数的特征值进行四舍五入得到整数特征值,得到量化后的特征图ŷ,具体可以参照上述实施例对取整24的描述。
步骤1402:对特征图ŷ进行概率估计得到各特征元素的概率估计结果,即特征图ŷ中的每个特征元素ŷ[x][y][i]的概率分布:
其中,参数x,y,i为正整数,坐标(x,y,i)表示当前待编码特征元素的位置,具体的,坐标(x,y,i)表示当前待编码特征元素在当前三维特征图中相对于左上顶点的特征元素的位置。本步骤具体由图3A中的概率估计210来实现,具体可以参照上述对概率估计40的描述。具体可以使用概率分布模型来获得概率分布,例如使用单高斯模型(Gaussian single model,GSM)或者混合高斯模型(Gaussian mixture model,GMM)建模,首先将边信息ẑ和上下文信息输入概率估计网络,对特征图ŷ中的每个特征元素ŷ[x][y][i]进行概率估计得到各特征元素ŷ[x][y][i]的概率分布。概率估计网络可以使用基于深度学习的网络,例如循环神经网络(Recurrent Neural Network,RNN)和像素卷积神经网络(PixelCNN)等,在此不做限定。将模型参数代入概率分布模型中,得到概率分布。
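需要说明的是,下面给出一个示意性的Python片段(仅为便于理解的假设性示例,并非本申请的限定实现,函数名均为示例性命名),展示在高斯模型下如何由均值μ和方差σ得到取整后特征元素取值为整数k的概率P:

```python
import math

def gaussian_cdf(x, mu, sigma):
    # 高斯分布的累积分布函数,由均值mu和标准差sigma参数化
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def prob_of_integer(k, mu, sigma):
    # 对取整后的特征元素,取值为整数k的概率可近似为
    # 高斯分布在区间[k-0.5, k+0.5]上的积分
    return gaussian_cdf(k + 0.5, mu, sigma) - gaussian_cdf(k - 0.5, mu, sigma)
```

例如当μ=0且σ很小时,P(ŷ=0)趋近于1,对应上文中可以跳过熵编码的情形。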
步骤1403:对特征图ŷ执行熵编码得到压缩码流,并生成压缩码流。
本步骤具体由图3A中的编码决策实现208来实现,具体可以参照上述对编码决策实现26的描述。根据所述概率分布,得到当前待编码特征元素ŷ[x][y][i]取值为k的概率P。当当前待编码特征元素ŷ[x][y][i]的概率估计结果P不满足预设条件,即P大于(或者等于)第一阈值T0时,跳过对当前待编码特征元素执行熵编码的过程;否则,当当前待编码特征元素的概率估计结果P满足预设条件,即P小于第一阈值T0时,对当前待编码特征元素执行熵编码写入码流。其中,k可为任意整数,例如0,1,-1,2,3等。所述第一阈值T0为满足0<T0<1中的任一数,例如取值为0.99,0.98,0.97,0.95等。(可以认为每个特征元素的阈值都相同)。
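步骤1403的判断逻辑可以用如下示意性片段表示(假设性示例,prob_map与elements_to_encode等命名均为示例,并非本申请的限定实现):

```python
def elements_to_encode(prob_map, t0=0.99):
    # prob_map: {特征元素位置: 该位置特征元素取值为k的概率P}
    # 概率P大于或等于阈值T0的位置跳过熵编码,
    # 仅返回需要执行熵编码的位置列表(P < T0)
    return [pos for pos, p in prob_map.items() if p < t0]
```

编码端只需对返回列表中的位置调用实际的熵编码器,其余位置不写入码流。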
步骤1404、编码器发送或存储压缩码流。
解码端:
步骤1411:获取解码图像特征图的码流
步骤1412:根据码流进行概率估计得到各特征元素的概率估计结果
本步骤具体由图10B中的概率估计302来实现,具体可以参照上述对概率估计40的描述。对待解码特征图ŷ中的每个特征元素ŷ[x][y][i]进行概率估计,得到待解码特征元素ŷ[x][y][i]的概率分布。待解码特征图ŷ中包括多个特征元素,所述多个特征元素包括当前待解码特征元素。
解码端使用的概率估计网络结构图与本实施例的编码端概率估计网络结构相同。
步骤1413:对待解码特征图ŷ执行熵解码
本步骤具体由图10B中的解码决策实现304来实现,具体可以参照上述对解码决策实现30的描述。根据当前待解码特征元素的概率分布,得到当前待解码特征元素取值为k的概率P,即当前待解码特征元素的概率估计结果P。当概率估计结果P不满足预设条件,即P大于所述第一阈值T0时,不需要对当前待解码特征元素执行熵解码,将当前待解码特征元素数值设置为k;否则,当当前待解码特征元素满足预设条件,即P小于或者等于所述第一阈值T0时,对码流执行熵解码,得到当前待解码特征元素的值。
其中,所述第一阈值T0可以通过解析码流获取:从码流中获取索引号,解码端使用与编码端相同的方式来构建阈值候选列表,然后根据阈值候选列表中阈值和索引号的对应关系得到对应的阈值。其中,从码流中获取索引号即从序列头、图像头、Slice/条带或SEI中获取索引号。
或者可以直接解析码流,从码流中获取阈值,具体地,从序列头、图像头、Slice/条带或SEI中获取阈值。
或者根据与编码端约定的策略,直接设置固定阈值。
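解码端的阈值获取与跳过判断可以用如下示意性片段表示(假设性示例:候选列表取值、get_threshold与decode_element等命名均为示例,并非本申请的限定实现):

```python
def get_threshold(index, candidates=(0.95, 0.97, 0.98, 0.99)):
    # 解码端与编码端以相同方式构建阈值候选列表,
    # 根据从码流中解析出的索引号查表得到阈值T0
    return candidates[index]

def decode_element(p_k, k, t0, entropy_decode):
    # P大于T0:跳过熵解码,直接将特征元素数值设置为k;
    # 否则调用熵解码器从码流中解码出该特征元素的值
    if p_k > t0:
        return k
    return entropy_decode()
```

其中entropy_decode为实际熵解码器的回调,此处仅用于示意控制流程。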
步骤1414:对解码后的特征图ŷ进行重建,或者输入面向机器视觉任务模块执行相应的机器任务。本步骤具体可以由图10B中的解码网络306来实现,具体可以参照上述对解码网络34的描述。
情况一:将熵解码后的特征图ŷ输入图像重建模块,神经网络输出重建图。所述神经网络可以采用任一结构,例如全连接网络、卷积神经网络、循环神经网络等。所述神经网络可以采用多层的深度神经网络结构来达到更好的估计效果。
情况二:将熵解码后的特征图ŷ输入面向机器视觉任务模块执行相应的机器任务,例如完成物体分类、识别、分割等机器视觉任务。
以上解码端的k值与编码端的k值相对应设置。
图11A示出了本申请实施例二的具体实现流程1500,运行步骤如下:
需要说明的是,本实施例的方法一至方法六中:概率估计结果包括第一参数和第二参数;当所述概率分布为高斯分布时,第一参数为均值μ,第二参数为方差σ;当所述概率分布为拉普拉斯分布时,第一参数为位置参数μ,第二参数为尺度参数b。
编码端:
步骤1501:获取图像的特征图
本步骤具体由图3B中的编码网络204来实现,具体可以参照上述对编码网络20的描述。将图像输入特征提取模块,输出图像的特征图y,特征图y可以是维度为w×h×c的三维数据。具体的,特征提取模块可以使用现有的神经网络来实现,在此不做限定。本步骤为现有技术。
特征量化模块对特征图y中的每个特征值进行量化,将浮点数的特征值进行四舍五入得到整数特征值,得到量化后的特征图ŷ。
步骤1502:图像的特征图ŷ输入边信息提取模块,输出边信息ẑ
本步骤具体由图3B中的边信息提取单元214来实现。其中,边信息提取模块可以使用图12所示的网络来实现,边信息ẑ可以理解为对特征图ŷ进行进一步提取得到的特征图,ẑ所包含的特征元素的个数比特征图ŷ少。
需要说明的是,可以在本步骤中,对边信息ẑ执行熵编码并写入码流,也可以在后续的步骤1504中对边信息ẑ执行熵编码并写入码流,在此不做限定。
步骤1503:对特征图ŷ进行概率估计得到各特征元素的概率估计结果。
本步骤具体由图3B中的概率估计210来实现,具体可以参照上述对概率估计40的描述。可以使用概率分布模型来获得概率估计结果及概率分布。其中,概率分布模型可以为:单高斯模型(Gaussian single model,GSM)或者非对称高斯模型或者混合高斯模型(Gaussian mixture model,GMM)或者拉普拉斯分布模型(Laplace distribution)。
当概率分布模型为高斯模型时(单高斯模型或者非对称高斯模型或者混合高斯模型),首先将边信息ẑ或者上下文信息输入概率估计网络,对特征图ŷ中的每个特征元素ŷ[x][y][i]进行概率估计得到均值参数μ和方差σ的值。进一步地,将所述均值参数μ和方差σ输入所使用的概率分布模型中,得到概率分布。此时概率估计结果为均值参数μ和方差σ。
当概率分布模型为拉普拉斯分布模型时,首先将边信息ẑ或者上下文信息输入概率估计网络,对特征图ŷ中的每个特征元素ŷ[x][y][i]进行概率估计得到位置参数μ和尺度参数b的值。进一步地,将所述位置参数μ和尺度参数b输入所使用的概率分布模型中,得到概率分布。此时概率估计结果为位置参数μ和尺度参数b。
还可以将边信息ẑ和/或上下文信息输入概率估计网络,对待编码特征图ŷ中的每个特征元素ŷ[x][y][i]进行概率估计得到当前待编码特征元素ŷ[x][y][i]的概率分布。根据所述概率分布,得到当前待编码特征元素ŷ[x][y][i]取值为m的概率P。此时概率估计结果为当前待编码特征元素ŷ[x][y][i]取值为m的概率P。
其中,概率估计网络可以使用基于深度学习的网络,例如循环神经网络(Recurrent Neural Network,RNN)和像素卷积神经网络(PixelCNN)等,在此不做限定。
步骤1504:根据概率估计结果判断当前待编码特征元素ŷ[x][y][i]是否需要执行熵编码,并根据判断结果执行熵编码写入压缩码流(编码码流)或者不执行熵编码。仅当判断出需要对所述当前待编码特征元素执行熵编码时,对所述当前待编码特征元素执行熵编码。
本步骤具体由图3B中的编码决策实现208来实现,具体可以参照上述编码决策实现26的描述。根据概率估计结果判断当前待编码特征元素ŷ[x][y][i]是否需要执行熵编码可以使用以下方法中的一项或者多项。其中,参数x,y,i为正整数,坐标(x,y,i)表示当前待编码特征元素的位置,具体的,坐标(x,y,i)表示当前待编码特征元素在当前三维特征图中相对于左上顶点的特征元素的位置。
方法一:当所述概率分布模型为高斯分布时,根据所述第一特征元素的概率估计结果判断是否对所述当前待编码特征元素执行熵编码。当当前待编码特征元素的高斯分布的均值参数μ和方差σ的值不满足预设条件,即均值μ与k的差的绝对值小于第二阈值T1且方差σ小于第三阈值T2时,不需要对当前待编码特征元素ŷ[x][y][i]执行熵编码过程;否则,当满足预设条件,即均值μ与k的差的绝对值大于或者等于第二阈值T1或方差σ大于或者等于第三阈值T2时,对当前待编码特征元素ŷ[x][y][i]执行熵编码写入码流。其中,k为任意的整数,例如0,1,-1,2,3等。T2取值为满足0<T2<1中的任一数,例如取值为0.2,0.3,0.4等。T1是大于或等于0小于1的数,例如0.01,0.02,0.001,0.002。
特别地,k取值为0时为最优值,可以直接判断当高斯分布的均值参数μ绝对值小于T1且高斯分布的方差σ小于T2时,则跳过对当前待编码特征元素ŷ[x][y][i]执行熵编码过程,否则,对当前待编码特征元素ŷ[x][y][i]执行熵编码写入码流。其中,T2的取值为满足0<T2<1中的任一数,例如取值为0.2,0.3,0.4等。T1是大于或等于0小于1的数,例如0.01,0.02,0.001,0.002。
方法二:当所述概率分布为高斯分布时,根据所述概率估计结果,得到当前待编码特征元素ŷ[x][y][i]的高斯分布的均值参数μ和方差σ的值。当均值μ、方差σ与k的关系满足abs(μ-k)+σ<T3时(不满足预设条件),跳过对当前待编码特征元素ŷ[x][y][i]执行熵编码过程,其中,abs(μ-k)表示计算均值μ与k的差的绝对值;否则,当当前待编码特征元素的概率估计结果满足abs(μ-k)+σ≥T3时(预设条件),对当前待编码特征元素ŷ[x][y][i]执行熵编码写入码流。其中,k为任意的整数,例如0,1,-1,-2,3等。其中,第四阈值T3是大于或等于0小于1的数,例如取值为0.2,0.3,0.4等。
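方法二的判据可以用如下示意性片段表示(假设性示例,函数名与默认阈值均为示例,并非本申请的限定实现):

```python
def skip_entropy_coding(mu, sigma, k=0, t3=0.3):
    # abs(mu - k) + sigma < T3 时,该特征元素高概率取值为k,
    # 可跳过熵编码;否则需要执行熵编码
    return abs(mu - k) + sigma < t3
```

该判据同时考虑了分布中心偏离k的程度与分布的离散程度。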
方法三:当所述概率分布为拉普拉斯分布时,根据所述概率估计结果,得到当前待编码特征元素ŷ[x][y][i]的拉普拉斯分布的位置参数μ、尺度参数b的值。当位置参数μ、尺度参数b与k的关系满足abs(μ-k)+b<T4(不满足预设条件)时,跳过对当前待编码特征元素ŷ[x][y][i]执行熵编码过程,其中,abs(μ-k)表示计算位置参数μ与k的差的绝对值;否则,当当前待编码特征元素的概率估计结果满足abs(μ-k)+b≥T4(预设条件),对当前待编码特征元素ŷ[x][y][i]执行熵编码写入码流。其中,k为任意的整数,例如0,1,-1,-2,3等。第四阈值T4是大于或等于0小于0.5的数,例如取值为0.05,0.09,0.17等。
方法四:当所述概率分布为拉普拉斯分布时,根据所述概率估计结果,得到当前待编码特征元素ŷ[x][y][i]的拉普拉斯分布的位置参数μ、尺度参数b的值。当位置参数μ与k的差的绝对值小于第二阈值T5且尺度参数b小于第三阈值T6(不满足预设条件)时,跳过对当前待编码特征元素ŷ[x][y][i]执行熵编码过程;否则,当位置参数μ与k的差的绝对值大于或者等于第二阈值T5或尺度参数b大于或者等于第三阈值T6(预设条件)时,对当前待编码特征元素ŷ[x][y][i]执行熵编码写入码流。其中,k为任意的整数,例如0,1,-1,-2,3等。T5取值为1e-2,T6取值为满足T6<0.5中的任一数,例如取值为0.05,0.09,0.17等。
特别地,k取值为0时为最优值,可以直接判断当位置参数μ绝对值小于T5且尺度参数b小于T6时,则跳过对当前待编码特征元素ŷ[x][y][i]执行熵编码过程,否则,对当前待编码特征元素ŷ[x][y][i]执行熵编码写入码流。其中,阈值T5取值为1e-2,T6的取值为满足T6<0.5中的任一数,例如取值为0.05,0.09,0.17等。
方法五:当所述概率分布为混合高斯分布时,根据所述概率估计结果,得到当前待编码特征元素ŷ[x][y][i]的混合高斯分布的所有均值参数μ_i和方差σ_i的值。当混合高斯分布的所有均值与k的差的绝对值之和与所述混合高斯分布的任一方差的和小于第五阈值T7(不满足预设条件)时,跳过对当前待编码特征元素ŷ[x][y][i]执行熵编码过程;否则,当混合高斯分布的所有均值与k的差的绝对值之和与所述混合高斯分布的任一方差的和大于或者等于第五阈值T7(预设条件),对当前待编码特征元素ŷ[x][y][i]执行熵编码写入码流。其中,k为任意的整数,例如0,1,-1,-2,3等。T7是大于或等于0小于1的数,例如取值为0.2,0.3,0.4等。(可以认为每个特征元素的阈值都相同)
方法六:根据所述概率分布,得到当前待编码特征元素ŷ[x][y][i]取值为k的概率P。当当前待编码特征元素的概率估计结果P不满足预设条件,即P大于(或者等于)第一阈值T0时,跳过对当前待编码特征元素执行熵编码的过程;否则,当当前待编码特征元素的概率估计结果P满足预设条件,即P小于第一阈值T0时,对当前待编码特征元素执行熵编码写入码流。其中,k可为任意整数,例如0,1,-1,2,3等。所述第一阈值T0为满足0<T0<1中的任一数,例如取值为0.99,0.98,0.97,0.95等。(可以认为每个特征元素的阈值都相同)
需要说明的是,在实际应用中,为保证平台的一致性,可以对所述阈值T1,T2,T3,T4,T5和T6进行整点化,即进行移位放大为整数。
需要说明的是,阈值的获取方法还可以使用以下方法之一,在此不做限定:
方法一:以阈值T1为例,取T1取值范围内的任意一个取值作为阈值T1,将阈值T1写入码流。具体地,将所述阈值写入码流,可将其保存在序列头、图像头、Slice/条带或SEI中传送到解码端,还可以使用其他方法,在此不做限定。其余阈值T0,T2,T3,T4,T5和T6也可以使用类似方法。
方法二:编码端采用与解码端约定的固定阈值,无需写入码流,无需传输到解码端。例如,以阈值T1为例,直接取T1取值范围内任一值作为T1的取值。其余阈值T0,T2,T3,T4,T5和T6也可以使用类似方法。
方法三:构建阈值候选列表,将在T1取值范围内最有可能的取值放入阈值候选列表中,每个阈值对应一个阈值索引号,确定一个最优的阈值,将最优阈值作为T1的值,并将最优阈值的索引号作为T1的阈值索引号,将T1的阈值索引号写入码流。具体地,将所述阈值写入码流,可将其保存在序列头、图像头、Slice/条带或SEI中传送到解码端,还可以使用其他方法,在此不做限定。其余阈值T0,T2,T3,T4,T5和T6也可以使用类似方法。
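阈值获取方法三(阈值候选列表加索引号)的流程可以用如下示意性片段表示(假设性示例:候选取值、build_candidate_list与select_threshold_index等命名均为示例,并非本申请的限定实现):

```python
def build_candidate_list():
    # 编解码端以相同方式构建阈值候选列表,
    # 每个候选阈值对应一个索引号(列表取值仅为示意)
    return [0.95, 0.96, 0.97, 0.98, 0.99]

def select_threshold_index(candidates, best):
    # 编码端确定最优阈值后,仅将该阈值的索引号写入码流,
    # 解码端按同一列表用索引号还原阈值
    return candidates.index(best)
```

这样码流中只需传输一个小整数索引号,而不是阈值本身。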
步骤1505:编码器发送或存储压缩码流。
解码端:
步骤1511:获取待解码图像特征图的码流
步骤1512:获取各特征元素的概率估计结果
本步骤具体由图11A中的概率估计单元302来实现,具体可以参照上述对概率估计40的描述。对边信息ẑ的码流执行熵解码得到边信息ẑ,结合边信息ẑ对待解码特征图ŷ中的每个特征元素ŷ[x][y][i]进行概率估计,得到当前待解码特征元素ŷ[x][y][i]的概率估计结果。
需要说明的是,解码端使用的概率估计方法与本实施例编码端的概率估计方法对应相同,概率估计网络结构图与本实施例的编码端概率估计网络结构相同,在此不做赘述。
步骤1513:本步骤具体由图11A中的解码决策实现304来实现,具体可以参照上述对解码决策实现30的描述。根据概率估计结果判断当前待解码特征元素ŷ[x][y][i]是否需要执行熵解码,并根据判断结果执行或者不执行熵解码,得到解码后的特征图ŷ。
根据概率估计结果判断当前待解码特征元素ŷ[x][y][i]是否需要执行熵解码可以使用以下方法中的一项或者多项。
方法一:当所述概率分布模型为高斯分布时,根据所述概率估计结果,得到当前待解码特征元素ŷ[x][y][i]的均值参数μ和方差σ的值。当均值μ与k的差的绝对值小于第二阈值T1且方差σ小于第三阈值T2时(不满足预设条件),将当前待解码特征元素ŷ[x][y][i]的数值设置为k,跳过对当前待解码特征元素ŷ[x][y][i]执行熵解码过程;否则,当均值μ与k的差的绝对值大于或等于第二阈值T1或方差σ大于或等于第三阈值T2时(预设条件),对当前待解码特征元素ŷ[x][y][i]执行熵解码,得到当前待解码特征元素ŷ[x][y][i]的值。
特别地,k取值为0时为最优值,可以直接判断当高斯分布的均值参数μ绝对值小于T1且高斯分布的方差σ小于T2时,将当前待解码特征元素ŷ[x][y][i]的数值设置为k,则跳过对当前待解码特征元素ŷ[x][y][i]执行熵解码过程,否则,对当前待解码特征元素ŷ[x][y][i]执行熵解码,得到当前待解码特征元素ŷ[x][y][i]的值。
方法二:当所述概率分布为高斯分布时,根据所述概率估计结果,得到当前待解码特征元素ŷ[x][y][i]的均值参数μ和方差σ的值。当均值μ、方差σ与k的关系满足abs(μ-k)+σ<T3时(不满足预设条件),T3为第四阈值,将当前待解码特征元素ŷ[x][y][i]的数值设置为k,跳过对当前待解码特征元素ŷ[x][y][i]执行熵解码过程;否则,当当前待解码特征元素的概率估计结果满足abs(μ-k)+σ≥T3时(预设条件),对当前待解码特征元素ŷ[x][y][i]执行熵解码,得到当前待解码特征元素ŷ[x][y][i]的值。
方法三:当所述概率分布为拉普拉斯分布时,根据所述概率估计结果,得到位置参数μ、尺度参数b的值。当位置参数μ、尺度参数b与k的关系满足abs(μ-k)+b<T4时(不满足预设条件),T4为第四阈值,将当前待解码特征元素ŷ[x][y][i]的数值设置为k,跳过对当前待解码特征元素ŷ[x][y][i]执行熵解码过程;否则,当当前待解码特征元素的概率估计结果满足abs(μ-k)+b≥T4(预设条件),对当前待解码特征元素ŷ[x][y][i]执行熵解码,得到当前待解码特征元素ŷ[x][y][i]的值。
方法四:当所述概率分布为拉普拉斯分布时,根据所述概率估计结果,得到位置参数μ、尺度参数b的值。当位置参数μ与k的差的绝对值小于第二阈值T5且尺度参数b小于第三阈值T6时(不满足预设条件),将当前待解码特征元素ŷ[x][y][i]的数值设置为k,跳过对当前待解码特征元素ŷ[x][y][i]执行熵解码过程;否则,当位置参数μ与k的差的绝对值大于或者等于第二阈值T5或尺度参数b大于或者等于第三阈值T6(预设条件)时,对当前待解码特征元素ŷ[x][y][i]执行熵解码,得到当前待解码特征元素ŷ[x][y][i]的值。
特别地,k取值为0时为最优值,可以直接判断当位置参数μ绝对值小于T5且尺度参数b小于T6时,将当前待解码特征元素ŷ[x][y][i]的数值设置为k,跳过对当前待解码特征元素ŷ[x][y][i]执行熵解码过程,否则,对当前待解码特征元素ŷ[x][y][i]执行熵解码,得到当前待解码特征元素ŷ[x][y][i]的值。
方法五:当所述概率分布为混合高斯分布时,根据所述概率估计结果,得到当前待解码特征元素ŷ[x][y][i]的混合高斯分布的所有均值参数μ_i和方差σ_i的值。当混合高斯分布的所有均值与当前待解码特征元素取值k的差的绝对值之和与所述混合高斯分布的任一方差的和小于第五阈值T7时(不满足预设条件),将当前待解码特征元素ŷ[x][y][i]的数值设置为k,跳过对当前待解码特征元素ŷ[x][y][i]执行熵解码过程;否则,当混合高斯分布的所有均值与所述当前待解码特征元素取值k的差的绝对值之和与所述混合高斯分布的任一方差的和大于或者等于第五阈值T7(预设条件),对当前待解码特征元素ŷ[x][y][i]执行熵解码,得到当前待解码特征元素ŷ[x][y][i]的值。
方法六:根据当前待解码特征元素的概率分布,得到当前待解码特征元素取值为k的概率P,即当前待解码特征元素的概率估计结果P,当概率估计结果P不满足预设条件:P大于所述第一阈值T0时,不需要对当前待解码特征元素执行熵解码,将当前待解码特征元素数值设置为k,否则,当当前待解码特征元素满足预设条件:P小于或者等于所述第一阈值T0时,对码流执行熵解码,得到当前待解码特征元素的值。
以上解码端的k值与编码端的k值相对应设置。
其中,获取阈值T0,T1,T2,T3,T4,T5,T6和T7的方法与编码端对应,可使用以下方法之一:
方法一:从码流中获取阈值,具体地,从序列头、图像头、Slice/条带或SEI中获取阈值。
方法二:解码端采用与编码端约定的固定阈值。
方法三:从码流中获取阈值索引号,具体地,从序列头、图像头、Slice/条带或SEI中获取阈值索引号。然后解码端使用与编码端相同的方式构建阈值候选列表,根据阈值索引号在阈值候选列表中得到相应的阈值。
需要说明的是,在实际应用中,为保证平台的一致性,可以对所述阈值T1,T2,T3,T4,T5和T6进行整点化,即进行移位放大为整数。
步骤1514:与步骤1414相同。
图13A示出了本申请实施例三提供的具体实现流程1600,运行步骤如下:
编码端:
步骤1601:与步骤1501相同,本步骤具体由图3C中的编码网络204来实现,具体可以参照上述对编码网络20的描述;
步骤1602:与步骤1502相同,本步骤具体由图3C中的边信息提取214来实现;
步骤1603:对特征图ŷ进行概率估计得到各特征元素的概率估计结果;
本步骤具体可以由图3C中的概率估计210来实现,具体可以参照上述对概率估计40的描述。可以使用概率分布模型来获得概率估计结果。其中,概率分布模型可以为:单高斯模型或者非对称高斯模型或者混合高斯模型或者拉普拉斯分布模型。
当概率分布模型为高斯模型时(单高斯模型或者非对称高斯模型或者混合高斯模型),首先将边信息ẑ或者上下文信息输入概率估计网络,对特征图ŷ中的每个特征元素ŷ[x][y][i]进行概率估计得到模型参数均值参数μ和方差σ的值,即概率估计结果。
当概率分布模型为拉普拉斯分布模型时,首先将边信息ẑ或者上下文信息输入概率估计网络,对特征图ŷ中的每个特征元素ŷ[x][y][i]进行概率估计得到模型参数位置参数μ和尺度参数b的值,即概率估计结果。
进一步地,将所述概率估计结果输入所使用的概率分布模型中,得到概率分布。或者,将边信息ẑ和/或上下文信息输入概率估计网络,对待编码特征图ŷ中的每个特征元素ŷ[x][y][i]进行概率估计得到当前待编码特征元素ŷ[x][y][i]的概率分布。根据所述概率分布,得到当前待编码特征元素ŷ[x][y][i]取值为m的概率P。其中,m为任意的整数,例如0,1,-1,-2,3等。
其中,概率估计网络可以使用基于深度学习网络,例如循环神经网络和卷积神经网络等,在此不做限定。
步骤1604:根据概率估计结果判断是否对所述当前待编码特征元素执行熵编码。根据所述判断结果对所述当前待编码特征元素执行熵编码并写入编码码流或不执行熵编码。仅当判断出需要对所述当前待编码特征元素执行熵编码时,对所述当前待编码特征元素执行熵编码。
本步骤具体由图3C中的生成网络216以及编码决策实现208来实现,具体可以参照上述对生成网络46和编码决策实现26的描述。将所述概率估计结果211输入判断模块,输出与特征图ŷ维度相同的决策信息217。本实施例中决策信息217可以为三维的决策图map。其中,判断模块可以使用网络的方法来实现,即将所述概率估计结果或者概率分布输入图7所示的生成网络,网络输出决策图map。决策图map[x][y][i]为预设值表示对应位置的当前待编码特征元素ŷ[x][y][i]需要熵编码,根据概率分布对当前待编码特征元素执行熵编码;决策图map[x][y][i]不为预设值表示对应位置的当前待编码特征元素ŷ[x][y][i]高概率取值为k,不需要熵编码,即跳过熵编码的过程。其中,决策信息是与特征图ŷ维度相同的决策图map,决策图map[x][y][i]表示决策图map中坐标位置为(x,y,i)处的值。当生成网络输出的决策图map中所述当前待编码特征元素ŷ[x][y][i]只有两种可选值时,预设值为特定某一数值,例如当前待编码特征元素可选数值为0和1时,预设值为0或1;当生成网络输出的决策图map中当前待编码特征元素ŷ[x][y][i]有多种可选值时,预设值为一些特定数值,例如当前待编码特征元素ŷ[x][y][i]可选数值为0~255时,预设值为0~255的真子集。
在一种可能实现的方式中,将所述当前待编码特征元素的概率估计结果或者概率分布输入判断模块,判断模块直接输出当前待编码特征元素是否需要执行熵编码的决策信息。例如,判断模块输出的决策信息为预设值时,表示当前待编码特征元素需要执行熵编码,判断模块输出的决策信息不为预设值时,表示当前待编码特征元素不需要执行熵编码。判断模块可以通过网络的方法来实现,即将所述概率估计结果或者概率分布输入图7所示的生成网络,网络输出决策信息,即预设值。
方法一:决策信息是与特征图ŷ维度相同的决策图map。决策图map[x][y][i]为预设值表示对应位置的当前待编码特征元素ŷ[x][y][i]需要熵编码,根据概率分布对当前待编码特征元素执行熵编码;决策图map[x][y][i]不为预设值表示对应位置的当前待编码特征元素ŷ[x][y][i]高概率取值为k,不需要熵编码,即跳过熵编码的过程。当决策图map中所述特征元素ŷ[x][y][i]只有两种可选值时,预设值为特定某一数值,例如特征元素可选数值为0和1时,预设值为0或1;当决策图map中特征元素ŷ[x][y][i]有多种可选值时,预设值为一些特定数值,例如特征元素ŷ[x][y][i]可选数值为0~255时,预设值为0~255的真子集。
方法二:决策信息是与特征图ŷ维度相同的决策图map。决策图map[x][y][i]大于或等于阈值T0表示对应位置的当前待编码特征元素ŷ[x][y][i]需要熵编码,根据概率分布对当前待编码特征元素执行熵编码;决策图map[x][y][i]小于阈值T0表示对应位置的当前待编码特征元素ŷ[x][y][i]高概率取值为k,不需要熵编码,即跳过熵编码的过程。根据决策图的数值范围,T0可为数值范围内的均值。
方法三:决策信息还可以是所述生成网络直接输出的标识或者标识的值。当决策信息为预设值时,表示当前待编码特征元素需要执行熵编码;当判断模块输出的决策信息不为预设值时,表示当前待编码特征元素不需要执行熵编码。例如标识或者标识的值的可选数值为0和1时,则相应地预设值为0或1;当标识或者标识的值有多种可选值时,预设值为一些特定数值,例如标识或者标识的值的可选数值为0~255时,预设值为0~255的真子集。
其中高概率是指:当前待编码特征元素ŷ[x][y][i]取值为k时的概率很高,大于阈值P,其中P可以是大于0.9的数,例如0.9,0.95或0.98等。
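由决策图驱动的编码选择可以用如下示意性片段表示(假设性示例,encode_with_decision_map等命名均为示例,并非本申请的限定实现):

```python
def encode_with_decision_map(features, decision_map, preset=1):
    # features: {位置: 特征元素取值};decision_map: {位置: 决策图取值}
    # 决策图取值为预设值的位置需要熵编码(此处以收集待编码的
    # (位置, 取值) 列表示意),其余位置跳过熵编码
    coded = []
    for pos, v in features.items():
        if decision_map[pos] == preset:
            coded.append((pos, v))
    return coded
```

实际系统中对返回列表中的每个元素按其概率分布调用熵编码器写入码流。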
步骤1605:编码器发送或存储压缩码流。
对特征图ŷ中的至少一个特征元素执行上述步骤1601到1604,以得到压缩码流,并将压缩码流传输到解码端。
解码端:
步骤1611:获取待解码的压缩码流
步骤1612:对待解码特征图ŷ进行概率估计得到各特征元素的概率估计结果
本步骤具体可以由图13B中的概率估计302来实现,具体可以参照上述对概率估计40的描述。从码流中获取边信息ẑ,使用步骤1603中的方法获取当前待解码特征元素的概率估计结果。
步骤1613:获取决策信息,并根据决策信息判断是否执行熵解码。
本步骤具体可以由图13B中的生成网络310以及解码决策实现304来实现,具体可以参照上述对生成网络46和解码决策实现30的描述。使用与本实施例编码端相同的方法获取决策信息311。决策图map[x][y][i]为预设值表示对应位置的当前待解码特征元素ŷ[x][y][i]需要熵解码,根据概率分布对当前待解码特征元素执行熵解码;决策图map[x][y][i]不为预设值表示对应位置的当前待解码特征元素ŷ[x][y][i]不需要熵解码,即表示对应位置ŷ[x][y][i]为特定数值k。
在一种可能实现的方式中,将所述当前待解码特征元素的概率估计结果或者概率分布输入判断模块,判断模块直接输出当前待解码特征元素是否需要执行熵解码的决策信息。例如,判断模块输出的决策信息为预设值时,表示当前待解码特征元素需要执行熵解码,判断模块输出的决策信息不为预设值时,表示当前待解码特征元素不需要执行熵解码,将当前待解码特征元素的值设置为k。判断模块可以通过网络的方法来实现,即将所述概率估计结果或者概率分布输入图8所示的生成网络,网络输出决策信息,即预设值。此决策信息用于指示是否对所述当前待解码特征元素执行熵解码,所述决策信息可以包括决策图map。
步骤1614:与步骤1414相同。
以上解码端的k值与编码端的k值相对应设置。
图14示出了本申请实施例四的具体实现流程1700,运行步骤如下:
编码端:
步骤1701:与步骤1501相同,本步骤具体可以由图3D中的编码网络204来实现,具体可以参照上述对编码网络20的描述;
步骤1702:与步骤1502相同,本步骤具体由图3D中的边信息提取214来实现;
步骤1703:获取特征图ŷ中每个特征元素的概率估计结果和决策信息;
本步骤具体可以由图3D中的联合网络218来实现,具体可以参照上述对联合网络44的描述。具体的,将边信息ẑ和/或上下文信息输入联合网络,联合网络输出待编码特征图ŷ中的每个特征元素ŷ[x][y][i]的概率分布和/或概率估计结果,及与特征图ŷ维度相同的决策信息。例如,当同时将边信息ẑ和上下文信息输入联合网络,可采用网络结构如图15所示。
需要说明的是,本实施例对联合网络的具体结构不做约束。
需要说明的是,决策信息、概率分布和/或概率估计结果均可以从联合网络的不同层输出。例如:情况1)网络中间层输出决策信息,最后层输出概率分布和/或概率估计结果;情况2)网络中间层输出概率分布和/或概率估计结果,最后层输出决策信息;情况3)网络最后层一起输出决策信息及概率分布和/或概率估计结果。
当概率分布模型为高斯模型时(单高斯模型或者非对称高斯模型或者混合高斯模型),首先将边信息ẑ或者上下文信息输入联合网络,得到模型参数均值参数μ和方差σ的值,即概率估计结果。进一步地,将概率估计结果输入高斯模型中,得到概率分布。
当概率分布模型为拉普拉斯分布模型时,首先将边信息ẑ或者上下文信息输入联合网络,得到模型参数位置参数μ和尺度参数b的值,即概率估计结果。进一步地,将概率估计结果输入拉普拉斯分布模型中,得到概率分布。
或者,将边信息ẑ和/或上下文信息输入联合网络,得到当前待编码特征元素ŷ[x][y][i]的概率分布。根据所述概率分布,得到当前待编码特征元素ŷ[x][y][i]取值为m的概率P,即为概率估计结果。其中,m为任意的整数,例如0,1,-1,-2,3等。
步骤1704:根据决策信息判断是否执行熵编码,并根据判断结果执行熵编码写入压缩码流(编码码流)或者不执行熵编码。仅当判断出需要对所述当前待编码特征元素执行熵编码时,对所述当前待编码特征元素执行熵编码。本步骤具体可以由图3D中的编码决策实现208来实现,具体可以参照上述编码决策实现26的描述。
方法一:决策信息是与特征图ŷ维度相同的决策图map。决策图map[x][y][i]为预设值表示对应位置的当前待编码特征元素ŷ[x][y][i]需要熵编码,根据概率分布对当前待编码特征元素执行熵编码;决策图map[x][y][i]不为预设值表示对应位置的当前待编码特征元素ŷ[x][y][i]高概率取值为k,不需要熵编码,即跳过熵编码的过程。当决策图map中所述当前待编码特征元素ŷ[x][y][i]只有两种可选值时,预设值为特定某一数值,例如当前待编码特征元素可选数值为0和1时,预设值为0或1;当决策图map中当前待编码特征元素ŷ[x][y][i]有多种可选值时,预设值为一些特定数值,例如当前待编码特征元素ŷ[x][y][i]可选数值为0~255时,预设值为0~255的真子集。
方法二:决策信息是与特征图ŷ维度相同的决策图map。决策图map[x][y][i]大于或等于阈值T0表示对应位置的当前待编码特征元素ŷ[x][y][i]需要熵编码,根据概率分布对当前待编码特征元素执行熵编码;决策图map[x][y][i]小于阈值T0表示对应位置的当前待编码特征元素ŷ[x][y][i]高概率取值为k,不需要熵编码,即跳过熵编码的过程。根据决策图map的数值范围,T0可为数值范围内的均值。
方法三:决策信息还可以是所述联合网络直接输出的标识或者标识的值,当决策信息为预设值时,表示当前待编码特征元素需要执行熵编码,判断模块输出的决策信息不为预设值时,表示当前待编码特征元素不需要执行熵编码。当联合网络输出决策图map中所述当前待编码特征元素只有两种可选值时,预设值为特定某一数值,例如所述当前待编码特征元素可选数值为0和1时,预设值为0或1;当联合网络输出决策图map中所述当前待编码特征元素有多种可选值时,预设值为一些特定数值,例如所述当前待编码特征元素可选数值为0~255时,预设值为0~255的真子集。
其中高概率是指:当前待编码特征元素ŷ[x][y][i]取值为k时的概率很高,例如取值为k时的概率大于阈值P,其中P可以是大于0.9的数,例如0.9,0.95或0.98等。
步骤1705:编码器发送或存储压缩码流。
解码端:
步骤1711:获取待解码图像特征图的码流,从码流中获取边信息ẑ
步骤1712:获取特征图ŷ中每个特征元素的概率估计结果和决策信息
本步骤具体可以由图16中的联合网络312来实现,具体可以参照上述对联合网络44的描述。获取特征图ŷ中每个特征元素的概率估计结果和决策信息的方法同步骤1703。
步骤1713:根据决策信息判断是否执行熵解码;根据判断结果执行或者不执行熵解码,本步骤具体可以由图16中的解码决策实现304来实现,具体可以参照上述对解码决策实现30的描述。
方法一:决策信息是决策图map。决策图map[x][y][i]为预设值表示对应位置的当前待解码特征元素ŷ[x][y][i]需要熵解码,根据概率分布对当前待解码特征元素执行熵解码;决策图map[x][y][i]不为预设值表示对应位置的当前待解码特征元素ŷ[x][y][i]不需要熵解码,即表示对应位置ŷ[x][y][i]设定为特定数值k。
方法二:决策信息是与特征图ŷ维度相同的决策图map。决策图map[x][y][i]大于或等于阈值T0表示对应位置的当前待解码特征元素ŷ[x][y][i]需要熵解码;决策图map[x][y][i]小于阈值T0表示对应位置的当前待解码特征元素ŷ[x][y][i]高概率取值为k,不需要熵解码,即表示对应位置ŷ[x][y][i]设定为特定数值k。T0取值与编码端相同。
方法三:决策信息还可以是所述联合网络直接输出的标识或者标识的值,当所述决策信息为预设值时,表示当前待解码特征元素需要执行熵解码,判断模块输出所述决策信息不为预设值时,表示当前待解码特征元素不需要执行熵解码,将当前待解码特征元素的值设置为k。当联合网络输出决策图map中所述当前待解码特征元素只有两种可选值时,预设值为特定某一数值,例如所述当前待解码特征元素可选数值为0和1时,预设值为0或1;当联合网络输出决策图map中所述当前待解码特征元素有多种可选值时,预设值为一些特定数值,例如所述当前待解码特征元素可选数值为0~255时,预设值为0~255的真子集。
步骤1714:与步骤1414相同,本步骤具体可以由上述实施例的解码器9C中的解码网络单元306来实现,具体可以参照上述实施例对解码网络单元306的描述。
以上解码端的k值与编码端的k值相对应设置。
图17示出了本申请实施例五的具体实现流程1800,运行步骤如下:
步骤1801:获取待编码音频数据的特征变量
待编码音频信号可以是时域音频信号;待编码音频信号可以是时域信号经过时频变换后得到的频域信号,例如频域信号可以是时域音频信号经过MDCT变换后的频域信号,时域音频信号经过FFT变换后的频域信号;待编码信号也可以是QMF滤波后的信号;待编码信号还可以是残差信号,例如其他编码的残差信号或者LPC滤波后的残差信号。
获取待编码音频数据的特征变量:可以是根据待编码音频信号提取特征矢量,例如根据待编码音频信号提取梅尔倒谱系数;对提取的特征矢量进行量化,将量化后的特征矢量作为待编码音频数据的特征变量。
获取待编码音频数据的特征变量:还可以利用现有的神经网络来实现,例如将待编码音频信号经过编码神经网络处理获得潜在变量,对神经网络输出的潜在变量进行量化,将量化后的潜在变量作为待编码音频数据的特征变量。编码神经网络处理是预先训练好的,本发明对编码神经网络的具体网络结构和训练方法不做限定。例如编码神经网络可以选择全连接网络或者CNN网络。本发明对编码神经网络包含的层数和每一层的节点数也不做限定。
不同结构的编码神经网络输出的潜在变量的形式可能不同。例如,编码神经网络是全连接网络,输出的潜在变量为一个矢量,矢量的维数M是潜在变量的大小(latent size),例如y=[y(0),y(1),…,y(M-1)]。编码神经网络是CNN网络,输出的潜在变量为一个N*M维矩阵,其中N为CNN网络的通道数(channel),M为CNN网络的每个通道潜在变量的大小(latent size),如y(j,i),其中0≤j<N,0≤i<M。
对神经网络输出的潜在变量进行量化的具体方法可以是对潜在变量的每个元素进行标量量化,标量量化的量化步长可以根据不同的编码速率来确定。标量量化还可以存在偏置量,例如待量化的潜在变量经过偏置处理后再按照确定好的量化步长进行标量量化。对潜在变量进行量化的量化方法还可以使用其他的现有量化技术实现,本发明不做限定。
量化后的特征矢量或者量化后的潜在变量均可记作ŷ,即待编码音频数据的特征变量。
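上文所述带偏置的标量量化可以用如下示意性片段表示(假设性示例,quantize/dequantize等命名与具体步长取值均为示例,并非本申请的限定实现):

```python
def quantize(latent, step, offset=0.0):
    # 对潜在变量逐元素做带偏置的标量量化:
    # 先减去偏置量,再按量化步长取整
    return [round((v - offset) / step) for v in latent]

def dequantize(q, step, offset=0.0):
    # 解码端按相同的步长和偏置量重建量化前的近似值
    return [i * step + offset for i in q]
```

量化步长step可以根据不同的编码速率来确定,与正文描述一致。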
步骤1802:待编码音频数据的特征变量ŷ输入边信息提取模块,输出边信息ẑ
其中,边信息提取模块可以使用图12所示的网络来实现,边信息ẑ可以理解为对特征变量ŷ进行进一步提取得到的特征变量,ẑ所包含的特征元素的个数比特征变量ŷ少。
需要说明的是,可以在本步骤中,对边信息ẑ进行熵编码并写入码流,也可以在后续的步骤1804中对边信息ẑ进行熵编码并写入码流,在此不做限定。
步骤1803:对特征变量ŷ进行概率估计得到各特征元素的概率估计结果。
可以使用概率分布模型来获得概率估计结果及概率分布。其中,概率分布模型可以为:单高斯模型(Gaussian single model,GSM)或者非对称高斯模型或者混合高斯模型(Gaussian mixture model,GMM)或者拉普拉斯分布模型(Laplace distribution)。
下面以特征变量ŷ为N*M维矩阵为例进行说明。当前待编码特征变量ŷ中的特征元素记作ŷ[j][i]。
当概率分布模型为高斯模型时(单高斯模型或者非对称高斯模型或者混合高斯模型),首先将边信息ẑ或者上下文信息输入概率估计网络,对特征变量ŷ中的每个特征元素ŷ[j][i]进行概率估计得到均值参数μ和方差σ的值。进一步地,将所述均值参数μ和方差σ输入所使用的概率分布模型中,得到概率分布。此时概率估计结果为均值参数μ和方差σ。
也可以只估计方差,例如当概率分布模型为高斯模型时(单高斯模型或者非对称高斯模型或者混合高斯模型),首先将边信息ẑ或者上下文信息输入概率估计网络,对特征变量ŷ中的每个特征元素ŷ[j][i]进行概率估计得到方差σ的值。进一步地,将所述方差σ输入所使用的概率分布模型中,得到概率分布。此时概率估计结果为方差σ。
当概率分布模型为拉普拉斯分布模型时,首先将边信息ẑ或者上下文信息输入概率估计网络,对特征变量ŷ中的每个特征元素ŷ[j][i]进行概率估计得到位置参数μ和尺度参数b的值。进一步地,将所述位置参数μ和尺度参数b输入所使用的概率分布模型中,得到概率分布。此时概率估计结果为位置参数μ和尺度参数b。
还可以将边信息ẑ和/或上下文信息输入概率估计网络,对待编码特征变量ŷ中的每个特征元素ŷ[j][i]进行概率估计得到当前待编码特征元素ŷ[j][i]的概率分布。根据所述概率分布,得到当前待编码特征元素ŷ[j][i]取值为m的概率P。此时概率估计结果为当前待编码特征元素ŷ[j][i]取值为m的概率P。
其中,概率估计网络可以使用基于深度学习的网络,例如循环神经网络(Recurrent Neural Network,RNN)和像素卷积神经网络(PixelCNN)等,在此不做限定。
步骤1804:根据概率估计结果判断当前待编码特征元素是否需要执行熵编码,并根据判断结果执行熵编码写入压缩码流(编码码流)或者不执行熵编码。
根据概率估计结果判断当前待编码特征元素ŷ[j][i]是否需要执行熵编码可以使用以下方法中的一项或者多项。其中,参数j,i为正整数,坐标(j,i)表示当前待编码特征元素的位置。或者,根据概率估计结果判断当前待编码特征元素ŷ[i]是否需要执行熵编码可以使用以下方法中的一项或者多项。其中,参数i为正整数,坐标i表示当前待编码特征元素的位置。
下面以根据概率估计结果判断当前待编码特征元素ŷ[j][i]是否需要执行熵编码为例进行说明,判断当前待编码特征元素ŷ[i]是否需要执行熵编码的方法类似,这里不再赘述。
方法一:当所述概率分布模型为高斯分布时,根据所述第一特征元素的概率估计结果判断是否对所述当前待编码特征元素执行熵编码。当当前待编码特征元素的高斯分布的均值参数μ和方差σ的值满足第二条件,即均值μ与k的差的绝对值小于第二阈值T1且方差σ小于第三阈值T2时,不需要对当前待编码特征元素ŷ[j][i]执行熵编码过程;否则,当满足第一条件,即均值μ与k的差的绝对值大于或者等于第二阈值T1或方差σ大于或者等于第三阈值T2时,对当前待编码特征元素ŷ[j][i]进行熵编码写入码流。其中,k为任意的整数,例如0,1,-1,2,3等。T2取值为满足0<T2<1中的任一数,例如取值为0.2,0.3,0.4等。T1是大于或等于0小于1的数,例如0.01,0.02,0.001,0.002。
特别地,k取值为0时为最优值,可以直接判断当高斯分布的均值参数μ绝对值小于T1且高斯分布的方差σ小于T2时,则跳过对当前待编码特征元素ŷ[j][i]进行熵编码过程,否则,对当前待编码特征元素ŷ[j][i]进行熵编码写入码流。其中,T2的取值为满足0<T2<1中的任一数,例如取值为0.2,0.3,0.4等。T1是大于或等于0小于1的数,例如0.01,0.02,0.001,0.002。
方法二:当所述概率分布为高斯分布时,根据所述概率估计结果,得到当前待编码特征元素ŷ[j][i]的高斯分布的均值参数μ和方差σ的值。当均值μ、方差σ与k的关系满足abs(μ-k)+σ<T3时(第二条件),跳过对当前待编码特征元素ŷ[j][i]进行熵编码过程,其中,abs(μ-k)表示计算均值μ与k的差的绝对值;否则,当当前待编码特征元素的概率估计结果满足abs(μ-k)+σ≥T3时(第一条件),对当前待编码特征元素ŷ[j][i]进行熵编码写入码流。其中,k为任意的整数,例如0,1,-1,-2,3等。其中,第四阈值T3是大于或等于0小于1的数,例如取值为0.2,0.3,0.4等。
当所述概率分布为高斯分布时,如果对特征变量ŷ中的每个特征元素ŷ[j][i]进行概率估计仅得到当前待编码特征元素ŷ[j][i]的高斯分布的方差σ的值,当方差σ满足σ<T3时(第二条件),跳过对当前待编码特征元素ŷ[j][i]进行熵编码过程;否则,当当前待编码特征元素的概率估计结果满足σ≥T3时(第一条件),对当前待编码特征元素ŷ[j][i]进行熵编码写入码流。其中,第四阈值T3是大于或等于0小于1的数,例如取值为0.2,0.3,0.4等。
方法三:当所述概率分布为拉普拉斯分布时,根据所述概率估计结果,得到当前待编码特征元素ŷ[j][i]的拉普拉斯分布的位置参数μ、尺度参数b的值。当位置参数μ、尺度参数b与k的关系满足abs(μ-k)+b<T4(第二条件)时,跳过对当前待编码特征元素ŷ[j][i]进行熵编码过程,其中,abs(μ-k)表示计算位置参数μ与k的差的绝对值;否则,当当前待编码特征元素的概率估计结果满足abs(μ-k)+b≥T4(第一条件),对当前待编码特征元素ŷ[j][i]进行熵编码写入码流。其中,k为任意的整数,例如0,1,-1,-2,3等。第四阈值T4是大于或等于0小于0.5的数,例如取值为0.05,0.09,0.17等。
方法四:当所述概率分布为拉普拉斯分布时,根据所述概率估计结果,得到当前待编码特征元素ŷ[j][i]的拉普拉斯分布的位置参数μ、尺度参数b的值。当位置参数μ与k的差的绝对值小于第二阈值T5且尺度参数b小于第三阈值T6(第二条件)时,跳过对当前待编码特征元素ŷ[j][i]进行熵编码过程;否则,当位置参数μ与k的差的绝对值大于或者等于第二阈值T5或尺度参数b大于或者等于第三阈值T6(第一条件)时,对当前待编码特征元素ŷ[j][i]进行熵编码写入码流。其中,k为任意的整数,例如0,1,-1,-2,3等。T5取值为1e-2,T6取值为满足T6<0.5中的任一数,例如取值为0.05,0.09,0.17等。
特别地,k取值为0时为最优值,可以直接判断当位置参数μ绝对值小于T5且尺度参数b小于T6时,则跳过对当前待编码特征元素ŷ[j][i]进行熵编码过程,否则,对当前待编码特征元素ŷ[j][i]进行熵编码写入码流。其中,阈值T5取值为1e-2,T6的取值为满足T6<0.5中的任一数,例如取值为0.05,0.09,0.17等。
方法五:当所述概率分布为混合高斯分布时,根据所述概率估计结果,得到当前待编码特征元素ŷ[j][i]的混合高斯分布的所有均值参数μ_i和方差σ_i的值。当混合高斯分布的所有均值与k的差的绝对值之和与所述混合高斯分布的任一方差的和小于第五阈值T7(第二条件)时,跳过对当前待编码特征元素ŷ[j][i]进行熵编码过程;否则,当混合高斯分布的所有均值与k的差的绝对值之和与所述混合高斯分布的任一方差的和大于或者等于第五阈值T7(第一条件),对当前待编码特征元素ŷ[j][i]进行熵编码写入码流。其中,k为任意的整数,例如0,1,-1,-2,3等。T7是大于或等于0小于1的数,例如取值为0.2,0.3,0.4等。(可以认为每个特征元素的阈值都相同)
方法六:根据所述概率分布,得到当前待编码特征元素ŷ[j][i]取值为k的概率P。当当前待编码特征元素的概率估计结果P满足第二条件,即P大于(或者等于)第一阈值T0时,跳过对当前待编码特征元素进行熵编码的过程;否则,当当前待编码特征元素的概率估计结果P满足第一条件,即P小于第一阈值T0时,对当前待编码特征元素进行熵编码写入码流。其中,k可为任意整数,例如0,1,-1,2,3等。所述第一阈值T0为满足0<T0<1中的任一数,例如取值为0.99,0.98,0.97,0.95等。(可以认为每个特征元素的阈值都相同)
需要说明的是,在实际应用中,为保证平台的一致性,可以对所述阈值T1,T2,T3,T4,T5和T6进行整点化,即进行移位放大为整数。
需要说明的是,阈值的获取方法还可以使用以下方法之一,在此不做限定:
方法一:以阈值T1为例,取T1取值范围内的任意一个取值作为阈值T1,将阈值T1写入码流。具体地,将所述阈值写入码流,可将其保存在序列头、图像头、Slice/条带或SEI中传送到解码端,还可以使用其他方法,在此不做限定。其余阈值T0,T2,T3,T4,T5和T6也可以使用类似方法。
方法二:编码端采用与解码端约定的固定阈值,无需写入码流,无需传输到解码端。例如,以阈值T1为例,直接取T1取值范围内任一值作为T1的取值。其余阈值T0,T2,T3,T4,T5和T6也可以使用类似方法。
方法三:构建阈值候选列表,将在T1取值范围内最有可能的取值放入阈值候选列表中,每个阈值对应一个阈值索引号,确定一个最优的阈值,将最优阈值作为T1的值,并将最优阈值的索引号作为T1的阈值索引号,将T1的阈值索引号写入码流。具体地,将所述阈值写入码流,可将其保存在序列头、图像头、Slice/条带或SEI中传送到解码端,还可以使用其他方法,在此不做限定。其余阈值T0,T2,T3,T4,T5和T6也可以使用类似方法。
步骤1805:编码器发送或存储压缩码流。
解码端:
步骤1811:获取待解码音频特征变量的码流
步骤1812:获取各特征元素的概率估计结果
对边信息ẑ的码流进行熵解码得到边信息ẑ,结合边信息ẑ对待解码音频特征变量ŷ中的每个特征元素ŷ[j][i]进行概率估计,得到当前待解码特征元素ŷ[j][i]的概率估计结果。其中,参数j,i为正整数,坐标(j,i)表示当前待解码特征元素的位置。或者,对边信息ẑ的码流进行熵解码得到边信息ẑ,结合边信息ẑ对待解码音频特征变量ŷ中的每个特征元素ŷ[i]进行概率估计,得到当前待解码特征元素ŷ[i]的概率估计结果。其中,参数i为正整数,坐标i表示当前待解码特征元素的位置。
需要说明的是,解码端使用的概率估计方法与本实施例编码端的概率估计方法对应相同,概率估计网络结构图与本实施例的编码端概率估计网络结构相同,在此不做赘述。
步骤1813:根据概率估计结果判断当前待解码特征元素是否需要执行熵解码,并根据判断结果执行或者不执行熵解码,得到解码后的特征变量ŷ。
根据概率估计结果判断当前待解码特征元素ŷ[j][i]是否需要执行熵解码可以使用以下方法中的一项或者多项。或者,根据概率估计结果判断当前待解码特征元素ŷ[i]是否需要执行熵解码可以使用以下方法中的一项或者多项。
下面以根据概率估计结果判断当前待解码特征元素ŷ[j][i]是否需要执行熵解码为例进行说明,判断当前待解码特征元素ŷ[i]是否需要执行熵解码的方法类似,这里不再赘述。
方法一:当所述概率分布模型为高斯分布时,根据所述概率估计结果,得到当前待解码特征元素ŷ[j][i]的均值参数μ和方差σ的值。当均值μ与k的差的绝对值小于第二阈值T1且方差σ小于第三阈值T2时(第二条件),将当前待解码特征元素ŷ[j][i]的数值设置为k,跳过对当前待解码特征元素ŷ[j][i]进行熵解码过程;否则,当均值μ与k的差的绝对值大于或等于第二阈值T1或方差σ大于或等于第三阈值T2时(第一条件),对当前待解码特征元素ŷ[j][i]进行熵解码,得到当前待解码特征元素ŷ[j][i]的值。
特别地,k取值为0时为最优值,可以直接判断当高斯分布的均值参数μ绝对值小于T1且高斯分布的方差σ小于T2时,将当前待解码特征元素ŷ[j][i]的数值设置为k,则跳过对当前待解码特征元素ŷ[j][i]进行熵解码过程,否则,对当前待解码特征元素ŷ[j][i]进行熵解码,得到当前待解码特征元素ŷ[j][i]的值。
方法二:当所述概率分布为高斯分布时,根据所述概率估计结果,得到当前待解码特征元素ŷ[j][i]的均值参数μ和方差σ的值。当均值μ、方差σ与k的关系满足abs(μ-k)+σ<T3时(第二条件),T3为第四阈值,将当前待解码特征元素ŷ[j][i]的数值设置为k,跳过对当前待解码特征元素ŷ[j][i]进行熵解码过程;否则,当当前待解码特征元素的概率估计结果满足abs(μ-k)+σ≥T3时(第一条件),对当前待解码特征元素ŷ[j][i]进行熵解码,得到当前待解码特征元素ŷ[j][i]的值。
当所述概率分布为高斯分布时,如果根据所述概率估计结果,仅得到当前待解码特征元素ŷ[j][i]的方差σ的值,当方差σ满足σ<T3时(第二条件),T3为第四阈值,将当前待解码特征元素ŷ[j][i]的数值设置为0,跳过对当前待解码特征元素ŷ[j][i]进行熵解码过程;否则,当当前待解码特征元素的概率估计结果满足σ≥T3时(第一条件),对当前待解码特征元素ŷ[j][i]进行熵解码,得到当前待解码特征元素ŷ[j][i]的值。
方法三:当所述概率分布为拉普拉斯分布时,根据所述概率估计结果,得到位置参数μ、尺度参数b的值。当位置参数μ、尺度参数b与k的关系满足abs(μ-k)+b<T4时(第二条件),T4为第四阈值,将当前待解码特征元素ŷ[j][i]的数值设置为k,跳过对特征元素ŷ[j][i]进行熵解码过程;否则,当当前待解码特征元素的概率估计结果满足abs(μ-k)+b≥T4(第一条件),对特征元素ŷ[j][i]进行熵解码,得到特征元素ŷ[j][i]的值。
方法四:当所述概率分布为拉普拉斯分布时,根据所述概率估计结果,得到位置参数μ、尺度参数b的值。当位置参数μ与k的差的绝对值小于第二阈值T5且尺度参数b小于第三阈值T6时(第二条件),将当前待解码特征元素ŷ[j][i]的数值设置为k,跳过对当前待解码特征元素ŷ[j][i]进行熵解码过程;否则,当位置参数μ与k的差的绝对值大于或者等于第二阈值T5或尺度参数b大于或者等于第三阈值T6(第一条件)时,对当前待解码特征元素ŷ[j][i]进行熵解码,得到当前待解码特征元素ŷ[j][i]的值。
特别地,k取值为0时为最优值,可以直接判断当位置参数μ绝对值小于T5且尺度参数b小于T6时,将当前待解码特征元素ŷ[j][i]的数值设置为k,跳过对当前待解码特征元素ŷ[j][i]进行熵解码过程,否则,对当前待解码特征元素ŷ[j][i]进行熵解码,得到当前待解码特征元素ŷ[j][i]的值。
方法五:当所述概率分布为混合高斯分布时,根据所述概率估计结果,得到当前待解码特征元素ŷ[j][i]的混合高斯分布的所有均值参数μ_i和方差σ_i的值。当混合高斯分布的所有均值与k的差的绝对值之和与所述混合高斯分布的任一方差的和小于第五阈值T7时(第二条件),将当前待解码特征元素ŷ[j][i]的数值设置为k,跳过对当前待解码特征元素ŷ[j][i]进行熵解码过程;否则,当混合高斯分布的所有均值与k的差的绝对值之和与所述混合高斯分布的任一方差的和大于或者等于第五阈值T7(第一条件),对当前待解码特征元素ŷ[j][i]进行熵解码,得到当前待解码特征元素ŷ[j][i]的值。
方法六:根据当前待解码特征元素的概率分布,得到当前待解码特征元素取值为k的概率P,即当前待解码特征元素的概率估计结果P,当概率估计结果P满足第二条件:P大于所述第一阈值T0时,不需要对第一特征元素执行熵解码,将当前待解码特征元素数值设置为k,否则,当当前待解码特征元素满足第一条件:P小于或者等于所述第一阈值T0时,对码流进行熵解码,得到第一特征元素的值。
以上解码端的k值与编码端的k值相对应设置。
其中,获取阈值T0,T1,T2,T3,T4,T5,T6和T7的方法与编码端对应,可使用以下方法之一:
方法一:从码流中获取阈值,具体地,从序列头、图像头、Slice/条带或SEI中获取阈值。
方法二:解码端采用与编码端约定的固定阈值。
方法三:从码流中获取阈值索引号,具体地,从序列头、图像头、Slice/条带或SEI中获取阈值索引号。然后解码端使用与编码端相同的方式构建阈值候选列表,根据阈值索引号在阈值候选列表中得到相应的阈值。
需要说明的是,在实际应用中,为保证平台的一致性,可以对所述阈值T1,T2,T3,T4,T5和T6进行整点化,即进行移位放大为整数。
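阈值整点化的"移位放大"可示意如下。以下为示意性草图,并非规范实现;shift 位数为假设取值,整点化后各平台以整数比较,避免浮点差异导致的不一致:

```python
def quantize_threshold(threshold, shift=8):
    # 移位放大为整数:T_int = round(T * 2^shift)
    return int(round(threshold * (1 << shift)))
```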
步骤1814:对解码后的特征变量ŷ进行重建,或者输入面向机器听觉任务模块执行相应的机器任务。本步骤具体可以由图10B中的解码网络306来实现,具体可以参照上述对解码网络34的描述。
情况一:将熵解码后的特征变量ŷ输入音频重建模块,神经网络输出重建音频。所述神经网络可以采用任一结构,例如全连接网络、卷积神经网络、循环神经网络等。所述神经网络可以采用多层深度神经网络结构来达到更好的估计效果。
情况二:将熵解码后的特征变量ŷ输入面向机器听觉任务模块执行相应的机器任务,例如完成音频分类、识别等机器听觉任务。
图18为本申请编码装置的一个示例性的结构示意图,如图18所示,本示例的装置可以对应于编码器20A。该装置可以包括:获得模块2001和编码模块2002。获得模块2001可以包括前述实施例中的编码网络204、取整206(可选)、概率估计210、边信息提取214、生成网络216(可选)及联合网络218(可选)。编码模块2002包括前述实施例中的编码决策实现208。其中,
获得模块2001,用于获取待编码特征数据,所述待编码特征数据包括多个特征元素,所述多个特征元素包括第一特征元素,以及用于获取所述第一特征元素的概率估计结果;编码模块2002,用于根据所述第一特征元素的概率估计结果,判断是否对所述第一特征元素执行熵编码;仅当判断出需要对所述第一特征元素执行熵编码时,对所述第一特征元素执行熵编码。
在一种可能的实现方式中,所述判断是否对所述特征数据的第一特征元素执行熵编码包括:当所述特征数据的第一特征元素的概率估计结果满足预设条件,需要对所述特征数据的第一特征元素熵编码;当所述特征数据的第一特征元素的概率估计结果不满足预设条件,不需要对所述特征数据的第一特征元素熵编码。
在一种可能的实现方式中,所述编码模块,还用于根据所述特征数据的概率估计结果判断:所述特征数据的概率估计结果输入生成网络,网络输出决策信息。当所述第一特征元素的决策信息的取值为1时,需要对所述特征数据的第一特征元素编码;当所述第一特征元素的决策信息的取值不为1时,不需要对所述特征数据的第一特征元素编码。
在一种可能的实现方式中,所述预设条件为第一特征元素取值为k的概率值小于或等于第一阈值,其中k为整数。
在一种可能的实现方式中,所述预设条件为所述第一特征元素的概率分布的均值与第一特征元素取值k的差的绝对值大于或等于第二阈值或所述第一特征元素的方差大于或等于第三阈值,其中k为整数。
在另一种可能的实现方式中,所述预设条件为所述第一特征元素的概率分布的均值与第一特征元素取值k的差的绝对值与所述第一特征元素的概率分布的方差的和大于或等于第四阈值,其中k为整数。
在一种可能的实现方式中,所述第一特征元素取值为k的概率值为所述第一特征元素的所有可能的取值的概率值中的最大概率值。
在一种可能的实现方式中,对所述特征数据进行概率估计以得到所述特征数据中各特征元素的概率估计结果,其中所述第一特征元素的概率估计结果包括所述第一特征元素的概率值,和/或所述概率分布的第一参数和所述概率分布的第二参数。
在一种可能的实现方式中,将所述特征数据的概率估计结果输入生成网络以得到所述第一特征元素的决策信息。根据所述第一特征元素的决策信息,判断是否对所述第一特征元素执行熵编码。
在一种可能的实现方式中,当所述特征数据的决策信息为决策图时,则所述决策图中对应所述第一特征元素所在位置的值为预设值时,判断需要对所述第一特征元素执行熵编码;当所述决策图中对应所述第一特征元素所在位置的值不为预设值时,判断不需要对所述第一特征元素执行熵编码。
在一种可能的实现方式中,当所述特征数据的决策信息为预设值时,判断需要对所述第一特征元素执行熵编码;当所述决策信息不为预设值时,判断不需要对所述第一特征元素执行熵编码。

在一种可能的实现方式中,所述编码模块,还用于构建第一阈值的阈值候选列表,将所述第一阈值放入所述第一阈值的阈值候选列表中且对应有所述第一阈值的索引号,将所述第一阈值的索引号写入编码码流,其中所述第一阈值的阈值候选列表的长度可以设置为T;T为大于或等于1的整数。
本实施例的装置,可以用于图3A-3D所示方法实施例中由编码器实施的技术方案,其实现原理和技术效果类似,此处不再赘述。
图19为本申请解码装置的一个示例性的结构示意图,如图19所示,本示例的装置可以对应于解码器30。该装置可以包括:获得模块2101和解码模块2102。获得模块2101可以包括前述实施例中的概率估计302、生成网络310(可选)及联合网络312。解码模块2102包括前述实施例中的解码决策实现304和解码网络306。其中,
获得模块2101,用于获取待解码特征数据的码流,所述待解码特征数据包括多个特征元素,所述多个特征元素包括第一特征元素;获取所述第一特征元素的概率估计结果;解码模块2102,用于根据所述第一特征元素的概率估计结果,判断是否对所述第一特征元素执行熵解码;仅当判断出需要对所述第一特征元素执行熵解码时,对所述第一特征元素执行熵解码。
在一种可能的实现方式中,所述判断是否对所述特征数据的第一特征元素熵解码包括:当所述特征数据的第一特征元素的概率估计结果满足预设条件,需要对所述特征数据的第一特征元素解码;或当所述特征数据的第一特征元素的概率估计结果不满足预设条件,不需要对所述特征数据的第一特征元素解码,将第一特征元素的特征值设置为k;其中k为整数。
在一种可能的实现方式中,所述解码模块,还用于根据所述特征数据的概率估计结果判断:所述特征数据的概率估计结果输入判断网络模块,网络输出决策信息。当所述决策信息中对应所述特征数据的第一特征元素位置的取值为1时,对所述特征数据的第一特征元素解码;当所述决策信息中对应所述特征数据的第一特征元素位置的取值不为1时,不对所述特征数据的第一特征元素解码,将第一特征元素的特征值设置为k,其中k为整数。
在一种可能的实现方式中,所述预设条件为第一特征元素取值为k的概率值小于等于第一阈值,其中k为整数。
在另一种可能的实现方式中,所述预设条件为所述第一特征元素的概率分布的均值与所述第一特征元素取值k的差的绝对值大于或等于第二阈值或所述第一特征元素的概率分布的方差大于或等于第三阈值。
在另一种可能的实现方式中,所述预设条件为所述第一特征元素的概率分布的均值与所述第一特征元素取值k的差的绝对值与所述第一特征元素的概率分布的方差的和大于或等于第四阈值。
在一种可能的实现方式中,对所述特征数据进行概率估计以得到所述特征数据中各特征元素的概率估计结果,其中所述第一特征元素的概率估计结果包括所述第一特征元素的概率值,和/或所述概率分布的第一参数和所述概率分布的第二参数。
在一种可能的实现方式中,所述第一特征元素取值为k的概率值为所述第一特征元素的所有可能的取值的概率值中的最大概率值。
在一种可能的实现方式中,所述第一特征元素的概率估计结果包括至少以下一项:所述第一特征元素的概率值,概率分布的第一参数和概率分布的第二参数以及决策信息。当所述决策信息中对应所述特征数据的第一特征元素位置的取值为1时,对所述特征数据的第一特征元素解码;当所述决策信息中对应所述特征数据的第一特征元素位置的取值不为1时,不对所述特征数据的第一特征元素解码,将第一特征元素的特征值设置为k,其中k为整数。
在一种可能的实现方式中,将所述特征数据的概率估计结果输入生成网络以得到所述第一特征元素的决策信息;当所述第一特征元素的决策信息的值为预设值时,判断需要对所述第一特征元素执行熵解码;当所述第一特征元素的决策信息的值不为预设值时,判断不需要对所述第一特征元素执行熵解码,将第一特征元素的特征值设置为k,其中k为整数且k为所述第一特征元素的多个候选取值中的一个。
在一种可能的实现方式中,所述获得模块,还用于构建第一阈值的阈值候选列表,通过对所述码流进行解码以得到所述第一阈值的阈值候选列表的索引号,将所述第一阈值的索引号所对应所述第一阈值的阈值候选列表位置的值作为所述第一阈值的值,其中所述第一阈值的阈值候选列表的长度可以设置为T;T为大于或等于1的整数。
本实施例的装置,可以用于图10B,13B,16所示方法实施例中由解码器实施的技术方案,其实现原理和技术效果类似,此处不再赘述。
本领域技术人员能够领会,结合本文公开描述的各种说明性逻辑框、模块和算法步骤所描述的功能可以硬件、软件、固件或其任何组合来实施。如果以软件来实施,那么各种说明性逻辑框、模块、和步骤描述的功能可作为一或多个指令或代码在计算机可读媒体上存储或传输,且由基于硬件的处理单元执行。计算机可读媒体可包含计算机可读存储媒体,其对应于有形媒体,例如数据存储媒体,或包括任何促进将计算机程序从一处传送到另一处的媒体(例如,根据通信协议)的通信媒体。以此方式,计算机可读媒体大体上可对应于(1)非暂时性的有形计算机可读存储媒体,或(2)通信媒体,例如信号或载波。数据存储媒体可为可由一或多个计算机或一或多个处理器存取以检索用于实施本申请中描述的技术的指令、代码和/或数据结构的任何可用媒体。计算机程序产品可包含计算机可读媒体。
作为实例而非限制,此类计算机可读存储媒体可包括RAM、ROM、EEPROM、CD-ROM或其它光盘存储装置、磁盘存储装置或其它磁性存储装置、快闪存储器或可用来存储指令或数据结构的形式的所要程序代码并且可由计算机存取的任何其它媒体。并且,任何连接被恰当地称作计算机可读媒体。举例来说,如果使用同轴缆线、光纤缆线、双绞线、数字订户线(DSL)或例如红外线、无线电和微波等无线技术从网站、服务器或其它远程源传输指令,那么同轴缆线、光纤缆线、双绞线、DSL或例如红外线、无线电和微波等无线技术包含在媒体的定义中。但是,应理解,所述计算机可读存储媒体和数据存储媒体并不包括连接、载波、信号或其它暂时媒体,而是实际上针对于非暂时性有形存储媒体。如本文中所使用,磁盘和光盘包含压缩光盘(CD)、激光光盘、光学光盘、数字多功能光盘(DVD)和蓝光光盘,其中磁盘通常以磁性方式再现数据,而光盘利用激光以光学方式再现数据。以上各项的组合也应包含在计算机可读媒体的范围内。
可通过例如一或多个数字信号处理器(DSP)、通用微处理器、专用集成电路(ASIC)、现场可编程逻辑阵列(FPGA)或其它等效集成或离散逻辑电路等一或多个处理器来执行指令。因此,如本文中所使用的术语“处理器”可指前述结构或适合于实施本文中所描述的技术的任一其它结构中的任一者。另外,在一些方面中,本文中所描述的各种说明性逻辑框、模块、和步骤所描述的功能可以提供于经配置以用于编码和解码的专用硬件和/或软件模块内,或者并入在组合编解码器中。而且,所述技术可完全实施于一或多个电路或逻辑元件中。
本申请的技术可在各种各样的装置或设备中实施,包含无线手持机、集成电路(IC)或一组IC(例如,芯片组)。本申请中描述各种组件、模块或单元是为了强调用于执行所揭示的技术的装置的功能方面,但未必需要由不同硬件单元实现。实际上,如上文所描述,各种单元可结合合适的软件和/或固件组合在编码解码器硬件单元中,或者通过互操作硬件单元(包含如上文所描述的一或多个处理器)来提供。
以上所述,仅为本申请示例性的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到的变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应该以权利要求的保护范围为准。

Claims (70)

  1. 一种特征数据的编码方法,其特征在于,包括:
    获取待编码特征数据,所述待编码特征数据包括多个特征元素,所述多个特征元素包括第一特征元素;
    获取所述第一特征元素的概率估计结果;
    根据所述第一特征元素的概率估计结果,判断是否对所述第一特征元素执行熵编码;
    仅当判断出需要对所述第一特征元素执行熵编码时,对所述第一特征元素执行熵编码。
  2. 根据权利要求1所述的方法,其特征在于,所述根据所述第一特征元素的概率估计结果,判断是否对所述第一特征元素执行熵编码包括:
    当所述第一特征元素的概率估计结果满足预设条件时,判断需要对所述第一特征元素执行熵编码;或
    当所述第一特征元素的概率估计结果不满足预设条件时,判断不需要对所述第一特征元素执行熵编码。
  3. 根据权利要求2所述的方法,其特征在于,所述第一特征元素的概率估计结果为所述第一特征元素取值为k的概率值,则所述预设条件为所述第一特征元素取值为k的概率值小于或等于第一阈值,其中k为整数且k为所述第一特征元素的多个候选取值中的一个。
  4. 根据权利要求2所述的方法,其特征在于,所述第一特征元素的概率估计结果包括所述第一特征元素概率分布的第一参数和第二参数,则所述预设条件为:
    所述第一特征元素的概率分布的所述第一参数与所述第一特征元素取值为k的差的绝对值大于或等于第二阈值;或
    所述第一特征元素的概率分布的所述第二参数大于或等于第三阈值;或
    所述第一特征元素的概率分布的所述第一参数与所述第一特征元素取值为k的差的绝对值与所述第一特征元素的概率分布的所述第二参数的和大于或等于第四阈值,其中k为整数且k为所述第一特征元素的多个候选取值中的一个。
  5. 根据权利要求4所述的方法,其特征在于:
    当所述概率分布为高斯分布,所述第一特征元素概率分布的第一参数为所述第一特征元素高斯分布的均值,所述第一特征元素概率分布的第二参数为所述第一特征元素高斯分布的方差;或
    当所述概率分布为拉普拉斯分布,所述第一特征元素概率分布的第一参数为所述第一特征元素拉普拉斯分布的位置参数,所述第一特征元素概率分布的第二参数为所述第一特征元素拉普拉斯分布的尺度参数。
  6. 根据权利要求3所述的方法,其特征在于,所述方法还包括:
    构建阈值候选列表,将所述第一阈值放入所述阈值候选列表中,且将对应有所述第一阈值的索引号写入编码码流,其中所述阈值候选列表的长度为T,T为大于或等于1的整数。
  7. 根据权利要求2所述的方法,其特征在于,所述第一特征元素的概率估计结果通过混合高斯分布获得时,则所述预设条件为:
    所述第一特征元素的混合高斯分布的所有均值与所述第一特征元素的取值为k的差的绝对值之和与所述第一特征元素的混合高斯分布的任一方差的和大于或等于第五阈值;或
    所述第一特征元素的混合高斯分布的任一均值与所述第一特征元素的取值为k的差大于或等于第六阈值;或
    所述第一特征元素的混合高斯分布的任一方差大于或等于第七阈值;
    其中k为整数且k为所述第一特征元素的多个候选取值中的一个。
  8. 根据权利要求2所述的方法,其特征在于,所述第一特征元素的概率估计结果通过非对称高斯分布获得时,则所述预设条件为:
    所述第一特征元素的非对称高斯分布的均值与所述第一特征元素的取值为k的差的绝对值大于或等于第八阈值;或
    所述第一特征元素的非对称高斯分布的第一方差大于或等于第九阈值;或
    所述第一特征元素的非对称高斯分布的第二方差大于或等于第十阈值;
    其中k为整数且k为所述第一特征元素的多个候选取值中的一个。
  9. 根据权利要求3-8任一所述的方法,其特征在于:
    所述第一特征元素取值为k的概率值为所述第一特征元素的所有候选取值的概率值中的最大概率值。
  10. 根据权利要求1所述的方法,其特征在于,所述根据所述第一特征元素的概率估计结果,判断是否对所述第一特征元素执行熵编码包括:
    将所述特征数据的概率估计结果输入生成网络以得到所述第一特征元素的决策信息,
    根据所述第一特征元素的决策信息,判断是否对所述第一特征元素执行熵编码。
  11. 根据权利要求10所述的方法,其特征在于,当所述特征数据的决策信息为决策图时,则所述决策图中对应所述第一特征元素所在位置的值为预设值时,判断需要对所述第一特征元素执行熵编码;
    当所述决策图中对应所述第一特征元素所在位置的值不为预设值时,判断不需要对所述第一特征元素执行熵编码。
  12. 根据权利要求10所述的方法,其特征在于,当所述特征数据的决策信息为预设值时,判断需要对所述第一特征元素执行熵编码;
    当所述决策信息不为预设值时,判断不需要对所述第一特征元素执行熵编码。
  13. 根据权利要求1-12任一所述的方法,其特征在于,所述多个特征元素还包括第二特征元素,当判断出不需要对所述第二特征元素执行熵编码时,跳过对所述第二特征元素执行熵编码。
  14. 根据权利要求1-13任一所述的方法,其特征在于,所述方法还包括:
    将包括所述第一特征元素的多个特征元素的熵编码结果写入编码码流。
  15. 一种特征数据的解码方法,其特征在于,包括:
    获取待解码特征数据的码流;
    所述待解码特征数据包括多个特征元素,所述多个特征元素包括第一特征元素;
    获取所述第一特征元素的概率估计结果;
    根据所述第一特征元素的概率估计结果,判断是否对所述第一特征元素执行熵解码;
    仅当判断出需要对所述第一特征元素执行熵解码时,对所述第一特征元素执行熵解码。
  16. 根据权利要求15所述的方法,其特征在于,所述根据所述第一特征元素的概率估计结果,判断是否对所述第一特征元素执行熵解码包括:
    当所述第一特征元素的概率估计结果满足预设条件时,判断需要对所述特征数据的第一特征元素执行熵解码;或
    当所述第一特征元素的概率估计结果不满足预设条件时,判断不需要对所述特征数据的第一特征元素执行熵解码,将所述第一特征元素的特征值设置为k,其中k为整数且k为所述第一特征元素的多个候选取值中的一个。
  17. 根据权利要求16所述的方法,其特征在于,当所述第一特征元素的概率估计结果为所述第一特征元素取值为k的概率值,则所述预设条件为所述第一特征元素取值为k的概率值小于或等于第一阈值,其中k为整数且k为所述第一特征元素的多个候选取值中的一个。
  18. 根据权利要求16所述的方法,其特征在于,当所述第一特征元素的概率估计结果包括所述第一特征元素概率分布的第一参数和第二参数,则所述预设条件为:
    所述第一特征元素的概率分布的所述第一参数与所述第一特征元素取值为k的差的绝对值大于或等于第二阈值;或
    所述第一特征元素的概率分布的所述第二参数大于或等于第三阈值;或
    所述第一特征元素的概率分布的所述第一参数与所述第一特征元素取值为k的差的绝对值与所述第一特征元素的概率分布的所述第二参数的和大于或等于第四阈值,
    其中k为整数且k为所述第一特征元素的多个候选取值中的一个。
  19. 根据权利要求18所述的方法,其特征在于:
    当所述概率分布为高斯分布,所述第一特征元素概率分布的第一参数为所述第一特征元素高斯分布的均值,所述第一特征元素概率分布的第二参数为所述第一特征元素高斯分布的方差;或
    当所述概率分布为拉普拉斯分布,所述第一特征元素概率分布的第一参数为所述第一特征元素拉普拉斯分布的位置参数,所述第一特征元素概率分布的第二参数为所述第一特征元素拉普拉斯分布的尺度参数。
  20. 根据权利要求16所述的方法,其特征在于,所述第一特征元素的概率估计结果通过混合高斯分布获得时,则所述预设条件为:
    所述第一特征元素的混合高斯分布的所有均值与所述第一特征元素的取值为k的差的绝对值之和与所述第一特征元素的混合高斯分布的任一方差的和大于或等于第五阈值;或
    所述第一特征元素的混合高斯分布的任一均值与所述第一特征元素的取值为k的差大于或等于第六阈值;或
    所述第一特征元素的混合高斯分布的任一方差大于或等于第七阈值;
    其中k为整数且k为所述第一特征元素的多个候选取值中的一个。
  21. 根据权利要求16所述的方法,其特征在于,所述第一特征元素的概率估计结果通过非对称高斯分布获得时,则所述预设条件为:
    所述第一特征元素的非对称高斯分布的均值与所述第一特征元素的取值为k的差的绝对值大于或等于第八阈值;或
    所述第一特征元素的非对称高斯分布的第一方差大于或等于第九阈值;或
    所述第一特征元素的非对称高斯分布的第二方差大于或等于第十阈值;
    其中k为整数且k为所述第一特征元素的多个候选取值中的一个。
  22. 根据权利要求16-21任一所述的方法,其特征在于,所述第一特征元素取值为k的概率值为所述第一特征元素的所有候选取值的概率值中的最大概率值。
  23. 根据权利要求15所述的方法,其特征在于,所述根据所述第一特征元素的概率估计结果,判断是否对所述第一特征元素执行熵解码包括:
    将所述特征数据的概率估计结果输入生成网络以得到所述第一特征元素的决策信息,
    根据所述第一特征元素的决策信息,判断是否对所述第一特征元素执行熵解码。
  24. 根据权利要求23所述的方法,其特征在于,当所述特征数据的决策信息为决策图时,则所述决策图中对应所述第一特征元素所在位置的值为预设值时,判断需要对所述第一特征元素执行熵解码;
    当所述决策图中对应所述第一特征元素所在位置的值不为预设值时,判断不需要对所述第一特征元素执行熵解码。
  25. 根据权利要求23所述的方法,其特征在于,当所述特征数据的决策信息为预设值时,判断需要对所述第一特征元素执行熵解码;
    当所述决策信息不为预设值时,判断不需要对所述第一特征元素执行熵解码。
  26. 根据权利要求15-25任一所述的方法,其特征在于,所述方法还包括:
    所述特征数据经过解码网络以得到所述重建数据或面向机器任务数据。
  27. 一种特征数据编码装置,其特征在于,包括:
    获得模块,用于获取待编码特征数据,所述待编码特征数据包括多个特征元素,所述多个特征元素包括第一特征元素,以及用于获取所述第一特征元素的概率估计结果;
    编码模块,用于根据所述第一特征元素的概率估计结果,判断是否对所述第一特征元素执行熵编码;仅当判断出需要对所述第一特征元素执行熵编码时,对所述第一特征元素执行熵编码。
  28. 根据权利要求27所述的装置,其特征在于,所述根据所述第一特征元素的概率估计结果,判断是否对所述第一特征元素执行熵编码包括:
    当所述第一特征元素的概率估计结果满足预设条件时,判断需要对所述特征数据的第一特征元素执行熵编码;或
    当所述第一特征元素的概率估计结果不满足预设条件时,判断不需要对所述特征数据的第一特征元素执行熵编码。
  29. 根据权利要求28所述的装置,其特征在于,所述第一特征元素的概率估计结果为所述第一特征元素取值为k的概率值,则所述预设条件为所述第一特征元素取值为k的概率值小于或等于第一阈值,其中k为整数且k为所述第一特征元素的多个候选取值中的一个。
  30. 根据权利要求28所述的装置,其特征在于,所述第一特征元素的概率估计结果包括所述第一特征元素概率分布的第一参数和第二参数,则所述预设条件为:
    所述第一特征元素的概率分布的所述第一参数与所述第一特征元素取值为k的差的绝对值大于或等于第二阈值;或
    所述第一特征元素的概率分布的所述第二参数大于或等于第三阈值;或
    所述第一特征元素的概率分布的所述第一参数与所述第一特征元素取值为k的差的绝对值与所述第一特征元素的概率分布的所述第二参数的和大于或等于第四阈值,
    其中k为整数且k为所述第一特征元素的多个候选取值中的一个。
  31. 根据权利要求30所述的装置,其特征在于:
    当所述概率分布为高斯分布,所述第一特征元素概率分布的第一参数为所述第一特征元素高斯分布的均值,所述第一特征元素概率分布的第二参数为所述第一特征元素高斯分布的方差;或
    当所述概率分布为拉普拉斯分布,所述第一特征元素概率分布的第一参数为所述第一特征元素拉普拉斯分布的位置参数,所述第一特征元素概率分布的第二参数为所述第一特征元素拉普拉斯分布的尺度参数。
  32. 根据权利要求29所述的装置,其特征在于:
    所述编码模块,还用于构建阈值候选列表,将所述第一阈值放入所述阈值候选列表中,且将对应有所述第一阈值的索引号写入编码码流,其中所述阈值候选列表的长度为T,T为大于或等于1的整数。
  33. 根据权利要求28所述的装置,其特征在于,所述第一特征元素的概率估计结果通过混合高斯分布获得时,则所述预设条件为:
    所述第一特征元素的混合高斯分布的所有均值与所述第一特征元素的取值为k的差的绝对值之和与所述第一特征元素的混合高斯分布的任一方差的和大于或等于第五阈值;或
    所述第一特征元素的混合高斯分布的任一均值与所述第一特征元素的取值为k的差大于或等于第六阈值;或
    所述第一特征元素的混合高斯分布的任一方差大于或等于第七阈值;
    其中k为整数且k为所述第一特征元素的多个候选取值中的一个。
  34. 根据权利要求28所述的装置,其特征在于,所述第一特征元素的概率估计结果通过非对称高斯分布获得时,则所述预设条件为:
    所述第一特征元素的非对称高斯分布的均值与所述第一特征元素的取值为k的差的绝对值大于或等于第八阈值;或
    所述第一特征元素的非对称高斯分布的第一方差大于或等于第九阈值;或
    所述第一特征元素的非对称高斯分布的第二方差大于或等于第十阈值;
    其中k为整数且k为所述第一特征元素的多个候选取值中的一个。
  35. 根据权利要求29-34任一所述的装置,其特征在于:
    所述第一特征元素取值为k的概率值为所述第一特征元素的所有候选取值的概率值中的最大概率值。
  36. 根据权利要求27所述的装置,其特征在于,所述根据所述第一特征元素的概率估计结果,判断是否对所述第一特征元素执行熵编码包括:
    将所述特征数据的概率估计结果输入生成网络以得到所述第一特征元素的决策信息,
    根据所述第一特征元素的决策信息,判断是否对所述第一特征元素执行熵编码。
  37. 根据权利要求36所述的装置,其特征在于,当所述特征数据的决策信息为决策图时,则所述决策图中对应所述第一特征元素所在位置的值为预设值时,判断需要对所述第一特征元素执行熵编码;
    当所述决策图中对应所述第一特征元素所在位置的值不为预设值时,判断不需要对所述第一特征元素执行熵编码。
  38. 根据权利要求36所述的装置,其特征在于,当所述特征数据的决策信息为预设值时,判断需要对所述第一特征元素执行熵编码;
    当所述决策信息不为预设值时,判断不需要对所述第一特征元素执行熵编码。
  39. 根据权利要求27-38任一所述的装置,其特征在于,所述多个特征元素还包括第二特征元素,当判断出不需要对所述第二特征元素执行熵编码时,跳过对所述第二特征元素执行熵编码。
  40. 根据权利要求27-39任一所述的装置,其特征在于,所述编码模块还包括:
    将包括所述第一特征元素的多个特征元素的熵编码结果写入编码码流。
  41. 一种特征数据解码装置,其特征在于,包括:
    获得模块,用于获取待解码特征数据的码流,所述待解码特征数据包括多个特征元素,所述多个特征元素包括第一特征元素;获取所述第一特征元素的概率估计结果;
    解码模块,用于根据所述第一特征元素的概率估计结果,判断是否对所述第一特征元素执行熵解码;仅当判断出需要对所述第一特征元素执行熵解码时,对所述第一特征元素执行熵解码。
  42. 根据权利要求41所述的装置,其特征在于,所述根据所述第一特征元素的概率估计结果,判断是否对所述第一特征元素执行熵解码包括:
    当所述第一特征元素的概率估计结果满足预设条件时,判断需要对所述特征数据的第一特征元素执行熵解码;或
    当所述第一特征元素的概率估计结果不满足预设条件时,判断不需要对所述特征数据的第一特征元素执行熵解码,将所述第一特征元素的特征值设置为k,其中k为整数且k为所述第一特征元素的多个候选取值中的一个。
  43. 根据权利要求42所述的装置,其特征在于,当所述第一特征元素的概率估计结果为所述第一特征元素取值为k的概率值,则所述预设条件为所述第一特征元素取值为k的概率值小于或等于第一阈值,其中k为整数且k为所述第一特征元素的多个候选取值中的一个。
  44. 根据权利要求42所述的装置,其特征在于,当所述第一特征元素的概率估计结果包括所述第一特征元素概率分布的第一参数和第二参数,则所述预设条件为:
    所述第一特征元素的概率分布的所述第一参数与所述第一特征元素取值为k的差的绝对值大于或等于第二阈值;或
    所述第一特征元素的概率分布的所述第二参数大于或等于第三阈值;或
    所述第一特征元素的概率分布的所述第一参数与所述第一特征元素取值为k的差的绝对值与所述第一特征元素的概率分布的所述第二参数的和大于或等于第四阈值,
    其中k为整数且k为所述第一特征元素的多个候选取值中的一个。
  45. 根据权利要求44所述的装置,其特征在于:
    当所述概率分布为高斯分布,所述第一特征元素概率分布的第一参数为所述第一特征元素高斯分布的均值,所述第一特征元素概率分布的第二参数为所述第一特征元素高斯分布的方差;或
    当所述概率分布为拉普拉斯分布,所述第一特征元素概率分布的第一参数为所述第一特征元素拉普拉斯分布的位置参数,所述第一特征元素概率分布的第二参数为所述第一特征元素拉普拉斯分布的尺度参数。
  46. 根据权利要求42所述的装置,其特征在于,所述第一特征元素的概率估计结果通过混合高斯分布获得时,则所述预设条件为:
    所述第一特征元素的混合高斯分布的所有均值与所述第一特征元素的取值为k的差的绝对值之和与所述第一特征元素的混合高斯分布的任一方差的和大于或等于第五阈值;或
    所述第一特征元素的混合高斯分布的任一均值与所述第一特征元素的取值为k的差大于或等于第六阈值;或
    所述第一特征元素的混合高斯分布的任一方差大于或等于第七阈值;
    其中k为整数且k为所述第一特征元素的多个候选取值中的一个。
  47. 根据权利要求42所述的装置,其特征在于,所述第一特征元素的概率估计结果通过非对称高斯分布获得时,则所述预设条件为:
    所述第一特征元素的非对称高斯分布的均值与所述第一特征元素的取值为k的差的绝对值大于或等于第八阈值;或
    所述第一特征元素的非对称高斯分布的第一方差大于或等于第九阈值;或
    所述第一特征元素的非对称高斯分布的第二方差大于或等于第十阈值;
    其中k为整数且k为所述第一特征元素的多个候选取值中的一个。
  48. 根据权利要求42-47任一所述的装置,其特征在于,所述第一特征元素取值为k的概率值为所述第一特征元素的所有候选取值的概率值中的最大概率值。
  49. 根据权利要求41所述的装置,其特征在于,所述根据所述第一特征元素的概率估计结果,判断是否对所述第一特征元素执行熵解码包括:
    将所述特征数据的概率估计结果输入生成网络以得到所述第一特征元素的决策信息,
    根据所述第一特征元素的决策信息,判断是否对所述第一特征元素执行熵解码。
  50. 根据权利要求49所述的装置,其特征在于,当所述特征数据的决策信息为决策图时,则所述决策图中对应所述第一特征元素所在位置的值为预设值时,需要对所述第一特征元素执行熵解码;
    当所述决策图中对应所述第一特征元素所在位置的值不为预设值时,不需要对所述第一特征元素执行熵解码。
  51. 根据权利要求49所述的装置,其特征在于,当所述特征数据的决策信息为预设值时,判断需要对所述第一特征元素执行熵解码;
    当所述决策信息不为预设值时,判断不需要对所述第一特征元素执行熵解码。
  52. 根据权利要求41-51任一所述的装置,其特征在于:
    所述解码模块,还用于所述特征数据经过解码网络以得到所述重建数据或面向机器任务数据。
  53. 一种特征数据的编码方法,其特征在于,包括:
    获取待编码特征数据,所述特征数据包括多个特征元素,所述多个特征元素包括第一特征元素;
    获取所述特征数据的边信息,对所述特征数据的边信息输入联合网络以得到所述第一特征元素的决策信息;
    根据所述第一特征元素的决策信息,判断是否对所述第一特征元素执行熵编码;
    仅当判断出需要对所述第一特征元素执行熵编码时,对所述第一特征元素执行熵编码。
  54. 根据权利要求53所述的方法,其特征在于,当所述特征数据的决策信息为决策图时,则所述决策图中对应所述第一特征元素所在位置的值为预设值时,判断需要对所述第一特征元素执行熵编码;
    当所述决策图中对应所述第一特征元素所在位置的值不为预设值时,判断不需要对所述第一特征元素执行熵编码。
  55. 根据权利要求53所述的方法,其特征在于,当所述特征数据的决策信息为预设值时,判断需要对所述第一特征元素执行熵编码;
    当所述决策信息不为预设值时,判断不需要对所述第一特征元素执行熵编码。
  56. 根据权利要求53-55任一所述的方法,其特征在于,所述多个特征元素还包括第二特征元素,当判断出不需要对所述第二特征元素执行熵编码时,跳过对所述第二特征元素执行熵编码。
  57. 根据权利要求53-56任一所述的方法,其特征在于,所述方法还包括:
    将包括所述的第一特征元素的多个特征元素的熵编码结果写入编码码流。
  58. 一种特征数据的解码方法,其特征在于,包括:
    获取待解码特征数据的码流和所述待解码特征数据的边信息;
    所述待解码特征数据包括多个特征元素,所述多个特征元素包括第一特征元素;
    对所述待解码特征数据的边信息输入联合网络以得到所述第一特征元素的决策信息;
    根据所述第一特征元素的决策信息,判断是否对所述第一特征元素执行熵解码;
    仅当判断出需要对所述第一特征元素执行熵解码时,对所述第一特征元素执行熵解码。
  59. 根据权利要求58所述的方法,其特征在于,当所述特征数据的决策信息为决策图时,则所述决策图中对应所述第一特征元素所在位置的值为预设值时,判断需要对所述第一特征元素执行熵解码;
    当所述决策图中对应所述第一特征元素所在位置的值不为预设值时,判断不需要对所述第一特征元素执行熵解码,将第一特征元素的特征值设置为k,其中k为整数。
  60. 根据权利要求58所述的方法,其特征在于,当所述特征数据的决策信息为预设值时,判断需要对所述第一特征元素执行熵解码;
    当所述决策信息不为预设值时,判断不需要对所述第一特征元素执行熵解码,将第一特征元素的特征值设置为k,其中k为整数。
  61. 根据权利要求58-60任一所述的方法,其特征在于,所述方法还包括:
    所述特征数据经过解码网络以得到所述重建数据或面向机器任务数据。
  62. 一种编码器,其特征在于,包括处理电路,用于执行权利要求1至14,53至57任一项所述的方法。
  63. 一种解码器,其特征在于,包括处理电路,用于执行权利要求15至26,58至61任一项所述的方法。
  64. 一种计算机程序产品,其特征在于,包括程序代码,当其在计算机或处理器上执行时,用于执行权利要求1至26,53至61任一项所述的方法。
  65. 一种非瞬时性计算机可读存储介质,其特征在于,包括根据权利要求14或57所述的编码方法获得的码流。
  66. 一种编码器,其特征在于,包括:
    一个或多个处理器;
    非瞬时性计算机可读存储介质,耦合到所述处理器并存储由所述处理器执行的程序,其中所述程序在由所述处理器执行时,使得所述编码器执行根据权利要求1至14,53至57任一项所述的方法。
  67. 一种解码器,其特征在于,包括:
    一个或多个处理器;
    非瞬时性计算机可读存储介质,耦合到所述处理器并存储由所述处理器执行的程序,其中所述程序在由所述处理器执行时,使得所述解码器执行根据权利要求15至26,58至61任一项所述的方法。
  68. 一种编码器,其特征在于,包括:
    一个或多个处理器;
    非瞬时性计算机可读存储介质,耦合到所述处理器并存储由所述处理器执行的程序,其中所述程序在由所述处理器执行时,使得所述编码器执行根据权利要求1至14,53至57任一项所述的方法。
  69. 一种图像或音频处理器,其特征在于,包括处理电路,用于执行根据权利要求1至26,53至61任一项所述的方法。
  70. 一种非瞬时性计算机可读存储介质,其特征在于,包括程序代码,当其由计算机设备执行时,用于执行根据权利要求1至26,53至61任一项所述的方法。
PCT/CN2022/096510 2021-06-02 2022-06-01 特征数据编解码方法和装置 WO2022253249A1 (zh)

Priority Applications (7)

Application Number Priority Date Filing Date Title
JP2023574690A JP2024520151A (ja) 2021-06-02 2022-06-01 特徴データ符号化および復号方法および装置
CA3222179A CA3222179A1 (en) 2021-06-02 2022-06-01 Feature data encoding and decoding method and apparatus
BR112023025167A BR112023025167A2 (pt) 2021-06-02 2022-06-01 Método e aparelho de codificação e decodificação de dados de característica
EP22815293.0A EP4336829A1 (en) 2021-06-02 2022-06-01 Feature data encoding method and apparatus and feature data decoding method and apparatus
KR1020237045517A KR20240016368A (ko) 2021-06-02 2022-06-01 특징 데이터 인코딩 및 디코딩 방법 및 장치
AU2022286517A AU2022286517A1 (en) 2021-06-02 2022-06-01 Feature data encoding method and apparatus and feature data decoding method and apparatus
US18/526,406 US20240105193A1 (en) 2021-06-02 2023-12-01 Feature Data Encoding and Decoding Method and Apparatus

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
CN202110616029 2021-06-02
CN202110616029.2 2021-06-02
CN202110674299.9 2021-06-17
CN202110674299 2021-06-17
CN202111091143.4 2021-09-17
CN202111091143.4A CN115442609A (zh) 2021-06-02 2021-09-17 特征数据编解码方法和装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/526,406 Continuation US20240105193A1 (en) 2021-06-02 2023-12-01 Feature Data Encoding and Decoding Method and Apparatus

Publications (1)

Publication Number Publication Date
WO2022253249A1 true WO2022253249A1 (zh) 2022-12-08

Family

ID=84271885

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/096510 WO2022253249A1 (zh) 2021-06-02 2022-06-01 特征数据编解码方法和装置

Country Status (9)

Country Link
US (1) US20240105193A1 (zh)
EP (1) EP4336829A1 (zh)
JP (1) JP2024520151A (zh)
KR (1) KR20240016368A (zh)
CN (1) CN115442609A (zh)
AU (1) AU2022286517A1 (zh)
BR (1) BR112023025167A2 (zh)
CA (1) CA3222179A1 (zh)
WO (1) WO2022253249A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016013629A1 (ja) 2014-07-24 2016-01-28 日本ポリエチレン株式会社 オレフィン重合触媒及びオレフィン重合体の製造方法
CN116828184B (zh) * 2023-08-28 2023-12-22 腾讯科技(深圳)有限公司 视频编码、解码方法、装置、计算机设备和存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10127913B1 (en) * 2017-07-07 2018-11-13 Sif Codec Llc Method of encoding of data stream, method of decoding of data stream, and devices for implementation of said methods
CN111107377A (zh) * 2018-10-26 2020-05-05 曜科智能科技(上海)有限公司 深度图像压缩方法及其装置、设备和存储介质
US10652581B1 (en) * 2019-02-27 2020-05-12 Google Llc Entropy coding in image and video compression using machine learning
CN111988629A (zh) * 2019-05-22 2020-11-24 富士通株式会社 图像编码方法和装置、图像解码方法和装置


Also Published As

Publication number Publication date
US20240105193A1 (en) 2024-03-28
CA3222179A1 (en) 2022-12-08
AU2022286517A1 (en) 2023-12-21
CN115442609A (zh) 2022-12-06
EP4336829A1 (en) 2024-03-13
BR112023025167A2 (pt) 2024-02-27
JP2024520151A (ja) 2024-05-21
KR20240016368A (ko) 2024-02-06

Similar Documents

Publication Publication Date Title
WO2022253249A1 (zh) 特征数据编解码方法和装置
US20210329267A1 (en) Parallelized rate-distortion optimized quantization using deep learning
WO2022068716A1 (zh) 熵编/解码方法及装置
US20210150769A1 (en) High efficiency image and video compression and decompression
WO2021249290A1 (zh) 环路滤波方法和装置
TWI806199B (zh) 特徵圖資訊的指示方法,設備以及電腦程式
US20240064318A1 (en) Apparatus and method for coding pictures using a convolutional neural network
WO2023279961A1 (zh) 视频图像的编解码方法及装置
CN114125446A (zh) 图像编码方法、解码方法和装置
CN116711308A (zh) 视频编解码以及模型训练方法与装置
US20230396810A1 (en) Hierarchical audio/video or picture compression method and apparatus
WO2022111233A1 (zh) 帧内预测模式的译码方法和装置
US20240007637A1 (en) Video picture encoding and decoding method and related device
WO2023193629A1 (zh) 区域增强层的编解码方法和装置
WO2022100173A1 (zh) 一种视频帧的压缩和视频帧的解压缩方法及装置
WO2023225808A1 (en) Learned image compress ion and decompression using long and short attention module
CN114554205B (zh) 一种图像编解码方法及装置
KR20230129068A (ko) 확장 가능한 인코딩 및 디코딩 방법 및 장치
JP2024513693A (ja) ピクチャデータ処理ニューラルネットワークに入力される補助情報の構成可能な位置
CN117321989A (zh) 基于神经网络的图像处理中的辅助信息的独立定位
WO2023279968A1 (zh) 视频图像的编解码方法及装置
WO2023165487A1 (zh) 特征域光流确定方法及相关设备
US20230412807A1 (en) Bit allocation for neural network feature channel compression
CN115834888A (zh) 特征图编解码方法和装置
WO2023091040A1 (en) Generalized difference coder for residual coding in video compression

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22815293

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: MX/A/2023/014419

Country of ref document: MX

Ref document number: 3222179

Country of ref document: CA

Ref document number: 2301007889

Country of ref document: TH

WWE Wipo information: entry into national phase

Ref document number: 2023574690

Country of ref document: JP

Ref document number: 2022286517

Country of ref document: AU

Ref document number: AU2022286517

Country of ref document: AU

WWE Wipo information: entry into national phase

Ref document number: 2022815293

Country of ref document: EP

REG Reference to national code

Ref country code: BR

Ref legal event code: B01A

Ref document number: 112023025167

Country of ref document: BR

ENP Entry into the national phase

Ref document number: 2022815293

Country of ref document: EP

Effective date: 20231206

ENP Entry into the national phase

Ref document number: 2022286517

Country of ref document: AU

Date of ref document: 20220601

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 20237045517

Country of ref document: KR

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2023135295

Country of ref document: RU

ENP Entry into the national phase

Ref document number: 112023025167

Country of ref document: BR

Kind code of ref document: A2

Effective date: 20231130