WO2022253249A1 - Feature data encoding and decoding method and apparatus - Google Patents
- Publication number
- WO2022253249A1 (PCT/CN2022/096510)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- feature element
- feature
- value
- probability
- threshold
- Prior art date
Classifications
- H04N19/13 — Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
- H04N19/136 — Incoming video signal characteristics or properties
- H04N19/176 — Adaptive coding where the coding unit is an image region, the region being a block, e.g. a macroblock
- H04N19/42 — Implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
- H04N19/44 — Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
- H04N19/91 — Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
- G10L19/08 — Determination or coding of the excitation function; determination or coding of the long-term prediction parameters
- G10L19/22 — Mode decision, i.e. based on audio signal content versus external parameters
- G10L19/0017 — Lossless audio signal coding; perfect reconstruction of coded audio signal by transmission of coding error
- H03M7/3079 — Context modeling
- H03M7/40 — Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
- H03M7/6064 — Selection of compressor; H03M7/6082 — Selection strategies
- G06N3/0475 — Generative networks
Definitions
- Embodiments of the present invention relate to the technical field of image or audio compression based on artificial intelligence (AI), and in particular to a feature data encoding and decoding method and apparatus.
- Image or audio encoding and decoding (codec for short) is widely used in digital image or audio applications, such as broadcast digital TV, image or audio transmission over the Internet and mobile networks, video or voice chat, real-time conversation applications such as video or voice conferencing, DVD and Blu-ray discs, image or audio content capture and editing systems, and camcorder security applications.
- A video is composed of multiple frames of images, so an image in this application may be a standalone image or an image in a video.
- Image (or audio) data is usually compressed before being transmitted over modern telecommunication networks.
- Image (or audio) size may also be an issue when storing video on a storage device, as memory resources may be limited.
- Image (or audio) compression equipment typically uses software and/or hardware at the source to encode the image (or audio) data prior to transmission or storage, thereby reducing the amount of data required to represent the digital image (or audio).
- The compressed data is then received at the destination side by an image (or audio) decompression device.
- Work on the VVC video standard was completed in June 2020, and the standard includes almost all technical algorithms that can bring significant improvements in compression efficiency. It is therefore difficult to achieve major technological breakthroughs in a short period of time by continuing to study new compression coding algorithms along the traditional signal-processing path.
- Unlike traditional image algorithms, in which each module of the image compression pipeline is optimized through manual design, end-to-end AI image compression is optimized as a whole, so the AI image compression scheme achieves a better compression effect.
- The variational autoencoder (Variational Autoencoder, VAE) method is the mainstream technical solution in current AI lossy image compression technology.
- The current mainstream technical solution obtains an image feature map of the image to be encoded through an encoding network and then performs entropy encoding on the image feature map; however, the entropy encoding process suffers from high complexity.
- the present application provides a feature data encoding and decoding method and device, which can reduce encoding and decoding complexity without affecting encoding and decoding performance.
- a method for encoding feature data, including:
- the feature data to be encoded includes a plurality of feature elements, and the plurality of feature elements include a first feature element;
- Entropy coding is performed on the first feature element only when it is determined that entropy coding needs to be performed on the first feature element.
- the feature data includes an image feature map, an audio feature variable, or both. It can be one-dimensional, two-dimensional or multi-dimensional data output by the encoding network, where each data item is a feature element. It should be noted that "feature point" and "feature element" have the same meaning in this application.
- the first feature element is any feature element to be encoded in the feature data to be encoded.
- the probability estimation process that obtains the probability estimation result of the first feature element can be realized through a probability estimation network; alternatively, the probability estimation process can use a traditional non-network probability estimation method to perform probability estimation on the feature data.
- when the input of the probability estimation does not include context information, the probability estimation results of the feature elements can be output in parallel; when the input includes context information, the probability estimation results must be output serially.
- the side information is feature information obtained by inputting the feature data into a neural network for further extraction; the side information contains fewer feature elements than the feature data.
- the side information of the feature data can be encoded into the code stream.
- entropy coding does not need to be performed on the first feature element of the feature data.
- when the judgment on the current first feature element, i.e. the Pth feature element of the feature data, is completed and entropy coding is performed or skipped according to the judgment result, the judgment and the corresponding entropy coding process proceed to the (P+1)th feature element of the feature data, where P is a positive integer, P is less than M, and M is the number of feature elements in the entire feature data.
- judging whether to perform entropy coding on the first feature element includes: when the probability estimation result of the first feature element satisfies a preset condition, determining that entropy coding needs to be performed on the first feature element; or, when the probability estimation result of the first feature element does not satisfy the preset condition, determining that entropy coding does not need to be performed on the first feature element.
- the probability estimation result of the first feature element is the probability value that the first feature element takes the value k, and the preset condition is that this probability value is less than or equal to the first threshold, where k is an integer.
- k is a certain value in the possible value range of the above-mentioned first characteristic element.
- the value range that the first feature element can take is [-255, 255].
- k may be set to 0 and the first threshold to 0.5: entropy coding is then performed on a first feature element whose probability of taking the value 0 is less than or equal to 0.5, and entropy coding is not performed on a first feature element whose probability is greater than 0.5.
- the probability value that the first feature element takes a value of k is the maximum probability value among the probability values of all possible values of the first feature element.
- the first threshold selected for the coded stream at a low code rate is smaller than the first threshold selected for the coded stream at a high code rate.
- the specific bit rate is related to the resolution and content of the image. Taking the public Kodak dataset as an example, a bit rate below 0.5 bpp can be regarded as a low bit rate, and a bit rate above it as a high bit rate.
- the first threshold may be configured according to actual needs, which is not limited here.
- the entropy encoding complexity can be flexibly reduced according to requirements through a flexible first threshold setting manner.
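The threshold-based skip decision described above can be sketched as follows. This is a minimal illustration, assuming a scalar probability P(value == k) is available per feature element; the function names `should_encode` and `select_coded_elements` are hypothetical, not from the patent:

```python
def should_encode(prob_k: float, threshold: float) -> bool:
    """Entropy-code the element only when P(value == k) <= first threshold."""
    return prob_k <= threshold

def select_coded_elements(values, probs_k, threshold=0.5):
    """Return the (index, value) pairs that actually need entropy coding.

    Elements whose probability of equaling k exceeds the threshold are
    skipped entirely; the decoder reconstructs them as k without reading
    any bits from the stream.
    """
    return [(i, v) for i, (v, p) in enumerate(zip(values, probs_k))
            if should_encode(p, threshold)]
```

With k = 0 and a first threshold of 0.5, an element that is almost certainly zero consumes no bits at all, which is where the complexity reduction comes from.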
- the probability estimation result of the first feature element includes a first parameter and a second parameter of a probability distribution of the first feature element.
- when the probability distribution is a Gaussian distribution, the first parameter of the probability distribution of the first feature element is the mean of the Gaussian distribution of the first feature element, and the second parameter is the variance of the Gaussian distribution of the first feature element; or, when the probability distribution is a Laplace distribution, the first parameter of the probability distribution of the first feature element is the location parameter of the Laplace distribution of the first feature element, and the second parameter is the scale parameter of the Laplace distribution of the first feature element.
- the absolute value of the difference between the first parameter of the probability distribution of the first feature element and the value k of the first feature element is greater than or equal to the second threshold;
- a second parameter of the probability distribution of the first feature element is greater than or equal to a third threshold
- the sum of the absolute value of the difference between the first parameter of the probability distribution of the first feature element and the value k of the first feature element, and the second parameter of the probability distribution of the first feature element, is greater than or equal to a fourth threshold.
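The three Gaussian-parameter conditions above can be evaluated directly on the distribution parameters, without computing any probability value. A sketch under the assumption that the conditions indicate coding is required (the function name and the convention that a threshold left as `None` is not checked are illustrative):

```python
def gaussian_preset_condition(mu: float, sigma: float, k: int,
                              t2: float = None, t3: float = None,
                              t4: float = None) -> bool:
    """Check the Gaussian preset conditions listed above:
    |mu - k| >= second threshold (t2), or
    sigma    >= third threshold  (t3), or
    |mu - k| + sigma >= fourth threshold (t4).
    """
    if t2 is not None and abs(mu - k) >= t2:
        return True
    if t3 is not None and sigma >= t3:
        return True
    if t4 is not None and abs(mu - k) + sigma >= t4:
        return True
    return False
```

Intuitively, a mean far from k or a large variance means the element is unlikely to simply equal k, so it is worth spending bits on it.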
- when the probability distribution is a mixed Gaussian distribution, the first parameter of the probability distribution of the first feature element is the mean values of the mixed Gaussian distribution of the first feature element, and the second parameter is the variances of the mixed Gaussian distribution of the first feature element.
- the sum of the absolute values of the differences between each mean value of the mixed Gaussian distribution of the first feature element and the value k of the first feature element, plus any variance of the mixed Gaussian distribution of the first feature element, is greater than or equal to the fifth threshold;
- the difference between any mean value of the mixed Gaussian distribution of the first feature element and the value k of the first feature element is greater than or equal to the sixth threshold;
- Any variance of the mixed Gaussian distribution of the first feature element is greater than or equal to the seventh threshold.
- when the probability distribution is an asymmetric Gaussian distribution, the first parameter of the probability distribution of the first feature element is the mean of the asymmetric Gaussian distribution of the first feature element, and the second parameter is the first variance and the second variance of the asymmetric Gaussian distribution of the first feature element.
- the absolute value of the difference between the mean value of the asymmetric Gaussian distribution of the first feature element and the value k of the first feature element is greater than or equal to the eighth threshold;
- a first variance of the asymmetric Gaussian distribution of the first feature element is greater than or equal to a ninth threshold
- a second variance of the asymmetric Gaussian distribution of the first feature element is greater than or equal to a tenth threshold.
- when the probability distribution of the first feature element is a mixed Gaussian distribution, a judgment value range of the first feature element is determined; the condition is that none of the multiple mean values of the probability distribution of the first feature element fall within the judgment value range of the first feature element.
- when the probability distribution of the first feature element is a Gaussian distribution, a judgment value range of the first feature element is determined; the condition is that the mean value of the probability distribution of the first feature element is not within the judgment value range of the first feature element.
- when the probability distribution of the first feature element is a Gaussian distribution, a judgment value range of the first feature element is determined, the judgment value range including a plurality of possible values of the first feature element; the condition is that the absolute value of the difference between the mean parameter of the Gaussian distribution of the first feature element and each value in the judgment value range is greater than or equal to the eleventh threshold, or that the variance of the probability distribution of the first feature element is greater than or equal to the twelfth threshold.
- the value of the first feature element is not within the value range of the first feature element.
- the probability value corresponding to the value of the first characteristic element is less than or equal to the thirteenth threshold.
- the method further includes: constructing a threshold candidate list of the first threshold, placing the first threshold into the threshold candidate list, and writing the index number corresponding to the first threshold into the encoded code stream; the length of the threshold candidate list of the first threshold can be set to T, where T is an integer greater than or equal to 1.
- the other thresholds can likewise be handled by constructing threshold candidate lists in the same manner as the first threshold, with the index numbers corresponding to those thresholds written into the encoded code stream.
- the index number written into the code stream can be stored in the sequence header, picture header, slice header or SEI (supplemental enhancement information) and sent to the decoding end; other methods may also be used, which are not limited here.
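Signaling a threshold via a candidate list can be sketched as below. The list is assumed to be known to both encoder and decoder (e.g. fixed by the standard or sent once), so only the index travels in the stream; all names here are illustrative:

```python
def build_threshold_candidates(thresholds):
    """Threshold candidate list of length T shared by encoder and decoder."""
    return list(thresholds)

def threshold_to_index(candidates, threshold):
    """Encoder side: write only this index into the code stream header."""
    return candidates.index(threshold)

def index_to_threshold(candidates, index):
    """Decoder side: recover the first threshold from the parsed index."""
    return candidates[index]
```

Sending a small index instead of the threshold value itself keeps the header overhead down while still letting the encoder pick a rate-dependent threshold.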
- the decision information is obtained by inputting the probability estimation result into the generation network.
- the generation network may be a convolutional network, which may include multiple network layers, and any network layer may be a convolutional layer, a normalization layer, a nonlinear activation layer, and the like.
- the probability estimation result of the feature data is input into the generation network to obtain the decision information of the first feature element, and the decision information is used to indicate whether to perform entropy coding on the first feature element.
- the decision information of the feature data is a decision map, which may also be called a decision diagram.
- the decision map is preferably a binary map.
- the value of the decision information of a feature element in the binary map is usually 0 or 1. Therefore, when the value at the position corresponding to the first feature element in the decision map is a preset value, entropy coding needs to be performed on the first feature element; when the value at that position is not the preset value, entropy coding does not need to be performed on the first feature element.
- the decision information of the feature elements in the feature data is a preset value.
- the preset value of the decision information is usually 1; thus, when the decision information is the preset value, entropy coding needs to be performed on the first feature element, and when the decision information is not the preset value, entropy coding does not need to be performed on the first feature element.
- the decision information can be a flag or the value of a flag; whether to perform entropy coding on the first feature element depends on whether the flag or its value equals a preset value.
- the set of decision information of each feature element in the feature data can also be a floating point number, that is to say, the value can be other values except 0 and 1.
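Using a binary decision map to select which feature elements are entropy-coded can be sketched with NumPy boolean masking. This is a stand-in illustration (the patent's generation network produces the decision map; here it is simply given as an array):

```python
import numpy as np

def coded_positions(decision_map: np.ndarray, preset: int = 1) -> np.ndarray:
    """Boolean mask of positions whose decision-map value equals the preset."""
    return decision_map == preset

def elements_to_code(feature_map: np.ndarray, decision_map: np.ndarray,
                     preset: int = 1) -> np.ndarray:
    """Only the feature elements selected by the binary decision map are
    passed on to the entropy coder; the rest are skipped."""
    return feature_map[coded_positions(decision_map, preset)]
```

The same mask is reproducible at the decoder (from the same probability estimates), so no extra bits are needed to transmit it.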
- the method further includes: obtaining the feature data by passing the image to be encoded through an encoding network; obtaining the feature data by passing the image to be encoded through the encoding network and then rounding; or obtaining the feature data by passing the image to be encoded through the encoding network and then quantizing and rounding.
- the encoding network can adopt an autoencoder structure.
- the encoding network can be a convolutional neural network.
- an encoding network can include multiple sub-networks, each containing one or more convolutional layers. The network structures of the sub-networks may be the same as or different from each other.
- the image to be encoded can be an original image or a residual image.
- the image to be encoded can be in RGB, YUV, RAW or other representation formats, and the image to be encoded can be pre-processed before being input into the encoding network.
- the pre-processing operation can include operations such as conversion, block division, filtering, and pruning.
- a method for decoding feature data, including:
- the feature data to be decoded includes a plurality of feature elements, and the plurality of feature elements include a first feature element;
- Entropy decoding is performed on the first feature element only when it is determined that entropy decoding needs to be performed on the first feature element.
- the first feature element is any feature element in the feature data to be decoded; when all feature elements in the feature data to be decoded have been judged and entropy decoding has been performed or skipped according to the judgment results, the decoded feature data is obtained.
- the feature data to be decoded may be one-dimensional, two-dimensional or multi-dimensional data, each of which is a feature element. It should be noted that the meanings of the feature point and the feature element in this application are the same.
- the first feature element is any feature element to be decoded in the feature data to be decoded.
- the probability estimation process of obtaining the probability estimation result of the first feature element can be realized through a probability estimation network; in another possibility, the probability estimation process can use a traditional non-network probability estimation method to evaluate the feature data for probability estimation.
- when the input of the probability estimation does not include context information, the probability estimation results of the feature elements can be output in parallel; when the input includes context information, the probability estimation results must be output serially. The number of feature elements included in the side information is less than that of the feature data.
- the code stream contains side information, and the process of decoding the code stream needs to decode the side information.
- the process of judging each feature element in the feature data includes conditional judgment and deciding whether to perform entropy decoding according to the result of the conditional judgment.
- entropy decoding can be implemented by means of neural networks.
- entropy decoding can be implemented by conventional entropy decoding.
- the current first feature element is the Pth feature element of the feature data
- start the P+1th feature element of the feature data Judgment of feature elements and execution or non-execution of entropy decoding process according to the judgment result where P is a positive integer and P is less than M, where M is the number of feature elements in the entire feature data.
- the judging whether to perform entropy decoding on the first feature element of the feature data includes: when the probability estimation result of the first feature element of the feature data satisfies a preset condition, judging that entropy decoding needs to be performed on the first feature element; or when the probability estimation result of the first feature element does not satisfy the preset condition, judging that entropy decoding of the first feature element is not required, and setting the feature value of the first feature element to k, where k is an integer.
- the probability estimation result of the first feature element is the probability value that the first feature element takes a value k
- the preset condition is that the probability that the first feature element takes the value k is less than or equal to the first threshold, where k is an integer.
- the first feature element is set to k when the preset condition is not satisfied.
- the value range that the first feature element can take is [-255, 255].
- k may be set to 0; then entropy decoding is performed on the first feature element when its probability value is less than or equal to 0.5, and entropy decoding is not performed on the first feature element when its probability value is greater than 0.5.
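As a sketch of the per-element skip decision described above (the function and variable names are assumptions for illustration, not taken from the application):

```python
def decode_feature_element(p_k, k, first_threshold, entropy_decode):
    """Per-element decision: if the probability that the element takes
    the value k is small enough, the element was entropy coded and must
    be entropy decoded; otherwise decoding is skipped and the feature
    value is set directly to k."""
    if p_k <= first_threshold:
        return entropy_decode()  # read the value from the code stream
    return k  # skip entropy decoding; assume the most likely value k
```

With k = 0 and a first threshold of 0.5, elements whose probability of being 0 exceeds 0.5 are never read from the code stream, which is how the method reduces the number of entropy-decoded elements.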
- the value of the first feature element is determined through a list when the preset condition is not satisfied.
- the first feature element is set to a fixed integer value when the preset condition is not satisfied.
- k is a certain value in the possible value range of the above-mentioned first feature element.
- k is the value corresponding to the maximum probability among all possible values of the above-mentioned first feature element.
- the first threshold selected for the decoded code stream in the case of a low code rate is smaller than the first threshold selected for the decoded code stream in the case of a high code rate.
- the specific bit rate is related to the resolution and content of the image. Taking the public Kodak dataset as an example, a bit rate lower than 0.5 bpp is considered a low bit rate; otherwise it is a high bit rate.
- the first threshold may be configured according to actual needs, which is not limited here.
- the flexible setting of the first threshold enables the entropy decoding complexity to be reduced according to requirements.
- the probability estimation result of the first feature element includes a first parameter and a second parameter of a probability distribution of the first feature element.
- when the probability distribution is a Gaussian distribution, the first parameter of the probability distribution of the first feature element is the mean of the Gaussian distribution of the first feature element, and the second parameter of the probability distribution of the first feature element is the variance of the Gaussian distribution of the first feature element; or when the probability distribution is a Laplace distribution, the first parameter of the probability distribution of the first feature element is the location parameter of the Laplace distribution of the first feature element, and the second parameter of the probability distribution of the first feature element is the scale parameter of the Laplace distribution of the first feature element.
- the absolute value of the difference between the first parameter of the probability distribution of the first feature element and the value k of the first feature element is greater than or equal to a second threshold
- a second parameter of the first feature element is greater than or equal to a third threshold
- the sum of the absolute value of the difference between the first parameter of the probability distribution of the first feature element and the value k of the first feature element, and the second parameter of the probability distribution of the first feature element, is greater than or equal to a fourth threshold.
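The three example preset conditions above (on the first parameter mu and second parameter sigma) can be sketched with a hypothetical helper; the threshold names and defaults are assumptions:

```python
def needs_entropy_coding(mu, sigma, k, t2=None, t3=None, t4=None):
    """Return True if any of the three example preset conditions holds:
    |mu - k| >= second threshold t2, sigma >= third threshold t3, or
    |mu - k| + sigma >= fourth threshold t4; mu and sigma are the first
    and second parameters of the element's probability distribution."""
    if t2 is not None and abs(mu - k) >= t2:
        return True
    if t3 is not None and sigma >= t3:
        return True
    if t4 is not None and abs(mu - k) + sigma >= t4:
        return True
    return False
```

Intuitively, a mean far from k or a large variance both mean the element is unlikely to equal k, so it carries information and must be entropy coded.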
- when the probability distribution is a mixed Gaussian distribution, the first parameter of the probability distribution of the first feature element is the mean values of the mixed Gaussian distribution of the first feature element, and the second parameter of the probability distribution of the first feature element is the variances of the mixed Gaussian distribution of the first feature element
- the sum of the absolute values of the differences between all mean values of the mixed Gaussian distribution of the first feature element and the value k of the first feature element, plus any variance of the mixed Gaussian distribution of the first feature element, is greater than or equal to the fifth threshold;
- the difference between any mean value of the mixed Gaussian distribution of the first feature element and the value k of the first feature element is greater than the sixth threshold;
- Any variance of the mixed Gaussian distribution of the first feature element is greater than or equal to the seventh threshold.
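The mixed-Gaussian conditions above can be sketched as follows (a hypothetical helper; names and the exact reading of the fifth-threshold condition are assumptions based on the text):

```python
def gmm_needs_entropy_coding(means, variances, k, t5=None, t6=None, t7=None):
    """Example mixed-Gaussian preset conditions:
    - sum over all means of |mean - k|, plus any single variance, >= t5
    - |any mean - k| > t6
    - any variance >= t7"""
    if t5 is not None:
        abs_sum = sum(abs(m - k) for m in means)
        if any(abs_sum + v >= t5 for v in variances):
            return True
    if t6 is not None and any(abs(m - k) > t6 for m in means):
        return True
    if t7 is not None and any(v >= t7 for v in variances):
        return True
    return False
```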
- when the probability distribution is an asymmetric Gaussian distribution, the first parameter of the probability distribution of the first feature element is the mean of the asymmetric Gaussian distribution of the first feature element, and the second parameter of the probability distribution of the first feature element is the first variance and the second variance of the asymmetric Gaussian distribution of the first feature element
- the absolute value of the difference between the mean parameter of the asymmetric Gaussian distribution of the first feature element and the value k of the first feature element is greater than the eighth threshold;
- a first variance of the asymmetric Gaussian distribution of the first feature element is greater than or equal to a ninth threshold
- a second variance of the asymmetric Gaussian distribution of the first feature element is greater than or equal to a tenth threshold.
- the probability distribution of the first feature element is a mixed Gaussian distribution, and a judgment value range of the first feature element is determined; the condition is satisfied when none of the multiple mean values of the probability distribution of the first feature element fall within the judgment value range of the first feature element.
- the probability distribution of the first feature element is a Gaussian distribution, and a judgment value range of the first feature element is determined; the condition is satisfied when the mean value of the probability distribution of the first feature element is not within the judgment value range of the first feature element.
- the probability distribution of the first feature element is a Gaussian distribution
- a judgment value range of the first feature element is determined
- the judgment value range includes a plurality of possible values of the first feature element; the condition is satisfied when the absolute value of the difference between the mean parameter of the Gaussian distribution of the first feature element and each value in the judgment value range of the first feature element is greater than or equal to the eleventh threshold, or the variance of the probability distribution of the first feature element is greater than or equal to the twelfth threshold.
- the value k of the first characteristic element is not within the judgment value range of the first characteristic element.
- the probability value corresponding to the value k of the first feature element is less than or equal to the thirteenth threshold.
- a threshold candidate list of the first threshold is constructed; the code stream is decoded to obtain an index number into the threshold candidate list of the first threshold, and the value at the position of the threshold candidate list corresponding to the index number is used as the value of the first threshold, wherein the length of the threshold candidate list of the first threshold can be set to T; T is an integer greater than or equal to 1.
- any other threshold can adopt the same threshold candidate list construction method as the first threshold: the index number corresponding to that threshold is decoded, and the value in the constructed list is selected as the threshold according to the index number.
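The candidate-list lookup can be sketched as below; the list contents and length T = 4 are assumed example values:

```python
def threshold_from_index(candidate_list, index):
    """Select a threshold by the index number decoded from the code
    stream; the candidate list has length T >= 1."""
    if not 0 <= index < len(candidate_list):
        raise ValueError("decoded index is outside the candidate list")
    return candidate_list[index]

# hypothetical candidate list of length T = 4; a decoded index of 2
# selects the third entry as the first threshold
first_threshold = threshold_from_index([0.1, 0.3, 0.5, 0.7], 2)
```

Transmitting only an index instead of the threshold itself keeps the signaling cost low while still letting the encoder choose the threshold per stream.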
- the decision information is obtained by inputting the probability estimation result into the generation network.
- the generation network may be a convolutional network, which may include multiple network layers, and any network layer may be a convolutional layer, a normalization layer, a nonlinear activation layer, and the like.
- the probability estimation result of the feature data is input into the generation network to obtain the decision information of the first feature element, and the decision information is used to indicate whether to perform entropy decoding on the first feature element.
- the decision information of each feature element in the feature data forms a decision map.
- the decision map is preferably a binary map.
- the value of the decision information of a feature element in the binary map is usually 0 or 1. Therefore, when the value at the position corresponding to the first feature element in the decision map is a preset value, entropy decoding needs to be performed on the first feature element; when the value at the position corresponding to the first feature element in the decision map is not the preset value, entropy decoding does not need to be performed on the first feature element.
- the set of decision information of each feature element in the feature data can also be a floating point number, that is to say, the value can be other values except 0 and 1.
- when the value of the decision information of the first feature element is equal to or greater than the preset value, it is judged that entropy decoding needs to be performed on the first feature element; or when the value of the decision information of the first feature element is less than the preset value, it is judged that entropy decoding does not need to be performed on the first feature element.
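The decision-map rule above can be sketched as follows; the preset value 0.5 is an assumed example, not a value fixed by the application:

```python
def needs_decoding(decision_value, preset=0.5):
    """Float-valued decision map rule: entropy decode the element iff
    its decision value is greater than or equal to the preset value.
    For a binary map, only positions equal to the preset (e.g. 1)
    are decoded."""
    return decision_value >= preset

# hypothetical 2x2 decision map; True marks positions to entropy decode
decision_map = [[0.9, 0.2], [1.0, 0.4]]
mask = [[needs_decoding(v) for v in row] for row in decision_map]
```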
- the feature data is passed through a decoding network to obtain a reconstructed image.
- the feature data is passed through a decoding network to obtain machine-oriented task data
- the feature data is passed through a machine-oriented task module to obtain machine-oriented task data
- the machine-oriented module includes a target recognition network, a classification network or a semantic segmentation network.
- a characteristic data encoding device including:
- An obtaining module configured to obtain feature data to be encoded, the feature data to be encoded includes a plurality of feature elements, the plurality of feature elements include a first feature element, and is used to obtain a probability estimation result of the first feature element ;
- An encoding module configured to determine whether to perform entropy encoding on the first feature element according to the probability estimation result of the first feature element; entropy encoding is performed on the first feature element only when it is determined that entropy encoding needs to be performed on the first feature element.
- a feature data decoding device including:
- An obtaining module configured to obtain a code stream of feature data to be decoded, where the feature data to be decoded includes a plurality of feature elements, and the plurality of feature elements include a first feature element; acquire a probability estimation result of the first feature element;
- a decoding module configured to judge whether to perform entropy decoding on the first feature element according to the probability estimation result of the first feature element; entropy decoding is performed on the first feature element only when it is judged that entropy decoding needs to be performed on the first feature element.
- the present application provides an encoder, including a processing circuit for performing the method according to the first aspect or any implementation of the first aspect.
- the present application provides a decoder, including a processing circuit for performing the method according to the second aspect or any implementation of the second aspect.
- the present application provides a computer program product, including program code which, when executed on a computer or a processor, performs the method of the first aspect or any implementation of the first aspect, or of the second aspect or any implementation of the second aspect.
- the present application provides an encoder, including: one or more processors; and a non-transitory computer-readable storage medium coupled to the processors and storing a program for execution by the processors, wherein the program, when executed by the processors, causes the encoder to perform the method according to the first aspect or any implementation of the first aspect.
- the present application provides a decoder, including: one or more processors; and a non-transitory computer-readable storage medium coupled to the processors and storing a program for execution by the processors, wherein the program, when executed by the processors, causes the decoder to perform the method according to the second aspect or any implementation of the second aspect.
- the present application provides a non-transitory computer-readable storage medium, including program code which, when executed, performs the method according to the first aspect or any implementation of the first aspect, or the second aspect or any implementation of the second aspect.
- the present invention relates to an encoding device, having a function of implementing the behaviors in the first aspect or any one of the method embodiments of the first aspect.
- Said functions can be realized by hardware, or by corresponding software executed by hardware.
- the hardware or software includes one or more modules corresponding to the above functions.
- the encoding device includes: an obtaining module, configured to transform the original image or the residual image into a feature space through an encoding network, and extract feature data for compression.
- probability estimation is performed on the feature data to obtain the probability estimation result of each feature element of the feature data; the encoding module uses the probability estimation result of each feature element to judge whether entropy encoding is performed on each feature element in the feature data, and completes the encoding process of all feature elements in the feature data to obtain the code stream of the feature data.
- These modules can perform the corresponding functions in the first aspect or any method example of the first aspect. For details, refer to the detailed description in the method examples; details are not repeated here.
- the present invention relates to a decoding device, which has the function of realizing the actions in the second aspect or any one of the method embodiments of the second aspect.
- Said functions can be realized by hardware, or by corresponding software executed by hardware.
- the hardware or software includes one or more modules corresponding to the above functions.
- the decoding device includes: an obtaining module configured to obtain a code stream of the feature data to be decoded and perform probability estimation according to the code stream to obtain a probability estimation result of each feature element of the feature data; and a decoding module that uses the probability estimation result of each feature element to judge, through certain conditions, whether entropy decoding is performed on each feature element, completes the decoding process of all feature elements to obtain the feature data, and decodes the feature data to obtain a reconstructed image or machine-oriented task data.
- These modules can perform the corresponding functions in the above-mentioned second aspect or any method example of the second aspect. For details, refer to the detailed description in the method examples; details are not repeated here.
- a method for encoding characteristic data including:
- the feature data includes a plurality of feature elements, and the plurality of feature elements include a first feature element
- Entropy coding is performed on the first feature element only when it is determined that entropy coding needs to be performed on the first feature element.
- the feature data is one-dimensional, two-dimensional or multi-dimensional data output by the encoding network, where each data is a feature element.
- the side information of the characteristic data is encoded into the code stream.
- the side information is feature information obtained by inputting the feature data into a neural network for further extraction, and the number of feature elements included in the side information is less than that of the feature data.
- the first feature element is any feature element in the feature data.
- the set of decision information of each feature element of the feature data may be represented by a decision diagram or the like.
- the decision graph is one-dimensional, two-dimensional or multi-dimensional image data and is consistent with the size of the feature data.
- the joint network also outputs the probability estimation result of the first feature element, and the probability estimation result of the first feature element includes the probability value of the first feature element and/or the first parameter and the second parameter of the probability distribution of the first feature element.
- when the value at the position corresponding to the first feature element in the decision map is a preset value, entropy coding needs to be performed on the first feature element; when the value at the position corresponding to the first feature element in the decision map is not the preset value, entropy coding does not need to be performed on the first feature element.
- a method for decoding feature data including:
- the characteristic data to be decoded includes a plurality of characteristic elements, and the plurality of characteristic elements include a first characteristic element;
- Entropy decoding is performed on the first feature element only when it is determined that entropy decoding needs to be performed on the first feature element.
- the code stream of the feature data to be decoded is decoded to obtain side information.
- the number of feature elements contained in the side information is less than that of feature data.
- the first feature element is any feature element in the feature data.
- the decision information of each characteristic element of the characteristic data may be expressed in a manner such as a decision diagram.
- the decision graph is one-dimensional, two-dimensional or multi-dimensional image data and is consistent with the size of the feature data.
- the joint network also outputs the probability estimation result of the first feature element, and the probability estimation result of the first feature element includes the probability value of the first feature element and/or the first parameter and the second parameter of the probability distribution of the first feature element.
- when the value at the position corresponding to the first feature element in the decision map is a preset value, entropy decoding needs to be performed on the first feature element; when the value at the position corresponding to the first feature element in the decision map is not the preset value, entropy decoding does not need to be performed on the first feature element, and the feature value of the first feature element is set to k, where k is an integer.
- This application utilizes the relevant information about the probability distribution of feature points in the feature data to be encoded to determine whether entropy encoding and decoding is required for each feature element in the feature data to be encoded and decoded, thereby skipping the entropy encoding and decoding process of some feature elements.
- the number of elements to be encoded and decoded can be significantly reduced, reducing the complexity of encoding and decoding.
- the threshold value can be flexibly set to control the code rate of the generated code stream.
- FIG. 1A is an exemplary block diagram of an image decoding system
- Fig. 1B is the realization of the processing circuit of the image decoding system
- Fig. 1C is a schematic block diagram of an image decoding device
- Figure 1D is a diagram of the implementation of the device of the embodiment of the present application.
- FIG. 2A is a system architecture diagram of a possible scenario of the present application.
- FIG. 2B is a system architecture diagram of a possible scenario of the present application.
- 3A-3D are schematic block diagrams of encoders
- FIG. 4A is a schematic diagram of an encoding network unit
- Figure 4B is a schematic diagram of the network structure of the encoding network
- Fig. 5 is a structural schematic diagram of a coding decision-making realization unit
- Figure 6 is an example diagram of joint network output
- Fig. 7 is an example diagram of generating network output
- Fig. 8 is a schematic diagram of realization of decoding decision
- Fig. 9 is an example diagram of a network structure of a decoding network
- FIG. 10A is an example diagram of a decoding method in an embodiment of the present application.
- FIG. 10B is a schematic block diagram of an image feature map decoder according to an embodiment of the present application.
- Figure 11A is an example diagram of the decoding method of the embodiment of the present application.
- Fig. 12 is an example diagram of the network structure of the side information extraction module
- Fig. 13A is an example diagram of the decoding method of the embodiment of the present application.
- FIG. 13B is a schematic block diagram of an image feature map decoder according to an embodiment of the present application.
- FIG. 14 is an example diagram of a decoding method in an embodiment of the present application.
- Fig. 15 is an example diagram of a network structure of a joint network
- FIG. 16 is a schematic block diagram of an image feature map decoder according to an embodiment of the present application.
- Fig. 17 is an example diagram of the decoding method of the embodiment of the present application.
- FIG. 18 is a schematic structural diagram of an exemplary encoding device of the present application.
- FIG. 19 is a schematic structural diagram of an exemplary decoding device of the present application.
- "At least one (item)" means one or more, and "multiple" means two or more.
- "And/or" is used to describe the association relationship of associated objects, indicating that three types of relationships may exist; for example, "A and/or B" can mean: only A exists, only B exists, or both A and B exist, where A and B can be singular or plural.
- the character "/" generally indicates that the contextual objects are in an "or" relationship.
- "At least one of the following" or similar expressions refer to any combination of these items, including any combination of single or plural items.
- "At least one item (piece) of a, b or c" can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b and c can be single or multiple.
- the embodiment of the present application provides an AI-based feature data encoding and decoding technology, especially a neural network-based image feature map and/or audio feature variable encoding and decoding technology, and specifically provides an end-to-end image feature map-based and/or the codec system of the audio characteristic variable.
- Image coding (or commonly referred to simply as coding) includes two parts, image encoding and image decoding; video is composed of multiple images and is a representation of continuous images.
- Image encoding is performed on the source side and typically involves processing (e.g., compressing) raw video images to reduce the amount of data required to represent them (and thus store and/or transmit them more efficiently).
- Image decoding is performed at the destination side and usually involves inverse processing relative to the encoder to reconstruct the image.
- the "decoding" of images or audios involved in the embodiments should be understood as “encoding” or “decoding” of images or audios.
- the encoding part and the decoding part are also collectively referred to as codec (encoding and decoding, CODEC).
- the original image can be reconstructed, i.e. the reconstructed image has the same quality as the original image (assuming no transmission loss or other data loss during storage or transmission).
- the amount of data required to represent the video image is reduced by further compression through quantization and other operations, and the decoder side cannot completely reconstruct the video image; that is, the quality of the reconstructed video image is lower or worse than that of the original video image.
- the neural network can be composed of neural units, where a neural unit can refer to an operation unit that takes inputs xs and an intercept of 1, and the output of the operation unit can be: hW,b(x) = f(Σs Ws·xs + b)
- Ws is the weight of xs
- b is the bias of the neural unit.
- f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal.
- the output signal of the activation function can be used as the input of the next convolutional layer, and the activation function can be a sigmoid function.
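A minimal sketch of one such neural unit, using the weighted sum of inputs plus bias followed by a sigmoid activation (the function name is an assumption for illustration):

```python
import math

def neural_unit(xs, ws, b):
    """One operation unit: weighted sum of inputs xs with weights ws,
    plus bias b, passed through a sigmoid activation f."""
    z = sum(w * x for w, x in zip(ws, xs)) + b
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid activation
```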
- a neural network is a network formed by connecting multiple above-mentioned single neural units, that is, the output of one neural unit can be the input of another neural unit.
- the input of each neural unit can be connected with the local receptive field of the previous layer to extract the features of the local receptive field.
- the local receptive field can be an area composed of several neural units.
- Deep neural network also known as multi-layer neural network
- DNN can be understood as a neural network with multiple hidden layers.
- a DNN is divided according to the positions of its layers; the layers of a DNN can be divided into three categories: input layer, hidden layers, and output layer.
- the first layer is the input layer
- the last layer is the output layer
- the layers in the middle are all hidden layers.
- the layers are fully connected, that is, any neuron in the i-th layer must be connected to any neuron in the i+1-th layer.
- although a DNN looks complicated, the work of each layer is actually not complicated.
- in simple terms, each layer computes the following linear relationship expression: y = α(W·x + b), where x is the input vector, y is the output vector, b is the offset vector, W is the weight matrix (also called coefficients), and α() is the activation function.
- each layer simply performs this operation on an input vector x to obtain the output vector y; because a DNN has many layers, the number of coefficient matrices W and offset vectors b is also large.
- the definitions of these parameters in a DNN are as follows, taking the coefficient W as an example: assume that in a three-layer DNN, the linear coefficient from the fourth neuron of the second layer to the second neuron of the third layer is defined as W^3_24, where the superscript 3 represents the layer number of the coefficient W, and the subscripts correspond to the output third-layer index 2 and the input second-layer index 4.
- in summary, the coefficient from the kth neuron of the (L-1)th layer to the jth neuron of the Lth layer is defined as W^L_jk
- the input layer has no W parameter.
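One fully connected layer with weight matrix W, offset vector b, and activation function can be sketched as below (a plain-Python illustration; real implementations use matrix libraries):

```python
def dense_layer(x, W, b, act):
    """One fully connected DNN layer: for each output neuron j,
    compute act(sum_k W[j][k] * x[k] + b[j])."""
    return [act(sum(w_jk * x_k for w_jk, x_k in zip(row, x)) + b_j)
            for row, b_j in zip(W, b)]
```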
- more hidden layers make the network more capable of describing complex situations in the real world; theoretically, a model with more parameters has higher complexity and greater "capacity", which means it can complete more complex learning tasks.
- Training the deep neural network is the process of learning the weight matrix, and its ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (the weight matrix formed by the vector W of many layers).
- Convolutional neural network is a deep neural network with a convolutional structure.
- the convolutional neural network contains a feature extractor composed of a convolutional layer and a subsampling layer, which can be regarded as a filter.
- the convolutional layer refers to the neuron layer that performs convolution processing on the input signal in the convolutional neural network.
- a neuron can only be connected to some adjacent neurons.
- a convolutional layer usually contains several feature planes, and each feature plane can be composed of some rectangularly arranged neural units. Neural units of the same feature plane share weights, and the shared weights here are convolution kernels.
- Shared weights can be understood as a way to extract image information that is independent of location.
- the convolution kernel can be initialized in the form of a matrix of random size, and the convolution kernel can obtain reasonable weights through learning during the training process of the convolutional neural network.
- the direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.
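The weight sharing described above means the same convolution kernel is applied at every spatial position; a minimal "valid"-mode sketch (as in most CNN frameworks, this computes cross-correlation rather than a flipped-kernel convolution):

```python
def conv2d_valid(image, kernel):
    """'Valid' 2-D convolution: the same kernel weights are shared at
    every spatial position of the input, producing one feature plane."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [[sum(image[i + u][j + v] * kernel[u][v]
                 for u in range(kh) for v in range(kw))
             for j in range(w - kw + 1)]
            for i in range(h - kh + 1)]
```

Because only the kernel weights are learned, the number of connections between layers is far smaller than in a fully connected layer, which is the overfitting-reduction benefit noted above.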
- Entropy coding is used to apply entropy coding algorithms or schemes (for example, variable length coding (VLC) schemes, context adaptive VLC (CAVLC) schemes, arithmetic coding schemes, binarization algorithms, context adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or other entropy coding methods or techniques) to quantized coefficients and other syntax elements to obtain encoded data that can be output in the form of an encoded bit stream through the output terminal, so that a decoder or the like can receive and use the parameters for decoding.
- the encoded bitstream can be transmitted to the decoder, or stored in memory for later transmission or retrieval by the decoder.
- the encoder 20A and the decoder 30A are described with reference to FIGS. 1A to 15 .
- FIG. 1A is a schematic block diagram of an exemplary decoding system 10 , such as an image (or audio) decoding system 10 (or simply referred to as the decoding system 10 ) that can utilize the technology of the present application.
- the encoder 20A and the decoder 30A in the image decoding system 10 represent devices and the like that can be used to perform the techniques of the various examples described in this application.
- a decoding system 10 includes a source device 12 , and the source device 12 is configured to provide coded streams 21 such as coded images (or audio) to a destination device 14 for decoding the coded streams 21 .
- the source device 12 includes an encoder 20A, and optionally an image source 16 , a preprocessor (or preprocessing unit) 18 , a communication interface (or communication unit) 26 and a probability estimation (or probability estimation unit) 40 .
- Image (or audio) source 16 may comprise or be any type of image capture device for capturing real-world images (or audio), and/or any type of image generation device, such as a computer graphics processor for generating computer-animated images, or any type of device for acquiring and/or providing real-world images or computer-generated images (e.g., screen content, virtual reality (VR) images, and/or any combination thereof (e.g., augmented reality (AR) images)).
- the audio or image source can be any type of memory or storage that stores any of the above audio or images.
- the image or audio (image or audio data) 17 may also be referred to as original image or audio (original image data or audio data) 17 .
- the preprocessor 18 is used to receive (original) image (or audio) data 17 and preprocess the image (or audio) data 17 to obtain preprocessed image or audio (or preprocessed image or audio data) 19 .
- the preprocessing performed by the preprocessor 18 may include cropping, color format conversion (for example, from RGB to YCbCr), color grading, or denoising. It can be understood that the preprocessing unit 18 is an optional component.
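- The RGB-to-YCbCr color format conversion mentioned above can be sketched with the full-range BT.601 matrix (the JPEG convention); the exact matrix used by the preprocessor 18 is not specified here, so this is only one plausible choice.

```python
def rgb_to_ycbcr(r: float, g: float, b: float) -> tuple:
    """Full-range BT.601 RGB -> YCbCr for 8-bit samples (JPEG convention)."""
    y  = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr
```

White (255, 255, 255) maps to (255, 128, 128): full luminance and neutral chrominance.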
- the encoder 20A includes an encoding network 20 , an entropy encoding 24 and, optionally, a preprocessor 22 .
- Image (or audio) encoding network (or encoding network) 20 is used to receive preprocessed image (or audio) data 19 and provide encoded image (or audio) data 21 .
- the preprocessor 22 is used to receive the feature data 21 to be encoded, and perform preprocessing on the feature data 21 to be encoded to obtain the preprocessed feature data 23 to be encoded.
- the preprocessing performed by the preprocessor 22 may include cropping, color format conversion (for example, from RGB to YCbCr), color correction, or denoising. It can be understood that the preprocessing unit 22 is an optional component.
- the entropy coding 24 is used to receive the feature data to be coded (or the preprocessed feature data to be coded) 23 and generate the code stream 25 according to the probability estimation result 41 provided by the probability estimation 40.
- the communication interface 26 in the source device 12 can be used to receive the encoded code stream 25 and send the encoded code stream 25 (or any other processed version) through the communication channel 27 to another device, such as the destination device 14, or to any other device, for storage or direct reconstruction.
- the destination device 14 includes a decoder 30A, and may additionally and optionally include a communication interface (or communication unit) 28 , a post-processor (or post-processing unit) 36 and a display device 38 .
- the communication interface 28 in the destination device 14 is used to receive the encoded code stream 25 (or any other processed version) directly from the source device 12 or from any other source device, such as a storage device (for example, an encoded code stream storage device), and to provide the encoded code stream 25 to the decoder 30A.
- the communication interface 26 and the communication interface 28 can be used to send or receive the encoded code stream (or encoded code stream data) 25 through a direct communication link between the source device 12 and the destination device 14, such as a direct wired or wireless connection, or through any type of network, such as a wired network, a wireless network, or any combination thereof, or any type of private or public network, or any combination thereof.
- the communication interface 26 can be used to encapsulate the encoded code stream 25 into a suitable format, such as a message, and/or process the encoded code stream using any type of transmission encoding or processing for transmission over a communication link or communication network.
- the communication interface 28 corresponds to the communication interface 26 , for example, can be used to receive transmission data, and use any type of corresponding transmission decoding or processing and/or decapsulation to process the transmission data to obtain the encoded code stream 25 .
- Both the communication interface 26 and the communication interface 28 can be configured as one-way communication interfaces, as indicated by the arrow in FIG. 1A pointing from the source device 12 over the communication channel 27 to the destination device 14, or as two-way communication interfaces, and can be used to send and receive messages and the like to establish a connection, and to acknowledge and exchange any other information related to the communication link and/or to data transmission such as encoded image data transmission.
- the decoder 30A comprises a decoding network 34 , an entropy decoding 30 and, optionally, a post-processor 32 .
- the entropy decoding 30 is used to receive the encoded code stream 25 and provide the decoding characteristic data 31 according to the probability estimation result 42 provided by the probability estimation 40 .
- the post-processor 32 is used to post-process the decoded feature data 31 to obtain post-processed decoded feature data 33 .
- the post-processing performed by the post-processing unit 32 may include, for example, color format conversion (for example, conversion from YCbCr to RGB), color correction, trimming, or resampling. It can be understood that the post-processing unit 32 may be an optional component.
- the decoding network 34 is used to receive the decoded characteristic data 31 or post-processed decoded characteristic data 33 and provide reconstructed image data 35 .
- the post-processor 36 is used for post-processing the reconstructed image data 35 to obtain post-processed reconstructed image data 37 .
- the post-processing performed by the post-processing unit 36 may include, for example, color format conversion (for example, conversion from YCbCr to RGB), color correction, trimming, or resampling. It is understood that the post-processing unit 36 may be an optional component.
- the display device 38 is used to receive the reconstructed image data 35 or post-processed reconstructed image data 37 to display the image to a user or a viewer.
- Display device 38 may be or include any type of player or display for representing reconstructed audio or images, eg, an integrated or external display screen or display.
- the display screen may include a liquid crystal display (LCD), an organic light emitting diode (OLED) display, a plasma display, a projector, a micro LED display, a liquid crystal on silicon (LCoS) display, a digital light processor (DLP), or any other type of display.
- Although FIG. 1A shows the source device 12 and the destination device 14 as independent devices, device embodiments may also include both devices or the functions of both, that is, the source device 12 or its corresponding functionality and the destination device 14 or its corresponding functionality. In these embodiments, the source device 12 or corresponding functionality and the destination device 14 or corresponding functionality may be implemented using the same hardware and/or software, by separate hardware and/or software, or by any combination thereof.
- a feature data encoder 20A (such as an image feature map encoder or an audio feature variable encoder) or a feature data decoder 30A (such as an image feature map decoder or an audio feature variable decoder), or both, can be implemented by processing circuitry, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, hardware, dedicated processors for image encoding, or any combination thereof.
- Feature data encoder 20A can be implemented by processing circuit 56 and feature data decoder 30A can be implemented by processing circuit 56 .
- the processing circuitry 56 may be used to perform the various operations discussed below. If part of the technology is implemented in software, a device can store the software instructions in a suitable non-transitory computer-readable storage medium and execute the instructions in hardware using one or more processors, thereby performing the techniques of the present invention.
- Either or both of the feature data encoder 20A and the feature data decoder 30A may be integrated in a single device as part of a combined codec (encoder/decoder, CODEC), as shown in FIG. 1B.
- The source device 12 and the destination device 14 may comprise any of a variety of devices, including any type of handheld or stationary device, such as a notebook or laptop computer, cell phone, smartphone, tablet or tablet computer, camera, desktop computer, set-top box, television, display device, digital media player, video game console, video streaming device (such as a content service server or content distribution server), broadcast receiving device, broadcast transmitting device, etc., and may use no operating system or any type of operating system.
- source device 12 and destination device 14 may be equipped with components for wireless communication. Accordingly, source device 12 and destination device 14 may be wireless communication devices.
- the decoding system 10 shown in FIG. 1A is merely exemplary, and the techniques provided herein can be applied to image feature map or audio feature variable encoding settings (e.g., image feature map encoding or decoding); these settings do not necessarily include any data communication between the encoding device and the decoding device.
- data is retrieved from local storage, sent over a network, and so on.
- the image feature map or audio feature variable encoding device may encode data and store the data in memory, and/or the image feature map or audio feature variable decoding device may retrieve data from memory and decode the data.
- encoding and decoding are performed by devices that do not communicate with each other but simply encode data to memory and/or retrieve and decode data from memory.
- FIG. 1B is an illustrative diagram of an example of a coding system 50 including the feature data encoder 20A and/or the feature data decoder 30A of FIG. 1A, according to an example embodiment.
- the decoding system 50 may include an imaging (or audio generating) device 51, an encoder 20A, a decoder 30A (and/or a feature data encoder/decoder implemented by a processing circuit 56), an antenna 52, one or more processors 53, one or more memory stores 54, and/or a display (or audio playback) device 55.
- the imaging (or audio producing) device 51, the antenna 52, the processing circuit 56, the encoder 20A, the decoder 30A, the processor 53, the memory storage 54, and/or the display (or audio playback) device 55 can communicate with each other.
- coding system 50 may include only encoder 20A or only decoder 30A.
- antenna 52 may be used to transmit or receive an encoded bitstream of characteristic data.
- a display (or audio playback) device 55 may be used to present image (or audio) data.
- the processing circuit 56 may include application-specific integrated circuit (ASIC) logic, a graphics processor, a general-purpose processor, and the like.
- the decoding system 50 can also include an optional processor 53, which can similarly include application-specific integrated circuit (ASIC) logic, a graphics processor, an audio processor, a general-purpose processor, etc.
- the memory storage 54 can be any type of memory, such as volatile memory (for example, static random access memory (SRAM), dynamic random access memory (DRAM), etc.) or non-volatile memory (for example, flash memory, etc.).
- memory storage 54 may be implemented by cache memory.
- processing circuitry 56 may include memory (eg, cache, etc.) for implementing an image buffer or the like.
- encoder 20A implemented with logic circuitry may include an image buffer (eg, implemented with processing circuitry 56 or memory storage 54 ) and a graphics processing unit (eg, implemented with processing circuitry 56 ).
- a graphics processing unit may be communicatively coupled to the image buffer.
- Graphics processing unit may include encoder 20A implemented by processing circuitry 56 .
- Logic circuits may be used to perform the various operations discussed herein.
- the decoder 30A may be implemented by the processing circuitry 56 in a similar manner, so as to implement the various modules discussed with reference to the decoder 30 of FIG. 1B and/or any other decoder system or subsystem described herein.
- logic circuit implemented decoder 30A may include an image buffer (implemented by processing circuit 56 or memory storage 54 ) and a graphics processing unit (eg, implemented by processing circuit 56 ).
- a graphics processing unit may be communicatively coupled to the image buffer.
- Graphics processing unit may include image decoder 30A implemented by processing circuitry 56 .
- antenna 52 may be used to receive an encoded bitstream of image data.
- an encoded bitstream may contain data related to encoding audio or video frames, indicators, index values, mode selection data, etc., as discussed herein, such as data related to encoding partitions.
- Coding system 50 may also include decoder 30A coupled to antenna 52 and used to decode the encoded bitstream.
- a display (or audio playback) device 55 is used to present images (or audio).
- the decoder 30A may be used to perform the reverse process.
- the decoder 30A may be configured to receive and parse such syntax elements, decoding the associated image data accordingly.
- encoder 20A may entropy encode the syntax elements into an encoded bitstream. In such instances, decoder 30A may parse such syntax elements and decode the associated image data accordingly.
- FIG. 1C is a schematic diagram of a decoding device 400 provided by an embodiment of the present invention.
- the decoding device 400 is suitable for implementing the disclosed embodiments described herein.
- the decoding device 400 may be a decoder, such as the image feature map decoder 30A in FIG. 1A , or an encoder, such as the image feature map encoder 20A in FIG. 1A .
- the image decoding device 400 includes: an ingress port 410 (or input port 410) and a receiving unit (receiver unit, Rx) 420 for receiving data; a processor, logic unit, or central processing unit (CPU) 430 for processing data;
- the processor 430 here can be a neural network processor 430; a transmitting unit (transmitter unit, Tx) 440 and an egress port 450 (or output port 450) for transmitting data; and a memory 460.
- the image (or audio) decoding device 400 may also include optical-to-electrical (OE) components and electrical-to-optical (EO) components for the egress or ingress of optical or electrical signals.
- the processor 430 is realized by hardware and software.
- Processor 430 may be implemented as one or more processor chips, cores (eg, multi-core processors), FPGAs, ASICs, and DSPs.
- the processor 430 is in communication with the ingress port 410 , the receiving unit 420 , the transmitting unit 440 , the egress port 450 and the memory 460 .
- the processor 430 includes a decoding module 470 (eg, a neural network NN based decoding module 470 ).
- the decoding module 470 implements the embodiments disclosed above. For example, the decoding module 470 performs, processes, prepares, or provides for the various encoding operations.
- the decoding module 470 is implemented with instructions stored in the memory 460 and executed by the processor 430.
- The memory 460 includes one or more magnetic disks, tape drives, and solid-state drives, and can be used as an overflow data storage device for storing programs when such programs are selected for execution, and for storing instructions and data that are read during program execution.
- The memory 460 may be volatile and/or nonvolatile, and may be read-only memory (ROM), random access memory (RAM), ternary content-addressable memory (TCAM), and/or static random-access memory (SRAM).
- FIG. 1D is a simplified block diagram of an apparatus 500 provided in an exemplary embodiment.
- the apparatus 500 may be used as either or both of the source device 12 and the destination device 14 in FIG. 1A .
- Processor 502 in apparatus 500 may be a central processing unit.
- processor 502 may be any other type of device or devices, existing or to be developed in the future, capable of manipulating or processing information. While the disclosed implementations can be implemented using a single processor, such as processor 502 as shown, it is faster and more efficient to use more than one processor.
- memory 504 in apparatus 500 may be a read only memory (ROM) device or a random access memory (RAM) device. Any other suitable type of storage device may be used as memory 504 .
- Memory 504 may include code and data 506 accessed by processor 502 via bus 512 .
- Memory 504 may also include an operating system 508 and application programs 510, including at least one program that allows processor 502 to execute methods described herein.
- the application program 510 may include applications 1 through N, and further includes an image decoding application that performs the methods described herein.
- Apparatus 500 may also include one or more output devices, such as display 518 .
- display 518 may be a touch-sensitive display that combines the display with touch-sensitive elements that may be used to sense touch input.
- Display 518 may be coupled to processor 502 via bus 512 .
- Although the bus 512 in the apparatus 500 is described herein as a single bus, the bus 512 may include multiple buses. Additionally, secondary storage may be directly coupled to the other components of the apparatus 500 or accessed over a network, and may comprise a single integrated unit, such as a memory card, or multiple units, such as multiple memory cards. Accordingly, the apparatus 500 may have a wide variety of configurations.
- FIG. 2A shows a system architecture 1800 in a possible image feature map or audio feature variable encoding and decoding scenario, including:
- Collection device 1801: the video (or audio) collection device completes the original video (or audio) collection;
- Pre-collection processing 1802: the collected original video (or audio) goes through a series of pre-processing steps to obtain video (or audio) data;
- Encoding 1803: video (or audio) coding is used to reduce coding redundancy and the amount of data transmitted during compression of image feature maps or audio feature variables;
- Sending 1804: the compressed code stream data obtained after encoding is sent through the sending module;
- Receiving 1805: the compressed code stream data is received by the receiving module through network transmission;
- Code stream decoding 1806: code stream decoding is performed on the code stream data;
- Rendering and displaying (or playing) 1807: the decoded data is rendered and displayed (or played).
- FIG. 2B shows a possible image feature map (or audio feature variable) oriented system architecture 1900 in a machine task scenario, including:
- Feature extraction 1901: feature extraction is performed on the image (or audio) source;
- Side information extraction 1902: side information is extracted from the feature extraction data;
- Probability estimation 1903: the side information is used as the input of probability estimation, and probability estimation is performed on the feature map (or feature variable) to obtain the probability estimation result;
- Encoding 1904: entropy encoding is performed on the feature extraction data in combination with the probability estimation result to obtain a code stream;
- a quantization or rounding operation is performed on the feature extraction data before encoding, and then the quantized or rounded feature extraction data is encoded.
- entropy coding is performed on the side information, so that the code stream includes side information data.
- Decoding 1905: entropy decoding is performed on the code stream in combination with the probability estimation results to obtain the image feature map (or audio feature variable);
- when the code stream includes side information coded data, entropy decoding is performed on the side information coded data, and the decoded side information data is used as an input of probability estimation to obtain the probability estimation result.
- when the input of probability estimation does not include context information, the probability estimation results of the feature elements can be output in parallel; when the input of probability estimation includes context information, the probability estimation results of the feature elements need to be output serially.
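- The parallel versus serial behavior described above can be sketched as follows; the toy `estimate` model is hypothetical and merely stands in for the probability estimation network.

```python
def estimate(side_info: float, context=()) -> float:
    """Hypothetical per-element probability parameter: side information,
    optionally corrected by the already-decoded neighbouring elements."""
    if context:
        return 0.5 * side_info + 0.5 * (sum(context) / len(context))
    return side_info

def probs_parallel(side_info: float, n: int) -> list:
    # No context in the input: each element depends only on the side
    # information, so all n results can be produced independently (in parallel).
    return [estimate(side_info) for _ in range(n)]

def probs_serial(side_info: float, symbols: list) -> list:
    # Context in the input: element i depends on elements 0..i-1, so the
    # results must be produced one after another (serially).
    return [estimate(side_info, symbols[:i]) for i in range(len(symbols))]
```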
- the side information is feature information obtained by inputting the image feature map or audio feature variable into a neural network for further extraction; the number of feature elements contained in the side information is less than that of the image feature map or audio feature variable.
- side information of image feature maps or audio feature variables can be encoded into the code stream.
- Machine vision task 1906: a machine vision (or hearing) task is performed on the decoded feature map (or feature variable).
- the decoded feature data is input into the machine vision (or auditory) task network, and the network output is one-dimensional, two-dimensional or multi-dimensional data related to visual (or auditory) tasks such as classification, target recognition, semantic segmentation and other tasks.
- the feature extraction and encoding processes are implemented on the terminal, and the decoding and execution of machine vision tasks are implemented on the cloud.
- the encoder 20A is operable to receive images (or image data) or audio (or audio data) 17 via an input 202 or the like.
- the received image, image data, audio, and audio data may also be preprocessed image (or preprocessed image data) or audio (or preprocessed audio data) 19 .
- The image (or audio) 17 may also be referred to as the current image or the image to be encoded (especially in video encoding, to distinguish the current image from other images, such as previously encoded and/or decoded images of the same video sequence, that is, the video sequence that also includes the current image), or as the current audio or the audio to be encoded.
- a (digital) image is or can be viewed as a two-dimensional array or matrix of pixel points with intensity values. A pixel point in the array may also be referred to as a pixel (pixel or pel, short for picture element). The number of pixels in the array or image in the horizontal and vertical directions (or axes) determines the size and/or resolution of the image. In order to represent a color, three color components are usually used, that is, an image can be represented as or include three pixel arrays. In the RGB format or color space, an image includes corresponding red, green, and blue pixel arrays.
- each pixel can be expressed in a luminance/chroma format or color space, such as YCbCr, including a luminance component indicated by Y (also denoted by L sometimes) and two chrominance components indicated by Cb and Cr.
- the luminance (luma) component Y represents brightness or grayscale level intensity (e.g., both are the same in a grayscale image), while the two chrominance (chroma) components Cb and Cr represent chrominance or color information components .
- an image in the YCbCr format includes a luminance pixel point array of luminance pixel point values (Y) and two chrominance pixel point arrays of chrominance values (Cb and Cr).
- Images in RGB format can be converted or transformed to YCbCr format and vice versa; this process is also known as color transformation or conversion. If an image is black and white, the image may only include an array of luminance pixels. Correspondingly, an image can be, for example, an array of luma pixels in monochrome format, or an array of luma pixels and two corresponding arrays of chrominance pixels in the 4:2:0, 4:2:2, and 4:4:4 color formats.
- the image encoder 20A places no limitation on the color space of the image.
- an embodiment of the encoder 20A may include an image (or audio) segmentation unit (not shown in FIG. 1A or 1B ) for segmenting the image (or audio) 17 into multiple (typically non-overlapping ) image blocks 203 or audio segments.
- image blocks can also be called root blocks, macro blocks (H.264/AVC) or coding tree blocks (Coding Tree Block, CTB) in the H.265/HEVC and VVC standards, or coding tree units (Coding Tree Unit, CTU).
- the segmentation unit can be used to use the same block size for all images of a video sequence, together with a corresponding grid that defines the block size, or to change the block size between images, subsets, or groups of images, and to segment each image into corresponding blocks.
- the encoder can be adapted to directly receive blocks 203 of an image 17 , for example one, several or all blocks making up said image 17 .
- the image block 203 may also be referred to as a current image block or an image block to be encoded.
- the image block 203 is also or can be regarded as a two-dimensional array or matrix composed of pixels with intensity values (pixel values), but the image block 203 is smaller in size than the image 17.
- the block 203 may comprise one pixel point array (for example, a luminance array in the case of a monochrome image 17, or a luminance array or a chrominance array in the case of a color image), or three pixel point arrays (for example, one luma array and two chrominance arrays in the case of a color image 17), or any other number and/or type of arrays depending on the color format employed.
- a block may be an array of M ⁇ N (M columns ⁇ N rows) pixel points, or an array of M ⁇ N transform coefficients, and the like.
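- Splitting an image into non-overlapping M×N pixel blocks, as described above, can be sketched as follows (a minimal illustration assuming the image dimensions are multiples of the block size):

```python
def split_into_blocks(image, block_h: int, block_w: int):
    """Split an H x W pixel array (list of lists) into non-overlapping
    block_h x block_w blocks, scanned left-to-right, top-to-bottom.
    H and W are assumed to be multiples of the block size."""
    h, w = len(image), len(image[0])
    blocks = []
    for top in range(0, h, block_h):
        for left in range(0, w, block_w):
            blocks.append([row[left:left + block_w]
                           for row in image[top:top + block_h]])
    return blocks
```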
- the encoder 20A shown in FIGS. 1A-1B or 3A-3D is used to encode the image 17 block by block.
- the encoder 20A shown in FIGS. 1A-1B or 3A-3D is used to encode the image 17 .
- the encoder 20A shown in FIGS. 1A-1B or 3A-3D can also be used to partition and/or encode an image using slices (also called video slices), where an image can be partitioned or encoded using one or more (usually non-overlapping) slices. Each slice may include one or more blocks (for example, coding tree units (CTUs)) or one or more block groups (for example, coding blocks (tiles) in the H.265/HEVC/VVC standard and subpictures in the VVC standard).
- the encoder 20A shown in FIGS. 1A-1B or 3A-3D can also be used to partition and/or encode an image using slices/coding block groups (also called video coding block groups) and/or coding blocks (also called video coding blocks), where an image can be partitioned or encoded using one or more (usually non-overlapping) slices/coding block groups; each slice/coding block group may include one or more blocks (such as CTUs) or one or more coding blocks, etc., wherein each coding block may be rectangular or the like and may include one or more complete or partial blocks (such as CTUs).
- the coding network 20 is used to obtain image feature maps or audio feature variables according to the input data through the coding network.
- the encoding network 20 is as shown in FIG. 4A .
- the encoding network 20 includes multiple network layers, and any network layer may be a convolutional layer, a normalization layer, a nonlinear activation layer, and the like.
- the input of the encoding network 20 is at least one image to be encoded or at least one image block to be encoded.
- the image to be encoded can be an original image, a lossy image or a residual image.
- an example of the network structure of the encoding network 20 is shown in FIG. 4B. It can be seen that the encoding network in this example includes 5 network layers, specifically three convolutional layers and two nonlinear activation layers.
- the rounding is used to round the image feature map or the audio feature variable by, for example, scalar quantization or vector quantization, to obtain the rounded image feature map or audio feature variable.
- the encoder 20A can be used to output the rounding parameter (quantization parameter, QP), for example, directly or after being encoded or compressed by the encoding decision implementation unit, so that the decoder 30A can receive and use the quantization parameter for decoding.
- the output feature map or audio feature variables are preprocessed before being rounded, and the preprocessing may include trimming, color format conversion (for example, from RGB to YCbCr), color correction, or denoising.
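- The rounding stage described above can be sketched as scalar quantization: scale by a step size derived from the quantization parameter QP (the exact QP-to-step mapping is not specified here, so a direct step value is assumed), then round to the nearest integer.

```python
import math

def quantize(feature_map, step: float):
    """Scalar quantization: divide by the step size and round to the
    nearest integer (ties rounded up, via floor(x + 0.5))."""
    return [math.floor(x / step + 0.5) for x in feature_map]

def dequantize(indices, step: float):
    """Decoder-side inverse mapping: quantization indices back to values."""
    return [i * step for i in indices]
```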
- the probability estimation is based on the input feature map or feature variable information to obtain the probability estimation result of the image feature map or audio feature variable.
- Probability estimation is used to perform probability estimation on rounded image feature maps or audio feature variables.
- the probability estimation may be a probability estimation network; the probability estimation network is a convolutional network, and the convolutional network includes convolutional layers and nonlinear activation layers. Taking FIG. 4B as an example, the probability estimation network includes 5 network layers, specifically three convolutional layers and two nonlinear activation layers. Probability estimation can also be realized by traditional, non-network probability estimation methods, including but not limited to statistical methods such as maximum likelihood estimation and maximum a posteriori estimation.
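- A common parametric realization of such a probability estimate in learned compression, used here purely as an illustrative assumption rather than the disclosed model, treats each rounded feature element as Gaussian-distributed and integrates the density over each integer bin:

```python
import math

def gaussian_bin_prob(k: int, mu: float, sigma: float) -> float:
    """Probability that a rounded feature element equals the integer k under
    a Gaussian model N(mu, sigma^2): CDF(k + 0.5) - CDF(k - 0.5)."""
    def cdf(x: float) -> float:
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    return cdf(k + 0.5) - cdf(k - 0.5)
```

The resulting per-bin probabilities sum to (nearly) one over the integer support and can feed an arithmetic coder directly.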
- the implementation of coding decision includes coding element judgment and entropy coding.
- the image feature map or audio feature variable is one-dimensional, two-dimensional, or multi-dimensional data output by the encoding network, where each data item is a feature element.
- Coding element judgment 261:
- the coding element judgment is to judge each feature element in the image feature map or audio feature variable according to the probability estimation result information of the probability estimation, and decide which feature elements to perform entropy coding according to the judgment result.
- after the element judgment process of the Pth feature element of the image feature map or audio feature variable is completed, the element judgment process of the P+1th feature element begins, where P is a positive integer and P is less than M.
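- The per-element judgment loop described above can be sketched with one plausible decision rule, comparing each element's estimated peak probability against a threshold; the concrete rule below is an assumption for illustration, not the claimed decision procedure.

```python
def judge_elements(feature_elements, peak_probs, threshold: float):
    """One plausible per-element judgment: if the estimated probability of an
    element's most likely value reaches the threshold, skip it (the decoder
    can infer the most likely value); otherwise mark it for entropy coding.
    Returns the (index, value) pairs that must be entropy coded."""
    to_encode = []
    for p, (elem, prob) in enumerate(zip(feature_elements, peak_probs)):
        if prob < threshold:            # uncertain element: entropy code it
            to_encode.append((p, elem))
    return to_encode
```

Elements whose distributions are sharply peaked are thereby excluded from entropy coding, reducing the code stream size.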
- Entropy coding can use various public entropy coding algorithms, such as a variable length coding (VLC) scheme, a context adaptive VLC scheme (CAVLC), an arithmetic coding scheme, a binarization algorithm, context adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or other entropy coding methods or techniques.
- the encoded image data 25, which can be output in the form of an encoded bitstream 25 or the like through the output terminal 212, is obtained so that the decoder 30A or the like can receive and use the parameters for decoding.
- Encoded bitstream 25 may be transmitted to decoder 30A, or stored in memory for later transmission or retrieval by decoder 30A.
- the entropy coding can be coded by using an entropy coding network, for example by using a convolutional network.
- Since the entropy coding does not know the true symbol probabilities of the rounded feature map, this or related information can be added to the entropy coding and passed to the decoding end.
- the joint network is based on the input side information to obtain the probability estimation results and decision information of image feature maps or audio feature variables.
- the joint network is a multi-layer network, and the joint network may be a convolutional network, which includes a convolutional layer and a nonlinear activation layer. Any network layer of the joint network can be a convolutional layer, a normalization layer, a nonlinear activation layer, etc.
- the decision information may be one-dimensional, two-dimensional or multi-dimensional data, and the size of the decision information may be consistent with the size of the image feature map.
- the decision information can be output after any network layer in the joint network.
- the probability estimation result can be output after any network layer in the joint network.
- Figure 6 is an example of the output of the network structure of the joint network.
- the network structure includes 4 network layers, in which the decision information is output after the fourth network layer, and the probability estimation result is output after the second network layer.
- The generation network obtains the decision information of each feature element in the image feature map according to the input probability estimation result.
- the generation network is a multi-layer network, and the generation network may be a convolutional network, which includes a convolutional layer and a nonlinear activation layer. Any network layer of the generated network can be a convolutional layer, a normalization layer, a nonlinear activation layer, etc.
- The decision information can be output after any network layer in the generation network.
- the decision information may be one-dimensional, two-dimensional or multi-dimensional data.
- Fig. 7 is an example of output decision information of the network structure of the generating network, and the network structure includes 4 network layers.
- the decoding decision implementation includes element judgment and entropy decoding.
- The image feature map or audio feature variable is one-dimensional, two-dimensional or multi-dimensional data output by the decoding decision, where each element of the data is a feature element.
- the decoding element judgment judges each feature element in the image feature map or audio feature variable according to the probability estimation result of the probability estimation, and decides which feature elements to perform entropy decoding according to the judgment result.
- The decoding element judgment judges each feature element in the image feature map or audio feature variable and decides, according to the judgment result, which feature elements to entropy-decode; this mirrors the coding element judgment performed on each feature element of the image feature map at the encoding end.
- Entropy decoding can use various public entropy decoding algorithms corresponding to, for example, a variable length coding (VLC) scheme, a context adaptive VLC scheme (CAVLC), an arithmetic coding scheme, a binarization algorithm, context adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or other entropy encoding methods or techniques.
- VLC variable length coding
- CAVLC context adaptive VLC schemes
- SBAC syntax-based context-adaptive binary arithmetic coding
- PIPE probability interval partitioning entropy
- The encoded image (or audio) data 25, which can be output through the output terminal 212 in the form of an encoded bit stream 25 or the like, is obtained so that the decoder 30A or the like can receive and use the parameters for decoding.
- Encoded bitstream 25 may be transmitted to decoder 30A, or stored in memory for later transmission or retrieval by decoder 30A.
- entropy decoding can be performed using an entropy decoding network, such as a convolutional network.
- the decoding network is used to pass the decoded image feature map or audio feature variable 31 or the post-processed decoded image feature map or audio feature variable 33 through the decoding network 34 to obtain reconstructed image (or audio) data 35 or machine task-oriented data in the pixel domain.
- The decoding network contains multiple network layers, and any network layer can be a convolutional layer, a normalization layer, a nonlinear activation layer, etc. Operations such as concatenation (concat), addition, and subtraction may exist in the decoding network unit 306.
- the network layer structures in the decoding network may be the same or different from each other.
- the decoding network in the example includes 5 network layers, including a normalization layer, two convolutional layers, and two nonlinear activation layers.
- the decoding network outputs the reconstructed image (or audio), or outputs machine-oriented task data.
- the decoding network may include an object recognition network, a classification network or a semantic segmentation network.
- the processing result of the current step can be further processed, and then output to the next step.
- further operations or processing may be performed on the processing results of the encoder unit or the decoder unit, such as clipping or shifting operations or filtering processing.
- The first feature element or the second feature element is, for example, the current feature element to be encoded or the current feature element to be decoded.
- A decision graph can also be called a decision map.
- The decision graph is preferably a binary graph, and the binary graph may also be called a binary map.
- Figure 10A shows a specific implementation process 1400, and the operation steps are as follows:
- Step 1401 Get the feature map of the image
- This step is specifically implemented by the encoding network 204 in FIG. 3A , and for details, reference may be made to the above description of the encoding network 20 .
- The image is input into the feature extraction module, which outputs the feature map y of the image; the feature map y can be three-dimensional data whose dimensions are w x h x c.
- the feature extraction module can be implemented using an existing neural network, which is not limited here. This step is prior art.
- The feature quantization module quantizes each feature value in the feature map y, rounding the floating-point feature values to integer feature values to obtain the quantized feature map.
- the description of the rounding 24 in the foregoing embodiment may be referred to.
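The rounding step above can be sketched in plain Python as follows. This is an illustrative stand-in for the feature quantization module; the round-half-away-from-zero convention is an assumption, since the text only says floating-point values are rounded to integers.

```python
# Sketch of the feature quantization step: round each floating-point feature
# value to the nearest integer (half away from zero, an assumed convention;
# Python's built-in round() would use round-half-to-even instead).
def quantize_feature_map(values):
    return [int(v + 0.5) if v >= 0 else -int(-v + 0.5) for v in values]
```

For example, `quantize_feature_map([0.4, -1.6, 2.7])` yields the integer feature values `[0, -2, 3]`.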
- Step 1402 Perform probability estimation on the feature map to obtain the probability estimation result of each feature element, that is, the probability distribution of each feature element in the feature map:
- the parameters x, y, and i are positive integers
- the coordinates (x, y, i) indicate the position of the current feature element to be encoded.
- the coordinates (x, y, i) indicate the position of the current feature element to be encoded in the current three-dimensional feature map, relative to the upper left vertex.
- A probability distribution model can be used to obtain the probability distribution, for example modeling with a single Gaussian model (GSM) or a Gaussian mixture model (GMM). First, the side information and the context information are input into the probability estimation network, and probability estimation is performed on each feature element in the feature map to obtain the probability distribution of each feature element.
- The probability estimation network can be based on a deep learning network, such as a recurrent neural network (RNN) or a convolutional neural network (for example, PixelCNN), which is not limited here. The model parameters are substituted into the probability distribution model to obtain the probability distribution.
- Step 1403 Perform entropy coding on the feature map to generate a compressed code stream.
- According to the probability distribution of the current feature element to be encoded, obtain the probability P that its value is k. When the probability estimation result P of the current feature element to be encoded does not meet the preset condition, that is, when P is greater than (or equal to) the first threshold T0, skip the entropy encoding process for the current feature element to be encoded; otherwise, when P meets the preset condition, that is, when P is smaller than the first threshold T0, perform entropy encoding on the current feature element to be encoded and write it into the code stream.
- k can be any integer, such as 0, 1, -1, 2, 3 and so on.
- The first threshold T0 is any number satisfying 0 < T0 < 1, such as 0.99, 0.98, 0.97, 0.95 and so on. (The threshold of each feature element can be considered the same.)
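The skip rule of step 1403 can be sketched as a small decision function. This is illustrative only; the function name and the flat list of per-element probabilities are assumptions, not part of the described method.

```python
# Sketch of the step-1403 coding decision: an element whose probability of
# equaling k is at least the first threshold T0 is skipped; every other
# element must be entropy coded into the code stream.
def coding_decisions(probs_of_k, t0=0.98):
    """Return True for each feature element that must be entropy coded."""
    return [p < t0 for p in probs_of_k]
```

An element with P = 0.99 of taking the value k would be skipped here, and the decoder would reconstruct it as k without reading any bits for it.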
- Step 1404 the encoder sends or stores the compressed code stream.
- Step 1411 Obtain the code stream of the decoded image feature map
- Step 1412 Perform probability estimation according to the code stream to obtain the probability estimation results of each feature element
- Perform probability estimation on each feature element in the feature map to be decoded to obtain the probability distribution of the feature element to be decoded.
- The feature map to be decoded includes a plurality of feature elements, and the plurality of feature elements include the feature element currently to be decoded.
- the probability estimation network structure diagram used at the decoding end is the same as the probability estimation network structure at the encoding end in this embodiment.
- Step 1413 Perform entropy decoding on the feature map to be decoded
- This step is specifically implemented by the decoding decision implementation 304 in FIG. 10B; for details, refer to the above description of the decoding decision implementation 30.
- According to the probability distribution of the current feature element to be decoded, obtain the probability P that its value is k, that is, the probability estimation result P of the current feature element to be decoded.
- The first threshold T0 can be obtained by parsing the code stream: the index number is obtained from the code stream, the decoding end constructs the threshold candidate list in the same way as the encoding end, and the corresponding threshold is then obtained according to the preset correspondence between thresholds and index numbers in the threshold candidate list.
- obtaining the index number from the code stream means obtaining the index number from the sequence header, image header, Slice/strip or SEI.
- the code stream may be directly parsed, and the threshold value may be obtained from the code stream, specifically, the threshold value may be obtained from a sequence header, a picture header, a Slice/strip, or an SEI.
- Step 1414 Reconstruct from the decoded feature map, or input it to the machine vision task module to perform the corresponding machine tasks.
- This step can be specifically implemented by the decoding network 306 in FIG. 10B , and for details, reference can be made to the above description of the decoding network 34 .
- Case 1 The entropy-decoded feature map is input into the image reconstruction module, and the neural network outputs the reconstructed image.
- the neural network can adopt any structure, such as a fully connected network, a convolutional neural network, a recurrent neural network, and the like.
- the neural network can adopt a multi-layer deep neural network structure to achieve better estimation results.
- Case 2 The entropy-decoded feature map is input into the machine-vision-oriented task module to perform the corresponding machine tasks.
- complete machine vision tasks such as object classification, recognition, and segmentation.
- the above k value at the decoding end is set corresponding to the k value at the encoding end.
- FIG. 11A shows a specific implementation process 1500 of Embodiment 2 of the present application, and the operation steps are as follows:
- the probability estimation results include the first parameter and the second parameter; when the probability distribution is a Gaussian distribution, the first parameter is the mean value ⁇ , and the second parameter is the variance ⁇ ; When the probability distribution is a Laplace distribution, the first parameter is the location parameter ⁇ , and the second parameter is the scale parameter b.
- Step 1501 Get the feature map of the image
- This step is specifically implemented by the encoding network 204 in FIG. 3B , and for details, reference may be made to the above description of the encoding network 20 .
- The image is input into the feature extraction module, which outputs the feature map y of the image; the feature map y can be three-dimensional data whose dimensions are w x h x c.
- the feature extraction module can be implemented using an existing neural network, which is not limited here. This step is prior art.
- the feature quantization module quantifies each feature value in the feature map y, rounds the feature value of the floating point number to obtain the integer feature value, and obtains the quantized feature map
- Step 1502 Input the feature map of the image into the side information extraction module, which outputs the side information.
- This step is specifically implemented by the side information extraction unit 214 in FIG. 3B .
- The side information extraction module can be implemented using the network shown in Figure 12. The side information can be understood as a feature map obtained by further extraction from the feature map, and it contains fewer feature elements than the feature map.
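Purely as an illustration of "fewer feature elements", the following toy stand-in for the side information extraction module halves each spatial dimension with 2x2 average pooling; the real module is a learned network such as the one shown in Figure 12, and nothing here is taken from it.

```python
# Toy illustration only: 2x2 average pooling stands in for "further
# extraction", showing that the side information has fewer feature elements
# than the feature map it is derived from.
def toy_side_info(feature_map):
    """feature_map: 2D list with even height and width."""
    h, w = len(feature_map), len(feature_map[0])
    return [
        [
            (feature_map[2 * r][2 * c] + feature_map[2 * r][2 * c + 1]
             + feature_map[2 * r + 1][2 * c] + feature_map[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(w // 2)
        ]
        for r in range(h // 2)
    ]
```

A 4x4 input yields a 2x2 output, i.e. 4 feature elements instead of 16.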
- Entropy coding may be performed on the side information here and written into the code stream, or it may instead be performed on the side information in subsequent step 1504; this is not limited here.
- Step 1503 Perform probability estimation on the feature map to obtain the probability estimation result of each feature element.
- Probability distribution models can be used to obtain probability estimates and probability distributions.
- the probability distribution model may be: a single Gaussian model (Gaussian single model, GSM) or an asymmetric Gaussian model or a mixed Gaussian model (Gaussian mixture model, GMM) or a Laplace distribution model (Laplace distribution).
- When the probability distribution model is a Gaussian model (a single Gaussian model, an asymmetric Gaussian model, or a mixed Gaussian model):
- The side information or the context information is input into the probability estimation network, and probability estimation is performed on each feature element in the feature map to obtain the values of the mean parameter μ and the variance σ.
- the mean parameter ⁇ and variance ⁇ are input into the used probability distribution model to obtain a probability distribution.
- the probability estimation result is the mean parameter ⁇ and variance ⁇ .
- When the probability distribution model is the Laplace distribution model:
- The side information or the context information is input into the probability estimation network, and probability estimation is performed on each feature element in the feature map to obtain the values of the location parameter μ and the scale parameter b.
- the position parameter ⁇ and the scale parameter b are input into the probability distribution model used to obtain a probability distribution.
- the probability estimation result is the position parameter ⁇ and the scale parameter b.
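From the first and second parameters, the probability that a feature element takes an integer value k can be computed by integrating the density over [k - 0.5, k + 0.5]. This discretization is a common convention in learned compression and is an assumption here, since the text does not fix how P is derived from the parameters.

```python
import math

# Probability that an element equals integer k under the estimated
# distribution, using CDF differences over [k - 0.5, k + 0.5] (assumed
# discretization convention).
def gaussian_prob(k, mu, sigma):
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    return cdf(k + 0.5) - cdf(k - 0.5)

def laplace_cdf(x, mu, b):
    if x < mu:
        return 0.5 * math.exp((x - mu) / b)
    return 1.0 - 0.5 * math.exp(-(x - mu) / b)

def laplace_prob(k, mu, b):
    return laplace_cdf(k + 0.5, mu, b) - laplace_cdf(k - 0.5, mu, b)
```

With a small σ (or small b) and μ near k, this probability approaches 1, which is exactly the regime in which the methods below skip entropy coding.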
- The probability estimation network performs probability estimation on the feature map to be encoded, and the probability P that the current feature element to be encoded takes the value m is obtained; this probability P is the probability estimation result.
- The probability estimation network can use a network based on deep learning, such as a recurrent neural network (RNN) or a convolutional neural network (for example, PixelCNN), which is not limited here.
- Step 1504 Judge, according to the probability estimation result, whether entropy coding needs to be performed on the current feature element to be encoded, and, according to the judgment result, either perform entropy coding and write into the compressed code stream (encoded code stream) or skip entropy coding. Only when it is determined that entropy coding needs to be performed on the first feature element currently to be encoded is entropy coding performed on that feature element.
- This step is specifically implemented by the encoding decision implementation 208 in FIG. 3B , and for details, refer to the description of the above encoding decision implementation 26 .
- Judging, according to the probability estimation result, whether entropy coding needs to be performed on the current feature element to be encoded can use one or more of the following methods.
- the parameters x, y, and i are positive integers
- the coordinates (x, y, i) indicate the position of the current feature element to be encoded.
- the coordinates (x, y, i) indicate the position of the current feature element to be encoded in the current three-dimensional feature map, relative to the upper left vertex.
- Method 1 When the probability distribution model is a Gaussian distribution, judge whether to perform entropy coding on the current feature element to be encoded according to the probability estimation result of the first feature element. When the values of the mean parameter μ and the variance σ of the Gaussian distribution of the current feature element to be encoded do not meet the preset condition, that is, when the absolute value of the difference between the mean μ and k is less than the second threshold T1 and the variance σ is less than the third threshold T2, there is no need to execute the entropy encoding process for the current feature element to be encoded; otherwise, when the preset condition is met, that is, when the absolute value of the difference between the mean μ and k is greater than or equal to the second threshold T1 or the variance σ is greater than or equal to the third threshold T2, perform entropy encoding on the current feature element to be encoded and write it into the code stream.
- k is any integer, such as 0, 1, -1, 2, 3 and so on.
- The value of T2 is any number satisfying 0 < T2 < 1, such as 0.2, 0.3, 0.4, etc.
- T1 is a number greater than or equal to 0 and less than 1, such as 0.01, 0.02, 0.001, 0.002.
- When the value of k is 0 (the optimal value), it can be directly judged that when the absolute value of the mean parameter μ of the Gaussian distribution is less than T1 and the variance σ of the Gaussian distribution is less than T2, the entropy encoding process is skipped for the feature element to be encoded; otherwise, entropy encoding is performed on the current feature element to be encoded and written into the code stream.
- The value of T2 is any number satisfying 0 < T2 < 1, for example 0.2, 0.3, 0.4 and so on.
- T1 is a number greater than or equal to 0 and less than 1, such as 0.01, 0.02, 0.001, 0.002.
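Method 1 can be sketched as a small predicate. The names are illustrative, and the default thresholds are merely examples drawn from the value ranges stated above.

```python
# Sketch of Method 1 (Gaussian case): skip entropy coding when the element is
# almost certainly k, i.e. |mu - k| < T1 and sigma < T2; otherwise code it.
def needs_entropy_coding_gaussian(mu, sigma, k=0, t1=0.01, t2=0.2):
    return not (abs(mu - k) < t1 and sigma < t2)
```

With k = 0, a near-zero mean and a small variance (e.g. μ = 0.001, σ = 0.05) the element is skipped; a mean far from 0 forces entropy coding.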
- Method 2 When the probability distribution is a Gaussian distribution, obtain, according to the probability estimation result, the values of the mean parameter μ and the variance σ of the Gaussian distribution of the current feature element to be encoded. When the relationship between μ, σ and k satisfies abs(μ-k)+σ < T3 (the preset condition is not met), skip the entropy encoding process for the current feature element to be encoded, where abs(μ-k) means the absolute value of the difference between the mean μ and k; otherwise, when the probability estimation result of the current feature element to be encoded satisfies abs(μ-k)+σ >= T3 (the preset condition), perform entropy encoding on the current feature element to be encoded and write it into the code stream.
- k is any integer, such as 0, 1, -1, -2, 3 and so on.
- the fourth threshold T3 is a number greater than or equal to 0 and less than 1, such as 0.2, 0.3
- Method 3 When the probability distribution is a Laplace distribution, obtain, according to the probability estimation result, the values of the location parameter μ and the scale parameter b of the Laplace distribution of the current feature element to be encoded. When μ, b and k satisfy abs(μ-k)+b < T4 (the preset condition is not met), skip the entropy encoding process for the current feature element to be encoded, where abs(μ-k) means the absolute value of the difference between the location parameter μ and k; otherwise, when the probability estimation result of the current feature element to be encoded satisfies abs(μ-k)+b >= T4 (the preset condition), perform entropy encoding on the current feature element to be encoded and write it into the code stream.
- k is any integer, such as 0, 1, -1, -2, 3 and so on.
- the fourth threshold T4 is a number greater than or equal to 0 and less than 0.5, such as 0.05, 0.09, 0.17
- Method 4 When the probability distribution is a Laplace distribution, obtain, according to the probability estimation result, the values of the location parameter μ and the scale parameter b of the Laplace distribution of the current feature element to be encoded. When the absolute value of the difference between the location parameter μ and k is less than the second threshold T5 and the scale parameter b is less than the third threshold T6 (the preset condition is not met), skip the entropy encoding process for the current feature element to be encoded; otherwise, when the absolute value of the difference between the location parameter μ and k is greater than or equal to the second threshold T5 or the scale parameter b is greater than or equal to the third threshold T6 (the preset condition), perform entropy encoding on the current feature element to be encoded and write it into the code stream.
- k is any integer, such as 0, 1, -1, -2, 3 and so on.
- the value of T5 is 1e-2
- The value of T6 is any number satisfying T6 < 0.5, such as 0.05, 0.09, 0.17, etc.
- When the value of k is 0 (the optimal value), it can be directly judged that when the absolute value of the location parameter μ is less than T5 and the scale parameter b is less than T6, the entropy encoding process is skipped for the feature element to be encoded; otherwise, entropy encoding is performed on the current feature element to be encoded and written into the code stream.
- The value of the threshold T5 is, for example, 1e-2, and the value of T6 is any number satisfying T6 < 0.5, such as 0.05, 0.09, 0.17 and so on.
- Method 5 When the probability distribution is a mixed Gaussian distribution, obtain, according to the probability estimation result, the values of all mean parameters μi and variances σi of the mixed Gaussian distribution of the current feature element to be encoded. When the sum of the absolute values of the differences between all means of the mixed Gaussian distribution and k, plus any one variance of the mixed Gaussian distribution, is less than the fifth threshold T7 (the preset condition is not met), skip the entropy encoding process for the current feature element to be encoded; otherwise, when the sum of the absolute values of the differences between all means of the mixed Gaussian distribution and k, plus any one variance of the mixed Gaussian distribution, is greater than or equal to the fifth threshold T7 (the preset condition), perform entropy encoding on the current feature element to be encoded and write it into the code stream.
- k is any integer, such as 0, 1, -1, -2, 3 and so on.
- T7 is a number greater than or equal to 0 and less than 1, such as 0.2, 0.3, 0.4, etc. (The threshold of each feature element can be considered the same.)
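Method 5 can be sketched as follows. The text's combination rule ("the sum of ... and any variance") is read here as: skip if the mean-difference sum plus at least one variance stays below T7, which is equivalent to adding the smallest variance; treat this reading as an assumption.

```python
# Sketch of Method 5 (Gaussian mixture case): sum |mu_i - k| over all mixture
# components, add the smallest variance (one reading of "any variance" -- an
# assumption), and entropy-code only when the total reaches T7.
def needs_entropy_coding_gmm(mus, sigmas, k=0, t7=0.2):
    score = sum(abs(mu - k) for mu in mus) + min(sigmas)
    return score >= t7
```

A mixture tightly concentrated at k (all means near k, one small variance) is skipped; otherwise the element is coded.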
- Method 6 According to the probability distribution, obtain the probability P that the current feature element to be encoded takes the value k. When the probability estimation result P of the current feature element to be encoded does not meet the preset condition, that is, when P is greater than (or equal to) the first threshold T0, skip the entropy encoding process for the current feature element to be encoded; otherwise, when P meets the preset condition, that is, when P is smaller than the first threshold T0, perform entropy encoding on the current feature element to be encoded and write it into the code stream.
- k can be any integer, such as 0, 1, -1, 2, 3 and so on.
- The first threshold T0 is any number satisfying 0 < T0 < 1, such as 0.99, 0.98, 0.97, 0.95 and so on. (The threshold of each feature element can be considered the same.)
- The thresholds T1, T2, T3, T4, T5 and T6 can be converted to integer form, that is, shifted and scaled into integers.
- Method 1 Take the threshold T1 as an example, take any value within the value range of T1 as the threshold T1, and write the threshold T1 into the code stream. Specifically, the threshold is written into the code stream, which can be stored in the sequence header, image header, Slice/strip or SEI and sent to the decoding end, and other methods can also be used, which are not limited here. Similar methods can also be used for the remaining thresholds T0, T2, T3, T4, T5 and T6.
- Method 2 The encoding end adopts the fixed threshold value agreed with the decoding end, and there is no need to write the code stream or transmit it to the decoding end. For example, taking the threshold T1 as an example, any value within the value range of T1 is directly taken as the value of T1. Similar methods can also be used for the remaining thresholds T0, T2, T3, T4, T5 and T6.
- Method 3 Build a threshold candidate list by putting the most likely values within the value range of T1 into the list, with each threshold corresponding to a threshold index number. Determine an optimal threshold, use the optimal threshold as T1, use its index number as the threshold index number of T1, and write the threshold index number of T1 into the code stream.
- the threshold is written into the code stream, which can be stored in the sequence header, image header, Slice/strip or SEI and sent to the decoding end, and other methods can also be used, which are not limited here. Similar methods can also be used for the remaining thresholds T0, T2, T3, T4, T5 and T6.
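Threshold signaling Method 3 can be sketched as follows. The candidate values are illustrative; the scheme only requires the encoder and decoder to build identical lists so that an index fully identifies a threshold.

```python
# Sketch of threshold signaling via a candidate list: both ends build the same
# list, and only the index of the chosen threshold is written to the stream.
# Candidate values below are illustrative examples from the stated T1 range.
CANDIDATES = [0.01, 0.02, 0.001, 0.002]

def encode_threshold(t1):
    return CANDIDATES.index(t1)   # index number written into the code stream

def decode_threshold(index):
    return CANDIDATES[index]      # decoder rebuilds the same list and looks up
```

Transmitting a small index instead of the threshold value itself keeps the per-stream overhead to a few bits.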
- Step 1505 the encoder sends or stores the compressed code stream.
- Step 1511 Obtain the code stream of the feature map of the image to be decoded
- Step 1512 Obtain the probability estimation results of each feature element
- This step is specifically implemented by the probability estimation unit 302 in FIG. 11A , and for details, refer to the above description of the probability estimation 40 .
- Perform entropy decoding on the side information code stream to obtain the side information; combining the side information, perform probability estimation on each feature element in the feature map to be decoded to obtain the probability estimation result of the current feature element to be decoded.
- the probability estimation method used by the decoder is the same as the probability estimation method of the encoder in this embodiment, and the probability estimation network structure diagram is the same as the probability estimation network structure of the encoder in this embodiment, so details are not repeated here.
- Step 1513 Judge, according to the probability estimation result, whether entropy decoding needs to be performed on the current feature element to be decoded, and perform or skip entropy decoding according to the judgment result to obtain the decoded feature map. This step is specifically implemented by the decoding decision implementation 304 in FIG. 11A; for details, refer to the above description of the decoding decision implementation 30.
- Judging, according to the probability estimation result, whether entropy decoding needs to be performed on the current feature element to be decoded can use one or more of the following methods.
- Method 1 When the probability distribution model is a Gaussian distribution, obtain, according to the probability estimation result, the values of the mean parameter μ and the variance σ of the current feature element to be decoded. When the absolute value of the difference between the mean μ and k is less than the second threshold T1 and the variance σ is less than the third threshold T2 (the preset condition is not met), set the value of the current feature element to be decoded to k and skip the entropy decoding process; otherwise, when the absolute value of the difference between the mean μ and k is greater than or equal to the second threshold T1 or the variance σ is greater than or equal to the third threshold T2 (the preset condition), perform entropy decoding on the current feature element to be decoded to obtain its value.
- When the value of k is 0 (the optimal value), it can be directly judged that when the absolute value of the mean parameter μ of the Gaussian distribution is less than T1 and the variance σ of the Gaussian distribution is less than T2, the value of the current feature element to be decoded is set to k and the entropy decoding process is skipped; otherwise, entropy decoding is performed on the current feature element to be decoded to obtain its value.
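The decoder side of Method 1 can be sketched as follows. Here `entropy_decode` is a placeholder for the real entropy decoder, and the default thresholds are merely example values from the ranges above.

```python
# Decoder-side sketch of Method 1: when the skip condition holds, the
# element's value is set to k without reading any bits; otherwise it is
# entropy-decoded from the stream (entropy_decode is a placeholder).
def decode_element_gaussian(mu, sigma, entropy_decode, k=0, t1=0.01, t2=0.2):
    if abs(mu - k) < t1 and sigma < t2:
        return k                  # skipped at the encoder, so infer value k
    return entropy_decode()
```

Because the decoder applies the same μ/σ test as the encoder, both ends agree on which elements were skipped without any extra signaling.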
- Method 2 When the probability distribution is a Gaussian distribution, obtain, according to the probability estimation result, the values of the mean parameter μ and the variance σ of the current feature element to be decoded. When the relationship between μ, σ and k satisfies abs(μ-k)+σ < T3 (the preset condition is not met), where T3 is the fourth threshold, set the value of the current feature element to be decoded to k and skip the entropy decoding process; otherwise, when the probability estimation result of the current feature element to be decoded satisfies abs(μ-k)+σ >= T3 (the preset condition), perform entropy decoding on the current feature element to be decoded to obtain its value.
- Method 3 When the probability distribution is a Laplace distribution, obtain the values of the location parameter μ and the scale parameter b according to the probability estimation result. When μ, b and k satisfy abs(μ-k)+b < T4 (the preset condition is not met), where T4 is the fourth threshold, set the value of the current feature element to be decoded to k and skip the entropy decoding process; otherwise, when the probability estimation result of the current feature element to be decoded satisfies abs(μ-k)+b >= T4 (the preset condition), perform entropy decoding on the current feature element to be decoded to obtain its value.
- Method 4 When the probability distribution is a Laplace distribution, obtain the values of the location parameter μ and the scale parameter b according to the probability estimation result. When the absolute value of the difference between the location parameter μ and k is less than the second threshold T5 and the scale parameter b is less than the third threshold T6 (the preset condition is not met), set the value of the current feature element to be decoded to k and skip the entropy decoding process; otherwise, when the absolute value of the difference between the location parameter μ and k is greater than or equal to the second threshold T5 or the scale parameter b is greater than or equal to the third threshold T6 (the preset condition), perform entropy decoding on the current feature element to be decoded to obtain its value.
- the current feature element to be decoded when the value of k is 0, it is the optimal value, and it can be directly judged that when the absolute value of the position parameter ⁇ is less than T5 and the scale parameter b is less than T6, the current feature element to be decoded
- the value of is set to k, skipping the current feature element to be decoded Execute the entropy decoding process, otherwise, for the current feature element to be decoded Perform entropy decoding to obtain the current feature element to be decoded value.
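- The threshold checks of Methods 2 to 4 above can be sketched as simple predicates. The following is a minimal Python sketch; the function names are illustrative and the thresholds are passed in as plain numbers, which is an assumption for illustration rather than part of the application:

```python
def skip_decode_gaussian(mu, sigma, k, t3):
    """Method 2 check (sketch): skip entropy decoding and output k
    when abs(mu - k) + sigma < T3."""
    return abs(mu - k) + sigma < t3

def skip_decode_laplace_sum(mu, b, k, t4):
    """Method 3 check (sketch): skip when abs(mu - k) + b < T4."""
    return abs(mu - k) + b < t4

def skip_decode_laplace_pair(mu, b, k, t5, t6):
    """Method 4 check (sketch): skip when abs(mu - k) < T5 and b < T6;
    with k == 0 this reduces to abs(mu) < T5 and b < T6."""
    return abs(mu - k) < t5 and b < t6
```

In all three cases a true result means the decoder writes k directly and skips entropy decoding for that feature element.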
- Method 5: When the probability distribution is a mixed Gaussian distribution, obtain, according to the probability estimation result, the values of all the mean parameters μi and variances σi of the mixed Gaussian distribution of the current feature element to be decoded.
- When the sum of the absolute values of the differences between all the mean values of the mixed Gaussian distribution and k, plus any variance of the mixed Gaussian distribution, is less than the fifth threshold T7, the value of the current feature element to be decoded is set to k and the entropy decoding process is skipped for that element; otherwise, when the sum of the absolute values of the differences between all the mean values of the mixed Gaussian distribution and the value k of the current feature element to be decoded, plus any variance of the mixed Gaussian distribution, is greater than or equal to the fifth threshold T7 (the preset condition), entropy decoding is performed on the current feature element to be decoded to obtain its value.
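- The mixed-Gaussian skip test above can be sketched as follows. This is a minimal Python sketch; it assumes "any variance" means the condition must hold for every mixture component, and the function name is illustrative:

```python
def skip_decode_mixed_gaussian(means, variances, k, t7):
    """Method 5 check (sketch): skip entropy decoding (and output k)
    when the sum of abs(mean_i - k) over all mixture means, plus a
    variance, stays below the threshold T7 for every mixture variance."""
    mean_term = sum(abs(m - k) for m in means)
    return all(mean_term + v < t7 for v in variances)
```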
- Method 6: According to the probability distribution of the current feature element to be decoded, obtain the probability P that the value of the current feature element to be decoded is k, that is, the probability estimation result P of the current feature element to be decoded.
- When the probability estimation result P does not meet the preset condition, that is, P is greater than the first threshold T0, there is no need to perform entropy decoding on the current feature element to be decoded, and its value is set to k; otherwise, when the current feature element to be decoded meets the preset condition, that is, P is less than or equal to the first threshold T0, entropy decoding is performed on the code stream to obtain the value of the feature element currently to be decoded.
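- Method 6 reduces to a single comparison against the first threshold T0. A minimal Python sketch follows, in which read_symbol stands in for a real entropy decoder and is purely hypothetical:

```python
def decode_feature_element(prob_k, k, t0, read_symbol):
    """Method 6 (sketch): if the probability P that the element equals k
    exceeds T0, skip entropy decoding and return k; otherwise call the
    (hypothetical) entropy decoder read_symbol() to obtain the value."""
    if prob_k > t0:
        return k          # preset condition not met: no decoding needed
    return read_symbol()  # preset condition met: entropy-decode the value
```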
- the above k value at the decoding end is set corresponding to the k value at the encoding end.
- The method of obtaining the thresholds T0, T1, T2, T3, T4, T5, T6 and T7 corresponds to the encoding end, and one of the following methods can be used:
- Method 1: Obtain the threshold from the code stream; specifically, obtain the threshold from the sequence header, image header, slice/strip header, or SEI.
- Method 2: The decoder adopts a fixed threshold agreed with the encoder.
- Method 3: Obtain the threshold index number from the code stream; specifically, obtain the threshold index number from the sequence header, image header, slice/strip header, or SEI. The decoder then constructs a threshold candidate list in the same way as the encoder, and obtains the corresponding threshold in the threshold candidate list according to the threshold index number.
- The thresholds T1, T2, T3, T4, T5 and T6 can be integerized, that is, shifted and amplified into integers, so that the comparisons can be performed in integer arithmetic.
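- One way to picture the integerization is fixed-point scaling: both the distribution parameters and the thresholds are shifted (amplified) by the same power of two so that the comparison runs entirely in integers. The following sketch assumes an 8-bit amplification factor, which is an illustrative choice rather than one specified by the application:

```python
SHIFT = 8  # assumed amplification factor: 2**8 = 256

def to_fixed(x):
    """Amplify a real-valued parameter or threshold into an integer."""
    return int(round(x * (1 << SHIFT)))

def skip_fixed(mu, sigma, k, t3_fixed):
    """The comparison abs(mu - k) + sigma < T3 evaluated entirely
    in integer (fixed-point) arithmetic."""
    return abs(to_fixed(mu) - to_fixed(k)) + to_fixed(sigma) < t3_fixed
```

Because encoder and decoder apply the same shift, both sides evaluate an identical integer comparison and cannot diverge due to floating-point rounding.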
- Step 1514: Same as step 1414.
- FIG. 13A shows a specific implementation process 1600 provided by Embodiment 3 of the present application, and the operation steps are as follows:
- Step 1601: Same as step 1501; this step is specifically implemented by the coding network 204 in FIG. 3C. For details, please refer to the above description of the coding network 20;
- Step 1602: Same as step 1502; this step is specifically implemented by the side information extraction 214 in FIG. 3C;
- Step 1603: Perform probability estimation on the feature map to obtain the probability estimation result of each feature element.
- Probability distribution models can be used to obtain probability estimates.
- the probability distribution model may be: a single Gaussian model or an asymmetric Gaussian model or a mixed Gaussian model or a Laplace distribution model.
- When the probability distribution model is a Gaussian model (a single Gaussian model, an asymmetric Gaussian model or a mixed Gaussian model), the side information or the context information is input into the probability estimation network, probability estimation is performed on each feature element in the feature map, and the values of the model parameters, namely the mean parameter μ and the variance σ, are obtained, that is, the probability estimation result.
- When the probability distribution model is the Laplace distribution model, the side information or the context information is input into the probability estimation network, probability estimation is performed on each feature element in the feature map, and the values of the model parameters, namely the position parameter μ and the scale parameter b, are obtained, that is, the probability estimation result.
- The probability estimation result is input into the probability distribution model used, to obtain the probability distribution.
- Alternatively, the probability estimation network performs probability estimation on the feature map to be encoded, and the probability P that the current feature element to be encoded takes the value m is obtained.
- m is any integer, such as 0, 1, -1, -2, 3 and so on.
- the probability estimation network may use a network based on deep learning, such as a recurrent neural network and a convolutional neural network, etc., which are not limited here.
- Step 1604: Determine, according to the probability estimation result, whether to perform entropy coding on the current feature element to be coded; according to the judgment result, either perform entropy coding on the current feature element to be coded and write it into the coded stream, or skip entropy coding. Entropy encoding is performed on the current feature element to be encoded only when it is determined that entropy encoding needs to be performed.
- The probability estimation result 211 is input into the judgment module, which outputs decision information 217 with the same dimensions as the feature map.
- the decision information 217 may be a three-dimensional decision map.
- The judging module can be implemented using a network, that is, the probability estimation result or probability distribution is input into the generation network shown in FIG. 7, and the network outputs a decision map.
- When the decision map map[x][y][i] is a preset value, it indicates that the current feature element to be encoded at the corresponding position requires entropy coding, and entropy coding is performed on the current feature element to be coded according to the probability distribution.
- When the decision map map[x][y][i] is not a preset value, it indicates that the current feature element to be encoded at the corresponding position takes the value k with high probability and does not require entropy encoding, that is, the entropy encoding process is skipped.
- The decision information is a decision map map with the same dimensions as the feature map; map[x][y][i] represents the value at the coordinate position (x, y, i) in the decision map map.
- The preset value is a specific value. For example, when the optional values of the feature element to be encoded are 0 and 1, the preset value is 0 or 1; when the feature element to be encoded has multiple optional values, the preset value is some specific value; for example, when the optional values of the current feature element to be encoded are 0 to 255, the preset values are a proper subset of 0 to 255.
- Alternatively, the probability estimation result or probability distribution of the current feature element to be encoded is input into the judgment module, which directly outputs decision information on whether the current feature element to be encoded needs entropy coding.
- When the decision information output by the judging module is a preset value, it means that the current feature element to be encoded needs entropy coding; when the decision information output by the judging module is not a preset value, it means that the current feature element to be encoded does not need entropy coding.
- The judging module can be implemented using a network, that is, the probability estimation result or probability distribution is input into the generation network shown in FIG. 7, and the network outputs the decision information, that is, whether it is a preset value.
- Method 1: The decision information is a decision map map with the same dimensions as the feature map. When the decision map map[x][y][i] is a preset value, it indicates that the current feature element to be encoded at the corresponding position requires entropy coding, and entropy coding is performed on the current feature element to be coded according to the probability distribution.
- When the decision map map[x][y][i] is not a preset value, it indicates that the current feature element to be encoded at the corresponding position takes the value k with high probability; for example, when the decision map map[x][y][i] is 0, the current feature element to be encoded at the corresponding position does not require entropy encoding, that is, the entropy encoding process is skipped.
- The preset value is a specific value. For example, when the optional values of the feature element are 0 and 1, the preset value is 0 or 1; when the feature element in the decision map map has multiple optional values, the preset value is some specific value; for example, when the optional values of the feature element are 0 to 255, the preset values are a proper subset of 0 to 255.
- Method 2: The decision information is a decision map map with the same dimensions as the feature map. When the decision map map[x][y][i] is greater than or equal to the threshold T0, it indicates that the current feature element to be encoded at the corresponding position requires entropy coding, and entropy coding is performed on the current feature element to be coded according to the probability distribution.
- When the decision map map[x][y][i] is less than the threshold T0, it indicates that the current feature element to be encoded at the corresponding position takes the value k with high probability and does not require entropy encoding, that is, the entropy encoding process is skipped.
- T0 can be the mean value within the numerical range of the decision map.
- Method 3: The decision information can also be an identifier, or the value of an identifier, directly output by the joint network. When the decision information is a preset value, it means that the current feature element to be encoded needs entropy encoding; when the decision information output by the judgment module is not the preset value, it means that the current feature element to be encoded does not need entropy encoding.
- When the optional values of the identifier or of the value of the identifier are 0 and 1, the preset value is 0 or 1 accordingly. The identifier or the value of the identifier can also have multiple optional values, in which case the preset value is some specific value; for example, when the optional values of the identifier or of the value of the identifier are 0 to 255, the preset values are a proper subset of 0 to 255.
- The high probability refers to: the probability that the current feature element to be encoded takes the value k is greater than the threshold P, where P can be a number greater than or equal to 0.9, such as 0.9, 0.95 or 0.98.
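- The decision-map-driven encoder loop described above can be sketched as follows; the dict-based feature layout and the encode_symbol callback are illustrative stand-ins, not part of the application:

```python
def encode_with_decision_map(feature, decision_map, preset, encode_symbol):
    """Sketch of the encoder-side skip logic: elements whose decision-map
    entry equals the preset value are entropy-encoded; all other elements
    are assumed to take the high-probability value k and are skipped.
    encode_symbol is a stand-in for the real entropy encoder."""
    encoded_positions = []
    for pos, value in feature.items():          # feature: {(x, y, i): value}
        if decision_map[pos] == preset:
            encode_symbol(pos, value)           # entropy coding required
            encoded_positions.append(pos)
        # else: skipped; the decoder reconstructs the value as k
    return encoded_positions
```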
- Step 1605 The encoder sends or stores the compressed code stream.
- Perform the above steps 1601 to 1604 on at least one feature element of the feature map to obtain the compressed code stream, and transmit the compressed code stream to the decoding end.
- Step 1611: Obtain the compressed code stream to be decoded.
- Step 1612: Perform probability estimation on the feature map to be decoded to obtain the probability estimation result of each feature element.
- This step can be specifically implemented by the probability estimation 302 in FIG. 13B , and for details, refer to the above description of the probability estimation 40 .
- Obtain the side information from the code stream, and use the method in step 1603 to obtain the probability estimation result of the feature element currently to be decoded.
- Step 1613 Obtain decision information, and judge whether to perform entropy decoding according to the decision information.
- This step can be specifically implemented by the generation network 310 and the decoding decision implementation 304 in FIG. 13B , and for details, refer to the above description of the generation network 46 and the decoding decision implementation 30 .
- the decision information 311 is acquired using the same method as that of the encoder in this embodiment.
- When the decision map map[x][y][i] is a preset value, it indicates that the current feature element to be decoded at the corresponding position requires entropy decoding, and entropy decoding is performed on the current feature element to be decoded according to the probability distribution.
- When the decision map map[x][y][i] is not a preset value, it indicates that the current feature element to be decoded at the corresponding position does not require entropy decoding, which means the value at the corresponding position is set to a specific value k.
- Alternatively, the probability estimation result or probability distribution of the feature element to be decoded is input into a judgment module, which directly outputs decision information on whether the feature element currently to be decoded needs entropy decoding.
- When the decision information output by the judging module is a preset value, it means that the current feature element to be decoded needs entropy decoding; when the decision information output by the judging module is not a preset value, it means that the current feature element to be decoded does not need entropy decoding.
- The judging module can be implemented using a network, that is, the probability estimation result or probability distribution is input into the generating network shown in FIG. 8, and the network outputs the decision information, that is, whether it is a preset value.
- the decision information is used to indicate whether to perform entropy decoding on the feature element currently to be decoded, and the decision information may include a decision map.
- Step 1614: Same as step 1414.
- the above k value at the decoding end is set corresponding to the k value at the encoding end.
- Figure 14 shows a specific implementation process 1700 of Embodiment 4 of the present application, and the operation steps are as follows:
- Step 1701: Same as step 1501; this step can be specifically implemented by the encoding network 204 in FIG. 3D. For details, refer to the above description of the encoding network 20;
- Step 1702: Same as step 1502; this step is specifically implemented by the side information extraction 214 in FIG. 3D;
- Step 1703: Obtain the probability estimation result and decision information of each feature element in the feature map;
- This step can be specifically implemented by the joint network 218 in FIG. 3D. For details, reference can be made to the above description of the joint network 34.
- The side information and/or the context information are input into the joint network, and the joint network outputs the probability estimation result and decision information of the feature map to be encoded. The network structure shown in Figure 15 can be used.
- The decision information, the probability distribution and/or the probability estimation result can all be output from different layers of the joint network. For example: case 1) a middle layer of the network outputs the decision information, and the last layer outputs the probability distribution and/or the probability estimation result; case 2) a middle layer of the network outputs the probability distribution and/or the probability estimation result, and the last layer outputs the decision information; case 3) the last layer of the network outputs the decision information together with the probability distribution and/or the probability estimation result.
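- For case 3, one way to picture the joint output is a flat per-element vector that the caller splits into decision information and Gaussian parameters. The layout below is purely an assumed convention for illustration, not one specified by the application:

```python
def split_joint_output(head_out, num_elements):
    """Sketch of case 3: the joint network's final layer is assumed to
    emit, per feature element, a decision value plus the Gaussian
    parameters (mu, sigma), laid out as [decision..., mu..., sigma...]."""
    decision = head_out[:num_elements]
    mu = head_out[num_elements:2 * num_elements]
    sigma = head_out[2 * num_elements:3 * num_elements]
    return decision, mu, sigma
```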
- When the probability distribution model is a Gaussian model (a single Gaussian model, an asymmetric Gaussian model or a mixed Gaussian model), the side information or the context information is input into the joint network to obtain the values of the model parameters, namely the mean parameter μ and the variance σ, that is, the probability estimation result. Further, the probability estimation result is input into the Gaussian model to obtain the probability distribution.
- When the probability distribution model is the Laplace distribution model, the side information or the context information is input into the joint network to obtain the values of the model parameters, namely the position parameter μ and the scale parameter b, that is, the probability estimation result. Further, the probability estimation result is input into the Laplace distribution model to obtain the probability distribution.
- Alternatively, the side information and/or the context information are input into the joint network to obtain the probability distribution of the current feature element to be encoded.
- From the probability distribution, the probability P that the current feature element to be encoded takes the value m is obtained as the probability estimation result.
- m is any integer, such as 0, 1, -1, -2, 3 and so on.
- Step 1704: Judge whether to perform entropy coding according to the decision information; according to the judgment result, either perform entropy coding and write into the compressed code stream (encoded code stream), or skip entropy coding. Entropy encoding is performed on the current feature element to be encoded only when it is determined that entropy encoding needs to be performed.
- This step can be specifically implemented by the encoding decision implementation 208 in FIG. 3D , and for details, refer to the description of the above encoding decision implementation 26 .
- Method 1: The decision information is a decision map map with the same dimensions as the feature map. When the decision map map[x][y][i] is a preset value, it indicates that the current feature element to be encoded at the corresponding position requires entropy coding, and entropy coding is performed on the current feature element to be coded according to the probability distribution.
- When the decision map map[x][y][i] is not a preset value, it indicates that the current feature element to be encoded at the corresponding position takes the value k with high probability; for example, when the decision map map[x][y][i] is 0, the current feature element to be encoded at the corresponding position does not require entropy encoding, that is, the entropy encoding process is skipped.
- The preset value is a specific value. For example, when the optional values of the current feature element to be encoded are 0 and 1, the preset value is 0 or 1; when the current feature element to be encoded in the decision map map has multiple optional values, the preset value is some specific value; for example, when the optional values of the current feature element to be encoded are 0 to 255, the preset values are a proper subset of 0 to 255.
- Method 2: The decision information is a decision map map with the same dimensions as the feature map. When the decision map map[x][y][i] is greater than or equal to the threshold T0, it indicates that the current feature element to be encoded at the corresponding position requires entropy coding, and entropy coding is performed on the current feature element to be coded according to the probability distribution.
- When the decision map map[x][y][i] is less than the threshold T0, it indicates that the current feature element to be encoded at the corresponding position takes the value k with high probability and does not require entropy encoding, that is, the entropy encoding process is skipped.
- T0 can be the mean value within the numerical range of the decision map.
- Method 3: The decision information can also be an identifier, or the value of an identifier, directly output by the joint network. When the decision information is a preset value, it means that the current feature element to be encoded needs entropy encoding; when the decision information output by the judgment module is not the preset value, it means that the current feature element to be encoded does not need entropy encoding.
- The preset value is a specific value. For example, when the optional values of the current feature element to be encoded are 0 and 1, the preset value is 0 or 1; when the current feature element to be encoded in the decision map output by the joint network has multiple optional values, the preset value is some specific value; for example, when the optional values of the current feature element to be encoded are 0 to 255, the preset values are a proper subset of 0 to 255.
- The high probability refers to: the probability that the current feature element to be encoded takes the value k is greater than the threshold P, where P can be a number greater than or equal to 0.9, such as 0.9, 0.95 or 0.98.
- Step 1705 The encoder sends or stores the compressed code stream.
- Step 1711: Obtain the code stream of the feature map of the image to be decoded, and obtain the side information from the code stream.
- Step 1712: Obtain the probability estimation result and decision information of each feature element in the feature map to be decoded.
- This step can be specifically implemented by the joint network 312 in FIG. 16. For details, refer to the above description of the joint network 34.
- The method of obtaining the probability estimation result and decision information of each feature element in the feature map is the same as in step 1703.
- Step 1713: Determine whether to perform entropy decoding according to the decision information; perform or skip entropy decoding according to the judgment result. This step can be specifically implemented by the decoding decision implementation 304.
- Method 1: The decision information is the decision map map.
- When the decision map map[x][y][i] is a preset value, it indicates that the current feature element to be decoded at the corresponding position requires entropy decoding, and entropy decoding is performed on the current feature element to be decoded according to the probability distribution.
- When the decision map map[x][y][i] is not a preset value, it indicates that the current feature element to be decoded at the corresponding position does not require entropy decoding, which means the value at the corresponding position is set to a specific value k.
- Method 2: The decision information is a decision map map with the same dimensions as the feature map. When the decision map map[x][y][i] is greater than or equal to the threshold T0, it indicates that the current feature element to be decoded at the corresponding position requires entropy decoding.
- When the decision map map[x][y][i] is less than the threshold T0, it indicates that the current feature element to be decoded at the corresponding position takes the value k with high probability and does not require entropy decoding, which means the value at the corresponding position is set to the specific value k.
- The value of T0 is the same as that at the encoding end.
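- The threshold form of the decoding decision (Method 2) can be sketched as follows; read_symbol stands in for the real entropy decoder and, like the dict-based layout, is a hypothetical convention for illustration:

```python
def decode_with_decision_map(decision_map, t0, k, read_symbol):
    """Sketch of the decoder-side Method 2: positions whose decision-map
    entry is >= T0 are entropy-decoded via read_symbol (a stand-in for
    the real entropy decoder); all other positions are set to k."""
    feature = {}
    for pos, score in decision_map.items():     # decision_map: {(x, y, i): score}
        feature[pos] = read_symbol(pos) if score >= t0 else k
    return feature
```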
- Method 3: The decision information can also be an identifier, or the value of an identifier, directly output by the joint network. When the decision information is a preset value, it means that the current feature element to be decoded needs entropy decoding; when the decision information output by the judgment module is not a preset value, it means that the current feature element to be decoded does not need entropy decoding, and the value of the current feature element to be decoded is set to k.
- The preset value is a specific value. For example, when the optional values of the current feature element to be decoded are 0 and 1, the preset value is 0 or 1; when the current feature element to be decoded in the decision map output by the joint network has multiple optional values, the preset value is some specific value; for example, when the optional values of the current feature element to be decoded are 0 to 255, the preset values are a proper subset of 0 to 255.
- Step 1714 Same as step 1414, this step can be specifically implemented by the decoding network unit 306 in the decoder 9C of the above-mentioned embodiment, and details can refer to the description of the decoding network unit 306 in the above-mentioned embodiment.
- the above k value at the decoding end is set corresponding to the k value at the encoding end.
- FIG. 17 shows the specific implementation process 1800 of Embodiment 5 of the present application, and the operation steps are as follows:
- Step 1801: Obtain the feature variable of the audio data to be encoded.
- The audio signal to be encoded can be a time-domain audio signal; it can also be a frequency-domain signal obtained after the time-domain signal undergoes a time-frequency transform, for example, a frequency-domain signal obtained by applying an MDCT transform to the time-domain audio signal, or a frequency-domain signal obtained by applying an FFT transform to the time-domain audio signal; the signal to be encoded can also be a signal after QMF filtering; the signal to be encoded can also be a residual signal, such as the residual signal of another codec or an LPC-filtered residual signal.
- Obtaining the feature variable of the audio data to be encoded may be done by extracting a feature vector from the audio signal to be encoded, for example, extracting Mel cepstral coefficients from the audio signal to be encoded, quantizing the extracted feature vector, and using the quantized feature vector as the feature variable of the audio data to be encoded.
- Alternatively, the audio signal to be encoded is processed by an encoding neural network to obtain a latent variable, the latent variable output by the neural network is quantized, and the quantized latent variable is used as the feature variable of the audio data to be encoded.
- The encoding neural network is pre-trained, and the present invention does not limit the specific network structure and training method of the encoding neural network.
- the encoding neural network can choose a fully connected network or a CNN network.
- the present invention also does not limit the number of layers included in the coding neural network and the number of nodes in each layer.
- the form of latent variables output by encoding neural networks with different structures may be different.
- When the encoding neural network is a fully connected network, the output latent variable is a vector.
- When the encoding neural network is a CNN network, the output latent variable is an N*M-dimensional matrix, where N is the number of channels of the CNN network and M is the size (latent size) of the latent variable of each channel of the CNN network.
- A specific method for quantizing the latent variable output by the neural network may be to perform scalar quantization on each element of the latent variable; the quantization step size of the scalar quantization may be determined according to different encoding rates. The scalar quantization may also have an offset, for example, the latent variable to be quantized is offset first and then scalar-quantized according to the determined quantization step size.
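- The biased scalar quantization described above can be sketched in a few lines. This is a minimal Python sketch; the function names and the offset convention are illustrative assumptions:

```python
def scalar_quantize(latent, step, offset=0.0):
    """Sketch of biased scalar quantization: each element is offset,
    then divided by a rate-dependent step size and rounded."""
    return [round((x - offset) / step) for x in latent]

def scalar_dequantize(indices, step, offset=0.0):
    """Inverse mapping used at the decoder side."""
    return [i * step + offset for i in indices]
```

A smaller step size gives finer reconstruction at a higher bit cost, which is why the step may be tied to the target encoding rate.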
- The method for quantizing the latent variable can also be implemented using other existing quantization techniques, which is not limited in the present invention.
- The quantized feature vector or the quantized latent variable is the feature variable of the audio data to be encoded.
- Step 1802: Input the feature variable of the audio data to be encoded into the side information extraction module, and output the side information.
- The side information extraction module can be implemented using the network shown in Figure 12; the side information can be understood as a feature variable obtained by further extraction from the feature variable, and the number of feature elements it contains is smaller than that of the feature variable.
- Entropy encoding may be performed on the side information here and the result written into the code stream; the side information may also be entropy encoded and written into the code stream in the subsequent step 1804, which is not limited here.
- Step 1803: Perform probability estimation on the feature variable to obtain the probability estimation result of each feature element.
- Probability distribution models can be used to obtain probability estimates and probability distributions.
- the probability distribution model may be: a single Gaussian model (Gaussian single model, GSM) or an asymmetric Gaussian model or a mixed Gaussian model (Gaussian mixture model, GMM) or a Laplace distribution model (Laplace distribution).
- The following takes a feature variable in the form of an N*M-dimensional matrix as an example, and describes the feature elements in the current feature variable to be encoded.
- When the probability distribution model is a Gaussian model (a single Gaussian model, an asymmetric Gaussian model or a mixed Gaussian model), probability estimation is performed on each feature element to obtain the values of the mean parameter μ and the variance σ.
- The mean parameter μ and the variance σ are input into the probability distribution model used, to obtain the probability distribution; in this case, the probability estimation result is the mean parameter μ and the variance σ.
- Alternatively, only the variance may be estimated.
- When the probability distribution model is a Gaussian model (a single Gaussian model, an asymmetric Gaussian model or a mixed Gaussian model), the side information or the context information is input into the probability estimation network, and probability estimation is performed on each feature element in the feature variable to obtain the value of the variance σ.
- The variance σ is input into the probability distribution model used, to obtain the probability distribution; in this case, the probability estimation result is the variance σ.
- When the probability distribution model is the Laplace distribution model, the side information or the context information is input into the probability estimation network, and probability estimation is performed on each feature element in the feature variable to obtain the values of the position parameter μ and the scale parameter b.
- The position parameter μ and the scale parameter b are input into the probability distribution model used, to obtain the probability distribution; in this case, the probability estimation result is the position parameter μ and the scale parameter b.
- Alternatively, the probability estimation network performs probability estimation on the feature variable to be encoded, and the probability P that the current feature element to be encoded takes the value m is obtained; in this case, the probability estimation result is the probability P that the current feature element to be encoded takes the value m.
- The probability estimation network can use a network based on deep learning, such as a recurrent neural network (Recurrent Neural Network, RNN) or a convolutional neural network (Convolutional Neural Network, CNN, e.g. PixelCNN), which is not limited here.
- Step 1804: Judge whether entropy coding is required for the current feature element to be coded according to the probability estimation result, and, according to the judgment result, either perform entropy coding and write into the compressed code stream (coded code stream) or skip entropy coding.
- Judging whether entropy coding needs to be performed on the current feature element to be encoded according to the probability estimation result can use one or more of the following methods.
- When the feature variable is a matrix, the parameters j and i are positive integers, and the coordinates (j, i) indicate the position of the current feature element to be encoded; when the feature variable is a vector, the parameter i is a positive integer, and the coordinate i indicates the position of the current feature element to be encoded.
- The following takes judging, according to the probability estimation result, whether the current feature element to be encoded at coordinates (j, i) needs entropy coding as an example; judging whether the feature element indexed by i needs entropy coding is similar and will not be repeated here.
- Method 1: When the probability distribution model is a Gaussian distribution, judge whether to perform entropy coding on the current feature element to be encoded according to the probability estimation result of the first feature element. When the mean parameter μ and the variance σ of the Gaussian distribution of the current feature element satisfy the second condition, namely the absolute value of the difference between the mean μ and k is less than the second threshold T1 and the variance σ is less than the third threshold T2, there is no need to perform the entropy encoding process on the current feature element; otherwise, when the first condition is met, namely the absolute value of the difference between the mean μ and k is greater than or equal to the second threshold T1 or the variance σ is greater than or equal to the third threshold T2, entropy encoding is performed on the current feature element and the result is written into the code stream.
- k is any integer, such as 0, 1, -1, 2, 3, and so on.
- The value of T2 is any number satisfying 0 < T2 < 1, such as 0.2, 0.3, or 0.4.
- T1 is a number greater than or equal to 0 and less than 1, such as 0.01, 0.02, 0.001, or 0.002.
- The value k = 0 is the optimal value; in that case it can be directly judged that when the absolute value of the mean parameter μ of the Gaussian distribution is less than T1 and the variance σ of the Gaussian distribution is less than T2, the entropy encoding process is skipped for the current feature element to be encoded; otherwise, entropy encoding is performed on the current feature element and the result is written into the code stream.
- The value of T2 is any number satisfying 0 < T2 < 1, for example 0.2, 0.3, or 0.4.
- T1 is a number greater than or equal to 0 and less than 1, such as 0.01, 0.02, 0.001, or 0.002.
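A minimal sketch of the Method 1 decision, assuming scalar μ and σ and the example threshold values given above (the function name is illustrative):

```python
def needs_entropy_coding_method1(mu, sigma, k=0, t1=0.01, t2=0.2):
    # Second condition (skip): |mu - k| < T1 and sigma < T2,
    # i.e. the element is almost certainly equal to k.
    # First condition (encode): |mu - k| >= T1 or sigma >= T2.
    return abs(mu - k) >= t1 or sigma >= t2
```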
- Method 2: When the probability distribution is a Gaussian distribution, obtain, according to the probability estimation result, the values of the mean parameter μ and the variance σ of the Gaussian distribution of the current feature element to be encoded. When the mean μ, the variance σ, and k satisfy abs(μ-k)+σ < T3 (the second condition), skip the entropy encoding process for the current feature element, where abs(μ-k) denotes the absolute value of the difference between the mean μ and k; otherwise, when the probability estimation result of the current feature element satisfies abs(μ-k)+σ ≥ T3 (the first condition), perform entropy encoding on the current feature element and write the result into the code stream.
- k is any integer, such as 0, 1, -1, -2, 3, and so on.
- The fourth threshold T3 is a number greater than or equal to 0 and less than 1, such as 0.2, 0.3, or 0.4.
- When the probability distribution is a Gaussian distribution,
- probability estimation may alternatively be performed on each feature element of the feature variable to obtain only the value of the variance σ of the Gaussian distribution of the current feature element to be encoded. When the variance σ satisfies σ < T3 (the second condition), skip the entropy coding process for the current feature element; otherwise, when the probability estimation result satisfies σ ≥ T3 (the first condition), perform entropy encoding on the current feature element and write the result into the code stream.
- The fourth threshold T3 is a number greater than or equal to 0 and less than 1, such as 0.2, 0.3, or 0.4.
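Both Method 2 variants can be sketched directly (names and default thresholds are illustrative, taken from the example values above):

```python
def needs_entropy_coding_method2(mu, sigma, k=0, t3=0.2):
    # Encode when abs(mu - k) + sigma >= T3; skip (and implicitly
    # treat the element as certain to be k) when the sum is below T3.
    return abs(mu - k) + sigma >= t3

def needs_entropy_coding_method2_var_only(sigma, t3=0.2):
    # Variant where only the variance is estimated: encode when sigma >= T3.
    return sigma >= t3
```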
- Method 3: When the probability distribution is a Laplace distribution, obtain, according to the probability estimation result, the values of the location parameter μ and the scale parameter b of the Laplace distribution of the current feature element to be encoded.
- When abs(μ-k)+b < T4 (the second condition),
- skip the entropy coding process for the current feature element to be encoded, where abs(μ-k) denotes the absolute value of the difference between the location parameter μ and k;
- otherwise, when abs(μ-k)+b ≥ T4 (the first condition), perform entropy encoding on the current feature element to be encoded and write the result into the code stream.
- k is any integer, such as 0, 1, -1, -2, 3, and so on.
- The fourth threshold T4 is a number greater than or equal to 0 and less than 0.5, such as 0.05, 0.09, or 0.17.
- Method 4: When the probability distribution is a Laplace distribution, obtain, according to the probability estimation result, the values of the location parameter μ and the scale parameter b of the Laplace distribution of the current feature element to be encoded.
- When the absolute value of the difference between the location parameter μ and k is less than the second threshold T5 and the scale parameter b is less than the third threshold T6 (the second condition),
- skip the entropy encoding process for the current feature element to be encoded; otherwise, when the absolute value of the difference between the location parameter μ and k is greater than or equal to the second threshold T5 or the scale parameter b is greater than or equal to the third threshold T6 (the first condition),
- perform entropy encoding on the current feature element to be encoded and write the result into the code stream.
- k is any integer, such as 0, 1, -1, -2, 3, and so on.
- The value of T5 is 1e-2.
- The value of T6 is any number satisfying T6 < 0.5, such as 0.05, 0.09, or 0.17.
- The value k = 0 is the optimal value; in that case it can be directly judged that when the absolute value of the location parameter μ is less than T5 and the scale parameter b is less than T6, the entropy encoding process is skipped for the feature element to be encoded; otherwise, entropy encoding is performed on the current feature element to be encoded and the result is written into the code stream.
- The value of the threshold T5 is 1e-2.
- The value of T6 is any number satisfying T6 < 0.5, such as 0.05, 0.09, or 0.17.
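The two Laplace criteria (Methods 3 and 4) can be sketched together; names and default thresholds are illustrative, taken from the example values above:

```python
def needs_entropy_coding_laplace_sum(mu, b, k=0, t4=0.17):
    # Method 3: encode when abs(mu - k) + b >= T4, skip otherwise.
    return abs(mu - k) + b >= t4

def needs_entropy_coding_laplace_pair(mu, b, k=0, t5=1e-2, t6=0.17):
    # Method 4: skip when |mu - k| < T5 and b < T6; encode otherwise.
    return abs(mu - k) >= t5 or b >= t6
```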
- Method 5: When the probability distribution is a mixed Gaussian distribution, obtain, according to the probability estimation result, the values of all mean parameters μi and variances σi of the mixed Gaussian distribution of the current feature element to be encoded. When the sum of the absolute values of the differences between all mean values of the mixed Gaussian distribution and k, added to any variance of the mixed Gaussian distribution, is less than the fifth threshold T7 (the second condition), skip the entropy encoding process for the current feature element to be encoded; otherwise, when that sum added to any variance of the mixed Gaussian distribution is greater than or equal to the fifth threshold T7 (the first condition), perform entropy encoding on the current feature element to be encoded and write the result into the code stream.
- k is any integer, such as 0, 1, -1, -2, 3, and so on.
- T7 is a number greater than or equal to 0 and less than 1, such as 0.2, 0.3, or 0.4 (the threshold of each feature element can be considered the same).
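The mixture criterion above admits more than one reading; one plausible interpretation can be sketched as follows (this is an assumed reading, not the definitive rule, and the function name is illustrative):

```python
def needs_entropy_coding_gmm(means, sigmas, k=0, t7=0.3):
    # Assumed reading: form the sum of |mu_i - k| over all component
    # means, then add each component variance in turn; encode if the
    # total reaches T7 for any component, skip otherwise.
    mean_term = sum(abs(mu - k) for mu in means)
    return any(mean_term + sigma >= t7 for sigma in sigmas)
```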
- Method 6: Obtain, according to the probability distribution, the probability P that the current feature element to be encoded takes the value k. When the probability estimation result P of the current feature element satisfies the second condition, namely P is greater than (or equal to) the first threshold T0, skip the entropy encoding process for the current feature element to be encoded; otherwise, when P satisfies the first condition, namely P is smaller than the first threshold T0, perform entropy encoding on the current feature element to be encoded and write the result into the code stream.
- k can be any integer, such as 0, 1, -1, 2, 3, and so on.
- The first threshold T0 is any number satisfying 0 < T0 < 1, such as 0.99, 0.98, 0.97, or 0.95 (the threshold of each feature element can be considered the same).
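Method 6 reduces to a single comparison once P is known (function name illustrative):

```python
def needs_entropy_coding_method6(p_k, t0=0.99):
    # p_k is the estimated probability that the element equals k.
    # Skip when p_k >= T0 (the outcome is nearly certain);
    # encode when p_k < T0.
    return p_k < t0
```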
- The thresholds T1, T2, T3, T4, T5, and T6 can be integerized, that is, shifted and amplified into integers.
- Method 1: Taking the threshold T1 as an example, take any value within the value range of T1 as the threshold T1 and write the threshold T1 into the code stream. Specifically, the threshold written into the code stream can be stored in the sequence header, image header, slice/strip, or SEI and sent to the decoding end; other methods can also be used, which are not limited here. Similar methods can be used for the remaining thresholds T0, T2, T3, T4, T5, and T6.
- Method 2: The encoding end adopts a fixed threshold agreed with the decoding end, so there is no need to write it into the code stream or transmit it to the decoding end. For example, taking the threshold T1 as an example, any value within the value range of T1 is directly taken as the value of T1. Similar methods can be used for the remaining thresholds T0, T2, T3, T4, T5, and T6.
- Method 3: Build a threshold candidate list, put the most likely values within the value range of T1 into the threshold candidate list, with each threshold corresponding to a threshold index number; determine an optimal threshold, use the optimal threshold as T1, use the index number of the optimal threshold as the threshold index number of T1, and write the threshold index number of T1 into the code stream.
- The threshold index number written into the code stream can be stored in the sequence header, image header, slice/strip, or SEI and sent to the decoding end; other methods can also be used, which are not limited here. Similar methods can be used for the remaining thresholds T0, T2, T3, T4, T5, and T6.
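Method 3 can be sketched as a simple index lookup; the candidate values below are hypothetical and would in practice be agreed between encoder and decoder:

```python
# Hypothetical candidate list for the threshold T1; the actual list
# contents are a design choice shared by encoder and decoder.
T1_CANDIDATES = [0.001, 0.002, 0.01, 0.02]

def threshold_to_index(t1):
    # Encoder side: write only the index number into the code stream.
    return T1_CANDIDATES.index(t1)

def index_to_threshold(idx):
    # Decoder side: rebuild the identical list and look up the value.
    return T1_CANDIDATES[idx]
```

Signaling an index instead of a floating-point value keeps the overhead to a few bits per threshold.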
- Step 1805: The encoder sends or stores the compressed code stream.
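Steps 1801 through 1805 can be combined into a compact encoder-side sketch using the Method 6 decision; the entropy coder is stubbed with a plain list, and every name here is illustrative rather than part of the embodiment:

```python
import math

def prob_of_k(k, mu, sigma):
    # P(element == k) under a Gaussian, integrated over [k-0.5, k+0.5].
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    return cdf(k + 0.5) - cdf(k - 0.5)

def encode_feature_map(values, params, k=0, t0=0.99):
    # values: quantized integer feature elements
    # params: per-element (mu, sigma) from the probability estimation network
    stream = []  # stand-in for the entropy-coded bit stream
    for v, (mu, sigma) in zip(values, params):
        if prob_of_k(k, mu, sigma) < t0:   # first condition: encode
            stream.append(v)               # placeholder for arithmetic coding
        # otherwise: skip; the decoder will reconstruct this value as k
    return stream
```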
- Step 1811: Obtain the code stream of the audio feature variable to be decoded.
- Step 1812: Obtain the probability estimation result of each feature element.
- Entropy decoding is performed on the side information to obtain the side information; combined with the side information, probability estimation is performed on each feature element of the audio feature variable to be decoded, obtaining the probability estimation result of the current feature element to be decoded.
- The parameters j and i are positive integers, and the coordinates (j, i) indicate the position of the current feature element to be decoded.
- Alternatively, entropy decoding is performed on the side information to obtain the side information; combined with the side information, probability estimation is performed on each feature element [i] of the audio feature variable to be decoded, obtaining the probability estimation result of the current feature element to be decoded.
- The parameter i is a positive integer,
- and the coordinate i represents the position of the current feature element to be decoded.
- The probability estimation method used by the decoder is the same as that of the encoder in this embodiment, and the probability estimation network structure is the same as that of the encoder in this embodiment, so details are not repeated here.
- Step 1813: According to the probability estimation result, judge whether entropy decoding needs to be performed on the current feature element to be decoded, and perform or skip entropy decoding according to the judgment result to obtain the decoded feature variable.
- Whether entropy decoding needs to be performed on the current feature element to be decoded can be judged from the probability estimation result using one or more of the following methods. Alternatively, whether entropy decoding needs to be performed on the current feature element to be decoded can be judged from the probability estimation result using one or more of the following methods.
- Method 1: When the probability distribution model is a Gaussian distribution, obtain, according to the probability estimation result, the values of the mean parameter μ and the variance σ of the current feature element to be decoded. When the absolute value of the difference between the mean μ and k is less than the second threshold T1 and the variance σ is less than the third threshold T2 (the second condition), the value of the current feature element to be decoded is set to k and the entropy decoding process is skipped for it; otherwise, when the absolute value of the difference between the mean μ and k is greater than or equal to the second threshold T1 or the variance σ is greater than or equal to the third threshold T2 (the first condition), entropy decoding is performed on the current feature element to be decoded to obtain its value.
- When the value of k is 0, it is the optimal value, and it can be directly judged that when the absolute value of the mean parameter μ of the Gaussian distribution is less than T1 and the variance σ of the Gaussian distribution is less than T2, the value of the current feature element to be decoded is set to k and the entropy decoding process is skipped; otherwise, entropy decoding is performed on the current feature element to be decoded to obtain its value.
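The decoder-side Method 1 decision mirrors the encoder exactly; a sketch with a stubbed entropy decoder (all names illustrative):

```python
def decode_element_method1(mu, sigma, read_symbol, k=0, t1=0.01, t2=0.2):
    # When the skip condition holds, no bits were written for this
    # element at the encoder, so the decoder sets its value to k
    # directly; otherwise it entropy-decodes the value from the stream.
    if abs(mu - k) < t1 and sigma < t2:
        return k
    return read_symbol()  # stand-in for actual entropy decoding
```

Because the decoder derives μ and σ from the same side information as the encoder, both sides reach the same skip/decode decision without any extra signaling.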
- Method 2: When the probability distribution is a Gaussian distribution, obtain, according to the probability estimation result,
- the values of the mean parameter μ and the variance σ of the current feature element to be decoded; when the mean μ, the variance σ, and k satisfy abs(μ-k)+σ < T3 (the second condition), where T3 is the fourth threshold,
- the value of the current feature element to be decoded is set to k and the entropy decoding process is skipped; otherwise, when the probability estimation result of the current feature element to be decoded satisfies abs(μ-k)+σ ≥ T3 (the first condition), entropy decoding is performed on the current feature element to be decoded to obtain its value.
- Alternatively, when the probability distribution is a Gaussian distribution, only the variance σ may be obtained;
- with T3 as the fourth threshold, when the variance σ satisfies σ < T3 (the second condition),
- the value of the current feature element to be decoded is set to 0 and the entropy decoding process is skipped; otherwise, when the probability estimation result of the current feature element to be decoded satisfies σ ≥ T3 (the first condition), entropy decoding is performed on the current feature element to be decoded to obtain its value.
- Method 3: When the probability distribution is a Laplace distribution, obtain the values of the location parameter μ and the scale parameter b according to the probability estimation result.
- When abs(μ-k)+b < T4 (the second condition), where T4 is the fourth threshold,
- the value of the current feature element to be decoded is set to k and the entropy decoding process is skipped for it; otherwise, when the probability estimation result of the current feature element to be decoded satisfies abs(μ-k)+b ≥ T4 (the first condition), entropy decoding is performed on the feature element to obtain its value.
- Method 4: When the probability distribution is a Laplace distribution, obtain the values of the location parameter μ and the scale parameter b according to the probability estimation result.
- When the absolute value of the difference between the location parameter μ and k is less than the second threshold T5 and the scale parameter b is less than the third threshold T6 (the second condition), the value of the current feature element to be decoded is set to k and the entropy decoding process is skipped; otherwise, when the absolute value of the difference between the location parameter μ and k is greater than or equal to the second threshold T5 or the scale parameter b is greater than or equal to the third threshold T6 (the first condition), entropy decoding is performed on the current feature element to be decoded to obtain its value.
- When the value of k is 0, it is the optimal value, and it can be directly judged that when the absolute value of the location parameter μ is less than T5 and the scale parameter b is less than T6,
- the value of the current feature element to be decoded is set to k and the entropy decoding process is skipped; otherwise, entropy decoding is performed on the current feature element to be decoded to obtain its value.
- Method 5: When the probability distribution is a mixed Gaussian distribution, obtain, according to the probability estimation result, the values of all mean parameters μi and variances σi of the mixed Gaussian distribution of the current feature element to be decoded.
- When the sum of the absolute values of the differences between all mean values of the mixed Gaussian distribution and k, added to any variance of the mixed Gaussian distribution, is less than the fifth threshold T7 (the second condition), the value of the current feature element to be decoded is set to k and the entropy decoding process is skipped; otherwise, when that sum added to any variance of the mixed Gaussian distribution is greater than or equal to the fifth threshold T7 (the first condition), entropy decoding is performed on the current feature element to be decoded to obtain its value.
- Method 6: According to the probability distribution of the current feature element to be decoded, obtain the probability P that the value of the current feature element to be decoded is k, that is, the probability estimation result P of the current feature element to be decoded.
- When the probability estimation result P satisfies the second condition, namely P is greater than (or equal to) the first threshold T0,
- the value of the current feature element to be decoded is set to k; otherwise, when the current feature element to be decoded satisfies the first condition, namely P is less than the first threshold T0,
- entropy decoding is performed on the code stream to obtain the value of the first feature element.
- The above k value at the decoding end is set in correspondence with the k value at the encoding end.
- The method of obtaining the thresholds T0, T1, T2, T3, T4, T5, T6, and T7 corresponds to that of the encoding end, and one of the following methods can be used:
- Method 1: Obtain the threshold from the code stream; specifically, obtain the threshold from the sequence header, image header, slice/strip, or SEI.
- Method 2: The decoder adopts the fixed threshold agreed with the encoder.
- Method 3: Obtain the threshold index number from the code stream; specifically, obtain the threshold index number from the sequence header, image header, slice/strip, or SEI. Then the decoder constructs a threshold candidate list in the same way as the encoder and obtains the corresponding threshold from the threshold candidate list according to the threshold index number.
- The thresholds T1, T2, T3, T4, T5, and T6 can be integerized, that is, shifted and amplified into integers.
- Step 1814: The decoded feature variable is reconstructed, or input into the machine-oriented auditory task module to perform the corresponding machine task.
- This step can be specifically implemented by the decoding network 306 in FIG. 10B; for details, reference can be made to the above description of the decoding network 34.
- Case 1: The feature variable after entropy decoding is input into the audio reconstruction module, and the output of the neural network is the reconstructed audio.
- The neural network can adopt any structure, such as a fully connected network, a convolutional neural network, a recurrent neural network, and the like.
- The neural network can adopt a multi-layer deep neural network structure to achieve better estimation results.
- Case 2: The feature variable after entropy decoding
- is input into the machine-oriented auditory task module to perform the corresponding machine task,
- to complete machine auditory tasks such as audio classification and recognition.
- The above k value at the decoding end is set in correspondence with the k value at the encoding end.
- FIG. 18 is a schematic structural diagram of an exemplary encoding device of the present application.
- The device in this example may correspond to the encoder 20A.
- The apparatus may include an obtaining module 2001 and an encoding module 2002.
- The obtaining module 2001 may include the encoding network 204, the rounding 206 (optional), the probability estimation 210, the side information extraction 214, the generation network 216 (optional), and the joint network 218 (optional) in the foregoing embodiments.
- The encoding module 2002 includes the encoding decision implementation 208 in the foregoing embodiments. Specifically:
- The obtaining module 2001 is configured to acquire feature data to be encoded, where the feature data to be encoded includes a plurality of feature elements including a first feature element, and to acquire a probability estimation result of the first feature element; the encoding module 2002 is configured to judge, according to the probability estimation result of the first feature element, whether to perform entropy coding on the first feature element, and to perform entropy coding on the first feature element only when it is judged that entropy coding needs to be performed on it.
- Judging whether to perform entropy coding on the first feature element of the feature data includes: when the probability estimation result of the first feature element of the feature data satisfies a preset condition, entropy encoding needs to be performed on the first feature element of the feature data; when the probability estimation result of the first feature element of the feature data does not satisfy the preset condition, entropy encoding of the first feature element of the feature data is not required.
- The encoding module is further configured to make the judgment according to the probability estimation result of the feature data: the probability estimation result of the feature data is input into a generation network, and the network outputs decision information.
- When the value of the decision information of the first feature element is 1, the first feature element of the feature data needs to be encoded; when the value of the decision information of the first feature element is not 1, the first feature element of the feature data does not need to be encoded.
- the preset condition is that the probability that the first feature element takes a value of k is less than or equal to a first threshold, where k is an integer.
- The preset condition is that the absolute value of the difference between the mean value of the probability distribution of the first feature element and the value k of the first feature element is greater than or equal to the second threshold, or the variance of the probability distribution of the first feature element is greater than or equal to a third threshold, where k is an integer.
- The preset condition is that the sum of the absolute value of the difference between the mean value of the probability distribution of the first feature element and the value k of the first feature element, and the variance of the probability distribution of the first feature element, is greater than or equal to the fourth threshold, where k is an integer.
- the probability value that the first feature element takes a value of k is the maximum probability value among the probability values of all possible values of the first feature element.
- Probability estimation is performed on the feature data to obtain a probability estimation result of each feature element in the feature data, wherein the probability estimation result of the first feature element includes the probability value of the first feature element, and/or the first parameter and the second parameter of its probability distribution.
- A probability estimation result of the feature data is input into a generation network to obtain decision information of the first feature element; according to the decision information of the first feature element, it is judged whether to perform entropy coding on the first feature element.
- The decision information of the feature data may be a decision diagram;
- when the value corresponding to the position of the first feature element in the decision diagram is a preset value, entropy coding is performed on the first feature element;
- when the value corresponding to the position of the first feature element in the decision diagram is not the preset value, entropy coding is not performed on the first feature element.
- The encoding module is further configured to construct a threshold candidate list of the first threshold, put the first threshold into the threshold candidate list and assign it a corresponding index number, and write the index number of the first threshold into the encoded code stream, wherein the length of the threshold candidate list of the first threshold can be set to T, where T is an integer greater than or equal to 1.
- the device of this embodiment can be used in the technical solutions implemented by the encoder in the method embodiments shown in FIGS. 3A-3D , and its implementation principles and technical effects are similar, and will not be repeated here.
- FIG. 19 is a schematic structural diagram of an exemplary decoding device of the present application. As shown in FIG. 19, the device in this example may correspond to the decoder 30.
- The apparatus may include an obtaining module 2101 and a decoding module 2102.
- The obtaining module 2101 may include the probability estimation 302, the generation network 310 (optional), and the joint network 312 in the foregoing embodiments.
- The decoding module 2102 includes the decoding decision implementation 304 and the decoding network 306 in the foregoing embodiments. Specifically:
- The obtaining module 2101 is configured to obtain a code stream of feature data to be decoded, where the feature data to be decoded includes a plurality of feature elements including a first feature element, and to acquire a probability estimation result of the first feature element;
- the decoding module 2102 is configured to judge, according to the probability estimation result of the first feature element, whether to perform entropy decoding on the first feature element, and to perform entropy decoding on the first feature element only when it is determined that entropy decoding needs to be performed on it.
- Judging whether to entropy decode the first feature element of the feature data includes: when the probability estimation result of the first feature element of the feature data satisfies a preset condition, decoding the first feature element of the feature data; or, when the probability estimation result of the first feature element of the feature data does not satisfy the preset condition, not decoding the first feature element of the feature data and setting its feature value to k, where k is an integer.
- The decoding module is further configured to make the judgment according to the probability estimation result of the feature data: the probability estimation result of the feature data is input into a judgment network module, and the network outputs decision information.
- When the value at the position of the first feature element of the feature data in the decision information is 1, the first feature element of the feature data is decoded; when the value at the position of the first feature element of the feature data in the decision information is not 1, the first feature element of the feature data is not decoded, and the feature value of the first feature element is set to k, where k is an integer.
- the preset condition is that the probability that the first feature element takes a value of k is less than or equal to a first threshold, where k is an integer.
- The preset condition is that the absolute value of the difference between the mean value of the probability distribution of the first feature element and the value k of the first feature element is greater than or equal to the second threshold, or the variance of the probability distribution of the first feature element is greater than or equal to the third threshold.
- The preset condition is that the sum of the absolute value of the difference between the mean value of the probability distribution of the first feature element and the value k of the first feature element, and the variance of the probability distribution of the first feature element, is greater than or equal to a fourth threshold.
- Probability estimation is performed on the feature data to obtain a probability estimation result of each feature element in the feature data, wherein the probability estimation result of the first feature element includes the probability value of the first feature element, and/or the first parameter and the second parameter of its probability distribution.
- the probability value that the first feature element takes a value of k is the maximum probability value among the probability values of all possible values of the first feature element.
- the probability estimation result of the Nth feature element includes at least one of the following items: the probability value of the Nth feature element, the first parameter of the probability distribution, and the second parameter of the probability distribution and decision information.
- When the value at the position of the first feature element of the feature data in the decision information is 1, the first feature element of the feature data is decoded; when the value at the position of the first feature element of the feature data in the decision information is not 1, the first feature element of the feature data is not decoded, and the feature value of the first feature element is set to k, where k is an integer.
- The probability estimation result of the feature data is input into the generation network to obtain the decision information of the first feature element; when the value of the decision information of the first feature element is a preset value, it is judged that entropy decoding needs to be performed on the first feature element;
- otherwise, the feature value of the first feature element is set to k, where k is an integer and one of multiple candidate values of the first feature element.
- The obtaining module is further configured to construct a threshold candidate list of the first threshold, obtain an index number into the threshold candidate list of the first threshold by decoding the code stream, and use the value at the position of the threshold candidate list corresponding to the index number of the first threshold as the value of the first threshold, wherein the length of the threshold candidate list of the first threshold can be set to T, where T is an integer greater than or equal to 1.
- the device of this embodiment can be used in the technical solutions implemented by the decoder in the method embodiments shown in FIGS. 10B , 13B, and 16 , and its implementation principles and technical effects are similar, and will not be repeated here.
- Computer-readable media may include computer-readable storage media, which correspond to tangible media such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another (e.g., according to a communication protocol).
- a computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium, or (2) a communication medium, such as a signal or carrier wave.
- Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this application.
- a computer program product may include a computer readable medium.
- such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage, flash memory, or any other medium that can store the desired program code in the form of instructions or data structures and that can be accessed by a computer.
- any connection is properly termed a computer-readable medium.
- if instructions are transmitted from a website, server, or other remote source using coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then coaxial cable, fiber optic cable, twisted pair, DSL, and the wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
- Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
- digital signal processors (DSPs), application-specific integrated circuits (ASICs), and field-programmable logic arrays (FPGAs)
- the term "processor" may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein.
- the functionality described by the various illustrative logical blocks, modules, and steps described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated into a combined codec.
- the techniques may be fully implemented in one or more circuits or logic elements.
- the techniques of the present application may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chipset).
- Various components, modules, or units are described in this application to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit, or provided by a collection of interoperating hardware units (including one or more processors as described above) in conjunction with suitable software and/or firmware.
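As a minimal illustration of the decode-or-skip rule described in the fragments above (a feature element is entropy-decoded only where the decision information holds the value 1, and is otherwise set to k), the decoder-side loop can be sketched as follows. This is an illustrative sketch, not the patent's implementation; the names `reconstruct_features`, `decision_map`, and `entropy_decode` are assumptions introduced here.

```python
# Illustrative sketch only: `decision_map`, `entropy_decode`, and `k`
# are hypothetical names, not identifiers from the patent text.
def reconstruct_features(decision_map, entropy_decode, k=0):
    """Reconstruct a 1-D run of feature elements.

    decision_map   -- per-element decision values; 1 means "entropy-decode"
    entropy_decode -- callable returning the decoded value at a position
    k              -- integer fill value for skipped elements
    """
    features = []
    for pos, flag in enumerate(decision_map):
        if flag == 1:
            features.append(entropy_decode(pos))  # element was coded
        else:
            features.append(k)  # element was skipped; no bits are consumed
    return features
```

Because skipped elements never touch the bitstream, the encoder and decoder only need to agree on the decision information and on the fill value k.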
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Multimedia (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Acoustics & Sound (AREA)
- Human Computer Interaction (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Biophysics (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Evolutionary Computation (AREA)
- Data Mining & Analysis (AREA)
- Biomedical Technology (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Claims (70)
- A feature data encoding method, comprising: obtaining feature data to be encoded, where the feature data to be encoded includes a plurality of feature elements, and the plurality of feature elements include a first feature element; obtaining a probability estimation result of the first feature element; determining, based on the probability estimation result of the first feature element, whether to perform entropy encoding on the first feature element; and performing entropy encoding on the first feature element only when it is determined that entropy encoding needs to be performed on the first feature element.
- The method according to claim 1, wherein the determining, based on the probability estimation result of the first feature element, whether to perform entropy encoding on the first feature element comprises: when the probability estimation result of the first feature element satisfies a preset condition, determining that entropy encoding needs to be performed on the first feature element; or when the probability estimation result of the first feature element does not satisfy the preset condition, determining that entropy encoding does not need to be performed on the first feature element.
- The method according to claim 2, wherein the probability estimation result of the first feature element is a probability value that the first feature element takes a value k, and the preset condition is that the probability value that the first feature element takes the value k is less than or equal to a first threshold, where k is an integer and k is one of a plurality of candidate values of the first feature element.
- The method according to claim 2, wherein the probability estimation result of the first feature element includes a first parameter and a second parameter of a probability distribution of the first feature element, and the preset condition is: an absolute value of a difference between the first parameter of the probability distribution of the first feature element and the value k of the first feature element is greater than or equal to a second threshold; or the second parameter of the probability distribution of the first feature element is greater than or equal to a third threshold; or a sum of the absolute value of the difference between the first parameter of the probability distribution of the first feature element and the value k of the first feature element and the second parameter of the probability distribution of the first feature element is greater than or equal to a fourth threshold, where k is an integer and k is one of a plurality of candidate values of the first feature element.
- The method according to claim 4, wherein: when the probability distribution is a Gaussian distribution, the first parameter of the probability distribution of the first feature element is a mean of the Gaussian distribution of the first feature element, and the second parameter of the probability distribution of the first feature element is a variance of the Gaussian distribution of the first feature element; or when the probability distribution is a Laplace distribution, the first parameter of the probability distribution of the first feature element is a location parameter of the Laplace distribution of the first feature element, and the second parameter of the probability distribution of the first feature element is a scale parameter of the Laplace distribution of the first feature element.
- The method according to claim 3, further comprising: constructing a threshold candidate list, placing the first threshold into the threshold candidate list, and writing an index number corresponding to the first threshold into the encoded bitstream, where the length of the threshold candidate list is T, and T is an integer greater than or equal to 1.
- The method according to claim 2, wherein when the probability estimation result of the first feature element is obtained through a Gaussian mixture distribution, the preset condition is: the sum of the absolute values of the differences between each mean of the Gaussian mixture distribution of the first feature element and the value k of the first feature element, plus any variance of the Gaussian mixture distribution of the first feature element, is greater than or equal to a fifth threshold; or the difference between any mean of the Gaussian mixture distribution of the first feature element and the value k of the first feature element is greater than or equal to a sixth threshold; or any variance of the Gaussian mixture distribution of the first feature element is greater than or equal to a seventh threshold, where k is an integer and k is one of a plurality of candidate values of the first feature element.
- The method according to claim 2, wherein when the probability estimation result of the first feature element is obtained through an asymmetric Gaussian distribution, the preset condition is: the absolute value of the difference between the mean of the asymmetric Gaussian distribution of the first feature element and the value k of the first feature element is greater than or equal to an eighth threshold; or a first variance of the asymmetric Gaussian distribution of the first feature element is greater than or equal to a ninth threshold; or a second variance of the asymmetric Gaussian distribution of the first feature element is greater than or equal to a tenth threshold, where k is an integer and k is one of a plurality of candidate values of the first feature element.
- The method according to any one of claims 3-8, wherein the probability value that the first feature element takes the value k is the maximum probability value among the probability values of all candidate values of the first feature element.
- The method according to claim 1, wherein the determining, based on the probability estimation result of the first feature element, whether to perform entropy encoding on the first feature element comprises: inputting the probability estimation result of the feature data into a generation network to obtain decision information of the first feature element, and determining, based on the decision information of the first feature element, whether to perform entropy encoding on the first feature element.
- The method according to claim 10, wherein when the decision information of the feature data is a decision map, it is determined that entropy encoding needs to be performed on the first feature element when the value at the position corresponding to the first feature element in the decision map is a preset value, and it is determined that entropy encoding does not need to be performed on the first feature element when the value at the position corresponding to the first feature element in the decision map is not the preset value.
- The method according to claim 10, wherein it is determined that entropy encoding needs to be performed on the first feature element when the decision information of the feature data is a preset value, and it is determined that entropy encoding does not need to be performed on the first feature element when the decision information is not the preset value.
- The method according to any one of claims 1-12, wherein the plurality of feature elements further include a second feature element, and when it is determined that entropy encoding does not need to be performed on the second feature element, entropy encoding of the second feature element is skipped.
- The method according to any one of claims 1-13, further comprising: writing entropy encoding results of the plurality of feature elements, including the first feature element, into the encoded bitstream.
- A feature data decoding method, comprising: obtaining a bitstream of feature data to be decoded, where the feature data to be decoded includes a plurality of feature elements, and the plurality of feature elements include a first feature element; obtaining a probability estimation result of the first feature element; determining, based on the probability estimation result of the first feature element, whether to perform entropy decoding on the first feature element; and performing entropy decoding on the first feature element only when it is determined that entropy decoding needs to be performed on the first feature element.
- The method according to claim 15, wherein the determining, based on the probability estimation result of the first feature element, whether to perform entropy decoding on the first feature element comprises: when the probability estimation result of the first feature element satisfies a preset condition, determining that entropy decoding needs to be performed on the first feature element of the feature data; or when the probability estimation result of the first feature element does not satisfy the preset condition, determining that entropy decoding does not need to be performed on the first feature element of the feature data, and setting the feature value of the first feature element to k, where k is an integer and k is one of a plurality of candidate values of the first feature element.
- The method according to claim 16, wherein when the probability estimation result of the first feature element is a probability value that the first feature element takes a value k, the preset condition is that the probability value that the first feature element takes the value k is less than or equal to a first threshold, where k is an integer and k is one of a plurality of candidate values of the first feature element.
- The method according to claim 16, wherein when the probability estimation result of the first feature element includes a first parameter and a second parameter of a probability distribution of the first feature element, the preset condition is: an absolute value of a difference between the first parameter of the probability distribution of the first feature element and the value k of the first feature element is greater than or equal to a second threshold; or the second parameter of the probability distribution of the first feature element is greater than or equal to a third threshold; or a sum of the absolute value of the difference between the first parameter of the probability distribution of the first feature element and the value k of the first feature element and the second parameter of the probability distribution of the first feature element is greater than or equal to a fourth threshold, where k is an integer and k is one of a plurality of candidate values of the first feature element.
- The method according to claim 18, wherein: when the probability distribution is a Gaussian distribution, the first parameter of the probability distribution of the first feature element is a mean of the Gaussian distribution of the first feature element, and the second parameter of the probability distribution of the first feature element is a variance of the Gaussian distribution of the first feature element; or when the probability distribution is a Laplace distribution, the first parameter of the probability distribution of the first feature element is a location parameter of the Laplace distribution of the first feature element, and the second parameter of the probability distribution of the first feature element is a scale parameter of the Laplace distribution of the first feature element.
- The method according to claim 16, wherein when the probability estimation result of the first feature element is obtained through a Gaussian mixture distribution, the preset condition is: the sum of the absolute values of the differences between each mean of the Gaussian mixture distribution of the first feature element and the value k of the first feature element, plus any variance of the Gaussian mixture distribution of the first feature element, is greater than or equal to a fifth threshold; or the difference between any mean of the Gaussian mixture distribution of the first feature element and the value k of the first feature element is greater than or equal to a sixth threshold; or any variance of the Gaussian mixture distribution of the first feature element is greater than or equal to a seventh threshold, where k is an integer and k is one of a plurality of candidate values of the first feature element.
- The method according to claim 16, wherein when the probability estimation result of the first feature element is obtained through an asymmetric Gaussian distribution, the preset condition is: the absolute value of the difference between the mean of the asymmetric Gaussian distribution of the first feature element and the value k of the first feature element is greater than or equal to an eighth threshold; or a first variance of the asymmetric Gaussian distribution of the first feature element is greater than or equal to a ninth threshold; or a second variance of the asymmetric Gaussian distribution of the first feature element is greater than or equal to a tenth threshold, where k is an integer and k is one of a plurality of candidate values of the first feature element.
- The method according to any one of claims 16-21, wherein the probability value that the first feature element takes the value k is the maximum probability value among the probability values of all candidate values of the first feature element.
- The method according to claim 15, wherein the determining, based on the probability estimation result of the first feature element, whether to perform entropy decoding on the first feature element comprises: inputting the probability estimation result of the feature data into a generation network to obtain decision information of the first feature element, and determining, based on the decision information of the first feature element, whether to perform entropy decoding on the first feature element.
- The method according to claim 23, wherein when the decision information of the feature data is a decision map, it is determined that entropy decoding needs to be performed on the first feature element when the value at the position corresponding to the first feature element in the decision map is a preset value, and it is determined that entropy decoding does not need to be performed on the first feature element when the value at the position corresponding to the first feature element in the decision map is not the preset value.
- The method according to claim 23, wherein it is determined that entropy decoding needs to be performed on the first feature element when the decision information of the feature data is a preset value, and it is determined that entropy decoding does not need to be performed on the first feature element when the decision information is not the preset value.
- The method according to any one of claims 15-25, further comprising: passing the feature data through a decoding network to obtain reconstructed data or machine-task-oriented data.
- A feature data encoding apparatus, comprising: an obtaining module configured to obtain feature data to be encoded, where the feature data to be encoded includes a plurality of feature elements and the plurality of feature elements include a first feature element, and configured to obtain a probability estimation result of the first feature element; and an encoding module configured to determine, based on the probability estimation result of the first feature element, whether to perform entropy encoding on the first feature element, and to perform entropy encoding on the first feature element only when it is determined that entropy encoding needs to be performed on the first feature element.
- The apparatus according to claim 27, wherein the determining, based on the probability estimation result of the first feature element, whether to perform entropy encoding on the first feature element comprises: when the probability estimation result of the first feature element satisfies a preset condition, determining that entropy encoding needs to be performed on the first feature element of the feature data; or when the probability estimation result of the first feature element does not satisfy the preset condition, determining that entropy encoding does not need to be performed on the first feature element of the feature data.
- The apparatus according to claim 28, wherein the probability estimation result of the first feature element is a probability value that the first feature element takes a value k, and the preset condition is that the probability value that the first feature element takes the value k is less than or equal to a first threshold, where k is an integer and k is one of a plurality of candidate values of the first feature element.
- The apparatus according to claim 28, wherein the probability estimation result of the first feature element includes a first parameter and a second parameter of a probability distribution of the first feature element, and the preset condition is: an absolute value of a difference between the first parameter of the probability distribution of the first feature element and the value k of the first feature element is greater than or equal to a second threshold; or the second parameter of the probability distribution of the first feature element is greater than or equal to a third threshold; or a sum of the absolute value of the difference between the first parameter of the probability distribution of the first feature element and the value k of the first feature element and the second parameter of the probability distribution of the first feature element is greater than or equal to a fourth threshold, where k is an integer and k is one of a plurality of candidate values of the first feature element.
- The apparatus according to claim 30, wherein: when the probability distribution is a Gaussian distribution, the first parameter of the probability distribution of the first feature element is a mean of the Gaussian distribution of the first feature element, and the second parameter of the probability distribution of the first feature element is a variance of the Gaussian distribution of the first feature element; or when the probability distribution is a Laplace distribution, the first parameter of the probability distribution of the first feature element is a location parameter of the Laplace distribution of the first feature element, and the second parameter of the probability distribution of the first feature element is a scale parameter of the Laplace distribution of the first feature element.
- The apparatus according to claim 29, wherein the encoding module is further configured to construct a threshold candidate list, place the first threshold into the threshold candidate list, and write an index number corresponding to the first threshold into the encoded bitstream, where the length of the threshold candidate list is T, and T is an integer greater than or equal to 1.
- The apparatus according to claim 28, wherein when the probability estimation result of the first feature element is obtained through a Gaussian mixture distribution, the preset condition is: the sum of the absolute values of the differences between each mean of the Gaussian mixture distribution of the first feature element and the value k of the first feature element, plus any variance of the Gaussian mixture distribution of the first feature element, is greater than or equal to a fifth threshold; or the difference between any mean of the Gaussian mixture distribution of the first feature element and the value k of the first feature element is greater than or equal to a sixth threshold; or any variance of the Gaussian mixture distribution of the first feature element is greater than or equal to a seventh threshold, where k is an integer and k is one of a plurality of candidate values of the first feature element.
- The apparatus according to claim 28, wherein when the probability estimation result of the first feature element is obtained through an asymmetric Gaussian distribution, the preset condition is: the absolute value of the difference between the mean of the asymmetric Gaussian distribution of the first feature element and the value k of the first feature element is greater than or equal to an eighth threshold; or a first variance of the asymmetric Gaussian distribution of the first feature element is greater than or equal to a ninth threshold; or a second variance of the asymmetric Gaussian distribution of the first feature element is greater than or equal to a tenth threshold, where k is an integer and k is one of a plurality of candidate values of the first feature element.
- The apparatus according to any one of claims 29-34, wherein the probability value that the first feature element takes the value k is the maximum probability value among the probability values of all candidate values of the first feature element.
- The apparatus according to claim 27, wherein the determining, based on the probability estimation result of the first feature element, whether to perform entropy encoding on the first feature element comprises: inputting the probability estimation result of the feature data into a generation network to obtain decision information of the first feature element, and determining, based on the decision information of the first feature element, whether to perform entropy encoding on the first feature element.
- The apparatus according to claim 36, wherein when the decision information of the feature data is a decision map, it is determined that entropy encoding needs to be performed on the first feature element when the value at the position corresponding to the first feature element in the decision map is a preset value, and it is determined that entropy encoding does not need to be performed on the first feature element when the value at the position corresponding to the first feature element in the decision map is not the preset value.
- The apparatus according to claim 36, wherein it is determined that entropy encoding needs to be performed on the first feature element when the decision information of the feature data is a preset value, and it is determined that entropy encoding does not need to be performed on the first feature element when the decision information is not the preset value.
- The apparatus according to any one of claims 27-38, wherein the plurality of feature elements further include a second feature element, and when it is determined that entropy encoding does not need to be performed on the second feature element, entropy encoding of the second feature element is skipped.
- The apparatus according to any one of claims 27-39, wherein the encoding module is further configured to write entropy encoding results of the plurality of feature elements, including the first feature element, into the encoded bitstream.
- A feature data decoding apparatus, comprising: an obtaining module configured to obtain a bitstream of feature data to be decoded, where the feature data to be decoded includes a plurality of feature elements and the plurality of feature elements include a first feature element, and to obtain a probability estimation result of the first feature element; and a decoding module configured to determine, based on the probability estimation result of the first feature element, whether to perform entropy decoding on the first feature element, and to perform entropy decoding on the first feature element only when it is determined that entropy decoding needs to be performed on the first feature element.
- The apparatus according to claim 41, wherein the determining, based on the probability estimation result of the first feature element, whether to perform entropy decoding on the first feature element comprises: when the probability estimation result of the first feature element satisfies a preset condition, determining that entropy decoding needs to be performed on the first feature element of the feature data; or when the probability estimation result of the first feature element does not satisfy the preset condition, determining that entropy decoding does not need to be performed on the first feature element of the feature data, and setting the feature value of the first feature element to k, where k is an integer and k is one of a plurality of candidate values.
- The apparatus according to claim 42, wherein when the probability estimation result of the first feature element is a probability value that the first feature element takes a value k, the preset condition is that the probability value that the first feature element takes the value k is less than or equal to a first threshold, where k is an integer and k is one of a plurality of candidate values of the first feature element.
- The apparatus according to claim 42, wherein when the probability estimation result of the first feature element includes a first parameter and a second parameter of a probability distribution of the first feature element, the preset condition is: an absolute value of a difference between the first parameter of the probability distribution of the first feature element and the value k of the first feature element is greater than or equal to a second threshold; or the second parameter of the probability distribution of the first feature element is greater than or equal to a third threshold; or a sum of the absolute value of the difference between the first parameter of the probability distribution of the first feature element and the value k of the first feature element and the second parameter of the probability distribution of the first feature element is greater than or equal to a fourth threshold, where k is an integer and k is one of a plurality of candidate values of the first feature element.
- The apparatus according to claim 44, wherein: when the probability distribution is a Gaussian distribution, the first parameter of the probability distribution of the first feature element is a mean of the Gaussian distribution of the first feature element, and the second parameter of the probability distribution of the first feature element is a variance of the Gaussian distribution of the first feature element; or when the probability distribution is a Laplace distribution, the first parameter of the probability distribution of the first feature element is a location parameter of the Laplace distribution of the first feature element, and the second parameter of the probability distribution of the first feature element is a scale parameter of the Laplace distribution of the first feature element.
- The apparatus according to claim 42, wherein when the probability estimation result of the first feature element is obtained through a Gaussian mixture distribution, the preset condition is: the sum of the absolute values of the differences between each mean of the Gaussian mixture distribution of the first feature element and the value k of the first feature element, plus any variance of the Gaussian mixture distribution of the first feature element, is greater than or equal to a fifth threshold; or the difference between any mean of the Gaussian mixture distribution of the first feature element and the value k of the first feature element is greater than or equal to a sixth threshold; or any variance of the Gaussian mixture distribution of the first feature element is greater than or equal to a seventh threshold, where k is an integer and k is one of a plurality of candidate values of the first feature element.
- The apparatus according to claim 42, wherein when the probability estimation result of the first feature element is obtained through an asymmetric Gaussian distribution, the preset condition is: the absolute value of the difference between the mean of the asymmetric Gaussian distribution of the first feature element and the value k of the first feature element is greater than or equal to an eighth threshold; or a first variance of the asymmetric Gaussian distribution of the first feature element is greater than or equal to a ninth threshold; or a second variance of the asymmetric Gaussian distribution of the first feature element is greater than or equal to a tenth threshold, where k is an integer and k is one of a plurality of candidate values of the first feature element.
- The apparatus according to any one of claims 42-47, wherein the probability value that the first feature element takes the value k is the maximum probability value among the probability values of all candidate values of the first feature element.
- The apparatus according to claim 41, wherein the determining, based on the probability estimation result of the first feature element, whether to perform entropy decoding on the first feature element comprises: inputting the probability estimation result of the feature data into a generation network to obtain decision information of the first feature element, and determining, based on the decision information of the first feature element, whether to perform entropy decoding on the first feature element.
- The apparatus according to claim 49, wherein when the decision information of the feature data is a decision map, entropy decoding needs to be performed on the first feature element when the value at the position corresponding to the first feature element in the decision map is a preset value, and entropy decoding does not need to be performed on the first feature element when the value at the position corresponding to the first feature element in the decision map is not the preset value.
- The apparatus according to claim 49, wherein it is determined that entropy decoding needs to be performed on the first feature element when the decision information of the feature data is a preset value, and it is determined that entropy decoding does not need to be performed on the first feature element when the decision information is not the preset value.
- The apparatus according to any one of claims 41-51, wherein the decoding module is further configured to pass the feature data through a decoding network to obtain reconstructed data or machine-task-oriented data.
- A feature data encoding method, comprising: obtaining feature data to be encoded, where the feature data includes a plurality of feature elements and the plurality of feature elements include a first feature element; obtaining side information of the feature data, and inputting the side information of the feature data into a joint network to obtain decision information of the first feature element; determining, based on the decision information of the first feature element, whether to perform entropy encoding on the first feature element; and performing entropy encoding on the first feature element only when it is determined that entropy encoding needs to be performed on the first feature element.
- The method according to claim 53, wherein when the decision information of the feature data is a decision map, it is determined that entropy encoding needs to be performed on the first feature element when the value at the position corresponding to the first feature element in the decision map is a preset value, and it is determined that entropy encoding does not need to be performed on the first feature element when the value at the position corresponding to the first feature element in the decision map is not the preset value.
- The method according to claim 53, wherein it is determined that entropy encoding needs to be performed on the first feature element when the decision information of the feature data is a preset value, and it is determined that entropy encoding does not need to be performed on the first feature element when the decision information is not the preset value.
- The method according to any one of claims 53-55, wherein the plurality of feature elements further include a second feature element, and when it is determined that entropy encoding does not need to be performed on the second feature element, entropy encoding of the second feature element is skipped.
- The method according to any one of claims 53-56, further comprising: writing entropy encoding results of the plurality of feature elements, including the first feature element, into the encoded bitstream.
- A feature data decoding method, comprising: obtaining a bitstream of feature data to be decoded and side information of the feature data to be decoded, where the feature data to be decoded includes a plurality of feature elements and the plurality of feature elements include a first feature element; inputting the side information of the feature data to be decoded into a joint network to obtain decision information of the first feature element; determining, based on the decision information of the first feature element, whether to perform entropy decoding on the first feature element; and performing entropy decoding on the first feature element only when it is determined that entropy decoding needs to be performed on the first feature element.
- The method according to claim 58, wherein when the decision information of the feature data is a decision map, it is determined that entropy decoding needs to be performed on the first feature element when the value at the position corresponding to the first feature element in the decision map is a preset value; when the value at the position corresponding to the first feature element in the decision map is not the preset value, it is determined that entropy decoding does not need to be performed on the first feature element, and the feature value of the first feature element is set to k, where k is an integer.
- The method according to claim 58, wherein it is determined that entropy decoding needs to be performed on the first feature element when the decision information of the feature data is a preset value; when the decision information is not the preset value, it is determined that entropy decoding does not need to be performed on the first feature element, and the feature value of the first feature element is set to k, where k is an integer.
- The method according to any one of claims 58-60, further comprising: passing the feature data through a decoding network to obtain reconstructed data or machine-task-oriented data.
- An encoder, comprising a processing circuit configured to perform the method according to any one of claims 1 to 14 and 53 to 57.
- A decoder, comprising a processing circuit configured to perform the method according to any one of claims 15 to 26 and 58 to 61.
- A computer program product, comprising program code that, when run on a computer or a processor, performs the method according to any one of claims 1 to 26 and 53 to 61.
- A non-transitory computer-readable storage medium, comprising a bitstream obtained by the encoding method according to claim 14 or 57.
- An encoder, comprising: one or more processors; and a non-transitory computer-readable storage medium coupled to the one or more processors and storing a program for execution by the one or more processors, where the program, when executed by the one or more processors, causes the encoder to perform the method according to any one of claims 1 to 14 and 53 to 57.
- A decoder, comprising: one or more processors; and a non-transitory computer-readable storage medium coupled to the one or more processors and storing a program for execution by the one or more processors, where the program, when executed by the one or more processors, causes the decoder to perform the method according to any one of claims 15 to 26 and 58 to 61.
- An encoder, comprising: one or more processors; and a non-transitory computer-readable storage medium coupled to the one or more processors and storing a program for execution by the one or more processors, where the program, when executed by the one or more processors, causes the encoder to perform the method according to any one of claims 1 to 14 and 53 to 57.
- An image or audio processor, comprising a processing circuit configured to perform the method according to any one of claims 1 to 26 and 53 to 61.
- A non-transitory computer-readable storage medium, comprising program code that, when executed by a computer device, performs the method according to any one of claims 1 to 26 and 53 to 61.
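To make the threshold tests of claims 4, 5, and 18 (the Gaussian case) concrete, the encode-or-skip decision can be sketched as below. This is an illustrative sketch under assumed names: `should_entropy_code` and `first_threshold_from_list` are hypothetical helpers, and the claims do not fix any concrete threshold values.

```python
def should_entropy_code(mu, sigma, k, t2, t3, t4):
    """Gaussian-case preset condition (claims 4/5/18, paraphrased):
    entropy-code the element if |mu - k| >= t2, or sigma >= t3, or
    |mu - k| + sigma >= t4, where mu is the mean (first parameter),
    sigma the variance (second parameter), and k a candidate value."""
    return (abs(mu - k) >= t2) or (sigma >= t3) or (abs(mu - k) + sigma >= t4)


def first_threshold_from_list(candidates, index):
    """Threshold-candidate-list lookup (claims 6 and 32): the encoder
    writes an index into a list of length T >= 1 into the bitstream;
    the decoder reads the index and takes the value at that position
    as the first threshold."""
    return candidates[index]
```

An element whose distribution is sharply peaked at k (both |mu - k| and sigma small) fails all three tests, so it is skipped at the encoder and reconstructed as k at the decoder.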
Priority Applications (8)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP22815293.0A EP4336829A1 (en) | 2021-06-02 | 2022-06-01 | Feature data encoding method and apparatus and feature data decoding method and apparatus |
KR1020237045517A KR20240016368A (ko) | 2021-06-02 | 2022-06-01 | Feature data encoding and decoding method and apparatus |
CA3222179A CA3222179A1 (en) | 2021-06-02 | 2022-06-01 | Feature data encoding and decoding method and apparatus |
AU2022286517A AU2022286517A1 (en) | 2021-06-02 | 2022-06-01 | Feature data encoding method and apparatus and feature data decoding method and apparatus |
MX2023014419A MX2023014419A (es) | 2021-06-02 | 2022-06-01 | Feature data encoding and decoding method and apparatus. |
JP2023574690A JP2024520151A (ja) | 2021-06-02 | 2022-06-01 | Feature data encoding and decoding method and apparatus |
BR112023025167A BR112023025167A2 (pt) | 2021-06-02 | 2022-06-01 | Feature data encoding and decoding method and apparatus |
US18/526,406 US20240105193A1 (en) | 2021-06-02 | 2023-12-01 | Feature Data Encoding and Decoding Method and Apparatus |
Applications Claiming Priority (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110616029 | 2021-06-02 | ||
CN202110616029.2 | 2021-06-02 | ||
CN202110674299 | 2021-06-17 | ||
CN202110674299.9 | 2021-06-17 | ||
CN202111091143.4 | 2021-09-17 | ||
CN202111091143.4A CN115442609A (zh) | 2021-06-02 | 2021-09-17 | Feature data encoding and decoding method and apparatus |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US18/526,406 Continuation US20240105193A1 (en) | 2021-06-02 | 2023-12-01 | Feature Data Encoding and Decoding Method and Apparatus |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022253249A1 true WO2022253249A1 (zh) | 2022-12-08 |
Family
ID=84271885
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/096510 WO2022253249A1 (zh) | 2021-06-02 | 2022-06-01 | Feature data encoding and decoding method and apparatus |
Country Status (10)
Country | Link |
---|---|
US (1) | US20240105193A1 (zh) |
EP (1) | EP4336829A1 (zh) |
JP (1) | JP2024520151A (zh) |
KR (1) | KR20240016368A (zh) |
CN (1) | CN115442609A (zh) |
AU (1) | AU2022286517A1 (zh) |
BR (1) | BR112023025167A2 (zh) |
CA (1) | CA3222179A1 (zh) |
MX (1) | MX2023014419A (zh) |
WO (1) | WO2022253249A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2024188073A1 (zh) * | 2023-03-10 | 2024-09-19 | Huawei Technologies Co., Ltd. | Encoding and decoding method and electronic device |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3173430B1 (en) | 2014-07-24 | 2019-11-13 | Japan Polyethylene Corporation | Olefin polymerization catalyst and method for producing olefin polymer |
CN118338011A (zh) * | 2023-01-10 | 2024-07-12 | Hangzhou Hikvision Digital Technology Co., Ltd. | Decoding and encoding method, apparatus, and device |
CN116828184B (zh) * | 2023-08-28 | 2023-12-22 | Tencent Technology (Shenzhen) Co., Ltd. | Video encoding and decoding method, apparatus, computer device, and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10127913B1 (en) * | 2017-07-07 | 2018-11-13 | Sif Codec Llc | Method of encoding of data stream, method of decoding of data stream, and devices for implementation of said methods |
CN111107377A (zh) * | 2018-10-26 | 2020-05-05 | 曜科智能科技(上海)有限公司 | Depth image compression method, and apparatus, device, and storage medium thereof |
US10652581B1 (en) * | 2019-02-27 | 2020-05-12 | Google Llc | Entropy coding in image and video compression using machine learning |
CN111988629A (zh) * | 2019-05-22 | 2020-11-24 | Fujitsu Limited | Image encoding method and apparatus, and image decoding method and apparatus |
-
2021
- 2021-09-17 CN CN202111091143.4A patent/CN115442609A/zh active Pending
-
2022
- 2022-06-01 BR BR112023025167A patent/BR112023025167A2/pt unknown
- 2022-06-01 AU AU2022286517A patent/AU2022286517A1/en active Pending
- 2022-06-01 EP EP22815293.0A patent/EP4336829A1/en active Pending
- 2022-06-01 WO PCT/CN2022/096510 patent/WO2022253249A1/zh active Application Filing
- 2022-06-01 JP JP2023574690A patent/JP2024520151A/ja active Pending
- 2022-06-01 KR KR1020237045517A patent/KR20240016368A/ko active Search and Examination
- 2022-06-01 CA CA3222179A patent/CA3222179A1/en active Pending
- 2022-06-01 MX MX2023014419A patent/MX2023014419A/es unknown
-
2023
- 2023-12-01 US US18/526,406 patent/US20240105193A1/en active Pending
Also Published As
Publication number | Publication date |
---|---|
CN115442609A (zh) | 2022-12-06 |
US20240105193A1 (en) | 2024-03-28 |
CA3222179A1 (en) | 2022-12-08 |
AU2022286517A1 (en) | 2023-12-21 |
KR20240016368A (ko) | 2024-02-06 |
MX2023014419A (es) | 2024-03-05 |
JP2024520151A (ja) | 2024-05-21 |
BR112023025167A2 (pt) | 2024-02-27 |
EP4336829A1 (en) | 2024-03-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022253249A1 (zh) | Feature data encoding and decoding method and apparatus | |
CN113287306B (zh) | Method, apparatus, and decoder for decoding an encoded video block from a bitstream | |
WO2022068716A1 (zh) | Entropy encoding/decoding method and apparatus | |
WO2021249290A1 (zh) | Loop filtering method and apparatus | |
US20210150769A1 (en) | High efficiency image and video compression and decompression | |
WO2022194137A1 (zh) | Video picture encoding and decoding method and related device | |
WO2023279961A1 (zh) | Video picture encoding and decoding method and apparatus | |
CN116711308A (zh) | Video encoding and decoding and model training method and apparatus | |
CN114125446A (zh) | Image encoding method, decoding method, and apparatus | |
JP2024513693A (ja) | Configurable position of auxiliary information input to a picture data processing neural network | |
WO2022156688A1 (zh) | Layered encoding and decoding method and apparatus | |
WO2022100173A1 (zh) | Video frame compression and video frame decompression method and apparatus | |
US20230396810A1 (en) | Hierarchical audio/video or picture compression method and apparatus | |
WO2022111233A1 (zh) | Intra prediction mode coding method and apparatus | |
WO2022063267A1 (zh) | Intra prediction method and apparatus | |
US20230412807A1 (en) | Bit allocation for neural network feature channel compression | |
CN115834888A (zh) | Feature map encoding and decoding method and apparatus | |
WO2023279968A1 (zh) | Video picture encoding and decoding method and apparatus | |
WO2023165487A1 (zh) | Feature-domain optical flow determination method and related device | |
WO2024217530A1 (en) | Method and apparatus for image encoding and decoding | |
US20240296594A1 (en) | Generalized Difference Coder for Residual Coding in Video Compression |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22815293 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: MX/A/2023/014419 Country of ref document: MX Ref document number: 3222179 Country of ref document: CA Ref document number: 2301007889 Country of ref document: TH |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2023574690 Country of ref document: JP Ref document number: 2022286517 Country of ref document: AU Ref document number: AU2022286517 Country of ref document: AU |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2022815293 Country of ref document: EP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 202337084042 Country of ref document: IN |
|
REG | Reference to national code |
Ref country code: BR Ref legal event code: B01A Ref document number: 112023025167 Country of ref document: BR |
|
ENP | Entry into the national phase |
Ref document number: 2022815293 Country of ref document: EP Effective date: 20231206 |
|
ENP | Entry into the national phase |
Ref document number: 2022286517 Country of ref document: AU Date of ref document: 20220601 Kind code of ref document: A |
|
ENP | Entry into the national phase |
Ref document number: 20237045517 Country of ref document: KR Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2023135295 Country of ref document: RU |
|
ENP | Entry into the national phase |
Ref document number: 112023025167 Country of ref document: BR Kind code of ref document: A2 Effective date: 20231130 |