WO2023011420A1 - Encoding and decoding method and apparatus
- Publication number: WO2023011420A1
- Application: PCT/CN2022/109485 (CN2022109485W)
- Authority: WIPO (PCT)
- Prior art keywords: feature; rounded value; decoded; code stream; decoding
Classifications
- H04N19/13—Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
- H04N19/42—Coding/decoding characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
- H04N19/102—Adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/136—Adaptive coding controlled by incoming video signal characteristics or properties
- H04N19/147—Adaptive coding controlled by the data rate or code amount at the encoder output according to rate distortion criteria
- H04N19/184—Adaptive coding characterised by the coding unit, the unit being bits, e.g. of the compressed video stream
- H04N19/189—Adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used
- H04N19/46—Embedding additional information in the video signal during the compression process
- H04N19/463—Embedding additional information by compressing encoding parameters before transmission
- H04N19/91—Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
- G06N3/02—Neural networks (computing arrangements based on biological models)
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06N3/047—Probabilistic or stochastic networks
- G06N3/048—Activation functions
- G06T9/00—Image coding (image data processing or generation, in general)
Definitions
- the present application relates to the technical field of artificial intelligence, and in particular to encoding and decoding methods and devices.
- Compressing images based on neural networks can improve the efficiency of image compression.
- Existing neural-network-based image compression methods are mainly divided into those that require online training (referred to as method 1) and those that do not (referred to as method 2). Method 1 has better rate-distortion performance but compresses images slowly because of the online training; method 2 compresses images quickly but has poorer rate-distortion performance.
- the present application provides an encoding and decoding method and device, which can improve the rate-distortion performance of data encoding and decoding without requiring online training.
- the application adopts the following technical solutions:
- the present application discloses an encoding method. The method includes: first obtaining data to be encoded; then inputting the data to be encoded into a first encoding network to obtain target parameters; then constructing a second encoding network according to the target parameters; next, inputting the data to be encoded into the second encoding network to obtain a first feature; and finally encoding the first feature to obtain an encoded code stream.
- in the prior art, the encoding network uses fixed parameter weights to extract the content features (i.e., the first feature) of the data to be encoded, encodes the content features into a code stream (i.e., the encoded code stream), and sends it to the decoding end. The decoding end decodes and reconstructs the code stream to obtain decoded data. It can be seen that the parameter weights of the encoding network in the prior art are not related to the data to be encoded.
- in this application, the data to be encoded is first input into the first encoding network, the first encoding network generates the parameter weights of the second encoding network according to the data to be encoded, and the second encoding network is then dynamically adjusted according to the obtained weights. The parameter weights of the second encoding network are therefore related to the data to be encoded, which increases the expressive ability of the second encoding network, so that the decoded data the decoding end reconstructs from the code stream (obtained by encoding the first feature) is closer to the data to be encoded, thereby improving the rate-distortion performance of the codec network.
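- as an illustrative sketch only (not the architecture claimed in this application): the following PyTorch-style code shows one way a first encoding network can generate the weights of a second encoding network from the data to be encoded. The class names, layer sizes, and the one-convolution second network are all assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FirstEncodingNetwork(nn.Module):
    """Sketch: maps the data to be encoded to target parameters
    (here, the weights and biases of a single 3x3 convolution)."""
    def __init__(self, in_ch=3, feat_ch=8):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),        # global summary of the input
        )
        # predict feat_ch * in_ch * 3 * 3 weights plus feat_ch biases
        self.head = nn.Linear(32, feat_ch * in_ch * 9 + feat_ch)

    def forward(self, x):
        return self.head(self.backbone(x).flatten(1))   # target parameters

def second_encoding_network(x, target_params, in_ch=3, feat_ch=8):
    """Sketch: the second encoding network, constructed from the target
    parameters, extracts the first feature from the same input."""
    n_w = feat_ch * in_ch * 9
    w = target_params[0, :n_w].view(feat_ch, in_ch, 3, 3)
    b = target_params[0, n_w:]
    return F.conv2d(x, w, b, padding=1)                 # first feature

x = torch.randn(1, 3, 64, 64)                # data to be encoded
target_params = FirstEncodingNetwork()(x)    # steps 1-2: obtain target parameters
first_feature = second_encoding_network(x, target_params)  # steps 3-4
```

- because the weights of the second encoding network are a function of the input, the extracted feature becomes input-dependent without any online training, which is the mechanism the paragraph above describes.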
- the target parameters are the parameter weights of all or some of the convolutions and nonlinear activations of the second encoding network.
- encoding the first feature to obtain an encoded code stream includes: rounding the first feature to obtain a rounded value of the first feature; performing probability estimation on the rounded value of the first feature to obtain an estimated probability distribution of the rounded value of the first feature; and performing entropy encoding on the rounded value of the first feature according to the estimated probability distribution of the rounded value of the first feature to obtain the encoded code stream.
- entropy encoding the rounded value of the first feature into a code stream can reduce the coding redundancy of the first feature and further reduce the amount of data transferred during data encoding and decoding (compression).
- performing probability estimation on the rounded value of the first feature to obtain the estimated probability distribution of the rounded value of the first feature includes: performing probability estimation on the rounded value of the first feature according to first information to obtain the estimated probability distribution of the rounded value of the first feature, where the first information includes at least one of context information and side information.
- Estimating the probability distribution through the context information and side information can improve the accuracy of the estimated probability distribution, thereby reducing the code rate in the entropy coding process and reducing the entropy coding overhead.
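- as a hedged illustration of the rounding, probability-estimation, and entropy-coding steps (the Gaussian entropy model and all names below are assumptions of the sketch; the application does not fix a particular probability model): the probability of a rounded value can be taken as the mass a continuous density assigns to the unit interval around it, and an entropy coder then spends roughly -log2(p) bits per symbol.

```python
import torch

def estimate_probability(y_hat, mu, sigma):
    """Probability of each rounded value under a Gaussian whose mean and
    scale would, in practice, come from context and/or side information."""
    normal = torch.distributions.Normal(mu, sigma)
    # probability mass of the unit-width bin centred on the rounded value
    return normal.cdf(y_hat + 0.5) - normal.cdf(y_hat - 0.5)

y = torch.randn(1, 8, 16, 16) * 3        # first feature
y_hat = torch.round(y)                   # rounded value of the first feature
mu = torch.zeros_like(y_hat)             # stand-ins for a learned entropy model
sigma = torch.full_like(y_hat, 3.0)

p = estimate_probability(y_hat, mu, sigma).clamp_min(1e-9)
rate_bits = (-torch.log2(p)).sum()       # approximate entropy-coded stream length
print(f"estimated code length: {rate_bits.item():.0f} bits")
```

- the better mu and sigma match the true statistics of the rounded values (which is what the context and side information provide), the smaller -log2(p) becomes, which is exactly the code-rate reduction described above.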
- the present application provides a decoding method. The method includes: first obtaining a code stream to be decoded; then decoding the code stream to be decoded to obtain the rounded value of the first feature and the rounded value of the second feature; then inputting the rounded value of the second feature into the first decoding network to obtain the target parameters; then constructing the second decoding network according to the target parameters; and finally inputting the rounded value of the first feature into the second decoding network to obtain decoded data.
- the rounded value of the first feature is used to obtain decoded data
- the rounded value of the second feature is used to obtain a target parameter.
- in the prior art, the parameter weights of the decoding network (i.e., the second decoding network) are not related to the data to be decoded. In this application, the content features and model features of the data to be decoded (i.e., the first feature and the second feature) are compiled into the code stream to be decoded. The decoding end obtains the rounded value of the second feature by decoding the code stream to be decoded, inputs the rounded value of the second feature into the first decoding network to obtain the parameter weights of the second decoding network, and then dynamically adjusts the second decoding network according to those weights. The parameter weights of the second decoding network are thus correlated with the data to be decoded, which improves the expressive ability of the second decoding network and makes the decoded data reconstructed by the second decoding network closer to the data that was encoded, thereby improving the rate-distortion performance of the codec network.
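- mirroring the encoder-side sketch above, the decoder side can be sketched as follows (again, all names and the one-convolution second decoding network are assumptions of the example, not the claimed architecture):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FirstDecodingNetwork(nn.Module):
    """Sketch: maps the rounded value of the second feature (the model
    feature) to the weights of the second decoding network."""
    def __init__(self, model_feat_dim=16, feat_ch=8, out_ch=3):
        super().__init__()
        self.head = nn.Linear(model_feat_dim, out_ch * feat_ch * 9 + out_ch)

    def forward(self, z_hat):
        return self.head(z_hat)                          # target parameters

def second_decoding_network(y_hat, target_params, feat_ch=8, out_ch=3):
    """Sketch: reconstructs the decoded data from the rounded value of
    the first feature (the content feature)."""
    n_w = out_ch * feat_ch * 9
    w = target_params[0, :n_w].view(out_ch, feat_ch, 3, 3)
    b = target_params[0, n_w:]
    return F.conv2d(y_hat, w, b, padding=1)              # decoded data

y_hat = torch.round(torch.randn(1, 8, 64, 64))   # decoded first feature
z_hat = torch.round(torch.randn(1, 16))          # decoded second feature
target_params = FirstDecodingNetwork()(z_hat)
decoded = second_decoding_network(y_hat, target_params)  # shape (1, 3, 64, 64)
```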
- the target parameters are the parameter weights of all or some of the convolutions and nonlinear activations of the second decoding network.
- the code stream to be decoded may include a first code stream to be decoded and a second code stream to be decoded.
- decoding the code stream to be decoded to obtain the rounded value of the first feature and the rounded value of the second feature includes: decoding the first code stream to be decoded to obtain the rounded value of the first feature; and decoding the second code stream to be decoded to obtain the rounded value of the second feature.
- decoding the first code stream to be decoded to obtain the rounded value of the first feature includes: performing probability estimation on the rounded value of the first feature in the first code stream to be decoded to obtain an estimated probability distribution of the rounded value of the first feature; and performing entropy decoding on the first code stream to be decoded according to the estimated probability distribution of the rounded value of the first feature to obtain the rounded value of the first feature.
- performing probability estimation on the rounded value of the first feature in the first code stream to be decoded to obtain an estimated probability distribution of the rounded value of the first feature includes: performing the probability estimation according to first information, where the first information includes at least one of context information and side information.
- decoding the second code stream to be decoded to obtain the rounded value of the second feature includes: performing probability estimation on the rounded value of the second feature in the second code stream to be decoded to obtain an estimated probability distribution of the rounded value of the second feature; and performing entropy decoding on the second code stream to be decoded according to the estimated probability distribution of the rounded value of the second feature to obtain the rounded value of the second feature.
- performing probability estimation on the rounded value of the second feature in the second code stream to be decoded to obtain an estimated probability distribution of the rounded value of the second feature includes: performing the probability estimation according to first information, where the first information includes at least one of context information and side information.
- the present application provides a decoding method. The method includes: first obtaining a code stream to be decoded; then decoding the code stream to be decoded to obtain the rounded value of the first feature; then inputting the rounded value of the first feature into the first decoding network to obtain the target parameters; then constructing the second decoding network according to the target parameters; and finally inputting the rounded value of the first feature into the second decoding network to obtain decoded data.
- the first feature is used to obtain decoded data and target parameters.
- in the prior art, the parameter weights of the decoding network (i.e., the second decoding network) are not related to the data to be decoded. In this application, the rounded value of the first feature is obtained by decoding the code stream to be decoded, which was encoded from the feature of the data to be decoded (i.e., the first feature). The rounded value of the first feature is input into the first decoding network to obtain the parameter weights of the second decoding network, and the second decoding network is then dynamically adjusted according to those weights. The parameter weights of the second decoding network are thus related to the data to be decoded, which improves the expressive ability of the second decoding network and makes the decoded data reconstructed by the second decoding network closer to the data that was encoded, thereby improving the rate-distortion performance of the codec network.
- the target parameters are the parameter weights of all or some of the convolutions and nonlinear activations of the second decoding network.
- decoding the code stream to be decoded to obtain the rounded value of the first feature includes: performing probability estimation on the rounded value of the first feature in the code stream to be decoded to obtain an estimated probability distribution of the rounded value of the first feature; and performing entropy decoding on the code stream to be decoded according to the estimated probability distribution of the rounded value of the first feature to obtain the rounded value of the first feature.
- performing probability estimation on the rounded value of the first feature in the code stream to be decoded to obtain an estimated probability distribution of the rounded value of the first feature includes: performing the probability estimation according to first information, where the first information includes at least one of context information and side information.
- the present application provides an encoding device. The encoding device includes a processing circuit, and the processing circuit is configured to: acquire data to be encoded; input the data to be encoded into a first encoding network to obtain target parameters; construct a second encoding network according to the target parameters; input the data to be encoded into the second encoding network to obtain a first feature; and encode the first feature to obtain an encoded code stream.
- the target parameters are the parameter weights of all or some of the convolutions and nonlinear activations of the second encoding network.
- the processing circuit is specifically configured to: round the first feature to obtain a rounded value of the first feature; perform probability estimation on the rounded value of the first feature to obtain an estimated probability distribution of the rounded value of the first feature; and perform entropy encoding on the rounded value of the first feature according to the estimated probability distribution of the rounded value of the first feature to obtain the encoded code stream.
- the processing circuit is specifically configured to: perform probability estimation on the rounded value of the first feature according to first information to obtain an estimated probability distribution of the rounded value of the first feature, where the first information includes at least one of context information and side information.
- the present application provides a decoding device. The decoding device includes a processing circuit, and the processing circuit is configured to: obtain a code stream to be decoded; decode the code stream to be decoded to obtain the rounded value of the first feature and the rounded value of the second feature, where the rounded value of the first feature is used to obtain decoded data and the rounded value of the second feature is used to obtain target parameters; input the rounded value of the second feature into a first decoding network to obtain the target parameters; construct a second decoding network according to the target parameters; and input the rounded value of the first feature into the second decoding network to obtain the decoded data.
- the target parameters are the parameter weights of all or some of the convolutions and nonlinear activations of the second decoding network.
- the code stream to be decoded includes a first code stream to be decoded and a second code stream to be decoded.
- the processing circuit is specifically configured to: decode the first code stream to be decoded to obtain the rounded value of the first feature; and decode the second code stream to be decoded to obtain the rounded value of the second feature.
- the processing circuit is specifically configured to: perform probability estimation on the rounded value of the first feature in the first code stream to be decoded to obtain an estimated probability distribution of the rounded value of the first feature; and perform entropy decoding on the first code stream to be decoded according to the estimated probability distribution of the rounded value of the first feature to obtain the rounded value of the first feature.
- the processing circuit is specifically configured to: perform probability estimation on the rounded value of the first feature in the first code stream to be decoded according to first information to obtain an estimated probability distribution of the rounded value of the first feature, where the first information includes at least one of context information and side information.
- the processing circuit is specifically configured to: perform probability estimation on the rounded value of the second feature in the second code stream to be decoded to obtain an estimated probability distribution of the rounded value of the second feature; and perform entropy decoding on the second code stream to be decoded according to the estimated probability distribution of the rounded value of the second feature to obtain the rounded value of the second feature.
- the processing circuit is specifically configured to: perform probability estimation on the rounded value of the second feature in the second code stream to be decoded according to first information to obtain an estimated probability distribution of the rounded value of the second feature, where the first information includes at least one of context information and side information.
- the present application provides a decoding device. The decoding device includes a processing circuit, and the processing circuit is configured to: obtain a code stream to be decoded; decode the code stream to be decoded to obtain the rounded value of the first feature, where the rounded value of the first feature is used to obtain decoded data and target parameters; input the rounded value of the first feature into the first decoding network to obtain the target parameters; construct the second decoding network according to the target parameters; and input the rounded value of the first feature into the second decoding network to obtain the decoded data.
- the target parameters are the parameter weights of all or some of the convolutions and nonlinear activations of the second decoding network.
- the processing circuit is specifically configured to: perform probability estimation on the rounded value of the first feature in the code stream to be decoded to obtain an estimated probability distribution of the rounded value of the first feature; and perform entropy decoding on the code stream to be decoded according to the estimated probability distribution of the rounded value of the first feature to obtain the rounded value of the first feature.
- the processing circuit is specifically configured to: perform probability estimation on the rounded value of the first feature in the code stream to be decoded according to first information to obtain an estimated probability distribution of the rounded value of the first feature, where the first information includes at least one of context information and side information.
- an embodiment of the present application further provides an encoder. The encoder includes at least one processor; when the at least one processor executes program code or instructions, the method described in the first aspect above or any possible implementation thereof is realized.
- the encoder may also include at least one memory for storing the program code or instructions.
- an embodiment of the present application further provides a decoder. The decoder includes at least one processor; when the at least one processor executes program code or instructions, the method described in the second aspect above or any possible implementation thereof is realized.
- the decoder may further include at least one memory, and the at least one memory is used to store the program code or instruction.
- the embodiment of the present application further provides a chip, including: an input interface, an output interface, and at least one processor.
- the chip also includes a memory.
- the at least one processor is used to execute the code in the memory, and when the at least one processor executes the code, the chip implements the method described in the above first aspect or any possible implementation thereof.
- the aforementioned chip may also be an integrated circuit.
- the embodiment of the present application further provides a terminal, where the terminal includes the foregoing encoding device, decoding device, encoder, decoder, or the foregoing chip.
- the present application further provides a computer-readable storage medium for storing a computer program, where the computer program includes instructions for implementing the method described in the first aspect above or any possible implementation thereof.
- the embodiment of the present application further provides a computer program product containing instructions, which when run on a computer, enables the computer to implement the method described in the above first aspect or any possible implementation thereof.
- the encoding device, decoding device, encoder, decoder, computer storage medium, computer program product, and chip provided in these embodiments are all used to execute the methods provided above; therefore, for the beneficial effects they can achieve, refer to the beneficial effects of the methods provided above, which will not be repeated here.
- Figure 1a is an exemplary block diagram of a decoding system provided by an embodiment of the present application.
- FIG. 1b is an exemplary block diagram of a video decoding system provided by an embodiment of the present application.
- FIG. 2 is an exemplary block diagram of a video encoder provided in an embodiment of the present application.
- FIG. 3 is an exemplary block diagram of a video decoder provided in an embodiment of the present application.
- FIG. 4 is an exemplary schematic diagram of a candidate image block provided by an embodiment of the present application.
- FIG. 5 is an exemplary block diagram of a video decoding device provided in an embodiment of the present application.
- FIG. 6 is an exemplary block diagram of a device provided by an embodiment of the present application.
- FIG. 7a is a schematic diagram of an application scenario provided by an embodiment of the present application.
- FIG. 7b is a schematic diagram of an application scenario provided by an embodiment of the present application.
- FIG. 8 is a schematic flowchart of a codec method provided in an embodiment of the present application.
- FIG. 9 is a schematic structural diagram of a codec system provided by an embodiment of the present application.
- FIG. 10 is a schematic flowchart of another encoding and decoding method provided in the embodiment of the present application.
- FIG. 11 is a schematic structural diagram of another encoding and decoding system provided by an embodiment of the present application.
- FIG. 12 is a schematic structural diagram of another codec system provided by the embodiment of the present application.
- FIG. 13 is a schematic flowchart of another encoding and decoding method provided in the embodiment of the present application.
- FIG. 14 is a schematic structural diagram of another codec system provided by the embodiment of the present application.
- FIG. 15 is a schematic diagram of the performance of the codec method provided by the embodiment of the present application.
- FIG. 16 is a schematic diagram of an application scenario provided by an embodiment of the present application.
- FIG. 17 is a schematic diagram of another application scenario provided by the embodiment of the present application.
- FIG. 18 is a schematic structural diagram of a codec device provided by an embodiment of the present application.
- FIG. 19 is a schematic structural diagram of another codec device provided by the embodiment of the present application.
- FIG. 20 is a schematic structural diagram of a chip provided by an embodiment of the present application.
- the terms "first" and "second" in the specification and drawings of the present application are used to distinguish different objects, or to distinguish different processes for the same object, rather than to describe a specific order of objects.
- the embodiments of the present application provide an AI-based data compression/decompression technology, in particular a neural-network-based data compression/decompression technology, and specifically provide a codec technology to improve the traditional hybrid data codec system.
- Data encoding and decoding includes two parts: data encoding and data decoding.
- Data encoding is performed on the source side (or commonly referred to as the encoder side), and typically involves processing (e.g., compressing) raw data to reduce the amount of data needed to represent that raw data (and thus store and/or transmit it more efficiently).
- Data decoding is performed on the destination side (or commonly referred to as the decoder side), and usually involves inverse processing relative to the encoder side to reconstruct the original data.
- the "codec" of data involved in the embodiments of the present application should be understood as “encoding” or "decoding" of data.
- the encoding part and the decoding part are also collectively referred to as codec (encoding and decoding, CODEC).
- with lossless data compression, the original data can be reconstructed, i.e., the reconstructed data has the same quality as the original data (assuming no transmission loss or other data loss during storage or transmission).
- with lossy data compression, further compression is performed by quantization and the like to reduce the amount of data required to represent the original data, and the decoder side cannot completely reconstruct the original data, i.e., the quality of the reconstructed data is lower or worse than that of the original data.
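- a tiny numeric illustration of why quantization makes the process lossy (values are illustrative only):

```python
original = [0.7, 1.2, -0.4]
step = 1.0
quantized = [round(v / step) for v in original]    # [1, 1, 0]: fewer distinct values to code
reconstructed = [q * step for q in quantized]      # [1.0, 1.0, 0.0]: not the original values
```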
- the embodiments of the present application may be applied to video data, image data, audio data, integer data, and other data that require compression/decompression.
- video data encoding is referred to as video encoding for short; for other types of data, refer to the following description, which will not be repeated in the embodiments of the present application. It should be noted that, compared with video coding, the coding of data such as audio data and integer data does not require dividing the data into blocks; the data can be coded directly.
- Video coding generally refers to the processing of sequences of images that form a video or video sequence.
- in the field of video coding, the terms "picture", "frame", and "image" may be used as synonyms.
- video coding standards belong to "lossy hybrid video codecs" (i.e., they combine spatial and temporal prediction in the pixel domain with 2D transform coding for applying quantization in the transform domain).
- Each image in a video sequence is usually partitioned into a non-overlapping set of blocks, usually encoded at the block level.
- encoders usually process, i.e., encode, video at the block (video block) level. For example, a prediction block is produced through spatial (intra) prediction and temporal (inter) prediction; the prediction block is subtracted from the current block (the block currently being processed or to be processed) to obtain a residual block; the residual block is transformed in the transform domain and quantized to reduce the amount of data to be transmitted (compressed); and the decoder side applies the inverse processing, relative to the encoder, to the encoded or compressed block to reconstruct the current block for representation. Furthermore, the encoder needs to repeat the decoder's processing steps so that the encoder and decoder generate the same predictions (e.g., intra and inter predictions) and/or reconstructed pixels for processing, i.e., encoding, subsequent blocks.
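- the block-level loop described above can be sketched as follows (a schematic example only, not the encoder 20 of this application; the 8x8 block, the 2D DCT, and the uniform quantization step are assumptions of the sketch):

```python
import numpy as np
from scipy.fft import dctn, idctn

def encode_block(current, prediction, q_step=8.0):
    """One iteration of the hybrid coding loop for a single block:
    predict, form the residual, transform, quantize, then reconstruct
    exactly as the decoder will."""
    residual = current - prediction                    # residual block
    coeffs = dctn(residual, norm="ortho")              # transform domain
    levels = np.round(coeffs / q_step)                 # quantization (the lossy step)
    # `levels` would be entropy-coded into the bitstream (omitted here).
    # The encoder repeats the decoder's steps so both sides hold the
    # same reconstruction for predicting subsequent blocks:
    recon = prediction + idctn(levels * q_step, norm="ortho")
    return levels, recon

current = np.random.rand(8, 8) * 255                   # current block
prediction = np.full((8, 8), 128.0)                    # placeholder intra/inter prediction
levels, recon = encode_block(current, prediction)
print(np.abs(recon - current).max())                   # small but nonzero: lossy
```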
- the encoder 20 and the decoder 30 are described with reference to FIGS. 1 a to 3 .
- FIG. 1a is an exemplary block diagram of a decoding system 10 provided by an embodiment of the present application, for example, a video decoding system 10 (or simply referred to as the decoding system 10) that can utilize the technology of the present application.
- the video encoder 20 (or simply encoder 20) and the video decoder 30 (or simply decoder 30) in the video coding system 10 represent devices that may be used to perform techniques according to various examples described in this application.
- the decoding system 10 includes a source device 12 for providing coded image data 21 such as coded images to a destination device 14 for decoding the coded image data 21 .
- the source device 12 includes an encoder 20 , and optionally, an image source 16 , a preprocessor (or a preprocessing unit) 18 such as an image preprocessor, and a communication interface (or a communication unit) 22 .
- the image source 16 may include or be any type of image capture device for capturing a real-world image or the like, and/or any type of image generation device, such as a computer graphics processor for generating computer-animated images, or any type of device for acquiring and/or providing a real-world image or a computer-generated image (e.g., screen content, a virtual reality (VR) image, and/or any combination thereof (e.g., an augmented reality (AR) image)). The image source may be any type of memory or storage that stores any of the above images.
- the image (or image data) 17 may also be referred to as an original image (or original image data) 17 .
- the preprocessor 18 is used to receive the original image data 17 and perform preprocessing on the original image data 17 to obtain a preprocessed image (or preprocessed image data) 19 .
- preprocessing performed by the preprocessor 18 may include cropping, color format conversion (e.g., from RGB to YCbCr), color grading, or denoising. It can be understood that the preprocessing unit 18 can be an optional component.
- a video encoder (or encoder) 20 is used to receive preprocessed image data 19 and provide encoded image data 21 (to be further described below with reference to FIG. 2 etc.).
- the communication interface 22 in the source device 12 may be used to receive the encoded image data 21 and send the encoded image data 21 (or any other processed version) via the communication channel 13 to another device such as the destination device 14, or to any other device, for storage or direct reconstruction.
- the destination device 14 includes a decoder 30 , and may also optionally include a communication interface (or communication unit) 28 , a post-processor (or post-processing unit) 32 and a display device 34 .
- the communication interface 28 in the destination device 14 is used to receive the encoded image data 21 (or any other processed version) directly from the source device 12 or from any other source device such as a storage device (for example, an encoded-image-data storage device), and to supply the encoded image data 21 to the decoder 30.
- the communication interface 22 and the communication interface 28 can be used to send or receive the encoded image data (or encoded data) 21 through a direct communication link between the source device 12 and the destination device 14, such as a direct wired or wireless connection, or through any type of network, such as a wired network, a wireless network, or any combination thereof, or any type of private or public network, or any combination thereof.
- the communication interface 22 can be used to encapsulate the encoded image data 21 into a suitable format such as a message, and/or to process the encoded image data using any type of transmission encoding or processing, so that it can be transmitted over a communication link or communication network.
- the communication interface 28 corresponds to the communication interface 22 and, for example, can be used to receive the transmitted data and process it using any type of corresponding transmission decoding or processing and/or decapsulation to obtain the encoded image data 21.
- both the communication interface 22 and the communication interface 28 can be configured as one-way communication interfaces, as indicated by the arrow pointing from the source device 12 to the communication channel 13 of the destination device 14 in FIG. 1a, or as two-way communication interfaces, and can be used to send and receive messages and the like, to establish a connection, and to confirm and exchange any other information related to the communication link and/or to data transmission such as the transmission of encoded image data.
- the video decoder (or decoder) 30 is used to receive encoded image data 21 and provide decoded image data (or decoded image data) 31 (which will be further described below with reference to FIG. 3 , etc.).
- the post-processor 32 is used to perform post-processing on decoded image data 31 (also referred to as reconstructed image data) such as a decoded image to obtain post-processed image data 33 such as a post-processed image.
- post-processing performed by the post-processing unit 32 may include, for example, color format conversion (e.g., from YCbCr to RGB), color grading, cropping, or resampling, or any other processing for producing the decoded image data 31 for display by the display device 34 or the like.
- the display device 34 is used to receive the post-processed image data 33 to display the image to a user or viewer or the like.
- the display device 34 may be or include any type of display for representing the reconstructed image, e.g., an integrated or external display screen or display.
- the display screen may include a liquid crystal display (LCD), an organic light emitting diode (OLED) display, a plasma display, a projector, a micro LED display, a liquid crystal on silicon (LCoS) display, a digital light processor (DLP), or any other type of display.
- the decoding system 10 also includes a training engine 25. The training engine 25 is used to train the encoder 20 (especially the entropy encoding unit 270 in the encoder 20) or the decoder 30 (especially the entropy decoding unit 304 in the decoder 30), so that entropy coding is performed on the image block to be coded using the estimated probability distribution.
- for details of the training engine 25, please refer to the following method embodiments.
- although FIG. 1a shows the source device 12 and the destination device 14 as independent devices, device embodiments may also include both devices or the functions of both, that is, include both the source device 12 (or its corresponding function) and the destination device 14 (or its corresponding function). The source device 12 or corresponding function and the destination device 14 or corresponding function may be implemented using the same hardware and/or software, by separate hardware and/or software, or by any combination thereof.
- FIG. 1b is an exemplary block diagram of a video decoding system 40 provided by an embodiment of the present application. The encoder 20 (such as the video encoder 20) or the decoder 30 (such as the video decoder 30), or both, can be realized by processing circuits in the video decoding system 40 shown in FIG. 1b, such as one or more microprocessors, digital signal processors (DSP), application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), discrete logic, hardware, dedicated video encoding processors, or any combination thereof.
- FIG. 2 is an exemplary block diagram of a video encoder provided in an embodiment of the present application.
- FIG. 3 is an exemplary block diagram of a video decoder provided in an embodiment of the present application.
- Encoder 20 may be implemented by processing circuitry 46 to include the various modules discussed with reference to encoder 20 of FIG. 2 and/or any other encoder system or subsystem described herein.
- Decoder 30 may be implemented by processing circuitry 46 to include the various modules discussed with reference to decoder 30 of FIG. 3 and/or any other decoder system or subsystem described herein.
- the processing circuitry 46 may be used to perform various operations discussed below.
- the device can store the instructions of the software in a suitable non-transitory computer-readable storage medium, and use one or more processors to execute the instructions in hardware, thereby implementing the technology of this application.
- one or both of the video encoder 20 and the video decoder 30 may be integrated in a single device as part of a combined codec (encoder/decoder, CODEC), as shown in FIG. 1b.
- the source device 12 and the destination device 14 may comprise any of a variety of devices, including any type of handheld or stationary device, such as a notebook or laptop computer, a mobile phone, a smartphone, a tablet or tablet computer, a camera, a desktop computer, a set-top box, a television, a display device, a digital media player, a video game console, a video streaming device (such as a content service server or a content distribution server), a broadcast receiving device, a broadcast transmitting device, or a monitoring device, and may use no operating system or any type of operating system.
- the source device 12 and the destination device 14 may also be devices in a cloud computing scenario, such as virtual machines in a cloud computing scenario.
- source device 12 and destination device 14 may be equipped with components for wireless communication. Accordingly, source device 12 and destination device 14 may be wireless communication devices.
- the source device 12 and the destination device 14 may install a virtual scene application (APP) such as a virtual reality (VR) application, an augmented reality (AR) application, or a mixed reality (MR) application, and may run the VR, AR, or MR application based on user operations (such as clicking, touching, sliding, shaking, or voice control).
- the source device 12 and the destination device 14 can collect images/videos of any objects in the environment through cameras and/or sensors, and then display virtual objects on the display device according to the collected images/videos.
- the virtual objects can be virtual objects in a VR scene, an AR scene, or an MR scene (that is, objects in a virtual environment).
- the virtual scene applications in the source device 12 and the destination device 14 can be applications built into the source device 12 and the destination device 14, or applications provided by third-party service providers and installed by the user; this is not specifically limited.
- source device 12 and destination device 14 may install real-time video transmission applications, such as live broadcast applications.
- the source device 12 and the destination device 14 can collect images/videos through cameras, and then display the collected images/videos on a display device.
- the video coding system 10 shown in FIG. 1a is merely exemplary, and the techniques provided herein may be applicable to video coding settings (e.g., video encoding or video decoding) that do not necessarily include any data communication between the encoding device and the decoding device.
- data is retrieved from local storage, sent over a network, and so on.
- a video encoding device may encode and store data into memory, and/or a video decoding device may retrieve and decode data from memory.
- encoding and decoding are performed by devices that do not communicate with each other but simply encode data to memory and/or retrieve and decode data from memory.
- FIG. 1b is an exemplary block diagram of a video decoding system 40 provided by an embodiment of the present application.
- the video decoding system 40 may include an imaging device 41, a video encoder 20, a video decoder 30 (and/or a video codec implemented by the processing circuitry 46), an antenna 42, one or more processors 43, one or more memory storages 44, and/or a display device 45.
- imaging device 41, antenna 42, processing circuit 46, video encoder 20, video decoder 30, processor 43, memory storage 44 and/or display device 45 are capable of communicating with each other.
- the video coding system 40 may include only the video encoder 20 or only the video decoder 30 .
- antenna 42 may be used to transmit or receive an encoded bitstream of video data.
- display device 45 may be used to present video data.
- the processing circuit 46 may include application-specific integrated circuit (ASIC) logic, a graphics processor, a general-purpose processor, and the like.
- the video decoding system 40 may also include an optional processor 43, and the optional processor 43 may similarly include application-specific integrated circuit (ASIC) logic, a graphics processor, a general-purpose processor, and the like.
- the memory storage 44 can be any type of memory, such as volatile memory (for example, static random access memory (SRAM), dynamic random access memory (DRAM), etc.) or non-volatile memory (for example, flash memory, etc.), and the like.
- memory storage 44 may be implemented by cache memory.
- the processing circuitry 46 may include memory (e.g., a cache) for implementing an image buffer or the like.
- the video encoder 20 implemented by logic circuitry may include an image buffer (e.g., implemented by the processing circuitry 46 or the memory storage 44) and a graphics processing unit (e.g., implemented by the processing circuitry 46).
- a graphics processing unit may be communicatively coupled to the image buffer.
- Graphics processing unit may include video encoder 20 implemented by processing circuitry 46 to implement the various modules discussed with reference to FIG. 2 and/or any other encoder system or subsystem described herein.
- Logic circuits may be used to perform the various operations discussed herein.
- the video decoder 30 may be implemented by the processing circuitry 46 in a similar manner to implement the various modules discussed with reference to the video decoder 30 of FIG. 3 and/or any other decoder system or subsystem described herein.
- logic circuit implemented video decoder 30 may include an image buffer (implemented by processing circuit 46 or memory storage 44 ) and a graphics processing unit (eg, implemented by processing circuit 46 ).
- a graphics processing unit may be communicatively coupled to the image buffer.
- Graphics processing unit may include video decoder 30 implemented by processing circuitry 46 to implement the various modules discussed with reference to FIG. 3 and/or any other decoder system or subsystem described herein.
- antenna 42 may be used to receive an encoded bitstream of video data.
- an encoded bitstream may contain data related to encoded video frames, indicators, index values, mode selection data, etc., as discussed herein, such as data related to coding partitions (e.g., transform coefficients or quantized transform coefficients, (as discussed) optional indicators, and/or data defining a coding partition).
- Video coding system 40 may also include video decoder 30 coupled to antenna 42 and used to decode the encoded bitstream.
- a display device 45 is used to present video frames.
- the video decoder 30 may be used to perform a reverse process.
- the video decoder 30 may be configured to receive and parse such syntax elements and decode the associated video data accordingly.
- video encoder 20 may entropy encode the syntax elements into an encoded video bitstream.
- video decoder 30 may parse such syntax elements and decode the related video data accordingly.
- VVC versatile video coding
- VCEG video coding experts group
- MPEG moving picture experts group
- HEVC high-efficiency video coding
- the video encoder 20 includes an input terminal (or input interface) 201, a residual calculation unit 204, a transformation processing unit 206, a quantization unit 208, an inverse quantization unit 210, an inverse transformation processing unit 212, a reconstruction unit 214, Loop filter 220 , decoded picture buffer (decoded picture buffer, DPB) 230 , mode selection unit 260 , entropy coding unit 270 and output terminal (or output interface) 272 .
- Mode selection unit 260 may include inter prediction unit 244 , intra prediction unit 254 , and partition unit 262 .
- Inter prediction unit 244 may include a motion estimation unit and a motion compensation unit (not shown).
- the video encoder 20 shown in FIG. 2 may also be called a hybrid video encoder or a video encoder based on a hybrid video codec.
- the inter-frame prediction unit is a trained target model (also called a neural network), and the neural network is used to process an input image or an image region or an image block to generate a prediction value of the input image block.
- a neural network for inter-frame prediction is used to receive an input image or image region or image block and generate a prediction value for the input image or image region or image block.
- the residual calculation unit 204, the transform processing unit 206, the quantization unit 208, and the mode selection unit 260 constitute the forward signal path of the encoder 20, while the inverse quantization unit 210, the inverse transform processing unit 212, the reconstruction unit 214, the buffer 216, the loop filter 220, the decoded picture buffer (decoded picture buffer, DPB) 230, the inter prediction unit 244, and the intra prediction unit 254 constitute the backward signal path of the encoder, where the backward signal path of the encoder 20 corresponds to the signal path of the decoder (see decoder 30 in FIG. 3).
- Inverse quantization unit 210, inverse transform processing unit 212, reconstruction unit 214, loop filter 220, decoded picture buffer 230, inter prediction unit 244, and intra prediction unit 254 also make up the "built-in decoder" of video encoder 20 .
- the encoder 20 is operable to receive, via an input 201 or the like, an image (or image data) 17, eg an image in a sequence of images forming a video or a video sequence.
- the received image or image data may also be a preprocessed image (or preprocessed image data) 19 .
- image 17 may also be referred to as a current image or an image to be encoded (especially when distinguishing the current image from other images in video coding, for example previously encoded and/or decoded images of the same video sequence, that is, the video sequence that also includes the current image).
- a (digital) image is or can be viewed as a two-dimensional array or matrix of pixel points with intensity values. Pixel points in the array may also be referred to as pixels (pixel or pel, short for picture element). The number of pixels in the array or image in the horizontal and vertical directions (or axes) determines the size and/or resolution of the image. In order to represent a color, three color components are usually used, that is, an image can be represented as or include three pixel arrays. In the RGB format or color space, an image includes corresponding red, green and blue pixel arrays.
- each pixel is usually expressed in a luminance/chroma format or color space, such as YCbCr, including a luminance component indicated by Y (sometimes also indicated by L) and two chrominance components indicated by Cb and Cr.
- the luminance (luma) component Y represents brightness or grayscale level intensity (e.g., both are the same in a grayscale image), while the two chrominance (chroma) components Cb and Cr represent chrominance or color information components.
- an image in the YCbCr format includes a luminance pixel point array of luminance pixel point values (Y) and two chrominance pixel point arrays of chrominance values (Cb and Cr).
- Images in RGB format can be converted or transformed to YCbCr format and vice versa, a process also known as color transformation or conversion. If the image is black and white, the image may only include an array of luminance pixels. Correspondingly, the image can be, for example, an array of luma pixels in monochrome format, or an array of luma pixels and two corresponding arrays of chroma pixels in 4:2:0, 4:2:2 and 4:4:4 color formats.
- an embodiment of the video encoder 20 may include an image segmentation unit (not shown in FIG. 2 ) for segmenting the image 17 into a plurality of (typically non-overlapping) image blocks 203 .
- These blocks may also be called root blocks, macroblocks (H.264/AVC), coding tree blocks (CTB), or coding tree units (coding tree unit, CTU) in the H.265/HEVC and VVC standards.
- the segmentation unit can be used to use the same block size for all images in a video sequence and the corresponding grid that defines the block size, or to change the block size between images or subsets or groups of images, and to segment each image into corresponding blocks.
- the video encoder may be adapted to directly receive the blocks 203 of the image 17 , for example one, several or all blocks making up said image 17 .
- the image block 203 may also be referred to as a current image block or an image block to be encoded.
- the image block 203 is also or can be regarded as a two-dimensional array or matrix composed of pixels with intensity values (pixel values), but the image block 203 is smaller in size than the image 17.
- block 203 may comprise one pixel array (for example, a luma array in the case of a monochrome image 17, or a luma array or a chroma array in the case of a color image), or three pixel arrays (for example, one luma array and two chroma arrays in the case of a color image 17), or any other number and/or type of arrays depending on the color format employed.
- a block may be an array of M ⁇ N (M columns ⁇ N rows) pixel points, or an array of M ⁇ N transform coefficients, and the like.
- the video encoder 20 shown in FIG. 2 is used to encode the image 17 block by block, eg, performing encoding and prediction on each block 203 .
- the video encoder 20 shown in FIG. 2 can also be used to segment and/or encode an image using slices (also called video slices), where an image can be segmented or encoded using one or more (typically non-overlapping) slices.
- Each slice may include one or more blocks (for example, a coding tree unit CTU) or one or more block groups (for example, coding blocks (tiles) in the H.265/HEVC/VVC standards and bricks in the VVC standard).
- the video encoder 20 shown in FIG. 2 can also be configured to use slices/coded block groups (also called video coded block groups) and/or coded blocks (also called video coded blocks) to segment and/or encode an image, where an image may be segmented or encoded using one or more (usually non-overlapping) slices/coded block groups, each slice/coded block group may consist of one or more blocks (such as CTUs) or one or more coding blocks, etc., and each coding block may be rectangular or the like in shape and may include one or more complete or partial blocks (such as CTUs).
- the residual calculation unit 204 is used to calculate the residual block 205 from the image block (or original block) 203 and the prediction block 265 (the prediction block 265 is described in detail later), for example, by subtracting the pixel values of the prediction block 265 from the pixel values of the image block 203 pixel by pixel, to obtain the residual block 205 in the pixel domain.
- the transform processing unit 206 is configured to perform discrete cosine transform (discrete cosine transform, DCT) or discrete sine transform (discrete sine transform, DST) etc. on the pixel point values of the residual block 205 to obtain transform coefficients 207 in the transform domain.
- the transform coefficients 207 may also be referred to as transform residual coefficients, representing the residual block 205 in the transform domain.
- Transform processing unit 206 may be configured to apply an integer approximation of DCT/DST, such as the transform specified for H.265/HEVC. This integer approximation is usually scaled by some factor compared to the orthogonal DCT transform. To maintain the norm of the forward and inverse transformed residual blocks, other scaling factors are used as part of the transformation process. The scaling factor is usually chosen according to certain constraints, such as the scaling factor being a power of 2 for the shift operation, the bit depth of the transform coefficients, the trade-off between accuracy and implementation cost, etc.
- a specific scaling factor can be specified for the inverse transform at the encoder 20 side by the inverse transform processing unit 212 (and for the corresponding inverse transform at the decoder 30 side by, for example, the inverse transform processing unit 312), and correspondingly, a corresponding scaling factor can be specified for the forward transform at the encoder 20 side through the transform processing unit 206.
- the video encoder 20 (correspondingly, the transform processing unit 206) can be used to output transform parameters such as one or more transform types, for example, directly or after encoding or compression by the entropy encoding unit 270, so that, for example, the video decoder 30 can receive and use the transform parameters for decoding.
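- As a non-normative illustration only (a minimal Python sketch; the name dct_matrix is hypothetical, and this floating-point DCT stands in for the scaled integer approximation the standards actually specify), the residual calculation of unit 204 and the 2-D transform of unit 206 can be mimicked as follows:

    import numpy as np

    def dct_matrix(n: int) -> np.ndarray:
        # Orthonormal DCT-II basis; H.265/HEVC specifies a scaled integer
        # approximation of this transform rather than this float version.
        k = np.arange(n).reshape(-1, 1)
        i = np.arange(n).reshape(1, -1)
        c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
        c[0, :] = np.sqrt(1.0 / n)
        return c

    block = np.random.randint(0, 256, (8, 8)).astype(np.float64)  # image block 203
    pred = np.random.randint(0, 256, (8, 8)).astype(np.float64)   # prediction block 265

    residual = block - pred          # residual block 205 (pixel-by-pixel subtraction)
    C = dct_matrix(8)
    coeffs = C @ residual @ C.T      # transform coefficients 207 (2-D DCT)
    restored = C.T @ coeffs @ C      # inverse transform recovers the residual
    assert np.allclose(restored, residual)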
- the quantization unit 208 is configured to quantize the transform coefficient 207 by, for example, scalar quantization or vector quantization, to obtain a quantized transform coefficient 209 .
- Quantized transform coefficients 209 may also be referred to as quantized residual coefficients 209 .
- the quantization process may reduce the bit depth associated with some or all of the transform coefficients 207 .
- n-bit transform coefficients may be rounded down to m-bit transform coefficients during quantization, where n is greater than m.
- the degree of quantization can be modified by adjusting a quantization parameter (quantization parameter, QP).
- a smaller quantization step size corresponds to finer quantization
- a larger quantization step size corresponds to coarser quantization.
- a suitable quantization step size can be indicated by a quantization parameter (quantization parameter, QP).
- a quantization parameter may be an index to a predefined set of suitable quantization step sizes.
- Quantization may include dividing by a quantization step size, while corresponding or inverse dequantization performed by the inverse quantization unit 210 or the like may include multiplying by a quantization step size.
- In embodiments according to some standards such as HEVC, the quantization parameter may be used to determine the quantization step size.
- the quantization step size can be calculated from the quantization parameter using a fixed-point approximation of an equation involving division.
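- For illustration, the following minimal Python sketch (assuming the commonly cited HEVC-style relation Qstep ≈ 2^((QP-4)/6); real codecs use fixed-point approximations without division, as noted above) shows scalar quantization in unit 208 and the matching dequantization in unit 210:

    import numpy as np

    def qstep(qp: int) -> float:
        # Approximate HEVC-style mapping: the step size doubles every 6 QP values.
        return 2.0 ** ((qp - 4) / 6.0)

    def quantize(coeffs: np.ndarray, qp: int) -> np.ndarray:
        # Quantization unit 208: divide by the step size and round,
        # reducing the bit depth of the coefficients (lossy).
        return np.round(coeffs / qstep(qp)).astype(np.int32)

    def dequantize(levels: np.ndarray, qp: int) -> np.ndarray:
        # Inverse quantization unit 210: multiply by the same step size.
        return levels.astype(np.float64) * qstep(qp)

    coeffs = np.array([[100.0, -37.5], [8.2, 0.9]])
    levels = quantize(coeffs, qp=28)
    approx = dequantize(levels, qp=28)  # close to, but not equal to, the input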
- the video encoder 20 (correspondingly, the quantization unit 208) can be used to output a quantization parameter (quantization parameter, QP), for example, directly or after being encoded or compressed by the entropy encoding unit 270, so that, for example, the video decoder 30 can receive and use the quantization parameter for decoding.
- the inverse quantization unit 210 is used to apply the inverse quantization of the quantization unit 208 to the quantized coefficients to obtain the dequantized coefficients 211, for example, by applying the inverse of the quantization scheme applied by the quantization unit 208, according to or using the same quantization step size as the quantization unit 208.
- the dequantized coefficients 211 may also be referred to as dequantized residual coefficients 211 , corresponding to the transform coefficients 207 , but due to loss caused by quantization, the dequantized coefficients 211 are usually not exactly the same as the transform coefficients.
- the inverse transform processing unit 212 is configured to apply the inverse transform of the transform applied by the transform processing unit 206, for example, an inverse discrete cosine transform (discrete cosine transform, DCT) or an inverse discrete sine transform (discrete sine transform, DST), to obtain a reconstructed residual block 213 (or corresponding dequantized coefficients 213) in the pixel domain.
- the reconstructed residual block 213 may also be referred to as a transform block 213 .
- the reconstruction unit 214 (e.g., summer 214) is used to add the transform block 213 (i.e., the reconstructed residual block 213) to the prediction block 265 to obtain the reconstruction block 215 in the pixel domain, for example, by adding the pixel values of the reconstructed residual block 213 to the pixel values of the prediction block 265.
- the loop filter unit 220 (or “loop filter” 220 for short) is used to filter the reconstructed block 215 to obtain the filtered block 221, or generally used to filter the reconstructed pixels to obtain filtered pixel values.
- a loop filter unit is used to smooth pixel transitions or improve video quality.
- the loop filter unit 220 may include one or more loop filters, such as a deblocking filter, a sample-adaptive offset (sample-adaptive offset, SAO) filter, or one or more other filters, such as an adaptive loop filter (ALF), a noise suppression filter (NSF), or any combination thereof.
- the loop filter unit 220 may include a deblocking filter, an SAO filter, and an ALF filter.
- the order of the filtering process may be deblocking filter, SAO filter and ALF filter.
- a process called luma mapping with chroma scaling (luma mapping with chroma scaling, LMCS) (i.e., an adaptive in-loop reshaper) may be added; this process is performed before deblocking.
- the deblocking filtering process can also be applied to internal sub-block edges, such as affine sub-block edges, ATMVP sub-block edges, sub-block transform (sub-block transform, SBT) edges and intra sub-partition (ISP) edges.
- Although loop filter unit 220 is shown in FIG. 2 as an in-loop filter, in other configurations loop filter unit 220 may be implemented as a post-loop filter.
- the filtering block 221 may also be referred to as a filtering reconstruction block 221 .
- video encoder 20 (correspondingly, loop filter unit 220) can be used to output loop filter parameters (such as SAO filter parameters, ALF filter parameters or LMCS parameters), for example, directly or after entropy encoding by the entropy encoding unit 270, so that, for example, the decoder 30 can receive and use the same or different loop filter parameters for decoding.
- a decoded picture buffer (DPB) 230 may be a reference picture memory that stores reference picture data for use by the video encoder 20 when encoding video data.
- the DPB 230 may be formed from any of a variety of memory devices, such as dynamic random access memory (DRAM), including synchronous DRAM (synchronous DRAM, SDRAM), magnetoresistive RAM (magnetoresistive RAM, MRAM), Resistive RAM (resistive RAM, RRAM) or other types of storage devices.
- the decoded picture buffer 230 may be used to store one or more filter blocks 221 .
- the decoded picture buffer 230 may also be used to store other previously filtered blocks, such as the previously reconstructed and filtered block 221, of the same current picture or a different picture such as a previous reconstructed picture, and may provide the complete previously reconstructed, i.e. decoded picture (and the corresponding reference blocks and pixels) and/or a partially reconstructed current image (and corresponding reference blocks and pixels), for example for inter-frame prediction.
- the decoded picture buffer 230 can also be used to store one or more unfiltered reconstruction blocks 215, or generally unfiltered reconstructed pixels, for example, reconstruction blocks 215 that have not been filtered by the loop filter unit 220, or reconstructed blocks or pixels that have not undergone any other processing.
- the mode selection unit 260 includes a segmentation unit 262, an inter prediction unit 244, and an intra prediction unit 254, and is used for receiving or obtaining original image data, such as the original block 203 (the current block 203 of the current image 17), and reconstructed image data, e.g., filtered and/or unfiltered reconstructed pixels or reconstructed blocks of the same (current) image and/or of one or more previously decoded images.
- the reconstructed image data is used as reference image data required for prediction such as inter-frame prediction or intra-frame prediction to obtain a prediction block 265 or a prediction value 265 .
- the mode selection unit 260 can be used to determine or select a partitioning for the current block (including no partitioning) and a prediction mode (such as an intra or inter prediction mode), and to generate a corresponding prediction block 265 used to calculate the residual block 205 and to reconstruct the reconstruction block 215.
- mode selection unit 260 is operable to select a partitioning and prediction mode (e.g., from among the prediction modes supported by or available to mode selection unit 260) that provides the best match or the smallest residual (the smallest residual refers to better compression in transmission or storage), or that provides the smallest signaling overhead (the smallest signaling overhead refers to better compression in transmission or storage), or that considers or balances both of the above.
- the mode selection unit 260 may be configured to determine the partitioning and prediction mode according to rate distortion optimization (rate distortion optimization, RDO), that is, to select the prediction mode that provides the minimum rate distortion.
- Terms such as "best", "lowest" and "optimal" in this context do not necessarily refer to "best", "lowest" or "optimal" in general, but may refer to situations where a termination or selection criterion is met; for example, values above or below a threshold or other constraints may result in a "sub-optimal selection" but reduce complexity and processing time.
- segmentation unit 262 may be used to segment images in a video sequence into a sequence of coding tree units (coding tree units, CTUs), and a CTU 203 may be further segmented into smaller block portions or sub-blocks (which again form blocks), for example by iteratively using quad-tree partitioning (quad-tree partitioning, QT), binary-tree partitioning (binary-tree partitioning, BT) or triple-tree partitioning (triple-tree partitioning, TT) or any combination thereof, and to perform prediction for, e.g., each of the block portions or sub-blocks, where the mode selection includes selecting the tree structure of the partitioned block 203 and selecting the prediction mode applied to each of the block portions or sub-blocks.
- In the following, the partitioning (e.g., performed by partition unit 262) and the prediction processing (e.g., performed by inter-prediction unit 244 and intra-prediction unit 254) are described in more detail.
- the segmentation unit 262 may divide (or split) an image block (or CTU) 203 into smaller parts, such as square or rectangular shaped small blocks.
- a CTU consists of N ⁇ N luma pixel blocks and two corresponding chrominance pixel blocks.
- the maximum allowed size of a luma block in a CTU is specified as 128 ⁇ 128 in the developing Versatile Video Coding (VVC) standard, but may be specified in the future to a value other than 128 ⁇ 128, such as 256 ⁇ 256.
- the CTUs of an image can be pooled/grouped into slices/coded block groups, coded blocks or bricks.
- a coding block covers a rectangular area of an image, and a coding block can be divided into one or more bricks.
- a brick consists of multiple CTU rows within an encoded block.
- a coded block that is not partitioned into multiple bricks may be called a brick.
- however, a brick that is a true subset of a coded block is not called a coded block.
- VVC supports two coded block group modes, namely raster scan slice/coded block group mode and rectangular slice mode.
- in raster scan slice/coded block group mode, a slice/coded block group contains a sequence of coded blocks in the coded block raster scan of an image.
- in rectangular slice mode, a slice contains multiple coded blocks of an image that together form a rectangular region of the image; the coded blocks within a rectangular slice are arranged in the raster scan order of the slice.
- These smaller blocks can be further divided into smaller parts.
- This is also known as tree splitting or hierarchical tree splitting, where a root block at, e.g., root tree level 0 (hierarchy level 0, depth 0) can be recursively split into blocks of two or more next lower tree levels, for example nodes at tree level 1 (hierarchy level 1, depth 1).
- These blocks can in turn be split into two or more blocks at the next lower level, e.g., tree level 2 (hierarchy level 2, depth 2), etc., until the splitting ends (because an end criterion is met, e.g., maximum tree depth or minimum block size).
- Blocks that are not further divided are also called leaf blocks or leaf nodes of the tree.
- a tree divided into two parts is called a binary-tree (binary-tree, BT), a tree divided into three parts is called a ternary-tree (ternary-tree, TT), and a tree divided into four parts is called a quad-tree (quad-tree, QT).
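- As a toy illustration of such recursive splitting (quad-tree only; a real encoder decides each split by rate-distortion optimization and may mix in binary and ternary splits), consider the following sketch:

    def quad_split(x, y, w, h, min_size, depth=0, leaves=None):
        # Recursively split a block into four quadrants until the
        # leaf blocks reach min_size.
        if leaves is None:
            leaves = []
        if w <= min_size or h <= min_size:
            leaves.append((x, y, w, h, depth))   # leaf block / leaf node
            return leaves
        hw, hh = w // 2, h // 2
        for dx, dy in ((0, 0), (hw, 0), (0, hh), (hw, hh)):
            quad_split(x + dx, y + dy, hw, hh, min_size, depth + 1, leaves)
        return leaves

    print(quad_split(0, 0, 64, 64, min_size=16))  # 16 leaf blocks of 16x16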
- a coding tree unit (CTU) may be or include one CTB of luma pixels and two corresponding CTBs of chroma pixels of an image having three pixel arrays, or one CTB of pixels of a monochrome image or of an image encoded using three separate color planes and syntax structures (used to encode the pixels). Correspondingly, a coding tree block (CTB) can be an N×N block of pixels, where N can be set to a certain value such that a component is divided into CTBs, which is segmentation.
- a coding unit (CU) may be or include one coding block of luma pixels and two corresponding coding blocks of chroma pixels of an image having three pixel arrays, or one coding block of pixels of a monochrome image or of an image encoded using three separate color planes and syntax structures (used to encode the pixels). Correspondingly, a coding block can be an M×N block of pixels, where M and N can be set to certain values such that a CTB is divided into coding blocks, which is partitioning.
- a coding tree unit may be divided into a plurality of CUs according to HEVC by using a quadtree structure represented as a coding tree.
- the decision whether to encode an image region using inter (temporal) prediction or intra (spatial) prediction is made at the leaf-CU level.
- Each leaf-CU can be further divided into one, two or four PUs according to the PU division type.
- the same prediction process is used within a PU, and relevant information is transmitted to the decoder in units of PUs.
- the leaf CU can be partitioned into transform units (TUs) according to other quadtree structures similar to the coding tree used for the CU.
- in the developing Versatile Video Coding (VVC) standard, a combined quadtree with nested multi-type tree (using binary and ternary splits) segmentation structure is used to partition the coding tree unit. In the coding tree structure within a coding tree unit, a CU can be square or rectangular. The coding tree unit (CTU) is first partitioned by a quadtree structure. The quadtree leaf nodes are then further partitioned by a multi-type tree structure. Multi-type tree leaf nodes are called coding units (coding unit, CU); unless the CU is too large for the maximum transform length, such a segment is used for prediction and transform processing without any further partitioning. This means that, in most cases, the CU, PU and TU have the same block size in the quadtree with nested multi-type tree coding block structure. The exception occurs when the maximum supported transform length is smaller than the width or height of a color component of the CU.
- VVC has a unique signaling mechanism for the partition splitting information in the quadtree with nested multi-type tree coding structure: the coding tree unit (CTU), as the root of the quadtree, is first partitioned by the quadtree structure; each quadtree leaf node (when sufficiently large) is then further partitioned by the multi-type tree structure. In the multi-type tree structure, a first flag (mtt_split_cu_flag) indicates whether the node is further partitioned; when the node is further partitioned, a second flag (mtt_split_cu_vertical_flag) indicates the splitting direction, and the decoder can derive the multi-type tree splitting mode (MttSplitMode) of the CU based on predefined rules or a table.
- TT division is not allowed when the width or height of a luma coding block is greater than 64; TT division is also not allowed when the width or height of a chroma coding block is greater than 32.
- the pipeline design divides the image into multiple virtual pipeline data units (virtual pipeline data unit, VPDU), and the VPDUs are defined as mutually non-overlapping units in the image.
- consecutive VPDUs are processed simultaneously in multiple pipeline stages.
- the VPDU size is roughly proportional to the buffer size in most pipeline stages, so the VPDU needs to be kept small.
- the VPDU size can be set to the maximum transform block (TB) size.
- however, ternary tree (TT) and binary tree (BT) partitioning may increase the VPDU size.
- the tree node block is forced to be divided until all pixels of each coded CU are located within the image boundary.
- the intra sub-partitions (intra sub-partitions, ISP) tool may vertically or horizontally divide the luma intra prediction block into two or four sub-parts according to the block size.
- mode selection unit 260 of video encoder 20 may be configured to perform any combination of the segmentation techniques described above.
- the video encoder 20 is configured to determine or select the best or optimal prediction mode from a set of (predetermined) prediction modes.
- the set of prediction modes may include, for example, intra prediction modes and/or inter prediction modes.
- the set of intra prediction modes can include 35 different intra prediction modes, e.g., non-directional modes like DC (or mean) mode and planar mode, or directional modes as defined in HEVC, or can include 67 different intra prediction modes, e.g., non-directional modes like DC (or mean) mode and planar mode, or directional modes as defined in VVC.
- as defined in VVC, for non-square blocks, several traditional angular intra prediction modes are adaptively replaced with wide-angle intra prediction modes.
- to avoid division operations in DC prediction, only the longer side is used to calculate the average value for non-square blocks.
- the intra prediction result of the planar mode can also be modified by using a position dependent intra prediction combination (PDPC) method.
- the intra prediction unit 254 is configured to generate an intra prediction block 265 by using reconstructed pixels of adjacent blocks of the same current image according to an intra prediction mode in the intra prediction mode set.
- Intra prediction unit 254 (or generally mode selection unit 260) is also configured to output intra prediction parameters (or generally information indicating the selected intra prediction mode for a block) in the form of syntax elements 266 to entropy encoding unit 270 , to be included in the encoded image data 21, so that the video decoder 30 can perform operations such as receiving and using prediction parameters for decoding.
- the intra prediction modes in HEVC include DC prediction mode, planar prediction mode and 33 angle prediction modes, a total of 35 candidate prediction modes.
- the current block can be intra-predicted using the pixels of the reconstructed image blocks on the left and above as references.
- An image block in the peripheral area of the current block that is used for intra prediction of the current block is called a reference block, and the pixels in the reference block are called reference pixels.
- the DC prediction mode is suitable for areas with flat texture in the current block; all pixels in such an area use the average value of the reference pixels in the reference block as their prediction;
- the planar prediction mode is suitable for image blocks with smoothly changing texture; for a current block that meets this condition, bilinear interpolation of the reference pixels in the reference block is used as the prediction of all pixels in the current block;
- the angular prediction mode exploits the fact that the texture of the current block is highly correlated with the texture of adjacent reconstructed image blocks, and copies the values of the reference pixels in the corresponding reference block along a certain angle as the prediction of all pixels in the current block.
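- For illustration, a minimal Python sketch of the DC mode just described (the dc_predict helper is hypothetical; HEVC additionally defines boundary filtering that is not shown here):

    import numpy as np

    def dc_predict(left: np.ndarray, above: np.ndarray, size: int) -> np.ndarray:
        # DC mode: every pixel of the prediction block is the mean of the
        # reconstructed reference pixels to the left of and above the block.
        dc = int(round((left.sum() + above.sum()) / (left.size + above.size)))
        return np.full((size, size), dc, dtype=np.int32)

    left = np.array([120, 122, 121, 119])    # reconstructed column left of the block
    above = np.array([118, 120, 123, 121])   # reconstructed row above the block
    print(dc_predict(left, above, 4))        # 4x4 block filled with the mean value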
- the HEVC encoder selects an optimal intra prediction mode from 35 candidate prediction modes for the current block, and writes the optimal intra prediction mode into the video stream.
- the encoder/decoder derives the three most probable modes from the optimal intra prediction modes of the reconstructed image blocks in the surrounding area that use intra prediction. If the optimal intra prediction mode selected for the current block is one of the three most probable modes, a first index is encoded indicating that the selected optimal intra prediction mode is one of the three most probable modes; if the selected optimal intra prediction mode is not one of the three most probable modes, a second index is encoded indicating that the selected optimal intra prediction mode is one of the other 32 modes (the modes among the 35 candidate prediction modes other than the three most probable modes).
- the HEVC standard uses a 5-bit fixed-length code as the aforementioned second index.
- the method for the HEVC encoder to derive the three most probable modes includes: adding the optimal intra prediction modes of the left adjacent image block and the upper adjacent image block of the current block to a set; if the two optimal intra prediction modes are the same, only one is kept in the set; if the two optimal intra prediction modes are the same and both are angular prediction modes, two angular prediction modes adjacent in angular direction are additionally selected and added to the set; otherwise, the planar prediction mode, the DC mode and the vertical prediction mode are selected in turn and added to the set until the number of modes in the set reaches 3.
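- The derivation just described can be sketched as follows (a simplified, non-normative Python rendering; hevc_mpm is a hypothetical helper, and the mode numbers follow HEVC's convention of planar=0, DC=1, angular modes 2..34):

    PLANAR, DC, VERTICAL = 0, 1, 26  # HEVC mode numbers; angular modes are 2..34

    def hevc_mpm(left_mode, above_mode):
        # Collect the optimal modes of the left and above neighbours,
        # de-duplicate, then fill up to three most probable modes.
        mpm = []
        for m in (left_mode, above_mode):
            if m not in mpm:
                mpm.append(m)
        if len(mpm) == 1 and mpm[0] >= 2:      # same angular mode on both sides:
            m = mpm[0]                         # add the two neighbouring angles
            mpm.append(2 + ((m - 2 - 1) % 33))
            mpm.append(2 + ((m - 2 + 1) % 33))
        for filler in (PLANAR, DC, VERTICAL):  # otherwise fill in this order
            if len(mpm) == 3:
                break
            if filler not in mpm:
                mpm.append(filler)
        return mpm[:3]

    print(hevc_mpm(10, 10))  # [10, 9, 11]: neighbouring angles added
    print(hevc_mpm(0, 1))    # [0, 1, 26]: vertical appended as filler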
- After the HEVC decoder performs entropy decoding on the code stream, it obtains the mode information of the current block, which includes an indicator indicating whether the optimal intra prediction mode of the current block is among the three most probable modes, and the optimal intra prediction mode of the current block.
- the set of inter prediction modes depends on the available reference pictures (i.e., e.g., at least some previously decoded pictures stored in DPB 230) and other inter prediction parameters, for example, on whether the entire reference picture or only a part of it (e.g., a search window area around the area of the current block) is used to search for the best matching reference block, and/or, for example, on whether pixel interpolation such as half-pel, quarter-pel and/or sixteenth-pel interpolation is applied.
- skip mode and/or direct mode may also be employed.
- the merge candidate list for this mode consists of the following five candidate types in order: spatial MVP from spatially adjacent CUs, temporal MVP from collocated CUs, history-based MVP from a FIFO table, pairwise average MVP, and zero MVs.
- Decoder side motion vector refinement (DMVR) based on bilateral matching can be used to increase the accuracy of MV in merge mode.
- merge mode with MVD (merge mode with MVD, MMVD) comes from merge mode with motion vector differences. An MMVD flag is sent immediately after the skip flag and the merge flag to specify whether the CU uses MMVD mode.
- a CU-level adaptive motion vector resolution (adaptive motion vector resolution, AMVR) scheme may be used. AMVR allows the MVD of a CU to be encoded at different precisions, and the precision of the MVD of the current CU is adaptively selected.
- a combined inter/intra prediction (CIIP) mode can be applied to the current CU.
- a weighted average is performed on the inter-frame and intra-frame prediction signals to obtain CIIP prediction.
- the affine motion field of a block is described by the motion information of two control-point motion vectors (4 parameters) or three control-point motion vectors (6 parameters).
- subblock-based temporal motion vector prediction (subblock-based temporal motion vector prediction, SbTMVP) is similar to the temporal motion vector prediction (temporal motion vector prediction, TMVP) in HEVC, but predicts the motion vectors of the sub-CUs within the current CU.
- Bi-directional optical flow (bi-directional optical flow, BDOF), previously referred to as BIO, is a simplified version that requires less computation, especially in terms of the number of multiplications and the size of the multiplier.
- in triangular partition mode, a CU is evenly divided into two triangular parts in one of two ways: diagonal division or anti-diagonal division.
- the bidirectional prediction mode extends simple averaging to support weighted averaging of two prediction signals.
- the inter prediction unit 244 may include a motion estimation (motion estimation, ME) unit and a motion compensation (motion compensation, MC) unit (both are not shown in FIG. 2 ).
- the motion estimation unit is operable to receive or acquire the image block 203 (the current image block 203 of the current image 17) and a decoded image 231, or at least one or more previously reconstructed blocks (e.g., reconstructed blocks of one or more other/different previously decoded images 231), for motion estimation.
- a video sequence may comprise a current picture and a previous decoded picture 231, or in other words, the current picture and a previous decoded picture 231 may be part of or form a sequence of pictures forming the video sequence.
- encoder 20 may be configured to select a reference block from a plurality of reference blocks in the same or different images among a plurality of other images, and to provide the reference image (or reference image index) and/or the offset (spatial offset) between the position (x, y coordinates) of the reference block and the position of the current block to the motion estimation unit as inter prediction parameters.
- This offset is also called a motion vector (MV).
- the motion compensation unit is configured to obtain, for example, receive, inter-frame prediction parameters, and perform inter-frame prediction according to or using the inter-frame prediction parameters to obtain an inter-frame prediction block 246 .
- Motion compensation performed by the motion compensation unit may include extracting or generating a prediction block from the motion/block vector determined by motion estimation, and may include performing interpolation to sub-pixel precision. Interpolation filtering can generate values for additional pixels from the values of known pixels, thereby potentially increasing the number of candidate prediction blocks that can be used to encode an image block.
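- As a simplified illustration of such sub-pixel interpolation (a 2-tap bilinear filter; HEVC/VVC actually use longer filters, e.g., 8-tap for luma, so this is a sketch only):

    import numpy as np

    def half_pel_horizontal(ref: np.ndarray) -> np.ndarray:
        # Illustrative 2-tap bilinear half-pixel interpolation with rounding:
        # each output value lies halfway between two horizontal neighbours.
        return (ref[:, :-1].astype(np.int32) + ref[:, 1:] + 1) >> 1

    ref = np.array([[100, 104, 108], [90, 94, 98]], dtype=np.uint8)
    print(half_pel_horizontal(ref))  # [[102 106] [ 92  96]]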
- the motion compensation unit may locate the prediction block pointed to by the motion vector in one of the reference image lists.
- the motion compensation unit may also generate block- and video-slice-related syntax elements for use by video decoder 30 when decoding image blocks of video slices. Additionally, or instead of slices and corresponding syntax elements, coding block groups and/or coding blocks and corresponding syntax elements may be generated or used.
- the motion vectors (motion vector, MV) that can be added to the candidate motion vector list as candidates include the MVs of image blocks spatially and temporally adjacent to the current block, where the MVs of spatially adjacent image blocks may include the MV of the left candidate image block to the left of the current block and the MV of the upper candidate image block above the current block.
- FIG. 4 is an exemplary schematic diagram of candidate image blocks provided by an embodiment of the present application. As shown in FIG. 4, the set of candidate image blocks on the left includes {A0, A1}, the set of candidate image blocks above includes {B0, B1, B2}, and the set of temporally adjacent candidate image blocks includes {C, T}.
- the order can be: first consider the set {A0, A1} of candidate image blocks to the left of the current block (consider A0 first; if A0 is not available, consider A1), then consider the set {B0, B1, B2} of candidate image blocks above the current block (consider B0 first; if B0 is not available, consider B1; if B1 is not available, consider B2), and finally consider the set {C, T} of candidate image blocks temporally adjacent to the current block (consider T first; if T is not available, consider C).
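- The scan order just described can be sketched as follows (build_amvp_candidates is a hypothetical helper; availability checking is reduced to dictionary membership for illustration):

    def build_amvp_candidates(available_mvs):
        # Scan order described above: A0, A1 (left), then B0, B1, B2 (above),
        # then the temporal candidates T before C; available_mvs maps a
        # position name to its MV and omits unavailable positions.
        candidates = []
        for group in (("A0", "A1"), ("B0", "B1", "B2"), ("T", "C")):
            for pos in group:
                if pos in available_mvs:
                    candidates.append(available_mvs[pos])
                    break  # take the first available MV per group
        return candidates

    mvs = {"A1": (4, -2), "B0": (3, -1), "C": (5, 0)}  # A0, B1, B2, T unavailable
    print(build_amvp_candidates(mvs))  # [(4, -2), (3, -1), (5, 0)]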
- the optimal MV is determined from the candidate motion vector list through the rate distortion cost (RD cost), and the candidate motion vector with the smallest RD cost is used as the motion vector predictor (motion vector predictor, MVP).
- the RD cost can be calculated as J = SAD + λ·R, where J represents the RD cost, SAD is the sum of absolute differences (sum of absolute differences, SAD) between the pixel values of the prediction block obtained after motion estimation using the candidate motion vector and the pixel values of the current block, R represents the code rate, and λ represents the Lagrangian multiplier.
- the encoding end transmits the index of the determined MVP in the candidate motion vector list to the decoding end. Further, a motion search can be performed in the neighborhood centered on the MVP to obtain the actual motion vector of the current block; the encoding end calculates the motion vector difference (motion vector difference, MVD) between the MVP and the actual motion vector, and transmits the MVD to the decoding end.
- the decoding end parses the index, finds the corresponding MVP in the candidate motion vector list according to the index, parses the MVD, and adds the MVD and the MVP to obtain the actual motion vector of the current block.
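- The MVP/MVD exchange described above reduces to a per-component subtraction and addition (a minimal sketch; index signaling and entropy coding are omitted):

    def encode_mv(actual_mv, mvp):
        # Encoder side: transmit the MVP index (not shown) plus the difference MVD.
        return (actual_mv[0] - mvp[0], actual_mv[1] - mvp[1])

    def decode_mv(mvp, mvd):
        # Decoder side: actual MV = MVP + MVD.
        return (mvp[0] + mvd[0], mvp[1] + mvd[1])

    mvp, actual = (4, -2), (6, -1)
    mvd = encode_mv(actual, mvp)          # (2, 1)
    assert decode_mv(mvp, mvd) == actual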
- the motion information that can be added to the candidate motion information list as candidates includes the motion information of image blocks spatially or temporally adjacent to the current block, where the spatially adjacent image blocks and temporally adjacent image blocks can be seen in FIG. 4.
- the candidate motion information corresponding to the spatial domain in the candidate motion information list comes from the five spatially adjacent blocks (A0, A1, B0, B1 and B2); if a spatially adjacent block is unavailable or intra predicted, its motion information is not added to the candidate motion information list.
- the temporal candidate motion information of the current block is obtained by scaling the MV of the corresponding-position block in the reference frame according to the picture order count (picture order count, POC) of the reference frame and the current frame; it is first judged whether the block at position T in the reference frame is available, and if it is not available, the block at position C is selected. After the above candidate motion information list is obtained, the optimal motion information is determined from the candidate motion information list through the RD cost as the motion information of the current block.
- the encoding end transmits the index value (denoted as merge index) of the position of the optimal motion information in the candidate motion information list to the decoding end.
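- The POC-based scaling of the temporal candidate can be sketched as follows (scale_temporal_mv is a hypothetical helper; the integer rounding and clipping details of the standards are omitted):

    def scale_temporal_mv(col_mv, poc_cur, poc_cur_ref, poc_col, poc_col_ref):
        # Scale the collocated block's MV by the ratio of POC distances.
        tb = poc_cur - poc_cur_ref   # distance: current frame -> its reference
        td = poc_col - poc_col_ref   # distance: collocated frame -> its reference
        return (col_mv[0] * tb // td, col_mv[1] * tb // td)

    print(scale_temporal_mv((8, -4), poc_cur=10, poc_cur_ref=8,
                            poc_col=9, poc_col_ref=5))
    # tb=2, td=4 -> the MV is halved: (4, -2)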
- the entropy coding unit 270 is used to apply an entropy coding algorithm or scheme (for example, a variable length coding (variable length coding, VLC) scheme, a context adaptive VLC scheme (context adaptive VLC, CAVLC), an arithmetic coding scheme, a binarization algorithm, context adaptive binary arithmetic coding (context adaptive binary arithmetic coding, CABAC), syntax-based context-adaptive binary arithmetic coding (syntax-based context-adaptive binary arithmetic coding, SBAC), probability interval partitioning entropy (probability interval partitioning entropy, PIPE) coding or other entropy encoding methods or techniques) to the quantized residual coefficients 209, inter prediction parameters, intra prediction parameters, loop filter parameters and/or other syntax elements, to obtain encoded image data 21 that can be output through the output terminal 272 in the form of an encoded bitstream 21 or the like, so that the video decoder 30 or the like can receive and use the parameters for decoding.
- the encoded bitstream 21 may be transmitted to the video decoder 30, or stored in memory for later transmission or retrieval by the video decoder 30.
- a non-transform based encoder 20 may directly quantize the residual signal without a transform processing unit 206 for certain blocks or frames.
- encoder 20 may have quantization unit 208 and inverse quantization unit 210 combined into a single unit.
- the video decoder 30 is used to receive the encoded image data 21 (eg encoded bit stream 21 ) encoded by the encoder 20 to obtain a decoded image 331 .
- the coded image data or bitstream comprises information for decoding said coded image data, eg data representing image blocks of a coded video slice (and/or coded block group or coded block) and associated syntax elements.
- the decoder 30 includes an entropy decoding unit 304, an inverse quantization unit 310, an inverse transform processing unit 312, a reconstruction unit 314 (such as a summer 314), a loop filter 320, a decoded picture buffer (decoded picture buffer, DPB) 330, a mode application unit 360, an inter prediction unit 344, and an intra prediction unit 354.
- Inter prediction unit 344 may be or include a motion compensation unit.
- video decoder 30 may perform a decoding process that is substantially the inverse of the encoding process described with reference to video encoder 20 of FIG. 2.
- As explained for the encoder 20, the inverse quantization unit 210, the inverse transform processing unit 212, the reconstruction unit 214, the loop filter 220, the decoded picture buffer DPB 230, the inter prediction unit 244 and the intra prediction unit 254 also constitute the "built-in decoder" of the video encoder 20.
- the inverse quantization unit 310 may be functionally the same as the inverse quantization unit 210, the inverse transform processing unit 312 may be functionally the same as the inverse transform processing unit 212, the reconstruction unit 314 may be functionally the same as the reconstruction unit 214, the loop filter 320 may be functionally the same as the loop filter 220, and the decoded picture buffer 330 may be functionally the same as the decoded picture buffer 230. Therefore, the explanations of the corresponding units and functions of the video encoder 20 apply correspondingly to the corresponding units and functions of the video decoder 30.
- the entropy decoding unit 304 is used to parse the bitstream 21 (or, in general, the encoded image data 21) and perform entropy decoding on the encoded image data 21 to obtain quantization coefficients 309 and/or decoded coding parameters (not shown in FIG. 3), such as any or all of inter prediction parameters (such as a reference image index and a motion vector), intra prediction parameters (such as an intra prediction mode or index), transform parameters, quantization parameters, loop filter parameters and/or other syntax elements.
- the entropy decoding unit 304 may be configured to apply a decoding algorithm or scheme corresponding to the encoding scheme of the entropy encoding unit 270 of the encoder 20 .
- Entropy decoding unit 304 may also be configured to provide inter-prediction parameters, intra-prediction parameters, and/or other syntax elements to mode application unit 360 , and to provide other parameters to other units of decoder 30 .
- Video decoder 30 may receive video slice and/or video block level syntax elements. Additionally, or instead of slices and corresponding syntax elements, coding block groups and/or coding blocks and corresponding syntax elements may be received or used.
- the inverse quantization unit 310 may be configured to receive a quantization parameter (quantization parameter, QP) (or, generally, information related to inverse quantization) and quantization coefficients from the encoded image data 21 (e.g., parsed and/or decoded by the entropy decoding unit 304), and to inverse quantize the decoded quantization coefficients 309 based on the quantization parameter to obtain dequantized coefficients 311, which may also be called transform coefficients 311.
- the inverse quantization process may include using quantization parameters calculated by video encoder 20 for each video block in the video slice to determine the degree of quantization, as well as the degree of inverse quantization that needs to be performed.
- the inverse transform processing unit 312 is operable to receive the dequantized coefficients 311, also referred to as transform coefficients 311, and to apply a transform to the dequantized coefficients 311 to obtain a reconstructed residual block 313 in the pixel domain. The reconstructed residual block 313 may also be referred to as a transform block 313.
- the transform may be an inverse transform, such as an inverse DCT, an inverse DST, an inverse integer transform, or a conceptually similar inverse transform process.
- the inverse transform processing unit 312 may also be configured to receive transform parameters or corresponding information from the encoded image data 21 (eg, parsed and/or decoded by the entropy decoding unit 304 ) to determine the transform to apply to the dequantized coefficients 311 .
- the reconstruction unit 314 (for example, the summer 314) is used to add the reconstruction residual block 313 to the prediction block 365 to obtain the reconstruction block 315 in the pixel domain, for example, the pixel value of the reconstruction residual block 313 and the prediction block 365 pixel values are added.
- the loop filter unit 320 is used (in the coding loop or afterwards) to filter the reconstruction block 315 to obtain the filtered block 321, so as to smooth pixel transitions or improve video quality, etc.
- the loop filter unit 320 may include one or more loop filters, such as a deblocking filter, a sample-adaptive offset (sample-adaptive offset, SAO) filter, or one or more other filters, such as an adaptive loop filter (ALF), a noise suppression filter (NSF), or any combination thereof.
- the loop filter unit 320 may include a deblocking filter, an SAO filter, and an ALF filter. The order of the filtering process may be deblocking filter, SAO filter and ALF filter.
- a process called luma mapping with chroma scaling (luma mapping with chroma scaling, LMCS) (i.e., an adaptive in-loop reshaper) may be added; this process is performed before deblocking.
- the deblocking filtering process can also be applied to internal sub-block edges, such as affine sub-block edges, ATMVP sub-block edges, sub-block transform (sub-block transform, SBT) edges and intra sub-partition (ISP) edges.
- Although loop filter unit 320 is shown in FIG. 3 as an in-loop filter, in other configurations loop filter unit 320 may be implemented as a post-loop filter.
- the decoded video block 321 in one picture is then stored in a decoded picture buffer 330 which stores the decoded picture 331 as a reference picture for subsequent motion compensation in other pictures and/or for respective output display.
- the decoder 30 is used to output the decoded image 331, for example through an output terminal, for display to or viewing by the user.
- the inter prediction unit 344 may be functionally the same as the inter prediction unit 244 (especially the motion compensation unit), and the intra prediction unit 354 may be functionally the same as the intra prediction unit 254; they perform partitioning and prediction based on the partitioning and/or prediction parameters or corresponding information received from the encoded image data 21 (e.g., parsed and/or decoded by the entropy decoding unit 304).
- the mode application unit 360 can be used to perform prediction (intra-frame or inter-frame prediction) for each block according to the reconstructed image, block or corresponding pixels (filtered or unfiltered), to obtain the predicted block 365 .
- the intra prediction unit 354 in the mode application unit 360 is used to generate a prediction block 365 for an image block of the current video slice based on the indicated intra prediction mode and data from previously decoded blocks of the current picture.
- the inter prediction unit 344 (e.g., the motion compensation unit) of the mode application unit 360 generates a prediction block 365 for a video block of the current video slice.
- the predicted blocks may be generated from one of the reference pictures in one of the reference picture lists.
- Video decoder 30 may construct reference frame list 0 and list 1 from the reference pictures stored in DPB 330 using a default construction technique.
- in addition to or instead of slices (e.g., video slices), the same or a similar process can be applied to embodiments using coding block groups (e.g., video coding block groups) and/or coding blocks (e.g., video coding blocks); for example, video may be encoded using I, P or B coding block groups and/or coding blocks.
- the mode application unit 360 is configured to determine prediction information for a video block of the current video slice by parsing the motion vectors and other syntax elements, and to use the prediction information to generate a prediction block for the current video block being decoded. For example, the mode application unit 360 uses some of the received syntax elements to determine the prediction mode (such as intra prediction or inter prediction) used to encode the video blocks of the video slice, the inter prediction slice type (such as B slice, P slice or GPB slice), construction information for one or more reference picture lists of the slice, the motion vector of each inter-coded video block of the slice, the inter prediction status of each inter-coded video block of the slice, and other information for decoding the video blocks in the current video slice.
- the video decoder 30 of FIG. 3 can also be used to segment and/or decode an image using slices (also called video slices), where an image can be segmented or decoded using one or more (typically non-overlapping) slices.
- Each slice may include one or more blocks (e.g., CTUs) or one or more block groups (e.g., coded blocks in the H.265/HEVC/VVC standards and bricks in the VVC standard).
- the video decoder 30 shown in FIG. 3 can also be configured to use slices/coded block groups (also called video coded block groups) and/or coded blocks (also called video coded blocks) to segment and/or decode an image, where an image may be segmented or decoded using one or more (usually non-overlapping) slices/coded block groups, each slice/coded block group may consist of one or more blocks (such as CTUs) or one or more coding blocks, etc., and each coding block may be rectangular or the like in shape and may include one or more complete or partial blocks (such as CTUs).
- video decoder 30 may be used to decode encoded image data 21 .
- decoder 30 may generate an output video stream without loop filter unit 320 .
- for certain blocks or frames, the non-transform based decoder 30 can directly inverse quantize the residual signal without the inverse transform processing unit 312.
- video decoder 30 may have inverse quantization unit 310 and inverse transform processing unit 312 combined into a single unit.
- the processing result of the current step can be further processed, and then output to the next step.
- further operations such as clipping or shifting operations, may be performed on the processing results of interpolation filtering, motion vector derivation or loop filtering.
- the value of the motion vector is limited to a predefined range according to the representation bits of the motion vector. If the representation bits of the motion vector are bitDepth, the range is -2^(bitDepth-1) to 2^(bitDepth-1)-1, where "^" represents a power. For example, if bitDepth is set to 16, the range is -32768 to 32767; if bitDepth is set to 18, the range is -131072 to 131071.
- the values of derived motion vectors (e.g., the MVs of the four 4x4 sub-blocks in an 8x8 block) are constrained such that the maximum difference between the integer parts of the MVs of the four 4x4 sub-blocks is no more than N pixels, for example, no more than 1 pixel.
- the above are two ways of limiting the motion vector according to bitDepth.
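- Both constraints can be sketched as follows (a minimal illustration assuming 1/16-pel motion vectors; clamp_mv and spread_ok are hypothetical helpers):

    def clamp_mv(mv, bit_depth):
        # Method 1: clamp each component to [-2^(bitDepth-1), 2^(bitDepth-1) - 1],
        # e.g. [-32768, 32767] when bitDepth is 16.
        lo, hi = -(1 << (bit_depth - 1)), (1 << (bit_depth - 1)) - 1
        return tuple(max(lo, min(hi, c)) for c in mv)

    def spread_ok(sub_mvs, max_pixels=1):
        # Method 2: the integer parts of the sub-block MVs (e.g. the four 4x4
        # sub-blocks of an 8x8 block) may differ by at most N pixels.
        for axis in (0, 1):
            ints = [mv[axis] >> 4 for mv in sub_mvs]  # assuming 1/16-pel MVs
            if max(ints) - min(ints) > max_pixels:
                return False
        return True

    print(clamp_mv((40000, -40000), 16))                     # (32767, -32768)
    print(spread_ok([(16, 0), (31, 0), (16, 16), (16, 0)]))  # True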
- embodiments of the decoding system 10, encoder 20, and decoder 30, as well as other embodiments described herein, may also be used for still image processing or coding, that is, the processing or coding of a single image independently of any preceding or succeeding image, as in video coding.
- if image processing is limited to a single image 17, the inter prediction unit 244 (encoder) and the inter prediction unit 344 (decoder) may not be available.
- All other functions (also referred to as tools or techniques) of video encoder 20 and video decoder 30 are equally applicable to still image processing, such as residual calculation 204/304, transform 206, quantization 208, inverse quantization 210/310, (inverse) transform 212/312, segmentation 262/362, intra prediction 254/354 and/or loop filtering 220/320, entropy encoding 270 and entropy decoding 304.
- FIG. 5 is an exemplary block diagram of a video decoding device 500 provided in an embodiment of the present application.
- the video coding apparatus 500 is suitable for implementing the disclosed embodiments described herein.
- the video decoding device 500 may be a decoder, such as the video decoder 30 in FIG. 1a, or an encoder, such as the video encoder 20 in FIG. 1a.
- the video decoding device 500 includes: an input port 510 (or ingress port 510) and a receiving unit (receiver unit, Rx) 520 for receiving data; a processor, logic unit or central processing unit (central processing unit, CPU) 530 for processing data, where the processor 530 here can be a neural network processor 530; a sending unit (transmitter unit, Tx) 540 and an output port 550 (or egress port 550) for transmitting data; and a memory 560 for storing data.
- the video decoding device 500 may also include optical-to-electrical (optical-to-electrical, OE) components and electrical-to-optical (electrical-to-optical, EO) components coupled to the input port 510, the receiving unit 520, the sending unit 540 and the output port 550, for the egress or ingress of optical or electrical signals.
- the processor 530 is realized by hardware and software.
- Processor 530 may be implemented as one or more processor chips, cores (eg, multi-core processors), FPGAs, ASICs, and DSPs.
- Processor 530 is in communication with ingress port 510 , receiving unit 520 , transmitting unit 540 , egress port 550 and memory 560 .
- Processor 530 includes a decoding module 570 (eg, a neural network based decoding module 570).
- the decoding module 570 implements the embodiments disclosed above. For example, the decode module 570 performs, processes, prepares, or provides for various encoding operations.
- the decoding module 570 is implemented as instructions stored in the memory 560 and executed by the processor 530 .
- Memory 560, which includes one or more magnetic disks, tape drives, and solid-state drives, may be used as an overflow data storage device to store programs when such programs are selected for execution, and to store instructions and data that are read during program execution.
- Memory 560 may be volatile and/or nonvolatile, and may be a read-only memory (ROM), a random access memory (RAM), a ternary content-addressable memory (TCAM), and/or a static random-access memory (SRAM).
- FIG. 6 is an exemplary block diagram of an apparatus 600 provided in an embodiment of the present application.
- the apparatus 600 may be used as either or both of the source device 12 and the destination device 14 in FIG. 1a.
- Processor 602 in apparatus 600 may be a central processing unit.
- Alternatively, the processor 602 may be any other type of device or devices, existing or developed in the future, capable of manipulating or processing information. Although the disclosed implementations can be practiced with a single processor such as the processor 602 shown, advantages in speed and efficiency may be achieved by using more than one processor.
- memory 604 in apparatus 600 may be a read only memory (ROM) device or a random access memory (RAM) device. Any other suitable type of storage device may be used as memory 604 .
- Memory 604 may include code and data 606 accessed by processor 602 via bus 612 .
- Memory 604 may also include an operating system 608 and application programs 610, including at least one program that allows processor 602 to perform the methods described herein.
- application programs 610 may include applications 1 through N, and also include a video coding application that performs the methods described herein.
- Apparatus 600 may also include one or more output devices, such as display 618 .
- display 618 may be a touch-sensitive display that combines the display with touch-sensitive elements that may be used to sense touch input.
- Display 618 may be coupled to processor 602 via bus 612 .
- Although the bus 612 of the apparatus 600 is described herein as a single bus, the bus 612 may include multiple buses. Additionally, secondary storage may be directly coupled to the other components of the apparatus 600 or accessed over a network, and may include a single integrated unit such as one memory card or multiple units such as multiple memory cards. Accordingly, the apparatus 600 may have a wide variety of configurations.
- A neural network (NN) is a machine learning model.
- A neural network can be composed of neural units, where a neural unit can refer to a computing unit that takes inputs x_s and an intercept of 1.
- the output of the computing unit can be: h_{W,b}(x) = f(W^T x) = f(∑_{s=1}^{n} W_s·x_s + b), where W_s is the weight of x_s and b is the bias of the neural unit.
- f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal. The output signal of this activation function can be used as the input of the next convolutional layer.
- the activation function may be a sigmoid function.
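- As an illustration of the neural unit described above, a minimal NumPy sketch (the name neural_unit and the input values are ours; a sigmoid is used as the activation function f):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neural_unit(xs, ws, b):
    """Output of a single neural unit: f(sum_s(W_s * x_s) + b),
    with a sigmoid as the activation function f."""
    return sigmoid(np.dot(ws, xs) + b)

xs = np.array([0.5, -1.0, 2.0])    # inputs x_s
ws = np.array([0.1, 0.4, -0.2])    # weights W_s
print(neural_unit(xs, ws, b=0.3))  # scalar output in (0, 1)
```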
- a neural network is a network formed by connecting many of the above-mentioned single neural units, that is, the output of one neural unit can be the input of another neural unit.
- the input of each neural unit can be connected with the local receptive field of the previous layer to extract the features of the local receptive field.
- the local receptive field can be an area composed of several neural units.
- A deep neural network (DNN), also known as a multi-layer neural network, can be understood as a neural network with many hidden layers; there is no special metric for the "many" here.
- the layers inside a DNN can be divided into three categories according to their positions: input layer, hidden layers, and output layer. The first layer is the input layer, the last layer is the output layer, and the layers in between are all hidden layers. The layers are fully connected, that is, any neuron in the i-th layer must be connected to any neuron in the (i+1)-th layer.
- the coefficient from the k-th neuron at layer L-1 to the j-th neuron at layer L is defined as W_jk^L. It should be noted that the input layer has no W parameter.
- more hidden layers make the network more capable of describing complex situations in the real world. Theoretically, a model with more parameters has higher complexity and greater "capacity", which means it can complete more complex learning tasks.
- Training a deep neural network is the process of learning the weight matrices; its ultimate goal is to obtain the weight matrices of all layers of the trained deep neural network (the weight matrices formed by the vectors W of many layers).
- A convolutional neural network (CNN) includes a feature extractor consisting of convolutional layers and pooling layers. The feature extractor can be seen as a filter, and the convolution process can be seen as convolving a trainable filter with an input image or a convolutional feature map.
- the convolutional layer refers to the neuron layer that performs convolution processing on the input signal in the convolutional neural network.
- the convolutional layer can include many convolution operators, also called kernels, whose role in image processing is equivalent to a filter that extracts specific information from the input image matrix.
- A convolution operator is essentially a weight matrix, which is usually predefined. During a convolution operation on an image, the weight matrix is typically slid along the horizontal direction of the input image one pixel at a time (or two pixels at a time, and so on, depending on the value of the stride) to extract specific features from the image.
- the size of the weight matrix should be related to the size of the image.
- the depth dimension of the weight matrix is the same as the depth dimension of the input image.
- during a convolution operation, the weight matrix extends over the entire depth of the input image. Convolution with a single weight matrix therefore produces a convolution output with a single depth dimension; in most cases, however, instead of a single weight matrix, multiple weight matrices of the same size (rows × columns), that is, multiple matrices of the same shape, are applied.
- the outputs of the individual weight matrices are stacked to form the depth dimension of the convolved image, where this dimension can be understood as being determined by the "multiple" mentioned above. Different weight matrices can be used to extract different features of the image.
- one weight matrix is used to extract image edge information
- another weight matrix is used to extract specific colors of the image
- another weight matrix is used to blur unwanted noise in the image, and so on.
- the multiple weight matrices have the same size (rows × columns), so the feature maps they extract also have the same size; the extracted feature maps of the same size are then combined to form the output of the convolution operation.
- in practical applications, the weight values in these weight matrices need to be obtained through extensive training; each weight matrix formed by the trained weight values can be used to extract information from the input image, so that the convolutional neural network makes correct predictions.
- the initial convolutional layers often extract more general features, which may also be called low-level features; as the depth of the convolutional neural network increases, the features extracted by the later convolutional layers become more and more complex, such as high-level semantic features, and features with higher-level semantics are more suitable for the problem to be solved.
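- To make the sliding of a single weight matrix concrete, here is a toy convolution sketch (the vertical-edge kernel is a common textbook example, not taken from this patent):

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Slide one weight matrix (kernel) across the input, stride pixels
    at a time, producing one output channel; stacking the outputs of
    several kernels would form the depth dimension described above."""
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros(((h - kh) // stride + 1, (w - kw) // stride + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)
    return out

edge_kernel = np.array([[1, 0, -1],
                        [2, 0, -2],
                        [1, 0, -1]])                    # extracts vertical edges
print(conv2d(np.random.rand(8, 8), edge_kernel).shape)  # (6, 6)
```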
- a pooling layer is often added after a convolutional layer; it can be one convolutional layer followed by one pooling layer, or multiple convolutional layers followed by one or more pooling layers.
- the sole purpose of pooling layers is to reduce the spatial size of the image.
- the pooling layer may include an average pooling operator and/or a maximum pooling operator for sampling an input image to obtain an image of a smaller size.
- the average pooling operator can average the pixel values within a specific range of the image to generate the result of average pooling.
- the maximum pooling operator can take the pixel with the largest value within a specific range as the result of maximum pooling.
- the operators in the pooling layer should also be related to the size of the image.
- the size of the image output after being processed by the pooling layer may be smaller than the size of the image input to the pooling layer, and each pixel in the image output by the pooling layer represents the average or maximum value of the corresponding sub-region of the image input to the pooling layer.
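- A toy sketch of the two pooling operators described above (the function name pool2d is ours):

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Downsample by taking the maximum (or average) of each
    size x size sub-region, reducing the spatial size of the image."""
    h, w = feature_map.shape
    out = np.zeros((h // size, w // size))
    for i in range(h // size):
        for j in range(w // size):
            region = feature_map[i*size:(i+1)*size, j*size:(j+1)*size]
            out[i, j] = region.max() if mode == "max" else region.mean()
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
print(pool2d(x, 2, "max"))  # [[ 5.  7.] [13. 15.]]
print(pool2d(x, 2, "avg"))  # [[ 2.5  4.5] [10.5 12.5]]
```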
- after processing by the convolutional layers/pooling layers, the convolutional neural network is not yet able to output the required output information. As mentioned earlier, the convolutional layers/pooling layers only extract features and reduce the parameters brought by the input image. To generate the final output information (the required class information or other relevant information), the convolutional neural network needs to use a neural network layer to generate the output of one or a set of required classes. Therefore, the neural network layer can include multiple hidden layers, and the parameters contained in the multiple hidden layers can be pre-trained according to relevant training data of a specific task type; for example, the task type can include image recognition, image classification, image super-resolution reconstruction, and so on.
- after the multiple hidden layers of the neural network layer, the output layer of the entire convolutional neural network follows. This output layer has a loss function similar to categorical cross-entropy, specifically used to calculate the prediction error.
- Recurrent neural networks (RNN) are used to process sequence data.
- In a traditional neural network model, from the input layer to the hidden layers to the output layer, the layers are fully connected, while the nodes within each layer are unconnected.
- although this ordinary neural network solves many problems, it is still powerless for many others. For example, to predict the next word of a sentence, the previous words generally need to be used, because the preceding and following words in a sentence are not independent. RNN is called a recurrent neural network because the current output of a sequence is also related to the previous outputs.
- RNN can process sequence data of any length.
- the training of RNN is the same as that of traditional CNN or DNN.
- the error backpropagation algorithm is also used, but with one difference: if the RNN is unrolled into a network, the parameters, such as W, are shared across the unrolled steps, which is not the case in the traditional neural network described above.
- the output of each step depends not only on the network of the current step but also on the network states of the previous several steps. This learning algorithm is called backpropagation through time (BPTT).
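- A minimal sketch of the parameter sharing that BPTT relies on: the same weights are reused at every time step of the unrolled RNN, so each output depends on the current input and on all previous states (forward pass only; all names and sizes are illustrative):

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b):
    """Unroll a simple RNN over a sequence; the parameters
    (W_xh, W_hh, b) are shared across all time steps."""
    h = np.zeros(W_hh.shape[0])
    states = []
    for x in xs:                           # one step per sequence element
        h = np.tanh(W_xh @ x + W_hh @ h + b)
        states.append(h)                   # h carries the previous steps
    return states

rng = np.random.default_rng(0)
xs = [rng.normal(size=2) for _ in range(3)]     # toy sequence
states = rnn_forward(xs, rng.normal(size=(4, 2)),
                     rng.normal(size=(4, 4)), np.zeros(4))
print(len(states), states[-1].shape)            # 3 (4,)
```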
- the convolutional neural network can use the error backpropagation (BP) algorithm to correct the parameters of the initial super-resolution model during training, so that the reconstruction error loss of the super-resolution model becomes smaller and smaller. Specifically, forwarding the input signal until the output generates an error loss, and the parameters of the initial super-resolution model are updated by backpropagating the error-loss information, so that the error loss converges.
- the backpropagation algorithm is a backpropagation movement dominated by error loss, aiming to obtain the parameters of the optimal super-resolution model, such as the weight matrix.
- A generative adversarial network (GAN) is a deep learning model. The model includes at least two modules: one is a generative model, and the other is a discriminative model; the two modules learn from each other through a game to produce better output.
- Both the generative model and the discriminative model can be neural networks, specifically deep neural networks or convolutional neural networks.
- the basic principle of a GAN, taking a GAN that generates pictures as an example, is as follows: suppose there are two networks, G (Generator) and D (Discriminator). G is a network that generates pictures: it receives a random noise z and generates a picture from this noise, denoted G(z). D is a discriminant network used to determine whether a picture is "real".
- Its input parameter is x, where x represents a picture; the output D(x) represents the probability that x is a real picture. If it is 1, the picture is 100% real; if it is 0, the picture cannot be real.
- the goal of the generative network G is to generate pictures that are as real as possible to deceive the discriminant network D, while the goal of the discriminant network D is to distinguish the pictures generated by G from real pictures as far as possible. In this way, G and D constitute a dynamic "game" process, which is the "adversarial" part of the "generative adversarial network".
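- This game is commonly formalized by the standard GAN minimax objective (a textbook formulation, not quoted from this patent): min_G max_D V(D, G) = E_{x~p_data(x)}[log D(x)] + E_{z~p_z(z)}[log(1 - D(G(z)))], where D is trained to maximize V(D, G) and G is trained to minimize it; at the optimum of the game, the distribution of G(z) matches the data distribution.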
- Fig. 7a is a schematic diagram of an application scenario provided by the embodiment of the present application.
- the application scenario is that the device acquires data, compresses the acquired data, and then stores the compressed data.
- the device may integrate the functions of the aforementioned source device and destination device.
- the device acquires data.
- the device compresses the data to obtain the compressed data.
- the device stores compressed data.
- the device compresses the data to save storage space.
- the device can store the compressed data in an album or cloud album.
- the device decompresses the compressed data to obtain the data.
- Figure 7b is a schematic diagram of an application scenario provided by the embodiment of the present application.
- the application scenario is that the source device obtains data, compresses the obtained data to obtain compressed data, and then sends the compressed data to the destination device.
- the source device may perform compression processing on the obtained data and then transmit it to the destination device, which can reduce the transmission bandwidth.
- the source device acquires data.
- the source device compresses the data to obtain compressed data.
- the source device sends compressed data to the destination device.
- the source device compresses the data before transmission, which can reduce the transmission bandwidth and improve the transmission efficiency.
- the destination device decompresses the compressed data to obtain data.
- FIG. 8 is a flowchart of an encoding and decoding method 800 provided in an embodiment of the present application.
- the codec method 800 may be performed by an encoder and a decoder.
- the codec method 800 is described as a series of steps or operations. It should be understood that the codec method 800 may be executed in various orders and/or concurrently, and is not limited to the execution order shown in FIG. 8 .
- the codec method 800 may include:
- Step 801 the encoder acquires data to be encoded.
- the encoder obtains data x to be encoded.
- Step 802 the encoder inputs the data to be encoded into the first encoding network to obtain target parameters.
- the target parameter may be the parameter weight of all or part of the convolution and nonlinear activation of the second encoding network.
- the first encoding network may include a convolution kernel generator (convolution or fully connected group), and the convolution kernel generator is used to generate target parameters according to the data to be encoded.
- the encoder inputs the data x to be encoded into the first encoding network to obtain the target parameter ⁇ g .
- Step 803 the encoder constructs a second encoding network according to the target parameters.
- the encoder constructs the second encoding network g a (x; ⁇ g ) according to the target parameter ⁇ g .
- Step 804 the encoder inputs the data to be encoded into the second encoding network to obtain the first feature.
- the first feature is used to reconstruct the data to be encoded, and the first feature may also be called a content feature.
- the first feature may be a three-dimensional feature map of the data x to be encoded.
- the encoder inputs the data x to be encoded into the second encoding network g a (x; ⁇ g ) to obtain the first feature y.
- Step 805 the encoder encodes the first feature to obtain an encoded code stream (that is, a code stream to be decoded).
- the encoder encoding the first feature to obtain an encoded code stream may include: the encoder first rounds the first feature to obtain the rounded value of the first feature; then the encoder performs probability estimation on the rounded value of the first feature to obtain an estimated probability distribution of the rounded value of the first feature; and then the encoder performs entropy encoding on the rounded value of the first feature according to the estimated probability distribution of the rounded value of the first feature to obtain the encoded code stream.
- the rounded value of the first feature may also be referred to as a first rounded feature or a content rounded feature.
- the encoder first rounds the first feature y to obtain the rounded value ŷ of the first feature; then the encoder performs probability estimation on the rounded value ŷ to obtain an estimated probability distribution of ŷ; then the encoder performs entropy encoding on ŷ according to the estimated probability distribution of ŷ to obtain the encoded code stream.
- the encoder performing probability estimation on the rounded value of the first feature to obtain an estimated probability distribution of the rounded value of the first feature may include: the encoder performs probability estimation on the rounded value of the first feature according to the first information to obtain the estimated probability distribution of the rounded value of the first feature.
- the first information includes at least one item of context information and side information.
- estimating the probability distribution through context information and side information can improve the accuracy of the estimated probability distribution, thereby reducing the code rate in the entropy coding process and reducing the entropy coding overhead.
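- As an illustration of the relationship between the estimated probability distribution and the entropy-coding overhead mentioned above, here is a hypothetical Python sketch: an ideal entropy coder spends about -log2(p) bits per rounded symbol, so a better probability estimate directly lowers the code rate (the dictionary probs stands in for the output of the entropy estimation network):

```python
import numpy as np

def rounding_and_code_cost(y, probs):
    """Round the first feature y, then sum the ideal entropy-coding
    cost -log2(p) over the rounded symbols."""
    y_hat = np.round(y).astype(int)              # rounded value of the first feature
    bits = -np.log2([probs[v] for v in y_hat])   # per-symbol cost in bits
    return y_hat, bits.sum()

y = np.array([0.2, 1.7, -0.9])
probs = {0: 0.5, 2: 0.25, -1: 0.25}              # estimated probability distribution
y_hat, total_bits = rounding_and_code_cost(y, probs)
print(y_hat, total_bits)                          # [ 0  2 -1] 5.0
```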
- Step 806 the encoder sends the coded code stream to the decoder.
- Step 807 the decoder decodes the encoded code stream to obtain the integer value of the first feature.
- the decoder decoding the encoded code stream to obtain the rounded value of the first feature may include: the decoder first performs probability estimation on the rounded value of the first feature in the encoded code stream to obtain an estimated probability distribution of the rounded value of the first feature; then the decoder performs entropy decoding on the encoded code stream according to the estimated probability distribution of the rounded value of the first feature to obtain the rounded value of the first feature.
- Step 808 the decoder inputs the integer value of the first feature into the decoding network to obtain decoded data.
- the decoder inputs the rounded value ŷ of the first feature into the decoding network g_s(ŷ; θ_s) to obtain the decoded data x̂, where the decoded data satisfy x̂ = g_s(ŷ; θ_s), and θ_s denotes the parameter weights of all or part of the convolutions and nonlinear activations of the decoding network.
- in the prior art, the encoding network (i.e., the second encoding network) uses fixed parameter weights to extract the content features (i.e., the first feature) of the data to be encoded, then encodes the content features into a code stream (i.e., the encoded code stream) and sends it to the decoder.
- The decoding end decodes and reconstructs the code stream to obtain the decoded data. It can be seen that the parameter weights of the encoding network in the prior art are not related to the data to be encoded.
- in the embodiment of the present application, the data to be encoded is first input into the first encoding network, the first encoding network generates the parameter weights of the second encoding network according to the data to be encoded, and the parameter weights of the second encoding network are then dynamically adjusted according to the obtained weights, so that the parameter weights of the second encoding network are related to the data to be encoded.
- This increases the expressive ability of the second encoding network and makes the decoded data that the decoding end reconstructs from the code stream encoded from the first feature closer to the data to be encoded, thereby improving the rate-distortion performance of the codec network.
- the codec method 800 provided in the embodiment of the present application may be applicable to the codec system shown in FIG. 9 .
- the codec system includes a first encoding network 901 , a second encoding network 902 , a rounding module 903 , an entropy estimation network 904 , an entropy encoding module 905 , an entropy decoding module 906 , and a decoding network 907 .
- the data to be encoded is first input into the first encoding network 901 to obtain the target parameters, and then the parameters of the second encoding network 902 are adjusted by the target parameters (that is, the parameter weights of all or part of the convolutions and nonlinear activations of the second encoding network 902 are adjusted by the target parameters).
- the data to be encoded is input into the second encoding network 902 to obtain the first features.
- the rounding module 903 rounds the first feature to obtain a rounded value of the first feature.
- the entropy estimation network 904 performs probability estimation on the rounded value of the first feature to obtain an estimated probability distribution of the rounded value of the first feature.
- the entropy coding module 905 performs entropy coding on the rounded value of the first feature according to the estimated probability distribution of the rounded value of the first feature to obtain a coded code stream.
- the entropy decoding module 906 performs entropy decoding on the encoded code stream according to the estimated probability distribution of the rounded value of the first feature to obtain the rounded value of the first feature.
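- The flow described above for FIG. 9 can be condensed into the following schematic sketch of method 800 (every callable is a placeholder for the corresponding module; this is an illustration of the data flow, not an implementation from the patent):

```python
# Schematic data flow of method 800 / FIG. 9 (placeholder callables):
# the first encoding network produces the target parameters theta_g,
# the second encoding network g_a is built from theta_g, and its output
# (the first feature y) is rounded and entropy-coded into the code stream.
def encode(x, first_encoding_network, build_second_network,
           round_fn, entropy_estimate, entropy_encode):
    theta_g = first_encoding_network(x)     # step 802: target parameters
    g_a = build_second_network(theta_g)     # step 803: build g_a(.; theta_g)
    y = g_a(x)                              # step 804: first feature
    y_hat = round_fn(y)                     # rounding module 903
    p = entropy_estimate(y_hat)             # entropy estimation network 904
    return entropy_encode(y_hat, p)         # entropy encoding module 905
```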
- FIG. 10 is a flowchart of an encoding and decoding method 1000 provided in an embodiment of the present application.
- the codec method 1000 can be performed by an encoder and a decoder.
- the codec method 1000 is described as a series of steps or operations. It should be understood that the codec method 1000 may be executed in various orders and/or concurrently, and is not limited to the execution order shown in FIG. 10 .
- the codec method 1000 may include:
- Step 1001 the encoder acquires data to be encoded.
- Step 1002 the encoder inputs the data to be encoded into the second encoding network to obtain the first features.
- the first feature is used to reconstruct the data to be encoded.
- Step 1003 the encoder inputs the data to be encoded into the first encoding network to obtain the second feature.
- the second feature is used to reconstruct the target parameter
- the second feature may also be called a model feature
- the target parameter is the parameter weight of all or part of the convolution and nonlinear activation of the second decoding network.
- the encoder can also divide the first feature into two parts (a first sub-feature and a second sub-feature) along the channel dimension: one part (the first sub-feature) is used to reconstruct the data to be encoded, and the other part (the second sub-feature) is used to reconstruct the target parameters (a sketch of this split is shown below). The encoder then inputs the second sub-feature into the first encoding network to obtain the second feature.
- before the second sub-feature is input into the third encoding network, the second sub-feature may be converted through a convolutional and fully connected network.
- the second sub-feature before conversion may be called an initial model feature, and the second sub-feature after conversion may be called a model feature.
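- A hypothetical illustration of this channel-dimension split (the channel counts are arbitrary and only chosen for the example):

```python
import numpy as np

def split_channels(first_feature, n_content):
    """Split the first feature along the channel dimension into a first
    sub-feature (used to reconstruct the data to be encoded) and a
    second sub-feature (used to reconstruct the target parameters)."""
    return first_feature[:n_content], first_feature[n_content:]

feature = np.random.rand(192, 16, 16)          # C x H x W feature map
content, model = split_channels(feature, 128)
print(content.shape, model.shape)              # (128, 16, 16) (64, 16, 16)
```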
- Step 1004 the encoder encodes the first feature to obtain a first code stream to be decoded.
- Step 1005 the encoder encodes the second feature to obtain a second code stream to be decoded.
- the encoder may encode the first feature and the second feature to obtain a code stream to be decoded.
- Step 1006 the encoder sends the first code stream to be decoded and the second code stream to be decoded to the decoder.
- Step 1007 the decoder decodes the first code stream to be decoded to obtain the integer value of the first feature.
- the decoder decoding the first code stream to be decoded to obtain the rounded value of the first feature may include: the decoder performs probability estimation on the rounded value of the first feature in the first code stream to be decoded to obtain an estimated probability distribution of the rounded value of the first feature, and then performs entropy decoding on the first code stream to be decoded according to the estimated probability distribution of the rounded value of the first feature to obtain the rounded value of the first feature.
- the performing probability estimation on the rounded value of the first feature in the first code stream to be decoded to obtain the estimated probability distribution of the rounded value of the first feature includes: performing probability estimation on the rounded value of the first feature in the first code stream to be decoded according to the first information to obtain the estimated probability distribution of the rounded value of the first feature, where the first information includes at least one of context information and side information.
- Step 1008 the decoder decodes the second code stream to be decoded to obtain the integer value of the second feature.
- the rounded value of the second feature may also be referred to as a model rounded feature.
- the decoder decoding the second code stream to be decoded to obtain the rounded value of the second feature includes: the decoder performs probability estimation on the rounded value of the second feature in the second code stream to be decoded to obtain an estimated probability distribution of the rounded value of the second feature, and then performs entropy decoding on the second code stream to be decoded according to the estimated probability distribution of the rounded value of the second feature to obtain the rounded value of the second feature.
- the performing probability estimation on the rounded value of the second feature in the second code stream to be decoded to obtain the estimated probability distribution of the rounded value of the second feature includes: performing probability estimation on the rounded value of the second feature in the second code stream to be decoded according to the first information to obtain the estimated probability distribution of the rounded value of the second feature, where the first information includes at least one of context information and side information.
- Step 1009 the decoder inputs the rounded value of the second feature into the first decoding network to obtain the target parameter.
- Step 1010 the decoder builds a second decoding network according to the target parameters.
- Step 1011 the decoder inputs the integer value of the first feature into the second decoding network to obtain decoded data.
- in the embodiment of the present application, the content features and the model features (i.e., the first feature and the second feature) of the data to be decoded are compiled into the code stream to be decoded; the decoding end then obtains the rounded value of the second feature by decoding the code stream to be decoded, obtains the parameter weights of the second decoding network by inputting the rounded value of the second feature into the first decoding network, and dynamically adjusts the parameter weights of the second decoding network according to these weights, so that the parameter weights of the second decoding network are related to the data to be decoded.
- This improves the expressive ability of the second decoding network and makes the decoded data reconstructed by the second decoding network closer to the data to be encoded, thereby improving the rate-distortion performance of the codec network.
- the encoding and decoding method 1000 provided in the embodiment of the present application may be applicable to the encoding and decoding system described in FIG. 11 .
- the codec system includes a first encoding network 1101, a second encoding network 1102, a first rounding module 1103, a second rounding module 1104, an entropy estimation network 1105, a first entropy encoding module 1106, a second Two entropy coding module 1107 , first entropy decoding module 1108 , second entropy decoding module 1109 , first decoding network 1110 , and second decoding network 1111 .
- the data to be encoded is first input into the second encoding network 1102 to obtain the first feature, and then the data to be encoded is input to the first encoding network 1101 to obtain the second feature.
- the first rounding module 1103 rounds the first feature to obtain a rounded value of the first feature.
- the second rounding module 1104 rounds the second feature to obtain a rounded value of the second feature.
- the entropy estimation network 1105 first performs probability estimation on the rounded value of the first feature to obtain the estimated probability distribution of the rounded value of the first feature, and then performs probability estimation on the rounded value of the second feature to obtain the rounded value of the second feature Estimated probability distribution of values.
- the first entropy coding module 1106 performs entropy coding on the rounded value of the first feature according to the estimated probability distribution of the rounded value of the first feature to obtain a first code stream to be decoded.
- the second entropy coding module 1107 performs entropy coding on the rounded value of the second feature according to the estimated probability distribution of the rounded value of the second feature to obtain a second code stream to be decoded.
- the first entropy decoding module 1108 performs entropy decoding on the first code stream to be decoded according to the estimated probability distribution of the rounded value of the first feature to obtain the rounded value of the first feature.
- the second entropy decoding module 1109 performs entropy decoding on the second code stream to be decoded according to the estimated probability distribution of the rounded value of the second feature to obtain the rounded value of the second feature.
- the rounded value of the second feature is first input into the first decoding network 1110 to obtain the target parameters, and then the parameters of the second decoding network 1111 are adjusted according to the target parameters (that is, the parameter weights of all or part of the convolutions and nonlinear activations of the second decoding network 1111 are adjusted by the target parameters).
- the integer value of the first feature is input into the second decoding network 1111 to obtain decoded data.
- the encoding and decoding method 1000 provided in the embodiment of the present application may also be applicable to the encoding and decoding system described in FIG. 12 .
- the codec system includes a first encoding network 1201, a second encoding network 1202, a channel splitting module 1203, a first rounding module 1204, a second rounding module 1205, an entropy estimation network 1206, a first entropy encoding module 1207, a second entropy encoding module 1208, a first entropy decoding module 1209, a second entropy decoding module 1210, a first decoding network 1211, and a second decoding network 1212.
- the data to be encoded is first input into the second encoding network 1202 to obtain the first feature.
- the first feature is input to the channel splitting module 1203 and divided into the first sub-feature and the second sub-feature along the channel dimension.
- the second sub-features are input into the first encoding network 1201 to obtain the second features.
- the first rounding module 1204 rounds the first sub-feature to obtain the rounded value of the first feature.
- the second rounding module 1205 rounds the second feature to obtain a rounded value of the second feature.
- the entropy estimation network 1206 first performs probability estimation on the rounded value of the first feature to obtain the estimated probability distribution of the rounded value of the first feature, and then performs probability estimation on the rounded value of the second feature to obtain the rounded value of the second feature Estimated probability distribution of values.
- the first entropy coding module 1207 performs entropy coding on the rounded value of the first feature according to the estimated probability distribution of the rounded value of the first feature to obtain a first code stream to be decoded.
- the second entropy coding module 1208 performs entropy coding on the rounded value of the second feature according to the estimated probability distribution of the rounded value of the second feature to obtain a second code stream to be decoded.
- the first entropy decoding module 1209 performs entropy decoding on the first code stream to be decoded according to the estimated probability distribution of the rounded value of the first feature to obtain the rounded value of the first feature.
- the second entropy decoding module 1210 performs entropy decoding on the second code stream to be decoded according to the estimated probability distribution of the rounded value of the second feature to obtain the rounded value of the second feature.
- the rounded value of the second feature is first input into the first decoding network 1211 to obtain the target parameters, and then the parameters of the second decoding network 1212 are adjusted according to the target parameters (that is, the parameter weights of all or part of the convolutions and nonlinear activations of the second decoding network 1212 are adjusted according to the target parameters).
- the rounded value of the first feature is input into the second decoding network 1212 to obtain decoded data.
- FIG. 13 is a flowchart of a codec method 1300 provided in an embodiment of the present application.
- the codec method 1300 can be performed by an encoder and a decoder.
- the codec method 1300 is described as a series of steps or operations. It should be understood that the codec method 1300 may be executed in various orders and/or concurrently, and is not limited to the execution order shown in FIG. 13 .
- the codec method 1300 may include:
- Step 1301 the encoder acquires data to be encoded.
- Step 1302 the encoder inputs the data to be encoded into the encoding network to obtain the first feature.
- Step 1303 the encoder encodes the first feature to obtain a code stream to be decoded.
- Step 1304 the encoder sends the code stream to be decoded to the decoder.
- Step 1305 the decoder decodes the to-be-decoded code stream to obtain the integer value of the first feature.
- the decoder decoding the code stream to be decoded to obtain the rounded value of the first feature may include: the decoder performs probability estimation on the rounded value of the first feature in the code stream to be decoded to obtain an estimated probability distribution of the rounded value of the first feature, and performs entropy decoding on the code stream to be decoded according to the estimated probability distribution of the rounded value of the first feature to obtain the rounded value of the first feature.
- the performing probability estimation on the rounded value of the first feature in the code stream to be decoded to obtain an estimated probability distribution of the rounded value of the first feature includes: performing probability estimation on the rounded value of the first feature in the code stream to be decoded according to the first information to obtain the estimated probability distribution of the rounded value of the first feature, where the first information includes at least one of context information and side information.
- Step 1306 the decoder inputs the integer value of the first feature into the first decoding network to obtain the target parameter.
- the first decoding network may include a convolution kernel generator (convolution or fully connected group), and the convolution kernel generator is used to generate target parameters according to the rounded value of the first feature of the data to be encoded.
- Step 1307 the decoder builds a second decoding network according to the target parameters.
- Step 1308 the decoder inputs the integer value of the first feature into the second decoding network to obtain decoded data.
- in the embodiment of the present application, the rounded value of the first feature is obtained by decoding the code stream to be decoded, which is encoded from the feature of the data to be decoded (that is, the first feature); the rounded value of the first feature is input into the first decoding network to obtain the parameter weights of the second decoding network, and the parameter weights of the second decoding network are then dynamically adjusted according to these weights, so that the parameter weights of the second decoding network are related to the data to be decoded.
- This improves the expressive ability of the second decoding network and makes the decoded data reconstructed by the second decoding network closer to the data to be encoded, thereby improving the rate-distortion performance of the codec network.
- the codec method 1300 provided in the embodiment of the present application may be applicable to the codec system described in FIG. 14 .
- the encoding and decoding system includes an encoding network 1401 , a rounding module 1402 , an entropy estimation network 1403 , an entropy encoding module 1404 , an entropy decoding module 1405 , a first decoding network 1406 , and a second decoding network 1407 .
- the data to be encoded is first input into the encoding network 1401 to obtain the first feature.
- the rounding module 1402 rounds the first feature to obtain a rounded value of the first feature.
- the entropy estimation network 1403 performs probability estimation on the rounded value of the first feature to obtain an estimated probability distribution of the rounded value of the first feature.
- the entropy coding module 1404 performs entropy coding on the rounded value of the first feature according to the estimated probability distribution of the rounded value of the first feature to obtain a code stream to be decoded.
- the entropy decoding module 1405 performs entropy decoding on the code stream to be decoded according to the estimated probability distribution of the rounded value of the first feature to obtain the rounded value of the first feature.
- FIG. 15 is a schematic diagram of the performance of the encoding and decoding method provided by the embodiment of the present application.
- the coordinate system in FIG. 15 shows, using the peak signal-to-noise ratio (PSNR) index, the encoding and decoding performance of the embodiment of the present application and of the related art on the test set, respectively.
- the test set is the Kodak test set, and the Kodak test set includes 24 images in Portable Network Graphics (PNG) format.
- the resolution of the 24 images can be 768 ⁇ 512 or 512 ⁇ 768.
- the abscissa is the bit rate in bits per pixel (BPP), and the ordinate is the PSNR.
- BPP is the number of bits used to store each pixel; a smaller BPP indicates a smaller bit rate and thus a smaller compressed size.
- PSNR is an objective standard for evaluating images; the higher the PSNR, the better the image quality.
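- For reference, both metrics are straightforward to compute; a minimal sketch (the peak value 255 assumes 8-bit images; the byte count and arrays are made-up examples):

```python
import numpy as np

def psnr(orig, recon, peak=255.0):
    """Peak signal-to-noise ratio in dB; higher means better quality."""
    mse = np.mean((orig.astype(float) - recon.astype(float)) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

def bpp(stream_bytes, width, height):
    """Bits per pixel: total coded bits divided by the pixel count."""
    return stream_bytes * 8 / (width * height)

orig = np.random.randint(0, 256, (512, 768), dtype=np.uint8)
recon = np.clip(orig + np.random.randint(-2, 3, orig.shape), 0, 255)
print(psnr(orig, recon), bpp(20000, 768, 512))
```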
- Line segment A in the coordinate system shown in FIG. 15 represents the embodiment of the present application
- line segment B represents the related technology.
- under the same bit rate (i.e., the BPP index), the PSNR index of the embodiment of the present application is higher than that of the related art,
- and under the same picture compression quality (i.e., the PSNR index), the BPP index of the embodiment of the present application is lower than that of the related art.
- Therefore, the rate-distortion performance of the embodiment of the present application is higher than that of the related art, and the embodiment of the present application can improve the rate-distortion performance of the data codec method.
- Applicable scenarios of the codec method provided in the embodiments of the present application include but are not limited to electronic devices, cloud services, and video surveillance, and all services involving the collection, storage, and transmission of data such as images, videos, and voice (for example, photographing, video and audio recording, photo albums, cloud photo albums, video surveillance, video conferencing, and model compression on electronic devices).
- Fig. 16 is a schematic diagram of an application scenario provided by the embodiment of the present application.
- in this application scenario, the electronic device captures image data (the data to be compressed) and inputs the captured image data (such as image data in RAW, YUV, or RGB format) into the AI encoding unit of the electronic device.
- the AI encoding unit of the electronic device invokes the first encoding network, the second encoding network and the entropy estimation network to transform the image data into output features with lower redundancy and generate an estimated probability distribution of the output features.
- the arithmetic coding unit of the electronic device calls the entropy coding module to encode the output features into a data file according to the estimated probability distribution of the output features.
- the file saving unit saves the data file generated by the entropy encoding module to a corresponding storage location of the electronic device.
- the electronic device loads the data file from the corresponding storage location of the electronic device through the file loading unit and inputs it into the arithmetic decoding unit; the arithmetic decoding unit calls the entropy decoding module to decode the data file to obtain the output features, and the output features are input to the AI decoding unit; the AI decoding unit calls the first decoding network, the second decoding network, and the third decoding network to inversely transform the output features and parse them into image data (such as RGB image data), that is, the decompressed data.
- the AI encoding unit and the AI decoding unit can be deployed in a neural-network processing unit (NPU) or a graphics processing unit (GPU) of the electronic device.
- the arithmetic decoding unit may be deployed in the CPU of the electronic device.
- FIG. 17 is a schematic diagram of another application scenario provided by the embodiment of the present application.
- the electronic device uploads the stored picture data (the data to be compressed) to the cloud side (such as a server) directly or after encoding (such as JPEG encoding), and the cloud side inputs the received picture data into the AI encoding unit on the cloud side directly or after decoding (such as JPEG decoding).
- The AI encoding unit on the cloud side calls the first encoding network, the second encoding network, and the entropy estimation network to transform the picture data into output features with lower redundancy and to generate an estimated probability distribution of the output features.
- the arithmetic coding unit on the cloud side calls the entropy coding module to encode the output features into data files according to the estimated probability distribution of the output features.
- the file saving unit saves the data file generated by the entropy encoding module to a corresponding storage location on the cloud side.
- when the electronic device needs to use the data file, it sends a download request to the cloud side. After receiving the download request, the cloud side loads the data file from the corresponding storage location on the cloud side through the file loading unit and inputs it into the arithmetic decoding unit.
- The arithmetic decoding unit calls the entropy decoding module to decode the data file to obtain the output features and inputs the output features to the AI decoding unit; the AI decoding unit calls the first decoding network, the second decoding network, and the third decoding network to inversely transform the output features, and the output features are parsed into image data.
- the cloud side sends the image data (the decompressed data) to the electronic device directly or after encoding.
- a codec device for performing the above codec method will be introduced below with reference to FIG. 18 and FIG. 19 .
- the codec device includes hardware and/or software modules corresponding to each function.
- the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is executed by hardware or by computer software driving hardware depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be regarded as exceeding the scope of the present application.
- in the embodiments of the present application, the codec device may be divided into functional modules according to the above method examples; for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module.
- The above integrated modules may be implemented in the form of hardware. It should be noted that the division of modules in this embodiment is schematic and is only a logical function division; there may be other division methods in actual implementation.
- FIG. 18 shows a possible composition diagram of the codec device involved in the above embodiment.
- the device 1800 may include: a transceiver unit 1801 and a processing unit 1802, where the processing unit 1802 can implement the methods performed by the encoding device, decoding device, encoder or decoder in the above method embodiments, and/or other processes for the technologies described herein.
- the apparatus 1800 may include a processing unit, a storage unit, and a communication unit.
- the processing unit may be used to control and manage the actions of the apparatus 1800, for example, may be used to support the apparatus 1800 to execute the steps performed by the above-mentioned units.
- the storage unit may be used to support the device 1800 to execute stored program codes, and/or data, and the like.
- the communication unit may be used to support communication of the apparatus 1800 with other devices.
- the processing unit may be a processor or a controller. It can implement or execute the various illustrative logical blocks, modules and circuits described in connection with the present disclosure.
- the processor can also be a combination that implements computing functions, for example, a combination of one or more microprocessors, or a combination of a digital signal processor (DSP) and a microprocessor, and the like.
- the storage unit may be a memory.
- the communication unit may be a device that interacts with other electronic devices, such as a radio frequency circuit, a Bluetooth chip, and a Wi-Fi chip.
- the codec apparatus involved in this embodiment of the present application may be an apparatus 1900 having the structure shown in FIG. 19 , where the apparatus 1900 includes a processor 1901 and a transceiver 1902 .
- the transceiver unit 1801 and the processing unit 1802 in FIG. 18 may be implemented by the processor 1901 .
- the apparatus 1900 may further include a memory 1903, and the processor 1901 and the memory 1903 communicate with each other through an internal connection path.
- the relevant functions implemented by the storage unit in FIG. 18 may be implemented by the memory 1903 .
- the embodiment of the present application also provides a computer storage medium, where computer instructions are stored in the computer storage medium; when the computer instructions are run on an electronic device, the electronic device executes the above related method steps to implement the codec method in the above embodiments.
- An embodiment of the present application also provides a computer program product, which, when running on a computer, causes the computer to execute the above-mentioned related steps, so as to implement the encoding and decoding method in the above-mentioned embodiments.
- the embodiment of the present application also provides a codec device, which may specifically be a chip, an integrated circuit, a component, or a module.
- the device may include a connected processor and a memory for storing instructions, or the device may include at least one processor for fetching instructions from an external memory.
- the processor can execute instructions, so that the chip executes the encoding and decoding methods in the above method embodiments.
- FIG. 20 shows a schematic structural diagram of a chip 2000 .
- the chip 2000 includes one or more processors 2001 and an interface circuit 2002 .
- the above-mentioned chip 2000 may further include a bus 2003 .
- the processor 2001 may be an integrated circuit chip with signal processing capability. In an implementation process, each step of the above encoding method may be completed by an integrated logic circuit of hardware in the processor 2001 or by instructions in the form of software.
- the above processor 2001 may be a general-purpose processor, a digital signal processing (DSP) device, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
- a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
- the interface circuit 2002 can be used for sending or receiving data, instructions or information.
- the processor 2001 can process the data, instructions or other information received by the interface circuit 2002, and can send the processing completion information through the interface circuit 2002.
- the chip further includes a memory, which may include a read-only memory and a random access memory, and provides operation instructions and data to the processor.
- a portion of the memory may also include non-volatile random access memory (non-volatile random access memory, NVRAM).
- the memory stores executable software modules or data structures
- the processor can execute corresponding operations by calling operation instructions stored in the memory (the operation instructions can be stored in the operating system).
- the chip may be used in the electronic device or DOP involved in the embodiment of the present application.
- the interface circuit 2002 may be used to output the execution result of the processor 2001.
- processor 2001 and the interface circuit 2002 can be realized by hardware design, software design, or a combination of software and hardware, which is not limited here.
- the electronic device, computer storage medium, computer program product, or chip provided in this embodiment is used to execute the corresponding method provided above; therefore, for the beneficial effects it can achieve, refer to the beneficial effects of the corresponding method provided above, which are not repeated here.
- the sequence numbers of the above processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation processes of the embodiments of the present application.
- the disclosed systems, devices and methods may be implemented in other ways.
- the device embodiments described above are only illustrative.
- the division of the above units is only a logical function division, and there may be other division methods in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
- the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
- the units described above as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
- each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
- the technical solution of the present application, in essence, or the part contributing to the prior art, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods in the various embodiments of the present application.
- the aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Claims (34)
- An encoding method, characterized in that the method comprises: obtaining data to be encoded; inputting the data to be encoded into a first encoding network to obtain target parameters; constructing a second encoding network based on the target parameters; inputting the data to be encoded into the second encoding network to obtain a first feature; and encoding the first feature to obtain an encoded code stream.
- The method according to claim 1, wherein the target parameters are parameter weights of all convolutions, or of some convolutions and non-linear activations, of the second encoding network.
- The method according to claim 1 or 2, wherein encoding the first feature to obtain the encoded code stream comprises: rounding the first feature to obtain a rounded value of the first feature; performing probability estimation on the rounded value of the first feature to obtain an estimated probability distribution of the rounded value of the first feature; and entropy-encoding the rounded value of the first feature based on the estimated probability distribution of the rounded value of the first feature to obtain the encoded code stream.
- The method according to claim 3, wherein performing probability estimation on the rounded value of the first feature to obtain the estimated probability distribution of the rounded value of the first feature comprises: performing probability estimation on the rounded value of the first feature based on first information to obtain the estimated probability distribution of the rounded value of the first feature, wherein the first information comprises at least one of context information and side information.
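To make the encoding flow of claims 1 to 4 easier to follow, the sketch below walks through the same steps in plain NumPy. It is purely illustrative: every function name is hypothetical, the "networks" are toy stand-ins for the learned convolutional networks recited in the claims, and the entropy-encoding step is replaced by the theoretical code length implied by a Gaussian probability model rather than a real arithmetic coder.

```python
import math
import numpy as np

def first_encoding_network(x: np.ndarray, k: int = 3) -> np.ndarray:
    """Hypothetical parameter predictor: derives k filter taps (the
    'target parameters') from the data to be encoded."""
    rng = np.random.default_rng(abs(int(x.sum() * 1e3)) % (2**32))
    return rng.standard_normal(k) / k  # stand-in for learned weights

def second_encoding_network(x: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """'Second encoding network' built from the target parameters: here a
    single convolution whose kernel is the target parameters."""
    return np.convolve(x, weights, mode="same")

def gaussian_bits(symbols: np.ndarray, mu: float, sigma: float) -> float:
    """Theoretical bit cost of the rounded values under a Gaussian model,
    used here in place of actual entropy encoding."""
    cdf = lambda v: 0.5 * (1.0 + math.erf((v - mu) / (sigma * math.sqrt(2.0))))
    p = np.array([max(cdf(s + 0.5) - cdf(s - 0.5), 1e-12) for s in symbols])
    return float(-np.log2(p).sum())

x = np.sin(np.linspace(0.0, 8.0, 64))                # data to be encoded
target_params = first_encoding_network(x)            # claim 1: target parameters
feature = second_encoding_network(x, target_params)  # claim 1: first feature
rounded = np.round(feature)                          # claim 3: rounded value
mu, sigma = float(rounded.mean()), max(float(rounded.std()), 1e-3)  # claim 3: probability estimation
print(f"estimated rate: {gaussian_bits(rounded, mu, sigma):.1f} bits")
```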
- A decoding method, characterized in that the method comprises: obtaining a code stream to be decoded; decoding the code stream to be decoded to obtain a rounded value of a first feature and a rounded value of a second feature, wherein the rounded value of the first feature is used to obtain decoded data and the rounded value of the second feature is used to obtain target parameters; inputting the rounded value of the second feature into a first decoding network to obtain the target parameters; constructing a second decoding network based on the target parameters; and inputting the rounded value of the first feature into the second decoding network to obtain the decoded data.
- The method according to claim 5, wherein the target parameters are parameter weights of all convolutions, or of some convolutions and non-linear activations, of the second decoding network.
- The method according to claim 5 or 6, wherein the code stream to be decoded comprises a first code stream to be decoded and a second code stream to be decoded, and decoding the code stream to be decoded to obtain the rounded value of the first feature and the rounded value of the second feature comprises: decoding the first code stream to be decoded to obtain the rounded value of the first feature; and decoding the second code stream to be decoded to obtain the rounded value of the second feature.
- The method according to claim 7, wherein decoding the first code stream to be decoded to obtain the rounded value of the first feature comprises: performing probability estimation on the rounded value of the first feature in the first code stream to be decoded to obtain an estimated probability distribution of the rounded value of the first feature; and entropy-decoding the first code stream to be decoded based on the estimated probability distribution of the rounded value of the first feature to obtain the rounded value of the first feature.
- The method according to claim 8, wherein performing probability estimation on the rounded value of the first feature in the first code stream to be decoded to obtain the estimated probability distribution of the rounded value of the first feature comprises: performing probability estimation on the rounded value of the first feature in the first code stream to be decoded based on first information to obtain the estimated probability distribution of the rounded value of the first feature, wherein the first information comprises at least one of context information and side information.
- The method according to any one of claims 7 to 9, wherein decoding the second code stream to be decoded to obtain the rounded value of the second feature comprises: performing probability estimation on the rounded value of the second feature in the second code stream to be decoded to obtain an estimated probability distribution of the rounded value of the second feature; and entropy-decoding the second code stream to be decoded based on the estimated probability distribution of the rounded value of the second feature to obtain the rounded value of the second feature.
- The method according to claim 10, wherein performing probability estimation on the rounded value of the second feature in the second code stream to be decoded to obtain the estimated probability distribution of the rounded value of the second feature comprises: performing probability estimation on the rounded value of the second feature in the second code stream to be decoded based on first information to obtain the estimated probability distribution of the rounded value of the second feature, wherein the first information comprises at least one of context information and side information.
- A decoding method, characterized in that the method comprises: obtaining a code stream to be decoded; decoding the code stream to be decoded to obtain a rounded value of a first feature, wherein the rounded value of the first feature is used to obtain decoded data and target parameters; inputting the rounded value of the first feature into a first decoding network to obtain the target parameters; constructing a second decoding network based on the target parameters; and inputting the rounded value of the first feature into the second decoding network to obtain the decoded data.
- The method according to claim 12, wherein the target parameters are parameter weights of all convolutions, or of some convolutions and non-linear activations, of the second decoding network.
- The method according to claim 12 or 13, wherein decoding the code stream to be decoded to obtain the rounded value of the first feature comprises: performing probability estimation on the rounded value of the first feature in the code stream to be decoded to obtain an estimated probability distribution of the rounded value of the first feature; and entropy-decoding the code stream to be decoded based on the estimated probability distribution of the rounded value of the first feature to obtain the rounded value of the first feature.
- The method according to claim 14, wherein performing probability estimation on the rounded value of the first feature in the code stream to be decoded to obtain the estimated probability distribution of the rounded value of the first feature comprises: performing probability estimation on the rounded value of the first feature in the code stream to be decoded based on first information to obtain the estimated probability distribution of the rounded value of the first feature, wherein the first information comprises at least one of context information and side information.
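In the same illustrative spirit, here is a minimal sketch of the single-stream decoder variant of claims 12 to 15, in which one set of entropy-decoded rounded values drives both the parameter network and the reconstruction network. All names, shapes, and mappings are assumptions for illustration, not the actual decoder disclosed in the application.

```python
import numpy as np

def first_decoding_network(rounded: np.ndarray, k: int = 3) -> np.ndarray:
    """Hypothetical: recover the target parameters from the rounded values
    of the first feature (stand-in for a learned mapping)."""
    segments = np.array_split(rounded.astype(float), k)
    w = np.array([s.mean() for s in segments])
    return w / (np.abs(w).sum() + 1e-9)

def second_decoding_network(rounded: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """'Second decoding network' assembled from the target parameters."""
    return np.convolve(rounded.astype(float), weights, mode="same")

# Pretend these rounded values were just entropy-decoded from the code stream.
rounded_feature = np.round(np.sin(np.linspace(0.0, 8.0, 64)) * 4.0)
params = first_decoding_network(rounded_feature)            # claim 12: target parameters
decoded = second_decoding_network(rounded_feature, params)  # claim 12: decoded data
print(decoded[:4])
```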
- An encoding apparatus, characterized by comprising a processing circuit configured to: obtain data to be encoded; input the data to be encoded into a first encoding network to obtain target parameters; construct a second encoding network based on the target parameters; input the data to be encoded into the second encoding network to obtain a first feature; and encode the first feature to obtain an encoded code stream.
- The apparatus according to claim 16, wherein the target parameters are parameter weights of all convolutions, or of some convolutions and non-linear activations, of the second encoding network.
- The apparatus according to claim 16 or 17, wherein the processing circuit is specifically configured to: round the first feature to obtain a rounded value of the first feature; perform probability estimation on the rounded value of the first feature to obtain an estimated probability distribution of the rounded value of the first feature; and entropy-encode the rounded value of the first feature based on the estimated probability distribution of the rounded value of the first feature to obtain the encoded code stream.
- The apparatus according to claim 18, wherein the processing circuit is specifically configured to: perform probability estimation on the rounded value of the first feature based on first information to obtain the estimated probability distribution of the rounded value of the first feature, wherein the first information comprises at least one of context information and side information.
- A decoding apparatus, characterized by comprising a processing circuit configured to: obtain a code stream to be decoded; decode the code stream to be decoded to obtain a rounded value of a first feature and a rounded value of a second feature, wherein the rounded value of the first feature is used to obtain decoded data and the rounded value of the second feature is used to obtain target parameters; input the rounded value of the second feature into a first decoding network to obtain the target parameters; construct a second decoding network based on the target parameters; and input the rounded value of the first feature into the second decoding network to obtain the decoded data.
- The apparatus according to claim 20, wherein the target parameters are parameter weights of all convolutions, or of some convolutions and non-linear activations, of the second decoding network.
- The apparatus according to claim 20 or 21, wherein the code stream to be decoded comprises a first code stream to be decoded and a second code stream to be decoded, and the processing circuit is specifically configured to: decode the first code stream to be decoded to obtain the rounded value of the first feature; and decode the second code stream to be decoded to obtain the rounded value of the second feature.
- The apparatus according to claim 22, wherein the processing circuit is specifically configured to: perform probability estimation on the rounded value of the first feature in the first code stream to be decoded to obtain an estimated probability distribution of the rounded value of the first feature; and entropy-decode the first code stream to be decoded based on the estimated probability distribution of the rounded value of the first feature to obtain the rounded value of the first feature.
- The apparatus according to claim 23, wherein the processing circuit is specifically configured to: perform probability estimation on the rounded value of the first feature in the first code stream to be decoded based on first information to obtain the estimated probability distribution of the rounded value of the first feature, wherein the first information comprises at least one of context information and side information.
- The apparatus according to any one of claims 22 to 24, wherein the processing circuit is specifically configured to: perform probability estimation on the rounded value of the second feature in the second code stream to be decoded to obtain an estimated probability distribution of the rounded value of the second feature; and entropy-decode the second code stream to be decoded based on the estimated probability distribution of the rounded value of the second feature to obtain the rounded value of the second feature.
- The apparatus according to claim 25, wherein the processing circuit is specifically configured to: perform probability estimation on the rounded value of the second feature in the second code stream to be decoded based on first information to obtain the estimated probability distribution of the rounded value of the second feature, wherein the first information comprises at least one of context information and side information.
- A decoding apparatus, characterized by comprising a processing circuit configured to: obtain a code stream to be decoded; decode the code stream to be decoded to obtain a rounded value of a first feature, wherein the rounded value of the first feature is used to obtain decoded data and target parameters; input the rounded value of the first feature into a first decoding network to obtain the target parameters; construct a second decoding network based on the target parameters; and input the rounded value of the first feature into the second decoding network to obtain the decoded data.
- The apparatus according to claim 27, wherein the target parameters are parameter weights of all convolutions, or of some convolutions and non-linear activations, of the second decoding network.
- The apparatus according to claim 27 or 28, wherein the processing circuit is specifically configured to: perform probability estimation on the rounded value of the first feature in the code stream to be decoded to obtain an estimated probability distribution of the rounded value of the first feature; and entropy-decode the code stream to be decoded based on the estimated probability distribution of the rounded value of the first feature to obtain the rounded value of the first feature.
- The apparatus according to claim 29, wherein the processing circuit is specifically configured to: perform probability estimation on the rounded value of the first feature in the code stream to be decoded based on first information to obtain the estimated probability distribution of the rounded value of the first feature, wherein the first information comprises at least one of context information and side information.
- An encoder, characterized by comprising: one or more processors; and a non-transitory computer-readable storage medium coupled to the processors and storing a program for execution by the processors, wherein the program, when executed by the processors, causes the encoder to perform the method according to any one of claims 1 to 4.
- A decoder, characterized by comprising: one or more processors; and a non-transitory computer-readable storage medium coupled to the processors and storing a program for execution by the processors, wherein the program, when executed by the processors, causes the decoder to perform the method according to any one of claims 5 to 11 or 12 to 15.
- A computer-readable storage medium, characterized by comprising a computer program which, when executed on a computer, causes the computer to perform the method according to any one of claims 1 to 4, 5 to 11, or 12 to 15.
- A computer program product, characterized in that the computer program product comprises computer program code which, when run on a computer, causes the computer to perform the method according to any one of claims 1 to 4, 5 to 11, or 12 to 15.
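Several claims (4, 9, 11, and 15) condition the probability estimate on "first information", i.e. context information and/or side information. The sketch below shows one common way such conditioning works in learned codecs: a causal window of already-processed symbols predicts a per-symbol Gaussian. This is an assumed, simplified model for illustration only, not the estimator disclosed in the application.

```python
import math
import numpy as np

def context_model(symbols: np.ndarray, window: int = 4):
    """Per-symbol (mu, sigma) predicted from a causal window of already-coded
    symbols -- a toy stand-in for a learned context model."""
    mus, sigmas = [], []
    for i in range(len(symbols)):
        ctx = symbols[max(0, i - window):i]
        mus.append(float(ctx.mean()) if ctx.size else 0.0)
        sigmas.append(max(float(ctx.std()), 1.0) if ctx.size else 1.0)
    return np.array(mus), np.array(sigmas)

def bits_under_model(symbols, mus, sigmas) -> float:
    """Total theoretical code length under the per-symbol Gaussians."""
    cdf = lambda v, m, s: 0.5 * (1.0 + math.erf((v - m) / (s * math.sqrt(2.0))))
    bits = 0.0
    for v, m, s in zip(symbols, mus, sigmas):
        bits -= math.log2(max(cdf(v + 0.5, m, s) - cdf(v - 0.5, m, s), 1e-12))
    return bits

symbols = np.round(np.cumsum(np.random.default_rng(0).standard_normal(128)))
mus, sigmas = context_model(symbols)
print(f"context-modelled rate: {bits_under_model(symbols, mus, sigmas):.1f} bits")
```

On data with local structure, such a context-conditioned model typically assigns higher probability to each symbol than a single global Gaussian would, which is why conditioning on context or side information lowers the entropy-coded rate.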
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
BR112024002242A BR112024002242A2 (pt) | 2021-08-05 | 2022-08-01 | Método e aparelho de codificação e decodificação, codificador, decodificador, meio de armazenamento legível por computador, e produto de programa de computador |
EP22852132.4A EP4373081A1 (en) | 2021-08-05 | 2022-08-01 | Encoding method and apparatus, and decoding method and apparatus |
KR1020247006856A KR20240039178A (ko) | 2021-08-05 | 2022-08-01 | 인코딩 및 디코딩 방법 그리고 장치 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110898667.8 | 2021-08-05 | ||
CN202110898667.8A CN115883831A (zh) | 2021-08-05 | 2021-08-05 | 编解码方法和装置 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US18/433,054 Continuation US20240205435A1 (en) | 2021-08-05 | 2024-02-05 | Encoding and decoding method and apparatus |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023011420A1 true WO2023011420A1 (zh) | 2023-02-09 |
Family
ID=85154401
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/109485 WO2023011420A1 (zh) | 2021-08-05 | 2022-08-01 | 编解码方法和装置 |
Country Status (5)
Country | Link |
---|---|
EP (1) | EP4373081A1 (zh) |
KR (1) | KR20240039178A (zh) |
CN (1) | CN115883831A (zh) |
BR (1) | BR112024002242A2 (zh) |
WO (1) | WO2023011420A1 (zh) |
- 2021-08-05 CN CN202110898667.8A patent/CN115883831A/zh active Pending
- 2022-08-01 KR KR1020247006856A patent/KR20240039178A/ko active Search and Examination
- 2022-08-01 BR BR112024002242A patent/BR112024002242A2/pt unknown
- 2022-08-01 WO PCT/CN2022/109485 patent/WO2023011420A1/zh active Application Filing
- 2022-08-01 EP EP22852132.4A patent/EP4373081A1/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110401836A (zh) * | 2018-04-25 | 2019-11-01 | Hangzhou Hikvision Digital Technology Co., Ltd. | Image decoding and encoding methods, apparatuses, and devices thereof |
CN110870310A (zh) * | 2018-09-04 | 2020-03-06 | SZ DJI Technology Co., Ltd. | Image encoding method and apparatus |
CN111641826A (zh) * | 2019-03-01 | 2020-09-08 | Hangzhou Hikvision Digital Technology Co., Ltd. | Method, apparatus, and system for encoding and decoding data |
WO2021063559A1 (en) * | 2019-09-30 | 2021-04-08 | Interdigital Vc Holdings France, Sas | Systems and methods for encoding a deep neural network |
JP2021072540A (ja) * | 2019-10-30 | 2021-05-06 | Canon Inc. | Image encoding apparatus, decoding apparatus, transmission system, and control method therefor |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117014610A (zh) * | 2023-10-07 | 2023-11-07 | Huaqiao University | Multi-task-learning-based fast intra CU partitioning method and apparatus for H.266/VVC screen content |
CN117014610B (zh) * | 2023-10-07 | 2023-12-29 | Huaqiao University | Multi-task-learning-based fast intra CU partitioning method and apparatus for H.266/VVC screen content |
Also Published As
Publication number | Publication date |
---|---|
KR20240039178A (ko) | 2024-03-26 |
BR112024002242A2 (pt) | 2024-04-30 |
CN115883831A (zh) | 2023-03-31 |
EP4373081A1 (en) | 2024-05-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020181997A1 (en) | An encoder, a decoder and corresponding methods for inter prediction | |
WO2022068716A1 (zh) | 熵编/解码方法及装置 | |
WO2021109978A1 (zh) | 视频编码的方法、视频解码的方法及相应装置 | |
AU2020261145B2 (en) | Picture prediction method and apparatus, and computer-readable storage medium | |
WO2020228560A1 (zh) | 候选运动矢量列表获取方法、装置及编解码器 | |
WO2022063265A1 (zh) | 帧间预测方法及装置 | |
WO2020244579A1 (zh) | Mpm列表构建方法、色度块的帧内预测模式获取方法及装置 | |
WO2020232845A1 (zh) | 一种帧间预测的方法和装置 | |
CN114125446A (zh) | 图像编码方法、解码方法和装置 | |
AU2024201357A1 (en) | Picture prediction method and apparatus, and computer-readable storage medium | |
WO2022111233A1 (zh) | 帧内预测模式的译码方法和装置 | |
US20230388490A1 (en) | Encoding method, decoding method, and device | |
WO2020253681A1 (zh) | 融合候选运动信息列表的构建方法、装置及编解码器 | |
WO2023011420A1 (zh) | 编解码方法和装置 | |
WO2021008524A1 (zh) | 图像编码方法、解码方法、装置和存储介质 | |
WO2020259567A1 (zh) | 视频编码器、视频解码器及相应方法 | |
US20230239500A1 (en) | Intra Prediction Method and Apparatus | |
WO2023020320A1 (zh) | 熵编解码方法和装置 | |
WO2021045657A9 (en) | Motion vector range derivation for enhanced interpolation filter | |
WO2020114393A1 (zh) | 变换方法、反变换方法以及视频编码器和视频解码器 | |
US20240205435A1 (en) | Encoding and decoding method and apparatus | |
WO2023160470A1 (zh) | 编解码方法和装置 | |
CN118042136A (zh) | 编解码方法和装置 | |
CN116647683A (zh) | 量化处理方法和装置 | |
CN116134817A (zh) | 使用稀疏光流表示的运动补偿 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 22852132; Country of ref document: EP; Kind code of ref document: A1 |
| ENP | Entry into the national phase | Ref document number: 2024506874; Country of ref document: JP; Kind code of ref document: A |
| WWE | WIPO information: entry into national phase | Ref document number: 2022852132; Country of ref document: EP |
| REG | Reference to national code | Country of ref document: BR; Ref legal event code: B01A; Ref document number: 112024002242 |
| ENP | Entry into the national phase | Ref document number: 2022852132; Country of ref document: EP; Effective date: 20240213 |
| ENP | Entry into the national phase | Ref document number: 20247006856; Country of ref document: KR; Kind code of ref document: A |
| WWE | WIPO information: entry into national phase | Ref document number: 1020247006856; Country of ref document: KR |
| NENP | Non-entry into the national phase | Country of ref document: DE |
| ENP | Entry into the national phase | Ref document number: 112024002242; Country of ref document: BR; Kind code of ref document: A2; Effective date: 20240202 |