WO2024007843A1 - Coding and decoding method, device and computer equipment - Google Patents

Info

Publication number
WO2024007843A1
Authority
WO
WIPO (PCT)
Prior art keywords
boundary value
encoding
decoding
probability distribution
value
Prior art date
Application number
PCT/CN2023/100760
Other languages
English (en)
French (fr)
Other versions
WO2024007843A9 (zh)
Inventor
师一博
黄允麒
王晶
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2024007843A1
Publication of WO2024007843A9

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/13 Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/91 Entropy coding, e.g. variable length coding [VLC] or arithmetic coding

Definitions

  • The present application relates to the field of image processing technology, and in particular to a coding and decoding method, device and computer equipment.
  • Image compression refers to a technology that exploits image data characteristics such as spatial redundancy, visual redundancy, and statistical redundancy to represent the original image pixel matrix, lossily or losslessly, with fewer bits, so that image information can be transmitted and stored efficiently. It plays an important role in the current media era, in which the types of transmitted image information and the amount of data keep increasing. Image compression is divided into two types: lossy compression and lossless compression. Lossy compression achieves a larger compression ratio at the expense of a certain reduction in image quality. Lossless compression does not lose image details, but its compression ratio is usually lower than that of lossy compression.
  • Embodiments of the present application provide a coding and decoding method, device and computer equipment for reducing the complexity of entropy coding and entropy decoding and improving their speed.
  • In a first aspect, embodiments of the present application provide a coding method, which includes: inputting an image into a coding network to obtain a feature map of the image, where the feature map includes multiple elements; determining the boundary value of a first element in the feature map, the first element being any one of the multiple elements; determining whether the first element is within the target range corresponding to the first element, where the target range is determined based on the boundary value of the first element; if the first element is within the target range corresponding to the first element, performing entropy encoding on the first element; if the first element is outside the target range corresponding to the first element, modifying the first element to the boundary value corresponding to the first element, and performing entropy encoding on the modified first element.
  • In the above method, the encoding end obtains the feature map of the image through the encoding network, determines the boundary value of each element in the feature map, and determines the target range corresponding to each element based on that boundary value. If the element value is within its corresponding target range, the element is entropy coded; if the element value is outside its corresponding target range, the element value is modified to the boundary value corresponding to the element, and the modified element is entropy coded.
  • Because a corresponding boundary value is determined for each element in the above method, the target range corresponding to each element can be narrowed. The boundary value is at the element level, that is, the boundary value determined for an element applies only to that element and not to other elements, so it more accurately reflects the probability distribution characteristics of each element. Since the target range is narrowed, the complexity of the probability distribution of elements within the target range is reduced, thereby reducing the complexity of entropy coding and improving the speed of entropy coding. For elements that exceed the target range, existing entropy coding optimization methods do not perform entropy coding on the out-of-bounds elements, but directly write their values into the code stream, which is equivalent to encoding them with a special fixed-length code.
  • The boundary value includes an upper boundary value and/or a lower boundary value; modifying the first element to the boundary value corresponding to the first element includes: if the first element is greater than the upper boundary value, modifying the first element to the upper boundary value; or, if the first element is smaller than the lower boundary value, modifying the first element to the lower boundary value.
  • Determining the boundary value of the first element in the feature map includes: inputting the feature map into an entropy estimation network, where the entropy estimation network outputs a probability distribution model of the first element; and determining the boundary value of the first element according to the probability distribution model of the first element.
  • The entropy estimation network can output the probability distribution model of the first element. For example, when a Gaussian distribution model is used, the entropy estimation network can output the mean and variance of the first element. By determining the boundary value of the first element based on its probability distribution model, the encoding end can make the boundary value filter out values with very small probability as far as possible, thereby avoiding a complex entropy coding process.
  • Determining the boundary value of the first element according to the probability distribution model of the first element includes: determining the boundary value of the first element according to the variance of the probability distribution model of the first element.
  • The variance measures the degree of data dispersion, so determining the boundary value based on the variance yields a boundary value that can filter out values with very small probability.
  • The boundary value includes an upper boundary value and/or a lower boundary value; the upper boundary value is k*σ, and/or the lower boundary value is -k*σ, where k is a constant and σ represents the variance of the probability distribution model.
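  • The element-level boundary and the out-of-range modification described above can be sketched in Python as follows. This is a minimal illustration only: the function names, the outward rounding of k*σ to an integer, and the default k = 3 are assumptions and are not taken from the application.

```python
import math


def element_bounds(sigma, k=3.0):
    """Return (lower, upper) = (-k*sigma, k*sigma), rounded outward to integers.

    The outward rounding and the default k are assumptions for illustration.
    """
    upper = math.ceil(k * sigma)
    return -upper, upper


def modify_to_boundary(value, lower, upper):
    """Replace an out-of-range value with the nearest boundary value."""
    if value > upper:
        return upper
    if value < lower:
        return lower
    return value


# Each element gets its own bounds from its own sigma (element-level boundary).
lower, upper = element_bounds(sigma=1.5)
clipped = [modify_to_boundary(v, lower, upper) for v in (-9, -2, 0, 4, 12)]
```

  • In this sketch only the values -9 and 12 are changed; every value already inside the target range is passed to entropy coding unmodified.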
  • determining the boundary value of the first element in the feature map includes: inputting the feature map to an entropy estimation network, and the entropy estimation network outputs a probability distribution of the first element model and the boundary value of the first element.
  • The entropy estimation network can also be trained in advance so that it outputs both the probability distribution model and the boundary value of the first element; a boundary value output in this way often better meets the need to improve entropy coding performance.
  • The method further includes: quantizing the first element. Determining whether the first element is within the target range corresponding to the first element includes: determining whether the quantized first element is within the target range corresponding to the first element. Performing entropy encoding on the first element includes: performing entropy encoding on the quantized first element.
  • Performing entropy encoding on the first element includes: determining the probability of the target range; and performing entropy encoding on the first element according to the ratio of the probability of the first element to the probability of the target range.
  • the probability of the target range can be obtained by subtracting the probability of the value being smaller than the lower boundary from the probability of the value being smaller than the upper boundary.
  • the probability distribution model is a Gaussian distribution model.
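  • A minimal Python sketch of the target-range probability and the probability ratio described above, for a Gaussian model. The integer quantization bins of width 1, the function names, and the treatment of sigma as the scale parameter of the Gaussian are assumptions made for this illustration.

```python
import math


def gaussian_cdf(x, mu, sigma):
    """Cumulative probability of a Gaussian with mean mu and scale sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def target_range_probability(mu, sigma, lower, upper):
    """Probability below the upper edge minus probability below the lower edge."""
    return gaussian_cdf(upper + 0.5, mu, sigma) - gaussian_cdf(lower - 0.5, mu, sigma)


def renormalized_probability(value, mu, sigma, lower, upper):
    """Probability of one integer value divided by the target-range probability."""
    p_value = gaussian_cdf(value + 0.5, mu, sigma) - gaussian_cdf(value - 0.5, mu, sigma)
    return p_value / target_range_probability(mu, sigma, lower, upper)


# Probabilities of all integer values in the target range [-3, 3].
probs = [renormalized_probability(v, 0.0, 1.0, -3, 3) for v in range(-3, 4)]
```

  • Dividing by the target-range probability makes the probabilities of the values inside the range sum to 1, so an entropy coder only has to handle the symbols inside the narrowed range.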
  • In a second aspect, embodiments of the present application provide a coding method, including: inputting an image into a coding network to obtain a feature map of the image, where the feature map includes multiple elements; determining the boundary value of a first element in the feature map, the first element being any one of the multiple elements; determining whether the first element is within the target range corresponding to the first element, where the target range is determined based on the boundary value of the first element; if the first element is within the target range corresponding to the first element, performing entropy encoding on the first element; if the first element is outside the target range corresponding to the first element, marking the first element as an out-of-bounds element, and performing variable length code encoding on the first element.
  • In the above method, the encoding end obtains the feature map of the image through the encoding network, determines the boundary value of each element in the feature map, and determines the target range corresponding to each element based on that boundary value. If the value of an element is within its corresponding target range, the element is entropy encoded; if the value of an element is outside its corresponding target range, the element is marked as an out-of-bounds element and encoded with a variable length code. Because a corresponding boundary value is determined for each element in the above method, the target range corresponding to each element can be narrowed.
  • Using variable-length codes to encode out-of-bounds elements helps improve compression performance. In particular, variable-length coding of the difference between an out-of-bounds element and its boundary reduces the number of bits required after encoding and further improves compression performance.
  • marking the first element as an out-of-bounds element includes encoding flag information indicating that the first element is an out-of-bounds element into a code stream.
  • the code stream includes encoding information obtained by performing entropy encoding on the first element, or encoding information obtained by performing variable length code encoding on the first element.
  • The code stream may also include encoding information obtained by entropy encoding other elements that are not out of bounds, and encoding information obtained by variable length code encoding of other out-of-bounds elements.
  • The boundary value includes an upper boundary value and/or a lower boundary value; performing variable length code encoding on the first element includes: if the first element is larger than the upper boundary value, determining a first difference between the first element and the upper boundary value, and performing variable length code encoding on the first difference; or, if the first element is smaller than the lower boundary value, determining a second difference between the first element and the lower boundary value, and performing variable length code encoding on the second difference.
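  • A minimal Python sketch of variable length coding of the difference between an out-of-bounds element and its boundary. The application does not fix a particular variable length code in this passage; the order-0 Exp-Golomb code, the one-bit side indicator, and the function names used here are assumptions chosen for illustration.

```python
def exp_golomb_encode(n):
    """Order-0 Exp-Golomb code of a non-negative integer n, as a bit string."""
    binary = bin(n + 1)[2:]
    return "0" * (len(binary) - 1) + binary


def encode_out_of_bounds(value, lower, upper):
    """Encode an out-of-bounds value as a side bit plus the coded difference.

    The side bit ("1" above the upper boundary, "0" below the lower boundary)
    is a hypothetical signalling choice, not taken from the application.
    """
    if value > upper:
        side, diff = "1", value - upper   # first difference
    elif value < lower:
        side, diff = "0", lower - value   # second difference
    else:
        raise ValueError("value is within the target range")
    # The difference is at least 1, so diff - 1 is a non-negative integer.
    return side + exp_golomb_encode(diff - 1)


bits_high = encode_out_of_bounds(5, lower=-3, upper=3)
bits_low = encode_out_of_bounds(-4, lower=-3, upper=3)
```

  • Because out-of-bounds values usually lie close to the boundary, the coded differences are small, and a code that assigns short codewords to small integers keeps the cost of such elements low.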
  • Determining the boundary value of the first element in the feature map includes: inputting the feature map into an entropy estimation network, where the entropy estimation network outputs a probability distribution model of the first element; and determining the boundary value of the first element according to the probability distribution model of the first element.
  • Determining the boundary value of the first element according to the probability distribution model of the first element includes: determining the boundary value of the first element according to the variance of the probability distribution model of the first element.
  • The boundary value includes an upper boundary value and/or a lower boundary value; the upper boundary value is k*σ, and/or the lower boundary value is -k*σ, where k is a constant and σ represents the variance of the probability distribution model.
  • determining the boundary value of the first element in the feature map includes: inputting the feature map to an entropy estimation network, and the entropy estimation network outputs a probability distribution of the first element model and the boundary value of the first element.
  • The method further includes: quantizing the first element. Determining whether the first element is within the target range corresponding to the first element includes: determining whether the quantized first element is within the target range corresponding to the first element. Performing entropy encoding on the first element includes: performing entropy encoding on the quantized first element.
  • Performing entropy encoding on the first element includes: determining the probability of the target range; and performing entropy encoding on the first element according to the ratio of the probability of the first element to the probability of the target range.
  • the probability of the target range can be obtained by subtracting the probability of the value being smaller than the lower boundary from the probability of the value being smaller than the upper boundary.
  • the probability distribution model is a Gaussian distribution model.
  • In a third aspect, embodiments of the present application provide a decoding method, including: obtaining a code stream, where the code stream includes encoded information of multiple elements; determining whether a first element in the code stream is an out-of-bounds element, the first element being any one of the multiple elements; if the first element is an out-of-bounds element, performing variable length code decoding on the encoded information of the first element to obtain the first element; if the first element is not an out-of-bounds element, performing entropy decoding on the encoded information of the first element to obtain the first element.
  • The decoding end determines whether the first element is an out-of-bounds element, that is, whether the first element is within the target range corresponding to the first element. If it is within the target range, the first element is not an out-of-bounds element; if it is outside the target range, the first element is an out-of-bounds element.
  • Determining whether the first element in the code stream is an out-of-bounds element includes: if the code stream includes flag information indicating that the first element is out of bounds, determining that the first element is an out-of-bounds element.
  • Performing variable length code decoding on the encoded information of the first element to obtain the first element includes: determining the boundary value of the first element; performing variable length code decoding on the encoded information of the first element to obtain a difference, where the difference is the difference between the first element and the upper boundary value, or the difference between the first element and the lower boundary value; and determining the first element based on the boundary value and the difference.
  • Determining the boundary value of the first element includes: determining a probability distribution model of the first element; and determining the boundary value of the first element according to the probability distribution model of the first element.
  • Determining the boundary value of the first element according to the probability distribution model of the first element includes: determining the boundary value of the first element according to the variance of the probability distribution model of the first element.
  • The boundary value includes an upper boundary value and/or a lower boundary value; the upper boundary value is k*σ, and/or the lower boundary value is -k*σ, where k is a constant and σ represents the variance of the probability distribution model.
  • determining the boundary value of the first element includes: inputting the code stream to an entropy estimation network that outputs a boundary value of the first element.
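  • A minimal Python sketch of how a decoder could rebuild an out-of-bounds element from its boundary value and a decoded difference. The order-0 Exp-Golomb code, the leading side bit ("1" for above the upper boundary, "0" for below the lower boundary), and the function names are assumptions for illustration; the application does not specify them in this passage.

```python
def exp_golomb_decode(bits, pos=0):
    """Decode one order-0 Exp-Golomb codeword starting at bits[pos].

    Returns (decoded non-negative integer, position after the codeword).
    """
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    start = pos + zeros
    end = start + zeros + 1
    return int(bits[start:end], 2) - 1, end


def decode_out_of_bounds(bits, lower, upper, pos=0):
    """Rebuild an out-of-bounds element from its boundary value and difference."""
    side = bits[pos]  # hypothetical side bit, see the assumptions above
    diff_minus_one, pos = exp_golomb_decode(bits, pos + 1)
    diff = diff_minus_one + 1
    value = upper + diff if side == "1" else lower - diff
    return value, pos


value_high, next_pos = decode_out_of_bounds("1010", lower=-3, upper=3)
value_low, _ = decode_out_of_bounds("01", lower=-3, upper=3)
```

  • The decoder needs the same boundary values as the encoder, which is why the boundary value is derived from the probability distribution model or output by the entropy estimation network on both sides.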
  • Embodiments of the present application provide an encoding device, which includes modules/units that perform the method of the first aspect and any possible implementation of the first aspect; these modules/units may be implemented by hardware, or by hardware executing corresponding software.
  • The encoding device may include: a coding network module, configured to output a feature map of an image according to the input image, where the feature map includes multiple elements; a determining module, configured to determine the boundary value of a first element in the feature map, the first element being any one of the multiple elements; a judgment module, configured to judge whether the first element is within the target range corresponding to the first element, where the target range is determined based on the boundary value of the first element; and an encoding module, configured to perform entropy encoding on the first element when the first element is within the target range corresponding to the first element, and, when the first element is outside the target range corresponding to the first element, to modify the first element to the boundary value corresponding to the first element and perform entropy encoding on the modified first element.
  • Embodiments of the present application provide an encoding device, which includes modules/units that perform the method of the second aspect and any possible implementation of the second aspect; these modules/units may be implemented by hardware, or by hardware executing corresponding software.
  • The encoding device may include: a coding network module, configured to output a feature map of an image according to the input image, where the feature map includes multiple elements; a boundary determination module, configured to determine the boundary value of a first element in the feature map, the first element being any one of the multiple elements; a judgment module, configured to judge whether the first element is within the target range corresponding to the first element, where the target range is determined based on the boundary value of the first element; and an entropy coding module, configured to perform entropy encoding on the first element when the first element is within the target range corresponding to the first element, and, when the first element is outside the target range corresponding to the first element, to mark the first element as an out-of-bounds element and perform variable length code encoding on the first element.
  • Embodiments of the present application provide a decoding device, which includes modules/units that perform the method of the third aspect and any possible implementation of the third aspect; these modules/units may be implemented by hardware, or by hardware executing corresponding software.
  • The decoding device may include: an acquisition module, configured to obtain a code stream, where the code stream includes encoded information of multiple elements; a determination module, configured to determine whether a first element in the code stream is an out-of-bounds element, the first element being any one of the multiple elements; and a decoding module, configured to perform variable length code decoding on the encoded information of the first element to obtain the first element when the first element is an out-of-bounds element, and to perform entropy decoding on the encoded information of the first element to obtain the first element when the first element is not an out-of-bounds element.
  • embodiments of the present application provide an encoder, which includes a processing circuit for executing the encoding method described in the first aspect, the second aspect, and any implementation thereof.
  • embodiments of the present application provide a decoder, which includes a processing circuit for executing the decoding method described in the third aspect and any implementation thereof.
  • embodiments of the present application provide an encoder, including: one or more processors; a computer-readable storage medium coupled to the one or more processors, the computer-readable storage medium storing a program , wherein the program, when executed by the one or more processors, causes the encoder to perform the encoding method described in the first aspect, the second aspect and any implementation thereof.
  • Embodiments of the present application provide a decoder, including: one or more processors; and a computer-readable storage medium coupled to the one or more processors, the computer-readable storage medium storing a program, where the program, when executed by the one or more processors, causes the decoder to execute the decoding method described in the third aspect and any implementation thereof.
  • Embodiments of the present application provide a computer-readable storage medium. Instructions are stored in the computer-readable storage medium. When the instructions are run on a computer, they cause the computer to execute the method described in the first aspect, the second aspect, the third aspect and any implementation thereof.
  • Embodiments of the present application provide a computer program product containing instructions that, when run on a computer, cause the method described in the first aspect, the second aspect, the third aspect and any of their implementations to be implemented.
  • Embodiments of the present application provide a computer-readable storage medium in which a bit stream is stored, where the bit stream is generated according to the encoding method in the first aspect or the second aspect and any possible implementation thereof.
  • Embodiments of the present application provide a computer-readable storage medium in which a bit stream is stored. The bit stream includes program instructions executable by a decoder, and the program instructions cause the decoder to execute the decoding method in the third aspect and any possible implementation of the third aspect.
  • Embodiments of the present application provide a decoding system. The decoding system includes at least one memory and a decoder; the at least one memory is used to store a bit stream, and the decoder is used to perform the decoding method in the third aspect and any possible implementation of the third aspect.
  • embodiments of the present application provide a method of storing a bit stream.
  • the method includes receiving or generating a bit stream, and storing the bit stream in a storage medium.
  • the method further includes: performing format conversion processing on the bit stream to obtain a format-converted bit stream, and storing the format-converted bit stream in a storage medium.
  • embodiments of the present application provide a method of transmitting a bit stream.
  • the method includes receiving or generating a bit stream, transmitting the bit stream to a cloud server, or transmitting the bit stream to a mobile terminal.
  • Figure 1 is a schematic flow chart of a VAE method
  • Figure 2 is a flow chart of an entropy coding optimization method based on probability distribution
  • Figure 3 is a schematic diagram of a scenario provided by an embodiment of the present application.
  • Figure 4 is a schematic flow chart of an encoding method provided by an embodiment of the present application.
  • Figure 5 is a schematic diagram of a coding network and a decoding network provided by an embodiment of the present application
  • Figure 6 is a schematic diagram of an entropy estimation network provided by an embodiment of the present application.
  • Figure 7 is a schematic flow chart of a decoding method provided by an embodiment of the present application.
  • Figure 8 is a comparison chart of encoding effects provided by the embodiment of the present application.
  • Figure 9 is a comparison chart of decoding effects provided by the embodiment of the present application.
  • Figure 10 is a schematic flow chart of another encoding method provided by an embodiment of the present application.
  • Figure 11 is a schematic flow chart of another decoding method provided by an embodiment of the present application.
  • Figure 12 is a schematic structural diagram of an encoding device provided by an embodiment of the present application.
  • Figure 13 is a schematic structural diagram of another encoding device provided by an embodiment of the present application.
  • Figure 14 is a schematic structural diagram of a decoding device provided by an embodiment of the present application.
  • Figure 15 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • AI performs better than traditional image algorithms in many fields such as image recognition and target detection.
  • Deep learning is also used in the field of image compression.
  • Each module of the AI image compression algorithm (encoding network, entropy estimation network, decoding network, etc.)
  • VAE: variational autoencoder
  • MS-SSIM: multi-scale structural similarity index
  • PSNR: peak signal to noise ratio
  • the coding process can include the following steps:
  • the encoding end inputs the image into the encoding network, and the encoding network outputs the feature map of the image.
  • the encoding network is responsible for converting the image into a feature representation in another space and removing noise, high-frequency information, etc. in the image.
  • The encoding end quantizes the feature map, for example by rounding the feature values in the feature map, to obtain the quantized feature map to be encoded.
  • The encoding end obtains, based on the entropy estimation network, the probability distribution of each element in the quantized feature map.
  • the encoding end performs entropy coding on the elements according to the probability distribution of each element to obtain the encoded code stream.
  • the decoding process can include the following steps:
  • After the decoding end obtains the code stream, it determines, based on the entropy estimation network, the probability distribution of each element in the quantized feature map.
  • The decoding end performs entropy decoding based on the code stream and the probability distribution of each element to obtain the quantized feature map.
  • The decoding end then inputs the quantized feature map into the decoding network and obtains the reconstructed image.
  • Entropy coding is a data compression coding based on the distribution characteristics of message occurrence probability.
  • Entropy coding and entropy decoding are commonly used algorithms in data compression, which are used to compress data to the theoretical entropy size $-\log_b P_s$, where $b$ is the base used to measure the size of the code stream (usually 2) and $P_s$ is the probability of the data element.
  • The purpose of entropy coding is to compress the sequence $S$ into a binary code stream whose size is $\sum_i -\log_2 p_i(s_i)$, where $s_i$ is the $i$-th element of $S$ and $p_i$ is its probability distribution.
  • the purpose of entropy decoding is to recover the sequence S based on the probability distribution of each element and the code stream.
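  • As a small worked illustration of the size formula above, the following Python sketch sums the ideal code length over a sequence. The function name and the example probabilities are hypothetical.

```python
import math


def ideal_code_length_bits(probabilities):
    """Sum of -log2 p_i(s_i) over the sequence: the ideal code stream size in bits."""
    return sum(-math.log2(p) for p in probabilities)


# Probabilities that each element's own distribution assigns to its actual value.
element_probabilities = [0.5, 0.25, 0.25]
total_bits = ideal_code_length_bits(element_probabilities)
```

  • Elements with a high probability contribute few bits, so a sharper probability estimate for each element directly shortens the code stream.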
  • AI image compression technology has a better compression rate, which has led researchers and research institutions to keep pushing AI image compression technology into practical applications.
  • In practical applications, in addition to higher requirements on compression rate, there are also higher requirements on decoding time and encoding time.
  • In AI image compression, in addition to the time-consuming encoding network/decoding network part, entropy encoding and entropy decoding are also very time-consuming steps. Entropy encoding and entropy decoding are therefore one of the important bottlenecks in the speed of AI image compression, and how to optimize their efficiency has become an important research issue.
  • When the CPU performs entropy encoding and entropy decoding, it can only operate serially because of the correlation between elements.
  • The range of possible values for each element is relatively large.
  • For example, traditional compression technology usually uses a binary coder, in which an element value is 0 or 1; in AI compression technology, the value range of an element is usually [-64, 64], [-128, 128], etc. As the value range becomes larger, the probability distribution becomes more complex, and each element in AI compression technology has its own probability distribution.
  • An entropy coding optimization method based on probability distribution is shown in Figure 2.
  • This scheme holds that elements whose probability distribution has a high peak do not require entropy coding, and the peak value is used directly in place of the element. For example, if the probability at the mean of an element's probability distribution is greater than a preset threshold, the mean is used to replace the original value of the element, and the element is no longer entropy encoded.
  • This method trades off code rate against loss while reducing the number of elements that require entropy encoding and entropy decoding, which speeds up entropy encoding/decoding.
  • The encoding process of the above method may include the following steps:
  • The encoding end inputs the image into the encoding network and quantizes the feature map output by the encoding network to obtain the quantized feature map to be encoded.
  • Based on the entropy estimation network, the encoding end determines the probability distribution of each element in the feature map to be encoded; taking the Gaussian distribution as an example, it obtains the mean μ and variance σ of each element.
  • The encoding end determines the probability peak P of each element within the value range based on the probability distribution; in the Gaussian distribution, this is the probability corresponding to the mean μ.
  • The encoding end traverses each element and determines whether the peak probability P of the element is greater than the threshold P_th. If it is greater than the threshold P_th, the actual value of the element is replaced with the element value corresponding to the peak probability P and written into the code stream; if it is less than the threshold P_th, the element is entropy coded according to the probability distribution.
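  • The per-element decision in the steps above can be sketched as follows. This is a minimal illustration, not the scheme's actual implementation: the function names are hypothetical, sigma is treated as a standard deviation, and the peak probability is assumed to be the Gaussian mass of the width-1 integer bin nearest the mean.

```python
import math


def gaussian_cdf(x, mu, sigma):
    """CDF of a Gaussian with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def peak_probability(mu, sigma):
    """Return the integer nearest the mean and the probability mass of its bin.

    Assumption: elements are integers, so each value owns a bin of width 1.
    """
    peak_value = math.floor(mu + 0.5)
    mass = (gaussian_cdf(peak_value + 0.5, mu, sigma)
            - gaussian_cdf(peak_value - 0.5, mu, sigma))
    return peak_value, mass


def skip_or_entropy_code(value, mu, sigma, p_th):
    """Decide whether an element is replaced by its peak value or entropy coded."""
    peak_value, p = peak_probability(mu, sigma)
    if p > p_th:
        # Sharp distribution: substitute the peak value, no entropy coding.
        return ("skip", peak_value)
    # Flat distribution: the element goes through regular entropy coding.
    return ("entropy", value)
```

  • With a threshold of 0.9, an element with sigma = 0.1 is replaced by its peak value, whereas an element with sigma = 5 is passed on to the entropy coder.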
  • The decoding process can include the following steps:
  • After the decoding end obtains the code stream, it determines the probability distribution of each element in the feature map based on the entropy estimation network.
  • The decoding end determines the probability peak P of each element within the value range based on the probability distribution.
  • The decoding end traverses each element and determines whether the peak probability P of the element is greater than the threshold P_th. If it is greater than the threshold P_th, the element is assigned the value corresponding to the probability peak P; if it is less than the threshold P_th, the element is entropy decoded based on the probability distribution.
  • The decoding end inputs the decoded feature map into the decoding network and obtains the reconstructed image.
  • The above entropy coding optimization method helps to speed up entropy encoding/decoding, but for each element in the feature map it either skips entropy encoding and entropy decoding altogether or performs the entire process of entropy encoding and entropy decoding in the traditional way. For elements that do undergo entropy encoding and entropy decoding, the entropy encoding and entropy decoding processes are not optimized; for elements that do not, writing the assigned elements directly into the code stream is not conducive to reducing the compression rate.
  • Another entropy coding optimization method is bypass coding (bypass).
  • The coding process of this method can include the following steps:
  • The encoding end determines the probability distribution of each element in the feature map, and determines for each element the value range that is not less than the lowest probability threshold. For example, assuming that the probability threshold is 95%, the value range of an element is [-64, 64], and the probability that the element takes a value less than or equal to 40 is 95%, then the value range of the element that is not less than the lowest probability threshold is -64 to 40.
  • The encoding end traverses each element and determines whether the value of the element is within the value range that is not less than the lowest probability threshold.
  • If the value is within that range, the element is entropy encoded according to its probability distribution. If the value is outside that range, a flag bit is set for the element, indicating that the element exceeds the value range that is not less than the lowest probability threshold, and the value of the element is written directly into the code stream without encoding.
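  • A rough sketch of the bypass decision described above, assuming the probability distribution is available as a dictionary mapping each possible value to its probability. The function names and the return convention are hypothetical and only illustrate the control flow.

```python
def bypass_upper_limit(pmf, threshold):
    """Smallest value v with P(X <= v) >= threshold.

    pmf maps each possible integer value to its probability.
    """
    cumulative = 0.0
    for value in sorted(pmf):
        cumulative += pmf[value]
        if cumulative >= threshold:
            return value
    return max(pmf)


def bypass_encode(value, limit):
    """Flag elements above the limit for raw writing; entropy code the rest."""
    if value > limit:
        # Out of the probable range: set a flag and write the raw value.
        return ("bypass", value)
    return ("entropy", value)


# Toy distribution; the probabilities are illustrative only.
pmf = {-1: 0.25, 0: 0.5, 1: 0.125, 2: 0.125}
limit = bypass_upper_limit(pmf, threshold=0.8)
```

  • In this toy distribution the cumulative probability first reaches the threshold at the value 1, so only the value 2 would be flagged and written without encoding.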
  • The decoding process of this method may include the following steps:
  • After the decoding end obtains the code stream, it determines the probability distribution of each element in the feature map based on the entropy estimation network.
  • The decoding end determines whether the flag bit is included in the information corresponding to each element in the code stream. If it is included, the value of the element is read directly; if it is not included, the information corresponding to the element is entropy decoded based on the probability distribution of the element.
  • Embodiments of the present application provide an encoding and decoding method that narrows the target range of element values by setting boundary values for elements. This simplifies the probability distribution of elements within the target range and thereby reduces the bits required to represent element probabilities.
  • This application can be applied to the process of encoding and compressing data such as images and videos, such as the data encoding and compression process in video surveillance, live broadcast, terminal recording, storage, transmission, cloud codec, cloud transcoding, video stream distribution and other services, and is especially suitable for AI-based compression scenarios.
  • Figure 3 exemplarily provides a schematic diagram of a scenario in which the encoding and decoding methods provided by the embodiments of the present application can be applied.
  • The monitoring device 301 (or the monitoring device 302) encodes the collected video information and uploads the code stream obtained after encoding to the cloud server 306. After receiving a request from the terminal device 303 (or the terminal device 304 or the terminal device 305) to obtain the code stream, the cloud server 306 can send the code stream to the terminal device 303, and the terminal device 303 decodes the obtained code stream to play the video.
  • The cloud server 306 may also have decoding and/or encoding capabilities. For example, the cloud server 306 may decode the obtained code stream, then process the video, and then encode the processed video for subsequent sending to other terminal devices.
  • The encoding end can be a computing device, or can be implemented by multiple computing devices, where the computing device is a device with an encoding function; it can be a server, such as a cloud server, or a terminal device, such as a monitoring device or a terminal device for live broadcast.
  • the encoding method shown in Figure 4 may include the following steps:
  • Step 401 The encoding end inputs the image into the encoding network to obtain a feature map of the image.
  • the feature map includes multiple elements.
  • the encoding end inputs the obtained image to be compressed into the encoding network, which is used to convert the image into a feature representation of another space and output a feature map.
  • the encoding network can be understood as a functional module, which can be composed of convolution, activation (such as relu, leaky_relu, etc.), up and down sampling, etc.
  • Figure 5 exemplarily provides an encoding and decoding network that can be applied to the embodiment of the present application.
  • The encoding network on the encoding side can be formed by alternating convolution (conv) and generalized divisive normalization (GDN).
  • The convolution function conv M*5*5/2 shown in Figure 5 represents a convolution with M channels, a 5*5 convolution kernel and 1/2 downsampling; GDN is a kind of activation function. It should be understood that Figure 5 is only an example, and other coding networks that can achieve similar functions can also be used in actual applications.
  • The feature map output by the encoding network consists of multiple elements. The feature map can also be a three-dimensional matrix of size M*N*C, that is, the three-dimensional matrix includes M*N*C elements, and the value of each element is the feature value corresponding to that element.
  • Step 402 The encoding end determines the boundary value of the first element in the feature map, where the first element is any one of the above multiple elements.
  • Common element value ranges include [-64, 64], [-128, 128], etc. Because the value range is large, the probability distribution of element values is complex, which increases the complexity of entropy coding and slows it down. To reduce the complexity of entropy coding and improve its speed, the embodiment of the present application further sets an element-level boundary on top of the original value range. The boundary determines the target range of values used when entropy coding the element, so that subsequent entropy coding is performed according to the probability distribution within the target range. The boundary applies only to its corresponding element, not to every element in the feature map. In one optional case, each of the multiple elements in the feature map has its own boundary value; in another optional case, only some of the multiple elements have their own boundary values.
  • Element-level boundaries can be set to eliminate element values with low probability, so that subsequent entropy coding is performed based on the value probability distribution within the target range determined by the boundary.
  • For example, the original value range of the first element is [-64, 64], but the probability that the first element is greater than 40 is only 0.001, and the probability that it is less than -40 is only 0.001. Since the probability that the first element is greater than 40 or less than -40 is very low, -40 and 40 can be set as boundaries, giving the target range [-40, 40]. Entropy coding is then performed on the first element according to the probability distribution within the target range, which reduces the complexity of entropy coding and thus improves the entropy coding speed.
  • The boundary value of the first element may include an upper boundary and/or a lower boundary.
  • When the boundary of the first element includes the upper boundary r, the target range corresponding to the first element can be [-64, r], where r ≤ 64; when the boundary of the first element includes the lower boundary l, the target range corresponding to the first element can be [l, 64], where l ≥ -64; when the boundary of the first element includes both the upper boundary r and the lower boundary l, the target range corresponding to the first element is [l, r], where l ≥ -64 and r ≤ 64.
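  • The three cases above can be sketched with a small helper. The function names and the default original range of [-64, 64] are assumptions made for illustration.

```python
def target_range(lower=None, upper=None, full_range=(-64, 64)):
    """Combine optional element-level boundaries with the original value range."""
    lo, hi = full_range
    if lower is not None:
        lo = max(lo, lower)   # a lower boundary can only raise the floor
    if upper is not None:
        hi = min(hi, upper)   # an upper boundary can only lower the ceiling
    if lo > hi:
        raise ValueError("lower boundary exceeds upper boundary")
    return lo, hi


def in_target_range(value, bounds):
    """True if value lies inside the closed interval bounds = (lo, hi)."""
    return bounds[0] <= value <= bounds[1]
```

  • For instance, passing only an upper boundary of 40 leaves the lower end at -64, and passing both -40 and 40 yields the range [-40, 40] used in the example earlier in this section.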
  • When the encoding end determines the boundary value of the first element in the feature map, it can determine the boundary value based on the probability distribution of the first element; alternatively, a network can be trained in advance, and the trained network outputs the boundary value of the first element.
  • The encoding end can first input the feature map output by the encoding network into the entropy estimation network, which outputs a probability distribution model of the first element. The encoding end then determines the boundary value of the first element based on this probability distribution model.
  • The probability distribution model of the first element can be represented by the mean μ and variance σ, that is, the entropy estimation network can output the corresponding mean μ and variance σ, and the encoding end can determine the boundary value of the first element based on the variance. For example, if the mean μ is 0, the encoding end can determine kσ as the upper boundary value, and/or determine -kσ as the lower boundary value, where k is a positive constant and σ is the variance.
  • Under a Gaussian distribution with mean 0, the probability of a value in the range [-σ, σ] is about 68%, in the range [-2σ, 2σ] about 95%, and in the range [-3σ, 3σ] about 99%; the larger the value of k, the greater the probability that a value falls within the target range. It can be seen that by setting the k value, values with very small probability can be eliminated.
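  • The coverage figures quoted above can be checked numerically. The sketch below assumes σ acts as the standard deviation of a Gaussian, and the helper names are hypothetical; the general mean in k_sigma_boundaries is an extension of the text's mean-0 example.

```python
import math


def gaussian_coverage(k):
    """Probability that a Gaussian variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))


def k_sigma_boundaries(mu, sigma, k):
    """Lower and upper boundary values mu - k*sigma and mu + k*sigma."""
    return mu - k * sigma, mu + k * sigma


for k in (1, 2, 3, 5):
    print(k, round(gaussian_coverage(k), 6))
```

  • The loop prints the coverage for k = 1, 2, 3 and 5, which makes it easy to see how quickly the excluded tail probability shrinks as k grows.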
  • Figure 6 exemplarily provides an entropy estimation network applicable to the embodiment of the present application. The entropy estimation network on the encoding side can be formed by alternating convolution (conv) and activation (relu).
  • Alternatively, a network determines the boundary value of the first element. The encoding end may input the feature map to the network for determining the boundary value, so that the network outputs the boundary value of the first element.
  • For example, the entropy estimation network can be trained on boundary values, so that it outputs not only the probability distribution model of the first element but also the boundary value of the first element.
  • As another example, other networks may output the boundary value of the first element based on the feature map, or based on the probability distribution model of the first element.
  • the encoding end can use any of the above implementation methods to traverse each element in the feature map and determine the boundary value corresponding to each element.
  • Step 403 The encoding end determines whether the first element is within the target range corresponding to the first element.
  • The target range corresponding to the first element is the range determined based on the boundary value of the first element.
  • The first element may also be quantized; accordingly, when step 403 is performed, it is determined whether the quantized first element is within the target range corresponding to the first element.
  • Quantization can round the value of the first element, and the quantization formula can be $\hat{y} = \mathrm{round}(y)$, where $y$ represents the value of the first element before quantization and $\hat{y}$ represents the quantized value of the first element.
  • Quantization can also perform residual quantization on the value of the first element, that is, round the residual between the value and the mean. The quantization formula can use, for example, $\hat{y} = \mathrm{round}(y - \mu)$, where $y$ represents the value of the first element before quantization, $\mu$ represents the mean of the probability distribution of the value of the first element, and $\hat{y}$ represents the quantized value of the first element.
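  • The two quantization options can be sketched as below. This is an illustration under assumptions: the residual form round(y - mu) is one plausible reading of residual quantization, ties are rounded half up (Python's built-in round uses banker's rounding, so it is avoided here), and the function names are hypothetical.

```python
import math


def round_half_up(x):
    """Round to the nearest integer, with exact .5 cases rounded upward."""
    return math.floor(x + 0.5)


def quantize(y):
    """Plain rounding quantization of an element value."""
    return round_half_up(y)


def quantize_residual(y, mu):
    """Residual quantization: round the difference between the value and the mean.

    Assumption: the quantized symbol is the rounded residual itself.
    """
    return round_half_up(y - mu)
```

  • Quantizing the residual centers the symbols around zero, which suits a probability model described by its mean.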
  • If the first element is within the target range corresponding to the first element, step 404a is executed; if the first element is outside the target range corresponding to the first element, step 404b is executed.
  • Step 404a If the first element is within the target range corresponding to the first element, the encoding end performs entropy coding on the first element.
  • If the first element is located within the target range corresponding to the first element, the value of the first element can be considered not to be a value with a very small probability. In this case, the first element can be entropy encoded based on the probability distribution of its value within the target range. If the first element is quantized, entropy coding is performed on the quantized first element.
  • For example, suppose the value probability of the quantized element $s_1$ is $p_G(s)$ and the value of the quantized element $s_1$ is within the target range $[l, r]$; the quantized element $s_1$ is then entropy encoded according to the probability distribution of its value in the target range $[l, r]$. Specifically, the probability that the value of element $s_1$ is less than the upper boundary $r$ of the target range is $c_G(r)$, and the probability that the value of element $s_1$ is less than the lower boundary $l$ of the target range is $c_G(l)$. The probability of the value of the quantized element $s_1$ within the target range is then $p_{LG}(s) = \frac{p_G(s)}{c_G(r) - c_G(l)}$, and the quantized element $s_1$ is entropy coded according to the probability $p_{LG}(s)$.
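  • A small numerical sketch of renormalizing a Gaussian to the target range, in the spirit of the description above. It is not the application's implementation: sigma is treated as a standard deviation, each integer value is assumed to own a bin of width 1, and the normalizer is taken over the bins from l to r so that the truncated probabilities sum to one. All names are hypothetical.

```python
import math


def gaussian_cdf(x, mu, sigma):
    """CDF of a Gaussian with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def bin_mass(s, mu, sigma):
    """Untruncated probability of the integer value s (bin [s - 0.5, s + 0.5])."""
    return gaussian_cdf(s + 0.5, mu, sigma) - gaussian_cdf(s - 0.5, mu, sigma)


def truncated_pmf(mu, sigma, l, r):
    """Probabilities of the integers in [l, r], renormalized to that range."""
    norm = gaussian_cdf(r + 0.5, mu, sigma) - gaussian_cdf(l - 0.5, mu, sigma)
    return {s: bin_mass(s, mu, sigma) / norm for s in range(l, r + 1)}


pmf = truncated_pmf(mu=0.0, sigma=10.0, l=-40, r=40)
```

  • Only the 81 values in [-40, 40] need a probability entry here, instead of the 129 values of the original range [-64, 64], which is the source of the reduced entropy coding complexity.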
  • The probability distribution of the value of the first element can be determined based on the entropy estimation network, that is, the encoding end inputs the feature map into the trained entropy estimation network, and the entropy estimation network outputs the probability distribution model of the first element. For example, if the first element obeys a Gaussian distribution (also known as a normal distribution), the entropy estimation network can output the corresponding mean μ and variance σ, and the encoding end can determine the probability of each possible value based on the mean μ and variance σ.
  • The encoding end can determine the probability corresponding to any value in [-64, 64] based on the mean μ and variance σ corresponding to the first element.
  • The encoding end determines the probability of the actual value of the first element (or of the quantized first element, if the first element is quantized) and the probability that this value is within the target range, and then entropy codes the first element based on the probability of its actual value within the target range.
  • Step 404b If the first element is outside the target range corresponding to the first element, the encoding end modifies the first element to the boundary value corresponding to the first element, and performs entropy coding on the modified first element.
  • If the first element is outside the target range corresponding to the first element, the value of the first element can be considered to be a value with a very small probability. In this case, the value of the first element can be modified to the corresponding boundary value; the information loss is not obvious, but the complexity of entropy encoding the first element is reduced. In addition, since the probability of the value of the first element before modification is very small, quantizing this probability not only produces a large error but also reduces the probability of other values whose probability was originally higher. Modifying the first element to a boundary value is equivalent to pooling the probabilities of the values beyond the boundary; the quantization error produced when this pooled probability is quantized is smaller, which reduces the impact on high-probability values.
  • If the first element is greater than the upper boundary of the target range, that is, the upper boundary included in the boundary value determined in step 402, the value of the first element is modified to the upper boundary value; if the first element is smaller than the lower boundary of the target range, that is, the lower boundary included in the boundary value determined in step 402, the value of the first element is modified to the lower boundary value.
  • For example, the target range corresponding to the first element is [-40, 40]. If the value of the first element is 64, which is greater than the upper boundary of the target range, the value of the first element is modified to 40, and entropy encoding is then performed based on the probability distribution of the value 40 in [-40, 40]; if the value of the first element is -50, which is less than the lower boundary of the target range, the value of the first element is modified to -40, and entropy encoding is then performed based on the probability distribution of the value -40 in [-40, 40].
  • The encoding end can also determine the probability corresponding to each possible value of the first element through the entropy estimation network; after modifying the value of the first element to the corresponding boundary value, it determines the probability of the modified first element within the target range and entropy encodes the modified first element according to that probability.
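  • The modification in step 404b amounts to clamping the value to the target range before entropy coding. A minimal sketch, with a hypothetical function name:

```python
def clamp_to_target_range(value, lower, upper):
    """Replace an out-of-range value with the nearest boundary value."""
    return max(lower, min(upper, value))


# Values taken from the example in the text, with target range [-40, 40].
print(clamp_to_target_range(64, -40, 40))
print(clamp_to_target_range(-50, -40, 40))
print(clamp_to_target_range(7, -40, 40))
```

  • Values already inside the target range pass through unchanged, so the same call can be applied to every element before entropy coding.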
  • The obtained encoding information is included in the code stream and can be sent to the decoding end. The code stream may include the coding information obtained by coding each element in the feature map.
  • In the above method embodiment, the encoding end obtains the feature map of the image through the encoding network, determines the boundary value of each element in the feature map, and determines the target range corresponding to each element based on its boundary value. If the value of an element is within its corresponding target range, entropy coding is performed on the element; if the value of the element is outside its corresponding target range, the value of the element is modified to the boundary value corresponding to the element, and entropy coding is performed on the modified element. Because a boundary value is determined for each element, the target range corresponding to each element is narrowed. For elements within the target range, the complexity of the probability distribution within the target range is reduced, which reduces the complexity of entropy encoding and improves the entropy encoding speed. For elements that exceed the target range, the existing entropy encoding optimization method does not perform entropy encoding on the out-of-bounds elements but writes their values directly into the code stream, which is equivalent to using a special fixed-length code and occupies a larger number of bits. In the above method embodiments of this application, by contrast, elements outside the target range are still entropy encoded after their values are modified, so that information is compressed without obvious information loss, which improves compression performance and helps avoid the quantization errors caused by quantizing small probability values.
  • The decoding end can perform decoding according to the process shown in Figure 7.
  • The process shown in Figure 7 is executed by the decoding end, which can be a computing device or can be implemented by multiple computing devices.
  • the decoding method shown in Figure 7 may include the following steps:
  • Step 701 The decoding end obtains a code stream, which includes information encoded by multiple elements.
  • Step 702 The decoding end decodes the code stream according to the entropy estimation network to obtain the feature map of the image.
  • Figure 6 exemplarily provides an entropy estimation network that can be applied to the embodiment of the present application. The entropy estimation network at the decoding end can be formed by alternating depth convolution (dconv) and activation (relu).
  • Step 703 The decoder inputs the feature map to the decoding network to obtain the reconstructed image.
  • The decoding end can use the decoding network shown in (b) in Figure 5. The decoding network can be formed by alternating depth convolution (dconv) and GDN.
  • Encoding and decoding based on the processes shown in Figures 4 and 7 is superior to existing entropy encoding and decoding methods in terms of encoding and decoding speed and compression performance.
  • Below, the encoding and decoding methods provided by the above embodiments of the present application are compared with the encoding and decoding method shown in Figure 1 (referred to as the basic model baseline, or simply model 1) and the encoding and decoding method shown in Figure 2 (referred to as baseline+skip, or simply model 2).
  • Two sets of data were collected for the encoding and decoding methods of the embodiments of the present application: 1. the lower and upper boundary values are -5σ and 5σ respectively (referred to as limitG5, or model 3); 2. the lower and upper boundary values are -10σ and 10σ respectively (referred to as limitG10).
  • The comparison uses BD-Rate (Bjontegaard-Delta bitrate).
  • The BD-Rate of method A compared to method B indicates the difference in code rate between method A and method B under the same objective indicators. It is usually expressed as a percentage: a value of -x% means that method A saves x% of space compared to method B, and a positive value of x% means that method A uses x% more space.
  • the baseline is used as the comparison benchmark, and baseline+skip, limitG5, and limitG10 are compared with the baseline respectively.
  • the BD-Rate of baseline+skip is -4.17%, which means that baseline+skip saves 4.17% of space compared to baseline;
  • the BD-Rate of limitG10 is -4.91%, which means that limitG10 saves 4.91% of space compared to baseline;
  • the BD-Rate of limitG5 is -4.95%, which means that limitG5 saves 4.95% of space compared to baseline.
  • In addition, the bypass method is included in the comparison as baseline+bypass5 (or simply model 4) and baseline+bypass10 (or model 5 for short), together with limitG5.
  • Each method uses 8 bits, 10 bits, 12 bits and 14 bits to quantize the probability.
  • In the encoding-time comparison, the abscissa represents encoding time and the ordinate represents BD-Rate; the five curves correspond to baseline, baseline+bypass5, baseline+bypass10, baseline+skip and limitG5 from top to bottom.
  • limitG5 has the shortest encoding time and the lowest BD-Rate. When 10 bits are used to quantize the probability (that is, the second point on each curve; the second point on the baseline curve is not displayed because its BD-Rate is too high), limitG5 again has the shortest encoding time and the lowest BD-Rate; when 12 bits and 14 bits are used to quantize the probability, limitG5 also has the shortest encoding time and the lowest BD-Rate.
  • In the decoding-time comparison, the abscissa represents decoding time and the ordinate represents BD-Rate.
  • limitG5 has the shortest decoding time and the lowest BD-Rate. It can therefore be seen that the encoding and decoding methods provided by the embodiments of the present application are superior, in terms of decoding speed, to the encoding and decoding methods shown in Figures 1 and 2 and to the bypass method.
  • Embodiments of the present application also provide an encoding and decoding method, which can also reduce the complexity of entropy encoding and entropy decoding algorithms, thereby significantly reducing the time consuming of entropy encoding and entropy decoding.
  • This method can also be applied to the process of encoding and compressing data such as images and videos, such as the data encoding and compression process in video surveillance, live broadcast, terminal recording, storage, transmission and other services. It is especially suitable for AI-based compression scenarios.
  • FIG 10 is a schematic flow chart of another encoding method provided by an embodiment of the present application.
  • the process shown in Figure 10 is executed by the encoding end.
  • the encoding end can be a computing device or can be implemented by multiple computing devices.
  • the encoding method shown in Figure 10 may include the following steps:
  • Step 1001 The encoding end inputs the image into the encoding network to obtain a feature map of the image.
  • the feature map includes multiple elements.
  • This step is similar to step 401 in the above embodiment.
  • the encoding network as shown in Figure 5(a) can also be used to obtain the feature map of the image.
  • Figure 5 is only an example, and other coding networks that can achieve similar functions can also be used.
  • Step 1002 The encoding end determines the boundary value of the first element in the feature map, where the first element is any one of the above multiple elements.
  • The determined boundary value of the first element may include an upper boundary value, a lower boundary value, or both an upper boundary value and a lower boundary value.
  • The encoding end can determine the boundary value of the first element according to the probability distribution of the first element, or a network can determine the boundary value of the first element.
  • the encoding end can traverse each element in the feature map and determine the boundary value corresponding to each element.
  • Step 1003 The encoding end determines whether the first element is within the target range corresponding to the first element.
  • The encoding end can determine the target range corresponding to the first element based on the boundary value.
  • For example, the original value range of the first element is [-64, 64]. When the boundary value includes the upper boundary r, the target range corresponding to the first element can be [-64, r], where r ≤ 64; when the boundary value includes the lower boundary l, the target range corresponding to the first element can be [l, 64], where l ≥ -64; when the boundary value includes both the upper boundary r and the lower boundary l, the target range corresponding to the first element is [l, r], where l ≥ -64 and r ≤ 64.
  • If the first element is within the target range corresponding to the first element, step 1004a is executed; if the first element is outside the target range corresponding to the first element, step 1004b is executed.
  • Step 1004a If the first element is within the target range corresponding to the first element, the encoding end performs entropy coding on the first element.
  • If the first element is located within the target range corresponding to the first element, the value of the first element can be considered not to be a value with a very small probability. In this case, the first element can be entropy encoded based on the probability distribution of its value within the target range. If the first element is quantized, entropy coding is performed on the quantized first element. For details, please refer to any implementation of step 404a in the aforementioned embodiments.
  • Step 1004b If the first element is outside the target range corresponding to the first element, the encoding end marks the first element as an out-of-bounds element and performs variable length code encoding on the first element.
  • If the first element is outside the target range corresponding to the first element, the value of the first element can be considered to be a value with a very small probability. In this case, the first element need not be entropy encoded; instead, it can be encoded using a variable length code.
  • For example, the first element can be encoded using variable length code encoding methods such as the Golomb code, Golomb-Rice code and exponential Golomb code.
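  • As an illustration of one of the variable length codes named above, the following sketch implements the order-0 exponential Golomb code on bit strings. The signed-to-unsigned mapping and the function names are assumptions chosen for the example; the text does not specify which code order or mapping is used.

```python
def exp_golomb_encode(n):
    """Order-0 exponential Golomb codeword of a non-negative integer, as a bit string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    binary = bin(n + 1)[2:]
    # Prefix of (len - 1) zeros, followed by the binary form of n + 1.
    return "0" * (len(binary) - 1) + binary


def exp_golomb_decode(bits):
    """Decode one codeword from the front of bits; return (value, remaining bits)."""
    zeros = 0
    while bits[zeros] == "0":
        zeros += 1
    end = 2 * zeros + 1
    return int(bits[zeros:end], 2) - 1, bits[end:]


def signed_to_unsigned(v):
    """Map 0, 1, -1, 2, -2, ... to 0, 1, 2, 3, 4, ... so signed values can be coded."""
    return 2 * v - 1 if v > 0 else -2 * v


def unsigned_to_signed(n):
    """Inverse of signed_to_unsigned."""
    return (n + 1) // 2 if n % 2 else -(n // 2)
```

  • Small magnitudes receive short codewords and large magnitudes longer ones, which matches a single-peaked distribution in which far-out values are rare.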
  • When the encoding end marks the first element as an out-of-bounds element, it needs to encode flag information indicating that the first element is an out-of-bounds element into the code stream, so that the decoding end can determine from the out-of-bounds flag information that the first element is an out-of-bounds element.
  • For example, the encoding end can set an out-of-bounds bit for an out-of-bounds element to indicate that the element is out of bounds; when the decoding end parses the out-of-bounds bit from the code stream, it can determine that the element is an out-of-bounds element. Alternatively, the encoding end can set a flag bit for each element, using "0" and "1" to indicate not out of bounds and out of bounds respectively.
  • The encoding end can also modify the value of the first element to a preset out-of-bounds value and apply entropy encoding or another form of encoding to it, so that the decoding end can determine that the element is an out-of-bounds element after decoding and obtaining the preset out-of-bounds value. For example, assume that the target range corresponding to the first element is [-10, 10]. If the value of the first element is greater than 10, it is modified to 11, indicating that the first element is greater than the upper boundary; if the value of the first element is less than -10, it is modified to -11, indicating that the first element is smaller than the lower boundary.
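  • The preset out-of-bounds value approach can be sketched as a pair of helpers, one for each end. The names are hypothetical, and the preset values are assumed to sit one step beyond each boundary, as in the [-10, 10] example.

```python
def mark_out_of_bounds(value, lower, upper):
    """Encoder side: map out-of-range values to a preset symbol just past the boundary."""
    if value > upper:
        return upper + 1
    if value < lower:
        return lower - 1
    return value


def classify_decoded(symbol, lower, upper):
    """Decoder side: recognize the preset out-of-bounds symbols."""
    if symbol == upper + 1:
        return "above_upper"
    if symbol == lower - 1:
        return "below_lower"
    return "in_range"
```

  • The decoder needs no separate flag bit in this variant, because the two extra symbols themselves signal that the real value follows in variable length code form.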
  • the above-mentioned code stream includes encoding information obtained by entropy encoding the first element, or encoding information obtained by performing variable length code encoding on the first element.
  • the code stream may also include code stream information obtained by entropy encoding other non-boundary elements, and coding information obtained by variable length code encoding of other out-of-bounds elements.
  • The encoding end performs variable length code encoding on the quantized first element.
  • Performing variable length code encoding on the out-of-bounds elements not only retains more feature information of the out-of-bounds elements but also compresses their information.
  • When the probability distribution curve is single-peaked (such as the Gaussian distribution model mentioned in the embodiments of this application), variable-length code encoding achieves higher compression performance than traditional fixed-length code encoding, and the encoded information occupies fewer bits.
  • The difference between the first element and the boundary can also be encoded. For example, if the first element is greater than the upper boundary value, the first difference between the first element and the upper boundary value is determined, and the first difference is encoded with a variable length code; if the first element is less than the lower boundary value, the second difference between the first element and the lower boundary value is determined, and the second difference is encoded with a variable length code. Since the difference is usually much smaller in magnitude than the value of the first element and can be represented with fewer bits, variable-length code encoding of the difference between the first element and the boundary can further improve compression performance.
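  • Encoding the difference from the crossed boundary can be sketched as follows. The application does not name a specific variable length code; order-0 Exp-Golomb with a signed mapping is assumed here purely for illustration, and all function names are hypothetical.

```python
def exp_golomb_unsigned(n: int) -> str:
    """Order-0 Exp-Golomb code word for a non-negative integer, as a bit string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    binary = bin(n + 1)[2:]                      # binary form of n + 1
    return "0" * (len(binary) - 1) + binary      # zero prefix, then the binary form


def exp_golomb_signed(d: int) -> str:
    """Signed variant: positive d maps to 2d - 1, non-positive d maps to -2d."""
    return exp_golomb_unsigned(2 * d - 1 if d > 0 else -2 * d)


def encode_out_of_bounds(value: int, lower: int, upper: int) -> str:
    """Encode the signed difference between an out-of-bounds value and the boundary it crossed."""
    if value > upper:
        diff = value - upper                     # first difference, positive
    elif value < lower:
        diff = value - lower                     # second difference, negative
    else:
        raise ValueError("value is inside the target range")
    return exp_golomb_signed(diff)


print(encode_out_of_bounds(12, -10, 10))   # difference +2
print(encode_out_of_bounds(-11, -10, 10))  # difference -1
```

  • Small differences get short code words, which is why coding the overshoot rather than the raw value tends to cost fewer bits when most out-of-bounds elements lie just beyond the boundary.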
  • the decoding end can perform decoding according to the process shown in Figure 11.
  • the process shown in Figure 11 is executed by the decoding end, which can be a computing device or can be implemented by multiple computing devices.
  • the decoding method shown in Figure 11 may include the following steps:
  • Step 1101: The decoding end obtains a code stream, which includes information encoded from multiple elements.
  • Step 1102: The decoding end determines whether the first element in the code stream is an out-of-bounds element, where the first element is any one of the above multiple elements.
  • The decoding end determines whether the first element is an out-of-bounds element, that is, whether the first element is within the target range corresponding to the first element. If it is within the target range, the first element is not an out-of-bounds element; if it is outside the target range, the first element is an out-of-bounds element.
  • The decoding end can also determine whether the first element is out of bounds based on the out-of-bounds flag information when decoding. For example, the encoding end can set an out-of-bounds bit for an out-of-bounds element to indicate that the element is out of bounds; when the decoding end parses the out-of-bounds bit from the code stream, it can determine that the element is an out-of-bounds element. Alternatively, the encoding end can set a flag bit for each element, using "0" and "1" to indicate not out of bounds and out of bounds respectively, and the decoding end determines whether each element is out of bounds based on its flag bit.
  • For another example, if the encoding end modified the value of an out-of-bounds first element to a preset out-of-bounds value, the decoding end first determines the boundary value of the first element during decoding; if the decoded value of the first element is an out-of-bounds value, the decoding end determines that the first element is out of bounds. Assume that the upper boundary value and lower boundary value corresponding to the first element are 10 and -10 respectively. If the decoded value of the first element is 11, the first element is greater than the upper boundary; if the decoded value of the first element is -11, the first element is less than the lower boundary.
  • If the decoding end determines that the first element is not out of bounds, step 1103a is executed; if the decoding end determines that the first element is out of bounds, step 1103b is executed.
  • Step 1103a: If the first element is not an out-of-bounds element, the decoding end performs entropy decoding on the encoded information of the first element to obtain the first element.
  • The decoding end performs entropy decoding on the encoded information. For example, the decoding end can entropy decode the encoded information based on the entropy estimation network to obtain the value of the first element. If the encoding end uses the entropy estimation network shown in (a) in Figure 6 for encoding, then the decoding end can use the entropy estimation network shown in (b) in Figure 6 for decoding.
  • Step 1103b: If the first element is an out-of-bounds element, the decoding end performs variable length code decoding on the encoded information of the first element to obtain the first element.
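  • The branch between steps 1103a and 1103b can be sketched as a simple dispatch. This sketch assumes the code stream has already been split into per-element records of the form (out-of-bounds flag, payload), which a real arithmetic-coded stream generally is not; `decode_elements` and the two decoder callables are hypothetical placeholders, not an API from the application.

```python
from typing import Callable, List, Sequence, Tuple


def decode_elements(
    records: Sequence[Tuple[bool, bytes]],
    entropy_decode: Callable[[bytes], int],
    vlc_decode: Callable[[bytes], int],
) -> List[int]:
    """Dispatch each element to the decoder selected by its out-of-bounds flag."""
    elements = []
    for out_of_bounds, payload in records:
        if out_of_bounds:
            elements.append(vlc_decode(payload))       # step 1103b
        else:
            elements.append(entropy_decode(payload))   # step 1103a
    return elements


# Toy stand-ins for the two decoders, used only to exercise the dispatch.
toy_records = [(False, b"\x03"), (True, b"\x0c")]
print(decode_elements(toy_records, lambda p: p[0], lambda p: p[0] + 100))
```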
  • If the encoding end encoded the difference between the first element and the boundary value with a variable length code when encoding the out-of-bounds element, then during decoding the decoding end can first determine the boundary value of the first element, perform variable length code decoding on the encoded information of the first element to obtain the difference, and then determine the value of the first element based on the boundary value of the first element and the decoded difference.
  • The boundary of the first element includes the upper boundary and the lower boundary. If the decoded difference is positive, it can be taken as the difference obtained by subtracting the upper boundary from the first element; if the decoded difference is negative, it can be taken as the difference obtained by subtracting the lower boundary from the first element.
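  • The sign rule above can be sketched as follows, again assuming (hypothetically) that the difference was written as a signed order-0 Exp-Golomb code word; the application itself does not fix the variable length code, and the function names are illustrative only.

```python
def read_exp_golomb_unsigned(bits: str, pos: int = 0) -> tuple:
    """Read one order-0 Exp-Golomb code word; return (value, next position)."""
    zeros = 0
    while bits[pos + zeros] == "0":      # count the zero prefix
        zeros += 1
    end = pos + 2 * zeros + 1            # prefix + (zeros + 1) payload bits
    return int(bits[pos + zeros:end], 2) - 1, end


def to_signed(code_num: int) -> int:
    """Invert the mapping d > 0 -> 2d - 1, d <= 0 -> -2d."""
    return (code_num + 1) // 2 if code_num % 2 else -(code_num // 2)


def reconstruct(bits: str, lower: int, upper: int) -> int:
    """Rebuild an out-of-bounds element from its coded difference and its boundaries."""
    code_num, _ = read_exp_golomb_unsigned(bits)
    diff = to_signed(code_num)
    # Positive difference: measured from the upper boundary.
    # Negative difference: measured from the lower boundary.
    return upper + diff if diff > 0 else lower + diff


print(reconstruct("00100", -10, 10))  # difference +2 above the upper boundary
print(reconstruct("011", -10, 10))    # difference -1 below the lower boundary
```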
  • When determining the boundary value of the first element, the decoding end can input the code stream into the entropy estimation network to obtain the probability distribution of the first element, and then determine the boundary value of the first element based on that probability distribution.
  • Alternatively, the decoding end can obtain the boundary value of the first element from a trained network. For example, the decoding end can input the code stream into the entropy estimation network, and the entropy estimation network outputs the boundary value of the first element.
  • the decoding end can perform a reverse operation according to the code stream to obtain the boundary value of the first element.
  • The decoding end can obtain the feature map of the image after executing the above step 1103a or step 1103b for each element, and can then input the feature map into the decoding network to obtain the reconstructed image. If the encoding end uses the encoding network shown in (a) in Figure 5, then the decoding end can use the decoding network shown in (b) in Figure 5.
  • The encoding end obtains the feature map of the image through the encoding network, determines the boundary value of each element in the feature map, and determines the target range corresponding to each element based on its boundary value. If the value of an element is within its corresponding target range, entropy encoding is performed on the element; if the value of the element is outside its corresponding target range, the element is marked as an out-of-bounds element and encoded with a variable length code.
  • When the decoding end determines from the obtained code stream that the first element is not an out-of-bounds element, it performs entropy decoding on the encoded information of the first element; when it determines that the first element is an out-of-bounds element, it performs variable length code decoding on the encoded information of the first element.
  • Because a boundary value is determined for each element in the above method, the target range corresponding to each element is narrowed. A narrower target range reduces the complexity of the element's probability distribution within the range, which reduces the complexity of entropy encoding and increases its speed.
  • Performing variable length code encoding on out-of-bounds elements helps to improve compression performance. In particular, performing variable-length code encoding on the difference between an out-of-bounds element and the boundary further reduces the number of bits required after encoding, further improving compression performance.
  • Embodiments of the present application also provide an encoding device, used to implement the functions of the encoding end in the above method embodiments.
  • the device may include modules/units that execute any of the possible implementation methods in the above method embodiments; these modules/units may be implemented by hardware, or may be implemented by hardware executing corresponding software.
  • The device may include: an encoding network module 1201, a determination module 1202, a judgment module 1203, and an encoding module 1204.
  • The encoding network module 1201 is used to input an image into the encoding network and obtain a feature map of the image, where the feature map includes multiple elements.
  • The determination module 1202 is configured to determine the boundary value of a first element in the feature map, where the first element is any one of the plurality of elements.
  • The judgment module 1203 is used to determine whether the first element is within a target range corresponding to the first element, where the target range is determined based on the boundary value of the first element.
  • The encoding module 1204 is configured to: perform entropy encoding on the first element when the first element is within the target range corresponding to the first element; and, when the first element is outside the target range corresponding to the first element, modify the first element to the boundary value corresponding to the first element and perform entropy encoding on the modified first element.
  • The boundary value includes an upper boundary value and/or a lower boundary value. When modifying the first element to the boundary value corresponding to the first element, the encoding module 1204 is specifically configured to: if the first element is greater than the upper boundary value, modify the first element to the upper boundary value; or, if the first element is less than the lower boundary value, modify the first element to the lower boundary value.
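  • The modification performed by the encoding module can be sketched as a clamp in which either boundary is optional, matching the "and/or" wording above. The function name `clamp_to_boundary` is hypothetical and used only for illustration.

```python
from typing import Optional


def clamp_to_boundary(value: float,
                      lower: Optional[float] = None,
                      upper: Optional[float] = None) -> float:
    """Replace an out-of-range value with the boundary it crossed.

    A boundary passed as None is treated as not set, so only the
    configured side(s) are compared.
    """
    if upper is not None and value > upper:
        return upper
    if lower is not None and value < lower:
        return lower
    return value


print(clamp_to_boundary(25, lower=-10, upper=10))   # clamped to the upper boundary
print(clamp_to_boundary(-40, lower=-10, upper=10))  # clamped to the lower boundary
print(clamp_to_boundary(25, lower=-10))             # no upper boundary set, unchanged
```

  • Unlike the marking approach, the clamp is lossy for out-of-range elements: the decoding end recovers the boundary value rather than the original value.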
  • When determining the boundary value of the first element in the feature map, the determination module 1202 is specifically configured to: input the feature map into an entropy estimation network, which outputs a probability distribution model of the first element in the feature map; and determine the boundary value of the first element according to the probability distribution model of the first element.
  • When determining the boundary value of the first element according to the probability distribution model of the first element, the determination module 1202 is specifically configured to: determine the boundary value of the first element according to the variance of the probability distribution model of the first element.
  • The boundary value includes an upper boundary value and/or a lower boundary value; the upper boundary value is k*σ, and/or the lower boundary value is -k*σ, where k is a constant and σ represents the variance of the probability distribution model.
  • When determining the boundary value of the first element in the feature map, the determination module 1202 is specifically configured to: input the feature map into an entropy estimation network, which outputs the probability distribution model of the first element and the boundary value of the first element.
  • Embodiments of the present application also provide an encoding device, used to implement the functions of the encoding end in the above method embodiments.
  • the device may include modules/units that execute any of the possible implementation methods in the above method embodiments; these modules/units may be implemented by hardware, or may be implemented by hardware executing corresponding software.
  • The device may include: an encoding network module 1301, a determination module 1302, a judgment module 1303, and an encoding module 1304.
  • The encoding network module 1301 is used to input an image into the encoding network and obtain a feature map of the image, where the feature map includes multiple elements.
  • The determination module 1302 is configured to determine the boundary value of a first element in the feature map, where the first element is any one of the plurality of elements.
  • The judgment module 1303 is used to determine whether the first element is within a target range corresponding to the first element, where the target range is determined based on the boundary value of the first element.
  • The encoding module 1304 is configured to: perform entropy encoding on the first element when the first element is within the target range corresponding to the first element; and, when the first element is outside the target range corresponding to the first element, mark the first element as an out-of-bounds element and perform variable length code encoding on the first element.
  • When marking the first element as an out-of-bounds element, the encoding module 1304 is specifically configured to: encode flag information indicating that the first element is an out-of-bounds element into the code stream.
  • The boundary value includes an upper boundary value and/or a lower boundary value. When encoding the first element with a variable length code, the encoding module 1304 is specifically configured to: if the first element is greater than the upper boundary value, determine the first difference between the first element and the upper boundary value and perform variable length code encoding on the first difference; or, if the first element is less than the lower boundary value, determine the second difference between the first element and the lower boundary value and perform variable length code encoding on the second difference.
  • When determining the boundary value of the first element in the feature map, the determination module 1302 is specifically configured to: input the feature map into an entropy estimation network, which outputs a probability distribution model of the first element in the feature map; and determine the boundary value of the first element according to the probability distribution model of the first element.
  • When determining the boundary value of the first element according to the probability distribution model of the first element, the determination module 1302 is specifically configured to: determine the boundary value of the first element according to the variance of the probability distribution model of the first element.
  • The boundary value includes an upper boundary value and/or a lower boundary value; the upper boundary value is k*σ, and/or the lower boundary value is -k*σ, where k is a constant and σ represents the variance of the probability distribution model.
  • When determining the boundary value of the first element in the feature map, the determination module 1302 is specifically configured to: input the feature map into an entropy estimation network, which outputs the probability distribution model of the first element and the boundary value of the first element.
  • Embodiments of the present application also provide a decoding device, used to implement the functions of the decoding end in the above method embodiment.
  • the device may include modules/units that execute any of the possible implementation methods in the above method embodiments; these modules/units may be implemented by hardware, or may be implemented by hardware executing corresponding software.
  • the device may include: an acquisition module 1401, a determination module 1402, and a decoding module 1403.
  • the acquisition module 1401 is used to acquire a code stream, where the code stream includes information encoded by multiple elements.
  • The determination module 1402 is used to determine whether the first element in the code stream is an out-of-bounds element, where the first element is any one of the plurality of elements.
  • The decoding module 1403 is configured to: perform variable length code decoding on the encoded information of the first element to obtain the first element when the first element is an out-of-bounds element; and perform entropy decoding on the encoded information of the first element to obtain the first element when the first element is not an out-of-bounds element.
  • When determining whether the first element in the code stream is an out-of-bounds element, the determination module 1402 is specifically configured to: if the code stream includes out-of-bounds flag information of the first element, determine that the first element is an out-of-bounds element.
  • When performing variable length code decoding on the encoded information of the first element to obtain the first element, the decoding module 1403 is specifically configured to: determine the boundary value of the first element; perform variable length code decoding on the encoded information of the first element to obtain a difference, where the difference is the difference between the first element and the upper boundary value, or the difference between the first element and the lower boundary value; and determine the first element based on the boundary value and the difference.
  • When determining the boundary value of the first element, the decoding module 1403 is specifically configured to: determine the probability distribution model of the first element; and determine the boundary value of the first element according to the probability distribution model of the first element.
  • When determining the boundary value of the first element according to the probability distribution model of the first element, the decoding module 1403 is specifically configured to: determine the boundary value of the first element according to the variance of the probability distribution model of the first element.
  • The boundary value includes an upper boundary value and/or a lower boundary value; the upper boundary value is k*σ, and/or the lower boundary value is -k*σ, where k is a constant and σ represents the variance of the probability distribution model.
  • When determining the boundary value of the first element, the decoding module 1403 is specifically configured to: input the code stream into an entropy estimation network, which outputs the boundary value of the first element.
  • An embodiment of the present application also provides a computer device.
  • As shown in Figure 15, the computer device includes a processor 1501 and a memory 1502 connected to the processor 1501. The computer device may further include a communication interface 1503 and a communication bus 1504.
  • The processor 1501 may be a general-purpose processor, a microprocessor, an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or one or more integrated circuits used to control the execution of the program of this application.
  • a general-purpose processor may be a microprocessor or any conventional processor, etc.
  • the memory 1502 is used to store program instructions and/or data, so that the processor 1501 calls the instructions and/or data stored in the memory 1502 to implement the above functions of the processor 1501.
  • The memory 1502 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory 1502 may exist independently, such as an off-chip memory, and is connected to the processor 1501 through the communication bus 1504. Memory 1502 may also be integrated with processor 1501.
  • The memory 1502 may include internal memory and external memory (such as a hard disk).
  • The communication interface 1503 is used to communicate with other devices, for example through a PCI bus interface, a network card, a radio access network (RAN), or a wireless local area network (WLAN).
  • Communication bus 1504 may include a path for communicating information between the components described above.
  • the computer device may be the encoding end in Figure 4 or Figure 10, or the decoding end shown in Figure 11.
  • the processor 1501 can call instructions in the memory 1502 to perform the following steps:
  • Input an image into the encoding network to obtain a feature map of the image, where the feature map includes multiple elements; determine the boundary value of the first element in the feature map, where the first element is any one of the multiple elements; determine whether the first element is within the target range corresponding to the first element, where the target range is determined based on the boundary value of the first element; if the first element is within the target range corresponding to the first element, perform entropy encoding on the first element; if the first element is outside the target range corresponding to the first element, either modify the first element to the boundary value corresponding to the first element and perform entropy encoding on the modified first element, or mark the first element as an out-of-bounds element and perform variable length code encoding on the first element.
  • each of the above components can also be used to support other processes performed by the encoding end in the embodiment shown in FIG. 4 or FIG. 10 .
  • the beneficial effects can be referred to the previous description and will not be repeated here.
  • the processor 1501 can call instructions in the memory 1502 to perform the following steps:
  • Obtain a code stream, which includes information encoded from multiple elements; determine whether the first element in the code stream is an out-of-bounds element, where the first element is any one of the multiple elements; if the first element is an out-of-bounds element, perform variable length code decoding on the encoded information of the first element to obtain the first element; if the first element is not an out-of-bounds element, perform entropy decoding on the encoded information of the first element to obtain the first element.
  • each of the above components can also be used to support other processes performed by the decoding end in the embodiment shown in FIG. 11 .
  • the beneficial effects can be referred to the previous description and will not be repeated here.
  • Embodiments of the present application also provide a computer-readable storage medium. Computer-readable instructions are stored in the computer-readable storage medium; when the computer-readable instructions are run on a computer, any of the above method embodiments is executed.
  • embodiments of the present application provide a computer program product containing instructions, which when run on a computer causes any of the above method embodiments to be executed.
  • embodiments of the present application provide a computer-readable storage medium that stores a bit stream, and the bit stream is generated according to the encoding method shown in Figure 4 or Figure 10.
  • Embodiments of the present application provide a computer-readable storage medium that stores a bit stream. The bit stream includes program instructions executable by a decoder, and the program instructions cause the decoder to perform the decoding method in the third aspect or any possible implementation of the third aspect.
  • Embodiments of the present application provide a decoding system that includes at least one memory and a decoder. The at least one memory is used to store a bit stream, and the decoder is used to execute the decoding method shown in FIG. 11.
  • embodiments of the present application provide a method for storing a bit stream.
  • the method includes receiving or generating a bit stream, and storing the bit stream in a storage medium.
  • the method further includes: performing format conversion processing on the bit stream to obtain a format-converted bit stream, and storing the format-converted bit stream in a storage medium.
  • embodiments of the present application provide a method for transmitting a bit stream.
  • the method includes receiving or generating a bit stream, transmitting the bit stream to a cloud server, or transmitting the bit stream to a mobile terminal.
  • embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment that combines software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

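This application discloses an encoding and decoding method, an apparatus, and a computer device, applied to the field of image processing. In the method, a feature map of an image is obtained through an encoding network (the feature map includes multiple elements); a boundary value of a first element in the feature map is determined, and a target range corresponding to the first element is determined according to the boundary value; if the first element is within its corresponding target range, entropy encoding is performed on the first element; if the first element is outside its corresponding target range, either the first element is modified to the boundary value corresponding to the first element and entropy encoding is performed on the modified first element, or the first element is marked as out of bounds and variable length code encoding is performed on the first element. In the above method, a boundary value is determined for each element, which narrows the target range corresponding to each element and thereby reduces the complexity of entropy encoding and decoding; an out-of-range element is modified to the boundary value and then entropy encoded, so that every element is entropy encoded and image compression performance is ensured.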

Description

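Encoding and decoding method, apparatus, and computer device
Cross-reference to related applications
This application claims priority to the Chinese patent application filed with the China Patent Office on July 8, 2022, with application number 202210806919.4 and entitled "Encoding and decoding method, apparatus, and computer device" (一种编解码方法、装置及计算机设备), the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of image processing, and in particular to an encoding and decoding method, an apparatus, and a computer device.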
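Background
Image compression is a technique that exploits characteristics of image data such as spatial redundancy, visual redundancy, and statistical redundancy to represent the original image pixel matrix with fewer bits, either lossily or losslessly. It enables efficient transmission and storage of image information and plays an important role in the current media era, in which the variety and volume of transmitted image information keep growing. Image compression is divided into lossy compression and lossless compression. Lossy compression achieves a larger compression ratio at the cost of some reduction in image quality; lossless compression does not lose image detail, but its compression ratio is usually lower than that of lossy compression.
In traditional lossy image compression algorithms, many techniques are used to remove redundant information from image data. For example, quantization is used to eliminate the spatial redundancy caused by the correlation between adjacent pixels in an image and the visual redundancy determined by the perception of the human visual system, and entropy coding is used to eliminate the statistical redundancy of image data. After decades of research and optimization by those skilled in the field, traditional lossy image compression has produced mature standards such as JPEG and BPG. However, traditional image compression has hit a bottleneck in improving coding efficiency and can no longer meet the demands of an era of ever-growing multimedia application data.
In the image compression process, entropy encoding and entropy decoding are very time-consuming steps. How to optimize the efficiency of entropy encoding and entropy decoding, and how to do so by exploiting the characteristics of artificial intelligence (AI) compression, has therefore become an important research problem.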
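Summary of the invention
Embodiments of this application provide an encoding and decoding method, an apparatus, and a computer device, which are used to reduce the complexity of entropy encoding and entropy decoding and increase their speed.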
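In a first aspect, an embodiment of this application provides an encoding method, including: inputting an image into an encoding network to obtain a feature map of the image, where the feature map includes multiple elements; determining a boundary value of a first element in the feature map, where the first element is any one of the multiple elements; determining whether the first element is within a target range corresponding to the first element, where the target range is determined according to the boundary value of the first element; if the first element is within the target range corresponding to the first element, performing entropy encoding on the first element; and if the first element is outside the target range corresponding to the first element, modifying the first element to the boundary value corresponding to the first element and performing entropy encoding on the modified first element.
In the above method embodiment, the encoding end obtains the feature map of the image through the encoding network, determines the boundary value of each element in the feature map, and determines the target range corresponding to each element according to its boundary value. If the value of an element is within its corresponding target range, entropy encoding is performed on the element; if the value of the element is outside its corresponding target range, the value is modified to the boundary value corresponding to the element, and entropy encoding is performed on the modified element. Because a boundary value is determined for each element, the target range corresponding to each element is narrowed. Moreover, the boundary values are element-level: the boundary value determined for one element applies only to that element and not to other elements, so it reflects the probability distribution characteristics of each element more accurately. A narrower target range reduces the complexity of the element's probability distribution within the range, which reduces the complexity of entropy encoding and increases its speed. For elements outside the target range, existing entropy encoding optimizations do not entropy encode the out-of-bounds element but write its value directly into the code stream, which amounts to encoding with a special fixed-length code and occupies a large number of bits. In the above method embodiment, by contrast, an element outside the target range is still entropy encoded after its value is modified, so information is compressed without significant information loss. In addition, the probability of the first element's value before modification is very small; quantizing this probability not only produces a large error but also reduces the probabilities of other values that originally had higher probability. After the first element is modified to the boundary value, the probabilities of the values beyond the boundary are in effect bundled together, so the quantization error is smaller and the impact on high-probability values is reduced, which helps avoid the quantization error that small probability values would otherwise produce.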
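In a possible implementation, the boundary value includes an upper boundary value and/or a lower boundary value; modifying the first element to the boundary value corresponding to the first element includes: if the first element is greater than the upper boundary value, modifying the first element to the upper boundary value; or, if the first element is less than the lower boundary value, modifying the first element to the lower boundary value. In this implementation, only an upper boundary value may be set, in which case the first element is compared only with the upper boundary value; only a lower boundary value may be set, in which case the first element is compared only with the lower boundary value; or both may be set, in which case the first element is compared with each of them, so that in the end only values within the target range are entropy encoded, which reduces the complexity of entropy encoding and increases its speed.
In a possible implementation, determining the boundary value of the first element in the feature map includes: inputting the feature map into an entropy estimation network, where the entropy estimation network outputs a probability distribution model of the first element in the feature map; and determining the boundary value of the first element according to the probability distribution model of the first element. The entropy estimation network can output the probability distribution model of the first element; for example, when a Gaussian distribution model is used, the entropy estimation network can output the mean and variance of the first element. By determining the boundary value from the probability distribution model of the first element, the encoding end can choose a boundary value that filters out values with very small probability as far as possible, thereby avoiding a complex entropy encoding process.
In a possible implementation, determining the boundary value of the first element according to the probability distribution model of the first element includes: determining the boundary value of the first element according to the variance of the probability distribution model of the first element. Variance measures the dispersion of data, so determining the boundary value from the variance makes it easy to obtain a boundary value that filters out values with very small probability.
In a possible implementation, the boundary value includes an upper boundary value and/or a lower boundary value; the upper boundary value is k*σ, and/or the lower boundary value is -k*σ, where k is a constant and σ denotes the variance of the probability distribution model.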
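The element-level nature of the k*σ boundary can be sketched as follows. This is a minimal illustration under stated assumptions: the value k = 3.0 is an arbitrary example constant, σ is taken to be whatever dispersion parameter the entropy estimation network reports for each element, and the function names are hypothetical.

```python
from typing import List, Tuple


def target_range(sigma: float, k: float = 3.0) -> Tuple[float, float]:
    """Return (lower, upper) = (-k * sigma, k * sigma) for one element."""
    return -k * sigma, k * sigma


def out_of_bounds_mask(values: List[float], sigmas: List[float], k: float = 3.0) -> List[bool]:
    """Flag each element whose value lies outside its own target range."""
    mask = []
    for value, sigma in zip(values, sigmas):
        lower, upper = target_range(sigma, k)
        mask.append(value < lower or value > upper)
    return mask


# The same value 2 is in range for sigma = 1.0 but out of range for sigma = 0.5.
print(out_of_bounds_mask([2, 2, -7], [1.0, 0.5, 2.0]))
```

Because each element carries its own σ, a value that is ordinary for one element can be out of bounds for another, which is what makes the range narrower than a single global range.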
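In a possible implementation, determining the boundary value of the first element in the feature map includes: inputting the feature map into an entropy estimation network, where the entropy estimation network outputs the probability distribution model of the first element and the boundary value of the first element. In this implementation, the entropy estimation network can be trained in advance so that it outputs both the probability distribution model of the first element and the boundary value, and a boundary value output in this way tends to better suit the goal of improving entropy encoding performance.
In a possible implementation, the method further includes: quantizing the first element. Determining whether the first element is within the target range corresponding to the first element includes: determining whether the quantized first element is within the target range corresponding to the first element. Performing entropy encoding on the first element includes: performing entropy encoding on the quantized first element.
In a possible implementation, performing entropy encoding on the first element includes: determining the probability of the first element within the target range and performing entropy encoding on the first element accordingly. Optionally, the probability of the first element and the probability of the target range can first be determined from the probability distribution model of the first element output by the entropy estimation network, and the first element is then entropy encoded according to the ratio of the probability of the first element to the probability of the target range. The probability of the target range can be obtained by subtracting the probability that the value is less than the lower boundary from the probability that the value is less than the upper boundary.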
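The ratio of the element probability to the target-range probability can be sketched for a Gaussian model as follows. This is an illustration under assumptions that are not in the application: elements are quantized to integers, each integer owns the bin [x - 0.5, x + 0.5], the range probability is evaluated on the same half-integer edges so that the rescaled probabilities sum to one, and `sigma` is used as the standard deviation of the Gaussian. All function names are hypothetical.

```python
import math


def gaussian_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of a Gaussian with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def symbol_probability(x: int, mu: float, sigma: float) -> float:
    """Probability mass of the integer bin [x - 0.5, x + 0.5]."""
    return gaussian_cdf(x + 0.5, mu, sigma) - gaussian_cdf(x - 0.5, mu, sigma)


def range_probability(lower: int, upper: int, mu: float, sigma: float) -> float:
    """Probability below the upper edge minus probability below the lower edge."""
    return gaussian_cdf(upper + 0.5, mu, sigma) - gaussian_cdf(lower - 0.5, mu, sigma)


def renormalized_probability(x: int, lower: int, upper: int, mu: float, sigma: float) -> float:
    """Probability of x conditioned on x lying inside the target range."""
    if not lower <= x <= upper:
        raise ValueError("x is outside the target range")
    return symbol_probability(x, mu, sigma) / range_probability(lower, upper, mu, sigma)


total = sum(renormalized_probability(x, -6, 6, 0.3, 2.0) for x in range(-6, 7))
print(total)  # close to 1.0 up to floating-point rounding
```

Dividing by the range probability redistributes the discarded tail mass over the in-range values, so the entropy coder works with a proper distribution over a small alphabet.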
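In a possible implementation, the probability distribution model is a Gaussian distribution model.
In a second aspect, an embodiment of this application provides an encoding method, including: inputting an image into an encoding network to obtain a feature map of the image, where the feature map includes multiple elements; determining a boundary value of a first element in the feature map, where the first element is any one of the multiple elements; determining whether the first element is within a target range corresponding to the first element, where the target range is determined according to the boundary value of the first element; if the first element is within the target range corresponding to the first element, performing entropy encoding on the first element; and if the first element is outside the target range corresponding to the first element, marking the first element as an out-of-bounds element and performing variable length code encoding on the first element.
In the above method embodiment, the encoding end obtains the feature map of the image through the encoding network, determines the boundary value of each element in the feature map, and determines the target range corresponding to each element according to its boundary value. If the value of an element is within its corresponding target range, entropy encoding is performed on the element; if the value of the element is outside its corresponding target range, the element is marked as an out-of-bounds element and encoded with a variable length code. Because a boundary value is determined for each element, the target range corresponding to each element is narrowed; a narrower target range reduces the complexity of the element's probability distribution within the range, which reduces the complexity of entropy encoding and increases its speed. For out-of-bounds elements, existing entropy encoding optimizations use a fixed-length code, which occupies a large number of bits. In the above embodiment, by contrast, out-of-bounds elements are encoded with a variable length code, which helps improve compression performance; in particular, performing variable length code encoding on the difference between an out-of-bounds element and the boundary further reduces the number of bits required after encoding and further improves compression performance.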
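In a possible implementation, marking the first element as an out-of-bounds element includes: encoding flag information indicating that the first element is an out-of-bounds element into the code stream. The code stream includes encoding information obtained by entropy encoding the first element, or encoding information obtained by performing variable length code encoding on the first element. In addition, the code stream may also include encoding information obtained by entropy encoding other elements that are not out of bounds, and encoding information obtained by performing variable length code encoding on other out-of-bounds elements.
In a possible implementation, the boundary value includes an upper boundary value and/or a lower boundary value; performing variable length code encoding on the first element includes: if the first element is greater than the upper boundary value, determining a first difference between the first element and the upper boundary value and performing variable length code encoding on the first difference; or, if the first element is less than the lower boundary value, determining a second difference between the first element and the lower boundary value and performing variable length code encoding on the second difference. In this implementation, only an upper boundary value may be set, in which case the first element is compared only with the upper boundary value; only a lower boundary value may be set, in which case the first element is compared only with the lower boundary value; or both may be set, in which case the first element is compared with each of them to determine the difference between the first element and the boundary.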
在一种可能的实现方式中,所述确定所述特征图中的第一元素的边界值,包括:将所述特征图输入至熵估计网络,所述熵估计网络输出所述特征图中的第一元素的概率分布模型;根据所述第一元素的概率分布模型确定所述第一元素的边界值。
在一种可能的实现方式中,所述根据所述第一元素的概率分布模型确定所述第一元素的边界值,包括:根据所述第一元素的概率分布模型的方差,确定所述第一元素的边界值。
在一种可能的实现方式中,所述边界值包括上边界值和/或下边界值;所述边界值中的上边界值为k*σ,和/或,所述边界值中的下边界值为-k*σ;其中,k为常数,σ表示所述概率分布模型的方差。
在一种可能的实现方式中,所述确定特征图中的第一元素的边界值,包括:将所述特征图输入至熵估计网络,所述熵估计网络输出所述第一元素的概率分布模型和所述第一元素的边界值。
在一种可能的实现方式中,所述方法还包括:对所述第一元素进行量化;所述判断所述第一元素是否处于所述第一元素对应的目标范围内,包括:判断量化后的第一元素是否处于所述第一元素对应的目标范围内;所述对所述第一元素进行熵编码,包括:对所述量化后的第一元素进行熵编码。
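In a possible implementation, entropy encoding the first element includes: determining the probability of the first element within the target range, and entropy encoding the first element according to that probability. Optionally, the probability of the first element and the probability of the target range may first be determined from the probability distribution model of the first element output by the entropy estimation network, and the first element is then entropy encoded according to the ratio of the probability of the first element to the probability of the target range. The probability of the target range may be obtained by subtracting the probability that the value is less than the lower boundary from the probability that the value is less than the upper boundary.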
在一种可能的实现方式中,所述概率分布模型为高斯分布模型。
第三方面,本申请实施例提供一种解码方法,包括:获取码流,所述码流包括多个元素编码后的信息;确定所述码流中的第一元素是否为越界元素,所述第一元素为所述多个元素中的任一个元素;若所述第一元素为越界元素,对所述第一元素编码后的信息进行变长码解码,得到所述第一元素;若所述第一元素不是越界元素,对所述第一元素编码后的信息进行熵解码,得到所述第一元素。
解码端确定第一元素是否为越界元素,即确定第一元素是否位于第一元素对应的目标范围内,若位于目标范围内,则第一元素不是越界元素,若位于目标范围之外,则第一元素为越界元素。
在一种可能的实现方式中,所述确定所述码流中的第一元素是否为越界元素,包括:若所述码流中包括所述第一元素越界标志信息,则确定所述第一元素为越界元素。
在一种可能的实现方式中,所述对所述第一元素编码后的信息进行变长码解码,得到第一元素,包括:确定所述第一元素的边界值;对所述第一元素编码后的信息进行变长码解码得到差值,所述差值为所述第一元素与所述边界值中上边界值的差值,或者为所述第一元素与所述边界值中下边界值的差值;根据所述边界值和所述差值确定所述第一元素。
在一种可能的实现方式中,所述确定所述第一元素的边界值,包括:确定所述第一元素的概率分布模型;根据所述第一元素的概率分布模型确定所述第一元素的边界值。
在一种可能的实现方式中,所述根据所述第一元素的概率分布模型确定所述第一元素的边界值,包括:根据所述第一元素的概率分布模型的方差,确定所述第一元素的边界值。
在一种可能的实现方式中,所述边界值包括上边界值和/或下边界值;所述边界值中的上边界值为k*σ,和/或,所述边界值中的下边界值为-k*σ;其中,k为常数,σ表示所述概率分布模型的方差。
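In a possible implementation, determining the boundary value of the first element includes: inputting the bitstream into an entropy estimation network, where the entropy estimation network outputs the boundary value of the first element.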
第四方面,本申请实施例提供一种编码装置,所述装置包括执行上述第一方面以及第一方面的任意一种可能实现方式的方法的模块/单元;这些模块/单元可以通过硬件实现,也可以通过硬件执行相应的软件实现。
示例性的,该编码装置可以包括:编码网络模块,用于根据输入的图像输出所述图像的特征图,所述特征图包括多个元素;确定模块,用于确定所述特征图中的第一元素的边界值,所述第一元素为所述多个元素中的任一个元素;判断模块,用于判断所述第一元素是否处于所述第一元素对应的目标范围内,所述目标范围是根据所述第一元素的边界值确定的;编码模块,用于当所述第一元素处于所述第一元素对应的目标范围内时,则对所述第一元素进行熵编码,当所述第一元素处于所述第一元素对应的目标范围外时,将所述第一元素修改为所述第一元素对应的边界值,并对修改后的第一元素进行熵编码。
第五方面,本申请实施例提供一种编码装置,所述装置包括执行上述第二方面以及第二方面的任意一种可能实现方式的方法的模块/单元;这些模块/单元可以通过硬件实现,也可以通过硬件执行相应的软件实现。
示例性的,该编码装置可以包括:编码网络模块,用于根据输入的图像输出所述图像的特征图,所述特征图包括多个元素;确定边界模块,用于确定所述特征图中的第一元素的边界值,所述第一元素为所述多个元素中的任一个元素;判断模块,用于判断所述第一元素是否处于所述第一元素对应的目标范围内,所述目标范围是根据所述第一元素的边界值确定的;熵编码模块,用于当所述第一元素处于所述第一元素对应的目标范围内时,则对所述第一元素进行熵编码,当所述第一元素处于所述第一元素对应的目标范围外时,标记所述第一元素为越界元素,对所述第一元素进行变长码编码。
第六方面,本申请实施例提供一种解码装置,所述装置包括执行上述第三方面以及第三方面的任意一种可能实现方式的方法的模块/单元;这些模块/单元可以通过硬件实现,也可以通过硬件执行相应的软件实现。
示例性的,该解码装置可以包括:获取模块,用于获取码流,所述码流包括多个元素编码后的信息;确定模块,用于确定所述码流中的第一元素是否为越界元素,所述第一元素为所述多个元素中的任一个元素;解码模块,用于当所述第一元素为越界元素时,对所述第一元素编码后的信息进行变长码解码,得到所述第一元素,当所述第一元素不是越界元素时,对所述第一元素编码后的信息进行熵解码,得到所述第一元素。
第七方面,本申请实施例提供一种编码器,所述编码器包括处理电路,用于执行如第一方面、第二方面及其任一实现方式所述的编码方法。
第八方面,本申请实施例提供一种解码器,所述解码器包括处理电路,用于执行如第三方面及其任一实现方式所述的解码方法。
第九方面,本申请实施例提供一种编码器,包括:一个或多个处理器;计算机可读存储介质,耦合到所述一个或多个处理器,所述计算机可读存储介质存储有程序,其中,所述程序在由所述一个或多个处理器执行时,使得所述编码器执行如第一方面、第二方面及其任一实现方式所述的编码方法。
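According to a tenth aspect, an embodiment of this application provides a decoder, including: one or more processors; and a computer-readable storage medium coupled to the one or more processors, where the computer-readable storage medium stores a program that, when executed by the one or more processors, causes the decoder to perform the decoding method according to the third aspect or any implementation thereof.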
第十一方面,本申请实施例提供一种计算机可读存储介质,所述计算机可读存储介质中存储有指令,当所述指令在计算机上运行时,使得所述计算机执行第一方面、第二方面、第三方面及其任一实现方式所述的方法。
第十二方面,本申请实施例提供一种包含指令的计算机程序产品,当其在计算机上运行时,使得第一方面、第二方面、第三方面及其任一实现方式所述的方法被执行。
第十三方面,本申请实施例提供一种计算机可读存储介质,所述计算机可读存储介质中存储有比特流,所述比特流根据第一方面或第二方面以及任一种可能的实现方式中的编码方法生成。
第十四方面,本申请实施例提供一种计算机可读存储介质,所述计算机可读存储介质中存储有比特流,所述比特流包括解码器可执行的程序指令,所述程序指令使得解码器执行第三方面以及第三方面的任一种可能的实现方式中的解码方法。
第十五方面,本申请实施例提供一种译码系统,所述译码系统包括至少一个存储器和解码器,所述至少一个存储器用于存储比特流,所述解码器用于执行第三方面以及第三方面的任一种可能的实现方式中的解码方法。
第十六方面,本申请实施例提供一种存储比特流的方法,该方法包括接收或生成比特流,将所述比特流存储到存储介质中。
在一种可能的实现方式中,该方法还包括:对所述比特流进行格式转换处理,得到格式转换后的比特流,并将所述格式转换后的比特流存储到存储介质中。
第十七方面,本申请实施例提供一种传输比特流的方法,该方法包括接收或生成比特流,将所述比特流传输到云端服务器,或将所述比特流传输到移动终端。
附图说明
图1为一种VAE方法的流程示意图;
图2为一种基于概率分布的熵编码优化方法的流程示意图;
图3为本申请实施例提供的一种场景示意图;
图4为本申请实施例提供的一种编码方法的流程示意图;
图5为本申请实施例提供的一种编码网络、解码网络示意图;
图6为本申请实施例提供的一种熵估计网络示意图;
图7为本申请实施例提供的一种解码方法的流程示意图;
图8为本申请实施例提供的编码效果对比图;
图9为本申请实施例提供的解码效果对比图;
图10为本申请实施例提供的另一种编码方法的流程示意图;
图11为本申请实施例提供的另一种解码方法的流程示意图;
图12为本申请实施例提供的一种编码装置的结构示意图;
图13为本申请实施例提供的另一种编码装置的结构示意图;
图14为本申请实施例提供的一种解码装置的结构示意图;
图15为本申请实施例提供的一种计算机设备的结构示意图。
具体实施方式
由于AI在图像识别、目标检测等诸多领域中的表现优于传统图像算法,故深度学习也被应用于实现图像压缩领域。区别于传统图像算法通过人工设计实现对图间变换(即原始图像到特征图的变换)、量化、熵编码等处理步骤进行优化,AI图像压缩算法的各模块(编码网络、熵估计网络、解码网络等)是作为一个整体共同进行优化,因此AI图像压缩方案的压缩效果更好。
变分自编码(variational auto encoder,VAE)方法是当前AI图像有损压缩技术的主流技术方案,该方法在多尺度结构相似性指数(multi-scale structural similarity,MS-SSIM)和峰值信噪比(peak signal to noise ratio,PSNR)这两项图像压缩效果评价指标上,均优于JPEG等传统图像有损压缩技术。VAE方法的主要流程可以如图1所示。
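The encoding procedure may include the following steps:
1) The encoder inputs the image into the encoding network, and the encoding network outputs a feature map of the image. The encoding network converts the image into a feature representation in another space and removes noise, high-frequency information, and the like from the image.
2) The encoder quantizes the feature map, for example by rounding each feature value to the nearest integer, to obtain the quantized feature map to be encoded, $\hat{y}$.
3) The encoder obtains the probability distribution of each element of $\hat{y}$ from the entropy estimation network.
4) The encoder entropy encodes each element according to its probability distribution to obtain the encoded bitstream.
The decoding procedure may include the following steps:
1) After obtaining the bitstream, the decoder determines the probability distribution of each element of $\hat{y}$ from the entropy estimation network.
2) The decoder performs entropy decoding on the bitstream according to the probability distribution of each element to obtain $\hat{y}$.
3) The decoder then inputs $\hat{y}$ into the decoding network to obtain the reconstructed image.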
在上述编码、解码流程中,采用了熵编码、熵解码技术。根据信息论的原理,可以找到最佳数据压缩编码的方法,数据压缩的理论极限是信息熵;如果要求编码过程中不丢失信息量,即要求保存信息熵,这种信息保持编码称为熵编码,熵编码是根据消息出现概率的分布特性而进行的数据压缩编码。
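Entropy encoding and entropy decoding are commonly used algorithms in data compression that compress data to its theoretical entropy size, $-\log_b P_s$, where $b$ is the base in which the bitstream size is measured (usually 2) and $P_s$ is the probability of the data element. For a sequence $S=\{s_0,s_1,\dots,s_n\}$ with a per-element probability distribution $\{p_0,p_1,\dots,p_n\}$, the goal of entropy encoding is to compress $S$ into a binary bitstream of size $\sum_i -\log_2 p_i(s_i)$ bits. The goal of entropy decoding is to recover the sequence $S$ from the bitstream and the probability distribution of each element.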
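The target size stated above can be checked with a short calculation. The following is a minimal sketch, assuming each element's distribution is given as a plain dictionary from value to probability; the function name and the three-element sequence are hypothetical and only serve the illustration.

```python
import math

def ideal_code_length_bits(symbols, distributions):
    """Sum of -log2 p_i(s_i) over the sequence: the size an entropy coder aims for."""
    total = 0.0
    for s, dist in zip(symbols, distributions):
        p = dist[s]  # probability that element i takes the value s
        total += -math.log2(p)
    return total

# Hypothetical sequence in which every element has its own distribution.
symbols = [0, 1, 0]
distributions = [
    {0: 0.5, 1: 0.5},      # -log2(0.5)   = 1 bit
    {0: 0.75, 1: 0.25},    # -log2(0.25)  = 2 bits
    {0: 0.125, 1: 0.875},  # -log2(0.125) = 3 bits
]
bits = ideal_code_length_bits(symbols, distributions)
print(bits)  # 6.0
```

Unlikely values cost more bits, which is why the accuracy of the per-element probabilities matters for the final bitstream size.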
AI图像压缩技术具有较优的压缩率,使得研究学者、科研机构开始不断将AI图像压缩技术用于实际应用当中。然而,实际应用中除了对压缩率有较高要求之外,还对解码时间、编码时间也有较高的要求。在AI图像压缩中,除了编码网络/解码网络部分耗时较长,熵编码、熵解码的过程也是非常耗时的步骤,故熵编码、熵解码是AI图像压缩速度上的重要瓶颈之一。如何优化熵编码、熵解码的效率,也就变成了重要的研究问题。
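Entropy encoding and entropy decoding are very time-consuming mainly for the following reasons:
1. When the CPU performs entropy encoding or entropy decoding, it can only operate serially because the elements depend on one another.
2. Each element has a large range of possible values. For example, conventional compression typically uses a binary coder, in which an element takes the value 0 or 1, whereas in AI compression the value range of an element is typically [-64, 64], [-128, 128], and so on. The larger value range makes the probability distribution more complex, and in AI compression each element has its own probability distribution.
In addition, entropy encoding and entropy decoding require the probability values to be quantized, typically to a power of two given by the bit width, that is, $\mathrm{round}(p_i \cdot 2^{\mathrm{bit}})$. The quantized probability of every possible value in the range must be greater than 0. If the range is large, values with small probabilities may therefore incur large quantization errors. For example, if the i-th possible value of an element has probability $p_i = 0.0000001$ and probabilities are quantized to 8 bits, its probability becomes $1/2^8 \approx 0.0039$. Because the probability of this value is increased by quantization, the probabilities of the other possible values are reduced.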
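The quantization error described above is easy to reproduce. The sketch below assumes the common convention of storing each probability as an integer frequency out of 2^bit and forcing it to be at least 1; the helper name is hypothetical.

```python
def quantize_probability(p, bits):
    """Quantize p to an integer frequency out of 2**bits, never letting it reach 0."""
    return max(1, round(p * (1 << bits)))

p = 1e-7
freq = quantize_probability(p, 8)  # round(2.56e-5) is 0, so it is forced up to 1
effective = freq / (1 << 8)        # probability the coder actually uses: 1/256
print(freq, effective)             # 1 0.00390625
```

The value is now treated as roughly 39,000 times more likely than it is, and that extra mass has to be taken from the other values. Narrowing the range of values that need a probability entry reduces this effect.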
为了优化熵编码、熵解码的过程,一种基于概率分布的熵编码优化方法可以如图2所示。该方案认为对于具有高峰值概率分布的元素不需要熵编码,直接采用峰值替换元素,例如若某个元素的概率分布在均值的概率大于预设阈值,则采用均值替换该元素的原有取值,不再对该元素进行熵编码。该方法是一种在码率和损失之间进行权衡的方法,同时减少了需要进行熵编码、熵解码的元素,加快了熵编/解码的速度。
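As shown in FIG. 2, the encoding procedure of this method may include the following steps:
1) The encoder inputs the image into the encoding network and quantizes the feature map output by the encoding network to obtain the quantized feature map to be encoded, $\hat{y}$.
2) The encoder obtains the probability distribution of each element of $\hat{y}$ from the entropy estimation network; taking a Gaussian distribution as an example, this yields the mean μ and variance σ of each element.
3) The encoder determines, from the probability distribution, the peak probability P of each element within its value range; for a Gaussian distribution this is the probability at the mean μ.
4) The encoder traverses the elements and determines whether the peak probability P of each element is greater than a threshold Pth. If it is greater than Pth, the actual value of the element is replaced by the value at which the peak probability P occurs and written to the bitstream; if it is less than Pth, the element is entropy encoded according to its probability distribution.
The decoding procedure may include the following steps:
1) After obtaining the bitstream, the decoder determines the probability distribution of each element of $\hat{y}$ from the entropy estimation network.
2) The decoder determines, from the probability distribution, the peak probability P of each element within its value range.
3) The decoder traverses the elements and determines whether the peak probability P of each element is greater than the threshold Pth. If it is greater than Pth, the element is assigned the value at which the peak probability P occurs; if it is less than Pth, the element is entropy decoded according to its probability distribution.
4) The decoder inputs the decoded $\hat{y}$ into the decoding network to obtain the reconstructed image.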
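The peak-probability test in the skip method can be sketched as follows. This is an illustration under stated assumptions, not the method's reference implementation: the element values are integers, the distribution is a Gaussian discretized into unit-width bins centered on the mean, σ is used as the scale (standard deviation) of that Gaussian, and all function names are hypothetical.

```python
import math

def peak_probability(sigma):
    """Mass a discretized Gaussian places on the bin centered at its mean."""
    return math.erf(0.5 / (sigma * math.sqrt(2.0)))

def skip_encode(values, means, sigmas, p_th):
    """Split elements into those replaced by the mean and those left for entropy coding."""
    replaced, to_entropy_code = [], []
    for i, (v, mu, sigma) in enumerate(zip(values, means, sigmas)):
        if peak_probability(sigma) > p_th:
            replaced.append((i, mu))        # value is replaced by the peak (the mean)
        else:
            to_entropy_code.append((i, v))  # normal entropy coding path
    return replaced, to_entropy_code

replaced, to_entropy_code = skip_encode([3, 1], [0, 0], [0.1, 2.0], p_th=0.95)
print(replaced, to_entropy_code)  # [(0, 0)] [(1, 1)]
```

The first element has a very sharp distribution, so its true value 3 is lost when it is replaced by the mean. This is the trade-off between rate and loss that the method accepts.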
上述熵编码优化方法有助于加速熵编/解码,但是该方法在针对特征图中的每个元素时,要么不进行熵编码、熵解码,要么按照传统方式执行熵编码、熵解码的全流程。对于进行熵编码、熵解码的元素来说,熵编码、熵解码过程并未优化;对于不进行熵编码、熵解码的元素,将赋值后的元素直接写入码流,不利于压缩率的降低。
在另一种对熵编码、熵解码进行优化的方案中,可称为旁路编码(bypass)方法,若一个元素的可能取值对应的概率足够小,则不对该元素进行熵编码,直接将该元素的取值写入码流。
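The encoding procedure of this method may include the following steps:
1) The encoder determines the probability distribution of each element in the feature map and, for each element, the value range whose probability is not less than a minimum probability threshold. For example, assume the probability threshold is 95% and the value range of an element is [-64, 64]. If the probability that the element is less than or equal to 40 is 95%, the value range of the element that is not less than the minimum probability threshold is -64 to 40.
2) The encoder traverses the elements and determines whether the value of each element lies within that range:
2a) If the value lies outside the range, a flag bit is set for the element to indicate that it exceeds the range, and the value of the element is written directly to the bitstream without further encoding.
2b) If the value lies within the range, the element is entropy encoded according to its probability distribution.
The decoding procedure of this method may include the following steps:
1) After obtaining the bitstream, the decoder determines the probability distribution of each element of the feature map from the entropy estimation network.
2) The decoder determines, from the information corresponding to each element in the bitstream, whether a flag bit is present. If it is, the value of the element is read directly; if it is not, the information corresponding to the element is entropy decoded according to the probability distribution of the element.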
按照上述方法进行熵编码、熵解码时,有助于降低量化过程中小概率元素取值的误差,但是依然对量化精度有较高要求。此外,该方法也存在部分元素不编码,部分元素按照传统方式执行熵编码、熵解码的全流程的问题。
由于目前对熵编码、熵解码的优化方法,并没有降低熵编码、熵解码算法的复杂度,因而对编码、解码的速度优化效果十分有限。
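In view of this, embodiments of this application provide encoding and decoding methods that set boundary values for elements in order to narrow the target range of element values and simplify the probability distribution of each element within that range. This reduces the number of bits needed to represent element probabilities, lowers the complexity of the entropy encoding and entropy decoding algorithms, and thereby significantly reduces the time they take. This application can be applied to encoding and compressing data such as images and video, for example in video surveillance, live streaming, terminal recording, storage, transmission, cloud encoding and decoding, cloud transcoding, and video stream distribution, and is particularly suited to AI-based compression scenarios.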
图3示例性的提供了一种能够应用本申请实施例提供的编码、解码方法的场景示意图,在该场景中,监控设备301(或监控设备302)对采集到的视频信息进行编码,并将编码后得到的码流上传至云端服务器306;云端服务器306可以在接收到终端设备303(或终端设备304、终端设备305)获取码流的请求后,将码流发送给终端设备303,终端设备303对获取到的码流进行解码,从而进行视频播放。此外,云端服务器306也可以具备解码和/或编码的能力,例如云端服务器306可以对获取到的码流进行解码,然后对视频进行处理,再对处理后的视频进行编码,以便后续发送至其他终端设备。
参见图4,为本申请实施例提供的一种编码方法的流程示意图,图4所示流程由编码端执行,编码端可以是一个计算设备,也可以由多个计算设备共同实现,其中,计算设备为具有编码功能的设备,可以是服务器,如云端服务器;也可以是终端设备,如监控设备、用于直播的终端设备等等。具体的,图4所示的编码方法可以包括以下步骤:
步骤401、编码端将图像输入至编码网络,得到图像的特征图,该特征图包括多个元素。
编码端将获取到的待压缩图像输入至编码网络,编码网络用于将图像转换为另一个空间的特征表示并输出特征图。
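The encoding network can be understood as a functional module that may be composed of convolution, activation (such as relu or leaky_relu), upsampling and downsampling, and so on. FIG. 5 shows an example of encoding and decoding networks applicable to embodiments of this application. As shown in (a) of FIG. 5, the encoding network at the encoder may be built by alternating convolution (conv) and generalized divisive normalization (GDN) layers. In FIG. 5, the convolution conv M*5*5/2 denotes a convolution with M channels, a 5*5 kernel, and 1/2 downsampling, and GDN is a type of activation function. It should be understood that FIG. 5 is only an example; other encoding networks that achieve a similar function may be used in practice.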
编码网络输出的特征图包括多个元素。例如,当输入的图像为三维矩阵时,编码网络输出的特征图也可以是一个三维矩阵,假设三维矩阵的尺寸为M*N*C,即该三维矩阵包括M*N*C个元素,每个元素的取值即为该元素对应的特征值。
步骤402、编码端确定特征图中的第一元素的边界值,该第一元素为上述多个元素中的任一个元素。
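Common element value ranges include [-64, 64], [-128, 128], and so on. Because the value range is large, the probability distribution of element values is complex, which in turn increases the complexity of entropy encoding and lowers its speed. To simplify entropy encoding and increase its speed, embodiments of this application set an element-level boundary on top of the original value range. The boundary determines the target range of values used when entropy encoding the element, so that subsequent entropy encoding is based on the probability distribution within the target range. A boundary applies only to its corresponding element, not to every element in the feature map. In one optional case, each of the multiple elements in the feature map has its own boundary value; in another optional case, only some of the elements have their own boundary values.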
以特征图中元素的原取值范围是[-64,64]为例,该取值范围是特征图级别的,即特征图中每个元素的原本取值范围均是[-64,64]。但是,对于其中的某一个元素来说,该元素的取值在[-64,64]中部分区间上的概率很小,若仍根据该元素取值在[-64,64]上的概率分布进行熵编码,则编码过程复杂度较高、耗时较长,为了简化熵编码,可以通过设置元素级别的边界,以实现剔除概率值较低的元素取值,从而使得后续进行熵编码时根据由边界确定的目标范围内的取值概率分布情况进行熵编码。例如,第一元素原本取值范围是[-64,64],但第一元素取值大于40的概率仅为0.001,小于-40的概率仅为0.001,由于第一元素大于40或小于-40的概率非常低,因此,可以将-40、40设置为边界,目标范围则为[-40,40],后续则根据目标范围内的概率分布情况对第一元素进行熵编码,以降低熵编码的复杂度,从而提高熵编码速度。
第一元素的边界值可以包括上边界和/或下边界。仍以特征图中元素的原取值范围是[-64,64]为例,当第一元素的边界包括上边界r时,那么第一元素对应的目标范围可以是[-64,r],其中r≤64;当第一元素的边界包括下边界l时,那么第一元素对应的目标范围可以是[l,64],其中l≥-64;当第一元素的边界包括上边界r和下边界l时,那么第一元素对应的目标范围是[l,r],其中l≥-64且r≤64。
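The three boundary cases above (upper only, lower only, or both) reduce to one small helper. This is a minimal sketch with hypothetical function names; it assumes integer element values and a feature-map-level range of [-64, 64] as in the example.

```python
def target_range(full_range, lower=None, upper=None):
    """Narrow the feature-map-level range with the element-level boundaries that are set."""
    lo, hi = full_range
    if lower is not None:
        lo = max(lo, lower)
    if upper is not None:
        hi = min(hi, upper)
    return lo, hi

def in_target_range(value, rng):
    """True if value lies inside the closed interval rng."""
    lo, hi = rng
    return lo <= value <= hi

print(target_range((-64, 64), upper=40))             # (-64, 40)
print(target_range((-64, 64), lower=-40))            # (-40, 64)
print(target_range((-64, 64), lower=-40, upper=40))  # (-40, 40)
```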
编码端在确定特征图中的第一元素的边界值时,可以根据第一元素的概率分布情况确定出第一元素的边界值;也可以预先进行网络训练,由训练好的网络输出第一元素的边界值。下面分别进行介绍:
一、根据第一元素的概率分布情况确定出第一元素的边界值。
编码端可以先将编码网络输出的特征图输入至熵估计网络,熵估计网络输出第一元素的概率分布模型,编码端根据第一元素的概率分布模型确定第一元素的边界值。
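Taking a first element that follows a Gaussian distribution as an example, the probability distribution model of the first element can be represented by the mean μ and the variance σ; that is, the entropy estimation network can output the corresponding mean μ and variance σ, and the encoder can determine the boundary value of the first element according to the variance. For example, if the mean μ is 0, the encoder may take kσ as the upper boundary value and/or -kσ as the lower boundary value, where k is a positive constant and σ is the variance. In a standard Gaussian distribution, the probability of a value falling within [-σ, σ] is about 68%, within [-2σ, 2σ] about 95%, and within [-3σ, 3σ] about 99.7%. The larger k is, the larger the probability within the target range, so setting k appropriately makes it possible to exclude values with very small probabilities.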
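The percentages quoted above follow from the Gaussian error function. The sketch below assumes a zero-mean Gaussian and treats σ as its scale parameter (the standard deviation), which is the reading under which the 68% / 95% / 99.7% figures hold; the function names are hypothetical.

```python
import math

def boundaries(sigma, k):
    """Lower and upper boundary values -k*sigma and k*sigma for a zero-mean element."""
    return -k * sigma, k * sigma

def mass_within(k):
    """Probability that a Gaussian value lies within k scale units of its mean."""
    return math.erf(k / math.sqrt(2.0))

print(boundaries(2.0, 5))  # (-10.0, 10.0)
for k in (1, 2, 3, 5):
    print(k, round(mass_within(k), 6))
```

With k = 5, the configuration called limitG5 in the comparison later in the text, the probability of falling outside the range is below one in a million, so very few elements take the out-of-range path.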
图6示例性的提供了一种能够适用于本申请实施例的熵估计网络,如图6中的(a)所示,编码端的熵估计网络可以由卷积(conv)、激活(relu)交替构成。
二、由网络确定第一元素的边界值。
编码端可以将特征图输入至用于确定边界值的网络,以使该网络输出第一元素的边界值。例如,可以对熵估计网络进行边界值的训练,以使熵估计网络不仅能够输出第一元素的概率分布模型,还能够输出第一元素的边界值。
又或者,也可以不由熵估计网络输出第一元素的边界值,而由其他网络根据特征图输出第一元素的边界值,或者由其他网络根据第一元素的概率分布模型输出第一元素的边界值。
编码端可以采用上述任意实现方式遍历特征图中的每个元素,确定出每个元素对应的边界值。
步骤403、编码端判断第一元素是否处于第一元素对应的目标范围内。
The target range corresponding to the first element is the range determined from the boundary value of the first element.
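Optionally, the first element may be quantized before step 403 is performed; in that case, step 403 determines whether the quantized first element is within the target range corresponding to the first element. For example, quantization may round the value of the first element to the nearest integer, using the formula $\hat{y} = \mathrm{round}(y)$, where $y$ is the value of the first element before quantization and $\hat{y}$ is its value after quantization. As another example, quantization may be residual quantization of the value of the first element, using the formula $\hat{y} = \mathrm{round}(y - \mu)$, where $y$ is the value of the first element before quantization, $\mu$ is the mean of the probability distribution of the value of the first element, and $\hat{y}$ is the value of the first element after quantization.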
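The two quantization options can be sketched as below. The formulas are not legible in this copy of the text, so the code assumes plain rounding, round(y), and residual rounding, round(y - μ). It also assumes round-half-up behavior, which is why it does not use Python's built-in `round` (that rounds halves to the nearest even integer). The function names are hypothetical.

```python
import math

def quantize_round(y):
    """Round to the nearest integer, with halves rounded up."""
    return int(math.floor(y + 0.5))

def quantize_residual(y, mu):
    """Residual quantization: round the distance of y from the mean mu (assumed form)."""
    return quantize_round(y - mu)

print(quantize_round(2.5))          # 3
print(quantize_round(-1.2))         # -1
print(quantize_residual(3.7, 1.0))  # 3
```

Residual quantization centers the quantized values on zero, which matches the zero-mean example in which the boundaries are -kσ and kσ.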
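If the first element is within the target range corresponding to the first element, step 404a is performed; if the first element is outside the target range corresponding to the first element, step 404b is performed.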
步骤404a、若第一元素处于第一元素对应的目标范围内,编码端对第一元素进行熵编码。
若第一元素位于第一元素对应的目标范围内,可以认为第一元素的取值并非概率很小的取值,可以根据第一元素的取值在目标范围内的概率分布情况对第一元素进行熵编码。若对第一元素进行了量化,则对量化后的第一元素进行熵编码。
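For example, suppose the probability of the value of the quantized element s1 is $p_G(s)$ and the value of the quantized element s1 lies within the target range [l, r]; the quantized element s1 is then entropy encoded according to the probability distribution of its value within [l, r]. Specifically, if the probability that the value of s1 is less than the upper boundary r of the target range is $c_G(r)$ and the probability that it is less than the lower boundary l is $c_G(l)$, the probability of the value of the quantized element s1 within the target range is $p_{LG}(s) = \dfrac{p_G(s)}{c_G(r) - c_G(l)}$, and the quantized element s1 is entropy encoded according to the probability $p_{LG}(s)$.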
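The probability distribution of the value of the first element can be determined by the entropy estimation network: the encoder inputs the feature map into a trained entropy estimation network, which outputs the probability distribution model of the first element. For example, if the first element follows a Gaussian distribution (also called a normal distribution), the entropy estimation network can output the corresponding mean μ and variance σ, from which the encoder can determine the probability of each possible value. Taking an original value range of [-64, 64] as an example, the encoder can determine the probability of any value in [-64, 64] from the mean μ and variance σ of the first element. The encoder determines the probability of the actual value of the first element (the quantized value, if the first element was quantized) and, from it, the probability of that value within the target range, and then entropy encodes the first element according to the probability within the target range.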
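The renormalization to the target range, that is, the probability of a value divided by the probability of the range, can be sketched for a discretized Gaussian. Several details are assumptions made only for this illustration: values are integers, each value owns a unit-width bin, the range probability is evaluated at the half-integer bin edges so that the renormalized probabilities sum to 1, and σ is used as the scale (standard deviation). The function names are hypothetical.

```python
import math

def gaussian_cdf(x, mu, sigma):
    """Cumulative distribution function of a Gaussian with mean mu and scale sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def p_g(s, mu, sigma):
    """Probability of the integer value s under the discretized Gaussian."""
    return gaussian_cdf(s + 0.5, mu, sigma) - gaussian_cdf(s - 0.5, mu, sigma)

def p_lg(s, mu, sigma, l, r):
    """Probability of s renormalized to the target range [l, r]."""
    if not l <= s <= r:
        raise ValueError("s must lie inside the target range")
    range_mass = gaussian_cdf(r + 0.5, mu, sigma) - gaussian_cdf(l - 0.5, mu, sigma)
    return p_g(s, mu, sigma) / range_mass

# Hypothetical element with mu = 0, sigma = 8 and target range [-16, 16].
total = sum(p_lg(s, 0.0, 8.0, -16, 16) for s in range(-16, 17))
print(round(total, 9))  # 1.0
```

The entropy coder then needs a probability table for only 33 values instead of 129, and every in-range value becomes slightly more probable, so it costs slightly fewer bits.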
步骤404b、若第一元素处于第一元素对应的目标范围外,编码端将第一元素修改为第一元素对应的边界值,并对修改后的第一元素进行熵编码。
若第一元素(或者量化后的第一元素)位于第一元素对应的目标范围之外,可以认为第一元素的取值为概率很小的取值,在这种情况下,可以将第一元素的取值修改为对应的边界值,信息损失并不明显,但能够降低对第一元素进行熵编码的复杂度;此外,由于修改前的第一元素取值概率非常小,对此概率进行量化时不仅会产生较大的误差,还会降低其他原本较高概率取值的概率,而将第一元素修改为边界值后,相当于对超出边界的若干取值的概率进行打包,再进行量化时产生的量化误差较小,减少了对高概率取值的影响。
具体的,若第一元素大于目标范围的上边界,即上述步骤402中确定出的边界值中包括的上边界,则将第一元素的取值修改为上边界值;若第一元素小于目标范围的下边界,即上述步骤402中确定出的边界值中包括的下边界,则将第一元素的取值修改为下边界值。例如,第一元素对应的目标范围是[-40,40];若第一元素的取值为64,大于目标范围的上边界,则将第一元素取值修改为40,然后根据取值40在[-40,40]中的概率分布进行熵编码;若第一元素的取值为-50,小于目标范围的下边界,则将第一元素取值修改为-40,然后根据取值-40在[-40,40]中的概率分布进行熵编码。
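The modification in step 404b is a clamp to the configured boundaries. The sketch below uses a hypothetical helper and allows either boundary to be absent, matching the case where only an upper or only a lower boundary is set.

```python
def clamp_to_range(value, lower=None, upper=None):
    """Replace an out-of-range value by the boundary it crossed; keep in-range values."""
    if upper is not None and value > upper:
        return upper
    if lower is not None and value < lower:
        return lower
    return value

print(clamp_to_range(64, lower=-40, upper=40))   # 40
print(clamp_to_range(-50, lower=-40, upper=40))  # -40
print(clamp_to_range(7, lower=-40, upper=40))    # 7
```

After clamping, every element lies inside its target range, so a single entropy coding path handles all elements. The decoder cannot recover the original out-of-range value and reconstructs the boundary value instead.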
与步骤404a中确定概率分布情况的方式类似,在步骤404b中,编码端也可以通过熵估计网络确定第一元素每个可能取值对应的概率,在将第一元素取值修改为对应的边界值后,确定修改后的第一元素在目标范围内的概率,进而根据修改后的第一元素在目标范围内的概率对修改后的第一元素进行熵编码。
在对第一元素执行上述步骤404a或步骤404b之后,得到的编码信息将被包括在码流中,可以被发送给解码端。该码流中可以包括对特征图中每个元素进行编码后得到的编码信息。
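In the foregoing method embodiment, the encoder obtains the feature map of the image through the encoding network, determines the boundary value of each element in the feature map, and determines the target range of each element from its boundary value. If the value of an element is within its target range, the element is entropy encoded; if it is outside, the value is changed to the corresponding boundary value and the modified element is entropy encoded. Because a boundary value is determined for each element individually, the target range of each element can be narrowed. A narrower target range reduces the complexity of the probability distribution of the element within that range, which lowers the complexity of entropy encoding and increases its speed. For elements outside the target range, existing entropy encoding optimizations do not entropy encode the element but write its value directly to the bitstream, which amounts to a special fixed-length code and occupies a relatively large number of bits. In the foregoing embodiment, by contrast, an element outside the target range is still entropy encoded after its value is modified. This compresses the information without significant information loss, improves compression performance, and helps avoid the quantization error that arises when very small probabilities are quantized.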
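Corresponding to the foregoing method, the decoder may perform decoding according to the procedure shown in FIG. 7. The procedure shown in FIG. 7 is performed by the decoder, which may be a single computing device or may be implemented jointly by multiple computing devices. Specifically, the decoding method shown in FIG. 7 may include the following steps: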
步骤701、解码端获取码流,该码流包括多个元素编码后的信息。
步骤702、解码端根据熵估计网络,对码流进行解码得到图像的特征图。
图6示例性的提供了一种能够适用于本申请实施例的熵估计网络,如图6中的(b)所示,解码端的熵估计网络可以由深度卷积(dconv)、激活(relu)交替构成。
步骤703、解码端将特征图输入至解码网络,得到重建图像。
示例性的,当编码端采用如图5中的(a)所示的编码网络时,解码端可以采用如图5中的(b)所示的解码网络,该解码网络可以由深度卷积(dconv)、GDN交替构成。
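Encoding and decoding based on the procedures shown in FIG. 4 and FIG. 7 outperform existing entropy encoding and decoding methods in both speed and compression performance. The following compares the methods provided in the foregoing embodiments with the method shown in FIG. 1 (referred to as the base model, baseline, or model 1) and the method shown in FIG. 2 (referred to as baseline+skip, or model 2). For the comparison, two sets of data were collected with the method of the embodiments: 1. lower and upper boundary values of -5σ and 5σ, respectively (referred to as limitG5, or model 3); 2. lower and upper boundary values of -10σ and 10σ, respectively (referred to as limitG10).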
Bjontegaard-Delta比特率(BD-Rate)用于在不同的压缩方法之间进行性能度量。A方法相比于B方法的BD-Rate,表示在相同的客观指标下,A方法相比于B方法码率的差异,通常用百分数表示,若为-x%则表明A方法相比于B方法能够节省x%的空间,若为正,则表示增加x%的空间。
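Table 1: BD-Rate relative to baseline
Method          BD-Rate
baseline+skip   -4.17%
limitG10        -4.91%
limitG5         -4.95%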
如表1所示,以baseline作为比较基准,将baseline+skip、limitG5、limitG10分别与baseline进行比较。baseline+skip的BD-Rate为-4.17%,表示baseline+skip相比于baseline节省4.17%的空间;limitG10的BD-Rate为-4.91%,表示limitG10相比于baseline节省4.91%的空间;limitG5的BD-Rate为-4.95%,表示limitG5相比于baseline节省4.95%的空间。由此可见,本申请实施例提供的编码方法在压缩性能方面优于图1、图2所示的编码方法。
在比较上述多种编码方法的编码速度时,将baseline、baseline+skip、bypass方法(概率阈值分别取5σ、10σ的两组数据分别简称为baseline+bypass5(或者简称为模型4)、baseline+bypass10(或者简称为模型5))以及limitG5进行比较。每种方法均分别采用8bit、10bit、12bit以及14bit对概率进行量化。
如图8所示,横坐标表示编码的时间,纵坐标表示BD-Rate,5条曲线从上到下依次对应baseline、baseline+bypass5、baseline+bypass10、baseline+skip以及limitG5。从图8可以看出,采用8bit对概率进行量化时(即每条曲线上的第一个点,baseline曲线上的第一个点由于BD-Rate过高而未显示出),limitG5编码时间最短且BD-Rate最低;采用10bit对概率进行量化时(即每条曲线上的第二个点,baseline曲线上的第二个点由于BD-Rate过高而未显示出),同样是limitG5编码时间最短且BD-Rate最低;采用12bit、14bit对概率进行量化时,也是limitG5编码时间最短且BD-Rate最低。由此可见,本申请实施例提供的编、解码方法在编码速度方面优于图1、图2所示的编、解码方法以及bypass方法。
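As shown in FIG. 9, the horizontal axis represents decoding time and the vertical axis represents BD-Rate. FIG. 9 shows that when probabilities are quantized with 8, 10, 12, or 14 bits, limitG5 has the shortest decoding time and the lowest BD-Rate. The encoding and decoding methods provided in embodiments of this application therefore outperform the methods shown in FIG. 1 and FIG. 2 and the bypass method in decoding speed.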
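Embodiments of this application further provide another encoding and decoding method, which likewise reduces the complexity of the entropy encoding and entropy decoding algorithms and thereby significantly reduces the time they take. This method can also be applied to encoding and compressing data such as images and video, for example in video surveillance, live streaming, terminal recording, storage, and transmission, and is particularly suited to AI-based compression scenarios.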
参见图10,为本申请实施例提供的另一种编码方法的流程示意图,图10所示流程由编码端执行,编码端可以是一个计算设备,也可以由多个计算设备共同实现。具体的,图10所示的编码方法可以包括以下步骤:
步骤1001、编码端将图像输入至编码网络,得到图像的特征图,该特征图包括多个元素。
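This step is similar to step 401 in the foregoing embodiment; the encoding network shown in (a) of FIG. 5 may also be used to obtain the feature map of the image. FIG. 5 is of course only an example, and other encoding networks that achieve a similar function may also be used.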
步骤1002、编码端确定特征图中的第一元素的边界值,该第一元素为上述多个元素中的任一个元素。
其中,确定出的第一元素的边界值可以包括上边界值,或者包括下边界值,或者也可以包括上边界值和下边界值。
编码端可以根据第一元素的概率分布情况确定出第一元素的边界值,也可以由网络确定第一元素的边界值,具体实现方式可参考前述实施例中的步骤402的具体实现方式,此处不再赘述。
编码端可以遍历特征图中的每个元素,确定出每个元素对应的边界值。
步骤1003、编码端判断第一元素是否处于第一元素对应的目标范围内。
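After determining the boundary value of the first element, the encoder can determine the target range corresponding to the first element from the boundary value. For example, suppose the original value range of the first element is [-64, 64]. When the boundary of the first element includes an upper boundary r, the target range may be [-64, r], where r ≤ 64; when it includes a lower boundary l, the target range may be [l, 64], where l ≥ -64; and when it includes both an upper boundary r and a lower boundary l, the target range is [l, r], where l ≥ -64 and r ≤ 64.
If the first element is within the target range corresponding to the first element, step 1004a is performed; if the first element is outside the target range corresponding to the first element, step 1004b is performed.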
步骤1004a、若第一元素处于第一元素对应的目标范围内,编码端对第一元素进行熵编码。
若第一元素位于第一元素对应的目标范围内,可以认为第一元素的取值并非概率很小的取值,可以根据第一元素的取值在目标范围内的概率分布情况对第一元素进行熵编码。若对第一元素进行了量化,则对量化后的第一元素进行熵编码。具体可参见前述实施例中步骤404a中任一实现方式。
步骤1004b、若第一元素处于第一元素对应的目标范围外,编码端标记第一元素为越界元素,对第一元素进行变长码编码。
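If the first element is outside its target range, its value can be regarded as having a very small probability. In this case, the first element need not be entropy encoded and may instead be encoded with a variable-length code, for example a Golomb code, a Golomb-Rice code, or an exponential-Golomb code.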
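When marking the first element as an out-of-range element, the encoder needs to encode flag information indicating that the first element is an out-of-range element into the bitstream, so that the decoder can determine from this flag information that the first element is an out-of-range element. For example, the encoder may set an out-of-range bit for each out-of-range element; if the decoder parses this bit from the bitstream, it can determine that the element is an out-of-range element. Alternatively, the encoder may set a flag bit for every element, with "0" and "1" indicating in range and out of range, respectively. As another example, the encoder may change the value of the first element to a preset out-of-range value and encode it by entropy encoding or another form of encoding, so that the decoder, on decoding the preset out-of-range value, determines that the element is an out-of-range element. Suppose the target range of the first element is [-10, 10]: if the value of the first element is greater than 10, it is changed to 11 to indicate that the first element is above the upper boundary; if it is less than -10, it is changed to -11 to indicate that the first element is below the lower boundary.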
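The preset out-of-range value variant can be sketched as a pair of helpers. It assumes integer values and reserves the two symbols just outside the target range, as in the [-10, 10] example; the function names are hypothetical.

```python
def to_escape_symbol(value, lower, upper):
    """Map out-of-range values to the reserved symbols lower - 1 and upper + 1."""
    if value > upper:
        return upper + 1
    if value < lower:
        return lower - 1
    return value

def is_escape(symbol, lower, upper):
    """True if the decoded symbol is one of the two reserved out-of-range symbols."""
    return symbol == upper + 1 or symbol == lower - 1

print(to_escape_symbol(25, -10, 10))   # 11
print(to_escape_symbol(-30, -10, 10))  # -11
print(to_escape_symbol(3, -10, 10))    # 3
```

With this variant the entropy coder's alphabet grows by two symbols per element, but no separate flag bit is needed. The reserved symbol also tells the decoder which boundary was crossed.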
上述码流包括对第一元素进行熵编码得到的编码信息,或者,对第一元素进行变长码编码得到的编码信息。此外,码流还可以包括对其他非越界元素进行熵编码得到的码流信息,以及对其他越界元素进行变长码编码得到的编码信息。
若对第一元素进行了量化,那么在执行上述1004b时,当量化后的第一元素位于目标范围之外,编码端则对量化后的第一元素进行变长码编码。
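In the bypass method described above, the value of an element marked as out of range is written directly to the bitstream, which amounts to a special fixed-length code and occupies a relatively large number of bits. In step 1004b of this embodiment, by contrast, out-of-range elements are encoded with a variable-length code, which both preserves more of the feature information of the out-of-range elements and compresses that information. When the probability distribution curve is unimodal (as with the Gaussian distribution model mentioned in embodiments of this application), variable-length coding achieves higher compression performance than conventional fixed-length coding, and the encoded information occupies fewer bits.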
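In a possible implementation, to further improve compression performance, the encoder may encode the difference between the first element and the boundary when variable-length coding an out-of-range first element. For example, if the first element is greater than the upper boundary, the encoder may determine a first difference between the first element and the upper boundary value and variable-length code the first difference; if the first element is less than the lower boundary, the encoder may determine a second difference between the first element and the lower boundary value and variable-length code the second difference. Because the difference is usually much smaller than the value of the first element and can be represented with fewer bits, variable-length coding the difference further improves compression performance.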
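Coding the distance to the boundary with an exponential-Golomb code can be sketched as follows. The order-0 exponential-Golomb code is standard, but the surrounding layout is a hypothetical choice made for this illustration: one bit for which boundary was crossed, followed by the code of the distance minus 1 (the distance is at least 1 for an out-of-range integer). Bits are shown as a string for readability.

```python
def exp_golomb_encode(n):
    """Order-0 exponential-Golomb code of a non-negative integer, as a bit string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    binary = bin(n + 1)[2:]                   # binary representation of n + 1
    return "0" * (len(binary) - 1) + binary   # prefix of len - 1 zeros

def encode_out_of_range(value, lower, upper):
    """One boundary bit plus the exp-Golomb code of the distance beyond the boundary."""
    if value > upper:
        return "0" + exp_golomb_encode(value - upper - 1)
    if value < lower:
        return "1" + exp_golomb_encode(lower - value - 1)
    raise ValueError("value is inside the target range")

print(encode_out_of_range(12, -10, 10))   # 0010
print(encode_out_of_range(-11, -10, 10))  # 11
```

A value just past the boundary costs 2 bits here, whereas writing any value from [-64, 64] directly with a fixed-length code needs 8 bits.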
与上述图10所示方法对应地,解码端可以根据图11所示的流程进行解码。图11所示流程由解码端执行,解码端可以是一个计算设备,也可以由多个计算设备共同实现。具体的,图11所示的解码方法可以包括以下步骤:
步骤1101、解码端获取码流,该码流包括多个元素编码后的信息。
步骤1102、解码端确定码流中的第一元素是否为越界元素,其中,第一元素为上述多个元素中的任一个元素。
解码端确定第一元素是否为越界元素,即确定第一元素是否位于第一元素对应的目标范围内,若位于目标范围内,则第一元素不是越界元素,若位于目标范围之外,则第一元素为越界元素。
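Because the encoder marks out-of-range elements during encoding, the bitstream contains the out-of-range flag information of the first element, and the decoder can accordingly determine from this flag information whether the first element is out of range. For example, the encoder may set an out-of-range bit for an out-of-range element; if the decoder parses this bit from the bitstream, it can determine that the element is out of range. Alternatively, the encoder may set a flag bit for every element, with "0" and "1" indicating in range and out of range, respectively, and the decoder determines from the flag bit of each element whether that element is out of range. As another example, if the encoder changed the value of an out-of-range first element to a preset out-of-range value, the decoder first determines the boundary value of the first element during decoding and, if the decoded value of the first element is an out-of-range value, determines that the first element is out of range. Suppose the upper and lower boundary values of the first element are 10 and -10, respectively: a decoded value of 11 indicates that the first element is above the upper boundary, and a decoded value of -11 indicates that it is below the lower boundary.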
若解码端确定第一元素没有越界,则执行步骤1103a;若解码端确定第一元素越界,则执行步骤1103b。
步骤1103a、若第一元素不是越界元素,解码端对第一元素编码后的信息进行熵解码,得到第一元素。
对于非越界元素,解码端则对编码后的信息进行熵解码,例如,解码端可以根据熵估计网络,对编码后的信息进行熵解码,从而得到第一元素的取值。若编码端采用了如图6中的(a)所示的熵估计网络进行编码,那么解码端可以采用图6中的(b)所示的熵估计网络进行解码。
步骤1103b、若第一元素为越界元素,解码端对第一元素编码后的信息进行变长码解码,得到第一元素。
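If the encoder variable-length coded the difference between the first element and the boundary value when encoding an out-of-range element, the decoder may first determine the boundary value of the first element, variable-length decode the encoded information of the first element to obtain the difference, and then determine the value of the first element from the boundary value and the decoded difference. For example, when the boundary of the first element includes an upper boundary and a lower boundary, a positive decoded difference can be taken as the first element minus the upper boundary, and a negative decoded difference can be taken as the first element minus the lower boundary.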
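Reconstructing an out-of-range element from its boundary and the decoded difference can be sketched as below. The bit layout is a hypothetical one chosen for this illustration: one bit that selects the boundary ("0" for upper, "1" for lower), followed by an order-0 exponential-Golomb code of the distance minus 1. The function names are likewise hypothetical.

```python
def exp_golomb_decode(bits, pos=0):
    """Decode one order-0 exp-Golomb codeword starting at pos; return (n, next_pos)."""
    zeros = 0
    while bits[pos + zeros] == "0":  # count the zero prefix
        zeros += 1
    end = pos + 2 * zeros + 1        # prefix + (zeros + 1) payload bits
    n = int(bits[pos + zeros:end], 2) - 1
    return n, end

def decode_out_of_range(bits, lower, upper, pos=0):
    """Rebuild the element value from the boundary bit and the coded distance."""
    crossed_lower = bits[pos] == "1"
    n, next_pos = exp_golomb_decode(bits, pos + 1)
    distance = n + 1                 # the distance to the boundary is at least 1
    value = lower - distance if crossed_lower else upper + distance
    return value, next_pos

print(decode_out_of_range("0010", -10, 10))  # (12, 4)
print(decode_out_of_range("11", -10, 10))    # (-11, 2)
```

The decoder must derive the same boundary values as the encoder, for example from the same entropy estimation output, or the reconstructed value will be shifted.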
解码端在确定第一元素的边界时,与编码端类似,可以将码流输入至熵估计网络中,获取第一元素的概率分布情况,然后解码端根据第一元素的概率分布情况确定出第一元素的边界值;或者,解码端也可以根据训练好的网络获得第一边界值,例如,解码端可以将码流输入至熵估计网络,由熵估计网络输出第一元素的边界值。通常情况下,编码端如何获取第一元素的边界值,解码端可以根据码流进行逆向操作相应获得第一元素的边界值。
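Further, after performing step 1103a or step 1103b for each element, the decoder obtains the feature map of the image and can then input the feature map into the decoding network to obtain the reconstructed image. If the encoder used the encoding network shown in (a) of FIG. 5, the decoder may use the decoding network shown in (b) of FIG. 5.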
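In the foregoing method embodiment, the encoder obtains the feature map of the image through the encoding network, determines the boundary value of each element in the feature map, and determines the target range of each element from its boundary value. If the value of an element is within its target range, the element is entropy encoded; if it is outside, the element is marked as an out-of-range element and encoded with a variable-length code. Correspondingly, when the decoder determines from the obtained bitstream that the first element is not an out-of-range element, it entropy decodes the encoded information of the first element; when it determines that the first element is an out-of-range element, it variable-length decodes the encoded information of the first element. Because a boundary value is determined for each element individually, the target range of each element can be narrowed. A narrower target range reduces the complexity of the probability distribution of the element within that range, which lowers the complexity of entropy encoding and increases its speed. For out-of-range elements, existing entropy encoding optimizations use a fixed-length code, which occupies a relatively large number of bits. In the foregoing embodiment, out-of-range elements are encoded with a variable-length code, which helps improve compression performance; in particular, variable-length coding the difference between an out-of-range element and the boundary further reduces the number of bits required after encoding and further improves compression performance.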
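Based on the same technical concept, an embodiment of this application further provides an encoding apparatus, which is configured to implement the functions of the encoder in the foregoing method embodiments. The apparatus may include modules/units that perform any possible implementation of the foregoing method embodiments; these modules/units may be implemented by hardware, or by hardware executing corresponding software.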
示例性的,该装置可以如图12所示,包括:编码网络模块1201、确定模块1202、判断模块1203以及编码模块1204。
具体的,编码网络模块1201,用于将图像输入至编码网络,得到所述图像的特征图,所述特征图包括多个元素。
确定模块1202,用于确定所述特征图中的第一元素的边界值,所述第一元素为所述多个元素中的任一个元素。
判断模块1203,用于判断所述第一元素是否处于所述第一元素对应的目标范围内,所述目标范围是根据所述第一元素的边界值确定的。
编码模块1204,用于当所述第一元素处于所述第一元素对应的目标范围内时,则对所述第一元素进行熵编码;当所述第一元素处于所述第一元素对应的目标范围外时:将所述第一元素修改为所述第一元素对应的边界值,并对修改后的第一元素进行熵编码。
在一种可能的实现方式中,所述边界值包括上边界值和/或下边界值;所述编码模块1204,在将所述第一元素修改为所述第一元素对应的边界值时,具体用于:若所述第一元素大于所述上边界值,则将所述第一元素修改为所述上边界值,或者若所述第一元素小于所述下边界值,则将所述第一元素修改为所述下边界值。
在一种可能的实现方式中,所述确定模块1202,在确定所述特征图中的第一元素的边界值时,具体用于:将所述特征图输入至熵估计网络,所述熵估计网络输出所述特征图中的第一元素的概率分布模型;根据所述第一元素的概率分布模型确定所述第一元素的边界值。
在一种可能的实现方式中,所述确定模块1202,在根据所述第一元素的概率分布模型确定所述第一元素的边界值时,具体用于:根据所述第一元素的概率分布模型的方差,确定所述第一元素的边界值。
在一种可能的实现方式中,所述边界值包括上边界值和/或下边界值;所述边界值中的上边界值为k*σ,和/或,所述边界值中的下边界值为-k*σ;其中,k为常数,σ表示所述概率分布模型的方差。
在一种可能的实现方式中,所述确定模块1202,在确定特征图中的第一元素的边界值时,具体用于:将所述特征图输入至熵估计网络,所述熵估计网络输出所述第一元素的概率分布模型和所述第一元素的边界值。
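Based on the same technical concept, an embodiment of this application further provides another encoding apparatus, which is configured to implement the functions of the encoder in the foregoing method embodiments. The apparatus may include modules/units that perform any possible implementation of the foregoing method embodiments; these modules/units may be implemented by hardware, or by hardware executing corresponding software.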
示例性的,该装置可以如图13所示,包括:编码网络模块1301、确定模块1302、判断模块1303以及编码模块1304。
具体的,编码网络模块1301,用于将图像输入至编码网络,得到所述图像的特征图,所述特征图包括多个元素。
确定模块1302,用于确定所述特征图中的第一元素的边界值,所述第一元素为所述多个元素中的任一个元素。
判断模块1303,用于判断所述第一元素是否处于所述第一元素对应的目标范围内,所述目标范围是根据所述第一元素的边界值确定的。
编码模块1304,用于当所述第一元素处于所述第一元素对应的目标范围内时,则对所述第一元素进行熵编码;当所述第一元素处于所述第一元素对应的目标范围外时:标记所述第一元素为越界元素,对所述第一元素进行变长码编码。
在一种可能的实现方式中,所述编码模块1304,在标记所述第一元素为越界元素时,具体用于:将用于指示所述第一元素为越界元素的标志信息编码至码流中。
在一种可能的实现方式中,所述边界值包括上边界值和/或下边界值;所述编码模块1304,在对所述第一元素进行变长码编码时,具体用于:若所述第一元素大于所述上边界值,确定所述第一元素与所述上边界值的第一差值;对所述第一差值进行变长码编码;或者,若所述第一元素小于所述下边界值,确定所述第一元素与所述下边界值的第二差值;对所述第二差值进行变长码编码。
在一种可能的实现方式中,所述确定模块1302,在确定所述特征图中的第一元素的边界值时,具体用于:将所述特征图输入至熵估计网络,所述熵估计网络输出所述特征图中的第一元素的概率分布模型;根据所述第一元素的概率分布模型确定所述第一元素的边界值。
在一种可能的实现方式中,所述确定模块1302,在根据所述第一元素的概率分布模型确定所述第一元素的边界值时,具体用于:根据所述第一元素的概率分布模型的方差,确定所述第一元素的边界值。
在一种可能的实现方式中,所述边界值包括上边界值和/或下边界值;所述边界值中的上边界值为k*σ,和/或,所述边界值中的下边界值为-k*σ;其中,k为常数,σ表示所述概率分布模型的方差。
在一种可能的实现方式中,所述确定模块1302,在确定特征图中的第一元素的边界值时,具体用于:将所述特征图输入至熵估计网络,所述熵估计网络输出所述第一元素的概率分布模型和所述第一元素的边界值。
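Based on the same technical concept, an embodiment of this application further provides a decoding apparatus, which is configured to implement the functions of the decoder in the foregoing method embodiments. The apparatus may include modules/units that perform any possible implementation of the foregoing method embodiments; these modules/units may be implemented by hardware, or by hardware executing corresponding software.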
示例性的,该装置可以如图14所示,包括:获取模块1401、确定模块1402、解码模块1403。
具体的,获取模块1401,用于获取码流,所述码流包括多个元素编码后的信息。
确定模块1402,用于确定所述码流中的第一元素是否为越界元素,所述第一元素为所述多个元素中的任一个元素。
解码模块1403,用于当所述第一元素为越界元素时,对所述第一元素编码后的信息进行变长码解码,得到所述第一元素;当所述第一元素不是越界元素,对所述第一元素编码后的信息进行熵解码,得到所述第一元素。
在一种可能的实现方式中,所述确定模块1402,在确定所述码流中的第一元素是否为越界元素时,具体用于:若所述码流中包括所述第一元素越界标志信息,则确定所述第一元素为越界元素。
在一种可能的实现方式中,所述解码模块1403,在对所述第一元素编码后的信息进行变长码解码,得到第一元素时,具体用于:确定所述第一元素的边界值;对所述第一元素编码后的信息进行变长码解码得到差值,所述差值为所述第一元素与所述边界值中上边界值的差值,或者为所述第一元素与所述边界值中下边界值的差值;根据所述边界值和所述差值确定所述第一元素。
在一种可能的实现方式中,所述解码模块1403,在确定所述第一元素的边界值时,具体用于:确定所述第一元素的概率分布模型;根据所述第一元素的概率分布模型确定所述第一元素的边界值。
在一种可能的实现方式中,所述解码模块1403,在根据所述第一元素的概率分布模型确定所述第一元素的边界值时,具体用于:根据所述第一元素的概率分布模型的方差,确定所述第一元素的边界值。
在一种可能的实现方式中,所述边界值包括上边界值和/或下边界值;所述边界值中的上边界值为k*σ,和/或,所述边界值中的下边界值为-k*σ;其中,k为常数,σ表示所述概率分布模型的方差。
在一种可能的实现方式中,所述解码模块1403,在确定所述第一元素的边界值时,具体用于:将所述码流输入至熵估计网络,所述熵估计网络输出所述第一元素的边界值。
本申请实施例还提供一种计算机设备。该计算机设备包括如图15所示的处理器1501,以及与处理器1501连接的存储器1502。进一步的,该计算机设备还可以包括通信接口1503以及通信总线1504。
处理器1501可以是通用处理器,微处理器,特定集成电路(application specific integrated circuit,ASIC),现场可编程门阵列(field programmable gate array,FPGA)或者其他可编程逻辑器件,分立门或者晶体管逻辑器件,或一个或多个用于控制本申请方案程序执行的集成电路等。通用处理器可以是微处理器或者任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件处理器执行完成,或者用处理器中的硬件及软件模块组合执行完成。
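The memory 1502 is configured to store program instructions and/or data, so that the processor 1501 can invoke the instructions and/or data stored in the memory 1502 to implement the foregoing functions of the processor 1501. The memory 1502 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, or an electrically erasable programmable read-only memory (EEPROM) or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 1502 may exist independently, for example as off-chip memory, and be connected to the processor 1501 through the communication bus 1504, or it may be integrated with the processor 1501. The memory 1502 may include internal memory and external memory (such as a hard disk).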
通信接口1503,用于与其他设备通信,如PCI总线接口、网卡,无线接入网(radio access network,RAN),无线局域网(wireless local area networks,WLAN)等。
通信总线1504可包括一通路,用于在上述组件之间传送信息。
示例性的,该计算机设备可以为图4或图10中的编码端,也可以是图11所示的解码端。
当该计算机设备为编码端时,处理器1501可以调用存储器1502中的指令执行以下步骤:
将图像输入至编码网络,得到所述图像的特征图,所述特征图包括多个元素;确定所述特征图中的第一元素的边界值,所述第一元素为所述多个元素中的任一个元素;判断所述第一元素是否处于所述第一元素对应的目标范围内,所述目标范围是根据所述第一元素的边界值确定的;若所述第一元素处于所述第一元素对应的目标范围内,则对所述第一元素进行熵编码;若所述第一元素处于所述第一元素对应的目标范围外:将所述第一元素修改为所述第一元素对应的边界值,并对修改后的第一元素进行熵编码;或者,标记所述第一元素为越界元素,对所述第一元素进行变长码编码。
此外,上述各个部件还可以用于支持图4或图10所示实施例中编码端所执行的其它过程。有益效果可参考前面的描述,此处不再赘述。
当该计算机设备为解码端时,处理器1501可以调用存储器1502中的指令执行以下步骤:
获取码流,所述码流包括多个元素编码后的信息;确定所述码流中的第一元素是否为越界元素,所述第一元素为所述多个元素中的任一个元素;若所述第一元素为越界元素,对所述第一元素编码后的信息进行变长码解码,得到所述第一元素;若所述第一元素不是越界元素,对所述第一元素编码后的信息进行熵解码,得到所述第一元素。
此外,上述各个部件还可以用于支持图11所示实施例中解码端所执行的其它过程。有益效果可参考前面的描述,此处不再赘述。
基于相同的技术构思,本申请实施例还提供一种计算机可读存储介质,所述计算机可读存储介质中存储有计算机可读指令,当所述计算机可读指令在计算机上运行时,使得上述方法实施例被执行。
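Based on the same technical concept, an embodiment of this application further provides a computer program product containing instructions that, when run on a computer, cause any of the foregoing method embodiments to be performed.
Based on the same technical concept, an embodiment of this application further provides a computer-readable storage medium storing a bitstream, where the bitstream is generated according to the encoding method shown in FIG. 4 or FIG. 10.
Based on the same technical concept, an embodiment of this application further provides a computer-readable storage medium storing a bitstream, where the bitstream includes program instructions executable by a decoder, and the program instructions cause the decoder to perform the decoding method according to the third aspect or any possible implementation of the third aspect.
Based on the same technical concept, an embodiment of this application further provides a coding system including at least one memory and a decoder, where the at least one memory is configured to store a bitstream and the decoder is configured to perform the decoding method shown in FIG. 11.
Based on the same technical concept, an embodiment of this application further provides a method for storing a bitstream, including receiving or generating a bitstream and storing the bitstream in a storage medium.
In a possible implementation, the method further includes: performing format conversion on the bitstream to obtain a format-converted bitstream, and storing the format-converted bitstream in the storage medium.
Based on the same technical concept, an embodiment of this application further provides a method for transmitting a bitstream, including receiving or generating a bitstream and transmitting the bitstream to a cloud server or to a mobile terminal.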
需要理解的是,在本申请的描述中,“第一”、“第二”等词汇,仅用于区分描述的目的,而不能理解为指示或暗示相对重要性,也不能理解为指示或暗示顺序。在本说明书中描述的参考“一个实施例”或“一些实施例”等意味着在本申请的一个或多个实施例中包括结合该实施例描述的特定特征、结构或特点。由此,在本说明书中的不同之处出现的语句“在一个实施例中”、“在一些实施例中”、“在其他一些实施例中”、“在另外一些实施例中”等不是必然都参考相同的实施例,而是意味着“一个或多个但不是所有的实施例”,除非是以其他方式另外特别强调。术语“包括”、“包含”、“具有”及它们的变形都意味着“包括但不限于”,除非是以其他方式另外特别强调。
本领域内的技术人员应明白,本申请的实施例可提供为方法、系统、或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。
本申请是参照根据本申请实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
尽管已描述了本申请的优选实施例,但本领域内的技术人员一旦得知了基本创造性概念,则可对这些实施例作出另外的变更和修改。所以,所附权利要求意欲解释为包括优选实施例以及落入本申请范围的所有变更和修改。
显然,本领域的技术人员可以对本申请实施例进行各种改动和变型而不脱离本申请实施例的精神和范围。这样,倘若本申请实施例的这些修改和变型属于本申请权利要求及其等同技术的范围之内,则本申请也意图包含这些改动和变型在内。

Claims (42)

  1. 一种编码方法,其特征在于,所述方法包括:
    将图像输入至编码网络,得到所述图像的特征图,所述特征图包括多个元素;
    确定所述特征图中的第一元素的边界值,所述第一元素为所述多个元素中的任一个元素;
    判断所述第一元素是否处于所述第一元素对应的目标范围内,所述目标范围是根据所述第一元素的边界值确定的;
    若所述第一元素处于所述第一元素对应的目标范围内,则对所述第一元素进行熵编码;
    若所述第一元素处于所述第一元素对应的目标范围外,标记所述第一元素为越界元素,对所述第一元素进行变长码编码。
  2. 根据权利要求1所述的方法,其特征在于,所述标记所述第一元素为越界元素,包括:
    将用于指示所述第一元素为越界元素的标志信息编码至码流中。
  3. 根据权利要求1或2所述的方法,其特征在于,所述边界值包括上边界值和/或下边界值;
    所述对所述第一元素进行变长码编码,包括:
    若所述第一元素大于所述上边界值,确定所述第一元素与所述上边界值的第一差值;对所述第一差值进行变长码编码;或者
    若所述第一元素小于所述下边界值,确定所述第一元素与所述下边界值的第二差值;对所述第二差值进行变长码编码。
  4. 根据权利要求1-3任一项所述的方法,其特征在于,所述确定所述特征图中的第一元素的边界值,包括:
    将所述特征图输入至熵估计网络,所述熵估计网络输出所述特征图中的第一元素的概率分布模型;
    根据所述第一元素的概率分布模型确定所述第一元素的边界值。
  5. 根据权利要求4所述的方法,其特征在于,所述根据所述第一元素的概率分布模型确定所述第一元素的边界值,包括:
    根据所述第一元素的概率分布模型的方差,确定所述第一元素的边界值。
  6. 根据权利要求5所述的方法,其特征在于,所述边界值包括上边界值和/或下边界值;
    所述边界值中的上边界值为k*σ,和/或,所述边界值中的下边界值为-k*σ;
    其中,k为常数,σ表示所述概率分布模型的方差。
  7. 根据权利要求1-3任一项所述的方法,其特征在于,所述确定特征图中的第一元素的边界值,包括:
    将所述特征图输入至熵估计网络,所述熵估计网络输出所述第一元素的概率分布模型和所述第一元素的边界值。
  8. 一种解码方法,其特征在于,所述方法包括:
    获取码流,所述码流包括多个元素编码后的信息;
    确定所述码流中的第一元素是否为越界元素,所述第一元素为所述多个元素中的任一个元素;
    若所述第一元素为越界元素,对所述第一元素编码后的信息进行变长码解码,得到所述第一元素;
    若所述第一元素不是越界元素,对所述第一元素编码后的信息进行熵解码,得到所述第一元素。
  9. 根据权利要求8所述的方法,其特征在于,所述确定所述码流中的第一元素是否为越界元素,包括:
    若所述码流中包括所述第一元素为越界元素的标志信息,则确定所述第一元素为越界元素。
  10. 根据权利要求8或9所述的方法,其特征在于,所述对所述第一元素编码后的信息进行变长码解码,得到第一元素,包括:
    确定所述第一元素的边界值;
    对所述第一元素编码后的信息进行变长码解码得到差值,所述差值为所述第一元素与所述边界值中上边界值的差值,或者为所述第一元素与所述边界值中下边界值的差值;
    根据所述边界值和所述差值确定所述第一元素。
  11. 根据权利要求10所述的方法,其特征在于,所述确定所述第一元素的边界值,包括:
    确定所述第一元素的概率分布模型;
    根据所述第一元素的概率分布模型确定所述第一元素的边界值。
  12. 根据权利要求11所述的方法,其特征在于,所述根据所述第一元素的概率分布模型确定所述第一元素的边界值,包括:
    根据所述第一元素的概率分布模型的方差,确定所述第一元素的边界值。
  13. 根据权利要求12所述的方法,其特征在于,所述边界值包括上边界值和/或下边界值;
    所述边界值中的上边界值为k*σ,和/或,所述边界值中的下边界值为-k*σ;
    其中,k为常数,σ表示所述概率分布模型的方差。
  14. 根据权利要求10所述的方法,其特征在于,所述确定所述第一元素的边界值,包括:
    将所述码流输入至熵估计网络,所述熵估计网络输出所述第一元素的边界值。
  15. 一种编码方法,其特征在于,所述方法包括:
    将图像输入至编码网络,得到所述图像的特征图,所述特征图包括多个元素;
    确定所述特征图中的第一元素的边界值,所述第一元素为所述多个元素中的任一个元素;
    判断所述第一元素是否处于所述第一元素对应的目标范围内,所述目标范围是根据所述第一元素的边界值确定的;
    若所述第一元素处于所述第一元素对应的目标范围内,则对所述第一元素进行熵编码;
    若所述第一元素处于所述第一元素对应的目标范围外,将所述第一元素修改为所述第一元素对应的边界值,并对修改后的第一元素进行熵编码。
  16. 根据权利要求15所述的方法,其特征在于,所述边界值包括上边界值和/或下边界值;
    所述将所述第一元素修改为所述第一元素对应的边界值,包括:
    若所述第一元素大于所述上边界值,则将所述第一元素修改为所述上边界值,或者若所述第一元素小于所述下边界值,则将所述第一元素修改为所述下边界值。
  17. 根据权利要求15或16所述的方法,其特征在于,所述确定所述特征图中的第一元素的边界值,包括:
    将所述特征图输入至熵估计网络,所述熵估计网络输出所述特征图中的第一元素的概率分布模型;
    根据所述第一元素的概率分布模型确定所述第一元素的边界值。
  18. 根据权利要求17所述的方法,其特征在于,所述根据所述第一元素的概率分布模型确定所述第一元素的边界值,包括:
    根据所述第一元素的概率分布模型的方差,确定所述第一元素的边界值。
  19. 根据权利要求18所述的方法,其特征在于,所述边界值包括上边界值和/或下边界值;
    所述边界值中的上边界值为k*σ,和/或,所述边界值中的下边界值为-k*σ;
    其中,k为常数,σ表示所述概率分布模型的方差。
  20. 根据权利要求15或16所述的方法,其特征在于,所述确定特征图中的第一元素的边界值,包括:
    将所述特征图输入至熵估计网络,所述熵估计网络输出所述第一元素的概率分布模型和所述第一元素的边界值。
  21. 一种编码装置,其特征在于,所述装置包括:
    编码网络模块,用于将图像输入至编码网络,得到所述图像的特征图,所述特征图包括多个元素;
    确定模块,用于确定所述特征图中的第一元素的边界值,所述第一元素为所述多个元素中的任一个元素;
    判断模块,用于判断所述第一元素是否处于所述第一元素对应的目标范围内,所述目标范围是根据所述第一元素的边界值确定的;
    编码模块,用于当所述第一元素处于所述第一元素对应的目标范围内时,则对所述第一元素进行熵编码;当所述第一元素处于所述第一元素对应的目标范围外时:标记所述第一元素为越界元素,对所述第一元素进行变长码编码。
  22. 根据权利要求21所述的装置,其特征在于,所述编码模块,在标记所述第一元素为越界元素时,具体用于:
    将用于指示所述第一元素为越界元素的标志信息编码至码流中。
  23. 根据权利要求21或22所述的装置,其特征在于,所述边界值包括上边界值和/或下边界值;
    所述编码模块,在对所述第一元素进行变长码编码时,具体用于:
    若所述第一元素大于所述上边界值,确定所述第一元素与所述上边界值的第一差值;对所述第一差值进行变长码编码;或者
    若所述第一元素小于所述下边界值,确定所述第一元素与所述下边界值的第二差值;对所述第二差值进行变长码编码。
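  24. The apparatus according to any one of claims 21 to 23, wherein the determining module, when determining the boundary value of the first element in the feature map, is specifically configured to: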
    将所述特征图输入至熵估计网络,所述熵估计网络输出所述特征图中的第一元素的概率分布模型;
    根据所述第一元素的概率分布模型确定所述第一元素的边界值。
  25. 根据权利要求24所述的装置,其特征在于,所述确定模块,在根据所述第一元素的概率分布模型确定所述第一元素的边界值时,具体用于:
    根据所述第一元素的概率分布模型的方差,确定所述第一元素的边界值。
  26. 根据权利要求25所述的装置,其特征在于,所述边界值包括上边界值和/或下边界值;
    所述边界值中的上边界值为k*σ,和/或,所述边界值中的下边界值为-k*σ;
    其中,k为常数,σ表示所述概率分布模型的方差。
  27. 根据权利要求21-23任一项所述的装置,其特征在于,所述确定模块,在确定特征图中的第一元素的边界值时,具体用于:
    将所述特征图输入至熵估计网络,所述熵估计网络输出所述第一元素的概率分布模型和所述第一元素的边界值。
  28. 一种解码装置,其特征在于,所述装置包括:
    获取模块,用于获取码流,所述码流包括多个元素编码后的信息;
    确定模块,用于确定所述码流中的第一元素是否为越界元素,所述第一元素为所述多个元素中的任一个元素;
    解码模块,用于当所述第一元素为越界元素时,对所述第一元素编码后的信息进行变长码解码,得到所述第一元素;当所述第一元素不是越界元素,对所述第一元素编码后的信息进行熵解码,得到所述第一元素。
  29. 根据权利要求28所述的装置,其特征在于,所述确定模块,在确定所述码流中的第一元素是否为越界元素时,具体用于:
    若所述码流中包括所述第一元素越界标志信息,则确定所述第一元素为越界元素。
  30. 根据权利要求28或29所述的装置,其特征在于,所述解码模块,在对所述第一元素编码后的信息进行变长码解码,得到第一元素时,具体用于:
    确定所述第一元素的边界值;
    对所述第一元素编码后的信息进行变长码解码得到差值,所述差值为所述第一元素与所述边界值中上边界值的差值,或者为所述第一元素与所述边界值中下边界值的差值;
    根据所述边界值和所述差值确定所述第一元素。
  31. 根据权利要求30所述的装置,其特征在于,所述解码模块,在确定所述第一元素的边界值时,具体用于:
    确定所述第一元素的概率分布模型;
    根据所述第一元素的概率分布模型确定所述第一元素的边界值。
  32. 根据权利要求31所述的装置,其特征在于,所述解码模块,在根据所述第一元素的概率分布模型确定所述第一元素的边界值时,具体用于:
    根据所述第一元素的概率分布模型的方差,确定所述第一元素的边界值。
  33. 根据权利要求32所述的装置,其特征在于,所述边界值包括上边界值和/或下边界值;
    所述边界值中的上边界值为k*σ,和/或,所述边界值中的下边界值为-k*σ;
    其中,k为常数,σ表示所述概率分布模型的方差。
  34. 根据权利要求30所述的装置,其特征在于,所述解码模块,在确定所述第一元素的边界值时,具体用于:
    将所述码流输入至熵估计网络,所述熵估计网络输出所述第一元素的边界值。
  35. 一种编码器,其特征在于,包括处理电路,用于执行如权利要求1至7任一项所述的编码方法,或者,用于执行如权利要求15至20任一项所述的编码方法。
  36. 一种解码器,其特征在于,包括处理电路,用于执行如权利要求8至14任一项所述的解码方法。
  37. 一种编码器,其特征在于,包括:
    一个或多个处理器;
    计算机可读存储介质,耦合到所述一个或多个处理器,所述计算机可读存储介质存储有程序,其中,所述程序在由所述一个或多个处理器执行时,使得所述编码器执行如权利要求1至7任一项所述的编码方法,或者,执行如权利要求15至20任一项所述的编码方法。
  38. 一种解码器,其特征在于,包括:
    一个或多个处理器;
    计算机可读存储介质,耦合到所述一个或多个处理器,所述计算机可读存储介质存储有程序,其中,所述程序在由所述一个或多个处理器执行时,使得所述解码器执行如权利要求8至14任一项所述的解码方法。
  39. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质中存储有指令,当所述指令在计算机上运行时,使得所述计算机执行如权利要求1至7、8至14、15至20任一项所述的方法。
  40. 一种计算机程序产品,其特征在于,包括程序代码,当所述程序代码在计算机或处理器上执行时,用于执行如权利要求1至7、8至14、15至20任一项所述的方法。
  41. 一种计算机可读存储介质,其特征在于,所述存储介质中存储有码流,所述码流是根据权利要求1至7、15至20任一项所述的编码方法生成的。
  42. 一种计算机可读存储介质,其特征在于,存储有包含程序代码的码流,当所述程序代码被一个或多个处理器执行时,使得解码器执行如权利要求8至14任一项所述的解码方法。
PCT/CN2023/100760 2022-07-08 2023-06-16 一种编解码方法、装置及计算机设备 WO2024007843A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210806919.4A CN117412046A (zh) 2022-07-08 2022-07-08 一种编解码方法、装置及计算机设备
CN202210806919.4 2022-07-08

Publications (2)

Publication Number Publication Date
WO2024007843A1 true WO2024007843A1 (zh) 2024-01-11
WO2024007843A9 WO2024007843A9 (zh) 2024-02-15

Family

ID=89454177

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/100760 WO2024007843A1 (zh) 2022-07-08 2023-06-16 一种编解码方法、装置及计算机设备

Country Status (3)

Country Link
CN (1) CN117412046A (zh)
TW (1) TW202408230A (zh)
WO (1) WO2024007843A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10652581B1 (en) * 2019-02-27 2020-05-12 Google Llc Entropy coding in image and video compression using machine learning
CN111988629A (zh) * 2019-05-22 2020-11-24 富士通株式会社 图像编码方法和装置、图像解码方法和装置
CN114339262A (zh) * 2020-09-30 2022-04-12 华为技术有限公司 熵编/解码方法及装置
CN114554205A (zh) * 2020-11-26 2022-05-27 华为技术有限公司 一种图像编解码方法及装置

Also Published As

Publication number Publication date
TW202408230A (zh) 2024-02-16
CN117412046A (zh) 2024-01-16
WO2024007843A9 (zh) 2024-02-15

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23834617

Country of ref document: EP

Kind code of ref document: A1