WO2023207836A1 - Image encoding method and apparatus, and image decompression method and apparatus - Google Patents

Image encoding method and apparatus, and image decompression method and apparatus

Info

Publication number
WO2023207836A1
Authority
WO
WIPO (PCT)
Prior art keywords
residual
image
encoding
input
model
Prior art date
Application number
PCT/CN2023/090043
Other languages
French (fr)
Chinese (zh)
Inventor
康宁
仇善召
张鸣天
张世枫
李震国
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2023207836A1


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/13 Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/182 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being a pixel
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/44 Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/91 Entropy coding, e.g. variable length coding [VLC] or arithmetic coding

Definitions

  • the present application relates to the field of image processing, and in particular, to an image encoding method, image decompression method and device.
  • Images are widely used in many fields, and a large number of scenarios involve transmitting or storing images. The higher the resolution of an image, the more storage space is consumed when saving it, the higher the bandwidth required to transmit it, and the lower the transmission efficiency. Therefore, to facilitate transmission or storage, images are generally compressed to reduce the number of bits they occupy, thereby reducing the storage space required to save an image and the bandwidth required to transmit it.
  • Entropy coding can be used for image compression.
  • Commonly used entropy coding algorithms for image compression include Huffman coding, arithmetic coding, ANS coding, and the like.
  • However, the compression rates of these entropy coding methods have essentially reached an optimal level, and it is difficult to further improve the compression rate. Therefore, how to improve encoding and decoding efficiency has become an urgent problem to be solved.
  • This application provides an image encoding method, an image decompression method and an apparatus that encode by combining the outputs of an autoregressive model and an autoencoding model, reducing the size of the required models and improving encoding and decoding efficiency.
  • The present application provides an image encoding method, which includes: using the input image as the input of the autoregressive model and outputting a first image; calculating the residual between the first image and the input image to obtain a first residual image; using the input image as the input of the autoencoding model and outputting latent variables and a first residual distribution, where the latent variables include features extracted from the input image, and the first residual distribution includes the autoencoding model's prediction used to represent the residual value corresponding to each pixel in the input image and each pixel in the first residual image; encoding the first residual image and the first residual distribution to obtain residual encoded data; and encoding the latent variables to obtain latent variable encoded data, where the latent variable encoded data and the residual encoded data are used to obtain the input image after decompression.
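  • As an illustration only, the encoding flow summarized above can be sketched as follows; the model and coder objects passed in (autoregressive_model, autoencoder, semi_dynamic_encoder, static_encoder) are hypothetical placeholders, not the actual implementation of this application:

```python
import numpy as np

def encode_image(input_image, autoregressive_model, autoencoder,
                 semi_dynamic_encoder, static_encoder):
    """Illustrative sketch of the encoding flow; all passed-in objects are placeholders."""
    # 1. Forward pass of the autoregressive model predicts each pixel from
    #    already-available pixels, producing the first image.
    first_image = autoregressive_model.forward(input_image)

    # 2. First residual image: difference between the input image and the prediction.
    first_residual = input_image.astype(np.int16) - first_image.astype(np.int16)

    # 3. Autoencoding model outputs latent variables (features of the input image)
    #    and the first residual distribution (predicted distribution of the residuals).
    latent, first_residual_distribution = autoencoder(input_image)

    # 4. The residual image is entropy-coded under the predicted distribution
    #    by the semi-dynamic entropy encoder.
    residual_coded = semi_dynamic_encoder.encode(first_residual,
                                                 first_residual_distribution)

    # 5. The latent variables are entropy-coded by a static entropy encoder.
    latent_coded = static_encoder.encode(latent)

    # Both bitstreams are needed to recover the input image after decompression.
    return latent_coded, residual_coded
```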
  • In this way, the output results of the autoregressive model and the autoencoding model are combined for coding, which allows both the autoencoding model and the autoregressive model to be kept very small and avoids the problem of long inference time caused by an excessively large autoencoding network, thereby achieving efficient image compression.
  • In addition, the entire process, including the AI models and the entropy coding, can be implemented as AI lossless compression on an AI chip, which avoids transfers between system memory and AI chip memory and improves coding efficiency.
  • The aforementioned encoding of the first residual image and the first residual distribution to obtain residual encoded data may include: using the first residual image and the first residual distribution as the input of a semi-dynamic entropy encoder and outputting the residual encoded data.
  • The semi-dynamic entropy encoder is used to perform entropy encoding using a first preset type of encoding operation.
  • The first preset type of encoding operation includes addition, subtraction or bit operations.
  • A second preset type of encoding operation is not included in the semi-dynamic entropy encoder.
  • The second preset type includes at least one of multiplication, division or remainder operations; that is, the semi-dynamic entropy encoder does not include time-consuming operations such as multiplication, division or remainder, and may include only simple operations such as addition and subtraction, allowing efficient encoding.
  • Therefore, the residual image can be encoded with semi-dynamic entropy coding using a limited set of distributions. Compared with dynamic entropy coding, time-consuming operations such as multiplication, division and remainder are reduced, and the coding efficiency is greatly improved.
  • The semi-dynamic entropy encoder may be obtained by converting a dynamic entropy encoder.
  • For example, the operations of the dynamic entropy encoder can be approximated, such as by replacing them with approximate operations that reduce or remove multiplication, division, remainder and similar operations; the remaining operations can then be transformed so that all operations above a certain cost (such as remainder, multiplication and division) are converted into table accesses and lightweight operations such as addition, subtraction and bit operations, thereby obtaining the semi-dynamic entropy encoder provided by this application.
  • In other words, the semi-dynamic entropy encoder can be an entropy encoder obtained by replacing or converting some of the operations in a dynamic entropy encoder.
  • In this way, simple and efficient operations such as addition, subtraction and bit operations can be used to achieve efficient encoding.
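  • The following is a minimal sketch, not the coder of this application, of how the expensive steps of a range/ANS-style coder can be restricted to table accesses and add/subtract/bit operations once the total probability scale is fixed to a power of two (an assumption made here purely for illustration):

```python
PROB_BITS = 12                 # assumed probability scale: total frequency = 2**12
PROB_SCALE = 1 << PROB_BITS

def build_tables(probs):
    """Done once per distribution (table construction): quantize probabilities into
    integer frequencies and a cumulative table. Multiplication is acceptable here
    because it is outside the per-symbol coding loop; a real coder would rebalance
    the frequencies more carefully than this naive adjustment."""
    freqs = [max(1, int(p * PROB_SCALE)) for p in probs]
    freqs[-1] += PROB_SCALE - sum(freqs)        # force the exact total
    cdf = [0]
    for f in freqs:
        cdf.append(cdf[-1] + f)
    return freqs, cdf

def symbol_interval(cdf, symbol):
    """Per-symbol lookup: table access and subtraction only."""
    low = cdf[symbol]
    width = cdf[symbol + 1] - low
    return low, width

def split_state(state):
    """Where a dynamic coder would compute `state % total` and `state // total`,
    a power-of-two total turns both into bit operations."""
    return state & (PROB_SCALE - 1), state >> PROB_BITS
```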
  • The aforementioned encoding of the latent variables to obtain latent variable encoded data may include: using the latent variables as the input of a static entropy encoder to obtain the latent variable encoded data.
  • In this way, static entropy coding can be performed on the features extracted from the input image, and the coding can be performed efficiently.
  • The autoencoding model may include an encoding model and a decoding model. Using the input image as the input of the autoencoding model and outputting the latent variables and the first residual distribution may include: using the input image as the input of the encoding model to output the latent variables, where the encoding model is used to extract features from the input image; and using the latent variables as the input of the decoding model to obtain the first residual distribution, where the decoding model is used to predict the distribution of the residual values between the input image and the corresponding predicted pixels.
  • In this way, a trained autoencoding model can be used to extract important features from the input image and predict the corresponding residual image, so that, combined with the output of the autoregressive model, residual encoded data that can represent the data in the input image is obtained.
  • The autoregressive model is used to predict the values of pixels on the same line using the pixel values of pixels that have already been predicted, so that in the subsequent decoding process the pixels on the same line do not need to wait for other pixels to be decoded before the current pixel can be decoded; pixels on the same line are thus decoded efficiently, improving the decoding efficiency of the input image.
  • This application further provides an image decompression method, which includes: obtaining latent variable encoded data and residual encoded data.
  • The latent variable encoded data is obtained by encoding the features extracted by the encoding end from the input image.
  • The residual encoded data is obtained by encoding the residual between the image output by the forward propagation of the autoregressive model and the input image; the latent variable encoded data is decoded to obtain the latent variables.
  • The latent variables include the features extracted by the encoding end from the input image; the latent variables are used as the input of the autoencoding model to output a second residual distribution, the second residual distribution and the residual encoded data are combined for decoding to obtain a second residual image, and the second residual image is used as the input of the back propagation of the autoregressive model to output the decompressed image.
  • An autoencoding model alone usually has limited fitting ability and requires a deeper network to achieve a better compression rate; this application combines the output results of the autoregressive model, which reduces the required size of the autoencoding model. Therefore, in this application the autoregressive model and the autoencoding model are combined for decoding, and both models can be kept very small, which avoids the problem of excessively long inference time caused by an overly large autoencoding network and enables efficient image decompression.
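  • A corresponding illustrative sketch of the decompression flow follows; all objects passed in are again hypothetical placeholders rather than the actual implementation:

```python
def decompress_image(latent_coded, residual_coded, autoregressive_model,
                     autoencoding_model, semi_dynamic_decoder, static_decoder):
    """Illustrative sketch of the decompression flow; not the actual implementation."""
    # 1. Static entropy decoding recovers the latent variables.
    latent = static_decoder.decode(latent_coded)

    # 2. The autoencoding model maps the latent variables to the
    #    second residual distribution.
    second_residual_distribution = autoencoding_model(latent)

    # 3. Semi-dynamic entropy decoding of the residual coded data under that
    #    distribution yields the second residual image.
    second_residual = semi_dynamic_decoder.decode(residual_coded,
                                                  second_residual_distribution)

    # 4. The back-propagation (inverse) pass of the autoregressive model adds its
    #    per-pixel predictions back onto the residuals to rebuild the image.
    return autoregressive_model.backward(second_residual)
```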
  • In addition, the entire process, including the AI models and the entropy coding, can be implemented as AI lossless compression on an AI chip, which avoids transfers between system memory and AI chip memory and improves coding efficiency.
  • the aforementioned decoding of latent variable encoded data to obtain latent variables includes: using latent variable encoded data as input to a static entropy encoder and outputting latent variables.
  • This decoding can be understood as the inverse operation of the static entropy coding performed by the encoding end, so that the important features of the image can be losslessly recovered.
  • The aforementioned decoding that combines the second residual distribution and the residual encoded data to obtain the second residual image may include: using the second residual distribution and the residual encoded data as the input of the semi-dynamic entropy encoder and outputting the second residual image.
  • The semi-dynamic entropy encoder is used to perform entropy encoding using a first preset type of encoding operation.
  • The first preset type of encoding operation includes addition, subtraction or bit operations, and the semi-dynamic entropy encoder does not include a second preset type of encoding operation; the second preset type includes at least one of multiplication, division or remainder operations, that is, the semi-dynamic entropy encoder does not include multiplication, division or remainder operations.
  • The semi-dynamic entropy encoder may include only simple operations such as addition and subtraction, achieving high-efficiency coding. Therefore, the residual image can be decoded based on semi-dynamic entropy coding using a limited set of distributions. Compared with dynamic entropy coding, time-consuming operations such as multiplication, division and remainder are reduced, and the decoding efficiency is greatly improved.
  • The semi-dynamic entropy encoder may be obtained by converting a dynamic entropy encoder.
  • For example, the operations of the dynamic entropy encoder can be approximated, such as by replacing them with approximate operations that reduce or remove multiplication, division, remainder and similar operations; the remaining operations can then be transformed so that all operations above a certain cost (such as remainder, multiplication and division) are converted into table accesses and lightweight operations such as addition, subtraction and bit operations, thereby obtaining the semi-dynamic entropy encoder provided by this application.
  • In other words, the semi-dynamic entropy encoder can be an entropy encoder obtained by replacing or converting some of the operations in a dynamic entropy encoder.
  • In this way, simple and efficient operations such as addition, subtraction and bit operations can be used to achieve efficient encoding.
  • The aforementioned use of the second residual image as the input of the back propagation of the autoregressive model to output the decompressed image may include: decoding, through the autoregressive model, the pixels on the same line in the second residual image in parallel to obtain the decompressed image.
  • Therefore, pixels on the same line do not need to wait for other pixels to be decoded before the current pixel can be decoded, so that pixels on the same line are decoded efficiently and the decoding efficiency of the input image is improved.
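  • A minimal sketch of this row-parallel idea, assuming for illustration that every pixel's prediction depends only on already-reconstructed rows above it, so an entire row can be recovered in one vectorized step (predict_row is a hypothetical placeholder):

```python
import numpy as np

def reconstruct_row_parallel(residual_image, predict_row):
    """predict_row(previous_rows) -> predicted values for the next row (placeholder).
    The placeholder must also handle the first row, e.g. by returning a constant."""
    h, w = residual_image.shape
    out = np.zeros((h, w), dtype=np.int16)
    for y in range(h):
        # One prediction for the whole row, based only on rows already reconstructed.
        prediction = predict_row(out[:y])            # expected shape: (w,)
        # Every pixel of the row is recovered together; no pixel on this row
        # waits for another pixel on the same row.
        out[y] = prediction + residual_image[y]
    return out
```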
  • this application provides an image coding device, including:
  • an autoregressive module, configured to use the input image as the input of the autoregressive model and output the first image;
  • a residual calculation module, configured to obtain the residual between the first image and the input image to obtain the first residual image;
  • an autoencoding module, configured to use the input image as the input of the autoencoding model and output latent variables and a first residual distribution, where the latent variables include features extracted from the input image, and the first residual distribution includes the output of the autoencoding model used to represent the residual value corresponding to each pixel in the input image and each pixel in the first residual image;
  • a residual encoding module, configured to encode the first residual image and the first residual distribution to obtain residual encoded data; and
  • a latent variable encoding module, configured to encode the latent variables to obtain latent variable encoded data, where the latent variable encoded data and the residual encoded data are used to obtain the input image after decompression.
  • The residual encoding module is specifically configured to use the first residual image and the first residual distribution as the input of a semi-dynamic entropy encoder and output the residual encoded data.
  • The semi-dynamic entropy encoder is used to perform entropy encoding using a first preset type of encoding operation; the first preset type of encoding operation includes addition, subtraction or bit operations, and the semi-dynamic entropy encoder does not include a second preset type of encoding operation.
  • The second preset type includes at least one of multiplication, division or remainder operations; that is, the semi-dynamic entropy encoder does not include time-consuming operations such as multiplication, division or remainder and may include only simple addition and subtraction operations, allowing efficient encoding.
  • The semi-dynamic entropy encoder may be obtained by converting a dynamic entropy encoder.
  • For example, the operations of the dynamic entropy encoder can be approximated, such as by replacing them with approximate operations that reduce or remove multiplication, division, remainder and similar operations; the remaining operations can then be transformed so that all operations above a certain cost (such as remainder, multiplication and division) are converted into table accesses and lightweight operations such as addition, subtraction and bit operations, thereby obtaining the semi-dynamic entropy encoder provided by this application.
  • In other words, the semi-dynamic entropy encoder can be an entropy encoder obtained by replacing or converting some of the operations in a dynamic entropy encoder.
  • In this way, simple and efficient operations such as addition, subtraction and bit operations can be used to achieve efficient encoding.
  • the latent variable encoding module is specifically configured to use latent variables as inputs to the static entropy encoder to obtain latent variable encoded data.
  • The autoencoding model includes an encoding model and a decoding model.
  • The autoencoding module is specifically configured to: use the input image as the input of the encoding model and output the latent variables, where the encoding model is used to extract features from the input image; and use the latent variables as the input of the decoding model to obtain the first residual distribution.
  • The decoding model is used to predict the distribution of the residual values between the input image and the corresponding predicted pixels.
  • The autoregressive model is used to predict the values of pixels on the same line using the pixel values of pixels that have already been predicted.
  • this application provides an image decompression device, including:
  • a transceiver module, configured to obtain latent variable encoded data and residual encoded data, where the latent variable encoded data is obtained by encoding the features extracted by the encoding end from the input image, and the residual encoded data is obtained by encoding the residual between the image output by the autoregressive model and the input image;
  • a latent variable decoding module, configured to decode the latent variable encoded data to obtain latent variables, where the latent variables include the features extracted by the encoding end from the input image;
  • an autoencoding module, configured to use the latent variables as the input of the autoencoding model and output a second residual distribution;
  • a residual decoding module, configured to decode the second residual distribution and the residual encoded data to obtain a second residual image; and
  • an autoregressive module, configured to use the second residual image as the input of the back propagation of the autoregressive model and output the decompressed image.
  • the latent variable decoding module is specifically configured to use the latent variable encoded data as the input of the static entropy encoder and output the latent variable.
  • The residual decoding module is specifically configured to use the second residual distribution and the residual encoded data as the input of a semi-dynamic entropy encoder and output the second residual image.
  • The semi-dynamic entropy encoder is used to perform entropy encoding using a first preset type of encoding operation; the first preset type of encoding operation includes addition, subtraction or bit operations, and the semi-dynamic entropy encoder does not include a second preset type of encoding operation.
  • The second preset type includes at least one of multiplication, division or remainder operations; that is, the semi-dynamic entropy encoder does not include time-consuming operations such as multiplication, division or remainder and may include only simple addition and subtraction operations, thereby enabling efficient encoding.
  • The semi-dynamic entropy encoder may be obtained by converting a dynamic entropy encoder.
  • For example, the operations of the dynamic entropy encoder can be approximated, such as by replacing them with approximate operations that reduce or remove multiplication, division, remainder and similar operations; the remaining operations can then be transformed so that all operations above a certain cost (such as remainder, multiplication and division) are converted into table accesses and lightweight operations such as addition, subtraction and bit operations, thereby obtaining the semi-dynamic entropy encoder provided by this application.
  • In other words, the semi-dynamic entropy encoder can be an entropy encoder obtained by replacing or converting some of the operations in a dynamic entropy encoder.
  • In this way, simple and efficient operations such as addition, subtraction and bit operations can be used to achieve efficient encoding.
  • The autoregressive module is specifically configured to decode the pixels on the same line in the second residual image in parallel through the autoregressive model to obtain the decompressed image.
  • embodiments of the present application provide an image coding device, which has the function of implementing the image processing method in the first aspect.
  • This function can be implemented by hardware, or it can be implemented by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • embodiments of the present application provide an image decompression device, which has the function of implementing the image processing method in the second aspect.
  • This function can be implemented by hardware, or it can be implemented by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • Embodiments of the present application provide an image encoding device, including a processor and a memory, where the processor and the memory are interconnected through a line, and the processor calls the program code in the memory to perform the processing-related functions in the image encoding method shown in any one of the above first aspect.
  • the image encoding device may be a chip.
  • Embodiments of the present application provide an image decompression device, including a processor and a memory, where the processor and the memory are interconnected through a line, and the processor calls the program code in the memory to perform the processing-related functions in the image decompression method shown in any one of the above second aspect.
  • the image decompression device may be a chip.
  • Embodiments of the present application provide an image encoding device.
  • the image encoding device may also be called a digital processing chip or chip.
  • the chip includes a processing unit and a communication interface.
  • The processing unit obtains program instructions through the communication interface, and the program instructions are executed by the processing unit; the processing unit is configured to perform the processing-related functions in the above-mentioned first aspect or any optional implementation manner of the first aspect.
  • Embodiments of the present application provide an image decompression device.
  • The image decompression device may also be called a digital processing chip or chip.
  • the chip includes a processing unit and a communication interface.
  • The processing unit obtains program instructions through the communication interface, and the program instructions are executed by the processing unit; the processing unit is configured to perform the processing-related functions in the above-mentioned second aspect or any optional implementation manner of the second aspect.
  • An embodiment of the present application provides an image processing system, which includes an image encoding device and an image decompression device, where the image encoding device is configured to perform the processing-related functions in the above-mentioned first aspect or any optional implementation of the first aspect, and the image decompression device is configured to perform the processing-related functions in the above-mentioned second aspect or any optional implementation of the second aspect.
  • Embodiments of the present application provide a computer-readable storage medium, including instructions that, when run on a computer, cause the computer to execute the method in any optional implementation of the first aspect or the second aspect.
  • embodiments of the present application provide a computer program product containing instructions that, when run on a computer, cause the computer to execute the method in any optional implementation of the first aspect or the second aspect.
  • Figure 1 is a schematic diagram of an artificial intelligence subject framework applied in this application
  • Figure 2 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • Figure 3 is a schematic diagram of an application scenario according to the embodiment of the present application.
  • Figure 4 is a schematic diagram of another application scenario according to the embodiment of the present application.
  • Figure 5 is a schematic diagram of another application scenario according to the embodiment of the present application.
  • Figure 6 is a schematic flowchart of an image encoding method provided by an embodiment of the present application.
  • Figure 7 is a schematic flow chart of another image encoding method provided by an embodiment of the present application.
  • Figure 8 is a schematic diagram of a prediction method of an autoregressive model provided by an embodiment of the present application.
  • Figure 9 is a schematic diagram of the prediction sequence of an autoregressive model provided by an embodiment of the present application.
  • Figure 10 is a schematic diagram of a residual calculation method provided by the embodiment of the present application.
  • Figure 11 is a schematic diagram of a data structure provided by an embodiment of the present application.
  • Figure 12 is a schematic flow chart of an image decompression method provided by an embodiment of the present application.
  • Figure 13 is a schematic flow chart of another image decompression method provided by an embodiment of the present application.
  • Figure 14 is a schematic structural diagram of an image coding device provided by this application.
  • Figure 15 is a schematic structural diagram of an image decoding device provided by this application.
  • Figure 16 is a schematic structural diagram of another image coding device provided by the present application.
  • Figure 17 is a schematic structural diagram of another image decoding device provided by this application.
  • Figure 18 is a schematic structural diagram of a chip provided by this application.
  • Figure 1 shows a structural schematic diagram of the artificial intelligence main framework.
  • The above artificial intelligence framework is elaborated below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis).
  • The "intelligent information chain" reflects a series of processes from data acquisition to processing, for example the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, the data goes through the condensation process of "data - information - knowledge - wisdom".
  • The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (providing and processing technology implementations) to the industrial ecological process of the system.
  • Infrastructure provides computing power support for artificial intelligence systems, enables communication with the external world, and supports it through basic platforms.
  • Computing power is provided by smart chips, such as hardware acceleration chips including the central processing unit (CPU), neural-network processing unit (NPU), graphics processing unit (GPU), application-specific integrated circuit (ASIC) or field programmable gate array (FPGA);
  • the basic platform includes related platform guarantees and support such as a distributed computing framework and networks, and can include cloud storage and computing, interconnection networks, etc.
  • sensors communicate with the outside world to obtain data, which are provided to smart chips in the distributed computing system provided by the basic platform for calculation.
  • Data from the upper layer of the infrastructure is used to represent data sources in the field of artificial intelligence.
  • the data involves graphics, images, voice, and text, as well as IoT data of traditional devices, including business data of existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
  • Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making and other methods.
  • machine learning and deep learning can perform symbolic and formal intelligent information modeling, extraction, preprocessing, training, etc. on data.
  • Reasoning refers to the process of simulating human intelligent reasoning in computers or intelligent systems, using formal information to perform machine thinking and problem solving based on reasoning control strategies. Typical functions are search and matching.
  • Decision-making refers to the process of decision-making after intelligent information is reasoned, and usually provides functions such as classification, sorting, and prediction.
  • Based on the results of the further data processing, some general capabilities can be formed, such as algorithms or a general system, for example translation, text analysis, computer vision processing, speech recognition, image recognition, etc.
  • Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields; they are the encapsulation of overall artificial intelligence solutions, productizing intelligent information decision-making and realizing practical applications. The application fields mainly include intelligent terminals, intelligent transportation, smart healthcare, autonomous driving, smart cities, etc.
  • the embodiments of the present application involve a large number of related applications of neural networks and images.
  • the relevant terms and concepts in the fields of neural networks and images that may be involved in the embodiments of the present application are first introduced below.
  • the neural network can be composed of neural units.
  • The neural unit may refer to an operation unit that takes x_s and an intercept of 1 as input, and the output of the operation unit may be given by the following formula:
  • h_{W,b}(x) = f(W^T x) = f(∑_{s=1}^{n} W_s x_s + b)
  • where s = 1, 2, ..., n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit.
  • f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal of the neural unit into an output signal.
  • The output signal of the activation function can be used as the input of the next convolutional layer, and the activation function can be a sigmoid function.
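  • For illustration only, the neural unit formula above can be written out directly, with the sigmoid chosen here merely as an example activation:

```python
import math

def neuron_output(xs, ws, b):
    """h = f(sum_s W_s * x_s + b), with f taken to be the sigmoid function."""
    z = sum(w * x for w, x in zip(ws, xs)) + b
    return 1.0 / (1.0 + math.exp(-z))      # sigmoid activation f

# Example: neuron_output([0.5, -1.0], [0.8, 0.3], 0.1) == sigmoid(0.2)
```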
  • a neural network is a network formed by connecting multiple above-mentioned single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected to the local receptive field of the previous layer to extract the features of the local receptive field.
  • the local receptive field can be an area composed of several neural units.
  • A deep neural network (DNN), also known as a multi-layer neural network, can be understood as a neural network with multiple intermediate layers.
  • According to the positions of the different layers, the layers inside a DNN can be divided into three categories: the input layer, the intermediate layers, and the output layer.
  • the first layer is the input layer
  • the last layer is the output layer
  • the layers in between are all intermediate layers, or hidden layers.
  • the layers are fully connected, that is to say, any neuron in the i-th layer must be connected to any neuron in the i+1-th layer.
  • Although a DNN looks complex, each of its layers can be expressed as a simple linear relationship: y = α(W·x + b), where x is the input vector, y is the output vector, b is the offset (bias) vector, W is the weight matrix (also called the coefficients), and α() is the activation function.
  • Each layer simply performs this operation on the input vector x to obtain the output vector y. Because a DNN has many layers, the number of coefficient matrices W and offset vectors b is also large.
  • These parameters are defined in the DNN as follows, taking the coefficient w as an example: assume that in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as W^3_{24}, where the superscript 3 represents the layer in which the coefficient is located, and the subscript corresponds to the output index 2 of the third layer and the input index 4 of the second layer.
  • In summary, the coefficient from the k-th neuron of layer L-1 to the j-th neuron of layer L is defined as W^L_{jk}.
  • the input layer has no W parameter.
  • more intermediate layers make the network more capable of describing complex situations in the real world.
  • a model with more parameters has higher complexity and greater "capacity", which means it can complete more complex learning tasks.
  • Training a deep neural network is the process of learning the weight matrix. The ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (a weight matrix formed by the vectors W of many layers).
  • Convolutional neural network is a deep neural network with a convolutional structure.
  • the convolutional neural network contains a feature extractor consisting of a convolutional layer and a subsampling layer, which can be regarded as a filter.
  • the convolutional layer refers to the neuron layer in the convolutional neural network that convolves the input signal.
  • a neuron can be connected to only some of the neighboring layer neurons.
  • a convolutional layer usually contains several feature planes, and each feature plane can be composed of some rectangularly arranged neural units. Neural units in the same feature plane share weights, and the shared weights here are convolution kernels. Shared weights can be understood as a way to extract image information independent of position.
  • The convolution kernel can be initialized in the form of a matrix of random size, and during the training of the convolutional neural network the convolution kernel can obtain reasonable weights through learning. In addition, a direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network while also reducing the risk of overfitting.
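  • For illustration only, a naive single-channel convolution shows the weight sharing described above: the same kernel weights are applied at every spatial position:

```python
import numpy as np

def conv2d_single_channel(image, kernel):
    """Naive 2D convolution (valid padding, as typically implemented in CNNs,
    i.e. cross-correlation): one shared kernel slides over the whole image."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            # Same kernel weights reused at every (y, x): this is weight sharing.
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out
```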
  • The loss function can usually include mean squared error, cross-entropy, logarithmic, exponential and other loss functions. For example, the mean squared error can be used as the loss function, defined as MSE = (1/n) ∑_{i=1}^{n} (y_i − ŷ_i)², where y_i is the target value and ŷ_i is the predicted value. The specific loss function can be selected according to the actual application scenario.
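  • For example, the mean squared error can be computed as follows (a generic illustration, not tied to any particular model in this application):

```python
def mean_squared_error(y_true, y_pred):
    """MSE = (1/n) * sum_i (y_i - y_hat_i)^2"""
    assert len(y_true) == len(y_pred) and len(y_true) > 0
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# mean_squared_error([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]) == (0.25 + 0.0 + 1.0) / 3
```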
  • the neural network can use the error back propagation (BP) algorithm to modify the size of the parameters in the initial neural network model during the training process, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, forward propagation of the input signal until the output will produce an error loss, and the parameters in the initial neural network model are updated by backpropagating the error loss information, so that the error loss converges.
  • the backpropagation algorithm is a backpropagation movement dominated by error loss, aiming to obtain the optimal parameters of the neural network model, such as the weight matrix.
  • Entropy coding refers to coding that does not lose any information according to the entropy principle during the coding process.
  • Information entropy is the average amount of information in the source (a measure of uncertainty).
  • Common entropy codes include: Shannon coding, Huffman coding, arithmetic coding, etc.
  • the optimal compression scheme can be obtained using entropy coding technology.
  • An image with probability p can be represented by -log2(p) bits. For example, an image with a probability of 1/8 needs 3 bits, and an image with a probability of 1/256 needs 8 bits.
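  • As a quick, purely illustrative check of these numbers:

```python
import math

def ideal_code_length_bits(p):
    """Ideal entropy-code length, in bits, for an event of probability p."""
    return -math.log2(p)

# ideal_code_length_bits(1 / 8)   -> 3.0
# ideal_code_length_bits(1 / 256) -> 8.0
```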
  • the algorithm needs to know the probability of each letter appearing as accurately as possible, and the model's job is to provide this data. The better the predictions of the model, the better the compression results. Furthermore the model must present the same data during compression and recovery.
  • the static model (or static entropy coding) analyzes the entire text to calculate the probability of each letter before compression. The result of this calculation is applied to the entire text.
  • the encoding table only needs to be calculated once, so the encoding speed is high, and the result will definitely not be longer than the original text except for the probability value required during decoding.
  • The entropy coding used may include static entropy coding methods such as tANS or FSE.
  • Forward dynamic coding: the probability is calculated based on the letters that have already been encoded; each time a letter is encoded, its probability increases.
  • Inverse dynamic coding: before encoding, the probability of each letter in the remaining unencoded part is calculated. As encoding proceeds, more and more letters no longer appear and their probabilities become 0, while the probabilities of the remaining letters increase and the number of bits used to encode them decreases; the compression ratio keeps increasing, so that the last letter requires 0 bits to encode.
  • In dynamic coding, the model is optimized according to the specific characteristics of different parts, and probability data does not need to be transmitted for the forward model.
  • entropy coding is divided into many types. For example, it can be divided into static entropy coding, semi-dynamic entropy coding and dynamic entropy coding.
  • static entropy coding uses a single probability distribution for coding
  • semi-dynamic coding uses multiple (i.e. limited types) probability distributions for coding
  • dynamic entropy coding can use an unlimited variety of probability distributions for coding.
  • In an autoregressive model, the values of the same variable, such as x in previous periods (that is, x_1 to x_{t-1}), are used to predict the value of x_t in the current period, and the relationship between them is assumed to be linear. Because this is developed from linear regression in regression analysis, except that x is used to predict x rather than y, it is called autoregression.
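  • In its standard textbook form (stated here only as background), an autoregressive model of order p writes the current value as a linear combination of the p previous values plus noise:

```latex
x_t = c + \sum_{i=1}^{p} \varphi_i \, x_{t-i} + \varepsilon_t
```

  • Here c is a constant, the φ_i are the regression coefficients, and ε_t is an error term.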
  • the autoencoding model is a neural network that uses the backpropagation algorithm to make the output value equal to the input value. It first compresses the input data into a latent space representation, and then reconstructs the output through this representation.
  • Autoencoding models usually include an encoding (encoder) model and a decoding (decoder) model.
  • the trained encoding model is used to extract features from the input image to obtain latent variables.
  • the latent variables are input to the trained decoding model to output the predicted residual corresponding to the input image.
  • Lossless compression is a technology that compresses data so that after compression the data occupies less space than before compression, and the compressed data can be decompressed to restore the original data.
  • The decompressed data is completely consistent with the data before compression.
  • The greater the probability of occurrence of each pixel in the image (that is, the probability value obtained when the pixel value of the current pixel is predicted from the pixel values of other pixels), the shorter the compressed length.
  • The probability of a real image is much higher than that of a randomly generated image, so the number of bits per pixel (bpd) required to compress the former is much smaller than that required for the latter.
  • The bpd of most images is significantly smaller than before compression and is higher than before compression only with a very small probability, thus reducing the average bpd of the images.
  • The compression ratio is the ratio of the original data size to the compressed data size; if there is no compression, the value is 1, and the larger the value, the better.
  • The compression/decompression bandwidth is the size of raw data that can be compressed or decompressed per second.
  • Receptive field: when predicting a pixel, the points in its receptive field need to be known in advance; changing points outside the receptive field does not change the prediction of that pixel.
  • the encoding method and decoding method provided by the embodiments of this application can be executed on the server or on the terminal device.
  • The neural networks mentioned below in this application can be deployed on a server or on a terminal, which can be adjusted according to the actual application scenario.
  • For example, the encoding method and decoding method provided by this application can be deployed on a terminal through plug-ins.
  • The terminal device may be a mobile phone with an image processing function, a tablet personal computer (TPC), a media player, a smart TV, a laptop computer (LC), a personal digital assistant (PDA), a personal computer (PC), a camera, a camcorder, a smart watch, a wearable device (WD) or a self-driving vehicle, etc.; the embodiments of the present application are not limited thereto.
  • the following is an exemplary description taking the encoding method and decoding method provided by this application being deployed on a terminal as an example.
  • All or part of the processes in the encoding method and decoding method provided by this application can be implemented through neural networks.
  • the autoregressive model, autoencoding model, etc. can be implemented through neural networks.
  • the neural network needs to be deployed on the terminal after training.
  • this embodiment of the present application provides a system architecture 100.
  • data collection device 160 is used to collect training data.
  • the training data may include a large number of high-definition images.
  • After collecting the training data, the data collection device 160 stores the training data in the database 130, and the training device 120 trains to obtain the target model/rule 101 based on the training data maintained in the database 130.
  • the training set mentioned in the following embodiments of this application may be obtained from the database 130 or may be obtained through user input data.
  • the target model/rule 101 may be a neural network trained in the embodiment of the present application.
  • the neural network may include one or more networks, such as an autoregressive model or an autoencoding model.
  • The training device 120 processes the input three-dimensional model and compares the output image with the high-quality rendered image corresponding to the input three-dimensional model until the difference between the image output by the training device 120 and the high-quality rendered image is less than a certain threshold, thereby completing the training of the target model/rule 101.
  • The above target model/rule 101 can be used to implement the neural networks mentioned in the encoding method and decoding method of the embodiments of the present application; that is, after relevant preprocessing, the data to be processed (such as the image to be compressed) is input into the target model/rule 101 to obtain the processing result.
  • the target model/rule 101 in the embodiment of this application may specifically be the neural network mentioned below in this application, and the neural network may be the aforementioned CNN, DNN or RNN type of neural network.
  • The training data maintained in the database 130 is not necessarily all collected by the data collection device 160; it may also be received from other devices.
  • In addition, the training device 120 does not necessarily train the target model/rule 101 entirely based on the training data maintained in the database 130; it may also obtain training data from the cloud or elsewhere for model training, which is not limited in this application.
  • the target model/rules 101 trained according to the training device 120 can be applied to different systems or devices, such as the execution device 110 shown in Figure 2.
  • the execution device 110 can also be called a computing device.
  • The execution device 110 can be a terminal, such as a mobile phone terminal, a tablet, a laptop, an augmented reality (AR)/virtual reality (VR) device or a vehicle-mounted terminal, etc.; it can also be a server, a cloud device, etc.
  • the execution device 110 is configured with an input/output (I/O) interface 112 for data interaction with external devices.
  • the user can input data to the I/O interface 112 through the client device 140.
  • the input data may include: data to be processed input by the client device.
  • the client can be other hardware devices, such as terminals or servers, etc.
  • the client can also be software deployed on the terminal, such as APPs, web pages, etc.
  • the preprocessing module 113 and the preprocessing module 114 are used to perform preprocessing according to the input data (such as data to be processed) received by the I/O interface 112.
  • The preprocessing module 113 and the preprocessing module 114 may not be present, or there may be only one preprocessing module, and the calculation module 111 is directly used to process the input data.
  • When the execution device 110 preprocesses the input data, or when the calculation module 111 of the execution device 110 performs calculations and other related processing, the execution device 110 can call data, code, etc. in the data storage system 150 for the corresponding processing, and the data, instructions, etc. obtained by the corresponding processing can also be stored in the data storage system 150.
  • The I/O interface 112 returns the processing result to the client device 140 to provide it to the user. For example, if the first neural network is used for image classification and the processing result is a classification result, the I/O interface 112 returns the obtained classification result to the client device 140 to provide it to the user.
  • The training device 120 can generate corresponding target models/rules 101 based on different training data for different goals or different tasks, and the corresponding target models/rules 101 can be used to achieve the above goals or complete the above tasks, thereby providing the user with the desired results.
  • the execution device 110 and the training device 120 may be the same device, or located within the same computing device. To facilitate understanding, this application will introduce the execution device and the training device separately, which is not a limitation.
  • the user can manually set the input data, and the manual setting can be operated through the interface provided by the I/O interface 112 .
  • The client device 140 can automatically send input data to the I/O interface 112. If the client device 140 is required to obtain the user's authorization before automatically sending input data, the user can set the corresponding permissions in the client device 140.
  • the user can view the results output by the execution device 110 on the client device 140, and the specific presentation form may be display, sound, action, etc.
  • The client device 140 can also serve as a data collection end, collecting the input data fed to the I/O interface 112 and the predicted labels output from the I/O interface 112, as shown in the figure, as new sample data and storing them in the database 130.
  • Alternatively, the I/O interface 112 can directly use the input data fed to the I/O interface 112 and the predicted labels output from the I/O interface 112, as shown in the figure, as new sample data and store them in the database 130.
  • Figure 2 is only a schematic diagram of a system architecture provided by an embodiment of the present application.
  • The positional relationship between the devices, components, modules, etc. shown in the figure does not constitute any limitation.
  • For example, in Figure 2 the data storage system 150 is an external memory relative to the execution device 110; in other cases, the data storage system 150 can also be placed in the execution device 110.
  • the target model/rule 101 is obtained by training according to the training device 120.
  • the target model/rule 101 can be the neural network in the present application.
  • The neural networks provided in the embodiments of the present application can include a CNN, a deep convolutional neural network (DCNN), a recurrent neural network (RNN) or other constructed neural networks, etc.
  • the encoding method and decoding method in the embodiment of the present application can be executed by an electronic device, which is the aforementioned execution device.
  • This electronic device includes a CPU and a GPU that can compress images.
  • other devices may also be included, such as NPU or ASIC, etc. This is only an illustrative description and will not be repeated one by one.
  • The electronic device may be a mobile phone, a tablet computer, a notebook computer, a PC, a mobile internet device (MID), a wearable device, a virtual reality (VR) device, an augmented reality (AR) device, a wireless electronic device in industrial control, a wireless electronic device in self-driving, a wireless electronic device in remote medical surgery, a wireless electronic device in a smart grid, a wireless electronic device in transportation safety, a wireless electronic device in a smart city, a wireless electronic device in a smart home, etc.
  • the electronic device can be a device running Android system, IOS system, Windows system and other systems.
  • the electronic device can run applications that need to compress images to obtain compressed images, such as communication software, photo albums or camera applications.
  • entropy coding can be used for compression.
  • the distribution of the image is unknown, so the original distribution needs to be estimated, and the estimated distribution is input into the entropy encoder for encoding.
  • The more accurate the estimate, the higher the compression rate.
  • Traditional lossless image compression algorithms mostly adopt the principle that neighboring pixel values are usually close to each other and use fixed prediction methods; the coding efficiency of such methods is low.
  • AI-based lossless image compression can also be used. Compared with traditional coding algorithms, AI algorithms can achieve significantly higher compression rates, but their compression/decompression efficiency is very low.
  • Autoregressive models can be used for image compression. If an autoregressive model is built and the values of all previous pixels are input, it can output the distribution parameters of the predicted pixel; if the distribution is a Gaussian distribution, the output is the two parameters of mean and variance.
  • When using the autoregressive model for compression, the pixels can be input to the autoregressive model to obtain the distribution prediction of each pixel, and the distribution prediction of the pixel and the value of the pixel can be input into the entropy encoder to obtain the encoded data.
  • During decompression, the pixels that have already been decoded are input to the autoregressive model to obtain the distribution prediction of the current pixel.
  • The distribution prediction and the encoded data are input to the entropy decoder to obtain the decoded data.
  • However, the prediction of each pixel relies on all previous pixels, which results in low operating efficiency.
  • During decompression, all pixels before the current pixel need to be decompressed before the current pixel can be decompressed, so only one pixel can be decompressed per network inference; the number of network inferences is large and the decompression efficiency is low.
  • In some scenarios, an autoencoding model can be used for image compression.
  • During compression, the original data is input into the encoding network (Encoder) to obtain the latent variables, and the latent variables are input into the decoding network (Decoder) to obtain the distribution prediction of the image.
  • The manually designed distribution and the values of the latent variables are input into the entropy coder to encode the latent variables; the distribution prediction of the image and the original image are input into the entropy coder to encode the image.
  • During decompression, the manually designed distribution and the encoding of the latent variables are input into the entropy coder to decode the latent variables; the latent variables are input into the decoding network (Decoder) to obtain the distribution prediction of the image; and the distribution prediction of the image and the encoding of the image are input into the entropy coder to decode the image.
  • However, autoencoding models have poorer fitting capabilities; if the compression rate is to exceed that of traditional compression algorithms, a deeper network is required, and the latency of a single network inference is high.
  • this application provides an encoding method and a decoding method that combine an autoregressive model and an autoencoder model for lossless compression, and provides an efficient semi-dynamic entropy encoder so that both the model inference and the encoding process run on the AI chip, reducing transfers between system memory and AI chip memory and achieving high-bandwidth compression and decompression.
  • the terminal may include a mobile phone, a camera, a monitoring device, or other devices with a shooting function or connected to a camera device.
  • the image can be losslessly compressed through the encoding method provided by this application, thereby obtaining compressed encoded data.
  • when the image needs to be read, such as when displaying the image in a photo album, the encoded data can be decompressed to restore the image.
  • images can be efficiently and losslessly compressed, reducing the storage required to save the image, and losslessly restored by decompression to obtain a high-definition image.
  • image transmission may be involved.
  • when users use communication software to communicate, they can transmit images through wired or wireless networks.
  • the encoding method provided by this application can be used to losslessly compress the image to obtain the compressed encoded data, which is then transmitted; after receiving the encoded data, the receiving end can decode it through the decoding method provided by this application to obtain the restored image.
  • Scenario 3: the server saves a large number of images.
  • the input image can be an image to be compressed; the autoregressive model can use the values of pixels in the input image other than the current pixel to predict the pixel value of the current pixel, obtaining the predicted distribution of each pixel, that is, the first image.
  • the input image may include a variety of images, and the sources of the input images may be different depending on the scene.
  • the input image may be a photographed image, a received image, etc.
  • the pixel values of already-predicted pixels can be used for prediction, so that in the subsequent decoding process, pixels on the same connection line do not need to wait for other pixels to be decoded before they can be decoded; pixels on the same connection line can be decoded in parallel, improving the decoding efficiency of the input image.
  • the same connection can be the same row, the same column, the same diagonal, etc., which can be determined according to the actual application scenario.
  • the residual value between each pixel point in the first image and the corresponding pixel point in the input image can be calculated to obtain the first residual image.
  • the resolution of the first image and the input image is usually the same, that is, the pixels in the first image and the input image correspond one to one; therefore, the residual value between each pair of corresponding pixels can be calculated, and the resulting residual values form an image, that is, the first residual image.
  • when calculating the residual, the residual value is usually an integer in the range [-255, 255].
  • the residual value can be converted to a low-precision numerical type, such as uint8, thereby mapping the value into [0, 255]; by setting an offset, the residual values of the pixels are distributed near 128, which makes the data more concentrated, so the residual distribution between the input image and the autoregressive model output can be expressed with less data.
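
A minimal sketch of such a residual mapping follows, assuming a modulo-256 wrap-around and an offset of 128; the exact offset and wrap convention used in the application may differ.

```python
import numpy as np

def residual_to_uint8(image, prediction, offset=128):
    """Map signed residuals in [-255, 255] to uint8 values centered near `offset`.

    The modulo-256 wrap keeps the mapping invertible given the prediction,
    and the offset concentrates typical small residuals around 128.
    """
    residual = image.astype(np.int16) - prediction.astype(np.int16)
    return ((residual + offset) % 256).astype(np.uint8)

def reconstruct(prediction, residual_u8, offset=128):
    # Inverse mapping used at the decoding end: pixel values lie in [0, 255],
    # so the modulo-256 wrap is undone exactly.
    return ((prediction.astype(np.int16) + residual_u8.astype(np.int16) - offset) % 256).astype(np.uint8)
```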
  • the input image can also be used as the input of the autoencoding model to output the corresponding latent variable and first residual distribution.
  • the latent variable may include features extracted from the input image, and the first residual distribution may include the residual values predicted by the autoencoding model between each pixel of the input image and the corresponding pixel in the first residual image.
  • the autoencoding model may include an encoding model and a decoding model.
  • the encoding model may be used to extract features from the input image
  • the decoding model may be used to predict the residual between the input image and the image output by the autoregressive model. That is, features can be extracted from the input image through the encoding model to obtain latent variables used to represent important features of the input image.
  • the latent variables are used as input to the decoding model to output the first residual distribution.
  • step 601 may be executed first, step 603 may be executed first, or step 601 and step 603 may be executed simultaneously. The details may be adjusted according to the actual application scenario.
  • the first residual image and the first residual distribution may be encoded to obtain residual encoded data.
  • when encoding the first residual image and the first residual distribution, semi-dynamic entropy coding can be used, that is, a limited set of probability distributions is used for encoding to obtain the encoded data of the residual image, namely the residual encoded data.
  • the semi-dynamic entropy encoder is used to perform entropy encoding using a first preset type of encoding operation.
  • the first preset type of encoding operation includes addition, subtraction or bit operation
  • the semi-dynamic entropy encoder does not include a second preset type of encoding operation; the second preset type includes at least one of multiplication, division, or remainder operations, which take a long time, so excluding them improves encoding efficiency.
  • a limited number of probability distributions can be used for encoding to obtain the encoding of the residual image.
  • decompressing a character requires more instructions; operations such as division and remainder are time-consuming, each taking dozens of times as long as an addition.
  • restricting the encoder to lightweight operations therefore achieves efficient encoding and improves encoding efficiency.
  • the semi-dynamic entropy encoder may be obtained by converting a dynamic entropy encoder.
  • the operations of the dynamic entropy encoder can be approximated, for example by replacing them with approximate operations that reduce or remove multiplication, division, and remainder operations; the remaining operations can then be transformed, converting all operations whose time cost exceeds a certain threshold (such as the remaining remainder, multiplication, and division operations) into table accesses and lightweight operations such as addition, subtraction, and bit operations, thereby obtaining the semi-dynamic entropy encoder provided by this application.
  • the semi-dynamic entropy encoder can be an entropy encoder obtained by replacing or converting some operations in the dynamic entropy encoder.
  • simple and efficient operations, such as addition, subtraction, and bit operations, can then be used to achieve efficient encoding.
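
As a generic illustration of why such replacements are cheap (and not the application's exact transformation): when the probability scale is a power of two, 2^M, division and remainder by that scale reduce to a shift and a mask.

```python
M = 16                  # example number of probability bits
SCALE = 1 << M          # total of the quantized frequency table

def split_state(x: int):
    # Equivalent to (x // SCALE, x % SCALE), but uses only a shift and a mask,
    # which are the kinds of lightweight operations a semi-dynamic coder keeps.
    return x >> M, x & (SCALE - 1)
```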
  • the latent variable can include important features extracted from the input image, so when performing image compression, the extracted important features can be encoded to obtain the corresponding encoded data, which facilitates subsequent image restoration and obtaining a lossless image.
  • static entropy coding can be used.
  • the latent variable is taken as input to the static entropy encoder, which outputs an encoded bitstream of the latent variable.
  • the latent variable encoded data and residual encoded data can be used at the decoder to perform lossless image restoration, thereby achieving lossless compression and restoration of the image.
  • autoencoding models usually have poor fitting capabilities and require a deeper network to achieve a better compression rate.
  • this application combines the output results of the autoregressive model, thereby reducing the size of the autoencoding model. Therefore, in this application, the output results of the autoregressive model and the autoencoding model are combined for coding, which can control both the autoencoding and the autoregressive models to a very small size and avoid the long inference time caused by the large network of the autoencoding model. problem to achieve efficient image compression.
  • the entire process can be implemented based on the AI lossless compression of the AI chip, including the AI model and entropy coding, which avoids the transmission problem between the system memory and the AI chip memory and improves the coding efficiency.
  • the input image 701 is obtained.
  • the input image 701 may include an image collected by itself or a received image.
  • the input image may include an image collected by the terminal, or may be an image received by the terminal from other servers or terminals.
  • the input image 701 is used as the input of the autoregressive model 702, and a predicted image 703 is output.
  • the autoregressive model can be used to predict the pixel probability distribution of each pixel using the adjacent pixels of each pixel to obtain the predicted image 703, which is the aforementioned first image.
  • the autoregressive model can use the pixel values of adjacent pixels to predict the pixel value of the current pixel.
  • the pixel values of adjacent pixels can be used for prediction in parallel.
  • taking a specific autoregressive model as an example, as shown in Figure 8, given an m×n image and a hyperparameter h (0 < h ≤ n), if for any pixel (i, j), all points (i′, j′) used by the autoregressive model to predict (i, j) satisfy h×i′ + j′ < h×i + j, then the image can be processed in n + (m − 1)×h parallel steps.
  • the pixel probability distribution of a point is the probability of the pixel taking each possible pixel value.
  • the pixel values of multiple pixels on the left can be selected in units of 2 as receptive fields to predict the pixel probability distribution of the current pixel.
  • pixels on the same diagonal can be decompressed in parallel.
  • the prediction order for each pixel can be shown in Figure 9, where the smaller the number, the higher the priority of the prediction order, and pixels with the same number are predicted at the same time. Therefore, pixels on the same diagonal can be predicted in parallel to improve the prediction efficiency of the autoregressive model.
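
The diagonal scheduling can be illustrated with the small sketch below, which groups pixel coordinates by the key h·i + j so that all pixels in a group can be predicted, and later decoded, in the same step; this is only an illustration of the scheduling rule, not the application's implementation.

```python
import numpy as np

def parallel_groups(m: int, n: int, h: int = 1):
    """Group pixel coordinates of an m x n image by the key h*i + j.

    Pixels sharing the same key can be processed in the same step, giving
    n + (m - 1) * h sequential steps in total; h = 1 corresponds to plain
    anti-diagonals.
    """
    groups = {}
    for i in range(m):
        for j in range(n):
            groups.setdefault(h * i + j, []).append((i, j))
    return [groups[k] for k in sorted(groups)]

steps = parallel_groups(4, 4, h=1)
assert len(steps) == 4 + (4 - 1) * 1   # n + (m - 1) * h steps
```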
  • image residual 704 is the aforementioned first residual image.
  • for example, given the original image x, that is, the input image, the autoregressive model is used to predict the original image and obtain the predicted (reconstructed) image.
  • the image residual between each pixel of the reconstructed image and the original image can be calculated
  • the difference between the corresponding pixels of the input image and the predicted image can be calculated to obtain the residual value between each pixel to form a residual image.
  • when calculating the residual, the residual value is an integer in the range [-255, 255] and can be converted to a low-precision numerical type, such as uint8, thereby mapping the value into [0, 255]; an offset can be set so that the residual values of the pixels are distributed around 128, which makes the data more concentrated, and the residual distribution between the input image and the autoregressive model output can be expressed with less data.
  • the residual distribution may be modeled, for example, as a Gaussian distribution or a logistic distribution.
  • the input image is also input to the autoencoding model 705, and the prediction residual 707 and the latent variable 706 are output.
  • the original image x can be input to the autoencoding model, and the autoencoding model is used to estimate the probability distribution of the residual r.
  • the autoencoding model can include an encoding model (encoder) and a decoding model (decoder).
  • the input image is used as the input of the encoding model.
  • important features can be extracted from the input image to obtain the latent variable 706, and the latent variable can then be used as the input of the decoding model to output the prediction residual 707.
  • the autoencoding model can be a pre-trained model; specifically, it can use an autoencoder (AutoEncoder, AE), a variational autoencoder (Variational AutoEncoder, VAE), a VQ-VAE (Vector Quantised-Variational AutoEncoder), or the like, which can be adjusted according to actual application scenarios, and this application does not limit this.
  • latent variable 706 may be encoded to obtain latent variable encoding 708.
  • the latent variables can be encoded using static entropy coding, that is, a tree structure is used so that high-probability data is represented with shorter bit strings and low-probability data with longer bit strings.
  • the tree structure can be shown in Figure 11, and its corresponding bits can be expressed as shown in Table 1.
  • the data a1a2a1a4 is encoded as 0100110.
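
Table 1 itself is not reproduced in this excerpt; the snippet below uses one prefix-code assignment that happens to be consistent with this example (a1→0, a2→10, a4→110, a3→111), purely for illustration — the actual codes in Table 1 may differ.

```python
# Hypothetical prefix codes consistent with "a1 a2 a1 a4 -> 0100110";
# the real Table 1 assignment may differ.
CODES = {"a1": "0", "a2": "10", "a4": "110", "a3": "111"}

def static_encode(symbols):
    return "".join(CODES[s] for s in symbols)

assert static_encode(["a1", "a2", "a1", "a4"]) == "0100110"
```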
  • image residual 704 and the prediction residual 707 can also be encoded to obtain residual encoding 709.
  • semi-dynamic entropy coding can be performed on the image residual 704 and the prediction residual 707 to obtain residual coding.
  • dynamic coding uses a state (usually a large integer) to represent data, and uses the probability information of the data to change the state value.
  • the final coded value is a 0 or 1 representation of the state.
  • an M value must first be set, which represents the number of bits used to represent a probability. For a character a_i, its corresponding PMF_i is proportional to its probability, and the PMF values sum to 2^M; its corresponding CDF_i is the accumulation of all previous PMF values, that is, CDF_i = PMF_1 + PMF_2 + ... + PMF_{i-1}.
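
A simple way to obtain such quantized PMF/CDF tables is sketched below; the rounding scheme shown is a simple illustrative choice, not necessarily the one used by the application.

```python
import numpy as np

def quantize_pmf(probs, M=16):
    """Quantize a probability vector to integer PMF values that sum to 2**M.

    Simple illustrative rounding: every symbol keeps at least frequency 1 and
    the largest entry absorbs the rounding error so the total is exact.
    """
    scale = 1 << M
    pmf = np.maximum(1, np.round(np.asarray(probs) * scale).astype(np.int64))
    pmf[np.argmax(pmf)] += scale - pmf.sum()          # force the exact total
    cdf = np.concatenate(([0], np.cumsum(pmf)[:-1]))  # CDF_i = sum of previous PMFs
    return pmf, cdf

pmf, cdf = quantize_pmf([0.5, 0.25, 0.125, 0.125], M=16)
assert pmf.sum() == 1 << 16
```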
  • Dynamic entropy coding can also be used as static entropy coding. When the value in the table is a fixed value, it is static entropy coding; when the tables of different symbols are not exactly the same, dynamic entropy coding is needed.
  • the speed bottlenecks in dynamic entropy coding include symbol search and arithmetic operations during decompression: division and remainder operations are the most time-consuming, followed by multiplication. Therefore, in order to reduce the efficiency loss caused by the unlimited number of probability distributions in dynamic entropy coding, this application provides semi-dynamic entropy coding.
  • approximate processing is first performed, such as replacing operations such as multiplication, division, and remainder in dynamic entropy coding with approximate lightweight operations such as addition, subtraction, and bitwise operations.
  • the state value S is truncated and approximated, similarly to existing schemes, but the differences include:
  • this solution replaces the operation with an approximate solution based on a loop plus bit operations, to further reduce the storage space required for tabulation.
  • the loop in this calculation takes a long time, so after this processing, the time consumption will usually exceed that of the original rANS. However, in subsequent processing, the number of loops will be tabulated to achieve efficient compression and decompression.
  • a table is used to precalculate and store the number of cycles (that is, the number of state right shifts), and the difference between the next state and this state under this distribution and symbol.
  • this solution stores an intermediate result Δ.
  • the number of loops can then be calculated as (Δ + S) >> M.
  • the encoding method provided by this application can reduce the memory space required to store the table.
  • the semi-dynamic entropy coding method provided by this application stores the difference between the two states after the state is shifted to the right; this difference can be stored as an unsigned number, halving the memory space required for the same bit width.
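
For reference, a textbook rANS-style step is sketched below; the floor-division, modulo, and multiplication it contains are exactly the operations that the semi-dynamic scheme described above replaces with precomputed tables, shifts, and additions. This is a generic sketch of the underlying coder family, not the application's exact semi-dynamic encoder.

```python
def rans_encode_step(state: int, pmf_i: int, cdf_i: int, M: int) -> int:
    """One textbook rANS step: state' = floor(state/pmf)*2^M + cdf + state%pmf.

    The floor-division and modulo are the costly operations; the semi-dynamic
    coder precomputes their effect (shift counts and state differences) so
    that only additions, shifts, and table lookups remain at run time.
    """
    return (state // pmf_i << M) + cdf_i + (state % pmf_i)

def rans_decode_step(state: int, pmf, cdf, M: int):
    """Inverse step: recover the symbol from the low M bits of the state."""
    low = state & ((1 << M) - 1)
    # Symbol search: find i with cdf[i] <= low < cdf[i] + pmf[i].
    i = max(k for k in range(len(cdf)) if cdf[k] <= low)
    prev_state = pmf[i] * (state >> M) + low - cdf[i]
    return i, prev_state
```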
  • subsequent operations can be performed. For example, the residual code 709 and the latent variable code 708 are saved, or the residual code 709 and the latent variable code 708 are transmitted to the receiving end. The details can be determined according to the actual application scenario.
  • the method provided by the embodiment of the present application can be applied to image lossless compression to achieve efficient image lossless compression. It also provides an efficient semi-dynamic entropy encoder, allowing the model inference and encoding processes to run on the AI chip, reducing the transmission between system memory and AI chip memory, and achieving high-bandwidth compression and decompression.
  • the decoder can read the latent variable encoded data and residual encoded data locally, or receive the latent variable encoded data and residual encoded data sent by the encoding end.
  • the sources of the latent variable encoded data and the residual encoded data can be determined according to the actual application scenario and are not limited by this application.
  • the latent variable encoding data can be obtained by encoding the features extracted from the input image at the encoding end.
  • the residual coding data may be obtained by encoding the aforementioned image residual and prediction residual at the encoding end.
  • the image residual may include the residual between the input image at the encoding end and the image output by the autoregressive model.
  • the latent variable coded data and residual coded data can be referred to the relevant introductions in Figures 6 to 11 mentioned above, and will not be described again here.
  • the method of decoding the latent variable encoded data can correspond to the encoding end.
  • the static entropy encoder can be used for decoding during decoding.
  • the latent variable encoded data is used as the input of the static entropy encoder to output the latent variable.
  • the latent variables may include features extracted from the input image.
  • the latent variables represent features of the image to be decompressed.
  • after decoding the latent variable encoded data to obtain the latent variable, the latent variable can be used as the input of the autoencoding model, and the corresponding second residual distribution is output, that is, the distribution corresponding to the first residual distribution at the encoding end; it can be understood as representing the residual distribution between the image output by the autoregressive model at the encoding end and the input image.
  • the autoencoding model can include a decoding model, and the predicted residual image can be output by using the latent variable as the input of the decoding model.
  • the decoding model may be a trained model and is used to output a residual image corresponding to the input image.
  • the predicted residual image may be understood as the residual values between the image predicted by the autoregressive model and the input image.
  • both the encoding end and the decoding end deploy autoregressive models and autoencoding models, and the autoregressive model on the encoding end is the same as that on the decoding end. If the encoding end and the decoding end are deployed in the same device, the autoencoding model on the encoding end is the same as that on the decoding end. If they are deployed in different devices, the same autoencoding model can be deployed on both ends, or a complete autoencoding model can be deployed on the encoding end while only the decoding model is deployed on the decoding end.
  • the decoding model in the autoencoding model can be adjusted according to actual application scenarios, and this application does not limit this.
  • the second residual distribution and the residual coded data can be combined for decoding to obtain the second residual image.
  • the decoding end can also decode based on semi-dynamic entropy coding and output the second residual image, that is, the image corresponding to the first residual image on the encoding end.
  • the semi-dynamic entropy encoder is used to perform entropy encoding using a first preset type of encoding operation.
  • the first preset type of encoding operation includes addition, subtraction or bit operation, and the semi-dynamic entropy encoder does not include the second preset type.
  • the second preset type includes at least one of multiplication, division, or remainder operations; that is, the semi-dynamic entropy encoder excludes these long-running operations and may include only simple operations such as addition, subtraction, and bit operations, allowing efficient encoding.
  • for the semi-dynamic entropy encoder, reference can be made to the related descriptions in the aforementioned Figures 6 to 11, which will not be repeated here.
  • the decoding process can then be performed as follows:
  • performing the inverse operation yields the second residual image, which is equivalent to obtaining the residual between the first image output by the autoregressive model at the encoding end and the input image, that is, the first residual image.
  • the second residual image can then be used as the input for backpropagation through the autoregressive model, and the decompressed image can be deduced, that is, lossless recovery of the input image at the encoding end is achieved.
  • if the autoregressive model at the encoding end uses the pixel values of already-predicted pixels to predict the values of pixels on the same connection line, then when the decoding end performs the decoding operation, the values of pixels on the same connection line can be decoded in parallel to achieve efficient decoding.
  • the same connection can be the same row, the same column, the same diagonal, etc., which can be determined according to the actual application scenario.
  • the autoencoding model usually has poor fitting ability and requires a deeper network to achieve a better compression rate.
  • the present application combines the output results of the autoregressive model, thereby reducing the required size of the autoencoding model; therefore, in this application, the autoregressive model and the autoencoding model are combined for decoding, both models can be kept very small, the problem of overly long inference time caused by an excessively large autoencoding network is avoided, and efficient image decompression is enabled.
  • the entire process can be implemented based on the AI lossless compression of the AI chip, including the AI model and entropy coding, which avoids the transmission problem between the system memory and the AI chip memory and improves the coding efficiency.
  • the latent variable encoding 1301 and the residual encoding 1302 can be read locally or received from the encoding end, and can be adjusted according to the actual application scenario.
  • the latent variable encoding 1301 and the residual encoding may be the latent variable encoding 708 and the residual encoding 709 mentioned in FIG. 7 .
  • the latent variable encoding 1301 is input to the static entropy encoder 1303, and the latent variable 1304 is output.
  • the bits corresponding to each probability in entropy coding can be as shown in the aforementioned Table 1.
  • the probability corresponding to each character can be determined based on the corresponding relationship, thereby outputting the latent variable. It can be understood as decompressing important features in the image.
  • the latent variable 1304 is then used as the input of the decoding model in the autoencoding model 1305, and the prediction residual 1306 is output.
  • the decoding model is similar to the decoding model in Figure 7 and will not be described again here.
  • the prediction residual 1306 is similar to the aforementioned prediction residual 707 and will not be described again here.
  • both the residual encoding 1302 and the prediction residual 1306 are used as inputs to the semi-dynamic entropy encoder, and an image residual 1308 is output.
  • the image residual 1308 is similar to the aforementioned image residual 704 and will not be described again here.
  • the decoding process of semi-dynamic entropy coding can be understood as the inverse operation of the aforementioned semi-dynamic entropy coding, that is, when the prediction residual and residual coding are known, the image residual is inversely inferred.
  • after the image residual 1308 is obtained, it can be used as the input for backpropagation through the autoregressive model 1309 to infer the decompressed image 1310.
  • the autoregressive model 1309 is a trained model, which is the same as the aforementioned autoregressive model 702. It can be understood that when the image residual is known, the input image 701 is deduced in reverse.
  • if, when outputting the prediction residual through the autoregressive model, the encoding end uses the pixel values of already-predicted pixels to predict in parallel the pixel values of pixels on the same line,
  • then, when reversing the autoregressive model, the pixel values of pixels on the same line can be decoded in parallel.
  • the decompression sequence thus includes: decoding the latent variable, predicting the residual with the decoding model, decoding the image residual with the semi-dynamic entropy decoder, and reconstructing the image with the autoregressive model, as sketched below.
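
A high-level sketch of that sequence follows; all four callables (`static_entropy_decode`, `decoder_model`, `semi_dynamic_decode`, `autoregressive_reconstruct`) are hypothetical placeholders standing in for the components described above.

```python
def decompress(latent_code, residual_code,
               static_entropy_decode, decoder_model,
               semi_dynamic_decode, autoregressive_reconstruct):
    """High-level decompression flow; all callables are placeholders."""
    latent = static_entropy_decode(latent_code)           # latent variable 1304
    pred_residual = decoder_model(latent)                  # prediction residual 1306
    image_residual = semi_dynamic_decode(residual_code,    # image residual 1308
                                         pred_residual)
    return autoregressive_reconstruct(image_residual)      # decompressed image 1310
```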
  • the autoregressive model implements a lightweight design and only contains 12 parameters. For a three-channel image, each channel only needs 4 parameters for prediction.
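
Which four context pixels each channel uses is not specified in this excerpt; the sketch below assumes, purely for illustration, a linear predictor over the left, upper, and upper-left neighbours plus a bias, which gives 4 parameters per channel and 12 in total for a three-channel image.

```python
import numpy as np

def predict_channel(channel, w_left, w_up, w_upleft, bias):
    """Linear prediction of one channel with 4 parameters; the context
    (left, upper, upper-left neighbours plus a bias) is an assumption.
    Three channels x 4 parameters gives the 12 parameters mentioned above."""
    h, w = channel.shape
    pred = np.zeros((h, w), dtype=np.float32)
    for i in range(h):
        for j in range(w):
            left   = channel[i, j - 1] if j > 0 else 0
            up     = channel[i - 1, j] if i > 0 else 0
            upleft = channel[i - 1, j - 1] if (i > 0 and j > 0) else 0
            pred[i, j] = w_left * left + w_up * up + w_upleft * upleft + bias
    return np.clip(np.round(pred), 0, 255)
```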
  • the autoencoder model uses a vector-quantized autoencoder; it uses a vector codebook to reduce the space of latent variables and sets the codebook size to 256, that is, the value space of the latent variables in the autoencoder is limited to 256 integers.
  • the encoder and decoder of the autoencoder both use four residual convolution blocks, and the number of channels for each layer of features is 32.
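
A minimal PyTorch-style sketch consistent with the stated configuration (four residual convolution blocks on each side, 32 feature channels, a 256-entry codebook) is shown below; the kernel sizes, the 2x down/upsampling, the output head, and the quantization details are assumptions made only for illustration.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch: int = 32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

class VQAutoencoder(nn.Module):
    """Sketch: 4 residual conv blocks per side, 32 channels, 256-entry codebook."""

    def __init__(self, ch: int = 32, codebook_size: int = 256):
        super().__init__()
        self.enc_in = nn.Conv2d(3, ch, 4, stride=2, padding=1)          # assumed 2x downsampling
        self.encoder = nn.Sequential(*[ResBlock(ch) for _ in range(4)])
        self.codebook = nn.Embedding(codebook_size, ch)                  # latent values limited to 256 integers
        self.decoder = nn.Sequential(*[ResBlock(ch) for _ in range(4)])
        self.dec_out = nn.Sequential(
            nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 3, 3, padding=1))                              # assumed residual-prediction head

    def quantize(self, z):
        # Nearest codebook entry per spatial position (straight-through trick omitted).
        b, c, h, w = z.shape
        flat = z.permute(0, 2, 3, 1).reshape(-1, c)
        idx = torch.cdist(flat, self.codebook.weight).argmin(dim=1)      # integer latent variable
        zq = self.codebook(idx).view(b, h, w, c).permute(0, 3, 1, 2)
        return zq, idx.view(b, h, w)

    def forward(self, x):
        z = self.encoder(self.enc_in(x))
        zq, idx = self.quantize(z)
        pred_residual = self.dec_out(self.decoder(zq))                   # prediction residual (707)
        return pred_residual, idx
```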
  • the model training process and testing process are as follows:
  • Training: train on the training set of a single data set to obtain the parameters of the autoregressive model and the autoencoding model, as well as the statistics of the latent variables, which are used to compress the latent variables.
  • Decompression: using the method provided by this application, the residual encodings and latent variables of all images are used as input to the decompression process at once, and the original images of all the images are output in parallel.
  • PILC (Practical Image Lossless Compression).
  • this technical invention improves the throughput by 14 times while maintaining the compression rate; in terms of both compression rate and throughput, it also outperforms traditional methods such as PNG, WebP, and FLIF.
  • the method provided by this application combines the autoregressive model and the autoencoding model, which greatly reduces the model size compared to using the autoencoding model alone.
  • the autoregressive model provided by this application can realize parallel encoding and parallel decompression, efficient encoding and decoding, and efficient image compression and decompression.
  • the process of the method provided in this application can be run on the AI chip, which avoids the transmission of information between the system memory and the AI chip memory, further improving the encoding and decoding efficiency.
  • this embodiment is designed as follows.
  • Model training: in the model training stage, large high-definition data sets such as OpenImage and ImageNet64 are used for model training to obtain the parameters of the autoregressive model and the autoencoding model.
  • the decompression speed is increased by 7.9 times compared with the non-parallel solution.
  • the parallel scheme has a restriction on the receptive field, but this receptive field has a limited impact on the compression ratio.
  • the coding speed of the semi-dynamic entropy coding (ANS-AI) proposed in this application is increased by 20 times, the decoding speed is increased by 100 times, and the BPD loss is less than 0.55 and 0.17.
  • this semi-dynamic entropy coding can run on the AI chip; on a single V100, the peak speed can reach 1 GB/s.
  • compared with dynamic entropy coding, semi-dynamic entropy coding reduces the number of distribution types required from 2048 to 8, the memory size required for preprocessing is reduced to 1/256 of the original, and the BPD loss is less than 0.03, which reduces the computing resources required for entropy coding and improves coding efficiency.
  • the image coding device includes:
  • the autoregressive module 1401 is used to take the input image as the input of the autoregressive model and output the first image;
  • the residual calculation module 1402 is used to obtain the residual between the first image and the input image to obtain the first residual image
  • the autoencoding module 1403 is used to use the input image as the input of the autoencoding model, and output latent variables and a first residual distribution.
  • the latent variables include features extracted from the input image, and the first residual distribution output by the autoencoding model is used to represent the residual value between each pixel in the input image and the corresponding pixel in the first residual image;
  • the residual encoding module 1404 is used to encode the first residual image and the first residual distribution to obtain residual encoded data;
  • the latent variable encoding module 1405 is used to encode latent variables to obtain latent variable encoded data.
  • the latent variable encoded data and residual encoded data are used to obtain the input image after decompression.
  • the residual encoding module 1404 is specifically configured to use the first residual image and the first residual distribution as inputs of a semi-dynamic entropy encoder, and output residual encoding data.
  • the semi-dynamic entropy encoder is configured to perform entropy encoding using a first preset type of encoding operation; the first preset type of encoding operation includes addition, subtraction, or bit operations, and the semi-dynamic entropy encoder does not include a second preset type of encoding operation.
  • the second preset type includes at least one of multiplication, division or remainder operations, that is, the semi-dynamic entropy encoder does not include long-time operations such as multiplication, division or remainder operations.
  • the semi-dynamic entropy encoder may be obtained by converting a dynamic entropy encoder.
  • the operations of the dynamic entropy encoder can be approximated, for example by replacing them with approximate operations that reduce or remove multiplication, division, and remainder operations; the remaining operations can then be transformed, converting all operations whose time cost exceeds a certain threshold (such as the remaining remainder, multiplication, and division operations) into table accesses and lightweight operations such as addition, subtraction, and bit operations, thereby obtaining the semi-dynamic entropy encoder provided by this application.
  • the entropy encoder can be an entropy encoder obtained by replacing or converting some operations in the dynamic entropy encoder.
  • simple and efficient operations, such as addition, subtraction, and bit operations, can be used to achieve efficient encoding.
  • the latent variable encoding module 1405 is specifically configured to use latent variables as inputs to the static entropy encoder to obtain latent variable encoded data.
  • the auto-encoding model includes an encoding model and a decoding model.
  • the auto-encoding module 1403 is specifically used to: use the input image as the input of the encoding model and output latent variables, where the encoding model is used to extract features from the input image; and use the latent variables as the input of the decoding model to obtain the first residual distribution, where the decoding model is used to predict the residual values corresponding to the pixels of the input image.
  • the autoregressive model is used to predict the values of pixels on the same connection line using the pixel values of already-predicted pixels.
  • the image decompression device includes:
  • the transceiver module 1501 is used to obtain latent variable coded data and residual coded data.
  • the latent variable coded data is obtained by encoding the features extracted from the input image by the encoding end.
  • the residual coded data is obtained by encoding the residual between the first image output by the autoregressive model and the input image;
  • the latent variable decoding module 1502 is used to decode the latent variable encoded data to obtain latent variables.
  • the latent variables include features extracted by the encoding end from the input image;
  • the autoencoding module 1503 is used to use latent variables as the input of the autoencoding model and output the second residual distribution;
  • the residual decoding module 1504 is used to decode the second residual distribution and the residual coded data to obtain the second residual image
  • the autoregressive module 1505 is configured to use the second residual image as the input of backpropagation of the autoregressive model and output the decompressed image.
  • the latent variable decoding module 1502 is specifically configured to use the latent variable encoded data as the input of the static entropy encoder and output the latent variable.
  • the residual decoding module 1504 is specifically configured to use the second residual distribution and the residual coded data as inputs of the semi-dynamic entropy encoder, and output the second residual image.
  • the semi-dynamic entropy encoder is configured to perform entropy encoding using a first preset type of encoding operation; the first preset type of encoding operation includes addition, subtraction, or bit operations, and the semi-dynamic entropy encoder does not include a second preset type of encoding operation.
  • the second preset type includes at least one of multiplication, division or remainder operations, that is, the semi-dynamic entropy encoder does not include long-time operations such as multiplication, division or remainder operations.
  • the semi-dynamic entropy encoder may be obtained by converting a dynamic entropy encoder.
  • the operations of the dynamic entropy encoder can be approximated, for example by replacing them with approximate operations that reduce or remove multiplication, division, and remainder operations; the remaining operations can then be transformed, converting all operations whose time cost exceeds a certain threshold (such as the remaining remainder, multiplication, and division operations) into table accesses and lightweight operations such as addition, subtraction, and bit operations, thereby obtaining the semi-dynamic entropy encoder provided by this application.
  • the semi-dynamic entropy encoder can be an entropy encoder obtained by replacing or converting some operations in the dynamic entropy encoder.
  • simple and efficient operations, such as addition, subtraction, and bit operations, can be used to achieve efficient encoding.
  • the autoregressive module 1505 is specifically configured to decode pixels on the same connection line in the second residual image in parallel through an autoregressive model to obtain a decompressed image.
  • Figure 16 is a schematic structural diagram of another image encoding device provided by this application, as follows.
  • the image encoding device may include a processor 1601 and a memory 1602.
  • the processor 1601 and the memory 1602 are interconnected through lines.
  • the memory 1602 stores program instructions and data.
  • the memory 1602 stores program instructions and data corresponding to the steps in the aforementioned FIGS. 6-11.
  • the processor 1601 is configured to execute the method steps performed by the image encoding device shown in any of the embodiments shown in FIGS. 6 to 11 .
  • the image encoding device may also include a transceiver 1603 for receiving or sending data.
  • Embodiments of the present application also provide a computer-readable storage medium.
  • the computer-readable storage medium stores a program which, when run on a computer, causes the computer to execute the steps described in the embodiments shown in Figures 6 to 11.
  • the illustrated embodiments describe steps in a method.
  • the aforementioned image encoding device shown in FIG. 16 is a chip.
  • Embodiments of the present application also provide an image encoding device.
  • the image encoding device may also be called a digital processing chip or chip.
  • the chip includes a processing unit and a communication interface.
  • the processing unit obtains program instructions through the communication interface, and the program instructions are executed by the processing unit.
  • the processing unit is configured to perform the method steps performed by the image encoding device shown in any of the embodiments in FIGS. 6 to 11 .
  • An embodiment of the present application also provides a digital processing chip.
  • the digital processing chip integrates the circuit and one or more interfaces for realizing the above-mentioned processor 1601, or the functions of the processor 1601.
  • the digital processing chip can complete the method steps of any one or more of the foregoing embodiments.
  • the digital processing chip does not have an integrated memory, it can be connected to an external memory through a communication interface.
  • the digital processing chip implements the actions performed by the image encoding device in the above embodiment according to the program code stored in the external memory.
  • the image encoding device may be a chip.
  • the chip includes: a processing unit and a communication unit.
  • the processing unit may be, for example, a processor.
  • the communication unit may be, for example, an input/output interface, a pin, or a circuit.
  • the processing unit can execute computer execution instructions stored in the storage unit, so that the chip in the server executes the image encoding method described in the embodiments shown in FIGS. 6-11.
  • the storage unit is a storage unit within the chip, such as a register, cache, etc.
  • the storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM), etc.
  • Figure 17 is a schematic structural diagram of another image decompression device provided by this application, as described below.
  • the image decompression device may include a processor 1701 and a memory 1702.
  • the processor 1701 and the memory 1702 are interconnected through lines.
  • the memory 1702 stores program instructions and data.
  • the memory 1702 stores program instructions and data corresponding to the steps in the aforementioned FIGS. 12-13.
  • the processor 1701 is configured to execute the method steps performed by the image decompression device shown in any of the embodiments shown in FIGS. 12 and 13 .
  • the image decompression device may also include a transceiver 1703 for receiving or sending data.
  • Embodiments of the present application also provide a computer-readable storage medium, which stores a program that, when run on a computer, causes the computer to execute the steps in the method described in the embodiments shown in Figures 12 and 13.
  • the aforementioned image decompression device shown in Figure 17 is a chip.
  • Embodiments of the present application also provide an image decompression device.
  • the image decompression device may also be called a digital processing chip or chip.
  • the chip includes a processing unit and a communication interface.
  • the processing unit obtains program instructions through the communication interface, and the program instructions are executed by the processing unit.
  • the processing unit is used to execute the method steps executed by the image decompression device shown in any of the embodiments in FIGS. 12 and 13 .
  • An embodiment of the present application also provides a digital processing chip.
  • the digital processing chip integrates the circuit and one or more interfaces for realizing the above-mentioned processor 1701, or the functions of the processor 1701.
  • the digital processing chip can complete the method steps of any one or more of the foregoing embodiments.
  • the digital processing chip does not have an integrated memory, it can be connected to an external memory through a communication interface.
  • the digital processing chip implements the actions performed by the image decompression device in the above embodiment according to the program code stored in the external memory.
  • the image decompression device may be a chip.
  • the chip includes: a processing unit and a communication unit.
  • the processing unit may be, for example, a processor.
  • the communication unit may be, for example, an input/output interface, a pin, or a circuit.
  • the processing unit can execute computer execution instructions stored in the storage unit, so that the chip in the server executes the image decompression method described in the embodiments shown in Figures 12 and 13.
  • the storage unit is a storage unit within the chip, such as a register, cache, etc.
  • the storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM), etc.
  • An embodiment of the present application also provides a computer program product that, when run on a computer, causes the computer to perform the steps performed by the image encoding device or the image decompression device in the methods described in the embodiments shown in Figures 6 to 13.
  • This application also provides an image processing system, which includes an image encoding device and an image decompression device.
  • the image encoding device is used to execute the method steps corresponding to the aforementioned Figures 6-11.
  • the image decompression device is used to execute the method steps corresponding to the aforementioned Figures 12 and 13.
  • the aforementioned processing unit or processor may be a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or a field-programmable gate array (FPGA).
  • a general-purpose processor may be a microprocessor or any conventional processor, etc.
  • Figure 18 is a schematic structural diagram of a chip provided by an embodiment of the present application.
  • the chip can be represented as a neural network processor NPU 180.
  • the NPU 180 serves as a co-processor mounted on the host CPU (Host CPU), and tasks are allocated by the Host CPU.
  • the core part of the NPU is the arithmetic circuit 1803.
  • the arithmetic circuit 1803 is controlled by the controller 1804 to extract the matrix data in the memory and perform multiplication operations.
  • the computing circuit 1803 includes multiple processing units (process engines, PEs) internally.
  • the arithmetic circuit 1803 is a two-dimensional systolic array.
  • the arithmetic circuit 1803 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition.
  • arithmetic circuit 1803 is a general-purpose matrix processor.
  • the arithmetic circuit obtains the corresponding data of matrix B from the weight memory 1802 and caches it on each PE in the arithmetic circuit.
  • the operation circuit takes matrix A data and matrix B from the input memory 1801 to perform matrix operations, and the partial result or final result of the matrix is stored in an accumulator (accumulator) 1808 .
  • the unified memory 1806 is used to store input data and output data.
  • the weight data is transferred directly to the weight memory 1802 through the direct memory access controller (DMAC) 1805.
  • Input data is also transferred to unified memory 1806 via DMAC.
  • Bus interface unit (bus interface unit, BIU) 1810 is used for interaction between the AXI bus and DMAC and instruction fetch buffer (IFB) 1809.
  • the bus interface unit (BIU) 1810 is used by the instruction fetch buffer 1809 to obtain instructions from the external memory, and is also used by the storage unit access controller 1805 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
  • DMAC is mainly used to transfer the input data in the external memory DDR to the unified memory 1806 or the weight data to the weight memory 1802 or the input data to the input memory 1801 .
  • the vector calculation unit 1807 includes multiple arithmetic processing units, and if necessary, further processes the output of the arithmetic circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison, etc. It is mainly used for non-convolutional/fully connected layer network calculations in neural networks, such as batch normalization, pixel-level summation, upsampling of feature planes, etc.
  • vector calculation unit 1807 can store the processed output vectors to unified memory 1806 .
  • the vector calculation unit 1807 can apply a linear function and/or a nonlinear function to the output of the operation circuit 1803, such as linear interpolation on the feature plane extracted by the convolution layer, or a vector of accumulated values, to generate an activation value.
  • vector calculation unit 1807 generates normalized values, pixel-wise summed values, or both.
  • the processed output vector can be used as an activation input to the arithmetic circuit 1803, such as for use in a subsequent layer in a neural network.
  • the instruction fetch buffer 1809 connected to the controller 1804 is used to store instructions used by the controller 1804;
  • the unified memory 1806, the input memory 1801, the weight memory 1802 and the fetch memory 1809 are all On-Chip memories. External memory is private to the NPU hardware architecture.
  • each layer in the recurrent neural network can be performed by the operation circuit 1803 or the vector calculation unit 1807.
  • the processor mentioned in any of the above places may be a general central processing unit, a microprocessor, an ASIC, or one or more integrated circuits used to control program execution of the methods in Figures 6 to 13.
  • the device embodiments described above are only illustrative.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physically separate.
  • the physical unit can be located in one place or distributed across multiple network units; some or all of the modules can be selected according to actual needs to achieve the purpose of this embodiment.
  • the connection relationship between modules indicates that there are communication connections between them, which can be specifically implemented as one or more communication buses or signal lines.
  • the present application can be implemented by software plus the necessary general-purpose hardware; it can, of course, also be implemented by dedicated hardware, including dedicated integrated circuits, dedicated CPUs, dedicated memories, special components, and the like. In general, all functions performed by computer programs can also be implemented with corresponding hardware, and the specific hardware structures used to implement the same function can be diverse, such as analog circuits, digital circuits, or special-purpose circuits. However, for this application, a software implementation is a better implementation in most cases. Based on this understanding, the technical solution of the present application, in essence or the part that contributes to the prior art, can be embodied in the form of a software product.
  • the computer software product is stored in a readable storage medium, such as a computer floppy disk, USB flash drive, mobile hard disk, read-only memory (ROM), random access memory (RAM), magnetic disk, or optical disk, and includes a number of instructions to cause a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods described in the various embodiments of this application.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wired (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (such as infrared, radio, or microwave) means.
  • the computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center integrated with one or more available media.
  • the available media may be magnetic media (eg, floppy disk, hard disk, magnetic tape), optical media (eg, DVD), or semiconductor media (eg, solid state disk (SSD)), etc.

Abstract

The present application provides an image encoding method and apparatus, and an image decompression method and apparatus, relating to computer vision in the field of artificial intelligence, and used for encoding in combination with the output of an autoregressive model and the output of an autoencoding model, reducing the sizes of the required models and improving the encoding and decoding efficiency. The image encoding method comprises: taking an input image as the input of the autoregressive model, and outputting a first image; obtaining a residual between the first image and the input image to obtain a first residual image; taking the input image as the input of the autoencoding model, and outputting a hidden variable and a first residual distribution, the hidden variable comprising features extracted from the input image, and the first residual distribution comprising a residual value corresponding to each pixel point in the input image outputted by the autoencoding model; encoding the first residual image and the first residual distribution to obtain residual encoded data; and encoding the hidden variable to obtain hidden variable encoded data, the hidden variable encoded data and the residual encoded data being decompressed to obtain the input image.

Description

An image encoding method, an image decompression method, and an apparatus
This application claims priority to the Chinese patent application with application number 202210447177.0, filed with the China Patent Office on April 26, 2022 and entitled "An image encoding method, image decompression method and device", the entire content of which is incorporated herein by reference.
Technical field
The present application relates to the field of image processing, and in particular to an image encoding method, an image decompression method, and a corresponding apparatus.
Background
Images are widely used in various fields, and the transmission or storage of images may be involved in a large number of scenarios. The higher the resolution of an image, the more storage space is consumed when saving it, the higher the bandwidth required when transmitting it, and the lower the transmission efficiency. Therefore, to facilitate the transmission or storage of images, an image can usually be compressed to reduce the number of bits it occupies, thereby reducing the storage space required to save it and the bandwidth required to transmit it.
For example, some common image compression approaches use entropy coding for image compression; commonly used entropy coding algorithms include Huffman coding, arithmetic coding, ANS coding, and so on. However, the compression rates of these entropy coding methods have already reached the optimal level, and it is difficult to further improve the compression rate. Therefore, how to improve encoding and decoding efficiency has become an urgent problem to be solved.
Summary
The present application provides an image encoding method, an image decompression method, and an apparatus, which combine the output of an autoregressive model and the output of an autoencoding model for encoding, reducing the size of the required models and improving encoding and decoding efficiency.
In view of this, in a first aspect, the present application provides an image encoding method, including: using an input image as the input of an autoregressive model and outputting a first image; obtaining the residual between the first image and the input image to obtain a first residual image; using the input image as the input of an autoencoding model and outputting a latent variable and a first residual distribution, where the latent variable includes features extracted from the input image and the first residual distribution includes residual values, predicted by the autoencoding model, that correspond to each pixel in the input image and the corresponding pixel in the first residual image; encoding the first residual image and the first residual distribution to obtain residual encoded data; and encoding the latent variable to obtain latent variable encoded data, where the latent variable encoded data and the residual encoded data are used to obtain the input image after decompression.
Therefore, in this application, the outputs of the autoregressive model and the autoencoding model are combined for encoding, so that both models can be kept very small, which avoids the excessive inference time caused by an overly large autoencoding network and achieves efficient image compression. Moreover, in the method provided by this application, the entire pipeline, including the AI models and the entropy coding, can run as AI lossless compression on an AI chip, which avoids transfers between system memory and AI-chip memory and improves encoding efficiency.
In a possible implementation, encoding the first residual image and the first residual distribution to obtain the residual encoded data includes: taking the first residual image and the first residual distribution as the input of a semi-dynamic entropy encoder and outputting the residual encoded data. The semi-dynamic entropy encoder performs entropy coding using encoding operations of a first preset type, where the first preset type includes addition, subtraction or bit operations, and the semi-dynamic entropy encoder does not include encoding operations of a second preset type, where the second preset type includes at least one of multiplication, division or modulo operations. That is, the semi-dynamic entropy encoder excludes time-consuming operations such as multiplication, division or modulo and may include only simple additions and subtractions, so that efficient encoding can be achieved.
Therefore, in the embodiments of this application, the residual image can be encoded with semi-dynamic entropy coding, that is, using a limited set of distributions. Compared with dynamic entropy coding, this removes time-consuming operations such as multiplication, division and modulo, which greatly improves encoding efficiency.
In a possible implementation, the semi-dynamic entropy encoder may be obtained by transforming a dynamic entropy encoder. Specifically, the operations of the dynamic entropy encoder can be approximated, for example by replacing them with approximate operations that reduce or remove multiplications, divisions, modulo operations and the like; the remaining operations whose time cost exceeds a certain threshold (such as the remaining modulo, multiplication and division operations) can then be transformed into table accesses and lightweight operations such as additions, subtractions and bit operations, yielding the semi-dynamic entropy encoder provided by this application. In other words, the semi-dynamic entropy encoder can be understood as an entropy encoder obtained by replacing or transforming some operations of a dynamic entropy encoder, so that entropy coding with it only uses simple, efficiently executed operations such as additions, subtractions and bit operations, thereby achieving efficient encoding.
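As a simplified illustration of this idea (not the actual semi-dynamic entropy encoder of this application), the following sketch assumes that a prefix-code table has been precomputed offline for each member of a finite set of quantized distributions; the run-time encoding loop then uses only table accesses, shifts, OR operations and additions.

```python
def encode_symbols(symbols, dist_ids, code_tables):
    """
    Table-driven encoding loop using only table lookups, additions and bit
    operations at run time (illustrative simplification).

    symbols     : sequence of symbols to encode (e.g. residual values)
    dist_ids    : for each symbol, the index of the quantized distribution it follows
    code_tables : code_tables[d][s] = (codeword, bit_length), precomputed offline
    """
    bitstream = 0
    nbits = 0
    for s, d in zip(symbols, dist_ids):
        code, length = code_tables[d][s]          # table access only
        bitstream = (bitstream << length) | code  # shift and OR
        nbits += length                           # addition
    return bitstream, nbits

# Usage with two hand-made prefix-code tables (hypothetical quantized distributions):
tables = [
    {0: (0b0, 1), 1: (0b10, 2), 2: (0b11, 2)},   # distribution 0: symbol 0 most likely
    {0: (0b11, 2), 1: (0b0, 1), 2: (0b10, 2)},   # distribution 1: symbol 1 most likely
]
bits, n = encode_symbols([0, 1, 2, 1], [0, 0, 1, 1], tables)
```

A full table-driven coder such as tANS follows the same principle: the expensive per-symbol arithmetic is moved into offline table construction, and the per-symbol work at run time reduces to lookups and bit manipulation.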
In a possible implementation, encoding the latent variable to obtain the latent variable encoded data may include: taking the latent variable as the input of a static entropy encoder to obtain the latent variable encoded data.
因此,本申请实施方式中,可以对从输入图像中提取到的特征进行静态熵编码,可以高效地实现编码。Therefore, in the embodiment of the present application, static entropy coding can be performed on the features extracted from the input image, and coding can be achieved efficiently.
In a possible implementation, the autoencoding model may include an encoding model and a decoding model, and taking the input image as the input of the autoencoding model and outputting the latent variable and the first residual distribution includes: taking the input image as the input of the encoding model and outputting the latent variable, where the encoding model is used to extract features from the input image; and taking the latent variable as the input of the decoding model to obtain the first residual distribution, where the decoding model is used to predict the residual values between the input image and the corresponding pixel distribution.
In the embodiments of this application, a trained autoencoding model can be used to extract important features from the input image and to predict the corresponding residual image, so that, combined with the output of the autoregressive model, residual encoded data that represents the data of the input image can be obtained.
In a possible implementation, the autoregressive model uses the pixel values of already-predicted pixels to predict the values of pixels lying on the same line, so that in the subsequent decoding process the pixels on the same line do not need to wait for other pixels to be decoded before the current pixel can be decoded. This enables parallel decoding of the pixels on the same line and improves the decoding efficiency for the input image.
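One common way to obtain such a schedule is to group pixels by anti-diagonal, so that every pixel in a group depends only on pixels from earlier groups; this particular grouping is an assumption made for illustration and is not mandated by the application.

```python
def antidiagonal_groups(height, width):
    """
    Group pixel coordinates so that all pixels in one group lie on the same
    anti-diagonal (i + j = const). If the causal context of each pixel only
    contains pixels from earlier groups, every pixel in a group can be
    predicted, and later decoded, in parallel.
    """
    groups = [[] for _ in range(height + width - 1)]
    for i in range(height):
        for j in range(width):
            groups[i + j].append((i, j))
    return groups

# Example: for a 3x3 image the schedule is
# [(0,0)], [(0,1),(1,0)], [(0,2),(1,1),(2,0)], [(1,2),(2,1)], [(2,2)]
```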
In a second aspect, this application provides an image decompression method, including: obtaining latent variable encoded data and residual encoded data, where the latent variable encoded data is obtained by encoding features extracted from an input image at the encoding end, and the residual encoded data is obtained by encoding the residual between the image output by the forward pass of an autoregressive model and the input image; decoding the latent variable encoded data to obtain a latent variable, where the latent variable includes the features extracted from the input image at the encoding end; taking the latent variable as the input of an autoencoding model and outputting a second residual distribution; decoding the residual encoded data in combination with the second residual distribution to obtain a second residual image; and taking the second residual image as the input of the backward pass of the autoregressive model and outputting a decompressed image.
An autoencoding model on its own usually has limited fitting capability and needs a rather deep network to reach a good compression rate; by combining it with the output of the autoregressive model, this application can reduce the size of the autoencoding model. Therefore, in this application, decoding combines the autoregressive model and the autoencoding model, both of which can be kept very small, which avoids the excessive inference time caused by an overly large autoencoding network and achieves efficient image decompression. Moreover, in the method provided by this application, the entire pipeline, including the AI models and the entropy coding, can run as AI lossless compression on an AI chip, which avoids transfers between system memory and AI-chip memory and improves coding efficiency.
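Mirroring the encoder sketch given under the first aspect, the decoding flow of the second aspect could be sketched as follows; again, all function and model names are hypothetical placeholders rather than interfaces defined by this application.

```python
def decode_image(latent_bits, residual_bits, ar_model, ae_decoder,
                 semi_dynamic_decode, static_decode):
    """Sketch of the second-aspect decoding flow (all callables are illustrative)."""
    # 1. Recover the latent variable losslessly with the static entropy coder.
    latent = static_decode(latent_bits)

    # 2. Re-derive the residual distribution from the latent variable.
    residual_dist = ae_decoder(latent)        # "second residual distribution"

    # 3. Decode the residual image against that distribution.
    residual = semi_dynamic_decode(residual_bits, residual_dist)   # "second residual image"

    # 4. Invert the autoregressive prediction (its backward pass) to get the image.
    return ar_model.backward(residual)
```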
在一种可能的实施方式中,前述的对隐变量编码数据进行解码,得到隐变量,包括:将隐变量编码数据作为静态熵编码器的输入,输出隐变量。其中,该解码可以理解为编码端进行静态熵编码的逆运算,从而无损恢复得到图像中的重要特征。In a possible implementation, the aforementioned decoding of latent variable encoded data to obtain latent variables includes: using latent variable encoded data as input to a static entropy encoder and outputting latent variables. Among them, this decoding can be understood as the inverse operation of static entropy coding performed by the encoding end, so that important features in the image can be obtained by lossless recovery.
In a possible implementation, decoding the residual encoded data in combination with the second residual distribution to obtain the second residual image includes: taking the second residual distribution and the residual encoded data as the input of a semi-dynamic entropy encoder and outputting the second residual image. The semi-dynamic entropy encoder performs entropy coding using encoding operations of a first preset type, where the first preset type includes addition, subtraction or bit operations, and does not include encoding operations of a second preset type, where the second preset type includes at least one of multiplication, division or modulo operations; that is, time-consuming operations such as multiplication, division or modulo are excluded and only simple additions and subtractions may be used, so that efficient coding is achieved. Therefore, the residual image can be decoded based on semi-dynamic entropy coding, that is, with a limited set of distributions, which, compared with dynamic entropy coding, removes time-consuming operations such as multiplication, division and modulo and greatly improves decoding efficiency.
In a possible implementation, the semi-dynamic entropy encoder may be obtained by transforming a dynamic entropy encoder. Specifically, the operations of the dynamic entropy encoder can be approximated, for example by replacing them with approximate operations that reduce or remove multiplications, divisions, modulo operations and the like; the remaining operations whose time cost exceeds a certain threshold (such as the remaining modulo, multiplication and division operations) can then be transformed into table accesses and lightweight operations such as additions, subtractions and bit operations, yielding the semi-dynamic entropy encoder provided by this application. In other words, the semi-dynamic entropy encoder can be understood as an entropy encoder obtained by replacing or transforming some operations of a dynamic entropy encoder, so that entropy coding with it only uses simple, efficiently executed operations such as additions, subtractions and bit operations, thereby achieving efficient encoding.
In a possible implementation, taking the second residual image as the input of the backward pass of the autoregressive model and outputting the decompressed image includes: decoding, by the autoregressive model, the pixels of the second residual image that lie on the same line in parallel to obtain the decompressed image. Therefore, the pixels on the same line do not need to wait for other pixels to be decoded before the current pixel can be decoded, which enables parallel decoding of the pixels on the same line and improves the decoding efficiency for the input image.
第三方面,本申请提供一种图像编码装置,包括:In a third aspect, this application provides an image coding device, including:
an autoregressive module, configured to take the input image as the input of an autoregressive model and output a first image;
残差计算模块,用于获取第一图像和输入图像之间的残差,得到第一残差图像;The residual calculation module is used to obtain the residual between the first image and the input image to obtain the first residual image;
an autoencoding module, configured to take the input image as the input of an autoencoding model and output a latent variable and a first residual distribution, where the latent variable includes features extracted from the input image and the first residual distribution includes the residual values, output by the autoencoding model, that correspond to the pixels of the input image and the pixels of the first residual image;
残差编码模块,用于对第一残差图像和第一残差分布进行编码,得到残差编码数据; A residual coding module, used to code the first residual image and the first residual distribution to obtain residual coded data;
隐变量编码模块,用于对隐变量进行编码,得到隐变量编码数据,隐变量编码数据和残差编码数据用于解压后得到输入图像。The latent variable encoding module is used to encode latent variables to obtain latent variable encoded data. The latent variable encoded data and residual encoded data are used to obtain the input image after decompression.
In a possible implementation, the residual encoding module is specifically configured to take the first residual image and the first residual distribution as the input of a semi-dynamic entropy encoder and output the residual encoded data. The semi-dynamic entropy encoder performs entropy coding using encoding operations of a first preset type, where the first preset type includes addition, subtraction or bit operations, and does not include encoding operations of a second preset type, where the second preset type includes at least one of multiplication, division or modulo operations; that is, the semi-dynamic entropy encoder excludes time-consuming operations such as multiplication, division or modulo and may include only simple additions and subtractions, so that efficient encoding can be achieved.
In a possible implementation, the semi-dynamic entropy encoder may be obtained by transforming a dynamic entropy encoder. Specifically, the operations of the dynamic entropy encoder can be approximated, for example by replacing them with approximate operations that reduce or remove multiplications, divisions, modulo operations and the like; the remaining operations whose time cost exceeds a certain threshold (such as the remaining modulo, multiplication and division operations) can then be transformed into table accesses and lightweight operations such as additions, subtractions and bit operations, yielding the semi-dynamic entropy encoder provided by this application. In other words, the semi-dynamic entropy encoder can be understood as an entropy encoder obtained by replacing or transforming some operations of a dynamic entropy encoder, so that entropy coding with it only uses simple, efficiently executed operations such as additions, subtractions and bit operations, thereby achieving efficient encoding.
在一种可能的实施方式中,隐变量编码模块,具体用于将隐变量作为静态熵编码器的输入,得到隐变量编码数据。In a possible implementation, the latent variable encoding module is specifically configured to use latent variables as inputs to the static entropy encoder to obtain latent variable encoded data.
In a possible implementation, the autoencoding model includes an encoding model and a decoding model, and the autoencoding module is specifically configured to: take the input image as the input of the encoding model and output the latent variable, where the encoding model is used to extract features from the input image; and take the latent variable as the input of the decoding model to obtain the first residual distribution, where the decoding model is used to predict the residual values between the input image and the corresponding pixel distribution.
在一种可能的实施方式中,自回归模型用于使用已预测的像素点的像素值预测处于同一连线上的像素点的值。In a possible implementation, the autoregressive model is used to predict the values of pixels on the same connection using the predicted pixel values of the pixels.
第四方面,本申请提供一种图像解压装置,包括:In a fourth aspect, this application provides an image decompression device, including:
收发模块,用于获取隐变量编码数据和残差编码数据,该隐变量编码数据包括编码端从输入图像中提取到的特征进行编码得到,该残差编码数据包括对自回归模型输出的图像和该输入图像之间的残差进行编码得到的数据;The transceiver module is used to obtain latent variable encoding data and residual encoding data. The latent variable encoding data includes encoding the features extracted from the input image by the encoding end. The residual encoding data includes the image output by the autoregressive model and The data obtained by encoding the residual between the input images;
隐变量解码模块,用于对隐变量编码数据进行解码,得到隐变量,该隐变量包括编码端从输入图像中提取到的特征;The latent variable decoding module is used to decode the latent variable encoded data to obtain the latent variable. The latent variable includes the features extracted by the encoding end from the input image;
自编码模块,用于将隐变量作为自编码模型的输入,输出第二残差分布;The autoencoding module is used to use latent variables as the input of the autoencoding model and output the second residual distribution;
残差解码模块,用于结合第二残差分布和残差编码数据进行解码,得到第二残差图像;The residual decoding module is used to decode the second residual distribution and the residual coded data to obtain the second residual image;
自回归模块,用于将第二残差图像作为自回归模型的反向传播的输入,输出解压图像。The autoregressive module is used to use the second residual image as the input of backpropagation of the autoregressive model and output the decompressed image.
在一种可能的实施方式中,隐变量解码模块,具体用于将隐变量编码数据作为静态熵编码器的输入,输出隐变量。In a possible implementation, the latent variable decoding module is specifically configured to use the latent variable encoded data as the input of the static entropy encoder and output the latent variable.
In a possible implementation, the residual decoding module is specifically configured to take the second residual distribution and the residual encoded data as the input of a semi-dynamic entropy encoder and output the second residual image. The semi-dynamic entropy encoder performs entropy coding using encoding operations of a first preset type, where the first preset type includes addition, subtraction or bit operations, and does not include encoding operations of a second preset type, where the second preset type includes at least one of multiplication, division or modulo operations; that is, the semi-dynamic entropy encoder excludes time-consuming operations such as multiplication, division or modulo and may include only simple additions and subtractions, so that efficient coding can be achieved.
In a possible implementation, the semi-dynamic entropy encoder may be obtained by transforming a dynamic entropy encoder. Specifically, the operations of the dynamic entropy encoder can be approximated, for example by replacing them with approximate operations that reduce or remove multiplications, divisions, modulo operations and the like; the remaining operations whose time cost exceeds a certain threshold (such as the remaining modulo, multiplication and division operations) can then be transformed into table accesses and lightweight operations such as additions, subtractions and bit operations, yielding the semi-dynamic entropy encoder provided by this application. In other words, the semi-dynamic entropy encoder can be understood as an entropy encoder obtained by replacing or transforming some operations of a dynamic entropy encoder, so that entropy coding with it only uses simple, efficiently executed operations such as additions, subtractions and bit operations, thereby achieving efficient encoding.
在一种可能的实施方式中,自回归模块,具体用于通过自回归模型,对第二残差图像中处于同一连线上的像素点进行并行解码,得到解压图像。In a possible implementation, the autoregressive module is specifically configured to decode pixels on the same connection line in the second residual image in parallel through the autoregressive model to obtain the decompressed image.
第五方面,本申请实施例提供一种图像编码装置,该图像编码装置具有实现上述第一方面图像处理方法的功能。该功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。该硬件或软件包括一个或多个与上述功能相对应的模块。In a fifth aspect, embodiments of the present application provide an image coding device, which has the function of implementing the image processing method in the first aspect. This function can be implemented by hardware, or it can be implemented by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions.
第六方面,本申请实施例提供一种图像解压装置,该图像解压装置具有实现上述第二方面图像处理方法的功能。该功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。该硬件或软件包括一个或多个与上述功能相对应的模块。In a sixth aspect, embodiments of the present application provide an image decompression device, which has the function of implementing the image processing method in the second aspect. This function can be implemented by hardware, or it can be implemented by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions.
In a seventh aspect, embodiments of this application provide an image encoding apparatus, including a processor and a memory, where the processor and the memory are interconnected through a line, and the processor calls program code in the memory to perform the processing-related functions of the image encoding method in any one of the implementations of the first aspect. Optionally, the image encoding apparatus may be a chip.
In an eighth aspect, embodiments of this application provide an image decompression apparatus, including a processor and a memory, where the processor and the memory are interconnected through a line, and the processor calls program code in the memory to perform the processing-related functions of the image decompression method in any one of the implementations of the second aspect. Optionally, the image decompression apparatus may be a chip.
In a ninth aspect, embodiments of this application provide an image encoding apparatus, which may also be called a digital processing chip or chip, where the chip includes a processing unit and a communication interface, the processing unit obtains program instructions through the communication interface, and the program instructions are executed by the processing unit, the processing unit being configured to perform the processing-related functions of the first aspect or any optional implementation of the first aspect.
In a tenth aspect, embodiments of this application provide an image decompression apparatus, which may also be called a digital processing chip or chip, where the chip includes a processing unit and a communication interface, the processing unit obtains program instructions through the communication interface, and the program instructions are executed by the processing unit, the processing unit being configured to perform the processing-related functions of the second aspect or any optional implementation of the second aspect.
In an eleventh aspect, embodiments of this application provide an image processing system, including an image encoding apparatus and an image decompression apparatus, where the image encoding apparatus is configured to perform the processing-related functions of the first aspect or any optional implementation of the first aspect, and the image decompression apparatus is configured to perform the processing-related functions of the second aspect or any optional implementation of the second aspect.
第十二方面,本申请实施例提供了一种计算机可读存储介质,包括指令,当其在计算机上运行时,使得计算机执行上述第一方面或第二方面中任一可选实施方式中的方法。In a twelfth aspect, embodiments of the present application provide a computer-readable storage medium, including instructions that, when run on a computer, cause the computer to execute any of the optional implementations of the first aspect or the second aspect. method.
第十三方面,本申请实施例提供了一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行上述第一方面或第二方面中任一可选实施方式中的方法。In a thirteenth aspect, embodiments of the present application provide a computer program product containing instructions that, when run on a computer, cause the computer to execute the method in any optional implementation of the first aspect or the second aspect.
附图说明Description of the drawings
图1为本申请应用的一种人工智能主体框架示意图;Figure 1 is a schematic diagram of an artificial intelligence subject framework applied in this application;
图2为本申请实施例提供的一种系统架构示意图;Figure 2 is a schematic diagram of a system architecture provided by an embodiment of the present application;
图3为本申请实施例的一种应用场景示意图;Figure 3 is a schematic diagram of an application scenario according to the embodiment of the present application;
图4为本申请实施例的另一种应用场景示意图;Figure 4 is a schematic diagram of another application scenario according to the embodiment of the present application;
图5为本申请实施例的另一种应用场景示意图;Figure 5 is a schematic diagram of another application scenario according to the embodiment of the present application;
图6为本申请实施例提供的一种图像编码方法的流程示意图;Figure 6 is a schematic flowchart of an image encoding method provided by an embodiment of the present application;
图7为本申请实施例提供的另一种图像编码方法的流程示意图;Figure 7 is a schematic flow chart of another image encoding method provided by an embodiment of the present application;
图8为本申请实施例提供的一种自回归模型的预测方式示意图;Figure 8 is a schematic diagram of a prediction method of an autoregressive model provided by an embodiment of the present application;
图9为本申请实施例提供的一种自回归模型的预测顺序示意图;Figure 9 is a schematic diagram of the prediction sequence of an autoregressive model provided by an embodiment of the present application;
图10为本申请实施例提供的一种残差计算方式示意图;Figure 10 is a schematic diagram of a residual calculation method provided by the embodiment of the present application;
图11为本申请实施例提供的一种数据结构示意图;Figure 11 is a schematic diagram of a data structure provided by an embodiment of the present application;
图12为本申请实施例提供的一种图像解压方法的流程示意图;Figure 12 is a schematic flow chart of an image decompression method provided by an embodiment of the present application;
图13为本申请实施例提供的另一种图像解压方法的流程示意图;Figure 13 is a schematic flow chart of another image decompression method provided by an embodiment of the present application;
图14为本申请提供的一种图像编码装置的结构示意图;Figure 14 is a schematic structural diagram of an image coding device provided by this application;
图15为本申请提供的一种图像解码装置的结构示意图;Figure 15 is a schematic structural diagram of an image decoding device provided by this application;
图16为本申请提供的另一种图像编码装置的结构示意图;Figure 16 is a schematic structural diagram of another image coding device provided by the present application;
图17为本申请提供的另一种图像解码装置的结构示意图;Figure 17 is a schematic structural diagram of another image decoding device provided by this application;
图18为本申请提供的一种芯片结构示意图。Figure 18 is a schematic structural diagram of a chip provided by this application.
具体实施方式Detailed ways
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are only some of the embodiments of the present application, rather than all of the embodiments. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative efforts fall within the scope of protection of this application.
First, the overall workflow of an artificial intelligence system is described. Referring to Figure 1, Figure 1 shows a schematic structural diagram of an artificial intelligence framework, which is described below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects the series of processes from data acquisition to data processing, for example the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output; in this process, data goes through the refinement process of "data - information - knowledge - wisdom". The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure and information (technologies for providing and processing information) of artificial intelligence to the industrial ecology of the system.
(1)基础设施(1)Infrastructure
基础设施为人工智能系统提供计算能力支持,实现与外部世界的沟通,并通过基础平台实现支撑。通过传感器与外部沟通;计算能力由智能芯片,如中央处理器(central processing unit,CPU)、网络处理器(neural-network processing unit,NPU)、图形处理器(graphics processing unit,GPU)、专用集成电路(application specific integrated circuit,ASIC)或现场可编程逻辑门阵列(field programmable gate array,FPGA)等硬件加速芯片)提供;基础平台包括分布式计算框架及网络等相关的平台保障和支持,可以包括云存储和计算、互联互通网络等。举例来说,传感器和外部沟通获取数据,这些数据提供给基础平台提供的分布式计算系统中的智能芯片进行计算。Infrastructure provides computing power support for artificial intelligence systems, enables communication with the external world, and supports it through basic platforms. Communicate with the outside through sensors; computing power is provided by smart chips, such as central processing unit (CPU), neural-network processing unit (NPU), graphics processing unit (GPU), dedicated integration Hardware acceleration chips such as application specific integrated circuit (ASIC) or field programmable gate array (FPGA) are provided; the basic platform includes distributed computing framework and network and other related platform guarantees and support, which can include Cloud storage and computing, interconnection network, etc. For example, sensors communicate with the outside world to obtain data, which are provided to smart chips in the distributed computing system provided by the basic platform for calculation.
(2)数据(2)Data
基础设施的上一层的数据用于表示人工智能领域的数据来源。数据涉及到图形、图像、语音、文本,还涉及到传统设备的物联网数据,包括已有系统的业务数据以及力、位移、液位、温度、湿度等感知数据。Data from the upper layer of the infrastructure is used to represent data sources in the field of artificial intelligence. The data involves graphics, images, voice, and text, as well as IoT data of traditional devices, including business data of existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
(3)数据处理(3)Data processing
数据处理通常包括数据训练,机器学习,深度学习,搜索,推理,决策等方式。Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making and other methods.
其中,机器学习和深度学习可以对数据进行符号化和形式化的智能信息建模、抽取、预处理、训练等。Among them, machine learning and deep learning can perform symbolic and formal intelligent information modeling, extraction, preprocessing, training, etc. on data.
推理是指在计算机或智能系统中,模拟人类的智能推理方式,依据推理控制策略,利用形式化的信息进行机器思维和求解问题的过程,典型的功能是搜索与匹配。Reasoning refers to the process of simulating human intelligent reasoning in computers or intelligent systems, using formal information to perform machine thinking and problem solving based on reasoning control strategies. Typical functions are search and matching.
决策是指智能信息经过推理后进行决策的过程,通常提供分类、排序、预测等功能。Decision-making refers to the process of decision-making after intelligent information is reasoned, and usually provides functions such as classification, sorting, and prediction.
(4)通用能力(4) General ability
对数据经过上面提到的数据处理后,进一步基于数据处理的结果可以形成一些通用的能力,比如可以是算法或者一个通用系统,例如,翻译,文本的分析,计算机视觉的处理,语音识别,图像的识别等等。After the data is processed as mentioned above, some general capabilities can be formed based on the results of further data processing, such as algorithms or a general system, such as translation, text analysis, computer vision processing, speech recognition, and image processing. identification, etc.
(5)智能产品及行业应用(5) Intelligent products and industry applications
智能产品及行业应用指人工智能系统在各领域的产品和应用,是对人工智能整体解决方案的封装,将智能信息决策产品化、实现落地应用,其应用领域主要包括:智能终端、智能交通、智能医疗、自动驾驶、智慧城市等。Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They are the encapsulation of overall artificial intelligence solutions, productizing intelligent information decision-making and realizing practical applications. Its application fields mainly include: intelligent terminals, intelligent transportation, Smart healthcare, autonomous driving, smart cities, etc.
本申请实施例涉及了大量神经网络和图像的相关应用,为了更好地理解本申请实施例的方案,下面先对本申请实施例可能涉及的神经网络和图像领域的相关术语和概念进行介绍。The embodiments of the present application involve a large number of related applications of neural networks and images. In order to better understand the solutions of the embodiments of the present application, the relevant terms and concepts in the fields of neural networks and images that may be involved in the embodiments of the present application are first introduced below.
(1)神经网络(1)Neural network
A neural network may be composed of neural units. A neural unit may be an arithmetic unit that takes inputs $x_s$ and an intercept of 1, and the output of the arithmetic unit may be expressed by the following formula:

$$h_{W,b}(x) = f\left(W^{T}x\right) = f\left(\sum_{s=1}^{n} W_{s}x_{s} + b\right)$$
其中,s=1、2、……n,n为大于1的自然数,Ws为xs的权重,b为神经单元的偏置。f为神经单元的激活函数(activation functions),用于将非线性特性引入神经网络中,来将神经单元中的输入信号转换为输出信号。该激活函数的输出信号可以作为下一层卷积层的输入,激活函数可以是sigmoid函数。神经网络是将多个上述单一的神经单元联结在一起形成的网络,即一个神经单元的输出可以是另一个神经单元的输入。每个神经单元的输入可以与前一层的局部接受域相连,来提取局部接受域的特征,局部接受域可以是由若干个神经单元组成的区域。Among them, s=1, 2,...n, n is a natural number greater than 1, Ws is the weight of xs, and b is the bias of the neural unit. f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal. The output signal of this activation function can be used as the input of the next convolutional layer, and the activation function can be a sigmoid function. A neural network is a network formed by connecting multiple above-mentioned single neural units together, that is, the output of one neural unit can be the input of another neural unit. The input of each neural unit can be connected to the local receptive field of the previous layer to extract the features of the local receptive field. The local receptive field can be an area composed of several neural units.
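As a minimal numerical illustration of the formula above (using NumPy, with tanh as an example activation; neither choice is prescribed by this application):

```python
import numpy as np

def neuron(x, w, b, f=np.tanh):
    """Single unit: output = f(sum_s w_s * x_s + b), matching the formula above."""
    return f(np.dot(w, x) + b)

# Example: three inputs, illustrative weights and bias.
y = neuron(x=np.array([0.5, -1.0, 2.0]),
           w=np.array([0.1, 0.4, -0.2]),
           b=0.3)
```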
(2)深度神经网络(2) Deep neural network
深度神经网络(deep neural network,DNN),也称多层神经网络,可以理解为具有多层中间层的神经网络。按照不同层的位置对DNN进行划分,DNN内部的神经网络可以分为三类:输入层,中间层,输出层。一般来说第一层是输入层,最后一层是输出层,中间的层数都是中间层,或者称为隐层。层与层之间是全连接的,也就是说,第i层的任意一个神经元一定与第i+1层的任意一个神经元相连。Deep neural network (DNN), also known as multi-layer neural network, can be understood as a neural network with multiple intermediate layers. DNN is divided according to the positions of different layers. The neural network inside the DNN can be divided into three categories: input layer, intermediate layer, and output layer. Generally speaking, the first layer is the input layer, the last layer is the output layer, and the layers in between are all intermediate layers, or hidden layers. The layers are fully connected, that is to say, any neuron in the i-th layer must be connected to any neuron in the i+1-th layer.
Although a DNN looks complicated, each of its layers can be expressed as the linear relationship $\vec{y} = \alpha(W\vec{x} + \vec{b})$, where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector (also called the bias parameter), $W$ is the weight matrix (also called the coefficients), and $\alpha(\cdot)$ is the activation function. Each layer simply performs this simple operation on the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Because a DNN has many layers, there are also many coefficients $W$ and offset vectors $\vec{b}$. These parameters are defined in the DNN as follows, taking the coefficient $w$ as an example: in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as $w^{3}_{24}$, where the superscript 3 represents the layer in which the coefficient is located, and the subscripts correspond to the output index 2 in the third layer and the input index 4 in the second layer.
In summary, the coefficient from the k-th neuron of layer L-1 to the j-th neuron of layer L is defined as $w^{L}_{jk}$.
需要注意的是,输入层是没有W参数的。在深度神经网络中,更多的中间层让网络更能够刻画现实世界中的复杂情形。理论上而言,参数越多的模型复杂度越高,“容量”也就越大,也就意味着它能完成更复杂的学习任务。训练深度神经网络的也就是学习权重矩阵的过程,其最终目的是得到训练好的深度神经网络的所有层的权重矩阵(由很多层的向量W形成的权重矩阵)。It should be noted that the input layer has no W parameter. In a deep neural network, more intermediate layers make the network more capable of describing complex situations in the real world. Theoretically, a model with more parameters has higher complexity and greater "capacity", which means it can complete more complex learning tasks. Training a deep neural network is the process of learning the weight matrix. The ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (a weight matrix formed by the vectors W of many layers).
(3)卷积神经网络(3) Convolutional neural network
卷积神经网络(convolutional neuron network,CNN)是一种带有卷积结构的深度神经网络。卷积神经网络包含了一个由卷积层和子采样层构成的特征抽取器,该特征抽取器可以看作是滤波器。卷积层是指卷积神经网络中对输入信号进行卷积处理的神经元层。在卷积神经网络的卷积层中,一个神经元可以只与部分邻层神经元连接。一个卷积层中,通常包含若干个特征平面,每个特征平面可以由一些矩形排列的神经单元组成。同一特征平面的神经单元共享权重,这里共享的权重就是卷积核。共享权重可以理解为提取图像信息的方式与位置无关。卷积核可以以随机大小的矩阵的形式初始化,在卷积神经网络的训练 过程中卷积核可以通过学习得到合理的权重。另外,共享权重带来的直接好处是减少卷积神经网络各层之间的连接,同时又降低了过拟合的风险。Convolutional neural network (CNN) is a deep neural network with a convolutional structure. The convolutional neural network contains a feature extractor consisting of a convolutional layer and a subsampling layer, which can be regarded as a filter. The convolutional layer refers to the neuron layer in the convolutional neural network that convolves the input signal. In the convolutional layer of a convolutional neural network, a neuron can be connected to only some of the neighboring layer neurons. A convolutional layer usually contains several feature planes, and each feature plane can be composed of some rectangularly arranged neural units. Neural units in the same feature plane share weights, and the shared weights here are convolution kernels. Shared weights can be understood as a way to extract image information independent of position. The convolution kernel can be initialized in the form of a matrix of random size. During the training of convolutional neural network In the process, the convolution kernel can obtain reasonable weights through learning. In addition, the direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.
(4)损失函数(4)Loss function
In the process of training a deep neural network, because it is hoped that the output of the network is as close as possible to the value that one really wants to predict, the predicted value of the current network can be compared with the really desired target value, and the weight vector of each layer of the network is then updated according to the difference between the two (of course, there is usually an initialization process before the first update, that is, parameters are preconfigured for each layer of the deep neural network). For example, if the predicted value of the network is too high, the weight vectors are adjusted to make the prediction lower, and the adjustment continues until the deep neural network can predict the really desired target value or a value very close to it. Therefore, "how to compare the difference between the predicted value and the target value" needs to be defined in advance; this is the loss function (or objective function), an important equation for measuring the difference between the predicted value and the target value. Taking the loss function as an example, the higher its output value (loss), the larger the difference, and training the deep neural network becomes the process of reducing this loss as much as possible. The loss function may include the mean squared error, cross entropy, logarithmic loss, exponential loss and the like. For example, the mean squared error may be used as the loss function, defined as

$$MSE = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^{2}$$

The specific loss function can be chosen according to the actual application scenario.
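For example, the mean squared error above can be computed with a few lines of NumPy (an illustration only; this application does not fix the choice of loss function):

```python
import numpy as np

def mse_loss(y_pred, y_true):
    """Mean squared error between predictions and targets."""
    return np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2)

# mse_loss([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]) -> 0.4166...
```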
(5)反向传播算法(5)Back propagation algorithm
神经网络可以采用误差反向传播(back propagation,BP)算法在训练过程中修正初始的神经网络模型中参数的大小,使得神经网络模型的重建误差损失越来越小。具体地,前向传递输入信号直至输出会产生误差损失,通过反向传播误差损失信息来更新初始的神经网络模型中参数,从而使误差损失收敛。反向传播算法是以误差损失为主导的反向传播运动,旨在得到最优的神经网络模型的参数,例如权重矩阵。The neural network can use the error back propagation (BP) algorithm to modify the size of the parameters in the initial neural network model during the training process, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, forward propagation of the input signal until the output will produce an error loss, and the parameters in the initial neural network model are updated by backpropagating the error loss information, so that the error loss converges. The backpropagation algorithm is a backpropagation movement dominated by error loss, aiming to obtain the optimal parameters of the neural network model, such as the weight matrix.
(6)熵编码(6)Entropy coding
熵编码即编码过程中按熵原理不丢失任何信息的编码。信息熵为信源的平均信息量(不确定性的度量)。常见的熵编码有:香农(Shannon)编码、哈夫曼(Huffman)编码和算术编码(arithmetic coding)等。Entropy coding refers to coding that does not lose any information according to the entropy principle during the coding process. Information entropy is the average amount of information in the source (a measure of uncertainty). Common entropy codes include: Shannon coding, Huffman coding, arithmetic coding, etc.
例如,若预测的图像中各个像素点的像素值分布已知,则最优压缩方案可利用熵编码技术获得。利用熵编码技术,一张概率为p的图像可利用-log2p比特表示。例如:概率为1/8的图像需要用3个比特表示,概率为1/256的图像需用8个比特表示。For example, if the pixel value distribution of each pixel in the predicted image is known, the optimal compression scheme can be obtained using entropy coding technology. Using entropy coding technology, an image with probability p can be represented by -log 2 p bits. For example: an image with a probability of 1/8 needs to be represented by 3 bits, and an image with a probability of 1/256 needs to be represented by 8 bits.
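The relationship between probability and ideal code length quoted above can be reproduced with a trivial helper (illustrative only):

```python
import math

def ideal_code_length_bits(p):
    """Ideal entropy-code length, in bits, for data of probability p."""
    return -math.log2(p)

# ideal_code_length_bits(1/8)   -> 3.0 bits
# ideal_code_length_bits(1/256) -> 8.0 bits
```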
要确定每个字母的比特数算法需要尽可能精确地知道每个字母的出现机率,模型的任务是提供这个数据。模型的预言越好压缩的结果就越好。此外模型必须在压缩和恢复时提出同样的数据。To determine the number of bits for each letter, the algorithm needs to know the probability of each letter appearing as accurately as possible, and the model's job is to provide this data. The better the predictions of the model, the better the compression results. Furthermore the model must present the same data during compression and recovery.
静态模型(或者称为静态熵编码)在压缩前对整个文字进行分析计算每个字母的机率。这个计算结果用于整个文字上。编码表只需计算一次,因此编码速度高,除在解码时所需要的机率值外结果肯定不比原文长。本申请提供的方法中,所采用的熵编码可以包括tANS或fse等静态熵编码方式。The static model (or static entropy coding) analyzes the entire text to calculate the probability of each letter before compression. The result of this calculation is applied to the entire text. The encoding table only needs to be calculated once, so the encoding speed is high, and the result will definitely not be longer than the original text except for the probability value required during decoding. In the method provided by this application, the entropy coding used may include static entropy coding methods such as tANS or fse.
动态模型在这个模型里机率随编码过程而不断变化。通过多种算法可以达到这个目的,如: Dynamic model In this model, the probabilities continuously change during the encoding process. This goal can be achieved through a variety of algorithms, such as:
前向动态:机率按照已经被编码的字母来计算,每次一个字母被编码后它的机率就增高。Forward dynamics: The probability is calculated based on the letters that have been encoded. Each time a letter is encoded, its probability increases.
反向动态:在编码前计算每个字母在剩下的还未编码的部分的机率。随着编码的进行最后越来越多的字母不再出现,它们的机率成为0,而剩下的字母的机率升高,为它们编码的比特数降低。压缩率不断增高,以至于最后一个字母只需要0比特来编码。Inverse dynamics: before encoding, calculate the probability of each letter in the remaining unencoded part. As the encoding proceeds, more and more letters no longer appear, and their probabilities become 0, while the probabilities of the remaining letters increase, and the number of bits encoded for them decreases. The compression ratio keeps increasing so that the last letter only requires 0 bits to encode.
因此,模型按照不同部位的特殊性优化;在前向模型中机率数据不需要输送。Therefore, the model is optimized according to the specificity of different parts; probabilistic data does not need to be transmitted in the forward model.
In this application, entropy coding is divided into several kinds, for example static entropy coding, semi-dynamic entropy coding and dynamic entropy coding. Whatever the encoder, the goal is the same: data of probability p is encoded with a length close to -log2 p. The difference is that static entropy coding encodes with a single probability distribution, semi-dynamic entropy coding encodes with several (i.e. a finite number of) probability distributions, and dynamic entropy coding encodes with an arbitrary, unlimited set of probability distributions.
(7)自回归模型(7)Autoregressive model
是一种处理时间序列的方式,其用同一变量的前期历史数据来预测当前数据。It is a way of processing time series that uses previous historical data of the same variable to predict current data.
例如,用同一变数例如x的之前各期,即x1至xt-1来预测本期xt的表现,并假设它们为一线性关系。因为这是从回归分析中的线性回归发展而来,只是不用x预测y,而是用x预测x;所以叫做自回归。For example, the same variable such as x in previous periods, that is, x 1 to x t-1 , is used to predict the performance of x t in the current period, and it is assumed that they are a linear relationship. Because this is developed from linear regression in regression analysis, it just does not use x to predict y, but uses x to predict x; so it is called autoregression.
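Written out explicitly, a linear autoregressive model of order p has the standard textbook form (not a formula taken from this application):

$$x_t = c + \sum_{i=1}^{p} \varphi_i\, x_{t-i} + \varepsilon_t$$

where $c$ is a constant, $\varphi_i$ are the regression coefficients and $\varepsilon_t$ is a noise term.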
(8)自编码模型(8)Autoencoding model
自编码模型是一种利用反向传播算法使得输出值等于输入值的神经网络,先将输入数据压缩成潜在空间表征,然后通过这种表征来重构输出。The autoencoding model is a neural network that uses the backpropagation algorithm to make the output value equal to the input value. It first compresses the input data into a latent space representation, and then reconstructs the output through this representation.
自编码模型通常包括编码(encoder)模型和解码(decoder)模型。本申请中,训练后的编码模型用于从输入图像中提取特征,得到隐变量,将该隐变量输入至训练后的解码模型,即可输出预测的输入图像对应的残差。Autoencoding models usually include encoding (encoder) models and decoder (decoder) models. In this application, the trained encoding model is used to extract features from the input image to obtain latent variables. The latent variables are input to the trained decoding model to output the predicted residual corresponding to the input image.
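A minimal sketch of such an encoder/decoder pair, written with PyTorch purely for illustration: the layer counts, channel sizes and the use of plain convolutions are assumptions and not the network actually used in this application, and the decoder output here simply stands in for the parameters of the per-pixel residual distribution.

```python
import torch
from torch import nn

class TinyAutoEncoder(nn.Module):
    """Illustrative encoder/decoder split: image -> latent -> residual-distribution parameters."""
    def __init__(self, channels=3, latent_channels=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, latent_channels, 3, stride=2, padding=1),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_channels, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, channels, 4, stride=2, padding=1),
        )

    def forward(self, x):
        latent = self.encoder(x)              # features extracted from the input image
        residual_params = self.decoder(latent)  # stands in for the predicted residual distribution
        return latent, residual_params

# Usage: latent, residual_params = TinyAutoEncoder()(torch.randn(1, 3, 64, 64))
```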
(9)无损压缩(9)Lossless compression
对数据进行压缩的技术,压缩后数据占用空间小于压缩前,并且压缩后数据能够通过解压还原出原始数据,解压后的数据与压缩前的数据是完全一致的。A technology that compresses data. After compression, the data takes up less space than before compression, and the compressed data can be decompressed to restore the original data. The decompressed data is completely consistent with the data before compression.
通常,图像中各个像素点出现的概率(即通过其他像素点的像素值预测当前像素点的像素值时得到的概率值)越大,压缩后的长度越短。真实存在的图像的概率远高于随机生成的图像,因此压缩每像素所需要的比特数(bpd)远小于后者。在实际应用中,大部分图像的BPD显著小于压缩前,只有极小概率高于压缩前,从而减小平均每张图像的bpd。Generally, the greater the probability of occurrence of each pixel in the image (that is, the probability value obtained when the pixel value of the current pixel is predicted by the pixel value of other pixels), the shorter the compressed length will be. The probability of a real image is much higher than that of a randomly generated image, so the number of bits per pixel (bpd) required for compression is much smaller than the latter. In practical applications, the BPD of most images is significantly smaller than before compression, and has only a very small probability to be higher than before compression, thus reducing the average bpd of each image.
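Under one common normalisation (dividing the compressed size in bits by the number of pixel values; the exact normalisation used in this application is not specified here), the bits-per-pixel figure can be computed as:

```python
def bits_per_value(compressed_num_bytes, height, width, channels):
    """Average number of bits spent per pixel value after compression (one common convention)."""
    return compressed_num_bytes * 8 / (height * width * channels)

# Example: 300 KB for a 1920x1080 RGB image -> about 0.395 bits per value.
```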
(10)压缩率(10)Compression rate
原始数据大小与压缩后数据大小的比值,如果没有压缩该值为1,该值越大越好。The ratio of the original data size to the compressed data size. If there is no compression, the value is 1. The larger the value, the better.
(11)吞吐量(11)Throughput
每秒钟能够压缩/解压原始数据的大小。The size of raw data that can be compressed/decompressed per second.
(12)感受野(12) Receptive field
预测一个像素点时,需要预先知道的点。改变非感受野中的点不会改变像素点的预测。When predicting a pixel, the point needs to be known in advance. Changing points in the non-receptive field does not change the prediction of the pixel.
本申请实施例提供的编码方法以及解码方法可以在服务器上被执行,还可以在终端设备上被执行,相应地,本申请以下提及的神经网络,可以部署于服务器,也可以部署于终端上,具体可以根据实际应用场景调整。例如,本申请提供的编码方法以及解码方法,可 以通过插件的方式部署于终端中。其中该终端设备可以是具有图像处理功能的移动电话、平板个人电脑(tablet personal computer,TPC)、媒体播放器、智能电视、笔记本电脑(laptop computer,LC)、个人数字助理(personal digital assistant,PDA)、个人计算机(personal computer,PC)、照相机、摄像机、智能手表、可穿戴式设备(wearable device,WD)或者自动驾驶的车辆等,本申请实施例对此不作限定。下面示例性地,以本申请提供的编码方法以及解码方法部署于终端为例进行示例性说明。The encoding method and decoding method provided by the embodiments of this application can be executed on the server or on the terminal device. Correspondingly, the neural network mentioned below in this application can be deployed on the server or on the terminal. , which can be adjusted according to actual application scenarios. For example, the encoding method and decoding method provided by this application can Deployed in the terminal through plug-ins. The terminal device may be a mobile phone with image processing function, a tablet personal computer (TPC), a media player, a smart TV, a laptop computer (LC), or a personal digital assistant (PDA). ), personal computer (PC), camera, camcorder, smart watch, wearable device (WD) or self-driving vehicle, etc., the embodiments of the present application are not limited to this. The following is an exemplary description taking the encoding method and decoding method provided by this application being deployed on a terminal as an example.
本申请提供的编码方法以及解码方法中的全部或者部分流程可以通过神经网络来实现,如其中的自回归模型、自编码模型等,都可以通过神经网络来实现。而通常神经网络需要在训练之后部署在终端上,如图2所示,本申请实施例提供了一种系统架构100。在图2中,数据采集设备160用于采集训练数据。在一些可选的实现中,本申请中,针对自回归模型和自编码模型,训练数据可以包括大量高清图像。All or part of the processes in the encoding method and decoding method provided by this application can be implemented through neural networks. For example, the autoregressive model, autoencoding model, etc. can be implemented through neural networks. Generally, the neural network needs to be deployed on the terminal after training. As shown in Figure 2, this embodiment of the present application provides a system architecture 100. In Figure 2, data collection device 160 is used to collect training data. In some optional implementations, in this application, for the autoregressive model and the autoencoding model, the training data may include a large number of high-definition images.
在采集到训练数据之后,数据采集设备160将这些训练数据存入数据库130,训练设备120基于数据库130中维护的训练数据训练得到目标模型/规则101。可选地,在本申请以下实施方式中所提及的训练集,可以是从该数据库130中得到,也可以是通过用户的输入数据得到。After collecting the training data, the data collection device 160 stores the training data into the database 130, and the training device 120 trains to obtain the target model/rules 101 based on the training data maintained in the database 130. Optionally, the training set mentioned in the following embodiments of this application may be obtained from the database 130 or may be obtained through user input data.
其中,目标模型/规则101可以为本申请实施例中进行训练后的神经网络,该神经网络可以包括一个或者多个网络,如自回归模型或者自编码模型等。The target model/rule 101 may be a neural network trained in the embodiment of the present application. The neural network may include one or more networks, such as an autoregressive model or an autoencoding model.
The following describes how the training device 120 obtains the target model/rule 101 based on the training data: the training device 120 processes the input three-dimensional model and compares the output image with the high-quality rendered image corresponding to the input three-dimensional model, until the difference between the image output by the training device 120 and the high-quality rendered image is smaller than a certain threshold, thereby completing the training of the target model/rule 101.
上述目标模型/规则101能够用于实现本申请实施例的用于编码方法以及解码方法中提及的神经网络,即,将待处理数据(如待压缩的图像)通过相关预处理后输入该目标模型/规则101,即可得到处理结果。本申请实施例中的目标模型/规则101具体可以为本申请以下所提及的神经网络,该神经网络可以是前述的CNN、DNN或者RNN等类型的神经网络。需要说明的是,在实际的应用中,所述数据库130中维护的训练数据不一定都来自于数据采集设备160的采集,也有可能是从其他设备接收得到的。另外需要说明的是,训练设备120也不一定完全基于数据库130维护的训练数据进行目标模型/规则101的训练,也有可能从云端或其他地方获取训练数据进行模型训练,本申请对此并不作限定。The above target model/rule 101 can be used to implement the neural network mentioned in the encoding method and decoding method in the embodiment of the present application, that is, the data to be processed (such as the image to be compressed) is input to the target after relevant preprocessing. Model/Rule 101, you can get the processing results. The target model/rule 101 in the embodiment of this application may specifically be the neural network mentioned below in this application, and the neural network may be the aforementioned CNN, DNN or RNN type of neural network. It should be noted that in actual applications, the training data maintained in the database 130 may not necessarily be collected by the data collection device 160, but may also be received from other devices. In addition, it should be noted that the training device 120 may not necessarily train the target model/rules 101 based entirely on the training data maintained by the database 130. It may also obtain training data from the cloud or other places for model training, which is not limited in this application. .
The target model/rule 101 trained by the training device 120 can be applied to different systems or devices, for example to the execution device 110 shown in Figure 2, which may also be called a computing device. The execution device 110 may be a terminal, such as a mobile phone terminal, a tablet computer, a laptop computer, an augmented reality (AR)/virtual reality (VR) device, or a vehicle-mounted terminal, or it may be a server or a cloud device. In Figure 2, the execution device 110 is configured with an input/output (I/O) interface 112 for data interaction with external devices, and a user can input data to the I/O interface 112 through a client device 140. In the embodiments of this application, the input data may include the data to be processed that is input by the client device. The client may be another hardware device, such as a terminal or a server, or it may be software deployed on a terminal, such as an app or a web page.
The preprocessing module 113 and the preprocessing module 114 are used to perform preprocessing based on the input data (such as the data to be processed) received by the I/O interface 112. In the embodiments of this application, the preprocessing module 113 and the preprocessing module 114 may also be omitted (or only one of them may be present), and the calculation module 111 may be used directly to process the input data.
When the execution device 110 preprocesses the input data, or when the calculation module 111 of the execution device 110 performs calculation or other related processing, the execution device 110 may call data, code, and the like in the data storage system 150 for the corresponding processing, and may also store the data, instructions, and the like obtained by the corresponding processing into the data storage system 150.
Finally, the I/O interface 112 returns the processing result to the client device 140 so as to provide it to the user. For example, if the first neural network is used for image classification and the processing result is a classification result, the I/O interface 112 returns the obtained classification result to the client device 140 to provide it to the user.
It should be noted that the training device 120 can generate, for different goals or different tasks, corresponding target models/rules 101 based on different training data, and the corresponding target model/rule 101 can then be used to achieve the above goal or complete the above task, thereby providing the user with the desired result. In some scenarios, the execution device 110 and the training device 120 may be the same device or may be located inside the same computing device; for ease of understanding, this application introduces the execution device and the training device separately, which is not a limitation.
In the situation shown in Figure 2, the user can manually specify the input data, and this manual operation can be performed through the interface provided by the I/O interface 112. In another case, the client device 140 may automatically send input data to the I/O interface 112; if the user's authorization is required for the client device 140 to send input data automatically, the user can set the corresponding permission in the client device 140. The user can view the result output by the execution device 110 on the client device 140, and the specific presentation form may be display, sound, action, or the like. The client device 140 may also serve as a data collection end, collecting the input data fed into the I/O interface 112 and the predicted labels output by the I/O interface 112, as shown in the figure, as new sample data and storing them in the database 130. Of course, the collection may also be done without the client device 140; instead, the I/O interface 112 directly stores the input data fed into the I/O interface 112 and the predicted labels output by the I/O interface 112, as shown in the figure, into the database 130 as new sample data.
It should be noted that Figure 2 is only a schematic diagram of a system architecture provided by an embodiment of this application, and the positional relationships among the devices, components, modules, and so on shown in the figure do not constitute any limitation. For example, in Figure 2 the data storage system 150 is an external memory relative to the execution device 110, while in other cases the data storage system 150 may also be placed inside the execution device 110.
As shown in Figure 2, the target model/rule 101 is obtained by training with the training device 120. In the embodiments of this application, the target model/rule 101 may be the neural network in this application; specifically, the neural network provided in the embodiments of this application may include a CNN, a deep convolutional neural network (DCNN), a recurrent neural network (RNN), a constructed neural network, or the like.
The encoding method and the decoding method in the embodiments of this application can be executed by an electronic device, namely the aforementioned execution device. The electronic device includes a CPU and a GPU and can compress images. Of course, it may also include other components, such as an NPU or an ASIC; this is merely an illustrative description and is not enumerated further. Illustratively, the electronic device may be a mobile phone, a tablet computer, a laptop computer, a PC, a mobile internet device (MID), a wearable device, a virtual reality (VR) device, an augmented reality (AR) device, a wireless electronic device in industrial control, a wireless electronic device in self driving, a wireless electronic device in remote medical surgery, a wireless electronic device in a smart grid, a wireless electronic device in transportation safety, a wireless electronic device in a smart city, a wireless electronic device in a smart home, and so on. The electronic device may run the Android system, the iOS system, the Windows system, or other systems. The electronic device may run applications that need to compress images to obtain compressed images, such as communication software, a photo album, or a camera application.
Generally, in some image compression scenarios, entropy coding can be used for compression. Since the distribution of the image is unknown, the original distribution needs to be estimated, and the estimated distribution is fed into the entropy coder for encoding. Usually, the more accurate the estimate, the higher the compression rate. Traditional lossless image compression algorithms mostly rely on the principle that adjacent pixel values are usually close and use fixed prediction methods, which results in low coding efficiency.
In some scenarios, AI-based lossless image compression can also be used. Compared with traditional coding algorithms, AI algorithms can achieve significantly higher compression rates, but their compression/decompression efficiency is very low.
For example, an autoregressive model can be used for image compression. An autoregressive model can be built that, given the values of all previous pixels, outputs the distribution parameters of the predicted point; if the distribution is Gaussian, the output consists of two parameters, the mean and the variance. When compressing with the autoregressive model, all pixels are fed into the autoregressive model to obtain the distribution prediction of each pixel, and the distribution predictions and the pixel values are fed into the entropy coder to obtain the encoded data. During decompression, the pixels are fed into the autoregressive model to obtain the distribution predictions, and the distribution predictions and the encoded data are fed into the entropy coder to obtain the decoded data. However, during encoding and decoding the prediction of each pixel depends on all previous pixels, so the operation efficiency is low; during decompression, all pixels before the current pixel must be decompressed before the current pixel can be decompressed, and one network inference can only decompress one pixel, so the number of network inferences is large and the decompression efficiency is low.
As another example, an autoencoding model can be used for image compression. During encoding, the original data is fed into the encoding network (Encoder) to obtain the latent variables, and the latent variables are fed into the decoding network (Decoder) to obtain the distribution prediction of the image; the hand-designed distribution and the values of the latent variables are fed into the entropy coder to encode the latent variables, and the distribution prediction of the image and the original image are fed into the entropy coder to encode the image. During decoding, the hand-designed distribution and the encoding of the latent variables are fed into the entropy coder to decode the latent variables; the latent variables are fed into the decoding network (Decoder) to obtain the distribution prediction of the image; and the distribution prediction of the image and the encoding of the image are fed into the entropy coder to decode the image. Compared with autoregressive models, autoencoding models have poorer fitting capability: for the compression rate to exceed that of traditional compression algorithms, a deeper network is required, and the latency of a single network inference is high.
Therefore, this application provides an encoding method and a decoding method that use an autoregressive model and an autoencoder model for lossless compression, and provides an efficient semi-dynamic entropy coder, so that both model inference and the coding process run on an AI chip, reducing the transfers between system memory and AI chip memory and achieving high-bandwidth compression and decompression.
First, for ease of understanding, some application scenarios of the encoding method and the decoding method provided by this application are introduced by way of example.
Scenario 1: Saving captured images locally
Taking the case where the method provided by this application is deployed on a terminal as an example, the terminal may include a mobile phone, a camera, a monitoring device, or another device that has a shooting function or is connected to a camera. For example, as shown in Figure 3, after an image is captured, in order to reduce the storage space occupied by the image, the image can be losslessly compressed by the encoding method provided by this application to obtain compressed encoded data. When the image needs to be read, for example when it is displayed in a photo album, it can be decoded by the decoding method provided by this application to obtain the high-definition image. With the method provided by this application, images can be efficiently and losslessly compressed, the storage required to save an image is reduced, and the image can be losslessly restored by decompression to obtain the high-definition image.
Scenario 2: Image transmission
Some communication scenarios may involve image transmission. For example, as shown in Figure 4, when users communicate using communication software, images can be transmitted over wired or wireless networks. In order to increase the transmission rate and reduce the network resources occupied by the transmitted images, the image can be losslessly compressed by the encoding method provided by this application to obtain compressed encoded data, and the encoded data is then transmitted. After receiving the encoded data, the receiving end can decode it by the decoding method provided by this application to obtain the restored image.
Scenario 3: A server storing a large number of images
Some platforms that provide services to users, and some databases, usually need to store a large number of high-definition images. If every frame were stored directly pixel by pixel, a very large amount of storage space would be required. For example, as shown in Figure 5, some shopping software or public data sets need to store a large number of high-definition images on a server, and users can read the required images from the server. The encoding method provided by this application can be used to efficiently and losslessly compress the images that need to be stored to obtain compressed data. When an image needs to be read, the stored encoded data can be decoded by the decoding method provided by this application to obtain the high-definition image.
For ease of understanding, the processes of the encoding method and the decoding method provided by this application are introduced separately below.
Referring to Figure 6, a schematic flowchart of an encoding method provided by this application is as follows.
601. Use the input image as the input of the autoregressive model and output a first image.
The input image may be an image to be compressed, and the autoregressive model may be used to predict the pixel value of the current pixel using the values of pixels in the input image other than the current pixel, so as to obtain the predicted pixel distribution of each pixel, that is, the first image.
The input image may include various kinds of images, and the source of the input image may differ depending on the scenario. For example, the input image may be a captured image, a received image, or the like.
Optionally, during prediction by the autoregressive model, the pixels on the same line can be predicted using the pixel values of pixels that have already been predicted, so that in the subsequent decoding process a pixel on that line can be decoded without waiting for the other pixels on the line to be decoded first. This allows the pixels on the same line to be decoded in parallel and improves the decoding efficiency for the input image. The same line may be the same row, the same column, the same diagonal, or the like, which can be determined according to the actual application scenario.
602. Obtain the residual between the first image and the input image to obtain a first residual image.
After the first image is obtained, the residual value between each pixel in the first image and the corresponding pixel in the input image can be calculated to obtain the first residual image.
The first image and the input image usually have the same resolution, that is, the pixels in the first image correspond one-to-one to the pixels in the input image. Therefore, when calculating the residual, the residual value between each pair of corresponding pixels can be computed, and the resulting residual values form an image, namely the first residual image.
Optionally, when calculating the residual, the residual values are usually integers in the range [-255, 255]. The residual values can be converted to a low-precision numerical type, for example the uint8 type, so that the values are reduced to [0, 255]; by setting an offset, the residual values of the pixels can be distributed around 128, making the data more concentrated, so that the residual distribution between the input image and the output image of the autoregressive model can be represented with less data.
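As a minimal illustrative sketch only (not part of the claimed method), the following Python snippet shows one way such an offset mapping could be implemented; the wrap-around modulo 256 and the function names are assumptions made for this example.

```python
import numpy as np

def residual_to_uint8(original: np.ndarray, predicted: np.ndarray) -> np.ndarray:
    """Map the signed residual (original - predicted) to uint8 values centered near 128."""
    diff = original.astype(np.int16) - predicted.astype(np.int16)
    # Offset by 128 and wrap modulo 256: small residuals land near 128.
    return ((diff + 128) % 256).astype(np.uint8)

def reconstruct_from_residual(predicted: np.ndarray, residual_u8: np.ndarray) -> np.ndarray:
    """Recover the original pixels from the prediction and the uint8 residual."""
    # Undo the offset and the wrap; valid because original pixels lie in [0, 255].
    restored = predicted.astype(np.int16) + residual_u8.astype(np.int16) - 128
    return (restored % 256).astype(np.uint8)
```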
603. Use the input image as the input of the autoencoding model, and output a latent variable and a first residual distribution.
After the input image is obtained, the input image can also be used as the input of the autoencoding model to output the corresponding latent variable and first residual distribution.
The latent variable may include features extracted from the input image, and the first residual distribution may include the residual values, predicted by the autoencoding model, between each pixel of the input image and the corresponding pixel of the first residual image.
Specifically, the autoencoding model may include an encoding model and a decoding model. The encoding model may be used to extract features from the input image, and the decoding model may be used to predict the residual between the input image and the image output by the autoregressive model. That is, features can be extracted from the input image by the encoding model to obtain a latent variable representing the important features of the input image, and the latent variable is used as the input of the decoding model to output the first residual distribution.
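For illustration only, a minimal sketch of such an encoder/decoder pair is given below; the layer configuration, channel counts, and class name are assumptions and do not reflect the exact network used in this application.

```python
import torch
import torch.nn as nn

class ResidualDistributionAE(nn.Module):
    """Schematic autoencoder: the encoder maps the image to a latent code, and the
    decoder predicts per-pixel residual distribution parameters (mean, log-scale)."""
    def __init__(self, channels: int = 3, hidden: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, hidden, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, stride=2, padding=1),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(hidden, hidden, 4, stride=2, padding=1), nn.ReLU(),
            # Two output channels per image channel: mean and log-scale of the residual.
            nn.ConvTranspose2d(hidden, 2 * channels, 4, stride=2, padding=1),
        )

    def forward(self, x: torch.Tensor):
        latent = self.encoder(x)              # latent variable to be entropy-coded
        params = self.decoder(latent)         # predicted residual distribution parameters
        mu, log_scale = params.chunk(2, dim=1)
        return latent, mu, log_scale
```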
It should be noted that this application does not limit the execution order of step 601 and step 603. Step 601 may be performed first, step 603 may be performed first, or step 601 and step 603 may be performed at the same time, which can be adjusted according to the actual application scenario.
604. Encode the first residual image and the first residual distribution to obtain residual encoded data.
After the first residual image and the first residual distribution are obtained, the first residual image and the first residual distribution can be encoded to obtain the residual encoded data.
Specifically, when encoding the first residual image and the first residual distribution, semi-dynamic entropy coding can be used, that is, a limited set of probability distributions is used for encoding to obtain the encoded data of the residual image, namely the residual encoded data. The semi-dynamic entropy coder performs entropy coding using encoding operations of a first preset type, where the first preset type includes addition, subtraction, or bit operations; the semi-dynamic entropy coder does not include encoding operations of a second preset type, where the second preset type includes at least one of the more time-consuming multiplication, division, or modulo operations, so as to improve coding efficiency. Therefore, in the embodiments of this application, a limited number of probability distributions can be used for encoding to obtain the encoding of the residual image. In dynamic entropy coding, decompressing a character requires more instructions, and division and exponentiation are time-consuming, each such instruction taking tens of times as long as an addition; the semi-dynamic entropy coding with a limited set of probability distributions provided by this application therefore achieves efficient coding and improves coding efficiency.
In a possible implementation, the semi-dynamic entropy coder may be obtained by transforming a dynamic entropy coder. Specifically, the operations of the dynamic entropy coder can first be approximated, for example by replacing them with approximate operations so that multiplication, division, modulo, and similar operations are reduced or removed; transformation processing can then be performed on the operations, so that all operations whose time consumption exceeds a certain threshold (such as the remaining modulo, multiplication, and division operations) are converted into table lookups and lightweight operations such as addition, subtraction, and bit operations, yielding the semi-dynamic entropy coder provided by this application. It can be understood that the semi-dynamic entropy coder may be an entropy coder obtained by replacing or transforming some operations of a dynamic entropy coder; when the semi-dynamic entropy coder is used for entropy coding, only simple and efficient operations, such as addition, subtraction, and bit operations, are used, thereby achieving efficient coding.
605. Encode the latent variable to obtain latent variable encoded data.
The latent variable may include important features extracted from the input image. Therefore, during image compression, the extracted important features can be encoded to obtain the latent variable encoded data, so that the image can subsequently be restored losslessly.
Optionally, when encoding the latent variable, static entropy coding may be used: the latent variable is used as the input of the static entropy coder, which outputs the encoded bitstream of the latent variable.
The latent variable encoded data and the residual encoded data can then be used by the decoding end to restore the image losslessly, thereby achieving lossless compression and restoration of the image.
Generally, an autoencoding model has relatively poor fitting capability and needs a deep network to achieve a good compression rate, whereas this application combines the output of the autoregressive model, so the size of the autoencoding model can be reduced. Therefore, in this application, the outputs of the autoregressive model and the autoencoding model are combined for encoding, so that both the autoencoding model and the autoregressive model can be kept very small, avoiding the excessive inference time caused by an overly large autoencoding network and achieving efficient image compression. Moreover, in the method provided by this application, the entire process, including the AI models and the entropy coding, can be implemented as AI lossless compression on an AI chip, avoiding the transfer problem between system memory and AI chip memory and improving coding efficiency.
The process of the encoding method provided by this application has been introduced above. The process is described in more detail below with reference to specific application scenarios. Referring to Figure 7, a schematic flowchart of another encoding method provided by this application is shown.
First, an input image 701 is obtained.
The input image 701 may include an image collected by the device itself or a received image. For example, if the method provided by this application is deployed on a terminal, the input image may include an image collected by the terminal, or an image received by the terminal from another server or terminal.
Then, the input image 701 is used as the input of the autoregressive model 702, and a predicted image 703 is output.
The autoregressive model can be used to predict the pixel probability distribution of each pixel using the pixels adjacent to it, obtaining the predicted image 703, that is, the aforementioned first image.
It can be understood that the autoregressive model can use the pixel values of adjacent pixels to predict the pixel value of the current pixel.
In the embodiments of this application, in order to speed up decoding at the decoding end, when the autoregressive model performs prediction, the pixels on the same line can be predicted in parallel using the pixel values of their adjacent pixels. Taking a specific autoregressive model as an example, as shown in Figure 8, given an m×n image and a hyperparameter h (0≤h<n), if for any pixel (i,j) all points (i′,j′) used by the autoregressive model to predict (i,j) satisfy h×i′+j′ < h×i+j, then the image can be predicted in n+(m-1)×h parallel steps. As shown in Figure 8, when h=1, for pixels on the same diagonal, the pixel values of several pixels to the left, selected with a step of 1, can be used as the receptive field to predict the pixel probability distribution of the current pixel, that is, the probability of the pixel taking each possible value; when h=2, the pixel values of several pixels to the left, selected with a step of 2, can be used as the receptive field to predict the pixel probability distribution of the current pixel. In this way, pixels on the same diagonal can be decompressed in parallel during subsequent decompression.
In addition, the prediction order of the pixels can be as shown in Figure 9, where a smaller number indicates a higher prediction priority and pixels with the same number are predicted at the same time. Therefore, pixels on the same diagonal can be predicted in parallel, improving the prediction efficiency of the autoregressive model.
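The following sketch illustrates the grouping implied by the condition h×i′+j′ < h×i+j: pixels that share the same value of h×i+j form one parallel step, and there are n+(m-1)×h steps in total. The function name is chosen for illustration only.

```python
def wavefront_schedule(m: int, n: int, h: int):
    """Group the pixel coordinates of an m x n image into parallel prediction steps.

    Pixels (i, j) with equal h*i + j can be predicted together, because every pixel
    (i', j') they depend on satisfies h*i' + j' < h*i + j.
    """
    steps = [[] for _ in range(n + (m - 1) * h)]
    for i in range(m):
        for j in range(n):
            steps[h * i + j].append((i, j))
    return steps

# Example: a 3 x 4 image with h = 1 gives 4 + 2*1 = 6 parallel steps,
# each step containing the pixels on one anti-diagonal.
for t, group in enumerate(wavefront_schedule(3, 4, 1)):
    print(t, group)
```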
Subsequently, the residual between the predicted image and the input image is calculated to obtain the image residual 704.
After the predicted image 703 output by the autoregression is obtained, the residual between each pixel of the predicted image and the corresponding pixel of the input image can be calculated to obtain the image residual 704, that is, the aforementioned first residual image.
For example, given the original image x (that is, the input image), the autoregressive model predicts the original image to obtain the predicted reconstructed image x′, and the image residual r = x − x′ between each pixel of the original image and the reconstructed image can then be calculated.
For example, as shown in Figure 10, after the input image and the predicted image are obtained, the difference between corresponding pixels of the input image and the predicted image can be calculated to obtain the residual value of each pixel, and these residual values form the residual image.
Optionally, when calculating the residual, the integer residual values in the range [-255, 255] can be converted to a low-precision numerical type, for example the uint8 type, so that the values are reduced to [0, 255]; by setting an offset, the residual values of the pixels can be distributed around 128, making the data more concentrated, so that the residual distribution between the input image and the output image of the autoregressive model can be represented with less data.
For example, the autoregressive model takes the original image x as input and outputs y; the predicted image is then x′ = round(clip(y, 0, M−1)), where each pixel of x′ is an integer in the range 0 to M−1, and the residual is calculated as r = x − x′. The second model is used to predict r and obtain a distribution N(μ, σ), and r is then encoded using this distribution, where N is a Gaussian distribution or a logistic distribution.
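As an illustrative sketch of this step, the snippet below computes x′ = round(clip(y, 0, M−1)), the residual r = x − x′, and the probability of r under a discretized Gaussian N(μ, σ); the discretization into unit-width bins is an assumption made for this example, and the second model corresponds to the autoencoding model described below.

```python
import numpy as np
from scipy.stats import norm

def residual_and_probability(x, y, mu, sigma, M=256):
    """Compute the prediction residual and its probability under a discretized Gaussian.

    x:         original pixels (integers in [0, M-1])
    y:         raw autoregressive output (floats)
    mu, sigma: residual distribution parameters predicted by the second model
    """
    x_pred = np.rint(np.clip(y, 0, M - 1)).astype(np.int32)   # x' = round(clip(y, 0, M-1))
    r = x.astype(np.int32) - x_pred                           # residual r = x - x'
    # Discretize the continuous Gaussian into unit-width bins [r-0.5, r+0.5).
    p = norm.cdf(r + 0.5, loc=mu, scale=sigma) - norm.cdf(r - 0.5, loc=mu, scale=sigma)
    return r, p
```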
In addition, the input image is also fed into the autoencoding model 705, which outputs the prediction residual 707 and the latent variable 706.
For example, the original image x can be fed into the autoencoding model, and the autoencoding model is used to estimate the probability distribution p(r|x) of the image residual r, that is, the prediction residual 707.
Specifically, the autoencoding model may include an encoding model (encoder) and a decoding model (decoder). The input image is used as the input of the encoding model, which extracts important features from the input image to obtain the latent variable 706; the latent variable is then used as the input of the decoding model, which outputs the prediction residual 707.
Generally, the autoencoding model may be a pre-trained model; specifically, an autoencoder (AutoEncoder, AE), a variational autoencoder (Variational AutoEncoder, VAE), a VQ-VAE (Vector Quantised-Variational AutoEncoder), or the like may be used, which can be adjusted according to the actual application scenario and is not limited in this application.
Subsequently, the latent variable 706 can be encoded to obtain the latent variable encoding 708.
Specifically, the latent variable can be encoded with static entropy coding, that is, a tree structure is used so that data with high probability is represented with fewer bits and data with low probability is represented with more bits.
For example, the tree structure can be as shown in Figure 11, and the corresponding bits can be represented as shown in Table 1.
Table 1
Therefore, the data a1a2a1a4 is encoded as 0100110.
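For illustration, the sketch below uses a hypothetical prefix-code table that is consistent with this example (a1→0, a2→10, a4→110); the code assigned to a3 is an assumption, since the contents of Table 1 are not reproduced here.

```python
# Hypothetical prefix-code table consistent with the example in the text:
# encoding the sequence a1 a2 a1 a4 with this table yields "0100110".
CODE_TABLE = {"a1": "0", "a2": "10", "a3": "111", "a4": "110"}

def static_entropy_encode(symbols):
    """Concatenate the prefix code of each symbol (static entropy coding)."""
    return "".join(CODE_TABLE[s] for s in symbols)

def static_entropy_decode(bits):
    """Greedily match prefix codes in the bitstream (valid because the code is prefix-free)."""
    inverse = {code: sym for sym, code in CODE_TABLE.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return out

assert static_entropy_encode(["a1", "a2", "a1", "a4"]) == "0100110"
assert static_entropy_decode("0100110") == ["a1", "a2", "a1", "a4"]
```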
In addition, the image residual 704 and the prediction residual 707 can also be encoded to obtain the residual encoding 709.
Specifically, semi-dynamic entropy coding can be performed on the image residual 704 and the prediction residual 707 to obtain the residual encoding.
For ease of understanding, the difference between dynamic entropy coding and the semi-dynamic entropy coding provided by this application is explained below.
First, taking rANS coding as an example, dynamic coding uses a state (usually a large integer or a high-precision fraction) to represent the data and updates the state value using the probability information of the data; the final encoded value is the 0/1 representation of the state. In rANS coding, a value M must first be set, representing the number of bits used to represent a probability. For a character a_i, its corresponding PMF_i is proportional to its probability, and the PMF values sum to 2^M; its corresponding CDF_i is the accumulation of all previous PMF values, that is, PMF_1 + PMF_2 + ... + PMF_(i-1). In the table above, if M = 4, the PMF and CDF corresponding to the probability values are as shown in Table 2:
Table 2
If the states before and after compressing a character x are S and S' respectively, then
S' = (S / PMF(x)) * 2^M + CDF(x) + S % PMF(x)
where / denotes integer division and % denotes the remainder.
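A minimal sketch of this encoding step is shown below, assuming PMF(x) and CDF(x) are already scaled integers whose PMF values sum to 2^M; the numeric values are chosen only to illustrate the formula and do not come from Table 2.

```python
def rans_encode_symbol(state: int, pmf: int, cdf: int, M: int) -> int:
    """One rANS encoding step: S' = floor(S / PMF(x)) * 2^M + CDF(x) + S mod PMF(x)."""
    return (state // pmf) * (1 << M) + cdf + state % pmf

# Tiny illustration with M = 4 and an assumed symbol having PMF(x) = 3, CDF(x) = 5:
# starting from state S = 100, the next state is (100 // 3) * 16 + 5 + 100 % 3 = 534.
print(rans_encode_symbol(100, 3, 5, 4))   # -> 534
```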
Dynamic entropy coding can also be used as static entropy coding: when the values in the table are fixed, it is static entropy coding; when the tables for different symbols are not exactly the same, dynamic entropy coding is needed.
The speed bottlenecks in dynamic entropy coding include the symbol search during decompression and the arithmetic operations, among which division and modulo are the most time-consuming, followed by multiplication. Therefore, to address the efficiency loss caused by the unlimited probability distributions in dynamic entropy coding, this application provides semi-dynamic entropy coding. Based on the aforementioned dynamic entropy coding, that is, the rANS encoding formula, approximation is performed first, for example by replacing the multiplication, division, and modulo operations in dynamic entropy coding with approximate lightweight operations such as addition, subtraction, and bit operations, which greatly reduces or removes the multiplication, division, and modulo operations at the cost of a small loss in compression rate; then, through a series of transformations, all operations whose time consumption exceeds a certain threshold (such as the remaining modulo, multiplication, and division operations) are converted into table lookups and lightweight operations such as addition, subtraction, and bit operations. It can be understood that the semi-dynamic entropy coding provided by this application removes, through algorithmic transformation and tabulation, all time-consuming operations such as symbol search, multiplication, division, and modulo, achieving a throughput comparable to that of static entropy coding.
For example, similar to common rANS implementations, the state value S is truncated and approximated, but the differences include the following:
Unlike common rANS, which truncates S to [2^M, 2^(2M)), giving 2^(2M) − 2^M states in total, this scheme truncates S to [2^M, 2^(M+1)), giving 2^M states in total, so as to achieve a smaller state space and facilitate the subsequent tabulation.
Unlike common rANS, which uses division and modulo calculations, this scheme replaces them with an approximate solution based on a loop plus bit operations, so as to further reduce the storage space required for tabulation. The loop in this calculation is time-consuming, so after this step the time consumption usually exceeds that of the original rANS; in the subsequent processing, however, the number of loop iterations is tabulated to achieve efficient compression and decompression.
During compression, for each distribution and symbol, a table is used to precompute and store the number of loop iterations (that is, the number of right shifts of the state) and the difference between the next state and the current state under that distribution and symbol. When performing compression, for each input distribution index and symbol, the corresponding δ is obtained by table lookup, and the number of right shifts of the state is computed as b = (δ + S) >> M; the rightmost b bits of the state are pushed onto the stack in memory, and the state value is shifted right by b bits; then, using the distribution index and the symbol, the difference between the next state and this state is obtained by table lookup, and this difference is added to the current state value to obtain the updated state value.
Compared with directly storing the number of loop iterations, this scheme stores the intermediate result δ, from which the number of iterations can be computed as (δ + S) >> M, so the encoding scheme provided by this application reduces the memory space required to store the table. Compared with directly storing the difference between the two states, the semi-dynamic entropy coding provided by this application stores the difference between the two states after the state has been shifted right; this value can be stored as an unsigned number, halving the memory space for the same number of bits.
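The following sketch illustrates the table-driven encoding step described above; the contents of the two lookup tables depend on a precomputation that is not fully specified here, so they appear only as placeholders, and the function and parameter names are assumptions.

```python
def semidynamic_encode_step(state, dist_idx, symbol, delta_table, diff_table, M, bit_stack):
    """One table-driven encoding step using only table lookups, additions and shifts.

    delta_table[dist_idx][symbol] holds the stored intermediate value delta, from which
    the shift count is computed as b = (delta + S) >> M; diff_table[dist_idx][symbol]
    holds the difference between the next state and the right-shifted current state.
    """
    delta = delta_table[dist_idx][symbol]
    b = (delta + state) >> M                        # number of state bits to emit
    bit_stack.append((state & ((1 << b) - 1), b))   # push the rightmost b bits of the state
    state >>= b                                     # shift the state right by b bits
    state += diff_table[dist_idx][symbol]           # add the stored state difference
    return state
```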
After the residual encoding 709 and the latent variable encoding 708 are obtained, subsequent operations can be performed, such as saving the residual encoding 709 and the latent variable encoding 708, or transmitting the residual encoding 709 and the latent variable encoding 708 to the receiving end, which can be determined according to the actual application scenario.
Therefore, the method provided by the embodiments of this application can be applied to lossless image compression and achieves efficient lossless image compression. It also provides an efficient semi-dynamic entropy coder, so that both model inference and the coding process run on the AI chip, reducing the transfers between system memory and AI chip memory and achieving high-bandwidth compression and decompression.
The process of the encoding method provided by this application has been introduced above. The process of the corresponding decoding method, that is, the inverse of the aforementioned encoding process, is introduced below. Referring to Figure 12, a schematic flowchart of a decoding method provided by this application is as follows.
1201. Obtain latent variable encoded data and residual encoded data.
The decoding end may read the latent variable encoded data and the residual encoded data locally, or receive the latent variable encoded data and the residual encoded data sent by the encoding end; the source of the latent variable encoded data and the residual encoded data can be determined according to the actual application scenario and is not limited in this application.
Specifically, the latent variable encoded data may be obtained by the encoding end by encoding the features extracted from the input image. The residual encoded data may be obtained by the encoding end by encoding the aforementioned image residual and prediction residual, where the image residual may include the residual between the input image at the encoding end and the image output by the autoregressive model. For the latent variable encoded data and the residual encoded data, reference can be made to the related descriptions in Figures 6 to 11 above, which are not repeated here.
1202. Decode the latent variable encoded data to obtain the latent variable.
The way the latent variable encoded data is decoded can correspond to the encoding end. For example, if the encoding end uses a static entropy coder for encoding, a static entropy coder can be used for decoding: the latent variable encoded data is used as the input of the static entropy coder, which outputs the latent variable. The latent variable may include features extracted from the input image; for the decompression end, the latent variable represents the features of the decompressed image.
1203. Use the latent variable as the input of the autoencoding model and output a second residual distribution.
After the latent variable is obtained by decoding the latent variable encoded data, the latent variable can be used as the input of the autoencoding model to output the corresponding second residual distribution, that is, the counterpart of the first residual distribution at the encoding end, which can be understood as representing the residual distribution between the image output by the autoregressive model at the encoding end and the input image.
Specifically, the autoencoding model may include a decoding model, and the predicted residual image is output by using the latent variable as the input of the decoding model. The decoding model may be a trained model used to output the residual image corresponding to the input image, where the residual image can be understood as the residual values between the image predicted by the autoregressive model and the input image.
It should be noted that both the encoding end and the decoding end deploy an autoregressive model and an autoencoding model, and the autoregressive model on the encoding end is the same as that on the decoding end. If the encoding end and the decoding end are deployed in the same device, the autoencoding models of the two ends are the same; if they are deployed in different devices, the encoding end and the decoding end may deploy the same autoencoding model, or the complete autoencoding model may be deployed on the encoding end while only the decoding model of the autoencoding model is deployed on the decoding end, which can be adjusted according to the actual application scenario and is not limited in this application.
1204. Perform decoding by combining the second residual distribution and the residual encoded data to obtain a second residual image.
After the second residual distribution and the residual encoded data are obtained, the second residual distribution and the residual encoded data can be combined for decoding to obtain the second residual image.
Specifically, if the encoding end uses semi-dynamic entropy coding for encoding, the decoding end can also decode based on semi-dynamic entropy coding and output the second residual image, that is, the image corresponding to the first residual image at the encoding end. The semi-dynamic entropy coder performs entropy coding using encoding operations of a first preset type, where the first preset type includes addition, subtraction, or bit operations, and the semi-dynamic entropy coder does not include encoding operations of a second preset type, where the second preset type includes at least one of multiplication, division, or modulo operations. In other words, the semi-dynamic entropy coder does not include time-consuming operations such as multiplication, division, or modulo, and may include only simple addition and subtraction operations, thereby achieving efficient coding.
More specifically, for the semi-dynamic entropy coder, reference can be made to the related descriptions in Figures 6 to 11 above, which are not repeated here.
It can be understood that, corresponding to the process in which the encoding end encodes the first residual image and the first residual distribution to obtain the residual encoded data, after the decoding end obtains the second residual distribution and the residual encoded data, the inverse operation can be performed to derive the second residual image, which is equivalent to obtaining the residual between the first image output by the autoregressive model at the encoding end and the input image, that is, the first residual image.
1205. Use the second residual image as the input of the back propagation of the autoregressive model and output the decompressed image.
After the second residual image is obtained, the second residual image can be used as the input of the autoregressive model for back propagation to derive the decompressed image, that is, to losslessly restore the input image at the encoding end.
In addition, when the second residual image is used as the input of the autoregressive model for back propagation, if the autoregressive model at the encoding end uses the pixel values of the already-predicted pixels to predict the values of the pixels on the same line, then, when the decoding end performs the decoding operation, the values of the pixels on the same line can be decoded in parallel, achieving efficient decoding. The same line may be the same row, the same column, the same diagonal, or the like, which can be determined according to the actual application scenario.
Therefore, in the embodiments of this application, since an autoencoding model usually has poor fitting capability and needs a deep network to achieve a good compression rate, this application combines the output of the autoregressive model so that the size of the autoencoding model can be reduced. In this application, the autoregressive model and the autoencoding model are combined for decoding, so that both models can be kept very small, avoiding the excessive inference time caused by an overly large autoencoding network and achieving efficient image decompression. Moreover, in the method provided by this application, the entire process, including the AI models and the entropy coding, can be implemented as AI lossless compression on an AI chip, avoiding the transfer problem between system memory and AI chip memory and improving coding efficiency.
For ease of understanding, the process of the decoding method provided by this application is introduced below with reference to specific application scenarios. Referring to Figure 13, a schematic flowchart of another decoding method provided by this application is as follows.
First, the latent variable encoding 1301 and the residual encoding 1302 are obtained.
The latent variable encoding 1301 and the residual encoding 1302 may be read locally or received from the encoding end, which can be adjusted according to the actual application scenario. For example, the latent variable encoding 1301 and the residual encoding may be the latent variable encoding 708 and the residual encoding 709 mentioned in Figure 7 above.
Then, the latent variable encoding 1301 is fed into the static entropy coder 1303, which outputs the latent variable 1304.
Generally, the bits corresponding to the probabilities in entropy coding can be as shown in Table 1 above. After the encoded bitstream of the latent variable is obtained, the probability corresponding to each character can be determined from this correspondence, and the latent variable is output; the latent variable can be understood as the important features of the decompressed image.
The latent variable 1304 is then used as the input of the decoding model in the autoencoding model 1305, which outputs the prediction residual 1306.
The decoding model is similar to the decoding model in Figure 7 above and is not described again here. The prediction residual 1306 is similar to the aforementioned prediction residual 707 and is not described again here.
Then, both the residual encoding 1302 and the prediction residual 1306 are used as inputs of the semi-dynamic entropy coder, which outputs the image residual 1308.
The image residual 1308 is similar to the aforementioned image residual 704 and is not described again here.
The decoding process of semi-dynamic entropy coding can be understood as the inverse of the aforementioned semi-dynamic entropy encoding, that is, given the prediction residual and the residual encoding, the image residual is derived in reverse. For example: compute the slot value of the current symbol, s = S' % 2^M; find the symbol x corresponding to s, which must satisfy CDF(x) ≤ s < CDF(x) + PMF(x), and decompress x; then, according to the decompressed symbol x, restore the state value of the previous step: S = S' / 2^M * PMF(x) + S' % 2^M − CDF(x).
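A sketch of this decoding step is given below; it inverts the encoding formula S' = floor(S/PMF(x)) * 2^M + CDF(x) + S mod PMF(x) and assumes a precomputed lookup table that maps each slot s in [0, 2^M) to its symbol, PMF and CDF, replacing the symbol search with a table access.

```python
def rans_decode_symbol(state: int, lookup, M: int):
    """One rANS decoding step, inverting the encoding step sketched earlier.

    `lookup` maps a slot s in [0, 2^M) to (symbol, PMF(symbol), CDF(symbol)) of the
    unique symbol x satisfying CDF(x) <= s < CDF(x) + PMF(x).
    """
    s = state & ((1 << M) - 1)                   # s = S' mod 2^M
    symbol, pmf, cdf = lookup[s]                 # table lookup instead of a symbol search
    prev_state = (state >> M) * pmf + s - cdf    # S = floor(S'/2^M) * PMF(x) + s - CDF(x)
    return symbol, prev_state

# Continuing the illustrative numbers from the encoding sketch (M = 4, PMF(x) = 3,
# CDF(x) = 5): slot s = 534 mod 16 = 6 lies in [5, 8), so the symbol is recovered
# and the previous state is (534 >> 4) * 3 + 6 - 5 = 100.
```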
After the image residual 1308 is obtained, the image residual can be used as the input of the back propagation of the autoregressive model 1309 to derive the decompressed image 1310.
It can be understood that the autoregressive model 1309 is a trained model, the same as the aforementioned autoregressive model 702; that is, given the image residual, the input image 701 is derived in reverse.
Optionally, if the encoding end, when outputting the prediction residual through the autoregressive model, uses the pixel values of already-predicted pixels to predict in parallel the pixel values of the pixels on the same line, then, during the back propagation of the autoregressive model, the pixel values of the pixels on the same line can be decoded together, achieving parallel decoding.
For example, given an m×n image and a hyperparameter h (0≤h<n), if for any pixel (i,j) all points (i′,j′) used by the autoregressive model to predict (i,j) satisfy h×i′+j′ < h×i+j, then the image can be decompressed in n+(m-1)×h parallel computations. The decompression order is as follows:
顺次解压第一行中的点:(0,0),(0,1),...,(0,n-1)。在解压(0,j)点的同时,若j-h≥0,则同时解压(1,j-h);若j-h×2≥0,则同时解压(2,j-h×2),依此类推;Extract the points in the first row sequentially: (0,0),(0,1),...,(0,n-1). While decompressing the (0,j) point, if j-h≥0, then decompress (1,j-h) at the same time; if j-h×2≥0, then decompress (2,j-h×2) at the same time, and so on;
顺次解压第二行中的点:(1,n-h-1),...,(1,n-1)。解压(1,j)点的同时,若j-h≥0,Extract the points in the second row sequentially: (1,n-h-1),...,(1,n-1). While decompressing the (1,j) point, if j-h≥0,
则同时解压(2,j-h);若j-h×2≥0,则同时解压(3,j-h×2),依此类推;Then decompress (2,j-h) at the same time; if j-h×2≥0, then decompress (3,j-h×2) at the same time, and so on;
按照此规律解压,直到解压所有点。Decompress according to this rule until all points are decompressed.
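The schedule above can be stated compactly: pixel (i, j) becomes decodable at step t = h×i + j, so all pixels sharing the same value of h×i + j are mutually independent and can be processed together. The following is a minimal Python sketch of this wavefront schedule, assuming a hypothetical per-pixel decoder decode_pixel that stands in for the inverse of the autoregressive model; it illustrates the ordering only, not the actual implementation.

```python
# Sketch of the wavefront schedule: with hyperparameter h, pixel (i, j) only
# depends on pixels (i', j') with h*i' + j' < h*i + j, so every set
# {(i, j) : h*i + j == t} can be decoded in parallel. `decode_pixel` is a
# hypothetical per-pixel decoder standing in for the autoregressive inverse.

def parallel_decode(m, n, h, decode_pixel):
    decoded = [[None] * n for _ in range(m)]
    # Steps t = 0 .. (n - 1) + (m - 1) * h, i.e. n + (m - 1) * h parallel rounds
    for t in range(n + (m - 1) * h):
        wave = [(i, t - h * i) for i in range(m) if 0 <= t - h * i < n]
        # In a real implementation this loop body runs in parallel on the AI chip
        for i, j in wave:
            decoded[i][j] = decode_pixel(i, j, decoded)
    return decoded


# Toy usage: 3x4 image, h = 2; the stand-in decoder just records pixel coordinates
out = parallel_decode(3, 4, 2, lambda i, j, d: (i, j))
```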
Therefore, the manner of encoding and decoding pixels on a same line in parallel provided in this application can greatly improve encoding and decoding efficiency and achieve more efficient image compression.
For ease of understanding, the effects achieved by this application are described below by way of example with reference to some specific application scenarios.
First, a neural network model with the autoregressive model and the auto-encoding model at its core needs to be constructed. In this technical solution the autoregressive model adopts a lightweight design and contains only 12 parameters; for a three-channel image, only 4 parameters per channel are needed for prediction. The auto-encoder model is a vector-quantized auto-encoder, which uses a vector codebook to shrink the space of the latent variable; the codebook size is set to 256, that is, the value space of the latent variable in the auto-encoder is limited to 256 integers. Both the encoder and the decoder of the auto-encoder use four residual convolution blocks, and the number of feature channels in each layer is 32.
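As an illustration of how small such a predictor can be, the following PyTorch sketch assumes that each channel is predicted as a linear combination of its left, top and top-left neighbours plus a bias (3 weights + 1 bias = 4 parameters per channel, 12 in total for RGB). This choice of neighbours is an assumption made for illustration; the application does not specify the exact parameterisation, and the vector-quantized auto-encoder side is not sketched here.

```python
import torch
import torch.nn as nn

class TinyAutoregressivePredictor(nn.Module):
    """Illustrative 4-parameters-per-channel causal predictor (assumption:
    a linear combination of the left, top and top-left neighbours plus a
    bias per channel; missing neighbours at the border are treated as 0)."""

    def __init__(self, channels=3):
        super().__init__()
        self.weights = nn.Parameter(torch.full((channels, 3), 1.0 / 3))  # left, top, top-left
        self.bias = nn.Parameter(torch.zeros(channels))

    def forward(self, x):                              # x: (B, C, H, W)
        pad = nn.functional.pad(x, (1, 0, 1, 0))       # one column on the left, one row on top
        left = pad[:, :, 1:, :-1]
        top = pad[:, :, :-1, 1:]
        top_left = pad[:, :, :-1, :-1]
        neighbours = torch.stack([left, top, top_left], dim=2)   # (B, C, 3, H, W)
        w = self.weights.view(1, -1, 3, 1, 1)
        return (w * neighbours).sum(dim=2) + self.bias.view(1, -1, 1, 1)


# Toy usage: predict a random 8x8 RGB image; 3*3 weights + 3 biases = 12 parameters
pred = TinyAutoregressivePredictor()(torch.rand(1, 3, 8, 8))
```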
The model training process and the testing process are as follows:
Training: training is performed on the training set of a single data set to obtain the parameters of the autoregressive model and the auto-encoding model, as well as the statistics of the latent variable, which are used for compressing the latent variable.
Compression: with the method provided in this application, all the test images of a single data set are stacked together along the batch dimension to form a four-dimensional tensor. This four-dimensional tensor is fed into the pipeline in one pass, and the residual codes of all the images and the codes of the latent variables are output in parallel (a minimal sketch of the stacking step is given below).
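A minimal sketch of this stacking step, assuming NumPy arrays of identical shape (H, W, C); the placeholder data is illustrative only.

```python
import numpy as np

# Stack N same-sized test images (H, W, C) along a new batch axis, giving the
# (N, H, W, C) tensor that is fed to the compression pipeline in one pass.
images = [np.zeros((32, 32, 3), dtype=np.uint8) for _ in range(8)]  # placeholder data
batch = np.stack(images, axis=0)                                    # shape (8, 32, 32, 3)
```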
Decompression: with the method provided in this application, the residual codes and latent variables of all the images are used as the input of the decompression pipeline in one pass, and the original images of all the images are output in parallel.
For comparison with some commonly used lossless compression methods, such as L3C (Practical Full Resolution Learned Lossless Image Compression), FLIF (Free Lossless Image Format based on MANIAC compression), WebP and PNG (Portable Network Graphics), the method provided in this application is referred to as PILC (Practical Image Lossless Compression); see Table 3.
Table 3
As can be seen from Table 3, compared with the previous AI lossless image compression algorithm L3C, this technical solution improves throughput by a factor of 14 while keeping the compression ratio essentially the same; at the same time, it also outperforms traditional methods such as PNG, WebP and FLIF in both compression ratio and throughput.
Therefore, the method provided in this application combines the autoregressive model with the auto-encoding model, which greatly reduces the model size compared with using the auto-encoding model alone. In addition, the autoregressive model provided in this application supports parallel encoding and parallel decompression, so that efficient encoding and decoding, and thus efficient image compression and decompression, can be achieved. Moreover, the entire pipeline of the method provided in this application can run on an AI chip, which avoids transferring information between the system memory and the AI chip memory and further improves encoding and decoding efficiency.
In addition, in a real production environment the sizes of images generally differ and the resolution of the images is relatively high. In order to compress and decompress large high-definition images of different sizes efficiently, this embodiment is designed as follows.
Model training: in the model training stage, large high-definition data sets such as OpenImage and ImageNet64 are used for model training to obtain the parameters of the autoregressive model and the auto-encoding model.
Compression:
First, the images are preprocessed: large high-definition images of different sizes are uniformly sliced into tiles of the same size (for example 32×32), and the size information of each image is stored separately for restoring the image (a sketch of this preprocessing follows this list);
All slices are stacked together along the batch dimension as the input of the pipeline;
The residual codes of all the images and the codes of the latent variables are output in parallel;
The statistics of the latent variables of each data set (same data set) / each image (different data sets) are recorded as another output of the pipeline.
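The preprocessing referred to above can be sketched as follows. This is an illustrative NumPy sketch, assuming edge padding to a multiple of the tile size and a 32×32 tile; the padding mode and helper names are assumptions, not the exact preprocessing of this application.

```python
import numpy as np

# Pad each image so height and width are multiples of the tile size, cut it
# into fixed-size tiles, and keep the original size so the image can be
# reassembled after decompression.

def slice_into_tiles(image, tile=32):
    h, w, c = image.shape
    pad_h = (tile - h % tile) % tile
    pad_w = (tile - w % tile) % tile
    padded = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)), mode="edge")
    tiles = (padded
             .reshape(padded.shape[0] // tile, tile, padded.shape[1] // tile, tile, c)
             .transpose(0, 2, 1, 3, 4)
             .reshape(-1, tile, tile, c))
    return tiles, (h, w)                          # tiles plus the size record


# All tiles from all images are then stacked along the batch dimension
imgs = [np.zeros((100, 75, 3), dtype=np.uint8), np.zeros((64, 64, 3), dtype=np.uint8)]
tiles, sizes = zip(*(slice_into_tiles(im) for im in imgs))
batch = np.concatenate(tiles, axis=0)             # one 4-D tensor for the whole run
```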
The achieved effects can be seen in Table 4.
Table 4
Clearly, the method provided in this application can achieve a higher throughput and efficient encoding and decoding.
More specifically, a more detailed comparison with some commonly used compression methods is given below.
Referring to Table 5, on the premise that the likelihood metric (an indicator for evaluating the prediction accuracy of a generative model, where smaller is better) is essentially the same as that of L3C, previously the fastest AI algorithm, the inference speed is increased by a factor of 9.6.
Table 5
Referring to Table 6, for the same autoregressive model, the parallel scheme provided in this application increases decompression speed by a factor of 7.9 compared with the non-parallel scheme. The parallel scheme restricts the receptive field, but this restriction has only a limited impact on the compression ratio.
Table 6
Referring to Table 7, compared with dynamic entropy coding (rANS), the semi-dynamic entropy coding (ANS-AI) proposed in this application increases encoding speed by a factor of 20 and decoding speed by a factor of 100, with BPD losses of less than 0.55 and 0.17 respectively. Moreover, this semi-dynamic entropy coding can run on an AI chip; on a single V100, the peak speed can reach 1 GB/s.
Table 7
In addition, compared with dynamic entropy coding, semi-dynamic entropy coding reduces the number of distribution types required from 2048 to 8, reduces the memory required for preprocessing to 1/256 of the original, and keeps the BPD loss below 0.03, which reduces the computing resources required for entropy coding and improves coding efficiency.
The foregoing describes the procedures of the image encoding method and the image decompression method provided in this application. The apparatuses for performing the foregoing methods are described below.
Referring to FIG. 14, this application provides a schematic structural diagram of an image encoding apparatus. The image encoding apparatus includes:
an autoregressive module 1401, configured to use an input image as an input of an autoregressive model and output a first image;
a residual calculation module 1402, configured to obtain a residual between the first image and the input image to obtain a first residual image;
an auto-encoding module 1403, configured to use the input image as an input of an auto-encoding model and output a latent variable and a first residual distribution, where the latent variable includes features extracted from the input image, and the first residual distribution includes residual values, output by the auto-encoding model, representing the residuals between the pixels in the input image and the corresponding pixels in the first residual image;
a residual encoding module 1404, configured to encode the first residual image and the first residual distribution to obtain residual coded data; and
a latent-variable encoding module 1405, configured to encode the latent variable to obtain latent-variable coded data, where the latent-variable coded data and the residual coded data are used to obtain the input image after decompression.
In a possible implementation, the residual encoding module 1404 is specifically configured to use the first residual image and the first residual distribution as inputs of a semi-dynamic entropy encoder and output the residual coded data. The semi-dynamic entropy encoder performs entropy coding using a first preset type of coding operation, where the first preset type includes addition, subtraction or bit operations, and the semi-dynamic entropy encoder does not include a second preset type of coding operation, where the second preset type includes at least one of multiplication, division or modulo operations; that is, the semi-dynamic entropy encoder does not include time-consuming operations such as multiplication, division or modulo.
In a possible implementation, the semi-dynamic entropy encoder may be obtained by converting a dynamic entropy encoder. Specifically, the operations of the dynamic entropy encoder may be approximated, for example by replacing them with approximate operations so as to reduce or remove multiplication, division and modulo; a transformation step may then be applied so that all operations whose cost exceeds a certain threshold (such as the remaining modulo, multiplication and division operations) are converted into table accesses and lightweight operations such as addition, subtraction and bit operations, yielding the semi-dynamic entropy encoder provided in this application. In other words, the semi-dynamic entropy encoder may be an entropy encoder obtained by replacing or converting some operations of a dynamic entropy encoder; when it is used for entropy coding, only simple, efficient operations such as addition, subtraction and bit operations are needed, so that efficient coding is achieved.
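As an illustration of the conversion idea, the following Python sketch shows how, with M-bit quantized frequencies, the modulo and division by 2^M reduce to a mask and a shift, and the symbol search plus CDF lookup reduce to a single precomputed table access. The table layout and names are assumptions made for illustration; the remaining multiplication is kept here for brevity, whereas the scheme described above would likewise tabulate or eliminate it.

```python
# Illustrative table-driven variant of the decode step: with M-bit quantized
# frequencies, `% 2**M` becomes a mask, `// 2**M` becomes a shift, and the
# per-slot symbol/CDF pair is read from a precomputed table.

M = 4
MASK = (1 << M) - 1
pmf = [8, 4, 2, 2]
cdf = [0, 8, 12, 14]

# Precompute: for every slot s in [0, 2^M), store the symbol and its
# cumulative frequency, so decoding needs no search and no modulo.
slot_table = []
for s in range(1 << M):
    x = max(i for i in range(len(cdf)) if cdf[i] <= s)
    slot_table.append((x, cdf[x]))

def decode_step_table(state):
    s = state & MASK                      # bit operation instead of % 2^M
    x, cdf_x = slot_table[s]              # table access instead of a search
    # The remaining multiplication could likewise be tabulated per symbol;
    # it is kept here to keep the sketch short.
    prev_state = (state >> M) * pmf[x] + s - cdf_x
    return x, prev_state
```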
In a possible implementation, the latent-variable encoding module 1405 is specifically configured to use the latent variable as an input of a static entropy encoder to obtain the latent-variable coded data.
In a possible implementation, the auto-encoding model includes an encoding model and a decoding model, and the auto-encoding module 1403 is specifically configured to: use the input image as an input of the encoding model and output the latent variable, where the encoding model is used to extract features from the input image; and use the latent variable as an input of the decoding model to obtain the first residual distribution, where the decoding model is used to predict the residual values between the input image and the corresponding pixel distribution.
In a possible implementation, the autoregressive model is configured to predict the values of pixels lying on a same line using the pixel values of already-predicted pixels.
Referring to FIG. 15, this application provides a schematic structural diagram of an image decompression apparatus. The image decompression apparatus includes:
a transceiver module 1501, configured to obtain latent-variable coded data and residual coded data, where the latent-variable coded data is obtained by encoding features extracted by an encoding end from an input image, and the residual coded data includes data obtained by encoding a residual between a first image output by an autoregressive model and the input image;
a latent-variable decoding module 1502, configured to decode the latent-variable coded data to obtain a latent variable, where the latent variable includes the features extracted by the encoding end from the input image;
an auto-encoding module 1503, configured to use the latent variable as an input of an auto-encoding model and output a second residual distribution;
a residual decoding module 1504, configured to perform decoding by combining the second residual distribution and the residual coded data to obtain a second residual image; and
an autoregressive module 1505, configured to use the second residual image as an input of back propagation of the autoregressive model and output a decompressed image.
In a possible implementation, the latent-variable decoding module 1502 is specifically configured to use the latent-variable coded data as an input of a static entropy encoder and output the latent variable.
In a possible implementation, the residual decoding module 1504 is specifically configured to use the second residual distribution and the residual coded data as inputs of a semi-dynamic entropy encoder and output the second residual image. The semi-dynamic entropy encoder performs entropy coding using a first preset type of coding operation, where the first preset type includes addition, subtraction or bit operations, and the semi-dynamic entropy encoder does not include a second preset type of coding operation, where the second preset type includes at least one of multiplication, division or modulo operations; that is, the semi-dynamic entropy encoder does not include time-consuming operations such as multiplication, division or modulo.
In a possible implementation, the semi-dynamic entropy encoder may be obtained by converting a dynamic entropy encoder. Specifically, the operations of the dynamic entropy encoder may be approximated, for example by replacing them with approximate operations so as to reduce or remove multiplication, division and modulo; a transformation step may then be applied so that all operations whose cost exceeds a certain threshold (such as the remaining modulo, multiplication and division operations) are converted into table accesses and lightweight operations such as addition, subtraction and bit operations, yielding the semi-dynamic entropy encoder provided in this application. In other words, the semi-dynamic entropy encoder may be an entropy encoder obtained by replacing or converting some operations of a dynamic entropy encoder; when it is used for entropy coding, only simple, efficient operations such as addition, subtraction and bit operations are needed, so that efficient coding is achieved.
In a possible implementation, the autoregressive module 1505 is specifically configured to decode, in parallel through the autoregressive model, the pixels lying on a same line in the second residual image to obtain the decompressed image.
Referring to FIG. 16, this application provides a schematic structural diagram of another image encoding apparatus, as described below.
The image encoding apparatus may include a processor 1601 and a memory 1602. The processor 1601 and the memory 1602 are interconnected through a line. The memory 1602 stores program instructions and data.
The memory 1602 stores the program instructions and data corresponding to the steps in the aforementioned FIG. 6 to FIG. 11.
The processor 1601 is configured to perform the method steps performed by the image encoding apparatus shown in any one of the embodiments in FIG. 6 to FIG. 11.
Optionally, the image encoding apparatus may further include a transceiver 1603 configured to receive or send data.
An embodiment of this application further provides a computer-readable storage medium storing a program which, when run on a computer, causes the computer to perform the steps of the methods described in the embodiments shown in FIG. 6 to FIG. 11.
Optionally, the aforementioned image encoding apparatus shown in FIG. 16 is a chip.
An embodiment of this application further provides an image encoding apparatus. The image encoding apparatus may also be referred to as a digital processing chip or a chip. The chip includes a processing unit and a communication interface. The processing unit obtains program instructions through the communication interface, the program instructions are executed by the processing unit, and the processing unit is configured to perform the method steps performed by the image encoding apparatus shown in any one of the embodiments in FIG. 6 to FIG. 11.
An embodiment of this application further provides a digital processing chip. The digital processing chip integrates circuitry and one or more interfaces for implementing the processor 1601 or the functions of the processor 1601. When a memory is integrated into the digital processing chip, the digital processing chip can complete the method steps of any one or more of the foregoing embodiments. When no memory is integrated into the digital processing chip, the chip may be connected to an external memory through the communication interface, and implements, according to program code stored in the external memory, the actions performed by the image encoding apparatus in the foregoing embodiments.
The image encoding apparatus provided in this embodiment of this application may be a chip. The chip includes a processing unit and a communication unit. The processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin or a circuit. The processing unit may execute the computer-executable instructions stored in a storage unit, so that the chip in the server performs the image encoding method described in the embodiments shown in FIG. 6 to FIG. 11. Optionally, the storage unit is a storage unit inside the chip, such as a register or a cache; the storage unit may also be a storage unit located outside the chip in the radio access device, such as a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, or a random access memory (RAM).
Referring to FIG. 17, this application provides a schematic structural diagram of another image decompression apparatus, as described below.
The image decompression apparatus may include a processor 1701 and a memory 1702. The processor 1701 and the memory 1702 are interconnected through a line. The memory 1702 stores program instructions and data.
The memory 1702 stores the program instructions and data corresponding to the steps in the aforementioned FIG. 12 to FIG. 13.
The processor 1701 is configured to perform the method steps performed by the image decompression apparatus shown in any one of the embodiments in FIG. 12 to FIG. 13.
Optionally, the image decompression apparatus may further include a transceiver 1703 configured to receive or send data.
An embodiment of this application further provides a computer-readable storage medium storing a program which, when run on a computer, causes the computer to perform the steps of the methods described in the embodiments shown in FIG. 12 to FIG. 13.
Optionally, the aforementioned image decompression apparatus shown in FIG. 17 is a chip.
An embodiment of this application further provides an image decompression apparatus. The image decompression apparatus may also be referred to as a digital processing chip or a chip. The chip includes a processing unit and a communication interface. The processing unit obtains program instructions through the communication interface, the program instructions are executed by the processing unit, and the processing unit is configured to perform the method steps performed by the image decompression apparatus shown in any one of the embodiments in FIG. 12 to FIG. 13.
An embodiment of this application further provides a digital processing chip. The digital processing chip integrates circuitry and one or more interfaces for implementing the processor 1701 or the functions of the processor 1701. When a memory is integrated into the digital processing chip, the digital processing chip can complete the method steps of any one or more of the foregoing embodiments. When no memory is integrated into the digital processing chip, the chip may be connected to an external memory through the communication interface, and implements, according to program code stored in the external memory, the actions performed by the image decompression apparatus in the foregoing embodiments.
The image decompression apparatus provided in this embodiment of this application may be a chip. The chip includes a processing unit and a communication unit. The processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin or a circuit. The processing unit may execute the computer-executable instructions stored in a storage unit, so that the chip in the server performs the image decompression method described in the embodiments shown in FIG. 12 to FIG. 13. Optionally, the storage unit is a storage unit inside the chip, such as a register or a cache; the storage unit may also be a storage unit located outside the chip in the radio access device, such as a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, or a random access memory (RAM).
An embodiment of this application further provides a computer program product which, when run on a computer, causes the computer to perform the steps performed by the image encoding apparatus or the image decompression apparatus in the methods described in the embodiments shown in FIG. 6 to FIG. 13.
This application further provides an image processing system including an image encoding apparatus and an image decompression apparatus. The image encoding apparatus is configured to perform the method steps corresponding to the aforementioned FIG. 6 to FIG. 11, and the image decompression apparatus is configured to perform the method steps corresponding to the aforementioned FIG. 12 to FIG. 13.
Specifically, the aforementioned processing unit or processor may be a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
For example, referring to FIG. 18, FIG. 18 is a schematic structural diagram of a chip provided in an embodiment of this application. The chip may be embodied as a neural-network processing unit NPU 180. The NPU 180 is mounted to a host CPU as a coprocessor, and the host CPU allocates tasks. The core part of the NPU is the operation circuit 1803; the controller 1804 controls the operation circuit 1803 to extract matrix data from the memory and perform multiplication operations.
In some implementations, the operation circuit 1803 internally includes multiple processing engines (PEs). In some implementations, the operation circuit 1803 is a two-dimensional systolic array. The operation circuit 1803 may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the operation circuit 1803 is a general-purpose matrix processor.
For example, assume there are an input matrix A, a weight matrix B and an output matrix C. The operation circuit fetches the data corresponding to matrix B from the weight memory 1802 and buffers it on each PE in the operation circuit. The operation circuit fetches the data of matrix A from the input memory 1801, performs a matrix operation with matrix B, and stores the partial or final result of the matrix in the accumulator 1808.
The unified memory 1806 is used to store input data and output data. The weight data is transferred to the weight memory 1802 directly through a direct memory access controller (DMAC) 1805, and the input data is also transferred to the unified memory 1806 through the DMAC.
A bus interface unit (BIU) 1810 is used for the interaction between the AXI bus and the DMAC and the instruction fetch buffer (IFB) 1809.
The bus interface unit 1810 is used by the instruction fetch buffer 1809 to obtain instructions from an external memory, and is also used by the storage unit access controller 1805 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
The DMAC is mainly used to transfer input data in the external memory DDR to the unified memory 1806, to transfer the weight data to the weight memory 1802, or to transfer the input data to the input memory 1801.
The vector calculation unit 1807 includes multiple operation processing units and, when necessary, performs further processing on the output of the operation circuit, such as vector multiplication, vector addition, exponential operations, logarithmic operations and magnitude comparison. It is mainly used for non-convolutional/non-fully-connected layer computations in a neural network, such as batch normalization, pixel-wise summation and up-sampling of feature planes.
In some implementations, the vector calculation unit 1807 can store the processed output vector in the unified memory 1806. For example, the vector calculation unit 1807 may apply a linear function and/or a non-linear function to the output of the operation circuit 1803, for example performing linear interpolation on the feature planes extracted by a convolutional layer, or applying such functions to a vector of accumulated values to generate activation values. In some implementations, the vector calculation unit 1807 generates normalized values, pixel-wise summed values, or both. In some implementations, the processed output vector can be used as an activation input to the operation circuit 1803, for example for use in a subsequent layer of the neural network.
The instruction fetch buffer 1809 connected to the controller 1804 is used to store instructions used by the controller 1804.
The unified memory 1806, the input memory 1801, the weight memory 1802 and the instruction fetch buffer 1809 are all on-chip memories. The external memory is private to the NPU hardware architecture.
The operations of the layers of the recurrent neural network may be performed by the operation circuit 1803 or the vector calculation unit 1807.
The processor mentioned anywhere above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits used to control the execution of the programs of the methods in FIG. 6 to FIG. 13.
In addition, it should be noted that the apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. In addition, in the drawings of the apparatus embodiments provided in this application, the connection relationships between modules indicate that they have communication connections, which may be specifically implemented as one or more communication buses or signal lines.
From the description of the foregoing implementations, a person skilled in the art can clearly understand that this application may be implemented by software plus necessary general-purpose hardware, and certainly may also be implemented by dedicated hardware including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. Generally, any function performed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structures used to implement the same function may also be diverse, such as analog circuits, digital circuits or dedicated circuits. However, for this application, a software program implementation is a better implementation in most cases. Based on such an understanding, the technical solutions of this application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc of a computer, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of this application.
All or some of the foregoing embodiments may be implemented by software, hardware, firmware or any combination thereof. When software is used for implementation, the embodiments may be implemented completely or partially in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of this application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from a website, computer, server or data center to another website, computer, server or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state disk (SSD)), or the like.
The terms "first", "second", "third", "fourth", and so on (if any) in the specification, claims and accompanying drawings of this application are used to distinguish between similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data used in this way is interchangeable in appropriate circumstances, so that the embodiments described herein can be implemented in orders other than those illustrated or described herein. Moreover, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to such a process, method, product or device.
Finally, it should be noted that the foregoing descriptions are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily figured out by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application.

Claims (23)

1. An image encoding method, comprising:
using an input image as an input of an autoregressive model and outputting a first image;
obtaining a residual between the first image and the input image to obtain a first residual image;
using the input image as an input of an auto-encoding model and outputting a latent variable and a first residual distribution, wherein the latent variable comprises features extracted from the input image, and the first residual distribution comprises residual values, output by the auto-encoding model, representing the residuals between the pixels in the input image and the corresponding pixels in the first residual image;
encoding the first residual image and the first residual distribution to obtain residual coded data; and
encoding the latent variable to obtain latent-variable coded data, wherein the latent-variable coded data and the residual coded data are used to obtain the input image after decompression.
2. The method according to claim 1, wherein the encoding the first residual image and the first residual distribution to obtain residual coded data comprises:
using the first residual image and the first residual distribution as inputs of a semi-dynamic entropy encoder and outputting the residual coded data, wherein the semi-dynamic entropy encoder is configured to perform entropy coding using a first preset type of coding operation, the first preset type of coding operation comprises addition, subtraction or bit operations, and the semi-dynamic entropy encoder does not comprise a second preset type of coding operation, wherein the second preset type comprises at least one of multiplication, division or modulo operations.
3. The method according to claim 1, wherein the encoding the latent variable to obtain latent-variable coded data comprises:
using the latent variable as an input of a static entropy encoder to obtain the latent-variable coded data.
4. The method according to any one of claims 1 to 3, wherein the auto-encoding model comprises an encoding model and a decoding model, and the using the input image as an input of the auto-encoding model and outputting a latent variable and a first residual distribution comprises:
using the input image as an input of the encoding model and outputting the latent variable, wherein the encoding model is used to extract features from the input image; and
using the latent variable as an input of the decoding model to obtain the first residual distribution, wherein the decoding model is used to predict residual values between an input image and a corresponding pixel distribution.
5. The method according to any one of claims 1 to 4, wherein the autoregressive model is configured to predict the values of pixels lying on a same line using the pixel values of already-predicted pixels.
6. An image decompression method, comprising:
obtaining latent-variable coded data and residual coded data, wherein the latent-variable coded data is obtained by encoding features extracted by an encoding end from an input image, and the residual coded data comprises data obtained by encoding a residual between the input image and an image output by forward propagation of an autoregressive model;
decoding the latent-variable coded data to obtain a latent variable, wherein the latent variable comprises the features extracted from the input image;
using the latent variable as an input of an auto-encoding model and outputting a second residual distribution;
performing decoding by combining the second residual distribution and the residual coded data to obtain a second residual image; and
using the second residual image as an input of back propagation of the autoregressive model and outputting a decompressed image.
7. The method according to claim 6, wherein the decoding the latent-variable coded data to obtain a latent variable comprises:
using the latent-variable coded data as an input of a static entropy encoder and outputting the latent variable.
8. The method according to claim 6 or 7, wherein the performing decoding by combining the second residual distribution and the residual coded data to obtain a second residual image comprises:
using the second residual distribution and the residual coded data as inputs of a semi-dynamic entropy encoder and outputting the second residual image, wherein the semi-dynamic entropy encoder is configured to perform entropy coding using a first preset type of coding operation, the first preset type of coding operation comprises addition, subtraction or bit operations, and the semi-dynamic entropy encoder does not comprise a second preset type of coding operation, wherein the second preset type comprises at least one of multiplication, division or modulo operations.
9. The method according to any one of claims 6 to 8, wherein the using the second residual image as an input of back propagation of the autoregressive model and outputting a decompressed image comprises:
decoding, in parallel through the autoregressive model, pixels lying on a same line in the second residual image to obtain the decompressed image.
10. An image encoding apparatus, comprising:
an autoregressive module, configured to use an input image as an input of an autoregressive model and output a first image;
a residual calculation module, configured to obtain a residual between the first image and the input image to obtain a first residual image;
an auto-encoding module, configured to use the input image as an input of an auto-encoding model and output a latent variable and a first residual distribution, wherein the latent variable comprises features extracted from the input image, and the first residual distribution comprises residual values, output by the auto-encoding model, representing the residuals between the pixels in the input image and the corresponding pixels in the first residual image;
a residual encoding module, configured to encode the first residual image and the first residual distribution to obtain residual coded data; and
a latent-variable encoding module, configured to encode the latent variable to obtain latent-variable coded data, wherein the latent-variable coded data and the residual coded data are used to obtain the input image after decompression.
11. The apparatus according to claim 10, wherein
the residual encoding module is specifically configured to use the first residual image and the first residual distribution as inputs of a semi-dynamic entropy encoder and output the residual coded data, wherein the semi-dynamic entropy encoder is configured to perform entropy coding using a first preset type of coding operation, the first preset type of coding operation comprises addition, subtraction or bit operations, and the semi-dynamic entropy encoder does not comprise a second preset type of coding operation, wherein the second preset type comprises at least one of multiplication, division or modulo operations.
12. The apparatus according to claim 10, wherein
the latent-variable encoding module is specifically configured to use the latent variable as an input of a static entropy encoder to obtain the latent-variable coded data.
13. The apparatus according to any one of claims 10 to 12, wherein the auto-encoding model comprises an encoding model and a decoding model, and the auto-encoding module is specifically configured to:
use the input image as an input of the encoding model and output the latent variable, wherein the encoding model is used to extract features from the input image; and
use the latent variable as an input of the decoding model to obtain the first residual distribution, wherein the decoding model is used to predict residual values between an input image and a corresponding pixel distribution.
14. The apparatus according to any one of claims 10 to 13, wherein the autoregressive model is configured to predict the values of pixels lying on a same line using the pixel values of already-predicted pixels.
15. An image decompression apparatus, comprising:
a transceiver module, configured to obtain latent-variable coded data and residual coded data, wherein the latent-variable coded data is obtained by encoding features extracted by an encoding end from an input image, and the residual coded data is obtained by encoding a residual between a first image output by forward propagation of an autoregressive model and the input image;
a latent-variable decoding module, configured to decode the latent-variable coded data to obtain a latent variable, wherein the latent variable comprises the features extracted from the input image;
an auto-encoding module, configured to use the latent variable as an input of an auto-encoding model and output a second residual distribution;
a residual decoding module, configured to perform decoding by combining the second residual distribution and the residual coded data to obtain a second residual image; and
an autoregressive module, configured to use the second residual image as an input of back propagation of the autoregressive model and output a decompressed image.
16. The apparatus according to claim 15, wherein
the latent-variable decoding module is specifically configured to use the latent-variable coded data as an input of a static entropy encoder and output the latent variable.
17. The apparatus according to claim 15 or 16, wherein
the residual decoding module is specifically configured to use the second residual distribution and the residual coded data as inputs of a semi-dynamic entropy encoder and output the second residual image, wherein the semi-dynamic entropy encoder is configured to perform entropy coding using a first preset type of coding operation, the first preset type of coding operation comprises addition, subtraction or bit operations, and the semi-dynamic entropy encoder does not comprise a second preset type of coding operation, wherein the second preset type comprises at least one of multiplication, division or modulo operations.
18. The apparatus according to any one of claims 10 to 17, wherein
the autoregressive module is specifically configured to decode, in parallel through the autoregressive model, pixels lying on a same line in the second residual image to obtain the decompressed image.
19. An image encoding apparatus, comprising a processor, wherein the processor is coupled to a memory, the memory stores a program, and when the program instructions stored in the memory are executed by the processor, the steps of the method according to any one of claims 1 to 5 are implemented.
20. An image decompression apparatus, comprising a processor, wherein the processor is coupled to a memory, the memory stores a program, and when the program instructions stored in the memory are executed by the processor, the steps of the method according to any one of claims 6 to 9 are implemented.
21. An image processing system, comprising an image encoding apparatus and an image decompression apparatus, wherein the image encoding apparatus is configured to implement the steps of the method according to any one of claims 1 to 5, and the image decompression apparatus is configured to implement the steps of the method according to any one of claims 6 to 9.
22. A computer-readable storage medium, comprising a program which, when executed by a processing unit, performs the steps of the method according to any one of claims 1 to 9.
23. A computer program product, wherein the computer program product comprises software code, and the software code is used to perform the steps of the method according to any one of claims 1 to 9.
PCT/CN2023/090043 2022-04-26 2023-04-23 Image encoding method and apparatus, and image decompression method and apparatus WO2023207836A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210447177.0A CN115022637A (en) 2022-04-26 2022-04-26 Image coding method, image decompression method and device
CN202210447177.0 2022-04-26

Publications (1)

Publication Number Publication Date
WO2023207836A1 true WO2023207836A1 (en) 2023-11-02

Family

ID=83067519

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/090043 WO2023207836A1 (en) 2022-04-26 2023-04-23 Image encoding method and apparatus, and image decompression method and apparatus

Country Status (2)

Country Link
CN (1) CN115022637A (en)
WO (1) WO2023207836A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115022637A (en) * 2022-04-26 2022-09-06 华为技术有限公司 Image coding method, image decompression method and device
CN117155405A (en) * 2023-08-09 2023-12-01 海飞科(南京)信息技术有限公司 Method for quickly establishing tANS coding and decoding conversion table based on gradient descent

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111405283A (en) * 2020-02-20 2020-07-10 北京大学 End-to-end video compression method, system and storage medium based on deep learning
CN111901596A (en) * 2020-06-29 2020-11-06 北京大学 Video hybrid coding and decoding method, device and medium based on deep learning
CN112257858A (en) * 2020-09-21 2021-01-22 华为技术有限公司 Model compression method and device
CN113574882A (en) * 2019-03-21 2021-10-29 高通股份有限公司 Video compression using depth generative models
WO2022022176A1 (en) * 2020-07-30 2022-02-03 华为技术有限公司 Image processing method and related device
US20220272345A1 (en) * 2020-10-23 2022-08-25 Deep Render Ltd Image encoding and decoding, video encoding and decoding: methods, systems and training methods
CN115022637A (en) * 2022-04-26 2022-09-06 华为技术有限公司 Image coding method, image decompression method and device

Also Published As

Publication number Publication date
CN115022637A (en) 2022-09-06

Similar Documents

Publication Publication Date Title
WO2023207836A1 (en) Image encoding method and apparatus, and image decompression method and apparatus
WO2022116856A1 (en) Model structure, model training method, and image enhancement method and device
WO2021155832A1 (en) Image processing method and related device
WO2022179588A1 (en) Data coding method and related device
WO2023231794A1 (en) Neural network parameter quantification method and apparatus
WO2022028197A1 (en) Image processing method and device thereof
US20240078414A1 (en) Parallelized context modelling using information shared between patches
WO2023174256A1 (en) Data compression method and related device
CN115409697A (en) Image processing method and related device
WO2023051335A1 (en) Data encoding method, data decoding method, and data processing apparatus
Fraihat et al. A novel lossy image compression algorithm using multi-models stacked AutoEncoders
TWI826160B (en) Image encoding and decoding method and apparatus
TW202348029A (en) Operation of a neural network with clipped input data
CN114501031B (en) Compression coding and decompression method and device
WO2023177318A1 (en) Neural network with approximated activation function
CN113554719B (en) Image encoding method, decoding method, storage medium and terminal equipment
CN115409150A (en) Data compression method, data decompression method and related equipment
WO2023040745A1 (en) Feature map encoding method and apparatus and feature map decoding method and apparatus
US20240221230A1 (en) Feature map encoding and decoding method and apparatus
WO2024007820A1 (en) Data encoding and decoding method and related device
CN118318441A (en) Feature map encoding and decoding method and device
TW202345034A (en) Operation of a neural network with conditioned weights
WO2023121499A1 (en) Methods and apparatus for approximating a cumulative distribution function for use in entropy coding or decoding data
EP4396942A1 (en) Methods and apparatus for approximating a cumulative distribution function for use in entropy coding or decoding data
CN114693811A (en) Image processing method and related equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23795269

Country of ref document: EP

Kind code of ref document: A1