WO2023207836A1 - Image encoding method and apparatus, and image decompression method and apparatus - Google Patents

Image encoding method and apparatus, and image decompression method and apparatus

Info

Publication number
WO2023207836A1
Authority
WO
WIPO (PCT)
Prior art keywords
residual
image
encoding
input
model
Prior art date
Application number
PCT/CN2023/090043
Other languages
French (fr)
Chinese (zh)
Inventor
康宁
仇善召
张鸣天
张世枫
李震国
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2023207836A1


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/13 Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/182 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being a pixel
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/44 Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/91 Entropy coding, e.g. variable length coding [VLC] or arithmetic coding

Definitions

  • the present application relates to the field of image processing, and in particular, to an image encoding method, image decompression method and device.
  • Images are widely used in many fields, and a large number of scenarios involve transmitting or storing images. The higher the resolution of an image, the more storage space is consumed when saving it, the higher the bandwidth required to transmit it, and the lower the transmission efficiency. Therefore, to facilitate transmission or storage, images are generally compressed to reduce the number of bits they occupy, thereby reducing the storage space required to save an image and the bandwidth required to transmit it.
  • Entropy coding can be used for image compression.
  • Commonly used entropy coding algorithms for image compression include Huffman coding, arithmetic coding, ANS coding, and the like.
  • However, the compression rates of these entropy coding methods have essentially reached an optimal level, and it is difficult to further improve the compression rate. Therefore, how to improve encoding and decoding efficiency has become an urgent problem to be solved.
  • This application provides an image encoding method, an image decompression method and an apparatus that encode by combining the outputs of an autoregressive model and an autoencoding model, reducing the size of the required models and improving encoding and decoding efficiency.
  • The present application provides an image encoding method, which includes: using the input image as the input of the autoregressive model and outputting a first image; calculating the residual between the first image and the input image to obtain a first residual image; using the input image as the input of the autoencoding model and outputting latent variables and a first residual distribution, where the latent variables include features extracted from the input image, and the first residual distribution includes the autoencoding model's prediction used to represent the residual value corresponding to each pixel in the input image and each pixel in the first residual image; encoding the first residual image and the first residual distribution to obtain residual encoded data; and encoding the latent variables to obtain latent variable encoded data, where the latent variable encoded data and the residual encoded data are used to obtain the input image after decompression.
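  • As an illustration only, the encoding flow summarized above can be sketched as follows; the model and coder objects passed in (autoregressive_model, autoencoder, semi_dynamic_encoder, static_encoder) are hypothetical placeholders, not the actual implementation of this application:

```python
import numpy as np

def encode_image(input_image, autoregressive_model, autoencoder,
                 semi_dynamic_encoder, static_encoder):
    """Illustrative sketch of the encoding flow; all passed-in objects are placeholders."""
    # 1. Forward pass of the autoregressive model predicts each pixel from
    #    already-available pixels, producing the first image.
    first_image = autoregressive_model.forward(input_image)

    # 2. First residual image: difference between the input image and the prediction.
    first_residual = input_image.astype(np.int16) - first_image.astype(np.int16)

    # 3. Autoencoding model outputs latent variables (features of the input image)
    #    and the first residual distribution (predicted distribution of the residuals).
    latent, first_residual_distribution = autoencoder(input_image)

    # 4. The residual image is entropy-coded under the predicted distribution
    #    by the semi-dynamic entropy encoder.
    residual_coded = semi_dynamic_encoder.encode(first_residual,
                                                 first_residual_distribution)

    # 5. The latent variables are entropy-coded by a static entropy encoder.
    latent_coded = static_encoder.encode(latent)

    # Both bitstreams are needed to recover the input image after decompression.
    return latent_coded, residual_coded
```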
  • In this way, the output results of the autoregressive model and the autoencoding model are combined for coding, which allows both the autoencoding model and the autoregressive model to be kept very small and avoids the problem of long inference time caused by an excessively large autoencoding network, thereby achieving efficient image compression.
  • In addition, the entire process, including the AI models and the entropy coding, can be implemented as AI lossless compression on an AI chip, which avoids transfers between system memory and AI chip memory and improves coding efficiency.
  • The aforementioned encoding of the first residual image and the first residual distribution to obtain residual encoded data may include: using the first residual image and the first residual distribution as the input of a semi-dynamic entropy encoder and outputting the residual encoded data.
  • The semi-dynamic entropy encoder is used to perform entropy encoding using a first preset type of encoding operation.
  • The first preset type of encoding operation includes addition, subtraction or bit operations.
  • A second preset type of encoding operation is not included in the semi-dynamic entropy encoder.
  • The second preset type includes at least one of multiplication, division or remainder operations; that is, the semi-dynamic entropy encoder does not include time-consuming operations such as multiplication, division or remainder, and may include only simple operations such as addition and subtraction, allowing efficient encoding.
  • Therefore, the residual image can be encoded with semi-dynamic entropy coding using a limited set of distributions. Compared with dynamic entropy coding, time-consuming operations such as multiplication, division and remainder are reduced, and the coding efficiency is greatly improved.
  • The semi-dynamic entropy encoder may be obtained by converting a dynamic entropy encoder.
  • For example, the operations of the dynamic entropy encoder can be approximated, such as by replacing them with approximate operations that reduce or remove multiplication, division, remainder and similar operations; the remaining operations can then be transformed so that all operations above a certain cost (such as remainder, multiplication and division) are converted into table accesses and lightweight operations such as addition, subtraction and bit operations, thereby obtaining the semi-dynamic entropy encoder provided by this application.
  • In other words, the semi-dynamic entropy encoder can be an entropy encoder obtained by replacing or converting some of the operations in a dynamic entropy encoder.
  • In this way, simple and efficient operations such as addition, subtraction and bit operations can be used to achieve efficient encoding.
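  • The following is a minimal sketch, not the coder of this application, of how the expensive steps of a range/ANS-style coder can be restricted to table accesses and add/subtract/bit operations once the total probability scale is fixed to a power of two (an assumption made here purely for illustration):

```python
PROB_BITS = 12                 # assumed probability scale: total frequency = 2**12
PROB_SCALE = 1 << PROB_BITS

def build_tables(probs):
    """Done once per distribution (table construction): quantize probabilities into
    integer frequencies and a cumulative table. Multiplication is acceptable here
    because it is outside the per-symbol coding loop; a real coder would rebalance
    the frequencies more carefully than this naive adjustment."""
    freqs = [max(1, int(p * PROB_SCALE)) for p in probs]
    freqs[-1] += PROB_SCALE - sum(freqs)        # force the exact total
    cdf = [0]
    for f in freqs:
        cdf.append(cdf[-1] + f)
    return freqs, cdf

def symbol_interval(cdf, symbol):
    """Per-symbol lookup: table access and subtraction only."""
    low = cdf[symbol]
    width = cdf[symbol + 1] - low
    return low, width

def split_state(state):
    """Where a dynamic coder would compute `state % total` and `state // total`,
    a power-of-two total turns both into bit operations."""
    return state & (PROB_SCALE - 1), state >> PROB_BITS
```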
  • The aforementioned encoding of the latent variables to obtain latent variable encoded data may include: using the latent variables as the input of a static entropy encoder to obtain the latent variable encoded data.
  • In this way, static entropy coding can be performed on the features extracted from the input image, and the coding can be performed efficiently.
  • The autoencoding model may include an encoding model and a decoding model. Using the input image as the input of the autoencoding model and outputting the latent variables and the first residual distribution may include: using the input image as the input of the encoding model to output the latent variables, where the encoding model is used to extract features from the input image; and using the latent variables as the input of the decoding model to obtain the first residual distribution, where the decoding model is used to predict the distribution of the residual values between the input image and the corresponding predicted pixels.
  • In this way, a trained autoencoding model can be used to extract important features from the input image and predict the corresponding residual image, so that, combined with the output of the autoregressive model, residual encoded data that can represent the data in the input image is obtained.
  • The autoregressive model is used to predict the values of pixels on the same line using the pixel values of pixels that have already been predicted, so that in the subsequent decoding process the pixels on the same line do not need to wait for other pixels to be decoded before the current pixel can be decoded; pixels on the same line are thus decoded efficiently, improving the decoding efficiency of the input image.
  • This application further provides an image decompression method, which includes: obtaining latent variable encoded data and residual encoded data.
  • The latent variable encoded data is obtained by encoding the features extracted by the encoding end from the input image.
  • The residual encoded data is obtained by encoding the residual between the image output by the forward propagation of the autoregressive model and the input image; the latent variable encoded data is decoded to obtain the latent variables.
  • The latent variables include the features extracted by the encoding end from the input image; the latent variables are used as the input of the autoencoding model to output a second residual distribution, the second residual distribution and the residual encoded data are combined for decoding to obtain a second residual image, and the second residual image is used as the input of the back propagation of the autoregressive model to output the decompressed image.
  • An autoencoding model alone usually has limited fitting ability and requires a deeper network to achieve a better compression rate; this application combines the output results of the autoregressive model, which reduces the required size of the autoencoding model. Therefore, in this application the autoregressive model and the autoencoding model are combined for decoding, and both models can be kept very small, which avoids the problem of excessively long inference time caused by an overly large autoencoding network and enables efficient image decompression.
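  • A corresponding illustrative sketch of the decompression flow follows; all objects passed in are again hypothetical placeholders rather than the actual implementation:

```python
def decompress_image(latent_coded, residual_coded, autoregressive_model,
                     autoencoding_model, semi_dynamic_decoder, static_decoder):
    """Illustrative sketch of the decompression flow; not the actual implementation."""
    # 1. Static entropy decoding recovers the latent variables.
    latent = static_decoder.decode(latent_coded)

    # 2. The autoencoding model maps the latent variables to the
    #    second residual distribution.
    second_residual_distribution = autoencoding_model(latent)

    # 3. Semi-dynamic entropy decoding of the residual coded data under that
    #    distribution yields the second residual image.
    second_residual = semi_dynamic_decoder.decode(residual_coded,
                                                  second_residual_distribution)

    # 4. The back-propagation (inverse) pass of the autoregressive model adds its
    #    per-pixel predictions back onto the residuals to rebuild the image.
    return autoregressive_model.backward(second_residual)
```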
  • In addition, the entire process, including the AI models and the entropy coding, can be implemented as AI lossless compression on an AI chip, which avoids transfers between system memory and AI chip memory and improves coding efficiency.
  • the aforementioned decoding of latent variable encoded data to obtain latent variables includes: using latent variable encoded data as input to a static entropy encoder and outputting latent variables.
  • This decoding can be understood as the inverse operation of the static entropy coding performed by the encoding end, so that the important features of the image can be losslessly recovered.
  • The aforementioned decoding that combines the second residual distribution and the residual encoded data to obtain the second residual image may include: using the second residual distribution and the residual encoded data as the input of the semi-dynamic entropy encoder and outputting the second residual image.
  • The semi-dynamic entropy encoder is used to perform entropy encoding using a first preset type of encoding operation.
  • The first preset type of encoding operation includes addition, subtraction or bit operations, and the semi-dynamic entropy encoder does not include a second preset type of encoding operation; the second preset type includes at least one of multiplication, division or remainder operations, that is, the semi-dynamic entropy encoder does not include multiplication, division or remainder operations.
  • The semi-dynamic entropy encoder may include only simple operations such as addition and subtraction, achieving high-efficiency coding. Therefore, the residual image can be decoded based on semi-dynamic entropy coding using a limited set of distributions. Compared with dynamic entropy coding, time-consuming operations such as multiplication, division and remainder are reduced, and the decoding efficiency is greatly improved.
  • The semi-dynamic entropy encoder may be obtained by converting a dynamic entropy encoder.
  • For example, the operations of the dynamic entropy encoder can be approximated, such as by replacing them with approximate operations that reduce or remove multiplication, division, remainder and similar operations; the remaining operations can then be transformed so that all operations above a certain cost (such as remainder, multiplication and division) are converted into table accesses and lightweight operations such as addition, subtraction and bit operations, thereby obtaining the semi-dynamic entropy encoder provided by this application.
  • In other words, the semi-dynamic entropy encoder can be an entropy encoder obtained by replacing or converting some of the operations in a dynamic entropy encoder.
  • In this way, simple and efficient operations such as addition, subtraction and bit operations can be used to achieve efficient encoding.
  • The aforementioned use of the second residual image as the input of the back propagation of the autoregressive model to output the decompressed image may include: decoding, through the autoregressive model, the pixels on the same line in the second residual image in parallel to obtain the decompressed image.
  • Therefore, pixels on the same line do not need to wait for other pixels to be decoded before the current pixel can be decoded, so that pixels on the same line are decoded efficiently and the decoding efficiency of the input image is improved.
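  • A minimal sketch of this row-parallel idea, assuming for illustration that every pixel's prediction depends only on already-reconstructed rows above it, so an entire row can be recovered in one vectorized step (predict_row is a hypothetical placeholder):

```python
import numpy as np

def reconstruct_row_parallel(residual_image, predict_row):
    """predict_row(previous_rows) -> predicted values for the next row (placeholder).
    The placeholder must also handle the first row, e.g. by returning a constant."""
    h, w = residual_image.shape
    out = np.zeros((h, w), dtype=np.int16)
    for y in range(h):
        # One prediction for the whole row, based only on rows already reconstructed.
        prediction = predict_row(out[:y])            # expected shape: (w,)
        # Every pixel of the row is recovered together; no pixel on this row
        # waits for another pixel on the same row.
        out[y] = prediction + residual_image[y]
    return out
```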
  • this application provides an image coding device, including:
  • an autoregressive module, configured to use the input image as the input of the autoregressive model and output the first image;
  • a residual calculation module, configured to obtain the residual between the first image and the input image to obtain the first residual image;
  • an autoencoding module, configured to use the input image as the input of the autoencoding model and output latent variables and a first residual distribution, where the latent variables include features extracted from the input image, and the first residual distribution includes the output of the autoencoding model used to represent the residual value corresponding to each pixel in the input image and each pixel in the first residual image;
  • a residual encoding module, configured to encode the first residual image and the first residual distribution to obtain residual encoded data; and
  • a latent variable encoding module, configured to encode the latent variables to obtain latent variable encoded data, where the latent variable encoded data and the residual encoded data are used to obtain the input image after decompression.
  • The residual encoding module is specifically configured to use the first residual image and the first residual distribution as the input of a semi-dynamic entropy encoder and output the residual encoded data.
  • The semi-dynamic entropy encoder is used to perform entropy encoding using a first preset type of encoding operation; the first preset type of encoding operation includes addition, subtraction or bit operations, and the semi-dynamic entropy encoder does not include a second preset type of encoding operation.
  • The second preset type includes at least one of multiplication, division or remainder operations; that is, the semi-dynamic entropy encoder does not include time-consuming operations such as multiplication, division or remainder and may include only simple addition and subtraction operations, allowing efficient encoding.
  • The semi-dynamic entropy encoder may be obtained by converting a dynamic entropy encoder.
  • For example, the operations of the dynamic entropy encoder can be approximated, such as by replacing them with approximate operations that reduce or remove multiplication, division, remainder and similar operations; the remaining operations can then be transformed so that all operations above a certain cost (such as remainder, multiplication and division) are converted into table accesses and lightweight operations such as addition, subtraction and bit operations, thereby obtaining the semi-dynamic entropy encoder provided by this application.
  • In other words, the semi-dynamic entropy encoder can be an entropy encoder obtained by replacing or converting some of the operations in a dynamic entropy encoder.
  • In this way, simple and efficient operations such as addition, subtraction and bit operations can be used to achieve efficient encoding.
  • the latent variable encoding module is specifically configured to use latent variables as inputs to the static entropy encoder to obtain latent variable encoded data.
  • The autoencoding model includes an encoding model and a decoding model.
  • The autoencoding module is specifically configured to: use the input image as the input of the encoding model and output the latent variables, where the encoding model is used to extract features from the input image; and use the latent variables as the input of the decoding model to obtain the first residual distribution.
  • The decoding model is used to predict the distribution of the residual values between the input image and the corresponding predicted pixels.
  • The autoregressive model is used to predict the values of pixels on the same line using the pixel values of pixels that have already been predicted.
  • this application provides an image decompression device, including:
  • a transceiver module, configured to obtain latent variable encoded data and residual encoded data, where the latent variable encoded data is obtained by encoding the features extracted by the encoding end from the input image, and the residual encoded data is obtained by encoding the residual between the image output by the autoregressive model and the input image;
  • a latent variable decoding module, configured to decode the latent variable encoded data to obtain latent variables, where the latent variables include the features extracted by the encoding end from the input image;
  • an autoencoding module, configured to use the latent variables as the input of the autoencoding model and output a second residual distribution;
  • a residual decoding module, configured to decode the second residual distribution and the residual encoded data to obtain a second residual image; and
  • an autoregressive module, configured to use the second residual image as the input of the back propagation of the autoregressive model and output the decompressed image.
  • the latent variable decoding module is specifically configured to use the latent variable encoded data as the input of the static entropy encoder and output the latent variable.
  • The residual decoding module is specifically configured to use the second residual distribution and the residual encoded data as the input of a semi-dynamic entropy encoder and output the second residual image.
  • The semi-dynamic entropy encoder is used to perform entropy encoding using a first preset type of encoding operation; the first preset type of encoding operation includes addition, subtraction or bit operations, and the semi-dynamic entropy encoder does not include a second preset type of encoding operation.
  • The second preset type includes at least one of multiplication, division or remainder operations; that is, the semi-dynamic entropy encoder does not include time-consuming operations such as multiplication, division or remainder and may include only simple addition and subtraction operations, thereby enabling efficient encoding.
  • The semi-dynamic entropy encoder may be obtained by converting a dynamic entropy encoder.
  • For example, the operations of the dynamic entropy encoder can be approximated, such as by replacing them with approximate operations that reduce or remove multiplication, division, remainder and similar operations; the remaining operations can then be transformed so that all operations above a certain cost (such as remainder, multiplication and division) are converted into table accesses and lightweight operations such as addition, subtraction and bit operations, thereby obtaining the semi-dynamic entropy encoder provided by this application.
  • In other words, the semi-dynamic entropy encoder can be an entropy encoder obtained by replacing or converting some of the operations in a dynamic entropy encoder.
  • In this way, simple and efficient operations such as addition, subtraction and bit operations can be used to achieve efficient encoding.
  • The autoregressive module is specifically configured to decode the pixels on the same line in the second residual image in parallel through the autoregressive model to obtain the decompressed image.
  • embodiments of the present application provide an image coding device, which has the function of implementing the image processing method in the first aspect.
  • This function can be implemented by hardware, or it can be implemented by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • embodiments of the present application provide an image decompression device, which has the function of implementing the image processing method in the second aspect.
  • This function can be implemented by hardware, or it can be implemented by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • Embodiments of the present application provide an image encoding device, including a processor and a memory, where the processor and the memory are interconnected through a line, and the processor calls the program code in the memory to perform the processing-related functions in the image encoding method shown in any one of the above first aspect.
  • the image encoding device may be a chip.
  • Embodiments of the present application provide an image decompression device, including a processor and a memory, where the processor and the memory are interconnected through a line, and the processor calls the program code in the memory to perform the processing-related functions in the image decompression method shown in any one of the above second aspect.
  • the image decompression device may be a chip.
  • Embodiments of the present application provide an image encoding device.
  • the image encoding device may also be called a digital processing chip or chip.
  • the chip includes a processing unit and a communication interface.
  • The processing unit obtains program instructions through the communication interface, and the program instructions are executed by the processing unit; the processing unit is configured to perform the processing-related functions in the above-mentioned first aspect or any optional implementation manner of the first aspect.
  • Embodiments of the present application provide an image decompression device.
  • The image decompression device may also be called a digital processing chip or chip.
  • the chip includes a processing unit and a communication interface.
  • The processing unit obtains program instructions through the communication interface, and the program instructions are executed by the processing unit; the processing unit is configured to perform the processing-related functions in the above-mentioned second aspect or any optional implementation manner of the second aspect.
  • An embodiment of the present application provides an image processing system, which includes an image encoding device and an image decompression device, where the image encoding device is configured to perform the processing-related functions in the above-mentioned first aspect or any optional implementation of the first aspect, and the image decompression device is configured to perform the processing-related functions in the above-mentioned second aspect or any optional implementation of the second aspect.
  • Embodiments of the present application provide a computer-readable storage medium, including instructions that, when run on a computer, cause the computer to execute the method in any optional implementation of the first aspect or the second aspect.
  • embodiments of the present application provide a computer program product containing instructions that, when run on a computer, cause the computer to execute the method in any optional implementation of the first aspect or the second aspect.
  • Figure 1 is a schematic diagram of an artificial intelligence subject framework applied in this application
  • Figure 2 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • Figure 3 is a schematic diagram of an application scenario according to the embodiment of the present application.
  • Figure 4 is a schematic diagram of another application scenario according to the embodiment of the present application.
  • Figure 5 is a schematic diagram of another application scenario according to the embodiment of the present application.
  • Figure 6 is a schematic flowchart of an image encoding method provided by an embodiment of the present application.
  • Figure 7 is a schematic flow chart of another image encoding method provided by an embodiment of the present application.
  • Figure 8 is a schematic diagram of a prediction method of an autoregressive model provided by an embodiment of the present application.
  • Figure 9 is a schematic diagram of the prediction sequence of an autoregressive model provided by an embodiment of the present application.
  • Figure 10 is a schematic diagram of a residual calculation method provided by the embodiment of the present application.
  • Figure 11 is a schematic diagram of a data structure provided by an embodiment of the present application.
  • Figure 12 is a schematic flow chart of an image decompression method provided by an embodiment of the present application.
  • Figure 13 is a schematic flow chart of another image decompression method provided by an embodiment of the present application.
  • Figure 14 is a schematic structural diagram of an image coding device provided by this application.
  • Figure 15 is a schematic structural diagram of an image decoding device provided by this application.
  • Figure 16 is a schematic structural diagram of another image coding device provided by the present application.
  • Figure 17 is a schematic structural diagram of another image decoding device provided by this application.
  • Figure 18 is a schematic structural diagram of a chip provided by this application.
  • Figure 1 shows a structural schematic diagram of the artificial intelligence main framework.
  • The above artificial intelligence framework is elaborated below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis).
  • The "intelligent information chain" reflects a series of processes from data acquisition to processing, for example the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, the data goes through the condensation process of "data - information - knowledge - wisdom".
  • The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (providing and processing technology implementations) to the industrial ecological process of the system.
  • Infrastructure provides computing power support for artificial intelligence systems, enables communication with the external world, and supports it through basic platforms.
  • Computing power is provided by smart chips, such as hardware acceleration chips including the central processing unit (CPU), neural-network processing unit (NPU), graphics processing unit (GPU), application-specific integrated circuit (ASIC) or field programmable gate array (FPGA);
  • the basic platform includes related platform guarantees and support such as a distributed computing framework and networks, and can include cloud storage and computing, interconnection networks, etc.
  • sensors communicate with the outside world to obtain data, which are provided to smart chips in the distributed computing system provided by the basic platform for calculation.
  • Data from the upper layer of the infrastructure is used to represent data sources in the field of artificial intelligence.
  • the data involves graphics, images, voice, and text, as well as IoT data of traditional devices, including business data of existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
  • Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making and other methods.
  • machine learning and deep learning can perform symbolic and formal intelligent information modeling, extraction, preprocessing, training, etc. on data.
  • Reasoning refers to the process of simulating human intelligent reasoning in computers or intelligent systems, using formal information to perform machine thinking and problem solving based on reasoning control strategies. Typical functions are search and matching.
  • Decision-making refers to the process of decision-making after intelligent information is reasoned, and usually provides functions such as classification, sorting, and prediction.
  • Based on the results of the further data processing, some general capabilities can be formed, such as algorithms or a general system, for example translation, text analysis, computer vision processing, speech recognition, image recognition, etc.
  • Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields; they are the encapsulation of overall artificial intelligence solutions, productizing intelligent information decision-making and realizing practical applications. The application fields mainly include intelligent terminals, intelligent transportation, smart healthcare, autonomous driving, smart cities, etc.
  • the embodiments of the present application involve a large number of related applications of neural networks and images.
  • the relevant terms and concepts in the fields of neural networks and images that may be involved in the embodiments of the present application are first introduced below.
  • the neural network can be composed of neural units.
  • The neural unit may refer to an operation unit that takes x_s and an intercept of 1 as input, and the output of the operation unit may be given by the following formula:
  • h_{W,b}(x) = f(W^T x) = f(∑_{s=1}^{n} W_s x_s + b)
  • where s = 1, 2, ..., n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit.
  • f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal of the neural unit into an output signal.
  • The output signal of the activation function can be used as the input of the next convolutional layer, and the activation function can be a sigmoid function.
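  • For illustration only, the neural unit formula above can be written out directly, with the sigmoid chosen here merely as an example activation:

```python
import math

def neuron_output(xs, ws, b):
    """h = f(sum_s W_s * x_s + b), with f taken to be the sigmoid function."""
    z = sum(w * x for w, x in zip(ws, xs)) + b
    return 1.0 / (1.0 + math.exp(-z))      # sigmoid activation f

# Example: neuron_output([0.5, -1.0], [0.8, 0.3], 0.1) == sigmoid(0.2)
```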
  • a neural network is a network formed by connecting multiple above-mentioned single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected to the local receptive field of the previous layer to extract the features of the local receptive field.
  • the local receptive field can be an area composed of several neural units.
  • A deep neural network (DNN), also known as a multi-layer neural network, can be understood as a neural network with multiple intermediate layers.
  • According to the positions of the different layers, the layers inside a DNN can be divided into three categories: the input layer, the intermediate layers, and the output layer.
  • the first layer is the input layer
  • the last layer is the output layer
  • the layers in between are all intermediate layers, or hidden layers.
  • the layers are fully connected, that is to say, any neuron in the i-th layer must be connected to any neuron in the i+1-th layer.
  • Although a DNN looks complex, each of its layers can be expressed as a simple linear relationship: y = α(W·x + b), where x is the input vector, y is the output vector, b is the offset (bias) vector, W is the weight matrix (also called the coefficients), and α() is the activation function.
  • Each layer simply performs this operation on the input vector x to obtain the output vector y. Because a DNN has many layers, the number of coefficient matrices W and offset vectors b is also large.
  • These parameters are defined in the DNN as follows, taking the coefficient w as an example: assume that in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as W^3_{24}, where the superscript 3 represents the layer in which the coefficient is located, and the subscript corresponds to the output index 2 of the third layer and the input index 4 of the second layer.
  • In summary, the coefficient from the k-th neuron of layer L-1 to the j-th neuron of layer L is defined as W^L_{jk}.
  • the input layer has no W parameter.
  • more intermediate layers make the network more capable of describing complex situations in the real world.
  • a model with more parameters has higher complexity and greater "capacity", which means it can complete more complex learning tasks.
  • Training a deep neural network is the process of learning the weight matrix. The ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (a weight matrix formed by the vectors W of many layers).
  • Convolutional neural network is a deep neural network with a convolutional structure.
  • the convolutional neural network contains a feature extractor consisting of a convolutional layer and a subsampling layer, which can be regarded as a filter.
  • the convolutional layer refers to the neuron layer in the convolutional neural network that convolves the input signal.
  • a neuron can be connected to only some of the neighboring layer neurons.
  • a convolutional layer usually contains several feature planes, and each feature plane can be composed of some rectangularly arranged neural units. Neural units in the same feature plane share weights, and the shared weights here are convolution kernels. Shared weights can be understood as a way to extract image information independent of position.
  • The convolution kernel can be initialized in the form of a matrix of random size, and during the training of the convolutional neural network the convolution kernel can obtain reasonable weights through learning. In addition, a direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network while also reducing the risk of overfitting.
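  • For illustration only, a naive single-channel convolution shows the weight sharing described above: the same kernel weights are applied at every spatial position:

```python
import numpy as np

def conv2d_single_channel(image, kernel):
    """Naive 2D convolution (valid padding, as typically implemented in CNNs,
    i.e. cross-correlation): one shared kernel slides over the whole image."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            # Same kernel weights reused at every (y, x): this is weight sharing.
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out
```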
  • The loss function can usually include mean squared error, cross-entropy, logarithmic, exponential and other loss functions. For example, the mean squared error can be used as the loss function, defined as MSE = (1/n) ∑_{i=1}^{n} (y_i − ŷ_i)², where y_i is the target value and ŷ_i is the predicted value. The specific loss function can be selected according to the actual application scenario.
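  • For example, the mean squared error can be computed as follows (a generic illustration, not tied to any particular model in this application):

```python
def mean_squared_error(y_true, y_pred):
    """MSE = (1/n) * sum_i (y_i - y_hat_i)^2"""
    assert len(y_true) == len(y_pred) and len(y_true) > 0
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# mean_squared_error([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]) == (0.25 + 0.0 + 1.0) / 3
```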
  • the neural network can use the error back propagation (BP) algorithm to modify the size of the parameters in the initial neural network model during the training process, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, forward propagation of the input signal until the output will produce an error loss, and the parameters in the initial neural network model are updated by backpropagating the error loss information, so that the error loss converges.
  • the backpropagation algorithm is a backpropagation movement dominated by error loss, aiming to obtain the optimal parameters of the neural network model, such as the weight matrix.
  • Entropy coding refers to coding that does not lose any information according to the entropy principle during the coding process.
  • Information entropy is the average amount of information in the source (a measure of uncertainty).
  • Common entropy codes include: Shannon coding, Huffman coding, arithmetic coding, etc.
  • the optimal compression scheme can be obtained using entropy coding technology.
  • An image with probability p can be represented by -log2(p) bits. For example, an image with a probability of 1/8 needs 3 bits, and an image with a probability of 1/256 needs 8 bits.
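  • As a quick, purely illustrative check of these numbers:

```python
import math

def ideal_code_length_bits(p):
    """Ideal entropy-code length, in bits, for an event of probability p."""
    return -math.log2(p)

# ideal_code_length_bits(1 / 8)   -> 3.0
# ideal_code_length_bits(1 / 256) -> 8.0
```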
  • the algorithm needs to know the probability of each letter appearing as accurately as possible, and the model's job is to provide this data. The better the predictions of the model, the better the compression results. Furthermore the model must present the same data during compression and recovery.
  • the static model (or static entropy coding) analyzes the entire text to calculate the probability of each letter before compression. The result of this calculation is applied to the entire text.
  • the encoding table only needs to be calculated once, so the encoding speed is high, and the result will definitely not be longer than the original text except for the probability value required during decoding.
  • The entropy coding used may include static entropy coding methods such as tANS or FSE.
  • Forward dynamic coding: the probability is calculated based on the letters that have already been encoded; each time a letter is encoded, its probability increases.
  • Inverse dynamic coding: before encoding, the probability of each letter in the remaining unencoded part is calculated. As encoding proceeds, more and more letters no longer appear and their probabilities become 0, while the probabilities of the remaining letters increase and the number of bits used to encode them decreases; the compression ratio keeps increasing, so that the last letter requires 0 bits to encode.
  • In dynamic coding, the model is optimized according to the specific characteristics of different parts, and probability data does not need to be transmitted for the forward model.
  • entropy coding is divided into many types. For example, it can be divided into static entropy coding, semi-dynamic entropy coding and dynamic entropy coding.
  • static entropy coding uses a single probability distribution for coding
  • semi-dynamic coding uses multiple (i.e. limited types) probability distributions for coding
  • dynamic entropy coding can use an unlimited variety of probability distributions for coding.
  • In an autoregressive model, the values of the same variable, such as x in previous periods (that is, x_1 to x_{t-1}), are used to predict the value of x_t in the current period, and the relationship between them is assumed to be linear. Because this is developed from linear regression in regression analysis, except that x is used to predict x rather than y, it is called autoregression.
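  • In its standard textbook form (stated here only as background), an autoregressive model of order p writes the current value as a linear combination of the p previous values plus noise:

```latex
x_t = c + \sum_{i=1}^{p} \varphi_i \, x_{t-i} + \varepsilon_t
```

  • Here c is a constant, the φ_i are the regression coefficients, and ε_t is an error term.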
  • the autoencoding model is a neural network that uses the backpropagation algorithm to make the output value equal to the input value. It first compresses the input data into a latent space representation, and then reconstructs the output through this representation.
  • Autoencoding models usually include an encoding (encoder) model and a decoding (decoder) model.
  • the trained encoding model is used to extract features from the input image to obtain latent variables.
  • the latent variables are input to the trained decoding model to output the predicted residual corresponding to the input image.
  • Lossless compression is a technology that compresses data so that after compression the data occupies less space than before compression, and the compressed data can be decompressed to restore the original data.
  • The decompressed data is completely consistent with the data before compression.
  • The greater the probability of occurrence of each pixel in the image (that is, the probability value obtained when the pixel value of the current pixel is predicted from the pixel values of other pixels), the shorter the compressed length.
  • The probability of a real image is much higher than that of a randomly generated image, so the number of bits per pixel (bpd) required to compress the former is much smaller than that required for the latter.
  • The bpd of most images is significantly smaller than before compression and is higher than before compression only with a very small probability, thus reducing the average bpd of the images.
  • The compression ratio is the ratio of the original data size to the compressed data size; if there is no compression, the value is 1, and the larger the value, the better.
  • The compression/decompression bandwidth is the size of raw data that can be compressed or decompressed per second.
  • Receptive field: when predicting a pixel, the points in its receptive field need to be known in advance; changing points outside the receptive field does not change the prediction of that pixel.
  • the encoding method and decoding method provided by the embodiments of this application can be executed on the server or on the terminal device.
  • The neural networks mentioned below in this application can be deployed on a server or on a terminal, which can be adjusted according to the actual application scenario.
  • For example, the encoding method and decoding method provided by this application can be deployed on a terminal through plug-ins.
  • The terminal device may be a mobile phone with an image processing function, a tablet personal computer (TPC), a media player, a smart TV, a laptop computer (LC), a personal digital assistant (PDA), a personal computer (PC), a camera, a camcorder, a smart watch, a wearable device (WD) or a self-driving vehicle, etc.; the embodiments of the present application are not limited thereto.
  • the following is an exemplary description taking the encoding method and decoding method provided by this application being deployed on a terminal as an example.
  • All or part of the processes in the encoding method and decoding method provided by this application can be implemented through neural networks.
  • the autoregressive model, autoencoding model, etc. can be implemented through neural networks.
  • the neural network needs to be deployed on the terminal after training.
  • this embodiment of the present application provides a system architecture 100.
  • data collection device 160 is used to collect training data.
  • the training data may include a large number of high-definition images.
  • After collecting the training data, the data collection device 160 stores the training data in the database 130, and the training device 120 trains to obtain the target model/rule 101 based on the training data maintained in the database 130.
  • the training set mentioned in the following embodiments of this application may be obtained from the database 130 or may be obtained through user input data.
  • the target model/rule 101 may be a neural network trained in the embodiment of the present application.
  • the neural network may include one or more networks, such as an autoregressive model or an autoencoding model.
  • The training device 120 processes the input three-dimensional model and compares the output image with the high-quality rendered image corresponding to the input three-dimensional model until the difference between the image output by the training device 120 and the high-quality rendered image is less than a certain threshold, thereby completing the training of the target model/rule 101.
  • The above target model/rule 101 can be used to implement the neural networks mentioned in the encoding method and decoding method of the embodiments of the present application; that is, after relevant preprocessing, the data to be processed (such as the image to be compressed) is input into the target model/rule 101 to obtain the processing result.
  • the target model/rule 101 in the embodiment of this application may specifically be the neural network mentioned below in this application, and the neural network may be the aforementioned CNN, DNN or RNN type of neural network.
  • The training data maintained in the database 130 is not necessarily all collected by the data collection device 160; it may also be received from other devices.
  • In addition, the training device 120 does not necessarily train the target model/rule 101 entirely based on the training data maintained in the database 130; it may also obtain training data from the cloud or elsewhere for model training, which is not limited in this application.
  • the target model/rules 101 trained according to the training device 120 can be applied to different systems or devices, such as the execution device 110 shown in Figure 2.
  • the execution device 110 can also be called a computing device.
  • The execution device 110 can be a terminal, such as a mobile phone terminal, a tablet, a laptop, an augmented reality (AR)/virtual reality (VR) device or a vehicle-mounted terminal, etc.; it can also be a server, a cloud device, etc.
  • the execution device 110 is configured with an input/output (I/O) interface 112 for data interaction with external devices.
  • the user can input data to the I/O interface 112 through the client device 140.
  • the input data may include: data to be processed input by the client device.
  • the client can be other hardware devices, such as terminals or servers, etc.
  • the client can also be software deployed on the terminal, such as APPs, web pages, etc.
  • the preprocessing module 113 and the preprocessing module 114 are used to perform preprocessing according to the input data (such as data to be processed) received by the I/O interface 112.
  • The preprocessing module 113 and the preprocessing module 114 may not be present, or there may be only one preprocessing module, and the calculation module 111 is directly used to process the input data.
  • When the execution device 110 preprocesses the input data, or when the calculation module 111 of the execution device 110 performs calculations and other related processing, the execution device 110 can call data, code, etc. in the data storage system 150 for the corresponding processing, and the data, instructions, etc. obtained by the corresponding processing can also be stored in the data storage system 150.
  • The I/O interface 112 returns the processing result to the client device 140 to provide it to the user. For example, if the first neural network is used for image classification and the processing result is a classification result, the I/O interface 112 returns the obtained classification result to the client device 140 to provide it to the user.
  • The training device 120 can generate corresponding target models/rules 101 based on different training data for different goals or different tasks, and the corresponding target models/rules 101 can be used to achieve the above goals or complete the above tasks, thereby providing the user with the desired results.
  • the execution device 110 and the training device 120 may be the same device, or located within the same computing device. To facilitate understanding, this application will introduce the execution device and the training device separately, which is not a limitation.
  • the user can manually set the input data, and the manual setting can be operated through the interface provided by the I/O interface 112 .
  • The client device 140 can automatically send input data to the I/O interface 112. If the client device 140 is required to obtain the user's authorization before automatically sending input data, the user can set the corresponding permissions in the client device 140.
  • the user can view the results output by the execution device 110 on the client device 140, and the specific presentation form may be display, sound, action, etc.
  • The client device 140 can also serve as a data collection end, collecting the input data fed to the I/O interface 112 and the predicted labels output from the I/O interface 112, as shown in the figure, as new sample data and storing them in the database 130.
  • Alternatively, the I/O interface 112 can directly use the input data fed to the I/O interface 112 and the predicted labels output from the I/O interface 112, as shown in the figure, as new sample data and store them in the database 130.
  • Figure 2 is only a schematic diagram of a system architecture provided by an embodiment of the present application.
  • The positional relationship between the devices, components, modules, etc. shown in the figure does not constitute any limitation.
  • For example, in Figure 2 the data storage system 150 is an external memory relative to the execution device 110; in other cases, the data storage system 150 can also be placed in the execution device 110.
  • the target model/rule 101 is obtained by training according to the training device 120.
  • the target model/rule 101 can be the neural network in the present application.
  • The neural networks provided in the embodiments of the present application can include a CNN, a deep convolutional neural network (DCNN), a recurrent neural network (RNN) or other constructed neural networks, etc.
  • the encoding method and decoding method in the embodiment of the present application can be executed by an electronic device, which is the aforementioned execution device.
  • This electronic device includes a CPU and a GPU that can compress images.
  • other devices may also be included, such as NPU or ASIC, etc. This is only an illustrative description and will not be repeated one by one.
  • The electronic device may be a mobile phone, a tablet computer, a notebook computer, a PC, a mobile internet device (MID), a wearable device, a virtual reality (VR) device, an augmented reality (AR) device, a wireless electronic device in industrial control, a wireless electronic device in self-driving, a wireless electronic device in remote medical surgery, a wireless electronic device in a smart grid, a wireless electronic device in transportation safety, a wireless electronic device in a smart city, a wireless electronic device in a smart home, etc.
  • the electronic device can be a device running Android system, IOS system, Windows system and other systems.
  • the electronic device can run applications that need to compress images to obtain compressed images, such as communication software, photo albums or camera applications.
  • entropy coding can be used for compression.
  • the distribution of the image is unknown, so the original distribution needs to be estimated, and the estimated distribution is input into the entropy encoder for encoding.
  • The more accurate the estimate, the higher the compression rate.
  • Traditional lossless image compression algorithms mostly adopt the principle that neighboring pixel values are usually close to each other and use fixed prediction methods; the coding efficiency of such methods is low.
  • AI-based lossless image compression can also be used. Compared with traditional coding algorithms, AI algorithms can achieve significantly higher compression rates, but their compression/decompression efficiency is very low.
  • Autoregressive models can be used for image compression. If an autoregressive model is built and the values of all previous pixels are input, it can output the distribution parameters of the predicted pixel; if the distribution is a Gaussian distribution, the output is the two parameters of mean and variance.
  • When using the autoregressive model for compression, the pixels can be input to the autoregressive model to obtain the distribution prediction of each pixel, and the distribution prediction of the pixel and the value of the pixel can be input into the entropy encoder to obtain the encoded data.
  • During decompression, the pixels that have already been decoded are input to the autoregressive model to obtain the distribution prediction of the current pixel.
  • The distribution prediction and the encoded data are input to the entropy decoder to obtain the decoded data.
  • However, the prediction of each pixel relies on all previous pixels, which results in low operating efficiency.
  • During decompression, all pixels before the current pixel need to be decompressed before the current pixel can be decompressed, so only one pixel can be decompressed per network inference; the number of network inferences is large and the decompression efficiency is low.
  • In some scenarios, an autoencoding model can be used for image compression.
  • During compression, the original data is input into the encoding network (Encoder) to obtain the latent variables, and the latent variables are input into the decoding network (Decoder) to obtain the distribution prediction of the image.
  • The manually designed distribution and the values of the latent variables are input into the entropy coder to encode the latent variables; the distribution prediction of the image and the original image are input into the entropy coder to encode the image.
  • During decompression, the manually designed distribution and the encoding of the latent variables are input into the entropy coder to decode the latent variables; the latent variables are input into the decoding network (Decoder) to obtain the distribution prediction of the image; and the distribution prediction of the image and the encoding of the image are input into the entropy coder to decode the image.
  • However, autoencoding models have poorer fitting capabilities; if the compression rate is to exceed that of traditional compression algorithms, a deeper network is required, and the latency of a single network inference is high.
  • this application provides an encoding method and a decoding method that combine an autoregressive model and an autoencoder model for lossless compression, and provides an efficient semi-dynamic entropy encoder so that both the model inference and the encoding process run on the AI chip, reducing transfers between system memory and AI chip memory and achieving high-bandwidth compression and decompression.
  • the terminal may include a mobile phone, a camera, a monitoring device, or other devices with a shooting function or connected to a camera device.
  • the image can be losslessly compressed through the encoding method provided by this application, thereby obtaining compressed encoded data.
  • when the image needs to be read, such as when displaying the image in a photo album, the encoded data can be decompressed to restore the image.
  • images can be efficiently and losslessly compressed, reducing the storage required to save the image, and losslessly restored by decompression to obtain a high-definition image.
  • image transmission may be involved.
  • when users use communication software to communicate, they can transmit images through wired or wireless networks.
  • the encoding method provided by this application can be used to losslessly compress the image to obtain the compressed encoded data, which is then transmitted; after receiving the encoded data, the receiving end can decode it through the decoding method provided by this application to obtain the restored image.
  • Scenario 3: the server saves a large number of images.
  • the input image can be an image to be compressed; the autoregressive model can use the values of pixels in the input image other than the current pixel to predict the pixel value of the current pixel, obtaining the predicted distribution of each pixel, that is, the first image.
  • the input image may include a variety of images, and the sources of the input images may be different depending on the scene.
  • the input image may be a photographed image, a received image, etc.
  • the pixel values of already-predicted pixels can be used for prediction, so that in the subsequent decoding process, pixels on the same connection line do not need to wait for other pixels to be decoded before they can be decoded; pixels on the same connection line can be decoded in parallel, improving the decoding efficiency of the input image.
  • the same connection can be the same row, the same column, the same diagonal, etc., which can be determined according to the actual application scenario.
  • the residual value between each pixel point in the first image and the corresponding pixel point in the input image can be calculated to obtain the first residual image.
  • the resolution of the first image and the input image is usually the same, that is, the pixels in the first image and the input image correspond one to one; therefore, the residual value between each pair of corresponding pixels can be calculated, and the resulting residual values form an image, that is, the first residual image.
  • when calculating the residual, the residual value is usually an integer in the range [-255, 255].
  • the residual value can be converted to a low-precision numerical type, such as uint8, thereby mapping the value into [0, 255]; by setting an offset, the residual values of the pixels are distributed near 128, which makes the data more concentrated, so the residual distribution between the input image and the autoregressive model output can be expressed with less data.
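
A minimal sketch of such a residual mapping follows, assuming a modulo-256 wrap-around and an offset of 128; the exact offset and wrap convention used in the application may differ.

```python
import numpy as np

def residual_to_uint8(image, prediction, offset=128):
    """Map signed residuals in [-255, 255] to uint8 values centered near `offset`.

    The modulo-256 wrap keeps the mapping invertible given the prediction,
    and the offset concentrates typical small residuals around 128.
    """
    residual = image.astype(np.int16) - prediction.astype(np.int16)
    return ((residual + offset) % 256).astype(np.uint8)

def reconstruct(prediction, residual_u8, offset=128):
    # Inverse mapping used at the decoding end: pixel values lie in [0, 255],
    # so the modulo-256 wrap is undone exactly.
    return ((prediction.astype(np.int16) + residual_u8.astype(np.int16) - offset) % 256).astype(np.uint8)
```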
  • the input image can also be used as the input of the autoencoding model to output the corresponding latent variable and first residual distribution.
  • the latent variable may include features extracted from the input image, and the first residual distribution may include the residual values predicted by the autoencoding model between each pixel of the input image and the corresponding pixel in the first residual image.
  • the autoencoding model may include an encoding model and a decoding model.
  • the encoding model may be used to extract features from the input image
  • the decoding model may be used to predict the residual between the input image and the image output by the autoregressive model. That is, features can be extracted from the input image through the encoding model to obtain latent variables used to represent important features of the input image.
  • the latent variables are used as input to the decoding model to output the first residual distribution.
  • step 601 may be executed first, step 603 may be executed first, or step 601 and step 603 may be executed simultaneously. The details may be adjusted according to the actual application scenario.
  • the first residual image and the first residual distribution may be encoded to obtain residual encoded data.
  • when encoding the first residual image and the first residual distribution, semi-dynamic entropy coding can be used, that is, a limited set of probability distributions is used for encoding to obtain the encoded data of the residual image, namely the residual encoded data.
  • the semi-dynamic entropy encoder is used to perform entropy encoding using a first preset type of encoding operation.
  • the first preset type of encoding operation includes addition, subtraction or bit operation
  • the semi-dynamic entropy encoder does not include a second preset type of encoding operation; the second preset type includes at least one of multiplication, division, or remainder operations, which take a long time, so excluding them improves encoding efficiency.
  • a limited number of probability distributions can be used for encoding to obtain the encoding of the residual image.
  • decompressing a character requires more instructions; operations such as division and remainder are time-consuming, each taking dozens of times as long as an addition.
  • restricting the encoder to lightweight operations therefore achieves efficient encoding and improves encoding efficiency.
  • the semi-dynamic entropy encoder may be obtained by converting a dynamic entropy encoder.
  • the operations of the dynamic entropy encoder can be approximated, for example by replacing them with approximate operations that reduce or remove multiplication, division, and remainder operations; the remaining operations can then be transformed, converting all operations whose time cost exceeds a certain threshold (such as the remaining remainder, multiplication, and division operations) into table accesses and lightweight operations such as addition, subtraction, and bit operations, thereby obtaining the semi-dynamic entropy encoder provided by this application.
  • the semi-dynamic entropy encoder can be an entropy encoder obtained by replacing or converting some operations in the dynamic entropy encoder.
  • simple and efficient operations, such as addition, subtraction, and bit operations, can then be used to achieve efficient encoding.
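
As a generic illustration of why such replacements are cheap (and not the application's exact transformation): when the probability scale is a power of two, 2^M, division and remainder by that scale reduce to a shift and a mask.

```python
M = 16                  # example number of probability bits
SCALE = 1 << M          # total of the quantized frequency table

def split_state(x: int):
    # Equivalent to (x // SCALE, x % SCALE), but uses only a shift and a mask,
    # which are the kinds of lightweight operations a semi-dynamic coder keeps.
    return x >> M, x & (SCALE - 1)
```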
  • the latent variable can include important features extracted from the input image, so when performing image compression, the extracted important features can be encoded to obtain the corresponding encoded data, which facilitates subsequent image restoration and obtaining a lossless image.
  • static entropy coding can be used.
  • the latent variable is taken as input to the static entropy encoder, which outputs an encoded bitstream of the latent variable.
  • the latent variable encoded data and residual encoded data can be used at the decoder to perform lossless image restoration, thereby achieving lossless compression and restoration of the image.
  • autoencoding models usually have poor fitting capabilities and require a deeper network to achieve a better compression rate.
  • this application combines the output results of the autoregressive model, thereby reducing the size of the autoencoding model. Therefore, in this application, the output results of the autoregressive model and the autoencoding model are combined for coding, which can control both the autoencoding and the autoregressive models to a very small size and avoid the long inference time caused by the large network of the autoencoding model. problem to achieve efficient image compression.
  • the entire process can be implemented based on the AI lossless compression of the AI chip, including the AI model and entropy coding, which avoids the transmission problem between the system memory and the AI chip memory and improves the coding efficiency.
  • the input image 701 is obtained.
  • the input image 701 may include an image collected by itself or a received image.
  • the input image may include an image collected by the terminal, or may be an image received by the terminal from other servers or terminals.
  • the input image 701 is used as the input of the autoregressive model 702, and a predicted image 703 is output.
  • the autoregressive model can be used to predict the pixel probability distribution of each pixel using the adjacent pixels of each pixel to obtain the predicted image 703, which is the aforementioned first image.
  • the autoregressive model can use the pixel values of adjacent pixels to predict the pixel value of the current pixel.
  • the pixel values of adjacent pixels can be used for prediction in parallel.
  • taking a specific autoregressive model as an example, as shown in Figure 8, given an m×n image and a hyperparameter h (0 < h ≤ n), if for any pixel (i, j), all points (i′, j′) used by the autoregressive model to predict (i, j) satisfy h×i′ + j′ < h×i + j, then the image can be processed in n + (m − 1)×h parallel steps.
  • the pixel probability distribution of a point is the probability of the pixel taking each possible pixel value.
  • the pixel values of multiple pixels on the left can be selected in units of 2 as receptive fields to predict the pixel probability distribution of the current pixel.
  • pixels on the same diagonal can be decompressed in parallel.
  • the prediction order for each pixel can be shown in Figure 9, where the smaller the number, the higher the priority of the prediction order, and pixels with the same number are predicted at the same time. Therefore, pixels on the same diagonal can be predicted in parallel to improve the prediction efficiency of the autoregressive model.
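
The diagonal scheduling can be illustrated with the small sketch below, which groups pixel coordinates by the key h·i + j so that all pixels in a group can be predicted, and later decoded, in the same step; this is only an illustration of the scheduling rule, not the application's implementation.

```python
import numpy as np

def parallel_groups(m: int, n: int, h: int = 1):
    """Group pixel coordinates of an m x n image by the key h*i + j.

    Pixels sharing the same key can be processed in the same step, giving
    n + (m - 1) * h sequential steps in total; h = 1 corresponds to plain
    anti-diagonals.
    """
    groups = {}
    for i in range(m):
        for j in range(n):
            groups.setdefault(h * i + j, []).append((i, j))
    return [groups[k] for k in sorted(groups)]

steps = parallel_groups(4, 4, h=1)
assert len(steps) == 4 + (4 - 1) * 1   # n + (m - 1) * h steps
```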
  • image residual 704 is the aforementioned first residual image.
  • for example, given the original image x, that is, the input image, the autoregressive model is used to predict the original image and obtain the predicted (reconstructed) image.
  • the image residual between each pixel of the reconstructed image and the original image can be calculated
  • the difference between the corresponding pixels of the input image and the predicted image can be calculated to obtain the residual value between each pixel to form a residual image.
  • when calculating the residual, the residual value is an integer in the range [-255, 255] and can be converted to a low-precision numerical type, such as uint8, thereby mapping the value into [0, 255]; an offset can be set so that the residual values of the pixels are distributed around 128, which makes the data more concentrated, and the residual distribution between the input image and the autoregressive model output can be expressed with less data.
  • the residual distribution may be modeled, for example, as a Gaussian distribution or a logistic distribution.
  • the input image is also input to the autoencoding model 705, and the prediction residual 707 and the latent variable 706 are output.
  • the original image x can be input to the autoencoding model, and the autoencoding model is used to estimate the probability distribution of the residual r.
  • the autoencoding model can include an encoding model (encoder) and a decoding model (decoder).
  • the input image is used as the input of the encoding model.
  • important features can be extracted from the input image to obtain the latent variable 706, and the latent variable can then be used as the input of the decoding model to output the prediction residual 707.
  • the autoencoding model can be a pre-trained model; specifically, it can use an autoencoder (AutoEncoder, AE), a variational autoencoder (Variational AutoEncoder, VAE), a VQ-VAE (Vector Quantised-Variational AutoEncoder), or the like, which can be adjusted according to actual application scenarios, and this application does not limit this.
  • latent variable 706 may be encoded to obtain latent variable encoding 708.
  • the latent variables can be encoded using static entropy coding, that is, a tree structure is used so that high-probability data is represented with shorter bit strings and low-probability data with longer bit strings.
  • the tree structure can be shown in Figure 11, and its corresponding bits can be expressed as shown in Table 1.
  • the data a1a2a1a4 is encoded as 0100110.
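
Table 1 itself is not reproduced in this excerpt; the snippet below uses one prefix-code assignment that happens to be consistent with this example (a1→0, a2→10, a4→110, a3→111), purely for illustration — the actual codes in Table 1 may differ.

```python
# Hypothetical prefix codes consistent with "a1 a2 a1 a4 -> 0100110";
# the real Table 1 assignment may differ.
CODES = {"a1": "0", "a2": "10", "a4": "110", "a3": "111"}

def static_encode(symbols):
    return "".join(CODES[s] for s in symbols)

assert static_encode(["a1", "a2", "a1", "a4"]) == "0100110"
```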
  • image residual 704 and the prediction residual 707 can also be encoded to obtain residual encoding 709.
  • semi-dynamic entropy coding can be performed on the image residual 704 and the prediction residual 707 to obtain residual coding.
  • dynamic coding uses a state (usually a large integer) to represent data, and uses the probability information of the data to change the state value.
  • the final coded value is a 0 or 1 representation of the state.
  • an M value must first be set, which represents the number of bits used to represent a probability. For a character a_i, its corresponding PMF_i is proportional to its probability, and the PMF values sum to 2^M; its corresponding CDF_i is the accumulation of all previous PMF values, that is, CDF_i = PMF_1 + PMF_2 + ... + PMF_{i-1}.
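
A simple way to obtain such quantized PMF/CDF tables is sketched below; the rounding scheme shown is a simple illustrative choice, not necessarily the one used by the application.

```python
import numpy as np

def quantize_pmf(probs, M=16):
    """Quantize a probability vector to integer PMF values that sum to 2**M.

    Simple illustrative rounding: every symbol keeps at least frequency 1 and
    the largest entry absorbs the rounding error so the total is exact.
    """
    scale = 1 << M
    pmf = np.maximum(1, np.round(np.asarray(probs) * scale).astype(np.int64))
    pmf[np.argmax(pmf)] += scale - pmf.sum()          # force the exact total
    cdf = np.concatenate(([0], np.cumsum(pmf)[:-1]))  # CDF_i = sum of previous PMFs
    return pmf, cdf

pmf, cdf = quantize_pmf([0.5, 0.25, 0.125, 0.125], M=16)
assert pmf.sum() == 1 << 16
```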
  • Dynamic entropy coding can also be used as static entropy coding. When the value in the table is a fixed value, it is static entropy coding; when the tables of different symbols are not exactly the same, dynamic entropy coding is needed.
  • the speed bottlenecks in dynamic entropy coding include symbol search and arithmetic operations during decompression: division and remainder operations are the most time-consuming, followed by multiplication. Therefore, in order to reduce the efficiency loss caused by the unlimited number of probability distributions in dynamic entropy coding, this application provides semi-dynamic entropy coding.
  • approximate processing is first performed, such as replacing operations such as multiplication, division, and remainder in dynamic entropy coding with approximate lightweight operations such as addition, subtraction, and bitwise operations.
  • the state value S is truncated and approximated, similarly to existing schemes, but the differences include:
  • this solution replaces the operation with an approximate solution based on a loop plus bit operations, to further reduce the storage space required for tabulation.
  • the loop in this calculation takes a long time, so after this processing, the time consumption will usually exceed that of the original rANS. However, in subsequent processing, the number of loops will be tabulated to achieve efficient compression and decompression.
  • a table is used to precalculate and store the number of cycles (that is, the number of state right shifts), and the difference between the next state and this state under this distribution and symbol.
  • this solution stores an intermediate result Δ.
  • the number of loops can then be calculated as (Δ + S) >> M.
  • the encoding method provided by this application can reduce the memory space required to store the table.
  • the semi-dynamic entropy coding method provided by this application stores the difference between the two states after the state is shifted to the right; this difference can be stored as an unsigned number, halving the memory space required for the same bit width.
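
For reference, a textbook rANS-style step is sketched below; the floor-division, modulo, and multiplication it contains are exactly the operations that the semi-dynamic scheme described above replaces with precomputed tables, shifts, and additions. This is a generic sketch of the underlying coder family, not the application's exact semi-dynamic encoder.

```python
def rans_encode_step(state: int, pmf_i: int, cdf_i: int, M: int) -> int:
    """One textbook rANS step: state' = floor(state/pmf)*2^M + cdf + state%pmf.

    The floor-division and modulo are the costly operations; the semi-dynamic
    coder precomputes their effect (shift counts and state differences) so
    that only additions, shifts, and table lookups remain at run time.
    """
    return (state // pmf_i << M) + cdf_i + (state % pmf_i)

def rans_decode_step(state: int, pmf, cdf, M: int):
    """Inverse step: recover the symbol from the low M bits of the state."""
    low = state & ((1 << M) - 1)
    # Symbol search: find i with cdf[i] <= low < cdf[i] + pmf[i].
    i = max(k for k in range(len(cdf)) if cdf[k] <= low)
    prev_state = pmf[i] * (state >> M) + low - cdf[i]
    return i, prev_state
```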
  • subsequent operations can be performed. For example, the residual code 709 and the latent variable code 708 are saved, or the residual code 709 and the latent variable code 708 are transmitted to the receiving end. The details can be determined according to the actual application scenario.
  • the method provided by the embodiment of the present application can be applied to image lossless compression to achieve efficient image lossless compression. It also provides an efficient semi-dynamic entropy encoder, allowing the model inference and encoding processes to run on the AI chip, reducing the transmission between system memory and AI chip memory, and achieving high-bandwidth compression and decompression.
  • the decoder can read the latent variable encoded data and residual encoded data locally, or receive the latent variable encoded data and residual encoded data sent by the encoding end.
  • the sources of the latent variable encoded data and the residual encoded data can be determined according to the actual application scenario and are not limited by this application.
  • the latent variable encoding data can be obtained by encoding the features extracted from the input image at the encoding end.
  • the residual coding data may be obtained by encoding the aforementioned image residual and prediction residual at the encoding end.
  • the image residual may include the residual between the input image at the encoding end and the image output by the autoregressive model.
  • the latent variable coded data and residual coded data can be referred to the relevant introductions in Figures 6 to 11 mentioned above, and will not be described again here.
  • the method of decoding the latent variable encoded data can correspond to the encoding end.
  • the static entropy encoder can be used for decoding during decoding.
  • the latent variable encoded data is used as the input of the static entropy encoder to output the latent variable.
  • the latent variables may include features extracted from the input image.
  • the latent variables represent features of the image to be decompressed.
  • after decoding the latent variable encoded data to obtain the latent variable, the latent variable can be used as the input of the autoencoding model, and the corresponding second residual distribution is output, that is, the distribution corresponding to the first residual distribution at the encoding end; it can be understood as representing the residual distribution between the image output by the autoregressive model at the encoding end and the input image.
  • the autoencoding model can include a decoding model, and the predicted residual image can be output by using the latent variable as the input of the decoding model.
  • the decoding model may be a trained model and is used to output a residual image corresponding to the input image.
  • the predicted residual image may be understood as the residual values between the image predicted by the autoregressive model and the input image.
  • both the encoding end and the decoding end deploy autoregressive models and autoencoding models, and the autoregressive model on the encoding end is the same as that on the decoding end. If the encoding end and the decoding end are deployed in the same device, the autoencoding model on the encoding end is the same as that on the decoding end. If they are deployed in different devices, the same autoencoding model can be deployed on both ends, or a complete autoencoding model can be deployed on the encoding end while only the decoding model is deployed on the decoding end.
  • the decoding model in the autoencoding model can be adjusted according to actual application scenarios, and this application does not limit this.
  • the second residual distribution and the residual coded data can be combined for decoding to obtain the second residual image.
  • the decoding end can also decode based on semi-dynamic entropy coding and output the second residual image, that is, the image corresponding to the first residual image on the encoding end.
  • the semi-dynamic entropy encoder is used to perform entropy encoding using a first preset type of encoding operation.
  • the first preset type of encoding operation includes addition, subtraction or bit operation, and the semi-dynamic entropy encoder does not include the second preset type.
  • the second preset type includes at least one of multiplication, division, or remainder operations; that is, the semi-dynamic entropy encoder excludes these long-running operations and may include only simple operations such as addition, subtraction, and bit operations, allowing efficient encoding.
  • for the semi-dynamic entropy encoder, reference can be made to the related descriptions in the aforementioned Figures 6 to 11, which will not be repeated here.
  • the decoding process can then be performed as follows:
  • performing the inverse operation yields the second residual image, which is equivalent to obtaining the residual between the first image output by the autoregressive model at the encoding end and the input image, that is, the first residual image.
  • the second residual image can then be used as the input for backpropagation through the autoregressive model, and the decompressed image can be deduced, that is, lossless recovery of the input image at the encoding end is achieved.
  • if the autoregressive model at the encoding end uses the pixel values of already-predicted pixels to predict the values of pixels on the same connection line, then when the decoding end performs the decoding operation, the values of pixels on the same connection line can be decoded in parallel to achieve efficient decoding.
  • the same connection can be the same row, the same column, the same diagonal, etc., which can be determined according to the actual application scenario.
  • the autoencoding model usually has poor fitting ability and requires a deeper network to achieve a better compression rate.
  • the present application combines the output results of the autoregressive model, thereby reducing the required size of the autoencoding model; therefore, in this application, the autoregressive model and the autoencoding model are combined for decoding, both models can be kept very small, the problem of overly long inference time caused by an excessively large autoencoding network is avoided, and efficient image decompression is enabled.
  • the entire process can be implemented based on the AI lossless compression of the AI chip, including the AI model and entropy coding, which avoids the transmission problem between the system memory and the AI chip memory and improves the coding efficiency.
  • the latent variable encoding 1301 and the residual encoding 1302 can be read locally or received from the encoding end, and can be adjusted according to the actual application scenario.
  • the latent variable encoding 1301 and the residual encoding may be the latent variable encoding 708 and the residual encoding 709 mentioned in FIG. 7 .
  • the latent variable encoding 1301 is input to the static entropy encoder 1303, and the latent variable 1304 is output.
  • the bits corresponding to each probability in entropy coding can be as shown in the aforementioned Table 1.
  • the probability corresponding to each character can be determined based on the corresponding relationship, thereby outputting the latent variable. It can be understood as decompressing important features in the image.
  • the latent variable 1304 is then used as the input of the decoding model in the autoencoding model 1305, and the prediction residual 1306 is output.
  • the decoding model is similar to the decoding model in Figure 7 and will not be described again here.
  • the prediction residual 1306 is similar to the aforementioned prediction residual 707 and will not be described again here.
  • both the residual encoding 1302 and the prediction residual 1306 are used as inputs to the semi-dynamic entropy encoder, and an image residual 1308 is output.
  • the image residual 1308 is similar to the aforementioned image residual 704 and will not be described again here.
  • the decoding process of semi-dynamic entropy coding can be understood as the inverse operation of the aforementioned semi-dynamic entropy coding, that is, when the prediction residual and residual coding are known, the image residual is inversely inferred.
  • after the image residual 1308 is obtained, it can be used as the input for backpropagation through the autoregressive model 1309 to infer the decompressed image 1310.
  • the autoregressive model 1309 is a trained model, which is the same as the aforementioned autoregressive model 702. It can be understood that when the image residual is known, the input image 701 is deduced in reverse.
  • if, when outputting the prediction residual through the autoregressive model, the encoding end uses the pixel values of already-predicted pixels to predict in parallel the pixel values of pixels on the same line,
  • then, when reversing the autoregressive model, the pixel values of pixels on the same line can be decoded in parallel.
  • the decompression sequence thus includes: decoding the latent variable, predicting the residual with the decoding model, decoding the image residual with the semi-dynamic entropy decoder, and reconstructing the image with the autoregressive model, as sketched below.
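
A high-level sketch of that sequence follows; all four callables (`static_entropy_decode`, `decoder_model`, `semi_dynamic_decode`, `autoregressive_reconstruct`) are hypothetical placeholders standing in for the components described above.

```python
def decompress(latent_code, residual_code,
               static_entropy_decode, decoder_model,
               semi_dynamic_decode, autoregressive_reconstruct):
    """High-level decompression flow; all callables are placeholders."""
    latent = static_entropy_decode(latent_code)           # latent variable 1304
    pred_residual = decoder_model(latent)                  # prediction residual 1306
    image_residual = semi_dynamic_decode(residual_code,    # image residual 1308
                                         pred_residual)
    return autoregressive_reconstruct(image_residual)      # decompressed image 1310
```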
  • the autoregressive model implements a lightweight design and only contains 12 parameters. For a three-channel image, each channel only needs 4 parameters for prediction.
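
Which four context pixels each channel uses is not specified in this excerpt; the sketch below assumes, purely for illustration, a linear predictor over the left, upper, and upper-left neighbours plus a bias, which gives 4 parameters per channel and 12 in total for a three-channel image.

```python
import numpy as np

def predict_channel(channel, w_left, w_up, w_upleft, bias):
    """Linear prediction of one channel with 4 parameters; the context
    (left, upper, upper-left neighbours plus a bias) is an assumption.
    Three channels x 4 parameters gives the 12 parameters mentioned above."""
    h, w = channel.shape
    pred = np.zeros((h, w), dtype=np.float32)
    for i in range(h):
        for j in range(w):
            left   = channel[i, j - 1] if j > 0 else 0
            up     = channel[i - 1, j] if i > 0 else 0
            upleft = channel[i - 1, j - 1] if (i > 0 and j > 0) else 0
            pred[i, j] = w_left * left + w_up * up + w_upleft * upleft + bias
    return np.clip(np.round(pred), 0, 255)
```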
  • the autoencoder model uses a vector-quantized autoencoder; it uses a vector codebook to reduce the space of latent variables and sets the codebook size to 256, that is, the value space of the latent variables in the autoencoder is limited to 256 integers.
  • the encoder and decoder of the autoencoder both use four residual convolution blocks, and the number of channels for each layer of features is 32.
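
A minimal PyTorch-style sketch consistent with the stated configuration (four residual convolution blocks on each side, 32 feature channels, a 256-entry codebook) is shown below; the kernel sizes, the 2x down/upsampling, the output head, and the quantization details are assumptions made only for illustration.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch: int = 32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

class VQAutoencoder(nn.Module):
    """Sketch: 4 residual conv blocks per side, 32 channels, 256-entry codebook."""

    def __init__(self, ch: int = 32, codebook_size: int = 256):
        super().__init__()
        self.enc_in = nn.Conv2d(3, ch, 4, stride=2, padding=1)          # assumed 2x downsampling
        self.encoder = nn.Sequential(*[ResBlock(ch) for _ in range(4)])
        self.codebook = nn.Embedding(codebook_size, ch)                  # latent values limited to 256 integers
        self.decoder = nn.Sequential(*[ResBlock(ch) for _ in range(4)])
        self.dec_out = nn.Sequential(
            nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 3, 3, padding=1))                              # assumed residual-prediction head

    def quantize(self, z):
        # Nearest codebook entry per spatial position (straight-through trick omitted).
        b, c, h, w = z.shape
        flat = z.permute(0, 2, 3, 1).reshape(-1, c)
        idx = torch.cdist(flat, self.codebook.weight).argmin(dim=1)      # integer latent variable
        zq = self.codebook(idx).view(b, h, w, c).permute(0, 3, 1, 2)
        return zq, idx.view(b, h, w)

    def forward(self, x):
        z = self.encoder(self.enc_in(x))
        zq, idx = self.quantize(z)
        pred_residual = self.dec_out(self.decoder(zq))                   # prediction residual (707)
        return pred_residual, idx
```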
  • the model training process and testing process are as follows:
  • Training: train on the training set of a single data set to obtain the parameters of the autoregressive model and the autoencoding model, as well as the statistics of the latent variables, which are used to compress the latent variables.
  • Decompression: using the method provided by this application, the residual encodings and latent variables of all images are used as input to the decompression process at once, and the original images of all the images are output in parallel.
  • PILC (Practical Image Lossless Compression).
  • this technical invention improves the throughput by 14 times while maintaining the compression rate; in terms of both compression rate and throughput, it also outperforms traditional methods such as PNG, WebP, and FLIF.
  • the method provided by this application combines the autoregressive model and the autoencoding model, which greatly reduces the model size compared to using the autoencoding model alone.
  • the autoregressive model provided by this application can realize parallel encoding and parallel decompression, efficient encoding and decoding, and efficient image compression and decompression.
  • the process of the method provided in this application can be run on the AI chip, which avoids the transmission of information between the system memory and the AI chip memory, further improving the encoding and decoding efficiency.
  • this embodiment is designed as follows.
  • Model training: in the model training stage, large high-definition data sets such as OpenImage and ImageNet64 are used for model training to obtain the parameters of the autoregressive model and the autoencoding model.
  • the decompression speed is increased by 7.9 times compared with the non-parallel solution.
  • the parallel scheme has a restriction on the receptive field, but this receptive field has a limited impact on the compression ratio.
  • the coding speed of the semi-dynamic entropy coding (ANS-AI) proposed in this application is increased by 20 times, the decoding speed is increased by 100 times, and the BPD loss is less than 0.55 and 0.17.
  • this semi-dynamic entropy coding can run on the AI chip; on a single V100, the peak speed can reach 1 GB/s.
  • compared with dynamic entropy coding, semi-dynamic entropy coding reduces the number of distribution types required from 2048 to 8, the memory size required for preprocessing is reduced to 1/256 of the original, and the BPD loss is less than 0.03, which reduces the computing resources required for entropy coding and improves coding efficiency.
  • the image coding device includes:
  • the autoregressive module 1401 is used to take the input image as the input of the autoregressive model and output the first image;
  • the residual calculation module 1402 is used to obtain the residual between the first image and the input image to obtain the first residual image
  • the autoencoding module 1403 is used to use the input image as the input of the autoencoding model, and output latent variables and a first residual distribution.
  • the latent variables include features extracted from the input image, and the first residual distribution output by the autoencoding model is used to represent the residual value between each pixel in the input image and the corresponding pixel in the first residual image;
  • the residual encoding module 1404 is used to encode the first residual image and the first residual distribution to obtain residual encoded data;
  • the latent variable encoding module 1405 is used to encode latent variables to obtain latent variable encoded data.
  • the latent variable encoded data and residual encoded data are used to obtain the input image after decompression.
  • the residual encoding module 1404 is specifically configured to use the first residual image and the first residual distribution as inputs of a semi-dynamic entropy encoder, and output residual encoding data.
  • the semi-dynamic entropy encoder is configured to perform entropy encoding using a first preset type of encoding operation; the first preset type of encoding operation includes addition, subtraction, or bit operations, and the semi-dynamic entropy encoder does not include a second preset type of encoding operation.
  • the second preset type includes at least one of multiplication, division or remainder operations, that is, the semi-dynamic entropy encoder does not include long-time operations such as multiplication, division or remainder operations.
  • the semi-dynamic entropy encoder may be obtained by converting a dynamic entropy encoder.
  • the operations of the dynamic entropy encoder can be approximated, for example by replacing them with approximate operations that reduce or remove multiplication, division, and remainder operations; the remaining operations can then be transformed, converting all operations whose time cost exceeds a certain threshold (such as the remaining remainder, multiplication, and division operations) into table accesses and lightweight operations such as addition, subtraction, and bit operations, thereby obtaining the semi-dynamic entropy encoder provided by this application.
  • the entropy encoder can be an entropy encoder obtained by replacing or converting some operations in the dynamic entropy encoder.
  • simple and efficient operations, such as addition, subtraction, and bit operations, can be used to achieve efficient encoding.
  • the latent variable encoding module 1405 is specifically configured to use latent variables as inputs to the static entropy encoder to obtain latent variable encoded data.
  • the auto-encoding model includes an encoding model and a decoding model.
  • the auto-encoding module 1403 is specifically used to: use the input image as the input of the encoding model and output latent variables, where the encoding model is used to extract features from the input image; and use the latent variables as the input of the decoding model to obtain the first residual distribution, where the decoding model is used to predict the residual values corresponding to the pixels of the input image.
  • the autoregressive model is used to predict the values of pixels on the same connection line using the pixel values of already-predicted pixels.
  • the image decompression device includes:
  • the transceiver module 1501 is used to obtain latent variable coded data and residual coded data.
  • the latent variable coded data is obtained by encoding the features extracted from the input image by the encoding end.
  • the residual coded data is obtained by encoding the residual between the first image output by the autoregressive model and the input image;
  • the latent variable decoding module 1502 is used to decode the latent variable encoded data to obtain latent variables.
  • the latent variables include features extracted by the encoding end from the input image;
  • the autoencoding module 1503 is used to use latent variables as the input of the autoencoding model and output the second residual distribution;
  • the residual decoding module 1504 is used to decode the second residual distribution and the residual coded data to obtain the second residual image
  • the autoregressive module 1505 is configured to use the second residual image as the input of backpropagation of the autoregressive model and output the decompressed image.
  • the latent variable decoding module 1502 is specifically configured to use the latent variable encoded data as the input of the static entropy encoder and output the latent variable.
  • the residual decoding module 1504 is specifically configured to use the second residual distribution and the residual coded data as inputs of the semi-dynamic entropy encoder, and output the second residual image.
  • the semi-dynamic entropy encoder is configured to perform entropy encoding using a first preset type of encoding operation; the first preset type of encoding operation includes addition, subtraction, or bit operations, and the semi-dynamic entropy encoder does not include a second preset type of encoding operation.
  • the second preset type includes at least one of multiplication, division or remainder operations, that is, the semi-dynamic entropy encoder does not include long-time operations such as multiplication, division or remainder operations.
  • the semi-dynamic entropy encoder may be obtained by converting a dynamic entropy encoder.
  • the operations of the dynamic entropy encoder can be approximated, for example by replacing them with approximate operations that reduce or remove multiplication, division, and remainder operations; the remaining operations can then be transformed, converting all operations whose time cost exceeds a certain threshold (such as the remaining remainder, multiplication, and division operations) into table accesses and lightweight operations such as addition, subtraction, and bit operations, thereby obtaining the semi-dynamic entropy encoder provided by this application.
  • the semi-dynamic entropy encoder can be an entropy encoder obtained by replacing or converting some operations in the dynamic entropy encoder.
  • simple and efficient operations, such as addition, subtraction, and bit operations, can be used to achieve efficient encoding.
  • the autoregressive module 1505 is specifically configured to decode pixels on the same connection line in the second residual image in parallel through an autoregressive model to obtain a decompressed image.
  • Figure 16 is a schematic structural diagram of another image encoding device provided by this application, as follows.
  • the image encoding device may include a processor 1601 and a memory 1602.
  • the processor 1601 and the memory 1602 are interconnected through lines.
  • the memory 1602 stores program instructions and data.
  • the memory 1602 stores program instructions and data corresponding to the steps in the aforementioned FIGS. 6-11.
  • the processor 1601 is configured to execute the method steps performed by the image encoding device shown in any of the embodiments shown in FIGS. 6 to 11 .
  • the image encoding device may also include a transceiver 1603 for receiving or sending data.
  • Embodiments of the present application also provide a computer-readable storage medium.
  • the computer-readable storage medium stores a program which, when run on a computer, causes the computer to execute the steps described in the embodiments shown in Figures 6 to 11.
  • the illustrated embodiments describe steps in a method.
  • the aforementioned image encoding device shown in FIG. 16 is a chip.
  • Embodiments of the present application also provide an image encoding device.
  • the image encoding device may also be called a digital processing chip or chip.
  • the chip includes a processing unit and a communication interface.
  • the processing unit obtains program instructions through the communication interface, and the program instructions are executed by the processing unit.
  • the processing unit is configured to perform the method steps performed by the image encoding device shown in any of the embodiments in FIGS. 6 to 11 .
  • An embodiment of the present application also provides a digital processing chip.
  • the digital processing chip integrates the circuit and one or more interfaces for realizing the above-mentioned processor 1601, or the functions of the processor 1601.
  • the digital processing chip can complete the method steps of any one or more of the foregoing embodiments.
  • the digital processing chip does not have an integrated memory, it can be connected to an external memory through a communication interface.
  • the digital processing chip implements the actions performed by the image encoding device in the above embodiment according to the program code stored in the external memory.
  • the image encoding device may be a chip.
  • the chip includes: a processing unit and a communication unit.
  • the processing unit may be, for example, a processor.
  • the communication unit may be, for example, an input/output interface, a pin, or a circuit.
  • the processing unit can execute computer execution instructions stored in the storage unit, so that the chip in the server executes the image encoding method described in the embodiments shown in FIGS. 6-11.
  • the storage unit is a storage unit within the chip, such as a register, cache, etc.
  • the storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM), etc.
  • Figure 17 is a schematic structural diagram of another image decompression device provided by this application, as described below.
  • the image decompression device may include a processor 1701 and a memory 1702.
  • the processor 1701 and the memory 1702 are interconnected through lines.
  • the memory 1702 stores program instructions and data.
  • the memory 1702 stores program instructions and data corresponding to the steps in the aforementioned FIGS. 12-13.
  • the processor 1701 is configured to execute the method steps performed by the image decompression device shown in any of the embodiments shown in FIGS. 12 and 13 .
  • the image decompression device may also include a transceiver 1703 for receiving or sending data.
  • Embodiments of the present application also provide a computer-readable storage medium, which stores a program that, when run on a computer, causes the computer to execute the steps in the method described in the embodiments shown in Figures 12 and 13.
  • the aforementioned image decompression device shown in Figure 17 is a chip.
  • Embodiments of the present application also provide an image decompression device.
  • the image decompression device may also be called a digital processing chip or chip.
  • the chip includes a processing unit and a communication interface.
  • the processing unit obtains program instructions through the communication interface, and the program instructions are executed by the processing unit.
  • the processing unit is used to execute the method steps executed by the image decompression device shown in any of the embodiments in FIGS. 12 and 13 .
  • An embodiment of the present application also provides a digital processing chip.
  • the digital processing chip integrates the circuit and one or more interfaces for realizing the above-mentioned processor 1701, or the functions of the processor 1701.
  • the digital processing chip can complete the method steps of any one or more of the foregoing embodiments.
  • the digital processing chip does not have an integrated memory, it can be connected to an external memory through a communication interface.
  • the digital processing chip implements the actions performed by the image decompression device in the above embodiment according to the program code stored in the external memory.
  • the image decompression device may be a chip.
  • the chip includes: a processing unit and a communication unit.
  • the processing unit may be, for example, a processor.
  • the communication unit may be, for example, an input/output interface, a pin, or a circuit.
  • the processing unit can execute computer execution instructions stored in the storage unit, so that the chip in the server executes the image decompression method described in the embodiments shown in Figures 12 and 13.
  • the storage unit is a storage unit within the chip, such as a register, cache, etc.
  • the storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM), etc.
  • An embodiment of the present application also provides a computer program product that, when run on a computer, causes the computer to perform the steps performed by the image encoding device or the image decompression device in the methods described in the embodiments shown in Figures 6 to 13.
  • This application also provides an image processing system, which includes an image encoding device and an image decompression device.
  • the image encoding device is used to execute the method steps corresponding to the aforementioned Figures 6-11.
  • the image decompression device is used to execute the method steps corresponding to the aforementioned Figures 12 and 13.
  • the aforementioned processing unit or processor may be a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or a field-programmable gate array (FPGA).
  • a general-purpose processor may be a microprocessor or any conventional processor, etc.
  • Figure 18 is a schematic structural diagram of a chip provided by an embodiment of the present application.
  • the chip can be represented as a neural network processor NPU 180.
  • the NPU 180 serves as a co-processor mounted on the host CPU (Host CPU), and tasks are allocated by the Host CPU.
  • the core part of the NPU is the arithmetic circuit 1803.
  • the arithmetic circuit 1803 is controlled by the controller 1804 to extract the matrix data in the memory and perform multiplication operations.
  • the computing circuit 1803 includes multiple processing units (process engines, PEs) internally.
  • the arithmetic circuit 1803 is a two-dimensional systolic array.
  • the arithmetic circuit 1803 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition.
  • arithmetic circuit 1803 is a general-purpose matrix processor.
  • the arithmetic circuit obtains the corresponding data of matrix B from the weight memory 1802 and caches it on each PE in the arithmetic circuit.
  • the operation circuit takes matrix A data and matrix B from the input memory 1801 to perform matrix operations, and the partial result or final result of the matrix is stored in an accumulator (accumulator) 1808 .
  • the unified memory 1806 is used to store input data and output data.
  • the weight data is transferred directly to the weight memory 1802 through the direct memory access controller (DMAC) 1805.
  • Input data is also transferred to unified memory 1806 via DMAC.
  • Bus interface unit (bus interface unit, BIU) 1810 is used for interaction between the AXI bus and DMAC and instruction fetch buffer (IFB) 1809.
  • the bus interface unit (BIU) 1810 is used by the instruction fetch buffer 1809 to obtain instructions from the external memory, and is also used by the storage unit access controller 1805 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
  • DMAC is mainly used to transfer the input data in the external memory DDR to the unified memory 1806 or the weight data to the weight memory 1802 or the input data to the input memory 1801 .
  • the vector calculation unit 1807 includes multiple arithmetic processing units, and if necessary, further processes the output of the arithmetic circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison, etc. It is mainly used for non-convolutional/fully connected layer network calculations in neural networks, such as batch normalization, pixel-level summation, upsampling of feature planes, etc.
  • vector calculation unit 1807 can store the processed output vectors to unified memory 1806 .
  • the vector calculation unit 1807 can apply a linear function and/or a nonlinear function to the output of the operation circuit 1803, such as linear interpolation on the feature plane extracted by the convolution layer, or a vector of accumulated values, to generate an activation value.
  • vector calculation unit 1807 generates normalized values, pixel-wise summed values, or both.
  • the processed output vector can be used as an activation input to the arithmetic circuit 1803, such as for use in a subsequent layer in a neural network.
  • the instruction fetch buffer 1809 connected to the controller 1804 is used to store instructions used by the controller 1804;
  • the unified memory 1806, the input memory 1801, the weight memory 1802 and the fetch memory 1809 are all On-Chip memories. External memory is private to the NPU hardware architecture.
  • each layer in the recurrent neural network can be performed by the operation circuit 1803 or the vector calculation unit 1807.
  • the processor mentioned in any of the above places may be a general central processing unit, a microprocessor, an ASIC, or one or more integrated circuits used to control program execution of the methods in Figures 6 to 13.
  • the device embodiments described above are only illustrative.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physically separate.
  • the physical unit can be located in one place or distributed across multiple network units; some or all of the modules can be selected according to actual needs to achieve the purpose of this embodiment.
  • the connection relationship between modules indicates that there are communication connections between them, which can be specifically implemented as one or more communication buses or signal lines.
  • the present application can be implemented by software plus the necessary general-purpose hardware; it can, of course, also be implemented by dedicated hardware, including dedicated integrated circuits, dedicated CPUs, dedicated memories, special components, and the like. In general, all functions performed by computer programs can also be implemented with corresponding hardware, and the specific hardware structures used to implement the same function can be diverse, such as analog circuits, digital circuits, or special-purpose circuits. However, for this application, a software implementation is a better implementation in most cases. Based on this understanding, the technical solution of the present application, in essence or the part that contributes to the prior art, can be embodied in the form of a software product.
  • the computer software product is stored in a readable storage medium, such as a computer floppy disk, USB flash drive, mobile hard disk, read-only memory (ROM), random access memory (RAM), magnetic disk, or optical disk, and includes a number of instructions to cause a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods described in the various embodiments of this application.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wired (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (such as infrared, radio, or microwave) means.
  • the computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center integrated with one or more available media.
  • the available media may be magnetic media (eg, floppy disk, hard disk, magnetic tape), optical media (eg, DVD), or semiconductor media (eg, solid state disk (SSD)), etc.

Abstract

The present application provides an image encoding method and apparatus, and an image decompression method and apparatus, relating to computer vision in the field of artificial intelligence, and used for encoding in combination with the output of an autoregressive model and the output of an autoencoding model, reducing the sizes of the required models and improving the encoding and decoding efficiency. The image encoding method comprises: taking an input image as the input of the autoregressive model, and outputting a first image; obtaining a residual between the first image and the input image to obtain a first residual image; taking the input image as the input of the autoencoding model, and outputting a hidden variable and a first residual distribution, the hidden variable comprising features extracted from the input image, and the first residual distribution comprising a residual value corresponding to each pixel point in the input image outputted by the autoencoding model; encoding the first residual image and the first residual distribution to obtain residual encoded data; and encoding the hidden variable to obtain hidden variable encoded data, the hidden variable encoded data and the residual encoded data being decompressed to obtain the input image.

Description

An image encoding method, an image decompression method, and an apparatus
This application claims priority to the Chinese patent application with application number 202210447177.0, filed with the China Patent Office on April 26, 2022 and entitled "An image encoding method, image decompression method and device", the entire content of which is incorporated herein by reference.
Technical field
The present application relates to the field of image processing, and in particular to an image encoding method, an image decompression method, and a corresponding apparatus.
Background
Images are widely used in various fields, and the transmission or storage of images may be involved in a large number of scenarios. The higher the resolution of an image, the more storage space is consumed when saving it, the higher the bandwidth required when transmitting it, and the lower the transmission efficiency. Therefore, to facilitate the transmission or storage of images, an image can usually be compressed to reduce the number of bits it occupies, thereby reducing the storage space required to save it and the bandwidth required to transmit it.
For example, some common image compression approaches use entropy coding for image compression; commonly used entropy coding algorithms include Huffman coding, arithmetic coding, ANS coding, and so on. However, the compression rates of these entropy coding methods have already reached the optimal level, and it is difficult to further improve the compression rate. Therefore, how to improve encoding and decoding efficiency has become an urgent problem to be solved.
Summary
The present application provides an image encoding method, an image decompression method, and an apparatus, which combine the output of an autoregressive model and the output of an autoencoding model for encoding, reducing the size of the required models and improving encoding and decoding efficiency.
In view of this, in a first aspect, the present application provides an image encoding method, including: using an input image as the input of an autoregressive model and outputting a first image; obtaining the residual between the first image and the input image to obtain a first residual image; using the input image as the input of an autoencoding model and outputting a latent variable and a first residual distribution, where the latent variable includes features extracted from the input image and the first residual distribution includes residual values, predicted by the autoencoding model, that correspond to each pixel in the input image and the corresponding pixel in the first residual image; encoding the first residual image and the first residual distribution to obtain residual encoded data; and encoding the latent variable to obtain latent variable encoded data, where the latent variable encoded data and the residual encoded data are used to obtain the input image after decompression.
Therefore, in this application, the outputs of the autoregressive model and the autoencoding model are combined for encoding, so that both models can be kept very small, which avoids the excessive inference time caused by an overly large autoencoding network and achieves efficient image compression. Moreover, in the method provided by this application, the entire pipeline, including the AI models and the entropy coding, can run as AI lossless compression on an AI chip, which avoids transfers between system memory and AI-chip memory and improves encoding efficiency.
In a possible implementation, encoding the first residual image and the first residual distribution to obtain the residual encoded data includes: taking the first residual image and the first residual distribution as the input of a semi-dynamic entropy encoder and outputting the residual encoded data. The semi-dynamic entropy encoder performs entropy coding using encoding operations of a first preset type, where the first preset type includes addition, subtraction or bit operations, and the semi-dynamic entropy encoder does not include encoding operations of a second preset type, where the second preset type includes at least one of multiplication, division or modulo operations. That is, the semi-dynamic entropy encoder excludes time-consuming operations such as multiplication, division or modulo and may include only simple additions and subtractions, so that efficient encoding can be achieved.
Therefore, in the embodiments of this application, the residual image can be encoded with semi-dynamic entropy coding, that is, using a limited set of distributions. Compared with dynamic entropy coding, this removes time-consuming operations such as multiplication, division and modulo, which greatly improves encoding efficiency.
In a possible implementation, the semi-dynamic entropy encoder may be obtained by transforming a dynamic entropy encoder. Specifically, the operations of the dynamic entropy encoder can be approximated, for example by replacing them with approximate operations that reduce or remove multiplications, divisions, modulo operations and the like; the remaining operations whose time cost exceeds a certain threshold (such as the remaining modulo, multiplication and division operations) can then be transformed into table accesses and lightweight operations such as additions, subtractions and bit operations, yielding the semi-dynamic entropy encoder provided by this application. In other words, the semi-dynamic entropy encoder can be understood as an entropy encoder obtained by replacing or transforming some operations of a dynamic entropy encoder, so that entropy coding with it only uses simple, efficiently executed operations such as additions, subtractions and bit operations, thereby achieving efficient encoding.
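As a simplified illustration of this idea (not the actual semi-dynamic entropy encoder of this application), the following sketch assumes that a prefix-code table has been precomputed offline for each member of a finite set of quantized distributions; the run-time encoding loop then uses only table accesses, shifts, OR operations and additions.

```python
def encode_symbols(symbols, dist_ids, code_tables):
    """
    Table-driven encoding loop using only table lookups, additions and bit
    operations at run time (illustrative simplification).

    symbols     : sequence of symbols to encode (e.g. residual values)
    dist_ids    : for each symbol, the index of the quantized distribution it follows
    code_tables : code_tables[d][s] = (codeword, bit_length), precomputed offline
    """
    bitstream = 0
    nbits = 0
    for s, d in zip(symbols, dist_ids):
        code, length = code_tables[d][s]          # table access only
        bitstream = (bitstream << length) | code  # shift and OR
        nbits += length                           # addition
    return bitstream, nbits

# Usage with two hand-made prefix-code tables (hypothetical quantized distributions):
tables = [
    {0: (0b0, 1), 1: (0b10, 2), 2: (0b11, 2)},   # distribution 0: symbol 0 most likely
    {0: (0b11, 2), 1: (0b0, 1), 2: (0b10, 2)},   # distribution 1: symbol 1 most likely
]
bits, n = encode_symbols([0, 1, 2, 1], [0, 0, 1, 1], tables)
```

A full table-driven coder such as tANS follows the same principle: the expensive per-symbol arithmetic is moved into offline table construction, and the per-symbol work at run time reduces to lookups and bit manipulation.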
In a possible implementation, encoding the latent variable to obtain the latent variable encoded data may include: taking the latent variable as the input of a static entropy encoder to obtain the latent variable encoded data.
因此,本申请实施方式中,可以对从输入图像中提取到的特征进行静态熵编码,可以高效地实现编码。Therefore, in the embodiment of the present application, static entropy coding can be performed on the features extracted from the input image, and coding can be achieved efficiently.
In a possible implementation, the autoencoding model may include an encoding model and a decoding model, and taking the input image as the input of the autoencoding model and outputting the latent variable and the first residual distribution includes: taking the input image as the input of the encoding model and outputting the latent variable, where the encoding model is used to extract features from the input image; and taking the latent variable as the input of the decoding model to obtain the first residual distribution, where the decoding model is used to predict the residual values between the input image and the corresponding pixel distribution.
In the embodiments of this application, a trained autoencoding model can be used to extract important features from the input image and to predict the corresponding residual image, so that, combined with the output of the autoregressive model, residual encoded data that represents the data of the input image can be obtained.
In a possible implementation, the autoregressive model uses the pixel values of already-predicted pixels to predict the values of pixels lying on the same line, so that in the subsequent decoding process the pixels on the same line do not need to wait for other pixels to be decoded before the current pixel can be decoded. This enables parallel decoding of the pixels on the same line and improves the decoding efficiency for the input image.
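One common way to obtain such a schedule is to group pixels by anti-diagonal, so that every pixel in a group depends only on pixels from earlier groups; this particular grouping is an assumption made for illustration and is not mandated by the application.

```python
def antidiagonal_groups(height, width):
    """
    Group pixel coordinates so that all pixels in one group lie on the same
    anti-diagonal (i + j = const). If the causal context of each pixel only
    contains pixels from earlier groups, every pixel in a group can be
    predicted, and later decoded, in parallel.
    """
    groups = [[] for _ in range(height + width - 1)]
    for i in range(height):
        for j in range(width):
            groups[i + j].append((i, j))
    return groups

# Example: for a 3x3 image the schedule is
# [(0,0)], [(0,1),(1,0)], [(0,2),(1,1),(2,0)], [(1,2),(2,1)], [(2,2)]
```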
In a second aspect, this application provides an image decompression method, including: obtaining latent variable encoded data and residual encoded data, where the latent variable encoded data is obtained by encoding features extracted from an input image at the encoding end, and the residual encoded data is obtained by encoding the residual between the image output by the forward pass of an autoregressive model and the input image; decoding the latent variable encoded data to obtain a latent variable, where the latent variable includes the features extracted from the input image at the encoding end; taking the latent variable as the input of an autoencoding model and outputting a second residual distribution; decoding the residual encoded data in combination with the second residual distribution to obtain a second residual image; and taking the second residual image as the input of the backward pass of the autoregressive model and outputting a decompressed image.
An autoencoding model on its own usually has limited fitting capability and needs a rather deep network to reach a good compression rate; by combining it with the output of the autoregressive model, this application can reduce the size of the autoencoding model. Therefore, in this application, decoding combines the autoregressive model and the autoencoding model, both of which can be kept very small, which avoids the excessive inference time caused by an overly large autoencoding network and achieves efficient image decompression. Moreover, in the method provided by this application, the entire pipeline, including the AI models and the entropy coding, can run as AI lossless compression on an AI chip, which avoids transfers between system memory and AI-chip memory and improves coding efficiency.
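Mirroring the encoder sketch given under the first aspect, the decoding flow of the second aspect could be sketched as follows; again, all function and model names are hypothetical placeholders rather than interfaces defined by this application.

```python
def decode_image(latent_bits, residual_bits, ar_model, ae_decoder,
                 semi_dynamic_decode, static_decode):
    """Sketch of the second-aspect decoding flow (all callables are illustrative)."""
    # 1. Recover the latent variable losslessly with the static entropy coder.
    latent = static_decode(latent_bits)

    # 2. Re-derive the residual distribution from the latent variable.
    residual_dist = ae_decoder(latent)        # "second residual distribution"

    # 3. Decode the residual image against that distribution.
    residual = semi_dynamic_decode(residual_bits, residual_dist)   # "second residual image"

    # 4. Invert the autoregressive prediction (its backward pass) to get the image.
    return ar_model.backward(residual)
```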
在一种可能的实施方式中,前述的对隐变量编码数据进行解码,得到隐变量,包括:将隐变量编码数据作为静态熵编码器的输入,输出隐变量。其中,该解码可以理解为编码端进行静态熵编码的逆运算,从而无损恢复得到图像中的重要特征。In a possible implementation, the aforementioned decoding of latent variable encoded data to obtain latent variables includes: using latent variable encoded data as input to a static entropy encoder and outputting latent variables. Among them, this decoding can be understood as the inverse operation of static entropy coding performed by the encoding end, so that important features in the image can be obtained by lossless recovery.
In a possible implementation, decoding the residual encoded data in combination with the second residual distribution to obtain the second residual image includes: taking the second residual distribution and the residual encoded data as the input of a semi-dynamic entropy encoder and outputting the second residual image. The semi-dynamic entropy encoder performs entropy coding using encoding operations of a first preset type, where the first preset type includes addition, subtraction or bit operations, and does not include encoding operations of a second preset type, where the second preset type includes at least one of multiplication, division or modulo operations; that is, time-consuming operations such as multiplication, division or modulo are excluded and only simple additions and subtractions may be used, so that efficient coding is achieved. Therefore, the residual image can be decoded based on semi-dynamic entropy coding, that is, with a limited set of distributions, which, compared with dynamic entropy coding, removes time-consuming operations such as multiplication, division and modulo and greatly improves decoding efficiency.
In a possible implementation, the semi-dynamic entropy encoder may be obtained by transforming a dynamic entropy encoder. Specifically, the operations of the dynamic entropy encoder can be approximated, for example by replacing them with approximate operations that reduce or remove multiplications, divisions, modulo operations and the like; the remaining operations whose time cost exceeds a certain threshold (such as the remaining modulo, multiplication and division operations) can then be transformed into table accesses and lightweight operations such as additions, subtractions and bit operations, yielding the semi-dynamic entropy encoder provided by this application. In other words, the semi-dynamic entropy encoder can be understood as an entropy encoder obtained by replacing or transforming some operations of a dynamic entropy encoder, so that entropy coding with it only uses simple, efficiently executed operations such as additions, subtractions and bit operations, thereby achieving efficient encoding.
In a possible implementation, taking the second residual image as the input of the backward pass of the autoregressive model and outputting the decompressed image includes: decoding, by the autoregressive model, the pixels of the second residual image that lie on the same line in parallel to obtain the decompressed image. Therefore, the pixels on the same line do not need to wait for other pixels to be decoded before the current pixel can be decoded, which enables parallel decoding of the pixels on the same line and improves the decoding efficiency for the input image.
第三方面,本申请提供一种图像编码装置,包括:In a third aspect, this application provides an image coding device, including:
an autoregressive module, configured to take the input image as the input of an autoregressive model and output a first image;
残差计算模块,用于获取第一图像和输入图像之间的残差,得到第一残差图像;The residual calculation module is used to obtain the residual between the first image and the input image to obtain the first residual image;
an autoencoding module, configured to take the input image as the input of an autoencoding model and output a latent variable and a first residual distribution, where the latent variable includes features extracted from the input image and the first residual distribution includes the residual values, output by the autoencoding model, that correspond to the pixels of the input image and the pixels of the first residual image;
残差编码模块,用于对第一残差图像和第一残差分布进行编码,得到残差编码数据; A residual coding module, used to code the first residual image and the first residual distribution to obtain residual coded data;
隐变量编码模块,用于对隐变量进行编码,得到隐变量编码数据,隐变量编码数据和残差编码数据用于解压后得到输入图像。The latent variable encoding module is used to encode latent variables to obtain latent variable encoded data. The latent variable encoded data and residual encoded data are used to obtain the input image after decompression.
In a possible implementation, the residual encoding module is specifically configured to take the first residual image and the first residual distribution as the input of a semi-dynamic entropy encoder and output the residual encoded data. The semi-dynamic entropy encoder performs entropy coding using encoding operations of a first preset type, where the first preset type includes addition, subtraction or bit operations, and does not include encoding operations of a second preset type, where the second preset type includes at least one of multiplication, division or modulo operations; that is, the semi-dynamic entropy encoder excludes time-consuming operations such as multiplication, division or modulo and may include only simple additions and subtractions, so that efficient encoding can be achieved.
In a possible implementation, the semi-dynamic entropy encoder may be obtained by transforming a dynamic entropy encoder. Specifically, the operations of the dynamic entropy encoder can be approximated, for example by replacing them with approximate operations that reduce or remove multiplications, divisions, modulo operations and the like; the remaining operations whose time cost exceeds a certain threshold (such as the remaining modulo, multiplication and division operations) can then be transformed into table accesses and lightweight operations such as additions, subtractions and bit operations, yielding the semi-dynamic entropy encoder provided by this application. In other words, the semi-dynamic entropy encoder can be understood as an entropy encoder obtained by replacing or transforming some operations of a dynamic entropy encoder, so that entropy coding with it only uses simple, efficiently executed operations such as additions, subtractions and bit operations, thereby achieving efficient encoding.
在一种可能的实施方式中,隐变量编码模块,具体用于将隐变量作为静态熵编码器的输入,得到隐变量编码数据。In a possible implementation, the latent variable encoding module is specifically configured to use latent variables as inputs to the static entropy encoder to obtain latent variable encoded data.
In a possible implementation, the autoencoding model includes an encoding model and a decoding model, and the autoencoding module is specifically configured to: take the input image as the input of the encoding model and output the latent variable, where the encoding model is used to extract features from the input image; and take the latent variable as the input of the decoding model to obtain the first residual distribution, where the decoding model is used to predict the residual values between the input image and the corresponding pixel distribution.
在一种可能的实施方式中,自回归模型用于使用已预测的像素点的像素值预测处于同一连线上的像素点的值。In a possible implementation, the autoregressive model is used to predict the values of pixels on the same connection using the predicted pixel values of the pixels.
第四方面,本申请提供一种图像解压装置,包括:In a fourth aspect, this application provides an image decompression device, including:
收发模块,用于获取隐变量编码数据和残差编码数据,该隐变量编码数据包括编码端从输入图像中提取到的特征进行编码得到,该残差编码数据包括对自回归模型输出的图像和该输入图像之间的残差进行编码得到的数据;The transceiver module is used to obtain latent variable encoding data and residual encoding data. The latent variable encoding data includes encoding the features extracted from the input image by the encoding end. The residual encoding data includes the image output by the autoregressive model and The data obtained by encoding the residual between the input images;
隐变量解码模块,用于对隐变量编码数据进行解码,得到隐变量,该隐变量包括编码端从输入图像中提取到的特征;The latent variable decoding module is used to decode the latent variable encoded data to obtain the latent variable. The latent variable includes the features extracted by the encoding end from the input image;
自编码模块,用于将隐变量作为自编码模型的输入,输出第二残差分布;The autoencoding module is used to use latent variables as the input of the autoencoding model and output the second residual distribution;
残差解码模块,用于结合第二残差分布和残差编码数据进行解码,得到第二残差图像;The residual decoding module is used to decode the second residual distribution and the residual coded data to obtain the second residual image;
自回归模块,用于将第二残差图像作为自回归模型的反向传播的输入,输出解压图像。The autoregressive module is used to use the second residual image as the input of backpropagation of the autoregressive model and output the decompressed image.
在一种可能的实施方式中,隐变量解码模块,具体用于将隐变量编码数据作为静态熵编码器的输入,输出隐变量。In a possible implementation, the latent variable decoding module is specifically configured to use the latent variable encoded data as the input of the static entropy encoder and output the latent variable.
In a possible implementation, the residual decoding module is specifically configured to take the second residual distribution and the residual encoded data as the input of a semi-dynamic entropy encoder and output the second residual image. The semi-dynamic entropy encoder performs entropy coding using encoding operations of a first preset type, where the first preset type includes addition, subtraction or bit operations, and does not include encoding operations of a second preset type, where the second preset type includes at least one of multiplication, division or modulo operations; that is, the semi-dynamic entropy encoder excludes time-consuming operations such as multiplication, division or modulo and may include only simple additions and subtractions, so that efficient coding can be achieved.
In a possible implementation, the semi-dynamic entropy encoder may be obtained by transforming a dynamic entropy encoder. Specifically, the operations of the dynamic entropy encoder can be approximated, for example by replacing them with approximate operations that reduce or remove multiplications, divisions, modulo operations and the like; the remaining operations whose time cost exceeds a certain threshold (such as the remaining modulo, multiplication and division operations) can then be transformed into table accesses and lightweight operations such as additions, subtractions and bit operations, yielding the semi-dynamic entropy encoder provided by this application. In other words, the semi-dynamic entropy encoder can be understood as an entropy encoder obtained by replacing or transforming some operations of a dynamic entropy encoder, so that entropy coding with it only uses simple, efficiently executed operations such as additions, subtractions and bit operations, thereby achieving efficient encoding.
在一种可能的实施方式中,自回归模块,具体用于通过自回归模型,对第二残差图像中处于同一连线上的像素点进行并行解码,得到解压图像。In a possible implementation, the autoregressive module is specifically configured to decode pixels on the same connection line in the second residual image in parallel through the autoregressive model to obtain the decompressed image.
第五方面,本申请实施例提供一种图像编码装置,该图像编码装置具有实现上述第一方面图像处理方法的功能。该功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。该硬件或软件包括一个或多个与上述功能相对应的模块。In a fifth aspect, embodiments of the present application provide an image coding device, which has the function of implementing the image processing method in the first aspect. This function can be implemented by hardware, or it can be implemented by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions.
第六方面,本申请实施例提供一种图像解压装置,该图像解压装置具有实现上述第二方面图像处理方法的功能。该功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。该硬件或软件包括一个或多个与上述功能相对应的模块。In a sixth aspect, embodiments of the present application provide an image decompression device, which has the function of implementing the image processing method in the second aspect. This function can be implemented by hardware, or it can be implemented by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions.
In a seventh aspect, embodiments of this application provide an image encoding apparatus, including a processor and a memory, where the processor and the memory are interconnected through a line, and the processor calls program code in the memory to perform the processing-related functions of the image encoding method in any one of the implementations of the first aspect. Optionally, the image encoding apparatus may be a chip.
In an eighth aspect, embodiments of this application provide an image decompression apparatus, including a processor and a memory, where the processor and the memory are interconnected through a line, and the processor calls program code in the memory to perform the processing-related functions of the image decompression method in any one of the implementations of the second aspect. Optionally, the image decompression apparatus may be a chip.
In a ninth aspect, embodiments of this application provide an image encoding apparatus, which may also be called a digital processing chip or chip, where the chip includes a processing unit and a communication interface, the processing unit obtains program instructions through the communication interface, and the program instructions are executed by the processing unit, the processing unit being configured to perform the processing-related functions of the first aspect or any optional implementation of the first aspect.
In a tenth aspect, embodiments of this application provide an image decompression apparatus, which may also be called a digital processing chip or chip, where the chip includes a processing unit and a communication interface, the processing unit obtains program instructions through the communication interface, and the program instructions are executed by the processing unit, the processing unit being configured to perform the processing-related functions of the second aspect or any optional implementation of the second aspect.
In an eleventh aspect, embodiments of this application provide an image processing system, including an image encoding apparatus and an image decompression apparatus, where the image encoding apparatus is configured to perform the processing-related functions of the first aspect or any optional implementation of the first aspect, and the image decompression apparatus is configured to perform the processing-related functions of the second aspect or any optional implementation of the second aspect.
第十二方面,本申请实施例提供了一种计算机可读存储介质,包括指令,当其在计算机上运行时,使得计算机执行上述第一方面或第二方面中任一可选实施方式中的方法。In a twelfth aspect, embodiments of the present application provide a computer-readable storage medium, including instructions that, when run on a computer, cause the computer to execute any of the optional implementations of the first aspect or the second aspect. method.
第十三方面,本申请实施例提供了一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行上述第一方面或第二方面中任一可选实施方式中的方法。In a thirteenth aspect, embodiments of the present application provide a computer program product containing instructions that, when run on a computer, cause the computer to execute the method in any optional implementation of the first aspect or the second aspect.
附图说明Description of the drawings
图1为本申请应用的一种人工智能主体框架示意图;Figure 1 is a schematic diagram of an artificial intelligence subject framework applied in this application;
图2为本申请实施例提供的一种系统架构示意图;Figure 2 is a schematic diagram of a system architecture provided by an embodiment of the present application;
图3为本申请实施例的一种应用场景示意图;Figure 3 is a schematic diagram of an application scenario according to the embodiment of the present application;
图4为本申请实施例的另一种应用场景示意图;Figure 4 is a schematic diagram of another application scenario according to the embodiment of the present application;
图5为本申请实施例的另一种应用场景示意图;Figure 5 is a schematic diagram of another application scenario according to the embodiment of the present application;
图6为本申请实施例提供的一种图像编码方法的流程示意图;Figure 6 is a schematic flowchart of an image encoding method provided by an embodiment of the present application;
图7为本申请实施例提供的另一种图像编码方法的流程示意图;Figure 7 is a schematic flow chart of another image encoding method provided by an embodiment of the present application;
图8为本申请实施例提供的一种自回归模型的预测方式示意图;Figure 8 is a schematic diagram of a prediction method of an autoregressive model provided by an embodiment of the present application;
图9为本申请实施例提供的一种自回归模型的预测顺序示意图;Figure 9 is a schematic diagram of the prediction sequence of an autoregressive model provided by an embodiment of the present application;
图10为本申请实施例提供的一种残差计算方式示意图;Figure 10 is a schematic diagram of a residual calculation method provided by the embodiment of the present application;
图11为本申请实施例提供的一种数据结构示意图;Figure 11 is a schematic diagram of a data structure provided by an embodiment of the present application;
图12为本申请实施例提供的一种图像解压方法的流程示意图;Figure 12 is a schematic flow chart of an image decompression method provided by an embodiment of the present application;
图13为本申请实施例提供的另一种图像解压方法的流程示意图;Figure 13 is a schematic flow chart of another image decompression method provided by an embodiment of the present application;
图14为本申请提供的一种图像编码装置的结构示意图;Figure 14 is a schematic structural diagram of an image coding device provided by this application;
图15为本申请提供的一种图像解码装置的结构示意图;Figure 15 is a schematic structural diagram of an image decoding device provided by this application;
图16为本申请提供的另一种图像编码装置的结构示意图;Figure 16 is a schematic structural diagram of another image coding device provided by the present application;
图17为本申请提供的另一种图像解码装置的结构示意图;Figure 17 is a schematic structural diagram of another image decoding device provided by this application;
图18为本申请提供的一种芯片结构示意图。Figure 18 is a schematic structural diagram of a chip provided by this application.
具体实施方式Detailed ways
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are only some of the embodiments of the present application, rather than all of the embodiments. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative efforts fall within the scope of protection of this application.
First, the overall workflow of an artificial intelligence system is described. Referring to Figure 1, Figure 1 shows a schematic structural diagram of an artificial intelligence framework, which is described below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects the series of processes from data acquisition to data processing, for example the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output; in this process, data goes through the refinement process of "data - information - knowledge - wisdom". The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure and information (technologies for providing and processing information) of artificial intelligence to the industrial ecology of the system.
(1)基础设施(1)Infrastructure
基础设施为人工智能系统提供计算能力支持,实现与外部世界的沟通,并通过基础平台实现支撑。通过传感器与外部沟通;计算能力由智能芯片,如中央处理器(central processing unit,CPU)、网络处理器(neural-network processing unit,NPU)、图形处理器(graphics processing unit,GPU)、专用集成电路(application specific integrated circuit,ASIC)或现场可编程逻辑门阵列(field programmable gate array,FPGA)等硬件加速芯片)提供;基础平台包括分布式计算框架及网络等相关的平台保障和支持,可以包括云存储和计算、互联互通网络等。举例来说,传感器和外部沟通获取数据,这些数据提供给基础平台提供的分布式计算系统中的智能芯片进行计算。Infrastructure provides computing power support for artificial intelligence systems, enables communication with the external world, and supports it through basic platforms. Communicate with the outside through sensors; computing power is provided by smart chips, such as central processing unit (CPU), neural-network processing unit (NPU), graphics processing unit (GPU), dedicated integration Hardware acceleration chips such as application specific integrated circuit (ASIC) or field programmable gate array (FPGA) are provided; the basic platform includes distributed computing framework and network and other related platform guarantees and support, which can include Cloud storage and computing, interconnection network, etc. For example, sensors communicate with the outside world to obtain data, which are provided to smart chips in the distributed computing system provided by the basic platform for calculation.
(2)数据(2)Data
基础设施的上一层的数据用于表示人工智能领域的数据来源。数据涉及到图形、图像、语音、文本,还涉及到传统设备的物联网数据,包括已有系统的业务数据以及力、位移、液位、温度、湿度等感知数据。Data from the upper layer of the infrastructure is used to represent data sources in the field of artificial intelligence. The data involves graphics, images, voice, and text, as well as IoT data of traditional devices, including business data of existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
(3)数据处理(3)Data processing
数据处理通常包括数据训练,机器学习,深度学习,搜索,推理,决策等方式。Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making and other methods.
其中,机器学习和深度学习可以对数据进行符号化和形式化的智能信息建模、抽取、预处理、训练等。Among them, machine learning and deep learning can perform symbolic and formal intelligent information modeling, extraction, preprocessing, training, etc. on data.
推理是指在计算机或智能系统中,模拟人类的智能推理方式,依据推理控制策略,利用形式化的信息进行机器思维和求解问题的过程,典型的功能是搜索与匹配。Reasoning refers to the process of simulating human intelligent reasoning in computers or intelligent systems, using formal information to perform machine thinking and problem solving based on reasoning control strategies. Typical functions are search and matching.
决策是指智能信息经过推理后进行决策的过程,通常提供分类、排序、预测等功能。Decision-making refers to the process of decision-making after intelligent information is reasoned, and usually provides functions such as classification, sorting, and prediction.
(4)通用能力(4) General ability
对数据经过上面提到的数据处理后,进一步基于数据处理的结果可以形成一些通用的能力,比如可以是算法或者一个通用系统,例如,翻译,文本的分析,计算机视觉的处理,语音识别,图像的识别等等。After the data is processed as mentioned above, some general capabilities can be formed based on the results of further data processing, such as algorithms or a general system, such as translation, text analysis, computer vision processing, speech recognition, and image processing. identification, etc.
(5)智能产品及行业应用(5) Intelligent products and industry applications
智能产品及行业应用指人工智能系统在各领域的产品和应用,是对人工智能整体解决方案的封装,将智能信息决策产品化、实现落地应用,其应用领域主要包括:智能终端、智能交通、智能医疗、自动驾驶、智慧城市等。Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They are the encapsulation of overall artificial intelligence solutions, productizing intelligent information decision-making and realizing practical applications. Its application fields mainly include: intelligent terminals, intelligent transportation, Smart healthcare, autonomous driving, smart cities, etc.
本申请实施例涉及了大量神经网络和图像的相关应用,为了更好地理解本申请实施例的方案,下面先对本申请实施例可能涉及的神经网络和图像领域的相关术语和概念进行介绍。The embodiments of the present application involve a large number of related applications of neural networks and images. In order to better understand the solutions of the embodiments of the present application, the relevant terms and concepts in the fields of neural networks and images that may be involved in the embodiments of the present application are first introduced below.
(1)神经网络(1)Neural network
A neural network may be composed of neural units. A neural unit may be an arithmetic unit that takes inputs $x_s$ and an intercept of 1, and the output of the arithmetic unit may be expressed by the following formula:

$$h_{W,b}(x) = f\left(W^{T}x\right) = f\left(\sum_{s=1}^{n} W_{s}x_{s} + b\right)$$
其中,s=1、2、……n,n为大于1的自然数,Ws为xs的权重,b为神经单元的偏置。f为神经单元的激活函数(activation functions),用于将非线性特性引入神经网络中,来将神经单元中的输入信号转换为输出信号。该激活函数的输出信号可以作为下一层卷积层的输入,激活函数可以是sigmoid函数。神经网络是将多个上述单一的神经单元联结在一起形成的网络,即一个神经单元的输出可以是另一个神经单元的输入。每个神经单元的输入可以与前一层的局部接受域相连,来提取局部接受域的特征,局部接受域可以是由若干个神经单元组成的区域。Among them, s=1, 2,...n, n is a natural number greater than 1, Ws is the weight of xs, and b is the bias of the neural unit. f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal. The output signal of this activation function can be used as the input of the next convolutional layer, and the activation function can be a sigmoid function. A neural network is a network formed by connecting multiple above-mentioned single neural units together, that is, the output of one neural unit can be the input of another neural unit. The input of each neural unit can be connected to the local receptive field of the previous layer to extract the features of the local receptive field. The local receptive field can be an area composed of several neural units.
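As a minimal numerical illustration of the formula above (using NumPy, with tanh as an example activation; neither choice is prescribed by this application):

```python
import numpy as np

def neuron(x, w, b, f=np.tanh):
    """Single unit: output = f(sum_s w_s * x_s + b), matching the formula above."""
    return f(np.dot(w, x) + b)

# Example: three inputs, illustrative weights and bias.
y = neuron(x=np.array([0.5, -1.0, 2.0]),
           w=np.array([0.1, 0.4, -0.2]),
           b=0.3)
```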
(2)深度神经网络(2) Deep neural network
深度神经网络(deep neural network,DNN),也称多层神经网络,可以理解为具有多层中间层的神经网络。按照不同层的位置对DNN进行划分,DNN内部的神经网络可以分为三类:输入层,中间层,输出层。一般来说第一层是输入层,最后一层是输出层,中间的层数都是中间层,或者称为隐层。层与层之间是全连接的,也就是说,第i层的任意一个神经元一定与第i+1层的任意一个神经元相连。Deep neural network (DNN), also known as multi-layer neural network, can be understood as a neural network with multiple intermediate layers. DNN is divided according to the positions of different layers. The neural network inside the DNN can be divided into three categories: input layer, intermediate layer, and output layer. Generally speaking, the first layer is the input layer, the last layer is the output layer, and the layers in between are all intermediate layers, or hidden layers. The layers are fully connected, that is to say, any neuron in the i-th layer must be connected to any neuron in the i+1-th layer.
Although a DNN looks complicated, each of its layers can be expressed as the linear relationship $\vec{y} = \alpha(W\vec{x} + \vec{b})$, where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector (also called the bias parameter), $W$ is the weight matrix (also called the coefficients), and $\alpha(\cdot)$ is the activation function. Each layer simply performs this simple operation on the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Because a DNN has many layers, there are also many coefficients $W$ and offset vectors $\vec{b}$. These parameters are defined in the DNN as follows, taking the coefficient $w$ as an example: in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as $w^{3}_{24}$, where the superscript 3 represents the layer in which the coefficient is located, and the subscripts correspond to the output index 2 in the third layer and the input index 4 in the second layer.
In summary, the coefficient from the k-th neuron of layer L-1 to the j-th neuron of layer L is defined as $w^{L}_{jk}$.
需要注意的是,输入层是没有W参数的。在深度神经网络中,更多的中间层让网络更能够刻画现实世界中的复杂情形。理论上而言,参数越多的模型复杂度越高,“容量”也就越大,也就意味着它能完成更复杂的学习任务。训练深度神经网络的也就是学习权重矩阵的过程,其最终目的是得到训练好的深度神经网络的所有层的权重矩阵(由很多层的向量W形成的权重矩阵)。It should be noted that the input layer has no W parameter. In a deep neural network, more intermediate layers make the network more capable of describing complex situations in the real world. Theoretically, a model with more parameters has higher complexity and greater "capacity", which means it can complete more complex learning tasks. Training a deep neural network is the process of learning the weight matrix. The ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (a weight matrix formed by the vectors W of many layers).
(3)卷积神经网络(3) Convolutional neural network
卷积神经网络(convolutional neuron network,CNN)是一种带有卷积结构的深度神经网络。卷积神经网络包含了一个由卷积层和子采样层构成的特征抽取器,该特征抽取器可以看作是滤波器。卷积层是指卷积神经网络中对输入信号进行卷积处理的神经元层。在卷积神经网络的卷积层中,一个神经元可以只与部分邻层神经元连接。一个卷积层中,通常包含若干个特征平面,每个特征平面可以由一些矩形排列的神经单元组成。同一特征平面的神经单元共享权重,这里共享的权重就是卷积核。共享权重可以理解为提取图像信息的方式与位置无关。卷积核可以以随机大小的矩阵的形式初始化,在卷积神经网络的训练 过程中卷积核可以通过学习得到合理的权重。另外,共享权重带来的直接好处是减少卷积神经网络各层之间的连接,同时又降低了过拟合的风险。Convolutional neural network (CNN) is a deep neural network with a convolutional structure. The convolutional neural network contains a feature extractor consisting of a convolutional layer and a subsampling layer, which can be regarded as a filter. The convolutional layer refers to the neuron layer in the convolutional neural network that convolves the input signal. In the convolutional layer of a convolutional neural network, a neuron can be connected to only some of the neighboring layer neurons. A convolutional layer usually contains several feature planes, and each feature plane can be composed of some rectangularly arranged neural units. Neural units in the same feature plane share weights, and the shared weights here are convolution kernels. Shared weights can be understood as a way to extract image information independent of position. The convolution kernel can be initialized in the form of a matrix of random size. During the training of convolutional neural network In the process, the convolution kernel can obtain reasonable weights through learning. In addition, the direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.
(4)损失函数(4)Loss function
In the process of training a deep neural network, because it is hoped that the output of the network is as close as possible to the value that one really wants to predict, the predicted value of the current network can be compared with the really desired target value, and the weight vector of each layer of the network is then updated according to the difference between the two (of course, there is usually an initialization process before the first update, that is, parameters are preconfigured for each layer of the deep neural network). For example, if the predicted value of the network is too high, the weight vectors are adjusted to make the prediction lower, and the adjustment continues until the deep neural network can predict the really desired target value or a value very close to it. Therefore, "how to compare the difference between the predicted value and the target value" needs to be defined in advance; this is the loss function (or objective function), an important equation for measuring the difference between the predicted value and the target value. Taking the loss function as an example, the higher its output value (loss), the larger the difference, and training the deep neural network becomes the process of reducing this loss as much as possible. The loss function may include the mean squared error, cross entropy, logarithmic loss, exponential loss and the like. For example, the mean squared error may be used as the loss function, defined as

$$MSE = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^{2}$$

The specific loss function can be chosen according to the actual application scenario.
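For example, the mean squared error above can be computed with a few lines of NumPy (an illustration only; this application does not fix the choice of loss function):

```python
import numpy as np

def mse_loss(y_pred, y_true):
    """Mean squared error between predictions and targets."""
    return np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2)

# mse_loss([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]) -> 0.4166...
```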
(5)反向传播算法(5)Back propagation algorithm
神经网络可以采用误差反向传播(back propagation,BP)算法在训练过程中修正初始的神经网络模型中参数的大小,使得神经网络模型的重建误差损失越来越小。具体地,前向传递输入信号直至输出会产生误差损失,通过反向传播误差损失信息来更新初始的神经网络模型中参数,从而使误差损失收敛。反向传播算法是以误差损失为主导的反向传播运动,旨在得到最优的神经网络模型的参数,例如权重矩阵。The neural network can use the error back propagation (BP) algorithm to modify the size of the parameters in the initial neural network model during the training process, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, forward propagation of the input signal until the output will produce an error loss, and the parameters in the initial neural network model are updated by backpropagating the error loss information, so that the error loss converges. The backpropagation algorithm is a backpropagation movement dominated by error loss, aiming to obtain the optimal parameters of the neural network model, such as the weight matrix.
(6)熵编码(6)Entropy coding
熵编码即编码过程中按熵原理不丢失任何信息的编码。信息熵为信源的平均信息量(不确定性的度量)。常见的熵编码有:香农(Shannon)编码、哈夫曼(Huffman)编码和算术编码(arithmetic coding)等。Entropy coding refers to coding that does not lose any information according to the entropy principle during the coding process. Information entropy is the average amount of information in the source (a measure of uncertainty). Common entropy codes include: Shannon coding, Huffman coding, arithmetic coding, etc.
例如,若预测的图像中各个像素点的像素值分布已知,则最优压缩方案可利用熵编码技术获得。利用熵编码技术,一张概率为p的图像可利用-log2p比特表示。例如:概率为1/8的图像需要用3个比特表示,概率为1/256的图像需用8个比特表示。For example, if the pixel value distribution of each pixel in the predicted image is known, the optimal compression scheme can be obtained using entropy coding technology. Using entropy coding technology, an image with probability p can be represented by -log 2 p bits. For example: an image with a probability of 1/8 needs to be represented by 3 bits, and an image with a probability of 1/256 needs to be represented by 8 bits.
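The relationship between probability and ideal code length quoted above can be reproduced with a trivial helper (illustrative only):

```python
import math

def ideal_code_length_bits(p):
    """Ideal entropy-code length, in bits, for data of probability p."""
    return -math.log2(p)

# ideal_code_length_bits(1/8)   -> 3.0 bits
# ideal_code_length_bits(1/256) -> 8.0 bits
```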
要确定每个字母的比特数算法需要尽可能精确地知道每个字母的出现机率,模型的任务是提供这个数据。模型的预言越好压缩的结果就越好。此外模型必须在压缩和恢复时提出同样的数据。To determine the number of bits for each letter, the algorithm needs to know the probability of each letter appearing as accurately as possible, and the model's job is to provide this data. The better the predictions of the model, the better the compression results. Furthermore the model must present the same data during compression and recovery.
静态模型(或者称为静态熵编码)在压缩前对整个文字进行分析计算每个字母的机率。这个计算结果用于整个文字上。编码表只需计算一次,因此编码速度高,除在解码时所需要的机率值外结果肯定不比原文长。本申请提供的方法中,所采用的熵编码可以包括tANS或fse等静态熵编码方式。The static model (or static entropy coding) analyzes the entire text to calculate the probability of each letter before compression. The result of this calculation is applied to the entire text. The encoding table only needs to be calculated once, so the encoding speed is high, and the result will definitely not be longer than the original text except for the probability value required during decoding. In the method provided by this application, the entropy coding used may include static entropy coding methods such as tANS or fse.
动态模型在这个模型里机率随编码过程而不断变化。通过多种算法可以达到这个目的,如: Dynamic model In this model, the probabilities continuously change during the encoding process. This goal can be achieved through a variety of algorithms, such as:
前向动态:机率按照已经被编码的字母来计算,每次一个字母被编码后它的机率就增高。Forward dynamics: The probability is calculated based on the letters that have been encoded. Each time a letter is encoded, its probability increases.
反向动态:在编码前计算每个字母在剩下的还未编码的部分的机率。随着编码的进行最后越来越多的字母不再出现,它们的机率成为0,而剩下的字母的机率升高,为它们编码的比特数降低。压缩率不断增高,以至于最后一个字母只需要0比特来编码。Inverse dynamics: before encoding, calculate the probability of each letter in the remaining unencoded part. As the encoding proceeds, more and more letters no longer appear, and their probabilities become 0, while the probabilities of the remaining letters increase, and the number of bits encoded for them decreases. The compression ratio keeps increasing so that the last letter only requires 0 bits to encode.
因此,模型按照不同部位的特殊性优化;在前向模型中机率数据不需要输送。Therefore, the model is optimized according to the specificity of different parts; probabilistic data does not need to be transmitted in the forward model.
In this application, entropy coding is divided into several kinds, for example static entropy coding, semi-dynamic entropy coding and dynamic entropy coding. Whatever the encoder, the goal is the same: data of probability p is encoded with a length close to -log2 p. The difference is that static entropy coding encodes with a single probability distribution, semi-dynamic entropy coding encodes with several (i.e. a finite number of) probability distributions, and dynamic entropy coding encodes with an arbitrary, unlimited set of probability distributions.
(7)自回归模型(7)Autoregressive model
是一种处理时间序列的方式,其用同一变量的前期历史数据来预测当前数据。It is a way of processing time series that uses previous historical data of the same variable to predict current data.
例如,用同一变数例如x的之前各期,即x1至xt-1来预测本期xt的表现,并假设它们为一线性关系。因为这是从回归分析中的线性回归发展而来,只是不用x预测y,而是用x预测x;所以叫做自回归。For example, the same variable such as x in previous periods, that is, x 1 to x t-1 , is used to predict the performance of x t in the current period, and it is assumed that they are a linear relationship. Because this is developed from linear regression in regression analysis, it just does not use x to predict y, but uses x to predict x; so it is called autoregression.
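Written out explicitly, a linear autoregressive model of order p has the standard textbook form (not a formula taken from this application):

$$x_t = c + \sum_{i=1}^{p} \varphi_i\, x_{t-i} + \varepsilon_t$$

where $c$ is a constant, $\varphi_i$ are the regression coefficients and $\varepsilon_t$ is a noise term.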
(8)自编码模型(8)Autoencoding model
自编码模型是一种利用反向传播算法使得输出值等于输入值的神经网络,先将输入数据压缩成潜在空间表征,然后通过这种表征来重构输出。The autoencoding model is a neural network that uses the backpropagation algorithm to make the output value equal to the input value. It first compresses the input data into a latent space representation, and then reconstructs the output through this representation.
自编码模型通常包括编码(encoder)模型和解码(decoder)模型。本申请中,训练后的编码模型用于从输入图像中提取特征,得到隐变量,将该隐变量输入至训练后的解码模型,即可输出预测的输入图像对应的残差。Autoencoding models usually include encoding (encoder) models and decoder (decoder) models. In this application, the trained encoding model is used to extract features from the input image to obtain latent variables. The latent variables are input to the trained decoding model to output the predicted residual corresponding to the input image.
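A minimal sketch of such an encoder/decoder pair, written with PyTorch purely for illustration: the layer counts, channel sizes and the use of plain convolutions are assumptions and not the network actually used in this application, and the decoder output here simply stands in for the parameters of the per-pixel residual distribution.

```python
import torch
from torch import nn

class TinyAutoEncoder(nn.Module):
    """Illustrative encoder/decoder split: image -> latent -> residual-distribution parameters."""
    def __init__(self, channels=3, latent_channels=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, latent_channels, 3, stride=2, padding=1),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_channels, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, channels, 4, stride=2, padding=1),
        )

    def forward(self, x):
        latent = self.encoder(x)              # features extracted from the input image
        residual_params = self.decoder(latent)  # stands in for the predicted residual distribution
        return latent, residual_params

# Usage: latent, residual_params = TinyAutoEncoder()(torch.randn(1, 3, 64, 64))
```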
(9)无损压缩(9)Lossless compression
对数据进行压缩的技术,压缩后数据占用空间小于压缩前,并且压缩后数据能够通过解压还原出原始数据,解压后的数据与压缩前的数据是完全一致的。A technology that compresses data. After compression, the data takes up less space than before compression, and the compressed data can be decompressed to restore the original data. The decompressed data is completely consistent with the data before compression.
通常,图像中各个像素点出现的概率(即通过其他像素点的像素值预测当前像素点的像素值时得到的概率值)越大,压缩后的长度越短。真实存在的图像的概率远高于随机生成的图像,因此压缩每像素所需要的比特数(bpd)远小于后者。在实际应用中,大部分图像的BPD显著小于压缩前,只有极小概率高于压缩前,从而减小平均每张图像的bpd。Generally, the greater the probability of occurrence of each pixel in the image (that is, the probability value obtained when the pixel value of the current pixel is predicted by the pixel value of other pixels), the shorter the compressed length will be. The probability of a real image is much higher than that of a randomly generated image, so the number of bits per pixel (bpd) required for compression is much smaller than the latter. In practical applications, the BPD of most images is significantly smaller than before compression, and has only a very small probability to be higher than before compression, thus reducing the average bpd of each image.
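Under one common normalisation (dividing the compressed size in bits by the number of pixel values; the exact normalisation used in this application is not specified here), the bits-per-pixel figure can be computed as:

```python
def bits_per_value(compressed_num_bytes, height, width, channels):
    """Average number of bits spent per pixel value after compression (one common convention)."""
    return compressed_num_bytes * 8 / (height * width * channels)

# Example: 300 KB for a 1920x1080 RGB image -> about 0.395 bits per value.
```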
(10)压缩率(10)Compression rate
原始数据大小与压缩后数据大小的比值,如果没有压缩该值为1,该值越大越好。The ratio of the original data size to the compressed data size. If there is no compression, the value is 1. The larger the value, the better.
(11)吞吐量(11)Throughput
每秒钟能够压缩/解压原始数据的大小。The size of raw data that can be compressed/decompressed per second.
(12)感受野(12) Receptive field
预测一个像素点时,需要预先知道的点。改变非感受野中的点不会改变像素点的预测。When predicting a pixel, the point needs to be known in advance. Changing points in the non-receptive field does not change the prediction of the pixel.
本申请实施例提供的编码方法以及解码方法可以在服务器上被执行,还可以在终端设备上被执行,相应地,本申请以下提及的神经网络,可以部署于服务器,也可以部署于终端上,具体可以根据实际应用场景调整。例如,本申请提供的编码方法以及解码方法,可 以通过插件的方式部署于终端中。其中该终端设备可以是具有图像处理功能的移动电话、平板个人电脑(tablet personal computer,TPC)、媒体播放器、智能电视、笔记本电脑(laptop computer,LC)、个人数字助理(personal digital assistant,PDA)、个人计算机(personal computer,PC)、照相机、摄像机、智能手表、可穿戴式设备(wearable device,WD)或者自动驾驶的车辆等,本申请实施例对此不作限定。下面示例性地,以本申请提供的编码方法以及解码方法部署于终端为例进行示例性说明。The encoding method and decoding method provided by the embodiments of this application can be executed on the server or on the terminal device. Correspondingly, the neural network mentioned below in this application can be deployed on the server or on the terminal. , which can be adjusted according to actual application scenarios. For example, the encoding method and decoding method provided by this application can Deployed in the terminal through plug-ins. The terminal device may be a mobile phone with image processing function, a tablet personal computer (TPC), a media player, a smart TV, a laptop computer (LC), or a personal digital assistant (PDA). ), personal computer (PC), camera, camcorder, smart watch, wearable device (WD) or self-driving vehicle, etc., the embodiments of the present application are not limited to this. The following is an exemplary description taking the encoding method and decoding method provided by this application being deployed on a terminal as an example.
本申请提供的编码方法以及解码方法中的全部或者部分流程可以通过神经网络来实现,如其中的自回归模型、自编码模型等,都可以通过神经网络来实现。而通常神经网络需要在训练之后部署在终端上,如图2所示,本申请实施例提供了一种系统架构100。在图2中,数据采集设备160用于采集训练数据。在一些可选的实现中,本申请中,针对自回归模型和自编码模型,训练数据可以包括大量高清图像。All or part of the processes in the encoding method and decoding method provided by this application can be implemented through neural networks. For example, the autoregressive model, autoencoding model, etc. can be implemented through neural networks. Generally, the neural network needs to be deployed on the terminal after training. As shown in Figure 2, this embodiment of the present application provides a system architecture 100. In Figure 2, data collection device 160 is used to collect training data. In some optional implementations, in this application, for the autoregressive model and the autoencoding model, the training data may include a large number of high-definition images.
在采集到训练数据之后,数据采集设备160将这些训练数据存入数据库130,训练设备120基于数据库130中维护的训练数据训练得到目标模型/规则101。可选地,在本申请以下实施方式中所提及的训练集,可以是从该数据库130中得到,也可以是通过用户的输入数据得到。After collecting the training data, the data collection device 160 stores the training data into the database 130, and the training device 120 trains to obtain the target model/rules 101 based on the training data maintained in the database 130. Optionally, the training set mentioned in the following embodiments of this application may be obtained from the database 130 or may be obtained through user input data.
其中,目标模型/规则101可以为本申请实施例中进行训练后的神经网络,该神经网络可以包括一个或者多个网络,如自回归模型或者自编码模型等。The target model/rule 101 may be a neural network trained in the embodiment of the present application. The neural network may include one or more networks, such as an autoregressive model or an autoencoding model.
The following describes how the training device 120 obtains the target model/rule 101 based on the training data: the training device 120 processes the input three-dimensional model and compares the output image with the high-quality rendered image corresponding to the input three-dimensional model, until the difference between the image output by the training device 120 and the high-quality rendered image is smaller than a certain threshold, thereby completing the training of the target model/rule 101.
上述目标模型/规则101能够用于实现本申请实施例的用于编码方法以及解码方法中提及的神经网络,即,将待处理数据(如待压缩的图像)通过相关预处理后输入该目标模型/规则101,即可得到处理结果。本申请实施例中的目标模型/规则101具体可以为本申请以下所提及的神经网络,该神经网络可以是前述的CNN、DNN或者RNN等类型的神经网络。需要说明的是,在实际的应用中,所述数据库130中维护的训练数据不一定都来自于数据采集设备160的采集,也有可能是从其他设备接收得到的。另外需要说明的是,训练设备120也不一定完全基于数据库130维护的训练数据进行目标模型/规则101的训练,也有可能从云端或其他地方获取训练数据进行模型训练,本申请对此并不作限定。The above target model/rule 101 can be used to implement the neural network mentioned in the encoding method and decoding method in the embodiment of the present application, that is, the data to be processed (such as the image to be compressed) is input to the target after relevant preprocessing. Model/Rule 101, you can get the processing results. The target model/rule 101 in the embodiment of this application may specifically be the neural network mentioned below in this application, and the neural network may be the aforementioned CNN, DNN or RNN type of neural network. It should be noted that in actual applications, the training data maintained in the database 130 may not necessarily be collected by the data collection device 160, but may also be received from other devices. In addition, it should be noted that the training device 120 may not necessarily train the target model/rules 101 based entirely on the training data maintained by the database 130. It may also obtain training data from the cloud or other places for model training, which is not limited in this application. .
The target model/rule 101 trained by the training device 120 can be applied to different systems or devices, for example to the execution device 110 shown in Figure 2, which may also be called a computing device. The execution device 110 may be a terminal, such as a mobile phone terminal, a tablet computer, a laptop computer, an augmented reality (AR)/virtual reality (VR) device, or a vehicle-mounted terminal, or it may be a server or a cloud device. In Figure 2, the execution device 110 is configured with an input/output (I/O) interface 112 for data interaction with external devices, and a user can input data to the I/O interface 112 through a client device 140. In the embodiments of this application, the input data may include the data to be processed that is input by the client device. The client may be another hardware device, such as a terminal or a server, or it may be software deployed on a terminal, such as an app or a web page.
The preprocessing module 113 and the preprocessing module 114 are used to perform preprocessing based on the input data (such as the data to be processed) received by the I/O interface 112. In the embodiments of this application, the preprocessing module 113 and the preprocessing module 114 may also be omitted (or only one of them may be present), and the calculation module 111 may be used directly to process the input data.
When the execution device 110 preprocesses the input data, or when the calculation module 111 of the execution device 110 performs calculation or other related processing, the execution device 110 may call data, code, and the like in the data storage system 150 for the corresponding processing, and may also store the data, instructions, and the like obtained by the corresponding processing into the data storage system 150.
Finally, the I/O interface 112 returns the processing result to the client device 140 so as to provide it to the user. For example, if the first neural network is used for image classification and the processing result is a classification result, the I/O interface 112 returns the obtained classification result to the client device 140 to provide it to the user.
It should be noted that the training device 120 can generate, for different goals or different tasks, corresponding target models/rules 101 based on different training data, and the corresponding target model/rule 101 can then be used to achieve the above goal or complete the above task, thereby providing the user with the desired result. In some scenarios, the execution device 110 and the training device 120 may be the same device or may be located inside the same computing device; for ease of understanding, this application introduces the execution device and the training device separately, which is not a limitation.
In the situation shown in Figure 2, the user can manually specify the input data, and this manual operation can be performed through the interface provided by the I/O interface 112. In another case, the client device 140 may automatically send input data to the I/O interface 112; if the user's authorization is required for the client device 140 to send input data automatically, the user can set the corresponding permission in the client device 140. The user can view the result output by the execution device 110 on the client device 140, and the specific presentation form may be display, sound, action, or the like. The client device 140 may also serve as a data collection end, collecting the input data fed into the I/O interface 112 and the predicted labels output by the I/O interface 112, as shown in the figure, as new sample data and storing them in the database 130. Of course, the collection may also be done without the client device 140; instead, the I/O interface 112 directly stores the input data fed into the I/O interface 112 and the predicted labels output by the I/O interface 112, as shown in the figure, into the database 130 as new sample data.
It should be noted that Figure 2 is only a schematic diagram of a system architecture provided by an embodiment of this application, and the positional relationships among the devices, components, modules, and so on shown in the figure do not constitute any limitation. For example, in Figure 2 the data storage system 150 is an external memory relative to the execution device 110, while in other cases the data storage system 150 may also be placed inside the execution device 110.
As shown in Figure 2, the target model/rule 101 is obtained by training with the training device 120. In the embodiments of this application, the target model/rule 101 may be the neural network in this application; specifically, the neural network provided in the embodiments of this application may include a CNN, a deep convolutional neural network (DCNN), a recurrent neural network (RNN), a constructed neural network, or the like.
The encoding method and the decoding method in the embodiments of this application can be executed by an electronic device, namely the aforementioned execution device. The electronic device includes a CPU and a GPU and can compress images. Of course, it may also include other components, such as an NPU or an ASIC; this is merely an illustrative description and is not enumerated further. Illustratively, the electronic device may be a mobile phone, a tablet computer, a laptop computer, a PC, a mobile internet device (MID), a wearable device, a virtual reality (VR) device, an augmented reality (AR) device, a wireless electronic device in industrial control, a wireless electronic device in self driving, a wireless electronic device in remote medical surgery, a wireless electronic device in a smart grid, a wireless electronic device in transportation safety, a wireless electronic device in a smart city, a wireless electronic device in a smart home, and so on. The electronic device may run the Android system, the iOS system, the Windows system, or other systems. The electronic device may run applications that need to compress images to obtain compressed images, such as communication software, a photo album, or a camera application.
Generally, in some image compression scenarios, entropy coding can be used for compression. Since the distribution of the image is unknown, the original distribution needs to be estimated, and the estimated distribution is fed into the entropy coder for encoding. Usually, the more accurate the estimate, the higher the compression rate. Traditional lossless image compression algorithms mostly rely on the principle that adjacent pixel values are usually close and use fixed prediction methods, which results in low coding efficiency.
In some scenarios, AI-based lossless image compression can also be used. Compared with traditional coding algorithms, AI algorithms can achieve significantly higher compression rates, but their compression/decompression efficiency is very low.
For example, an autoregressive model can be used for image compression. An autoregressive model can be built that, given the values of all previous pixels, outputs the distribution parameters of the predicted point; if the distribution is Gaussian, the output consists of two parameters, the mean and the variance. When compressing with the autoregressive model, all pixels are fed into the autoregressive model to obtain the distribution prediction of each pixel, and the distribution predictions and the pixel values are fed into the entropy coder to obtain the encoded data. During decompression, the pixels are fed into the autoregressive model to obtain the distribution predictions, and the distribution predictions and the encoded data are fed into the entropy coder to obtain the decoded data. However, during encoding and decoding the prediction of each pixel depends on all previous pixels, so the operation efficiency is low; during decompression, all pixels before the current pixel must be decompressed before the current pixel can be decompressed, and one network inference can only decompress one pixel, so the number of network inferences is large and the decompression efficiency is low.
As another example, an autoencoding model can be used for image compression. During encoding, the original data is fed into the encoding network (Encoder) to obtain the latent variables, and the latent variables are fed into the decoding network (Decoder) to obtain the distribution prediction of the image; the hand-designed distribution and the values of the latent variables are fed into the entropy coder to encode the latent variables, and the distribution prediction of the image and the original image are fed into the entropy coder to encode the image. During decoding, the hand-designed distribution and the encoding of the latent variables are fed into the entropy coder to decode the latent variables; the latent variables are fed into the decoding network (Decoder) to obtain the distribution prediction of the image; and the distribution prediction of the image and the encoding of the image are fed into the entropy coder to decode the image. Compared with autoregressive models, autoencoding models have poorer fitting capability: for the compression rate to exceed that of traditional compression algorithms, a deeper network is required, and the latency of a single network inference is high.
Therefore, this application provides an encoding method and a decoding method that use an autoregressive model and an autoencoder model for lossless compression, and provides an efficient semi-dynamic entropy coder, so that both model inference and the coding process run on an AI chip, reducing the transfers between system memory and AI chip memory and achieving high-bandwidth compression and decompression.
First, for ease of understanding, some application scenarios of the encoding method and the decoding method provided by this application are introduced by way of example.
Scenario 1: Saving captured images locally
Taking the case where the method provided by this application is deployed on a terminal as an example, the terminal may include a mobile phone, a camera, a monitoring device, or another device that has a shooting function or is connected to a camera. For example, as shown in Figure 3, after an image is captured, in order to reduce the storage space occupied by the image, the image can be losslessly compressed by the encoding method provided by this application to obtain compressed encoded data. When the image needs to be read, for example when it is displayed in a photo album, it can be decoded by the decoding method provided by this application to obtain the high-definition image. With the method provided by this application, images can be efficiently and losslessly compressed, the storage required to save an image is reduced, and the image can be losslessly restored by decompression to obtain the high-definition image.
Scenario 2: Image transmission
Some communication scenarios may involve image transmission. For example, as shown in Figure 4, when users communicate using communication software, images can be transmitted over wired or wireless networks. In order to increase the transmission rate and reduce the network resources occupied by the transmitted images, the image can be losslessly compressed by the encoding method provided by this application to obtain compressed encoded data, and the encoded data is then transmitted. After receiving the encoded data, the receiving end can decode it by the decoding method provided by this application to obtain the restored image.
Scenario 3: A server storing a large number of images
Some platforms that provide services to users, and some databases, usually need to store a large number of high-definition images. If every frame were stored directly pixel by pixel, a very large amount of storage space would be required. For example, as shown in Figure 5, some shopping software or public data sets need to store a large number of high-definition images on a server, and users can read the required images from the server. The encoding method provided by this application can be used to efficiently and losslessly compress the images that need to be stored to obtain compressed data. When an image needs to be read, the stored encoded data can be decoded by the decoding method provided by this application to obtain the high-definition image.
For ease of understanding, the processes of the encoding method and the decoding method provided by this application are introduced separately below.
Referring to Figure 6, a schematic flowchart of an encoding method provided by this application is as follows.
601. Use the input image as the input of the autoregressive model and output a first image.
The input image may be an image to be compressed, and the autoregressive model may be used to predict the pixel value of the current pixel using the values of pixels in the input image other than the current pixel, so as to obtain the predicted pixel distribution of each pixel, that is, the first image.
The input image may include various kinds of images, and the source of the input image may differ depending on the scenario. For example, the input image may be a captured image, a received image, or the like.
Optionally, during prediction by the autoregressive model, the pixels on the same line can be predicted using the pixel values of pixels that have already been predicted, so that in the subsequent decoding process a pixel on that line can be decoded without waiting for the other pixels on the line to be decoded first. This allows the pixels on the same line to be decoded in parallel and improves the decoding efficiency for the input image. The same line may be the same row, the same column, the same diagonal, or the like, which can be determined according to the actual application scenario.
602. Obtain the residual between the first image and the input image to obtain a first residual image.
After the first image is obtained, the residual value between each pixel in the first image and the corresponding pixel in the input image can be calculated to obtain the first residual image.
The first image and the input image usually have the same resolution, that is, the pixels in the first image correspond one-to-one to the pixels in the input image. Therefore, when calculating the residual, the residual value between each pair of corresponding pixels can be computed, and the resulting residual values form an image, namely the first residual image.
Optionally, when calculating the residual, the residual values are usually integers in the range [-255, 255]. The residual values can be converted to a low-precision numerical type, for example the uint8 type, so that the values are reduced to [0, 255]; by setting an offset, the residual values of the pixels can be distributed around 128, making the data more concentrated, so that the residual distribution between the input image and the output image of the autoregressive model can be represented with less data.
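As a minimal illustrative sketch only (not part of the claimed method), the following Python snippet shows one way such an offset mapping could be implemented; the wrap-around modulo 256 and the function names are assumptions made for this example.

```python
import numpy as np

def residual_to_uint8(original: np.ndarray, predicted: np.ndarray) -> np.ndarray:
    """Map the signed residual (original - predicted) to uint8 values centered near 128."""
    diff = original.astype(np.int16) - predicted.astype(np.int16)
    # Offset by 128 and wrap modulo 256: small residuals land near 128.
    return ((diff + 128) % 256).astype(np.uint8)

def reconstruct_from_residual(predicted: np.ndarray, residual_u8: np.ndarray) -> np.ndarray:
    """Recover the original pixels from the prediction and the uint8 residual."""
    # Undo the offset and the wrap; valid because original pixels lie in [0, 255].
    restored = predicted.astype(np.int16) + residual_u8.astype(np.int16) - 128
    return (restored % 256).astype(np.uint8)
```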
603. Use the input image as the input of the autoencoding model, and output a latent variable and a first residual distribution.
After the input image is obtained, the input image can also be used as the input of the autoencoding model to output the corresponding latent variable and first residual distribution.
The latent variable may include features extracted from the input image, and the first residual distribution may include the residual values, predicted by the autoencoding model, between each pixel of the input image and the corresponding pixel of the first residual image.
Specifically, the autoencoding model may include an encoding model and a decoding model. The encoding model may be used to extract features from the input image, and the decoding model may be used to predict the residual between the input image and the image output by the autoregressive model. That is, features can be extracted from the input image by the encoding model to obtain a latent variable representing the important features of the input image, and the latent variable is used as the input of the decoding model to output the first residual distribution.
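For illustration only, a minimal sketch of such an encoder/decoder pair is given below; the layer configuration, channel counts, and class name are assumptions and do not reflect the exact network used in this application.

```python
import torch
import torch.nn as nn

class ResidualDistributionAE(nn.Module):
    """Schematic autoencoder: the encoder maps the image to a latent code, and the
    decoder predicts per-pixel residual distribution parameters (mean, log-scale)."""
    def __init__(self, channels: int = 3, hidden: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, hidden, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, stride=2, padding=1),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(hidden, hidden, 4, stride=2, padding=1), nn.ReLU(),
            # Two output channels per image channel: mean and log-scale of the residual.
            nn.ConvTranspose2d(hidden, 2 * channels, 4, stride=2, padding=1),
        )

    def forward(self, x: torch.Tensor):
        latent = self.encoder(x)              # latent variable to be entropy-coded
        params = self.decoder(latent)         # predicted residual distribution parameters
        mu, log_scale = params.chunk(2, dim=1)
        return latent, mu, log_scale
```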
It should be noted that this application does not limit the execution order of step 601 and step 603. Step 601 may be performed first, step 603 may be performed first, or step 601 and step 603 may be performed at the same time, which can be adjusted according to the actual application scenario.
604. Encode the first residual image and the first residual distribution to obtain residual encoded data.
After the first residual image and the first residual distribution are obtained, the first residual image and the first residual distribution can be encoded to obtain the residual encoded data.
Specifically, when encoding the first residual image and the first residual distribution, semi-dynamic entropy coding can be used, that is, a limited set of probability distributions is used for encoding to obtain the encoded data of the residual image, namely the residual encoded data. The semi-dynamic entropy coder performs entropy coding using encoding operations of a first preset type, where the first preset type includes addition, subtraction, or bit operations; the semi-dynamic entropy coder does not include encoding operations of a second preset type, where the second preset type includes at least one of the more time-consuming multiplication, division, or modulo operations, so as to improve coding efficiency. Therefore, in the embodiments of this application, a limited number of probability distributions can be used for encoding to obtain the encoding of the residual image. In dynamic entropy coding, decompressing a character requires more instructions, and division and exponentiation are time-consuming, each such instruction taking tens of times as long as an addition; the semi-dynamic entropy coding with a limited set of probability distributions provided by this application therefore achieves efficient coding and improves coding efficiency.
In a possible implementation, the semi-dynamic entropy coder may be obtained by transforming a dynamic entropy coder. Specifically, the operations of the dynamic entropy coder can first be approximated, for example by replacing them with approximate operations so that multiplication, division, modulo, and similar operations are reduced or removed; transformation processing can then be performed on the operations, so that all operations whose time consumption exceeds a certain threshold (such as the remaining modulo, multiplication, and division operations) are converted into table lookups and lightweight operations such as addition, subtraction, and bit operations, yielding the semi-dynamic entropy coder provided by this application. It can be understood that the semi-dynamic entropy coder may be an entropy coder obtained by replacing or transforming some operations of a dynamic entropy coder; when the semi-dynamic entropy coder is used for entropy coding, only simple and efficient operations, such as addition, subtraction, and bit operations, are used, thereby achieving efficient coding.
605. Encode the latent variable to obtain latent variable encoded data.
The latent variable may include important features extracted from the input image. Therefore, during image compression, the extracted important features can be encoded to obtain the latent variable encoded data, so that the image can subsequently be restored losslessly.
Optionally, when encoding the latent variable, static entropy coding may be used: the latent variable is used as the input of the static entropy coder, which outputs the encoded bitstream of the latent variable.
The latent variable encoded data and the residual encoded data can then be used by the decoding end to restore the image losslessly, thereby achieving lossless compression and restoration of the image.
Generally, an autoencoding model has relatively poor fitting capability and needs a deep network to achieve a good compression rate, whereas this application combines the output of the autoregressive model, so the size of the autoencoding model can be reduced. Therefore, in this application, the outputs of the autoregressive model and the autoencoding model are combined for encoding, so that both the autoencoding model and the autoregressive model can be kept very small, avoiding the excessive inference time caused by an overly large autoencoding network and achieving efficient image compression. Moreover, in the method provided by this application, the entire process, including the AI models and the entropy coding, can be implemented as AI lossless compression on an AI chip, avoiding the transfer problem between system memory and AI chip memory and improving coding efficiency.
The process of the encoding method provided by this application has been introduced above. The process is described in more detail below with reference to specific application scenarios. Referring to Figure 7, a schematic flowchart of another encoding method provided by this application is shown.
First, an input image 701 is obtained.
The input image 701 may include an image collected by the device itself or a received image. For example, if the method provided by this application is deployed on a terminal, the input image may include an image collected by the terminal, or an image received by the terminal from another server or terminal.
Then, the input image 701 is used as the input of the autoregressive model 702, and a predicted image 703 is output.
The autoregressive model can be used to predict the pixel probability distribution of each pixel using the pixels adjacent to it, obtaining the predicted image 703, that is, the aforementioned first image.
It can be understood that the autoregressive model can use the pixel values of adjacent pixels to predict the pixel value of the current pixel.
In the embodiments of this application, in order to speed up decoding at the decoding end, when the autoregressive model performs prediction, the pixels on the same line can be predicted in parallel using the pixel values of their adjacent pixels. Taking a specific autoregressive model as an example, as shown in Figure 8, given an m×n image and a hyperparameter h (0≤h<n), if for any pixel (i,j) all points (i′,j′) used by the autoregressive model to predict (i,j) satisfy h×i′+j′ < h×i+j, then the image can be predicted in n+(m-1)×h parallel steps. As shown in Figure 8, when h=1, for pixels on the same diagonal, the pixel values of several pixels to the left, selected with a step of 1, can be used as the receptive field to predict the pixel probability distribution of the current pixel, that is, the probability of the pixel taking each possible value; when h=2, the pixel values of several pixels to the left, selected with a step of 2, can be used as the receptive field to predict the pixel probability distribution of the current pixel. In this way, pixels on the same diagonal can be decompressed in parallel during subsequent decompression.
In addition, the prediction order of the pixels can be as shown in Figure 9, where a smaller number indicates a higher prediction priority and pixels with the same number are predicted at the same time. Therefore, pixels on the same diagonal can be predicted in parallel, improving the prediction efficiency of the autoregressive model.
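The following sketch illustrates the grouping implied by the condition h×i′+j′ < h×i+j: pixels that share the same value of h×i+j form one parallel step, and there are n+(m-1)×h steps in total. The function name is chosen for illustration only.

```python
def wavefront_schedule(m: int, n: int, h: int):
    """Group the pixel coordinates of an m x n image into parallel prediction steps.

    Pixels (i, j) with equal h*i + j can be predicted together, because every pixel
    (i', j') they depend on satisfies h*i' + j' < h*i + j.
    """
    steps = [[] for _ in range(n + (m - 1) * h)]
    for i in range(m):
        for j in range(n):
            steps[h * i + j].append((i, j))
    return steps

# Example: a 3 x 4 image with h = 1 gives 4 + 2*1 = 6 parallel steps,
# each step containing the pixels on one anti-diagonal.
for t, group in enumerate(wavefront_schedule(3, 4, 1)):
    print(t, group)
```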
Subsequently, the residual between the predicted image and the input image is calculated to obtain the image residual 704.
After the predicted image 703 output by the autoregression is obtained, the residual between each pixel of the predicted image and the corresponding pixel of the input image can be calculated to obtain the image residual 704, that is, the aforementioned first residual image.
For example, given the original image x (that is, the input image), the autoregressive model predicts the original image to obtain the predicted reconstructed image x′, and the image residual r = x − x′ between each pixel of the original image and the reconstructed image can then be calculated.
For example, as shown in Figure 10, after the input image and the predicted image are obtained, the difference between corresponding pixels of the input image and the predicted image can be calculated to obtain the residual value of each pixel, and these residual values form the residual image.
Optionally, when calculating the residual, the integer residual values in the range [-255, 255] can be converted to a low-precision numerical type, for example the uint8 type, so that the values are reduced to [0, 255]; by setting an offset, the residual values of the pixels can be distributed around 128, making the data more concentrated, so that the residual distribution between the input image and the output image of the autoregressive model can be represented with less data.
For example, the autoregressive model takes the original image x as input and outputs y; the predicted image is then x′ = round(clip(y, 0, M−1)), where each pixel of x′ is an integer in the range 0 to M−1, and the residual is calculated as r = x − x′. The second model is used to predict r and obtain a distribution N(μ, σ), and r is then encoded using this distribution, where N is a Gaussian distribution or a logistic distribution.
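As an illustrative sketch of this step, the snippet below computes x′ = round(clip(y, 0, M−1)), the residual r = x − x′, and the probability of r under a discretized Gaussian N(μ, σ); the discretization into unit-width bins is an assumption made for this example, and the second model corresponds to the autoencoding model described below.

```python
import numpy as np
from scipy.stats import norm

def residual_and_probability(x, y, mu, sigma, M=256):
    """Compute the prediction residual and its probability under a discretized Gaussian.

    x:         original pixels (integers in [0, M-1])
    y:         raw autoregressive output (floats)
    mu, sigma: residual distribution parameters predicted by the second model
    """
    x_pred = np.rint(np.clip(y, 0, M - 1)).astype(np.int32)   # x' = round(clip(y, 0, M-1))
    r = x.astype(np.int32) - x_pred                           # residual r = x - x'
    # Discretize the continuous Gaussian into unit-width bins [r-0.5, r+0.5).
    p = norm.cdf(r + 0.5, loc=mu, scale=sigma) - norm.cdf(r - 0.5, loc=mu, scale=sigma)
    return r, p
```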
In addition, the input image is also fed into the autoencoding model 705, which outputs the prediction residual 707 and the latent variable 706.
For example, the original image x can be fed into the autoencoding model, and the autoencoding model is used to estimate the probability distribution p(r|x) of the image residual r, that is, the prediction residual 707.
Specifically, the autoencoding model may include an encoding model (encoder) and a decoding model (decoder). The input image is used as the input of the encoding model, which extracts important features from the input image to obtain the latent variable 706; the latent variable is then used as the input of the decoding model, which outputs the prediction residual 707.
Generally, the autoencoding model may be a pre-trained model; specifically, an autoencoder (AutoEncoder, AE), a variational autoencoder (Variational AutoEncoder, VAE), a VQ-VAE (Vector Quantised-Variational AutoEncoder), or the like may be used, which can be adjusted according to the actual application scenario and is not limited in this application.
Subsequently, the latent variable 706 can be encoded to obtain the latent variable encoding 708.
Specifically, the latent variable can be encoded with static entropy coding, that is, a tree structure is used so that data with high probability is represented with fewer bits and data with low probability is represented with more bits.
For example, the tree structure can be as shown in Figure 11, and the corresponding bits can be represented as shown in Table 1.
Table 1
Therefore, the data a1a2a1a4 is encoded as 0100110.
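For illustration, the sketch below uses a hypothetical prefix-code table that is consistent with this example (a1→0, a2→10, a4→110); the code assigned to a3 is an assumption, since the contents of Table 1 are not reproduced here.

```python
# Hypothetical prefix-code table consistent with the example in the text:
# encoding the sequence a1 a2 a1 a4 with this table yields "0100110".
CODE_TABLE = {"a1": "0", "a2": "10", "a3": "111", "a4": "110"}

def static_entropy_encode(symbols):
    """Concatenate the prefix code of each symbol (static entropy coding)."""
    return "".join(CODE_TABLE[s] for s in symbols)

def static_entropy_decode(bits):
    """Greedily match prefix codes in the bitstream (valid because the code is prefix-free)."""
    inverse = {code: sym for sym, code in CODE_TABLE.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return out

assert static_entropy_encode(["a1", "a2", "a1", "a4"]) == "0100110"
assert static_entropy_decode("0100110") == ["a1", "a2", "a1", "a4"]
```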
In addition, the image residual 704 and the prediction residual 707 can also be encoded to obtain the residual encoding 709.
Specifically, semi-dynamic entropy coding can be performed on the image residual 704 and the prediction residual 707 to obtain the residual encoding.
For ease of understanding, the difference between dynamic entropy coding and the semi-dynamic entropy coding provided by this application is explained below.
First, taking rANS coding as an example, dynamic coding uses a state (usually a large integer or a high-precision fraction) to represent the data and updates the state value using the probability information of the data; the final encoded value is the 0/1 representation of the state. In rANS coding, a value M must first be set, representing the number of bits used to represent a probability. For a character a_i, its corresponding PMF_i is proportional to its probability, and the PMF values sum to 2^M; its corresponding CDF_i is the accumulation of all previous PMF values, that is, PMF_1 + PMF_2 + ... + PMF_(i-1). In the table above, if M = 4, the PMF and CDF corresponding to the probability values are as shown in Table 2:
Table 2
If the states before and after compressing a character x are S and S' respectively, then
S' = (S / PMF(x)) * 2^M + CDF(x) + S % PMF(x)
where / denotes integer division and % denotes the remainder.
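A minimal sketch of this encoding step is shown below, assuming PMF(x) and CDF(x) are already scaled integers whose PMF values sum to 2^M; the numeric values are chosen only to illustrate the formula and do not come from Table 2.

```python
def rans_encode_symbol(state: int, pmf: int, cdf: int, M: int) -> int:
    """One rANS encoding step: S' = floor(S / PMF(x)) * 2^M + CDF(x) + S mod PMF(x)."""
    return (state // pmf) * (1 << M) + cdf + state % pmf

# Tiny illustration with M = 4 and an assumed symbol having PMF(x) = 3, CDF(x) = 5:
# starting from state S = 100, the next state is (100 // 3) * 16 + 5 + 100 % 3 = 534.
print(rans_encode_symbol(100, 3, 5, 4))   # -> 534
```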
Dynamic entropy coding can also be used as static entropy coding: when the values in the table are fixed, it is static entropy coding; when the tables for different symbols are not exactly the same, dynamic entropy coding is needed.
The speed bottlenecks in dynamic entropy coding include the symbol search during decompression and the arithmetic operations, among which division and modulo are the most time-consuming, followed by multiplication. Therefore, to address the efficiency loss caused by the unlimited probability distributions in dynamic entropy coding, this application provides semi-dynamic entropy coding. Based on the aforementioned dynamic entropy coding, that is, the rANS encoding formula, approximation is performed first, for example by replacing the multiplication, division, and modulo operations in dynamic entropy coding with approximate lightweight operations such as addition, subtraction, and bit operations, which greatly reduces or removes the multiplication, division, and modulo operations at the cost of a small loss in compression rate; then, through a series of transformations, all operations whose time consumption exceeds a certain threshold (such as the remaining modulo, multiplication, and division operations) are converted into table lookups and lightweight operations such as addition, subtraction, and bit operations. It can be understood that the semi-dynamic entropy coding provided by this application removes, through algorithmic transformation and tabulation, all time-consuming operations such as symbol search, multiplication, division, and modulo, achieving a throughput comparable to that of static entropy coding.
For example, similar to common rANS implementations, the state value S is truncated and approximated, but the differences include the following:
Unlike common rANS, which truncates S to [2^M, 2^(2M)), giving 2^(2M) − 2^M states in total, this scheme truncates S to [2^M, 2^(M+1)), giving 2^M states in total, so as to achieve a smaller state space and facilitate the subsequent tabulation.
Unlike common rANS, which uses division and modulo calculations, this scheme replaces them with an approximate solution based on a loop plus bit operations, so as to further reduce the storage space required for tabulation. The loop in this calculation is time-consuming, so after this step the time consumption usually exceeds that of the original rANS; in the subsequent processing, however, the number of loop iterations is tabulated to achieve efficient compression and decompression.
During compression, for each distribution and symbol, a table is used to precompute and store the number of loop iterations (that is, the number of right shifts of the state) and the difference between the next state and the current state under that distribution and symbol. When performing compression, for each input distribution index and symbol, the corresponding δ is obtained by table lookup, and the number of right shifts of the state is computed as b = (δ + S) >> M; the rightmost b bits of the state are pushed onto the stack in memory, and the state value is shifted right by b bits; then, using the distribution index and the symbol, the difference between the next state and this state is obtained by table lookup, and this difference is added to the current state value to obtain the updated state value.
Compared with directly storing the number of loop iterations, this scheme stores the intermediate result δ, from which the number of iterations can be computed as (δ + S) >> M, so the encoding scheme provided by this application reduces the memory space required to store the table. Compared with directly storing the difference between the two states, the semi-dynamic entropy coding provided by this application stores the difference between the two states after the state has been shifted right; this value can be stored as an unsigned number, halving the memory space for the same number of bits.
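The following sketch illustrates the table-driven encoding step described above; the contents of the two lookup tables depend on a precomputation that is not fully specified here, so they appear only as placeholders, and the function and parameter names are assumptions.

```python
def semidynamic_encode_step(state, dist_idx, symbol, delta_table, diff_table, M, bit_stack):
    """One table-driven encoding step using only table lookups, additions and shifts.

    delta_table[dist_idx][symbol] holds the stored intermediate value delta, from which
    the shift count is computed as b = (delta + S) >> M; diff_table[dist_idx][symbol]
    holds the difference between the next state and the right-shifted current state.
    """
    delta = delta_table[dist_idx][symbol]
    b = (delta + state) >> M                        # number of state bits to emit
    bit_stack.append((state & ((1 << b) - 1), b))   # push the rightmost b bits of the state
    state >>= b                                     # shift the state right by b bits
    state += diff_table[dist_idx][symbol]           # add the stored state difference
    return state
```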
After the residual encoding 709 and the latent variable encoding 708 are obtained, subsequent operations can be performed, such as saving the residual encoding 709 and the latent variable encoding 708, or transmitting the residual encoding 709 and the latent variable encoding 708 to the receiving end, which can be determined according to the actual application scenario.
Therefore, the method provided by the embodiments of this application can be applied to lossless image compression and achieves efficient lossless image compression. It also provides an efficient semi-dynamic entropy coder, so that both model inference and the coding process run on the AI chip, reducing the transfers between system memory and AI chip memory and achieving high-bandwidth compression and decompression.
The process of the encoding method provided by this application has been introduced above. The process of the corresponding decoding method, that is, the inverse of the aforementioned encoding process, is introduced below. Referring to Figure 12, a schematic flowchart of a decoding method provided by this application is as follows.
1201. Obtain latent variable encoded data and residual encoded data.
The decoding end may read the latent variable encoded data and the residual encoded data locally, or receive the latent variable encoded data and the residual encoded data sent by the encoding end; the source of the latent variable encoded data and the residual encoded data can be determined according to the actual application scenario and is not limited in this application.
Specifically, the latent variable encoded data may be obtained by the encoding end by encoding the features extracted from the input image. The residual encoded data may be obtained by the encoding end by encoding the aforementioned image residual and prediction residual, where the image residual may include the residual between the input image at the encoding end and the image output by the autoregressive model. For the latent variable encoded data and the residual encoded data, reference can be made to the related descriptions in Figures 6 to 11 above, which are not repeated here.
1202. Decode the latent variable encoded data to obtain the latent variable.
The way the latent variable encoded data is decoded can correspond to the encoding end. For example, if the encoding end uses a static entropy coder for encoding, a static entropy coder can be used for decoding: the latent variable encoded data is used as the input of the static entropy coder, which outputs the latent variable. The latent variable may include features extracted from the input image; for the decompression end, the latent variable represents the features of the decompressed image.
1203. Use the latent variable as the input of the autoencoding model and output a second residual distribution.
After the latent variable is obtained by decoding the latent variable encoded data, the latent variable can be used as the input of the autoencoding model to output the corresponding second residual distribution, that is, the counterpart of the first residual distribution at the encoding end, which can be understood as representing the residual distribution between the image output by the autoregressive model at the encoding end and the input image.
Specifically, the autoencoding model may include a decoding model, and the predicted residual image is output by using the latent variable as the input of the decoding model. The decoding model may be a trained model used to output the residual image corresponding to the input image, where the residual image can be understood as the residual values between the image predicted by the autoregressive model and the input image.
It should be noted that both the encoding end and the decoding end deploy an autoregressive model and an autoencoding model, and the autoregressive model on the encoding end is the same as that on the decoding end. If the encoding end and the decoding end are deployed in the same device, the autoencoding models of the two ends are the same; if they are deployed in different devices, the encoding end and the decoding end may deploy the same autoencoding model, or the complete autoencoding model may be deployed on the encoding end while only the decoding model of the autoencoding model is deployed on the decoding end, which can be adjusted according to the actual application scenario and is not limited in this application.
1204. Perform decoding by combining the second residual distribution and the residual encoded data to obtain a second residual image.
After the second residual distribution and the residual encoded data are obtained, the second residual distribution and the residual encoded data can be combined for decoding to obtain the second residual image.
Specifically, if the encoding end uses semi-dynamic entropy coding for encoding, the decoding end can also decode based on semi-dynamic entropy coding and output the second residual image, that is, the image corresponding to the first residual image at the encoding end. The semi-dynamic entropy coder performs entropy coding using encoding operations of a first preset type, where the first preset type includes addition, subtraction, or bit operations, and the semi-dynamic entropy coder does not include encoding operations of a second preset type, where the second preset type includes at least one of multiplication, division, or modulo operations. In other words, the semi-dynamic entropy coder does not include time-consuming operations such as multiplication, division, or modulo, and may include only simple addition and subtraction operations, thereby achieving efficient coding.
More specifically, for the semi-dynamic entropy coder, reference can be made to the related descriptions in Figures 6 to 11 above, which are not repeated here.
It can be understood that, corresponding to the process in which the encoding end encodes the first residual image and the first residual distribution to obtain the residual encoded data, after the decoding end obtains the second residual distribution and the residual encoded data, the inverse operation can be performed to derive the second residual image, which is equivalent to obtaining the residual between the first image output by the autoregressive model at the encoding end and the input image, that is, the first residual image.
1205. Use the second residual image as the input of the back propagation of the autoregressive model and output the decompressed image.
After the second residual image is obtained, the second residual image can be used as the input of the autoregressive model for back propagation to derive the decompressed image, that is, to losslessly restore the input image at the encoding end.
In addition, when the second residual image is used as the input of the autoregressive model for back propagation, if the autoregressive model at the encoding end uses the pixel values of the already-predicted pixels to predict the values of the pixels on the same line, then, when the decoding end performs the decoding operation, the values of the pixels on the same line can be decoded in parallel, achieving efficient decoding. The same line may be the same row, the same column, the same diagonal, or the like, which can be determined according to the actual application scenario.
Therefore, in the embodiments of this application, since an autoencoding model usually has poor fitting capability and needs a deep network to achieve a good compression rate, this application combines the output of the autoregressive model so that the size of the autoencoding model can be reduced. In this application, the autoregressive model and the autoencoding model are combined for decoding, so that both models can be kept very small, avoiding the excessive inference time caused by an overly large autoencoding network and achieving efficient image decompression. Moreover, in the method provided by this application, the entire process, including the AI models and the entropy coding, can be implemented as AI lossless compression on an AI chip, avoiding the transfer problem between system memory and AI chip memory and improving coding efficiency.
For ease of understanding, the process of the decoding method provided by this application is introduced below with reference to specific application scenarios. Referring to Figure 13, a schematic flowchart of another decoding method provided by this application is as follows.
First, the latent variable encoding 1301 and the residual encoding 1302 are obtained.
The latent variable encoding 1301 and the residual encoding 1302 may be read locally or received from the encoding end, which can be adjusted according to the actual application scenario. For example, the latent variable encoding 1301 and the residual encoding may be the latent variable encoding 708 and the residual encoding 709 mentioned in Figure 7 above.
Then, the latent variable encoding 1301 is fed into the static entropy coder 1303, which outputs the latent variable 1304.
Generally, the bits corresponding to the probabilities in entropy coding can be as shown in Table 1 above. After the encoded bitstream of the latent variable is obtained, the probability corresponding to each character can be determined from this correspondence, and the latent variable is output; the latent variable can be understood as the important features of the decompressed image.
The latent variable 1304 is then used as the input of the decoding model in the autoencoding model 1305, which outputs the prediction residual 1306.
The decoding model is similar to the decoding model in Figure 7 above and is not described again here. The prediction residual 1306 is similar to the aforementioned prediction residual 707 and is not described again here.
Then, both the residual encoding 1302 and the prediction residual 1306 are used as inputs of the semi-dynamic entropy coder, which outputs the image residual 1308.
The image residual 1308 is similar to the aforementioned image residual 704 and is not described again here.
The decoding process of semi-dynamic entropy coding can be understood as the inverse of the aforementioned semi-dynamic entropy encoding, that is, given the prediction residual and the residual encoding, the image residual is derived in reverse. For example: compute the slot value of the current symbol, s = S' % 2^M; find the symbol x corresponding to s, which must satisfy CDF(x) ≤ s < CDF(x) + PMF(x), and decompress x; then, according to the decompressed symbol x, restore the state value of the previous step: S = S' / 2^M * PMF(x) + S' % 2^M − CDF(x).
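A sketch of this decoding step is given below; it inverts the encoding formula S' = floor(S/PMF(x)) * 2^M + CDF(x) + S mod PMF(x) and assumes a precomputed lookup table that maps each slot s in [0, 2^M) to its symbol, PMF and CDF, replacing the symbol search with a table access.

```python
def rans_decode_symbol(state: int, lookup, M: int):
    """One rANS decoding step, inverting the encoding step sketched earlier.

    `lookup` maps a slot s in [0, 2^M) to (symbol, PMF(symbol), CDF(symbol)) of the
    unique symbol x satisfying CDF(x) <= s < CDF(x) + PMF(x).
    """
    s = state & ((1 << M) - 1)                   # s = S' mod 2^M
    symbol, pmf, cdf = lookup[s]                 # table lookup instead of a symbol search
    prev_state = (state >> M) * pmf + s - cdf    # S = floor(S'/2^M) * PMF(x) + s - CDF(x)
    return symbol, prev_state

# Continuing the illustrative numbers from the encoding sketch (M = 4, PMF(x) = 3,
# CDF(x) = 5): slot s = 534 mod 16 = 6 lies in [5, 8), so the symbol is recovered
# and the previous state is (534 >> 4) * 3 + 6 - 5 = 100.
```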
After the image residual 1308 is obtained, the image residual can be used as the input of the back propagation of the autoregressive model 1309 to derive the decompressed image 1310.
It can be understood that the autoregressive model 1309 is a trained model, the same as the aforementioned autoregressive model 702; that is, given the image residual, the input image 701 is derived in reverse.
Optionally, if the encoding end, when outputting the prediction residual through the autoregressive model, uses the pixel values of already-predicted pixels to predict in parallel the pixel values of the pixels on the same line, then, during the back propagation of the autoregressive model, the pixel values of the pixels on the same line can be decoded together, achieving parallel decoding.
For example, given an m×n image and a hyperparameter h (0≤h<n), if for any pixel (i,j) all points (i′,j′) used by the autoregressive model to predict (i,j) satisfy h×i′+j′ < h×i+j, then the image can be decompressed in n+(m-1)×h parallel computations. The decompression order is as follows:
顺次解压第一行中的点:(0,0),(0,1),...,(0,n-1)。在解压(0,j)点的同时,若j-h≥0,则同时解压(1,j-h);若j-h×2≥0,则同时解压(2,j-h×2),依此类推;Extract the points in the first row sequentially: (0,0),(0,1),...,(0,n-1). While decompressing the (0,j) point, if j-h≥0, then decompress (1,j-h) at the same time; if j-h×2≥0, then decompress (2,j-h×2) at the same time, and so on;
顺次解压第二行中的点:(1,n-h-1),...,(1,n-1)。解压(1,j)点的同时,若j-h≥0,Extract the points in the second row sequentially: (1,n-h-1),...,(1,n-1). While decompressing the (1,j) point, if j-h≥0,
则同时解压(2,j-h);若j-h×2≥0,则同时解压(3,j-h×2),依此类推;Then decompress (2,j-h) at the same time; if j-h×2≥0, then decompress (3,j-h×2) at the same time, and so on;
按照此规律解压,直到解压所有点。Decompress according to this rule until all points are decompressed.
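The schedule above can be stated compactly: pixel (i, j) becomes decodable at step t = h×i + j, so all pixels sharing the same value of h×i + j are mutually independent and can be processed together. The following is a minimal Python sketch of this wavefront schedule, assuming a hypothetical per-pixel decoder decode_pixel that stands in for the inverse of the autoregressive model; it illustrates the ordering only, not the actual implementation.

```python
# Sketch of the wavefront schedule: with hyperparameter h, pixel (i, j) only
# depends on pixels (i', j') with h*i' + j' < h*i + j, so every set
# {(i, j) : h*i + j == t} can be decoded in parallel. `decode_pixel` is a
# hypothetical per-pixel decoder standing in for the autoregressive inverse.

def parallel_decode(m, n, h, decode_pixel):
    decoded = [[None] * n for _ in range(m)]
    # Steps t = 0 .. (n - 1) + (m - 1) * h, i.e. n + (m - 1) * h parallel rounds
    for t in range(n + (m - 1) * h):
        wave = [(i, t - h * i) for i in range(m) if 0 <= t - h * i < n]
        # In a real implementation this loop body runs in parallel on the AI chip
        for i, j in wave:
            decoded[i][j] = decode_pixel(i, j, decoded)
    return decoded


# Toy usage: 3x4 image, h = 2; the stand-in decoder just records pixel coordinates
out = parallel_decode(3, 4, 2, lambda i, j, d: (i, j))
```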
Therefore, the manner of encoding and decoding pixels on a same line in parallel provided in this application can greatly improve encoding and decoding efficiency and achieve more efficient image compression.
For ease of understanding, the effects achieved by this application are described below by way of example with reference to some specific application scenarios.
First, a neural network model with the autoregressive model and the auto-encoding model at its core needs to be constructed. In this technical solution the autoregressive model adopts a lightweight design and contains only 12 parameters; for a three-channel image, only 4 parameters per channel are needed for prediction. The auto-encoder model is a vector-quantized auto-encoder, which uses a vector codebook to shrink the space of the latent variable; the codebook size is set to 256, that is, the value space of the latent variable in the auto-encoder is limited to 256 integers. Both the encoder and the decoder of the auto-encoder use four residual convolution blocks, and the number of feature channels in each layer is 32.
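As an illustration of how small such a predictor can be, the following PyTorch sketch assumes that each channel is predicted as a linear combination of its left, top and top-left neighbours plus a bias (3 weights + 1 bias = 4 parameters per channel, 12 in total for RGB). This choice of neighbours is an assumption made for illustration; the application does not specify the exact parameterisation, and the vector-quantized auto-encoder side is not sketched here.

```python
import torch
import torch.nn as nn

class TinyAutoregressivePredictor(nn.Module):
    """Illustrative 4-parameters-per-channel causal predictor (assumption:
    a linear combination of the left, top and top-left neighbours plus a
    bias per channel; missing neighbours at the border are treated as 0)."""

    def __init__(self, channels=3):
        super().__init__()
        self.weights = nn.Parameter(torch.full((channels, 3), 1.0 / 3))  # left, top, top-left
        self.bias = nn.Parameter(torch.zeros(channels))

    def forward(self, x):                              # x: (B, C, H, W)
        pad = nn.functional.pad(x, (1, 0, 1, 0))       # one column on the left, one row on top
        left = pad[:, :, 1:, :-1]
        top = pad[:, :, :-1, 1:]
        top_left = pad[:, :, :-1, :-1]
        neighbours = torch.stack([left, top, top_left], dim=2)   # (B, C, 3, H, W)
        w = self.weights.view(1, -1, 3, 1, 1)
        return (w * neighbours).sum(dim=2) + self.bias.view(1, -1, 1, 1)


# Toy usage: predict a random 8x8 RGB image; 3*3 weights + 3 biases = 12 parameters
pred = TinyAutoregressivePredictor()(torch.rand(1, 3, 8, 8))
```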
The model training process and the testing process are as follows:
Training: training is performed on the training set of a single data set to obtain the parameters of the autoregressive model and the auto-encoding model, as well as the statistics of the latent variable, which are used for compressing the latent variable.
Compression: with the method provided in this application, all the test images of a single data set are stacked together along the batch dimension to form a four-dimensional tensor. This four-dimensional tensor is fed into the pipeline in one pass, and the residual codes of all the images and the codes of the latent variables are output in parallel (a minimal sketch of the stacking step is given below).
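A minimal sketch of this stacking step, assuming NumPy arrays of identical shape (H, W, C); the placeholder data is illustrative only.

```python
import numpy as np

# Stack N same-sized test images (H, W, C) along a new batch axis, giving the
# (N, H, W, C) tensor that is fed to the compression pipeline in one pass.
images = [np.zeros((32, 32, 3), dtype=np.uint8) for _ in range(8)]  # placeholder data
batch = np.stack(images, axis=0)                                    # shape (8, 32, 32, 3)
```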
Decompression: with the method provided in this application, the residual codes and latent variables of all the images are used as the input of the decompression pipeline in one pass, and the original images of all the images are output in parallel.
For comparison with some commonly used lossless compression methods, such as L3C (Practical Full Resolution Learned Lossless Image Compression), FLIF (Free Lossless Image Format based on MANIAC compression), WebP and PNG (Portable Network Graphics), the method provided in this application is referred to as PILC (Practical Image Lossless Compression); see Table 3.
Table 3
As can be seen from Table 3, compared with the previous AI lossless image compression algorithm L3C, this technical solution improves throughput by a factor of 14 while keeping the compression ratio essentially the same; at the same time, it also outperforms traditional methods such as PNG, WebP and FLIF in both compression ratio and throughput.
Therefore, the method provided in this application combines the autoregressive model with the auto-encoding model, which greatly reduces the model size compared with using the auto-encoding model alone. In addition, the autoregressive model provided in this application supports parallel encoding and parallel decompression, so that efficient encoding and decoding, and thus efficient image compression and decompression, can be achieved. Moreover, the entire pipeline of the method provided in this application can run on an AI chip, which avoids transferring information between the system memory and the AI chip memory and further improves encoding and decoding efficiency.
In addition, in a real production environment the sizes of images generally differ and the resolution of the images is relatively high. In order to compress and decompress large high-definition images of different sizes efficiently, this embodiment is designed as follows.
Model training: in the model training stage, large high-definition data sets such as OpenImage and ImageNet64 are used for model training to obtain the parameters of the autoregressive model and the auto-encoding model.
Compression:
First, the images are preprocessed: large high-definition images of different sizes are uniformly sliced into tiles of the same size (for example 32×32), and the size information of each image is stored separately for restoring the image (a sketch of this preprocessing follows this list);
All slices are stacked together along the batch dimension as the input of the pipeline;
The residual codes of all the images and the codes of the latent variables are output in parallel;
The statistics of the latent variables of each data set (same data set) / each image (different data sets) are recorded as another output of the pipeline.
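The preprocessing referred to above can be sketched as follows. This is an illustrative NumPy sketch, assuming edge padding to a multiple of the tile size and a 32×32 tile; the padding mode and helper names are assumptions, not the exact preprocessing of this application.

```python
import numpy as np

# Pad each image so height and width are multiples of the tile size, cut it
# into fixed-size tiles, and keep the original size so the image can be
# reassembled after decompression.

def slice_into_tiles(image, tile=32):
    h, w, c = image.shape
    pad_h = (tile - h % tile) % tile
    pad_w = (tile - w % tile) % tile
    padded = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)), mode="edge")
    tiles = (padded
             .reshape(padded.shape[0] // tile, tile, padded.shape[1] // tile, tile, c)
             .transpose(0, 2, 1, 3, 4)
             .reshape(-1, tile, tile, c))
    return tiles, (h, w)                          # tiles plus the size record


# All tiles from all images are then stacked along the batch dimension
imgs = [np.zeros((100, 75, 3), dtype=np.uint8), np.zeros((64, 64, 3), dtype=np.uint8)]
tiles, sizes = zip(*(slice_into_tiles(im) for im in imgs))
batch = np.concatenate(tiles, axis=0)             # one 4-D tensor for the whole run
```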
The achieved effects can be seen in Table 4.
Table 4
Clearly, the method provided in this application can achieve a higher throughput and efficient encoding and decoding.
More specifically, a more detailed comparison with some commonly used compression methods is given below.
Referring to Table 5, on the premise that the likelihood metric (an indicator for evaluating the prediction accuracy of a generative model, where smaller is better) is essentially the same as that of L3C, previously the fastest AI algorithm, the inference speed is increased by a factor of 9.6.
Table 5
Referring to Table 6, for the same autoregressive model, the parallel scheme provided in this application increases decompression speed by a factor of 7.9 compared with the non-parallel scheme. The parallel scheme restricts the receptive field, but this restriction has only a limited impact on the compression ratio.
Table 6
Referring to Table 7, compared with dynamic entropy coding (rANS), the semi-dynamic entropy coding (ANS-AI) proposed in this application increases encoding speed by a factor of 20 and decoding speed by a factor of 100, with BPD losses of less than 0.55 and 0.17 respectively. Moreover, this semi-dynamic entropy coding can run on an AI chip; on a single V100, the peak speed can reach 1 GB/s.
Table 7
In addition, compared with dynamic entropy coding, semi-dynamic entropy coding reduces the number of distribution types required from 2048 to 8, reduces the memory required for preprocessing to 1/256 of the original, and keeps the BPD loss below 0.03, which reduces the computing resources required for entropy coding and improves coding efficiency.
The foregoing describes the procedures of the image encoding method and the image decompression method provided in this application. The apparatuses for performing the foregoing methods are described below.
Referring to FIG. 14, this application provides a schematic structural diagram of an image encoding apparatus. The image encoding apparatus includes:
an autoregressive module 1401, configured to use an input image as an input of an autoregressive model and output a first image;
a residual calculation module 1402, configured to obtain a residual between the first image and the input image to obtain a first residual image;
an auto-encoding module 1403, configured to use the input image as an input of an auto-encoding model and output a latent variable and a first residual distribution, where the latent variable includes features extracted from the input image, and the first residual distribution includes residual values, output by the auto-encoding model, representing the residuals between the pixels in the input image and the corresponding pixels in the first residual image;
a residual encoding module 1404, configured to encode the first residual image and the first residual distribution to obtain residual coded data; and
a latent-variable encoding module 1405, configured to encode the latent variable to obtain latent-variable coded data, where the latent-variable coded data and the residual coded data are used to obtain the input image after decompression.
In a possible implementation, the residual encoding module 1404 is specifically configured to use the first residual image and the first residual distribution as inputs of a semi-dynamic entropy encoder and output the residual coded data. The semi-dynamic entropy encoder performs entropy coding using a first preset type of coding operation, where the first preset type includes addition, subtraction or bit operations, and the semi-dynamic entropy encoder does not include a second preset type of coding operation, where the second preset type includes at least one of multiplication, division or modulo operations; that is, the semi-dynamic entropy encoder does not include time-consuming operations such as multiplication, division or modulo.
In a possible implementation, the semi-dynamic entropy encoder may be obtained by converting a dynamic entropy encoder. Specifically, the operations of the dynamic entropy encoder may be approximated, for example by replacing them with approximate operations so as to reduce or remove multiplication, division and modulo; a transformation step may then be applied so that all operations whose cost exceeds a certain threshold (such as the remaining modulo, multiplication and division operations) are converted into table accesses and lightweight operations such as addition, subtraction and bit operations, yielding the semi-dynamic entropy encoder provided in this application. In other words, the semi-dynamic entropy encoder may be an entropy encoder obtained by replacing or converting some operations of a dynamic entropy encoder; when it is used for entropy coding, only simple, efficient operations such as addition, subtraction and bit operations are needed, so that efficient coding is achieved.
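As an illustration of the conversion idea, the following Python sketch shows how, with M-bit quantized frequencies, the modulo and division by 2^M reduce to a mask and a shift, and the symbol search plus CDF lookup reduce to a single precomputed table access. The table layout and names are assumptions made for illustration; the remaining multiplication is kept here for brevity, whereas the scheme described above would likewise tabulate or eliminate it.

```python
# Illustrative table-driven variant of the decode step: with M-bit quantized
# frequencies, `% 2**M` becomes a mask, `// 2**M` becomes a shift, and the
# per-slot symbol/CDF pair is read from a precomputed table.

M = 4
MASK = (1 << M) - 1
pmf = [8, 4, 2, 2]
cdf = [0, 8, 12, 14]

# Precompute: for every slot s in [0, 2^M), store the symbol and its
# cumulative frequency, so decoding needs no search and no modulo.
slot_table = []
for s in range(1 << M):
    x = max(i for i in range(len(cdf)) if cdf[i] <= s)
    slot_table.append((x, cdf[x]))

def decode_step_table(state):
    s = state & MASK                      # bit operation instead of % 2^M
    x, cdf_x = slot_table[s]              # table access instead of a search
    # The remaining multiplication could likewise be tabulated per symbol;
    # it is kept here to keep the sketch short.
    prev_state = (state >> M) * pmf[x] + s - cdf_x
    return x, prev_state
```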
In a possible implementation, the latent-variable encoding module 1405 is specifically configured to use the latent variable as an input of a static entropy encoder to obtain the latent-variable coded data.
In a possible implementation, the auto-encoding model includes an encoding model and a decoding model, and the auto-encoding module 1403 is specifically configured to: use the input image as an input of the encoding model and output the latent variable, where the encoding model is used to extract features from the input image; and use the latent variable as an input of the decoding model to obtain the first residual distribution, where the decoding model is used to predict the residual values between the input image and the corresponding pixel distribution.
In a possible implementation, the autoregressive model is configured to predict the values of pixels lying on a same line using the pixel values of already-predicted pixels.
Referring to FIG. 15, this application provides a schematic structural diagram of an image decompression apparatus. The image decompression apparatus includes:
a transceiver module 1501, configured to obtain latent-variable coded data and residual coded data, where the latent-variable coded data is obtained by encoding features extracted by an encoding end from an input image, and the residual coded data includes data obtained by encoding a residual between a first image output by an autoregressive model and the input image;
a latent-variable decoding module 1502, configured to decode the latent-variable coded data to obtain a latent variable, where the latent variable includes the features extracted by the encoding end from the input image;
an auto-encoding module 1503, configured to use the latent variable as an input of an auto-encoding model and output a second residual distribution;
a residual decoding module 1504, configured to perform decoding by combining the second residual distribution and the residual coded data to obtain a second residual image; and
an autoregressive module 1505, configured to use the second residual image as an input of back propagation of the autoregressive model and output a decompressed image.
In a possible implementation, the latent-variable decoding module 1502 is specifically configured to use the latent-variable coded data as an input of a static entropy encoder and output the latent variable.
In a possible implementation, the residual decoding module 1504 is specifically configured to use the second residual distribution and the residual coded data as inputs of a semi-dynamic entropy encoder and output the second residual image. The semi-dynamic entropy encoder performs entropy coding using a first preset type of coding operation, where the first preset type includes addition, subtraction or bit operations, and the semi-dynamic entropy encoder does not include a second preset type of coding operation, where the second preset type includes at least one of multiplication, division or modulo operations; that is, the semi-dynamic entropy encoder does not include time-consuming operations such as multiplication, division or modulo.
In a possible implementation, the semi-dynamic entropy encoder may be obtained by converting a dynamic entropy encoder. Specifically, the operations of the dynamic entropy encoder may be approximated, for example by replacing them with approximate operations so as to reduce or remove multiplication, division and modulo; a transformation step may then be applied so that all operations whose cost exceeds a certain threshold (such as the remaining modulo, multiplication and division operations) are converted into table accesses and lightweight operations such as addition, subtraction and bit operations, yielding the semi-dynamic entropy encoder provided in this application. In other words, the semi-dynamic entropy encoder may be an entropy encoder obtained by replacing or converting some operations of a dynamic entropy encoder; when it is used for entropy coding, only simple, efficient operations such as addition, subtraction and bit operations are needed, so that efficient coding is achieved.
In a possible implementation, the autoregressive module 1505 is specifically configured to decode, in parallel through the autoregressive model, the pixels lying on a same line in the second residual image to obtain the decompressed image.
Referring to FIG. 16, this application provides a schematic structural diagram of another image encoding apparatus, as described below.
The image encoding apparatus may include a processor 1601 and a memory 1602. The processor 1601 and the memory 1602 are interconnected through a line. The memory 1602 stores program instructions and data.
The memory 1602 stores the program instructions and data corresponding to the steps in the aforementioned FIG. 6 to FIG. 11.
The processor 1601 is configured to perform the method steps performed by the image encoding apparatus shown in any one of the embodiments in FIG. 6 to FIG. 11.
Optionally, the image encoding apparatus may further include a transceiver 1603 configured to receive or send data.
An embodiment of this application further provides a computer-readable storage medium storing a program which, when run on a computer, causes the computer to perform the steps of the methods described in the embodiments shown in FIG. 6 to FIG. 11.
Optionally, the aforementioned image encoding apparatus shown in FIG. 16 is a chip.
An embodiment of this application further provides an image encoding apparatus. The image encoding apparatus may also be referred to as a digital processing chip or a chip. The chip includes a processing unit and a communication interface. The processing unit obtains program instructions through the communication interface, the program instructions are executed by the processing unit, and the processing unit is configured to perform the method steps performed by the image encoding apparatus shown in any one of the embodiments in FIG. 6 to FIG. 11.
An embodiment of this application further provides a digital processing chip. The digital processing chip integrates circuitry and one or more interfaces for implementing the processor 1601 or the functions of the processor 1601. When a memory is integrated into the digital processing chip, the digital processing chip can complete the method steps of any one or more of the foregoing embodiments. When no memory is integrated into the digital processing chip, the chip may be connected to an external memory through the communication interface, and implements, according to program code stored in the external memory, the actions performed by the image encoding apparatus in the foregoing embodiments.
The image encoding apparatus provided in this embodiment of this application may be a chip. The chip includes a processing unit and a communication unit. The processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin or a circuit. The processing unit may execute the computer-executable instructions stored in a storage unit, so that the chip in the server performs the image encoding method described in the embodiments shown in FIG. 6 to FIG. 11. Optionally, the storage unit is a storage unit inside the chip, such as a register or a cache; the storage unit may also be a storage unit located outside the chip in the radio access device, such as a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, or a random access memory (RAM).
Referring to FIG. 17, this application provides a schematic structural diagram of another image decompression apparatus, as described below.
The image decompression apparatus may include a processor 1701 and a memory 1702. The processor 1701 and the memory 1702 are interconnected through a line. The memory 1702 stores program instructions and data.
The memory 1702 stores the program instructions and data corresponding to the steps in the aforementioned FIG. 12 to FIG. 13.
The processor 1701 is configured to perform the method steps performed by the image decompression apparatus shown in any one of the embodiments in FIG. 12 to FIG. 13.
Optionally, the image decompression apparatus may further include a transceiver 1703 configured to receive or send data.
An embodiment of this application further provides a computer-readable storage medium storing a program which, when run on a computer, causes the computer to perform the steps of the methods described in the embodiments shown in FIG. 12 to FIG. 13.
Optionally, the aforementioned image decompression apparatus shown in FIG. 17 is a chip.
An embodiment of this application further provides an image decompression apparatus. The image decompression apparatus may also be referred to as a digital processing chip or a chip. The chip includes a processing unit and a communication interface. The processing unit obtains program instructions through the communication interface, the program instructions are executed by the processing unit, and the processing unit is configured to perform the method steps performed by the image decompression apparatus shown in any one of the embodiments in FIG. 12 to FIG. 13.
An embodiment of this application further provides a digital processing chip. The digital processing chip integrates circuitry and one or more interfaces for implementing the processor 1701 or the functions of the processor 1701. When a memory is integrated into the digital processing chip, the digital processing chip can complete the method steps of any one or more of the foregoing embodiments. When no memory is integrated into the digital processing chip, the chip may be connected to an external memory through the communication interface, and implements, according to program code stored in the external memory, the actions performed by the image decompression apparatus in the foregoing embodiments.
The image decompression apparatus provided in this embodiment of this application may be a chip. The chip includes a processing unit and a communication unit. The processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin or a circuit. The processing unit may execute the computer-executable instructions stored in a storage unit, so that the chip in the server performs the image decompression method described in the embodiments shown in FIG. 12 to FIG. 13. Optionally, the storage unit is a storage unit inside the chip, such as a register or a cache; the storage unit may also be a storage unit located outside the chip in the radio access device, such as a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, or a random access memory (RAM).
An embodiment of this application further provides a computer program product which, when run on a computer, causes the computer to perform the steps performed by the image encoding apparatus or the image decompression apparatus in the methods described in the embodiments shown in FIG. 6 to FIG. 13.
This application further provides an image processing system including an image encoding apparatus and an image decompression apparatus. The image encoding apparatus is configured to perform the method steps corresponding to the aforementioned FIG. 6 to FIG. 11, and the image decompression apparatus is configured to perform the method steps corresponding to the aforementioned FIG. 12 to FIG. 13.
Specifically, the aforementioned processing unit or processor may be a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
For example, referring to FIG. 18, FIG. 18 is a schematic structural diagram of a chip provided in an embodiment of this application. The chip may be embodied as a neural-network processing unit NPU 180. The NPU 180 is mounted to a host CPU as a coprocessor, and the host CPU allocates tasks. The core part of the NPU is the operation circuit 1803; the controller 1804 controls the operation circuit 1803 to extract matrix data from the memory and perform multiplication operations.
In some implementations, the operation circuit 1803 internally includes multiple processing engines (PEs). In some implementations, the operation circuit 1803 is a two-dimensional systolic array. The operation circuit 1803 may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the operation circuit 1803 is a general-purpose matrix processor.
For example, assume there are an input matrix A, a weight matrix B and an output matrix C. The operation circuit fetches the data corresponding to matrix B from the weight memory 1802 and buffers it on each PE in the operation circuit. The operation circuit fetches the data of matrix A from the input memory 1801, performs a matrix operation with matrix B, and stores the partial or final result of the matrix in the accumulator 1808.
The unified memory 1806 is used to store input data and output data. The weight data is transferred to the weight memory 1802 directly through a direct memory access controller (DMAC) 1805, and the input data is also transferred to the unified memory 1806 through the DMAC.
A bus interface unit (BIU) 1810 is used for the interaction between the AXI bus and the DMAC and the instruction fetch buffer (IFB) 1809.
The bus interface unit 1810 is used by the instruction fetch buffer 1809 to obtain instructions from an external memory, and is also used by the storage unit access controller 1805 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
The DMAC is mainly used to transfer input data in the external memory DDR to the unified memory 1806, to transfer the weight data to the weight memory 1802, or to transfer the input data to the input memory 1801.
The vector calculation unit 1807 includes multiple operation processing units and, when necessary, performs further processing on the output of the operation circuit, such as vector multiplication, vector addition, exponential operations, logarithmic operations and magnitude comparison. It is mainly used for non-convolutional/non-fully-connected layer computations in a neural network, such as batch normalization, pixel-wise summation and up-sampling of feature planes.
In some implementations, the vector calculation unit 1807 can store the processed output vector in the unified memory 1806. For example, the vector calculation unit 1807 may apply a linear function and/or a non-linear function to the output of the operation circuit 1803, for example performing linear interpolation on the feature planes extracted by a convolutional layer, or applying such functions to a vector of accumulated values to generate activation values. In some implementations, the vector calculation unit 1807 generates normalized values, pixel-wise summed values, or both. In some implementations, the processed output vector can be used as an activation input to the operation circuit 1803, for example for use in a subsequent layer of the neural network.
The instruction fetch buffer 1809 connected to the controller 1804 is used to store instructions used by the controller 1804.
The unified memory 1806, the input memory 1801, the weight memory 1802 and the instruction fetch buffer 1809 are all on-chip memories. The external memory is private to the NPU hardware architecture.
The operations of the layers of the recurrent neural network may be performed by the operation circuit 1803 or the vector calculation unit 1807.
The processor mentioned anywhere above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits used to control the execution of the programs of the methods in FIG. 6 to FIG. 13.
In addition, it should be noted that the apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. In addition, in the drawings of the apparatus embodiments provided in this application, the connection relationships between modules indicate that they have communication connections, which may be specifically implemented as one or more communication buses or signal lines.
From the description of the foregoing implementations, a person skilled in the art can clearly understand that this application may be implemented by software plus necessary general-purpose hardware, and certainly may also be implemented by dedicated hardware including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. Generally, any function performed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structures used to implement the same function may also be diverse, such as analog circuits, digital circuits or dedicated circuits. However, for this application, a software program implementation is a better implementation in most cases. Based on such an understanding, the technical solutions of this application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc of a computer, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of this application.
All or some of the foregoing embodiments may be implemented by software, hardware, firmware or any combination thereof. When software is used for implementation, the embodiments may be implemented completely or partially in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of this application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from a website, computer, server or data center to another website, computer, server or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state disk (SSD)), or the like.
The terms "first", "second", "third", "fourth", and so on (if any) in the specification, claims and accompanying drawings of this application are used to distinguish between similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data used in this way is interchangeable in appropriate circumstances, so that the embodiments described herein can be implemented in orders other than those illustrated or described herein. Moreover, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to such a process, method, product or device.
Finally, it should be noted that the foregoing descriptions are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily figured out by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application.

Claims (23)

1. An image encoding method, comprising:
using an input image as an input of an autoregressive model and outputting a first image;
obtaining a residual between the first image and the input image to obtain a first residual image;
using the input image as an input of an auto-encoding model and outputting a latent variable and a first residual distribution, wherein the latent variable comprises features extracted from the input image, and the first residual distribution comprises residual values, output by the auto-encoding model, representing the residuals between the pixels in the input image and the corresponding pixels in the first residual image;
encoding the first residual image and the first residual distribution to obtain residual coded data; and
encoding the latent variable to obtain latent-variable coded data, wherein the latent-variable coded data and the residual coded data are used to obtain the input image after decompression.
2. The method according to claim 1, wherein the encoding the first residual image and the first residual distribution to obtain residual coded data comprises:
using the first residual image and the first residual distribution as inputs of a semi-dynamic entropy encoder and outputting the residual coded data, wherein the semi-dynamic entropy encoder is configured to perform entropy coding using a first preset type of coding operation, the first preset type of coding operation comprises addition, subtraction or bit operations, and the semi-dynamic entropy encoder does not comprise a second preset type of coding operation, wherein the second preset type comprises at least one of multiplication, division or modulo operations.
3. The method according to claim 1, wherein the encoding the latent variable to obtain latent-variable coded data comprises:
using the latent variable as an input of a static entropy encoder to obtain the latent-variable coded data.
4. The method according to any one of claims 1 to 3, wherein the auto-encoding model comprises an encoding model and a decoding model, and the using the input image as an input of the auto-encoding model and outputting a latent variable and a first residual distribution comprises:
using the input image as an input of the encoding model and outputting the latent variable, wherein the encoding model is used to extract features from the input image; and
using the latent variable as an input of the decoding model to obtain the first residual distribution, wherein the decoding model is used to predict residual values between an input image and a corresponding pixel distribution.
5. The method according to any one of claims 1 to 4, wherein the autoregressive model is configured to predict the values of pixels lying on a same line using the pixel values of already-predicted pixels.
6. An image decompression method, comprising:
obtaining latent-variable coded data and residual coded data, wherein the latent-variable coded data is obtained by encoding features extracted by an encoding end from an input image, and the residual coded data comprises data obtained by encoding a residual between the input image and an image output by forward propagation of an autoregressive model;
decoding the latent-variable coded data to obtain a latent variable, wherein the latent variable comprises the features extracted from the input image;
using the latent variable as an input of an auto-encoding model and outputting a second residual distribution;
performing decoding by combining the second residual distribution and the residual coded data to obtain a second residual image; and
using the second residual image as an input of back propagation of the autoregressive model and outputting a decompressed image.
7. The method according to claim 6, wherein the decoding the latent-variable coded data to obtain a latent variable comprises:
using the latent-variable coded data as an input of a static entropy encoder and outputting the latent variable.
8. The method according to claim 6 or 7, wherein the performing decoding by combining the second residual distribution and the residual coded data to obtain a second residual image comprises:
using the second residual distribution and the residual coded data as inputs of a semi-dynamic entropy encoder and outputting the second residual image, wherein the semi-dynamic entropy encoder is configured to perform entropy coding using a first preset type of coding operation, the first preset type of coding operation comprises addition, subtraction or bit operations, and the semi-dynamic entropy encoder does not comprise a second preset type of coding operation, wherein the second preset type comprises at least one of multiplication, division or modulo operations.
9. The method according to any one of claims 6 to 8, wherein the using the second residual image as an input of back propagation of the autoregressive model and outputting a decompressed image comprises:
decoding, in parallel through the autoregressive model, pixels lying on a same line in the second residual image to obtain the decompressed image.
10. An image encoding apparatus, comprising:
an autoregressive module, configured to use an input image as an input of an autoregressive model and output a first image;
a residual calculation module, configured to obtain a residual between the first image and the input image to obtain a first residual image;
an auto-encoding module, configured to use the input image as an input of an auto-encoding model and output a latent variable and a first residual distribution, wherein the latent variable comprises features extracted from the input image, and the first residual distribution comprises residual values, output by the auto-encoding model, representing the residuals between the pixels in the input image and the corresponding pixels in the first residual image;
a residual encoding module, configured to encode the first residual image and the first residual distribution to obtain residual coded data; and
a latent-variable encoding module, configured to encode the latent variable to obtain latent-variable coded data, wherein the latent-variable coded data and the residual coded data are used to obtain the input image after decompression.
11. The apparatus according to claim 10, wherein
the residual encoding module is specifically configured to use the first residual image and the first residual distribution as inputs of a semi-dynamic entropy encoder and output the residual coded data, wherein the semi-dynamic entropy encoder is configured to perform entropy coding using a first preset type of coding operation, the first preset type of coding operation comprises addition, subtraction or bit operations, and the semi-dynamic entropy encoder does not comprise a second preset type of coding operation, wherein the second preset type comprises at least one of multiplication, division or modulo operations.
12. The apparatus according to claim 10, wherein
the latent-variable encoding module is specifically configured to use the latent variable as an input of a static entropy encoder to obtain the latent-variable coded data.
13. The apparatus according to any one of claims 10 to 12, wherein the auto-encoding model comprises an encoding model and a decoding model, and the auto-encoding module is specifically configured to:
use the input image as an input of the encoding model and output the latent variable, wherein the encoding model is used to extract features from the input image; and
use the latent variable as an input of the decoding model to obtain the first residual distribution, wherein the decoding model is used to predict residual values between an input image and a corresponding pixel distribution.
14. The apparatus according to any one of claims 10 to 13, wherein the autoregressive model is configured to predict the values of pixels lying on a same line using the pixel values of already-predicted pixels.
15. An image decompression apparatus, comprising:
a transceiver module, configured to obtain latent-variable coded data and residual coded data, wherein the latent-variable coded data is obtained by encoding features extracted by an encoding end from an input image, and the residual coded data is obtained by encoding a residual between a first image output by forward propagation of an autoregressive model and the input image;
a latent-variable decoding module, configured to decode the latent-variable coded data to obtain a latent variable, wherein the latent variable comprises the features extracted from the input image;
an auto-encoding module, configured to use the latent variable as an input of an auto-encoding model and output a second residual distribution;
a residual decoding module, configured to perform decoding by combining the second residual distribution and the residual coded data to obtain a second residual image; and
an autoregressive module, configured to use the second residual image as an input of back propagation of the autoregressive model and output a decompressed image.
16. The apparatus according to claim 15, wherein
the latent-variable decoding module is specifically configured to use the latent-variable coded data as an input of a static entropy encoder and output the latent variable.
17. The apparatus according to claim 15 or 16, wherein
the residual decoding module is specifically configured to use the second residual distribution and the residual coded data as inputs of a semi-dynamic entropy encoder and output the second residual image, wherein the semi-dynamic entropy encoder is configured to perform entropy coding using a first preset type of coding operation, the first preset type of coding operation comprises addition, subtraction or bit operations, and the semi-dynamic entropy encoder does not comprise a second preset type of coding operation, wherein the second preset type comprises at least one of multiplication, division or modulo operations.
18. The apparatus according to any one of claims 10 to 17, wherein
the autoregressive module is specifically configured to decode, in parallel through the autoregressive model, pixels lying on a same line in the second residual image to obtain the decompressed image.
19. An image encoding apparatus, comprising a processor, wherein the processor is coupled to a memory, the memory stores a program, and when the program instructions stored in the memory are executed by the processor, the steps of the method according to any one of claims 1 to 5 are implemented.
20. An image decompression apparatus, comprising a processor, wherein the processor is coupled to a memory, the memory stores a program, and when the program instructions stored in the memory are executed by the processor, the steps of the method according to any one of claims 6 to 9 are implemented.
21. An image processing system, comprising an image encoding apparatus and an image decompression apparatus, wherein the image encoding apparatus is configured to implement the steps of the method according to any one of claims 1 to 5, and the image decompression apparatus is configured to implement the steps of the method according to any one of claims 6 to 9.
22. A computer-readable storage medium, comprising a program which, when executed by a processing unit, performs the steps of the method according to any one of claims 1 to 9.
23. A computer program product, wherein the computer program product comprises software code, and the software code is used to perform the steps of the method according to any one of claims 1 to 9.
PCT/CN2023/090043 2022-04-26 2023-04-23 Image encoding method and apparatus, and image decompression method and apparatus WO2023207836A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210447177.0A CN115022637A (en) 2022-04-26 2022-04-26 Image coding method, image decompression method and device
CN202210447177.0 2022-04-26

Publications (1)

Publication Number Publication Date
WO2023207836A1 true WO2023207836A1 (en) 2023-11-02

Family

ID=83067519

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/090043 WO2023207836A1 (en) 2022-04-26 2023-04-23 Image encoding method and apparatus, and image decompression method and apparatus

Country Status (2)

Country Link
CN (1) CN115022637A (en)
WO (1) WO2023207836A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115022637A (en) * 2022-04-26 2022-09-06 华为技术有限公司 Image coding method, image decompression method and device
CN117155405A (en) * 2023-08-09 2023-12-01 海飞科(南京)信息技术有限公司 Method for quickly establishing tANS coding and decoding conversion table based on gradient descent

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111405283A (en) * 2020-02-20 2020-07-10 北京大学 End-to-end video compression method, system and storage medium based on deep learning
CN111901596A (en) * 2020-06-29 2020-11-06 北京大学 Video hybrid coding and decoding method, device and medium based on deep learning
CN112257858A (en) * 2020-09-21 2021-01-22 华为技术有限公司 Model compression method and device
CN113574882A (en) * 2019-03-21 2021-10-29 高通股份有限公司 Video compression using depth generative models
WO2022022176A1 (en) * 2020-07-30 2022-02-03 华为技术有限公司 Image processing method and related device
US20220272345A1 (en) * 2020-10-23 2022-08-25 Deep Render Ltd Image encoding and decoding, video encoding and decoding: methods, systems and training methods
CN115022637A (en) * 2022-04-26 2022-09-06 华为技术有限公司 Image coding method, image decompression method and device

Also Published As

Publication number Publication date
CN115022637A (en) 2022-09-06

Similar Documents

Publication Publication Date Title
WO2023207836A1 (en) Image encoding method and apparatus, and image decompression method and apparatus
WO2022116856A1 (en) Model structure, model training method, and image enhancement method and device
WO2021155832A1 (en) Image processing method and related device
WO2022179588A1 (en) Data coding method and related device
WO2023231794A1 (en) Neural network parameter quantification method and apparatus
WO2022028197A1 (en) Image processing method and device thereof
US20240078414A1 (en) Parallelized context modelling using information shared between patches
WO2023174256A1 (en) Data compression method and related device
CN115409697A (en) Image processing method and related device
WO2023051335A1 (en) Data encoding method, data decoding method, and data processing apparatus
Fraihat et al. A novel lossy image compression algorithm using multi-models stacked AutoEncoders
TWI826160B (en) Image encoding and decoding method and apparatus
TW202348029A (en) Operation of a neural network with clipped input data
CN114501031B (en) Compression coding and decompression method and device
WO2023177318A1 (en) Neural network with approximated activation function
CN113554719B (en) Image encoding method, decoding method, storage medium and terminal equipment
CN115409150A (en) Data compression method, data decompression method and related equipment
WO2023040745A1 (en) Feature map encoding method and apparatus and feature map decoding method and apparatus
US20240221230A1 (en) Feature map encoding and decoding method and apparatus
WO2024007820A1 (en) Data encoding and decoding method and related device
CN118318441A (en) Feature map encoding and decoding method and device
TW202345034A (en) Operation of a neural network with conditioned weights
WO2023121499A1 (en) Methods and apparatus for approximating a cumulative distribution function for use in entropy coding or decoding data
EP4396942A1 (en) Methods and apparatus for approximating a cumulative distribution function for use in entropy coding or decoding data
CN114693811A (en) Image processing method and related equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23795269

Country of ref document: EP

Kind code of ref document: A1