CN115022637A - Image coding method, image decompression method and device - Google Patents


Info

Publication number
CN115022637A
CN115022637A (application CN202210447177.0A)
Authority
CN
China
Prior art keywords
image
residual
coding
input
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210447177.0A
Other languages
Chinese (zh)
Inventor
康宁
仇善召
张鸣天
张世枫
李震国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202210447177.0A
Publication of CN115022637A
Priority to PCT/CN2023/090043 (published as WO2023207836A1)
Current legal status: Pending


Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42 — characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/10 — using adaptive coding
    • H04N19/102 — characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/13 — Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H04N19/169 — characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/182 — the unit being a pixel
    • H04N19/44 — Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • H04N19/50 — using predictive coding
    • H04N19/90 — using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/91 — Entropy coding, e.g. variable length coding [VLC] or arithmetic coding

Abstract

The application provides an image coding method, an image decompression method and an image decompression device, relating to computer vision in the field of artificial intelligence. The methods encode by combining the outputs of an autoregressive model and a self-coding model, reducing the size of the models required and improving coding and decoding efficiency. The image encoding method includes: taking an input image as the input of an autoregressive model and outputting a first image; obtaining the residual between the first image and the input image to obtain a first residual image; taking the input image as the input of a self-coding model and outputting a hidden variable and a first residual distribution, where the hidden variable comprises features extracted from the input image and the first residual distribution comprises the residual values that the self-coding model outputs for each pixel of the input image; coding the first residual image with the first residual distribution to obtain residual coded data; and coding the hidden variable to obtain hidden-variable coded data. The hidden-variable coded data and the residual coded data can then be decompressed to recover the input image.

Description

Image coding method, image decompression method and device
Technical Field
The present application relates to the field of image processing, and in particular, to an image encoding method, an image decompression method, and an image decompression device.
Background
Images are widely used in many fields, and a large number of scenarios involve transmitting or storing them. The higher an image's resolution, the more storage space it consumes when saved and the more bandwidth it requires when transmitted, which lowers transmission efficiency. Therefore, to facilitate transmission or storage, an image is usually compressed, reducing the number of bits it occupies and hence the storage space required to store it and the bandwidth required to transmit it.
For example, some common image compression methods perform compression with entropy coding; Huffman coding, arithmetic coding and ANS coding are commonly used entropy coding algorithms. However, the compression rates of these entropy coding methods have already been heavily optimized and are difficult to improve further. How to improve encoding and decoding efficiency has therefore become an urgent problem.
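As a concrete illustration of the limit that entropy coding runs into, the Shannon entropy of the symbol distribution gives the minimum average number of bits per symbol any lossless entropy coder can achieve. This is a standard information-theoretic result, not specific to this application:

```python
import math
from collections import Counter

def entropy_bits_per_symbol(data):
    """Shannon entropy H = -sum(p * log2(p)): the lower bound, in bits
    per symbol, for any lossless entropy coder (Huffman, arithmetic,
    ANS, ...) on this symbol distribution."""
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A skewed byte sequence has low entropy and compresses well; a sequence
# containing all 256 byte values once is incompressible at 8 bits/byte.
print(entropy_bits_per_symbol(b"aaaaaaab"))
print(entropy_bits_per_symbol(bytes(range(256))))
```

No coder can beat this bound on average, which is why learned models that sharpen the predicted distribution, rather than better coders alone, are needed to push compression rates further.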
Disclosure of Invention
The application provides an image coding method, an image decompression method and an image decompression device, which encode by combining the outputs of an autoregressive model and a self-coding model, reducing the size of the models required and improving coding and decoding efficiency.
In view of the above, the present application provides, in a first aspect, an image encoding method, including: taking an input image as the input of an autoregressive model and outputting a first image; obtaining the residual between the first image and the input image to obtain a first residual image; taking the input image as the input of a self-coding model and outputting a hidden variable and a first residual distribution, where the hidden variable comprises features extracted from the input image and the first residual distribution comprises the residual values that the self-coding model predicts for each pixel of the input image, corresponding to the pixels of the first residual image; coding the first residual image with the first residual distribution to obtain residual coded data; and coding the hidden variable to obtain hidden-variable coded data, the hidden-variable coded data and the residual coded data being used for decompression to recover the input image.
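The encoding steps of the first aspect can be sketched as follows. All four callables (`ar_model`, `sc_model`, `residual_coder`, `latent_coder`) are hypothetical stand-ins for the patent's components, not its actual implementation:

```python
import numpy as np

def encode_image(x, ar_model, sc_model, residual_coder, latent_coder):
    """Sketch of the first-aspect encoding flow (hypothetical interfaces)."""
    # Step 1: the autoregressive model's forward pass predicts a first image.
    first_image = ar_model(x)
    # Step 2: residual between the input image and the first image.
    first_residual = x.astype(np.int32) - first_image.astype(np.int32)
    # Step 3: the self-coding model outputs a hidden variable (features)
    # and a first residual distribution (a per-pixel entropy model).
    hidden, residual_dist = sc_model(x)
    # Step 4: entropy-code the residual image under that distribution
    # (e.g. with a semi-dynamic entropy coder).
    residual_bits = residual_coder(first_residual, residual_dist)
    # Step 5: entropy-code the hidden variable (e.g. static entropy coding).
    hidden_bits = latent_coder(hidden)
    return hidden_bits, residual_bits
```

The two bitstreams together are sufficient to reconstruct the input image losslessly, since the residual plus the autoregressive prediction recovers the original pixels exactly.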
Therefore, in the present application, the outputs of the autoregressive model and the self-coding model are combined for coding, so that both models can be kept very small, avoiding the excessive inference time an oversized autoregressive network would incur, and achieving efficient image compression. In addition, in the method provided by the present application, the whole procedure, including the AI models and the entropy coding, can be implemented as AI lossless compression on an AI chip, avoiding transfers between system memory and AI-chip memory and improving coding efficiency.
In a possible implementation, encoding the first residual image and the first residual distribution to obtain residual coded data includes: taking the first residual image and the first residual distribution as inputs of a semi-dynamic entropy coder and outputting the residual coded data. The semi-dynamic entropy coder performs entropy coding using a first preset type of operation, which includes addition, subtraction and bit operations, and does not use a second preset type of operation, which includes at least one of multiplication, division and modulo. That is, the semi-dynamic entropy coder avoids time-consuming operations such as multiplication, division and modulo and may use only simple operations such as addition and subtraction, so that efficient coding can be achieved.
Therefore, in this embodiment of the application, semi-dynamic entropy coding can be performed on the residual image with a limited set of distributions; compared with dynamic entropy coding, this reduces time-consuming operations such as multiplication, division and modulo, greatly improving encoding efficiency.
In one possible implementation, the semi-dynamic entropy encoder may be obtained by transforming a dynamic entropy encoder. Specifically, the operations of the dynamic entropy encoder may first be approximated, for example by replacing exact operations with approximate ones, reducing or removing multiplication, division and modulo; any remaining operations whose cost exceeds a certain threshold (such as residual modulo, multiplication and division) may then be converted into table lookups and lightweight operations such as addition, subtraction and bit operations, yielding the semi-dynamic entropy encoder provided by the present application. In other words, the semi-dynamic entropy encoder is an entropy encoder in which some operations of a dynamic entropy encoder have been replaced or converted, so that entropy encoding uses only simple, efficient operations such as addition, subtraction and bit operations.
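The replacement of modulo and division by masks, shifts and table lookups can be illustrated with an ANS-style range coder whose total frequency is a power of two. This is an illustrative sketch of the general technique, not the patent's actual coder; the table names and the toy precision are assumptions:

```python
PRECISION = 4                       # total frequency = 2**PRECISION = 16
MASK = (1 << PRECISION) - 1

def build_tables(freqs):
    """Precompute cumulative counts and a slot-to-symbol table so that
    the decoder needs only masks, shifts, additions and table lookups."""
    cum = [0]
    for f in freqs:
        cum.append(cum[-1] + f)
    assert cum[-1] == 1 << PRECISION, "frequencies must sum to 2**PRECISION"
    slot_to_sym = [s for s, f in enumerate(freqs) for _ in range(f)]
    return cum, slot_to_sym

def encode_step(state, sym, cum, freqs):
    # Reference rANS form: a semi-dynamic coder would also replace this
    # division and modulo with per-frequency lookup tables.
    f = freqs[sym]
    return ((state // f) << PRECISION) + cum[sym] + (state % f)

def decode_step(state, cum, slot_to_sym, freqs):
    slot = state & MASK              # bit mask replaces `state % total`
    sym = slot_to_sym[slot]          # table lookup replaces a cumulative search
    # Shift replaces `state // total`; the remaining small multiply by
    # freqs[sym] could itself be tabulated in a full implementation.
    state = freqs[sym] * (state >> PRECISION) + slot - cum[sym]
    return sym, state
```

Because ANS is last-in-first-out, decoding pops symbols in the reverse of encoding order and returns the state to its initial value, which makes the transformation easy to verify.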
In a possible implementation, encoding the hidden variable to obtain hidden-variable coded data may include: taking the hidden variable as the input of a static entropy coder to obtain the hidden-variable coded data.
Therefore, in this embodiment, the features extracted from the input image can be statically entropy-coded, enabling efficient encoding.
In one possible embodiment, the self-coding model may include a coding model and a decoding model, and taking the input image as the input of the self-coding model and outputting the hidden variable and the first residual distribution includes: taking the input image as the input of the coding model and outputting the hidden variable, where the coding model is used to extract features from the input image; and taking the hidden variable as the input of the decoding model to obtain the first residual distribution, where the decoding model is used to predict the residual distribution corresponding to each pixel of the input image.
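The coding-model/decoding-model split can be sketched with a toy stand-in. The linear layers, shapes and random weights here are illustrative assumptions, not the patent's actual architecture:

```python
import numpy as np

class ToySelfCodingModel:
    """Toy stand-in for the self-coding model: a linear encoder maps the
    (flattened) image to a low-dimensional hidden variable, and a linear
    decoder maps the hidden variable to one predicted residual value per
    pixel, standing in for the first residual distribution."""

    def __init__(self, n_pixels, n_latent, seed=0):
        rng = np.random.default_rng(seed)
        self.We = rng.standard_normal((n_latent, n_pixels)) * 0.01  # coding model
        self.Wd = rng.standard_normal((n_pixels, n_latent)) * 0.01  # decoding model

    def encode(self, x_flat):
        return self.We @ x_flat          # hidden variable (extracted features)

    def decode(self, hidden):
        return self.Wd @ hidden          # per-pixel residual prediction

    def __call__(self, x_flat):
        hidden = self.encode(x_flat)
        return hidden, self.decode(hidden)
```

The key property mirrored here is that the residual distribution is a function of the hidden variable alone, which is why the decoder can reproduce it from the decoded hidden variable without access to the original image.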
In the embodiment of the application, the trained self-coding model can be used for extracting important features from the input image and predicting the corresponding residual image, so that residual coding data which can represent data in the input image can be obtained by combining the output of the autoregressive model.
In a possible implementation, the autoregressive model predicts the value of each pixel from the values of already-predicted pixels, in such a way that pixels on the same connecting line do not depend on one another. In the subsequent decoding process, a pixel on a given line therefore need not wait for the other pixels on that line to be decoded, so the pixels on the same line can be decoded in parallel, improving the decoding efficiency of the input image.
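One common way to realize this "same connecting line" scheduling is a wavefront over anti-diagonals: if each pixel's prediction context lies on strictly earlier anti-diagonals, all pixels on one anti-diagonal are mutually independent. This is an illustrative assumption about the line geometry; the patent's exact scheme may differ:

```python
def wavefront_groups(height, width):
    """Group pixel coordinates by anti-diagonal (constant i + j). Each
    group depends only on earlier groups, so its pixels can be predicted
    or decoded in parallel."""
    return [[(i, d - i) for i in range(height) if 0 <= d - i < width]
            for d in range(height + width - 1)]
```

A decoder then processes the groups sequentially but every pixel inside a group concurrently, so the number of sequential steps drops from height × width to height + width - 1.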
In a second aspect, the present application provides an image decompression method, including: acquiring hidden-variable coded data and residual coded data, where the hidden-variable coded data comprises the result of the encoding end coding features extracted from an input image, and the residual coded data comprises the result of coding the residual between the image output by the forward propagation of an autoregressive model and the input image; decoding the hidden-variable coded data to obtain a hidden variable, the hidden variable comprising the features extracted from the input image by the encoding end; taking the hidden variable as the input of a self-coding model and outputting a second residual distribution; decoding by combining the second residual distribution and the residual coded data to obtain a second residual image; and taking the second residual image as the input of the backward propagation of the autoregressive model and outputting the decompressed image.
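The decompression steps of the second aspect mirror the encoding side and can be sketched as follows; all callables are hypothetical stand-ins for the patent's components:

```python
import numpy as np

def decompress_image(hidden_bits, residual_bits,
                     hidden_decoder, sc_decoder, residual_decoder, ar_inverse):
    """Sketch of the second-aspect decoding flow (hypothetical interfaces)."""
    # Step 1: recover the hidden variable (inverse of static entropy coding).
    hidden = hidden_decoder(hidden_bits)
    # Step 2: the self-coding model maps the hidden variable to the
    # second residual distribution.
    residual_dist = sc_decoder(hidden)
    # Step 3: entropy-decode the second residual image under that distribution.
    residual = residual_decoder(residual_bits, residual_dist)
    # Step 4: backward propagation of the autoregressive model turns the
    # residual back into the losslessly reconstructed image.
    return ar_inverse(residual)
```

Because the residual distribution is derived from the decoded hidden variable exactly as at the encoding end, the entropy decoder sees the same distribution the encoder used, which is what makes the round trip lossless.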
A self-coding model alone generally has limited fitting capability and needs a deep network to reach a good compression rate; combining it with the output of an autoregressive model allows the self-coding model to be kept small. Therefore, in the present application, the autoregressive model and the self-coding model are combined for decoding, so that both models can be kept very small, avoiding the excessive inference time an oversized autoregressive network would incur, and achieving efficient image decompression. In addition, in the method provided by the present application, the whole procedure, including the AI models and the entropy coding, can be implemented as AI lossless compression on an AI chip, avoiding transfers between system memory and AI-chip memory and improving coding efficiency.
In a possible implementation, decoding the hidden-variable coded data to obtain the hidden variable includes: taking the hidden-variable coded data as the input of a static entropy coder and outputting the hidden variable. This decoding can be understood as the inverse of the static entropy coding performed at the encoding end, so the important features of the image are recovered losslessly.
In a possible implementation, decoding by combining the second residual distribution and the residual coded data to obtain the second residual image includes: taking the second residual distribution and the residual coded data as inputs of a semi-dynamic entropy coder and outputting the second residual image. The semi-dynamic entropy coder performs entropy coding using a first preset type of operation, which includes addition, subtraction and bit operations, and does not use a second preset type of operation, which includes at least one of multiplication, division and modulo. That is, the semi-dynamic entropy coder avoids time-consuming operations such as multiplication, division and modulo and may use only simple operations such as addition and subtraction, so that efficient coding can be achieved. The residual image can therefore be decoded with semi-dynamic entropy coding over a limited set of distributions; compared with dynamic entropy coding, this reduces time-consuming operations such as multiplication, division and modulo, greatly improving decoding efficiency.
In one possible implementation, the semi-dynamic entropy encoder may be obtained by transforming a dynamic entropy encoder. Specifically, the operations of the dynamic entropy encoder may first be approximated, for example by replacing exact operations with approximate ones, reducing or removing multiplication, division and modulo; any remaining operations whose cost exceeds a certain threshold (such as residual modulo, multiplication and division) may then be converted into table lookups and lightweight operations such as addition, subtraction and bit operations, yielding the semi-dynamic entropy encoder provided by the present application. In other words, the semi-dynamic entropy encoder is an entropy encoder in which some operations of a dynamic entropy encoder have been replaced or converted, so that entropy encoding uses only simple, efficient operations such as addition, subtraction and bit operations.
In a possible implementation, taking the second residual image as the input of the backward propagation of the autoregressive model and outputting the decompressed image includes: decoding the pixels on the same connecting line in the second residual image in parallel through the autoregressive model to obtain the decompressed image. Therefore, a pixel on a given line need not wait for the other pixels on that line to be decoded, so the pixels on the same line are decoded in parallel, improving the decoding efficiency of the input image.
In a third aspect, the present application provides an image encoding apparatus comprising:
the autoregressive module is used for taking the input image as the input of an autoregressive model and outputting a first image;
the residual error calculation module is used for acquiring a residual error between the first image and the input image to obtain a first residual error image;
the self-coding module is used for taking the input image as the input of a self-coding model and outputting a hidden variable and a first residual distribution, where the hidden variable comprises features extracted from the input image and the first residual distribution comprises the residual values output by the self-coding model for each pixel of the input image, corresponding to the pixels of the first residual image;
the residual coding module is used for coding the first residual image and the first residual distribution to obtain residual coded data;
and the hidden variable coding module is used for coding the hidden variables to obtain hidden variable coded data, and the hidden variable coded data and the residual coded data are used for decompressing to obtain an input image.
In a possible embodiment, the residual coding module is specifically configured to take the first residual image and the first residual distribution as inputs of a semi-dynamic entropy coder and output the residual coded data. The semi-dynamic entropy coder performs entropy coding using a first preset type of operation, which includes addition, subtraction and bit operations, and does not use a second preset type of operation, which includes at least one of multiplication, division and modulo. That is, the semi-dynamic entropy coder avoids time-consuming operations such as multiplication, division and modulo and may use only simple operations such as addition and subtraction, so that efficient coding can be achieved.
In one possible implementation, the semi-dynamic entropy encoder may be obtained by transforming a dynamic entropy encoder. Specifically, the operations of the dynamic entropy encoder may first be approximated, for example by replacing exact operations with approximate ones, reducing or removing multiplication, division and modulo; any remaining operations whose cost exceeds a certain threshold (such as residual modulo, multiplication and division) may then be converted into table lookups and lightweight operations such as addition, subtraction and bit operations, yielding the semi-dynamic entropy encoder provided by the present application. In other words, the semi-dynamic entropy encoder is an entropy encoder in which some operations of a dynamic entropy encoder have been replaced or converted, so that entropy encoding uses only simple, efficient operations such as addition, subtraction and bit operations.
In a possible implementation manner, the hidden variable encoding module is specifically configured to use a hidden variable as an input of the static entropy encoder to obtain hidden variable encoded data.
In a possible implementation, the self-coding model comprises a coding model and a decoding model, and the self-coding module is specifically configured to: take the input image as the input of the coding model and output the hidden variable, where the coding model is used to extract features from the input image; and take the hidden variable as the input of the decoding model to obtain the first residual distribution, where the decoding model is used to predict the residual distribution corresponding to each pixel of the input image.
In one possible embodiment, the autoregressive model is used to predict the values of pixels on the same connecting line using the values of already-predicted pixels.
In a fourth aspect, the present application provides an image decompression apparatus comprising:
the transceiver module is used for obtaining hidden-variable coded data and residual coded data, where the hidden-variable coded data comprises data obtained by the encoding end coding features extracted from an input image, and the residual coded data comprises data obtained by coding the residual between the image output by the autoregressive model and the input image;
the hidden variable decoding module is used for decoding the hidden variable coded data to obtain hidden variables, and the hidden variables comprise characteristics extracted from an input image by a coding end;
the self-coding module is used for taking the hidden variable as the input of the self-coding model and outputting second residual distribution;
the residual error decoding module is used for decoding by combining the second residual error distribution and the residual error coded data to obtain a second residual error image;
and the autoregressive module is used for taking the second residual image as the input of the back propagation of the autoregressive model and outputting the decompressed image.
In a possible embodiment, the hidden variable decoding module is specifically configured to output the hidden variable by using the hidden variable encoded data as an input of the static entropy encoder.
In a possible embodiment, the residual decoding module is specifically configured to take the second residual distribution and the residual coded data as inputs of a semi-dynamic entropy coder and output the second residual image. The semi-dynamic entropy coder performs entropy coding using a first preset type of operation, which includes addition, subtraction and bit operations, and does not use a second preset type of operation, which includes at least one of multiplication, division and modulo. That is, the semi-dynamic entropy coder avoids time-consuming operations such as multiplication, division and modulo and may use only simple operations such as addition and subtraction, so that efficient coding can be achieved.
In one possible implementation, the semi-dynamic entropy encoder may be obtained by transforming a dynamic entropy encoder. Specifically, the operations of the dynamic entropy encoder may first be approximated, for example by replacing exact operations with approximate ones, reducing or removing multiplication, division and modulo; any remaining operations whose cost exceeds a certain threshold (such as residual modulo, multiplication and division) may then be converted into table lookups and lightweight operations such as addition, subtraction and bit operations, yielding the semi-dynamic entropy encoder provided by the present application. In other words, the semi-dynamic entropy encoder is an entropy encoder in which some operations of a dynamic entropy encoder have been replaced or converted, so that entropy encoding uses only simple, efficient operations such as addition, subtraction and bit operations.
In a possible implementation manner, the autoregressive module is specifically configured to perform parallel decoding on pixel points located on the same connecting line in the second residual image through an autoregressive model, so as to obtain a decompressed image.
In a fifth aspect, an embodiment of the present application provides an image encoding apparatus having the function of implementing the image encoding method of the first aspect. The function can be realized by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the function described above.
In a sixth aspect, an embodiment of the present application provides an image decompression apparatus having the function of implementing the image decompression method of the second aspect. The function can be realized by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the function described above.
In a seventh aspect, an embodiment of the present application provides an image encoding apparatus, including: a processor and a memory, wherein the processor and the memory are interconnected by a line, and the processor calls the program code in the memory for executing the processing-related functions in the method for encoding an image as described in any of the first aspect. Alternatively, the image encoding device may be a chip.
In an eighth aspect, an embodiment of the present application provides an image decompression apparatus, including: a processor and a memory, wherein the processor and the memory are interconnected by a line, and the processor calls the program code in the memory for executing the processing-related functions in the method for image decompression as described in any of the second aspects. Alternatively, the image decompression means may be a chip.
In a ninth aspect, the present application provides an image encoding apparatus, which may also be referred to as a digital processing chip or chip, where the chip includes a processing unit and a communication interface, and the processing unit obtains program instructions through the communication interface, and the program instructions are executed by the processing unit, and the processing unit is configured to execute the functions related to the processing in the first aspect or any one of the optional implementations of the first aspect.
In a tenth aspect, the present application provides an image decompression apparatus, which may also be referred to as a digital processing chip or chip; the chip includes a processing unit and a communication interface, the processing unit obtains program instructions through the communication interface, and the program instructions are executed by the processing unit, the processing unit being configured to execute the processing-related functions in the second aspect or any optional implementation of the second aspect.
In an eleventh aspect, the present application provides an image processing system, including an image encoding device configured to perform the processing-related functions of the first aspect or any optional implementation thereof, and an image decompression device configured to perform the processing-related functions of the second aspect or any optional implementation thereof.
In a twelfth aspect, an embodiment of the present application provides a computer-readable storage medium, which includes instructions that, when executed on a computer, cause the computer to perform the method in any optional implementation manner of the first aspect or the second aspect.
In a thirteenth aspect, embodiments of the present application provide a computer program product containing instructions, which when run on a computer, cause the computer to perform the method in any of the optional embodiments of the first or second aspects.
Drawings
FIG. 1 is a schematic diagram of an artificial intelligence body framework used in the present application;
FIG. 2 is a system architecture diagram according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an application scenario according to an embodiment of the present application;
FIG. 4 is a schematic diagram of another application scenario according to an embodiment of the present application;
FIG. 5 is a schematic diagram of another application scenario according to an embodiment of the present application;
FIG. 6 is a schematic flowchart of an image encoding method according to an embodiment of the present application;
FIG. 7 is a schematic flowchart of another image encoding method according to an embodiment of the present application;
FIG. 8 is a schematic diagram illustrating a prediction method of an autoregressive model according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a prediction sequence of an autoregressive model according to an embodiment of the present application;
FIG. 10 is a schematic diagram illustrating a residual calculation method according to an embodiment of the present application;
FIG. 11 is a schematic diagram of a data structure provided by an embodiment of the present application;
FIG. 12 is a schematic flowchart of an image decompression method according to an embodiment of the present application;
FIG. 13 is a schematic flowchart of another image decompression method according to an embodiment of the present application;
FIG. 14 is a schematic structural diagram of an image encoding apparatus provided in the present application;
FIG. 15 is a schematic structural diagram of an image decoding apparatus provided in the present application;
FIG. 16 is a schematic structural diagram of another image encoding apparatus provided in the present application;
FIG. 17 is a schematic structural diagram of another image decoding apparatus provided in the present application;
FIG. 18 is a schematic diagram of a chip structure provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The general workflow of the artificial intelligence system will be described first. Referring to fig. 1, which shows a schematic structural diagram of an artificial intelligence body framework, the framework is explained below from two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects the general process from data acquisition onward, for example, intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision making, and intelligent execution and output. In this process, the data undergoes a refinement process of "data - information - knowledge - wisdom". The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (technologies for providing and processing information) to the industrial ecology of the system.
(1) Infrastructure
The infrastructure provides computing power support for the artificial intelligence system, enables communication with the outside world, and is supported by a base platform. Communication with the outside is performed through sensors; computing power is provided by intelligent chips, such as a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA), or other hardware acceleration chips; the base platform includes related platform assurance and support such as a distributed computing framework and a network, and may include cloud storage and computing, interconnection networks, and the like. For example, sensors communicate with the outside to acquire data, and the data is provided to intelligent chips in a distributed computing system provided by the base platform for computation.
(2) Data
Data at the upper level of the infrastructure represents the data sources in the field of artificial intelligence. The data relates to graphics, images, voice, and text, as well as Internet-of-things data from traditional devices, including service data of existing systems and sensing data such as force, displacement, liquid level, temperature, and humidity.
(3) Data processing
Data processing typically includes data training, machine learning, deep learning, searching, reasoning, decision making, and the like.
The machine learning and the deep learning can perform symbolized and formalized intelligent information modeling, extraction, preprocessing, training and the like on data.
Inference refers to the process of simulating human intelligent inference in a computer or intelligent system, in which the machine uses formalized information to think about and solve problems according to an inference control strategy; a typical function is searching and matching.
Decision making refers to the process of making decisions after reasoning about intelligent information, and generally provides functions such as classification, ranking, and prediction.
(4) General capabilities
After the above-mentioned data processing, further based on the result of the data processing, some general capabilities may be formed, such as algorithms or a general system, e.g. translation, analysis of text, computer vision processing, speech recognition, recognition of images, etc.
(5) Intelligent product and industrial application
Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They are the encapsulation of the overall artificial intelligence solution, commercializing intelligent information decision-making and realizing practical applications. The application fields mainly include: intelligent terminals, intelligent transportation, intelligent medical treatment, autonomous driving, smart cities, etc.
The embodiments of the present application relate to a large number of neural networks and related applications of images, and in order to better understand the solution of the embodiments of the present application, the following first introduces terms and concepts related to the fields of neural networks and images to which the embodiments of the present application may relate.
(1) Neural network
The neural network may be composed of neural units. A neural unit may refer to an operation unit that takes x_s and an intercept of 1 as inputs, and the output of the operation unit may be as shown in the following formula:

h_{W,b}(x) = f(W^T x) = f( Σ_{s=1}^{n} W_s x_s + b )

where s = 1, 2, …, n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit. f is the activation function (activation function) of the neural unit, which introduces a nonlinear characteristic into the neural network to convert an input signal of the neural unit into an output signal. The output signal of the activation function may be used as the input of the next convolutional layer, and the activation function may be a sigmoid function. A neural network is a network formed by joining many such single neural units together, that is, the output of one neural unit may be the input of another neural unit. The input of each neural unit can be connected with the local receptive field of the previous layer to extract the features of the local receptive field, and the local receptive field may be a region composed of several neural units.
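As an illustrative sketch (not part of the claimed method), the neural-unit formula above can be computed directly. The function name `neural_unit`, the example values, and the choice of sigmoid for f are all assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neural_unit(x, w, b):
    # Output of a single neural unit: f( sum_s W_s * x_s + b )
    return sigmoid(np.dot(w, x) + b)

# Hypothetical example inputs, weights, and bias
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.2])
b = 0.05
y = neural_unit(x, w, b)
```

With a sigmoid activation the output always lies strictly between 0 and 1, which is the nonlinear squashing mentioned above.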
(2) Deep neural network
Deep Neural Networks (DNNs), also called multi-layer neural networks, can be understood as neural networks with multiple intermediate layers. According to the positions of the different layers, the neural network inside a DNN can be divided into three categories: input layer, intermediate layers, and output layer. Generally, the first layer is the input layer, the last layer is the output layer, and the layers in between are intermediate layers (also called hidden layers). The layers are fully connected, that is, any neuron of the ith layer is necessarily connected with any neuron of the (i+1)th layer.
Although DNN appears complex, each layer can be represented as a simple linear relational expression:

y = α(W·x + b)

where x is the input vector, y is the output vector, b is the offset vector (also referred to as the bias parameter), W is the weight matrix (also referred to as the coefficient), and α() is the activation function. Each layer simply performs this operation on the input vector x to obtain the output vector y. Because a DNN has many layers, the number of coefficients W and offset vectors b is also large. These parameters are defined in the DNN as follows, taking the coefficient W as an example: assume that in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as W^3_{24}, where the superscript 3 represents the layer in which the coefficient W is located, and the subscripts correspond to the output index 2 of the third layer and the input index 4 of the second layer.
In summary, the coefficient from the kth neuron at layer L-1 to the jth neuron at layer L is defined as W^L_{jk}.
Note that the input layer has no W parameter. In deep neural networks, more intermediate layers make the network better able to characterize complex situations in the real world. Theoretically, a model with more parameters has higher complexity and larger "capacity", which means that it can accomplish more complex learning tasks. Training the deep neural network is the process of learning the weight matrices, and its final goal is to obtain the weight matrices (formed by the vectors W of many layers) of all layers of the trained deep neural network.
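To make the per-layer relation y = α(W·x + b) concrete, here is a minimal sketch of a DNN forward pass. The layer sizes, the ReLU activation, and the function names are illustrative assumptions, not part of the application:

```python
import numpy as np

def relu(z):
    # One possible choice for the activation function alpha()
    return np.maximum(z, 0.0)

def dnn_forward(x, layers):
    """Forward pass of a DNN: each layer computes y = alpha(W @ x + b)."""
    for W, b in layers:
        x = relu(W @ x + b)
    return x

# Hypothetical three-layer network: 3-dim input -> 4-dim hidden -> 2-dim output
rng = np.random.default_rng(0)
layers = [(rng.standard_normal((4, 3)), np.zeros(4)),
          (rng.standard_normal((2, 4)), np.zeros(2))]
out = dnn_forward(np.ones(3), layers)
```

Note that, as stated above, the input layer itself has no W parameter: the first weight matrix maps the input vector into the first hidden layer.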
(3) Convolutional neural network
A Convolutional Neural Network (CNN) is a deep neural network with a convolutional structure. The convolutional neural network comprises a feature extractor consisting of convolutional layers and sub-sampling layers, which can be regarded as a filter. The convolutional layer is a neuron layer that performs convolutional processing on an input signal in a convolutional neural network. In a convolutional layer of a convolutional neural network, one neuron may be connected to only a portion of the neighboring neurons. A convolutional layer usually contains several feature planes, and each feature plane may be composed of several neural units arranged in a rectangle. The neural units of the same feature plane share weights, and the shared weights are the convolution kernel. Sharing weights may be understood as meaning that the way image information is extracted is location independent. The convolution kernel may be initialized as a matrix of random size, and reasonable weights can be learned during training of the convolutional neural network. In addition, sharing weights brings the direct benefit of reducing the connections between layers of the convolutional neural network, while reducing the risk of overfitting.
(4) Loss function
In the process of training a deep neural network, because the output of the deep neural network is expected to be as close as possible to the value that is really desired to be predicted, the weight vector of each layer of the neural network can be updated according to the difference between the predicted value of the current network and the really desired target value (of course, there is usually an initialization process before the first update, that is, parameters are preconfigured for each layer of the deep neural network). For example, if the predicted value of the network is high, the weight vectors are adjusted to make the prediction lower, and the adjustment continues until the deep neural network can predict the really desired target value or a value very close to it. Therefore, it is necessary to define in advance how to compare the difference between the predicted value and the target value; this is done with a loss function (loss function) or objective function (objective function), which are important equations for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a larger difference, so training the deep neural network becomes a process of reducing this loss as much as possible. The loss function may generally be a mean square error, cross-entropy, logarithmic, or exponential loss. For example, the mean square error can be used as a loss function, defined as

MSE = (1/n) Σ_{i=1}^{n} (y_i - ŷ_i)²

The specific loss function can be selected according to the actual application scenario.
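As a small sketch of the mean-square-error loss just defined (the function name `mse_loss` is chosen here for illustration):

```python
import numpy as np

def mse_loss(y_pred, y_true):
    # Mean square error: (1/n) * sum_i (y_i - yhat_i)^2
    diff = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.mean(diff ** 2))
```

A perfect prediction yields a loss of 0, and the loss grows with the squared difference, which is why training can be framed as driving this value down.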
(5) Back propagation algorithm
The neural network can adopt a Back Propagation (BP) algorithm to correct the size of parameters in the initial neural network model in the training process, so that the reconstruction error loss of the neural network model is smaller and smaller. Specifically, the error loss is generated by transmitting the input signal in the forward direction until the output, and the parameters in the initial neural network model are updated by reversely propagating the error loss information, so that the error loss is converged. The back propagation algorithm is a back propagation motion with error loss as a dominant factor, aiming to obtain the parameters of the optimal neural network model, such as a weight matrix.
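The update direction produced by back propagation can be illustrated with a one-parameter gradient-descent sketch. The learning rate, the toy loss (w - 3)², and the function name are assumptions for illustration only:

```python
def gradient_descent_step(w, grad, lr=0.1):
    # Back propagation yields the gradient of the loss w.r.t. each parameter;
    # the parameter is then moved against the gradient to reduce the loss.
    return w - lr * grad

# Minimize loss(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w = 0.0
for _ in range(100):
    w = gradient_descent_step(w, 2.0 * (w - 3.0))
```

After repeated steps the parameter converges toward the loss minimum, mirroring how the reconstruction error loss of the model "becomes smaller and smaller" during training.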
(6) Entropy coding
Entropy coding is coding that, in accordance with the principle of entropy, loses no information during the coding process. Information entropy is the average amount of information of a source (a measure of uncertainty). Common entropy coding methods include Shannon coding, Huffman coding, and arithmetic coding.
For example, if the distribution of the pixel values of each pixel in the predicted image is known, an optimal compression scheme can be obtained by using entropy coding. With entropy coding, an image with probability p can be encoded using -log₂(p) bits. For example, an image with probability 1/8 needs to be represented by 3 bits, and an image with probability 1/256 needs to be represented by 8 bits.
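The optimal code length -log₂(p) can be checked directly (the helper name `code_length_bits` is an illustrative choice):

```python
import math

def code_length_bits(p):
    # Optimal code length for a symbol of probability p: -log2(p) bits
    return -math.log2(p)
```

This reproduces the two examples above: a probability of 1/8 costs 3 bits and a probability of 1/256 costs 8 bits, so more probable (better predicted) symbols are cheaper to encode.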
An algorithm that determines the number of bits per letter requires that the probability of occurrence of each letter be known as accurately as possible, and the task of the model is to provide this data. The better the model's prediction, the better the compression result. Furthermore, the model must produce the same data during compression and recovery.
A static model (alternatively referred to as static entropy coding) analyzes the entire text before compression to calculate the probability of each letter. This calculation is then used throughout the text. The coding table needs to be calculated only once, so the coding speed is high, and apart from the probability values required for decoding, the result is certainly not longer than the original text. In the method provided by the present application, the entropy coding adopted may include static entropy coding schemes such as tANS or FSE.
In a dynamic model, the probabilities vary continuously with the encoding process. This can be achieved by a number of algorithms, such as:
Forward dynamic: the probability is calculated from the letters that have already been encoded, and is updated each time a letter is encoded.
Reverse dynamic: the probability of each letter in the remaining unencoded part is calculated before encoding. As encoding progresses, letters that no longer occur toward the end have a probability of 0, while the probabilities of the remaining letters increase and the number of bits needed to encode them decreases. The compression rate becomes so high that the last letter requires only 0 bits to encode.
Therefore, the model can be optimized according to the particularities of different parts; in the forward model, the probability data need not be transmitted.
In the present application, entropy coding is divided into multiple types, for example, static entropy coding, semi-dynamic entropy coding, and dynamic entropy coding. No matter which encoder is used, the aim is the same: data with probability p is encoded with a length of approximately -log₂(p) bits. The difference is that static entropy coding uses a single probability distribution for coding, semi-dynamic entropy coding uses multiple (i.e., a finite number of) probability distributions, and dynamic entropy coding can use an unlimited number of probability distributions.
(7) Autoregressive model
An autoregressive model is a way of processing a time series: it predicts the current data using previous history data of the same variable.
For example, the previous values of the same variable x, namely x₁ to x_{t-1}, are used to predict the current value x_t, and they are assumed to be in a linear relationship. Since this developed from linear regression in regression analysis, where x is used to predict x instead of y, it is called autoregression.
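A minimal sketch of the linear autoregressive prediction just described (the function name, coefficient ordering, and example values are assumptions for illustration):

```python
def ar_predict(history, coeffs, intercept=0.0):
    """Predict x_t as a linear combination of the p most recent values:
    x_t = c + a_1 * x_{t-1} + ... + a_p * x_{t-p}
    (coeffs are ordered most-recent-first)."""
    p = len(coeffs)
    recent = history[-p:][::-1]  # most recent value first
    return intercept + sum(a * x for a, x in zip(coeffs, recent))
```

For example, with `coeffs=[0.5, 0.5]` the prediction is simply the average of the two most recent values of the same variable.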
(8) Self-coding model
A self-coding model is a neural network that uses a back-propagation algorithm to make the output values equal to the input values: it compresses the input data into a latent-space representation and then reconstructs the output from this representation.
A self-coding model generally includes an encoding (encoder) model and a decoding (decoder) model. In the present application, the trained encoding model is used to extract features from an input image to obtain a latent variable, and the latent variable is input to the trained decoding model, which can output a predicted residual corresponding to the input image.
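As a structural sketch of the encoder/decoder pair (a linear toy with arbitrary dimensions; the weights, sizes, and function names are illustrative assumptions, not the trained models of the application):

```python
import numpy as np

rng = np.random.default_rng(1)
W_enc = rng.standard_normal((2, 4)) * 0.1   # encoder: 4-dim input -> 2-dim latent
W_dec = rng.standard_normal((4, 2)) * 0.1   # decoder: 2-dim latent -> 4-dim output

def encode(x):
    # Compress the input into the latent (hidden) variable
    return W_enc @ x

def decode(z):
    # Reconstruct (predict) the output from the latent variable
    return W_dec @ z

x = np.array([1.0, 0.0, -1.0, 0.5])
z = encode(x)        # latent representation, smaller than the input
x_hat = decode(z)    # reconstruction with the input's original shape
```

The key structural point is that the latent variable is lower-dimensional than the input, while the decoder restores the original dimensionality.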
(9) Lossless compression
Lossless compression is a technique for compressing data such that the compressed data occupies less space than the data before compression; the compressed data can be decompressed to restore the original data, and the decompressed data is exactly the same as the data before compression.
Generally, the larger the probability of each pixel in the image (i.e., the probability value obtained when the pixel value of the current pixel is predicted from the pixel values of other pixels), the shorter the compressed length. The probability of a really existing image is much higher than that of a randomly generated image, and thus the number of bits per dimension (bpd) required to compress each pixel of the former is much smaller than that of the latter. In practical applications, the bpd of most images is significantly smaller after compression, and only a very small proportion of images have a higher bpd than before compression, so the average bpd per image is reduced.
(10) Compression ratio
The ratio of the original data size to the compressed data size. The compression ratio is 1 if no compression is performed, and a larger ratio is better.
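The two metrics defined above (compression ratio, and the bpd from the lossless-compression section) are simple quotients; a sketch, with illustrative function names:

```python
def compression_ratio(original_size, compressed_size):
    # Ratio of original data size to compressed data size; 1 means no compression.
    return original_size / compressed_size

def bits_per_dimension(compressed_bits, num_pixels):
    # bpd: average number of bits needed per pixel after compression
    return compressed_bits / num_pixels
```

For instance, compressing 100 bytes down to 50 bytes gives a ratio of 2, and spending 8000 bits on a 1000-pixel image gives 8 bpd.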
(11) Throughput capacity
The amount of raw data that can be compressed/decompressed per second.
(12) Receptive field
When a pixel is predicted, certain points need to be known in advance; these known points constitute the receptive field. Changing a point outside the receptive field does not change the prediction of the pixel.
The encoding method and the decoding method provided by the embodiments of the present application may be executed on a server or on a terminal device. Accordingly, the neural networks mentioned below may be deployed on a server or on a terminal, and the choice may be adjusted according to the actual application scenario. For example, the encoding method and the decoding method provided by the present application may be deployed in a terminal as a plug-in. The terminal device may be a mobile phone with an image processing function, a tablet personal computer (TPC), a media player, a smart TV, a laptop computer, a personal digital assistant (PDA), a personal computer (PC), a camera, a camcorder, a smart watch, a wearable device (WD), an autonomous vehicle, or the like, which is not limited in the embodiments of the present application. In the following, an example is given in which the encoding method and the decoding method provided in the present application are deployed in a terminal.
All or part of the processes in the encoding method and the decoding method provided by the present application may be implemented by a neural network; for example, the autoregressive model, the self-coding model, and the like may be implemented by neural networks. A neural network generally needs to be trained before being deployed on a terminal. As shown in fig. 2, this embodiment provides a system architecture 100. In fig. 2, a data acquisition device 160 is used to acquire training data. In some alternative implementations, the training data may include a large number of high-definition images for the autoregressive and self-coding models described herein.
After the training data is collected, data collection device 160 stores the training data in database 130, and training device 120 trains target model/rule 101 based on the training data maintained in database 130. Alternatively, the training set mentioned in the following embodiments of the present application may be obtained from the database 130, or may be obtained by inputting data by a user.
The target model/rule 101 may be a neural network trained in the embodiment of the present application, and the neural network may include one or more networks, such as an autoregressive model or a self-coding model.
The following describes that the training device 120 obtains the target model/rule 101 based on the training data, and the training device 120 processes the input three-dimensional model, and compares the output image with the high-quality rendering image corresponding to the input three-dimensional model until the difference between the output image of the training device 120 and the high-quality rendering image is smaller than a certain threshold, thereby completing the training of the target model/rule 101.
The target model/rule 101 can be used to implement the neural network mentioned in the encoding method and the decoding method of the embodiment of the present application, that is, the data to be processed (such as the image to be compressed) is input into the target model/rule 101 after being processed by correlation, i.e., the processing result can be obtained. The target model/rule 101 in the embodiment of the present application may specifically be a neural network mentioned below in the present application, and the neural network may be a CNN, DNN, RNN, or other type of neural network described above. It should be noted that, in practical applications, the training data maintained in the database 130 may not necessarily all come from the acquisition of the data acquisition device 160, and may also be received from other devices. It should be noted that the training device 120 does not necessarily perform the training of the target model/rule 101 based on the training data maintained by the database 130, and may also obtain the training data from the cloud or other places for performing the model training, which is not limited in this application.
The target model/rule 101 obtained by training according to the training device 120 may be applied to different systems or devices, for example, the execution device 110 shown in fig. 2, where the execution device 110 may also be referred to as a computing device, and the execution device 110 may be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, an Augmented Reality (AR)/Virtual Reality (VR), a vehicle-mounted terminal, or the like, and may also be a server or a cloud device, or the like. In fig. 2, the execution device 110 configures an input/output (I/O) interface 112 for data interaction with an external device, and a user may input data to the I/O interface 112 through the client device 140, where the input data may include: the client device inputs data to be processed. The client may be other hardware devices, such as a terminal or a server, and the client may also be software deployed on the terminal, such as an APP, a web page, and the like.
The preprocessing module 113 and the preprocessing module 114 are configured to perform preprocessing according to input data (such as data to be processed) received by the I/O interface 112, and in this embodiment, the input data may be processed directly by the computing module 111 without the preprocessing module 113 and the preprocessing module 114 (or only one of them may be used).
In the process that the execution device 110 preprocesses the input data or in the process that the calculation module 111 of the execution device 110 executes the calculation or other related processes, the execution device 110 may call the data, the code, and the like in the data storage system 150 for corresponding processes, and may store the data, the instruction, and the like obtained by corresponding processes in the data storage system 150.
Finally, the I/O interface 112 returns the processing result to the client device 140 and provides it to the user. For example, if the first neural network is used for image classification and the processing result is a classification result, the I/O interface 112 returns the obtained classification result to the client device 140 and provides it to the user.
It should be noted that the training device 120 may generate corresponding target models/rules 101 based on different training data for different targets or different tasks, and the corresponding target models/rules 101 may be used to achieve the targets or complete the tasks, so as to provide the user with the required results. In some scenarios, the performing device 110 and the training device 120 may be the same device or may be located within the same computing device, and for ease of understanding, the performing device and the training device will be described separately and are not intended to be limiting.
In the case shown in fig. 2, the user may manually specify the input data, which may be operated through an interface provided by the I/O interface 112. Alternatively, the client device 140 may automatically send the input data to the I/O interface 112, and if the client device 140 is required to automatically send the input data to obtain authorization from the user, the user may set the corresponding permissions in the client device 140. The user can view the result output by the execution device 110 at the client device 140, and the specific presentation form can be display, sound, action, and the like. The client device 140 may also act as a data collection terminal, collecting input data for the input I/O interface 112 and the predicted tag for the output I/O interface 112 as new sample data, and storing the new sample data in the database 130. Of course, the input data inputted into the I/O interface 112 and the prediction tag outputted into the I/O interface 112 as shown in the figure may be directly stored into the database 130 as new sample data by the I/O interface 112 without being collected by the client device 140.
It should be noted that fig. 2 is only a schematic diagram of a system architecture provided in an embodiment of the present application, and a positional relationship between devices, modules, and the like shown in the diagram does not constitute any limitation, for example, in fig. 2, the data storage system 150 is an external memory with respect to the execution device 110, and in other cases, the data storage system 150 may also be disposed in the execution device 110.
As shown in fig. 2, a target model/rule 101 is obtained according to training of the training device 120, where the target model/rule 101 may be a neural network in the present application in this embodiment, and specifically, the neural network provided in this embodiment may include a CNN, a Deep Convolutional Neural Network (DCNN), a Recurrent Neural Network (RNN), or a constructed neural network, and the like.
The encoding method and the decoding method in the embodiments of the present application may be executed by an electronic device, which is the aforementioned execution device. The electronic equipment comprises a CPU and a GPU and can compress images. Of course, other devices, such as NPU or ASIC, may also be included, which is only an exemplary illustration and is not described in detail. Illustratively, the electronic device may be, for example, a mobile phone (mobile phone), a tablet computer, a notebook computer, a PC, a Mobile Internet Device (MID), a wearable device, a Virtual Reality (VR) device, an Augmented Reality (AR) device, a wireless electronic device in industrial control (industrial control), a wireless electronic device in self driving (self driving), a wireless electronic device in remote surgery (remote medical supply), a wireless electronic device in smart grid (smart grid), a wireless electronic device in transportation safety (transportation safety), a wireless electronic device in city (smart city), a wireless electronic device in smart home (smart home), and the like. The electronic device may be a device running an android system, an IOS system, a windows system, and other systems. An application program, such as communication software, an album, or a camera, that needs to compress an image to obtain a compressed image may be run in the electronic device.
Generally, in some image compression scenarios, entropy encoding may be employed for compression. The distribution of the image is unknown, so the original distribution needs to be estimated, and the estimated distribution is input into an entropy encoder for encoding. Generally, the more accurate the estimate, the higher the compression ratio. Traditional lossless image compression algorithms mostly rely on the principle that nearby pixel values are usually close, and use a fixed prediction method. This approach has low coding efficiency.
In some scenarios, the AI image may be compressed in a lossless compression manner, and compared with the conventional encoding algorithm, the AI algorithm may obtain a significantly higher compression rate, but the compression/decompression efficiency is very low.
For example, an autoregressive model may be used for image compression. An autoregressive model is constructed which, given the values of all previous pixels as input, outputs the distribution parameters of the predicted point; if the distribution is Gaussian, the output is two parameters, mean and variance. During compression, the pixels are input into the autoregressive model to obtain the distribution prediction of each pixel, and the distribution prediction and the pixel values are input into the entropy coder to obtain the coded data. During decompression, the already-decoded pixels are input into the autoregressive model to obtain the distribution prediction of the next pixel, and the distribution prediction and the coded data are input into the entropy coder to obtain the decoded data. However, in the encoding and decoding processes, the prediction of each pixel depends on all previous pixels, so the operation efficiency is low; during decompression, all pixels before the current pixel must be decompressed before the current pixel can be decompressed, and only one pixel can be decompressed per network inference, so the number of network inferences is large and the decompression efficiency is low.
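The pixel-by-pixel loop described above can be sketched with toy stand-ins. `ToyARModel` and `ToyCoder` below are hypothetical placeholders for the trained autoregressive model and the entropy coder (the "coder" here just records symbols instead of producing a bitstream); only the control flow mirrors the scheme in the text:

```python
class ToyARModel:
    """Hypothetical stand-in for a trained autoregressive model: here it
    'predicts' the previous pixel value (0 at the start) as the distribution."""
    def predict(self, previous_pixels):
        return previous_pixels[-1] if previous_pixels else 0

class ToyCoder:
    """Hypothetical stand-in for an entropy coder: stores symbols verbatim
    instead of entropy-coding them under the predicted distribution."""
    def __init__(self, stream=None):
        self.stream = list(stream) if stream else []
    def encode(self, symbol, dist):
        self.stream.append(symbol)
    def finish(self):
        return self.stream
    def decode(self, dist):
        return self.stream.pop(0)

def ar_compress(pixels, model, coder):
    # Each pixel's distribution is predicted from all previous pixels,
    # then the pixel is coded under that distribution.
    for i, pixel in enumerate(pixels):
        dist = model.predict(pixels[:i])
        coder.encode(pixel, dist)
    return coder.finish()

def ar_decompress(num_pixels, model, coder):
    # Decoding pixel i needs all previously decoded pixels,
    # hence one model inference per pixel.
    pixels = []
    for _ in range(num_pixels):
        dist = model.predict(pixels)
        pixels.append(coder.decode(dist))
    return pixels
```

The round trip restores the pixels exactly, illustrating the lossless property, while the one-inference-per-pixel loop in `ar_decompress` makes the decompression bottleneck discussed above visible.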
As another example, a self-coding (autoencoder) model may be employed for image compression. During encoding, the original data is input into an encoding network (Encoder) to obtain a hidden variable, and the hidden variable is input into a decoding network (Decoder) to obtain a distribution prediction of the image; a hand-designed distribution and the value of the hidden variable are input into the entropy coder to encode the hidden variable; the distribution prediction of the image and the original image are input into the entropy coder to encode the image. During decoding, the hand-designed distribution and the encoding of the hidden variable are input into the entropy coder to decode the hidden variable; the hidden variable is input into the decoding network (Decoder) to obtain the distribution prediction of the image; and the distribution prediction of the image and the encoding of the image are input into the entropy coder to decode the image. The fitting ability of the self-coding model is inferior to that of the autoregressive model: for the compression rate to exceed that of traditional compression algorithms, a deeper network is needed, so the latency of a single network inference is high.
Therefore, the present application provides an encoding method and a decoding method that perform lossless compression using an autoregressive model and a self-coding model, and provides an efficient semi-dynamic entropy coder, so that both model inference and the coding process run on an AI chip, transmission between the system memory and the AI chip memory is reduced, and high-bandwidth compression and decompression are realized.
First, for ease of understanding, some application scenarios of the encoding method and the decoding method provided in the present application are exemplarily described.
Scene one, local storage shot image
Taking the case that the method provided by the present application is deployed in a terminal, the terminal may include a mobile phone, a camera, a monitoring device, or another device having a shooting function or connected to a camera device. For example, as shown in fig. 3, after an image is obtained by shooting, in order to reduce the storage space occupied by the image, the image may be subjected to lossless compression by the encoding method provided in the present application to obtain compressed encoded data. When the image needs to be read, for example displayed in an album, the image can be decoded by the decoding method provided by the present application, so as to obtain a high-definition image. By this method, the image can be efficiently and losslessly compressed, the storage required by the image is reduced, and the image can be losslessly recovered, with a high-definition image obtained by decompression.
Scene two, image transmission
In some communication scenarios, image transmission may be involved. For example, as shown in fig. 4, when a user uses communication software to communicate, images may be transmitted through a wired or wireless network. To increase the transmission rate and reduce the network resources occupied by transmitting the image, the image may be losslessly compressed by the encoding method provided in the present application to obtain compressed encoded data, and the encoded data may then be transmitted. After the receiving end receives the encoded data, the encoded data can be decoded by the decoding method provided by the present application, so as to obtain the restored image.
Scene three, server saves a large number of images
In some platforms or databases providing services for users, a large number of high-definition images generally need to be stored, and if the images are directly stored according to pixel points of each frame of image, a very large storage space needs to be occupied. For example, as shown in fig. 5, some shopping software or public data sets need to store a large number of high-definition images in a server, and a user can read a required image from the server. The coding method provided by the application can be used for efficiently carrying out lossless compression on the image needing to be stored to obtain compressed data. When the image needs to be read, the stored coded data can be decoded by the decoding method provided by the application, so that the high-definition image is obtained.
For ease of understanding, the flows of the encoding method and the decoding method provided in the present application are described below, respectively.
Referring to fig. 6, a flow chart of an encoding method provided by the present application is shown as follows.
601. The input image is used as the input of the autoregressive model, and the first image is output.
The input image may be an image to be compressed, and the autoregressive model may be configured to predict a pixel value of a current pixel by using values of other pixels in the input image except the current pixel, so as to obtain a predicted pixel distribution of each pixel, that is, the first image.
The input image may comprise a plurality of images, and the source of the input image may be different depending on the scene. For example, the input image may be a captured image or a received image.
Optionally, in the process of prediction by the autoregressive model, the pixels on the same connecting line may be predicted using the pixel values of already-predicted pixels, so that in the subsequent decoding process the pixels on the same connecting line can be decoded without waiting for other pixels on that line to be decoded. This enables parallel decoding of the pixels on the same connecting line and improves the decoding efficiency of the input image. The same connecting line may be the same row, the same column, the same diagonal line, or the like, and may be determined according to the actual application scenario.
602. And acquiring a residual error between the first image and the input image to obtain a first residual error image.
After the first image is obtained, a residual value between each pixel point in the first image and a corresponding pixel point in the input image can be calculated to obtain a first residual image.
The resolution between the first image and the input image is generally the same, that is, the pixels in the first image and the pixels in the input image are in one-to-one correspondence, so that when the residual value is calculated, the residual value between each pair of pixels can be calculated, and the obtained residual values can form an image, that is, the first residual image.
Optionally, when calculating the residual, the residual value is usually an integer in the range [-255, 255]. The residual value may be converted into a low-precision numerical type, for example uint8, reducing the value range to [0, 255]; by setting an offset, the residual value of each pixel point can be distributed near 128, making the data more concentrated, so that the residual distribution between the input image and the autoregressive model output image can be represented with less data.
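For illustration, assuming a mod-256 wrap with an offset of 128 (one plausible reading of this step, not necessarily the exact formula used by the application), the conversion and its exact inverse might look like:

```python
import numpy as np

def residual_to_uint8(x, x_pred, offset=128):
    # Residuals of two uint8 images lie in [-255, 255]; wrapping them
    # modulo 256 and adding an offset of 128 centres small residuals near
    # 128, so they fit in a uint8 and follow a concentrated distribution.
    r = (x.astype(np.int16) - x_pred.astype(np.int16) + offset) % 256
    return r.astype(np.uint8)

def residual_from_uint8(r, x_pred, offset=128):
    # Exact inverse of the mapping above: the original pixels are
    # recovered losslessly from the residual and the prediction.
    x = (r.astype(np.int16) - offset + x_pred.astype(np.int16)) % 256
    return x.astype(np.uint8)
```

Because the wrap is exactly invertible, no information is lost even when the residual leaves the [-128, 127] range.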
603. The input image is used as the input of the self-coding model, and the hidden variable and the first residual distribution are output.
After the input image is obtained, the input image can be used as the input of the self-coding model, and the corresponding hidden variable and the first residual distribution are output.
The hidden variable may include features extracted from the input image, and the first residual distribution may include the distribution, predicted by the self-coding model, of the residual values between the input image and the image output by the autoregressive model, that is, of the pixel points of the first residual image.
In particular, the self-encoding model may include an encoding model that may be used to extract features from an input image and a decoding model that is used to predict a residual between the input image and an image output by the autoregressive model. That is, features may be extracted from an input image by a coding model to obtain a hidden variable representing an important feature of the input image, and the first residual distribution may be output using the hidden variable as an input of a decoding model.
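Schematically (with hypothetical stand-ins for the real networks — the function bodies below are illustrative placeholders, not the trained models), the split between the encoding model and the decoding model looks like:

```python
import numpy as np

def encoding_model(x):
    # Hypothetical stand-in for the encoding network: maps the image to a
    # low-dimensional hidden variable that keeps its salient features
    # (here simply a scaled 4-value summary; a real encoder is a network).
    return x.astype(np.float64).reshape(-1)[:4] / 255.0

def decoding_model(z, shape):
    # Hypothetical stand-in for the decoding network: predicts, for every
    # pixel, the parameters (mean, scale) of the residual distribution.
    mu = np.full(shape, 128.0) + z.sum()
    sigma = np.full(shape, 2.0)
    return mu, sigma

x = np.arange(16, dtype=np.uint8).reshape(4, 4)   # input image
z = encoding_model(x)                             # hidden variable
mu, sigma = decoding_model(z, x.shape)            # first residual distribution
```

The hidden variable `z` is what gets entropy-coded with a static distribution, while `(mu, sigma)` is used to entropy-code the residual image.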
It should be noted that, in the present application, the execution order of step 601 and step 603 is not limited, and step 601 may be executed first, step 603 may also be executed first, step 601 and step 603 may also be executed simultaneously, and may specifically be adjusted according to an actual application scenario.
604. And coding the first residual image and the first residual distribution to obtain residual coded data.
After obtaining the first residual image and the first residual distribution, the first residual image and the first residual distribution may be encoded to obtain residual encoded data.
Specifically, when the first residual image and the first residual distribution are encoded, semi-dynamic entropy coding may be adopted, that is, encoding is performed using a limited number of probability distributions, so as to obtain the encoded data of the residual image, namely the residual encoded data. The semi-dynamic entropy coder performs entropy coding using a first preset type of coding operation, where the first preset type includes addition, subtraction, or bit operations; it does not include a second preset type of coding operation, where the second preset type includes at least one of the more time-consuming multiplication, division, or remainder operations, so that coding efficiency is improved. Compared with dynamic entropy coding, in which decompressing one character requires more instructions and division and exponentiation are slow (each such instruction takes tens of times as long as an addition), efficient coding can be realized through semi-dynamic entropy coding with a limited number of probability distributions, improving coding efficiency.
In one possible implementation, the semi-dynamic entropy coder may be a transformation of a dynamic entropy coder. Specifically, the operations of the dynamic entropy coder may first be approximated, for example replaced with approximate operations so that multiplication, division, remainder, and similar operations are reduced or removed; then conversion processing may be performed so that all remaining operations that take longer than a certain duration (such as any remaining remainder, multiplication, and division operations) are converted into table accesses and lightweight operations such as addition, subtraction, and bit operations, thereby obtaining the semi-dynamic entropy coder provided by the present application. It can be understood that the semi-dynamic entropy coder is an entropy coder obtained by replacing or converting some operations of a dynamic entropy coder; when it is used for entropy coding, only simple operations such as addition, subtraction, and bit operations are needed, so that efficient coding is realized.
605. And coding the hidden variable to obtain hidden variable coded data.
The hidden variable may include important features extracted from the input image, so that when the image is compressed, the extracted features may be encoded to obtain hidden variable encoded data, facilitating subsequent lossless restoration of the image.
Optionally, when the hidden variable is encoded, the hidden variable may be encoded by using static entropy coding. The hidden variable is used as an input of a static entropy coder, so that a coded bit stream of the hidden variable is output.
The hidden variable coded data and the residual coded data can be used for lossless recovery of an image at a decoding end, so that lossless compression and recovery of the image are realized.
Generally, the self-coding model has poor fitting ability, and a deeper network is needed to achieve a better compression rate; by combining the output result of the autoregressive model, the size of the self-coding model can be reduced. Therefore, in the present application, the output results of the autoregressive model and the self-coding model are combined for coding, so that both models can be kept very small, avoiding the problem of excessively long inference time caused by an oversized self-coding network, and realizing efficient image compression. In addition, in the method provided by the present application, the whole process, including the AI model and the entropy coding, can be realized as AI lossless compression on an AI chip, so that the transmission problem between the system memory and the AI chip memory is avoided and coding efficiency is improved.
The foregoing introduces a flow of the encoding method provided by the present application, and the following introduces the flow of the encoding method provided by the present application in more detail with reference to a specific application scenario. Referring to fig. 7, a flow diagram of another encoding method provided in the present application is shown.
First, an input image 701 is acquired.
The input image 701 may include an image acquired by itself or a received image. For example, if the method provided by the present application is deployed in a terminal, the input image may include an image captured by the terminal, or may be an image received by the terminal from another server or terminal.
Subsequently, the input image 701 is input to the autoregressive model 702, and a predicted image 703 is output.
The autoregressive model may be used to predict the pixel probability distribution of each pixel by using the pixel adjacent to each pixel, so as to obtain a predicted image 703, that is, the first image.
It is understood that the autoregressive model can use the pixel values of the neighboring pixels to predict the pixel value of the current pixel.
In the embodiment of the present application, in order to accelerate decoding at the decoding end, when the autoregressive model performs prediction, pixel points on the same line can be predicted in parallel using the pixel values of their neighboring pixel points. Taking a specific autoregressive model as an example, as shown in fig. 8, given an m × n image and a hyper-parameter h (0 ≤ h < n), if all points (i', j') used to predict (i, j) in the autoregressive model satisfy h × i' + j' < h × i + j, the image can be predicted in n + (m − 1) × h parallel steps. As shown in fig. 8, when h = 1, for the pixel points on the same diagonal line, the pixel values of a plurality of pixel points on the left can be selected as the receptive field in units of 1, so as to predict the pixel probability distribution of the current pixel point, that is, the probability of the pixel point taking each pixel value. As shown in fig. 8, when h = 2, the pixel values of a plurality of pixels on the left may be selected as the receptive field in units of 2 to predict the pixel probability distribution of the current pixel. Therefore, during subsequent decompression, the pixel points on the same diagonal line can be decompressed in parallel.
In addition, the prediction order for each pixel point can be as shown in fig. 9, where a smaller number indicates that the prediction order is more preferred, and pixel points with the same number are predicted at the same time. Therefore, the pixel points on the same diagonal line can be predicted in parallel, and the prediction efficiency of the autoregressive model is improved.
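The prediction order above can be sketched as follows; this toy function (an illustration assumed here, not part of the application) groups pixels by the step index h × i + j, so that all pixels sharing an index can be predicted, and later decoded, in parallel:

```python
def prediction_schedule(m, n, h):
    # Group the pixel coordinates (i, j) of an m x n image by the step
    # index h*i + j: a pixel may only depend on pixels with a strictly
    # smaller index (h*i' + j' < h*i + j), so every pixel in one group
    # can be processed in the same parallel step.
    steps = {}
    for i in range(m):
        for j in range(n):
            steps.setdefault(h * i + j, []).append((i, j))
    return [steps[s] for s in sorted(steps)]
```

For an m × n image the schedule has exactly n + (m − 1) × h steps, matching the parallel step count stated above.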
Subsequently, the residual between the predicted image and the input image is calculated, resulting in an image residual 704.
After the predicted image 703 output by autoregressive is obtained, a residual between the predicted image and each pixel point in the input image can be calculated, and an image residual 704, that is, the aforementioned first residual image, is obtained.
If an original image x, namely the input image, is given, the original image is predicted by using the autoregressive model to obtain a predicted reconstructed image x̂, and the image residual between each pixel point of the reconstructed image and the corresponding pixel point of the original image can be calculated as r = x − x̂.
For example, as shown in fig. 10, after the input image and the predicted image are obtained, the difference between the corresponding pixel points of the input image and the predicted image may be calculated to obtain the residual value between each pixel point, so as to form a residual image.
Alternatively, when calculating the residual, the residual value is an integer in the range [-255, 255]. The residual value may be converted into a low-precision numerical type, for example uint8, reducing the value range to [0, 255]; by setting an offset, the residual value of each pixel point can be distributed near 128, making the data more concentrated, so that the residual distribution between the input image and the autoregressive model output image can be represented with less data.
For example, when an original image x is input and the autoregressive model outputs y, the predicted image is calculated as x' = round(clip(y, 0, M − 1)), and the residual is r = (x − x') mod M, wherein the value of each pixel in x' is an integer from 0 to M − 1. Model two is then used to predict r, obtaining a distribution N(μ, σ), and r is encoded using the N(μ, σ) distribution, where N is a Gaussian or logistic distribution.
Further, the input image is input to the self-encoding model 705, and a prediction residual 707 and a hidden variable 706 are output.
The original image x can be input to the self-coding model, and the probability distribution p(r | x) of the image residual r, that is, the prediction residual 707, is estimated using the self-coding model.
Specifically, the self-coding model may include an encoding model (encoder) and a decoding model (decoder), and an input image may be used as an input of the encoding model, and an important feature may be extracted from the input image to obtain a hidden variable 706, and then the hidden variable is used as an input of the decoding model to output a prediction residual 707.
Generally, the self-coding model may be a pre-trained model, and may specifically adopt an auto-encoder (AE), a variational auto-encoder (VAE), a vector-quantized variational auto-encoder (VQ-VAE), or the like, and may be adjusted according to the actual application scenario, which is not limited in this application.
The hidden variable 706 may then be encoded, resulting in a hidden variable encoding 708.
Specifically, the hidden variable may be encoded using static entropy coding, that is, data with a high probability is represented by a short bit string, and data with a low probability is represented by a longer one.
For example, a tree structure may be as shown in fig. 11, and the corresponding bits may be represented as shown in table 1.
Character    Probability of occurrence    Encoding
a1           0.4                          0
a2           0.35                         10
a3           0.2                          110
a4           0.05                         111
TABLE 1
Thus, the data a1 a2 a1 a4 is encoded as 0100111.
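A minimal sketch of static encoding and decoding with the prefix code of Table 1 (codewords as listed; the helper names are assumptions for illustration):

```python
# Prefix code from Table 1: the more probable the symbol, the shorter
# its codeword (corresponding to the code tree of fig. 11).
CODE = {"a1": "0", "a2": "10", "a3": "110", "a4": "111"}

def static_encode(symbols):
    # Concatenate the codeword of each symbol.
    return "".join(CODE[s] for s in symbols)

def static_decode(bits):
    # The code is prefix-free, so greedily matching the first codeword
    # that fits is unambiguous.
    inv = {v: k for k, v in CODE.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inv:
            out.append(inv[cur])
            cur = ""
    return out
```

Because the table is fixed in advance, this is static entropy coding: no per-symbol distribution needs to be transmitted.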
Further, the image residual 704 and the prediction residual 707 may be encoded to obtain a residual code 709.
Specifically, semi-dynamic entropy coding may be performed on image residual 704 and prediction residual 707, resulting in residual coding.
For ease of understanding, the differences between dynamic entropy coding and the semi-dynamic entropy coding provided herein are described.
First, taking rANS coding as an example: dynamic coding represents the data by a state (usually a large integer or bit sequence), changes the state value using the probability information of the data, and finally the encoded value is represented by the 0s and 1s of the state. In rANS coding, a value M is first set, representing the number of bits used to represent a probability. For a character a_i, its corresponding PMF_i is proportional to its probability, and all PMF values sum to 2^M; its corresponding CDF_i is the accumulation of the values of all preceding PMFs, namely PMF_1 + PMF_2 + … + PMF_(i-1). For the probabilities in Table 1, when M = 4, the corresponding PMF and CDF values are shown in Table 2:
character(s) Probability of PMF CDF
a 1 0.4 6 0
a 2 0.35 6 6
a 3 0.2 3 12
a 4 0.05 1 15
TABLE 2
If the states before and after compressing a character x are S and S' respectively, then

S' = ⌊S / PMF(x)⌋ × 2^M + CDF(x) + (S mod PMF(x))
Dynamic entropy coding can also serve as static entropy coding: when the values in the table are fixed, it is static entropy coding; when the tables of different symbols are not identical, dynamic entropy coding is required.
The speed bottlenecks in dynamic entropy coding include the symbol search during decompression and the arithmetic: division and remainder operations are the most time-consuming, followed by multiplication. Therefore, the present application provides semi-dynamic entropy coding, aimed at the efficiency loss caused by the unlimited number of probability distributions in dynamic entropy coding. Based on dynamic entropy coding, namely the rANS coding formula, approximation is first performed: operations such as multiplication, division, and remainder are replaced with approximate lightweight operations such as addition, subtraction, and bit operations, greatly reducing or removing them at the cost of a very small compression rate loss. Then, through a series of conversion steps, all operations that take longer than a certain duration (such as any remaining remainder, multiplication, and division operations) are converted into table accesses and lightweight operations such as addition, subtraction, and bit operations. It can be understood that the semi-dynamic entropy coding provided by this application removes all time-consuming operations such as symbol search, multiplication, division, and remainder through algorithm transformation and tabulation, achieving a throughput rate comparable to that of static entropy coding.
For example, similar to a common rANS implementation, the state value S is truncated and approximated, but with the following differences:
Unlike normal rANS, which truncates S to [2^M, 2^(2M)), giving 2^(2M) − 2^M possible states, this scheme truncates S to [2^M, 2^(M+1)), giving only 2^M states. This realizes a smaller state space and facilitates the subsequent tabulation;
Unlike conventional rANS, which uses division and remainder calculations, this scheme replaces them with an approximate solution based on loops and bit operations, so that the storage space required by the tables can be further reduced. Because the loops are slow, this step alone is usually slower than the original rANS; however, in the subsequent processing, the loop counts are tabulated to realize efficient compression and decompression.
In the compression process, for each distribution and symbol, an intermediate value delta (from which the number of cycles, i.e. the right-shift count of the state, is derived) and the difference between the next state and the current state under that distribution and symbol are pre-calculated and stored in tables. When compressing, for each input distribution index and symbol, the corresponding delta is obtained by table lookup, and the right-shift count of the state is calculated as b = (delta + S) >> M; the rightmost b bits of the state are pushed into memory, and the state value is shifted right by b bits; then the difference between the next state and the current state is obtained by table lookup through the distribution index and symbol and added to the current state value to obtain the updated state value.
Compared with directly storing the number of cycles, this scheme stores the intermediate result delta, from which the cycle count can be calculated as (delta + S) >> M, so the coding mode provided by the present application can reduce the memory space required by the table. Compared with directly storing the difference between the two states, the semi-dynamic entropy coding scheme stores the difference taken after the state has been shifted right, so it can be stored as an unsigned number, halving the memory space required for the same number of bits.
After the residual coding 709 and the hidden variable coding 708 are obtained, the subsequent operations can be performed. For example, the residual coding 709 and the hidden variable coding 708 are saved, or the residual coding 709 and the hidden variable coding 708 are transmitted to the receiving end, which may be determined according to the actual application scenario.
Therefore, the method provided by the embodiment of the application can be applied to lossless compression of the image, and efficient lossless compression of the image is realized. And an efficient semi-dynamic entropy coder is provided, so that both the model reasoning and the coding process run on the AI chip, the transmission between the system memory and the memory of the AI chip is reduced, and the high-bandwidth compression and decompression are realized.
The flow of the encoding method provided by the present application is described above, and the flow of the decoding method corresponding thereto, i.e., the inverse operation of the encoding flow, is described below. Referring to fig. 12, a flowchart of a decoding method provided in the present application is shown as follows.
1201. And acquiring hidden variable coded data and residual coded data.
The decoding end may locally read the hidden variable coded data and the residual coded data, or receive the hidden variable coded data and the residual coded data sent by the encoding end, and specifically, the source of the hidden variable coded data and the source of the residual coded data may be determined according to an actual application scenario, which is not limited in this application.
Specifically, the latent variable encoded data may be obtained by encoding a feature extracted from the input image by an encoding end. The residual encoded data may be obtained by encoding the image residual and the prediction residual by an encoding end, where the image residual may include a residual between an input image of the encoding end and an image output by the autoregressive model. The hidden variable encoded data and the residual encoded data can refer to the related descriptions in fig. 6 to fig. 11, and are not described herein again.
1202. And decoding the hidden variable coded data to obtain the hidden variable.
The decoding mode of the hidden variable coded data corresponds to the encoding end. For example, if the encoding end uses a static entropy coder to perform encoding, the same static entropy coder can be used for decoding: the hidden variable coded data is used as the input of the static entropy coder, which outputs the hidden variable. The hidden variable may include features extracted from the input image; for the decompression end, it represents features of the image to be decompressed.
1203. And outputting the second residual distribution by taking the hidden variable as the input of the self-coding model.
After the hidden variable coded data is decoded to obtain the hidden variable, the hidden variable is used as an input of the self-coding model, and a corresponding second residual distribution is output, that is, the counterpart of the first residual distribution at the encoding end, which can be understood as representing the residual distribution between the image output by the autoregressive model at the encoding end and the input image.
Specifically, the self-coding model may include a decoding model, and the predicted residual distribution may be output by using the hidden variable as the input of the decoding model. The decoding model may be a trained model for outputting the residual distribution corresponding to the input image, where the residual may be understood as the difference between the image predicted by the autoregressive model and the input image.
It should be noted that the encoding end and the decoding end are both provided with an autoregressive model and a self-coding model, and the autoregressive model of the encoding end is the same as that of the decoding end. If the encoding end and the decoding end are in the same device, their self-coding models are the same; if they are in different devices, they may be provided with the same self-coding model, or a complete self-coding model may be provided at the encoding end while the decoding end is provided with only the decoding model of the self-coding model. This may be adjusted according to the actual application scenario, which is not limited in this application.
1204. And decoding by combining the second residual distribution and the residual coded data to obtain a second residual image.
After the second residual distribution and the residual coded data are obtained, the second residual distribution and the residual coded data can be combined for decoding to obtain a second residual image.
Specifically, if the encoding end performs encoding using semi-dynamic entropy coding, the decoding end may also perform decoding based on semi-dynamic entropy coding, outputting the second residual image, that is, the counterpart of the first residual image at the encoding end. The semi-dynamic entropy coder performs entropy coding using a first preset type of coding operation, where the first preset type includes addition, subtraction, or bit operations; it does not include a second preset type of coding operation, where the second preset type includes at least one of multiplication, division, or remainder. That is, the semi-dynamic entropy coder does not include the time-consuming multiplication, division, or remainder operations and may include only simple operations such as addition and subtraction, so that efficient coding can be achieved.
More specifically, the semi-dynamic entropy coder may refer to the related descriptions in fig. 6 to fig. 11, which are not repeated here.
It can be understood that the encoding end obtains the residual encoded data by encoding the first residual image with the first residual distribution; therefore, after the decoding end obtains the second residual distribution and the residual encoded data, it can perform the inverse operation to recover the second residual image, which is equivalent to obtaining the residual between the first image output by the autoregressive model at the encoding end and the input image, that is, the first residual image.
1205. And taking the second residual image as the backward propagation input of the autoregressive model, and outputting the decompressed image.
After the second residual image is obtained, the second residual image can be used as the input of the autoregressive model for back propagation, and the decompressed image is inferred, that is, lossless recovery of the encoding end's input image is realized.
In addition, when the second residual image is used as the input of the autoregressive model for back propagation, if the autoregressive model at the encoding end predicted the values of the pixels on a given connecting line only from previously predicted pixel values, then the values of the pixels on that line can be decoded in parallel when the decoding end performs the decoding operation, so that efficient decoding is realized. The same connecting line may be the same row, the same column, the same diagonal, or the like, and may be determined according to the actual application scenario.
Therefore, in the embodiments of this application, it is considered that a self-encoding model alone generally has limited fitting capability and needs a deeper network to achieve a better compression rate, while combining it with the output of the autoregressive model allows the self-encoding model to be reduced in size. In this application, the autoregressive model and the self-encoding model are therefore combined for decoding, so that both models can be kept very small, the problem of overlong inference time caused by an oversized network is avoided, and efficient image decompression is realized. In addition, in the method provided by this application, the whole process, including the AI models and the entropy coding, can run on an AI chip, so that data transfer between the system memory and the AI chip memory is avoided and the coding efficiency is improved.
For ease of understanding, the following describes a flow of the decoding method provided in the present application with reference to a specific application scenario, and with reference to fig. 13, a schematic flow diagram of another decoding method provided in the present application is described as follows.
First, the hidden variable encoding 1301 and the residual encoding 1302 are obtained.
The hidden variable encoding 1301 and the residual encoding 1302 may be read locally or received from an encoding end, and may be adjusted according to the actual application scenario. For example, the hidden variable encoding 1301 and the residual encoding 1302 may be the hidden variable encoding 708 and the residual encoding 709 mentioned in fig. 7.
Then, the hidden variable encoding 1301 is input to the static entropy encoder 1303, and the hidden variable 1304 is output.
In general, as shown in table 1, after the bit stream of the hidden variable encoding is obtained, the probability corresponding to each character is determined according to the correspondence relationship, so as to output the hidden variable, which can be understood as the important features of the decompressed image.
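A static entropy decoder uses one fixed character/probability table for every symbol, so the per-symbol work is a single lookup. A minimal sketch of that lookup (the frequency table below is an assumed example for illustration, not the table 1 of this application):

```python
import bisect

# Assumed static frequency table for the latent alphabet
# (stands in for the character/probability correspondence of table 1).
freq = {"a": 6, "b": 3, "c": 1}
symbols = list(freq)
cum = []                       # cumulative frequency boundaries
total = 0
for s in symbols:
    total += freq[s]
    cum.append(total)

def slot_to_symbol(slot):
    """Map a state slot (0..total-1) back to its symbol: the same fixed
    lookup a static entropy decoder performs for every hidden variable."""
    return symbols[bisect.bisect_right(cum, slot)]

print([slot_to_symbol(s) for s in range(total)])
```

Because the table never changes, it can be baked in at compile time, which is what makes the static coder cheap.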
Then, the hidden variable 1304 is used as the input of the decoding model 1305, and the prediction residual 1306 is output.
The decoding model is similar to the aforementioned decoding model in fig. 7, and is not described herein again. The prediction residual 1306 is similar to the prediction residual 707 described above and will not be described herein.
The residual coding 1302 and the prediction residual 1306 are then both taken as inputs to a semi-dynamic entropy coder, outputting an image residual 1308.
The image residual 1308 is similar to the image residual 704, and is not described herein again.
The decoding process of semi-dynamic entropy coding can be understood as the inverse of the semi-dynamic entropy encoding operation; that is, with the prediction residual and the residual encoding known, the image residual is inferred in reverse. For example: first obtain the slot value of the current symbol, s = S' mod 2^M; find the symbol x satisfying CDF(x) ≤ s < CDF(x) + PMF(x), and decompress x. Then, according to the decoded symbol x, restore the state value of the previous step: S = ⌊S'/2^M⌋ × PMF(x) + S' mod 2^M − CDF(x).
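For ease of understanding, the decoding step above can be sketched as a minimal rANS-style round trip. This is illustrative only: the alphabet, the frequencies and M = 4 are assumed values, and the total frequency is a power of two so that the modulo and division by 2^M reduce to bit operations; in the semi-dynamic design described in this application, the remaining multiplication and division by PMF(x) would themselves be replaced by table accesses.

```python
M = 4                      # state precision: frequencies sum to 2**M
MASK = (1 << M) - 1

# Toy PMF/CDF tables for a 3-symbol alphabet (assumed values).
PMF = {0: 8, 1: 5, 2: 3}               # frequencies, sum == 16 == 2**M
CDF = {0: 0, 1: 8, 2: 13}              # cumulative frequencies
# slot -> symbol table (replaces the search "CDF(x) <= s < CDF(x)+PMF(x)")
SLOT = [0] * 8 + [1] * 5 + [2] * 3

def encode(state, x):
    # Standard rANS step; the semi-dynamic variant would turn the
    # division/modulo by PMF[x] into precomputed table accesses.
    return (state // PMF[x]) << M | (CDF[x] + state % PMF[x])

def decode(state):
    s = state & MASK                   # S' mod 2**M  -- a bit operation
    x = SLOT[s]                        # table access instead of a search
    prev = (state >> M) * PMF[x] + s - CDF[x]
    return x, prev

state = 1 << M                         # initial state
for sym in [0, 2, 1, 1, 0]:
    state = encode(state, sym)
out = []
while state > (1 << M):
    sym, state = decode(state)
    out.append(sym)
print(out[::-1])                       # recovers the encoded sequence
```

Because rANS decodes in the reverse of encoding order, the recovered list must be reversed at the end.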
After the image residual 1308 is obtained, the image residual can be used as an input to the back propagation of the autoregressive model 1309 to infer a decompressed image 1310.
It can be understood that the autoregressive model 1309 is a trained model that is the same as the autoregressive model 702 described above; with the image residual known, the input image 701 is inferred in reverse.
Optionally, if, when outputting the prediction residual through the autoregressive model, the encoding end predicted the pixel values of the pixels on the same connecting line in parallel from previously predicted pixel values, then the decoding end may decode the pixel values of the pixels on the same connecting line together when performing back propagation through the autoregressive model, thereby implementing parallel decoding.
For example, given an m × n image and a hyper-parameter h (0 ≤ h < n), if every point (i', j') used to predict (i, j) in the autoregressive model satisfies h × i' + j' < h × i + j, the image can be decompressed in n + (m − 1) × h parallel computations. The decompression sequence is as follows:
decompress the points in the first row in sequence: (0,0), (0,1), …, (0, n − 1). When point (0, j) is decompressed, if j − h ≥ 0, decompress (1, j − h) at the same time; if j − h × 2 ≥ 0, also decompress (2, j − h × 2), and so on;
decompress the remaining points in the second row in sequence: (1, n − h − 1), …, (1, n − 1). When point (1, j) is decompressed, if j − h ≥ 0, decompress (2, j − h) at the same time; if j − h × 2 ≥ 0, also decompress (3, j − h × 2), and so on;
decompress according to this rule until all points are decompressed.
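The sequence above is a wavefront schedule: since every point used to predict (i, j) satisfies h × i' + j' < h × i + j, all points sharing the same value of t = h × i + j can be decoded at the same time. A minimal sketch of the grouping (the function name and example sizes are assumptions for illustration):

```python
def wavefront_schedule(m, n, h):
    """Group pixels (i, j) of an m x n image by decoding step t = h*i + j:
    every pixel a prediction depends on has a strictly smaller t, so all
    pixels sharing a step can be decoded in parallel."""
    steps = {}
    for i in range(m):
        for j in range(n):
            steps.setdefault(h * i + j, []).append((i, j))
    return [steps[t] for t in sorted(steps)]

sched = wavefront_schedule(m=4, n=6, h=1)
print(len(sched))            # n + (m-1)*h = 6 + 3 = 9 parallel steps
print(sched[1])              # (0,1) and (1,0) are decoded together
```

A smaller h gives fewer steps (more parallelism) but restricts the receptive field, which is the trade-off discussed with table 6 below.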
Therefore, through this same-line parallel encoding and decoding mode, the encoding and decoding efficiency can be greatly improved and more efficient image compression realized.
For the convenience of understanding, the following description will take some specific application scenarios as examples to illustrate the effects achieved by the present application.
First, a neural network model with an autoregressive model and a self-encoding model as its core needs to be constructed. The autoregressive model in this technical solution adopts a lightweight design and contains only 12 parameters: for predicting a three-channel image, each channel needs only 4 parameters. The self-encoding model uses a vector-quantized autoencoder, which uses a vector codebook to reduce the hidden variable space; the codebook size is set to 256, that is, the value space of the hidden variables in the autoencoder is limited to 256 integers. The encoder and the decoder of the autoencoder each adopt four residual convolution blocks, and the number of channels in each layer is 32.
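As a rough illustration of how small such an autoregressive model can be, a linear predictor with 4 parameters per channel (weights for the left, top and top-left neighbours plus a bias; 3 × 4 = 12 parameters for a three-channel image) can be sketched as follows. The weight values here are assumptions for illustration, not the trained parameters of this application:

```python
def predict_channel(img, w):
    """Linear autoregressive prediction for one channel with 4 parameters:
    weights for the left, top and top-left neighbours plus a bias."""
    h, wdt = len(img), len(img[0])
    a, b, c, d = w
    pred = [[0.0] * wdt for _ in range(h)]
    for i in range(h):
        for j in range(wdt):
            left = img[i][j - 1] if j > 0 else 0
            up = img[i - 1][j] if i > 0 else 0
            upleft = img[i - 1][j - 1] if i > 0 and j > 0 else 0
            pred[i][j] = a * left + b * up + c * upleft + d
    return pred

# 3 channels x 4 parameters = 12 parameters in total (assumed values).
weights = [(0.5, 0.5, -0.25, 0.0)] * 3
channel = [[10, 12], [11, 13]]
print(predict_channel(channel, weights[0]))
```

The residual between this prediction and the true pixel values is what the entropy coder then compresses.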
The model training process and the test process are as follows:
training: and training on a training set of a single data set to obtain parameters of an autoregressive model and an autoregressive coding model and statistic of hidden variables for compression of the hidden variables.
Compression: by the method provided by the application, all the test images of a single data set are stacked together in the batch dimension to form a four-dimensional tensor. And taking the four-dimensional tensor as the input of the flow at one time, and outputting the residual coding of all the images and the coding of the hidden variable in parallel.
Decompressing: according to the method, residual coding and hidden variables of all images are used as input in a decompression flow at one time, and original images of all the images are output in parallel.
In contrast to some common lossless compression methods, such as L3C (Practical Full Resolution Learned Lossless Image Compression), FLIF (Free Lossless Image Format), WebP or PNG (Portable Network Graphics), the method provided by this application, called PILC (Practical Image Lossless Compression), achieves the results shown in table 3.
TABLE 3
As can be seen from Table 3, compared with the existing AI image lossless compression algorithm L3C, the throughput of the present invention is improved by 14 times while the compression rate remains basically equivalent, and the present invention is also superior to conventional methods such as PNG, WebP and FLIF in both compression rate and throughput.
Therefore, the method provided by this application combines the autoregressive model and the self-encoding model, and compared with using a self-encoding model alone, the model size is greatly reduced. The autoregressive model provided by this application supports parallel encoding and parallel decompression, so efficient encoding and decoding and therefore efficient image compression and decompression are realized. The pipeline of the method provided by this application can run on an AI chip, so that data transfer between the system memory and the AI chip memory is avoided and the encoding and decoding efficiency is further improved.
In addition, in order to perform efficient compression and decompression of high-definition large images with different sizes, the present embodiment is designed as follows.
Model training: in the model training stage, model training is performed using large high-definition data sets such as OpenImage and ImageNet64 to obtain the parameters of the autoregressive model and the self-encoding model.
Compression:
first, the images are preprocessed: high-definition large images of different sizes are uniformly sliced into tiles of the same size (such as 32×32), and the size information of each image is stored separately for restoring the images;
all the slices are stacked along the batch dimension as the input of the pipeline;
the residual encodings of all the images and the encodings of the hidden variables are output in parallel;
the statistical information of the hidden variables of each data set (for the same data set) or of each image (for different data sets) is recorded as another output of the pipeline.
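A minimal sketch of the slicing step described above (tile size, padding value and function name are assumptions for illustration):

```python
def slice_into_tiles(img, tile=32, pad=0):
    """Cut one image (a list of rows) into tile x tile patches, padding
    the right/bottom edges; the original (h, w) is stored separately so
    the image can be reassembled losslessly."""
    h, w = len(img), len(img[0])
    tiles = []
    for i in range(0, h, tile):
        for j in range(0, w, tile):
            patch = [[img[y][x] if y < h and x < w else pad
                      for x in range(j, j + tile)]
                     for y in range(i, i + tile)]
            tiles.append(patch)
    return tiles, (h, w)

tiles, size = slice_into_tiles([[1] * 70 for _ in range(40)], tile=32)
print(len(tiles), size)      # a 2x3 grid of tiles; original size kept
```

Stacking tiles of one fixed size is what allows images of different resolutions to share one batched pipeline pass.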
The effects achieved can be seen in table 4.
TABLE 4
Obviously, the method provided by this application can realize a higher throughput rate and efficient encoding and decoding.
More specifically, a more detailed comparison with some commonly used compression methods is made below.
Referring to table 5, on the premise that the maximum likelihood (an index for evaluating the prediction accuracy of a generative model; the smaller the value, the better) is basically consistent with that of the fastest AI algorithm, L3C, the inference speed is improved by 9.6 times.
TABLE 5
Referring to table 6, with the same autoregressive model, the parallel scheme provided in this application improves the decompression speed by a factor of 7.9 compared with the non-parallel scheme. The parallel scheme imposes a limit on the receptive field, but this limit has only a limited effect on the compression rate.
Receptive field | Parallel | BPD  | Throughput rate (MB/s)
3               | Yes      | 5.77 | 382.5
3               | No       | 5.77 | 48.5
4               | No       | 5.77 | 47.5
7               | No       | 5.74 | 44.0
TABLE 6
Referring to table 7, compared with dynamic entropy coding (rANS), the semi-dynamic entropy coding (ANS-AI) proposed by this application improves the encoding speed by 20 times and the decoding speed by 100 times, with BPD losses of less than 0.55 and 0.17 respectively. Moreover, the semi-dynamic entropy coding can run on an AI chip, reaching a peak speed of 1 GB/s on a single V100 chip.
TABLE 7
In addition, compared with dynamic entropy coding, the number of required distribution types is reduced from 2048 to 8, the memory required for preprocessing is reduced to 1/256, and the BPD loss is less than 0.03; this reduces the computing resources required for entropy coding and improves the coding efficiency.
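One way to realize such a reduction is to quantize the predicted distributions onto a few preset representatives, so that the entropy coder needs only one precomputed table per representative (8 instead of 2048). A minimal sketch, in which the bucket boundaries are assumptions for illustration:

```python
def bucket_index(scale, buckets):
    """Snap a predicted distribution scale to the index of the first
    preset bucket that covers it; out-of-range scales fall into the
    last bucket. Each bucket shares one precomputed coding table."""
    for k, b in enumerate(buckets):
        if scale <= b:
            return k
    return len(buckets) - 1

BUCKETS = [0.5, 1, 2, 4, 8, 16, 32, 64]   # 8 representative scales
print([bucket_index(s, BUCKETS) for s in [0.3, 1.5, 5.0, 100.0]])
```

The coarser quantization is what trades a small BPD loss for a much smaller preprocessing memory footprint.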
The foregoing describes the flow of an image encoding method and an image decompression method provided by the present application, and an apparatus for performing the foregoing methods is described below.
Referring to fig. 14, a schematic structural diagram of an image encoding device provided in the present application includes:
an autoregressive module 1401, configured to take an input image as the input of an autoregressive model and output a first image;
a residual calculation module 1402, configured to obtain a residual between the first image and the input image, so as to obtain a first residual image;
a self-coding module 1403, configured to take the input image as the input of a self-coding model and output a hidden variable and a first residual distribution, where the hidden variable includes features extracted from the input image, and the first residual distribution includes residual values, output by the self-coding model, representing the residual between each pixel point in the input image and the corresponding pixel point in the first residual image;
a residual coding module 1404, configured to code the first residual image and the first residual distribution to obtain residual coded data;
and the latent variable coding module 1405 is configured to code the latent variable to obtain latent variable coded data, and the latent variable coded data and the residual coded data are used for decompression to obtain an input image.
In a possible embodiment, the residual coding module 1404 is specifically configured to take the first residual image and the first residual distribution as the input of a semi-dynamic entropy coder and output the residual coded data, where the semi-dynamic entropy coder performs entropy coding using a first preset type of coding operation, which includes addition, subtraction or bit operations, and does not include a second preset type of coding operation, which includes at least one of multiplication, division or remainder operations; that is, the semi-dynamic entropy coder excludes time-consuming operations such as multiplication, division and remainder.
In one possible implementation, the semi-dynamic entropy encoder may be a variant of a dynamic entropy encoder. Specifically, the operations of the dynamic entropy encoder may first be approximated, for example by replacing them with approximate operations that reduce or remove multiplication, division and remainder operations; conversion processing may then be performed so that all remaining operations whose time consumption exceeds a certain threshold (such as the remaining remainder, multiplication and division operations) are converted into table accesses and lightweight operations such as addition, subtraction and bit operations, thereby obtaining the semi-dynamic entropy encoder provided by this application. It can be understood that the semi-dynamic entropy encoder is an entropy encoder obtained by replacing or converting some operations of a dynamic entropy encoder; when it is used for entropy coding, only simple operations such as addition, subtraction and bit operations are needed, so that efficient coding is realized.
In a possible implementation, the hidden variable encoding module 1405 is specifically configured to use a hidden variable as an input of the static entropy encoder to obtain hidden variable encoded data.
In a possible implementation, the self-coding model includes a coding model and a decoding model, and the self-coding module 1403 is specifically configured to: taking an input image as the input of a coding model, and outputting a hidden variable, wherein the coding model is used for extracting features from an input image; and taking the hidden variable as the input of a decoding model to obtain a first residual distribution, wherein the decoding model is used for predicting residual values between the input image and the corresponding pixel distribution.
In one possible embodiment, the autoregressive model is configured to predict the values of the pixels on the same connecting line using previously predicted pixel values.
Referring to fig. 15, the present application provides a schematic structural diagram of an image decompression apparatus, where the image decompression apparatus includes:
a transceiver module 1501, configured to obtain hidden variable encoded data and residual encoded data, where the hidden variable encoded data is obtained by encoding the features extracted from the input image by the encoding end, and the residual encoded data includes data obtained by encoding the residual between the first image output by the autoregressive model and the input image;
a hidden variable decoding module 1502, configured to decode the hidden variable encoded data to obtain a hidden variable, where the hidden variable includes a feature extracted from the input image by the encoding end;
a self-encoding module 1503, configured to output a second residual distribution by using the hidden variable as an input of a self-encoding model;
a residual decoding module 1504, configured to decode in combination with the second residual distribution and the residual encoded data to obtain a second residual image;
an autoregressive module 1505 for outputting the decompressed image using the second residual image as an input for the back propagation of the autoregressive model.
In a possible implementation, the hidden variable decoding module 1502 is specifically configured to output hidden variables by using the hidden variable encoded data as an input of the static entropy encoder.
In a possible embodiment, the residual decoding module 1504 is specifically configured to take the second residual distribution and the residual encoded data as the input of a semi-dynamic entropy coder and output the second residual image, where the semi-dynamic entropy coder performs entropy coding using a first preset type of coding operation, which includes addition, subtraction or bit operations, and does not include a second preset type of coding operation, which includes at least one of multiplication, division or remainder operations; that is, the semi-dynamic entropy coder excludes time-consuming operations such as multiplication, division and remainder.
In one possible implementation, the semi-dynamic entropy encoder may be a variant of a dynamic entropy encoder. Specifically, the operations of the dynamic entropy encoder may first be approximated, for example by replacing them with approximate operations that reduce or remove multiplication, division and remainder operations; conversion processing may then be performed so that all remaining operations whose time consumption exceeds a certain threshold (such as the remaining remainder, multiplication and division operations) are converted into table accesses and lightweight operations such as addition, subtraction and bit operations, thereby obtaining the semi-dynamic entropy encoder provided by this application. It can be understood that the semi-dynamic entropy encoder is an entropy encoder obtained by replacing or converting some operations of a dynamic entropy encoder; when it is used for entropy coding, only simple operations such as addition, subtraction and bit operations are needed, so that efficient coding is realized.
In a possible implementation manner, the autoregressive module 1505 is specifically configured to perform parallel decoding on pixel points located on the same connecting line in the second residual image through an autoregressive model, so as to obtain a decompressed image.
Referring to fig. 16, a schematic structural diagram of another image encoding device provided in the present application is described as follows.
The image encoding apparatus may include a processor 1601 and a memory 1602. The processor 1601 and the memory 1602 are interconnected by a line. The memory 1602 has stored therein program instructions and data.
The memory 1602 stores program instructions and data corresponding to the steps of fig. 6-11.
The processor 1601 is configured to perform the method steps performed by the image encoding apparatus shown in any of the foregoing fig. 6-11.
Optionally, the image encoding apparatus may further include a transceiver 1603 for receiving or transmitting data.
Also provided in an embodiment of the present application is a computer-readable storage medium having a program stored therein which, when run on a computer, causes the computer to execute the steps in the methods described in the foregoing embodiments shown in fig. 6 to 11.
Alternatively, the aforementioned image encoding device shown in fig. 16 is a chip.
The embodiment of the present application further provides an image encoding apparatus, which may also be referred to as a digital processing chip or a chip, where the chip includes a processing unit and a communication interface, the processing unit obtains program instructions through the communication interface, and the program instructions are executed by the processing unit, and the processing unit is configured to execute the method steps executed by the image encoding apparatus shown in any one of the foregoing fig. 6 to fig. 11.
The embodiment of the application also provides a digital processing chip. Integrated with the digital processing chip are circuitry and one or more interfaces for performing the functions of the processor 1601, or the processor 1601 as described above. When integrated with memory, the digital processing chip may perform the method steps of any one or more of the preceding embodiments. When the digital processing chip is not integrated with the memory, the digital processing chip can be connected with the external memory through the communication interface. The digital processing chip implements the operations performed by the image encoding apparatus in the above embodiments according to the program code stored in the external memory.
The image encoding device provided by the embodiment of the application can be a chip, and the chip comprises: a processing unit, which may be for example a processor, and a communication unit, which may be for example an input/output interface, a pin or a circuit, etc. The processing unit may execute the computer-executable instructions stored by the storage unit to cause the chip in the server to perform the image encoding method described in the embodiments shown in fig. 6-11 above. Optionally, the storage unit is a storage unit in the chip, such as a register, a cache, and the like, and the storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a Random Access Memory (RAM), and the like.
Referring to fig. 17, a schematic structural diagram of another image decompression apparatus provided in the present application is as follows.
The image decompression apparatus may include a processor 1701 and a memory 1702. The processor 1701 and the memory 1702 are interconnected by a line. Among other things, memory 1702 has stored therein program instructions and data.
The memory 1702 stores program instructions and data corresponding to the steps of fig. 12-13 described above.
The processor 1701 is configured to execute the method steps performed by the image decompression apparatus shown in any of the embodiments of fig. 12-13.
Optionally, the image decompression apparatus may further include a transceiver 1703 for receiving or transmitting data.
Also provided in an embodiment of the present application is a computer-readable storage medium having a program stored therein which, when run on a computer, causes the computer to execute the steps in the methods described in the foregoing embodiments shown in fig. 12 to 13.
Alternatively, the aforementioned image decompression apparatus shown in fig. 17 is a chip.
The present application further provides an image decompression apparatus, which may also be referred to as a digital processing chip or a chip, where the chip includes a processing unit and a communication interface, the processing unit obtains program instructions through the communication interface, and the program instructions are executed by the processing unit, and the processing unit is configured to execute the method steps executed by the image decompression apparatus shown in any one of the foregoing embodiments in fig. 12 to 13.
The embodiment of the application also provides a digital processing chip. Integrated with the digital processing chip are circuitry and one or more interfaces for implementing the processor 1701, or the functionality of the processor 1701, as described above. When integrated with memory, the digital processing chip may perform the method steps of any one or more of the preceding embodiments. When the digital processing chip is not integrated with the memory, the digital processing chip can be connected with the external memory through the communication interface. The digital processing chip implements the operations performed by the image decompression apparatus in the above embodiments according to the program code stored in the external memory.
The image decompression device provided by the embodiment of this application may be a chip, and the chip includes: a processing unit, which may be for example a processor, and a communication unit, which may be for example an input/output interface, a pin or a circuit, etc. The processing unit may execute the computer-executable instructions stored in the storage unit to cause the chip in the server to execute the image decompression method described in the embodiments shown in fig. 12 to 13 above. Optionally, the storage unit is a storage unit in the chip, such as a register or a cache; the storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM), and the like.
Embodiments of the present application also provide a computer program product, which when run on a computer, causes the computer to execute the steps performed by the image decompression apparatus or the image decompression apparatus in the methods described in the foregoing embodiments shown in fig. 6 to 13.
The present application further provides an image processing system comprising an image encoding apparatus configured to perform the method steps corresponding to the aforementioned fig. 6-11, and an image decompression apparatus configured to perform the method steps corresponding to the aforementioned fig. 12-13.
Specifically, the processing unit or the processor may be a Central Processing Unit (CPU), a Network Processor (NPU), a Graphics Processing Unit (GPU), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), or other programmable logic device, discrete gate or transistor logic device, discrete hardware component, or the like. A general purpose processor may be a microprocessor or any conventional processor or the like.
Referring to fig. 18, fig. 18 is a schematic structural diagram of a chip according to an embodiment of the present disclosure, where the chip may be represented as a neural network processor NPU 180, and the NPU 180 is mounted on a main CPU (Host CPU) as a coprocessor, and the Host CPU allocates tasks. The core portion of the NPU is an arithmetic circuit 1803, and the controller 1804 controls the arithmetic circuit 1803 to extract matrix data in the memory and perform multiplication.
In some implementations, the arithmetic circuit 1803 internally includes a plurality of processing units (PEs). In some implementations, the operational circuitry 1803 is a two-dimensional systolic array. The arithmetic circuit 1803 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuitry 1803 is a general-purpose matrix processor.
For example, assume that there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to matrix B from weight memory 1802 and buffers each PE in the arithmetic circuit. The arithmetic circuit takes the matrix a data from the input memory 1801 and performs matrix operation with the matrix B, and partial or final results of the obtained matrix are stored in an accumulator (accumulator) 1808.
The unified memory 1806 is used for storing input data and output data. The weight data is transferred to the weight memory 1802 directly through the direct memory access controller (DMAC) 1805. The input data is also carried to the unified memory 1806 through the DMAC.
A bus interface unit (BIU) 1810 is used for the interaction of the AXI bus with the DMAC and the instruction fetch buffer (IFB) 1809.
Specifically, the bus interface unit 1810 is used by the instruction fetch buffer 1809 to obtain instructions from the external memory, and is also used by the storage unit access controller 1805 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
The DMAC is mainly used to transfer input data in the external memory DDR to the unified memory 1806, to transfer weighted data to the weighted memory 1802, or to transfer input data to the input memory 1801.
The vector calculation unit 1807 includes a plurality of operation processing units and, if necessary, further processes the output of the operation circuit, for example by vector multiplication, vector addition, exponential operation, logarithmic operation or magnitude comparison. It is mainly used for non-convolution/fully-connected layer computation in the neural network, such as batch normalization, pixel-level summation and up-sampling of feature planes.
In some implementations, the vector calculation unit 1807 can store the processed output vector to the unified memory 1806. For example, the vector calculation unit 1807 may apply a linear function and/or a non-linear function to the output of the arithmetic circuit 1803, such as linear interpolation of the feature planes extracted by the convolutional layers, or application of a non-linear function to a vector of accumulated values to generate activation values. In some implementations, the vector calculation unit 1807 generates normalized values, pixel-level summed values, or both. In some implementations, the vector of processed outputs can be used as activation inputs to the arithmetic circuitry 1803, e.g., for use in subsequent layers in a neural network.
An instruction fetch buffer (IFB) 1809 connected to the controller 1804 is configured to store instructions used by the controller 1804.
The unified memory 1806, the input memory 1801, the weight memory 1802, and the instruction fetch buffer 1809 are all on-chip memories. The external memory is independent of the NPU hardware architecture.
The operation of each layer in the recurrent neural network can be performed by the operation circuit 1803 or the vector calculation unit 1807.
The processor mentioned herein may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits configured to control execution of the programs of the methods of FIG. 6 to FIG. 13.
It should be noted that the above-described apparatus embodiments are merely illustrative. The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the apparatus embodiments provided in the present application, the connection relationship between modules indicates that they have a communication connection, which may be implemented as one or more communication buses or signal lines.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present application can be implemented by software plus the necessary general-purpose hardware, and certainly also by special-purpose hardware including application-specific integrated circuits, special-purpose CPUs, special-purpose memories, special-purpose components, and the like. Generally, functions performed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structures used to implement the same function may vary: analog circuits, digital circuits, or dedicated circuits. For the present application, however, a software implementation is preferable in most cases. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a readable storage medium such as a floppy disk, a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc, and which includes instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods described in the embodiments of the present application.
In the above embodiments, the implementation may be realized, in whole or in part, by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., a solid state disk (SSD)).
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Finally, it should be noted that: the above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application.

Claims (23)

1. An image encoding method, comprising:
taking an input image as an input of an autoregressive model, and outputting a first image;
obtaining a residual error between the first image and the input image to obtain a first residual error image;
taking the input image as an input of a self-coding model, and outputting a hidden variable and a first residual distribution, wherein the hidden variable comprises features extracted from the input image, and the first residual distribution comprises residual values output by the self-coding model for representing a residual value between each pixel point in the input image and each corresponding pixel point in the first residual image;
coding the first residual image and the first residual distribution to obtain residual coded data;
and coding the hidden variable to obtain hidden variable encoded data, wherein the hidden variable encoded data and the residual encoded data are used for decompression to obtain the input image.
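By way of illustration only, the encoding steps above can be sketched with a toy stand-in for the autoregressive model that predicts each pixel from its left neighbour; the self-coding model and the entropy coders are deliberately stubbed out, and nothing here reproduces the claimed implementation.

```python
import numpy as np

# Toy sketch of the claimed encoding steps. The "autoregressive model"
# here is an illustrative stand-in that predicts each pixel as the value
# of its left neighbour; real models and entropy coders are omitted.

def autoregress_predict(img):
    # First image: each pixel predicted from its left neighbour
    # (the first column is predicted as 0).
    first = np.zeros_like(img)
    first[:, 1:] = img[:, :-1]
    return first

def encode(img):
    first = autoregress_predict(img)   # first image
    residual = img - first             # first residual image
    # A real encoder would also run the self-coding model to obtain the
    # hidden variable and the first residual distribution, then
    # entropy-code both; here only the residual is returned.
    return first, residual

img = np.array([[3, 5, 2], [7, 1, 4]])
first, residual = encode(img)
print((first + residual == img).all())  # True
```

Because the toy predictor is causal in the scan order, the input image is exactly recoverable from the residual alone, which is what makes the scheme lossless.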
2. The method of claim 1, wherein said encoding the first residual image and the first residual distribution to obtain residual encoded data comprises:
and outputting the residual encoded data by taking the first residual image and the first residual distribution as the input of a semi-dynamic entropy encoder, wherein the semi-dynamic entropy encoder is used for performing entropy encoding by using a first preset type of encoding operation, the first preset type of encoding operation comprises addition, subtraction or bit operation, a second preset type of encoding operation is not included in the semi-dynamic entropy encoder, and the second preset type comprises at least one of multiplication, division or remainder operation.
3. The method of claim 1, wherein said coding the hidden variable to obtain hidden variable encoded data comprises:
and taking the hidden variable as the input of a static entropy coder to obtain the hidden variable coded data.
4. The method according to any one of claims 1-3, wherein the self-coding model comprises a coding model and a decoding model, and wherein the outputting of the hidden variable and the first residual distribution using the input image as an input to the self-coding model comprises:
taking the input image as the input of the coding model, and outputting the hidden variable, wherein the coding model is used for extracting features from the input image;
and taking the hidden variable as an input of the decoding model to obtain the first residual distribution, wherein the decoding model is used for predicting the distribution of residual values corresponding to the pixels of the input image.
5. The method of any of claims 1-4, wherein the autoregressive model is used to predict the values of pixels on the same line using the pixel values of already-predicted pixels.
6. An image decompression method, comprising:
acquiring implicit variable coded data and residual coded data, wherein the implicit variable coded data are obtained by coding features extracted from an input image by a coding end, and the residual coded data comprise data obtained by coding a residual between the input image and an image output by autoregressive model forward propagation;
decoding the hidden variable coded data to obtain hidden variables, wherein the hidden variables comprise features extracted from the input image;
taking the hidden variable as the input of a self-coding model, and outputting second residual distribution;
decoding is carried out by combining the second residual distribution and the residual coded data to obtain a second residual image;
and taking the second residual image as the backward propagation input of the autoregressive model, and outputting a decompressed image.
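By way of illustration only, the final step above, recovering the decompressed image from the second residual image by back propagation through a causal predictor, can be sketched with a toy left-neighbour predictor standing in for the autoregressive model:

```python
import numpy as np

# Toy sketch of undoing a causal left-neighbour predictor (an
# illustrative stand-in for the autoregressive model's back
# propagation). Each pixel is recovered from the already-decoded pixel
# to its left plus its residual.

def decode_residual(residual):
    out = np.zeros_like(residual)
    out[:, 0] = residual[:, 0]          # first column has no predictor
    for j in range(1, residual.shape[1]):
        out[:, j] = out[:, j - 1] + residual[:, j]   # pixel = prediction + residual
    return out

residual = np.array([[3, 2, -3], [7, -6, 3]])
print(decode_residual(residual))  # [[3 5 2] [7 1 4]]
```

The decode order mirrors the encode order, so the reconstruction is exact: lossless compression follows from the predictor being identical on both ends.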
7. The method of claim 6, wherein said decoding the hidden variable encoded data to obtain hidden variables comprises:
and taking the hidden variable coded data as the input of a static entropy coder, and outputting the hidden variable.
8. The method according to claim 6 or 7, wherein said decoding by combining the second residual distribution and the residual encoded data to obtain a second residual image comprises:
and taking the second residual distribution and the residual encoded data as inputs of a semi-dynamic entropy coder, and outputting the second residual image, wherein the semi-dynamic entropy coder is used for performing entropy coding using a first preset type of coding operation, the first preset type of coding operation comprises addition, subtraction, or bit operation, the semi-dynamic entropy coder does not comprise a second preset type of coding operation, and the second preset type comprises at least one of multiplication, division, or remainder operation.
9. The method according to any one of claims 6-8, wherein outputting a decompressed image using the second residual image as an input for back propagation of an autoregressive model comprises:
and carrying out parallel decoding on the pixel points on the same connecting line in the second residual image through the autoregressive model to obtain the decompressed image.
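By way of illustration only, the "same connecting line" parallelism above can be realized as wavefront decoding: if each pixel's causal context is its left and upper neighbours, all pixels on one anti-diagonal (i + j = k) are mutually independent and can be decoded in parallel once the previous diagonal is done. The sketch below shows only this dependency ordering, not the model computation:

```python
# Wavefront ordering: group pixel coordinates into anti-diagonals
# (i + j = k). With a left/upper causal context, every pixel on one
# diagonal depends only on earlier diagonals, so each diagonal can be
# processed in parallel. Illustrative only; the actual model is omitted.

def wavefront_order(h, w):
    diagonals = []
    for k in range(h + w - 1):
        diag = [(i, k - i) for i in range(h) if 0 <= k - i < w]
        diagonals.append(diag)
    return diagonals

order = wavefront_order(2, 3)
print(order[0])  # [(0, 0)]
print(order[1])  # [(0, 1), (1, 0)]
```

For an H x W image this reduces the sequential depth from H*W pixel steps to H + W - 1 diagonal steps, which is where the decoding speed-up comes from.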
10. An image encoding device characterized by comprising:
the autoregressive module is used for taking an input image as the input of an autoregressive model and outputting a first image;
the residual error calculation module is used for acquiring a residual error between the first image and the input image to obtain a first residual error image;
a self-coding module, configured to take the input image as an input of a self-coding model, and output a hidden variable and a first residual distribution, wherein the hidden variable comprises features extracted from the input image, and the first residual distribution comprises residual values output by the self-coding model for representing a residual value between each pixel point in the input image and each corresponding pixel point in the first residual image;
a residual coding module, configured to code the first residual image and the first residual distribution to obtain residual coded data;
and the hidden variable coding module is used for coding the hidden variable to obtain hidden variable encoded data, wherein the hidden variable encoded data and the residual encoded data are used for decompression to obtain the input image.
11. The apparatus of claim 10,
the residual coding module is specifically configured to use the first residual image and the first residual distribution as inputs of a semi-dynamic entropy coder, and output the residual coded data, where the semi-dynamic entropy coder is configured to perform entropy coding using a first preset type of coding operation, the first preset type of coding operation includes an addition, a subtraction, or a bit operation, and the semi-dynamic entropy coder does not include a second preset type of coding operation, and the second preset type includes at least one of a multiplication, a division, or a remainder operation.
12. The apparatus of claim 10,
the hidden variable coding module is specifically configured to use the hidden variable as an input of a static entropy coder to obtain the hidden variable coded data.
13. The apparatus according to any of claims 10-12, wherein the self-coding model comprises a coding model and a decoding model, and the self-coding module is specifically configured to:
taking the input image as the input of the coding model, and outputting the hidden variable, wherein the coding model is used for extracting features from the input image;
and taking the hidden variable as an input of the decoding model to obtain the first residual distribution, wherein the decoding model is used for predicting the distribution of residual values corresponding to the pixels of the input image.
14. The apparatus of any of claims 10-13, wherein the autoregressive model is configured to predict values of pixels on a same line using pixel values of predicted pixels.
15. An image decompression apparatus, characterized by comprising:
the transceiver module is used for obtaining hidden variable encoded data and residual encoded data, wherein the hidden variable encoded data comprises encoded data obtained by a coding end by coding features extracted from an input image, and the residual encoded data comprises encoded data obtained by coding a residual between a first image output by forward propagation of an autoregressive model and the input image;
the hidden variable decoding module is used for decoding the hidden variable coded data to obtain hidden variables, and the hidden variables comprise features extracted from the input image;
the self-coding module is used for taking the hidden variable as the input of a self-coding model and outputting second residual distribution;
the residual error decoding module is used for decoding by combining the second residual error distribution and the residual error coded data to obtain a second residual error image;
and the autoregressive module is used for taking the second residual image as the input of the back propagation of the autoregressive model and outputting a decompressed image.
16. The apparatus of claim 15,
the hidden variable decoding module is specifically configured to take the hidden variable encoded data as an input of a static entropy encoder, and output the hidden variable.
17. The apparatus of claim 15 or 16,
the residual decoding module is specifically configured to output the second residual image by using the second residual distribution and the residual encoded data as inputs of a semi-dynamic entropy encoder, where the semi-dynamic entropy encoder is configured to perform entropy encoding using a first preset type of encoding operation, the first preset type of encoding operation includes an addition, a subtraction, or a bit operation, the semi-dynamic entropy encoder does not include a second preset type of encoding operation, and the second preset type includes at least one of a multiplication, a division, or a remainder operation.
18. The apparatus of any one of claims 15-17,
the autoregressive module is specifically configured to perform parallel decoding on the pixel points on the same connecting line in the second residual image through the autoregressive model to obtain the decompressed image.
19. An image encoding apparatus comprising a processor coupled to a memory, the memory storing a program, the program instructions stored by the memory when executed by the processor implementing the steps of the method of any of claims 1-5.
20. An image decompression apparatus comprising a processor coupled to a memory, the memory storing a program that when executed by the processor implements the steps of the method of any of claims 6 to 9.
21. An image processing system, characterized by comprising image encoding means for implementing the steps of the method of any one of claims 1 to 5 and image decompression means for implementing the steps of the method of any one of claims 6 to 9.
22. A computer readable storage medium comprising a program which, when executed by a processing unit, performs the steps of the method of any one of claims 1 to 9.
23. A computer program product, characterized in that it comprises a software code for performing the steps of the method according to any one of claims 1 to 9.
CN202210447177.0A 2022-04-26 2022-04-26 Image coding method, image decompression method and device Pending CN115022637A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210447177.0A CN115022637A (en) 2022-04-26 2022-04-26 Image coding method, image decompression method and device
PCT/CN2023/090043 WO2023207836A1 (en) 2022-04-26 2023-04-23 Image encoding method and apparatus, and image decompression method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210447177.0A CN115022637A (en) 2022-04-26 2022-04-26 Image coding method, image decompression method and device

Publications (1)

Publication Number Publication Date
CN115022637A (en) 2022-09-06

Family

ID=83067519

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210447177.0A Pending CN115022637A (en) 2022-04-26 2022-04-26 Image coding method, image decompression method and device

Country Status (2)

Country Link
CN (1) CN115022637A (en)
WO (1) WO2023207836A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023207836A1 (en) * 2022-04-26 2023-11-02 华为技术有限公司 Image encoding method and apparatus, and image decompression method and apparatus

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11388416B2 (en) * 2019-03-21 2022-07-12 Qualcomm Incorporated Video compression using deep generative models
CN111405283B (en) * 2020-02-20 2022-09-02 北京大学 End-to-end video compression method, system and storage medium based on deep learning
GB202016824D0 (en) * 2020-10-23 2020-12-09 Deep Render Ltd DR big book 3
CN111901596B (en) * 2020-06-29 2021-10-22 北京大学 Video hybrid coding and decoding method, device and medium based on deep learning
CN114066914A (en) * 2020-07-30 2022-02-18 华为技术有限公司 Image processing method and related equipment
CN112257858A (en) * 2020-09-21 2021-01-22 华为技术有限公司 Model compression method and device
CN115022637A (en) * 2022-04-26 2022-09-06 华为技术有限公司 Image coding method, image decompression method and device


Also Published As

Publication number Publication date
WO2023207836A1 (en) 2023-11-02

Similar Documents

Publication Publication Date Title
US10834415B2 (en) Devices for compression/decompression, system, chip, and electronic device
KR102332490B1 (en) Compression methods, chips, electronics and media for deep neural networks
JP6946572B2 (en) Accelerated quantized multiply-accumulate operation
CN113259665B (en) Image processing method and related equipment
Khashman et al. Image compression using neural networks and Haar wavelet
CN110677651A (en) Video compression method
CN114581544A (en) Image compression method, computer device and computer storage medium
WO2022028197A1 (en) Image processing method and device thereof
WO2023231794A1 (en) Neural network parameter quantification method and apparatus
US20230401756A1 (en) Data Encoding Method and Related Device
WO2018228399A1 (en) Computing device and method
WO2023207836A1 (en) Image encoding method and apparatus, and image decompression method and apparatus
WO2023174256A1 (en) Data compression method and related device
WO2023051335A1 (en) Data encoding method, data decoding method, and data processing apparatus
CN114501031B (en) Compression coding and decompression method and device
TW202348029A (en) Operation of a neural network with clipped input data
CN112532251A (en) Data processing method and device
Fraihat et al. A novel lossy image compression algorithm using multi-models stacked AutoEncoders
CN115409697A (en) Image processing method and related device
CN113554719B (en) Image encoding method, decoding method, storage medium and terminal equipment
CN114693811A (en) Image processing method and related equipment
CN116468966A (en) Neural network reasoning acceleration method and device based on feature map compression
AU2022348742A1 (en) Feature map encoding and decoding method and apparatus
TW202345034A (en) Operation of a neural network with conditioned weights
WO2023222313A1 (en) A method, an apparatus and a computer program product for machine learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination