CN115409150A - Data compression method, data decompression method and related equipment

Data compression method, data decompression method and related equipment

Info

Publication number
CN115409150A
Authority
CN
China
Prior art keywords
model
data batch
data
sequence
encoder
Prior art date
Legal status
Pending
Application number
CN202110584986.1A
Other languages
Chinese (zh)
Inventor
Zhang Chen
Zhang Shifeng
Fabio Maria Carlucci
Li Zhenguo
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202110584986.1A
Publication of CN115409150A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/1737 Details of further file system functions for reducing power consumption or coping with limited storage space, e.g. in mobile devices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G06F16/1744 Redundancy elimination performed by the file system using compression, e.g. sparse files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The embodiment of the application discloses a data compression method, a data decompression method and related equipment, which can be applied to lossless compression of data such as images, videos and texts. The method may be performed by a compression device or by a component of a compression device (e.g., a processor, a chip, or a system of chips, etc.). The method comprises the following steps: obtaining a first data batch and a second data batch, updating a mother model according to the first data batch to obtain a first model in the process of compressing the first data batch, and using the first model to participate in the compression of the second data batch. On one hand, during compression the data batch to be compressed can be compressed while the mother model is updated, and the updated new model does not need to be stored, so the compression method is suitable for compressing large-scale data sets. On the other hand, as the mother model is updated it conforms better and better to the distribution of the data batches to be compressed, and therefore the compression rate becomes higher and higher.

Description

Data compression method, data decompression method and related equipment
Technical Field
The embodiment of the application relates to the field of artificial intelligence, in particular to a data compression method, a data decompression method and related equipment.
Background
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making. Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision and reasoning, human-computer interaction, recommendation and search, basic AI theory, and the like.
At present, lossless compression is an important technical direction in modern AI. Lossless compression can mainly be realized by a deep generative model, and because a deep generative model can model the distribution of data very accurately, it can theoretically achieve a compression ratio higher than that of traditional algorithms.
However, how to use a deep generative model to improve the compression rate of data in practical applications is a technical problem that urgently needs to be solved.
Disclosure of Invention
The embodiment of the application provides a data compression method, a data decompression method and related equipment. The compression rate of the data batch to be compressed can be improved.
A first aspect of the embodiments of the present application provides a data compression method, which may be applied to lossless compression of data such as images, videos, and texts. The method may be performed by a compression device or by a component of a compression device (e.g., a processor, a chip, or a system of chips, etc.). The method comprises the following steps: acquiring a first data batch and a second data batch, for example, acquiring the first data batch and the second data batch from a data set to be compressed; acquiring a mother model; compressing the first data batch to obtain a first sequence based on the first data batch, the mother model and a first encoder, wherein the first encoder corresponds to the mother model; updating the mother model based on the first data batch to obtain a first model; and compressing the second data batch to obtain a second sequence based on the second data batch, the first model, the first sequence and the first encoder. The mother model is used for calculating first probability distribution information, and the first probability distribution information is used for representing the probability distribution of the values of the variables in the first data batch. A variable value can be understood as the minimum compression unit (or the value corresponding to the minimum compression unit) in a data batch. For image compression, a variable value may be one pixel value or several pixel values.
In the embodiment of the application, on one hand, during compression the data batch to be compressed can be compressed while the mother model is updated, and the updated new model does not need to be stored, so the compression method is suitable for compressing large-scale data sets. On the other hand, as the mother model is updated it conforms better and better to the distribution of the data batches to be compressed, and therefore the compression rate becomes higher and higher. In addition, compared with the prior art, in which a new model needs to be trained specially for the data set to be compressed, the mother model in the embodiment of the application has strong universality: it does not need to be trained specially for the data set to be compressed, and the model is continuously updated while the data set is being compressed. Moreover, the update process can be reproduced by updating the mother model during decompression, so the updated models can be obtained by storing only the mother model, without storing each model generated during compression, which saves storage space.
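As an illustration of this flow (not code from the application), the following Python sketch compresses each batch with the current model and then updates the model from the batch just compressed. The count-based "model" and the ideal code-length "encoder" are toy stand-ins for the mother model and the first encoder:

```python
import math
from collections import Counter

def batch_probabilities(counts, alphabet_size, batch):
    total = sum(counts.values()) + alphabet_size            # Laplace smoothing
    return [(counts[s] + 1) / total for s in batch]

def compress_dataset(batches, alphabet_size=256):
    counts = Counter()                                       # toy stand-in for the mother model
    bits_per_batch = []
    for batch in batches:
        probs = batch_probabilities(counts, alphabet_size, batch)
        bits_per_batch.append(sum(-math.log2(p) for p in probs))  # encoder's code length
        counts.update(batch)                                 # update the model from this batch;
                                                             # the updated model is never stored --
                                                             # decompression replays the same update
    return bits_per_batch

print(compress_dataset([[1, 2, 3, 1] * 8] * 3))              # later batches need fewer bits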
Optionally, in a possible implementation manner of the first aspect, the step further includes: acquiring a third data batch; compressing the second data batch to obtain a second sequence based on the second data batch, the first model, the first sequence and the first encoder, comprising: compressing the third data batch to obtain a third sequence based on the third data batch, the first model, the first sequence and the first encoder; updating the first model based on the third data batch to obtain a second model; and compressing the second data batch to obtain a second sequence based on the second data batch, the second model, the third sequence and the first encoder.
In this possible implementation, the mother model can be updated multiple times, and the updated model is used for compressing the subsequent data batches, so that the compression rate is improved.
Optionally, in a possible implementation manner of the first aspect, the step further includes: acquiring a third data batch; compressing the second data batch to obtain a second sequence based on the second data batch, the first model, the first sequence and the first encoder, comprising: compressing the third data batch based on the third data batch, the first model, the first sequence and the first encoder to obtain a third sequence; and compressing the second data batch to obtain a second sequence based on the second data batch, the first model, the third sequence and the first encoder. In addition, if the model is no longer updated, that is, updating is stopped early, prompt information can be stored, where the prompt information indicates the early stop, so that a wrong model is not used during decompression.
In this possible implementation, the updating of the model may be ended in advance by means of an early stop, and the subsequent data batches may be compressed using the model obtained at the early stop. Since the early stop halts further updates of the model, compression after the early stop only requires forward propagation of the neural network, saving the time required for backward propagation and the space needed for model updates (the saved space allows for larger data batches). In other words, compared with always updating the model, this scheme trades some compression ratio for improved compression and decompression time efficiency.
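A structural sketch of this early-stop variant (the callables are placeholders, not code from the application) might look like:

```python
def compress_with_early_stop(batches, model, encode, update, early_stop_after):
    """`model(batch)` returns probability info, `encode(batch, probs)` returns a code,
    and `update(model, batch)` returns an updated model; all are placeholders."""
    codes, stopped = [], False
    for i, batch in enumerate(batches):
        codes.append(encode(batch, model(batch)))   # forward pass + entropy coding
        if stopped or i + 1 >= early_stop_after:
            stopped = True                           # model frozen: only forward passes remain
        else:
            model = update(model, batch)             # still updating (backward pass needed)
    return codes, stopped                            # `stopped` is stored as the prompt information
```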
Optionally, in a possible implementation manner of the first aspect, the updating the mother model based on the first data batch to obtain the first model includes: updating the mother model by using an optimization algorithm and the first data batch to obtain the first model, wherein the optimization algorithm includes a gradient-based optimization algorithm, meta learning or reinforcement learning. Optionally, in order to reproduce the update of the mother model during decompression, the random seeds used in the process of updating the mother model can be saved together with the mother model.
In this possible implementation, the mother model can be updated based on the optimization algorithm, and the update is easy to reproduce during decompression. Because no dedicated training on the data set to be compressed is required, as in the prior art, the compression time is reduced by the time such dedicated training would otherwise consume.
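For the gradient-based case, a minimal sketch is shown below, assuming the mother model is a PyTorch module whose forward pass returns the batch negative log-likelihood; the function and parameter names are illustrative, not from the application:

```python
import torch

def update_model(model: torch.nn.Module, batch: torch.Tensor,
                 lr: float = 1e-4, seed: int = 0) -> torch.nn.Module:
    # Fix the random seed so the exact same update can be replayed during decompression
    # (the seeds can be stored together with the mother model, as noted above).
    torch.manual_seed(seed)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    optimizer.zero_grad()
    nll = model(batch)           # negative log-likelihood of the batch (assumed forward pass)
    nll.backward()               # backward propagation
    optimizer.step()             # one gradient-based update step
    return model
```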
Optionally, in a possible implementation manner of the first aspect, the mother model in the above step is obtained by training a neural network model with training data batches in a training data set as the input of the neural network model and with a loss function value smaller than a first threshold as the target, the data type of the training data set is the same as the data type of the data set to be compressed, the data set to be compressed includes the first data batch and the second data batch, and the loss function is used to indicate the difference between the probability distribution information output by the neural network model and the actual probability distribution information of the variable values in the training data batch.
In this possible implementation, the neural network model is trained with the loss function to obtain the trained mother model, and the data type of the training data set is the same as that of the data set to be compressed, which ensures that the trained mother model conforms to the distribution of the data set to be compressed and improves the compression ratio.
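A minimal pre-training sketch under the same assumption (the module's forward pass returns the loss, i.e., the gap between the predicted and actual probability distributions), with `threshold` playing the role of the first threshold; all names are illustrative:

```python
import torch
from torch.utils.data import DataLoader

def train_mother_model(nll_model: torch.nn.Module, loader: DataLoader,
                       threshold: float, lr: float = 1e-3,
                       max_epochs: int = 100) -> torch.nn.Module:
    optimizer = torch.optim.Adam(nll_model.parameters(), lr=lr)
    for _ in range(max_epochs):
        for batch in loader:                 # training batches of the same data type
            optimizer.zero_grad()
            loss = nll_model(batch)          # loss: difference from the true distribution
            loss.backward()
            optimizer.step()
            if loss.item() < threshold:      # stop once the loss target is met
                return nll_model
    return nll_model
```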
Optionally, in a possible implementation manner of the first aspect, the compressing the first data batch to obtain the first sequence based on the first data batch, the mother model, and the first encoder includes: inputting the first data batch into the mother model to obtain the first probability distribution information; and compressing the first data batch based on the first probability distribution information and the first encoder to obtain the first sequence.
In this possible implementation, the first probability distribution information of the first data batch is obtained through the mother model, and the first data batch is compressed with the first probability distribution information and the first encoder to obtain the first sequence. Because the first data batch does not need to be specially trained on in advance, the compression time is reduced, compared with existing approaches, by the time that such dedicated training would consume.
Optionally, in a possible implementation manner of the first aspect, the step of compressing the second data batch to obtain the second sequence based on the second data batch, the first model, the first sequence, and the first encoder includes: inputting the second data batch into the first model to obtain second probability distribution information, wherein the second probability distribution information is used for expressing the probability distribution of each variable value in the second data batch; and compressing the second data batch to obtain a second sequence based on the second probability distribution information, the first sequence and the first encoder.
In this possible implementation, the subsequent data batch (i.e., the second data batch) is compressed with the first model obtained by updating the mother model, and the first model better conforms to the probability distribution of the second data batch, so that the compression rate is improved.
Optionally, in a possible implementation manner of the first aspect, the step of compressing the second data batch to obtain the second sequence based on the second data batch, the second model, the third sequence, and the first encoder includes: inputting the second data batch into a second model to obtain second probability distribution information, wherein the second probability distribution information is used for expressing the probability distribution of each variable value in the second data batch; and compressing the second data batch to obtain a second sequence based on the second probability distribution information, the third sequence and the first encoder.
In this possible implementation manner, the subsequent data batch (i.e., the second data batch) uses the second model obtained by updating the first model, and the second model better conforms to the probability distribution of the second data batch, so as to improve the compression rate.
Optionally, in a possible implementation manner of the first aspect, the step of compressing the second data batch to obtain the second sequence based on the second data batch, the first model, the third sequence, and the first encoder includes: inputting the second data batch into the first model to obtain second probability distribution information, wherein the second probability distribution information is used for expressing the probability distribution of each variable value in the second data batch; and compressing the second data batch to obtain a second sequence based on the second probability distribution information, the third sequence and the first encoder.
In this possible implementation, the updating of the model may be ended in advance by means of an early stop, and the subsequent data batches may be compressed using the model obtained at the early stop. Since the early stop halts further updates of the model, compression after the early stop only requires forward propagation of the neural network, saving the time required for backward propagation and the space needed for model updates (the saved space allows for larger data batches). In other words, compared with always updating the model, this scheme trades some compression ratio for improved compression and decompression time efficiency.
Optionally, in a possible implementation manner of the first aspect, the step of compressing the third data batch to obtain a third sequence based on the third data batch, the first model, the first sequence, and the first encoder includes: inputting the third data batch into the first model to obtain third probability distribution information, wherein the third probability distribution information is used for expressing the probability distribution of each variable value in the third data batch; and compressing the third data batch to obtain a third sequence based on the third probability distribution information, the first sequence and the first encoder.
In this possible implementation manner, the subsequent data batch (i.e., the third data batch) uses the first model, and the first model better conforms to the probability distribution of the third data batch, so as to improve the compression rate.
Optionally, in a possible implementation manner of the first aspect, the step further includes: determining the first encoder corresponding to the type of the mother model based on a first association relation, wherein the first association relation is used for representing the association between the type of the mother model and the first encoder, and the type includes a fully observed model and a latent variable model.
In this possible implementation, a suitable encoder is selected according to the type of the mother model, which can improve the subsequent compression rate.
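As an illustration only (the application does not fix a particular mapping), the first association relation could be represented as a simple lookup, reflecting the common pairing of fully observed models with queue-like (first-in first-out) arithmetic coders, and of latent variable models using bits-back coding with stack-like (first-in last-out) ANS coders:

```python
# Illustrative association between model type and encoder; the entries are assumptions,
# not values stated in the application.
ENCODER_FOR_MODEL_TYPE = {
    "fully_observed_model": "arithmetic_coder_fifo",
    "latent_variable_model": "ans_coder_filo",
}

def select_encoder(model_type: str) -> str:
    return ENCODER_FOR_MODEL_TYPE[model_type]   # the "first association relation"
```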
Optionally, in a possible implementation manner of the first aspect, the acquiring a mother model in the above step includes: acquiring the mother model based on the data type of the data set to be compressed, wherein the data type includes an image data type and a sequence data type.
In this possible implementation, deep generative models perform differently on different data types. For example, a flow model or a variational autoencoder works better for image data. Selecting a suitable mother model according to the data type is therefore conducive to improving the compression rate.
Optionally, in a possible implementation manner of the first aspect, the acquiring the first data batch and the second data batch in the above step includes: acquiring a data set to be compressed; and splitting the data set to be compressed to obtain a first data batch and a second data batch.
In this possible implementation, the method is suitable for compressing large-scale data sets, where the data batches are obtained by splitting the data set.
A second aspect of the embodiments of the present application provides a data compression method, which can be applied to lossless compression of data such as images, videos, and texts. The method may be performed by a compression device or may be performed by a component of a compression device (e.g., a processor, a chip, or a system of chips, etc.). The method comprises the following steps: acquiring a first data batch and a second data batch; acquiring a mother model, wherein the mother model is used for calculating first probability distribution information, and the first probability distribution information is used for representing the probability distribution of each variable value in a first data batch; updating the mother model based on the first data batch to obtain a first model; compressing the second data batch based on the second data batch, the first model and the first encoder to obtain a first sequence, wherein the first encoder corresponds to the mother model; and compressing the first data batch to obtain a second sequence based on the first data batch, the mother model, the first sequence and the first encoder.
In the embodiment of the application, the method is suitable for a first-in last-out encoder. On one hand, during compression the data batches to be compressed can be compressed while the mother model is updated, and the updated new model does not need to be stored, so the method is suitable for compressing large-scale data sets. On the other hand, as the mother model is updated it conforms better and better to the distribution of the data batches to be compressed, and therefore the compression rate becomes higher and higher. On the other hand, by reversing the compression order to suit the first-in last-out encoder, the decompression process can obtain the data batches in order.
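The stack-like behaviour can be illustrated with an ordinary Python list (a toy illustration, not the actual encoder): the batch whose code is pushed last is popped first, which is why the compressor encodes the later batches first and the first data batch last.

```python
stack = []                                            # the first-in last-out code sequence
stack.append("code(second batch | first model)")      # pushed first
stack.append("code(first batch | mother model)")      # pushed last

print(stack.pop())   # decoded first: the first batch, using the stored mother model
print(stack.pop())   # decoded next: the second batch, using the replayed first model
```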
Optionally, in a possible implementation manner of the second aspect, the step further includes: acquiring a third data batch; compressing the second data batch to obtain a first sequence based on the second data batch, the first model and the first encoder, comprising: updating the first model based on the third data batch to obtain a second model; compressing the second data batch to obtain a first sequence based on the second data batch, the second model and the first encoder; compressing the first data batch to obtain a second sequence based on the first data batch, the mother model, the first sequence and the first encoder, and the method comprises the following steps: compressing the third data batch to obtain a third sequence based on the third data batch, the first model, the first sequence and the first encoder; and compressing the first data batch to obtain a second sequence based on the first data batch, the mother model, the third sequence and the first encoder.
In this possible implementation, the mother model can be updated multiple times, and the updated models are used for compressing the subsequent data batches, so that the compression rate is improved.
Optionally, in a possible implementation manner of the second aspect, the step further includes: acquiring a third data batch; compressing the first data batch to obtain a second sequence based on the first data batch, the mother model, the first sequence and the first encoder, comprising: compressing the third data batch to obtain a third sequence based on the third data batch, the first model, the first sequence and the first encoder; and compressing the first data batch to obtain a second sequence based on the first data batch, the mother model, the third sequence and the first encoder. In addition, if the model is no longer updated, that is, updating is stopped early, prompt information can be stored, where the prompt information indicates the early stop, so that a wrong model is not used during decompression.
In this possible implementation, the updating of the model may be ended in advance by means of an early stop, and the subsequent data batches may be compressed using the model obtained at the early stop. Since the early stop halts further updates of the model, compression after the early stop only requires forward propagation of the neural network, saving the time required for backward propagation and the space needed for model updates (the saved space allows for larger data batches). In other words, compared with always updating the model, this scheme trades some compression ratio for improved compression and decompression time efficiency.
A third aspect of the embodiments of the present application provides a data decompression method, which can be applied to lossless compression of data such as images, videos, and texts. The method may be performed by a decompression device, or may be performed by a component of a decompression device (e.g., a processor, a chip, or a system of chips, etc.). The method comprises the following steps: acquiring a second sequence; acquiring a mother model; decompressing the second sequence based on the mother model and the first encoder to obtain a fourth sequence and a first data batch, wherein the first encoder corresponds to the mother model; updating the mother model based on the first data batch to obtain a first model; and decompressing the fourth sequence based on the first model and the first encoder to obtain a second data batch.
In the embodiment of the application, on one hand, during decompression the mother model can be updated while the coding sequence is decompressed, without storing the updated new models, and therefore the method is suitable for the compression of large-scale data sets. On the other hand, the mother model in the embodiment of the application has strong universality: it does not need to be trained specially for the data set to be compressed, and the model is continuously updated while the data set is being compressed. Moreover, the update process can be reproduced by updating the mother model during decompression, so the updated models can be obtained by storing only the mother model, without storing each model generated during compression, which saves storage space.
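A structural sketch of this decompression loop (the callables are placeholders, not code from the application): each batch is decoded with the current model, and the compression-side update is replayed on the decoded batch, so no intermediate model ever needs to be stored.

```python
def decompress_dataset(stream, mother_model, decode, update, num_batches):
    """`decode(stream, model)` returns (batch, remaining stream); `update(model, batch)`
    replays the compression-side model update. Both are placeholders."""
    model, batches = mother_model, []
    for _ in range(num_batches):
        batch, stream = decode(stream, model)    # e.g. mother model -> first data batch
        batches.append(batch)
        model = update(model, batch)             # replay the same update as during compression
    return batches                               # later merged back into the initial data set
```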
Optionally, in a possible implementation manner of the third aspect, the above step of obtaining the second data batch based on the first model and the fourth sequence decompressed by the first encoder includes: decompressing the fourth sequence based on the first model and the first encoder to obtain a fifth sequence and a third data batch; updating the first model based on the third data batch to obtain a second model; and decompressing the fifth sequence based on the second model and the first coder to obtain a second data batch.
In this possible implementation, the mother model can be updated multiple times, and the updated models are used for decompressing the coding sequences to obtain the data batches, so that lossless compression is realized.
Optionally, in a possible implementation manner of the third aspect, the above step of obtaining the second data batch based on the first model and the fourth sequence decompressed by the first encoder includes: decompressing the fourth sequence based on the first model and the first encoder to obtain a fifth sequence and a third data batch; and decompressing the fifth sequence based on the first model and the first encoder to obtain a second data batch. Optionally, a prompt may be obtained, where the prompt is used to indicate early-stop information to avoid using an incorrect model during decompression.
In this possible implementation, compared with always updating the model, decompression time efficiency is improved at the cost of some compression ratio.
Optionally, in a possible implementation manner of the third aspect, the updating the mother model based on the first data batch to obtain the first model includes: updating the mother model by using an optimization algorithm and the first data batch to obtain the first model, wherein the optimization algorithm includes a gradient-based optimization algorithm, meta learning or reinforcement learning. Optionally, in order to reproduce the update of the mother model during decompression, the random seeds used in the process of updating the mother model can be saved together with the mother model.
In this possible implementation, the mother model can be updated based on the optimization algorithm, and the update is easy to reproduce during decompression.
Optionally, in a possible implementation manner of the third aspect, the mother model in the above step is obtained by training a neural network model with training data batches in a training data set as the input of the neural network model and with a loss function value smaller than a first threshold as the target, the data type of the training data set is the same as the data type of the data set to be compressed, the data set to be compressed includes the first data batch and the second data batch, and the loss function is used to indicate the difference between the probability distribution information output by the neural network model and the actual probability distribution information of the variable values in the training data batch.
In this possible implementation, the neural network model is trained with the loss function to obtain the trained mother model, and the data type of the training data set is the same as that of the data set to be compressed, which ensures that the trained mother model conforms to the distribution of the data set to be compressed and improves the compression ratio.
Optionally, in a possible implementation manner of the third aspect, the step of obtaining the fourth sequence and the first data batch based on the mother model and the second sequence decompressed by the first encoder includes: decompressing the second sequence based on the second sequence, the mother model and the first encoder to obtain a fourth sequence and a first data batch; and decompressing the fourth sequence based on the first model and the first encoder to obtain a second data batch, which comprises the following steps: and decompressing the fourth sequence based on the fourth sequence, the first model and the first encoder to obtain a second data batch.
In this possible implementation, the method can be applied to the case where the model contains conditional probabilities.
Optionally, in a possible implementation manner of the third aspect, the step of obtaining a fourth sequence and the first data batch based on the second sequence, the mother model, and the second sequence decompressed by the first encoder includes: obtaining a first probability based on the mother model; decompressing the second sequence based on the first probability and the first encoder to obtain a portion of data; inputting a part of data into the mother model to obtain a second probability; decompressing the second sequence based on the second probability and the first encoder to obtain another part of data and a fourth sequence; a first data batch is obtained based on one part of data and the other part of data.
In this possible implementation, a part of the probabilities may be obtained from the model, a part of the data may be obtained by decompressing the code sequence with that part of the probabilities, another part of the probabilities may then be obtained by inputting the decoded part of the data into the model, the other part of the data may be obtained by decompressing the code sequence with the other part of the probabilities, and the complete data batch is thus obtained.
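A structural sketch of this two-stage decoding for a model with conditional probabilities (the method and attribute names are illustrative assumptions, not from the application):

```python
def decode_batch_autoregressive(stream, model, decode):
    """`decode(stream, probs)` stands in for the first encoder's decoding step."""
    first_probs = model.prior()                        # "first probability" from the model
    part_one, stream = decode(stream, first_probs)     # one part of the data
    second_probs = model.conditional(part_one)         # "second probability" given that part
    part_two, stream = decode(stream, second_probs)    # the other part of the data
    return part_one + part_two, stream                 # the complete data batch (lists assumed)
```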
Optionally, in a possible implementation manner of the third aspect, the step of obtaining the fourth sequence and the first data batch based on the mother model and the second sequence decompressed by the first encoder includes: acquiring first probability distribution information based on the mother model, wherein the first probability distribution information is used for representing the probability distribution of values of all variables in the first data batch; and decompressing the second sequence based on the first probability distribution information and the first encoder to obtain a fourth sequence and the first data batch.
In this possible implementation manner, the first probability distribution information may be directly obtained through the mother model, and the first data batch is obtained by decompressing the sequence with the first probability distribution information.
Optionally, in a possible implementation manner of the third aspect, the above step of obtaining the second data batch based on the first model and the fourth sequence decompressed by the first encoder includes: acquiring second probability distribution information based on the first model, wherein the second probability distribution information is used for representing the probability distribution of each variable value in the second data batch; and decompressing the fourth sequence based on the second probability distribution information and the first encoder to obtain a second data batch.
In this possible implementation, the second probability distribution information may be obtained directly through the first model, and the second data batch is obtained by decompressing the sequence with the second probability distribution information.
Optionally, in a possible implementation manner of the third aspect, the foregoing steps further include: determining the first encoder corresponding to the type of the mother model based on a first association relation, wherein the first association relation is used for representing the association between the type of the mother model and the first encoder, and the type includes a fully observed model and a latent variable model.
In this possible implementation, a suitable encoder is selected according to the type of the mother model, which can improve the subsequent compression rate.
Optionally, in a possible implementation manner of the third aspect, the step further includes: and combining the first data batch and the second data batch to obtain an initial data set.
In this possible implementation, the method is suitable for decompressing large-scale data sets, where the data set is obtained by combining a plurality of data batches.
A fourth aspect of the embodiments of the present application provides a compression apparatus, which may be applied to lossless compression of data such as images, videos, and texts. The compression apparatus includes:
the acquisition unit is used for acquiring the first data batch and the second data batch;
the acquiring unit is further used for acquiring a mother model, the mother model is used for calculating first probability distribution information, and the first probability distribution information is used for representing the probability distribution of values of all variables in the first data batch;
the compression unit is used for compressing the first data batch to obtain a first sequence based on the first data batch, the mother model and a first encoder, and the first encoder corresponds to the mother model;
the updating unit is used for updating the mother model based on the first data batch to obtain a first model;
and the compression unit is also used for compressing the second data batch to obtain a second sequence based on the second data batch, the first model, the first sequence and the first encoder.
Optionally, in a possible implementation manner of the fourth aspect, the obtaining unit is further configured to obtain a third data batch; the compression unit is further used for compressing the third data batch to obtain a third sequence based on the third data batch, the first model, the first sequence and the first encoder; the updating unit is also used for updating the first model based on the third data batch to obtain a second model; and the compression unit is further used for compressing the second data batch to obtain a second sequence based on the second data batch, the second model, the third sequence and the first encoder.
Optionally, in a possible implementation manner of the fourth aspect, the obtaining unit is further configured to obtain a third data batch; the compression unit is further used for compressing the third data batch based on the third data batch, the first model, the first sequence and the first encoder to obtain a third sequence; and the compression unit is also used for compressing the second data batch to obtain a second sequence based on the second data batch, the first model, the third sequence and the first encoder.
Optionally, in a possible implementation manner of the fourth aspect, the updating unit is specifically configured to update the mother model to obtain the first model using an optimization algorithm and the first data batch, where the optimization algorithm includes a gradient-based optimization algorithm, meta learning, or reinforcement learning.
Optionally, in a possible implementation manner of the fourth aspect, the mother model is obtained by training a neural network model with training data batches in a training data set as the input of the neural network model and with a loss function value smaller than a first threshold as the target, the data type of the training data set is the same as the data type of the data set to be compressed, the data set to be compressed includes the first data batch and the second data batch, and the loss function is used to indicate the difference between the probability distribution information output by the neural network model and the actual probability distribution information of the variable values in the training data batch.
Optionally, in a possible implementation manner of the fourth aspect, the compressing unit is specifically configured to input the first data batch into the mother model to obtain first probability distribution information; and the compression unit is specifically used for compressing the first data batch to obtain a first sequence based on the first probability distribution information and the first encoder.
Optionally, in a possible implementation manner of the fourth aspect, the compressing unit is specifically configured to input the second data batch into the first model to obtain second probability distribution information, where the second probability distribution information is used to represent a probability distribution of values of each variable in the second data batch; and the compression unit is specifically configured to compress the second data batch to obtain a second sequence based on the second probability distribution information, the first sequence, and the first encoder.
Optionally, in a possible implementation manner of the fourth aspect, the compressing unit is specifically configured to input the second data batch into the second model to obtain second probability distribution information, where the second probability distribution information is used to represent probability distribution of values of each variable in the second data batch; and the compression unit is specifically configured to compress the second data batch to obtain a second sequence based on the second probability distribution information, the third sequence, and the first encoder.
Optionally, in a possible implementation manner of the fourth aspect, the compressing unit is specifically configured to input the second data batch into the first model to obtain second probability distribution information, where the second probability distribution information is used to represent a probability distribution of values of each variable in the second data batch; and the compression unit is specifically configured to compress the second data batch to obtain a second sequence based on the second probability distribution information, the third sequence, and the first encoder.
Optionally, in a possible implementation manner of the fourth aspect, the compressing unit is specifically configured to input the third data batch into the first model to obtain third probability distribution information, where the third probability distribution information is used to represent probability distribution of values of each variable in the third data batch; and the compression unit is specifically used for compressing the third data batch to obtain a third sequence based on the third probability distribution information, the first sequence and the first encoder.
Optionally, in a possible implementation manner of the fourth aspect, the compressing apparatus further includes: the determining unit is used for determining the first encoder corresponding to the type of the mother model based on a first association relation, the first association relation is used for representing the association between the type of the mother model and the first encoder, and the type includes a fully observed model and a latent variable model.
Optionally, in a possible implementation manner of the fourth aspect, the obtaining unit is specifically configured to obtain the mother model based on a data type of the data set to be compressed, where the data type includes an image data type and a sequence data type.
Optionally, in a possible implementation manner of the fourth aspect, the obtaining unit is specifically configured to obtain a data set to be compressed; the acquisition unit is specifically used for splitting the data set to be compressed to obtain a first data batch and a second data batch.
A fifth aspect of the embodiments of the present application provides a compression apparatus, which can be applied to lossless compression of data such as images, videos, and texts. The compression apparatus includes:
the acquisition unit is used for acquiring the first data batch and the second data batch;
the acquiring unit is further used for acquiring a mother model, the mother model is used for calculating first probability distribution information, and the first probability distribution information is used for representing probability distribution of values of all variables in the first data batch;
the updating unit is used for updating the mother model based on the first data batch to obtain a first model;
the compression unit is used for compressing the second data batch to obtain a first sequence based on the second data batch, the first model and the first encoder, and the first encoder corresponds to the mother model;
and the compression unit is also used for compressing the first data batch to obtain a second sequence based on the first data batch, the mother model, the first sequence and the first encoder.
Optionally, in a possible implementation manner of the fifth aspect, the obtaining unit is further configured to obtain a third data batch; the updating unit is also used for updating the first model based on the third data batch to obtain a second model; the compression unit is specifically used for compressing the second data batch to obtain a first sequence based on the second data batch, the second model and the first encoder; the compression unit is specifically used for compressing the third data batch to obtain a third sequence based on the third data batch, the first model, the first sequence and the first encoder; and the compression unit is specifically used for compressing the first data batch to obtain a second sequence based on the first data batch, the mother model, the third sequence and the first encoder.
Optionally, in a possible implementation manner of the fifth aspect, the obtaining unit is further configured to obtain a third data batch; the compression unit is specifically used for compressing the third data batch to obtain a third sequence based on the third data batch, the first model, the first sequence and the first encoder; and the compression unit is specifically used for compressing the first data batch to obtain a second sequence based on the first data batch, the mother model, the third sequence and the first encoder.
A sixth aspect of the embodiments of the present application provides a decompression apparatus, which can be applied to lossless compression of data such as images, videos, and texts. The decompression device includes:
an acquisition unit configured to acquire a second sequence;
an acquisition unit configured to acquire a master model;
the decompression unit is used for decompressing the second sequence based on the mother model and the first encoder to obtain a fourth sequence and a first data batch, and the first encoder corresponds to the mother model;
the updating unit is used for updating the mother model based on the first data batch to obtain a first model;
and the decompression unit is further used for decompressing the fourth sequence based on the first model and the first encoder to obtain a second data batch.
Optionally, in a possible implementation manner of the sixth aspect, the decompressing unit is specifically configured to decompress the fourth sequence based on the first model and the first encoder to obtain a fifth sequence and a third data batch; the updating unit is also used for updating the first model based on the third data batch to obtain a second model; and the decompression unit is specifically used for decompressing the fifth sequence based on the second model and the first encoder to obtain the second data batch.
Optionally, in a possible implementation manner of the sixth aspect, the decompressing unit is specifically configured to decompress the fourth sequence based on the first model and the first encoder to obtain a fifth sequence and a third data batch; and the decompression unit is specifically used for decompressing the fifth sequence based on the first model and the first encoder to obtain the second data batch.
Optionally, in a possible implementation manner of the sixth aspect, the updating unit is specifically configured to update the mother model by using an optimization algorithm and the first data batch to obtain the first model, where the optimization algorithm includes a gradient-based optimization algorithm, meta learning, or reinforcement learning.
Optionally, in a possible implementation manner of the sixth aspect, the mother model is obtained by training a neural network model with training data batches in a training data set as the input of the neural network model and with a loss function value smaller than a first threshold as the target, the data type of the training data set is the same as the data type of the data set to be compressed, the data set to be compressed includes the first data batch and the second data batch, and the loss function is used to indicate the difference between the probability distribution information output by the neural network model and the actual probability distribution information of the variable values in the training data batch.
Optionally, in a possible implementation manner of the sixth aspect, the decompressing unit is specifically configured to decompress the second sequence based on the second sequence, the mother model, and the first encoder to obtain a fourth sequence and the first data batch; and the decompression unit is specifically used for decompressing the fourth sequence based on the fourth sequence, the first model and the first encoder to obtain the second data batch.
Optionally, in a possible implementation manner of the sixth aspect, the decompressing unit is specifically configured to obtain the first probability based on the mother model; the decompression unit is specifically used for decompressing the second sequence to obtain a part of data based on the first probability and the first encoder; the decompression unit is specifically used for inputting a part of data into the mother model to obtain a second probability; the decompression unit is specifically used for decompressing the second sequence based on the second probability and the first encoder to obtain another part of data and a fourth sequence; and the decompression unit is specifically used for obtaining the first data batch based on one part of data and the other part of data.
Optionally, in a possible implementation manner of the sixth aspect, the decompressing unit is specifically configured to obtain first probability distribution information based on the mother model, where the first probability distribution information is used to indicate probability distribution of values of each variable in the first data batch; and the decompression unit is specifically used for decompressing the second sequence based on the first probability distribution information and the first encoder to obtain the fourth sequence and the first data batch.
Optionally, in a possible implementation manner of the sixth aspect, the decompression unit is specifically configured to obtain second probability distribution information based on the first model, where the second probability distribution information is used to represent probability distribution of values of each variable in the second data batch; and the decompression unit is specifically used for decompressing the fourth sequence based on the second probability distribution information and the first encoder to obtain the second data batch.
Optionally, in a possible implementation manner of the sixth aspect, the decompression device further includes: the determining unit is used for determining the first encoder corresponding to the type of the mother model based on a first association relation, the first association relation is used for representing the association between the type of the mother model and the first encoder, and the type includes a fully observed model and a latent variable model.
Optionally, in a possible implementation manner of the sixth aspect, the decompression device further includes: and the merging unit is used for merging the first data batch and the second data batch to obtain an initial data set.
A seventh aspect of embodiments of the present application provides a compression apparatus, where the compression apparatus or a component of the compression apparatus (e.g., a processor, a chip, or a system of chips) executes the method in the foregoing first aspect or any possible implementation manner of the first aspect.
An eighth aspect of embodiments of the present application provides a compression apparatus, or a component of a compression apparatus (e.g., a processor, a chip, or a system of chips) that performs the method of the second aspect or any possible implementation manner of the second aspect.
A ninth aspect of embodiments of the present application provides a decompression apparatus, where the decompression apparatus or a component of the decompression apparatus (such as a processor, a chip or a system of chips) performs the method in any possible implementation manner of the foregoing third aspect or third aspect.
A tenth aspect of embodiments of the present application provides a computer-readable storage medium, where instructions are stored, and when executed on a computer, cause the computer to perform the method in the foregoing first aspect or any possible implementation manner of the first aspect, the second aspect or any possible implementation manner of the second aspect, or any possible implementation manner of the third aspect.
An eleventh aspect of embodiments of the present application provides a computer program product, which when executed on a computer, causes the computer to perform the method in the foregoing first aspect or any possible implementation manner of the first aspect, the second aspect or any possible implementation manner of the second aspect, and the third aspect or any possible implementation manner of the third aspect.
A twelfth aspect of embodiments of the present application provides a compression apparatus, including: a processor coupled to a memory for storing a program or instructions which, when executed by the processor, cause the compression device to carry out the method of the first aspect or any possible implementation of the first aspect.
A thirteenth aspect of embodiments of the present application provides a compression apparatus, including: a processor coupled to a memory for storing a program or instructions which, when executed by the processor, cause the compression device to carry out the method of the second aspect described above or any possible implementation of the second aspect.
A fourteenth aspect of an embodiment of the present application provides a decompression apparatus, including: a processor coupled to a memory for storing a program or instructions which, when executed by the processor, cause the decompression apparatus to carry out the method of the third aspect or any possible implementation of the third aspect.
For technical effects brought by the fourth, seventh, tenth, eleventh, twelfth aspects or any one of possible implementation manners, reference may be made to technical effects brought by the first aspect or different possible implementation manners of the first aspect, and details are not described here.
For technical effects brought by the fifth, eighth, tenth, eleventh, thirteenth aspects or any one of possible implementation manners thereof, reference may be made to technical effects brought by the second aspect or different possible implementation manners of the second aspect, and details are not described here again.
For technical effects brought by the sixth, ninth, tenth, eleventh, fourteenth aspects or any one of possible implementation manners, reference may be made to technical effects brought by different possible implementation manners of the third aspect or the third aspect, and details are not described here again.
According to the technical scheme, the embodiment of the application has the following advantages: the first data batch and the second data batch are acquired, the mother model is updated according to the first data batch to obtain the first model in the process of compressing the first data batch, and the first model participates in the compression of the second data batch. On one hand, during compression the data batch to be compressed can be compressed while the mother model is updated, and the updated new model does not need to be stored, so the compression method is suitable for compressing large-scale data sets. On the other hand, as the mother model is updated it conforms better and better to the distribution of the data batches to be compressed, and therefore the compression rate becomes higher and higher. In addition, compared with the prior art, in which a new model needs to be trained specially for the data set to be compressed, the mother model in the embodiment of the application has strong universality: it does not need to be trained specially for the data set to be compressed, and the model is continuously updated while the data set is being compressed. Moreover, the update process can be reproduced by updating the mother model during decompression, so the updated models can be obtained by storing only the mother model, without storing each model generated during compression, which saves storage space.
Drawings
FIG. 1 is a schematic diagram of a system architecture provided herein;
FIG. 2 is a schematic diagram of a convolutional neural network structure provided in the present application;
FIG. 3 is a schematic diagram of another convolutional neural network structure provided in the present application;
fig. 4 is a schematic diagram of a chip hardware structure provided in the present application;
FIG. 5 is a schematic flow chart of a training method for a mother model provided in the present application;
fig. 6A is an encoding flow of a first-in first-out encoder provided in the present application;
FIG. 6B is a decoding process of the first-in first-out encoder provided in the present application;
fig. 7A is an encoding flow of a first-in last-out encoder provided in the present application;
fig. 7B is a decoding process of a first-in last-out encoder provided in the present application;
FIG. 7C is a lossless compression framework provided herein;
FIG. 8 is a schematic flow chart of a data compression method provided herein;
FIGS. 9-12 are schematic diagrams of several data compression processes provided herein;
FIG. 13 is another schematic flow chart of a data compression method provided herein;
FIGS. 14-17 are schematic diagrams of further data compression processes provided herein;
fig. 18 is a schematic flowchart of a data decompression method provided in the present application;
FIGS. 19-22 are schematic diagrams of several data decompression processes provided by the present application;
FIG. 23 is a schematic view of a configuration of a compressing apparatus according to an embodiment of the present application;
FIG. 24 is another schematic view of the structure of a compressing apparatus according to the embodiment of the present application;
FIG. 25 is a schematic diagram of a decompression apparatus according to an embodiment of the present application;
FIG. 26 is a schematic view showing another structure of a compressing apparatus of the embodiment of the present application;
fig. 27 is another schematic structural diagram of a decompression device according to an embodiment of the present application.
Detailed Description
The embodiments of the present application provide a data compression method, a data decompression method, and related equipment, which can improve the compression rate of the data batch to be compressed.
For ease of understanding, the relevant terms and concepts to which the embodiments of the present application relate generally will be described below.
1. Neural network
The neural network may be composed of neural units. A neural unit may be an arithmetic unit that takes $x_s$ and an intercept of 1 as inputs, and the output of the arithmetic unit may be:

$$h_{W,b}(x) = f\!\left(W^{T}x\right) = f\!\left(\sum_{s=1}^{n} W_s x_s + b\right)$$

where $s = 1, 2, \ldots, n$, n is a natural number greater than 1, $W_s$ is the weight of $x_s$, and b is the bias of the neural unit. f is the activation function of the neural unit, which is used to introduce a nonlinear characteristic into the neural network so as to convert an input signal of the neural unit into an output signal. The output signal of the activation function may be used as an input to the next convolutional layer. The activation function may be a sigmoid function. A neural network is a network formed by joining many such single neural units together, that is, the output of one neural unit may be the input of another neural unit. The input of each neural unit can be connected with a local receptive field of the previous layer to extract the features of the local receptive field, and the local receptive field may be a region composed of several neural units.
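For illustration only (this sketch is not part of the claimed method), a minimal Python implementation of such a neural unit, assuming a sigmoid activation:

```python
import numpy as np

def sigmoid(a):
    # Sigmoid activation: squashes the pre-activation into (0, 1).
    return 1.0 / (1.0 + np.exp(-a))

def neural_unit(x, w, b):
    # Output of a single neural unit: f(sum_s w_s * x_s + b).
    return sigmoid(np.dot(w, x) + b)

# Example with three inputs x_1..x_3, weights W_1..W_3, and bias b.
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.3])
print(neural_unit(x, w, b=0.2))
```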
2. Deep neural network
Deep Neural Networks (DNNs), also known as multi-layer neural networks, can be understood as neural networks having many hidden layers, where "many" has no particular metric. According to the positions of the different layers, the neural network inside a DNN can be divided into three categories: input layer, hidden layers, and output layer. Typically, the first layer is the input layer, the last layer is the output layer, and the layers in between are all hidden layers. The layers are fully connected, that is, any neuron of the i-th layer is necessarily connected with any neuron of the (i+1)-th layer. Of course, the deep neural network may also not include hidden layers, which is not limited here.
The operation of each layer in the deep neural network can be described mathematically by the expression

$$\vec{y} = \alpha\left(W \cdot \vec{x} + \vec{b}\right)$$

From the physical level, the work of each layer in the deep neural network can be understood as completing the transformation from the input space to the output space (i.e., from the row space to the column space of the matrix) through five operations on the input space (the set of input vectors), which include: 1. raising/lowering the dimension; 2. scaling up/down; 3. rotation; 4. translation; 5. "bending". Operations 1, 2, and 3 are completed by $W \cdot \vec{x}$, operation 4 is completed by $+\vec{b}$, and operation 5 is completed by $\alpha(\cdot)$. The word "space" is used here because the object being classified is not a single thing but a class of things, and space refers to the collection of all individuals of such things. W is a weight vector, and each value in the vector represents the weight value of one neuron in this layer of the neural network. The vector W determines the spatial transformation from the input space to the output space described above, that is, the weight W of each layer controls how the space is transformed. The purpose of training the deep neural network is to finally obtain the weight matrices of all layers of the trained neural network (the weight matrices formed by the vectors W of many layers). Therefore, the training process of the neural network is essentially learning the way to control the spatial transformation, and more specifically, learning the weight matrices.
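As an illustrative sketch (not from the patent), the per-layer transform $\vec{y} = \alpha(W \cdot \vec{x} + \vec{b})$ in Python/NumPy, with tanh assumed as the activation:

```python
import numpy as np

def dense_layer(x, W, b, alpha=np.tanh):
    # W @ x performs the dimension change / scaling / rotation,
    # + b performs the translation, and alpha() performs the "bending".
    return alpha(W @ x + b)

x = np.ones(4)              # input vector
W = np.random.randn(3, 4)   # this layer's weight matrix (maps dimension 4 -> 3)
b = np.zeros(3)             # bias vector
y = dense_layer(x, W, b)    # output vector of the layer
```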
3. Convolutional neural network
A Convolutional Neural Network (CNN) is a deep neural network with a convolutional structure. The convolutional neural network includes a feature extractor consisting of convolutional layers and sub-sampling layers. The feature extractor may be viewed as a filter, and the convolution process may be viewed as convolving a trainable filter with an input image or a convolved feature plane (feature map). The convolutional layer is a neuron layer that performs convolution processing on an input signal in the convolutional neural network. In a convolutional layer of a convolutional neural network, one neuron may be connected to only some of the neurons of adjacent layers. A convolutional layer usually contains several feature planes, and each feature plane may be composed of some neural units arranged in a rectangle. Neural units of the same feature plane share weights, and the shared weights are the convolution kernel. Sharing weights can be understood as meaning that the way image information is extracted is independent of location. The underlying principle is that the statistics of one part of an image are the same as those of the other parts, which means that image information learned in one part can also be used in another part, so the same learned image information can be used for all positions on the image. In the same convolutional layer, multiple convolution kernels can be used to extract different image information; generally, the greater the number of convolution kernels, the richer the image information reflected by the convolution operation.
The convolution kernel can be initialized in the form of a matrix of random size, and can acquire reasonable weight through learning in the training process of the convolutional neural network. In addition, sharing weights brings the direct benefit of reducing connections between layers of the convolutional neural network, while reducing the risk of overfitting. The mother model in the embodiment of the present application may specifically be CNN.
4. Recurrent neural networks
In a traditional neural network model, the layers are fully connected, while the nodes within each layer are unconnected. However, such an ordinary neural network cannot handle many problems. For example, to predict the next word of a sentence, the preceding words are generally needed, because the words in a sentence are not independent of each other. A Recurrent Neural Network (RNN) means that the current output of a sequence is also related to the previous outputs. Concretely, the network memorizes the previous information, stores it in the internal state of the network, and applies it to the calculation of the current output.
5. Loss function
In the process of training a deep neural network, because the output of the deep neural network is expected to be as close as possible to the value that is really desired to be predicted, the weight vector of each layer of the neural network can be updated according to the difference between the predicted value of the current network and the really desired target value (of course, there is usually an initialization process before the first update, that is, parameters are pre-configured for each layer of the deep neural network). For example, if the predicted value of the network is too high, the weight vector is adjusted to make the prediction lower, and the adjustment continues until the neural network can predict the really desired target value. Therefore, it is necessary to define in advance "how to compare the difference between the predicted value and the target value"; this is the role of loss functions (loss functions) or objective functions (objective functions), which are important equations for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a larger difference, so training the deep neural network becomes the process of reducing this loss as much as possible.
6. Cross entropy
The cross entropy is a common concept in deep learning, and is generally used to find a difference between a target value and a predicted value, and the cross entropy is often used as a loss function of a neural network.
7. Lossless compression
Lossless compression: the data is compressed such that the length of the compressed data is smaller than that of the original data, and after the compressed data is decompressed, the restored data must be identical to the original data. The core of lossless compression is to find the statistical regularities inside the data and to represent the variable values with higher probability in the file to be compressed by codes with shorter length. For example, if the letter e appears more frequently than z in an English document, the encoded length of the document can be made shorter if e is represented by a shorter code. Because this representation is reversible, the original file can be recovered from the encoding, thereby achieving lossless compression.
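For intuition only (the letter frequencies below are made up, not from the patent): under an ideal entropy coder, a symbol with probability p can be represented with about -log2(p) bits, which is why more probable symbols receive shorter codes:

```python
import math

# Hypothetical symbol probabilities in an English document.
probs = {"e": 0.12, "t": 0.09, "z": 0.0007}

for sym, p in probs.items():
    # Ideal code length in bits for a symbol of probability p.
    print(sym, round(-math.log2(p), 2), "bits")

# More frequent symbols ("e") receive shorter codes than rare ones ("z"),
# which is what makes the encoding both reversible and shorter on average.
```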
8. Compression ratio
The compression rate refers to a ratio of the length of original data to the length of compressed data. If not compressed, the value is 1. The larger the value, the better.
9. Number of bits per dimension
The number of bits per dimension refers to the average bit length of each dimension (byte) of the compressed data. The calculation formula may be: 8/compression ratio. If there is no compression, the value is 8. The smaller the value, the better.
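A small worked example (the byte counts are illustrative, not from the patent) of the two metrics defined above:

```python
def compression_ratio(original_len, compressed_len):
    # Ratio of original length to compressed length; 1 means no compression.
    return original_len / compressed_len

def bits_per_dimension(ratio):
    # Average bit length per byte (dimension) of the original data: 8 / compression ratio.
    return 8.0 / ratio

ratio = compression_ratio(original_len=1_000_000, compressed_len=250_000)
print(ratio)                      # 4.0  (larger is better)
print(bits_per_dimension(ratio))  # 2.0  (smaller is better)
```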
10. Depth generative model
Deep Generative Models (DGMs): a probabilistic modeling technique based on deep neural networks. It can model the distribution of data very accurately, so that compression rates higher than those of conventional algorithms can theoretically be achieved. Deep generative models mainly comprise two major classes: explicit generative models (explicit DGMs) and implicit generative models (implicit DGMs).
11. Explicit generative model
In contrast to an implicit generative model, an explicit generative model can give the probability distribution values of the input samples. Since probability distribution information is required in the coding mechanism of lossless compression, the deep generative models that can currently be applied to lossless compression are explicit generative models. According to the structure of the model, explicit generative models can be classified into full observation models (fully observed models) and latent variable models (latent variable models).
12. Full observation model
The full observation model is a model that directly models the distribution p(x) of data points x, and can directly give the probability density value of an input data point x. That is, the full observation model directly models the target variable itself; a typical example is the autoregressive model (autoregressive models).
13. Hidden variable model
The hidden variable model introduces one (or more) hidden variables z and, for an input data point x, indirectly gives the density value of the data point as p(x) = ∫ p(x|z) p(z) dz (taking one hidden variable as an example) by specifying p(x|z) and p(z). That is, the hidden variable model models the target variable by additionally introducing a hidden variable and modeling the relationship between the hidden variable and the target variable; typical examples are flow models (flows) and variational auto-encoder (VAE) models.
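Written out as a display formula, the marginal density given above by the hidden variable model (with a single hidden variable z) is:

```latex
p(x) = \int p(x \mid z)\, p(z)\, \mathrm{d}z
```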
14. Data batch
A data batch (data batch) can be understood as batch data or a plurality of data, and the plurality of data belong to the same type (for example, images, audio or sequences, etc.), or have a certain logical relationship or an association.
15. Probability distribution information
The probability distribution information refers to the probability distribution of variable values or the probability distribution of symbols (symbols). For the data type of the data batch being an image, a data batch includes a plurality of data (or data points), one data corresponds to a picture (including a plurality of sub-pixels), and each sub-pixel corresponds to a random variable. For a single-channel 8-bit picture, the value of this variable may range from 0 to 255.
Since the default modeling variables of deep generative models are continuous random variables, while the operation objects of lossless compression are discrete random variables, current research focuses on how to adapt existing explicit deep generative models that model continuous variables to lossless compression encoders operating on discrete variables. For example, for the flow model, the existing relatively representative schemes include the Integer Discrete Flow (IDF), the local inverse coding (LBB) model, and the invertible volume-preserving flow (iVPF); for the variational auto-encoder model, the existing relatively representative schemes include the inverse-coding asymmetric digital system (Bits-back ANS, BB-ANS), the Bit-Swap algorithm, and the multi-layer hidden variable model coding algorithm (HiLLoC).
Although current technical schemes solve the adaptation problem between most deep generative models and lossless compression, they ignore the storage cost of the deep generative models. In practical applications, training a deep generative model on large-scale data requires enormous training time and computational resources. Training a model for each data set and saving that model brings high time cost and high model storage cost. Although data compression based on deep generative models saves coding space compared with conventional lossless compression algorithms, the gain is lost if the cost incurred by model storage exceeds the coding space saved.
In order to solve the above problems, the present application provides a data compression method, a data decompression method, and related devices. In the compression process, the data batches in the data set to be compressed can be compressed while the master model is updated (or optimized); during the updating of the master model, the master model conforms more and more to the distribution of the data set to be compressed, so the compression rate becomes higher and higher. Moreover, the updated new models do not need to be stored, so the method is suitable for compression of large-scale data sets.
The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
First, a system architecture provided in the embodiments of the present application is described.
Referring to fig. 1, a system architecture 100 is provided in accordance with an embodiment of the present invention. As shown in the system architecture 100, the data collecting device 160 is configured to collect training data, which in this embodiment of the present application includes: a training data set. Further, the training data may include training data sets of multiple data types. The data type may be image data, sequence data (e.g., audio, video, binary data, etc.), or the like. And stores the training data in database 130, and training device 120 trains to obtain target model/rule 101 based on the training data maintained in database 130. How the training device 120 obtains the target model/rule 101 based on the training data will be described in more detail below, where the target model/rule 101 can be used to implement the compression method and/or the decompression method provided in the embodiment of the present application, that is, the probability distribution information of values of each variable in the data can be obtained by inputting the data into the target model/rule 101 after relevant preprocessing. The target model/rule 101 in the embodiment of the present application may specifically be a depth generative model, and in the embodiment provided in the present application, the depth generative model is a mother model, and the mother model is obtained by training a training data set. It should be noted that, in practical applications, the training data maintained in the database 130 may not necessarily all come from the acquisition of the data acquisition device 160, and may also be received from other devices. It should be noted that, the training device 120 does not necessarily perform the training of the target model/rule 101 based on the training data maintained by the database 130, and may also obtain the training data from the cloud or other places for performing the model training.
The target model/rule 101 obtained by training according to the training device 120 may be applied to different systems or devices, for example, the execution device 110 shown in fig. 1, where the execution device 110 may be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, an AR/VR, a vehicle-mounted terminal, and may also be a server or a cloud. In fig. 1, the execution device 110 is configured with an I/O interface 112 for data interaction with an external device, and a user may input data to the I/O interface 112 through the client device 140, where the input data may include, in an embodiment of the present application: the data set to be compressed or the data batch to be compressed (i.e. the first data batch and the second data batch) may be input by a user, or may be uploaded by the user through a shooting device, or may be from a database, which is not limited herein.
If the input data is a data set to be compressed, the preprocessing module 113 is configured to perform preprocessing according to the data set to be compressed received by the I/O interface 112. The preprocessing described above may not be included if the input data is a batch of data to be compressed, in other words, if the input data is a first batch of data and a second batch of data.
In the process that the execution device 110 preprocesses the input data or in the process that the calculation module 111 of the execution device 110 executes the calculation or other related processes, the execution device 110 may call the data, the code, and the like in the data storage system 150 for corresponding processes, and may store the data, the instruction, and the like obtained by corresponding processes in the data storage system 150.
Finally, the I/O interface 112 returns the processing result, such as the probability distribution information of the variable values in the data batch obtained as described above, to the client device 140, thereby providing it to the user or as input to the encoder.
It should be noted that the training device 120 may generate corresponding target models/rules 101 for different targets or different tasks based on different training data, and the corresponding target models/rules 101 may be used to achieve the targets or complete the tasks, so as to provide the user with the required results or provide the input for the subsequent compression/decompression.
In the case shown in fig. 1, the user may manually give the input data, which may be operated through an interface provided by the I/O interface 112. Alternatively, the client device 140 may automatically send the input data to the I/O interface 112, and if the client device 140 is required to automatically send the input data to obtain authorization from the user, the user may set the corresponding permissions in the client device 140. The user can view the result output by the execution device 110 at the client device 140, and the specific presentation form can be display, sound, action, and the like. The client device 140 may also serve as a data collection terminal, collecting input data of the input I/O interface 112 and output results of the output I/O interface 112 as new sample data, and storing the new sample data in the database 130. Of course, the input data inputted to the I/O interface 112 and the output result outputted from the I/O interface 112 as shown in the figure may be directly stored in the database 130 as new sample data by the I/O interface 112 without being collected by the client device 140.
It should be noted that fig. 1 is only a schematic diagram of a system architecture provided by an embodiment of the present invention, and the position relationship between the devices, modules, etc. shown in the diagram does not constitute any limitation, for example, in fig. 1, the data storage system 150 is an external memory with respect to the execution device 110, and in other cases, the data storage system 150 may be disposed in the execution device 110.
As shown in fig. 1, a target model/rule 101 is obtained by training according to a training device 120, where the target model/rule 101 may be a mother model in the embodiment of the present application, and specifically, in the network provided in the embodiment of the present application, the mother model may be a convolutional neural network for image processing.
Since CNN is a very common neural network, the structure of CNN will be described in detail below with reference to fig. 2. As described in the introduction of the basic concept, the convolutional neural network is a deep neural network with a convolutional structure, and is a deep learning (deep learning) architecture, where the deep learning architecture refers to performing multiple levels of learning at different abstraction levels through a machine learning algorithm. As a deep learning architecture, CNN is a feed-forward artificial neural network in which individual neurons can respond to images input thereto.
As shown in fig. 2, convolutional Neural Network (CNN) 100 may include an input layer 110, a convolutional/pooling layer 120, where the pooling layer is optional, and a neural network layer 130.
Convolutional layer/pooling layer 120:
Convolutional layer:
The convolutional/pooling layer 120 shown in FIG. 2 may include, for example, layers 121-126. In one implementation, layer 121 is a convolutional layer, layer 122 is a pooling layer, layer 123 is a convolutional layer, layer 124 is a pooling layer, layer 125 is a convolutional layer, and layer 126 is a pooling layer; in another implementation, layers 121 and 122 are convolutional layers, layer 123 is a pooling layer, layers 124 and 125 are convolutional layers, and layer 126 is a pooling layer. That is, the output of a convolutional layer may be used as the input of a subsequent pooling layer, or may be used as the input of another convolutional layer to continue the convolution operation.
Taking the convolutional layer 121 as an example, the convolutional layer 121 may include many convolution operators, also called kernels, whose role in image processing is equivalent to a filter that extracts specific information from the input image matrix. A convolution operator may essentially be a weight matrix, which is usually predefined. During the convolution operation on the image, the weight matrix is usually processed over the input image pixel by pixel (or two pixels by two pixels, depending on the value of the stride), so as to complete the task of extracting specific features from the image. The size of the weight matrix should be related to the size of the image. It should be noted that the depth dimension (depth dimension) of the weight matrix is the same as the depth dimension of the input image, and the weight matrix extends to the entire depth of the input image during the convolution operation. Therefore, convolving with a single weight matrix produces a convolved output with a single depth dimension; in most cases, however, a single weight matrix is not used, but multiple weight matrices of the same dimensions are applied, and the outputs of the weight matrices are stacked to form the depth dimension of the convolved image. Different weight matrices can be used to extract different features from the image: for example, one weight matrix is used to extract image edge information, another weight matrix is used to extract a specific color of the image, and yet another weight matrix is used to blur unwanted noise in the image. The dimensions of these weight matrices are the same, so the dimensions of the feature maps extracted by them are also the same, and the extracted feature maps of the same dimensions are combined to form the output of the convolution operation.
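A minimal sketch (for illustration only) of a single-channel convolution with one kernel and a given stride, matching the sliding-window description above; applying several kernels and stacking their outputs would give the depth dimension of the convolved image:

```python
import numpy as np

def conv2d_single(image, kernel, stride=1):
    # Slide one weight matrix (kernel) over the image and take dot products.
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.random.rand(8, 8)       # single-channel input image
kernel = np.random.randn(3, 3)     # one trainable weight matrix
feature_map = conv2d_single(image, kernel, stride=1)   # shape (6, 6)
```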
The weight values in these weight matrices need to be obtained through a large amount of training in practical application, and each weight matrix formed by the trained weight values can extract information from the input image, thereby helping the convolutional neural network 100 to make correct prediction.
When the convolutional neural network 100 has multiple convolutional layers, the initial convolutional layers (e.g., 121) tend to extract more general features, which may also be referred to as low-level features; as the depth of the convolutional neural network 100 increases, the later convolutional layers (e.g., 126) extract more and more complex features, such as features with high-level semantics, and features with higher-level semantics are more applicable to the problem to be solved.
A pooling layer:
since it is often desirable to reduce the number of training parameters, it is often desirable to periodically introduce pooling layers after the convolutional layer, i.e., layers 121-126 as illustrated by 120 in FIG. 2, which may be one convolutional layer followed by one pooling layer or multiple convolutional layers followed by one or more pooling layers. During image processing, the only purpose of the pooling layer is to reduce the spatial size of the image. The pooling layer may include an average pooling operator and/or a maximum pooling operator for sampling the input image to smaller sized images. The average pooling operator may calculate pixel values in the image over a particular range to produce an average. The max pooling operator may take the pixel with the largest value in a particular range as the result of the max pooling. In addition, just as the size of the weighting matrix used in the convolutional layer should be related to the image size, the operators in the pooling layer should also be related to the image size. The size of the image output after the processing by the pooling layer may be smaller than the size of the image input to the pooling layer, and each pixel point in the image output by the pooling layer represents an average value or a maximum value of a corresponding sub-region of the image input to the pooling layer.
The neural network layer 130:
after processing by convolutional layer/pooling layer 120, convolutional neural network 100 is not sufficient to output the required output information. Because, as previously described, the convolutional layer/pooling layer 120 only extracts features and reduces the parameters brought by the input image. However, to generate the final output information (class information or other relevant information as needed), the convolutional neural network 100 needs to generate one or a set of outputs of the number of classes as needed using the neural network layer 130. Accordingly, a plurality of hidden layers (131, 132 to 13n shown in fig. 2) and an output layer 140 may be included in the neural network layer 130, and parameters included in the hidden layers may be obtained by performing pre-training according to related training data of a specific task type, for example, the task type may include image recognition, image classification, image super-resolution reconstruction, and the like.
After the hidden layers in the neural network layer 130, the last layer of the whole convolutional neural network 100 is the output layer 140. The output layer 140 has a loss function similar to categorical cross entropy and is specifically used to calculate the prediction error. Once the forward propagation of the whole convolutional neural network 100 (the propagation from 110 to 140 in fig. 2) is completed, the backward propagation (the propagation from 140 to 110 in fig. 2) starts to update the weight values and biases of the aforementioned layers, so as to reduce the loss of the convolutional neural network 100 and the error between the result output by the convolutional neural network 100 through the output layer and the ideal result.
It should be noted that the convolutional neural network 100 shown in fig. 2 is only an example of a convolutional neural network, and in a specific application, the convolutional neural network may also exist in the form of other network models, for example, as shown in fig. 3, a plurality of convolutional layers/pooling layers are parallel, and the features respectively extracted are all input to the whole neural network layer 130 for processing.
A hardware structure of a chip provided in an embodiment of the present application is described below.
Fig. 4 is a hardware structure of a chip provided in an embodiment of the present invention, where the chip includes a neural network processor 40. The chip may be provided in an execution device 110 as shown in fig. 1 to complete the calculation work of the calculation module 111. The chip may also be disposed in the training apparatus 120 as shown in fig. 1 to complete the training work of the training apparatus 120 and output the target model/rule 101. The algorithm for each layer in the convolutional neural network shown in fig. 2 can be implemented in a chip as shown in fig. 4.
The neural network processor 40 may be any processor suitable for large-scale exclusive or operation processing, such as a neural-Network Processing Unit (NPU), a Tensor Processing Unit (TPU), or a Graphics Processing Unit (GPU). Taking NPU as an example: the neural network processor NPU40 is mounted as a coprocessor on a main processing unit (CPU) (host CPU), and tasks are distributed by the main CPU. The core portion of the NPU is an arithmetic circuit 403, and the controller 404 controls the arithmetic circuit 403 to extract data in a memory (weight memory or input memory) and perform arithmetic.
In some implementations, the arithmetic circuit 403 includes a plurality of processing units (PEs) therein. In some implementations, the operational circuitry 403 is a two-dimensional systolic array. The arithmetic circuit 403 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuitry 403 is a general-purpose matrix processor.
For example, assume that there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to matrix B from the weight memory 402 and buffers it on each PE in the arithmetic circuit. The arithmetic circuit takes the matrix a data from the input memory 401 and performs matrix operation with the matrix B, and partial or final results of the obtained matrix are stored in the accumulator 408.
The vector calculation unit 407 may further process the output of the operation circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, magnitude comparison, etc. For example, the vector calculation unit 407 may be used for network calculations of non-convolution/non-FC layers in a neural network, such as pooling (pooling), batch normalization (batch normalization), local response normalization (local response normalization), and the like.
In some implementations, the vector calculation unit 407 can store the processed output vector to the unified buffer 406. For example, the vector calculation unit 407 may apply a non-linear function to the output of the arithmetic circuit 403, such as a vector of accumulated values, to generate the activation value. In some implementations, the vector calculation unit 407 generates normalized values, combined values, or both. In some implementations, the vector of processed outputs can be used as activation inputs to the arithmetic circuitry 403, for example for use in subsequent layers in a neural network.
The unified memory 406 is used to store input data as well as output data.
The memory unit access controller 405 (direct memory access controller, DMAC) carries the input data in the external memory to the input memory 401 and/or the unified memory 406, stores the weight data in the external memory into the weight memory 402, and stores the data in the unified memory 406 into the external memory.
A Bus Interface Unit (BIU) 410, configured to implement interaction between the main CPU, the DMAC, and the instruction fetch memory 409 through a bus.
An instruction fetch buffer 409 connected to the controller 404 is used for storing instructions used by the controller 404.
The controller 404 is configured to call an instruction cached in the instruction memory 409 to implement controlling of a working process of the operation accelerator.
Generally, the unified memory 406, the input memory 401, the weight memory 402 and the instruction fetch memory 409 are On-Chip (On-Chip) memories, the external memory is a memory outside the NPU, and the external memory may be a double data rate synchronous dynamic random access memory (DDR SDRAM), a High Bandwidth Memory (HBM) or other readable and writable memories.
The operation of each layer in the convolutional neural network shown in fig. 2 or fig. 3 may be performed by the operation circuit 403 or the vector calculation unit 407.
The training method, the data compression method, and the data decompression method of the mother model according to the embodiments of the present application are described in detail below with reference to the accompanying drawings.
First, a method for training a mother model according to an embodiment of the present application will be described in detail with reference to fig. 5. The training method shown in fig. 5 may be performed by a training device of a mother model, which may be a cloud service device, a terminal device, for example, a device with sufficient computing power such as a computer or a server to perform the training method of the mother model, or a system composed of the cloud service device and the terminal device. Illustratively, the training method may be performed by the training apparatus 120 in fig. 1, the neural network processor 40 in fig. 4.
Optionally, the training method may be processed by a CPU, or may be processed by both the CPU and the GPU, or may use other processors suitable for neural network computation instead of the GPU, which is not limited in this application.
The training method shown in fig. 5 includes steps 501 and 502. Step 501 and step 502 are described in detail below.
And step 501, acquiring training data batches.
In the embodiment of the present application, there are various ways to obtain the training data batch, which may be obtained by first obtaining the training data set and dividing the training data set, or may be obtained by directly obtaining the training data batch, and the specific details are not limited herein. The training data set or the training data batch may be acquired by an acquisition device (e.g., a video camera, a terminal, etc.), or may be obtained by directly selecting a data set such as CIFAR-10, imageNet, etc. as the training data set, which is not limited herein.
Optionally, a training data set is obtained, and the training data set is split into a plurality of training data batches, for example: a first training data batch, a second training data batch, and a third training data batch. Of course, the number of the training data set divided into data batches may be set according to actual needs, for example: 3 training data batches or 10 training data batches, etc., and the details are not limited herein.
And 502, taking the training data batch as the input of the neural network model, and training the neural network model by taking the loss function value smaller than the first threshold value as a target to obtain a trained mother model.
Optionally, after the training data batch is obtained, a corresponding deep generation model may be selected as the neural network model according to the data type of the training data batch (or the data type of the training data set). The data type may be an image data type, a text data type, an audio data type, or a binary data type, and is not limited herein.
Optionally, if the training data set is obtained in step 501, the mother model may be selected according to a data type of the training data set (similar to the training data batch), which is specifically similar to the training data batch, and details are not repeated here.
Because the depth generation model generalizes well to data of the same class, it facilitates subsequent compression or decompression of such data. Therefore, the embodiments of the present application only describe the case in which the depth generation model serves as the parent model.
For example, the correspondence between the data type of the training data set or training data batch and the parent model can be shown in table 1.
TABLE 1
(Table 1, showing the correspondence between data types and parent models, is provided as an image in the original publication and is not reproduced here.)
The data types and the mother models shown in table 1 are only examples, and in practical applications, there are other data types, mother models, and other corresponding manners of the data types and the mother models, and the details are not limited herein.
In this case, the neural network model is trained with the goal of reducing the value of the loss function, that is, the difference between the probability distribution information output by the neural network model and the actual probability distribution information of the variable values in the training data batch is continuously reduced (that is, the loss function is used to indicate the difference between the probability distribution information output by the neural network model and the actual probability distribution information of the variable values in the training data batch). The training process may be understood as a probability generation task. The loss function can be understood as a loss function corresponding to the probability generation task.
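As a sketch of this training objective (assuming, as is usual for such probability-generation tasks and consistent with the cross-entropy loss mentioned earlier, that the loss is the negative log-likelihood of the batch under the model; the interface below is hypothetical):

```python
import torch

def nll_loss(log_probs):
    # log_probs: per-sample log p_theta(x) returned by the explicit generative model.
    # Minimizing the negative log-likelihood (the cross entropy with the data
    # distribution) also minimizes the expected code length of the batch.
    return -log_probs.mean()

# Toy example with made-up per-sample log-likelihoods.
log_probs = torch.tensor([-2.3, -1.7, -2.0])
loss = nll_loss(log_probs)   # 2.0
```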
Optionally, the optimized mother model is obtained by optimizing the neural network model using an optimization algorithm and the training data batch. The optimization algorithm may be a gradient-based optimization algorithm, meta learning, reinforcement learning, or another optimization manner, which is not limited here. It can also be understood that the process of optimizing the neural network model is the process of optimizing the loss function in the neural network model, generally to minimize the loss function.
Illustratively, continuing the above example, the training data set is divided into a first training data batch, a second training data batch, and a third training data batch. And sequentially using the first training data batch, the second training data batch and the third training data batch to optimize the depth generation model to obtain the mother model.
The process of optimizing the neural network model is described below using only the first training data batch, with a gradient-based optimization algorithm taken as an example. Assume that the acquired first training data batch is

$$B_1 = \{x_{11}, x_{12}, \ldots, x_{1n}\}$$

where $x_{1i}$ is the i-th data of the first training data batch and n is the size of the first training data batch. Probability distribution information of each variable value in x is obtained from the neural network model $p_{\theta_0}(x)$, and the gradient $\nabla_{\theta} L(\theta_0, B_1)$ of the loss function $L(\theta_0, B_1)$ is calculated using the first training data batch $B_1$. The parameters of the neural network model are then updated with a gradient-based optimization algorithm to obtain the parameters of a new neural network model,

$$\theta_1 = \theta_0 - \lambda \nabla_{\theta} L(\theta_0, B_1)$$

and thus a new neural network model $p_{\theta_1}(x)$, where $\lambda$ is the learning rate, which can be set according to actual needs and is not limited here. To make gradient descent perform well, the learning rate should be set within a suitable range: too large a learning rate makes training unstable, while too small a value leads to extremely long training times. An adaptive learning rate keeps training stable while maintaining a reasonably high rate, and can reduce training time.
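A minimal sketch of the single gradient step written above ($\theta_1 = \theta_0 - \lambda \nabla L$), assuming a PyTorch-style model whose forward pass returns per-sample log-likelihoods of the batch (these interface details are assumptions, not from the patent):

```python
import torch

def update_model_one_step(model, batch, lr=1e-4):
    # One gradient-descent step on the negative log-likelihood of the batch,
    # i.e. theta <- theta - lr * grad L(theta, batch).
    loss = -model(batch).mean()
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            p -= lr * p.grad
    return loss.item()
```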
In this manner, the neural network model is optimized according to the training data batch (e.g., the first training data batch) in the training data set, so that the trained mother model is more and more consistent with the distribution of the training data set.
It should be noted that the training process may also adopt other training methods instead of the aforementioned training method, and is not limited herein.
In the embodiment of the present application, due to the respective technical characteristics of different encoders, the encoders can be classified into a first-in-first-out type and a first-in-last-out type.
The following briefly describes the general flow of encoding and decoding (omitting probability distribution information) of the first-in-first-out encoder and the first-in-last-out encoder with reference to fig. 6A, 6B, 7A, and 7B.
Referring to fig. 6A and 6B, fig. 6A is the encoding (or compression) flow of a first-in first-out encoder, and fig. 6B is the decoding (or decompression) flow of a first-in first-out encoder. A first-in first-out encoder (e.g., an arithmetic encoder) can, in the decompression phase, only decompress the different data points (or data batches) from the encoded file (or encoded sequence, final code, etc.) in the order in which they were compressed.
Referring to fig. 7A and 7B, fig. 7A is a flow of encoding (or compressing) of an in-then-out encoder, and fig. 7B is a flow of decoding (or decompressing) of an in-then-out encoder. For a first-in-last-out encoder (e.g., asymmetric digital system), it can only decompress different data points (or data batches) from the encoded file (or encoded sequence, final code, etc.) in the order opposite the compression order during the decompression phase. In order to realize that the sequence of the decompressed data batches in the decompression flow of 7B is data batch 1-data batch 2-data batch 3-data batch 4, the compression flow of 7A may be adjusted in a reverse compression mode, i.e. the compression is performed from the last data batch (e.g. data batch 4 in fig. 7A) to the front, so that all data batches can be decompressed from front to back during decompression.
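The ordering constraint can be pictured with a queue versus a stack (an illustrative sketch, not the actual coders): a first-in first-out coder decodes batches in compression order, while a first-in last-out coder decodes them in reverse, which is why the compression in fig. 7A runs from the last batch backwards:

```python
from collections import deque

batches = ["batch1", "batch2", "batch3", "batch4"]

# First-in first-out style (e.g. arithmetic-coding-like): decode order == encode order.
fifo = deque(batches)                                           # encode 1, 2, 3, 4
fifo_decoded = [fifo.popleft() for _ in range(len(batches))]    # decode 1, 2, 3, 4

# First-in last-out style (e.g. ANS-like): decode order is the reverse of encode order,
# so encoding the batches in reverse (4, 3, 2, 1) lets decoding run forwards (1, 2, 3, 4).
filo = []
for b in reversed(batches):                                     # encode 4, 3, 2, 1
    filo.append(b)
filo_decoded = [filo.pop() for _ in range(len(batches))]        # decode 1, 2, 3, 4
```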
Referring to fig. 7C, a lossless compression framework provided in this embodiment of the present application includes a compression process, an optimization process, and a decompression process. The left branch of the figure explains how, at step t of the compression process, $p_{t-1}$ is used to compress the t-th data batch (i.e., batch$_t$), and how that data batch is used to update the deep generative model from $p_{t-1}$ to $p_t$; the right branch of the figure explains how, at step t of the decompression process, $p_{t-1}$ is used to decompress the t-th data batch, and how that data batch is used to update the deep generative model from $p_{t-1}$ to $p_t$. Here, $p_0$ can be understood as the parent model (or initial model). The framework can compress a new data set while migrating (or optimizing) the parent model, and does not save the migrated new models. In the decompression process, the framework decodes and migrates the model at the same time, so that all the models required for decompression and the corresponding original data are recovered. In the migration process, the model conforms more and more to the distribution of the target data, so the compression rate becomes higher and higher. Meanwhile, the deep generative model generalizes well to data of the same class, so the compression rate at the initial stage of the migration process is also relatively high.
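A high-level sketch of the compression branch of this framework (the helper names and interfaces below are hypothetical, used only to show the compress-then-update order):

```python
def compress_dataset(batches, model, encoder):
    # Compress batch t with the current model p_{t-1}, then update the model
    # with that same batch to obtain p_t. Only the parent model p_0 and the
    # final code need to be stored, because the updates can be replayed
    # in the same order during decompression.
    code = encoder.init_code()
    for batch in batches:
        probs = model.probabilities(batch)      # p_{t-1} evaluated on the batch
        code = encoder.encode(batch, probs, code)
        model = model.updated_with(batch)       # p_{t-1} -> p_t
    return code
```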
The following describes the data compression method and the data decompression method in detail in accordance with the embodiments of the present application with reference to the above application scenarios and the accompanying drawings.
The compression flow for the first-in first-out type encoder is slightly different from that for the first-in last-out type (or referred to as last-in first-out type) encoder, and is described below separately. For the compression process of the first-in-first-out encoder, refer to fig. 8, and for the compression process of the first-in-last-out encoder, refer to fig. 13.
Referring to fig. 8, an embodiment of a data compression method for a first-in first-out encoder provided in the embodiments of the present application, which may be executed by a compression apparatus or a component of the compression apparatus (e.g., a processor, a chip, or a system-on-chip), includes steps 801 to 806.
In step 801, a first data batch and a second data batch are obtained.
In the embodiment of the present application, there are various manners for acquiring the first data batch and the second data batch, which may be directly acquiring the first data batch and the second data batch, or acquiring the data set to be compressed, and splitting the data set to be compressed to obtain the first data batch and the second data batch, and the like, and the specific details are not limited herein. Wherein the data set to be compressed may be of the same or similar data type as the training data set. Of course, if the data type of the data set to be compressed is the same as the data type of the training data set, it is beneficial to improve the compression rate of the data set to be compressed.
In step 802, a master model is obtained.
Alternatively, the compression device may obtain the master model based on a data type of the data set to be compressed or the data batch to be compressed (e.g., the first data batch or the second data batch). The specific process of determining the master model based on the data type may refer to the description in step 502 in the embodiment shown in fig. 5, and the correspondence between the data type and the master model may refer to table 1, which is not described herein again.
Alternatively, the mother model may be a mother model trained by the training method shown in fig. 5.
In step 803, the first data batch is compressed to a first sequence based on the first data batch, the master model, and the first encoder.
Because different depth generative models have different probabilistic graph structures, the encoders to which the models are adapted may differ. Optionally, after the mother model is obtained, a first encoder corresponding to the type of the mother model is determined based on a first association relationship, where the first association relationship is used to represent the association relationship between the type of the mother model and the first encoder. The type of the parent model may include the full observation model and the hidden variable model mentioned above.
Alternatively, the first association relationship between the type of the parent model and the first encoder may be as shown in table 2.
TABLE 2
Type of parent model: full observation model — adapted encoder: arithmetic coding, asymmetric digital system, Huffman coding, etc.
Type of parent model: hidden variable model — adapted encoder: encoder combining the inverse coding technique with the asymmetric digital system (or with arithmetic coding)
For the full observation model, the range of adapted encoders is wide, including arithmetic coding, the asymmetric digital system, Huffman coding, and the like. For the hidden variable model, since the inverse coding technique needs to be used, an encoder combining the inverse coding technique with the asymmetric digital system is more suitable; of course, an encoder combining the inverse coding technique with arithmetic coding can also be used.
It should be understood that the above table 2 is only an example, and in practical applications, there are other types, encoders or first association relations, and the details are not limited herein.
After the first data batch and the mother model are obtained, the first data batch is input into the mother model to obtain first probability distribution information, and the first probability distribution information is used for representing probability distribution of values of all variables in the first data batch. And inputting the first probability distribution information and the first data batch into a first encoder to be compressed to obtain a first sequence. For an exemplary procedure of obtaining the first sequence, reference may be made to fig. 9.
In step 804, the parent model is updated based on the first data batch to obtain a first model.
In the embodiment of the present application, there are many ways to update (or optimize) the mother model based on the first data batch to obtain the first model; a gradient-based optimization algorithm may be used, meta learning may be used, reinforcement learning may be used, and so on, which is not limited here.
Optionally, the process of obtaining the first model based on the first batch optimization mother model is similar to the manner of optimizing the mother model based on the first training data batch in step 502 in the foregoing embodiment shown in fig. 5, and details are not repeated here. For an exemplary process of optimizing the parent model, reference may be made to FIG. 9.
Optionally, if the data set to be compressed includes N data batches, the mother model may be optimized using M data batches, where N is a positive integer greater than or equal to 2, and M is a positive integer less than or equal to N. It is also understood that the number of times of optimizing the mother model may be one time or N-1 times, and is not limited herein.
Alternatively, the first model may be obtained by optimizing the master model with the first data batch multiple times. It can be understood that, in the embodiment of the present application, a data batch may be used to optimize the model once or multiple times, which is not limited here. If a data batch is used to optimize the model multiple times, the number of times, or the gradient update steps, is saved so that the process can be reproduced during decompression.
In step 805, a third data batch is acquired. This step is optional.
Similar to the foregoing step 801, in the embodiment of the present application, the third data batch may be directly obtained, or the third data batch may be obtained from a data set to be compressed, which is not limited herein.
It is understood that the present step may be a separate step or may belong to step 801. If the step 805 belongs to the step 801, that is, the data set to be compressed includes the first data batch, the second data batch, and the third data batch, the compression apparatus splits the data set to be compressed to obtain the first data batch, the second data batch, and the third data batch.
Optionally, if the data set to be compressed includes N data batches, the data set to be compressed may be split to obtain N data batches.
In step 806, the second data batch is compressed to a second sequence based on the second data batch, the first model, the first sequence, and the first encoder.
Optionally, if the data batch to be compressed includes a first data batch and a second data batch, the second data batch may be input into the first model to obtain second probability distribution information, where the second probability distribution information is used to represent probability distribution of values of each variable in the second data batch. And inputting the second data batch, the second probability distribution information and the first sequence into a first encoder to be compressed to obtain a second sequence. For an exemplary procedure of obtaining the second sequence, reference may be made to fig. 9.
Alternatively, if arithmetic coding is employed, the first and second sequences may be identical.
In this embodiment of the application, this step differs depending on the number of updates performed on the parent model; the following description takes as an example the case in which the data set to be compressed includes a first data batch, a third data batch, and a second data batch.
First, the number of updates to the master model differs by one from the number of data batches.
If the data batch to be compressed includes the first data batch, the second data batch, and the third data batch, the third data batch may be compressed based on the third data batch, the first model, the first sequence, and the first encoder to obtain a third sequence. The first model is then optimized using the third data batch to obtain a second model, and the second data batch is compressed based on the second data batch, the second model, the third sequence, and the first encoder to obtain a second sequence. This is equivalent to the data set to be compressed comprising three data batches with the master model being optimized twice.
Optionally, after the third data batch is obtained, the third data batch is input into the first model to obtain third probability distribution information, where the third probability distribution information is used to identify probability distribution of each variable value in the third data batch. And inputting the third probability distribution information and the first sequence into a first encoder to compress a third data batch to obtain a third sequence. And optimizing the first model using the third data batch to obtain a second model. And inputting the second data batch into a second model to obtain second probability distribution information, and inputting the third sequence and the second probability distribution information into a first encoder to compress the second data batch to obtain a second sequence. For an exemplary procedure of obtaining the second sequence, reference may be made to fig. 10.
Second, the number of updates to the master model differs from the number of data batches by two or more.
This case can be understood as early stopping of the model update; that is, the first model is not optimized with the third data batch. It differs from the first case above in that the optimization of the first model is not continued.
If the data batch to be compressed includes the first data batch, the second data batch, and the third data batch, the third data batch may be compressed based on the third data batch, the first model, the first sequence, and the first encoder to obtain a third sequence, and the second data batch may be compressed based on the second data batch, the first model, the third sequence, and the first encoder to obtain a second sequence.
Optionally, after the third data batch is acquired, the third data batch is input into the first model to obtain third probability distribution information. And inputting the third probability distribution information and the first sequence into a first encoder to compress a third data batch to obtain a third sequence. And inputting the second data batch into the first model to obtain second probability distribution information, and inputting the third sequence and the second probability distribution information into the first encoder to compress the second data batch to obtain a second sequence. For an exemplary procedure of obtaining the second sequence, reference may be made to fig. 11. Equivalently, the data set to be compressed comprises three data batches, and the master model is optimized 1 time.
In the embodiment of the present application, if the data set to be compressed includes N data batches, reference may be made to fig. 12 for the compression process of the first-in first-out encoder; the specific process is similar to the steps in the embodiment shown in fig. 8 and is not described here again. The first data batch can be regarded as the first compressed data batch, the second data batch can be regarded as the last compressed data batch, and the third data batch can be understood as a data batch between the first data batch and the second data batch. In addition, there may be more data batches between the first data batch and the second data batch besides the third data batch, and the specific number of data batches is not limited here. N is a positive integer greater than 3. The Nth sequence corresponds to the second sequence, the 1st data batch corresponds to the first data batch, and the Nth data batch corresponds to the second data batch. It is understood that the (N-1)th model and the (N-2)th model may be the same model, or the (N-1)th model may be obtained by optimizing the (N-2)th model with the (N-1)th data batch; this is not limited here. Equivalently, the mother model may be optimized N-1 times, N-2 times, N-3 times, and so on.
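For ease of understanding, the following is a minimal sketch (in Python) of the first-in first-out compression flow described above. The predict, encode and update callables stand in for the model inference, the first encoder and the optimization step respectively; they are illustrative placeholders, not interfaces of any particular library.

def fifo_compress(batches, model, predict, encode, update, num_updates=None):
    # predict(model, batch) -> probability distribution information for the batch
    # encode(batch, probs, sequence) -> new coded sequence (appending to `sequence`)
    # update(model, batch) -> optimized model (e.g. one or more gradient steps)
    if num_updates is None:
        num_updates = len(batches) - 1   # first case: one less than the number of batches
    sequence = b""                       # empty initial coded sequence
    for i, batch in enumerate(batches):
        probs = predict(model, batch)              # probability distribution of variable values
        sequence = encode(batch, probs, sequence)  # compress this batch onto the sequence
        if i < num_updates:                        # a smaller num_updates is the early-stop case
            model = update(model, batch)           # online migration of the (mother) model
    return sequence, model                         # final coded sequence (the second/Nth sequence)

With num_updates smaller than the default, only the earlier batches migrate the model, which corresponds to the early-stop case above; only the mother model and the hint describing num_updates would need to be stored.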
Alternatively, if the mother model is not optimized with N-1 data batches, a hint may be generated and stored together with the mother model, the hint indicating how many times the mother model is updated, or which data batches in the data set to be compressed are not used to update the model. The decompression device then determines, during decompression, whether to update the model and to what extent (e.g., how many updates are made and which data batches are used to update the model).
Alternatively, it may be configured in advance such that the decompression device determines whether to update the model, or to what extent.
In addition, if randomness exists in the process of optimizing the mother model (or in the compression process), the corresponding random seeds are stored so that lossless compression can be realized.
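As an illustration of the hint and random-seed bookkeeping just described, the following sketch stores the update count, the skipped batches and the seed together with the mother model; the field names and file format are assumptions for illustration only.

import json
import random

def save_side_information(path, num_updates, skipped_batches, seed):
    # Hint stored alongside the mother model so the decompression device can
    # reproduce how many times the model was updated and with which batches.
    with open(path, "w") as f:
        json.dump({"num_updates": num_updates,
                   "skipped_batches": skipped_batches,
                   "random_seed": seed}, f)

# Setting the same seed before optimization on both the compression and the
# decompression side makes any stochastic updates reproducible:
random.seed(0)  # in practice, also seed the deep-learning framework in use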
In a possible implementation manner, this embodiment includes steps 801 to 806, and in another possible implementation manner, this embodiment includes steps 801, 802, 803, 804, and 806.
The compression process and the process of optimizing the parent model in the embodiment of the application can be parallel or serial. For example, the step 804 of optimizing the parent model may be before the step 803 of the compression flow, or the step 804 may be after the step 803, or the step 803 and the step 804 may be performed simultaneously, which is not limited herein.
In the embodiment of the application, on the one hand, in the compression process, the data batches in the data set to be compressed can be compressed while the mother model is optimized, and the optimized new models do not need to be stored, so the compression method is suitable for compression of large-scale data sets. On the other hand, in the process of optimizing the mother model, the mother model conforms more and more to the distribution of the data set to be compressed, and therefore the compression rate becomes higher and higher. In addition, compared with the prior art in which a new model needs to be trained specifically for the data set to be compressed, the mother model in the embodiment of the application is highly general: it does not need to be trained specifically for the data set to be compressed, and the model is continuously optimized while the data set to be compressed is being compressed. Moreover, the optimization process can be reproduced by optimizing the mother model during decompression, so the optimized models can be recovered by storing only the mother model; each model generated in the compression process does not need to be stored, which saves storage space.
Referring to fig. 13, an embodiment of a data compression method for a first-in last-out encoder provided in the embodiment of the present application may be executed by a compression device, or may be executed by a component (e.g., a processor, a chip, or a system-on-chip) of the compression device; the embodiment includes steps 1301 to 1306.
In step 1301, a first data batch and a second data batch are obtained.
In step 1302, a master model is obtained.
Steps 1301 and 1302 in this embodiment are similar to steps 801 and 802 in the embodiment shown in fig. 8, and detailed description thereof is omitted here.
Optionally, when the data set to be compressed is split, the multiple data batches obtained by splitting may be sequentially marked, so that the compression device identifies which data batch is the last data batch in the subsequent compression process.
Optionally, after the mother model is obtained, probability distribution information may be obtained from the mother model and stored. In other words, since this embodiment uses a first-in last-out encoder (or last-in first-out encoder), the first data batch acquired at the beginning and the first probability distribution information obtained from the mother model are first put into a buffer, and the buffer is then processed from the last data batch back to the first data batch.
In step 1303, the master model is updated based on the first data batch to obtain a first model.
Step 1303 in this embodiment is similar to step 804 in the embodiment shown in fig. 8, and detailed description thereof is omitted here.
In step 1304, the second data batch is compressed to a first sequence based on the second data batch, the first model, and the first encoder.
The manner of determining the first encoder in this embodiment is similar to the step 803 in the embodiment shown in fig. 8, and details thereof are not repeated here.
Optionally, if the data set to be compressed includes a first data batch and a second data batch, the second data batch is input into the first model to obtain second probability distribution information, where the second probability distribution information is used to represent the probability distribution of values of each variable in the second data batch. The second probability distribution information and the second data batch are input into the first encoder for compression to obtain a first sequence. For an exemplary procedure of obtaining the first sequence, reference may be made to fig. 14.
In this embodiment of the application, this step differs depending on the number of times the mother model is updated. The following description takes as an example a data set to be compressed that includes a first data batch, a third data batch, and a second data batch.
In the first case, the number of updates to the mother model is one less than the number of data batches.
Optionally, if the data set to be compressed includes the first data batch, the second data batch and the third data batch, the first model may be obtained by optimizing the mother model using the first data batch, and the second model may be obtained by optimizing the first model using the third data batch. The second data batch is input into the second model to obtain second probability distribution information, where the second probability distribution information is used to represent the probability distribution of values of each variable in the second data batch. The second probability distribution information and the second data batch are input into the first encoder for compression to obtain a first sequence. For an exemplary procedure of obtaining the first sequence, reference may be made to fig. 15. That is, the data set to be compressed includes three data batches and the mother model is optimized twice.
In the second case, the number of updates to the mother model is smaller than the number of data batches by two or more.
This case can be understood as an early stop of the model update: the first model is not optimized with the third data batch. It differs from the first case above in that the first model is not further optimized.
Optionally, if the data set to be compressed includes the first data batch, the second data batch and the third data batch, the first model may be obtained by optimizing the mother model using the first data batch. The second data batch is input into the first model to obtain second probability distribution information, where the second probability distribution information is used to represent the probability distribution of values of each variable in the second data batch. The second probability distribution information and the second data batch are input into the first encoder for compression to obtain a first sequence. For an exemplary procedure of obtaining the first sequence, reference may be made to fig. 16. That is, the data set to be compressed includes three data batches and the mother model is optimized once.
In this embodiment of the application, if the data set to be compressed includes N data batches, reference may be made to fig. 17 for the compression process of the first-in last-out encoder; the specific process is similar to the steps in the embodiment shown in fig. 13 and is not described here again. The first data batch can be regarded as a first compressed data batch, the second data batch can be regarded as a last compressed data batch, and the third data batch can be understood as a data batch between the first data batch and the second data batch. In addition, there may be more data batches between the first data batch and the second data batch besides the third data batch, and the specific number of data batches is not limited here. N is a positive integer greater than 3. The Nth sequence corresponds to the second sequence, the 1st data batch corresponds to the first data batch, and the Nth data batch corresponds to the second data batch. It is understood that the (N-1)th model and the (N-2)th model may be the same model, or the (N-1)th model may be obtained by optimizing the (N-2)th model with the (N-1)th data batch; this is not limited here. The mother model may be optimized N-1 times, N-2 times, N-3 times, and so on.
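For ease of understanding, the following is a minimal sketch of the first-in last-out compression flow described above, reusing the placeholder callables of the earlier first-in first-out sketch; the point illustrated here is the buffering of data batches together with their probability distribution information before reverse-order encoding.

def filo_compress(batches, model, predict, encode, update, num_updates=None):
    if num_updates is None:
        num_updates = len(batches) - 1
    buffer = []                                        # cache of (batch, probability information)
    for i, batch in enumerate(batches):
        buffer.append((batch, predict(model, batch)))  # probs from the model before this update
        if i < num_updates:
            model = update(model, batch)               # online migration of the mother model
    sequence = b""                                     # empty initial coded sequence
    for batch, probs in reversed(buffer):              # the last data batch is compressed first
        sequence = encode(batch, probs, sequence)
    return sequence, model

Because the encoder is first-in last-out, decompression then recovers the data batches in their original order.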
In step 1305, a third data batch is acquired. This step is optional.
Step 1305 in this embodiment is similar to step 805 in the embodiment shown in fig. 8, and detailed description thereof is omitted here.
In step 1306, the first data batch is compressed to a second sequence based on the first data batch, the master model, the first sequence, and the first encoder.
After the first data batch and the mother model are obtained, the first data batch is input into the mother model to obtain first probability distribution information, and the first probability distribution information is used for representing probability distribution of values of all variables in the first data batch. And inputting the first probability distribution information, the first sequence and the first data batch into a first encoder to be compressed to obtain a second sequence. For example, the process of acquiring the second sequence may refer to fig. 13, fig. 14, fig. 15, fig. 16, or fig. 17.
Optionally, if the data set to be compressed includes a first data batch and a second data batch, the first data batch is input into the mother model to obtain first probability distribution information, and the first probability distribution information is used for representing probability distribution of values of each variable in the first data batch. And inputting the first probability distribution information, the first sequence and the first data batch into a first encoder to obtain a second sequence. For an exemplary procedure of obtaining the second sequence, reference may be made to fig. 14.
In the embodiment of the present application, this step differs slightly depending on the number of times the mother model is updated. The following description takes as an example a data set to be compressed that includes a first data batch, a third data batch, and a second data batch.
Optionally, after the first sequence is obtained, the third data batch may be input into the first model to obtain third probability distribution information, where the third probability distribution information is used to represent the probability distribution of values of each variable in the third data batch. The third probability distribution information, the first sequence and the third data batch are input into the first encoder for compression to obtain a third sequence. The first data batch is input into the mother model to obtain first probability distribution information, where the first probability distribution information is used to represent the probability distribution of values of each variable in the first data batch. The first probability distribution information, the third sequence and the first data batch are input into the first encoder for compression to obtain a second sequence.
For example, if the number of updates to the mother model is one less than the number of data batches, the flow of acquiring the second sequence may refer to fig. 15. That is, the data set to be compressed includes three data batches and the mother model is optimized twice.
For example, if the number of updates to the mother model is smaller than the number of data batches by two or more, the flow of acquiring the second sequence may refer to fig. 16. That is, the data set to be compressed includes three data batches and the mother model is optimized once.
In the embodiment of the present application, if the data set to be compressed includes N data batches, reference may be made to fig. 17 for a compression process of a first-in last-out encoder. The compression flow of fig. 17 is similar to the compression flow shown in fig. 15 or fig. 16, and is not described again here.
Alternatively, if the mother model is not optimized with N-1 data batches, a hint may be generated and stored together with the mother model, the hint indicating how many times the mother model is updated, or which data batches in the data set to be compressed are not used to update the model. During decompression, the decompression device then determines whether to update the model and to what extent.
In addition, if randomness exists in the process of optimizing the mother model (or the compression process), corresponding random seeds are saved so as to realize lossless compression.
In one possible implementation manner, the present embodiment includes steps 1301 to 1306, and in another possible implementation manner, the present embodiment includes steps 1301, 1302, 1303, 1304, and 1306.
The timing relationship of the steps in the embodiments of the present application is not fixed. For example, step 1303 of optimizing the parent model may be performed before step 1304 of the compression flow, step 1303 may also be performed after step 1304, and step 1303 and step 1304 may also be performed simultaneously, which is not limited herein.
In the embodiment of the present application, on the one hand, due to the first-in last-out nature of this type of encoder, earlier data batches need to be compressed later; that is, the order in which the data batches are compressed is opposite to the order in which they are used for the online migration of the model. The information of all data batches to be compressed is cached, and compression is performed in reverse order after model optimization is completed; the optimized new models do not need to be stored in this process, so the method is suitable for compression of large-scale data sets. On the other hand, in the process of optimizing the mother model, the mother model conforms more and more to the distribution of the data set to be compressed, and therefore the compression rate becomes higher and higher. On yet another hand, the decompression process can obtain the data batches in their original order through the reverse-order adjustment of the first-in last-out encoder.
Referring to fig. 18, an embodiment of a data decompression method provided in this application may be executed by a decompression device, or may be executed by a component (e.g., a processor, a chip, or a system-on-chip) of the decompression device, where the embodiment includes steps 1801 to 1805.
In step 1801, a second sequence is obtained.
The second sequence in this embodiment may be the final coding sequence (i.e., the second sequence) obtained by the above compression method, may be a second sequence received from the compression device, or may be obtained in other manners, for example, a second sequence forwarded by another device; this is not limited here.
In step 1802, a master model is obtained.
In this embodiment, the mother model may be obtained by training with the training method shown in fig. 5, or by receiving the mother model sent by the compression device, and the like; this is not limited here.
The master model in this embodiment is the same as the master model in the embodiment shown in fig. 13, and thus lossless compression can be achieved.
In step 1803, the second sequence is decompressed based on the mother model and the first encoder to obtain a fourth sequence and the first data batch.
The manner of determining the first encoder in the embodiment of the present application is similar to that described in step 803 in the embodiment shown in fig. 8, and is not repeated here.
After the mother model is obtained, the probability distribution carried by the mother model can be obtained. This step can be divided into two cases according to whether the mother model includes conditional probabilities, which are described below.
In the first case, the mother model does not include conditional probabilities, and the probability distribution information can be obtained directly.
In other words, the first probability distribution information (i.e., the distribution p(x) in the foregoing term explanation), which is used to represent the probability distribution of values of each variable in the first data batch, can be obtained directly from the mother model. The second sequence is then decompressed using the first probability distribution information and the first encoder to obtain the fourth sequence and the first data batch.
Alternatively, if arithmetic coding is employed, the fourth sequence may be identical to the second sequence.
In the second case, the mother model includes conditional probabilities, and the probability distribution information is obtained indirectly.
In other words, incomplete probability information (i.e., a first probability) may be obtained from the mother model; one part of the data is obtained by decompressing the second sequence with the first probability and the first encoder; this part of the data is input into the mother model to obtain a second probability; and another part of the data and the fourth sequence are obtained by decompressing the second sequence based on the second probability and the first encoder. The first data batch is then obtained from the two parts of the data.
Illustratively, a partial probability p(x2) is obtained from the mother model; the second sequence is decompressed with p(x2) to obtain A2; A2 is input into the mother model to obtain p(x1|A2); the second sequence is decompressed with p(x1|A2) to obtain A1; and finally the first data batch (A1, A2) is obtained.
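As a sketch of this indirect (conditional) case, the following decodes the coded sequence stage by stage, with each decoded part conditioning the probability used for the next part. model.prior() and model.conditional() are hypothetical placeholder methods, and decode() stands in for the first encoder's decoding operation; none of these names come from the description above.

def decode_with_conditionals(sequence, model, decode, num_stages=2):
    # decode(sequence, probs) -> (decoded part, remaining sequence)
    parts = []
    probs = model.prior()                     # e.g. p(x2) in the example above
    for stage in range(num_stages):
        part, sequence = decode(sequence, probs)
        parts.append(part)
        if stage + 1 < num_stages:
            probs = model.conditional(parts)  # e.g. p(x1 | A2)
    return tuple(reversed(parts)), sequence   # e.g. the first data batch (A1, A2) and the fourth sequence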
In step 1804, the mother model is updated based on the first data batch to obtain a first model.
The manner of updating (or optimizing) the mother model based on the first data batch in this step to obtain the first model is similar to that described in step 804 of the embodiment shown in fig. 8, and is not described herein again.
Optionally, if there is a random seed, performing optimization based on the random seed, so that the model optimized in the decompression process is consistent with the model optimized in the compression process or within a certain error range, thereby implementing lossless compression.
Optionally, during or before the process of optimizing the mother model by using the first data batch, the number of times of using the data batch, for example, the number of times of updating the gradient in the embodiment shown in fig. 8, may be obtained first, and the mother model is updated according to the number of times, so as to achieve reproducibility in the decompression process.
In step 1805, the fourth sequence is decompressed based on the first model and the first encoder to obtain a second data batch.
After the mother model is optimized to obtain the first model, second probability distribution information may be obtained from the first model (either directly or indirectly, as described above); the second probability distribution information is used to represent the probability distribution of values of each variable in the second data batch. The fourth sequence is decompressed using the second probability distribution information and the first encoder to obtain the second data batch, so that both the first data batch and the second data batch are decompressed. An exemplary decompression flow is shown in fig. 19. Optionally, the initial data set, that is, the data set to be compressed before compression, is recovered according to the first data batch, the second data batch, and the decompression order, thereby realizing lossless compression.
Optionally, the fifth sequence and the third data batch are obtained by decompressing the fourth sequence using the probability distribution information obtained from the first model and the first encoder. The first model is then optimized using the third data batch to obtain a second model. Second probability distribution information is acquired through the second model, and the fifth sequence is decompressed using the second probability distribution information and the first encoder to obtain the second data batch, so that the first data batch, the second data batch and the third data batch are obtained through decompression. Illustratively, the decompression flow is shown in fig. 20.
Optionally, the fifth sequence and the third data batch are obtained by decompressing the fourth sequence using the probability distribution information obtained from the first model and the first encoder. If the first model carries hint information, or hint information sent by the compression device is received, where the hint information is used to indicate that the first model was not optimized into the second model during compression (or the hint information can be understood as indicating the early-stop information of the compression process, from which the decompression device can determine how many times the mother model was optimized and thus the probability distribution corresponding to the model), the decompression device may obtain second probability distribution information through the first model and decompress the fifth sequence using the second probability distribution information and the first encoder to obtain the second data batch, so that the first data batch, the second data batch and the third data batch are obtained through decompression. Illustratively, the decompression flow is shown in fig. 21.
Optionally, the initial data set, that is, the data set to be compressed before compression, is recovered according to the first data batch, the second data batch, the third data batch, and the decompression sequence. Thereby realizing lossless compression.
In this embodiment of the application, if the data set to be compressed includes N data batches, the decompression process may refer to fig. 22; the specific process is similar to the steps in the embodiment shown in fig. 18 and is not described here again. The first data batch can be regarded as a first compressed data batch, the second data batch can be regarded as a last compressed data batch, and the third data batch can be understood as a data batch between the first data batch and the second data batch. In addition, there may be more data batches between the first data batch and the second data batch besides the third data batch, and the specific number of data batches is not limited here. N is a positive integer greater than 3. The Nth sequence corresponds to the second sequence, the 1st data batch corresponds to the first data batch, and the Nth data batch corresponds to the second data batch. It is understood that the (N-1)th model and the (N-2)th model may be the same model, or the (N-1)th model may be obtained by optimizing the (N-2)th model with the (N-1)th data batch; this is not limited here. Equivalently, the mother model may be optimized N-1 times, N-2 times, N-3 times, and so on.
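For ease of understanding, the following is a minimal sketch of the decompression flow of fig. 18 and fig. 22: each data batch is decoded with the current model and is then used (up to num_updates times) to migrate the model, mirroring the compression side so that the same sequence of models is reproduced. The callables are illustrative placeholders, as in the earlier sketches.

def online_decompress(sequence, model, n_batches, get_probs, decode, update, num_updates):
    # get_probs(model) -> probability distribution information (directly or indirectly, see above)
    # decode(sequence, probs) -> (data batch, remaining sequence)
    # update(model, batch)    -> optimized model (seeded as in the compression stage)
    batches = []
    for i in range(n_batches):
        probs = get_probs(model)
        batch, sequence = decode(sequence, probs)
        batches.append(batch)
        if i < num_updates:
            model = update(model, batch)   # reproduce the optimization done during compression
    return batches                         # concatenating the batches restores the initial data set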
Optionally, the initial data set, that is, the data set to be compressed before compression, is recovered according to the first data batch, ..., the (N-1)th data batch, the Nth data batch, and the decompression order, thereby realizing lossless compression.
In the embodiment of the application, on the one hand, in the decompression process, the mother model can be optimized while the coding sequence is decompressed, without storing the optimized new models; therefore, the method is suitable for decompression of large-scale data sets. On the other hand, according to the hint information or preset configuration information, the optimized models used in the compression process can be obtained, so that the probability distributions are determined, the initial data set is recovered, and lossless compression is realized. On yet another hand, since there is no need to train specifically for the data set to be compressed as in the prior art, the compression time is reduced by the time that would otherwise be spent on such dedicated training. In addition, compared with the prior art in which a new model needs to be trained specifically for the data set to be compressed, the mother model in the embodiment of the application is highly general: it does not need to be trained specifically for the data set to be compressed, and the model is continuously optimized while the data set to be compressed is being compressed. Moreover, the optimization process can be reproduced by optimizing the mother model during decompression, so the optimized models can be recovered by storing only the mother model; each model generated in the compression process does not need to be stored, which saves storage space.
In this embodiment of the present application, in the data compression method or the data decompression method, if the data set to be compressed is too large, the data set to be compressed may be divided into a plurality of data clusters, and each data cluster is processed with the compression method or the decompression method. In this way, the buffer space required by the compression methods of fig. 13 to fig. 17, i.e., the storage space needed to store the data batches and/or the probability distribution information corresponding to the data batches, can be reduced.
Alternatively, each data cluster produces an independent code, and the order of the different independent codes needs to be the same as the order of the clusters. For example, 6 data batches are divided into 2 data clusters. Since the data within each data cluster are compressed in reverse order, a larger buffer space is required than in the flow of the first-in first-out encoder. Before the data batches of a data cluster are used for the online migration of the model, the data batches and the probability mass distributions required for their compression are stored in the buffer in order; when the optimization over the data batches of the data cluster is completed, the encoder compresses and encodes the data batches of the cluster using the information in the buffer in reverse order. Owing to this buffering mechanism, in the online migration of the first-in last-out encoder, the compression flow of the encoder lags behind the online-migration flow of the model, and a certain degree of parallelism can be realized. In the decompression stage, the encoded files are read in cluster order, and the data batches of the corresponding cluster are decompressed from each encoded file. In each step of online decompression, the data batch of the current step is first decompressed with the current model, and the model is then migrated using the data batch decompressed in this step. Since the optimization of the model in this process cannot run ahead of the decompression of the required data batch, no extra buffer is needed in this stage, but the decoupling and high parallelism between the migration flow (model optimization) and the encoder flow (data decompression) achieved in the compression stage cannot be realized.
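As a sketch of this data-cluster variant, the following splits the data batches into clusters and applies the first-in last-out flow to each cluster, producing one independent code per cluster in cluster order. It assumes a filo_compress routine like the earlier sketch, which returns the cluster's code together with the migrated model; whether the model keeps migrating across clusters is an assumption here and is not fixed by the description above.

def compress_in_clusters(batches, model, cluster_size, filo_compress, predict, encode, update):
    codes = []
    for start in range(0, len(batches), cluster_size):
        cluster = batches[start:start + cluster_size]   # the buffer only ever holds one cluster
        code, model = filo_compress(cluster, model, predict, encode, update)
        codes.append(code)                               # independent codes kept in cluster order
    return codes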
In the embodiment of the present application, if the encoder is one that combines an inverse coding (bits-back) technique with an asymmetric numeral system (ANS), the input of the compression process and the output of the decompression process also include initial bits.
In the embodiment of the present application, if the encoder is an arithmetic encoder, the intermediate information passed between steps in the compression process is an interval, and the interval can be understood as a plurality of coded sequences. During decompression, as long as the decompression process is not finished, the same coded sequence is passed between steps; for example, the second sequence is the same as the fourth sequence in fig. 21, as described above.
To see the beneficial effects of the data compression method and the data decompression method in the embodiment of the present application (hereinafter referred to as one-shot online migration, OSOA) more intuitively, they are compared with existing compression methods below.
Two existing models used in lossless-compression algorithms based on deep generative models are selected. One is the flow model in IDF++, and the other is the inverse autoregressive flow variational auto-encoder (IAF VAE) in HiLLoC. For each of these two models, a mother model is obtained on the pre-training data set CIFAR-10. The optimization algorithm is a gradient-based optimization algorithm. Because both models are hidden variable models, both compression and decompression use an encoder combining the inverse coding (bits-back) technique with an asymmetric numeral system, together with the compression flow shown in fig. 13 to fig. 17 above.
Illustratively, online-migration compression is performed on images of size N × C × H × W, where N is the size of a data batch, C is the number of channels of the image (3 for color images and 1 for black-and-white images), and H and W are the numbers of pixels in the height and width dimensions of the image, respectively. The data set to be compressed consists of 131072 pictures randomly selected from YFCC100M and cropped to 3 × 256 × 256, forming the data set SET256. A downsampling technique is then used to derive from SET256 a data set SET128 of size 3 × 128 × 128, SET64 of size 3 × 64 × 64, and SET32 of size 3 × 32 × 32.
For IDF++ described above, N was chosen as 3, 3, 12, and 48. For HiLLoC described above, N was chosen as 4, 16, 64, and 256. For IDF++, each SET256 picture is additionally cropped into four pictures of size 3 × 128 × 128.
Table 3 shows the effect of the mother model compressing the data set to be compressed with OSOA compared with compressing the data set to be compressed by other methods; the unit is bits per dimension.
TABLE 3
[The body of Table 3 appears as an image in the original publication; it lists bits-per-dimension results for HiLLoC and IDF++ with pre-trained models versus the OSOA schemes on SET32, SET64, SET128, and SET256.]
Here HiLLoC OSOA and IDF++ OSOA are schemes that adopt the compression method provided in the embodiments of the present application, while HiLLoC pre-train and IDF++ pre-train use models trained specifically for the data sets as in the prior art. It can be seen that the number of bits per dimension of the schemes employing OSOA is smaller than that of the corresponding existing schemes. For example, when the data set to be compressed is SET256, IDF++ OSOA uses 0.836 (i.e., 3.170 − 2.334) fewer bits per dimension than IDF++ pre-train. The lower the number of bits per dimension, the higher the compression ratio; therefore IDF++ OSOA improves on IDF++ pre-train by 0.836 bits per dimension.
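For clarity, bits per dimension is understood here in the usual way as the total number of compressed bits divided by the number of data dimensions; the small helper below makes this explicit (the function name is illustrative only).

def bits_per_dimension(total_compressed_bits, n, c, h, w):
    # n images, c channels, h x w pixels; lower values mean a higher compression ratio.
    return total_compressed_bits / (n * c * h * w)

# For example, the SET256 comparison above: 3.170 - 2.334 = 0.836 fewer bits per dimension for IDF++ OSOA.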
With reference to fig. 23, a compression device and a decompression device in the embodiment of the present application are described below, and an embodiment of a compression device in the embodiment of the present application includes:
an acquiring unit 2301, configured to acquire a first data batch and a second data batch;
an obtaining unit 2301, further configured to obtain a mother model, where the mother model is used to calculate first probability distribution information, and the first probability distribution information is used to represent probability distributions of values of variables in the first data batch;
a compressing unit 2302, configured to compress the first data batch to obtain a first sequence based on the first data batch, the mother model and a first encoder, where the first encoder corresponds to the mother model;
an updating unit 2303, configured to optimize the mother model based on the first data batch to obtain a first model;
the compressing unit 2302 is further configured to compress the second data batch to obtain a second sequence based on the second data batch, the first model, the first sequence, and the first encoder.
Optionally, the compression device further comprises:
a determining unit 2304, configured to determine the first encoder corresponding to the type of the mother model based on a first association relationship, where the first association relationship is used to represent an association relationship between the type of the mother model and the first encoder, and the type includes a full observation model and a hidden variable model.
In this embodiment, operations performed by each unit in the compression device are similar to those described in the embodiments shown in fig. 8 to 12, and are not described again here.
In this embodiment, on the one hand, in the compression process, the compression unit 2302 compresses the data batch to be compressed, and the update unit 2303 optimizes the master model without saving the optimized new model, so that the compression method is suitable for compression of a large-scale data set. On the other hand, in the process of optimizing the parent model by the updating unit 2303, the parent model will conform to the distribution of the data batch to be compressed more and more, and therefore, the compression rate will be higher and higher. In addition, compared with the prior art that a new model needs to be trained specifically for the data set to be compressed, the master model in the embodiment of the present application has strong universality, and can continuously optimize the model by the updating unit 2303 in the process of compressing the data set to be compressed by the compressing unit 2302 without training specifically for the data set to be compressed. And the optimization process can be reproduced by optimizing the mother model in the decompression process, so that the optimized model can be obtained by storing the mother model without storing each model generated in the compression process, and the storage space is saved.
Referring to fig. 24, another embodiment of the compressing apparatus in the embodiment of the present application includes:
an acquiring unit 2401, configured to acquire a first data batch and a second data batch;
the obtaining unit 2401 is further configured to obtain a mother model, where the mother model is used to calculate first probability distribution information, and the first probability distribution information is used to represent probability distributions of values of variables in the first data batch;
an updating unit 2402, configured to optimize the mother model based on the first data batch to obtain a first model;
a compressing unit 2403, configured to compress the second data batch to obtain a first sequence based on the second data batch, the first model, and a first encoder, where the first encoder corresponds to the mother model;
a compressing unit 2403, further configured to compress the first data batch to obtain a second sequence based on the first data batch, the mother model, the first sequence, and the first encoder.
In this embodiment, operations performed by each unit in the compression device are similar to those described in the embodiments shown in fig. 13 to 17, and are not described again here.
In this embodiment, the method is applicable to a first-in last-out encoder. On the one hand, in the compression process, the compression unit 2403 can compress the data batches to be compressed while the update unit 2402 optimizes the mother model, and the optimized new models do not need to be stored, so the method is suitable for compression of large-scale data sets. On the other hand, in the process in which the update unit 2402 optimizes the mother model, the mother model conforms more and more to the distribution of the data batches to be compressed, and therefore the compression rate becomes higher and higher. On yet another hand, the decompression process can obtain the data batches in their original order through the reverse-order adjustment of the first-in last-out encoder.
Referring to fig. 25, an embodiment of a decompression apparatus in an embodiment of the present application includes:
an obtaining unit 2501, configured to obtain a second sequence;
an obtaining unit 2501, further configured to obtain a mother model;
a decompression unit 2502, configured to decompress the second sequence based on the mother model and a first encoder to obtain a fourth sequence and a first data batch, where the first encoder corresponds to the mother model;
an updating unit 2503, configured to optimize the mother model based on the first data batch to obtain a first model;
the decompression unit 2502 is further configured to decompress the fourth sequence to obtain a second data batch based on the first model and the first encoder.
Optionally, the decompression apparatus further comprises a determining unit 2504 and/or a merging unit 2505.
A determining unit 2504, configured to determine the first encoder corresponding to the type of the mother model based on a first association relationship, where the first association relationship is used to represent an association relationship between the type of the mother model and the first encoder, and the type includes a full observation model and a hidden variable model.
A merging unit 2505, configured to merge the first data batch and the second data batch to obtain an initial data set.
In this embodiment, operations performed by each unit in the decompression device are similar to those described in the embodiments shown in fig. 18 to fig. 22, and are not described again here.
In this embodiment, on the one hand, in the decompression process, the decompression unit 2502 decompresses the coding sequence while the update unit 2503 optimizes the mother model, and the optimized new models do not need to be saved, so the method is suitable for decompression of large-scale data sets. On the other hand, the mother model in the embodiment of the application is highly general and does not need to be trained specifically for the data set to be compressed. In addition, the optimization process can be reproduced by the update unit 2503 optimizing the mother model during decompression, so the optimized models can be recovered by storing only the mother model; each model generated in the compression process does not need to be stored, which saves storage space.
Referring to fig. 26, a schematic diagram of another compression apparatus is provided. The compression device may include a processor 2601, a memory 2602, and a communication interface 2603. The processor 2601, memory 2602, and communication interface 2603 are interconnected by wires. Among other things, memory 2602 has stored therein program instructions and data.
The memory 2602 stores program instructions and data corresponding to the steps executed by the compression device in the corresponding embodiments shown in fig. 8 to 12 or fig. 13 to 17.
Optionally, the memory 2602 stores the data batches to be compressed (e.g., the first data batch, the second data batch, or the third data batch) and the encoded sequences (e.g., the first sequence, the second sequence, etc.).
A processor 2601, configured to perform the steps performed by the compression device according to any of the embodiments shown in fig. 8 to 12 or fig. 13 to 17.
Optionally, the first encoder may be located in the processor 2601 or located outside the processor 2601, which is not limited herein.
The communication interface 2603 may be used for receiving and transmitting data, and is configured to perform the steps related to acquiring, transmitting and receiving in any of the embodiments shown in fig. 8 to 12 or fig. 13 to 17.
In one implementation, the compression device may include more or fewer components than those shown in fig. 26, which are merely illustrative and not limiting.
Referring to fig. 27, a schematic structural diagram of another decompression device provided in the present application is shown. The decompression device may include a processor 2701, memory 2702, and communication interface 2703. The processor 2701, memory 2702, and communication interface 2703 are interconnected by wires. The memory 2702 stores therein program instructions and data.
The memory 2702 stores program instructions and data corresponding to the steps executed by the decompression device in the above-described embodiment corresponding to fig. 18 to 22.
Optionally, the memory 2702 stores the decompressed data batches (e.g., the first data batch, the second data batch, the third data batch, etc.) and the encoded sequences (e.g., the first sequence, the second sequence, etc.).
The processor 2701 is configured to perform the steps performed by the decompression device in any of the embodiments shown in fig. 18 to fig. 22.
Optionally, the first encoder may be located in the processor 2701 or located outside the processor 2701, which is not limited herein.
The communication interface 2703 may be used for receiving and transmitting data, and is used for executing the steps related to the acquiring, transmitting and receiving in any of the embodiments shown in fig. 18 to fig. 22.
In one implementation, the decompression device may include more or fewer components than those shown in fig. 27, which are merely illustrative and not limiting.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit described above may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
When the integrated unit is implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the invention to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via wire (e.g., coaxial cable, fiber optic, digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that includes one or more available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid State Disk (SSD)), among others.
The terms "first," "second," and the like in the description and in the claims of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and are merely descriptive of the various embodiments of the application and how objects of the same nature can be distinguished. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

Claims (34)

1. A method of data compression, the method comprising:
acquiring a first data batch and a second data batch;
acquiring a mother model, wherein the mother model is used for calculating first probability distribution information, and the first probability distribution information is used for representing probability distribution of values of all variables in the first data batch;
compressing the first data batch to obtain a first sequence based on the first data batch, the mother model and a first encoder, wherein the first encoder corresponds to the mother model;
updating the mother model based on the first data batch to obtain a first model;
compressing the second data batch to obtain a second sequence based on the second data batch, the first model, the first sequence, and the first encoder.
2. The method of claim 1, further comprising:
acquiring a third data batch;
compressing the second data batch based on the second data batch, the first model, the first sequence, and the first encoder to obtain a second sequence, comprising:
compressing the third data batch to obtain a third sequence based on the third data batch, the first model, the first sequence and the first encoder;
updating the first model based on the third data batch to obtain a second model;
compressing the second data batch to obtain the second sequence based on the second data batch, the second model, the third sequence, and the first encoder.
3. The method of claim 1, further comprising:
acquiring a third data batch;
compressing the second data batch based on the second data batch, the first model, the first sequence, and the first encoder to obtain a second sequence, comprising:
compressing the third data batch based on the third data batch, the first model, the first sequence and the first encoder to obtain a third sequence;
compressing the second data batch to obtain the second sequence based on the second data batch, the first model, the third sequence, and the first encoder.
4. The method according to any one of claims 1 to 3, wherein the updating the parent model based on the first data batch results in a first model comprising:
and updating the mother model by using an optimization algorithm and the first data batch to obtain the first model, wherein the optimization algorithm comprises a gradient-based optimization algorithm, meta-learning or reinforcement learning.
5. The method according to any one of claims 1 to 4, wherein the master model is obtained by training a neural network model with a training data batch in a training data set as an input of the neural network model, and with a loss function having a value smaller than a first threshold as a target, the training data set having a data type same as that of a data set to be compressed, the data set to be compressed including the first data batch and the second data batch, the loss function indicating a difference between probability distribution information output by the neural network model and actual probability distribution information of variable values in the training data batch.
6. The method of any of claims 1 to 5, wherein compressing the first data batch based on the first data batch, the master model, and a first encoder to obtain a first sequence comprises:
inputting the first data batch into the mother model to obtain the first probability distribution information;
compressing the first data batch to obtain the first sequence based on the first probability distribution information and the first encoder.
7. The method of any of claims 1 to 6, wherein compressing the second data batch into a second sequence based on the second data batch, the first model, the first sequence, and the first encoder comprises:
inputting the second data batch into the first model to obtain second probability distribution information, wherein the second probability distribution information is used for representing the probability distribution of values of all variables in the second data batch;
and compressing the second data batch to obtain the second sequence based on the second probability distribution information, the first sequence and the first encoder.
8. The method of claim 2, wherein compressing the second data batch to obtain a second sequence based on the second data batch, the second model, the third sequence, and the first encoder comprises:
inputting the second data batch into the second model to obtain second probability distribution information, wherein the second probability distribution information is used for representing the probability distribution of each variable value in the second data batch;
and compressing the second data batch to obtain the second sequence based on the second probability distribution information, the third sequence and the first encoder.
9. The method of claim 3, wherein compressing the second data batch based on the second data batch, the first model, the third sequence, and the first encoder to obtain the second sequence comprises:
inputting the second data batch into the first model to obtain second probability distribution information, wherein the second probability distribution information is used for representing the probability distribution of values of all variables in the second data batch;
and compressing the second data batch to obtain the second sequence based on the second probability distribution information, the third sequence and the first encoder.
10. The method of claim 8 or 9, wherein compressing the third data batch based on the third data batch, the first model, the first sequence, and the first encoder to obtain a third sequence comprises:
inputting the third data batch into the first model to obtain third probability distribution information, wherein the third probability distribution information is used for representing the probability distribution of each variable value in the third data batch;
and compressing the third data batch to obtain the third sequence based on the third probability distribution information, the first sequence and the first encoder.
11. The method according to any one of claims 1 to 10, further comprising:
determining the first encoder corresponding to the type of the mother model based on a first incidence relation, wherein the first incidence relation is used for representing the incidence relation between the type of the mother model and the first encoder, and the type comprises a full observation model and a hidden variable model.
12. The method of any one of claims 1 to 10, wherein the obtaining a master model comprises:
and acquiring the mother model based on the data type of the data set to be compressed, wherein the data type comprises an image data type and a sequence data type.
13. The method of any one of claims 1 to 12, wherein the obtaining the first data batch and the second data batch comprises:
acquiring a data set to be compressed;
and splitting the data set to be compressed to obtain the first data batch and the second data batch.
14. A method of data compression, the method comprising:
acquiring a first data batch and a second data batch;
acquiring a mother model, wherein the mother model is used for calculating first probability distribution information, and the first probability distribution information is used for representing the probability distribution of each variable value in the first data batch;
updating the mother model based on the first data batch to obtain a first model;
compressing the second data batch to obtain a first sequence based on the second data batch, the first model and a first encoder, wherein the first encoder corresponds to the mother model;
compressing the first data batch to obtain a second sequence based on the first data batch, the mother model, the first sequence and the first encoder.
15. The method of claim 14, further comprising:
acquiring a third data batch;
compressing the second data batch based on the second data batch, the first model, and the first encoder to obtain a first sequence, comprising:
updating the first model based on the third data batch to obtain a second model;
compressing the second data batch to obtain a first sequence based on the second data batch, the second model and a first encoder;
compressing the first data batch based on the first data batch, the mother model, the first sequence, and the first encoder to obtain a second sequence, comprising:
compressing the third data batch to obtain a third sequence based on the third data batch, the first model, the first sequence and the first encoder;
and compressing the first data batch to obtain the second sequence based on the first data batch, the mother model, the third sequence and the first encoder.
16. The method of claim 14, further comprising:
acquiring a third data batch;
compressing the first data batch based on the first data batch, the mother model, the first sequence, and the first encoder to obtain a second sequence, comprising:
compressing the third data batch to obtain a third sequence based on the third data batch, the first model, the first sequence and the first encoder;
compressing the first data batch based on the first data batch, the mother model, the third sequence, and the first encoder to obtain the second sequence.
17. A method of data decompression, the method comprising:
acquiring a second sequence;
acquiring a mother model;
decompressing the second sequence based on the mother model and a first encoder to obtain a fourth sequence and a first data batch, wherein the first encoder corresponds to the mother model;
updating the parent model based on the first data batch to obtain a first model;
and decompressing the fourth sequence based on the first model and the first encoder to obtain a second data batch.
18. The method of claim 17, wherein decompressing the fourth sequence based on the first model and the first encoder to obtain a second data batch comprises:
decompressing the fourth sequence based on the first model and the first encoder to obtain a fifth sequence and a third data batch;
updating the first model based on the third data batch to obtain a second model;
and decompressing the fifth sequence based on the second model and the first encoder to obtain the second data batch.
19. The method of claim 17, wherein decompressing the fourth sequence based on the first model and the first encoder to obtain a second data batch comprises:
decompressing the fourth sequence based on the first model and the first encoder to obtain a fifth sequence and a third data batch;
and decompressing the fifth sequence based on the first model and the first encoder to obtain the second data batch.
20. The method of any one of claims 17 to 19, wherein the updating the mother model based on the first data batch to obtain a first model comprises:
updating the mother model by using an optimization algorithm and the first data batch to obtain the first model, wherein the optimization algorithm comprises a gradient-based optimization algorithm, meta-learning, or reinforcement learning.
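For the gradient-based variant of the optimization algorithm in claim 20, a possible update step is sketched below in Python with PyTorch. It assumes the model's forward pass returns the negative log-likelihood of a batch; the learning rate, number of steps, and the use of SGD are illustrative choices, not values taken from the patent.

import copy
import torch

def update_model(mother_model, batch, lr=1e-3, steps=1):
    # work on a copy so the mother model itself is left unchanged
    model = copy.deepcopy(mother_model)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        nll = model(batch)        # assumed: forward pass returns the batch NLL
        nll.backward()
        optimizer.step()
    return model                  # the "first model" derived from the first data batch

Because the update uses only the already-decoded batch, the decompressor can run exactly the same function and arrive at an identical model.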
21. The method according to any one of claims 17 to 20, wherein the mother model is obtained by training a neural network model with a training data batch in a training data set as the input of the neural network model and with reducing a loss function below a first threshold as the training objective, wherein the training data set has the same data type as a data set to be compressed, the data set to be compressed comprises the first data batch and the second data batch, and the loss function indicates a difference between the probability distribution information output by the neural network model and the actual probability distribution information of the variable values in the training data batch.
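The pre-training of the mother model described in claim 21 can be read as an ordinary density-model training loop that stops once the loss falls below the first threshold. The sketch below assumes a PyTorch model whose forward pass returns the training loss (e.g. a cross-entropy / negative log-likelihood measuring the gap between predicted and empirical distributions); the optimizer and learning rate are illustrative.

import torch

def pretrain_mother_model(model, train_loader, first_threshold, lr=1e-3):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    while True:
        for training_batch in train_loader:
            optimizer.zero_grad()
            loss = model(training_batch)   # assumed: returns the loss of claim 21
            loss.backward()
            optimizer.step()
            if loss.item() < first_threshold:
                return model               # loss below the first threshold: stop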
22. The method according to any one of claims 17 to 21, wherein the decompressing the second sequence based on the mother model and the first encoder to obtain a fourth sequence and a first data batch comprises:
decompressing the second sequence based on the second sequence, the mother model and the first encoder to obtain the fourth sequence and the first data batch;
and the decompressing the fourth sequence based on the first model and the first encoder to obtain a second data batch comprises:
decompressing the fourth sequence based on the fourth sequence, the first model, and the first encoder to obtain the second data batch.
23. The method of claim 22, wherein decompressing the second sequence based on the second sequence, the mother model, and the first encoder to obtain the fourth sequence and the first data batch comprises:
obtaining a first probability based on the mother model;
decompressing the second sequence based on the first probability and the first encoder to obtain a portion of data;
inputting the part of data into the mother model to obtain a second probability;
decompressing the second sequence based on the second probability and the first encoder to obtain another portion of data and the fourth sequence;
and obtaining the first data batch based on the part of data and the other part of data.
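Claim 23 describes the decoding in two stages, which fits an autoregressive mother model: a first distribution decodes part of the first data batch, that partial data is fed back into the model to obtain a second (conditional) distribution, and the remainder of the batch plus the leftover fourth sequence are then decoded. In the sketch below, mother_model.prior(), mother_model.conditional(), and encoder.decode_with() are assumed interfaces, not names from the patent.

def decode_first_batch(second_sequence, mother_model, encoder):
    # first probability (e.g. the model's distribution for the first variables)
    first_prob = mother_model.prior()
    part, rest = encoder.decode_with(second_sequence, first_prob)

    # feed the decoded part back into the mother model -> second probability
    second_prob = mother_model.conditional(part)
    other_part, fourth_sequence = encoder.decode_with(rest, second_prob)

    first_batch = part + other_part    # assumed: simple concatenation of the two parts
    return first_batch, fourth_sequence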
24. The method according to any one of claims 17 to 21, wherein the decompressing the second sequence based on the mother model and the first encoder to obtain a fourth sequence and a first data batch comprises:
acquiring first probability distribution information based on the mother model, wherein the first probability distribution information is used for representing the probability distribution of each variable value in the first data batch;
decompressing the second sequence based on the first probability distribution information and the first encoder to obtain the fourth sequence and the first data batch.
25. The method of any of claims 17 to 24, wherein decompressing the fourth sequence based on the first model and the first encoder to obtain a second data batch comprises:
acquiring second probability distribution information based on the first model, wherein the second probability distribution information is used for representing the probability distribution of each variable value in the second data batch;
and decompressing the fourth sequence based on the second probability distribution information and the first encoder to obtain the second data batch.
26. The method of any one of claims 17 to 25, further comprising:
determining the first encoder corresponding to the type of the mother model based on a first association relationship, wherein the first association relationship is used for representing the association between the type of the mother model and the first encoder, and the type comprises a fully observed model and a latent variable model.
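Claim 26 associates the type of the mother model with the first encoder. A commonly seen pairing, used here purely as an illustrative assumption, is that fully observed (e.g. autoregressive) models are matched with arithmetic- or rANS-style coders, and latent-variable models with bits-back-style coders; the sketch encodes that association as a lookup table.

FIRST_ASSOCIATION = {
    "fully_observed": "arithmetic_or_rans_coder",   # assumed pairing
    "latent_variable": "bits_back_coder",           # assumed pairing
}

def select_first_encoder(mother_model_type):
    # determine the first encoder corresponding to the type of the mother model
    return FIRST_ASSOCIATION[mother_model_type]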
27. The method of any one of claims 17 to 26, further comprising:
combining the first data batch and the second data batch to obtain an initial data set.
28. A compression apparatus, characterized in that the compression apparatus comprises:
an acquisition unit, configured to acquire a first data batch and a second data batch;
the acquisition unit is further configured to acquire a mother model, where the mother model is used to calculate first probability distribution information, and the first probability distribution information is used to represent the probability distribution of the values of each variable in the first data batch;
a compression unit, configured to compress the first data batch to obtain a first sequence based on the first data batch, the mother model, and a first encoder, wherein the first encoder corresponds to the mother model;
an updating unit, configured to update the mother model based on the first data batch to obtain a first model;
the compression unit is further configured to compress the second data batch to obtain a second sequence based on the second data batch, the first model, the first sequence, and the first encoder.
29. A compression apparatus, characterized in that the compression apparatus comprises:
an acquisition unit, configured to acquire a first data batch and a second data batch;
the acquisition unit is further configured to acquire a mother model, where the mother model is used to calculate first probability distribution information, and the first probability distribution information is used to represent the probability distribution of the values of each variable in the first data batch;
an updating unit, configured to update the mother model based on the first data batch to obtain a first model;
a compression unit, configured to compress the second data batch to obtain a first sequence based on the second data batch, the first model, and a first encoder, where the first encoder corresponds to the mother model;
the compression unit is further configured to compress the first data batch to obtain a second sequence based on the first data batch, the mother model, the first sequence, and the first encoder.
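One way to read the unit structure of claim 29 is as a single class whose methods play the roles of the acquisition, updating, and compression units while sharing the mother model and the first encoder. The sketch below reuses the assumed encoder and update_model interfaces of the earlier sketches; all member names are illustrative, not taken from the patent.

class CompressionDevice:
    def __init__(self, mother_model, first_encoder, update_model):
        # acquisition unit: the mother model and first encoder are obtained up front
        self.mother_model = mother_model
        self.first_encoder = first_encoder
        self.update_model = update_model

    def compress(self, first_batch, second_batch):
        # updating unit: mother model + first data batch -> first model
        first_model = self.update_model(self.mother_model, first_batch)
        # compression unit: second data batch + first model -> first sequence
        first_sequence = self.first_encoder.encode(
            second_batch, first_model(second_batch), tail=b"")
        # compression unit: first data batch + mother model, chained -> second sequence
        return self.first_encoder.encode(
            first_batch, self.mother_model(first_batch), tail=first_sequence)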
30. A decompression device, characterized in that it comprises:
an acquisition unit, configured to acquire a second sequence;
the acquisition unit is further configured to acquire a mother model;
a decompression unit, configured to decompress the second sequence based on the mother model and a first encoder to obtain a fourth sequence and a first data batch, wherein the first encoder corresponds to the mother model;
an updating unit, configured to update the mother model based on the first data batch to obtain a first model;
the decompression unit is further configured to decompress the fourth sequence based on the first model and the first encoder to obtain a second data batch.
31. A compression device, comprising a processor coupled with a memory, wherein the memory is configured to store a computer program or instructions, and the processor is configured to execute the computer program or instructions in the memory, such that the method of any one of claims 1 to 13 is performed or the method of any one of claims 14 to 16 is performed.
32. A decompression device, comprising a processor coupled with a memory, wherein the memory is configured to store a computer program or instructions, and the processor is configured to execute the computer program or instructions in the memory, such that the method of any one of claims 17 to 27 is performed.
33. A computer storage medium having stored therein instructions that, when executed on a computer, cause the computer to perform the method of any one of claims 1 to 13, or cause the computer to perform the method of any one of claims 14 to 16, or cause the computer to perform the method of any one of claims 17 to 27.
34. A computer program product, characterized in that the computer program product, when executed on a computer, causes the computer to perform the method of any one of claims 1 to 13, or causes the computer to perform the method of any one of claims 14 to 16, or causes the computer to perform the method of any one of claims 17 to 27.
CN202110584986.1A 2021-05-27 2021-05-27 Data compression method, data decompression method and related equipment Pending CN115409150A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110584986.1A CN115409150A (en) 2021-05-27 2021-05-27 Data compression method, data decompression method and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110584986.1A CN115409150A (en) 2021-05-27 2021-05-27 Data compression method, data decompression method and related equipment

Publications (1)

Publication Number Publication Date
CN115409150A true CN115409150A (en) 2022-11-29

Family

ID=84156564

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110584986.1A Pending CN115409150A (en) 2021-05-27 2021-05-27 Data compression method, data decompression method and related equipment

Country Status (1)

Country Link
CN (1) CN115409150A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116684003A (en) * 2023-07-27 2023-09-01 南京中科齐信科技有限公司 Quantum communication-based railway line air-ground comprehensive monitoring method and system
CN116684003B (en) * 2023-07-27 2023-10-24 南京中科齐信科技有限公司 Quantum communication-based railway line air-ground comprehensive monitoring method and system

Similar Documents

Publication Publication Date Title
CN110084281B (en) Image generation method, neural network compression method, related device and equipment
CN111091045A (en) Sign language identification method based on space-time attention mechanism
CN113259665B (en) Image processing method and related equipment
CN112561028A (en) Method for training neural network model, and method and device for data processing
WO2023010244A1 (en) Neural network accelerator, and data processing method for neural network accelerator
WO2018228399A1 (en) Computing device and method
KR20210152569A (en) Image processing method, image processing apparatus, electronic device and storage medium
WO2022088063A1 (en) Method and apparatus for quantizing neural network model, and method and apparatus for processing data
WO2023020613A1 (en) Model distillation method and related device
CN115081588A (en) Neural network parameter quantification method and device
WO2022179588A1 (en) Data coding method and related device
WO2022156475A1 (en) Neural network model training method and apparatus, and data processing method and apparatus
US11948090B2 (en) Method and apparatus for video coding
CN115409150A (en) Data compression method, data decompression method and related equipment
CN114298289A (en) Data processing method, data processing equipment and storage medium
WO2023174256A1 (en) Data compression method and related device
US20230143985A1 (en) Data feature extraction method and related apparatus
TW202348029A (en) Operation of a neural network with clipped input data
CN116644783A (en) Model training method, object processing method and device, electronic equipment and medium
CN115169548A (en) Tensor-based continuous learning method and device
CN115082306A (en) Image super-resolution method based on blueprint separable residual error network
CN115022637A (en) Image coding method, image decompression method and device
CN115409697A (en) Image processing method and related device
CN114730331A (en) Data processing apparatus and data processing method
CN114973049B (en) Lightweight video classification method with unified convolution and self-attention

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination