WO2023174256A1 - A data compression method and related device - Google Patents

A data compression method and related device

Info

Publication number
WO2023174256A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
sub
probability distribution
bit stream
neural network
Prior art date
Application number
PCT/CN2023/081315
Other languages
English (en)
French (fr)
Inventor
张琛
莱德汤姆
康宁
张世枫
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2023174256A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00: Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/04: Protocols for data compression, e.g. ROHC
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/91: Entropy coding, e.g. variable length coding [VLC] or arithmetic coding

Definitions

  • This application relates to the field of artificial intelligence, and in particular, to a data compression method and related equipment.
  • Multimedia data now accounts for the vast majority of Internet traffic. Compression of image data plays a vital role in the storage and efficient transmission of multimedia data. Therefore, image coding is a technology with great practical value.
  • The artificial-intelligence-based lossless compression scheme takes advantage of the ability of deep generative models to estimate the probability distribution of data more accurately than traditional schemes, and achieves a compression ratio far superior to traditional lossless compression schemes.
  • Widely used deep generative models include autoregressive models, variational autoencoders (VAE), and normalizing flows.
  • The autoregressive model is well suited to arithmetic encoders and Huffman coding;
  • the variational autoencoder combined with the inverse encoding (bits-back) mechanism is well suited to asymmetric numeral systems (ANS);
  • the flow model is compatible with all three of the entropy encoders described above.
  • Lossless compression solutions are also evaluated by their throughput.
  • the variational autoencoder model is a latent variable model.
  • This type of model does not directly model the data itself, but instead introduces one (or more) latent variables, and then models the prior distribution, likelihood function and approximate posterior distribution. Since the marginal distribution of the data cannot be directly obtained from the variational autoencoder, the traditional entropy coding method cannot be directly used.
  • A variational autoencoder lossless compression scheme based on the inverse encoding (bits-back) mechanism has been proposed. bits-back ANS is the original form of this scheme; it is suitable for variational autoencoder models containing only one latent variable, and can be generalized to models containing multiple latent variables.
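The ANS-family entropy coder that such schemes rely on can be sketched as follows. This is a minimal, hedged illustration: Python big integers stand in for the renormalized streaming state of a production rANS coder, and the two-symbol alphabet is a toy example, not the patent's implementation.

```python
def build_tables(freqs):
    """freqs: dict symbol -> integer frequency; returns (cumulative, total)."""
    cum, c = {}, 0
    for s in sorted(freqs):
        cum[s] = c
        c += freqs[s]
    return cum, c

def rans_encode(symbols, freqs):
    cum, total = build_tables(freqs)
    x = 1                                  # initial coder state
    for s in reversed(symbols):            # encode in reverse, decode forward
        x = (x // freqs[s]) * total + cum[s] + (x % freqs[s])
    return x

def rans_decode(x, n, freqs):
    cum, total = build_tables(freqs)
    out = []
    for _ in range(n):
        slot = x % total
        # linear scan is fine for a sketch; real coders use a lookup table
        s = next(t for t in sorted(freqs) if cum[t] <= slot < cum[t] + freqs[t])
        out.append(s)
        x = freqs[s] * (x // total) + slot - cum[s]
    return out

freqs = {"a": 3, "b": 1}                   # P(a) = 0.75, P(b) = 0.25
msg = list("aababaaa")
state = rans_encode(msg, freqs)
assert rans_decode(state, len(msg), freqs) == msg
```

The key property used by bits-back-style schemes is that encoding and decoding are exact inverses acting on a single integer state, so streams can be stacked on top of one another.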
  • This application provides a data compression method. Compared with the inverse encoding mechanism in the prior art, which requires additional initial bits, the embodiments of this application require no additional initial bits, can compress a single data point, and greatly reduce the coding overhead during parallel compression.
  • this application provides a data compression method, including: acquiring first target data, where the first target data includes first sub-data and second sub-data;
  • the first target data may be image data for compression or other data (such as text, video, binary stream, etc.).
  • the first target data is an image block
  • the first sub-data and the second sub-data are obtained after data segmentation of the image block
  • the first target data is a text sequence, and the first sub-data and the second sub-data are obtained after data segmentation of the text sequence; or,
  • the first target data is a binary stream, and the first sub-data and the second sub-data are obtained after data segmentation of the binary stream; or,
  • the first target data is a video
  • the first sub-data and the second sub-data are obtained by data segmenting multiple image frames of the video.
  • the first sub-data and the second sub-data are obtained by segmenting the image block in a spatial dimension or a channel dimension.
  • For image data, there is one channel dimension (C) and two spatial dimensions (width W and height H), so the segmentation can be along either a spatial dimension or the channel dimension.
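As an illustration of the segmentation described above, the following sketch splits a (C, H, W) image block along either the channel dimension or a spatial dimension; the shapes and the use of NumPy are assumptions for demonstration only.

```python
import numpy as np

x = np.arange(2 * 4 * 4).reshape(2, 4, 4)   # a (C, H, W) image block

# Channel-dimension segmentation: one channel per sub-data.
first_c, second_c = np.split(x, 2, axis=0)

# Spatial-dimension segmentation: top half vs. bottom half along H.
first_s, second_s = np.split(x, 2, axis=1)

assert first_c.shape == (1, 4, 4) and second_c.shape == (1, 4, 4)
assert first_s.shape == (2, 2, 4) and second_s.shape == (2, 2, 4)
```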
  • According to the first sub-data, a first probability distribution is obtained through the first decoder of the variational autoencoder, and the first probability distribution is used as the conditional probability distribution of the second sub-data; according to the first probability distribution, the second sub-data is compressed through an entropy encoder to obtain a first bit stream; using the first bit stream as the initial bit stream, the first sub-data is compressed (that is, the first sub-data is compressed into the first bit stream) to obtain a second bit stream.
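The compression order described above (code the second sub-data first, then push the first sub-data onto the same bit stream so no extra initial bits are needed) can be sketched with a toy rANS-style coder. Here `cond_freqs` is a hypothetical stand-in for the conditional distribution produced by the first decoder, and `UNIFORM` is a stand-in distribution for the first sub-data; neither is the patent's actual model.

```python
def tables(freqs):
    cum, c = {}, 0
    for s in freqs:
        cum[s] = c
        c += freqs[s]
    return cum, c

def encode_sym(x, s, freqs, cum, total):
    return (x // freqs[s]) * total + cum[s] + (x % freqs[s])

def decode_sym(x, freqs, cum, total):
    slot = x % total
    s = next(t for t in freqs if cum[t] <= slot < cum[t] + freqs[t])
    return s, freqs[s] * (x // total) + slot - cum[s]

def cond_freqs(first):
    # Hypothetical "first decoder": the second sub-data's distribution
    # is conditioned on the first sub-data (here, simply on its parity).
    return {0: 3, 1: 1} if sum(first) % 2 == 0 else {0: 1, 1: 3}

UNIFORM = {0: 1, 1: 1}  # stand-in distribution for the first sub-data

def compress(first, second):
    f2 = cond_freqs(first)
    cum2, t2 = tables(f2)
    x = 1                                   # no extra initial bits needed
    for s in reversed(second):              # first bit stream: second sub-data
        x = encode_sym(x, s, f2, cum2, t2)
    cum1, t1 = tables(UNIFORM)
    for s in reversed(first):               # second bit stream: first sub-data
        x = encode_sym(x, s, UNIFORM, cum1, t1)  # pushed onto the same state
    return x

def decompress(x, n1, n2):
    cum1, t1 = tables(UNIFORM)
    first = []
    for _ in range(n1):                     # first sub-data comes off first
        s, x = decode_sym(x, UNIFORM, cum1, t1)
        first.append(s)
    f2 = cond_freqs(first)                  # conditional is now available
    cum2, t2 = tables(f2)
    second = []
    for _ in range(n2):
        s, x = decode_sym(x, f2, cum2, t2)
        second.append(s)
    return first, second

first, second = [1, 0, 1], [0, 0, 1, 0]
state = compress(first, second)
assert decompress(state, len(first), len(second)) == (first, second)
```

Because the coder state is a stack, the decoder recovers the first sub-data before it needs the conditional distribution of the second sub-data, mirroring the order in the text.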
  • The embodiments of the present application do not require additional initial bits, can compress a single data point, and greatly reduce the coding overhead during parallel compression.
  • the first target data is an image block
  • the first sub-data and the second sub-data are obtained after data segmentation of the image block.
  • the first sub-data and the second sub-data are obtained by segmenting the image block in a spatial dimension or a channel dimension.
  • the variational autoencoder may include a variational encoder, a decoder (such as the first decoder and the second decoder in the embodiment of the present application) and a prior distribution of the latent variable.
  • The decoder may be composed of decoder layers (such as the first convolutional neural network and the second convolutional neural network in the embodiments of the present application), and the number of decoder layers is the same as the number of latent variables in the variational autoencoder.
  • The function of a decoder layer is to take deeper latent variables as input and output the conditional probability distribution of the current layer's data (the current layer's data can be shallower latent variables or the data itself).
  • The variational encoder takes the entire data as input to predict the approximate posterior distribution of the latent variable, and the decoder takes the latent variable as input to directly predict the conditional probability distribution of the entire data.
  • the data to be compressed is divided into at least two parts, namely: first sub-data and second sub-data.
  • the conditional probability distribution of the first sub-data is predicted; the conditional probability distribution of the second sub-data depends on the first sub-data, which can be specifically determined by inputting the first sub-data into the first decoder.
  • the decoder may implement a pixel reset operation.
  • The first decoder may include a first convolutional neural network and a second convolutional neural network. Obtaining the first probability distribution through the first decoder of the variational autoencoder according to the first sub-data may specifically include: performing a pixel reset operation from the spatial dimension to the channel dimension on second target data that includes the second sub-data, to obtain third sub-data;
  • the second target data has the same size as the first target data, and the third sub-data and the first sub-data have the same size in the spatial dimension;
  • The second target data including the second sub-data may be data with the same size as the first target data, where elements of the first target data other than the second sub-data may be set to zero (or other preset values) to obtain the second target data. After a pixel reset operation is performed on the second target data, it can be converted into third sub-data that has the same size as the first sub-data in the spatial dimension.
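The pixel reset (space-to-depth) operation described above can be sketched as follows; the block size of 2 and the NumPy layout are illustrative assumptions.

```python
import numpy as np

def space_to_depth(x, r=2):
    """Rearrange a (C, H, W) tensor so each r x r spatial block moves into
    the channel dimension, giving (C*r*r, H/r, W/r)."""
    c, h, w = x.shape
    x = x.reshape(c, h // r, r, w // r, r)        # split H and W into blocks
    x = x.transpose(0, 2, 4, 1, 3)                # bring block offsets forward
    return x.reshape(c * r * r, h // r, w // r)   # fold offsets into channels

x = np.arange(16).reshape(1, 4, 4)
y = space_to_depth(x)
assert y.shape == (4, 2, 2)
# channel 0 collects the top-left pixel of every 2x2 block:
assert (y[0] == np.array([[0, 2], [8, 10]])).all()
```

The spatial size is halved in each direction while the channel count grows fourfold, which is how the third sub-data can match the first sub-data in the spatial dimension.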
  • The embodiment of the present application fully utilizes the correlation between image pixels by using an encoder layer based on an autoregressive structure defined by channel-first pixel reset, thereby significantly reducing the number of parameters required by the model while obtaining a shorter code length. This improves the compression throughput and reduces the space cost of model storage.
  • According to the first sub-data, fourth sub-data can be obtained through the first convolutional neural network, and the fourth sub-data and the third sub-data have the same size in the channel dimension. That is to say, feature extraction and size transformation can be performed on the first sub-data through the first convolutional neural network to obtain fourth sub-data whose size in the channel dimension matches that of the third sub-data.
  • the third sub-data and the fourth sub-data may be fused to obtain fused sub-data.
  • the fusion method can be data replacement of the corresponding channel.
  • The fusion of the third sub-data and the fourth sub-data may specifically include: replacing the data of some channels in the fourth sub-data with the data of the corresponding channels in the third sub-data to obtain the fused sub-data.
  • the first probability distribution can be obtained through the second convolutional neural network according to the fused sub-data.
  • The fused sub-data and the first sub-data can also be concatenated along the channel dimension to obtain spliced sub-data; in that case, obtaining the first probability distribution through the second convolutional neural network according to the fused sub-data may specifically include: obtaining the first probability distribution through the second convolutional neural network based on the spliced sub-data.
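The channel-replacement fusion and channel-dimension splicing described above can be sketched as follows; the channel mask, shapes, and values are illustrative assumptions, not the patent's configuration.

```python
import numpy as np

third = np.ones((4, 2, 2))             # (C, H, W): output of the pixel reset
fourth = np.zeros((4, 2, 2))           # (C, H, W): output of the first CNN

# Fusion: overwrite the channels selected by a (hypothetical) mask with the
# corresponding channels of the third sub-data.
keep = np.array([True, False, True, False])
fused = np.where(keep[:, None, None], third, fourth)

# Splicing: concatenate the fused sub-data and the first sub-data along
# the channel dimension before feeding the second CNN.
first = np.full((2, 2, 2), 5.0)        # stand-in first sub-data
spliced = np.concatenate([fused, first], axis=0)

assert fused.shape == (4, 2, 2) and spliced.shape == (6, 2, 2)
```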
  • this application provides a data decompression method, including:
  • a first probability distribution is obtained through the first decoder of the variational autoencoder, and the first probability distribution is used as a conditional probability distribution of the second sub-data;
  • second sub-data is decompressed from the first bit stream through an entropy encoder; the first sub-data and the second sub-data are used to restore the first target data.
  • decoding the first sub-data from the second bit stream to obtain the first bit stream includes:
  • a second probability distribution is obtained through the second decoder of the variational autoencoder; the second probability distribution is used as a conditional probability distribution of the first sub-data;
  • the approximate posterior distribution of the latent variable is obtained through the variational encoder in the variational autoencoder;
  • the latent variable is compressed to the third bit stream through the entropy encoder to obtain a first bit stream.
  • the first target data is an image block
  • the first sub-data and the second sub-data are obtained after data segmentation of the image block
  • the first target data is a text sequence, and the first sub-data and the second sub-data are obtained after data segmentation of the text sequence; or,
  • the first target data is a binary stream, and the first sub-data and the second sub-data are obtained after data segmentation of the binary stream;
  • the first target data is a video
  • the first sub-data and the second sub-data are obtained by data segmenting multiple image frames of the video.
  • the first sub-data and the second sub-data are obtained by segmenting the image block in a spatial dimension or a channel dimension.
  • the first decoder includes a first convolutional neural network and a second convolutional neural network, and obtaining the first probability distribution through the first decoder of the variational autoencoder according to the first sub-data includes:
  • the fourth sub-data is obtained through the first convolutional neural network, and the fourth sub-data and the third sub-data have the same size in the channel dimension;
  • the first probability distribution is obtained through the second convolutional neural network.
  • the fusion of the third sub-data and the fourth sub-data includes:
  • the method further includes:
  • Obtaining the first probability distribution based on the fused sub-data through the second convolutional neural network includes: obtaining the first probability distribution through the second convolutional neural network based on the spliced sub-data.
  • this application provides a data compression device, including:
  • An acquisition module configured to acquire first target data, where the first target data includes first sub-data and second sub-data;
  • a compression module configured to obtain a first probability distribution according to the first sub-data through the first decoder of the variational autoencoder, and the first probability distribution is used as a conditional probability distribution of the second sub-data.
  • the first sub-data is compressed into the first bit stream to obtain a second bit stream.
  • the first target data is an image block
  • the first sub-data and the second sub-data are obtained after data segmentation of the image block
  • the first target data is a text sequence, and the first sub-data and the second sub-data are obtained after data segmentation of the text sequence; or,
  • the first target data is a binary stream, and the first sub-data and the second sub-data are obtained after data segmentation of the binary stream;
  • the first target data is a video
  • the first sub-data and the second sub-data are obtained by data segmenting multiple image frames of the video.
  • the first sub-data and the second sub-data are obtained by segmenting the image block in a spatial dimension or a channel dimension.
  • the compression module is specifically used to:
  • the approximate posterior distribution of the latent variable is obtained through the variational encoder in the variational autoencoder;
  • a second probability distribution is obtained through the second decoder of the variational autoencoder; the second probability distribution is used as a conditional probability distribution of the first sub-data;
  • the first sub-data is compressed to the third bit stream by the entropy encoder to obtain a fourth bit stream;
  • the latent variable is compressed to the fourth bit stream through the entropy encoder to obtain a second bit stream.
  • the first decoder includes a first convolutional neural network and a second convolutional neural network
  • the compression module is specifically used to:
  • the fourth sub-data is obtained through the first convolutional neural network, and the fourth sub-data and the third sub-data have the same size in the channel dimension;
  • the first probability distribution is obtained through the second convolutional neural network.
  • the fusion of the third sub-data and the fourth sub-data includes:
  • the device further includes:
  • a splicing module configured to splice the fused sub-data and the first sub-data along the channel dimension to obtain the spliced sub-data
  • Obtaining the first probability distribution based on the fused sub-data through the second convolutional neural network includes: obtaining the first probability distribution through the second convolutional neural network based on the spliced sub-data.
  • this application provides a data decompression device, including:
  • the acquisition module is used to acquire the second bit stream
  • a decompression module configured to decode the first sub-data from the second bit stream to obtain the first bit stream
  • a first probability distribution is obtained through the first decoder of the variational autoencoder, and the first probability distribution is used as a conditional probability distribution of the second sub-data;
  • second sub-data is decompressed from the first bit stream through an entropy encoder; the first sub-data and the second sub-data are used to restore the first target data.
  • the decompression module is specifically used for:
  • a second probability distribution is obtained through the second decoder of the variational autoencoder; the second probability distribution is used as a conditional probability distribution of the first sub-data;
  • the approximate posterior distribution of the latent variable is obtained through the variational encoder in the variational autoencoder;
  • the latent variable is compressed to the third bit stream through the entropy encoder to obtain a first bit stream.
  • the first target data is an image block
  • the first sub-data and the second sub-data are obtained after data segmentation of the image block
  • the first target data is a text sequence, and the first sub-data and the second sub-data are obtained after data segmentation of the text sequence; or,
  • the first target data is a binary stream, and the first sub-data and the second sub-data are obtained after data segmentation of the binary stream;
  • the first target data is a video
  • the first sub-data and the second sub-data are obtained by data segmenting multiple image frames of the video.
  • the first sub-data and the second sub-data are obtained by segmenting the image block in a spatial dimension or a channel dimension.
  • the first decoder includes a first convolutional neural network and a second convolutional neural network
  • the decompression module is specifically used to:
  • the fourth sub-data is obtained through the first convolutional neural network, and the fourth sub-data and the third sub-data have the same size in the channel dimension;
  • the first probability distribution is obtained through the second convolutional neural network.
  • the fusion of the third sub-data and the fourth sub-data includes:
  • the device further includes:
  • a splicing module configured to splice the fused sub-data and the first sub-data along the channel dimension to obtain the spliced sub-data
  • Obtaining the first probability distribution based on the fused sub-data through the second convolutional neural network includes: obtaining the first probability distribution through the second convolutional neural network based on the spliced sub-data.
  • The present application provides a data compression device, including a storage medium, a processing circuit, and a bus system; the storage medium is used to store instructions, and the processing circuit is used to execute the instructions in the storage medium to perform the data compression method described in any one of the first aspect.
  • The present application provides a data decompression device, including a storage medium, a processing circuit, and a bus system; the storage medium is used to store instructions, and the processing circuit is used to execute the instructions in the storage medium to perform the data decompression method described in any one of the second aspect.
  • Embodiments of the present application provide a computer-readable storage medium.
  • The computer-readable storage medium stores a computer program that, when run on a computer, causes the computer to execute the data compression method described in any one of the first aspect.
  • Embodiments of the present application provide a computer-readable storage medium.
  • The computer-readable storage medium stores a computer program that, when run on a computer, causes the computer to execute the data decompression method described in any one of the second aspect.
  • Embodiments of the present application provide a computer program that, when run on a computer, causes the computer to perform the data compression method described in any one of the first aspect.
  • Embodiments of the present application provide a computer program that, when run on a computer, causes the computer to perform the data decompression method described in any one of the second aspect.
  • The present application provides a chip system that includes a processor for supporting an execution device (such as a data compression device or a data decompression device) or a training device in implementing the functions involved in the above aspects, for example, sending or processing the data and/or information involved in the above methods.
  • The chip system further includes a memory, which is used to store the program instructions and data necessary for the execution device or the training device.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • Embodiments of the present application provide a data compression method, including: obtaining first target data, the first target data including first sub-data and second sub-data; obtaining a first probability distribution through the first decoder of the variational autoencoder according to the first sub-data, the first probability distribution being used as the conditional probability distribution of the second sub-data; compressing the second sub-data through an entropy encoder according to the first probability distribution to obtain a first bit stream; and compressing the first sub-data into the first bit stream to obtain a second bit stream.
  • The embodiment of the present application does not require additional initial bits, can compress a single data point, and reduces the coding overhead during parallel compression.
  • Figure 1 is a structural schematic diagram of the main framework of artificial intelligence
  • Figure 2 is a schematic diagram of the application scenario of the embodiment of the present application.
  • Figure 3 is a schematic diagram of the application scenario of the embodiment of the present application.
  • Figure 4 is a schematic diagram of the data processing process based on CNN
  • Figure 5 is a schematic diagram of the data processing process based on CNN
  • Figure 6 is a schematic diagram of an embodiment of a system architecture method provided by an embodiment of the present application.
  • Figure 7 is a schematic structural diagram of a chip provided by an embodiment of the present application.
  • Figure 8 is a schematic flowchart of a data compression method provided by an embodiment of the present application.
  • Figure 9 is a schematic diagram of an embodiment of a pixel replacement operation provided by an embodiment of the present application.
  • Figure 10 is a schematic diagram of the processing flow of a decoder provided by an embodiment of the present application.
  • Figure 11 is a schematic structural diagram of a decoder provided by an embodiment of the present application.
  • Figure 12 is a schematic flowchart of a data compression method provided by an embodiment of the present application.
  • Figure 13 is a schematic flowchart of a data compression method provided by an embodiment of the present application.
  • Figure 14 is a schematic flowchart of a data decompression method provided by an embodiment of the present application.
  • Figure 15 is a schematic structural diagram of a data compression device provided by an embodiment of the present application.
  • Figure 16 is a schematic structural diagram of a data decompression device provided by an embodiment of the present application.
  • Figure 17 is a schematic structural diagram of an execution device provided by an embodiment of the present application.
  • Figure 1 shows a structural schematic diagram of the artificial intelligence main framework.
  • The artificial intelligence framework above is elaborated below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis).
  • The "intelligent information chain" reflects a series of processes from data acquisition to processing. For example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, the data undergoes a condensation process of "data, information, knowledge, wisdom".
  • The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure and information (providing and processing technology implementations) of artificial intelligence to the industrial ecology of the system.
  • Infrastructure provides computing power support for artificial intelligence systems, enables communication with the external world, and supports it through basic platforms.
  • computing power is provided by smart chips (hardware acceleration chips such as CPU, NPU, GPU, ASIC, FPGA, etc.);
  • The basic platform includes distributed computing frameworks, networks, and other related platform guarantees and support, which can include cloud storage and computing, interconnection networks, etc.
  • sensors communicate with the outside world to obtain data, which are provided to smart chips in the distributed computing system provided by the basic platform for calculation.
  • Data at the layer above the infrastructure is used to represent the data sources of the field of artificial intelligence.
  • the data involves graphics, images, voice, and text, as well as IoT data of traditional devices, including business data of existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
  • Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making and other methods.
  • machine learning and deep learning can perform symbolic and formal intelligent information modeling, extraction, preprocessing, training, etc. on data.
  • Reasoning refers to the process of simulating human intelligent reasoning in computers or intelligent systems, using formal information to perform machine thinking and problem solving based on reasoning control strategies. Typical functions are search and matching.
  • Decision-making refers to the process of decision-making after intelligent information is reasoned, and usually provides functions such as classification, sorting, and prediction.
  • Based on the results of further data processing, some general capabilities can be formed, such as algorithms or a general system, for example translation, text analysis, computer vision processing, speech recognition, image recognition, etc.
  • Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields; they are the encapsulation of overall artificial intelligence solutions, productizing intelligent information decision-making and realizing practical applications. The application fields mainly include intelligent terminals, intelligent transportation, smart healthcare, autonomous driving, smart cities, etc.
  • This application can be applied to the field of data compression in the field of artificial intelligence.
  • the following will introduce multiple application scenarios that have been implemented into products.
  • the image compression method provided by the embodiment of the present application can be applied to the image compression process in the terminal device. Specifically, it can be applied to the photo album, video monitoring, etc. on the terminal device.
  • Figure 2 is a schematic diagram of an application scenario according to an embodiment of the present application.
  • A terminal device can obtain a picture to be compressed, where the picture to be compressed can be a photo taken by a camera or a frame taken from a video.
  • The terminal device can extract features of the acquired image to be compressed through the artificial intelligence (AI) coding unit in the embedded neural network processing unit (NPU), transforming the image data into output features with lower redundancy.
  • AI artificial intelligence
  • NPU embedded neural network processing unit
  • The central processing unit (CPU) performs arithmetic coding on the extracted output features using the probability estimate of each point in the output features, reducing the coding redundancy of the output features, further reducing the amount of data transmitted in the image compression process, and saving the encoded data as a data file in the corresponding storage location.
  • The CPU can obtain and load the saved file from the corresponding storage location, obtain the decoded feature map based on arithmetic decoding, and use the AI decoding unit in the NPU to reconstruct the feature map to obtain the reconstructed image.
  • the image compression method provided by the embodiment of the present application can be applied to the image compression process on the cloud side. Specifically, it can be applied to functions such as cloud photo albums on the cloud side server.
  • Figure 3 is a schematic diagram of an application scenario according to an embodiment of the present application.
  • A terminal device can obtain a picture to be compressed, where the picture to be compressed can be a photo taken by a camera or a frame taken from a video.
  • the terminal device can use the CPU to perform lossless encoding and compression on the image to be compressed to obtain the encoded data. For example, but not limited to, any lossless compression method based on the existing technology.
  • The terminal device can transmit the encoded data to the server on the cloud side, and the server can decode the received encoded data accordingly to obtain the image to be compressed.
  • The server can extract features of the image to be compressed through the AI encoding unit in the graphics processing unit (GPU), transforming the image data into output features with lower redundancy and generating a probability estimate for each point in the output features.
  • The CPU performs arithmetic coding on the extracted output features using the probability estimate of each point in the output features, reducing the coding redundancy of the output features and further reducing the amount of data transferred during the image compression process; the encoded data obtained by encoding is saved as a data file in the corresponding storage location.
  • The CPU can obtain and load the saved file from the corresponding storage location, obtain the decoded feature map based on arithmetic decoding, and use the AI decoding unit in the NPU to reconstruct the feature map to obtain the reconstructed image.
  • the server can perform lossless encoding and compression on the image to be compressed through the CPU to obtain the encoded data, for example, using (but not limited to) any lossless compression method based on the existing technology. The server can transmit the encoded data to the terminal device, and the terminal device can perform corresponding lossless decoding on the received encoded data to obtain the decoded image.
  • the neural network can be composed of neural units.
  • the neural unit can refer to an arithmetic unit that takes x_s and an intercept of 1 as input.
  • the output of the arithmetic unit can be: h_{W,b}(x) = f(W^T x) = f(Σ_s W_s·x_s + b), where W_s is the weight of x_s and b is the bias of the neural unit.
  • f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal.
  • the output signal of this activation function can be used as the input of the next convolutional layer, and the activation function can be a sigmoid function.
  • a neural network is a network formed by connecting multiple above-mentioned single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected to the local receptive field of the previous layer to extract the features of the local receptive field.
  • the local receptive field can be an area composed of several neural units.
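  • The neural-unit computation described above can be sketched as follows (a minimal illustration with made-up weights and inputs, not the patent's implementation), using the sigmoid activation mentioned in the text:

```python
import math

def neural_unit(xs, ws, b):
    """Compute f(sum_s W_s * x_s + b), with f the sigmoid activation."""
    z = sum(w * x for w, x in zip(ws, xs)) + b
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid introduces nonlinearity

# Example: two inputs with illustrative weights and bias.
out = neural_unit(xs=[0.5, -1.0], ws=[0.8, 0.2], b=0.1)
print(round(out, 4))  # 0.5744
```

The output signal of the activation function can then serve as the input to the next layer, exactly as the text describes.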
  • A deep neural network (DNN), also known as a multi-layer neural network, can be understood as a neural network with multiple hidden layers.
  • A DNN is divided according to the positions of its different layers: the layers of the DNN can be divided into three categories, namely input layer, hidden layers, and output layer.
  • the first layer is the input layer
  • the last layer is the output layer
  • the layers in between are hidden layers.
  • the layers are fully connected, that is to say, any neuron in the i-th layer must be connected to any neuron in the i+1-th layer.
  • Although a DNN looks very complicated, the work of each layer is actually not complicated. Simply put, it is the following linear relationship expression: y = α(W·x + b), where x is the input vector, y is the output vector, b is the offset vector, W is the weight matrix (also called coefficients), and α() is the activation function.
  • Each layer simply performs this operation on the input vector x to obtain the output vector y. Because a DNN has many layers, the number of coefficient matrices W and offset vectors b is also large.
  • The definitions of these parameters in a DNN are as follows. Taking the coefficient W as an example: assume that in a three-layer DNN, the linear coefficient from the 4th neuron in the second layer to the 2nd neuron in the third layer is defined as W^3_{24}, where the superscript 3 represents the layer in which the coefficient W is located, and the subscripts correspond to the output index 2 in the third layer and the input index 4 in the second layer.
  • In summary, the coefficient from the k-th neuron in layer L-1 to the j-th neuron in layer L is defined as W^L_{jk}.
  • the input layer has no W parameter.
  • more hidden layers make the network more capable of describing complex situations in the real world. Theoretically, a model with more parameters has higher complexity and greater "capacity", which means it can complete more complex learning tasks.
  • Training a deep neural network is the process of learning the weight matrix. The ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (a weight matrix formed by the vectors W of many layers).
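  • The per-layer relationship y = α(W·x + b) above can be sketched as a forward pass through a tiny DNN (layer sizes and random weights are illustrative only; ReLU stands in for the activation α):

```python
import numpy as np

rng = np.random.default_rng(0)

def fc_layer(x, W, b):
    # y = alpha(W x + b); alpha is taken as ReLU here for illustration
    return np.maximum(0.0, W @ x + b)

# A three-layer DNN: input layer (4 neurons) -> hidden layer (5) -> output (2).
# The input layer has no W parameter; W2 maps layer 1 -> layer 2, etc.
W2, b2 = rng.standard_normal((5, 4)), rng.standard_normal(5)
W3, b3 = rng.standard_normal((2, 5)), rng.standard_normal(2)

# W3[1, 3] is the coefficient from the 4th neuron of layer 2 to the
# 2nd neuron of layer 3 -- the W^3_{24} of the text, with 0-based indices.
x = rng.standard_normal(4)                     # input vector
y = fc_layer(fc_layer(x, W2, b2), W3, b3)      # chained layer operations
print(y.shape)  # (2,)
```

Training then consists of adjusting every W and b so that the network's outputs approach the targets.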
  • Convolutional neural network is a deep neural network with a convolutional structure.
  • the convolutional neural network contains a feature extractor consisting of a convolutional layer and a subsampling layer, which can be regarded as a filter.
  • the convolutional layer refers to the neuron layer in the convolutional neural network that convolves the input signal.
  • a neuron can be connected to only some of the neighboring layer neurons.
  • a convolutional layer usually contains several feature planes, and each feature plane can be composed of some rectangularly arranged neural units. Neural units in the same feature plane share weights, and the shared weights here are convolution kernels.
  • Shared weights can be understood as extracting features in a way that is independent of location.
  • the convolution kernel can be formalized as a matrix of random size. During the training process of the convolutional neural network, the convolution kernel can obtain reasonable weights through learning. In addition, the direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.
  • CNN is a very common neural network.
  • a convolutional neural network is a deep neural network with a convolutional structure. It is a deep learning architecture.
  • the deep learning architecture refers to performing multiple levels of learning at different levels of abstraction through machine learning algorithms.
  • CNN is a feed-forward artificial neural network. Each neuron in the feed-forward artificial neural network can respond to the image input into it.
  • a convolutional neural network (CNN) 200 may include an input layer 210, a convolutional layer/pooling layer 220 (where the pooling layer is optional), and a fully connected layer 230.
  • the convolutional layer/pooling layer 220 may include layers 221-226 as examples.
  • Layer 221 is a convolution layer
  • layer 222 is a pooling layer
  • layer 223 is a convolution layer
  • layer 224 is a pooling layer
  • layer 225 is a convolution layer
  • layer 226 is a pooling layer
  • in another example, layers 221 and 222 are convolution layers, layer 223 is a pooling layer, layers 224 and 225 are convolution layers, and layer 226 is a pooling layer. That is, the output of a convolutional layer can be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
  • Taking convolutional layer 221 as an example, the internal working principle of a convolutional layer is introduced below.
  • the convolution layer 221 can include many convolution operators.
  • the convolution operator is also called a kernel. Its role in image processing is equivalent to a filter that extracts specific information from the input image matrix.
  • the convolution operator can essentially be a weight matrix, and this weight matrix is usually predefined. During the convolution operation on an image, the weight matrix is usually moved along the horizontal direction of the input image one pixel at a time (or two pixels at a time, depending on the value of the stride), thereby completing the extraction of a specific feature from the image.
  • the size of the weight matrix should be related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image.
  • the weight matrix extends over the entire depth of the input image. Therefore, convolution with a single weight matrix produces a convolved output with a single depth dimension; in most cases, however, instead of using a single weight matrix, multiple weight matrices of the same size (rows × columns) are applied, that is, multiple matrices of the same type.
  • the output of each weight matrix is stacked to form the depth dimension of the convolution image.
  • the dimension here can be understood as being determined by the "multiple" mentioned above.
  • Different weight matrices can be used to extract different features in the image. For example, one weight matrix is used to extract edge information of the image, another weight matrix is used to extract specific colors of the image, and another weight matrix is used to remove unnecessary noise in the image.
  • the multiple weight matrices have the same size (rows × columns), so the feature maps extracted by these same-size weight matrices also have the same size; the extracted feature maps of the same size are then merged to form the output of the convolution operation.
  • The weight values in these weight matrices need to be obtained through a large amount of training in practical applications. Each weight matrix formed by the trained weight values can be used to extract information from the input image, thereby allowing the convolutional neural network 200 to make correct predictions.
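  • The way each kernel spans the full input depth and the per-kernel outputs stack into the depth dimension can be sketched with a stride-1 valid convolution (tiny made-up image and all-ones kernels, purely illustrative):

```python
import numpy as np

def conv2d(image, kernels, stride=1):
    """Valid convolution. image: (C, H, W); kernels: (K, C, kh, kw).
    Each kernel spans the full input depth C, so one kernel yields one
    feature map; the K maps are stacked into the output depth dimension."""
    C, H, W = image.shape
    K, _, kh, kw = kernels.shape
    Ho = (H - kh) // stride + 1
    Wo = (W - kw) // stride + 1
    out = np.zeros((K, Ho, Wo))
    for k in range(K):
        for i in range(Ho):
            for j in range(Wo):
                patch = image[:, i * stride:i * stride + kh,
                              j * stride:j * stride + kw]
                out[k, i, j] = np.sum(patch * kernels[k])
    return out

img = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
kers = np.ones((3, 2, 2, 2))  # three same-size kernels -> output depth 3
feat = conv2d(img, kers)
print(feat.shape)  # (3, 3, 3): three stacked feature maps
```

In a trained network the kernel weights would be learned rather than fixed to ones.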
  • the features extracted by the initial convolutional layer (for example, 221) may be relatively simple, while, as the depth of the convolutional neural network increases, the features extracted by subsequent convolutional layers (for example, 226) become more and more complex, such as high-level semantic features.
  • for the layers 221-226 shown at 220 in Figure 4, one convolution layer may be followed by one pooling layer, or multiple convolution layers may be followed by one or more pooling layers.
  • the only purpose of the pooling layer is to reduce the spatial size of the image.
  • the pooling layer may include an average pooling operator and/or a maximum pooling operator for sampling the input image to obtain a smaller size image.
  • the average pooling operator can calculate the pixel values in the image within a specific range to generate an average value as the result of average pooling.
  • the max pooling operator can take the pixel with the largest value in a specific range as the result of max pooling.
  • the operators in the pooling layer should also be related to the size of the image.
  • the size of the image output after processing by the pooling layer can be smaller than the size of the image input to the pooling layer.
  • Each pixel in the image output by the pooling layer represents the average or maximum value of the corresponding subregion of the image input to the pooling layer.
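  • The average and maximum pooling operators described above can be sketched as non-overlapping k×k pooling (an illustrative implementation, not the patent's):

```python
import numpy as np

def pool2d(image, k=2, mode="max"):
    """Non-overlapping k x k pooling over a (H, W) image.
    Each output pixel is the max (or average) of its k x k subregion,
    so the output is k times smaller in each spatial dimension."""
    H, W = image.shape
    blocks = image[: H - H % k, : W - W % k].reshape(H // k, k, W // k, k)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))

img = np.array([[1., 2., 5., 6.],
                [3., 4., 7., 8.],
                [9., 9., 1., 1.],
                [9., 9., 1., 1.]])
print(pool2d(img, 2, "max"))   # [[4. 8.] [9. 1.]]
print(pool2d(img, 2, "mean"))  # [[2.5 6.5] [9.  1. ]]
```

The 4×4 input becomes a 2×2 output, matching the size reduction the text attributes to pooling.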
  • After being processed by the convolutional layer/pooling layer 220, the convolutional neural network 200 is not yet able to output the required output information, because, as mentioned above, the convolutional layer/pooling layer 220 only extracts features and reduces the number of parameters brought by the input image. However, in order to generate the final output information (the required class information or other related information), the convolutional neural network 200 needs to use the fully connected layer 230 to generate one output or a set of outputs of the required number of classes. Therefore, the fully connected layer 230 may include multiple hidden layers (231, 232 to 23n as shown in Figure 4), and the parameters contained in the multiple hidden layers may be obtained by pre-training based on the relevant training data of the specific task type; for example, the task type can include image recognition, image classification, image super-resolution reconstruction, etc.
  • the output layer 240 has a loss function similar to categorical cross entropy and is specifically used to calculate the prediction error.
  • the convolutional neural network 200 shown in Figure 4 is only an example of a convolutional neural network.
  • the convolutional neural network can also exist in the form of other network models, for example, including only part of the network structure shown in Figure 4; for example, the convolutional neural network used in the embodiment of the present application may only include an input layer 210, a convolutional layer/pooling layer 220 and an output layer 240.
  • the convolutional neural network 200 shown in Figure 4 is only an example of a convolutional neural network.
  • the convolutional neural network can also exist in the form of other network models, for example, with multiple convolutional layers/pooling layers in parallel as shown in Figure 5, and the extracted features are all input to the fully connected layer 230 for processing.
  • the neural network can use the error back propagation (BP) algorithm to modify the parameters in the initial neural network model during the training process, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, forward propagating the input signal until the output produces an error loss, and the parameters in the initial neural network model are updated by backpropagating the error loss information, so that the error loss converges.
  • the backpropagation algorithm is a backpropagation movement dominated by error loss, aiming to obtain the optimal parameters of the neural network model, such as the weight matrix.
  • Lossless compression: a technology that compresses data such that the compressed data length is smaller than the original data length; after decompressing the compressed data, the recovered data must be exactly the same as the original data.
  • Compression length: the storage space occupied by compressed data.
  • Compression ratio: the ratio of the original data length to the compressed data length. If there is no compression, the value is 1; the larger the value, the better.
  • Bits per dimension: the average number of bits occupied by each dimension (byte) of the data after compression. The calculation formula is: 8/compression ratio. If there is no compression, the value is 8; the smaller the value, the better.
  • Hidden variable: a kind of data with a specific probability distribution. By establishing the conditional probability between these data and the original data, the probability distribution of the original data can be obtained.
  • Encoding/Decoding: the process of data compression is encoding, and the process of decompression is decoding.
  • Reverse encoding: a special encoding technology that decodes additional binary data stored in the system to generate specific data.
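  • The compression ratio and bits-per-dimension definitions above relate as follows (a small worked example with made-up lengths):

```python
def compression_ratio(original_len, compressed_len):
    """Original data length / compressed data length; 1 means no compression."""
    return original_len / compressed_len

def bits_per_dimension(ratio):
    """Average bits per byte-dimension of the original data: 8 / ratio.
    8 means no compression; smaller is better."""
    return 8.0 / ratio

ratio = compression_ratio(original_len=1000, compressed_len=250)
print(ratio)                      # 4.0
print(bits_per_dimension(ratio))  # 2.0 bits per original byte
```
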
  • FIG. 6 is a schematic diagram of the system architecture provided by an embodiment of the present application.
  • the system architecture 500 includes an execution device 510 , a training device 520 , a database 530 , a client device 540 , a data storage system 550 and a data collection system 560 .
  • the execution device 510 includes a computing module 511, an I/O interface 512, a preprocessing module 513 and a preprocessing module 514.
  • the target model/rule 501 may be included in the calculation module 511, and the preprocessing module 513 and the preprocessing module 514 are optional.
  • the execution device 510 can be a mobile phone, a tablet, a notebook computer, a smart wearable device, etc., and the terminal device can perform compression processing on the acquired images.
  • the terminal device may be a virtual reality (VR) device.
  • the embodiments of the present application can also be applied to intelligent monitoring.
  • a camera can be configured in the intelligent monitoring, and the intelligent monitoring can obtain images to be compressed through the camera. It should be understood that the embodiments of the present application can also be applied to In other scenarios that require image compression, other application scenarios will not be listed here.
  • Data collection device 560 is used to collect training data. After collecting the training data, the data collection device 560 stores the training data into the database 530, and the training device 520 trains to obtain the target model/rule 501 based on the training data maintained in the database 530.
  • the above target model/rule 501 (such as the variational autoencoder, entropy encoder, etc. in the embodiment of the present application) can be used to implement data compression and decompression tasks; that is, the data to be processed (such as the first target data in the embodiment of the present application) is input into the target model/rule 501, and the compressed data (such as the second bit stream in the embodiment of the present application) can be obtained.
  • the training data maintained in the database 530 may not necessarily be collected by the data collection device 560, but may also be received from other devices.
  • the training device 520 does not necessarily perform training of the target model/rules 501 based entirely on the training data maintained by the database 530. It may also obtain training data from the cloud or other places for model training.
  • the above description should not be regarded as a limitation on the embodiments of this application.
  • the target model/rules 501 trained by the training device 520 can be applied to different systems or devices, such as the execution device 510 shown in Figure 6.
  • the execution device 510 can be a terminal, such as a mobile phone terminal, a tablet computer, a laptop, an augmented reality (AR)/virtual reality (VR) device, a vehicle-mounted terminal, etc., or a server or cloud, etc.
  • the execution device 510 is configured with an input/output (I/O) interface 512 for data interaction with external devices. The user can input data to the I/O interface 512 through the client device 540 .
  • the preprocessing module 513 and the preprocessing module 514 are used to perform preprocessing according to the input data received by the I/O interface 512. It should be understood that the preprocessing modules 513 and 514 may not exist, or there may be only one preprocessing module; when the preprocessing module 513 and the preprocessing module 514 do not exist, the computing module 511 can be directly used to process the input data.
  • When the execution device 510 preprocesses input data, or when the calculation module 511 of the execution device 510 performs calculations and other related processing, the execution device 510 can call data, codes, etc. in the data storage system 550 for corresponding processing, and the data, instructions, etc. obtained by the corresponding processing can also be stored in the data storage system 550.
  • the I/O interface 512 presents the processing results to the client device 540, thereby providing them to the user.
  • the user can manually set the input data, and the "manually set input data" can be operated through the interface provided by the I/O interface 512 .
  • the client device 540 can automatically send input data to the I/O interface 512. If the client device 540 is required to obtain the user's authorization before automatically sending the input data, the user can set corresponding permissions in the client device 540. The user can view the results output by the execution device 510 on the client device 540, and the specific presentation form may be display, sound, action, etc.
  • the client device 540 can also be used as a data collection terminal to collect the input data of the input I/O interface 512 and the output results of the output I/O interface 512 as new sample data, and store them in the database 530.
  • Alternatively, the I/O interface 512 directly stores the input data input to the I/O interface 512 and the output results of the I/O interface 512, as shown in the figure, as new sample data in the database 530.
  • Figure 6 is only a schematic diagram of a system architecture provided by an embodiment of the present application.
  • the positional relationship between the devices, components, modules, etc. shown in the figure does not constitute any limitation.
  • For example, in Figure 6, the data storage system 550 is an external memory relative to the execution device 510; in other cases, the data storage system 550 can also be placed in the execution device 510.
  • Figure 7 is a chip hardware structure diagram provided by an embodiment of the present application.
  • the chip includes a neural network processor 700.
  • the chip can be disposed in the execution device 510 as shown in Figure 6 to complete the calculation work of the calculation module 511.
  • the chip can also be installed in the training device 520 as shown in Figure 6 to complete the training work of the training device 520 and output the target model/rules 501.
  • the algorithms at each layer in the image processing model shown in Figure 6 can be implemented in the chip shown in Figure 7.
  • the neural network processor (neural processing unit, NPU) 700 serves as a co-processor and is mounted on the host central processing unit (host central processing unit, host CPU), and the host CPU allocates tasks.
  • the core part of the NPU is the operation circuit 703; the controller 704 controls the operation circuit 703 to extract the data in the memory (weight memory 702 or input memory 701) and perform calculations.
  • the computing circuit 703 internally includes multiple processing engines (PEs).
  • In some implementations, the arithmetic circuit 703 is a two-dimensional systolic array.
  • the arithmetic circuit 703 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition.
  • In some implementations, the arithmetic circuit 703 is a general-purpose matrix processor.
  • the operation circuit 703 obtains the corresponding data of matrix B from the weight memory 702 and caches it on each PE in the operation circuit 703 .
  • the operation circuit 703 fetches the data of matrix A from the input memory 701, performs a matrix operation with matrix B, and stores the partial result or the final result of the matrix in the accumulator 708.
  • the vector calculation unit 707 can further process the output of the operation circuit 703, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison, etc.
  • the vector calculation unit 707 can be used for network calculations of non-convolutional/non-FC layers in neural networks, such as pooling, batch normalization, local response normalization, etc. .
  • vector calculation unit 707 can store the processed output vectors to unified memory 706 .
  • the vector calculation unit 707 may apply a nonlinear function to the output of the operation circuit 703, such as a vector of accumulated values, to generate an activation value.
  • vector calculation unit 707 generates normalized values, merged values, or both.
  • the processed output vector can be used as an activation input to the arithmetic circuit 703, such as for use in a subsequent layer in a neural network.
  • the unified memory 706 is used to store input data and output data.
  • the direct memory access controller (DMAC) 705 transfers input data in the external memory to the input memory 701 and/or the unified memory 706, stores the weight data in the external memory into the weight memory 702, and stores the data in the unified memory 706 into the external memory.
  • the bus interface unit (bus interface unit, BIU) 710 is used to realize the interaction between the main CPU, the DMAC and the fetch memory 709 through the bus.
  • An instruction fetch buffer 709 connected to the controller 704 is used to store instructions used by the controller 704.
  • the controller 704 is used to call instructions cached in the fetch memory 709 to control the working process of the computing accelerator.
  • the unified memory 706, the input memory 701, the weight memory 702 and the instruction memory 709 are all on-chip memories, and the external memory is a memory external to the NPU.
  • the external memory can be double data rate synchronous dynamic random access memory (DDR SDRAM), high bandwidth memory (HBM) or other readable and writable memory.
  • Lossless compression of data is one of the important basic directions in the field of information technology. Its purpose is to establish a bijection from the original data space to the coding space, so that data with a high frequency of occurrence can be represented by shorter codes, thereby obtaining a shorter average data representation length, while a one-to-one conversion between the original data space and the coding space can be achieved based on the bijection.
  • the optimal lossless compression length of data is determined by the Shannon information entropy of the data probability distribution; and the more accurately the data probability distribution is estimated, the closer to the optimal lossless compression length can be obtained.
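  • The entropy bound mentioned above can be checked numerically: the Shannon entropy of the data distribution lower-bounds the average code length in bits per symbol (the 4-symbol distribution below is made up for illustration):

```python
import math

def shannon_entropy_bits(probs):
    """H(p) = -sum_i p_i * log2(p_i): the optimal average code length
    (in bits per symbol) for lossless compression under distribution p."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A skewed 4-symbol source: frequent symbols can get shorter codes.
probs = [0.5, 0.25, 0.125, 0.125]
print(shannon_entropy_bits(probs))  # 1.75 bits/symbol vs 2 bits uncoded
```

The more accurately a model estimates `probs`, the closer an entropy coder can get to this bound.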
  • the lossless compression scheme based on artificial intelligence takes advantage of the feature of deep generative models that can estimate the probability distribution of data more accurately than traditional schemes, and achieves a compression ratio that is far superior to traditional lossless compression schemes.
  • widely used deep generative models include autoregressive models (autoregressive Models), variational autoencoders (variational auto-encoder, VAE), flow models (normalizing flows), etc.
  • autoregressive model is better compatible with arithmetic encoders and Huffman coding
  • the variational autoencoder combined with the inverse encoding (bits-back) mechanism is better compatible with asymmetric numeral systems (ANS)
  • the flow model is compatible with the three different entropy encoders described above.
  • In addition to the compression ratio, lossless compression solutions are also evaluated by their throughput rate.
  • For lossless compression solutions based on artificial intelligence, because the model size is much larger than that of traditional solutions, the overall throughput rate is lower than that of traditional solutions.
  • the variational autoencoder model is a latent variable model.
  • This type of model does not directly model the data itself, but instead introduces one (or more) latent variables, and then models the prior distribution, likelihood function and approximate posterior distribution. Since the marginal distribution of the data cannot be directly obtained from the variational autoencoder, the traditional entropy coding method cannot be directly used.
  • a variational autoencoding lossless compression scheme based on the inverse encoding mechanism is proposed. Bits-back ANS is the original form of this scheme; it is suitable for variational autoencoder models containing only one latent variable and can be generalized to variational autoencoder models containing multiple latent variables.
  • in a variational autoencoder containing one latent variable, the model can be divided into three modules, namely: a prior module, a variational encoder module and a decoder module.
  • the above three modules can be used to determine the parameters of the following three distributions, namely: the prior distribution of the latent variable, the likelihood function of the latent variable (the conditional probability distribution of the data) and the approximate posterior distribution of the latent variable.
  • 1. Obtain the bit data to be decompressed (bit stream 4);
  • the present invention is an improvement on the artificial intelligence lossless compression scheme based on variational autoencoders.
  • the present invention improves two major pain points in this subdivision: first, by introducing a special autoregressive structure, it reduces the number of parameters required for the variational autoencoder to achieve the same compression ratio, thereby improving the throughput rate; second, by introducing a special variational encoder and decoder structure and proposing a new inverse encoding algorithm, it removes the random initial bits previously necessary in lossless compression schemes based on variational autoencoders, thus realizing single-data-point compression and decompression as well as efficient parallel compression and decompression.
  • Figure 8 is a schematic diagram of a data compression method provided by an embodiment of the present application.
  • a data compression method provided by an embodiment of the present application includes:
  • Obtain first target data, where the first target data includes first sub-data and second sub-data.
  • the first target data may be image data for compression or other data (such as text, video, etc.), where the first target data may be an image captured by the above-mentioned terminal device through a camera (or is part of the image), or the first target data may also be an image obtained from inside the terminal device (for example, an image stored in the photo album of the terminal device, or a picture obtained by the terminal device from the cloud).
  • the above-mentioned first target data may be data with image compression requirements, and this application does not place any limitation on the source of the first target image.
  • the first target data is an image block
  • the first sub-data and the second sub-data are obtained after data segmentation of the image block.
  • the first sub-data and the second sub-data are obtained by segmenting the image block in a spatial dimension or a channel dimension.
  • Regarding the spatial dimension or channel dimension: for image data, the data includes one channel dimension (C) and two spatial dimensions (width W and height H).
  • the first target data may include 6 channels
  • the first sub-data may be the data of the first three channels in the first target data
  • the second sub-data may be the data of the last three channels in the first target data.
  • the size of the first target data in the spatial dimension is N*N
  • the first sub-data can be data with the spatial dimension in the first target data ranging from (0 to N/2)*N
  • the second sub-data can be the data in the first target data with spatial dimensions ranging from (N/2 to N)*N.
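  • The two split schemes described above (channel-dimension split and spatial-dimension split) can be sketched with array slicing; the sizes below (C=6, N=4) follow the examples in the text but are otherwise illustrative:

```python
import numpy as np

x = np.arange(6 * 4 * 4).reshape(6, 4, 4)  # first target data: C=6, H=W=N=4

# Channel-dimension split: first three channels vs last three channels.
first_sub_c, second_sub_c = x[:3], x[3:]

# Spatial-dimension split: rows (0..N/2)*N vs rows (N/2..N)*N.
first_sub_s, second_sub_s = x[:, :2, :], x[:, 2:, :]

print(first_sub_c.shape, second_sub_c.shape)  # (3, 4, 4) (3, 4, 4)
print(first_sub_s.shape, second_sub_s.shape)  # (6, 2, 4) (6, 2, 4)
```
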
  • Process the first sub-data through the first decoder of the variational autoencoder to obtain a first probability distribution, where the first probability distribution is used as the conditional probability distribution of the second sub-data;
  • the variational autoencoder may include a variational encoder, a decoder (such as the first decoder and the second decoder in the embodiment of the present application) and a prior distribution of the latent variable.
  • the decoder may be composed of decoder layers (such as the first convolutional neural network and the second convolutional neural network in the embodiment of the present application), and the number of decoder layers is the same as the number of latent variables in the variational autoencoder.
  • the function of the decoder layer is to input deeper hidden variables and output the conditional probability distribution of the current layer data (the current layer data can be shallower hidden variables or data).
  • the variational encoder needs to take the entire data as input to predict the approximate posterior distribution of the latent variable, and the latent variable input to the decoder directly predicts the conditional probability distribution of the entire data.
  • the data to be compressed is divided into at least two parts, namely: first sub-data and second sub-data.
  • the conditional probability distribution of the first sub-data is predicted; the conditional probability distribution of the second sub-data depends on the first sub-data, which can be specifically determined by inputting the first sub-data into the first decoder.
  • the decoder may implement a pixel reset operation, which has a parameter denoted as k and includes two reversible operations, denoted as the space-to-channel operation and the channel-to-space operation. The parameter k takes a positive integer value, which determines the ratio by which the spatial dimensions of the input and output tensors change in these two operations.
  • the above-mentioned space to channel operation and channel to space operation are inverse to each other.
  • image data can be represented as tensors containing one channel dimension (C) and two spatial dimensions (width W and height H). Due to batch processing in deep learning, the tensor representation has an additional batch dimension (N); that is, the image data tensor has four dimensions (NCHW or NHWC).
  • a tensor of size n1×c1×h1×w1 can be transformed by the space-to-channel operation with parameter k into a tensor of size n1×(k²c1)×(h1/k)×(w1/k); both h1 and w1 must be divisible by k.
  • a tensor of size n2×c2×h2×w2 is transformed by the channel-to-space operation with parameter k into a tensor of size n2×(c2/k²)×(kh2)×(kw2). Neither operation changes the total number of elements of the tensor; they only rearrange the elements' positions. Different pixel reset rules yield different pixel reset devices.
  • the pixel reset operation used in the embodiment of the present application adopts a channel-first approach. Since the space-to-channel and channel-to-space operations are inverses of each other for a fixed k, Figure 9 shows only the effect of the space-to-channel operation, for n = 1, h = w = 4, c = 3 and k = 2.
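As a concrete illustration, the two reversible operations can be implemented with plain reshape/transpose steps. The sketch below is a minimal NumPy version; the function names and the exact ordering of the k×k offsets ahead of the channel axis ("channel-first") are illustrative choices, not taken verbatim from the patent:

```python
import numpy as np

def space_to_channel(x, k):
    """Space-to-channel pixel reset: (N, C, H, W) -> (N, k*k*C, H/k, W/k).
    H and W must be divisible by k."""
    n, c, h, w = x.shape
    assert h % k == 0 and w % k == 0
    # Split each spatial axis into (block index, offset within block).
    x = x.reshape(n, c, h // k, k, w // k, k)
    # Move the two offset axes in front of the channel axis (channel-first).
    x = x.transpose(0, 3, 5, 1, 2, 4)
    return x.reshape(n, k * k * c, h // k, w // k)

def channel_to_space(x, k):
    """Inverse operation: (N, k*k*C, H, W) -> (N, C, k*H, k*W)."""
    n, c, h, w = x.shape
    assert c % (k * k) == 0
    x = x.reshape(n, k, k, c // (k * k), h, w)
    # Interleave the offsets back into the spatial axes.
    x = x.transpose(0, 3, 4, 1, 5, 2)
    return x.reshape(n, c // (k * k), k * h, k * w)
```

Applying `space_to_channel` with k = 2 to a 1×3×4×4 tensor yields a 1×12×2×2 tensor, matching the Figure 9 example, and `channel_to_space` restores the original tensor exactly.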
  • the first decoder may include a first convolutional neural network and a second convolutional neural network.
  • obtaining the first probability distribution through the first decoder of the variational autoencoder according to the first sub-data may specifically include: performing a space-to-channel pixel reset operation on second target data that includes the second sub-data, to obtain third sub-data, where the second target data has the same size as the first target data, and the third sub-data has the same spatial size as the first sub-data;
  • the second target data including the second sub-data may be data of the same size as the first target data; in the first target data, the elements other than the second sub-data may be set to zero (or to another preset value) to obtain the second target data.
  • the current layer variable may be the above-mentioned second target data, and a pixel reset operation may be performed on the current layer variable.
  • the fourth sub-data can be obtained through the first convolutional neural network according to the first sub-data, where the fourth sub-data has the same channel-dimension size as the third sub-data. That is, the first convolutional neural network can perform feature extraction and size transformation on the first sub-data, so as to obtain fourth sub-data whose channel-dimension size matches that of the third sub-data.
  • the third sub-data and the fourth sub-data may be fused to obtain fused sub-data.
  • the fusion method can be data replacement of the corresponding channel.
  • the fusion of the third sub-data and the fourth sub-data may specifically include: replacing the data of some channels in the fourth sub-data with the data of the corresponding channels in the third sub-data, to obtain the fused sub-data.
  • the first i channels of the fourth sub-data z″ i can be replaced by the first i channels of the third sub-data z′ i-1 .
  • the first probability distribution can be obtained through the second convolutional neural network according to the fused sub-data.
  • the fused sub-data and the first sub-data can also be concatenated along the channel dimension to obtain spliced sub-data; accordingly, obtaining the first probability distribution through the second convolutional neural network according to the fused sub-data may specifically include: obtaining the first probability distribution through the second convolutional neural network according to the spliced sub-data.
  • the deeper hidden variable is z i
  • the output is the conditional probability distribution of the current layer variable z i-1 .
  • the input of neural network one is z i
  • the output is a tensor z′ i with the same size (the same number of elements) as z i-1 .
  • the tensor z′ i undergoes a space-to-channel pixel reset operation with parameter k and becomes a tensor z″ i with the same spatial size as z i .
  • the splicing operation can be performed according to the channel dimension to obtain the spliced tensor z′′′ i .
  • the tensor z i-1 undergoes a space-to-channel pixel reset operation with parameter k and becomes a tensor z′ i-1 with the same spatial size as z i .
  • decoder layer one introduces the autoregressive structure: the tensor z‴ i is input into neural network two to obtain the probability distribution parameters of the first channel of the tensor z′ i-1 ;
  • the first i channels of z″ i within z‴ i are then replaced by the first i channels of the tensor z′ i-1 , and the result is input into neural network two to obtain the probability distribution parameters of the (i+1)-th channel of z′ i-1 .
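The channel-by-channel loop of decoder layer one can be sketched as follows. This is a toy sketch: `net2` is a hypothetical stand-in for neural network two, and entropy-decoding a channel is reduced to writing the predicted parameter directly, since only the replace-then-predict ordering is being illustrated:

```python
import numpy as np

C, H, W = 4, 2, 2  # illustrative sizes for z''_i after the pixel reset

def net2(fused):
    """Hypothetical stand-in for neural network two: maps the fused tensor
    to the distribution parameter of the next channel to be decoded."""
    return float(fused.sum()) * 0.01

def decode_layer_one(z2_i):
    """Channel-autoregressive reconstruction of z'_{i-1} from z''_i."""
    fused = z2_i.copy()          # starts as z''_i (predicted from z_i)
    out = np.zeros_like(z2_i)
    for ch in range(C):
        param = net2(fused)      # distribution parameter of channel ch
        out[ch] = param          # stand-in for entropy-decoding channel ch
        fused[ch] = out[ch]      # replace channel ch of z''_i with the decoded
                                 # channel of z'_{i-1} before predicting ch+1
    return out
```

Because each prediction sees only already-decoded channels, the encoder side can compute exactly the same sequence of parameters (it knows the true z′ i-1 ), which is what makes the structure usable for entropy coding.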
  • Figure 11 is a schematic diagram of a variational autoencoder in the embodiment of the present application. Decoder one corresponds to the second decoder in the embodiment of the present application, and decoder two corresponds to the first decoder.
  • Table 1 shows an exemplary process of a variational autoencoder including one latent variable as an example.
  • the compression process is shown on the left side of Table 1, and the decompression process on the right side of Table 1.
  • the parameters of the approximate posterior distribution q(z|x) of the latent variables are given by the variational encoder after the data to be compressed is input.
  • the parameters of the prior distribution of the deepest latent variable are given directly by the deepest-latent-variable prior distribution module in the model.
  • the parameters of the remaining conditional probability distributions are output by the corresponding decoder layer from the value of the conditional data input to it.
  • the structure of each decoder layer involved can refer to the description of the decoder in the above embodiment; the notation x 1 , ... in Table 1 is defined in the same way.
  • the embodiment of the present application makes full use of the correlation between image pixels by using a coder layer with an autoregressive structure defined by channel-first pixel reset, thereby significantly reducing the number of model parameters while achieving a lower coding length; this improves compression throughput and reduces the storage cost of the model.
  • the second sub-data can be compressed by an entropy encoder according to the first probability distribution to obtain a first bit stream.
  • the first bit stream may be used as an initial bit stream, and the first sub-data may be compressed.
  • using the first bit stream as an initial bit stream to compress the first sub-data may specifically include: obtaining the approximate posterior distribution of the latent variable through the variational encoder in the variational autoencoder according to the first sub-data; decoding the latent variable from the first bit stream through the entropy encoder according to the approximate posterior distribution, to obtain a third bit stream;
  • obtaining a second probability distribution through the second decoder of the variational autoencoder according to the latent variable, the second probability distribution serving as the conditional probability distribution of the first sub-data; compressing the first sub-data into the third bit stream through the entropy encoder according to the second probability distribution, to obtain a fourth bit stream; and compressing the latent variable into the fourth bit stream through the entropy encoder according to the prior distribution of the latent variable, to obtain a second bit stream.
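The four coding steps above can be exercised end to end with a minimal rANS coder. The sketch below is illustrative only: the four frequency tables are hypothetical placeholders for the model outputs (first decoder, variational encoder, second decoder, and prior), and a real implementation would condition them on the decoded values. What it demonstrates is that the decode-then-encode ordering lets the first bit stream itself serve as the initial bits for decoding the latent, so no extra initial bits are needed:

```python
M = 16  # total frequency of every table

def encode(x, s, freq, cum):
    # rANS encode: push symbol s (frequency freq[s]) onto integer state x.
    return (x // freq[s]) * M + cum[s] + (x % freq[s])

def decode(x, freq, cum):
    # rANS decode: pop one symbol from state x (exact inverse of encode).
    slot = x % M
    s = max(i for i in range(len(freq)) if cum[i] <= slot)
    return s, freq[s] * (x // M) + (slot - cum[s])

def cdf(freq):
    out, c = [], 0
    for f in freq:
        out.append(c)
        c += f
    return out

# Hypothetical 4-symbol tables standing in for the model's distributions.
p_s2 = [8, 4, 2, 2]   # p(s2 | s1): output of the first decoder
q_z  = [4, 4, 4, 4]   # q(z | s1): approximate posterior
p_s1 = [2, 2, 8, 4]   # p(s1 | z): output of the second decoder
p_z  = [6, 6, 2, 2]   # prior over the latent

def compress(s1, s2, x=1):
    x = encode(x, s2, p_s2, cdf(p_s2))   # second sub-data -> first bit stream
    z, x = decode(x, q_z, cdf(q_z))      # latent decoded FROM that stream -> third
    x = encode(x, s1, p_s1, cdf(p_s1))   # first sub-data -> fourth bit stream
    return encode(x, z, p_z, cdf(p_z))   # latent with prior -> second bit stream

def decompress(x):
    z, x = decode(x, p_z, cdf(p_z))      # latent from the prior
    s1, x = decode(x, p_s1, cdf(p_s1))   # first sub-data from p(s1 | z)
    x = encode(x, z, q_z, cdf(q_z))      # latent re-encoded with q(z | s1)
    s2, x = decode(x, p_s2, cdf(p_s2))   # second sub-data from p(s2 | s1)
    return s1, s2
```

Note the LIFO symmetry: decompression runs the same four operations in reverse order, with every encode swapped for a decode, which is why the round trip is lossless.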
  • the decompression steps can be found in the data decompression method described below.
  • Figure 12 is a diagram of the data compression process when the number of latent variables is 1, where S is the first target data, S1 is the first sub-data, and S2 is the second sub-data.
  • Figure 13 compares the embodiment of the present application with the existing solution; the two processes are shown side by side in Figure 13.
  • the left side of Figure 13 shows the core process of compression and decompression of the existing solution; the right side of Figure 13 shows the core process of compression and decompression without additional initial bits according to the embodiment of the present application.
  • Table 2 shows the core method flow of the decoder layer based on the autoregressive structure defined by channel-first pixel reset, and of the variational autoencoder lossless compression solution without extra initial bits.
  • since the encoder used makes better use of the correlation between pixels of the image data, it can reduce the model size by a factor of 100 while providing a better coding length than lossless compression schemes based on the same type of model.
  • Table 3 shows the average coding bits per dimension (bpd) of this scheme (SHVC) and other industry-best schemes on public data sets. The effect of this scheme is optimal or close to optimal among all compared schemes (including traditional schemes, VAE model schemes and flow model schemes), and it is the best among schemes of the same type (VAE-based).
  • Table 4 shows that, in addition to its coding length advantage, this scheme greatly reduces inference time thanks to its small number of model parameters, thereby improving compression and decompression throughput.
  • the data statistics are set to 10,000 CIFAR10 images, the batch size is 100, and the hardware is a V100 graphics card.
  • the embodiments of the present application can achieve single data point compression and efficient parallel compression while avoiding the extra initial bits required by the current inverse encoding mechanism.
  • Table 5 shows the average code length per dimension, with extra initial bits taken into account, for three cases: the embodiment of this application (SHVC), the variant using extra random initial bits (SHVC-ARIB), and the variant using a deterministic posterior (essentially an autoencoder model) without the bits-back mechanism (SHVC-Det). As can be seen from Table 5, this solution can reduce the extra space cost by up to 30 times compared with current bits-back compression algorithms.
  • Embodiments of the present application provide a data compression method, including: obtaining first target data, the first target data including first sub-data and second sub-data; obtaining a first probability distribution through the first decoder of a variational autoencoder according to the first sub-data, the first probability distribution serving as the conditional probability distribution of the second sub-data; compressing the second sub-data through an entropy encoder according to the first probability distribution, to obtain a first bit stream; and, using the first bit stream as an initial bit stream, compressing the first sub-data to obtain a second bit stream.
  • compared with the extra initial bits required by the bits-back mechanism in the prior art, the embodiments of the present application require no extra initial bits, can compress a single data point, and greatly reduce the compression ratio during parallel compression.
  • Figure 14 is a flow diagram of a data decompression method provided by an embodiment of the present application. As shown in Figure 14, the data decompression method provided by an embodiment of the present application includes:
  • the latent variable obtain a second probability distribution through the second decoder of the variational autoencoder; the second probability distribution is used as a conditional probability distribution of the first sub-data;
  • the first sub-data obtain a first probability distribution through the first decoder of the variational autoencoder, and the first probability distribution is used as a conditional probability distribution of the second sub-data;
  • according to the first probability distribution, the second sub-data is decompressed from the first bit stream through the entropy encoder; the first sub-data and the second sub-data are used to determine the first target data.
  • the first target data is an image block
  • the first sub-data and the second sub-data are obtained after data segmentation of the image block.
  • the first sub-data and the second sub-data are obtained by segmenting the image block in a spatial dimension or a channel dimension.
  • the first decoder includes a first convolutional neural network and a second convolutional neural network, and according to the first sub-data, the first decoder of the variational autoencoder is , the first probability distribution is obtained, including:
  • the fourth sub-data is obtained through the first convolutional neural network, and the fourth sub-data and the third sub-data have the same size in the channel dimension;
  • the first probability distribution is obtained through the second convolutional neural network.
  • the fusion of the third sub-data and the fourth sub-data includes:
  • the method further includes:
  • Obtaining the first probability distribution based on the fused sub-data through the second convolutional neural network includes: based on the spliced sub-data through the second convolutional neural network, Obtain the first probability distribution.
  • Figure 15 is a schematic structural diagram of a data compression device 1500 provided by an embodiment of the present application.
  • the data compression device 1500 can be a terminal device or a server.
  • the data compression device 1500 includes:
  • Acquisition module 1501 used to acquire first target data, where the first target data includes first sub-data and second sub-data;
  • Compression module 1502 configured to obtain a first probability distribution through the first decoder of the variational autoencoder according to the first sub-data, and the first probability distribution is used as the conditional probability of the second sub-data. distributed;
  • the first sub-data is compressed to obtain a second bit stream.
  • the first target data is an image block
  • the first sub-data and the second sub-data are obtained after data segmentation of the image block.
  • the first sub-data and the second sub-data are obtained by segmenting the image block in a spatial dimension or a channel dimension.
  • the compression module is specifically used to:
  • according to the first sub-data, the approximate posterior distribution of the latent variable is obtained through the variational encoder in the variational autoencoder;
  • a second probability distribution is obtained through the second decoder of the variational autoencoder; the second probability distribution is used as a conditional probability distribution of the first sub-data;
  • the first sub-data is compressed to the third bit stream by the entropy encoder to obtain a fourth bit stream;
  • the latent variable is compressed to the fourth bit stream through the entropy encoder to obtain a second bit stream.
  • the first decoder includes a first convolutional neural network and a second convolutional neural network
  • the compression module is specifically used to:
  • the fourth sub-data is obtained through the first convolutional neural network, and the fourth sub-data and the third sub-data have the same size in the channel dimension;
  • the first probability distribution is obtained through the second convolutional neural network.
  • the fusion of the third sub-data and the fourth sub-data includes:
  • the device further includes:
  • a splicing module configured to splice the fused sub-data and the first sub-data along the channel dimension to obtain the spliced sub-data
  • Obtaining the first probability distribution based on the fused sub-data through the second convolutional neural network includes: based on the spliced sub-data through the second convolutional neural network, Obtain the first probability distribution.
  • Figure 16 is a schematic structural diagram of a data decompression device 1600 provided by an embodiment of the present application.
  • the data decompression device 1600 may be a terminal device or a server.
  • the data decompression device 1600 may include:
  • the acquisition module 1601 is used to acquire the second bit stream and the prior distribution of the latent variable
  • Decompression module 1602 configured to decompress the latent variable from the second bit stream through an entropy encoder according to the prior distribution to obtain a fourth bit stream;
  • a second probability distribution is obtained through the second decoder of the variational autoencoder; the second probability distribution is used as a conditional probability distribution of the first sub-data;
  • the approximate posterior distribution of the latent variable is obtained through the variational encoder in the variational autoencoder;
  • the latent variable is compressed to the third bit stream through the entropy encoder to obtain a first bit stream;
  • a first probability distribution is obtained through the first decoder of the variational autoencoder, and the first probability distribution is used as a conditional probability distribution of the second sub-data;
  • second sub-data is decompressed from the first bit stream through the entropy encoder; the first sub-data and the second sub-data are used to determine the first target data.
  • For a specific description of the decompression module 1602, please refer to the description of steps 1402 to 1408 in the above embodiment; details are not repeated here.
  • the first target data is an image block
  • the first sub-data and the second sub-data are obtained after data segmentation of the image block.
  • the first sub-data and the second sub-data are obtained by segmenting the image block in a spatial dimension or a channel dimension.
  • the first decoder includes a first convolutional neural network and a second convolutional neural network
  • the decompression module is specifically used to:
  • the fourth sub-data is obtained through the first convolutional neural network, and the fourth sub-data and the third sub-data have the same size in the channel dimension;
  • the first probability distribution is obtained through the second convolutional neural network.
  • the fusion of the third sub-data and the fourth sub-data includes:
  • the device further includes:
  • a splicing module configured to splice the fused sub-data and the first sub-data along the channel dimension to obtain the spliced sub-data
  • the first probability distribution is obtained through the second convolutional neural network
  • the method includes: obtaining the first probability distribution through the second convolutional neural network based on the spliced sub-data.
  • Figure 17 is a schematic structural diagram of an execution device provided by an embodiment of the present application.
  • the execution device 1700 may specifically be a virtual reality (VR) device, a mobile phone, a tablet, a laptop, a smart wearable device, a monitoring data processing device, etc., which is not limited here.
  • the execution device 1700 may be deployed with the data compression device described in the corresponding embodiment of FIG. 15 or the data decompression device described in the corresponding embodiment of FIG. 16 .
  • the execution device 1700 may include: a receiver 1701, a transmitter 1702, a processor 1703 and a memory 1704 (the number of processors 1703 in the execution device 1700 may be one or more; one processor is taken as an example in Figure 17), where the processor 1703 may include an application processor 17031 and a communication processor 17032.
  • the receiver 1701, the transmitter 1702, the processor 1703, and the memory 1704 may be connected by a bus or other means.
  • Memory 1704 may include read-only memory and random access memory and provides instructions and data to processor 1703 .
  • a portion of memory 1704 may also include non-volatile random access memory (NVRAM).
  • the memory 1704 stores the processor's operating instructions, executable modules or data structures, or a subset or an extended set thereof, where the operating instructions may include various operating instructions for implementing various operations.
  • the processor 1703 controls the operation of the execution device.
  • various components of the execution device are coupled together through a bus system.
  • the bus system may also include a power bus, a control bus, a status signal bus, etc.
  • for clarity, the various buses are labeled as the bus system in the figure.
  • the methods disclosed in the above embodiments of the present application can be applied to the processor 1703 or implemented by the processor 1703.
  • the processor 1703 may be an integrated circuit chip with signal processing capabilities. During the implementation process, each step of the above method can be completed by instructions in the form of hardware integrated logic circuits or software in the processor 1703 .
  • the above-mentioned processor 1703 can be a general-purpose processor, a digital signal processor (DSP), a microprocessor or a microcontroller, and can further include an application specific integrated circuit (ASIC), a field programmable Gate array (field-programmable gate array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components.
  • the processor 1703 can implement or execute each method, step and logical block diagram disclosed in the embodiment of this application.
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor, etc.
  • the steps of the method disclosed in conjunction with the embodiments of the present application can be directly implemented by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other mature storage media in this field.
  • the storage medium is located in the memory 1704.
  • the processor 1703 reads the information in the memory 1704 and completes the steps of the above method in combination with its hardware.
  • the receiver 1701 may be used to receive input numeric or character information and generate signal inputs related to the relevant settings and function control of the execution device.
  • the transmitter 1702 can be used to output numeric or character information through a first interface; the transmitter 1702 can also be used to send instructions to a disk group through the first interface to modify the data in the disk group; the transmitter 1702 may also include a display device such as a display screen.
  • an embodiment of the present application also provides a computer program product which, when run on a computer, causes the computer to perform the steps of the method described in the embodiment shown in Figure 8, or the steps of the method described in the embodiment shown in Figure 14.
  • Embodiments of the present application also provide a computer-readable storage medium.
  • the computer-readable storage medium stores a program for signal processing which, when run on a computer, causes the computer to perform the steps of the method described in the embodiment shown in Figure 8, or the steps of the method described in the embodiment shown in Figure 14.
  • the device embodiments described above are only illustrative.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units.
  • the physical unit can be located in one place, or it can be distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • the connection relationship between modules indicates that there are communication connections between them, which can be specifically implemented as one or more communication buses or signal lines.
  • the present application can be implemented by software plus the necessary general-purpose hardware, or, of course, by dedicated hardware including application-specific integrated circuits, dedicated CPUs, dedicated memories, special components and so on. In general, any function performed by a computer program can easily be implemented with corresponding hardware, and the specific hardware structure used to implement the same function can take many forms, such as analog circuits, digital circuits or dedicated circuits. For this application, however, a software implementation is the better choice in most cases. Based on this understanding, the technical solution of the present application, or the part of it that contributes over the existing technology, can be embodied in the form of a software product.
  • the computer software product is stored in a readable storage medium, such as a computer floppy disk, USB flash drive, removable hard disk, ROM, RAM, magnetic disk or optical disc, and includes several instructions to cause a computer device (which may be a personal computer, training device, network device, etc.) to execute the methods described in the various embodiments of this application.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, training device or data center to another website, computer, training device or data center by wired (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (such as infrared, radio, or microwave) means.
  • the computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a training device or a data center integrated with one or more available media.
  • the available media may be magnetic media (eg, floppy disk, hard disk, magnetic tape), optical media (eg, DVD), or semiconductor media (eg, solid state disk (Solid State Disk, SSD)), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

This application relates to the field of artificial intelligence and discloses a data compression method, including: obtaining first target data, where the first target data includes first sub-data and second sub-data; obtaining a first probability distribution through a first decoder of a variational autoencoder according to the first sub-data, where the first probability distribution serves as the conditional probability distribution of the second sub-data; compressing the second sub-data through an entropy encoder according to the first probability distribution, to obtain a first bit stream; and compressing the first sub-data into the first bit stream, to obtain a second bit stream. Compared with the extra initial bits required by the bits-back mechanism in the prior art, the embodiments of this application require no extra initial bits, enable compression of a single data point, and reduce the compression ratio during parallel compression.

Description

A data compression method and related devices
This application claims priority to the Chinese patent application No. 202210249906.1, filed with the China National Intellectual Property Administration on March 14, 2022 and entitled "Data compression method and related devices", and to the Chinese patent application No. 202310077949.0, filed with the China National Intellectual Property Administration on January 13, 2023 and entitled "Data compression method and related devices", both of which are incorporated herein by reference in their entirety.
Technical Field
This application relates to the field of artificial intelligence, and in particular, to a data compression method and related devices.
Background
Multimedia data now accounts for the vast majority of Internet traffic. Compression of image data plays a crucial role in the storage and efficient transmission of multimedia data, so image coding is a technology of great practical value.
Image coding has a long research history. Researchers have proposed a large number of methods and formulated various international standards, such as the JPEG, JPEG2000, WebP and BPG image coding standards. Although these coding methods are all widely used today, they show certain limitations in the face of the ever-growing volume of image data and constantly emerging new media types.
AI-based lossless compression schemes exploit the fact that deep generative models can estimate the probability distribution of data more accurately than traditional schemes, and achieve compression ratios far better than those of traditional lossless compression. Deep generative models widely used in AI-based lossless compression include autoregressive models, variational auto-encoders (VAE) and normalizing flows. In general, autoregressive models are well compatible with arithmetic coders and Huffman coding; variational autoencoders combined with the bits-back mechanism are well compatible with asymmetric numeral systems; and flow models are compatible with all three of these entropy coders. Besides compression ratio, throughput is another metric for evaluating lossless compression solutions. Because their model scale is far larger than that of traditional schemes, AI-based lossless compression solutions have lower overall throughput. Moreover, considering compression ratio and throughput together, lossless compression solutions based on different generative models currently have no absolute ranking; research is still at the stage of exploring the Pareto frontier of compression schemes based on different generative models.
Unlike fully observed models (such as autoregressive models), the variational autoencoder is a latent-variable model. Instead of modeling the data directly, this class of models introduces one (or more) additional latent variables and then models the prior distribution, the likelihood function and the approximate posterior distribution. Since the marginal distribution of the data cannot be obtained directly from a variational autoencoder, traditional entropy coding cannot be applied directly. To enable lossless data compression with variational autoencoders, VAE lossless compression schemes based on the bits-back mechanism have been proposed. bits-back ANS is the original form of such schemes; it applies to variational autoencoder models containing a single latent variable and can be generalized to models containing multiple latent variables.
Existing VAE lossless compression schemes based on the bits-back mechanism all require extra initial bits in order to decode a sample of the latent variable. The extra initial bits are randomly generated data whose size must be counted toward the compression cost, and the average extra cost is high when the number of data points to be compressed serially is small. Furthermore, since the required extra initial bits are proportional to the number of data points to be compressed, efficient parallel compression cannot be achieved.
Summary
This application provides a data compression method. Compared with the extra initial bits required by the bits-back mechanism in the prior art, the embodiments of this application require no extra initial bits, enable compression of a single data point, and greatly reduce the compression ratio during parallel compression.
In a first aspect, this application provides a data compression method, including: obtaining first target data, where the first target data includes first sub-data and second sub-data;
In a possible implementation, the first target data may be image data to be compressed, or other data (such as text, video, or a binary stream).
In a possible implementation, the first target data is an image block, and the first sub-data and the second sub-data are obtained by data segmentation of the image block; or,
the first target data is a text sequence, and the first sub-data and the second sub-data are obtained by data segmentation of the text sequence; or,
the first target data is a binary stream, and the first sub-data and the second sub-data are obtained by data segmentation of the binary stream; or,
the first target data is a video, and the first sub-data and the second sub-data are obtained by data segmentation of multiple image frames of the video.
In a possible implementation, the first sub-data and the second sub-data are obtained by segmenting the image block along the spatial dimensions or the channel dimension. For image data, the representation contains one channel dimension (C) and two spatial dimensions (width W and height H).
According to the first sub-data, a first probability distribution is obtained through a first decoder of a variational autoencoder, where the first probability distribution serves as the conditional probability distribution of the second sub-data; according to the first probability distribution, the second sub-data is compressed through an entropy encoder to obtain a first bit stream; and, using the first bit stream as an initial bit stream, the first sub-data is compressed (that is, the first sub-data is compressed into the first bit stream) to obtain a second bit stream.
Compared with the extra initial bits required by the bits-back mechanism in the prior art, the embodiments of this application require no extra initial bits, enable compression of a single data point, and greatly reduce the compression ratio during parallel compression.
In a possible implementation, the first target data is an image block, and the first sub-data and the second sub-data are obtained by data segmentation of the image block.
In a possible implementation, the first sub-data and the second sub-data are obtained by segmenting the image block along the spatial dimensions or the channel dimension.
In a possible implementation, the variational autoencoder may include a variational encoder, decoders (such as the first decoder and the second decoder in the embodiments of this application) and the prior distribution of the latent variable.
In a possible implementation, the decoder may be composed of decoder layers (such as the first convolutional neural network and the second convolutional neural network in the embodiments of this application), and the number of decoder layers is the same as the number of latent variables in the variational autoencoder. The function of a decoder layer is to take a deeper latent variable as input and output the conditional probability distribution of the current-layer data (the current-layer data may be a shallower latent variable or the data itself).
In existing variational autoencoder models, the variational encoder needs the entire data as input to predict the approximate posterior distribution of the latent variable, and the decoder takes the latent variable as input to directly predict the conditional probability distribution of the entire data. In the embodiments of this application, the data to be compressed is divided into at least two parts: first sub-data and second sub-data. Unlike existing schemes that input all the data into the variational encoder, the embodiments of this application input only part of the data (the first sub-data) into the variational encoder to predict the approximate posterior distribution of the latent variable, and the latent variable is input into the second decoder to predict the conditional probability distribution of the first sub-data; the conditional probability distribution of the second sub-data depends on the first sub-data, and can be determined by inputting the first sub-data into the first decoder.
In a possible implementation, the decoder may implement a pixel reset operation.
In a possible implementation, the first decoder may include a first convolutional neural network and a second convolutional neural network, and obtaining the first probability distribution through the first decoder of the variational autoencoder according to the first sub-data may specifically include: performing a space-to-channel pixel reset operation on second target data that includes the second sub-data, to obtain third sub-data, where the second target data has the same size as the first target data, and the third sub-data has the same spatial size as the first sub-data;
The second target data including the second sub-data may be data of the same size as the first target data; in the first target data, the elements other than the second sub-data may be set to zero (or to another preset value) to obtain the second target data. After the pixel reset operation, the second target data can be converted into third sub-data with the same spatial size as the first sub-data.
By using a coder layer with an autoregressive structure defined by channel-first pixel reset, the embodiments of this application make full use of the correlation between image pixels, thereby greatly reducing the number of model parameters while achieving a lower coding length, which in turn improves compression throughput and reduces the storage cost of the model.
In a possible implementation, fourth sub-data may be obtained through the first convolutional neural network according to the first sub-data, where the fourth sub-data has the same channel-dimension size as the third sub-data. That is, the first convolutional neural network can perform feature extraction and size transformation on the first sub-data, so as to obtain fourth sub-data whose channel-dimension size is the same as that of the third sub-data.
In a possible implementation, the third sub-data and the fourth sub-data may be fused to obtain fused sub-data. Optionally, the fusion may be a replacement of the data of corresponding channels.
In a possible implementation, fusing the third sub-data and the fourth sub-data may specifically include: replacing the data of some channels in the fourth sub-data with the data of the corresponding channels in the third sub-data, to obtain the fused sub-data.
In a possible implementation, the first probability distribution may be obtained through the second convolutional neural network according to the fused sub-data.
In a possible implementation, the fused sub-data and the first sub-data may also be concatenated (concat) along the channel dimension to obtain spliced sub-data; accordingly, obtaining the first probability distribution through the second convolutional neural network according to the fused sub-data may specifically include: obtaining the first probability distribution through the second convolutional neural network according to the spliced sub-data.
In a second aspect, this application provides a data decompression method, including:
obtaining a second bit stream;
decoding first sub-data from the second bit stream, to obtain a first bit stream;
obtaining a first probability distribution through a first decoder of a variational autoencoder according to the first sub-data, where the first probability distribution serves as the conditional probability distribution of second sub-data;
decompressing the second sub-data from the first bit stream through an entropy encoder according to the first probability distribution, where the first sub-data and the second sub-data are used to restore first target data.
In a possible implementation, decoding the first sub-data from the second bit stream to obtain the first bit stream includes:
obtaining a prior distribution of a latent variable;
decompressing the latent variable from the second bit stream through the entropy encoder according to the prior distribution, to obtain a fourth bit stream;
obtaining a second probability distribution through a second decoder of the variational autoencoder according to the latent variable, where the second probability distribution serves as the conditional probability distribution of the first sub-data;
decompressing the first sub-data from the fourth bit stream through the entropy encoder according to the second probability distribution, to obtain a third bit stream;
obtaining an approximate posterior distribution of the latent variable through the variational encoder in the variational autoencoder according to the first sub-data;
compressing the latent variable into the third bit stream through the entropy encoder according to the approximate posterior distribution, to obtain the first bit stream.
In a possible implementation, the first target data is an image block, and the first sub-data and the second sub-data are obtained by data segmentation of the image block; or,
the first target data is a text sequence, and the first sub-data and the second sub-data are obtained by data segmentation of the text sequence; or,
the first target data is a binary stream, and the first sub-data and the second sub-data are obtained by data segmentation of the binary stream; or,
the first target data is a video, and the first sub-data and the second sub-data are obtained by data segmentation of multiple image frames of the video.
In a possible implementation, the first sub-data and the second sub-data are obtained by segmenting the image block along the spatial dimensions or the channel dimension.
In a possible implementation, the first decoder includes a first convolutional neural network and a second convolutional neural network, and obtaining the first probability distribution through the first decoder of the variational autoencoder according to the first sub-data includes:
performing a space-to-channel pixel reset operation on second target data that includes the second sub-data, to obtain third sub-data, where the second target data has the same size as the first target data, and the third sub-data has the same spatial size as the first sub-data;
obtaining fourth sub-data through the first convolutional neural network according to the first sub-data, where the fourth sub-data has the same channel-dimension size as the third sub-data;
fusing the third sub-data and the fourth sub-data, to obtain fused sub-data;
obtaining the first probability distribution through the second convolutional neural network according to the fused sub-data.
In a possible implementation, fusing the third sub-data and the fourth sub-data includes:
replacing the data of some channels in the fourth sub-data with the data of the corresponding channels in the third sub-data, to obtain the fused sub-data.
In a possible implementation, the method further includes:
concatenating the fused sub-data and the first sub-data along the channel dimension, to obtain spliced sub-data;
where obtaining the first probability distribution through the second convolutional neural network according to the fused sub-data includes: obtaining the first probability distribution through the second convolutional neural network according to the spliced sub-data.
第三方面,本申请提供了一种数据压缩装置,包括:
获取模块,用于获取第一目标数据,所述第一目标数据包括第一子数据和第二子数据;
压缩模块,用于根据所述第一子数据,通过变分自编码器的第一解码器,得到第一概率分布,所述第一概率分布用于作为所述第二子数据的条件概率分布;
根据所述第一概率分布,通过熵编码器压缩所述第二子数据,以得到第一比特流;
将所述第一子数据压缩至所述第一比特流,以得到第二比特流。
在一种可能的实现中,所述第一目标数据为图像块,所述第一子数据和所述第二子数据为对所述图像块进行数据切分后得到的;或者,
所述第一目标数据为文字序列,所述第一子数据和所述第二子数据为对所述文字序列进行数据切分后得到的;或者,
所述第一目标数据为二进制流,所述第一子数据和所述第二子数据为对所述二进制流进行数据切分后得到的;或者
所述第一目标数据为视频,所述第一子数据和所述第二子数据为对所述视频的多个图像帧进行数据切分后得到的。
在一种可能的实现中,所述第一子数据和所述第二子数据为对所述图像块在空间维度或者通道维度上进行数据切分后得到的。
在一种可能的实现中,所述压缩模块,具体用于:
根据所述第一子数据,通过所述变分自编码器中的变分编码器,得到隐变量的近似后验分布;
根据所述近似后验分布,从所述第一比特流中通过所述熵编码器解码出所述隐变量,得到第三比特流;
根据所述隐变量,通过所述变分自编码器的第二解码器,得到第二概率分布;所述第二概率分布用于作为所述第一子数据的条件概率分布;
根据所述第二概率分布,通过所述熵编码器将所述第一子数据压缩至所述第三比特流,以得到第四比特流;
根据所述隐变量的先验分布,通过所述熵编码器将所述隐变量压缩至所述第四比特流,得到第二比特流。
在一种可能的实现中,所述第一解码器包括第一卷积神经网络和第二卷积神经网络,所述压缩模块,具体用于:
对包括所述第二子数据的第二目标数据进行空间维度到通道维度的像素重置操作,以得到第三子数据,所述第二目标数据和所述第一目标数据的尺寸大小一致,所述第三子数据和所述第一子数据在空间维度的尺寸相同;
根据所述第一子数据,通过所述第一卷积神经网络,得到第四子数据,所述第四子数据和所述第三子数据在通道维度的尺寸相同;
将所述第三子数据和所述第四子数据进行融合,以得到融合后的子数据;
根据所述融合后的子数据,通过所述第二卷积神经网络,得到所述第一概率分布。
在一种可能的实现中,所述将所述第三子数据和所述第四子数据进行融合,包括:
将所述第四子数据中部分通道的数据替换为所述第三子数据中对应通道的数据,以得到融合后的子数据。
在一种可能的实现中,所述装置还包括:
拼接模块,用于将所述融合后的子数据和所述第一子数据沿着通道维度进行拼接操作,以得到拼接后的子数据;
所述根据所述融合后的子数据,通过所述第二卷积神经网络,得到所述第一概率分布,包括:根据所述拼接后的子数据,通过所述第二卷积神经网络,得到所述第一概率分布。
第四方面,本申请提供了一种数据解压缩装置,包括:
获取模块,用于获取第二比特流;
解压模块,用于从所述第二比特流中解码出第一子数据,以得到第一比特流;
根据所述第一子数据,通过变分自编码器的第一解码器,得到第一概率分布,所述第一概率分布用于作为第二子数据的条件概率分布;
根据所述第一概率分布,通过熵编码器从所述第一比特流中解压出第二子数据;所述第一子数据和所述第二子数据用于还原得到第一目标数据。
在一种可能的实现中,所述解压模块,具体用于:
获取隐变量的先验分布;
根据所述先验分布,通过熵编码器从所述第二比特流中解压出所述隐变量,得到第四比特流;
根据所述隐变量,通过所述变分自编码器的第二解码器,得到第二概率分布;所述第二概率分布用于作为第一子数据的条件概率分布;
根据所述第二概率分布,通过所述熵编码器从所述第四比特流中解压出所述第一子数据,得到第三比特流;
根据所述第一子数据,通过所述变分自编码器中的变分编码器,得到隐变量的近似后验分布;
根据所述近似后验分布,通过所述熵编码器将所述隐变量压缩至所述第三比特流,得到第一比特流。
在一种可能的实现中,所述第一目标数据为图像块,所述第一子数据和所述第二子数据为对所述图像块进行数据切分后得到的;或者,
所述第一目标数据为文字序列,所述第一子数据和所述第二子数据为对所述文字序列进行数据切分后得到的;或者,
所述第一目标数据为二进制流,所述第一子数据和所述第二子数据为对所述二进制流进行数据切分后得到的;或者
所述第一目标数据为视频,所述第一子数据和所述第二子数据为对所述视频的多个图像帧进行数据切分后得到的。
在一种可能的实现中,所述第一子数据和所述第二子数据为对所述图像块在空间维度或者通道维度上进行数据切分后得到的。
在一种可能的实现中,所述第一解码器包括第一卷积神经网络和第二卷积神经网络,所述解压模块,具体用于:
对包括所述第二子数据的第二目标数据进行空间维度到通道维度的像素重置操作,以得到第三子数据,所述第二目标数据和所述第一目标数据的尺寸大小一致,所述第三子数据和所述第一子数据在空间维度的尺寸相同;
根据所述第一子数据,通过所述第一卷积神经网络,得到第四子数据,所述第四子数据和所述第三子数据在通道维度的尺寸相同;
将所述第三子数据和所述第四子数据进行融合,以得到融合后的子数据;
根据所述融合后的子数据,通过所述第二卷积神经网络,得到所述第一概率分布。
在一种可能的实现中,所述将所述第三子数据和所述第四子数据进行融合,包括:
将所述第四子数据中部分通道的数据替换为所述第三子数据中对应通道的数据,以得到融合后的子数据。
在一种可能的实现中,所述装置还包括:
拼接模块,用于将所述融合后的子数据和所述第一子数据沿着通道维度进行拼接操作,以得到拼接后的子数据;
所述根据所述融合后的子数据,通过所述第二卷积神经网络,得到所述第一概率分布,包括:根据所述拼接后的子数据,通过所述第二卷积神经网络,得到所述第一概率分布。
第五方面,本申请提供了一种数据压缩装置,包括存储介质、处理电路以及总线系统;其中,所述存储介质用于存储指令,所述处理电路用于执行存储器中的指令,以执行上述第一方面任一所述的数据压缩方法。
第六方面,本申请提供了一种数据解压缩装置,包括存储介质、处理电路以及总线系统;其中,所述存储介质用于存储指令,所述处理电路用于执行存储器中的指令,以执行上述第二方面任一所述的数据解压缩方法。
第七方面,本申请实施例提供了一种计算机可读存储介质,所述计算机可读存储介质中存储有计算机程序,当其在计算机上运行时,使得计算机执行上述第一方面任一所述的数据压缩方法。
第八方面,本申请实施例提供了一种计算机可读存储介质,所述计算机可读存储介质中存储有计算机程序,当其在计算机上运行时,使得计算机执行上述第二方面任一所述的数据解压缩方法。
第九方面,本申请实施例提供了一种计算机程序,当其在计算机上运行时,使得计算机执行上述第一方面任一所述的数据压缩方法。
第十方面,本申请实施例提供了一种计算机程序,当其在计算机上运行时,使得计算机执行上述第二方面任一所述的数据解压缩方法。
第十一方面,本申请提供了一种芯片系统,该芯片系统包括处理器,用于支持执行设备(例如数据压缩装置或者数据解压缩装置)或训练设备实现上述方面中所涉及的功能,例如,发送或处理上述方法中所涉及的数据和/或信息。在一种可能的设计中,所述芯片系统还包括存储器,所述存储器,用于保存执行设备或训练设备必要的程序指令和数据。该芯片系统,可以由芯片构成,也可以包括芯片和其他分立器件。
本申请实施例提供了一种数据压缩方法,包括:获取第一目标数据,所述第一目标数据包括第一子数据和第二子数据;根据所述第一子数据,通过变分自编码器的第一解码器,得到第一概率分布,所述第一概率分布用于作为所述第二子数据的条件概率分布;根据所述第一概率分布,通过熵编码器压缩所述第二子数据,以得到第一比特流;将所述第一子数据压缩至所述第一比特流,以得到第二比特流。相比于现有技术中反编码机制所需的额外设置的初始比特,本申请实施例中无需额外设置的初始比特,可以实现单数据点的压缩,且降低了并行压缩时的额外空间成本。
附图说明
图1为人工智能主体框架的一种结构示意图;
图2为本申请实施例的应用场景示意;
图3为本申请实施例的应用场景示意;
图4为一种基于CNN的数据处理过程示意;
图5为一种基于CNN的数据处理过程示意;
图6为本申请实施例提供的一种系统架构的实施例示意;
图7为本申请实施例提供的一种芯片的结构示意;
图8为本申请实施例提供的一种数据压缩方法的流程示意;
图9为本申请实施例提供的一种像素置换操作的实施例示意;
图10为本申请实施例提供的一种解码器的处理流程示意;
图11为本申请实施例提供的解码器的结构示意;
图12为本申请实施例提供的一种数据压缩方法的流程示意;
图13为本申请实施例提供的一种数据压缩方法的流程示意;
图14为本申请实施例提供的一种数据解压缩方法的流程示意;
图15为本申请实施例提供的数据压缩装置的一种结构示意图;
图16为本申请实施例提供的数据解压缩装置的一种结构示意图;
图17为本申请实施例提供的执行设备的一种结构示意图。
具体实施方式
下面结合本发明实施例中的附图对本发明实施例进行描述。本发明的实施方式部分使用的术语仅用于对本发明的具体实施例进行解释,而非旨在限定本发明。
下面结合附图,对本申请的实施例进行描述。本领域普通技术人员可知,随着技术的发展和新场景的出现,本申请实施例提供的技术方案对于类似的技术问题,同样适用。
本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的术语在适当情况下可以互换,这仅仅是描述本申请的实施例中对相同属性的对象在描述时所采用的区分方式。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,以便包含一系列单元的过程、方法、系统、产品或设备不必限于那些单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它单元。
首先对人工智能系统总体工作流程进行描述,请参见图1,图1示出的为人工智能主体框架的一种结构示意图,下面从“智能信息链”(水平轴)和“IT价值链”(垂直轴)两个维度对上述人工智能主体框架进行阐述。其中,“智能信息链”反映从数据的获取到处理的一系列过程。举例来说,可以是智能信息感知、智能信息表示与形成、智能推理、智能决策、智能执行与输出的一般过程。在这个过程中,数据经历了“数据—信息—知识—智慧”的凝练过程。“IT价值链”从人工智能的底层基础设施、信息(提供和处理技术实现)到系统的产业生态过程,反映人工智能为信息技术产业带来的价值。
(1)基础设施
基础设施为人工智能系统提供计算能力支持,实现与外部世界的沟通,并通过基础平台实现支撑。通过传感器与外部沟通;计算能力由智能芯片(CPU、NPU、GPU、ASIC、FPGA等硬件加速芯片)提供;基础平台包括分布式计算框架及网络等相关的平台保障和支持,可以包括云存储和计算、互联互通网络等。举例来说,传感器和外部沟通获取数据,这些数据提供给基础平台提供的分布式计算系统中的智能芯片进行计算。
(2)数据
基础设施的上一层的数据用于表示人工智能领域的数据来源。数据涉及到图形、图像、语音、文本,还涉及到传统设备的物联网数据,包括已有系统的业务数据以及力、位移、液位、温度、湿度等感知数据。
(3)数据处理
数据处理通常包括数据训练,机器学习,深度学习,搜索,推理,决策等方式。
其中,机器学习和深度学习可以对数据进行符号化和形式化的智能信息建模、抽取、预处理、训练等。
推理是指在计算机或智能系统中,模拟人类的智能推理方式,依据推理控制策略,利用形式化的信息进行机器思维和求解问题的过程,典型的功能是搜索与匹配。
决策是指智能信息经过推理后进行决策的过程,通常提供分类、排序、预测等功能。
(4)通用能力
对数据经过上面提到的数据处理后,进一步基于数据处理的结果可以形成一些通用的能力,比如可以是算法或者一个通用系统,例如,翻译,文本的分析,计算机视觉的处理,语音识别,图像的识别等等。
(5)智能产品及行业应用
智能产品及行业应用指人工智能系统在各领域的产品和应用,是对人工智能整体解决方案的封装,将智能信息决策产品化、实现落地应用,其应用领域主要包括:智能终端、智能交通、智能医疗、自动驾驶、智慧城市等。
本申请可以应用于人工智能领域的数据压缩领域中,下面将对多个落地到产品的多个应用场景进行介绍。
一、应用于终端设备中的图像压缩过程
本申请实施例提供的图像压缩方法可以应用于终端设备中的图像压缩过程,具体的,可以应用于终端设备上的相册、视频监控等。具体的,可以参照图2,图2为本申请实施例的应用场景示意,如图2中示出的那样,终端设备可以获取到待压缩图片,其中待压缩图片可以是相机拍摄的照片或是从视频中截取的一帧画面。终端设备可以通过嵌入式神经网络处理器(neural-network processing unit,NPU)中的人工智能(artificial intelligence,AI)编码单元对获取到的待压缩图片进行特征提取,将图像数据变换成冗余度更低的输出特征,且产生输出特征中各点的概率估计,中央处理器(central processing unit,CPU)通过输出特征中各点的概率估计对提取获得的输出特征进行算术编码,降低输出特征的编码冗余,进一步降低图像压缩过程中的数据传输量,并将编码得到的编码数据以数据文件的形式保存在对应的存储位置。当用户需要获取上述存储位置中保存的文件时,CPU可以在相应的存储位置获取并加载上述保存的文件,并基于算术解码获取到解码得到的特征图,通过NPU中的AI解码单元对特征图进行重构,得到重构的图像。
二、应用于云侧的图像压缩过程
本申请实施例提供的图像压缩方法可以应用于云侧的图像压缩过程,具体的,可以应用于云侧服务器上的云相册等功能。具体的,可以参照图3,图3为本申请实施例的应用场景示意,如图3中示出的那样,终端设备可以获取到待压缩图片,其中待压缩图片可以是相机拍摄的照片或是从视频中截取的一帧画面。终端设备可以通过CPU对待压缩图片进行无损编码压缩,得到编码数据,例如但不限于基于现有技术中的任意一种无损压缩方法,终端设备可以将编码数据传输至云侧的服务器,服务器可以对接收到的编码数据进行相应的无损解码,得到待压缩图像,服务器可以通过图形处理器(graphics processing unit,GPU)中的AI编码单元对获取到的待压缩图片进行特征提取,将图像数据变换成冗余度更低的输出特征,且产生输出特征中各点的概率估计,CPU通过输出特征中各点的概率估计对提取获得的输出特征进行算术编码,降低输出特征的编码冗余,进一步降低图像压缩过程中的数据传输量,并将编码得到的编码数据以数据文件的形式保存在对应的存储位置。当用户需要获取上述存储位置中保存的文件时,CPU可以在相应的存储位置获取并加载上述保存的文件,并基于算术解码获取到解码得到的特征图,通过NPU中的AI解码单元对特征图进行重构,得到重构的图像,服务器可以通过CPU对重构的图像进行无损编码压缩,得到编码数据,例如但不限于基于现有技术中的任意一种无损压缩方法,服务器可以将编码数据传输至终端设备,终端设备可以对接收到的编码数据进行相应的无损解码,得到解码后的图像。
由于本申请实施例涉及大量神经网络的应用,为了便于理解,下面先对本申请实施例可能涉及的神经网络的相关术语和概念进行介绍。
(1)神经网络
神经网络可以是由神经单元组成的,神经单元可以是指以x_s和截距1为输入的运算单元,该运算单元的输出可以为:
h_{W,b}(x)=f(W^T x)=f(∑_{s=1}^{n} W_s·x_s + b)
其中,s=1、2、……、n,n为大于1的自然数,W_s为x_s的权重,b为神经单元的偏置。f为神经单元的激活函数(activation functions),用于将非线性特性引入神经网络中,来将神经单元中的输入信号转换为输出信号。该激活函数的输出信号可以作为下一层卷积层的输入,激活函数可以是sigmoid函数。神经网络是将多个上述单一的神经单元联结在一起形成的网络,即一个神经单元的输出可以是另一个神经单元的输入。每个神经单元的输入可以与前一层的局部接受域相连,来提取局部接受域的特征,局部接受域可以是由若干个神经单元组成的区域。
(2)深度神经网络
深度神经网络(deep neural network,DNN),也称多层神经网络,可以理解为具有多层隐含层的神经网络。按照不同层的位置对DNN进行划分,DNN内部的神经网络可以分为三类:输入层,隐含层,输出层。一般来说第一层是输入层,最后一层是输出层,中间的层数都是隐含层。层与层之间是全连接的,也就是说,第i层的任意一个神经元一定与第i+1层的任意一个神经元相连。
虽然DNN看起来很复杂,但是就每一层的工作来说,其实并不复杂,简单来说就是如下线性关系表达式:y=α(W·x+b)。其中,x是输入向量,y是输出向量,b是偏移向量,W是权重矩阵(也称系数),α()是激活函数。每一层仅仅是对输入向量x经过如此简单的操作得到输出向量y。由于DNN层数多,系数W和偏移向量b的数量也比较多。这些参数在DNN中的定义如下所述:以系数W为例:假设在一个三层的DNN中,第二层的第4个神经元到第三层的第2个神经元的线性系数定义为w_{24}^3,上标3代表系数W所在的层数,而下标对应的是输出的第三层索引2和输入的第二层索引4。
综上,第L-1层的第k个神经元到第L层的第j个神经元的系数定义为w_{jk}^L。
需要注意的是,输入层是没有W参数的。在深度神经网络中,更多的隐含层让网络更能够刻画现实世界中的复杂情形。理论上而言,参数越多的模型复杂度越高,“容量”也就越大,也就意味着它能完成更复杂的学习任务。训练深度神经网络的也就是学习权重矩阵的过程,其最终目的是得到训练好的深度神经网络的所有层的权重矩阵(由很多层的向量W形成的权重矩阵)。
(3)卷积神经网络(convolutional neuron network,CNN)是一种带有卷积结构的深度神经网络。卷积神经网络包含了一个由卷积层和子采样层构成的特征抽取器,该特征抽取器可以看作是滤波器。卷积层是指卷积神经网络中对输入信号进行卷积处理的神经元层。在卷积神经网络的卷积层中,一个神经元可以只与部分邻层神经元连接。一个卷积层中,通常包含若干个特征平面,每个特征平面可以由一些矩形排列的神经单元组成。同一特征平面的神经单元共享权重,这里共享的权重就是卷积核。共享权重可以理解为提取特征的方式与位置无关。卷积核可以以随机大小的矩阵的形式初始化,在卷积神经网络的训练过程中卷积核可以通过学习得到合理的权重。另外,共享权重带来的直接好处是减少卷积神经网络各层之间的连接,同时又降低了过拟合的风险。
CNN是一种非常常见的神经网络,下面结合图4重点对CNN的结构进行详细的介绍。如前文的基础概念介绍所述,卷积神经网络是一种带有卷积结构的深度神经网络,是一种深度学习(deep learning)架构,深度学习架构是指通过机器学习的算法,在不同的抽象层级上进行多个层次的学习。作为一种深度学习架构,CNN是一种前馈(feed-forward)人工神经网络,该前馈人工神经网络中的各个神经元可以对输入其中的图像作出响应。
如图4所示,卷积神经网络(CNN)200可以包括输入层210,卷积层/池化层220(其中池化层为可选的),以及全连接层(fully connected layer)230。
卷积层/池化层220:
卷积层:
如图4所示卷积层/池化层220可以包括如示例221-226层,举例来说:在一种实现中, 221层为卷积层,222层为池化层,223层为卷积层,224层为池化层,225为卷积层,226为池化层;在另一种实现方式中,221、222为卷积层,223为池化层,224、225为卷积层,226为池化层。即卷积层的输出可以作为随后的池化层的输入,也可以作为另一个卷积层的输入以继续进行卷积操作。
下面将以卷积层221为例,介绍一层卷积层的内部工作原理。
卷积层221可以包括很多个卷积算子,卷积算子也称为核,其在图像处理中的作用相当于一个从输入图像矩阵中提取特定信息的过滤器,卷积算子本质上可以是一个权重矩阵,这个权重矩阵通常被预先定义,在对图像进行卷积操作的过程中,权重矩阵通常在输入图像上沿着水平方向一个像素接着一个像素(或两个像素接着两个像素……这取决于步长stride的取值)的进行处理,从而完成从图像中提取特定特征的工作。该权重矩阵的大小应该与图像的大小相关,需要注意的是,权重矩阵的纵深维度(depth dimension)和输入图像的纵深维度是相同的,在进行卷积运算的过程中,权重矩阵会延伸到输入图像的整个深度。因此,和一个单一的权重矩阵进行卷积会产生一个单一纵深维度的卷积化输出,但是大多数情况下不使用单一权重矩阵,而是应用多个尺寸(行×列)相同的权重矩阵,即多个同型矩阵。每个权重矩阵的输出被堆叠起来形成卷积图像的纵深维度,这里的维度可以理解为由上面所述的“多个”来决定。不同的权重矩阵可以用来提取图像中不同的特征,例如一个权重矩阵用来提取图像边缘信息,另一个权重矩阵用来提取图像的特定颜色,又一个权重矩阵用来对图像中不需要的噪点进行模糊化等。该多个权重矩阵尺寸(行×列)相同,经过该多个尺寸相同的权重矩阵提取后的特征图的尺寸也相同,再将提取到的多个尺寸相同的特征图合并形成卷积运算的输出。
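上述权重矩阵沿空间维度按步长滑动、与输入子区域做逐元素乘加的过程,可以用如下简化代码示意(仅为示意性草图:假设单通道输入、无填充,函数名与示例数值均为说明性假设,并非本申请方案的实际实现):

```python
def conv2d(image, kernel, stride=1):
    """对单通道二维输入做无填充卷积:权重矩阵(卷积核)沿水平和垂直方向
    按步长滑动,每个位置输出对应子区域与卷积核的逐元素乘积之和。"""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            s = 0
            for di in range(kh):
                for dj in range(kw):
                    s += image[i * stride + di][j * stride + dj] * kernel[di][dj]
            row.append(s)
        output.append(row)
    return output

# 一个3x3输入与2x2卷积核(示例权重)
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]
print(conv2d(image, kernel))  # [[-4, -4], [-4, -4]]
```

实际网络中通常会并行应用多个同尺寸卷积核,并将各自的输出堆叠形成输出的通道维度。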
这些权重矩阵中的权重值在实际应用中需要经过大量的训练得到,通过训练得到的权重值形成的各个权重矩阵可以用来从输入图像中提取信息,从而使得卷积神经网络200进行正确的预测。
当卷积神经网络200有多个卷积层的时候,初始的卷积层(例如221)往往提取较多的一般特征,该一般特征也可以称之为低级别的特征;随着卷积神经网络200深度的加深,越往后的卷积层(例如226)提取到的特征越来越复杂,比如高级别的语义之类的特征,语义越高的特征越适用于待解决的问题。
池化层:
由于常常需要减少训练参数的数量,因此卷积层之后常常需要周期性的引入池化层,在如图4中220所示例的221-226各层,可以是一层卷积层后面跟一层池化层,也可以是多层卷积层后面接一层或多层池化层。在图像处理过程中,池化层的唯一目的就是减少图像的空间大小。池化层可以包括平均池化算子和/或最大池化算子,以用于对输入图像进行采样得到较小尺寸的图像。平均池化算子可以在特定范围内对图像中的像素值进行计算产生平均值作为平均池化的结果。最大池化算子可以在特定范围内取该范围内值最大的像素作为最大池化的结果。另外,就像卷积层中用权重矩阵的大小应该与图像尺寸相关一样,池化层中的运算符也应该与图像的大小相关。通过池化层处理后输出的图像尺寸可以小于输入池化层的图像的尺寸,池化层输出的图像中每个像素点表示输入池化层的图像的对应 子区域的平均值或最大值。
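上述最大池化与平均池化的计算可以用如下片段示意(假设单通道输入、不重叠的方形窗口,函数名为示例性假设):

```python
def pool2d(image, size, mode="max"):
    """对单通道输入做不重叠池化:每个 size x size 子区域
    取最大值(最大池化)或平均值(平均池化)。"""
    out = []
    for i in range(0, len(image) - size + 1, size):
        row = []
        for j in range(0, len(image[0]) - size + 1, size):
            window = [image[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        out.append(row)
    return out

image = [[1, 2, 5, 6],
         [3, 4, 7, 8],
         [9, 10, 13, 14],
         [11, 12, 15, 16]]
print(pool2d(image, 2, "max"))  # [[4, 8], [12, 16]]
print(pool2d(image, 2, "avg"))  # [[2.5, 6.5], [10.5, 14.5]]
```

可以看出,输出图像的空间尺寸缩小为输入的1/size,每个输出像素对应输入的一个子区域。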
全连接层230:
在经过卷积层/池化层220的处理后,卷积神经网络200还不足以输出所需要的输出信息。因为如前所述,卷积层/池化层220只会提取特征,并减少输入图像带来的参数。然而为了生成最终的输出信息(所需要的类信息或其他相关信息),卷积神经网络200需要利用全连接层230来生成一个或者一组所需要的类的数量的输出。因此,在全连接层230中可以包括多层隐含层(如图4所示的231、232至23n),该多层隐含层中所包含的参数可以根据具体的任务类型的相关训练数据进行预先训练得到,例如该任务类型可以包括图像识别,图像分类,图像超分辨率重建等等……
在全连接层230中的多层隐含层之后,也就是整个卷积神经网络200的最后层为输出层240,该输出层240具有类似分类交叉熵的损失函数,具体用于计算预测误差,一旦整个卷积神经网络200的前向传播(如图4由210至240方向的传播为前向传播)完成,反向传播(如图4由240至210方向的传播为反向传播)就会开始更新前面提到的各层的权重值以及偏差,以减少卷积神经网络200的损失,及卷积神经网络200通过输出层输出的结果和理想结果之间的误差。
需要说明的是,如图4所示的卷积神经网络200仅作为一种卷积神经网络的示例,在具体的应用中,卷积神经网络还可以以其他网络模型的形式存在,例如,仅包括图4中所示的网络结构的一部分,比如,本申请实施例中所采用的卷积神经网络可以仅包括输入层210、卷积层/池化层220和输出层240。
需要说明的是,如图4所示的卷积神经网络100仅作为一种卷积神经网络的示例,在具体的应用中,卷积神经网络还可以以其他网络模型的形式存在,例如,如图5所示的多个卷积层/池化层并行,将分别提取的特征均输入给全连接层230进行处理。
(4)损失函数
在训练深度神经网络的过程中,因为希望深度神经网络的输出尽可能的接近真正想要预测的值,所以可以通过比较当前网络的预测值和真正想要的值,再根据两者之间的差异情况来更新每一层神经网络的权重向量(当然,在第一次更新之前通常会有初始化的过程,即为深度神经网络中的各层预先配置参数),比如,如果网络的预测值高了,就调整权重向量让它预测低一些,不断地调整,直到深度神经网络能够预测出真正想要的值或与真正想要的值非常接近的值。因此,就需要预先定义“如何比较预测值和目标值之间的差异”,这便是损失函数(loss function)或目标函数(objective function),它们是用于衡量预测值和目标值的差异的重要方程。其中,以损失函数举例,损失函数的输出值(loss)越高表示差异越大,那么深度神经网络的训练就变成了尽可能缩小这个loss的过程。
(5)反向传播算法
神经网络可以采用误差反向传播(back propagation,BP)算法在训练过程中修正初始的神经网络模型中参数的大小,使得神经网络模型的重建误差损失越来越小。具体地,前向传递输入信号直至输出会产生误差损失,通过反向传播误差损失信息来更新初始的神经网络模型中参数,从而使误差损失收敛。反向传播算法是以误差损失为主导的反向传播运动,旨在得到最优的神经网络模型的参数,例如权重矩阵。
(6)无损压缩:对数据进行压缩的技术,压缩后的数据长度小于原始数据长度。压缩后的数据通过解压,恢复的数据必须与原始数据完全相同。
(7)压缩长度:压缩后的数据所占的存储空间。
(8)压缩率:原始数据长度和压缩后数据长度的比值。如果没有压缩,值为1。该值越大越好。
(9)每维比特数:压缩后的数据每个维度(字节)的平均比特长度。计算公式为:8/压缩率。如果没有压缩,该值为8。该值越小越好。
(10)吞吐率:平均每秒处理的数据量大小。
(11)隐变量:一种具有特定概率分布的数据,通过建立这些数据与原始数据的条件概率,能够得到原始数据的概率分布。
(12)编码/解码:对数据压缩的过程是编码,解压的过程是解码。
(13)反编码:一种特殊的编码技术,利用系统中存储的额外二进制数据用解码生成特定的数据。
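结合上文(8)和(9)中的定义,压缩率与每维比特数的换算可以直接实现如下(仅为定义的逐字实现,数值为示例):

```python
def compression_ratio(original_len, compressed_len):
    """压缩率 = 原始数据长度 / 压缩后数据长度(以字节计);无压缩时为1,越大越好。"""
    return original_len / compressed_len

def bits_per_dim(ratio):
    """每维比特数 = 8 / 压缩率;无压缩时为8,越小越好。"""
    return 8 / ratio

ratio = compression_ratio(1024, 256)  # 原始1024字节被压缩到256字节
print(ratio)                # 4.0
print(bits_per_dim(ratio))  # 2.0
```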
下面结合图6对本申请实施例提供的系统架构进行详细的介绍。图6为本申请一实施例提供的系统架构示意图。如图6所示,系统架构500包括执行设备510、训练设备520、数据库530、客户设备540、数据存储系统550以及数据采集系统560。
执行设备510包括计算模块511、I/O接口512、预处理模块513和预处理模块514。计算模块511中可以包括目标模型/规则501,预处理模块513和预处理模块514是可选的。
作为一种示例,所述执行设备510可以为手机、平板、笔记本电脑、智能穿戴设备等,终端设备可以对获取到的图片进行压缩处理。作为另一示例,所述终端设备可以为虚拟现实(virtual reality,VR)设备。作为另一示例,本申请实施例也可以应用于智能监控中,可以在所述智能监控中配置相机,则智能监控可以通过相机获取待压缩图片等,应当理解,本申请实施例还可以应用于其他需要进行图像压缩的场景中,此处不再对其他应用场景进行一一列举。
数据采集设备560用于采集训练数据。在采集到训练数据之后,数据采集设备560将这些训练数据存入数据库530,训练设备520基于数据库530中维护的训练数据训练得到目标模型/规则501。
上述目标模型/规则501(例如本申请实施例中的变分自编码器、熵编码器等)能够用于实现数据的压缩以及解压缩任务,即,将待处理数据(例如本申请实施例中的第一目标数据)输入该目标模型/规则501,即可得到压缩后的数据(例如本申请实施例中的第二比特流)。需要说明的是,在实际应用中,数据库530中维护的训练数据不一定都来自于数据采集设备560的采集,也有可能是从其他设备接收得到的。另外需要说明的是,训练设备520也不一定完全基于数据库530维护的训练数据进行目标模型/规则501的训练,也有可能从云端或其他地方获取训练数据进行模型训练,上述描述不应该作为对本申请实施例的 限定。
根据训练设备520训练得到的目标模型/规则501可以应用于不同的系统或设备中,如应用于图6所示的执行设备510,所述执行设备510可以是终端,如手机终端,平板电脑,笔记本电脑,增强现实(augmented reality,AR)/虚拟现实(virtual reality,VR)设备,车载终端等,还可以是服务器或者云端等。在图6中,执行设备510配置输入/输出(input/output,I/O)接口512,用于与外部设备进行数据交互,用户可以通过客户设备540向I/O接口512输入数据。
预处理模块513和预处理模块514用于根据I/O接口512接收到的输入数据进行预处理。应理解,可以没有预处理模块513和预处理模块514,或者只有一个预处理模块。当不存在预处理模块513和预处理模块514时,可以直接采用计算模块511对输入数据进行处理。
在执行设备510对输入数据进行预处理,或者在执行设备510的计算模块511执行计算等相关的处理过程中,执行设备510可以调用数据存储系统550中的数据、代码等以用于相应的处理,也可以将相应处理得到的数据、指令等存入数据存储系统550中。
最后,I/O接口512将处理结果,呈现给客户设备540,从而提供给用户。
在图6所示情况下,用户可以手动给定输入数据,该“手动给定输入数据”可以通过I/O接口512提供的界面进行操作。另一种情况下,客户设备540可以自动地向I/O接口512发送输入数据,如果要求客户设备540自动发送输入数据需要获得用户的授权,则用户可以在客户设备540中设置相应权限。用户可以在客户设备540查看执行设备510输出的结果,具体的呈现形式可以是显示、声音、动作等具体方式。客户设备540也可以作为数据采集端,采集如图所示输入I/O接口512的输入数据及输出I/O接口512的输出结果作为新的样本数据,并存入数据库530。当然,也可以不经过客户设备540进行采集,而是由I/O接口512直接将如图所示输入I/O接口512的输入数据及输出I/O接口512的输出结果,作为新的样本数据存入数据库530。
值得注意的是,图6仅是本申请实施例提供的一种系统架构的示意图,图中所示设备、器件、模块等之间的位置关系不构成任何限制,例如,在图6中,数据存储系统550相对执行设备510是外部存储器,在其它情况下,也可以将数据存储系统550置于执行设备510中。
下面介绍本申请实施例提供的一种芯片硬件结构。
图7为本申请一实施例提供的芯片硬件结构图,该芯片包括神经网络处理器700。该芯片可以被设置在如图6所示的执行设备510中,用以完成计算模块511的计算工作。该芯片也可以被设置在如图6所示的训练设备520中,用以完成训练设备520的训练工作并输出目标模型/规则501。如图6所示的图像处理模型中各层的算法均可在如图7所示的芯片中得以实现。
神经网络处理器(neural processing unit,NPU)700作为协处理器挂载到主中央处理单元(host central processing unit,host CPU)上,由主CPU分配任务。NPU的核心部分为运算电路703,控制器704控制运算电路703提取存储器(权重存储器702或输入存储器701)中的数据并进行运算。
在一些实现中,运算电路703内部包括多个处理单元(process engine,PE)。在一些实现中,运算电路703是二维脉动阵列。运算电路703还可以是一维脉动阵列或者能够执行例如乘法和加法这样的数学运算的其它电子线路。在一些实现中,运算电路703是通用的矩阵处理器。
举例来说,假设有输入矩阵A,权重矩阵B,输出矩阵C。运算电路703从权重存储器702中取矩阵B相应的数据,并缓存在运算电路703中每一个PE上。运算电路703从输入存储器701中取矩阵A数据与矩阵B进行矩阵运算,得到的矩阵的部分结果或最终结果,保存在累加器(accumulator)708中。
向量计算单元707可以对运算电路703的输出做进一步处理,如向量乘,向量加,指数运算,对数运算,大小比较等等。例如,向量计算单元707可以用于神经网络中非卷积/非FC层的网络计算,如池化(pooling),批归一化(batch normalization),局部响应归一化(local response normalization)等。
在一些实现中,向量计算单元707能将经处理的输出的向量存储到统一存储器706。例如,向量计算单元707可以将非线性函数应用到运算电路703的输出,例如累加值的向量,用以生成激活值。在一些实现中,向量计算单元707生成归一化的值、合并值,或二者均有。在一些实现中,处理过的输出的向量能够用作到运算电路703的激活输入,例如用于在神经网络中的后续层中的使用。
统一存储器706用于存放输入数据以及输出数据。
权重数据直接通过存储单元访问控制器(direct memory access controller,DMAC)705将外部存储器中的输入数据搬运到输入存储器701和/或统一存储器706、将外部存储器中的权重数据存入权重存储器702,以及将统一存储器706中的数据存入外部存储器。
总线接口单元(bus interface unit,BIU)710,用于通过总线实现主CPU、DMAC和取指存储器709之间进行交互。
与控制器704连接的取指存储器(instruction fetch buffer)709,用于存储控制器704使用的指令。
控制器704,用于调用取指存储器709中缓存的指令,实现控制该运算加速器的工作过程。
一般地,统一存储器706、输入存储器701、权重存储器702以及取指存储器709均为片上(on-chip)存储器,外部存储器为该NPU外部的存储器,该外部存储器可以为双倍数据率同步动态随机存储器(double data rate synchronous dynamic random access memory,DDR SDRAM)、高带宽存储器(high bandwidth memory,HBM)或其他可读可写的存储器。
数据的无损压缩是信息技术领域的重要基础方向之一。其目的为建立原数据空间到编码空间的双射,使得出现频率较高且较长的数据被较短的编码表示,从而在平均意义上获得更短的数据表示长度,并且能根据该双射在原数据空间和编码空间之间实现一对一的转换。根据香农信源编码定理,数据的最优无损压缩长度由数据概率分布的香农信息熵决定;并且对数据概率分布估计得越准确,越能得到接近最优无损压缩长度。
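作为上述香农信源编码定理的一个简单示意,下面按定义计算离散分布的香农熵(单位为比特),它给出了平均无损编码长度的下界(代码仅为定义的直接实现,示例分布为假设):

```python
import math

def shannon_entropy(probs):
    """计算离散分布的香农熵(比特):H = -sum(p * log2(p))。
    该值是该分布下每个符号平均无损编码长度的下界。"""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# 4个符号上的均匀分布:每个符号最优平均需要2比特
print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0
# 偏斜分布:熵更低,说明平均可以压缩得更短
print(shannon_entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75
```

这也对应了正文中的结论:对数据概率分布估计得越准确(估计分布与真实分布越接近),平均编码长度越接近这一下界。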
基于人工智能的无损压缩方案利用了深度生成模型能够比传统的方案更准确地估计数据的概率分布这一特性,得到了远优于传统无损压缩方案的压缩比。在基于人工智能的无损压缩方案中,被广泛使用的深度生成模型包括自回归模型(autoregressive models),变分自编码器(variational auto-encoder,VAE),流模型(normalizing flows)等。一般来讲,自回归模型可较好地兼容算术编码器和霍夫曼编码;变分自编码器结合使用反编码(bits-back)机制可较好地兼容非对称数字系统;流模型可以兼容上述三种不同的熵编码器。除了压缩比以外,评价无损压缩解决方案的指标还有吞吐率。对于基于人工智能的无损压缩解决方案来说,由于模型规模远大于传统方案,因此整体吞吐率低于传统方案。另外,综合压缩比和吞吐率两个指标来说,基于不同生成模型的无损压缩解决方案目前没有绝对的优劣之分。目前的研究尚处于对不同生成模型的压缩方案探索其帕累托前沿的阶段。
其中,区别于全观测模型(如自回归模型),变分自编码器模型是一种隐变量模型。该类模型并非对原始数据本身直接建模,而是额外引入了一个(或者多个)隐变量,然后对先验分布,似然函数以及近似后验分布进行建模。由于从变分自编码器中无法直接获得原始数据的边际分布,传统的熵编码方式无法直接被沿用。为了能够使用变分自编码器进行数据的无损压缩,基于反编码机制的变分自编码无损压缩方案被提出。bits-back ANS是该方案的原始形式,适用于只包含一个隐变量的变分自编码器模型,并且可以推广适用到包含多个隐变量的变分自编码器模型。
以Bits-Back ANS和包含一个隐变量的变分自编码器为例,在包含一个隐变量的变分自编码器中,模型可以分为三个模块,即:先验模块、变分编码器模块和解码器模块。以上三个模块可以用来分别确定以下三个分布的参数,即:隐变量的先验分布,隐变量的似然函数(数据的条件概率分布)和隐变量的近似后验分布。
该技术方案中数据的压缩步骤为:
1.获取待压缩数据;
2.获取额外的初始比特数据(比特流1);
3.将待压缩数据输入变分编码器,从而获得隐变量的近似后验分布;根据近似后验分布从比特流1中使用熵编码器解码出一个隐变量的样本,并获得比特流2;
4.将解压出的隐变量样本输入解码器,从而获得数据的条件概率分布;根据数据的条件概率分布使用熵编码器将待压缩数据压缩进比特流2,从而获得比特流3;
5.从先验模块中获取隐变量的先验分布;根据隐变量的先验分布使用熵编码器将上述隐变量的样本压缩进比特流3,从而获得比特流4;
6.输出比特流4作为最终压缩后比特数据。
该技术方案中数据的解压步骤为:
1.获取待解压的比特数据(比特流4);
2.从先验模块中获取隐变量的先验分布;根据隐变量的先验分布使用熵编码器从比特流4中解压出压缩阶段使用的隐变量样本,从而获得比特流3;
3.将上述隐变量样本输入解码器,从而获得数据的条件概率分布;根据数据的条件概率分布使用熵编码器从比特流3中解压出被压缩的数据,并获得比特流2;
4.将解压出的数据输入变分编码器,从而获得隐变量的近似后验分布;根据近似后验分布使用熵编码器压缩上述隐变量样本进入比特流2,从而获得比特流1;
5.输出比特流1作为还原的额外初始比特;
6.输出解压缩出的数据。
现行的基于反编码机制的变分自编码器无损压缩方案均需要额外的初始比特用以解压出隐变量的样本。额外的初始比特为随机生成的数据,该数据的大小需要考虑进压缩成本中,且在待串行压缩的数据数量较少时,额外的平均成本较高;且,由于所需的额外初始比特与待压缩数据点的个数成正比,因此无法实现高效的并行压缩。
基于以上技术背景,本发明是针对基于变分自编码器的人工智能无损压缩方案进行的改良。本发明改良了该细分领域的两大痛点问题:一是通过引入一种特殊的自回归结构降低了变分自编码器达到相同压缩比所需的参数量,从而提升了吞吐率;二是通过引入一种特殊的变分编码器和解码器结构和提出新的反编码算法,移除了基于变分自编码器无损压缩方案之前所必须的随机初始比特,从而实现了该方案的单数据点压缩解压以及高效并行压缩解压。
参照图8,图8为本申请实施例提供的一种数据压缩方法的实施例示意,如图8示出的那样,本申请实施例提供的一种数据压缩方法包括:
801、获取第一目标数据,所述第一目标数据包括第一子数据和第二子数据。
在一种可能的实现中,第一目标数据可以为供压缩的图像数据或者是其他数据(例如文本、视频等),其中,第一目标数据可以是上述终端设备通过摄像头拍摄到的图像(或者是图像的一部分),或者,该第一目标数据还可以是从终端设备内部获得的图像(例如,终端设备的相册中存储的图像,或者,终端设备从云端获取的图片)。应理解,上述第一目标数据可以是具有图像压缩需求的数据,本申请并不对第一目标图像的来源作任何限定。
在一种可能的实现中,所述第一目标数据为图像块,所述第一子数据和所述第二子数据为对所述图像块进行数据切分后得到的。
在一种可能的实现中,所述第一子数据和所述第二子数据为对所述图像块在空间维度或者通道维度上进行数据切分后得到的。其中,对于图像数据来说,包含了一个通道维度(C)和两个空间维度(宽W和高H)。
例如,第一目标数据可以包括6个通道,第一子数据可以为第一目标数据中的前三个通道的数据,第二子数据可以为第一目标数据中的后三个通道数据。
例如,第一目标数据在空间维度上的尺寸为N*N,第一子数据可以为第一目标数据中空间维度在(0至N/2)*N的数据,第二子数据可以为第一目标数据中空间维度在(N/2至N)*N的数据。
应理解,本申请并不限定在对第一目标数据进行数据切分的切分规则。
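以一个C*H*W的玩具数据为例,按通道维度或空间维度切分为第一子数据与第二子数据可以示意如下(切分位置与函数名均为示例性假设,本申请并不限定具体切分规则):

```python
def split_channels(data, c_split):
    """按通道维度切分:前 c_split 个通道为第一子数据,其余通道为第二子数据。"""
    return data[:c_split], data[c_split:]

def split_spatial(data, h_split):
    """按空间(高度)维度切分:每个通道的前 h_split 行为第一子数据,其余行为第二子数据。"""
    first = [ch[:h_split] for ch in data]
    second = [ch[h_split:] for ch in data]
    return first, second

# 2个通道、2x2空间尺寸的玩具数据(C*H*W 的嵌套列表)
data = [[[1, 2], [3, 4]],
        [[5, 6], [7, 8]]]
s1, s2 = split_channels(data, 1)
print(s1)  # [[[1, 2], [3, 4]]]
h1, h2 = split_spatial(data, 1)
print(h1)  # [[[1, 2]], [[5, 6]]]
```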
802、根据所述第一子数据,通过变分自编码器的第一解码器,得到第一概率分布,所述第一概率分布用于作为所述第二子数据的条件概率分布;
在一种可能的实现中,变分自编码器可以包括变分编码器,解码器(例如本申请实施例中的第一解码器和第二解码器)和隐变量的先验分布。
在一种可能的实现中,解码器可以由解码器层(例如本申请实施例中的第一卷积神经网络以及第二卷积神经网络)构成,且包含的解码器层的个数与变分自编码器中隐变量的个数相同。解码器层的作用是输入更深层的隐变量,输出当前层数据的条件概率分布(当前层数据可以是更浅层的隐变量或者原始数据)。
在现有的变分自编码器模型中,变分编码器需要输入整个原始数据以预测隐变量的近似后验分布,解码器中输入隐变量直接预测整个原始数据的条件概率分布。在本申请实施例中,将待压缩的数据分成至少两部分,即:第一子数据和第二子数据。和现有将全部数据输入到变分编码器不同的是,本申请实施例中仅将数据的一部分(第一子数据)输入到变分编码器,来预测隐变量的近似后验分布,且隐变量输入第二解码器后预测第一子数据的条件概率分布;第二子数据的条件概率分布依赖于第一子数据,具体可以由将第一子数据输入第一解码器来确定。
接下来介绍本申请实施例中的解码器的结构:
在一种可能的实现中,解码器可以实现像素重置操作。
接下来介绍本申请实施例中的空间维度到通道维度的像素重置操作:
在一种可能的实现中,可以配置参数(记为k)和两个在通道维度和空间维度进行像素重置的可逆操作(记为空间转通道操作和通道转空间操作)。其中,参数k取值正整数,决定了上述两个可逆操作中输入和输出张量空间维度尺寸变化的比例。对于同一个k,上述空间转通道操作和通道转空间操作互逆。
对于图像来说,图像数据可以表示为向量,其包含了一个通道维度(C)和两个空间维度(宽W和高H)。由于深度学习技术中数据批处理的特性,其相应的张量表征则多出一个批维度(N),即图片数据张量包含四个维度(NCHW或者NHWC)。以NCHW为例,一个大小为n1*c1*h1*w1的张量经过参数为k的空间转通道操作可以变为大小为n1*(k^2*c1)*(h1/k)*(w1/k)的张量。此处要求h1和w1均可以被k整除。一个大小为n2*c2*h2*w2的张量经过参数为k的通道转空间操作变为大小为n2*(c2/k^2)*(k*h2)*(k*w2)的张量。可以看出,上述两个操作不改变张量中元素的总个数,只改变元素在张量中的位置。对于不同的像素重置的规则,可以得到不同的像素重置装置。本申请实施例使用的像素重置操作采用通道优先的方式。由于空间转通道和通道转空间两个操作在固定k时互逆,图9示出了当n取值为1,h和w取值为4,c取值为3,k取值为2时,空间转通道的操作效果。
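上述空间转通道操作及其逆操作可以用如下简化代码示意(省略批维度N,以C*H*W的嵌套列表表示张量;输出通道的具体排列顺序为示例性假设,实际的通道优先排列以图9为准):

```python
def space_to_channel(x, k):
    """空间维度到通道维度的像素重置:输入 C*H*W,输出 (k^2*C)*(H/k)*(W/k)。
    元素总数不变,只改变元素的位置(此处的排列顺序仅为一种示例)。"""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    assert h % k == 0 and w % k == 0, "h和w必须能被k整除"
    out = []
    for ch in range(c):
        for di in range(k):
            for dj in range(k):
                out.append([[x[ch][i * k + di][j * k + dj]
                             for j in range(w // k)] for i in range(h // k)])
    return out

def channel_to_space(y, k):
    """逆操作:输入 (k^2*C)*(H/k)*(W/k),输出 C*H*W。与上面的操作对同一k互逆。"""
    c2, h2, w2 = len(y), len(y[0]), len(y[0][0])
    assert c2 % (k * k) == 0
    c = c2 // (k * k)
    x = [[[0] * (w2 * k) for _ in range(h2 * k)] for _ in range(c)]
    idx = 0
    for ch in range(c):
        for di in range(k):
            for dj in range(k):
                for i in range(h2):
                    for j in range(w2):
                        x[ch][i * k + di][j * k + dj] = y[idx][i][j]
                idx += 1
    return x

# 对应文中示例:c=3, h=w=4, k=2 → 输出通道数12,空间尺寸2x2
x = [[[ch * 16 + i * 4 + j for j in range(4)] for i in range(4)] for ch in range(3)]
y = space_to_channel(x, 2)
print(len(y), len(y[0]), len(y[0][0]))  # 12 2 2
assert channel_to_space(y, 2) == x       # 两操作在固定k时互逆
```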
以第一解码器为例,在一种可能的实现中,所述第一解码器可以包括第一卷积神经网络和第二卷积神经网络,所述根据所述第一子数据,通过变分自编码器的第一解码器,得到第一概率分布,具体可以包括:对包括所述第二子数据的第二目标数据进行空间维度到通道维度的像素重置操作,以得到第三子数据,所述第二目标数据和所述第一目标数据的尺寸大小一致,所述第三子数据和所述第一子数据在空间维度的尺寸相同;
其中,包括所述第二子数据的第二目标数据可以为和第一目标数据尺寸相同的数据,其中,在第一目标数据中,除了第二子数据之外的元素可以被置零(或者其他预设数值),以得到第二目标数据,对第二目标数据进行像素重置操作后,可以将其转化为和第一子数据在空间维度的尺寸相同的第三子数据。例如可以参照图10,其中,当前层变量可以为上述第二目标数据,可以对当前层变量进行像素重置操作。
在一种可能的实现中,可以根据所述第一子数据,通过所述第一卷积神经网络,得到第四子数据,所述第四子数据和所述第三子数据在通道维度的尺寸相同。也就是说,可以通过第一卷积神经网络,对第一子数据进行特征提取以及尺寸的变换,以便得到一个和第三子数据在通道维度的尺寸相同的第四子数据。
在一种可能的实现中,可以将所述第三子数据和所述第四子数据进行融合,以得到融合后的子数据。可选的,融合方式可以为对应通道的数据替换。
在一种可能的实现中,所述将所述第三子数据和所述第四子数据进行融合,具体可以包括:将所述第四子数据中部分通道的数据替换为所述第三子数据中对应通道的数据,以得到融合后的子数据。
其中,在计算第二子数据的第i+1个通道像素的概率分布时,可以用第三子数据z′i-1的前i个通道替换第四子数据z″i的前i个通道。
在一种可能的实现中,可以将根据所述融合后的子数据,通过所述第二卷积神经网络,得到所述第一概率分布。
在一种可能的实现中,还可以将所述融合后的子数据和所述第一子数据沿着通道维度进行拼接操作(concat),以得到拼接后的子数据;进而,所述根据所述融合后的子数据,通过所述第二卷积神经网络,得到所述第一概率分布,具体可以包括:根据所述拼接后的子数据,通过所述第二卷积神经网络,得到所述第一概率分布。
示例性的,参照图10,更深层隐变量为zi,输出为当前层变量zi-1的条件概率分布。神经网络一的输入为zi,输出为与zi-1大小相同(包括的元素数量相同)的张量z′i。张量z′i经过参数为k的空间到通道像素重置操作变为和zi空间维度大小相同的张量z″i。由于张量zi与张量z″i空间维度大小相同,故可以按通道维度进行拼接操作,得到拼接后的张量z″′i。记张量zi-1经过参数为k的空间到通道像素重置操作变为和zi空间维度大小相同的张量z′i-1。该解码器层引入自回归结构的方式为:将张量z″′i输入神经网络二,得到张量z′i-1第一个通道像素的概率分布参数;将张量z′i-1前i个通道替换z″′i中来自z″i的前i个通道并输入神经网络二,得到张量z′i-1第i+1个通道像素的概率分布参数。
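上述“用前i个通道进行替换”的融合以及沿通道维度的拼接,可以用如下片段示意(张量简化为通道列表,变量名与占位数据均为示例性假设):

```python
def fuse_replace(z4, z3, i):
    """将第四子数据 z4 的前 i 个通道替换为第三子数据 z3 中对应通道的数据,
    其余通道保持 z4 不变,得到融合后的子数据。"""
    return z3[:i] + z4[i:]

def concat_channels(a, b):
    """沿通道维度拼接两个空间尺寸相同的张量。"""
    return a + b

z4 = [["z4_c0"], ["z4_c1"], ["z4_c2"]]  # 第四子数据的3个通道(占位表示)
z3 = [["z3_c0"], ["z3_c1"], ["z3_c2"]]  # 第三子数据的3个通道(占位表示)

# 计算第3个通道(i+1=3,即i=2)的概率分布时,前2个通道来自真实的第三子数据
fused = fuse_replace(z4, z3, 2)
print(fused)  # [['z3_c0'], ['z3_c1'], ['z4_c2']]
print(concat_channels(fused, [["x1_c0"]]))  # 再与第一子数据的通道拼接后送入第二卷积神经网络
```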
参照图11,图11为本申请实施例中一个变分自编码器的示意,其中,解码器一相当于本申请实施例中的第二解码器,解码器二相当于本申请实施例中的第一解码器。
示例性的,如下表1所述,表1示出了包含一个隐变量的变分自编码器为例的示例性流程。其压缩流程如表1左所示,解压流程如表1右所示。其隐变量的近似后验分布q(z1|x)的参数由变分编码器输入待压数据后给出。其最深层隐变量的先验分布的参数由模型中最深层隐变量先验分布模块的参数直接给出。其余的条件概率分布的参数均由相应的解码器层,通过输入条件数据的值输出。每一个涉及的解码器层结构可以参照上述实施例中关于解码器的描述。表1中x1,…x12为数据x(包含3个通道)输入通过通道优先的像素重置(参数k为2)进行空间维度到通道维度的变换所得的12个通道。对于同理。
表1
本申请实施例通过使用基于通道优先像素重置定义的自回归结构的编码器层,充分利用图片像素间的相关关系,从而在获得更低的编码长度的前提下大幅降低模型所需参数量,进而提高了压缩的吞吐率以及降低了模型存储的空间成本。
803、根据所述第一概率分布,通过熵编码器压缩所述第二子数据,以得到第一比特流;
在一种可能的实现中,可以根据所述第一概率分布,通过熵编码器压缩所述第二子数据,以得到第一比特流。第一比特流可以作为初始比特流,并对所述第一子数据进行压缩。相比于现有技术中反编码机制所需的额外设置的初始比特,本申请实施例中无需额外设置的初始比特,可以实现单数据点的压缩,且大大降低了并行压缩时的额外空间成本。
804、将所述第一子数据压缩至所述第一比特流。
在一种可能的实现中,所述将所述第一比特流作为初始比特流,对所述第一子数据进行压缩,具体可以包括:根据所述第一子数据,通过所述变分自编码器中的变分编码器,得到隐变量的近似后验分布;根据所述近似后验分布,从所述第一比特流中通过所述熵编码器解码出所述隐变量,得到第三比特流;根据所述隐变量,通过所述变分自编码器的第二解码器,得到第二概率分布;所述第二概率分布用于作为所述第一子数据的条件概率分布;根据所述第二概率分布,通过所述熵编码器将所述第一子数据压缩至所述第三比特流,以得到第四比特流;根据所述隐变量的先验分布,通过所述熵编码器将所述隐变量压缩至所述第四比特流,得到第二比特流。
示例性的,在编码侧,可以执行如下步骤:
1.获取第一目标数据;
2.将第一目标数据分为第一子数据和第二子数据,将第一子数据输入第一解码器获取第二子数据的条件概率分布;使用第二子数据的条件概率分布压缩第二子数据从而获得初始比特数据(第一比特流);
3.将第一子数据输入变分编码器,从而获得隐变量的近似后验分布;根据近似后验分布从第一比特流中使用熵编码器解码出一个隐变量的样本,并获得第三比特流;
4.将解压出的隐变量样本输入第二解码器,从而获得数据的条件概率分布;根据数据的条件概率分布使用熵编码器将第一子数据压缩进第三比特流,从而获得第四比特流;
5.从先验模块中获取隐变量的先验分布;根据隐变量的先验分布使用熵编码器将上述隐变量的样本压缩进第四比特流,从而获得第二比特流;
6.输出第二比特流作为最终压缩后的比特数据。
相应的,在解码段,解压步骤可以为:
1.获取待解压的比特数据(第二比特流);
2.从先验模块中获取隐变量的先验分布;根据隐变量的先验分布使用熵编码器从第二比特流中解压出压缩阶段使用的隐变量样本,从而获得第四比特流;
3.将上述隐变量样本输入解码器一,从而获得子数据一的条件概率分布;根据子数据一的条件概率分布使用熵编码器从第四比特流中解压出被压缩的子数据一,并获得第三比特流;
4.将解压出的子数据一输入变分编码器,从而获得隐变量的近似后验分布;根据近似后验分布使用熵编码器压缩上述隐变量样本进入第三比特流,从而获得第一比特流;
5.将子数据一输入解码器二,从而获得子数据二的条件概率分布;根据子数据二的条件概率分布从第一比特流中解压出子数据二,此时比特流被消耗完毕;
6.将子数据一和子数据二按照压缩时的分离方式逆转,从而得到原数据,输出解压缩出的原数据(第一目标数据)。
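上述压缩与解压步骤的对称性可以如下示意:由于ANS类熵编码器是后进先出(LIFO)的,解压侧按压缩侧的逆序执行各步,并将编码与解码互换(代码不含真实的熵编码,仅演示操作次序,字符串描述为示例性假设):

```python
# 压缩侧的操作序列(对应上文压缩步骤2至5)
compress_ops = [
    ("encode", "x2", "p(x2|x1)"),  # 用第二子数据的条件概率分布压缩x2,形成初始比特
    ("decode", "z",  "q(z|x1)"),   # 根据近似后验分布从比特流中解码出隐变量样本
    ("encode", "x1", "p(x1|z)"),   # 用第一子数据的条件概率分布压缩x1
    ("encode", "z",  "p(z)"),      # 用先验分布压缩隐变量
]

def derive_decompress_ops(ops):
    """ANS类熵编码器后进先出:解压按压缩的逆序执行,并交换encode/decode。"""
    swap = {"encode": "decode", "decode": "encode"}
    return [(swap[op], sym, dist) for (op, sym, dist) in reversed(ops)]

print(derive_decompress_ops(compress_ops))
# 依次为:decode z|p(z) → decode x1|p(x1|z) → encode z|q(z|x1) → decode x2|p(x2|x1)
```

可以看出,导出的解压序列与上文解压步骤2至5一致,且最后一步解码x2后比特流恰好被消耗完毕,无需任何额外的初始比特。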
参照图12,图12为隐变量数量为1时的数据压缩过程的一个示意,其中S为第一目标数据,S1为第一子数据,S2为第二子数据。
为了比较本申请实施例与现有方案的区别,将其流程分别展示于图13。图13左示出了现有方案的压缩解压核心流程;图13右示出了本申请实施例无需额外初始比特的压缩解压核心流程。
参照表2,表2为基于通道优先像素重置定义的自回归结构的解码器层和无需额外初始比特的变分自编码器无损压缩解决方案的核心方法流程。
表2
接下来结合实验结果描述本申请实施例的有益效果。
在本申请实施例中,由于使用的编码器较好地利用了图片数据的像素间的相关性,能够在比同类型模型的无损压缩方案给出更优编码长度的同时,将模型大小减小100倍。
表3展示了本方案(SHVC)与其他业界最优方案在公开数据集上的平均每维度编码比特数(bpd)。可以看出本方案效果在所有对比方案中(包括传统方案,VAE模型方案和流模型方案)均为最优或者接近最优。在同类型方案中(VAE based)为最优。
表4展示了本方案除了编码长度优势外,由于模型参数量较少,其推理时间大大降低,从而提升了压缩和解压的吞吐率。其数据统计设定为10000张CIFAR10图片,批尺寸为100,硬件为一张V100显卡。
表3
表4
此外,本申请实施例可以避免现行反编码机制所需的额外初始比特,实现单数据点的压缩和高效并行压缩。表5中展示了本申请实施例(SHVC)、使用(SHVC-ARIB)以及使用确定性后验(本质上为自编码器模型)和无反编码机制的方案(SHVC-Det)三种情况下考虑进额外初始比特时的平均每维度编码长度。从表5可以看出,使用本方案可以比现行的反编码压缩算法将额外空间成本降低高达30倍。
表5
应理解,本申请实施例也可以用到数据的有损压缩中。
本申请实施例提供了一种数据压缩方法,包括:获取第一目标数据,所述第一目标数据包括第一子数据和第二子数据;根据所述第一子数据,通过变分自编码器的第一解码器,得到第一概率分布,所述第一概率分布用于作为所述第二子数据的条件概率分布;根据所述第一概率分布,通过熵编码器压缩所述第二子数据,以得到第一比特流;将所述第一比特流作为初始比特流,对所述第一子数据进行压缩,以得到第二比特流。相比于现有技术中反编码机制所需的额外设置的初始比特,本申请实施例中无需额外设置的初始比特,可以实现单数据点的压缩,且大大降低了并行压缩时的额外空间成本。
参照图14,图14为本申请实施例提供的一种数据解压缩方法的流程示意,如图14所示,本申请实施例提供的数据解压缩方法,包括:
1401、获取第二比特流以及隐变量的先验分布;
1402、根据所述先验分布,通过熵编码器从所述第二比特流中解压出所述隐变量,得到第四比特流;
1403、根据所述隐变量,通过所述变分自编码器的第二解码器,得到第二概率分布;所述第二概率分布用于作为第一子数据的条件概率分布;
1404、根据所述第二概率分布,通过所述熵编码器从所述第四比特流中解压出所述第一子数据,得到第三比特流;
1405、根据所述第一子数据,通过所述变分自编码器中的变分编码器,得到隐变量的近似后验分布;
1406、根据所述近似后验分布,通过所述熵编码器将所述隐变量压缩至所述第三比特流,得到第一比特流;
1407、根据所述第一子数据,通过所述变分自编码器的第一解码器,得到第一概率分布,所述第一概率分布用于作为所述第二子数据的条件概率分布;
1408、根据所述第一概率分布,通过所述熵编码器从所述第一比特流中解压出第二子数据;所述第一子数据和所述第二子数据用于确定第一目标数据。
在一种可能的实现中,所述第一目标数据为图像块,所述第一子数据和所述第二子数据为对所述图像块进行数据切分后得到的。
在一种可能的实现中,所述第一子数据和所述第二子数据为对所述图像块在空间维度或者通道维度上进行数据切分后得到的。
在一种可能的实现中,所述第一解码器包括第一卷积神经网络和第二卷积神经网络,所述根据所述第一子数据,通过变分自编码器的第一解码器,得到第一概率分布,包括:
对包括所述第二子数据的第二目标数据进行空间维度到通道维度的像素重置操作,以得到第三子数据,所述第二目标数据和所述第一目标数据的尺寸大小一致,所述第三子数据和所述第一子数据在空间维度的尺寸相同;
根据所述第一子数据,通过所述第一卷积神经网络,得到第四子数据,所述第四子数据和所述第三子数据在通道维度的尺寸相同;
将所述第三子数据和所述第四子数据进行融合,以得到融合后的子数据;
根据所述融合后的子数据,通过所述第二卷积神经网络,得到所述第一概率分布。
在一种可能的实现中,所述将所述第三子数据和所述第四子数据进行融合,包括:
将所述第四子数据中部分通道的数据替换为所述第三子数据中对应通道的数据,以得到融合后的子数据。
在一种可能的实现中,所述方法还包括:
将所述融合后的子数据和所述第一子数据沿着通道维度进行拼接操作,以得到拼接后的子数据;
所述根据所述融合后的子数据,通过所述第二卷积神经网络,得到所述第一概率分布,包括:根据所述拼接后的子数据,通过所述第二卷积神经网络,得到所述第一概率分布。
关于数据解压缩方法的描述,可以参照图8以及对应的实施例中关于数据解压缩相关的描述,这里不再赘述。
在图1至图14所对应的实施例的基础上,为了更好的实施本申请实施例的上述方案,下面还提供用于实施上述方案的相关设备。具体参阅图15,图15为本申请实施例提供的数据压缩装置1500的一种结构示意图,数据压缩装置1500可以是终端设备或服务器,数据压缩装置1500包括:
获取模块1501,用于获取第一目标数据,所述第一目标数据包括第一子数据和第二子数据;
关于获取模块1501的具体描述可以参照上述实施例中步骤801的描述,这里不再赘述。
压缩模块1502,用于根据所述第一子数据,通过变分自编码器的第一解码器,得到第一概率分布,所述第一概率分布用于作为所述第二子数据的条件概率分布;
根据所述第一概率分布,通过熵编码器压缩所述第二子数据,以得到第一比特流;
将所述第一比特流作为初始比特流,对所述第一子数据进行压缩,以得到第二比特流。
关于压缩模块1502的具体描述可以参照上述实施例中步骤802至步骤804的描述,这里不再赘述。
在一种可能的实现中,所述第一目标数据为图像块,所述第一子数据和所述第二子数据为对所述图像块进行数据切分后得到的。
在一种可能的实现中,所述第一子数据和所述第二子数据为对所述图像块在空间维度或者通道维度上进行数据切分后得到的。
在一种可能的实现中,所述压缩模块,具体用于:
根据所述第一子数据,通过所述变分自编码器中的变分编码器,得到隐变量的近似后验分布;
根据所述近似后验分布,从所述第一比特流中通过所述熵编码器解码出所述隐变量,得到第三比特流;
根据所述隐变量,通过所述变分自编码器的第二解码器,得到第二概率分布;所述第二概率分布用于作为所述第一子数据的条件概率分布;
根据所述第二概率分布,通过所述熵编码器将所述第一子数据压缩至所述第三比特流,以得到第四比特流;
根据所述隐变量的先验分布,通过所述熵编码器将所述隐变量压缩至所述第四比特流,得到第二比特流。
在一种可能的实现中,所述第一解码器包括第一卷积神经网络和第二卷积神经网络,所述压缩模块,具体用于:
对包括所述第二子数据的第二目标数据进行空间维度到通道维度的像素重置操作,以得到第三子数据,所述第二目标数据和所述第一目标数据的尺寸大小一致,所述第三子数据和所述第一子数据在空间维度的尺寸相同;
根据所述第一子数据,通过所述第一卷积神经网络,得到第四子数据,所述第四子数据和所述第三子数据在通道维度的尺寸相同;
将所述第三子数据和所述第四子数据进行融合,以得到融合后的子数据;
根据所述融合后的子数据,通过所述第二卷积神经网络,得到所述第一概率分布。
在一种可能的实现中,所述将所述第三子数据和所述第四子数据进行融合,包括:
将所述第四子数据中部分通道的数据替换为所述第三子数据中对应通道的数据,以得到融合后的子数据。
在一种可能的实现中,所述装置还包括:
拼接模块,用于将所述融合后的子数据和所述第一子数据沿着通道维度进行拼接操作,以得到拼接后的子数据;
所述根据所述融合后的子数据,通过所述第二卷积神经网络,得到所述第一概率分布,包括:根据所述拼接后的子数据,通过所述第二卷积神经网络,得到所述第一概率分布。
参阅图16,图16为本申请实施例提供的数据解压缩装置1600的一种结构示意图,数据解压缩装置1600可以是终端设备或服务器,数据解压缩装置1600可以包括:
获取模块1601,用于获取第二比特流以及隐变量的先验分布;
关于获取模块1601的具体描述可以参照上述实施例中步骤1401的描述,这里不再赘述。
解压模块1602,用于根据所述先验分布,通过熵编码器从所述第二比特流中解压出所述隐变量,得到第四比特流;
根据所述隐变量,通过所述变分自编码器的第二解码器,得到第二概率分布;所述第二概率分布用于作为第一子数据的条件概率分布;
根据所述第二概率分布,通过所述熵编码器从所述第四比特流中解压出所述第一子数据,得到第三比特流;
根据所述第一子数据,通过所述变分自编码器中的变分编码器,得到隐变量的近似后验分布;
根据所述近似后验分布,通过所述熵编码器将所述隐变量压缩至所述第三比特流,得到第一比特流;
根据所述第一子数据,通过所述变分自编码器的第一解码器,得到第一概率分布,所述第一概率分布用于作为所述第二子数据的条件概率分布;
根据所述第一概率分布,通过所述熵编码器从所述第一比特流中解压出第二子数据;所述第一子数据和所述第二子数据用于确定第一目标数据。
关于解压模块1602的具体描述可以参照上述实施例中步骤1402至步骤1408的描述,这里不再赘述。
在一种可能的实现中,所述第一目标数据为图像块,所述第一子数据和所述第二子数据为对所述图像块进行数据切分后得到的。
在一种可能的实现中,所述第一子数据和所述第二子数据为对所述图像块在空间维度或者通道维度上进行数据切分后得到的。
在一种可能的实现中,所述第一解码器包括第一卷积神经网络和第二卷积神经网络,所述解压模块,具体用于:
对包括所述第二子数据的第二目标数据进行空间维度到通道维度的像素重置操作,以得到第三子数据,所述第二目标数据和所述第一目标数据的尺寸大小一致,所述第三子数据和所述第一子数据在空间维度的尺寸相同;
根据所述第一子数据,通过所述第一卷积神经网络,得到第四子数据,所述第四子数据和所述第三子数据在通道维度的尺寸相同;
将所述第三子数据和所述第四子数据进行融合,以得到融合后的子数据;
根据所述融合后的子数据,通过所述第二卷积神经网络,得到所述第一概率分布。
在一种可能的实现中,所述将所述第三子数据和所述第四子数据进行融合,包括:
将所述第四子数据中部分通道的数据替换为所述第三子数据中对应通道的数据,以得到融合后的子数据。
在一种可能的实现中,所述装置还包括:
拼接模块,用于将所述融合后的子数据和所述第一子数据沿着通道维度进行拼接操作,以得到拼接后的子数据;
所述根据所述融合后的子数据,通过所述第二卷积神经网络,得到所述第一概率分布,包括:根据所述拼接后的子数据,通过所述第二卷积神经网络,得到所述第一概率分布。
接下来介绍本申请实施例提供的一种执行设备,请参阅图17,图17为本申请实施例提供的执行设备的一种结构示意图,执行设备1700具体可以表现为虚拟现实VR设备、手机、平板、笔记本电脑、智能穿戴设备、监控数据处理设备等,此处不做限定。其中,执行设备1700上可以部署有图15对应实施例中所描述的数据压缩装置、或者图16对应实施例中所描述的数据解压缩装置。具体的,执行设备1700可以包括:接收器1701、发射器1702、处理器1703和存储器1704(其中执行设备1700中的处理器1703的数量可以是一个或多个,图17中以一个处理器为例),其中,处理器1703可以包括应用处理器17031和通信处理器17032。在本申请的一些实施例中,接收器1701、发射器1702、处理器1703和存储器1704可通过总线或其它方式连接。
存储器1704可以包括只读存储器和随机存取存储器,并向处理器1703提供指令和数据。存储器1704的一部分还可以包括非易失性随机存取存储器(non-volatile random access memory,NVRAM)。存储器1704存储有处理器和操作指令、可执行模块或者数据结构,或者它们的子集,或者它们的扩展集,其中,操作指令可包括各种操作指令,用于实现各种操作。
处理器1703控制执行设备的操作。具体的应用中,执行设备的各个组件通过总线系统耦合在一起,其中总线系统除包括数据总线之外,还可以包括电源总线、控制总线和状态信号总线等。但是为了清楚说明起见,在图中将各种总线都称为总线系统。
上述本申请实施例揭示的方法可以应用于处理器1703中,或者由处理器1703实现。处理器1703可以是一种集成电路芯片,具有信号的处理能力。在实现过程中,上述方法的各步骤可以通过处理器1703中的硬件的集成逻辑电路或者软件形式的指令完成。上述的处理器1703可以是通用处理器、数字信号处理器(digital signal processing,DSP)、微处理器或微控制器,还可进一步包括专用集成电路(application specific integrated circuit,ASIC)、现场可编程门阵列(field-programmable gate array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。该处理器1703可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器1704,处理器1703读取存储器1704中的信息,结合其硬件完成上述方法的步骤。
接收器1701可用于接收输入的数字或字符信息,以及产生与执行设备的相关设置以及功能控制有关的信号输入。发射器1702可用于通过第一接口输出数字或字符信息;发射器1702还可用于通过第一接口向磁盘组发送指令,以修改磁盘组中的数据;发射器1702还可以包括显示屏等显示设备。
本申请实施例中还提供一种计算机程序产品,当其在计算机上运行时,使得计算机执行如前述图8所示实施例描述的方法所执行的步骤,或者,使得计算机执行如前述图14所示实施例描述的方法所执行的步骤。
本申请实施例中还提供一种计算机可读存储介质,该计算机可读存储介质中存储有用于进行信号处理的程序,当其在计算机上运行时,使得计算机执行如前述图8所示实施例描述的方法所执行的步骤,或者,使得计算机执行如前述图14所示实施例描述的方法所执行的步骤。
另外需说明的是,以上所描述的装置实施例仅仅是示意性的,其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。另外,本申请提供的装置实施例附图中,模块之间的连接关系表示它们之间具有通信连接,具体可以实现为一条或多条通信总线或信号线。
通过以上的实施方式的描述,所属领域的技术人员可以清楚地了解到本申请可借助软件加必需的通用硬件的方式来实现,当然也可以通过专用硬件包括专用集成电路、专用CPU、专用存储器、专用元器件等来实现。一般情况下,凡由计算机程序完成的功能都可以很容易地用相应的硬件来实现,而且,用来实现同一功能的具体硬件结构也可以是多种多样的,例如模拟电路、数字电路或专用电路等。但是,对本申请而言更多情况下软件程序实现是更佳的实施方式。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在可读取的存储介质中,如计算机的软盘、U盘、移动硬盘、ROM、RAM、磁碟或者光盘等,包括若干指令用以使得一台计算机设备(可以是个人计算机,训练设备,或者网络设备等)执行本申请各个实施例所述的方法。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。
所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、训练设备或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、训练设备或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存储的任何可用介质或者是包含一个或多个可用介质集成的训练设备、数据中心等数据存储设备。所述可用介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘(Solid State Disk,SSD))等。

Claims (29)

  1. 一种数据压缩方法,其特征在于,包括:
    获取第一目标数据,所述第一目标数据包括第一子数据和第二子数据;
    根据所述第一子数据,通过变分自编码器的第一解码器,得到第一概率分布,所述第一概率分布用于作为所述第二子数据的条件概率分布;
    根据所述第一概率分布,通过熵编码器压缩所述第二子数据,以得到第一比特流;
    将所述第一子数据压缩至所述第一比特流,以得到第二比特流。
  2. 根据权利要求1所述的方法,其特征在于,所述第一目标数据为图像块,所述第一子数据和所述第二子数据为对所述图像块进行数据切分后得到的;或者,
    所述第一目标数据为文字序列,所述第一子数据和所述第二子数据为对所述文字序列进行数据切分后得到的;或者,
    所述第一目标数据为二进制流,所述第一子数据和所述第二子数据为对所述二进制流进行数据切分后得到的;或者,
    所述第一目标数据为视频,所述第一子数据和所述第二子数据为对所述视频的多个图像帧进行数据切分后得到的。
  3. 根据权利要求1或2所述的方法,其特征在于,所述第一子数据和所述第二子数据为对所述图像块在空间维度或者通道维度上进行数据切分后得到的。
  4. 根据权利要求1至3任一所述的方法,其特征在于,所述将所述第一子数据压缩至所述第一比特流,包括:
    根据所述第一子数据,通过所述变分自编码器中的变分编码器,得到隐变量的近似后验分布;
    根据所述近似后验分布,从所述第一比特流中通过所述熵编码器解码出所述隐变量,得到第三比特流;
    根据所述隐变量,通过所述变分自编码器的第二解码器,得到第二概率分布;所述第二概率分布用于作为所述第一子数据的条件概率分布;
    根据所述第二概率分布,通过所述熵编码器将所述第一子数据压缩至所述第三比特流,以得到第四比特流;
    根据所述隐变量的先验分布,通过所述熵编码器将所述隐变量压缩至所述第四比特流,得到第二比特流。
  5. 根据权利要求1至4任一所述的方法,其特征在于,所述第一解码器包括第一卷积神经网络和第二卷积神经网络,所述根据所述第一子数据,通过变分自编码器的第一解码器,得到第一概率分布,包括:
    对包括所述第二子数据的第二目标数据进行空间维度到通道维度的像素重置操作,以 得到第三子数据,所述第二目标数据和所述第一目标数据的尺寸大小一致,所述第三子数据和所述第一子数据在空间维度的尺寸相同;
    根据所述第一子数据,通过所述第一卷积神经网络,得到第四子数据,所述第四子数据和所述第三子数据在通道维度的尺寸相同;
    将所述第三子数据和所述第四子数据进行融合,以得到融合后的子数据;
    根据所述融合后的子数据,通过所述第二卷积神经网络,得到所述第一概率分布。
  6. 根据权利要求5所述的方法,其特征在于,所述将所述第三子数据和所述第四子数据进行融合,包括:
    将所述第四子数据中部分通道的数据替换为所述第三子数据中对应通道的数据,以得到融合后的子数据。
  7. 根据权利要求5或6所述的方法,其特征在于,所述方法还包括:
    将所述融合后的子数据和所述第一子数据沿着通道维度进行拼接操作,以得到拼接后的子数据;
    所述根据所述融合后的子数据,通过所述第二卷积神经网络,得到所述第一概率分布,包括:根据所述拼接后的子数据,通过所述第二卷积神经网络,得到所述第一概率分布。
  8. 一种数据解压缩方法,其特征在于,包括:
    获取第二比特流;
    从所述第二比特流中解码出第一子数据,以得到第一比特流;
    根据所述第一子数据,通过变分自编码器的第一解码器,得到第一概率分布,所述第一概率分布用于作为第二子数据的条件概率分布;
    根据所述第一概率分布,通过熵编码器从所述第一比特流中解压出第二子数据;所述第一子数据和所述第二子数据用于还原得到第一目标数据。
  9. 根据权利要求8所述的方法,其特征在于,所述从所述第二比特流中解码出第一子数据,以得到第一比特流,包括:
    获取隐变量的先验分布;
    根据所述先验分布,通过熵编码器从所述第二比特流中解压出所述隐变量,得到第四比特流;
    根据所述隐变量,通过所述变分自编码器的第二解码器,得到第二概率分布;所述第二概率分布用于作为第一子数据的条件概率分布;
    根据所述第二概率分布,通过所述熵编码器从所述第四比特流中解压出所述第一子数据,得到第三比特流;
    根据所述第一子数据,通过所述变分自编码器中的变分编码器,得到隐变量的近似后验分布;
    根据所述近似后验分布,通过所述熵编码器将所述隐变量压缩至所述第三比特流,得到第一比特流。
  10. 根据权利要求8或9所述的方法,其特征在于,所述第一目标数据为图像块,所述第一子数据和所述第二子数据为对所述图像块进行数据切分后得到的;或者,
    所述第一目标数据为文字序列,所述第一子数据和所述第二子数据为对所述文字序列进行数据切分后得到的;或者,
    所述第一目标数据为二进制流,所述第一子数据和所述第二子数据为对所述二进制流进行数据切分后得到的;或者
    所述第一目标数据为视频,所述第一子数据和所述第二子数据为对所述视频的多个图像帧进行数据切分后得到的。
  11. 根据权利要求8至10任一所述的方法,其特征在于,所述第一子数据和所述第二子数据为对所述图像块在空间维度或者通道维度上进行数据切分后得到的。
  12. 根据权利要求8至11任一所述的方法,其特征在于,所述第一解码器包括第一卷积神经网络和第二卷积神经网络,所述根据所述第一子数据,通过变分自编码器的第一解码器,得到第一概率分布,包括:
    对包括所述第二子数据的第二目标数据进行空间维度到通道维度的像素重置操作,以得到第三子数据,所述第二目标数据和所述第一目标数据的尺寸大小一致,所述第三子数据和所述第一子数据在空间维度的尺寸相同;
    根据所述第一子数据,通过所述第一卷积神经网络,得到第四子数据,所述第四子数据和所述第三子数据在通道维度的尺寸相同;
    将所述第三子数据和所述第四子数据进行融合,以得到融合后的子数据;
    根据所述融合后的子数据,通过所述第二卷积神经网络,得到所述第一概率分布。
  13. 根据权利要求12所述的方法,其特征在于,所述将所述第三子数据和所述第四子数据进行融合,包括:
    将所述第四子数据中部分通道的数据替换为所述第三子数据中对应通道的数据,以得到融合后的子数据。
  14. 根据权利要求12或13所述的方法,其特征在于,所述方法还包括:
    将所述融合后的子数据和所述第一子数据沿着通道维度进行拼接操作,以得到拼接后的子数据;
    所述根据所述融合后的子数据,通过所述第二卷积神经网络,得到所述第一概率分布,包括:根据所述拼接后的子数据,通过所述第二卷积神经网络,得到所述第一概率分布。
  15. A data compression apparatus, comprising:
    an obtaining module, configured to obtain first target data, wherein the first target data comprises first sub-data and second sub-data; and
    a compression module, configured to obtain a first probability distribution through a first decoder of a variational autoencoder according to the first sub-data, wherein the first probability distribution serves as the conditional probability distribution of the second sub-data;
    compress the second sub-data through an entropy coder according to the first probability distribution, to obtain a first bit stream; and
    compress the first sub-data into the first bit stream, to obtain a second bit stream.
  16. The apparatus according to claim 15, wherein the first target data is an image block, and the first sub-data and the second sub-data are obtained by splitting the image block; or
    the first target data is a text sequence, and the first sub-data and the second sub-data are obtained by splitting the text sequence; or
    the first target data is a binary stream, and the first sub-data and the second sub-data are obtained by splitting the binary stream; or
    the first target data is a video, and the first sub-data and the second sub-data are obtained by splitting a plurality of image frames of the video.
  17. The apparatus according to claim 15 or 16, wherein the compression module is specifically configured to:
    obtain an approximate posterior distribution of a latent variable through the variational encoder of the variational autoencoder according to the first sub-data;
    decode the latent variable from the first bit stream through the entropy coder according to the approximate posterior distribution, to obtain a third bit stream;
    obtain a second probability distribution through a second decoder of the variational autoencoder according to the latent variable, wherein the second probability distribution serves as the conditional probability distribution of the first sub-data;
    compress the first sub-data into the third bit stream through the entropy coder according to the second probability distribution, to obtain a fourth bit stream; and
    compress the latent variable into the fourth bit stream through the entropy coder according to the prior distribution of the latent variable, to obtain the second bit stream.
  18. The apparatus according to any one of claims 15 to 17, wherein the first decoder comprises a first convolutional neural network and a second convolutional neural network, and the compression module is specifically configured to:
    perform a space-to-channel pixel shuffle operation on second target data comprising the second sub-data, to obtain third sub-data, wherein the second target data has the same size as the first target data, and the third sub-data has the same size as the first sub-data in the spatial dimension;
    obtain fourth sub-data through the first convolutional neural network according to the first sub-data, wherein the fourth sub-data has the same size as the third sub-data in the channel dimension;
    fuse the third sub-data and the fourth sub-data, to obtain fused sub-data; and
    obtain the first probability distribution through the second convolutional neural network according to the fused sub-data.
  19. The apparatus according to claim 18, wherein the fusing the third sub-data and the fourth sub-data comprises:
    replacing the data of some channels of the fourth sub-data with the data of the corresponding channels of the third sub-data, to obtain the fused sub-data.
  20. The apparatus according to claim 18 or 19, wherein the apparatus further comprises:
    a concatenation module, configured to concatenate the fused sub-data and the first sub-data along the channel dimension, to obtain concatenated sub-data; and
    the obtaining the first probability distribution through the second convolutional neural network according to the fused sub-data comprises: obtaining the first probability distribution through the second convolutional neural network according to the concatenated sub-data.
  21. A data decompression apparatus, comprising:
    an obtaining module, configured to obtain a second bit stream; and
    a decompression module, configured to decode first sub-data from the second bit stream, to obtain a first bit stream;
    obtain a first probability distribution through a first decoder of a variational autoencoder according to the first sub-data, wherein the first probability distribution serves as the conditional probability distribution of second sub-data; and
    decompress the second sub-data from the first bit stream through an entropy coder according to the first probability distribution, wherein the first sub-data and the second sub-data are used to restore first target data.
  22. The apparatus according to claim 21, wherein the decompression module is specifically configured to:
    obtain a prior distribution of a latent variable;
    decompress the latent variable from the second bit stream through the entropy coder according to the prior distribution, to obtain a fourth bit stream;
    obtain a second probability distribution through a second decoder of the variational autoencoder according to the latent variable, wherein the second probability distribution serves as the conditional probability distribution of the first sub-data;
    decompress the first sub-data from the fourth bit stream through the entropy coder according to the second probability distribution, to obtain a third bit stream;
    obtain an approximate posterior distribution of the latent variable through the variational encoder of the variational autoencoder according to the first sub-data; and
    compress the latent variable into the third bit stream through the entropy coder according to the approximate posterior distribution, to obtain the first bit stream.
  23. The apparatus according to claim 21 or 22, wherein the first target data is an image block, and the first sub-data and the second sub-data are obtained by splitting the image block; or
    the first target data is a text sequence, and the first sub-data and the second sub-data are obtained by splitting the text sequence; or
    the first target data is a binary stream, and the first sub-data and the second sub-data are obtained by splitting the binary stream; or
    the first target data is a video, and the first sub-data and the second sub-data are obtained by splitting a plurality of image frames of the video.
  24. The apparatus according to any one of claims 21 to 23, wherein the first decoder comprises a first convolutional neural network and a second convolutional neural network, and the decompression module is specifically configured to:
    perform a space-to-channel pixel shuffle operation on second target data comprising the second sub-data, to obtain third sub-data, wherein the second target data has the same size as the first target data, and the third sub-data has the same size as the first sub-data in the spatial dimension;
    obtain fourth sub-data through the first convolutional neural network according to the first sub-data, wherein the fourth sub-data has the same size as the third sub-data in the channel dimension;
    fuse the third sub-data and the fourth sub-data, to obtain fused sub-data; and
    obtain the first probability distribution through the second convolutional neural network according to the fused sub-data.
  25. The apparatus according to claim 24, wherein the fusing the third sub-data and the fourth sub-data comprises:
    replacing the data of some channels of the fourth sub-data with the data of the corresponding channels of the third sub-data, to obtain the fused sub-data.
  26. The apparatus according to claim 24 or 25, wherein the apparatus further comprises:
    a concatenation module, configured to concatenate the fused sub-data and the first sub-data along the channel dimension, to obtain concatenated sub-data; and
    the obtaining the first probability distribution through the second convolutional neural network according to the fused sub-data comprises: obtaining the first probability distribution through the second convolutional neural network according to the concatenated sub-data.
  27. A data compression apparatus, comprising a storage medium, a processing circuit, and a bus system, wherein the storage medium is configured to store instructions, and the processing circuit is configured to execute the instructions in the storage medium, to perform the steps of the method according to any one of claims 1 to 14.
  28. A computer-readable storage medium storing a computer program, wherein when the program is executed by a processor, the steps of the method according to any one of claims 1 to 14 are implemented.
  29. A computer program product, wherein the computer program product comprises code, and when the code is executed, the code is used to implement the steps of the method according to any one of claims 1 to 14.
PCT/CN2023/081315 2022-03-14 2023-03-14 Data compression method and related device WO2023174256A1 (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202210249906.1 2022-03-14
CN202210249906 2022-03-14
CN202310077949.0 2023-01-13
CN202310077949.0A CN116095183A (zh) 2022-03-14 2023-01-13 Data compression method and related device

Publications (1)

Publication Number Publication Date
WO2023174256A1 true WO2023174256A1 (zh) 2023-09-21

Family

ID=86198942

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/081315 WO2023174256A1 (zh) 2022-03-14 2023-03-14 Data compression method and related device

Country Status (2)

Country Link
CN (1) CN116095183A (zh)
WO (1) WO2023174256A1 (zh)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110290387A (zh) * 2019-05-17 2019-09-27 Peking University Image compression method based on a generative model
US20200104640A1 (en) * 2018-09-27 2020-04-02 Deepmind Technologies Limited Committed information rate variational autoencoders
CN113569243A (zh) * 2021-08-03 2021-10-29 Shanghai Maritime University Deep semi-supervised learning network intrusion detection method based on self-supervised variational LSTM
CN113810058A (zh) * 2021-09-17 2021-12-17 Zheku Technology (Shanghai) Co., Ltd. Data compression method, data decompression method, apparatus, and electronic device


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHAI, ZHENGLI; LIANG, ZHENMING; ZHOU, WEI; SUN, XIA: "Research Overview of Variational Auto-Encoders Models", COMPUTER ENGINEERING AND APPLICATIONS, HUABEI JISUAN JISHU YANJIUSUO, CN, vol. 55, no. 3, 1 February 2019 (2019-02-01), CN , pages 1 - 9, XP009548958, ISSN: 1002-8331, DOI: 10.3778/j.issn.1002-8331.1810-0284 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117150243A (zh) * 2023-10-27 2023-12-01 Xiangjiang Laboratory Fault isolation and estimation method based on a fault influence decoupling network
CN117150243B (zh) * 2023-10-27 2024-01-30 Xiangjiang Laboratory Fault isolation and estimation method based on a fault influence decoupling network

Also Published As

Publication number Publication date
CN116095183A (zh) 2023-05-09

Similar Documents

Publication Publication Date Title
WO2020221200A1 (zh) Neural network construction method, image processing method and apparatus
EP3940591A1 (en) Image generating method, neural network compression method, and related apparatus and device
WO2022116856A1 (zh) Model structure, model training method, image enhancement method and device
WO2021018163A1 (zh) Neural network search method and apparatus
WO2021155832A1 (zh) Image processing method and related device
WO2022042713A1 (zh) Deep learning training method and apparatus for a computing device
JP2021510888A (ja) Accelerated quantized multiply-accumulate operations
WO2023231794A1 (zh) Neural network parameter quantization method and apparatus
CN110222718B (zh) Image processing method and apparatus
EP4283876A1 (en) Data coding method and related device
WO2022021938A1 (zh) Image processing method and apparatus, and neural network training method and apparatus
WO2022028197A1 (zh) Image processing method and device
WO2023207836A1 (zh) Image encoding method, image decompression method and apparatus
CN111950700A (zh) Neural network optimization method and related device
WO2022156475A1 (zh) Neural network model training method, data processing method and apparatus
WO2023174256A1 (zh) Data compression method and related device
US20240078414A1 (en) Parallelized context modelling using information shared between patches
CN114266897A (zh) Acne category prediction method and apparatus, electronic device, and storage medium
WO2022222854A1 (zh) Data processing method and related device
Yi et al. Elanet: effective lightweight attention-guided network for real-time semantic segmentation
WO2022022176A1 (zh) Image processing method and related device
WO2023207531A1 (zh) Image processing method and related device
TWI826160B (zh) Image encoding and decoding method and apparatus
WO2023029559A1 (zh) Data processing method and apparatus
WO2022100140A1 (zh) Compression encoding and decompression method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23769758

Country of ref document: EP

Kind code of ref document: A1