WO2018120019A1

WO2018120019A1 - Compression/decompression apparatus and system for use with neural network data

Info

Publication number: WO2018120019A1
Application number: PCT/CN2016/113497
Authority: WO
Inventors: 陈天石; 罗宇哲; 郭崎; 刘少礼; 陈云霁
Original assignee: 上海寒武纪信息科技有限公司
Priority date: 2016-12-30
Filing date: 2016-12-30
Publication date: 2018-07-05

Abstract

A compression/decompression apparatus and system for use with neural network data. Said compression apparatus comprises: a model conversion module (120), which is used for converting neural network data into video-like data, and a data encoding module (131), which is connected to the model conversion module (120) and which is used for encoding the video-type data by using a video encoding method to obtain a compression result. Said decompression apparatus comprises: a data decoding module (132), which is used for obtaining a compression result and which decodes the compression result by using a video decoding method corresponding to the compression result; and a model conversion module (120), which is connected to the data decoding module (132) and which is used for restoring decoded video-type data to neural network data. Said system comprises said compression apparatus and decompression apparatus. The apparatus and system compress/decompress neural network data by virtue of video encoding and decoding methods, thus attaining a high compression ratio, greatly reducing the storage space and transmission burden of neural network models.

Description

Apparatus and system for compression/decompression of neural network data

Technical field

The present invention relates to the field of artificial neural network technologies, and in particular, to an apparatus and system for compression/decompression of neural network data.

Background technique

Artificial neural networks (ANNs), referred to as neural networks (NNs), are mathematical models of algorithms that mimic the behavioral characteristics of animal neural networks and perform distributed parallel information processing. This kind of network relies on the complexity of the system to adjust the relationship between a large number of internal nodes to achieve the purpose of processing information.

At present, neural networks have made great progress in many fields such as intelligent control and machine learning. With the rise of deep learning, neural networks have once again become a hot issue in the field of artificial intelligence. With the broad integration of big data and deep learning, the size of neural networks has become larger and larger. Researchers at Google Inc. have proposed the concept of “large-scale deep learning” and hope to build intelligent computer systems through Google as a platform to integrate global information.

With the continuous development of deep learning technology, the current model of neural networks is getting larger and larger, and the demand for storage performance and memory access bandwidth is getting higher and higher. If you do not compress, not only a large amount of storage space is required, but also the memory bandwidth requirements are very high. Compressed neural networks, as a new technical concept, have shown sufficient necessity in the context of the increasing scale of neural networks.

Summary of the invention

(1) Technical problems to be solved

In view of the above technical problems, the present invention provides an apparatus and system for compression/decompression of neural network data to reduce the pressure of storage space and memory access bandwidth.

(2) Technical plan

According to an aspect of the invention, a compression apparatus for neural network data is provided. The compression device includes: a model conversion module 120 for converting neural network data into video-like data; and a data encoding module 131 connected to the model conversion module 120 for video encoding The data is encoded to obtain a compressed result.

Preferably, in the compression device of the present invention, the video-like data refers to a model conversion module. After the conversion, each of the original neural network data is converted into a series of integer values within a preset range, corresponding to the representation of one pixel, which together constitute the corresponding video data.

Preferably, in the compression device of the present invention, the model conversion module 120 converts the neural network data into video-like data in one of two ways:

The first way: For neural network data with a data range of [-b, a], the model conversion module 120 operates as follows:

Where I is an integer in the interval [0, 255], that is, a representation of one pixel; w is the true data value of the neural network data in the range [-b, a];

The second way: for convolutional neural network data, the model conversion module 120 converts the weights and offsets of the hidden layer neurons corresponding to each feature map in the convolutional neural network data, and weights and biases The integers obtained after the conversion are integrated to obtain data of the corresponding video frame, and the video data of the video frame is obtained by combining the weights of the hidden layer neurons corresponding to the plurality of feature maps and the data of the similar video frames obtained by the offset.

Preferably, in the compression device of the present invention, the data encoding module 131 includes: an encoding sub-module for encoding the video data in a video encoding manner to obtain a data encoding result; and an integration sub-module for The data encoding result and the encoding process information are integrated to obtain a compressed result.

Preferably, in the compression apparatus of the present invention, the encoding sub-module includes: a prediction unit 130a for performing predictive coding using correlation between video-like data neighboring data; and a transforming unit 130b for processing the predicted unit The video-like data is orthogonally transform-encoded to compress the data; the quantization unit 130c is configured to perform quantization coding on the video-like data processed by the transform unit, and reduce the coding length of the data without degrading the data quality; and entropy The encoding unit 130d is configured to perform rate compression encoding on the video-like data processed by the quantization unit by using statistical characteristics of the data to reduce data redundancy.

Preferably, in the compression device of the present invention, the encoding sub-module includes: a depth auto-encoder unit 130, configured to further encode the video-like data output by the model conversion module, and output the hidden layer as a coding result; wherein the depth The automatic encoder unit 130 trains by using the video-like data as a training input and an ideal output by minimizing the reconstruction error. It is basically the same data as the input class video data.

Preferably, the compression device of the present invention further includes: a data cache module 140 for buffering neural network data; and a controller module 110 coupled to the data cache module 140, the model conversion module 120, and the data encoding module 131 for transmitting Control instructions to perform the following operations:

Sending a data read command to the data cache module 140, requesting the neural network data to be sent to the outside world, and buffering the neural network data;

Sending a data read instruction to the model conversion module 120 to read the neural network data from the data cache module 140;

Transmitting a data conversion instruction to the model conversion module 120 to convert the read neural network data into video-like data;

Sending a data read instruction to the data cache module 140, requesting it to request the class video data to the model conversion module 120, and performing caching;

Sending a data read instruction to the data encoding module 131 to read the video-like data from the data cache module 140;

Transmitting, by the data encoding module 131, a data encoding instruction, where the encoding instruction includes information of the encoding mode, and encoding the unit-type video data corresponding to the encoding mode to obtain a data encoding result;

Sending an integration instruction to the data encoding module 131, so that the data encoding result and the encoding process information are integrated to obtain a compression result;

The data cache instruction is sent to the data cache module 140 to obtain the compression result from the data encoding module 131, and the compression result is cached.

According to another aspect of the present invention, a decompression apparatus for neural network data is also provided. The decompression device includes: a data decoding module 132, configured to obtain a compression result, and decode the compression result by using a video decoding manner corresponding to the compression result; and a model conversion module 120 connected to the data decoding module 132, Used to restore the decoded class video data to neural network data.

Preferably, in the decompression device of the present invention, the data decoding module 132 includes: a de-integration sub-module for de-integrating the compression result to obtain a data encoding result and encoding process information; and a decoding sub-module for The coding mode information is extracted from the coding process information, and the data coding result is decoded by using a decoding mode corresponding to the coding mode information to obtain a class video. data.

Preferably, in the decompression device of the present invention, the model conversion module restores the decoded video-like data to neural network data in one of two ways:

The first way: for the neural network data with the data range in [-b, a], the model conversion module 120 operates according to the following formula to obtain the true data values of the neural network data:

Where w is the true data value of the neural network data in the range [-b, a], and I is the video-like data, which is an integer in the interval [0, 255].

The second way: for convolutional neural network data, the model conversion module 120 converts the data of the corresponding video frame in the video-like data, and converts each frame into a hidden layer neuron corresponding to a feature map of the convolutional neural network. Weight and offset, the data converted by each frame is integrated to obtain the weight and offset of the hidden layer neurons corresponding to each feature map of the convolutional neural network.

Preferably, the decompression device of the present invention further includes: a data cache module 140 for buffering the compression result; and a controller module 110, coupled to the model conversion module 120, the data decoding module 132, and the data cache module 140, for Release the control instructions to do the following:

Sending a data read instruction to the data cache module 140, requesting the external result to be compressed, and buffering the compressed result;

Sending a data read instruction to the data decoding module 132 to read the compressed result from the data cache module 140;

Sending a de-integration instruction to the data decoding module 132 to decode the encoding process information and the data compression result from the compression result;

Sending a data read instruction to the data decoding module 132, and reading the encoding process information from the data decoding module 132;

Selecting a decoding instruction according to the encoding process information;

Sending a decoding instruction to the data encoding and decoding module 132, so that it decompresses the data compression result in the compression result to obtain video-like data;

Sending a data read instruction to the data cache module 140, causing it to read the video-like data from the data decoding module 132, and buffering;

Sending a data read instruction to the model conversion module 120 to cause it to pass from the data cache module 140 Reading class video data;

A data conversion instruction is sent to the model conversion module 120 to convert the video-like data into neural network data.

According to still another aspect of the present invention, a system for compression/decompression of neural network data is also provided. The system includes: a compression device, which is the compression device described above; and a decompression device, which is the decompression device described above; wherein the compression device and the decompression device share a data cache module 140, a controller module 110, and a model Conversion module 120.

(3) Beneficial effects

It can be seen from the above technical solutions that the apparatus and system for compressing/decompressing neural network data of the present invention have at least one of the following beneficial effects:

(1) Using the video codec method to compress/decompress the neural network data to achieve a higher compression ratio, which greatly reduces the storage space and transmission pressure of the neural network model;

(2) In the data compression/decompression module, a variety of algorithms similar to the video codec method are integrated, which can greatly accelerate the compression/decompression process;

(3) It has a dedicated data buffer module and controller module to serve a variety of video codec-dedicated modules, supporting a combination of multiple video codec technologies, greatly increasing the flexibility and practicability of the device, while supporting emerging A technique of compressing and decompressing using a deep neural network.

In summary, the present invention can achieve high-efficiency compression and decompression of a large-scale neural network model, thereby greatly reducing the storage space and transmission pressure of the neural network model, thereby adapting to the trend of expanding the size of the neural network in the era of big data.

DRAWINGS

1 is a block diagram showing the structure of a compression apparatus for compressing neural network data according to a first embodiment of the present invention.

2 is a schematic structural view of a data encoding module in the compression device shown in FIG. 1.

3 is a flow chart of a controller module in FIG. 1 transmitting a control command to perform an operation.

4 is a schematic structural diagram of a decompression apparatus for decompressing a neural network data compression result according to a second embodiment of the present invention.

FIG. 5 is a schematic structural diagram of a data decoding module in the decompression device shown in FIG. 4.

6 is a controller module of the decompression device shown in FIG. 4 transmitting a control command to perform an operation Flow chart.

FIG. 7 is a schematic structural diagram of a compression/decompression system for neural network data compression results according to a third embodiment of the present invention.

[Description of main components of the present invention]

110-controller module; 120-model conversion module; 140-data cache module;

130-data encoding/decoding module;

131-data encoding module; 132-data decoding module;

130a-prediction unit; 130b-transform unit; 130c-quantization unit;

130d-entropy coding unit; 130e-depth automatic encoder unit;

200 - External storage module.

detailed description

Video coding and decoding technology is a very mature technology. Traditional video coding and decoding technology uses techniques such as prediction, transform and entropy coding. After deep learning, the use of deep neural networks for video encoding and decoding has become a new research hotspot.

After careful research and comparison, the applicant found that the neural network data has the same local correlation as the pixels of the video image. Therefore, using the video codec method to encode and decode the neural network model, and then compressing the neural network model, it will be a A viable technical route.

The present invention will be further described in detail below with reference to the specific embodiments of the invention.

First, the compression device embodiment

In a first exemplary embodiment of the present invention, a compression apparatus for compressing neural network data is provided. Referring to FIG. 1, the compression apparatus for compressing neural network data in this embodiment includes: a controller module 110, a model conversion module 120, a data encoding module 131, and a data cache module 140.

In this embodiment, the data cache module 140 is configured to cache the neural network data obtained by the external storage module 200. The model conversion module 120 is coupled to the data cache module 140 for converting neural network data into video-like data. The data encoding module 131 is coupled to the model conversion module 120 for encoding video-like data in a video encoding manner. The controller module 110 is connected to the model conversion module 120, the data encoding module 131, and the data cache module 140, and is used to issue control commands to the three to coordinate the work.

Referring to FIG. 3, in this embodiment, the controller module 110 sends a control instruction to perform the following operations:

Step S302, sending a data read instruction to the data cache module 140, requesting the neural network data to the external storage module 200, and buffering the neural network data;

Among them, neural network data refers to data that characterizes the type, structure, weight and neuron characteristics of the neural network.

Step S304, sending a data read instruction to the model conversion module 120 to read the neural network data from the data cache module 140;

Step S306, sending a data conversion instruction to the model conversion module 120, so that it converts the read neural network data into video-like data;

Wherein, the video-like data herein refers to the original neural network data converted into a series of integer values within a preset range after the conversion by the model conversion module, such as an integer value in the interval [0, 255], Corresponding to the representation of one pixel, these integers together constitute video-like data. The following two examples of specific neural network data are described:

(1) For neural network data with data ranges in [-b, a] (both a and b are positive integers):

Converting the neural network data to an integer from 0 to 255 (corresponding to 8bpp at this time), the model conversion module can operate as follows:

Where I is an integer in the interval [0, 255], that is, a representation of one pixel; w is the true data value of the neural network data in the range of [-b, a].

(2) For convolutional neural network data

The model conversion module 120 converts the weights and offsets of the hidden layer neurons corresponding to each feature map in the convolutional neural network data, and integrates the integers obtained by the weight conversion to obtain data of the corresponding video frame. The weight of the hidden layer neurons corresponding to the plurality of feature maps and the data of the similar video frames obtained by the offsets are combined to obtain the video-like data.

Step S308, sending a data read instruction to the data cache module 140, requesting the class conversion data to be requested by the model conversion module 120, and performing buffering;

Step S310, sending a data read instruction to the data encoding module 131 to read the video-like data from the data cache module 140;

Step S312, sending a data encoding instruction to the data encoding module 131, where the encoding instruction includes information of the encoding mode, so that the unit-type video data corresponding to the encoding mode is encoded to obtain a data encoding result;

Referring to FIG. 2, in this embodiment, the data encoding module 131 includes: an encoding sub-module for encoding the video data in a video encoding manner to obtain a data encoding result; and an integration sub-module for using the data The coding result is integrated with the coding process information to obtain a compression result. The encoding sub-module further includes: a prediction unit 130a, a transform unit 130b, a quantization unit 130c, an entropy encoding unit 130d, and a depth auto-encoder unit 130e.

Among them, in the first coding method:

(1) The prediction unit 130a performs predictive coding using correlation between adjacent video-like data (neural network data).

For example, the similarity of the weights of the neural network units corresponding to different feature maps predicts the weight of the neural network, and encodes the difference between the predicted value and the actual value to achieve the purpose of compression.

(2) The transform unit 130b performs orthogonal transform coding on the video-like data processed by the prediction unit 130a, thereby achieving the purpose of compression.

For example, when performing two-dimensional discrete cosine transform (DCT) on video-like data, let f(m, n) be N×N discrete video data, then the two-dimensional DCT transform is expressed as:

Where u, v=0, 1, ..., N-1, when u=0 (v=0),

When u=1, 2, ..., N-1 (v = 1, 2, ..., N-1), c(u) = 1 (c(v) = 1); f(m, n) For the value of (m, n) in the matrix before encoding, F(u, v) is the value of (u, v) in the matrix after encoding.

(3) The quantization unit 130c quantizes the video-like data processed by the transform unit, which can reduce the coding length of the data without degrading the data quality.

For example, when using scalar quantization techniques, the data is processed as follows:

Where F(u,v) is the data of any position in the video-like data processed by the transform unit (u, v=0, 1, ..., N-1), Q _step is the quantization step size, FQ(u , v) is the quantized value of F(u,v), and round() is the rounding function (the output is the integer closest to the input real number).

(4) The entropy coding unit 130d performs rate compression coding on the video-like data processed by the quantization unit by using the statistical characteristics of the data, such as Huffman coding and arithmetic coding.

For example, when Huffman coding is performed, a binary code of a short word length is assigned to a symbol having a high probability of occurrence, and a binary code of a long word length is assigned to a symbol having a small probability of occurrence, thereby obtaining a code having the shortest average code length. The entropy encoding unit 130d can decode the encoded data by adopting a decoding method corresponding to the encoding method.

In general, the coding mode includes: prediction, transform, quantization, and entropy coding. In the data coding module 131, the video-like data sequentially passes through the prediction unit 130a, the transform unit 130b, the quantization unit 130c, and the entropy coding unit 130d. The output of one module is the input of the latter module. For example, a set of video data, after being subjected to the prediction unit 130a, becomes a coding result of the difference between the predicted value and the actual value, enters the transform unit 130b, is further compressed by the two-dimensional DCT transform, and then enters the quantization unit 130c so that the code length thereof is shortened. Finally, the coding redundancy is reduced by the Huffman coding of the entropy coding unit 130d, thereby achieving a better compression effect.

Wherein, in the second coding mode, the depth auto-encoder unit 130e encodes the data by using the working principle of the depth auto-encoder.

Wherein, the depth auto-encoder unit 130 trains the video-like data as a training input and an ideal output by minimizing the reconstruction error, so that the output becomes substantially the same data as the input-type video data, so that the depth auto-encoder unit will The hidden layer output is used as the encoding result, and the final output is used as the decoding result. Since the number of neurons in the hidden layer is less than the number of input neurons, the input data can be compressed. It should be noted that the depth autoencoder unit encodes the information of the decoder side of the deep autoencoder and encodes the encoded result for decoding.

It should be noted that the coding command may be one of the foregoing coding modes, or may be a combination of the above two coding modes, or may be other video coding modes.

Specifically, the sequence of instructions in the controller module can be determined by a program written by the user, and the neural network data can be compressed by using a compression method that the user desires to use. Users can combine different coding methods by writing related programs. The controller module compiles the relevant program into Order, and decode the instructions into relevant control instructions to achieve control of each module and encoding process.

For the specific process of encoding, reference may be made to the related description of the video coding, and will not be further described in detail herein.

It should be further noted that the process of compressing data is essentially a process of encoding data, and the encoding process in the above process can be equivalently regarded as part of a compression process or a compression process.

Step S314, sending an integration instruction to the data encoding module 131, so that the data encoding result and the encoding process information are integrated to obtain a compression result;

After this step, the compression result includes two parts: the first part is the data encoding result of the neural network data, and the second part is the encoding process information. The coding process information may include: information of the coding mode and information of the decoding end of the depth auto-encoder (when the depth auto-encoder unit is used).

Step S316, sending a data cache instruction to the data cache module 140, obtaining a compression result from the data encoding module 131, and buffering the compression result;

Step S318, sending a data storage instruction to the data cache module 140 to save the compression result to the external storage module 200.

It should be noted that, in this embodiment, the compression result is output to the external storage module, but in other embodiments of the present invention, the compression result may be directly transmitted, or the compression result may be cached in the data encoding module. The 131 or the data cache module 140 is an optional implementation of the present invention.

So far, the compression device for compressing neural network data in this embodiment has been introduced.

Second, the decompression device embodiment

In a second exemplary embodiment of the present invention, a decompression apparatus for decompressing neural network data compression results is provided.

Referring to FIG. 4, the decompression device for decompressing the neural network data compression result is similar to the compression device of the first embodiment, and includes: a controller module 110, a model conversion module 120, a data decoding module 132, and Data cache module 140. The connection relationship of each module in the decompression device in this embodiment is similar to the connection relationship of the compression device in the first embodiment, and will not be described in detail herein.

In this embodiment, the data cache module 140 is configured to cache the compression result. The data decoding module 132 is connected to the model conversion module 120 for using a video decoder corresponding to the compression result. The compression result is decoded. The model conversion module 120 is connected to the data decoding module (132) for restoring the decoded video-like data to neural network data. The controller module 110 is connected to the model conversion module 120, the data decoding module 132, and the data cache module 140, and is configured to issue control instructions to the three to coordinate the work.

Different from the compression device of the first embodiment, the operations performed by the respective modules in the decompression device of the present embodiment are inverse to the operations performed by the corresponding modules of the compression device of the first embodiment. Specifically, referring to FIG. 5, in this embodiment, the controller module 110 sends a control instruction to perform the following operations:

Step S602, sending a data read instruction to the data cache module 140, requesting the compression result to the external storage module 200, and buffering the compression result;

As described above, the compression result here includes two parts: the first part is the data encoding result of the neural network data, and the second part is the encoding process information.

Step S604, sending a data read instruction to the data decoding module 132 to read the compression result from the data cache module 140;

Step S606, sending a de-integration instruction to the data decoding module 132, so that it decodes the encoding process information and the data compression result from the compression result;

Step S608, sending a data read instruction to the data decoding module 132, and reading the encoding process information from the data decoding module 132;

Step S610, selecting a decoding instruction according to the encoding process information;

As described above, the encoding process information may include: information of the encoding mode and information of the decoding end of the depth autoencoder (when the depth autoencoder unit is used). Therefore, it is possible to obtain from the encoding process information which encoding mode or combination of encoding modes is used to encode the neural network data, and accordingly generate corresponding decoding instructions. The decoding instruction includes which decoding method is used to decode the data encoding result in the compression result.

Step S612, sending a decoding instruction to the data encoding and decoding module 132, so that it decompresses the data compression result in the compression result to obtain video-like data;

The data decoding module 132 includes: a de-integration sub-module, configured to de-integrate the compression result to obtain a data encoding result and encoding process information; and a decoding sub-module, configured to extract the encoding mode information from the encoding process information, The data encoding result is decoded by using a decoding method corresponding to the encoding mode information to obtain video-like data. The decoding submodule further includes: a prediction unit 130a, a transformation unit 130b, a quantization unit 130c, and an entropy coding unit 130d. And depth autoencoder unit 130e. The operations performed by each unit are inverse to the related operations in the encoding module.

Among them, in the first decoding mode (as shown by the solid line in Figure 5):

(1) Entropy encoding unit 130d may perform an entropy decoding process corresponding to the entropy encoding method used when encoding the data, such as a decoding process of Huffman encoding.

(2) The quantization unit 130c performs inverse quantization processing on the compression result processed by the entropy coding unit. For data processed by scalar quantization techniques, the following inverse quantization process is used:

F(u,v)=FQ(u,v)·Q _step (3-2)

All parameter definitions are the same as Equation 3-1 and will not be repeated here.

(3) The transform unit 130b performs inverse orthogonal transform on the data compression result processed by the quantization unit to perform decoding.

For example, inverse to Equation 2-1, the inverse two-dimensional discrete cosine transform for an N × N matrix is expressed as:

All parameter definitions are the same as Equation 2-1 and will not be repeated here.

(4) The prediction unit 130a decodes the compression result processed by the transformation unit using the correlation between adjacent data in the original neural network data.

For example, the prediction unit 130a may add the predicted value to the correlation difference to restore the original value.

Wherein, in the second decoding mode, the depth auto-encoder unit 130e decodes the neural network data encoded by the deep auto-encoder (as indicated by a broken line in FIG. 5).

For example, in the decoding process, the depth auto-encoder unit 130e first decodes the decoding end information of the depth auto-encoder used in the encoding from the input data, constructs a decoder using the decoding-end information, and uses the decoder pair. The neural network data encoded by the deep autoencoder is decoded.

In the first embodiment, the encoding instruction may be an encoding method or a combination of two or more encoding methods. Corresponding to the first embodiment, if the data input to the data decoding module 132 adopts two or more encoding modes, the data decoding module 132 sequentially decodes the data by using a corresponding decoding manner.

For example, when the encoding method used for inputting the data of the data decoding module 132 is prediction, transform, quantization, and Huffman encoding, the encoded data sequentially passes through the entropy encoding module 130d, and the quantization module 130c. The transform module 130b and the prediction module 130a, the output of the previous module is the input of the latter module. For example, the compressed neural network data of a set of input data encoding and decoding modules is decoded by the entropy encoding module 130d for the decoding process corresponding to the Huffman encoding, and the decoding result is input to the quantization unit 130c for inverse quantization, and then enters the transform unit 130b for reverse. The transform finally enters the prediction unit 130a so that the predicted value is added to the correlation difference, thereby outputting the decoded result.

For the specific process of decoding, reference may be made to the related description of video decoding, which will not be further described in detail herein.

Step S614, sending a data read instruction to the data cache module 140, and causing the data cache module 140 to read the video-like data from the data decoding module 132, and buffering;

Step S616, sending a data read instruction to the model conversion module 120, so that the model conversion module 120 reads the video-like data from the data cache module 140;

Step S618, sending a data conversion instruction to the model conversion module 120, so that the model conversion module 120 converts the video-like data into neural network data;

Regarding the conversion process, it is inverse to the process performed by the model conversion module in the first embodiment.

Taking the first method as an example: for neural network data with a data range of [-b, a], the model conversion module (120) operates according to the following formula to obtain real data values of the neural network data:

Taking the second method as an example: for convolutional neural network data, the model conversion module 120 converts the data of the corresponding video frame in the video-like data, and converts each frame into a hidden layer corresponding to a feature map of the convolutional neural network. The weights and offsets of the neurons integrate the data converted by each frame to obtain the weights and offsets of the hidden layer neurons corresponding to the feature maps of the convolutional neural network.

Step S620, sending a data read instruction to the data cache module 140, and causing the data cache module 140 to request the neural network data from the model conversion module 120, and buffering;

Step S622, send a data write command to the data cache module 140, so that the data cache module 140 writes the neural network data to the external storage module 200;

It should be noted that, in this embodiment, the decoding result is output to the external storage module, but in other embodiments of the present invention, the decoding result may be directly transmitted, or Cache the decoding result in the model conversion module or the data cache module, which are optional implementations of the present invention.

It should be further explained that the decompression process is essentially a decoding process, so the decoding process in the above process can be equivalently regarded as part of the decompression process or the decompression process.

So far, the decompression device for decompressing neural network data in this embodiment has been introduced.

Third, the compression / decompression system embodiment

In a third exemplary embodiment of the present invention, a compression/decompression system is also provided. As shown in Fig. 7, the compression/decompression system of the present embodiment integrates the compression device of the first embodiment and the decompression device of the second embodiment. And, the compression device and the decompression device share the controller module 110, the model conversion module 120, and the data cache module 140. And, the data encoding module 131 in the compression device and the data decoding module 132 in the decompression device are integrated into the data encoding/decoding module 130. In the data codec module 130, the data encoding module 131 and the data decoding module share a prediction unit 130a, a transform unit 130b, a quantization unit 130c, an entropy encoding unit 130d, and a depth auto encoder unit 130e.

The following summarizes the compression process and the decompression process of the compression/decompression system of this embodiment separately:

Compression process:

First, the neural network data is stored in the external storage module 200; then, the controller module 110 sends a control command to the relevant module to control the compression process; the data cache module 140 reads the neural network data from the external storage module and caches; and then, the model The conversion module 120 reads the neural network data from the data cache module 140 and converts it into class video data, and then stores the video data to the data cache module 140; then, the data encoding module 131 reads the class from the data cache module 140. The video data, which in turn passes through the processing of the prediction unit 130a, the transform unit 130b, the quantization unit 130c, and the entropy encoding unit 130d completes the compression process; subsequently, the data buffer module 140 reads the compressed data from the data encoding and decoding module 30. Finally, the data cache module 140 writes the compression result to the external storage module.

Decoding process:

First, the data to be decompressed is stored in the external storage module 200, and the data is compressed by the prediction, transformation, quantization, and entropy coding processes of the neural network data; in the following process, the controller module 110 sends the control instruction. To each relevant module to control the decompression process. Data The cache module 140 reads the data to be decompressed from the external storage module 200. Next, the data decoding module 132 reads the data to be decompressed from the data buffer module 140, and the data is subjected to processing by the entropy encoding unit 130d, the quantization unit 130c, the transform unit 130b, and the prediction unit 130a, and decompressed into video-like data. The data cache module 140 then reads the video-like data from the data encoding and decoding module 30. Subsequently, the data cache module 140 stores the video-like data to the model conversion module 120, which converts it into neural network data. Finally, the data cache module 140 reads the neural network data from the model conversion module 120 and writes it to the external storage module 200.

So far, the compression/decompression system for compressing/decompressing neural network data in this embodiment has been introduced.

Heretofore, the present embodiment has been described in detail with reference to the accompanying drawings. In view of the above description, those skilled in the art should have a clear understanding of the apparatus and system of the present invention for compression/decompression of neural network data.

The invention can be applied to the following (including but not limited to) scenarios: data processing, robots, computers, printers, scanners, telephones, tablets, smart terminals, mobile phones, driving recorders, navigators, sensors, cameras, cloud servers , cameras, camcorders, projectors, watches, earphones, mobile storage, wearable devices and other electronic products; aircraft, ships, vehicles and other types of transportation; televisions, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, Electric lights, gas stoves, range hoods and other household appliances; and including nuclear magnetic resonance instruments, B-ultrasound, electrocardiograph and other medical equipment.

It should be noted that the implementations that are not shown or described in the drawings or the text of the specification are all known to those of ordinary skill in the art and are not described in detail. In addition, the above definitions of the various elements and methods are not limited to the specific structures, shapes or manners mentioned in the embodiments, and those skilled in the art can simply modify or replace them, for example:

(1) The external storage module and the data cache module may also exist in a whole form, that is, the two modules are merged into one module having a storage function;

(2) The external storage module and the data cache module may also exist in a form of local storage distributed among the modules.

(3) The external storage module can be replaced by a hard disk;

(4) The external storage module can be replaced by an input/output module for inputting and outputting data.

It should also be noted that an example of parameters containing specific values may be provided herein, but these parameters need not be exactly equal to the corresponding values, but may approximate the corresponding values within acceptable error tolerances or design constraints. The order of the above steps is not limited to the above, and may be varied or rearranged depending on the desired design, unless specifically stated or steps that must occur in sequence. The above embodiments may be used in combination with other embodiments or based on design and reliability considerations, that is, the technical features in different embodiments may be freely combined to form more embodiments.

The processes or methods depicted in the preceding figures may be by hardware (eg, circuitry, dedicated logic, etc.), firmware, software (eg, software carried on a non-transitory computer readable medium), or both. The combined processing logic is executed. Although the processes or methods have been described above in some order, it should be understood that certain operations described can be performed in a different order. Moreover, some operations may be performed in parallel rather than sequentially.

In summary, the present invention can realize high-efficiency compression and decompression of a large-scale neural network model, thereby greatly reducing the storage space and transmission pressure of the neural network model, thereby adapting to the trend of expanding the size of the neural network in the era of big data, and can be applied to the nerve. Various fields of network data have strong promotion and application value.

The specific embodiments of the present invention have been described in detail, and are not intended to limit the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and scope of the present invention are intended to be included within the scope of the present invention.

Claims

A compression device for neural network data, comprising:

a model conversion module (120) for converting neural network data into video-like data;

The data encoding module (131) is connected to the model conversion module (120) for encoding the video data in a video encoding manner to obtain a compression result.
The compression device according to claim 1, wherein the video-like data refers to that after the conversion by the model conversion module, each of the original neural network data is converted into a series of integer values within a preset range, corresponding to In the representation of one pixel, these integers together constitute the corresponding video data.
The compression apparatus according to claim 2, wherein the model conversion module (120) converts the neural network data into video-like data in one of two ways:

The first way: For neural network data with a data range of [-b, a], the model transformation module (120) operates as follows:

Where I is an integer in the interval [0, 255], that is, a representation of one pixel; w is the true data value of the neural network data in the range [-b, a];

The second way: for convolutional neural network data, the model conversion module (120) converts the weights and offsets of the hidden layer neurons corresponding to each feature map in the convolutional neural network data, and converts the weights The data obtained by the offset conversion is integrated to obtain data of the corresponding video frame, and the video data of the video frame is obtained by combining the weights of the hidden layer neurons corresponding to the plurality of feature maps and the data of the similar video frames obtained by the offset.
The compression device according to claim 1, wherein the data encoding module (131) comprises:

An encoding sub-module for encoding the video data in a manner of video encoding to obtain a data encoding result;

The integration sub-module is used to integrate the data encoding result and the encoding process information to obtain a compression result.
The compression device according to claim 4, wherein said coding sub-module include:

a prediction unit (130a) for predictive coding using correlation between adjacent data of video-like data;

a transform unit (130b), configured to perform orthogonal transform coding on the video-like data processed by the prediction unit to compress the data;

a quantization unit (130c) for performing quantization coding on the video-like data processed by the transformation unit, and reducing the coding length of the data without reducing the data quality;

The entropy coding unit (130d) is configured to perform rate compression coding on the video-like data processed by the quantization unit by using statistical characteristics of the data to reduce data redundancy.
The compression device according to claim 4, wherein the encoding sub-module comprises:

a depth autoencoder unit (130) for further encoding the video-like data output by the model conversion module, and using the hidden layer output as the encoding result;

The depth autoencoder unit (130) trains the video-like data as a training input and an ideal output by minimizing the reconstruction error, so that the output becomes substantially the same data as the input-type video data.
The compression device according to any one of claims 1 to 6, further comprising:

a data cache module (140) for buffering neural network data;

The controller module (110) is coupled to the data cache module (140), the model conversion module (120), and the data encoding module (131) for transmitting control commands to perform the following operations:

Sending a data read command to the data cache module (140), requesting the neural network data to be sent to the outside world, and buffering the neural network data;

Transmitting a data read instruction to the model conversion module (120) to read the neural network data from the data cache module (140);

Transmitting a data conversion instruction to the model conversion module (120) to convert the read neural network data into video-like data;

Sending a data read instruction to the data cache module (140) to request the class video data to the model conversion module (120), and buffering;

Sending a data read command to the data encoding module (131) to enable it from the data cache module (140) reading class video data;

Transmitting, by the data encoding module (131), a data encoding instruction, where the encoding instruction includes information of the encoding mode, and encoding the unit-type video data corresponding to the encoding mode to obtain a data encoding result;

Sending an integration instruction to the data encoding module (131), so that the data encoding result and the encoding process information are integrated to obtain a compression result;

A data cache instruction is sent to the data cache module (140) to obtain a compression result from the data encoding module (131), and the compression result is cached.
A decompression device for neural network data, comprising:

a data decoding module (132), configured to obtain a compression result, and decode the compression result by using a video decoding manner corresponding to the compression result;

The model conversion module (120) is coupled to the data decoding module (132) for restoring the decoded video-like data to neural network data.
The decompression device of claim 8, wherein the data decoding module (132) comprises:

De-integrating sub-module for de-integrating the compression result to obtain data encoding result and encoding process information;

The decoding submodule is configured to extract coding mode information from the coding process information, and decode the data coding result by using a decoding manner corresponding to the coding mode information to obtain video-like data.
The decompression apparatus according to claim 8, wherein the model conversion module restores the decoded video-like data to neural network data in one of two ways:

The first way: For neural network data with data range [-b, a], the model conversion module (120) operates according to the following formula to obtain the true data values of the neural network data:

Where w is the true data value of the neural network data in the range [-b, a], and I is the video-like data, which is an integer in the interval [0, 255];

The second way: for convolutional neural network data, the model transformation module (120) will classify The data of the corresponding video frame in the frequency data is converted, and each frame is converted into a weight and offset of a hidden layer neuron corresponding to a feature map of the convolutional neural network, and the data converted by each frame is integrated to obtain a convolution The weights and offsets of the hidden layer neurons corresponding to each feature map of the neural network.
The decompression device according to claim 8, further comprising:

a data cache module (140) for buffering the compression result;

The controller module (110) is coupled to the model conversion module (120), the data decoding module (132), and the data cache module (140) for issuing control commands to the three to perform the following operations:

Sending a data read instruction to the data cache module (140), requesting the external result to be compressed, and buffering the compressed result;

Sending a data read instruction to the data decoding module (132) to read the compressed result from the data cache module (140);

Sending a de-integration instruction to the data decoding module (132) to decode the encoding process information and the data compression result from the compression result;

Transmitting a data read instruction to the data decoding module (132), and reading the encoding process information from the data decoding module 132;

Selecting a decoding instruction according to the encoding process information;

Sending a decoding instruction to the data encoding and decoding module (132), so that the data compression result in the compression result is decompressed to obtain video-like data;

Sending a data read instruction to the data buffer module (140) to read the video-like data from the data decoding module (132) and buffering;

Sending a data read instruction to the model conversion module (120) to read the video-like data from the data cache module (140);

A data conversion instruction is sent to the model conversion module (120) to convert the video-like data into neural network data.
A system for compressing/decompressing neural network data, comprising:

a compression device, the compression device of claim 7;

a decompression device, the decompression device of claim 11;

The compression device and the decompression device share a data cache module (140), a controller module (110), and a model conversion module (120).