CN111275184B - Method, system, device and storage medium for realizing neural network compression - Google Patents

Method, system, device and storage medium for realizing neural network compression

Info

Publication number
CN111275184B
Authority
CN
China
Prior art keywords
weight
data
value
neural network
weight data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010039749.2A
Other languages
Chinese (zh)
Other versions
CN111275184A (en)
Inventor
陈弟虎
萧嘉乐
粟涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202010039749.2A priority Critical patent/CN111275184B/en
Publication of CN111275184A publication Critical patent/CN111275184A/en
Application granted granted Critical
Publication of CN111275184B publication Critical patent/CN111275184B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/082 - Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a method, a system, a device and a storage medium for realizing neural network compression, wherein the method comprises the following steps: S1, acquiring weight data of a neural network; S2, training the weight data with a preset local variance as a constraint condition to obtain new weight data; S3, detecting whether the recognition precision of the neural network based on the new weight data meets a preset requirement, and if so, executing step S4, otherwise returning to step S2; S4, converting the representation form of the new weight data according to a preset mode to realize compression of the weight data; S5, detecting whether the compression ratio of the weight data meets the preset requirement, and if so, completing the compression, otherwise returning to step S2. The invention realizes compression of the weight data by converting the representation form of the weight data of the neural network without losing the recognition precision of the neural network, thereby improving the computing performance of the accelerator, and can be widely applied in neural network data processing.

Description

Method, system, device and storage medium for realizing neural network compression
Technical Field
The present invention relates to neural network data processing technologies, and in particular, to a method, a system, an apparatus, and a storage medium for implementing neural network compression.
Background
Convolutional neural networks, as one kind of deep neural network, are widely used in fields such as image classification and target recognition because their recognition accuracy is superior to that of conventional recognition algorithms based on empirical models. Notably, most of the computation in a convolutional neural network is parallel convolution, so to improve the energy efficiency ratio of the computation it is necessary to design hardware accelerators dedicated to accelerating convolutional neural network computation. Such hardware accelerators face two problems: how to increase the parallelism of the computation, and how to increase the effective memory bandwidth of the accelerator.
For the latter problem, both academia and industry have carried out research, and the mainstream solutions can be divided into three types: pruning of the network, quantization of the network, and compression of the network data. Pruning of the network refers to observing the contribution of each term to the final result during training and checking whether redundant zero values exist in the computation, and then deleting the redundant values from the network to slim it down. Quantization of the network refers to representing the feature values and weight values required by convolution with low-precision data representations, i.e., 16-bit or 8-bit fixed-point data, and then retraining, so as to reduce the overall bandwidth occupied by the network weights and feature values. Compression of the network data refers to compressing the weights in the network by borrowing practices from audio and video coding and decoding, for example compressing them in the form of a Huffman code.
The three mainstream solutions above for increasing the effective bandwidth of the accelerator all have limitations. Pruning makes the network sparse; when applied to a general hardware accelerator computing architecture, the utilization of the computing units becomes low and the energy efficiency ratio improves only to a limited extent. With mainstream quantization strategies, the minimum quantization bit width is about 8 bits on the premise of not reducing accuracy. Variable-length coding and decoding requires an additional codec module on the hardware side, and decoding may cause the data to lose the regularity of the original computation mode, producing misaligned data, so that the actual computation cannot use the theoretical on-chip storage bandwidth.
Disclosure of Invention
In order to solve one of the above technical problems, an object of the present invention is to provide a method, a system, an apparatus, and a storage medium for implementing neural network compression, which can effectively compress weights of a neural network without losing recognition accuracy.
The first technical scheme adopted by the invention is as follows:
a method of implementing neural network compression, comprising the steps of:
s1, acquiring weight data of the neural network;
s2, training the weight data by using a preset local variance as a constraint condition to obtain new weight data;
s3, detecting whether the recognition precision of the neural network based on the new weight data meets the preset requirement, and if so, executing the step S4; otherwise, return to step S2;
s4, converting the representation form of the new weight data according to a preset mode, and compressing the weight data;
s5, detecting whether the compression ratio of the weight data meets the preset requirement, and if so, completing the compression step; otherwise, the process returns to step S2.
Further, the weight data is data quantized as fixed-point data, and step S4 specifically includes:
converting the new weight data into a representation form of local mean data plus difference data;
wherein, a local mean value corresponds to a plurality of weights, and a difference value corresponds to a weight.
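For instance (an illustrative example, not taken from the embodiments below), with X = 8 and Y = 4, a local group of four 8-bit weights 118, 119, 121 and 122 could be stored as a single 8-bit local mean of 120 plus the four 4-bit signed differences -2, -1, +1 and +2, so only one value per group keeps the full bit width.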
Further, the preset local variance is calculated by the following formula:
[Formula image: the maximum local variance S² expressed in terms of Num and Y]
wherein Num represents the number of weights within a local region, and Y represents the bit width of the difference.
Further, the compression rate of the weight data is calculated in the following way:
the compression ratio is obtained by combining the local size values of the weight dimensions of the neural network, the bit width of the difference, the bit width of the local mean, and a first preset formula.
Further, the first preset formula is as follows:
[Formula image: compression ratio expressed in terms of X, Y and the local size values Npart, Mpart, K2part and K1part]
wherein Npart, Mpart, K2part and K1part are the local size values in the four weight dimensions, Y represents the bit width of the difference, and X represents the bit width of the weights in the weight data.
Further, the method also comprises the step of designing the on-chip storage module, which specifically comprises the following steps:
determining a mean value address generator and a mean value storage module according to the local mean value data;
determining a difference address generator and a difference storage module according to the difference data;
and determining a configurable shifter according to the bit width of the weight value and the bit width of the difference value, wherein the shifter is used for restoring the compressed bit width to the uncompressed bit width.
The second technical scheme adopted by the invention is as follows:
a system for implementing neural network compression, comprising:
the data acquisition module is used for acquiring weight data of the neural network;
the weight training module is used for training the weight data by adopting a preset local variance as a constraint condition to obtain new weight data;
the precision detection module is used for detecting whether the recognition precision of the neural network based on the new weight data meets the preset requirement or not, and if so, jumping to the form conversion module; otherwise, returning to the weight value training module;
the form conversion module is used for converting the representation form of the new weight data according to a preset mode and realizing the compression of the weight data;
the compression ratio detection module is used for detecting whether the compression ratio of the weight data meets the preset requirement or not, and if so, completing the compression step; otherwise, returning to the weight value training module.
Further, the system also includes an on-chip storage design module, which includes:
the local mean value addressing unit is used for determining a mean value address generator and a mean value storage module according to the local mean value data;
the searching addressing unit is used for determining a difference value address generator and a difference value storage module according to the difference value data;
and the shifter is used for recovering the compressed bit width into the uncompressed bit width.
The third technical scheme adopted by the invention is as follows:
an apparatus for implementing neural network compression, comprising:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method described above.
The fourth technical scheme adopted by the invention is as follows:
a storage medium having stored therein processor-executable instructions for performing the method as described above when executed by a processor.
The invention has the following beneficial effects: the invention realizes compression of the weight data by converting the representation form of the weight data of the neural network without losing the recognition precision of the neural network, thereby improving the computing performance of the accelerator and making the invention suitable for hardware accelerators sensitive to power consumption and performance.
Drawings
FIG. 1 is a flow diagram of steps in a method of implementing neural network compression in an embodiment;
FIG. 2 is a diagram illustrating a standard convolution process in an embodiment;
FIG. 3 is a diagram illustrating weight splitting after variance constraint in the embodiment;
FIG. 4 is a diagram illustrating the transmission of weights to a computing unit according to an embodiment;
FIG. 5 is a schematic diagram of the operation of the configurable shifter in an embodiment;
FIG. 6 is a block diagram of a system for implementing neural network compression according to an embodiment.
Detailed Description
As shown in fig. 1, the present embodiment provides a method for implementing neural network compression, including the following steps:
s101, weight data of the neural network are obtained.
The weight data in this embodiment is data quantized as fixed-point data, and can specifically be an X-bit quantization result, such as 8 bits or 16 bits. The neural network can be a convolutional neural network, a common deep network, or the like.
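As a point of reference, the following is a minimal sketch of one common way such an X-bit fixed-point quantization result could be produced (symmetric quantization with a single scale per tensor); the details are an assumption for illustration and not the procedure used in this embodiment:

```python
import numpy as np

def quantize_fixed_point(weights, x_bits=8):
    """Quantize float weights to signed X-bit integers sharing one scale."""
    q_max = (1 << (x_bits - 1)) - 1              # e.g. 127 for 8 bits
    scale = np.abs(weights).max() / q_max        # single scale for the tensor
    q = np.clip(np.rint(weights / scale), -q_max - 1, q_max).astype(np.int32)
    return q, scale                              # dequantize as q * scale
```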
S102, training the weight data by using a preset local variance as a constraint condition to obtain new weight data.
A constraint condition of local variance is added to the weight training process according to the actual compression ratio requirement, so that weights with smaller local differences are obtained after training.
The preset local variance may be specifically calculated by:
consider a standard convolution process, as shown in FIG. 2. And performing convolution calculation on the input features with the row number W, the column number D and the channel number M and convolution kernels with kernels of N channels K multiplied by K to obtain output features with the size of W multiplied by D multiplied by N. Then the maximum space for local variance constraints on the weights is K x M x N and the constraints can be done in K, K, M and N four weight dimensions. Considering that the bit width of the weight data input from the beginning is X bits, and the bit width of the difference data after the target compression is Y bits. Then the statistical maximum variance S in the presence of the Y-bit data difference can be found from the variance calculation2Is (where Num is the number of local internal weights, related to the selected local size):
[Formula image: the maximum local variance S² expressed in terms of Num and Y]
When the theoretical maximum variance S² is applied to the weight training process, note that Num in the formula is a variable: it depends on the dimensions and range selected for the local regions, and its maximum value is K × K × M × N. The local regions must also tile the whole exactly, that is, when choosing the local range, the size selected in each dimension must evenly divide the maximum value in that dimension. This guarantees that the local partitions have no redundancy while all weights are variance constrained.
The weight value can be trained by using the existing training method, which is not the key point of this embodiment and is not repeated here.
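As an illustration only, a local-variance constraint of this kind could be folded into an ordinary training loop as a penalty term; the following PyTorch-style sketch, including the block layout, the threshold s2_max and the penalty weight lam, is an assumption for illustration and not the exact training procedure of this embodiment:

```python
import torch

def local_variance_penalty(weight, group, s2_max):
    """Penalty on local weight blocks whose variance exceeds the allowed maximum.

    weight: 4-D convolution weight of shape (N, M, K, K)
    group:  local sizes (Npart, Mpart, K2part, K1part); each size is assumed
            to divide the corresponding weight dimension exactly
    s2_max: maximum local variance allowed for the target Y-bit difference
    """
    n, m, k2, k1 = weight.shape
    gn, gm, gk2, gk1 = group
    # split the weight tensor into non-overlapping local blocks
    blocks = (weight
              .reshape(n // gn, gn, m // gm, gm, k2 // gk2, gk2, k1 // gk1, gk1)
              .permute(0, 2, 4, 6, 1, 3, 5, 7)
              .reshape(-1, gn * gm * gk2 * gk1))
    var = blocks.var(dim=1, unbiased=False)           # variance of each block
    return torch.clamp(var - s2_max, min=0.0).sum()   # penalize only the excess

# usage inside an existing training step (task_loss, lam assumed to exist):
# loss = task_loss + lam * local_variance_penalty(conv.weight, (2, 2, 1, 1), s2_max)
```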
S103, detecting whether the recognition precision of the neural network based on the new weight data meets a preset requirement, and if so, executing a step S104; otherwise, the process returns to step S102.
The new weight data is applied to the neural network, the network is run, and it is detected whether the recognition precision of the network meets a preset requirement; the recognition precision can be, for example, the accuracy of image recognition. If the requirement is met, the weight data is acceptable; if not, the procedure returns to step S102 to continue training until the preset requirement is met.
And S104, converting the new representation form of the weight data according to a preset mode, and compressing the weight data.
Specifically, step S104 specifically includes: and converting the new weight data into a representation form of local mean data and difference data, wherein one local mean corresponds to a plurality of weights, and one difference corresponds to one weight.
As shown in FIG. 3, once the maximum variance has been applied to the training process so that the originally unordered local weights are limited to a range, the fixed-point representation of each weight can be changed from the original X bits to a local mean plus a difference, where the local mean is represented with X bits and the difference with Y bits. The original weights are thus decomposed into two parts: one part is the original-size set of weights represented by the differences, and the other part is the set of local mean weights that must be indexed through address conversion.
Fig. 3 illustrates an example of weight splitting for a local size of 2 × 2, and it is easy to find that the address index of the local average value is related to the local size. If the four-dimensional addressing is converted into one-dimensional addressing representation, the address index of the local mean value can be obtained as follows:
[Formula image: one-dimensional address index of the local mean as a function of the weight coordinates and the local size values Npart, Mpart, K2part and K1part]
wherein Npart, Mpart, K2part and K1part are the local size values in each weight dimension.
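A small NumPy sketch of this decomposition is given below for illustration; the even tiling of the four weight dimensions follows the description above, while the rounding of the mean and the flat-index helper mean_index are assumptions rather than the exact address conversion of the embodiment:

```python
import numpy as np

def split_weights(weight, group, y_bits):
    """Decompose fixed-point weights into local means plus per-weight differences.

    weight: integer array of shape (N, M, K, K)
    group:  (Npart, Mpart, K2part, K1part), each dividing its dimension exactly
    """
    n, m, k2, k1 = weight.shape
    gn, gm, gk2, gk1 = group
    blocks = (weight
              .reshape(n // gn, gn, m // gm, gm, k2 // gk2, gk2, k1 // gk1, gk1)
              .transpose(0, 2, 4, 6, 1, 3, 5, 7)
              .reshape(-1, gn * gm * gk2 * gk1))
    means = np.rint(blocks.mean(axis=1)).astype(weight.dtype)   # one X-bit mean per block
    diffs = (blocks - means[:, None]).astype(np.int64)          # one difference per weight
    lo, hi = -(1 << (y_bits - 1)), (1 << (y_bits - 1)) - 1
    assert diffs.min() >= lo and diffs.max() <= hi, "difference does not fit in Y bits"
    return means, diffs

def mean_index(coord, dims, group):
    """Hypothetical flat index of the local mean for weight coordinate (n, m, k2, k1)."""
    ni, mi, k2i, k1i = coord
    n, m, k2, k1 = dims
    gn, gm, gk2, gk1 = group
    return (((ni // gn) * (m // gm) + mi // gm) * (k2 // gk2)
            + k2i // gk2) * (k1 // gk1) + k1i // gk1
```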
S105, detecting whether the compression rate of the weight data meets a preset requirement, and if so, completing the compression step; otherwise, the process returns to step S102.
Specifically, step S105 includes: calculating the compression ratio by combining the local size values of the weight dimensions of the neural network, the bit width of the difference, the bit width of the local mean, and the first preset formula.
According to the above description, the weight data originally represented with X bits can be decomposed into a small number of local means represented with X bits plus the original number of differences represented with Y bits, thereby compressing the weights. The compression ratio A can be obtained by comparing the storage space occupied by the weights before and after compression:
[Formula image: compression ratio A expressed in terms of X, Y and the local size values Npart, Mpart, K2part and K1part]
wherein Npart, Mpart, K2part and K1part are the local size values in the four weight dimensions, Y represents the bit width of the difference, and X represents the bit width of the weights in the weight data. It can be seen from the formula that the compression ratio is mainly determined by the original X bits and the Y bits of the difference; when Y is half of X, the compression ratio is approximately 50%.
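Under the storage accounting just described (one X-bit mean per local block of Npart × Mpart × K2part × K1part weights and one Y-bit difference per weight), the compression ratio can be estimated with a small helper; reading the ratio as compressed storage over original storage is an assumption consistent with the remark about approximately 50%:

```python
def compression_ratio(x_bits, y_bits, group):
    """Compressed storage divided by original storage for one local grouping.

    Every weight keeps a Y-bit difference and every block of
    Npart * Mpart * K2part * K1part weights shares one X-bit mean.
    """
    block = 1
    for g in group:
        block *= g
    return y_bits / x_bits + 1.0 / block

# e.g. X = 8, Y = 4 with a 2 x 2 block gives 4/8 + 1/4 = 0.75,
# while a 4 x 4 block gives 4/8 + 1/16 = 0.5625, approaching ~50%.
```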
Further as an optional implementation manner, after step S105, step S106 of designing an on-chip storage module is further included, and step S106 specifically includes steps S1061, S1062, and S1063:
s1061, determining a mean value address generator and a mean value storage module according to the local mean value data;
s1062, determining a difference address generator and a difference storage module according to the difference data;
s1063, determining a configurable shifter according to the bit width of the weight and the bit width of the difference, wherein the shifter is used for recovering the compressed bit width to the bit width which is not compressed.
During the convolution calculation, the weights need to be buffered in on-chip storage to reduce the number of off-chip memory accesses and increase calculation efficiency. For the compressed weights, a dedicated on-chip storage structure therefore needs to be designed in the hardware accelerator, so that the on-chip decompression process does not affect calculation efficiency and the input required by the calculation array is not limited by the output bandwidth of the on-chip storage.
The on-chip storage of the weights may be organized as shown in FIG. 4. The address generators produce the address information needed to address the mean and difference memory blocks; note that the address generator for the mean memory block is configurable, depending on the local sizes of the dimensions used and the bit widths before and after compression, as discussed above for weight splitting. The weight storage block is mainly divided into a difference storage block and a mean storage block, which store the decomposed weights. After the weights are read out with the addresses from the address generators, each difference weight is restored to the normal bit width through a configurable shifter and then added to the corresponding mean weight, yielding the final weight output to the calculation array; this process is the decompression. The configurable shifter here is also related to the compression parameters.
The configurable shifter operates as shown in FIG. 5. Its inputs are the Y-bit difference to be amplified and the shift parameters a and b, where X = Y + a + b. The Y-bit difference is amplified according to the parameters, i.e., shifted left by a bits, with 0s or 1s padded on the right; then b bits of 0 or 1 are prepended to the result to form the complete X-bit weight. Whether 0 or 1 is padded in the a and b positions is determined by the sign of the difference: 0 is padded when the difference is positive and 1 when it is negative.
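A software model of this decompression step might look like the sketch below; the bit-level padding convention (0s for a positive difference, 1s for a negative one) follows the description above, while the two's-complement interpretation of the restored value is an assumption:

```python
def decompress_weight(diff, mean, y_bits, a, b):
    """Model of the configurable shifter plus the final addition (X = y_bits + a + b)."""
    x_bits = y_bits + a + b
    pad = 0 if diff >= 0 else 1
    raw = diff & ((1 << y_bits) - 1)                     # Y-bit two's-complement pattern
    shifted = (raw << a) | (pad * ((1 << a) - 1))        # shift left by a, pad on the right
    restored = (pad * (((1 << b) - 1) << (y_bits + a))) | shifted  # prepend b pad bits
    if restored >= 1 << (x_bits - 1):                    # reinterpret as signed X-bit value
        restored -= 1 << x_bits
    return mean + restored                               # add the corresponding local mean
```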
The method of this embodiment can compress convolutional layers of any size, and can also compress a complete network (formed by combining multiple convolutional layers). The preset local variance can also be calculated with other formulas, or the variance constraint can be replaced by another constraint, while still compressing the data without losing the recognition precision of the neural network.
In summary, compared with the existing method, the method of the embodiment at least has the following beneficial effects:
(1) and on the premise of not losing the identification precision of the neural network, compressing the weight of the convolutional neural network so as to be suitable for the hardware accelerator sensitive to power consumption and performance.
(2) The variance constraint and the compression process can be completely configurable, can be compressed according to the requirements of compression rate and precision, and can be adapted to the convolutional layer with any size.
(3) And the regular and aligned compression is realized while the weight is compressed, so that the waste of input bandwidth caused by the decompression of the weight in the actual weight reading process of the hardware accelerator is avoided.
(4) The embodiment realizes a method for really and effectively reducing the weight in the convolutional neural network, thereby realizing the improvement of the performance of the hardware accelerator calculation.
As shown in fig. 6, the present embodiment provides a system for implementing neural network compression, including:
the data acquisition module is used for acquiring weight data of the neural network;
the weight training module is used for training the weight data by adopting a preset local variance as a constraint condition to obtain new weight data;
the precision detection module is used for detecting whether the recognition precision of the neural network based on the new weight data meets the preset requirement or not, and if so, jumping to the form conversion module; otherwise, returning to the weight value training module;
the form conversion module is used for converting the representation form of the new weight data according to a preset mode and realizing the compression of the weight data;
the compression ratio detection module is used for detecting whether the compression ratio of the weight data meets the preset requirement or not, and if so, completing the compression step; otherwise, returning to the weight value training module.
As a further optional implementation, the system further includes an on-chip memory design module, including:
the local mean value addressing unit is used for determining a mean value address generator and a mean value storage module according to the local mean value data;
the searching addressing unit is used for determining a difference value address generator and a difference value storage module according to the difference value data;
and the shifter is used for recovering the compressed bit width into the uncompressed bit width.
The system for realizing neural network compression of the embodiment can execute the method for realizing neural network compression provided by the method embodiment of the invention, can execute any combination implementation steps of the method embodiment, and has corresponding functions and beneficial effects of the method.
The embodiment also provides a device for realizing neural network compression, which comprises:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method described above.
The device for realizing neural network compression of the embodiment can execute the method for realizing neural network compression provided by the method embodiment of the invention, can execute any combination implementation steps of the method embodiment, and has corresponding functions and beneficial effects of the method.
The present embodiments also provide a storage medium having stored therein processor-executable instructions, which when executed by a processor, are configured to perform the method as described above.
The storage medium of this embodiment may execute the method for implementing neural network compression provided by the method embodiment of the present invention, may execute any combination of the implementation steps of the method embodiment, and has corresponding functions and advantages of the method.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (7)

1. A method of implementing neural network compression, comprising the steps of:
s1, acquiring weight data of the neural network;
s2, training the weight data by using the maximum variance as a constraint condition to obtain new weight data;
s3, detecting whether the recognition precision of the neural network based on the new weight data meets the preset requirement, and if so, executing the step S4; otherwise, return to step S2;
s4, converting the representation form of the new weight data according to a preset mode, and compressing the weight data;
s5, detecting whether the compression ratio of the weight data meets the preset requirement, and if so, completing the compression step; otherwise, return to step S2;
the weight data is quantized data of fixed point data, and the step S4 specifically includes:
converting the new weight data into a representation form of local mean data plus difference data;
wherein, a local mean value corresponds to a plurality of weight values, a difference value corresponds to a weight value, and the maximum variance is applied to the training process, so that the original local unordered weight value is limited in a range, the weight value can be modified from the original fixed-point representation mode of X bits into the representation mode of the local mean value plus the difference value, wherein the local mean value is represented by X-bit data, and the difference value is represented by Y bits, the original weight value can be decomposed into two parts, one part is the partial weight value of the original size represented by the difference value, and the other part is the local mean weight value which needs to be indexed through address conversion;
the maximum variance is calculated by the following formula:
[Formula image: the maximum local variance S² expressed in terms of Num and Y]
wherein, Num represents the number of local internal weights, and Y represents the bit width of the difference.
2. The method of claim 1, wherein the compression rate of the weight data is calculated by:
calculating to obtain a compression ratio by combining the dimension local size value of the neural network, the bit width of the difference value, the bit width of the weight in the weight data and a first preset formula;
the first preset formula is as follows:
[Formula image: compression ratio expressed in terms of X, Y and the local size values Npart, Mpart, K2part and K1part]
wherein Npart, Mpart, K2part and K1part represent the local size values in the four weight dimensions, Y represents the bit width of the difference, and X represents the bit width of the weights in the weight data input at first.
3. The method of claim 1, further comprising a step of designing an on-chip storage module, specifically:
determining a mean value address generator and a mean value storage module according to the local mean value data;
determining a difference address generator and a difference storage module according to the difference data;
and determining a configurable shifter according to the bit width of the weight value and the bit width of the difference value, wherein the shifter is used for restoring the compressed bit width to the uncompressed bit width.
4. A system for implementing neural network compression, comprising:
the data acquisition module is used for acquiring weight data of the neural network;
the weight training module is used for training the weight data by adopting the maximum variance as a constraint condition to obtain new weight data;
the precision detection module is used for detecting whether the recognition precision of the neural network based on the new weight data meets the preset requirement or not, and if so, jumping to the form conversion module; otherwise, returning to the weight value training module;
the form conversion module is used for converting the representation form of the new weight data according to a preset mode and realizing the compression of the weight data;
the compression ratio detection module is used for detecting whether the compression ratio of the weight data meets the preset requirement or not, and if so, completing the compression step; otherwise, returning to the weight value training module;
the weight data is data quantized by fixed point data, and the form conversion module is specifically configured to:
converting the new weight data into a representation form of local mean data plus difference data;
wherein, a local mean value corresponds to a plurality of weight values, a difference value corresponds to a weight value, and the maximum variance is applied to the training process, so that the original local unordered weight value is limited in a range, the weight value can be modified from the original fixed-point representation mode of X bits into the representation mode of the local mean value plus the difference value, wherein the local mean value is represented by X-bit data, and the difference value is represented by Y bits, the original weight value can be decomposed into two parts, one part is the partial weight value of the original size represented by the difference value, and the other part is the local mean weight value which needs to be indexed through address conversion;
the maximum variance is calculated by the following formula:
[Formula image: the maximum local variance S² expressed in terms of Num and Y]
wherein, Num represents the number of local internal weights, and Y represents the bit width of the difference.
5. The system of claim 4, further comprising an on-chip storage design module comprising:
the local mean value addressing unit is used for determining a mean value address generator and a mean value storage module according to the local mean value data;
the searching addressing unit is used for determining a difference value address generator and a difference value storage module according to the difference value data;
and the shifter is used for recovering the compressed bit width into the uncompressed bit width.
6. An apparatus for implementing neural network compression, comprising:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement a method of implementing neural network compression as claimed in any one of claims 1 to 3.
7. A storage medium having stored therein processor-executable instructions, which when executed by a processor, are configured to perform the method of any one of claims 1-3.
CN202010039749.2A 2020-01-15 2020-01-15 Method, system, device and storage medium for realizing neural network compression Active CN111275184B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010039749.2A CN111275184B (en) 2020-01-15 2020-01-15 Method, system, device and storage medium for realizing neural network compression

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010039749.2A CN111275184B (en) 2020-01-15 2020-01-15 Method, system, device and storage medium for realizing neural network compression

Publications (2)

Publication Number Publication Date
CN111275184A CN111275184A (en) 2020-06-12
CN111275184B true CN111275184B (en) 2022-05-03

Family

ID=70998717

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010039749.2A Active CN111275184B (en) 2020-01-15 2020-01-15 Method, system, device and storage medium for realizing neural network compression

Country Status (1)

Country Link
CN (1) CN111275184B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107644254A (en) * 2017-09-09 2018-01-30 复旦大学 Convolutional neural network weight parameter quantization training method and system
CN109840589A (en) * 2019-01-25 2019-06-04 深兰人工智能芯片研究院(江苏)有限公司 Method, apparatus and system for running convolutional neural networks on an FPGA
CN109993293A (en) * 2019-02-28 2019-07-09 中山大学 Deep learning accelerator suitable for stacked hourglass networks
CN110555510A (en) * 2018-05-31 2019-12-10 耐能智慧股份有限公司 Method for compressing pre-trained deep neural network model

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11568237B2 (en) * 2018-05-10 2023-01-31 Samsung Electronics Co., Ltd. Electronic apparatus for compressing recurrent neural network and method thereof

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107644254A (en) * 2017-09-09 2018-01-30 复旦大学 Convolutional neural network weight parameter quantization training method and system
CN110555510A (en) * 2018-05-31 2019-12-10 耐能智慧股份有限公司 Method for compressing pre-trained deep neural network model
CN109840589A (en) * 2019-01-25 2019-06-04 深兰人工智能芯片研究院(江苏)有限公司 Method, apparatus and system for running convolutional neural networks on an FPGA
CN109993293A (en) * 2019-02-28 2019-07-09 中山大学 Deep learning accelerator suitable for stacked hourglass networks

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Adaptive Weight Compression for Memory-Efficient Neural Networks; Jong Hwan Ko et al.; https://dl.acm.org/doi/pdf/10.5555/3130379.3130424; 20171231; pp. 199-204 *
Cross-Entropy Pruning for Compressing Convolutional Neural Networks; Rongxin Bao et al.; Neural Computation; 20181231; Vol. 30, No. 11; pp. 3128-3149 *
DeepIoT: Compressing Deep Neural Network Structures for Sensing Systems with a Compressor-Critic Framework; Shuochao Yao et al.; https://arxiv.org/pdf/1706.01215.pdf; 20171122; pp. 1-14 *
A Survey of Deep Neural Network Compression and Acceleration; 纪荣嵘 et al.; Journal of Computer Research and Development; 20181231; Vol. 55, No. 9; pp. 1871-1888 *
A neural network compression method for sentiment analysis fusing group-sparse and exclusive-sparse regularization terms; 黄磊 et al.; Journal of Beijing University of Chemical Technology (Natural Science Edition); 20191231; Vol. 46, No. 21; pp. 103-112 *

Also Published As

Publication number Publication date
CN111275184A (en) 2020-06-12

Similar Documents

Publication Publication Date Title
US20230237332A1 (en) Learning compressible features
WO2023045204A1 (en) Method and system for generating finite state entropy coding table, medium, and device
US20180046895A1 (en) Device and method for implementing a sparse neural network
EP1891545B1 (en) Compressing language models with golomb coding
CN111832719A (en) Fixed point quantization convolution neural network accelerator calculation circuit
CN108416427A (en) Convolution kernel accumulates data flow, compressed encoding and deep learning algorithm
CN112702600B (en) Image coding and decoding neural network layered fixed-point method
CN114647399B (en) Low-energy-consumption high-precision approximate parallel fixed-width multiplication accumulation device
CN111507465A (en) Configurable convolutional neural network processor circuit
WO2019080670A1 (en) Gene sequencing data compression method and decompression method, system, and computer readable medium
CN114222129A (en) Image compression encoding method, image compression encoding device, computer equipment and storage medium
CN113741858A (en) In-memory multiply-add calculation method, device, chip and calculation equipment
CN112580805A (en) Method and device for quantizing neural network model
CN114328898A (en) Text abstract generating method and device, equipment, medium and product thereof
Barbarioli et al. Hierarchical residual encoding for multiresolution time series compression
TW202013261A (en) Arithmetic framework system and method for operating floating-to-fixed arithmetic framework
Fuketa et al. Image-classifier deep convolutional neural network training by 9-bit dedicated hardware to realize validation accuracy and energy efficiency superior to the half precision floating point format
CN113902097A (en) Run-length coding accelerator and method for sparse CNN neural network model
CN111275184B (en) Method, system, device and storage medium for realizing neural network compression
Park et al. GRLC: Grid-based run-length compression for energy-efficient CNN accelerator
US20230325374A1 (en) Generation method and index condensation method of embedding table
KR102502162B1 (en) Apparatus and method for compressing feature map
Ascia et al. Improving inference latency and energy of network-on-chip based convolutional neural networks through weights compression
Shaila et al. Block encoding of color histogram for content based image retrieval applications
CN112734021A (en) Neural network acceleration method based on bit sparse calculation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant