CN113705784A - Neural network weight coding method based on matrix sharing and hardware system


Info

Publication number: CN113705784A
Application number: CN202110964903.1A
Authority: CN (China)
Prior art keywords: matrix, neural network, coding, weight, positive
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 虞致国, 孙一, 顾晓峰
Current assignee: Jiangnan University
Original assignee: Jiangnan University
Application filed by Jiangnan University
Priority to CN202110964903.1A
Publication of CN113705784A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods


Abstract

The invention discloses a neural network weight coding method based on a shared matrix, together with a hardware system, and belongs to the technical field of hardware implementation of neural network algorithms. Aiming at the problems of current weight coding methods, namely a high demand for storage-computation devices, large storage-computation arrays and high cost, the method codes the weight parameters of neural network convolution kernels in pairs by means of a shared matrix. The coding method also has good compatibility and is not limited by the type of non-volatile memory device. Compared with existing coding methods, the shared-matrix coding method is better suited to large-scale and ultra-large-scale storage-computation architectures.

Description

Neural network weight coding method based on matrix sharing and hardware system
Technical Field
The invention relates to a neural network weight coding method based on matrix sharing and a hardware system, and belongs to the technical field of neural network algorithm hardware implementation.
Background
With the development of deep learning, neural networks are widely applied in fields such as image recognition, speech recognition and natural language processing. However, as network architectures become more complex, the amount of data transfer and computation in neural networks increases dramatically. This computation and data transfer also consumes considerable power, making neural network applications difficult to deploy on hardware devices.
In recent years, the storage-computation integrated (compute-in-memory) architecture for neural network computation has attracted wide attention and research. Its basic idea is to map the weights into a storage-computation array, so that logic computations that are simple but involve huge data volumes are performed inside the memory, reducing both the volume of data transferred between memory and processor and the distance over which it travels.
The operating principle of a classical non-volatile storage-computation array is shown in Fig. 1. To compute the product of the weight matrix W and an input matrix X, the values of W are stored in the storage-computation array in the form of conductances, the values of X are applied to the inputs of the array in the form of voltages, and the operation result is obtained from the outputs of the array in the form of currents, completing the whole storage-computation integrated process.
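This current-voltage principle can be sketched numerically; the conductance and voltage values below are illustrative only, and in a real array the same product is computed in the analog domain rather than with NumPy:

```python
import numpy as np

# Weights stored as conductances G; inputs applied as voltages V.
# Each output wire sums currents by Kirchhoff's current law, so the
# array produces I = G^T @ V, a matrix-vector product in one step.
G = np.array([[1.0, 2.0],
              [3.0, 4.0]])   # conductance matrix (stores W)
V = np.array([0.5, 0.25])    # input voltages (stores X's column)

I = G.T @ V                  # output currents = analog dot products
print(I)                     # [1.25 2.  ]
```

The whole matrix product is obtained by applying the columns of X one after another as voltage vectors.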
The weight parameters in a neural network are usually both positive and negative, while stored conductances cannot be negative. To represent them, the storage-computation array usually adopts a pair of positive and negative matrices: the encoded values are stored in the positive and negative matrices respectively, the output results of the two matrices are subtracted, and each weight parameter is expressed as the difference. As shown in Fig. 2, the current encoding method processes each weight parameter independently and maps it into the positive and negative matrices, i.e. at least two storage-computation devices are required for every weight represented. In application scenarios that pursue high precision, one weight is represented by several positive-matrix devices and several negative-matrix devices, and the device overhead per weight parameter is even higher.
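A minimal sketch of this conventional differential coding, assuming the simplest split (positive part to the positive matrix, magnitude of the negative part to the negative matrix); the helper name `encode_pos_neg` is illustrative, not from the patent:

```python
import numpy as np

def encode_pos_neg(W):
    # Conventional per-weight coding: w = w_pos - w_neg, with both
    # parts non-negative so each can be stored as a conductance.
    # This costs two storage-computation devices per weight.
    W = np.asarray(W, dtype=float)
    return np.maximum(W, 0.0), np.maximum(-W, 0.0)

W = np.array([[0.5, -1.0],
              [2.0, -0.25]])
P, N = encode_pos_neg(W)
x = np.array([1.0, 2.0])
# The hardware subtracts the output currents of the two matrices:
y = P.T @ x - N.T @ x
assert np.allclose(y, W.T @ x)  # signed result recovered
```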
Nowadays a simple neural network can hardly meet task requirements, so network structures grow increasingly complex and network scale keeps expanding; the application scenarios of the storage-computation integrated architecture are likewise becoming broader and more complex, and implementing large-scale and ultra-large-scale neural networks in hardware is a necessary path for the architecture's development. As the depth and parameter count of neural networks increase, the number of devices and the array sizes required in hardware implementations inevitably grow. Large arrays bring a series of problems concerning cost, chip area, parasitic parameters and testability, and the difficulty of hardware implementation multiplies as the device count rises.
Disclosure of Invention
Aiming at the problems of existing neural network weight coding methods in neural network hardware implementation, namely the large number of storage-computation devices, the large size of the storage-computation array and the high cost, the invention provides a neural network weight parameter coding method based on matrix sharing and a hardware system.
The invention provides a neural network weight parameter coding method based on matrix sharing, characterized by comprising the following steps:
Step one: performing fixed-point processing on the weight parameters in the trained neural network convolution kernels;
Step two: grouping the convolution kernels of the neural network: when the number of convolution kernels is even, grouping them in pairs; when the number is odd, taking out one convolution kernel at random and grouping the remaining kernels in pairs;
Step three: calculating the actual coded values, comprising:
coding independently the weight parameters of the single convolution kernel taken out in step two, each weight parameter yielding two actual coded values that are mapped and stored into a positive matrix and a negative matrix;
for the pairwise-grouped convolution kernels of step two: coding in pairs the two weight parameters at the same position in the same group of convolution kernels, each pair of weight parameters yielding three actual coded values that are mapped and stored into three matrices;
Step four: splicing the matrices storing the actual coded values to obtain the coded neural network weight matrix.
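The kernel-grouping step can be sketched as follows (a hypothetical helper; the patent takes the odd kernel out at random, here simply the last one for reproducibility):

```python
import numpy as np

def group_kernels(kernels):
    # Step two: pair up convolution kernels; with an odd count, one
    # kernel is set aside to be coded independently (two devices per
    # weight), while each pair is coded jointly (three devices per pair).
    kernels = list(kernels)
    leftover = kernels.pop() if len(kernels) % 2 else None
    pairs = [(kernels[i], kernels[i + 1]) for i in range(0, len(kernels), 2)]
    return pairs, leftover

# 5 toy 3x3 kernels: two pairs plus one leftover kernel
kernels = [np.full((3, 3), k) for k in range(5)]
pairs, leftover = group_kernels(kernels)
assert len(pairs) == 2 and leftover is not None
```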
Optionally, with w1 and w2 denoting the two weight parameters at the same position of the same group of convolution kernels, the pair (w1, w2) is coded jointly and the coding result is mapped and stored into three matrices: the w1 positive matrix, the w2 positive matrix and the shared negative matrix. The pairwise coding process comprises:
S1: input the weight parameters w1 and w2;
S2: judge the sign of w1: when w1 ≥ 0, let p1 = w1 and n = 0; when w1 < 0, let p1 = 0 and n = -w1;
S3: let p2 = w2 + n;
S4: judge the sign of p2: when p2 ≥ 0, p1, p2 and n keep their original values unchanged; when p2 < 0, let p1 = p1 - p2, n = n - p2 and p2 = 0;
S5: output p1, p2 and n, the actual coded values of the weight parameters w1 and w2.
The weight parameter w1 is determined jointly by the two coded values p1 and n, i.e. w1 = p1 - n; the weight parameter w2 is determined jointly by the two coded values p2 and n, i.e. w2 = p2 - n, where n is the shared negative coded value. p1 is mapped and stored into the w1 positive matrix, p2 is mapped and stored into the w2 positive matrix, and n is mapped and stored into the shared negative matrix.
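Under the symbol choice w1 = p1 - n and w2 = p2 - n (the patent's own symbols appear only as equation images, so these names are illustrative), the shared-negative coding flow can be sketched as:

```python
def encode_shared_negative(w1, w2):
    # Pairwise coding with a shared negative value: returns (p1, p2, n)
    # with w1 = p1 - n and w2 = p2 - n, all three values non-negative,
    # i.e. 3 storage-computation devices for 2 weights.
    if w1 >= 0:              # S2: code w1 first
        p1, n = w1, 0
    else:
        p1, n = 0, -w1
    p2 = w2 + n              # S3: code w2 against the shared negative value
    if p2 < 0:               # S4: shift the whole pair so p2 is non-negative
        p1, n, p2 = p1 - p2, n - p2, 0
    return p1, p2, n         # S5: output

# Values from the embodiment (kernels [5,15,-7,-9] and [12,-3,-7,11]):
for w1, w2 in [(5, 12), (15, -3), (-7, -7), (-9, 11)]:
    p1, p2, n = encode_shared_negative(w1, w2)
    assert (p1 - n, p2 - n) == (w1, w2) and min(p1, p2, n) >= 0
```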
Optionally, with w1 and w2 denoting the two weight parameters at the same position of the same group of convolution kernels, the pair (w1, w2) is coded jointly and the coding result is mapped and stored into three matrices: the w1 negative matrix, the w2 negative matrix and the shared positive matrix. The pairwise coding process comprises:
S1: input the weight parameters w1 and w2;
S2: judge the sign of w1: when w1 ≥ 0, let p = w1 and n1 = 0; when w1 < 0, let p = 0 and n1 = -w1;
S3: let n2 = p - w2;
S4: judge the sign of n2: when n2 ≥ 0, p, n1 and n2 keep their original values unchanged; when n2 < 0, let p = p - n2, n1 = n1 - n2 and n2 = 0;
S5: output p, n1 and n2, the actual coded values of the weight parameters w1 and w2.
The weight parameter w1 is determined jointly by the two coded values p and n1, i.e. w1 = p - n1; the weight parameter w2 is determined jointly by the two coded values p and n2, i.e. w2 = p - n2, where p is the shared positive coded value. n1 is mapped and stored into the w1 negative matrix, n2 is mapped and stored into the w2 negative matrix, and p is mapped and stored into the shared positive matrix.
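The dual shared-positive variant, with w1 = p - n1 and w2 = p - n2 (again, symbol names are illustrative reconstructions, since the original formulas are images), can be sketched in the same way:

```python
def encode_shared_positive(w1, w2):
    # Dual scheme: one shared positive value p and two negative values,
    # with w1 = p - n1 and w2 = p - n2, all outputs non-negative.
    if w1 >= 0:              # code w1 first
        p, n1 = w1, 0
    else:
        p, n1 = 0, -w1
    n2 = p - w2              # code w2 against the shared positive value
    if n2 < 0:               # raise the shared value so all entries stay >= 0
        p, n1, n2 = p - n2, n1 - n2, 0
    return p, n1, n2

for w1, w2 in [(5, 12), (15, -3), (-7, -7), (-9, 11)]:
    p, n1, n2 = encode_shared_positive(w1, w2)
    assert (p - n1, p - n2) == (w1, w2) and min(p, n1, n2) >= 0
```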
A second object of the present invention is to provide a neural network chip, which comprises a storage-computation array composed of a plurality of storage-computation devices, wherein each weight value in the neural network weight matrix is written one by one into the corresponding storage-computation device, and the neural network weight matrix is obtained with the neural network weight parameter coding method described above.
Optionally, the storage device is a non-volatile storage device.
Optionally, the precision of the storage device is 8 bits.
Optionally, the precision of the storage device is 4 bits.
A third object of the present invention is to provide a computing apparatus comprising a memory and a processor, the memory storing computer-executable instructions, characterized in that, when the instructions are executed by the processor, the neural network weight parameter coding method described above is performed.
A fourth object of the present invention is to provide a neural network hardware system, comprising: the invention provides a neural network chip and/or a computing device.
The invention also provides an application of the neural network weight parameter coding method and/or the neural network chip and/or the computing device and/or the neural network hardware system in the technical field of neural networks.
The invention has the beneficial effects that:
when the hardware of the neural network is implemented, the convolution kernels in the neural network are grouped in pairs, and the weight parameters at the same positions of the convolution kernels are coded in pairs in a matrix sharing mode, and the two weight parameters are coded and then only need three storage devices to implement the calculation, so that the number of the storage devices required by the calculation array is reduced; meanwhile, the coding method is not limited by the type of a nonvolatile memory device, and has good compatibility. Compared with the existing method of independently coding each weight parameter and storing the coded value into 2 storage devices, the method only uses 1.5 storage devices at least on average for each weight parameter, and saves about 1/4 device overhead for the storage array.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic diagram of the operation of a non-volatile memory array.
Fig. 2 is a schematic diagram of a storage array structure of the current encoding method.
FIG. 3 is a schematic diagram of a shared device structure according to the present invention.
Fig. 4 shows a schematic diagram of the whole process of mapping from algorithm to storage array according to the present invention, wherein (a) is algorithm encoding process and (b) is process of writing encoded value into storage array.
Fig. 5 is a flow chart of an implementation of the encoding method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
The coding method provided by the invention is oriented to the weight parameters in a neural network. In practical application, the neural network is first trained to obtain the final trained convolution kernels, and the weight parameters in the convolution kernels are then coded with the method provided by the invention.
The first embodiment is as follows:
the embodiment provides a neural network coding method based on matrix sharing, which is applied to a neural network, and the coding process comprises the following steps:
the method comprises the following steps: performing fixed-point processing on the weight parameters in the trained neural network convolution kernel, wherein the fixed-point processing method in the embodiment is a linear transformation mode;
table 1 shows a convolution kernel weight parameter matrix after the dotting processing in this embodiment;
TABLE 1 weight matrix after spotting
Figure BDA0003221333800000051
Step two: grouping the convolution kernels processed in step one and rearranging them, so that the weight parameters can be processed in pairs;
Table 2 is the matrix rearranged after grouping;
TABLE 2 Rearranged matrix
Convolution kernel 1 (rearranged):   5   15   -7   -9
Convolution kernel 2 (rearranged):  12   -3   -7   11
Step three: calculating the actual coded values; in this embodiment a shared negative matrix is chosen, and the rearranged matrix is coded according to the coding flowchart of Fig. 5.
The weight parameters in the upper and lower rows are processed in pairs, each pair sharing one storage-computation device for the primary coding, giving the primary coding matrix of Table 3.
TABLE 3 Primary coding matrix
Convolution kernel 1 positive matrix:   5   15    0    0
Shared negative matrix:                 0    0    7    9
Convolution kernel 2 positive matrix:  12   -3    0   20
The primary coding matrix is then checked: the physical quantities in an actual storage-computation array cannot be negative, so when a coded value is negative, an offset is introduced into that group of codes while the coded values of the other groups are unchanged, yielding the actual coding matrix of Table 4.
TABLE 4 Actual coding matrix
Convolution kernel 1 positive matrix:   5   18    0    0
Shared negative matrix:                 0    3    7    9
Convolution kernel 2 positive matrix:  12    0    0   20
Step four: splicing the actual coding matrices of Table 4 to obtain the coded neural network weight matrix.
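The worked example of Tables 2 to 4 can be reproduced with a short script applying the shared-negative coding flow described above:

```python
# Rearranged weight rows from Table 2
k1 = [5, 15, -7, -9]   # convolution kernel 1
k2 = [12, -3, -7, 11]  # convolution kernel 2

pos1, shared_neg, pos2 = [], [], []
for w1, w2 in zip(k1, k2):
    p1, n = (w1, 0) if w1 >= 0 else (0, -w1)  # primary coding of w1
    p2 = w2 + n                               # code w2 against shared n
    if p2 < 0:                                # offset: no stored value may be negative
        p1, n, p2 = p1 - p2, n - p2, 0
    pos1.append(p1); shared_neg.append(n); pos2.append(p2)

# Matches the actual coding matrix of Table 4
assert pos1 == [5, 18, 0, 0]
assert shared_neg == [0, 3, 7, 9]
assert pos2 == [12, 0, 0, 20]
```

Note that per column w1 = p1 - n and w2 = p2 - n still hold, so subtracting the shared negative row recovers both original kernels.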
To verify the effectiveness of the coding method and its low device overhead, a series of experiments were carried out. In the experiments the coding method was applied to different neural networks, verified with the MNIST handwritten digit dataset, and the inference accuracy after coding and the device overhead required by the storage-computation array in hardware implementation were observed.
The first experiment uses a small-scale neural network comprising 3 convolutional layers and 1 fully-connected layer. In the first convolutional layer the convolution kernel size is 5 × 1 × 7, without padding; in the second convolutional layer the kernel size is 3 × 7, with padding; in the third convolutional layer the kernel size is 3 × 7, with padding; the fourth layer is a fully-connected layer with 63 input neurons and 10 output neurons.
The coding method of the invention was then applied to a deeper, larger neural network of 8 layers: 5 convolutional layers and 3 fully-connected layers. In the first convolutional layer the kernel size is 7 × 1 × 16, without padding; in the second, 5 × 16 × 32, with padding; in the third, 3 × 32 × 128, with padding; in the fourth, 3 × 128, with padding; in the fifth, 3 × 128 × 64, with padding. The last three layers are fully-connected classification layers with 1024, 512 and 10 neurons respectively.
The results are summarized in Table 5 below. Applying the coding method to the two types of neural networks verifies its effectiveness, and the comparison between the two networks shows how the optimization effect of the coding method scales with network size.
TABLE 5 Summary of experimental results
Network name                            Neural network 1   Neural network 2
Number of layers                        5                  8
Number of parameters                    1,718              1,065,098
Accuracy                                96.71%             98.94%
Devices required, conventional coding   3,436              2,130,196
Devices required, proposed coding       2,639              1,595,328
Overhead saved                          23.2%              25.0%
The experimental results in Table 5 show that after applying the coding method of the invention, the target networks still perform well in accuracy, verifying the validity of the coding method. Comparing the number of storage-computation devices required before and after applying the method shows that it markedly reduces device overhead. The coding method can therefore, while preserving network accuracy, relieve the need for large numbers of circuit devices in neural network hardware implementation, effectively reducing development cost and design difficulty.
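The savings in Table 5 can be sanity-checked from the parameter counts, assuming 2 devices per weight for the conventional per-weight coding (the second network's figure comes out at about 25%):

```python
# Device counts: conventional coding needs 2 devices per weight; the
# shared-matrix coding needs 3 per pair (1.5 per weight) plus 2 per
# weight for any unpaired kernel, so savings approach 25%.
results = {}
for n_params, devices_with_coding in [(1718, 2639), (1065098, 1595328)]:
    conventional = 2 * n_params
    saving = 1 - devices_with_coding / conventional
    results[n_params] = saving
    print(f"{n_params} params: {saving:.1%} of device overhead saved")
```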
Some steps in the embodiments of the present invention may be implemented by software, and the corresponding software program may be stored in a readable storage medium, such as an optical disc or a hard disk.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A neural network weight parameter coding method based on matrix sharing, characterized by comprising the following steps:
step one: performing fixed-point processing on the weight parameters in the trained neural network convolution kernels;
step two: grouping the convolution kernels of the neural network: when the number of convolution kernels is even, grouping them in pairs; when the number is odd, taking out one convolution kernel at random and grouping the remaining kernels in pairs;
step three: calculating the actual coded values, comprising:
coding independently the weight parameters of the single convolution kernel taken out in step two, each weight parameter yielding two actual coded values that are mapped and stored into a positive matrix and a negative matrix;
for the pairwise-grouped convolution kernels of step two: coding in pairs the two weight parameters at the same position in the same group of convolution kernels, each pair of weight parameters yielding three actual coded values that are mapped and stored into three matrices;
step four: splicing the matrices storing the actual coded values to obtain the coded neural network weight matrix.
2. The method of claim 1, characterized in that, with w1 and w2 denoting the two weight parameters at the same position of the same group of convolution kernels, the pair (w1, w2) is coded jointly and the coding result is mapped and stored into three matrices: the w1 positive matrix, the w2 positive matrix and the shared negative matrix; the pairwise coding process comprises:
S1: inputting the weight parameters w1 and w2;
S2: judging the sign of w1: when w1 ≥ 0, letting p1 = w1 and n = 0; when w1 < 0, letting p1 = 0 and n = -w1;
S3: letting p2 = w2 + n;
S4: judging the sign of p2: when p2 ≥ 0, p1, p2 and n keep their original values unchanged; when p2 < 0, letting p1 = p1 - p2, n = n - p2 and p2 = 0;
S5: outputting p1, p2 and n as the actual coded values of the weight parameters w1 and w2;
wherein the weight parameter w1 is determined jointly by the two coded values p1 and n, i.e. w1 = p1 - n, and the weight parameter w2 is determined jointly by the two coded values p2 and n, i.e. w2 = p2 - n, n being the shared negative coded value; p1 is mapped and stored into the w1 positive matrix, p2 is mapped and stored into the w2 positive matrix, and n is mapped and stored into the shared negative matrix.
3. The method of claim 1, characterized in that, with w1 and w2 denoting the two weight parameters at the same position of the same group of convolution kernels, the pair (w1, w2) is coded jointly and the coding result is mapped and stored into three matrices: the w1 negative matrix, the w2 negative matrix and the shared positive matrix; the pairwise coding process comprises:
S1: inputting the weight parameters w1 and w2;
S2: judging the sign of w1: when w1 ≥ 0, letting p = w1 and n1 = 0; when w1 < 0, letting p = 0 and n1 = -w1;
S3: letting n2 = p - w2;
S4: judging the sign of n2: when n2 ≥ 0, p, n1 and n2 keep their original values unchanged; when n2 < 0, letting p = p - n2, n1 = n1 - n2 and n2 = 0;
S5: outputting p, n1 and n2 as the actual coded values of the weight parameters w1 and w2;
wherein the weight parameter w1 is determined jointly by the two coded values p and n1, i.e. w1 = p - n1, and the weight parameter w2 is determined jointly by the two coded values p and n2, i.e. w2 = p - n2, p being the shared positive coded value; n1 is mapped and stored into the w1 negative matrix, n2 is mapped and stored into the w2 negative matrix, and p is mapped and stored into the shared positive matrix.
4. A neural network chip comprising a storage-computation array formed by a plurality of storage-computation devices, each weight value in a neural network weight matrix being written one by one into the corresponding storage-computation device, characterized in that the neural network weight matrix is obtained by using the neural network weight parameter coding method according to any one of claims 1 to 3.
5. The chip of claim 4, wherein the memory device is a non-volatile memory device.
6. The chip of claim 5, wherein the precision of the memory device is 8 bits.
7. The chip of claim 5, wherein the precision of the memory device is 4 bits.
8. A computing device comprising a memory and a processor, the memory having stored thereon computer-executable instructions, wherein the instructions, when executed by the processor, perform the neural network weight parameter encoding method of any one of claims 1-3.
9. A neural network hardware system, comprising: the neural network chip of any one of claims 4-7 and/or the computing device of claim 8.
10. Use of the neural network weight parameter coding method according to any one of claims 1 to 3 and/or the neural network chip according to any one of claims 4 to 7 and/or the computing device according to claim 8 and/or the neural network hardware system according to claim 9 in the field of neural network technology.
CN202110964903.1A 2021-08-20 2021-08-20 Neural network weight coding method based on matrix sharing and hardware system Pending CN113705784A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110964903.1A CN113705784A (en) 2021-08-20 2021-08-20 Neural network weight coding method based on matrix sharing and hardware system


Publications (1)

Publication Number Publication Date
CN113705784A true CN113705784A (en) 2021-11-26

Family

ID=78653841

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110964903.1A Pending CN113705784A (en) 2021-08-20 2021-08-20 Neural network weight coding method based on matrix sharing and hardware system

Country Status (1)

Country Link
CN (1) CN113705784A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116306811A (en) * 2023-02-28 2023-06-23 苏州亿铸智能科技有限公司 Weight distribution method for deploying neural network for ReRAM

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170200078A1 (en) * 2014-08-28 2017-07-13 Commissariat A L'energie Atomique Et Aux Energies Alternatives Convolutional neural network
CN107612555A (en) * 2017-10-12 2018-01-19 Jiangnan University An improved sparsity-adaptive matching pursuit algorithm based on the bisection method
US20180046916A1 (en) * 2016-08-11 2018-02-15 Nvidia Corporation Sparse convolutional neural network accelerator
CN109993297A (en) * 2019-04-02 2019-07-09 Nanjing Jixiang Sensing Imaging Technology Research Institute Co., Ltd. A load-balanced sparse convolutional neural network accelerator and its acceleration method
CN111242180A (en) * 2020-01-03 2020-06-05 Nanjing University of Posts and Telecommunications Image recognition method and system based on a lightweight convolutional neural network
CN112734025A (en) * 2019-10-28 2021-04-30 Fudan University Neural network parameter sparsification method based on fixed-basis regularization
CN112836757A (en) * 2021-02-09 2021-05-25 Southeast University Parameter sharing method within convolution kernels of a deep learning network
CN112990454A (en) * 2021-02-01 2021-06-18 State Grid Anhui Electric Power Co., Ltd. Maintenance Branch Neural network computation acceleration method and device based on integrated DPU multi-core heterogeneity
US20210232897A1 (en) * 2016-04-27 2021-07-29 Commissariat A L'energie Atomique Et Aux Energies Alternatives Device and method for calculating convolution in a convolutional neural network

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116306811A (en) * 2023-02-28 2023-06-23 Suzhou Yizhu Intelligent Technology Co., Ltd. Weight distribution method for deploying a neural network on ReRAM
CN116306811B (en) * 2023-02-28 2023-10-27 Suzhou Yizhu Intelligent Technology Co., Ltd. Weight distribution method for deploying a neural network on ReRAM

Similar Documents

Publication Publication Date Title
US11580377B2 (en) Method and device for optimizing neural network
CN107516129B (en) Deep network compression method based on dimension-adaptive Tucker decomposition
WO2022037257A1 (en) Convolution calculation engine, artificial intelligence chip, and data processing method
CN108229671B (en) System and method for reducing storage bandwidth requirement of external data of accelerator
CN109840585B (en) Sparse two-dimensional convolution-oriented operation method and system
CN109993293B (en) Deep learning accelerator suitable for stacked hourglass networks
CN107633297A (en) A convolutional neural network hardware accelerator based on the parallel fast FIR filter algorithm
CN109791628A (en) Neural network model splitting method, training method, computing device and system
CN112257844B (en) Convolutional neural network accelerator based on mixed precision configuration and implementation method thereof
CN112286864B (en) Sparse data processing method and system for accelerating operation of reconfigurable processor
CN113421187B (en) Super-resolution reconstruction method, system, storage medium and equipment
CN114138231B (en) Method, circuit and SOC for executing matrix multiplication operation
CN113741858A (en) In-memory multiply-add calculation method, device, chip and calculation equipment
CN113705784A (en) Neural network weight coding method based on matrix sharing and hardware system
TW202009799A (en) Memory-adaptive processing method for convolutional neural network and system thereof
CN115859011A (en) Matrix operation method, device and unit, and electronic equipment
Okubo et al. A Cost-Efficient FPGA Implementation of Tiny Transformer Model using Neural ODE
CN113988279A (en) Output current reading method and system of storage array supporting negative value excitation
CN110807479A (en) Neural network convolution calculation acceleration method based on Kmeans algorithm
KR102154834B1 (en) In-DRAM Bitwise Convolution Circuit for Low Power and Fast Computation
CN114819167A (en) Sparse approximate inverse quantum preprocessing method and device for sparse linear system
CN104123372B (en) A CUDA-based clustering method and device
CN117313803B (en) Sliding window 2D convolution computing method based on RISC-V vector processor architecture
CN116055003B (en) Optimal data transmission method, device, computer equipment and storage medium
CN111507178B (en) Data processing optimization method and device, storage medium and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination