CN110070178B - Convolutional neural network computing device and method

Convolutional neural network computing device and method

Info

Publication number
CN110070178B
CN110070178B (application CN201910337943.6A)
Authority
CN
China
Prior art keywords
neural network
network model
model
weight
buffer
Prior art date
Legal status
Active
Application number
CN201910337943.6A
Other languages
Chinese (zh)
Other versions
CN110070178A (en)
Inventor
王东 (Wang Dong)
Current Assignee
Beijing Jiaotong University
Original Assignee
Beijing Jiaotong University
Priority date
Filing date
Publication date
Application filed by Beijing Jiaotong University
Priority to CN201910337943.6A
Publication of CN110070178A
Application granted
Publication of CN110070178B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a convolutional neural network computing device and method. The device comprises a neural network model buffer for buffering the encoded convolutional neural network model; a neural network model decoder for reading the encoded model and decoding it to obtain the logical indexes of the model weights and the control information; an input neural network feature map buffer for buffering input feature map pixel values; a feature map storage controller for reading feature map pixel value data according to the feature map logical indexes; an accumulator array for adding together the input feature map pixel values that are multiplied by the same neural network model weight value and generating temporary accumulation results; a pipeline buffer for buffering the temporary accumulation results; a multiply-accumulator array for multiplying each temporary accumulation result by the corresponding neural network model weight value, adding the products belonging to the current convolution operation, and generating output feature map pixel values; and an output neural network feature map buffer for buffering the output feature map pixel values.

Description

Convolutional neural network computing device and method
Technical Field
The invention relates to the fields of integrated circuits and artificial intelligence, in particular to a device for accelerating convolutional neural network inference and a corresponding neural network model encoding method.
Background
With the rise of deep learning, convolutional neural networks are widely applied in fields such as computer vision, image processing, speech recognition, robotics, and autonomous vehicles. As the core algorithm of deep learning, the convolutional neural network offers high inference accuracy and strong fault tolerance, but it also brings a huge computation load and heavy consumption of system storage resources. The convolution operations in a convolutional neural network usually consume more than 90% of the algorithm's running time; therefore, to compute a convolutional neural network in real time, a dedicated hardware circuit is needed to accelerate the convolution operations. The core of the convolution operation is the multiply-accumulate operation, so existing convolutional neural network hardware accelerators adopt a single, homogeneous multiply-accumulate array composed of thousands of identical multiply-accumulate units, each of which requires a multiplier circuit. Accelerators of this structure consume a large amount of hardware resources to implement thousands of multipliers, which leads to high integrated-circuit cost and large circuit power consumption. Compared with a multiplier, an adder consumes fewer circuit resources and can operate at a higher frequency. Therefore, if redundant multiplications in the convolution operation can be avoided and adders used in place of multipliers to accelerate the convolution in hardware, the resource consumption of the hardware circuit can be effectively reduced and the operating frequency of the processor raised, lowering the cost of the hardware chip while improving computing performance.
Disclosure of Invention
The invention provides a computing device that efficiently accelerates the convolution operations in convolutional neural network inference. It avoids redundant multiplications in the convolution operation and replaces multiplier circuits with adder circuits, which consume fewer hardware resources and run faster, thereby reducing integrated-circuit resource consumption and improving operating efficiency while leaving the convolution result unchanged. This provides a basis for a low-cost, low-power convolutional neural network hardware acceleration chip.
To achieve the above purpose, the technical scheme of the invention is realized as follows.
Drawings
The following describes embodiments of the present invention in further detail with reference to the accompanying drawings.
fig. 1 shows a schematic diagram of the principle of convolution operation in a convolutional neural network.
FIG. 2 is a schematic diagram of a convolutional neural network computing device according to the present invention.
Fig. 3 shows a specific example of the coding convolutional neural network model according to the present invention.
FIG. 4 is a flow chart of a convolutional neural network model encoding method according to the present invention.
Fig. 5 is a schematic diagram of the internal structure of the accumulator module according to the present invention.
Fig. 6 is a schematic diagram of the internal structure of the multiply-accumulator module according to the present invention.
Detailed Description
The invention is further described below with reference to a set of embodiments and the accompanying drawings.
The embodiment discloses a convolutional neural network computing device for accelerating the convolution operation, the core computation of a convolutional neural network. Fig. 1 is a schematic diagram of the convolution operation. The input of the convolution operation is called the input neural network feature map 101; an input feature map pixel value 1011 (i.e., an element of the three-dimensional matrix) is denoted F_in(r, c, n), where (r, c, n) is a feature map logical index 1012 of the convolutional neural network, with 0 ≤ r < R, 0 ≤ c < C, 0 ≤ n < N. The convolutional neural network model 102 is typically represented by a four-dimensional matrix; a model weight value 3021 (i.e., an element of the four-dimensional matrix) is denoted W(k, k′, n, m), where (k, k′, n) is called the logical index 3011 of the convolutional neural network model weight, with 0 ≤ k < K, 0 ≤ k′ < K′, 0 ≤ n < N, 0 ≤ m < M. The output of the convolution operation is again a three-dimensional matrix, called the output neural network feature map 103; each output feature map pixel value 1031 (an element of the three-dimensional matrix) is denoted F_out(r′, c′, m), where 0 ≤ r′ < R′, 0 ≤ c′ < C′, 0 ≤ m < M, and is computed by the convolution operation defined by the following expression:
F_out(r′, c′, m) = Σ_{n=0}^{N−1} Σ_{k=0}^{K−1} Σ_{k′=0}^{K′−1} F_in(r′·s + k, c′·s + k′, n) × W(k, k′, n, m)   (Formula 1)
where s is the sliding displacement of the convolution window. In Formula 1, the model weight values sharing the same m are collectively called a model channel 1021 of the neural network model, and m is called the model channel index. Each model channel has N×K×K′ weight values, so computing one output feature map pixel value 1031 requires N×K×K′ multiply-accumulate operations (each consisting of one multiplication and one addition). In the prior art, a large number of multiply-accumulate units execute these operations in parallel to accelerate the convolution; however, multiply-accumulators consume substantial resources and run slowly, which often limits the overall performance of the processor.
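As a concrete reference point, the following minimal NumPy sketch evaluates Formula 1 directly; the function name, the valid-padding layout, and the array shapes are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def conv_baseline(f_in, w, s=1):
    """Direct evaluation of Formula 1. f_in: (R, C, N); w: (K, K', N, M)."""
    R, C, N = f_in.shape
    K, K2, _, M = w.shape
    R2, C2 = (R - K) // s + 1, (C - K2) // s + 1   # output height and width
    f_out = np.zeros((R2, C2, M))
    for r in range(R2):
        for c in range(C2):
            for m in range(M):
                # N*K*K' multiply-accumulate operations per output pixel
                for n in range(N):
                    for k in range(K):
                        for k2 in range(K2):
                            f_out[r, c, m] += f_in[r * s + k, c * s + k2, n] * w[k, k2, n, m]
    return f_out
```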
Fig. 2 shows the structure of the convolutional neural network computing device disclosed in this embodiment. It comprises a neural network model buffer 201, a neural network model decoder 202, an input neural network feature map buffer 203, a feature map storage controller 204, an accumulator array 205, a pipeline buffer 206, a multiply-accumulator array 207, and an output neural network feature map buffer 208. When the device accelerates a convolutional neural network, the neural network model buffer 201 first reads the data of the encoded convolutional neural network model 3 from an external memory; depending on the model size and the buffer's currently available storage, the encoded model 3 can be loaded at once or in several passes. Meanwhile, the input neural network feature map buffer 203 reads and buffers the input feature map pixel value 1011 data required by the current convolution operation from the external memory or from the output neural network feature map buffer 208. After the required encoded model 3 and input pixel value 1011 data are loaded, the neural network model decoder 202 reads the encoded model 3 cached in the neural network model buffer 201, i.e., the convolutional neural network weight logical index table 301 and the model weight information table 302. The decoder 202 performs the decoding operation, extracts the logical indexes 3011 of the model weights, and sends them to the feature map storage controller 204. The feature map storage controller 204 computes the feature map logical index 1012 (i.e., the coordinates of the corresponding feature map pixel in Formula 1) from the weight logical index 3011 (i.e., the coordinates of the specific weight in Formula 1) according to the convolution formula; it then converts the feature map logical index 1012 into a physical address of the storage medium in the input feature map buffer 203 according to a preset feature map data organization (such as, but not limited to, linear contiguous storage, linked storage, indexed storage, or hash storage), fetches the corresponding input pixel value 1011 data at that address, and sends it to the accumulator array 205 for the convolution operation.
Different from the traditional convolutional neural network hardware acceleration circuit, which adopts a single, homogeneous multiply-accumulator array, the invention uses an accumulator array 205 and a multiply-accumulator array 207 that cooperate to accelerate the convolution operation. A pipeline buffer 206 between the accumulator array 205 and the multiply-accumulator array 207 receives and buffers the temporary accumulation results 210 of the accumulator array. Finally, the multiply-accumulator array 207 reads the temporary accumulation results 210 and completes the convolution; the result, i.e., the output feature map pixel value 1031 data, is sent to the output neural network feature map buffer 208 for buffering.
In the prior art, when convolutional neural network inference is performed, the neural network model weights are usually fixed-point quantized, i.e., a Q-bit binary fixed-point number represents each weight value. A Q-bit binary fixed-point number can take at most 2^Q distinct values; for example, a 4-bit unsigned binary fixed-point number can take the values 0, 1, 2, 3, …, 13, 14, 15, i.e., 16 values in total. Thus, when N×K×K′ > 2^Q, several product terms F_in(r′·s + k, c′·s + k′, n) × W(k, k′, n, m) in Formula 1 contain the same weight value W(k, k′, n, m). By the distributive law of multiplication, if the input feature map pixel values F_in(r, c, n) multiplied by the same weight value are first added together and the sum is then multiplied by that weight value, the result of the convolution is unchanged. With this improved convolution method, redundant multiplications are effectively avoided: the number of multiplications drops from N×K×K′ to at most 2^Q, while the number of additions is unchanged, so the total computation of the convolution operation is greatly reduced.
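The saving can be illustrated in a few lines. The sketch below computes one output pixel of one model channel by first accumulating the input pixels that share a weight value and multiplying only once per distinct value; it is a hedged illustration of the distributive-law rewrite (assuming integer fixed-point weights), not the patent's circuit.

```python
def conv_pixel_grouped(f_in, w, r, c, m, s=1):
    """One output pixel F_out(r, c, m) with at most 2**Q multiplications."""
    K, K2, N, _ = w.shape
    partial = {}  # weight value -> temporary accumulation result (210)
    for n in range(N):
        for k in range(K):
            for k2 in range(K2):
                v = int(w[k, k2, n, m])   # assumes fixed-point integer weights
                if v != 0:                # zero-valued weights can simply be skipped
                    partial[v] = partial.get(v, 0) + f_in[r * s + k, c * s + k2, n]
    # one multiplication per distinct weight value, then the final accumulation
    return sum(v * acc for v, acc in partial.items())
```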
To implement a hardware acceleration circuit for this improved convolution method, the convolutional neural network model 102 must first be encoded. That is, for at least one model channel 1021 of the original model 102, all possible weight values of the channel are determined; for each weight value 3021 that may occur, the logical index 3011 of every weight in the channel 1021 equal to that value is obtained and stored in the convolutional neural network weight logical index table 301, the number 3022 of weights equal to that value is counted, and the weight value 3021 with its corresponding count is stored in the convolutional neural network model weight information table 302. Fig. 4 shows an embodiment of this encoding method. After the encoding process starts, all model weight values of the current model channel 1021 (corresponding to a certain m) are obtained (step 402); the weight value 3021 to encode is then set (step 403); the logical indexes 3011 of all model weights having that value are searched for (steps 404 to 408) and stored in the logical index table 301 (step 406); a counter counts the number 3022 of model weights equal to the set value, and the weight value 3021 and its count are stored, in correspondence, in the weight information table 302 (step 409). After all weight values occurring in the current model channel 1021 are encoded, the weights of the next model channel 1021 are loaded and steps 403 to 410 are repeated until all channels of the model are encoded. The logical index table 301 and the weight information table 302 together form the encoded convolutional neural network model 3.
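Under the same assumptions, the encoding flow of Fig. 4 might be modeled as follows; the output is one logical index table (301) and one weight information table (302) for a given model channel, with zero weights omitted.

```python
def encode_channel(w, m):
    """Encode model channel m into (index_table, info_table)."""
    K, K2, N, _ = w.shape
    index_table = []   # logical indexes (n, k, k'), grouped by weight value (301)
    info_table = []    # (weight value, number of weights with that value)  (302)
    values = sorted({int(w[k, k2, n, m])
                     for n in range(N) for k in range(K) for k2 in range(K2)} - {0})
    for v in values:                      # smallest to largest non-zero value
        count = 0
        for n in range(N):                # coordinate priority n -> k -> k'
            for k in range(K):
                for k2 in range(K2):
                    if int(w[k, k2, n, m]) == v:
                        index_table.append((n, k, k2))
                        count += 1
        info_table.append((v, count))
    return index_table, info_table
```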
Fig. 3 shows a specific example of the encoded convolutional neural network model 3. In this example, the model dimensions are K = K′ = 3, N = 2, and M = 1, and the model weight value 3021 (VAL in the figure) is a 3-bit signed binary fixed-point number with the value range [−4, 3]. The weight logical index table 301 stores the logical indexes 3011 of the model weights, i.e., the coordinates (n, k, k′), grouped in the order of the corresponding weight values 3021 in the weight information table 302; for a given weight value 3021, the coordinate priority when searching for weights of that value is n → k → k′. The weight information table 302 stores all non-zero weight values 3021 occurring in the model, from smallest to largest, together with the number 3022 of weights of each value (NUM in the figure). The header of the weight information table 302 also stores the total number of non-zero weights occurring in the model; the multiply-accumulator array 207 uses this value to determine whether the current convolution operation has finished. Since both the multiplications and the additions in the convolution satisfy the commutative law, changing the accumulation order of the input feature map pixel values 1011 and the temporary accumulation results 210, or the multiplication order of the weight values 3021, does not affect the final result. Therefore, when encoding the model, the coordinate priority used to search for weights of a given value, the storage order of the logical indexes 3011 sharing a weight value in the logical index table 301, and the storage order of the weight values 3021 in the weight information table 302 are not limited to the order described in this embodiment; other orderings may be adopted without affecting the correct operation of the invention. The encoding results of different model channels 1021 may be stored in a single pair of tables 301 and 302, or in several pairs.
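A usage sketch matching the dimensions of this example (K = K′ = 3, N = 2, M = 1); the concrete weight values here are invented for illustration, not the values of Fig. 3:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.integers(-4, 4, size=(3, 3, 2, 1))   # 3-bit signed range [-4, 3]
index_table, info_table = encode_channel(w, m=0)
total = sum(num for _, num in info_table)    # header value read by the MAC array
print(info_table, total)
```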
To implement the improved convolution method, the invention adopts a heterogeneous hardware architecture: the accumulator array 205 and the multiply-accumulator array 207 work together. Fig. 5 shows an embodiment of the accumulator array 205, comprising at least one accumulator module 2051; the accumulator module 2051 comprises at least one first adder circuit 2055, a control information register circuit 2052, at least one first counter circuit 2053, and at least one selector circuit 2054. The control information register circuit 2052 stores the control information 209 sent by the neural network model decoder. The first counter circuit reads the control information 209 and counts the input feature map pixel 1011 data received by the accumulator module according to the weight value 3021 and the corresponding weight count 3022 in the control information 209; at the same time, it outputs a control signal that directs the selector to set one input of the first adder either to the adder's own output or to zero, while the other input of the first adder receives the input feature map pixel value 1011 data. This realizes the accumulation of the input feature map values F_in(r, c, n) corresponding to the same weight value in Formula 1 and generates a temporary accumulation result 210. Each accumulator can execute in parallel the convolution operations at the same position of several model channels 1021 (several convolutions with different m and the same s) or at several different positions within the same model channel 1021 (several convolutions with the same m and different s), so that input pixel value 1011 data or weight values 3021 are shared among different convolutions and external memory bandwidth is reduced.
The multiply-accumulator array 207 consists of at least one multiply-accumulator module 2071. Fig. 6 shows an embodiment of the multiply-accumulator module 2071, which includes at least one multiplier circuit 2072, at least one second adder circuit 2073, at least one model coding information register circuit 2074, and at least one second counter circuit 2075. The model coding information register 2074 stores the weight value 3021 and the corresponding weight count 3022 information. The multiplier circuit 2072 sequentially reads the weight values 3021, multiplies each weight value 3021 by the corresponding temporary accumulation result 210, and sends the product to the second adder 2073, which completes the final accumulation of the improved convolution algorithm. The second counter circuit 2075 reads the weight count 3022 information of the convolution currently being executed, counts, and controls the second adder circuit 2073 to accumulate all product results belonging to the current convolution to obtain the output feature map pixel 1031. After the accumulation completes, the second counter circuit 2075 clears the output of the second adder circuit 2073 in preparation for a new convolution operation.
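Functionally, the decoder, accumulator array, and multiply-accumulator array cooperate as in the following behavioural sketch of the datapath (built on the hypothetical tables from `encode_channel` above, not RTL):

```python
def run_channel(f_in, index_table, info_table, r, c, s=1):
    """One output pixel computed from the encoded tables of one model channel."""
    out = 0
    pos = 0
    for value, num in info_table:          # control information (209)
        acc = 0                            # first adder: sum pixels sharing a weight
        for n, k, k2 in index_table[pos:pos + num]:
            # feature map storage controller: weight index -> feature map pixel
            acc += f_in[r * s + k, c * s + k2, n]
        pos += num                         # acc is the temporary accumulation result (210)
        out += value * acc                 # multiplier + second adder
    return out
```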
the neural network feature map cache 203 may be formed by common storage media on an integrated circuit chip, such as Flash, SRAM, or Embedded Memory resource (Embedded Memory) in FPGA, but is not limited to these types; the profile storage controller 204 also has corresponding storage media management (e.g., charging and discharging, refreshing, hibernation, waking up, etc.) and data error correction (parity, ECC, etc.) functions, depending on the type of the specific storage media.
The pipeline buffer 206 buffers the discontinuous temporary accumulation results 210 output by the accumulator array 205: it receives each temporary accumulation result 210 and, when not full, stores it in the order received; when full, it sends a blocking signal that tells the accumulator array 205 to stop. The pipeline buffer 206 also serves read requests from the multiply-accumulator array 207: when not empty, it outputs the temporary accumulation results 210 in write order; when empty, it sends a blocking signal that tells the multiply-accumulator array 207 to stop. In the invention, the number of accumulators in the accumulator array 205 is much larger than the number of multiply-accumulators in the multiply-accumulator array 207 (usually 8 to 16 times), so after buffering, the pipeline buffer 206 can output a continuous, stable data stream, and the multiply-accumulator array 207 executes the convolution at high utilization, effectively saving hardware cost. When the accumulator array 205 and the multiply-accumulator array 207 run at different frequencies, the pipeline buffer 206 also acts as a cross-clock-domain data buffer, balancing the data loads of the two computing paths.
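Its blocking behaviour amounts to a bounded FIFO. The following toy model (class and method names assumed for illustration) captures the two stall conditions:

```python
from collections import deque

class PipelineBuffer:
    """Bounded FIFO between the accumulator and multiply-accumulator arrays."""

    def __init__(self, depth):
        self.depth = depth
        self.fifo = deque()

    def write(self, temp_result):
        if len(self.fifo) >= self.depth:
            return False                  # full: block signal to accumulator array
        self.fifo.append(temp_result)     # stored in receiving order
        return True

    def read(self):
        if not self.fifo:
            return None                   # empty: block signal to MAC array
        return self.fifo.popleft()        # output in write order
```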
In conclusion, the invention effectively avoids redundant multiplications in the core convolution of the convolutional neural network and replaces most multiplier circuits with adder circuits of smaller area and lower power, thereby reducing the cost and power consumption of a neural network acceleration chip or processor and making it better suited to embedded, Internet-of-Things, and edge-computing applications. During model encoding, weights equal to zero can be omitted, so sparse convolution is supported and the hardware device is more flexibly extensible.
It should be understood that the above embodiments are only examples given to illustrate the invention clearly and are not intended to limit it. Other variations or modifications based on the above description will be obvious to those skilled in the art; the embodiments cannot be exhaustive here, and all obvious variations or modifications remain within the scope of the invention.

Claims (9)

1. A convolutional neural network computing device (2), the device comprising:
the neural network model buffer (201), the neural network model decoder (202), the input neural network feature map buffer (203), the feature map storage controller (204), the accumulator array (205), the pipeline buffer (206), the multiply-accumulator array (207) and the output neural network feature map buffer (208); wherein the content of the first and second substances,
the neural network model buffer (201) is used for buffering part or all of the coding convolutional neural network model (3) read by the convolutional neural network computing device (2) from an external memory, and the coding convolutional neural network model (3) comprises at least one convolutional neural network weight logic index table (301) and at least one convolutional neural network model weight information table (302); the convolutional neural network weight logical index table (301) stores at least a logical index (3011) of convolutional neural network model weights; the convolutional neural network model weight information table (302) at least stores information of all possible neural network model weight values (3021) and model weight numbers (3022) in the convolutional neural network model (102);
the input neural network characteristic map buffer (203) is used for reading and buffering input neural network characteristic map pixel values (1011) required by convolution operation from an external memory or an output neural network characteristic map buffer (208);
the neural network model decoder (202) reads a convolutional neural network model weight information table (302) stored in a neural network model buffer (201), accumulates the number (3022) of model weights corresponding to each neural network model weight value (3021), obtains the number of logical indexes (3011) of convolutional neural network model weights required by current convolutional operation, reads the logical indexes (3011) of the convolutional neural network model weights of equal number from a convolutional neural network weight logical index table (301) and generates control information (209), sends the logical indexes (3011) to a feature map storage controller (204), and sends the control information (209) to the accumulator array (205) and a multiplier-accumulator array (207), the control information (209) at least comprises neural network model weight value (3021) information decoded from a convolutional neural network model weight information table (302);
the characteristic diagram storage controller (204) is used for converting the logic index (3011) of the neural network model weight into a characteristic diagram logic index (1012) of the convolutional neural network at a corresponding position during convolution operation according to a neural network convolution operation formula, converting the characteristic diagram logic index (1012) into an actual memory physical address according to a preset characteristic diagram logic index-memory physical address mapping relation, reading an input neural network characteristic image pixel value (1011) corresponding to the physical address in an input neural network characteristic diagram buffer (203), and sending the input neural network characteristic image pixel value to the accumulator array (205);
the accumulator array (205) is composed of at least one accumulator module (2051), the accumulator module (2051) adds input neural network characteristic image pixel values (1011) which need to be multiplied by the same neural network model weight values (3021) in the current convolution operation according to the control information (209) to generate at least one temporary accumulation result (210), and the temporary accumulation result (210) is in one-to-one correspondence with the neural network model weight values (3021) in the convolution neural network model weight information table (302);
the pipeline buffer (206) is used for receiving and buffering temporary accumulation results (210) output by the accumulator array (205); the buffered temporary accumulation result (210) is provided for the multiply-accumulator array (207) to complete convolution operation;
the multiply-accumulator array (207) multiplies at least one temporary accumulation result (210) generated by the accumulator array (205) by a corresponding neural network model weight value (3021) according to the control information (209), adds the multiplication results belonging to the current convolution operation to generate an output characteristic image pixel value (1031), and sends the result to the output neural network characteristic image buffer (208);
the output neural network characteristic diagram buffer (208) is used for receiving the calculation result output by the multiply-accumulator array (207) and buffering and outputting the calculation result to an external memory or an input neural network characteristic diagram buffer (203).
2. The convolutional neural network computing device (2) of claim 1, wherein the logical index (3011) of the model weights stored in the convolutional neural network weight logical index table (301) is a logical index of model weights for each model channel (1021) in a convolutional neural network model (102), and logical indexes (3011) of model weights having the same weight value are stored in consecutive entries of the logical index table (301); the neural network model weight values (3021) stored in the neural network model weight information table (302) are all possible non-zero neural network model weight values in the same model channel (1021) in the convolutional neural network model (102); the neural network model weight information table (302) further stores the number (3022) of model weights corresponding to each of the neural network model weight values (3021), and the weight values (3021) are in one-to-one correspondence with the number (3022) of weight values.
3. The convolutional neural network computing device (2) of claim 1, wherein the control information (209) further comprises: information on the number (3022) of network model weights having the same weight value in each model channel (1021) in the convolutional neural network model (102).
4. The convolutional neural network computing device (2) of claim 2 or 3, wherein the accumulator array (205) is comprised of at least one accumulator module (2051), the accumulator module (2051) comprising at least one first adder circuit (2055), at least one control information temporary memory circuit (2052), at least one first counter circuit (2053), and at least one selector circuit (2054); the control information temporary memory circuit (2052) is used for storing control information (209) sent by the neural network model decoder (202); the first counter (2053) counts input neural network characteristic image pixel values (1011) received by the accumulator module (2051) according to neural network model weight values (3021) and corresponding model weight number (3022) information in the control information (209), and outputs a control signal to control the at least one selector circuit (2054) to select one input end of the first adder circuit (2055) to be equal to either the adder's output or zero; the other input of the first adder circuit (2055) receives the input neural network characteristic image pixel value (1011) and performs the accumulation of the input neural network characteristic image pixel values (1011) multiplied by neural network weights having the same weight value in a convolution operation.
5. The convolutional neural network computing device (2) of claim 4, wherein the multiply-accumulator array (207) is comprised of at least one multiply-accumulator module (2071) comprising at least one multiplier circuit (2072), at least one second adder (2073) circuit, a model-coded information register circuit (2074), and at least one second counter circuit (2075); the model coding information temporary memory (2074) is used for storing the neural network model weight value (3021) information and the corresponding model weight number (3022); a multiplier circuit (2072) reads the weight value (3021) information, multiplies the weight value (3021) by a corresponding temporary accumulation result (210), and transmits the multiplication result to the second adder circuit (2073); a second counter circuit (2075) reads the information of the number (3022) of model weights corresponding to different weight values in the convolution operation currently performed, counts and controls a second adder circuit (2073) to accumulate all the multiplication results belonging to the current convolution operation output by the multiplier circuit (2072).
6. The convolutional neural network computing device (2) of claim 4, wherein the pipeline buffer (206) further comprises: receiving temporary accumulation results (210) output by an accumulator array (205), and storing the temporary accumulation results (210) in the buffer according to a receiving sequence when the buffer is not full; when the buffer is full, the pipeline buffer sends a blocking signal to the accumulator array (205) to inform the accumulator array (205) to stop operation; the pipeline buffer (206) receives a read request from the multiply-accumulator array (207), outputs the temporary accumulation results (210) to the multiply-accumulator array (207) in write order when the buffer is not empty, and issues a block signal to the multiply-accumulator array (207) when the buffer is empty, informing the multiply-accumulator array (207) to stop operation.
7. The convolutional neural network computing device (2) of claim 2 or 3, wherein the multiply-accumulator array (207) is comprised of at least one multiply-accumulator module (2071) comprising at least one multiplier circuit (2072), at least one second adder (2073) circuit, a model-coded information register circuit (2074), and at least one second counter circuit (2075); the model coding information temporary memory (2074) is used for storing the neural network model weight value (3021) information and the corresponding model weight number (3022); a multiplier circuit (2072) reads the weight value (3021) information, multiplies the weight value (3021) by a corresponding temporary accumulation result (210), and transmits the multiplication result to the second adder circuit (2073); a second counter circuit (2075) reads the information of the number (3022) of model weights corresponding to different weight values in the convolution operation currently performed, counts and controls a second adder circuit (2073) to accumulate all the multiplication results belonging to the current convolution operation output by the multiplier circuit (2072).
8. The convolutional neural network computing device (2) of claim 7, wherein the pipeline buffer (206) further comprises: receiving temporary accumulation results (210) output by an accumulator array (205), and storing the temporary accumulation results (210) in the buffer according to a receiving sequence when the buffer is not full; when the buffer is full, the pipeline buffer sends a blocking signal to the accumulator array (205) to inform the accumulator array (205) to stop operation; the pipeline buffer (206) receives a read request from the multiply-accumulator array (207), outputs the temporary accumulation results (210) to the multiply-accumulator array (207) in write order when the buffer is not empty, and issues a block signal to the multiply-accumulator array (207) when the buffer is empty, informing the multiply-accumulator array (207) to stop operation.
9. A convolutional neural network model encoding method applied to the convolutional neural network computing device (2) of claim 1, comprising: for at least one model channel (1021) of the original convolutional neural network model (102),
determining all possible neural network model weight values (3021) present in the model channel (1021);
for each value of all possible neural network model weight values (3021), obtaining a logical index (3011) of each of the model channels with a weight equal to the value, and counting the number of weights (3022) corresponding to the value;
storing the logical index (3011) in a convolutional neural network weight logical index table (301);
storing the weight values (3021) and the corresponding weight numbers (3022) in a convolutional neural network model weight information table (302).
CN201910337943.6A 2019-04-25 2019-04-25 Convolutional neural network computing device and method Active CN110070178B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910337943.6A CN110070178B (en) 2019-04-25 2019-04-25 Convolutional neural network computing device and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910337943.6A CN110070178B (en) 2019-04-25 2019-04-25 Convolutional neural network computing device and method

Publications (2)

Publication Number Publication Date
CN110070178A CN110070178A (en) 2019-07-30
CN110070178B (en) 2021-05-14

Family

ID=67368858

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910337943.6A Active CN110070178B (en) 2019-04-25 2019-04-25 Convolutional neural network computing device and method

Country Status (1)

Country Link
CN (1) CN110070178B (en)

Families Citing this family (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110442323B (en) * 2019-08-09 2023-06-23 复旦大学 Device and method for performing floating point number or fixed point number multiply-add operation
CN111492369B (en) * 2019-09-19 2023-12-12 香港应用科技研究院有限公司 Residual quantization of shift weights in artificial neural networks
CN110717588B (en) 2019-10-15 2022-05-03 阿波罗智能技术(北京)有限公司 Apparatus and method for convolution operation
CN112784952B (en) * 2019-11-04 2024-03-19 珠海格力电器股份有限公司 Convolutional neural network operation system, method and equipment
CN110929860B (en) * 2019-11-07 2020-10-23 深圳云天励飞技术有限公司 Convolution acceleration operation method and device, storage medium and terminal equipment
CN111126569B (en) * 2019-12-18 2022-11-11 中国电子科技集团公司第五十二研究所 Convolutional neural network device supporting pruning sparse compression and calculation method
CN110991623B (en) * 2019-12-20 2024-05-28 中国科学院自动化研究所 Neural network operation system based on digital-analog mixed neuron
CN113379046B (en) * 2020-03-09 2023-07-11 中国科学院深圳先进技术研究院 Acceleration calculation method for convolutional neural network, storage medium and computer equipment
CN111415004B (en) * 2020-03-17 2023-11-03 阿波罗智联(北京)科技有限公司 Method and device for outputting information
CN111461313B (en) * 2020-03-27 2023-03-14 合肥工业大学 Convolution neural network hardware accelerator based on lightweight network and calculation method thereof
CN111652359B (en) * 2020-05-25 2023-05-02 北京大学深圳研究生院 Multiplier array for matrix operations and multiplier array for convolution operations
CN111723924B (en) * 2020-05-28 2022-07-12 西安交通大学 Deep neural network accelerator based on channel sharing
CN113807998A (en) * 2020-06-12 2021-12-17 深圳市中兴微电子技术有限公司 Image processing method, target detection device, machine vision equipment and storage medium
CN111797977B (en) * 2020-07-03 2022-05-20 西安交通大学 Accelerator structure for binarization neural network and circular expansion method
CN112418417B (en) * 2020-09-24 2024-02-27 北京计算机技术及应用研究所 Convolutional neural network acceleration device and method based on SIMD technology
CN112508174B (en) * 2020-12-09 2024-03-22 东南大学 Weight binary neural network-oriented pre-calculation column-by-column convolution calculation unit
CN112966813B (en) * 2021-03-15 2023-04-07 神思电子技术股份有限公司 Convolutional neural network input layer device and working method thereof
CN113191193B (en) * 2021-03-30 2023-08-04 河海大学 Convolution method based on graph and grid
CN113298259B (en) * 2021-06-10 2024-04-26 中国电子科技集团公司第十四研究所 CNN (computer network) reasoning framework design method supporting multi-core parallelism of embedded platform
CN113672286A (en) * 2021-07-30 2021-11-19 科络克电子科技(上海)有限公司 Assembly line evaluator, and moving track analysis processing device, method and equipment
CN113591025B (en) * 2021-08-03 2024-06-14 深圳思谋信息科技有限公司 Feature map processing method and device, convolutional neural network accelerator and medium
CN114723031B (en) * 2022-05-06 2023-10-20 苏州宽温电子科技有限公司 Computing device
CN115481721B (en) * 2022-09-02 2023-06-27 浙江大学 Psum calculation circuit for convolutional neural network
CN116205274B (en) * 2023-04-27 2023-07-21 苏州浪潮智能科技有限公司 Control method, device, equipment and storage medium of impulse neural network
CN116402106B (en) * 2023-06-07 2023-10-24 深圳市九天睿芯科技有限公司 Neural network acceleration method, neural network accelerator, chip and electronic equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105681628A (en) * 2016-01-05 2016-06-15 西安交通大学 Convolution network arithmetic unit, reconfigurable convolution neural network processor and image de-noising method of reconfigurable convolution neural network processor
CN107871163A (en) * 2016-09-28 2018-04-03 爱思开海力士有限公司 Operation device and method for convolutional neural networks
CN107909148A (en) * 2017-12-12 2018-04-13 北京地平线信息技术有限公司 For performing the device of the convolution algorithm in convolutional neural networks
CN108885714A (en) * 2017-11-30 2018-11-23 深圳市大疆创新科技有限公司 The control method of computing unit, computing system and computing unit
CN109002883A (en) * 2018-07-04 2018-12-14 中国科学院计算技术研究所 Convolutional neural networks model computing device and calculation method
CN109472356A (en) * 2018-12-29 2019-03-15 南京宁麒智能计算芯片研究院有限公司 A kind of accelerator and method of restructural neural network algorithm
CN109635944A (en) * 2018-12-24 2019-04-16 西安交通大学 A kind of sparse convolution neural network accelerator and implementation method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10089577B2 (en) * 2016-08-05 2018-10-02 Xilinx, Inc. Binary neural networks on progammable integrated circuits
US10838910B2 (en) * 2017-04-27 2020-11-17 Falcon Computing Systems and methods for systolic array design from a high-level program
CN109409512B (en) * 2018-09-27 2021-02-19 西安交通大学 Flexibly configurable neural network computing unit, computing array and construction method thereof

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105681628A (en) * 2016-01-05 2016-06-15 西安交通大学 Convolution network arithmetic unit, reconfigurable convolution neural network processor and image de-noising method of reconfigurable convolution neural network processor
CN107871163A (en) * 2016-09-28 2018-04-03 爱思开海力士有限公司 Operation device and method for convolutional neural networks
CN108885714A (en) * 2017-11-30 2018-11-23 深圳市大疆创新科技有限公司 The control method of computing unit, computing system and computing unit
CN107909148A (en) * 2017-12-12 2018-04-13 北京地平线信息技术有限公司 For performing the device of the convolution algorithm in convolutional neural networks
CN109002883A (en) * 2018-07-04 2018-12-14 中国科学院计算技术研究所 Convolutional neural networks model computing device and calculation method
CN109635944A (en) * 2018-12-24 2019-04-16 西安交通大学 A kind of sparse convolution neural network accelerator and implementation method
CN109472356A (en) * 2018-12-29 2019-03-15 南京宁麒智能计算芯片研究院有限公司 A kind of accelerator and method of restructural neural network algorithm

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Ke Xu et al., "A scalable FPGA accelerator for convolutional neural networks", 12th Conference, ACA 2018, Yingkou, China, 2018-08-11, pp. 3-14. *
C. Zhang et al., "Optimizing FPGA-based accelerator design for deep convolutional neural networks", FPGA '15: Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015-02, pp. 161-170. *
Yang Cheng, "基于FPGA的人工神经网络的研究与实现 (Research and Implementation of FPGA-Based Artificial Neural Networks)", China Master's Theses Full-text Database, Information Science and Technology, No. 3, 2017-03-15, pp. I140-182. *
Yu Zijian et al., "基于FPGA的卷积神经网络加速器 (An FPGA-Based Convolutional Neural Network Accelerator)", Computer Engineering (《计算机工程》), 2016-04-20, pp. 109-114, 119. *

Also Published As

Publication number Publication date
CN110070178A (en) 2019-07-30

Similar Documents

Publication Publication Date Title
CN110070178B (en) Convolutional neural network computing device and method
CN109063825B (en) Convolutional neural network accelerator
CN107256424B (en) Three-value weight convolution network processing system and method
US10877733B2 (en) Segment divider, segment division operation method, and electronic device
CN111832719A (en) Fixed point quantization convolution neural network accelerator calculation circuit
CN111240746B (en) Floating point data inverse quantization and quantization method and equipment
CN111507465B (en) Configurable convolutional neural network processor circuit
US11973519B2 (en) Normalized probability determination for character encoding
WO2021057085A1 (en) Hybrid precision storage-based depth neural network accelerator
CN112257844B (en) Convolutional neural network accelerator based on mixed precision configuration and implementation method thereof
CN111694643B (en) Task scheduling execution system and method for graph neural network application
CN109993293B (en) Deep learning accelerator suitable for heap hourglass network
CN109389208B (en) Data quantization device and quantization method
CN111738427B (en) Operation circuit of neural network
CN112862091B (en) Resource multiplexing type neural network hardware accelerating circuit based on quick convolution
US11561795B2 (en) Accumulating data values and storing in first and second storage devices
CN115186802A (en) Block sparse method and device based on convolutional neural network and processing unit
CN115018062A (en) Convolutional neural network accelerator based on FPGA
CN112232499A (en) Convolutional neural network accelerator
CN116227599A (en) Inference model optimization method and device, electronic equipment and storage medium
CN112085154A (en) Asymmetric quantization for compression and inference acceleration of neural networks
Shu et al. High energy efficiency FPGA-based accelerator for convolutional neural networks using weight combination
US20210044303A1 (en) Neural network acceleration device and method
CN109389209B (en) Processing apparatus and processing method
Zhan et al. Field programmable gate array‐based all‐layer accelerator with quantization neural networks for sustainable cyber‐physical systems

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant