CN110070178B - Convolutional neural network computing device and method

Convolutional neural network computing device and method

Info

Publication number
CN110070178B
CN110070178B (application CN201910337943.6A)
Authority
CN
China
Prior art keywords
neural network
network model
model
weight
buffer
Prior art date
Legal status
Active
Application number
CN201910337943.6A
Other languages
Chinese (zh)
Other versions
CN110070178A (en)
Inventor
王东 (Wang Dong)
Current Assignee
Beijing Jiaotong University
Original Assignee
Beijing Jiaotong University
Priority date
Filing date
Publication date
Application filed by Beijing Jiaotong University
Priority to CN201910337943.6A
Publication of CN110070178A
Application granted
Publication of CN110070178B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a convolutional neural network computing device and method. The device comprises a neural network model buffer for buffering the encoded convolutional neural network model; a neural network model decoder for reading the encoded model and decoding it to obtain the logical indexes of the model weights and the control information; an input neural network feature map buffer for buffering input feature map pixel values; a feature map storage controller for reading feature map pixel value data according to the feature map logical indexes; an accumulator array for adding together the input feature map pixel values that are multiplied by the same neural network model weight value and generating temporary accumulation results; a pipeline buffer for buffering the temporary accumulation results; a multiply-accumulator array for multiplying each temporary accumulation result by the corresponding neural network model weight value, adding the products belonging to the current convolution operation, and generating output feature map pixel values; and an output neural network feature map buffer for buffering the output feature map pixel values.

Description

Convolutional neural network computing device and method
Technical Field
The invention relates to the fields of integrated circuits and artificial intelligence, in particular to a device for accelerating convolutional neural network inference and a corresponding neural network model encoding method.
Background
With the rise of deep learning, convolutional neural networks are widely applied in fields such as computer vision, image processing, speech recognition, robotics, and autonomous vehicles. As the core algorithm of deep learning, the convolutional neural network offers high inference accuracy and strong fault tolerance, but it also brings a huge computation load and heavy consumption of system storage resources. The convolution operations in a convolutional neural network usually consume more than 90% of the algorithm's running time; therefore, to compute a convolutional neural network in real time, a dedicated hardware circuit is needed to accelerate the convolution operations. The core of the convolution operation is the multiply-accumulate operation, so existing convolutional neural network hardware accelerators adopt a single, homogeneous multiply-accumulate array composed of thousands of identical multiply-accumulate units, each of which requires a multiplier circuit. Accelerators of this structure consume a large amount of hardware resources to implement thousands of multipliers, which leads to high integrated-circuit cost and large circuit power consumption. Compared with a multiplier, an adder consumes fewer circuit resources and can operate at a higher frequency. Therefore, if redundant multiplications in the convolution operation can be avoided and adders used in place of multipliers to accelerate the convolution in hardware, the resource consumption of the hardware circuit can be effectively reduced and the operating frequency of the processor raised, lowering the cost of the hardware chip while improving computing performance.
Disclosure of Invention
The invention provides a computing device that efficiently accelerates the convolution operations in convolutional neural network inference. It avoids redundant multiplications in the convolution operation and replaces multiplier circuits with adder circuits, which consume fewer hardware resources and run faster, thereby reducing integrated-circuit resource consumption and improving operating efficiency while leaving the convolution result unchanged. This provides a basis for a low-cost, low-power convolutional neural network hardware acceleration chip.
To achieve the above purpose, the technical scheme of the invention is realized as follows.
Drawings
The following describes embodiments of the present invention in further detail with reference to the accompanying drawings.
fig. 1 shows a schematic diagram of the principle of convolution operation in a convolutional neural network.
FIG. 2 is a schematic diagram of a convolutional neural network computing device according to the present invention.
Fig. 3 shows a specific example of the coding convolutional neural network model according to the present invention.
FIG. 4 is a flow chart of a convolutional neural network model encoding method according to the present invention.
Fig. 5 is a schematic diagram of the internal structure of the accumulator module according to the present invention.
Fig. 6 is a schematic diagram of the internal structure of the multiply-accumulator module according to the present invention.
Detailed Description
The invention is further described below with reference to a set of embodiments and the accompanying drawings.
The embodiment discloses a convolutional neural network computing device for accelerating the convolution operation, the core computation of a convolutional neural network. Fig. 1 is a schematic diagram of the convolution operation. The input of the convolution operation is called the input neural network feature map 101; an input feature map pixel value 1011 (i.e., an element of the three-dimensional matrix) is denoted F_in(r, c, n), where (r, c, n) is a feature map logical index 1012 of the convolutional neural network, with 0 ≤ r < R, 0 ≤ c < C, 0 ≤ n < N. The convolutional neural network model 102 is typically represented by a four-dimensional matrix; a model weight value 3021 (i.e., an element of the four-dimensional matrix) is denoted W(k, k′, n, m), where (k, k′, n) is called the logical index 3011 of the convolutional neural network model weight, with 0 ≤ k < K, 0 ≤ k′ < K′, 0 ≤ n < N, 0 ≤ m < M. The output of the convolution operation is again a three-dimensional matrix, called the output neural network feature map 103; each output feature map pixel value 1031 (an element of the three-dimensional matrix) is denoted F_out(r′, c′, m), where 0 ≤ r′ < R′, 0 ≤ c′ < C′, 0 ≤ m < M, and is computed by the convolution operation defined by the following expression:
F_out(r′, c′, m) = Σ_{n=0}^{N−1} Σ_{k=0}^{K−1} Σ_{k′=0}^{K′−1} F_in(r′·s + k, c′·s + k′, n) × W(k, k′, n, m)   (Formula 1)
where s is the sliding displacement of the convolution window. In Formula 1, the model weight values sharing the same m are collectively called a model channel 1021 of the neural network model, and m is called the model channel index. Each model channel has N×K×K′ weight values, so computing one output feature map pixel value 1031 requires N×K×K′ multiply-accumulate operations (each consisting of one multiplication and one addition). In the prior art, a large number of multiply-accumulate units execute these operations in parallel to accelerate the convolution; however, multiply-accumulators consume substantial resources and run slowly, which often limits the overall performance of the processor.
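As a concrete reference point, the following minimal NumPy sketch evaluates Formula 1 directly; the function name, the valid-padding layout, and the array shapes are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def conv_baseline(f_in, w, s=1):
    """Direct evaluation of Formula 1. f_in: (R, C, N); w: (K, K', N, M)."""
    R, C, N = f_in.shape
    K, K2, _, M = w.shape
    R2, C2 = (R - K) // s + 1, (C - K2) // s + 1   # output height and width
    f_out = np.zeros((R2, C2, M))
    for r in range(R2):
        for c in range(C2):
            for m in range(M):
                # N*K*K' multiply-accumulate operations per output pixel
                for n in range(N):
                    for k in range(K):
                        for k2 in range(K2):
                            f_out[r, c, m] += f_in[r * s + k, c * s + k2, n] * w[k, k2, n, m]
    return f_out
```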
Fig. 2 shows the structure of the convolutional neural network computing device disclosed in this embodiment. It comprises a neural network model buffer 201, a neural network model decoder 202, an input neural network feature map buffer 203, a feature map storage controller 204, an accumulator array 205, a pipeline buffer 206, a multiply-accumulator array 207, and an output neural network feature map buffer 208. When the device accelerates a convolutional neural network, the neural network model buffer 201 first reads the data of the encoded convolutional neural network model 3 from an external memory; depending on the model size and the buffer's currently available storage, the encoded model 3 can be loaded at once or in several passes. Meanwhile, the input neural network feature map buffer 203 reads and buffers the input feature map pixel value 1011 data required by the current convolution operation from the external memory or from the output neural network feature map buffer 208. After the required encoded model 3 and input pixel value 1011 data are loaded, the neural network model decoder 202 reads the encoded model 3 cached in the neural network model buffer 201, i.e., the convolutional neural network weight logical index table 301 and the model weight information table 302. The decoder 202 performs the decoding operation, extracts the logical indexes 3011 of the model weights, and sends them to the feature map storage controller 204. The feature map storage controller 204 computes the feature map logical index 1012 (i.e., the coordinates of the corresponding feature map pixel in Formula 1) from the weight logical index 3011 (i.e., the coordinates of the specific weight in Formula 1) according to the convolution formula; it then converts the feature map logical index 1012 into a physical address of the storage medium in the input feature map buffer 203 according to a preset feature map data organization (such as, but not limited to, linear contiguous storage, linked storage, indexed storage, or hash storage), fetches the corresponding input pixel value 1011 data at that address, and sends it to the accumulator array 205 for the convolution operation.
Different from the traditional convolutional neural network hardware acceleration circuit, which adopts a single, homogeneous multiply-accumulator array, the invention uses an accumulator array 205 and a multiply-accumulator array 207 that cooperate to accelerate the convolution operation. A pipeline buffer 206 between the accumulator array 205 and the multiply-accumulator array 207 receives and buffers the temporary accumulation results 210 of the accumulator array. Finally, the multiply-accumulator array 207 reads the temporary accumulation results 210 and completes the convolution; the result, i.e., the output feature map pixel value 1031 data, is sent to the output neural network feature map buffer 208 for buffering.
In the prior art, when convolutional neural network inference is performed, the neural network model weights are usually fixed-point quantized, i.e., a Q-bit binary fixed-point number represents each weight value. A Q-bit binary fixed-point number can take at most 2^Q distinct values; for example, a 4-bit unsigned binary fixed-point number can take the values 0, 1, 2, 3, …, 13, 14, 15, i.e., 16 values in total. Thus, when N×K×K′ > 2^Q, several product terms F_in(r′·s + k, c′·s + k′, n) × W(k, k′, n, m) in Formula 1 contain the same weight value W(k, k′, n, m). By the distributive law of multiplication, if the input feature map pixel values F_in(r, c, n) multiplied by the same weight value are first added together and the sum is then multiplied by that weight value, the result of the convolution is unchanged. With this improved convolution method, redundant multiplications are effectively avoided: the number of multiplications drops from N×K×K′ to at most 2^Q, while the number of additions is unchanged, so the total computation of the convolution operation is greatly reduced.
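The saving can be illustrated in a few lines. The sketch below computes one output pixel of one model channel by first accumulating the input pixels that share a weight value and multiplying only once per distinct value; it is a hedged illustration of the distributive-law rewrite (assuming integer fixed-point weights), not the patent's circuit.

```python
def conv_pixel_grouped(f_in, w, r, c, m, s=1):
    """One output pixel F_out(r, c, m) with at most 2**Q multiplications."""
    K, K2, N, _ = w.shape
    partial = {}  # weight value -> temporary accumulation result (210)
    for n in range(N):
        for k in range(K):
            for k2 in range(K2):
                v = int(w[k, k2, n, m])   # assumes fixed-point integer weights
                if v != 0:                # zero-valued weights can simply be skipped
                    partial[v] = partial.get(v, 0) + f_in[r * s + k, c * s + k2, n]
    # one multiplication per distinct weight value, then the final accumulation
    return sum(v * acc for v, acc in partial.items())
```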
To implement a hardware acceleration circuit for this improved convolution method, the convolutional neural network model 102 must first be encoded. That is, for at least one model channel 1021 of the original model 102, all possible weight values of the channel are determined; for each weight value 3021 that may occur, the logical index 3011 of every weight in the channel 1021 equal to that value is obtained and stored in the convolutional neural network weight logical index table 301, the number 3022 of weights equal to that value is counted, and the weight value 3021 with its corresponding count is stored in the convolutional neural network model weight information table 302. Fig. 4 shows an embodiment of this encoding method. After the encoding process starts, all model weight values of the current model channel 1021 (corresponding to a certain m) are obtained (step 402); the weight value 3021 to encode is then set (step 403); the logical indexes 3011 of all model weights having that value are searched for (steps 404 to 408) and stored in the logical index table 301 (step 406); a counter counts the number 3022 of model weights equal to the set value, and the weight value 3021 and its count are stored, in correspondence, in the weight information table 302 (step 409). After all weight values occurring in the current model channel 1021 are encoded, the weights of the next model channel 1021 are loaded and steps 403 to 410 are repeated until all channels of the model are encoded. The logical index table 301 and the weight information table 302 together form the encoded convolutional neural network model 3.
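Under the same assumptions, the encoding flow of Fig. 4 might be modeled as follows; the output is one logical index table (301) and one weight information table (302) for a given model channel, with zero weights omitted.

```python
def encode_channel(w, m):
    """Encode model channel m into (index_table, info_table)."""
    K, K2, N, _ = w.shape
    index_table = []   # logical indexes (n, k, k'), grouped by weight value (301)
    info_table = []    # (weight value, number of weights with that value)  (302)
    values = sorted({int(w[k, k2, n, m])
                     for n in range(N) for k in range(K) for k2 in range(K2)} - {0})
    for v in values:                      # smallest to largest non-zero value
        count = 0
        for n in range(N):                # coordinate priority n -> k -> k'
            for k in range(K):
                for k2 in range(K2):
                    if int(w[k, k2, n, m]) == v:
                        index_table.append((n, k, k2))
                        count += 1
        info_table.append((v, count))
    return index_table, info_table
```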
Fig. 3 shows a specific example of the encoded convolutional neural network model 3. In this example, the model dimensions are K = K′ = 3, N = 2, and M = 1, and the model weight value 3021 (VAL in the figure) is a 3-bit signed binary fixed-point number with the value range [−4, 3]. The weight logical index table 301 stores the logical indexes 3011 of the model weights, i.e., the coordinates (n, k, k′), grouped in the order of the corresponding weight values 3021 in the weight information table 302; for a given weight value 3021, the coordinate priority when searching for weights of that value is n → k → k′. The weight information table 302 stores all non-zero weight values 3021 occurring in the model, from smallest to largest, together with the number 3022 of weights of each value (NUM in the figure). The header of the weight information table 302 also stores the total number of non-zero weights occurring in the model; the multiply-accumulator array 207 uses this value to determine whether the current convolution operation has finished. Since both the multiplications and the additions in the convolution satisfy the commutative law, changing the accumulation order of the input feature map pixel values 1011 and the temporary accumulation results 210, or the multiplication order of the weight values 3021, does not affect the final result. Therefore, when encoding the model, the coordinate priority used to search for weights of a given value, the storage order of the logical indexes 3011 sharing a weight value in the logical index table 301, and the storage order of the weight values 3021 in the weight information table 302 are not limited to the order described in this embodiment; other orderings may be adopted without affecting the correct operation of the invention. The encoding results of different model channels 1021 may be stored in a single pair of tables 301 and 302, or in several pairs.
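A usage sketch matching the dimensions of this example (K = K′ = 3, N = 2, M = 1); the concrete weight values here are invented for illustration, not the values of Fig. 3:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.integers(-4, 4, size=(3, 3, 2, 1))   # 3-bit signed range [-4, 3]
index_table, info_table = encode_channel(w, m=0)
total = sum(num for _, num in info_table)    # header value read by the MAC array
print(info_table, total)
```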
To implement the improved convolution method, the invention adopts a heterogeneous hardware architecture: the accumulator array 205 and the multiply-accumulator array 207 work together. Fig. 5 shows an embodiment of the accumulator array 205, comprising at least one accumulator module 2051; the accumulator module 2051 comprises at least one first adder circuit 2055, a control information register circuit 2052, at least one first counter circuit 2053, and at least one selector circuit 2054. The control information register circuit 2052 stores the control information 209 sent by the neural network model decoder. The first counter circuit reads the control information 209 and counts the input feature map pixel 1011 data received by the accumulator module according to the weight value 3021 and the corresponding weight count 3022 in the control information 209; at the same time, it outputs a control signal that directs the selector to set one input of the first adder either to the adder's own output or to zero, while the other input of the first adder receives the input feature map pixel value 1011 data. This realizes the accumulation of the input feature map values F_in(r, c, n) corresponding to the same weight value in Formula 1 and generates a temporary accumulation result 210. Each accumulator can execute in parallel the convolution operations at the same position of several model channels 1021 (several convolutions with different m and the same s) or at several different positions within the same model channel 1021 (several convolutions with the same m and different s), so that input pixel value 1011 data or weight values 3021 are shared among different convolutions and external memory bandwidth is reduced.
The multiply-accumulator array 207 consists of at least one multiply-accumulator module 2071. Fig. 6 shows an embodiment of the multiply-accumulator module 2071, which includes at least one multiplier circuit 2072, at least one second adder circuit 2073, at least one model coding information register circuit 2074, and at least one second counter circuit 2075. The model coding information register 2074 stores the weight value 3021 and the corresponding weight count 3022 information. The multiplier circuit 2072 sequentially reads the weight values 3021, multiplies each weight value 3021 by the corresponding temporary accumulation result 210, and sends the product to the second adder 2073, which completes the final accumulation of the improved convolution algorithm. The second counter circuit 2075 reads the weight count 3022 information of the convolution currently being executed, counts, and controls the second adder circuit 2073 to accumulate all product results belonging to the current convolution to obtain the output feature map pixel 1031. After the accumulation completes, the second counter circuit 2075 clears the output of the second adder circuit 2073 in preparation for a new convolution operation.
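Functionally, the decoder, accumulator array, and multiply-accumulator array cooperate as in the following behavioural sketch of the datapath (built on the hypothetical tables from `encode_channel` above, not RTL):

```python
def run_channel(f_in, index_table, info_table, r, c, s=1):
    """One output pixel computed from the encoded tables of one model channel."""
    out = 0
    pos = 0
    for value, num in info_table:          # control information (209)
        acc = 0                            # first adder: sum pixels sharing a weight
        for n, k, k2 in index_table[pos:pos + num]:
            # feature map storage controller: weight index -> feature map pixel
            acc += f_in[r * s + k, c * s + k2, n]
        pos += num                         # acc is the temporary accumulation result (210)
        out += value * acc                 # multiplier + second adder
    return out
```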
the neural network feature map cache 203 may be formed by common storage media on an integrated circuit chip, such as Flash, SRAM, or Embedded Memory resource (Embedded Memory) in FPGA, but is not limited to these types; the profile storage controller 204 also has corresponding storage media management (e.g., charging and discharging, refreshing, hibernation, waking up, etc.) and data error correction (parity, ECC, etc.) functions, depending on the type of the specific storage media.
The pipeline buffer 206 buffers the discontinuous temporary accumulation results 210 output by the accumulator array 205: it receives each temporary accumulation result 210 and, when not full, stores it in the order received; when full, it sends a blocking signal that tells the accumulator array 205 to stop. The pipeline buffer 206 also serves read requests from the multiply-accumulator array 207: when not empty, it outputs the temporary accumulation results 210 in write order; when empty, it sends a blocking signal that tells the multiply-accumulator array 207 to stop. In the invention, the number of accumulators in the accumulator array 205 is much larger than the number of multiply-accumulators in the multiply-accumulator array 207 (usually 8 to 16 times), so after buffering, the pipeline buffer 206 can output a continuous, stable data stream, and the multiply-accumulator array 207 executes the convolution at high utilization, effectively saving hardware cost. When the accumulator array 205 and the multiply-accumulator array 207 run at different frequencies, the pipeline buffer 206 also acts as a cross-clock-domain data buffer, balancing the data loads of the two computing paths.
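Its blocking behaviour amounts to a bounded FIFO. The following toy model (class and method names assumed for illustration) captures the two stall conditions:

```python
from collections import deque

class PipelineBuffer:
    """Bounded FIFO between the accumulator and multiply-accumulator arrays."""

    def __init__(self, depth):
        self.depth = depth
        self.fifo = deque()

    def write(self, temp_result):
        if len(self.fifo) >= self.depth:
            return False                  # full: block signal to accumulator array
        self.fifo.append(temp_result)     # stored in receiving order
        return True

    def read(self):
        if not self.fifo:
            return None                   # empty: block signal to MAC array
        return self.fifo.popleft()        # output in write order
```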
In conclusion, the invention effectively avoids redundant multiplications in the core convolution of the convolutional neural network and replaces most multiplier circuits with adder circuits of smaller area and lower power, thereby reducing the cost and power consumption of a neural network acceleration chip or processor and making it better suited to embedded, Internet-of-Things, and edge-computing applications. During model encoding, weights equal to zero can be omitted, so sparse convolution is supported and the hardware device is more flexibly extensible.
It should be understood that the above embodiments are only examples given to illustrate the invention clearly and are not intended to limit it. Other variations or modifications based on the above description will be obvious to those skilled in the art; the embodiments cannot be exhaustive here, and all obvious variations or modifications remain within the scope of the invention.

Claims (9)

1. A convolutional neural network computing device (2), the device comprising:
the neural network model buffer (201), the neural network model decoder (202), the input neural network feature map buffer (203), the feature map storage controller (204), the accumulator array (205), the pipeline buffer (206), the multiply-accumulator array (207) and the output neural network feature map buffer (208); wherein the content of the first and second substances,
the neural network model buffer (201) is used for buffering part or all of the coding convolutional neural network model (3) read by the convolutional neural network computing device (2) from an external memory, and the coding convolutional neural network model (3) comprises at least one convolutional neural network weight logic index table (301) and at least one convolutional neural network model weight information table (302); the convolutional neural network weight logical index table (301) stores at least a logical index (3011) of convolutional neural network model weights; the convolutional neural network model weight information table (302) at least stores information of all possible neural network model weight values (3021) and model weight numbers (3022) in the convolutional neural network model (102);
the input neural network characteristic map buffer (203) is used for reading and buffering input neural network characteristic map pixel values (1011) required by convolution operation from an external memory or an output neural network characteristic map buffer (208);
the neural network model decoder (202) reads a convolutional neural network model weight information table (302) stored in a neural network model buffer (201), accumulates the number (3022) of model weights corresponding to each neural network model weight value (3021), obtains the number of logical indexes (3011) of convolutional neural network model weights required by current convolutional operation, reads the logical indexes (3011) of the convolutional neural network model weights of equal number from a convolutional neural network weight logical index table (301) and generates control information (209), sends the logical indexes (3011) to a feature map storage controller (204), and sends the control information (209) to the accumulator array (205) and a multiplier-accumulator array (207), the control information (209) at least comprises neural network model weight value (3021) information decoded from a convolutional neural network model weight information table (302);
the characteristic diagram storage controller (204) is used for converting the logic index (3011) of the neural network model weight into a characteristic diagram logic index (1012) of the convolutional neural network at a corresponding position during convolution operation according to a neural network convolution operation formula, converting the characteristic diagram logic index (1012) into an actual memory physical address according to a preset characteristic diagram logic index-memory physical address mapping relation, reading an input neural network characteristic image pixel value (1011) corresponding to the physical address in an input neural network characteristic diagram buffer (203), and sending the input neural network characteristic image pixel value to the accumulator array (205);
the accumulator array (205) is composed of at least one accumulator module (2051), the accumulator module (2051) adds input neural network characteristic image pixel values (1011) which need to be multiplied by the same neural network model weight values (3021) in the current convolution operation according to the control information (209) to generate at least one temporary accumulation result (210), and the temporary accumulation result (210) is in one-to-one correspondence with the neural network model weight values (3021) in the convolution neural network model weight information table (302);
the pipeline buffer (206) is used for receiving and buffering temporary accumulation results (210) output by the accumulator array (205); the buffered temporary accumulation result (210) is provided for the multiply-accumulator array (207) to complete convolution operation;
the multiply-accumulator array (207) multiplies at least one temporary accumulation result (210) generated by the accumulator array (205) by a corresponding neural network model weight value (3021) according to the control information (209), adds the multiplication results belonging to the current convolution operation to generate an output characteristic image pixel value (1031), and sends the result to the output neural network characteristic image buffer (208);
the output neural network characteristic diagram buffer (208) is used for receiving the calculation result output by the multiply-accumulator array (207) and buffering and outputting the calculation result to an external memory or an input neural network characteristic diagram buffer (203).
2. The convolutional neural network computing device (2) of claim 1, wherein the logical index (3011) of the model weights stored in the convolutional neural network weight logical index table (301) is a logical index of model weights for each model channel (1021) in a convolutional neural network model (102), and logical indexes (3011) of model weights having the same weight value are stored in consecutive entries of the logical index table (301); the neural network model weight values (3021) stored in the neural network model weight information table (302) are all possible non-zero neural network model weight values in the same model channel (1021) in the convolutional neural network model (102); the neural network model weight information table (302) further stores the number (3022) of model weights corresponding to each of the neural network model weight values (3021), and the weight values (3021) are in one-to-one correspondence with the number (3022) of weight values.
3. The convolutional neural network computing device (2) of claim 1, wherein the control information (209) further comprises: information on the number (3022) of network model weights having the same weight value in each model channel (1021) in the convolutional neural network model (102).
4. The convolutional neural network computing device (2) of claim 2 or 3, wherein the accumulator array (205) is comprised of at least one accumulator module (2051), the accumulator module (2051) comprising at least one first adder circuit (2055), at least one control information temporary memory circuit (2052), at least one first counter circuit (2053), and at least one selector circuit (2054); the control information temporary memory circuit (2052) is used for storing control information (209) sent by the neural network model decoder (202); the first counter (2053) counts input neural network characteristic image pixel values (1011) received by the accumulator module (2051) according to neural network model weight values (3021) and corresponding model weight number (3022) information in the control information (209), and outputs a control signal to control the at least one selector circuit (2054) to select one input end of the first adder circuit (2055) to be equal to either the adder's output or zero; the other input of the first adder circuit (2055) receives the input neural network characteristic image pixel value (1011) and performs the accumulation of the input neural network characteristic image pixel values (1011) multiplied by neural network weights having the same weight value in a convolution operation.
5. The convolutional neural network computing device (2) of claim 4, wherein the multiply-accumulator array (207) is comprised of at least one multiply-accumulator module (2071) comprising at least one multiplier circuit (2072), at least one second adder (2073) circuit, a model-coded information register circuit (2074), and at least one second counter circuit (2075); the model coding information temporary memory (2074) is used for storing the neural network model weight value (3021) information and the corresponding model weight number (3022); a multiplier circuit (2072) reads the weight value (3021) information, multiplies the weight value (3021) by a corresponding temporary accumulation result (210), and transmits the multiplication result to the second adder circuit (2073); a second counter circuit (2075) reads the information of the number (3022) of model weights corresponding to different weight values in the convolution operation currently performed, counts and controls a second adder circuit (2073) to accumulate all the multiplication results belonging to the current convolution operation output by the multiplier circuit (2072).
6. The convolutional neural network computing device (2) of claim 4, wherein the pipeline buffer (206) further comprises: receiving temporary accumulation results (210) output by an accumulator array (205), and storing the temporary accumulation results (210) in the buffer according to a receiving sequence when the buffer is not full; when the buffer is full, the pipeline buffer sends a blocking signal to the accumulator array (205) to inform the accumulator array (205) to stop operation; the pipeline buffer (206) receives a read request from the multiply-accumulator array (207), outputs the temporary accumulation results (210) to the multiply-accumulator array (207) in write order when the buffer is not empty, and issues a block signal to the multiply-accumulator array (207) when the buffer is empty, informing the multiply-accumulator array (207) to stop operation.
7. The convolutional neural network computing device (2) of claim 2 or 3, wherein the multiply-accumulator array (207) is comprised of at least one multiply-accumulator module (2071) comprising at least one multiplier circuit (2072), at least one second adder (2073) circuit, a model-coded information register circuit (2074), and at least one second counter circuit (2075); the model coding information temporary memory (2074) is used for storing the neural network model weight value (3021) information and the corresponding model weight number (3022); a multiplier circuit (2072) reads the weight value (3021) information, multiplies the weight value (3021) by a corresponding temporary accumulation result (210), and transmits the multiplication result to the second adder circuit (2073); a second counter circuit (2075) reads the information of the number (3022) of model weights corresponding to different weight values in the convolution operation currently performed, counts and controls a second adder circuit (2073) to accumulate all the multiplication results belonging to the current convolution operation output by the multiplier circuit (2072).
8. The convolutional neural network computing device (2) of claim 7, wherein the pipeline buffer (206) further comprises: receiving temporary accumulation results (210) output by an accumulator array (205), and storing the temporary accumulation results (210) in the buffer according to a receiving sequence when the buffer is not full; when the buffer is full, the pipeline buffer sends a blocking signal to the accumulator array (205) to inform the accumulator array (205) to stop operation; the pipeline buffer (206) receives a read request from the multiply-accumulator array (207), outputs the temporary accumulation results (210) to the multiply-accumulator array (207) in write order when the buffer is not empty, and issues a block signal to the multiply-accumulator array (207) when the buffer is empty, informing the multiply-accumulator array (207) to stop operation.
9. A convolutional neural network model encoding method applied to the convolutional neural network computing device (2) of claim 1, comprising: for at least one model channel (1021) of the original convolutional neural network model (102),
determining all possible neural network model weight values (3021) present in the model channel (1021);
for each value of all possible neural network model weight values (3021), obtaining a logical index (3011) of each of the model channels with a weight equal to the value, and counting the number of weights (3022) corresponding to the value;
storing the logical index (3011) in a convolutional neural network weight logical index table (301);
storing the weight values (3021) and the corresponding weight numbers (3022) in a convolutional neural network model weight information table (302).
CN201910337943.6A 2019-04-25 2019-04-25 Convolutional neural network computing device and method Active CN110070178B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910337943.6A CN110070178B (en) 2019-04-25 2019-04-25 Convolutional neural network computing device and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910337943.6A CN110070178B (en) 2019-04-25 2019-04-25 Convolutional neural network computing device and method

Publications (2)

Publication Number Publication Date
CN110070178A CN110070178A (en) 2019-07-30
CN110070178B (en) 2021-05-14

Family

ID=67368858

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910337943.6A Active CN110070178B (en) 2019-04-25 2019-04-25 Convolutional neural network computing device and method

Country Status (1)

Country Link
CN (1) CN110070178B (en)

Families Citing this family (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110442323B (en) * 2019-08-09 2023-06-23 复旦大学 Device and method for performing floating point number or fixed point number multiply-add operation
CN111492369B (en) * 2019-09-19 2023-12-12 香港应用科技研究院有限公司 Residual quantization of shift weights in artificial neural networks
CN110717588B (en) 2019-10-15 2022-05-03 阿波罗智能技术(北京)有限公司 Apparatus and method for convolution operation
CN112784952B (en) * 2019-11-04 2024-03-19 珠海格力电器股份有限公司 Convolutional neural network operation system, method and equipment
CN110929860B (en) * 2019-11-07 2020-10-23 深圳云天励飞技术有限公司 Convolution acceleration operation method and device, storage medium and terminal equipment
CN111126569B (en) * 2019-12-18 2022-11-11 中国电子科技集团公司第五十二研究所 Convolutional neural network device supporting pruning sparse compression and calculation method
CN110991623B (en) * 2019-12-20 2024-05-28 中国科学院自动化研究所 Neural network operation system based on digital-analog mixed neuron
CN113379046B (en) * 2020-03-09 2023-07-11 中国科学院深圳先进技术研究院 Acceleration calculation method for convolutional neural network, storage medium and computer equipment
CN111415004B (en) * 2020-03-17 2023-11-03 阿波罗智联(北京)科技有限公司 Method and device for outputting information
CN111461313B (en) * 2020-03-27 2023-03-14 合肥工业大学 Convolution neural network hardware accelerator based on lightweight network and calculation method thereof
CN111652359B (en) * 2020-05-25 2023-05-02 北京大学深圳研究生院 Multiplier array for matrix operations and multiplier array for convolution operations
CN111723924B (en) * 2020-05-28 2022-07-12 西安交通大学 Deep neural network accelerator based on channel sharing
CN113807998A (en) * 2020-06-12 2021-12-17 深圳市中兴微电子技术有限公司 Image processing method, target detection device, machine vision equipment and storage medium
CN111797977B (en) * 2020-07-03 2022-05-20 西安交通大学 Accelerator structure for binarization neural network and circular expansion method
CN112418417B (en) * 2020-09-24 2024-02-27 北京计算机技术及应用研究所 Convolutional neural network acceleration device and method based on SIMD technology
CN112508174B (en) * 2020-12-09 2024-03-22 东南大学 Weight binary neural network-oriented pre-calculation column-by-column convolution calculation unit
CN112966813B (en) * 2021-03-15 2023-04-07 神思电子技术股份有限公司 Convolutional neural network input layer device and working method thereof
CN113191193B (en) * 2021-03-30 2023-08-04 河海大学 Convolution method based on graph and grid
CN113298259B (en) * 2021-06-10 2024-04-26 中国电子科技集团公司第十四研究所 CNN (computer network) reasoning framework design method supporting multi-core parallelism of embedded platform
CN113672286A (en) * 2021-07-30 2021-11-19 科络克电子科技(上海)有限公司 Assembly line evaluator, and moving track analysis processing device, method and equipment
CN113591025B (en) * 2021-08-03 2024-06-14 深圳思谋信息科技有限公司 Feature map processing method and device, convolutional neural network accelerator and medium
CN114723031B (en) * 2022-05-06 2023-10-20 苏州宽温电子科技有限公司 Computing device
CN115481721B (en) * 2022-09-02 2023-06-27 浙江大学 Psum calculation circuit for convolutional neural network
CN116205274B (en) * 2023-04-27 2023-07-21 苏州浪潮智能科技有限公司 Control method, device, equipment and storage medium of impulse neural network
CN116402106B (en) * 2023-06-07 2023-10-24 深圳市九天睿芯科技有限公司 Neural network acceleration method, neural network accelerator, chip and electronic equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105681628A (en) * 2016-01-05 2016-06-15 西安交通大学 Convolution network arithmetic unit, reconfigurable convolution neural network processor and image de-noising method of reconfigurable convolution neural network processor
CN107871163A (en) * 2016-09-28 2018-04-03 爱思开海力士有限公司 Operation device and method for convolutional neural networks
CN107909148A (en) * 2017-12-12 2018-04-13 北京地平线信息技术有限公司 For performing the device of the convolution algorithm in convolutional neural networks
CN108885714A (en) * 2017-11-30 2018-11-23 深圳市大疆创新科技有限公司 The control method of computing unit, computing system and computing unit
CN109002883A (en) * 2018-07-04 2018-12-14 中国科学院计算技术研究所 Convolutional neural networks model computing device and calculation method
CN109472356A (en) * 2018-12-29 2019-03-15 南京宁麒智能计算芯片研究院有限公司 A kind of accelerator and method of restructural neural network algorithm
CN109635944A (en) * 2018-12-24 2019-04-16 西安交通大学 A kind of sparse convolution neural network accelerator and implementation method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10089577B2 (en) * 2016-08-05 2018-10-02 Xilinx, Inc. Binary neural networks on progammable integrated circuits
US10838910B2 (en) * 2017-04-27 2020-11-17 Falcon Computing Systems and methods for systolic array design from a high-level program
CN109409512B (en) * 2018-09-27 2021-02-19 西安交通大学 Flexibly configurable neural network computing unit, computing array and construction method thereof

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105681628A (en) * 2016-01-05 2016-06-15 西安交通大学 Convolution network arithmetic unit, reconfigurable convolution neural network processor and image de-noising method of reconfigurable convolution neural network processor
CN107871163A (en) * 2016-09-28 2018-04-03 爱思开海力士有限公司 Operation device and method for convolutional neural networks
CN108885714A (en) * 2017-11-30 2018-11-23 深圳市大疆创新科技有限公司 The control method of computing unit, computing system and computing unit
CN107909148A (en) * 2017-12-12 2018-04-13 北京地平线信息技术有限公司 For performing the device of the convolution algorithm in convolutional neural networks
CN109002883A (en) * 2018-07-04 2018-12-14 中国科学院计算技术研究所 Convolutional neural networks model computing device and calculation method
CN109635944A (en) * 2018-12-24 2019-04-16 西安交通大学 A kind of sparse convolution neural network accelerator and implementation method
CN109472356A (en) * 2018-12-29 2019-03-15 南京宁麒智能计算芯片研究院有限公司 A kind of accelerator and method of restructural neural network algorithm

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Ke Xu et al., "A scalable FPGA accelerator for convolutional neural networks", 12th Conference, ACA 2018, Yingkou, China, 2018-08-11, pp. 3-14. *
C. Zhang et al., "Optimizing FPGA-based accelerator design for deep convolutional neural networks", FPGA '15: Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015-02, pp. 161-170. *
Yang Cheng, "基于FPGA的人工神经网络的研究与实现 (Research and Implementation of FPGA-Based Artificial Neural Networks)", China Master's Theses Full-text Database, Information Science and Technology, No. 3, 2017-03-15, pp. I140-182. *
Yu Zijian et al., "基于FPGA的卷积神经网络加速器 (An FPGA-Based Convolutional Neural Network Accelerator)", Computer Engineering (《计算机工程》), 2016-04-20, pp. 109-114, 119. *

Also Published As

Publication number Publication date
CN110070178A (en) 2019-07-30

Similar Documents

Publication Publication Date Title
CN110070178B (en) Convolutional neural network computing device and method
CN109063825B (en) Convolutional neural network accelerator
CN107256424B (en) Three-value weight convolution network processing system and method
US10877733B2 (en) Segment divider, segment division operation method, and electronic device
CN111832719A (en) Fixed point quantization convolution neural network accelerator calculation circuit
CN111240746B (en) Floating point data inverse quantization and quantization method and equipment
CN111507465B (en) Configurable convolutional neural network processor circuit
US11973519B2 (en) Normalized probability determination for character encoding
WO2021057085A1 (en) Hybrid precision storage-based depth neural network accelerator
CN112257844B (en) Convolutional neural network accelerator based on mixed precision configuration and implementation method thereof
CN111694643B (en) Task scheduling execution system and method for graph neural network application
CN109993293B (en) Deep learning accelerator suitable for heap hourglass network
CN109389208B (en) Data quantization device and quantization method
CN111738427B (en) Operation circuit of neural network
CN112862091B (en) Resource multiplexing type neural network hardware accelerating circuit based on quick convolution
US11561795B2 (en) Accumulating data values and storing in first and second storage devices
CN115186802A (en) Block sparse method and device based on convolutional neural network and processing unit
CN115018062A (en) Convolutional neural network accelerator based on FPGA
CN112232499A (en) Convolutional neural network accelerator
CN116227599A (en) Inference model optimization method and device, electronic equipment and storage medium
CN112085154A (en) Asymmetric quantization for compression and inference acceleration of neural networks
Shu et al. High energy efficiency FPGA-based accelerator for convolutional neural networks using weight combination
US20210044303A1 (en) Neural network acceleration device and method
CN109389209B (en) Processing apparatus and processing method
Zhan et al. Field programmable gate array‐based all‐layer accelerator with quantization neural networks for sustainable cyber‐physical systems

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant