CN114692833B - Convolution calculation circuit, neural network processor and convolution calculation method - Google Patents


Info

Publication number
CN114692833B
Authority
CN
China
Prior art keywords: data, module, convolution, weight, index
Prior art date
Legal status (assumed; not a legal conclusion)
Active
Application number
CN202210326694.2A
Other languages
Chinese (zh)
Other versions
CN114692833A (en)
Inventor
卢知伯
Current Assignee (as listed; not legally verified)
Guangdong Qixin Semiconductor Co ltd
Original Assignee
Guangdong Qixin Semiconductor Co ltd
Priority date (assumed; not a legal conclusion)
Filing date
Publication date
Application filed by Guangdong Qixin Semiconductor Co ltd filed Critical Guangdong Qixin Semiconductor Co ltd
Priority to CN202210326694.2A priority Critical patent/CN114692833B/en
Publication of CN114692833A publication Critical patent/CN114692833A/en
Application granted granted Critical
Publication of CN114692833B publication Critical patent/CN114692833B/en


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/22: Indexing; Data structures therefor; Storage structures
    • G06F 16/2228: Indexing structures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/24: Querying
    • G06F 16/242: Query formulation
    • G06F 16/2433: Query languages
    • G06F 16/244: Grouping and aggregation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means


Abstract

One or more embodiments of the present specification provide a convolution calculation circuit, a neural network processor, and a convolution calculation method. The circuit comprises: a data grouping module that divides the original data to be processed into several data groups based on the correspondence between the original data and the weights in the convolution kernel; in each convolution calculation module, an index generation sub-module that generates several index data based on the values of the original data of the same data group at the same bit position, a weight query sub-module that looks up in a weight table the target weight corresponding to each index data, and a shift addition sub-module that shifts each target weight according to the bit position of its index data in the original data and adds all shifted target weights to obtain an intermediate convolution value; and a summation module that sums the intermediate convolution values from all convolution calculation modules to obtain the final convolution value. The circuit area and power consumption can thereby be reduced.

Description

Convolution calculation circuit, neural network processor and convolution calculation method
Technical Field
One or more embodiments of the present disclosure relate to the field of artificial intelligence, and more particularly, to a convolution calculation circuit, a neural network processor, and a convolution calculation method.
Background
With the continuous development of artificial intelligence, deep learning algorithms based on convolutional neural networks are widely applied in scenarios such as image processing and speech recognition, bringing great convenience to social life. At the same time, the enormous amount of computation generated by the convolution calculations at the core of these algorithms poses challenges to the performance and cost of related circuits such as neural network processors.
Convolution calculation consists mainly of addition and multiplication. In the related art, the multiplication part is generally implemented with multipliers, but a multiplier occupies a large circuit area and consumes considerable power, which hinders improving circuit performance and reducing cost. How to implement convolution calculation with a smaller circuit area and lower power consumption has therefore become an urgent technical problem.
Disclosure of Invention
In view of this, one or more embodiments of the present specification propose a convolution calculation circuit, a neural network processor, and a convolution calculation method.
In order to achieve the above object, one or more embodiments of the present disclosure provide the following technical solutions:
according to a first aspect of one or more embodiments of the present specification, a convolution calculation circuit is provided, including a data grouping module, a convolution calculation module, and a summation module, where the convolution calculation module further includes an index generation sub-module, a weight query sub-module, and a shift addition sub-module;
the data grouping module divides a plurality of original data to be processed into a plurality of data groups based on the corresponding relation between the original data and the weight values in the convolution kernel, and transmits the original data belonging to the same data group to the same convolution calculation module;
in each convolution calculation module, the index generation sub-module generates several index data based on the values of the original data of the same data group at the same bit position, and transmits the index data to the weight query sub-module;
the weight inquiry sub-module inquires target weights corresponding to the index data in a weight table and transmits the target weights to the shift addition sub-module; the weight table in each convolution calculation module is preconfigured based on the weight in the convolution kernel;
the shift addition sub-module performs shift processing on each target weight based on the bit of the index data in the original data, adds all the shifted target weights to obtain an intermediate convolution value, and transmits the intermediate convolution value to the summation module;
and the summation module is used for summing all the intermediate convolution values transmitted by each convolution calculation module to obtain a final convolution value.
According to a second aspect of one or more embodiments of the present specification, a neural network processor is presented, which performs convolution calculations using the circuit of the first aspect described above.
According to a third aspect of one or more embodiments of the present specification, there is provided a convolution calculation method, comprising:
dividing a plurality of original data to be processed into a plurality of data groups based on the corresponding relation between the original data and the weight values in the convolution kernel;
generating, for each data group, several index data based on the values of the original data in the data group at the same bit position;
for each index data, querying the target weight corresponding to the index data in the weight table of the data group to which the index data belongs; the weight table of each data group is preconfigured based on the weights in the convolution kernel;
and carrying out shift processing on each target weight based on the bit of the index data in the original data, and adding all the shifted target weights to obtain a final convolution value.
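The steps above can be sketched end to end as a short Python model. This is an illustrative reconstruction assuming row-wise grouping of a 3×3 kernel and made-up data values, not the circuit implementation itself:

```python
# Illustrative software model of the claimed method (row-wise grouping).
def convolve_3x3(raw, kernel, m=8):
    """Compute sum(raw[p][q] * kernel[p][q]) using only table lookups,
    shifts and additions, mirroring the grouping / index-generation /
    weight-query / shift-addition steps described above."""
    total = 0
    for row, w_row in zip(raw, kernel):          # step 1: group by kernel row
        n = len(row)
        # Pre-configured weight table: entry k is the sum of the row weights
        # whose corresponding index bit is set in k ("combined addition").
        table = [sum(w_row[d] for d in range(n) if (k >> d) & 1)
                 for k in range(1 << n)]
        for j in range(m):                       # step 2: one index per bit
            idx = sum(((x >> j) & 1) << d for d, x in enumerate(row))
            total += table[idx] << j             # steps 3-4: query, shift, add
    return total
```

Because each table entry already equals the weighted sum of the selected row weights, shifting it by the bit position and accumulating reproduces every product M·W bit by bit, so no multiplier is required.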
As can be seen from the above description, the convolution calculation circuit in the present specification first divides the original data into data groups by means of the data grouping module; a convolution calculation module is provided for each data group, and in each convolution calculation module the index generation sub-module generates index data from the original data of the same data group, the weight query sub-module then looks up the target weight corresponding to each index data, and the shift addition sub-module shift-adds all the target weights to obtain an intermediate convolution value; finally, the summation module sums all the intermediate convolution values to obtain the final convolution value.
With this scheme, the multiplication-related part of the convolution calculation is converted into weight-table lookups and shift-additions within each data group, avoiding the use of multipliers in the convolution calculation circuit; this reduces circuit area and power consumption, and facilitates optimization of performance and cost.
Drawings
For a clearer description, the drawings required in the embodiments of the present specification or the related art are briefly introduced below. Obviously, the drawings in the following description are only some of the embodiments of the present specification; a person skilled in the art may obtain other drawings from them without inventive effort.
Fig. 1 is a schematic diagram of a convolution calculating circuit according to an exemplary embodiment.
Fig. 2 is a schematic diagram of a convolution calculation module according to an exemplary embodiment.
Fig. 3 is a flowchart of a convolution calculation method according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of the present specification. Rather, they are merely examples of apparatus and methods consistent with aspects of one or more embodiments of the present description as detailed in the accompanying claims.
It should be noted that: in other embodiments, the steps of the corresponding method are not necessarily performed in the order shown and described in this specification. In some other embodiments, the method may include more or fewer steps than described in this specification. Furthermore, individual steps described in this specification, in other embodiments, may be described as being split into multiple steps; while various steps described in this specification may be combined into a single step in other embodiments.
At present, most of deep learning algorithms widely applied to various scenes such as image processing, voice recognition and the like are realized based on a convolutional neural network model, and the algorithm cores of the deep learning algorithms are actually a large number of convolutional calculations positioned on a convolutional layer of the neural network.
Convolution computation consists essentially of addition and multiplication. Taking image processing as an example, when a convolution kernel W of size 3×3 is applied to a 3×3 pixel region M to calculate the convolution value R at the centre pixel M11, the calculation formula is: R = M00×W00 + M01×W01 + M02×W02 + M10×W10 + M11×W11 + M12×W12 + M20×W20 + M21×W21 + M22×W22; wherein Wpq are the weights in the convolution kernel, Mpq is the image data (such as grayscale) at a pixel, and p, q are natural numbers.
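As a numerical illustration of this formula, the pixel values and weights below are arbitrary examples, not values from the patent:

```python
# R = M00*W00 + M01*W01 + ... + M22*W22 at the centre pixel M11.
M = [[10, 20, 30],
     [40, 50, 60],
     [70, 80, 90]]          # example grayscale values
W = [[1, 0, 1],
     [0, 2, 0],
     [1, 0, 1]]             # example 3x3 kernel weights
R = sum(M[p][q] * W[p][q] for p in range(3) for q in range(3))
# R == 10 + 30 + 100 + 70 + 90 == 300
```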
In the related art, the circuit for performing the convolution calculation is generally implemented based on adders and multipliers, for example, the calculation of the convolution value R can be implemented by combining 9 multipliers with 8 adders, but this scheme is not advantageous for optimizing the performance and cost of the convolution calculation circuit because the multipliers are devices with large circuit area and high power consumption. As algorithms become more complex and demands increase, how to implement convolution calculation with smaller circuit area and lower power consumption becomes a technical problem to be solved.
In view of this, the present specification proposes a convolution calculation circuit that can be used to execute various kinds of deep learning algorithms based on convolution neural networks.
Referring to fig. 1, fig. 1 is a schematic diagram of a convolution calculating circuit according to an exemplary embodiment of the present disclosure.
The convolution computing circuit comprises a data grouping module, a convolution computing module and a summation module; the convolution calculation module further comprises an index generation sub-module, a weight inquiry sub-module and a shift addition sub-module.
In a convolution calculation circuit, the number of data grouping modules 10 may be 1, the number of summation modules 30 may be 1, and the number of convolution calculation modules is related to the size of the convolution kernel in the convolutional neural network. In an alternative implementation, if the convolution kernel size is n×n, the number of convolution calculation modules provided in the convolution calculation circuit may be n, denoted convolution calculation module 20, convolution calculation module 21, ..., convolution calculation module 2(n-1); wherein n is a positive integer greater than 1.
More commonly, in the case where the convolution kernel size is 3×3, the number of data grouping modules, convolution calculating modules, and summing modules included in the convolution calculating circuit is 1, 3, and 1, respectively, which are respectively denoted as data grouping module 10, convolution calculating modules 20, 21, and 22, and summing module 30.
It should be noted that, in the case where the convolution kernel size is 5×5 or greater, more convolution calculation modules may be provided in the convolution calculation circuit, or a large convolution kernel may be converted into a plurality of small convolution kernels for processing, for example, a convolution kernel of 5×5 may be converted into 4 convolution kernels of 3×3 for processing, so as to reduce the complexity of the circuit and improve the applicability of the circuit.
In terms of connections, the data grouping module 10 may be connected to the index generation sub-module 2i1 in each convolution calculation module 2i, and the summation module 30 may be connected to the shift addition sub-module 2i3 in each convolution calculation module 2i; within each convolution calculation module 2i, the index generation sub-module 2i1, the weight query sub-module 2i2, and the shift addition sub-module 2i3 may be connected in sequence; wherein i is a natural number between 0 and n-1.
The following describes each module and sub-module in the convolution calculation circuit in further detail.
The data grouping module 10 divides a plurality of original data to be processed into a plurality of data groupings based on the correspondence between the original data and the weights in the convolution kernel, and transmits the original data belonging to the same data grouping to the same convolution calculation module 2i.
The corresponding relation exists between a plurality of original data to be processed and each weight value in the convolution kernel; taking image processing as an example, the raw data may be image data on each pixel point of the image to be processed, and in one convolution calculation, each weight in the convolution kernel of 3×3 corresponds to image data on each pixel point on the pixel region of 3×3 one by one. It can be understood that if the scene is not image processing, the original data will also form a corresponding relationship with each weight value in the convolution kernel based on a specific algorithm, and details will not be repeated here depending on the algorithm.
The data grouping module 10 may divide the plurality of original data into a plurality of data groupings based on a correspondence between the original data and weights in the convolution kernel after receiving the incoming original data. In an alternative implementation, the data grouping module 10 may divide the original data corresponding to the peer weights in the convolution kernel into the same data group; in another alternative implementation, the data grouping module 10 may divide the original data corresponding to the same column weights in the convolution kernel into the same data group.
After dividing the plurality of original data into several data groups, the data grouping module 10 may transmit the original data belonging to the same data group to the index generation sub-module 2i1 of the same convolution calculation module 2i, while original data belonging to different data groups are transmitted to different convolution calculation modules.
Taking the 3×3 convolution kernel W and the corresponding original data M as an example, the data grouping module 10 may transmit the original data M00, M01, M02 corresponding to the first-row weights in the convolution kernel to the index generation sub-module 201 of convolution calculation module 20, transmit the original data M10, M11, M12 corresponding to the second-row weights to the index generation sub-module 211 of convolution calculation module 21, and transmit the original data M20, M21, M22 corresponding to the third-row weights to the index generation sub-module 221 of convolution calculation module 22.
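The row-wise grouping just described can be sketched as follows; the data values are placeholders chosen for illustration:

```python
# The nine original data values, indexed as raw[p][q] to match W[p][q].
raw = [[72, 167, 26],        # M00, M01, M02 -> convolution module 20
       [3,  1,   4],         # M10, M11, M12 -> convolution module 21
       [1,  5,   9]]         # M20, M21, M22 -> convolution module 22

row_groups = [raw[p] for p in range(3)]                          # same-row grouping
col_groups = [[raw[p][q] for p in range(3)] for q in range(3)]   # same-column alternative
```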
In each convolution calculation module 2i, the index generation sub-module 2i1 generates a plurality of index data based on the values of the original data in the same data packet on the same bit, and transmits the index data to the weight query sub-module 2i2.
After receiving the several original data belonging to the same data group from the data grouping module 10, the index generation sub-module 2i1 may generate several index data based on the values of those original data at the same bit position. In an alternative implementation, if the data grouping module 10 groups the original data corresponding to same-row weights in the convolution kernel, the index generation sub-module 2i1 may take the values of the original data at the same bit position and arrange them, in the order of the corresponding same-row weights in the convolution kernel, into the index data for that bit position, thereby generating index data for each bit position. It will be appreciated that if the data grouping module 10 groups the original data corresponding to same-column weights, the index generation sub-module 2i1 takes the values of the original data at the same bit position and arranges them in the order of the corresponding same-column weights in the convolution kernel.
In the case where the convolution kernel size is n×n and the original data bit width is m, based on the n original data in the same data group, m index data of n-bit width can be generated, corresponding to bit[m-1] through bit[0] respectively; wherein m is a positive integer.
Referring to Table 1 below, based on the previous example and assuming the original data bit width is 8, after receiving the original data M00 (0100 1000), M01 (1010 0111), M02 (0001 1010) from the data grouping module 10, the index generation sub-module 201 of convolution calculation module 20 may take their values at the same bit position and arrange them so that the values of M00, M01, M02 become bit[0], bit[1], bit[2] of the index data respectively, generating one index data for each bit of the original data; it then transmits the 8 3-bit index data corresponding to bit[7] through bit[0] to the weight query sub-module 202 of convolution calculation module 20.
TABLE 1

Bit position       [7]  [6]  [5]  [4]  [3]  [2]  [1]  [0]
M00 (0100 1000)     0    1    0    0    1    0    0    0
M01 (1010 0111)     1    0    1    0    0    1    1    1
M02 (0001 1010)     0    0    0    1    1    0    1    0
Index data         010  001  010  100  101  010  110  010
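The index generation for this worked example can be reproduced with a few lines of Python; bit[0] of each index comes from M00, bit[1] from M01, and bit[2] from M02:

```python
def make_indexes(group, m=8):
    """One n-bit index per bit position, listed from bit[m-1] down to bit[0]."""
    return [sum(((x >> j) & 1) << d for d, x in enumerate(group))
            for j in range(m - 1, -1, -1)]

idx = make_indexes([0b01001000, 0b10100111, 0b00011010])  # M00, M01, M02
# idx == [0b010, 0b001, 0b010, 0b100, 0b101, 0b010, 0b110, 0b010]
```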
Similarly, after receiving the original data M10, M11, M12 from the data grouping module 10, the index generation sub-module 211 of convolution calculation module 21 may generate the 8 index data corresponding to bit[7] through bit[0] based on M10, M11, M12, and then transmit them to the weight query sub-module 212 of convolution calculation module 21.
After receiving the original data M20, M21, M22 from the data grouping module 10, the index generation sub-module 221 of convolution calculation module 22 may generate the 8 index data corresponding to bit[7] through bit[0] based on M20, M21, M22, and then transmit them to the weight query sub-module 222 of convolution calculation module 22.
In each convolution calculation module 2i, a weight inquiry sub-module 2i2 inquires a weight table for a target weight corresponding to each index data, and transmits the target weight to a shift addition sub-module 2i3; the weight table in each convolution calculation module is preconfigured based on the weights in the convolution kernel.
After receiving the index data corresponding to each bit transmitted by the index generating sub-module 2i1, the weight querying sub-module 2i2 may query a weight table preconfigured by the convolution calculating module 2i for a target weight corresponding to the index data.
In the case where the convolution kernel size is n×n, the weight table contains 2^n target weights, corresponding to index data values ranging from 0 to 2^n - 1; the bit width of each target weight is the same as that of the weights in the convolution kernel, which is generally the same as that of the original data. Since the convolution kernel generally does not change during convolution computation, the weight table in each convolution calculation module, once preconfigured, can be used to process different original data. When the original data bit width is m, target weights are queried for all m index data; index data at different bit positions may have the same value, in which case the queried target weights are the same, but the subsequent shift processing differs.
Referring to table 3 below, based on the previous example, assuming that the weight table of the convolution calculation module 20 is shown in table 2 below, after receiving the 8 3-bit index data corresponding to the original data bits [7] to [0] respectively, which is transmitted from the index generation sub-module 201, the weight query sub-module 202 of the convolution calculation module 20 may query to obtain 8 target weights corresponding to the bits [7] to [0] respectively, and then transmit the 8 target weights to the shift addition sub-module 203 of the convolution calculation module 20.
TABLE 2

Index data     000   001   010   011   100   101   110   111
Target weight  W000  W001  W010  W011  W100  W101  W110  W111

TABLE 3

Bit position   [7]   [6]   [5]   [4]   [3]   [2]   [1]   [0]
Index data     010   001   010   100   101   010   110   010
Target weight  W010  W001  W010  W100  W101  W010  W110  W010
Similarly, after receiving the 8 index data from the index generation sub-module 211, the weight query sub-module 212 of convolution calculation module 21 may look up the corresponding 8 target weights and then transmit them to the shift addition sub-module 213 of convolution calculation module 21.
After receiving the 8 index data from the index generation sub-module 221, the weight query sub-module 222 of convolution calculation module 22 may look up the corresponding 8 target weights and then transmit them to the shift addition sub-module 223 of convolution calculation module 22.
Referring to fig. 2, fig. 2 is a schematic structural diagram of a convolution calculating module according to an exemplary embodiment of the present disclosure.
In an alternative implementation, the convolution calculation module 2i further includes a weight table configuration module 2i4.
In connection, the weight table configuration module 2i4 is connected to the weight query sub-module 2i2 in the convolution calculation module 2i.
In the case where the data grouping module 10 groups the original data corresponding to same-row weights in the convolution kernel, the weight table configuration module 2i4 may compute the target weight for each index data from the same-row weights according to the combined-addition rule for that index data, and thus configure the weight table of the weight query sub-module 2i2. It will be appreciated that if the data grouping module 10 groups the original data corresponding to same-column weights, the target weights in the weight table are instead computed from the same-column weights in the convolution kernel.
It should be noted that the weight table configuration modules of different convolution calculation modules use different rows of weights in the convolution kernel, corresponding to the original data processed by that module. Based on the previous example, convolution calculation module 20, which processes original data M00, M01, M02, determines its weight table from the first-row weights W00, W01, W02 in the convolution kernel; convolution calculation module 21, which processes M10, M11, M12, determines its weight table from the second-row weights W10, W11, W12; and convolution calculation module 22, which processes M20, M21, M22, determines its weight table from the third-row weights W20, W21, W22.
Referring to Table 5 below, based on the previous example, assume the combined-addition rule corresponding to the index data is as shown in Table 4 below. After receiving the same-row weights W00, W01, W02 of the convolution kernel, the weight table configuration module 204 of convolution calculation module 20 may determine, based on the combined-addition rule, the target weights corresponding to index data 000 through 111.
TABLE 4

Index data     000  001  010  011    100  101    110    111
Target weight  0    w0   w1   w0+w1  w2   w0+w2  w1+w2  w0+w1+w2

(w0, w1, w2 denote the weights of the same row that correspond to index bits [0], [1], [2] respectively)
TABLE 5

Index data     000  001  010  011      100  101      110      111
Target weight  0    W00  W01  W00+W01  W02  W00+W02  W01+W02  W00+W01+W02
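The combined-addition rule of Tables 4 and 5 amounts to precomputing, for every possible index value, the sum of the row weights selected by its bits. A sketch with made-up weight values:

```python
def build_weight_table(w_row):
    """Entry k is the sum of the row weights whose bit is set in index k."""
    n = len(w_row)
    return [sum(w_row[d] for d in range(n) if (k >> d) & 1)
            for k in range(1 << n)]

table = build_weight_table([3, 5, 7])   # example values for W00, W01, W02
# table == [0, 3, 5, 8, 7, 10, 12, 15]
#   index   000 001 010 011 100 101 110 111
```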
Similarly, upon receiving weights for the same row in the incoming convolution kernel, the weight table configuration module 214 of the convolution calculation module 21 may determine the weight table as shown in table 6 below.
TABLE 6

Index data     000  001  010  011      100  101      110      111
Target weight  0    W10  W11  W10+W11  W12  W10+W12  W11+W12  W10+W11+W12
Upon receiving the weights for the same row in the incoming convolution kernel, the weight table configuration module 224 of the convolution calculation module 22 may determine the weight table as shown in table 7 below.
TABLE 7

Index data     000  001  010  011      100  101      110      111
Target weight  0    W20  W21  W20+W21  W22  W20+W22  W21+W22  W20+W21+W22
In each convolution calculation module 2i, the shift addition sub-module 2i3 performs shift processing on each target weight based on the bit of the index data in the original data, adds all the shifted target weights to obtain an intermediate convolution value, and transmits the intermediate convolution value to the summation module 30.
After receiving the target weights corresponding to the index data from the weight query sub-module 2i2, the shift addition sub-module 2i3 may left-shift each target weight by the bit position of its index data in the original data, then add all the shifted target weights to obtain the intermediate convolution value Ri of convolution calculation module 2i, and transmit Ri to the summation module 30.
Based on the previous example, after receiving from the weight query sub-module 202 the 8 3-bit target weights corresponding to original data bit[7] through bit[0] as shown in Table 3, the shift addition sub-module 203 of convolution calculation module 20 may left-shift the target weight W010 corresponding to bit[7] by 7 bits to obtain W'010, left-shift the target weight W001 corresponding to bit[6] by 6 bits to obtain W'001, and so on; it then adds the 8 shifted target weights to obtain the intermediate convolution value R0 of convolution calculation module 20: R0 = W'010 + W'001 + W'010 + W'100 + W'101 + W'010 + W'110 + W'010.
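The shift-addition step can be modelled as below. Since the table entry for the index at bit position j equals the sum of the row weights selected by the data bits at that position, accumulating table[idx] << j over all bit positions yields exactly M00·W00 + M01·W01 + M02·W02; the weight values are made-up examples:

```python
def shift_add(group, table, m=8):
    """Left-shift each target weight by its bit position and sum the results."""
    acc = 0
    for j in range(m):
        idx = sum(((x >> j) & 1) << d for d, x in enumerate(group))
        acc += table[idx] << j
    return acc

# Weight table entry k = sum of the row weights selected by the bits of k,
# here for the example weights W00=3, W01=5, W02=7.
table = [sum(w for d, w in enumerate([3, 5, 7]) if (k >> d) & 1)
         for k in range(8)]
r0 = shift_add([72, 167, 26], table)   # M00=72, M01=167, M02=26
# r0 == 72*3 + 167*5 + 26*7 == 1233
```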
Similarly, after receiving the 8 target weights from the weight query sub-module 212, the shift addition sub-module 213 of convolution calculation module 21 may left-shift each target weight according to its bit position and add the 8 shifted target weights to obtain the intermediate convolution value R1 of convolution calculation module 21.
After receiving the 8 target weights from the weight query sub-module 222, the shift addition sub-module 223 of convolution calculation module 22 may left-shift each target weight according to its bit position and add the 8 shifted target weights to obtain the intermediate convolution value R2 of convolution calculation module 22.
The summation module 30 sums all the intermediate convolution values transmitted from the convolution calculation modules 2i to obtain a final convolution value.
After receiving the intermediate convolution value R_i from each convolution calculation module 2i, the summation module 30 sums all the intermediate convolution values to obtain the final convolution value.
Based on the previous example, the convolution calculation module 20 may obtain the intermediate convolution value R_0 from the original data M_00, M_01, M_02; the convolution calculation module 21 may obtain the intermediate convolution value R_1 from the original data M_10, M_11, M_12; and the convolution calculation module 22 may obtain the intermediate convolution value R_2 from the original data M_20, M_21, M_22. Finally, the summation module 30 may sum the intermediate convolution values R_0, R_1 and R_2 to obtain the final convolution value R.
The specific circuit implementation of each module and sub-module is not complex: no multiplier is needed, and multiple alternative implementations exist. For example, the weight query sub-module may be implemented with components such as a data selector, and the summation module may be implemented with an adder; details are not described here.
As can be seen from the above description, the convolution calculation circuit in the present specification first divides the original data into data packets through the data grouping module, with one convolution calculation module provided for each data packet. In each convolution calculation module, the index generation sub-module generates index data from the original data of the same data packet, the weight query sub-module looks up the target weight corresponding to each index data, and the shift addition sub-module shift-adds all the target weights to obtain an intermediate convolution value. Finally, the summation module sums all the intermediate convolution values to obtain the final convolution value.
In this scheme, the multiplication-related part of the convolution calculation is converted into weight indexing and shift addition after data grouping. This avoids the use of multipliers in the convolution calculation circuit, reduces circuit area and power consumption, and facilitates the optimization of performance and cost.
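Under the assumptions of the previous examples, the circuit's overall behavior (grouping by kernel row, per-packet index/lookup/shift-add, then summation) can be modeled end to end in Python and checked against an ordinary multiply-accumulate convolution. All function names and the sample data below are illustrative, not from the specification:

```python
def build_weight_table(row_weights):
    """Table entry for each possible index: sum of same-row weights at '1' bits."""
    k = len(row_weights)
    return {format(i, f'0{k}b'):
                sum(w for w, bit in zip(row_weights, format(i, f'0{k}b')) if bit == '1')
            for i in range(2 ** k)}

def packet_convolution(row_data, row_weights, n_bits):
    """One convolution calculation module: index data -> table lookup -> shift-add."""
    table = build_weight_table(row_weights)
    total = 0
    for b in range(n_bits):
        idx = ''.join(str((d >> b) & 1) for d in row_data)
        total += table[idx] << b
    return total

def convolve(data_rows, kernel_rows, n_bits):
    """Summation module: add the intermediate convolution values of all packets."""
    return sum(packet_convolution(d, w, n_bits)
               for d, w in zip(data_rows, kernel_rows))

data   = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # 3x3 window, one packet per row
kernel = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]   # 3x3 kernel
direct = sum(d * w for dr, wr in zip(data, kernel) for d, w in zip(dr, wr))
assert convolve(data, kernel, n_bits=4) == direct == 25
```

The assertion confirms that the lookup-and-shift formulation is arithmetically identical to the direct convolution, despite containing no multiplication.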
The specification also provides a neural network processor which can be used for executing various deep learning algorithms based on the convolutional neural network. The neural network processor performs convolution calculation by adopting the convolution calculation circuit provided by the specification; at the hardware level, buses, interfaces, memory, and non-volatile storage may be included in addition to the convolution computation circuitry, as well as hardware required by other services.
The specification also provides a convolution calculation method which can be used for executing various deep learning algorithms based on the convolution neural network.
Referring to fig. 3, fig. 3 is a flowchart illustrating a convolution calculating method according to an exemplary embodiment of the present disclosure.
The convolution calculation method can comprise the following specific steps:
step 302, dividing a plurality of original data to be processed into a plurality of data packets based on the corresponding relation between the original data and the weights in the convolution kernel;
step 304, for each data packet, generating a plurality of index data based on the values of the original data in the data packet on the same bit;
step 306, for each index data, inquiring a target weight corresponding to the index data in a weight table of a data packet to which the index data belongs; the weight table of each data packet is preconfigured based on the weight in the convolution kernel;
and 308, performing shift processing on each target weight based on the bit of the index data in the original data, and adding all the shifted target weights to obtain a final convolution value.
It will be appreciated that in step 308, the summation of all the shifted target weights may be decomposed into two stages: first, for each data packet, sum all the shifted target weights of that packet to obtain its intermediate convolution value; then, sum the intermediate convolution values of all data packets to obtain the final convolution value.
Optionally, in step 302, the dividing the raw data to be processed into data packets based on the correspondence between the raw data and the weights in the convolution kernel includes:
the original data corresponding to the same row weights in the convolution kernel are partitioned into the same data packet.
Optionally, in step 304, the generating a plurality of index data based on the values of the respective original data in the data packet on the same bit includes:
and taking the numerical value of each original data on the same bit, and generating index data corresponding to the bit according to the sequence arrangement of the same-row weights in the convolution kernel corresponding to the original data, so as to generate the index data respectively corresponding to each bit.
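This step can be illustrated with a short sketch (hypothetical function name; 3-bit data for brevity): the index data for each bit position is simply that bit of every original datum in the packet, concatenated in the order of the corresponding same-row weights.

```python
def generate_index_data(row_data, n_bits):
    """Index generation sketch: for each bit position, take that bit of every
    original datum in the packet, ordered to match the same-row weights."""
    return {b: ''.join(str((d >> b) & 1) for d in row_data)
            for b in range(n_bits - 1, -1, -1)}

# 3-bit data 5 (101), 3 (011), 7 (111) in one packet:
# bit 2 -> '101', bit 1 -> '011', bit 0 -> '111'
assert generate_index_data([5, 3, 7], n_bits=3) == {2: '101', 1: '011', 0: '111'}
```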
Optionally, the process of pre-configuring the weight table of each data packet based on the weights in the convolution kernel includes:
and calculating and determining a target weight corresponding to the index data by the weight value of the same row in the convolution kernel based on a combination addition rule corresponding to the index data, thereby determining the target weight corresponding to each index data respectively so as to configure a weight table of a data packet to which the original data corresponding to the weight value belongs.
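The combination addition rule can likewise be sketched (hypothetical function name and weight row): for each possible index value, the target weight is the sum of the same-row weights whose corresponding index bit is 1.

```python
def build_weight_table(row_weights):
    """Weight-table configuration sketch implementing the combination
    addition rule: the entry for index i sums the weights at its '1' bits."""
    k = len(row_weights)
    return {format(i, f'0{k}b'):
                sum(w for w, bit in zip(row_weights, format(i, f'0{k}b')) if bit == '1')
            for i in range(2 ** k)}

table = build_weight_table([2, 3, 5])  # hypothetical row of a 3x3 kernel
assert table['000'] == 0 and table['010'] == 3
assert table['101'] == 7 and table['111'] == 10
```

For a row of k weights the table has 2^k entries, so it can be precomputed once per kernel row and reused for every window position.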
Optionally, when the method is applied to image processing, the plurality of pieces of original data to be processed are image data on each pixel point of the image to be processed.
As can be seen from the above description, the convolution calculation method in the present specification converts the multiplication-related part of convolution calculation into weight indexing and shift addition after data grouping, which avoids the use of multipliers in circuits implementing the convolution calculation and is beneficial to reducing circuit area and power consumption and to optimizing performance and cost.
The method set forth in the above embodiment may be implemented in software, for example by reading and running a corresponding computer program; of course, other implementations are not excluded, such as logic devices or a combination of hardware and software. That is, the execution subject of the above processing flow is not limited to individual logic units, and may also be hardware or a logic device.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The terminology used in the one or more embodiments of the specification is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the specification. As used in this specification, one or more embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in one or more embodiments of the present description to describe various information, these information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of one or more embodiments of the present description. The word "if" as used herein may be interpreted as "at … …" or "at … …" or "responsive to a determination", depending on the context.
The foregoing description of preferred embodiments is merely intended to illustrate the embodiments of the present invention, and is not intended to limit the present invention to the particular embodiments described.

Claims (8)

1. The convolution computing circuit is characterized by comprising a data grouping module, a convolution computing module and a summation module, wherein the convolution computing module further comprises an index generating sub-module, a weight inquiring sub-module and a shift adding sub-module;
the data grouping module divides a plurality of original data to be processed into a plurality of data packets based on the corresponding relation between the original data and the weight values in the convolution kernel, and transmits the original data belonging to the same data packet to the same convolution calculation module;
in each convolution calculation module, the index generation sub-module generates a plurality of index data based on the numerical value of each original data in the same data packet on the same bit, and transmits the index data to the weight inquiry sub-module;
the weight inquiry sub-module inquires target weights corresponding to the index data in a weight table and transmits the target weights to the shift addition sub-module; the weight table in each convolution calculation module is preconfigured based on the weight in the convolution kernel; the convolution calculation module further comprises a weight table configuration module; the weight table configuration module calculates and determines target weights corresponding to the index data by weights of the same row in the convolution kernel based on a combination addition rule corresponding to the index data, so as to determine the target weights respectively corresponding to the index data to configure a weight table of the convolution calculation module;
the shift addition sub-module performs shift processing on each target weight based on the bit of the index data in the original data, adds all the shifted target weights to obtain an intermediate convolution value, and transmits the intermediate convolution value to the summation module;
and the summation module is used for summing all the intermediate convolution values transmitted by each convolution calculation module to obtain a final convolution value.
2. The circuit according to claim 1, wherein the data grouping module is configured to, when dividing the number of raw data to be processed into a number of data packets based on the correspondence between the raw data and the weights in the convolution kernel:
the original data corresponding to the same row weights in the convolution kernel are partitioned into the same data packet.
3. The circuit according to claim 1, wherein the index generation sub-module is configured, when generating a plurality of index data based on values of respective original data in the same data packet on the same bit, to:
and taking the numerical value of each original data on the same bit, and generating index data corresponding to the bit according to the sequence arrangement of the same-row weights in the convolution kernel corresponding to the original data, so as to generate the index data respectively corresponding to each bit.
4. The circuit according to claim 1, wherein in the case where a convolution kernel size is n×n, the number of convolution calculation modules included in the convolution calculation circuit is n; wherein n is a positive integer.
5. The circuit of claim 1, wherein the plurality of raw data to be processed, when applied to image processing, is image data at each pixel of an image to be processed.
6. A neural network processor, characterized in that convolution calculations are performed using the circuit of any one of claims 1 to 5.
7. A convolution computing method, the method comprising:
dividing a plurality of original data to be processed into a plurality of data packets based on the corresponding relation between the original data and the weight values in the convolution kernel;
generating, for each data packet, a number of index data based on the value of each original data in the data packet over the same bit;
for each index data, inquiring a target weight corresponding to the index data in a weight table of a data packet to which the index data belongs; the weight table of each data packet is preconfigured based on the weight in the convolution kernel; wherein the process of pre-configuring the weight table of each data packet based on the weights in the convolution kernel comprises: based on a combination addition rule corresponding to index data, determining a target weight corresponding to the index data by weight calculation of the same row in a convolution kernel, thereby determining target weights respectively corresponding to the index data to configure a weight table of a data packet to which original data corresponding to the weight value belongs;
and carrying out shift processing on each target weight based on the bit of the index data in the original data, and adding all the shifted target weights to obtain a final convolution value.
8. The method of claim 7, wherein generating the index data based on the value of each original data in the data packet over the same bit comprises:
and taking the numerical value of each original data on the same bit, and generating index data corresponding to the bit according to the sequence arrangement of the same-row weights in the convolution kernel corresponding to the original data, so as to generate the index data respectively corresponding to each bit.
CN202210326694.2A 2022-03-30 2022-03-30 Convolution calculation circuit, neural network processor and convolution calculation method Active CN114692833B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210326694.2A CN114692833B (en) 2022-03-30 2022-03-30 Convolution calculation circuit, neural network processor and convolution calculation method

Publications (2)

Publication Number Publication Date
CN114692833A CN114692833A (en) 2022-07-01
CN114692833B true CN114692833B (en) 2023-11-21

Family

ID=82141829

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210326694.2A Active CN114692833B (en) 2022-03-30 2022-03-30 Convolution calculation circuit, neural network processor and convolution calculation method

Country Status (1)

Country Link
CN (1) CN114692833B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106447034A (en) * 2016-10-27 2017-02-22 中国科学院计算技术研究所 Neutral network processor based on data compression, design method and chip
CN108205700A (en) * 2016-12-20 2018-06-26 上海寒武纪信息科技有限公司 Neural network computing device and method
CN108229648A (en) * 2017-08-31 2018-06-29 深圳市商汤科技有限公司 Convolutional calculation method and apparatus, electronic equipment, computer storage media
CN109409514A (en) * 2018-11-02 2019-03-01 广州市百果园信息技术有限公司 Fixed-point calculation method, apparatus, equipment and the storage medium of convolutional neural networks
CN110059822A (en) * 2019-04-24 2019-07-26 苏州浪潮智能科技有限公司 One kind compressing quantization method based on channel packet low bit neural network parameter
CN110688158A (en) * 2017-07-20 2020-01-14 上海寒武纪信息科技有限公司 Computing device and processing system of neural network
CN111428863A (en) * 2020-03-23 2020-07-17 河海大学常州校区 Low-power-consumption convolution operation circuit based on approximate multiplier
CN112434801A (en) * 2020-10-30 2021-03-02 西安交通大学 Convolution operation acceleration method for carrying out weight splitting according to bit precision
WO2021232422A1 (en) * 2020-05-22 2021-11-25 深圳市大疆创新科技有限公司 Neural network arithmetic device and control method thereof

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
KR20190052893A (en) * 2017-11-09 2019-05-17 삼성전자주식회사 Method and apparatus for preprocessing an operation of neural network
US20220092399A1 (en) * 2021-12-06 2022-03-24 Intel Corporation Area-Efficient Convolutional Block

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
AdderNet: Do we really need multiplications in deep learning?; Hanting Chen et al.; 2020 IEEE/CVF CVPR; full text *

Similar Documents

Publication Publication Date Title
CN107633297B (en) Convolutional neural network hardware accelerator based on parallel fast FIR filter algorithm
CN111898733B (en) Deep separable convolutional neural network accelerator architecture
CN117933327A (en) Processing device, processing method, chip and electronic device
CN113033794B (en) Light weight neural network hardware accelerator based on deep separable convolution
CN110555516A (en) FPGA-based YOLOv2-tiny neural network low-delay hardware accelerator implementation method
EP4200722A1 (en) Tabular convolution and acceleration
CN112734020A (en) Convolution multiplication accumulation hardware acceleration device, system and method of convolution neural network
CN109240644B (en) Local search method and circuit for Yixin chip
CN111428863A (en) Low-power-consumption convolution operation circuit based on approximate multiplier
CN113642589B (en) Image feature extraction method and device, computer equipment and readable storage medium
CN114492746A (en) Federal learning acceleration method based on model segmentation
CN114692833B (en) Convolution calculation circuit, neural network processor and convolution calculation method
CN106682258A (en) Method and system for multi-operand addition optimization in high-level synthesis tool
CN111931927B (en) Method and device for reducing occupation of computing resources in NPU
CN113657587B (en) Deformable convolution acceleration method and device based on FPGA
CN115544438A (en) Twiddle factor generation method and device in digital communication system and computer equipment
CN111738432B (en) Neural network processing circuit supporting self-adaptive parallel computation
US11297127B2 (en) Information processing system and control method of information processing system
CN110807479A (en) Neural network convolution calculation acceleration method based on Kmeans algorithm
CN110831106B (en) Clustering method based on convolution
US20210303987A1 (en) Power reduction for machine learning accelerator background
CN109416757B (en) Method, apparatus and computer-readable storage medium for processing numerical data
CN111260036A (en) Neural network acceleration method and device
US20200250524A1 (en) System and method for reducing computational complexity of neural network
CN112446463A (en) Neural network full-connection layer operation method and device and related products

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 803, Unit 2, No. 2515 Huandao North Road, Hengqin New District, Zhuhai City, Guangdong Province, 519000

Applicant after: Guangdong Qixin Semiconductor Co.,Ltd.

Address before: Room 2206, block C, building 2, Qiancheng Binhai garden, No. 3, Haicheng Road, Mabu community, Xixiang street, Bao'an District, Shenzhen, Guangdong 518103

Applicant before: Shenzhen Qixin Semiconductor Co.,Ltd.

GR01 Patent grant