CN114692833A - Convolution calculation circuit, neural network processor and convolution calculation method - Google Patents

Convolution calculation circuit, neural network processor and convolution calculation method

Info

Publication number
CN114692833A
Authority
CN
China
Prior art keywords
data
convolution
module
weight
index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210326694.2A
Other languages
Chinese (zh)
Other versions
CN114692833B (en)
Inventor
卢知伯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Qixin Semiconductor Co ltd
Original Assignee
Shenzhen Qixin Semiconductor Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Qixin Semiconductor Co ltd filed Critical Shenzhen Qixin Semiconductor Co ltd
Priority to CN202210326694.2A priority Critical patent/CN114692833B/en
Publication of CN114692833A publication Critical patent/CN114692833A/en
Application granted granted Critical
Publication of CN114692833B publication Critical patent/CN114692833B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2433Query languages
    • G06F16/244Grouping and aggregation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Neurology (AREA)
  • Complex Calculations (AREA)

Abstract

One or more embodiments of the present specification propose a convolution calculation circuit, a neural network processor, and a convolution calculation method. The circuit includes: a data grouping module, which divides the original data to be processed into a plurality of data groups based on the correspondence between the original data and the weights in the convolution kernel; in each convolution calculation module, an index generation submodule, which generates a plurality of index data based on the values of the original data of the same data group at the same bit position; a weight query submodule, which looks up in a weight table the target weight corresponding to each index datum; a shift addition submodule, which shifts each target weight according to the bit position of its index datum in the original data and adds all the shifted target weights to obtain an intermediate convolution value; and a summation module, which sums the intermediate convolution values transmitted by the convolution calculation modules to obtain the final convolution value. The circuit can reduce both circuit area and power consumption.

Description

Convolution calculation circuit, neural network processor and convolution calculation method
Technical Field
One or more embodiments of the present disclosure relate to the field of artificial intelligence technologies, and in particular, to a convolution calculation circuit, a neural network processor, and a convolution calculation method.
Background
With the continuous development of artificial intelligence, deep learning algorithms based on convolutional neural networks are widely applied in scenarios such as image processing and speech recognition, bringing great convenience to daily life. At the same time, the huge amount of convolution computation at the core of these algorithms poses challenges to the performance and cost of related circuits such as neural network processors.
Convolution calculation consists mainly of addition and multiplication. In the related art, the multiplication part is generally implemented with multipliers, but a multiplier occupies a large circuit area and consumes considerable power, which hinders performance improvement and cost reduction. How to implement convolution calculation with a smaller circuit area and lower power consumption has therefore become an urgent technical problem.
Disclosure of Invention
In view of the above, one or more embodiments of the present disclosure provide a convolution calculation circuit, a neural network processor, and a convolution calculation method.
To achieve the above object, one or more embodiments of the present disclosure provide the following technical solutions:
according to a first aspect of one or more embodiments of the present specification, a convolution calculation circuit is provided, which includes a data grouping module, a convolution calculation module, and a summation module, wherein the convolution calculation module further includes an index generation sub-module, a weight query sub-module, and a shift addition sub-module;
the data grouping module divides a plurality of original data to be processed into a plurality of data groups based on the corresponding relation between the original data and the weight in the convolution kernel, and transmits the original data belonging to the same data group to the same convolution calculation module;
in each convolution calculation module, the index generation submodule generates a plurality of index data based on the numerical values of all original data in the same data group on the same bit position, and transmits the index data to the weight query submodule;
the weight value query submodule queries a target weight value corresponding to each index data in a weight value table and transmits the target weight value to the shift addition module; the weight value table in each convolution calculation module is configured in advance based on the weight value in the convolution kernel;
the shift addition submodule is used for shifting each target weight value based on the bit of the index data in the original data, adding all the shifted target weight values to obtain an intermediate convolution value, and transmitting the intermediate convolution value to the summation module;
and the summation module is used for summing all the intermediate convolution values transmitted by the convolution calculation modules to obtain a final convolution value.
According to a second aspect of one or more embodiments herein, there is provided a neural network processor, which performs convolution calculations using the circuitry of the first aspect.
According to a third aspect of one or more embodiments of the present specification, there is provided a convolution calculation method including:
dividing a plurality of original data to be processed into a plurality of data groups based on the corresponding relation between the original data and the weight in the convolution kernel;
for each data group, generating a plurality of index data based on the values of the original data in the data group at the same bit position;
for each index datum, querying, in the weight table of the data group to which the index datum belongs, the target weight corresponding to the index datum; the weight table of each data group is pre-configured based on the weights in the convolution kernel;
and shifting each target weight value based on the bit of the index data in the original data, and adding all the shifted target weight values to obtain a final convolution value.
As can be seen from the above description, the convolution calculating circuit in this specification, first, performs data grouping division on original data through a data grouping module; meanwhile, each data group is provided with a convolution calculation module, in each convolution calculation module, index data is generated by the original data of the same data group through an index generation submodule, then a target weight corresponding to each index data is searched through a weight query submodule, and all the target weights are subjected to shift addition through a shift addition submodule to obtain an intermediate convolution value; and finally, summing all the intermediate convolution values through a summing module to obtain a final convolution value.
According to the scheme, the part related to multiplication in the convolution calculation is successfully converted into the weight index and the shift addition after the data grouping, so that the use of a multiplier in a convolution calculation circuit is avoided, the circuit area is reduced, the circuit power consumption is reduced, and the optimization of performance and cost is facilitated.
Drawings
To describe the embodiments more clearly, the drawings required for the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present specification, and those skilled in the art may derive other drawings from them without inventive effort.
Fig. 1 is a schematic structural diagram of a convolution calculating circuit according to an exemplary embodiment.
Fig. 2 is a schematic structural diagram of a convolution calculation module according to an exemplary embodiment.
FIG. 3 is a flow chart of a convolution calculation method provided by an exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of the present specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of one or more embodiments of the specification, as detailed in the claims which follow.
It should be noted that: in other embodiments, the steps of the corresponding methods are not necessarily performed in the order shown and described herein. In some other embodiments, the method may include more or fewer steps than those described herein. Moreover, a single step described in this specification may be broken down into multiple steps for description in other embodiments; multiple steps described in this specification may be combined into a single step in other embodiments.
At present, deep learning algorithms widely applied to various scenes such as image processing, voice recognition and the like are mostly realized based on convolutional neural network models, and the algorithm core of the deep learning algorithms is actually a large amount of convolutional calculation positioned on a convolutional layer of a neural network.
Convolution calculation mainly consists of addition and multiplication. Taking image processing as an example, when a convolution kernel of size 3 × 3,

W = [ W00 W01 W02 ; W10 W11 W12 ; W20 W21 W22 ],

is applied to a pixel region

M = [ M00 M01 M02 ; M10 M11 M12 ; M20 M21 M22 ],

the convolution value R at the center pixel M11 is calculated according to the following formula: R = M00×W00 + M01×W01 + M02×W02 + M10×W10 + M11×W11 + M12×W12 + M20×W20 + M21×W21 + M22×W22; where Wpq is a weight in the convolution kernel, Mpq is the image data (for example, the gray-scale value) at a pixel point, and p and q are natural numbers.
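As a point of reference, the direct multiply-accumulate form above can be sketched in Python (illustrative only; not part of the patent's circuit):

```python
# Illustrative sketch: the direct form of the 3 x 3 convolution value R,
# computed with 9 multiplications and 8 additions (the multiplier-based
# approach the patent seeks to avoid).
def conv3x3_direct(M, W):
    """M and W are 3 x 3 number grids; returns the convolution value R
    for the center pixel M[1][1]."""
    return sum(M[p][q] * W[p][q] for p in range(3) for q in range(3))
```

For example, with all weights equal to 1 the result is simply the sum of the nine data values.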
In the related art, the circuit for performing the convolution calculation is generally implemented based on adders and multipliers, for example, the calculation of the convolution value R can be implemented by combining 9 multipliers with 8 adders, but this solution is not favorable for optimizing the performance and cost of the convolution calculation circuit because the multiplier is a device with large circuit area and high power consumption. With the increasingly complex and increasing requirements of the algorithm, how to implement convolution calculation with smaller circuit area and lower power consumption becomes an urgent technical problem to be solved.
In view of the above, the present specification provides a convolution calculation circuit, which can be used to execute various deep learning algorithms based on a convolutional neural network.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a convolution calculating circuit according to an exemplary embodiment of the present disclosure.
The convolution calculation circuit comprises a data grouping module, a convolution calculation module and a summation module; the convolution calculation module also comprises an index generation sub-module, a weight inquiry sub-module and a shift addition sub-module.
In a convolution calculation circuit, the number of data grouping modules 10 may be 1, the number of summation modules 30 may also be 1, and the number of convolution calculation modules is related to the size of the convolution kernel in the convolutional neural network. In an alternative implementation, if the size of the convolution kernel is n × n, the number of convolution calculation modules disposed in the convolution calculation circuit may be n, respectively denoted as convolution calculation module 20, convolution calculation module 21, …, convolution calculation module 2(n−1), where n is a positive integer greater than 1.
Taking the common case of a 3 × 3 convolution kernel as an example, the convolution calculation circuit includes 1 data grouping module, 3 convolution calculation modules, and 1 summation module, respectively denoted as data grouping module 10, convolution calculation module 20, convolution calculation module 21, convolution calculation module 22, and summation module 30.
It should be noted that, in the case that the size of the convolution kernel is 5 × 5 or more, more convolution calculation modules may be provided in the convolution calculation circuit, or a large convolution kernel may be converted into a plurality of small convolution kernels for processing, for example, a convolution kernel of 5 × 5 may be converted into 4 convolution kernels of 3 × 3 for processing, so as to reduce the complexity of the circuit and improve the applicability of the circuit.
In terms of connection, the data grouping module 10 may be connected to the index generation submodule 2i1 in each convolution calculation module 2i, and the summation module 30 may be connected to the shift addition submodule 2i3 in each convolution calculation module 2i; within each convolution calculation module 2i, the index generation submodule 2i1, the weight query submodule 2i2, and the shift addition submodule 2i3 may be connected in sequence, where i is a natural number between 0 and n−1.
The following is a further detailed description of each module and sub-module in the convolution calculation circuit.
The data grouping module 10 divides a plurality of original data to be processed into a plurality of data groups based on the corresponding relationship between the original data and the weight in the convolution kernel, and transmits the original data belonging to the same data group to the same convolution calculation module 2 i.
Corresponding relations exist between a plurality of original data to be processed and each weight in the convolution kernel; taking image processing as an example, the original data may be image data on each pixel point of an image to be processed, and in one convolution calculation, each weight in a convolution kernel of 3 × 3 corresponds to image data on each pixel point on a pixel area of 3 × 3 one to one. It can be understood that if the scene is not the image processing scene, the original data may also form a corresponding relationship with each weight in the convolution kernel based on a specific algorithm, and details are not repeated here depending on the algorithm.
After receiving the incoming raw data, the data grouping module 10 may divide the raw data into a plurality of data groups based on the correspondence between the raw data and the weights in the convolution kernel. In an alternative implementation, the data grouping module 10 may divide the original data corresponding to the same row weight in the convolution kernel into the same data grouping; in another alternative implementation, the data grouping module 10 may also divide the original data corresponding to the same column weight in the convolution kernel into the same data grouping.
After dividing the plurality of original data into a plurality of data groups, the data grouping module 10 may transmit the original data belonging to the same data group to the index generation submodule 2i1 of the same convolution calculation module 2i, while original data belonging to different data groups are transmitted to different convolution calculation modules.
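The grouping rules described above can be sketched as follows (hypothetical helper names; the patent specifies only the behavior, grouping by kernel row or, alternatively, by kernel column):

```python
# Hypothetical sketch of the data grouping module: original data whose
# corresponding convolution-kernel weights lie in the same row (or the same
# column) are routed to the same data group, one group per convolution
# calculation module.
def group_by_row(region):
    """region is an n x n block of original data; returns n row groups."""
    return [list(row) for row in region]

def group_by_column(region):
    """Alternative implementation: one data group per kernel column."""
    return [list(col) for col in zip(*region)]
```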
Taking the convolution kernel

W = [ W00 W01 W02 ; W10 W11 W12 ; W20 W21 W22 ]

and the corresponding original data

M = [ M00 M01 M02 ; M10 M11 M12 ; M20 M21 M22 ]

as an example, the data grouping module 10 may transmit the original data M00, M01, M02 corresponding to the first-row weights of the convolution kernel to the index generation submodule 201 of convolution calculation module 20, transmit the original data M10, M11, M12 corresponding to the second-row weights to the index generation submodule 211 of convolution calculation module 21, and transmit the original data M20, M21, M22 corresponding to the third-row weights to the index generation submodule 221 of convolution calculation module 22.
In each convolution calculation module 2i, the index generation submodule 2i1 generates a plurality of index data based on the values of the original data of the same data group at the same bit position, and transmits the index data to the weight query submodule 2i2.
After receiving a plurality of original data belonging to the same data group from the data grouping module 10, the index generation submodule 2i1 may generate a plurality of index data based on the values of the original data at the same bit position. In an alternative implementation, if the data grouping module 10 groups the original data corresponding to the same-row weights of the convolution kernel, the index generation submodule 2i1 may take the values of the original data at the same bit position and arrange them, in the order of the original data corresponding to the same-row weights, into the index datum for that bit position, thereby generating an index datum for each bit position. It can be understood that, if the data grouping module 10 groups the original data corresponding to the same-column weights of the convolution kernel, the index generation submodule 2i1 may likewise arrange the values of the original data at the same bit position in the order of the original data corresponding to the same-column weights.
In the case that the size of the convolution kernel is n × n and the bit width of the original data is m, the n original data in the same data group generate m index data of bit width n, corresponding respectively to bit[m−1] through bit[0], where m is a positive integer.
Referring to Table 1 below and continuing the previous example, assume that the bit width of the original data is 8 and that the data grouping module 10 transmits the original data M00 (0100 1000), M01 (1010 0111), and M02 (0001 1010). The index generation submodule 201 of convolution calculation module 20 may take their values at each bit position and arrange them so that the values of M00, M01, and M02 occupy bit[0], bit[1], and bit[2] of the index datum, respectively, thereby generating one index datum per bit position of the original data; the 8 3-bit index data corresponding to bit[7] through bit[0] are then transmitted to the weight query submodule 202 of convolution calculation module 20.

Bit position      | bit[7] | bit[6] | bit[5] | bit[4] | bit[3] | bit[2] | bit[1] | bit[0]
M00 (0100 1000)   |   0    |   1    |   0    |   0    |   1    |   0    |   0    |   0
M01 (1010 0111)   |   1    |   0    |   1    |   0    |   0    |   1    |   1    |   1
M02 (0001 1010)   |   0    |   0    |   0    |   1    |   1    |   0    |   1    |   0
Index data        |  010   |  001   |  010   |  100   |  101   |  010   |  110   |  010

TABLE 1
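The per-bit index construction illustrated above can be sketched as follows (hypothetical function name; bit ordering follows the description, with the first datum of the group supplying index bit[0]):

```python
# Hypothetical sketch of the index generation submodule: for each bit
# position k of the original data, one index datum packs the k-th bits of
# the grouped original data, with the first datum in the group occupying
# index bit[0], the second bit[1], and so on.
def make_index_data(group, bit_width=8):
    """group = [M0, M1, M2, ...]; returns index data for bit[m-1] .. bit[0]."""
    indices = []
    for k in range(bit_width - 1, -1, -1):      # from bit[m-1] down to bit[0]
        idx = 0
        for j, value in enumerate(group):
            idx |= ((value >> k) & 1) << j      # datum j supplies index bit j
        indices.append(idx)
    return indices
```

With the data of Table 1 (M00 = 0100 1000, M01 = 1010 0111, M02 = 0001 1010) this yields the index sequence 010, 001, 010, 100, 101, 010, 110, 010 from bit[7] down to bit[0], matching the table.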
Similarly, after receiving the original data M10, M11, M12 from the data grouping module 10, the index generation submodule 211 of convolution calculation module 21 may generate, from M10, M11, M12, the 8 index data corresponding to bit[7] through bit[0], which are then transmitted to the weight query submodule 212 of convolution calculation module 21.

After receiving the original data M20, M21, M22 from the data grouping module 10, the index generation submodule 221 of convolution calculation module 22 may generate, from M20, M21, M22, the 8 index data corresponding to bit[7] through bit[0], which are then transmitted to the weight query submodule 222 of convolution calculation module 22.
In each convolution calculation module 2i, the weight query submodule 2i2 queries, in a weight table, the target weight corresponding to each index datum and transmits the target weight to the shift addition submodule 2i3; the weight table in each convolution calculation module is configured in advance based on the weights in the convolution kernel.
After the weight query submodule 2i2 receives the index data corresponding to each bit and transmitted from the index generation submodule 2i1, the weight query submodule 2i2 may query a target weight corresponding to the index data in a weight table pre-configured by the convolution calculation module 2 i.
In the case of a convolution kernel size of n × n, the weight table contains 2^n target weights, corresponding to the 2^n possible index values 0 through 2^n − 1; the bit width of the target weight corresponding to each index datum is the same as the bit width of the weights in the convolution kernel, which is generally the same as the bit width of the original data. Because the convolution kernel is generally unchanged during convolution calculation, the weight table in each convolution calculation module, once configured in advance, can be reused for different original data. With an original-data bit width of m, the target weights corresponding to the m index data are queried; index data at different bit positions may have the same value, in which case the queried target weights are the same and only the subsequent shift processing differs.
Referring to table 3 below, based on the previous example, assuming that the weight value table of the convolution calculation module 20 is shown in table 2 below, after receiving 8 3-bit index data respectively corresponding to bits [7] to [0] of the original data, which are transmitted from the index generation sub-module 201, the weight query sub-module 202 of the convolution calculation module 20 may query to obtain 8 target weights respectively corresponding to bits [7] to [0], and then transmit the target weights to the shift addition sub-module 203 of the convolution calculation module 20.
Index data    | 000  | 001  | 010  | 011  | 100  | 101  | 110  | 111
Target weight | W000 | W001 | W010 | W011 | W100 | W101 | W110 | W111

TABLE 2

Bit position  | bit[7] | bit[6] | bit[5] | bit[4] | bit[3] | bit[2] | bit[1] | bit[0]
Index data    |  010   |  001   |  010   |  100   |  101   |  010   |  110   |  010
Target weight |  W010  |  W001  |  W010  |  W100  |  W101  |  W010  |  W110  |  W010

TABLE 3
Similarly, after receiving 8 index data transmitted by the index generation sub-module 211, the weight query sub-module 212 of the convolution calculation module 21 may find corresponding 8 target weights, and then transmit the 8 target weights to the shift addition sub-module 213 of the convolution calculation module 21.
After receiving the 8 index data transmitted by the index generation sub-module 221, the weight query sub-module 222 of the convolution calculation module 22 may search the corresponding 8 target weights, and then transmit the 8 target weights to the shift addition sub-module 223 of the convolution calculation module 22.
Referring to fig. 2, fig. 2 is a schematic structural diagram of a convolution calculating module according to an exemplary embodiment of the present disclosure.
In an alternative implementation, the convolution calculation module 2i further includes a weight table configuration module 2i 4.
In terms of connection relation, the weight table configuration module 2i4 is connected to the weight query module 2i2 in the convolution calculation module 2 i.
In the case that the data grouping module 10 groups the original data corresponding to the same-row weights of the convolution kernel, the weight table configuration module 2i4 may compute, from the same-row weights of the convolution kernel and according to the combination-addition rule corresponding to the index data, the target weight corresponding to each index datum, thereby configuring the weight table of the weight query submodule 2i2. It can be understood that, where the data grouping module 10 groups the original data corresponding to the same-column weights, the target weights in the weight table are computed from the same-column weights of the convolution kernel.
It should be noted that the weight table configuration modules of different convolution calculation modules use different rows of weights of the convolution kernel, corresponding to the original data processed by that convolution calculation module. Based on the foregoing example, convolution calculation module 20, which processes the original data M00, M01, M02, computes its weight table from the first-row weights W00, W01, W02 of the convolution kernel; convolution calculation module 21, which processes the original data M10, M11, M12, computes its weight table from the second-row weights W10, W11, W12; and convolution calculation module 22, which processes the original data M20, M21, M22, computes its weight table from the third-row weights W20, W21, W22.
Referring to Table 5 below and continuing the previous example, assume that the combination-addition rule corresponding to the index data is as shown in Table 4 below. After receiving the same-row weights W00, W01, W02 of the convolution kernel, the weight table configuration module 204 of convolution calculation module 20 may determine, based on the combination-addition rule, the target weights corresponding to index data 000 through 111.

Index data                | 000 | 001 | 010 | 011     | 100 | 101     | 110     | 111
Combination-addition rule |  0  | W00 | W01 | W00+W01 | W02 | W00+W02 | W01+W02 | W00+W01+W02

TABLE 4

Index data    | 000      | 001       | 010       | 011           | 100       | 101           | 110           | 111
Target weight | W000 = 0 | W001 = W00 | W010 = W01 | W011 = W00+W01 | W100 = W02 | W101 = W00+W02 | W110 = W01+W02 | W111 = W00+W01+W02

TABLE 5
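The combination-addition rule of Tables 4 and 5 amounts to: the target weight for an index value is the sum of the row weights selected by the 1-bits of the index. A minimal sketch, assuming this reading (hypothetical function name):

```python
# Hypothetical sketch of the weight table configuration module: for each of
# the 2**n possible index values, the target weight is the sum of the row
# weights whose corresponding index bit is 1 (the combination-addition rule).
def configure_weight_table(row_weights):
    """row_weights = [W0, W1, ..., W(n-1)]; returns a table of 2**n entries."""
    n = len(row_weights)
    return [sum(w for j, w in enumerate(row_weights) if (idx >> j) & 1)
            for idx in range(2 ** n)]
```

For example, for hypothetical row weights (1, 2, 4) the table is simply [0, 1, 2, …, 7], since each index value selects the matching binary combination.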
Similarly, upon receiving the same-row weights of the convolution kernel, the weight table configuration module 214 of convolution calculation module 21 may determine the weight table as shown in Table 6 below.

Index data    | 000 | 001 | 010 | 011     | 100 | 101     | 110     | 111
Target weight |  0  | W10 | W11 | W10+W11 | W12 | W10+W12 | W11+W12 | W10+W11+W12

TABLE 6
Upon receiving the same-row weights of the convolution kernel, the weight table configuration module 224 of convolution calculation module 22 may determine the weight table as shown in Table 7 below.

Index data    | 000 | 001 | 010 | 011     | 100 | 101     | 110     | 111
Target weight |  0  | W20 | W21 | W20+W21 | W22 | W20+W22 | W21+W22 | W20+W21+W22

TABLE 7
In each convolution calculation module 2i, the shift addition submodule 2i3 shifts each target weight based on the bit of the index data in the original data, adds all the shifted target weights to obtain an intermediate convolution value, and transmits the intermediate convolution value to the summation module 30.
After receiving the target weight corresponding to each index datum from the weight query submodule 2i2, the shift addition submodule 2i3 may left-shift each target weight by a number of bits equal to the bit position of its index datum in the original data, then add all the shifted target weights to obtain the intermediate convolution value Ri of convolution calculation module 2i, and transmit the intermediate convolution value Ri to the summation module 30.
Based on the previous example, after receiving from the weight query submodule 202 the 8 target weights shown in Table 3, respectively corresponding to bit[7] through bit[0] of the original data, the shift addition submodule 203 of convolution calculation module 20 may left-shift the target weight W010 corresponding to bit[7] by 7 bits to obtain W'010, left-shift the target weight W001 corresponding to bit[6] by 6 bits to obtain W'001, and so on; adding the 8 shifted target weights then yields the intermediate convolution value R0 of convolution calculation module 20: R0 = W'010 + W'001 + W'010 + W'100 + W'101 + W'010 + W'110 + W'010, where each W' denotes the target weight after its respective shift.
Similarly, after receiving the 8 target weights from the weight query submodule 212, the shift addition submodule 213 of convolution calculation module 21 may left-shift each target weight according to its corresponding bit position and add the 8 shifted target weights to obtain the intermediate convolution value R1 of convolution calculation module 21.

After receiving the 8 target weights from the weight query submodule 222, the shift addition submodule 223 of convolution calculation module 22 may left-shift each target weight according to its corresponding bit position and add the 8 shifted target weights to obtain the intermediate convolution value R2 of convolution calculation module 22.
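The shift-add step can be sketched as follows (hypothetical names; the index data are supplied most-significant bit position first, as in Table 3):

```python
# Hypothetical sketch of the shift addition submodule: each target weight,
# looked up by its index datum, is left-shifted by the bit position of that
# index datum in the original data, and the shifted weights are summed.
def shift_add(weight_table, indices):
    """indices are ordered bit[m-1] .. bit[0]; returns the intermediate
    convolution value R_i of one convolution calculation module."""
    m = len(indices)
    return sum(weight_table[idx] << (m - 1 - k) for k, idx in enumerate(indices))
```

With the weight table for hypothetical row weights (1, 2, 4), i.e. [0, 1, 2, 3, 4, 5, 6, 7], and the index sequence of Table 3, the result is 510, which equals the direct dot product 1×72 + 2×167 + 4×26 of the row weights with the data of Table 1.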
The summation module 30 is configured to sum all the intermediate convolution values transmitted by the convolution calculation modules 2i to obtain a final convolution value.
After receiving the intermediate convolution value Ri from each convolution calculation module 2i, the summation module 30 sums all the intermediate convolution values to obtain the final convolution value R, i.e., the sum of all the intermediate convolution values Ri.
Following the foregoing example, the convolution calculation module 20 obtains the intermediate convolution value R0 from the original data M00, M01, M02; the convolution calculation module 21 obtains the intermediate convolution value R1 from the original data M10, M11, M12; and the convolution calculation module 22 obtains the intermediate convolution value R2 from the original data M20, M21, M22. Finally, the summation module 30 sums R0, R1 and R2 to obtain the final convolution value R.
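The full scheme can be sketched end to end in Python (an illustrative sketch, not taken from the patent; all names are hypothetical, and 8-bit unsigned original data are assumed). Because the per-bit table lookup distributes each row's weights over the binary expansion of the data, grouping by kernel row, indexing, and shift-adding reproduces the ordinary multiply-accumulate result:

```python
def build_lut(row_weights):
    # Target weight for each index: sum of the row's weights whose
    # corresponding index bit is 1 (the combined-addition rule); the
    # leftmost weight maps to the most significant index bit.
    n = len(row_weights)
    return [sum(w for j, w in enumerate(row_weights) if i & (1 << (n - 1 - j)))
            for i in range(1 << n)]

def row_partial(row_data, lut, bits=8):
    # Intermediate convolution value Ri: for each bit position, form the
    # index from the data bits, look up the target weight, shift, and add.
    acc = 0
    for b in range(bits):
        idx = 0
        for d in row_data:
            idx = (idx << 1) | ((d >> b) & 1)
        acc += lut[idx] << b
    return acc

def conv3x3(data_rows, weight_rows):
    # Final convolution value: sum of the row-wise intermediate values.
    return sum(row_partial(d, build_lut(w))
               for d, w in zip(data_rows, weight_rows))
```

For any inputs, conv3x3 equals the direct sum of products of each datum with its corresponding kernel weight, which is the equivalence the circuit relies on to dispense with multipliers.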
The specific circuit implementation of each module and submodule is not complex, requires no multiplier, and admits various alternative implementations; for example, the weight query submodule may be implemented with components such as data selectors, and the summation module with adders, which are not described in detail here.
As can be seen from the above description, the convolution calculation circuit in this specification first divides the original data into data groups via the data grouping module, with each data group served by its own convolution calculation module; within each convolution calculation module, the index generation submodule generates index data from the original data of the same data group, the weight query submodule then looks up the target weight corresponding to each index data, and the shift addition submodule shift-adds all the target weights to obtain an intermediate convolution value; finally, the summation module sums all the intermediate convolution values to obtain the final convolution value.
In this scheme, the multiplication-related part of the convolution calculation is converted, after data grouping, into weight indexing and shift addition, so that no multiplier is needed in the convolution calculation circuit; this reduces circuit area and power consumption and facilitates the optimization of performance and cost.
The present specification also provides a neural network processor, which can be used to execute various deep learning algorithms based on convolutional neural networks. The neural network processor performs convolution calculation using the convolution calculation circuit provided in this specification; at the hardware level, in addition to the convolution calculation circuit, the processor may also comprise a bus, interfaces, a memory, a nonvolatile memory, and other hardware required by its services.
The present specification also proposes a convolution calculation method, which can be used for executing various deep learning algorithms based on a convolution neural network.
Referring to fig. 3, fig. 3 is a flowchart illustrating a convolution calculation method according to an exemplary embodiment of the present disclosure.
The convolution calculation method can comprise the following specific steps:
step 302, dividing a plurality of original data to be processed into a plurality of data groups based on the corresponding relation between the original data and the weight in the convolution kernel;
step 304, aiming at each data packet, generating a plurality of index data based on the numerical values of the original data in the data packet on the same bit;
step 306, for each index data, querying a target weight corresponding to the index data in a weight value table of a data packet to which the index data belongs; the weight value table of each data packet is pre-configured based on the weight value in the convolution kernel;
and 308, shifting each target weight value based on the bit of the index data in the original data, and adding all the shifted target weight values to obtain a final convolution value.
It is understood that, in step 308, the adding and summing of all the shifted target weights may be decomposed into adding all the shifted target weights under each data packet to obtain an intermediate convolution value of the data packet, and adding the intermediate convolution values of all the data packets to obtain a final convolution value.
Optionally, in step 302, the dividing the raw data to be processed into data packets based on the correspondence between the raw data and the weights in the convolution kernel includes:
and dividing the original data corresponding to the same row weight in the convolution kernel into the same data packet.
Optionally, in step 304, the generating a plurality of index data based on the values of the original data in the data packet at the same bit includes:
and taking the numerical value of each original data on the same bit, and generating index data corresponding to the bit according to the sequence arrangement of the weights in the same row in the convolution kernel corresponding to the original data, thereby generating the index data corresponding to each bit respectively.
Optionally, the process of pre-configuring the weight value table of each data packet based on the weight value in the convolution kernel includes:
and based on a combined addition rule corresponding to the index data, calculating and determining a target weight corresponding to the index data by using weights in the same row in the convolution kernel, thereby determining the target weight corresponding to each index data respectively so as to configure a weight value table of a data group to which the original data corresponding to the weight belongs.
Optionally, when the method is applied to image processing, the plurality of original data to be processed are image data on each pixel point of the image to be processed.
As can be seen from the above description, the convolution calculation method in this specification converts the multiplication-related part of convolution calculation, after data grouping, into weight indexing and shift addition, thereby avoiding the use of a multiplier in the circuit implementing the convolution calculation, which helps reduce circuit area and power consumption and facilitates the optimization of performance and cost.
The methods illustrated in the above embodiments may be implemented in software, for example by reading and running a corresponding computer program; other implementations, such as logic devices or combinations of software and hardware, are not excluded. That is, the executing entity of the processing flow is not limited to logic units and may also be hardware or logic devices.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in one or more embodiments of the present description to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of one or more embodiments herein. The word "if" as used herein may be interpreted as "upon …" or "when …" or "in response to a determination", depending on the context.
The above description is only for the purpose of illustrating the preferred embodiments of the one or more embodiments of the present disclosure, and is not intended to limit the scope of the one or more embodiments of the present disclosure, and any modifications, equivalent substitutions, improvements, etc. made within the spirit and principle of the one or more embodiments of the present disclosure should be included in the scope of the one or more embodiments of the present disclosure.

Claims (10)

1. A convolution calculation circuit is characterized by comprising a data grouping module, a convolution calculation module and a summation module, wherein the convolution calculation module also comprises an index generation sub-module, a weight inquiry sub-module and a shift addition sub-module;
the data grouping module divides a plurality of original data to be processed into a plurality of data groups based on the corresponding relation between the original data and the weight in the convolution kernel, and transmits the original data belonging to the same data group to the same convolution calculation module;
in each convolution calculation module, the index generation submodule generates a plurality of index data based on the numerical values of all original data in the same data group on the same bit position, and transmits the index data to the weight query submodule;
the weight value query submodule queries a target weight value corresponding to each index datum in a weight value table and transmits the target weight value to the shift addition submodule; the weight value table in each convolution calculation module is configured in advance based on the weight value in the convolution kernel;
the shift addition submodule is used for performing shift processing on each target weight value based on the bit of the index data in the original data, adding all the shifted target weight values to obtain an intermediate convolution value, and transmitting the intermediate convolution value to the summation module;
and the summation module is used for summing all the intermediate convolution values transmitted by the convolution calculation modules to obtain a final convolution value.
2. The circuit according to claim 1, wherein the data grouping module, when dividing the plurality of original data to be processed into a plurality of data groups based on the correspondence between the original data and the weight in the convolution kernel, is specifically configured to:
and dividing the original data corresponding to the same row weight in the convolution kernel into the same data packet.
3. The circuit of claim 1, wherein the index generation submodule, when generating the plurality of index data based on the values of the original data in the same data packet at the same bit, is specifically configured to:
and taking the numerical value of each original data on the same bit, and generating index data corresponding to the bit according to the sequence arrangement of the weights in the same row in the convolution kernel corresponding to the original data, thereby generating the index data corresponding to each bit respectively.
4. The circuit of claim 1, wherein the convolution computation module further comprises a weight table configuration module:
and the weight value table configuration module is used for determining a target weight value corresponding to the index data through the weight value calculation of the same row in the convolution kernel based on the combined addition rule corresponding to the index data, so that the target weight value corresponding to each index data is determined to configure the weight value table of the convolution calculation module.
5. The circuit according to claim 1, wherein in a case where a convolution kernel size is n × n, the number of the convolution calculation modules included in the convolution calculation circuit is n; wherein n is a positive integer.
6. The circuit of claim 1, wherein the raw data to be processed is image data at each pixel point of an image to be processed when applied to image processing.
7. A neural network processor, wherein the circuit of any one of claims 1 to 6 is used to perform convolution calculations.
8. A method of convolution computation, the method comprising:
dividing a plurality of original data to be processed into a plurality of data groups based on the corresponding relation between the original data and the weight in the convolution kernel;
for each data packet, generating a plurality of index data based on the values of the original data in the data packet on the same bit;
for each index data, inquiring a target weight corresponding to the index data in a weight value table of a data packet to which the index data belongs; the weight value table of each data packet is pre-configured based on the weight value in the convolution kernel;
and shifting each target weight value based on the bit of the index data in the original data, and adding all the shifted target weight values to obtain a final convolution value.
9. The method of claim 8, wherein generating index data based on values of respective original data in the data packet at a same bit comprises:
and taking the numerical value of each original data on the same bit, and generating index data corresponding to the bit according to the sequence arrangement of the weights in the same row in the convolution kernel corresponding to the original data, thereby generating the index data corresponding to each bit respectively.
10. The method of claim 8, wherein the pre-configuring the weight table for each data packet based on the weights in the convolution kernel comprises:
and based on a combined addition rule corresponding to the index data, calculating and determining a target weight corresponding to the index data by using weights in the same row in the convolution kernel, thereby determining the target weight corresponding to each index data respectively so as to configure a weight value table of a data group to which the original data corresponding to the weight belongs.
CN202210326694.2A 2022-03-30 2022-03-30 Convolution calculation circuit, neural network processor and convolution calculation method Active CN114692833B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210326694.2A CN114692833B (en) 2022-03-30 2022-03-30 Convolution calculation circuit, neural network processor and convolution calculation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210326694.2A CN114692833B (en) 2022-03-30 2022-03-30 Convolution calculation circuit, neural network processor and convolution calculation method

Publications (2)

Publication Number Publication Date
CN114692833A true CN114692833A (en) 2022-07-01
CN114692833B CN114692833B (en) 2023-11-21

Family

ID=82141829

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210326694.2A Active CN114692833B (en) 2022-03-30 2022-03-30 Convolution calculation circuit, neural network processor and convolution calculation method

Country Status (1)

Country Link
CN (1) CN114692833B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106447034A (en) * 2016-10-27 2017-02-22 中国科学院计算技术研究所 Neutral network processor based on data compression, design method and chip
CN108205700A (en) * 2016-12-20 2018-06-26 上海寒武纪信息科技有限公司 Neural network computing device and method
CN108229648A (en) * 2017-08-31 2018-06-29 深圳市商汤科技有限公司 Convolutional calculation method and apparatus, electronic equipment, computer storage media
CN109409514A (en) * 2018-11-02 2019-03-01 广州市百果园信息技术有限公司 Fixed-point calculation method, apparatus, equipment and the storage medium of convolutional neural networks
US20190138891A1 (en) * 2017-11-09 2019-05-09 Samsung Electronics Co., Ltd. Apparatus and method with neural network
CN110059822A (en) * 2019-04-24 2019-07-26 苏州浪潮智能科技有限公司 One kind compressing quantization method based on channel packet low bit neural network parameter
CN110688158A (en) * 2017-07-20 2020-01-14 上海寒武纪信息科技有限公司 Computing device and processing system of neural network
CN111428863A (en) * 2020-03-23 2020-07-17 河海大学常州校区 Low-power-consumption convolution operation circuit based on approximate multiplier
CN112434801A (en) * 2020-10-30 2021-03-02 西安交通大学 Convolution operation acceleration method for carrying out weight splitting according to bit precision
WO2021232422A1 (en) * 2020-05-22 2021-11-25 深圳市大疆创新科技有限公司 Neural network arithmetic device and control method thereof
US20220092399A1 (en) * 2021-12-06 2022-03-24 Intel Corporation Area-Efficient Convolutional Block


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HANTING CHEN et al.: "AdderNet: Do we really need multiplications in deep learning?", 2020 IEEE/CVF CVPR *

Also Published As

Publication number Publication date
CN114692833B (en) 2023-11-21

Similar Documents

Publication Publication Date Title
CN111221578B (en) Computing device and computing method
CN110689125A (en) Computing device
CN111898733B (en) Deep separable convolutional neural network accelerator architecture
CN108733348B (en) Fused vector multiplier and method for performing operation using the same
CN109409510B (en) Neuron circuit, chip, system and method thereof, and storage medium
CN112434801B (en) Convolution operation acceleration method for carrying out weight splitting according to bit precision
CN113033794B (en) Light weight neural network hardware accelerator based on deep separable convolution
CN109165006B (en) Design optimization and hardware implementation method and system of Softmax function
CN111814957B (en) Neural network operation method and related equipment
CN110543936A (en) Multi-parallel acceleration method for CNN full-connection layer operation
Arredondo-Velazquez et al. A streaming architecture for Convolutional Neural Networks based on layer operations chaining
CN109240644B (en) Local search method and circuit for Yixin chip
CN106682258A (en) Method and system for multi-operand addition optimization in high-level synthesis tool
CN113918120A (en) Computing device, neural network processing apparatus, chip, and method of processing data
CN114692833B (en) Convolution calculation circuit, neural network processor and convolution calculation method
CN109379191B (en) Dot multiplication operation circuit and method based on elliptic curve base point
CN114758209B (en) Convolution result obtaining method and device, computer equipment and storage medium
CN1207879C (en) Viterbi equalization using precalculated metrics increments
CN115544438A (en) Twiddle factor generation method and device in digital communication system and computer equipment
CN111738432B (en) Neural network processing circuit supporting self-adaptive parallel computation
US11297127B2 (en) Information processing system and control method of information processing system
CN110807479A (en) Neural network convolution calculation acceleration method based on Kmeans algorithm
CN111260036A (en) Neural network acceleration method and device
CN113377333A (en) Hardware computing system and method for solving N root openings of complex numbers based on parabolic synthesis method
Yin et al. A reconfigurable accelerator for generative adversarial network training based on FPGA

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
CB02 Change of applicant information

Address after: Room 803, Unit 2, No. 2515 Huandao North Road, Hengqin New District, Zhuhai City, Guangdong Province, 519000

Applicant after: Guangdong Qixin Semiconductor Co.,Ltd.

Address before: Room 2206, block C, building 2, Qiancheng Binhai garden, No. 3, Haicheng Road, Mabu community, Xixiang street, Bao'an District, Shenzhen, Guangdong 518103

Applicant before: Shenzhen Qixin Semiconductor Co.,Ltd.

GR01 Patent grant
GR01 Patent grant