CN114692833A - Convolution calculation circuit, neural network processor and convolution calculation method - Google Patents

Convolution calculation circuit, neural network processor and convolution calculation method

Info

Publication number
CN114692833A
Authority
CN
China
Prior art keywords
data
convolution
module
weight
index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210326694.2A
Other languages
Chinese (zh)
Other versions
CN114692833B (en)
Inventor
卢知伯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Qixin Semiconductor Co ltd
Original Assignee
Shenzhen Qixin Semiconductor Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Qixin Semiconductor Co ltd filed Critical Shenzhen Qixin Semiconductor Co ltd
Priority to CN202210326694.2A priority Critical patent/CN114692833B/en
Publication of CN114692833A publication Critical patent/CN114692833A/en
Application granted granted Critical
Publication of CN114692833B publication Critical patent/CN114692833B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2433Query languages
    • G06F16/244Grouping and aggregation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Neurology (AREA)
  • Complex Calculations (AREA)

Abstract

One or more embodiments of the present specification propose a convolution calculation circuit, a neural network processor, and a convolution calculation method. The circuit includes: a data grouping module, which divides the original data to be processed into a plurality of data groups based on the correspondence between the original data and the weights in the convolution kernel; in each convolution calculation module, an index generation submodule, which generates a plurality of index data based on the values of the original data of the same data group at the same bit position; a weight query submodule, which looks up in a weight table the target weight corresponding to each index datum; a shift addition submodule, which shifts each target weight according to the bit position of its index datum in the original data and adds all the shifted target weights to obtain an intermediate convolution value; and a summation module, which sums the intermediate convolution values transmitted by the convolution calculation modules to obtain the final convolution value. The circuit can reduce both circuit area and power consumption.

Description

Convolution calculation circuit, neural network processor and convolution calculation method
Technical Field
One or more embodiments of the present disclosure relate to the field of artificial intelligence technologies, and in particular, to a convolution calculation circuit, a neural network processor, and a convolution calculation method.
Background
With the continuous development of artificial intelligence, deep learning algorithms based on convolutional neural networks are widely applied in scenarios such as image processing and speech recognition, bringing great convenience to daily life. At the same time, the huge amount of convolution computation at the core of these algorithms poses challenges to the performance and cost of related circuits such as neural network processors.
Convolution calculation consists mainly of addition and multiplication. In the related art, the multiplication part is generally implemented with multipliers, but a multiplier occupies a large circuit area and consumes considerable power, which hinders performance improvement and cost reduction. How to implement convolution calculation with a smaller circuit area and lower power consumption has therefore become an urgent technical problem.
Disclosure of Invention
In view of the above, one or more embodiments of the present disclosure provide a convolution calculation circuit, a neural network processor, and a convolution calculation method.
To achieve the above object, one or more embodiments of the present disclosure provide the following technical solutions:
according to a first aspect of one or more embodiments of the present specification, a convolution calculation circuit is provided, which includes a data grouping module, a convolution calculation module, and a summation module, wherein the convolution calculation module further includes an index generation sub-module, a weight query sub-module, and a shift addition sub-module;
the data grouping module divides a plurality of original data to be processed into a plurality of data groups based on the corresponding relation between the original data and the weight in the convolution kernel, and transmits the original data belonging to the same data group to the same convolution calculation module;
in each convolution calculation module, the index generation submodule generates a plurality of index data based on the numerical values of all original data in the same data group on the same bit position, and transmits the index data to the weight query submodule;
the weight value query submodule queries a target weight value corresponding to each index data in a weight value table and transmits the target weight value to the shift addition module; the weight value table in each convolution calculation module is configured in advance based on the weight value in the convolution kernel;
the shift addition submodule is used for shifting each target weight value based on the bit of the index data in the original data, adding all the shifted target weight values to obtain an intermediate convolution value, and transmitting the intermediate convolution value to the summation module;
and the summation module is used for summing all the intermediate convolution values transmitted by the convolution calculation modules to obtain a final convolution value.
According to a second aspect of one or more embodiments herein, there is provided a neural network processor, which performs convolution calculations using the circuitry of the first aspect.
According to a third aspect of one or more embodiments of the present specification, there is provided a convolution calculation method including:
dividing a plurality of original data to be processed into a plurality of data groups based on the corresponding relation between the original data and the weight in the convolution kernel;
for each data group, generating a plurality of index data based on the values of the original data in the data group at the same bit position;
for each index datum, querying, in the weight table of the data group to which the index datum belongs, the target weight corresponding to the index datum; the weight table of each data group is pre-configured based on the weights in the convolution kernel;
and shifting each target weight value based on the bit of the index data in the original data, and adding all the shifted target weight values to obtain a final convolution value.
As can be seen from the above description, the convolution calculating circuit in this specification, first, performs data grouping division on original data through a data grouping module; meanwhile, each data group is provided with a convolution calculation module, in each convolution calculation module, index data is generated by the original data of the same data group through an index generation submodule, then a target weight corresponding to each index data is searched through a weight query submodule, and all the target weights are subjected to shift addition through a shift addition submodule to obtain an intermediate convolution value; and finally, summing all the intermediate convolution values through a summing module to obtain a final convolution value.
According to the scheme, the part related to multiplication in the convolution calculation is successfully converted into the weight index and the shift addition after the data grouping, so that the use of a multiplier in a convolution calculation circuit is avoided, the circuit area is reduced, the circuit power consumption is reduced, and the optimization of performance and cost is facilitated.
Drawings
To describe the embodiments more clearly, the drawings required for the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present specification, and those skilled in the art may derive other drawings from them without inventive effort.
Fig. 1 is a schematic structural diagram of a convolution calculating circuit according to an exemplary embodiment.
Fig. 2 is a schematic structural diagram of a convolution calculation module according to an exemplary embodiment.
FIG. 3 is a flow chart of a convolution calculation method provided by an exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of the present specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of one or more embodiments of the specification, as detailed in the claims which follow.
It should be noted that: in other embodiments, the steps of the corresponding methods are not necessarily performed in the order shown and described herein. In some other embodiments, the method may include more or fewer steps than those described herein. Moreover, a single step described in this specification may be broken down into multiple steps for description in other embodiments; multiple steps described in this specification may be combined into a single step in other embodiments.
At present, deep learning algorithms widely applied to various scenes such as image processing, voice recognition and the like are mostly realized based on convolutional neural network models, and the algorithm core of the deep learning algorithms is actually a large amount of convolutional calculation positioned on a convolutional layer of a neural network.
Convolution calculation mainly consists of addition and multiplication. Taking image processing as an example, when a convolution kernel of size 3 × 3,

W = [ W00 W01 W02 ; W10 W11 W12 ; W20 W21 W22 ],

is applied to a pixel region

M = [ M00 M01 M02 ; M10 M11 M12 ; M20 M21 M22 ],

the convolution value R at the center pixel M11 is calculated according to the following formula: R = M00×W00 + M01×W01 + M02×W02 + M10×W10 + M11×W11 + M12×W12 + M20×W20 + M21×W21 + M22×W22; where Wpq is a weight in the convolution kernel, Mpq is the image data (for example, the gray-scale value) at a pixel point, and p and q are natural numbers.
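As a point of reference, the direct multiply-accumulate form above can be sketched in Python (illustrative only; not part of the patent's circuit):

```python
# Illustrative sketch: the direct form of the 3 x 3 convolution value R,
# computed with 9 multiplications and 8 additions (the multiplier-based
# approach the patent seeks to avoid).
def conv3x3_direct(M, W):
    """M and W are 3 x 3 number grids; returns the convolution value R
    for the center pixel M[1][1]."""
    return sum(M[p][q] * W[p][q] for p in range(3) for q in range(3))
```

For example, with all weights equal to 1 the result is simply the sum of the nine data values.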
In the related art, the circuit for performing the convolution calculation is generally implemented based on adders and multipliers, for example, the calculation of the convolution value R can be implemented by combining 9 multipliers with 8 adders, but this solution is not favorable for optimizing the performance and cost of the convolution calculation circuit because the multiplier is a device with large circuit area and high power consumption. With the increasingly complex and increasing requirements of the algorithm, how to implement convolution calculation with smaller circuit area and lower power consumption becomes an urgent technical problem to be solved.
In view of the above, the present specification provides a convolution calculation circuit, which can be used to execute various deep learning algorithms based on a convolutional neural network.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a convolution calculating circuit according to an exemplary embodiment of the present disclosure.
The convolution calculation circuit comprises a data grouping module, a convolution calculation module and a summation module; the convolution calculation module also comprises an index generation sub-module, a weight inquiry sub-module and a shift addition sub-module.
In a convolution calculation circuit, the number of data grouping modules 10 may be 1, the number of summation modules 30 may also be 1, and the number of convolution calculation modules is related to the size of the convolution kernel in the convolutional neural network. In an alternative implementation, if the size of the convolution kernel is n × n, the number of convolution calculation modules disposed in the convolution calculation circuit may be n, respectively denoted as convolution calculation module 20, convolution calculation module 21, …, convolution calculation module 2(n−1), where n is a positive integer greater than 1.
Taking the common case of a 3 × 3 convolution kernel as an example, the convolution calculation circuit includes 1 data grouping module, 3 convolution calculation modules, and 1 summation module, respectively denoted as data grouping module 10, convolution calculation module 20, convolution calculation module 21, convolution calculation module 22, and summation module 30.
It should be noted that, in the case that the size of the convolution kernel is 5 × 5 or more, more convolution calculation modules may be provided in the convolution calculation circuit, or a large convolution kernel may be converted into a plurality of small convolution kernels for processing, for example, a convolution kernel of 5 × 5 may be converted into 4 convolution kernels of 3 × 3 for processing, so as to reduce the complexity of the circuit and improve the applicability of the circuit.
In terms of connection, the data grouping module 10 may be connected to the index generation submodule 2i1 in each convolution calculation module 2i, and the summation module 30 may be connected to the shift addition submodule 2i3 in each convolution calculation module 2i; within each convolution calculation module 2i, the index generation submodule 2i1, the weight query submodule 2i2, and the shift addition submodule 2i3 may be connected in sequence, where i is a natural number between 0 and n−1.
The following is a further detailed description of each module and sub-module in the convolution calculation circuit.
The data grouping module 10 divides a plurality of original data to be processed into a plurality of data groups based on the corresponding relationship between the original data and the weight in the convolution kernel, and transmits the original data belonging to the same data group to the same convolution calculation module 2 i.
Corresponding relations exist between a plurality of original data to be processed and each weight in the convolution kernel; taking image processing as an example, the original data may be image data on each pixel point of an image to be processed, and in one convolution calculation, each weight in a convolution kernel of 3 × 3 corresponds to image data on each pixel point on a pixel area of 3 × 3 one to one. It can be understood that if the scene is not the image processing scene, the original data may also form a corresponding relationship with each weight in the convolution kernel based on a specific algorithm, and details are not repeated here depending on the algorithm.
After receiving the incoming raw data, the data grouping module 10 may divide the raw data into a plurality of data groups based on the correspondence between the raw data and the weights in the convolution kernel. In an alternative implementation, the data grouping module 10 may divide the original data corresponding to the same row weight in the convolution kernel into the same data grouping; in another alternative implementation, the data grouping module 10 may also divide the original data corresponding to the same column weight in the convolution kernel into the same data grouping.
After dividing the plurality of original data into a plurality of data groups, the data grouping module 10 may transmit the original data belonging to the same data group to the index generation submodule 2i1 of the same convolution calculation module 2i, while original data belonging to different data groups are transmitted to different convolution calculation modules.
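The grouping rules described above can be sketched as follows (hypothetical helper names; the patent specifies only the behavior, grouping by kernel row or, alternatively, by kernel column):

```python
# Hypothetical sketch of the data grouping module: original data whose
# corresponding convolution-kernel weights lie in the same row (or the same
# column) are routed to the same data group, one group per convolution
# calculation module.
def group_by_row(region):
    """region is an n x n block of original data; returns n row groups."""
    return [list(row) for row in region]

def group_by_column(region):
    """Alternative implementation: one data group per kernel column."""
    return [list(col) for col in zip(*region)]
```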
Taking the convolution kernel

W = [ W00 W01 W02 ; W10 W11 W12 ; W20 W21 W22 ]

and the corresponding original data

M = [ M00 M01 M02 ; M10 M11 M12 ; M20 M21 M22 ]

as an example, the data grouping module 10 may transmit the original data M00, M01, M02 corresponding to the first-row weights of the convolution kernel to the index generation submodule 201 of convolution calculation module 20, transmit the original data M10, M11, M12 corresponding to the second-row weights to the index generation submodule 211 of convolution calculation module 21, and transmit the original data M20, M21, M22 corresponding to the third-row weights to the index generation submodule 221 of convolution calculation module 22.
In each convolution calculation module 2i, the index generation submodule 2i1 generates a plurality of index data based on the values of the original data of the same data group at the same bit position, and transmits the index data to the weight query submodule 2i2.
After receiving a plurality of original data belonging to the same data group from the data grouping module 10, the index generation submodule 2i1 may generate a plurality of index data based on the values of the original data at the same bit position. In an alternative implementation, if the data grouping module 10 groups the original data corresponding to the same-row weights of the convolution kernel, the index generation submodule 2i1 may take the values of the original data at the same bit position and arrange them, in the order of the original data corresponding to the same-row weights, into the index datum for that bit position, thereby generating an index datum for each bit position. It can be understood that, if the data grouping module 10 groups the original data corresponding to the same-column weights of the convolution kernel, the index generation submodule 2i1 may likewise arrange the values of the original data at the same bit position in the order of the original data corresponding to the same-column weights.
In the case that the size of the convolution kernel is n × n and the bit width of the original data is m, the n original data in the same data group generate m index data of bit width n, corresponding respectively to bit[m−1] through bit[0], where m is a positive integer.
Referring to Table 1 below and continuing the previous example, assume that the bit width of the original data is 8 and that the data grouping module 10 transmits the original data M00 (0100 1000), M01 (1010 0111), and M02 (0001 1010). The index generation submodule 201 of convolution calculation module 20 may take their values at each bit position and arrange them so that the values of M00, M01, and M02 occupy bit[0], bit[1], and bit[2] of the index datum, respectively, thereby generating one index datum per bit position of the original data; the 8 3-bit index data corresponding to bit[7] through bit[0] are then transmitted to the weight query submodule 202 of convolution calculation module 20.

Bit position      | bit[7] | bit[6] | bit[5] | bit[4] | bit[3] | bit[2] | bit[1] | bit[0]
M00 (0100 1000)   |   0    |   1    |   0    |   0    |   1    |   0    |   0    |   0
M01 (1010 0111)   |   1    |   0    |   1    |   0    |   0    |   1    |   1    |   1
M02 (0001 1010)   |   0    |   0    |   0    |   1    |   1    |   0    |   1    |   0
Index data        |  010   |  001   |  010   |  100   |  101   |  010   |  110   |  010

TABLE 1
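The per-bit index construction illustrated above can be sketched as follows (hypothetical function name; bit ordering follows the description, with the first datum of the group supplying index bit[0]):

```python
# Hypothetical sketch of the index generation submodule: for each bit
# position k of the original data, one index datum packs the k-th bits of
# the grouped original data, with the first datum in the group occupying
# index bit[0], the second bit[1], and so on.
def make_index_data(group, bit_width=8):
    """group = [M0, M1, M2, ...]; returns index data for bit[m-1] .. bit[0]."""
    indices = []
    for k in range(bit_width - 1, -1, -1):      # from bit[m-1] down to bit[0]
        idx = 0
        for j, value in enumerate(group):
            idx |= ((value >> k) & 1) << j      # datum j supplies index bit j
        indices.append(idx)
    return indices
```

With the data of Table 1 (M00 = 0100 1000, M01 = 1010 0111, M02 = 0001 1010) this yields the index sequence 010, 001, 010, 100, 101, 010, 110, 010 from bit[7] down to bit[0], matching the table.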
Similarly, after receiving the original data M10, M11, M12 from the data grouping module 10, the index generation submodule 211 of convolution calculation module 21 may generate, from M10, M11, M12, the 8 index data corresponding to bit[7] through bit[0], which are then transmitted to the weight query submodule 212 of convolution calculation module 21.

After receiving the original data M20, M21, M22 from the data grouping module 10, the index generation submodule 221 of convolution calculation module 22 may generate, from M20, M21, M22, the 8 index data corresponding to bit[7] through bit[0], which are then transmitted to the weight query submodule 222 of convolution calculation module 22.
In each convolution calculation module 2i, the weight query submodule 2i2 queries, in a weight table, the target weight corresponding to each index datum and transmits the target weight to the shift addition submodule 2i3; the weight table in each convolution calculation module is configured in advance based on the weights in the convolution kernel.
After the weight query submodule 2i2 receives the index data corresponding to each bit and transmitted from the index generation submodule 2i1, the weight query submodule 2i2 may query a target weight corresponding to the index data in a weight table pre-configured by the convolution calculation module 2 i.
In the case of a convolution kernel size of n × n, the weight table contains 2^n target weights, corresponding to the 2^n possible index values 0 through 2^n − 1; the bit width of the target weight corresponding to each index datum is the same as the bit width of the weights in the convolution kernel, which is generally the same as the bit width of the original data. Because the convolution kernel is generally unchanged during convolution calculation, the weight table in each convolution calculation module, once configured in advance, can be reused for different original data. With an original-data bit width of m, the target weights corresponding to the m index data are queried; index data at different bit positions may have the same value, in which case the queried target weights are the same and only the subsequent shift processing differs.
Referring to table 3 below, based on the previous example, assuming that the weight value table of the convolution calculation module 20 is shown in table 2 below, after receiving 8 3-bit index data respectively corresponding to bits [7] to [0] of the original data, which are transmitted from the index generation sub-module 201, the weight query sub-module 202 of the convolution calculation module 20 may query to obtain 8 target weights respectively corresponding to bits [7] to [0], and then transmit the target weights to the shift addition sub-module 203 of the convolution calculation module 20.
Index data    | 000  | 001  | 010  | 011  | 100  | 101  | 110  | 111
Target weight | W000 | W001 | W010 | W011 | W100 | W101 | W110 | W111

TABLE 2

Bit position  | bit[7] | bit[6] | bit[5] | bit[4] | bit[3] | bit[2] | bit[1] | bit[0]
Index data    |  010   |  001   |  010   |  100   |  101   |  010   |  110   |  010
Target weight |  W010  |  W001  |  W010  |  W100  |  W101  |  W010  |  W110  |  W010

TABLE 3
Similarly, after receiving 8 index data transmitted by the index generation sub-module 211, the weight query sub-module 212 of the convolution calculation module 21 may find corresponding 8 target weights, and then transmit the 8 target weights to the shift addition sub-module 213 of the convolution calculation module 21.
After receiving the 8 index data transmitted by the index generation sub-module 221, the weight query sub-module 222 of the convolution calculation module 22 may search the corresponding 8 target weights, and then transmit the 8 target weights to the shift addition sub-module 223 of the convolution calculation module 22.
Referring to fig. 2, fig. 2 is a schematic structural diagram of a convolution calculating module according to an exemplary embodiment of the present disclosure.
In an alternative implementation, the convolution calculation module 2i further includes a weight table configuration module 2i 4.
In terms of connection relation, the weight table configuration module 2i4 is connected to the weight query module 2i2 in the convolution calculation module 2 i.
In the case that the data grouping module 10 groups the original data corresponding to the same-row weights of the convolution kernel, the weight table configuration module 2i4 may compute, from the same-row weights of the convolution kernel and according to the combination-addition rule corresponding to the index data, the target weight corresponding to each index datum, thereby configuring the weight table of the weight query submodule 2i2. It can be understood that, where the data grouping module 10 groups the original data corresponding to the same-column weights, the target weights in the weight table are computed from the same-column weights of the convolution kernel.
It should be noted that the weight table configuration modules of different convolution calculation modules use different rows of weights of the convolution kernel, corresponding to the original data processed by that convolution calculation module. Based on the foregoing example, convolution calculation module 20, which processes the original data M00, M01, M02, computes its weight table from the first-row weights W00, W01, W02 of the convolution kernel; convolution calculation module 21, which processes the original data M10, M11, M12, computes its weight table from the second-row weights W10, W11, W12; and convolution calculation module 22, which processes the original data M20, M21, M22, computes its weight table from the third-row weights W20, W21, W22.
Referring to Table 5 below and continuing the previous example, assume that the combination-addition rule corresponding to the index data is as shown in Table 4 below. After receiving the same-row weights W00, W01, W02 of the convolution kernel, the weight table configuration module 204 of convolution calculation module 20 may determine, based on the combination-addition rule, the target weights corresponding to index data 000 through 111.

Index data                | 000 | 001 | 010 | 011     | 100 | 101     | 110     | 111
Combination-addition rule |  0  | W00 | W01 | W00+W01 | W02 | W00+W02 | W01+W02 | W00+W01+W02

TABLE 4

Index data    | 000      | 001       | 010       | 011           | 100       | 101           | 110           | 111
Target weight | W000 = 0 | W001 = W00 | W010 = W01 | W011 = W00+W01 | W100 = W02 | W101 = W00+W02 | W110 = W01+W02 | W111 = W00+W01+W02

TABLE 5
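The combination-addition rule of Tables 4 and 5 amounts to: the target weight for an index value is the sum of the row weights selected by the 1-bits of the index. A minimal sketch, assuming this reading (hypothetical function name):

```python
# Hypothetical sketch of the weight table configuration module: for each of
# the 2**n possible index values, the target weight is the sum of the row
# weights whose corresponding index bit is 1 (the combination-addition rule).
def configure_weight_table(row_weights):
    """row_weights = [W0, W1, ..., W(n-1)]; returns a table of 2**n entries."""
    n = len(row_weights)
    return [sum(w for j, w in enumerate(row_weights) if (idx >> j) & 1)
            for idx in range(2 ** n)]
```

For example, for hypothetical row weights (1, 2, 4) the table is simply [0, 1, 2, …, 7], since each index value selects the matching binary combination.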
Similarly, upon receiving the same-row weights of the convolution kernel, the weight table configuration module 214 of convolution calculation module 21 may determine the weight table as shown in Table 6 below.

Index data    | 000 | 001 | 010 | 011     | 100 | 101     | 110     | 111
Target weight |  0  | W10 | W11 | W10+W11 | W12 | W10+W12 | W11+W12 | W10+W11+W12

TABLE 6
Upon receiving the same-row weights of the convolution kernel, the weight table configuration module 224 of convolution calculation module 22 may determine the weight table as shown in Table 7 below.

Index data    | 000 | 001 | 010 | 011     | 100 | 101     | 110     | 111
Target weight |  0  | W20 | W21 | W20+W21 | W22 | W20+W22 | W21+W22 | W20+W21+W22

TABLE 7
In each convolution calculation module 2i, the shift addition submodule 2i3 shifts each target weight based on the bit of the index data in the original data, adds all the shifted target weights to obtain an intermediate convolution value, and transmits the intermediate convolution value to the summation module 30.
After receiving the target weight corresponding to each index datum from the weight query submodule 2i2, the shift addition submodule 2i3 may left-shift each target weight by a number of bits equal to the bit position of its index datum in the original data, then add all the shifted target weights to obtain the intermediate convolution value Ri of convolution calculation module 2i, and transmit the intermediate convolution value Ri to the summation module 30.
Based on the previous example, after receiving from the weight query submodule 202 the 8 target weights shown in Table 3, respectively corresponding to bit[7] through bit[0] of the original data, the shift addition submodule 203 of convolution calculation module 20 may left-shift the target weight W010 corresponding to bit[7] by 7 bits to obtain W'010, left-shift the target weight W001 corresponding to bit[6] by 6 bits to obtain W'001, and so on; adding the 8 shifted target weights then yields the intermediate convolution value R0 of convolution calculation module 20: R0 = W'010 + W'001 + W'010 + W'100 + W'101 + W'010 + W'110 + W'010, where each W' denotes the target weight after its respective shift.
Similarly, after receiving the 8 target weights from the weight query submodule 212, the shift addition submodule 213 of convolution calculation module 21 may left-shift each target weight according to its corresponding bit position and add the 8 shifted target weights to obtain the intermediate convolution value R1 of convolution calculation module 21.

After receiving the 8 target weights from the weight query submodule 222, the shift addition submodule 223 of convolution calculation module 22 may left-shift each target weight according to its corresponding bit position and add the 8 shifted target weights to obtain the intermediate convolution value R2 of convolution calculation module 22.
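The shift-add step can be sketched as follows (hypothetical names; the index data are supplied most-significant bit position first, as in Table 3):

```python
# Hypothetical sketch of the shift addition submodule: each target weight,
# looked up by its index datum, is left-shifted by the bit position of that
# index datum in the original data, and the shifted weights are summed.
def shift_add(weight_table, indices):
    """indices are ordered bit[m-1] .. bit[0]; returns the intermediate
    convolution value R_i of one convolution calculation module."""
    m = len(indices)
    return sum(weight_table[idx] << (m - 1 - k) for k, idx in enumerate(indices))
```

With the weight table for hypothetical row weights (1, 2, 4), i.e. [0, 1, 2, 3, 4, 5, 6, 7], and the index sequence of Table 3, the result is 510, which equals the direct dot product 1×72 + 2×167 + 4×26 of the row weights with the data of Table 1.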
The summation module 30 is configured to sum all the intermediate convolution values transmitted by the convolution calculation modules 2i to obtain a final convolution value.
After receiving the intermediate convolution value Ri from each convolution calculation module 2i, the summation module 30 sums all the intermediate convolution values to obtain the final convolution value R, i.e., the sum of all the intermediate convolution values Ri.
Following the foregoing example, the convolution calculation module 20 obtains the intermediate convolution value R0 from the original data M00, M01, M02; the convolution calculation module 21 obtains the intermediate convolution value R1 from the original data M10, M11, M12; and the convolution calculation module 22 obtains the intermediate convolution value R2 from the original data M20, M21, M22. Finally, the summation module 30 sums R0, R1 and R2 to obtain the final convolution value R.
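The full scheme can be sketched end to end in Python (an illustrative sketch, not taken from the patent; all names are hypothetical, and 8-bit unsigned original data are assumed). Because the per-bit table lookup distributes each row's weights over the binary expansion of the data, grouping by kernel row, indexing, and shift-adding reproduces the ordinary multiply-accumulate result:

```python
def build_lut(row_weights):
    # Target weight for each index: sum of the row's weights whose
    # corresponding index bit is 1 (the combined-addition rule); the
    # leftmost weight maps to the most significant index bit.
    n = len(row_weights)
    return [sum(w for j, w in enumerate(row_weights) if i & (1 << (n - 1 - j)))
            for i in range(1 << n)]

def row_partial(row_data, lut, bits=8):
    # Intermediate convolution value Ri: for each bit position, form the
    # index from the data bits, look up the target weight, shift, and add.
    acc = 0
    for b in range(bits):
        idx = 0
        for d in row_data:
            idx = (idx << 1) | ((d >> b) & 1)
        acc += lut[idx] << b
    return acc

def conv3x3(data_rows, weight_rows):
    # Final convolution value: sum of the row-wise intermediate values.
    return sum(row_partial(d, build_lut(w))
               for d, w in zip(data_rows, weight_rows))
```

For any inputs, conv3x3 equals the direct sum of products of each datum with its corresponding kernel weight, which is the equivalence the circuit relies on to dispense with multipliers.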
The specific circuit implementation of each module and submodule is not complex, requires no multiplier, and admits various alternative implementations; for example, the weight query submodule may be implemented with components such as data selectors, and the summation module with adders, which are not described in detail here.
As can be seen from the above description, the convolution calculation circuit in this specification first divides the original data into data groups via the data grouping module, with each data group served by its own convolution calculation module; within each convolution calculation module, the index generation submodule generates index data from the original data of the same data group, the weight query submodule then looks up the target weight corresponding to each index data, and the shift addition submodule shift-adds all the target weights to obtain an intermediate convolution value; finally, the summation module sums all the intermediate convolution values to obtain the final convolution value.
In this scheme, the multiplication-related part of the convolution calculation is converted, after data grouping, into weight indexing and shift addition, so that no multiplier is needed in the convolution calculation circuit; this reduces circuit area and power consumption and facilitates the optimization of performance and cost.
The present specification also provides a neural network processor, which can be used to execute various deep learning algorithms based on convolutional neural networks. The neural network processor performs convolution calculation using the convolution calculation circuit provided in this specification; at the hardware level, in addition to the convolution calculation circuit, the processor may also comprise a bus, interfaces, a memory, a nonvolatile memory, and other hardware required by its services.
The present specification also proposes a convolution calculation method, which can be used for executing various deep learning algorithms based on a convolution neural network.
Referring to fig. 3, fig. 3 is a flowchart illustrating a convolution calculation method according to an exemplary embodiment of the present disclosure.
The convolution calculation method can comprise the following specific steps:
step 302, dividing a plurality of original data to be processed into a plurality of data groups based on the corresponding relation between the original data and the weight in the convolution kernel;
step 304, aiming at each data packet, generating a plurality of index data based on the numerical values of the original data in the data packet on the same bit;
step 306, for each index data, querying a target weight corresponding to the index data in a weight value table of a data packet to which the index data belongs; the weight value table of each data packet is pre-configured based on the weight value in the convolution kernel;
and 308, shifting each target weight value based on the bit of the index data in the original data, and adding all the shifted target weight values to obtain a final convolution value.
It is understood that, in step 308, the adding and summing of all the shifted target weights may be decomposed into adding all the shifted target weights under each data packet to obtain an intermediate convolution value of the data packet, and adding the intermediate convolution values of all the data packets to obtain a final convolution value.
Optionally, in step 302, the dividing the raw data to be processed into data packets based on the correspondence between the raw data and the weights in the convolution kernel includes:
and dividing the original data corresponding to the same row weight in the convolution kernel into the same data packet.
Optionally, in step 304, the generating a plurality of index data based on the values of the original data in the data packet at the same bit includes:
and taking the numerical value of each original data on the same bit, and generating index data corresponding to the bit according to the sequence arrangement of the weights in the same row in the convolution kernel corresponding to the original data, thereby generating the index data corresponding to each bit respectively.
Optionally, the process of pre-configuring the weight value table of each data packet based on the weight value in the convolution kernel includes:
and based on a combined addition rule corresponding to the index data, calculating and determining a target weight corresponding to the index data by using weights in the same row in the convolution kernel, thereby determining the target weight corresponding to each index data respectively so as to configure a weight value table of a data group to which the original data corresponding to the weight belongs.
Optionally, when the method is applied to image processing, the plurality of original data to be processed are image data on each pixel point of the image to be processed.
As can be seen from the above description, the convolution calculation method in this specification converts the multiplication-related part of convolution calculation, after data grouping, into weight indexing and shift addition, thereby avoiding the use of a multiplier in the circuit implementing the convolution calculation, which helps reduce circuit area and power consumption and facilitates the optimization of performance and cost.
The methods illustrated in the above embodiments may be implemented in software, for example by reading and running a corresponding computer program; other implementations, such as logic devices or combinations of software and hardware, are not excluded. That is, the executing entity of the processing flow is not limited to logic units and may also be hardware or logic devices.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in one or more embodiments of the present description to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of one or more embodiments herein. The word "if" as used herein may be interpreted as "upon …" or "when …" or "in response to a determination", depending on the context.
The above description is only for the purpose of illustrating the preferred embodiments of the one or more embodiments of the present disclosure, and is not intended to limit the scope of the one or more embodiments of the present disclosure, and any modifications, equivalent substitutions, improvements, etc. made within the spirit and principle of the one or more embodiments of the present disclosure should be included in the scope of the one or more embodiments of the present disclosure.

Claims (10)

1. A convolution calculation circuit is characterized by comprising a data grouping module, a convolution calculation module and a summation module, wherein the convolution calculation module also comprises an index generation sub-module, a weight inquiry sub-module and a shift addition sub-module;
the data grouping module divides a plurality of original data to be processed into a plurality of data groups based on the corresponding relation between the original data and the weight in the convolution kernel, and transmits the original data belonging to the same data group to the same convolution calculation module;
in each convolution calculation module, the index generation submodule generates a plurality of index data based on the numerical values of all original data in the same data group on the same bit position, and transmits the index data to the weight query submodule;
the weight value query submodule queries a target weight value corresponding to each index datum in a weight value table and transmits the target weight value to the shift addition submodule; the weight value table in each convolution calculation module is configured in advance based on the weight value in the convolution kernel;
the shift addition submodule is used for performing shift processing on each target weight value based on the bit of the index data in the original data, adding all the shifted target weight values to obtain an intermediate convolution value, and transmitting the intermediate convolution value to the summation module;
and the summation module is used for summing all the intermediate convolution values transmitted by the convolution calculation modules to obtain a final convolution value.
2. The circuit according to claim 1, wherein the data grouping module, when dividing the plurality of original data to be processed into a plurality of data groups based on the correspondence between the original data and the weight in the convolution kernel, is specifically configured to:
and dividing the original data corresponding to the same row weight in the convolution kernel into the same data packet.
3. The circuit of claim 1, wherein the index generation submodule, when generating the plurality of index data based on the values of the original data in the same data packet at the same bit, is specifically configured to:
and taking the numerical value of each original data on the same bit, and generating index data corresponding to the bit according to the sequence arrangement of the weights in the same row in the convolution kernel corresponding to the original data, thereby generating the index data corresponding to each bit respectively.
4. The circuit of claim 1, wherein the convolution computation module further comprises a weight table configuration module:
and the weight value table configuration module is used for determining a target weight value corresponding to the index data through the weight value calculation of the same row in the convolution kernel based on the combined addition rule corresponding to the index data, so that the target weight value corresponding to each index data is determined to configure the weight value table of the convolution calculation module.
5. The circuit according to claim 1, wherein in a case where a convolution kernel size is n × n, the number of the convolution calculation modules included in the convolution calculation circuit is n; wherein n is a positive integer.
6. The circuit of claim 1, wherein the raw data to be processed is image data at each pixel point of an image to be processed when applied to image processing.
7. A neural network processor, wherein the circuit of any one of claims 1 to 6 is used to perform convolution calculations.
8. A method of convolution computation, the method comprising:
dividing a plurality of original data to be processed into a plurality of data groups based on the corresponding relation between the original data and the weight in the convolution kernel;
for each data packet, generating a plurality of index data based on the values of the original data in the data packet on the same bit;
for each index data, inquiring a target weight corresponding to the index data in a weight value table of a data packet to which the index data belongs; the weight value table of each data packet is pre-configured based on the weight value in the convolution kernel;
and shifting each target weight value based on the bit of the index data in the original data, and adding all the shifted target weight values to obtain a final convolution value.
9. The method of claim 8, wherein generating index data based on values of respective original data in the data packet at a same bit comprises:
and taking the numerical value of each original data on the same bit, and generating index data corresponding to the bit according to the sequence arrangement of the weights in the same row in the convolution kernel corresponding to the original data, thereby generating the index data corresponding to each bit respectively.
10. The method of claim 8, wherein the pre-configuring the weight table for each data packet based on the weights in the convolution kernel comprises:
and based on a combined addition rule corresponding to the index data, calculating and determining a target weight corresponding to the index data by using weights in the same row in the convolution kernel, thereby determining the target weight corresponding to each index data respectively so as to configure a weight value table of a data group to which the original data corresponding to the weight belongs.
CN202210326694.2A 2022-03-30 2022-03-30 Convolution calculation circuit, neural network processor and convolution calculation method Active CN114692833B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210326694.2A CN114692833B (en) 2022-03-30 2022-03-30 Convolution calculation circuit, neural network processor and convolution calculation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210326694.2A CN114692833B (en) 2022-03-30 2022-03-30 Convolution calculation circuit, neural network processor and convolution calculation method

Publications (2)

Publication Number Publication Date
CN114692833A true CN114692833A (en) 2022-07-01
CN114692833B CN114692833B (en) 2023-11-21

Family

ID=82141829

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210326694.2A Active CN114692833B (en) 2022-03-30 2022-03-30 Convolution calculation circuit, neural network processor and convolution calculation method

Country Status (1)

Country Link
CN (1) CN114692833B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106447034A (en) * 2016-10-27 2017-02-22 中国科学院计算技术研究所 Neutral network processor based on data compression, design method and chip
CN108205700A (en) * 2016-12-20 2018-06-26 上海寒武纪信息科技有限公司 Neural network computing device and method
CN108229648A (en) * 2017-08-31 2018-06-29 深圳市商汤科技有限公司 Convolutional calculation method and apparatus, electronic equipment, computer storage media
CN109409514A (en) * 2018-11-02 2019-03-01 广州市百果园信息技术有限公司 Fixed-point calculation method, apparatus, equipment and the storage medium of convolutional neural networks
US20190138891A1 (en) * 2017-11-09 2019-05-09 Samsung Electronics Co., Ltd. Apparatus and method with neural network
CN110059822A (en) * 2019-04-24 2019-07-26 苏州浪潮智能科技有限公司 One kind compressing quantization method based on channel packet low bit neural network parameter
CN110688158A (en) * 2017-07-20 2020-01-14 上海寒武纪信息科技有限公司 Computing device and processing system of neural network
CN111428863A (en) * 2020-03-23 2020-07-17 河海大学常州校区 Low-power-consumption convolution operation circuit based on approximate multiplier
CN112434801A (en) * 2020-10-30 2021-03-02 西安交通大学 Convolution operation acceleration method for carrying out weight splitting according to bit precision
WO2021232422A1 (en) * 2020-05-22 2021-11-25 深圳市大疆创新科技有限公司 Neural network arithmetic device and control method thereof
US20220092399A1 (en) * 2021-12-06 2022-03-24 Intel Corporation Area-Efficient Convolutional Block


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HANTING CHEN et al.: "AdderNet: Do we really need multiplications in deep learning?", 2020 IEEE/CVF CVPR *

Also Published As

Publication number Publication date
CN114692833B (en) 2023-11-21

Similar Documents

Publication Publication Date Title
CN111221578B (en) Computing device and computing method
CN110689125A (en) Computing device
CN111898733B (en) Deep separable convolutional neural network accelerator architecture
CN108733348B (en) Fused vector multiplier and method for performing operation using the same
CN109409510B (en) Neuron circuit, chip, system and method thereof, and storage medium
CN112434801B (en) Convolution operation acceleration method for carrying out weight splitting according to bit precision
CN113033794B (en) Light weight neural network hardware accelerator based on deep separable convolution
CN109165006B (en) Design optimization and hardware implementation method and system of Softmax function
CN111814957B (en) Neural network operation method and related equipment
CN110543936A (en) Multi-parallel acceleration method for CNN full-connection layer operation
Arredondo-Velazquez et al. A streaming architecture for Convolutional Neural Networks based on layer operations chaining
CN109240644B (en) Local search method and circuit for Yixin chip
CN106682258A (en) Method and system for multi-operand addition optimization in high-level synthesis tool
CN113918120A (en) Computing device, neural network processing apparatus, chip, and method of processing data
CN114692833B (en) Convolution calculation circuit, neural network processor and convolution calculation method
CN109379191B (en) Dot multiplication operation circuit and method based on elliptic curve base point
CN114758209B (en) Convolution result obtaining method and device, computer equipment and storage medium
CN1207879C (en) Viterbi equalization using precalculated metrics increments
CN115544438A (en) Twiddle factor generation method and device in digital communication system and computer equipment
CN111738432B (en) Neural network processing circuit supporting self-adaptive parallel computation
US11297127B2 (en) Information processing system and control method of information processing system
CN110807479A (en) Neural network convolution calculation acceleration method based on Kmeans algorithm
CN111260036A (en) Neural network acceleration method and device
CN113377333A (en) Hardware computing system and method for solving N root openings of complex numbers based on parabolic synthesis method
Yin et al. A reconfigurable accelerator for generative adversarial network training based on FPGA

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
CB02 Change of applicant information

Address after: Room 803, Unit 2, No. 2515 Huandao North Road, Hengqin New District, Zhuhai City, Guangdong Province, 519000

Applicant after: Guangdong Qixin Semiconductor Co.,Ltd.

Address before: Room 2206, block C, building 2, Qiancheng Binhai garden, No. 3, Haicheng Road, Mabu community, Xixiang street, Bao'an District, Shenzhen, Guangdong 518103

Applicant before: Shenzhen Qixin Semiconductor Co.,Ltd.

GR01 Patent grant
GR01 Patent grant