CN113240103B - Neural network pooling circuit - Google Patents

Neural network pooling circuit

Info

Publication number
CN113240103B
CN113240103B (application number CN202110712084.1A)
Authority
CN
China
Prior art keywords
module
read
address
external memory
pooling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110712084.1A
Other languages
Chinese (zh)
Other versions
CN113240103A (en)
Inventor
裴京 (Pei Jing)
施路平 (Shi Luping)
王冠睿 (Wang Guanrui)
马骋 (Ma Cheng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202110712084.1A
Publication of CN113240103A
Application granted
Publication of CN113240103B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 - Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 - Interfaces specially adapted for storage systems
    • G06F 3/0602 - Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F 3/061 - Improving I/O performance
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 - Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 - Interfaces specially adapted for storage systems
    • G06F 3/0628 - Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0655 - Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F 3/0656 - Data buffering arrangements
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Neurology (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a neural network pooling circuit comprising a pixel counting module, an address generating module, and a comparator module. The pixel counting module receives an input update signal indicating that image pixels have been written to an external memory, determines from it the number of image pixels present in the external memory, and decides, based on that number, whether to send a read-write control signal to the address generating module. The address generating module receives the read-write control signal, generates a read enable signal and a read address, or a write enable signal and a write address, for the external memory, and issues the corresponding read or write operation request to the external memory. The comparator module reads image pixel data from the external memory and determines the pooling result according to a preset pooling mode.

Description

Neural network pooling circuit
Technical Field
The application relates to the technical field of artificial intelligence chips, in particular to a neural network pooling circuit.
Background
The goal of Artificial Intelligence (AI) is to assist us in daily life by endowing machines with human-level and even superhuman capabilities. Typical application scenarios include autonomous driving, smart homes, intelligent assistants, security cameras, robots, and so on. AI software is trained with optimization algorithms to obtain a neural network that realizes a specific application scenario. AI hardware is the driving force of this process, and the aim of AI hardware development is to let the software deployed on it run faster and with lower power consumption.
The pooling layer is an important component of a convolutional neural network. It is mainly used to compress the input feature map: it reduces the feature map size, simplifies the computational complexity of the network, and performs feature compression to extract the main features. The design of the pooling circuit is therefore also key to the design of a convolutional neural network circuit, but the processing speed of pooling circuits in the related art is slow, which limits the processing speed of the whole convolutional neural network circuit.
Therefore, there is a need in the art for a neural network pooling circuit with improved processing efficiency.
Disclosure of Invention
An object of the embodiments of the present application is to provide a neural network pooling circuit that implements the overall logic of the pooling operation and is adapted to performing pooling on a feature map stored in a vector storage structure.
The neural network pooling circuit provided by the embodiment of the application is realized as follows:
a neural network pooling circuit includes a pixel count module, an address generation module, and a comparator module, wherein,
the pixel counting module is used for receiving an input updating signal representing image pixels in an external memory, determining the number of the image pixels input in the external memory according to the input updating signal, and determining whether to send a read-write control signal to the address generating module according to the number;
the address generation module is electrically connected with the pixel counting module and the external memory, and is used for receiving the read-write control signal, generating a read enable signal and a read address or a write enable signal and a write address for the external memory according to the read-write control signal, and starting a read operation request or a write operation request for the external memory;
the comparator module is electrically connected with the external memory and is used for reading image pixel data from the external memory under the condition that the address generation module generates the read enable signal and the read address, and determining a pooling result according to a preset pooling mode; and for writing the generated pooling result in the external memory in a case where the address generating module generates the write enable signal and the write address.
Optionally, in an embodiment of the present application, the circuit further includes an instruction processing module, where the instruction processing module is electrically connected to the pixel counting module, the address generating module, and the comparator module, and is configured to send an executable instruction and a configuration parameter for implementing a pooling function to the pixel counting module, the address generating module, and the comparator module.
Optionally, in an embodiment of the application, the instruction processing module is further configured to receive the input update signal of the image pixel in the external memory, and send the input update signal to the pixel counting module.
Optionally, in an embodiment of the present application, the pixel counting module is further configured to send a read-write control signal to the address generating module when a row of pooled outputs can be generated; the input rows required to generate one row of pooled output are determined according to the column-direction size of the pooling window and the column-direction step value.
Optionally, in an embodiment of the present application, the address generating module further includes a fill pixel processing unit, where the fill pixel processing unit is configured to determine whether an image pixel is located in a non-fill area, and generate a read address and a read operation request valid for the external memory if it is determined that the image pixel is located in the non-fill area; otherwise, no valid read address and read operation request is generated.
Optionally, in an embodiment of the present application, the comparator module further includes a data precision conversion unit, and the data precision conversion unit is configured to perform data precision conversion on the image pixel data read from the external memory before determining the pooling result.
Optionally, in an embodiment of the application, the address generating module is further configured to, in a case where the data width changes after the data precision conversion, increase the number of consecutively generated read addresses or the number of consecutively generated write addresses, so that the total data width input to the data precision conversion unit matches the total data width output from the data precision conversion unit.
Optionally, in an embodiment of the present application, the comparator module includes a parallel comparison unit, the parallel comparison unit includes a plurality of comparator groups, and each comparator group is composed of a plurality of comparators.
Optionally, in an embodiment of the present application, the comparator module further includes a data buffer, where the data buffer is electrically connected to the output ends of the plurality of comparator groups in the parallel comparison unit, and is used for buffering the output results of the plurality of comparator groups.
The neural network pooling circuit provided by the embodiments of the application realizes the overall logic of the pooling operation through the cooperation of the pixel counting module, the address generating module, and the comparator module, and is adapted to performing pooling on a feature map stored in a vector storage structure. In addition, the address generating module optimizes the efficiency of reading feature map pixel vectors that include a filling area, and meets the requirement of high-speed data pooling. The data precision conversion structure implemented within the pooling operation can further support more complex feature map pooling operations.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and, together with the description, serve to explain the principles of the application.
Fig. 1 is a schematic diagram illustrating a neural network pooling circuit 100 according to an exemplary embodiment.
Fig. 2 is an example of an address generation module 103 according to an exemplary embodiment.
Fig. 3 is an example of a comparator module 105 according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. The following description refers to the accompanying drawings in which the same numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
Referring now to Fig. 1, a neural network pooling circuit 100 provided by an embodiment of the present application may include a pixel counting module 101, an address generating module 103, and a comparator module 105, wherein,
the pixel counting module 101 is configured to receive an input update signal indicating an image pixel in an external memory, determine the number of the image pixels input in the external memory according to the input update signal, and determine whether to send a read-write control signal to the address generating module 103 according to the number;
the address generating module 103 is electrically connected to the pixel counting module 101 and the external memory, and configured to receive the read-write control signal, generate a read enable signal and a read address or a write enable signal and a write address for the external memory according to the read-write control signal, and start a read operation request or a write operation request for the external memory;
the comparator module 105 is electrically connected to the external memory, and configured to read image pixel data from the external memory when the address generation module 103 generates the read enable signal and the read address, and determine a pooling result according to a preset pooling mode; and for writing the generated pooling result in the external memory in case the address generating module 103 generates the write enable signal and the write address.
In this embodiment, the input terminal of the neural network pooling circuit 100 may be connected to the output terminal of a convolution circuit and used to process the three-dimensional feature map generated by the convolution circuit; that is, the image pixels in this embodiment include the image pixels of a three-dimensional feature map whose three dimensions are rows, columns, and input channels. In this embodiment of the application, after the convolution circuit generates the three-dimensional feature map, the feature map may be written to the external memory for storage. In order to perform the pooling operation on the three-dimensional feature map in the external memory in time, an input update signal may be generated when image pixels of the three-dimensional feature map are written into the external memory, and the input update signal may be sent to the pixel counting module 101. In one embodiment of the present application, an input update signal may be generated each time a row of image pixels is written into the external memory, which makes counting by the pixel counting module 101 straightforward. Of course, in other embodiments, the input update signal may also be configured to match the number of rows of the pooling window; for example, if the pooling window has three rows, one input update signal may be generated for every three rows of image pixels written into the external memory. The present application does not limit the condition for generating the input update signal.
In this embodiment, after receiving the input update signal, the pixel counting module 101 may determine the number of image pixels input in the external memory according to the input update signal, and determine whether to send a read-write control signal to the address generating module 103 according to that number. In this embodiment, the pixel counting module 101 may send the read-write control signal to the address generating module 103 once a row of pooled outputs can be generated. In the pooling operation, the input rows corresponding to one row of pooled output are determined by the column-direction size Py of the pooling window and the column-direction step value Sy: the size Py determines the number of input rows required for each output row, and the step value Sy determines the number of rows between the first input rows of two adjacent output rows. In one example, the pixel counting module 101 increments an internal counter by 1 each time it receives an input update signal. Meanwhile, when the comparator module 105 finishes computing a row of pooled output, the counter inside the pixel counting module 101 is decreased by the column-direction step value Sy, reflecting that Sy input rows are no longer needed once the pooling window has stepped down. Only when the current count value of the counter inside the pixel counting module 101 reaches the column-direction size Py of the pooling window does the pixel counting module 101 send a read-write control signal to the address generating module 103, driving it to generate new read-write addresses and triggering the subsequent address generation and read-compare operations, as sketched below. Of course, in other embodiments, the image pixels may also be counted by one or more columns, or by the number of pixels; the counting manner of the image pixels is not limited in this application.
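The counting scheme described above can be pictured with a short behavioral sketch in Python. The class and method names below are hypothetical, and the trigger condition (the count reaching Py) is an assumption drawn from the description rather than the actual register-transfer logic.

```python
class PixelCounter:
    """Behavioral sketch of the pixel counting module (names are illustrative).

    py: pooling window size in the column direction (input rows per output row)
    sy: pooling window step value in the column direction
    """

    def __init__(self, py: int, sy: int):
        self.py = py
        self.sy = sy
        self.rows_available = 0  # internal counter of input rows not yet consumed

    def on_input_update(self) -> bool:
        """One row of image pixels has been written to the external memory.
        Returns True when a read-write control signal should be sent to the
        address generation module, i.e. enough rows exist for one output row."""
        self.rows_available += 1
        return self.rows_available >= self.py  # threshold condition is an assumption

    def on_output_row_done(self) -> None:
        """The comparator module finished one row of pooled output; the window
        steps down by sy rows, so sy input rows are no longer needed."""
        self.rows_available -= self.sy


# Example: a pooling window of 3 rows (py=3) sliding down by 2 rows (sy=2)
counter = PixelCounter(py=3, sy=2)
for row in range(6):
    if counter.on_input_update():
        print(f"row {row}: trigger read-write control signal")
        counter.on_output_row_done()
```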
In this embodiment, the address generating module 103 is further electrically connected to the external memory, and is configured to generate a read address or a write address for the external memory according to the read-write control signal after receiving it. When the pixel counting module 101 determines from the input update signal that the pooling operation can start, it may send a read address generation enable signal to the address generating module 103; this signal starts the address generating module 103, i.e., causes it to begin generating read addresses for the external memory. In one example, while generating read addresses, the address generating module 103 may first determine the address of the first pixel of the feature map in the external memory and use it as a base address; the position of each pixel inside the pooling windows during the pooling operation can then be expressed as an offset from this base address. The base address may be obtained during software compilation. In one embodiment, the base address may be pre-configured in the address generating module 103. In another embodiment, the base address may also be stored in the external memory and loaded into the address generating module 103 when the pooling operation is performed. Of course, in other embodiments, the address generating module 103 may also determine the address of each pixel in other manners, which is not limited herein.
On the other hand, the present application illustrates one embodiment of the address generating module 103 with the circuit shown in Fig. 2. As shown in Fig. 2, the address generating module 103 may include an address accumulator 1031 and a loop counter 1032. Fig. 2 shows a 5-stage loop counter composed of five counters: a first counter, a second counter, a third counter, a fourth counter, and a fifth counter. Each stage has a corresponding read-address step value, namely a first step value, a second step value, a third step value, and a fourth step value. The correspondence between the pooling operation and the 5-stage loop is: pooling window row-direction size Px => pooling window column-direction size Py => output channel C => output image row-direction size Ox => output image column-direction size Oy. When the pooling calculation starts, the address generating module 103 first takes the base address of the input data as the input-data read address; the pooling operation then proceeds along the row direction of the pooling window, and the address changes by the first step value each time. When the calculation in the row direction of the pooling window is completed, the first counter finishes its first-stage count and generates a carry signal to the second counter; at this point the address step of the address generating unit changes to the second step value, realizing the address step along the column direction of the pooling window. Similarly, when the calculations along the column direction of the pooling window, the output channels, and the row direction of the output image are completed, the third counter and the fourth counter finish their third-stage and fourth-stage counts, and the address step of the address generating unit changes to the step value corresponding to the respective loop stage (the third step value and the fourth step value), thereby realizing cyclic reading of the data along the different directions. When the fifth counter finishes its fifth-stage count, the entire pooling operation is completed. Meanwhile, at the end of the second-stage count, the pooling comparison within one pooling window is completed, and the address generating module 103 generates a write address and a write enable signal according to the corresponding logic. It can be seen that with this circular stepped addressing, the pooling window size, pooling window step, and other quantities that vary in the original pooling calculation are unified into fixed, cyclic address-accumulation increments. The address step values can be precomputed during software compilation, which simplifies the hardware logic design and avoids complex address generation calculations.
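As an illustration of this circular stepping, the following Python sketch enumerates the read-address sequence produced by the five nested loop stages. The function name, the stride encoding, and the example layout at the end are assumptions for illustration; the hardware described above obtains the same sequence by accumulation with carry-driven selection of precomputed step values rather than by multiplication.

```python
def read_addresses(base, shape, strides):
    """Behavioral sketch of the 5-stage loop address generation (not the RTL).

    shape   = (px, py, channels, ox, oy): loop bounds, innermost stage first
    strides = (d1, d2, d3, d4, d5): address contribution of each loop stage;
              these play the role of the precomputed step values in the patent.
    """
    px, py, channels, ox, oy = shape
    d1, d2, d3, d4, d5 = strides
    for j in range(oy):                     # stage 5: output image column direction
        for i in range(ox):                 # stage 4: output image row direction
            for c in range(channels):       # stage 3: output channel
                for wy in range(py):        # stage 2: pooling window column direction
                    for wx in range(px):    # stage 1: pooling window row direction
                        yield base + wx * d1 + wy * d2 + c * d3 + i * d4 + j * d5
                # end of stage 2: one pooling window has been read, so the write
                # address and write enable would be issued at this point


# Example stride choice (an assumption, not taken from the patent): for a
# W x H x C feature map stored row-major with the channel as the fastest
# dimension and a window step of (Sx, Sy), one could use
#   d1 = C, d2 = W * C, d3 = 1, d4 = Sx * C, d5 = Sy * W * C
```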
In this embodiment, after the address generating module 103 generates a read address and sends it to the external memory, the external memory may send the image pixel data at that read address to the comparator module 105. In this embodiment, the address generating module 103 may further send a read enable signal to the external memory; the read enable signal instructs the external memory to send the image pixel data at the read address to the comparator module 105. Of course, in other embodiments, the external memory may also send an operating-state signal for the read data to the address generating module 103, so that the address generating module 103 can determine whether the external memory is operating normally. Further, if the address generating module 103 does not receive the operating-state signal, or receives an operating-state signal indicating that the memory cannot respond, exception handling may be performed.
In the pooling operation, a fill area is added around the feature image so that the boundary information of the feature image is preserved; in addition, feature images of different input sizes can be supplemented with fill areas so that their sizes become consistent. The size of the fill area may be preset, for example by adding filled rows and columns above, below, to the left of, and to the right of the feature image. The pixel value in the fill area may be set to a constant, and in code implementations it may be set to 0. In the embodiment of the present application, as shown in Fig. 2, the fill judgment unit 1033 of the address generating module 103 judges whether the read address generated by the address accumulator 1031 points to a fill area, so as to avoid reading fill-area data from the external memory and to save data processing resources. Specifically, the fill judgment unit 1033 may determine, based on the preconfigured numbers of filled rows and columns above, below, to the left of, and to the right of the feature image and on the count values in the loop counter, whether the currently generated read address is a virtual address corresponding to the fill area or a real address pointing to a stored image pixel in the storage area. If the currently generated read address is a virtual address corresponding to the fill area, the fill judgment unit 1033 does not send a read enable signal to the external memory and does not trigger the comparison circuit, thereby reducing read operations to the external memory. On the other hand, if the generated read address corresponds to a real address, the fill judgment unit 1033 sends the read enable signal and the read address to the external memory. It should be noted that the present application does not limit the manner of determining whether the read address corresponds to a virtual address or a real address.
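A minimal sketch of such a fill judgment is shown below; the argument names and coordinate convention are assumptions rather than the patent's internal signals, which work from loop-counter values instead of explicit coordinates.

```python
def is_fill_area(row, col, pad_top, pad_left, height, width):
    """True if the padded coordinate (row, col) lies in the fill area rather
    than over stored image data. height and width are the dimensions of the
    unpadded feature image (names and convention are assumptions)."""
    return (row < pad_top or row >= pad_top + height or
            col < pad_left or col >= pad_left + width)


# Only coordinates outside the fill area yield a real read address and a read
# enable; padded positions contribute the constant fill value (e.g. 0) without
# any read access to the external memory.
```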
In the embodiment of the present application, as shown in Fig. 3, the comparator module 105 may be connected to the external memory through a data bus. After receiving the read address and the read enable signal from the address generating module 103, the external memory may transmit the image pixel data at the read address to the comparator module 105 through the data bus. The comparator module 105 may determine the pooling result according to a preset pooling mode. The pooling mode may include, but is not limited to, maximum, minimum, average, and the like. After determining the pooling result, the comparator module 105 may also transmit the pooling result to the external memory via the data bus.
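The effect of the pooling mode on one window can be summarized with a small sketch; the function name and the truncating integer average are assumptions, since the patent does not specify the rounding rule.

```python
def pool_window(values, mode="max"):
    """Reduce the pixels of one pooling window according to the configured
    pooling mode; maximum, minimum and average are the modes named above."""
    if mode == "max":
        return max(values)
    if mode == "min":
        return min(values)
    if mode == "avg":
        return sum(values) // len(values)  # integer average with truncation (assumption)
    raise ValueError(f"unsupported pooling mode: {mode}")


# pool_window([3, 7, 2, 5], mode="max") -> 7
```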
In the embodiment of the present application, as shown in Fig. 3, the comparator module 105 may include a parallel comparison unit 1052. The parallel comparison unit 1052 may include a plurality of comparator groups, each composed of a plurality of comparators. By distributing the pooling operation over several mutually independent comparator groups, comparison of data of various lengths can be realized. In one example, the parallel comparison unit 1052 may include four comparator groups of identical bit width and structure, each containing four comparators with a bit width of 1 byte; the comparator module 105 can then implement pooling operations on multiple data lengths such as int32, int8, and ternary.
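The following sketch shows how one memory word could be split into the parallel lanes that such comparator groups operate on. The 16-byte word size follows from four groups of four bytes each; the little-endian byte order and the lane mapping are assumptions for illustration, and the ternary packing is not specified in the text.

```python
import struct

def split_lanes(word: bytes, dtype: str):
    """Split one 16-byte data word into the lanes handled by four comparator
    groups of four 1-byte comparators each (illustrative assumption)."""
    assert len(word) == 16
    if dtype == "int32":
        return list(struct.unpack("<4i", word))   # one 32-bit value per group
    if dtype == "int8":
        return list(struct.unpack("<16b", word))  # four 8-bit values per group
    raise ValueError(f"unsupported dtype: {dtype}")
```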
In a practical application environment, the precision of the image pixel data read from the external memory does not necessarily match the precision of the output data that is finally required; for example, the input data precision may be int32 while the preset output data precision is int8. For this reason, the neural network pooling circuit provided by the embodiment of the application can provide a data precision conversion function. As shown in Fig. 3, the comparator module 105 may further include a data precision conversion unit 1051, which performs data precision conversion on the image pixel data read from the external memory before the pooling result is determined, so that the precision of the input data matches the preset precision of the output data. The data precision conversion unit 1051 in this embodiment may obtain the data precision of the input image pixel data, then obtain the preset precision requirement of the output data, and finally perform the precision conversion on the input image pixel data. In some examples, the data precision conversion unit 1051 may implement at least the following conversions: int32 to int8, int32 to ternary, int8 to ternary, ternary to int32, ternary to int8, and int8 to int32. Of course, in other embodiments the convertible data precisions are not limited to the above examples; the data precision conversion unit 1051 provided in the embodiment of the present application may provide any data precision conversion as required, which is not limited herein.
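For illustration, a minimal sketch of one element passing through such a conversion is given below. Saturating clipping to the destination range is an assumption, the patent does not specify the rounding rule, and the ternary case is shown only as a crude clamp.

```python
def convert_precision(value: int, dst: str) -> int:
    """Sketch of the data precision conversion for a single element
    (saturation behavior and ternary handling are assumptions)."""
    ranges = {"int8": (-128, 127), "int32": (-2**31, 2**31 - 1), "ternary": (-1, 1)}
    lo, hi = ranges[dst]
    return max(lo, min(hi, value))


# convert_precision(300, "int8") -> 127 (saturated)
```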
In practical application scenarios, the conversion of data precision may cause the total input and output data widths of the data precision conversion unit 1051 to mismatch before and after the conversion. For this reason, in an embodiment of the present application, the address generating module 103 is further configured to, when the data width changes after the data precision conversion, increase the number of consecutively generated read addresses or the number of consecutively generated write addresses, so that the total data width input to the data precision conversion unit 1051 matches the total data width it outputs. In some examples: for int32 to int8 conversion, the address generating module 103 may continuously read 4 groups of data from the external memory and write 1 group of data into the external memory; for int32 to ternary conversion, it may continuously read 16 groups and write 1 group; for int8 to ternary conversion, it may continuously read 4 groups and write 1 group; for ternary to int8 conversion, it may read 1 group and continuously write 4 groups; for ternary to int32 conversion, it may read 1 group and continuously write 16 groups; and for int8 to int32 conversion, it may read 1 group and continuously write 4 groups.
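These ratios follow directly from the element bit widths, as the following sketch shows; the assumption that a ternary element occupies 2 bits is consistent with the 16:1 ratio against int32 listed above but is not stated explicitly in the text.

```python
BITS = {"int32": 32, "int8": 8, "ternary": 2}  # ternary width of 2 bits is an assumption

def read_write_groups(src: str, dst: str):
    """Number of data groups read from and written to the external memory so
    that the total input width of the conversion unit matches its total output
    width; this reproduces the ratios listed above."""
    src_bits, dst_bits = BITS[src], BITS[dst]
    if src_bits >= dst_bits:
        return src_bits // dst_bits, 1   # shrinking data: read more, write once
    return 1, dst_bits // src_bits       # widening data: read once, write more


# read_write_groups("int32", "int8")    -> (4, 1)
# read_write_groups("int32", "ternary") -> (16, 1)
# read_write_groups("ternary", "int32") -> (1, 16)
```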
As shown in Fig. 3, the comparator module 105 may further include a data buffer 1053, which is electrically connected to the parallel comparison unit 1052. Specifically, the data buffer 1053 may include a plurality of independent sub-buffers, each electrically connected to one comparator group of the parallel comparison unit 1052 and used for buffering the output data of that comparator group. The data buffer 1053 buffers the output data of the comparator groups mainly in two cases (see the sketch after this list):
1. during reads and writes with the external memory, when the external memory returns an abnormal operating-state signal, cannot respond to the read-write request in time, or when a read-write conflict exists and arbitration is needed;
2. when the precisions of the input and output data do not match, the buffer caches the data so that the different read and write rates of the input data and the output data can be accommodated.
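The role of such a sub-buffer can be pictured with a minimal FIFO sketch; the class name, depth, and handshake below are assumptions rather than the patent's buffer design.

```python
from collections import deque

class SubBuffer:
    """Sketch of one sub-buffer of the data buffer 1053: a small FIFO that
    absorbs comparator-group results while the external memory cannot accept
    them, or while precision conversion makes read and write rates differ."""

    def __init__(self, depth: int = 16):
        self.fifo = deque(maxlen=depth)

    def push(self, result) -> bool:
        """Buffer one comparator-group result; returns False when the buffer
        is full, i.e. back-pressure towards the comparison pipeline."""
        if len(self.fifo) == self.fifo.maxlen:
            return False
        self.fifo.append(result)
        return True

    def pop_for_write(self):
        """Pop the oldest result once the external memory can accept a write."""
        return self.fifo.popleft() if self.fifo else None
```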
In an embodiment of the present application, the operation of the pixel counting module 101, the address generating module 103, the comparator module 105, and the module units contained therein relies on preset executable instructions and corresponding configuration parameters. The configuration parameters may include, for example, a base address, the size of the pooling window, the sliding step of the pooling window, the pooling mode of the comparators, a fill area indicator for the image pixels, and so forth. Accordingly, before the neural network pooling circuit 100 starts to work, the executable instructions and the configuration parameters need to be sent to the respective modules. Optionally, as shown in Fig. 1, in an embodiment of the present application the neural network pooling circuit 100 may further include an instruction processing module 107, which is electrically connected to the pixel counting module 101, the address generating module 103, and the comparator module 105, and is configured to send the executable instructions and configuration parameters for implementing the pooling function to these modules. In an embodiment of the present application, the instruction processing module 107 is further configured to receive the input update signal of the image pixels in the external memory and send it to the pixel counting module 101. Of course, in other embodiments, the executable instructions and the configuration parameters may also be sent by other circuits or modules outside the neural network pooling circuit 100, which is not limited in the present application.
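The parameter set can be summarized with a small sketch; the field names below are illustrative groupings of the parameters listed above, not the patent's register map.

```python
from dataclasses import dataclass

@dataclass
class PoolingConfig:
    """Sketch of the configuration parameters delivered by the instruction
    processing module 107 (field names are assumptions)."""
    base_address: int                 # address of the first feature map pixel
    window_size: tuple = (2, 2)       # (Px, Py) pooling window size
    window_step: tuple = (2, 2)       # (Sx, Sy) sliding step of the pooling window
    pooling_mode: str = "max"         # "max", "min" or "avg"
    padding: tuple = (0, 0, 0, 0)     # fill rows/columns: top, bottom, left, right
    in_dtype: str = "int32"           # input data precision
    out_dtype: str = "int32"          # required output data precision
```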
The neural network pooling circuit provided by the embodiments of the application realizes the overall logic of the pooling operation through the cooperation of the pixel counting module, the address generating module, and the comparator module, and is adapted to performing pooling on a feature map stored in a vector storage structure. In addition, the address generating module optimizes the efficiency of reading feature map pixel vectors that include a filling area, and meets the requirement of high-speed data pooling. The data precision conversion structure implemented within the pooling operation can further support more complex feature map pooling operations.
All the embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (8)

1. A neural network pooling circuit comprising a pixel count module, an address generation module, and a comparator module, wherein,
the pixel counting module is used for receiving an input updating signal representing image pixels in an external memory, determining the number of the image pixels input in the external memory according to the input updating signal, and determining whether to send a read-write control signal to the address generating module according to the number;
the address generation module is electrically connected with the pixel counting module and the external memory, and is used for receiving the read-write control signal, generating a read enable signal and a read address or a write enable signal and a write address for the external memory according to the read-write control signal, and starting a read operation request or a write operation request for the external memory; the address generation module further comprises a filling pixel processing unit, wherein the filling pixel processing unit is used for judging whether the image pixel is located in a non-filling area, and generating a read address and a read operation request which are effective for the external memory under the condition that the image pixel is determined to be located in the non-filling area; otherwise, generating no effective read address and read operation request;
the comparator module is electrically connected with the external memory and is used for reading image pixel data from the external memory under the condition that the address generation module generates the read enable signal and the read address, and determining a pooling result according to a preset pooling mode; and for writing the generated pooling result in the external memory in a case where the address generation module generates the write enable signal and the write address.
2. The circuit of claim 1, further comprising an instruction processing module electrically connected to the pixel count module, the address generation module, and the comparator module, for sending executable instructions and configuration parameters for implementing a pooling function to the pixel count module, the address generation module, and the comparator module.
3. The circuit of claim 2, wherein the instruction processing module is further configured to receive the input update signal of the image pixel in the external memory and send the input update signal to the pixel counting module.
4. The circuit of claim 1, wherein the pixel count module is further configured to send a read-write control signal to the address generation module when a row of pooled outputs can be generated; and the input rows required to generate the row of pooled output are determined according to the size of the pooling window in the column direction and the stepping value in the column direction.
5. The circuit of claim 1, wherein the comparator module further comprises a data precision conversion unit configured to perform data precision conversion on the image pixel data read from the external memory before determining the pooling result.
6. The circuit of claim 5, wherein the address generation module is further configured to, in a case where a data width is changed after the data precision conversion, increase the number of times of continuously generating the read address or the number of times of continuously generating the write address so that a total data width input to the data precision conversion unit matches a total data width output from the data precision conversion unit.
7. The circuit of claim 1, wherein the comparator module comprises a parallel comparison unit comprising a plurality of comparator groups, each comparator group consisting of a plurality of comparators.
8. The circuit of claim 7, wherein the comparator module further comprises a data buffer electrically connected to the output terminals of the plurality of comparator groups in the parallel comparison unit for buffering the output results of the plurality of comparator groups.
CN202110712084.1A 2021-06-25 2021-06-25 Neural network pooling circuit Active CN113240103B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110712084.1A CN113240103B (en) 2021-06-25 2021-06-25 Neural network pooling circuit

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110712084.1A CN113240103B (en) 2021-06-25 2021-06-25 Neural network pooling circuit

Publications (2)

Publication Number Publication Date
CN113240103A (en) 2021-08-10
CN113240103B (en) 2022-10-04

Family

ID=77140846

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110712084.1A Active CN113240103B (en) 2021-06-25 2021-06-25 Neural network pooling circuit

Country Status (1)

Country Link
CN (1) CN113240103B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114372012B * 2021-12-21 2024-02-20 Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences Universal and configurable high-energy-efficiency pooling calculation single-row output system and method

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106875012A (en) * 2017-02-09 2017-06-20 武汉魅瞳科技有限公司 A kind of streamlined acceleration system of the depth convolutional neural networks based on FPGA
CN108805285A (en) * 2018-05-30 2018-11-13 济南浪潮高新科技投资发展有限公司 A kind of convolutional neural networks pond unit design method
CN109844738A (en) * 2016-10-19 2019-06-04 索尼半导体解决方案公司 Arithmetic processing circuit and identifying system
CN111008697A (en) * 2019-11-06 2020-04-14 北京中科胜芯科技有限公司 Convolutional neural network accelerator implementation architecture
CN112488908A (en) * 2020-12-18 2021-03-12 时擎智能科技(上海)有限公司 Computing device, computing method, storage medium and terminal
CN112559046A (en) * 2020-12-09 2021-03-26 清华大学 Data processing device and artificial intelligence processor
US10990650B1 (en) * 2018-03-22 2021-04-27 Amazon Technologies, Inc. Reducing computations for data including padding

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9613001B2 (en) * 2013-12-20 2017-04-04 Intel Corporation Processing device for performing convolution operations

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109844738A (en) * 2016-10-19 2019-06-04 索尼半导体解决方案公司 Arithmetic processing circuit and identifying system
CN106875012A (en) * 2017-02-09 2017-06-20 武汉魅瞳科技有限公司 A kind of streamlined acceleration system of the depth convolutional neural networks based on FPGA
US10990650B1 (en) * 2018-03-22 2021-04-27 Amazon Technologies, Inc. Reducing computations for data including padding
CN108805285A (en) * 2018-05-30 2018-11-13 济南浪潮高新科技投资发展有限公司 A kind of convolutional neural networks pond unit design method
CN111008697A (en) * 2019-11-06 2020-04-14 北京中科胜芯科技有限公司 Convolutional neural network accelerator implementation architecture
CN112559046A (en) * 2020-12-09 2021-03-26 清华大学 Data processing device and artificial intelligence processor
CN112488908A (en) * 2020-12-18 2021-03-12 时擎智能科技(上海)有限公司 Computing device, computing method, storage medium and terminal

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Hardware acceleration of convolutional neural networks on FPGA; Adam Myrén; UPPSALA UNIVERSITET; 2020-01-31; pp. 1-47 *
RTL implementation of the convolutional neural network LeNet-5 (Part 4); KiosWong; CSDN; 2020-04-30; pp. 1-13 *

Also Published As

Publication number Publication date
CN113240103A (en) 2021-08-10

Similar Documents

Publication Publication Date Title
US20220067447A1 (en) Data processing method and apparatus for convolutional neural network
US11734554B2 (en) Pooling processing method and system applied to convolutional neural network
CN113240103B (en) Neural network pooling circuit
CN113849293B (en) Data processing method, device, system and computer readable storage medium
US5130991A (en) Method and apparatus for crc computation
CN112202623B (en) Data processing method and device
CN108377394A (en) Image data read method, computer installation and the computer readable storage medium of video encoder
CN114970814A (en) Processing method and processing device of neural network computation graph
CN113436057B (en) Data processing method and binocular stereo matching method
CN113177015B (en) Frame header-based serial port communication method and serial port chip
CN109995548B (en) Device management method and system, data transmission method and system and terminal device
CN101986282A (en) Topological adaptation method and device
CN111369444B (en) Image scaling processing method and device
CN114880254A (en) Table entry reading method and device and network equipment
CN112712167A (en) Memory access method and system supporting acceleration of multiple convolutional neural networks
CN110490312B (en) Pooling calculation method and circuit
CN102340638B (en) A kind of method and apparatus of parallel data processing in video processing device
US7420967B2 (en) Method and system for data transfer
US9509780B2 (en) Information processing system and control method of information processing system
CN116009792B (en) Data reading and writing device and method in image processing and electronic equipment
CN112511463B (en) Message sending method and device
JP6813637B1 (en) Transmission device and transmission method
JP6398241B2 (en) Packet generating apparatus and program
CN108052482B (en) Method and system for communication between GPUs
CN108052483A (en) For the circuit unit, circuit module and device of data statistics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant