CN111445013B - Non-zero detector for convolutional neural network and method thereof - Google Patents

Non-zero detector for convolutional neural network and method thereof

Info

Publication number
CN111445013B
CN111445013B (application CN202010347546.XA)
Authority
CN
China
Prior art keywords
excitation
weight
zero
module
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010347546.XA
Other languages
Chinese (zh)
Other versions
CN111445013A (en)
Inventor
岳涛
时云凿
王蔓蓁
邱禹欧
潘红兵
闫锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN202010347546.XA
Publication of CN111445013A
Application granted
Publication of CN111445013B
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a non-zero detector for a convolutional neural network and a method thereof. The non-zero detector comprises a top-level control unit, a local buffer module, an excitation and weight zero detection module, and a bit and addressing module. The top-level control unit is used for storing the input excitation and weight data into the local buffer module and for controlling the operation of the excitation and weight zero detection module and the bit and addressing module; the local buffer module is used for storing the excitation and weight data of the convolutional neural network; the excitation and weight zero detection module is used for performing non-zero screening on the excitation and weight data in the local buffer module and returning the obtained bit maps to the local buffer module; and the bit and addressing module is used for performing a bitwise AND on the excitation bit map vector and the weight bit map vector to obtain the addresses at which both the weight and the excitation value are non-zero, and for outputting these addresses to the local buffer module. Without occupying much additional storage or computing resource, the non-zero detector provided by the invention effectively improves the computational efficiency of the convolutional neural network and reduces its computation load.

Description

Non-zero detector for convolutional neural network and method thereof
Technical Field
The invention belongs to the field of digital image classification, and particularly relates to a non-zero detector for a convolutional neural network and a method thereof.
Background
Convolutional neural networks are feedforward neural networks with a deep structure that include convolution computation, and they are widely used in computer vision and related fields. Convolutional neural networks perform very well in image classification, a large number of mature training methods and tools exist for them, and classical convolutional neural network models such as LeNet-5, AlexNet and VGG-16 have been extensively validated.
A convolutional neural network contains a large number of convolution operations; as the classification accuracy keeps improving, the convolutional layers become progressively deeper and the numbers of convolution kernels and channels grow. Moreover, classification is no longer limited to small data sets, so the size of the input excitation also grows with the data sets. These factors greatly increase the convolution operations in the convolutional layers and demand large amounts of computing resources.
In convolutional neural networks, a large proportion of the excitations and weights are zero because of the chosen activation (excitation) function. When these zeros participate in the convolution operations, a considerable part of the computation is invalid: some computing and storage units perform meaningless convolution operations, which reduces the performance of the convolutional neural network and increases the time needed to train it.
Therefore, as convolutional neural network algorithms continue to spread and the pursuit of higher classification accuracy greatly increases the amount of data computation, avoiding the computational redundancy caused by the many zeros participating in the convolution, so that most computing units perform effective convolution calculations, is currently an effective way to reduce the computation load of convolutional neural networks.
Disclosure of Invention
To solve the above problems, the invention provides a non-zero detector for a convolutional neural network and a method thereof, based on the idea of maximizing the utilization of the computing units.
The non-zero detector adopts the following technical scheme:
a non-zero detector for a convolutional neural network, comprising:
the top-level control unit is used for storing the input excitation and weight data into the local buffer module and for controlling the operation of the excitation and weight zero detection module and the bit and addressing module;
the local buffer module is used for storing the excitation and weight data of the convolutional neural network, including the excitation data, excitation bit map data, weight data and weight bit map data;
the excitation and weight zero detection module is used for performing non-zero screening on the excitation and weight data in the local buffer module and returning the obtained bit maps to the local buffer module;
and the bit and addressing module is used for performing a bitwise AND on the excitation bit map vector and the weight bit map vector to obtain the addresses at which both the weight and the excitation value are non-zero, and for outputting these addresses to the local buffer module.
Further, the excitation and weight zero detection module includes: the detection unit, used for detecting whether the excitation and weight data transmitted by the local buffer module have been completely stored; and the screening unit, used for performing non-zero screening on the excitation and weight data and sending the generated excitation and weight bit maps to the local buffer module.
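The screening performed by the zero detection module can be illustrated with a minimal software sketch. This is a behavioral model only, not the patent's hardware implementation; the function name and the sample data are illustrative:

```python
def make_bitmap(values):
    """Screening-unit model: 1 where the datum is non-zero, 0 where it is zero.
    (The hardware described later compares against > 0, since ReLU excitations
    are non-negative; != 0 is the general software equivalent.)"""
    return [1 if v != 0 else 0 for v in values]

# Illustrative data, not from the patent:
excitations = [0, 3, 0, 7, 1, 0]
weights = [2, 0, 0, 5, 4, 0]
excitation_bitmap = make_bitmap(excitations)  # [0, 1, 0, 1, 1, 0]
weight_bitmap = make_bitmap(weights)          # [1, 0, 0, 1, 1, 0]
```

The two bit maps are what the module returns to the local buffer module for the addressing step.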
The invention also discloses a detection method using the non-zero detector for a convolutional neural network, comprising the following steps:
(1) The input weight and excitation data are cached in the local buffer module through the top-level control unit, and then transmitted to the excitation and weight zero detection module by the local buffer module;
(2) When the excitation and weight zero detection module detects that all data have been completely transmitted, it screens the data, generates the excitation and weight bit maps, and returns them to the local buffer module;
(3) The local buffer module transmits the bit maps to the bit and addressing module; when the excitation and weight zero detection module detects that all data have been completely transmitted, the bit and addressing module starts working, computes the addresses of the non-zero weight and excitation data, and returns the addresses to the local buffer module;
(4) The local buffer module transmits the non-zero data to the computing unit according to the screened-out addresses of the non-zero weight and excitation data.
Further, in step (2), the detection unit of the excitation and weight zero detection module pulls its flag high after detecting that all data have been completely stored; the screening unit of the excitation and weight zero detection module then performs non-zero screening on the excitation and weight data and sends the generated excitation and weight bit maps to the local buffer module for storage.
Further, in step (3), the bit and addressing module performs a bitwise AND on the input excitation bit map vector and the weight bit map vector to obtain the addresses at which both the weight and the excitation value are non-zero, and outputs each address to the local buffer module as soon as it is obtained.
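The bitwise-AND addressing step can likewise be sketched in software. This is a behavioral model under the assumption that the two bit map vectors have equal length; the function name is illustrative:

```python
def nonzero_addresses(excitation_bitmap, weight_bitmap):
    """Bit and addressing model: AND the two bit map vectors and emit each
    address (index) where both the excitation and the weight are non-zero."""
    return [i for i, (e, w) in enumerate(zip(excitation_bitmap, weight_bitmap))
            if e & w]

# Illustrative bit maps: positions 3 and 4 are non-zero in both vectors.
addresses = nonzero_addresses([0, 1, 0, 1, 1, 0], [1, 0, 0, 1, 1, 0])  # [3, 4]
```

Only the positions surviving the AND are later fetched from the local buffer module, which is the saving the detector targets.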
Compared with the prior art, the invention has the advantages that:
(1) The non-zero detector provided by the invention screens out the non-zero excitations and weights, avoids wasting computing resources on calculations involving zeros, reduces the demand on the computing units to a certain extent, and effectively improves the efficiency of the convolution computation.
(2) The time required by the non-zero detection method is far less than the time that would be consumed by zeros participating in the convolution computation, so the performance of the convolutional neural network is improved.
Drawings
FIG. 1 is a diagram of the overall architecture of a non-zero detector;
FIG. 2 is a schematic diagram of the internal structure of the excitation and weight zero detection module;
FIG. 3 is a schematic diagram of the internal structure of the bit and address module;
FIG. 4 is a flow diagram of a PE computation module;
fig. 5 is a flow diagram of a non-zero detector.
Detailed Description
The following describes the scheme of the invention in detail with reference to the accompanying drawings.
As shown in fig. 1, the non-zero detector for a convolutional neural network of this embodiment includes a top-level control unit, an excitation and weight zero detection array, a local cache array, and a bit and addressing array. The excitation and weight data are distributed to the local cache array by an external distribution module through the top-level control unit, and the local cache array is connected to the PE (Processing Element) computing units.
The top-level control unit mainly receives the excitation and weight data and stores them into the local cache array; controls the excitation and weight zero detection array to perform zero detection on the data and to return the obtained excitation and weight bitmaps (bit maps) to the local cache array; controls the bit and addressing array to select the non-zero excitation and weight data in the local cache array and send them to the computing units for calculation; and controls the local cache array to distribute the weights, the excitations, and the weight and excitation bitmaps to the excitation and weight zero detection array and the bit and addressing array, respectively.
The local cache array caches the data distributed by the weight and excitation distribution module locally, so that the non-zero data picked out by the excitation and weight zero detection array can be sent to the computing units in the PE array for calculation. The array consists of four parts: an excitation buffer, an excitation bitmap buffer, a weight buffer and a weight bitmap buffer. The size of the cache array matches the data of a single update. The excitation buffer and the weight buffer are filled by an external distributor through the control unit, while the excitation bitmap buffer and the weight bitmap buffer are computed and returned by the excitation and weight zero detection array.
The excitation and weight zero detection array mainly performs a set-to-0 or set-to-1 operation on the excitation and weight data: if a datum is judged to be greater than 0 the corresponding bit is set to 1, otherwise it is set to 0; the processed bitmap data are returned to the local storage array. The module comprises: (1) the detection unit, which pulls its flag high after detecting that all channel data of the excitation and weight data transmitted by the local cache array have been completely stored; (2) the screening unit, which performs non-zero screening on the excitation and weight data after the detection unit's flag goes high, generates the excitation and weight bit maps, and sends them to the local cache module for storage.
The excitation and weight zero detection array thus screens out the non-zero excitations and weights, so that zeros do not participate in the operations and waste computing resources. The bit and addressing array performs a bitwise AND on the input excitation bit map vector and the weight bit map vector to obtain the address indices (index) at which both the weight and the excitation value are non-zero; each time an index is detected, the corresponding index value is read out.
Finally, the local cache array fetches the weight and excitation at the position given by each index and sends them to the PE array for the multiply-accumulate operation.
As shown in fig. 2, taking a 4×9×8 data set as an example, the excitation and weight zero detection array comprises 4 parallel branches, each parallel branch comprises 9 working groups, and each group comprises 8 operation units, each consisting of a screening unit and a detection unit. The corresponding weight and excitation data are input to the corresponding screening units in the different working groups of the same branch. The detection unit pulls its flag high after detecting that all channel data of the excitation and weight data transmitted by the local cache array have been completely stored; when the screening unit detects that the flag is high, it starts working: if a datum is judged to be greater than 0 it sets the bit to 1, otherwise to 0. The processed weight and excitation bitmaps are then returned to the local cache module.
As shown in fig. 3, the structure of the bit and addressing module matches the excitation and weight array architecture, comprising 4 parallel branches, each parallel branch comprising 9 working groups, and each group comprising 8 addressing units; each addressing unit comprises a bit-AND logic operator, a 72-bit register and an address selection unit. The input excitation bit map vector and weight bit map vector are bitwise ANDed to obtain the address indices (index) at which both the weight and the excitation value are non-zero; each time an index is detected, the corresponding index value is read out and transmitted. The local cache array fetches the weight and excitation data at the position given by the index and sends them to the PE array for the multiply-accumulate operation. Since the excitation and weight data may not arrive at the same time, the index calculation only starts once both the excitation detection and the weight detection signals are valid.
As shown in fig. 4, the PE array is mainly responsible for the convolution operations of the multi-layer network and matches the excitation and weight zero detection array architecture, comprising 4 parallel branches, each parallel branch comprising 9 working groups, and each group comprising 8 PE computing units. The excitations and weights computed by each PE unit come from the same channel of the input image and of the convolution kernel, respectively. PEs in the same working group of the same branch compute the non-zero excitation-weight multiply-accumulate values within the same sliding window. PEs with the same number in different working groups of the same branch compute the non-zero excitation-weight multiply-accumulate values corresponding to the same output channel in different sliding windows.
A PE unit computes the multiply-accumulation of the non-zero excitations and weights over all input channels of a 3×3 window. After the non-zero detection array has computed the indices (index) at which both the excitation and the weight are non-zero, the local cache array reads the corresponding excitations and weights according to the indices and sends them to the PE unit for the multiply-accumulate operation. The partial convolution sums are stored in a register. After the convolution over 8 or 16 channels has been computed, the PE unit receives the I_clear signal and accumulation ends; the accumulated result is quantized by shifting, yielding the 8-bit convolution result O_PEout, the valid-output signal is asserted by pulling O_PEout_valid high, and the accumulation register in the PE unit is cleared.
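The multiply-accumulate and shift-based quantization performed by a PE unit can be modeled as follows. This is a behavioral sketch only; the function name, the list of pairs and the shift amount are illustrative, not the patent's fixed parameters:

```python
def pe_multiply_accumulate(pairs, shift=2):
    """PE-unit model: multiply-accumulate the non-zero (excitation, weight)
    pairs fetched via the indices, then quantize the accumulated sum to a
    narrower result with an arithmetic right shift."""
    acc = 0
    for excitation, weight in pairs:
        acc += excitation * weight
    return acc >> shift

# Illustrative pairs: (6 + 35 + 4) = 45, quantized by >> 2 to 11.
result = pe_multiply_accumulate([(3, 2), (7, 5), (1, 4)])
```

In the hardware, the accumulation ends on I_clear and the shifted value is driven out as O_PEout with O_PEout_valid high; the sketch simply returns the quantized sum.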
In connection with fig. 5, the overall flow of the non-zero detector is as follows: (1) the weight and excitation data are input, cached in the local cache array, and transmitted by the local cache array to the excitation and weight zero detection module; (2) when the detection unit detects that the transmission is complete, the screening unit starts working, generates the excitation and weight bitmaps, and returns them to the local cache array; (3) the local cache array transmits the bitmaps to the bit and addressing module; after the detection unit detects that the transmission is complete, the bit and addressing unit starts working, computes the addresses of the non-zero weight and excitation data, and returns them to the local cache module; (4) the local buffer module transmits the non-zero data to the PE computing units according to the screened-out addresses of the non-zero weights and excitations; (5) when all the non-zero weights and excitations are detected to have been transmitted, the PE computing units start working.
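The overall flow of fig. 5 can be condensed into one behavioral sketch (a software model only; the function name and the sample data are illustrative, not from the patent):

```python
def nonzero_detect_and_accumulate(excitations, weights):
    """End-to-end model: build the excitation and weight bit maps, bitwise-AND
    them to obtain the addresses where both operands are non-zero, then
    multiply-accumulate only those positions (the work handed to a PE unit)."""
    exc_bitmap = [1 if v != 0 else 0 for v in excitations]
    wgt_bitmap = [1 if v != 0 else 0 for v in weights]
    addresses = [i for i, (e, w) in enumerate(zip(exc_bitmap, wgt_bitmap))
                 if e & w]
    acc = sum(excitations[i] * weights[i] for i in addresses)
    return addresses, acc

# Illustrative data: only positions 3 and 4 are non-zero in both vectors.
addrs, acc = nonzero_detect_and_accumulate([0, 3, 0, 7, 1, 0],
                                           [2, 0, 0, 5, 4, 0])
```

Of the six positions, only two multiplications are performed; the remaining four would have involved a zero operand and are skipped, which is the redundancy the detector eliminates.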
The invention designs a non-zero detector for the excitations and weights in a convolutional neural network. It screens out the non-zero excitations and weights, prevents zeros from participating in the operations and wasting computing resources, reduces the demand on the computing units to a certain extent, and improves the efficiency of the convolution operation. The time required by the non-zero detection algorithm is far less than the time that would be consumed by zeros participating in the convolution computation, so the performance of the convolutional neural network is improved.
The above description of the non-zero detector for convolutional neural networks is provided to facilitate understanding of the invention and its core idea. Those skilled in the art may make various modifications and derivations based on this core idea when putting the invention into practice. In view of the foregoing, the description should not be taken as limiting the invention.

Claims (3)

1. A non-zero detector for a convolutional neural network, comprising:
the top-level control unit is used for storing the input excitation and weight data into the local buffer module and for controlling the operation of the excitation and weight zero detection module and the bit and addressing module;
the local buffer module is used for storing the excitation and weight data of the convolutional neural network, including the excitation data, excitation bit map data, weight data and weight bit map data;
the excitation and weight zero detection module is used for performing non-zero screening on the excitation and weight data in the local buffer module and returning the obtained bit maps to the local buffer module;
and the bit and addressing module is used for performing a bitwise AND on the excitation bit map vector and the weight bit map vector to obtain the addresses at which both the weight and the excitation value are non-zero, and for outputting these addresses to the local buffer module.
2. A non-zero detector for convolutional neural networks according to claim 1, wherein the excitation and weight zero detection module comprises:
the detection unit is used for detecting whether the excitation and weight data transmitted by the local cache module are completely stored;
and the screening unit is used for carrying out non-zero screening on the excitation and weight data and sending the generated bit map of the excitation and weight to the local buffer module.
3. A detection method using a non-zero detector for convolutional neural networks as claimed in claim 1, comprising the steps of:
(1) The input weight and excitation data are cached in the local buffer module through the top-level control unit, and then transmitted to the excitation and weight zero detection module by the local buffer module;
(2) When the excitation and weight zero detection module detects that all data have been completely transmitted, non-zero screening of the data is carried out, the excitation and weight bit maps are generated, and the generated bit maps are returned to the local buffer module;
(3) The local buffer module transmits the bit maps to the bit and addressing module; when the excitation and weight zero detection module detects that all data have been completely transmitted, the bit and addressing module begins to perform a bitwise AND on the input excitation bit map vector and the weight bit map vector, computes the addresses of the non-zero weight and non-zero excitation data, and returns the addresses to the local buffer module;
(4) The local buffer module transmits the non-zero data to the computing unit according to the screened-out addresses of the non-zero excitation data and non-zero weights.
CN202010347546.XA 2020-04-28 2020-04-28 Non-zero detector for convolutional neural network and method thereof Active CN111445013B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010347546.XA CN111445013B (en) 2020-04-28 2020-04-28 Non-zero detector for convolutional neural network and method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010347546.XA CN111445013B (en) 2020-04-28 2020-04-28 Non-zero detector for convolutional neural network and method thereof

Publications (2)

Publication Number Publication Date
CN111445013A (en) 2020-07-24
CN111445013B (en) 2023-04-25

Family

ID=71656289

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010347546.XA Active CN111445013B (en) 2020-04-28 2020-04-28 Non-zero detector for convolutional neural network and method thereof

Country Status (1)

Country Link
CN (1) CN111445013B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109472350A (en) * 2018-10-30 2019-03-15 Nanjing University A kind of neural network acceleration system based on block circulation sparse matrix

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107229967B (en) * 2016-08-22 2021-06-15 赛灵思公司 Hardware accelerator and method for realizing sparse GRU neural network based on FPGA
US10360163B2 (en) * 2016-10-27 2019-07-23 Google Llc Exploiting input data sparsity in neural network compute units
CN109993297A (en) * 2019-04-02 2019-07-09 南京吉相传感成像技术研究院有限公司 A kind of the sparse convolution neural network accelerator and its accelerated method of load balancing
CN110110851B (en) * 2019-04-30 2023-03-24 南京大学 FPGA accelerator of LSTM neural network and acceleration method thereof
CN110222835A (en) * 2019-05-13 2019-09-10 西安交通大学 A kind of convolutional neural networks hardware system and operation method based on zero value detection


Also Published As

Publication number Publication date
CN111445013A (en) 2020-07-24

Similar Documents

Publication Publication Date Title
CN109598338B (en) Convolutional neural network accelerator based on FPGA (field programmable Gate array) for calculation optimization
KR102542580B1 (en) System and method for optimizing performance of a solid-state drive using a deep neural network
CN110135554A (en) A kind of hardware-accelerated framework of convolutional neural networks based on FPGA
CN110070178A (en) A kind of convolutional neural networks computing device and method
CN109741318B (en) Real-time detection method of single-stage multi-scale specific target based on effective receptive field
CN110084739A (en) A kind of parallel acceleration system of FPGA of the picture quality enhancement algorithm based on CNN
CN104239233B (en) Buffer memory management method, cache management device and caching management equipment
KR20180034853A (en) Apparatus and method test operating of convolutional neural network
CN110688088B (en) General nonlinear activation function computing device and method for neural network
CN111626403B (en) Convolutional neural network accelerator based on CPU-FPGA memory sharing
KR20180080876A (en) Convolution circuit, application processor having the same, and operating methoe thereof
CN107153522A (en) A kind of dynamic accuracy towards artificial neural networks can match somebody with somebody approximate multiplier
CN110032538A (en) A kind of data reading system and method
CN112966807B (en) Convolutional neural network implementation method based on storage resource limited FPGA
CN111768458A (en) Sparse image processing method based on convolutional neural network
CN111401532A (en) Convolutional neural network reasoning accelerator and acceleration method
CN115617712A (en) LRU replacement algorithm based on set associative Cache
CN111445013B (en) Non-zero detector for convolutional neural network and method thereof
CN108881254A (en) Intruding detection system neural network based
CN110222835A (en) A kind of convolutional neural networks hardware system and operation method based on zero value detection
Sanny et al. Energy-efficient median filter on FPGA
CN104050635B (en) System and method for nonlinear filter real-time processing of image with adjustable template size
CN108647780B (en) Reconfigurable pooling operation module structure facing neural network and implementation method thereof
CN102207909A (en) Cost-based buffer area replacement method of flash memory database
CN110728303B (en) Dynamic self-adaptive computing array based on convolutional neural network data complexity

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant