CN111445013B - Non-zero detector for convolutional neural network and method thereof - Google Patents

Non-zero detector for convolutional neural network and method thereof

Info

Publication number
CN111445013B
CN111445013B (application CN202010347546.XA)
Authority
CN
China
Prior art keywords
excitation
weight
zero
module
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010347546.XA
Other languages
Chinese (zh)
Other versions
CN111445013A (en)
Inventor
岳涛
时云凿
王蔓蓁
邱禹欧
潘红兵
闫锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN202010347546.XA
Publication of CN111445013A
Application granted
Publication of CN111445013B
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a non-zero detector for a convolutional neural network and a method thereof. The non-zero detector comprises a top-level control unit, a local buffer module, an excitation and weight zero detection module, and a bit and addressing module. The top-level control unit is used for storing the input excitation and weight data into the local buffer module and for controlling the operation of the excitation and weight zero detection module and the bit and addressing module; the local buffer module is used for storing the excitation and weight data of the convolutional neural network; the excitation and weight zero detection module is used for performing non-zero screening on the excitation and weight data in the local buffer module and returning the obtained bit maps to the local buffer module; and the bit and addressing module is used for performing a bitwise AND on the excitation bit map vector and the weight bit map vector to obtain the addresses at which both the weight and the excitation value are non-zero, and for outputting these addresses to the local buffer module. Without occupying much additional storage or computing resource, the non-zero detector provided by the invention effectively improves the computational efficiency of the convolutional neural network and reduces its computation load.

Description

Non-zero detector for convolutional neural network and method thereof
Technical Field
The invention belongs to the field of digital image classification, and particularly relates to a non-zero detector for a convolutional neural network and a method thereof.
Background
Convolutional neural networks are feedforward neural networks with a deep structure that include convolution computation, and they are widely used in computer vision and related fields. Convolutional neural networks perform very well in image classification, a large number of mature training methods and tools exist for them, and classical convolutional neural network models such as LeNet-5, AlexNet and VGG-16 have been extensively validated.
A convolutional neural network contains a large number of convolution operations; as the classification accuracy keeps improving, the convolutional layers become progressively deeper and the numbers of convolution kernels and channels grow. Moreover, classification is no longer limited to small data sets, so the size of the input excitation also grows with the data sets. These factors greatly increase the convolution operations in the convolutional layers and demand large amounts of computing resources.
In convolutional neural networks, a large proportion of the excitations and weights are zero because of the chosen activation (excitation) function. When these zeros participate in the convolution operations, a considerable part of the computation is invalid: some computing and storage units perform meaningless convolution operations, which reduces the performance of the convolutional neural network and increases the time needed to train it.
Therefore, as convolutional neural network algorithms continue to spread and the pursuit of higher classification accuracy greatly increases the amount of data computation, avoiding the computational redundancy caused by the many zeros participating in the convolution, so that most computing units perform effective convolution calculations, is currently an effective way to reduce the computation load of convolutional neural networks.
Disclosure of Invention
To solve the above problems, the invention provides a non-zero detector for a convolutional neural network and a method thereof, based on the idea of maximizing the utilization of the computing units.
The non-zero detector adopts the following technical scheme:
a non-zero detector for a convolutional neural network, comprising:
the top-level control unit is used for storing the input excitation and weight data into the local buffer module and for controlling the operation of the excitation and weight zero detection module and the bit and addressing module;
the local buffer module is used for storing the excitation and weight data of the convolutional neural network, including the excitation data, excitation bit map data, weight data and weight bit map data;
the excitation and weight zero detection module is used for performing non-zero screening on the excitation and weight data in the local buffer module and returning the obtained bit maps to the local buffer module;
and the bit and addressing module is used for performing a bitwise AND on the excitation bit map vector and the weight bit map vector to obtain the addresses at which both the weight and the excitation value are non-zero, and for outputting these addresses to the local buffer module.
Further, the excitation and weight zero detection module includes: the detection unit, used for detecting whether the excitation and weight data transmitted by the local buffer module have been completely stored; and the screening unit, used for performing non-zero screening on the excitation and weight data and sending the generated excitation and weight bit maps to the local buffer module.
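The screening performed by the zero detection module can be illustrated with a minimal software sketch. This is a behavioral model only, not the patent's hardware implementation; the function name and the sample data are illustrative:

```python
def make_bitmap(values):
    """Screening-unit model: 1 where the datum is non-zero, 0 where it is zero.
    (The hardware described later compares against > 0, since ReLU excitations
    are non-negative; != 0 is the general software equivalent.)"""
    return [1 if v != 0 else 0 for v in values]

# Illustrative data, not from the patent:
excitations = [0, 3, 0, 7, 1, 0]
weights = [2, 0, 0, 5, 4, 0]
excitation_bitmap = make_bitmap(excitations)  # [0, 1, 0, 1, 1, 0]
weight_bitmap = make_bitmap(weights)          # [1, 0, 0, 1, 1, 0]
```

The two bit maps are what the module returns to the local buffer module for the addressing step.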
The invention also discloses a detection method using the non-zero detector for a convolutional neural network, comprising the following steps:
(1) The input weight and excitation data are cached in the local buffer module through the top-level control unit, and then transmitted to the excitation and weight zero detection module by the local buffer module;
(2) When the excitation and weight zero detection module detects that all data have been completely transmitted, it screens the data, generates the excitation and weight bit maps, and returns them to the local buffer module;
(3) The local buffer module transmits the bit maps to the bit and addressing module; when the excitation and weight zero detection module detects that all data have been completely transmitted, the bit and addressing module starts working, computes the addresses of the non-zero weight and excitation data, and returns the addresses to the local buffer module;
(4) The local buffer module transmits the non-zero data to the computing unit according to the screened-out addresses of the non-zero weight and excitation data.
Further, in step (2), the detection unit of the excitation and weight zero detection module pulls its flag high after detecting that all data have been completely stored; the screening unit of the excitation and weight zero detection module then performs non-zero screening on the excitation and weight data and sends the generated excitation and weight bit maps to the local buffer module for storage.
Further, in step (3), the bit and addressing module performs a bitwise AND on the input excitation bit map vector and the weight bit map vector to obtain the addresses at which both the weight and the excitation value are non-zero, and outputs each address to the local buffer module as soon as it is obtained.
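The bitwise-AND addressing step can likewise be sketched in software. This is a behavioral model under the assumption that the two bit map vectors have equal length; the function name is illustrative:

```python
def nonzero_addresses(excitation_bitmap, weight_bitmap):
    """Bit and addressing model: AND the two bit map vectors and emit each
    address (index) where both the excitation and the weight are non-zero."""
    return [i for i, (e, w) in enumerate(zip(excitation_bitmap, weight_bitmap))
            if e & w]

# Illustrative bit maps: positions 3 and 4 are non-zero in both vectors.
addresses = nonzero_addresses([0, 1, 0, 1, 1, 0], [1, 0, 0, 1, 1, 0])  # [3, 4]
```

Only the positions surviving the AND are later fetched from the local buffer module, which is the saving the detector targets.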
Compared with the prior art, the invention has the advantages that:
(1) The non-zero detector provided by the invention screens out the non-zero excitations and weights, avoids wasting computing resources on calculations involving zeros, reduces the demand on the computing units to a certain extent, and effectively improves the efficiency of the convolution computation.
(2) The time required by the non-zero detection method is far less than the time that would be consumed by zeros participating in the convolution computation, so the performance of the convolutional neural network is improved.
Drawings
FIG. 1 is a diagram of the overall architecture of a non-zero detector;
FIG. 2 is a schematic diagram of the internal structure of the excitation and weight zero detection module;
FIG. 3 is a schematic diagram of the internal structure of the bit and address module;
FIG. 4 is a flow diagram of a PE computation module;
fig. 5 is a flow diagram of a non-zero detector.
Detailed Description
The following describes the scheme of the invention in detail with reference to the accompanying drawings.
As shown in fig. 1, the non-zero detector for a convolutional neural network of this embodiment includes a top-level control unit, an excitation and weight zero detection array, a local cache array, and a bit and addressing array. The excitation and weight data are distributed to the local cache array by an external distribution module through the top-level control unit, and the local cache array is connected to the PE (Processing Element) computing units.
The top-level control unit mainly receives the excitation and weight data and stores them into the local cache array; controls the excitation and weight zero detection array to perform zero detection on the data and to return the obtained excitation and weight bitmaps (bit maps) to the local cache array; controls the bit and addressing array to select the non-zero excitation and weight data in the local cache array and send them to the computing units for calculation; and controls the local cache array to distribute the weights, the excitations, and the weight and excitation bitmaps to the excitation and weight zero detection array and the bit and addressing array, respectively.
The local cache array caches the data distributed by the weight and excitation distribution module locally, so that the non-zero data picked out by the excitation and weight zero detection array can be sent to the computing units in the PE array for calculation. The array consists of four parts: an excitation buffer, an excitation bitmap buffer, a weight buffer and a weight bitmap buffer. The size of the cache array matches the data of a single update. The excitation buffer and the weight buffer are filled by an external distributor through the control unit, while the excitation bitmap buffer and the weight bitmap buffer are computed and returned by the excitation and weight zero detection array.
The excitation and weight zero detection array mainly performs a set-to-0 or set-to-1 operation on the excitation and weight data: if a datum is judged to be greater than 0 the corresponding bit is set to 1, otherwise it is set to 0; the processed bitmap data are returned to the local storage array. The module comprises: (1) the detection unit, which pulls its flag high after detecting that all channel data of the excitation and weight data transmitted by the local cache array have been completely stored; (2) the screening unit, which performs non-zero screening on the excitation and weight data after the detection unit's flag goes high, generates the excitation and weight bit maps, and sends them to the local cache module for storage.
The excitation and weight zero detection array thus screens out the non-zero excitations and weights, so that zeros do not participate in the operations and waste computing resources. The bit and addressing array performs a bitwise AND on the input excitation bit map vector and the weight bit map vector to obtain the address indices (index) at which both the weight and the excitation value are non-zero; each time an index is detected, the corresponding index value is read out.
Finally, the local cache array fetches the weight and excitation at the position given by each index and sends them to the PE array for the multiply-accumulate operation.
As shown in fig. 2, taking a 4×9×8 data set as an example, the excitation and weight zero detection array comprises 4 parallel branches, each parallel branch comprises 9 working groups, and each group comprises 8 operation units, each consisting of a screening unit and a detection unit. The corresponding weight and excitation data are input to the corresponding screening units in the different working groups of the same branch. The detection unit pulls its flag high after detecting that all channel data of the excitation and weight data transmitted by the local cache array have been completely stored; when the screening unit detects that the flag is high, it starts working: if a datum is judged to be greater than 0 it sets the bit to 1, otherwise to 0. The processed weight and excitation bitmaps are then returned to the local cache module.
As shown in fig. 3, the structure of the bit and addressing module matches the excitation and weight array architecture, comprising 4 parallel branches, each parallel branch comprising 9 working groups, and each group comprising 8 addressing units; each addressing unit comprises a bit-AND logic operator, a 72-bit register and an address selection unit. The input excitation bit map vector and weight bit map vector are bitwise ANDed to obtain the address indices (index) at which both the weight and the excitation value are non-zero; each time an index is detected, the corresponding index value is read out and transmitted. The local cache array fetches the weight and excitation data at the position given by the index and sends them to the PE array for the multiply-accumulate operation. Since the excitation and weight data may not arrive at the same time, the index calculation only starts once both the excitation detection and the weight detection signals are valid.
As shown in fig. 4, the PE array is mainly responsible for the convolution operations of the multi-layer network and matches the excitation and weight zero detection array architecture, comprising 4 parallel branches, each parallel branch comprising 9 working groups, and each group comprising 8 PE computing units. The excitations and weights computed by each PE unit come from the same channel of the input image and of the convolution kernel, respectively. PEs in the same working group of the same branch compute the non-zero excitation-weight multiply-accumulate values within the same sliding window. PEs with the same number in different working groups of the same branch compute the non-zero excitation-weight multiply-accumulate values corresponding to the same output channel in different sliding windows.
A PE unit computes the multiply-accumulation of the non-zero excitations and weights over all input channels of a 3×3 window. After the non-zero detection array has computed the indices (index) at which both the excitation and the weight are non-zero, the local cache array reads the corresponding excitations and weights according to the indices and sends them to the PE unit for the multiply-accumulate operation. The partial convolution sums are stored in a register. After the convolution over 8 or 16 channels has been computed, the PE unit receives the I_clear signal and accumulation ends; the accumulated result is quantized by shifting, yielding the 8-bit convolution result O_PEout, the valid-output signal is asserted by pulling O_PEout_valid high, and the accumulation register in the PE unit is cleared.
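The multiply-accumulate and shift-based quantization performed by a PE unit can be modeled as follows. This is a behavioral sketch only; the function name, the list of pairs and the shift amount are illustrative, not the patent's fixed parameters:

```python
def pe_multiply_accumulate(pairs, shift=2):
    """PE-unit model: multiply-accumulate the non-zero (excitation, weight)
    pairs fetched via the indices, then quantize the accumulated sum to a
    narrower result with an arithmetic right shift."""
    acc = 0
    for excitation, weight in pairs:
        acc += excitation * weight
    return acc >> shift

# Illustrative pairs: (6 + 35 + 4) = 45, quantized by >> 2 to 11.
result = pe_multiply_accumulate([(3, 2), (7, 5), (1, 4)])
```

In the hardware, the accumulation ends on I_clear and the shifted value is driven out as O_PEout with O_PEout_valid high; the sketch simply returns the quantized sum.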
In connection with fig. 5, the overall flow of the non-zero detector is as follows: (1) the weight and excitation data are input, cached in the local cache array, and transmitted by the local cache array to the excitation and weight zero detection module; (2) when the detection unit detects that the transmission is complete, the screening unit starts working, generates the excitation and weight bitmaps, and returns them to the local cache array; (3) the local cache array transmits the bitmaps to the bit and addressing module; after the detection unit detects that the transmission is complete, the bit and addressing unit starts working, computes the addresses of the non-zero weight and excitation data, and returns them to the local cache module; (4) the local buffer module transmits the non-zero data to the PE computing units according to the screened-out addresses of the non-zero weights and excitations; (5) when all the non-zero weights and excitations are detected to have been transmitted, the PE computing units start working.
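The overall flow of fig. 5 can be condensed into one behavioral sketch (a software model only; the function name and the sample data are illustrative, not from the patent):

```python
def nonzero_detect_and_accumulate(excitations, weights):
    """End-to-end model: build the excitation and weight bit maps, bitwise-AND
    them to obtain the addresses where both operands are non-zero, then
    multiply-accumulate only those positions (the work handed to a PE unit)."""
    exc_bitmap = [1 if v != 0 else 0 for v in excitations]
    wgt_bitmap = [1 if v != 0 else 0 for v in weights]
    addresses = [i for i, (e, w) in enumerate(zip(exc_bitmap, wgt_bitmap))
                 if e & w]
    acc = sum(excitations[i] * weights[i] for i in addresses)
    return addresses, acc

# Illustrative data: only positions 3 and 4 are non-zero in both vectors.
addrs, acc = nonzero_detect_and_accumulate([0, 3, 0, 7, 1, 0],
                                           [2, 0, 0, 5, 4, 0])
```

Of the six positions, only two multiplications are performed; the remaining four would have involved a zero operand and are skipped, which is the redundancy the detector eliminates.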
The invention designs a non-zero detector for the excitations and weights in a convolutional neural network. It screens out the non-zero excitations and weights, prevents zeros from participating in the operations and wasting computing resources, reduces the demand on the computing units to a certain extent, and improves the efficiency of the convolution operation. The time required by the non-zero detection algorithm is far less than the time that would be consumed by zeros participating in the convolution computation, so the performance of the convolutional neural network is improved.
The above description of the non-zero detector for convolutional neural networks is provided to facilitate understanding of the invention and its core idea. Those skilled in the art may make various modifications and derivations based on this core idea when putting the invention into practice. In view of the foregoing, the description should not be taken as limiting the invention.

Claims (3)

1. A non-zero detector for a convolutional neural network, comprising:
the top-level control unit is used for storing the input excitation and weight data into the local buffer module and for controlling the operation of the excitation and weight zero detection module and the bit and addressing module;
the local buffer module is used for storing the excitation and weight data of the convolutional neural network, including the excitation data, excitation bit map data, weight data and weight bit map data;
the excitation and weight zero detection module is used for performing non-zero screening on the excitation and weight data in the local buffer module and returning the obtained bit maps to the local buffer module;
and the bit and addressing module is used for performing a bitwise AND on the excitation bit map vector and the weight bit map vector to obtain the addresses at which both the weight and the excitation value are non-zero, and for outputting these addresses to the local buffer module.
2. A non-zero detector for convolutional neural networks according to claim 1, wherein the excitation and weight zero detection module comprises:
the detection unit is used for detecting whether the excitation and weight data transmitted by the local cache module are completely stored;
and the screening unit is used for carrying out non-zero screening on the excitation and weight data and sending the generated bit map of the excitation and weight to the local buffer module.
3. A detection method using a non-zero detector for convolutional neural networks as claimed in claim 1, comprising the steps of:
(1) The input weight and excitation data are cached in the local buffer module through the top-level control unit, and then transmitted to the excitation and weight zero detection module by the local buffer module;
(2) When the excitation and weight zero detection module detects that all data have been completely transmitted, non-zero screening of the data is carried out, the excitation and weight bit maps are generated, and the generated bit maps are returned to the local buffer module;
(3) The local buffer module transmits the bit maps to the bit and addressing module; when the excitation and weight zero detection module detects that all data have been completely transmitted, the bit and addressing module begins to perform a bitwise AND on the input excitation bit map vector and the weight bit map vector, computes the addresses of the non-zero weight and non-zero excitation data, and returns the addresses to the local buffer module;
(4) The local buffer module transmits the non-zero data to the computing unit according to the screened-out addresses of the non-zero excitation data and non-zero weights.
CN202010347546.XA 2020-04-28 2020-04-28 Non-zero detector for convolutional neural network and method thereof Active CN111445013B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010347546.XA CN111445013B (en) 2020-04-28 2020-04-28 Non-zero detector for convolutional neural network and method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010347546.XA CN111445013B (en) 2020-04-28 2020-04-28 Non-zero detector for convolutional neural network and method thereof

Publications (2)

Publication Number Publication Date
CN111445013A (en) 2020-07-24
CN111445013B (en) 2023-04-25

Family

ID=71656289

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010347546.XA Active CN111445013B (en) 2020-04-28 2020-04-28 Non-zero detector for convolutional neural network and method thereof

Country Status (1)

Country Link
CN (1) CN111445013B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109472350A (en) * 2018-10-30 2019-03-15 Nanjing University A kind of neural network acceleration system based on block circulation sparse matrix

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107229967B (en) * 2016-08-22 2021-06-15 赛灵思公司 Hardware accelerator and method for realizing sparse GRU neural network based on FPGA
US10360163B2 (en) * 2016-10-27 2019-07-23 Google Llc Exploiting input data sparsity in neural network compute units
CN109993297A (en) * 2019-04-02 2019-07-09 南京吉相传感成像技术研究院有限公司 A kind of the sparse convolution neural network accelerator and its accelerated method of load balancing
CN110110851B (en) * 2019-04-30 2023-03-24 南京大学 FPGA accelerator of LSTM neural network and acceleration method thereof
CN110222835A (en) * 2019-05-13 2019-09-10 西安交通大学 A kind of convolutional neural networks hardware system and operation method based on zero value detection


Also Published As

Publication number Publication date
CN111445013A (en) 2020-07-24

Similar Documents

Publication Publication Date Title
CN109598338B (en) Convolutional neural network accelerator based on FPGA (field programmable Gate array) for calculation optimization
KR102542580B1 (en) System and method for optimizing performance of a solid-state drive using a deep neural network
CN110135554A (en) A kind of hardware-accelerated framework of convolutional neural networks based on FPGA
CN110070178A (en) A kind of convolutional neural networks computing device and method
CN109741318B (en) Real-time detection method of single-stage multi-scale specific target based on effective receptive field
CN110084739A (en) A kind of parallel acceleration system of FPGA of the picture quality enhancement algorithm based on CNN
CN104239233B (en) Buffer memory management method, cache management device and caching management equipment
KR20180034853A (en) Apparatus and method test operating of convolutional neural network
CN110688088B (en) General nonlinear activation function computing device and method for neural network
CN111626403B (en) Convolutional neural network accelerator based on CPU-FPGA memory sharing
KR20180080876A (en) Convolution circuit, application processor having the same, and operating methoe thereof
CN107153522A (en) A kind of dynamic accuracy towards artificial neural networks can match somebody with somebody approximate multiplier
CN110032538A (en) A kind of data reading system and method
CN112966807B (en) Convolutional neural network implementation method based on storage resource limited FPGA
CN111768458A (en) Sparse image processing method based on convolutional neural network
CN111401532A (en) Convolutional neural network reasoning accelerator and acceleration method
CN115617712A (en) LRU replacement algorithm based on set associative Cache
CN111445013B (en) Non-zero detector for convolutional neural network and method thereof
CN108881254A (en) Intruding detection system neural network based
CN110222835A (en) A kind of convolutional neural networks hardware system and operation method based on zero value detection
Sanny et al. Energy-efficient median filter on FPGA
CN104050635B (en) System and method for nonlinear filter real-time processing of image with adjustable template size
CN108647780B (en) Reconfigurable pooling operation module structure facing neural network and implementation method thereof
CN102207909A (en) Cost-based buffer area replacement method of flash memory database
CN110728303B (en) Dynamic self-adaptive computing array based on convolutional neural network data complexity

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant