CN109102065B - Convolutional neural network accelerator based on PSoC - Google Patents

Convolutional neural network accelerator based on PSoC

Info

Publication number
CN109102065B
Authority
CN
China
Prior art keywords
memory
module
neural network
convolutional neural
multiply
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810689938.7A
Other languages
Chinese (zh)
Other versions
CN109102065A (en)
Inventor
熊晓明 (Xiong Xiaoming)
李子聪 (Li Zicong)
曾宇航 (Zeng Yuhang)
胡湘宏 (Hu Xianghong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chipeye Microelectronics Foshan Ltd
Guangdong University of Technology
Original Assignee
Chipeye Microelectronics Foshan Ltd
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chipeye Microelectronics Foshan Ltd and Guangdong University of Technology
Priority to CN201810689938.7A
Publication of CN109102065A
Application granted
Publication of CN109102065B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Complex Calculations (AREA)

Abstract

Disclosed is a convolutional neural network accelerator built on a PSoC device, comprising an off-chip memory, a CPU, a feature map input memory, a feature map output memory, a bias memory, a weight memory, direct memory access (DMA), and computation units equal in number to the neurons. Each computation unit includes a first-in first-out queue, a state machine, data selectors, an average pooling module, a maximum pooling module, a multiply-add computation module, and an activation function module. The multiply-add computation modules execute in parallel, and the accelerator can serve convolutional neural network systems of multiple architectures. The invention fully exploits the programmable fabric of a Programmable System on Chip (PSoC) device to implement the computation-heavy, highly parallel part of the convolutional neural network, while the CPU implements the serial algorithms and state control.

Description

Convolutional neural network accelerator based on PSoC
Technical Field
The invention relates to convolutional neural network architectures, and in particular to a PSoC-based convolutional neural network accelerator.
Background
Through local connections and weight sharing, convolutional neural networks have unique advantages in image processing: their layout is closer to that of actual biological neural networks, and sharing weights reduces both the complexity of the network and its computational load. Convolutional neural networks are now widely applied in video surveillance, machine vision, pattern recognition, image search, and related fields.
However, hardware implementations of convolutional networks require substantial hardware resources and suffer from low bandwidth utilization and limited data reuse. A convolutional neural network must support convolution, pooling, and fully connected operations of different sizes, and many applications also include an image processing stage, so a pure hardware logic implementation on an FPGA (Field Programmable Gate Array) limits extensibility: the network implemented in hardware is fixed, bandwidth utilization is low, and the design cannot be extended to convolutional neural networks of other structures. A PSoC device combines a hardware-programmable fabric with software programmability, making it a suitable platform for implementing convolutional neural networks.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a PSoC-based convolutional neural network accelerator in which the hardware-programmable part of the whole accelerator reduces to a multiply-add computation module, an activation function module, a maximum pooling module, and an average pooling module. The multiply-add operations in all multiply-add computation modules are computed in parallel, convolutions with different kernel sizes are supported, and the problems of the convolutional neural network's large computational load and high bandwidth demand are addressed. The software part implements the softmax classifier, the non-maximum suppression algorithm, and the image processing algorithms that cannot be realized in hardware logic, and handles the configuration of convolutional neural networks with different network structures.
The purpose of the invention is realized by the following technical scheme: a PSoC-based convolutional neural network accelerator, comprising: an off-chip memory, a CPU, a feature map input memory, a feature map output memory, a bias memory, a weight memory, direct memory access (DMA), and computation units equal in number to the neurons.
Under the control of the CPU, the DMA reads data from the off-chip memory and transfers it to the feature map input memory, the bias memory, and the weight memory, or writes data from the feature map output memory back to the off-chip memory. The CPU controls the storage locations of the input feature map, bias, weights, and output feature map in the off-chip memory, as well as the parameter transfers of the multi-layer convolutional neural network, so as to adapt to neural networks of various architectures.
Further, the computation unit comprises a first-in first-out queue, a state machine, a first data selector, a second data selector, an average pooling module, a maximum pooling module, a multiply-add computation module, and an activation function module,
wherein the first data selector communicates with the feature map input memory, and input feature map data is routed through the first data selector to the average pooling module, the maximum pooling module, the multiply-add computation module, and the activation function module,
and the second data selector communicates with the feature map output memory, and the outputs of the average pooling module, the maximum pooling module, and the multiply-add computation module are selected through the second data selector and written to the feature map output memory.
Further, the multiply-add computation module is based on a structure combining a multiply-add tree with multiply-add registers, and receives an input feature map matrix, a weight input matrix, and a bias matrix.
Further, the activation function module comprises a first configuration register, a first selector, a first multiplier, and a first adder, and implements the tangent, sigmoid, and ReLU functions; the CPU writes the first configuration register of the activation function module so that the activation function is realized in hardware logic.
Further, the average pooling module comprises a second configuration register, a second multiplier, and a second adder; the CPU configures the average pooling module to realize average pooling of a matrix and obtain the matrix mean.
Further, the maximum pooling module comprises a third configuration register, a comparator, and a second selector; the CPU configures the maximum pooling module to realize maximum pooling of a matrix, comparing each datum in the matrix to obtain the maximum value.
Compared with the prior art, the invention has the following advantages and effects: the CPU controls data storage allocation and data transfer for the whole convolutional neural network; the data selectors, under the control of the state machine, distribute data to the multiply-add computation module, the activation function module, the maximum pooling module, and the average pooling module; meanwhile, the CPU runs the image processing, softmax classifier, and non-maximum suppression algorithms.
Drawings
FIG. 1 is a diagram of a PSoC-based convolutional neural network accelerator of the present invention;
FIG. 2 is a block diagram of a multiply-add calculation module according to the present invention;
FIG. 3 is a block diagram of an activation function module of the present invention;
FIG. 4 is a block diagram of the mean pooling module of the present invention;
FIG. 5 is a block diagram of a maximum pooling module of the present invention;
FIG. 6 is a CPU software flow diagram of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
Example one
To raise the computational throughput of the convolutional neural network, improve parallel processing efficiency, and reduce the bandwidth requirement, the invention provides the PSoC-based convolutional neural network accelerator 100 shown in fig. 1, comprising: an off-chip memory 101, a CPU 102, a feature map input memory 103, a feature map output memory 104, a bias memory 105, a weight memory 106, a direct memory access DMA 107, and computation units 108 equal in number to the neurons.
Under the control of the CPU 102, the DMA 107 reads data from the off-chip memory 101 into the feature map input memory 103, the bias memory 105, and the weight memory 106, or writes data from the feature map output memory 104 back to the off-chip memory 101. The CPU 102 controls the storage locations of the input feature map, bias, weights, and output feature map in off-chip memory, as well as the parameter transfers of the multi-layer convolutional neural network, so as to adapt to neural networks of various architectures.
Each computation unit 108 includes a first-in first-out queue, a state machine 109, a first data selector 110, a second data selector 111, an average pooling module 112, a maximum pooling module 113, a multiply-add computation module 114, and an activation function module 115. The first data selector 110 communicates with the feature map input memory 103; input feature map data is routed through it to the average pooling module 112, the maximum pooling module 113, the multiply-add computation module 114, and the activation function module 115. The second data selector 111 communicates with the feature map output memory 104; the outputs of the average pooling module 112, the maximum pooling module 113, and the multiply-add computation module 114 are selected through it and written to the feature map output memory 104.
As shown in fig. 2, the multiply-add computation module is based on a structure combining a multiply-add tree with multiply-add registers, and receives an input feature map matrix, a weight input matrix, and a bias matrix. This structure completes convolution operations in parallel and efficiently, and does not reduce the utilization of the multipliers when convolution kernels of different sizes are implemented.
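The multiply-add tree described above can be modelled in software: all products of one convolution window are formed in parallel (one multiplier per weight), an adder tree reduces them, and the bias is added at the root. This is a minimal sketch of the computation the hardware performs, not the patent's implementation:

```python
import numpy as np

def multiply_add_tree(window, kernel, bias):
    """Software model of the multiply-add tree: the parallel multiplier
    stage forms all products at once, the adder tree sums them, and the
    bias is added to the root of the tree."""
    products = window.flatten() * kernel.flatten()  # parallel multiplier stage
    acc = products.sum()                            # adder-tree reduction
    return acc + bias

# One 3x3 convolution window with an all-ones kernel.
window = np.arange(9.0).reshape(3, 3)
kernel = np.ones((3, 3))
print(multiply_add_tree(window, kernel, bias=1.0))  # 0+1+...+8 + 1 = 37.0
```

In the hardware, the register/tree combination the patent describes is what keeps the multipliers busy for kernels smaller than the tree; the software model ignores that scheduling detail.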
As shown in fig. 3, the activation function module includes a first configuration register, a first selector, a first multiplier, and a first adder, and implements the tangent, sigmoid, and ReLU functions; the CPU writes the first configuration register of the activation function module so that the activation function is realized in hardware logic.
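The selector-driven behaviour of the activation function module can be sketched as follows. The 0/1/2 register encoding is an assumption for illustration; the patent does not specify the register values:

```python
import math

def activation_module(x, config):
    """Model of the activation function module: the configuration
    register value (written by the CPU) selects which function the
    selector routes the input through. The encoding 0=ReLU,
    1=sigmoid, 2=tangent is hypothetical."""
    if config == 0:                      # ReLU
        return max(0.0, x)
    elif config == 1:                    # sigmoid
        return 1.0 / (1.0 + math.exp(-x))
    elif config == 2:                    # tangent (tanh)
        return math.tanh(x)
    raise ValueError("unknown activation configuration")

print(activation_module(-2.0, 0))  # 0.0
print(activation_module(0.0, 1))   # 0.5
```

In hardware, the sigmoid and tangent branches would be built from the module's multiplier and adder (e.g. piecewise-linear approximation); the exact approximation scheme is not given in the patent.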
As shown in fig. 4, the average pooling module includes a second configuration register, a second multiplier, and a second adder. The CPU configures the average pooling module; the value m is configurable, realizing m × m average pooling and producing the mean of each m × m matrix.
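Functionally, the configurable m × m average pooling amounts to the following; this is a software sketch of the operation, with the adder accumulating each window and the second multiplier applying the 1/(m·m) scale:

```python
import numpy as np

def average_pool(feature_map, m):
    """m x m average pooling with stride m; m comes from the
    CPU-written configuration register. Trailing rows/columns that
    do not fill a window are dropped (an assumption; the patent
    does not state the edge behaviour)."""
    h, w = feature_map.shape
    cropped = feature_map[:h - h % m, :w - w % m]
    windows = cropped.reshape(h // m, m, w // m, m)
    return windows.mean(axis=(1, 3))   # adder + 1/(m*m) multiplier

fm = np.array([[1.,  2.,  3.,  4.],
               [5.,  6.,  7.,  8.],
               [9., 10., 11., 12.],
               [13., 14., 15., 16.]])
print(average_pool(fm, 2))
# [[ 3.5  5.5]
#  [11.5 13.5]]
```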
As shown in fig. 5, the maximum pooling module includes a third configuration register, a comparator, and a second selector. The CPU configures the maximum pooling module; the value k is configurable, realizing k × k maximum pooling by comparing each datum in the k × k matrix to obtain the maximum value.
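The comparator-and-selector loop of the maximum pooling module reduces to a windowed maximum; a minimal software model, under the same edge-cropping assumption as the average pooling sketch:

```python
import numpy as np

def max_pool(feature_map, k):
    """k x k maximum pooling with stride k: the hardware comparator
    and selector walk every element of each k x k window, keeping
    the running maximum."""
    h, w = feature_map.shape
    cropped = feature_map[:h - h % k, :w - w % k]
    windows = cropped.reshape(h // k, k, w // k, k)
    return windows.max(axis=(1, 3))    # comparator + selector

fm = np.array([[1.,  2.,  3.,  4.],
               [5.,  6.,  7.,  8.],
               [9., 10., 11., 12.],
               [13., 14., 15., 16.]])
print(max_pool(fm, 2))
# [[ 6.  8.]
#  [14. 16.]]
```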
Example two
Correspondingly, the flow of a convolutional neural network computation performed by the PSoC-based accelerator is further described in combination with fig. 6.
The CPU is programmed with embedded software; the construction of the deep convolutional neural network is realized in software, which configures the control registers of the relevant processing modules by transmitting command values over the bus.
Examples of configuration commands are shown in the following table:
the first layer input is x1 input feature map data and x3 weight data, and the calculation results are input into a maximum value pooling module and an activation function module to obtain x2 output feature map data.
[Table of configuration commands: reproduced as an image in the original publication]
The convolutional layer output feature maps are stored in off-chip memory across M layers, where M takes the values 1, 3, 5, 7, .... The output feature map of layer M is the input feature map of layer M+1; the output feature map of layer M is stored in the address space starting at address A1, and the output feature map of layer M+1 in the address space starting at address A2.
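The alternating A1/A2 base addresses form a ping-pong buffer: each layer reads its input from the buffer the previous layer wrote. A sketch, with hypothetical concrete addresses since the patent leaves the memory map to the particular PSoC:

```python
# Hypothetical base addresses; the patent only names them A1 and A2.
A1 = 0x1000_0000
A2 = 0x2000_0000

def output_base(m):
    """Odd layers (m = 1, 3, 5, ...) write their output feature map
    to the buffer at A1; even layers write to A2. The next layer
    reads from whichever buffer was just written, so the two
    buffers alternate in ping-pong fashion."""
    return A1 if m % 2 == 1 else A2

print(hex(output_base(1)), hex(output_base(2)))  # 0x10000000 0x20000000
```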
In a typical application, the computations within each convolutional neural network layer are performed in parallel. The whole network runs as follows:
(1) the software of the processor 102 performs the image processing, and the sample data is stored in the off-chip memory 101;
(2) the processor 102 directs the DMA 107 to read off-chip memory data to the first data selector 110, while configuring the multiply-add computation unit 114, the average pooling module 112, the maximum pooling module 113, the activation function module 115, and the state machine 109. Configuration information includes, but is not limited to, convolution step size, convolution kernel size, activation function type, average pooling size, and maximum pooling block size;
(3) under the control of the state machine 109, data is transferred from the DMA to the feature map input memory 103, the bias memory 105, and the weight memory 106;
(4) the data is fed into the multiply-add computation unit 114, the activation function module 115, the average pooling module 112, or the maximum pooling module 113 to obtain the computation result;
(5) under state machine control, the results are transferred from the multiply-add computation unit 114, the activation function module 115, the average pooling module 112, or the maximum pooling module 113 to the data selector, and on to the off-chip memory 101.
At this point the network has completed one layer; looping in this way, it completes all layers.
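The five steps above can be sketched as a CPU-side control loop. This is a software model under stated assumptions: the dictionary standing in for off-chip memory and the per-layer `compute`/`activation` callables are illustrative stand-ins, not APIs named by the patent:

```python
def run_network(layers, sample, off_chip):
    """Model of the fig. 6 control flow: store the preprocessed
    sample (step 1), then per layer configure, move data, compute,
    and write the result back (steps 2-5)."""
    off_chip["input"] = sample                    # step (1): sample in off-chip memory
    data = sample
    for cfg in layers:
        # step (2): configuration fetched from the layer descriptor
        act = cfg.get("activation", lambda x: x)
        # steps (3)-(4): data moved in and computed; the hardware
        # modules are modelled as one callable per layer
        data = [act(cfg["compute"](x)) for x in data]
        # step (5): layer result written back to off-chip memory
        off_chip[cfg["name"]] = data
    return data

off_chip = {}
layers = [{"name": "conv1",
           "compute": lambda x: 2 * x,           # stand-in for multiply-add
           "activation": lambda x: max(0, x)}]   # ReLU
print(run_network(layers, [-1, 2], off_chip))    # [0, 4]
```

Looping over `layers` mirrors the sentence above: one pass of steps (2)-(5) per layer until the multi-layer network is complete.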
In summary, the hardware-programmable part of the whole convolutional neural network accelerator reduces to a multiply-add computation module, an activation function module, a maximum pooling module, and an average pooling module. The multiply-add operations in all multiply-add computation modules are computed in parallel, and convolutions with different kernel sizes and pooling of different sizes are supported. The accelerator's CPU software implements the softmax classifier and non-maximum suppression algorithm that cannot be realized in hardware logic, and completes the convolutional neural network computation by supporting the configuration of networks with different structures, thereby addressing the network's large computational load and high bandwidth demand.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (5)

1. A PSoC-based convolutional neural network accelerator, comprising: an off-chip memory, a CPU, a feature map input memory, a feature map output memory, a bias memory, a weight memory, a direct memory access (DMA), and computation units equal in number to the neurons;
wherein, under the control of the CPU, the direct memory access DMA reads data from the off-chip memory and transfers it to the feature map input memory, the bias memory, and the weight memory, or writes data from the feature map output memory back to the off-chip memory; the CPU controls the storage locations of the input feature map, bias, weights, and output feature map in the off-chip memory, as well as the parameter transfers of the multi-layer convolutional neural network, so as to adapt to neural networks of various architectures;
the computing unit comprises a first-in first-out queue, a state machine, a first data selector, a second data selector, an average value pooling module, a maximum value pooling module, a multiply-add computing module and an activation function module;
the first data selector is communicated with the feature map input memory, and input feature map input data are input into the average value pooling module, the maximum value pooling module, the multiply-add calculation module and the activation function module through the first data selector;
the second data selector is communicated with the feature map output memory, and output results of the average value pooling module, the maximum value pooling module and the multiply-add calculation module are selectively output to the feature map output memory through the second data selector.
2. The PSoC-based convolutional neural network accelerator according to claim 1, wherein the multiply-add computation module is based on a structure combining a multiply-add tree with multiply-add registers, and receives an input feature map matrix, a weight input matrix, and a bias matrix.
3. The PSoC-based convolutional neural network accelerator according to claim 1, wherein the activation function module comprises a first configuration register, a first selector, a first multiplier, and a first adder for implementing the tangent, sigmoid, and ReLU functions, and the CPU configures the first configuration register of the activation function module so that the activation function is realized in hardware logic.
4. The PSoC-based convolutional neural network accelerator according to claim 1, wherein the average pooling module comprises a second configuration register, a second multiplier, and a second adder, and the CPU configures the average pooling module to realize average pooling of a matrix and obtain the matrix mean.
5. The PSoC-based convolutional neural network accelerator according to claim 1, wherein the maximum pooling module comprises a third configuration register, a comparator, and a second selector; the CPU configures the maximum pooling module to realize maximum pooling of a matrix, comparing each datum in the matrix to obtain the maximum value.
CN201810689938.7A 2018-06-28 2018-06-28 Convolutional neural network accelerator based on PSoC Active CN109102065B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810689938.7A CN109102065B (en) 2018-06-28 2018-06-28 Convolutional neural network accelerator based on PSoC

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810689938.7A CN109102065B (en) 2018-06-28 2018-06-28 Convolutional neural network accelerator based on PSoC

Publications (2)

Publication Number Publication Date
CN109102065A CN109102065A (en) 2018-12-28
CN109102065B true CN109102065B (en) 2022-03-11

Family

ID=64845331

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810689938.7A Active CN109102065B (en) 2018-06-28 2018-06-28 Convolutional neural network accelerator based on PSoC

Country Status (1)

Country Link
CN (1) CN109102065B (en)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109948785B (en) * 2019-01-31 2020-11-20 瑞芯微电子股份有限公司 High-efficiency neural network circuit system and method
CN109918281B (en) * 2019-03-12 2022-07-12 中国人民解放军国防科技大学 Multi-bandwidth target accelerator efficiency testing method
CN110222815B (en) * 2019-04-26 2021-09-07 上海酷芯微电子有限公司 Configurable activation function device and method suitable for deep learning hardware accelerator
CN110263925B (en) * 2019-06-04 2022-03-15 电子科技大学 Hardware acceleration implementation device for convolutional neural network forward prediction based on FPGA
CN110390384B (en) * 2019-06-25 2021-07-06 东南大学 Configurable general convolutional neural network accelerator
CN110687392B (en) * 2019-09-02 2024-05-31 北京智芯微电子科技有限公司 Power system fault diagnosis device and method based on neural network
CN110689122B (en) * 2019-09-25 2022-07-12 苏州浪潮智能科技有限公司 Storage system and method
CN111047008B (en) * 2019-11-12 2023-08-01 天津大学 Convolutional neural network accelerator and acceleration method
CN111445018B (en) * 2020-03-27 2023-11-14 国网甘肃省电力公司电力科学研究院 Ultraviolet imaging real-time information processing method based on accelerating convolutional neural network algorithm
CN111626403B (en) * 2020-05-14 2022-05-10 北京航空航天大学 Convolutional neural network accelerator based on CPU-FPGA memory sharing
CN111563483B (en) * 2020-06-22 2024-06-11 武汉芯昌科技有限公司 Image recognition method and system based on compact lenet model
CN111860540B (en) * 2020-07-20 2024-01-12 深圳大学 Neural network image feature extraction system based on FPGA
CN111966076B (en) * 2020-08-11 2023-06-09 广东工业大学 Fault positioning method based on finite state machine and graph neural network
CN112101538B (en) * 2020-09-23 2023-11-17 成都市深思创芯科技有限公司 Graphic neural network hardware computing system and method based on memory computing
CN112561034A (en) * 2020-12-04 2021-03-26 深兰人工智能(深圳)有限公司 Neural network accelerating device
CN112784977B (en) * 2021-01-15 2023-09-08 北方工业大学 Target detection convolutional neural network accelerator
CN112905530B (en) * 2021-03-29 2023-05-26 上海西井信息科技有限公司 On-chip architecture, pooled computing accelerator array, unit and control method
CN113692592B (en) * 2021-07-08 2022-06-28 香港应用科技研究院有限公司 Dynamic tile parallel neural network accelerator
CN114781629B (en) * 2022-04-06 2024-03-05 合肥工业大学 Hardware accelerator of convolutional neural network based on parallel multiplexing and parallel multiplexing method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239824A (en) * 2016-12-05 2017-10-10 北京深鉴智能科技有限公司 Apparatus and method for realizing sparse convolution neutral net accelerator
CN107392308A (en) * 2017-06-20 2017-11-24 中国科学院计算技术研究所 A kind of convolutional neural networks accelerated method and system based on programming device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10515304B2 (en) * 2015-04-28 2019-12-24 Qualcomm Incorporated Filter specificity as training criterion for neural networks
KR20180034853A (en) * 2016-09-28 2018-04-05 에스케이하이닉스 주식회사 Apparatus and method test operating of convolutional neural network


Also Published As

Publication number Publication date
CN109102065A (en) 2018-12-28

Similar Documents

Publication Publication Date Title
CN109102065B (en) Convolutional neural network accelerator based on PSoC
US10942986B2 (en) Hardware implementation of convolutional layer of deep neural network
US10846591B2 (en) Configurable and programmable multi-core architecture with a specialized instruction set for embedded application based on neural networks
CN110097174B (en) Method, system and device for realizing convolutional neural network based on FPGA and row output priority
US10768856B1 (en) Memory access for multiple circuit components
JP6823495B2 (en) Information processing device and image recognition device
JP2021521515A (en) Methods and accelerators for accelerating operations
JP2021521516A (en) Accelerators and systems for accelerating operations
CN111465943B (en) Integrated circuit and method for neural network processing
CN111199275B (en) System on chip for neural network
KR20200139829A (en) Network on-chip data processing method and device
US20220179823A1 (en) Reconfigurable reduced instruction set computer processor architecture with fractured cores
EP3844610B1 (en) Method and system for performing parallel computation
CN108804973B (en) Hardware architecture of target detection algorithm based on deep learning and execution method thereof
CN110059797B (en) Computing device and related product
CN111353598A (en) Neural network compression method, electronic device and computer readable medium
JP2022137247A (en) Processing for a plurality of input data sets
KR20200138411A (en) Network-on-chip data processing method and device
WO2023109748A1 (en) Neural network adjustment method and corresponding apparatus
GB2582868A (en) Hardware implementation of convolution layer of deep neural network
CN110929854B (en) Data processing method and device and hardware accelerator
CN111209230B (en) Data processing device, method and related product
CN115081600A (en) Conversion unit for executing Winograd convolution, integrated circuit device and board card
CN111382848A (en) Computing device and related product
KR20200139256A (en) Network-on-chip data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant