WO2020258528A1 - Configurable universal convolutional neural network accelerator - Google Patents
- Publication number
- WO2020258528A1 (PCT/CN2019/105533)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- accelerator
- data
- state
- read
- feature map
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/60—Memory management
Definitions
- The invention discloses a configurable general convolutional neural network accelerator, which belongs to the technical field of computing, calculating, and counting.
- GPU: Graphics Processing Unit
- CPU: Central Processing Unit (including multi-core CPUs)
- ASIC: Application-Specific Integrated Circuit
- FPGA: Field-Programmable Gate Array
- Existing accelerators suffer from low versatility and cannot adapt to changing neural network structures.
- This application proposes a configurable general convolutional neural network accelerator that applies storage and computing resources at different scales to different network structures, achieving excellent throughput and energy-efficiency metrics.
- The purpose of the present invention is to provide a configurable general convolutional neural network accelerator that addresses the above deficiencies in the background art.
- Acceleration of convolutional neural network structures of various scales is realized: for networks of different structures, different data reuse modes and highly parallelized processing units are adopted to obtain higher computing throughput with fewer resources, solving the problem that existing hardware accelerators cannot adapt to the application requirements of variable neural network structures.
- This addresses the technical problem that existing hardware accelerators accelerate modified network structures poorly.
- A configurable general convolutional neural network accelerator includes: a state controller, a feature map buffer, a weight buffer, a register stack, a PE array, an output buffer, a functional module, and an AXI4 bus interface.
- The state controller selects the accelerator's data reuse mode and state transition sequence according to the network parameters, and controls the switching of the accelerator's working state.
- The feature map buffer caches the feature map data read from external memory through the AXI4 bus interface; before computation starts,
- the feature map data required for one convolution computation is stored into the register stack.
- The weight buffer caches the weight data read from external memory through the AXI4 bus interface; after computation starts, the weight data is input directly to each PE unit.
- The register stack caches the feature map data required for one computation; after computation starts, the feature map data in the register stack is updated incrementally.
- The PE array reads the feature map data from the register stack and the weight data from the weight buffer, and stores the convolution results in the output buffer.
- The output buffer stores the convolution results; after computation completes, the results are sent to the functional module.
- The functional module completes the post-convolution processing, including bias addition, normalization, activation, and pooling.
- The state controller comprises a network parameter register and a working state controller.
- In the read-network-parameters state, the state controller reads the network parameters from external memory through the AXI4 bus interface and updates its network parameter register.
- The accelerator configuration can be updated by updating the network parameter register, so that neural network structures of different sizes are accelerated with optimal configuration parameters.
- Configuration parameters include: data reuse mode, feature map size, convolution kernel size, array size, number of sub-buffers, number of input channels, number of output channels, and functional module configuration information.
- The working states of the accelerator are: waiting, reading network parameters, reading BN parameters, reading feature maps, reading weights, computing, and sending.
- The working state controller controls the accelerator's working-state switching according to the read network parameters and sends corresponding control signals to the other modules.
- Data reuse modes include input reuse, output reuse, and weight reuse. Choosing the appropriate reuse mode for convolutional layers of different sizes minimizes the number of memory accesses and improves accelerator performance.
- The data reuse mode used by each layer is configured through the network parameters.
- Input reuse means that after a batch of data is computed, the input feature map is retained and the weight data is replaced.
- In this mode, the accelerator first enters the read-feature-map state, then the read-weights state, and enters the compute state once reading completes; after computation it returns to the read-weights state, repeating this state cycle until the controller signals entry into the sending state.
- Output reuse means that after a batch of data is computed, the intermediate results are retained while both the feature map and the weight data are replaced.
- In this mode, the accelerator first enters the read-feature-map state, then the read-weights state, and enters the compute state once reading completes; after computation it returns to the read-feature-map state, repeating this state cycle until the controller signals entry into the sending state.
- Weight reuse means that after a batch of data is computed, the weights are retained and the feature map data is replaced.
- In this mode, the accelerator first enters the read-weights state, then the read-feature-map state, and enters the compute state once reading completes; after computation it returns to the read-feature-map state, repeating this state cycle until the controller signals entry into the sending state.
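The three state-cycle orders above can be modeled as a small sketch. This is an illustrative simulation only, not the patent's hardware; the state names and the `rounds` parameter are assumptions made for the example.

```python
def state_sequence(reuse_mode, rounds):
    """Return the working-state sequence for `rounds` compute rounds
    under one data reuse mode, ending with the sending state."""
    seq = []
    if reuse_mode == "input":
        # Feature map loaded once; weights re-read before every compute round.
        seq.append("read_feature_map")
        for _ in range(rounds):
            seq += ["read_weights", "compute"]
    elif reuse_mode == "weight":
        # Weights loaded once; feature map re-read before every compute round.
        seq.append("read_weights")
        for _ in range(rounds):
            seq += ["read_feature_map", "compute"]
    elif reuse_mode == "output":
        # Intermediate results kept on chip; both operands re-read each round.
        for _ in range(rounds):
            seq += ["read_feature_map", "read_weights", "compute"]
    else:
        raise ValueError(f"unknown reuse mode: {reuse_mode}")
    seq.append("send")
    return seq
```

For example, `state_sequence("input", 2)` yields one feature-map read followed by two weight-read/compute rounds, matching the input-reuse cycle described above.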
- The feature map buffer is divided into M feature map sub-buffers, where M is determined by the number of sub-buffers in the configuration parameters.
- The feature map data of each input channel, read from external memory through the AXI4 interface, is stored row by row into the corresponding feature map sub-buffers.
- After the last feature map sub-buffer has stored one row of image data,
- the next row of the feature map wraps around and is stored in the first feature map sub-buffer.
- The feature map data of the next input channel is stored in the feature map buffer in the same pattern.
- The weight buffer is divided into N weight sub-buffers, where N is determined by the number of PE array columns in the configuration parameters.
- The weight data read from external memory through the AXI4 interface is stored into the weight sub-buffers in filter order.
- Each column of PEs shares one weight sub-buffer; during computation, the weight sub-buffer sends weights to every PE in its column.
- The output buffer is divided into R output sub-buffers, where R is determined by the number of PE array rows in use. Each row of PEs corresponds to one output sub-buffer.
- The output of each row of PEs is one row of data from multiple output feature maps, stored in that row's output sub-buffer in output-feature-map order.
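The row-by-row, wrap-around placement described above amounts to round-robin striping of feature-map rows over the M sub-buffers. A minimal sketch (illustrative only; buffer contents are represented as lists of row indices):

```python
def stripe_rows(num_rows, M):
    """Distribute feature-map rows over M sub-buffers in round-robin order:
    row 0 -> sub-buffer 0, ..., row M-1 -> sub-buffer M-1, row M -> sub-buffer 0."""
    buffers = [[] for _ in range(M)]
    for r in range(num_rows):
        buffers[r % M].append(r)
    return buffers
```

With 7 rows and M = 3, sub-buffer 0 holds rows 0, 3, 6, sub-buffer 1 holds rows 1 and 4, and sub-buffer 2 holds rows 2 and 5, mirroring the wrap-around storage order.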
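The sub-buffer assignments above (one weight sub-buffer per PE column, one output sub-buffer per PE row) can be sketched as index mappings. The modulo mapping for filters is an assumption for illustration; the patent only states that weights are stored in filter order with each column sharing one sub-buffer.

```python
def weight_subbuffer_for_filter(f, n_cols):
    """Hypothetical mapping: filter f (one output feature map per PE column)
    lands in weight sub-buffer f % n_cols, cycling when filters exceed columns."""
    return f % n_cols

def output_subbuffer_for_pe_row(i):
    """Each PE row writes its output-feature-map rows to its own sub-buffer."""
    return i
```

In the embodiment below (32 filters, 32 weight sub-buffers), each filter would map to its own sub-buffer.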
- The register stack buffers all the feature map data required for one computation of the PE array before computation starts.
- After each convolution window, K*S feature points are updated (K: convolution kernel size, S: stride).
- After computation starts, the feature map data in the register stack is updated so that, before the next convolution computation begins, the required feature map data has already been cached.
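The incremental update above behaves like a sliding window: each shift discards the S oldest columns and loads S fresh ones, refreshing K*S feature points (each column holding K points). A minimal sketch with columns modeled as lists:

```python
def slide_window(stack_cols, new_cols, S):
    """Shift the cached window by S columns: drop the S oldest columns and
    append the S freshly loaded ones, so the next convolution finds its
    K*S new feature points already in place."""
    assert len(new_cols) == S, "must load exactly S new columns per shift"
    return stack_cols[S:] + new_cols
```

With K = 3 and S = 1, one shift replaces a single 3-point column, i.e. 3*1 feature points per window step.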
- The PE array is a two-dimensional systolic array composed of multiple arithmetic units and is used for convolution operations.
- Each row of PEs corresponds to one row of the output feature map,
- and each column of PEs corresponds to one output feature map.
- Input feature map data enters at the first column of PEs and is passed to the adjacent next column in turn.
- Weight data is input directly to each PE from the weight sub-buffer corresponding to its column.
- The AXI4 bus interface allows the accelerator to be mounted on any bus device that uses the AXI4 protocol.
- Because the AXI4 bus bit width is greater than the data bit width used in computation, multiple data words are packed into one bus transfer to improve transmission efficiency.
- The present invention adopts the above technical scheme and has the following beneficial effects. The state controller configures the optimal accelerator parameters matching the neural network structure according to the network parameters, flexibly adjusting the PE array size and the division of the sub-buffers to meet changing neural network application requirements and obtain the best acceleration effect under given resource constraints. At the same time, a configurable data reuse mode applies the optimal reuse strategy to each network structure, making full use of transmission bandwidth, while the highly parallel PE array structure achieves a higher data throughput rate.
- Fig. 1 is a schematic structural diagram of a general convolutional neural network accelerator disclosed in the present invention.
- Fig. 2 is a schematic diagram of the data flow of the PE array in the present invention.
- Fig. 3 is a schematic diagram of the work flow of the general convolutional neural network accelerator disclosed in the present invention.
- The configurable general convolutional neural network accelerator designed by the present invention is shown in Fig. 1.
- The working method is described in detail for the following example configuration:
- the two PE arrays each have size 14*16,
- the convolution kernel size is 3*3,
- the convolution stride is 1,
- the input feature map size is 15*15 (after adding padding),
- the number of input channels in a single batch is 14,
- the number of output channels in a single batch is 32,
- and the data reuse mode is output reuse.
- The accelerator in the waiting state starts working after receiving the start signal.
- The accelerator reads the network parameters from external memory through the bus interface, updates the network parameter register, and determines the data reuse mode and the working-state switching sequence according to the register values.
- The BN parameters read through the accelerator interface are divided into two parts and stored in two BN parameter buffers, one for the output of each PE array.
- The feature map data read through the accelerator interface is cached row by row in five feature map sub-buffers, each holding 3 rows of feature map data.
- The weight data read through the accelerator interface is stored in 32 weight sub-buffers in filter order.
- 15*3*14 feature map data points are read from the feature map sub-buffers and stored in the register stack.
- The PE array obtains data from the register stack and the weight sub-buffers for the convolution operation. After every 3 multiply-accumulate operations, 15*1 feature map data points in the register stack are updated; after 3*3*14 data points have been computed, one calculation result is output.
- The calculation results are stored in the corresponding output sub-buffers in feature-map order. Since this layer uses output reuse, the accelerator returns to the read-feature-map state after computation completes, then enters the read-weights state and the compute state in sequence, repeating this state cycle until the state controller issues the computation-complete command.
- After the convolution computation is complete, the accelerator sends the data to the functional module to compute the final output data, jumps from the compute state to the sending state, and sends the data to external memory through the AXI4 bus interface.
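The embodiment's numbers can be checked with a little arithmetic. A sketch using the stated values (names are illustrative):

```python
K, S = 3, 1        # kernel size and stride from the embodiment
W_PAD = 15         # padded input feature map width
C_IN = 14          # input channels per batch

# Register stack capacity: W_PAD columns x K rows x C_IN channels,
# matching the 15*3*14 figure given in the text.
stack_points = W_PAD * K * C_IN

# Multiply-accumulates needed per output point: a K*K window over C_IN channels,
# matching the 3*3*14 figure after which one result is output.
macs_per_output = K * K * C_IN
```

This gives 630 cached feature points and 126 MACs per output point, consistent with the walkthrough above.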
- The input data of each row of the PE array is provided by the register stack,
- the weight data of each column is provided by the corresponding weight sub-buffer,
- and the output data of each row of PEs is stored in the corresponding output sub-buffer.
- 15*3*14 feature map data points are stored in the register stack.
- The data of rows 1 to 3 is sent to the first PE in column 1,
- the data of rows 2 to 4 is sent to the second PE in column 1,
- and the data of rows 13 to 15 is sent to the last PE in column 1.
- Taking a 3*3 convolution window and the first PE of the first column as an example: the data is read in column order, and in the first three clock cycles the first data point of each of the first three rows in the register stack is read.
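The row-to-PE assignment above follows a simple stride pattern, sketched here with 0-based indices (an illustrative model, not the patent's hardware):

```python
def pe_row_inputs(pe_row, K, S):
    """Feature-map rows feeding PE row i (0-based): rows i*S .. i*S + K - 1.
    With K=3, S=1 this reproduces the text: PE 0 gets rows 0-2 (rows 1-3
    in the 1-based text), PE 1 gets rows 1-3, ..., PE 12 gets rows 12-14."""
    start = pe_row * S
    return list(range(start, start + K))
```

Reading these K rows column by column, one point per clock cycle, fills a K*K convolution window in K*K cycles, as in the first-PE example above.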
- The accelerator has 7 working states: waiting, reading network parameters, reading BN parameters, reading feature maps, reading weights, computing, and sending.
- The selection and switching of the working state are determined by the accelerator's working state controller.
- The working state controller decides whether the read-BN-parameters state is needed by examining the network parameter register values.
- The state controller determines the cycle order of the read-feature-map, read-weights, and compute states according to the data reuse mode.
- In input reuse mode, the accelerator first enters the read-feature-map state, then the read-weights state, and finally the compute state, returning to the read-weights state when the compute state ends. In output reuse mode, it first enters the read-feature-map state, then the read-weights state, and finally the compute state, returning to the read-feature-map state when the compute state ends. In weight reuse mode, it first enters the read-weights state, then the read-feature-map state, and finally the compute state, returning to the read-feature-map state when the compute state ends.
- Waiting state: after initialization, the accelerator is in the waiting state.
- The working state controller waits for an external signal to start the accelerator. After receiving the start signal, the accelerator jumps to the read-network-parameters state. After the last convolution layer finishes computing, the accelerator returns to the waiting state and waits for the next trigger.
- Read-network-parameters state: the accelerator reads the previously stored network parameters from external memory through the AXI4 bus interface, parses the returned bus data, and stores it in the corresponding network parameter registers. The parameters read back include the data storage offset address, the data reuse mode, the functional module configuration, the network size, and the convolution kernel size. Optimal accelerator parameters can thus be selected for each network size, achieving the best performance for each network.
- Read-BN-parameters state: the accelerator reads the BN and bias parameters from external memory through the AXI4 bus interface and stores them in the two BN parameter storage areas and the bias parameter storage areas. After the data is read, the accelerator enters the read-feature-map or read-weights state according to the current data reuse mode.
- Read-feature-map state: the number of feature map sub-buffers used is determined by the configured network parameters; the feature map data is read from external memory through the AXI4 bus interface and stored in the feature map buffer in row order. After reading, the accelerator enters the read-weights or compute state according to the current data reuse mode.
- Read-weights state: the number of weight sub-buffers used is determined by the network parameters; the weight data is read from external memory through the AXI4 bus interface and stored in the weight sub-buffers in filter order. After reading, the accelerator enters the read-feature-map or compute state according to the current data reuse mode.
- Compute state: the accelerator reads the computation data from the register stack and the weight buffer in turn and completes the convolution computation. After computation, depending on the working state controller's signal, it either proceeds to the functional-module computation or returns to a data-reading state; once the functional module has output the convolution result, the compute state ends and the accelerator enters the sending state.
- Sending state: the accelerator packs the calculation results output by the functional module and sends them to the external storage area through the AXI4 bus interface.
- The output result bit width is 16 bits
- and the bus bit width is 64 bits, so 4 output results are combined into one bus data transfer. After transmission finishes, the accelerator returns to the waiting state, waiting for the next trigger.
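The 16-bit-into-64-bit packing above can be sketched as follows. The little-endian placement of results within each bus word is an assumption for illustration; the patent only states that 4 results share one transfer.

```python
def pack_results(vals, data_bits=16, bus_bits=64):
    """Pack narrow results into wide bus words: with 16-bit results and a
    64-bit bus, 4 results per word. Result j occupies bits
    [j*data_bits, (j+1)*data_bits) of its word (little-endian, assumed)."""
    per_word = bus_bits // data_bits
    mask = (1 << data_bits) - 1
    words = []
    for i in range(0, len(vals), per_word):
        word = 0
        for j, v in enumerate(vals[i:i + per_word]):
            word |= (v & mask) << (j * data_bits)
        words.append(word)
    return words
```

Eight 16-bit results thus need only two 64-bit bus transfers instead of eight narrow ones.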
Abstract
Description
Claims (8)
- 1. A configurable general convolutional neural network accelerator, characterized by comprising: a state controller, which reads network parameters from external memory, configures accelerator parameters including the data reuse mode, the array size, and the number of sub-buffers according to the network parameters, and switches the accelerator's working state according to the data reuse mode; a feature map buffer containing multiple sub-buffers, which caches the feature map data read from external memory row by row according to the number of sub-buffers configured by the state controller; a register stack, which caches the feature map data required for one computation of the PE array; a weight buffer containing multiple sub-buffers, which caches the weight data read from external memory in filter order according to the number of sub-buffers configured by the state controller; a PE array, in which each row of PE units reads feature map data from the register stack and each column of PE units reads the weight data cached in the same weight sub-buffer, performing convolution computations on the feature map data and weight data; and an output buffer containing multiple sub-buffers, which caches the row data of the different feature maps output by each row of PE units.
- 2. The configurable general convolutional neural network accelerator according to claim 1, characterized in that the network parameters read by the state controller from external memory include the convolutional layer size, and the state controller configures, according to the convolutional layer size, the data reuse mode with the fewest memory accesses, the data reuse modes including: input data reuse mode, weight data reuse mode, and output data reuse mode.
- 3. The configurable general convolutional neural network accelerator according to claim 1, characterized in that the accelerator further comprises: a BN parameter storage area, which caches the BN parameters read from external memory when the network parameters read by the state controller include functional module configuration information; a bias parameter memory, which caches the bias parameters read from external memory when the network parameters read by the state controller include functional module configuration information; and a functional module, which, after receiving the state controller's instruction to perform functional operations, sequentially performs bias addition, normalization, activation, and pooling on the feature map row data stored in the output buffer, and finally outputs the computation results of the neural network.
- 4. The configurable general convolutional neural network accelerator according to claim 1, characterized in that the number of sub-buffers of the feature map buffer is determined by the number of sub-buffers configured by the state controller.
- 5. The configurable general convolutional neural network accelerator according to claim 1, characterized in that the number of sub-buffers of the weight buffer is determined by the number of array columns configured by the state controller.
- 6. The configurable general convolutional neural network accelerator according to claim 1, characterized in that the number of sub-buffers of the output buffer is determined by the number of array rows configured by the state controller.
- 7. The configurable general convolutional neural network accelerator according to claim 2, characterized in that the state controller initializes the accelerator into the read-network-parameters state, and after the accelerator parameters are configured according to the read network parameters: in input data reuse mode the accelerator is switched through the read-feature-map, read-weights, and compute states in sequence; in weight data reuse mode it is switched through the read-weights, read-feature-map, and compute states in sequence; in output data reuse mode it is switched through the read-feature-map, read-weights, and compute states in sequence; and after the convolution computation is complete, it switches to the data-sending state.
- 8. The configurable general convolutional neural network accelerator according to claim 7, characterized in that when the network parameters read by the state controller from external memory include functional module configuration information, the accelerator switches to the state of reading the BN and bias parameters, and then switches working states according to the data reuse mode.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910554533.7A CN110390384B (en) | 2019-06-25 | 2019-06-25 | Configurable general convolutional neural network accelerator |
CN201910554533.7 | 2019-06-25 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020258528A1 (en) | 2020-12-30
Family
ID=68285786
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/105533 WO2020258528A1 (en) | 2019-06-25 | 2019-09-12 | Configurable universal convolutional neural network accelerator |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110390384B (en) |
WO (1) | WO2020258528A1 (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113222129A (en) * | 2021-04-02 | 2021-08-06 | 西安电子科技大学 | Convolution operation processing unit and system based on multi-level cache cyclic utilization |
CN113313251A (en) * | 2021-05-13 | 2021-08-27 | 中国科学院计算技术研究所 | Deep separable convolution fusion method and system based on data stream architecture |
US20210295145A1 (en) * | 2020-03-23 | 2021-09-23 | Mentium Technologies Inc. | Digital-analog hybrid system architecture for neural network acceleration |
CN113962361A (en) * | 2021-10-09 | 2022-01-21 | 西安交通大学 | Winograd-based data conflict-free scheduling method for CNN accelerator system |
CN114707649A (en) * | 2022-03-28 | 2022-07-05 | 北京理工大学 | General convolution arithmetic device |
CN114781632A (en) * | 2022-05-20 | 2022-07-22 | 重庆科技学院 | Deep neural network accelerator based on dynamic reconfigurable pulse tensor operation engine |
CN114997386A (en) * | 2022-06-29 | 2022-09-02 | 桂林电子科技大学 | CNN neural network acceleration design method based on multi-FPGA heterogeneous architecture |
CN115965067A (en) * | 2023-02-01 | 2023-04-14 | 苏州亿铸智能科技有限公司 | Neural network accelerator for ReRAM |
CN116050474A (en) * | 2022-12-29 | 2023-05-02 | 上海天数智芯半导体有限公司 | Convolution calculation method, SOC chip, electronic equipment and storage medium |
CN118070855A (en) * | 2024-04-18 | 2024-05-24 | 南京邮电大学 | Convolutional neural network accelerator based on RISC-V architecture |
CN118070855B (en) * | 2024-04-18 | 2024-07-09 | 南京邮电大学 | Convolutional neural network accelerator based on RISC-V architecture |
Families Citing this family (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111382094B (en) * | 2018-12-29 | 2021-11-30 | 深圳云天励飞技术有限公司 | Data processing method and device |
CN112819022B (en) * | 2019-11-18 | 2023-11-07 | 同方威视技术股份有限公司 | Image recognition device and image recognition method based on neural network |
US11216375B2 (en) | 2020-02-26 | 2022-01-04 | Hangzhou Zhicun Intelligent Technology Co., Ltd. | Data caching |
CN113313228B (en) * | 2020-02-26 | 2022-10-14 | 杭州知存智能科技有限公司 | Data caching circuit and method |
CN111401543B (en) * | 2020-06-08 | 2020-11-10 | 深圳市九天睿芯科技有限公司 | Neural network accelerator with full on-chip storage and implementation method thereof |
CN111832717B (en) * | 2020-06-24 | 2021-09-28 | 上海西井信息科技有限公司 | Chip and processing device for convolution calculation |
CN111967587B (en) * | 2020-07-27 | 2024-03-29 | 复旦大学 | Method for constructing operation unit array structure facing neural network processing |
CN111626414B (en) * | 2020-07-30 | 2020-10-27 | 电子科技大学 | Dynamic multi-precision neural network acceleration unit |
CN111931911B (en) * | 2020-07-30 | 2022-07-08 | 山东云海国创云计算装备产业创新中心有限公司 | CNN accelerator configuration method, system and device |
CN112232499B (en) * | 2020-10-13 | 2022-12-23 | 华中光电技术研究所(中国船舶重工集团公司第七一七研究所) | Convolutional neural network accelerator |
KR20220049325A (en) | 2020-10-14 | 2022-04-21 | 삼성전자주식회사 | Accelerator and electronic device including the same |
CN112465110B (en) * | 2020-11-16 | 2022-09-13 | 中国电子科技集团公司第五十二研究所 | Hardware accelerator for convolution neural network calculation optimization |
CN112766479B (en) * | 2021-01-26 | 2022-11-11 | 东南大学 | Neural network accelerator supporting channel separation convolution based on FPGA |
CN112949847B (en) * | 2021-03-29 | 2023-07-25 | 上海西井科技股份有限公司 | Neural network algorithm acceleration system, scheduling system and scheduling method |
CN113570034B (en) * | 2021-06-18 | 2022-09-27 | 北京百度网讯科技有限公司 | Processing device, neural network processing method and device |
CN113807509B (en) * | 2021-09-14 | 2024-03-22 | 绍兴埃瓦科技有限公司 | Neural network acceleration device, method and communication equipment |
CN113792868B (en) * | 2021-09-14 | 2024-03-29 | 绍兴埃瓦科技有限公司 | Neural network computing module, method and communication equipment |
CN113792687A (en) * | 2021-09-18 | 2021-12-14 | 兰州大学 | Human intrusion behavior early warning system based on monocular camera |
CN114239816B (en) * | 2021-12-09 | 2023-04-07 | 电子科技大学 | Reconfigurable hardware acceleration architecture of convolutional neural network-graph convolutional neural network |
CN114820630B (en) * | 2022-07-04 | 2022-09-06 | 国网浙江省电力有限公司电力科学研究院 | Target tracking algorithm model pipeline acceleration method and circuit based on FPGA |
CN116010313A (en) * | 2022-11-29 | 2023-04-25 | 中国科学院深圳先进技术研究院 | Universal and configurable image filtering calculation multi-line output system and method |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107341544A (en) * | 2017-06-30 | 2017-11-10 | 清华大学 | Reconfigurable accelerator based on a divisible array and implementation method thereof |
WO2018196863A1 (en) * | 2017-04-28 | 2018-11-01 | 北京市商汤科技开发有限公司 | Convolution acceleration and calculation processing methods and apparatuses, electronic device and storage medium |
CN108805272A (en) * | 2018-05-03 | 2018-11-13 | 东南大学 | General convolutional neural network accelerator based on FPGA |
CN109102065A (en) * | 2018-06-28 | 2018-12-28 | 广东工业大学 | Convolutional neural network accelerator based on PSoC |
CN109598338A (en) * | 2018-12-07 | 2019-04-09 | 东南大学 | Calculation-optimized convolutional neural network accelerator based on FPGA |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11775313B2 (en) * | 2017-05-26 | 2023-10-03 | Purdue Research Foundation | Hardware accelerator for convolutional neural networks and method of operation thereof |
CN108241890B (en) * | 2018-01-29 | 2021-11-23 | 清华大学 | Reconfigurable neural network acceleration method and architecture |
2019
- 2019-06-25 CN CN201910554533.7A patent/CN110390384B/en active Active
- 2019-09-12 WO PCT/CN2019/105533 patent/WO2020258528A1/en active Application Filing
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210295145A1 (en) * | 2020-03-23 | 2021-09-23 | Mentium Technologies Inc. | Digital-analog hybrid system architecture for neural network acceleration |
CN113222129A (en) * | 2021-04-02 | 2021-08-06 | 西安电子科技大学 | Convolution operation processing unit and system based on multi-level cache cyclic utilization |
CN113222129B (en) * | 2021-04-02 | 2024-02-13 | 西安电子科技大学 | Convolution operation processing unit and system based on multi-level cache cyclic utilization |
CN113313251B (en) * | 2021-05-13 | 2023-05-23 | 中国科学院计算技术研究所 | Depth separable convolution fusion method and system based on data flow architecture |
CN113313251A (en) * | 2021-05-13 | 2021-08-27 | 中国科学院计算技术研究所 | Deep separable convolution fusion method and system based on data stream architecture |
CN113962361A (en) * | 2021-10-09 | 2022-01-21 | 西安交通大学 | Winograd-based data conflict-free scheduling method for CNN accelerator system |
CN113962361B (en) * | 2021-10-09 | 2024-04-05 | 西安交通大学 | Winograd-based CNN accelerator system data conflict-free scheduling method |
CN114707649A (en) * | 2022-03-28 | 2022-07-05 | 北京理工大学 | General convolution arithmetic device |
CN114781632A (en) * | 2022-05-20 | 2022-07-22 | 重庆科技学院 | Deep neural network accelerator based on dynamic reconfigurable pulse tensor operation engine |
CN114997386B (en) * | 2022-06-29 | 2024-03-22 | 桂林电子科技大学 | CNN neural network acceleration design method based on multi-FPGA heterogeneous architecture |
CN114997386A (en) * | 2022-06-29 | 2022-09-02 | 桂林电子科技大学 | CNN neural network acceleration design method based on multi-FPGA heterogeneous architecture |
CN116050474A (en) * | 2022-12-29 | 2023-05-02 | 上海天数智芯半导体有限公司 | Convolution calculation method, SOC chip, electronic equipment and storage medium |
CN115965067B (en) * | 2023-02-01 | 2023-08-25 | 苏州亿铸智能科技有限公司 | Neural network accelerator for ReRAM |
CN115965067A (en) * | 2023-02-01 | 2023-04-14 | 苏州亿铸智能科技有限公司 | Neural network accelerator for ReRAM |
CN118070855A (en) * | 2024-04-18 | 2024-05-24 | 南京邮电大学 | Convolutional neural network accelerator based on RISC-V architecture |
CN118070855B (en) * | 2024-04-18 | 2024-07-09 | 南京邮电大学 | Convolutional neural network accelerator based on RISC-V architecture |
Also Published As
Publication number | Publication date |
---|---|
CN110390384B (en) | 2021-07-06 |
CN110390384A (en) | 2019-10-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020258528A1 (en) | Configurable universal convolutional neural network accelerator | |
CN108171317B (en) | Data multiplexing convolution neural network accelerator based on SOC | |
CN110390385B (en) | BNRP-based configurable parallel general convolutional neural network accelerator | |
CN109598338B (en) | Convolutional neural network accelerator based on FPGA (field programmable Gate array) for calculation optimization | |
US20190026626A1 (en) | Neural network accelerator and operation method thereof | |
CN107301455B (en) | Hybrid cube storage system for convolutional neural network and accelerated computing method | |
CN110334799B (en) | Neural network reasoning and training accelerator based on storage and calculation integration and operation method thereof | |
CN108416437A (en) | The processing system and method for artificial neural network for multiply-add operation | |
CN108805272A (en) | A kind of general convolutional neural networks accelerator based on FPGA | |
CN110516801A (en) | A kind of dynamic reconfigurable convolutional neural networks accelerator architecture of high-throughput | |
CN108537331A (en) | A kind of restructural convolutional neural networks accelerating circuit based on asynchronous logic | |
CN105912501B (en) | A kind of SM4-128 Encryption Algorithm realization method and systems based on extensive coarseness reconfigurable processor | |
CN115880132B (en) | Graphics processor, matrix multiplication task processing method, device and storage medium | |
CN115860080B (en) | Computing core, accelerator, computing method, apparatus, device, medium, and system | |
US20230128421A1 (en) | Neural network accelerator | |
US20230376733A1 (en) | Convolutional neural network accelerator hardware | |
RU2294561C2 (en) | Device for hardware realization of probability genetic algorithms | |
CN113673691A (en) | Storage and computation combination-based multi-channel convolution FPGA (field programmable Gate array) framework and working method thereof | |
CN106569968A (en) | Inter-array data transmission structure and scheduling method used for reconfigurable processor | |
CN101452572A (en) | Image rotating VLSI structure based on cubic translation algorithm | |
CN117291240B (en) | Convolutional neural network accelerator and electronic device | |
US11068200B2 (en) | Method and system for memory control | |
CN115965067B (en) | Neural network accelerator for ReRAM | |
CN113177877B (en) | Schur elimination accelerator for SLAM back-end optimization | |
Rezaei et al. | Smart Memory: Deep Learning Acceleration In 3D-Stacked Memories |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19935697 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 19935697 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 31/08/2022) |
|