CN112632459A - On-line computation element for deep convolution - Google Patents

On-line computation element for deep convolution

Info

Publication number
CN112632459A
Authority
CN
China
Prior art keywords
convolution
component
activation
activation value
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011525795.XA
Other languages
Chinese (zh)
Other versions
CN112632459B (en)
Inventor
张昆
钱磊
尚江卫
原昊
朱剑文
曾明勇
陆一峰
贾迅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Jiangnan Computing Technology Institute
Original Assignee
Wuxi Jiangnan Computing Technology Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuxi Jiangnan Computing Technology Institute
Priority to CN202011525795.XA
Publication of CN112632459A
Application granted
Publication of CN112632459B
Legal status: Active
Anticipated expiration

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/15 Correlation function computation including computation of convolution operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Algebra (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses an on-line computation component for deep convolution, comprising a standard convolution component, an accumulator, and a deep convolution component connected to the data output interface of the accumulator. The deep convolution component comprises multiple stages of activation value stations, a plurality of multipliers, a plurality of weight value stations, and at least one delay station arranged between two adjacent activation value stations; each multiplier is provided with one activation value station and one weight value station; the delay value D of each delay station equals the width of the input activation map; the weight values are preset before the convolution calculation starts; the activation values are injected into the computation component in a step-by-step advancing manner, and the value currently stored in each stage of activation value station is forwarded to the next-stage activation value station. The invention efficiently completes deep convolution calculation without disturbing the output data structure of the accumulator, can greatly improve the utilization of computing resources for deep convolution, and accelerates the computation of the whole neural network.

Description

On-line computation element for deep convolution
Technical Field
The invention relates to an online computation component for deep convolution, and belongs to the technical field of neural networks.
Background
Most of the calculations in a deep neural network are convolutions, so neural network hardware accelerators typically include specialized computation components to accelerate convolution operations. Convolution acceleration components are generally organized as multi-vector or systolic arrays; these structures (hereinafter referred to as standard convolution components) can efficiently integrate a large number of multiplier units, achieving high chip area utilization and performance.
The deep (depthwise) convolution is a special kind of convolution whose main characteristic is the absence of accumulation along the input-channel direction. As a result, a standard convolution component is very inefficient when used for deep convolution, and hardware resource utilization drops.
To accelerate convolution operations in a neural network, a multi-vector (SIMD) or systolic array hardware structure is generally adopted. The two structures are essentially equivalent, so the multi-vector structure is used as the example below.
First, the convolution operation can be described as the following 6-layer loop:
Table 1: Loop hierarchy of the convolution operation
for m in M:                  // layer 5: output channel
  for h in H:                // layer 4: output feature map height
    for w in W:              // layer 3: output feature map width
      for r in R:            // layer 2: convolution kernel height
        for s in S:          // layer 1: convolution kernel width
          for c in C:        // layer 0: input channel
            f_out[m][h][w] += ker[m][r][s][c] * activation[h+r][w+s][c]   // accumulated over R, S and C into 1 number
The order of these 6-level loops is mathematically interchangeable, and the order listed above is one of the computational orders that facilitates hardware acceleration implementations.
The standard convolution operation unit generally exploits the computation parallelism in the innermost loop (the input channel) to complete a large number (for example, 64 or 128) of input-channel iterations simultaneously in one clock cycle, realizing efficient hardware resource utilization and accelerating the computation of the neural network. Unlike standard convolution, the deep convolution operation does not have this innermost loop.
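For reference, the two loop nests can be written as the following minimal NumPy sketch (illustrative code, not taken from the patent; array shapes and names are assumptions): the standard convolution accumulates over the innermost input-channel loop, while the deep convolution variant has no such loop.

import numpy as np

def standard_conv(act, ker):
    # act: [H+R-1, W+S-1, C] input activations; ker: [M, R, S, C] weights
    M, R, S, C = ker.shape
    H = act.shape[0] - R + 1
    W = act.shape[1] - S + 1
    f_out = np.zeros((M, H, W))
    for m in range(M):                      # layer 5: output channel
        for h in range(H):                  # layer 4: output feature map height
            for w in range(W):              # layer 3: output feature map width
                for r in range(R):          # layer 2: kernel height
                    for s in range(S):      # layer 1: kernel width
                        for c in range(C):  # layer 0: input channel (accumulated)
                            f_out[m, h, w] += ker[m, r, s, c] * act[h + r, w + s, c]
    return f_out

def depthwise_conv(act, ker):
    # act: [H+K-1, W+K-1, M] activations; ker: [M, K, K], one kernel per channel,
    # with no input-channel loop to accumulate over
    M, R, S = ker.shape
    H = act.shape[0] - R + 1
    W = act.shape[1] - S + 1
    f_out = np.zeros((M, H, W))
    for m in range(M):
        for h in range(H):
            for w in range(W):
                for r in range(R):
                    for s in range(S):
                        f_out[m, h, w] += ker[m, r, s] * act[h + r, w + s, m]
    return f_out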
The structure of the multi-vector standard convolution acceleration component is shown in Fig. 1. Its design principle is that one input activation value must be multiplied by multiple weights, so different weight (kernel) data are stored in multiple compute units and the same input activation value is broadcast to all of them. Each compute unit consists of SIMD_W multipliers, so SIMD_W multiplications are completed simultaneously and their results are accumulated into 1 number. After several rounds of computation (SIMD_W multiplications per round), the complete innermost input-channel loop of the 6-layer convolution loop is finished, and the result is the accumulation over the rounds (still 1 number).
This standard convolution component design is highly general and can handle various convolution calculations (that is, even if a certain loop level in the 6-layer loop is absent, the standard component can still perform the computation), but it is inefficient for deep convolution. Specifically, with respect to the 6-layer loop structure, deep convolution has no innermost input-channel traversal. Because the standard convolution component assumes a large number of input-channel iterations, it is usually designed with a wide vector width (for example, SIMD_W = 64 or 128); during deep convolution calculation, the SIMD_W-wide multiplier array cannot be filled, so the computational efficiency drops. Furthermore, because of its general design, the standard convolution component cannot proceed to other convolution calculations of subsequent network layers before the deep convolution operation finishes, which lowers the computational efficiency of the whole network.
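A back-of-the-envelope illustration of this efficiency loss (the vector width is an assumed example consistent with the widths mentioned above):

SIMD_W = 64        # assumed vector width of the standard convolution component
useful_lanes = 1   # deep convolution has no input-channel accumulation to fill the lanes
print(f"multiplier utilization on deep convolution: {useful_lanes / SIMD_W:.1%}")  # prints 1.6%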
Disclosure of Invention
The invention aims to provide an on-line computation component for deep convolution that can efficiently complete deep convolution computation without disturbing the output data structure of the accumulator, greatly improve the utilization of computing resources for deep convolution, raise the overall efficiency of the chip, and accelerate the computation of the whole neural network.
To achieve this purpose, the invention adopts the following technical scheme: an on-line computation component for deep convolution is provided, comprising a standard convolution component, an accumulator, and a deep convolution component connected to the data output interface of the accumulator;
the standard convolution component is used for calculating standard convolution in the convolutional neural network;
the accumulator is used for sending the convolution result obtained from the standard convolution component to the deep convolution component;
the deep convolution component is used for calculating the deep convolution in the convolutional neural network;
the deep convolution component comprises multiple stages of activation value stations, a plurality of multipliers, a plurality of weight value stations, and at least one delay station arranged between two adjacent activation value stations; each multiplier is provided with one activation value station and one weight value station; the delay value D of each delay station equals the width of the input activation map; the weight values are preset before the convolution calculation starts; the activation values are injected into the multipliers in a step-by-step advancing manner, and in each clock cycle the value currently stored in each stage of activation value station is forwarded to the next-stage activation value station;
before the calculation starts, the delay value D of each delay station is set equal to the width of the input activation map according to the size of the input activation map coming from the accumulator; a delay station outputs valid data to the next-stage activation value station at its output port only after it has received D data;
the output activation map is organized into columns by output channel (M) and into rows by coordinate order; in each clock cycle, starting from the upper-left corner of the output activation map, the accumulator sends the input activation value data into the activation value stations of the deep convolution component in row-major order, from left to right within a row and then row by row from top to bottom;
when the deep convolution kernel size is K, once the input activation value data of the first K-1 rows have been sent into the activation value stations of the deep convolution component and K further input activation values have been sent, all activation value stations hold valid input activation value data at the same time; at that moment the multipliers complete the one-to-one multiplications of the input activation values with the weight values, and the multiplication results are merged by the adder-tree logic into one convolution result of the deep convolution.
Further improvements to the above technical scheme are as follows:
1. In the above scheme, the number of deep convolution components is less than or equal to the number of output channels of the output activation map; when it is less than the number of output channels, the calculation of the whole output activation map is completed in a time-division multiplexing manner.
2. In the above scheme, the number of delay stations is equal to the side length of the square convolution kernel minus 1.
Due to the application of the technical scheme, compared with the prior art, the invention has the following advantages:
the on-line computation component of the deep convolution is characterized in that a customized deep convolution component is connected to an accumulator output interface of a standard convolution component, and the on-line computation and the customization of the component are adopted, so that the deep convolution computation is efficiently completed on the premise of not damaging an accumulator output data structure, the computation resource utilization rate of the deep convolution computation can be greatly improved, the overall efficiency of a chip is improved, the computation speed of the whole neural network is accelerated, and the problem of low hardware efficiency of the standard convolution component in the computation of the deep convolution is solved while the original design structure is not damaged.
Drawings
FIG. 1 is a diagram of a multi-vector standard convolution acceleration component;
FIG. 2 is a schematic structural diagram of the present invention;
FIG. 3 is a schematic diagram of a data storage structure in a standard convolution component accumulator;
FIG. 4 is a schematic diagram of the coupling of a standard convolution component and a deep convolution component of the present invention;
FIG. 5 is a schematic diagram of the structure of a deep convolution component;
FIG. 6 is a schematic diagram of the deep convolution component calculation;
FIG. 7 is a schematic diagram of the computation of multiple deep convolution components.
Detailed Description
Embodiment: the invention provides an on-line computation component for deep convolution, comprising a standard convolution component, an accumulator, and a deep convolution component connected to the data output interface of the accumulator;
the standard convolution component is used for calculating standard convolution in the convolutional neural network;
the accumulator is used for sending the convolution result obtained from the standard convolution component to the deep convolution component;
the deep convolution component is used for calculating the deep convolution in the convolutional neural network;
the deep convolution component comprises multiple stages of activation value stations, a plurality of multipliers, a plurality of weight value stations, and at least one delay station arranged between two adjacent activation value stations; each multiplier is provided with one activation value station and one weight value station; the delay value D of each delay station equals the width of the input activation map; the weight values are preset before the convolution calculation starts; the activation values are injected into the multipliers in a step-by-step advancing manner, and in each clock cycle the value currently stored in each stage of activation value station is forwarded to the next-stage activation value station;
before the calculation starts, the delay value D of each delay station is set equal to the width of the input activation map according to the size of the input activation map coming from the accumulator; a delay station outputs valid data to the next-stage activation value station at its output port only after it has received D data;
the output activation map is organized into columns by output channel (M) and into rows by coordinate order; in each clock cycle, starting from the upper-left corner of the output activation map, the accumulator sends the input activation value data into the activation value stations of the deep convolution component in row-major order, from left to right within a row and then row by row from top to bottom;
when the deep convolution kernel size is K, once the input activation value data of the first K-1 rows have been sent into the activation value stations of the deep convolution component and K further input activation values have been sent, all activation value stations hold valid input activation value data at the same time; at that moment the multipliers complete the one-to-one multiplications of the input activation values with the weight values, and the multiplication results are merged by the adder-tree logic into one convolution result of the deep convolution.
The number of deep convolution components is less than or equal to the number of output channels of the output activation map. When it is less than the number of output channels, the calculation of the whole output activation map is completed in a time-division multiplexing manner, which saves hardware resources while preserving performance.
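As a sketch of the time-division multiplexing idea (an illustrative scheduler, not the patent's control logic; all names are assumptions), m deep convolution components cover M output channels in ceil(M / m) rounds:

import math

def schedule_rounds(M, m):
    # Assign output channels to rounds; each round reuses the same m deep convolution components.
    rounds = math.ceil(M / m)
    return [list(range(r * m, min((r + 1) * m, M))) for r in range(rounds)]

print(schedule_rounds(M=128, m=32))   # 4 rounds of 32 channels each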
The above embodiments are further explained as follows:
the overall structure of the invention is shown in fig. 2, a depth convolution component with a customized structure is designed on a data output interface of an accumulator of a standard convolution component, and efficient depth convolution calculation is carried out by an online calculation method.
The convolution result of the general convolution component is generally stored in an accumulator, and is used for combining intermediate results of multiple rounds of circulation, and after several rounds of calculation, a complete convolution result (output characteristic diagram) is obtained in the accumulator, and the structure of the convolution result is shown in fig. 3;
in fig. 3, the output activation map is organized into a plurality of columns according to the output channels (M), and organized into a plurality of rows according to the coordinate sequence, and when the result in the accumulator is output, the result is generally read according to the row direction (that is, the data of the same seat number and different output channels are read in each beat); the complete convolution result obtained in the accumulator is typically sent to memory or once again to the convolution component for calculation.
In the invention, the accumulator result is sent directly to the deep convolution component, so that on-line deep convolution calculation is realized. Specifically, as shown in Fig. 4, the accumulator outputs feature-map data of m channels in one clock cycle, and correspondingly m independent deep convolution components carry out the convolution operation;
because this is a deep convolution calculation, the m channel values output by the accumulator do not need to be combined into one number (that is, the innermost loop listed in Table 1 does not exist). The data of each channel are therefore sent to an independent deep convolution component, which implements the multiply-accumulate calculation of layers 1 and 2 listed in Table 1, and the output of each component still corresponds to its input channel Mi.
The dedicated deep convolution component is generally designed as, but not limited to, the structure of Fig. 5;
Fig. 5 shows a component structure capable of calculating the deep convolution result of a single channel. It contains 9 stages of activation value stations, and in each clock cycle the value currently stored in each stage is forwarded to the next stage; accordingly, there are 9 multipliers and 9 weight value stations;
before the calculation starts, the delay value D of the delay stations in Fig. 5 is set equal to the width of the input activation map according to the size of the input activation map. The function of a delay station is to output valid data to the subsequent stage at its output port only after it has received D data;
in each clock cycle, starting from the upper-left corner of the output activation map, the input activation value data are fed into the deep convolution component in row-major order: from left to right within a row, and then row by row from top to bottom;
after the activation values of the first two rows have been sent into the deep convolution component and 3 further activation values have been sent, all activation value stations in Fig. 5 hold valid activation value data at the same time. At this moment the 9 multipliers simultaneously complete the one-to-one multiplications of the 9 activation values with the 9 weight values, and the 9 products are merged by the adder-tree logic into 1 result, which is one convolution result of the deep convolution. The mapping of 9 input activation values to 1 output activation value through the deep convolution component is shown in Fig. 6.
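A behavioral sketch of this streaming computation follows (an illustrative simplification under assumed names, not the patent's exact register-transfer design): the nine window taps stand in for the activation value stations, the line storage stands in for the delay stations (whose delay value D the patent sets to the input image width), and one deep convolution result is produced per clock cycle once all taps hold valid data.

from collections import deque

def depthwise_3x3_stream(act_rows, weights):
    # act_rows: the single-channel input activation map, a list of rows of width W (assumed names)
    # weights:  3x3 nested list of depthwise weights, preset before streaming starts
    W = len(act_rows[0])
    span = 2 * W + 3                      # enough storage so that all 9 window taps coexist
    line = deque(maxlen=span)             # plays the role of activation value stations + delay stations
    outputs = []
    for n, value in enumerate(v for row in act_rows for v in row):
        line.append(value)                # one activation value injected per clock cycle (raster order)
        r, c = n // W, n % W              # coordinate of the newest sample
        if r >= 2 and c >= 2:             # two full rows plus 3 values have been streamed
            acc = 0
            for dr in range(3):
                for dc in range(3):
                    tap = line[-1 - (dr * W + dc)]        # tap = one activation value station
                    acc += weights[2 - dr][2 - dc] * tap  # one of the 9 multipliers
            outputs.append(acc)           # adder tree merges the 9 products into 1 result
    return outputs

# Usage example with made-up values: a 3x4 input map and a diagonal 3x3 kernel
img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
k = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(depthwise_3x3_stream(img, k))       # -> [18, 21]

In hardware the same effect is obtained with the chain of activation value stations and delay stations of Fig. 5, so no random access into a buffer is needed; the sketch only models the timing behaviour, not the register allocation.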
Because the deep convolution component shown in Fig. 5 works on a single channel, m copies of it can be instantiated to increase the calculation speed, so that the activation value data of m channels are received simultaneously in each clock cycle. To simplify the control logic, in each clock cycle the data of the same coordinate position across the m channels are sent into the m deep convolution components, achieving m-fold parallelism;
as shown in Fig. 7, since the m outputs of the deep convolution components correspond to m independent channels and each clock cycle produces the same coordinate position of the output activation map, the results of the deep convolution components can be organized in the layout shown in Fig. 3, so that they can be sent directly to the storage component, or sent again to other convolution components for a new convolution operation.
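For illustration only (a hypothetical helper; stream_fn stands for any per-channel routine such as the sketch above), the m per-channel components running in lock-step can be modeled as:

def depthwise_parallel(act_maps, kernels, stream_fn):
    # act_maps: list of m single-channel activation maps; kernels: list of m 3x3 kernels
    # stream_fn: a per-channel depthwise routine, e.g. depthwise_3x3_stream from the sketch above
    # Each entry of the result corresponds to one independent output channel,
    # matching the channel-column layout of Fig. 3.
    return [stream_fn(act, ker) for act, ker in zip(act_maps, kernels)]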
The number of output channels that can be obtained by each read operation on the accumulator is denoted acc_out_w, and the number of channels that can be processed simultaneously by the deep convolution components is denoted m.
The data format output by the deep convolution component is consistent with that of the accumulator output, so the way in which the accumulator result was originally consumed by its destinations is not affected by the invention.
The independent deep convolution components are specially designed, and the specific implementation method is not limited in the invention.
When this on-line computation component for deep convolution is adopted, a customized deep convolution component is connected to the accumulator output interface of the standard convolution component. Through on-line computation and component customization, the deep convolution calculation is completed efficiently without disturbing the output data structure of the accumulator; the utilization of computing resources for deep convolution is greatly improved, the overall efficiency of the chip is raised, and the computation of the whole neural network is accelerated, solving the problem of low hardware efficiency of the standard convolution component on deep convolution while leaving the original design structure intact.
The above embodiments are merely illustrative of the technical ideas and features of the present invention, and the purpose thereof is to enable those skilled in the art to understand the contents of the present invention and implement the present invention, and not to limit the protection scope of the present invention. All equivalent changes and modifications made according to the spirit of the present invention should be covered within the protection scope of the present invention.

Claims (3)

1. An on-line computation component for deep convolution, characterized by comprising a standard convolution component, an accumulator, and a deep convolution component connected to the data output interface of the accumulator;
the standard convolution component is used for calculating standard convolution in the convolutional neural network;
the accumulator is used for sending the convolution result obtained from the standard convolution component to the deep convolution component;
the deep convolution component is used for calculating the deep convolution in the convolutional neural network;
the deep convolution component comprises multiple stages of activation value stations, a plurality of multipliers, a plurality of weight value stations, and at least one delay station arranged between two adjacent activation value stations; each multiplier is provided with one activation value station and one weight value station; the delay value D of each delay station equals the width of the input activation map; the weight values are preset before the convolution calculation starts; the activation values are injected into the multipliers in a step-by-step advancing manner, and in each clock cycle the value currently stored in each stage of activation value station is forwarded to the next-stage activation value station;
before the calculation starts, the delay value D of each delay station is set equal to the width of the input activation map according to the size of the input activation map coming from the accumulator; a delay station outputs valid data to the next-stage activation value station at its output port only after it has received D data;
the output activation map is organized into columns by output channel (M) and into rows by coordinate order; in each clock cycle, starting from the upper-left corner of the output activation map, the accumulator sends the input activation value data into the activation value stations of the deep convolution component in row-major order, from left to right within a row and then row by row from top to bottom;
when the deep convolution kernel size is K, once the input activation value data of the first K-1 rows have been sent into the activation value stations of the deep convolution component and K further input activation values have been sent, all activation value stations hold valid input activation value data at the same time; at that moment the multipliers complete the one-to-one multiplications of the input activation values with the weight values, and the multiplication results are merged by the adder-tree logic into one convolution result of the deep convolution.
2. The on-line computation component for deep convolution of claim 1, characterized in that: the number of deep convolution components is less than or equal to the number of output channels of the output activation map, and when it is less than the number of output channels, the calculation of the whole output activation map is completed in a time-division multiplexing manner.
3. The on-line computation component for deep convolution of claim 1, characterized in that: the number of delay stations is equal to the side length of the square convolution kernel minus 1.
CN202011525795.XA 2020-12-22 2020-12-22 On-line computing component for depth convolution Active CN112632459B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011525795.XA CN112632459B (en) 2020-12-22 2020-12-22 On-line computing component for depth convolution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011525795.XA CN112632459B (en) 2020-12-22 2020-12-22 On-line computing component for depth convolution

Publications (2)

Publication Number Publication Date
CN112632459A true CN112632459A (en) 2021-04-09
CN112632459B CN112632459B (en) 2023-07-07

Family

ID=75320624

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011525795.XA Active CN112632459B (en) 2020-12-22 2020-12-22 On-line computing component for depth convolution

Country Status (1)

Country Link
CN (1) CN112632459B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180046916A1 (en) * 2016-08-11 2018-02-15 Nvidia Corporation Sparse convolutional neural network accelerator
DE102017117381A1 * 2016-08-11 2018-02-15 Nvidia Corporation Accelerator for sparse convolutional neural networks
WO2018120989A1 (en) * 2016-12-29 2018-07-05 华为技术有限公司 Convolution operation chip and communication device
CN111626399A (en) * 2019-02-27 2020-09-04 中国科学院半导体研究所 Convolutional neural network calculation device and data calculation method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180046916A1 (en) * 2016-08-11 2018-02-15 Nvidia Corporation Sparse convolutional neural network accelerator
DE102017117381A1 * 2016-08-11 2018-02-15 Nvidia Corporation Accelerator for sparse convolutional neural networks
WO2018120989A1 (en) * 2016-12-29 2018-07-05 华为技术有限公司 Convolution operation chip and communication device
CN111626399A (en) * 2019-02-27 2020-09-04 中国科学院半导体研究所 Convolutional neural network calculation device and data calculation method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Jiang Peiqing; Wu Lijun: "Improved binarized convolution layer design based on FPGA" (基于FPGA的改进二值化卷积层设计), 电气开关 (Electrical Switchgear), no. 06 *

Also Published As

Publication number Publication date
CN112632459B (en) 2023-07-07

Similar Documents

Publication Publication Date Title
US20220012593A1 (en) Neural network accelerator and neural network acceleration method based on structured pruning and low-bit quantization
CN110458279B (en) FPGA-based binary neural network acceleration method and system
CN109948774B (en) Neural network accelerator based on network layer binding operation and implementation method thereof
CN111242289B (en) Convolutional neural network acceleration system and method with expandable scale
CN111898733B (en) Deep separable convolutional neural network accelerator architecture
CN106445471A (en) Processor and method for executing matrix multiplication on processor
CN111445012A (en) FPGA-based packet convolution hardware accelerator and method thereof
CN111796796B (en) FPGA storage method, calculation method, module and FPGA board based on sparse matrix multiplication
CN111079923B (en) Spark convolutional neural network system suitable for edge computing platform and circuit thereof
WO2022110386A1 (en) Data processing method and artificial intelligence processor
CN108197075B (en) Multi-core implementation method of Inceptation structure
CN114781629B (en) Hardware accelerator of convolutional neural network based on parallel multiplexing and parallel multiplexing method
CN113033794A (en) Lightweight neural network hardware accelerator based on deep separable convolution
CN111738433A (en) Reconfigurable convolution hardware accelerator
CN111008691B (en) Convolutional neural network accelerator architecture with weight and activation value both binarized
CN111340198A (en) Neural network accelerator with highly-multiplexed data based on FPGA (field programmable Gate array)
CN113168309A (en) Method, circuit and SOC for executing matrix multiplication operation
CN112395549B (en) Reconfigurable matrix multiplication acceleration system for matrix multiplication intensive algorithm
CN112862091A (en) Resource multiplexing type neural network hardware accelerating circuit based on quick convolution
CN112632459B (en) On-line computing component for depth convolution
CN113158132A (en) Convolution neural network acceleration system based on unstructured sparsity
CN110018457B (en) Method for designing satellite-borne SAR echo data frame header identifier detection functional module
CN114239816B (en) Reconfigurable hardware acceleration architecture of convolutional neural network-graph convolutional neural network
CN112905526B (en) FPGA implementation method for multiple types of convolution
CN112862079B (en) Design method of running water type convolution computing architecture and residual error network acceleration system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant