CN110209627A - SSD hardware acceleration method for intelligent terminals - Google Patents
- Publication number
- CN110209627A CN110209627A CN201910474860.1A CN201910474860A CN110209627A CN 110209627 A CN110209627 A CN 110209627A CN 201910474860 A CN201910474860 A CN 201910474860A CN 110209627 A CN110209627 A CN 110209627A
- Authority
- CN
- China
- Prior art keywords
- ssd
- fpga
- hardware
- intelligent terminal
- convolution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F15/7846: Architectures of general-purpose stored-program computers; single CPU with memory on one IC chip; on-chip cache and off-chip main memory
- G06F15/7867: Architectures of general-purpose stored-program computers; single CPU with reconfigurable architecture
- G06N3/045: Neural networks; combinations of networks
- G06V10/94: Hardware or software architectures specially adapted for image or video understanding
- G06V2201/07: Target detection
- Y02D10/00: Energy-efficient computing, e.g. low-power processors, power management or thermal management
Abstract
The invention discloses an SSD hardware acceleration method for intelligent terminals, belonging to the technical fields of FPGA hardware acceleration, target detection, computer vision, and heterogeneous computing. The method adopts an ARM+FPGA heterogeneous architecture on the edge-side smart device terminal and accelerates computation in hardware for edge-side target detection application scenarios. The model training of the SSD algorithm is completed by a cloud data center, personalized algorithms are designed for different FPGAs, and they are dynamically loaded onto the FPGA of the smart device terminal. For the SSD algorithm, 3x3 convolution, adder-tree, and ReLU computing units are designed using dot-product operations and tree-structured adders. The SSD hardware acceleration method for intelligent terminals of the invention satisfies edge-side requirements for computing power, real-time performance, and power consumption, and has good application value.
Description
Technical field
The present invention relates to the technical fields of FPGA hardware acceleration, target detection, computer vision, and heterogeneous computing, and specifically provides an SSD hardware acceleration method for intelligent terminals.
Background art
An FPGA (Field-Programmable Gate Array) is a semiconductor device that can be programmed for a specific application or functional requirement. FPGAs have been widely used in heterogeneous acceleration, where they show better performance than general-purpose CPUs. A CPU is designed mainly for logic computation; unlike the CPU and GPU, the FPGA is a typical non-von-Neumann architecture in which the hardware is adapted to the software, so the degree of parallelism can be adjusted flexibly according to system resources and algorithm characteristics to achieve an optimal fit, and its energy efficiency is therefore higher than that of CPUs and GPUs. FPGAs are especially good at digital signal processing, are compatible with interfaces at multiple voltage standards, and can interconnect various high-speed electronic components such as high-speed optical-fiber transceivers. Their low power consumption and low cost make them widely used in many fields.
In recent years computer vision has developed rapidly and is widely applied in fields such as security, transportation, robotics, and autonomous vehicles, with target detection being an important research direction. SSD (Single Shot MultiBox Detector) is a typical target detection algorithm. By using prior boxes (default boxes) of different scales and aspect ratios and detecting on feature maps of different scales, it extracts features with a CNN and then performs classification and regression directly, which alleviates the difficulty of detecting small objects that is common in target detection while remaining fast; it can be used in scenarios such as autonomous driving and security cameras. However, the terminals in these application scenarios are limited in computing power, size, and power consumption, while the SSD algorithm is also required to execute faster.
Summary of the invention
The technical task of the present invention is, in view of the above problems, to provide an SSD hardware acceleration method for intelligent terminals that satisfies edge-side requirements for computing power, real-time performance, and power consumption, supports dynamic FPGA updates without power loss, and continuously improves algorithm efficiency.
To achieve the above object, the present invention provides the following technical solution:
An SSD hardware acceleration method for intelligent terminals adopts an ARM+FPGA heterogeneous architecture on the edge-side smart device terminal and accelerates computation in hardware for edge-side target detection application scenarios. The model training of the SSD algorithm is completed by a cloud data center, personalized algorithms are designed for different FPGAs, and they are dynamically loaded onto the smart device terminal. For the SSD algorithm, 1x1 convolution units, 3x3 convolution units, adder-tree units, and ReLU units are designed using dot-product operations and tree-structured adders; combinations of these computing units are assembled, computation proceeds according to a configured loading order of data and network parameters, and the SSD algorithm is realized in cooperation with the ARM.
Targeting the network structure of the SSD algorithm, the SSD hardware acceleration method for intelligent terminals makes effective use of the FPGA's low power consumption and strong real-time parallel processing capability, adopts the ARM+FPGA heterogeneous architecture to accelerate computation for edge-side target detection scenarios, and satisfies edge-side requirements for computing power, real-time performance, and power consumption. It fully considers the resources of the edge-side device and the compute and on-chip storage resources of the FPGA, reasonably runs operations such as convolution, bias addition, and ReLU on the FPGA, designs 3x3 and 1x1 convolution units, adopts a loading order for convolution data in which a forward scan is followed by a reverse scan, makes full use of the on-chip cache, and reduces the number of data exchanges between the DDR memory and the FPGA on-chip cache. Operations unsuited to FPGA pipelining, including the PriorBox and pooling computations in the SSD algorithm, are handed to the ARM processor, which guarantees FPGA execution efficiency and reduces system complexity. While the ARM executes PriorBox-related computation, idle time on the FPGA is used to accelerate the convolution operations, improving overall efficiency. The method thus realizes edge-side hardware acceleration of the SSD algorithm, increases image target detection speed, achieves energy-efficient computation, and improves the overall performance of the terminal. In addition, the cloud data center continuously optimizes the model and, exploiting the FPGA's dynamic loading capability, updates the FPGA dynamically without power loss, continuously improving algorithm efficiency.
Preferably, the smart device terminal adopts the ARM+FPGA heterogeneous architecture, has both internal memory and external storage, provides image acquisition, and realizes edge-side real-time image target detection.
Preferably, the cloud data center collects a target detection training set, completes training with the SSD network, and customizes the resulting SSD network model according to the FPGA specifications of different devices.
Preferably, after the model is customized for the FPGA, the loading and execution order of data and network parameters is determined, and the personalized SSD network model is downloaded to the smart device terminal according to the FPGA hardware configuration.
Preferably, the FPGA implements the convolution circuits, designing the 3x3 convolution and ReLU computing units from dot-product operations and tree-structured adders.
Preferably, the ARM controls the FPGA and reads data and network parameters from memory into the FPGA through a DMA controller, while implementing the max pooling in SSD.
The ARM realizes the SSD operations MaxPool (max pooling), PriorBox, Permute, Normalize, and Flatten.
Preferably, according to the SSD algorithm and the limits of the PE and cache resources on the FPGA chip, a channels-first computation scheme is used: a group of 64-channel 3x3 convolutions is designed as one unit, containing 64 3x3 inner-product computations that produce the 64 convolution values; an adder-tree unit then reduces them to a single numerical result, completing one group of 64 3x3 convolutions.
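This group unit can be modelled in software to check the arithmetic. The sketch below (Python/NumPy, illustrative only; the patent specifies a hardware circuit, and the function names are mine) computes 64 3x3 inner products and reduces them with a tree-shaped adder:

```python
import numpy as np

def adder_tree(values):
    """Reduce a list of partial sums in log2(n) stages, pairing
    neighbours at each stage, as a tree-shaped adder would."""
    values = list(values)
    while len(values) > 1:
        values = [values[i] + values[i + 1] for i in range(0, len(values), 2)]
    return values[0]

def conv3x3_group64(patches, kernels):
    """One group unit: 64 channels, each a 3x3 inner product,
    reduced to a single value by the adder tree.
    patches, kernels: arrays of shape (64, 3, 3)."""
    partial = [float(np.sum(patches[c] * kernels[c])) for c in range(64)]
    return adder_tree(partial)
```

With 64 inputs (a power of two) the tree closes in exactly six addition stages, which is what makes it attractive for a pipelined FPGA circuit.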
Preferably, a group of 64-channel 1x1 convolutions is designed as one unit, containing 64 1x1 multiplications that produce the 64 convolution values; an adder-tree unit then reduces them to a single numerical result.
Preferably, an FPGA convolutional-layer computing unit is designed in which the input data cache holds the 3x3 or 1x1 data nodes of all channels, and the parameter cache holds the 3x3 or 1x1 convolution kernel parameters of all channels of one filter plus one bias parameter.
A ReLU computing node realizing the function Relu(x) = Max(0, x) is implemented in the circuit. One layer of convolution is completed by combining multiple convolutional-layer computing units, and the output is stored to external memory.
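A software model of one such convolutional-layer computing unit, combining the channel-wise inner products, the bias parameter, and the ReLU node (illustrative Python/NumPy sketch; the function name and array layout are my assumptions, not the patent's):

```python
import numpy as np

def conv_unit(input_cache, param_cache, bias):
    """Model of one convolutional-layer computing unit: the input cache
    holds the 3x3 (or 1x1) patches of all channels, the parameter cache
    the matching kernel weights of one filter, plus one bias value.
    Output = Relu(sum over all channels of the inner products + bias)."""
    acc = float(np.sum(input_cache * param_cache)) + bias
    return max(0.0, acc)  # Relu(x) = Max(0, x), as in the circuit
```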
Compared with the prior art, the SSD hardware acceleration method for intelligent terminals of the present invention has the following prominent beneficial effects:
(1) Targeting the network structure of the SSD algorithm, it makes effective use of the FPGA's low power consumption and strong real-time parallel processing capability, adopts the ARM+FPGA heterogeneous architecture to accelerate computation for edge-side target detection scenarios, and satisfies edge-side requirements for computing power, real-time performance, and power consumption;
(2) It fully considers the resources of the edge-side device and the compute and on-chip storage resources of the FPGA, reasonably runs operations such as convolution, bias addition, and ReLU on the FPGA, designs 3x3 and 1x1 convolution units, adopts a forward-scan-then-reverse-scan loading order for convolution data, makes full use of the on-chip cache, and reduces the number of data exchanges between the DDR memory and the FPGA on-chip cache;
(3) Operations unsuited to FPGA pipelining, including the PriorBox and pooling computations in the SSD algorithm, are executed in cooperation with the ARM processor, which guarantees FPGA execution efficiency and reduces system complexity;
(4) While the ARM executes PriorBox-related computation, idle time on the FPGA is used to accelerate the convolution operations, improving overall efficiency, realizing edge-side hardware acceleration of the SSD algorithm, increasing image target detection speed, achieving energy-efficient computation, and improving the overall performance of the terminal;
(5) The cloud data center continuously optimizes the model and, exploiting the FPGA's dynamic loading capability, updates the FPGA dynamically without power loss, continuously improving algorithm efficiency, which has good application value.
Brief description of the drawings
Fig. 1 is a schematic diagram of the smart device terminal structure and its nodes in the SSD hardware acceleration method for intelligent terminals of the present invention;
Fig. 2 is a schematic diagram of the SSD acceleration algorithm in the SSD hardware acceleration method for intelligent terminals of the present invention;
Fig. 3 is a flow chart of the SSD hardware acceleration of the smart device terminal in the SSD hardware acceleration method for intelligent terminals of the present invention.
Specific embodiment
The SSD hardware acceleration method for intelligent terminals of the present invention is described in further detail below with reference to the drawings and an embodiment.
Embodiment
As shown in Fig. 1 and Fig. 2, the SSD hardware acceleration method for intelligent terminals of the present invention adopts an ARM+FPGA heterogeneous architecture on the edge-side smart device terminal and accelerates computation in hardware for edge-side target detection application scenarios, satisfying edge-side requirements for computing power, real-time performance, and power consumption. The cloud data center completes the training and optimization of the SSD algorithm model, designs personalized algorithms for the different FPGA specifications, and dynamically loads them onto the smart device terminal. For the SSD algorithm, basic computing elements such as the 1x1 convolution unit, 3x3 convolution unit, adder-tree unit, and ReLU unit are designed using dot-product operations and tree-structured adders, fully considering the on-chip cache size; combinations of these computing units are assembled, computation proceeds according to a configured loading order of data and parameters, and the SSD algorithm is realized in cooperation with the ARM.
The cloud data center is responsible for collecting the target detection training set and completing training with the SSD network; it customizes the resulting SSD network model according to the FPGA specifications of different devices, determines the loading and execution order of data and network parameters, and downloads the personalized model to the smart device terminal according to the FPGA hardware configuration. The smart device terminal adopts the ARM+FPGA heterogeneous architecture, has both internal memory and external storage, provides an image acquisition function, and realizes edge-side real-time image target detection. The FPGA implements the convolution circuits, realizing the 3x3 dot-product operations, the adder-tree computing unit, and the ReLU unit. The ARM controls the FPGA, reads data and network model parameters from memory into the FPGA through a DMA controller, and implements SSD operations such as max pooling (MaxPool), PriorBox, Permute, Normalize, and Flatten. According to the characteristics of the SSD algorithm and the limits of the PE and cache resources on the FPGA chip, a channels-first computation scheme is used. A group of 64-channel 3x3 convolutions is designed as one unit, containing 64 3x3 inner-product computations that produce the 64 convolution values; an adder-tree unit then reduces them to a single numerical result, completing one group of 64 3x3 convolutions. A group of 64-channel 1x1 convolutions is likewise designed as one unit, containing 64 1x1 multiplications that produce the 64 convolution values; an adder-tree unit (which can be shared with the 3x3 convolution) then reduces them to a single numerical result. An FPGA convolutional-layer computing unit is designed in which the input data cache holds the 3x3 or 1x1 data nodes of all channels, and the parameter cache holds the 3x3 or 1x1 convolution kernel parameters of all channels of one filter plus one bias parameter. A ReLU computing node realizing Relu(x) = Max(0, x) is implemented in the circuit. One layer of convolution is completed by combining multiple convolutional-layer computing units, and the output is stored to external memory.
As shown in Fig. 3, the SSD hardware acceleration method for intelligent terminals performs SSD hardware acceleration on the smart device terminal as follows:
S1. The cloud data center collects a target detection training set, completes training with the SSD network, and generates a model.
S2. The cloud data center customizes the model according to the FPGA specifications of different devices and downloads the personalized model to the smart device terminal.
S3. The smart device terminal acquires external images through an image acquisition device such as a camera, and converts the image resolution to meet the requirements of the SSD algorithm.
S4. The smart device terminal reads the image into DDR memory, encoded as three RGB channels.
S5. Following the steps of the SSD algorithm, the FPGA loads from memory via DMA the model parameters of one 3x3 convolution kernel of convolutional layer CONV1_1 (the first convolution layer; each kernel spans three channels, and CONV1_1 has 64 kernels in total).
S6. Image data is read from memory via DMA in 3x3 three-channel units, scanning the image from the upper left toward the lower right, and is computed with the 64-channel group convolution unit, of which only 3 channels are used.
S7. The 3x3 convolutions produce the results of the three channels; the adder tree then accumulates them, and finally the bias value is added to obtain a single result.
S8. The result of S7 passes through the ReLU unit, is written to the on-chip cache, and is finally output to DDR memory.
S9. S5 to S8 are repeated, making full use of the PE units on the FPGA chip for the convolution operations: a convolution kernel is loaded first, then the image data is read and computed. When the next convolution kernel is loaded, the image data is loaded starting from the lower-right position of the image and proceeding toward the upper left, guaranteeing that the image data cached at the end of the previous pass can be reused and reducing the number of memory reads.
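The alternating scan direction of S9 can be illustrated as follows (Python sketch; the function name and the even/odd-kernel convention are illustrative assumptions, not part of the patent):

```python
def scan_order(height, width, kernel_index):
    """Illustrative load order for S9: even-numbered kernels scan the
    image from the top-left toward the bottom-right, odd-numbered
    kernels scan in the reverse direction, so the window cached at the
    end of one pass is the first window needed by the next pass."""
    coords = [(r, c) for r in range(height) for c in range(width)]
    return coords if kernel_index % 2 == 0 else coords[::-1]
```

Because the last position of one pass is the first position of the next, the data already sitting in the on-chip cache is reused instead of being re-read from DDR memory.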
S10. Convolutional layer CONV1_2 (the second convolution layer) is loaded: its 3x3 kernels are loaded by the same rule and computed with the 64-channel group unit, and the results pass through the ReLU unit and are output to DDR storage.
S11. The ARM computes the 2x2 max pooling (MaxPool), completing the first network stage, CONV1.
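The ARM-side 2x2 max pooling of S11 can be modelled as follows (illustrative Python/NumPy sketch; the function name and the even-dimension assumption are mine):

```python
import numpy as np

def maxpool2x2(fmap):
    """ARM-side 2x2 max pooling with stride 2 on a single channel.
    Assumes the feature map height and width are both even."""
    h, w = fmap.shape
    # Group the map into 2x2 blocks, then take the max of each block.
    return fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
```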
S12. CONV2 to CONV5 of the SSD network are computed following the S5-to-S10 sequence. The output of CONV4_3 is written to DDR memory; the ARM completes the Normalize regularization and performs the CONV_MBOX PriorBox-related computation, including operations such as convolution, Permute, and Flatten.
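S12 hands the PriorBox computation to the ARM. The patent does not specify the box scales or aspect ratios of its model, so the sketch below only illustrates generic SSD-style prior-box generation with assumed values (Python; all names and parameters are hypothetical):

```python
def prior_boxes(fmap_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Hypothetical SSD-style prior boxes (cx, cy, w, h) in relative
    coordinates for one square feature map. The scale and aspect
    ratios are assumed example values, not taken from the patent."""
    boxes = []
    for i in range(fmap_size):
        for j in range(fmap_size):
            # Box centres sit at the middle of each feature-map cell.
            cx, cy = (j + 0.5) / fmap_size, (i + 0.5) / fmap_size
            for ar in aspect_ratios:
                boxes.append((cx, cy, scale * ar ** 0.5, scale / ar ** 0.5))
    return boxes
```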
S13. The 13x13 convolution of the FC6 layer is decomposed into 3x3 convolutions, five per dimension (covering a 15x15 region), and the positions beyond 13x13 are padded with 0 before computation.
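The decomposition of S13 can be checked numerically: zero-padding the 13x13 kernel to 15x15 and covering it with 3x3 tiles reproduces the full inner product. Illustrative Python/NumPy model (the names and structure are mine, not the patent's):

```python
import numpy as np

def conv_as_3x3_tiles(patch15, kernel13):
    """Model of S13: one 13x13 inner product computed as 3x3 tiles.
    The 13x13 kernel is zero-padded to 15x15 (the positions beyond
    13x13 filled with 0), then covered by five 3x3 tiles per
    dimension; each tile is one 3x3 inner product, and the tile
    results are accumulated."""
    k15 = np.zeros((15, 15))
    k15[:13, :13] = kernel13
    total = 0.0
    for r in range(0, 15, 3):
        for c in range(0, 15, 3):
            total += float(np.sum(patch15[r:r+3, c:c+3] * k15[r:r+3, c:c+3]))
    return total
```

The zero-padded tiles contribute nothing to the sum, so the tiled result matches the direct 13x13 inner product while reusing the same 3x3 hardware unit.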
S14. The FC7 layer uses the 64-channel group of 1x1 convolutions, followed by the adder tree and the addition of the bias value to obtain a single result; like the 3x3 convolution in S9, the computation is repeated, and the final output is stored to DDR as the input of the next layer, CONV6, while the ARM processes the results to complete the PriorBox-related operations.
S15. CONV6 through CONV10 all use a similar computation flow: the 1x1 convolution layers, 3x3 convolution layers, and ReLU layers are completed by the FPGA, still scanning each feature map from the upper left toward the lower right and returning in the opposite direction on the next pass, with the results output to DDR memory.
S16. The NORMBOX- and PriorBox-related operations of CONV6 to CONV9 are handed to the ARM; when CONV10 finishes its computation, the convolution operation of the NORMBOX of CONV10 is handed to the FPGA.
S17. After the NORMBOX-related operations following CONV10 are fully completed, convolution operations may continue to be assigned to the FPGA according to the computation status of CONV6 to CONV9.
S18. After all preceding operations are completed, the ARM executes the final NORMBOX_priorbox confidence (MBOX_CONF) and location (MBOX_LOC) computations and produces the final output.
S19. S13 to S18 are repeated to perform target detection continuously.
S20. The cloud data center continuously optimizes the model and dynamically loads updates to the terminal-side model.
The embodiment described above is only a preferred specific embodiment of the present invention, and the usual variations and substitutions carried out by those skilled in the art within the scope of the technical solution of the present invention should all be included within the scope of the present invention.
Claims (9)
1. An SSD hardware acceleration method for intelligent terminals, characterized in that: an ARM+FPGA heterogeneous architecture is adopted on the edge-side smart device terminal, and computation is accelerated in hardware for edge-side target detection application scenarios; the model training of the SSD algorithm is completed by a cloud data center, personalized algorithms are designed for different FPGAs, and they are dynamically loaded onto the smart device terminal; for the SSD algorithm, 1x1 convolution units, 3x3 convolution units, adder-tree units, and ReLU units are designed using dot-product operations and tree-structured adders, combinations of these computing units are assembled, computation proceeds according to a configured loading order of data and network parameters, and the SSD algorithm is realized in cooperation with the ARM.
2. The SSD hardware acceleration method for intelligent terminals according to claim 1, characterized in that: the smart device terminal adopts the ARM+FPGA heterogeneous architecture, has both internal memory and external storage, provides image acquisition, and realizes edge-side real-time image target detection.
3. The SSD hardware acceleration method for intelligent terminals according to claim 1 or 2, characterized in that: the cloud data center collects a target detection training set, completes training with the SSD network, and customizes the resulting SSD network model according to the FPGA specifications of different devices.
4. The SSD hardware acceleration method for intelligent terminals according to claim 3, characterized in that: after the model is customized for the FPGA, the loading and execution order of data and network parameters is determined, and the personalized SSD network model is downloaded to the smart device terminal according to the FPGA hardware configuration.
5. The SSD hardware acceleration method for intelligent terminals according to claim 4, characterized in that: the FPGA implements the convolution circuits and designs the 3x3 convolution and ReLU computing units from dot-product operations and tree-structured adders.
6. The SSD hardware acceleration method for intelligent terminals according to claim 5, characterized in that: the ARM controls the FPGA, reads data and network parameters from memory into the FPGA through a DMA controller, and implements the max pooling in SSD.
7. The SSD hardware acceleration method for intelligent terminals according to claim 6, characterized in that: according to the SSD algorithm and the limits of the PE and cache resources on the FPGA chip, a channels-first computation scheme is used, in which a group of 64-channel 3x3 convolutions is designed as one unit, containing 64 3x3 inner-product computations that produce the 64 convolution values; an adder-tree unit then reduces them to a single numerical result, completing one group of 64 3x3 convolutions.
8. The SSD hardware acceleration method for intelligent terminals according to claim 7, characterized in that: a group of 64-channel 1x1 convolutions is designed as one unit, containing 64 1x1 multiplications that produce the 64 convolution values; an adder-tree unit then reduces them to a single numerical result.
9. The SSD hardware acceleration method for intelligent terminals according to claim 8, characterized in that: an FPGA convolutional-layer computing unit is designed in which the input data cache holds the 3x3 or 1x1 data nodes of all channels, and the parameter cache holds the 3x3 or 1x1 convolution kernel parameters of all channels of one filter plus one bias parameter.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN201910474860.1A | 2019-06-03 | 2019-06-03 | SSD hardware acceleration method for intelligent terminals |
Publications (1)
| Publication Number | Publication Date |
| --- | --- |
| CN110209627A | 2019-09-06 |
Family
ID=67790309
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
| --- | --- | --- | --- |
| CN201910474860.1A (Pending) | CN110209627A (en) | 2019-06-03 | 2019-06-03 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110209627A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112887093A (en) * | 2021-03-30 | 2021-06-01 | 矩阵元技术(深圳)有限公司 | Hardware acceleration system and method for implementing cryptographic algorithms |
CN115309407A (en) * | 2022-10-12 | 2022-11-08 | 中国移动通信有限公司研究院 | Method and system capable of realizing calculation power abstraction |
CN115550607A (en) * | 2020-09-27 | 2022-12-30 | 北京天玛智控科技股份有限公司 | Model reasoning accelerator realized based on FPGA and intelligent visual perception terminal |
US11687279B2 (en) | 2020-01-27 | 2023-06-27 | Samsung Electronics Co., Ltd. | Latency and throughput centric reconfigurable storage device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106228240A (en) * | 2016-07-30 | 2016-12-14 | 复旦大学 | Degree of depth convolutional neural networks implementation method based on FPGA |
CN108256636A (en) * | 2018-03-16 | 2018-07-06 | 成都理工大学 | A kind of convolutional neural networks algorithm design implementation method based on Heterogeneous Computing |
CN108280514A (en) * | 2018-01-05 | 2018-07-13 | 中国科学技术大学 | Sparse neural network acceleration system based on FPGA and design method |
CN108764466A (en) * | 2018-03-07 | 2018-11-06 | 东南大学 | Convolutional neural networks hardware based on field programmable gate array and its accelerated method |
CN108932548A (en) * | 2018-05-22 | 2018-12-04 | 中国科学技术大学苏州研究院 | A kind of degree of rarefication neural network acceleration system based on FPGA |
- 2019-06-03: Application filed as CN201910474860.1A; published as CN110209627A (en); status Pending
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11687279B2 (en) | 2020-01-27 | 2023-06-27 | Samsung Electronics Co., Ltd. | Latency and throughput centric reconfigurable storage device |
CN115550607A (en) * | 2020-09-27 | 2022-12-30 | 北京天玛智控科技股份有限公司 | FPGA-based model inference accelerator and intelligent visual perception terminal |
CN112887093A (en) * | 2021-03-30 | 2021-06-01 | 矩阵元技术(深圳)有限公司 | Hardware acceleration system and method for implementing cryptographic algorithms |
CN115309407A (en) * | 2022-10-12 | 2022-11-08 | 中国移动通信有限公司研究院 | Method and system for computing-power abstraction |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110209627A (en) | SSD hardware acceleration method for intelligent terminals | |
CN112214726B (en) | Operation accelerator | |
CN109102065B (en) | Convolutional neural network accelerator based on PSoC | |
CN109740534B (en) | Image processing method, device and processing equipment | |
CN107844832A (en) | Information processing method and related product | |
Pestana et al. | A full featured configurable accelerator for object detection with YOLO | |
CN110458279A (en) | FPGA-based binary neural network acceleration method and system | |
CN108647773B (en) | Reconfigurable convolutional neural network hardware interconnection system | |
CN111667051A (en) | Neural network accelerator for edge devices and neural network acceleration computation method | |
CN111898733B (en) | Depthwise separable convolutional neural network accelerator architecture | |
CN112163601B (en) | Image classification method, system, computer device and storage medium | |
CN111047008B (en) | Convolutional neural network accelerator and acceleration method | |
CN109416756A (en) | Convolver and artificial intelligence processing device applying the same | |
US11996105B2 (en) | Information processing method and terminal device | |
CN111738433A (en) | Reconfigurable convolution hardware accelerator | |
CN109598250A (en) | Feature extraction method, device, electronic equipment and computer-readable medium | |
CN110598844A (en) | FPGA-based parallel convolutional neural network accelerator and acceleration method | |
CN114329324A (en) | Data processing circuit, data processing method and related product | |
CN117501245A (en) | Neural network model training method and device, and data processing method and device | |
CN117217274B (en) | Vector processor, neural network accelerator, chip and electronic equipment | |
CN110716751B (en) | High-parallelism computing platform, system and computing implementation method | |
CN110222835A (en) | Convolutional neural network hardware system based on zero-value detection and operation method | |
CN113837922A (en) | Computing device, data processing method and related product | |
CN113128673B (en) | Data processing method, storage medium, neural network processor and electronic device | |
CN114581952A (en) | Pedestrian re-identification method, system, device, equipment and computer medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190906 |