CN110209627A - SSD hardware acceleration method for intelligent terminals - Google Patents

SSD hardware acceleration method for intelligent terminals

Info

Publication number
CN110209627A
CN110209627A
Authority
CN
China
Prior art keywords
ssd
fpga
hardware
intelligent terminal
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910474860.1A
Other languages
Chinese (zh)
Inventor
孙善宝
王子彤
姜凯
李朋
秦刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Inspur Artificial Intelligence Research Institute Co Ltd
Original Assignee
Shandong Inspur Artificial Intelligence Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Inspur Artificial Intelligence Research Institute Co Ltd filed Critical Shandong Inspur Artificial Intelligence Research Institute Co Ltd
Priority to CN201910474860.1A priority Critical patent/CN110209627A/en
Publication of CN110209627A publication Critical patent/CN110209627A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored program computers
    • G06F15/78Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7839Architectures of general purpose stored program computers comprising a single central processing unit with memory
    • G06F15/7842Architectures of general purpose stored program computers comprising a single central processing unit with memory on one IC chip (single chip microcontrollers)
    • G06F15/7846On-chip cache and off-chip main memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored program computers
    • G06F15/78Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7867Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/94Hardware or software architectures specially adapted for image or video understanding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Stored Programmes (AREA)

Abstract

The invention discloses an SSD hardware acceleration method for intelligent terminals, belonging to the technical fields of FPGA hardware acceleration, target detection, computer vision and heterogeneous computing. The method adopts an ARM+FPGA heterogeneous architecture on the edge-side smart device terminal and performs hardware acceleration of computation for edge-side target detection application scenarios. The model training of the SSD algorithm is completed by a cloud data center, personalized algorithms are designed for different FPGAs, and the FPGA is dynamically loaded on the smart device terminal. For the SSD algorithm, 3x3 convolution, adder-tree and ReLU computing units are designed using dot-product operations and tree-shaped adders. The SSD hardware acceleration method for intelligent terminals of the invention satisfies the edge-side requirements for computing power, real-time performance and power consumption, and has good application value.

Description

SSD hardware acceleration method for intelligent terminals
Technical field
The present invention relates to the technical fields of FPGA hardware acceleration, target detection, computer vision and heterogeneous computing, and in particular provides an SSD hardware acceleration method for intelligent terminals.
Background art
An FPGA (Field-Programmable Gate Array) is a semiconductor device that can be programmed according to application or functional requirements. It has been widely used in the field of heterogeneous acceleration and has shown better performance than a general-purpose CPU. The CPU is designed mainly for logic operations; unlike the CPU and GPU, the FPGA is a typical non-von-Neumann architecture in which the hardware is adapted to the software, so the degree of parallelism can be flexibly adjusted according to the system resources and algorithm characteristics to achieve an optimal fit, and its energy efficiency is therefore higher than that of the CPU and GPU. The FPGA is especially good at digital signal processing, is compatible with interfaces of multiple level standards, and can interconnect various high-speed electronic components such as high-speed optical-fiber transceivers. Its low power consumption and low cost make it widely used in many fields.
In recent years, computer vision technology has developed rapidly and is widely used in fields such as security, traffic, robots and unmanned vehicles, in which target detection is an important research direction. SSD (Single Shot MultiBox Detector) is a typical target detection algorithm. By using prior boxes (default boxes) of different scales and aspect ratios, detecting on feature maps of different scales, and directly classifying and regressing after extracting features with a CNN, it alleviates the common difficulty of detecting small objects in target detection while being fast, so it can be used in scenes such as automatic driving and security cameras. However, the terminals in these application scenarios are limited in computing power, size and power consumption, and at the same time the SSD algorithm is required to execute even faster.
Summary of the invention
The technical task of the present invention is, in view of the above problems, to provide an SSD hardware acceleration method for intelligent terminals that can satisfy the edge-side requirements for computing power, real-time performance and power consumption, realizes dynamic updating of the FPGA without power-down, and continuously improves algorithm efficiency.
To achieve the above object, the present invention provides the following technical scheme:
An SSD hardware acceleration method for intelligent terminals adopts an ARM+FPGA heterogeneous architecture on the edge-side smart device terminal and performs hardware acceleration of computation for edge-side target detection application scenarios; the model training of the SSD algorithm is completed by a cloud data center, personalized algorithms are designed for different FPGAs, and the FPGA is dynamically loaded on the smart device terminal; for the SSD algorithm, a 1x1 convolution unit, a 3x3 convolution unit, an adder-tree unit and a ReLU unit are designed using dot-product operations and tree-shaped adders, the combination of multiple computing units is completed, the loading order of data and network parameters is selected according to set rules for computation, and the SSD algorithm is realized in cooperation with the ARM.
Aiming at the network structure of the SSD algorithm, the SSD hardware acceleration method for intelligent terminals makes effective use of the FPGA's low power consumption and strong real-time parallel processing capability, adopts the ARM+FPGA heterogeneous architecture, and performs hardware acceleration of computation for edge-side target detection application scenarios, thereby meeting the edge-side requirements for computing power, real-time performance and power consumption. It fully considers the resources of the edge-side device and the computing and on-chip storage resources of the FPGA, reasonably runs the different operations such as convolution, bias calculation and ReLU on the FPGA, designs 3x3 and 1x1 convolution units, and uses a convolution data loading order in which a forward scan is followed by a reverse scan, making full use of the on-chip cache and reducing the number of data exchanges between the DDR memory and the cache on the FPGA chip. Operations unsuitable for FPGA pipelining, including the PriorBox and pooling calculations in the SSD algorithm, are executed cooperatively by the ARM processor, which ensures the execution efficiency of the FPGA and reduces the complexity of the system. When the ARM executes the PriorBox-related calculations, the idle time of the FPGA is used to accelerate the convolution operations in the calculation, which improves the overall operating efficiency, realizes edge-side hardware acceleration of the SSD algorithm, increases the image target detection speed, achieves energy-efficient computing, and thereby improves the overall performance of the terminal. In addition, the cloud data center continuously optimizes the model, and by making effective use of the FPGA's dynamic loading capability, the FPGA can be updated dynamically without power-down, continuously improving algorithm efficiency.
Preferably, the smart device terminal adopts an ARM+FPGA heterogeneous architecture, has memory storage and external storage, provides image acquisition, and realizes edge-side real-time image target detection.
Preferably, the cloud data center collects a target detection training set, completes training with the SSD network, and customizes the obtained SSD network model according to the FPGA performance of different specifications.
Preferably, after the FPGA is customized, the loading and execution order of the data and network parameters is determined, and the personalized SSD network model is downloaded to the smart device terminal according to the FPGA hardware situation.
Preferably, the FPGA designs the convolution circuit and designs the 3x3 convolution and ReLU computing units using dot-product operations and tree-shaped adders.
Preferably, the ARM realizes the control of the FPGA, reads data and network parameters from memory into the FPGA through the DMA controller, and at the same time realizes operations in SSD such as max pooling (MaxPool), PriorBox, Permute, Normalize and Flatten.
Preferably, limited by the SSD algorithm and the specifications of the PE resources and cache resources on the FPGA chip, a channel-first computation mode is adopted: a group of 64-channel 3x3 convolution calculations is designed as one unit, containing 64 3x3 inner-product calculations; the values of the 64 convolutions are calculated separately, then an adder-tree unit is connected to obtain a single numerical result, completing one group of 64 3x3 convolutions.
Preferably, a group of 64-channel 1x1 convolution calculations is designed as one unit, containing 64 1x1 multiplications; the values of the 64 convolutions are calculated separately, then the adder-tree unit is connected to obtain a single numerical result.
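A minimal Python model of the two group units described above may help make the structure concrete: 64 parallel 3x3 inner products (or 64 scalar 1x1 multiplications) reduced by a shared, tree-shaped adder. The explicit pairwise reduction and the array shapes are illustrative assumptions; the patent specifies only the group size of 64 and the use of a tree-shaped adder.

```python
import numpy as np

def adder_tree(values):
    """Pairwise (tree-shaped) reduction of partial sums, mirroring the
    log2-depth adder tree shared by the 3x3 and 1x1 group units."""
    values = list(values)
    while len(values) > 1:
        if len(values) % 2:                # odd count: pad so pairs line up
            values.append(0.0)
        values = [values[i] + values[i + 1] for i in range(0, len(values), 2)]
    return values[0]

def conv3x3_group64(patch, kernels):
    """One 3x3 group unit: 64 channels, each a 3x3 inner product, then the adder tree.
    patch:   (64, 3, 3) input window, one 3x3 tile per channel
    kernels: (64, 3, 3) convolution kernel weights for the same 64 channels"""
    partial = [float(np.sum(patch[c] * kernels[c])) for c in range(64)]
    return adder_tree(partial)

def conv1x1_group64(pixels, weights):
    """One 1x1 group unit: 64 scalar multiplications, reusing the same adder tree."""
    partial = [float(pixels[c] * weights[c]) for c in range(64)]
    return adder_tree(partial)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    patch, kernels = rng.standard_normal((64, 3, 3)), rng.standard_normal((64, 3, 3))
    # The tree reduction must agree with a plain sum over all 64 partial products.
    assert np.isclose(conv3x3_group64(patch, kernels), np.sum(patch * kernels))
```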
Preferably, an FPGA convolution-layer computing unit is designed: the input data cache holds the 3x3 or 1x1 data nodes of all channels, and the parameter cache contains the 3x3 or 1x1 convolution kernel parameters of all channels of one filter and one bias parameter.
A ReLU computing node is designed, realizing the function ReLU(x) = max(0, x) in the circuit. One layer of convolution operation is completed by combining multiple convolution-layer computing units, and the output is stored in external memory.
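As a functional illustration of this convolution-layer computing unit, the following sketch (a simplified software model, not the FPGA circuit itself) combines the 64-channel group units over all input channels of one filter, adds the single bias value, and applies the ReLU node. The channel count in the usage example is an assumption; only the group-of-64 structure, the per-filter bias and ReLU(x) = max(0, x) come from the description above.

```python
import numpy as np

def relu(x):
    """ReLU node: ReLU(x) = max(0, x)."""
    return x if x > 0.0 else 0.0

def conv_layer_unit(patch_all, kernel_all, bias, group=64):
    """One output value of one filter at one spatial position.
    patch_all:  (C, 3, 3) input window over all C channels (input data cache)
    kernel_all: (C, 3, 3) kernel of one filter over all C channels (parameter cache)
    bias:       single bias value of the filter
    Channels are processed group by group (64 at a time), matching the group unit."""
    C = patch_all.shape[0]
    acc = 0.0
    for c0 in range(0, C, group):
        acc += float(np.sum(patch_all[c0:c0 + group] * kernel_all[c0:c0 + group]))
    return relu(acc + bias)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x, w = rng.standard_normal((128, 3, 3)), rng.standard_normal((128, 3, 3))
    print(conv_layer_unit(x, w, bias=0.1))   # one activation of one filter
```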
Compared with the prior art, the SSD hardware acceleration method for intelligent terminals of the present invention has the following outstanding beneficial effects:
(1) Aiming at the network structure of the SSD algorithm, it makes effective use of the FPGA's low power consumption and strong real-time parallel processing capability, adopts an ARM+FPGA heterogeneous architecture, and performs hardware acceleration of computation for edge-side target detection application scenarios, meeting the edge-side requirements for computing power, real-time performance and power consumption;
(2) It fully considers the resources of the edge-side device and the computing and on-chip storage resources of the FPGA, reasonably runs the different operations such as convolution, bias calculation and ReLU on the FPGA, designs 3x3 and 1x1 convolution units, and uses a convolution data loading order in which a forward scan is followed by a reverse scan, making full use of the on-chip cache and reducing the number of data exchanges between the DDR memory and the cache on the FPGA chip;
(3) Operations unsuitable for FPGA pipelining, including the PriorBox and pooling calculations in the SSD algorithm, are executed cooperatively by the ARM processor, which ensures the execution efficiency of the FPGA and reduces the complexity of the system;
(4) When the ARM executes the PriorBox-related calculations, the idle time of the FPGA is used to accelerate the convolution operations in the calculation, which improves the overall operating efficiency, realizes edge-side hardware acceleration of the SSD algorithm, increases the image target detection speed, achieves energy-efficient computing, and thereby improves the overall performance of the terminal;
(5) The cloud data center continuously optimizes the model, and by making effective use of the FPGA's dynamic loading capability, the FPGA can be updated dynamically without power-down, continuously improving algorithm efficiency, which has good application value.
Brief description of the drawings
Fig. 1 is a schematic diagram of the smart device terminal structure and nodes in the SSD hardware acceleration method for intelligent terminals of the present invention;
Fig. 2 is a schematic diagram of the SSD acceleration algorithm in the SSD hardware acceleration method for intelligent terminals of the present invention;
Fig. 3 is a flow chart of SSD hardware acceleration on the smart device terminal in the SSD hardware acceleration method for intelligent terminals of the present invention.
Specific embodiment
The SSD hardware acceleration method for intelligent terminals of the present invention is described in further detail below with reference to the drawings and embodiments.
Embodiment
As shown in Fig. 1 and Fig. 2, the SSD hardware acceleration method for intelligent terminals of the present invention adopts an ARM+FPGA heterogeneous architecture on the edge-side smart device terminal and performs hardware acceleration of computation for edge-side target detection application scenarios, meeting the edge-side requirements for computing power, real-time performance and power consumption. The model training and optimization of the SSD algorithm are completed by the cloud data center, personalized algorithms are designed for FPGAs of different specifications, and the algorithm is dynamically loaded onto the smart device terminal. For the SSD algorithm, basic computing units such as a 1x1 convolution unit, a 3x3 convolution unit, an adder-tree unit and a ReLU unit are designed using dot-product operations and tree-shaped adders, the on-chip cache size is fully considered, the combination of multiple computing units is completed, the loading order of data and parameters is selected according to set rules for computation, and the SSD algorithm is realized in cooperation with the ARM.
The cloud data center is responsible for collecting the target detection training set, completes training with the SSD network, customizes the obtained SSD network model according to the FPGA performance of different specifications, determines the loading and execution order of the data and network parameters, and downloads the personalized model to the smart device terminal according to the FPGA hardware situation. The smart device terminal adopts the ARM+FPGA heterogeneous architecture, has memory storage and external storage, provides an image acquisition function, and realizes edge-side real-time image target detection. The FPGA designs the convolution circuit, realizing the 3x3 dot-product operation, the adder-tree computing unit and the ReLU unit. The ARM realizes the control of the FPGA, reads data and network model parameters from memory into the FPGA through the DMA controller, and at the same time realizes operations in SSD such as max pooling (MaxPool), PriorBox, Permute, Normalize and Flatten. Limited by the characteristics of the SSD algorithm and the specifications of the PE resources and cache resources on the FPGA chip, a channel-first computation mode is adopted: a group of 64-channel 3x3 convolution calculations is designed as one unit, containing 64 3x3 inner-product calculations; the values of the 64 convolutions are calculated separately, then an adder-tree unit is connected to obtain a single numerical result, completing one group of 64 3x3 convolutions. A group of 64-channel 1x1 convolution calculations is designed as one unit, containing 64 1x1 multiplications; the values of the 64 convolutions are calculated separately, then an adder-tree unit (which can be multiplexed with the 3x3 convolution) is connected to obtain a single numerical result. An FPGA convolution-layer computing unit is designed: the input data cache holds the 3x3 or 1x1 data nodes of all channels, and the parameter cache contains the 3x3 or 1x1 convolution kernel parameters of all channels of one filter and one bias parameter. A ReLU computing node is designed, realizing the function ReLU(x) = max(0, x) in the circuit. One layer of convolution operation is completed by combining multiple convolution-layer computing units, and the output is stored in external memory.
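The division of work in this embodiment can be summarized by the following dispatch sketch: convolution and ReLU run on the FPGA, while pooling, Normalize, PriorBox, Permute and Flatten run on the ARM. Every function name and the placeholder arithmetic below are hypothetical and serve only to illustrate how the ARM-side control flow would hand each operation to one side or the other.

```python
import numpy as np

# Illustrative split: which operation types run on which side of the ARM+FPGA system.
FPGA_OPS = {"conv", "relu"}
ARM_OPS = {"maxpool", "normalize", "priorbox", "permute", "flatten"}

def fpga_run(op, tensor):
    """Stand-in for driving the FPGA conv/ReLU pipeline through the DMA controller."""
    return np.maximum(tensor, 0.0) if op == "relu" else tensor   # placeholder math

def arm_run(op, tensor):
    """Stand-in for the ARM-side kernels (pooling, PriorBox, Permute, ...)."""
    return tensor.reshape(-1) if op == "flatten" else tensor     # placeholder math

def run_network(layer_plan, image):
    """Walk the layer plan and dispatch each operation to the FPGA or the ARM."""
    tensor = image
    for op in layer_plan:
        tensor = fpga_run(op, tensor) if op in FPGA_OPS else arm_run(op, tensor)
    return tensor

if __name__ == "__main__":
    plan = ["conv", "relu", "conv", "relu", "maxpool", "conv", "normalize", "flatten"]
    out = run_network(plan, np.random.rand(3, 300, 300).astype(np.float32))
    print(out.shape)
```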
As shown in Fig. 3, the SSD hardware acceleration method for intelligent terminals performs SSD hardware acceleration on the smart device terminal, comprising:
S1. The cloud data center collects a target detection training set, completes training with the SSD network, and generates a model.
S2. The cloud data center customizes the model according to the FPGA performance of different specifications and downloads the personalized model to the smart device terminal.
S3. The smart device terminal obtains external images through an image acquisition device such as a camera and performs image resolution adaptation and conversion to meet the requirements of the SSD algorithm.
S4. The smart device terminal reads the image into the DDR memory, encoded with three RGB channels.
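Steps S3 and S4 amount to a small ARM-side preprocessing routine. The sketch below assumes a 300x300 target resolution (the common SSD300 input size, which the patent does not specify) and uses a contiguous NumPy array as a stand-in for the image area in DDR memory.

```python
import numpy as np

def prepare_frame(frame_rgb, target_hw=(300, 300)):
    """Adapt a captured RGB frame to the SSD input resolution (nearest-neighbour
    resize for simplicity) and lay it out as a contiguous three-channel buffer,
    as it would be written into DDR for the FPGA to read via DMA."""
    h, w, _ = frame_rgb.shape
    th, tw = target_hw
    rows = np.arange(th) * h // th
    cols = np.arange(tw) * w // tw
    resized = frame_rgb[rows][:, cols]                    # (th, tw, 3), RGB order
    return np.ascontiguousarray(resized, dtype=np.uint8)

if __name__ == "__main__":
    camera_frame = np.zeros((720, 1280, 3), dtype=np.uint8)   # placeholder capture
    ddr_image = prepare_frame(camera_frame)
    print(ddr_image.shape)                                     # (300, 300, 3)
```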
S5. According to the steps of the SSD algorithm, the FPGA loads from memory through DMA the model parameters of one 3x3 convolution kernel of the convolutional layer CONV1_1 (the first convolution layer); the kernel contains three channels, and CONV1_1 has 64 convolution kernels in total.
S6. The image data is read from memory through DMA in 3x3 three-channel units, from the upper left of the image to the lower right, and is computed with the 64-channel group convolution unit, of which only 3 channels are used.
S7. The results of the three channels are obtained by the 3x3 convolution operation, the adder tree is then used to complete the accumulation, and finally the bias value is added to obtain a single result.
S8. The result of S7 is passed through the ReLU unit, output to the on-chip cache, and finally output to the DDR memory.
S9. S5 to S8 are repeated, making full use of the PE units on the FPGA chip for the convolution operations: the convolution kernel is loaded first, then the image data is read and computed. When the next convolution kernel is loaded, the image data is loaded starting from the lower-right position of the image towards the upper left, so that the image data from the end of the previous round can be reused and the number of memory reads is reduced.
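The alternating scan order of S9 is the key to reusing on-chip data: the tiles cached at the end of one kernel's forward pass are exactly the first tiles needed by the next kernel's reverse pass. The sketch below only models this ordering and counts DDR reads; the tile granularity, the FIFO cache and the cache capacity are simplifying assumptions.

```python
def tile_order(n_tiles, forward):
    """Visit order of image tiles: top-left to bottom-right, or the reverse."""
    order = list(range(n_tiles))
    return order if forward else order[::-1]

def run_layer(n_kernels, n_tiles, cache_size=4):
    """Alternate the scan direction per kernel and count DDR reads saved by
    reusing the tiles still resident in the on-chip cache from the previous pass."""
    cache, ddr_reads = [], 0
    for k in range(n_kernels):
        for t in tile_order(n_tiles, forward=(k % 2 == 0)):
            if t not in cache:
                ddr_reads += 1
                cache.append(t)
                if len(cache) > cache_size:
                    cache.pop(0)                   # simple FIFO model of the on-chip cache
            # ... convolve tile t with kernel k on the PE array ...
    return ddr_reads

if __name__ == "__main__":
    # With alternating scans, the tiles cached at the end of kernel k are hit
    # first by kernel k+1, so fewer DDR reads than 64 kernels x 100 tiles.
    print(run_layer(n_kernels=64, n_tiles=100))
```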
S10. The convolutional layer CONV1_2 (the second convolution layer) is loaded, its 3x3 convolution kernels are loaded according to the same rule, the calculation is performed in units of one group of 64 channels, and the results are output to DDR storage through the ReLU unit.
S11. The ARM computes the 2x2 max pooling (MaxPool), completing the first network stage CONV1.
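The 2x2 max pooling that S11 assigns to the ARM can be expressed compactly in NumPy; stride 2 and an even feature-map size are assumed here, as is usual after CONV1 in the VGG backbone of SSD, though the patent does not state them.

```python
import numpy as np

def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 on a (C, H, W) feature map read back from DDR."""
    c, h, w = fmap.shape
    return fmap.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

if __name__ == "__main__":
    conv1_out = np.random.rand(64, 300, 300).astype(np.float32)
    print(max_pool_2x2(conv1_out).shape)   # (64, 150, 150)
```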
S12. CONV2 to CONV5 of the SSD network are computed in the order of S5 to S10, wherein the output of CONV4_3 is written to the DDR memory, the Normalize regularization is completed by the ARM, and the PriorBox-related calculations of CONV_MBOX are performed, including operations such as convolution, Permutation and Flatten.
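The Normalize step that S12 assigns to the ARM is, in the original SSD, a channel-wise L2 normalization of the CONV4_3 output with a learned per-channel scale. The patent only names the operation, so the standard form is assumed in the sketch below.

```python
import numpy as np

def l2_normalize(fmap, scale, eps=1e-10):
    """Channel-wise L2 normalization of a (C, H, W) feature map, as applied to
    the CONV4_3 output in SSD, followed by a learned per-channel scale."""
    norm = np.sqrt(np.sum(fmap * fmap, axis=0, keepdims=True)) + eps
    return fmap / norm * scale[:, None, None]

if __name__ == "__main__":
    fmap = np.random.rand(512, 38, 38).astype(np.float32)
    scale = np.full(512, 20.0, dtype=np.float32)   # SSD initializes this scale to 20
    print(l2_normalize(fmap, scale).shape)
```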
S13. The 13x13 convolution operation of the FC6 layer is divided into five 3x3 convolutions (covering 15x15) for calculation, where the units beyond 13x13 are padded with zeros.
S14. The FC7 layer uses the 64-channel 1x1 convolution operation, followed by the adder tree and the addition of the bias value to obtain a single result; like the 3x3 convolution in S9, the calculation is repeated, and the final output is stored in DDR as the input of the next layer CONV6, while the result is also processed by the ARM to complete the PriorBox-related operations.
S15. CONV6 to CONV10 all adopt a similar calculation process: the 1x1 convolution layers, 3x3 convolution layers and ReLU layers are completed by the FPGA, still scanning the feature map from the upper left to the lower right and back in the reverse direction on the next pass, and the results are output to the DDR memory.
S16. The NORMBOX and PriorBox related operations of CONV6 to CONV9 are handed over to the ARM; when the CONV10 calculation is completed, the convolution operations of the NORMBOX of CONV10 are handed over to the FPGA.
S17. If the NORMBOX-related operations after CONV10 have been fully completed, the convolution operations therein can continue to be handed over to the FPGA according to the operation status of CONV6 to CONV9.
S18. After all the preceding operations are completed, the ARM executes the final NORMBOX_priorbox calculations of the confidence (MBOX_CONF) and location (MBOX_LOC), and the final result is output.
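S18 leaves the final MBOX_CONF and MBOX_LOC computation to the ARM. The formulas below follow the standard SSD post-processing convention (softmax over class scores and decoding of location offsets against the prior boxes with variances); they are not spelled out in the patent and are included only as an assumed reference.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def decode_boxes(loc, priors, variances=(0.1, 0.2)):
    """Decode MBOX_LOC offsets against prior boxes given as (cx, cy, w, h).
    loc, priors: (N, 4) arrays; returns (N, 4) boxes as (xmin, ymin, xmax, ymax)."""
    cxcy = priors[:, :2] + loc[:, :2] * variances[0] * priors[:, 2:]
    wh = priors[:, 2:] * np.exp(loc[:, 2:] * variances[1])
    return np.concatenate([cxcy - wh / 2.0, cxcy + wh / 2.0], axis=1)

if __name__ == "__main__":
    n_priors, n_classes = 8732, 21                  # SSD300/VOC sizes, for illustration
    conf = softmax(np.random.randn(n_priors, n_classes))
    boxes = decode_boxes(np.random.randn(n_priors, 4) * 0.1,
                         np.abs(np.random.rand(n_priors, 4)) * 0.5 + 0.1)
    print(conf.shape, boxes.shape)
```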
S19. S13 to S18 are repeated to continuously perform target detection.
S20. The cloud data center continuously optimizes the model and dynamically loads the updated model onto the terminal side.
The embodiments described above are only preferred specific embodiments of the present invention, and ordinary variations and replacements made by those skilled in the art within the scope of the technical solution of the present invention shall all be included in the protection scope of the present invention.

Claims (9)

1. An SSD hardware acceleration method for intelligent terminals, characterized in that: an ARM+FPGA heterogeneous architecture is adopted on the edge-side smart device terminal, and hardware acceleration of computation is performed for edge-side target detection application scenarios; the model training of the SSD algorithm is completed by a cloud data center, personalized algorithms are designed for different FPGAs, and the FPGA is dynamically loaded on the smart device terminal; for the SSD algorithm, a 1x1 convolution unit, a 3x3 convolution unit, an adder-tree unit and a ReLU unit are designed using dot-product operations and tree-shaped adders, the combination of multiple computing units is completed, the loading order of data and network parameters is selected according to set rules for computation, and the SSD algorithm is realized in cooperation with the ARM.
2. The SSD hardware acceleration method for intelligent terminals according to claim 1, characterized in that: the smart device terminal adopts an ARM+FPGA heterogeneous architecture, has memory storage and external storage, provides image acquisition, and realizes edge-side real-time image target detection.
3. The SSD hardware acceleration method for intelligent terminals according to claim 1 or 2, characterized in that: the cloud data center collects a target detection training set, completes training with the SSD network, and customizes the obtained SSD network model according to the FPGA performance of different specifications.
4. The SSD hardware acceleration method for intelligent terminals according to claim 3, characterized in that: after the FPGA is customized, the loading and execution order of the data and network parameters is determined, and the personalized SSD network model is downloaded to the smart device terminal according to the FPGA hardware situation.
5. The SSD hardware acceleration method for intelligent terminals according to claim 4, characterized in that: the FPGA designs the convolution circuit and designs the 3x3 convolution and ReLU computing units using dot-product operations and tree-shaped adders.
6. The SSD hardware acceleration method for intelligent terminals according to claim 5, characterized in that: the ARM realizes the control of the FPGA, reads data and network parameters from memory into the FPGA through the DMA controller, and at the same time realizes the max pooling in SSD.
7. The SSD hardware acceleration method for intelligent terminals according to claim 6, characterized in that: limited by the SSD algorithm and the specifications of the PE resources and cache resources on the FPGA chip, a channel-first computation mode is adopted, and a group of 64-channel 3x3 convolution calculations is designed as one unit, containing 64 3x3 inner-product calculations; the values of the 64 convolutions are calculated separately, then an adder-tree unit is connected to obtain a single numerical result, completing one group of 64 3x3 convolutions.
8. The SSD hardware acceleration method for intelligent terminals according to claim 7, characterized in that: a group of 64-channel 1x1 convolution calculations is designed as one unit, containing 64 1x1 multiplications; the values of the 64 convolutions are calculated separately, then the adder-tree unit is connected to obtain a single numerical result.
9. The SSD hardware acceleration method for intelligent terminals according to claim 8, characterized in that: an FPGA convolution-layer computing unit is designed, the input data cache holds the 3x3 or 1x1 data nodes of all channels, and the parameter cache contains the 3x3 or 1x1 convolution kernel parameters of all channels of one filter and one bias parameter.
CN201910474860.1A 2019-06-03 2019-06-03 SSD hardware acceleration method for intelligent terminals Pending CN110209627A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910474860.1A CN110209627A (en) 2019-06-03 2019-06-03 SSD hardware acceleration method for intelligent terminals

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910474860.1A CN110209627A (en) 2019-06-03 2019-06-03 SSD hardware acceleration method for intelligent terminals

Publications (1)

Publication Number Publication Date
CN110209627A true CN110209627A (en) 2019-09-06

Family

ID=67790309

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910474860.1A Pending CN110209627A (en) 2019-06-03 2019-06-03 SSD hardware acceleration method for intelligent terminals

Country Status (1)

Country Link
CN (1) CN110209627A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112887093A (en) * 2021-03-30 2021-06-01 矩阵元技术(深圳)有限公司 Hardware acceleration system and method for implementing cryptographic algorithms
CN115309407A (en) * 2022-10-12 2022-11-08 中国移动通信有限公司研究院 Method and system capable of realizing calculation power abstraction
CN115550607A (en) * 2020-09-27 2022-12-30 北京天玛智控科技股份有限公司 Model reasoning accelerator realized based on FPGA and intelligent visual perception terminal
US11687279B2 (en) 2020-01-27 2023-06-27 Samsung Electronics Co., Ltd. Latency and throughput centric reconfigurable storage device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106228240A (en) * 2016-07-30 2016-12-14 复旦大学 Degree of depth convolutional neural networks implementation method based on FPGA
CN108256636A (en) * 2018-03-16 2018-07-06 成都理工大学 A kind of convolutional neural networks algorithm design implementation method based on Heterogeneous Computing
CN108280514A (en) * 2018-01-05 2018-07-13 中国科学技术大学 Sparse neural network acceleration system based on FPGA and design method
CN108764466A (en) * 2018-03-07 2018-11-06 东南大学 Convolutional neural networks hardware based on field programmable gate array and its accelerated method
CN108932548A (en) * 2018-05-22 2018-12-04 中国科学技术大学苏州研究院 A kind of degree of rarefication neural network acceleration system based on FPGA

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20190906