CN112596718B - Hardware code generation and performance evaluation method - Google Patents

Hardware code generation and performance evaluation method

Info

Publication number
CN112596718B
CN112596718B (application CN202011557924.3A)
Authority
CN
China
Prior art keywords
neural network
deep neural
network algorithm
intelligent computing
algorithm
Prior art date
Legal status
Active
Application number
CN202011557924.3A
Other languages
Chinese (zh)
Other versions
CN112596718A (en)
Inventor
刘飞阳
郭鹏
文鹏程
白林亭
李奕璇
李亚晖
Current Assignee
Xian Aeronautics Computing Technique Research Institute of AVIC
Original Assignee
Xian Aeronautics Computing Technique Research Institute of AVIC
Priority date
Filing date
Publication date
Application filed by Xian Aeronautics Computing Technique Research Institute of AVIC
Priority to CN202011557924.3A
Publication of CN112596718A
Application granted
Publication of CN112596718B
Legal status: Active
Anticipated expiration


Classifications

    • G – PHYSICS
    • G06 – COMPUTING; CALCULATING OR COUNTING
    • G06F – ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 – Arrangements for software engineering
    • G06F8/30 – Creation or generation of source code
    • G06F8/31 – Programming languages or programming paradigms
    • G – PHYSICS
    • G06 – COMPUTING; CALCULATING OR COUNTING
    • G06F – ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 – Arrangements for software engineering
    • G06F8/40 – Transformation of program code
    • G06F8/41 – Compilation
    • G06F8/42 – Syntactic analysis
    • G06F8/427 – Parsing
    • G – PHYSICS
    • G06 – COMPUTING; CALCULATING OR COUNTING
    • G06F – ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 – Arrangements for software engineering
    • G06F8/70 – Software maintenance or management
    • G06F8/71 – Version control; Configuration management
    • G – PHYSICS
    • G06 – COMPUTING; CALCULATING OR COUNTING
    • G06N – COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 – Computing arrangements based on biological models
    • G06N3/004 – Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 – Artificial life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G – PHYSICS
    • G06 – COMPUTING; CALCULATING OR COUNTING
    • G06N – COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 – Computing arrangements based on biological models
    • G06N3/12 – Computing arrangements based on biological models using genetic models
    • G06N3/126 – Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • Y – GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 – TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D – CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 – Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application provides a hardware code generation and performance evaluation method comprising the following steps: parsing the structure, data types, operation types and processing flow of a deep neural network algorithm model to generate a model analysis report; according to the analysis report, selecting from an intelligent computing unit IP library the intelligent computing unit IPs corresponding to the operation types, instantiating them, and selecting an optimal interconnect structure; mapping every operation in the analysis report one-to-one onto the intelligent computing unit IP instances; generating a hardware configuration scheme; generating synthesizable hardware code for the deep neural network algorithm from the hardware configuration scheme and, through hardware synthesis software, producing a binary file that can be loaded onto the embedded intelligent computing module; and performing functional simulation and performance analysis with the deep neural network algorithm model as a test benchmark to obtain a performance analysis report and a timing diagram.

Description

Hardware code generation and performance evaluation method
Technical Field
The invention belongs to the field of embedded computing, and relates to a hardware code generation and performance evaluation method.
Background
With the growing demand for intelligent computing in complex embedded computing systems such as airborne processing systems, artificial intelligence algorithms represented by deep neural networks have come into widespread use: for example, convolutional neural network models such as AlexNet, VGG and MobileNet for intelligent image target classification, convolutional neural network models such as SSD, YOLO and Fast R-CNN for intelligent target detection, and recurrent neural network models such as LSTM and GRU. However, a complex embedded computing system faces diverse intelligent task scenarios, such as image recognition, speech recognition, target localization, target confirmation and decision control; it must be able to run a variety of deep neural network algorithms on a unified embedded intelligent computing module and to quickly load and run a high-level algorithm model on the intelligent computing hardware. At present, dedicated intelligent computing chips available at home and abroad, such as HiSilicon's Hi3559A and the Ascend 310, support only image applications such as target recognition, while general-purpose intelligent computing chips such as the Cambricon MLU series are difficult to optimize for different intelligent algorithms and still leave room for improvement in performance and power consumption. In addition, existing intelligent computing chips commonly suffer from immature instruction sets, inconvenient development tool software and difficulty in carrying out accurate performance evaluation.
The embedded intelligent computing module is an embedded computing unit containing FPGA programmable hardware resources. By converting an artificial intelligence algorithm into hardware logic and configuring that logic into the FPGA, it can support a variety of intelligent algorithms, which is of great significance for the systematization, miniaturization and on-demand configurability of embedded intelligent computing systems. At present, however, neither the rapid conversion of a high-level artificial intelligence algorithm model into hardware code nor effective performance evaluation of the algorithm on the embedded intelligent computing module can be achieved.
Disclosure of Invention
To solve the problems noted in the background, the invention provides a hardware code generation and performance evaluation method that addresses the rapid hardware loading and performance evaluation of deep neural network algorithms, accelerates the design and deployment of artificial intelligence algorithms in complex embedded computing systems, and improves processing capacity and resource utilization.
The application provides a hardware code generation and performance evaluation method applied to an embedded computing system, wherein the embedded computing system comprises at least one configurable embedded intelligent computing module; the embedded intelligent computing module contains an FPGA (field-programmable gate array), and the FPGA can be configured with at least two deep neural network algorithm types; the method comprises the following steps:
analyzing the structure, the data type, the operation type and the processing flow of the deep neural network algorithm model to generate a deep neural network algorithm model analysis report;
according to the deep neural network algorithm model analysis report, selecting an intelligent computing unit IP corresponding to the operation type from an intelligent computing unit IP library, instantiating and selecting an optimal interconnection network structure;
mapping all operations in the deep neural network algorithm model analysis report one-to-one onto the intelligent computing unit IP instances, and generating a hardware configuration scheme of the deep neural network algorithm on the embedded intelligent computing module;
generating synthesizable hardware code for the deep neural network algorithm according to the hardware configuration scheme, and producing, through hardware synthesis software, a binary file that can be loaded onto the embedded intelligent computing module;
and performing functional simulation and performance analysis with the deep neural network algorithm model as a test benchmark, to obtain a performance analysis report and a timing diagram.
Preferably, the deep neural network algorithm model analysis report includes: the deep neural network algorithm type; the number of network layers; the feature map data format, feature map data bit width, parameter data format and parameter data bit width of each layer; the operation type of each layer; and the data flow between different layers of the algorithm.
Preferably, the intelligent computing unit IP library is a set of hardware-synthesizable processing unit IPs oriented to deep neural network operation types, including vector multiply-accumulate, pooling, activation functions and classification; the data format and data bit width of each IP unit are configurable;
the interconnection network structure is a network structure selected according to data flow directions among different intelligent computing unit IP instances and comprises point-to-point direct connection, a shared bus, a crossbar switch and a switching network.
Preferably, mapping all operations in the analysis report of the deep neural network algorithm model to the instances of the intelligent computing units IP in a one-to-one correspondence manner specifically includes:
a multi-objective optimization task-resource mapping algorithm is adopted, the algorithm comprises a genetic algorithm, a particle swarm algorithm, a simulated annealing algorithm and an ant colony algorithm, and the optimization objective comprises the minimum processing delay and the minimum hardware resource consumption of an embedded intelligent computing module.
Preferably, the deep neural network algorithm model test benchmarks include the AlexNet, VGG and MobileNet convolutional neural network models for intelligent image target classification; the SSD, YOLO and Fast R-CNN convolutional neural network models for intelligent target detection; and the LSTM and GRU recurrent neural network models.
Preferably, the functional simulation performed with the deep neural network algorithm model as a test benchmark specifically includes checking: whether the hardware configuration of the deep neural network algorithm model can be loaded normally; whether the amount of FPGA logic resources in the embedded intelligent computing module meets the configuration requirements of the model; whether the test data set can be input normally; whether the deep neural network algorithm functions, including target recognition, speech recognition and task decision, operate correctly; and whether the algorithm's results can be output and displayed normally.
preferably, the performance analysis is performed by using the deep neural network algorithm model as a test reference, and specifically includes: the operation speed and the average accuracy of the deep neural network algorithm model on the embedded intelligent computing module, the number of occupied FPGA resources and the power consumption.
Preferably, the types of deep neural network intelligence algorithms include convolutional neural networks CNN and recurrent neural networks RNN.
The advantages of the invention are as follows: the hardware code generation and performance evaluation method comprises deep neural network algorithm model analysis, intelligent computing unit IP instantiation and interconnect selection, algorithm-to-IP mapping and network configuration optimization, synthesizable hardware code generation, and functional simulation and performance analysis. It supports the whole flow of model analysis, hardware configuration, hardware code generation, functional simulation and performance analysis, realizes rapid hardware loading and performance evaluation of deep neural network algorithms, and improves the processing capacity and resource utilization of an embedded intelligent computing system.
Drawings
Fig. 1 is a schematic flowchart of a hardware code generation and performance evaluation method according to an embodiment of the present invention.
Detailed Description
Example one
The embodiment of the invention provides a hardware code generation and performance evaluation method applied to an embedded computing system comprising at least one configurable embedded intelligent computing module; the configurable embedded intelligent computing module contains FPGA programmable hardware resources and, by configuring and running convolutional neural network (CNN) and recurrent neural network (RNN) deep neural network algorithms, can support at least two configurable intelligent algorithm types;
the hardware code generation and performance evaluation method comprises the following steps:
(1) Analyzing an algorithm model of the deep neural network;
(2) An intelligent computing unit IP instantiation and interconnection network structure selection;
(3) Algorithm-intelligent computing unit IP mapping and network configuration optimization;
(4) Synthesizable hardware code generation;
(5) Functional simulation and performance analysis;
the output of the hardware code generation and performance evaluation method comprises: the system comprises a binary file, a performance analysis report and a timing chart which can be loaded into an embedded intelligent computing module FPGA.
Further, the deep neural network algorithm model analysis is to analyze the structure, data type, operation type and processing flow of the deep neural network algorithm model applied to artificial intelligence to generate a deep neural network algorithm model analysis report;
the deep neural network comprises: the convolutional neural network is used for intelligent video/image target recognition, the cyclic neural network is used for intelligent voice recognition and intelligent decision control, and the deep neural network is self-defined by other application scenes;
the deep neural network algorithm model comprises: algorithm models based on Caffe, tensorFlow and PyTorch development frameworks, and algorithm models written by C, C + +, python, matlab and Java high-level programming languages;
the intelligent algorithm model analysis report comprises: the algorithm type of the deep neural network; the number of network layers; the data format, bit width, parameter data format and bit width of the feature map data of each layer; the operation type of each layer; the data flow between different layers of the algorithm.
Furthermore, intelligent computing unit IP instantiation and interconnect selection means selecting, from the intelligent computing unit IP library and according to the deep neural network algorithm model analysis report, the IPs of the corresponding operation types, instantiating them according to the feature map and parameter data formats and bit widths, and then selecting the optimal interconnect structure according to the data flow between different layers of the algorithm;
the intelligent computing unit IP library is a set of processing unit IPs which are oriented to the deep neural network arithmetic operation type and can be synthesized by hardware, and comprises vector multiplication and addition, pooling, activation functions and classification; the data format and the data bit width of each intelligent computing unit IP are configurable;
the interconnection structure is a network structure selected according to data flow directions among different intelligent computing unit IP instances and comprises point-to-point direct connection, a shared bus, a crossbar switch and a switching network.
Further, algorithm-to-intelligent-computing-unit IP mapping and network configuration optimization means mapping every operation in the deep neural network algorithm model analysis report one-to-one onto the intelligent computing unit IP instances, then configuring and optimizing the bandwidth of inter-node links in the interconnect structure according to the data traffic between different layers of the model, generating the hardware configuration scheme of the deep neural network algorithm on the embedded intelligent computing module;
the algorithm-intelligent computing unit IP mapping adopts a multi-objective optimized task-resource mapping algorithm, including genetic algorithm, particle swarm algorithm, simulated annealing algorithm and ant colony algorithm, and the optimized objective includes minimum processing delay and minimum hardware resource consumption of an embedded intelligent computing module;
Network configuration optimization adjusts the data bandwidth of the intelligent computing unit IP links according to the data traffic between operations in different layers of the analysis report, configuring each link's bandwidth in proportion to its traffic so that the data transfer delay between different intelligent computing unit IPs remains consistent.
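The proportional bandwidth rule can be expressed directly: allocating bandwidth in proportion to traffic makes the per-link transfer delay (traffic divided by bandwidth) identical across links. A minimal sketch, with made-up link names and figures:

```python
def allocate_bandwidth(traffic: dict, total_bw: float) -> dict:
    """Split a total bandwidth budget across IP-to-IP links in proportion
    to their traffic, so every link has the same transfer delay
    (delay = traffic / bandwidth)."""
    total = sum(traffic.values())
    return {link: total_bw * t / total for link, t in traffic.items()}

# Illustrative figures: 300 units on conv1->pool1, 100 on pool1->fc1,
# 8 units of total bandwidth to distribute.
bw = allocate_bandwidth({("conv1", "pool1"): 300.0, ("pool1", "fc1"): 100.0}, 8.0)
```

Here the busier link receives 6 of the 8 bandwidth units and the lighter one 2, so both links see the same transfer delay.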
Furthermore, synthesizable hardware code generation means generating the synthesizable hardware code of the deep neural network algorithm according to its hardware configuration scheme, and producing, through hardware synthesis software, a binary file that can be loaded onto the embedded intelligent computing module;
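As a toy illustration of this code-generation step, the snippet below emits a skeletal Verilog top module that instantiates one IP per mapped operation. The module and port names are placeholders of our own; real generated code would also wire up the selected interconnect and data paths:

```python
def emit_top(mapping: list, ip_names: list) -> str:
    """Emit a minimal Verilog top module that instantiates one IP per
    mapped operation (module/port names are illustrative placeholders)."""
    lines = ["module dnn_top(input clk, input rst);"]
    for op, ip in enumerate(mapping):
        # mapping[op] is the IP-instance index chosen for operation `op`.
        lines.append(f"  {ip_names[ip]} u_op{op}(.clk(clk), .rst(rst));")
    lines.append("endmodule")
    return "\n".join(lines)

print(emit_top([0, 1], ["vec_mac_ip", "pool_ip"]))
```

The emitted text would then be handed to the FPGA vendor's synthesis software to produce the loadable binary.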
further, the functional simulation and performance analysis is simulation analysis carried out by taking a deep neural network algorithm model as a test reference;
the deep neural network algorithm model test benchmark comprises AlexNet, VGG and MobileNet convolutional neural network algorithm models for intelligent image target classification, SSD, YOLO and Fast R-CNN convolutional neural network algorithm models for intelligent target detection, LSTM and GRU cyclic neural network models and other self-defined deep neural network algorithm models;
the functional simulation is functional verification of the deep neural network algorithm model running on the embedded intelligent computing module, and comprises the following steps: whether the hardware configuration of the deep neural network algorithm model can be loaded normally or not; whether the logic resource quantity of the FPGA in the embedded intelligent computing module meets the requirement of deep neural network algorithm model configuration or not; testing whether the data set can be input normally; whether the deep neural network algorithm functions are normal or not, such as target recognition, voice recognition and task decision functions; whether the algorithm operation result can be normally output and displayed or not;
the performance analysis is to analyze the performance and the time sequence of the deep neural network algorithm model running on the embedded intelligent computing module, and comprises the following steps: the running speed and the average accuracy of the deep neural network algorithm model on the embedded intelligent computing module, the number of occupied FPGA resources and the power consumption.
Example two
The present invention will be described in detail below with reference to the accompanying drawings.
Referring to fig. 1, a schematic flow diagram of a hardware code generation and performance evaluation method provided in an embodiment of the present invention is shown, where the method mainly includes:
s101, deep neural network algorithm model analysis: parse the structure, data types, operation types and processing flow of the deep neural network algorithm model, and generate the model analysis report;
s102, intelligent computing unit IP instantiation and interconnect selection: first, according to the analysis report, select the intelligent computing unit IPs of the corresponding operation types from the IP library and instantiate them according to the feature map and parameter data formats and bit widths; then select the optimal interconnect structure according to the data flow between different layers of the algorithm;
s103, algorithm-to-IP mapping and network configuration optimization: map every operation in the analysis report one-to-one onto the intelligent computing unit IP instances, then configure and optimize the bandwidth of inter-node links in the interconnect structure according to the data traffic between different layers of the model, and generate the hardware configuration scheme of the deep neural network algorithm on the embedded intelligent computing module;
s104, synthesizable hardware code generation: generate the synthesizable hardware code of the deep neural network algorithm according to its hardware configuration scheme, and produce, through hardware synthesis software, a binary file that can be loaded onto the embedded intelligent computing module;
s105, functional simulation and performance analysis on the embedded intelligent computing module: first verify whether the hardware configuration of the model can be loaded normally, whether the amount of FPGA logic resources in the module meets the model's configuration requirements, whether the test data set can be input normally, whether the algorithm functions operate correctly, and whether the results can be output and displayed normally; then analyse the model's running speed, average accuracy, occupied FPGA resources and power consumption on the module.
The invention provides a hardware code generation and performance evaluation method belonging to the field of embedded computing. The method comprises: deep neural network algorithm model analysis, intelligent computing unit IP instantiation and interconnect selection, algorithm-to-IP mapping and network configuration optimization, synthesizable hardware code generation, and functional simulation and performance analysis. The supported inputs include convolutional neural network (CNN) and recurrent neural network (RNN) deep neural network algorithm models; the method can generate a binary file by which the algorithm is loaded into the FPGA of the embedded intelligent computing module, together with a performance analysis report and a timing diagram for its execution on the module. By realizing hardware code generation and performance evaluation for multiple deep neural network algorithms on the embedded intelligent computing module, the method accelerates the design and deployment of artificial intelligence algorithms in complex embedded computing systems such as airborne computers, and improves the processing capacity and resource utilization of the module.
The above description covers only specific embodiments of the present disclosure, but the protection scope of the present disclosure is not limited thereto; any change or substitution that can readily be conceived by a person skilled in the art within the technical scope of the present disclosure shall fall within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the scope of the claims.

Claims (6)

1. A hardware code generation and performance evaluation method, applied to an embedded computing system, wherein the embedded computing system comprises at least one configurable embedded intelligent computing module, the embedded intelligent computing module contains an FPGA, and the FPGA can be configured with at least two deep neural network algorithm types; the method comprises the following steps:
analyzing the structure, the data type, the operation type and the processing flow of the deep neural network algorithm model to generate a deep neural network algorithm model analysis report;
according to the deep neural network algorithm model analysis report, selecting an intelligent computing unit IP corresponding to the operation type from an intelligent computing unit IP library, instantiating and selecting an optimal interconnection network structure;
mapping all operations in the deep neural network algorithm model analysis report one-to-one onto the intelligent computing unit IP instances, and generating a hardware configuration scheme of the deep neural network algorithm in the embedded intelligent computing module;
generating synthesizable hardware code for the deep neural network algorithm according to the hardware configuration scheme, and producing, through synthesis software, a binary file that can be loaded onto the embedded intelligent computing module;
taking the deep neural network algorithm model as a test reference, performing functional simulation and performance analysis to obtain a performance analysis report and a timing chart;
the deep neural network algorithm model analysis report comprises: the deep neural network algorithm type; the number of network layers; the feature map data format and bit width and the parameter data format and bit width of each layer; the operation type of each layer; and the data flow between different layers of the algorithm;
the intelligent computing unit IP library is a set of hardware-synthesizable processing unit IPs oriented to deep neural network operation types, including vector multiply-accumulate, pooling, activation functions and classification; the data format and data bit width of each IP unit are configurable;
the interconnect structure is a network structure selected according to the data flow between different intelligent computing unit IP instances, and comprises point-to-point direct connection, a shared bus, a crossbar switch and a switching network.
2. The method of claim 1, wherein mapping all operations in the deep neural network algorithm model analysis report to the instances of the intelligent computing units IP in a one-to-one correspondence comprises:
a multi-objective optimization task-resource mapping algorithm is adopted, the algorithm comprises a genetic algorithm, a particle swarm algorithm, a simulated annealing algorithm and an ant colony algorithm, and the optimization objective comprises the minimum processing delay and the minimum hardware resource consumption of an embedded intelligent computing module.
3. The method of claim 1, wherein the deep neural network algorithm model test benchmarks comprise the AlexNet, VGG and MobileNet convolutional neural network models for intelligent image target classification; the SSD, YOLO and Fast R-CNN convolutional neural network models for intelligent target detection; and the LSTM and GRU recurrent neural network models.
4. The method of claim 1, wherein the functional simulation performed with the deep neural network algorithm model as a test benchmark specifically comprises checking: whether the hardware configuration of the deep neural network algorithm model can be loaded normally; whether the amount of FPGA logic resources in the embedded intelligent computing module meets the configuration requirements of the model; whether the test data set can be input normally; whether the deep neural network algorithm functions, including target recognition, speech recognition and task decision, operate correctly; and whether the algorithm's results can be output and displayed normally.
5. The method according to claim 1, wherein the performance analysis is performed with the deep neural network algorithm model as a test reference, and specifically comprises: the operation speed and the average accuracy of the deep neural network algorithm model on the embedded intelligent computing module, the number of occupied FPGA resources and the power consumption.
6. The method of claim 1, wherein the types of deep neural network intelligence algorithms include Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN).
CN202011557924.3A 2020-12-24 2020-12-24 Hardware code generation and performance evaluation method Active CN112596718B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011557924.3A CN112596718B (en) 2020-12-24 2020-12-24 Hardware code generation and performance evaluation method

Publications (2)

Publication Number Publication Date
CN112596718A CN112596718A (en) 2021-04-02
CN112596718B true CN112596718B (en) 2023-04-14

Family

ID=75202145


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110221839A (en) * 2019-05-30 2019-09-10 华南理工大学 A kind of hardware accelerator VerilogHDL code automatic generation method
CN111104124A (en) * 2019-11-07 2020-05-05 北京航空航天大学 Pythrch framework-based rapid deployment method of convolutional neural network on FPGA

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11270205B2 (en) * 2018-02-28 2022-03-08 Sophos Limited Methods and apparatus for identifying the shared importance of multiple nodes within a machine learning model for multiple tasks
US11556762B2 (en) * 2018-04-21 2023-01-17 Microsoft Technology Licensing, Llc Neural network processor based on application specific synthesis specialization parameters
CN110058883B (en) * 2019-03-14 2023-06-16 梁磊 CNN acceleration method and system based on OPU
US20190391796A1 (en) * 2019-06-28 2019-12-26 Intel Corporation Control of scheduling dependencies by a neural network compiler
CN111191772B (en) * 2020-01-02 2022-12-06 中国航空工业集团公司西安航空计算技术研究所 Intelligent computing general acceleration system facing embedded environment and construction method thereof




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant