CN112596718B - Hardware code generation and performance evaluation method - Google Patents

Hardware code generation and performance evaluation method

Info

Publication number
CN112596718B
CN112596718B (application CN202011557924.3A)
Authority
CN
China
Prior art keywords
neural network
deep neural
network algorithm
intelligent computing
algorithm
Prior art date
Legal status
Active
Application number
CN202011557924.3A
Other languages
Chinese (zh)
Other versions
CN112596718A (en)
Inventor
刘飞阳
郭鹏
文鹏程
白林亭
李奕璇
李亚晖
Current Assignee
Xian Aeronautics Computing Technique Research Institute of AVIC
Original Assignee
Xian Aeronautics Computing Technique Research Institute of AVIC
Priority date
Filing date
Publication date
Application filed by Xian Aeronautics Computing Technique Research Institute of AVIC
Priority to CN202011557924.3A
Publication of CN112596718A
Application granted
Publication of CN112596718B
Legal status: Active
Anticipated expiration


Classifications

    • G – PHYSICS
    • G06 – COMPUTING; CALCULATING OR COUNTING
    • G06F – ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 – Arrangements for software engineering
    • G06F8/30 – Creation or generation of source code
    • G06F8/31 – Programming languages or programming paradigms
    • G – PHYSICS
    • G06 – COMPUTING; CALCULATING OR COUNTING
    • G06F – ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 – Arrangements for software engineering
    • G06F8/40 – Transformation of program code
    • G06F8/41 – Compilation
    • G06F8/42 – Syntactic analysis
    • G06F8/427 – Parsing
    • G – PHYSICS
    • G06 – COMPUTING; CALCULATING OR COUNTING
    • G06F – ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 – Arrangements for software engineering
    • G06F8/70 – Software maintenance or management
    • G06F8/71 – Version control; Configuration management
    • G – PHYSICS
    • G06 – COMPUTING; CALCULATING OR COUNTING
    • G06N – COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 – Computing arrangements based on biological models
    • G06N3/004 – Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 – Artificial life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G – PHYSICS
    • G06 – COMPUTING; CALCULATING OR COUNTING
    • G06N – COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 – Computing arrangements based on biological models
    • G06N3/12 – Computing arrangements based on biological models using genetic models
    • G06N3/126 – Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • Y – GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 – TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D – CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 – Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application provides a hardware code generation and performance evaluation method comprising the following steps: parsing the structure, data types, operation types and processing flow of a deep neural network algorithm model to generate a model analysis report; according to the analysis report, selecting from an intelligent computing unit IP library the intelligent computing unit IPs corresponding to the operation types, instantiating them, and selecting an optimal interconnect structure; mapping every operation in the analysis report one-to-one onto the intelligent computing unit IP instances; generating a hardware configuration scheme; generating synthesizable hardware code for the deep neural network algorithm from the hardware configuration scheme and, through hardware synthesis software, producing a binary file that can be loaded onto the embedded intelligent computing module; and performing functional simulation and performance analysis with the deep neural network algorithm model as a test benchmark to obtain a performance analysis report and a timing diagram.

Description

Hardware code generation and performance evaluation method
Technical Field
The invention belongs to the field of embedded computing, and relates to a hardware code generation and performance evaluation method.
Background
With the growing demand for intelligent computing in complex embedded computing systems such as airborne processing systems, artificial intelligence algorithms represented by deep neural networks have come into widespread use: for example, convolutional neural network models such as AlexNet, VGG and MobileNet for intelligent image target classification, convolutional neural network models such as SSD, YOLO and Fast R-CNN for intelligent target detection, and recurrent neural network models such as LSTM and GRU. However, a complex embedded computing system faces diverse intelligent task scenarios, such as image recognition, speech recognition, target localization, target confirmation and decision control; it must be able to run a variety of deep neural network algorithms on a unified embedded intelligent computing module and to quickly load and run a high-level algorithm model on the intelligent computing hardware. At present, dedicated intelligent computing chips available at home and abroad, such as HiSilicon's Hi3559A and the Ascend 310, support only image applications such as target recognition, while general-purpose intelligent computing chips such as the Cambricon MLU series are difficult to optimize for different intelligent algorithms and still leave room for improvement in performance and power consumption. In addition, existing intelligent computing chips commonly suffer from immature instruction sets, inconvenient development tool software and difficulty in carrying out accurate performance evaluation.
The embedded intelligent computing module is an embedded computing unit containing FPGA programmable hardware resources. By converting an artificial intelligence algorithm into hardware logic and configuring that logic into the FPGA, it can support a variety of intelligent algorithms, which is of great significance for the systematization, miniaturization and on-demand configurability of embedded intelligent computing systems. At present, however, neither the rapid conversion of a high-level artificial intelligence algorithm model into hardware code nor effective performance evaluation of the algorithm on the embedded intelligent computing module can be achieved.
Disclosure of Invention
To solve the problems noted in the background, the invention provides a hardware code generation and performance evaluation method that addresses the rapid hardware loading and performance evaluation of deep neural network algorithms, accelerates the design and deployment of artificial intelligence algorithms in complex embedded computing systems, and improves processing capacity and resource utilization.
The application provides a hardware code generation and performance evaluation method applied to an embedded computing system, wherein the embedded computing system comprises at least one configurable embedded intelligent computing module; the embedded intelligent computing module contains an FPGA (field-programmable gate array), and the FPGA can be configured with at least two deep neural network algorithm types; the method comprises the following steps:
analyzing the structure, the data type, the operation type and the processing flow of the deep neural network algorithm model to generate a deep neural network algorithm model analysis report;
according to the deep neural network algorithm model analysis report, selecting an intelligent computing unit IP corresponding to the operation type from an intelligent computing unit IP library, instantiating and selecting an optimal interconnection network structure;
mapping all operations in the deep neural network algorithm model analysis report one-to-one onto the intelligent computing unit IP instances, and generating a hardware configuration scheme of the deep neural network algorithm on the embedded intelligent computing module;
generating synthesizable hardware code for the deep neural network algorithm according to the hardware configuration scheme, and producing, through hardware synthesis software, a binary file that can be loaded onto the embedded intelligent computing module;
and performing functional simulation and performance analysis with the deep neural network algorithm model as a test benchmark, to obtain a performance analysis report and a timing diagram.
Preferably, the deep neural network algorithm model analysis report includes: the deep neural network algorithm type; the number of network layers; the feature map data format, feature map data bit width, parameter data format and parameter data bit width of each layer; the operation type of each layer; and the data flow between different layers of the algorithm.
Preferably, the intelligent computing unit IP library is a set of hardware-synthesizable processing unit IPs oriented to deep neural network operation types, including vector multiply-accumulate, pooling, activation functions and classification; the data format and data bit width of each IP unit are configurable;
the interconnection network structure is a network structure selected according to data flow directions among different intelligent computing unit IP instances and comprises point-to-point direct connection, a shared bus, a crossbar switch and a switching network.
Preferably, mapping all operations in the analysis report of the deep neural network algorithm model to the instances of the intelligent computing units IP in a one-to-one correspondence manner specifically includes:
a multi-objective optimization task-resource mapping algorithm is adopted, the algorithm comprises a genetic algorithm, a particle swarm algorithm, a simulated annealing algorithm and an ant colony algorithm, and the optimization objective comprises the minimum processing delay and the minimum hardware resource consumption of an embedded intelligent computing module.
Preferably, the deep neural network algorithm model test benchmarks include the AlexNet, VGG and MobileNet convolutional neural network models for intelligent image target classification; the SSD, YOLO and Fast R-CNN convolutional neural network models for intelligent target detection; and the LSTM and GRU recurrent neural network models.
Preferably, the functional simulation performed with the deep neural network algorithm model as a test benchmark specifically includes checking: whether the hardware configuration of the deep neural network algorithm model can be loaded normally; whether the amount of FPGA logic resources in the embedded intelligent computing module meets the configuration requirements of the model; whether the test data set can be input normally; whether the deep neural network algorithm functions, including target recognition, speech recognition and task decision, operate correctly; and whether the algorithm's results can be output and displayed normally.
preferably, the performance analysis is performed by using the deep neural network algorithm model as a test reference, and specifically includes: the operation speed and the average accuracy of the deep neural network algorithm model on the embedded intelligent computing module, the number of occupied FPGA resources and the power consumption.
Preferably, the types of deep neural network intelligence algorithms include convolutional neural networks CNN and recurrent neural networks RNN.
The advantages of the invention are as follows: the hardware code generation and performance evaluation method comprises deep neural network algorithm model analysis, intelligent computing unit IP instantiation and interconnect selection, algorithm-to-IP mapping and network configuration optimization, synthesizable hardware code generation, and functional simulation and performance analysis. It supports the whole flow of model analysis, hardware configuration, hardware code generation, functional simulation and performance analysis, realizes rapid hardware loading and performance evaluation of deep neural network algorithms, and improves the processing capacity and resource utilization of an embedded intelligent computing system.
Drawings
Fig. 1 is a schematic flowchart of a hardware code generation and performance evaluation method according to an embodiment of the present invention.
Detailed Description
Example one
The embodiment of the invention provides a hardware code generation and performance evaluation method applied to an embedded computing system comprising at least one configurable embedded intelligent computing module; the configurable embedded intelligent computing module contains FPGA programmable hardware resources and, by configuring and running convolutional neural network (CNN) and recurrent neural network (RNN) deep neural network algorithms, can support at least two configurable intelligent algorithm types;
the hardware code generation and performance evaluation method comprises the following steps:
(1) Analyzing an algorithm model of the deep neural network;
(2) An intelligent computing unit IP instantiation and interconnection network structure selection;
(3) Algorithm-intelligent computing unit IP mapping and network configuration optimization;
(4) Synthesizable hardware code generation;
(5) Functional simulation and performance analysis;
the output of the hardware code generation and performance evaluation method comprises: the system comprises a binary file, a performance analysis report and a timing chart which can be loaded into an embedded intelligent computing module FPGA.
Further, the deep neural network algorithm model analysis is to analyze the structure, data type, operation type and processing flow of the deep neural network algorithm model applied to artificial intelligence to generate a deep neural network algorithm model analysis report;
the deep neural network comprises: the convolutional neural network is used for intelligent video/image target recognition, the cyclic neural network is used for intelligent voice recognition and intelligent decision control, and the deep neural network is self-defined by other application scenes;
the deep neural network algorithm model comprises: algorithm models based on Caffe, tensorFlow and PyTorch development frameworks, and algorithm models written by C, C + +, python, matlab and Java high-level programming languages;
the intelligent algorithm model analysis report comprises: the algorithm type of the deep neural network; the number of network layers; the data format, bit width, parameter data format and bit width of the feature map data of each layer; the operation type of each layer; the data flow between different layers of the algorithm.
Furthermore, intelligent computing unit IP instantiation and interconnect selection means selecting, from the intelligent computing unit IP library and according to the deep neural network algorithm model analysis report, the IPs of the corresponding operation types, instantiating them according to the feature map and parameter data formats and bit widths, and then selecting the optimal interconnect structure according to the data flow between different layers of the algorithm;
the intelligent computing unit IP library is a set of processing unit IPs which are oriented to the deep neural network arithmetic operation type and can be synthesized by hardware, and comprises vector multiplication and addition, pooling, activation functions and classification; the data format and the data bit width of each intelligent computing unit IP are configurable;
the interconnection structure is a network structure selected according to data flow directions among different intelligent computing unit IP instances and comprises point-to-point direct connection, a shared bus, a crossbar switch and a switching network.
Further, algorithm-to-intelligent-computing-unit IP mapping and network configuration optimization means mapping every operation in the deep neural network algorithm model analysis report one-to-one onto the intelligent computing unit IP instances, then configuring and optimizing the bandwidth of inter-node links in the interconnect structure according to the data traffic between different layers of the model, generating the hardware configuration scheme of the deep neural network algorithm on the embedded intelligent computing module;
the algorithm-intelligent computing unit IP mapping adopts a multi-objective optimized task-resource mapping algorithm, including genetic algorithm, particle swarm algorithm, simulated annealing algorithm and ant colony algorithm, and the optimized objective includes minimum processing delay and minimum hardware resource consumption of an embedded intelligent computing module;
Network configuration optimization adjusts the data bandwidth of the intelligent computing unit IP links according to the data traffic between operations in different layers of the analysis report, configuring each link's bandwidth in proportion to its traffic so that the data transfer delay between different intelligent computing unit IPs remains consistent.
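The proportional bandwidth rule can be expressed directly: allocating bandwidth in proportion to traffic makes the per-link transfer delay (traffic divided by bandwidth) identical across links. A minimal sketch, with made-up link names and figures:

```python
def allocate_bandwidth(traffic: dict, total_bw: float) -> dict:
    """Split a total bandwidth budget across IP-to-IP links in proportion
    to their traffic, so every link has the same transfer delay
    (delay = traffic / bandwidth)."""
    total = sum(traffic.values())
    return {link: total_bw * t / total for link, t in traffic.items()}

# Illustrative figures: 300 units on conv1->pool1, 100 on pool1->fc1,
# 8 units of total bandwidth to distribute.
bw = allocate_bandwidth({("conv1", "pool1"): 300.0, ("pool1", "fc1"): 100.0}, 8.0)
```

Here the busier link receives 6 of the 8 bandwidth units and the lighter one 2, so both links see the same transfer delay.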
Furthermore, synthesizable hardware code generation means generating the synthesizable hardware code of the deep neural network algorithm according to its hardware configuration scheme, and producing, through hardware synthesis software, a binary file that can be loaded onto the embedded intelligent computing module;
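As a toy illustration of this code-generation step, the snippet below emits a skeletal Verilog top module that instantiates one IP per mapped operation. The module and port names are placeholders of our own; real generated code would also wire up the selected interconnect and data paths:

```python
def emit_top(mapping: list, ip_names: list) -> str:
    """Emit a minimal Verilog top module that instantiates one IP per
    mapped operation (module/port names are illustrative placeholders)."""
    lines = ["module dnn_top(input clk, input rst);"]
    for op, ip in enumerate(mapping):
        # mapping[op] is the IP-instance index chosen for operation `op`.
        lines.append(f"  {ip_names[ip]} u_op{op}(.clk(clk), .rst(rst));")
    lines.append("endmodule")
    return "\n".join(lines)

print(emit_top([0, 1], ["vec_mac_ip", "pool_ip"]))
```

The emitted text would then be handed to the FPGA vendor's synthesis software to produce the loadable binary.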
further, the functional simulation and performance analysis is simulation analysis carried out by taking a deep neural network algorithm model as a test reference;
the deep neural network algorithm model test benchmark comprises AlexNet, VGG and MobileNet convolutional neural network algorithm models for intelligent image target classification, SSD, YOLO and Fast R-CNN convolutional neural network algorithm models for intelligent target detection, LSTM and GRU cyclic neural network models and other self-defined deep neural network algorithm models;
the functional simulation is functional verification of the deep neural network algorithm model running on the embedded intelligent computing module, and comprises the following steps: whether the hardware configuration of the deep neural network algorithm model can be loaded normally or not; whether the logic resource quantity of the FPGA in the embedded intelligent computing module meets the requirement of deep neural network algorithm model configuration or not; testing whether the data set can be input normally; whether the deep neural network algorithm functions are normal or not, such as target recognition, voice recognition and task decision functions; whether the algorithm operation result can be normally output and displayed or not;
the performance analysis is to analyze the performance and the time sequence of the deep neural network algorithm model running on the embedded intelligent computing module, and comprises the following steps: the running speed and the average accuracy of the deep neural network algorithm model on the embedded intelligent computing module, the number of occupied FPGA resources and the power consumption.
Example two
The present invention will be described in detail below with reference to the accompanying drawings.
Referring to fig. 1, a schematic flow diagram of a hardware code generation and performance evaluation method provided in an embodiment of the present invention is shown, where the method mainly includes:
s101, deep neural network algorithm model analysis: parse the structure, data types, operation types and processing flow of the deep neural network algorithm model, and generate the model analysis report;
s102, intelligent computing unit IP instantiation and interconnect selection: first, according to the analysis report, select the intelligent computing unit IPs of the corresponding operation types from the IP library and instantiate them according to the feature map and parameter data formats and bit widths; then select the optimal interconnect structure according to the data flow between different layers of the algorithm;
s103, algorithm-to-IP mapping and network configuration optimization: map every operation in the analysis report one-to-one onto the intelligent computing unit IP instances, then configure and optimize the bandwidth of inter-node links in the interconnect structure according to the data traffic between different layers of the model, and generate the hardware configuration scheme of the deep neural network algorithm on the embedded intelligent computing module;
s104, synthesizable hardware code generation: generate the synthesizable hardware code of the deep neural network algorithm according to its hardware configuration scheme, and produce, through hardware synthesis software, a binary file that can be loaded onto the embedded intelligent computing module;
s105, functional simulation and performance analysis on the embedded intelligent computing module: first verify whether the hardware configuration of the model can be loaded normally, whether the amount of FPGA logic resources in the module meets the model's configuration requirements, whether the test data set can be input normally, whether the algorithm functions operate correctly, and whether the results can be output and displayed normally; then analyse the model's running speed, average accuracy, occupied FPGA resources and power consumption on the module.
The invention provides a hardware code generation and performance evaluation method belonging to the field of embedded computing. The method comprises: deep neural network algorithm model analysis, intelligent computing unit IP instantiation and interconnect selection, algorithm-to-IP mapping and network configuration optimization, synthesizable hardware code generation, and functional simulation and performance analysis. The supported inputs include convolutional neural network (CNN) and recurrent neural network (RNN) deep neural network algorithm models; the method can generate a binary file by which the algorithm is loaded into the FPGA of the embedded intelligent computing module, together with a performance analysis report and a timing diagram for its execution on the module. By realizing hardware code generation and performance evaluation for multiple deep neural network algorithms on the embedded intelligent computing module, the method accelerates the design and deployment of artificial intelligence algorithms in complex embedded computing systems such as airborne computers, and improves the processing capacity and resource utilization of the module.
The above description covers only specific embodiments of the present disclosure, but the protection scope of the present disclosure is not limited thereto; any change or substitution that can readily be conceived by a person skilled in the art within the technical scope of the present disclosure shall fall within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the scope of the claims.

Claims (6)

1. A hardware code generation and performance evaluation method, applied to an embedded computing system, wherein the embedded computing system comprises at least one configurable embedded intelligent computing module, the embedded intelligent computing module contains an FPGA, and the FPGA can be configured with at least two deep neural network algorithm types; the method comprises the following steps:
analyzing the structure, the data type, the operation type and the processing flow of the deep neural network algorithm model to generate a deep neural network algorithm model analysis report;
according to the deep neural network algorithm model analysis report, selecting an intelligent computing unit IP corresponding to the operation type from an intelligent computing unit IP library, instantiating and selecting an optimal interconnection network structure;
mapping all operations in the deep neural network algorithm model analysis report one-to-one onto the intelligent computing unit IP instances, and generating a hardware configuration scheme of the deep neural network algorithm in the embedded intelligent computing module;
generating synthesizable hardware code for the deep neural network algorithm according to the hardware configuration scheme, and producing, through synthesis software, a binary file that can be loaded onto the embedded intelligent computing module;
taking the deep neural network algorithm model as a test reference, performing functional simulation and performance analysis to obtain a performance analysis report and a timing chart;
the deep neural network algorithm model analysis report comprises: the deep neural network algorithm type; the number of network layers; the feature map data format and bit width and the parameter data format and bit width of each layer; the operation type of each layer; and the data flow between different layers of the algorithm;
the intelligent computing unit IP library is a set of hardware-synthesizable processing unit IPs oriented to deep neural network operation types, including vector multiply-accumulate, pooling, activation functions and classification; the data format and data bit width of each IP unit are configurable;
the interconnect structure is a network structure selected according to the data flow between different intelligent computing unit IP instances, and comprises point-to-point direct connection, a shared bus, a crossbar switch and a switching network.
2. The method of claim 1, wherein mapping all operations in the deep neural network algorithm model analysis report to the instances of the intelligent computing units IP in a one-to-one correspondence comprises:
a multi-objective optimization task-resource mapping algorithm is adopted, the algorithm comprises a genetic algorithm, a particle swarm algorithm, a simulated annealing algorithm and an ant colony algorithm, and the optimization objective comprises the minimum processing delay and the minimum hardware resource consumption of an embedded intelligent computing module.
3. The method of claim 1, wherein the deep neural network algorithm model test benchmarks comprise the AlexNet, VGG and MobileNet convolutional neural network models for intelligent image target classification; the SSD, YOLO and Fast R-CNN convolutional neural network models for intelligent target detection; and the LSTM and GRU recurrent neural network models.
4. The method of claim 1, wherein the functional simulation performed with the deep neural network algorithm model as a test benchmark specifically comprises checking: whether the hardware configuration of the deep neural network algorithm model can be loaded normally; whether the amount of FPGA logic resources in the embedded intelligent computing module meets the configuration requirements of the model; whether the test data set can be input normally; whether the deep neural network algorithm functions, including target recognition, speech recognition and task decision, operate correctly; and whether the algorithm's results can be output and displayed normally.
5. The method according to claim 1, wherein the performance analysis is performed with the deep neural network algorithm model as a test reference, and specifically comprises: the operation speed and the average accuracy of the deep neural network algorithm model on the embedded intelligent computing module, the number of occupied FPGA resources and the power consumption.
6. The method of claim 1, wherein the types of deep neural network intelligence algorithms include Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN).
CN202011557924.3A 2020-12-24 2020-12-24 Hardware code generation and performance evaluation method Active CN112596718B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011557924.3A CN112596718B (en) 2020-12-24 2020-12-24 Hardware code generation and performance evaluation method

Publications (2)

Publication Number Publication Date
CN112596718A CN112596718A (en) 2021-04-02
CN112596718B true CN112596718B (en) 2023-04-14

Family

ID=75202145


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110221839A (en) * 2019-05-30 2019-09-10 华南理工大学 A kind of hardware accelerator VerilogHDL code automatic generation method
CN111104124A (en) * 2019-11-07 2020-05-05 北京航空航天大学 Pythrch framework-based rapid deployment method of convolutional neural network on FPGA

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11270205B2 (en) * 2018-02-28 2022-03-08 Sophos Limited Methods and apparatus for identifying the shared importance of multiple nodes within a machine learning model for multiple tasks
US11556762B2 (en) * 2018-04-21 2023-01-17 Microsoft Technology Licensing, Llc Neural network processor based on application specific synthesis specialization parameters
CN110058883B (en) * 2019-03-14 2023-06-16 梁磊 CNN acceleration method and system based on OPU
US20190391796A1 (en) * 2019-06-28 2019-12-26 Intel Corporation Control of scheduling dependencies by a neural network compiler
CN111191772B (en) * 2020-01-02 2022-12-06 中国航空工业集团公司西安航空计算技术研究所 Intelligent computing general acceleration system facing embedded environment and construction method thereof




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant