CN115204368A - Aircraft engine fault diagnosis method based on intelligent chip technology

Info

Publication number: CN115204368A
Application number: CN202210823470.2A (filed by Shaoxing Yangyu Smart Chip Co., Ltd.)
Authority: CN (China)
Legal status: Pending
Other languages: Chinese (zh)
Prior art keywords: data, model, fault, fault diagnosis, layer
Inventors: 王翔, 郝强, 周智喻, 贾浩宇, 张准, 徐冬冬
Original and current assignee: Shaoxing Yangyu Smart Chip Co., Ltd.

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/30 Circuit design
    • G06F 30/32 Circuit design at the digital level
    • G06F 30/333 Design for testability [DFT], e.g. scan chain or built-in self-test [BIST]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods


Abstract

The invention provides an aircraft engine fault diagnosis method based on intelligent chip technology. The method extracts the raw fault data of the engine and improves the generalization capability of the model through data preprocessing; designs an intelligent fault diagnosis model based on a deep neural network, deeply coupling a self-attention mechanism with a convolutional neural network and an LSTM network to improve fault diagnosis accuracy; obtains an optimal intelligent fault diagnosis model through repeated testing and tuning of the model parameters; performs hardware IP core design on an FPGA with a loop-unrolling-based hardware acceleration optimization design to increase computation speed; and completes the back-end design of the chip. The resulting aircraft engine intelligent fault diagnosis chip offers high fault diagnosis accuracy, strong generalization capability and fast computation, improving the safety and reliability of the engine while reducing maintenance cost and troubleshooting time.

Description

Aircraft engine fault diagnosis method based on intelligent chip technology
Technical Field
The application relates to the field of intelligent chips for fault diagnosis, and in particular to an aircraft engine fault diagnosis method based on intelligent chip technology. The method can perform optimized processing and intelligent analysis of the raw fault data of complex control systems such as aircraft engines, automotive engine electronics and similar industrial equipment, determine the complex relationships between key data information and multiple fault symptoms, and realize accurate and rapid fault diagnosis.
Background
The aircraft engine is the core system of an aircraft, and its operational stability and reliability directly affect flight safety. An engine fault reduces operating efficiency and causes economic loss, and in severe cases leads to serious safety accidents and heavy economic losses. Accurate and rapid fault diagnosis of the aircraft engine can therefore effectively improve its reliability and maintainability and make maintenance decisions more scientific; it is key to ensuring safe and reliable operation of the aircraft, and is of great significance for reducing operating cost, shortening the maintenance cycle and improving economic benefit.
The fault symptoms and fault causes of an aircraft engine often exhibit a complex, nonlinear relationship. Deep neural networks (DNN) have advantages such as fault tolerance, self-adaptation and self-learning, and can approximate arbitrary functional relationships. In a data-driven context, using a DNN for intelligent fault diagnosis to determine the nonlinear mapping between fault symptoms and fault causes is therefore a promising technical route.
However, existing DNN-based fault diagnosis schemes do not fully consider that different sensors carry different fault information, and key data features are discarded during data processing, so that the fault diagnosis accuracy of the algorithm model is low and its generalization capability is insufficient. In addition, these schemes do not fully consider the characteristics of the individual DNN layers (convolutional layers, activation functions, fully connected layers, pooling layers and the like), fail to perform targeted hardware design optimization, and suffer from slow computation speed.
Lei et al. [1] proposed a wind turbine fault diagnosis method based on a long short-term memory network (LSTM) and a convolutional neural network (CNN), but it only considers convolutional feature extraction at a single scale, and the generalization capability of the model is limited. Tang et al. [2] proposed an end-to-end multi-scale excitation-attention convolutional neural network for motor fault diagnosis, but it uses only a conventional attention-weighting module, which can capture the correlation between a source data sequence and a target data sequence but not the correlations within each sequence. The Google machine translation team [3] proposed the Self-Attention Mechanism (SAM), which can attend not only to the relationship between the source and target data sequences but also add the internal relationships of the source sequence to the relationship with the target sequence. However, when introducing a self-attention mechanism into an intelligent fault diagnosis model for aircraft engines, key parameters such as the channel information and time information of the raw aircraft engine fault data must also be considered.
A deep neural network computing platform implemented on an application-specific integrated circuit (ASIC) has advantages such as high running speed and low power consumption. However, to verify the function of the ASIC chip in advance, the design is usually first implemented in hardware on a field-programmable gate array (FPGA). Ma et al. [4] accelerated an FPGA-based convolutional neural network using techniques such as loop splitting and loop tiling, but the hardware efficiency is low. Mei Shiwei et al. [5] designed a parallel computing circuit to improve network computation speed, but it only focuses on the parallel-computing optimization of the convolutional-layer intellectual property (IP) cores and ignores the characteristics of the activation-function, fully-connected-layer and pooling-layer IP cores, leaving room for further design optimization. Moreover, that method targets image object detection rather than aircraft engine fault diagnosis.
Therefore, a deep neural network model that comprehensively considers the characteristics of the raw fault data under different sensors, different operating conditions and the like needs to be designed, so as to improve the accuracy and generalization capability of aircraft engine fault diagnosis. For the hardware implementation of this model, a loop-unrolling-based hardware acceleration optimization technique that combines the characteristics of the convolutional layers, activation functions, fully connected layers and pooling layers is proposed, increasing the computation speed of the intelligent fault diagnosis chip and improving the timeliness of fault diagnosis.
Reference:
[1] Lei J, Liu C, Jiang D. Fault diagnosis of wind turbine based on long short-term memory networks [J]. Renewable Energy, 2019, 133: 422-432.
[2] Tang S, et al. Motor bearing fault diagnosis based on attention and multi-scale convolutional neural networks [J]. Electrical Technology, 2020, 21(11): 32-38.
[3] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need [C]. Advances in Neural Information Processing Systems, 2017: 5998-6008.
[4] Ma Y, Suda N, Cao Y, et al. Scalable and modularized RTL compilation of convolutional neural networks onto FPGA [C]. 2016 26th International Conference on Field Programmable Logic and Applications (FPL). IEEE, 2016: 1-8.
[5] Mei Shiwei, Ding Xingjun, Liu Jinpeng. YOLOv3-tiny convolutional neural network acceleration design based on FPGA [J]. Shipboard Electronic Countermeasure, 2022, 45(2): 81-88, 108.
Disclosure of Invention
1. The purpose of the invention is as follows:
the purpose of the application is to provide an aircraft engine fault diagnosis method based on an intelligent chip technology aiming at the defects of the existing fault diagnosis technology. According to the method, firstly, preprocessing operations such as visual analysis, normalization processing, index weighting smoothing processing and the like are carried out on original fault data accumulated when an aircraft engine operates, and the generalization capability of a model is improved; then, designing an intelligent fault diagnosis model based on DNN, deeply coupling SAM with CNN and LSTM networks, extracting the relation among sensor parameters and the decline information quantity of different operation conditions, and improving the fault diagnosis accuracy of the model; then, training and debugging are carried out to obtain an optimal intelligent fault diagnosis model, and model weight parameters are extracted and stored; then, completing hardware realization of an optimal intelligent diagnosis model on the FPGA, and performing hardware accelerated design optimization based on cyclic expansion to accelerate the calculation speed; and finally, the design work of the rear end of the chip is completed, and the intelligent fault diagnosis chip of the aircraft engine with high fault diagnosis accuracy and high calculation speed is realized.
2. The technical scheme is as follows:
to achieve the above purpose, the present application will be implemented by using the following technical solutions:
an aircraft engine fault diagnosis method based on an intelligent chip technology comprises the following steps:
step 1, acquiring original fault data collected by sensors during operation of the aircraft engine, and performing a first preprocessing on the original fault data to obtain second fault data; the first preprocessing comprises visual analysis, fault label marking and data set division; the data set division produces a training set, a test set and a verification set;
step 2, performing data set distribution consistency analysis on the second fault data obtained in step 1 using kernel density estimation to obtain third fault data; the third fault data is the data remaining after samples with inconsistent data characteristics are removed, ensuring that the verification-set and test-set data of step 1 cover the key data characteristics of the training-set data;
step 3, performing a second preprocessing on the third fault data of step 2 to obtain fourth fault data; the second preprocessing comprises normalization based on the system operating condition and exponentially weighted smoothing;
step 4, designing an intelligent fault diagnosis model based on a deep neural network; the intelligent fault diagnosis model consists of two parts, wherein the first part adopts a first convolution kernel to extract local data features, and the second part adopts a second convolution kernel to extract global data features; each part comprises an input layer, a CNN module, a SAM module, an LSTM module and a full connection layer;
step 5, using the training set and the verification set of the fourth fault data of step 3 as input data of the intelligent fault diagnosis model of step 4, with sparse categorical cross entropy as the loss function during model training, Adam as the optimizer and accuracy as the evaluation index of the training model; performing multiple rounds of model training and hyper-parameter adjustment to determine the structural depth of the network, the number, size and stride of the convolution kernels, the number of neurons and the activation function, obtaining a preliminarily optimized intelligent fault diagnosis model;
step 6, using the test set of the fourth fault data of step 3 as the input of the preliminarily optimized intelligent fault diagnosis model of step 5 for testing; according to the output result, returning to step 5 to adjust the hyper-parameters of the model until the output of the model meets the expected error acceptance criterion, finally obtaining the optimal intelligent fault diagnosis model and saving its weight parameters;
step 7, carrying out hardware IP core design based on FPGA on the optimal intelligent fault diagnosis model in the step 6 in design software; the hardware IP core design comprises a code design of a convolution calculation IP core, a pooling calculation IP core, an activation function IP core and a full-connection calculation IP core;
step 8, performing loop-unrolling-based hardware acceleration optimization design on the hardware IP cores of step 7, then verifying and synthesizing to obtain an optimized circuit file; the loop-unrolling-based hardware acceleration optimization design comprises unrolling the computation code of the convolutional layer, the activation function, the fully connected layer and the pooling layer, and optimizing the corresponding computation circuit designs;
and step 9, performing the chip back-end design work on the circuit file of step 8, including design for test, floorplanning, clock tree synthesis, routing, parasitic parameter extraction and physical verification of the layout, realizing the aircraft engine intelligent fault diagnosis chip with high fault diagnosis accuracy and high computation speed.
The specific steps of step 1 are as follows:
1.1, the original fault data, namely various fault data collected by a plurality of different types of sensors, comprises temperature state parameters, pressure state parameters, rotating speed state parameters, bleed air flow parameters and lubricating oil detection parameters of an aircraft engine, and the parameters are used as input data of an intelligent fault diagnosis model based on a deep neural network;
1.2, visualizing data, visually representing each sensor data of each fault in a visualized form by using a Python language, analyzing the change trend of the data, eliminating parameters which do not influence the system performance, and reserving parameters containing fault information as input of model training;
1.3, setting a fault label for the data of the 1.2 according to the fault type:
various kinds of fault tags are set in a digitally encoded form: [0,1,2, ·,5], wherein 0 represents a fault label when the aircraft engine operates normally, 1 represents a fault label when the temperature parameter of the aircraft engine is abnormal, 2 represents a fault label when the pressure state parameter of the aircraft engine is abnormal, 3 represents a fault label when the rotating speed state parameter of the aircraft engine is abnormal, 4 represents a fault label when the induced air flow parameter of the aircraft engine is abnormal, and 5 represents a fault label when the lubricating oil detection parameter of the aircraft engine is abnormal;
1.4 the data that has been visually analyzed and labeled with fault tags is divided: all data are split evenly into 10 parts so that each part contains the same amount of data; 6 parts are randomly selected as the training set, 2 of the remaining 4 parts are randomly selected as the verification set, and the final 2 parts form the test set.
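As a concrete illustration of the split in 1.4, the following Python sketch divides labelled data into 10 equal parts and draws 6/2/2 of them for training, verification and testing; the function name and the assumption that samples and labels are NumPy arrays are illustrative, not taken from the patent.

```python
import numpy as np

def split_dataset(samples, labels, seed=0):
    """Split labelled fault data into 10 equal parts: 6 for training,
    2 for verification, 2 for testing (illustrative sketch of step 1.4)."""
    rng = np.random.default_rng(seed)
    n = len(samples) - len(samples) % 10       # keep the 10 parts the same size
    idx = rng.permutation(n).reshape(10, -1)   # 10 equally sized parts
    train_idx = idx[:6].ravel()                # 6 parts -> training set
    val_idx = idx[6:8].ravel()                 # 2 parts -> verification set
    test_idx = idx[8:].ravel()                 # 2 parts -> test set
    return ((samples[train_idx], labels[train_idx]),
            (samples[val_idx], labels[val_idx]),
            (samples[test_idx], labels[test_idx]))
```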
The kernel density estimation in step 2 estimates the probability density function of the given data; the kernel density distribution analysis ensures that the data distributions of the verification set, the test set and the training set obtained by the division are consistent, and therefore that the data characteristics of the subsets are consistent. Taking one-dimensional data as an example, assume there are n data points x_1, x_2, x_3, ..., x_n whose probability density function is f(x).
The kernel density estimator is:
f(x) = (1/(n*h)) * Σ_{i=1}^{n} K((x - x_i)/h)
where x_i denotes the i-th data point, n is the number of data points, K(·) is the kernel function, and h is the bandwidth of the kernel density estimate.
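For reference, a minimal NumPy sketch of such a kernel density estimate is given below; the Gaussian kernel is an assumption, since the text specifies only the general estimator form and the bandwidth h.

```python
import numpy as np

def kde(x_eval, data, h):
    """Kernel density estimate f(x) = 1/(n*h) * sum_i K((x - x_i)/h),
    using a Gaussian kernel K (assumed; the patent does not name the kernel)."""
    data = np.asarray(data, dtype=float)
    x_eval = np.atleast_1d(np.asarray(x_eval, dtype=float))
    n = data.size
    u = (x_eval[:, None] - data[None, :]) / h            # (x - x_i) / h
    k = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)     # kernel values
    return k.sum(axis=1) / (n * h)
```

Evaluating the training-set and test-set estimates of one sensor channel on a common grid and comparing the resulting curves is one way to check the distribution consistency described above.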
The specific steps of performing the second preprocessing on the data in the step 3 are as follows:
3.1 normalize the data based on the operating condition of the aircraft engine to eliminate the influence of dimension and enhance the stability of the data. Because a given fault may occur under different operating conditions, the data may exhibit different characteristics depending on the operating condition; to better extract the variation characteristics of the data, the fault data recorded under the same operating condition are first grouped and then normalized, using the formula:
X_i = (x_i - min_{1≤j≤n}{x_j}) / (max_{1≤j≤n}{x_j} - min_{1≤j≤n}{x_j})
where x_i denotes the i-th data point of a data sample (x_1, x_2, x_3, ..., x_n), X_i ∈ [0, 1] is the dimensionless normalized value of x_i, x_j denotes the j-th data point of the sample, max_{1≤j≤n}{x_j} is the maximum of x_1, x_2, x_3, ..., x_n, and min_{1≤j≤n}{x_j} is the minimum of x_1, x_2, x_3, ..., x_n;
3.2 because the original fault data contains a lot of noises, in order to improve the accuracy of the data and increase the generalization ability of the model, the data needs to be subjected to exponential weighting smoothing processing to filter noise interference in the data, and the formula is as follows:
G(t)=α*g(t)+(1-α)*G(t-1)
g (t) is a value obtained after smoothing processing is carried out on G (t) at the current moment, G (t) is a data value acquired by a certain sensor at present, G (t-1) is a result of smoothing processing at the last moment, alpha is the intensity of the smoothing processing, and the value range is between 0 and 1.
The intelligent fault diagnosis model design structure based on the deep neural network in the step 4 is as follows:
4.1 the input layer of the model mainly takes all sensor data of t cycles of the system as the input of the model;
4.2 the input layer is followed by a convolutional neural network (CNN) module, which consists of two one-dimensional convolutional layers combined with a max-pooling layer; a Dropout layer with an output coefficient of 0.3 is added between the convolutional layers to prevent overfitting. The two convolutional layers have the same configuration: to keep the output feature size unchanged after the convolution operation, the convolution kernel height is 3, the number of kernels is 128, the stride is 1, and all-zero padding is used. The activation function is ReLU, realizing a nonlinear mapping of the different features. The pooling size in each combination is 2, with the other settings left at their defaults, halving the size of the output features. The output size of this module is S*N, where N is the feature dimension, equal to the number of convolution kernels of the last convolutional layer, and S is the length of the output sequence. When the convolution stride is 1, S is calculated as:
S=W-H+1
wherein W is the sequence length of the input data, and H is the height of the convolution kernel;
4.3 the output data of the CNN module are then fed to the self-attention mechanism (SAM) module. First, global features are aggregated from the CNN output using global max pooling and global average pooling, each producing an output of size N. The same two-layer perceptron is then applied to each of the two pooled outputs to learn the N features and the relationships and information content among them; to keep the perceptron output size equal to N, the number of neurons in its second layer must equal the output dimension of the CNN module. The outputs of the two perceptrons are combined by element-wise summation and normalized with a Softmax function to obtain the attention weights α_N of the different features. Second, because dilated (hole) convolution uses the dilation rate to enlarge the spacing between the values processed by the convolution kernel, expanding the receptive field and facilitating the extraction of long-range information, two dilated convolutions are applied to the CNN output to extract important information. To keep the output size unchanged after convolution, the kernel size, stride and padding are the same as in the CNN module, the number of kernels equals the feature dimension of the input data, and the dilation rate is 3. The result is normalized with a Softmax function to obtain the self-attention weights β_S of the input sequence, which are then multiplied element-wise with the already obtained S*α_N*N to obtain operating cycles of different importance;
4.4 the long short-term memory (LSTM) module is connected after the SAM module. The LSTM module has one LSTM layer with 128 neurons per gate and a Tanh activation function, enabling the fault diagnosis network model to learn the temporal information in the data features. To reduce the sensitivity of the prediction model to small changes in the data, a Dropout layer with an output coefficient of 0.2 is added after the LSTM layer, assigning zero weight to 20% of the neurons of the LSTM layer to prevent overfitting and further improve the accuracy of the intelligent fault diagnosis model;
4.5 finally, a fully connected layer is connected after the LSTM module to perform regression analysis on the information extracted by each module. Since the model has multi-scale inputs, the analysis results of the different parts are summed and averaged to obtain the final fault diagnosis result. Each neuron of the fully connected layer computes:
y_d^l = F( Σ_{c=1}^{z^{l-1}} w_{cd}^l · y_c^{l-1} + b_d^l )
where y_d^l is the output of the d-th neuron of layer l, F is the activation function, z^{l-1} is the number of neurons in layer l-1, w_{cd}^l is the weight connecting the c-th neuron of layer l-1 to the d-th neuron of layer l, y_c^{l-1} is the output of the c-th neuron of layer l-1, and b_d^l is the bias of the d-th neuron of layer l.
The specific steps of training the model, adjusting the hyper-parameters, obtaining a diagnosis model and the like in the step 5 are as follows:
5.1, taking a training set and a verification set of the data as the input of the intelligent fault diagnosis model, wherein the verification set is input into the model after the training of the training set is finished so as to verify the generalization capability of the model;
5.2 the loss function for model training is the sparse categorical cross entropy:
Loss = -(1/m) * Σ_{e=1}^{m} Σ_{k} p_ek * log(q_ek)
where m is the total number of samples, the inner sum runs over the k fault classes, p_ek is the true probability that the e-th sample belongs to fault class k, and q_ek is the predicted probability that the e-th sample belongs to fault class k;
5.3 the optimizer of model training uses Adam, and takes the accuracy as the network evaluation index;
5.4 after the above points are set, the compiling model carries out repeated iterative training to determine the following hyper-parameters:
the method comprises the steps of calculating the structural depth of an intelligent fault diagnosis model based on a deep neural network, the number, the size and the step length of convolution kernels, a Dropout proportionality coefficient, the number of neurons, an activation function, iteration times, the batch times of putting the model in each time, the smoothing intensity of weighted smoothing processing and the width of input data;
5.5 in the training process, the loss values of the iteration test set and the verification set at each time can be output to determine whether the model is over-fitted, and whether 5.4 steps of adjusting the super parameters are needed to be returned or not by combining the accuracy analysis of each fault classification result, so that the fault classification result is the best through repeated debugging.
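A minimal sketch of the training setup of steps 5.1-5.3 in tf.keras follows; the epoch count and batch size shown are placeholder hyper-parameters of the kind tuned in steps 5.4-5.5.

```python
from tensorflow import keras

def compile_and_train(model, x_train, y_train, x_val, y_val,
                      epochs=100, batch_size=64):
    """Training setup of step 5: Adam optimizer, sparse categorical cross-entropy
    loss, accuracy metric. epochs and batch_size are placeholder values."""
    model.compile(optimizer=keras.optimizers.Adam(),
                  loss=keras.losses.SparseCategoricalCrossentropy(),
                  metrics=["accuracy"])
    # The verification set is evaluated after each training pass, as in step 5.1.
    return model.fit(x_train, y_train,
                     validation_data=(x_val, y_val),
                     epochs=epochs, batch_size=batch_size)
```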
The specific steps of step 6 are as follows:
and inputting the test set data into the primarily optimized intelligent fault diagnosis model, and further verifying the generalization capability of the model. If the classification result of the test set is not ideal and cannot meet the expected error standard acceptance criterion, the model training still has problems, possibly under-fitting or over-fitting conditions, the steps are required to be returned to 5.4-5.5, the training and debugging are carried out again, after the output result of the test set meets the requirements, the final optimal model is obtained at the moment, and finally the weight of the model is saved.
The hardware IP core design method in the step 7 comprises the following specific steps:
7.1 using the Vivado HLS tool, create separate projects for the convolution calculation IP, the pooling calculation IP, the activation function IP and the fully connected calculation IP, and implement the function of each IP in C;
7.2 select the FPGA device, then design the convolution calculation IP core first: define the convolution kernel function, declare the weight array, input array, output array and bias array, and write the convolution algorithm code;
7.3 design the pooling calculation IP to realize 2x2 max pooling: declare cache arrays for the comparison results; using three cache arrays, compare the input data pairwise and store the results in the first two cache arrays, then perform a second comparison in the next clock cycle and store the final result in the third cache array;
7.4 design the activation function IP: declare a cache array for the comparison results, compare the input array elements with 0 one by one, and store the results in the cache array;
7.5 design the fully connected calculation IP: declare the input array, output array, weight array and bias array used by the fully connected layer, and write the corresponding fully connected layer algorithm code;
7.6 add AXI interface declaration code while writing the code, so that an IP core with an AXI interface is generated automatically during HLS synthesis and the parameters of each IP core function can be transferred over the AXI bus interface;
7.7 add HLS preprocessing directives to the core computation or comparison code of each IP core, and add array-partition directives to the declared arrays, so that pipelined and parallelized computation is realized during HLS synthesis.
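The IP cores themselves are written in C for Vivado HLS; the NumPy functions below are only a hedged software reference of the pooling, activation and fully connected computations, of the kind a testbench golden model would compare the IP outputs against, and are not the patent's C code.

```python
import numpy as np

def relu_ip(x):
    """Activation-function IP reference: element-wise comparison with 0."""
    return np.maximum(x, 0.0)

def maxpool2x2_ip(fmap):
    """2x2 max-pooling IP reference for a (channels, rows, cols) feature map."""
    c, r, w = fmap.shape
    trimmed = fmap[:, :r - r % 2, :w - w % 2]            # drop odd edge rows/cols
    return trimmed.reshape(c, (r - r % 2) // 2, 2,
                           (w - w % 2) // 2, 2).max(axis=(2, 4))

def fc_ip(x, weight, bias):
    """Fully connected IP reference: one-dimensional input, matrix multiply plus bias."""
    return weight @ x + bias
```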
The specific steps of step 8 are as follows:
8.1 analyze the convolutional-layer loop code. A typical convolution operation can be regarded as a multiply-accumulate operation with six nested for loops whose loop variables are cho, chi, row, col, kr and kc (cho indexes the output feature map channel, row and col the output feature map size, chi the input feature map channel, and kr and kc the convolution kernel size). The three arrays participating in the operation are the three-dimensional array out[cho][R][C] (output data), the four-dimensional array weight[cho][chi][K][K] (weight parameters) and the three-dimensional array input[chi][S*R+K][S*C+K] (input data);
8.2 through theoretical analysis, select the input feature map channel and the output feature map channel as the loop-unrolling dimensions of the convolutional layer;
8.3 for parallel computation over the input and output feature map channels of the convolutional layer, design a multiply-accumulate array circuit to realize parallel multiply-accumulate operation on multiple groups of loop-unrolled data. Each multiply-accumulate processing element (PE) completes the convolution over multiple input feature map channels for one output feature map channel; to realize parallel convolution over the output feature map channels, a multiply-accumulate array composed of multiple PEs is required. In this array, the storage module holding the input feature map must deliver elements of multiple input channels to multiple PE units simultaneously, i.e. every PE receives the same input feature map data, so the input feature map storage module is still partitioned by input channel;
8.4 analyze the activation function code. Because the ReLU activation function only involves comparison with the value 0, the hardware only needs a comparator; unlike other activation functions, it does not require complex exponential or even logarithmic computation implemented with lookup tables. The ReLU activation function is therefore mainly used in this application, and based on its characteristics the output feature map channel is selected for loop unrolling;
8.5 analyze the fully connected layer computation code. Because the input and output dimensions of the fully connected layer are both 1, the data input to it are one-dimensional; the essence of the fully connected layer is to transform the array to the required size through matrix multiplication and output the result. For the fully connected layer, the output feature channel is therefore selected for loop unrolling;
8.6 analyze the pooling layer computation code. Because the output feature map size of the pooling layer keeps changing with the layer depth, the output feature size is selected for loop unrolling;
8.7 verification includes C-code functional verification and post-synthesis verification of the IP cores. For C-code verification, a test main function (testbench) is written that calls the IP cores designed in step 7 to assemble the previously designed neural network model and imports the fault data to be processed and the saved model parameters for verification;
8.8 after the C-code functional verification, the IP cores are synthesized, C/RTL co-simulation is performed, and after verification a circuit file suitable for chip back-end design is obtained.
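For clarity, the six-level loop nest of 8.1 can be written as the following software reference model; the comments indicate the cho and chi loops that are unrolled and mapped onto the multiply-accumulate PE array in 8.2-8.3. This is a sketch, not the synthesizable HLS code.

```python
import numpy as np

def conv_loop_nest(inp, weight, S=1):
    """Reference of the six-level loop nest of 8.1:
    out[cho][row][col] += weight[cho][chi][kr][kc] * inp[chi][S*row+kr][S*col+kc]."""
    CHO, CHI, K, _ = weight.shape
    _, rows_in, cols_in = inp.shape
    R = (rows_in - K) // S + 1
    C = (cols_in - K) // S + 1
    out = np.zeros((CHO, R, C))
    for cho in range(CHO):                  # output channels: unrolled across the PE array
        for row in range(R):
            for col in range(C):
                for chi in range(CHI):      # input channels: unrolled inside each PE
                    for kr in range(K):
                        for kc in range(K):
                            out[cho][row][col] += (weight[cho][chi][kr][kc]
                                                   * inp[chi][S * row + kr][S * col + kc])
    return out
```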
The specific steps in step 9 are as follows:
9.1 design for test (DFT): scan chains are inserted into the circuit file of step 8, converting non-scan cells (such as registers) into scan cells; the software tool used is Synopsys DFT Compiler;
9.2 floorplanning: the macro-cell modules required by the chip are placed in the circuit file, determining the overall placement of the various functional circuits such as IP modules, RAM and I/O pins, which directly affects the final chip area; the software tool used is Synopsys Astro;
9.3 clock tree synthesis (CTS): the clock routing is completed appropriately in the circuit file, connecting the clock symmetrically to each register cell so that the clock delay differences from the same clock source to the registers are minimized; the software tool used is Synopsys Physical Compiler;
9.4 routing (Place & Route): the ordinary signals in the circuit file are completed, including the routing among the standard cells (basic logic gates); the software tool used is Synopsys Astro;
9.5 parasitic parameter extraction: the resistance of the wires and the mutual inductance and coupling capacitance between adjacent wires can generate signal noise, crosstalk and reflection inside the chip, causing signal voltage fluctuations and, in severe cases, signal distortion errors; the parasitic parameters are therefore extracted and the design is analyzed and verified again to solve signal integrity problems; the software tool used is Synopsys Star-RCXT;
9.6 physical verification of the layout: functional and timing verification is performed on the routed physical layout, including comparison and verification between the layout and the gate-level circuit after logic synthesis, design rule checking (whether wire spacing, wire width and the like meet the chip process requirements) and electrical rule checking (electrical violations such as shorts and opens); the software tool used is Synopsys Hercules.
9.7 after the chip back-end design is finished, the GDSII file can be delivered to a foundry for chip manufacturing, packaging and testing, realizing the aircraft engine intelligent fault diagnosis chip designed in this application, with high fault diagnosis accuracy and high computation speed.
3. Has the advantages that:
compared with the prior art, the aircraft engine fault diagnosis method based on the intelligent chip technology can produce the following beneficial effects:
(1) In step 4, the SAM is deeply coupled with the CNN and LSTM networks, and an intelligent fault diagnosis model with high fault diagnosis accuracy and strong generalization capability is designed and realized. The SAM module designed in this application attends to the relationships among the sensor parameters and the amount of degradation information under different operating conditions; it can obtain different importance weights for the input data features so as to highlight important degradation information in the sensor parameters and suppress unimportant information, and it can also obtain importance weights for different states so as to highlight degradation information when system performance declines and to reduce the influence of the system in its normal state.
(2) In step 8, the algorithm code of the convolutional layer, activation function, fully connected layer and pooling layer is carefully analyzed, and a loop-unrolling-based hardware acceleration optimization design method is proposed, increasing the computation speed of the aircraft engine intelligent fault diagnosis chip with low hardware resource consumption. For the convolutional layer, the input and output feature map channels are unrolled and a dedicated multiply-accumulate array circuit is designed to realize parallel computation; for the activation function and the fully connected layer, the output feature map channel is unrolled; and for the pooling layer, the output feature size is selected for unrolling according to the characteristics of the algorithm.
Drawings
Fig. 1 is a flow chart of the design of an intelligent fault diagnosis chip of an aircraft engine in the present application.
Fig. 2 is a structural diagram of an intelligent fault diagnosis model based on a deep neural network in the present application.
Fig. 3 is a diagram of a SAM module structure in the present application.
FIG. 4 is a schematic diagram of a multiply-accumulate array for parallel computing according to the present application.
Fig. 5 is a structure diagram of an aircraft engine intelligent fault diagnosis chip in the present application.
Detailed Description
For a better explanation of the present application, reference will now be made in detail to the accompanying drawings.
Fig. 1 shows the key process of designing the aircraft engine intelligent fault diagnosis chip of the present application: first, the original fault data of the aircraft engine undergo a first preprocessing (visual analysis, fault label marking and data set division), followed by data set distribution consistency analysis, normalization and exponentially weighted smoothing, yielding the processed training, test and verification sets; then an intelligent fault diagnosis model based on a deep neural network is designed, the training and verification sets are fed to the model, and a preliminarily optimized intelligent fault diagnosis model is obtained through multiple rounds of model training and hyper-parameter adjustment; the test set is then fed to the preliminarily optimized model, and whether its output meets the expected error precision acceptance criterion is checked, with the hyper-parameters adjusted further until the output meets the requirement and the optimal intelligent fault diagnosis model is obtained; next, the hardware IP core design of the optimal intelligent fault diagnosis model is carried out and the loop-unrolling-based optimization design is realized; finally, the chip design work is completed to obtain the intelligent fault diagnosis chip.
The design flow chart of the intelligent fault diagnosis chip of the aircraft engine shown in fig. 1 is combined to construct an intelligent fault diagnosis model of the aircraft engine based on a deep neural network, and the method specifically comprises the following implementation steps:
step 1, acquiring original fault data of an aircraft engine during operation, wherein the original fault data is acquired by a sensor of the aircraft engine; performing primary preprocessing on the original fault data to obtain second fault data; the first preprocessing comprises visual analysis, fault label marking and data set division; the data set division comprises a training set, a testing set and a verification set.
The method comprises the steps of collecting various fault data collected by a sensor in the running process of an engine to serve as original fault data of an intelligent fault diagnosis model. The various fault data collected by the sensor comprise: temperature state parameters, pressure state parameters, rotating speed state parameters, bleed air flow parameters and lubricating oil detection parameters of the engine. They are used as input data of an intelligent fault diagnosis model based on a deep neural network.
The visual analysis is to use Python language to perform data visual operation, that is, each sensor data of each fault is visually represented in a visual form, the variation trend of the data is analyzed, the sensor data without fault information is removed, and the data with the fault information is reserved and used as the input of model training.
The fault labels represent the various faults in digitally encoded form as [0, 1, 2, ..., 5], where 0 denotes the fault label for normal operation of the aircraft engine, 1 the fault label for an abnormal temperature parameter, 2 the fault label for an abnormal pressure state parameter, 3 the fault label for an abnormal rotating speed state parameter, 4 the fault label for an abnormal bleed air flow parameter, and 5 the fault label for an abnormal lubricating oil detection parameter. In this way the complicated engine fault types are mapped to corresponding numbers, making it convenient for maintenance personnel to identify in time which fault of the aircraft engine has occurred.
The data set division refers to dividing the data that has been visually analyzed and labeled with fault tags: all data are split evenly into 10 parts of equal size; 6 parts are randomly selected as the training set, 2 of the remaining 4 parts are randomly selected as the verification set, and the final 2 parts form the test set. The training set is used to train the DNN-based intelligent fault diagnosis model, the verification set is used to determine the network structure of the model and tune the hyper-parameters, and the test set is used to check whether the final intelligent fault diagnosis model reaches the expected design target. Dividing all the data into 10 parts ensures that each part contains the same amount of data and therefore comparable data characteristics. Training requires the most data, so the training set has the largest share; the resulting ratio of training set, verification set and test set is 6:2:2.
Step 2, performing data set distribution consistency analysis on the second fault data obtained in the step 1 by utilizing kernel density estimation to obtain third fault data; and the third fault data is data with inconsistent removed data characteristics, so that the data of the verification set and the test set in the step 1 are ensured to cover the key data characteristics in the training set data.
Before designing the DNN-based intelligent fault diagnosis model, the training set, verification set and test set must undergo distribution consistency analysis to ensure that the three subsets keep the same data characteristics in their composition. If the distributions of the data in the verification set and the test set are similar to that of the training data, the generalization of the model is better, i.e. it adapts better to data it has not seen. The distributions are of course not required to be exactly the same as that of the training data, but they should not differ too much from it. The application uses kernel density estimation to detect whether the distributions of the training set and the test set are consistent.
The kernel density estimation estimates the probability density function of the given data; the kernel density distribution analysis verifies that the data distributions in the verification set, the test set and the training set obtained by the division are consistent. Taking one-dimensional data as an example, assume there are n data points x_1, x_2, x_3, ..., x_n whose probability density function is f(x). The kernel density estimator is:
f(x) = (1/(n*h)) * Σ_{i=1}^{n} K((x - x_i)/h)
where x_i denotes the i-th data point, n is the number of data points, K(·) is the kernel function, and h is the bandwidth of the kernel density estimate.
By adopting the data visualization analysis in the step 1, whether the distribution of each parameter of the training set is consistent with the distribution of each parameter of the test set or not can be visually detected, namely the probability density curve distribution is relatively consistent, so that the test set data covers the characteristics in the training set data, and the generalization capability of the constructed model is further ensured.
Step 3, carrying out second preprocessing on the third fault data in the step 2 to obtain fourth fault data; the second preprocessing comprises normalization processing based on the system operation condition and exponential weighting smoothing processing.
Analysis of the sensor parameters in the data set shows large differences between the parameters; data with large magnitudes would dominate the prediction of the deep neural network model, so that data with small magnitudes have no effect and the model prediction result degrades. In addition, singular samples in the data may increase the training time of the network model and may prevent the model from converging. The data input to the model therefore need to be normalized to eliminate the influence of dimension and of singular samples in the parameters.
Because a certain fault of the engine may exist under different operating conditions, the data at this time may have different characteristics expressed by different operating condition data, and in order to better extract the variation characteristics of the data, the fault data under the same operating condition needs to be classified and then normalized, and the formula is as follows:
X_i = (x_i - min_{1≤j≤n}{x_j}) / (max_{1≤j≤n}{x_j} - min_{1≤j≤n}{x_j})
where x_i denotes the i-th data point of a data sample (x_1, x_2, x_3, ..., x_n), X_i ∈ [0, 1] is the dimensionless normalized value of x_i, x_j denotes the j-th data point of the sample, max_{1≤j≤n}{x_j} is the maximum of x_1, x_2, x_3, ..., x_n, and min_{1≤j≤n}{x_j} is the minimum of x_1, x_2, x_3, ..., x_n.
However, if the state parameters monitored by the sensors are simply normalized as a whole, the result does not match the actual operation of the engine. Most existing studies do not consider this point; they only perform simple normalization on each data set and directly discard the operating-condition information in the data set as useless information.
Therefore, in the condition-based normalization, all monitoring records are first grouped by operating condition, which serves as the reference for processing data recorded under the same operating condition, and the sensor parameters are then scaled by the normalization operation. This brings parameters recorded under different conditions to a comparable level, and processing in this manner is very effective if the sensor data show similar behavior but are centered on different means.
Because the original data contains a lot of noises, in order to improve the accuracy of the data, the data needs to be subjected to exponential weighting smoothing processing to filter noise interference in the data, and the formula is as follows:
G(t)=α*g(t)+(1-α)*G(t-1)
g (t) is a value obtained after smoothing processing is carried out on G (t) at the current moment, G (t) is a data value acquired by a certain sensor at present, G (t-1) is a smoothing processing result at the last moment, alpha is the smoothing processing intensity, and the value range is between 0 and 1.
And removing noise in the data by using exponential weighted smoothing processing on the data after the normalization processing based on the working condition. The smoothing weight is set to be 0.4, so that the fluctuation of data can be greatly reduced while the change characteristics of the data are kept, and the noise in the data is effectively reduced. By testing the data before and after processing, the root mean square error of the intelligent fault diagnosis model on the denoised test set is reduced by 18.1% compared with that of the test set which is not processed.
Fig. 2 shows the structure of the DNN-based intelligent fault diagnosis model designed in this application. The model consists of an input layer, a CNN module, a SAM module, an LSTM module and a fully connected layer. The input layer passes the fault data to the following modules. A small 3 × 3 convolution kernel is used to extract local features of the fault data and a large 7 × 7 convolution kernel to extract global features; the CNN module consists of a convolutional layer, a Dropout layer, a convolutional layer and a max-pooling layer. To extract the relationships among the sensor parameters and the amount of degradation information under different operating conditions, a SAM module is designed to further process the output data of the CNN module. The LSTM module, consisting of an LSTM layer, an activation function and a Dropout layer, captures the temporal features in the fault data. The fully connected layer performs regression analysis on the information extracted by the preceding modules, flattens the output of the LSTM module to one dimension, and outputs the fault diagnosis result after the activation function, summation and averaging.
In combination with the structure diagram of the intelligent fault diagnosis model of the aircraft engine based on the deep neural network in fig. 2, the intelligent fault diagnosis model of the CNN-SAM-LSTM with multiple scales is designed, and the specific steps are as follows:
step 4, designing an intelligent fault diagnosis model based on a deep neural network; the intelligent fault diagnosis model consists of two parts, wherein the first part adopts a first convolution kernel to extract local data features, and the second part adopts a second convolution kernel to extract global data features; each section includes an input layer, a CNN module, a SAM module, an LSTM module, and a fully connected layer.
The first convolution kernel is a small kernel, which extracts local information better; the small kernel size chosen in this application is 3 × 3. The second convolution kernel is a large kernel, which is better suited to data whose information is widely distributed; the large kernel size chosen here is 7 × 7. The size of the CNN receptive field is determined by the size of the convolution kernel, i.e. it determines which important information is extracted. Small kernels are generally considered better than large kernels at extracting local information, but for the raw fault data of the engine the information changes with the operating conditions. When the aircraft engine monitoring system is first put into operation, the data recorded by each sensor are quite stable over a long period and contain no degradation information; as the service time increases, faults degrade the performance of the aircraft engine, more degradation information appears in the monitoring data, and the data monitored by different sensors show different characteristics. The raw fault data of the engine therefore have a more global distribution. To combine the strengths of large and small convolution kernels, the DNN-based intelligent fault diagnosis model is designed as a multi-channel model with convolution kernels of different scales.
The input layer mainly takes all sensor data of t cycles of the system processed by the steps 1 to 3 as intelligent DNN-based fault diagnosis model input.
The input layer is followed by the CNN module. The module consists of two convolutional layers combined with a max-pooling layer, and incorporates a Dropout layer between each convolutional layer to prevent overfitting. The configuration parameters of the two convolution layers are the same, in order to ensure that the size of the output feature after convolution operation is unchanged, the moving step length of a convolution kernel is 1, all-zero filling is used, the ReLU is used as an activation function, and nonlinear mapping of different features is realized. The pooling layer size in each combination is 2, with the others configured as defaults to achieve a halving of the size of the output feature. The output size of the last CNN module is S x N, wherein N is a characteristic dimension and is equal to the number of convolution kernels of the last convolution layer; s is the length of the output sequence.
Under the condition that the convolution kernel moving step is 1, the calculation formula of the convolution module output data is as follows:
S=W-H+1
where W is the sequence length of the input data and H is the height of the convolution kernel.
The output data of the CNN module is then input to the SAM module, which is structured as shown in fig. 3.
Fig. 3 shows the structure of the self-attention mechanism SAM module designed in the present application. The module respectively performs global maximum pooling and global average pooling on the output data of the CNN module, sums the output data and the global maximum pooling and global average pooling, obtains attention weights with different characteristics through a Softmax function, and performs dot multiplication on the attention weights and the output data of the CNN module; in addition, the module also processes the output data of the CNN module with a Softmax function after two times of cavity convolution, obtains the self-attention weight of the input sequence, and performs dot multiplication with the calculation result to obtain the output data of the SAM module.
The output of the CNN module contains a sequence of multi-sensor parameters. Each series of characteristic data represents a parameter which is monitored by different sensors when the aircraft engine is running. When the faults are different, the performance degradation degree of the engine is different, the degradation information contained in the monitoring data is different, when the faults are the same, important degradation information is contained in the related sensors, and only a small amount of information may be useful in the parameters monitored by other sensors.
Therefore, the SAM module among different monitoring parameters, namely the self-attention mechanism module, is designed to identify the difference among the different monitoring parameters of the engine, highlight the important decline information, and obtain the importance weights in different states so as to highlight the decline information when the system performance is reduced and reduce the influence of the system in a normal state.
First, the output of the CNN module is integrated into global features using global max pooling and global average pooling; after global pooling the output size is N. Then two identical perceptrons are used to learn the N features output by the global pooling and to capture the relationships and information content among them. To ensure that the perceptron output size is still N, the number of neurons in its second layer must equal the output dimension of the CNN module. Next, the outputs of the two perceptrons are combined by element-wise summation and normalized through a Softmax function to obtain the attention weights α_N of the different features. Finally, the attention weights are multiplied with the output of the CNN module to obtain feature information of different importance: the larger α_N is, the more important the degradation information contained in the corresponding weighted feature.
During the whole process from operation to failure of the aircraft engine, degradation information is contained in the sensor data only after a fault has occurred; in the normal operation stage before the fault it is hardly present, so attention must be paid to the operating cycles in the input sequence in which the fault occurs. Because dilated (hole) convolution uses the dilation rate to enlarge the spacing between the values processed by the convolution kernel, it enlarges the receptive field and facilitates the extraction of long-range information; two dilated convolutions are therefore used to extract the important information from the output of the CNN module. To keep the output size after convolution unchanged, the kernel sizes, strides and padding are configured the same as in the CNN module, the number of convolution kernels equals the feature dimension of the input data, and the dilation rate is 3. The result is then normalized with a Softmax function to obtain the self-attention weight β_S of the input sequence. β_S is dot-multiplied with the previously obtained α_N-weighted S × N features to obtain operating cycles of different importance, i.e., the output of the SAM module is β_S · (α_N · (S × N)). The larger this value, the more important the operating cycle at that point in time is for the fault diagnosis of the aircraft engine.
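For illustration only, the following is a minimal C sketch of the feature-attention branch described above (the two perceptrons are omitted for brevity, and the array sizes are illustrative assumptions, not the application's code): global max pooling and global average pooling over the sequence axis are summed element-wise, normalized with Softmax to give α_N, and used to re-weight the CNN output.

    #include <math.h>

    #define S_LEN 28   /* output sequence length (illustrative) */
    #define N_FEA 128  /* feature dimension = number of kernels */

    /* Feature-attention branch of the SAM module (perceptrons omitted):
       global max/avg pooling over the sequence axis, element-wise sum,
       Softmax, then re-weighting of the CNN output.                     */
    void sam_feature_attention(const float cnn_out[S_LEN][N_FEA],
                               float weighted[S_LEN][N_FEA],
                               float alpha[N_FEA])
    {
        float pooled[N_FEA];
        float denom = 0.0f;

        for (int n = 0; n < N_FEA; n++) {
            float maxv = cnn_out[0][n];
            float sum  = 0.0f;
            for (int s = 0; s < S_LEN; s++) {
                if (cnn_out[s][n] > maxv) maxv = cnn_out[s][n];
                sum += cnn_out[s][n];
            }
            /* element-wise sum of the max-pooled and average-pooled results */
            pooled[n] = maxv + sum / S_LEN;
        }

        for (int n = 0; n < N_FEA; n++) denom += expf(pooled[n]);
        for (int n = 0; n < N_FEA; n++) alpha[n] = expf(pooled[n]) / denom;  /* Softmax */

        /* dot-multiply the attention weights with the CNN output */
        for (int s = 0; s < S_LEN; s++)
            for (int n = 0; n < N_FEA; n++)
                weighted[s][n] = alpha[n] * cnn_out[s][n];
    }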
The LSTM module is connected after the SAM module. It has one LSTM layer in which each gate has 128 neurons and the activation function is Tanh, so that the fault diagnosis network model can learn the temporal information in the data features. To reduce the sensitivity of the model to small changes in the data, a Dropout layer with a node output ratio of 0.2 is added after the LSTM layer; 20% of the neurons in the network are given zero weight to prevent overfitting and further improve the accuracy of the intelligent fault diagnosis model.
Finally, a fully connected layer is connected after the LSTM module to perform regression analysis on the information extracted by the preceding modules. Since the intelligent fault diagnosis model has multi-scale inputs, the analysis results of the different parts are summed and averaged as the final fault diagnosis result. Each neuron in the fully connected layer is calculated as follows:
y_d^l = F( Σ_{c=1}^{z_{l-1}} w_{cd}^{l-1} · y_c^{l-1} + b_d^l )

where y_d^l is the output of the d-th neuron of the l-th layer, F is the activation function, z_{l-1} is the number of neurons in layer l-1, w_{cd}^{l-1} is the weight connecting the c-th neuron of layer l-1 to the d-th neuron of layer l, y_c^{l-1} is the output of the c-th neuron of layer l-1, and b_d^l is the bias of the d-th neuron of layer l.
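As a simple illustration of the neuron calculation above, the following C sketch computes the output of one fully connected neuron; the function and parameter names are illustrative, not part of the application.

    #include <stddef.h>

    /* Output of the d-th neuron of layer l:
       y_l[d] = F( sum_c w[c][d] * y_prev[c] + b[d] )         */
    float dense_neuron(const float *y_prev, size_t z_prev,
                       const float *w_d,   /* weights w[c][d] for fixed d */
                       float b_d,
                       float (*F)(float))  /* activation function */
    {
        float acc = b_d;
        for (size_t c = 0; c < z_prev; c++)
            acc += w_d[c] * y_prev[c];
        return F(acc);
    }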
Step 5, the training set and the verification set in the fourth fault data from step 3 are taken as the input data of the intelligent fault diagnosis model of step 4. During model training, sparse classification cross entropy is used as the loss function, Adam as the optimizer, and the fault diagnosis accuracy as the evaluation index of the trained model. Multiple rounds of model training and hyper-parameter adjustment are carried out to determine the structural depth of the network, the number, size and stride of the convolution kernels, the number of neurons and the activation function, yielding a preliminarily optimized intelligent fault diagnosis model.
The loss function in model training uses sparse classification cross entropy, and the formula is as follows:
Loss = -(1/m) Σ_{e=1}^{m} Σ_{k} p_{ek} · log(q_{ek})

where m is the total number of samples, k is the number of fault classifications, p_{ek} is the true probability that the e-th sample has a fault of class k, and q_{ek} is the predicted probability that the e-th sample has a fault of class k.
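For illustration only, a minimal C sketch of this loss, assuming integer fault labels (so p_ek is 1 only for the true class of each sample) and a flattened prediction matrix; the names are ours, not the application's.

    #include <math.h>

    /* Sparse classification cross entropy: labels are integer class ids,
       so only the predicted probability of the true class contributes.  */
    float sparse_categorical_cross_entropy(const float *q,   /* q[e*k_cls + c] */
                                           const int *labels, /* true class per sample */
                                           int m, int k_cls)
    {
        float loss = 0.0f;
        for (int e = 0; e < m; e++) {
            float q_true = q[e * k_cls + labels[e]];
            if (q_true < 1e-7f) q_true = 1e-7f;   /* avoid log(0) */
            loss += -logf(q_true);
        }
        return loss / m;
    }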
TABLE 1 Partial hyper-parameter adjustment space of the DNN-based intelligent fault diagnosis model
The optimizer uses Adam, and the accuracy is used as the evaluation index of the trained model. With these settings in place, the model is compiled and trained through repeated iterations to determine the following hyper-parameters:
the structural depth of the intelligent fault diagnosis network, the number, size and stride of the convolution kernels, the Dropout proportionality coefficient, the number of neurons, the activation function, the number of iterations, the batch size fed to the model each time, the smoothing strength of the weighted smoothing processing, and the width of the input data. The adjustment space of some of the hyper-parameters is shown in Table 1.
TABLE 2 Partial results of random-search hyper-parameter tuning
During training, the loss values of the test set and the verification set at each iteration can be observed or plotted to determine whether the model is over-fitted, and combined with the accuracy analysis of each fault classification result it is decided whether retraining and parameter adjustment are needed, so that the fault classification result becomes the best through repeated debugging. A random search algorithm is run 120 times, and the 3 runs with the best tuning effect are selected, as shown in Table 2. MSE is the mean square error of the model on the verification set, α is the smoothing strength of the exponentially weighted smoothing, S is the number of convolution kernels in the CNN module, epochs is the number of iterations, M_n is the number of neuron nodes in the first layer of the perceptron, D_o is the node output fraction of the Dropout layer, L_n is the number of neurons of the LSTM module, D_n is the number of neuron nodes in the fully connected layer, batch_size is the batch size, and S is the length of the input sequence. As can be seen from Table 2, the MSE is smallest in the 97th sampled training run, and the corresponding hyper-parameters are the optimal model parameters within the random search budget.
Step 6, the test set in the fourth fault data from step 3 is used as the input of the preliminarily optimized intelligent fault diagnosis model of step 5 for testing. According to the output result, the procedure returns to step 5 to modify the network structure and adjust the parameters until the output of the model meets the expected error acceptance criterion; the expected optimal intelligent fault diagnosis model is finally obtained and its weight parameters are saved.
If the classification result on the test set is not ideal and cannot meet the expected error acceptance criterion, the model training still has problems, which may be under-fitting or over-fitting. This application evaluates the accuracy of the intelligent fault diagnosis model with K-fold cross validation; in particular, testing the performance of the trained model on new data can reduce over-fitting to a certain extent, and it also allows the model to learn as much useful information as possible from limited data. K-fold cross validation divides the whole data set into r parts; during training, one part is taken out in turn (without repetition) as the validation set and the remaining r-1 parts are used as the actual training set of the model, after which the mean square error of the intelligent fault diagnosis model on the validation set is computed. MSE_u denotes the mean square error of the intelligent fault diagnosis model on the u-th validation set. Finally, the MSE_u of all r parts are averaged to obtain the final mean square error O_r, calculated as:

O_r = (1/r) Σ_{u=1}^{r} MSE_u
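A minimal C sketch of the K-fold evaluation loop described above; train_and_evaluate() is a hypothetical routine (not part of the application) that trains on the r-1 remaining folds and returns MSE_u on the held-out fold u.

    /* K-fold cross validation: each of the r folds is used once as the
       validation set; the final score O_r is the mean of the per-fold MSE. */
    float k_fold_mse(int r,
                     float (*train_and_evaluate)(int validation_fold, int r))
    {
        float o_r = 0.0f;
        for (int u = 0; u < r; u++)
            o_r += train_and_evaluate(u, r);   /* MSE_u on the u-th validation fold */
        return o_r / r;
    }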
step 7, carrying out hardware IP core design based on FPGA on the optimal intelligent fault diagnosis model in the step 6 in design software; the hardware IP core design comprises code design of a convolution calculation IP core, a pooling calculation IP core, an activation function IP core and a full connection calculation IP core.
The EDA design software mainly used in the method is a Vivado HLS tool, and the design of each calculation IP core of the intelligent fault diagnosis model can be carried out by using C language.
Designing a convolution calculation IP core, defining a convolution kernel function, declaring a weight array, an input array, an output array and a bias array, and writing a convolution algorithm code.
The pooling calculation IP is designed to implement 2×2 max pooling. Buffer arrays are declared to hold intermediate comparison results: using three buffer arrays, the input data are compared pairwise and the results stored in the first two buffers; a second comparison is then performed in the next clock cycle and the final result is stored in the third buffer array.
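A minimal C sketch of this two-stage comparison scheme, assuming each call processes one row pair of non-overlapping 2×2 windows; the array sizes and names are illustrative, not the application's IP core code.

    #define POOL_W 4   /* number of 2x2 windows processed per call (illustrative) */

    /* 2x2 max pooling using three buffer arrays: the first comparison stage
       fills buf1/buf2, the second stage writes the maximum into buf3.       */
    void max_pool_2x2(const float in0[2 * POOL_W], const float in1[2 * POOL_W],
                      float out[POOL_W])
    {
        float buf1[POOL_W], buf2[POOL_W], buf3[POOL_W];

        for (int i = 0; i < POOL_W; i++) {
            /* first comparison stage (one clock cycle in hardware) */
            buf1[i] = (in0[2*i] > in0[2*i + 1]) ? in0[2*i] : in0[2*i + 1];
            buf2[i] = (in1[2*i] > in1[2*i + 1]) ? in1[2*i] : in1[2*i + 1];
            /* second comparison stage (next clock cycle in hardware) */
            buf3[i] = (buf1[i] > buf2[i]) ? buf1[i] : buf2[i];
            out[i]  = buf3[i];
        }
    }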
And designing an activation function IP, mainly declaring a cache array for storing comparison results, comparing input array elements with 0 one by one, and storing the results into the cache array.
And then designing a full-connection calculation IP, declaring an input array, an output array, a weight array and a bias array used by a full-connection layer, and writing a corresponding full-connection layer algorithm code.
AXI interface declaration code is added during code writing, so that an IP core with an AXI interface is generated automatically during HLS synthesis and the parameters of each IP core function can be transferred over the AXI bus interface. HLS preprocessing directives are added to the core computation or comparison code in each IP core, and array partition directives are added to the declared arrays, so that HLS realizes pipelining and parallelization of the computation during synthesis.
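For illustration, a hedged Vivado HLS style sketch of such interface and optimization directives on a convolution IP core skeleton; the port names, bundle names, array sizes and partitioning factors are assumptions, not the application's actual code.

    /* Illustrative Vivado HLS directives for a convolution IP core:
       AXI interfaces for the function arguments, array partitioning for an
       on-chip buffer, and pipelining of the innermost computation.         */
    void conv_ip(float input[64][32][32], float weights[64][64][3][3],
                 float bias[64], float out[64][30][30])
    {
    #pragma HLS INTERFACE m_axi     port=input   offset=slave bundle=DATA
    #pragma HLS INTERFACE m_axi     port=weights offset=slave bundle=DATA
    #pragma HLS INTERFACE m_axi     port=bias    offset=slave bundle=DATA
    #pragma HLS INTERFACE m_axi     port=out     offset=slave bundle=DATA
    #pragma HLS INTERFACE s_axilite port=return  bundle=CTRL

        float line_buf[64][3][32];
    #pragma HLS ARRAY_PARTITION variable=line_buf complete dim=1

        /* initialise the outputs with the bias; the multiply-accumulate
           loop nest shown in step 8 would follow here                   */
        for (int cho = 0; cho < 64; cho++)
            for (int row = 0; row < 30; row++)
                for (int col = 0; col < 30; col++) {
    #pragma HLS PIPELINE II=1
                    out[cho][row][col] = bias[cho];
                }
    }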
Step 8, performing hardware accelerated optimization design based on cyclic expansion on the hardware IP core in the step 7, and verifying and synthesizing to obtain an optimized circuit file; the hardware acceleration optimization design based on the loop unrolling comprises the steps of performing loop unrolling on computation codes on a convolutional layer, an activation function, a full connection layer and a pooling layer, and optimizing corresponding computation circuit designs.
For the convolutional layer, the convolution operation can be regarded as a multiply-accumulate operation consisting of 6 nested for loops. The loop variables are cho, chi, row, col, kr and kc, where cho is the output feature map channel and chout is the maximum number of output feature map channels; chi is the input feature map channel and chin is the maximum number of input feature map channels; row and col are the row and column of the output feature map, and R and C are the maximum numbers of output feature map rows and columns; kr and kc are the height and width of the convolution kernel (the convolution kernels used in this application have equal width and height, and K is the maximum kernel height or width). The three arrays taking part in the operation are the three-dimensional array out[chout][R][C] (output data), the four-dimensional array weights[chout][chin][K][K] (weight parameters), and the three-dimensional array input[chin][S*R+K][S*C+K] (input data, where S denotes the convolution stride).
The computation code for the convolutional layer is as follows:
for (cho = 0; cho < chout; cho++) {
  for (chi = 0; chi < chin; chi++) {
    for (row = 0; row < R; row++) {
      for (col = 0; col < C; col++) {
        for (kr = 0; kr < K; kr++) {
          for (kc = 0; kc < K; kc++) {
            out[cho][row][col] += weights[cho][chi][kr][kc] *
                input[chi][S*row + kr][S*col + kc];
}}}}}}
The computation code shows that in a single iteration of the convolutional layer loop, the current output value is the output of the previous iteration plus the product of the current input value and the current weight parameter, i.e., one multiply-accumulate operation. To complete all computations of the convolutional layer, the 6 nested for loops must be executed in full, i.e., a total of chout × chin × R × C × K × K multiply-accumulate operations determined by the output feature map channels, input feature map channels, output feature map size and convolution kernel size.
For the multi-level nested for loops of the convolution computation, loop unrolling can be applied selectively so that the convolutional layer performs parallel computation when implemented in hardware, increasing the computation speed of the FPGA-based intelligent fault diagnosis platform. However, loop unrolling also means that more hardware resources are required. If the loop body of the convolution operation were fully unrolled, on the order of 10^5 or even 10^6 copies of the computation circuit would be required. This is infeasible with the logic resources of low-end FPGA chips, and even on a high-end FPGA it would occupy a large portion of the logic resources. Therefore, the characteristics of the algorithm, the task characteristics, the implementation difficulty, the resource consumption and the computation speed must all be considered together.
If the convolution kernel size loops are unrolled, the kernel width, height, or both can be unrolled, which essentially computes the elements of the convolution kernel in parallel. However, the parallelism gained in this way is limited by the kernel size. For example, for a kernel of size 3 the design space for parallel computation is small; that is, the parallelism obtained by unrolling the kernel size loops is very limited and cannot bring a significant increase in computation speed.
Similarly, if the input feature map size loops are unrolled, the design space for parallel computation is also limited, because the fault data involved in this application have equal width and height and the sizes are relatively small.
Unrolling the output feature map size loops essentially computes the elements of the convolutional layer's output feature map in parallel, i.e., several output feature map elements are obtained at once. A wide range of parallelism can be chosen, but only in the shallower convolutional layers of the network: both the convolutional layers and the pooling layers reduce the size of the output feature map, so as the network gets deeper the available parallelism of this method shrinks and eventually disappears. As the selectable parallelism changes, the corresponding hardware circuit must also change, which would require designing hardware circuits with several different degrees of parallelism to unroll the output feature map size loops of different convolutional layers; such a design is time-consuming, labor-intensive and inefficient.
Unrolling the input feature map channel loop and the output feature map channel loop is the most suitable loop unrolling for the convolutional layer. Owing to the structure of convolutional neural networks, the number of output feature map channels grows as the convolutional layers get deeper, and because the pooling layers only reduce the output feature map size without affecting the number of channels, the number of output channels of one layer equals the number of input channels of the next convolutional layer. The numbers of input and output feature map channels therefore increase gradually with depth, so the selection space for parallelism grows rather than shrinking the way the output feature map size does. For most convolutional neural networks, even the first convolutional layer usually has 16/32/64 output channels, a relatively large value, so a larger degree of parallelism can be selected. Although the number of input feature map channels of the first layer may be only 1 or 3, the number of input channels of the second layer equals the number of output channels of the first layer, so from the second layer onward the input channel count is also relatively large and its loop can be unrolled together with the output channel loop, i.e., a high degree of parallelism can be realized from the second layer onward. Only the first convolutional layer therefore needs special treatment: the first layer unrolls only the output feature map channel loop, while from the second layer onward both the input and output feature map channel loops are unrolled.
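For illustration, a hedged C sketch of this channel-loop unrolling under the assumption of stride 1 and a parallelism of 8 on both the input and output channel loops; the sizes and pragma placement are illustrative, and out is assumed to be zero- or bias-initialised by the caller.

    #define CHOUT 64
    #define CHIN  32
    #define R_OUT 16
    #define C_OUT 16
    #define K      3
    #define PO     8   /* parallelism on output channels (illustrative) */
    #define PI     8   /* parallelism on input channels  (illustrative) */

    /* Convolution (stride 1) with the input- and output-channel loops moved
       innermost and unrolled, so that HLS instantiates PO x PI multipliers
       working in parallel on each kernel position.                          */
    void conv_unrolled(float out[CHOUT][R_OUT][C_OUT],
                       const float weights[CHOUT][CHIN][K][K],
                       const float input[CHIN][R_OUT + K][C_OUT + K])
    {
        for (int cho0 = 0; cho0 < CHOUT; cho0 += PO)
         for (int chi0 = 0; chi0 < CHIN; chi0 += PI)
          for (int row = 0; row < R_OUT; row++)
           for (int col = 0; col < C_OUT; col++)
            for (int kr = 0; kr < K; kr++)
             for (int kc = 0; kc < K; kc++)
              for (int cho = cho0; cho < cho0 + PO; cho++) {
    #pragma HLS UNROLL
               for (int chi = chi0; chi < chi0 + PI; chi++) {
    #pragma HLS UNROLL
                out[cho][row][col] += weights[cho][chi][kr][kc] *
                                      input[chi][row + kr][col + kc];
               }
              }
    }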
The main operation of the convolutional layer's loop body is: read the corresponding elements of the input array and the weights array and multiply them, read the corresponding element of the out array, add it to the product, and write the result back to the corresponding position of the out array. Therefore, to realize the parallel multiply-accumulate of multiple groups of data after the input and output feature map channel loops are unrolled, a multiply-accumulate array must be designed, as shown in fig. 4.
FIG. 4 shows the multiply-accumulate array designed in this application to realize parallel computation over the input and output feature map channels of the convolutional layer. If the loops are unrolled with a parallelism of 8, the inputs of one multiply-accumulator are 8 elements from the input data and 8 elements from the weight parameters. The data pass through 8 multipliers and 7 adders to produce the convolution result for one output feature map channel. These multipliers and adders form a multiply-accumulate PE unit; to realize parallel convolution over the output feature map channels, a multiply-accumulate array composed of multiple PE units is required.
In fig. 4, the part containing the adders and multipliers is called a Processing Element (PE) unit of the multiply-accumulate array. Assuming the input channel loop is unrolled with a parallelism of 8, the inputs of the multiply-accumulator in one PE are 8 elements each from the input array and the weights array. To compute the multiply-accumulate in parallel, the PE uses 7 adders to sum the 8 products. Each PE unit completes the convolution of several input feature map channels for one output feature map channel; to compute several output feature map channels in parallel, a multiply-accumulate array composed of multiple PEs is required.

In the array, the storage module holding the input feature map input must deliver the elements of multiple input channels to multiple PE units at the same time, i.e., every PE receives the same input feature map data, so the input feature map storage is split by input channel. For the storage module holding the weight parameters weights, each PE is responsible for the parallel computation of one output channel, so the weights delivered to each PE must be those of its own output channel. The storage module for weights[chout][chin][K][K] must therefore be split along its first and second dimensions simultaneously, i.e., it is instantiated as chout × chin single-port RAMs. This provides each PE with the weights of its multiple input channels and assigns each PE the weights of its own output channel.

Similarly, the storage module of the output feature map must also be split. Unlike the input feature map and the weights, which are only read, the output feature map array out must be read and written at the same time, so its split storage is implemented by instantiating multiple dual-port RAMs rather than single-port RAMs. The output feature map storage module of the convolutional layer is also the input feature map storage module of the subsequent activation function; since the activation function does not change the number of channels, the output storage module of the activation function is kept consistent with the storage module of the convolutional layer.
From the above analysis, the weight storage requires the most read operations per loop iteration, since each multiplier of every PE unit must be fed a corresponding weight parameter. Therefore the single-port RAMs of the weight storage are instantiated with LUT resources of the FPGA chip, while Block RAM resources are used to instantiate the single-port and dual-port RAMs for the input feature map and output feature map storage, respectively.
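A minimal C model of one PE unit of the multiply-accumulate array, assuming an input-channel parallelism of 8; in hardware the eight products are reduced by an adder tree of seven adders, which the code mirrors explicitly. The names are illustrative.

    #define P_IN 8   /* input-channel parallelism of one PE (illustrative) */

    /* One processing element (PE): 8 multipliers followed by an adder tree
       of 7 adders, producing one partial sum for its output channel.       */
    float pe_multiply_accumulate(const float in[P_IN], const float w[P_IN])
    {
        float prod[P_IN];
        for (int i = 0; i < P_IN; i++)
            prod[i] = in[i] * w[i];            /* 8 multipliers            */

        /* adder tree: 4 + 2 + 1 = 7 adders                                 */
        float s0 = prod[0] + prod[1];
        float s1 = prod[2] + prod[3];
        float s2 = prod[4] + prod[5];
        float s3 = prod[6] + prod[7];
        float t0 = s0 + s1;
        float t1 = s2 + s3;
        return t0 + t1;
    }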
For the activation function, assume that the activation function is v (x), and its calculation code is as follows:
for (cho = 0; cho < chout; cho++) {
  for (row = 0; row < R; row++) {
    for (col = 0; col < C; col++) {
      out[cho][row][col] = v(input[cho][row][col]);
}}}
The computation code shows that for the activation function v(x), the output data out[cho][row][col] of a single iteration is the value of v(x) applied to the input data input[cho][row][col]. To complete all computations of the activation function, a total of chout × R × C loop iterations must be executed, determined by the output feature map channels and the output feature map size.
Because the ReLU activation function only involves a comparison with the value 0, the hardware only needs a comparator; unlike other activation functions, no complicated exponential or logarithmic calculation has to be realized with lookup tables. The ReLU activation function is therefore mainly used in this application, and according to its characteristics the output feature map channel loop can be unrolled so that the activation function is processed in parallel per channel. The input feature map storage module of the activation function is the output feature map storage module of the convolutional layer; since the activation function does not change the number of channels, the output storage module of the activation function is kept consistent with the storage module of the convolutional layer.
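For illustration, a hedged HLS style C sketch of the ReLU computation with the output-channel loop partially unrolled; the sizes and the unroll factor are illustrative assumptions.

    #define A_CHOUT 64
    #define A_R     16
    #define A_C     16

    /* ReLU activation with the output-channel loop partially unrolled,
       so that several comparators work on several channels in parallel. */
    void relu_unrolled(float out[A_CHOUT][A_R][A_C],
                       const float in[A_CHOUT][A_R][A_C])
    {
        for (int row = 0; row < A_R; row++)
            for (int col = 0; col < A_C; col++)
                for (int cho = 0; cho < A_CHOUT; cho++) {
    #pragma HLS UNROLL factor=8
                    out[cho][row][col] = (in[cho][row][col] > 0.0f)
                                             ? in[cho][row][col]
                                             : 0.0f;
                }
    }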
The calculation code for the fully-connected layer is as follows:
for (cho = 0; cho < chout; cho++) {
  for (chi = 0; chi < chin; chi++) {
    out[cho] += weights[cho][chi] * input[chi];
}}
The computation code shows that in a single iteration of the fully connected layer, the current output value out[cho] is the previous output value plus the product of the layer's weight parameter weights[cho][chi] and the current input value input[chi]. A total of chout × chin loop iterations, determined by the output feature map channels and the input feature map channels, completes all computations of the fully connected layer.
Compared with the computation code of the convolutional layer, the loop nest of the fully connected layer only has an output feature map channel loop and an input feature map channel loop. It can be seen that input[chi] and out[cho] of the fully connected layer are one-dimensional; the data entering the fully connected layer are already one-dimensional, so the essence of the fully connected layer is to transform the array to the required size by matrix multiplication and output the result. Referring to the discussion of convolutional layer unrolling, for the fully connected layer this application chooses to unroll the output feature map channel loop.
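A hedged HLS style C sketch of the fully connected layer with the output-channel loop unrolled; the sizes are illustrative (six output neurons matching the six fault labels is an assumption, not a statement of the application's configuration).

    #define FC_OUT 6     /* number of fault classes (illustrative)  */
    #define FC_IN  128   /* length of the flattened input vector    */

    /* Fully connected layer: matrix-vector product with the output-channel
       loop unrolled so that all output neurons are computed in parallel.   */
    void fc_unrolled(float out[FC_OUT],
                     const float weights[FC_OUT][FC_IN],
                     const float input[FC_IN])
    {
        for (int cho = 0; cho < FC_OUT; cho++) {
    #pragma HLS UNROLL
            float acc = 0.0f;
            for (int chi = 0; chi < FC_IN; chi++)
                acc += weights[cho][chi] * input[chi];
            out[cho] = acc;
        }
    }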
After the convolution calculation, a large amount of feature data is generated, and if the feature data is directly transmitted to a full connection layer, the data learning speed is low, and an overfitting phenomenon is easy to occur. Therefore, a pooling layer is needed to down-sample the output data of the convolutional layer, so as to reduce the data and parameter amount and prevent the over-fitting phenomenon while ensuring the data characteristics.
The calculation code for the pooling layer is as follows:
for (row = 0; row < R; row++) {
  for (col = 0; col < C; col++) {
    for (cho = 0; cho < chout; cho++) {
      temp1 = /* maximum of the front half of the pooling window */;
      temp2 = /* maximum of the back half of the pooling window */;
      temp3 = (temp1 > temp2) ? temp1 : temp2;
      out[cho][row][col] = temp3;
}}}
The computation code shows that this application uses max pooling. In a single iteration of the pooling layer, the current output value out[cho][row][col] is the maximum of the values covered by the pooling window. A total of R × C × chout loop iterations, determined by the output feature map size and the output feature map channels, completes all computations of the pooling layer.
Since the output feature map size changes continuously as the pooling layers get deeper, this application, referring to the discussion of convolutional layer loop unrolling, chooses to unroll the output feature map size loops for the pooling layer.
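For illustration, a hedged HLS style C sketch of 2×2 max pooling with the output feature map size loops partially unrolled; the sizes and unroll factors are illustrative assumptions.

    #define P_CHOUT 64
    #define P_R      8   /* output rows after 2x2 pooling (illustrative) */
    #define P_C      8   /* output columns after 2x2 pooling             */

    /* 2x2 max pooling with the output-feature-map loops partially unrolled,
       so that several output positions are compared in parallel.            */
    void pool_unrolled(float out[P_CHOUT][P_R][P_C],
                       const float in[P_CHOUT][2 * P_R][2 * P_C])
    {
        for (int row = 0; row < P_R; row++) {
    #pragma HLS UNROLL factor=2
            for (int col = 0; col < P_C; col++) {
    #pragma HLS UNROLL factor=2
                for (int cho = 0; cho < P_CHOUT; cho++) {
                    float a = in[cho][2*row][2*col];
                    float b = in[cho][2*row][2*col + 1];
                    float c = in[cho][2*row + 1][2*col];
                    float d = in[cho][2*row + 1][2*col + 1];
                    float m1 = (a > b) ? a : b;   /* max of the front half */
                    float m2 = (c > d) ? c : d;   /* max of the back half  */
                    out[cho][row][col] = (m1 > m2) ? m1 : m2;
                }
            }
        }
    }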
The designed IP cores are then verified and synthesized; verification includes C-code functional verification and verification after IP core synthesis. For the C-code verification, a main function for verification, i.e., a testbench function, is written, the IP cores designed in step 7 are called according to the previously designed neural network model to assemble the corresponding model, and the fault data to be processed and the saved model parameters are imported for verification. After the C code passes functional verification, the IP cores are synthesized and C/RTL co-simulation is carried out; after this verification, a circuit file ready for back-end design is obtained.
And 9, performing chip rear end design work such as testability design, layout planning, clock tree synthesis, wiring, parasitic parameter extraction, layout physical verification and the like on the circuit file in the step 8, and realizing the intelligent fault diagnosis chip of the aircraft engine with high fault diagnosis accuracy and high calculation speed.
The previous steps mainly complete the design and simulation work of the front end of the chip, and verify the functional logic of the intelligent fault diagnosis model and the performance indexes of the intelligent fault diagnosis model when the intelligent fault diagnosis model runs on the FPGA, including fault diagnosis accuracy rate, calculation speed and the like. However, the ASIC is very different from the FPGA, and if influences of parasitic capacitance and inductance are considered, a back end design work of the chip needs to be completed to ensure that functions and performance indexes of the chip after tape out meet expected design requirements.
The chip back end Design mainly comprises Design For Test (DFT), layout planning, clock Tree Synthesis (CTS), wiring (Place & Route), parasitic parameter extraction and layout physical verification.
Wherein, the design for testability is to insert scan chains into the circuit file in step 8, change non-scan units (such as registers) into scan units, and use a software tool of DFT Compiler from Synopsys;
the layout planning is to place macro-unit modules required by a chip in a circuit file, determine the placement positions of various functional circuits on the whole, such as an IP module, an RAM, I/O pins and the like, and directly influence the final area of the chip, wherein the used software tool is Astro of Synopsys company;
the clock tree synthesis is to reasonably complete the wiring of the clock in the circuit file, and to make the clock symmetrically connected to each register unit, so that when the clock reaches each register from the same clock source, the clock delay difference is minimum, and the used software tool is the Physical Compiler of Synopsys company;
the wiring is to complete common signals in a circuit file, and comprises the wiring among various standard units (basic logic gate circuits), and the used software tool is Astro of Synopsys company;
because wire resistance, mutual inductance between adjacent wires, coupling capacitance and the like generate signal noise, crosstalk and reflection inside the chip, causing signal voltage fluctuations and, in severe cases, signal distortion errors, the parasitic parameters must be extracted and analyzed and verified again to solve the signal integrity problems; the software tool used is Star-RCXT from Synopsys;
the layout physical verification is to perform functional and time sequence verification on a physical layout for completing wiring, and comprises comparison verification of a gate level circuit diagram after layout and logic synthesis, design rule check (checking whether the wiring distance, the wiring width and the like meet chip process requirements or not) and electrical rule check (checking electrical rule violations such as short circuit, open circuit and the like), wherein a used software tool is Hercules of Synopsys company.
After the design of the rear end of the chip is finished, the file of the GDS II can be delivered to a chip factory to finish the manufacture, packaging and test of the chip, and the intelligent fault diagnosis chip of the aircraft engine, which is designed by the embodiment, is high in fault diagnosis accuracy and high in calculation speed.
With reference to fig. 5, a structure diagram of an intelligent fault diagnosis chip of an aircraft engine is shown, and the method for diagnosing the fault of the aircraft engine based on the intelligent chip technology provided by the application specifically comprises the following operation steps:
firstly, monitoring parameters of the aircraft engine and weight parameters of the intelligent fault diagnosis model are stored in an SD card in advance (wherein the monitoring parameters can also be transmitted to an intelligent fault detection chip in real time through a connecting wire). The CPU reads the parameters from the SD card and stores the parameters into the off-chip DDR memory to complete the initialization operation. According to a data transmission path of the intelligent fault diagnosis chip, the CPU reads required weight parameters from the DDR memory, simultaneously reads monitoring parameters of the aircraft engine and completes data preprocessing work, and then the monitoring parameters are transmitted into the intelligent fault diagnosis chip of the aircraft engine to complete the whole DNN-based fault information calculation process. The output result of the chip can be written back to the DDR storage to be used as an important basis for maintenance personnel to judge whether the aircraft engine has faults or what kind of faults occur, and therefore rapid and accurate aircraft engine fault diagnosis is achieved.
If the intelligent fault diagnosis model needs to be updated and perfected subsequently, the CPU can be used for controlling and recalling each IP core, constructing a new deep neural network model and updating the related weight parameters of the optimal intelligent fault diagnosis model in the SD card, so that the function of the intelligent fault diagnosis chip can be updated quickly.
This application uses the original fault data generated during aircraft engine operation to train an intelligent fault diagnosis model based on a deep neural network. The data are first analyzed extensively and, taking the operating conditions of the system into account, preprocessed to improve the generalization ability of the intelligent fault diagnosis model. An intelligent aircraft engine fault diagnosis model based on CNN-SAM-LSTM is then established. The SAM module designed in this application attends to the relationships among the sensor parameters and the amount of degradation information under different operating conditions; it obtains different importance weights for the input data features to highlight the important degradation information in the sensor parameters and suppress unimportant information, and it also obtains importance weights for different states to highlight the degradation information when the system performance declines and reduce the influence of the normal state, improving the fault diagnosis accuracy of the model and further improving its generalization ability. After the optimal intelligent fault diagnosis model is obtained through repeated tests, the model weight parameters are saved for use by the hardware.

The hardware implementation of the intelligent fault diagnosis model is then completed on an FPGA platform. The algorithm code of the convolutional layer, activation function, fully connected layer and pooling layer is analyzed in detail, and a hardware acceleration optimization method based on loop unrolling is proposed, which increases the computation speed at a low resource cost. For the convolutional layer, combining the characteristics of the algorithm, the input and output feature map channel loops are selected for unrolling, and a dedicated multiply-accumulate array is designed to realize the parallel multiply-accumulate of the unrolled data; for the ReLU activation function and the fully connected layer, combining the characteristics of their computation code, the output feature map channel loop is selected for unrolling; for the pooling layer, combining the characteristics of the algorithm, the output feature size loops are selected for unrolling. Finally, the chip back-end design work is completed, realizing an aircraft engine intelligent fault diagnosis chip with high fault diagnosis accuracy and high computation speed. Compared with the prior art, this application can help maintenance personnel diagnose various faults of an aircraft engine accurately and quickly, reducing maintenance cost and time.
The above are specific embodiments of the present application. The embodiments described in this application are only for describing the preferred embodiments, and not for limiting the concept and scope of the embodiments of the present application, and various modifications and improvements made to the technical solutions of the present application by the engineers in the field shall fall within the protection scope of the present invention without departing from the design idea of the present application, and the technical content of the present application, which is claimed, is fully set forth in the claims.

Claims (10)

1. An aircraft engine fault diagnosis method based on an intelligent chip technology is characterized by comprising the following steps:
step 1, acquiring original fault data of an aircraft engine during operation, wherein the original fault data is acquired by a sensor; carrying out primary preprocessing on original fault data to obtain second fault data; the first preprocessing comprises visual analysis, fault label marking and data set division; the data set division comprises a training set, a testing set and a verification set;
step 2, performing data set distribution consistency analysis on the second fault data in the step 1 by utilizing kernel density estimation to obtain third fault data; the third fault data is removed from the data with inconsistent data characteristics, so that the data of the verification set and the test set in the step 1 are ensured to cover the key data characteristics in the data of the training set;
step 3, carrying out second preprocessing on the third fault data in the step 2 to obtain fourth fault data; the second preprocessing comprises normalization processing based on system operation conditions and exponential weighting smoothing processing;
step 4, designing an intelligent fault diagnosis model based on a deep neural network; the intelligent fault diagnosis model consists of two parts, wherein the first part adopts a first convolution kernel to extract local data characteristics, and the second part adopts a second convolution kernel to extract global data characteristics; each part comprises an input layer, a CNN module, a SAM module, an LSTM module and a full connection layer;
step 5, using the training set and the verification set in the fourth fault data in the step 3 as input data of the intelligent fault diagnosis model in the step 4, using sparse classification cross entropy as a loss function during model training, using Adam as an optimizer and using accuracy as an evaluation index of the training model, performing multiple times of model training and super-parameter adjustment, determining the structural depth of the network, the number, the size and the step length of convolution kernels, the number of neurons and an activation function, and obtaining a primarily optimized intelligent fault diagnosis model;
step 6, taking the test set in the fourth fault data of the step 3 as the input of the preliminarily optimized intelligent fault diagnosis model of the step 5, testing, returning to the step 5 again according to the output result to adjust the super parameters of the model until the output result of the model reaches the expected error standard acceptance criterion, finally obtaining the optimal intelligent fault diagnosis model, and storing the weight parameters of the optimal intelligent fault diagnosis model;
step 7, performing hardware IP core design based on FPGA on the optimal intelligent fault diagnosis model in the step 6 in design software; the hardware IP core design comprises code design of a convolution calculation IP core, a pooling calculation IP core, an activation function IP core and a full-connection calculation IP core;
step 8, performing hardware accelerated optimization design based on cyclic expansion on the hardware IP core in the step 7, and verifying and integrating to obtain an optimized circuit file; the hardware acceleration optimization design based on loop expansion comprises the steps of performing loop expansion of calculation codes on a convolution layer, an activation function, a full connection layer and a pooling layer, and optimizing corresponding calculation circuit design;
and 9, performing chip rear end design work of testability design, layout planning, clock tree synthesis, wiring, parasitic parameter extraction and layout physical verification on the circuit file in the step 8.
2. The aircraft engine fault diagnosis method based on the intelligent chip technology as claimed in claim 1, wherein: the specific steps in step 1 are as follows:
1.1, original fault data, namely various fault data collected by a plurality of different types of sensors, including temperature state parameters, pressure state parameters, rotating speed state parameters, bleed air flow parameters and lubricating oil detection parameters of an aircraft engine, and taking the parameters as intelligent fault diagnosis model input data based on a deep neural network;
1.2 visualization of data utilizes Python language, each sensor data of each fault is visually represented in a visualization form, the variation trend of the data is analyzed, parameters which do not influence the system performance are removed, and parameters containing fault information are reserved and used as input of model training;
1.3, setting a fault label for the data in the step 1.2 according to the fault type:
various kinds of fault labels are set in a digital coding form: [0, 1, 2, …, 5], wherein 0 represents the fault label when the aircraft engine operates normally, 1 represents the fault label when a temperature parameter of the aircraft engine is abnormal, 2 represents the fault label when a pressure state parameter of the aircraft engine is abnormal, 3 represents the fault label when a rotating speed state parameter of the aircraft engine is abnormal, 4 represents the fault label when a bleed air flow parameter of the aircraft engine is abnormal, and 5 represents the fault label when a lubricating oil detection parameter of the aircraft engine is abnormal;
1.4 the data set which is subjected to visual analysis and labeled with the fault label is divided, the data set is averagely divided into 10 parts, the data volume of each part is ensured to be the same, 6 parts are randomly selected from the data set to serve as a training set, 2 parts are randomly selected from the remaining 4 parts to serve as a verification set, and finally the remaining 2 parts are taken as a test set.
3. The aircraft engine fault diagnosis method based on the intelligent chip technology as claimed in claim 1, wherein: the kernel density estimation in step 2 estimates the probability density function of the given data, and the kernel density distribution analysis verifies the consistency of the data distribution in the divided verification set, test set and training set, so as to ensure the consistency of the data characteristics in each data set; given n data x_1, x_2, x_3, …, x_n, the probability density function of the data is f(x);
the formula of the kernel density function is:
f(x) = (1/(n·h)) Σ_{i=1}^{n} K((x - x_i)/h)

where x_i denotes the i-th data point, n is the amount of data, K(·) is the kernel function, and h is the bandwidth in the kernel density estimate.
4. The aircraft engine fault diagnosis method based on the intelligent chip technology as claimed in claim 1, wherein: the specific steps of performing the second preprocessing on the data in the step 3 are as follows:
3.1 classifying fault data under the same operation condition, and then carrying out normalization treatment, wherein the formula is as follows:
X_i = (x_i - min_{1≤j≤n}{x_j}) / (max_{1≤j≤n}{x_j} - min_{1≤j≤n}{x_j})

where x_i denotes the i-th data point of the data sample (x_1, x_2, x_3, …, x_n), X_i ∈ [0,1] is the dimensionless normalized value of x_i, x_j denotes the j-th data point of the data sample (x_1, x_2, x_3, …, x_n), max_{1≤j≤n}{x_j} is the maximum of x_1, x_2, x_3, …, x_n, and min_{1≤j≤n}{x_j} is the minimum of x_1, x_2, x_3, …, x_n;
3.2 because the original fault data contains a lot of noises, the data needs to be subjected to exponential weighting smoothing processing, and noise interference in the data is filtered, and the formula is as follows:
G(t)=α*g(t)+(1-α)*G(t-1)
where G(t) is the value obtained after smoothing g(t) at the current moment, g(t) is the data value currently acquired by a sensor, G(t-1) is the smoothing result at the previous moment, and α is the smoothing strength, with a value between 0 and 1.
5. The aircraft engine fault diagnosis method based on the intelligent chip technology as claimed in claim 1, wherein: the intelligent fault diagnosis model design structure based on the deep neural network in the step 4 is as follows:
4.1 the input layer of the model takes all sensor data of t cycles of the system as the input of the model;
4.2 after the input layer, connecting and combining a convolutional neural network CNN module, wherein the convolutional neural network CNN module is formed by combining two one-dimensional convolutional layers and a maximum pooling layer, a Dropout layer is added between each convolutional layer for preventing overfitting, and the output coefficient is 0.3; the configuration parameters of the two convolution layers are the same, in order to ensure that the output characteristic size after convolution operation is unchanged, the height of a convolution kernel is 3, the number of the convolution kernels is 128, the moving step length is 1, and all-zero padding is used; the activation function uses the ReLU to realize nonlinear mapping of different characteristics; the size of the pooling layer in each combination is 2, so that the size of the output characteristic is halved; the output size of the last module is S x N, wherein N is a characteristic dimension and is equal to the number of convolution kernels of the last convolution layer; s is the length of the output sequence, and the calculation formula is as follows under the condition that the convolution kernel moving step is 1:
S=W-H+1
wherein W is the sequence length of the input data, and H is the height of the convolution kernel;
4.3, the output data of the CNN module are input into the SAM module; first, the output of the CNN module is integrated into global features using global max pooling and global average pooling, and the size after global pooling is N; then the same two-layer perceptrons are used to learn the N features output by the global pooling and to obtain the relationships and information content among them, and to ensure that the perceptron output size is still N, the number of neurons of the second layer must equal the output dimension of the CNN module; then the outputs of the two perceptrons are combined by element-wise summation and normalized through a Softmax function to obtain the attention weights α_N of the different features; secondly, because the dilated (hole) convolution uses the dilation rate to enlarge the spacing between the values processed by the convolution kernel, enlarging the receptive field and facilitating the extraction of long-range information, two dilated convolutions are used to extract the important information from the output of the CNN module; to keep the output size after convolution unchanged, the kernel sizes, strides and padding are configured the same as in the CNN module, the number of convolution kernels equals the feature dimension of the input data, and the dilation rate is 3; then normalization through a Softmax function gives the self-attention weight β_S of the input sequence; β_S is then dot-multiplied with the previously obtained α_N-weighted S × N features to obtain operating cycles of different importance;
4.4, connecting the LSTM module with the SAM module; the LSTM module is provided with an LSTM layer, the number of neurons of each gate of the LSTM layer is 128, and an activation function is Tanh, so that the fault diagnosis network model can learn time information in data characteristics in time; in order to reduce the sensitivity of the prediction model to the tiny change of data, a Dropout layer with a node output coefficient of 0.2 is added behind an LSTM layer, and 20% of neurons in the LSTM layer are assigned with zero weight to prevent the model from being over-fitted and improve the accuracy of the intelligent fault diagnosis model;
4.5 finally, a fully connected layer is connected after the LSTM module to realize regression analysis of the information extracted by the preceding modules; since the model has multi-scale inputs, the analysis results of the different parts are summed and averaged as the final fault diagnosis result; each neuron in the fully connected layer is calculated as follows:
y_d^l = F( Σ_{c=1}^{z_{l-1}} w_{cd}^{l-1} · y_c^{l-1} + b_d^l )

where y_d^l is the output of the d-th neuron of the l-th layer, F is the activation function, z_{l-1} is the number of neurons in layer l-1, w_{cd}^{l-1} is the weight connecting the c-th neuron of layer l-1 to the d-th neuron of layer l, y_c^{l-1} is the output of the c-th neuron of layer l-1, and b_d^l is the bias of the d-th neuron of layer l.
6. The aircraft engine fault diagnosis method based on the intelligent chip technology as claimed in claim 1, wherein: the specific steps of training the model, adjusting the hyper-parameters and obtaining the diagnosis model in the step 5 are as follows:
5.1, taking a training set and a verification set as the input of the intelligent fault diagnosis model, wherein the verification set is input into the model after the training of the training set is finished so as to verify the generalization capability of the model;
5.2 the loss function in model training uses sparse classification cross entropy, and the formula is as follows:
Loss = -(1/m) Σ_{e=1}^{m} Σ_{k} p_{ek} · log(q_{ek})

where m is the total number of samples, k is the number of fault classifications, p_{ek} is the true probability that the e-th sample has a fault of class k, and q_{ek} is the predicted probability that the e-th sample has a fault of class k;
5.3 the optimizer of model training uses Adam, and takes the accuracy as the network evaluation index;
5.4 the compiled model is trained through repeated iterations to determine the following hyper-parameters:
the method comprises the steps of calculating the structural depth of an intelligent fault diagnosis model based on a deep neural network, the number, the size and the step length of convolution kernels, a Dropout proportionality coefficient, the number of neurons, an activation function, iteration times, the batch times of putting the model in each time, the smoothing intensity of weighted smoothing processing and the width of input data;
5.5 during training, the loss values of the test set and the verification set are output at each iteration to determine whether the model is over-fitted, and combined with the accuracy analysis of each fault classification result it is decided whether the hyper-parameters of step 5.4 need to be adjusted, so that the fault classification result becomes the best through repeated debugging.
7. The aircraft engine fault diagnosis method based on the intelligent chip technology as claimed in claim 1, wherein: the specific steps in step 6 are as follows:
inputting the test set data into the preliminarily optimized intelligent fault diagnosis model to further verify the generalization ability of the model; if the classification result of the test set is not ideal and cannot meet the expected error acceptance criterion, the model training still has problems, indicating under-fitting or over-fitting, and steps 5.4-5.5 must be returned to for retraining and debugging; after the output result of the test set meets the requirement, the final optimal model is obtained and its weights are saved.
8. The aircraft engine fault diagnosis method based on the intelligent chip technology as claimed in claim 1, wherein: the hardware IP core design in the step 7 comprises the following specific steps:
7.1 using a vivado HLS tool to respectively establish project projects corresponding to a convolution calculation IP, a pooling calculation IP, an activation function IP and a full-connection calculation IP, and realizing the functions of the IPs by using C language;
7.2 selecting the model of the FPGA chip, firstly designing a convolution calculation IP core, defining a convolution kernel function, declaring a weight array, an input array, an output array and a bias array, and compiling a convolution algorithm code;
7.3 designing pooling calculation IP to realize 2x2 maximum pooling, declaring that a cache array is used for caching comparison results, using three cache arrays to compare input data pairwise, storing the input data in the two cache arrays, then comparing the input data for the second time in a second clock period, and storing final results in a third cache array;
7.4 designing an activation function IP, declaring a cache array for storing comparison results, comparing input array elements with 0 one by one, and storing results into the cache array;
7.5 designing a full-connection calculation IP, declaring an input array, an output array, a weight array and a bias array used by a full-connection layer, and compiling a corresponding full-connection layer algorithm code;
7.6, adding an AXI interface declaration code in the code compiling process, so that an IP core of the AXI interface is automatically generated during HLS synthesis, and parameters of each IP core function can be transmitted by using an AXI bus interface;
7.7 add preprocessing instruction of HLS to core computation or comparison code in each IP core code, and add array partition instruction to declared array, so that pipeline operation and parallelization operation of computation are realized when HLS is synthesized.
9. The aircraft engine fault diagnosis method based on the intelligent chip technology as claimed in claim 1, wherein: the specific steps in step 8 are as follows:
8.1 analyzing the convolutional layer loop code; the convolution operation is regarded as a multiply-accumulate operation consisting of 6 nested for loops, with loop variables cho, chi, row, col, kr and kc, where cho denotes the output feature map channel, row and col denote the output feature map size, chi denotes the input feature map channel, and kr and kc denote the convolution kernel size; the three arrays taking part in the operation are the three-dimensional array out[chout][R][C], the four-dimensional array weights[chout][chin][K][K], and the three-dimensional array input[chin][S*R+K][S*C+K];
8.2 selecting, through theoretical analysis, the input feature-map channel and the output feature-map channel as the loop-unrolling parameters of the convolutional layer;
8.3 designing a multiply-accumulate array circuit for the parallel computation of the convolutional layer over the input and output feature-map channels, so that several groups of unrolled data are multiplied and accumulated in parallel; each multiply-accumulate processing element (PE) completes the convolution of several input feature-map channels for one output feature-map channel, and parallel convolution over multiple output feature-map channels is achieved by a multiply-accumulate array composed of multiple PEs; within the array, the storage module holding the input feature map must deliver the elements of several input channels to the multiple PE units at the same time, i.e. every PE receives the same input feature-map data, so the input feature-map storage module is still partitioned by input channel (see the sketch following step 8.8);
8.4 analyzing the activation function code; the ReLU activation function is used, and its loop over the output feature-map channel is unrolled;
8.5 analyzing the fully-connected layer calculation code; since the input and output of the fully-connected layer are both one-dimensional, the layer is essentially a matrix multiplication that turns the input vector into an output vector of the required size, so the output feature channel loop is selected for unrolling;
8.6 analyzing the pooling layer calculation code; because the output feature-map size of the pooling layer keeps changing with the depth of the network, the output feature size loop is selected for unrolling;
8.7 C code functional verification before IP core synthesis: a verification main function, i.e. a testbench, is written that calls the IP cores designed in step 7 according to the previously designed neural network model to assemble the corresponding model, and the fault data to be processed together with the saved model parameters are imported for verification (see the testbench sketch following step 8.8);
8.8 after the C code functional verification passes, the IP cores are synthesized, C/RTL co-simulation is performed, and once verification succeeds the circuit file used for the chip back-end design is obtained.
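As an illustration of the unrolling strategy of steps 8.2-8.3 and the testbench of step 8.7, the sketch below reuses the hypothetical dimensions (CHI, CHO, K, R, C) from the earlier sketch; the loop ordering, unroll placement and file handling are assumptions made for the example and are not taken from the patent.

    /* Steps 8.1-8.3: the six convolution loops (cho, chi, row, col, kr, kc)
       with the channel loops moved innermost and unrolled, so that a
       CHO x CHI array of multiply-accumulate PEs operates in parallel. */
    void conv_unrolled(float input[CHI][R + K][C + K],
                       float weights[CHO][CHI][K][K],
                       float bias[CHO],
                       float out[CHO][R][C])
    {
    /* Split the storage by channel so every PE can be fed in the same cycle;
       the input feature map is partitioned by input channel (step 8.3). */
    #pragma HLS ARRAY_PARTITION variable=input   complete dim=1
    #pragma HLS ARRAY_PARTITION variable=weights complete dim=1
    #pragma HLS ARRAY_PARTITION variable=weights complete dim=2
    #pragma HLS ARRAY_PARTITION variable=out     complete dim=1

        for (int row = 0; row < R; row++) {
          for (int col = 0; col < C; col++) {
            for (int kr = 0; kr < K; kr++) {
              for (int kc = 0; kc < K; kc++) {
    #pragma HLS PIPELINE
                for (int cho = 0; cho < CHO; cho++) {
    #pragma HLS UNROLL              /* one PE group per output channel    */
                  float acc = (kr == 0 && kc == 0) ? bias[cho]
                                                   : out[cho][row][col];
                  for (int chi = 0; chi < CHI; chi++) {
    #pragma HLS UNROLL            /* parallel input channels in each PE */
                    acc += weights[cho][chi][kr][kc]
                         * input[chi][row + kr][col + kc];
                  }
                  out[cho][row][col] = acc;
                }
              }
            }
          }
        }
    }

    /* Step 8.7: C testbench -- the main function assembles the network from
       the IP cores and pushes one saved fault sample through it; loading of
       the data and parameter files is only indicated by comments. */
    #include <stdio.h>

    int main(void)
    {
        static float sample[CHI][R + K][C + K];    /* preprocessed fault data  */
        static float w[CHO][CHI][K][K], b[CHO];    /* trained model parameters */
        static float fmap[CHO][R][C];

        /* ... load sample, w and b from the exported data files ... */

        conv_unrolled(sample, w, b, fmap);
        /* ... calls to the pooling, activation and fully-connected IPs follow,
               and the predicted fault class is compared with the label ... */
        printf("testbench finished\n");
        return 0;
    }

In the same way, the ReLU code of step 8.4 and the fully-connected layer of step 8.5 unroll the output-channel loop, while the pooling layer of step 8.6 unrolls the output feature-size loop.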
10. The aircraft engine fault diagnosis method based on the intelligent chip technology as claimed in claim 1, wherein: the specific steps in step 9 are as follows:
9.1 design for testability (DFT): scan chains are inserted into the circuit file from step 8 and non-scan cells are replaced with scan cells; the software tool used is the DFT Compiler from Synopsys;
9.2 floorplanning: the macro-cell modules required by the chip are placed in the circuit file, determining the overall placement of the various functional circuits, which directly affects the final chip area; the software tool used is Astro from Synopsys;
9.3 clock tree synthesis (CTS): the clock routing is completed in the circuit file so that the clock is connected symmetrically to every register cell and the clock skew is minimized when the clock arrives at each register from the same clock source; the software tool used is Physical Compiler from Synopsys;
9.4 place and route: the routing of the ordinary signals in the circuit file, including the interconnections among the standard cells, is completed; the software tool used is Astro from Synopsys;
9.5 parasitic parameter extraction: because the wire resistance, the mutual inductance between adjacent wires and the coupling capacitance generate signal noise, crosstalk and reflection inside the chip, causing the signal voltage to fluctuate, the parasitic parameters are extracted and the design is analyzed and verified again to resolve signal-integrity problems; the software tool used is Star-RCXT from Synopsys;
9.6 layout physical verification: the function and timing of the routed physical layout are verified, including layout-versus-schematic comparison against the post-synthesis gate-level netlist, design rule checking and electrical rule checking; the software tool used is Hercules from Synopsys;
9.7 after the chip back-end design is finished, the GDSII file is delivered to a foundry to complete the fabrication, packaging and testing of the chip.
CN202210823470.2A 2022-07-14 2022-07-14 Aircraft engine fault diagnosis method based on intelligent chip technology Pending CN115204368A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210823470.2A CN115204368A (en) 2022-07-14 2022-07-14 Aircraft engine fault diagnosis method based on intelligent chip technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210823470.2A CN115204368A (en) 2022-07-14 2022-07-14 Aircraft engine fault diagnosis method based on intelligent chip technology

Publications (1)

Publication Number Publication Date
CN115204368A true CN115204368A (en) 2022-10-18

Family

ID=83580288

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210823470.2A Pending CN115204368A (en) 2022-07-14 2022-07-14 Aircraft engine fault diagnosis method based on intelligent chip technology

Country Status (1)

Country Link
CN (1) CN115204368A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116341481A (en) * 2023-05-26 2023-06-27 南京芯驰半导体科技有限公司 Clock file confirmation method and device, electronic equipment and storage medium
CN116341481B (en) * 2023-05-26 2023-08-22 南京芯驰半导体科技有限公司 Clock file confirmation method and device, electronic equipment and storage medium
CN116821697A (en) * 2023-08-30 2023-09-29 聊城莱柯智能机器人有限公司 Mechanical equipment fault diagnosis method based on small sample learning
CN116821697B (en) * 2023-08-30 2024-05-28 聊城莱柯智能机器人有限公司 Mechanical equipment fault diagnosis method based on small sample learning
CN117232577A (en) * 2023-09-18 2023-12-15 杭州奥克光电设备有限公司 Optical cable distributing box bearing interior monitoring method and system and optical cable distributing box
CN117232577B (en) * 2023-09-18 2024-04-05 杭州奥克光电设备有限公司 Optical cable distributing box bearing interior monitoring method and system and optical cable distributing box

Similar Documents

Publication Publication Date Title
CN111709448B (en) Mechanical fault diagnosis method based on migration relation network
CN115204368A (en) Aircraft engine fault diagnosis method based on intelligent chip technology
CN114323644B (en) Gear box fault diagnosis and signal acquisition method and device and electronic equipment
US7593804B2 (en) Fixed-point virtual sensor control system and method
CN112016251B (en) Nuclear power device fault diagnosis method and system
CN114429150A (en) Rolling bearing fault diagnosis method and system under variable working conditions based on improved depth subdomain adaptive network
CN114662414B (en) Oil reservoir production prediction method based on graph wavelet neural network model
CN117034143B (en) Distributed system fault diagnosis method and device based on machine learning
CN113887136A (en) Improved GAN and ResNet based electric vehicle motor bearing fault diagnosis method
CN115618732A (en) Nuclear reactor digital twin key parameter autonomous optimization data inversion method
CN116384224A (en) Aero-engine life prediction method based on conditional parameter dynamic convolutional neural network
Seshadri et al. Bayesian assessments of aeroengine performance with transfer learning
CN117609836A (en) Electromagnetic sensitivity prediction and health management method for integrated module
CN116662743A (en) Engine residual life prediction method based on multi-mode deep learning
Ma et al. A collaborative central domain adaptation approach with multi-order graph embedding for bearing fault diagnosis under few-shot samples
CN113486698A (en) Identification and prediction method, storage medium and system for hydrogen fuel cell operation
CN110674791B (en) Forced oscillation layered positioning method based on multi-stage transfer learning
CN115965160B (en) Data center energy consumption prediction method and device, storage medium and electronic equipment
CN116541771A (en) Unbalanced sample bearing fault diagnosis method based on multi-scale feature fusion
CN116625686A (en) On-line diagnosis method for bearing faults of aero-engine
Hassan et al. Fault classification of power plants using artificial neural network
CN114861871A (en) Inference performance evaluation system of convolutional neural network on accelerator
CN112560252A (en) Prediction method for residual life of aircraft engine
Liu et al. Densely connected semi-Bayesian network for machinery fault diagnosis with non-ideal data
Akkar et al. PSO Trained Hybrid Intelligent Classifier Using Wavelet and Statistical Features for Pipeline Leak Classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination