CN112836577A

CN112836577A - Intelligent traffic unmanned vehicle fault gene diagnosis method and system

Info

Publication number: CN112836577A
Application number: CN202011616139.0A
Authority: CN
Inventors: 刘辉; 杨睿; 王佳康; 尹昱成; 谭静
Original assignee: Central South University
Current assignee: Central South University
Priority date: 2020-12-30
Filing date: 2020-12-30
Publication date: 2021-05-25
Anticipated expiration: 2040-12-30
Also published as: CN112836577B

Abstract

The invention discloses a fault gene diagnosis method and a fault gene diagnosis system for an intelligent traffic unmanned vehicle.

Description

Intelligent traffic unmanned vehicle fault gene diagnosis method and system

Technical Field

The invention relates to the field of vehicle fault recognition, in particular to a fault gene diagnosis method and system for an intelligent traffic unmanned vehicle.

Background

Intelligent transportation is at the core of the current and future development of the traffic industry, and the unmanned vehicle is both an important embodiment and a core representative of the industry's level of intelligence; it will be the basic operating mode of future vehicles. Compared with rail traffic, road traffic offers flexible routes and good time controllability. Unmanned vehicles can relieve traffic congestion and management pressure, lower the threshold for drivers and reduce users' energy consumption. Their greatest advantage is the reduction of traffic accidents and casualties: a systematic unmanned driving strategy can effectively avoid traffic accidents.

The most critical link in ensuring the safety and stability of unmanned vehicles is fault detection and maintenance, that is, predicting faults in advance and repairing them accurately. Statistics show that more than 22,000 patent applications related to unmanned vehicles were filed in the five years 2010-2015, a considerable number of which concern fault identification. Patent publication CN108415409B combines local fault diagnosis equipment, remote fault diagnosis equipment and expert diagnosis equipment to realize multi-level progressive automobile fault diagnosis. That method is limited in local fault diagnosis: it can only handle some common minor faults and lacks a way to cope with difficult and complicated faults, and when the diagnosis level is escalated to remote or even expert diagnosis, the cost-effectiveness and timeliness of maintenance drop sharply.

Disclosure of Invention

The technical problem to be solved by the invention is to overcome the deficiencies of the prior art by providing a fault gene diagnosis method and system for an intelligent traffic unmanned vehicle, so as to improve the accuracy of vehicle fault identification.

In order to solve the technical problems, the technical scheme adopted by the invention is as follows: a fault gene diagnosis method for an intelligent traffic unmanned vehicle comprises the following steps:

S1, collecting the voltage waveform E_dy, current waveform E_dl, temperature waveform E_wd and vibration waveform E_zd of the whole unmanned vehicle, and integrating the voltage waveform E_dy, current waveform E_dl, temperature waveform E_wd and vibration waveform E_zd into a data matrix X^*;

S2, inputting the data matrix X^* into an RNNs network to eliminate outliers and output X, and detecting fault signals in X to obtain a fault signal matrix E;

S3, converting the fault signal matrix E into encodable gene sequences O₁, O₂, O₃, O₄ expressed by the 4 bases A, T, C and G;

S4, integrating O₁, O₂, O₃, O₄ into the DNA sequence S = {S₁, S₂, S₃, ..., S_N}, and extracting the features of S to obtain the pre-judged candidate vehicle component fault gene V_s;

S5, taking part of the data of the pre-judged candidate vehicle component fault gene V_s and the initial number of hidden-layer neurons Θ₀ of the gated recurrent unit (GRU) network model as the input of the multi-objective locust optimization algorithm, taking the GRU network model with hidden-layer neuron number Θ_κ as the output, and training the GRU network model to obtain a classification model;

S6, collecting in real time the voltage waveform E_dy, current waveform E_dl, temperature waveform E_wd and vibration waveform E_zd of the unmanned vehicle, integrating them into a data matrix, and inputting the data matrix into the fault classification model for fault classification and identification.

The invention uses multiple kinds of waveform data to provide more sufficient criteria for fault diagnosis; conversion into an encodable gene sequence helps extract the deep features of the faulty component; and training and building the classification model helps industry personnel diagnose vehicle faults. The method preprocesses the raw collected signals, performs DNA encoding and decoding conversion, extracts deep features and builds a machine learning classification model, thereby realizing intelligent detection and diagnosis of vehicle faults on big data.

In S1, the data matrix X^* is used as the input of the RNNs neural network to reject outliers. Eliminating outliers in this way reduces the interference of noise with real fault signals.

In S2, fault detection is implemented as follows: the minimum value of the initial time sequence difference is set to T, and the minimum discrimination thresholds are set to A_dy, A_dl, A_wd, A_zd; when the time difference between the starting and ending data points of the data matrix X, or of the data matrix after outlier removal, is greater than the threshold T and the amplitude is greater than the minimum thresholds A_dy, A_dl, A_wd, A_zd, a fault is judged to have occurred, and the data position and the change in waveform amplitude at that moment are recorded and integrated into new matrix data E; where A_dy, A_dl, A_wd, A_zd are the voltage amplitude, current amplitude, temperature amplitude and vibration amplitude, respectively. This fault detection means, based on time sequence difference and threshold discrimination, can effectively locate the position of the fault information in the data band and facilitates subsequent dimension reduction and feature extraction.

The pre-judged candidate vehicle component fault gene is generated as V_s = (W₁₁, W₁₂, ..., W_UU, C₁, ..., C_U, D₁, ..., D_U), where W_ij is the probability of base B_i transferring to base B_j, C_i is the base content and D_i is the base position ratio [the formulas defining W_ij, C_i and D_i are not reproduced in this text]. In these formulas, n_i is the number of occurrences of the single base B_i in the DNA sequence S; B_i is the base at the i-th data point position in the DNA sequence S; 1 ≤ i ≤ U; U is the dimension of the feature vector represented by the base elements; N is the length of the DNA sequence S; n_ij is the number of occurrences of the base pair B_iB_j in the DNA sequence S; the positions at which base B_i occurs in the DNA sequence S are marked S_i, where s_i is a value in S_i. Feature extraction on the base pairs of the encodable gene sequence finds the most representative features and expresses as much high-dimensional information as possible with low-dimensional data, which helps avoid overfitting during modeling.
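The feature vector V_s can be illustrated with a minimal Python sketch. The exact formulas for the transfer probability, base content and base position ratio are not reproduced in this text, so the normalizations used here are assumptions made only for illustration: W_ij = n_ij divided by the number of times B_i is followed by any base, C_i = n_i / N, and D_i = mean 1-based position of B_i divided by N. The function names `base_features` and `fault_gene_vector` are hypothetical.

```python
from collections import Counter

BASES = "ATCG"  # U = 4 base symbols


def base_features(seq):
    """Return (W, C, D) dictionaries for a sequence over A, T, C, G."""
    if not seq:
        raise ValueError("sequence must not be empty")
    n = len(seq)
    counts = Counter(seq)               # n_i: occurrences of each base
    pairs = Counter(zip(seq, seq[1:]))  # n_ij: occurrences of adjacent pairs
    starts = Counter(seq[:-1])          # times each base has a successor
    # ASSUMED normalization: W_ij = n_ij / (times B_i is followed by a base)
    W = {(a, b): (pairs[(a, b)] / starts[a] if starts[a] else 0.0)
         for a in BASES for b in BASES}
    # ASSUMED normalization: C_i = n_i / N
    C = {a: counts[a] / n for a in BASES}
    # ASSUMED definition: D_i = mean 1-based position of B_i, divided by N
    D = {}
    for a in BASES:
        positions = [k + 1 for k, ch in enumerate(seq) if ch == a]
        D[a] = sum(positions) / len(positions) / n if positions else 0.0
    return W, C, D


def fault_gene_vector(seq):
    """Flatten (W, C, D) into V_s = (W_11..W_UU, C_1..C_U, D_1..D_U)."""
    W, C, D = base_features(seq)
    return ([W[(a, b)] for a in BASES for b in BASES]
            + [C[a] for a in BASES]
            + [D[a] for a in BASES])
```

With U = 4 the vector has 4 × 4 + 4 + 4 = 24 entries, regardless of the sequence length N, which is the low-dimensional representation the paragraph above refers to.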

The specific implementation process of S5 includes:

1) randomly dividing the vehicle component fault gene V_s into a training set and a test set; initializing the iteration count κ and the expected accuracy of the multi-objective locust optimization algorithm;

2) taking the training set and the initial number of hidden-layer neurons Θ₀ of the gated recurrent unit (GRU) network model as the input of the GRU network model, and taking the GRU network model with hidden-layer neuron number Θ_κ as the output, to train the GRU network model;

3) taking the test set and the hidden-layer neuron number Θ_κ as the input of the two objective optimization functions of the multi-objective locust optimization algorithm, and calculating the values of the two objective optimization functions;

4) updating the search path of the neuron number of each hidden layer of the GRU network model according to the product of the two calculated objective function values, so that the product of the next two objective function values is larger than the product of the current two, thereby obtaining the new neuron number Θ_κ+1 of each hidden layer; in addition, in each iteration, the GRU network model with hidden-layer neuron number Θ_κ (i.e. the classifier) outputs a classification value once;

5) adding 1 to the search iteration count, taking the new neuron number Θ_κ+1 of each hidden layer as the input of the objective functions of the multi-objective locust optimization algorithm, and returning to step 3) until the objective function values of the multi-objective locust optimization algorithm reach the expected accuracy or the set number of iterations is completed; the training of the GRU network model is then finished and the optimal parameter Θ_optimal is obtained, and the GRU network model corresponding to Θ_optimal is the classification model.
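The control flow of steps 1) to 5) can be sketched as follows. This is not the multi-objective locust optimization algorithm itself: a simple random local search stands in for it, and `toy_evaluate` is a hypothetical placeholder for "train the GRU with Θ hidden neurons and score it on the test set". Only the loop structure (evaluate two objectives, accept a new Θ when the product of the two values increases, stop on expected accuracy or iteration limit) follows the text.

```python
import random


def toy_evaluate(theta):
    """Hypothetical stand-in for training and testing a GRU with `theta`
    hidden neurons; returns (object1, object2), each in [0, 1]."""
    object1 = max(1.0 - abs(theta - 48) / 100.0, 0.0)
    object2 = max(1.0 - abs(theta - 40) / 100.0, 0.0)
    return object1, object2


def search_hidden_size(theta0, evaluate, max_iter=50, target=0.99, seed=0):
    """Search the hidden-layer neuron count; returns (theta, (obj1, obj2))."""
    rng = random.Random(seed)
    theta = theta0
    best = evaluate(theta)                         # objective values at theta_0
    for _ in range(max_iter):                      # step 5: iteration limit
        if min(best) >= target:                    # expected accuracy reached
            break
        cand = max(1, theta + rng.randint(-8, 8))  # step 4: propose a new theta
        score = evaluate(cand)                     # step 3: two objective values
        # Accept only if the product of the two objective values increases.
        if score[0] * score[1] > best[0] * best[1]:
            theta, best = cand, score
    return theta, best
```

Calling `search_hidden_size(10, toy_evaluate)` returns a neuron count whose objective product is at least as large as that of the starting value Θ₀ = 10; in a real system `evaluate` would wrap the GRU training and the accuracy and recall computation.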

The two target optimization functions are expressed as:

object1＝Max{Accuracy_FDJ(Θ),Accuracy_DP(Θ),Accuracy_CS(Θ),Accuracy_DQ(Θ)}

object2＝Max{Recall_FDJ(Θ),Recall_DP(Θ),Recall_CS(Θ),Recall_DQ(Θ)}；

In the formulas, Θ_f is the number of neurons of layer f in the GRU network model and α_f is the combining weight; the optimization objectives object1 and object2 are each the maximum over the four quantities inside the Max{} function; Θ is the number of hidden-layer neurons; subscript FDJ denotes engine failure, DP chassis failure, CS vehicle body failure and DQ electrical equipment failure; Accuracy_FDJ, Accuracy_DP, Accuracy_CS, Accuracy_DQ correspond in turn to the fault classification accuracy of the engine, the chassis, the vehicle body and the electrical equipment; Recall_FDJ, Recall_DP, Recall_CS, Recall_DQ correspond in turn to the fault classification recall of the engine, the chassis, the vehicle body and the electrical equipment. Accuracy = [Accuracy_FDJ(Θ), Accuracy_DP(Θ), Accuracy_CS(Θ), Accuracy_DQ(Θ)]; Recall = [Recall_FDJ(Θ), Recall_DP(Θ), Recall_CS(Θ), Recall_DQ(Θ)];

For each sample point: when the classification value output by the classifier is the same as the test set value and is positive, TP is incremented by 1; when the output classification value is opposite to the test set value and the output is positive, FP is incremented by 1; when the output classification value is opposite to the test set value and the output is negative, FN is incremented by 1; when the output classification value is the same as the test set value and is negative, TN is incremented by 1. The classifier is the GRU network model with the hidden-layer neuron number Θ_κ determined in each iteration.
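The counting rule above and the two objectives can be sketched in Python. The accuracy and recall formulas are not reproduced in this text, so the standard definitions Accuracy = (TP + TN) / (TP + FP + FN + TN) and Recall = TP / (TP + FN) are assumed; the function names and the per-category input layout are hypothetical.

```python
def confusion_counts(predicted, actual):
    """Count TP, FP, FN, TN for one fault category from boolean labels."""
    tp = fp = fn = tn = 0
    for p, a in zip(predicted, actual):
        if p and a:
            tp += 1      # same as test value, positive
        elif p and not a:
            fp += 1      # opposite, classifier says positive
        elif not p and a:
            fn += 1      # opposite, classifier says negative
        else:
            tn += 1      # same as test value, negative
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0  # assumed standard definition


def recall(tp, fp, fn, tn):
    return tp / (tp + fn) if (tp + fn) else 0.0  # assumed standard definition


def objectives(per_category):
    """per_category maps 'FDJ', 'DP', 'CS', 'DQ' to (predicted, actual)."""
    counts = [confusion_counts(p, a) for p, a in per_category.values()]
    object1 = max(accuracy(*c) for c in counts)  # Max over the accuracies
    object2 = max(recall(*c) for c in counts)    # Max over the recalls
    return object1, object2
```

The product object1 × object2 is the quantity that step 4) of the S5 procedure tries to increase from one iteration to the next.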

Training the classification model and optimizing the number of neurons in each hidden layer with the multi-objective locust optimization algorithm, as described above, provides guidance for implementing the algorithm; building the deep learning model on parameters tuned by the optimization algorithm enables higher-precision classification of vehicle fault types.

The system of the present invention further comprises: taking the pre-judged candidate vehicle component fault gene V_s as the input of the clustering model to build a template library. Deep-learning-based modeling can complete the multi-class classification of vehicle faults, and tuning the classification model parameters with the optimization algorithm can improve the classification accuracy of the model.

The concrete implementation process for building the template library comprises the following steps:

Step 1: taking the pre-judged candidate vehicle component fault gene V_s, obtained through non-negative matrix factorization dimensionality reduction, as the input of the stochastic neighbor embedding algorithm; obtaining the conditional probability p_j|i of the high-dimensional data points V_i, V_j and the conditional probability q_j|i of the low-dimensional data points v_i, v_j; and minimizing the conditional probabilities to obtain the minimized conditional probability p_j|i of the high-dimensional data and the minimized conditional probability q_ij of the low-dimensional data;

Step 2: calculating the minimum of the difference between the high-dimensional and low-dimensional conditional probabilities from the minimized conditional probabilities, and minimizing the cost function L by the gradient descent method, where n is the number of data samples [the formulas for L and for the optimal solution are not reproduced in this text]; finally, integrating the above results to obtain the optimal solution, which is output as the clustering result of the t-SNE clustering algorithm. The output clustering information corresponds to the 4 clustering templates of DNA sequences of the major intelligent-vehicle components:

template＝[FDJ,DP,CS,DQ]；

FDJ, DP, CS and DQ are the major fault categories in the template library: FDJ, engine; DP, chassis; CS, vehicle body; DQ, electrical equipment.

The method combining non-negative matrix factorization and t-SNE clustering avoids the unfavorable condition that a large amount of effective information of vehicle faults is lost, and soft clustering can obtain more reliable template library information.
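The conditional probabilities and the cost minimized in the template-library steps can be illustrated with a short Python sketch. The source formulas are not reproduced in this text, so the sketch assumes the classic stochastic neighbor embedding form: Gaussian conditional probabilities with a fixed bandwidth `sigma` (an assumed parameter) for both the high- and low-dimensional points, and a Kullback-Leibler cost summed over all pairs. A t-SNE implementation would instead use a Student-t kernel in the low-dimensional space and a perplexity-based bandwidth.

```python
import math


def conditional_probs(points, sigma=1.0):
    """Row i holds p_{j|i}: Gaussian similarity of point j as seen from i.
    Requires at least two distinct points."""
    n = len(points)
    rows = []
    for i in range(n):
        weights = []
        for j in range(n):
            if i == j:
                weights.append(0.0)  # a point is not its own neighbor
            else:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                weights.append(math.exp(-d2 / (2.0 * sigma ** 2)))
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


def kl_cost(P, Q):
    """Sum over i, j of p_{j|i} * log(p_{j|i} / q_{j|i}).
    Pairs with a zero probability are skipped in this sketch."""
    return sum(p * math.log(p / q)
               for p_row, q_row in zip(P, Q)
               for p, q in zip(p_row, q_row)
               if p > 0.0 and q > 0.0)
```

Gradient descent would move the low-dimensional points so that `kl_cost(P, Q)` decreases; the cost is zero when the low-dimensional neighborhoods reproduce the high-dimensional ones exactly.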

After S6, further comprising:

S7, judging whether the fault category corresponding to the prediction sequence output by the fault classification model matches a fault category in the clustering result; if it is a sub-fault of some fault category in the clustering result, classifying it into that fault category; if the output of the fault classification model cannot be matched with any fault category in the clustering result, deciding under manual supervision whether to update and supplement the fault categories in the clustering result: if so, setting the original signal threshold of the classification result as a new fault detection threshold and adding a new category (namely that fault category) to the fault categories in the clustering result; if not, discarding the classification result directly. Building this mapping between the major-category template library and the fine-grained fault classes of the classification model facilitates fault classification; the template library comparison mechanism helps personnel quickly identify the differences between current and historical fault information; and the template library update mechanism enriches the template library so that it covers more fault information.

The template library is updated as follows: the minimum value T of the initial time sequence difference and the minimum discrimination thresholds A_dy, A_dl, A_wd, A_zd are updated, S1 to S4 are executed again to obtain new pre-judged candidate vehicle component fault genes, and these are taken as the input of the clustering model to obtain the updated template library. Updating the template library makes it possible to classify new faults (faults not in the historical archive), so that the classification result is more accurate and reliable.
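The decision logic of S7 can be sketched as a small Python function. The data layout is an assumption: the template library is modelled as a dictionary from major category to a set of known sub-faults, manual supervision is modelled as a callback `approve`, and the sub-fault names in the usage example are invented for illustration.

```python
def match_or_update(predicted_fault, template, approve,
                    fault_thresholds, detector_thresholds):
    """Return the matched or newly created category, or None if discarded.

    template            -- dict: major category -> set of known sub-faults
    approve             -- callable standing in for manual supervision
    fault_thresholds    -- signal thresholds observed for this result
    detector_thresholds -- thresholds used by the fault detection module
    """
    # Case 1: the prediction is a known category or one of its sub-faults.
    for category, sub_faults in template.items():
        if predicted_fault == category or predicted_fault in sub_faults:
            return category
    # Case 2: no match; manual supervision decides whether to update.
    if approve(predicted_fault):
        template[predicted_fault] = set()              # new fault category
        detector_thresholds.update(fault_thresholds)   # new detection threshold
        return predicted_fault
    # Case 3: update rejected; the classification result is discarded.
    return None
```

For example, with `template = {"FDJ": {"misfire"}, "DP": set(), "CS": set(), "DQ": {"short_circuit"}}`, the prediction `"short_circuit"` is filed under `"DQ"`, while an unknown prediction is either added as a new category or dropped depending on what `approve` returns.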

The invention also provides a system for diagnosing the fault genes of the intelligent traffic unmanned vehicle, which comprises computer equipment; the computer device is configured or programmed for performing the steps of the above-described method.

Compared with the prior art, the invention has the beneficial effects that:

1) On the basis of existing unmanned vehicle fault diagnosis technology, the invention provides a precise fault identification method based on a DNA sequence template library. A graphical data acquisition system built on the virtual instrument LabVIEW is combined with the existing CAN, vehicle-mounted Ethernet and WiFi to guarantee a large amount of historical data; big-data vehicle-mounted interconnection, vehicle fault information detection, gene signal conversion, encodable gene sequence feature extraction, construction of the DNA sequence template library of fault modules and deep learning modeling work together to accurately identify the position and type of a vehicle fault.

2) The invention builds a DNA sequence template library of encoded fault modules, corresponding to the four major components of an automobile (engine, chassis, vehicle body and electrical equipment). The fault template library provides technical guidance for relevant personnel, and accurate and complete fault information makes it easier for them to carry out fault maintenance of unmanned vehicles.

3) The invention provides a fault diagnosis multi-classification modeling method for an unmanned vehicle.

4) A closed-loop feedback structure is built around real-time online acquisition of unmanned-vehicle big data, fault detection, gene signal conversion, encodable gene sequence feature extraction and construction of the DNA sequence template library of fault modules; newly appearing faults can be fed back to the template library for updating through a supervised self-learning model. Where timeliness requirements are high, the fault classification model can be embedded in a MapReduce parallel big data platform to speed up training.

Drawings

FIG. 1 is a schematic diagram of a method according to an embodiment of the present invention.

Detailed Description

The implementation process of the embodiment of the invention comprises the following steps:

1. acquiring various data of automobile parts based on LabVIEW, while improving the timeliness and richness of the data through vehicle-mounted interconnection;

2. integrating the collected voltage waveform E_dy, current waveform E_dl, temperature waveform E_wd and vibration waveform E_zd into a data matrix X^*, inputting it into the RNNs network to remove outliers and output a new X, and inputting the new X into the fault detection module to detect fault signals and obtain the matrix data E;

3. transforming E, by the NMF dimensionality reduction method, into gene sequences O₁, O₂, O₃, O₄ expressed by the 4 bases A, T, C and G;

4. integrating O₁, O₂, O₃, O₄ into the DNA sequence S = {S₁, S₂, S₃, ..., S_N}, and obtaining the pre-judged candidate vehicle component fault gene V_s through a series of operations such as feature extraction;

5. taking the pre-judged candidate vehicle component fault gene V_s as the input of the clustering model to build a template library;

6. taking part of the data of the pre-judged candidate vehicle component fault gene V_s and the initial number of hidden-layer neurons Θ₀ of the gated recurrent unit (GRU) network as the input of the multi-objective locust optimization algorithm, taking the GRU network with hidden-layer neuron number Θ_κ as the output, and training the GRU network to obtain a classification model;

7. updating the gene template library under manual supervision according to actual needs;

8. embedding the proposed model into a big data platform to speed up fault diagnosis;

9. performing automobile fault diagnosis, detection and identification with the classification model on the various data acquired in real time through vehicle-mounted interconnection.

The method specifically comprises the following steps:

Step 1: Multi-source heterogeneous fault data acquisition of automobile parts

In order to obtain reliable real-time vehicle-mounted operation equipment parameters and state travel data, the invention builds a graphical data acquisition system based on a virtual instrument LabVIEW aiming at CAN, vehicle-mounted Ethernet and WiFi which are commonly used by unmanned vehicles. The designed wireless data acquisition system mainly comprises an acquisition circuit sensor, a wireless communication system, a LabVIEW system interface and the like. More specifically, the signal acquisition system mainly comprises a Hall current sensor, a Hall voltage sensor, a vibration sensor, a temperature sensor, an analog-to-digital conversion module, a single chip microcomputer, a wireless transceiver module and a series of auxiliary electrical equipment.

LabVIEW is a graphical programming language, is a virtual instrument software development tool, and can be applied to the fields of automatic driving of automobiles, wireless communication, electronic automation and the like. The invention combines the traditional sensor acquisition technology and the LabVIEW platform to realize the online acquisition and monitoring of vehicle faults, and the combination of the two technologies builds a communication bridge between a user and vehicle equipment. The specific multi-source heterogeneous fault data acquisition of the automobile parts can be summarized into the following substeps:

A1. raw on-board data acquisition for hardware systems

The bottom layer of data acquisition is the arrangement of sensors. As technology has developed, sensor performance has kept improving while prices have become very low, which makes data acquisition affordable. In this link, Hall current sensors, voltage sensors, temperature sensors and vibration sensors are required to capture the four different signals; they are arranged as uniformly as possible around the engine, chassis, body and electrical equipment of the vehicle, with additional sensors in areas where electrical equipment is dense. Furthermore, so that the acquired signals can be used, an A/D conversion module is designed to convert the analog signals to digital signals. The single-chip microcomputer is designed to process the converted current, voltage, temperature and vibration signals in real time, and several communication I/O ports are provided to ensure the accuracy of signal acquisition. The wireless data transmission module should be selected with communication distance and power consumption in mind, aiming for long range and low power consumption.

A2. Software system design

After the hardware system of sub-step A1 completes acquisition, the system interface program of the upper computer is written in the software system with the virtual instrument LabVIEW. The data acquisition system can acquire data in real time and store it in the data center. The design of the software acquisition system mainly comprises the following parts: the main program, the single-chip microcomputer processing program, the serial port communication program, the wireless module program, the communication protocol and the like. The flow of the software acquisition system module can be summarized as follows:

First, the single-chip microcomputer is configured and initialized, and the sensors acquire the voltage, current, temperature and vibration signals in turn. The ADC (analog-to-digital converter) module then performs analog-to-digital conversion on the voltage, current, air pressure, vibration and temperature waveform signals and passes them to the single-chip microcomputer, which judges and processes the received data signals and sends the information to the receiving port through the wireless data transmission module. Finally, the data is transmitted to a PC (personal computer) through the serial port, where display and monitoring are carried out.

A3. LabVIEW data collection for unmanned vehicle equipment

The development of virtual instrument technology offers the industry more possibilities; LabVIEW's strong visualization and simple, easily understood operation greatly lower the threshold for practitioners. Programming on the PC side allows the LabVIEW acquisition system to interface with communication protocols such as TCP/IP. In the invention, modules such as the information recording sub-VI, the serial port setting module, the acquisition module, the playback module, the filtering module, the historical data retrieval sub-VI and the abnormality alarm module need to be programmed.

A4. PC-side remote monitoring

The fully built system is monitored and operated from a remote PC: the controller sends a trigger signal to the LabVIEW platform, the sensors acquire data remotely, the ADC output is fed into the single-chip microcomputer for judgment processing, the wireless transmission port sends the real-time information back to the LabVIEW display, and professionals collect and monitor the vehicle data.

Step 2: vehicle-mounted interconnection among wireless vehicles

The wireless communication of the LabVIEW system allows vehicle information within a range of several hundred meters to be fully shared, which is very effective in cities with a large number of vehicles. The equipment fault information of nearby vehicles can be passed between them through wireless communication, and finally the voltage waveform E_dy, current waveform E_dl, temperature waveform E_wd and vibration waveform E_zd information is fed back to the manager's PC, where the manager can monitor the whole equipment operating system through a URL. Vehicle-mounted interconnection among wireless vehicles ensures the richness of the data, and the human-vehicle Internet of Things system ensures its real-time nature.

Step 3: Internet big data fault detection for automobile equipment

After the vehicle-mounted interconnection of step 2 is established, equipment information is shared among different vehicles, and once any vehicle fails, its fault data can be stored in the data system for learning. With the support of big data technology, the equipment information acquired through vehicle-mounted interconnection is transmitted in full to a data storage center, where repeated parts are removed in advance by replicator neural networks (RNNs).

The voltage waveform E_dy, current waveform E_dl, temperature waveform E_wd and vibration waveform E_zd obtained after the vehicle-mounted interconnection processing of step 2 are assembled into the matrix X^* and input into the RNNs network. First, the output of the c-th neuron of the m-th layer of the RNNs network is set to S_m(I_mc), and the input I_mc to the c-th neuron of the m-th layer, denoted θ, is calculated by formula (1) [formula (1) is not reproduced in this text].

In the formula, Z_mj is the output of the j-th neuron of the m-th layer, the number of neurons of the m-th layer is L_m, and w_mcj is the connection weight from the j-th neuron to the next neuron c in the m-th layer network. When the layer index m of the neural network is 2 or 4, the activation function S_m is:

S_m(θ)＝tanh(a_mθ)(m＝2,4) (2)

In the formula, a_m is the constant 1. For the middle layer m = 3, the activation function can be regarded as a step function, given by formula (3) [formula (3) is not reproduced in this text],

in which N is the number of steps of the step function and a₃ is the rate of rise from one step to the next. Finally, the neural network can pick out single outliers and small outliers, which are then rejected, resulting in a new set of processed data X (including voltage, current, temperature and vibration data).
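The activation functions described above can be sketched in Python. Formula (2) is taken directly from the text. Formulas (1) and (3) are not reproduced here, so two forms are assumed for illustration: the neuron input is taken to be a plain weighted sum of the previous layer's outputs, and the step function is taken to be the smooth staircase commonly used for replicator neural networks, S₃(θ) = 1/2 + 1/(2(N−1)) · Σ_{j=1}^{N−1} tanh(a₃(θ − j/N)). The function names and default parameter values are hypothetical.

```python
import math


def unit_input(weights, prev_outputs):
    """ASSUMED form of I_mc: weighted sum of the previous layer's outputs."""
    return sum(w * z for w, z in zip(weights, prev_outputs))


def smooth_tanh(theta, a=1.0):
    """Formula (2): S_m(theta) = tanh(a_m * theta) for layers m = 2, 4."""
    return math.tanh(a * theta)


def staircase(theta, n_steps=4, a3=100.0):
    """ASSUMED form of the middle-layer step function (requires n_steps >= 2).

    Produces n_steps plateaus at 0, 1/(n_steps-1), ..., 1; a3 controls how
    sharply the output rises from one plateau to the next.
    """
    total = sum(math.tanh(a3 * (theta - j / n_steps))
                for j in range(1, n_steps))
    return 0.5 + total / (2.0 * (n_steps - 1))
```

With `n_steps=4`, inputs between 0.25 and 0.5 all map close to 1/3, which is how the middle layer quantizes the data into a small number of levels; samples that reconstruct poorly through this bottleneck are the candidates for rejection as outliers.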

In the sample data of the original voltage waveform E_dy, current waveform E_dl, temperature waveform E_wd and vibration waveform E_zd, the same manifestation usually corresponds to different actual fault types, so a uniform occurrence threshold or fluctuation pattern can be set to determine whether a fault has occurred. In the present invention, the fault determination conditions are as follows: the signal thresholds of the voltage waveform E_dy, current waveform E_dl, temperature waveform E_wd and vibration waveform E_zd at which a fault occurs are set according to historical experience (minimum discrimination threshold of the voltage amplitude 15.6 V, of the current amplitude 100 A, of the temperature 80 °C and of the vibration amplitude 15 mm, with a time interval between two consecutive threshold crossings, i.e. the minimum value of the time sequence difference, of 0.2 s), and the fault detection module based on time sequence difference and threshold discrimination is started to judge whether a fault has occurred, completing the classification into fault and no fault.

Here, the time sequence difference is the time difference between two time-series sample points at which the voltage, current, temperature or vibration signal has a large amplitude when the vehicle fails, and the threshold discrimination is the amplitude that the voltage, current, temperature or vibration signal reaches when the vehicle fails. In this step, the data X is input to the fault detection module, which operates as follows: the minimum value of the initial time sequence difference is set to T and the minimum discrimination thresholds are set to A_dy, A_dl, A_wd, A_zd; when the time difference between the starting and ending data points of the data set after RNNs outlier processing (see Hawkins S, He H, Williams G, Baxter R. Outlier detection using replicator neural networks. In: Proceedings of the International Conference on Data Warehousing and Knowledge Discovery. Springer; 2002. p. 170-80.) is greater than the threshold T and the amplitude is greater than the minimum thresholds A_dy, A_dl, A_wd, A_zd, a fault is judged to have occurred, and the data position and the change in waveform amplitude at that moment are recorded and integrated into the new matrix data E.

Finally, to complete an effective closed-loop detection and diagnosis, the start and end positions of the data samples in which a fault is detected are labeled 1, and all other data points that do not meet the judgment condition are labeled 0. When a new fault type is encountered, its fluctuation state does not necessarily meet the judgment condition. If it does not, the fault type is input into the self-learning module of step 8 to update the gene library, and the judgment condition of step 3 is then updated, that is, the signal thresholds of the voltage, current, temperature and vibration waveforms (voltage amplitude minimum threshold 15.6 V, current amplitude minimum threshold 100 A, temperature minimum threshold 80 °C, vibration amplitude minimum threshold 15 mm, and a time interval between two consecutive threshold crossings, i.e. the minimum value of the time sequence difference, of 0.2 s), so that the fault detection module based on time sequence difference and threshold discrimination can judge whether a fault has occurred. Fluctuation states that meet the condition are passed to the subsequent steps for feature extraction and fault identification.
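The threshold-and-time-difference rule and the 1/0 labeling can be sketched in Python for a single signal channel. The sketch rests on one interpretation, stated here as an assumption: a fault is a run of consecutive samples whose amplitude exceeds the channel threshold, and it is reported only when the time between the first and last sample of the run exceeds the minimum time sequence difference. The threshold values are the ones given in the text; the names `THRESHOLDS`, `MIN_TIME_DIFF` and `label_faults` are hypothetical.

```python
# Threshold values quoted in the description (V, A, degrees C, mm).
THRESHOLDS = {"voltage": 15.6, "current": 100.0,
              "temperature": 80.0, "vibration": 15.0}
MIN_TIME_DIFF = 0.2  # seconds, minimum time sequence difference T


def label_faults(times, values, amplitude_threshold, min_time_diff):
    """Label the start and end of each detected fault run with 1, else 0.

    ASSUMED interpretation: a run is a stretch of consecutive samples whose
    absolute value exceeds the threshold; it counts as a fault only if
    times[end] - times[start] > min_time_diff.
    """
    labels = [0] * len(values)
    start = None
    for k in range(len(values) + 1):  # one extra pass closes a trailing run
        above = k < len(values) and abs(values[k]) > amplitude_threshold
        if above and start is None:
            start = k
        elif not above and start is not None:
            end = k - 1
            if times[end] - times[start] > min_time_diff:
                labels[start] = labels[end] = 1
            start = None
    return labels
```

For a voltage trace sampled every 0.1 s that stays above 15.6 V from 0.1 s to 0.4 s, the samples at 0.1 s and 0.4 s are labeled 1 and everything else 0; an isolated spike lasting a single sample is ignored.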

Step 4: gene sequence signal transformation

In mathematical terms, the gene data is a high-dimensional or ultra-high-dimensional matrix, so dimension reduction is necessary to use the data effectively. Non-negative matrix factorization (NMF) is a very widely used dimension reduction method (see Pauca V P, Piper J, Plemmons R J. Nonnegative matrix factorization for spectral data analysis [J]. Linear Algebra and its Applications, 2006, 416(1): 29-47.). Compared with traditional dimension reduction methods, it requires little computation and is highly interpretable: it effectively reduces the dimensionality of the data while retaining the key information.

Given the data matrix from step 3, E = [e₁, e₂, e₃, ..., e_n] ∈ R^(g×h), where each column of the matrix represents one data sample and g × h is the matrix size, the purpose of the NMF algorithm is to decompose the raw data matrix E into the product of two non-negative matrices J and K. Specifically, J = [j₁, j₂, j₃, ..., j_r] ∈ R^(g×r) and K = [k₁, k₂, k₃, ..., k_r] ∈ R^(r×h). J represents a base space, each column of which can be regarded as a base vector, and K can be regarded as the combination coefficients of the mapping in the base space J. In general they satisfy the following conditions:

r ≪ min(g, h) (4)

E ≈ JK (5)

In the NMF algorithm, a set of high-dimensional data E is mapped to K through the base space J, which can essentially be regarded as a matrix projection. In the present invention this corresponds to a transformation of the data type and dimension: the signals are mapped into a set of low-dimensional gene expressions, which represent the expression of the failed component.

In step 4, the fault signal data matrix E is unfolded, and the unfolded data is decomposed into the product of two non-negative matrices J and K (the high-dimensional data E is mapped to K through the base space J), which completes the non-negative matrix factorization (NMF) dimension reduction of the data. The reduced dimension U is defined according to prior knowledge and the amount of information that needs to be retained. After dimension reduction, the high-dimensional data E is mapped to a U-dimensional feature vector expressed by the four basic elements A, T, C and G, which is the transformed gene sequence signal required by the subsequent steps. For convenience of representation, O₁, O₂, O₃, O₄ are used in place of the four bases A, T, C, G. The preprocessed vibration signal is thus converted into a gene sequence that can be encoded.
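The factorization E ≈ JK of conditions (4) and (5) can be illustrated with a short NumPy sketch. The text does not say which NMF solver is used; the Lee-Seung multiplicative update rule below is a common choice and is an assumption here, as are the function name `nmf`, the random stand-in data and the rank r = 3.

```python
import numpy as np

def nmf(E, r, n_iter=300, seed=0, eps=1e-9):
    """Factor a non-negative g x h matrix E into J (g x r) and K (r x h)."""
    rng = np.random.default_rng(seed)
    g, h = E.shape
    J = rng.random((g, r)) + eps
    K = rng.random((r, h)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        K *= (J.T @ E) / (J.T @ J @ K + eps)
        J *= (E @ K.T) / (J @ K @ K.T + eps)
    return J, K

rng = np.random.default_rng(1)
E = rng.random((20, 12))   # random stand-in for the fault signal matrix E
J, K = nmf(E, r=3)         # r much smaller than min(g, h), as in condition (4)
rel_error = np.linalg.norm(E - J @ K) / np.linalg.norm(E)
print(J.shape, K.shape, rel_error)
```

How the rows of K are then quantized into the four base symbols is not specified in this section, so the sketch stops at the factorization.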

Step 5: encoded gene sequence feature extraction

The U-dimensional feature vector signals obtained from the internet big data fault detection of the automobile equipment in step 3 and the gene sequence conversion in step 4 are input into a fault feature extraction module. This step extracts independent DNA sequence features of unmanned vehicle component faults by calculating the content, position and transition probability of the bases in the transformed gene sequence.

B1. First, define a DNA sequence S = S₁, S₂, S₃, ..., S_N composed of O₁, O₂, O₃, O₄, with length N. If the base at the k-th (1 ≤ k ≤ N) data point position in the DNA sequence is B_i (1 ≤ i ≤ U), this is recorded as S_k = B_i. For two consecutive bases, if the base at the l-th (1 ≤ l ≤ N-1) data point position is B_i and the base at the (l+1)-th data point position is B_j, this is recorded as S_l S_l+1 = B_i B_j (1 ≤ i, j ≤ U).

B2. Definition of the base transition probability W_ij. First, n_i is defined as the number of times the single base B_i occurs in the DNA sequence S; for two consecutive bases, n_ij is the number of times the base pair B_i B_j occurs in the DNA sequence S. The specific calculation formula is as follows:

As a special case, if base B_i does not appear in the DNA sequence S, or appears only at the last position, the numerator of W_ij is taken to be 0, i.e. W_ij = 0.

In addition,

this is because:

so that W_ij can be regarded as the probability of base B_i transferring to base B_j; together, the W_ij form the base transition probability vector.

B3. Definition of the base content C_i. The content of base B_i (1 ≤ i ≤ U) in the DNA sequence S is calculated as follows:

For U-dimensional bases, the content vector is C₁, C₂, C₃, ..., C_U.

B4. Definition of the base position ratio D_i. The positions at which base B_i (1 ≤ i ≤ U) occurs in the DNA sequence S are marked S_i, and their superposition is expressed as follows:

Conversion gives the base position ratio D_i, whose mathematical expression is as follows:

For U-dimensional bases, the position ratio vector is D₁, D₂, D₃, ..., D_U.

Feature extraction on the encoded gene sequence thus yields usable U-dimensional vectors. Integrating the base transition probability vector, base content vector and base position ratio vector obtained in the above steps gives V_s = (W₁₁, W₁₂, ..., W_UU, C₁, ..., C_U, D₁, ..., D_U). These feature vectors are defined as the pre-judged candidate vehicle component fault genes.
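Steps B1 to B4 can be sketched in Python as follows. The exact formulas are not reproduced in this text, so the forms used here are assumptions chosen to agree with the stated properties: W_ij is n_ij divided by the number of occurrences of B_i that have a successor (which gives W_ij = 0 when B_i is absent or occurs only at the last position), C_i is n_i / N, and D_i is the sum of the positions of B_i divided by 1 + 2 + ... + N. The function name `sequence_features` and the sample sequence are hypothetical.

```python
from collections import Counter

def sequence_features(seq, alphabet="ATCG"):
    """Return (W, C, D) dictionaries for a base sequence such as "ATTCGA"."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    N = len(seq)
    n = Counter(seq)                      # n_i: occurrences of each base
    n_pair = Counter(zip(seq, seq[1:]))   # n_ij: occurrences of each adjacent pair
    W = {}
    for bi in alphabet:
        # Assumed denominator: occurrences of bi that are followed by another base.
        followed = n[bi] - (1 if seq[-1] == bi else 0)
        for bj in alphabet:
            W[bi + bj] = n_pair[(bi, bj)] / followed if followed > 0 else 0.0
    C = {b: n[b] / N for b in alphabet}   # assumed base content n_i / N
    position_total = N * (N + 1) / 2      # 1 + 2 + ... + N
    D = {b: sum(k for k, s in enumerate(seq, start=1) if s == b) / position_total
         for b in alphabet}               # assumed base position ratio
    return W, C, D

W, C, D = sequence_features("ATTCGA")
# Candidate fault gene: transition probabilities, then contents, then position ratios.
V_s = list(W.values()) + list(C.values()) + list(D.values())
print(len(V_s))
```

With four base symbols the vector V_s has 16 + 4 + 4 = 24 components, matching the layout (W₁₁, ..., W_UU, C₁, ..., C_U, D₁, ..., D_U).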

Step 6: establishing DNA sequence template library of fault module

In this step, the candidate fault gene feature vectors extracted in step 5 are input into a t-distributed stochastic neighbor embedding (t-SNE) clustering model, and a DNA sequence template library of the fault modules is established through fine cluster division. The template library corresponds to the 4 major assemblies of the unmanned automobile: an engine (FDJ) library, a chassis (DP) library, a vehicle body (CS) library and an electrical equipment (DQ) library. The abbreviations in brackets are the major class labels of the gene sequence expression, and one major class label can contain several minor class labels of the same fault type. It is worth noting that reducing the vibration signal directly to a 3-dimensional space by non-negative matrix factorization (NMF) would lose a large amount of key information. In the invention, NMF therefore first reduces the data to a small-to-medium multi-dimensional space U expressed by multi-dimensional base features, and the final clustering result is then obtained with the t-SNE clustering method, which achieves a soft clustering effect. Each clustering result corresponds to the fault of one component; the clustered results are passed to the model in step 7 for training, and the DNA sequence template is then used for a second, major-class division. t-SNE is a nonlinear dimensionality reduction algorithm suited to exploring high-dimensional data. The t-SNE DNA sequence clustering method for the vehicle fault modules comprises the following steps:

C1. The data are first transformed by stochastic neighbor embedding (SNE): the high-dimensional Euclidean distances between data points are converted into conditional probabilities that represent similarity. Specifically, the conditional probability p_j|i of the high-dimensional data points V_i, V_j is calculated as follows:

In the formula, V_i and V_j are data points in the DNA sequence S, and σ_i is the variance of the Gaussian centered on data point V_i.

C2. Conversion of high-dimensional data points to low-dimensional data points. Similarly, for the low-dimensional data points v_i, v_j, the conditional probability q_j|i is calculated in a similar way:

In this process, the stochastic neighbor embedding algorithm attempts to minimize the difference between the conditional probabilities. For t-SNE, assuming v obeys a t-distribution, one obtains:

where z is the number of pre-judged candidate vehicle component fault genes.

C3. Minimize the sum of the differences between the high-dimensional and low-dimensional conditional probabilities. In this step, SNE minimizes the Kullback-Leibler (KL) divergence using gradient descent; its cost function focuses on the local structure of the mapped data, and the heavy-tailed distribution of t-SNE alleviates the crowding problem when optimizing the function. To make the distributions P and Q as close as possible, the KL divergence must be made as small as possible. First calculate p_ij:

The smaller the KL divergence, the closer the two distributions. A KL divergence of 0 indicates that the distributions P and Q are identical. If the probability distribution of the points in the reduced feature space is similar to that of the points in the original feature space, well-defined clusters can be obtained. The cost function is minimized by gradient descent:

C4. Iterative optimization: optimize the objective function L and keep updating the low-dimensional data points until the corresponding optimal solution is obtained.

The optimal solution is a small number of clusters that can be expressed as FDJ, DP, CS and DQ.

Here y is the current iteration number, y_max is the maximum total number of iterations, η is the learning rate, α(y) is the learning momentum, and the quantity updated at each iteration is the set of low-dimensional data points.
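For reference, the standard SNE and t-SNE expressions from van der Maaten and Hinton's t-SNE paper, written in the symbols of C1 to C4, are collected below. They are the textbook forms and are an assumption about what these steps intend, not a transcription of the patent's own formulas. Sums run over all data points, and the update is written with a minus sign so that it descends the cost L.

```latex
% C1: high-dimensional conditional probability
p_{j|i} = \frac{\exp\left(-\lVert V_i - V_j \rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\left(-\lVert V_i - V_k \rVert^2 / 2\sigma_i^2\right)}

% C2: low-dimensional conditional probability (SNE) and Student-t form (t-SNE)
q_{j|i} = \frac{\exp\left(-\lVert v_i - v_j \rVert^2\right)}
               {\sum_{k \neq i} \exp\left(-\lVert v_i - v_k \rVert^2\right)}
\qquad
q_{ij} = \frac{\left(1 + \lVert v_i - v_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert v_k - v_l \rVert^2\right)^{-1}}

% C3: symmetrized probability and KL cost for n data samples
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}
\qquad
L = KL(P \,\Vert\, Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}

% C4: gradient and momentum update, y = 1, \dots, y_{\max}
\frac{\partial L}{\partial v_i} = 4 \sum_j (p_{ij} - q_{ij})(v_i - v_j)
                                  \left(1 + \lVert v_i - v_j \rVert^2\right)^{-1}
\qquad
v^{(y)} = v^{(y-1)} - \eta \frac{\partial L}{\partial v}
          + \alpha(y)\left(v^{(y-1)} - v^{(y-2)}\right)
```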

This step requires a large amount of historical fault data as support. The template library corresponds to the major fault classes: each class of gene feature expressions with the same attributes corresponds to the fault of one component, and the system finally issues a diagnosis and early-warning report. The optimal solution finally obtained is the clustering result, which can be expressed as the clusters FDJ, DP, CS and DQ and visualized as a clustering template of the DNA sequences of the 4 major components of the intelligent automobile. The labels of the template are expressed as follows:

template＝[FDJ,DP,CS,DQ] (19)

FDJ: engine; DP: chassis; CS: vehicle body; DQ: electrical equipment. At this point, the construction of the DNA sequence template library of the fault modules is complete.

Specifically, the construction of the template library may be summarized as:

Step 1: The pre-judged candidate vehicle component fault genes V_s obtained after non-negative matrix factorization dimension reduction are used as the input of the stochastic neighbor embedding (SNE) algorithm. The conditional probabilities p_j|i and q_j|i of the high-dimensional data points V_i, V_j and the low-dimensional data points v_i, v_j are derived respectively, and the difference between the conditional probabilities is then minimized to obtain the minimized conditional probability p_j|i of the high-dimensional data and the minimized conditional probability q_ij of the low-dimensional data.

The cost function L is minimized by gradient descent, where n is the number of data samples, and the optimal solution is finally calculated from the result.

The optimal solution is output as the clustering result of the t-SNE clustering algorithm. The output clusters correspond to the clustering templates of the DNA sequences of the 4 major components of the intelligent automobile.
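The summarized procedure can be sketched as a small NumPy t-SNE loop. This is an illustrative simplification, not the patented implementation: it uses one shared Gaussian width `sigma` instead of a per-point σ_i, omits the usual perplexity search and early exaggeration, and runs on synthetic blobs that stand in for the fault genes V_s. The function names and all parameter values are assumptions.

```python
import numpy as np

def pairwise_sq_dists(X):
    s = np.sum(X * X, axis=1)
    return np.maximum(s[:, None] + s[None, :] - 2.0 * X @ X.T, 0.0)

def joint_p(V, sigma=1.0):
    """Symmetric high-dimensional probabilities p_ij (one shared sigma, a simplification)."""
    n = V.shape[0]
    logits = -pairwise_sq_dists(V) / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)
    cond = np.exp(logits - logits.max(axis=1, keepdims=True))
    cond /= cond.sum(axis=1, keepdims=True)   # rows hold p_{j|i}
    return (cond + cond.T) / (2.0 * n)

def joint_q(v):
    """Student-t low-dimensional probabilities q_ij and the unnormalized kernel."""
    kernel = 1.0 / (1.0 + pairwise_sq_dists(v))
    np.fill_diagonal(kernel, 0.0)
    return kernel / kernel.sum(), kernel

def kl_divergence(P, Q, eps=1e-12):
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / (Q[mask] + eps))))

def tsne(V, dim=2, y_max=300, eta=1.0, seed=0):
    rng = np.random.default_rng(seed)
    P = joint_p(V)
    v = rng.normal(scale=1e-2, size=(V.shape[0], dim))
    step = np.zeros_like(v)
    history = []
    for y in range(y_max):
        Q, kernel = joint_q(v)
        history.append(kl_divergence(P, Q))
        coeff = (P - Q) * kernel
        grad = 4.0 * (np.diag(coeff.sum(axis=1)) - coeff) @ v
        alpha = 0.5 if y < 100 else 0.8       # momentum alpha(y)
        step = alpha * step - eta * grad      # gradient descent with momentum
        v = v + step
    return v, history

rng = np.random.default_rng(1)
centers = rng.normal(scale=3.0, size=(3, 8))
V = np.vstack([c + 0.3 * rng.normal(size=(10, 8)) for c in centers])
v, history = tsne(V)
print(v.shape, history[0], history[-1])
```

Grouping the embedded points into the FDJ, DP, CS and DQ clusters would need an additional assignment step on `v`, which this section does not detail.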

Step 7: AI modeling for unmanned vehicle fault identification

The encodable, pre-judged candidate vehicle component fault genes V_s obtained after conversion are normalized and then input into the model to train a vehicle fault diagnosis and identification classification model, that is, to perform multi-class classification of faults. The specific modeling process is as follows:

D1. Divide the data set. V_s is divided into a training set and a test set in proportions of 60% and 40% respectively. The evaluation indexes of the classification model are the fault classification accuracy (Accuracy) and the fault classification recall (Recall); the closer the values are to 1, the better the model performs.

D2. Establish a gated recurrent unit (GRU) deep learning model with a mapping relation to the DNA sequence feature template library, and optimize the network model parameters. The number of neurons in each layer of a deep neural network has a great influence on model performance, so to further improve performance, a multi-objective grasshopper optimization algorithm (MOGOA) is used to optimize the number of neurons in each layer. The optimization is performed simultaneously with the GRU modeling. The specific implementation details are as follows:

1) Select an optimization algorithm and initialize its parameters: the multi-objective grasshopper optimization algorithm (MOGOA) is selected to optimize the parameters of the GRU model. The number of iterations of the algorithm is set to 500, and the classification precision is

The iteration is stopped when a preset number of iterations is reached or a desired accuracy is met.

2) Set the optimization variable: the number of neurons Θ in each hidden layer of the gated recurrent unit (GRU) network model is set as the variable to be optimized. In this step, the recurrent structure of the GRU network model is set to 4 layers, with the output of each layer serving as the input of the next, so that a deep feature representation of the encodable data is learned.

3) Train the model. The training set and the initial number of neurons Θ₀ in each hidden layer of the GRU network model are taken as the input of the multi-objective grasshopper optimization algorithm (MOGOA), and the GRU network model with hidden layer neuron number Θ_κ is taken as the output, so as to train the GRU network model.

4) Perform multi-objective optimization of the model parameters (see Tharwat A, Houssein E H, Ahmed M, et al. MOGOA algorithm for constrained and unconstrained multi-objective optimization problems [J]. Applied Intelligence, 2018, 48(8): 2268-.). To further improve model performance, the MOGOA is used to optimize the number of hidden layer neurons of the GRU model so as to improve classification precision. The test set and the hidden layer neuron number Θ_κ are taken as the input of the target optimization functions of the MOGOA, and the target optimization function values are calculated. In each iteration, the GRU network model with hidden layer neuron number Θ_κ (i.e. the classifier) outputs one classification value, where κ represents the current iteration number, 0 ≤ κ ≤ 500.

The optimization target is set to maximize the classification accuracy and classification recall of the model for the various faults. When objective functions object1 and object2 are jointly optimized through multi-objective optimization, a Pareto front solution set containing several values of Θ is formed, in which each Θ corresponds to a jointly optimal pair of objective function values. The optimization objective functions are expressed as follows:

object1 = Max{Accuracy_FDJ(Θ), Accuracy_DP(Θ), Accuracy_CS(Θ), Accuracy_DQ(Θ)} (21)

object2 = Max{Recall_FDJ(Θ), Recall_DP(Θ), Recall_CS(Θ), Recall_DQ(Θ)} (23)

In the formulas, Θ_f is the number of neurons in layer f of the gated recurrent unit (GRU) network model and α_f is the combining weight; formulas (21) and (23) take the maximum value obtained by combining the four variables in the Max{} function. Θ is the number of neurons in the hidden layer; the subscript FDJ indicates engine failure, DP chassis failure, CS vehicle body failure and DQ electrical equipment failure. Accuracy_FDJ, Accuracy_DP, Accuracy_CS and Accuracy_DQ correspond in turn to the fault classification accuracy for the engine, the chassis, the vehicle body and the electrical equipment; Recall_FDJ, Recall_DP, Recall_CS and Recall_DQ correspond in turn to the fault classification recall for the engine, the chassis, the vehicle body and the electrical equipment. Accuracy = [Accuracy_FDJ(Θ), Accuracy_DP(Θ), Accuracy_CS(Θ), Accuracy_DQ(Θ)]; Recall = [Recall_FDJ(Θ), Recall_DP(Θ), Recall_CS(Θ), Recall_DQ(Θ)].
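The two objectives and the idea of a Pareto solution set can be illustrated with a few lines of Python. This is a sketch only: it does not implement MOGOA itself, the function names are hypothetical, and the candidate layer sizes and scores are made-up numbers for illustration, not results.

```python
def objectives(per_class_accuracy, per_class_recall):
    """object1 and object2 as written in (21) and (23): the maximum over the four fault classes."""
    return max(per_class_accuracy.values()), max(per_class_recall.values())

def pareto_front(candidates):
    """Keep the candidates whose (object1, object2) pair is not dominated by any other pair."""
    front = []
    for theta, score in candidates.items():
        dominated = any(
            other[0] >= score[0] and other[1] >= score[1] and other != score
            for other in candidates.values()
        )
        if not dominated:
            front.append(theta)
    return front

# Hypothetical per-class scores for one candidate Θ (illustrative values only).
accuracy = {"FDJ": 0.95, "DP": 0.90, "CS": 0.88, "DQ": 0.91}
recall = {"FDJ": 0.85, "DP": 0.90, "CS": 0.80, "DQ": 0.87}
print(objectives(accuracy, recall))

# Hypothetical candidates: Θ (neurons per layer) mapped to (object1, object2).
candidates = {
    (32, 16, 16, 8): (0.95, 0.90),
    (64, 32, 16, 8): (0.93, 0.94),
    (16, 16, 8, 8): (0.90, 0.88),
}
front = pareto_front(candidates)
print(front)
```

The third candidate is dropped because the first one is at least as good on both objectives; the first two trade accuracy against recall and both remain in the solution set.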

For a binary classification problem, sample classification has four possible outcomes: true positive (TP), false positive (FP), true negative (TN) and false negative (FN). The specific confusion matrix is explained as follows:

Accuracy is calculated as follows. For each sample point: when the classifier output is the same as the test set value and is positive, TP is incremented by 1; when the classifier output is opposite to the test set value and the output is positive, FP is incremented by 1; when the classifier output is opposite to the test set value and the output is negative, FN is incremented by 1; when the classifier output is the same as the test set value and is negative, TN is incremented by 1. The classifier here is not the final classification model; it only outputs classification results during training.

5) Update the search path of the number of neurons in each hidden layer of the gated recurrent unit (GRU) network model according to the product of the two target optimization function values, so that the product of the two objective function values in the next iteration is larger than the current product, and obtain the new hidden layer neuron number Θ_κ+1.

6) Set the search iteration number It = It + 1, take the new hidden layer neuron number Θ_κ+1 as the input of the target optimization functions of the multi-objective grasshopper optimization algorithm (MOGOA), and return to step 4) until the objective function values reach the expected level or the set number of iterations is completed. This completes the training of the GRU network model and yields the optimal parameter Θ_optimal; the GRU network model corresponding to Θ_optimal is the classification model.
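The 4-layer structure described in step 2), in which the output of each layer is the input of the next and the layer widths are given by Θ, can be sketched as a forward pass in NumPy. This is an untrained illustration under assumptions: the weights are random, the gate equations follow one common GRU convention, and the names `GRULayer`, `stacked_gru_forward` and the layer sizes are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRULayer:
    def __init__(self, n_in, n_hidden, rng):
        scale = 1.0 / np.sqrt(n_hidden)
        # One input and one recurrent matrix per gate: update z, reset r, candidate c.
        self.W = {g: rng.uniform(-scale, scale, (n_hidden, n_in)) for g in "zrc"}
        self.U = {g: rng.uniform(-scale, scale, (n_hidden, n_hidden)) for g in "zrc"}
        self.b = {g: np.zeros(n_hidden) for g in "zrc"}
        self.n_hidden = n_hidden

    def forward(self, sequence):
        h = np.zeros(self.n_hidden)
        outputs = []
        for x in sequence:
            z = sigmoid(self.W["z"] @ x + self.U["z"] @ h + self.b["z"])
            r = sigmoid(self.W["r"] @ x + self.U["r"] @ h + self.b["r"])
            c = np.tanh(self.W["c"] @ x + self.U["c"] @ (r * h) + self.b["c"])
            h = (1.0 - z) * h + z * c   # blend previous state and candidate
            outputs.append(h)
        return np.array(outputs)

def stacked_gru_forward(sequence, theta, seed=0):
    """Run a sequence through len(theta) GRU layers; theta holds neurons per layer."""
    rng = np.random.default_rng(seed)
    n_in = sequence.shape[1]
    out = sequence
    for n_hidden in theta:
        out = GRULayer(n_in, n_hidden, rng).forward(out)
        n_in = n_hidden                 # this layer's output feeds the next layer
    return out

sequence = np.random.default_rng(1).random((5, 24))   # 5 time steps of 24 features
out = stacked_gru_forward(sequence, theta=(32, 16, 16, 8))
print(out.shape)
```

A classifier would add an output layer over the four classes FDJ, DP, CS and DQ and train the weights; the optimizer described in steps 1) to 6) would then vary `theta` between training runs.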

When the real label matches the predicted label, the model has classified correctly and the fault diagnosis of the equipment is completed accurately. The classification result corresponds to the four fault types [FDJ, DP, CS, DQ] in the template library of step 6. It is then judged whether the classification result output by the classification model matches a fault class in the template library. If the fault belongs to a sub-fault of a fault class in the template library, it is assigned to the template library of that fault and marked as an old fault class.

If the fault category does not belong to any of the template libraries, the supervised online self-learning update of the DNA gene template library in step 8 is performed. The DNA sequence template library guides the direction of model training.
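The TP, FP, TN and FN counting described in step 4) and the two evaluation indexes from D1 can be sketched as follows. The standard definitions Accuracy = (TP + TN) / (TP + FP + TN + FN) and Recall = TP / (TP + FN) are used; the function names and the sample labels are illustrative assumptions, with 1 meaning "this fault class" and 0 meaning "not this class".

```python
def confusion_counts(predicted, actual):
    """Count TP, FP, TN, FN for one fault class, starting from zero."""
    tp = fp = tn = fn = 0
    for p, a in zip(predicted, actual):
        if p and a:
            tp += 1      # same as the test set value and positive
        elif p and not a:
            fp += 1      # opposite to the test set value, output positive
        elif not p and a:
            fn += 1      # opposite to the test set value, output negative
        else:
            tn += 1      # same as the test set value and negative
    return tp, fp, tn, fn

def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)

def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) else 0.0

predicted = [1, 1, 0, 0, 1, 0]   # classifier outputs (made-up)
actual = [1, 0, 0, 1, 1, 0]      # test set values (made-up)
tp, fp, tn, fn = confusion_counts(predicted, actual)
print(tp, fp, tn, fn, accuracy(tp, fp, tn, fn), recall(tp, fn))
```

Running the count once per class (FDJ, DP, CS, DQ) in a one-versus-rest fashion gives the per-class accuracy and recall values that enter object1 and object2.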

Step 8: supervised self-learning online update of DNA gene template library

For new faults that do not reach the threshold decision of the initial event detection module, the original vibration signals are input to step 8 for the supervised online self-learning update of the DNA gene template library. This step requires manual supervision: the type of fault is determined from past experience, together with the voltage, current, temperature and vibration signals that the sensors receive when the fault occurs. The minimum value T of the initial time sequence difference and the minimum discrimination thresholds A_dy, A_dl, A_wd, A_zd are then refreshed. If the manual inspection is passed, the DNA sequence template library of the fault modules in step 6 is supplemented and improved through a new round of training, and the fault is marked as a new fault.

For example, suppose the original determination condition is that the voltage amplitude exceeds 15.6 V within a time difference of 0.2 s. Because 15.6 V is only the initially set threshold, if a new, previously unseen fault occurs whose corresponding voltage threshold is only 15.4 V, the initial value of 15.6 V must be refreshed to the new value of 15.4 V, and the threshold determination condition of the voltage part is updated accordingly. The current, temperature and vibration thresholds are handled in the same way. Whenever a new fault occurs that does not reach the threshold of the previous iteration, a new refresh is performed, and the process must be completed under manual supervision.

If a small-amplitude fault is encountered whose threshold is too low, or even close to the amplitude of noise fluctuations, it is excluded from the refresh range. A new, more refined model must be established to check such faults, or they may even be left undiagnosed.
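The refresh rule of step 8 can be sketched as a small function. This is an illustration under assumptions: the name `refresh_threshold`, the `approved` flag standing for the manual inspection result, and the `noise_amplitude` cut-off (and its value 13.0 in the example) are hypothetical; only the 15.6 V and 15.4 V values come from the text.

```python
def refresh_threshold(current, observed, approved, noise_amplitude):
    """Return the threshold to use after a supervised review of an unseen fault.

    current:         threshold from the previous iteration
    observed:        amplitude reached by the newly observed fault
    approved:        True if the manual inspection accepted the new fault
    noise_amplitude: level at or below which faults are excluded from refresh
    """
    if not approved:
        return current       # manual inspection rejected the update
    if observed <= noise_amplitude:
        return current       # too close to noise: excluded from the refresh range
    return min(current, observed)

print(refresh_threshold(15.6, 15.4, approved=True, noise_amplitude=13.0))
```

The same function would be applied to the current, temperature and vibration thresholds, each with its own noise level.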

Step 9: parallel big data platform embedding

Considering the time consumed by the method and the real-time requirements of automobile equipment maintenance in practical engineering, the modules can be embedded into a parallel big data platform to accelerate model training and self-learning updates, so as to better meet application requirements. Available big data parallel computing framework platforms include MapReduce, Apache Spark and others (see Dittrich J, Quiané-Ruiz J A. Efficient big data processing in Hadoop MapReduce [J]. Proceedings of the VLDB Endowment, 2012, 5(12): 2014-2015.). These analysis engines and cluster computing systems for large-scale data processing are efficient, easy to use, general-purpose and compatible, and can largely meet the requirements of use.

Claims

1. A fault gene diagnosis method for an intelligent traffic unmanned vehicle is characterized by comprising the following steps:

S1, collecting the voltage waveform E_dy, current waveform E_dl, temperature waveform E_wd and vibration waveform E_zd of the whole section of the unmanned vehicle, and integrating the voltage waveform E_dy, current waveform E_dl, temperature waveform E_wd and vibration waveform E_zd into a data matrix X^*;

S5, training a gating cycle unit network model by utilizing the pre-judged candidate vehicle component fault gene to obtain a classification model;

preferably, the method further comprises the following steps:

S6, collecting the voltage waveform E_dy, current waveform E_dl, temperature waveform E_wd and vibration waveform E_zd of the unmanned vehicle in real time, integrating the voltage waveform E_dy, current waveform E_dl, temperature waveform E_wd and vibration waveform E_zd into a data matrix, and inputting the data matrix into the classification model to perform fault classification and identification.

2. The method as claimed in claim 1, wherein in S1, the data matrix X is used as input of RNNs neural network to remove outliers.

3. The method for diagnosing the fault gene of the intelligent traffic unmanned vehicle as claimed in claim 1 or 2, wherein step S2 is implemented as follows: setting the minimum value of the initial time sequence difference as T and the minimum discrimination thresholds as A_dy, A_dl, A_wd, A_zd; when the time difference between the initial data point and the final data point of the data matrix X, or of the data matrix after outlier removal, is greater than the threshold T and the amplitude is greater than the minimum thresholds A_dy, A_dl, A_wd, A_zd, judging that a fault has occurred, recording the data position and the change in waveform amplitude at that moment, and integrating and recording them as new matrix data E; wherein A_dy, A_dl, A_wd, A_zd are the voltage amplitude, current amplitude, temperature amplitude and vibration amplitude respectively.

4. The method as claimed in claim 1, wherein the candidate vehicle component fault gene is V_s = (W₁₁, W₁₂, ..., W_UU, C₁, ..., C_U, D₁, ..., D_U); wherein the probability of base B_i transferring to base B_j is

n_i is the number of times the single base B_i occurs in the DNA sequence S; B_i is the base at the i-th data point position in the DNA sequence S; 1 ≤ i ≤ U; U is the dimension of the feature vector represented by the base elements; N is the length of the DNA sequence S; n_ij is the number of times the base pair B_i B_j occurs in the DNA sequence S; the base content is

the base position ratio is

the position at which base B_i occurs in the DNA sequence S is marked S_i, wherein s_i is the value of S_i.

5. The method as claimed in claim 1, wherein the implementation procedure of S5 includes:

3) taking the test set and the hidden layer neuron number Θ_κ as the input of the two target optimization functions of the multi-objective grasshopper optimization algorithm (MOGOA), and calculating the values of the two target optimization functions; updating, according to the product of the two target optimization function values, the search path of the number of neurons in each hidden layer of the gated recurrent unit (GRU) network model, so that the product of the two objective function values in the next iteration is larger than the current product, thereby obtaining the new number of neurons Θ_κ+1 in each hidden layer;

4) adding 1 to the number of search iterations, taking the new number of neurons Θ_κ+1 in each hidden layer as the input of the target optimization functions of the MOGOA, and returning to step 3) until the target optimization function values of the MOGOA reach the expected precision or the set number of iterations is completed, thereby completing the training of the GRU network model and obtaining the optimal parameter Θ_optimal, the GRU network model corresponding to the optimal parameter Θ_optimal being the classification model; preferably, the two target optimization function expressions are:

object1＝Max{Accuracy_FDJ(Θ),Accuracy_DP(Θ),Accuracy_CS(Θ),Accuracy_DQ(Θ)}

object2＝Max{Recall_FDJ(Θ),Recall_DP(Θ),Recall_CS(Θ),Recall_DQ(Θ)}；

in the formulas, Θ_f is the number of neurons in layer f of the gated recurrent unit (GRU) network model and α_f is the combining weight; the optimization objective functions object1 and object2 take the maximum value obtained by combining the four variables in the Max{} function; Θ is the number of neurons in the hidden layer; the subscript FDJ represents engine failure, DP represents chassis failure, CS represents vehicle body failure and DQ represents electrical equipment failure; Accuracy_FDJ, Accuracy_DP, Accuracy_CS and Accuracy_DQ correspond in turn to the fault classification accuracy for the engine, the chassis, the vehicle body and the electrical equipment; Recall_FDJ, Recall_DP, Recall_CS and Recall_DQ correspond in turn to the fault classification recall for the engine, the chassis, the vehicle body and the electrical equipment; Accuracy = [Accuracy_FDJ(Θ), Accuracy_DP(Θ), Accuracy_CS(Θ), Accuracy_DQ(Θ)]; Recall = [Recall_FDJ(Θ), Recall_DP(Θ), Recall_CS(Θ), Recall_DQ(Θ)];

in each iteration, for each sample point in the test set: when the output classification value of the classifier is the same as the value of the sample point in the test set and is positive, adding 1 to the value of TP; when the output classification value of the classifier is opposite to the value of the sample point in the test set and the output classification value is positive, adding 1 to the value of FP; when the output classification value of the classifier is opposite to the value of the sample point in the test set and the output classification value is negative, adding 1 to the value of FN; when the output classification value of the classifier is the same as the value of the sample point in the test set and is negative, adding 1 to the value of TN; the classifier is the GRU network model whose hidden layer neuron number Θ_κ is determined in each iteration; wherein the initial values of the true positive TP, the false positive FP, the true negative TN and the false negative FN are all 0.

6. The intelligent traffic unmanned vehicle fault gene diagnosis method according to claim 1, further comprising: taking the pre-judged candidate vehicle component fault genes V_s as the input of the clustering model and building a template library.

7. The intelligent traffic unmanned vehicle fault gene diagnosis method according to claim 6, wherein the specific implementation process of building the template library comprises:

step 1: taking the pre-judged candidate vehicle component fault genes V_s obtained after non-negative matrix factorization dimension reduction as the input of the stochastic neighbor embedding (SNE) algorithm, obtaining the conditional probabilities p_j|i and q_j|i of the high-dimensional data points V_i, V_j and the low-dimensional data points v_i, v_j, and minimizing the difference between the conditional probabilities to obtain the minimized conditional probability p_j|i of the high-dimensional data and the minimized conditional probability q_ij of the low-dimensional data;

step 2: calculating, from the minimized conditional probabilities, the value p_ij that minimizes the difference between the high-dimensional and low-dimensional conditional probabilities,

Minimizing the cost function L by gradient descent:

obtaining the optimal solution, which is output as the clustering result of the t-SNE clustering algorithm, the clustering result corresponding to the clustering template of the DNA sequences of the 4 major components of the intelligent automobile:

template＝[FDJ,DP,CS,DQ]；

FDJ, DP, CS and DQ are large fault categories in a template library; FDJ: an engine; DP: a chassis; CS: a vehicle body; DQ: an electrical device; where n represents the number of data samples and KL represents the divergence.

8. The method as claimed in claim 6, further comprising, after S6:

S7, judging whether the fault category corresponding to the prediction sequence output by the fault classification model matches a fault category in the clustering result; if the fault category belongs to a sub-fault of a fault category in the clustering result, classifying it into that fault category; if not, updating and supplementing the fault categories in the clustering result; if the fault classification model outputs a result that cannot be matched with any fault category in the clustering result, judging by manual supervision whether to update; if so, setting the original signal threshold of the classification result as a new fault detection threshold and setting a new fault category in the clustering result; if not, directly discarding the classification result.

9. The method as claimed in claim 8, wherein updating the template library comprises: updating the minimum value T of the initial time sequence difference and the minimum discrimination thresholds A_dy, A_dl, A_wd, A_zd, returning to execute S1 to S4 to obtain new pre-judged candidate vehicle component fault genes, and taking the new pre-judged candidate vehicle component fault genes as the input of the clustering model to obtain an updated template library.

10. A fault gene diagnosis system for an intelligent traffic unmanned vehicle is characterized by comprising computer equipment; the computer device is configured or programmed for carrying out the steps of the method according to one of claims 1 to 9.