CN112257914A - Aviation safety cause and effect prediction method based on random forest - Google Patents


Info

Publication number
CN112257914A
Authority
CN
China
Prior art keywords
aviation safety
event
unsafe
prediction
model
Legal status
Granted
Application number
CN202011111711.8A
Other languages
Chinese (zh)
Other versions
CN112257914B (en)
Inventor
任博
崔利杰
刘嘉
王强
史越
胡良谋
李大伟
刘超
苗卓广
周之
王新河
Current Assignee
Air Force Engineering University of PLA
Original Assignee
Air Force Engineering University of PLA
Application filed by Air Force Engineering University of PLA
Priority to CN202011111711.8A
Publication of CN112257914A
Application granted
Publication of CN112257914B
Legal status: Active
Anticipated expiration


Classifications

    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06F18/2113 Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G06Q10/067 Enterprise or organisation modelling
    • G06Q50/40 Business processes related to the transportation industry
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Data Mining & Analysis (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • Development Economics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Educational Administration (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Algebra (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a random-forest-based aviation safety cause-and-effect prediction method comprising the following steps. S1: construct a Bow-tie-based correlation identification model for aviation safety causative variables and determine the key causative variables; S2: establish an aviation safety scale data acquisition list and label the features of the unsafe event data in the original safety database; S3: reduce the dimensionality of the unsafe event data features to obtain the airline's unsafe event modeling data; S4: construct an aviation safety situation prediction model based on a random forest model; S5: evaluate the prediction capability of the aviation safety situation prediction model; S6: based on an analysis of how the causative variables influence the predicted aviation safety situation, rank the contribution of the key causative variables to aviation unsafe events. The disclosed method can quantitatively measure an aviation safety situation that is otherwise difficult to measure, and is suitable both for quantitative evaluation of the current aviation safety state and for prediction of the future aviation safety situation.

Description

Aviation safety cause and effect prediction method based on random forest
Technical Field
The invention relates to the technical field of aviation safety, and in particular to a random-forest-based aviation safety cause-and-effect prediction method.
Background
Aviation safety prediction studies the internal relationship between aviation accidents and their causative factors in order to reveal the patterns of accident occurrence and predict future safety trends. An accurate aviation safety prediction model is therefore important for intelligent aviation safety management, decision-making in advance, emergency management and similar tasks.
However, aviation accidents result from the interwoven influence of many factors, including uncertain factors such as the flight environment, mission characteristics, aircraft quality, maintenance management mechanisms and human error. Accident causes are complex and are characterized by low frequency, randomness, time variability and high dimensionality, which makes prediction and modeling difficult. At present, aviation safety prediction is mainly time-series prediction, which studies how the accident pattern at previous moments influences the future accident trend; aviation safety time series are built and predicted with parametric, nonparametric, Bayesian network, artificial intelligence and other methods. Wang Yanyang used a concave spline interpolation function to predict and analyze a comprehensive aviation safety index, studied the influence of human factors on aviation safety, and improved the applicability to nonlinear data; Ding Songbin, Gan Xusheng, Lu Xumei and others studied flight accidents using BP neural networks, autoregressive moving average methods and the like.
However, these models are all "black box" models: the internal mechanism linking input and output is unknown, the influence of the inputs on the output is hard to determine, the predictor variables cannot be clearly interpreted, and the predicted output cannot be traced back to its inputs. Their support for aviation safety management is therefore limited.
Disclosure of Invention
In view of these problems, the invention aims to provide a random-forest-based aviation safety cause-and-effect prediction method that combines the Bow-tie model with a random forest algorithm for aviation safety cause-and-effect prediction. The method optimizes the parameters of the safety prediction model and ranks the contributions of the causative variables, so as to predict the key aviation safety factors and the trend of the aviation safety situation.
To achieve this purpose, the invention adopts the following technical scheme:
A random-forest-based aviation safety cause-and-effect prediction method, characterized by comprising the following steps:
S1: using the original safety database of an airline, construct a Bow-tie-based correlation identification model for aviation safety causative variables, and determine the key causative variables of aviation unsafe events;
S2: establish an aviation safety scale data acquisition list according to the key causative variables determined in step S1, and label the features of the unsafe event data in the original safety database;
S3: apply feature reduction to the unsafe event data features from step S2, taking the specific safety output into account, to reduce their dimensionality and obtain the airline's unsafe event modeling data;
S4: combine a random forest model with feature selection and sample sampling to obtain training and test sample subsets from the unsafe event modeling data obtained in step S3, and construct an aviation safety situation prediction model based on the random forest model;
S5: evaluate the prediction capability of the aviation safety situation prediction model using the original safety database of the airline;
S6: based on an analysis of the influence of the causative variables on the predicted aviation safety situation, rank the contribution of the key causative variables to aviation unsafe events.
Further, step S1 specifically comprises:
S11: using the original safety database of the airline, determine the causative variables X = (x_1, x_2, …, x_m) of aviation unsafe events and the corresponding types of aviation unsafe events Y = (y_1, y_2, …, y_n), with Y = g(X), where Y represents the n different types of aviation unsafe events and X represents the causative variables that lead to them;
S12: map the causative variables, possible consequences and corresponding control measures of aviation unsafe events onto the basic events BE, intermediate events IE, top event CE, control events SE and consequence events OE of the Bow-tie model: the causative variables correspond to the basic events BE, the control measures to the control events SE, the possible consequences to the consequence events OE, and the milder preceding-level consequences of those possible consequences to the intermediate events IE;
S13: let the occurrence probability of each basic event be p_BE. Consider the branches that can lead to the occurrence of the i-th consequence event OE_i, and assume that the k control events on the m-th branch occur with probabilities
Figure BDA0002728797440000031
Then the probability that the consequence event of the m-th branch occurs is
Figure BDA0002728797440000032
In the formula, when a link event occurs on a branch,
Figure BDA0002728797440000033
when a link event does not occur,
Figure BDA0002728797440000034
S14: based on the result of step S13, the occurrence probability of the consequence event OE_i is expressed as a function of the occurrence probabilities of the n basic events and the m consequence events, which constitutes the aviation safety causative variable correlation identification model:
Figure BDA0002728797440000041
S15: based on the correlation identification model established in step S14, determine the key causative variables of aviation unsafe events from among the causative variables.
Further, step S3 specifically comprises:
S31: using the mutual information principle, define the mutual information between a causative variable X and an aviation unsafe event Y as
$$I(X,Y)=\iint f(x,y)\,\log\frac{f(x,y)}{f(x)\,f(y)}\,\mathrm{d}x\,\mathrm{d}y$$
where I(X, Y) is the mutual information between the causative variable X and the aviation unsafe event Y; f(X, Y) is the joint probability density function of X and Y; and f(X) and f(Y) are the marginal probability density functions of X and Y, respectively;
S32: for any multivariate data sequence (X, Y) with sample size N, I(X, Y) is estimated as
$$I(X,Y)\approx\frac{1}{N}\sum_{i=1}^{N}\log\frac{f(x_i,y_i)}{f(x_i)\,f(y_i)}$$
S33: solve for the mutual information between the causative variable X and the aviation unsafe event Y using kernel density estimation and low-discrepancy deterministic sampling;
S34: determine the strength of the causal relationship between X and Y from the mutual information obtained in step S33, and obtain the airline's unsafe event modeling data according to this strength.
Further, step S4 specifically comprises:
S41: normalize all unsafe event modeling data;
S42: perform random sampling with replacement (bootstrap), repeatedly drawing from the normalized unsafe event modeling data K training sets, each with the same size N as the modeling data, and grow one decision tree from each training set;
S43: at each split node of each decision tree, randomly select M_try of all the causative variables as the candidate feature subset for splitting the current node, and choose the optimal split from this subset according to the classification and regression tree (CART) method;
S44: compute the importance of input variable x_j (j = 1, 2, 3, …, h) in the k-th tree
Figure BDA0002728797440000051
where h is the number of input causative variables; N_OOB is the number of out-of-bag samples; f_k(x_n) is the value for the n-th out-of-bag sample; f_k(x′_n) is the k-th tree's estimate for the n-th out-of-bag sample after the variable has been randomly permuted; and I(·) is an indicator function that equals 1 when f_k(x_n) = f_k(x′_n) and 0 otherwise;
S45: compute the importance score of input variable x_j over the entire random forest
Figure BDA0002728797440000052
S46: average the importance scores of input variable x_j obtained over multiple runs to obtain the weight of each causative variable;
S47: optimize the parameters of the regression aviation safety situation prediction model.
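The bagging, random-feature splitting and out-of-bag permutation importance of steps S42 to S46 can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the patent's implementation: trees are grown with a variance-reduction (CART regression) split, `m_try` defaults to about one third of the variables, and importance is measured as the increase in out-of-bag mean squared error after permuting a variable, a regression analogue swapped in for the indicator-based count of S44. All function names are hypothetical.

```python
import random


def best_split(X, y, idx, features, min_leaf):
    """Best CART regression split over `features`: (sse, feature, threshold) or None."""
    best = None
    for j in features:
        order = sorted(idx, key=lambda i: X[i][j])
        n = len(order)
        total = sum(y[i] for i in order)
        total_sq = sum(y[i] ** 2 for i in order)
        left_sum = left_sq = 0.0
        for pos in range(n - 1):
            v = y[order[pos]]
            left_sum += v
            left_sq += v * v
            n_left = pos + 1
            a, b = X[order[pos]][j], X[order[pos + 1]][j]
            if n_left < min_leaf or n - n_left < min_leaf or a == b:
                continue  # leaf too small, or no threshold separates equal values
            right_sum, right_sq = total - left_sum, total_sq - left_sq
            sse = (left_sq - left_sum ** 2 / n_left
                   + right_sq - right_sum ** 2 / (n - n_left))
            if best is None or sse < best[0]:
                best = (sse, j, (a + b) / 2)
    return best


def grow_tree(X, y, idx, m_try, depth, rng, min_leaf=3):
    """Grow a regression tree; leaves are floats, internal nodes are tuples."""
    mean = sum(y[i] for i in idx) / len(idx)
    if depth == 0 or len(idx) < 2 * min_leaf:
        return mean
    features = rng.sample(range(len(X[0])), m_try)  # random subspace of size M_try
    split = best_split(X, y, idx, features, min_leaf)
    if split is None:
        return mean
    _, j, t = split
    left = [i for i in idx if X[i][j] <= t]
    right = [i for i in idx if X[i][j] > t]
    return (j, t,
            grow_tree(X, y, left, m_try, depth - 1, rng, min_leaf),
            grow_tree(X, y, right, m_try, depth - 1, rng, min_leaf))


def predict(tree, x):
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree


def mse(tree, rows, targets):
    return sum((predict(tree, r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def grow_forest(X, y, n_tree=100, m_try=None, depth=5, seed=0):
    """Bagged trees plus OOB permutation importance averaged over the trees."""
    rng = random.Random(seed)
    n, h = len(X), len(X[0])
    m_try = m_try or max(1, h // 3)  # assumed default, not taken from the patent
    trees, importance, scored = [], [0.0] * h, 0
    for _ in range(n_tree):
        boot = [rng.randrange(n) for _ in range(n)]  # N rows drawn with replacement
        oob = sorted(set(range(n)) - set(boot))      # rows never drawn: out-of-bag
        tree = grow_tree(X, y, boot, m_try, depth, rng)
        trees.append(tree)
        if not oob:
            continue
        scored += 1
        targets = [y[i] for i in oob]
        base = mse(tree, [X[i] for i in oob], targets)
        for j in range(h):
            shuffled = [X[i][j] for i in oob]
            rng.shuffle(shuffled)  # break the link between x_j and the target
            permuted = []
            for i, v in zip(oob, shuffled):
                row = list(X[i])
                row[j] = v
                permuted.append(row)
            importance[j] += mse(tree, permuted, targets) - base
    return trees, [s / max(scored, 1) for s in importance]


def forest_predict(trees, x):
    """Regression forest output: the mean of the K tree predictions."""
    return sum(predict(t, x) for t in trees) / len(trees)
```

On synthetic data where only the first variable drives the target, the first importance score should dominate; on real modeling data, `n_tree` and `m_try` are the natural candidates for the parameter optimization of S47.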
Further, the normalization in step S41 specifically comprises
making all accident causative variables dimensionless using
Figure BDA0002728797440000053
where x_j is the normalized data; x_j.max and x_j.min are the maximum and minimum values, respectively; and x_j.true and x_j.pre are the actual and predicted values, respectively.
Further, step S5 specifically comprises evaluating the prediction capability of the regression aviation safety situation prediction model using the coefficient of determination R², the root mean square error RMSE and the relative root mean square error rRMSE as evaluation indexes, where
Figure BDA0002728797440000061
Figure BDA0002728797440000062
Figure BDA0002728797440000063
where x(i) is the i-th sample in the validation data set and x(i)_P is the aviation safety situation predicted by the model from the predictor variables of the i-th sample in the validation data set.
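Because the formula images for R², RMSE and rRMSE are not reproduced in this text, the sketch below assumes their most common definitions: R² = 1 - SS_res/SS_tot, RMSE as the square root of the mean squared prediction error, and rRMSE as RMSE divided by the mean of the observed values. The patent's exact formulas may differ, for example if it defines R² as a squared correlation coefficient or reports rRMSE as a percentage. The function name `evaluate` is illustrative.

```python
import math


def evaluate(actual, predicted):
    """Return (R^2, RMSE, rRMSE) under common textbook definitions (assumed)."""
    n = len(actual)
    mean_actual = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    rrmse = rmse / mean_actual  # relative to the mean observed value
    return r2, rmse, rrmse


# x(i) from a validation set and the model's x(i)_P (toy values)
r2, rmse, rrmse = evaluate([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
# r2 = 0.8, rmse = 0.5, rrmse = 0.2
```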
The beneficial effects of the invention are as follows:
1. The invention provides a random-forest-based aviation safety prediction method that can analyze and predict how the relationship between aviation unsafe events and their key causative factors changes, which is an important means for intelligent aviation management and decision support. Among machine learning algorithms, the random forest model has clear advantages in parameter optimization, variable ranking, and subsequent variable analysis and interpretation, and its correlation coefficient and prediction accuracy are clearly better than those of other models such as linear models, relevance vector machines and neural networks. It is therefore better suited to predicting aviation safety trends and identifying key causative factors;
2. Building the aviation safety causation model on the Bow-tie model reflects the internal operating mechanism of aviation safety well. The mutual information index of the data shows that the random forest predicts aviation safety well, with a prediction accuracy above 90%, and that it describes the nonlinear relationship between aviation safety causative variables and aviation safety well;
3. The prediction method shifts from traditional time-series prediction of a single accident type to studying the mechanism by which causative factors influence accidents. By studying how causative factors (flight intensity, equipment failure, environment, weather, etc.) change, it further studies how accident patterns change, which addresses the problems of scarce accident samples and difficult measurement and makes the hard-to-measure aviation safety situation quantitatively measurable;
4. The prediction method processes aviation unsafe event data with mutual information, which accurately measures the causal relationship between aviation unsafe events and the causative variables. Modeling with the unsafe event data selected according to this causal relationship yields higher modeling accuracy and more accurate predictions.
Drawings
FIG. 1 is a schematic diagram of the Bow-tie model of the present invention.
FIG. 2 is a sample distribution diagram of the training and validation databases of the present invention.
FIG. 3 is a flow chart of the random-forest-based regression prediction of the aviation safety situation according to the invention.
FIG. 4 is a diagram of the aviation safety random forest model of the present invention.
FIG. 5 is a diagram of the random forest parameter optimization results of the present invention.
FIG. 6 is an error estimation diagram of the aviation safety random forest model of the present invention.
FIG. 7 is a graph of variable screening for the random forest regression model of the present invention.
FIG. 8 is a diagram of the estimated aviation safety situation predicted by the random forest model according to the invention.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the following further describes the technical solution of the present invention with reference to the drawings and the embodiments.
A random-forest-based aviation safety cause-and-effect prediction method comprises the following steps:
S1: using the original safety database of an airline, construct a Bow-tie-based correlation identification model for aviation safety causative variables, and determine the key causative variables of aviation unsafe events;
The Bow-tie model inherits the advantages of two safety analysis tools, the fault tree and the event tree. It integrates factors such as accident causes, preventive measures, possible consequences and corresponding control measures, establishes the relationships between preceding events, subsequent events and accident consequences, overcomes the insufficient quantification of traditional accident analysis models, and presents a bow-tie diagram linking the precursors of an accident to its consequences. The Bow-tie model is shown schematically in FIG. 1; it comprises five types of events: basic events BE, intermediate events IE, the top event CE, control events SE and consequence events OE.
The invention constructs a Bow-tie-based correlation identification model for aviation safety causative variables; determining the causative variables of aviation unsafe events specifically comprises:
S11: using the original safety database of the airline, determine the causative variables X = (x_1, x_2, …, x_m) of aviation unsafe events and the corresponding types of aviation unsafe events Y = (y_1, y_2, …, y_n), with Y = g(X), where Y represents the n different types of aviation unsafe events and X represents the causative variables that lead to them;
S12: map the causative variables, possible consequences and corresponding control measures of aviation unsafe events onto the basic events BE, intermediate events IE, top event CE, control events SE and consequence events OE of the Bow-tie model: the causative variables correspond to the basic events BE, the control measures to the control events SE, the possible consequences to the consequence events OE, and the milder preceding-level consequences of those possible consequences to the intermediate events IE;
S13: let the occurrence probability of each basic event be p_BE. Consider the branches that can lead to the occurrence of the i-th consequence event OE_i, and assume that the k control events on the m-th branch occur with probabilities
Figure BDA0002728797440000091
Then the probability that the consequence event of the m-th branch occurs is
Figure BDA0002728797440000092
In the formula, when a link event occurs on a branch,
Figure BDA0002728797440000093
when a link event does not occur,
Figure BDA0002728797440000094
S14: based on the result of step S13, the occurrence probability of the consequence event OE_i is expressed as a function of the occurrence probabilities of the n basic events and the m consequence events, which constitutes the aviation safety causative variable correlation identification model:
Figure BDA0002728797440000095
S15: based on the correlation identification model established in step S14, determine the key causative variables of aviation unsafe events from among the causative variables.
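The formula images for S13 and S14 are not reproduced here, so the following is only a sketch of how bow-tie probabilities are commonly combined, under assumptions the text does not state: the basic events feed the top event CE through an OR gate of independent events, each control event SE on a branch either occurs (probability p) or does not (probability 1 - p), and the probability of a consequence event OE_i is the sum over all branches that end in it. Function names and numbers are hypothetical.

```python
from math import prod


def top_event_probability(p_basic):
    """Assumed OR gate: the top event CE occurs if any independent basic event BE occurs."""
    return 1.0 - prod(1.0 - p for p in p_basic)


def branch_probability(p_top, p_control, occurred):
    """Probability of one event-tree branch on the consequence side of the bow tie.

    p_control[k] is the probability that control event k occurs (the barrier works);
    occurred[k] says whether it occurs on this branch (factor p) or not (factor 1 - p).
    """
    p = p_top
    for q, happens in zip(p_control, occurred):
        p *= q if happens else 1.0 - q
    return p


def consequence_probability(p_top, p_control, branches):
    """P(OE_i): sum of the probabilities of all branches that end in OE_i."""
    return sum(branch_probability(p_top, p_control, b) for b in branches)


# Hypothetical example: three causative factors, two barriers, and a
# consequence that is reached only when both barriers fail.
p_ce = top_event_probability([0.01, 0.02, 0.005])
p_oe = consequence_probability(p_ce, [0.95, 0.90], [(False, False)])
```

Summing over every combination of barrier outcomes recovers the top-event probability, which is a quick sanity check on any branch-to-consequence mapping.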
Combining the safety supervision data of a certain airline yields the specific aviation safety causative variables shown in Table 1 below; for reasons of airline confidentiality, Table 1 does not list all of them. From Table 1, the key causative variables of aviation unsafe events can be determined to mainly comprise aircraft systems, weather causes, accident causes, free planning, resource management and the like.
TABLE 1 analysis of critical risks and hazard sources for aviation safety based on the Bow-tie model
Figure BDA0002728797440000101
S2: establish an aviation safety scale data acquisition list according to the key causative variables determined in step S1, and label the features of the unsafe event data in the original safety database;
Specifically, data of unsafe events in 2016-. The unsafe event data associated with these 6 causative variables are collected from the airline's original safety database.
TABLE 2 Causative variable indices
Figure BDA0002728797440000111
S3: apply feature reduction to the unsafe event data features from step S2, taking the specific safety output into account, to reduce their dimensionality and obtain the airline's unsafe event modeling data;
Mutual information not only represents the relationship between two random variables but also reflects the strength of that relationship. The mutual information I(x, y) is the amount of information about x obtained after receiving message y, i.e. the uncertainty of x before receiving y minus the uncertainty that remains after receiving y. Mutual information captures both linear correlation and nonlinear relationships between variables, measures their degree of interdependence and the amount of information they share, is not restricted by the form of the variables' distributions, and can be applied to probability distributions of arbitrary, irregular shape.
Specifically, the mutual information principle is applied to reduce the dimensionality of the unsafe event data from step S2 and obtain the airline's unsafe event modeling data:
S31: using the mutual information principle, define the mutual information between a causative variable X and an aviation unsafe event Y as
$$I(X,Y)=\iint f(x,y)\,\log\frac{f(x,y)}{f(x)\,f(y)}\,\mathrm{d}x\,\mathrm{d}y$$
where I(X, Y) is the mutual information between the causative variable X and the aviation unsafe event Y; f(X, Y) is the joint probability density function of X and Y; and f(X) and f(Y) are the marginal probability density functions of X and Y, respectively;
S32: for any multivariate data sequence (X, Y) with sample size N, I(X, Y) is estimated as
$$I(X,Y)\approx\frac{1}{N}\sum_{i=1}^{N}\log\frac{f(x_i,y_i)}{f(x_i)\,f(y_i)}$$
S33: solve for the mutual information between the causative variable X and the aviation unsafe event Y using kernel density estimation and low-discrepancy deterministic sampling;
S34: determine the strength of the causal relationship between X and Y from the mutual information obtained in step S33, and obtain the airline's unsafe event modeling data according to this strength, which completes the feature reduction of the causative variables with respect to the specific safety output and hence the dimensionality reduction.
When the causative variable X is completely unrelated to, i.e. independent of, the aviation unsafe event Y, the mutual information takes its minimum value of zero, meaning the two variables share no information. Conversely, the more strongly they depend on each other, the larger the mutual information and the more information they share. Solving for the mutual information therefore gives the strength of the causal relationship between X and Y, and the data with stronger causal relationships are selected as the airline's unsafe event modeling data.
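Steps S31 to S34 can be illustrated with a small kernel density estimate of mutual information. The sketch below is an assumption-laden stand-in rather than the patent's procedure: it uses Gaussian product kernels with Silverman's rule-of-thumb bandwidth and evaluates the sample estimate of S32 at the observed points (a plug-in estimate), instead of the low-discrepancy deterministic sampling named in S33. It costs O(N²) time, which is acceptable for a few hundred records; the function names and toy data are hypothetical.

```python
import math
import random
import statistics


def silverman_bandwidth(values):
    """Rule-of-thumb bandwidth for a 1-D Gaussian kernel."""
    return 1.06 * statistics.stdev(values) * len(values) ** (-1 / 5)


def gauss_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def kde_mutual_information(xs, ys):
    """Plug-in estimate: mean of log f(x,y) / (f(x) f(y)) over the samples, in nats."""
    n = len(xs)
    hx, hy = silverman_bandwidth(xs), silverman_bandwidth(ys)
    total = 0.0
    for xi, yi in zip(xs, ys):
        kx = [gauss_kernel((xi - xj) / hx) for xj in xs]
        ky = [gauss_kernel((yi - yj) / hy) for yj in ys]
        fx = sum(kx) / (n * hx)
        fy = sum(ky) / (n * hy)
        fxy = sum(a * b for a, b in zip(kx, ky)) / (n * hx * hy)  # product kernel
        total += math.log(fxy / (fx * fy))
    return total / n


# Toy data: one variable that is related to the target and one that is not.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
y_related = [v + 0.5 * rng.gauss(0, 1) for v in x]
y_unrelated = [rng.gauss(0, 1) for _ in range(200)]
mi_related = kde_mutual_information(x, y_related)
mi_unrelated = kde_mutual_information(x, y_unrelated)
```

Because every point also contributes to its own density estimate, this plug-in estimate is biased slightly upward for independent variables at small N, so it is best used to rank causative variables rather than as an absolute value; the cut-off for "strong enough" in S34 is not given in the text and would have to be chosen.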
S4: combine a random forest model with feature selection and sample sampling to obtain training sample subsets from the unsafe event modeling data obtained in step S3, and construct an aviation safety situation prediction model based on the random forest model;
Random forest is a machine learning method that can be used for sample classification. The number of decision trees generated by the model (N_tree) and the number of candidate split attributes (M_try) play a key role in sample classification and affect the accuracy of the results. Random forest regression is highly tolerant of noisy data and predicts well on high-dimensional data. It consists of a set of independent regression decision trees {h(x, θ_k), k = 1, 2, 3, …, K}, i.e. K decision trees combined into an ensemble, expressed as
Figure BDA0002728797440000131
where x is the vector of aviation safety causal variables, K is the number of decision trees, and the θ_k are independent and identically distributed random vectors.
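For regression, a random forest's output is the average of the K individual tree predictions. A minimal sketch, assuming each tree is any callable that maps a feature vector to a number; the stand-in trees below are hypothetical placeholders, not fitted CART trees.

```python
from typing import Callable, Sequence

# Any callable mapping a feature vector x to a numeric prediction stands in for a tree h(x, theta_k).
Tree = Callable[[Sequence[float]], float]


def forest_predict(trees: Sequence[Tree], x: Sequence[float]) -> float:
    """Random forest regression output: the mean of the K individual tree predictions."""
    if not trees:
        raise ValueError("the forest needs at least one tree")
    return sum(tree(x) for tree in trees) / len(trees)


# Three hypothetical stand-in "trees" (K = 3)
toy_trees = [lambda x: x[0], lambda x: 2 * x[0], lambda x: 3.0]
prediction = forest_predict(toy_trees, [1.0])  # (1.0 + 2.0 + 3.0) / 3 = 2.0
```

Averaging is what lets individually noisy, decorrelated trees produce a smoother ensemble estimate.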
To improve prediction accuracy and prevent overfitting, the random forest model uses bagging to obtain the training sample subsets and the random subspace method to obtain the node-splitting features, as shown in FIG. 3. The specific steps are as follows.
S41: Normalize all unsafe-event data.
Because different types of unsafe aviation events are measured in different units, all unsafe-event data are normalized by
Figure BDA0002728797440000132
where x_j is the normalized data; x_j.max and x_j.min are the maximum and minimum values, respectively; and x_j.true and x_j.pre are the actual and predicted values, respectively.
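The exact normalization formula appears only as an image in the source. The sketch below assumes the common min-max form x' = (x − min)/(max − min), applied independently to each column; mapping a constant column to 0 is an added assumption that avoids division by zero.

```python
from typing import List, Sequence


def min_max_normalize(column: Sequence[float]) -> List[float]:
    """Rescale one variable to [0, 1] with x' = (x - min) / (max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to 0 to avoid dividing by zero.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def normalize_table(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Normalize every column of a row-major table independently."""
    columns = [min_max_normalize(col) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]


table = [[1.0, 10.0], [3.0, 30.0], [2.0, 20.0]]  # hypothetical raw data in different units
normalized = normalize_table(table)  # [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]]
```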
S42: Using random sampling with replacement (bootstrap), repeatedly draw K training samples from the normalized original samples, each of the same size N as the original data set; each training sample is used to grow one decision tree. In each bootstrap draw, the probability that a given sample is never selected is (1 − 1/N)^N, which approaches 1/e ≈ 0.368 as N tends to infinity. The unselected samples are called out-of-bag (OOB) data. Because they take no part in building the regression tree, they can be used to estimate the prediction error on out-of-bag data (the OOB error) and to evaluate the contribution of each independent variable to the dependent variable. In addition, the OOB error can be used to verify the generalization ability of the model without a separate test set. Because the K training samples obtained by bagging differ from one another, the resulting regression trees also differ.
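Bootstrap sampling with out-of-bag bookkeeping, as in S42, can be sketched with the standard library alone. The function names and the values K = 5 and N = 200 are illustrative assumptions; the empirical OOB fraction should land near 1/e ≈ 0.368.

```python
import math
import random
from typing import List, Tuple


def bootstrap_with_oob(n: int, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Draw n row indices with replacement; rows never drawn form the out-of-bag (OOB) set."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    drawn = set(in_bag)
    oob = [i for i in range(n) if i not in drawn]
    return in_bag, oob


def p_not_drawn(n: int) -> float:
    """Probability that a given row is absent from one bootstrap sample: (1 - 1/n) ** n."""
    return (1.0 - 1.0 / n) ** n


rng = random.Random(42)
K, N = 5, 200
bags = [bootstrap_with_oob(N, rng) for _ in range(K)]  # one (in-bag, OOB) pair per tree
oob_fraction = sum(len(oob) for _, oob in bags) / (K * N)  # empirical share of OOB rows
expected_oob = 1 / math.e  # limit of (1 - 1/N) ** N as N grows
```

Each tree is then fitted on its `in_bag` rows and scored on its `oob` rows, which is what makes the OOB error usable in place of a separate test set.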
FIG. 2 shows the distribution of the training and validation samples obtained with this method. The scattered points are the parameter values in the samples, and the box plots show distribution statistics such as the mean, maximum, minimum and median.
S43: After the K training samples have been obtained by bagging, the random subspace method is applied when growing each regression tree: at every split node, M_try variables are drawn at random from all causal variables as the candidate feature subset for splitting that node, and the optimal split among them is selected according to the classification and regression tree (CART) method. Regression trees obtained with the random subspace method are therefore random and mutually independent.
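One node split under the random subspace method of S43 is sketched below: draw M_try candidate variables at random, then choose the (variable, threshold) pair that minimizes the summed squared error of the two child nodes, which is the CART criterion for regression. `best_random_subspace_split` is a hypothetical single-node helper, not a full tree builder.

```python
import random
from typing import Optional, Sequence, Tuple


def sse(values: Sequence[float]) -> float:
    """Sum of squared deviations from the mean (the CART regression impurity)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_random_subspace_split(
    X: Sequence[Sequence[float]], y: Sequence[float], m_try: int, rng: random.Random
) -> Optional[Tuple[int, float, float]]:
    """Return (feature, threshold, child SSE) of the best split among m_try random features,
    or None when no candidate feature can be split."""
    n_features = len(X[0])
    candidates = rng.sample(range(n_features), min(m_try, n_features))
    best = None
    for j in candidates:
        # Every distinct value except the largest keeps both children non-empty.
        for threshold in sorted({row[j] for row in X})[:-1]:
            left = [t for row, t in zip(X, y) if row[j] <= threshold]
            right = [t for row, t in zip(X, y) if row[j] > threshold]
            cost = sse(left) + sse(right)
            if best is None or cost < best[2]:
                best = (j, threshold, cost)
    return best


X = [[0.0, 5.0], [1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]  # hypothetical node data
y = [0.0, 0.0, 10.0, 10.0]
split = best_random_subspace_split(X, y, m_try=2, rng=random.Random(1))
# split == (0, 1.0, 0.0): variable 0 at threshold 1.0 separates the two groups exactly
```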
S44: Besides estimating the aviation safety situation accurately, the random forest regression model also gives an importance score for every variable, that is, the degree to which each input influences the output. The Gini coefficient and the OOB error are the common bases for variable importance statistics; in the invention, the importance of each variable is obtained from the OOB error. For input variable x_j (j = 1, 2, 3, …, h), its importance I_k in the k-th tree is the difference between the estimation errors on the OOB data before and after the variable is randomly permuted, i.e.
Figure BDA0002728797440000141
where h is the number of input causal variables; N_OOB is the number of out-of-bag samples; f_k(x_n) is the estimate of the n-th out-of-bag sample on the k-th tree; f_k(x′_n) is the estimate of the n-th out-of-bag sample on the k-th tree after the variable has been randomly permuted; and I(·) is an indicator function: when
Figure BDA0002728797440000142
I(·) takes the value 1; otherwise it is 0.
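The per-tree importance in S44 compares the OOB error before and after one variable is randomly permuted. The patent's formula (shown as an image) uses an indicator function; the sketch below instead uses the regression variant, measuring the increase in OOB mean squared error. The stand-in tree and the data are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def mse(model: Model, X: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """Mean squared error of a model on rows X with targets y."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance_one_tree(
    tree: Model, X_oob: Sequence[Sequence[float]], y_oob: Sequence[float], j: int, rng: random.Random
) -> float:
    """OOB error after randomly permuting column j minus the OOB error before it."""
    base = mse(tree, X_oob, y_oob)
    column = [row[j] for row in X_oob]
    rng.shuffle(column)
    X_perm: List[List[float]] = []
    for row, value in zip(X_oob, column):
        new_row = list(row)
        new_row[j] = value  # only variable x_j is replaced; the others keep their values
        X_perm.append(new_row)
    return mse(tree, X_perm, y_oob) - base


def stand_in_tree(row: Sequence[float]) -> float:
    """Hypothetical fitted tree that only uses variable 0."""
    return row[0]


X_oob = [[0.0, 9.0], [1.0, 8.0], [2.0, 7.0], [3.0, 6.0], [4.0, 5.0], [5.0, 4.0]]
y_oob = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
rng = random.Random(7)
imp_used = permutation_importance_one_tree(stand_in_tree, X_oob, y_oob, 0, rng)
imp_ignored = permutation_importance_one_tree(stand_in_tree, X_oob, y_oob, 1, rng)
```

A variable the tree never consults scores exactly 0, while scrambling a variable it relies on can only raise the error here.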
S45: Calculate the importance score of input variable x_j over the whole random forest:
Figure BDA0002728797440000143
S46: Average the importance scores of input variable x_j obtained over multiple runs to obtain the weight of each causal variable.
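S45 and S46 turn per-tree scores into forest-level scores and then into variable weights, which also yields the ranking used in S6. A minimal sketch, assuming a simple mean over trees and over repeated runs; clipping negative averages to zero and normalizing the weights to sum to 1 are illustrative choices, not taken from the source, and the factor names and scores are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def forest_importance(per_tree: Sequence[Dict[str, float]]) -> Dict[str, float]:
    """Average each variable's per-tree importance over the K trees of one forest."""
    names = per_tree[0].keys()
    return {name: sum(scores[name] for scores in per_tree) / len(per_tree) for name in names}


def importance_weights(runs: Sequence[Dict[str, float]]) -> List[Tuple[str, float]]:
    """Average forest-level scores over repeated runs, clip negatives to 0, normalize to sum 1,
    and return (variable, weight) pairs sorted from most to least important."""
    names = runs[0].keys()
    mean = {name: max(0.0, sum(r[name] for r in runs) / len(runs)) for name in names}
    total = sum(mean.values()) or 1.0  # guard against all-zero scores
    return sorted(((n, v / total) for n, v in mean.items()), key=lambda p: p[1], reverse=True)


# Hypothetical forest-level scores from two repeated runs
runs = [{"environment": 3.0, "management": 1.0}, {"environment": 5.0, "management": -1.0}]
ranking = importance_weights(runs)  # [("environment", 1.0), ("management", 0.0)]
```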
Taking the aviation safety data of an airline from 2017 to 2019 as the research object, an aviation safety prediction model based on random forests is constructed, as shown in FIG. 4.
S47: Optimize the parameters of the regression aviation safety situation prediction model.
Random forest already achieves good results with default parameters; parameter tuning consists of determining the number of decision trees N_trees and the maximum number of features considered per split, Leaf. The optimization algorithm searches the parameter space to determine the optimal parameters; the process and results of optimizing the random forest model parameters are shown in FIG. 5.
The number of decision trees defaults to 100, and N_trees = 100, 200, 300 and 400 form the list of candidate tree numbers. The default number of features m is
Figure BDA0002728797440000151
(N is the total number of features), so Leaf = 3, 4 and 5 form the list of candidate maximum feature numbers. FIG. 5 shows that different combinations of k (the number of trees) and m (the maximum number of features) strongly affect the prediction accuracy of the model: with m held fixed, the larger k, the higher the accuracy; likewise, with k held fixed, the larger m, the higher the accuracy. The optimization yields k = 200 and m = 2, at which the model accuracy is high and varies little. FIG. 6 shows how the OOB error of the aviation safety causal prediction varies with the number of trees K.
In FIG. 6, the horizontal axis is the number of trees in the random forest and the vertical axis is the mean square error of the model. As the number of trees increases, the model error gradually decreases, and the OOB error levels off at K = 150. The number of trees in the aviation safety prediction model is therefore set to 150 in the invention.
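The tuning in S47 amounts to a grid search over the number of trees and the number of features per split, plus reading off where the OOB error curve levels off. In this sketch the caller supplies `oob_error`, a hypothetical function that fits a forest and returns its OOB error; the tolerance that decides when the curve counts as flat is also an assumption, and the toy error values are invented for illustration.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Tuple


def grid_search(
    oob_error: Callable[[int, int], float],
    n_trees_grid: Iterable[int],
    max_features_grid: Iterable[int],
) -> Tuple[int, int]:
    """Return the (n_trees, max_features) pair with the lowest OOB error."""
    return min(product(n_trees_grid, max_features_grid), key=lambda p: oob_error(*p))


def plateau_tree_count(error_by_k: Dict[int, float], tol: float) -> int:
    """Smallest tree count whose OOB error is within tol of the error at the largest count,
    i.e. the point where the OOB error curve has levelled off."""
    final = error_by_k[max(error_by_k)]
    return min(k for k, e in error_by_k.items() if abs(e - final) <= tol)


# Hypothetical error surface and OOB curve, for illustration only
best = grid_search(lambda k, m: (k - 200) ** 2 + (m - 2) ** 2, [100, 200, 300, 400], [1, 2, 3])
k_stable = plateau_tree_count({50: 0.30, 100: 0.22, 150: 0.201, 200: 0.200}, tol=0.005)
```

Choosing the smallest tree count on the plateau trades a negligible rise in OOB error for a smaller, faster forest.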
S5: Evaluate the prediction capability of the aviation safety situation prediction model against the airline's original safety database.
Specifically, the coefficient of determination R², the root mean square error RMSE and the relative root mean square error rRMSE are used as evaluation indices for the prediction capability of the regression aviation safety situation prediction model, as follows:
Figure BDA0002728797440000152
Figure BDA0002728797440000161
Figure BDA0002728797440000162
where x(i) is the i-th sample in the validation data set, and x(i)P is the aviation safety situation predicted by the model from the predictor variables of the i-th sample point in the validation data set.
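The evaluation indices of S5 appear only as images in the source. The sketch below assumes the usual definitions: R² = 1 − SS_res/SS_tot, RMSE as the square root of the mean squared residual, and rRMSE as RMSE divided by the mean observed value. The sample values are hypothetical.

```python
import math
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error of the predictions."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def rrmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """RMSE relative to the mean observed value (multiply by 100 for a percentage)."""
    return rmse(observed, predicted) / (sum(observed) / len(observed))


observed, predicted = [2.0, 4.0], [3.0, 3.0]  # hypothetical validation values x(i), x(i)P
scores = (r_squared(observed, predicted), rmse(observed, predicted), rrmse(observed, predicted))
```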
S6: Based on an analysis of how the causal variables influence the aviation safety situation prediction, rank the contributions of the key causal variables to unsafe aviation events.
The random forest model assesses the relative importance of each variable by how much that variable improves the prediction accuracy of the overall model, and ranks the contributions of the predictor variables accordingly: the more a variable contributes to prediction accuracy, the more important it is. An aviation safety prediction model was built on the random forest model, and 5 features were extracted and ranked by importance; the ranking is shown in FIG. 7.
FIG. 7 measures the importance to aviation safety of the variables "environment, facilities and equipment, external factors, human factors and management". Environmental factors have the greatest influence and require close monitoring. For example, bird-dispersal frequency should be increased to reduce the effect of bird strikes, and forecasting of severe weather should be strengthened so that flight crews are informed and can respond in time, commanders are advised to change plans, and crews are required to take aircraft measures suited to special weather conditions (de-icing, sand protection, etc.). Human factors and facilities and equipment rank next, with comparable influence. Management factors have little influence on the prediction result and can be ignored to reduce model complexity and improve computational efficiency.
Furthermore, the prediction accuracy of the random-forest-based aviation safety causal prediction method was analyzed: variables were selected with the random forest model, and the model was trained on equipment factors, human factors, environmental factors and so on.
FIG. 8 is a scatter plot of the aviation safety values predicted by the random forest method against the actual values. The results show that the predicted and measured values are highly correlated and that the RMSE and rRMSE are satisfactory, so predicting the aviation safety situation with a random forest model is feasible.
Simulation experiment:
In aviation safety prediction, the influence mechanism of aviation safety is explained through the relationship between the predicted response variable and the causal input variables, which enables aviation safety to be predicted in the spatial or temporal dimension; methods used for this include artificial neural networks and support vector machines. However, changes in aviation safety are affected by a complex environment and many uncertainties, exhibit complex high-dimensional nonlinear relationships, and are difficult to predict and model. Before this study, other models were also tried on the same airline; their accuracy and efficiency are compared in Table 3.
Table 3 Comparison of the effects of different prediction models
Figure BDA0002728797440000171
Table 3 shows that, for the same sample size, the random forest model performs better in terms of the coefficient of determination and prediction error, with a coefficient of determination of 0.91 and a root mean square error of 9.7%. It is therefore more suitable for building an aviation safety prediction model than the relevance vector machine or the neural network. In addition, relevance vector machine and neural network models have shortcomings in aviation safety modeling: the actual influence mechanism is hard to explain, and the importance of each input causal variable to aviation safety is unknown. Although random forest regression is also a black-box model, it offers other effective aids to interpretation, such as the importance of each variable to the model prediction. Moreover, because the random forest algorithm introduces two random parameters (k, m), it is more robust to noise and less prone to overfitting.
The foregoing shows and describes the general principles, essential features and advantages of the invention. Those skilled in the art will understand that the invention is not limited to the embodiments described above, which are described in the specification only to illustrate its principles; various changes and modifications may be made without departing from the spirit and scope of the invention, and all of them fall within the claimed scope. The scope of the invention is defined by the appended claims and their equivalents.

Claims (6)

1. An aviation safety causal prediction method based on random forests, characterized by comprising the following steps:
S1: using an airline's original safety database, constructing a Bow-tie-based correlation identification model for aviation safety causal variables, and determining the key causal variables of unsafe aviation events;
S2: establishing an aviation safety scale data acquisition list according to the key causal variables determined in step S1, and labeling the unsafe-event data features in the original safety database;
S3: applying to the unsafe-event data features of step S2 a feature reduction that considers the specific safety output, thereby reducing dimensionality and obtaining the airline's unsafe-event modeling data;
S4: obtaining training and testing sample subsets from the unsafe-event modeling data of step S3 by combining a random forest model with feature selection and sample resampling methods, and constructing an aviation safety situation prediction model based on the random forest model;
S5: evaluating the prediction capability of the aviation safety situation prediction model against the airline's original safety database;
S6: according to an analysis of the influence of the causal variables on the aviation safety situation prediction result, ranking the contributions of the key causal variables to unsafe aviation events.
2. The random-forest-based aviation safety causal prediction method according to claim 1, wherein step S1 specifically comprises:
S11: using the airline's original safety database, determining the causal variables X = (x_1, x_2, …, x_m) of unsafe aviation events and the corresponding types of unsafe aviation events Y = (y_1, y_2, …, y_n), with Y = g(X), where Y = (y_1, y_2, …, y_n) denotes n different types of unsafe aviation events and X = (x_1, x_2, …, x_m) denotes the causal variables that cause them;
S12: mapping the causal variables, possible consequences and corresponding control measures of unsafe aviation events one-to-one onto the basic events BE, intermediate events IE, top event CE, control events SE and consequence events OE of the Bow-tie model, wherein the causal variables correspond to the basic events BE, the control measures correspond to the control events SE, the possible consequences correspond to the consequence events OE, and the milder, preceding-level consequences of those possible consequences correspond to the intermediate events IE;
S13: letting the occurrence probability of each basic event be p_BE and considering the probability that a branch leads to the i-th consequence event OE_i, assuming that the k control events on the m-th branch occur with probabilities
Figure FDA0002728797430000023
then the probability that the consequence event of the m-th branch occurs is
Figure FDA0002728797430000021
where, when a link event on the branch occurs,
Figure FDA0002728797430000025
and when the link event does not occur,
Figure FDA0002728797430000024
S14: based on the result of step S13, expressing the occurrence probability of consequence event OE_i as a function of the occurrence probabilities of the n basic events and the m consequence events, i.e. the aviation safety causal variable correlation identification model,
Figure FDA0002728797430000022
S15: determining the key causal variables of unsafe aviation events from among the causal variables, based on the aviation safety causal variable correlation identification model established in step S14.
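Steps S13 and S14 follow event-tree arithmetic: a branch's probability is the top-event probability multiplied, for each control event on the branch, by that event's probability if it occurs or by one minus it if it does not, and a consequence's probability is the sum over the branches that lead to it. The exact expressions are images in the source; this sketch additionally assumes independent basic events feeding the top event through a single OR gate, which is a hypothetical fault-tree structure, and the probabilities are invented.

```python
import math
from typing import List, Sequence, Tuple


def top_event_probability(p_basic: Sequence[float]) -> float:
    """Assumed OR gate over independent basic events: 1 - prod(1 - p_BE)."""
    return 1.0 - math.prod(1.0 - p for p in p_basic)


def branch_probability(p_top: float, p_control: Sequence[float], occurred: Sequence[bool]) -> float:
    """P(top event) times p_SE for each control event that occurs on the branch, else 1 - p_SE."""
    if len(p_control) != len(occurred):
        raise ValueError("one occurrence flag is needed per control event")
    prob = p_top
    for p, happened in zip(p_control, occurred):
        prob *= p if happened else 1.0 - p
    return prob


def consequence_probability(
    p_basic: Sequence[float], branches: List[Tuple[Sequence[float], Sequence[bool]]]
) -> float:
    """P(OE_i): sum over the event-tree branches that end in consequence OE_i."""
    p_top = top_event_probability(p_basic)
    return sum(branch_probability(p_top, pc, occ) for pc, occ in branches)


# Hypothetical: two basic events and one control event that works with probability 0.9
p_oe = consequence_probability([0.1, 0.2], [([0.9], [False])])  # about 0.28 * 0.1 = 0.028
```

Summing the "works" and "fails" branches of the same control event recovers the top-event probability, a quick consistency check on the branch arithmetic.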
3. The random-forest-based aviation safety causal prediction method according to claim 2, wherein step S3 specifically comprises:
S31: using the mutual information principle, defining the mutual information between the causal variable X and the unsafe aviation event Y as
$$I(X,Y)=\iint f(x,y)\,\log\frac{f(x,y)}{f(x)\,f(y)}\,\mathrm{d}x\,\mathrm{d}y$$
where I(X, Y) is the mutual information between the causal variable X and the unsafe aviation event Y, f(X, Y) is the joint probability density function of X and Y, and f(X) and f(Y) are the marginal probability density functions of X and Y, respectively;
S32: for any multivariate data sequence (X, Y) of sample size N, I(X, Y) is calculated by
Figure FDA0002728797430000032
S33: solving the mutual information between the causal variable X and the unsafe aviation event Y by kernel density estimation and low-discrepancy deterministic sampling;
S34: determining the strength of the causal relationship between the causal variable X and the unsafe aviation event Y from the mutual information solved in step S33, and obtaining the airline's unsafe-event modeling data according to that strength.
4. The random-forest-based aviation safety causal prediction method according to claim 3, wherein step S4 specifically comprises:
S41: normalizing all unsafe-event modeling data;
S42: performing random sampling with replacement, i.e. repeatedly drawing with replacement from the normalized unsafe-event modeling data K training samples, each of the same size N as the modeling data, and growing a decision tree from each training sample;
S43: at each split node of each decision tree, randomly drawing M_try variables from all causal variables as the feature subset for splitting the current node, and selecting the optimal split from that subset according to the classification and regression tree method;
S44: calculating the importance of input variable x_j (j = 1, 2, 3, …, h) in the k-th tree
Figure FDA0002728797430000041
where h is the number of input causal variables; N_OOB is the number of out-of-bag samples; f_k(x_n) is the estimate of the n-th out-of-bag sample on the k-th tree; f_k(x′_n) is the estimate of the n-th out-of-bag sample on the k-th tree after the variable has been randomly permuted; and I(·) is an indicator function that takes the value 1 when f_k(x_n) = f_k(x′_n) and 0 otherwise;
S45: calculating the importance score of input variable x_j over the whole random forest
Figure FDA0002728797430000042
S46: averaging the importance scores of input variable x_j obtained over multiple runs to obtain the weight of each causal variable;
S47: optimizing the parameters of the regression aviation safety situation prediction model.
5. The random-forest-based aviation safety causal prediction method according to claim 4, wherein the normalization in step S41 specifically comprises:
non-dimensionalizing all accident causal variables using
Figure FDA0002728797430000043
where x_j is the normalized data; x_j.max and x_j.min are the maximum and minimum values, respectively; and x_j.true and x_j.pre are the actual and predicted values, respectively.
6. The method according to claim 5, wherein step S5 comprises using the coefficient of determination R², the root mean square error RMSE and the relative root mean square error rRMSE as evaluation indices to evaluate the prediction capability of the regression aviation safety situation prediction model, as follows:
Figure FDA0002728797430000051
Figure FDA0002728797430000052
Figure FDA0002728797430000053
where x(i) is the i-th sample in the validation data set, and x(i)P is the aviation safety situation predicted by the model from the predictor variables of the i-th sample point in the validation data set.
CN202011111711.8A 2020-10-16 2020-10-16 Aviation safety causal prediction method based on random forest Active CN112257914B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011111711.8A CN112257914B (en) 2020-10-16 2020-10-16 Aviation safety causal prediction method based on random forest

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011111711.8A CN112257914B (en) 2020-10-16 2020-10-16 Aviation safety causal prediction method based on random forest

Publications (2)

Publication Number Publication Date
CN112257914A true CN112257914A (en) 2021-01-22
CN112257914B CN112257914B (en) 2023-06-06

Family

ID=74244475

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011111711.8A Active CN112257914B (en) 2020-10-16 2020-10-16 Aviation safety causal prediction method based on random forest

Country Status (1)

Country Link
CN (1) CN112257914B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113919186A (en) * 2021-12-14 2022-01-11 中国民航大学 Event tree-based method for calculating severity of synthetic consequence of primary overrun event
CN114997549A (en) * 2022-08-08 2022-09-02 阿里巴巴(中国)有限公司 Interpretation method, device and equipment of black box model
CN115048874A (en) * 2022-08-16 2022-09-13 北京航空航天大学 Aircraft design parameter estimation method based on machine learning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110276370A (en) * 2019-05-05 2019-09-24 南京理工大学 Road traffic accident risk factor analysis method based on random forest
US20200074306A1 (en) * 2018-08-31 2020-03-05 Ca, Inc. Feature subset evolution by random decision forest accuracy

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
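王孝军等: "Vibration trend prediction of aero-engines based on the random forest algorithm", 《燃气涡轮试验与研究》 *
王衍洋等: "Analysis and prediction of civil aviation safety index results", 《北京航空航天大学学报》 *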

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113919186A (en) * 2021-12-14 2022-01-11 中国民航大学 Event tree-based method for calculating severity of synthetic consequence of primary overrun event
WO2023108928A1 (en) * 2021-12-14 2023-06-22 中国民航大学 Event tree-based flight exceedance event comprehensive consequence severity calculation method
CN114997549A (en) * 2022-08-08 2022-09-02 阿里巴巴(中国)有限公司 Interpretation method, device and equipment of black box model
CN114997549B (en) * 2022-08-08 2022-10-28 阿里巴巴(中国)有限公司 Interpretation method, device and equipment of black box model
CN115048874A (en) * 2022-08-16 2022-09-13 北京航空航天大学 Aircraft design parameter estimation method based on machine learning
CN115048874B (en) * 2022-08-16 2023-01-24 北京航空航天大学 Aircraft design parameter estimation method based on machine learning

Also Published As

Publication number Publication date
CN112257914B (en) 2023-06-06

Similar Documents

Publication Publication Date Title
CN109766583B (en) Aircraft engine life prediction method based on unlabeled, unbalanced and initial value uncertain data
CN108960303B (en) Unmanned aerial vehicle flight data anomaly detection method based on LSTM
CN112257914A (en) Aviation safety cause and effect prediction method based on random forest
CN103974311B (en) Based on the Condition Monitoring Data throat floater detection method for improving Gaussian process regression model
CN102208028B (en) Fault predicting and diagnosing method suitable for dynamic complex system
CN111680875B (en) Unmanned aerial vehicle state risk fuzzy comprehensive evaluation method based on probability baseline model
CN110033135A (en) Train braking system fault prediction method with enhanced multivariate time-series features
CN114297036B (en) Data processing method, device, electronic equipment and readable storage medium
Mathew et al. Regression kernel for prognostics with support vector machines
CN108154256A (en) Method and device for determining a predicted risk value, and storage medium
CN112257935B (en) Aviation safety prediction method based on LSTM-RBF neural network model
Subramanian et al. Deep-learning based time series forecasting of go-around incidents in the national airspace system
CN114580545A (en) Wind turbine generator gearbox fault early warning method based on fusion model
Wawrzyniak et al. Data-driven models in machine learning for crime prediction
CN112150304A (en) Power grid running state track stability prejudging method and system and storage medium
CN116957331A (en) Risk passenger flow range prediction method and device
CN111967308A (en) Online road surface unevenness identification method and system
CN114978968A (en) Micro-service anomaly detection method and device, computer equipment and storage medium
CN113989550A (en) Electric vehicle charging pile operation state prediction method based on CNN and LSTM hybrid network
Dang et al. seq2graph: Discovering dynamic non-linear dependencies from multivariate time series
Dui et al. Reliability Evaluation and Prediction Method with Small Samples.
Xia et al. Degradation prediction and rolling predictive maintenance policy for multi-sensor systems based on two-dimensional self-attention
Kuşkapan et al. Examination of Aircraft Accidents That Occurred in the Last 20 Years in the World
CN113139344A (en) Civil aircraft operation risk assessment method oriented to multiple failure modes
Vachtsevanos et al. Prognosis: Challenges, Precepts, Myths and Applications

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant