CN112257914A - Aviation safety cause and effect prediction method based on random forest - Google Patents
- Publication number
- CN112257914A
- Authority
- CN
- China
- Prior art keywords
- aviation safety
- event
- unsafe
- prediction
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/211—Selection of the most significant subset of features
- G06F18/2113—Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0635—Risk analysis of enterprise or organisation activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/067—Enterprise or organisation modelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/40—Business processes related to the transportation industry
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Abstract
The invention discloses an aviation safety cause and effect prediction method based on random forests, which comprises the following steps. S1: constructing an aviation safety causative-variable correlation identification model based on a Bow-tie model, and determining the key causative variables; S2: establishing an aviation safety scale data acquisition list, and labeling the unsafe event data features in an original safety database; S3: performing dimensionality reduction on the unsafe event data features to obtain unsafe event modeling data of an airline; S4: constructing an aviation safety situation prediction model based on a random forest model; S5: evaluating the predictive capability of the aviation safety situation prediction model; S6: ranking the contribution of the key causative variables to unsafe aviation events according to an analysis of the influence of the causative variables on the predicted aviation safety situation. The disclosed prediction method makes the hard-to-measure aviation safety situation quantitatively measurable, and is suitable both for quantitative evaluation of the current state of aviation safety and for prediction of the future aviation safety situation.
Description
Technical Field
The invention relates to the technical field of aviation safety, in particular to an aviation safety cause and effect prediction method based on random forests.
Background
Aviation safety prediction reveals the patterns of accident occurrence by studying the internal relationship between aviation accidents and their causative factors, and thereby predicts future safety trends. An accurate aviation safety prediction model is of great significance for intelligent aviation safety management, advance decision-making, emergency management, and the like.
However, an aviation accident results from the interwoven influence of many factors, including uncertain factors such as the flight environment, task characteristics, aircraft quality, maintenance management mechanisms, and personnel errors. The causes of aviation accidents are complex, and accidents are low-frequency, random, time-varying, and high-dimensional, which makes predictive modeling difficult. At present, aviation safety prediction is mainly time-series prediction, which studies how the accident pattern at earlier moments influences the future accident trend; an aviation safety time series is built and predicted using parametric, non-parametric, Bayesian network, artificial intelligence, and other methods. Wang Yanyang used a spline interpolation function to predict and analyze a comprehensive aviation safety index, studied the relationship between human factors and aviation safety, and improved applicability to nonlinear data; Ding Songbin, Gan Xusheng, Lu Xumei, and others studied flight accidents using BP neural networks, autoregressive moving average methods, and the like.
However, these models are all "black box" models: the internal mechanism linking input to output is unknown, the influence of each input on the output is difficult to determine, the interpretation of the predictor variables is unclear, a predicted output cannot be traced back to its causes, and the support they offer to aviation safety management is therefore limited.
Disclosure of Invention
In view of the above problems, the invention aims to provide an aviation safety cause and effect prediction method based on random forests, which combines the Bow-tie model with the random forest algorithm for aviation safety causation prediction, completes parameter optimization of the safety prediction model and ranking of the causative-variable contributions, and thereby predicts the key factors of aviation safety and the trend of the aviation safety situation.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
An aviation safety cause and effect prediction method based on random forests, characterized by comprising the following steps:
S1: using the original safety database of an airline, constructing an aviation safety causative-variable correlation identification model based on a Bow-tie model, and determining the key causative variables of unsafe aviation events;
S2: establishing an aviation safety scale data acquisition list according to the key causative variables determined in step S1, and labeling the unsafe event data features in the original safety database;
S3: for the unsafe event data features from step S2, performing feature reduction with respect to the specific safety output, thereby reducing dimensionality and obtaining the unsafe event modeling data of the airline;
S4: obtaining training and test sample subsets from the unsafe event modeling data obtained in step S3 by combining the random forest model with feature selection and sample sampling methods, and constructing an aviation safety situation prediction model based on the random forest model;
S5: evaluating the predictive capability of the aviation safety situation prediction model using the original safety database of the airline;
S6: ranking the contribution of the key causative variables to unsafe aviation events according to an analysis of the influence of the causative variables on the predicted aviation safety situation.
Further, the specific operations of step S1 include:
S11: using the original safety database of the airline, determining the causative variables $X = (x_1, x_2, \ldots, x_m)$ of unsafe aviation events and the corresponding types of unsafe aviation events $Y = (y_1, y_2, \ldots, y_n)$, with $Y = g(X)$, where $Y$ represents n different types of unsafe aviation events and $X$ represents the causative variables that lead to them;
S12: mapping the causative variables, possible consequences, and corresponding control measures of unsafe aviation events onto the basic events BE, intermediate events IE, top event CE, control events SE, and consequence events OE of the Bow-tie model: a causative variable corresponds to a basic event BE, a control measure corresponds to a control event SE, a possible consequence corresponds to a consequence event OE, and the milder consequence one level before a possible consequence corresponds to an intermediate event IE;
S13: letting the probability of occurrence of each basic event be $p_{BE}$; considering that several branches may lead to the i-th consequence event $OE_i$, and assuming that the k control events on the m-th branch occur with probability [formula omitted in the source text], the probability that the consequence event occurs on the m-th branch is [formula omitted in the source text];
S14: based on the result of step S13, expressing the probability of occurrence of the consequence event $OE_i$ as a function of the occurrence probabilities of the n basic events and the m consequence events, which is the aviation safety causative-variable correlation identification model [formula omitted in the source text];
S15: determining the key causative variables of unsafe aviation events from among the causative variables, based on the aviation safety causative-variable correlation identification model established in step S14.
Further, the specific operations of step S3 include:
S31: using the mutual information principle, defining the mutual information between a causative variable X and an unsafe aviation event Y as $I(X, Y) = \iint f(X, Y)\,\log\frac{f(X, Y)}{f(X)\,f(Y)}\,dX\,dY$, where $I(X, Y)$ is the mutual information between the causative variable X and the unsafe aviation event Y, $f(X, Y)$ is the joint probability density function of X and Y, and $f(X)$ and $f(Y)$ are the marginal probability density functions of X and Y;
S32: for any multivariate data sequence (X, Y) of sample size N, calculating $I(X, Y)$ by [formula omitted in the source text];
S33: solving for the mutual information between the causative variable X and the unsafe aviation event Y using kernel density estimation and a low-discrepancy deterministic sampling method;
S34: determining the strength of the causal relationship between the causative variable X and the unsafe aviation event Y from the mutual information obtained in step S33, and obtaining the unsafe event modeling data of the airline according to that strength.
Further, the specific operations of step S4 include:
S41: normalizing all unsafe event modeling data;
S42: performing random sampling with replacement: repeatedly sampling with replacement from the normalized unsafe event modeling data to obtain K training sample sets, each of size N equal to that of the unsafe event modeling data, and growing one decision tree from each training sample set;
S43: at each split node of each decision tree, randomly drawing $M_{try}$ of all the causative variables as the feature subset for the current split, and selecting the optimal split from this subset according to the classification and regression tree (CART) method;
S44: calculating the importance of the input variable $x_j$ ($j = 1, 2, 3, \ldots, h$) in the k-th tree by [formula omitted in the source text],
where h is the number of input causative variables, $N_{OOB}$ is the number of out-of-bag data samples, $f_k(x_n)$ is the value for the n-th out-of-bag sample, $f_k(x'_n)$ is the estimate of the k-th tree for the n-th out-of-bag sample after random permutation of the variable, and $I(\cdot)$ is an indicator function whose value is 1 when $f_k(x_n) = f_k(x'_n)$ and 0 otherwise;
S46: averaging the importance scores obtained multiple times for the input variable $x_j$ to obtain the weight of each causative variable;
S47: optimizing the parameters of the regression aviation safety situation prediction model.
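Steps S42 to S46 can be illustrated with a minimal sketch, under stated assumptions. The importance formula of S44 is not reproduced in the text, so this sketch swaps in a common regression-style measure: the increase in out-of-bag mean squared error after one variable is permuted. Each "tree" is reduced to a single CART-style split (a stump) to keep the example short, and the feature subset is drawn once per stump. All function names and the parameters `n_tree` and `m_try` are hypothetical stand-ins for $N_{tree}$ and $M_{try}$, not the patent's implementation.

```python
import random
from statistics import mean

def bootstrap_indices(n, rng):
    """Draw n row indices with replacement; return (in-bag, out-of-bag) lists."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    chosen = set(in_bag)
    oob = [i for i in range(n) if i not in chosen]
    return in_bag, oob

def fit_stump(X, y, rows, features):
    """Best single split by squared error, searched over the candidate features only."""
    best = None
    for j in features:
        values = sorted({X[i][j] for i in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in rows if X[i][j] <= t]
            right = [y[i] for i in rows if X[i][j] > t]
            ml, mr = mean(left), mean(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    if best is None:  # every candidate feature is constant in this sample
        m = mean(y[i] for i in rows)
        return (0, float("inf"), m, m)
    return best[1:]

def predict_stump(stump, x):
    j, t, ml, mr = stump
    return ml if x[j] <= t else mr

def oob_mse(stump, X, y, oob, permuted_col=None, rng=None):
    """Out-of-bag mean squared error, optionally with one column shuffled."""
    rows = [list(X[i]) for i in oob]
    if permuted_col is not None:
        col = [r[permuted_col] for r in rows]
        rng.shuffle(col)
        for r, v in zip(rows, col):
            r[permuted_col] = v
    return mean((predict_stump(stump, r) - y[i]) ** 2 for r, i in zip(rows, oob))

def forest_importance(X, y, n_tree=50, m_try=1, seed=0):
    """Average, over all trees, the OOB error increase caused by permuting each variable."""
    rng = random.Random(seed)
    n, h = len(X), len(X[0])
    scores = [[] for _ in range(h)]
    for _ in range(n_tree):
        in_bag, oob = bootstrap_indices(n, rng)
        if not oob:
            continue
        features = rng.sample(range(h), m_try)  # random feature subset of size m_try
        stump = fit_stump(X, y, in_bag, features)
        base = oob_mse(stump, X, y, oob)
        for j in range(h):
            scores[j].append(oob_mse(stump, X, y, oob, j, rng) - base)
    return [mean(s) for s in scores]
```

A variable that the fitted splits never use receives a score of exactly zero from those trees, so a variable with a genuine influence on the output ends up with the larger averaged weight.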
Further, the normalization in step S41 specifically includes:
making all accident causative variables dimensionless using [formula omitted in the source text], where $x_j$ is the normalized data, $x_{j.max}$ and $x_{j.min}$ are the maximum and minimum values, respectively, and $x_{j.true}$ and $x_{j.pre}$ are the actual and predicted values, respectively.
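As an illustration only, the following sketch applies conventional min-max scaling, which maps each variable onto [0, 1] using its minimum and maximum. The exact formula is not reproduced in the text, so treating the normalization as plain min-max scaling is an assumption, and the function names are hypothetical.

```python
def min_max_normalize(column):
    """Scale one variable to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def normalize_table(rows):
    """Normalize every column of a row-major table independently."""
    columns = list(zip(*rows))
    scaled = [min_max_normalize(c) for c in columns]
    return [list(r) for r in zip(*scaled)]
```

Scaling each variable separately keeps variables with large numeric ranges from dominating variables measured on small ranges.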
Further, the specific operation of step S5 includes using the coefficient of determination $R^2$, the root mean square error RMSE, and the relative root mean square error rRMSE as evaluation indexes to evaluate the predictive capability of the regression aviation safety situation prediction model [formulas omitted in the source text],
where $x(i)$ is the i-th sample in the validation data set and $x(i)_P$ is the aviation safety situation predicted by the model from the predictor variables of the i-th sample point in the validation data set.
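The three evaluation indexes can be sketched as follows. The formulas themselves are not reproduced in the text, so the standard definitions are assumed here; in particular, rRMSE is taken as RMSE divided by the mean of the observed values, which is one of several conventions in use. The function name `evaluate` is hypothetical.

```python
from math import sqrt
from statistics import mean

def evaluate(observed, predicted):
    """Return R2, RMSE and rRMSE for paired observed and predicted values."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    rmse = sqrt(ss_res / len(observed))
    return {
        "R2": 1 - ss_res / ss_tot,   # share of variance explained by the model
        "RMSE": rmse,                # error in the units of the target
        "rRMSE": rmse / obs_mean,    # error relative to the mean observation (assumed convention)
    }
```

A perfect prediction gives $R^2 = 1$ and RMSE = 0; rRMSE makes errors comparable across targets with different scales.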
The invention has the following beneficial effects:
1. The invention provides a random-forest-based aviation safety prediction method that can analyze and predict how the relationship between unsafe aviation events and key causative factors changes, and is an important means of developing intelligent aviation management and decision support. Among machine learning algorithms, the random forest model has clear advantages in parameter optimization, variable ranking, and subsequent variable analysis and interpretation, and its correlation coefficient and prediction accuracy are markedly better than those of other models such as linear models, relevance vector machine models, and neural network models, so it is better suited to aviation safety trend prediction and the determination of key causative factors;
2. By constructing an aviation safety causation model based on the Bow-tie model, the method reflects the internal operating mechanism of aviation safety well, which is confirmed by the mutual information index of the data; the random forest predicts aviation safety well, with a prediction accuracy above 90 percent, and describes the nonlinear relationship between the aviation safety causative variables and aviation safety well;
3. The prediction method converts traditional time-series prediction for a single accident into a study of the influence mechanism between causative factors and accidents, and studies how accidents change by studying how the causative factors (flight intensity, equipment failure, environment, weather, and the like) change; this addresses the problems of scarce accident samples and difficult measurement, and makes the hard-to-measure aviation safety situation quantitatively measurable;
4. The prediction method processes the unsafe aviation event data with a mutual information method, which accurately measures the causal relationship between unsafe aviation events and the causative variables; using the unsafe event modeling data selected according to that causal relationship allows unsafe events and causative variables to be modeled more accurately, ensuring higher modeling accuracy and more accurate prediction results.
Drawings
FIG. 1 is a schematic diagram of the Bow-tie model of the present invention.
FIG. 2 is a sample distribution diagram of a training and validation database according to the present invention.
FIG. 3 is a flow chart of the prediction of the regression aviation safety situation based on the random forest according to the invention.
FIG. 4 is a diagram of an aviation safety random forest model of the present invention.
FIG. 5 is a diagram of the results of the random forest parameter optimization of the present invention.
FIG. 6 is an error estimation diagram of the aviation safety random forest model of the present invention.
FIG. 7 is a graph of the random forest regression model variable screening of the present invention.
FIG. 8 is a diagram of an aviation safety prediction estimation situation based on a random forest model according to the invention.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the following further describes the technical solution of the present invention with reference to the drawings and the embodiments.
An aviation safety cause and effect prediction method based on random forests comprises the following steps:
S1: using the original safety database of an airline, constructing an aviation safety causative-variable correlation identification model based on a Bow-tie model, and determining the key causative variables of unsafe aviation events;
The Bow-tie model inherits the advantages of two safety analysis tools, the fault tree and the event tree. It integrates accident causes, preventive measures, possible consequences, corresponding control measures, and other factors, establishes the relationship between the events before and after an accident and the accident consequences, remedies the insufficient quantification of traditional accident analysis models, and displays the precursors and consequences of an accident as a bow-tie diagram. The Bow-tie model is shown schematically in FIG. 1 and comprises five types of events: basic events BE, intermediate events IE, the top event CE, control events SE, and consequence events OE.
The invention constructs the aviation safety causative-variable correlation identification model based on the Bow-tie model; the specific operations for determining the causative variables of unsafe aviation events include:
S11: using the original safety database of the airline, determining the causative variables $X = (x_1, x_2, \ldots, x_m)$ of unsafe aviation events and the corresponding types of unsafe aviation events $Y = (y_1, y_2, \ldots, y_n)$, with $Y = g(X)$, where $Y$ represents n different types of unsafe aviation events and $X$ represents the causative variables that lead to them;
S12: mapping the causative variables, possible consequences, and corresponding control measures of unsafe aviation events onto the basic events BE, intermediate events IE, top event CE, control events SE, and consequence events OE of the Bow-tie model: a causative variable corresponds to a basic event BE, a control measure corresponds to a control event SE, a possible consequence corresponds to a consequence event OE, and the milder consequence one level before a possible consequence corresponds to an intermediate event IE;
S13: letting the probability of occurrence of each basic event be $p_{BE}$; considering that several branches may lead to the i-th consequence event $OE_i$, and assuming that the k control events on the m-th branch occur with probability [formula omitted in the source text], the probability that the consequence event occurs on the m-th branch is [formula omitted in the source text];
S14: based on the result of step S13, expressing the probability of occurrence of the consequence event $OE_i$ as a function of the occurrence probabilities of the n basic events and the m consequence events, which is the aviation safety causative-variable correlation identification model [formula omitted in the source text];
S15: determining the key causative variables of unsafe aviation events from among the causative variables, based on the aviation safety causative-variable correlation identification model established in step S14.
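The branch-probability idea in S13 and S14 can be illustrated with a simplified sketch. Because the formulas are not reproduced in the text, the sketch rests on explicit assumptions that may differ from the patent: all events are independent, a consequence occurs on a branch only when its basic event occurs and every control event (barrier) on that branch fails, and the supplied control probabilities are failure probabilities. The function names are hypothetical.

```python
from math import prod

def branch_probability(p_basic, control_failure_probs):
    """Probability that one branch leads to the consequence (assumed: basic event AND all barriers fail)."""
    return p_basic * prod(control_failure_probs)

def outcome_probability(branches):
    """Probability that at least one branch leads to the consequence, assuming independent branches."""
    return 1 - prod(1 - branch_probability(p, controls) for p, controls in branches)

# Two hypothetical branches: (basic event probability, barrier failure probabilities)
branches = [(0.1, [0.5, 0.2]), (0.2, [0.5])]
p_outcome = outcome_probability(branches)
```

Under these assumptions, the basic events whose change moves `p_outcome` the most are natural candidates for key causative variables.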
Specific aviation safety causative variables, obtained from the safety supervision data of a certain airline, are shown in Table 1 below; not all causative variables are disclosed in Table 1 because of the airline's confidentiality requirements. From Table 1 it can be determined that the key causative variables of unsafe aviation events mainly include aircraft systems, weather reasons, accident reasons, free planning, resource management, and the like.
TABLE 1 analysis of critical risks and hazard sources for aviation safety based on the Bow-tie model
S2: establishing an aviation safety scale data acquisition list according to the key causative variables determined in step S1, and labeling the unsafe event data features in the original safety database;
specifically, data of unsafe events in 2016-. The unsafe event data associated with these 6 and dependent variables is collected from the airline's original security database.
TABLE 2 dependent variable index
S3: for the unsafe event data features from step S2, performing feature reduction with respect to the specific safety output, thereby reducing dimensionality and obtaining the unsafe event modeling data of the airline;
Mutual information not only indicates whether two random variables are related, but also reflects the strength of that relationship. The mutual information I(x, y) represents the amount of information obtained about x after message y is received, i.e. the uncertainty of event x before y is received minus the uncertainty that remains after y is received. Mutual information measures both linear and nonlinear relationships between variables, quantifies the degree of interdependence between two variables, represents the amount of information they share, is not restricted by the form of the variables' distributions, and can be applied to probability distributions of any irregular shape.
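The verbal description above, uncertainty before receiving y minus uncertainty remaining afterwards, corresponds to the standard entropy identities below. Here $H(\cdot)$ denotes Shannon entropy and $H(x \mid y)$ conditional entropy; this notation is introduced for illustration and does not appear in the text.

```latex
I(x, y) = H(x) - H(x \mid y) = H(x) + H(y) - H(x, y), \qquad
I(x, y) = I(y, x) \ge 0,
```

with $I(x, y) = 0$ exactly when x and y are independent.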
Specifically, the mutual information principle is applied to perform dimension reduction processing on the unsafe event data in the step S2 to obtain unsafe event modeling data of the airline company;
s31: defining mutual information between a dependent variable X and an aviation safety unsafe event Y by using a mutual information principleIn the formula, I (X, Y) represents mutual information between a dependent variable X and an aviation safety unsafe event Y; f (X, Y) is a joint probability density function of the dependent variable X and the aviation safety unsafe event Y, and f (X) and f (Y) are edge probability density functions of the dependent variable X and the aviation safety unsafe event Y;
s32: for any set of multivariate data sequences (X, Y) with sample size N, I (X, Y) is calculated by
S33: solving mutual information between the dependent variable X and the unsafe event Y of aviation safety by utilizing a nuclear density estimation and low deviation determination sampling method;
s34: and determining the strength of the causal relationship between the dependent variable X and the aviation safety unsafe event Y according to the mutual information obtained by the solution in the step S33, obtaining unsafe event modeling data of the airline company according to the strength of the causal relationship between the dependent variable X and the aviation safety unsafe event Y, realizing the characteristic reduction of the dependent variable considering specific safety output, and finishing the dimension reduction.
When the causative variable X is completely unrelated to, or independent of, the aviation safety unsafe event Y, the mutual information is at its minimum, meaning that X and Y share no information. Conversely, the higher the degree of interdependence between the two, the larger the mutual information value and the more information they share. The strength of the causal relationship between X and Y can therefore be obtained from the solution, and the data with a stronger causal relationship to Y are selected to form the unsafe event modeling data of the airline company.
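As an illustration of steps S31 to S34, the following Python sketch estimates the mutual information between one candidate variable and an unsafe event measure with a Gaussian kernel density estimate, then ranks candidate variables by that estimate. It is only a sketch under stated assumptions, not the procedure of the invention: the function names, the normal-reference bandwidth rule, and the evaluation of the densities at the sample points themselves (used here in place of the low-discrepancy deterministic sampling named in S33) are all assumptions, and the demonstration data are synthetic.

```python
import math
import random


def normal_reference_bandwidth(values):
    """Rule-of-thumb bandwidth for a 1-D Gaussian kernel (an assumed choice)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return 1.06 * std * n ** (-1 / 5)


def gaussian_kernel(u):
    """Standard normal density evaluated at u."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def mutual_information_kde(xs, ys):
    """Average of log(f(x, y) / (f(x) f(y))) over the samples, in nats.

    The joint density uses a product of two 1-D Gaussian kernels.
    """
    n = len(xs)
    hx = normal_reference_bandwidth(xs)
    hy = normal_reference_bandwidth(ys)
    total = 0.0
    for i in range(n):
        kx = [gaussian_kernel((xs[i] - xs[j]) / hx) / hx for j in range(n)]
        ky = [gaussian_kernel((ys[i] - ys[j]) / hy) / hy for j in range(n)]
        f_x = sum(kx) / n
        f_y = sum(ky) / n
        f_xy = sum(a * b for a, b in zip(kx, ky)) / n
        total += math.log(f_xy / (f_x * f_y))
    return total / n


def rank_by_mutual_information(features, event):
    """Sort hypothetical candidate variables by estimated mutual information."""
    scores = {name: mutual_information_kde(vals, event)
              for name, vals in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    rng = random.Random(0)
    related = [rng.gauss(0, 1) for _ in range(200)]
    unrelated = [rng.gauss(0, 1) for _ in range(200)]
    # Synthetic event measure with a nonlinear dependence on `related`.
    event = [v * v + 0.1 * rng.gauss(0, 1) for v in related]
    for name, score in rank_by_mutual_information(
            {"related": related, "unrelated": unrelated}, event):
        print(name, round(score, 3))
```

Variables whose estimated mutual information with the event is near zero would be the candidates for removal in the dimension reduction; the cut-off itself is a modeling decision that this sketch does not fix.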
S4: Obtaining a training sample subset from the unsafe event modeling data obtained in step S3 by combining the random forest model with feature selection and sample sampling methods, and constructing an aviation safety situation prediction model based on the random forest model;
Random forest is a machine learning method that can be used for sample classification. The number of decision trees the model generates (N_tree) and the number of attributes selected for splitting (M_try) play a key role in sample classification and affect the accuracy of the results. Random forest regression is highly tolerant of noisy data and predicts well on high-dimensional data. It consists of a set of independent regression decision trees {h(x, θ_k), k = 1, 2, 3, …, K}; the K trees form the ensemble, denoted as

$$\bar{h}(x) = \frac{1}{K}\sum_{k=1}^{K} h(x, \theta_k)$$

where x is the safety dependent variable, K is the number of decision trees, and θ_k are independent and identically distributed random vectors.
To improve the prediction accuracy of the model and prevent overfitting, the random forest model uses the bagging method to obtain the training sample subsets and the random subspace method to obtain the node splitting features, as shown in Figure 3. The method specifically comprises the following steps,
S41: Normalizing all unsafe event data;
Because different types of aviation unsafe events have different dimensions, all unsafe event data are normalized. In the normalization formula, x_j is the normalized data; x_j.max and x_j.min are the maximum and minimum values, respectively; and x_j.true and x_j.pre are the actual and predicted values, respectively.
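The normalization in step S41 can be sketched as follows. The sketch assumes ordinary min-max scaling of each variable to the interval [0, 1] using its own minimum and maximum, which matches the symbols x_j.min and x_j.max defined above but is not confirmed as the exact formula of the invention; the function names and sample values are hypothetical.

```python
def min_max_normalize(column):
    """Scale one variable to [0, 1] using its own minimum and maximum."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant variable carries no scale information; map it to 0.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def normalize_table(rows):
    """Normalize every column of a row-oriented table independently."""
    columns = list(zip(*rows))
    scaled = [min_max_normalize(col) for col in columns]
    return [list(row) for row in zip(*scaled)]


if __name__ == "__main__":
    # Two hypothetical variables recorded on very different scales.
    table = [[2, 1000], [4, 3000], [6, 2000]]
    print(normalize_table(table))
```

Scaling each column separately keeps variables with large raw magnitudes from dominating those recorded in small units.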
S42: Through random sampling with replacement, K training sets, each of size N equal to the original sample data set, are repeatedly drawn from the normalized original samples, and each training set forms one decision tree. In each Bootstrap resampling, the probability that a given sample is not drawn is (1 − 1/N)^N, which approaches 1/e as N tends to infinity. The data that are not selected are called out-of-bag (OOB) data. They do not take part in constructing the regression tree and can therefore be used to estimate the prediction error on out-of-bag data (OOB error) and to evaluate the contribution of each independent variable to the dependent variable. In addition, the generalization capability of the model can be verified from the OOB error, so a separate test set is not required to verify the precision of the model. The K training sets obtained by the bagging method differ from one another, which ensures diversity among the regression trees.
The distribution of training samples and validation samples obtained by using the method is shown in figure 2. The scattered points represent values of parameters in the samples, and the boxplot contains distribution information such as sample mean values, maximum values, minimum values, median and the like.
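The Bootstrap resampling of step S42 and the limit of the out-of-bag probability can be checked with a short Python sketch. The function names are hypothetical, and the sketch only draws index sets; it does not build any tree.

```python
import math
import random


def bootstrap_split(n_samples, rng):
    """Draw n_samples indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    chosen = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in chosen]
    return in_bag, out_of_bag


def oob_probability(n_samples):
    """Probability that one given sample is never drawn: (1 - 1/N)^N."""
    return (1 - 1 / n_samples) ** n_samples


if __name__ == "__main__":
    rng = random.Random(0)
    in_bag, out_of_bag = bootstrap_split(1000, rng)
    print("observed OOB fraction:", len(out_of_bag) / 1000)
    print("(1 - 1/N)^N for N = 1000:", oob_probability(1000))
    print("1/e:", 1 / math.e)
```

Roughly a third of the samples are left out of each training set, which is what makes the OOB data usable as a built-in validation set.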
S43: After the K training sets for the regression trees are obtained by the bagging method, the random subspace method is applied: at each split node, M_try variables are randomly drawn from all dependent variables as the feature subset for splitting the current node, and the optimal split is selected from this feature subset according to the classification and regression tree method. The regression trees obtained by the random subspace method are random and independent.
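The node split of step S43 can be sketched in Python as follows. The sketch assumes a regression split that minimizes the summed squared error of the two child nodes, a common classification and regression tree criterion; the source does not state the criterion, and the function names and toy data are hypothetical.

```python
import random


def sum_squared_error(values):
    """Squared deviation of the values from their own mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_random_subspace_split(rows, targets, m_try, rng):
    """Pick m_try random features and return the best (feature, threshold, error).

    Returns None when every sampled feature is constant on this node.
    """
    n_features = len(rows[0])
    candidates = rng.sample(range(n_features), m_try)
    best = None
    for j in candidates:
        levels = sorted({row[j] for row in rows})
        # Candidate thresholds are midpoints between neighbouring values.
        for lo, hi in zip(levels, levels[1:]):
            threshold = (lo + hi) / 2
            left = [t for row, t in zip(rows, targets) if row[j] <= threshold]
            right = [t for row, t in zip(rows, targets) if row[j] > threshold]
            error = sum_squared_error(left) + sum_squared_error(right)
            if best is None or error < best[2]:
                best = (j, threshold, error)
    return best


if __name__ == "__main__":
    rows = [[0, 1, 5], [1, 2, 5], [0, 3, 5], [1, 4, 5]]
    targets = [0, 0, 10, 10]
    print(best_random_subspace_split(rows, targets, 2, random.Random(0)))
```

Because each node sees only a random subset of the variables, different trees tend to split on different variables, which is the source of the independence noted above.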
S44: The random forest regression model not only estimates the aviation safety situation accurately but also gives an importance score for each variable, i.e., the degree to which each input influences the output. Common variable importance statistics are based on the Gini coefficient and on the OOB error; in the invention, the importance of each variable is obtained from the OOB error. For input variable x_j (j = 1, 2, 3, …, h), the importance I_k in the k-th tree is the difference in the estimation error on the out-of-bag data before and after the variable is randomly replaced.
In the importance formula, h is the number of input dependent variables; N_OOB is the number of out-of-bag data samples; f_k(x_n) is the n-th sample value of the out-of-bag data; f_k(x′_n) is the estimate of the n-th out-of-bag sample on the k-th tree after the variable is randomly replaced; and I(·) is a discriminant function that takes the value 1 when f_k(x_n) = f_k(x′_n) and 0 otherwise.
S46: Averaging the importance scores obtained over multiple runs for input variable x_j to obtain the weight of each dependent variable.
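The idea behind steps S44 and S46 can be sketched as permutation importance. This is a simplified stand-in rather than the formula of the invention: it measures the increase in mean squared error of a single prediction function on held-out data after one variable is shuffled, averages the increase over several shuffles, and rescales the averages into weights. The per-tree OOB bookkeeping and the discriminant function I(·) are not reproduced, and all names and the toy model are hypothetical.

```python
import random


def mean_squared_error(predict, rows, targets):
    """Average squared difference between predictions and targets."""
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(predict, rows, targets, n_repeats, rng):
    """Average error increase per variable after shuffling that variable."""
    baseline = mean_squared_error(predict, rows, targets)
    scores = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(
                mean_squared_error(predict, permuted, targets) - baseline)
        scores.append(sum(increases) / n_repeats)
    return scores


def to_weights(scores):
    """Clip negative scores to zero and rescale so the weights sum to 1."""
    clipped = [max(s, 0.0) for s in scores]
    total = sum(clipped)
    if total == 0:
        return [0.0 for _ in clipped]
    return [c / total for c in clipped]


if __name__ == "__main__":
    # Toy model that uses only the first of two hypothetical variables.
    def predict(row):
        return 3 * row[0]

    rows = [[i, (i * 7) % 5] for i in range(20)]
    targets = [3 * i for i in range(20)]
    scores = permutation_importance(predict, rows, targets, 10, random.Random(0))
    print(to_weights(scores))
```

A variable the model ignores receives an importance of exactly zero, because shuffling it cannot change any prediction.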
Taking the aviation safety data of an airline company for 2017 and 2019 as the research object, an aviation safety prediction model based on random forest is constructed, as shown in Figure 4.
S47: Optimizing the parameters of the regression aviation safety situation prediction model;
Random forest can obtain good results with default parameters; the parameter tuning process determines the number of decision trees N_trees and the maximum number of features Leaf per tree split. The optimization algorithm searches the whole parameter space to determine the optimal parameters, and the process and result of optimizing the random forest model parameters are shown in Figure 5.
The number of decision trees defaults to 100, so N_trees = 100, 200, 300, and 400 form the decision tree number list. The default number of features m is determined by the total number of features N, so Leaf = 3, 4, and 5 form the maximum feature number list. In Figure 5, different combinations of k and m strongly influence the prediction accuracy of the model: when m is held fixed, the larger k is, the higher the model accuracy; likewise, when k is fixed, the larger m is, the higher the model accuracy. With the optimization result k = 200 and m = 2, the model accuracy is high and varies little. The curve of the OOB error of the aviation safety cause and effect prediction against the number of trees K is shown in Figure 6.
In Figure 6, the horizontal axis is the number of trees in the random forest, and the vertical axis is the mean square error of the model. As the number of trees increases, the model error gradually decreases, and the OOB error levels off at K = 150. The number of trees in the aviation safety prediction model is therefore set to 150 in the invention.
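The parameter search of step S47 can be sketched as an exhaustive search over the two candidate lists given above. The callable `oob_error` is a hypothetical placeholder for whatever routine trains a forest with the given settings and returns its OOB error; the source does not specify it, and the error surface in the demonstration is invented purely to exercise the search.

```python
from itertools import product


def grid_search(n_trees_list, max_features_list, oob_error):
    """Evaluate every (n_trees, max_features) pair and return the best one.

    `oob_error` is a caller-supplied function (hypothetical here) that maps
    a pair of settings to an error value; lower is better.
    """
    results = {
        (k, m): oob_error(k, m)
        for k, m in product(n_trees_list, max_features_list)
    }
    best = min(results, key=results.get)
    return best, results


if __name__ == "__main__":
    # Invented error surface, not measured data.
    def toy_error(k, m):
        return abs(k - 200) / 100 + abs(m - 4)

    best, results = grid_search([100, 200, 300, 400], [3, 4, 5], toy_error)
    print("best (N_trees, Leaf):", best)
```

Keeping the full `results` dictionary makes it possible to plot accuracy against each parameter in the manner of Figure 5.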
S5: evaluating the prediction capability of the aviation safety situation prediction model by combining an original safety database of an airline company;
Specifically, the coefficient of determination R², the root mean square error RMSE, and the relative root mean square error rRMSE are used as evaluation indexes to evaluate the prediction capability of the regression aviation safety situation prediction model.
In the formulas for these indexes, x(i) represents the i-th sample in the verification data set, and x(i)^P represents the model-predicted aviation safety situation obtained using the predictor variables of the i-th sample point in the verification data set.
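The three evaluation indexes of step S5 can be computed as in the following Python sketch. R² and RMSE follow their usual definitions; rRMSE is assumed here to be the RMSE divided by the mean of the actual values and expressed as a percentage, which is a common convention but is not confirmed by the source. The function names are hypothetical.

```python
import math


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes the actual values are not all identical.
    """
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def rrmse(actual, predicted):
    """RMSE relative to the mean actual value, in percent (assumed form)."""
    mean = sum(actual) / len(actual)
    return rmse(actual, predicted) / mean * 100


if __name__ == "__main__":
    actual = [2.0, 4.0, 6.0]
    predicted = [3.0, 4.0, 5.0]
    print(r_squared(actual, predicted), rmse(actual, predicted),
          rrmse(actual, predicted))
```

An R² close to 1 together with a small RMSE and rRMSE indicates that the predicted values track the verification samples closely.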
S6: Ranking the contributions of the key dependent variables to aviation safety unsafe events according to an analysis of the influence of the dependent variables on the aviation safety situation prediction result.
The random forest model evaluates the relative importance of each variable by assessing how much each variable improves the prediction accuracy of the overall model, and ranks the contributions of the predictor variables accordingly. The more a variable contributes to the prediction accuracy of the model, the more important it is. The aviation safety prediction model is built on the random forest model; 5 features are extracted and ranked by importance, and the ranking results are shown in Figure 7.
Figure 7 shows the measured importance of the variables 'environment, facility equipment, external factors, human, and management' to aviation safety. Environmental factors have the largest influence on aviation safety and need focused monitoring. For example, the bird-repelling frequency should be increased to reduce the influence of bird strikes on aviation safety; forecasting of severe weather should be strengthened so that flight crews are informed in time to respond; commanders should be advised to change plans; and crews should be required to take aircraft adaptation measures for special weather conditions (de-icing, sand protection, and the like). Human factors and facility equipment rank second in importance, and their degrees of influence are comparable. Management factors have little influence on the aviation safety prediction result and can be ignored to reduce model complexity and improve computational efficiency.
Furthermore, the prediction accuracy of the random forest based aviation safety cause and effect prediction method is analyzed. Specifically, variables are selected based on the random forest model, and training is performed on equipment factors, human factors, environmental factors, and the like.
Figure 8 is a scatter plot of the aviation safety values predicted by the random forest method against the actual values. The results show that the predicted values obtained by the random forest model are highly correlated with the measured values and that the RMSE and rRMSE are satisfactory, so predicting the aviation safety situation with the random forest model is feasible.
Simulation experiment:
In aviation safety prediction, the relation between the predicted response variable and the causative input variables is used to explain the mechanisms that influence aviation safety and to predict aviation safety in the spatial or temporal dimension; artificial neural networks, support vector machines, and similar models have been used for this purpose. However, the way aviation safety changes is affected by a complex environment and many uncertainties and exhibits a complex, high-dimensional, nonlinear relationship that is difficult to predict and model. Before this study, other models were also tried with the same airline as the subject; their accuracy and efficiency are compared in Table 3.
TABLE 3 comparison of the effects of different prediction models
As can be seen from Table 3, for the same sample size the random forest model performs better in both coefficient of determination and prediction error: the coefficient of determination reaches 0.91, and the root mean square error reaches 9.7%. It is therefore better suited to building an aviation safety prediction model than the relevance vector machine and the neural network. In addition, the relevance vector machine and neural network models have shortcomings in aviation safety modeling, chiefly that the actual influence mechanism in aviation safety is difficult to explain and the importance of the input dependent variables to aviation safety remains unknown. Although random forest regression is also a black-box model, it provides other effective aids to interpretation, such as the importance of each variable to the model prediction. Moreover, because the random forest algorithm introduces two random parameters (k, m), it is more robust to noise and less prone to overfitting.
The foregoing shows and describes the general principles, essential features, and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are described in the specification and illustrated only to illustrate the principle of the present invention, but that various changes and modifications may be made therein without departing from the spirit and scope of the present invention, which fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.
Claims (6)
1. An aviation safety cause and effect prediction method based on random forest, characterized by comprising the following steps,
S1: Constructing an aviation safety dependent variable correlation identification model based on the Bow-tie model by using the original safety database of an airline company, and determining the key dependent variables of aviation safety unsafe events;
S2: Establishing an aviation safety scale data acquisition list according to the key dependent variables determined in step S1, and labeling the unsafe event data features in the original safety database;
S3: Performing dimension reduction on the unsafe event data features from step S2 through feature reduction that takes the specific safety output into account, to obtain the unsafe event modeling data of the airline company;
S4: Obtaining training and testing sample subsets from the unsafe event modeling data obtained in step S3 by combining the random forest model with feature selection and sample sampling methods, and constructing an aviation safety situation prediction model based on the random forest model;
S5: Evaluating the prediction capability of the aviation safety situation prediction model by combining the original safety database of the airline company;
S6: Ranking the contributions of the key dependent variables to aviation safety unsafe events according to an analysis of the influence of the dependent variables on the aviation safety situation prediction result.
2. The random forest based aviation safety cause and effect prediction method according to claim 1, wherein the specific operation of step S1 includes,
S11: Using the original safety database of the airline company, determining the dependent variables X = (x_1, x_2, …, x_m) of aviation safety unsafe events and the corresponding different types of aviation safety unsafe events Y = (y_1, y_2, …, y_n), with Y = g(X), wherein Y = (y_1, y_2, …, y_n) represents n different types of aviation safety unsafe events and X = (x_1, x_2, …, x_m) represents the causative variables that cause aviation safety unsafe events;
S12: Placing the dependent variables, possible consequences, and corresponding control measures of aviation safety unsafe events in one-to-one correspondence with the basic events BE, intermediate events IE, top events CE, control events SE, and consequence events OE of the Bow-tie model, wherein a dependent variable of an aviation safety unsafe event corresponds to a basic event BE of the Bow-tie model, a control measure of the aviation safety unsafe event corresponds to a control event SE of the Bow-tie model, a possible consequence of the aviation safety unsafe event corresponds to a consequence event OE of the Bow-tie model, and the lesser, preceding-level consequence of a possible consequence of the aviation safety unsafe event corresponds to an intermediate event IE of the Bow-tie model;
S13: Letting the occurrence probability of each basic event be p_BE, considering the branches that can lead to the i-th consequence event OE_i, and, given the occurrence probabilities of the k control events on the m-th branch, obtaining the probability of a consequence event occurring on the m-th branch;
S14: Based on the result of step S13, expressing the consequence event OE_i as a function of the occurrence probabilities of the n basic events and the m consequence events, namely the aviation safety dependent variable correlation identification model;
S15: Determining the key dependent variables of aviation safety unsafe events from the dependent variables based on the aviation safety dependent variable correlation identification model established in step S14.
3. The random forest based aviation safety cause and effect prediction method according to claim 2, wherein the specific operation of step S3 includes,
S31: Using the mutual information principle, defining the mutual information between a dependent variable X and an aviation safety unsafe event Y as

$$I(X, Y) = \iint f(x, y)\,\log\frac{f(x, y)}{f(x)\,f(y)}\,\mathrm{d}x\,\mathrm{d}y$$

where I(X, Y) is the mutual information between the dependent variable X and the aviation safety unsafe event Y; f(x, y) is the joint probability density function of X and Y; and f(x) and f(y) are the marginal probability density functions of X and Y, respectively;
S32: Calculating I(X, Y) for any multivariate data sequence (X, Y) with sample size N;
S33: Solving for the mutual information between the dependent variable X and the aviation safety unsafe event Y using kernel density estimation and a low-discrepancy deterministic sampling method;
S34: Determining the strength of the causal relationship between the dependent variable X and the aviation safety unsafe event Y from the mutual information obtained in step S33, and obtaining the unsafe event modeling data of the airline company according to that strength.
4. The random forest based aviation safety cause and effect prediction method according to claim 3, wherein the specific operation of step S4 includes,
S41: Normalizing all unsafe event modeling data;
S42: Performing random sampling with replacement on the normalized unsafe event modeling data to obtain K training sets, each of size N equal to the unsafe event modeling data, and forming a decision tree from each training set;
S43: At each split node of each decision tree, randomly drawing M_try variables from all dependent variables as the feature subset for splitting the current node, and selecting the optimal split from the feature subset according to the classification and regression tree method;
S44: Calculating the importance of input variable x_j (j = 1, 2, 3, …, h) in the k-th tree;
In the importance formula, h is the number of input dependent variables; N_OOB is the number of out-of-bag data samples; f_k(x_n) is the n-th sample value of the out-of-bag data; f_k(x′_n) is the estimate of the n-th out-of-bag sample on the k-th tree after the variable is randomly replaced; and I(·) is a discriminant function that takes the value 1 when f_k(x_n) = f_k(x′_n) and 0 otherwise;
S46: Averaging the importance scores obtained over multiple runs for input variable x_j to obtain the weight of each dependent variable;
S47: Optimizing the parameters of the regression aviation safety situation prediction model.
5. The random forest based aviation safety cause and effect prediction method according to claim 4, wherein the specific operation of the normalization processing in step S41 includes,
6. The method according to claim 5, wherein the operation of step S5 includes using the coefficient of determination R², the root mean square error RMSE, and the relative root mean square error rRMSE as evaluation indexes to evaluate the prediction capability of the regression aviation safety situation prediction model.
In the formulas for these indexes, x(i) represents the i-th sample in the verification data set, and x(i)^P represents the model-predicted aviation safety situation obtained using the predictor variables of the i-th sample point in the verification data set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011111711.8A CN112257914B (en) | 2020-10-16 | 2020-10-16 | Aviation safety causal prediction method based on random forest |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112257914A (en) | 2021-01-22 |
CN112257914B (en) | 2023-06-06 |
Family
ID=74244475
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011111711.8A Active CN112257914B (en) | 2020-10-16 | 2020-10-16 | Aviation safety causal prediction method based on random forest |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112257914B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN112257914B (en) | 2023-06-06 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||