CN108197081A - Method for building a data actuarial model for flight delay insurance - Google Patents

Info

Publication number
CN108197081A
Authority
CN
China
Prior art keywords
data
weather
airport
calculating
flight
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711074657.2A
Other languages
Chinese (zh)
Inventor
翟文君
李静
李鹏飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jingzhi Network Technology Co Ltd
Original Assignee
Shanghai Jingzhi Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jingzhi Network Technology Co Ltd
Priority to CN201711074657.2A
Publication of CN108197081A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Algebra (AREA)
  • Probability & Statistics with Applications (AREA)
  • Operations Research (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention relates to a method for building a data actuarial model for flight delay insurance. The model comprises an initial data preprocessing module, a data model calculation module, and a prediction result analysis module, and the method comprises the steps of: (1) preprocessing the initial data; (2) determining the data indices, which are calculated from the basic data and additional data of each flight; (3) building the data model, in which the resulting indices are fed to a regression method implemented in the R language; (4) running the data model multiple times to obtain multiple weight values and taking their mean as the final weight values; (5) combining the index values with the data model to calculate the final prediction result; (6) comparing the prediction result with the actual value and adjusting the data model so that it fits the flight delay values and yields a more accurate prediction. By building the data model from flight data indices with regression methods in R, the invention achieves more accurate prediction.

Description

Method for building a data actuarial model for flight delay insurance
Technical field
The present invention relates to a method for building a data actuarial model for flight delay insurance, and in particular to one that predicts accurately. It belongs to the field of civil aviation information data utilization, analysis, and processing.
Background
Flight delay prediction on the market today generally computes a delay rate by averaging historical records. This cannot predict an individual flight accurately, so it does not transfer well to the insurance industry. After a period of practice, flight delay insurance has remained unprofitable, and major insurers have withdrawn their flight delay insurance products one after another. Moreover, without an accurate delay prediction, many business travelers have no good basis for choosing a flight.
Summary of the invention
The method of the present invention for building a data actuarial model for flight delay insurance discloses a new scheme: a data model is built from flight data indices by regression methods in the R language. This solves the problem that the flight delay probability models of existing schemes are too coarse to support dynamic adjustment of delay insurance payout odds.
In the method of the present invention, the data actuarial model for flight delay insurance comprises an initial data preprocessing module, a data model calculation module, and a prediction result analysis module, and the method comprises the steps of: (1) preprocessing the initial data; (2) determining the data indices, which are calculated from the basic data and additional data of each flight; (3) building the data model, in which the resulting indices are fed to a regression method implemented in the R language; (4) running the data model multiple times to obtain multiple weight values and taking their mean as the final weight values; (5) combining the index values with the data model to calculate the final prediction result; (6) comparing the prediction result with the actual value and adjusting the data model so that it fits the flight delay values and yields a more accurate prediction.
Further, the initial data preprocessing module of the method of this scheme includes a historical flight delay database, a future flight schedule database, and a weather database. The module preprocesses the initial data in the following steps: (1) data cleaning: handle missing values, handle noisy data, and delete duplicate values; (2) data integration: integrate multiple data sources and resolve redundancy among them; (3) data transformation: normalize and aggregate the data; (4) data reduction: group identical data into one class and divide continuous data into intervals.
Further, the data cleaning, data integration, data transformation, and data reduction of the method of this scheme include the following operations: (1) encode the flight number, airport, airline, route, terminal, region, city, flight status, and weather data; (2) perform basic calculations on the data to derive the actual delay time, the time slot, the stopover segment, whether the flight is a code-share flight, and whether the flight is counted more than once; (3) separate code-share flights, make-up flights, and longest-segment flights to obtain finely screened data items; (4) integrate the data according to the different weather data sources; (5) fill the finely screened data items with weather, delay grade, and stopover segment information, and fill in missing values.
Further, the data indices of the method of this scheme include:
(1) Base indices: the delay of the most recent operation before the calculation day; the delay of the second most recent operation; the delay of the third most recent operation; the mean delay of the seven most recent operations; the historical maximum delay; and the mean route delay over the 7 days before the calculation day;
(2) Airport, time slot, airline: the mean delay for the 1, 2, 3, and 7 days before the calculation day, grouped by departure airport, departure time slot, and airline; and the same four values grouped by arrival airport, arrival time slot, and airline;
(3) Airport, time slot, weather: the average over the most recent three months grouped by departure airport, departure time slot, and weather, matched against the weather of the forecast day; the same average matched against the weather of the calculation day; the mean delay for the 1, 2, 3, and 7 days before the calculation day, grouped by departure airport, departure time slot, and weather; the three-month average grouped by arrival airport, departure time slot, and weather, matched against the weather of the forecast day; the three-month average grouped by departure airport, departure time slot, and weather, matched against the weather of the calculation day; and the mean delay for the 1 day before the calculation day, grouped by arrival airport, departure time slot, and weather;
(4) Airport, airline, weather: the average over the most recent three months grouped by departure airport, airline, and weather, matched against the weather of the forecast day; the same average matched against the weather of the calculation day; the mean delay for the 1, 2, 3, and 7 days before the calculation day, grouped by departure airport, airline, and weather; and the same six values grouped by arrival airport, airline, and weather;
(5) Preceding-flight airport, time slot, airline: the mean delay for the 1, 2, 3, and 7 days before the calculation day, grouped by the preceding flight's arrival airport, arrival time slot, and airline;
(6) Preceding-flight airport, time slot, weather: the average over the most recent three months grouped by the preceding flight's arrival airport, time slot, and weather, matched against the weather of the forecast day; the same average matched against the weather of the calculation day; and the mean delay for the 1, 2, 3, and 7 days before the calculation day, grouped by the preceding flight's arrival airport, time slot, and weather;
(7) Preceding-flight airport, airline, weather: the average over the most recent three months grouped by the preceding flight's arrival airport, airline, and weather, matched against the weather of the forecast day; the same average matched against the weather of the calculation day; and the mean delay for the 1, 2, 3, and 7 days before the calculation day, grouped by the preceding flight's arrival airport, weather, and airline.
Further, the data models of the method of this scheme include a linear regression model, a ridge regression model, and a Lasso regression model.
Further, building the data set in the method of this scheme includes: (1) obtaining future flight data and combining it with future weather data; (2) calculating the index values of the future flight data according to the corresponding index definitions; (3) splitting the existing historical data into a training set and a test set at a set ratio, using R.
Further, model testing in the method of this scheme includes model calculation, which comprises: (1) determining the penalty coefficient; (2) training the model on the training set to obtain the parameter value for each index; (3) applying the parameter values obtained from the training set to the test set; (4) repeating the model calculation to obtain different parameter values and taking their mean as the final parameter values; (5) applying the final parameter values to the future data to obtain the final prediction result.
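The split, train, test, and average steps above can be sketched as follows. This is a minimal illustration in Python rather than the R code the patent refers to; the function names, the 80/20 ratio, the number of runs, and the single-index model fitted through the origin are all assumptions made for the example, and the penalty coefficient of step (1) is left out.

```python
import random

def split(rows, train_ratio, rng):
    """Shuffle the historical rows and split them at the set ratio."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

def fit_weight(rows):
    """Least squares through the origin for one index: w = sum(x*y) / sum(x*x)."""
    return sum(x * y for x, y in rows) / sum(x * x for x, _ in rows)

def averaged_weight(rows, runs=10, train_ratio=0.8, seed=0):
    """Repeat split/train/test and return (mean weight, mean absolute test error)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    weights, errors = [], []
    for _ in range(runs):
        train, test = split(rows, train_ratio, rng)
        w = fit_weight(train)                      # step (2): train
        weights.append(w)
        errors.append(sum(abs(w * x - y) for x, y in test) / len(test))  # step (3): test
    return sum(weights) / runs, sum(errors) / runs  # step (4): average

def predict(weight, future_index_values):
    """Step (5): apply the final weight to future index values."""
    return [weight * x for x in future_index_values]
```

Averaging over several random splits reduces the dependence of the final weights on any single split, which is presumably why the method repeats the calculation.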
The method of the present invention for building a data actuarial model for flight delay insurance builds the data model from flight data indices by regression methods in R, and thereby achieves more accurate prediction.
Description of the drawings
Fig. 1 is a schematic diagram of the method of the present invention for building a data actuarial model for flight delay insurance.
Fig. 2 is a flow chart of the method of the present invention for building a data actuarial model for flight delay insurance.
Fig. 3 is a schematic diagram of regression with the Lasso regression model.
Fig. 4 is a schematic diagram of regression with the ridge regression model.
Detailed description
Figs. 1 and 2 show the schematic diagram and the flow chart of the method of the present invention for building a data actuarial model for flight delay insurance. This scheme discloses a method for building a flight delay model, applied in the field of civil aviation information data utilization, analysis, and processing. The flight delay probability predictions that insurers currently use are too coarse, and there is no refined, dynamic flight delay probability model to support dynamic adjustment of delay insurance payout odds or to meet air travelers' wish for accurate guidance in choosing flights. This scheme therefore guides the design of insurance products, lowers their loss ratio, and helps air travelers choose flights well. The system of this scheme may include: (1) a data processing module, which processes the raw data into data that meets the model's requirements; (2) a model calculation module, which runs the model on the processed data to obtain a prediction; (3) a result analysis module, which compares the prediction with the actual value and keeps adjusting the model so that it fits flight delay values better and gives more accurate results.
The method of this scheme for building a data actuarial model for flight delay insurance includes the following processes.
(i) Data preprocessing
The initial data of this scheme include historical flight delay data, future flight schedule data, and weather data, all of which are preprocessed. The basic procedure is: handle missing values, handle noisy data, and delete duplicate values; determine the data sources, integrate multiple sources, and reduce the redundancy among them; transform the data by normalization and aggregation; and reduce the data by grouping identical data into one class and dividing continuous data into intervals, which reduces the volume of data to be processed. The specific operations are as follows.
(1) Encode data such as flight number, airport, airline, route, terminal, region, city, flight status, and weather.
(2) Perform basic calculations on the existing data, including the actual delay time, the time slot, the stopover segment, whether the flight is a code-share flight, and whether the flight is counted more than once.
(3) Separate code-share flights, make-up flights, and longest-segment flights to obtain finely screened data.
(4) Integrate the data according to the different weather data sources.
(5) Fill the finely screened data with weather, delay grade, stopover segment, and similar information, and fill in missing values.
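A few of these preprocessing operations can be sketched in code. The following is a minimal Python illustration, not the patent's R implementation; the record fields, the timestamp format, the six-hour time slots, and the "unknown" placeholder for missing weather are assumptions chosen for the example, and the flight records used with it would be made-up sample data.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"  # assumed timestamp format

def delay_minutes(scheduled, actual):
    """Actual delay in minutes (negative if the flight left early)."""
    delta = datetime.strptime(actual, FMT) - datetime.strptime(scheduled, FMT)
    return delta.total_seconds() / 60.0

def time_slot(scheduled, slot_hours=6):
    """Divide the continuous departure hour into intervals (0..3 for 6-hour slots)."""
    return datetime.strptime(scheduled, FMT).hour // slot_hours

def preprocess(records):
    """Delete duplicates, encode airports, fill missing weather, derive delay and slot."""
    seen, airport_codes, cleaned = set(), {}, []
    for rec in records:
        key = (rec["flight_no"], rec["scheduled"])
        if key in seen:  # duplicate value: skip it
            continue
        seen.add(key)
        # Integer-encode the airport in order of first appearance.
        code = airport_codes.setdefault(rec["dep_airport"], len(airport_codes))
        cleaned.append({
            "flight_no": rec["flight_no"],
            "airport_code": code,
            "slot": time_slot(rec["scheduled"]),
            "weather": rec.get("weather") or "unknown",  # fill missing value
            "delay": delay_minutes(rec["scheduled"], rec["actual"]),
        })
    return cleaned
```

A real pipeline would also integrate several weather sources and handle noisy values; those steps depend on the data sources and are not shown.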
(ii) Determining the data indices and computing the indices for each dimension
The basic data and additional data of each flight are used to calculate the indices. The main inputs are: historical data for the previous N operations, including summary statistics such as maximum, minimum, mean, median, kurtosis, and skewness; historical route data; airport data; airline data; weather data; time-slot data; preceding-flight data; traffic-flow data; and so on. Combining this large volume of data along different dimensions yields results such as the mean delay at a given airport, in a given time slot, under a given weather condition. The indices fall into the following groups.
(1) Base indices: the delay of the most recent operation before the calculation day; the delay of the second most recent operation; the delay of the third most recent operation; the mean delay of the seven most recent operations; the historical maximum delay; and the mean route delay over the 7 days before the calculation day;
(2) Airport, time slot, airline: the mean delay for the 1, 2, 3, and 7 days before the calculation day, grouped by departure airport, departure time slot, and airline; and the same four values grouped by arrival airport, arrival time slot, and airline;
(3) Airport, time slot, weather: the average over the most recent three months grouped by departure airport, departure time slot, and weather, matched against the weather of the forecast day; the same average matched against the weather of the calculation day; the mean delay for the 1, 2, 3, and 7 days before the calculation day, grouped by departure airport, departure time slot, and weather; the three-month average grouped by arrival airport, departure time slot, and weather, matched against the weather of the forecast day; the three-month average grouped by departure airport, departure time slot, and weather, matched against the weather of the calculation day; and the mean delay for the 1 day before the calculation day, grouped by arrival airport, departure time slot, and weather;
(4) Airport, airline, weather: the average over the most recent three months grouped by departure airport, airline, and weather, matched against the weather of the forecast day; the same average matched against the weather of the calculation day; the mean delay for the 1, 2, 3, and 7 days before the calculation day, grouped by departure airport, airline, and weather; and the same six values grouped by arrival airport, airline, and weather;
(5) Preceding-flight airport, time slot, airline: the mean delay for the 1, 2, 3, and 7 days before the calculation day, grouped by the preceding flight's arrival airport, arrival time slot, and airline;
(6) Preceding-flight airport, time slot, weather: the average over the most recent three months grouped by the preceding flight's arrival airport, time slot, and weather, matched against the weather of the forecast day; the same average matched against the weather of the calculation day; and the mean delay for the 1, 2, 3, and 7 days before the calculation day, grouped by the preceding flight's arrival airport, time slot, and weather;
(7) Preceding-flight airport, airline, weather: the average over the most recent three months grouped by the preceding flight's arrival airport, airline, and weather, matched against the weather of the forecast day; the same average matched against the weather of the calculation day; and the mean delay for the 1, 2, 3, and 7 days before the calculation day, grouped by the preceding flight's arrival airport, weather, and airline.
The model can predict flights for days T+1 through T+10, where T is the current date. Every weather-related delay value is matched against the weather of the forecast day. For example, when predicting the delay for day T+2, the weather-related data use the weather of day T+2; when predicting the delay for day T+3, they use the weather of day T+3.
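The grouped, windowed indices and the forecast-day weather matching can be sketched as below. This is a hedged Python illustration, not the patent's R code: the record fields, the reading of "n days before the calculation day" as a window of the previous n days, and the 90-day approximation of "three months" are all assumptions.

```python
from datetime import date, timedelta

def mean_delay(history, calc_day, n_days, **group):
    """Mean delay of records in the n_days before calc_day that match every
    group key (e.g. airport, slot, airline); None if nothing matches."""
    start = calc_day - timedelta(days=n_days)
    delays = [r["delay"] for r in history
              if start <= r["day"] < calc_day
              and all(r[key] == value for key, value in group.items())]
    return sum(delays) / len(delays) if delays else None

def weather_index(history, today, k, forecast, airport, slot):
    """Three-month (assumed 90-day) group average, matched against the
    forecast weather of day T+k, where T is `today`."""
    target_day = today + timedelta(days=k)
    return mean_delay(history, today, 90, airport=airport, slot=slot,
                      weather=forecast[target_day])
```

For instance, `mean_delay(history, T, 7, airport="SHA", slot=1, airline="MU")` would correspond to one index of group (2), and `weather_index(history, T, 2, forecast, "SHA", 1)` to a weather-matched index for day T+2; the airport and airline codes here are only sample values.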
(iii) Building the data model
The more than one hundred indices obtained above are fed to regression methods in R, mainly Lasso regression. Lasso minimizes the residual sum of squares subject to the constraint that the sum of the absolute values of the regression coefficients is less than a constant. This drives some regression coefficients to exactly 0 and yields a more interpretable model. Statistical models are used to mine information effectively from large volumes of data. When a model is first set up, as many independent variables as possible are usually included, to reduce the bias that arises when an important variable is missing. The modeling process, however, needs to find the set of independent variables that best explains the dependent variable; that is, variable selection (index selection, field selection) is used to improve the model's interpretability and prediction accuracy. Index selection is a very important problem in statistical modeling. The Lasso algorithm is an estimation method that simplifies the index set: it constructs a penalty function to obtain a more refined model, and by setting the coefficients of some indices to zero it reduces the index set. It is a biased estimator suited to data with multicollinearity. The lars package for R provides the Lasso algorithm. Depending on how far the model needs to be refined, data mining practitioners can use the AIC and BIC criteria together with Lasso to simplify the variable set of a statistical model and thereby reduce dimensionality. Lasso is therefore a practical algorithm for data mining.
Ridge regression is a biased-estimation regression method designed for analyzing collinear data. It is essentially a modified least squares estimator: by giving up the unbiasedness of least squares, at the cost of losing some information and precision, it obtains regression coefficients that are more realistic and more reliable, and it fits ill-conditioned data better than least squares. Lasso and ridge regression differ only in their constraint: for Lasso the sum of the absolute values of the regression coefficients is less than a constant, whereas for ridge regression the sum of their squares is less than a constant. Lasso's constraint is piecewise linear. Figs. 3 and 4 show the two-variable case, where the contour lines are contours of the residual sum of squares and the residual is smallest at the least squares estimate. The shaded areas are the constraint regions of Lasso and ridge regression: the circle belongs to ridge regression and the diamond to Lasso. Both penalized methods look for the first point at which a contour touches the constraint region (the ridge estimate and the Lasso estimate, respectively). Because the diamond has sharp corners, that first point is more likely to be one of its four vertices, where the coefficient of some variable is 0. As the number of regression variables grows, the Lasso region has more corners, which increases the chance that more indices get a coefficient of 0. A smooth high-dimensional sphere does not have this property. This is why Lasso can be used for variable selection, and it is Lasso's main advantage.
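The difference between the two penalties can be seen numerically. The sketch below is a Python illustration, not the lars implementation mentioned above: it uses coordinate descent on the penalized (Lagrangian) forms, which correspond to the constrained forms in the text, with objectives assumed to be (1/2n)·RSS + λ·Σ|β_j| for Lasso and (1/2n)·RSS + (λ/2)·Σβ_j² for ridge. The function names and the tiny data set are hypothetical.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def fit(X, y, lam, penalty="lasso", n_iter=100):
    """Coordinate descent for Lasso or ridge regression without an intercept."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            if penalty == "lasso":
                beta[j] = soft_threshold(rho, lam) / z  # can be exactly 0
            else:
                beta[j] = rho / (z + lam)               # shrinks but stays non-zero
    return beta

# Hypothetical data: y depends strongly on the first index, weakly on the second.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [3.2, 2.8, -2.8, -3.2]
lasso_beta = fit(X, y, lam=0.5, penalty="lasso")
ridge_beta = fit(X, y, lam=0.5, penalty="ridge")
```

On this data the Lasso fit sets the weak second coefficient to exactly zero, while the ridge fit only shrinks it, which is the variable-selection behavior described for the diamond and circle regions.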
Several algorithm models are analyzed below.
(1) Linear regression model
Linear regression is a regression analysis that models the relationship between one or more independent variables and a dependent variable using the least squares function known as the linear regression equation. This function is a linear combination of one or more model parameters known as regression coefficients.
Assume that the random variable $y$ and the $p$ independent variables $x_1, x_2, x_3, \ldots, x_p$ have a linear relationship, that the actual sample size is $n$, and that the $i$-th observation is $x_{i1}, x_{i2}, x_{i3}, \ldots, x_{ip}$; $y_i$ ($i = 1, 2, 3, \ldots, n$).
The $n$ observations can then be written in the following form:
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i, \quad i = 1, 2, \ldots, n,$$
where $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$ are unknown parameters, $x_1, x_2, x_3, \ldots, x_p$ are $p$ ordinary variables that can be measured precisely and controlled, and $\varepsilon_1, \varepsilon_2, \varepsilon_3, \ldots, \varepsilon_n$ are random errors. The $\varepsilon_i$ are assumed to be mutually independent random variables that follow the same normal distribution $N(0, \sigma^2)$.
In matrix form the system of equations is $Y = X\beta + \varepsilon$.
The primary task of multiple linear regression analysis is to find an estimate $b$ of $\beta$ and establish the multiple linear regression equation:
$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p.$$
Writing the coefficient vector as $\theta$, the loss function to be minimized is:
$$J(\theta) = \frac{1}{2}(X\theta - Y)^{T}(X\theta - Y).$$
If it is solved by gradient descent, each iteration of $\theta$ is:
$$\theta = \theta - \alpha X^{T}(X\theta - Y),$$
where $\alpha$ is the step size. With the least squares method, the result for $\theta$ is:
$$\theta = (X^{T}X)^{-1}X^{T}Y.$$
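The two solution routes above, the gradient descent update θ = θ − αXᵀ(Xθ − Y) and the least squares result θ = (XᵀX)⁻¹XᵀY, can be compared on a small example. The following is a minimal sketch in Python using only the standard library. The source performs its calculations in R, so the function names, the step size, the iteration count, and the toy data are all assumptions made for illustration.

```python
def predict(X, theta):
    """Return X @ theta for a matrix X stored as a list of rows."""
    return [sum(x * t for x, t in zip(row, theta)) for row in X]


def gradient(X, y, theta):
    """Return X^T (X theta - y), the gradient of 1/2 * ||X theta - y||^2."""
    residual = [p - yi for p, yi in zip(predict(X, theta), y)]
    return [sum(row[j] * r for row, r in zip(X, residual))
            for j in range(len(theta))]


def fit_gradient_descent(X, y, step=0.01, iterations=5000):
    """Iterate theta = theta - step * X^T (X theta - y) from a zero start."""
    theta = [0.0] * len(X[0])
    for _ in range(iterations):
        theta = [t - step * g for t, g in zip(theta, gradient(X, y, theta))]
    return theta


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_normal_equations(X, y):
    """Return (X^T X)^{-1} X^T y without forming the inverse explicitly."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)]
           for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


# Hypothetical toy data: a column of ones for the intercept, and y = 1 + 2x.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
theta_gd = fit_gradient_descent(X, y)
theta_ne = fit_normal_equations(X, y)
print(theta_gd, theta_ne)
```

Both routes should agree on this data up to numerical tolerance. The step size has to be small relative to the largest eigenvalue of XᵀX, otherwise the iteration diverges, which is one practical reason to prefer the closed form when p is small.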
(2) Ridge regression model
Applying linear regression directly may cause overfitting, so a regularization term needs to be added. If an L2 regularization term is added, the result is Ridge regression. It differs from ordinary linear regression in that an L2 regularization term is added to the loss function, together with a coefficient $\alpha$ that balances the weight of the linear regression term against the regularization term. The loss function is:
$$J(\theta) = \frac{1}{2}(X\theta - Y)^{T}(X\theta - Y) + \frac{1}{2}\alpha\|\theta\|_2^2,$$
where $\alpha$ is a constant coefficient that needs tuning and $\|\theta\|_2$ is the L2 norm.
Ridge regression is solved in much the same way as ordinary linear regression. With gradient descent, each iteration of $\theta$ is:
$$\theta = \theta - \beta\left(X^{T}(X\theta - Y) + \alpha\theta\right),$$
where $\beta$ is the step size. With the least squares method, the result for $\theta$ is:
$$\theta = (X^{T}X + \alpha E)^{-1}X^{T}Y,$$
where $E$ is the identity matrix.
Ridge regression shrinks the regression coefficients without discarding any variable, which makes the model comparatively stable, but it leaves the model with many variables and poor interpretability.
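The shrinking effect of the closed-form Ridge solution θ = (XᵀX + αE)⁻¹XᵀY can be seen on nearly collinear data. The sketch below is written in Python with the standard library only; the helper names, the values of α, and the data are hypothetical, and no intercept is fitted.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def ridge_fit(X, y, alpha):
    """Return (X^T X + alpha * E)^{-1} X^T y, with E the identity matrix."""
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (alpha if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(A, b)


def squared_norm(theta):
    """Return the squared L2 norm of the coefficient vector."""
    return sum(t * t for t in theta)


# Hypothetical toy data with two nearly collinear columns.
X = [[1.0, 1.01], [2.0, 1.99], [3.0, 3.02], [4.0, 3.98]]
y = [2.0, 4.1, 5.9, 8.0]
for alpha in (0.01, 1.0, 100.0):
    theta = ridge_fit(X, y, alpha)
    print(alpha, theta, squared_norm(theta))
```

As α grows, the squared norm of θ decreases, yet neither coefficient is driven exactly to 0, which matches the statement that Ridge regression keeps every variable.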
(3) Lasso regression model
Lasso regression is sometimes called L1 regularization of linear regression. Its main difference from Ridge regression lies in the regularization term: Ridge regression uses L2 regularization, whereas Lasso regression uses L1 regularization. The loss function of Lasso regression is:
$$J(\theta) = \frac{1}{2n}(X\theta - Y)^{T}(X\theta - Y) + \alpha\|\theta\|_1,$$
where $n$ is the number of samples, $\alpha$ is a constant coefficient that needs tuning, and $\|\theta\|_1$ is the L1 norm.
Lasso regression makes some coefficients smaller and sets some coefficients with small absolute values directly to 0. It is therefore particularly suitable for reducing the number of parameters and for selecting parameters, and it is used to estimate linear models with sparse parameters.
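How an L1 penalty sets small coefficients exactly to 0 can be illustrated with coordinate descent and soft thresholding. This is a different technique from the lars algorithm in R that the source relies on, and it is used here only because it fits in a few lines. The sketch assumes the objective 1/(2n)·‖Xθ − Y‖² + α‖θ‖₁, no intercept, and no all-zero columns; the function names and data are hypothetical.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values in [-alpha, alpha] become 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_coordinate_descent(X, y, alpha, sweeps=100):
    """Minimise 1/(2n) * ||X theta - y||^2 + alpha * ||theta||_1."""
    n, p = len(X), len(X[0])
    theta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # Correlation of column j with the residual that ignores column j.
            rho = 0.0
            for row, yi in zip(X, y):
                partial = sum(row[k] * theta[k] for k in range(p) if k != j)
                rho += row[j] * (yi - partial)
            rho /= n
            z = sum(row[j] ** 2 for row in X) / n  # assumed non-zero
            theta[j] = soft_threshold(rho, alpha) / z
    return theta


# Hypothetical toy data: y depends strongly on column 0 and weakly on column 1.
X = [[-2.5, 1.0], [-1.5, -1.0], [-0.5, 0.0],
     [0.5, 0.0], [1.5, -1.0], [2.5, 1.0]]
y = [3.0 * a + 0.1 * b for a, b in X]
theta = lasso_coordinate_descent(X, y, alpha=0.5)
print(theta)  # the weak coefficient is set exactly to 0.0
```

With α = 0 the same routine reduces to ordinary least squares and recovers both coefficients, so the zero on the second coefficient comes from the penalty rather than from the data.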
The Lasso regression model is compared below with the linear regression model and the ridge regression model.
(1) Optimization perspective
Linear regression is generally suitable only for low-dimensional problems, for example n = 50 and p = 5, with no multicollinearity.
Ridge regression addresses multicollinearity by adding an L2 penalty term, which is convenient to compute. However, it cannot shrink parameters to 0, so it cannot perform variable selection.
Lasso was proposed because ridge regression cannot perform variable selection. Although the L1 penalty term is harder to compute and has no analytic solution, it can shrink some parameters to 0.
Although Lasso can perform variable selection, it is not consistent, it can select at most n variables when n is small, and it cannot perform group selection.
A trade-off is therefore made between L1 and L2, which gives the elastic net. For the inconsistency there is the adaptive Lasso model, and for group selection there is the group Lasso model.
(2) Regularization perspective
The L1 and L2 penalty terms both serve to prevent or reduce overfitting. Machine learning is essentially the process of selecting an optimal hypothesis from a hypothesis space according to some algorithm, and adding regularization restricts the functions that can be chosen to a certain range. For linear regression, one obvious sign of overfitting is that some weights become very large, and adding constraints makes this less likely. L1 solutions are usually sparse, while L2 is more convenient computationally.
Lasso is first-order (L1) regularization and ridge regression is second-order (L2) regularization. The starting point of Lasso is to reduce overfitting, whereas ridge regression is generally regarded as a way of handling multicollinearity, although it also reduces overfitting. Lasso is in effect a way of selecting variables automatically, while ridge regression shrinks certain coefficients. Both increase the error on the training set but can reduce the estimation error on the test set.
(3) Bayesian perspective
Different regularization terms correspond to different prior distributions on the weights.
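This correspondence can be made explicit with a short derivation. It is sketched here under assumptions that the source does not state: Gaussian noise with variance $\sigma^2$, independent priors on each weight $\theta_j$, and hypothetical scale parameters $\tau$ and $b$. The maximum a posteriori estimate is

$$\hat{\theta}_{\text{MAP}} = \arg\max_{\theta}\; p(Y \mid X, \theta)\, p(\theta) = \arg\min_{\theta} \left[ \frac{1}{2\sigma^2}\|Y - X\theta\|_2^2 - \log p(\theta) \right].$$

With a Gaussian prior $\theta_j \sim N(0, \tau^2)$:

$$-\log p(\theta) = \frac{1}{2\tau^2}\|\theta\|_2^2 + \text{const},$$

which is an L2 penalty, so after multiplying the objective by $\sigma^2$ it is ridge regression with $\alpha = \sigma^2/\tau^2$.

With a Laplace prior $p(\theta_j) = \frac{1}{2b} e^{-|\theta_j|/b}$:

$$-\log p(\theta) = \frac{1}{b}\|\theta\|_1 + \text{const},$$

which is an L1 penalty, so after multiplying the objective by $\sigma^2/n$ it is Lasso with $\alpha = \sigma^2/(nb)$.

A tighter prior (smaller $\tau$ or $b$) therefore corresponds to a larger penalty coefficient $\alpha$.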
(iv) Data set construction and model testing
The data model is run multiple times to obtain multiple weight values, and the mean of these weight values is taken as the final weight value. The different indexes are then evaluated with the model to obtain the final prediction result.
The data set construction process comprises: (1) obtaining future flight data and combining it with future weather data; (2) calculating the index values of the future flight data according to the corresponding index definitions; (3) dividing the existing historical data into a training set and a test set in a set proportion using the R language.
The model testing process includes model calculation, specifically: (1) determining a penalty coefficient; (2) training the model on the training set to obtain the parameter values for the different indexes; (3) evaluating the test set with the parameter values obtained from the training set; (4) repeating the model calculation to obtain different parameter values and taking their mean as the final parameter values; (5) substituting the future data into the corresponding parameter values to obtain the final prediction result.
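The testing procedure above (fix a penalty coefficient, split, train, evaluate, repeat, average, predict) can be sketched end to end. The source does this in R with several indexes. The sketch below is Python with a single index and a one-dimensional L2-penalized fit standing in for the real model, and every name, ratio, and data value in it is a hypothetical placeholder.

```python
import random


def split(rows, train_ratio, rng):
    """Shuffle a copy of the rows and cut it into training and test sets."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def fit_one_index(train, penalty):
    """Fit delay = w * index with an L2 penalty (closed form, no intercept)."""
    sxy = sum(x * y for x, y in train)
    sxx = sum(x * x for x, _ in train)
    return sxy / (sxx + penalty)


def mean_abs_error(w, rows):
    """Average absolute gap between predicted and actual delay."""
    return sum(abs(w * x - y) for x, y in rows) / len(rows)


def averaged_weight(rows, penalty, runs, train_ratio=0.8, seed=0):
    """Repeat split/train/test and return the mean weight and mean test error."""
    rng = random.Random(seed)
    weights, errors = [], []
    for _ in range(runs):
        train, test = split(rows, train_ratio, rng)
        w = fit_one_index(train, penalty)       # step (2): train
        weights.append(w)
        errors.append(mean_abs_error(w, test))  # step (3): evaluate on test set
    return sum(weights) / runs, sum(errors) / runs  # step (4): average


# Hypothetical history of (index value, delay in minutes) pairs.
history = [(float(x), 2.0 * x) for x in range(1, 21)]
w, err = averaged_weight(history, penalty=1.0, runs=5)  # step (1): penalty
future_index = 10.0
prediction = w * future_index                           # step (5): predict
print(w, err, prediction)
```

Averaging over several random splits reduces the dependence of the final parameter value on any single partition of the historical data.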
The method of this solution for establishing a data actuarial model for flight delay insurance is not limited to the content disclosed in the specific embodiments. The technical solutions in the embodiments can be extended based on the understanding of those skilled in the art, and simple alternatives that those skilled in the art make by combining this solution with common knowledge also fall within the scope of this solution.

Claims (7)

1. A method for establishing a data actuarial model for flight delay insurance, the data actuarial model comprising an initial data preprocessing module, a data model calculation module, and a prediction result analysis module, characterized in that the method comprises the steps of:
(1) preprocessing the initial data;
(2) determining the data indexes, which are calculated from the basic data and additional data of the flights;
(3) building the data model by applying regression methods in the R language to the multiple different indexes obtained;
(4) running the data model multiple times to obtain multiple weight values, and taking the mean of the weight values as the final weight value;
(5) calculating the final prediction result from the different indexes and the data model;
(6) comparing the difference between the prediction result and the actual value, and adjusting the data model so that it fits the flight delay values, to obtain a more accurate prediction result.
2. The method for establishing a data actuarial model for flight delay insurance according to claim 1, characterized in that the initial data preprocessing module comprises a historical flight delay database, a future flight plan database, and a weather database, and the process by which the initial data preprocessing module preprocesses the initial data comprises the steps of:
(1) data cleaning: handling missing values, handling noisy data, and deleting duplicate values;
(2) data integration: integrating multiple data sources and handling redundancy among the data sources;
(3) data transformation: performing data normalization and data aggregation;
(4) data reduction: grouping identical data into one class, and dividing continuous data into intervals.
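Three of the operations named in these steps (deleting duplicates and filling missing values, normalizing, and dividing continuous data into intervals) can be illustrated briefly. The sketch below is in Python with the standard library only. The record layout, the fill value, the min-max scaling, the four time intervals, and the flight numbers are all assumptions for illustration and are not specified by the claim.

```python
def clean(rows, fill_delay=0.0):
    """Drop repeated (flight, day) rows and fill missing delays."""
    seen, cleaned = set(), []
    for row in rows:
        key = (row["flight"], row["day"])
        if key in seen:
            continue  # duplicate value: keep only the first copy
        seen.add(key)
        delay = fill_delay if row["delay"] is None else row["delay"]
        cleaned.append({**row, "delay": delay})
    return cleaned


def min_max_normalize(values):
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def time_period(hour):
    """Map a departure hour (0-23) to a coarse interval label."""
    if hour < 6:
        return "night"
    if hour < 12:
        return "morning"
    if hour < 18:
        return "afternoon"
    return "evening"


# Hypothetical raw records; delays are in minutes.
rows = [
    {"flight": "MU5101", "day": "2017-10-30", "hour": 8, "delay": 20.0},
    {"flight": "MU5101", "day": "2017-10-30", "hour": 8, "delay": 20.0},
    {"flight": "CA1858", "day": "2017-10-30", "hour": 19, "delay": None},
    {"flight": "HU7604", "day": "2017-10-30", "hour": 13, "delay": 60.0},
]
cleaned = clean(rows)
scaled = min_max_normalize([r["delay"] for r in cleaned])
periods = [time_period(r["hour"]) for r in cleaned]
print(cleaned, scaled, periods)
```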
3. The method for establishing a data actuarial model for flight delay insurance according to claim 2, characterized in that the data cleaning, data integration, data transformation, and data reduction comprise the processes of:
(1) encoding the flight number, airport, airline, route, terminal, region, city, flight status, and weather data;
(2) performing basic operations on the data to calculate the actual delay time, the time period, the stopover segment, whether the flight is a code-share flight, and whether the flight data is calculated more than once;
(3) dividing the data on code-share flights, make-up flights, and longest-segment flights into fine-screening data items;
(4) integrating the data according to the differences between weather data sources;
(5) filling the weather, delay level, and stopover segment information into the fine-screening data items, and filling missing values.
4. The method for establishing a data actuarial model for flight delay insurance according to claim 1, characterized in that the data indexes comprise:
(1) basic indexes: the delay of the first occurrence before the calculation date, the delay of the second occurrence before the calculation date, the delay of the third occurrence before the calculation date, the mean delay of the seven occurrences before the calculation date, the maximum historical delay, and the mean route delay over the 7 days before the calculation date;
(2) airport, time period, and airline: the mean delay for the departure airport, departure time period, and airline 1 day before the calculation date; the mean delay for the departure airport, departure time period, and airline 2 days before the calculation date; the mean delay for the departure airport, departure time period, and airline 3 days before the calculation date; the mean delay for the departure airport, departure time period, and airline within the 7 days before the calculation date; the mean delay for the arrival airport, arrival time period, and airline 1 day before the calculation date; the mean delay for the arrival airport, arrival time period, and airline 2 days before the calculation date; the mean delay for the arrival airport, arrival time period, and airline 3 days before the calculation date; the mean delay for the arrival airport, arrival time period, and airline 7 days before the calculation date;
(3) airport, time period, and weather: the average data for the departure airport, departure time period, and weather over the most recent three months, matched against the weather of the forecast date, and the resulting value; the average data for the departure airport, departure time period, and weather over the most recent three months, matched against the weather of the calculation date, and the resulting value; the mean delay for the departure airport, departure time period, and weather 1 day before the calculation date; the mean delay for the departure airport, departure time period, and weather 2 days before the calculation date; the mean delay for the departure airport, departure time period, and weather 3 days before the calculation date; the mean delay for the departure airport, departure time period, and weather within the 7 days before the calculation date; the average data for the arrival airport, departure time period, and weather over the most recent three months, matched against the weather of the forecast date, and the resulting value; the average data for the departure airport, departure time period, and weather over the most recent three months, matched against the weather of the calculation date, and the resulting value; the mean delay for the arrival airport, departure time period, and weather 1 day before the calculation date; the mean delay for the arrival airport, departure time period, and weather 1 day before the calculation date; the mean delay for the arrival airport, departure time period, and weather 1 day before the calculation date; the mean delay for the arrival airport, departure time period, and weather 1 day before the calculation date;
(4) airport, airline, and weather: the average data for the departure airport, airline, and weather over the most recent three months, matched against the weather of the forecast date, and the resulting value; the average data for the departure airport, airline, and weather over the most recent three months, matched against the weather of the calculation date, and the resulting value; the mean delay for the departure airport, airline, and weather 1 day before the calculation date; the mean delay for the departure airport, airline, and weather 2 days before the calculation date; the mean delay for the departure airport, airline, and weather 3 days before the calculation date; the mean delay for the departure airport, airline, and weather 7 days before the calculation date; the average data for the arrival airport, airline, and weather over the most recent three months, matched against the weather of the forecast date, and the resulting value; the average data for the arrival airport, airline, and weather over the most recent three months, matched against the weather of the calculation date, and the resulting value; the mean delay for the arrival airport, airline, and weather 1 day before the calculation date; the mean delay for the arrival airport, airline, and weather 2 days before the calculation date; the mean delay for the arrival airport, airline, and weather 3 days before the calculation date; the mean delay for the arrival airport, airline, and weather 7 days before the calculation date;
(5) preceding-flight airport, time period, and airline: the mean delay for the arrival airport, arrival time period, and airline of the preceding flight 1 day before the calculation date; the mean delay for the arrival airport, arrival time period, and airline of the preceding flight 2 days before the calculation date; the mean delay for the arrival airport, arrival time period, and airline of the preceding flight 3 days before the calculation date; the mean delay for the arrival airport, arrival time period, and airline of the preceding flight 7 days before the calculation date;
(6) preceding-flight airport, time period, and weather: the average data for the arrival airport, time period, and weather of the preceding flight over the most recent three months, matched against the weather of the forecast date, and the resulting value; the average data for the arrival airport, time period, and weather of the preceding flight over the most recent three months, matched against the weather of the calculation date, and the resulting value; the mean delay for the arrival airport, time period, and weather of the preceding flight 1 day before the calculation date; the mean delay for the arrival airport, time period, and weather of the preceding flight 2 days before the calculation date; the mean delay for the arrival airport, time period, and weather of the preceding flight 3 days before the calculation date; the mean delay for the arrival airport, time period, and weather of the preceding flight 7 days before the calculation date;
(7) preceding-flight airport, airline, and weather: the average data for the arrival airport, airline, and weather of the preceding flight over the most recent three months, matched against the weather of the forecast date, and the resulting value; the average data for the arrival airport, airline, and weather of the preceding flight over the most recent three months, matched against the weather of the calculation date, and the resulting value; the mean delay for the arrival airport, weather, and airline of the preceding flight 1 day before the calculation date; the mean delay for the arrival airport, weather, and airline of the preceding flight 2 days before the calculation date; the mean delay for the arrival airport, weather, and airline of the preceding flight 3 days before the calculation date; the mean delay for the arrival airport, weather, and airline of the preceding flight 7 days before the calculation date.
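Most of the indexes listed above share one pattern: a mean delay for a combination of attributes over a look-back window before the calculation date. The sketch below shows that pattern in Python for the airport, time period, and airline combination. It assumes that "k days before the calculation date" means the k calendar days immediately preceding that date, which the claim does not state precisely, and the record layout, codes, and delay values are hypothetical.

```python
from datetime import date, timedelta


def mean_delay(records, key, calc_day, days_back):
    """Mean delay of records matching `key` in the window
    [calc_day - days_back, calc_day); None when nothing matches."""
    start = calc_day - timedelta(days=days_back)
    delays = [r["delay"] for r in records
              if (r["airport"], r["period"], r["airline"]) == key
              and start <= r["day"] < calc_day]
    return sum(delays) / len(delays) if delays else None


def index_values(records, key, calc_day, windows=(1, 2, 3, 7)):
    """One index value per look-back window, keyed by window length in days."""
    return {d: mean_delay(records, key, calc_day, d) for d in windows}


# Hypothetical history: delay in minutes per flight record.
records = [
    {"airport": "PVG", "period": "morning", "airline": "MU",
     "day": date(2017, 10, 31), "delay": 30.0},
    {"airport": "PVG", "period": "morning", "airline": "MU",
     "day": date(2017, 10, 30), "delay": 10.0},
    {"airport": "PVG", "period": "morning", "airline": "MU",
     "day": date(2017, 10, 26), "delay": 50.0},
    {"airport": "PEK", "period": "morning", "airline": "CA",
     "day": date(2017, 10, 31), "delay": 99.0},
]
values = index_values(records, ("PVG", "morning", "MU"), date(2017, 11, 1))
print(values)
```

Returning None for an empty window keeps a missing history distinct from a genuine mean delay of zero, so the gap can be filled deliberately during preprocessing.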
5. The method for establishing a data actuarial model for flight delay insurance according to claim 1, characterized in that the data model comprises: a linear regression model, a ridge regression model, and a Lasso regression model.
6. The method for establishing a data actuarial model for flight delay insurance according to claim 1, characterized in that data set construction comprises the processes of:
(1) obtaining future flight data and combining it with future weather data;
(2) calculating the index values of the future flight data according to the corresponding index definitions;
(3) dividing the existing historical data into a training set and a test set in a set proportion using the R language.
7. The method for establishing a data actuarial model for flight delay insurance according to claim 6, characterized in that model testing includes model calculation, and the model calculation comprises the processes of:
(1) determining a penalty coefficient;
(2) training the model on the training set to obtain the parameter values for the different indexes;
(3) evaluating the test set with the parameter values obtained from the training set;
(4) repeating the model calculation to obtain different parameter values, and taking the mean of the parameter values as the final parameter values;
(5) substituting the future data into the corresponding parameter values to obtain the final prediction result.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711074657.2A CN108197081A (en) 2017-11-03 2017-11-03 A kind of data actuarial model method for building up of flight delay danger


Publications (1)

Publication Number Publication Date
CN108197081A true CN108197081A (en) 2018-06-22

Family

ID=62573010

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711074657.2A Pending CN108197081A (en) 2017-11-03 2017-11-03 A kind of data actuarial model method for building up of flight delay danger

Country Status (1)

Country Link
CN (1) CN108197081A (en)






Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180622
