CN118071138A - Construction method and device of object XGBoost model, computer equipment and storage medium - Google Patents
- Publication number
- CN118071138A (application number CN202410069738.7A)
- Authority
- CN
- China
- Prior art keywords
- target
- badger
- xgboost
- model
- learning rate
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0635—Risk analysis of enterprise or organisation activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/10—Pre-processing; Data cleansing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention relates to the technical field of data processing and discloses a method, an apparatus, computer equipment and a storage medium for constructing a target XGBoost model. The method comprises the following steps: acquiring a training data set and constructing an XGBoost model; training the XGBoost model with the training data set to obtain a trained XGBoost model; optimizing the iteration count, learning rate and tree depth of the XGBoost model with a badger optimization algorithm to determine a target iteration count, a target learning rate and a target tree depth; and constructing the target XGBoost model based on the target iteration count, the target learning rate, the target tree depth and the trained XGBoost model. Because the badger optimization algorithm has strong global search capability, offers notable accuracy, low complexity and good stability, and involves only a few parameters, the method reduces computation cost while guaranteeing the accuracy of the target XGBoost model.
Description
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method and apparatus for constructing a target XGBoost model, a computer device, and a storage medium.
Background
Safety is the lifeline of civil aviation and its perpetual theme. The comprehensive analysis and use of safety information is fundamental to civil aviation safety management, an essential support for grasping the situation, identifying risks and assisting decisions, and an important means of risk prevention. Predicting the flight risk of a future period from historical incident-symptom information plays an important role in absorbing lessons learned and preventing events from recurring.
Machine learning is the core of artificial intelligence and is in a period of rapid development. The XGBoost (eXtreme Gradient Boosting) algorithm, regarded as one of the three major machine learning algorithms, originates from an open-source boosting library comprising tree boosting algorithms and a linear solver. It is an efficient and fast ensemble learning tool with advantages such as built-in cross-validation, regularization against overfitting and efficient parallel processing; it is widely applied to data mining in fields such as finance, e-commerce, medical treatment and automation, and shows good learning performance and classification accuracy.
XGBoost, like other machine learning models, faces the problem of hyperparameter optimization: models built from the same input data with different hyperparameters generalize differently. At present, grid search is a feasible method for finding the optimal hyperparameters, but when the hyperparameter dimension is too high, its computational cost becomes large.
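To make that cost concrete: the number of models grid search must train is the product of the per-parameter grid sizes, so it grows multiplicatively with every added hyperparameter. A small illustration, with grids that are assumptions for illustration rather than values from the patent:

```python
from math import prod

# Illustrative grids for the three hyperparameters tuned later in this
# document; the specific candidate values are assumptions, not from the patent.
grid = {
    "n_estimators": list(range(50, 501, 50)),      # 10 candidate iteration counts
    "learning_rate": [0.01, 0.05, 0.1, 0.2, 0.3],  # 5 candidate learning rates
    "max_depth": list(range(2, 11)),               # 9 candidate tree depths
}

# Grid search trains one model per combination: 10 * 5 * 9 = 450 full trainings.
n_models = prod(len(values) for values in grid.values())
```

Adding one more hyperparameter with ten candidates would multiply this by ten again, which is why the cost becomes prohibitive in high dimensions.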
Disclosure of Invention
In view of this, the present invention provides a method, an apparatus, computer equipment and a storage medium for constructing a target XGBoost model, so as to solve the problem that grid search becomes computationally expensive when the hyperparameter dimension is too high.
In a first aspect, the present invention provides a method for constructing a target XGBoost model, the method comprising: acquiring a training data set and constructing an XGBoost model; training the XGBoost model with the training data set to obtain a trained XGBoost model; optimizing the iteration count, learning rate and tree depth of the XGBoost model with a badger optimization algorithm, and determining the target iteration count, target learning rate and target tree depth corresponding to them; and constructing the target XGBoost model based on the target iteration count, the target learning rate, the target tree depth and the trained XGBoost model.
According to this construction method of the target XGBoost model, the XGBoost model is first trained on the training data set to obtain a trained XGBoost model, and its iteration count, learning rate and tree depth are then optimized with the badger optimization algorithm. Because the badger optimization algorithm has strong global search capability, offers notable accuracy, low complexity and good stability, and involves only a few parameters (namely the iteration count, the learning rate and the tree depth), the computation cost is reduced while the accuracy of the target XGBoost model is guaranteed.
In an alternative embodiment, optimizing the iteration count, learning rate and tree depth of the XGBoost model with the badger optimization algorithm, and determining the corresponding target iteration count, target learning rate and target tree depth, includes: acquiring the upper boundary, lower boundary, population number, random number and dimension number of the attribute parameters; determining the position of each badger individual based on the upper boundary, lower boundary, population number, random number and dimension number of the attribute parameters, wherein the position of each badger individual comprises the iteration count, the learning rate and the tree depth; determining the objective function values of the badger population based on the positions of the badger individuals; determining a fitness function based on the mean square error of the XGBoost model; sorting the obtained fitness values, determining the best search position of the current population, and setting it as the current global optimal position; and iteratively updating the positions of the badger individuals in the population until the number of iterations reaches a preset number, then outputting the target iteration count, the target learning rate and the target tree depth.
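The initialization step above can be sketched as follows; the bounds and population size are illustrative assumptions, since the patent does not fix concrete values:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Assumed per-dimension bounds for (iteration count, learning rate, tree depth).
lower = np.array([50.0, 0.01, 2.0])
upper = np.array([500.0, 0.3, 10.0])
n_pop, n_dim = 20, 3  # population number and dimension number

# Each badger's position is drawn uniformly inside the bounds:
# x = lower + r * (upper - lower), with r ~ U(0, 1) per dimension.
positions = lower + rng.random((n_pop, n_dim)) * (upper - lower)
```

Each row of `positions` is one candidate hyperparameter vector whose fitness would then be evaluated via the XGBoost model's mean square error.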
According to this method for constructing the target XGBoost model, the badger optimization algorithm quickly finds the optimal solution in a large parameter space, improving global search efficiency, and the parameter range is adjusted automatically as the iterations proceed, making the search process finer.
In an optional embodiment, iteratively updating the positions of the badger individuals in the population until the number of iterations reaches the preset number, and outputting the target iteration count, target learning rate and target tree depth, includes: in the process of optimizing the iteration count, learning rate and tree depth of the XGBoost model with the badger optimization algorithm, if the random number is greater than a preset value, determining the new position of each badger individual based on the position of the prey; determining the fitness value corresponding to the new position, comparing it with the fitness value of the original position, and deciding whether to update the position of the badger population; and if the fitness value of the new position is smaller than that of the original position, updating the position of the badger population and continuing to iteratively update the positions of the badger individuals until the number of iterations reaches the preset number, then outputting the target iteration count, the target learning rate and the target tree depth.
The method for constructing the target XGBoost model provided in this embodiment can flexibly adjust the search strategy according to different conditions (here, the size of the random number). When the random number is greater than the preset value, the algorithm focuses on searching around the position of the prey, which in some cases helps it find solutions more effectively; this strategy balances global and local search through the prey's position information, and prey-centred search lets the algorithm explore a local area more finely. In addition, by comparing the fitness values of the old and new positions and updating the badger population accordingly, the strategy helps the algorithm avoid sinking into a locally optimal solution, thereby increasing the likelihood of finding the globally optimal one.
In an optional embodiment, iteratively updating the positions of the badger individuals in the population until the number of iterations reaches the preset number, and outputting the target iteration count, target learning rate and target tree depth, includes: in the process of optimizing the iteration count, learning rate and tree depth of the XGBoost model with the badger optimization algorithm, if the random number is smaller than the preset value, determining the new position of each badger individual based on the position of the prey in the first stage of the hunting behavior; determining the fitness value of the new position, comparing it with that of the original position, and deciding whether to update the position of the badger population; if the fitness value of the new position is smaller than that of the original position, updating the position of the badger population; in the second stage of hunting, with each badger individual as the centre, updating the positions of the population individuals based on a preset prey-tracking range; and after the optimal position of the badger is locked, letting the remaining badgers continuously approach the optimal position in a spiral search pattern, then outputting the target iteration count, the target learning rate and the target tree depth.
According to this method for constructing the target XGBoost model, different update strategies are adopted at different stages, so the search mode can be adjusted dynamically as the badger optimization algorithm proceeds. In the first stage, new positions are determined from the prey's position information, which supports broad exploration of the solution space. In the second stage, positions are updated around each badger within the preset tracking range, which supports finer search in a local region; the broad search of the first stage and the local fine search of the second stage together balance global and local exploration and help the algorithm find a better solution. The spiral search pattern used to approach the optimal position further ensures a comprehensive and fine local search, which can further optimize the quality of the solution. Moreover, using different update strategies at different stages reduces the risk of sinking into a local optimum and raises the probability of finding the global optimum.
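A loose numeric paraphrase of the two-stage update follows; the 0.5 threshold, the shrink factor and the spiral form are all chosen for illustration, not taken from the patent's equations:

```python
import numpy as np

def update_position(pos, prey, r, t, t_max, rng):
    """Sketch of one badger position update.

    If the random number r exceeds 0.5 (an assumed preset value), the badger
    searches broadly around the prey's position (first stage); otherwise it
    spirals toward the locked-in best position (second stage). The search
    radius shrinks as the iteration t approaches t_max."""
    shrink = 1.0 - t / t_max
    if r > 0.5:
        step = rng.uniform(-1.0, 1.0, size=pos.shape)
        return prey + shrink * step * (prey - pos)  # broad search near the prey
    theta = rng.uniform(0.0, 2.0 * np.pi)
    # Spiral approach: a cosine path whose amplitude decays over iterations.
    return prey + shrink * np.cos(theta) * (prey - pos)
```

Greedy acceptance (keeping a candidate only when its fitness improves, as described above) happens outside this function, one comparison per badger per iteration.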
In an alternative embodiment, the method further comprises: acquiring a test data set; inputting the test data set into a target XGBoost model to obtain a test result; detecting whether the test result meets a preset condition; if the test result meets the preset condition, judging that the target XGBoost model is qualified; if the test result does not meet the preset condition, the target XGBoost model is judged to be unqualified.
According to this construction method, testing the target XGBoost model on the test data set further verifies the model and thereby guarantees its accuracy.
In an alternative embodiment, obtaining the test data set and the training data set includes: acquiring aircraft data, where the aircraft data include information on event level, event type, event cause and event stage; preprocessing the aircraft data to obtain an original data set; and dividing the original data set into a test data set and a training data set according to a preset proportion, where the training data set is larger than the test data set.
According to this construction method of the target XGBoost model, the complete pipeline of acquiring the data, preprocessing it and dividing the data set ensures the completeness and consistency of the data. In addition, the preprocessing step standardizes the data and eliminates problems such as abnormal values, missing values and inconsistent formats, making the data more reliable and usable.
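The division step can be sketched as below; the 0.8/0.2 ratio and the synthetic records are assumptions, since the patent only requires the training set to be the larger part:

```python
import numpy as np

def split_dataset(data, train_ratio=0.8, seed=0):
    """Shuffle the preprocessed records, then cut them into a training set
    and a smaller test set according to the preset proportion."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    cut = int(len(data) * train_ratio)
    return data[idx[:cut]], data[idx[cut:]]

# Stand-in for preprocessed aircraft data (50 records, 2 fields each).
records = np.arange(100).reshape(50, 2)
train_set, test_set = split_dataset(records)
```

Shuffling before cutting keeps the two sets statistically similar, which matters when records are stored in chronological or categorical order.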
In an alternative embodiment, preprocessing the aircraft data to obtain the original data set includes: normalizing the aircraft data with a normalization function to obtain the original data set.
The method for constructing the target XGBoost model according to this embodiment can convert the data to a specific range, such as [0,1] or [-1,1], through normalization, which helps eliminate the influence of data scale on the algorithm. For example, if one feature's range of values is much larger than the others', it may dominate the algorithm. Normalization puts all feature values on the same scale, so the algorithm treats each feature more equitably.
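A minimal sketch of such a normalization function, here min-max scaling to [0,1]; the patent does not name a specific function, so this choice is an assumption (constant-valued columns, which would divide by zero, are assumed absent):

```python
import numpy as np

def min_max_normalize(x):
    """Column-wise min-max scaling of a feature matrix into [0, 1]."""
    x = np.asarray(x, dtype=float)
    x_min = x.min(axis=0)
    x_max = x.max(axis=0)
    return (x - x_min) / (x_max - x_min)

# Two features on very different scales end up comparably scaled.
scaled = min_max_normalize([[1.0, 200.0], [3.0, 400.0], [5.0, 600.0]])
```

After scaling, both columns run from 0 to 1, so neither the small-valued nor the large-valued feature dominates distance- or split-based computations.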
In a second aspect, the present invention provides an apparatus for constructing a target XGBoost model, the apparatus comprising: an acquisition and construction module for acquiring a training data set and constructing an XGBoost model; a model training module for training the XGBoost model with the training data set to obtain a trained XGBoost model; an optimization module for optimizing the iteration count, learning rate and tree depth of the XGBoost model with the badger optimization algorithm and determining the corresponding target iteration count, target learning rate and target tree depth; and a construction module for constructing the target XGBoost model based on the target iteration count, the target learning rate, the target tree depth and the trained XGBoost model.
In a third aspect, the present invention provides computer equipment comprising a memory and a processor in communication connection with each other, the memory storing computer instructions, and the processor executing the computer instructions so as to perform the method for constructing the target XGBoost model according to the first aspect or any one of its corresponding embodiments.
In a fourth aspect, the present invention provides a computer-readable storage medium storing computer instructions, where the computer instructions are configured to cause a computer to perform the method for constructing the target XGBoost model according to the first aspect or any one of its corresponding embodiments.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method of constructing a model of a target XGBoost according to an embodiment of the present invention;
FIG. 2 is a flow diagram of another method of constructing a target XGBoost model, according to an embodiment of the invention;
FIG. 3 is a schematic diagram of a risk predictor system in a method of constructing a target XGBoost model according to an embodiment of the present invention;
FIG. 4 is a flow chart of a method of constructing a model of a further object XGBoost according to an embodiment of the present invention;
FIG. 5 is a block diagram of a construction apparatus of a target XGBoost model according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of a hardware structure of a computer device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Based on the problems described in the Background above, the method for constructing the target XGBoost model provided by this embodiment trains the XGBoost model on the training data set to obtain a trained XGBoost model, and then optimizes the iteration count, learning rate and tree depth of the XGBoost model with the badger optimization algorithm.
In accordance with an embodiment of the present invention, an embodiment of a method for constructing a target XGBoost model is provided. It is noted that the steps illustrated in the flowcharts of the drawings may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps illustrated or described may be performed in a different order.
In this embodiment, a method for constructing a target XGBoost model is provided, which may be used in computer equipment such as a computer or a server. Fig. 1 is a schematic flow chart of a method for constructing a target XGBoost model according to an embodiment of the present invention. As shown in fig. 1, the flow includes the following steps:
step S101, a training data set is acquired, and XGBoost models are built.
The training data set characterizes the set of data used to train the XGBoost model. It may include the event level, event type, event cause, event stage and so on, which are not specifically limited here. The training data set may be obtained directly from a database or obtained by dividing an original data set, which is likewise not limited here; the specific division steps are described in detail below.
The XGBoost model is a gradient boosting algorithm whose full name is "Extreme Gradient Boosting". Based on a gradient boosting framework, it fits the gradient information of the data by iteratively adding new decision trees, gradually improving the prediction accuracy of the model.
Step S102: the XGBoost model is trained on the training data set to obtain a trained XGBoost model.
First, the training data set is subjected to necessary preprocessing including data cleansing, feature selection, missing value processing, and the like. The purpose of this step is to ensure the quality and availability of the data, which lays a foundation for model training.
Parameter setting: parameters of the XGBoost model need to be set before training can begin. These parameters include learning rate, maximum depth, regularization terms, etc., which will affect the training process and final performance of the model. The optimal parameter combination is determined through experiments and tuning according to specific tasks and data characteristics.
Model training: and training the XGBoost model by using the set parameters through a training data set. In the process, XGBoost fits the gradient information of the data by iteratively adding a new decision tree, and gradually improves the prediction accuracy of the model, so that a trained XGBoost model is obtained.
Step S103: the iteration count, learning rate and tree depth of the XGBoost model are optimized with the badger optimization algorithm, and the target iteration count, target learning rate and target tree depth corresponding to them are determined.
First, initial parameters are set for the badger optimization algorithm, including the population size, the number of iterations, the learning rate and so on; these parameters affect the search process and the final result. Constructing solutions: in the initial population, a set of solutions (i.e., model parameter combinations) is randomly generated based on the parameters of the XGBoost model, each solution representing one possible model configuration. Evaluating solutions: for each solution (model configuration), the model is trained on the training data set and its performance is evaluated on the validation data set; the evaluation index may be accuracy, precision, recall and so on. Selecting the best solution: according to the evaluation results, the best-performing solution is selected as the current optimal solution, and the remaining solutions are sorted by fitness value. Generating new solutions: new solutions are generated from the best individuals in the current population according to the update rule of the badger optimization algorithm, for example through mutation, crossover and selection; in each iteration, the optimal solution and the population are updated continuously. Determining the target parameters: when the optimization ends, the model parameter combination corresponding to the optimal solution gives the target iteration count, the target learning rate and the target tree depth produced by the badger optimization algorithm.
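The steps above can be condensed into the following minimal badger-style search over (iteration count, learning rate, tree depth). The fitness here is a synthetic stand-in for the cross-validated model error, and the update rule is a simplified paraphrase, not the patent's exact equations:

```python
import numpy as np

def badger_optimize(fitness, lower, upper, n_pop=10, n_iter=30, seed=0):
    """Keep a population of candidate hyperparameter vectors, propose moves
    around the current best, and accept a move only when it lowers the
    fitness (smaller is better). Returns the best vector and its fitness."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    pop = lower + rng.random((n_pop, lower.size)) * (upper - lower)
    fit = np.array([fitness(p) for p in pop])
    for t in range(n_iter):
        best = pop[fit.argmin()]
        shrink = 1.0 - t / n_iter  # search radius narrows over iterations
        for i in range(n_pop):
            step = rng.uniform(-1.0, 1.0, size=lower.size)
            cand = np.clip(best + shrink * step * (best - pop[i]), lower, upper)
            f = fitness(cand)
            if f < fit[i]:  # greedy acceptance, as in the embodiments above
                pop[i], fit[i] = cand, f
    return pop[fit.argmin()], float(fit.min())

# Synthetic fitness: squared relative distance to a made-up "ideal" setting
# (300 iterations, learning rate 0.1, depth 6). In the patent's workflow this
# would instead be the XGBoost mean square error on validation data.
target = np.array([300.0, 0.1, 6.0])
best_params, best_fit = badger_optimize(
    lambda p: float(np.sum(((p - target) / target) ** 2)),
    lower=[50.0, 0.01, 2.0], upper=[500.0, 0.3, 10.0])
```

The returned vector plays the role of (target iteration count, target learning rate, target tree depth) in step S104; in a real run, each fitness evaluation trains and scores one XGBoost model, which is where the bulk of the computation goes.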
Step S104, constructing a target XGBoost model based on the target iteration times, the target learning rate, the depth of the target tree and the trained XGBoost model.
Substituting the optimal parameters obtained after iteration, namely the target iteration times, the target learning rate and the target tree depth, into the XGBoost model yields the target XGBoost model.
According to the construction method of the target XGBoost model, the XGBoost model is first trained on the training data set to obtain a trained XGBoost model, and the iteration times, the learning rate and the tree depth of the XGBoost model are then optimized by the badger optimization algorithm. Because the badger optimization algorithm has strong global search capability and performs well in accuracy, complexity and stability, and because only a few parameters (namely the iteration times, the learning rate and the tree depth) need to be optimized, the calculation cost can be reduced while the accuracy of the target XGBoost model is guaranteed.
As shown in fig. 2, in an alternative embodiment, the step S103 includes:
In step S1031, an upper boundary, a lower boundary, a population number, a random number, and a dimension number of the attribute parameter are acquired.
The attribute parameters characterize the range of values of the model being solved. The upper boundary represents the maximum possible value of an attribute parameter, and the lower boundary represents its minimum possible value. These boundaries limit the scope of the search space and help the algorithm find the optimal solution within a limited range.
The population number is the size of the badger population in the badger optimization algorithm. The population number may be 20, 30, etc., and is not particularly limited herein. The population number determines the breadth of the algorithm search solution space, and more population numbers may improve the global search capability of the algorithm, but increase the computational complexity and time cost.
In the badger optimization algorithm, the random number is used to generate a new solution. Through the random numbers, the algorithm can explore different areas in the solution space, and the possibility of finding the globally optimal solution is increased.
The number of dimensions characterizes the number of dimensions or features of the solution of the problem. The number of dimensions determines the complexity of the problem and the size of the solution space. High dimensionality may increase the complexity of the solution space, making searching and optimization more difficult.
Step S1032, determining the position of each badger individual based on the upper boundary, the lower boundary, the population number, the random number and the dimension number of the attribute parameters; wherein the position of a badger individual includes: the iteration times, the learning rate, and the tree depth.
Determining the position of each individual badger based on the upper boundary, the lower boundary, the population number, the random number and the dimension number of the attribute parameters, comprising:
P i,j = lb j + rand·(ub j − lb j), i = 1, 2, …, N; j = 1, 2, …, m; wherein P i,j represents the position of the i-th individual in the j-th dimension; rand is a random number in the range [0, 1]; ub j is the upper boundary of the solved model range; lb j is the lower boundary of the solved model range; N is the population number; and m is the dimension number.
Assuming that there are N badger individuals in the m-dimensional space, the position P of the badger population can be expressed as P = (P 1, P 2, …, P N) T, where P i = (P i,1, P i,2, …, P i,m) represents the position of the i-th badger individual. Accordingly, the objective function vector of the badger population can be expressed as F = (F 1, F 2, …, F N) T, where F N represents the objective function value of the N-th badger individual.
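The initialization formula P i,j = lb j + rand·(ub j − lb j) can be sketched directly; the bounds used in the test are illustrative assumptions:

```python
import random

def init_population(lb, ub, n, seed=None):
    """Initialize N badger positions via P_ij = lb_j + rand * (ub_j - lb_j).

    lb, ub: per-dimension lower/upper boundaries of the search range.
    n: population number. Returns an n x m list of positions.
    """
    rng = random.Random(seed)
    m = len(lb)
    return [[lb[j] + rng.random() * (ub[j] - lb[j]) for j in range(m)]
            for _ in range(n)]
```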
Step S1033, determining the objective function value of the badger population based on the positions of the badger individuals.
Step S1034, determining a fitness function based on the mean square error of the XGBoost model: F = (1/n) Σ i=1..n (y i − ŷ i)²; where n is the number of samples, y i is the actual value, and ŷ i is the predicted value of the XGBoost model. The smaller the fitness value, the better the predictive effect of the model.
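A minimal sketch of this mean-square-error fitness, assuming plain Python lists of actual and predicted values:

```python
def mse_fitness(y_true, y_pred):
    """Mean squared error fitness: F = (1/n) * sum((y_i - yhat_i)^2).
    Smaller is better."""
    n = len(y_true)
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n
```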
Step S1035, ranking the obtained fitness values, determining the best search position of the current population, and setting it as the current global best position.
The solutions in the current population are ordered according to the fitness value of each solution (i.e., of each badger individual). Since the fitness is a mean square error, solutions with lower fitness values perform better and are closer to the optimal solution. Determining the best search position: the solution with the best fitness value is selected from the sorted population as the best search position of the current population; this best solution represents the best model configuration or parameter combination in the current population. Setting the global optimal position: the best solution of the current population is compared with the global historical best solution, and if the fitness value of the current best solution is better than that of the global historical best solution, the global historical best solution is updated to the current best solution, which is taken as the current global best position.
Step S1036, iteratively updating the positions of the badger individuals in the population until the iteration count reaches the preset number, and outputting the target iteration times, the target learning rate and the target tree depth.
The preset number is the maximum number of optimization iterations, and may be 100, 200, etc.; it may be set by the skilled person and is not particularly limited herein. Specifically, the positions of the badger individuals in the population are iteratively updated until the iteration count reaches the preset number, at which point the optimal position comprises: the target iteration times, the target learning rate, and the target tree depth.
According to the method for constructing the target XGBoost model, the optimal solution is quickly found in a larger parameter space through the badger optimization algorithm, improving the global search efficiency; moreover, the parameter range can be automatically adjusted during the iteration process, making the search finer.
In an alternative embodiment, step S1036 includes:
In step a1, in the process of optimizing the iteration times, learning rate and tree depth of the XGBoost model by utilizing the badger optimization algorithm, if the random number is larger than a preset value, the new positions of the badger individuals in the population are determined based on the position of the carrion.
Position of the carrion: A i = X m, i = 1, 2, …, N, m ∈ {1, 2, …, N}, m ≠ i; wherein A i is the carrion position for the i-th badger, i.e., the position of the m-th badger, and m is a random natural number from 1 to N different from i.
In each iteration, a random number is generated; it may be a floating point number between 0 and 1, or its range may be set as required by the specific problem. The generated random number is compared with a preset value, a number between 0 and 1 (e.g., 0.5) that controls the probability of this branch of the position update. If the random number is larger than the preset value, the new position of a badger in the population is determined based on the position of the carrion. The "position of the carrion" here serves as heuristic information indicating a better solution or advantageous region near the current position of the badger. The new position is determined from the carrion position according to the corresponding update rule (such as linear interpolation or random selection); it may be a random point near the current position or a favorable position selected based on the heuristic information.
Step a2, determining the fitness value corresponding to the new position, comparing it with the fitness value corresponding to the original position, and judging whether to update the position of the badger population.
For the new position determined based on the position of the carrion, the corresponding fitness value is calculated according to the definition of the fitness function. The fitness value is used to evaluate the quality of the solution and is typically related to the predictive performance of the model. The fitness value of the new position is compared with that of the original position, for example by calculating their difference. According to the comparison result, it is judged whether to update the position of the badger: if the fitness value of the new position is better than that of the original position (i.e., the new position performs better), the position of the badger is updated to the new position; otherwise, the original position is kept unchanged.
Step a3, if the fitness value corresponding to the new position is smaller than the fitness value corresponding to the original position, updating the position of that badger individual; the positions of the badger individuals in the population are updated iteratively in this way until the iteration count reaches the preset number, and the target iteration times, the target learning rate and the target tree depth are output.
Wherein P i,j new is the new position of the i-th badger in the j-th dimension, r is a random number in the range [0, 1], C is a random number taking the value 1 or 2, F i new is the fitness value of the new position, and F i is the fitness value of the original position.
The method for constructing the target XGBoost model provided in this embodiment can flexibly adjust the search strategy according to different conditions (here, the size of the random number). When the random number is greater than the preset value, the algorithm focuses more on searching based on the position of the carrion, which in some cases helps it find solutions more effectively; this strategy balances global search and local search through the carrion position information, and the carrion-based search helps the algorithm search more finely in a local area. In addition, by comparing the fitness values of the old and new positions and updating the positions of the badger population accordingly, the strategy helps the algorithm avoid sinking into a locally optimal solution, thereby increasing the likelihood of finding the globally optimal solution.
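Steps a1 to a3 can be sketched as one population sweep. Since the patent's exact carrion-based update formula is not reproduced above, the "move toward a randomly chosen other badger" rule below is an illustrative assumption; the greedy acceptance test (keep the new position only if its fitness is smaller) follows step a3:

```python
import random

def carrion_update(positions, fitness, evaluate, rng=random):
    """One carrion-phase sweep over the population (steps a1-a3).

    positions: list of m-dimensional positions; fitness: their current
    fitness values; evaluate: fitness function (smaller is better).
    """
    n = len(positions)
    for i in range(n):
        m = rng.randrange(n - 1)
        if m >= i:                      # random index m != i (the carrion)
            m += 1
        C = rng.choice((1, 2))          # random number 1 or 2, as in the text
        r = rng.random()                # random number in [0, 1]
        # assumed update rule: step toward the carrion position
        new = [positions[i][j] + C * r * (positions[m][j] - positions[i][j])
               for j in range(len(positions[i]))]
        f_new = evaluate(new)
        if f_new < fitness[i]:          # step a3: greedy acceptance
            positions[i], fitness[i] = new, f_new
    return positions, fitness
```

Because positions are only replaced when the fitness improves, the best fitness in the population never worsens between sweeps.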
In an alternative embodiment, step S1036 includes:
Step b1, in the process of optimizing the iteration times, the learning rate and the tree depth of the XGBoost model by utilizing the badger optimization algorithm, if the random number is smaller than the preset value, determining the new positions of the badger individuals in the population based on the position of the prey in the first stage of the hunting behavior.
In each iteration, a random number is generated; it may be a floating point number between 0 and 1, or its range may be set as required by the specific problem. The generated random number is compared with the preset value, a number between 0 and 1 that controls the probability of this branch of the position update. If the random number is smaller than the preset value, the new position of a badger in the population is determined based on the position of the prey. The "position of the prey" here serves as heuristic information indicating a better solution or advantageous region near the current position of the badger. First stage of hunting behavior: determining the new position based on the prey position can be regarded as simulating hunting behavior, for example the searching, tracking and approaching of prey by a badger. This may be accomplished by defining a hunting field or heuristic rule that brings the new position closer to the position of the prey.
Specifically, the step b1 includes:
Wherein P i,j new is the new position of the i-th badger in the j-th dimension, r is a random number in the range [0, 1], C is a random number taking the value 1 or 2, F i new is the fitness value of the new position, F i is the fitness value of the original position, and b i,j is the position of the prey.
And b2, determining an fitness value corresponding to the new position, comparing the fitness value corresponding to the new position with the fitness value corresponding to the original position, and judging whether to update the position of the badger population.
And b3, if the fitness value corresponding to the new position is smaller than the fitness value corresponding to the original position, updating the position of the badger population.
For the new position determined based on the hunting behavior, the corresponding fitness value is calculated according to the definition of the fitness function. The fitness value is used to evaluate the quality of the solution and is typically related to the predictive performance of the model. Comparing the fitness values of the new and old positions: the fitness value of the new position is compared with that of the original position, for example by calculating their difference. According to the comparison result, it is judged whether to update the position of the badger: if the fitness value of the new position is better than that of the original position (i.e., the new position performs better), the position of the badger is updated to the new position, since the optimization process seeks ever better solutions.
Step b4, in the second stage of hunting, taking each badger individual as the center, updating the positions of the badger population individuals based on the preset range within which a badger tracks prey.
A tracking range centered on the badger individual is determined according to the preset range within which a badger tracks prey; this range may be a circular, rectangular or other shaped area used to guide the badger toward the prey. The distance between the current badger individual and the prey is calculated, for example by the Euclidean distance or another distance measure. According to the tracking range and this distance, the moving direction and step length of the badger individual are determined; the moving direction may be toward or away from the prey, and the step length represents the magnitude of the movement. The positions of the individuals in the badger population are then updated according to the determined moving direction and step length.
Specifically, the step b4 includes:
Wherein R is the attack radius of the badger population, t is the current iteration number, and t max is the maximum iteration number.
Step b5, after the optimal position of a badger individual is locked, the remaining badger individuals continuously approach the optimal position by means of a spiral search, and the target iteration times, the target learning rate and the target tree depth are output.
In the iterative process, the optimal position in the badger population, i.e., the position of the badger individual with the best fitness value, is locked. According to the optimal position and the preset spiral search parameters, the range and step length of the spiral search performed by the remaining badger individuals are calculated; the range may be a circular or rectangular area, and the step length represents the distance of each movement. Each remaining badger individual then continuously approaches the optimal position by spiral search: in each iteration its position is updated according to the step length and direction, gradually approximating the optimal position. These steps are repeated until the preset iteration count is reached or another termination condition is met; in each iteration, the performance of the new position is evaluated by its fitness value and the position of the badger is updated. After the iteration ends, the found target iteration times, target learning rate and target tree depth are output; these parameters are the best model parameter combination determined during the optimization process.
Specifically, the step b5 includes:
The position updating formula of the improved badger optimization algorithm is as follows:
Di,j=|2r·Pbest,j-Pi,j(t)|
Wherein p is a uniformly distributed random number in the range [−1, 1], t is the current iteration number, t max is the maximum iteration number, P best,j is the optimal position, D i,j is the distance from the i-th badger to the prey (the best solution obtained so far), and e is a natural constant. η * is an adaptive weight factor: in the initial iteration stage, the value of η * is large, which reduces the possibility that the badger optimization algorithm falls into a local extremum; in the later iteration stage, the value of η * decreases adaptively, which improves the search precision of the badger optimization algorithm.
The adaptive weight factor effectively balances the search capability of the algorithm at different stages. Updating the badger positions with the adaptive weight factor allows the search path to be dynamically adjusted according to the iteration count, improving the diversity of search paths and helping to obtain the globally optimal solution. In the initial stage of iteration, when the iteration count is small and the target position is still fuzzy, the value of η * is large, enabling more efficient position updates to lock the optimal position of a badger individual; this improves the update efficiency of the search route in the original badger algorithm, lets the badger find the target solution more easily and more quickly in the early iterations, and reduces the possibility of the badger optimization algorithm falling into a local extremum. In the later stage of iteration, as the iteration count increases and the target gradually becomes clear, the value of η * decreases adaptively, so the locked optimal position becomes more attractive to the badger individuals; the badger can then find the target solution more accurately in the middle and later iterations, improving the search precision of the badger optimization algorithm. Adding the adaptive weight factor η * thus improves the global search capability, promotes a stable transition between the global and local search of the algorithm, and improves its optimization performance and stability.
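A sketch of one adaptive spiral move, using the distance formula D i,j = |2r·P best,j − P i,j(t)| given above; the linear decay schedule for η * and the exponential-cosine spiral form are illustrative assumptions, since only their qualitative behavior is stated:

```python
import math
import random

def spiral_step(p_i, p_best, t, t_max, rng=random):
    """One spiral-search move of badger position p_i toward the locked
    best position p_best. The decay eta* = 1 - t/t_max and the spiral
    shape are assumed; the patent only states eta* shrinks adaptively."""
    eta = 1.0 - t / t_max                  # assumed adaptive weight factor
    p = rng.uniform(-1.0, 1.0)             # uniform random number in [-1, 1]
    new = []
    for j in range(len(p_i)):
        r = rng.random()
        d = abs(2 * r * p_best[j] - p_i[j])    # D_ij = |2r*P_best,j - P_ij|
        # spiral approach: exponential-cosine term scaled by eta*
        new.append(p_best[j] + eta * d * math.exp(p) * math.cos(2 * math.pi * p))
    return new
```

Note that at t = t_max the weight η * vanishes, so the move collapses exactly onto the locked best position, matching the described late-stage behavior.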
According to the method for constructing the target XGBoost model, different update strategies are adopted at different stages, so that the search mode can be dynamically adjusted as the badger optimization algorithm proceeds. In the first stage, new positions are determined based on the position of the prey, which facilitates broad exploration of the solution space. In the second stage, positions are updated with each badger as the center, and the preset tracking range helps the algorithm search more finely within a local area. The broad search of the first stage and the fine local search of the second stage balance the algorithm between global and local scope, helping it find a better solution. In addition, continuously approaching the optimal position by spiral search ensures a comprehensive and fine search within the local range, further improving the quality of the solution. Finally, adopting different update strategies at different stages reduces the risk of sinking into a locally optimal solution and increases the possibility of finding the globally optimal solution.
In an alternative embodiment, the method further comprises:
step c1, acquiring a test data set.
The test data set may characterize historical event levels, event types, event causes, information on the stage of occurrence, and the like, and is not particularly limited herein.
And c2, inputting the test data set into a target XGBoost model to obtain a test result.
After the computer device obtains the test data set, it inputs the test data set to the target XGBoost model to test the model, thereby obtaining a test result.
And step c3, detecting whether the test result meets the preset condition.
And c4, if the test result meets the preset condition, judging that the target XGBoost model is qualified.
And c5, if the test result does not meet the preset condition, judging that the target XGBoost model is not qualified.
The computer device detects whether the test result meets a preset condition. The preset condition characterizes the required accuracy of the test result; for example, it may use R2, RMSE, MAE, etc. as model evaluation indices. When the test result meets the preset condition, the target XGBoost model is judged qualified; when the test result does not meet the preset condition, the target XGBoost model is judged not qualified.
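A sketch of the qualification check in steps c3 to c5, computing the R2, RMSE and MAE indices named above; the threshold values are illustrative assumptions:

```python
import math

def evaluate_model(y_true, y_pred, r2_min=0.8, rmse_max=1.0, mae_max=1.0):
    """Compute R2, RMSE and MAE and check them against preset thresholds.
    The thresholds (r2_min, rmse_max, mae_max) are assumed defaults."""
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    r2 = 1 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(a - b) for a, b in zip(y_true, y_pred)) / n
    qualified = r2 >= r2_min and rmse <= rmse_max and mae <= mae_max
    return {"r2": r2, "rmse": rmse, "mae": mae, "qualified": qualified}
```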
According to the construction method of the target XGBoost model, the target XGBoost model is tested through the test data set, and the target XGBoost model can be further verified, so that the accuracy of the target XGBoost model is guaranteed.
In an alternative embodiment, the means for obtaining the test data set and the training data set comprises:
Step d1, acquiring aircraft data; the aircraft data comprises information of event level, event type, event reason and event stage.
And d2, preprocessing the aircraft data to obtain an original data set.
Step d3, dividing the original data set into a test data set and a training data set according to a preset proportion; wherein the number of training data sets is greater than the number of test data sets.
The aircraft data may be used to characterize aviation data that affects risk factor variables. The aircraft data comprises information of event level, event type, event reason and event stage.
The monthly average risk value is calculated from the aviation data, and the collected data samples are classified to obtain training samples and test samples. Risk is a combination of the probability of an adverse event occurring and the consequences of that event. Aircraft safety risk is mainly related to factors such as the unsafe event level, event type, event cause and event stage, so a risk prediction index system is constructed, as shown in fig. 3.
The civil aviation operation risk can be obtained from unsafe events occurring within a certain historical period of the aircraft, and the four main factors influencing the unsafe event risk are analyzed. On the basis of reasonably weighting these four factors, the monthly average risk of the aircraft is calculated and used as a monitoring index of civil aviation operation risk to measure and predict the risk condition. The monthly average risk calculation formula is:
Monthly average risk = (accident number × accident level weight + accident type number × accident type weight + accident cause number × accident cause weight + accident stage number × accident stage weight + symptom number × symptom level weight + symptom type number × symptom type weight + symptom cause number × symptom cause weight + symptom stage number × symptom stage weight + general event number × event level weight + event type number × event type weight + event cause number × event cause weight + event stage number × event stage weight) / monthly flight time.
The method for determining the level weights of accidents, symptoms and general events is as follows. Heinrich's law indicates that behind every accident there are 29 symptoms (severe errors) and 300 accident precursors. The weights of accidents, symptoms and general events are therefore determined by the reciprocal method of Heinrich's law: accidents 1/1; symptoms 1/29; general events 1/300. The weights of the type, the cause and the event stage are determined by the Delphi method.
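The reciprocal weights and the monthly-average-risk quotient can be sketched as follows; the indicator names used in the test are illustrative:

```python
# Reciprocal weights from the 1 : 29 : 300 ratio of Heinrich's law,
# as described above.
LEVEL_WEIGHTS = {"accident": 1 / 1, "symptom": 1 / 29, "event": 1 / 300}

def monthly_average_risk(counts, weights, flight_hours):
    """Weighted sum of indicator counts divided by monthly flight time.

    counts/weights map each indicator (level, type, cause, stage for
    accidents, symptoms and general events) to its count and weight.
    """
    total = sum(counts[k] * weights[k] for k in counts)
    return total / flight_hours
```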
Specifically, the preprocessing of the aircraft data in the step d2 to obtain an original data set includes:
The aircraft data are normalized using a normalization function to obtain the original data set.
Preprocessing the collected data: the preprocessing comprises normalization and the removal of noise and abnormal data. The normalization adopts maximum-minimum normalization, with the formula x' = (x − x min) / (x max − x min), where x min and x max are the minimum and maximum values of the data.
The XGBoost prediction model is trained by taking the preprocessed monthly average risk data as the model input and the future monthly average risk value as the model output.
The 60 groups of data from the first 60 months are used as training samples, and the 12 groups of data from the last 12 months are used as test samples, which are input to XGBoost.
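The maximum-minimum normalization and the 60/12 month split described above can be sketched as:

```python
def min_max_normalize(data):
    """Maximum-minimum normalization: x' = (x - x_min) / (x_max - x_min)."""
    lo, hi = min(data), max(data)
    return [(x - lo) / (hi - lo) for x in data]

def split_train_test(samples, n_train=60, n_test=12):
    """First 60 monthly records as training samples, last 12 as test
    samples, as described above."""
    return samples[:n_train], samples[-n_test:]
```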
According to the method for constructing the target XGBoost model, a complex nonlinear mapping relation is established between aviation operation data and the event level, event type, event cause and event stage, and the structure and parameters of XGBoost are optimized by the badger algorithm, further improving the prediction accuracy of XGBoost. Risk conditions in future periods can thus be predicted in advance, preventive measures formulated in a targeted manner, and the safety management level improved.
Referring to fig. 4, in an alternative implementation manner, the embodiment provides a method for constructing a target XGBoost model, where the method includes:
1. Inputting data;
2. carrying out normalization processing on input data, and dividing a training set and a testing set;
3. Constructing XGBoost a model, and initializing badger population parameters;
4. Calculating fitness values and sequencing;
5. Randomly setting Pr (random number);
6. If Pr > 0.5, calculating a new position of the badger population according to Ci and updating (namely, the steps a 1-a 3);
7. If not, calculating a new position of the badger population according to Pi and updating; updating R, then calculating a new position of the badger population and updating; optimizing with the adaptive spiral search strategy (i.e., steps b 1-b 5 above);
8. judging whether a termination condition is met, if so, outputting the optimal position of the badger population;
9. If not, i=i+1 (i.e., iteratively executing the above 4 to 8);
10. obtaining XGBOOST optimal parameters;
11. framework XGBOOST model (i.e., target XGBOOST model described above);
12. outputting a prediction result;
13. And (5) ending.
The embodiment also provides a device for constructing the object XGBoost model, which is used for implementing the foregoing embodiments and preferred embodiments, and is not described in detail. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. While the means described in the following embodiments are preferably implemented in software, implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
The present embodiment provides a device for constructing a model of a target XGBoost, as shown in fig. 5, including:
an acquisition construction module 501, configured to acquire a training data set and construct XGBoost models;
The model training module 502 is configured to perform model training on the XGBoost model by using the training data set to obtain a trained XGBoost model;
The optimizing module 503 is configured to optimize the iteration number, the learning rate and the tree depth of the XGBoost model by using a badger optimization algorithm, and determine a target iteration number corresponding to the iteration number, a target learning rate corresponding to the learning rate and a target tree depth corresponding to the tree depth;
A construction module 504, configured to construct a target XGBoost model based on the target iteration number, the target learning rate, the depth of the target tree, and the trained XGBoost model.
In some alternative embodiments, the optimization module 503 includes: the first acquisition unit is used for acquiring the upper boundary, the lower boundary, the population number, the random number and the dimension number of the attribute parameters; the first determining unit is used for determining the position of each badger individual based on the upper boundary, the lower boundary, the population number, the random number and the dimension number of the attribute parameters, wherein the position of a badger individual includes: the iteration times, the learning rate and the tree depth; the second determining unit is used for determining the objective function value of the badger population based on the positions of the badger individuals; the third determining unit is used for determining a fitness function based on the mean square error of the XGBoost model; and the fourth determining unit is used for ranking the obtained fitness values, determining the best search position of the current population, and setting the best search position as the current global best position.
In some alternative embodiments, the fourth determining unit comprises: the first determining subunit is configured to determine the new positions of the badger individuals in the population based on the position of the carrion if the random number is greater than the preset value in the process of optimizing the iteration times, the learning rate and the tree depth of the XGBoost model by using the badger optimization algorithm; the second determining subunit is used for determining the fitness value corresponding to the new position, comparing it with the fitness value corresponding to the original position, and judging whether to update the position of the badger population; and the first updating subunit is used for updating the positions of the badger individuals if the fitness value corresponding to the new position is smaller than the fitness value corresponding to the original position, iteratively updating the positions of the badger individuals in the population until the iteration count reaches the preset number, and outputting the target iteration times, the target learning rate and the target tree depth.
In some alternative embodiments, the fourth determining unit comprises: the third determining subunit is used for determining the new positions of the badger individuals in the population based on the position of the prey in the first stage of hunting behavior if the random number is smaller than the preset value in the process of optimizing the iteration times, the learning rate and the tree depth of the XGBoost model by utilizing the badger optimization algorithm; the fourth determining subunit is configured to determine the fitness value corresponding to the new position, compare it with the fitness value corresponding to the original position, and determine whether to update the position of the badger population; the second updating subunit is used for updating the position of the badger population if the fitness value corresponding to the new position is smaller than the fitness value corresponding to the original position; the third updating subunit is configured to update, in the second stage of hunting, the positions of the individuals in the badger population with each badger individual as the center, based on the preset range within which a badger tracks prey; and the spiral searching subunit is used for making the remaining badger individuals continuously approach the optimal position by spiral search after the optimal position of a badger individual is locked, and outputting the target iteration times, the target learning rate and the target tree depth.
In an alternative embodiment, the apparatus further comprises: a test data set acquisition module, used for acquiring a test data set; an input module, used for inputting the test data set into the target XGBoost model to obtain a test result; a detection module, used for detecting whether the test result meets a preset condition; a first judging module, used for judging that the target XGBoost model is qualified if the test result meets the preset condition; and a second judging module, used for judging that the target XGBoost model is unqualified if the test result does not meet the preset condition.
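A minimal sketch of this test-and-judge flow follows. The patent leaves the preset condition unspecified, so an assumed accuracy threshold and a hypothetical stand-in for the target XGBoost model's prediction function are used:

```python
def is_qualified(predict, test_set, threshold=0.9):
    """Judge the model qualified if its accuracy on the test set meets
    `threshold`; the 0.9 cutoff is an assumed stand-in for the patent's
    unspecified preset condition."""
    correct = sum(1 for features, label in test_set if predict(features) == label)
    return correct / len(test_set) >= threshold

# Hypothetical stand-in for the trained target XGBoost model's predict().
predict = lambda x: 1 if x >= 0 else 0
samples = [(-2, 0), (-1, 0), (1, 1), (2, 1), (3, 1)]
print(is_qualified(predict, samples))  # all 5 correct -> True
```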
In an alternative embodiment, the apparatus further comprises: an aircraft data acquisition module, used for acquiring aircraft data, where the aircraft data comprise information on the event level, event type, event cause and event phase; a preprocessing module, used for preprocessing the aircraft data to obtain an original data set; and a dividing module, used for dividing the original data set into a test data set and a training data set according to a preset proportion, where the number of samples in the training data set is greater than the number of samples in the test data set.
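The dividing step can be sketched as a shuffled split; the 4:1 ratio and the fixed seed are assumed values for the patent's unspecified "preset proportion":

```python
import random

def split_dataset(records, train_ratio=0.8, seed=42):
    """Shuffle and split so the training share is the larger one, matching
    the requirement that the training set outnumbers the test set."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

train, test = split_dataset(range(100))
print(len(train), len(test))  # 80 20
```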
In an alternative embodiment, the preprocessing module is configured to normalize the aircraft data by using a normalization function to obtain the original data set.
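Since the normalization function is not named in the patent, per-column min-max scaling is one plausible reading; the sketch below assumes that choice:

```python
def min_max_normalize(column):
    """Scale one numeric column to [0, 1]; an assumed instance of the
    patent's unspecified normalization function."""
    lo, hi = min(column), max(column)
    if hi == lo:                      # constant column: map everything to 0.0
        return [0.0] * len(column)
    return [(v - lo) / (hi - lo) for v in column]

print(min_max_normalize([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
```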
Further functional descriptions of the above respective modules and units are the same as those of the above corresponding embodiments, and are not repeated here.
The apparatus for constructing the target XGBoost model in this embodiment is presented in the form of functional units, where a functional unit refers to an ASIC (Application Specific Integrated Circuit), a processor and memory that execute one or more software or firmware programs, and/or other devices that can provide the above functions.
The embodiment of the invention also provides a computer device, which is provided with the apparatus for constructing the target XGBoost model shown in fig. 5.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a computer device according to an alternative embodiment of the present invention. As shown in fig. 6, the computer device includes: one or more processors 10, a memory 20, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are communicatively coupled to each other using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the computer device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In some alternative embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple computer devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 10 is illustrated in fig. 6.
The processor 10 may be a central processing unit, a network processor, or a combination thereof. The processor 10 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit, a programmable logic device, or a combination thereof. The programmable logic device may be a complex programmable logic device, a field-programmable gate array, generic array logic, or any combination thereof.
Wherein the memory 20 stores instructions executable by the at least one processor 10 to cause the at least one processor 10 to perform a method for implementing the embodiments described above.
The memory 20 may include a storage program area that may store an operating system, at least one application program required for functions, and a storage data area; the storage data area may store data created according to the use of the computer device, etc. In addition, the memory 20 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some alternative embodiments, memory 20 may optionally include memory located remotely from processor 10, which may be connected to the computer device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Memory 20 may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as flash memory, hard disk, or solid state disk; the memory 20 may also comprise a combination of the above types of memories.
The computer device also includes a communication interface 30 for the computer device to communicate with other devices or communication networks.
The embodiments of the present invention also provide a computer-readable storage medium. The method according to the embodiments described above may be implemented in hardware or firmware, or as computer code recorded on a storage medium, or as computer code downloaded over a network from a remote storage medium or a non-transitory machine-readable storage medium and stored in a local storage medium, so that the method described herein may be processed by software stored on a storage medium using a general-purpose computer, a special-purpose processor, or programmable or special-purpose hardware. The storage medium may be a magnetic disk, an optical disk, a read-only memory, a random access memory, a flash memory, a hard disk, a solid state disk, or the like; further, the storage medium may also comprise a combination of the above kinds of memories. It will be appreciated that a computer, processor, microcontroller or programmable hardware includes a storage element that can store or receive software or computer code that, when accessed and executed by the computer, processor or hardware, implements the methods of the above embodiments.
Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope of the invention as defined by the appended claims.
Claims (10)
1. A method for constructing a target XGBoost model, comprising:
acquiring a training data set and constructing an XGBoost model;
Performing model training on the XGBoost model by using the training data set to obtain a trained XGBoost model;
optimizing the iteration times, the learning rate and the tree depth of the XGBoost model by using a badger optimization algorithm, and determining the target iteration times corresponding to the iteration times, the target learning rate corresponding to the learning rate and the target tree depth corresponding to the tree depth;
and constructing a target XGBoost model based on the target iteration times, the target learning rate, the depth of the target tree and the trained XGBoost model.
2. The method for constructing the target XGBoost model according to claim 1, wherein optimizing the iteration times, the learning rate and the tree depth of the XGBoost model by using a badger optimization algorithm, and determining the target iteration times corresponding to the iteration times, the target learning rate corresponding to the learning rate and the target tree depth corresponding to the tree depth, comprises:
Acquiring an upper boundary, a lower boundary, population quantity, random number and dimension quantity of attribute parameters;
determining the position of each badger individual based on the upper boundary, the lower boundary, the population number, the random number and the dimension number of the attribute parameters; wherein, the position of the badger individual includes: iteration times, learning rate and tree depth;
determining an objective function value of the badger population based on the positions of the badger individuals;
determining a fitness function based on the mean square error of the XGBoost model;
sorting the obtained fitness values, determining the optimal search position of the current population, and setting it as the current global optimal position;
and iteratively updating the positions of the badgers in each population until the iteration times meet the preset times, and outputting the target iteration times, the target learning rate and the target tree depth.
3. The method for constructing the target XGBoost model according to claim 2, wherein iteratively updating the positions of the badgers in the respective populations until the number of iterations satisfies a preset number of iterations, and outputting the target number of iterations, the target learning rate, and the depth of the target tree, includes:
in the process of optimizing the iteration times, the learning rate and the tree depth of the XGBoost model by utilizing the badger optimization algorithm, if the random number is larger than a preset value, determining the new position of the badgers in each population based on the position of the slough;
determining the fitness value corresponding to the new position, comparing it with the fitness value corresponding to the original position, and judging whether to update the position of the badger population;
and if the fitness value corresponding to the new position is smaller than the fitness value corresponding to the original position, updating the position of the badger population, iteratively updating the positions of the badgers in each population until the iteration times meet the preset times, and outputting the target iteration times, the target learning rate and the target tree depth.
4. The method for constructing the target XGBoost model according to claim 2, wherein iteratively updating the positions of the badgers in the respective populations until the number of iterations satisfies a preset number of iterations, and outputting the target number of iterations, the target learning rate, and the depth of the target tree, includes:
in the process of optimizing the iteration times, the learning rate and the tree depth of the XGBoost model by using the badger optimization algorithm, if the random number is smaller than the preset value, determining new positions of the badger individuals in each population based on the position of the prey in the first stage of the hunting behavior;
determining the fitness value corresponding to the new position, comparing it with the fitness value corresponding to the original position, and judging whether to update the position of the badger population;
if the fitness value corresponding to the new position is smaller than the fitness value corresponding to the original position, updating the position of the badger population;
in the second stage of hunting, taking the badger individual as the center, and updating the positions of the individuals in the badger population based on a preset range within which the badger pursues its prey;
after the optimal position of the badger individual is locked, the remaining badger individuals continuously approach the optimal position in a spiral search manner, and the target iteration times, the target learning rate and the target tree depth are output.
5. The method for constructing the target XGBoost model according to claim 1, further comprising:
Acquiring a test data set;
inputting the test data set into a target XGBoost model to obtain a test result;
detecting whether the test result meets a preset condition;
If the test result meets the preset condition, judging that the target XGBoost model is qualified;
and if the test result does not meet the preset condition, judging that the target XGBoost model is not qualified.
6. The method for constructing the target XGBoost model according to claim 5, wherein obtaining the test data set and the training data set comprises:
acquiring aircraft data; wherein the aircraft data comprise information on the event level, event type, event cause and event phase;
preprocessing the aircraft data to obtain an original data set;
dividing the original data set into a test data set and a training data set according to a preset proportion; wherein the number of samples in the training data set is greater than the number of samples in the test data set.
7. The method for constructing the target XGBoost model according to claim 6, wherein preprocessing the aircraft data to obtain the original data set comprises:
and normalizing the aircraft data by using a normalization function to obtain the original data set.
8. A device for constructing a target XGBoost model, the device comprising:
the acquisition and construction module is used for acquiring a training data set and constructing XGBoost models;
the model training module is used for carrying out model training on the XGBoost model by utilizing the training data set to obtain a trained XGBoost model;
the optimization module is used for optimizing iteration times, learning rate and tree depth of the XGBoost model by utilizing a badger optimization algorithm, and determining target iteration times corresponding to the iteration times, target learning rate corresponding to the learning rate and target tree depth corresponding to the tree depth;
the building module is configured to build a target XGBoost model based on the target iteration number, the target learning rate, the depth of the target tree, and the trained XGBoost model.
9. A computer device, comprising:
a memory and a processor, the memory and the processor being communicatively connected to each other, the memory having stored therein computer instructions, the processor executing the computer instructions to perform the method for constructing the target XGBoost model according to any one of claims 1 to 7.
10. A computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions for causing a computer to perform the method for constructing the target XGBoost model according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410069738.7A CN118071138A (en) | 2024-01-17 | 2024-01-17 | Construction method and device of object XGBoost model, computer equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118071138A true CN118071138A (en) | 2024-05-24 |
Family
ID=91101395
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410069738.7A Pending CN118071138A (en) | 2024-01-17 | 2024-01-17 | Construction method and device of object XGBoost model, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118071138A (en) |
-
2024
- 2024-01-17 CN CN202410069738.7A patent/CN118071138A/en active Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||