CN115938496A - Quality Estimation Method Based on XGBoost Algorithm - Google Patents

Quality Estimation Method Based on XGBoost Algorithm

Info

Publication number
CN115938496A
Authority
CN
China
Prior art keywords
model
tree
parameters
sample
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211663862.3A
Other languages
Chinese (zh)
Inventor
汪明宇
汪胜利
Current Assignee
Hubei Shenli Auto Parts And Components Share Co ltd
Original Assignee
Hubei Shenli Auto Parts And Components Share Co ltd
Priority date
Filing date
Publication date
Application filed by Hubei Shenli Auto Parts And Components Share Co ltd
Priority to CN202211663862.3A
Publication of CN115938496A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 10/00: Technologies related to metal processing
    • Y02P 10/25: Process efficiency

Abstract

The invention discloses a quality estimation method based on the XGBoost algorithm, belonging to the technical field of metal heat treatment. The method comprises the following steps. S101: run tests to obtain corresponding input parameters and output parameters as modeling data. S102: establish the objective function of the XGBoost model from the modeling data, preset the model parameters, and build the XGBoost quality estimation model. S103: divide the modeling data into a training set and a verification set, then train and verify the XGBoost model to obtain the quality estimation model. S104: substitute the input parameters to be estimated into the trained quality estimation model to obtain predicted values of the output parameters. Aimed at the complexity of the induction quenching process, this XGBoost-based induction quenching quality estimation method uses a data-driven model to estimate quenching quality; the model is accurate and helps shorten the process development cycle.

Description

Quality estimation method based on the XGBoost algorithm
Technical Field
The invention belongs to the technical field of metal heat treatment, and particularly relates to a quality estimation method based on the XGBoost algorithm.
Background
Induction quenching is an important metal heat treatment process with the advantages of high quality, high efficiency, and environmental friendliness. However, induction quenching is a highly nonlinear, complex process in which electromagnetic induction, the skin effect, eddy currents, other electrothermal effects, and phase transformation all interact; physical changes during the process are rapid, the influencing parameters are numerous, and fluctuations are severe, so it is difficult to establish a theoretical model that accurately predicts induction quenching quality. The traditional approach therefore selects a reasonable heat treatment process through large batches of quenching tests so that the product meets the quenching quality standard, but this approach is costly and time-consuming.
Disclosure of Invention
The invention provides an induction quenching quality estimation method based on the XGBoost (eXtreme Gradient Boosting) algorithm to solve the prior-art problem that quenching quality is difficult to estimate because the induction quenching process is complex; the method has the advantage of a short estimation time. The technical scheme is as follows:
An embodiment of the invention provides a quality estimation method based on the XGBoost algorithm, comprising the following steps:
S101: run tests to obtain corresponding input parameters and output parameters as modeling data;
S102: establish the objective function of the XGBoost model from the modeling data, preset the model parameters, and build the XGBoost quality estimation model;
S103: divide the modeling data into a training set and a verification set, and train and verify the XGBoost model to obtain the quality estimation model;
S104: substitute the input parameters to be estimated into the trained quality estimation model to obtain predicted values of the output parameters.
When the quality estimation method is used to estimate the induction quenching quality of an automobile half shaft, the input parameters comprise product material parameters, product dimensions, equipment electrical parameters, quenching inductor parameters, processing parameters, and quenching liquid parameters, and the output parameters comprise the hardened-layer hardness and hardened-layer depth of the rod portion of the half shaft.
The product material parameters include density, thermal conductivity, specific heat capacity, and coefficient of thermal expansion; the product dimensions include the neck diameter, rod diameter, spline diameter, half-shaft length, and the fillet radius where the rod portion joins the disc portion of the half shaft; the equipment electrical parameters include the incoming three-phase voltage, frequency, and transformer ratio; the quenching inductor parameters include the inductor inner diameter, voltage, current, frequency, and power; the processing parameters include the heating time, average heating speed, and half-shaft rotation speed; the quenching liquid parameters include the quenching liquid flow rate, concentration, temperature, and specific heat capacity.
The XGBoost model is:

$$\hat{y}_i = F_K(x_i) = F_{K-1}(x_i) + f_K(x_i)$$

where $\hat{y}_i$ is the model-predicted output parameter for the $i$-th sample $x_i$, $F_K(x_i)$ is the prediction of the first $K$ trees for $x_i$, $F_{K-1}(x_i)$ is the prediction of the first $K-1$ trees for $x_i$, and $f_K(x_i)$ is the $K$-th CART regression tree. The CART regression tree maps the input parameters: the input parameters split the nodes and grow the tree, and each sample finally lands on a leaf node according to the splitting conditions of the tree. A leaf node assigns one prediction score to all samples that land on it, and this score serves as the prediction value $\omega$ of those samples on that tree. Each training round generates a new CART regression tree and adds it to the current model; after a certain number of trees have been trained, the sum of a sample's prediction values over all trees is its final prediction, i.e. the predicted value of the output parameter.
The constructed objective function is:

$$Obj = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega(f_k)$$

with the regularization term

$$\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} \omega_j^2$$

where $\hat{y}_i$ is the model-predicted output parameter of the $i$-th sample, $y_i$ is the actual output parameter of the $i$-th sample, $K$ is the number of regression trees, $n$ is the number of samples, $f_k$ is the $k$-th regression tree model, $\Omega(f)$ is the regularization term, $T$ is the number of leaf nodes of a regression tree, $\omega$ is the leaf node score, $\gamma$ and $\lambda$ are hyper-parameters, and $\frac{1}{2}\lambda\sum_{j=1}^{T}\omega_j^2$ is the L2 regularization of the leaf node scores.
The loss function is:

$$l = \sum_{i=1}^{n} \delta^2\left(\sqrt{1 + \left(\frac{y_i - \hat{y}_i}{\delta}\right)^2} - 1\right)$$

where $\hat{y}_i$ is the model-predicted output parameter of the $i$-th sample, $y_i$ is the actual output parameter of the $i$-th sample, $\delta$ is the parameter of the loss function, and $n$ is the number of samples.
Specifically, the quality estimation model is established as follows:
(1) Establish the XGBoost model

$$\hat{y}_i = F_K(x_i) = F_{K-1}(x_i) + f_K(x_i)$$

(2) Construct the objective function

$$Obj = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega(f_k)$$

where $l(y_i, \hat{y}_i)$ is the loss function used to compute the error between the predicted value $\hat{y}_i$ and the true value $y_i$ of the quality estimation model, and $\Omega(f_k)$ is the regularization term used to control model complexity and suppress overfitting.
The regularization term is:

$$\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} \omega_j^2$$

where $T$ is the number of leaf nodes, $\omega$ is the leaf node score, and $\gamma$ and $\lambda$ are constant coefficients. The first term $\gamma T$ controls the complexity of the tree through the number of leaf nodes and its coefficient; the second term is the L2 regularization term, which controls the prediction scores of the leaf nodes.
(3) Train the model
Apply a second-order Taylor expansion to the objective function:

$$Obj^{(s)} \approx \sum_{i=1}^{n}\left[l\left(y_i, \hat{y}_i^{(s-1)}\right) + g_i f_s(x_i) + \frac{1}{2} h_i f_s^2(x_i)\right] + \Omega(f_s)$$

where $g_i$ is the first-order gradient of the loss function and $h_i$ is the second-order gradient, both of which can be computed in advance before training:

$$g_i = \partial_{\hat{y}^{(s-1)}} l\left(y_i, \hat{y}_i^{(s-1)}\right), \qquad h_i = \partial^2_{\hat{y}^{(s-1)}} l\left(y_i, \hat{y}_i^{(s-1)}\right)$$

The constant term $l\left(y_i, \hat{y}_i^{(s-1)}\right)$ does not influence the optimization result, so it can be removed, giving the simplified objective function:

$$Obj^{(s)} \approx \sum_{i=1}^{n}\left[g_i f_s(x_i) + \frac{1}{2} h_i f_s^2(x_i)\right] + \Omega(f_s)$$

To continue the optimization, the tree model $f_s(x_i)$ and the leaf node prediction score $\omega$ are unified: since $f_s(x_i)$ ultimately makes the sample fall on a leaf node whose prediction score $\omega$ has been computed, when a sample falls on leaf node $j$, $\omega_j$ can replace $f_s(x_i)$, giving the new objective function:

$$Obj^{(s)} = \sum_{j=1}^{T}\left[\left(\sum_{i \in I_j} g_i\right)\omega_j + \frac{1}{2}\left(\sum_{i \in I_j} h_i + \lambda\right)\omega_j^2\right] + \gamma T$$

where $I_j$ is the set of all samples falling on leaf node $j$.
The optimal prediction score of leaf node $j$ can then be calculated:

$$\omega_j^* = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}$$

The optimal weight of a leaf node thus depends on the first- and second-order gradients and the L2 regularization coefficient $\lambda$.
Using the optimal weights, the optimal value $obj^*$ of the objective function is obtained:

$$obj^* = -\frac{1}{2}\sum_{j=1}^{T} \frac{\left(\sum_{i \in I_j} g_i\right)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T$$

Based on $obj^*$, this scoring index is used in each training round to evaluate all candidate CART regression tree models and select the optimal one. An exact greedy algorithm is adopted for the optimization: starting from the root node of a new CART regression tree, whether a node is split is determined by the difference of the objective function values before and after splitting.
Based on $obj^*$, the difference of the objective function values before and after node splitting is:

$$obj_{split} = \frac{1}{2}\left[\frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda}\right] - \gamma$$

where $G_L, H_L$ and $G_R, H_R$ are the sums of the first- and second-order gradients in the left and right child nodes.
During the splitting of a CART regression tree node, part of the estimated input parameters are selected, all sample values under the selected feature are sorted, the $g_i$ and $h_i$ of each sample are computed, and $obj_{split}$ is evaluated at every candidate split point; the input parameter and split point with the largest $obj_{split}$ are taken as the optimal feature and optimal split point of the node, the leaf node is split accordingly, and a new CART regression tree is finally generated. During generation, the tree stops growing when one of the following conditions is met: the CART regression tree reaches the preset maximum depth max_depth, or the sample weight sum is smaller than the preset threshold min_child_weight.
Preferably, the newly generated CART regression tree is pruned. The main steps are: evaluate the tree nodes from bottom to top, judge whether the gain of the current node is smaller than the preset minimum gain, and prune if it is.
After pruning, the new CART regression tree is determined and the newly generated tree model is added to the current model:

$$F_s(x_i) = F_{s-1}(x_i) + \eta f_s(x_i)$$

where $F_{s-1}(x_i)$ is the prediction for sample $x_i$ after $s-1$ training rounds, $\eta$ is the shrinkage coefficient of the newly generated CART regression tree model, and $f_s(x_i)$ is the new tree model trained in the $s$-th round; the value of $\eta$ satisfies $0 \le \eta \le 1$.
Preferably, when CART regression tree nodes are split, per-level random column sampling (colsample_bylevel) is adopted: before each node in the same level of the CART regression tree is split, part of the features are randomly selected, and the optimal split point is determined from those features only. During model training, before each CART regression tree is generated, part of the training samples are randomly selected for training.
Preferably, during model training, one XGBoost model generally predicts only one output parameter; when several output parameters must be predicted at the same time, the MultiOutputRegressor wrapper provided by sklearn is used to predict the wrapped multiple output parameters, finally yielding a single evaluation index.
Specifically, in step S103, 80% of the data is used as the training set and 20% as the verification set for training and verification.
Preferably, in step S103, 5-fold cross-validation is adopted to optimize the hyper-parameters during model training.
Preferably, in step S103, when obtaining the final quality estimation model, at least one of the mean square error (MSE), the mean absolute error (MAE), and the coefficient of determination $R^2$ is used to evaluate the estimation models, and the best model among those whose errors meet the standard is selected as the final quality estimation model.
In summary, compared with the prior art, the technical solutions contemplated by the invention mainly have the following advantages. Aimed at the complexity of the induction quenching process, the XGBoost-based induction quenching quality estimation method uses a data-driven model to estimate quenching quality; the model is accurate and helps shorten the process development cycle. The method considers the many characteristics that influence induction quenching quality, adapts well to induction quenching tests with numerous influencing factors, avoids running large numbers of tests for particular characteristics, and saves test cost. The data from induction quenching tests can be used as a whole, and the coupling relations among different characteristics are fully considered; the XGBoost model does not require data normalization, adapts well to features of different orders of magnitude, and has wide applicability and strong generalization ability.
Drawings
Fig. 1 is a flowchart of a quality estimation method based on an XGBoost algorithm according to an embodiment of the present invention;
fig. 2 is a schematic diagram of an XGBoost model provided in an embodiment of the present invention;
fig. 3 is a schematic diagram of the estimation and selection of the prediction model according to the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings.
Referring to fig. 1, an embodiment of the present invention provides a quality estimation method based on the XGBoost algorithm, comprising the following steps:
S101: Run tests to obtain corresponding input parameters and output parameters as modeling data.
S102: Establish the objective function of the XGBoost model from the modeling data, preset the model parameters, and build the XGBoost quality estimation model.
S103: Divide the modeling data into a training set and a verification set, and train and verify the XGBoost model to obtain the quality estimation model.
S104: Substitute the input parameters to be estimated into the trained quality estimation model to obtain predicted values of the output parameters.
When the quality estimation method is used to estimate the induction quenching quality of an automobile half shaft, the input parameters include product material parameters, product dimensions, equipment electrical parameters, quenching inductor parameters, processing parameters, quenching liquid parameters, and the like, and the output parameters include the hardened-layer hardness, the hardened-layer depth, and the like of the rod portion of the half shaft.
The product material parameters include density, thermal conductivity, specific heat capacity, coefficient of thermal expansion, and the like; the product dimensions include the neck diameter, rod diameter, spline diameter, half-shaft length, the fillet radius where the rod portion joins the disc portion, and the like; the equipment electrical parameters include the incoming three-phase voltage, frequency, transformer ratio, and the like; the quenching inductor parameters include the inductor inner diameter, voltage, current, frequency, power, and the like; the processing parameters include the heating time, average heating speed, half-shaft rotation speed, and the like; the quenching liquid parameters include the quenching liquid flow rate, concentration, temperature, specific heat capacity, and the like. As can be seen, the input parameters for estimating the induction hardening quality of an automobile half shaft are numerous, exceeding 20 types; therefore, in the subsequent processing, the inventors fully considered simplification of the model, the algorithm, and the procedure so as to reduce the evaluation time.
Further, abnormal data produced by failed tests and the like are removed from the collected sample data, ensuring that all data used for training fall within a reasonable range.
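A minimal sketch of this screening step, assuming the samples sit in a NumPy array and that a physically reasonable range per column is known; both the columns and the ranges below are hypothetical:

```python
import numpy as np

# each row is one quenching test; hypothetical columns:
# inductor power (kW), heating time (s), hardened-layer depth (mm)
samples = np.array([
    [120.0,  8.0, 3.2],
    [118.0,  7.5, 3.0],
    [ -5.0,  8.2, 3.1],   # negative power: failed test, should be removed
    [119.0, 60.0, 2.9],   # implausible heating time, should be removed
])

low  = np.array([0.0,   0.0,  0.0])   # hypothetical lower bounds per column
high = np.array([200.0, 30.0, 10.0])  # hypothetical upper bounds per column

# keep only rows where every column lies inside its reasonable range
mask  = np.all((samples >= low) & (samples <= high), axis=1)
clean = samples[mask]
```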
The XGBoost model is:

$$\hat{y}_i = F_K(x_i) = F_{K-1}(x_i) + f_K(x_i)$$

where $\hat{y}_i$ is the model-predicted output parameter for the $i$-th sample $x_i$, $F_K(x_i)$ is the prediction of the first $K$ trees for $x_i$, $F_{K-1}(x_i)$ is the prediction of the first $K-1$ trees for $x_i$, and $f_K(x_i)$ is the $K$-th CART regression tree.
Referring to fig. 2, since induction hardening quality prediction is a regression problem, the CART regression tree maps the input parameters: the input parameters split the nodes and grow the tree, and each sample finally lands on a leaf node according to the splitting conditions of the tree. A leaf node assigns one prediction score to all samples that land on it, and this score serves as the prediction value $\omega$ of those samples on that tree. Each training round generates a new CART regression tree and adds it to the current model; after a certain number of trees have been trained, the sum of a sample's prediction values over all trees is its final prediction, i.e. the predicted value of the output parameter.
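The additive prediction described above, where each tree assigns a sample the score of the leaf it falls into and the scores are summed, can be illustrated with a toy example; the two hand-written "trees" below are stand-ins, not the patent's model:

```python
# each toy "tree" sends a sample to a leaf and returns that leaf's score
def tree_1(x):
    return 2.0 if x[0] > 0.5 else 1.0      # one split on feature 0

def tree_2(x):
    return 0.5 if x[1] > 0.3 else -0.5     # one split on feature 1

trees = [tree_1, tree_2]

def predict(x):
    # F_K(x) = F_{K-1}(x) + f_K(x): the final prediction is the
    # sum of the per-tree leaf scores
    return sum(f(x) for f in trees)

y_hat = predict([0.7, 0.2])   # leaf scores 2.0 and -0.5
```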
The constructed objective function is:

$$Obj = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega(f_k)$$

with the regularization term

$$\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} \omega_j^2$$

where $\hat{y}_i$ is the model-predicted output parameter of the $i$-th sample, $y_i$ is the actual output parameter of the $i$-th sample, $K$ is the number of regression trees, $n$ is the number of samples, $f_k$ is the $k$-th regression tree model, $\Omega(f)$ is the regularization term, $T$ is the number of leaf nodes of a regression tree, $\omega$ is the leaf node score, $\gamma$ and $\lambda$ are hyper-parameters, and $\frac{1}{2}\lambda\sum_{j=1}^{T}\omega_j^2$ is the L2 regularization of the leaf node scores.
The prediction of induction quenching quality is a regression problem, where the loss $l(y_i, \hat{y}_i)$ would generally be the mean absolute error (MAE); but because MAE has a non-differentiable region, the Pseudo-Huber loss, an approximation of MAE, is adopted:

$$l = \sum_{i=1}^{n} \delta^2\left(\sqrt{1 + \left(\frac{y_i - \hat{y}_i}{\delta}\right)^2} - 1\right)$$

where $\hat{y}_i$ is the model-predicted output parameter of the $i$-th sample, $y_i$ is the actual output parameter of the $i$-th sample, $\delta$ is the parameter of the loss function, and $n$ is the number of samples.
Preferably, in step S103, the relevant parameters of the XGBoost model are set: eta is the shrinkage step; gamma is the minimum loss reduction required to further split a leaf node of the tree; max_depth is the maximum depth of the tree; min_child_weight is the minimum of the sum of subtree weights; max_delta_step is the maximum step size of each tree's weight change; max_leaf_nodes is the maximum number of leaf nodes; subsample is the random row-sampling ratio, i.e. the ratio of sub-samples used for tree growth; colsample_bytree is the column-sampling ratio, i.e. the subset ratio of features used to construct each tree; lambda is the L2 regularization term parameter on the weights, controlling model complexity; alpha is the L1 regularization term parameter on the weights, controlling model complexity; scale_pos_weight is the positive-sample weight scale.
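Collected as a parameter dictionary in the spelling the `xgboost` library uses; the values are illustrative assumptions, not values from the patent (note the library spells the leaf-node cap `max_leaves`):

```python
# illustrative XGBoost parameter values; none of these come from the patent
xgb_params = {
    "eta": 0.1,               # shrinkage step (learning rate)
    "gamma": 0.0,             # minimum loss reduction to split a leaf further
    "max_depth": 6,           # maximum depth of a tree
    "min_child_weight": 1.0,  # minimum sum of sample weights in a child node
    "max_delta_step": 0,      # maximum step size of each tree's weight change
    "max_leaves": 31,         # maximum number of leaf nodes
    "subsample": 0.8,         # row-sampling ratio used to grow each tree
    "colsample_bytree": 0.8,  # feature-subset ratio used to build each tree
    "lambda": 1.0,            # L2 regularization on leaf weights
    "alpha": 0.0,             # L1 regularization on leaf weights
    "scale_pos_weight": 1.0,  # positive-sample weight scale (classification)
}
```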
Specifically, the quality estimation model is established as follows:
(1) Establish the XGBoost model

$$\hat{y}_i = F_K(x_i) = F_{K-1}(x_i) + f_K(x_i)$$

(2) Construct the objective function

$$Obj = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega(f_k)$$

where $l(y_i, \hat{y}_i)$ is the loss function used to compute the error between the predicted value $\hat{y}_i$ and the true value $y_i$ of the quality estimation model; this term is a differentiable convex function. $\Omega(f_k)$ is the regularization term used to control model complexity and suppress overfitting.
The regularization term is:

$$\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} \omega_j^2$$

where $T$ is the number of leaf nodes, $\omega$ is the leaf node score, and $\gamma$ and $\lambda$ are constant coefficients. The first term $\gamma T$ controls the complexity of the tree through the number of leaf nodes and its coefficient: the larger this term, the larger the objective function, thereby suppressing model complexity. The second term is the L2 regularization term, which controls the prediction scores of the leaf nodes.
(3) Train the model
Apply a second-order Taylor expansion to the objective function:

$$Obj^{(s)} \approx \sum_{i=1}^{n}\left[l\left(y_i, \hat{y}_i^{(s-1)}\right) + g_i f_s(x_i) + \frac{1}{2} h_i f_s^2(x_i)\right] + \Omega(f_s)$$

where $g_i$ is the first-order gradient of the loss function and $h_i$ is the second-order gradient, both of which can be computed in advance before training:

$$g_i = \partial_{\hat{y}^{(s-1)}} l\left(y_i, \hat{y}_i^{(s-1)}\right), \qquad h_i = \partial^2_{\hat{y}^{(s-1)}} l\left(y_i, \hat{y}_i^{(s-1)}\right)$$

The constant term $l\left(y_i, \hat{y}_i^{(s-1)}\right)$ does not influence the optimization result, so it can be removed, giving the simplified objective function:

$$Obj^{(s)} \approx \sum_{i=1}^{n}\left[g_i f_s(x_i) + \frac{1}{2} h_i f_s^2(x_i)\right] + \Omega(f_s)$$

To continue the optimization, the tree model $f_s(x_i)$ and the leaf node prediction score $\omega$ are unified: since $f_s(x_i)$ ultimately makes the sample fall on a leaf node whose prediction score $\omega$ has been computed, when a sample falls on leaf node $j$, $\omega_j$ can replace $f_s(x_i)$, giving the new objective function:

$$Obj^{(s)} = \sum_{j=1}^{T}\left[\left(\sum_{i \in I_j} g_i\right)\omega_j + \frac{1}{2}\left(\sum_{i \in I_j} h_i + \lambda\right)\omega_j^2\right] + \gamma T$$

where $I_j$ is the set of all samples falling on leaf node $j$.
The optimal prediction score (optimal weight) of leaf node $j$ can then be calculated:

$$\omega_j^* = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}$$

The optimal weight of a leaf node thus depends on the first- and second-order gradients and the L2 regularization coefficient $\lambda$. The L2 regularization reduces the leaf node weights, reducing the influence of any single leaf node on the overall prediction and preventing overfitting.
Using the optimal weights, the optimal value $obj^*$ of the objective function is obtained:

$$obj^* = -\frac{1}{2}\sum_{j=1}^{T} \frac{\left(\sum_{i \in I_j} g_i\right)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T$$
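A numeric sketch of the optimal leaf score $\omega_j^*$ and the optimal objective $obj^*$: given the per-leaf gradient sums $G_j = \sum_{i \in I_j} g_i$ and $H_j = \sum_{i \in I_j} h_i$, each quantity follows in one line (pure Python; the numbers are illustrative):

```python
# per-leaf gradient sums over the samples in each leaf: G_j, H_j
G = [6.0, -4.0]          # two leaves, illustrative values
H = [10.0, 8.0]
lam, gamma = 1.0, 0.5    # L2 coefficient lambda, per-leaf penalty gamma

# optimal prediction score of each leaf: w_j* = -G_j / (H_j + lambda)
w_opt = [-g_j / (h_j + lam) for g_j, h_j in zip(G, H)]

# optimal objective: obj* = -1/2 * sum_j G_j^2 / (H_j + lambda) + gamma * T
obj_opt = (-0.5 * sum(g_j * g_j / (h_j + lam) for g_j, h_j in zip(G, H))
           + gamma * len(G))
```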
Based on $obj^*$, this scoring index is used in each training round to evaluate all candidate CART regression tree models so that the optimal model can be selected. Because induction quenching estimation has many input parameters and the computation is heavy, an exact greedy algorithm is adopted to reduce the computation and avoid enumerating all possible tree structures: for a new CART regression tree, starting from the root node, whether a node is split is determined by the difference of the objective function values before and after splitting.
Based on $obj^*$, the difference of the objective function values before and after node splitting is:

$$obj_{split} = \frac{1}{2}\left[\frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda}\right] - \gamma$$

where $G_L, H_L$ and $G_R, H_R$ are the sums of the first- and second-order gradients in the left and right child nodes.
During the splitting of a CART regression tree node, part of the estimated input parameters are selected, all sample values (candidate split points) under the selected feature are sorted, the $g_i$ and $h_i$ of each sample are computed, and $obj_{split}$ is evaluated at every split point; the input parameter and split point with the largest $obj_{split}$ are taken as the optimal feature and optimal split point of the node, the leaf node is split accordingly, and a new CART regression tree is finally generated. During generation, the tree stops growing when one of the following conditions is met: the CART regression tree reaches the preset maximum depth max_depth, or the sample weight sum is smaller than the preset threshold min_child_weight.
Preferably, to avoid the overfitting caused by excessive splitting of the CART regression tree and to improve the generalization ability of the model, the newly generated CART regression tree is pruned. The main steps are: evaluate the tree nodes from bottom to top, judge whether the gain of the current node is smaller than the preset minimum gain, and prune if it is.
After pruning, the new CART regression tree is determined and the newly generated tree model is added to the current model:

$$F_s(x_i) = F_{s-1}(x_i) + \eta f_s(x_i)$$

where $F_{s-1}(x_i)$ is the prediction for sample $x_i$ after $s-1$ training rounds, $\eta$ is the shrinkage coefficient of the newly generated CART regression tree model, and $f_s(x_i)$ is the new tree model trained in the $s$-th round. $\eta$ can also be regarded as the learning rate; its value satisfies $0 \le \eta \le 1$, which limits the influence of each tree on the overall model and avoids overfitting.
Preferably, because induction quenching quality estimation involves many features and a large amount of computation, the number of features must be compressed to speed up training; meanwhile, if different CART regression trees use the same features, the correlation between the tree models becomes too strong. Per-level random column sampling (colsample_bylevel) is adopted to solve these problems: before each node in the same level of the CART regression tree is split, part of the features are randomly selected, and the optimal split point is determined from those features only, which speeds up model training. In addition, this reduces the correlation between models, makes different CART regression trees differ from one another, and reduces the variance of the final ensembled XGBoost model. During model training, before each CART regression tree is generated, part of the training samples are randomly selected for training, further reducing the correlation between trees and increasing diversity.
Preferably, during model training, one XGBoost model generally predicts only one output parameter; when several output parameters must be predicted at the same time, the MultiOutputRegressor wrapper provided by sklearn is used to predict the wrapped multiple output parameters, finally yielding a single evaluation index and avoiding training a separate XGBoost model for each output parameter in turn.
Specifically, referring to fig. 3, in step S103, 80% of the data is used as the training set and 20% as the verification set for training and verification.
Preferably, in step S103, 5-fold cross-validation is used during model training to optimize the hyper-parameters, which are continuously adjusted according to the evaluation results to obtain a more accurate and stable model.
Preferably, in step S103, when obtaining the final quality estimation model, at least one of the mean square error (MSE), the mean absolute error (MAE) and the R² coefficient of determination is adopted to evaluate the estimated models, and the optimal model among those whose errors meet the standard is selected as the final quality estimation model.
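The three evaluation indexes named above can be computed with scikit-learn as follows; the hardness values are illustrative:

```python
# Sketch of the evaluation indexes: MSE, MAE and the R^2 coefficient
# of determination, on illustrative predicted/actual output parameters.
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = [52.0, 55.0, 58.0, 60.0]  # e.g. measured quenching hardness
y_pred = [51.0, 56.0, 57.0, 61.0]  # corresponding model predictions

mse = mean_squared_error(y_true, y_pred)   # mean square error
mae = mean_absolute_error(y_true, y_pred)  # mean absolute error
r2 = r2_score(y_true, y_pred)              # coefficient of determination
print(mse, mae, r2)
```

Here every prediction is off by exactly 1, so MSE and MAE are both 1.0, while R² stays close to 1 because that error is small relative to the spread of the true values.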
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. The quality estimation method based on the XGboost algorithm is characterized by comprising the following steps of:
s101: testing to obtain corresponding input parameters and output parameters as modeling data;
s102: establishing an objective function of the XGboost model according to modeling data, presetting model parameters, and establishing the XGboost model with estimated quality;
s103: dividing modeling data into a training set and a verification set, and training and verifying the established XGboost model to obtain a quality estimation model;
s104: and substituting the input parameters to be estimated into the trained quality estimation model to obtain the predicted values of the output parameters.
2. The quality estimation method based on the XGboost algorithm according to claim 1, wherein when the quality estimation method is used for estimating the induction quenching quality of an automobile half shaft, the input parameters comprise product material parameters, product size, equipment electrical parameters, quenching inductor parameters, processing parameters and quenching liquid parameters, and the output parameters comprise the quenching hardness and the quenching depth of the rod part of the automobile half shaft;
wherein the product material parameters include density, thermal conductivity, specific heat capacity, and coefficient of thermal expansion; the product size comprises the diameter of the neck, the diameter of the rod part, the diameter of the spline, the length of the half shaft, and the fillet radius of the junction between the rod part and the disc part of the half shaft; the equipment electrical parameters comprise incoming three-phase voltage, frequency and transformer turns ratio; the quenching inductor parameters comprise the inner diameter, voltage, current, frequency and power of the quenching inductor; the processing parameters comprise heating time, average heating speed and half-shaft rotation speed; the quenching liquid parameters comprise the flow rate, concentration, temperature and specific heat capacity of the quenching liquid.
3. The XGboost algorithm-based quality estimation method according to claim 1, wherein the XGboost model is as follows:

ŷ_i = F_K(x_i) = F_{K-1}(x_i) + f_K(x_i)

wherein ŷ_i is the output parameter predicted by the model for the ith sample x_i, F_K(x_i) represents the prediction of the K trees for the ith sample x_i, F_{K-1}(x_i) represents the prediction of the first K-1 trees for the ith sample x_i, and f_K(x_i) represents the Kth CART regression tree; the CART regression tree maps the input parameters, the input parameters cause the nodes to split and the CART regression tree to grow, and each sample finally falls on a corresponding leaf node according to the splitting conditions of the tree; the leaf node has a uniform prediction score that serves as the prediction value ω in that tree for every sample falling on it; in each training round a CART regression tree model is newly generated and added to the current model until a certain number of CART regression trees have been trained, and the sum of a sample's prediction values over all trees is its final prediction value, i.e. the predicted value of the output parameter;
wherein the constructed objective function is:

Obj = Σ_{i=1}^{n} l(y_i, ŷ_i) + Σ_{k=1}^{K} Ω(f_k)

wherein,

Ω(f) = γT + (λ/2) Σ_{j=1}^{T} ω_j²

wherein ŷ_i is the output parameter predicted by the model for the ith sample, y_i is the actual output parameter of the ith sample, K represents the number of regression trees, n represents the number of samples, f_k represents the kth regression tree model, Ω(f) represents the regularization term, T represents the number of leaf nodes of the regression tree, ω represents the score of the leaf nodes, γ and λ are hyper-parameters, and (λ/2) Σ_{j=1}^{T} ω_j² represents the L2 regularization of the leaf node scores;
wherein the loss function is:

L = (1/n) Σ_{i=1}^{n} l_δ(y_i, ŷ_i), with l_δ(y_i, ŷ_i) = ½(y_i − ŷ_i)² when |y_i − ŷ_i| ≤ δ, and l_δ(y_i, ŷ_i) = δ|y_i − ŷ_i| − ½δ² otherwise (the Huber form),

wherein ŷ_i is the output parameter predicted by the model for the ith sample, y_i is the actual output parameter of the ith sample, δ is the parameter of the loss function, and n represents the number of samples.
4. The quality estimation method based on the XGboost algorithm according to claim 3, wherein the specific establishment process of the quality estimation model is as follows:
(1) XGboost model establishment

ŷ_i = F_K(x_i) = F_{K-1}(x_i) + f_K(x_i)
(2) Constructing the objective function

Obj = Σ_{i=1}^{n} l(y_i, ŷ_i) + Σ_{k=1}^{K} Ω(f_k)

wherein l(y_i, ŷ_i) is the loss function used to calculate the error between the predicted value and the true value of the quality estimation model, y_i represents the true value, ŷ_i represents the predicted value, and Ω(f_k) is the regularization term used to control the complexity of the model and to control overfitting;
the regularization term is:

Ω(f) = γT + (λ/2) Σ_{j=1}^{T} ω_j²

wherein T represents the number of leaf nodes, ω represents the score of the leaf nodes, and γ and λ are constant coefficients; the first term γT controls the complexity of the tree through the number of leaf nodes and its coefficient; the second term is an L2 regular term used to control the prediction scores of the leaf nodes;
(3) Training the model
A second-order Taylor expansion is performed on the objective function:

Obj^(s) ≈ Σ_{i=1}^{n} [ l(y_i, ŷ_i^(s-1)) + g_i f_s(x_i) + ½ h_i f_s(x_i)² ] + Ω(f_s)

wherein g_i is the first-order gradient of the loss function and h_i is the second-order gradient of the loss function, which can be computed in advance before training:

g_i = ∂l(y_i, ŷ_i^(s-1)) / ∂ŷ_i^(s-1),  h_i = ∂²l(y_i, ŷ_i^(s-1)) / ∂(ŷ_i^(s-1))²

The constant term does not influence the optimization result, so the objective can be further simplified; removing the constant term l(y_i, ŷ_i^(s-1)) yields the objective function:

Obj^(s) = Σ_{i=1}^{n} [ g_i f_s(x_i) + ½ h_i f_s(x_i)² ] + Ω(f_s)
For continued optimization, the tree model f_s(x_i) and the leaf node prediction score ω are combined: since f_s(x_i) finally makes the sample fall on a leaf node whose prediction score ω is what is computed, when the sample falls on a certain leaf node j, ω_j can be used in place of f_s(x_i), giving a new objective function:

Obj^(s) = Σ_{j=1}^{T} [ G_j ω_j + ½ (H_j + λ) ω_j² ] + γT

wherein,

G_j = Σ_{i∈I_j} g_i,  H_j = Σ_{i∈I_j} h_i

and I_j is the sample set of all samples falling on leaf node j;

the optimal prediction score ω_j* of leaf node j can then be calculated:

ω_j* = − G_j / (H_j + λ)
The optimal weight of the leaf node depends on the first-order and second-order gradients and the L2 regular coefficient lambda;
the optimal weight is utilized to obtain the optimal solution obj of the target function (*)
Figure FDA0004013899990000039
Based on the optimal value obj*, in each training round all candidate CART regression tree models are evaluated with this index and the optimal model is selected; an exact greedy algorithm is adopted for optimization: for a new CART regression tree, starting from the root node, the difference of the objective function values before and after node splitting is calculated to decide whether to split;

based on obj*, the difference of the objective function values before and after node splitting is:

obj_split = ½ [ G_L² / (H_L + λ) + G_R² / (H_R + λ) − (G_L + G_R)² / (H_L + H_R + λ) ] − γ

wherein G_L, H_L and G_R, H_R are the gradient sums of the left and right child nodes after splitting; during the splitting of a CART regression tree node, some of the estimated input parameters are selected, all sample values under each selected feature are sorted, g_i and h_i of each sample are calculated, obj_split is then evaluated at all candidate segmentation points, and the input parameter and segmentation point with the maximum obj_split are selected as the optimal feature and optimal segmentation point of the node; the leaf node is split according to this optimal input parameter and segmentation point, finally generating a new CART regression tree; during the generation of the CART regression tree, generation stops when the model meets one of the following conditions: the CART regression tree reaches a preset maximum depth max_depth; or the sample weight sum is smaller than a preset threshold min_child_weight.
5. The quality estimation method based on the XGboost algorithm according to claim 4, wherein pruning is carried out on the newly generated CART regression tree, with the following main steps: tree nodes are evaluated from bottom to top, and whether the gain of the current node is less than a preset minimum gain is judged; if so, the node is pruned;
after pruning, the new CART regression tree is determined and the newly generated tree model is added to the current model:

F_s(x_i) = F_{s-1}(x_i) + η f_s(x_i)

wherein F_{s-1}(x_i) represents the prediction of the first s-1 training rounds for sample x_i, η is the shrinkage coefficient of the newly generated CART regression tree model, and f_s(x_i) is the new tree model trained in the s-th round; 0 ≤ η ≤ 1.
6. The quality estimation method based on the XGboost algorithm according to claim 5, wherein when the CART regression tree nodes are split, a layer-wise random column sampling method (colsample_bylevel) is adopted: before each node in the same layer of the CART regression tree is split, a subset of features is randomly selected, and the optimal splitting point is determined according to this subset; during model training, before the CART regression tree is generated, a subset of the training samples is randomly selected for training.
7. The quality estimation method based on the XGboost algorithm according to claim 4, wherein during model training one XGboost model generally predicts only one output parameter, and when several output parameters must be predicted simultaneously, the MultiOutputRegressor wrapper provided by sklearn is used to predict the wrapped set of output parameters, finally obtaining a single evaluation index.
8. The quality estimation method based on the XGBoost algorithm of claim 1, wherein in step S103, training and verification are performed with 80% as a training set and 20% as a verification set.
9. The XGboost algorithm-based quality estimation method according to claim 3, wherein in step S103, 5-fold cross validation is adopted to optimize the hyper-parameters during model training.
10. The quality estimation method based on the XGboost algorithm according to claim 1, wherein in step S103, when obtaining the final quality estimation model, at least one of the mean square error (MSE), the mean absolute error (MAE) and the R² coefficient of determination is adopted to evaluate the estimated models, and the optimal model among those whose errors meet the standard is selected as the final quality estimation model.
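The node-splitting gain described in claim 4 can be sketched as a small stand-alone function (a hedged illustration, not the patented implementation; the gradient sums in the example are illustrative):

```python
# Sketch of the split gain from claim 4:
# obj_split = 1/2 [ G_L^2/(H_L+lam) + G_R^2/(H_R+lam)
#                   - (G_L+G_R)^2/(H_L+H_R+lam) ] - gamma
def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    """Difference of objective values before and after splitting a node."""
    def score(g, h):
        return g * g / (h + lam)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right)
                  - score(g_left + g_right, h_left + h_right)) - gamma

# Opposite-sign gradient sums (a cleanly separating split) give a positive gain,
# so the exact greedy algorithm would accept this split.
print(split_gain(g_left=-4.0, h_left=2.0, g_right=4.0, h_right=2.0,
                 lam=1.0, gamma=0.5))
```

The γ term acts as a per-split complexity penalty: a candidate split is only worthwhile when the reduction in loss exceeds γ.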
CN202211663862.3A 2022-12-23 2022-12-23 Quality Estimation Method Based on XGBoost Algorithm Pending CN115938496A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211663862.3A CN115938496A (en) 2022-12-23 2022-12-23 Quality Estimation Method Based on XGBoost Algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211663862.3A CN115938496A (en) 2022-12-23 2022-12-23 Quality Estimation Method Based on XGBoost Algorithm

Publications (1)

Publication Number Publication Date
CN115938496A true CN115938496A (en) 2023-04-07

Family

ID=86655810

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211663862.3A Pending CN115938496A (en) 2022-12-23 2022-12-23 Quality Estimation Method Based on XGBoost Algorithm

Country Status (1)

Country Link
CN (1) CN115938496A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116305588A (en) * 2023-05-17 2023-06-23 中国航空工业集团公司沈阳空气动力研究所 Wind tunnel test data anomaly detection method, electronic equipment and storage medium
CN116305588B (en) * 2023-05-17 2023-08-11 中国航空工业集团公司沈阳空气动力研究所 Wind tunnel test data anomaly detection method, electronic equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination