CN115938496A - Quality Estimation Method Based on XGBoost Algorithm - Google Patents

Quality Estimation Method Based on XGBoost Algorithm

Info

Publication number
CN115938496A
Authority
CN
China
Prior art keywords
model
tree
parameters
sample
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211663862.3A
Other languages
Chinese (zh)
Inventor
汪明宇
汪胜利
Current Assignee
Hubei Shenli Auto Parts And Components Share Co ltd
Original Assignee
Hubei Shenli Auto Parts And Components Share Co ltd
Priority date
Filing date
Publication date
Application filed by Hubei Shenli Auto Parts And Components Share Co ltd
Priority to CN202211663862.3A
Publication of CN115938496A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 10/00: Technologies related to metal processing
    • Y02P 10/25: Process efficiency

Abstract

The invention discloses a quality estimation method based on the XGBoost algorithm, belonging to the technical field of metal heat treatment. The method comprises the following steps. S101: run tests to obtain corresponding input parameters and output parameters as modeling data. S102: establish the objective function of the XGBoost model from the modeling data, preset the model parameters, and build the XGBoost quality estimation model. S103: divide the modeling data into a training set and a verification set, then train and verify the XGBoost model to obtain the quality estimation model. S104: substitute the input parameters to be estimated into the trained quality estimation model to obtain predicted values of the output parameters. Aimed at the complexity of the induction quenching process, this XGBoost-based induction quenching quality estimation method uses a data-driven model to estimate quenching quality; the model is accurate and helps shorten the process development cycle.

Description

Quality estimation method based on the XGBoost algorithm
Technical Field
The invention belongs to the technical field of metal heat treatment, and particularly relates to a quality estimation method based on the XGBoost algorithm.
Background
Induction quenching is an important metal heat treatment process with the advantages of high quality, high efficiency, and environmental friendliness. However, induction quenching is a highly nonlinear, complex process in which electromagnetic induction, the skin effect, eddy currents, other electrothermal effects, and phase transformation all interact; physical changes during the process are rapid, the influencing parameters are numerous, and fluctuations are severe, so it is difficult to establish a theoretical model that accurately predicts induction quenching quality. The traditional approach therefore selects a reasonable heat treatment process through large batches of quenching tests so that the product meets the quenching quality standard, but this approach is costly and time-consuming.
Disclosure of Invention
The invention provides an induction quenching quality estimation method based on the XGBoost (eXtreme Gradient Boosting) algorithm to solve the prior-art problem that quenching quality is difficult to estimate because the induction quenching process is complex; the method has the advantage of a short estimation time. The technical scheme is as follows:
An embodiment of the invention provides a quality estimation method based on the XGBoost algorithm, comprising the following steps:
S101: run tests to obtain corresponding input parameters and output parameters as modeling data;
S102: establish the objective function of the XGBoost model from the modeling data, preset the model parameters, and build the XGBoost quality estimation model;
S103: divide the modeling data into a training set and a verification set, and train and verify the XGBoost model to obtain the quality estimation model;
S104: substitute the input parameters to be estimated into the trained quality estimation model to obtain predicted values of the output parameters.
When the quality estimation method is used to estimate the induction quenching quality of an automobile half shaft, the input parameters comprise product material parameters, product dimensions, equipment electrical parameters, quenching inductor parameters, processing parameters, and quenching liquid parameters, and the output parameters comprise the hardened-layer hardness and hardened-layer depth of the rod portion of the half shaft.
The product material parameters include density, thermal conductivity, specific heat capacity, and coefficient of thermal expansion; the product dimensions include the neck diameter, rod diameter, spline diameter, half-shaft length, and the fillet radius where the rod portion joins the disc portion of the half shaft; the equipment electrical parameters include the incoming three-phase voltage, frequency, and transformer ratio; the quenching inductor parameters include the inductor inner diameter, voltage, current, frequency, and power; the processing parameters include the heating time, average heating speed, and half-shaft rotation speed; the quenching liquid parameters include the quenching liquid flow rate, concentration, temperature, and specific heat capacity.
The XGBoost model is:

$$\hat{y}_i = F_K(x_i) = F_{K-1}(x_i) + f_K(x_i)$$

where $\hat{y}_i$ is the model-predicted output parameter for the $i$-th sample $x_i$, $F_K(x_i)$ is the prediction of the first $K$ trees for $x_i$, $F_{K-1}(x_i)$ is the prediction of the first $K-1$ trees for $x_i$, and $f_K(x_i)$ is the $K$-th CART regression tree. The CART regression tree maps the input parameters: the input parameters split the nodes and grow the tree, and each sample finally lands on a leaf node according to the splitting conditions of the tree. A leaf node assigns one prediction score to all samples that land on it, and this score serves as the prediction value $\omega$ of those samples on that tree. Each training round generates a new CART regression tree and adds it to the current model; after a certain number of trees have been trained, the sum of a sample's prediction values over all trees is its final prediction, i.e. the predicted value of the output parameter.
The constructed objective function is:

$$Obj = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega(f_k)$$

with the regularization term

$$\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} \omega_j^2$$

where $\hat{y}_i$ is the model-predicted output parameter of the $i$-th sample, $y_i$ is the actual output parameter of the $i$-th sample, $K$ is the number of regression trees, $n$ is the number of samples, $f_k$ is the $k$-th regression tree model, $\Omega(f)$ is the regularization term, $T$ is the number of leaf nodes of a regression tree, $\omega$ is the leaf node score, $\gamma$ and $\lambda$ are hyper-parameters, and $\frac{1}{2}\lambda\sum_{j=1}^{T}\omega_j^2$ is the L2 regularization of the leaf node scores.
The loss function is:

$$l = \sum_{i=1}^{n} \delta^2\left(\sqrt{1 + \left(\frac{y_i - \hat{y}_i}{\delta}\right)^2} - 1\right)$$

where $\hat{y}_i$ is the model-predicted output parameter of the $i$-th sample, $y_i$ is the actual output parameter of the $i$-th sample, $\delta$ is the parameter of the loss function, and $n$ is the number of samples.
Specifically, the quality estimation model is established as follows:
(1) Establish the XGBoost model

$$\hat{y}_i = F_K(x_i) = F_{K-1}(x_i) + f_K(x_i)$$

(2) Construct the objective function

$$Obj = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega(f_k)$$

where $l(y_i, \hat{y}_i)$ is the loss function used to compute the error between the predicted value $\hat{y}_i$ and the true value $y_i$ of the quality estimation model, and $\Omega(f_k)$ is the regularization term used to control model complexity and suppress overfitting.
The regularization term is:

$$\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} \omega_j^2$$

where $T$ is the number of leaf nodes, $\omega$ is the leaf node score, and $\gamma$ and $\lambda$ are constant coefficients. The first term $\gamma T$ controls the complexity of the tree through the number of leaf nodes and its coefficient; the second term is the L2 regularization term, which controls the prediction scores of the leaf nodes.
(3) Train the model
Apply a second-order Taylor expansion to the objective function:

$$Obj^{(s)} \approx \sum_{i=1}^{n}\left[l\left(y_i, \hat{y}_i^{(s-1)}\right) + g_i f_s(x_i) + \frac{1}{2} h_i f_s^2(x_i)\right] + \Omega(f_s)$$

where $g_i$ is the first-order gradient of the loss function and $h_i$ is the second-order gradient, both of which can be computed in advance before training:

$$g_i = \partial_{\hat{y}^{(s-1)}} l\left(y_i, \hat{y}_i^{(s-1)}\right), \qquad h_i = \partial^2_{\hat{y}^{(s-1)}} l\left(y_i, \hat{y}_i^{(s-1)}\right)$$

The constant term $l\left(y_i, \hat{y}_i^{(s-1)}\right)$ does not influence the optimization result, so it can be removed, giving the simplified objective function:

$$Obj^{(s)} \approx \sum_{i=1}^{n}\left[g_i f_s(x_i) + \frac{1}{2} h_i f_s^2(x_i)\right] + \Omega(f_s)$$

To continue the optimization, the tree model $f_s(x_i)$ and the leaf node prediction score $\omega$ are unified: since $f_s(x_i)$ ultimately makes the sample fall on a leaf node whose prediction score $\omega$ has been computed, when a sample falls on leaf node $j$, $\omega_j$ can replace $f_s(x_i)$, giving the new objective function:

$$Obj^{(s)} = \sum_{j=1}^{T}\left[\left(\sum_{i \in I_j} g_i\right)\omega_j + \frac{1}{2}\left(\sum_{i \in I_j} h_i + \lambda\right)\omega_j^2\right] + \gamma T$$

where $I_j$ is the set of all samples falling on leaf node $j$.
The optimal prediction score of leaf node $j$ can then be calculated:

$$\omega_j^* = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}$$

The optimal weight of a leaf node thus depends on the first- and second-order gradients and the L2 regularization coefficient $\lambda$.
Using the optimal weights, the optimal value $obj^*$ of the objective function is obtained:

$$obj^* = -\frac{1}{2}\sum_{j=1}^{T} \frac{\left(\sum_{i \in I_j} g_i\right)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T$$

Based on $obj^*$, this scoring index is used in each training round to evaluate all candidate CART regression tree models and select the optimal one. An exact greedy algorithm is adopted for the optimization: starting from the root node of a new CART regression tree, whether a node is split is determined by the difference of the objective function values before and after splitting.
Based on $obj^*$, the difference of the objective function values before and after node splitting is:

$$obj_{split} = \frac{1}{2}\left[\frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda}\right] - \gamma$$

where $G_L, H_L$ and $G_R, H_R$ are the sums of the first- and second-order gradients in the left and right child nodes.
During the splitting of a CART regression tree node, part of the estimated input parameters are selected, all sample values under the selected feature are sorted, the $g_i$ and $h_i$ of each sample are computed, and $obj_{split}$ is evaluated at every candidate split point; the input parameter and split point with the largest $obj_{split}$ are taken as the optimal feature and optimal split point of the node, the leaf node is split accordingly, and a new CART regression tree is finally generated. During generation, the tree stops growing when one of the following conditions is met: the CART regression tree reaches the preset maximum depth max_depth, or the sample weight sum is smaller than the preset threshold min_child_weight.
Preferably, the newly generated CART regression tree is pruned. The main steps are: evaluate the tree nodes from bottom to top, judge whether the gain of the current node is smaller than the preset minimum gain, and prune if it is.
After pruning, the new CART regression tree is determined and the newly generated tree model is added to the current model:

$$F_s(x_i) = F_{s-1}(x_i) + \eta f_s(x_i)$$

where $F_{s-1}(x_i)$ is the prediction for sample $x_i$ after $s-1$ training rounds, $\eta$ is the shrinkage coefficient of the newly generated CART regression tree model, and $f_s(x_i)$ is the new tree model trained in the $s$-th round; the value of $\eta$ satisfies $0 \le \eta \le 1$.
Preferably, when CART regression tree nodes are split, per-level random column sampling (colsample_bylevel) is adopted: before each node in the same level of the CART regression tree is split, part of the features are randomly selected, and the optimal split point is determined from those features only. During model training, before each CART regression tree is generated, part of the training samples are randomly selected for training.
Preferably, during model training, one XGBoost model generally predicts only one output parameter; when several output parameters must be predicted at the same time, the MultiOutputRegressor wrapper provided by sklearn is used to predict the wrapped multiple output parameters, finally yielding a single evaluation index.
Specifically, in step S103, 80% of the data is used as the training set and 20% as the verification set for training and verification.
Preferably, in step S103, 5-fold cross-validation is adopted to optimize the hyper-parameters during model training.
Preferably, in step S103, when obtaining the final quality estimation model, at least one of the mean square error (MSE), the mean absolute error (MAE), and the coefficient of determination $R^2$ is used to evaluate the estimation models, and the best model among those whose errors meet the standard is selected as the final quality estimation model.
In summary, compared with the prior art, the technical solutions contemplated by the invention mainly have the following advantages. Aimed at the complexity of the induction quenching process, the XGBoost-based induction quenching quality estimation method uses a data-driven model to estimate quenching quality; the model is accurate and helps shorten the process development cycle. The method considers the many characteristics that influence induction quenching quality, adapts well to induction quenching tests with numerous influencing factors, avoids running large numbers of tests for particular characteristics, and saves test cost. The data from induction quenching tests can be used as a whole, and the coupling relations among different characteristics are fully considered; the XGBoost model does not require data normalization, adapts well to features of different orders of magnitude, and has wide applicability and strong generalization ability.
Drawings
Fig. 1 is a flowchart of a quality estimation method based on an XGBoost algorithm according to an embodiment of the present invention;
fig. 2 is a schematic diagram of an XGBoost model provided in an embodiment of the present invention;
fig. 3 is a schematic diagram of the estimation and selection of the prediction model according to the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings.
Referring to fig. 1, an embodiment of the present invention provides a quality estimation method based on the XGBoost algorithm, comprising the following steps:
S101: Run tests to obtain corresponding input parameters and output parameters as modeling data.
S102: Establish the objective function of the XGBoost model from the modeling data, preset the model parameters, and build the XGBoost quality estimation model.
S103: Divide the modeling data into a training set and a verification set, and train and verify the XGBoost model to obtain the quality estimation model.
S104: Substitute the input parameters to be estimated into the trained quality estimation model to obtain predicted values of the output parameters.
When the quality estimation method is used to estimate the induction quenching quality of an automobile half shaft, the input parameters include product material parameters, product dimensions, equipment electrical parameters, quenching inductor parameters, processing parameters, quenching liquid parameters, and the like, and the output parameters include the hardened-layer hardness, the hardened-layer depth, and the like of the rod portion of the half shaft.
The product material parameters include density, thermal conductivity, specific heat capacity, coefficient of thermal expansion, and the like; the product dimensions include the neck diameter, rod diameter, spline diameter, half-shaft length, the fillet radius where the rod portion joins the disc portion, and the like; the equipment electrical parameters include the incoming three-phase voltage, frequency, transformer ratio, and the like; the quenching inductor parameters include the inductor inner diameter, voltage, current, frequency, power, and the like; the processing parameters include the heating time, average heating speed, half-shaft rotation speed, and the like; the quenching liquid parameters include the quenching liquid flow rate, concentration, temperature, specific heat capacity, and the like. As can be seen, the input parameters for estimating the induction hardening quality of an automobile half shaft are numerous, exceeding 20 types; therefore, in the subsequent processing, the inventors fully considered simplification of the model, the algorithm, and the procedure so as to reduce the evaluation time.
Further, abnormal data produced by failed tests and the like are removed from the collected sample data, ensuring that all data used for training fall within a reasonable range.
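A minimal sketch of this screening step, assuming the samples sit in a NumPy array and that a physically reasonable range per column is known; both the columns and the ranges below are hypothetical:

```python
import numpy as np

# each row is one quenching test; hypothetical columns:
# inductor power (kW), heating time (s), hardened-layer depth (mm)
samples = np.array([
    [120.0,  8.0, 3.2],
    [118.0,  7.5, 3.0],
    [ -5.0,  8.2, 3.1],   # negative power: failed test, should be removed
    [119.0, 60.0, 2.9],   # implausible heating time, should be removed
])

low  = np.array([0.0,   0.0,  0.0])   # hypothetical lower bounds per column
high = np.array([200.0, 30.0, 10.0])  # hypothetical upper bounds per column

# keep only rows where every column lies inside its reasonable range
mask  = np.all((samples >= low) & (samples <= high), axis=1)
clean = samples[mask]
```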
The XGBoost model is:

$$\hat{y}_i = F_K(x_i) = F_{K-1}(x_i) + f_K(x_i)$$

where $\hat{y}_i$ is the model-predicted output parameter for the $i$-th sample $x_i$, $F_K(x_i)$ is the prediction of the first $K$ trees for $x_i$, $F_{K-1}(x_i)$ is the prediction of the first $K-1$ trees for $x_i$, and $f_K(x_i)$ is the $K$-th CART regression tree.
Referring to fig. 2, since induction hardening quality prediction is a regression problem, the CART regression tree maps the input parameters: the input parameters split the nodes and grow the tree, and each sample finally lands on a leaf node according to the splitting conditions of the tree. A leaf node assigns one prediction score to all samples that land on it, and this score serves as the prediction value $\omega$ of those samples on that tree. Each training round generates a new CART regression tree and adds it to the current model; after a certain number of trees have been trained, the sum of a sample's prediction values over all trees is its final prediction, i.e. the predicted value of the output parameter.
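The additive prediction described above, where each tree assigns a sample the score of the leaf it falls into and the scores are summed, can be illustrated with a toy example; the two hand-written "trees" below are stand-ins, not the patent's model:

```python
# each toy "tree" sends a sample to a leaf and returns that leaf's score
def tree_1(x):
    return 2.0 if x[0] > 0.5 else 1.0      # one split on feature 0

def tree_2(x):
    return 0.5 if x[1] > 0.3 else -0.5     # one split on feature 1

trees = [tree_1, tree_2]

def predict(x):
    # F_K(x) = F_{K-1}(x) + f_K(x): the final prediction is the
    # sum of the per-tree leaf scores
    return sum(f(x) for f in trees)

y_hat = predict([0.7, 0.2])   # leaf scores 2.0 and -0.5
```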
The constructed objective function is:

$$Obj = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega(f_k)$$

with the regularization term

$$\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} \omega_j^2$$

where $\hat{y}_i$ is the model-predicted output parameter of the $i$-th sample, $y_i$ is the actual output parameter of the $i$-th sample, $K$ is the number of regression trees, $n$ is the number of samples, $f_k$ is the $k$-th regression tree model, $\Omega(f)$ is the regularization term, $T$ is the number of leaf nodes of a regression tree, $\omega$ is the leaf node score, $\gamma$ and $\lambda$ are hyper-parameters, and $\frac{1}{2}\lambda\sum_{j=1}^{T}\omega_j^2$ is the L2 regularization of the leaf node scores.
The prediction of induction quenching quality is a regression problem, where the loss $l(y_i, \hat{y}_i)$ would generally be the mean absolute error (MAE); but because MAE has a non-differentiable region, the Pseudo-Huber loss, an approximation of MAE, is adopted:

$$l = \sum_{i=1}^{n} \delta^2\left(\sqrt{1 + \left(\frac{y_i - \hat{y}_i}{\delta}\right)^2} - 1\right)$$

where $\hat{y}_i$ is the model-predicted output parameter of the $i$-th sample, $y_i$ is the actual output parameter of the $i$-th sample, $\delta$ is the parameter of the loss function, and $n$ is the number of samples.
Preferably, in step S103, the relevant parameters of the XGBoost model are set: eta is the shrinkage step; gamma is the minimum loss reduction required to further split a leaf node of the tree; max_depth is the maximum depth of the tree; min_child_weight is the minimum of the sum of subtree weights; max_delta_step is the maximum step size of each tree's weight change; max_leaf_nodes is the maximum number of leaf nodes; subsample is the random row-sampling ratio, i.e. the ratio of sub-samples used for tree growth; colsample_bytree is the column-sampling ratio, i.e. the subset ratio of features used to construct each tree; lambda is the L2 regularization term parameter on the weights, controlling model complexity; alpha is the L1 regularization term parameter on the weights, controlling model complexity; scale_pos_weight is the positive-sample weight scale.
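Collected as a parameter dictionary in the spelling the `xgboost` library uses; the values are illustrative assumptions, not values from the patent (note the library spells the leaf-node cap `max_leaves`):

```python
# illustrative XGBoost parameter values; none of these come from the patent
xgb_params = {
    "eta": 0.1,               # shrinkage step (learning rate)
    "gamma": 0.0,             # minimum loss reduction to split a leaf further
    "max_depth": 6,           # maximum depth of a tree
    "min_child_weight": 1.0,  # minimum sum of sample weights in a child node
    "max_delta_step": 0,      # maximum step size of each tree's weight change
    "max_leaves": 31,         # maximum number of leaf nodes
    "subsample": 0.8,         # row-sampling ratio used to grow each tree
    "colsample_bytree": 0.8,  # feature-subset ratio used to build each tree
    "lambda": 1.0,            # L2 regularization on leaf weights
    "alpha": 0.0,             # L1 regularization on leaf weights
    "scale_pos_weight": 1.0,  # positive-sample weight scale (classification)
}
```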
Specifically, the quality estimation model is established as follows:
(1) Establish the XGBoost model

$$\hat{y}_i = F_K(x_i) = F_{K-1}(x_i) + f_K(x_i)$$

(2) Construct the objective function

$$Obj = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega(f_k)$$

where $l(y_i, \hat{y}_i)$ is the loss function used to compute the error between the predicted value $\hat{y}_i$ and the true value $y_i$ of the quality estimation model; this term is a differentiable convex function. $\Omega(f_k)$ is the regularization term used to control model complexity and suppress overfitting.
The regularization term is:

$$\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} \omega_j^2$$

where $T$ is the number of leaf nodes, $\omega$ is the leaf node score, and $\gamma$ and $\lambda$ are constant coefficients. The first term $\gamma T$ controls the complexity of the tree through the number of leaf nodes and its coefficient: the larger this term, the larger the objective function, thereby suppressing model complexity. The second term is the L2 regularization term, which controls the prediction scores of the leaf nodes.
(3) Train the model
Apply a second-order Taylor expansion to the objective function:

$$Obj^{(s)} \approx \sum_{i=1}^{n}\left[l\left(y_i, \hat{y}_i^{(s-1)}\right) + g_i f_s(x_i) + \frac{1}{2} h_i f_s^2(x_i)\right] + \Omega(f_s)$$

where $g_i$ is the first-order gradient of the loss function and $h_i$ is the second-order gradient, both of which can be computed in advance before training:

$$g_i = \partial_{\hat{y}^{(s-1)}} l\left(y_i, \hat{y}_i^{(s-1)}\right), \qquad h_i = \partial^2_{\hat{y}^{(s-1)}} l\left(y_i, \hat{y}_i^{(s-1)}\right)$$

The constant term $l\left(y_i, \hat{y}_i^{(s-1)}\right)$ does not influence the optimization result, so it can be removed, giving the simplified objective function:

$$Obj^{(s)} \approx \sum_{i=1}^{n}\left[g_i f_s(x_i) + \frac{1}{2} h_i f_s^2(x_i)\right] + \Omega(f_s)$$

To continue the optimization, the tree model $f_s(x_i)$ and the leaf node prediction score $\omega$ are unified: since $f_s(x_i)$ ultimately makes the sample fall on a leaf node whose prediction score $\omega$ has been computed, when a sample falls on leaf node $j$, $\omega_j$ can replace $f_s(x_i)$, giving the new objective function:

$$Obj^{(s)} = \sum_{j=1}^{T}\left[\left(\sum_{i \in I_j} g_i\right)\omega_j + \frac{1}{2}\left(\sum_{i \in I_j} h_i + \lambda\right)\omega_j^2\right] + \gamma T$$

where $I_j$ is the set of all samples falling on leaf node $j$.
The optimal prediction score (optimal weight) of leaf node $j$ can then be calculated:

$$\omega_j^* = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}$$

The optimal weight of a leaf node thus depends on the first- and second-order gradients and the L2 regularization coefficient $\lambda$. The L2 regularization reduces the leaf node weights, reducing the influence of any single leaf node on the overall prediction and preventing overfitting.
Using the optimal weights, the optimal value $obj^*$ of the objective function is obtained:

$$obj^* = -\frac{1}{2}\sum_{j=1}^{T} \frac{\left(\sum_{i \in I_j} g_i\right)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T$$
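A numeric sketch of the optimal leaf score $\omega_j^*$ and the optimal objective $obj^*$: given the per-leaf gradient sums $G_j = \sum_{i \in I_j} g_i$ and $H_j = \sum_{i \in I_j} h_i$, each quantity follows in one line (pure Python; the numbers are illustrative):

```python
# per-leaf gradient sums over the samples in each leaf: G_j, H_j
G = [6.0, -4.0]          # two leaves, illustrative values
H = [10.0, 8.0]
lam, gamma = 1.0, 0.5    # L2 coefficient lambda, per-leaf penalty gamma

# optimal prediction score of each leaf: w_j* = -G_j / (H_j + lambda)
w_opt = [-g_j / (h_j + lam) for g_j, h_j in zip(G, H)]

# optimal objective: obj* = -1/2 * sum_j G_j^2 / (H_j + lambda) + gamma * T
obj_opt = (-0.5 * sum(g_j * g_j / (h_j + lam) for g_j, h_j in zip(G, H))
           + gamma * len(G))
```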
Based on $obj^*$, this scoring index is used in each training round to evaluate all candidate CART regression tree models so that the optimal model can be selected. Because induction quenching estimation has many input parameters and the computation is heavy, an exact greedy algorithm is adopted to reduce the computation and avoid enumerating all possible tree structures: for a new CART regression tree, starting from the root node, whether a node is split is determined by the difference of the objective function values before and after splitting.
Based on $obj^*$, the difference of the objective function values before and after node splitting is:

$$obj_{split} = \frac{1}{2}\left[\frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda}\right] - \gamma$$

where $G_L, H_L$ and $G_R, H_R$ are the sums of the first- and second-order gradients in the left and right child nodes.
During the splitting of a CART regression tree node, part of the estimated input parameters are selected, all sample values (candidate split points) under the selected feature are sorted, the $g_i$ and $h_i$ of each sample are computed, and $obj_{split}$ is evaluated at every split point; the input parameter and split point with the largest $obj_{split}$ are taken as the optimal feature and optimal split point of the node, the leaf node is split accordingly, and a new CART regression tree is finally generated. During generation, the tree stops growing when one of the following conditions is met: the CART regression tree reaches the preset maximum depth max_depth, or the sample weight sum is smaller than the preset threshold min_child_weight.
Preferably, to avoid the overfitting caused by excessive splitting of the CART regression tree and to improve the generalization ability of the model, the newly generated CART regression tree is pruned. The main steps are: evaluate the tree nodes from bottom to top, judge whether the gain of the current node is smaller than the preset minimum gain, and prune if it is.
After pruning, the new CART regression tree is determined and the newly generated tree model is added to the current model:

$$F_s(x_i) = F_{s-1}(x_i) + \eta f_s(x_i)$$

where $F_{s-1}(x_i)$ is the prediction for sample $x_i$ after $s-1$ training rounds, $\eta$ is the shrinkage coefficient of the newly generated CART regression tree model, and $f_s(x_i)$ is the new tree model trained in the $s$-th round. $\eta$ can also be regarded as the learning rate; its value satisfies $0 \le \eta \le 1$, which limits the influence of each tree on the overall model and avoids overfitting.
Preferably, because induction quenching quality estimation involves many features and a large amount of computation, the number of features must be compressed to speed up training; meanwhile, if different CART regression trees use the same features, the correlation between the tree models becomes too strong. Per-level random column sampling (colsample_bylevel) is adopted to solve these problems: before each node in the same level of the CART regression tree is split, part of the features are randomly selected, and the optimal split point is determined from those features only, which speeds up model training. In addition, this reduces the correlation between models, makes different CART regression trees differ from one another, and reduces the variance of the final ensembled XGBoost model. During model training, before each CART regression tree is generated, part of the training samples are randomly selected for training, further reducing the correlation between trees and increasing diversity.
Preferably, during model training, one XGBoost model generally predicts only one output parameter; when several output parameters must be predicted at the same time, the MultiOutputRegressor wrapper provided by sklearn is used to predict the wrapped multiple output parameters, finally yielding a single evaluation index and avoiding training a separate XGBoost model for each output parameter in turn.
Specifically, referring to fig. 3, in step S103, 80% of the data is used as the training set and 20% as the verification set for training and verification.
Preferably, in step S103, 5-fold cross-validation is used during model training to optimize the hyper-parameters, which are continuously adjusted according to the evaluation results to obtain a more accurate and stable model.
Preferably, in step S103, when obtaining the final quality estimation model, at least one of the mean square error (MSE), the mean absolute error (MAE) and the R² coefficient of determination is adopted to evaluate the estimated models, and the optimal model among those whose errors meet the standard is selected as the final quality estimation model.
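The three evaluation indexes named above can be computed with scikit-learn as follows; the hardness values are illustrative:

```python
# Sketch of the evaluation indexes: MSE, MAE and the R^2 coefficient
# of determination, on illustrative predicted/actual output parameters.
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = [52.0, 55.0, 58.0, 60.0]  # e.g. measured quenching hardness
y_pred = [51.0, 56.0, 57.0, 61.0]  # corresponding model predictions

mse = mean_squared_error(y_true, y_pred)   # mean square error
mae = mean_absolute_error(y_true, y_pred)  # mean absolute error
r2 = r2_score(y_true, y_pred)              # coefficient of determination
print(mse, mae, r2)
```

Here every prediction is off by exactly 1, so MSE and MAE are both 1.0, while R² stays close to 1 because that error is small relative to the spread of the true values.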
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. The quality estimation method based on the XGboost algorithm is characterized by comprising the following steps of:
s101: testing to obtain corresponding input parameters and output parameters as modeling data;
s102: establishing an objective function of the XGboost model according to modeling data, presetting model parameters, and establishing the XGboost model with estimated quality;
s103: dividing modeling data into a training set and a verification set, and training and verifying the established XGboost model to obtain a quality estimation model;
s104: and substituting the input parameters to be estimated into the trained quality estimation model to obtain the predicted values of the output parameters.
2. The quality estimation method based on the XGboost algorithm according to claim 1, wherein when the quality estimation method is used for estimating the induction quenching quality of an automobile half shaft, the input parameters comprise product material parameters, product size, equipment electrical parameters, quenching inductor parameters, processing parameters and quenching liquid parameters, and the output parameters comprise the quenching hardness and the quenching depth of the rod part of the automobile half shaft;
wherein the product material parameters include density, thermal conductivity, specific heat capacity, and coefficient of thermal expansion; the product size comprises the diameter of the neck, the diameter of the rod part, the diameter of the spline, the length of the half shaft, and the fillet radius of the junction between the rod part and the disc part of the half shaft; the equipment electrical parameters comprise incoming three-phase voltage, frequency and transformer turns ratio; the quenching inductor parameters comprise the inner diameter, voltage, current, frequency and power of the quenching inductor; the processing parameters comprise heating time, average heating speed and half-shaft rotation speed; the quenching liquid parameters comprise the flow rate, concentration, temperature and specific heat capacity of the quenching liquid.
3. The XGboost algorithm-based quality estimation method according to claim 1, wherein the XGboost model is as follows:

ŷ_i = F_K(x_i) = F_{K-1}(x_i) + f_K(x_i)

wherein ŷ_i is the output parameter predicted by the model for the ith sample x_i, F_K(x_i) represents the prediction of the K trees for the ith sample x_i, F_{K-1}(x_i) represents the prediction of the first K-1 trees for the ith sample x_i, and f_K(x_i) represents the Kth CART regression tree; the CART regression tree maps the input parameters, the input parameters cause the nodes to split and the CART regression tree to grow, and each sample finally falls on a corresponding leaf node according to the splitting conditions of the tree; the leaf node has a uniform prediction score that serves as the prediction value ω in that tree for every sample falling on it; in each training round a CART regression tree model is newly generated and added to the current model until a certain number of CART regression trees have been trained, and the sum of a sample's prediction values over all trees is its final prediction value, i.e. the predicted value of the output parameter;
wherein the constructed objective function is:

Obj = Σ_{i=1}^{n} l(y_i, ŷ_i) + Σ_{k=1}^{K} Ω(f_k)

wherein,

Ω(f) = γT + (λ/2) Σ_{j=1}^{T} ω_j²

wherein ŷ_i is the output parameter predicted by the model for the ith sample, y_i is the actual output parameter of the ith sample, K represents the number of regression trees, n represents the number of samples, f_k represents the kth regression tree model, Ω(f) represents the regularization term, T represents the number of leaf nodes of the regression tree, ω represents the score of the leaf nodes, γ and λ are hyper-parameters, and (λ/2) Σ_{j=1}^{T} ω_j² represents the L2 regularization of the leaf node scores;
wherein the loss function is:

L = (1/n) Σ_{i=1}^{n} l_δ(y_i, ŷ_i), with l_δ(y_i, ŷ_i) = ½(y_i − ŷ_i)² when |y_i − ŷ_i| ≤ δ, and l_δ(y_i, ŷ_i) = δ|y_i − ŷ_i| − ½δ² otherwise (the Huber form),

wherein ŷ_i is the output parameter predicted by the model for the ith sample, y_i is the actual output parameter of the ith sample, δ is the parameter of the loss function, and n represents the number of samples.
4. The quality estimation method based on the XGboost algorithm according to claim 3, wherein the specific establishment process of the quality estimation model is as follows:
(1) XGboost model establishment

ŷ_i = F_K(x_i) = F_{K-1}(x_i) + f_K(x_i)
(2) Constructing the objective function

Obj = Σ_{i=1}^{n} l(y_i, ŷ_i) + Σ_{k=1}^{K} Ω(f_k)

wherein l(y_i, ŷ_i) is the loss function used to calculate the error between the predicted value and the true value of the quality estimation model, y_i represents the true value, ŷ_i represents the predicted value, and Ω(f_k) is the regularization term used to control the complexity of the model and to control overfitting;
the regularization term is:

Ω(f) = γT + (λ/2) Σ_{j=1}^{T} ω_j²

wherein T represents the number of leaf nodes, ω represents the score of the leaf nodes, and γ and λ are constant coefficients; the first term γT controls the complexity of the tree through the number of leaf nodes and its coefficient; the second term is an L2 regular term used to control the prediction scores of the leaf nodes;
(3) Training the model
A second-order Taylor expansion is performed on the objective function:

Obj^(s) ≈ Σ_{i=1}^{n} [ l(y_i, ŷ_i^(s-1)) + g_i f_s(x_i) + ½ h_i f_s(x_i)² ] + Ω(f_s)

wherein g_i is the first-order gradient of the loss function and h_i is the second-order gradient of the loss function, which can be computed in advance before training:

g_i = ∂l(y_i, ŷ_i^(s-1)) / ∂ŷ_i^(s-1),  h_i = ∂²l(y_i, ŷ_i^(s-1)) / ∂(ŷ_i^(s-1))²

The constant term does not influence the optimization result, so the objective can be further simplified; removing the constant term l(y_i, ŷ_i^(s-1)) yields the objective function:

Obj^(s) = Σ_{i=1}^{n} [ g_i f_s(x_i) + ½ h_i f_s(x_i)² ] + Ω(f_s)
For continued optimization, the tree model f_s(x_i) and the leaf node prediction score ω are combined: since f_s(x_i) finally makes the sample fall on a leaf node whose prediction score ω is what is computed, when the sample falls on a certain leaf node j, ω_j can be used in place of f_s(x_i), giving a new objective function:

Obj^(s) = Σ_{j=1}^{T} [ G_j ω_j + ½ (H_j + λ) ω_j² ] + γT

wherein,

G_j = Σ_{i∈I_j} g_i,  H_j = Σ_{i∈I_j} h_i

and I_j is the sample set of all samples falling on leaf node j;

the optimal prediction score ω_j* of leaf node j can then be calculated:

ω_j* = − G_j / (H_j + λ)
The optimal weight of the leaf node depends on the first-order and second-order gradients and the L2 regular coefficient lambda;
the optimal weight is utilized to obtain the optimal solution obj of the target function (*)
Figure FDA0004013899990000039
Based on the optimal value obj*, in each training round all candidate CART regression tree models are evaluated with this index and the optimal model is selected; an exact greedy algorithm is adopted for optimization: for a new CART regression tree, starting from the root node, the difference of the objective function values before and after node splitting is calculated to decide whether to split;

based on obj*, the difference of the objective function values before and after node splitting is:

obj_split = ½ [ G_L² / (H_L + λ) + G_R² / (H_R + λ) − (G_L + G_R)² / (H_L + H_R + λ) ] − γ

wherein G_L, H_L and G_R, H_R are the gradient sums of the left and right child nodes after splitting; during the splitting of a CART regression tree node, some of the estimated input parameters are selected, all sample values under each selected feature are sorted, g_i and h_i of each sample are calculated, obj_split is then evaluated at all candidate segmentation points, and the input parameter and segmentation point with the maximum obj_split are selected as the optimal feature and optimal segmentation point of the node; the leaf node is split according to this optimal input parameter and segmentation point, finally generating a new CART regression tree; during the generation of the CART regression tree, generation stops when the model meets one of the following conditions: the CART regression tree reaches a preset maximum depth max_depth; or the sample weight sum is smaller than a preset threshold min_child_weight.
5. The quality estimation method based on the XGboost algorithm according to claim 4, wherein pruning is carried out on the newly generated CART regression tree, with the following main steps: tree nodes are evaluated from bottom to top, and whether the gain of the current node is less than a preset minimum gain is judged; if so, the node is pruned;
after pruning, the new CART regression tree is determined and the newly generated tree model is added to the current model:

F_s(x_i) = F_{s-1}(x_i) + η f_s(x_i)

wherein F_{s-1}(x_i) represents the prediction of the first s-1 training rounds for sample x_i, η is the shrinkage coefficient of the newly generated CART regression tree model, and f_s(x_i) is the new tree model trained in the s-th round; 0 ≤ η ≤ 1.
6. The quality estimation method based on the XGboost algorithm according to claim 5, wherein when the CART regression tree nodes are split, a layer-wise random column sampling method (colsample_bylevel) is adopted: before each node in the same layer of the CART regression tree is split, a subset of features is randomly selected, and the optimal splitting point is determined according to this subset; during model training, before the CART regression tree is generated, a subset of the training samples is randomly selected for training.
7. The quality estimation method based on the XGboost algorithm according to claim 4, wherein during model training one XGboost model generally predicts only one output parameter, and when several output parameters must be predicted simultaneously, the MultiOutputRegressor wrapper provided by sklearn is used to predict the wrapped set of output parameters, finally obtaining a single evaluation index.
8. The quality estimation method based on the XGBoost algorithm of claim 1, wherein in step S103, training and verification are performed with 80% as a training set and 20% as a verification set.
9. The XGboost algorithm-based quality estimation method according to claim 3, wherein in step S103, 5-fold cross validation is adopted to optimize the hyper-parameters during model training.
10. The quality estimation method based on the XGboost algorithm according to claim 1, wherein in step S103, when obtaining the final quality estimation model, at least one of the mean square error (MSE), the mean absolute error (MAE) and the R² coefficient of determination is adopted to evaluate the estimated models, and the optimal model among those whose errors meet the standard is selected as the final quality estimation model.
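The node-splitting gain described in claim 4 can be sketched as a small stand-alone function (a hedged illustration, not the patented implementation; the gradient sums in the example are illustrative):

```python
# Sketch of the split gain from claim 4:
# obj_split = 1/2 [ G_L^2/(H_L+lam) + G_R^2/(H_R+lam)
#                   - (G_L+G_R)^2/(H_L+H_R+lam) ] - gamma
def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    """Difference of objective values before and after splitting a node."""
    def score(g, h):
        return g * g / (h + lam)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right)
                  - score(g_left + g_right, h_left + h_right)) - gamma

# Opposite-sign gradient sums (a cleanly separating split) give a positive gain,
# so the exact greedy algorithm would accept this split.
print(split_gain(g_left=-4.0, h_left=2.0, g_right=4.0, h_right=2.0,
                 lam=1.0, gamma=0.5))
```

The γ term acts as a per-split complexity penalty: a candidate split is only worthwhile when the reduction in loss exceeds γ.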
CN202211663862.3A 2022-12-23 2022-12-23 Quality Estimation Method Based on XGBoost Algorithm Pending CN115938496A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211663862.3A CN115938496A (en) 2022-12-23 2022-12-23 Quality Estimation Method Based on XGBoost Algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211663862.3A CN115938496A (en) 2022-12-23 2022-12-23 Quality Estimation Method Based on XGBoost Algorithm

Publications (1)

Publication Number Publication Date
CN115938496A true CN115938496A (en) 2023-04-07

Family

ID=86655810

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211663862.3A Pending CN115938496A (en) 2022-12-23 2022-12-23 Quality Estimation Method Based on XGBoost Algorithm

Country Status (1)

Country Link
CN (1) CN115938496A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116305588A (en) * 2023-05-17 2023-06-23 中国航空工业集团公司沈阳空气动力研究所 Wind tunnel test data anomaly detection method, electronic equipment and storage medium
CN116305588B (en) * 2023-05-17 2023-08-11 中国航空工业集团公司沈阳空气动力研究所 Wind tunnel test data anomaly detection method, electronic equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination