CN117332676A - Fatigue performance prediction method based on self-adaptive feature selection - Google Patents
Fatigue performance prediction method based on self-adaptive feature selection
- Publication number
- CN117332676A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
- G06F30/27—Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/10—Pre-processing; Data cleansing
- G06F18/15—Statistical pre-processing, e.g. techniques for normalisation or restoring missing data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2101/00—Indexing scheme relating to the type of digital function generated
- G06F2101/14—Probability distribution functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2119/00—Details relating to the type or aim of the analysis or the optimisation
- G06F2119/02—Reliability analysis or reliability optimisation; Failure analysis, e.g. worst case scenario performance, failure mode and effects analysis [FMEA]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Data Mining & Analysis (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Geometry (AREA)
- Probability & Statistics with Applications (AREA)
- Computer Hardware Design (AREA)
- Software Systems (AREA)
- Medical Informatics (AREA)
- Investigating Strength Of Materials By Application Of Mechanical Stress (AREA)
Abstract
The invention relates to a fatigue performance prediction method based on self-adaptive feature selection, comprising the following steps: S1, collecting material-related data and dividing the data set into a training set and a test set; S2, combining the correlations of the training set and the test set into feature weights, and taking the sensitive features screened by the feature weights as sample data; S3, inputting the sample data into a support vector machine regression model for training, and inputting the sample test set into the trained support vector machine regression model to evaluate model performance; S4, inputting the combined test set into the trained ensemble regression model to evaluate model performance. The method helps identify the features most relevant to fatigue performance prediction, thereby potentially improving the accuracy of the prediction model; by selecting informative features and eliminating irrelevant ones, the model can concentrate on the key factors that influence fatigue behavior.
Description
Technical Field
The invention belongs to the technical field of material fatigue performance evaluation, and particularly relates to a fatigue performance prediction method based on self-adaptive feature selection.
Background
In modern industrial production, the fatigue performance of materials and parts is a very important physical quantity and plays a vital role in guaranteeing product quality, service life, and safety. In the past, fatigue performance prediction was generally based on engineering experience and experimental data; this approach relies heavily on experience, has low reliability, and is costly. In recent years, the wide application of machine learning methods has brought new opportunities for fatigue performance prediction.
However, several technical problems remain to be solved when applying machine learning to fatigue performance prediction. For example: how to preprocess the input data to reduce noise and errors; how to design and select appropriate feature extraction and processing techniques to capture the important features of materials or components; how to select and optimize a suitable machine learning algorithm to improve the accuracy, precision, and robustness of fatigue performance prediction; and how to evaluate and verify the prediction model to determine whether its quality and performance meet requirements.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a fatigue performance prediction method based on self-adaptive feature selection.
The invention solves the technical problem through the following technical solution:
A fatigue performance prediction method based on self-adaptive feature selection, comprising the following steps:
S1, collecting material-related data, normalizing the feature data to form a data set, and dividing the data set into a training set and a test set;
S2, extracting the correlation between each feature and the target variable using mutual information, combining the correlations of the training set and the test set by an adaptive weighting method to obtain feature weights, using the feature weights as the input of adaptive feature selection to screen the sensitive features, and taking the sensitive features as sample data;
S3, dividing the sample data into a sample training set and a sample test set, inputting the sample training set into a support vector machine regression model for training, inputting the sample test set into the trained support vector machine regression model, and using the coefficient of determination $R^2$ to evaluate model performance;
S4, combining the non-sensitive features with the sensitive features by enumeration, dividing the combined data into a combined training set and a combined test set, constructing an SVR-based ensemble regression model using BaggingRegressor, training the ensemble regression model on the combined training set, inputting the combined test set into the trained ensemble regression model, and using the coefficient of determination $R^2$ to evaluate model performance.
The specific steps of S1 are as follows:
The collected material-related data contains the chemical composition, process parameters, upstream processing characteristics, and the corresponding fatigue strength, and the resulting data set is denoted as $F = \{ f_{i,j}, y_i \mid i = 1, 2, \dots, n;\ j = 1, 2, \dots, k \}$,
wherein: $n$ is the number of samples;
$k$ is the number of features;
$f_{i,j}$ is the $j$-th feature of the $i$-th sample;
$y_i$ is the fatigue strength of the $i$-th sample;
The features of the data are normalized, and the normalized data set is divided into a training set and a test set;
The normalization is expressed as:
wherein: $f'_{i,j}$ is the normalized value of the $j$-th feature in the $i$-th sample;
$x_j$ denotes all values of the $j$-th feature.
The specific steps of S2 are as follows:
Mutual information is used to extract the correlation between the feature variables and the fatigue strength in both the training set and the test set. Mutual information measures the degree of association between two random variables and is not limited to linear relationships. It is an information-theoretic concept that describes the statistical dependence between two variables, i.e. how much information they share. Computing mutual information involves the joint probability distribution of the two random variables and their respective marginal probability distributions. For continuous random variables, the calculation formula is as follows:
$$I(f_j; Y) = \iint p(f_j, Y) \log \frac{p(f_j, Y)}{p(f_j)\, p(Y)} \, \mathrm{d}f_j \, \mathrm{d}Y$$
wherein: $p(f_j, Y)$ is the joint probability density function of $f_j$ and $Y$;
$p(f_j)$ and $p(Y)$ are the marginal probability density functions of $f_j$ and $Y$;
$I(f_j; Y)$ is the mutual information of $f_j$ and $Y$, which measures the degree of association between the two random variables $f_j$ and $Y$;
The correlations of the training set and the test set are combined by an adaptive weighting method to form the feature weights; the calculation formula is as follows:
$$W_j = \alpha I_1(f_j; Y) + \beta I_2(f_j; Y)$$
wherein: $W_j$ is the feature weight;
$I_1(f_j; Y)$ is the correlation of the training set;
$I_2(f_j; Y)$ is the correlation of the test set;
$\alpha$, $\beta$ are weight coefficients, with $\alpha + \beta = 1$;
A threshold $l_\alpha$ is calculated from the feature weights and the significance level; features whose weight $W_j$ exceeds the threshold $l_\alpha$ are selected as sensitive features and used as sample data. The calculation formula is as follows:
wherein: $\mu$ is the mean of the combined weights;
$\sigma$ is the variance of the combined weights;
$\alpha$ is the significance level of the chi-square distribution, and $2\mu^2/\sigma$ is its number of degrees of freedom.
The advantages and beneficial effects of the invention are:
1. The fatigue performance prediction method based on self-adaptive feature selection helps identify the features most relevant to fatigue performance prediction, thereby potentially improving the accuracy of the prediction model; by selecting informative features and eliminating irrelevant ones, the model can concentrate on the key factors that influence fatigue behavior.
2. The fatigue performance prediction method based on self-adaptive feature selection uses the data itself to identify important features, allowing the model to adapt to and learn from the available information, so it offers higher accuracy and adaptability when handling complex fatigue mechanisms that traditional models cannot fully capture. Enumerating the non-sensitive features also takes interactions and mutual influences among features into account, giving a more complete picture of each feature's contribution to the target variable.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic illustration of feature screening of the present invention;
FIG. 3 is a graph comparing the predicted results of the original data and the sensitive features of the present invention;
FIG. 4 is a schematic diagram of the prediction results of the combination features of the present invention.
Detailed Description
The invention is further illustrated by the following examples, which are intended to be illustrative only and not limiting in any way.
As shown in FIG. 1, a fatigue performance prediction method based on self-adaptive feature selection comprises the following steps:
Step S1: The collected material data contains the chemical composition, process parameters, upstream processing characteristics, and the corresponding fatigue strength. The data set is expressed as $F = \{ f_{i,j}, y_i \mid i = 1, 2, \dots, n;\ j = 1, 2, \dots, k \}$, wherein: $n$ is the number of samples, 437 in this example; $k$ is the number of features, 25 in this example, comprising the chemical composition variables (C, Si, Mn, P, S, Ni, Cr, Cu, Mo), the process parameter variables (normalizing temperature, through-hardening time, through-hardening cooling rate, carburizing temperature, carburizing time, diffusion temperature, diffusion time, quenching medium temperature, tempering time, tempering cooling rate), and the upstream processing characteristic variables (ingot ratio, area ratio of plastically deformed inclusions, area ratio of inclusions in discontinuous arrangement, area ratio of isolated inclusions); $f_{i,j}$ is the $j$-th feature of the $i$-th sample, and $y_i$ is the rotating bending fatigue strength of the $i$-th sample.
The data features are normalized, and the normalized data set is randomly divided into a training set (80%) and a test set (20%). The normalization is expressed as:
wherein: $f'_{i,j}$ is the normalized value of the $j$-th feature in the $i$-th sample;
$f_j$ denotes all feature values of the $j$-th column.
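The normalization and random split of step S1 can be sketched as follows. The normalization formula itself is not reproduced in this text, so the sketch assumes min-max scaling of each feature column to [0, 1]. That choice, the helper names `min_max_normalize` and `random_split`, and the synthetic data standing in for the 437 × 25 data set are all assumptions made for illustration.

```python
import numpy as np

def min_max_normalize(features):
    """Scale each feature column to [0, 1] (assumed min-max form)."""
    features = np.asarray(features, dtype=float)
    col_min = features.min(axis=0)
    col_range = features.max(axis=0) - col_min
    col_range[col_range == 0.0] = 1.0  # constant columns map to 0 instead of dividing by zero
    return (features - col_min) / col_range

def random_split(features, target, test_fraction=0.2, seed=0):
    """Randomly split rows into a training part and a test part."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(features))
    n_test = int(round(test_fraction * len(features)))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return features[train_idx], features[test_idx], target[train_idx], target[test_idx]

# Synthetic stand-in data with the shape reported in the example (437 samples, 25 features).
rng = np.random.default_rng(42)
raw_features = rng.uniform(0.0, 100.0, size=(437, 25))
fatigue_strength = rng.uniform(200.0, 1200.0, size=437)

normalized = min_max_normalize(raw_features)
f_train, f_test, y_train, y_test = random_split(normalized, fatigue_strength)
print(f_train.shape, f_test.shape)
```

If the method actually uses a different scaling (for example standardization to zero mean and unit variance), only `min_max_normalize` would need to change.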
Step S2: Mutual information is used to evaluate the correlation between the feature variables and the fatigue strength. For continuous random variables, the calculation formula is as follows:
$$I(f_j; Y) = \iint p(f_j, Y) \log \frac{p(f_j, Y)}{p(f_j)\, p(Y)} \, \mathrm{d}f_j \, \mathrm{d}Y$$
wherein: $p(f_j, Y)$ is the joint probability density function of $f_j$ and $Y$;
$p(f_j)$ and $p(Y)$ are the marginal probability density functions of $f_j$ and $Y$;
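In practice the densities are unknown, so the mutual information has to be estimated from finite samples. One way to do this is a nearest-neighbor estimator; the sketch below uses scikit-learn's `mutual_info_regression`. The source does not say which estimator is used, so this choice, the synthetic data, and `n_neighbors=3` are assumptions.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
n_samples = 437

# Synthetic stand-in: feature 0 drives the target nonlinearly,
# feature 1 contributes weakly, feature 2 is pure noise.
features = rng.uniform(0.0, 1.0, size=(n_samples, 3))
target = (
    np.sin(2.0 * np.pi * features[:, 0])
    + 0.2 * features[:, 1]
    + rng.normal(0.0, 0.05, n_samples)
)

# One estimate of I(f_j; Y) per feature column.
mi = mutual_info_regression(features, target, n_neighbors=3, random_state=0)
for j, value in enumerate(mi):
    print(f"I(f_{j}; Y) ~ {value:.3f}")
```

The sinusoidal dependence on feature 0 has almost no linear correlation with the target, which illustrates why mutual information is preferred here over a linear correlation coefficient.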
Weight coefficients are adaptively assigned to the correlations of the training set and the test set to calculate the feature weight $W_j$; the calculation formula is as follows:
$$W_j = \alpha I_1(f_j; Y) + \beta I_2(f_j; Y)$$
wherein: $W_j$ is the feature weight;
$I_1(f_j; Y)$ is the correlation of the training set;
$I_2(f_j; Y)$ is the correlation of the test set;
$\alpha$, $\beta$ are weight coefficients, with $\alpha + \beta = 1$;
A threshold $l_\alpha$ is calculated from the feature weights and the significance level, which is 0.01 here; features whose weight $W_j$ exceeds the threshold $l_\alpha$ are selected as sensitive features and used as sample data, as shown in FIG. 2. The calculation formula is as follows:
wherein: $\mu$ is the mean of the combined weights, $\sigma$ is the variance of the combined weights, $\alpha$ is the significance level of the chi-square distribution, and $2\mu^2/\sigma$ is its number of degrees of freedom.
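The weighting and screening steps can be sketched as follows. The threshold formula is not reproduced in this text, so the sketch takes $l_\alpha$ as an input instead of computing it. The function names, the mutual information values, and the threshold value 0.5 are hypothetical; only the weight coefficients 0.65 and 0.35 are taken from the example reported for this embodiment.

```python
import numpy as np

def combine_weights(mi_train, mi_test, alpha):
    """Feature weights W_j = alpha * I_1(f_j; Y) + beta * I_2(f_j; Y), with beta = 1 - alpha."""
    beta = 1.0 - alpha
    return alpha * np.asarray(mi_train, dtype=float) + beta * np.asarray(mi_test, dtype=float)

def select_sensitive(weights, threshold):
    """Boolean mask of features whose weight exceeds the threshold l_alpha."""
    return np.asarray(weights) > threshold

# Hypothetical mutual information values for 8 features on each split.
mi_train = np.array([0.90, 0.75, 0.10, 0.05, 0.08, 0.12, 0.60, 0.03])
mi_test = np.array([0.80, 0.70, 0.15, 0.02, 0.10, 0.09, 0.55, 0.06])

weights = combine_weights(mi_train, mi_test, alpha=0.65)
sensitive_mask = select_sensitive(weights, threshold=0.5)  # threshold value is illustrative only
sensitive_idx = np.flatnonzero(sensitive_mask)
print(sensitive_idx)
```

Features not selected by the mask form the non-sensitive set that step S4 later recombines with the sensitive features.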
Step S3: The sample data is divided into a sample training set (80%) and a sample test set (20%). The sample training set is input into the support vector regression model for training to obtain the model parameters, the sample test set is input into the trained model, and the coefficient of determination $R^2$ is used to evaluate model performance; the optimal weight coefficient combination is output through successive iterations of the model.
$R^2$ is calculated as follows:
$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$
wherein $y_i$ is the true fatigue strength, $\hat{y}_i$ is the predicted fatigue strength, and $\bar{y}$ is the mean of the true fatigue strength.
The optimal weight coefficient combination is $\alpha = 0.65$, $\beta = 0.35$. The test-set prediction result for the original data is $R^2 = 0.974$, and the test-set prediction result for the sensitive features is $R^2 = 0.961$, as shown in FIG. 3.
Although a large number of features are removed, the screened sensitive features still maintain good predictive capability, which shows that these features are highly relevant and important for predicting the target variable and further verifies the validity of the adaptive feature selection.
Step S4: The non-sensitive features and the sensitive features are combined by enumeration. BaggingRegressor constructs 10 SVR base models by random sampling with replacement from the combined training set, and the predictions of the base models are averaged to obtain the final ensemble prediction. The combined test set is input into the trained ensemble model, and the coefficient of determination $R^2$ is used to evaluate model performance.
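The bagging step can be sketched with scikit-learn's `BaggingRegressor` wrapping `SVR`. The synthetic data and the SVR hyperparameters are assumptions; the number of base models (10) and sampling with replacement follow the description above.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic stand-in for one combined feature set and the fatigue strength.
rng = np.random.default_rng(2)
combined = rng.uniform(0.0, 1.0, size=(437, 6))
strength = combined[:, 0] + 0.5 * combined[:, 1] ** 2 + rng.normal(0.0, 0.01, 437)

x_train, x_test, y_train, y_test = train_test_split(
    combined, strength, test_size=0.2, random_state=0
)

# 10 SVR base models, each fitted on a bootstrap sample (drawn with replacement).
# The base estimator is passed positionally because its keyword name
# differs between scikit-learn versions.
ensemble = BaggingRegressor(
    SVR(C=10.0, epsilon=0.01), n_estimators=10, bootstrap=True, random_state=0
)
ensemble.fit(x_train, y_train)

# The ensemble prediction is the mean of the base-model predictions.
base_predictions = np.array([
    base.predict(x_test[:, cols])
    for base, cols in zip(ensemble.estimators_, ensemble.estimators_features_)
])
manual_average = base_predictions.mean(axis=0)

print(f"test R^2 = {ensemble.score(x_test, y_test):.3f}")
```

Averaging over bootstrap-trained base models mainly reduces the variance of the prediction, which is a plausible reason for pairing it with a relatively small data set such as this one.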
The optimal combination of features is shown in FIG. 2. Reasonably combining the sensitive and non-sensitive features captures higher-level feature interactions and patterns, giving a coefficient of determination $R^2 = 0.983$, as shown in FIG. 4, which is an improvement over the $R^2$ of both the original data and the sensitive features. This helps the model better interpret and predict the data and improves its performance in practical applications.
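The enumeration in step S4 can be read as extending the sensitive feature set with every subset of the non-sensitive features and scoring each candidate. That reading, the function name `enumerate_feature_sets`, and the split of feature names used below are assumptions; the text does not list which features were selected.

```python
from itertools import combinations

def enumerate_feature_sets(sensitive, non_sensitive):
    """Yield the sensitive features extended by each subset of the non-sensitive ones."""
    for size in range(len(non_sensitive) + 1):
        for extra in combinations(non_sensitive, size):
            yield tuple(sensitive) + extra

# Hypothetical split of a few feature names; the real split comes from the screening step.
sensitive = ["C", "Cr", "carburizing temperature"]
non_sensitive = ["Si", "Mn", "P"]

candidates = list(enumerate_feature_sets(sensitive, non_sensitive))
print(len(candidates))  # 2**3 = 8 candidate feature sets
for candidate in candidates[:3]:
    print(candidate)
```

Each candidate would then be split into a combined training set and a combined test set and scored by the test $R^2$ of the bagged SVR model. The number of candidates doubles with every additional non-sensitive feature, so a full enumeration is only practical when that set is small or the subset size is capped.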
Although the embodiments of the present invention and the accompanying drawings have been disclosed for illustrative purposes, those skilled in the art will appreciate that: various substitutions, changes and modifications are possible without departing from the spirit and scope of the invention and the appended claims, and therefore the scope of the invention is not limited to the embodiments and the disclosure of the drawings.
Claims (3)
1. A fatigue performance prediction method based on self-adaptive feature selection, characterized by comprising the following steps:
S1, collecting material-related data, normalizing the feature data to form a data set, and dividing the data set into a training set and a test set;
S2, extracting the correlation between each feature and the target variable using mutual information, combining the correlations of the training set and the test set by an adaptive weighting method to obtain feature weights, using the feature weights as the input of adaptive feature selection to screen the sensitive features, and taking the sensitive features as sample data;
S3, dividing the sample data into a sample training set and a sample test set, inputting the sample training set into a support vector machine regression model for training, inputting the sample test set into the trained support vector machine regression model, and using the coefficient of determination $R^2$ to evaluate model performance;
S4, combining the non-sensitive features with the sensitive features by enumeration, dividing the combined data into a combined training set and a combined test set, constructing an SVR-based ensemble regression model using BaggingRegressor, training the ensemble regression model on the combined training set, inputting the combined test set into the trained ensemble regression model, and using the coefficient of determination $R^2$ to evaluate model performance.
2. The fatigue performance prediction method based on self-adaptive feature selection according to claim 1, wherein the specific steps of S1 are as follows:
The collected material-related data contains the chemical composition, process parameters, upstream processing characteristics, and the corresponding fatigue strength, and the resulting data set is denoted as $F = \{ f_{i,j}, y_i \mid i = 1, 2, \dots, n;\ j = 1, 2, \dots, k \}$,
wherein: $n$ is the number of samples;
$k$ is the number of features;
$f_{i,j}$ is the $j$-th feature of the $i$-th sample;
$y_i$ is the fatigue strength of the $i$-th sample;
The features of the data are normalized, and the normalized data set is divided into a training set and a test set;
The normalization is expressed as:
wherein: $f'_{i,j}$ is the normalized value of the $j$-th feature in the $i$-th sample;
$x_j$ denotes all values of the $j$-th feature.
3. The fatigue performance prediction method based on self-adaptive feature selection according to claim 1, wherein the specific steps of S2 are as follows:
Mutual information is used to extract the correlation between the feature variables and the fatigue strength in both the training set and the test set. Mutual information measures the degree of association between two random variables and is not limited to linear relationships. It is an information-theoretic concept that describes the statistical dependence between two variables, i.e. how much information they share. Computing mutual information involves the joint probability distribution of the two random variables and their respective marginal probability distributions. For continuous random variables, the calculation formula is as follows:
$$I(f_j; Y) = \iint p(f_j, Y) \log \frac{p(f_j, Y)}{p(f_j)\, p(Y)} \, \mathrm{d}f_j \, \mathrm{d}Y$$
wherein: $p(f_j, Y)$ is the joint probability density function of $f_j$ and $Y$;
$p(f_j)$ and $p(Y)$ are the marginal probability density functions of $f_j$ and $Y$;
$I(f_j; Y)$ is the mutual information of $f_j$ and $Y$, which measures the degree of association between the two random variables $f_j$ and $Y$;
The correlations of the training set and the test set are combined by an adaptive weighting method to form the feature weights; the calculation formula is as follows:
$$W_j = \alpha I_1(f_j; Y) + \beta I_2(f_j; Y)$$
wherein: $W_j$ is the feature weight;
$I_1(f_j; Y)$ is the correlation of the training set;
$I_2(f_j; Y)$ is the correlation of the test set;
$\alpha$, $\beta$ are weight coefficients, with $\alpha + \beta = 1$;
A threshold $l_\alpha$ is calculated from the feature weights and the significance level; features whose weight $W_j$ exceeds the threshold $l_\alpha$ are selected as sensitive features and used as sample data. The calculation formula is as follows:
wherein: $\mu$ is the mean of the combined weights;
$\sigma$ is the variance of the combined weights;
$\alpha$ is the significance level of the chi-square distribution, and $2\mu^2/\sigma$ is its number of degrees of freedom.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311164299.XA CN117332676A (en) | 2023-09-11 | 2023-09-11 | Fatigue performance prediction method based on self-adaptive feature selection |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311164299.XA CN117332676A (en) | 2023-09-11 | 2023-09-11 | Fatigue performance prediction method based on self-adaptive feature selection |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117332676A (en) | 2024-01-02 |
Family
ID=89276275
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311164299.XA Pending CN117332676A (en) | 2023-09-11 | 2023-09-11 | Fatigue performance prediction method based on self-adaptive feature selection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117332676A (en) |
-
2023
- 2023-09-11 CN CN202311164299.XA patent/CN117332676A/en active Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||