CN117332676A - Fatigue performance prediction method based on self-adaptive feature selection - Google Patents

Fatigue performance prediction method based on self-adaptive feature selection

Info

Publication number
CN117332676A
Authority
CN
China
Prior art keywords
data
sample
inputting
model
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311164299.XA
Other languages
Chinese (zh)
Inventor
武川
姚磊
黎振
蔡玉俊
王琳宁
王浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University of Technology and Education China Vocational Training Instructor Training Center
Original Assignee
Tianjin University of Technology and Education China Vocational Training Instructor Training Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University of Technology and Education China Vocational Training Instructor Training Center

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/10 Pre-processing; Data cleansing
    • G06F18/15 Statistical pre-processing, e.g. techniques for normalisation or restoring missing data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2101/00 Indexing scheme relating to the type of digital function generated
    • G06F2101/14 Probability distribution functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00 Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/02 Reliability analysis or reliability optimisation; Failure analysis, e.g. worst case scenario performance, failure mode and effects analysis [FMEA]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Geometry (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Investigating Strength Of Materials By Application Of Mechanical Stress (AREA)

Abstract

The invention relates to a fatigue performance prediction method based on self-adaptive feature selection, comprising the following steps: S1, collecting material-related data and dividing the data set into a training set and a test set; S2, combining the correlations of the training set and the test set into feature weights, and taking the sensitive features screened by feature weight as sample data; S3, inputting the sample data into a support vector machine regression model for training, and inputting the sample test set into the trained support vector machine regression model to evaluate model performance; S4, inputting the combined training set into the integrated regression model for training, inputting the combined test set into the trained integrated regression model, and evaluating model performance. The method helps identify the features most relevant to fatigue performance prediction, thereby potentially improving the accuracy of the prediction model; by selecting informative features and eliminating irrelevant ones, the model can concentrate on the key factors that influence fatigue behavior.

Description

Fatigue performance prediction method based on self-adaptive feature selection
Technical Field
The invention belongs to the technical field of material fatigue performance evaluation, and particularly relates to a fatigue performance prediction method based on self-adaptive feature selection.
Background
In modern industrial production, the fatigue performance of materials or parts is a very important physical quantity, and plays a vital role in guaranteeing the quality, the service life and the safety of products. In the past, fatigue performance predictions were generally based on engineering experience and experimental data, and this approach has problems of great dependence, low reliability and high cost. In recent years, the wide application of machine learning methods has brought new opportunities for fatigue performance prediction.
However, there are also some technical problems to be solved in the course of applying machine learning to fatigue performance prediction. For example, how to preprocess the input data and reduce noise and errors of the data; how to design and select appropriate feature extraction and processing techniques to capture important features of materials or components; how to select a proper machine learning algorithm and optimize to improve the accuracy, precision and robustness of fatigue performance prediction; and how to evaluate and verify the predictive model to determine whether its quality and performance meet requirements, etc.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a fatigue performance prediction method based on self-adaptive feature selection.
The invention solves the technical problems by the following technical solution:
A fatigue performance prediction method based on self-adaptive feature selection, characterized in that the prediction method comprises the following steps:
S1, collecting material-related data, normalizing the feature data to form a data set, and dividing the data set into a training set and a test set;
S2, extracting the correlation between the features and the target variable using mutual information, combining the correlations of the training set and the test set by a self-adaptive weighting method to form feature weights, taking the feature weights as the input of self-adaptive feature selection, screening the sensitive features, and taking the sensitive features as sample data;
S3, dividing the sample data into a sample training set and a sample test set, inputting the sample training set into a support vector machine regression model for training, inputting the sample test set into the trained support vector machine regression model, and using the coefficient of determination $R^2$ to evaluate model performance;
S4, combining the non-sensitive features and the sensitive features by enumeration, dividing the combined data into a combined training set and a combined test set, constructing an SVR-based integrated regression model using BaggingRegressor, inputting the combined training set into the integrated regression model for training, inputting the combined test set into the trained integrated regression model, and using the coefficient of determination $R^2$ to evaluate model performance.
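The enumeration in S4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `enumerate_feature_combinations` and the `max_extra` cap are hypothetical, since the text does not say how many non-sensitive features are added to the sensitive ones at a time.

```python
from itertools import combinations


def enumerate_feature_combinations(sensitive, non_sensitive, max_extra=None):
    """Yield feature-index tuples made of all sensitive features plus each
    non-empty subset of the non-sensitive features (at most max_extra extras)."""
    if max_extra is None:
        max_extra = len(non_sensitive)
    for size in range(1, max_extra + 1):
        for extra in combinations(non_sensitive, size):
            yield tuple(sensitive) + extra


# Hypothetical column indices: two sensitive and three non-sensitive features.
candidates = list(enumerate_feature_combinations([0, 1], [2, 3, 4], max_extra=2))
```

Each candidate tuple would then be used to slice the feature matrix before the combined data are split and passed to the integrated regression model. The number of candidates grows exponentially with the number of non-sensitive features, which is why a cap such as `max_extra` may be useful in practice.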
The specific steps of S1 are as follows:
The collected material-related data contain chemical composition, process parameters, upstream processing characteristics, and the corresponding fatigue strength. The resulting data set is denoted $F = \{f_{i,j}, y_i;\ i = 1, 2, \dots, n;\ j = 1, 2, \dots, k\}$,
where: n is the number of samples;
k is the number of features;
$f_{i,j}$ is the j-th feature of the i-th sample;
$y_i$ is the fatigue strength of the i-th sample;
The features of the data are normalized, and the normalized data set is divided into a training set and a test set;
the normalization process is expressed as:
where: $f'_{i,j}$ is the normalized value of the j-th feature of the i-th sample;
$x_j$ denotes all values of the j-th feature.
The specific steps of S2 are as follows:
Mutual information is used to extract the correlation between the feature variables and the fatigue strength in both the training set and the test set. Mutual information measures the degree of association between two random variables and is not limited to linear relationships. It is an information-theoretic quantity that describes the statistical dependence between two variables, that is, how much information they share. Computing it requires the joint probability distribution of the two random variables and their respective marginal probability distributions. For continuous random variables, it is calculated as:
$$I(f_j; Y) = \iint p(f_j, y) \log \frac{p(f_j, y)}{p(f_j)\, p(y)} \, \mathrm{d}f_j \, \mathrm{d}y$$
where: $p(f_j, y)$ is the joint probability density function of $f_j$ and $Y$;
$p(f_j)$ and $p(y)$ are the marginal probability density functions of $f_j$ and $Y$;
$I(f_j; Y)$ is the mutual information of $f_j$ and $Y$, which measures the dependence between the two random variables;
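To make the definition concrete, the integral can be approximated by discretizing both variables and summing over histogram bins. The sketch below is illustrative only: the text does not say which estimator the method uses, and the function name `histogram_mutual_information` and the choice of 10 bins are assumptions.

```python
import numpy as np


def histogram_mutual_information(x, y, bins=10):
    """Plug-in estimate of I(X; Y) in nats from a 2-D histogram."""
    joint_counts, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint_counts / joint_counts.sum()   # joint probabilities
    p_x = p_xy.sum(axis=1, keepdims=True)      # marginal of x, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)      # marginal of y, shape (1, bins)
    occupied = p_xy > 0                        # skip empty cells (0 * log 0 = 0)
    ratio = p_xy[occupied] / (p_x @ p_y)[occupied]
    return float(np.sum(p_xy[occupied] * np.log(ratio)))
```

A feature that strongly determines the target yields a large value, while an unrelated feature yields a value close to zero, regardless of whether the relationship is linear.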
The correlations of the training set and the test set are combined by a self-adaptive weighting method to form the feature weights, calculated as:
$$W_j = \alpha I_1(f_j; Y) + \beta I_2(f_j; Y)$$
where: $W_j$ is the feature weight;
$I_1(f_j; Y)$ is the correlation of the training set;
$I_2(f_j; Y)$ is the correlation of the test set;
$\alpha$, $\beta$ are weight coefficients, with $\alpha + \beta = 1$;
A threshold $l_\alpha$ is calculated from the feature weights and the significance level. Features whose weight $W_j$ exceeds the threshold $l_\alpha$ are selected as sensitive features and used as sample data. The calculation formula is as follows:
where: $\mu$ is the mean of the combined weights;
$\sigma$ is the variance of the combined weights;
$\alpha$ is the significance level of the chi-square distribution, and $2\mu^2/\sigma$ is its number of degrees of freedom.
The invention has the advantages and beneficial effects that:
1. The fatigue performance prediction method based on self-adaptive feature selection helps identify the features most relevant to fatigue performance prediction, thereby potentially improving the accuracy of the prediction model. By selecting informative features and eliminating irrelevant ones, the model can concentrate on the key factors that influence fatigue behavior.
2. The fatigue performance prediction method based on self-adaptive feature selection uses the data to identify important features and allows the model to adapt to and learn from the available information, giving higher accuracy and adaptability when handling complex fatigue mechanisms that traditional models cannot fully capture. Enumerating the non-sensitive features makes it possible to account for interactions among features, so that the contribution of each feature to the target variable is understood more comprehensively.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic illustration of feature screening of the present invention;
FIG. 3 is a graph comparing the predicted results of the original data and the sensitive features of the present invention;
FIG. 4 is a schematic diagram of the prediction results of the combination features of the present invention.
Detailed Description
The invention is further illustrated by the following examples, which are intended to be illustrative only and not limiting in any way.
As shown in FIG. 1, a fatigue performance prediction method based on self-adaptive feature selection comprises the following steps:
Step S1: The collected material data contain chemical composition, process parameters, upstream processing characteristics and the corresponding fatigue strength. The data set is expressed as $F = \{f_{i,j}, y_i;\ i = 1, 2, \dots, n;\ j = 1, 2, \dots, k\}$, where n is the number of samples, 437 in this case, and k is the number of features, 25 in this case. The features include the chemical composition variables (C, Si, Mn, P, S, Ni, Cr, Cu, Mo), the process parameter variables (normalizing temperature, through-hardening time, through-hardening cooling rate, carburizing temperature, carburizing time, diffusion temperature, diffusion time, quenching medium temperature, tempering time, tempering cooling rate), and the upstream processing variables (ingot ratio, area ratio of plastically deformed inclusions, area ratio of inclusions in discontinuous arrangement, area ratio of isolated inclusions). $f_{i,j}$ is the j-th feature of the i-th sample, and $y_i$ is the rotating bending fatigue strength of the i-th sample.
The features of the data are normalized, and the normalized data set is randomly divided into a training set (80%) and a test set (20%). The normalization is expressed as:
where: $f'_{i,j}$ is the normalized value of the j-th feature of the i-th sample;
$f_j$ denotes all values in the j-th feature column.
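The normalization formula is not reproduced in the text. The sketch below therefore assumes min-max scaling of each feature column to [0, 1], which is one common choice and not confirmed by the source, followed by the random 80/20 split described above. The function names and the fixed seed are hypothetical.

```python
import numpy as np


def min_max_normalize(features):
    """Scale every feature column to [0, 1] (assumed normalization scheme)."""
    features = np.asarray(features, dtype=float)
    col_min = features.min(axis=0)
    col_range = features.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # constant columns map to 0 instead of NaN
    return (features - col_min) / col_range


def random_split(features, targets, test_fraction=0.2, seed=0):
    """Randomly split samples into training and test parts."""
    features = np.asarray(features)
    targets = np.asarray(targets)
    order = np.random.default_rng(seed).permutation(len(features))
    n_test = int(round(test_fraction * len(features)))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return features[train_idx], features[test_idx], targets[train_idx], targets[test_idx]
```

With the 437 samples of this embodiment, a 20% test fraction gives 87 test samples and 350 training samples.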
Step S2: Mutual information is used to evaluate the feature weights between the feature variables and the fatigue strength. For continuous random variables, it is calculated as:
$$I(f_j; Y) = \iint p(f_j, y) \log \frac{p(f_j, y)}{p(f_j)\, p(y)} \, \mathrm{d}f_j \, \mathrm{d}y$$
where: $p(f_j, y)$ is the joint probability density function of $f_j$ and $Y$;
$p(f_j)$ and $p(y)$ are the marginal probability density functions of $f_j$ and $Y$;
The correlations of the training set and the test set are adaptively assigned weight coefficients to calculate the feature weight $W_j$, as follows:
$$W_j = \alpha I_1(f_j; Y) + \beta I_2(f_j; Y)$$
where: $W_j$ is the feature weight;
$I_1(f_j; Y)$ is the correlation of the training set;
$I_2(f_j; Y)$ is the correlation of the test set;
$\alpha$, $\beta$ are weight coefficients, with $\alpha + \beta = 1$;
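The weighting step can be sketched with scikit-learn's `mutual_info_regression`, which estimates mutual information between each feature column and a continuous target using a nearest-neighbor method. Whether the method described here uses this estimator is not stated, so treat it as an assumption; the function name `combined_feature_weights` and the `seed` parameter are likewise hypothetical.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression


def combined_feature_weights(X_train, y_train, X_test, y_test, alpha=0.65, seed=0):
    """Return W_j = alpha * I_1(f_j; Y) + beta * I_2(f_j; Y), with beta = 1 - alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    beta = 1.0 - alpha
    mi_train = mutual_info_regression(X_train, y_train, random_state=seed)  # I_1
    mi_test = mutual_info_regression(X_test, y_test, random_state=seed)     # I_2
    return alpha * mi_train + beta * mi_test
```

Because $\beta$ is derived from $\alpha$, the constraint $\alpha + \beta = 1$ holds by construction, and sweeping `alpha` over a grid is enough to explore all weight-coefficient combinations.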
A threshold $l_\alpha$ is calculated from the feature weights and the significance level, which is 0.01. Features whose weight $W_j$ exceeds the threshold $l_\alpha$ are selected as sensitive features and used as sample data, as shown in FIG. 2. The calculation formula is as follows:
where $\mu$ is the mean of the combined weights, $\sigma$ is the variance of the combined weights, $\alpha$ is the significance level of the chi-square distribution, and $2\mu^2/\sigma$ is its number of degrees of freedom.
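The text does not reproduce the threshold formula. One form that is consistent with the stated degrees of freedom $2\mu^2/\sigma$ is the scaled chi-square approximation $l_\alpha = \frac{\sigma}{2\mu}\,\chi^2_\alpha\!\left(\frac{2\mu^2}{\sigma}\right)$. The sketch below uses that form purely as an assumption. It also assumes that $\chi^2_\alpha$ denotes the upper-$\alpha$ quantile. The function names are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def weight_threshold(weights, significance=0.01):
    """Assumed threshold: (var / (2 * mean)) * chi-square quantile with
    2 * mean**2 / var degrees of freedom, taken at the 1 - significance level."""
    weights = np.asarray(weights, dtype=float)
    mu = weights.mean()
    var = weights.var()               # sigma in the text denotes the variance
    dof = 2.0 * mu ** 2 / var         # degrees of freedom stated in the text
    scale = var / (2.0 * mu)          # assumed scale factor, not in the source
    return scale * chi2.ppf(1.0 - significance, dof)


def select_sensitive(weights, significance=0.01):
    """Return indices of features whose weight exceeds the threshold."""
    threshold = weight_threshold(weights, significance)
    return np.flatnonzero(np.asarray(weights) > threshold), threshold
```

Under this reading, a smaller significance level gives a higher threshold and therefore fewer sensitive features.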
Step S3: The sample data are divided into a sample training set (80%) and a sample test set (20%). The sample training set is input into the support vector regression model for training to obtain the model parameters, the sample test set is input into the trained model, and the coefficient of determination $R^2$ is used to evaluate model performance. The optimal combination of weight coefficients is obtained through successive iterations of the model.
$R^2$ is calculated as:
$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$
where $y_i$ is the true fatigue strength, $\hat{y}_i$ is the predicted fatigue strength, and $\bar{y}$ is the mean of the true fatigue strength.
The optimal combination of weight coefficients is $\alpha = 0.65$, $\beta = 0.35$. The test-set prediction result is $R^2 = 0.974$ for the original data and $R^2 = 0.961$ for the sensitive features, as shown in FIG. 3.
Although a large number of features are removed, the screened sensitive features still can maintain good prediction capability, which shows that the features have higher relevance and importance for the prediction of the target variable, and the validity of adaptive feature selection is further verified.
Step S4: combining the non-sensitive features and the sensitive features by adopting an enumeration method, constructing 10 SVR base models by performing put-back random sampling on a combined training set by BaggingRegoresor, averaging the prediction results of each base model to obtain a final integrated prediction result, inputting the combined testing set into the trained integrated model, and using a decision coefficient R 2 To evaluate model performance.
The optimal combined features are shown in FIG. 2. Reasonably combining the sensitive and non-sensitive features captures higher-level feature interactions and patterns, giving a coefficient of determination $R^2 = 0.983$, as shown in FIG. 4, which is higher than the $R^2$ obtained with the original data (0.974) and with the sensitive features alone (0.961). These methods can help the model better interpret and predict data and improve the performance of the model in practical applications.
Although the embodiments of the present invention and the accompanying drawings have been disclosed for illustrative purposes, those skilled in the art will appreciate that: various substitutions, changes and modifications are possible without departing from the spirit and scope of the invention and the appended claims, and therefore the scope of the invention is not limited to the embodiments and the disclosure of the drawings.

Claims (3)

1. A fatigue performance prediction method based on self-adaptive feature selection, characterized in that the prediction method comprises the following steps:
S1, collecting material-related data, normalizing the feature data to form a data set, and dividing the data set into a training set and a test set;
S2, extracting the correlation between the features and the target variable using mutual information, combining the correlations of the training set and the test set by a self-adaptive weighting method to form feature weights, taking the feature weights as the input of self-adaptive feature selection, screening the sensitive features, and taking the sensitive features as sample data;
S3, dividing the sample data into a sample training set and a sample test set, inputting the sample training set into a support vector machine regression model for training, inputting the sample test set into the trained support vector machine regression model, and using the coefficient of determination $R^2$ to evaluate model performance;
S4, combining the non-sensitive features and the sensitive features by enumeration, dividing the combined data into a combined training set and a combined test set, constructing an SVR-based integrated regression model using BaggingRegressor, inputting the combined training set into the integrated regression model for training, inputting the combined test set into the trained integrated regression model, and using the coefficient of determination $R^2$ to evaluate model performance.
2. The fatigue performance prediction method based on self-adaptive feature selection according to claim 1, wherein the specific steps of S1 are as follows:
The collected material-related data contain chemical composition, process parameters, upstream processing characteristics, and the corresponding fatigue strength. The resulting data set is denoted $F = \{f_{i,j}, y_i;\ i = 1, 2, \dots, n;\ j = 1, 2, \dots, k\}$,
where: n is the number of samples;
k is the number of features;
$f_{i,j}$ is the j-th feature of the i-th sample;
$y_i$ is the fatigue strength of the i-th sample;
The features of the data are normalized, and the normalized data set is divided into a training set and a test set;
the normalization process is expressed as:
where: $f'_{i,j}$ is the normalized value of the j-th feature of the i-th sample;
$x_j$ denotes all values of the j-th feature.
3. The fatigue performance prediction method based on self-adaptive feature selection according to claim 1, wherein the specific steps of S2 are as follows:
Mutual information is used to extract the correlation between the feature variables and the fatigue strength in both the training set and the test set. Mutual information measures the degree of association between two random variables and is not limited to linear relationships. It is an information-theoretic quantity that describes the statistical dependence between two variables, that is, how much information they share. Computing it requires the joint probability distribution of the two random variables and their respective marginal probability distributions. For continuous random variables, it is calculated as:
$$I(f_j; Y) = \iint p(f_j, y) \log \frac{p(f_j, y)}{p(f_j)\, p(y)} \, \mathrm{d}f_j \, \mathrm{d}y$$
where: $p(f_j, y)$ is the joint probability density function of $f_j$ and $Y$;
$p(f_j)$ and $p(y)$ are the marginal probability density functions of $f_j$ and $Y$;
$I(f_j; Y)$ is the mutual information of $f_j$ and $Y$, which measures the dependence between the two random variables;
The correlations of the training set and the test set are combined by a self-adaptive weighting method to form the feature weights, calculated as:
$$W_j = \alpha I_1(f_j; Y) + \beta I_2(f_j; Y)$$
where: $W_j$ is the feature weight;
$I_1(f_j; Y)$ is the correlation of the training set;
$I_2(f_j; Y)$ is the correlation of the test set;
$\alpha$, $\beta$ are weight coefficients, with $\alpha + \beta = 1$;
A threshold $l_\alpha$ is calculated from the feature weights and the significance level. Features whose weight $W_j$ exceeds the threshold $l_\alpha$ are selected as sensitive features and used as sample data. The calculation formula is as follows:
where: $\mu$ is the mean of the combined weights;
$\sigma$ is the variance of the combined weights;
$\alpha$ is the significance level of the chi-square distribution, and $2\mu^2/\sigma$ is its number of degrees of freedom.
CN202311164299.XA 2023-09-11 2023-09-11 Fatigue performance prediction method based on self-adaptive feature selection Pending CN117332676A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311164299.XA CN117332676A (en) 2023-09-11 2023-09-11 Fatigue performance prediction method based on self-adaptive feature selection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311164299.XA CN117332676A (en) 2023-09-11 2023-09-11 Fatigue performance prediction method based on self-adaptive feature selection

Publications (1)

Publication Number Publication Date
CN117332676A true CN117332676A (en) 2024-01-02

Family

ID=89276275

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311164299.XA Pending CN117332676A (en) 2023-09-11 2023-09-11 Fatigue performance prediction method based on self-adaptive feature selection

Country Status (1)

Country Link
CN (1) CN117332676A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination