CN115527625A - Hardness prediction method and system for high-entropy alloy - Google Patents
- Publication number
- CN115527625A (application number CN202211277619.8A)
- Authority
- CN
- China
- Prior art keywords
- hardness
- entropy alloy
- alloy
- classifier
- alcocrcufeni
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/30—Prediction of properties of chemical compounds, compositions or mixtures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/60—In silico combinatorial chemistry
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/70—Machine learning, data mining or chemometrics
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C60/00—Computational materials science, i.e. ICT specially adapted for investigating the physical or chemical properties of materials or phenomena associated with their design, synthesis, processing, characterisation or utilisation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computing Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Chemical & Material Sciences (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Crystallography & Structural Chemistry (AREA)
- Mathematical Physics (AREA)
- General Health & Medical Sciences (AREA)
- Mathematical Optimization (AREA)
- Medical Informatics (AREA)
- Databases & Information Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computational Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Mathematical Analysis (AREA)
- Evolutionary Computation (AREA)
- Health & Medical Sciences (AREA)
- Pure & Applied Mathematics (AREA)
- Probability & Statistics with Applications (AREA)
- Algebra (AREA)
- Operations Research (AREA)
- Evolutionary Biology (AREA)
- Medicinal Chemistry (AREA)
- Investigating And Analyzing Materials By Characteristic Methods (AREA)
Abstract
The invention relates to a hardness prediction method and system for a high-entropy alloy. The method comprises: obtaining candidate features corresponding to AlCoCrCuFeNi-system high-entropy alloy hardness data and establishing a data set; training a Stacking ensemble model using the data set; screening the candidate features using the Pearson correlation coefficient, an XGBoost evaluation model, a random forest, a genetic algorithm, XGBoost-based recursive feature elimination and an exhaustive search, and determining the screened features; establishing a classifier using principal component analysis and logistic regression according to the screened features; constructing an alloy composition search space according to the trained Stacking ensemble model and the classifier; and predicting hardness according to the alloy composition search space. The method can improve the accuracy of hardness prediction for high-entropy alloys.
Description
Technical Field
The invention relates to the technical field of metal material design, in particular to a hardness prediction method and system for a high-entropy alloy.
Background
Conventional alloys are generally based on one or two principal elements, with other elements added to improve material properties. High-entropy alloys, in contrast, are designed with four or more principal elements in equimolar or near-equimolar ratios. Owing to their complex and variable structure, high-entropy alloys exhibit several characteristic effects, such as the thermodynamic high-entropy effect, the severe lattice distortion effect, the sluggish diffusion effect and the cocktail effect.
Although high-entropy alloys can have excellent properties, their complex composition and structure make it difficult to find alloys that actually do. The main screening approach relies on prior experience and repeated trial and error, which is inefficient. In recent years, computation-based material design techniques have developed rapidly and have greatly promoted the development of new materials. In particular, with the rapid rise of artificial intelligence over the last decade, machine learning has gradually been applied in many fields. Compared with traditional calculation methods such as density functional theory (DFT) and molecular dynamics, machine learning can predict material properties by building a black-box model, which greatly improves computational efficiency. With the development of interpretable machine learning, information that previously went unnoticed can also gradually be extracted from the black-box model. Machine learning is therefore a good tool for studying the complex principal elements and structures of high-entropy alloys and for predicting their properties: it can capture patterns beyond prior human experience, and by predicting material properties it can greatly accelerate the design of high-performance materials and the discovery of new materials.
Bhandari et al. used an artificial neural network with five hidden layers to predict the hardness of refractory high-entropy alloys and predicted a hardness of 695 HV for C0.1Cr3Mo11.9Nb20Re15Ta30W20. Wen et al. constructed a two-layer loop-iteration machine learning system, searched for candidate high-entropy alloys by combining a support vector machine with a utility function, and synthesized 42 high-entropy alloys over seven rounds of iteration, of which 17 showed a hardness improvement of more than 10%. King et al. synthesized 138 alloy samples through high-throughput experiments and, from the hardness data of these samples, built 120 combinations using 3 machine learning models and 4 descriptors; the best model screened from these combinations improved the overall alloy design efficiency 200-fold. Chang et al. predicted AlCoCrFeMnNi-system high-entropy alloys using an artificial neural network and selected candidate alloys from the search space with a simulated annealing algorithm to obtain high-hardness high-entropy alloys. Although all of these works obtained good prediction results, in the synthesis verification stage the low generalization performance of conventional machine learning algorithms causes the deviation between the actual and predicted values to grow, that is, the prediction accuracy is not high.
Therefore, in order to solve the above problems, it is necessary to provide a method or a system capable of improving the accuracy of hardness prediction of a high-entropy alloy, thereby achieving the purpose of designing a high-hardness high-entropy alloy.
Disclosure of Invention
The invention aims to provide a hardness prediction method and system for a high-entropy alloy, which can improve the accuracy of hardness prediction of the high-entropy alloy.
In order to achieve the purpose, the invention provides the following scheme:
a hardness prediction method for a high-entropy alloy, comprising:
acquiring corresponding candidate features according to AlCoCrCuFeNi-system high-entropy alloy hardness data in a high-entropy alloy hardness database; constructing a data set according to the candidate features and the corresponding AlCoCrCuFeNi-system high-entropy alloy hardness data; the AlCoCrCuFeNi-system high-entropy alloy hardness data comprises: the molar ratio of each element in each system and the corresponding hardness; the candidate features include: valence electron concentration, mixing entropy, mixing enthalpy, atomic radius difference, electronegativity difference, average melting point of the alloy, local electronegativity mismatch, electron concentration, Gibbs free energy, shear modulus, Young's modulus, lattice distortion energy, shear modulus mismatch, energy term in the strengthening model, cohesive energy, Peierls-Nabarro factor, work function, local atomic radius mismatch, local modulus mismatch, shear modulus difference, average deviation of alloy atomic weight, average deviation of alloy group, average deviation of alloy specific volume, and composite parameters;
training a Stacking ensemble model using the data set; the trained Stacking ensemble model comprises: a primary classifier and a secondary classifier; the trained Stacking ensemble model is used for determining the corresponding hardness according to the candidate features;
screening the candidate features corresponding to the AlCoCrCuFeNi-system high-entropy alloy hardness data using the Pearson correlation coefficient, an XGBoost evaluation model, a random forest, a genetic algorithm, XGBoost-based recursive feature elimination and an exhaustive search, and determining the screened features;
establishing a classifier using principal component analysis and logistic regression according to the screened features corresponding to the AlCoCrCuFeNi-system high-entropy alloy hardness data; the classifier is used for classifying the hardness of the high-entropy alloy;
constructing an alloy composition search space according to the trained Stacking ensemble model and the classifier;
and predicting the hardness of the AlCoCrCuFeNi-system high-entropy alloy according to the alloy composition search space.
Optionally, the training of the Stacking ensemble model using the data set specifically includes:
determining the primary classifier using the RandomForest, XGBoost and CatBoost methods;
determining the secondary classifier using a Bayesian regressor.
Optionally, the screening of the candidate features corresponding to the AlCoCrCuFeNi-system high-entropy alloy hardness data using the Pearson correlation coefficient, an XGBoost evaluation model, a random forest, a genetic algorithm, XGBoost-based recursive feature elimination and an exhaustive search, and the determining of the screened features, specifically include:
performing correlation screening on the candidate features corresponding to the AlCoCrCuFeNi-system high-entropy alloy hardness data using the Pearson correlation coefficient, and dividing the candidate features into 3 high-correlation groups and 1 low-correlation group with 0.9 as the threshold; adding the candidate features in the high-correlation groups to the low-correlation group, determining the corresponding mean square errors using the XGBoost evaluation model, and retaining, in each high-correlation group, the candidate features whose mean square error is smaller than a mean square error threshold, to obtain the features after the first screening;
applying a random forest, a genetic algorithm and XGBoost-based recursive feature elimination, respectively, to the features after the first screening to obtain the features after the second screening;
and applying an exhaustive search to the features after the second screening to obtain the screened features.
Optionally, the establishing of a classifier using principal component analysis and logistic regression according to the screened features corresponding to the AlCoCrCuFeNi-system high-entropy alloy hardness data specifically includes:
normalizing the screened features corresponding to the AlCoCrCuFeNi-system high-entropy alloy hardness data;
performing dimensionality reduction on the normalized features using principal component analysis;
and establishing the classifier from the dimension-reduced features using logistic regression.
Optionally, the predicting of the hardness of the AlCoCrCuFeNi-system high-entropy alloy according to the alloy composition search space specifically includes:
constructing an element molar ratio data set of the high-entropy alloys to be predicted, and using the classifier to judge whether each entry in the element molar ratio data set exceeds a set hardness;
and predicting, with the trained Stacking ensemble model, the hardness of the entries in the element molar ratio data set that exceed the set hardness.
A hardness prediction system for a high-entropy alloy, comprising:
the candidate feature acquisition module is used for acquiring corresponding candidate features according to AlCoCrCuFeNi-system high-entropy alloy hardness data in the high-entropy alloy hardness database; constructing a data set according to the candidate features and the corresponding AlCoCrCuFeNi-system high-entropy alloy hardness data; the AlCoCrCuFeNi-system high-entropy alloy hardness data comprises: the molar ratio of each element in each system and the corresponding hardness; the candidate features include: valence electron concentration, mixing entropy, mixing enthalpy, atomic radius difference, electronegativity difference, average melting point of the alloy, local electronegativity mismatch, electron concentration, Gibbs free energy, shear modulus, Young's modulus, lattice distortion energy, shear modulus mismatch, energy term in the strengthening model, cohesive energy, Peierls-Nabarro factor, work function, local atomic radius mismatch, local modulus mismatch, shear modulus difference, average deviation of alloy atomic weight, average deviation of alloy group, average deviation of alloy specific volume, and composite parameters;
the trained Stacking ensemble model determining module is used for training a Stacking ensemble model using the data set; the trained Stacking ensemble model comprises: a primary classifier and a secondary classifier; the trained Stacking ensemble model is used for determining the corresponding hardness according to the candidate features;
the screened feature determination module is used for screening the candidate features corresponding to the AlCoCrCuFeNi-system high-entropy alloy hardness data using the Pearson correlation coefficient, an XGBoost evaluation model, a random forest, a genetic algorithm, XGBoost-based recursive feature elimination and an exhaustive search, and determining the screened features;
the classifier establishing module is used for establishing a classifier using principal component analysis and logistic regression according to the screened features corresponding to the AlCoCrCuFeNi-system high-entropy alloy hardness data; the classifier is used for classifying the hardness of the high-entropy alloy;
the alloy composition search space construction module is used for constructing an alloy composition search space according to the trained Stacking ensemble model and the classifier;
and the hardness prediction module is used for predicting the hardness of the AlCoCrCuFeNi-system high-entropy alloy according to the alloy composition search space.
Optionally, the trained Stacking ensemble model determining module specifically includes:
a primary classifier determining unit for determining the primary classifier using the RandomForest, XGBoost and CatBoost methods;
and a secondary classifier determining unit for determining the secondary classifier using a Bayesian regressor.
Optionally, the screened feature determination module specifically includes:
the first screening unit is used for performing correlation screening on the candidate features corresponding to the AlCoCrCuFeNi-system high-entropy alloy hardness data using the Pearson correlation coefficient, and dividing the candidate features into 3 high-correlation groups and 1 low-correlation group with 0.9 as the threshold; adding the candidate features in the high-correlation groups to the low-correlation group, determining the corresponding mean square errors using the XGBoost evaluation model, and retaining, in each high-correlation group, the candidate features whose mean square error is smaller than a mean square error threshold, to obtain the features after the first screening;
the second screening unit is used for applying a random forest, a genetic algorithm and XGBoost-based recursive feature elimination, respectively, to the features after the first screening to obtain the features after the second screening;
and the screened feature determining unit is used for applying an exhaustive search to the features after the second screening to obtain the screened features.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
according to the hardness prediction method and system for a high-entropy alloy provided by the invention, candidate features are determined from existing data in a database, a Stacking ensemble model consisting of a primary classifier and a secondary classifier is constructed and trained, the candidate features are screened, and a classifier is established from the screened features using principal component analysis and logistic regression. An alloy composition search space is then constructed according to the trained Stacking ensemble model and the classifier: the classifier makes a first prediction to select alloys with high hardness, and the trained Stacking ensemble model makes a second prediction of their hardness. This effectively addresses model overfitting and low prediction accuracy, improves the robustness of the model, and thereby serves the purpose of designing high-hardness high-entropy alloys.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required in the embodiments will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of a hardness prediction method for a high-entropy alloy provided by the invention;
FIG. 2 is a schematic diagram illustrating a hardness prediction method of a high-entropy alloy according to the present invention;
FIG. 3 is a flow chart of classifier construction;
FIG. 4 is a flow chart of Stacking ensemble model construction;
FIG. 5 is a prediction fit plot for the hardness data set.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
The invention aims to provide a hardness prediction method and system for a high-entropy alloy, which can improve the accuracy of hardness prediction of the high-entropy alloy.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Fig. 1 is a schematic flow chart of a hardness prediction method for a high-entropy alloy provided by the present invention, fig. 2 is a schematic principle diagram of a hardness prediction method for a high-entropy alloy provided by the present invention, and as shown in fig. 1 and fig. 2, the hardness prediction method for a high-entropy alloy provided by the present invention includes:
S101, acquiring corresponding candidate features according to AlCoCrCuFeNi-system high-entropy alloy hardness data in a high-entropy alloy hardness database; constructing a data set according to the candidate features and the corresponding AlCoCrCuFeNi-system high-entropy alloy hardness data; the AlCoCrCuFeNi-system high-entropy alloy hardness data comprises: the molar ratio of each element in each system and the corresponding hardness; the candidate features include: valence electron concentration (VEC), mixing entropy (Smix), mixing enthalpy (Hmix), atomic radius difference (δr), electronegativity difference (Δχ), average melting point of the alloy (Tm), local electronegativity mismatch (D.χ), electron concentration (e/a), Gibbs free energy (Gmix), shear modulus (G), Young's modulus (E), lattice distortion energy (μ), shear modulus mismatch (η), energy term in the strengthening model (A), cohesive energy (Ec), Peierls-Nabarro factor (F), work function (w), local atomic radius mismatch (D.r), local modulus mismatch (D.G), shear modulus difference (δG), average deviation of alloy atomic weight (D.rw), average deviation of alloy group (D.v), average deviation of alloy specific volume (D.sv), and three composite parameters related to atomic radius and thermodynamics, Ω, Λ and γ. Of these, VEC, Smix, Hmix, δr, Δχ, Tm, D.χ, e/a, Gmix, Ec, D.rw, D.v, D.sv, Ω, Λ and γ are features constructed from the formation rules of alloy phases, and η, D.r, A, F, w, G, δG, D.G, μ and E are features constructed from the physical properties of the materials. As shown in Table 1, $c_i$ in the formulas represents the molar ratio of each element. R in the formula for Smix is the gas constant (8.314 J·K$^{-1}$·mol$^{-1}$). $H^{mix}_{i\text{-}j}$ represents the mixing enthalpy between different alloying elements. $r_{min}$ and $r_{max}$ represent the minimum and maximum atomic radii of the elements in the high-entropy alloy.
$VEC_i$, $r_i$, $\chi_i$, $(T_m)_i$, $(e/a)_i$, $G_i$, $E_i$, $(E_c)_i$, $w_i$, $v_i$, $sv_i$ and $rw_i$ are, respectively, the valence electron concentration, atomic radius, electronegativity, melting point, electron concentration, shear modulus, Young's modulus, cohesive energy, work function, group, specific volume and atomic weight of element i in the high-entropy alloy. r, v, sv and rw are, respectively, the atomic radius, group, specific volume and atomic weight of the high-entropy alloy. Denoting any of r, v, sv, rw by k, and the corresponding elemental value $r_i$, $v_i$, $sv_i$, $rw_i$ by $k_i$, the general calculation formula is:
TABLE 1
The high-entropy alloy hardness database is built from hardness data of AlCoCrCuFeNi-system high-entropy alloys prepared by arc-furnace melting.
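As a minimal sketch of how two of the candidate features might be computed from molar ratios, the following assumes the definitions commonly used in the high-entropy alloy literature, Smix = -R Σ c_i ln c_i and VEC = Σ c_i VEC_i. The per-element valence electron counts and all function names are illustrative assumptions, not values or names taken from Table 1.

```python
import math

R = 8.314  # gas constant in J·K^-1·mol^-1, as stated in the description

# Assumed per-element valence electron counts (illustrative, not from Table 1).
VEC_ELEMENT = {"Al": 3, "Co": 9, "Cr": 6, "Cu": 11, "Fe": 8, "Ni": 10}


def normalize(molar_ratio):
    """Convert raw molar ratios to mole fractions c_i that sum to 1."""
    total = sum(molar_ratio.values())
    return {el: x / total for el, x in molar_ratio.items() if x > 0}


def mixing_entropy(c):
    """Smix = -R * sum(c_i * ln c_i) (assumed ideal-mixing definition)."""
    return -R * sum(ci * math.log(ci) for ci in c.values())


def weighted_mean(c, prop):
    """Composition-weighted mean of an elemental property, e.g. VEC."""
    return sum(ci * prop[el] for el, ci in c.items())


# Equimolar AlCoCrCuFeNi as a toy input.
c = normalize({"Al": 1, "Co": 1, "Cr": 1, "Cu": 1, "Fe": 1, "Ni": 1})
s_mix = mixing_entropy(c)
vec = weighted_mean(c, VEC_ELEMENT)
```

For an equimolar six-element alloy, the entropy expression reduces to R·ln 6, which gives a quick sanity check on the implementation.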
S102, training a Stacking ensemble model using the data set; the trained Stacking ensemble model comprises: a primary classifier and a secondary classifier; the trained Stacking ensemble model is used for determining the corresponding hardness according to the candidate features.
S102 specifically comprises the following steps:
as shown in fig. 3, the primary classifier is determined using the RandomForest, XGBoost and CatBoost methods.
XGBoost, CatBoost and RandomForest are selected as the modeling algorithms according to the coefficients of determination (R²) that ten machine learning algorithms achieve on the data in five-fold cross-validation tests.
The specific method of five-fold cross-validation is as follows: the data set is divided equally into 5 groups; in each round, four groups are used as the training set and the remaining group is used as the test set. The 5 groups thus generate 5 different training-set and test-set combinations, each combination gives its own model score, and the average of the five scores is taken as the final score.
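The five-fold procedure described above can be sketched in plain Python as follows. The `fit` and `score` callables, the mean-value predictor and the toy data are hypothetical stand-ins for the actual learners and hardness data.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle the sample indices and split them into k near-equal groups."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_score(fit, score, X, y, k=5, seed=0):
    """Train on k-1 groups, score on the held-out group, average the k scores."""
    folds = k_fold_indices(len(X), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, f in enumerate(folds) if m != i for j in f]
        model = fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        scores.append(score(model, [X[j] for j in test_idx],
                            [y[j] for j in test_idx]))
    return sum(scores) / k


# Toy stand-ins: a predictor that always returns the training mean, scored by MSE.
def fit_mean(X, y):
    return sum(y) / len(y)


def mse(model, X, y):
    return sum((yi - model) ** 2 for yi in y) / len(y)


X = [[float(i)] for i in range(20)]
y = [400.0 + 10.0 * i for i in range(20)]
folds = k_fold_indices(len(X))
avg_mse = cross_val_score(fit_mean, mse, X, y)
```

Each sample appears in exactly one test group, so every data point is used for testing once and for training four times.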
The ten selected machine learning algorithms are: support vector regression with a Gaussian kernel (SVR-RBF), support vector regression with a linear kernel (SVR-Linear), K-nearest neighbors (KNN), ridge regression (Ridge), lasso regression (Lasso), decision tree (CART), random forest (RandomForest), XGBoost, CatBoost and LightGBM.
For each algorithm, the sum of the mean square errors over data sets with different training/test division ratios is calculated, the three machine learning algorithms with the smallest sums are identified, and the ensemble algorithms XGBoost, CatBoost and RandomForest are finally selected as the modeling algorithms. Selecting the algorithms from several angles in this way takes into account both the performance of each algorithm on the data set and its sensitivity to different training-set and test-set division ratios, which helps in choosing algorithms with higher accuracy and stability.
The secondary classifier is determined using a Bayesian regressor.
The specific steps of constructing the Stacking ensemble model are as follows. The data set is first divided into a training set and a test set, and the training set is divided equally into 5 parts. For each of the primary classifiers RandomForest, XGBoost and CatBoost, 4 parts of the training set are used for training in turn, and the trained model is used to predict the remaining part of the training set and the test set. After 5 rounds of training, each primary classifier yields 5 groups of training-set predictions and 5 groups of test-set predictions; the 5 groups of training-set predictions are concatenated and recorded as T1, and the mean of the 5 groups of test-set predictions is recorded as P1. Applying this procedure to all three primary classifiers gives T1, T2, T3, P1, P2 and P3. Finally, T1, T2 and T3 are used as input features to train the Bayesian regressor serving as the secondary classifier, and P1, P2 and P3 are fed as the test set into the trained Bayesian regressor to obtain the prediction result of the Stacking model.
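The construction of T1 to T3 and P1 to P3 can be sketched with NumPy as below. To keep the example self-contained, a small closed-form ridge regressor (`TinyRidge`, a hypothetical helper) stands in for RandomForest, XGBoost, CatBoost and the Bayesian regressor; only the out-of-fold bookkeeping is meant to mirror the description, and the data are synthetic.

```python
import numpy as np


class TinyRidge:
    """Closed-form ridge regressor; a stand-in for any fit/predict learner."""

    def __init__(self, alpha):
        self.alpha = alpha

    def fit(self, X, y):
        A = np.hstack([X, np.ones((len(X), 1))])   # append a bias column
        penalty = self.alpha * np.eye(A.shape[1])
        penalty[-1, -1] = 0.0                      # leave the bias unpenalized
        self.w = np.linalg.solve(A.T @ A + penalty, A.T @ y)
        return self

    def predict(self, X):
        return np.hstack([X, np.ones((len(X), 1))]) @ self.w


def out_of_fold(make_learner, X_tr, y_tr, X_te, k=5):
    """Return (T, P): concatenated held-out predictions and mean test predictions."""
    all_idx = np.arange(len(X_tr))
    T = np.empty(len(X_tr))
    P = np.zeros(len(X_te))
    for held_out in np.array_split(all_idx, k):
        fit_idx = np.setdiff1d(all_idx, held_out)
        model = make_learner().fit(X_tr[fit_idx], y_tr[fit_idx])
        T[held_out] = model.predict(X_tr[held_out])  # one slice of T
        P += model.predict(X_te) / k                 # running mean for P
    return T, P


def stacking_predict(primary, make_secondary, X_tr, y_tr, X_te, k=5):
    """Train the secondary learner on [T1, T2, ...] and apply it to [P1, P2, ...]."""
    pairs = [out_of_fold(f, X_tr, y_tr, X_te, k) for f in primary]
    T = np.column_stack([t for t, _ in pairs])
    P = np.column_stack([p for _, p in pairs])
    return make_secondary().fit(T, y_tr).predict(P)


# Synthetic data: 60 samples with 4 features and a hardness-like target.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = 500 + X @ np.array([40.0, -25.0, 10.0, 5.0]) + rng.normal(scale=5.0, size=60)
pred = stacking_predict(
    [lambda: TinyRidge(0.1), lambda: TinyRidge(1.0), lambda: TinyRidge(10.0)],
    lambda: TinyRidge(1e-3), X[:48], y[:48], X[48:])
```

Because each entry of T comes from a model that never saw that sample, the secondary learner is trained on predictions that resemble what the primary learners produce on unseen data, which is the usual motivation for this scheme.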
S103, screening the candidate features corresponding to the AlCoCrCuFeNi-system high-entropy alloy hardness data using the Pearson correlation coefficient, an XGBoost evaluation model, a random forest, a genetic algorithm, XGBoost-based recursive feature elimination and an exhaustive search, and determining the screened features.
S103 specifically comprises the following steps:
performing correlation screening on the candidate features corresponding to the AlCoCrCuFeNi-system high-entropy alloy hardness data using the Pearson correlation coefficient and, with 0.9 as the threshold, dividing the features into 3 high-correlation groups, [D.sv, δr, D.χ, e/a, D.r, δG, D.rw, D.v, D.G, Δχ, μ], [w, Ec, Tm, G, E] and [F, A], and 1 low-correlation group, [VEC, Smix, Hmix, Ω, Λ, Gmix]. The features in each high-correlation group are added to the low-correlation group in turn, the corresponding mean square error is determined using the XGBoost evaluation model, and the features with the smaller mean square errors in each high-correlation group are retained. Because the first high-correlation group contains 11 features, 4 features are retained from it so that important features are not filtered out; only one feature is retained from each of the other high-correlation groups. This gives the features after the first screening, [VEC, Smix, Hmix, Ω, Λ, Gmix, D.sv, δr, e/a, δG, Ec, A]. A random forest, a genetic algorithm and XGBoost-based recursive feature elimination are then each applied to the features after the first screening to obtain the features after the second screening, [VEC, Hmix, Λ, Gmix, δr, δG, Ec].
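The grouping step of the correlation screening can be sketched as follows. The description does not state how features are assigned to groups, so this sketch assumes that any two features with |r| > 0.9 belong to the same group (connected components); the function names and toy columns are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_groups(columns, threshold=0.9):
    """Split features into high-correlation groups and a low-correlation list."""
    names = list(columns)
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the representative of n's group.
        while parent[n] != n:
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(columns[a], columns[b])) > threshold:
                parent[find(b)] = find(a)   # merge the two groups

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    high = [g for g in groups.values() if len(g) > 1]
    low = [g[0] for g in groups.values() if len(g) == 1]
    return high, low


# Toy feature columns; f2 is nearly proportional to f1.
toy = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "f2": [2.1, 3.9, 6.2, 8.0, 9.9],
    "f3": [5.0, 1.0, 4.0, 2.0, 3.0],
    "f4": [3.0, 5.0, 1.0, 4.0, 2.0],
}
high, low = correlation_groups(toy)
```

Within each high-correlation group, the retained features would then be chosen by the mean square error of the evaluation model, as the paragraph above describes.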
An exhaustive search is then applied to the features after the second screening to obtain the screened features [VEC, Hmix, Gmix, δG].
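An exhaustive search over feature subsets can be sketched as below. The scoring function is a hypothetical placeholder with made-up numbers; in practice it would be a cross-validated model error. With seven features from the second screening, the search would evaluate 2^7 - 1 = 127 non-empty subsets.

```python
from itertools import combinations


def exhaustive_search(features, score):
    """Evaluate every non-empty subset and return the one with the lowest score."""
    best_subset, best_score = None, float("inf")
    for size in range(1, len(features) + 1):
        for subset in combinations(features, size):
            s = score(subset)
            if s < best_score:
                best_subset, best_score = subset, s
    return best_subset, best_score


def toy_error(subset):
    """Placeholder error with no physical meaning: two 'useful' features
    lower the error and every extra feature adds a small penalty."""
    return 10.0 - 4.0 * ("f1" in subset) - 3.0 * ("f3" in subset) + 0.5 * len(subset)


best, err = exhaustive_search(["f1", "f2", "f3", "f4"], toy_error)
n_subsets_for_seven = 2 ** 7 - 1
```

The cost grows exponentially with the number of features, which is one reason to run the cheaper screening stages first and reserve the exhaustive search for a short list.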
S104, establishing a classifier using principal component analysis and logistic regression according to the screened features corresponding to the AlCoCrCuFeNi-system high-entropy alloy hardness data; the classifier is used for classifying the hardness of the high-entropy alloy.
S104 specifically comprises the following steps:
as shown in fig. 3, data normalization is performed on the screened features corresponding to the AlCoCrCuFeNi-system high-entropy alloy hardness data.
Dimensionality reduction is then performed on the normalized features using principal component analysis. The specific steps of principal component analysis are as follows: the input data are first centered, the covariance matrix of the features is calculated, and the eigenvalues and eigenvectors of the covariance matrix are solved. Because the invention reduces the 4-dimensional feature vector to 3 dimensions, the three largest eigenvalues of the covariance matrix and their corresponding eigenvectors are taken, and the original data are finally projected onto the directions of the selected eigenvectors, which completes the principal component analysis dimensionality reduction.
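The principal component analysis steps listed above (centering, covariance matrix, eigendecomposition, projection) can be sketched with NumPy as follows. The function name and the random 4-dimensional input are illustrative assumptions.

```python
import numpy as np


def pca_reduce(X, n_components=3):
    """Project X onto the eigenvectors of its covariance matrix that have
    the largest eigenvalues."""
    Xc = X - X.mean(axis=0)                    # 1. center the data
    cov = np.cov(Xc, rowvar=False)             # 2. covariance matrix of the features
    eigvals, eigvecs = np.linalg.eigh(cov)     # 3. eigenvalues come back ascending
    order = np.argsort(eigvals)[::-1][:n_components]
    W = eigvecs[:, order]                      # 4. keep the top eigenvectors
    return Xc @ W, eigvals[order]              # 5. project the centered data


# Toy input: 50 samples of a 4-dimensional normalized feature vector.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
Z, top_eigvals = pca_reduce(X)
```

A useful check is that the sample variance of each projected column equals the corresponding eigenvalue, since both use the same n - 1 normalization.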
The classifier is then established from the dimension-reduced features using logistic regression. The classifier is used to distinguish high-entropy alloy samples with hardness > 600 from those with hardness < 600. In this way, samples in the huge composition space can be pre-screened before the Stacking model makes its formal prediction, so that the high-entropy alloys passed on to it are likely to have high hardness.
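A logistic regression classifier on the reduced features might be sketched as below. This is a plain gradient-descent implementation on synthetic data, offered as an assumption-laden illustration; the learning rate, iteration count and the synthetic hardness relation are all made up, and only the 600 threshold comes from the description.

```python
import numpy as np


def fit_logistic(Z, labels, lr=0.1, n_iter=2000):
    """Fit logistic regression weights (with bias) by batch gradient descent."""
    A = np.hstack([Z, np.ones((len(Z), 1))])
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(A @ w)))       # predicted P(hardness > 600)
        w -= lr * A.T @ (p - labels) / len(labels)
    return w


def predict_high(Z, w):
    """True where the predicted probability of high hardness exceeds 0.5."""
    A = np.hstack([Z, np.ones((len(Z), 1))])
    return (A @ w) > 0.0


# Synthetic 3-dimensional inputs standing in for the PCA-reduced features.
rng = np.random.default_rng(1)
Z = rng.normal(size=(200, 3))
hardness = 600 + 80 * Z[:, 0] - 40 * Z[:, 1]     # made-up relation, illustration only
labels = (hardness > 600).astype(float)
w = fit_logistic(Z, labels)
accuracy = np.mean(predict_high(Z, w) == (labels == 1.0))
```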
S105, constructing an alloy component search space according to the trained Stacking ensemble model and the classifier. First, the classifier screens out the samples with hardness > 600, and the Stacking ensemble model is then used to predict their actual hardness. As shown in Fig. 5, under five-fold cross-validation on the data set the Stacking regression model has an RMSE of 75 and an R² of 0.83, and it also performs well on the ternary medium-entropy alloy data set. The accuracy of the logistic regression classifier is 0.94.
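For reference, the two regression metrics quoted above can be computed as in the following sketch. It uses the standard definitions of root mean square error and the coefficient of determination; the specification does not spell these out, and the numbers in the example are arbitrary placeholders rather than results from the patent.

```python
from math import sqrt

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Arbitrary placeholder values, chosen only to exercise the functions.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 6.0]
example_rmse = rmse(y_true, y_pred)
example_r2 = r2(y_true, y_pred)
```

Under five-fold cross-validation, each metric would be evaluated on the predictions made for the held-out fold, so that no sample is scored by a model that was trained on it.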
S106, predicting the hardness of the AlCoCrCuFeNi system high-entropy alloy according to the alloy component search space.
S106 specifically comprises:
An element molar ratio data set is constructed for the high-entropy alloys to be predicted, and the classifier is used to judge whether each entry in the data set exceeds the set hardness.
The trained Stacking ensemble model is then used to predict the hardness of the entries in the element molar ratio data set that exceed the set hardness.
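The two-stage procedure of S106 can be sketched as follows: enumerate candidate compositions, keep those the classifier accepts, and pass only those to the regression model. The grid step of 1/4, the function names and both placeholder models are hypothetical. In particular, the placeholder rules carry no physical meaning and merely stand in for the trained classifier and the trained Stacking model.

```python
from itertools import combinations

ELEMENTS = ("Al", "Co", "Cr", "Cu", "Fe", "Ni")

def molar_ratio_grid(parts):
    """Yield every composition whose molar fractions are multiples of 1/parts."""
    k = len(ELEMENTS)
    # Stars and bars: choose k - 1 bar positions among parts + k - 1 slots;
    # the gaps between consecutive bars give the share of each element.
    for bars in combinations(range(parts + k - 1), k - 1):
        edges = (-1,) + bars + (parts + k - 1,)
        counts = [edges[i + 1] - edges[i] - 1 for i in range(k)]
        yield {el: c / parts for el, c in zip(ELEMENTS, counts)}

def screen_then_predict(candidates, is_high, predict):
    """Keep candidates the classifier accepts, then attach a predicted hardness."""
    return [(c, predict(c)) for c in candidates if is_high(c)]

def placeholder_classifier(comp):
    # Placeholder rule with no physical meaning; a trained classifier goes here.
    return comp["Al"] >= 0.5

def placeholder_regressor(comp):
    # Placeholder output; a trained regression model would return a hardness.
    return 0.0

candidates = list(molar_ratio_grid(parts=4))
results = screen_then_predict(candidates, placeholder_classifier,
                              placeholder_regressor)
```

With a grid step of 1/4 there are 126 candidate compositions, and the regression model is called only on the subset that passes the classifier, which is the saving the screening step is meant to provide.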
In order to verify the accuracy of the Stacking model on high-hardness and low-hardness data, high-hardness and low-hardness high-entropy alloys are respectively selected for prediction and synthesis verification. The results are shown in Table 2.
TABLE 2
The invention uses machine learning to uncover, from large data sets, the factors behind the high hardness of AlCoCrCuFeNi system high-entropy alloys. The Pearson correlation coefficient is used to screen highly correlated features; a genetic algorithm, a random forest and recursive feature elimination are used to select high-importance features; and an exhaustive search is used to select the optimal feature combination. This series of feature selection methods yields more representative descriptors for synthesizing high-hardness high-entropy alloys, and these empirical parameters provide important guidance for the future screening of high-hardness high-entropy alloys.
When the model is constructed, a classification algorithm is used for preliminary screening of high-hardness data, and the idea of ensemble learning is applied by Stacking fusion of the RandomForest, XGBoost and CatBoost algorithms. High-entropy alloys with high hardness are thereby screened out, the accuracy of hardness prediction for them is improved, and the efficiency of synthesizing high-hardness high-entropy alloys is improved.
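The Stacking mechanism itself can be sketched in plain Python as follows: each base learner produces out-of-fold predictions under k-fold splitting, and a second-level learner is fitted on those predictions. This is a sketch under stated assumptions, not the patented model. The base learners here (a 1-nearest-neighbour regressor and a one-feature least-squares line) are simple stand-ins for RandomForest, XGBoost and CatBoost, and the second-level learner is a ridge regression with a fixed penalty, which corresponds to the posterior mean of a Bayesian linear model with a zero-mean Gaussian prior and fixed precisions. All function names and the toy data are hypothetical.

```python
def kfold(n, k):
    """Yield (train_idx, test_idx) for k contiguous folds over range(n)."""
    bounds = [i * n // k for i in range(k + 1)]
    for i in range(k):
        test = list(range(bounds[i], bounds[i + 1]))
        train = [j for j in range(n) if j < bounds[i] or j >= bounds[i + 1]]
        yield train, test

def fit_nearest(X, y):
    """Stand-in base learner: 1-nearest-neighbour regressor."""
    def predict(x):
        j = min(range(len(X)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], x)))
        return y[j]
    return predict

def fit_line(X, y):
    """Stand-in base learner: least-squares line on the first feature."""
    xs = [row[0] for row in X]
    mx, my = sum(xs) / len(xs), sum(y) / len(y)
    slope = (sum((a - mx) * (t - my) for a, t in zip(xs, y))
             / sum((a - mx) ** 2 for a in xs))
    return lambda x: my + slope * (x[0] - mx)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def fit_ridge(Z, y, alpha=1e-3):
    """Second-level learner: ridge weights without intercept."""
    m = len(Z[0])
    A = [[sum(z[a] * z[b] for z in Z) + (alpha if a == b else 0.0)
          for b in range(m)] for a in range(m)]
    rhs = [sum(z[a] * t for z, t in zip(Z, y)) for a in range(m)]
    return solve(A, rhs)

def fit_stacking(base_fits, X, y, k=5):
    """Fit base learners out-of-fold, then fit the second-level weights."""
    n = len(X)
    oof = [[0.0] * len(base_fits) for _ in range(n)]
    for train, test in kfold(n, k):
        Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
        for m, fit in enumerate(base_fits):
            predict = fit(Xtr, ytr)
            for i in test:
                oof[i][m] = predict(X[i])
    weights = fit_ridge(oof, y)
    models = [fit(X, y) for fit in base_fits]  # refit on all data
    return lambda x: sum(w * mdl(x) for w, mdl in zip(weights, models))

# Hypothetical toy data lying exactly on a line (not alloy data).
X = [[float(i)] for i in range(10)]
y = [3.0 * i + 1.0 for i in range(10)]
stacked = fit_stacking([fit_nearest, fit_line], X, y)
prediction = stacked([10.0])
```

Fitting the second-level learner on out-of-fold predictions, rather than on predictions for samples the base learners have already seen, keeps it from simply rewarding the base learner that overfits most. In the toy case the line learner is exact, so the second-level weights put almost all the weight on it.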
As another embodiment, the present invention also provides a hardness prediction system for a high-entropy alloy, comprising:
a candidate feature acquisition module, used for acquiring corresponding candidate features according to AlCoCrCuFeNi system high-entropy alloy hardness data in a high-entropy alloy hardness database, and constructing a data set according to the candidate features and the corresponding AlCoCrCuFeNi system high-entropy alloy hardness data; the AlCoCrCuFeNi system high-entropy alloy hardness data comprise the molar ratio of each element in each system and the corresponding hardness; the candidate features include: valence electron concentration, mixing entropy, mixing enthalpy, atomic radius difference, electronegativity difference, average melting point of the alloy, local electronegativity mismatch, electron concentration, Gibbs free energy, shear modulus, Young's modulus, lattice distortion energy, shear modulus mismatch, energy term in the strengthening model, cohesive energy, Peierls-Nabarro factor, work function, local atomic radius mismatch, local modulus mismatch, shear modulus difference, average deviation of alloy atomic weight, average deviation of alloy family, average deviation of alloy specific volume, and synthesis parameters.
A trained Stacking ensemble model determining module, used for training a Stacking ensemble model with the data set; the trained Stacking ensemble model comprises a primary classifier and a secondary classifier, and is used for determining the corresponding hardness according to the candidate features.
A screened feature determination module, used for screening the candidate features corresponding to the AlCoCrCuFeNi system high-entropy alloy hardness data by means of a Pearson correlation coefficient, an XGBoost evaluation model, a random forest, a genetic algorithm, XGBoost-based recursive feature elimination and an exhaustive search, and determining the screened features.
A classifier establishing module, used for establishing a classifier by principal component analysis and logistic regression according to the screened features corresponding to the AlCoCrCuFeNi system high-entropy alloy hardness data; the classifier is used for classifying the hardness of the high-entropy alloy.
An alloy component search space construction module, used for constructing an alloy component search space according to the trained Stacking ensemble model and the classifier.
A hardness prediction module, used for predicting the hardness of the AlCoCrCuFeNi system high-entropy alloy according to the alloy component search space.
The trained Stacking ensemble model determining module specifically comprises:
a primary classifier determining unit, used for determining the primary classifier using the RandomForest, XGBoost and CatBoost methods.
A secondary classifier determining unit, used for determining the secondary classifier using a Bayesian regressor.
The screened feature determination module specifically comprises:
a first screening unit, used for performing correlation screening on the candidate features corresponding to the AlCoCrCuFeNi system high-entropy alloy hardness data using the Pearson correlation coefficient, and dividing the candidate features into 3 high-correlation groups and 1 low-correlation group with 0.9 as the threshold; adding the candidate features of the high-correlation groups to the low-correlation group, determining the corresponding mean square errors with an XGBoost evaluation model, and retaining, in each high-correlation group, the candidate features whose mean square error is smaller than a mean square error threshold, to obtain the features after the first screening;
a second screening unit, used for applying a random forest, a genetic algorithm and XGBoost-based recursive feature elimination respectively to the features after the first screening, to obtain the features after the second screening.
A screened feature determining unit, used for applying an exhaustive search to the features after the second screening to obtain the screened features.
In this specification, the embodiments are described progressively; each embodiment focuses on its differences from the others, and for the parts that are the same or similar the embodiments may be referred to one another. Because the system disclosed in the embodiment corresponds to the method disclosed in the embodiment, its description is relatively brief; for the relevant points, refer to the description of the method.
Specific examples are used herein to explain the principle and the embodiments of the present invention; the above description of the embodiments is intended only to help in understanding the method of the invention and its core idea. A person skilled in the art may, following the idea of the invention, vary the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.
Claims (8)
1. A method for predicting the hardness of a high-entropy alloy, characterized by comprising the following steps:
acquiring corresponding candidate features according to AlCoCrCuFeNi system high-entropy alloy hardness data in a high-entropy alloy hardness database; constructing a data set according to the candidate features and the corresponding AlCoCrCuFeNi system high-entropy alloy hardness data; the AlCoCrCuFeNi system high-entropy alloy hardness data comprise the molar ratio of each element in each system and the corresponding hardness; the candidate features include: valence electron concentration, mixing entropy, mixing enthalpy, atomic radius difference, electronegativity difference, average melting point of the alloy, local electronegativity mismatch, electron concentration, Gibbs free energy, shear modulus, Young's modulus, lattice distortion energy, shear modulus mismatch, energy term in the strengthening model, cohesive energy, Peierls-Nabarro factor, work function, local atomic radius mismatch, local modulus mismatch, shear modulus difference, average deviation of alloy atomic weight, average deviation of alloy family, average deviation of alloy specific volume, and synthesis parameters;
training a Stacking ensemble model using the data set; the trained Stacking ensemble model comprises a primary classifier and a secondary classifier; the trained Stacking ensemble model is used for determining the corresponding hardness according to the candidate features;
screening the candidate features corresponding to the AlCoCrCuFeNi system high-entropy alloy hardness data by means of a Pearson correlation coefficient, an XGBoost evaluation model, a random forest, a genetic algorithm, XGBoost-based recursive feature elimination and an exhaustive search, and determining the screened features;
establishing a classifier by principal component analysis and logistic regression according to the screened features corresponding to the AlCoCrCuFeNi system high-entropy alloy hardness data; the classifier is used for classifying the hardness of the high-entropy alloy;
constructing an alloy component search space according to the trained Stacking ensemble model and the classifier;
and predicting the hardness of the AlCoCrCuFeNi system high-entropy alloy according to the alloy component search space.
2. The method for predicting the hardness of the high-entropy alloy according to claim 1, wherein training the Stacking ensemble model using the data set specifically comprises:
determining the primary classifier using the RandomForest, XGBoost and CatBoost methods;
determining the secondary classifier using a Bayesian regressor.
3. The method for predicting the hardness of the high-entropy alloy according to claim 1, wherein screening the candidate features corresponding to the AlCoCrCuFeNi system high-entropy alloy hardness data by means of a Pearson correlation coefficient, an XGBoost evaluation model, a random forest, a genetic algorithm, XGBoost-based recursive feature elimination and an exhaustive search, and determining the screened features, specifically comprises:
performing correlation screening on the candidate features corresponding to the AlCoCrCuFeNi system high-entropy alloy hardness data using the Pearson correlation coefficient, and dividing the candidate features into 3 high-correlation groups and 1 low-correlation group with 0.9 as the threshold; adding the candidate features of the high-correlation groups to the low-correlation group, determining the corresponding mean square errors with an XGBoost evaluation model, and retaining, in each high-correlation group, the candidate features whose mean square error is smaller than a mean square error threshold, to obtain the features after the first screening;
applying a random forest, a genetic algorithm and XGBoost-based recursive feature elimination respectively to the features after the first screening, to obtain the features after the second screening;
and applying an exhaustive search to the features after the second screening to obtain the screened features.
4. The method for predicting the hardness of the high-entropy alloy according to claim 1, wherein establishing a classifier by principal component analysis and logistic regression according to the screened features corresponding to the AlCoCrCuFeNi system high-entropy alloy hardness data specifically comprises:
performing data normalization on the screened features corresponding to the AlCoCrCuFeNi system high-entropy alloy hardness data;
performing dimensionality reduction on the normalized features using principal component analysis;
and establishing the classifier by logistic regression on the dimension-reduced features.
5. The method for predicting the hardness of the high-entropy alloy according to claim 1, wherein predicting the hardness of the AlCoCrCuFeNi system high-entropy alloy according to the alloy component search space specifically comprises:
constructing an element molar ratio data set of the high-entropy alloys to be predicted, and using the classifier to judge whether each entry in the element molar ratio data set exceeds a set hardness;
and using the trained Stacking ensemble model to predict the hardness of the entries in the element molar ratio data set that exceed the set hardness.
6. A system for predicting the hardness of a high-entropy alloy, comprising:
a candidate feature acquisition module, used for acquiring corresponding candidate features according to AlCoCrCuFeNi system high-entropy alloy hardness data in a high-entropy alloy hardness database; constructing a data set according to the candidate features and the corresponding AlCoCrCuFeNi system high-entropy alloy hardness data; the AlCoCrCuFeNi system high-entropy alloy hardness data comprise the molar ratio of each element in each system and the corresponding hardness; the candidate features include: valence electron concentration, mixing entropy, mixing enthalpy, atomic radius difference, electronegativity difference, average melting point of the alloy, local electronegativity mismatch, electron concentration, Gibbs free energy, shear modulus, Young's modulus, lattice distortion energy, shear modulus mismatch, energy term in the strengthening model, cohesive energy, Peierls-Nabarro factor, work function, local atomic radius mismatch, local modulus mismatch, shear modulus difference, average deviation of alloy atomic weight, average deviation of alloy family, average deviation of alloy specific volume, and synthesis parameters;
a trained Stacking ensemble model determining module, used for training a Stacking ensemble model using the data set; the trained Stacking ensemble model comprises a primary classifier and a secondary classifier; the trained Stacking ensemble model is used for determining the corresponding hardness according to the candidate features;
a screened feature determination module, used for screening the candidate features corresponding to the AlCoCrCuFeNi system high-entropy alloy hardness data by means of a Pearson correlation coefficient, an XGBoost evaluation model, a random forest, a genetic algorithm, XGBoost-based recursive feature elimination and an exhaustive search, and determining the screened features;
a classifier establishing module, used for establishing a classifier by principal component analysis and logistic regression according to the screened features corresponding to the AlCoCrCuFeNi system high-entropy alloy hardness data; the classifier is used for classifying the hardness of the high-entropy alloy;
an alloy component search space construction module, used for constructing an alloy component search space according to the trained Stacking ensemble model and the classifier;
and a hardness prediction module, used for predicting the hardness of the AlCoCrCuFeNi system high-entropy alloy according to the alloy component search space.
7. The system for predicting the hardness of the high-entropy alloy according to claim 6, wherein the trained Stacking ensemble model determining module specifically comprises:
a primary classifier determining unit, used for determining the primary classifier using the RandomForest, XGBoost and CatBoost methods;
and a secondary classifier determining unit, used for determining the secondary classifier using a Bayesian regressor.
8. The system for predicting the hardness of the high-entropy alloy according to claim 6, wherein the screened feature determination module specifically comprises:
a first screening unit, used for performing correlation screening on the candidate features corresponding to the AlCoCrCuFeNi system high-entropy alloy hardness data using the Pearson correlation coefficient, and dividing the candidate features into 3 high-correlation groups and 1 low-correlation group with 0.9 as the threshold; adding the candidate features of the high-correlation groups to the low-correlation group, determining the corresponding mean square errors with an XGBoost evaluation model, and retaining, in each high-correlation group, the candidate features whose mean square error is smaller than a mean square error threshold, to obtain the features after the first screening;
a second screening unit, used for applying a random forest, a genetic algorithm and XGBoost-based recursive feature elimination respectively to the features after the first screening, to obtain the features after the second screening;
and a screened feature determining unit, used for applying an exhaustive search to the features after the second screening to obtain the screened features.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211277619.8A CN115527625A (en) | 2022-10-19 | 2022-10-19 | Hardness prediction method and system for high-entropy alloy |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115527625A (en) | 2022-12-27 |
Family
ID=84703806
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211277619.8A Pending CN115527625A (en) | 2022-10-19 | 2022-10-19 | Hardness prediction method and system for high-entropy alloy |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115527625A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116720058A (en) * | 2023-04-28 | 2023-09-08 | 贵研铂业股份有限公司 | Method for realizing key feature combination screening of machine learning candidate features |
CN117935995A (en) * | 2024-03-21 | 2024-04-26 | 江苏众钠能源科技有限公司 | Hard carbon material screening method and device for ion battery |
CN117935995B (en) * | 2024-03-21 | 2024-06-11 | 江苏众钠能源科技有限公司 | Hard carbon material screening method and device for ion battery |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115527625A (en) | Hardness prediction method and system for high-entropy alloy | |
Khatamsaz et al. | Multi-objective materials bayesian optimization with active learning of design constraints: Design of ductile refractory multi-principal-element alloys | |
CN107545275A (en) | The unbalanced data Ensemble classifier method that resampling is merged with cost sensitive learning | |
CN108921604B (en) | Advertisement click rate prediction method based on cost-sensitive classifier integration | |
CN112989635B (en) | Integrated learning soft measurement modeling method based on self-encoder diversity generation mechanism | |
JP7411977B2 (en) | Machine learning support method and machine learning support device | |
CN113159264B (en) | Intrusion detection method, system, equipment and readable storage medium | |
CN112634992A (en) | Molecular property prediction method, training method of model thereof, and related device and equipment | |
Khalaf et al. | Hybridized deep learning model for perfobond rib shear strength connector prediction | |
CN106951728B (en) | Tumor key gene identification method based on particle swarm optimization and scoring criterion | |
CN116312890A (en) | Method for screening high-hardness high-entropy alloy by aid of particle swarm optimization algorithm and machine learning | |
CN112002380A (en) | Self-adaptive design method of high-heat-generation energetic material based on machine learning | |
CN115640529A (en) | Novel circular RNA-disease association prediction method | |
van Stein et al. | Neural network design: learning from neural architecture search | |
Wang et al. | Early diagnosis of Parkinson's disease with Speech Pronunciation features based on XGBoost model | |
CN117198417A (en) | Stable crystal structure prediction method and system based on machine learning and target optimization | |
CN117172386A (en) | Dominant reservoir partition identification prediction method, system, electronic equipment and medium | |
Gao et al. | An ensemble classifier learning approach to ROC optimization | |
CN113268936B (en) | Key quality characteristic identification method based on multi-objective evolution random forest characteristic selection | |
Li et al. | Extracting core answers using the grey wolf optimizer in community question answering | |
CN113657106A (en) | Feature selection method based on normalized word frequency weight | |
CN109345274B (en) | Neighbor user selection method based on BP neural network scoring prediction error | |
Huang et al. | Dynamic boosting in deep learning using reconstruction error | |
Colla et al. | GADF—Genetic Algorithms for distribution fitting | |
Srinivas et al. | Feature selection algorithms: A comparative study |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||