CN116720058A

CN116720058A - Method for realizing key feature combination screening of machine learning candidate features

Info

Publication number: CN116720058A
Application number: CN202310481517.6A
Authority: CN
Inventors: 方继恒; 杨尚荣; 谢明; 胡洁琼; 张吉明; 刘国化; 杨有才; 赵上强; 马洪伟; 陈永泰; 李爱坤; 宁德魁; 王塞北; 毕亚男; 张巧; 段云昭; 陈松
Original assignee: Yunnan Precious Metals Laboratory Co ltd; Sino Platinum Metals Co Ltd; Kunming Institute of Precious Metals
Current assignee: Yunnan Precious Metals Laboratory Co ltd; Sino Platinum Metals Co Ltd; Kunming Institute of Precious Metals
Priority date: 2023-04-28
Filing date: 2023-04-28
Publication date: 2023-09-08

Abstract

The application discloses a method for realizing key feature combination screening of machine learning candidate features, comprising the following steps: first, the candidate feature set is preliminarily screened through linear correlation filtering; next, the features remaining after linear correlation filtering are further screened by a genetic algorithm search that limits the number of features; after the features are screened out by the genetic algorithm, feature weight ranking is used to rank their importance, and the top-ranked key features form the candidate features for exhaustive screening; finally, the feature combination with the best model prediction precision is found by exhaustive screening and used as the final machine learning feature combination. The method overcomes difficulties such as high domain knowledge requirements, high computational complexity, low feature generality and low interpretability that conventional feature selection techniques face when screening key feature combinations from a large candidate feature set.

Description

Method for realizing key feature combination screening of machine learning candidate features

Technical Field

The application relates to a method for realizing key feature combination screening of machine learning candidate features, and belongs to the technical field of noble metal alloys.

Background

Data and features determine the upper bound of machine learning performance, while models and algorithms only approach that bound, so feature selection is particularly important. In materials research, each feature set is typically specific to one application under particular conditions, and no unified feature set is valid for all applications. Selecting the most appropriate features for each machine learning task is therefore one of the main challenges.

At present, considerable effort is devoted to feature selection. According to the form of selection, feature selection methods can be classified into 4 types. (1) Feature selection based on domain knowledge: the selected features are highly interpretable, but in many cases the available domain knowledge is inadequate. (2) Filter methods: common filters include correlation coefficients, variance screening, mutual information and hypothesis testing. Filter methods are computationally efficient and robust to over-fitting, but because they do not consider the correlation between features, useful correlated features may be discarded by mistake. (3) Wrapper methods: common wrappers include complete search (e.g., branch-and-bound search, breadth-first traversal, beam search), heuristic search (e.g., bidirectional search, sequential forward selection, sequential backward selection) and random search (e.g., randomly generated sequence selection, genetic algorithms, simulated annealing). The feature subsets found by wrapper methods generally give better classification performance than those found by filter methods, but the selected features are not very general: when the learning algorithm changes, feature selection must be repeated for the new algorithm. In addition, because a classifier is trained and tested every time a subset is evaluated, the computational complexity is high and the execution time is long, especially for large data sets.

The above analysis shows that each of these methods can perform feature selection, but none of them simultaneously satisfies the requirements of low dependence on domain knowledge, low computational complexity, strong feature generality and high interpretability. The problem becomes more prominent when the number of candidate features is large or the application scenario is complex. A reasonable framework for screening the most suitable feature combination is therefore needed, and this has long been an open problem in predicting material properties with machine learning.

Disclosure of Invention

The problem to be solved by the application is as follows: when conventional feature selection techniques are used to screen key feature combinations from a large set of candidate machine learning features for noble metal alloys, they face difficulties such as high domain knowledge requirements, high computational complexity, low feature generality and low interpretability.

The application aims to provide a method for realizing key feature combination screening of machine learning candidate features, which comprises the following steps:

(1) The candidate feature set is preliminarily screened through linear correlation filtering. In this screening, the degree of linear correlation between each pair of alloy features is evaluated through the linear regression correlation coefficient R (see formula (1)):

$$R = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2 \sum_{i=1}^{N} (y_i - \bar{y})^2}} \tag{1}$$

wherein N is the number of samples, $x_i$ and $y_i$ represent two different features of the i-th alloy (i = 1, 2, ..., N), and $\bar{x}$ and $\bar{y}$ represent the averages of these two features over the N alloys.

A correlation coefficient with |R| > 0.95 is taken as strong linear correlation, and alloy features that are strongly linearly correlated with one another are placed in the same group. Within each group, the alloy feature that gives the lowest modeling error when used as the single input feature is selected to represent the group and enters the subsequent screening. After grouping, the alloy features within each group are strongly linearly correlated (|R| > 0.95), while the alloy features of different groups are not strongly linearly correlated.
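The grouping step can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: features linked by |R| > 0.95 are merged transitively (connected components), which the text does not specify, and a one-feature least-squares line scored by MAPE stands in for the unspecified single-feature model. The names `group_by_correlation`, `single_feature_error` and `select_representatives`, and the synthetic data, are hypothetical.

```python
import numpy as np

def group_by_correlation(X, threshold=0.95):
    """Group feature columns linked by |R| > threshold (assumed: transitive merging)."""
    R = np.corrcoef(X, rowvar=False)      # Pearson R for every pair of columns
    strong = np.abs(R) > threshold
    n = R.shape[0]
    assigned = [False] * n
    groups = []
    for start in range(n):
        if assigned[start]:
            continue
        assigned[start] = True
        stack, members = [start], []
        while stack:                      # depth-first walk over strongly correlated features
            i = stack.pop()
            members.append(i)
            for j in range(n):
                if strong[i, j] and not assigned[j]:
                    assigned[j] = True
                    stack.append(j)
        groups.append(sorted(members))
    return groups

def single_feature_error(x, y):
    """MAPE of a one-feature least-squares line (assumed stand-in for the single-feature model)."""
    A = np.column_stack([x, np.ones(len(x))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.mean(np.abs((y - A @ coef) / y)))

def select_representatives(X, y, threshold=0.95):
    """Keep, from each group, the feature with the lowest single-feature error."""
    groups = group_by_correlation(X, threshold)
    return [min(g, key=lambda j: single_feature_error(X[:, j], y)) for g in groups]

# Synthetic demonstration data: column 1 is almost a multiple of column 0.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X_demo = np.column_stack([a, 2 * a + 0.01 * rng.normal(size=200), b])
y_demo = 100 + 3 * a + b
demo_groups = group_by_correlation(X_demo)
demo_kept = select_representatives(X_demo, y_demo)
```

In this demonstration the first two columns fall into one group and the third forms its own group, so two representative features are kept.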

As a preferred scheme of the application, the number of features in the candidate feature set is greater than or equal to 60. When the number of features is less than 60, existing mature feature selection techniques can already screen key feature combinations while simultaneously meeting the requirements of low domain knowledge dependence, low computational complexity, strong feature generality and high interpretability.

(2) Based on a genetic algorithm search that limits the number of features, the features remaining after linear correlation filtering are further screened K times, and m features are screened out each time.

As a further preferred scheme of the application, the limited number of features m in step (2) is 5 to 15, and the number of screenings K is greater than 15. In addition, the model parameters used are: 50 to 150 generations, a population of 150 to 250, model parameter optimization, 20 to 70 random samplings, and 5- to 10-fold cross validation for each sampling.
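A genetic search restricted to subsets of exactly m features might be sketched as follows. This is an illustration only: the crossover and mutation operators, the small default generation and population sizes, and the use of a linear least-squares model in place of the learner are all assumptions made to keep the example short and runnable, and `cv_mape` and `ga_select` are hypothetical names. The fitness follows the text in being 1 - MAPE under k-fold cross validation.

```python
import numpy as np

def cv_mape(X, y, cols, folds=5, seed=0):
    """k-fold cross-validated MAPE of a linear least-squares model on the chosen columns."""
    idx = np.random.default_rng(seed).permutation(len(y))
    errors = []
    for part in np.array_split(idx, folds):
        train = np.setdiff1d(idx, part)
        A = np.column_stack([X[train][:, cols], np.ones(len(train))])
        coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        B = np.column_stack([X[part][:, cols], np.ones(len(part))])
        errors.append(np.mean(np.abs((y[part] - B @ coef) / y[part])))
    return float(np.mean(errors))

def ga_select(X, y, m, generations=20, pop_size=30, mutation_rate=0.3, seed=0):
    """Genetic search over subsets of exactly m feature indices; returns (subset, precision)."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    pop = [tuple(sorted(int(f) for f in rng.choice(n, size=m, replace=False)))
           for _ in range(pop_size)]
    cache = {}

    def fitness(ind):
        # Precision p = 1 - MAPE, cached so each subset is evaluated once.
        if ind not in cache:
            cache[ind] = 1.0 - cv_mape(X, y, list(ind))
        return cache[ind]

    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]            # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            i, j = rng.choice(len(parents), size=2, replace=False)
            pool = sorted(set(parents[i]) | set(parents[j]))
            # Crossover: draw m features from the union of both parents.
            child = set(int(f) for f in rng.choice(pool, size=m, replace=False))
            if rng.random() < mutation_rate:
                # Mutation: drop one feature and draw a replacement, size stays m.
                child.remove(int(rng.choice(sorted(child))))
                outside = [f for f in range(n) if f not in child]
                child.add(int(rng.choice(outside)))
            children.append(tuple(sorted(child)))
        pop = parents + children
    best = max(pop, key=fitness)
    return best, fitness(best)

# Synthetic demonstration data: only columns 2 and 7 carry signal.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(120, 12))
y_demo = 50 + 4 * X_demo[:, 2] - 3 * X_demo[:, 7] + 0.1 * rng.normal(size=120)
subset, precision = ga_select(X_demo, y_demo, m=3)
```

Repeating `ga_select` K times with different `seed` values would give the K screenings, each with its own subset of m features and precision.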

(3) After the features are screened out by the genetic algorithm, feature weight ranking is used to rank the importance of the features, and the process is as follows:

(a) The remaining n features after linear correlation screening constitute a feature set:

$$F = \{X_1, X_2, X_3, X_4, \ldots, X_i, X_{i+1}, \ldots, X_n\} \tag{3}$$

wherein $X_i$ represents the i-th feature remaining after screening, and n is the number of features remaining after linear correlation screening.

(b) K screenings are carried out with the genetic algorithm, and m features are screened out each time. Regression modeling is carried out on the m features of each screening, giving the prediction precision $p_k$ shown in formula (4), wherein $p_k = 1 - \mathrm{MAPE}$ and MAPE is the mean absolute percentage error of the model:

$$\begin{aligned}
1 &= \{X_1, X_2, X_3, X_4, \ldots, X_i, X_{i+1}, \ldots, X_m\} \rightarrow p_1 \\
2 &= \{X_1, X_2, X_3, X_4, \ldots, X_i, X_{i+1}, \ldots, X_m\} \rightarrow p_2 \\
3 &= \{X_1, X_2, X_3, X_4, \ldots, X_i, X_{i+1}, \ldots, X_m\} \rightarrow p_3 \\
&\;\;\vdots \\
K &= \{X_1, X_2, X_3, X_4, \ldots, X_i, X_{i+1}, \ldots, X_m\} \rightarrow p_K
\end{aligned} \tag{4}$$

(c) The feature weight is shown in formula (5). The weight of a feature equals the sum, over all genetic algorithm screenings, of the product of whether that feature was screened out and the model prediction precision of that screening. These sums are subsequently ranked across the different features.

$$W_A = \sum_{k=1}^{K} \delta_k^{A}\, p_k, \quad W_B = \sum_{k=1}^{K} \delta_k^{B}\, p_k, \quad \ldots, \quad W_N = \sum_{k=1}^{K} \delta_k^{N}\, p_k \tag{5}$$

wherein $W_A$ is the feature weight of feature A, n is the number of candidate features, and $\delta_k^{A}$ indicates whether feature A is screened out in the k-th screening: $\delta_k^{A} = 1$ if feature A is screened out, otherwise $\delta_k^{A} = 0$.

$\delta_k^{B}$ indicates whether feature B is screened out in the k-th screening: $\delta_k^{B} = 1$ if feature B is screened out, otherwise $\delta_k^{B} = 0$.

…

And so on, up to $\delta_k^{N}$, which indicates whether feature N is screened out in the k-th screening: $\delta_k^{N} = 1$ if feature N is screened out, otherwise $\delta_k^{N} = 0$.

(d) Feature ranking: after the feature weight formula has been applied, the sums of products of feature selection and prediction precision are ranked across the different features; the resulting plot shows the ranking of the accumulated precision of each feature:

$$I_n = \mathrm{rank}(W_A, W_B, W_C, \ldots, W_N) \tag{6}$$

wherein $I_n$ represents the feature weight ranking result for the n candidate features, and rank() denotes ranking the weights of the different features.
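The weight accumulation and ranking in steps (b) to (d) can be sketched in a few lines of Python. The three screening results below are invented for illustration (the feature numbers echo those of Example 1, the precisions are hypothetical), and `feature_weights` and `rank_features` are hypothetical helper names.

```python
from collections import defaultdict

def feature_weights(screenings):
    """screenings: list of (selected feature ids, precision p_k). Returns {feature: weight}."""
    weights = defaultdict(float)
    for selected, p_k in screenings:
        for feature in set(selected):    # a feature contributes p_k only when it was selected
            weights[feature] += p_k
    return dict(weights)

def rank_features(weights):
    """Feature ids ordered by decreasing weight (ties broken by feature id)."""
    return sorted(weights, key=lambda f: (-weights[f], f))

# Hypothetical results of K = 3 screenings with m = 3 features each.
runs = [([35, 38, 81], 0.95), ([35, 74, 85], 0.90), ([38, 35, 12], 0.92)]
W = feature_weights(runs)
ranking = rank_features(W)
```

Feature 35 is selected in all three hypothetical screenings, so it accumulates all three precisions and ranks first; a feature selected once accumulates only the precision of that screening.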

(4) The 12 to 16 most important key features at the top of the feature weight ranking are taken as the candidate features for exhaustive screening, and the feature combination with the best model prediction precision is then found by exhaustion: the precision and generalization ability of each feature combination in the model are evaluated by the model mean absolute percentage error (MAPE), and the feature combination with the lowest relative error is taken as the finally screened feature combination.
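Exhaustive screening over a small candidate list can be sketched with `itertools.combinations`. The sketch assumes that every non-empty subset is tried, that a linear least-squares model replaces the learner, and that a simple hold-out split (first `n_train` rows for training) replaces the evaluation protocol; none of these details is stated in the text. `holdout_mape` and `exhaustive_search` are hypothetical names.

```python
from itertools import combinations
import numpy as np

def holdout_mape(X, y, cols, n_train):
    """Fit on the first n_train rows, return the MAPE on the remaining rows."""
    A = np.column_stack([X[:, cols], np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A[:n_train], y[:n_train], rcond=None)
    pred = A[n_train:] @ coef
    return float(np.mean(np.abs((y[n_train:] - pred) / y[n_train:])))

def exhaustive_search(X, y, candidates, n_train):
    """Try every non-empty subset of the candidates and keep the lowest-error one."""
    best_cols, best_err = None, float("inf")
    for size in range(1, len(candidates) + 1):
        for cols in combinations(candidates, size):
            err = holdout_mape(X, y, list(cols), n_train)
            if err < best_err:
                best_cols, best_err = cols, err
    return best_cols, best_err

# Synthetic demonstration data: only columns 1 and 4 carry signal.
rng = np.random.default_rng(2)
X_demo = rng.normal(size=(150, 6))
y_demo = 50 + 4 * X_demo[:, 1] - 3 * X_demo[:, 4] + 0.1 * rng.normal(size=150)
best_cols, best_err = exhaustive_search(X_demo, y_demo, candidates=range(6), n_train=120)
```

With six candidates only 63 subsets are evaluated; the winning subset in this demonstration contains both informative columns.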

Steps (1), (2), (3) and (4) form a stepwise progressive sequence, namely: linear correlation screening, genetic algorithm search with a limited number of features, feature weight ranking, and exhaustive screening. If the order of these steps is changed or any step is omitted, the difficulties of high domain knowledge requirements, high computational complexity, low feature generality and low interpretability, which conventional feature selection techniques face when screening key feature combinations from a large candidate feature set, cannot be overcome.
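A quick count suggests why exhaustive screening is placed last. Assuming the exhaustive step considers every non-empty subset of its candidates (the text does not state the subset sizes), the number of models to evaluate doubles with each added candidate. The helper name `n_subsets` is hypothetical.

```python
from math import comb

def n_subsets(k):
    """Number of non-empty subsets of k candidate features."""
    return sum(comb(k, r) for r in range(1, k + 1))

# Candidate counts in the 12 to 16 range used for exhaustive screening.
counts = {k: n_subsets(k) for k in (12, 14, 15, 16)}
# For comparison: exhausting 55 features directly, without the earlier steps.
direct = n_subsets(55)
```

Twelve candidates give 4095 subsets and sixteen give 65535, whereas exhausting 55 features directly would require about 3.6e16 model evaluations.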

Compared with the prior art, the application has the following beneficial effects:

(1) The method has higher effectiveness for screening key feature combinations from a large number of candidate features in the precious metal alloy machine learning process, and can be popularized to the problem of screening key features from a large number of candidate features in other non-precious metal alloy machine learning processes.

(2) The feature screening strategy provided by the method can simultaneously meet the requirements of high interpretability, low computational complexity, high feature universality and good model prediction effect.

Drawings

FIG. 1 shows the results of solid phase temperature feature importance ranking using feature weight ranking;

FIG. 2 shows the exhaustive screening results for the solid phase temperature;

FIG. 3 is a graph showing the prediction results of a solid phase temperature model based on the combination of the key features of the screened solid phase temperature.

Detailed Description

The application will now be described in more detail with reference to the drawings and the preferred embodiments, but the scope of the application is not limited to the description.

Example 1

This embodiment takes the screening of the key machine learning feature combination for the solid phase temperature of multi-element noble metal alloys as an example; the specific steps are as follows:

(1) Machine learning data for 267 sets of multi-element noble metal alloys and corresponding solid phase temperatures were collected, with a partial data set as shown in table 1 (only a portion of the data is shown due to the large amount of data).

TABLE 1 partial multiple noble Metal alloy composition (Wt%) and corresponding solid phase temperature (DEG C) data collected

(2) A solid phase temperature feature set of the multi-element noble metal alloys containing 100 features is constructed through machine learning feature engineering (as shown in table 2; the specific value of each feature for each alloy can be calculated with the feature construction formulas below). Candidate features for machine learning can generally be obtained through domain knowledge, feature engineering and similar means. The composition of the candidate feature set is determined mainly by the application scenario, and the feature types and numbers of the candidate feature sets generally differ between application scenarios. In this embodiment, the machine learning features are constructed by feature engineering (compared with obtaining features from domain knowledge, features constructed by feature engineering can improve the generality of the machine learning model for a small-sample data set such as the one in this embodiment). The features are chosen with the predicted property in mind, selecting as far as possible features associated with the solid phase temperature. The feature engineering process mainly includes: establishing a set of physicochemical parameters, and, according to the chemical proportions in the chemical formula of each collected alloy solder, constructing a feature set that evaluates the degree of influence of each parameter on the target quantity, which replaces direct input of the chemical formula. The feature set that evaluates the degree of influence of each basic physicochemical parameter on the target quantity is constructed as follows:

The mean factor $f_{mi}$ of each basic physicochemical parameter of each alloy is calculated by formula (7), the variance factor $f_{vi}$ of each basic physicochemical parameter of each alloy is calculated by formula (8), and $f_{mi}$ and $f_{vi}$ are used as inputs to the machine learning performance prediction model;

$$f_{mi} = \frac{\sum_{j} f_{ij}\, c_j}{\sum_{j} c_j} \tag{7}$$

wherein $f_{mi}$ is the mean alloy factor feature, $f_{vi}$ is the variance alloy factor feature, $f_{ij}$ represents the i-th physicochemical parameter of the j-th element (i = 1, 2, ...; j = 1, 2, ..., n), n represents the number of components of the alloy (the alloys collected in this example have at most eight components, so n = 8), and $c_j$ represents the mass percent of the j-th element in the alloy.
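Formula (7) is a composition-weighted mean, which a short Python sketch makes concrete. The example alloy (72 wt% Ag, 28 wt% Cu) and the choice of melting point as the physicochemical parameter are illustrative assumptions, and `mean_factor` is a hypothetical name; only formula (7) is covered.

```python
import numpy as np

def mean_factor(param_values, mass_percent):
    """Formula (7): f_mi = sum_j(f_ij * c_j) / sum_j(c_j) for one physicochemical parameter."""
    f = np.asarray(param_values, dtype=float)
    c = np.asarray(mass_percent, dtype=float)
    return float(np.sum(f * c) / np.sum(c))

# Illustrative input: elemental melting points in degrees Celsius for Ag and Cu.
melting_point = {"Ag": 961.78, "Cu": 1084.62}
composition = {"Ag": 72.0, "Cu": 28.0}       # mass percent

elements = list(composition)
f_m = mean_factor([melting_point[e] for e in elements],
                  [composition[e] for e in elements])
```

Here `f_m` is the mass-weighted average of the two elemental melting points, about 996.18; repeating the calculation for every parameter in the physicochemical parameter set yields the mean factor features of one alloy.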

For how each alloy is converted by formulas (7) and (8) into the 100 new alloy features shown in table 2, reference may be made to the feature construction method of example 1 in patent CN114580271A (a method for realizing solid-liquid phase temperature prediction of multi-element noble metal alloy solder). Because the feature construction in this example involves a large amount of work and closely follows the construction process in patent CN114580271A, and because this document focuses on the feature screening process rather than the feature construction process, the feature construction process is not described in detail here.

TABLE 2 structured alloy features and corresponding numbering thereof

(3) The candidate feature set is preliminarily screened through linear correlation filtering. In this screening, the degree of linear correlation between each pair of alloy features is evaluated through the linear regression correlation coefficient R, which is calculated as follows:

$$R = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2 \sum_{i=1}^{N} (y_i - \bar{y})^2}}$$

wherein N is the number of samples (N = 267 in this embodiment), $x_i$ and $y_i$ represent two different features of the i-th alloy (i = 1, 2, ..., N), and $\bar{x}$ and $\bar{y}$ represent the averages of these two features over the N alloys.

As the formula shows, calculating R for every pair of features over the 267 alloys involves a large amount of computation, and the results would occupy a great deal of space, so no specific calculation example is shown here; in practice this work is usually done by a program written in Matlab or Python.
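As a rough indication of what such a program might look like in Python, NumPy's `corrcoef` returns the whole matrix of pairwise R values in one call. The data below are random numbers with the same shape as this embodiment (267 alloys by 100 features), not the actual alloy features, and the variable names are illustrative.

```python
import numpy as np
from math import comb

rng = np.random.default_rng(0)
X = rng.normal(size=(267, 100))        # synthetic stand-in: 267 alloys x 100 features

R = np.corrcoef(X, rowvar=False)       # 100 x 100 matrix of pairwise R values
n_pairs = comb(X.shape[1], 2)          # number of distinct feature pairs

# Indices of the upper triangle, i.e. each pair of features exactly once.
i_upper, j_upper = np.triu_indices_from(R, k=1)
strong_pairs = [(int(i), int(j)) for i, j in zip(i_upper, j_upper)
                if abs(R[i, j]) > 0.95]
```

For 100 features there are 4950 distinct pairs; the random stand-in data contain no strongly correlated pair, whereas real engineered features typically contain many.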

A correlation coefficient with |R| > 0.95 is taken as strong linear correlation, and alloy features that are strongly linearly correlated with one another are placed in the same group. Within each group, the alloy feature that gives the lowest modeling error when used as the single input feature is selected to represent the group and enters the subsequent screening. After grouping, the alloy features within each group are strongly linearly correlated (|R| > 0.95), while the alloy features of different groups are not strongly linearly correlated. After linear correlation screening, 55 alloy features remain for the solid phase temperature model, as shown in table 3 (the specific value of each feature for each alloy is calculated in the same way as in table 1).

TABLE 3 The 55 features remaining after linear correlation screening

(4) Based on a genetic algorithm search that limits the number of features, the features remaining after linear correlation filtering are further screened 50 times, with 10 features screened out each time. The model parameters used are: 100 generations, a population of 200, model parameter optimization, 50 random samplings, and 5-fold cross validation for each sampling.

(5) After the features are screened out by the genetic algorithm, feature weight ranking is used to rank the importance of the features; the result is shown in figure 1. The ranking of the accumulated prediction precision of the features shows that the top 5 key features affecting the solid phase temperature are numbers 35, 38, 81, 74 and 85, corresponding respectively to the melting enthalpy mean, the bulk modulus mean, the variance of atomic radius 2 (coordination number 12), the variance of the ambient atomic number, and the melting enthalpy variance. The fitness values of features 35 and 38 are particularly high, which indicates that these two features are the most critical machine learning features for solid phase temperature prediction.

(6) The 12 most important key features at the top of the feature weight ranking are taken as the candidate features for exhaustive screening, and the feature combination with the best model prediction precision is then found by exhaustion. The precision and generalization ability of each feature combination in the model are evaluated by the model mean absolute percentage error (MAPE), and the feature combination with the lowest relative error is taken as the finally screened feature combination. The calculation formula of the MAPE is given below, the exhaustive screening result for the solid phase temperature is shown in figure 2, and the specific alloy features obtained for the solid phase temperature model by exhaustive screening are shown in table 4.

$$\mathrm{MAPE} = \frac{100\%}{N} \sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$$

wherein N is the number of samples (N = 267 in this embodiment), $y_i$ is the actual value of the solid phase temperature, $\hat{y}_i$ is the predicted value of the solid phase temperature, and $\bar{y}$ is the average of the actual values of the solid phase temperature.
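A small worked example shows how the MAPE and the corresponding precision p = 1 - MAPE behave. It assumes the standard definition of MAPE (mean of |actual - predicted| / |actual|, expressed in percent); the three temperature values are invented, and `mape_percent` is a hypothetical name.

```python
import numpy as np

def mape_percent(y_true, y_pred):
    """Mean absolute percentage error, in percent (standard definition, assumed here)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))

# Invented values: three actual and predicted temperatures in degrees Celsius.
actual = [1000.0, 800.0, 1200.0]
predicted = [950.0, 840.0, 1200.0]

error = mape_percent(actual, predicted)   # relative errors of 5 %, 5 % and 0 %
precision = 1.0 - error / 100.0           # precision p = 1 - MAPE, as a fraction
```

The three relative errors average to about 3.33 %, giving a precision of about 0.967.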

TABLE 4 results of solid phase temperature model alloy feature screening

Based on the feature screening result, regression modeling is carried out with the same learner as used during alloy feature screening, namely the support vector regression (SVR) algorithm. The data set is randomly divided into a training set (80%) and a test set (20%), and the modeling result for the solid phase temperature is shown in fig. 3. For the solid phase temperature prediction model, the percentage error is 4.72% on the training set and 9.83% on the test set. Both errors are small, indicating that the model trained on the screened solid phase temperature feature combination performs well and generalizes well.

Example 2

This embodiment takes the screening of the key machine learning feature combination for the liquid phase temperature of multi-element noble metal alloys as an example; the specific steps are as follows:

(1) Machine learning data for 267 sets of multi-element noble metal alloys and corresponding liquidus temperatures were collected, with a partial data set as shown in table 5 (only a portion of the data is shown due to the large amount of data).

TABLE 5 collected partial multiple noble metal alloy composition (Wt%) and corresponding liquidus temperature (DEG C) data

(2) A liquidus temperature feature set of the multi-element noble metal alloys containing 100 features is constructed through machine learning feature engineering. Because the liquidus and solidus temperatures are similar properties, the same feature construction method, feature types and feature numbers as in embodiment 1 are adopted and are not shown again here.

(2) The candidate feature set is preliminarily screened through linear correlation filtering. In this screening, the degree of linear correlation between each pair of alloy features is evaluated through the linear regression correlation coefficient R (see formula (1)), and |R| > 0.95 is taken as strong linear correlation. Alloy features that are strongly linearly correlated with one another are placed in the same group, and within each group the alloy feature that gives the lowest modeling error when used as the single input feature is selected to represent the group in the subsequent screening. After grouping, the alloy features within each group are strongly linearly correlated (|R| > 0.95), while the alloy features of different groups are not strongly linearly correlated. After linear correlation screening, 55 alloy features remain for the liquidus temperature model.

(3) Based on a genetic algorithm search that limits the number of features, the features remaining after linear correlation filtering are further screened 40 times, with 5 features screened out each time. The model parameters used are: 100 generations, a population of 200, model parameter optimization, 50 random samplings, and 5-fold cross validation for each sampling.

(4) After the features are screened out by the genetic algorithm, feature weight ranking is used to rank the importance of the features.

(5) The 14 most important key features at the top of the feature weight ranking are taken as the candidate features for exhaustive screening, and the feature combination with the best model prediction precision is then found by exhaustion. The precision and generalization ability of each feature combination in the model are evaluated by the model mean absolute percentage error (MAPE), and the feature combination with the lowest relative error is taken as the finally screened feature combination. The specific alloy features obtained for the liquid phase temperature model by exhaustive screening are shown in table 6.

TABLE 6 results of screening alloy characteristics for liquid phase temperature model

Feature number	Feature
35	Average value of melting enthalpy
81	Variance of atomic radius 2 (coordination number 12)
95	Variance of mass attenuation coefficient (Cr Kα)
39	Mean Young's modulus
68	Variance of melting point 1

(6) Based on the feature screening result, regression modeling is carried out with the same learner as used during alloy feature screening, namely the support vector regression (SVR) algorithm. The modeling result for the liquid phase temperature shows that the percentage error is 3.72% on the training set and 9.35% on the test set. Both errors are small, indicating that the model trained on the screened liquid phase temperature feature combination performs well and generalizes well.

Example 3

This embodiment takes the screening of the key machine learning feature combination for the conductivity of noble metal electrical contact alloy materials as an example; the specific steps are as follows:

(1) Machine learning data for 205 sets of precious metal electrical contact alloy material compositions and corresponding conductivity properties were collected, with a partial data set as shown in table 7 (only a portion of the data is shown due to the large amount of data).

TABLE 7 collected partial noble Metal electric contact alloy Material composition (Wt%) and corresponding conductivity (% IACS) Performance data

No.	Ag	Cu	Ni	Au	Pd	Ce	Pt	Cd	V	Conductivity (% IACS)
1	90	10	0	0	0	0	0	0	0	90.74
2	85	15	0	0	0	0	0	0	0	82.1
3	80	20	0	0	0	0	0	0	0	82.1
4	75	25	0	0	0	0	0	0	0	82.1
5	72	28	0	0	0	0	0	0	0	78.37
6	50	50	0	0	0	0	0	0	0	78.37
7	78	0	0	0	22	0	0	0	0	16.9
8	70	0	0	0	30	0	0	0	0	11.49
9	60	0	0	0	40	0	0	0	0	8.62
10	50	0	0	0	50	0	0	0	0	5.7
11	90	0	0	10	0	0	0	0	0	47.89
12	80	0	0	20	0	0	0	0	0	28.74
13	60	0	0	40	0	0	0	0	0	20.28
14	40	0	0	60	0	0	0	0	0	15.67
15	95	0	0	0	0	0	5	0	0	45.37
16	90	0	0	0	0	0	10	0	0	29.73
17	88	0	0	0	0	0	12	0	0	28.74
18	80	0	0	0	0	0	20	0	0	17.07
19	20	0	0	0	0	0	80	0	0	5.75
20	25	0	0	0	0	0	75	0	0	5.3
21	86	0	0	0	0	0	0	14	0	59.45
22	84	0	0	0	0	0	0	16	0	43.1
23	83	0	0	0	0	0	0	17	0	35.92

(2) A conductivity feature set of the noble metal electrical contact alloy materials containing 120 features is constructed through machine learning feature engineering. Because this part of the work belongs to another, as yet unpublished research result, the specific features are not shown here; the feature construction process is similar to that of embodiment 1, except that the types and numbers of the features differ.

(3) The candidate feature set is preliminarily screened through linear correlation filtering. In this screening, the degree of linear correlation between each pair of alloy features is evaluated through the linear regression correlation coefficient R (see formula (1)), and |R| > 0.95 is taken as strong linear correlation. Alloy features that are strongly linearly correlated with one another are placed in the same group, and within each group the alloy feature that gives the lowest modeling error when used as the single input feature is selected to represent the group in the subsequent screening. After grouping, the alloy features within each group are strongly linearly correlated (|R| > 0.95), while the alloy features of different groups are not strongly linearly correlated. After linear correlation screening, 63 alloy features remain for the conductivity model.

(4) Based on a genetic algorithm search that limits the number of features, the features remaining after linear correlation filtering are further screened 60 times, with 15 features screened out each time. The model parameters used are: 100 generations, a population of 200, model parameter optimization, 50 random samplings, and 5-fold cross validation for each sampling.

(5) After the features are screened out by the genetic algorithm, feature weight ranking is used to rank the importance of the features.

(6) The 15 most important key features at the top of the feature weight ranking are taken as the candidate features for exhaustive screening, and the feature combination with the best model prediction precision is then found by exhaustion. The precision and generalization ability of each feature combination in the model are evaluated by the model mean absolute percentage error (MAPE), and the feature combination with the lowest relative error is taken as the finally screened feature combination. The specific alloy features obtained for the conductivity model by exhaustive screening are shown in table 8.

TABLE 8 results of conductivity model alloy feature screening

Feature number	Feature
3	Group number average
10	Third ionization energy average value
95	Variance of mass attenuation coefficient (Cr Kα)
62	Variance of chemical potential energy

(7) Based on the feature screening result, regression modeling is carried out with the same learner as used during alloy feature screening, namely the support vector machine (SVM) algorithm. The modeling result for the conductivity shows that the percentage error is 4.12% on the training set and 3.99% on the test set. Both errors are small, indicating that the model trained on the screened conductivity feature combination performs well and generalizes well.

Claims

1. The method for realizing the key feature combination screening of the machine learning candidate features is characterized by comprising the following steps of:

(1) The candidate feature set is preliminarily screened through linear correlation filtering, a correlation coefficient larger than 0.95 is taken as strong linear correlation, and the features in the feature set that are strongly linearly correlated with one another are placed in the same group;

(2) Based on a genetic algorithm search that limits the number of features, the features remaining after linear correlation filtering are further screened K times, and m features are screened out each time;

(3) After the features are screened out by the genetic algorithm, feature weight ranking is used to rank the importance of the features;

(4) The 12 to 16 most important key features at the top of the feature weight ranking are taken as the candidate features for exhaustive screening, and the feature combination with the best model prediction precision is then found by exhaustion.

2. The method for implementing key feature combination screening for a machine learning candidate feature of claim 1, wherein: the number of features in the candidate feature set is greater than or equal to 60.

3. The method for implementing key feature combination screening for a machine learning candidate feature of claim 1, wherein: the formula for evaluating the correlation coefficient in step (1) is as follows:

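$$R = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2 \sum_{i=1}^{N} (y_i - \bar{y})^2}}$$

where N is the number of samples, $x_i$ and $y_i$ represent two different features of the i-th alloy (i = 1, 2, ..., N), and $\bar{x}$ and $\bar{y}$ represent the averages of these two features over the N alloys.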

4. The method for implementing key feature combination screening for a machine learning candidate feature of claim 1, wherein: the limited number of features in step (2) is 5 to 15, namely m = 5 to 15; in addition, the number of screenings K is greater than 15, and the model parameters used are: 50 to 150 generations, a population of 150 to 250, model parameter optimization, 20 to 70 random samplings, and 5- to 10-fold cross validation for each sampling.

5. The method for implementing key feature combination screening for a machine learning candidate feature of claim 1, wherein: in step (3), feature weight ranking is used to rank the importance of the features, and the process is as follows:

(a) The remaining n features after linear correlation screening constitute feature set F:

$$F = \{X_1, X_2, X_3, X_4, \ldots, X_i, X_{i+1}, \ldots, X_n\} \tag{3}$$

wherein $X_i$ represents the i-th feature remaining after screening, and n is the number of features remaining after linear correlation screening;

(b) K screenings are carried out with the genetic algorithm, and m features are screened out each time; regression modeling is carried out on the m features of each screening, giving the prediction precision $p_k$ shown in formula (4), wherein $p_k = 1 - \mathrm{MAPE}$ and MAPE is the mean absolute percentage error of the model;

$$\begin{aligned}
1 &= \{X_1, X_2, X_3, X_4, \ldots, X_i, X_{i+1}, \ldots, X_m\} \rightarrow p_1 \\
2 &= \{X_1, X_2, X_3, X_4, \ldots, X_i, X_{i+1}, \ldots, X_m\} \rightarrow p_2 \\
3 &= \{X_1, X_2, X_3, X_4, \ldots, X_i, X_{i+1}, \ldots, X_m\} \rightarrow p_3 \\
&\;\;\vdots \\
K &= \{X_1, X_2, X_3, X_4, \ldots, X_i, X_{i+1}, \ldots, X_m\} \rightarrow p_K
\end{aligned} \tag{4}$$

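(c) The feature weights are shown in formula (5): the weight of a feature equals the sum, over all genetic algorithm screenings, of the product of whether that feature was screened out and the model prediction precision of that screening, and these sums are subsequently ranked across the different features;

$$W_A = \sum_{k=1}^{K} \delta_k^{A}\, p_k, \quad W_B = \sum_{k=1}^{K} \delta_k^{B}\, p_k, \quad \ldots, \quad W_N = \sum_{k=1}^{K} \delta_k^{N}\, p_k \tag{5}$$

wherein $W_A$ is the feature weight of feature A, and $\delta_k^{A}$ indicates whether feature A is screened out in the k-th screening: $\delta_k^{A} = 1$ if feature A is screened out, otherwise $\delta_k^{A} = 0$;

$\delta_k^{B}$ indicates whether feature B is screened out in the k-th screening: $\delta_k^{B} = 1$ if feature B is screened out, otherwise $\delta_k^{B} = 0$;

…

And so on, up to $\delta_k^{N}$, which indicates whether feature N is screened out in the k-th screening: $\delta_k^{N} = 1$ if feature N is screened out, otherwise $\delta_k^{N} = 0$;

$$I_n = \mathrm{rank}(W_A, W_B, W_C, \ldots, W_N) \tag{6}$$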

6. The method for implementing key feature combination screening for a machine learning candidate feature of claim 1, wherein: in step (4), the precision and generalization ability of each feature combination in the model are evaluated by the model mean absolute percentage error (MAPE), and the feature combination with the lowest relative error is taken as the finally screened feature combination.