CN115691700A - High-entropy alloy hardness prediction method based on double-granularity clustering integration algorithm of three consensus strategies


Info

Publication number
CN115691700A
Authority
CN
China
Prior art keywords: clustering, consensus, base, cluster, algorithm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211397847.9A
Other languages
Chinese (zh)
Other versions
CN115691700B (en)
Inventor
李述
单云霄
李帅
崔禹欣
李福祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin University of Science and Technology
Original Assignee
Harbin University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin University of Science and Technology filed Critical Harbin University of Science and Technology
Priority to CN202211397847.9A priority Critical patent/CN115691700B/en
Publication of CN115691700A publication Critical patent/CN115691700A/en
Application granted granted Critical
Publication of CN115691700B publication Critical patent/CN115691700B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a high-entropy alloy hardness prediction method based on a double-granularity clustering integration algorithm of three consensus strategies, relates to the technical field of alloy hardness prediction, and aims to solve the problems in the prior art that a single clustering method cannot be applied simultaneously to data sets with different distribution characteristics and cannot achieve a stable, uniform clustering effect even under the same data distribution. The invention takes the high-entropy alloy data set X = {x_1, x_2, …, x_N} ∈ R^h and generates a base clustering combination Π = {π_1, π_2, …, π_M} using clustering algorithms; a given consensus function is embedded into a selection strategy, noise members in the base clustering set are removed, the consensus result of the base clustering combination is calculated, and an adjustable DS evidence theory consensus strategy is adopted to fuse the consensus results obtained after removing the noise members, yielding the final partition results of the heterogeneous clusters; a regression model is then established for each heterogeneous cluster to perform hardness prediction. The clustering method adopted by the invention can extract multiple pieces of base clustering information and achieve a clustering result with better performance.

Description

High-entropy alloy hardness prediction method based on double-granularity clustering integration algorithm of three consensus strategies
Technical Field
The invention relates to the technical field of alloy hardness prediction, in particular to a high-entropy alloy hardness prediction method based on a double-granularity clustering integration algorithm of three consensus strategies.
Background
In recent years, guided by the idea of multi-component alloy design, researchers have discovered a novel metal material, the high-entropy alloy, which exhibits both structural and chemical disorder by changing and modulating the configurational entropy of the alloy system; it offers high hardness, good wear resistance, excellent low-temperature fracture toughness, excellent magnetic performance and other outstanding physical and mechanical properties. For such material design, conventional experiments or theoretical calculations consume a great deal of time and raw materials and place high demands on experimental equipment. Compared with complex theoretical calculations, machine learning methods can effectively infer the relationship between material characteristics and target attributes by constructing a model, without a large cost in time and money. However, a given unknown high-entropy alloy data set may contain alloy materials governed by different intrinsic properties and rules; if all the alloy materials in the data set are pooled to train a single hardness prediction model, it is difficult to obtain an accurate prediction model.
Clustering, as an analysis technique that requires no prior knowledge, plays a key role in exploring the internal structural information of data. A prior patent, "High-entropy alloy hardness prediction method based on an improved density peak clustering algorithm" (application number CN202210221449.5), improves model prediction ability through an improved density peak clustering algorithm, but it is still constrained by a limited applicable range: the method cannot be applied simultaneously to data sets with different distribution characteristics, and it cannot achieve a stable, uniform clustering effect even under the same data distribution; it sacrifices either stability or accuracy and generalization ability, and cannot achieve a good clustering effect on data of all distribution types. Existing research has shown that the selection of the member subset of base clusterings has a crucial influence on the final consensus clustering result; fusing the information of all members does not necessarily yield the optimal clustering result, because the participation of poor-quality noise members in the base clusterings weakens the contribution of other high-quality members and suppresses the overall integration effect. This hidden danger can be avoided by adopting cluster ensemble selection (CES) technology. However, CES technology still has several obstacles to overcome. First, existing selection strategies depend too heavily on parameters and on the structure of the data set itself, and adaptive selection strategies are lacking. Second, the reconstructed cluster-to-cluster, sample-to-sample or cluster-to-sample relation matrices ignore the actual spatial position information among samples, so the one-sidedness of the relation matrix prevents the real relations among objects from being described accurately, which affects the final consensus result. Moreover, a global perspective for resolving the inconsistent partitions produced by different consensus strategies is lacking. DS evidence theory is an effective means of handling conflict and uncertainty, but its application in cluster ensembles is usually concentrated on fusing single clustering results, and it has not been studied at the consensus-strategy level of a cluster ensemble selection framework. Furthermore, when high conflict exists among the evidences, traditional DS evidence theory lacks robustness, which reduces the reliability of the fusion result; therefore, a higher-dimensional perspective is required to integrate the different consensus results.
Disclosure of Invention
The technical problem to be solved by the invention is as follows:
the single clustering method is adopted, because the constraint of the limited applicable range can not be simultaneously suitable for the data sets with different distribution characteristics, the stable and uniform clustering effect can not be achieved even under the same data distribution; either the stability is sacrificed, or the accuracy or the generalization capability is sacrificed, and meanwhile, the existing integrated clustering algorithm cannot effectively integrate the partitioning conflict generated among different consensus results; satisfactory clustering effect cannot be achieved;
the technical scheme adopted by the invention for solving the technical problems is as follows:
the invention provides a high-entropy alloy hardness prediction method based on a double-granularity clustering integration algorithm of three consensus strategies, which is based on a backward clustering integration selection framework (BCESF), and comprises the following steps:
s1, a base clustering generation process;
for a high-entropy alloy dataset X = {x_1, x_2, …, x_N} ∈ R^h, where x is a high-entropy alloy sample point and h is the feature dimension of each sample point, M base clustering results are generated using clustering algorithms to obtain the base clustering combination Π = {π_1, π_2, …, π_M};
S2, selecting a subset of members of the base cluster;
embedding a given consensus function into the selection strategy, calculating the consensus result of the base clustering combination Π = {π_1, π_2, …, π_M} under the given consensus strategy, and removing noise members from the base clustering set to obtain the optimal clustering subset Π* = {π_1*, π_2*, …, π_L*}, L ≤ M;
S3, consensus clustering process;
based on the optimal clustering subsets obtained in step S2, fusing the optimal consensus results obtained under the respective consensus functions by adopting an adjustable DS evidence theory to obtain the final partition results of the heterogeneous clusters;
and S4, establishing a regression model for each of the obtained heterogeneous clusters, and performing high-entropy alloy hardness prediction calculation.
Further, the S1 includes the following processes:
screening out possible candidate cluster centers C_P through the density peak clustering algorithm, using the local density ρ_i and the relative distance δ_i, as in equation (1) (formula image not extracted);
setting the random initialization range of the cluster number according to equation (2) (formula image not extracted), where |C_P| is the number of elements in the set C_P;
randomly deleting one attribute of each highly correlated attribute pair using the Pearson correlation coefficient algorithm, and generating a base clustering result with the remaining features;
generating M/2 base clustering results each with the fuzzy C-means algorithm and the density peak clustering algorithm, to obtain the base clustering combination Π = {π_1, π_2, …, π_M}.
Further, the S2 includes the following processes:
S21, based on the base clustering combination Π = {π_1, π_2, …, π_M}, calculating the consensus result of the base clustering combination under a given consensus strategy, and calculating the normalized mutual information (NMI) value of the consensus result;
S22, on the basis of the base clustering combination Π = {π_1, π_2, …, π_M}, independently calculating the NMI value of each combination obtained by removing one base clustering in turn, and then selecting the combination of M-1 base clusterings whose NMI value is optimal;
S23, based on the obtained combination of M-1 base clusterings, independently calculating the NMI value of each combination obtained by removing one base clustering in turn, and selecting the combination of M-2 base clusterings whose NMI value is optimal; the calculation is iterated in this way until no base clustering can be removed;
selecting the base clustering combination Π* = {π_1*, π_2*, …, π_L*}, L ≤ M, with the highest NMI score as the optimal base clustering subset under the given consensus strategy.
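As an illustration only, the backward member-selection loop of S21-S23 might be sketched as follows. The patent gives no pseudocode, so the scoring of a combination (taken here as the average NMI between the consensus partition and each member base clustering), the stopping point, and the function name consensus_fn are all assumptions; consensus_fn stands for any of the consensus strategies described below.

```python
# Hypothetical sketch of the backward base-clustering selection (S21-S23).
# `base_clusterings` is a list of label arrays; `consensus_fn` maps a list of
# members to a single consensus partition.
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

def score_combination(members, consensus_fn):
    """Consensus of `members`, scored as the average NMI against each member (assumed)."""
    consensus = consensus_fn(members)
    return np.mean([normalized_mutual_info_score(consensus, m) for m in members])

def backward_select(base_clusterings, consensus_fn):
    current = list(base_clusterings)
    best_members = current
    best_score = score_combination(current, consensus_fn)
    while len(current) > 2:                      # stop before the set becomes trivial (assumption)
        # Score every combination obtained by removing exactly one member.
        trials = [(score_combination(current[:i] + current[i + 1:], consensus_fn), i)
                  for i in range(len(current))]
        score, idx = max(trials)
        current = current[:idx] + current[idx + 1:]
        if score > best_score:                   # remember the best-scoring subset seen so far
            best_members, best_score = current, score
    return best_members
```

Because every candidate removal re-runs the consensus function, the loop evaluates on the order of M² combinations, which matches the iterative removal described in S22 and S23.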
Further, the consensus strategy in S21 includes a spectral clustering base consensus strategy and a density peak clustering base consensus strategy;
the spectral clustering base consensus strategy takes the modified similarity matrix S_DIS as input, and constructs a new undirected graph G = (V, E) with the sample points as nodes and S_DIS as the adjacency matrix between nodes, where V = X is the node set composed of the sample points and E is the edge set; in the undirected graph G, the similarity matrix S_DIS determines the weight of an edge, and for given nodes x_i and x_j the edge weight between them is defined by equation (3) (formula image not extracted);
the Laplacian matrix of the undirected graph G is regularized as in equation (4) (formula image not extracted), where I is the identity matrix, D ∈ R^(N×N) is the degree matrix, and the diagonal elements of D are defined from S_DIS (formula image not extracted);
eigenvalue decomposition is performed on the regularized Laplacian to obtain the eigenvectors corresponding to the C* smallest eigenvalues; these C* eigenvectors are column-normalized and assembled into a new matrix F ∈ R^(N×C*); finally, a K-means clustering algorithm is applied on the basis of the matrix F to obtain the consensus clustering result π_SC, namely equation (5) (formula image not extracted), where the input is the optimal base clustering member combination obtained by embedding SC as the consensus strategy into the BCESF algorithm.
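Purely as a sketch, the SC consensus step could be implemented as below. The exact forms of equations (3)-(5) are only present as images in the original, so the symmetric normalized Laplacian L = I - D^(-1/2) S_DIS D^(-1/2) used here is an assumption consistent with the stated identity matrix I and degree matrix D; S_dis is taken as an already-built N×N modified similarity matrix.

```python
# Illustrative sketch of the spectral clustering base consensus strategy (SC).
import numpy as np
from sklearn.cluster import KMeans

def spectral_consensus(S_dis, n_clusters):
    n = len(S_dis)
    # Degree matrix D and an assumed regularized Laplacian L = I - D^(-1/2) S D^(-1/2)
    d = S_dis.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = np.eye(n) - d_inv_sqrt @ S_dis @ d_inv_sqrt
    # Eigenvectors of the C* smallest eigenvalues of the regularized Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)
    F = eigvecs[:, :n_clusters]
    # Column-normalize the C* eigenvectors, as described above, to form the matrix F
    F = F / np.maximum(np.linalg.norm(F, axis=0, keepdims=True), 1e-12)
    # K-means on the rows of F gives the consensus partition pi_SC
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(F)
```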
Further, the modified similarity matrix S DIS The establishing process comprises the following steps:
Figure BDA0003934378900000041
Figure BDA0003934378900000042
Figure BDA0003934378900000043
wherein ,di,j Is a sample point x i and xj With min (d) and max (d) being the minimum and maximum values of the distance, respectively.
Further, the density peak clustering base consensus strategy in S21 takes the modified distance matrix D_SIM as input, and calculates the local density ρ_i based on the distance matrix D_SIM, as in equation (9) (formula image not extracted), where d_c is the truncation distance, usually taken at the position of 1%-2% of the distances sorted in ascending order;
when x_i is not the point of maximum local density, its relative distance δ_i is determined by the nearest sample point x_j, as in equation (10) (formula image not extracted);
when x_i is the point of maximum local density, its relative distance δ_i is denoted δ_max, namely:
δ_max = max_j (d_{i,j}) (11)
based on the local density ρ_i and relative distance δ_i obtained above, the first C* sample points with the largest γ_i = ρ_i·δ_i values are selected and marked as cluster centers, where the local density ρ_i and relative distance δ_i satisfy the corresponding threshold conditions (formula images not extracted);
finally, each remaining non-center point is assigned to the same cluster as its nearest point, yielding the consensus clustering result π_DC, namely equation (12) (formula image not extracted), where the input is the optimal base clustering member combination obtained by embedding DC as the consensus strategy into the BCESF algorithm.
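A rough sketch of the DC consensus step is given below. Since equations (9), (10) and (12) are only available as images, the cut-off-kernel density, the higher-density-neighbor rule for δ_i, and the assignment order are assumptions based on standard density peak clustering; D_sim stands for the modified N×N distance matrix described here.

```python
# Illustrative sketch of the density peak clustering base consensus strategy (DC).
import numpy as np

def density_peak_consensus(D_sim, n_clusters, percentile=2.0):
    n = len(D_sim)
    # Truncation distance d_c at the 1%-2% position of the sorted pairwise distances
    d_c = np.percentile(D_sim[np.triu_indices(n, k=1)], percentile)
    rho = (D_sim < d_c).sum(axis=1) - 1                 # local density (cut-off kernel, assumed)
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    order = np.argsort(-rho)                            # points sorted by decreasing density
    delta[order[0]] = D_sim[order[0]].max()             # delta_max for the highest-density point
    for pos in range(1, n):
        i = order[pos]
        higher = order[:pos]                            # all points of higher (or equal) density
        j = higher[np.argmin(D_sim[i, higher])]
        delta[i], nearest_higher[i] = D_sim[i, j], j
    gamma = rho * delta
    centers = np.argsort(-gamma)[:n_clusters]           # first C* largest gamma values as centers
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    for i in order:                                     # assign remaining points in density order
        if labels[i] == -1:
            j = nearest_higher[i]
            labels[i] = labels[j] if j >= 0 else labels[centers[np.argmin(D_sim[i, centers])]]
    return labels
```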
Further, the distance matrix D_SIM is established by equations (13) and (14) (formula images not extracted).
Further, the calculation process of S3 is as follows:
first, the K nearest neighbors NN_k(x_i) of each sample point x_i are calculated as in equation (15) (formula image not extracted), where N_k(x_i) is the k-th neighbor of sample point x_i;
based on NN_k(x_i) and the q-th clustering integration algorithm Y_q, the initial value of the basic probability m_q(A_r) that sample point x_i belongs to cluster label r is calculated as in equation (16) (formula image not extracted), where |r(x_j)| is the number of elements among the K neighbors of x_i that belong to cluster label r;
the initial m_q(A_r) is weighted to obtain a weighted basic probability (symbol image not extracted), which is determined by the adjustable coefficient w_q and m_q(A_r), namely equation (17), where w_q is defined by equations (18) and (19) (formula images not extracted);
as shown in equation (20) (formula image not extracted), the Q consensus results are fused to obtain the fusion result m(A_r);
the confidence value of class A_r is calculated by equation (21) (formula image not extracted);
finally, the cluster label to which each sample point belongs is assigned according to the obtained confidence values, as in equation (22) (formula image not extracted), yielding the fusion result π_DSC based on the consensus strategy DSC, namely:
π_DSC = BCESF-DSC(Y_1, Y_2, …, Y_Q) (23).
Further, the regression model in S4 is a linear SVR model.
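As a minimal sketch of step S4, one linear SVR regressor can be fitted per heterogeneous cluster. How unseen samples are assigned to a cluster is not specified in this section, so the cluster labels of new samples are simply passed in here; all names are illustrative.

```python
# Minimal sketch: one linear SVR hardness regressor per heterogeneous cluster.
import numpy as np
from sklearn.svm import LinearSVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def fit_per_cluster_svr(X, y, labels):
    """Fit a scaled linear SVR on the samples of each cluster label."""
    models = {}
    for c in np.unique(labels):
        mask = labels == c
        models[c] = make_pipeline(StandardScaler(),
                                  LinearSVR(C=1.0, max_iter=10000)).fit(X[mask], y[mask])
    return models

def predict_hardness(models, X_new, new_labels):
    """Predict hardness of new samples, routing each one to its cluster's model."""
    y_pred = np.empty(len(X_new))
    for c, model in models.items():
        mask = new_labels == c
        if np.any(mask):
            y_pred[mask] = model.predict(X_new[mask])
    return y_pred
```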
A high-entropy alloy hardness prediction system based on the dual-granularity clustering integration algorithm of three consensus strategies is provided with program modules corresponding to the steps of any one of the above technical solutions and, when run, executes the steps of the high-entropy alloy hardness prediction method based on the dual-granularity clustering integration algorithm of the three consensus strategies.
Compared with the prior art, the invention has the beneficial effects that:
The invention discloses a high-entropy alloy hardness prediction method based on a double-granularity clustering integration algorithm of three consensus strategies. A BCESF method that requires no preset parameter thresholds or human intervention is designed, and three different consensus strategies, SC, DC and DSC, are adopted. The consensus strategies SC and DC simultaneously consider the internal relation between co-occurrence frequency and actual spatial position information and take the reconstructed relation matrices as input, so that more realistic data structure information can be mined. The consensus strategy DSC adopts an improved adjustable DS evidence theory to fuse the consensus results of SC and DC at the ensemble-of-ensembles level. The consensus strategy based on the adjustable DS evidence theory not only adjusts the label probability adaptively, so that it automatically adapts to changes in the data set structure and the integration means, but also resolves conflicts better than the traditional DS evidence theory, thereby obtaining a consensus result with higher confidence. The method extracts multiple pieces of base clustering information and designs three consensus strategies at the double-granularity level to accurately capture hidden, complicated structural information and obtain a final clustering result with better performance.
Drawings
FIG. 1 is a flowchart of a high-entropy alloy hardness prediction method based on a dual-granularity clustering integration algorithm of three consensus strategies in an embodiment of the invention;
FIG. 2 is a schematic diagram of an adjustable DS evidence theory model fusion in an embodiment of the present invention;
FIG. 3 is a comparison of the hardness prediction results of multiple models in the embodiment of the invention, showing from top to bottom the fit between the experimental results and the average prediction results of 30 runs under an 80% training set and 20% test set for the linear SVR model, the BCESF-SC + linear SVR model, the BCESF-DC + linear SVR model, and the method of the invention;
FIG. 4 is a comparison of the hardness prediction results of multiple models in the embodiment of the invention, showing from top to bottom the fit between the experimental results and the average prediction results of 30 runs under a 70% training set and 30% test set for the linear SVR model, the BCESF-SC + linear SVR model, the BCESF-DC + linear SVR model, and the method of the invention.
Detailed Description
In the description of the present invention, it should be noted that the terms "first", "second" and "third" mentioned in the embodiments of the present invention are only used for descriptive purposes and are not to be construed as indicating or implying relative importance or implicitly indicating the number of indicated technical features. Thus, features defined as "first", "second", and "third" may explicitly or implicitly include one or more of the features.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
Referring to fig. 1 to 4, the invention provides a high-entropy alloy hardness prediction method based on a dual-granularity clustering integration algorithm of three consensus strategies, as shown in fig. 1, comprising the following steps:
s1, a base clustering generation process;
For a high-entropy alloy dataset X = {x_1, x_2, …, x_N} ∈ R^h, x is a high-entropy alloy sample point and h is the feature dimension of each sample point.
The local density ρ_i and the relative distance δ_i are calculated by equations (1) and (2) (formula images not extracted), and for the point of maximum local density
δ_max = max_j (d_{i,j}) (3)
Possible candidate cluster centers C_P are screened out through the density peak clustering algorithm using the local density ρ_i and relative distance δ_i, as in equation (4) (formula image not extracted).
The random initialization range of the cluster number is set according to equation (5) (formula image not extracted), where |C_P| is the number of elements in the set C_P.
When the random generation range of the cluster number is set in the conventional manner ([c, 2c] or a similar interval; formula image not extracted), a right boundary that differs too much from the real cluster number may generate base clusterings that deviate severely from the actual ones, affecting the final integration effect. Therefore, a more reasonable right boundary value is determined by the method of this embodiment.
A Pearson correlation coefficient algorithm is used to randomly delete one attribute of each highly correlated attribute pair, where a highly correlated attribute pair is one whose correlation absolute value reaches the set threshold, and base clustering results are generated with the remaining features.
M/2 base clustering results are generated with the fuzzy C-means algorithm and M/2 with the density peak clustering algorithm, giving the base clustering combination Π = {π_1, π_2, …, π_M}. Generating the base clusterings under these two distinct and complementary partitioning modes better balances quality and diversity.
The above method avoids overly extreme members in the base clustering generation process, achieves an optimal balance between quality and diversity, and lays a solid foundation for the subsequent steps.
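For illustration, the whole S1 generation step might look like the sketch below. The correlation threshold, the fuzzy C-means implementation (the third-party scikit-fuzzy package) and the reuse of the density_peak_consensus() sketch given earlier on plain Euclidean distances are all assumptions; the bounds k_low and k_high stand for the cluster-number initialization range defined by equations (4) and (5).

```python
# Illustrative sketch of base clustering generation (S1): Pearson-based feature
# pruning plus M/2 fuzzy C-means and M/2 density-peak base clusterings.
import numpy as np
import skfuzzy as fuzz                         # third-party FCM implementation (assumption)
from scipy.spatial.distance import pdist, squareform

def prune_correlated_features(X, threshold=0.9, rng=None):
    """Randomly drop one feature of every highly correlated pair (|r| > threshold, assumed)."""
    rng = rng or np.random.default_rng()
    corr = np.corrcoef(X, rowvar=False)
    keep = set(range(X.shape[1]))
    for i in range(corr.shape[0]):
        for j in range(i + 1, corr.shape[1]):
            if abs(corr[i, j]) > threshold and i in keep and j in keep:
                keep.discard(int(rng.choice([i, j])))
    return X[:, sorted(keep)]

def generate_base_clusterings(X, M, k_low, k_high, rng=None):
    rng = rng or np.random.default_rng()
    partitions = []
    for _ in range(M // 2):                    # fuzzy C-means members
        k = int(rng.integers(k_low, k_high + 1))
        Xp = prune_correlated_features(X, rng=rng)
        _, u, *_ = fuzz.cmeans(Xp.T, c=k, m=2.0, error=1e-5, maxiter=300)
        partitions.append(u.argmax(axis=0))    # hard labels from the membership matrix
    for _ in range(M // 2):                    # density peak clustering members
        k = int(rng.integers(k_low, k_high + 1))
        Xp = prune_correlated_features(X, rng=rng)
        partitions.append(density_peak_consensus(squareform(pdist(Xp)), n_clusters=k))
    return partitions
```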
S2, selecting a subset of members of the base cluster;
embedding a given consensus function into the selection strategy, calculating the consensus result of the base clustering combination Π = {π_1, π_2, …, π_M} under the given consensus strategy, and removing noise members from the base clustering set to obtain the optimal clustering subset Π* = {π_1*, π_2*, …, π_L*}, L ≤ M.
The S2 comprises the following processes:
S21, based on the base clustering combination Π = {π_1, π_2, …, π_M}, calculating the consensus result of the base clustering combination under a given consensus strategy, and calculating the normalized mutual information (NMI) value of the consensus result;
S22, on the basis of the base clustering combination Π = {π_1, π_2, …, π_M}, independently calculating the NMI value of each combination obtained by removing one base clustering in turn, and then selecting the combination of M-1 base clusterings whose NMI value is optimal;
S23, based on the obtained combination of M-1 base clusterings, independently calculating the NMI value of each combination obtained by removing one base clustering in turn, and selecting the combination of M-2 base clusterings whose NMI value is optimal; the calculation is iterated in this way until no base clustering can be removed;
selecting the base clustering combination Π* = {π_1*, π_2*, …, π_L*}, L ≤ M, with the highest NMI score as the optimal base clustering subset under the given consensus strategy.
In the embodiment, the given consensus function is embedded into the selection strategy, and the final basis cluster combination is determined in an iterative manner, so that noise members can be eliminated on the premise of not introducing additional parameters, and the consensus quality of the basis cluster combination can be improved.
The consensus strategy in S21 comprises a spectral clustering base consensus Strategy (SC) and a density peak value clustering base consensus strategy (DC);
The spectral clustering base consensus strategy (SC) takes the modified similarity matrix S_DIS as input, and constructs a new undirected graph G = (V, E) with the sample points as nodes and S_DIS as the adjacency matrix between nodes, where V = X is the node set composed of the sample points and E is the edge set; in the undirected graph G, the similarity matrix S_DIS determines the weight of an edge, and for given nodes x_i and x_j the edge weight between them is defined by equation (6) (formula image not extracted);
the Laplacian matrix of the undirected graph G is regularized as in equation (7) (formula image not extracted), where I is the identity matrix, D ∈ R^(N×N) is the degree matrix, and the diagonal elements of D are defined from S_DIS (formula image not extracted);
eigenvalue decomposition is performed on the regularized Laplacian to obtain the eigenvectors corresponding to the C* smallest eigenvalues, where the number of clusters C* into which the target data set X is ultimately divided needs to be preset; these C* eigenvectors are column-normalized and assembled into a new matrix F ∈ R^(N×C*); finally, a K-means clustering algorithm is applied on the basis of the matrix F to obtain the consensus clustering result π_SC, namely equation (8) (formula image not extracted), where the input is the optimal base clustering member combination obtained by embedding SC as the consensus strategy into the BCESF algorithm.
The modified similarity matrix S_DIS is established by equations (9)-(11) (formula images not extracted), where d_{i,j} is the distance between the sample points x_i and x_j, and min(d) and max(d) are the minimum and maximum values of the distances, respectively.
The density peak clustering base consensus strategy (DC) in S21 takes the modified distance matrix D_SIM as input, and calculates the local density ρ_i based on the distance matrix D_SIM, as in equation (12) (formula image not extracted), where d_c is the truncation distance, generally taken at the position of 1%-2% of the distances sorted in ascending order;
when x_i is not the point of maximum local density, its relative distance δ_i is determined by the nearest sample point x_j, as in equation (13) (formula image not extracted);
when x_i is the point of maximum local density, its relative distance δ_i is denoted δ_max, namely:
δ_max = max_j (d_{i,j}) (14)
based on the local density ρ_i and relative distance δ_i obtained above, the first C* sample points with the largest γ_i = ρ_i·δ_i values are selected and marked as cluster centers, where the local density ρ_i and relative distance δ_i satisfy the corresponding threshold conditions (formula images not extracted);
finally, each remaining non-center point is assigned to the same cluster as its nearest point, yielding the consensus clustering result π_DC, namely equation (15) (formula image not extracted), where the input is the optimal base clustering member combination obtained by embedding DC as the consensus strategy into the BCESF algorithm.
The distance matrix D_SIM is established by equations (16) and (17) (formula images not extracted).
In the cluster ensemble selection problem, the traditional co-association matrix is often used as the input of the consensus function to reflect the similarity relation between sample pairs; that is, for a given set of base clustering members Π = {π_1, π_2, …, π_M}, the set of clusters of all base clusterings in Π is considered (notation image not extracted). The co-association matrix A = {a_ij}_(N×N) then indicates the degree of similarity between two samples: the larger a_ij is, the more base clusterings divide the sample points x_i and x_j into the same cluster. Its expression is given by equations (18) and (19) (formula images not extracted).
As this calculation method shows, it simply counts the co-occurrence of sample pairs in each base clustering and ignores the difference in attraction between different sample pairs; yet the actual distance between samples, even within the same cluster, has a non-trivial impact on the degree of similarity between sample pairs.
In view of this, the present embodiment employs two modified relation matrices to capture the co-occurrence relation between sample pairs more fully; they not only represent the co-occurrence frequency of the sample pairs at the macroscopic level but also take into account the local spatial position information at the microscopic level. The two are fully fused and mutually corrected, mining the deeply hidden internal relations of sample pairs from more diverse angles and providing more accurate and realistic input information for the subsequent consensus strategies.
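For reference, the conventional co-association matrix criticized in this passage can be computed as in the short sketch below, with a_ij being the fraction of base clusterings that place x_i and x_j in the same cluster; the distance-aware corrections S_DIS and D_SIM themselves are not reproduced, since their formulas appear only as images in the original.

```python
# Standard co-association matrix A = {a_ij}: co-occurrence frequency of sample
# pairs across the M base clusterings, ignoring actual spatial distances.
import numpy as np

def co_association(partitions):
    partitions = np.asarray(partitions)        # shape (M, N): one label row per base clustering
    M, N = partitions.shape
    A = np.zeros((N, N))
    for labels in partitions:
        A += (labels[:, None] == labels[None, :]).astype(float)
    return A / M
```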
S3, consensus clustering process;
As shown in FIG. 2, based on the optimal clustering subsets obtained in S2, the optimal consensus results obtained under the respective consensus functions are fused using an adjustable DS evidence theory to obtain the final partition results of the heterogeneous clusters; the calculation process is as follows:
first, the K nearest neighbors NN_k(x_i) of each sample point x_i are calculated as in equation (20) (formula image not extracted), where N_k(x_i) is the k-th neighbor of sample point x_i;
based on NN_k(x_i) and the q-th clustering integration algorithm Y_q, the initial value of the basic probability m_q(A_r) that sample point x_i belongs to cluster label r is calculated as in equation (21) (formula image not extracted), where |r(x_j)| is the number of elements among the K neighbors of x_i that belong to cluster label r;
obviously, m_q(A_r) can effectively represent the basic probability that any sample point belongs to any cluster label by counting the label distribution within its neighborhood.
The initial m_q(A_r) is weighted to obtain a weighted basic probability (symbol image not extracted), which is determined by the adjustable coefficient w_q and m_q(A_r), namely equation (22), where w_q is defined by equations (23) and (24) (formula images not extracted);
as shown in equation (25) (formula image not extracted), the Q consensus results (i.e., the two results of BCESF-SC and BCESF-DC above) are fused to obtain the fusion result m(A_r);
the confidence value of class A_r is calculated by equation (26) (formula image not extracted);
finally, the cluster label to which each sample point belongs is assigned according to the obtained confidence values, i.e., the cluster label with the highest confidence value is the cluster of sample point x_i, as in equation (27) (formula image not extracted), yielding the fusion result π_DSC based on the consensus strategy DSC, namely:
π_DSC = BCESF-DSC(Y_1, Y_2, …, Y_Q) (28).
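A very rough sketch of this fusion layer is given below. Because equations (20)-(27) are only present as images, the neighbor-frequency masses, the discount-style weighting by w_q and the product-style combination are all assumed forms; the consensus labelings are also assumed to already share aligned cluster labels (see the alignment sketch after the next paragraph).

```python
# Hypothetical sketch of the adjustable DS-evidence fusion (BCESF-DSC consensus).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def ds_fuse(X, consensus_labelings, weights, k=10):
    consensus_labelings = np.asarray(consensus_labelings)   # shape (Q, N), aligned labels
    Q, N = consensus_labelings.shape
    C = int(consensus_labelings.max()) + 1
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    neighbors = nn.kneighbors(X, return_distance=False)[:, 1:]   # drop the point itself
    fused = np.ones((N, C))
    for q in range(Q):
        labels = consensus_labelings[q]
        # m_q(A_r): fraction of the K neighbours of x_i carrying label r under result q
        m = np.zeros((N, C))
        for r in range(C):
            m[:, r] = (labels[neighbors] == r).mean(axis=1)
        m_hat = weights[q] * m + (1 - weights[q]) / C        # adjustable weighting (assumed form)
        fused *= m_hat                                       # Dempster-style product combination
    fused /= np.maximum(fused.sum(axis=1, keepdims=True), 1e-12)   # normalize to confidences
    return fused.argmax(axis=1)                              # label with highest confidence
```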
The problem of inconsistent partition results still exists after the target data set has been processed by multiple clustering integration algorithms; therefore, this embodiment adopts a higher-dimensional perspective to integrate the different consensus results globally. The consensus results obtained by the BCESF-SC and BCESF-DC algorithms have the same cluster number C*; BCESF-DSC uses maximum inter-cluster intersection to put the cluster labels in the different results into one-to-one correspondence, providing an effective new idea for solving the problem of inconsistent partitions in cluster integration at the consensus level.
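The maximum inter-cluster intersection matching mentioned above might look like the greedy sketch below; the patent does not spell out the exact matching procedure, so the greedy one-to-one pairing is an assumption, and both partitions are assumed to use the same number of clusters C*.

```python
# Sketch of label alignment by maximum inter-cluster intersection: clusters of
# `other` are renamed to the reference cluster they overlap most, one-to-one.
import numpy as np

def align_labels(reference, other, n_clusters):
    overlap = np.zeros((n_clusters, n_clusters), dtype=int)
    for r in range(n_clusters):
        for c in range(n_clusters):
            overlap[r, c] = np.sum((reference == r) & (other == c))
    mapping, used_ref, used_other = {}, set(), set()
    # Greedily take the pair of clusters with the largest remaining intersection.
    flat_order = np.argsort(-overlap, axis=None)
    for r, c in zip(*np.unravel_index(flat_order, overlap.shape)):
        if r not in used_ref and c not in used_other:
            mapping[int(c)] = int(r)
            used_ref.add(r)
            used_other.add(c)
    return np.array([mapping[int(c)] for c in other])
```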
And S4, respectively establishing a regression model for each of the obtained heterogeneous clusters, and performing high-entropy alloy hardness prediction calculation.
The regression model in S4 is a linear SVR model.
In order to verify the accuracy of the method, aiming at a high-entropy alloy data set containing 601 sample points, the sample point characteristic parameter types are as follows: phase parameters, mechanical parameters, processing preparation parameters and element component molar ratio parameters. The phase parameters comprise valence electron concentration, electronegativity difference, atomic radius difference, mixing enthalpy, mixing entropy, electron concentration and cohesive energy; the mechanical parameters comprise work function, modulus mismatch, shear modulus difference, shear modulus and melting point; the processing preparation parameters comprise casting state, additive manufacturing, powder metallurgy, work hardening and homogenization; the elemental constituent molar ratio parameters include the molar ratios of lithium, magnesium, aluminum, silicon, scandium, titanium, vanadium, chromium, manganese, iron, nickel, cobalt, copper, zinc, zirconium, niobium, molybdenum, tin, hafnium, tantalum, and tungsten.
And respectively adopting a linear SVR model, a BCESF-SC + linear SVR model, a BCESF-DC + linear SVR model and the method of the invention to predict the hardness of the sample points in the data set. The SVR model is to directly adopt an SVR algorithm to carry out high-entropy alloy hardness prediction on a data set; the BCESF-SC + linear SVR model is characterized in that a spectral clustering basis consensus function (SC) is embedded into a selection strategy in the process of selecting the basis clustering member subsets to select the basis clustering member subsets, integrated heterogeneous clusters are finally obtained, then SVR regression models are respectively established for the heterogeneous clusters obtained, and high-entropy alloy hardness prediction calculation is carried out; the BCESF-DC + linear SVR model is characterized in that a density peak value clustering base consensus strategy (DC) is embedded into a selection strategy in the base clustering member subset selection process to perform base clustering member subset selection, integrated heterogeneous clusters are finally obtained, then SVR regression models are respectively established for the heterogeneous clusters obtained, and high-entropy alloy hardness prediction calculation is performed; the prediction results of each model are shown in fig. 3 and 4.
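As a hedged illustration of the evaluation protocol behind FIG. 3 and FIG. 4 (30 repeated runs at 80/20 and 70/30 train/test splits, comparing the averaged R² of each pipeline), the outer loop could be organized as follows; build_and_predict is a placeholder for any of the compared pipelines and must fit on the training split and predict the test split.

```python
# Hypothetical evaluation loop: repeated random splits, averaged R^2 per model.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

def evaluate(X, y, build_and_predict, test_size=0.2, runs=30, seed=0):
    scores = []
    for run in range(runs):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=seed + run)
        y_pred = build_and_predict(X_tr, y_tr, X_te)     # e.g. plain SVR or a BCESF variant
        scores.append(r2_score(y_te, y_pred))
    return float(np.mean(scores))
```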
From the comparison graphs it can be seen intuitively that, under both distribution ratios of training and test sets, the prediction capability of the method of the invention is greatly improved over the original SVR model, achieving an R² boost of around 24% (0.247 and 0.235). In addition, compared with the BCESF-SC + linear SVR model and the BCESF-DC + linear SVR model, the method of the invention also achieves an R² lift of around 3% (0.042, 0.038, 0.031 and 0.024). It is worth noting that the BCESF-SC + linear SVR model and the BCESF-DC + linear SVR model are themselves proposed for the first time in this invention to improve high-entropy alloy hardness prediction, and both already show strong prediction performance. The invention further fuses the consensus results of BCESF-SC and BCESF-DC through a better method, so that the final prediction result meets higher requirements and shows further enhanced high-entropy alloy hardness prediction performance. The method is universal: when similar problems are encountered, it can be combined with other regression models to fundamentally improve their prediction capability.
Although the present disclosure has been described with reference to the above embodiments, the scope of the present disclosure is not limited thereto. Various changes and modifications may be made by those skilled in the art without departing from the spirit and scope of the disclosure, and these changes and modifications are intended to fall within the scope of the disclosure.

Claims (10)

1. A high-entropy alloy hardness prediction method based on a double-granularity clustering integration algorithm of three consensus strategies is characterized by comprising the following steps of:
s1, a base clustering generation process;
for a high-entropy alloy dataset X = {x_1, x_2, …, x_N} ∈ R^h, where x is a high-entropy alloy sample point and h is the feature dimension of each sample point, M base clustering results are generated using clustering algorithms to obtain the base clustering combination Π = {π_1, π_2, …, π_M};
S2, selecting a subset of members of the base cluster;
embedding a given consensus function into the selection strategy, calculating the consensus result of the base clustering combination Π = {π_1, π_2, …, π_M} under the given consensus strategy, and removing noise members from the base clustering set to obtain the optimal clustering subset Π* = {π_1*, π_2*, …, π_L*}, L ≤ M;
S3, consensus clustering process;
fusing the optimal consensus results obtained under the consensus functions by adopting an adjustable DS evidence theory based on the optimal clustering subset obtained in the S2 to obtain final division results of the different clusters;
and S4, respectively establishing a regression model for the obtained different clusters, and performing high-entropy alloy hardness prediction calculation.
2. The method according to claim 1, characterized in that said S1 comprises the following process:
screening out possible candidate cluster centers C_P through the density peak clustering algorithm, using the local density ρ_i and the relative distance δ_i, as in equation (1) (formula image not extracted);
setting the random initialization range of the cluster number according to equation (2) (formula image not extracted), where |C_P| is the number of elements in the set C_P;
randomly deleting one attribute of each highly correlated attribute pair using the Pearson correlation coefficient algorithm, and generating a base clustering result with the remaining features;
generating M/2 base clustering results each with the fuzzy C-means algorithm and the density peak clustering algorithm, to obtain the base clustering combination Π = {π_1, π_2, …, π_M}.
3. The method according to claim 2, characterized in that said S2 comprises the following procedure:
S21, based on the base clustering combination Π = {π_1, π_2, …, π_M}, calculating the consensus result of the base clustering combination under a given consensus strategy, and calculating the normalized mutual information (NMI) value of the consensus result;
S22, on the basis of the base clustering combination Π = {π_1, π_2, …, π_M}, independently calculating the NMI value of each combination obtained by removing one base clustering in turn, and then selecting the combination of M-1 base clusterings whose NMI value is optimal;
S23, based on the obtained combination of M-1 base clusterings, independently calculating the NMI value of each combination obtained by removing one base clustering in turn, and selecting the combination of M-2 base clusterings whose NMI value is optimal; the calculation is iterated in this way until no base clustering can be removed;
selecting the base clustering combination Π* = {π_1*, π_2*, …, π_L*}, L ≤ M, with the highest NMI score as the optimal base clustering subset under the given consensus strategy.
4. The method according to claim 3, wherein the consensus strategy in S21 comprises a spectral clustering base consensus strategy and a density peak clustering base consensus strategy;
the spectral clustering base consensus strategy takes the modified similarity matrix S_DIS as input, and constructs a new undirected graph G = (V, E) with the sample points as nodes and S_DIS as the adjacency matrix between nodes, where V = X is the node set composed of the sample points and E is the edge set; in the undirected graph G, the similarity matrix S_DIS determines the weight of an edge, and for given nodes x_i and x_j the edge weight between them is defined by equation (3) (formula image not extracted);
the Laplacian matrix of the undirected graph G is regularized as in equation (4) (formula image not extracted), where I is the identity matrix, D ∈ R^(N×N) is the degree matrix, and the diagonal elements of D are defined from S_DIS (formula image not extracted);
eigenvalue decomposition is performed on the regularized Laplacian to obtain the eigenvectors corresponding to the C* smallest eigenvalues; these C* eigenvectors are column-normalized and assembled into a new matrix F ∈ R^(N×C*); finally, a K-means clustering algorithm is applied on the basis of the matrix F to obtain the consensus clustering result π_SC, namely equation (5) (formula image not extracted), where the input is the optimal base clustering member combination obtained by embedding SC as the consensus strategy into the BCESF algorithm.
5. The method according to claim 4, characterized in that the modified similarity matrix S_DIS is established by equations (6)-(8) (formula images not extracted), where d_{i,j} is the distance between the sample points x_i and x_j, and min(d) and max(d) are the minimum and maximum values of the distances, respectively.
6. The method of claim 4, wherein the density peak clustering base consensus strategy in S21 takes the modified distance matrix D_SIM as input, and calculates the local density ρ_i based on the distance matrix D_SIM, as in equation (9) (formula image not extracted), where d_c is the truncation distance, generally taken at the position of 1%-2% of the distances sorted in ascending order;
when x_i is not the point of maximum local density, its relative distance δ_i is determined by the nearest sample point x_j, as in equation (10) (formula image not extracted);
when x_i is the point of maximum local density, its relative distance δ_i is denoted δ_max, namely:
δ_max = max_j (d_{i,j}) (11)
based on the local density ρ_i and relative distance δ_i obtained above, the first C* sample points with the largest γ_i = ρ_i·δ_i values are selected and marked as cluster centers, where the local density ρ_i and relative distance δ_i satisfy the corresponding threshold conditions (formula images not extracted);
finally, each remaining non-center point is assigned to the same cluster as its nearest point, yielding the consensus clustering result π_DC, namely equation (12) (formula image not extracted), where the input is the optimal base clustering member combination obtained by embedding DC as the consensus strategy into the BCESF algorithm.
7. The method according to claim 6, characterized in that the distance matrix D_SIM is established by equations (13) and (14) (formula images not extracted).
8. The method according to claim 1, wherein the calculation process of S3 is as follows:
first, the K nearest neighbors NN_k(x_i) of each sample point x_i are calculated as in equation (15) (formula image not extracted), where N_k(x_i) is the k-th neighbor of sample point x_i;
based on NN_k(x_i) and the q-th clustering integration algorithm Y_q, the initial value of the basic probability m_q(A_r) that sample point x_i belongs to cluster label r is calculated as in equation (16) (formula image not extracted), where |r(x_j)| is the number of elements among the K neighbors of x_i that belong to cluster label r;
the initial m_q(A_r) is weighted to obtain a weighted basic probability (symbol image not extracted), which is determined by the adjustable coefficient w_q and m_q(A_r), namely equation (17), where w_q is defined by equations (18) and (19) (formula images not extracted);
as shown in equation (20) (formula image not extracted), the Q consensus results are fused to obtain the fusion result m(A_r);
the confidence value of class A_r is calculated by equation (21) (formula image not extracted);
finally, the cluster label to which each sample point belongs is assigned according to the obtained confidence values, as in equation (22) (formula image not extracted), yielding the fusion result π_DSC based on the consensus strategy DSC, namely:
π_DSC = BCESF-DSC(Y_1, Y_2, …, Y_Q) (23).
9. The method of claim 1, wherein the regression model in S4 is a linear SVR model.
10. A high-entropy alloy hardness prediction system based on a double-granularity clustering integration algorithm of three consensus strategies is characterized by comprising a program module corresponding to the steps of any one of claims 1 to 9 and being used for executing the steps in the high-entropy alloy hardness prediction method based on the double-granularity clustering integration algorithm of the three consensus strategies.
CN202211397847.9A 2022-11-09 2022-11-09 High-entropy alloy hardness prediction method based on dual-granularity clustering integration algorithm of three consensus strategies Active CN115691700B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211397847.9A CN115691700B (en) 2022-11-09 2022-11-09 High-entropy alloy hardness prediction method based on dual-granularity clustering integration algorithm of three consensus strategies

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211397847.9A CN115691700B (en) 2022-11-09 2022-11-09 High-entropy alloy hardness prediction method based on dual-granularity clustering integration algorithm of three consensus strategies

Publications (2)

Publication Number Publication Date
CN115691700A true CN115691700A (en) 2023-02-03
CN115691700B CN115691700B (en) 2023-05-02

Family

ID=85050421

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211397847.9A Active CN115691700B (en) 2022-11-09 2022-11-09 High-entropy alloy hardness prediction method based on dual-granularity clustering integration algorithm of three consensus strategies

Country Status (1)

Country Link
CN (1) CN115691700B (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8195734B1 (en) * 2006-11-27 2012-06-05 The Research Foundation Of State University Of New York Combining multiple clusterings by soft correspondence
US20150178446A1 (en) * 2013-12-18 2015-06-25 Pacific Biosciences Of California, Inc. Iterative clustering of sequence reads for error correction
US20170161606A1 (en) * 2015-12-06 2017-06-08 Beijing University Of Technology Clustering method based on iterations of neural networks
CN107169511A (en) * 2017-04-27 2017-09-15 华南理工大学 Clustering ensemble method based on mixing clustering ensemble selection strategy
CN112232383A (en) * 2020-09-27 2021-01-15 江南大学 Integrated clustering method based on super-cluster weighting
CN113222027A (en) * 2021-05-19 2021-08-06 哈尔滨理工大学 Self-adaptive clustering center density peak value clustering algorithm based on weighted shared nearest neighbor
CN114613456A (en) * 2022-03-07 2022-06-10 哈尔滨理工大学 High-entropy alloy hardness prediction method based on improved density peak value clustering algorithm
CN114663770A (en) * 2022-04-12 2022-06-24 聊城大学 Hyperspectral image classification method and system based on integrated clustering waveband selection

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
FUXIANG LI et al.: "A New Density Peak Clustering Algorithm Based on Cluster Fusion Strategy"
YUNXIAO SHAN et al.: "A Density Peaks Clustering Algorithm With Sparse Search and K-d Tree"
吕红伟; 王士同: "Clustering ensemble algorithm for predictive subspace clustering", Journal of Chinese Computer Systems (小型微型计算机系统)
王留洋; 俞扬信; 陈伯伦; 章慧: "An identification-information method for improving document clustering based on consensus and classification"
鲍舒婷; 孙丽萍; 郑孝遥; 郭良敏: "Density peak clustering algorithm based on shared nearest neighbor similarity", Journal of Computer Applications (计算机应用)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116434880A (en) * 2023-03-06 2023-07-14 哈尔滨理工大学 High-entropy alloy hardness prediction method based on fuzzy self-consistent clustering integration
CN116434880B (en) * 2023-03-06 2023-09-08 哈尔滨理工大学 High-entropy alloy hardness prediction method based on fuzzy self-consistent clustering integration

Also Published As

Publication number Publication date
CN115691700B (en) 2023-05-02

Similar Documents

Publication Publication Date Title
CN114613456B (en) High-entropy alloy hardness prediction method based on improved density peak clustering algorithm
CN112926397B (en) SAR image sea ice type classification method based on two-round voting strategy integrated learning
CN108446619B (en) Face key point detection method and device based on deep reinforcement learning
CN112766425A (en) Deep missing clustering machine learning method and system based on optimal transmission
CN115691700A (en) High-entropy alloy hardness prediction method based on double-granularity clustering integration algorithm of three consensus strategies
CN110675912B (en) Gene regulation and control network construction method based on structure prediction
CN110097176A (en) A kind of neural network structure searching method applied to air quality big data abnormality detection
CN104021230B (en) Collaborative filtering method based on community discovery
CN111340069A (en) Incomplete data fine modeling and missing value filling method based on alternate learning
CN109636809B (en) Image segmentation level selection method based on scale perception
CN112001950A (en) Multi-target tracking algorithm based on target detection and feature extraction combined model
CN109711439A (en) A kind of extensive tourist's representation data clustering method in density peak accelerating neighbor seaching using Group algorithm
Ding et al. Histogram-based estimation of distribution algorithm: A competent method for continuous optimization
CN108549729B (en) Personalized user collaborative filtering recommendation method based on coverage reduction
CN108256564B (en) Self-adaptive template matching method and device based on distance measurement dissimilarity
Lazarevic et al. Clustering-regression-ordering steps for knowledge discovery in spatial databases
CN115394381A (en) High-entropy alloy hardness prediction method and device based on machine learning and two-step data expansion
CN113256645B (en) Color image segmentation method based on improved density clustering
CN116434880B (en) High-entropy alloy hardness prediction method based on fuzzy self-consistent clustering integration
CN103902982B (en) Target tracking method based on soft distribution BoF
CN114580492A (en) Cross-domain pedestrian re-identification method based on mutual learning
CN109389127A (en) Structuring multiple view Hessian regularization sparse features selection method
CN115018247A (en) Power transmission and transformation project evaluation method based on fuzzy hierarchical analysis and improved weighted combination
CN109785331B (en) Sonar image segmentation method based on self-adaptive pixel value constraint and MRF
CN110782949A (en) Multilayer gene weighting grouping method based on maximum minimum sequence search

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant