CN115691700A - High-entropy alloy hardness prediction method based on double-granularity clustering integration algorithm of three consensus strategies


Info

Publication number
CN115691700A
Authority
CN
China
Prior art keywords: clustering, consensus, base, cluster, algorithm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211397847.9A
Other languages
Chinese (zh)
Other versions
CN115691700B (en)
Inventor
李述
单云霄
李帅
崔禹欣
李福祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin University of Science and Technology
Original Assignee
Harbin University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin University of Science and Technology filed Critical Harbin University of Science and Technology
Priority to CN202211397847.9A priority Critical patent/CN115691700B/en
Publication of CN115691700A publication Critical patent/CN115691700A/en
Application granted granted Critical
Publication of CN115691700B publication Critical patent/CN115691700B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a high-entropy alloy hardness prediction method based on a double-granularity clustering integration algorithm of three consensus strategies, relates to the technical field of alloy hardness prediction, and aims to solve the problems in the prior art that a single clustering method cannot be applied simultaneously to data sets with different distribution characteristics and cannot achieve a stable, uniform clustering effect even under the same data distribution. The invention takes the high-entropy alloy data set X = {x_1, x_2, …, x_N} ∈ R^h and generates a base clustering combination Π = {π_1, π_2, …, π_M} using clustering algorithms; a given consensus function is embedded into a selection strategy, noise members in the base clustering set are removed, the consensus result of the base clustering combination is calculated, and an adjustable DS evidence theory consensus strategy is adopted to fuse the consensus results obtained after removing the noise members, yielding the final partition results of the heterogeneous clusters; a regression model is then established for each heterogeneous cluster to perform hardness prediction. The clustering method adopted by the invention can extract multiple pieces of base clustering information and achieve a clustering result with better performance.

Description

High-entropy alloy hardness prediction method based on double-granularity clustering integration algorithm of three consensus strategies
Technical Field
The invention relates to the technical field of alloy hardness prediction, in particular to a high-entropy alloy hardness prediction method based on a double-granularity clustering integration algorithm of three consensus strategies.
Background
In recent years, guided by the idea of multi-component alloy design, researchers have discovered a novel metal material, the high-entropy alloy, which exhibits both structural and chemical disorder by changing and modulating the configurational entropy of the alloy system; it offers high hardness, good wear resistance, excellent low-temperature fracture toughness, excellent magnetic performance and other outstanding physical and mechanical properties. For such material design, conventional experiments or theoretical calculations consume a great deal of time and raw materials and place high demands on experimental equipment. Compared with complex theoretical calculations, machine learning methods can effectively infer the relationship between material characteristics and target attributes by constructing a model, without a large cost in time and money. However, a given unknown high-entropy alloy data set may contain alloy materials governed by different intrinsic properties and rules; if all the alloy materials in the data set are pooled to train a single hardness prediction model, it is difficult to obtain an accurate prediction model.
Clustering, as an analysis technique that requires no prior knowledge, plays a key role in exploring the internal structural information of data. A prior patent, "High-entropy alloy hardness prediction method based on an improved density peak clustering algorithm" (application number CN202210221449.5), improves model prediction ability through an improved density peak clustering algorithm, but it is still constrained by a limited applicable range: the method cannot be applied simultaneously to data sets with different distribution characteristics, and it cannot achieve a stable, uniform clustering effect even under the same data distribution; it sacrifices either stability or accuracy and generalization ability, and cannot achieve a good clustering effect on data of all distribution types. Existing research has shown that the selection of the member subset of base clusterings has a crucial influence on the final consensus clustering result; fusing the information of all members does not necessarily yield the optimal clustering result, because the participation of poor-quality noise members in the base clusterings weakens the contribution of other high-quality members and suppresses the overall integration effect. This hidden danger can be avoided by adopting cluster ensemble selection (CES) technology. However, CES technology still has several obstacles to overcome. First, existing selection strategies depend too heavily on parameters and on the structure of the data set itself, and adaptive selection strategies are lacking. Second, the reconstructed cluster-to-cluster, sample-to-sample or cluster-to-sample relation matrices ignore the actual spatial position information among samples, so the one-sidedness of the relation matrix prevents the real relations among objects from being described accurately, which affects the final consensus result. Moreover, a global perspective for resolving the inconsistent partitions produced by different consensus strategies is lacking. DS evidence theory is an effective means of handling conflict and uncertainty, but its application in cluster ensembles is usually concentrated on fusing single clustering results, and it has not been studied at the consensus-strategy level of a cluster ensemble selection framework. Furthermore, when high conflict exists among the evidences, traditional DS evidence theory lacks robustness, which reduces the reliability of the fusion result; therefore, a higher-dimensional perspective is required to integrate the different consensus results.
Disclosure of Invention
The technical problem to be solved by the invention is as follows:
the single clustering method is adopted, because the constraint of the limited applicable range can not be simultaneously suitable for the data sets with different distribution characteristics, the stable and uniform clustering effect can not be achieved even under the same data distribution; either the stability is sacrificed, or the accuracy or the generalization capability is sacrificed, and meanwhile, the existing integrated clustering algorithm cannot effectively integrate the partitioning conflict generated among different consensus results; satisfactory clustering effect cannot be achieved;
the technical scheme adopted by the invention for solving the technical problems is as follows:
the invention provides a high-entropy alloy hardness prediction method based on a double-granularity clustering integration algorithm of three consensus strategies, which is based on a backward clustering integration selection framework (BCESF), and comprises the following steps:
s1, a base clustering generation process;
for a high-entropy alloy dataset X = {x_1, x_2, …, x_N} ∈ R^h, where x is a high-entropy alloy sample point and h is the feature dimension of each sample point, M base clustering results are generated using clustering algorithms to obtain the base clustering combination Π = {π_1, π_2, …, π_M};
S2, selecting a subset of members of the base cluster;
embedding a given consensus function into the selection strategy, calculating the consensus result of the base clustering combination Π = {π_1, π_2, …, π_M} under the given consensus strategy, and removing noise members from the base clustering set to obtain the optimal clustering subset Π* = {π_1*, π_2*, …, π_L*}, L ≤ M;
S3, consensus clustering process;
based on the optimal clustering subsets obtained in step S2, fusing the optimal consensus results obtained under the respective consensus functions by adopting an adjustable DS evidence theory to obtain the final partition results of the heterogeneous clusters;
and S4, establishing a regression model for each of the obtained heterogeneous clusters, and performing high-entropy alloy hardness prediction calculation.
Further, the S1 includes the following processes:
screening out possible candidate cluster centers C_P through the density peak clustering algorithm, using the local density ρ_i and the relative distance δ_i, as in equation (1) (formula image not extracted);
setting the random initialization range of the cluster number according to equation (2) (formula image not extracted), where |C_P| is the number of elements in the set C_P;
randomly deleting one attribute of each highly correlated attribute pair using the Pearson correlation coefficient algorithm, and generating a base clustering result with the remaining features;
generating M/2 base clustering results each with the fuzzy C-means algorithm and the density peak clustering algorithm, to obtain the base clustering combination Π = {π_1, π_2, …, π_M}.
Further, the S2 includes the following processes:
S21, based on the base clustering combination Π = {π_1, π_2, …, π_M}, calculating the consensus result of the base clustering combination under a given consensus strategy, and calculating the normalized mutual information (NMI) value of the consensus result;
S22, on the basis of the base clustering combination Π = {π_1, π_2, …, π_M}, independently calculating the NMI value of each combination obtained by removing one base clustering in turn, and then selecting the combination of M-1 base clusterings whose NMI value is optimal;
S23, based on the obtained combination of M-1 base clusterings, independently calculating the NMI value of each combination obtained by removing one base clustering in turn, and selecting the combination of M-2 base clusterings whose NMI value is optimal; the calculation is iterated in this way until no base clustering can be removed;
selecting the base clustering combination Π* = {π_1*, π_2*, …, π_L*}, L ≤ M, with the highest NMI score as the optimal base clustering subset under the given consensus strategy.
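As an illustration only, the backward member-selection loop of S21-S23 might be sketched as follows. The patent gives no pseudocode, so the scoring of a combination (taken here as the average NMI between the consensus partition and each member base clustering), the stopping point, and the function name consensus_fn are all assumptions; consensus_fn stands for any of the consensus strategies described below.

```python
# Hypothetical sketch of the backward base-clustering selection (S21-S23).
# `base_clusterings` is a list of label arrays; `consensus_fn` maps a list of
# members to a single consensus partition.
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

def score_combination(members, consensus_fn):
    """Consensus of `members`, scored as the average NMI against each member (assumed)."""
    consensus = consensus_fn(members)
    return np.mean([normalized_mutual_info_score(consensus, m) for m in members])

def backward_select(base_clusterings, consensus_fn):
    current = list(base_clusterings)
    best_members = current
    best_score = score_combination(current, consensus_fn)
    while len(current) > 2:                      # stop before the set becomes trivial (assumption)
        # Score every combination obtained by removing exactly one member.
        trials = [(score_combination(current[:i] + current[i + 1:], consensus_fn), i)
                  for i in range(len(current))]
        score, idx = max(trials)
        current = current[:idx] + current[idx + 1:]
        if score > best_score:                   # remember the best-scoring subset seen so far
            best_members, best_score = current, score
    return best_members
```

Because every candidate removal re-runs the consensus function, the loop evaluates on the order of M² combinations, which matches the iterative removal described in S22 and S23.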
Further, the consensus strategy in S21 includes a spectral clustering base consensus strategy and a density peak clustering base consensus strategy;
the spectral clustering base consensus strategy takes the modified similarity matrix S_DIS as input, and constructs a new undirected graph G = (V, E) with the sample points as nodes and S_DIS as the adjacency matrix between nodes, where V = X is the node set composed of the sample points and E is the edge set; in the undirected graph G, the similarity matrix S_DIS determines the weight of an edge, and for given nodes x_i and x_j the edge weight between them is defined by equation (3) (formula image not extracted);
the Laplacian matrix of the undirected graph G is regularized as in equation (4) (formula image not extracted), where I is the identity matrix, D ∈ R^(N×N) is the degree matrix, and the diagonal elements of D are defined from S_DIS (formula image not extracted);
eigenvalue decomposition is performed on the regularized Laplacian to obtain the eigenvectors corresponding to the C* smallest eigenvalues; these C* eigenvectors are column-normalized and assembled into a new matrix F ∈ R^(N×C*); finally, a K-means clustering algorithm is applied on the basis of the matrix F to obtain the consensus clustering result π_SC, namely equation (5) (formula image not extracted), where the input is the optimal base clustering member combination obtained by embedding SC as the consensus strategy into the BCESF algorithm.
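Purely as a sketch, the SC consensus step could be implemented as below. The exact forms of equations (3)-(5) are only present as images in the original, so the symmetric normalized Laplacian L = I - D^(-1/2) S_DIS D^(-1/2) used here is an assumption consistent with the stated identity matrix I and degree matrix D; S_dis is taken as an already-built N×N modified similarity matrix.

```python
# Illustrative sketch of the spectral clustering base consensus strategy (SC).
import numpy as np
from sklearn.cluster import KMeans

def spectral_consensus(S_dis, n_clusters):
    n = len(S_dis)
    # Degree matrix D and an assumed regularized Laplacian L = I - D^(-1/2) S D^(-1/2)
    d = S_dis.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = np.eye(n) - d_inv_sqrt @ S_dis @ d_inv_sqrt
    # Eigenvectors of the C* smallest eigenvalues of the regularized Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)
    F = eigvecs[:, :n_clusters]
    # Column-normalize the C* eigenvectors, as described above, to form the matrix F
    F = F / np.maximum(np.linalg.norm(F, axis=0, keepdims=True), 1e-12)
    # K-means on the rows of F gives the consensus partition pi_SC
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(F)
```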
Further, the modified similarity matrix S DIS The establishing process comprises the following steps:
Figure BDA0003934378900000041
Figure BDA0003934378900000042
Figure BDA0003934378900000043
wherein ,di,j Is a sample point x i and xj With min (d) and max (d) being the minimum and maximum values of the distance, respectively.
Further, the density peak clustering base consensus strategy in S21 takes the modified distance matrix D_SIM as input, and calculates the local density ρ_i based on the distance matrix D_SIM, as in equation (9) (formula image not extracted), where d_c is the truncation distance, usually taken at the position of 1%-2% of the distances sorted in ascending order;
when x_i is not the point of maximum local density, its relative distance δ_i is determined by the nearest sample point x_j, as in equation (10) (formula image not extracted);
when x_i is the point of maximum local density, its relative distance δ_i is denoted δ_max, namely:
δ_max = max_j (d_{i,j}) (11)
based on the local density ρ_i and relative distance δ_i obtained above, the first C* sample points with the largest γ_i = ρ_i·δ_i values are selected and marked as cluster centers, where the local density ρ_i and relative distance δ_i satisfy the corresponding threshold conditions (formula images not extracted);
finally, each remaining non-center point is assigned to the same cluster as its nearest point, yielding the consensus clustering result π_DC, namely equation (12) (formula image not extracted), where the input is the optimal base clustering member combination obtained by embedding DC as the consensus strategy into the BCESF algorithm.
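A rough sketch of the DC consensus step is given below. Since equations (9), (10) and (12) are only available as images, the cut-off-kernel density, the higher-density-neighbor rule for δ_i, and the assignment order are assumptions based on standard density peak clustering; D_sim stands for the modified N×N distance matrix described here.

```python
# Illustrative sketch of the density peak clustering base consensus strategy (DC).
import numpy as np

def density_peak_consensus(D_sim, n_clusters, percentile=2.0):
    n = len(D_sim)
    # Truncation distance d_c at the 1%-2% position of the sorted pairwise distances
    d_c = np.percentile(D_sim[np.triu_indices(n, k=1)], percentile)
    rho = (D_sim < d_c).sum(axis=1) - 1                 # local density (cut-off kernel, assumed)
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    order = np.argsort(-rho)                            # points sorted by decreasing density
    delta[order[0]] = D_sim[order[0]].max()             # delta_max for the highest-density point
    for pos in range(1, n):
        i = order[pos]
        higher = order[:pos]                            # all points of higher (or equal) density
        j = higher[np.argmin(D_sim[i, higher])]
        delta[i], nearest_higher[i] = D_sim[i, j], j
    gamma = rho * delta
    centers = np.argsort(-gamma)[:n_clusters]           # first C* largest gamma values as centers
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    for i in order:                                     # assign remaining points in density order
        if labels[i] == -1:
            j = nearest_higher[i]
            labels[i] = labels[j] if j >= 0 else labels[centers[np.argmin(D_sim[i, centers])]]
    return labels
```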
Further, the distance matrix D_SIM is established by equations (13) and (14) (formula images not extracted).
Further, the calculation process of S3 is as follows:
first, the K nearest neighbors NN_k(x_i) of each sample point x_i are calculated as in equation (15) (formula image not extracted), where N_k(x_i) is the k-th neighbor of sample point x_i;
based on NN_k(x_i) and the q-th clustering integration algorithm Y_q, the initial value of the basic probability m_q(A_r) that sample point x_i belongs to cluster label r is calculated as in equation (16) (formula image not extracted), where |r(x_j)| is the number of elements among the K neighbors of x_i that belong to cluster label r;
the initial m_q(A_r) is weighted to obtain a weighted basic probability (symbol image not extracted), which is determined by the adjustable coefficient w_q and m_q(A_r), namely equation (17), where w_q is defined by equations (18) and (19) (formula images not extracted);
as shown in equation (20) (formula image not extracted), the Q consensus results are fused to obtain the fusion result m(A_r);
the confidence value of class A_r is calculated by equation (21) (formula image not extracted);
finally, the cluster label to which each sample point belongs is assigned according to the obtained confidence values, as in equation (22) (formula image not extracted), yielding the fusion result π_DSC based on the consensus strategy DSC, namely:
π_DSC = BCESF-DSC(Y_1, Y_2, …, Y_Q) (23).
Further, the regression model in S4 is a linear SVR model.
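As a minimal sketch of step S4, one linear SVR regressor can be fitted per heterogeneous cluster. How unseen samples are assigned to a cluster is not specified in this section, so the cluster labels of new samples are simply passed in here; all names are illustrative.

```python
# Minimal sketch: one linear SVR hardness regressor per heterogeneous cluster.
import numpy as np
from sklearn.svm import LinearSVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def fit_per_cluster_svr(X, y, labels):
    """Fit a scaled linear SVR on the samples of each cluster label."""
    models = {}
    for c in np.unique(labels):
        mask = labels == c
        models[c] = make_pipeline(StandardScaler(),
                                  LinearSVR(C=1.0, max_iter=10000)).fit(X[mask], y[mask])
    return models

def predict_hardness(models, X_new, new_labels):
    """Predict hardness of new samples, routing each one to its cluster's model."""
    y_pred = np.empty(len(X_new))
    for c, model in models.items():
        mask = new_labels == c
        if np.any(mask):
            y_pred[mask] = model.predict(X_new[mask])
    return y_pred
```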
A high-entropy alloy hardness prediction system based on the dual-granularity clustering integration algorithm of three consensus strategies is provided with program modules corresponding to the steps of any one of the above technical solutions and, when run, executes the steps of the high-entropy alloy hardness prediction method based on the dual-granularity clustering integration algorithm of the three consensus strategies.
Compared with the prior art, the invention has the beneficial effects that:
The invention discloses a high-entropy alloy hardness prediction method based on a double-granularity clustering integration algorithm of three consensus strategies. A BCESF method that requires no preset parameter thresholds or human intervention is designed, and three different consensus strategies, SC, DC and DSC, are adopted. The consensus strategies SC and DC simultaneously consider the internal relation between co-occurrence frequency and actual spatial position information and take the reconstructed relation matrices as input, so that more realistic data structure information can be mined. The consensus strategy DSC adopts an improved adjustable DS evidence theory to fuse the consensus results of SC and DC at the ensemble-of-ensembles level. The consensus strategy based on the adjustable DS evidence theory not only adjusts the label probability adaptively, so that it automatically adapts to changes in the data set structure and the integration means, but also resolves conflicts better than the traditional DS evidence theory, thereby obtaining a consensus result with higher confidence. The method extracts multiple pieces of base clustering information and designs three consensus strategies at the double-granularity level to accurately capture hidden, complicated structural information and obtain a final clustering result with better performance.
Drawings
FIG. 1 is a flowchart of a high-entropy alloy hardness prediction method based on a dual-granularity clustering integration algorithm of three consensus strategies in an embodiment of the invention;
FIG. 2 is a schematic diagram of an adjustable DS evidence theory model fusion in an embodiment of the present invention;
FIG. 3 is a comparison of the hardness prediction results of multiple models in the embodiment of the invention, showing from top to bottom the fit between the experimental results and the average prediction results of 30 runs under an 80% training set and 20% test set for the linear SVR model, the BCESF-SC + linear SVR model, the BCESF-DC + linear SVR model, and the method of the invention;
FIG. 4 is a comparison of the hardness prediction results of multiple models in the embodiment of the invention, showing from top to bottom the fit between the experimental results and the average prediction results of 30 runs under a 70% training set and 30% test set for the linear SVR model, the BCESF-SC + linear SVR model, the BCESF-DC + linear SVR model, and the method of the invention.
Detailed Description
In the description of the present invention, it should be noted that the terms "first", "second" and "third" mentioned in the embodiments of the present invention are only used for descriptive purposes and are not to be construed as indicating or implying relative importance or implicitly indicating the number of indicated technical features. Thus, features defined as "first", "second", and "third" may explicitly or implicitly include one or more of the features.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
Referring to fig. 1 to 4, the invention provides a high-entropy alloy hardness prediction method based on a dual-granularity clustering integration algorithm of three consensus strategies, as shown in fig. 1, comprising the following steps:
s1, a base clustering generation process;
For a high-entropy alloy dataset X = {x_1, x_2, …, x_N} ∈ R^h, x is a high-entropy alloy sample point and h is the feature dimension of each sample point.
The local density ρ_i and the relative distance δ_i are calculated by equations (1) and (2) (formula images not extracted), and for the point of maximum local density
δ_max = max_j (d_{i,j}) (3)
Possible candidate cluster centers C_P are screened out through the density peak clustering algorithm using the local density ρ_i and relative distance δ_i, as in equation (4) (formula image not extracted).
The random initialization range of the cluster number is set according to equation (5) (formula image not extracted), where |C_P| is the number of elements in the set C_P.
When the random generation range of the cluster number is set in the conventional manner ([c, 2c] or a similar interval; formula image not extracted), a right boundary that differs too much from the real cluster number may generate base clusterings that deviate severely from the actual ones, affecting the final integration effect. Therefore, a more reasonable right boundary value is determined by the method of this embodiment.
A Pearson correlation coefficient algorithm is used to randomly delete one attribute of each highly correlated attribute pair, where a highly correlated attribute pair is one whose correlation absolute value reaches the set threshold, and base clustering results are generated with the remaining features.
M/2 base clustering results are generated with the fuzzy C-means algorithm and M/2 with the density peak clustering algorithm, giving the base clustering combination Π = {π_1, π_2, …, π_M}. Generating the base clusterings under these two distinct and complementary partitioning modes better balances quality and diversity.
The above method avoids overly extreme members in the base clustering generation process, achieves an optimal balance between quality and diversity, and lays a solid foundation for the subsequent steps.
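For illustration, the whole S1 generation step might look like the sketch below. The correlation threshold, the fuzzy C-means implementation (the third-party scikit-fuzzy package) and the reuse of the density_peak_consensus() sketch given earlier on plain Euclidean distances are all assumptions; the bounds k_low and k_high stand for the cluster-number initialization range defined by equations (4) and (5).

```python
# Illustrative sketch of base clustering generation (S1): Pearson-based feature
# pruning plus M/2 fuzzy C-means and M/2 density-peak base clusterings.
import numpy as np
import skfuzzy as fuzz                         # third-party FCM implementation (assumption)
from scipy.spatial.distance import pdist, squareform

def prune_correlated_features(X, threshold=0.9, rng=None):
    """Randomly drop one feature of every highly correlated pair (|r| > threshold, assumed)."""
    rng = rng or np.random.default_rng()
    corr = np.corrcoef(X, rowvar=False)
    keep = set(range(X.shape[1]))
    for i in range(corr.shape[0]):
        for j in range(i + 1, corr.shape[1]):
            if abs(corr[i, j]) > threshold and i in keep and j in keep:
                keep.discard(int(rng.choice([i, j])))
    return X[:, sorted(keep)]

def generate_base_clusterings(X, M, k_low, k_high, rng=None):
    rng = rng or np.random.default_rng()
    partitions = []
    for _ in range(M // 2):                    # fuzzy C-means members
        k = int(rng.integers(k_low, k_high + 1))
        Xp = prune_correlated_features(X, rng=rng)
        _, u, *_ = fuzz.cmeans(Xp.T, c=k, m=2.0, error=1e-5, maxiter=300)
        partitions.append(u.argmax(axis=0))    # hard labels from the membership matrix
    for _ in range(M // 2):                    # density peak clustering members
        k = int(rng.integers(k_low, k_high + 1))
        Xp = prune_correlated_features(X, rng=rng)
        partitions.append(density_peak_consensus(squareform(pdist(Xp)), n_clusters=k))
    return partitions
```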
S2, selecting a subset of members of the base cluster;
embedding a given consensus function into the selection strategy, calculating the consensus result of the base clustering combination Π = {π_1, π_2, …, π_M} under the given consensus strategy, and removing noise members from the base clustering set to obtain the optimal clustering subset Π* = {π_1*, π_2*, …, π_L*}, L ≤ M.
The S2 comprises the following processes:
S21, based on the base clustering combination Π = {π_1, π_2, …, π_M}, calculating the consensus result of the base clustering combination under a given consensus strategy, and calculating the normalized mutual information (NMI) value of the consensus result;
S22, on the basis of the base clustering combination Π = {π_1, π_2, …, π_M}, independently calculating the NMI value of each combination obtained by removing one base clustering in turn, and then selecting the combination of M-1 base clusterings whose NMI value is optimal;
S23, based on the obtained combination of M-1 base clusterings, independently calculating the NMI value of each combination obtained by removing one base clustering in turn, and selecting the combination of M-2 base clusterings whose NMI value is optimal; the calculation is iterated in this way until no base clustering can be removed;
selecting the base clustering combination Π* = {π_1*, π_2*, …, π_L*}, L ≤ M, with the highest NMI score as the optimal base clustering subset under the given consensus strategy.
In the embodiment, the given consensus function is embedded into the selection strategy, and the final basis cluster combination is determined in an iterative manner, so that noise members can be eliminated on the premise of not introducing additional parameters, and the consensus quality of the basis cluster combination can be improved.
The consensus strategy in S21 comprises a spectral clustering base consensus Strategy (SC) and a density peak value clustering base consensus strategy (DC);
The spectral clustering base consensus strategy (SC) takes the modified similarity matrix S_DIS as input, and constructs a new undirected graph G = (V, E) with the sample points as nodes and S_DIS as the adjacency matrix between nodes, where V = X is the node set composed of the sample points and E is the edge set; in the undirected graph G, the similarity matrix S_DIS determines the weight of an edge, and for given nodes x_i and x_j the edge weight between them is defined by equation (6) (formula image not extracted);
the Laplacian matrix of the undirected graph G is regularized as in equation (7) (formula image not extracted), where I is the identity matrix, D ∈ R^(N×N) is the degree matrix, and the diagonal elements of D are defined from S_DIS (formula image not extracted);
eigenvalue decomposition is performed on the regularized Laplacian to obtain the eigenvectors corresponding to the C* smallest eigenvalues, where the number of clusters C* into which the target data set X is ultimately divided needs to be preset; these C* eigenvectors are column-normalized and assembled into a new matrix F ∈ R^(N×C*); finally, a K-means clustering algorithm is applied on the basis of the matrix F to obtain the consensus clustering result π_SC, namely equation (8) (formula image not extracted), where the input is the optimal base clustering member combination obtained by embedding SC as the consensus strategy into the BCESF algorithm.
The modified similarity matrix S_DIS is established by equations (9)-(11) (formula images not extracted), where d_{i,j} is the distance between the sample points x_i and x_j, and min(d) and max(d) are the minimum and maximum values of the distances, respectively.
The density peak clustering base consensus strategy (DC) in S21 takes the modified distance matrix D_SIM as input, and calculates the local density ρ_i based on the distance matrix D_SIM, as in equation (12) (formula image not extracted), where d_c is the truncation distance, generally taken at the position of 1%-2% of the distances sorted in ascending order;
when x_i is not the point of maximum local density, its relative distance δ_i is determined by the nearest sample point x_j, as in equation (13) (formula image not extracted);
when x_i is the point of maximum local density, its relative distance δ_i is denoted δ_max, namely:
δ_max = max_j (d_{i,j}) (14)
based on the local density ρ_i and relative distance δ_i obtained above, the first C* sample points with the largest γ_i = ρ_i·δ_i values are selected and marked as cluster centers, where the local density ρ_i and relative distance δ_i satisfy the corresponding threshold conditions (formula images not extracted);
finally, each remaining non-center point is assigned to the same cluster as its nearest point, yielding the consensus clustering result π_DC, namely equation (15) (formula image not extracted), where the input is the optimal base clustering member combination obtained by embedding DC as the consensus strategy into the BCESF algorithm.
The distance matrix D_SIM is established by equations (16) and (17) (formula images not extracted).
In the cluster ensemble selection problem, the traditional co-association matrix is often used as the input of the consensus function to reflect the similarity relation between sample pairs; that is, for a given set of base clustering members Π = {π_1, π_2, …, π_M}, the set of clusters of all base clusterings in Π is considered (notation image not extracted). The co-association matrix A = {a_ij}_(N×N) then indicates the degree of similarity between two samples: the larger a_ij is, the more base clusterings divide the sample points x_i and x_j into the same cluster. Its expression is given by equations (18) and (19) (formula images not extracted).
As this calculation method shows, it simply counts the co-occurrence of sample pairs in each base clustering and ignores the difference in attraction between different sample pairs; yet the actual distance between samples, even within the same cluster, has a non-trivial impact on the degree of similarity between sample pairs.
In view of this, the present embodiment employs two modified relation matrices to capture the co-occurrence relation between sample pairs more fully; they not only represent the co-occurrence frequency of the sample pairs at the macroscopic level but also take into account the local spatial position information at the microscopic level. The two are fully fused and mutually corrected, mining the deeply hidden internal relations of sample pairs from more diverse angles and providing more accurate and realistic input information for the subsequent consensus strategies.
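For reference, the conventional co-association matrix criticized in this passage can be computed as in the short sketch below, with a_ij being the fraction of base clusterings that place x_i and x_j in the same cluster; the distance-aware corrections S_DIS and D_SIM themselves are not reproduced, since their formulas appear only as images in the original.

```python
# Standard co-association matrix A = {a_ij}: co-occurrence frequency of sample
# pairs across the M base clusterings, ignoring actual spatial distances.
import numpy as np

def co_association(partitions):
    partitions = np.asarray(partitions)        # shape (M, N): one label row per base clustering
    M, N = partitions.shape
    A = np.zeros((N, N))
    for labels in partitions:
        A += (labels[:, None] == labels[None, :]).astype(float)
    return A / M
```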
S3, consensus clustering process;
As shown in FIG. 2, based on the optimal clustering subsets obtained in S2, the optimal consensus results obtained under the respective consensus functions are fused using an adjustable DS evidence theory to obtain the final partition results of the heterogeneous clusters; the calculation process is as follows:
first, the K nearest neighbors NN_k(x_i) of each sample point x_i are calculated as in equation (20) (formula image not extracted), where N_k(x_i) is the k-th neighbor of sample point x_i;
based on NN_k(x_i) and the q-th clustering integration algorithm Y_q, the initial value of the basic probability m_q(A_r) that sample point x_i belongs to cluster label r is calculated as in equation (21) (formula image not extracted), where |r(x_j)| is the number of elements among the K neighbors of x_i that belong to cluster label r;
obviously, m_q(A_r) can effectively represent the basic probability that any sample point belongs to any cluster label by counting the label distribution within its neighborhood.
The initial m_q(A_r) is weighted to obtain a weighted basic probability (symbol image not extracted), which is determined by the adjustable coefficient w_q and m_q(A_r), namely equation (22), where w_q is defined by equations (23) and (24) (formula images not extracted);
as shown in equation (25) (formula image not extracted), the Q consensus results (i.e., the two results of BCESF-SC and BCESF-DC above) are fused to obtain the fusion result m(A_r);
the confidence value of class A_r is calculated by equation (26) (formula image not extracted);
finally, the cluster label to which each sample point belongs is assigned according to the obtained confidence values, i.e., the cluster label with the highest confidence value is the cluster of sample point x_i, as in equation (27) (formula image not extracted), yielding the fusion result π_DSC based on the consensus strategy DSC, namely:
π_DSC = BCESF-DSC(Y_1, Y_2, …, Y_Q) (28).
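A very rough sketch of this fusion layer is given below. Because equations (20)-(27) are only present as images, the neighbor-frequency masses, the discount-style weighting by w_q and the product-style combination are all assumed forms; the consensus labelings are also assumed to already share aligned cluster labels (see the alignment sketch after the next paragraph).

```python
# Hypothetical sketch of the adjustable DS-evidence fusion (BCESF-DSC consensus).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def ds_fuse(X, consensus_labelings, weights, k=10):
    consensus_labelings = np.asarray(consensus_labelings)   # shape (Q, N), aligned labels
    Q, N = consensus_labelings.shape
    C = int(consensus_labelings.max()) + 1
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    neighbors = nn.kneighbors(X, return_distance=False)[:, 1:]   # drop the point itself
    fused = np.ones((N, C))
    for q in range(Q):
        labels = consensus_labelings[q]
        # m_q(A_r): fraction of the K neighbours of x_i carrying label r under result q
        m = np.zeros((N, C))
        for r in range(C):
            m[:, r] = (labels[neighbors] == r).mean(axis=1)
        m_hat = weights[q] * m + (1 - weights[q]) / C        # adjustable weighting (assumed form)
        fused *= m_hat                                       # Dempster-style product combination
    fused /= np.maximum(fused.sum(axis=1, keepdims=True), 1e-12)   # normalize to confidences
    return fused.argmax(axis=1)                              # label with highest confidence
```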
The problem of inconsistent partition results still exists after the target data set has been processed by multiple clustering integration algorithms; therefore, this embodiment adopts a higher-dimensional perspective to integrate the different consensus results globally. The consensus results obtained by the BCESF-SC and BCESF-DC algorithms have the same cluster number C*; BCESF-DSC uses maximum inter-cluster intersection to put the cluster labels in the different results into one-to-one correspondence, providing an effective new idea for solving the problem of inconsistent partitions in cluster integration at the consensus level.
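The maximum inter-cluster intersection matching mentioned above might look like the greedy sketch below; the patent does not spell out the exact matching procedure, so the greedy one-to-one pairing is an assumption, and both partitions are assumed to use the same number of clusters C*.

```python
# Sketch of label alignment by maximum inter-cluster intersection: clusters of
# `other` are renamed to the reference cluster they overlap most, one-to-one.
import numpy as np

def align_labels(reference, other, n_clusters):
    overlap = np.zeros((n_clusters, n_clusters), dtype=int)
    for r in range(n_clusters):
        for c in range(n_clusters):
            overlap[r, c] = np.sum((reference == r) & (other == c))
    mapping, used_ref, used_other = {}, set(), set()
    # Greedily take the pair of clusters with the largest remaining intersection.
    flat_order = np.argsort(-overlap, axis=None)
    for r, c in zip(*np.unravel_index(flat_order, overlap.shape)):
        if r not in used_ref and c not in used_other:
            mapping[int(c)] = int(r)
            used_ref.add(r)
            used_other.add(c)
    return np.array([mapping[int(c)] for c in other])
```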
And S4, respectively establishing a regression model for each of the obtained heterogeneous clusters, and performing high-entropy alloy hardness prediction calculation.
The regression model in S4 is a linear SVR model.
In order to verify the accuracy of the method, aiming at a high-entropy alloy data set containing 601 sample points, the sample point characteristic parameter types are as follows: phase parameters, mechanical parameters, processing preparation parameters and element component molar ratio parameters. The phase parameters comprise valence electron concentration, electronegativity difference, atomic radius difference, mixing enthalpy, mixing entropy, electron concentration and cohesive energy; the mechanical parameters comprise work function, modulus mismatch, shear modulus difference, shear modulus and melting point; the processing preparation parameters comprise casting state, additive manufacturing, powder metallurgy, work hardening and homogenization; the elemental constituent molar ratio parameters include the molar ratios of lithium, magnesium, aluminum, silicon, scandium, titanium, vanadium, chromium, manganese, iron, nickel, cobalt, copper, zinc, zirconium, niobium, molybdenum, tin, hafnium, tantalum, and tungsten.
And respectively adopting a linear SVR model, a BCESF-SC + linear SVR model, a BCESF-DC + linear SVR model and the method of the invention to predict the hardness of the sample points in the data set. The SVR model is to directly adopt an SVR algorithm to carry out high-entropy alloy hardness prediction on a data set; the BCESF-SC + linear SVR model is characterized in that a spectral clustering basis consensus function (SC) is embedded into a selection strategy in the process of selecting the basis clustering member subsets to select the basis clustering member subsets, integrated heterogeneous clusters are finally obtained, then SVR regression models are respectively established for the heterogeneous clusters obtained, and high-entropy alloy hardness prediction calculation is carried out; the BCESF-DC + linear SVR model is characterized in that a density peak value clustering base consensus strategy (DC) is embedded into a selection strategy in the base clustering member subset selection process to perform base clustering member subset selection, integrated heterogeneous clusters are finally obtained, then SVR regression models are respectively established for the heterogeneous clusters obtained, and high-entropy alloy hardness prediction calculation is performed; the prediction results of each model are shown in fig. 3 and 4.
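As a hedged illustration of the evaluation protocol behind FIG. 3 and FIG. 4 (30 repeated runs at 80/20 and 70/30 train/test splits, comparing the averaged R² of each pipeline), the outer loop could be organized as follows; build_and_predict is a placeholder for any of the compared pipelines and must fit on the training split and predict the test split.

```python
# Hypothetical evaluation loop: repeated random splits, averaged R^2 per model.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

def evaluate(X, y, build_and_predict, test_size=0.2, runs=30, seed=0):
    scores = []
    for run in range(runs):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=seed + run)
        y_pred = build_and_predict(X_tr, y_tr, X_te)     # e.g. plain SVR or a BCESF variant
        scores.append(r2_score(y_te, y_pred))
    return float(np.mean(scores))
```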
From the comparison graphs it can be seen intuitively that, under both distribution ratios of training and test sets, the prediction capability of the method of the invention is greatly improved over the original SVR model, achieving an R² boost of around 24% (0.247 and 0.235). In addition, compared with the BCESF-SC + linear SVR model and the BCESF-DC + linear SVR model, the method of the invention also achieves an R² lift of around 3% (0.042, 0.038, 0.031 and 0.024). It is worth noting that the BCESF-SC + linear SVR model and the BCESF-DC + linear SVR model are themselves proposed for the first time in this invention to improve high-entropy alloy hardness prediction, and both already show strong prediction performance. The invention further fuses the consensus results of BCESF-SC and BCESF-DC through a better method, so that the final prediction result meets higher requirements and shows further enhanced high-entropy alloy hardness prediction performance. The method is universal: when similar problems are encountered, it can be combined with other regression models to fundamentally improve their prediction capability.
Although the present disclosure has been described with reference to the above embodiments, the scope of the present disclosure is not limited thereto. Various changes and modifications may be made by those skilled in the art without departing from the spirit and scope of the disclosure, and these changes and modifications are intended to fall within the scope of the disclosure.

Claims (10)

1. A high-entropy alloy hardness prediction method based on a double-granularity clustering integration algorithm of three consensus strategies is characterized by comprising the following steps of:
s1, a base clustering generation process;
for a high-entropy alloy dataset X = {x_1, x_2, …, x_N} ∈ R^h, where x is a high-entropy alloy sample point and h is the feature dimension of each sample point, M base clustering results are generated using clustering algorithms to obtain the base clustering combination Π = {π_1, π_2, …, π_M};
S2, selecting a subset of members of the base cluster;
embedding a given consensus function into the selection strategy, calculating the consensus result of the base clustering combination Π = {π_1, π_2, …, π_M} under the given consensus strategy, and removing noise members from the base clustering set to obtain the optimal clustering subset Π* = {π_1*, π_2*, …, π_L*}, L ≤ M;
S3, consensus clustering process;
fusing the optimal consensus results obtained under the consensus functions by adopting an adjustable DS evidence theory based on the optimal clustering subset obtained in the S2 to obtain final division results of the different clusters;
and S4, respectively establishing a regression model for the obtained different clusters, and performing high-entropy alloy hardness prediction calculation.
2. The method according to claim 1, characterized in that said S1 comprises the following process:
screening out possible candidate cluster centers C_P through the density peak clustering algorithm, using the local density ρ_i and the relative distance δ_i, as in equation (1) (formula image not extracted);
setting the random initialization range of the cluster number according to equation (2) (formula image not extracted), where |C_P| is the number of elements in the set C_P;
randomly deleting one attribute of each highly correlated attribute pair using the Pearson correlation coefficient algorithm, and generating a base clustering result with the remaining features;
generating M/2 base clustering results each with the fuzzy C-means algorithm and the density peak clustering algorithm, to obtain the base clustering combination Π = {π_1, π_2, …, π_M}.
3. The method according to claim 2, characterized in that said S2 comprises the following procedure:
S21, based on the base clustering combination Π = {π_1, π_2, …, π_M}, calculating the consensus result of the base clustering combination under a given consensus strategy, and calculating the normalized mutual information (NMI) value of the consensus result;
S22, on the basis of the base clustering combination Π = {π_1, π_2, …, π_M}, independently calculating the NMI value of each combination obtained by removing one base clustering in turn, and then selecting the combination of M-1 base clusterings whose NMI value is optimal;
S23, based on the obtained combination of M-1 base clusterings, independently calculating the NMI value of each combination obtained by removing one base clustering in turn, and selecting the combination of M-2 base clusterings whose NMI value is optimal; the calculation is iterated in this way until no base clustering can be removed;
selecting the base clustering combination Π* = {π_1*, π_2*, …, π_L*}, L ≤ M, with the highest NMI score as the optimal base clustering subset under the given consensus strategy.
4. The method according to claim 3, wherein the consensus strategy in S21 comprises a spectral clustering base consensus strategy and a density peak clustering base consensus strategy;
the spectral clustering base consensus strategy takes the modified similarity matrix S_DIS as input, and constructs a new undirected graph G = (V, E) with the sample points as nodes and S_DIS as the adjacency matrix between nodes, where V = X is the node set composed of the sample points and E is the edge set; in the undirected graph G, the similarity matrix S_DIS determines the weight of an edge, and for given nodes x_i and x_j the edge weight between them is defined by equation (3) (formula image not extracted);
the Laplacian matrix of the undirected graph G is regularized as in equation (4) (formula image not extracted), where I is the identity matrix, D ∈ R^(N×N) is the degree matrix, and the diagonal elements of D are defined from S_DIS (formula image not extracted);
eigenvalue decomposition is performed on the regularized Laplacian to obtain the eigenvectors corresponding to the C* smallest eigenvalues; these C* eigenvectors are column-normalized and assembled into a new matrix F ∈ R^(N×C*); finally, a K-means clustering algorithm is applied on the basis of the matrix F to obtain the consensus clustering result π_SC, namely equation (5) (formula image not extracted), where the input is the optimal base clustering member combination obtained by embedding SC as the consensus strategy into the BCESF algorithm.
5. The method according to claim 4, characterized in that the modified similarity matrix S_DIS is established by equations (6)-(8) (formula images not extracted), where d_{i,j} is the distance between the sample points x_i and x_j, and min(d) and max(d) are the minimum and maximum values of the distances, respectively.
6. The method of claim 4, wherein the density peak clustering base consensus strategy in S21 takes the modified distance matrix D_SIM as input, and calculates the local density ρ_i based on the distance matrix D_SIM, as in equation (9) (formula image not extracted), where d_c is the truncation distance, generally taken at the position of 1%-2% of the distances sorted in ascending order;
when x_i is not the point of maximum local density, its relative distance δ_i is determined by the nearest sample point x_j, as in equation (10) (formula image not extracted);
when x_i is the point of maximum local density, its relative distance δ_i is denoted δ_max, namely:
δ_max = max_j (d_{i,j}) (11)
based on the local density ρ_i and relative distance δ_i obtained above, the first C* sample points with the largest γ_i = ρ_i·δ_i values are selected and marked as cluster centers, where the local density ρ_i and relative distance δ_i satisfy the corresponding threshold conditions (formula images not extracted);
finally, each remaining non-center point is assigned to the same cluster as its nearest point, yielding the consensus clustering result π_DC, namely equation (12) (formula image not extracted), where the input is the optimal base clustering member combination obtained by embedding DC as the consensus strategy into the BCESF algorithm.
7. The method according to claim 6, characterized in that the distance matrix D_SIM is established by equations (13) and (14) (formula images not extracted).
8. The method according to claim 1, wherein the calculation process of S3 is as follows:
first, the K nearest neighbors NN_k(x_i) of each sample point x_i are calculated as in equation (15) (formula image not extracted), where N_k(x_i) is the k-th neighbor of sample point x_i;
based on NN_k(x_i) and the q-th clustering integration algorithm Y_q, the initial value of the basic probability m_q(A_r) that sample point x_i belongs to cluster label r is calculated as in equation (16) (formula image not extracted), where |r(x_j)| is the number of elements among the K neighbors of x_i that belong to cluster label r;
the initial m_q(A_r) is weighted to obtain a weighted basic probability (symbol image not extracted), which is determined by the adjustable coefficient w_q and m_q(A_r), namely equation (17), where w_q is defined by equations (18) and (19) (formula images not extracted);
as shown in equation (20) (formula image not extracted), the Q consensus results are fused to obtain the fusion result m(A_r);
the confidence value of class A_r is calculated by equation (21) (formula image not extracted);
finally, the cluster label to which each sample point belongs is assigned according to the obtained confidence values, as in equation (22) (formula image not extracted), yielding the fusion result π_DSC based on the consensus strategy DSC, namely:
π_DSC = BCESF-DSC(Y_1, Y_2, …, Y_Q) (23).
9. The method of claim 1, wherein the regression model in S4 is a linear SVR model.
10. A high-entropy alloy hardness prediction system based on a double-granularity clustering integration algorithm of three consensus strategies is characterized by comprising a program module corresponding to the steps of any one of claims 1 to 9 and being used for executing the steps in the high-entropy alloy hardness prediction method based on the double-granularity clustering integration algorithm of the three consensus strategies.
CN202211397847.9A 2022-11-09 2022-11-09 High-entropy alloy hardness prediction method based on dual-granularity clustering integration algorithm of three consensus strategies Active CN115691700B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211397847.9A CN115691700B (en) 2022-11-09 2022-11-09 High-entropy alloy hardness prediction method based on dual-granularity clustering integration algorithm of three consensus strategies

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211397847.9A CN115691700B (en) 2022-11-09 2022-11-09 High-entropy alloy hardness prediction method based on dual-granularity clustering integration algorithm of three consensus strategies

Publications (2)

Publication Number Publication Date
CN115691700A true CN115691700A (en) 2023-02-03
CN115691700B CN115691700B (en) 2023-05-02

Family

ID=85050421

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211397847.9A Active CN115691700B (en) 2022-11-09 2022-11-09 High-entropy alloy hardness prediction method based on dual-granularity clustering integration algorithm of three consensus strategies

Country Status (1)

Country Link
CN (1) CN115691700B (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8195734B1 (en) * 2006-11-27 2012-06-05 The Research Foundation Of State University Of New York Combining multiple clusterings by soft correspondence
US20150178446A1 (en) * 2013-12-18 2015-06-25 Pacific Biosciences Of California, Inc. Iterative clustering of sequence reads for error correction
US20170161606A1 (en) * 2015-12-06 2017-06-08 Beijing University Of Technology Clustering method based on iterations of neural networks
CN107169511A (en) * 2017-04-27 2017-09-15 华南理工大学 Clustering ensemble method based on mixing clustering ensemble selection strategy
CN112232383A (en) * 2020-09-27 2021-01-15 江南大学 Integrated clustering method based on super-cluster weighting
CN113222027A (en) * 2021-05-19 2021-08-06 哈尔滨理工大学 Self-adaptive clustering center density peak value clustering algorithm based on weighted shared nearest neighbor
CN114613456A (en) * 2022-03-07 2022-06-10 哈尔滨理工大学 High-entropy alloy hardness prediction method based on improved density peak value clustering algorithm
CN114663770A (en) * 2022-04-12 2022-06-24 聊城大学 Hyperspectral image classification method and system based on integrated clustering waveband selection

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
FUXIANG LI et al.: "A New Density Peak Clustering Algorithm Based on Cluster Fusion Strategy"
YUNXIAO SHAN et al.: "A Density Peaks Clustering Algorithm With Sparse Search and K-d Tree"
吕红伟; 王士同: "Clustering ensemble algorithm for predictive subspace clustering", Journal of Chinese Computer Systems (小型微型计算机系统)
王留洋; 俞扬信; 陈伯伦; 章慧: "An identification-information method for improving document clustering based on consensus and classification"
鲍舒婷; 孙丽萍; 郑孝遥; 郭良敏: "Density peak clustering algorithm based on shared nearest neighbor similarity", Journal of Computer Applications (计算机应用)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116434880A (en) * 2023-03-06 2023-07-14 哈尔滨理工大学 High-entropy alloy hardness prediction method based on fuzzy self-consistent clustering integration
CN116434880B (en) * 2023-03-06 2023-09-08 哈尔滨理工大学 High-entropy alloy hardness prediction method based on fuzzy self-consistent clustering integration

Also Published As

Publication number Publication date
CN115691700B (en) 2023-05-02

Similar Documents

Publication Publication Date Title
CN114613456B (en) High-entropy alloy hardness prediction method based on improved density peak clustering algorithm
CN112926397B (en) SAR image sea ice type classification method based on two-round voting strategy integrated learning
CN108446619B (en) Face key point detection method and device based on deep reinforcement learning
CN112766425A (en) Deep missing clustering machine learning method and system based on optimal transmission
CN115691700A (en) High-entropy alloy hardness prediction method based on double-granularity clustering integration algorithm of three consensus strategies
CN110675912B (en) Gene regulation and control network construction method based on structure prediction
CN110097176A (en) A kind of neural network structure searching method applied to air quality big data abnormality detection
CN104021230B (en) Collaborative filtering method based on community discovery
CN111340069A (en) Incomplete data fine modeling and missing value filling method based on alternate learning
CN109636809B (en) Image segmentation level selection method based on scale perception
CN112001950A (en) Multi-target tracking algorithm based on target detection and feature extraction combined model
CN109711439A (en) A kind of extensive tourist's representation data clustering method in density peak accelerating neighbor seaching using Group algorithm
Ding et al. Histogram-based estimation of distribution algorithm: A competent method for continuous optimization
CN108549729B (en) Personalized user collaborative filtering recommendation method based on coverage reduction
CN108256564B (en) Self-adaptive template matching method and device based on distance measurement dissimilarity
Lazarevic et al. Clustering-regression-ordering steps for knowledge discovery in spatial databases
CN115394381A (en) High-entropy alloy hardness prediction method and device based on machine learning and two-step data expansion
CN113256645B (en) Color image segmentation method based on improved density clustering
CN116434880B (en) High-entropy alloy hardness prediction method based on fuzzy self-consistent clustering integration
CN103902982B (en) Target tracking method based on soft distribution BoF
CN114580492A (en) Cross-domain pedestrian re-identification method based on mutual learning
CN109389127A (en) Structuring multiple view Hessian regularization sparse features selection method
CN115018247A (en) Power transmission and transformation project evaluation method based on fuzzy hierarchical analysis and improved weighted combination
CN109785331B (en) Sonar image segmentation method based on self-adaptive pixel value constraint and MRF
CN110782949A (en) Multilayer gene weighting grouping method based on maximum minimum sequence search

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant