CN111985823B - Crystal bar quality assessment method for roller mill orientation instrument - Google Patents


Publication number
CN111985823B
CN111985823B (application CN202010862885.1A)
Authority
CN
China
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010862885.1A
Other languages
Chinese (zh)
Other versions
CN111985823A
Inventor
关守平
王文奇
宋阳
Original Assignee
东北大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 东北大学
Priority to CN202010862885.1A
Publication of CN111985823A
Application granted
Publication of CN111985823B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/06: Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063: Operations research, analysis or management
    • G06Q10/0639: Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06395: Quality analysis or management
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques
    • G06F18/232: Non-hierarchical techniques
    • G06F18/2321: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/06: Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063: Operations research, analysis or management
    • G06Q10/0639: Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393: Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/24: Character recognition characterised by the processing or recognition method
    • G06V30/248: Character recognition characterised by the processing or recognition method involving plural approaches, e.g. verification by template match; Resolving confusion among similar patterns, e.g. "O" versus "Q"
    • G06V30/2504: Coarse or fine approaches, e.g. resolution of ambiguities or multiscale approaches
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30: Computing systems specially adapted for manufacturing

Abstract

The invention provides a crystal bar quality evaluation method for a roller mill orientation instrument, and relates to the technical field of monocrystalline material processing. First, a sample set containing M crystal bar detection records is established and divided into a training set and a test set. An improved Canopy algorithm combined with the K-means algorithm then clusters the training-set samples precisely, determining the cluster centers of an improved Canopy-K-means model for crystal bar quality evaluation. The test-set sample data are applied to the improved Canopy-K-means model, and crystal bar quality is evaluated by calculating the distance from each test sample to every cluster center. Finally, for any test sample whose distances to two cluster centers differ in absolute value by less than a set threshold ε, the k-NN algorithm performs a further clustering calculation against the training samples in the Canopy clusters of the two corresponding cluster centers, completing the quality assessment of highly similar crystal bars.

Description

Crystal bar quality assessment method for roller mill orientation instrument
Technical Field
The invention relates to the technical field of monocrystalline material processing, in particular to a crystal bar quality evaluation method for a roller mill orientation instrument.
Background
In the process of single crystal growth, various defects are inevitably generated due to the inherent limitations of the growth technology, so the quality of the processed crystal bars is uneven. The quality of an ingot directly influences the technical properties of the corresponding product, so to produce high-quality ingots researchers must continuously improve the crystal growth process and seek more efficient processing means. Currently common crystal bar processing methods include grinding, mechanical polishing, dry mechanochemical polishing, wet mechanochemical and chemo-mechanical polishing, hydration polishing, float polishing, and the like. Beyond seeking more efficient processing methods, however, quality assessment of the ingot during processing is also a key issue that cannot be ignored.
The roller mill orientation instrument is a piece of equipment integrating single crystal bar processing and orientation: it combines the two technological processes of ingot outer-circle grinding and crystal bar orientation, thereby improving production efficiency and orientation precision. In general, the roller mill orientation instrument performs the reference-plane positioning processing after the orientation selection of the optimal crystal plane is completed. The question considered here is: since the quality index of each crystal face of the single crystal ingot is obtained by the foregoing process, can an overall evaluation of ingot quality be performed on the basis of these indexes? Although deviations may occur when ingot quality is assessed solely from crystal-plane quality, since the growth inside the ingot cannot be observed directly, the quality of the individual crystal planes does reflect, to some extent, the growth quality of the ingot.
During crystal bar orientation, the roller mill orientation instrument evaluates the quality of each crystal face of the bar and selects the crystal-face direction with the best evaluated quality for grinding. However, the overall quality evaluation of the ingot lacks a corresponding evaluation method; this is a function missing from the existing roller mill orientation instrument.
Disclosure of Invention
The invention aims to solve the technical problem of providing a crystal bar quality evaluation method for a roller mill orientation instrument aiming at the defects of the prior art, and the evaluation of the crystal bar quality is realized on the basis of crystal face quality evaluation.
In order to solve the above technical problems, the invention adopts the following technical scheme: a crystal bar quality assessment method for a roller mill orientation instrument comprises the following steps:
step 1: establishing a sample set containing M crystal bar detection data, wherein each crystal bar detection data comprises detection grades of s crystal face qualities and corresponding scores;
The detection grade of crystal face quality and the corresponding score are determined as follows:
In the grinding and orientation stage of the roller mill orientation instrument, the quality detection of a crystal face is divided into n grades: a crystal face of the first grade (grade A) scores n points; a crystal face of the second grade (grade B) scores n-1 points; and so on for the remaining grades and scores;
step 2: dividing a sample set containing M crystal bar detection data into a training set and a testing set; the training set comprises N samples, and the test set comprises M-N samples;
step 3: performing coarse clustering on samples in the training set by adopting an improved Canopy algorithm;
step 3.1: arrange the samples in the training set randomly, i.e. X = [X_1, X_2, ..., X_i, ..., X_N], where X_1, X_2, ..., X_N are the sample data in the training set and N is the total number of training samples. X_i = [x_i1, x_i2, ..., x_ir, ..., x_is] is the feature vector of the i-th sample in the training set with respect to the crystal-face detection scores of the crystal bar; it contains s-dimensional data points, i.e. the crystal bar has s crystal faces, and x_ir is the detection score of the r-th crystal face of the i-th training sample. Each sample in the training set also carries a label from Y = [y_1, y_2, ..., y_i, ..., y_N], representing the category to which the sample belongs. Finally, two distance thresholds T_1 and T_2 are selected, with T_1 > T_2;
step 3.2: randomly select one sample X_i from the training set X as the first Canopy center point, and delete X_i from X;
step 3.3: select a sample X_j (j ≠ i) from the training set X, calculate by the Bray-Curtis method the minimum distance d_BCD from X_j to the already-created Canopy center points, and compare it with the two distance thresholds T_1 and T_2:
(a) if d_BCD ≤ T_1, give X_j a weak mark indicating that it belongs to the current Canopy cluster, add X_j to the current Canopy cluster, and do not delete X_j from the training set X;
(b) if d_BCD < T_2, give X_j a strong mark indicating that it belongs to the current Canopy cluster, and delete X_j from the training set X;
(c) if d_BCD > T_1, then X_j does not belong to the current Canopy cluster; X_j forms a new Canopy center point and is deleted from the training set X;
The Bray-Curtis distance d_BCD from X_j to an already-generated Canopy center point X_i is calculated as follows:
d_BCD(X_j, X_i) = Σ_{r=1..s} |x_jr - x_ir| / Σ_{r=1..s} (x_jr + x_ir)
where x_jr and x_ir are the r-th elements of the vectors X_j and X_i respectively, r = 1, 2, ..., s;
step 3.4: repeat step 3.3 until the training set X is empty. The training-set sample data are grouped into K Canopy clusters, giving K cluster centers C_1, C_2, ..., C_k, ..., C_K, where C_k = [c_k1, c_k2, ..., c_kr, ..., c_ks]; each cluster center corresponds to one sample class;
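As a concrete illustration of steps 3.1 to 3.4, the coarse clustering can be sketched as follows. This is a minimal, hypothetical Python sketch of the classic Canopy procedure with the two thresholds T_1 > T_2 (the function name `canopy` and the pop-based center selection are illustrative choices, not taken from the patent); the distance function is passed in so that the Bray-Curtis measure described above can be used:

```python
import random

def canopy(samples, t1, t2, dist):
    """Sketch of Canopy coarse clustering with thresholds t1 > t2.
    dist is a distance function, e.g. the Bray-Curtis dissimilarity."""
    pool = list(samples)
    random.shuffle(pool)               # step 3.1: random arrangement
    clusters = []                      # list of (center, members) pairs
    while pool:
        center = pool.pop(0)           # step 3.2: pick a Canopy center
        members = [center]
        remaining = []
        for x in pool:                 # step 3.3: compare to thresholds
            d = dist(center, x)
            if d < t2:                 # strong mark: assigned, removed from pool
                members.append(x)
            elif d <= t1:              # weak mark: assigned, may join other clusters
                members.append(x)
                remaining.append(x)
            else:                      # d > t1: not in this cluster, stays in pool
                remaining.append(x)
        pool = remaining
        clusters.append((center, members))
    return clusters                    # step 3.4: repeat until the pool is empty
```

Because weakly marked samples remain in the pool, the same sample can appear in several Canopy clusters, matching the overlap behavior described for the algorithm.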
step 4: on the basis of performing coarse clustering on training samples by an improved Canopy algorithm, performing accurate clustering on the training set samples by a K-means algorithm, and determining a clustering center of an improved Canopy-K-means model for crystal bar quality evaluation, wherein the specific method comprises the following steps:
step 4.1: define the K Canopy centers as the initial cluster centers (C_1^(1), C_2^(1), ..., C_k^(1), ..., C_K^(1));
step 4.2: let the current iteration number be t. For every sample X_i in the training set, calculate in turn the Euclidean distance to each cluster center C_k^(t):
Dist(i, k) = sqrt( Σ_{r=1..s} (x_ir - c_kr)^2 )
where Dist(i, k) is the Euclidean distance from the i-th training sample X_i = [x_i1, x_i2, ..., x_ir, ..., x_is] to the k-th cluster center C_k = [c_k1, c_k2, ..., c_kr, ..., c_ks];
step 4.3: for each sample point X_i in the training set, find the cluster center C_k^(t) at the minimum distance, and assign X_i to the Canopy cluster of that nearest center;
step 4.4: update the center point of each Canopy cluster for the (t+1)-th iteration:
C_k^(t+1) = (1 / n_k) Σ_{l=1..n_k} X_kl
where n_k is the total number of training samples belonging to the k-th Canopy cluster and X_kl is the l-th sample datum in the k-th Canopy cluster, l = 1, 2, ..., n_k;
step 4.5: judge whether each cluster center has converged, i.e. whether C_k^(t+1) = C_k^(t). If so, stop the K-means iteration; the cluster centers of the improved Canopy-K-means model for crystal bar quality evaluation are determined. Otherwise, repeat steps 4.2-4.4;
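Steps 4.1 to 4.5 amount to standard K-means seeded with the Canopy centers. A minimal sketch, assuming plain Python lists for the score vectors (the helper name `kmeans_refine` is illustrative):

```python
import math

def kmeans_refine(samples, centers, max_iter=100):
    """Refine Canopy-seeded cluster centers with standard K-means
    (Euclidean distance; stops when the centers no longer move)."""
    centers = [list(c) for c in centers]   # step 4.1: Canopy centers as seeds
    for _ in range(max_iter):
        # steps 4.2-4.3: assign every sample to its nearest center
        groups = [[] for _ in centers]
        for x in samples:
            k = min(range(len(centers)),
                    key=lambda k: math.dist(x, centers[k]))
            groups[k].append(x)
        # step 4.4: new center = mean of its group (keep old center if empty)
        new_centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centers)
        ]
        if new_centers == centers:         # step 4.5: C^(t+1) == C^(t)
            break
        centers = new_centers
    return centers
```

Seeding with the Canopy centers removes the usual K-means sensitivity to a random initial choice, which is the point of the hybrid scheme.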
step 5: apply the test-set sample data to the improved Canopy-K-means model, evaluate crystal bar quality by calculating the distance from each test sample to every cluster center of the model, and calculate the average evaluation accuracy of the test set under the improved Canopy-K-means model;
Calculate the distance from each sample in the test set to each cluster center of the improved Canopy-K-means model, find the minimum distance, and classify the test sample into the class of the cluster center corresponding to that minimum distance; different classes represent different crystal bar quality grades;
The average evaluation accuracy P of the test-set samples under the improved Canopy-K-means model is calculated as follows:
P = (1 / K) Σ_{k=1..K} (TP_k / Total_k)
where K is the number of Canopy clusters, TP_k is the number of correctly classified samples of the k-th class, FP_k is the number of misclassified samples of the k-th class, k = 1, 2, ..., K, and the total number Total_k of samples of the k-th class satisfies:
Total_k = TP_k + FP_k (5)
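Under the assumption that P averages the per-class accuracies TP_k / Total_k over the K classes (consistent with formula (5)), the accuracy calculation can be sketched as follows; the function name `average_accuracy` is illustrative:

```python
def average_accuracy(tp, fp):
    """Mean per-class accuracy P: average of TP_k / (TP_k + FP_k)
    over the K classes. tp and fp are per-class counts of correctly
    and incorrectly classified samples."""
    K = len(tp)
    return sum(t / (t + f) for t, f in zip(tp, fp)) / K

# e.g. three classes classified 9/10, 8/10 and 10/10 correctly
print(average_accuracy([9, 8, 10], [1, 2, 0]))  # 0.9
```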
step 6: judge whether any sample exists in the test set whose distances to two cluster centers differ in absolute value by less than a set threshold ε. If no such sample exists, the result of step 5 is the final evaluation of the test set; if such a test sample exists, further perform a clustering calculation with the k-NN algorithm on that sample and the training samples in the Canopy clusters of the two corresponding cluster centers, completing the quality evaluation of highly similar crystal bars;
The specific k-NN procedure for clustering the test sample with the training samples in the Canopy clusters of the two corresponding centers is as follows:
(1) For each test sample whose distances to two cluster centers differ in absolute value by less than the set threshold ε, calculate in turn the Euclidean distance to every training sample in the Canopy clusters of those two centers;
(2) Sort all the calculated Euclidean distances in increasing order;
(3) Select the k' samples nearest to the current test sample and return the category occurring most frequently among them as the category of the current test sample, completing its quality evaluation.
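The three k-NN steps above can be sketched as a small helper (hypothetical naming; `labelled_train` pairs each training vector with its class label):

```python
import math
from collections import Counter

def knn_vote(test_sample, labelled_train, k_prime):
    """k-NN tie-break: among the k' training samples nearest to the
    ambiguous test sample (Euclidean distance), return the label that
    occurs most frequently."""
    # steps (1)-(2): compute all distances and sort them increasingly
    ranked = sorted(labelled_train,
                    key=lambda pair: math.dist(test_sample, pair[0]))
    # step (3): majority vote among the k' nearest samples
    votes = Counter(label for _, label in ranked[:k_prime])
    return votes.most_common(1)[0][0]
```

Only the training samples of the two contested Canopy clusters would be passed in as `labelled_train`, keeping the vote local to the two candidate grades.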
The method performs quality evaluation on crystal bars whose multi-crystal-face quality is known, based on a hybrid clustering algorithm. An improved hybrid algorithm combining Canopy and K-means establishes the Canopy-K-means model, and a k-Nearest Neighbor (k-NN) algorithm is further introduced so that the roller mill orientation instrument can complete the quality assessment of a whole crystal bar.
The essence of the Canopy algorithm is a data pre-processing step: coarse clustering of the different crystal-bar quality samples by the Canopy algorithm quickly determines the cluster number K and the cluster centers. The basic idea is to calculate the distance between different samples first and then put samples with higher similarity into a subset called a Canopy cluster. A series of such calculations yields several Canopy clusters; the same sample may appear in several Canopy clusters, but no sample can end up belonging to no Canopy cluster at all. However, the conventional Canopy algorithm uses the Euclidean distance, which neglects the uncorrelatedness of the data sample attributes, so the Canopy algorithm of the invention adopts the Bray-Curtis distance instead. The K-means algorithm is then used to cluster the overlapping part of the improved Canopy clustering result precisely. The K-means algorithm has a relatively simple principle and implementation, high execution efficiency and strong scalability to large data volumes; its weakness is that once isolated points or noise are selected as initial cluster centers, the accuracy of the whole subsequent clustering process suffers greatly. Using the improved Canopy algorithm to supply the initial cluster centers therefore greatly improves the efficiency of the K-means clustering algorithm.
However, when the absolute value of the difference between the distances from a sample under test to two cluster centers is smaller than the set threshold ε, it is difficult to determine to which category that sample belongs. Therefore, a k-NN algorithm is introduced on top of the above algorithm for further cluster analysis, improving the accuracy of crystal bar cluster division and realizing the quality assessment of highly similar crystal bars.
The beneficial effects of the above technical scheme are as follows: in the crystal bar quality assessment method for the roller mill orientation instrument, an improved Canopy clustering algorithm is proposed according to the characteristics of the crystal bar attributes; it roughly classifies the crystal bars and determines the number of clusters and the cluster centers. The K-means algorithm is then adopted for precise clustering, solving the problem of overlapping sample data and establishing a crystal bar quality evaluation classification model. In the evaluation process, the k-NN algorithm further evaluates samples that lie close to two cluster centers and are therefore difficult to grade, completing the quality evaluation of highly similar crystal bars. The method fills the gap in the crystal bar quality evaluation function of the roller mill orientation instrument and lays a solid foundation for processing high-quality crystals and putting them into industrial production.
Drawings
FIG. 1 is a flow chart of a crystal bar quality evaluation method for a roller mill orientation instrument according to an embodiment of the present invention;
FIG. 2 is a diagram of a sapphire crystal plane provided by an embodiment of the present invention;
FIG. 3 is a flowchart of coarse clustering of training set samples using a modified Canopy algorithm according to an embodiment of the present invention;
FIG. 4 is a flowchart of a method for accurately clustering training set samples by using a K-means algorithm according to an embodiment of the present invention;
FIG. 5 is a flowchart of a k-NN algorithm provided in an embodiment of the invention.
Detailed Description
The following describes in further detail the embodiments of the present invention with reference to the drawings and examples. The following examples are illustrative of the invention and are not intended to limit the scope of the invention.
In this embodiment, a sapphire crystal bar of single crystal material is taken as an example, and its quality is evaluated by the crystal bar quality evaluation method for the roller mill orientation instrument.
A crystal bar quality evaluation method for a roller mill orientation instrument, as shown in fig. 1, comprises the following steps:
step 1: establishing a sample set containing M crystal bar detection data, wherein each crystal bar detection data comprises detection grades of s crystal face qualities and corresponding scores;
The detection grade of crystal face quality and the corresponding score are determined as follows:
In the grinding and orientation stage of the roller mill orientation instrument, the quality detection of a crystal face is divided into n grades: a crystal face of the first grade (grade A) scores n points; a crystal face of the second grade (grade B) scores n-1 points; and so on for the remaining grades and scores;
In this embodiment, a sample set of 630 crystal bars is selected. As shown in fig. 2, single crystal sapphire grown along the C-axis has 6 A-Plane crystal faces, so each crystal bar sample consists of 6 dimensional data points, i.e. s = 6. The quality detection of an A-Plane crystal face is divided into 5 grades: a first-grade crystal face is grade A with a score of 5; a second-grade face is grade B with a score of 4; a third-grade face is grade C with a score of 3; a fourth-grade face is grade D with a score of 2; a fifth-grade face is grade E with a score of 1. If all 6 A-Plane faces are grade A the crystal bar scores 30 points, and if all 6 are grade E it scores 6 points, so each crystal bar's score lies between 6 and 30 points. The quality grades of the A-Plane faces are collected clockwise starting from the orientation reference plane; for example, if the crystal-face grade detection result of a sapphire crystal bar is [A, A, E, B, C, A], the corresponding score vector is [5, 5, 1, 4, 3, 5].
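The grade-to-score mapping of this embodiment can be sketched as follows (names such as `score_vector` are illustrative, not from the patent):

```python
# 5 grades A..E mapped to scores 5..1, six A-Plane facets per ingot.
GRADE_SCORE = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}

def score_vector(grades):
    """Map a list of facet grade letters to the score vector used as a sample."""
    return [GRADE_SCORE[g] for g in grades]

grades = ["A", "A", "E", "B", "C", "A"]   # example detection result from the text
vec = score_vector(grades)                 # score vector [5, 5, 1, 4, 3, 5]
print(vec, sum(vec))                       # the total score lies between 6 and 30
```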
Step 2: dividing a sample set containing M crystal bar detection data into a training set and a testing set; the training set comprises N samples, and the test set comprises M-N samples;
In this embodiment, the 630 crystal bar samples are divided into a training set of 600 samples and a test set of 30 samples. According to the principle of the improved Canopy-K-means algorithm, the improved Canopy algorithm first coarsely clusters the training-set sample data; on this basis the K-means algorithm then clusters the training-set samples precisely, determining the final cluster centers of the improved Canopy-K-means model; the test-set samples are used to verify the quality-evaluation precision of the clustering model;
step 3: the samples in the training set are subjected to coarse clustering by adopting an improved Canopy algorithm, as shown in fig. 3, and the specific method is as follows:
step 3.1: arrange the samples in the training set randomly, i.e. X = [X_1, X_2, ..., X_i, ..., X_N], where X_1, X_2, ..., X_N are the sample data in the training set and N is the total number of training samples. X_i = [x_i1, x_i2, ..., x_ir, ..., x_is] is the feature vector of the i-th sample in the training set with respect to the crystal-face detection scores of the crystal bar; it contains s-dimensional data points, i.e. the crystal bar has s crystal faces, and x_ir is the detection score of the r-th crystal face of the i-th training sample. Each sample in the training set also carries a label from Y = [y_1, y_2, ..., y_i, ..., y_N], representing the category to which the sample belongs. Finally, two distance thresholds T_1 and T_2 are selected, with T_1 > T_2;
Step 3.2: randomly selecting one sample data X from a training set sample X i As the first Canopy center point, and sample data X i Delete from X;
step 3.3: select a sample X_j (j ≠ i) from the training set X, calculate by the Bray-Curtis method the minimum distance d_BCD from X_j to the already-created Canopy center points, and compare it with the two distance thresholds T_1 and T_2:
(a) if d_BCD ≤ T_1, give X_j a weak mark indicating that it belongs to the current Canopy cluster, add X_j to the current Canopy cluster, and do not delete X_j from the training set X;
(b) if d_BCD < T_2, give X_j a strong mark indicating that it belongs to the current Canopy cluster, and delete X_j from the training set X;
(c) if d_BCD > T_1, then X_j does not belong to the current Canopy cluster; X_j forms a new Canopy center point and is deleted from the training set X;
the Canopy algorithm generally uses geometric distances to measure similarity, and common distance calculation methods are Euclidean distance (Euclidean distance), manhattan distance (Manhattan distance), and the like, where Euclidean distance is the most used distance to measure the absolute distance between each point in the multidimensional space, that is, the true distance. In s-dimensional vector X j (x j1 ,x j2 ,...,x jr ,...,x js ) And X i (x i1 ,x i2 ,...,x ir ,...,x is ) For example, the two-point Euclidean distance calculation is given by the following formula:
wherein x is jr And x ir Respectively are vectors X j Sum vector X i R=1, 2..s;
The Euclidean distance is widely used and relatively simple to calculate, but since crystal bar sample data consist of several uncorrelated values, using the Euclidean distance ignores this uncorrelatedness. The solution of the invention is to modify the distance function of the Canopy algorithm and no longer use the Euclidean distance as the clustering measure. Since the Bray-Curtis distance is more sensitive to sample differences and suits the uncorrelatedness of the data sample attributes, it is introduced into the crystal bar quality evaluation problem addressed by the invention. The Bray-Curtis distance d_BCD from X_j to an already-generated Canopy center point X_i is calculated as:
d_BCD(X_j, X_i) = Σ_{r=1..s} |x_jr - x_ir| / Σ_{r=1..s} (x_jr + x_ir)
where x_jr and x_ir are the r-th elements of the vectors X_j and X_i respectively, r = 1, 2, ..., s. A d_BCD value close to 0 means the two samples are very similar, while a value close to 1 means the largest difference that can be observed between two samples.
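The Bray-Curtis calculation above can be sketched as a small function (hypothetical naming):

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two non-negative score vectors:
    sum of absolute component differences divided by the sum of all
    components. 0 means identical samples; values near 1 mean maximal
    observable difference."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den if den else 0.0

# Identical facet-score vectors give distance 0.
print(bray_curtis([5, 5, 1, 4, 3, 5], [5, 5, 1, 4, 3, 5]))  # 0.0
```

Since facet scores lie in [1, 5], the denominator is always positive for real samples; the zero-denominator guard is only defensive.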
step 3.4: repeat step 3.3 until the training set X is empty. The training-set sample data are grouped into K Canopy clusters, giving K cluster centers C_1, C_2, ..., C_k, ..., C_K, where C_k = [c_k1, c_k2, ..., c_kr, ..., c_ks]; each cluster center corresponds to one sample class;
In this embodiment, the improved Canopy algorithm coarsely clusters the 600 sapphire crystal bar samples in the training set. Through repeated experiments with different thresholds, T_1 is set to 0.1 and T_2 to 0.08. The training-set samples are divided into 6 different Canopy clusters, i.e. crystal bar quality assessment is divided into 6 grades and the K value is 6; the Canopy center points are shown in Table 1.
TABLE 1 Canopy center Point obtained by modified Canopy Algorithm
Step 4: on the basis of performing coarse clustering on training samples by an improved Canopy algorithm, performing accurate clustering on the training set samples by a K-means algorithm, and determining a clustering center of an improved Canopy-K-means model for crystal bar quality evaluation, wherein the specific method is as shown in fig. 4:
step 4.1: define the K Canopy centers as the initial cluster centers (C_1^(1), C_2^(1), ..., C_k^(1), ..., C_K^(1));
step 4.2: let the current iteration number be t. For every sample X_i in the training set, calculate in turn the Euclidean distance to each cluster center C_k^(t):
Dist(i, k) = sqrt( Σ_{r=1..s} (x_ir - c_kr)^2 )
where Dist(i, k) is the Euclidean distance from the i-th training sample X_i = [x_i1, x_i2, ..., x_ir, ..., x_is] to the k-th cluster center C_k = [c_k1, c_k2, ..., c_kr, ..., c_ks];
step 4.3: for each sample point X_i in the training set, find the cluster center C_k^(t) at the minimum distance, and assign X_i to the Canopy cluster of that nearest center;
step 4.4: update the center point of each Canopy cluster for the (t+1)-th iteration:
C_k^(t+1) = (1 / n_k) Σ_{l=1..n_k} X_kl
where n_k is the total number of training samples belonging to the k-th Canopy cluster and X_kl is the l-th sample datum in the k-th Canopy cluster, l = 1, 2, ..., n_k;
step 4.5: judge whether each cluster center has converged, i.e. whether C_k^(t+1) = C_k^(t). If so, stop the K-means iteration; the cluster centers of the improved Canopy-K-means model for crystal bar quality evaluation are determined. Otherwise, repeat steps 4.2-4.4;
The K value of the K-means algorithm must be specified in advance, and randomly selecting the initial cluster centers introduces large uncertainty. Therefore, after coarse clustering with the Canopy algorithm, K is set to 6, i.e. the number of Canopy clusters, and the initial cluster centers of the K-means algorithm are set to the Canopy center points shown in Table 1, obtained by the improved Canopy algorithm. The K-means algorithm uses the Euclidean distance to calculate the distance between each sample and each cluster center, iterating until the algorithm converges, i.e. the cluster centers no longer change, at which point clustering ends. The cluster centers of the improved Canopy-K-means model for crystal bar quality evaluation are finally obtained, each corresponding to one sample class, as shown in Table 2. The invention evaluates the quality of the training-set samples with the clustering algorithm combining the improved Canopy algorithm and K-means; the algorithm flowcharts are shown in fig. 3 and fig. 4.
TABLE 2 clustering centers for improved Canopy-K-means model
Step 5: the test set sample data are used for the improved Canopy-K-means models, the quality division standard of the crystal bars of the same grade is essentially that the distances are used as a measurement standard, so that the evaluation of the crystal bar quality is realized by calculating the distances between the test set sample and the clustering center of each improved Canopy-K-means model, and the average accuracy of evaluation is calculated;
Calculate the distance between each sample in the test set and each cluster center of the improved Canopy-K-means model, find the minimum distance, and assign the test sample to the class of the cluster center corresponding to that minimum distance; different classes represent different quality grades of the crystal bars;
The closer a sample lies to a cluster center, the higher its similarity to the samples of that class, so the sample is assigned to that class; different classes represent different quality grades of crystal bars. The estimated average accuracy P of the test-set samples under the improved Canopy-K-means model is calculated as shown in the following formula:
P = ( Σ_{k=1}^{K} TP_k / Σ_{k=1}^{K} Total_k ) × 100%   (5)
wherein K is the number of Canopy clusters, TP_k is the number of correctly classified class-k samples, FP_k is the number of misclassified class-k samples, k = 1, 2, ..., K, and the total number of class-k samples Total_k satisfies:
Total_k = TP_k + FP_k   (6)
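Formulas (5) and (6) can be checked with a short sketch. Only the totals (26 correct, 4 wrong, 30 samples) are taken from the embodiment; the per-class split used below is hypothetical.

```python
def average_accuracy(tp, fp):
    """Evaluation accuracy over the K classes, formulas (5)-(6):
    Total_k = TP_k + FP_k and P = sum(TP_k) / sum(Total_k).
    tp, fp: per-class counts of correct / incorrect classifications."""
    totals = [t + f for t, f in zip(tp, fp)]
    return sum(tp) / sum(totals)

# Hypothetical per-class split of the embodiment's 26 correct / 4 wrong results.
p = average_accuracy(tp=[5, 5, 4, 4, 4, 4], fp=[1, 1, 0, 1, 1, 0])
print(f"{p:.2%}")  # 86.67%, matching the embodiment
```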
In this example, the 30 sample data in the test set are shown in Table 3:
Table 3: the 30 sample data in the test set
The 30 test-set sample data described above were applied to the improved Canopy-K-means model. The model test results obtained with the Euclidean distance formula (3) are shown in Table 4, where '√' denotes a correct classification and '×' an incorrect one.
TABLE 4 classification results of 30 test set samples for improved Canopy-K-means model
The data in Table 4 comprise the distance from each test-set sample to each cluster center, together with the classification result that the improved Canopy-K-means model assigns to each test-set sample of Table 3 and a mark of whether that classification is correct. The number of correct classifications is 26 and the number of misclassifications is 4, so the average accuracy of the test-set evaluation obtained by formula (5) reaches 86.67%. In this embodiment, D(C_2) denotes the distance from a sample point to cluster center C_2, D(C_3) the distance to cluster center C_3, D(C_4) the distance to cluster center C_4, and D(C_5) the distance to cluster center C_5.
Because of the large amount of data in the table, this embodiment only analyzes and compares the clustering results of samples X_24-X_28 in Table 4; the analysis principle for the remaining cases is the same. From the distances between samples X_24-X_28 and the cluster centers of Table 2, sample X_25 is closest to cluster center C_1, so it is grade one; sample X_26 is closest to cluster center C_5, so it is grade five; sample X_27 is closest to cluster center C_6, so it is grade six; and samples X_24 and X_28 are both closest to cluster center C_3, so both belong to grade three. However, the distance values show that sample X_28 lies closer to cluster center C_3, so the quality of sample X_28 is better than that of sample X_24. Therefore, from high to low, the quality evaluation result of the five sapphire crystal bar samples is: sample X_25 > sample X_28 > sample X_24 > sample X_26 > sample X_27.
Step 6: judge whether any sample in the test set has an absolute distance difference to two cluster centers smaller than a set threshold ε. If no such sample exists, the result of step 5 is the final evaluation result for the test-set samples; if one exists, further perform clustering calculation with the k-NN algorithm on the test sample and the training samples in the Canopy clusters of the two corresponding cluster centers, to complete the quality assessment of high-similarity crystal bars. The specific method is shown in Figure 5:
(1) Sequentially calculate the Euclidean distances between each test-set sample whose absolute distance difference to two cluster centers is smaller than the set threshold ε and all training samples in the Canopy clusters where those two cluster centers are located;
(2) Sort all the calculated Euclidean distances in increasing order;
(3) Select the k' samples closest to the current test sample and return the class occurring most frequently among those k' samples as the class of the current test sample, completing the quality evaluation of the test sample.
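Steps (1) to (3) can be sketched as below; the function name, array layout, and the use of NumPy are assumptions for illustration, not part of the patent.

```python
import numpy as np

def knn_refine(test_sample, candidates, labels, k_prime=9):
    """k-NN tie-break for a test sample nearly equidistant to two centers.

    candidates : (M, s) training samples from the two Canopy clusters
    labels     : length-M array of their class labels
    Returns the majority class among the k' nearest training samples.
    """
    # (1) Euclidean distance to every candidate training sample.
    d = np.linalg.norm(candidates - test_sample, axis=1)
    # (2) sort the distances in increasing order; (3) vote among the first k'.
    nearest = labels[np.argsort(d)[:k_prime]]
    classes, counts = np.unique(nearest, return_counts=True)
    return classes[counts.argmax()]
```

In the embodiment k' = 9 was found by trial and error; the sketch leaves it as a parameter.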
In this example, the threshold ε is set to 0.1. As can be seen from Table 4, the absolute value of the difference between the distances from sample X_29 to cluster centers C_2 and C_3 is 0.0575, smaller than ε; the absolute value of the difference between the distances from sample X_30 to C_4 and C_5 is 0.0243, also smaller than ε. Therefore the k-NN algorithm is applied to the sample data contained in the two Canopy clusters centered at C_2 and C_3, and at C_4 and C_5, respectively, to further evaluate the quality of samples X_29 and X_30. The Euclidean distances between each hard-to-distinguish test-set sample and the training samples in the Canopy clusters of the two nearly equidistant cluster centers are calculated; all distance values are then sorted from small to large; and the class occurring most frequently among the first k' samples is taken as the class of the current sample. In this embodiment, the optimal k' value found by trial and error is 9, so the 9 training samples closest to sample points X_29 and X_30 are compared; the clustering results are shown in Table 5:
Table 5: k-NN algorithm clustering results
Table 5 shows the classification results. For the high-similarity ingot sample X_29, 6 of the 9 closest samples belong to C_2, so sample X_29 belongs to class C_2. For sample X_30, 5 of the 9 closest samples belong to C_4, so sample X_30 belongs to class C_4. Compared with the classification results of the improved Canopy-K-means algorithm in Table 4, introducing the k-NN algorithm further improves the average accuracy of the classification results, completes the evaluation of high-similarity crystal bar quality well, and can be applied to equipment with different precision requirements.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical schemes described in the foregoing embodiments can still be modified, or some or all of their technical features can be replaced with equivalents; such modifications and substitutions do not depart from the scope of the corresponding technical solutions, which is defined by the appended claims.

Claims (4)

1. A crystal bar quality assessment method for a tumbling mill orientation apparatus, characterized by comprising the following steps:
step 1: establishing a sample set containing M crystal bar detection data, wherein each crystal bar detection data comprises detection grades of s crystal face qualities and corresponding scores;
step 2: dividing a sample set containing M crystal bar detection data into a training set and a testing set; the training set comprises N samples, and the test set comprises M-N samples;
step 3: performing coarse clustering on samples in the training set by adopting an improved Canopy algorithm;
Step 3.1: arrange the samples in the training set randomly, i.e. X = [X_1, X_2, ..., X_i, ..., X_N], wherein X_1, X_2, ..., X_i, ..., X_N are the sample data in the training set, N is the total number of samples in the training set, X_i = [x_i1, x_i2, ..., x_ir, ..., x_is] is the feature vector of the i-th sample in the training set X with respect to the crystal face detection scores of the crystal bar and contains s-dimensional data points, i.e. the crystal bar has s crystal faces, and x_ir denotes the detection score of the r-th crystal face of the i-th training sample; each sample datum in the training set carries a label from Y = [y_1, y_2, ..., y_i, ..., y_N] representing the class to which it belongs; then select two distance thresholds T_1, T_2 with T_1 > T_2;
Step 3.2: randomly select one sample datum X_i from the training set X as the first Canopy center point, and delete X_i from X;
Step 3.3: select a sample datum X_j (j ≠ i) from the training set X, calculate by the Bray-Curtis method the minimum distance d_BCD from X_j to the already created Canopy center points, and compare it with the two distance thresholds T_1, T_2:
(a) If d_BCD ≤ T_1, give X_j a weak marker indicating that it belongs to the current Canopy cluster, add X_j to the current Canopy cluster, and do not delete X_j from the training set X;
(b) If d_BCD < T_2, give X_j a strong marker indicating that it belongs to the current Canopy cluster, and delete X_j from the training set X;
(c) If d_BCD > T_1, then X_j does not belong to the current Canopy cluster; X_j forms a new Canopy center point and is deleted from the training set X;
The minimum distance d_BCD from X_j to an already generated Canopy center point X_i is calculated by the Bray-Curtis method as follows:
d_BCD = Σ_{r=1}^{s} |x_jr - x_ir| / Σ_{r=1}^{s} (x_jr + x_ir)   (1)
wherein x_jr and x_ir are the r-th components of the vectors X_j and X_i respectively, r = 1, 2, ..., s;
Step 3.4: repeat step 3.3 until the training set X is empty; the training-set sample data are grouped into K Canopy clusters, yielding K cluster centers C_1, C_2, ..., C_k, ..., C_K, wherein C_k = [c_k1, c_k2, ..., c_kr, ..., c_ks], and each cluster center corresponds to one sample class;
step 4: performing accurate clustering on the training set samples by using a K-means algorithm on the basis of performing coarse clustering on the training samples by using an improved Canopy algorithm, and determining a clustering center of an improved Canopy-K-means model for crystal bar quality evaluation;
Step 4.1: take the K Canopy centers as the initial cluster centers (C_1^(1), C_2^(1), ..., C_k^(1), ..., C_K^(1));
Step 4.2: let the current iteration number be t; for every sample X_i in the training set, calculate in turn the Euclidean distance to each cluster center C_k^(t), as shown in the following formula:
Dist(i, k) = sqrt( Σ_{r=1}^{s} (x_ir - c_kr)^2 )   (2)
wherein Dist(i, k) is the Euclidean distance from the i-th sample X_i = [x_i1, x_i2, ..., x_ir, ..., x_is] in the training set to the k-th cluster center C_k = [c_k1, c_k2, ..., c_kr, ..., c_ks];
Step 4.3: for each sample point in the training set, find the nearest cluster center C_k^(t), and assign the corresponding sample point X_i to the Canopy cluster whose center C_k^(t) is at the smallest distance;
Step 4.4: update the center point of each Canopy cluster at the (t+1)-th iteration, as shown in the following formula:
C_k^(t+1) = (1/n_k) Σ_{l=1}^{n_k} X_kl   (3)
wherein n_k denotes the total number of training samples belonging to the k-th Canopy cluster, and X_kl denotes the l-th sample datum in the k-th Canopy cluster, l = 1, 2, ..., n_k;
Step 4.5: judge whether every cluster center has converged, i.e. whether C_k^(t+1) = C_k^(t) holds; if so, stop the K-means iteration and take the resulting cluster centers as the cluster centers of the improved Canopy-K-means model for crystal bar quality evaluation; otherwise, repeat steps 4.2-4.4;
Step 5: apply the test-set sample data to the improved Canopy-K-means model, evaluate the quality of the crystal bars by calculating the distance between each test-set sample and each cluster center of the improved Canopy-K-means model, and calculate the average accuracy of the evaluation;
Step 6: judge whether any sample in the test set has an absolute distance difference to two cluster centers smaller than a set threshold ε. If no such sample exists, the result of step 5 is the final evaluation result for the test-set samples; if one exists, further perform clustering calculation with the k-NN algorithm on the test-set sample and the training samples in the Canopy clusters of the two corresponding cluster centers, completing the quality assessment of high-similarity crystal bars.
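Steps 3.1 to 3.4 of claim 1 (the improved Canopy coarse clustering with the Bray-Curtis distance and weak/strong markers) can be read as the sketch below. The reading is interpretive, not part of the claim: weakly marked samples are assigned to the nearest existing center, and the sweep stops once no further sample can be deleted.

```python
import random
import numpy as np

def bray_curtis(a, b):
    """Bray-Curtis distance: sum|a_r - b_r| / sum(a_r + b_r)."""
    return np.abs(a - b).sum() / (a + b).sum()

def improved_canopy(X, t1, t2, seed=0):
    """Coarse clustering per steps 3.1-3.4 (requires T1 > T2).

    Weakly marked samples (d_BCD <= T1) join a cluster but remain in the
    candidate pool; strongly marked ones (d_BCD < T2) are deleted from it.
    Returns the Canopy center points and the (possibly overlapping) clusters.
    """
    rng = random.Random(seed)
    pool = [np.asarray(x, dtype=float) for x in X]
    first = pool.pop(rng.randrange(len(pool)))     # step 3.2: random first center
    centers, clusters = [first], [[first]]
    while pool:                                    # step 3.4: repeat step 3.3
        survivors = []
        for x in pool:
            dists = [bray_curtis(x, c) for c in centers]
            d, nearest = min(dists), int(np.argmin(dists))
            if d > t1:                             # (c) new Canopy center point
                centers.append(x)
                clusters.append([x])
            else:
                clusters[nearest].append(x)        # (a) weak marker
                if not (d < t2):                   # not strongly marked (b):
                    survivors.append(x)            # stays in the pool
        if len(survivors) == len(pool):            # only weak marks left: stop
            break
        pool = survivors
    return centers, clusters
```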
2. The method for evaluating the quality of an ingot for a tumbling mill orientation machine according to claim 1, wherein: the method for determining the detection grade and the corresponding score of the crystal face quality in the step 1 comprises the following steps:
In the grinding and orientation stage of the roller mill orientation instrument, the quality detection of the crystal faces of the crystal bars is divided into n grades: if a crystal face is of grade A, the corresponding score is n; if of grade B, the corresponding score is n-1; the remaining grades and scores follow by analogy.
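The grade-to-score mapping of claim 2 can be sketched as a small helper; the function name and the letter labels beyond grades A and B are illustrative assumptions.

```python
def grade_scores(n):
    """Scores per claim 2: grade A -> n, grade B -> n - 1, and so on
    down through the n grades (letter labels assumed for illustration)."""
    return {chr(ord('A') + i): n - i for i in range(n)}

# With n = 6 grades, 'A' scores 6 and the lowest grade scores 1.
```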
3. The method for evaluating the quality of an ingot for a tumbling mill orientation machine according to claim 2, wherein: the specific method in the step 5 is as follows:
Calculate the distance between each sample in the test set and each cluster center of the improved Canopy-K-means model, find the minimum distance, and assign the test sample to the class of the cluster center corresponding to that minimum distance; different classes represent different quality grades of the crystal bars;
The estimated average accuracy P of the test-set samples under the improved Canopy-K-means model is calculated as shown in the following formula:
P = ( Σ_{k=1}^{K} TP_k / Σ_{k=1}^{K} Total_k ) × 100%   (4)
wherein K is the number of Canopy clusters, TP_k is the number of correctly classified class-k samples, FP_k is the number of misclassified class-k samples, k = 1, 2, ..., K, and the total number of class-k samples Total_k satisfies:
Total_k = TP_k + FP_k (5).
4. The method for ingot quality assessment for a tumbling mill orienter according to claim 3, wherein the specific method in step 6 of further performing clustering calculation with the k-NN algorithm on the test sample and the training samples in the Canopy clusters of the two corresponding cluster centers, to complete the quality evaluation of high-similarity crystal bars, is as follows:
(1) Sequentially calculate the Euclidean distances between each test-set sample whose absolute distance difference to two cluster centers is smaller than the set threshold ε and all training samples in the Canopy clusters where those two cluster centers are located;
(2) Sort all the calculated Euclidean distances in increasing order;
(3) Select the k' samples closest to the current test sample and return the class occurring most frequently among those k' samples as the class of the current test sample, completing the quality evaluation of the test sample.
CN202010862885.1A 2020-08-25 2020-08-25 Crystal bar quality assessment method for roller mill orientation instrument Active CN111985823B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010862885.1A CN111985823B (en) 2020-08-25 2020-08-25 Crystal bar quality assessment method for roller mill orientation instrument

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010862885.1A CN111985823B (en) 2020-08-25 2020-08-25 Crystal bar quality assessment method for roller mill orientation instrument

Publications (2)

Publication Number Publication Date
CN111985823A CN111985823A (en) 2020-11-24
CN111985823B true CN111985823B (en) 2023-10-27

Family

ID=73444212

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010862885.1A Active CN111985823B (en) 2020-08-25 2020-08-25 Crystal bar quality assessment method for roller mill orientation instrument

Country Status (1)

Country Link
CN (1) CN111985823B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114897402A (en) * 2022-05-26 2022-08-12 西安奕斯伟材料科技有限公司 Crystal bar manufacturing management method and crystal bar manufacturing management system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107528823A (en) * 2017-07-03 2017-12-29 中山大学 A kind of network anomaly detection method based on improved K Means clustering algorithms
CN109725013A (en) * 2018-12-20 2019-05-07 深圳晶泰科技有限公司 X ray diffracting data analysis system

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107528823A (en) * 2017-07-03 2017-12-29 中山大学 A kind of network anomaly detection method based on improved K Means clustering algorithms
CN109725013A (en) * 2018-12-20 2019-05-07 深圳晶泰科技有限公司 X ray diffracting data analysis system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Intelligent evaluation of the operation quality of substation equipment based on a big data architecture; Zhang Jisheng; Zhang Bo; Yu Ye; Electric Power Big Data (09); 43-47 *

Also Published As

Publication number Publication date
CN111985823A (en) 2020-11-24

Similar Documents

Publication Publication Date Title
CN107515895B (en) Visual target retrieval method and system based on target detection
CN109887015B (en) Point cloud automatic registration method based on local curved surface feature histogram
CN111080684B (en) Point cloud registration method for point neighborhood scale difference description
EP3910504A1 (en) Aviation blade profile detection method and system based on variable tolerance zone constraint
CN108804731B (en) Time series trend feature extraction method based on important point dual evaluation factors
CN101140624A (en) Image matching method
CN109034262B (en) Batch processing method for defect identification of X-ray orientation instrument
CN110188225B (en) Image retrieval method based on sequencing learning and multivariate loss
CN108197647B (en) Rapid clustering method for automobile starter endurance test data
CN111898443B (en) Flow monitoring method for wire feeding mechanism of FDM type 3D printer
CN112085252B (en) Anti-fact prediction method for set type decision effect
CN111985823B (en) Crystal bar quality assessment method for roller mill orientation instrument
CN112287980B (en) Power battery screening method based on typical feature vector
CN113889192B (en) Single-cell RNA-seq data clustering method based on deep noise reduction self-encoder
CN116523320A (en) Intellectual property risk intelligent analysis method based on Internet big data
CN114139618A (en) Signal dependent noise parameter estimation method based on improved density peak clustering
CN116109613A (en) Defect detection method and system based on distribution characterization
CN115293290A (en) Hierarchical clustering algorithm for automatically identifying cluster number
CN108537249B (en) Industrial process data clustering method for density peak clustering
CN116883393B (en) Metal surface defect detection method based on anchor frame-free target detection algorithm
CN110909792A (en) Clustering analysis method based on improved K-means algorithm and new clustering effectiveness index
CN112164144B (en) Casting three-dimensional model classification method combining D2 operator and normal operator
CN113361616A (en) K-means algorithm for optimizing clustering center
CN114813347A (en) Method for predicting mechanical property of ultrathin niobium strip
Shao et al. Design and research of metal surface defect detection based on machine vision

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant