CN101484910B

CN101484910B - Clustering system, and defect kind judging device

Info

Publication number: CN101484910B
Application number: CN200780025547.5A
Authority: CN
Inventors: 楜泽信; 胜吕昭男; 大西孝二
Original assignee: Asahi Glass Co Ltd
Current assignee: AGC Inc
Priority date: 2006-07-06
Filing date: 2007-07-03
Publication date: 2015-04-08
Anticipated expiration: 2027-07-03
Also published as: KR20090018920A; JPWO2008004559A1; CN101484910A; WO2008004559A1; JP5120254B2; KR100998456B1; TW200818060A; TWI434229B

Abstract

Provided is a clustering system capable of classifying target data more rapidly and precisely than the examples of the prior art. The clustering system classifies input data into each of clusters formed of the populations of learning data, in terms of featuring quantities owned by the input data. The clustering system comprises a featuring set storage unit stored with featuring quantity sets or a combination of featuring quantities to be used for the classifications, in a manner to correspond to the individual clusters, a featuring quantity extracting unit for extracting a preset featuring quantity from the input data, a distance calculating unit for calculating and outputting the distances between the center of the population of the individual clusters and the input data, individually as set distances for each featuring quantity set corresponding to each cluster, on the basis of the featuring quantities contained in the featuring quantity set, and a rank extracting unit for arraying the individual set distances in the order of the smaller length.

Description

Clustering system and defect kind judging device

Technical field

The present invention relates to the characteristic signal the clustering system that the classification of defect is classified and defect kind judging device that the parts of images of the defect part in the image of detected object thing are taken out, extracted from this parts of images defect.

Background technology

Implemented at large in the past and utilized the distance of unknown data and learning data, the clustering method of such as geneva (Mahalanobis generalized distance) distance.That is, by judging that the classification whether unknown data belongs to as the colony (population) learnt in advance is classified, clustering processing is carried out.Such as, according to the size of the mahalanobis distance to multiple classification, classification unknown data being belonged to which colony judges (such as with reference to patent documentation 1).

In addition, for effectively calculating above-mentioned distance, multiple characteristic quantity is selected to carry out clustering processing.

In addition, the method that the classification utilize the ballot of the result obtained by multiple recognizer (classifier), belonging to this unknown data judges is also comparatively general, use the output of different sensors recognition result or for (such as with reference to patent documentation 2) such as the recognition results in the identification of the unknown data of zones of different on an image.

Utilize above-mentioned clustering method, the parameter obtained according to the result of blood test medical diagnosis on disease, namely belong in the cluster of any disease, there is following method, namely, being set in each in multiple classification is a combination with 2 classifications, to this combination each of all combinations, carries out the judgement that detected data is similar to any classification, according to the statistics of this judgement number, decide to be categorized into the classification (such as patent documentation 3) that the number that is determined is more.

By on LCD glass substrate with each defect, classify according to each defect kind preset time, carry out following cluster, namely, corresponding with identification during classification, the each characteristic quantity used in classification is optimized, and each characteristic quantity is weighted respectively, make corresponding with this optimization, and use the characteristic quantity after this optimization, judge (such as patent documentation 4) belonging to any classification.

Patent documentation 1: JP 2005-214682 publication

Patent documentation 2: JP 2001-56861 publication

Patent documentation 3: Unexamined Patent 07-105166 publication

Patent documentation 4: JP 2002-99916 publication

But in the cluster shown in patent documentation 3, be not optimized each combination, utilization not yet in effect becomes and differentiates the characteristic quantity of material, and if the classification that should differentiate becomes many time, then number of combinations is huge, has the problem that determination processing required time increases.

In addition, in cluster shown in patent documentation 4, although want according to judgement rate, discrimination precision is improved to characteristic quantity weighting, but not to the concept that the characteristic quantity of each classification is optimized, same with above-mentioned patent documentation 3, utilize characteristic quantity due to not yet in effect, therefore there is the shortcoming cannot carrying out high-precision classification.

Summary of the invention

The present invention completes in view of such situation, the characteristic quantity from the object and object of classification extracting data being categorized into affiliated classification is effectively utilized when differentiating, compared with conventional example, provide can classify to object of classification data with more speed, more high precision, such as can by glass surface with classification of defects become clustering system and the defect kind judging device of the classification corresponding with defect kind.

For solving the problem, from utilize the characteristic quantity of one species calculate object of classification data and and distance between respectively classifying to decide the conventional example of class object different, in the present invention, owing to can obtain the set of the characteristic quantity of difference at of all categories to each category setting, and utilize different characteristic quantities to obtain distance between respective classification, therefore compared with the pastly carry out the higher classification of precision.

The set of above-mentioned characteristic quantity, owing to being implement according to belonging to learning data of all categories, is therefore made up of the characteristic quantity can distinguished with other classification.

That is, the present invention adopts following formation.

Clustering system of the present invention, utilize input data (input data) have characteristic quantity (parameter), by this input Data classification become by the colony of learning data (learning data) formed of all categories, wherein, comprise: characteristic quantity set storage part, this characteristic quantity set storage part store corresponding with of all categories, classify in the characteristic quantity set (parameter set) of combining as characteristic quantity that uses; Feature amount extraction module, the characteristic quantity that this feature amount extraction module presets from input extracting data; Distance calculating part, this distance calculating part to characteristic of correspondence duration set each and of all categories, to calculate respectively according to the characteristic quantity comprised in this characteristic quantity set and the distance exported between the center of colony of all categories and described input data as aggregate distance; And precedence extraction unit, described each aggregate distance arranges with order from small to large by this precedence extraction unit.

The preferred clustering system of the present invention, to the multiple described characteristic quantity set of each category setting.

The preferred clustering system of the present invention, also there is category classification portion, in the described aggregate distance that this classification division obtains in each characteristic quantity set, utilize the mode of rule of the expression set by the precedence of this aggregate distance to the classification benchmark of all categories of input data to detect described input data and belong to which classification.

The preferred clustering system of the present invention, described category classification portion utilizes the precedence of described aggregate distance to belong to which classification to detect described input data, detects that the more classification of the preceding aggregate distance of this precedence is as the classification belonging to described input data.

The preferred clustering system of the present invention, described category classification portion has the threshold value for the preceding number of precedence, when preceding classification is more than this threshold value, detects as the classification belonging to input data.

The preferred clustering system of the present invention, described distance calculating part is multiplied by corresponding to the correction coefficient set by characteristic quantity set described aggregate distance, and by the aggregate distance standardization between each characteristic quantity set.

The preferred clustering system of the present invention, also there is the characteristic quantity set generating unit of the characteristic quantity set generating each classification, described characteristic quantity set generating unit is to each combination of multiple combinations of each characteristic quantity, with the mean value of the learning data of colony of all categories for initial point, the mean value of the distance between each learning data obtaining the colony of this initial point and other classification, select the combination of the maximum characteristic quantity of mean value, as identifying characteristic quantity set that is of all categories and other classification.

Defect kind judging device of the present invention, is provided with any one in the clustering system of above-mentioned record, and described input data are view data of product defects, utilizes the characteristic quantity representing defect, the kind of the defect in view data by defect is classified.

The preferred defect kind judging device of the present invention, described product is glass article, the kind of the defect of this glass article by defect is classified.

Defect detecting device of the present invention, is provided with the kind that above-mentioned defect kind judging device carrys out testing product defect.

Production status decision maker of the present invention, the defect of defect kind judging device to product being provided with above-mentioned record is classified, according to the occurrence cause of the corresponding defect detected in production run with the occurrence cause corresponding to this kind.

The preferred production status decision maker of the present invention, be provided with any one in the clustering system of above-mentioned record, described input data are the characteristic quantities of the working condition represented in process of producing product, are classified by this characteristic quantity by the production status in each operation of production run.

The preferred production status decision maker of the present invention, described product is glass article, is classified by the characteristic quantity in the production run of this glass article by the production status in each operation of production run.

Production status pick-up unit of the present invention, the kind of production status decision maker to the production status in each operation of process of producing product being provided with above-mentioned record detects.

Production status pick-up unit of the present invention, the kind of production status decision maker to the production status in each operation of process of producing product being provided with above-mentioned record detects, and carries out the process control in the operation of production run according to the item controlled corresponding to this kind.

As described above, according to the present invention, due to the classification to each class object, the multiple characteristic quantities had by object of classification data, preset the combination with the characteristic quantity of the best of other classification distance, calculate object of classification data respectively and of all categories between distance, and the classification that the distance becoming this to calculate object of classification Data classification is minimum, therefore compared with the existing methods, more accurately object of classification Data classification can be become corresponding classification.

In addition, according to the present invention, due to the multiple combinations thereof of each category setting, the result of calculation of the distance of whole classification and object of classification data is arranged with order from small to large, and the classification that the number comprised in group before the number becoming to preset by object of classification Data classification is maximum, therefore compared with the pastly can carry out high-precision classification.

Accompanying drawing explanation

Fig. 1 is the block scheme of the configuration example of the clustering system representing the of the present invention 1st and the 2nd embodiment.

Fig. 2 illustrates the list for the process selecting characteristic set according to discrimination standard value λ.

Fig. 3 illustrates the list for the process selecting characteristic set according to discrimination standard value λ.

Fig. 4 illustrates the histogram for the effect selecting characteristic set according to discrimination standard value λ.

Fig. 5 is the process flow diagram to the action case in the process of selection characteristic quantity of all categories set representing the 1st embodiment.

Fig. 6 is the process flow diagram to the action case in the clustering processing of object of classification data representing the 1st embodiment.

Fig. 7 is the process flow diagram of the action case of the list representing the mode of rule used in the generation clustering processing in the 2nd embodiment.

Fig. 8 is the process flow diagram to the action case in the clustering processing of object of classification data representing the 2nd embodiment.

Fig. 9 is the process flow diagram to the action case in other clustering processing of object of classification data representing the 2nd embodiment.

Figure 10 is the process flow diagram to the action case in the clustering processing of object of classification data representing the 3rd embodiment.

Figure 11 represents the process flow diagram of setting as the action case of the arithmetic expression of the transform method of characteristic quantity.

Figure 12 is the process flow diagram of the action case representing the evaluation of estimate calculated in the process flow diagram of Figure 11.

Figure 13 represents to use the characteristic quantity after being converted by set transform method to calculate the process flow diagram of the action case of distance.

Figure 14 is the list representing the learning data belonging to of all categories.

Figure 15 represents to utilize the clustering method of conventional example by the results list of sorted for the learning data of Figure 14 result.

Figure 16 illustrates the concept map calculating the method for Integral correction judgement rate.

Figure 17 represents to utilize the clustering system in the 1st embodiment by the results list of sorted for the learning data of Figure 14 result.

Figure 18 represents to utilize the clustering system in the 2nd embodiment by the results list of sorted for the learning data of Figure 14 result.

Figure 19 represents to utilize the clustering system in the 2nd embodiment by the results list of sorted for the learning data of Figure 14 result.

Figure 20 is the block scheme of the configuration example representing the testing fixture employing clustering system of the present invention.

Figure 21 is the process flow diagram of the action case of the selection characteristic quantity set represented in the testing fixture of Figure 20.

Figure 22 is the process flow diagram of the action case of the clustering processing represented in the testing fixture of Figure 20.

Figure 23 is the block scheme of the configuration example representing the defect kind judging device employing clustering system of the present invention.

Figure 24 is the block scheme of the configuration example representing the production management device employing clustering system of the present invention.

Figure 25 is the block scheme of the configuration example representing other production management device employing clustering system of the present invention.

Label declaration

1 ... characteristic quantity set generating unit

2 ... feature amount extraction module

3 ... distance calculating part

4 ... characteristic quantity set storage part

5 ... category database

100 ... checked property

101 ... image acquiring section

102 ... lighting device

103 ... camera head

104 ... defect candidate test section

105 ... cluster portion

200,300 ... control device

201,202 ... image capturing device

301,302 ... process units

303 ... notification unit

304 ... storage part

Embodiment

Clustering system of the present invention relates to a kind of characteristic quantity utilizing the input data of object of classification to have, by the clustering system of all categories that this input Data classification becomes to be formed as colony by learning data, have of all categories corresponding with described, store the characteristic quantity set storage part of the characteristic quantity set as characteristic quantity combination used in classification, feature amount extraction module is according to this characteristic quantity set preset, from described input extracting data characteristic quantity, distance calculating part is to characteristic of correspondence duration set each and of all categories, calculate respectively with the distance of colony and described input data as aggregate distance according to the characteristic quantity comprised in this characteristic quantity set, each aggregate distance arranges with order from small to large by precedence extraction unit, and the classification carried out corresponding to putting in order classification.

< the 1st embodiment >

Below, the clustering system of the 1st embodiment of the present invention is described with reference to accompanying drawing.Fig. 1 is the block scheme of the configuration example of the clustering system representing this embodiment.

The clustering system of present embodiment as shown in Figure 1, has characteristic quantity set generating unit 1, feature amount extraction module 2, distance calculating part 3, characteristic quantity set storage part 4 and category database 5.

Corresponding with identifying information of all categories in characteristic quantity set storage part 4, to store the combination of the characteristic quantity to the presentation class object data that each classification sets respectively characteristic quantity set.Such as, characteristic quantity set of all categories { when a, b, c, d}, is set the combination as the characteristic quantity of the kind such as [a, b], [a, b, c, d], [c] by the set being characteristic quantity in object of classification data.In the following description, from the set of described characteristic quantity, any one in the combination of the combination of all characteristic quantities or multiple (in described example, any 2,3 characteristic quantities in set) is defined as " combination of characteristic quantity ".

Here, when classification A, B and C are set the classification as class object, be use to be categorized into learning data of all categories in advance, to obtain as the combination of the characteristic quantity maximum with the distance of other classification of all categories with characteristic of correspondence duration set of all categories, and be stored in characteristic quantity set storage part 4.

Such as, the vector characteristic quantity set set for classification A setting formed as the mean value of each characteristic quantity by the learning data belonging to classification A and the distance of vector be made up of the mean value of each characteristic quantity of the learning data of the classification B and classification C that belong to other become the combination of maximum characteristic quantity.

In addition, object of classification data and of all categories in the learning data of colony be made up of the set of same characteristic quantity.

Feature amount extraction module 2 is when being calculated with distance of all categories by the object of classification data inputted, read from characteristic quantity set storage part 4 and the classification characteristic of correspondence duration set becoming calculating object, and extract and this characteristic quantity set characteristic of correspondence amount from multiple characteristic quantities of object of classification data, the characteristic quantity after extracting is outputted to distance calculating part 3.

Distance calculating part 3 using the identifying information of the classification of calculating object as keyword, the vector be made up of the mean value of each eigenwert of the learning data of the classification becoming calculating object is read from category database 5, according to such other characteristic quantity set, calculate the distance of the vector that the characteristic quantity that extracted by object of classification data forms and the vector (representing the center of gravity vector of the centre of gravity place of the multiple learning datas in classification) be made up of the mean value of each characteristic quantity of learning data.

When carrying out the calculating of described distance, distance calculating part 3 is in order to make the data unit indifference between characteristic quantity, and by the data normalization between characteristic quantity, utilize formula (1) below, each characteristic quantity v (i) in object of classification data is normalized.

V(i)＝(v(i)-avg.(i))/std.(i)…(1)

Here, v (i) is characteristic quantity, avg. (i) be calculating object classification in learning data in the mean value of characteristic quantity, std. (i) be calculating object classification in learning data in the standard deviation (standardized deviation) of characteristic quantity, V (i) is the characteristic quantity after normalization.Thus, when calculating distance, distance calculating part 3 needs the normalization each characteristic quantity set being carried out to each characteristic quantity.

In addition, distance calculating part 3 is to each characteristic quantity calculated for the distance in object of classification data, and the mean value of characteristic of correspondence amount and standard deviation carry out described normalized respectively to use learning data.

In addition, as distance, any one amount of the standardization Euclidean distance (standardized Euclidean distance) of the characteristic quantity after using above-mentioned standardization, mahalanobis distance, Ming Shi distance (Minkowskydistance) etc. also can be adopted.

Here, when using mahalanobis distance, utilize formula (2) below to obtain Mahalanobis square distance (Mahalanobis square distance) MHD.

MHD＝(1/n)·(V ^TR ^-1V)…(2)

Each key element V (i) in matrix V in above-mentioned formula (2) is characteristic quantity v (i) of multidimensional to unknown data, the characteristic quantity that utilizes mean value avg. (i) of the characteristic quantity of the learning data in this classification and standard deviation std. (i) to be obtained by above-mentioned formula (1).N is degree of freedom, represents number and the characteristic quantity number of the characteristic quantity in the characteristic quantity set (aftermentioned) of present embodiment.Thus, Mahalanobis square distance be by n convert after characteristic quantity difference be added after numerical value, utilize (Mahalanobis square distance)/n, then the unit distance that colony is average becomes 1.In addition, V ^twith the transposed matrix of characteristic quantity v (i) matrix V that is key element, R ^-1it is the inverse matrix of correlation matrix (correlation matrix) R between each characteristic quantity in the learning data in classification.

The characteristic quantity set that uses when characteristic quantity set generating unit 1 calculates the distance of described distance calculating part 3 between calculating object of classification data and be of all categories to each classification, correspond to identifying information of all categories by calculating result, write-in characteristic duration set storage part 4 also stores.

When calculating characteristic quantity set, characteristic quantity set generating unit 1 is to each classification, according to belong to generating feature duration set object type learning data center of gravity vector (barycentric vector) and belong to the distance of center of gravity vector of learning data of other all categories beyond this object type, utilize the value λ of formula (3) computational discrimination benchmark (discriminant criterion) below.Below, the combination of characteristic quantity is described as characteristic quantity set.

λ＝ω _oω _i(μ _o-μ _i) ²/(ω _oσ _o ²+ω _iσ _i ²)…(3)

In above-mentioned formula (3), μ _ithe center of gravity vector be made up of the mean value of the characteristic quantity in the characteristic quantity set of " belonging to the learning data (classification in-group) of object type ".σ _iit is the standard deviation of the vector of the characteristic quantity according to the learning data belonging to this classification in-group.ω _ithe learning data that belongs to classification in-group to the ratio of number of learning data belonging to all categories.In addition, μ _othe center of gravity vector be made up of the mean value of the characteristic quantity in the characteristic quantity set of " belonging to the learning data (object type outgroup) of the classification beyond object type ".σ _oit is the standard deviation of the vector of the characteristic quantity according to the learning data belonging to this object type outgroup.ω _oit is the ratio belonging to the number of the learning data of classification outgroup belonged in the learning data of all categories.Here, (the μ in formula (3) _o-μ _i) also can use the numerical value after getting log (logarithm) and square root.In addition, here when calculating each vector, the characteristic quantity after characteristic quantity set generating unit 1 utilizes formula (1) to calculate each characteristic quantity normalization uses.In addition, eigenvalue can also be set as comparative example ω in advance _iand ω _ocarry out after computing, that difference is larger numerical value.

Then, characteristic quantity set generating unit 1 uses above-mentioned formula (3) to each object type, for the combination of any or all of the characteristic quantity of formation learning data, calculate the described discrimination standard value λ with other classification, and the discrimination standard value λ after calculating is arranged with order from big to small, export the sequence list of discrimination standard value λ.

Here, the combination with maximum discrimination standard value λ characteristic of correspondence amount as the characteristic quantity set of object type, is corresponded to the identifying information of classification, is stored into characteristic quantity set storage part 4 by characteristic value collection generating unit 1 together with discrimination standard value λ.

The decision of above-mentioned discrimination standard value λ is as shown in Fig. 2 (a), when characteristic quantity set generating unit 1 carries out the setting of characteristic quantity set of all categories, be a at the characteristic quantity of learning data and object of classification data, b, c, when 4 of d amounts, calculate all discrimination standard value λ in the whole, multiple of these 4 characteristic quantities and the whole combinations of any one.

Then, characteristic quantity set generating unit 1 selects the combination of characteristic quantity b, c in the highest numerical value, such as Fig. 2 (a).

In addition, as the method for other discrimination standard value λ, BSS method as shown in Fig. 2 (b), namely to using the discrimination standard value λ of whole n the characteristic quantities comprised in the set of object of classification data to carry out computing, then to the whole combinatorial operation discrimination standard value λ taking out n-1 in the set from n characteristic quantity.Then, from the discrimination standard value λ of this n-1, select the combination of maximal value, specifically, to whole combinatorial operation discrimination standard value λ of the n-2 selected from this n-1 characteristic quantity.Like this, also can constitutive characteristic duration set generating unit 1, make in turn respectively from set to reduce by 1 characteristic quantity, from the set of the characteristic quantity after reducing, select to reduce the combination union discrimination standard value λ after 1 again, select the combination can carrying out with less characteristic quantity number differentiating.

In addition, as the method for another kind of discrimination standard value λ, FSS method as shown in Fig. 2 (c), namely 1 all categories each and every one reading characteristic quantity in n the characteristic quantity comprised from the set of object of classification data, the discrimination standard value λ of each characteristic quantity of computing, therefrom selects the characteristic quantity with maximum discrimination standard value.Then, generate the combination be made up of 2 characteristic quantities of this characteristic quantity and the characteristic quantity except it, calculate the discrimination standard value λ to respective combination.Then, from this combination, the combination with maximum discrimination standard value is selected.Then, generate the combination be made up of 3 characteristic quantities of the characteristic quantity do not comprised in this combination and this combination, and generate respective discrimination standard value λ.Like this, also can constitutive characteristic duration set generating unit 1, make the characteristic quantity in turn selecting to have maximum discrimination standard value λ from the combination of the characteristic quantity before immediately, non-existent characteristic quantity in 1 this combination is increased to the combination of the characteristic quantity of combination, calculate the discrimination standard value λ of the characteristic quantity of the combination after increasing, the combination with maximum discrimination standard value λ is selected from this combination, computing is carried out to the discrimination standard value λ of the combination of the characteristic quantity increased again in 1 this combination after non-existent characteristic quantity, final from the whole combination after computational discrimination reference value λ, discrimination standard value λ is selected to become maximum combination as characteristic quantity set.

Then, Fig. 3 and Fig. 4 is utilized to illustrate according to discrimination standard value λ, the validity selecting the characteristic quantity set used in cluster.

Illustrate in Fig. 3: from characteristic quantity a, b, c, d, the combination of characteristic quantity a and g, the combination of characteristic quantity a and h and the combination of characteristic quantity d and e is extracted in e, as the combination selecting characteristic quantity set, be combined in classification 1, classification 2 and 3 by these, the selection of characteristic quantity set has high-class characteristic compared with conventional example.

In Fig. 3, μ 1 corresponds to described μ _i, μ 2 corresponds to described μ _o, σ 1 corresponds to described σ _i, σ 2 corresponds to described σ _o, ω 1 corresponds to described ω _i, ω 2 corresponds to described ω _o.

Wherein, in described combination, the combination of what the value of discrimination standard value λ was maximum is characteristic quantity a and h, by this combination for separating of classification 1 and the classification except it, utilizes Fig. 4 to confirm the classification results of classification 1 and the classification except it (classification 2 and 3).

In Fig. 4, transverse axis represents that the mahalanobis distance after using the combination of characteristic quantity to carry out computing gets the numerical value of log, and the longitudinal axis represents the number (histogram) of the separate object data with corresponding numerical value.Here, the numerical value 1.4 of transverse axis represent the numerical value that mahalanobis distance gets log be 1.4 less than and more than 1.2 left side of 1.4 (numerical value).Numerical value on other transverse axis is also same.In addition, 1.4≤expression more than 1.4 in Fig. 4.The mahalanobis distance of Fig. 4 is use and classification 1 characteristic of correspondence duration set, calculates respectively the object of classification data belonging to classification 1 and the classification except it.

Fig. 4 (a) is the example using the combination of characteristic quantity a and g to carry out computing mahalanobis distance, Fig. 4 (b) is the example using the combination of characteristic quantity a and h to carry out computing mahalanobis distance, and Fig. 4 (c) is the example using the combination of characteristic quantity d and e to carry out computing mahalanobis distance.

The histogram observed in Fig. 4 is known, when the numerical value of discrimination standard value λ is larger, can carry out the classification of classification 1 and other classification preferably.

Then, with reference to Fig. 5 and Fig. 6, the action of the clustering system of the 1st embodiment of key diagram 1.Fig. 5 is the process flow diagram of the action case of the characteristic quantity set generating unit 1 of the clustering system representing the 1st embodiment, and Fig. 6 is the process flow diagram of the action case of the cluster of presentation class object data.

In the following description, such as object of classification data are glass articles with the set of characteristic quantity of scar time, suppose to obtain " a: the length of scar (scratch) ", " b: the area of scar ", " c: the width of scar ", " d: the transmissivity comprising the presumptive area of scar part " and " e: the reflectivity comprising the presumptive area of scar part " etc. as this characteristic quantity from image procossing or measurement result.Thus, the set (hereinafter referred to as characteristic quantity set) as characteristic quantity becomes { a, b, c, d, e}.In addition, in present embodiment, the mahalanobis distance of the distance used in cluster as the characteristic quantity after use standardization is calculated.Here, the above-mentioned glass article in present embodiment as an example, such as, is sheet glass or glass substrate for display.

A. characteristic quantity set generating process (process flow diagram of corresponding diagram 5)

User detect on glass with scar, take this image and obtain view data, and utilize image procossing to carry out measuring etc. extracting characteristic quantity to the length of scar part from this view data, gather the characteristic quantity data be made up of the set of described characteristic quantity.Then, user for the producing cause or shape etc. of scar want to carry out to classify of all categories, according to information such as the producing cause judged in advance or shapes, characteristic quantity data are distributed as learning data, as the colony of learning data of all categories, never illustrated processing terminal corresponds to the identifying information of classification, is stored into category database 5 (step S1).

Then, when generating the control command for characteristic quantity set of all categories from described processing terminal input, characteristic quantity set generating unit 1 corresponds to identifying information of all categories from category database 5, reads in the colony of learning data.

Then, characteristic quantity set generating unit 1 calculates mean value and the standard deviation of each characteristic quantity in classification in-group to each classification, uses this mean value and standard deviation, is calculated the characteristic quantity after the standardization in each learning data by formula (1).

Then, each characteristic quantity set of whole combinations of the characteristic quantity comprised in the 1 pair of characteristic quantity set of characteristic quantity set generating unit, utilizes formula (3) to calculate discrimination standard value λ.

Now, characteristic quantity set generating unit 1 is to each classification, and the characteristic quantity after the standardization of use classes in-group, calculates: by mean value (center of gravity vector) μ of the vector formed with each characteristic quantity set characteristic of correspondence amount _i; With the standard deviation of the vector by the learning data formed with the characteristic quantity set characteristic of correspondence amount in classification in-group _i, and the characteristic quantity after the standardization of use classes outgroup, calculates: by mean value (center of gravity vector) μ of the vector formed with each characteristic quantity set characteristic of correspondence amount _o; By the standard deviation of the vector of the learning data formed with the characteristic quantity set characteristic of correspondence amount in classification outgroup _o; The proportion omegab of the learning data number of the classification in-group in global learning data amount check _i; With the proportion omegab of the learning data number of the classification outgroup in global learning data amount check _o.

Then, characteristic quantity set generating unit 1 uses described center of gravity vector μ _i, μ _o; Standard deviation _i, σ _o; And proportion omegab _i, ω _o, utilize formula (3) to each classification, for the characteristic quantity set of whole combinations of characteristic quantity set, calculate and each classification differentiated and the discrimination standard value λ of distance of other classification.

At the end of the calculating of all discrimination standard value λ, discrimination standard value λ arranges with order from big to small each classification by characteristic quantity set generating unit 1, detect and maximum discrimination standard value λ characteristic of correspondence duration set, as judge to belonging to of all categories time, represent during distance calculates the characteristic quantity set (step S2) of the set of the combination of the characteristic quantity used.

Then, characteristic quantity set generating unit 1 uses in calculating in the distance of distance calculating part 3, calculates: and the coefficient R between each characteristic quantity set characteristic of correspondence amount; With mean value avg. (i) and standard deviation std. (i) (step S3) of the characteristic quantity of the learning data in in-group of all categories.

Then, characteristic quantity set generating unit 1 calculates correction coefficient λ from described discrimination standard value λ ^-(1/2).This correction coefficient λ ^-(1/2)obtain the standardized coefficient between each characteristic quantity set.Due to different according to classification, and the distance between other classification has deviation, therefore for improving nicety of grading, needs to carry out the standardization between characteristic quantity set.In addition, be not λ as correction coefficient yet ^-(1/2), but log (λ), or merely use (μ _o-μ _i), as long as comprise the function of λ and the standardization can carried out between characteristic quantity set.

In addition, in above-mentioned formula (3), the center of gravity vector μ in the characteristic quantity set of object type outgroup is calculated _otime, as the learning data in object type outgroup, be that any one selecting three kinds below calculates.

The global learning data of the object type outgroup a. in global learning data

B. the specific learning data corresponding with the object of classification in above-mentioned object type outgroup

The learning data of the object type outgroup in the learning data used in the selection of c. characteristic quantity

Here, the object of the classification of b. is attachedly clearly from the classification paid close attention to give different difference, is use to think attachedly to give the learning data comprised in other different classifications as learning data.

Then, characteristic quantity set generating unit 1 corresponds to each identifying information of all categories, by characteristic quantity set; Be λ in the correction coefficient corresponding with characteristic quantity set, present embodiment ^-(1/2)value; Inverse matrix R ^-1; Mean value avg. (i); Standard deviation std. (i), calculates data as distance and is stored into characteristic quantity set storage part 4 (step S4).

B. clustering processing (process flow diagram of corresponding diagram 6)

When object of classification data are transfused to, feature amount extraction module 2 utilizes identification signal of all categories, reads and each classification characteristic of correspondence duration set from characteristic quantity set storage part 4.

Then, feature amount extraction module 2 corresponds to the kind of the characteristic quantity in the characteristic quantity set read, to each classification extraction characteristic quantity from object of classification data, and correspond respectively to the identifying information of classification, the characteristic quantity of extraction is stored into storage inside portion (step S11).

Then, distance calculating part 3 reads mean value avg. (i) corresponding with each characteristic quantity from object of classification extracting data and standard deviation std. (i) from characteristic quantity set storage part 4, by carrying out the computing of described formula (2) by each characteristic quantity normalization, and the characteristic quantity being stored in storage inside portion is replaced to the characteristic quantity after standardization.

Then, distance calculating part 3 generates the matrix V be made up of the key element of the above-mentioned V (i) obtained like that, calculates the transposed matrix V of this matrix V ^t, utilize formula (3), in turn calculate object of classification data and of all categories between mahalanobis distance, and correspond to identifying information of all categories, be stored into storage inside portion (step S12).

Then, distance calculating part 3, for the described mahalanobis distance of result of calculation, is multiplied by the correction coefficient λ corresponding with characteristic quantity set ^-(1/2), obtain correction distance, and replace (step S13) with mahalanobis distance respectively.In addition, when being multiplied by correction coefficient, also can be multiplied after the log calculating mahalanobis distance or square root.

Then, distance calculating part 3 compare in storage inside portion and of all categories between correction distance (step S14), detect minimum correction distance, using the classification of the identifying information corresponding with this correction distance as the classification belonging to object of classification data, corresponding to the identifying information of the classification of class object, to category database 5, store sorted object of classification data (step S15).

< the 2nd embodiment >

The 1st above-mentioned embodiment describes the characteristic quantity set of use when carrying out cluster the situation of each classification as a kind, and also can as the 2nd embodiment described below, to the multiple characteristic quantity set of each category setting, the mahalanobis distance that computing is corresponding with respective characteristic quantity set, calculate correction distance, this correction distance is rearranged with order from small to large, utilize the correction distance within preceding predetermined precedence, according to the rule preset, set the classification belonging to object of classification data.

Namely, distance calculating part 3 in present embodiment is in the object of classification data obtained each characteristic quantity set and distance of all categories, the presentation class object data that utilization sets according to the precedence of this distance is to the mode of rule of classification benchmark of all categories, and detecting object of classification data is belong to which classification.

Below, the formation of the 2nd embodiment is identical with the 1st embodiment shown in Fig. 1, to the same label of each formation note, only uses Fig. 7 to be described to actions different from the 1st embodiment in each formation.In the 2nd embodiment, there is the process being set above-mentioned mode of rule by learning data.Fig. 7 is the process flow diagram of the action case of the pattern learning of the precedence of adjusting the distance representing setting mode of rule.Fig. 8 and Fig. 9 is the process flow diagram of the action case of the cluster represented in the 2nd embodiment.

In addition, in 1st embodiment, during generating feature duration set, characteristic quantity set generating unit 1 is to each classification, multiple characteristic quantity set for the combination as characteristic quantity calculate discrimination standard value λ, the maximal value characteristic of correspondence duration set of setting and multiple discrimination standard value λ obtained, as characteristic quantity set of all categories.

On the other hand, in 2nd embodiment, characteristic quantity set generating unit 1 is to each classification, for one or more combination or all other classification of other classification, the maximal value of the number of combinations characteristic of correspondence duration set of setting and characteristic quantity respectively, by obtaining multiple discrimination standard value λ like this, each category setting is used for the multiple characteristic quantity set be separated with other classification.

Then, characteristic quantity set generating unit 1 is obtained distance to each characteristic quantity set and is calculated data, corresponding to the identifying information of classification, the distance calculating data of multiple characteristic quantity set and each characteristic quantity set is stored into characteristic quantity set storage part 4.

Then, in the figure 7, when learning data is transfused to, feature amount extraction module 2 utilizes identification signal of all categories, from characteristic quantity set storage part 4, read the multiple characteristic quantity set corresponding with each classification.

Then, feature amount extraction module 2 corresponds to the kind of the characteristic quantity in each characteristic quantity set read, to each classification extraction characteristic quantity from learning data, and correspond respectively to the identifying information of classification, the characteristic quantity of extraction is stored into storage inside portion (step S21) to each characteristic quantity set.

Then, distance calculating part 3 reads mean value avg. (i) corresponding with each characteristic quantity extracted from learning data and standard deviation std. (i) from characteristic quantity set storage part 4 to each characteristic quantity set, by carrying out the computing of described formula (2) by each characteristic quantity normalization, and the characteristic quantity being stored in storage inside portion is replaced to the characteristic quantity after standardization.

Then, distance calculating part 3 generates the matrix V be made up of the key element of the above-mentioned V (i) obtained like that, calculates the transposed matrix V of this matrix V ^t, utilize formula (3), in turn calculate learning data and of all categories between mahalanobis distance, and correspond to identifying information of all categories, storage inside portion (step S22) be stored into each characteristic quantity set.

Then, distance calculating part 3, for the described mahalanobis distance of result of calculation, is multiplied by the correction coefficient λ corresponding with characteristic quantity set ^-(1/2), obtain correction distance, and replace (step S23) with mahalanobis distance respectively.

Then, distance calculating part 3 by storage inside portion and of all categories between correction distance rearrange (being rearranged into the precedence come by less correction distance more) with order from small to large, namely carry out arranging (step S24) with the preceding order of the identifying information of the classification little with the correction distance of object of classification data.

Then, distance calculating part 3 detect with from a little side (above) to the identifying information of classification corresponding to each correction distance of n-th, and the number of the identifying information of each classification comprised in this n is counted, namely carry out ballot process of all categories.

Then, distance calculating part 3 detects the pattern of the counting of the identifying information of all categories of each learning data, the mode of rule common with the learning data that comprises in same classification.

Such as, if when n is 10, when the learning data of classification B, the pattern that classification A is 5 if detect, classification B is 3, classification C is the counting of 2, then using this as regular R1.

In addition, when the learning data of classification C, if detect 3 classification C, even if then classification A is 7, classification B is 0, also must be classification C, if such situation is common, if then classification C be counted as more than 3, so irrelevantly with the counting of other classification get classification C, using this as regular R2.

In addition, when the learning data of classification A, when classification A occupies the pattern of the arrangement of the 1st and the 2nd above, though classification B be counted as 8, also get classification A, using this as regular R3 with the counting of other classification is irrelevant.

As mentioned above, detect the systematicness being categorized into the counting of all categories that other each learning data of same class has, as mode list, inside is stored in advance in the identifying information of each classification.Here, can, to setting of all categories 1 rule, also can set multiple.In addition, in the above description, be hypothesis distance calculating part 3 extracting rule pattern, but user also can in order to change nicety of grading of all categories, the mode of rule of at random setting counting or arrangement.

Different according to classification, sometimes also can be similar in the characteristic of characteristic information to other classification, also can exist by the object pattern of the pattern as the correlativity of multiple classification, counting namely of all categories or the arrangement of precedence above to carry out the classification of object of classification data and the situation making precision higher, present embodiment is supplemented this point.

Then, the clustering processing of the process flow diagram of Fig. 8 to the 2nd embodiment using the rule recorded in above-mentioned list is used to be described.

When object of classification data are transfused to, feature amount extraction module 2 utilizes identification signal of all categories, from characteristic quantity set storage part 4, read the multiple characteristic quantity set corresponding with each classification.

Then, feature amount extraction module 2 corresponds to the kind of the characteristic quantity in each characteristic quantity set read, to each classification extraction characteristic quantity from object of classification data, and correspond respectively to the identifying information of classification, the characteristic quantity of extraction is stored into storage inside portion (step S31) to each characteristic quantity set.

Then, distance calculating part 3 generates the matrix V be made up of the key element of the above-mentioned V (i) obtained like that, calculates the transposed matrix V of this matrix V ^t, utilize formula (3), in turn calculate object of classification data and of all categories between mahalanobis distance, corresponding to identifying information of all categories, storage inside portion (step S32) is stored into each characteristic quantity set.

Then, distance calculating part 3, for the described mahalanobis distance of result of calculation, is multiplied by the correction coefficient λ corresponding with characteristic quantity set ^-(1/2), obtain correction distance, and replace (step S33) with mahalanobis distance respectively.

Then, distance calculating part 3 by storage inside portion and of all categories between correction distance rearrange with order from small to large, namely carry out arranging (step S34) with the preceding order of the identifying information of the classification little with the correction distance of object of classification data.

After rearranging, distance calculating part 3 detect with from a little side (above) to the identifying information of classification corresponding to each correction distance of n-th, and the number of the identifying information of each classification comprised in this n is counted, namely carry out ballot process of all categories.

Then, distance calculating part 3 carries out the pattern (or pattern of arrangement) to the counting of all categories in n before each object of classification data, the control treatment (step S35) whether be present in the list being stored in inside.

Then, if distance calculating part 3 detects that above-mentioned results of comparison is that the mode of rule conformed to the object pattern of object of classification data is when having record in lists, then judge that these object of classification data belong to the classification of the identifying information corresponding with the rule that this conforms to, object of classification Data classification is become this classification (step S36).

In addition, the process flow diagram of Fig. 9 other clustering processing to the 2nd embodiment using the rule recorded in above-mentioned list is used to be described.

In other clustering processing shown in this Fig. 9, process till step S31 ~ step S35 is identical with the process shown in Fig. 8, distance calculating part 3 as described, according to the mode of rule stored in list, carries out the control treatment with the object pattern of object of classification data in step 35.

Then, distance calculating part 3 detects in above-mentioned results of comparison, whether retrieves the mode of rule conformed to above-mentioned object pattern, detect retrieve the mode of rule conformed to time, step S47 is transferred in process, on the other hand, detect do not retrieve the mode of rule conformed to time, process is transferred to step S48 (step S46).

Detect retrieve the mode of rule conformed to time, distance calculating part 3 judges that these object of classification data belong to the classification of the identifying information corresponding with the rule that this conforms to, object of classification Data classification is become this classification, corresponding to the identifying information of the classification of class object, to category database 5, store classified object of classification data (step S47).

On the other hand, detect do not retrieve the mode of rule conformed to time, distance calculating part 3 detects counting, identifying information that namely votes is maximum, and object of classification Data classification is become the classification corresponding with this identifying information.

Then, distance calculating part 3 corresponds to the identifying information of the classification of ownership target, to category database 5, stores classified object of classification data (step S48).

< the 3rd embodiment >

The 2nd above-mentioned embodiment describes: prepare to have the object of classification data after calculating and distance of all categories in the list of the mode of rule from a little side (similarity is large) above in n, utilize whether corresponding with the mode of rule in this list, carry out the clustering processing of each object of classification data, but also can the 3rd embodiment as described below such, to the multiple characteristic quantity set of each category setting, the mahalanobis distance corresponding to the characteristic quantity set with respective carries out computing, calculate correction distance, using classifications many for the correction distance within preceding predetermined precedence as the classification belonging to object of classification data.

Below, the formation of the 3rd embodiment is identical with the shown in Fig. 1 the 1st and the 2nd embodiment, to the same label of each formation note, only uses Figure 10 to be described to actions different from the 2nd embodiment in each formation.In 3rd embodiment, do not set the process of above-mentioned rule by learning data, and directly carry out the step S48 in Fig. 9.Figure 10 is the process flow diagram of the action case of the cluster represented in the 3rd embodiment.

In other clustering processing shown in this Figure 10, process till step S31 ~ step S34 is identical with the process shown in Fig. 8, distance calculating part 3 is as described, in step 34, by in storage inside portion and of all categories between correction distance rearrange with order from small to large, namely carry out arranging (step S34) with the preceding order of the identifying information of the classification little with the correction distance of object of classification data.

Then, distance calculating part 3 detect with from a little side (above) to the identifying information of classification corresponding to each correction distance of n-th, and the number of the identifying information of each classification comprised in this n is counted, namely carry out ballot process (step S55) to of all categories.

Then, distance calculating part 3 detects in voting results, the identifying information of maximum count value (votes), using the classification corresponding with this identifying information as the classification belonging to object of classification data, correspond to the identifying information of the classification of ownership target, to category database 5, store classified object of classification data (step S56).

In addition, user also can preset the threshold value for the votes stopped to each identifying information in distance calculating part 3, when the votes of the identifying information that votes is maximum does not reach this threshold value, does not belong to the process of any classification.

Such as, to classification A, B, when object of classification data are classified by 3 classifications of C, be 5 to the votes of the identifying information of classification A, being 3 to the votes of the identifying information of classification B, is 2 to the votes of the identifying information of classification C, in this case, distance calculating part 3 detects that the identifying information that votes is maximum is classification A.

But using when setting as 6 the above-mentioned threshold value of classification A, because the votes of the identifying information to classification A does not reach threshold value, therefore distance calculating part 3 does not belong to the judgement of any classification.

Thus, in the cluster of classification characteristic quantity and other classification being only had to very little difference, the reliability of the classification process to classification of object of classification data can be improved.

The transform method > of < characteristic quantity

Although expect that the colony of each characteristic quantity is that normal distribution is to carry out cluster, but consider that the kind (area, length etc.) according to characteristic quantity is different, sometimes not normal distribution, but colony has partial velocities, now object of classification data and of all categories between distance calculating, namely judge object of classification data and similarity of all categories time precision reduce.

Therefore, different according to characteristic quantity, need to utilize preordering method to convert the characteristic quantity of colony, make it close to normal distribution to improve the precision of similarity determination.

As the transform method to this normal distribution transform, utilize and comprise log or square root cubic root transform characteristics amount is carried out Deng n th Root or factorial or any one arithmetic expression of function of being obtained by numerical evaluation.

Below, use Figure 11 that the setting process of the transform method of each characteristic quantity is described.Figure 11 is the process flow diagram of the action case of the setting process of the transform method representing each characteristic quantity.In addition, this transform method sets each characteristic quantity unit comprised in each classification classification.In addition, the setting of this transform method uses and belongs to learning data of all categories and carry out.Although following process carries out illustrating as by characteristic quantity set generating unit 1, also the handling part corresponding with this process can be arranged on elsewhere.

Characteristic quantity set generating unit 1 using the identifying information of the classification of object of classification as keyword, from category database 5, read the learning data comprised in this classification, and calculate the characteristic quantity (normalized) (step 61) of each learning data.

Then, characteristic quantity set generating unit 1 uses and is stored in inner any one arithmetic expression of carrying out characteristic quantity conversion, by carrying out computing to the above-mentioned each learning data read, carries out the conversion (step S62) of characteristic quantity.

At the end of the characteristic quantity conversion of whole learning datas, characteristic quantity set generating unit 1 calculates and represents the distribution that obtained by conversion process whether close to the evaluation of estimate (step S63) of normal distribution.

Then, whether characteristic quantity set generating unit 1 carries out to the detection being stored in whole arithmetic expressions that are inner, that namely preset as transform method and calculating evaluation of estimate, when detect calculate by whole arithmetic expression, characteristic quantity is converted after obtain the evaluation of estimate of distribution time, process is made to enter into S65, on the other hand, when detect the characteristic quantity that utilizes whole arithmetic expression calculate not yet at the end of, for carrying out the process of the arithmetic expression of next setting, process is made to turn back to step S62 (step S64).

At the end of utilizing the conversion of the characteristic quantity of whole arithmetic expression, characteristic quantity set generating unit 1 detects distribution that in the distribution obtained by the arithmetic expression set, evaluation of estimate is minimum, namely closest to the distribution of normal distribution, the arithmetic expression used for generating the distribution that detects is determined as transform method, and as the transform method of such other characteristic quantity at inner setting (step S65).

Characteristic quantity set generating unit 1 carries out above-mentioned process to each characteristic quantity of all categories, corresponding to each characteristic quantity setting transform method in respective classification.

Then, the calculating of the evaluation of estimate in above-mentioned steps S63 is described with Figure 12.Figure 12 illustrates the process flow diagram obtaining the action case of the process of the evaluation of estimate of the distribution utilizing arithmetic expression to obtain.

Characteristic quantity set generating unit 1 utilizes the arithmetic expression of setting to convert the characteristic quantity (step S71) of each learning data belonging to object type.

After being converted by the characteristic quantity of whole learning datas, characteristic quantity set generating unit 1 calculates average value mu and the standard deviation (step S72) of the distribution (colony) obtained by the characteristic quantity after this conversion.

Then, characteristic quantity set generating unit 1 uses average value mu and the standard deviation of above-mentioned colony, utilizes (x-μ)/σ to calculate z value (1) (step S73).

Then, characteristic quantity set generating unit 1 calculates the cumulative probability (step S74) in above-mentioned colony.

After calculating, characteristic quantity set generating unit 1 utilizes the cumulative probability in the colony obtained, and calculates the contrafunctional value (step S75) of z value (2) as the cumulative distribution function of standardized normal distribution.

Then, characteristic quantity set generating unit 1 obtains the difference of 2 z values, i.e. z value (1) and the z value (2) of the distribution of characteristic quantity, the error (step S76) of 2 z values namely in distribution.

When obtaining the error of z value, characteristic quantity set generating unit 1 calculates the summation (quadratic sum) of error sum, i.e. this error of above-mentioned 2 z values as evaluation of estimate (step S77).

The error of 2 above-mentioned z values is less, and distribution is more close to normal distribution, if do not have the error of z value, be then normal distribution, on the other hand, distribute away from normal distribution more, then error becomes larger.

Then, use Figure 13 to before the clustering processing of carrying out in the embodiment of the 1st ~ 3rd, the calculating of characteristic quantity of object of classification data be described.Figure 13 is the process flow diagram of the action case calculated of the characteristic quantity data of presentation class object data.

Distance calculating part 3 corresponds to the characteristic quantity set to setting of all categories, from the characteristic quantity of the object of classification extracting data identification object of input, and the normalized illustrated (step S81).

Then, distance calculating part 3 utilizes the transform method (arithmetic expression) for the setting of such other characteristic quantity, the characteristic quantity used is converted (step S82) in the classification of the classification to object of classification in object of classification data.

Then, distance calculating part 3, as described in the 1st ~ 3rd embodiment, calculates and the distance of the classification of object of classification (step S83).

Then, distance calculating part 3 utilizes the transform method corresponding to characteristic quantity setting of all categories, characteristic quantity is converted by whole classifications of object of classification, and the characteristic quantity after whether having utilized this conversion calculates the distance with classification, when detect distance is obtained to whole classifications of object of classification time, make process enter into step S85, on the other hand, when detecting that residue has the classification of object of classification, process is made to turn back to step S82 (step S84).

Then, the process (step S85) in the 1st ~ 3rd each embodiment from distance calculates finish time is started.

Utilize above-mentioned process, in the mahalanobis distance used by present embodiment, during due to distance between obtaining object of classification data and be of all categories, desired character amount is normal distribution, therefore the distribution of each characteristic quantity of colony is more close to normal distribution, then and of all categories between more can obtain distance (similarity) accurately, and can expect to improve the precision of classification of all categories.

Embodiment

< calculated example >

Then, use the clustering system of the above-mentioned the 1st, the 2nd and the 3rd embodiment, confirm the precision of classification that is that utilize the sample data shown in Figure 14 and conventional example.Although sample size is less, although the characteristic quantity of known use is few, conventional example or its above accuracy can be obtained.In this Figure 14, as classification, to kind 1, kind 2 and kind 3 each definition 10 learning datas respectively, each learning data has 8 characteristic quantity a, b, c, d, e, f, g, h.In this example, as shown in Figure 14 belong to learning data of all categories, determine the characteristic quantity set used in cluster, then, the same learning set that uses carries out cluster as object of classification data.

As result of calculation, Figure 15 represents as existing computing method, uses characteristic quantity a and g as the combination of characteristic quantity and result of determination to each learning data computing mahalanobis distance shown in Figure 14 of classification 1 ~ classification 3.In Figure 15 (a), the row of classification 1 are the mahalanobis distances with classification 1, and the row of classification 2 are the mahalanobis distances with classification 2, and the mahalanobis distance with classification 3 is shown in the list of classification 3.In addition, the list of kind shows that the classification belonging to each learning data reality, result of determination represent learning data and the minimum classification of mahalanobis distance.What kind was consistent with the numeral of result of determination is represent correct sorted characteristic quantity data.

In Figure 15 (b), the number table of row shows the classification belonging to learning data reality, and the numbering of row represents the classification be determined.Such as, in 10 classifications of " 8 " expression classification 1 of mark R1,8 are determined as classification 1, and in 10 classifications of " 2 " expression classification 1 of mark R2,2 are determined as classification 3.P0 represents the correct concordance rate separated and answer, and p1 represents the probability of both chance coincidences, and κ is Integral correction judgement rate, utilizes formula below to obtain.This κ is higher, and the precision of presentation class is higher.

κ＝(p0-p1)/(1-p1)

p0＝(a+d)/(a+b+c+d)

p1＝[(a+b)·(a+c)·(b+d)·(c+d)]·(a+b+c+d) ²

A in above formula, relation Figure 16 of b, c, d illustrates.

The number that the data belonging to classification 1 are classified as classification 1 is a, and the number that the data belonging to classification 1 are classified as classification 2 is that b, a+b represent the data amount check belonging to classification 1.In addition, similarly, the number that the data belonging to classification 2 are classified as classification 2 is d, and the number that the data belonging to classification 2 are classified as classification 1 is that c, c+d represent the data amount check belonging to classification 2.A+c is the number being classified into classification 1 in total data a+b+c+d, and b+d is the number being classified into classification 2 in total data a+b+c+d.

Then, Figure 17 represents the computing method of use the 1st embodiment, the result of determination to each learning data computing mahalanobis distance shown in Figure 14 of classification 1 ~ classification 3.For the view of this Figure 17 (a) and (b), due to identical with Figure 15, therefore the description thereof will be omitted.The Probability p 1 of known correct solution rate p0, chance coincidence, the existing computing method of Integral correction judgement rate κ and Figure 15 are identical.Here, use from the combination of above-mentioned entirety, select to have the method for the combination of maximum discrimination standard value λ to each classification, and to calculate and characteristic of correspondence duration set of all categories.Use the combination of characteristic quantity a and h as with classification 1 characteristic of correspondence duration set, use the combination of characteristic quantity a and d as with classification 2 characteristic of correspondence duration set, use the combination of characteristic quantity a and g as with classification 3 characteristic of correspondence duration set.

Then, Figure 18 represents the computing method of use the 2nd embodiment, the result of determination to each learning data computing mahalanobis distance shown in Figure 14 of classification 1 ~ classification 3.For the view of Figure 18 (a) and (b), due to identical with Figure 15, therefore the description thereof will be omitted.Correct solution rate p0 is 0.8333, and the Probability p 1 of chance coincidence is 0.3333, and Integral correction judges that rate κ is as 0.75, known compared with the existing computing method of Figure 15, and nicety of grading improves.Here, use from the combination of above-mentioned entirety, each classification is selected to the method for the combination of the discrimination standard value λ had till first 3, and to calculate and characteristic of correspondence duration set of all categories.Use 3 combinations of characteristic quantity ah, ag, de as with classification 1 characteristic of correspondence duration set, use 3 combinations of characteristic quantity af, ad, ab as with classification 2 characteristic of correspondence duration set, use 3 combinations of characteristic quantity eg, ac, ag as with classification 3 characteristic of correspondence duration set.

In addition, as the judgement of ballot, be arranged in order from mahalanobis distance is little, calculate the number of the classification entering front 3 from little, using classifications maximum for number as the classification belonging to these object of classification data.

Then, Figure 19 represent use the 2nd embodiment computing method, to the mahalanobis distance of result of calculation, correction coefficient (λ) is multiplied by further to each learning data computing mahalanobis distance shown in Figure 14 of classification 1 ~ classification 3 ^-(1/2)the result of determination of the ranking of laggard row distance.For the view of Figure 19 (a) and (b), due to identical with Figure 15, therefore the description thereof will be omitted.Correct solution rate p0 is 0.8333, and the Probability p 1 of chance coincidence is 0.3333, and Integral correction judges that rate κ is as 0.75, known compared with the existing computing method of Figure 15, and nicety of grading improves.Here, use from the combination of above-mentioned entirety, each classification is selected to the method for the combination of the discrimination standard value λ had till first 3, and to calculate and characteristic of correspondence duration set of all categories.Use 3 combinations of characteristic quantity ah, ag, de as with classification 1 characteristic of correspondence duration set, use 3 combinations of characteristic quantity af, ad, ab as with classification 2 characteristic of correspondence duration set, use 3 combinations of characteristic quantity eg, ac, ag as with classification 3 characteristic of correspondence duration set.

From each classification results shown in above-mentioned Figure 15,17,18,19, present embodiment, compared with conventional example, carries out high speed and high-precision clustering processing, can confirm the superiority of present embodiment relative to conventional example.

< application examples > of the present invention

A. testing fixture

As shown in figure 20, the testing fixture (defect detecting device) of classifying to the kind of the scar of checked property, such as glass baseplate surface is described.Figure 21 is the process flow diagram of the action case of the selection of characterization duration set, and Figure 22 is the process flow diagram of the action case illustrated in clustering processing.

First, the action of the selection of characterization duration set.The collection of the learning data in the step S1 of the process flow diagram of Fig. 5 corresponds to the step S101 of the process flow diagram of Figure 21 to step S105.

The step S2 to step S4 of Figure 21, due to identical with the process flow diagram of Fig. 5, therefore omits the description.

By the operation of operator, gather the sample (step S101) of the learning data corresponding respectively with the classification wanting the kind of scar to carry out classifying.

Image acquiring section 101 utilizes lighting device 102 to irradiate the shape of carrying out the scar gathered as learning data, utilizes camera head 103 to obtain the view data (step S102) of scar part.

Then, from the view data that image acquiring section 101 obtains, calculate the characteristic quantity (step S103) of the scar of each learning data.

The characteristic quantity of the learning data obtained is distributed respectively to by the visual class object obtained, carries out the determination (step S104) of learning data of all categories.

Then, repeat the process from step S101 to step 102, until learning data of all categories reach predetermined number (pre-set number of samples), such as each about 30, when reaching predetermined number, cluster portion 105 carries out the later process of step S2 that Fig. 5 illustrated.Here, cluster portion 105 is the clustering systems in the 1st or the 2nd embodiment.

Then, with reference to the clustering processing in the testing fixture of Figure 22 key diagram 4.Here, step S31 to step S34, S55 and S56 of Figure 22, due to identical with the process flow diagram of Figure 10, therefore omit the description.

In the testing fixture of Figure 20, when checking beginning, lighting device 102 throws light on to the glass substrate as checked property 100, and camera head 103 is taken glass baseplate surface and this shooting image is outputted to image acquiring section 101.Thus, when defect candidate test section 104 detects parts different from flat shape in the shooting image inputted from image acquiring section 101, it can be used as the defect candidate (step S201) that should classify.

Then, defect candidate test section 104 takes out the view data of the part of this defect candidate from shooting image, as object of classification data.

Then, defect candidate test section 104 calculates characteristic quantity by the view data of object of classification data, cluster portion 105 is exported to the object of classification data (step S202) be made up of the set of the characteristic quantity extracted.

For clustering processing afterwards, illustrate due to existing in the step of Figure 10, therefore omit.As mentioned above, testing fixture of the present invention can by glass substrate with scar classify accurately by each kind of scar.

B. defect kind judging device

The cluster portion 105 of the defect kind judging device shown in Figure 23 corresponds to the clustering system of the present invention illustrated.

Image capturing device 201 is made up of the image acquiring section 101 in Figure 20, lighting device 102 and camera head 103.

Obtain the learning data of all categories object of classification data being carried out the target of classifying, and prepared in the category database 5 of clustering apparatus 105.Thus, the selection of the characteristic quantity set in Fig. 5 also terminates.

From the shooting image inputted by the image capturing device 202 being arranged on each process units, detect defect candidate, take out this view data, and extract characteristic quantity and be input to data collector 203.Control device 200 will be input to the object of classification data transfer of data collector 203 to cluster portion 105.Then, as already explained, of all categoriesly to classify pair corresponding with the kind of scar for the object of classification data of input in cluster portion 105.

C. production management device

Production management device of the present invention as shown in figure 24, by control device 300; Process units 301,302; Notification unit 303; Storage part 304; Not good device detection unit 305 and defect kind judging device 306 are formed.Here, defect kind judging device 306 is identical with the defect kind judging device illustrated in above-mentioned B item.

Defect kind judging device 306 is by from the shooting image of image capturing device 201,202 being separately positioned on process units 301 and process units 302, in the defect candidate test section 104 of correspondence, carry out image procossing and extract characteristic quantity, carrying out the classification of object of classification data.

Then, not good device detection unit 305 has the list representing the identifying information of classified classification and the relation of the occurrence cause corresponding with this classification, from described list, read the occurrence cause corresponding with the identifying information of the classification of the class object inputted from described defect kind judging device 306, be determined to be the process units of occurrence cause.That is, not good device detection unit 305 corresponds to the identifying information of classification, detects the occurrence cause of the defect in the production run of product.

Then, not good device detection unit 305 is from notification unit 303 to operator notification, and correspond to the date judged, the identifying information of the identiflication number of the classification after defect is classified, occurrence cause and this process units is stored into storage part 304 as historical record.In addition, the process units that control device 300 makes not good device detection unit 305 judge stops, or controls controling parameters.

D. production management device

Another production management device of the present invention as shown in figure 25, by control device 300; Process units 301,302; Notification unit 303; Storage part 304 and cluster portion 105 are formed.Here, cluster portion 105 is identical with the formation illustrated in above-mentioned A, B item.

In cluster portion 105, different from the situation of above-mentioned A ~ C, the characteristic of object of classification data is the characteristic quantities according to being made up of the working condition (composition, treatment temperature, pressure, processing speed etc. of material) in the production run of industrial products, such as glass substrate, classifies by the production status of each operation of production run.The process information that described characteristic quantity detects as the sensor being arranged on each process units 301 or 302 is input to cluster portion 105 as characteristic quantity.

Namely, the production status of the Improving Glass Manufacturing Processes in each operation of each process units, according to the characteristic quantity of above-mentioned object of classification data, is categorized into the classification such as " normal condition ", " easily producing the state that defect needs adjustment ", " the dangerous state needing adjustment " by cluster portion 105.Then, cluster portion 105 utilizes notification unit 303 to the above-mentioned classification results of operator notification, and the identifying information of the classification of classification results is outputted to control device 300, in addition, correspond to the date judged, using the identiflication number of the classification after the production status of above-mentioned each operation is classified, be stored in storage part 304 as the working condition of characteristic quantity and the identifying information of this process units that become most problem as historical record.

Control device 300 has other identifying information of representation class and working condition is recovered the list of corresponding relation of normal adjusted iterm and data thereof, read corresponding with the identifying information of the classification inputted from cluster portion 105, working condition is recovered normal adjusted iterm and data thereof, utilize the process units that the Data Control that reads is corresponding.

In addition, also the program of the function being used for the clustering system realized in Fig. 1 can be stored into computer read/write memory medium, read in by making computer system the program being stored in this storage medium and perform, carry out the clustering processing of object of classification data.In addition, here so-called " computer system ", refer to and comprise the hardware such as OS and peripheral equipment.In addition, " computer system " also comprises and has the WWW system that homepage provides environment (or display environment).In addition, so-called " computer read/write memory medium ", refers to the removable medium such as floppy disk, photomagneto disk, ROM, CD-ROM, is built in the memory storages such as the hard disk of computer system.There is what is called " computer read/write memory medium " again, also comprise as by the volatile memory (RAM) becoming the inside computer system of server or client computer during the communication line such as network or the telephone line transmission program such as internet, keep program at certain hour medium.

In addition, said procedure also can from the computer system storing this program memory storage etc. by transmission medium or utilize transmission carrier-wave transmission in transmission medium to other computer system.Here, " transmission medium " of transmission procedure refers to the medium with the function of transmission information as the communication lines (order wire) such as the networks such as internet (communication network) or telephone line.In addition, said procedure also can be the program of the part for realizing above-mentioned functions.Also by the combination with the program stored in computer systems, which to realize above-mentioned functions, i.e. so-called differential file (difference program).

Industrial practicality

The present invention can be applicable to the field information with various features amount being carried out with high precision the defects detection etc. as glass article etc. classifying and differentiating, also can be used on further in production status pick-up unit or production management devices.

In addition, quote the full content of the instructions of Japanese patent application 2006-186628 of application on July 6th, 2006, the scope of claim, accompanying drawing and specification digest here, the disclosure as instructions of the present invention adopts.

Claims

1. a clustering system, utilize the characteristic quantity that has of input data this input Data classification is become by the colony of learning data formed of all categories, it is characterized in that, comprise:

Characteristic quantity set storage part, this characteristic quantity set storage part store with described of all categories corresponding, classify in the characteristic quantity set of combining as characteristic quantity that uses;

Feature amount extraction module, the characteristic quantity that this feature amount extraction module presets from input extracting data;

Distance calculating part, this distance calculating part characteristic of correspondence duration set each and of all categories is calculated respectively according to the characteristic quantity comprised in this characteristic quantity set and distance between the vector that forms from the characteristic quantity that described input data are extracted of the vector that forms of the mean value exporting each characteristic quantity of learning data of all categories as aggregate distance; And

Precedence extraction unit, described each aggregate distance arranges with order from small to large by this precedence extraction unit,

Wherein to the multiple described characteristic quantity set of each category setting,

And also there is category classification portion, in the described aggregate distance that this classification division obtains in each characteristic quantity set, utilize the mode of rule of the expression set by the precedence of this aggregate distance to the classification benchmark of all categories of input data to detect described input data and belong to which classification.

2. clustering system as claimed in claim 1, is characterized in that,

Described category classification portion utilizes the precedence of described aggregate distance to belong to which classification to detect described input data, detects that the more classification of the preceding aggregate distance of this precedence is as the classification belonging to described input data.

3. clustering system as claimed in claim 2, is characterized in that,

Described category classification portion has the threshold value for the preceding number of precedence, when preceding classification is more than this threshold value, detects as the classification belonging to input data.

4. the clustering system as described in any one of claims 1 to 3, is characterized in that,

Described distance calculating part is multiplied by corresponding to the correction coefficient set by characteristic quantity set described aggregate distance, and by the aggregate distance standardization between each characteristic quantity set.

5. the clustering system as described in any one of claims 1 to 3, is characterized in that,

Also there is the characteristic quantity set generating unit of the characteristic quantity set generating each classification,

Described characteristic quantity set generating unit is to each combination of multiple combinations of each characteristic quantity, with the mean value of the learning data of colony of all categories for initial point, the mean value of the distance between each learning data obtaining the colony of this initial point and other classification, select the combination of the maximum characteristic quantity of mean value, as identifying characteristic quantity set that is of all categories and other classification.

6. clustering system as claimed in claim 4, is characterized in that,

7. a defect kind judging device, is characterized in that,

Be provided with clustering system according to claim 1,

Described input data are view data of product defects, utilize the characteristic quantity representing defect, the kind of the defect in view data by defect are classified.

8. defect kind judging device as claimed in claim 7, is characterized in that,

Described product is glass article, the kind of the defect of this glass article by defect is classified.

9. a defect detecting device, is characterized in that,

The defect kind judging device be provided with described in claim 7 or 8 carrys out the kind of testing product defect.

10. a production status decision maker, is characterized in that,

Be provided with the defect of the defect kind judging device described in claim 7 or 8 to product to classify, detect the occurrence cause of the defect in production run according to the corresponding relation between this kind and the occurrence cause corresponding to this kind.

11. 1 kinds of production status decision makers, is characterized in that,

Be provided with clustering system according to claim 1,

Described input data are the characteristic quantities of the working condition represented in process of producing product, are classified by this characteristic quantity by the production status in each operation of production run.

12. production status decision makers as claimed in claim 11, is characterized in that,

Described product is glass article, is classified by the characteristic quantity in the production run of this glass article by the production status in each operation of production run.

13. 1 kinds of production status pick-up units, is characterized in that,

Be provided with the kind of the production status decision maker described in claim 11 or 12 to the production status in each operation of process of producing product to detect.

14. 1 kinds of production management devices, is characterized in that,

Be provided with the kind of the production status decision maker described in claim 11 or 12 to the production status in each operation of process of producing product to detect, and carry out the process control in the operation of production run according to the item controlled corresponding to this kind.