CN113538423B - Industrial part defect detection interval clustering method based on combined optimization algorithm

Info

Publication number: CN113538423B
Application number: CN202111078182.0A
Authority: CN
Inventors: 邱增帅; 王罡; 侯大为; 潘正颐
Original assignee: Changzhou Weiyizhi Technology Co Ltd
Current assignee: Changzhou Weiyizhi Technology Co Ltd
Priority date: 2021-09-15
Filing date: 2021-09-15
Publication date: 2022-01-07
Anticipated expiration: 2041-09-15
Also published as: CN113538423A

Abstract

The invention discloses a method for clustering industrial part defect detection intervals based on a combinatorial optimization algorithm, comprising the following steps: step 1, data acquisition; step 2, data cleaning; step 3, balancing the data distribution; step 4, feature selection; step 5, selecting a positive sample data point, setting an interval combination, gradually contracting the intervals for optimization, and generating a rule; and step 6, removing the data covered by the rule from the data set and repeating step 5 on the remaining data until every positive sample has been selected by a rule, which yields a series of rule descriptions and ends the combinatorial optimization approximation algorithm. The method performs combinatorially optimized clustering of positive and negative samples for each optical surface and each defect type of an industrial part, and has a degree of robustness, so that defects can be accurately detected and divided for multiple items.

Description

Industrial part defect detection interval clustering method based on combined optimization algorithm

Technical Field

The invention relates to the technical field of image data processing, in particular to an industrial part defect detection interval clustering method based on a combined optimization algorithm.

Background

At present, most methods based on image data processing select the physical quantity intervals used for clustering by experience. Differences in the weights of the physical quantities, in optical surfaces and in defect types all affect the accuracy with which positive and negative samples are divided, so such methods have many limitations. Most obviously, for linear defects the length and width physical quantities carry more weight and the area physical quantity is not considered, whereas for block defects the area physical quantity carries more weight and the length and width physical quantities are not considered. As a result, some interval combinations are not the optimal result. In addition, the same defect can appear on different optical surfaces, which makes setting the interval combinations complicated. Accurate industrial data analysis nevertheless requires an accurate division of the workpiece's positive and negative samples.

Disclosure of Invention

The technical problem to be solved by the invention is as follows: to address the problems described in the background, a method for clustering industrial part defect detection intervals based on a combinatorial optimization algorithm is provided. The method performs combinatorially optimized clustering of positive and negative samples for each optical surface and each defect type of an industrial part, and has a degree of robustness, so that defects can be accurately detected and divided for multiple items.

The technical scheme adopted by the invention for solving the technical problems is as follows: a method for clustering industrial part defect detection intervals based on a combined optimization algorithm comprises the following specific steps:

step 1, data acquisition: the inspection equipment photographs the workpiece, the contour points in the original picture are read, and data acquisition is complete;

step 2, data cleaning: a data consistency check, missing-value handling and outlier handling are carried out;

step 3, balancing the data distribution: because the class distribution of the variable data is unbalanced, with extremely few positive samples and extremely many negative samples, and in view of the particular nature of the data, the data are balanced by oversampling: positive sample data are randomly copied until the number of positive samples equals the number of negative samples;

step 4, feature selection: a filter method is used to perform feature selection on the expanded data, with the variance as the feature scoring criterion; the features with the largest contribution are selected from the data before expansion to form a new data set, on which the combinatorial optimization is performed; the number of physical quantities retained after feature selection is a positive integer that is at least 1 and at most the total number of physical quantities in the data;

step 5, selecting a positive sample data point, setting an interval combination, gradually contracting the intervals for optimization, and generating a rule, specifically:

step 5.1, selecting a positive sample data point and setting an interval combination: first, a positive sample data point M is randomly selected from the data set after feature selection; then an interval combination is formed by taking the maximum and minimum values of each physical quantity in the data set as the interval boundaries;

step 5.2, gradually contracting the intervals for optimization and generating a rule: while point M remains within the interval combination, the interval combination is contracted and negative samples are filtered out, until the ratio of negative samples to positive samples in the interval combination is less than or equal to 1:3 and the number of positive samples is the largest; the interval combination is then set as a rule;

step 6, removing the data covered by the rule from the data set and repeating step 5 on the remaining data until every positive sample has been selected by a rule, that is, until no positive sample remains in the data; this yields a series of rule descriptions that optimally divide the positive and negative samples, and the combinatorial optimization approximation algorithm ends.

More specifically, in the above technical solution, in step 5.2 of step 5, if the ratio of negative samples to positive samples in the interval combination is less than or equal to 1:3 and the number of positive samples is the largest, the interval combination is a locally optimal rule and the data selected by the rule are removed from the data set; if the ratio of negative samples to positive samples in the interval combination is not less than or equal to 1:3 and the number of positive samples is not the largest, step 5.2 is repeated: with point M remaining within the interval combination, the interval combination is contracted and negative samples are filtered out.

More specifically, in the above technical solution, in step 5.2 of step 5, if the ratio of negative samples to positive samples in the interval combination is not less than or equal to 1:3 and the number of positive samples is the largest, step 5.2 is repeated; if the ratio of negative samples to positive samples in the interval combination is less than or equal to 1:3 and the number of positive samples is not the largest, step 5.2 is repeated.

More specifically, in the above technical solution, in step 6, once the complete algorithm flow has finished, a series of rule descriptions has been generated and put into use. If a new data set arrives that contains positive samples not covered by the existing rules, the new data set is fed into the algorithm and step 5 is repeated; if the new data set contains no positive samples, the series of rule descriptions is final and the combinatorial optimization approximation algorithm ends.

More specifically, in the above technical solution, in step 4, the variance of a characteristic physical quantity is calculated as follows:

$$\sigma^{2} = \frac{1}{N}\sum_{i=1}^{N}\left(x_{i}-\bar{x}\right)^{2} \qquad (1)$$

wherein $\sigma^{2}$ represents the variance of the characteristic physical quantity; $\bar{x}$ represents the average of the physical quantity $x$; $x_{i}$ represents the value of the physical quantity on each piece of data; and $N$ represents the total number of samples in the data set, containing both positive and negative samples.

The beneficial effects of the invention are as follows: the method for clustering industrial part defect detection intervals based on a combinatorial optimization algorithm reduces the number of rules by screening the defect physical quantities through feature selection, and divides the data by combinatorial optimization approximation so that each rule contains as many positive samples as possible while the number of negative samples stays within the set relative proportion. This yields a series of rule descriptions that approximate the combinatorially optimal division of positive and negative samples for a given defect on a given optical surface. The method can cluster and distinguish positive and negative samples for each optical surface and each defect type of an industrial part, and the interval rules have a degree of robustness, which overcomes the inconsistent descriptions of defect physical quantities caused by illumination conditions, workpiece material, workpiece shape and similar factors, so that defects are accurately detected and divided for multiple items.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.

FIG. 1 is an original image captured by an industrial camera;

FIG. 2 is a defect distribution graph;

FIG. 3 is a diagram of a defect area and minimum average luminance distribution;

FIG. 4 is a flow chart of the combinatorial optimization approximation algorithm;

FIG. 5 is a diagram of the rule division of defect area versus minimum average luminance;

FIG. 6 is a diagram of the approximation stage of the rule division of defect area versus minimum average luminance;

FIG. 7 is a diagram of the local optimum of the rule division of defect area versus minimum average luminance;

FIG. 8 is an algorithm flow diagram of the present invention.

Detailed Description

In order to make the technical problems, technical solutions and advantageous effects solved by the present invention more clearly apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Referring to fig. 4 and 8, the industrial part defect detection interval clustering method based on the combinatorial optimization algorithm of the invention specifically comprises the following steps:

Step 1, data acquisition: the inspection equipment photographs the workpiece and reads the contour points (pixel coordinates) in the original picture, which completes data acquisition. The equipment may be surface-defect appearance inspection equipment for 3C electronics. The workpiece is a 3C electronic workpiece, such as a mobile phone shell, a notebook shell or a mobile phone accessory.

Step 2, data cleaning: a data consistency check, missing-value handling and outlier handling are carried out. The consistency check examines whether the data contain records whose values, such as a maximum or minimum, differ from most values of the physical quantity. Missing-value handling means that when a record contains a missing value, that record is deleted. Outlier handling means that when a record's value for one or more physical quantities falls outside the value range of that physical quantity, the record is deleted.
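The missing-value and outlier rules of step 2 can be sketched as follows. This is a minimal illustration only: the record layout (a dict per defect), the quantity names and the `valid_ranges` table are assumptions made for the example and do not come from the patent, and the consistency check is left out because the text does not define it precisely.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[float]]  # hypothetical layout: quantity name -> value


def clean(records: List[Record],
          valid_ranges: Dict[str, Tuple[float, float]]) -> List[Record]:
    """Drop records with a missing value or a value outside the allowed range."""
    kept = []
    for rec in records:
        # Missing-value handling: delete the whole record.
        if any(rec.get(q) is None for q in valid_ranges):
            continue
        # Outlier handling: delete the record if any quantity is out of range.
        if any(not (lo <= rec[q] <= hi) for q, (lo, hi) in valid_ranges.items()):
            continue
        kept.append(rec)
    return kept


records = [
    {"area": 12.0, "length": 3.0},
    {"area": None, "length": 2.0},   # missing value, removed
    {"area": -5.0, "length": 1.0},   # below the allowed range, removed
]
ranges = {"area": (0.0, 100.0), "length": (0.0, 50.0)}  # assumed value ranges
cleaned = clean(records, ranges)
```

Deleting whole records, rather than imputing values, follows the wording of the step.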

Step 3, balancing the data distribution: the class distribution of the variable data is unbalanced, with extremely few positive samples and extremely many negative samples. Given the particular nature of the data, which are constant, real data with practical significance, the data are balanced by oversampling: positive sample data are randomly copied until the number of positive samples equals the number of negative samples. The invention is used mainly for defect detection, so by default defect data are positive samples and all non-defect data are negative samples. The data are real industrial data and every positive sample is defect data, so the distribution is balanced without discarding any positive sample. To balance by oversampling, the numbers of positive and negative samples are counted, positive samples are drawn at random and copied back into the positive samples, and the process stops once the number of positive samples equals the number of negative samples, which completes data balancing.
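A minimal sketch of the random oversampling described in step 3, assuming each sample is a pair of a feature tuple and a label, with label 1 for defect (positive) and 0 for non-defect (negative). The function name and the fixed random seed are choices made for this example.

```python
import random
from typing import List, Tuple

Sample = Tuple[Tuple[float, ...], int]  # (physical quantities, label)


def oversample(samples: List[Sample], seed: int = 0) -> List[Sample]:
    """Randomly copy positive samples until they match the negative count."""
    rng = random.Random(seed)
    positives = [s for s in samples if s[1] == 1]
    negatives = [s for s in samples if s[1] == 0]
    balanced = list(samples)
    if not positives:
        return balanced  # nothing to copy
    n_pos = len(positives)
    while n_pos < len(negatives):
        # Draw from the original positives only, so every copy is real data.
        balanced.append(rng.choice(positives))
        n_pos += 1
    return balanced


data = [((1.0, 2.0), 1), ((1.5, 2.5), 1)] + [((float(i), 9.0), 0) for i in range(5)]
balanced = oversample(data)
```

No negative sample is dropped and no synthetic point is created, which matches the requirement that the data stay real.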

Step 4, feature selection: a filter method is used to perform feature selection on the expanded data, with the variance as the feature scoring criterion (the larger a feature's variance, the greater its contribution to distinguishing samples). The features with the largest contribution are selected from the data before expansion and the combinatorial optimization is performed on them; the number of physical quantities retained after feature selection is a positive integer that is at least 1 and at most the total number of physical quantities in the data. The filter method proceeds as follows: using the balanced data, the data are grouped by physical quantity and the variance of each group is calculated (for example, with 100 samples, 50 positive and 50 negative, and 12 physical quantities, grouping by physical quantity gives 12 groups of 100 values each, and hence 12 variances). The variance serves as the score for feature weight: a physical quantity with a large variance has a high feature weight, that is, a large contribution, while a physical quantity with a small variance has a low feature weight. The physical quantities with the highest feature weights are selected as the physical quantities after "feature selection", and the subsequent steps of the combinatorial optimization approximation algorithm use these physical quantities.

The variance of a characteristic physical quantity is calculated as follows:

$$\sigma^{2} = \frac{1}{N}\sum_{i=1}^{N}\left(x_{i}-\bar{x}\right)^{2} \qquad (1)$$

wherein $\sigma^{2}$ represents the variance of the characteristic physical quantity; $\bar{x}$ represents the average of the physical quantity $x$; $x_{i}$ represents the value of the physical quantity on each piece of data; and $N$ represents the total number of samples in the data set, containing both positive and negative samples. Here each physical quantity is a feature. Taking 12 physical quantities as an example, denoted by the letters A, B, C and so on, and 100 samples (50 positive and 50 negative), $N$ = 100; the group for physical quantity A contains the data A₁ to A₁₀₀, and the variance of these 100 values is the variance of physical quantity A. The same applies to the other physical quantities.
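The variance scoring and selection can be sketched as below. The sketch assumes the population variance (sum of squared deviations divided by the sample count) and a hypothetical parameter `k` for the number of physical quantities to keep; the column values are invented for illustration.

```python
from typing import Dict, List


def variance(values: List[float]) -> float:
    """Population variance: mean squared deviation from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def select_top_k(columns: Dict[str, List[float]], k: int) -> List[str]:
    """Return the names of the k physical quantities with the largest variance."""
    if not 1 <= k <= len(columns):
        raise ValueError("k must be between 1 and the number of physical quantities")
    scores = {name: variance(vals) for name, vals in columns.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)[:k]


# One list per physical quantity, one entry per sample (illustrative values).
columns = {
    "A": [0.0, 20.0, 40.0, 60.0],
    "B": [0.5, 3.1, 6.3, 12.2],
    "C": [5.0, 5.0, 5.0, 5.0],   # constant, so its variance is zero
}
top = select_top_k(columns, 2)
```

Raw variance depends on the unit of each physical quantity, so a quantity measured on a large scale scores higher than one measured on a small scale; the text does not mention any rescaling, and none is applied here.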

In step 5.1, once a positive sample data point M has been randomly selected, an interval combination is formed by taking the maximum and minimum values of each physical quantity in the data set as the interval boundaries. For example, suppose that after feature selection the three physical quantities A, B and C are retained in the data set, with value ranges [0, 60], [0.5, 12.2] and [802, 7034] respectively, and that the randomly selected positive sample data point M (20, 3.1, 5000) lies within these ranges. The resulting interval combination 1 is shown in Table 1, and the numbers of positive and negative samples in it are the total numbers of positive and negative samples in the data set. This preliminary interval combination spans the minimum and maximum of every physical quantity over the whole data set: for example, the minimum of physical quantity A in the data set is 0 and the maximum is 60, so every point in the data set has a value of A within [0, 60], and the same holds for the other physical quantities.
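Forming the initial interval combination amounts to taking a bounding box over the data. A small sketch, using three invented points chosen so that the box reproduces the ranges of the example; the function names are illustrative.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Interval = Tuple[float, float]


def initial_intervals(points: List[Point]) -> List[Interval]:
    """One (min, max) interval per physical quantity over the whole data set."""
    dims = len(points[0])
    return [(min(p[d] for p in points), max(p[d] for p in points))
            for d in range(dims)]


def contains(intervals: Sequence[Interval], point: Sequence[float]) -> bool:
    """A point is inside when every quantity lies within its interval."""
    return all(lo <= v <= hi for (lo, hi), v in zip(intervals, point))


points = [(0.0, 0.5, 802.0), (60.0, 12.2, 7034.0), (20.0, 3.1, 5000.0)]
box = initial_intervals(points)
m = (20.0, 3.1, 5000.0)  # the selected positive sample point
```

Because the box is built from the data itself, every point, including the selected one, is inside it by construction.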

In step 5.2, with point M kept within the interval combination, the interval combination is gradually contracted and negative samples are filtered out until the ratio of negative samples to positive samples in the interval combination is less than or equal to a set ratio and the number of positive samples is the largest; the interval combination is then set as a rule. While the interval combination is contracted and negative samples are filtered, the positive sample point M (20, 3.1, 5000) always stays inside it. The contracted interval combination 2 is shown in Table 1; its ratio of positive to negative samples meets the requirement, so it is a locally optimal rule. The contraction step of each physical quantity interval is calculated as follows:

$$\Delta = \frac{x_{\max}-x_{\min}}{N} \qquad (2)$$

wherein $\Delta$ represents the contraction step of each physical quantity interval; $x_{\max}$ represents the maximum value of each physical quantity; $x_{\min}$ represents the minimum value of each physical quantity; and $N$ represents the total number of samples in the data set, containing both positive and negative samples. For example, if the data set contains 1000 samples, the contraction step of physical quantity A is $\Delta = (60 - 0)/1000 = 0.06$. The boundaries are contracted to (0 + 0.06 × U) and (60 − 0.06 × V), where U and V are contraction step coefficients, each a positive integer greater than or equal to 1, with U + V ≤ 1000. (0 + 0.06 × U) means that the lower boundary of physical quantity A contracts inward step by step, starting from the minimum value of A; U = 2 means that the lower boundary has contracted inward by two units, moving from the minimum 0 to (0 + 0.06 × 2) = 0.12, a contraction of 0.12. (60 − 0.06 × V) means that the upper boundary contracts inward step by step, starting from the maximum value of A; V = 1 means that the upper boundary has contracted inward by one unit, moving from the maximum 60 to (60 − 0.06 × 1) = 59.94, a contraction of 0.06. U and V are increased gradually, subject to point M (20, 3.1, 5000) always remaining within the interval combination, and the intervals of the other physical quantities are contracted in the same way. Contraction stops when the ratio of positive to negative samples in the interval is greater than or equal to 3:1, which produces a locally optimal interval combination.
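The arithmetic of the contraction example can be checked with a few lines. The sketch takes the step to be the range of the physical quantity divided by the number of samples, as in the 60 / 1000 = 0.06 example; the function names are illustrative.

```python
from typing import Tuple


def contraction_step(q_min: float, q_max: float, n_samples: int) -> float:
    """Step size: range of the physical quantity divided by the sample count."""
    return (q_max - q_min) / n_samples


def contracted(q_min: float, q_max: float, step: float,
               u: int, v: int) -> Tuple[float, float]:
    """Move the lower bound up by u steps and the upper bound down by v steps."""
    return q_min + step * u, q_max - step * v


step = contraction_step(0.0, 60.0, 1000)          # physical quantity A
lower, upper = contracted(0.0, 60.0, step, u=2, v=1)
# lower is about 0.12 and upper about 59.94 (up to floating-point rounding)
keeps_m = lower <= 20.0 <= upper                  # A-value of point M stays inside
```

Since the step is a fixed fraction of the range, the number of contractions per physical quantity is bounded by the sample count, which is consistent with the constraint U + V ≤ 1000 in the example.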

If the ratio of negative samples to positive samples in the interval combination is less than or equal to the set ratio and the number of positive samples is the largest, the interval combination is a locally optimal rule and the data selected by the rule are removed from the data set. If the ratio of negative samples to positive samples in the interval combination is not less than or equal to the set ratio and the number of positive samples is not the largest, step 5.2 is repeated: with point M remaining within the interval combination, the interval combination is contracted and negative samples are filtered out. If the ratio is not less than or equal to the set ratio and the number of positive samples is the largest, step 5.2 is repeated; and if the ratio is less than or equal to the set ratio and the number of positive samples is not the largest, step 5.2 is repeated. Regarding the set ratio, for example: negative samples are filtered d consecutive times, where d is a positive integer greater than or equal to 1, until, at the (d+1)-th filtering, positive samples are filtered out and the ratio of positive to negative samples is less than or equal to 3:1; contraction then stops (3:1 is the ratio normally adopted, but it depends on the configured requirement), a rule is generated, and the rule takes the interval combination values from the d-th filtering.
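The ratio part of the acceptance condition can be written as a small predicate. This sketch covers only the ratio test (negatives to positives at most 1:3 by default); the additional requirement that the number of positive samples be the largest is not modelled. The function name and parameters are illustrative.

```python
def meets_ratio(n_pos: int, n_neg: int,
                neg_part: int = 1, pos_part: int = 3) -> bool:
    """Return True when negatives:positives <= neg_part:pos_part."""
    if n_pos == 0:
        return False  # a rule must contain at least one positive sample
    # Cross-multiply so the comparison uses integers only.
    return n_neg * pos_part <= n_pos * neg_part


ok_fig7 = meets_ratio(3, 1)        # 3 positives, 1 negative
ok_fig6 = meets_ratio(62, 165)     # 62 positives, 165 negatives
```

Using integer cross-multiplication avoids rounding issues that could arise from comparing a floating-point quotient against 1/3.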

TABLE 1

Interval combination 1	0 ≤ A_M ≤ 60; and 0.5 ≤ B_M ≤ 12.2; and 802 ≤ C_M ≤ 7034
Interval combination 2	14.7 ≤ A_M ≤ 55; and 0.5 ≤ B_M ≤ 6.3; and 4000 ≤ C_M ≤ 7034

Wherein, in Table 1, A_M denotes the value of point M on physical quantity A; B_M denotes the value of point M on physical quantity B; and C_M denotes the value of point M on physical quantity C.
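The two interval combinations of Table 1 can be encoded directly and checked against the example point (20, 3.1, 5000). The dictionary layout and names below are assumptions made for the sketch.

```python
from typing import Dict, Tuple

Rule = Dict[str, Tuple[float, float]]  # physical quantity -> (lower, upper)

COMBINATION_1: Rule = {"A": (0, 60), "B": (0.5, 12.2), "C": (802, 7034)}
COMBINATION_2: Rule = {"A": (14.7, 55), "B": (0.5, 6.3), "C": (4000, 7034)}


def in_rule(point: Dict[str, float], rule: Rule) -> bool:
    """All intervals must hold at once (logical AND across physical quantities)."""
    return all(lo <= point[q] <= hi for q, (lo, hi) in rule.items())


m = {"A": 20, "B": 3.1, "C": 5000}        # the example point
other = {"A": 10, "B": 3.1, "C": 5000}    # A lies below 14.7

m_in_1 = in_rule(m, COMBINATION_1)
m_in_2 = in_rule(m, COMBINATION_2)
other_in_2 = in_rule(other, COMBINATION_2)
```

The example point lies inside both combinations, as required for a contraction that never excludes it, while a point with A = 10 is inside the initial combination but is filtered out by the contracted one.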

Step 6, the data covered by the rule are removed from the data set and step 5 is repeated on the remaining data until every positive sample has been selected by a rule, that is, until no positive sample remains in the data. This yields a series of rule descriptions that optimally divide the positive and negative samples, and the combinatorial optimization approximation algorithm ends (the algorithm is essentially a logical traversal that checks, one by one, whether the positive samples meet the requirement). Once the complete algorithm flow has finished, a series of rule descriptions has been generated and put into use. If a new data set arrives that contains positive samples not covered by the existing rules, the new data set is fed into the algorithm and step 5 is repeated; if the new data set contains no positive samples, the series of rule descriptions is final and the combinatorial optimization approximation algorithm ends. After interval combination 1 (shown in Table 1) is obtained, the data corresponding to interval combination 1 are removed from the data set, the remaining data generate interval combination 2 (shown in Table 1), and so on; this prevents the data covered by earlier interval combinations from influencing later ones. A series of combined rule descriptions may take the form: interval combination 1 ∪ interval combination 2 ∪ interval combination 3 ∪ … ∪ interval combination g, where g ∈ [1, ∞) and g is a positive integer; the symbol ∪ indicates that the interval combinations are related by "or", that is, positive and negative samples are optimally divided whenever the rule of any one interval combination is satisfied. A series of combined rule descriptions may also be written as: (A₁∩B₁∩C₁) ∪ (A₂∩B₂∩C₂) ∪ (A₃∩B₃∩C₃) ∪ … ∪ (A_g∩B_g∩C_g), where g ∈ [1, ∞) and g is a positive integer; the symbol ∪ indicates that the interval combinations are related by "or", and the symbol ∩ indicates that the physical quantities within each interval combination are related by "and".

If the data in a new data set do not conform to the existing series of combined rules (for example, 5 rules), the new data set is fed into the algorithm and steps 5 and 6 are repeated until no positive sample remains in the new data set; the 2 newly generated rules are then merged with the existing 5 rules to form a new combined rule set containing 7 rules.
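Steps 5 and 6 together form a contract-then-remove loop, which can be sketched roughly as follows. The text does not say in which order the boundaries are contracted, so the greedy choice used here (prefer the move that keeps the most positives, then leaves the fewest negatives) is an assumption, as are all function names and the toy data. The 1:3 threshold and the step of range divided by sample count follow the description.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Box = List[List[float]]  # one [low, high] pair per physical quantity


def inside(box: Box, p: Sequence[float]) -> bool:
    return all(lo <= v <= hi for (lo, hi), v in zip(box, p))


def count(box: Box, data: List[Tuple[Point, int]]) -> Tuple[int, int]:
    pos = sum(1 for p, y in data if y == 1 and inside(box, p))
    neg = sum(1 for p, y in data if y == 0 and inside(box, p))
    return pos, neg


def shrink(data: List[Tuple[Point, int]], seed: Point) -> Box:
    """Contract the bounding box around `seed` until negatives:positives <= 1:3."""
    dims = len(seed)
    box = [[min(p[d] for p, _ in data), max(p[d] for p, _ in data)]
           for d in range(dims)]
    steps = [(hi - lo) / len(data) for lo, hi in box]
    pos, neg = count(box, data)
    while 3 * neg > pos:
        best = None
        for d in range(dims):
            if steps[d] == 0:
                continue  # constant quantity, nothing to contract
            for side, delta in ((0, steps[d]), (1, -steps[d])):
                trial = [b[:] for b in box]
                trial[d][side] += delta
                if not inside(trial, seed):
                    continue  # the selected positive point must stay inside
                t_pos, t_neg = count(trial, data)
                if best is None or (t_pos, -t_neg) > best[0]:
                    best = ((t_pos, -t_neg), trial)
        if best is None:
            break  # no further contraction keeps the seed inside
        (pos, neg_negated), box = best
        neg = -neg_negated
    return box


def learn_rules(data: List[Tuple[Point, int]],
                rng: Optional[random.Random] = None) -> List[Box]:
    """Generate rules until no positive sample remains (step 6)."""
    rng = rng or random.Random(0)
    remaining = list(data)
    rules: List[Box] = []
    while any(y == 1 for _, y in remaining):
        seed = rng.choice([p for p, y in remaining if y == 1])
        box = shrink(remaining, seed)
        rules.append(box)
        # Remove everything the new rule covers before the next round.
        remaining = [(p, y) for p, y in remaining if not inside(box, p)]
    return rules


def matches(rules: List[Box], p: Sequence[float]) -> bool:
    """OR across rules, AND across physical quantities within a rule."""
    return any(inside(box, p) for box in rules)


data = [((1.0, 1.0), 1), ((2.0, 2.0), 1), ((1.0, 2.0), 1), ((2.0, 1.0), 1),
        ((8.0, 8.0), 0), ((9.0, 9.0), 0), ((8.0, 9.0), 0), ((9.0, 8.0), 0)]
rules = learn_rules(data)
```

The loop always terminates: each round removes at least the selected positive point, and each contraction moves one boundary by a fixed step without crossing that point. Rules learned later from a new data set can simply be appended to the existing list, which corresponds to merging 2 new rules with 5 existing ones.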

See FIG. 1, an original image captured by an industrial camera, in which black dots indicate positive samples and gray dots indicate negative samples. The figure shows that the workpiece shape and the defect data must be extracted by reading information such as contour pixels.

FIG. 2 is a defect distribution graph; the approximate location of the defect distribution can be seen from this figure.

See fig. 3, which is a distribution of defect area and minimum average brightness, from which an approximate distribution of defect area and minimum average brightness can be seen.

See FIG. 5, a diagram of the rule division of defect area versus minimum average brightness. The black-framed region is the initial interval combination range; with point M within the interval combination, the number of positive samples in the interval is 430 and the share of negative samples in the interval exceeds 25%, where the share is the number of negative samples in the interval divided by the total number of samples in the interval, and the total is the sum of the numbers of positive and negative samples.

See FIG. 6, a diagram of the approximation stage of the rule division of defect area versus minimum average brightness. The number of positive samples is 62 and the number of negative samples is 165; the black-framed region is the interval combination range during the approximation, with point M within the interval combination.

See FIG. 7, a diagram of the local optimum of the rule division of defect area versus minimum average brightness. The black-framed region is the interval combination rule that has approached the local optimum; with point M within the interval combination, the number of positive samples in the optimized interval is 3 and the number of negative samples in the interval is 1 (25%), where 25% is the share of negative samples in the total number of samples in the interval.
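The 1:3 ratio used as the stopping condition and the 25% figure quoted for FIG. 7 describe the same threshold. A short derivation, writing $n_{+}$ and $n_{-}$ for the numbers of positive and negative samples inside the interval combination (these symbols are introduced here for illustration and are not taken from the patent), and assuming $n_{+} > 0$:

```latex
\frac{n_{-}}{n_{+}} \le \frac{1}{3}
\iff 3\,n_{-} \le n_{+}
\iff 4\,n_{-} \le n_{+} + n_{-}
\iff \frac{n_{-}}{n_{+} + n_{-}} \le \frac{1}{4} = 25\%
```

For FIG. 7, $n_{+} = 3$ and $n_{-} = 1$ give $1/(3+1) = 25\%$, which sits exactly on the threshold.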

The above describes only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited to it. Any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solutions and inventive concept, shall fall within the scope of protection of the present invention.

Claims

1. A method for clustering industrial part defect detection intervals based on a combined optimization algorithm is characterized by comprising the following specific steps:

step 4, feature selection: performing feature selection on the expanded data by using a filter method, with the variance as the feature scoring criterion; selecting, from the data before expansion, the features with the largest contribution, wherein the number of physical quantities after feature selection is a positive integer that is at least 1 and at most the total number of physical quantities in the data;

2. The industrial part defect detection interval clustering method based on the combinatorial optimization algorithm according to claim 1, characterized in that: in step 5.2 of step 5, if the ratio of negative samples to positive samples in the interval combination is less than or equal to 1:3 and the number of positive samples is the largest, the interval combination is a locally optimal rule and the data selected by the rule are removed from the data set; if the ratio of negative samples to positive samples in the interval combination is not less than or equal to 1:3 and the number of positive samples is not the largest, step 5.2 is repeated: with point M remaining within the interval combination, the interval combination is contracted and negative samples are filtered out.

3. The industrial part defect detection interval clustering method based on the combinatorial optimization algorithm according to claim 1, characterized in that: in step 5.2 of step 5, if the ratio of negative samples to positive samples in the interval combination is not less than or equal to 1:3 and the number of positive samples is the largest, step 5.2 is repeated; if the ratio of negative samples to positive samples in the interval combination is less than or equal to 1:3 and the number of positive samples is not the largest, step 5.2 is repeated.

4. The industrial part defect detection interval clustering method based on the combinatorial optimization algorithm according to claim 1, characterized in that: in step 6, once the complete algorithm flow has finished, a series of rule descriptions has been generated and put into use; if a new data set arrives that contains positive samples not covered by the existing rules, the new data set is fed into the algorithm and step 5 is repeated; if the new data set contains no positive samples, the series of rule descriptions is final and the combinatorial optimization approximation algorithm ends.

5. The industrial part defect detection interval clustering method based on the combinatorial optimization algorithm according to claim 1, characterized in that: in step 4, the variance of a characteristic physical quantity is calculated as follows:

$$\sigma^{2} = \frac{1}{N}\sum_{i=1}^{N}\left(x_{i}-\bar{x}\right)^{2} \qquad (1)$$

wherein $\sigma^{2}$ represents the variance of the characteristic physical quantity; $\bar{x}$ represents the average of the physical quantity $x$; $x_{i}$ represents the value of the physical quantity on each piece of data; and $N$ represents the total number of samples in the data set, containing both positive and negative samples.