CN107886539B - High-precision gear visual detection method in industrial scene - Google Patents

High-precision gear visual detection method in industrial scene

Info

Publication number
CN107886539B
Authority
CN
China
Prior art keywords
image
target
classifier
feature
samples
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710974598.8A
Other languages
Chinese (zh)
Other versions
CN107886539A (en)
Inventor
张印辉
田敏
王森
何自芬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology filed Critical Kunming University of Science and Technology
Priority to CN201710974598.8A priority Critical patent/CN107886539B/en
Publication of CN107886539A publication Critical patent/CN107886539A/en
Application granted granted Critical
Publication of CN107886539B publication Critical patent/CN107886539B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/10Image enhancement or restoration using non-spatial domain filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20172Image enhancement details
    • G06T2207/20192Edge enhancement; Edge preservation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30108Industrial image inspection
    • G06T2207/30164Workpiece; Machine component

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a high-precision gear visual detection method in an industrial scene and belongs to the target detection field of machine learning technology. First, positive sample images containing a gear target and negative sample images without a gear target are acquired, the gear targets are annotated with bounding boxes, and the images are divided into a training set and a test set in a 1:1 ratio. After the images are enhanced with the Par-King image enhancement algorithm, histogram of oriented gradients (HOG) features are extracted, giving the corresponding feature positive samples and feature negative samples. Two different classifiers are trained on the extracted training set feature samples: one is a common overall SVM classifier, the other a combined local SVR classifier in the frequency domain. The two classifiers then jointly match the test set feature samples to obtain the optimal detection position of the target. With this combined model matching method, the high-precision position information of the gear target in the industrial scene can be acquired effectively.

Description

High-precision gear visual detection method in industrial scene
Technical Field
The invention relates to a high-precision gear visual detection method in an industrial scene, in particular to a gear visual detection method based on joint model matching in the industrial scene, and belongs to the field of target detection of machine learning technology.
Background
Industrial robots in industrial production use object detection to automatically recognize the category and exact position of the parts to be processed and then perform the corresponding operations such as grasping, welding, and cutting. Compared with the traditional production mode, this automated production mode based on automatic target detection improves production efficiency and saves labor.
Object detection is an important research topic in the fields of pattern recognition and digital image processing. The topic has progressed rapidly over the past decade or so: many excellent object detection algorithms are proposed every year, and both detection quality and speed are continuously improved. The AdaBoost framework of Viola et al. classifies Haar-like wavelet features and then uses a sliding-window search strategy to achieve accurate and effective localization; it was the first object-class detection algorithm able to run in real time while giving a good detection rate, and it is mainly applied to face detection. Dalal et al. proposed using the local Histogram of Oriented Gradients (HOG) of an image as the feature and a Support Vector Machine (SVM) as the classifier for pedestrian detection; HOG features reflect the orientation information of the target object well, object detection technology has developed even faster since this feature appeared, and various improved HOG features have since been proposed.
Driven by the requirements on detection precision and speed, Henriques et al. proposed the Block-Circulant Decomposition algorithm in 2013, whose results on the INRIA and ETH pedestrian detection data sets are clearly better than those of traditional detection algorithms. The present method applies this algorithm to gear target detection in an industrial scene. Because illumination changes markedly in the industrial scene, some target images are captured unclearly, and HOG features are sensitive to oriented gradients, the original images are first transformed into the fuzzy domain for gradient enhancement before the HOG features are extracted, so that the features are better separable. Following the steps of the block-circulant decomposition algorithm, the extracted feature samples are Fourier-transformed, an independent Support Vector Regression (SVR) classifier is trained in the frequency domain for each block at the corresponding position of the HOG features, and the blocks are then inverse-transformed into an overall combined SVR classifier in the spatial domain. Since this classifier is a combination focused on individual local blocks, its power to discriminate overall differences is weaker than its power to discriminate local differences; with this in mind, an overall SVM classifier model is also trained on the extracted features, the target is detected jointly with the SVM classifier model and the combined SVR classifier model, and the final detection uses the image pyramid matching method proposed by Felzenszwalb et al. The invention is funded by national science fund projects (61461022 and 61761024); it mainly aims to explore a global-local feature multi-scale coupling mechanism and a robust fusion algorithm for multi-scale perceptual error measures, to resolve the inconsistency between the coupled posterior test and the real distribution and the inconsistency of the multi-scale error-measure optimization structure, and to provide a theoretical basis for efficient, fast and accurate detection and segmentation of foreground target information on a production line in a dynamic scene.
Disclosure of Invention
To address these problems, the invention provides a high-precision gear visual detection method for industrial scenes. According to the detection precision requirement, the separability of the gradient direction histogram features of target and background is improved with the Par-King image enhancement algorithm, two different classifier models, an SVM (support vector machine) and a frequency-domain SVR (support vector regression), are trained on a self-built data set, and the test set images are jointly and visually detected with the image pyramid matching method of the deformable part model algorithm.
The technical scheme of the invention is as follows: a high-precision gear visual detection method in an industrial scene comprises the following specific steps:
step1, acquiring positive sample images containing a gear target and negative sample images without the gear target in an industrial scene, carrying out bounding box labeling on the gear targets, and dividing the images into a training set and a test set in a 1:1 ratio;
step2, enhancing the images of the training set and the test set by using a Par-King image enhancement algorithm;
step3, extracting HOG (histogram of gradient directions) characteristics of the image processed in Step2 to obtain corresponding characteristic positive samples and characteristic negative samples;
step4, training two different classifiers for the extracted training set feature samples, wherein one classifier is a common overall SVM classifier, and the other classifier is a local SVR combined classifier on a frequency domain;
and Step5, carrying out joint matching on the feature samples of the test set by the two classifiers by using an image pyramid matching algorithm to obtain the optimal detection position of the target.
In Step2, the image enhancement method is as follows:
Step2.1, a gray image X with gray level L is regarded as an array of fuzzy points. Assuming X is an M×N gray image, the fuzzy point array is:

        | μ11/x11   μ12/x12   ...   μ1N/x1N |
    X = | μ21/x21   μ22/x22   ...   μ2N/x2N |                              (1)
        |   ...       ...     ...     ...   |
        | μM1/xM1   μM2/xM2   ...   μMN/xMN |

where μmn/xmn denotes that the pixel at coordinate (m, n) in image X, with pixel value xmn, has the fuzzy feature value μmn. The plane formed by all the fuzzy feature values is the fuzzy feature plane.
Step2.2, the image X is transformed from the image domain to the fuzzy domain with the transformation function (also called the membership function):

    μmn = G(xmn) = [1 + (xmax - xmn)/Fd]^(-Fe)                              (2)

    m = 1, 2, ..., M;  n = 1, 2, ..., N.

where xmax is the maximum pixel value in image X, Fd is the denominator fuzzification parameter and Fe is the exponential fuzzification parameter; for images with pixel values in the range 0 to 255, Fd and Fe are usually chosen as 128 and 1, respectively.
Step2.3, the fuzzy domain {μmn} is enhanced with the enhancement function:

    μ'mn = T(μmn) = 2·(μmn)^2,             0 ≤ μmn ≤ 0.5
                  = 1 - 2·(1 - μmn)^2,     0.5 < μmn ≤ 1                    (3)
Step2.4, after the enhancement is finished, the result is transformed back to the spatial domain with xmn = G^(-1)(μ'mn), where G^(-1) is the inverse of the function G.
Because the Par-King algorithm processes a single-channel gray image, and the training set and test set image data are RGB three-channel images, the Par-King algorithm is used for independently enhancing three channels in one frame of image, and then the three channels of RGB images are combined to obtain a final enhanced image.
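For illustration only, a minimal Python sketch of this per-channel fuzzy enhancement is given below. It assumes the standard Pal-King-style contrast-intensification operator as the enhancement function of formula (3) and the default parameters Fd = 128 and Fe = 1; the function names are illustrative and not part of the claimed method.

    import numpy as np

    def fuzzy_enhance_channel(x, fd=128.0, fe=1.0):
        """Enhance one gray channel via the fuzzy-domain scheme of Step2.1-Step2.4."""
        x = x.astype(np.float64)
        x_max = x.max()
        # Step2.2: image domain -> fuzzy domain (membership function G, formula (2))
        mu = (1.0 + (x_max - x) / fd) ** (-fe)
        # Step2.3: contrast intensification in the fuzzy domain (assumed INT operator)
        mu = np.where(mu <= 0.5, 2.0 * mu ** 2, 1.0 - 2.0 * (1.0 - mu) ** 2)
        # Step2.4: fuzzy domain -> image domain with the inverse of G
        x_enh = x_max - fd * (mu ** (-1.0 / fe) - 1.0)
        return np.clip(x_enh, 0, 255).astype(np.uint8)

    def fuzzy_enhance_rgb(img):
        """Enhance the three RGB channels independently and recombine them."""
        return np.dstack([fuzzy_enhance_channel(img[..., c]) for c in range(3)])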
In Step3, the HOG feature extraction method comprises the following steps:
the original HOG characteristics divide the sample image into a plurality of cells multiplied by cells (the cells respectively use the center difference to calculate the gradient amplitude M of each pixel point (x, y)(x,y)And the gradient direction omega(x,y)The calculation formula is as follows:
Figure BDA0001438251940000034
Ω(x,y)=arctan(My/Mx) (5)
where Mx and My are the horizontal and vertical gradients at pixel (x, y), respectively, computed as:
Mx=N(x+1,y)-N(x-1,y) (6)
My=N(x,y+1)-N(x,y-1) (7)
where N(x, y) is the pixel value of pixel (x, y).
Since all the image data of the present invention are RGB color images, the maximum of the three color-channel gradient magnitudes at each pixel location is selected as the output. The improved HOG features are extracted with the vlfeat function library: the gradient directions in [0, 2π) are divided evenly into 2×k bins (k = 1, 2, 3, ...), histogram statistics of all gradient values in each cell over these 2×k bins give a 2×k-dimensional feature vector, every four adjacent cells form a region block, and bilinear interpolation over the four cell vectors of a block gives the 2×k-dimensional feature vector output of that block; in addition, the gradient directions in [0, π) are divided evenly into k bins and a histogram without gradient magnitudes is counted, giving a k-dimensional feature vector output in the same way; finally, the reciprocal of the L2 norm of each of the four cells in a block is taken as that cell's normalization factor, and the four normalization factors are output as a 4-dimensional feature vector of the block. The HOG feature vector of one region block therefore has (4 + 3×k) dimensions.
The sample image is traversed with this block feature vector rule, with a traversal step of cellsize, going from the top-left corner first downward and then rightward to the bottom-right corner; at least half of every region block lies within the sample image. Therefore, for an RGB image of size w × h × 3, the number of horizontal region blocks hogw and the number of vertical region blocks hogh are:
hogw=(w+cellsize/2)/cellsize (8)
hogh=(h+cellsize/2)/cellsize (9)
the dimension of the HOG feature matrix of the sample image obtained finally is hogw × hogh × (4+3 × k).
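The per-pixel gradient computation of formulas (4)-(7), the channel-maximum selection, and the feature-map size rule of formulas (8)-(9) could be sketched in Python roughly as follows; cellsize = 8 and k = 9 (which gives the 31-dimensional blocks used later in the embodiment) are assumed example values, and the helper names are hypothetical.

    import numpy as np

    def pixel_gradients(channel):
        """Central-difference gradients, magnitude and direction per pixel (formulas (4)-(7))."""
        g = channel.astype(np.float64)
        mx = np.zeros_like(g)
        my = np.zeros_like(g)
        mx[:, 1:-1] = g[:, 2:] - g[:, :-2]        # Mx = N(x+1, y) - N(x-1, y)
        my[1:-1, :] = g[2:, :] - g[:-2, :]        # My = N(x, y+1) - N(x, y-1)
        mag = np.sqrt(mx ** 2 + my ** 2)          # M(x,y) = sqrt(Mx^2 + My^2)
        ang = np.arctan2(my, mx) % (2 * np.pi)    # direction folded into [0, 2*pi)
        return mag, ang

    def dominant_gradient(img_rgb):
        """Keep, at every pixel, the gradient of the colour channel with the largest magnitude."""
        mags, angs = zip(*(pixel_gradients(img_rgb[..., c]) for c in range(3)))
        mags, angs = np.stack(mags), np.stack(angs)
        idx = mags.argmax(axis=0)
        rows, cols = np.indices(idx.shape)
        return mags[idx, rows, cols], angs[idx, rows, cols]

    def hog_map_shape(w, h, cellsize=8, k=9):
        """Feature-map size from formulas (8)-(9) and the (4 + 3*k) per-block dimension."""
        hogw = (w + cellsize // 2) // cellsize
        hogh = (h + cellsize // 2) // cellsize
        return hogw, hogh, 4 + 3 * k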
In Step4, the classifier training method is as follows:
Step4.1, to train the SVM model, each extracted feature sample has size m1×n1×p, with a feature positive samples and b1 feature negative samples; each feature sample is flattened into a 1×(m1×n1×p)-dimensional feature vector, an SVM classifier is trained from the resulting (a+b1)×(m1×n1×p)-dimensional feature matrix, and the classifier is then reshaped back to m1×n1×p as the final SVM model w1.
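A minimal sketch of this Step4.1 training is given below; scikit-learn's LinearSVC is used only as one possible linear SVM solver, and the regularization constant is an illustrative assumption.

    import numpy as np
    from sklearn.svm import LinearSVC

    def train_svm_template(pos_feats, neg_feats):
        """pos_feats / neg_feats: lists of (m1, n1, p) HOG feature samples."""
        m1, n1, p = pos_feats[0].shape
        X = np.array([f.ravel() for f in pos_feats + neg_feats])   # (a+b1) x (m1*n1*p) matrix
        y = np.array([1] * len(pos_feats) + [-1] * len(neg_feats))
        clf = LinearSVC(C=1.0).fit(X, y)
        # reshape the learned hyperplane back to feature-map shape: final model w1
        return clf.coef_.reshape(m1, n1, p)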
Step4.2, to train the SVR model, each extracted feature sample has size m2×n2×p, with a feature positive samples and b2 feature negative samples; the extracted features are Fourier-transformed, an independent hyperplane model is trained for each block of all feature samples in the frequency domain, the blocks are then combined into an overall hyperplane model according to their original spatial positions, and an inverse Fourier transform yields the final SVR model w2.
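A sketch of the frequency-domain per-block training of Step4.2 follows; for simplicity a regularized least-squares (ridge) solution stands in for the SVR solver of the actual method, so the code only illustrates the transform / per-block training / combination / inverse-transform structure described above.

    import numpy as np

    def train_freq_block_model(pos_feats, neg_feats, lam=1e-2):
        """pos_feats / neg_feats: lists of (m2, n2, p) HOG feature samples."""
        feats = np.array(pos_feats + neg_feats)                  # (N, m2, n2, p)
        y = np.array([1.0] * len(pos_feats) + [-1.0] * len(neg_feats))
        F = np.fft.fft2(feats, axes=(1, 2))                      # DFT over the two spatial axes
        _, m2, n2, p = F.shape
        W = np.zeros((m2, n2, p), dtype=complex)
        for i in range(m2):                                      # one independent model per block
            for j in range(n2):
                X = F[:, i, j, :]                                # (N, p) complex block samples
                A = X.conj().T @ X + lam * np.eye(p)             # regularized normal equations
                W[i, j] = np.linalg.solve(A, X.conj().T @ y)
        return np.real(np.fft.ifft2(W, axes=(0, 1)))             # combined model w2 in the spatial domain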
In Step5, the joint matching method comprises the following steps:
Because the size of the target to be detected varies and the target is imaged from different angles, a standard image pyramid is built by repeated smoothing and downsampling, i.e., a series of images of different sizes is generated from one image. The learned models w1 and w2 are then each dot-multiplied with the features of every pyramid level, giving the scores of the two models at different scales and positions as output; an overall score is defined for each root position, the optimal target position and its score are selected for each of the two models, the scores of the two detected target positions are compared, and the higher-scoring position is output as the final position.
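A schematic sketch of this joint pyramid matching is given below. The feature extractor, the pyramid scales, and the use of a plain cross-correlation for the dot-product scoring are illustrative assumptions; the actual method follows the Felzenszwalb-style pyramid matching.

    import numpy as np
    from scipy.ndimage import zoom
    from scipy.signal import correlate

    def detect(feature_fn, image, w1, w2, scales=(1.0, 0.8, 0.64, 0.5)):
        """feature_fn maps an image to its HOG feature map (hogh x hogw x dim)."""
        best = None
        for s in scales:
            level = zoom(image, (s, s, 1), order=1)              # one (approximate) pyramid level
            feats = feature_fn(level)
            for name, w in (("w1", w1), ("w2", w2)):
                if any(fs < ws for fs, ws in zip(feats.shape, w.shape)):
                    continue                                     # template larger than this level
                # dot product of the template with every window of this level
                scores = correlate(feats, w, mode="valid")[..., 0]
                i, j = np.unravel_index(scores.argmax(), scores.shape)
                cand = (scores[i, j], name, s, i, j)
                if best is None or cand[0] > best[0]:
                    best = cand
        return best   # (score, winning model, scale, row, col) of the final detection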
The invention has the beneficial effects that:
(1) the gear visual identification method introduces the machine learning algorithm to the gear visual identification under the industrial scene, and the identification effect of the machine learning algorithm is obviously improved compared with the detection precision of the traditional visual algorithm;
(2) according to the method, a Par-King image enhancement algorithm is introduced to enhance the original image, so that the separability of the target and background gradient direction histogram features is improved;
(3) the invention trains two classifiers to carry out joint detection on the test image, thereby realizing the advantage complementation of different classifiers;
(4) the method can effectively acquire the high-precision position information of the gear target in the industrial scene by using the combined model matching method.
Drawings
FIG. 1 is a block flow diagram of the present invention;
FIG. 2 is an example of partial positive sample images of the present invention; the black boxes are the bounding box annotations;
FIG. 3 is an example of a partial negative example image of the present invention;
FIG. 4 is a partial original image of the present invention;
FIG. 5 is an enhanced image corresponding to the image of FIG. 4 according to the present invention;
FIG. 6 is a visualization of the w1 model of the present invention, with the positive sample model on the left and the negative sample model on the right;
FIG. 7 is a visualization of the w2 model of the present invention, with the positive sample model on the left and the negative sample model on the right;
FIG. 8 shows a part of the test results of the present invention;
FIG. 9 is a precision-recall curve of the present invention;
fig. 10 is an enlarged view of a portion of fig. 9 of the present invention.
Detailed Description
Example 1: as shown in fig. 1 to 10, a high-precision gear visual inspection method in an industrial scene includes the following specific steps:
Step1, the equipment used to collect the data comprises a six-degree-of-freedom manipulator with a binocular stereo vision system, a camera, and several gear parts. With the conveyor belt and its surroundings in the industrial scene as the main background, gear positive samples and background negative samples are captured: 320 positive sample images and 688 negative sample images are acquired and stored in JPG format; for each positive sample the coordinates of the bounding box containing the target are annotated and used as the ground-truth label of the target, and a data set is built with 160 training positive samples, 344 training negative samples, 160 test positive samples and 344 test negative samples. Part of the collected data is shown in FIGS. 2 and 3;
step2, enhancing the images of the training set and the test set by using a Par-King image enhancement algorithm;
in Step2, the image enhancement method is as follows:
a gray image X with a gray level L is regarded as a fuzzy point array, the gray image X is converted into a fuzzy domain by utilizing a membership function, and then is converted into an image domain after being enhanced by an enhancement function to obtain an enhanced image;
because the Par-King algorithm processes single-channel gray images, and the image data of the training set and the test set are RGB three-channel images, the Par-King algorithm is used for independently enhancing three channels in one frame of image, and then the three channels of RGB images are combined to obtain a final enhanced image. Fig. 4 and 5 show a part of the original image and the enhanced image. The qualitative contrast shows that the edge contour of the enhanced image is clearer compared with the edge contour of the original image.
The change in gradient magnitude of the image after Par-King enhancement is analyzed quantitatively with the average per-pixel gradient magnitude difference before and after enhancement. For an RGB image I of size M×N×3, let T(x,y) = sqrt(Ix^2 + Iy^2) be the gradient magnitude of one channel at pixel (x, y), where Ix and Iy are the horizontal and vertical gradients at that pixel, and let I' be the enhanced image of I with gradient magnitude T'(x,y). The average gradient magnitude difference L of each channel of image I is then computed as:

    L = (1/(M×N)) · Σx Σy [T'(x,y) - T(x,y)]   (10)
when L >0, it indicates that the average gradient amplitude of the enhanced image is increased compared to the original image. The L values for each channel of the four exemplary images of FIGS. 6-7 are found by the above equation as shown in Table 1:
TABLE 1 Average gradient magnitude enhancement results of partial images

Image      R        G        B
Image 1    0.5975   0.1604   0.1529
Image 2    1.1827   0.2905   0.3022
Image 3    0.8658   0.2086   0.1924
Image 4    1.8346   0.2275   0.2542
As can be seen from Table 1, the gradient amplitude of the image can be effectively improved by enhancing the image by using the Par-King algorithm, so that the gradient information of the image is more obvious.
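For illustration, the per-channel average gradient magnitude difference described above could be computed roughly as follows; numpy's central-difference gradient stands in here for whatever gradient operator was actually used.

    import numpy as np

    def mean_gradient_gain(original, enhanced):
        """Per-channel average gradient-magnitude difference L (positive = stronger gradients)."""
        def grad_mag(ch):
            gy, gx = np.gradient(ch.astype(np.float64))
            return np.sqrt(gx ** 2 + gy ** 2)
        return [float((grad_mag(enhanced[..., c]) - grad_mag(original[..., c])).mean())
                for c in range(3)]   # [L_R, L_G, L_B], comparable to the rows of Table 1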
Step3, extracting HOG (histogram of gradient directions) characteristics of the image processed in Step2 to obtain corresponding characteristic positive samples and characteristic negative samples;
in Step3, the HOG feature extraction method comprises the following steps:
selecting the maximum value of the gradient magnitudes of the three color channels at each pixel location as the output; dividing the gradient directions of [0, 2π) evenly into 2×k bins (k = 1, 2, 3, ...), extracting the improved HOG features with the vlfeat function library, and finally obtaining a (4+3×k)-dimensional HOG feature vector for each region block;
traversing the sample image by using a block feature vector calculation rule, wherein the traversal step length is cellsize; for an RGB image of size w × h × 3, the number of horizontal area blocks hogw and the number of vertical area blocks hogh are:
hogw=(w+cellsize/2)/cellsize (11)
hogh=(h+cellsize/2)/cellsize (12)
the dimension of the HOG feature matrix of the sample image obtained finally is hogw × hogh × (4+3 × k).
Specifically, the 31-dimensional HOG features of the small region containing the target in an original positive sample are used as a feature positive sample. Because feature extraction does not change the spatial relationship between different blocks, after the HOG features of an original negative sample are extracted they are split, at a certain step length both vertically and horizontally from the top-left to the bottom-right, into several different HOG features of the same size as the feature positive samples, each of which is used as a feature negative sample; this effectively expands the number of negative samples and compensates for a possible shortage of negative samples;
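A small sketch of this negative-sample expansion is given below, assuming the crop size equals the feature positive-sample size (for example the 19 × 18 blocks used for the SVM model in Step4.1) and an illustrative step length:

    import numpy as np

    def crop_negative_windows(neg_hog, win_h=19, win_w=18, step=1):
        """Slide a (win_h, win_w) window over one negative-image HOG map; every crop
        becomes an independent feature negative sample with the same block layout."""
        H, W, _ = neg_hog.shape
        crops = []
        for i in range(0, H - win_h + 1, step):
            for j in range(0, W - win_w + 1, step):
                crops.append(neg_hog[i:i + win_h, j:j + win_w, :])
        return crops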
step4, training two different classifiers for the extracted training set feature samples, wherein one classifier is a common overall SVM classifier, and the other classifier is a local SVR combined classifier on a frequency domain;
in Step4, the classifier training method is as follows:
Step4.1, to train the SVM model, each extracted feature sample has size 19 × 18 × 31, with 316 feature positive samples and 3480 feature negative samples. Each feature sample is flattened into a 1 × 10602-dimensional feature vector, an SVM classifier is trained from the 3796 × 10602-dimensional feature matrix, and the classifier is then reshaped back to 19 × 18 × 31 as the final SVM model w1.
Step4.2, to train the SVR model, each extracted feature sample has size 21 × 20 × 31, with 316 feature positive samples and 4676 feature negative samples. Drawing on the idea of the block-circulant decomposition algorithm, the extracted features are Fourier-transformed, an independent hyperplane model is trained for each block of all feature samples in the frequency domain, the blocks are then combined into an overall hyperplane model according to their original spatial positions, and an inverse Fourier transform yields the final SVR model w2. The two classifier models are visualized separately with the vlfeat library, as shown in FIGS. 6-7.
Following the above target detection flow, the Par-King algorithm is used to enhance the gradient information of the images, the 31-dimensional HOG features are extracted, and the overall SVM model w1 and the frequency-domain SVR block-combination model w2 are trained from the training set feature samples; the following step is then carried out;
and Step5, carrying out joint matching on the feature samples of the test set by the two classifiers by using an image pyramid matching algorithm to obtain the optimal detection position of the target.
In Step5, the joint matching method comprises the following steps:
Because the size of the target to be detected varies and the target is imaged from different angles, a standard image pyramid is built by repeated smoothing and downsampling, i.e., a series of images of different sizes is generated from one image. The learned models w1 and w2 are then each dot-multiplied with the features of every pyramid level, giving the scores of the two models at different scales and positions as output; an overall score is defined for each root position, the optimal target position and its score are selected for each of the two models, the scores of the two detected target positions are compared, and the higher-scoring position is output as the final position.
Aiming at the visual detection of mechanical parts in an industrial scene, the effect of the method is tested on the self-built single-class gear target detection data set. FIG. 8 shows qualitative examples of partial detection results of w1 and w2: the dark boxes are the detections of the w1 model and the bright boxes are the detections of the w2 model. Each model alone already detects well, but comparing the two results shows that the w1 model often merges a dimly lit region into the target, indicating that the w2 model is more robust to illumination effects than w1; where the influence of illumination is small, however, the w2 model includes more redundant background, and there the w1 model localizes more precisely than w2.
The two models are therefore combined for target detection, and the better of their two optimal positions is selected as the final detection output, so that the two models complement each other's weaknesses. The tests are run on a notebook computer with a Core i7 processor and 12 GB of memory. Precision and recall are used to compare the present algorithm with the original SVM and circulant algorithms, the circulant+SVM combined algorithm without the Par-King algorithm, and the SVM and circulant algorithms with the Par-King algorithm added; the running times and accuracies are listed in Table 2, and the precision-recall curves are plotted in FIGS. 9-10.
Although the improved method of the invention is not superior in computation speed, its average detection precision reaches 96.8%, an improvement over the 93% average detection precision obtained when the circulant pedestrian detection algorithm is applied to this task, as the performance comparison in Table 2 shows.
TABLE 2 Performance comparison of the six detection algorithms

Method                    Classifier training time/s   Average detection time/s   Average detection accuracy/%
SVM                       3.745                        0.608                      91.9
circulant                 19.019                       0.392                      93.0
circulant+SVM             22.764                       0.991                      95.1
Par-King+SVM              3.636                        2.911                      94.8
Par-King+circulant        16.481                       2.718                      93.8
Method of the invention   20.117                       5.614                      96.8
While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.

Claims (5)

1. A high-precision gear visual detection method in an industrial scene is characterized by comprising the following steps: the method comprises the following specific steps:
step1, acquiring positive sample images containing a gear target and negative sample images without the gear target in an industrial scene, carrying out bounding box labeling on the gear targets, and dividing the images into a training set and a test set in a 1:1 ratio;
step2, enhancing the images of the training set and the test set by using a Par-King image enhancement algorithm;
step3, extracting HOG (histogram of gradient directions) characteristics of the image processed in Step2 to obtain corresponding characteristic positive samples and characteristic negative samples;
step4, training two different classifiers for the extracted training set feature samples, wherein one classifier is a common overall SVM classifier, and the other classifier is a local SVR combined classifier on a frequency domain;
and Step5, carrying out joint matching on the feature samples of the test set by the two classifiers by using an image pyramid matching algorithm to obtain the optimal detection position of the target.
2. The visual inspection method for the high-precision gear in the industrial scene according to claim 1 is characterized in that: in Step2, the image enhancement method is as follows:
a gray image X with a gray level L is regarded as a fuzzy point array, the gray image X is converted into a fuzzy domain by utilizing a membership function, and then is converted into an image domain after being enhanced by an enhancement function to obtain an enhanced image;
because the Par-King algorithm processes a single-channel gray image, and the training set and test set image data are RGB three-channel images, the Par-King algorithm is used for independently enhancing three channels in one frame of image, and then the three channels of RGB images are combined to obtain a final enhanced image.
3. The visual inspection method for the high-precision gear in the industrial scene according to claim 1 is characterized in that:
in Step3, the HOG feature extraction method comprises the following steps:
selecting the maximum value of the gradient magnitudes of the three color channels at each pixel location as the output; dividing the gradient directions of [0, 2π) evenly into 2×k bins, where k = 1, 2, 3, ..., extracting the improved HOG features with the vlfeat function library, and finally obtaining a (4+3×k)-dimensional HOG feature vector for each region block;
traversing the sample image by using a block feature vector calculation rule, wherein the traversal step length is cellsize; for an RGB image of size w × h × 3, the number of horizontal area blocks hogw and the number of vertical area blocks hogh are:
hogw=(w+cellsize/2)/cellsize (1)
hogh=(h+cellsize/2)/cellsize (2)
the dimension of the HOG feature matrix of the sample image obtained finally is hogw × hogh × (4+3 × k).
4. The visual inspection method for the high-precision gear in the industrial scene according to claim 1 is characterized in that: in Step4, the classifier training method is as follows:
step4.1, to train the SVM model, each extracted feature sample has size m1×n1×p, with a feature positive samples and b1 feature negative samples; each feature sample is flattened into a 1×(m1×n1×p)-dimensional feature vector, an SVM classifier is trained from the (a+b1)×(m1×n1×p)-dimensional feature matrix, and the classifier is then reshaped back to m1×n1×p as the final SVM model w1;
step4.2, to train the SVR model, each extracted feature sample has size m2×n2×p, with a feature positive samples and b2 feature negative samples; the extracted features are Fourier-transformed, an independent hyperplane model is trained for each block of all feature samples in the frequency domain, the blocks are then combined into an overall hyperplane model according to their original spatial positions, and an inverse Fourier transform yields the final SVR model w2.
5. The visual inspection method for the high-precision gear in the industrial scene according to claim 1 is characterized in that: in Step5, the joint matching method comprises the following steps:
because the size of the target to be detected varies and the target is imaged from different angles, a standard image pyramid is built by repeated smoothing and downsampling, i.e., a series of images of different sizes is generated from one image; the learned models w1 and w2 are then each dot-multiplied with the features of every pyramid level, giving the scores of the two models at different scales and positions as output; an overall score is defined for each root position, the optimal target position and its score are selected for each of the two models, the scores of the two detected target positions are compared, and the higher-scoring position is output as the final position.
CN201710974598.8A 2017-10-19 2017-10-19 High-precision gear visual detection method in industrial scene Active CN107886539B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710974598.8A CN107886539B (en) 2017-10-19 2017-10-19 High-precision gear visual detection method in industrial scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710974598.8A CN107886539B (en) 2017-10-19 2017-10-19 High-precision gear visual detection method in industrial scene

Publications (2)

Publication Number Publication Date
CN107886539A CN107886539A (en) 2018-04-06
CN107886539B true CN107886539B (en) 2021-05-14

Family

ID=61781831

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710974598.8A Active CN107886539B (en) 2017-10-19 2017-10-19 High-precision gear visual detection method in industrial scene

Country Status (1)

Country Link
CN (1) CN107886539B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108875819B (en) * 2018-06-08 2020-10-27 浙江大学 Object and component joint detection method based on long-term and short-term memory network
CN109886932A (en) * 2019-01-25 2019-06-14 中国计量大学 Gear ring of wheel speed sensor detection method of surface flaw based on SVM
CN109948432A (en) * 2019-01-29 2019-06-28 江苏裕兰信息科技有限公司 A kind of pedestrian detection method
CN111651629B (en) * 2019-03-27 2023-08-18 上海铼锶信息技术有限公司 Method and system for constructing full sample data
CN111353526A (en) * 2020-02-19 2020-06-30 上海小萌科技有限公司 Image matching method and device and related equipment

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102054178A (en) * 2011-01-20 2011-05-11 北京联合大学 Chinese painting image identifying method based on local semantic concept
CN103033362A (en) * 2012-12-31 2013-04-10 湖南大学 Gear fault diagnosis method based on improving multivariable predictive models
CN104506162A (en) * 2014-12-15 2015-04-08 西北工业大学 Fault prognosis method for high-order particle filter on basis of LS-SVR (least squares support vector regression) modeling
CN105160434A (en) * 2015-09-15 2015-12-16 武汉大学 Wind power ramp event prediction method by adopting SVM to select forecasting model
WO2016080913A1 (en) * 2014-11-18 2016-05-26 Agency For Science, Technology And Research Method and device for traffic sign recognition
CN105956632A (en) * 2016-05-20 2016-09-21 浙江宇视科技有限公司 Target detection method and device
CN106444703A (en) * 2016-09-20 2017-02-22 西南石油大学 Rotating equipment running state fuzzy evaluation and prediction methods based on occurrence probability of fault modes
CN106503748A (en) * 2016-11-07 2017-03-15 湖南源信光电科技有限公司 A kind of based on S SIFT features and the vehicle targets of SVM training aids
CN106769051A (en) * 2017-03-10 2017-05-31 哈尔滨理工大学 A kind of rolling bearing remaining life Forecasting Methodology based on MCEA KPCA and combination S VR

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9311564B2 (en) * 2012-10-05 2016-04-12 Carnegie Mellon University Face age-estimation and methods, systems, and software therefor

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102054178A (en) * 2011-01-20 2011-05-11 北京联合大学 Chinese painting image identifying method based on local semantic concept
CN103033362A (en) * 2012-12-31 2013-04-10 湖南大学 Gear fault diagnosis method based on improving multivariable predictive models
WO2016080913A1 (en) * 2014-11-18 2016-05-26 Agency For Science, Technology And Research Method and device for traffic sign recognition
CN104506162A (en) * 2014-12-15 2015-04-08 西北工业大学 Fault prognosis method for high-order particle filter on basis of LS-SVR (least squares support vector regression) modeling
CN105160434A (en) * 2015-09-15 2015-12-16 武汉大学 Wind power ramp event prediction method by adopting SVM to select forecasting model
CN105956632A (en) * 2016-05-20 2016-09-21 浙江宇视科技有限公司 Target detection method and device
CN106444703A (en) * 2016-09-20 2017-02-22 西南石油大学 Rotating equipment running state fuzzy evaluation and prediction methods based on occurrence probability of fault modes
CN106503748A (en) * 2016-11-07 2017-03-15 湖南源信光电科技有限公司 A kind of based on S SIFT features and the vehicle targets of SVM training aids
CN106769051A (en) * 2017-03-10 2017-05-31 哈尔滨理工大学 A kind of rolling bearing remaining life Forecasting Methodology based on MCEA KPCA and combination S VR

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Unified modeling based on SVM and SVR for prediction of forest area ratio by human population density and relief energy";Ryuei Nishii et.al.;《IGARSS》;20151231;第2552-2555页 *
"基于SVR的组合预测模型及其应用";刘显德,高泓;《计算机工程与设计》;20091231;第19卷(第30期);第4506-4508页 *

Also Published As

Publication number Publication date
CN107886539A (en) 2018-04-06

Similar Documents

Publication Publication Date Title
CN107886539B (en) High-precision gear visual detection method in industrial scene
CN110543837B (en) Visible light airport airplane detection method based on potential target point
CN109800824B (en) Pipeline defect identification method based on computer vision and machine learning
CN107657279B (en) Remote sensing target detection method based on small amount of samples
CN105956582B (en) A kind of face identification system based on three-dimensional data
CN105574527B (en) A kind of quick object detecting method based on local feature learning
CN107767387B (en) Contour detection method based on variable receptive field scale global modulation
CN107230203B (en) Casting defect identification method based on human eye visual attention mechanism
CN110399884B (en) Feature fusion self-adaptive anchor frame model vehicle detection method
CN106485651B (en) The image matching method of fast robust Scale invariant
CN104036284A (en) Adaboost algorithm based multi-scale pedestrian detection method
CN102930300B (en) Method and system for identifying airplane target
CN105335725A (en) Gait identification identity authentication method based on feature fusion
CN108710909B (en) Counting method for deformable, rotary and invariant boxed objects
CN107256547A (en) A kind of face crack recognition methods detected based on conspicuousness
CN106709452B (en) Instrument position detection method based on intelligent inspection robot
CN110222661B (en) Feature extraction method for moving target identification and tracking
CN104298995A (en) Three-dimensional face identification device and method based on three-dimensional point cloud
CN112257711B (en) Method for detecting damage fault of railway wagon floor
Chu et al. Strip steel surface defect recognition based on novel feature extraction and enhanced least squares twin support vector machine
CN105809173A (en) Bionic vision transformation-based image RSTN (rotation, scaling, translation and noise) invariant attributive feature extraction and recognition method
CN114863189B (en) Intelligent image identification method based on big data
CN114821358A (en) Optical remote sensing image marine ship target extraction and identification method
CN105354547A (en) Pedestrian detection method in combination of texture and color features
CN107679467B (en) Pedestrian re-identification algorithm implementation method based on HSV and SDALF

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant