CN107392929B - Intelligent target detection and size measurement method based on human eye vision model


Info

Publication number
CN107392929B
CN107392929B CN201710580068.5A CN201710580068A CN107392929B CN 107392929 B CN107392929 B CN 107392929B CN 201710580068 A CN201710580068 A CN 201710580068A CN 107392929 B CN107392929 B CN 107392929B
Authority
CN
China
Prior art keywords
point
points
eye image
image
left eye
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710580068.5A
Other languages
Chinese (zh)
Other versions
CN107392929A (en)
Inventor
李庆武
周亚琴
马云鹏
邢俊
许金鑫
席淑雅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changzhou Campus of Hohai University
Original Assignee
Changzhou Campus of Hohai University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changzhou Campus of Hohai University filed Critical Changzhou Campus of Hohai University
Priority to CN201710580068.5A priority Critical patent/CN107392929B/en
Publication of CN107392929A publication Critical patent/CN107392929A/en
Application granted granted Critical
Publication of CN107392929B publication Critical patent/CN107392929B/en
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/13Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/136Segmentation; Edge detection involving thresholding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/60Analysis of geometric attributes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20228Disparity calculation for image-based rendering

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an intelligent target detection and size measurement method based on a human eye vision model. The method combines human-eye saliency detection with a binocular vision imaging model: salient edges of the target and feature points constrained by spatial information are detected as seed points, target detection is completed with a multi-feature-fusion growing mechanism, and size measurement is finally completed according to the binocular vision model. The intelligent target detection and size measurement system based on the human eye vision model can accurately detect and measure various targets, with a measurement error rate below 2%, meeting actual target detection and measurement requirements.

Description

Intelligent target detection and size measurement method based on human eye vision model
Technical Field
The invention relates to an intelligent target detection and size measurement method based on a human eye vision model, and belongs to the fields of digital image processing, target detection and size measurement.
Background
Size is one of the important characteristics of an object, so dimension measurement has become an important technology in production and daily life, for example workpiece measurement, measurement of large cultural buildings, and vehicle measurement. Existing dimension measurement methods can be classified into the following categories: coordinate measuring machines, theodolite measuring systems, total station measuring systems, articulated coordinate measuring machines, indoor GPS, laser tracking measuring systems, acoustic ranging systems, laser ranging systems, manual measurement and machine vision measuring systems. However, conventional target dimension measuring methods are easily restricted by factors such as object shape and background. Coordinate measuring machines, total stations, articulated coordinate measuring machines, manual measurement and laser tracking are all contact measurement methods, unsuitable for objects or materials that must not be touched, and therefore have a limited application range; acoustic and laser ranging are restricted by their measuring range and are difficult to use at long distances. Machine vision offers the notable advantages of non-contact operation, high measuring speed, high precision and strong real-time performance, and is one of the research hotspots in the field of dimension measurement.
Machine vision measurement methods can be divided by the number and type of vision sensors into monocular, binocular, multi-ocular, infrared, ultraviolet and mixed vision. Monocular vision, the simplest and fastest image acquisition method, is widely used, but the visual information it provides is limited: it contains little information, cannot convey three-dimensional structure, has a limited field of view and is easily influenced by the external environment. Compared with monocular vision, binocular vision simulates the human binocular vision system and obtains three-dimensional information of the environment and the target from the parallax between left and right eye images. Its specific advantages are: 1) by simulating the human binocular vision system, depth information can be acquired on top of the target's two-dimensional information, completing spatial positioning and three-dimensional measurement of the target; 2) having evolved through natural selection and survival of the fittest, the human binocular vision system is well suited to acquiring and processing multi-source images.
Existing binocular-vision size measurement methods achieve high precision, but they focus solely on improving measurement accuracy and omit the target detection step. In an actual measurement environment, targets appear against varied backgrounds, and accurate target detection is an important prerequisite for size measurement. Completing target detection before dimension measurement, i.e. intelligent target detection and dimension measurement, is therefore one of the problems to be solved at the present stage.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: existing dimension measurement methods omit the target detection step and have a narrow application range.
In order to solve the above technical problem, the invention provides an intelligent target detection and size measurement method based on a human eye vision model, comprising the following steps:
1) acquiring binocular images to be detected: from an angle at which the target appears as the complete object closest to the camera, shooting a group of left and right eye images with a calibrated binocular camera as the images to be measured;
2) extracting effective salient points: extracting a salient point map of the left eye image with a Gaussian difference algorithm, computing the mean value of all points whose saliency is greater than 0, taking this mean as the screening threshold, and extracting the pixels whose saliency exceeds the threshold as the image's salient points;
3) extracting salient edges: obtaining an edge map of the image with the Canny edge detection algorithm, counting the number of salient points on each edge, sorting all edge lines in descending order of salient-point count, and retaining the first a% of edges as salient edges, where a = 10;
4) registering the left and right eye images: detecting matching point pairs between the binocular images with the SURF algorithm, selecting matching pairs by minimizing the Euclidean distance between feature points, then screening the pairs by the slope between matched points, removing abnormal pairs and retaining the effective matching pairs;
5) extracting feature points constrained by spatial information: computing the parallax of all effective matching points, sorting the matching points in descending order of parallax, and extracting the top c% as the feature points constrained by spatial information, where c = 10;
6) target detection: rapidly growing the target region with a multi-feature fusion method combining saliency, spatial and color information to complete target detection;
7) size measurement using the binocular vision model: obtaining the two point pairs with the largest distance in the horizontal and vertical directions in the two-dimensional image of the detected target, computing their three-dimensional coordinates from the binocular parallax of the binocular vision model and the camera's intrinsic and extrinsic parameters, and taking the Euclidean distances of the two pairs in the horizontal and vertical directions as the length and height of the target, completing the size measurement of the target.
Compared with the prior art, the invention first combines human-eye saliency detection with a binocular three-dimensional perception model to extract salient edges and spatial-information-constrained feature points as seed points, then completes target detection with a growth mechanism fusing saliency, spatial and color information, and finally calculates the target size according to the binocular vision model. Automatic detection and size measurement of various targets is achieved, the application range is greatly broadened, the measurement error is below 2%, and actual measurement requirements are met.
Drawings
FIG. 1 is a flow chart of an intelligent target detection and size measurement method;
FIG. 2 is a schematic representation of salient edge extraction;
FIG. 3 is a schematic diagram of feature point extraction constrained by spatial information;
FIG. 4 is a schematic diagram of a multi-feature fusion growth mechanism;
FIG. 5 is a growth strategy diagram;
FIG. 6 is a schematic view of an object dimension measurement method;
FIG. 7 is a diagram of the target dimension measurement result.
Detailed Description
FIG. 1 is a flow chart of the intelligent target detection and size measurement method of the invention. The binocular camera is first calibrated with Zhang's calibration method; the calibrated camera then shoots left and right eye images of the target to be measured, from an angle at which the target appears as the complete object closest to the camera. A Gaussian difference algorithm (DoG) and the Canny algorithm are combined to compute salient edges, an improved SURF algorithm matches the left and right eye images, and feature points constrained by spatial information are obtained from the parallax. With the salient edges and these feature points as seed points, the target is rapidly detected using color similarity, and its size is calculated with the binocular vision model.
The method comprises the following specific steps in sequence:
(1) Shoot several groups of binocular images of a 7×7 checkerboard pattern, input them into the MATLAB calibration toolbox, complete the camera calibration and obtain the intrinsic and extrinsic parameters of the binocular camera.
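A minimal calibration sketch in Python with OpenCV is given below as an alternative to the MATLAB toolbox workflow described above; the 7×7 inner-corner pattern size, the file naming and the fixed-intrinsics flag are illustrative assumptions rather than part of the patent.

```python
import glob
import cv2
import numpy as np

PATTERN = (7, 7)  # inner-corner grid of the checkerboard (assumption)
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2)

obj_pts, left_pts, right_pts = [], [], []
for lf, rf in zip(sorted(glob.glob("left_*.png")), sorted(glob.glob("right_*.png"))):
    gl = cv2.imread(lf, cv2.IMREAD_GRAYSCALE)
    gr = cv2.imread(rf, cv2.IMREAD_GRAYSCALE)
    okl, cl = cv2.findChessboardCorners(gl, PATTERN)
    okr, cr = cv2.findChessboardCorners(gr, PATTERN)
    if okl and okr:  # keep only views where both cameras see the full board
        obj_pts.append(objp)
        left_pts.append(cl)
        right_pts.append(cr)

# Per-camera intrinsics, then the stereo extrinsics (R, T) between the cameras.
_, K1, d1, _, _ = cv2.calibrateCamera(obj_pts, left_pts, gl.shape[::-1], None, None)
_, K2, d2, _, _ = cv2.calibrateCamera(obj_pts, right_pts, gr.shape[::-1], None, None)
_, K1, d1, K2, d2, R, T, E, F = cv2.stereoCalibrate(
    obj_pts, left_pts, right_pts, K1, d1, K2, d2, gl.shape[::-1],
    flags=cv2.CALIB_FIX_INTRINSIC)
```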
(2) Using the calibrated binocular camera, shoot the target from an angle at which it appears as the complete object closest to the camera, acquire its left and right eye images, and transmit them to a PC through a USB interface for display and storage.
(3) As shown in FIG. 2(b), a Gaussian difference saliency map of the left eye image is extracted. The DoG algorithm excites the local central region and suppresses the surrounding neighborhood, which matches the visual characteristics of the human eye, so it reflects image saliency to a certain degree. The initial Gaussian difference saliency map of the left eye image is calculated as:

DoG(x, y) = [ 1/(2πσ1^2) · e^(−(x^2+y^2)/(2σ1^2)) − 1/(2πσ2^2) · e^(−(x^2+y^2)/(2σ2^2)) ] ⊗ I

where σ1 and σ2 denote the excitation and suppression bandwidths respectively (here σ1 = 0.6 and σ2 = 0.9), I is the grayscale image, the symbol ⊗ denotes convolution filtering of the image, and DoG(x, y) is the resulting saliency metric of pixel (x, y).

Negative values of the saliency metric DoG(x, y) are set to 0, and the saliency mean is taken as the threshold T for salient point extraction, giving the salient point map D(x, y):

D(x, y) = DoG(x, y) if DoG(x, y) > T, and 0 otherwise

T = sum(DoG > 0) / count(DoG > 0)

where count(DoG > 0) is the number of pixels whose saliency exceeds 0 in DoG, and sum(DoG > 0) is the sum of those saliency values.
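The DoG saliency map and mean-threshold salient-point extraction described above can be sketched as follows; OpenCV's GaussianBlur realizes the two Gaussian kernels, and letting OpenCV derive the kernel size from σ is an implementation assumption.

```python
import cv2
import numpy as np

def salient_points(gray, s1=0.6, s2=0.9):
    """Salient point map D(x, y) of a grayscale image, per the DoG model above."""
    gray = gray.astype(np.float32)
    # Difference of two Gaussian-blurred images approximates the DoG filter.
    dog = cv2.GaussianBlur(gray, (0, 0), s1) - cv2.GaussianBlur(gray, (0, 0), s2)
    dog[dog < 0] = 0                                # suppress negative responses
    pos = dog[dog > 0]
    T = pos.sum() / pos.size if pos.size else 0.0   # mean of positive saliency values
    return np.where(dog > T, dog, 0)                # keep only points above the mean
```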
(4) As shown in FIG. 2(c), the Canny edge detection algorithm detects all continuous edges of the image but cannot highlight salient objects; background and shadow edges in particular interfere strongly with target detection. To ensure the accuracy of target detection, the DoG saliency is used to constrain the Canny edges and extract the target's salient edges, as shown in FIG. 2(d). The salient edges are obtained as follows (a code sketch of these steps is given after this list):
a. screening effective edges: the Canny result contains many short edges caused by shadows, reflections and other factors, which hinder extraction of the target edge. Keeping the long edges as effective edges removes falsely detected edges, refines the edge information and reduces the amount of computation, so all edges are sorted by length and the longest 50% are retained as long edges;
b. calculating edge saliency: the number of salient points on each effective edge is counted as the measure of its saliency, edge saliency being proportional to this count;
c. extracting salient edges: the edges are sorted in descending order of salient-point count, and the first a% of edge lines are retained as salient edges.
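A sketch of steps a–c under the stated parameters (longest 50% kept, a = 10), consuming the salient point map D from the previous sketch; treating each connected component of the Canny map as one edge line and the Canny thresholds are implementation assumptions.

```python
import cv2
import numpy as np

def salient_edges(gray, D, a=10):
    """Keep the longest 50% of Canny edges, then the top a% by salient-point count."""
    edges = cv2.Canny(gray, 50, 150)               # thresholds are assumptions
    n, labels = cv2.connectedComponents(edges)     # one component ~ one edge line
    comps = [(labels == i) for i in range(1, n)]
    comps.sort(key=lambda m: m.sum(), reverse=True)
    comps = comps[: max(1, len(comps) // 2)]       # longest 50% = effective edges
    comps.sort(key=lambda m: (D[m] > 0).sum(), reverse=True)  # edge saliency score
    keep = comps[: max(1, len(comps) * a // 100)]  # first a% as salient edges
    out = np.zeros_like(edges)
    for m in keep:
        out[m] = 255
    return out
```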
(5) An improved SURF algorithm is used to register the left and right eye images and accurately match the random feature points of the left eye to those of the right eye. The SURF algorithm detects image feature points in scale space, selecting windows of different sizes when filtering the image and extracting feature points through the Hessian matrix, defined as:

H(x, σ) = [ L_xx(x, σ)  L_xy(x, σ)
            L_xy(x, σ)  L_yy(x, σ) ]

where σ is the scale of the Gaussian kernel function, L_xx(x, σ) is the convolution of the second-order Gaussian derivative ∂²g(σ)/∂x² with the image I(x, y) in the x direction, L_yy(x, σ) is the convolution of ∂²g(σ)/∂y² with I(x, y) in the y direction, and L_xy(x, σ) is the convolution of ∂²g(σ)/∂x∂y with I(x, y) in the xy direction. The Gaussian kernel is:

g(σ) = 1/(2πσ^2) · e^(−(x^2+y^2)/(2σ^2))
a. Selecting the main direction of each feature point: to guarantee the rotation invariance of the algorithm, SURF assigns a unique main direction to each feature point from the information of the surrounding pixels. First, with the detected feature point as the center, σ as the step size and a wavelet size of 4σ, the Haar wavelet responses in the horizontal and vertical directions are computed within a circular neighborhood of radius 6σ, where σ is the scale at which the point was detected. The Haar responses centered on the current feature point are then Gaussian-weighted with a Gaussian function of scale 2σ, so that pixels farther from the feature point receive smaller weights, giving new Haar wavelet response values. SURF constructs a sector window with an angle of 60 degrees, computes the Haar wavelet response inside the window, and rotates the window to traverse the whole circular region until the response inside the window is strongest; the direction of the sector window at that moment is defined as the main direction of the feature point;
b. Constructing the descriptor: centered on the current feature point and aligned with its main direction, a block of size 20σ is constructed and divided into 16 sub-regions; the Haar wavelet responses are computed in each 5σ × 5σ sub-region, each sub-region v being expressed as v = (Σdx, Σ|dx|, Σdy, Σ|dy|), finally giving a 64-dimensional descriptor of the point. The feature points of the left and right eye images can be defined as:

Pos1 = {(x'_1, y'_1), ..., (x'_i, y'_i), ..., (x'_m, y'_m)}, 1 ≤ i ≤ m
Pos2 = {(x_1, y_1), ..., (x_j, y_j), ..., (x_n, y_n)}, 1 ≤ j ≤ n

where Pos1 and Pos2 are the feature point parameters of the left and right eye images, m and n are the numbers of feature points in the left and right eye images respectively, i and j index the feature points of the left and right eye images, (x'_m, y'_m) denotes the m-th feature point coordinate of the left eye image, and (x_n, y_n) the n-th feature point coordinate of the right eye image;
c. Calculating the Euclidean distances between all points in the feature point parameters Pos1 and Pos2 of the left and right eye images, taking the points with minimal Euclidean distance as coarse matching pairs, sorting the coarse pairs by ascending Euclidean distance, deleting abnormal points and selecting the first K matching pairs, defined as:

Pos_K = {{(x'_1, y'_1), (x_1, y_1)}, {(x'_2, y'_2), (x_2, y_2)}, ..., {(x'_i, y'_i), (x_i, y_i)}, ..., {(x'_K, y'_K), (x_K, y_K)}}, 1 ≤ i ≤ K;
d. Screening the matching pairs by the slope between the corresponding points of the K pairs in Pos_K: the slopes of all coarse matching pairs are computed and kept to a precision of 10^−2, the frequency of every slope value is counted, the slope with the highest frequency is selected as the leading slope, the pairs corresponding to other, abnormal slopes are deleted, and the H groups of exact matching pairs Pos_K_new are obtained:

Pos_K_new = {{(x_z1, y_z1), (x_y1, y_y1)}, ..., {(x_zi, y_zi), (x_yi, y_yi)}, ..., {(x_zH, y_zH), (x_yH, y_yH)}}, 1 ≤ i ≤ H

where (x_zi, y_zi) and (x_yi, y_yi) denote the feature point coordinates of the left and right eye images, respectively, within one matching pair. A sketch of this matching and slope-screening procedure follows.
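A sketch of steps a–d; SURF requires the opencv-contrib xfeatures2d module, and the Hessian threshold, the choice of K and the epsilon guard against vertical point pairs are assumptions.

```python
from collections import Counter
import cv2

def matched_pairs(left_gray, right_gray, K=200):
    """SURF matching by minimal Euclidean distance, then slope screening."""
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
    kp1, des1 = surf.detectAndCompute(left_gray, None)
    kp2, des2 = surf.detectAndCompute(right_gray, None)
    matches = cv2.BFMatcher(cv2.NORM_L2).match(des1, des2)    # min-distance pairs
    matches = sorted(matches, key=lambda m: m.distance)[:K]   # first K coarse pairs
    pts = [(kp1[m.queryIdx].pt, kp2[m.trainIdx].pt) for m in matches]
    # Slope between the two members of each pair, kept to 1e-2 precision.
    slopes = [round((p1[1] - p2[1]) / (p1[0] - p2[0] + 1e-9), 2) for p1, p2 in pts]
    dominant = Counter(slopes).most_common(1)[0][0]           # leading slope
    return [p for p, s in zip(pts, slopes) if s == dominant]  # H exact pairs
```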
(6) Extracting the feature points constrained by spatial information: the parallaxes of all matching points are obtained directly from the SURF matches, sorted in descending order, and the first c% of matching points with the largest parallax are retained as the feature points constrained by spatial information. As shown in FIG. 3, the white automobile model is the complete target closest to the camera in the image, and the extracted spatially constrained feature points are marked in the left and right target images. All extracted feature points lie on the target, which reduces the probability of segmentation errors; generally c = 10. A minimal sketch is given below.
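A minimal sketch of this disparity screening, assuming the matching pairs come from the previous sketch as ((x_l, y_l), (x_r, y_r)) tuples:

```python
def spatial_feature_points(pairs, c=10):
    """Top c% of matching pairs by disparity d = x_l - x_r (descending order)."""
    by_disp = sorted(pairs, key=lambda p: p[0][0] - p[1][0], reverse=True)
    return by_disp[: max(1, len(by_disp) * c // 100)]
```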
(7) As shown in FIG. 4, a complete target cannot be obtained from the salient edges alone, but the salient edges together with the spatially constrained feature points accurately represent the target's features and satisfy the selection requirements for growing seed points. The invention therefore fuses salient edge information, spatial information and color information for rapid growing to complete target detection. FIG. 5 simulates the seed point growth process: repeated growth regions occur during growth, i.e. two seed points both judge the same region as a region to be grown, and this repeated operation reduces growth efficiency.
During growth of the seed set, the similarity judgment is the key grow-or-stop criterion. To quantify the similarity between the seed set and a region to be grown, a similarity function is used; if the similarity between the region to be grown and the seed set exceeds a set threshold, the region is grown. Because the HSI color model matches the visual characteristics of the human eye, and to reduce the influence of brightness on target detection, similarity is computed in the HSI color space.
The salient edges and the spatially constrained feature points are taken as growth seed points, all of which form the initial seed regions. The left eye image is converted to the HSI color model, the mean square deviations of the hue (H), saturation (S) and intensity (I) mean components between each region to be grown and the seed regions are computed and summed with weights, and the weighted sum is taken as the difference between the region to be grown and the seed region. If the difference is below a threshold, the point is grown and merged into the corresponding seed region, which is updated as the new seed region; the loop continues until no seed region has any qualifying pixels or the image edge is reached, at which point growing stops.
The salient edges and the spatially constrained feature points form the initial seed regions. Let the number of initial seed points be N, and let R_i be the i-th seed region, i ∈ [1, N], with p̃_i a region to be grown adjacent to R_i. Each seed region grows according to its similarity with its regions to be grown; newly grown pixels are continuously merged into the corresponding seed region as the new seed region for the next round, and growth stops when no seed region has a qualifying region to be grown.

The similarity function f(R_i, p̃_j) between a region to be grown p̃_j and the seed region R_i is defined as:

f(R_i, p̃_j) = ε1 · (1/k)·Σ_{t=1..k} (H_t − H̄_j)^2 + ε2 · (1/k)·Σ_{t=1..k} (S_t − S̄_j)^2 + ε3 · (1/k)·Σ_{t=1..k} (I_t − Ī_j)^2

where k is the number of pixels in the seed region, which keeps increasing as the region grows; j indexes the j-th pixel of the growth region; H_t, S_t and I_t are the hue, saturation and intensity values inside the seed region; H̄_j, S̄_j and Ī_j are the hue, saturation and intensity means inside the growth region p̃_j; and ε1, ε2 and ε3 are the weights that adjust the hue, saturation and intensity components. Since the mean square deviation reflects the degree of difference, and a larger deviation means a larger influence of that component on the image, the mean square deviations of the image's H, S and I components are used as ε1, ε2 and ε3, which improves the accuracy of target region detection. A simplified growing sketch follows.
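A simplified single-region sketch of this growing mechanism: seed pixels grow into 4-neighbours whose weighted HSI deviation from the current region mean stays below a threshold. OpenCV's HSV conversion stands in for the HSI model, fixed weights replace the adaptive mean-square-deviation weights, and the threshold value is an assumption.

```python
from collections import deque
import cv2
import numpy as np

def grow_region(bgr, seeds, thresh=0.02, w=(0.5, 0.3, 0.2)):
    """Grow a region from seed pixels (y, x) by weighted HSI similarity."""
    hsi = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV).astype(np.float32) / 255.0
    h, wd = hsi.shape[:2]
    mask = np.zeros((h, wd), dtype=bool)
    q = deque()
    ssum = np.zeros(3, np.float32)     # running channel sums for the region mean
    for y, x in seeds:
        if not mask[y, x]:
            mask[y, x] = True          # "processed" flag avoids repeated judgments
            ssum += hsi[y, x]
            q.append((y, x))
    count = len(q)
    while q:
        y, x = q.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < wd and not mask[ny, nx]:
                diff = (hsi[ny, nx] - ssum / count) ** 2  # squared deviation per channel
                if float(np.dot(w, diff)) < thresh:       # weighted similarity test
                    mask[ny, nx] = True
                    ssum += hsi[ny, nx]                   # update region statistics
                    count += 1
                    q.append((ny, nx))
    return mask
```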
(8) Let the baseline distance between the binocular vision cameras be b, the focal length of the cameras f, and the parallax d. Assuming the left and right eye images are registered, the parallax is the position difference of the same point in the image pair, d = x_l − x_r, where x_l and x_r are the abscissas of the matched point in the left and right eye images. The spatial coordinate (x_c, y_c, z_c) of a point P in the left camera coordinate system is calculated as follows, where (x_l, y_l) is the two-dimensional image coordinate of the matching point in the left eye image:

x_c = b·x_l / d,  y_c = b·y_l / d,  z_c = b·f / d

The point pairs with the maximum distance of the segmented target in the horizontal and vertical directions are found on the two-dimensional plane image, such as L1 and L2 in the horizontal direction and H1 and H2 in the vertical direction in FIG. 6. The spatial coordinates of the four points are obtained by the binocular vision principle, and the maximum Euclidean distances in the horizontal and vertical directions are computed from the three-dimensional coordinates as the horizontal width and vertical height of the target.

Let the four three-dimensional coordinates be L1(x_L1, y_L1, z_L1), L2(x_L2, y_L2, z_L2), H1(x_H1, y_H1, z_H1) and H2(x_H2, y_H2, z_H2); the maximum length and height of the target are calculated as:

Length = √((x_L1 − x_L2)^2 + (y_L1 − y_L2)^2 + (z_L1 − z_L2)^2)
Height = √((x_H1 − x_H2)^2 + (y_H1 − y_H2)^2 + (z_H1 − z_H2)^2)
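A sketch of this final measurement step under the model above; the baseline b, focal length f and the four extreme point pairs are assumed inputs in consistent units, with nonzero disparity.

```python
import numpy as np

def to_3d(xl, yl, xr, b, f):
    """Binocular model: d = xl - xr, then x_c = b*xl/d, y_c = b*yl/d, z_c = b*f/d."""
    d = xl - xr                       # disparity (assumed nonzero)
    return np.array([b * xl / d, b * yl / d, b * f / d])

def target_size(L1, L2, H1, H2, b, f):
    """Each argument is ((xl, yl), xr) for one extreme image point."""
    p = [to_3d(xy[0], xy[1], xr, b, f) for xy, xr in (L1, L2, H1, H2)]
    length = np.linalg.norm(p[0] - p[1])   # horizontal extent |L1 L2|
    height = np.linalg.norm(p[2] - p[3])   # vertical extent |H1 H2|
    return length, height
```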
as shown in FIG. 7, the present invention completes the detection and size measurement of the object, and calculates the length and height of the object with an accuracy of 0.1mm, and displays them on the image.
The present invention has been disclosed in terms of the preferred embodiment, but is not limited to it; all technical solutions obtained by equivalent substitution or equivalent transformation fall within the scope of the present invention.

Claims (8)

1. An intelligent target detection and size measurement method based on a human eye vision model is characterized by comprising the following steps:
1) acquiring binocular images to be detected: from an angle at which the target appears as the complete object closest to the camera, shooting a group of left and right eye images with a calibrated binocular camera as the images to be measured;
2) extracting effective salient points: extracting a salient point map of the left eye image with a Gaussian difference algorithm, computing the mean value of all points whose saliency is greater than 0, taking this mean as the screening threshold, and extracting the pixels whose saliency exceeds the threshold as the image's salient points;
3) extracting salient edges: obtaining an edge map of the image with the Canny edge detection algorithm, counting the number of salient points on each edge, sorting all edge lines in descending order of salient-point count, and retaining the first a% of edges as salient edges;
4) registering the left and right eye images: detecting matching point pairs between the binocular images with the SURF algorithm, selecting matching pairs by minimizing the Euclidean distance between feature points, then screening the pairs by the slope between matched points, removing abnormal pairs and retaining the effective matching pairs;
5) extracting feature points constrained by spatial information: computing the parallax of all effective matching points, sorting the matching points in descending order of parallax, and extracting the top c% as the feature points constrained by spatial information;
6) target detection: rapidly growing the target region with a multi-feature fusion method combining saliency, spatial and color information to complete target detection;
7) size measurement using the binocular vision model: obtaining the two point pairs with the largest distance in the horizontal and vertical directions in the two-dimensional image of the detected target, computing their three-dimensional coordinates from the binocular parallax of the binocular vision model and the camera's intrinsic and extrinsic parameters, and taking the Euclidean distances of the two pairs in the horizontal and vertical directions as the length and height of the target, completing the size measurement of the target.
2. The intelligent target detection and size measurement method based on a human eye vision model as claimed in claim 1, wherein in step 2) an initial Gaussian difference saliency map of the left eye image is calculated as:

DoG(x, y) = [ 1/(2πσ1^2) · e^(−(x^2+y^2)/(2σ1^2)) − 1/(2πσ2^2) · e^(−(x^2+y^2)/(2σ2^2)) ] ⊗ I

where σ1 and σ2 denote the excitation and suppression bandwidths respectively, I is the grayscale image, the symbol ⊗ denotes convolution filtering of the image, and DoG(x, y) is the resulting saliency metric of pixel (x, y);

negative values of the saliency metric DoG(x, y) are set to 0, and the saliency mean is taken as the threshold T for salient point extraction, giving the salient point map D(x, y):

D(x, y) = DoG(x, y) if DoG(x, y) > T, and 0 otherwise

T = sum(DoG > 0) / count(DoG > 0)

where count(DoG > 0) is the number of pixels whose saliency exceeds 0 in DoG, and sum(DoG > 0) is the sum of those saliency values.
3. The intelligent target detection and size measurement method based on a human eye vision model as claimed in claim 1, wherein in step 3), a = 10.
4. The intelligent target detection and size measurement method based on a human eye vision model as claimed in claim 1, wherein the step 4) comprises:
a. obtaining the feature point coordinates of the left and right eye images and the 64-dimensional descriptors of all feature points with the SURF algorithm, the feature points of the left and right eye images being defined as:

Pos1 = {(x'_1, y'_1), ..., (x'_i, y'_i), ..., (x'_m, y'_m)}, 1 ≤ i ≤ m
Pos2 = {(x_1, y_1), ..., (x_j, y_j), ..., (x_n, y_n)}, 1 ≤ j ≤ n

where Pos1 and Pos2 are the feature point parameters of the left and right eye images, m and n are the numbers of feature points in the left and right eye images respectively, i and j index the feature points of the left and right eye images, (x'_m, y'_m) denotes the m-th feature point coordinate of the left eye image, and (x_n, y_n) the n-th feature point coordinate of the right eye image;
b. calculating the Euclidean distances between all points in the feature point parameters Pos1 and Pos2 of the left and right eye images, taking the points with minimal Euclidean distance as coarse matching pairs, sorting the coarse pairs by ascending Euclidean distance, deleting abnormal points and selecting the first K matching pairs, defined as:

Pos_K = {{(x'_1, y'_1), (x_1, y_1)}, {(x'_2, y'_2), (x_2, y_2)}, ..., {(x'_i, y'_i), (x_i, y_i)}, ..., {(x'_K, y'_K), (x_K, y_K)}}, 1 ≤ i ≤ K;
c. screening the matching pairs by the slope between the corresponding points of the K pairs in Pos_K: computing the slopes of all coarse matching pairs, keeping all slope values to a precision of 10^−2, counting the frequency of every slope value, selecting the slope with the highest frequency as the leading slope, deleting the pairs corresponding to other, abnormal slopes, and updating to obtain the H groups of exact matching pairs Pos_K_new:

Pos_K_new = {{(x_z1, y_z1), (x_y1, y_y1)}, ..., {(x_zi, y_zi), (x_yi, y_yi)}, ..., {(x_zH, y_zH), (x_yH, y_yH)}}, 1 ≤ i ≤ H

where (x_zi, y_zi) and (x_yi, y_yi) denote the feature point coordinates of the left and right eye images, respectively, within one matching pair.
5. The intelligent target detection and size measurement method based on a human eye vision model as claimed in claim 1, wherein in step 5), c = 10.
6. The intelligent target detection and size measurement method based on a human eye vision model as claimed in claim 1, wherein in step 6), the salient edges and the feature points constrained by spatial information are taken as growth seed points, all growth seed points form the initial seed regions, the left eye image is converted into the HSI color model, the mean square deviations of the hue, saturation and intensity mean components between each region to be grown and the seed regions are computed in the HSI color information and summed with weights, and the weighted sum is taken as the difference between the region to be grown and the seed region; if the difference is below a threshold, the point is grown and merged into the corresponding seed region, which is updated as the new seed region, and the loop continues until no seed region has any qualifying pixels or the image edge is reached, at which point growing stops.
7. The intelligent target detection and size measurement method based on a human eye vision model as claimed in claim 6, wherein repeated growth regions exist during growth, i.e. two seed points both judge a certain region as a region to be grown, and the repeated operation reduces growth efficiency; if a pixel has been processed by a previous seed point, it is marked with a "processed pixel" label and subsequent seed points do not judge that region again.
8. The method as claimed in claim 1, wherein in step 7), the baseline distance between the binocular vision cameras is b, the focal length of the cameras is f, and the parallax d is the position difference of the same point in the image pair, d = x_l − x_r, where x_l and x_r are the abscissas of the matched point in the left and right eye images; the spatial coordinate (x_c, y_c, z_c) of a point P in the left camera coordinate system is calculated as follows, where (x_l, y_l) is the two-dimensional image coordinate of the matching point in the left eye image:

x_c = b·x_l / d,  y_c = b·y_l / d,  z_c = b·f / d

the point pairs with the maximum distance of the segmented target in the horizontal and vertical directions are found on the two-dimensional plane image, the spatial coordinates of the four points are obtained by the binocular vision principle, and the maximum Euclidean distances in the horizontal and vertical directions are computed from the three-dimensional coordinates as the horizontal width and vertical height of the target;
let the four three-dimensional coordinates be L1(x_L1, y_L1, z_L1), L2(x_L2, y_L2, z_L2), H1(x_H1, y_H1, z_H1) and H2(x_H2, y_H2, z_H2); the maximum length and height of the target are calculated as:

Length = √((x_L1 − x_L2)^2 + (y_L1 − y_L2)^2 + (z_L1 − z_L2)^2)
Height = √((x_H1 − x_H2)^2 + (y_H1 − y_H2)^2 + (z_H1 − z_H2)^2)
CN201710580068.5A 2017-07-17 2017-07-17 Intelligent target detection and size measurement method based on human eye vision model Active CN107392929B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710580068.5A CN107392929B (en) 2017-07-17 2017-07-17 Intelligent target detection and size measurement method based on human eye vision model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710580068.5A CN107392929B (en) 2017-07-17 2017-07-17 Intelligent target detection and size measurement method based on human eye vision model

Publications (2)

Publication Number Publication Date
CN107392929A CN107392929A (en) 2017-11-24
CN107392929B (en) 2020-07-10

Family

ID=60339282

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710580068.5A Active CN107392929B (en) 2017-07-17 2017-07-17 Intelligent target detection and size measurement method based on human eye vision model

Country Status (1)

Country Link
CN (1) CN107392929B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107950402B (en) * 2017-11-29 2020-10-30 北京伟景智能科技有限公司 Automatic control method of milking device based on binocular vision
CN108381549B (en) * 2018-01-26 2021-12-14 广东三三智能科技有限公司 Binocular vision guide robot rapid grabbing method and device and storage medium
CN109211198B (en) * 2018-08-15 2021-01-01 河海大学常州校区 Intelligent target detection and measurement system and method based on trinocular vision
CN109166153A (en) * 2018-08-21 2019-01-08 江苏德丰建设集团有限公司 Tower crane high altitude operation 3-D positioning method and positioning system based on binocular vision
CN109191533B (en) * 2018-08-21 2021-06-25 江苏德丰建设集团有限公司 Tower crane high-altitude construction method based on fabricated building
CN109242776B (en) * 2018-09-11 2023-04-07 江苏君英天达人工智能研究院有限公司 Double-lane line detection method based on visual system
CN111882621A (en) * 2020-07-22 2020-11-03 武汉大学 Rice thickness parameter automatic measurement method based on binocular image
CN113989489A (en) * 2021-10-22 2022-01-28 珠海格力电器股份有限公司 Distance detection method and system for non-smooth edge
CN114549449B (en) * 2022-02-17 2023-05-12 中国空气动力研究与发展中心超高速空气动力研究所 Fine quantitative identification method for global defects of small-size curved surface component

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2003098922A1 (en) * 2002-05-15 2003-11-27 The Board Of Governors For Higher Education, State Of Rhode Island And Providence Plantations An imaging system and method for tracking the motion of an object
CN104778721A (en) * 2015-05-08 2015-07-15 哈尔滨工业大学 Distance measuring method of significant target in binocular image
CN105894574A (en) * 2016-03-30 2016-08-24 清华大学深圳研究生院 Binocular three-dimensional reconstruction method
CN106709499A (en) * 2017-03-02 2017-05-24 西北工业大学 SIFT image feature point extraction method based on Canny operator and Hilbert-Huang transform

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Salient object detection: a benchmark; Borji A et al.; IEEE Transactions on Image Processing; 2015-10-07; vol. 24, no. 12; pp. 5706-5722 *
Salient region detection based on binocular vision; Zhong Liu et al.; 2012 7th IEEE Conference on Industrial Electronics and Applications; 2012-07-20; pp. 1862-1866 *
An improved Canny edge detection algorithm; Ji Ling et al.; Microprocessors; 2015-02-28; no. 01; pp. 40-43 *
Research and application of distance measurement based on binocular vision; Wang Daidong; China Masters' Theses Full-text Database, Information Science and Technology; 2017-05-15; no. 05; pp. I138-771 *

Also Published As

Publication number Publication date
CN107392929A (en) 2017-11-24

Similar Documents

Publication Publication Date Title
CN107392929B (en) Intelligent target detection and size measurement method based on human eye vision model
CN110781827B (en) Road edge detection system and method based on laser radar and fan-shaped space division
CN106650640B (en) Negative obstacle detection method based on laser radar point cloud local structure characteristics
CN106651752B (en) Three-dimensional point cloud data registration method and splicing method
CN109211198B (en) Intelligent target detection and measurement system and method based on trinocular vision
CN107248159A (en) A kind of metal works defect inspection method based on binocular vision
CN111340797A (en) Laser radar and binocular camera data fusion detection method and system
CN110647850A (en) Automatic lane deviation measuring method based on inverse perspective principle
CN106969706A (en) Workpiece sensing and three-dimension measuring system and detection method based on binocular stereo vision
CN106251353A (en) Weak texture workpiece and the recognition detection method and system of three-dimensional pose thereof
CN106595500A (en) Transmission line ice coating thickness measurement method based on unmanned aerial vehicle binocular vision
CN105865344A (en) Workpiece dimension measuring method and device based on machine vision
CN110334678A (en) A kind of pedestrian detection method of view-based access control model fusion
CN113470090A (en) Multi-solid-state laser radar external reference calibration method based on SIFT-SHOT characteristics
CN111046843A (en) Monocular distance measurement method under intelligent driving environment
CN113205604A (en) Feasible region detection method based on camera and laser radar
CN107392953B (en) Depth image identification method based on contour line
CN113221648A (en) Fusion point cloud sequence image guideboard detection method based on mobile measurement system
CN116486287A (en) Target detection method and system based on environment self-adaptive robot vision system
CN106709432B (en) Human head detection counting method based on binocular stereo vision
CN116079749A (en) Robot vision obstacle avoidance method based on cluster separation conditional random field and robot
CN116703895B (en) Small sample 3D visual detection method and system based on generation countermeasure network
CN108388854A (en) A kind of localization method based on improvement FAST-SURF algorithms
CN117496401A (en) Full-automatic identification and tracking method for oval target points of video measurement image sequences
CN105718929B (en) The quick round object localization method of high-precision and system under round-the-clock circumstances not known

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant