CN104778721B - Distance measurement method for salient targets in binocular images - Google Patents

Distance measurement method for salient targets in binocular images Download PDF

Info

Publication number
CN104778721B
CN104778721B CN201510233157.3A CN201510233157A
Authority
CN
China
Prior art keywords
point
image
formula
pixel
binocular
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510233157.3A
Other languages
Chinese (zh)
Other versions
CN104778721A (en)
Inventor
王进祥
杜奥博
石金进
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Xiaopeng Automobile Technology Co Ltd
Original Assignee
Guangzhou Xiaopeng Automobile Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Xiaopeng Automobile Technology Co Ltd filed Critical Guangzhou Xiaopeng Automobile Technology Co Ltd
Priority to CN201510233157.3A priority Critical patent/CN104778721B/en
Publication of CN104778721A publication Critical patent/CN104778721A/en
Application granted granted Critical
Publication of CN104778721B publication Critical patent/CN104778721B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Abstract

A distance measurement method for salient targets in binocular images. The present invention relates to a method for measuring the distance of a target in binocular images. Its purpose is to propose a distance measurement method for salient targets in binocular images, to solve the problem that existing target ranging methods are slow. Step 1: extract salient features from the binocular images using a visual saliency model, and mark the seed point and background point; Step 2: build a weighted graph over the binocular images; Step 3: using the seed point and background point from step 1 and the weighted graph from step 2, segment the salient target out of the binocular images with the random walk image segmentation algorithm; Step 4: perform key point matching on the salient target alone with the SIFT algorithm; Step 5: substitute the disparity matrix K′ obtained in step 4 into the binocular ranging model to obtain the salient target distance. The present invention can be applied to measuring the distance of salient targets in the forward-view image while an intelligent vehicle is driving.

Description

Distance measurement method for salient targets in binocular images
Technical field
The present invention relates to a method for measuring the distance of a target in binocular images, and more particularly to a method for measuring the distance of a salient target in binocular images, belonging to the technical field of image processing.
Background technology
Distance information is mainly used in traffic image processing to provide analysis for the control system of an automobile. In intelligent-vehicle research, the traditional approach to target ranging uses radar or laser of a specific wavelength. Compared with radar and laser, vision sensors have a price advantage and a wider field of view, and a vision sensor can identify the specific content of a target while measuring its distance.
However, current traffic images are information-rich, and traditional target ranging algorithms struggle to obtain the desired result in complex images: because they cannot find the salient target in the image and instead detect globally, processing is slow and much irrelevant data is introduced, so the algorithms cannot meet application requirements.
Summary of the invention
The purpose of the present invention is to propose a distance measurement method for salient targets in binocular images, to solve the problem that existing target ranging methods are slow.
The distance measurement method for salient targets in binocular images according to the present invention is realized by the following steps:
Step 1: extract salient features from the binocular images using a visual saliency model, and mark the seed point and background point, specifically including:
Step 1-1: preprocess; first, perform edge detection on the binocular images to generate the edge maps of the binocular images;
Step 1-2: extract salient features from the binocular images using the visual saliency model to generate the salient feature map;
Step 1-3: find the pixel with the maximum gray value in the salient feature map and label it as the seed point; then traverse the pixels in a 25 × 25 window centered on the seed point, and label the pixel whose gray value is less than 0.1 and which is farthest from the seed point as the background point;
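A minimal sketch of this marking step in Python (an illustration, not part of the patent), assuming the salient feature map is a 2-D NumPy float array normalized to [0, 1]; the function name and border handling are assumptions of this sketch:

import numpy as np

def mark_seed_and_background(saliency, win=25, bg_thresh=0.1):
    # Seed point: pixel with the maximum value in the salient feature map.
    seed_r, seed_c = np.unravel_index(np.argmax(saliency), saliency.shape)
    half = win // 2
    r0, r1 = max(seed_r - half, 0), min(seed_r + half + 1, saliency.shape[0])
    c0, c1 = max(seed_c - half, 0), min(seed_c + half + 1, saliency.shape[1])
    best_d, bg = -1.0, None
    for r in range(r0, r1):
        for c in range(c0, c1):
            # Background point: gray value below 0.1 and farthest from the seed.
            if saliency[r, c] < bg_thresh:
                d = (r - seed_r) ** 2 + (c - seed_c) ** 2
                if d > best_d:
                    best_d, bg = d, (r, c)
    return (seed_r, seed_c), bg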
Step 2: build a weighted graph over the binocular images;
A weighted graph is built over the binocular images using the classical Gaussian weight function:
Wij = e^(−β(gi − gj)²) (1)
where Wij denotes the weight between vertex i and vertex j, gi denotes the brightness of vertex i, gj the brightness of vertex j, β is a free parameter, and e is the base of the natural logarithm;
The Laplacian matrix L of the weighted graph is obtained by the following formula:
Lij = di if i = j; −Wij if vertices i and j are adjacent; 0 otherwise (2)
where Lij is the element of the Laplacian matrix L for vertices i and j, and di = ΣWij is the sum of the weights between vertex i and its surrounding points;
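The weight and Laplacian formulas above translate directly to a sparse-matrix construction. A sketch assuming a 4-connected pixel grid, brightness in [0, 1], and an illustrative β = 90 (the patent leaves β as a free parameter):

import numpy as np
from scipy.sparse import coo_matrix, diags

def graph_laplacian(gray, beta=90.0):
    h, w = gray.shape
    idx = np.arange(h * w).reshape(h, w)
    # Horizontal and vertical neighbour pairs of the 4-connected grid.
    pairs = [(idx[:, :-1].ravel(), idx[:, 1:].ravel(),
              gray[:, :-1].ravel(), gray[:, 1:].ravel()),
             (idx[:-1, :].ravel(), idx[1:, :].ravel(),
              gray[:-1, :].ravel(), gray[1:, :].ravel())]
    rows, cols, vals = [], [], []
    for i, j, gi, gj in pairs:
        wij = np.exp(-beta * (gi - gj) ** 2)   # Wij = e^(-beta (gi - gj)^2)
        rows += [i, j]; cols += [j, i]; vals += [wij, wij]
    W = coo_matrix((np.concatenate(vals),
                    (np.concatenate(rows), np.concatenate(cols))),
                   shape=(h * w, h * w)).tocsr()
    d = np.asarray(W.sum(axis=1)).ravel()      # di = sum of surrounding weights
    return (diags(d) - W).tocsr()              # L = D - W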
Step 3: using the seed point and background point from step 1 and the weighted graph from step 2, segment the salient target out of the binocular images with the random walk image segmentation algorithm;
Step 3-1: the pixels of the binocular images are separated into two classes according to the seed point and background point marked in step 1, namely the marked point set VM and the unmarked point set VU; according to VM and VU, the Laplacian matrix L is reordered so that the marked points come first and the unmarked points follow; L is then divided into the four blocks LM, LU, B and Bᵀ and expressed as follows:
L = [ LM  B ; Bᵀ  LU ] (3)
where LM is the Laplacian block from marked points to marked points, LU the block from unmarked points to unmarked points, and B and Bᵀ respectively the blocks from marked points to unmarked points and from unmarked points to marked points;
Step 3-2: solve the combinatorial Dirichlet integral D[x] from the Laplacian matrix and the marked points;
The combinatorial Dirichlet integral formula is:
D[x] = (1/2) xᵀ L x = (1/2) Σ Wij (xi − xj)² (4)
where x is the probability matrix of the vertices of the weighted graph reaching the marked points, and xi and xj are respectively the probabilities of vertices i and j reaching the marked points;
According to the marked point set VM and the unmarked point set VU, x is divided into the two parts xM and xU, where xM is the probability matrix corresponding to VM and xU the probability matrix corresponding to VU; formula (4) is decomposed into:
D[xU] = (1/2) (xMᵀ LM xM + 2 xUᵀ Bᵀ xM + xUᵀ LU xU) (5)
For a marked point s, define mˢ such that mᵢˢ = 1 if vertex i is s and mᵢˢ = 0 otherwise; differentiating D[xU] with respect to xU, the solution minimizing formula (5) gives the Dirichlet probability values of marked point s:
LU xˢ = −Bᵀ mˢ (6)
where xᵢˢ denotes the probability that vertex i reaches marked point s first;
Threshold segmentation is performed according to formula (7) on the xᵢˢ obtained from the combinatorial Dirichlet integral, generating the segmentation map:
si = 1 if xᵢˢ ≥ 0.5, and si = 0 otherwise (7)
where si is the pixel value of the position corresponding to vertex i in the segmentation map; pixels of value 1 in the segmentation map represent the salient target in the image, and pixels of value 0 represent the background;
Step 3-3: multiply the segmentation map pixel-wise with the original image to generate the target map, i.e. extract the segmented salient target, with the formula:
ti = si · Ii (8)
where ti is the gray value of vertex i in the target map T, and Ii is the gray value of the corresponding position i of the input image I(σ);
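Steps 3-1 to 3-3 then amount to one sparse linear solve. A sketch under the assumptions of the previous two sketches, with the 0.5 probability threshold of the two-label case and illustrative helper names:

import numpy as np
from scipy.sparse.linalg import spsolve

def random_walker_target(image, L, seed_idx, bg_idx, thresh=0.5):
    n = image.size
    marked = np.array([seed_idx, bg_idx])
    xM = np.array([1.0, 0.0])                # seed -> probability 1, background -> 0
    unmarked = np.setdiff1d(np.arange(n), marked)
    LU = L[unmarked][:, unmarked].tocsc()    # block of unmarked points
    B = L[marked][:, unmarked]               # marked-to-unmarked block
    xU = spsolve(LU, -B.T.dot(xM))           # solves LU x = -B^T m, formula (6)
    x = np.empty(n)
    x[marked], x[unmarked] = xM, xU
    s = (x.reshape(image.shape) >= thresh).astype(image.dtype)  # map of formula (7)
    return s * image                         # target map ti = si * Ii, formula (8)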
Step 4: key point matching is performed on the salient target alone by the SIFT algorithm;
Step 4-1: build a Gaussian pyramid from the target map and take pairwise differences of the filtered images to obtain the DoG images; the DoG image is defined as D(x, y, σ) and obtained as:
D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) * T(x, y) = C(x, y, kσ) − C(x, y, σ) (9)
where G(x, y, σ) = (1/(2πσ²)) e^(−((x − p/2)² + (y − q/2)²)/(2σ²)) is a variable-scale Gaussian function, p and q denote the dimensions of the Gaussian template, (x, y) is the position of the pixel in the Gaussian pyramid image, σ is the scale space factor of the image, k denotes a specific scale value, and C(x, y, σ) is defined as the convolution of G(x, y, σ) with the target map T(x, y), i.e. C(x, y, σ) = G(x, y, σ) * T(x, y);
Step 4-2: extreme points are found in adjacent DoG images; the position and scale of each extreme point are determined by fitting a three-dimensional quadratic function to obtain the key points, and stability detection is performed on the key points with the Hessian matrix to eliminate edge responses, specifically as follows:
(1) The curve fit D(X) is obtained by Taylor expansion of the scale-space DoG function:
D(X) = D + (∂Dᵀ/∂X) X + (1/2) Xᵀ (∂²D/∂X²) X (10)
where X = (x, y, σ)ᵀ and D is the curve fit; differentiating formula (10) and setting the derivative to 0 gives the offset of the extreme point, formula (11):
X̂ = −(∂²D/∂X²)⁻¹ (∂D/∂X) (11)
To remove extreme points of low contrast, formula (11) is substituted into formula (10) to obtain formula (12):
D(X̂) = D + (1/2) (∂Dᵀ/∂X) X̂ (12)
If the value of formula (12) is greater than 0.03, the extreme point is retained and its exact position and scale are obtained; otherwise it is discarded;
(2) Unstable key points are eliminated by screening with the Hessian matrix at the key point;
The curvature is calculated from the ratio between the eigenvalues of the Hessian matrix;
Edge points are judged from the curvature of the key point's neighborhood;
The curvature ratio threshold is set to 10: key points whose ratio exceeds 10 are deleted, and the others are retained; the points that remain are the stable key points;
Step 4-3: assign a direction parameter to each key point using the pixels of the 16 × 16 window around the key point;
For a key point detected in the DoG images, the gradient magnitude and direction are calculated with the formulas:
m(x, y) = √((C(x+1, y) − C(x−1, y))² + (C(x, y+1) − C(x, y−1))²), θ(x, y) = arctan((C(x, y+1) − C(x, y−1)) / (C(x+1, y) − C(x−1, y))) (13)
where C is the scale space in which the key point lies, m is the gradient magnitude and θ is the gradient direction of the point in question; a 16 × 16 neighborhood is delimited around the key point, the gradient magnitude and direction of the pixels in it are obtained, and the gradients of the points in this neighborhood are counted with a histogram; the abscissa of the histogram is the direction, 360 degrees being divided into 36 bins of 10 degrees each; the ordinate of the histogram is the gradient magnitude, the magnitudes of the points belonging to a given direction bin being added so that their sum serves as the ordinate; the main direction is defined as the direction of the bin whose gradient magnitude is the maximum hm, and the bins whose gradient magnitude is above 0.8·hm serve as auxiliary directions to strengthen the stability of matching;
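An illustrative sketch of this orientation-histogram step, assuming C is one scale of the Gaussian pyramid stored as a 2-D float array, (x, y) indexes rows and columns, and the 16 × 16 window lies inside the image:

import numpy as np

def key_point_directions(C, x, y, win=16, bins=36):
    half = win // 2
    hist = np.zeros(bins)
    for i in range(x - half, x + half):
        for j in range(y - half, y + half):
            dx = C[i + 1, j] - C[i - 1, j]
            dy = C[i, j + 1] - C[i, j - 1]
            m = np.hypot(dx, dy)                          # gradient magnitude, formula (13)
            theta = np.degrees(np.arctan2(dy, dx)) % 360  # gradient direction
            hist[int(theta // 10) % bins] += m            # 36 bins of 10 degrees
    hm = hist.max()
    # Main direction at the peak bin; bins above 0.8*hm become auxiliary directions.
    return [b * 10 for b in np.flatnonzero(hist >= 0.8 * hm)]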
Step 4-4: build a descriptor to state the local feature information of each key point;
First the coordinates around the key point are rotated to the direction of the key point;
Then a 16 × 16 window around the key point is chosen and divided into sixteen 4 × 4 sub-windows over the neighborhood; in each 4 × 4 sub-window the magnitude and direction of the corresponding gradients are calculated, and the gradient information of each sub-window is counted with a histogram of 8 bins; the descriptor is computed over the 16 × 16 window around the key point by a Gaussian weighting algorithm as follows:
h = mg · e^(−((x′ − a)² + (y′ − b)²)/(2(d/2)²)) (14)
where h is the descriptor, (a, b) is the position of the key point in the Gaussian pyramid image, mg is the gradient magnitude of the key point, i.e. the gradient magnitude of the histogram main direction of step 4-3, d = 16 is the side length of the window, (x, y) is the position of the pixel in the Gaussian pyramid image, and (x′, y′) are the new coordinates of the pixel in the neighborhood after rotating the coordinates to the direction of the key point; the new coordinates are calculated with the formulas:
x′ = x·cos θg − y·sin θg, y′ = x·sin θg + y·cos θg (15)
where θg is the gradient direction of the key point;
A 128-dimensional feature vector is obtained for each key point from the calculation over the 16 × 16 window, denoted H = (h1, h2, h3, ..., h128); the feature vector is normalized, the normalized feature vector being denoted Lg, with the normalization formula:
li = hi / √(Σj hj²) (16)
where Lg = (l1, l2, ..., li, ..., l128) is the normalized feature vector of the key point and li, i = 1, 2, 3, ..., is a given normalized component;
The Euclidean distance between key point feature vectors is used as the decision metric of key point similarity in the binocular images, the key points of the binocular images are matched, and the pixel coordinates of each pair of mutually matched key points form one group of key information;
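Steps 4-1 to 4-4 together with this matching rule correspond to a standard SIFT pipeline, so an off-the-shelf implementation can stand in for illustration; the sketch below uses OpenCV (cv2.SIFT_create and the brute-force L2 matcher are this sketch's choices, not the patent's), with left_target and right_target assumed to be the step-3 target maps as 8-bit grayscale images:

import cv2

sift = cv2.SIFT_create()
kp_l, des_l = sift.detectAndCompute(left_target, None)
kp_r, des_r = sift.detectAndCompute(right_target, None)
# Euclidean distance of the 128-d feature vectors as the similarity metric;
# cross-checking keeps only mutually best matches.
matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
matches = matcher.match(des_l, des_r)
pairs = [(kp_l[m.queryIdx].pt, kp_r[m.trainIdx].pt) for m in matches]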
Step 4-5: the generated matching key points are screened;
The horizontal disparity of the coordinates of each pair of key points is obtained and the disparity matrix is generated; the disparity matrix is defined as Kn = {k1, k2, ..., kn}, where n is the number of matched pairs and k1, k2, ..., kn are the disparities of the individual matched points;
The median km of the disparity matrix is obtained and the reference disparity matrix, denoted Kn′, is computed with the formula:
Kn′ = {k1 − km, k2 − km, ..., kn − km} (17)
The disparity threshold is set to 3; the disparities corresponding to the entries of Kn′ that exceed the threshold are deleted, yielding the final screened matrix result K′, where k′1, k′2, ..., k′n′ are the disparities of the correct matched points after screening and n′ is the final number of correct matches, with the formula:
K′ = {k′1, k′2, ..., k′n′} (18)
Step 5: the disparity matrix K′ obtained in step 4 is substituted into the binocular ranging model to obtain the salient target distance;
The two identical imaging systems are spaced a distance J apart in the horizontal direction, the two optical axes are each parallel to the horizontal plane, and the image planes are parallel to the vertical plane;
Assume a target point M (X, Y, Z) in the scene whose imaging points in the left and right images are respectively Pl (x1, y1) and Pr (x2, y2), where (x1, y1) and (x2, y2) are the coordinates of Pl and Pr in the vertical imaging plane; the disparity of the binocular model is defined as k = |Pl − Pr| = |x2 − x1|, and the ranging formula is obtained from the triangle similarity relation, X, Y and Z being the coordinates along the horizontal, vertical and depth axes of the space coordinate system:
z = f · J / (k · dx′) (19)
where dx′ denotes the physical distance along the horizontal axis occupied by each pixel on the imaging sensor, f is the focal length of the imaging system, and z is the distance from target point M to the line connecting the two imaging centers; the disparity matrix obtained in step 4 is substituted into formula (19), and from the physical information of the binocular model the corresponding distance matrix Z′ = {z1, z2, ..., zn′} is obtained, z1, z2, ..., zn′ being the salient target distances given by the individual matched disparities; the average of the distance matrix is finally taken as the distance Zf of the salient target in the binocular images, with the formula:
Zf = (z1 + z2 + ... + zn′) / n′ (20)
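A sketch of the final ranging step under assumed, consistent units (e.g. millimetres) for the focal length f, baseline J and pixel pitch dx′; it composes directly with the screening sketch, e.g. salient_target_distance(screen_disparities(pairs), f, J, dx):

import numpy as np

def salient_target_distance(k_screened, f, J, dx):
    # Formula (19): z = f * J / (k * dx') for each screened disparity k.
    z = f * J / (k_screened * dx)    # distance matrix Z'
    return z.mean()                  # formula (20): Zf is the mean distance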
The beneficial effects of the invention are as follows:
1. The present invention extracts the region of interest of the human eye by simulating the human visual system; the salient target extracted by the algorithm is substantially consistent with human detection results, so that the extraction enables the invention to recognize salient targets automatically as the human eye does.
2. The present invention completes salient target ranging automatically, without selecting the salient target by hand.
3. The present invention matches within one and the same target, which guarantees that the disparity results of the matched key points are close and allows mismatched points to be screened out effectively; the matching accuracy is close to 100%, and the relative error of the disparity is below 2%, which increases the ranging accuracy.
4. The matching information of the present invention is small, which effectively reduces the extra irrelevant matching computation by at least 75% and reduces the introduction of irrelevant data; the matched-data utilization rate exceeds 90%, so that salient target ranging can be achieved in complex image environments and image processing efficiency is improved.
5. The present invention measures the distance of salient targets in the forward-view image while an intelligent vehicle is driving, thereby providing key information for safe driving; it overcomes the shortcoming that traditional image ranging can only perform depth detection on the whole picture, and it well avoids the problems of large error and excessive noise.
6. The present invention segments the salient target by extracting the salient features of the binocular images, so that the target range shrinks, the matching time is reduced and efficiency is improved; matching the key points of the salient target yields the disparity and hence the distance measurement, and because the target lies on one vertical plane, erroneously matched key points can be screened well and precision is improved; the method of the invention can quickly recognize a salient target and accurately measure its distance.
Brief description of the drawings
Fig. 1 is the flow chart of the inventive method;
Fig. 2 is vision significance analysis process figure;
Fig. 3 is Random Walk Algorithm flow chart;
Fig. 4 is SIFT algorithm flow charts;
Fig. 5 is the binocular measuring system: X, Y, Z are the defined space coordinates, M is a point in space, Pl and Pr are the imaging points of M on the imaging planes, and f is the focal length of the imaging system.
Embodiment
The embodiments of the present invention are further described below with reference to the accompanying drawings.
Embodiment 1: This embodiment is described with reference to Fig. 1 to Fig. 5. The method described in this embodiment includes the following steps:
Step 1: extract salient features from the binocular images using a visual saliency model, and mark the seed point and background point, specifically including:
Saliency extraction is performed on the binocular images with the visual saliency model: the brightness, color and direction salient features of each pixel of the binocular images are computed, and the three salient features are normalized to obtain the weighted saliency map of the image. Each pixel in the saliency map represents the saliency of the corresponding position in the image. The point with the maximum pixel value in the map, i.e. the point of strongest saliency, is recorded as the seed point; the range around the seed point is progressively expanded to find the point of weakest saliency, which is recorded as the background point. The flow of extracting image saliency with the visual saliency model is shown in Fig. 2.
Step 1-1: preprocess; first, perform edge detection on the binocular images to generate the edge map of the binocular images; edge information is important saliency information of an image;
Step 1-2: extract salient features from the binocular images using the visual saliency model to generate the salient feature map;
Step 1-3: find the pixel with the maximum brightness in the salient feature map and label it as the seed point; then traverse the pixels in a 25 × 25 window centered on the seed point, and label the pixel whose gray value is less than 0.1 and which is farthest from the seed point as the background point;
Step 2: build a weighted graph over the binocular images;
A weighted graph is built over the binocular images using the classical Gaussian weight function: the gray difference between a pixel and its surrounding pixels in the binocular images is assigned as the weight of the edge between them, while each pixel is taken as a vertex, establishing a weighted graph containing vertices and edges;
Using graph theory, the entire image is regarded as an undirected weighted graph and each pixel as a vertex of the weighted graph, where the gray values of the pixels are used to weight the edges of the weighted graph by the classical Gaussian weight function, specifically:
Wij = e^(−β(gi − gj)²) (1)
where Wij denotes the weight between vertex i and vertex j, gi denotes the brightness of pixel i, gj the brightness of pixel j, β is a free parameter, and e is the base of the natural logarithm;
The Laplacian matrix L of the weighted graph is obtained by the following formula:
Lij = di if i = j; −Wij if pixels i and j are adjacent; 0 otherwise (2)
where Lij is the element of the Laplacian matrix L for vertices i and j, and di = ΣWij is the sum of the weights between vertex i and its surrounding points;
Step 3: using the seed point and background point from step 1 and the weighted graph from step 2, segment the salient target out of the binocular images with the random walk image segmentation algorithm;
Step 3-1: the pixels of the binocular images are separated into two classes according to the seed point and background point marked in step 1, namely the marked point set VM and the unmarked point set VU; according to VM and VU, the Laplacian matrix L is reordered so that the marked points come first and the unmarked points follow; L is then divided into the four blocks LM, LU, B and Bᵀ and expressed as follows:
L = [ LM  B ; Bᵀ  LU ] (3)
where LM is the Laplacian block from marked points to marked points, LU the block from unmarked points to unmarked points, and B and Bᵀ respectively the blocks from marked points to unmarked points and from unmarked points to marked points;
Step 3-2: solve the combinatorial Dirichlet integral D[x] from the Laplacian matrix and the marked points;
The combinatorial Dirichlet integral formula is:
D[x] = (1/2) xᵀ L x = (1/2) Σ Wij (xi − xj)² (4)
where x is the probability matrix of the vertices of the weighted graph reaching the marked points, and xi and xj are respectively the probabilities of vertices i and j reaching the marked points;
According to the marked point set VM and the unmarked point set VU, x is divided into the two parts xM and xU, where xM is the probability matrix corresponding to VM and xU the probability matrix corresponding to VU; formula (4) is decomposed into:
D[xU] = (1/2) (xMᵀ LM xM + 2 xUᵀ Bᵀ xM + xUᵀ LU xU) (5)
For a marked point s, define mˢ such that mᵢˢ = 1 if vertex i is s and mᵢˢ = 0 otherwise; differentiating D[xU] with respect to xU, the solution minimizing formula (5) gives the Dirichlet probability values of marked point s:
LU xˢ = −Bᵀ mˢ (6)
where xᵢˢ denotes the probability that vertex i reaches marked point s first;
Threshold segmentation is performed according to formula (7) on the xᵢˢ obtained from the combinatorial Dirichlet integral, generating the segmentation map:
si = 1 if xᵢˢ ≥ 0.5, and si = 0 otherwise (7)
where si is the pixel value of the position corresponding to vertex i in the segmentation map; pixels of value 1 in the segmentation map represent the salient target in the image, and pixels of value 0 represent the background;
Step 3-3: multiply the segmentation map pixel-wise with the original image to generate the target map, i.e. extract the segmented salient target, with the formula:
ti = si · Ii (8)
where ti is the gray value of the corresponding position i in the target map T, and Ii is the gray value of the corresponding position i of the input image I(σ);
Step 4: key point matching is performed on the salient target alone by the SIFT algorithm;
The segmented salient target alone is put through key point detection and matching by the SIFT algorithm; the matched coordinates are screened, the erroneous matching results are removed, and the correct matching results remain.
The flow of matching the binocular images with the SIFT algorithm is shown in Fig. 4.
Step 4-1: build a Gaussian pyramid from the target map and take pairwise differences of the filtered images to obtain the DoG images; the DoG image is defined as D(x, y, σ) and obtained as:
D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) * T(x, y) = C(x, y, kσ) − C(x, y, σ) (9)
where G(x, y, σ) = (1/(2πσ²)) e^(−((x − p/2)² + (y − q/2)²)/(2σ²)) is a variable-scale Gaussian function, p and q denote the dimensions of the Gaussian template, (x, y) is the position of the pixel in the Gaussian pyramid image, σ is the scale space factor of the image, k denotes a specific scale value, and C(x, y, σ) is defined as the convolution of G(x, y, σ) with the target map T(x, y), i.e. C(x, y, σ) = G(x, y, σ) * T(x, y);
Step 4-2: extreme points are found in adjacent DoG images; the position and scale of each extreme point are determined by fitting a three-dimensional quadratic function to obtain the key points, and stability detection is performed on the key points with the Hessian matrix to eliminate edge responses, specifically as follows:
Key points are composed of the local extreme points of the DoG images: each point of a DoG image is traversed, and its gray value is compared with those of its 8 neighboring points at the same scale and the 2 × 9 points at the adjacent scales above and below, 26 points in total; if it is larger than all of its surrounding neighbors or smaller than all of them, it is an extreme point.
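A sketch of this 26-neighbour test, assuming dog is a list of same-sized DoG images of one octave and (s, r, c) indexes scale, row and column away from the borders:

import numpy as np

def is_extreme_point(dog, s, r, c):
    # 3x3x3 cube: 8 same-scale neighbours plus 2 x 9 points of adjacent scales.
    cube = np.stack([dog[s - 1][r - 1:r + 2, c - 1:c + 2],
                     dog[s][r - 1:r + 2, c - 1:c + 2],
                     dog[s + 1][r - 1:r + 2, c - 1:c + 2]])
    v = dog[s][r, c]
    # Strictly larger (or smaller) than all 26 neighbours (v compares with itself
    # as equal, so 26 strict inequalities mean an extremum).
    return (cube < v).sum() == 26 or (cube > v).sum() == 26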
The extreme points so obtained are not yet real key points; to improve stability, it is necessary (1) to obtain the curve fit D(X) by Taylor expansion of the scale-space DoG function:
D(X) = D + (∂Dᵀ/∂X) X + (1/2) Xᵀ (∂²D/∂X²) X (10)
where X = (x, y, σ)ᵀ and D is the curve fit; differentiating formula (10) and setting the derivative to 0 gives the offset of the extreme point, formula (11):
X̂ = −(∂²D/∂X²)⁻¹ (∂D/∂X) (11)
To remove extreme points of low contrast, formula (11) is substituted into formula (10) to obtain formula (12):
D(X̂) = D + (1/2) (∂Dᵀ/∂X) X̂ (12)
If the value of formula (12) is greater than 0.03, the extreme point is retained and its exact position and scale are obtained; otherwise it is discarded;
(2) Unstable key points are eliminated by screening with the Hessian matrix at the key point;
The curvature is calculated from the ratio between the eigenvalues of the Hessian matrix;
Edge points are judged from the curvature of the key point's neighborhood;
The curvature ratio threshold is set to 10: key points whose ratio exceeds 10 are deleted, and the others are retained; the points that remain are the stable key points;
If the value of formula (12) is greater than 0.03, the extreme point is retained and its exact position (the original position plus the fitted offset) and scale are obtained; otherwise it is discarded. To eliminate unstable key points, screening is performed with the Hessian matrix at the key point:
H = [ Dxx  Dxy ; Dxy  Dyy ]
where Dxx, Dxy and Dyy are the second-order partial derivatives of the DoG image at the key point.
Step 4-3: after the position and scale of a key point are determined, a direction needs to be assigned to the key point, and the key point descriptor is stated relative to this direction. A direction parameter is assigned to each key point using the pixels of the 16 × 16 window around the key point;
For a key point detected in the DoG images, the gradient magnitude and direction are calculated with the formulas:
m(x, y) = √((C(x+1, y) − C(x−1, y))² + (C(x, y+1) − C(x, y−1))²), θ(x, y) = arctan((C(x, y+1) − C(x, y−1)) / (C(x+1, y) − C(x−1, y))) (13)
where C is the scale space in which the key point lies, m is the gradient magnitude and θ is the gradient direction of the key point; a neighborhood is delimited around the key point and the gradients of the points in this neighborhood are counted with a histogram;
The abscissa of the histogram is the direction: 360 degrees are divided into 36 bins, each bin corresponding to 10 degrees. The ordinate of the histogram is the gradient magnitude: the magnitudes of the points belonging to a given direction bin are added, and their sum serves as the ordinate. The main direction is defined as the direction of the bin whose gradient magnitude is the maximum hm, and the other bins whose height is above 0.8·hm serve as auxiliary directions to strengthen the stability of matching.
Step 4-4: after the stages above, each detected key point carries three pieces of information: position, direction and scale. A descriptor is built for each key point to state its local feature information.
First the coordinates around the key point are rotated to the direction of the key point. Then a 16 × 16 window around the key point is chosen and divided into sixteen 4 × 4 sub-windows over the neighborhood. In each 4 × 4 sub-window the magnitude and direction of the corresponding gradients are calculated, and the gradient information of each sub-window is counted with a histogram of 8 bins. The descriptor is computed over the 16 × 16 window around the key point by a Gaussian weighting algorithm as follows:
h = mg · e^(−((x′ − a)² + (y′ − b)²)/(2(d/2)²)) (14)
where h is the descriptor, (a, b) is the position of the key point in the Gaussian pyramid image, mg is the gradient magnitude of the key point, d = 16 is the side length of the window, (x, y) is the position of the pixel in the Gaussian pyramid image, and (x′, y′) are the new coordinates of the pixel in the neighborhood after rotating the coordinates to the direction of the key point; the new coordinates are calculated with the formulas:
x′ = x·cos θg − y·sin θg, y′ = x·sin θg + y·cos θg (15)
where θg is the direction of the key point.
A 128-dimensional feature vector is obtained for each key point from the calculation over the 16 × 16 window, denoted H = (h1, h2, h3, ..., h128); to reduce the influence of illumination, the feature vector is normalized, the normalized feature vector being denoted Lg, with the normalization formula:
li = hi / √(Σj hj²) (16)
where Lg = (l1, l2, l3, ..., l128) is the normalized feature vector of the key point;
After the descriptors of the key points of the two images of the binocular pair have all been generated, the Euclidean distance between key point feature vectors is used as the decision metric of key point similarity in the binocular images and the key points of the binocular images are matched; the pixel coordinates of each pair of mutually matched key points form one group of key information;
Step 4-5: to avoid erroneous results to the greatest extent, the generated matching key points are screened;
Because the measuring system is a binocular model, the key points of the salient target lie on one plane in the two images, and the horizontal disparity of each pair of key points is theoretically equal. Therefore the horizontal disparity of the coordinates of each pair of key points is obtained and the disparity matrix is generated; the disparity matrix is defined as Kn = {k1, k2, ..., kn}, where n is the number of matched pairs and k1, k2, ..., kn are the disparities of the individual matched points;
The median km of the disparity matrix is obtained and the reference disparity matrix, denoted Kn′, is computed with the formula:
Kn′ = {k1 − km, k2 − km, ..., kn − km}
The disparity threshold is set to 3; the disparities corresponding to the entries of Kn′ that exceed the threshold are deleted, yielding the final screened matrix result K′ and avoiding the interference brought by erroneously matched key points. k′1, k′2, ..., k′n′ are the disparities of the correct matched points after screening and n′ is the final number of correct matches, with the formula:
K′ = {k′1, k′2, ..., k′n′}
Step 5: the disparity matrix K′ obtained in step 4 is substituted into the binocular ranging model to obtain the salient target distance;
The coordinates of the key points matched on the salient target are subtracted from one another to obtain the disparity of the salient target in the binocular images. The disparity is substituted into the binocular ranging model to obtain the salient target distance.
Binocular imaging obtains two images of the same scene from different viewing angles; the binocular model is shown in Fig. 5.
The two identical imaging systems are spaced a distance J apart in the horizontal direction, the two optical axes are each parallel to the horizontal plane, and the image planes are parallel to the vertical plane;
Assume a point M (X, Y, Z) in the scene whose imaging points in the left and right images are respectively Pl (x1, y1) and Pr (x2, y2), where (x1, y1) and (x2, y2) are the coordinates of Pl and Pr in the vertical imaging plane; the disparity of the binocular model is defined as k = |Pl − Pr| = |x2 − x1|, and the ranging formula is obtained from the triangle similarity relation, X, Y and Z being the coordinates along the horizontal, vertical and depth axes of the space coordinate system:
z = f · J / (k · dx′) (19)
where dx′ denotes the physical distance along the horizontal axis occupied by each pixel on the imaging sensor, f is the focal length of the imaging system, and z is the distance from target point M to the line connecting the two imaging centers; the disparity matrix obtained in step 4 is substituted into formula (19), and from the physical information of the binocular model the corresponding distance matrix Z′ = {z1, z2, ..., zn′} is obtained, z1, z2, ..., zn′ being the salient target distances given by the individual matched disparities; the average of the distance matrix is finally taken as the distance Zf of the salient target in the binocular images, with the formula:
Zf = (z1 + z2 + ... + zn′) / n′ (20)
Embodiment 2: This embodiment is described with reference to the figures. It differs from embodiment 1 in that the detailed process of performing edge detection on the binocular images in step 1-1 is:
Step 1-1-1: a convolution with a 2-D Gaussian filter template is applied to the binocular images to eliminate the noise of the images;
Step 1-1-2: the gradient magnitude and gradient direction of the pixels of the filtered binocular image I(x, y) are calculated using the differences of the first-order partial derivatives in the horizontal and vertical directions; the partial derivatives dx and dy in the x and y directions are respectively:
dx = [I(x+1, y) − I(x−1, y)]/2 (21)
dy = [I(x, y+1) − I(x, y−1)]/2 (22)
The gradient magnitude is then:
D′ = (dx² + dy²)^(1/2) (23)
and the gradient direction is:
θ′ = arctan(dy/dx) (24)
where D′ and θ′ respectively denote the gradient magnitude and gradient direction of the pixels of the filtered binocular image I(x, y);
Step 1-1-3: non-maximum suppression is performed on the gradient, then double-threshold processing is applied to the image to generate the edge image; in the edge image the gray value of edge points is 255 and the gray value of non-edge points is 0.
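These three sub-steps are essentially the Canny pipeline (Gaussian smoothing, central-difference gradients, non-maximum suppression with double thresholding), so OpenCV's implementation can serve as a compact stand-in for illustration; the kernel size, σ and the two thresholds below are assumptions of this sketch, not values from the patent, and image is assumed to be an 8-bit grayscale input:

import cv2

blurred = cv2.GaussianBlur(image, (5, 5), 1.4)   # 2-D Gaussian filter template
edges = cv2.Canny(blurred, 50, 150)              # edge points 255, non-edge points 0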
Embodiment 3: This embodiment is described with reference to the figures. It differs from embodiment 1 or 2 in that the detailed process of extracting salient features from the binocular images using the visual saliency model and generating the salient feature map described in step 1-2 is:
Step 1-2-1: after edge detection of the binocular images, the original image and the edge image are superimposed:
I1(σ) = 0.7·I(σ) + 0.3·C(σ) (25)
where I(σ) is the original input binocular image, C(σ) is the edge image, and I1(σ) is the image after superimposition;
Step 1-2-2: a nine-layer Gaussian pyramid is computed from the superimposed image using the Gaussian difference function, where layer 0 is the input superimposed image and layers 1 to 8 are each formed from the previous layer by Gaussian filtering and down-sampling, their sizes corresponding to 1/2 to 1/256 of the input image; brightness, color and direction features are extracted from each layer of the Gaussian pyramid, and the corresponding brightness pyramid, color pyramids and direction pyramids are generated;
The brightness extraction formula is:
In = (r + g + b)/3 (26)
where r, g and b correspond respectively to the red, green and blue components of the color of the input binocular image, and In is the brightness;
The color feature extraction formulas are:
R = r − (g + b)/2 (27)
G = g − (r + b)/2 (28)
B = b − (r + g)/2 (29)
Y = r + g − 2(|r − g| + b) (30)
where R, G, B and Y correspond to the color components of the superimposed image;
O(σ, ω) is the direction feature extracted by Gabor filtering of the brightness In over the scale dimension, where σ is the Gaussian pyramid level and ω is the direction of the Gabor function, with σ ∈ [0, 1, 2, ..., 8] and ω ∈ [0°, 45°, 90°, 135°];
Step 1-2-3: center-surround differences are computed for the brightness, color and direction features of the different scales of the obtained Gaussian pyramids, specifically:
Let scale c (c ∈ {2, 3, 4}) be the center scale and scale u (u = c + δ, δ ∈ {3, 4}) the surround scale; in the 9-layer Gaussian pyramid there are 6 combinations of center scale c and surround scale u (2-5, 2-6, 3-6, 3-7, 4-7, 4-8);
The center-surround difference is represented by the difference between the feature maps at scale c and scale u; the local contrast formulas are:
In(c, u) = |In(c) − In(u)| (31)
RG(c, u) = |(R(c) − G(c)) − (G(u) − R(u))| (32)
BY(c, u) = |(B(c) − Y(c)) − (Y(u) − B(u))| (33)
O(c, u, ω) = |O(c, ω) − O(u, ω)| (34)
where, before taking the difference, the two maps must be brought to the same size by interpolation;
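A sketch of one across-scale difference, e.g. In(c, u) of formula (31), assuming pyr is a list of pyramid levels (level 0 the largest) stored as float arrays; bilinear interpolation is this sketch's choice:

import cv2
import numpy as np

def center_surround(pyr, c, u):
    # Interpolate the surround level u up to the size of centre level c,
    # then take the point-wise absolute difference.
    surround = cv2.resize(pyr[u], (pyr[c].shape[1], pyr[c].shape[0]),
                          interpolation=cv2.INTER_LINEAR)
    return np.abs(pyr[c] - surround)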
Step 1-2-4: the feature maps of the different features generated by taking differences are fused by normalization to generate the salient feature map of the input binocular image, specifically:
First the scale-contrast feature maps of each feature are normalized and fused to generate the comprehensive feature map of that feature: Ī is the brightness normalized feature map, C̄ is the color normalized feature map, and Ō is the direction normalized feature map; the calculation process is as follows:
Ī = ⊕c ⊕u N(In(c, u)), C̄ = ⊕c ⊕u [N(RG(c, u)) + N(BY(c, u))], Ō = Σω N(⊕c ⊕u N(O(c, u, ω)))
where ⊕ denotes across-scale addition over the six (c, u) combinations;
where N(·) denotes the normalization function: for the feature map to be calculated, the feature value of every pixel in the feature map is first normalized into a closed interval [0, 255]; then the global maximum saliency value A is found in the normalized feature map and the average value a of the local maxima of the feature map is obtained; finally the feature value corresponding to every pixel of the feature map is multiplied by (A − a)²;
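A sketch of the N(·) operator as described; the 3 × 3 neighbourhood used for the local-maximum search is an assumption of this sketch, since the text does not specify the window:

import numpy as np
from scipy.ndimage import maximum_filter

def normalize_map(fmap):
    # Rescale the feature values into [0, 255].
    m = fmap - fmap.min()
    if m.max() > 0:
        m = m / m.max() * 255.0
    A = m.max()                                   # global maximum saliency value A
    peaks = (maximum_filter(m, size=3) == m) & (m > 0)
    a = m[peaks].mean() if peaks.any() else 0.0   # average a of the local maxima
    return m * (A - a) ** 2                       # weight the map by (A - a)^2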
The comprehensive feature maps of the features are then normalized and combined to obtain the final salient feature map S, with the calculation process:
S = (1/3) (N(Ī) + N(C̄) + N(Ō))
Claims (3)

1. A distance measurement method for salient targets in binocular images, characterized in that the method comprises the following steps:
Step 1: extract salient features from the binocular images using a visual saliency model, and mark the seed point and background point, specifically including:
Step 1-1: preprocess; first, perform edge detection on the binocular images to generate the edge maps of the binocular images;
Step 1-2: extract salient features from the binocular images using the visual saliency model to generate the salient feature map;
Step 1-3: find the pixel with the maximum gray value in the salient feature map and label it as the seed point; then traverse the pixels in a 25 × 25 window centered on the seed point, and label the pixel whose gray value is less than 0.1 and which is farthest from the seed point as the background point;
Step 2: build a weighted graph over the binocular images;
A weighted graph is built over the binocular images using the classical Gaussian weight function:
Wij = e^(−β(gi − gj)²) (1)
where Wij denotes the weight between vertex i and vertex j, gi denotes the brightness of vertex i, gj the brightness of vertex j, β is a free parameter, and e is the base of the natural logarithm;
The Laplacian matrix L of the weighted graph is obtained by the following formula:
Lij = di if i = j; −Wij if vertices i and j are adjacent; 0 otherwise (2)
where Lij is the element of the Laplacian matrix L for vertices i and j, and di = ΣWij is the sum of the weights between vertex i and its surrounding points;
Step 3: using the seed point and background point from step 1 and the weighted graph from step 2, segment the salient target out of the binocular images with the random walk image segmentation algorithm;
Step 3-1: the pixels of the binocular images are separated into two classes according to the seed point and background point marked in step 1, namely the marked point set VM and the unmarked point set VU; according to VM and VU, the Laplacian matrix L is reordered so that the marked points come first and the unmarked points follow; L is then divided into the four blocks LM, LU, B and Bᵀ and expressed as follows:
L = [ LM  B ; Bᵀ  LU ] (3)
where LM is the Laplacian block from marked points to marked points, LU the block from unmarked points to unmarked points, and B and Bᵀ respectively the blocks from marked points to unmarked points and from unmarked points to marked points;
Step 3-2: solve the combinatorial Dirichlet integral D[x] from the Laplacian matrix and the marked points;
The combinatorial Dirichlet integral formula is:
D[x] = (1/2) xᵀ L x = (1/2) Σ Wij (xi − xj)² (4)
where x is the probability matrix of the vertices of the weighted graph reaching the marked points, and xi and xj are respectively the probabilities of vertices i and j reaching the marked points;
According to the marked point set VM and the unmarked point set VU, x is divided into the two parts xM and xU, where xM is the probability matrix corresponding to VM and xU the probability matrix corresponding to VU; formula (4) is decomposed into:
D[xU] = (1/2) (xMᵀ LM xM + 2 xUᵀ Bᵀ xM + xUᵀ LU xU) (5)
For a marked point s, define mˢ such that mᵢˢ = 1 if vertex i is s and mᵢˢ = 0 otherwise; differentiating D[xU] with respect to xU, the solution minimizing formula (5) gives the Dirichlet probability values of marked point s:
LU xˢ = −Bᵀ mˢ (6)
where xᵢˢ denotes the probability that vertex i reaches marked point s first;
Threshold segmentation is performed according to formula (7) on the xᵢˢ obtained from the combinatorial Dirichlet integral, generating the segmentation map:
si = 1 if xᵢˢ ≥ 0.5, and si = 0 otherwise (7)
where si is the pixel value of the position corresponding to vertex i in the segmentation map; pixels of value 1 in the segmentation map represent the salient target in the image, and pixels of value 0 represent the background;
Step 3-3: multiply the segmentation map pixel-wise with the original image to generate the target map, i.e. extract the segmented salient target, with the formula:
ti = si · Ii (8)
where ti is the gray value of vertex i in the target map T, and Ii is the gray value of the corresponding position i of the input image I(σ);
Step 4: key point matching is performed on the salient target alone by the SIFT algorithm;
Step 4-1: build a Gaussian pyramid from the target map and take pairwise differences of the filtered images to obtain the DoG images; the DoG image is defined as D(x, y, σ) and obtained as:
D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) * T(x, y) = C(x, y, kσ) − C(x, y, σ) (9)
where G(x, y, σ) = (1/(2πσ²)) e^(−((x − p/2)² + (y − q/2)²)/(2σ²)) is a variable-scale Gaussian function, p and q denote the dimensions of the Gaussian template, (x, y) is the position of the pixel in the Gaussian pyramid image, σ is the scale space factor of the image, k denotes a specific scale value, and C(x, y, σ) is defined as the convolution of G(x, y, σ) with the target map T(x, y), i.e. C(x, y, σ) = G(x, y, σ) * T(x, y);
Step 4-2: extreme points are found in adjacent DoG images; the position and scale of each extreme point are determined by fitting a three-dimensional quadratic function to obtain the key points, and stability detection is performed on the key points with the Hessian matrix to eliminate edge responses, specifically as follows:
(1) The curve fit D(X) is obtained by Taylor expansion of the scale-space DoG function:
D(X) = D + (∂Dᵀ/∂X) X + (1/2) Xᵀ (∂²D/∂X²) X (10)
where X = (x, y, σ)ᵀ and D is the curve fit; differentiating formula (10) and setting the derivative to 0 gives the offset of the extreme point, formula (11):
X̂ = −(∂²D/∂X²)⁻¹ (∂D/∂X) (11)
To remove extreme points of low contrast, formula (11) is substituted into formula (10) to obtain formula (12):
D(X̂) = D + (1/2) (∂Dᵀ/∂X) X̂ (12)
If the value of formula (12) is greater than 0.03, the extreme point is retained and its exact position and scale are obtained; otherwise it is discarded;
(2) Unstable key points are eliminated by screening with the Hessian matrix at the key point;
The curvature is calculated from the ratio between the eigenvalues of the Hessian matrix;
Edge points are judged from the curvature of the key point's neighborhood;
The curvature ratio threshold is set to 10: key points whose ratio exceeds 10 are deleted, and the others are retained; the points that remain are the stable key points;
Step 4-3: assign a direction parameter to each key point using the pixels of the 16 × 16 window around the key point;
For a key point detected in the DoG images, the gradient magnitude and direction are calculated with the formulas:
m(x, y) = √((C(x+1, y) − C(x−1, y))² + (C(x, y+1) − C(x, y−1))²), θ(x, y) = arctan((C(x, y+1) − C(x, y−1)) / (C(x+1, y) − C(x−1, y))) (13)
where C is the scale space in which the key point lies, m is the gradient magnitude and θ is the gradient direction of the point in question; a 16 × 16 neighborhood is delimited around the key point, the gradient magnitude and direction of the pixels in it are obtained, and the gradients of the points in this neighborhood are counted with a histogram; the abscissa of the histogram is the direction, 360 degrees being divided into 36 bins of 10 degrees each; the ordinate of the histogram is the gradient magnitude, the magnitudes of the points belonging to a given direction bin being added so that their sum serves as the ordinate; the main direction is defined as the direction of the bin whose gradient magnitude is the maximum hm, and the bins whose gradient magnitude is above 0.8·hm serve as auxiliary directions to strengthen the stability of matching;
Step 4-4: build a descriptor to state the local feature information of each key point;
First the coordinates around the key point are rotated to the direction of the key point;
Then a 16 × 16 window around the key point is chosen and divided into sixteen 4 × 4 sub-windows over the neighborhood; in each 4 × 4 sub-window the magnitude and direction of the corresponding gradients are calculated, and the gradient information of each sub-window is counted with a histogram of 8 bins; the descriptor is computed over the 16 × 16 window around the key point by a Gaussian weighting algorithm as follows:
h = mg · e^(−((x′ − a)² + (y′ − b)²)/(2(d/2)²)) (14)
where h is the descriptor, (a, b) is the position of the key point in the Gaussian pyramid image, mg is the gradient magnitude of the key point, i.e. the gradient magnitude of the histogram main direction of step 4-3, d = 16 is the side length of the window, (x, y) is the position of the pixel in the Gaussian pyramid image, and (x′, y′) are the new coordinates of the pixel in the neighborhood after rotating the coordinates to the direction of the key point; the new coordinates are calculated with the formulas:
x′ = x·cos θg − y·sin θg, y′ = x·sin θg + y·cos θg (15)
where θg is the gradient direction of the key point;
A 128-dimensional feature vector is obtained for each key point from the calculation over the 16 × 16 window, denoted H = (h1, h2, h3, ..., h128); the feature vector is normalized, the normalized feature vector being denoted Lg, with the normalization formula:
li = hi / √(Σj hj²) (16)
where Lg = (l1, l2, ..., li, ..., l128) is the normalized feature vector of the key point and li, i = 1, 2, 3, ..., is a given normalized component;
The Euclidean distance between key point feature vectors is used as the decision metric of key point similarity in the binocular images, the key points of the binocular images are matched, and the pixel coordinates of each pair of mutually matched key points form one group of key information;
Step 4-5: the generated matching key points are screened;
The horizontal disparity of the coordinates of each pair of key points is obtained and the disparity matrix is generated; the disparity matrix is defined as Kn = {k1, k2, ..., kn}, where n is the number of matched pairs and k1, k2, ..., kn are the disparities of the individual matched points;
The median km of the disparity matrix is obtained and the reference disparity matrix, denoted Kn′, is computed with the formula:
Kn′ = {k1 − km, k2 − km, ..., kn − km} (17)
The disparity threshold is set to 3; the disparities corresponding to the entries of Kn′ that exceed the threshold are deleted, yielding the final disparity matrix result K′, where k′1, k′2, ..., k′n′ are the disparities of the correct matched points after screening and n′ is the final number of correct matches, with the formula:
K′ = {k′1, k′2, ..., k′n′} (18)
Step 5: the disparity matrix K′ obtained in step 4 is substituted into the binocular ranging model to obtain the salient target distance;
The two identical imaging systems are spaced a distance J apart in the horizontal direction, the two optical axes are each parallel to the horizontal plane, and the image planes are parallel to the vertical plane;
Assume a target point M (X, Y, Z) in the scene whose imaging points in the left and right images are respectively Pl (x1, y1) and Pr (x2, y2), where (x1, y1) and (x2, y2) are the coordinates of Pl and Pr in the vertical imaging plane; the disparity of the binocular model is defined as k = |Pl − Pr| = |x2 − x1|, and the ranging formula is obtained from the triangle similarity relation, X, Y and Z being the coordinates along the horizontal, vertical and depth axes of the space coordinate system:
z = f · J / (k · dx′) (19)
where dx′ denotes the physical distance along the horizontal axis occupied by each pixel on the imaging sensor, f is the focal length of the imaging system, and z is the distance from target point M to the line connecting the two imaging centers; the disparity matrix obtained in step 4 is substituted into formula (19), and from the physical information of the binocular model the corresponding distance matrix Z′ = {z1, z2, ..., zn′} is obtained, z1, z2, ..., zn′ being the salient target distances given by the individual matched disparities; the average of the distance matrix is finally taken as the distance Zf of the salient target in the binocular images, with the formula:
Zf = (z1 + z2 + ... + zn′) / n′ (20)
2. The distance measurement method for salient targets in binocular images according to claim 1, characterized in that the detailed process of performing edge detection on the binocular images in step 1-1 is:
Step 1-1-1: a convolution with a 2-D Gaussian filter template is applied to the binocular images to eliminate the noise of the images;
Step 1-1-2: the gradient magnitude and gradient direction of the pixels of the filtered binocular image I(x, y) are calculated using the differences of the first-order partial derivatives in the horizontal and vertical directions; the partial derivatives dx and dy in the x and y directions are respectively:
dx = [I(x+1, y) − I(x−1, y)]/2 (21)
dy = [I(x, y+1) − I(x, y−1)]/2 (22)
The gradient magnitude is then:
D′ = (dx² + dy²)^(1/2) (23)
and the gradient direction is:
θ′ = arctan(dy/dx) (24)
where D′ and θ′ respectively denote the gradient magnitude and gradient direction of the pixels of the filtered binocular image I(x, y);
Step 1-1-3: non-maximum suppression is performed on the gradient, then double-threshold processing is applied to the image to generate the edge image, in which the gray value of edge points is 255 and the gray value of non-edge points is 0.
3. The distance measurement method for salient targets in binocular images according to claim 2, characterized in that the detailed process of extracting salient features from the binocular images using the visual saliency model and generating the salient feature map described in step 1-2 is:
Step 1-2-1: after edge detection of the binocular images, the original image and the edge image are superimposed:
I1(σ) = 0.7·I(σ) + 0.3·C(σ) (25)
where I(σ) is the original input binocular image, C(σ) is the edge image, and I1(σ) is the image after superimposition;
Step 1-2-2: a nine-layer Gaussian pyramid is computed from the superimposed image using the Gaussian difference function, where layer 0 is the input superimposed image and layers 1 to 8 are each formed from the previous layer by Gaussian filtering and down-sampling, their sizes corresponding to 1/2 to 1/256 of the input image; brightness, color and direction features are extracted from each layer of the Gaussian pyramid, and the corresponding brightness pyramid, color pyramids and direction pyramids are generated;
The brightness extraction formula is:
In = (r + g + b)/3 (26)
where r, g and b correspond respectively to the red, green and blue components of the color of the input binocular image, and In is the brightness;
The color feature extraction formulas are:
R = r − (g + b)/2 (27)
G = g − (r + b)/2 (28)
B = b − (r + g)/2 (29)
Y = r + g − 2(|r − g| + b) (30)
where R, G, B and Y correspond to the color components of the superimposed image;
O(σ, ω) is the direction feature extracted by Gabor filtering of the brightness In over the scale dimension, where σ is the Gaussian pyramid level and ω is the direction of the Gabor function, with σ ∈ [0, 1, 2, ..., 8] and ω ∈ [0°, 45°, 90°, 135°];
Step 1-2-3: center-surround differences are computed for the brightness, color and direction features of the different scales of the obtained Gaussian pyramids, specifically:
Let scale c, c ∈ {2, 3, 4}, be the center scale and scale u, u = c + δ, δ ∈ {3, 4}, the surround scale; in the 9-layer Gaussian pyramid there are 6 combinations of center scale c and surround scale u, namely 2-5, 2-6, 3-6, 3-7, 4-7, 4-8;
The center-surround difference is represented by the difference between the feature maps at scale c and scale u; the local contrast formulas are:
In(c, u) = |In(c) − In(u)| (31)
RG(c, u) = |(R(c) − G(c)) − (G(u) − R(u))| (32)
BY(c, u) = |(B(c) − Y(c)) − (Y(u) − B(u))| (33)
O(c, u, ω) = |O(c, ω) − O(u, ω)| (34)
where, before taking the difference, the two maps must be brought to the same size by interpolation;
Step 1-2-4: the feature maps of the different features generated by taking differences are fused by normalization to generate the salient feature map of the input binocular image, specifically:
First the scale-contrast feature maps of each feature are normalized and fused to generate the comprehensive feature map of that feature: Ī is the brightness normalized feature map, C̄ is the color normalized feature map, and Ō is the direction normalized feature map; the calculation process is as follows:
Ī = ⊕c ⊕u N(In(c, u)), C̄ = ⊕c ⊕u [N(RG(c, u)) + N(BY(c, u))], Ō = Σω N(⊕c ⊕u N(O(c, u, ω)))
where ⊕ denotes across-scale addition over the six (c, u) combinations;
where N(·) denotes the normalization function: for the feature map to be calculated, the feature value of every pixel in the feature map is first normalized into a closed interval [0, 255]; then the global maximum saliency value A is found in the normalized feature map and the average value a of the local maxima of the feature map is obtained; finally the feature value corresponding to every pixel of the feature map is multiplied by (A − a)²;
The comprehensive feature maps of the features are then normalized and combined to obtain the final salient feature map S, with the calculation process:
S = (1/3) (N(Ī) + N(C̄) + N(Ō))
CN201510233157.3A 2015-05-08 2015-05-08 Distance measurement method for salient targets in binocular images Active CN104778721B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510233157.3A CN104778721B (en) 2015-05-08 2015-05-08 Distance measurement method for salient targets in binocular images

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510233157.3A CN104778721B (en) 2015-05-08 2015-05-08 Distance measurement method for salient targets in binocular images

Publications (2)

Publication Number Publication Date
CN104778721A CN104778721A (en) 2015-07-15
CN104778721B true CN104778721B (en) 2017-08-11

Family

ID=53620167

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510233157.3A Active CN104778721B (en) 2015-05-08 2015-05-08 Distance measurement method for salient targets in binocular images

Country Status (1)

Country Link
CN (1) CN104778721B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110065790A (en) * 2019-04-25 2019-07-30 中国矿业大学 A kind of coal mine leather belt transhipment head choke detecting method of view-based access control model algorithm

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105574928A (en) * 2015-12-11 2016-05-11 深圳易嘉恩科技有限公司 Driving image processing method and first electronic equipment
CN106023198A (en) * 2016-05-16 2016-10-12 天津工业大学 Hessian matrix-based method for extracting aortic dissection of human thoracoabdominal cavity CT image
CN107423739B (en) * 2016-05-23 2020-11-13 北京陌上花科技有限公司 Image feature extraction method and device
CN106094516A (en) * 2016-06-08 2016-11-09 南京大学 A kind of robot self-adapting grasping method based on deeply study
CN108460794A (en) * 2016-12-12 2018-08-28 南京理工大学 A kind of infrared well-marked target detection method of binocular solid and system
CN106780476A (en) * 2016-12-29 2017-05-31 杭州电子科技大学 A kind of stereo-picture conspicuousness detection method based on human-eye stereoscopic vision characteristic
CN106920244B (en) * 2017-01-13 2019-08-02 广州中医药大学 A kind of method of the neighbouring background dot of detection image edges of regions
CN106918321A (en) * 2017-03-30 2017-07-04 西安邮电大学 A kind of method found range using object parallax on image
CN107730521B (en) * 2017-04-29 2020-11-03 安徽慧视金瞳科技有限公司 Method for rapidly detecting ridge type edge in image
CN107392929B (en) * 2017-07-17 2020-07-10 河海大学常州校区 Intelligent target detection and size measurement method based on human eye vision model
CN107564061B (en) * 2017-08-11 2020-11-20 浙江大学 Binocular vision mileage calculation method based on image gradient joint optimization
CN107633498B (en) * 2017-09-22 2020-06-23 成都通甲优博科技有限责任公司 Image dark state enhancement method and device and electronic equipment
CN107644398B (en) * 2017-09-25 2021-01-26 上海兆芯集成电路有限公司 Image interpolation method and related image interpolation device
CN108036730B (en) * 2017-12-22 2019-12-10 福建和盛高科技产业有限公司 Fire point distance measuring method based on thermal imaging
CN108665740A (en) * 2018-04-25 2018-10-16 衢州职业技术学院 A kind of classroom instruction control system of feeling and setting happily blended Internet-based
CN109300154A (en) * 2018-11-27 2019-02-01 郑州云海信息技术有限公司 A kind of distance measuring method and device based on binocular solid

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103824284A (en) * 2014-01-26 2014-05-28 中山大学 Key frame extraction method based on visual attention model and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB0619817D0 (en) * 2006-10-06 2006-11-15 Imp Innovations Ltd A method of identifying a measure of feature saliency in a sequence of images

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103824284A (en) * 2014-01-26 2014-05-28 中山大学 Key frame extraction method based on visual attention model and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A motion estimation algorithm based on adaptive search range adjustment in H.264; Liu Yingzhe et al.; Journal of Electronics & Information Technology; 30 June 2013; vol. 35, no. 6; pp. 1382-1387 *
A saliency detection model with selective background priors; Jiang Yuwen et al.; Journal of Electronics & Information Technology; 31 January 2015; vol. 37, no. 1; pp. 130-136 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110065790A (en) * 2019-04-25 2019-07-30 中国矿业大学 A kind of coal mine leather belt transhipment head choke detecting method of view-based access control model algorithm

Also Published As

Publication number Publication date
CN104778721A (en) 2015-07-15

Similar Documents

Publication Publication Date Title
CN108961235B (en) Defective insulator identification method based on YOLOv3 network and particle filter algorithm
Li et al. Multi-feature combined cloud and cloud shadow detection in GaoFen-1 wide field of view imagery
KR101856401B1 (en) Method, apparatus, storage medium, and device for processing lane line data
Lin et al. Line segment extraction for large scale unorganized point clouds
CN105667518B (en) The method and device of lane detection
CN105046235B (en) The identification modeling method and device of lane line, recognition methods and device
Kong et al. A generalized Laplacian of Gaussian filter for blob detection and its applications
CN105809138B (en) A kind of road warning markers detection and recognition methods based on piecemeal identification
Wang et al. Individual tree-crown delineation and treetop detection in high-spatial-resolution aerial imagery
Liasis et al. Satellite images analysis for shadow detection and building height estimation
CN103047943B (en) Based on the door skin geomery detection method of single projection coded structured light
CN103605953B (en) Vehicle interest target detection method based on sliding window search
CN104700414B (en) A kind of road ahead pedestrian's fast ranging method based on vehicle-mounted binocular camera
Awrangjeb et al. Building detection in complex scenes thorough effective separation of buildings from trees
CN104867135B (en) A kind of High Precision Stereo matching process guided based on guide image
CN104299008B (en) Vehicle type classification method based on multi-feature fusion
Awrangjeb et al. Automatic detection of residential buildings using LIDAR data and multispectral imagery
CN102254319B (en) Method for carrying out change detection on multi-level segmented remote sensing image
CN107016357B (en) Video pedestrian detection method based on time domain convolutional neural network
CN102521859B (en) Reality augmenting method and device on basis of artificial targets
Liu et al. Building extraction from high resolution imagery based on multi-scale object oriented classification and probabilistic Hough transform
Sirmacek et al. Urban-area and building detection using SIFT keypoints and graph theory
KR20160143494A (en) Saliency information acquisition apparatus and saliency information acquisition method
CN105022990B (en) A kind of waterborne target rapid detection method based on unmanned boat application
CN102509098B (en) Fisheye image vehicle identification method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20170717

Address after: Room 245, No. 333 Jianshe Road, Jiufo, Sino-Singapore Guangzhou Knowledge City, Guangzhou, Guangdong 510000

Applicant after: Guangzhou Xiaopeng Automobile Technology Co. Ltd.

Address before: No. 92 West Dazhi Street, Nangang District, Harbin 150001

Applicant before: Harbin Institute of Technology

GR01 Patent grant
GR01 Patent grant