CN107679542B - Double-camera stereoscopic vision identification method and system - Google Patents

Double-camera stereoscopic vision identification method and system

Info

Publication number
CN107679542B
CN107679542B (application CN201710888927.7A)
Authority
CN
China
Prior art keywords
image
color
camera
point
feature set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710888927.7A
Other languages
Chinese (zh)
Other versions
CN107679542A (en)
Inventor
翁彧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Minzu University of China
Original Assignee
Minzu University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Minzu University of China
Priority to CN201710888927.7A
Publication of CN107679542A
Application granted
Publication of CN107679542B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/30: Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33: Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/40: Analysis of texture
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/90: Determination of colour characteristics
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/56: Extraction of image or video features relating to colour
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/10: Image acquisition modality
    • G06T2207/10004: Still image; Photographic image
    • G06T2207/10012: Stereo images

Abstract

The invention discloses a method and a system for identifying stereoscopic vision of double cameras. The method comprises the following steps: acquiring a left image and a right image containing an environmental object; weighting the gray value of each point in the left image with the wavelength value of the color of the corresponding point to obtain a left gray image with color characteristics; weighting the gray value of each point in the right image with the wavelength value of the color of the corresponding point to obtain a right gray image with color characteristics; performing pixel-level point matching on the two gray images with the color characteristics to obtain matching points; generating a depth image by using the matching points; establishing a stereo feature set according to the depth image; extracting texture features and color features of an image acquired by a camera, and establishing a texture feature set and a color feature set; and inputting the stereo feature set, the texture feature set and the color feature set into a classifier for classification, thereby identifying the target obstacle in the environmental object. By adopting the method or the system, the number of matching points can be increased, and the matching precision and the identification accuracy are further improved.

Description

Double-camera stereoscopic vision identification method and system
Technical Field
The invention relates to the technical field of image processing and visual identification, in particular to a method and a system for identifying stereoscopic vision of double cameras.
Background
With the development of society, people's living standards continue to improve, and the internet of things, sensors and computer image technology have also developed rapidly. More and more demands point to robot products with intelligent vision, so double-camera visual recognition for robots is becoming increasingly important.
At present, the commonly adopted dual-camera visual identification method is as follows: first, a classifier is used to detect the target object in the two images acquired by the two cameras, yielding dozens to hundreds of scattered pixel matching points; the pixel matching points obtained in this way generally do not exceed 5% of the total number of image pixels. Then the distance of each point is roughly calculated from the parallax pair generated by the two horizontally placed cameras, and the coordinates of each point are taken as the coordinates of its neighbouring area, giving the rough overall three-dimensional position information of the target object. The overall three-dimensional position information obtained in this way is used for speed measurement, tracking, obstacle avoidance and the like.
The existing double-camera visual identification method takes detection as the primary step and distance measurement only as an auxiliary step. Because the shooting angles and imaging ranges of the left and right cameras differ, few matching points are found; when a target object is identified, the pixel matching points of the two images deviate considerably, or the three-dimensional position information of a small target cannot be given at all, so the identification accuracy is not high.
Disclosure of Invention
Accordingly, there is a need for a method and system for stereoscopic vision recognition with two cameras that can improve the accuracy of the recognition.
In order to achieve the purpose, the invention provides the following scheme:
a double-camera stereoscopic vision identification method comprises the following steps:
respectively acquiring a left image and a right image containing an environmental object by using a left camera and a right camera;
converting the left image into a left gray image, acquiring the wavelength value of the color of each point in the color space of the left image, and weighting the gray value of each point in the left image and the wavelength value of the color of the corresponding point to obtain the left gray image with color characteristics;
converting the right image into a right gray image, acquiring the wavelength value of the color of each point in the color space of the right image, and weighting the gray value of each point in the right image and the wavelength value of the color of the corresponding point to obtain the right gray image with color characteristics;
performing pixel-level point matching on the left gray image with the color feature and the corresponding right gray image with the color feature to obtain matching points of the left gray image with the color feature and the right gray image with the color feature;
generating a depth image by using the matching points;
establishing a stereo feature set according to the depth image;
extracting texture features of an image acquired by any camera, and establishing a texture feature set;
extracting color features of an image acquired by any camera, and establishing a color feature set;
and inputting the three-dimensional feature set, the texture feature set and the color feature set into a classifier for classification, and identifying to obtain the target obstacle in the environment object.
Optionally, before the left and right cameras are used to respectively obtain the left and right images containing the environmental object, the internal parameters and external parameters of the cameras are obtained; after a left image and a right image containing an environmental object are respectively obtained by a left camera and a right camera, correcting the images obtained by the cameras by using internal parameters and external parameters of the cameras;
the internal parameter and the external parameter of the camera are obtained, and the method specifically comprises the following steps:
acquiring an image of a chessboard graph by using a camera, wherein the chessboard graph is composed of a plurality of black and white alternating squares and a plurality of black hollow circles;
carrying out binarization on an image acquired by a camera, and identifying black hollow circles and common angular points in a chessboard graph by using a boundary after binarization, wherein the black hollow circles are positioned at four corners of the chessboard graph, the black hollow circles are used for searching and positioning the chessboard grids, and the common angular points are common points of two black squares and two white squares;
establishing a projection matrix by using points on the chessboard plane and the corresponding points on the image acquired by the camera, according to the black hollow circles and the common angular points:
s·[u, v, 1]^T = H·[X_W, Y_W, 1]^T
wherein s is a scale factor, u is the abscissa and v the ordinate of a point on the image acquired by the camera, H is the projection matrix,
H = [h11 h12 h13; h21 h22 h23; h31 h32 h33]
and X_W is the abscissa and Y_W the ordinate of the corresponding point on the chessboard plane;
solving the projection matrix H by using a least square method;
acquiring internal parameters of the camera according to the projection matrix H;
and acquiring external parameters of the camera according to the projection matrix and the internal parameters of the camera.
Optionally, before performing pixel-level point matching on the left grayscale image with the color feature and the right grayscale image with the color feature and obtaining a matching point of the left grayscale image with the color feature and the right grayscale image with the color feature, the method further includes:
respectively calculating the omnidirectional operators of the left gray level image with the color characteristic and the right gray level image with the color characteristic, wherein the omnidirectional operators are
Sobel(x,y)=2*[P(x+1,y)-P(x-1,y)]+P(x+1,y-1)-P(x-1,y-1)+P(x+1,y+1)-P(x-1,y+1)+2*[P(x,y+1)-P(x,y-1)]+P(x-1,y+1)-P(x-1,y-1)+P(x+1,y+1)-P(x+1,y-1)
Wherein Sobel (x, y) represents an omnidirectional operator of a point (x, y) on the image, P represents a pixel of the image, and P (x, y) represents a pixel of the point (x, y) on the image;
mapping the left gray level image with the color characteristic and the right gray level image with the color characteristic by using a mapping function, wherein the pixel of the mapped image is
[equation image]
Wherein, the PreFilterCap is a constant parameter with the value of 15;
the pixel-level point matching is performed on the left grayscale image with the color feature and the right grayscale image with the color feature, so as to obtain a matching point between the left grayscale image with the color feature and the right grayscale image with the color feature, and the method specifically includes:
and matching by using an SAD algorithm according to the mapped pixels of the left gray image with the color features and the mapped pixels of the right gray image with the color features to obtain matching points of the mapped left gray image with the color features and the mapped right gray image with the color features.
Optionally, the generating a depth image by using the matching points specifically includes:
calculating the real coordinates of each pair of matching points, and obtaining the three-dimensional space coordinates of the measured point corresponding to each pair of matching points by using a parallax principle, wherein the real coordinates are the coordinates of the matching points in the real space relative to a three-dimensional coordinate system taking the centers of the two cameras as the origin;
and generating a depth image by using the three-dimensional space coordinates of all the measured points corresponding to the matching points.
Optionally, the establishing a stereo feature set according to the depth image specifically includes:
searching the boundary of the depth image;
partitioning the depth images according to the boundaries, and calculating the size and shape of each depth image and the distance between each depth image and the left camera and the right camera;
establishing a stereo feature set according to the size and shape of each depth image and the distance between each depth image and the left camera and the right camera;
the extracting of the texture features of the image acquired by any camera and the establishing of the texture feature set specifically include:
converting a left image into a left gray image, extracting Haar characteristics of the left gray image, and establishing a texture characteristic set, wherein the Haar characteristics comprise an original moment characteristic, an edge characteristic, a linear characteristic and a center surrounding characteristic;
the extracting of the color feature of the image obtained by any camera and the establishment of the color feature set specifically include:
converting the left image into color block images, and acquiring the wavelength value of the color of each point in the color space of each color block image;
and establishing a color feature set by using the wavelength value.
The invention also provides a double-camera stereoscopic vision identification system, and the double-camera stereoscopic vision identification method is applied to the system.
The system comprises:
the image acquisition module is used for respectively acquiring a left image and a right image containing an environmental object by utilizing the left camera and the right camera;
the left gray image acquisition module is used for converting the left image into a left gray image, acquiring the wavelength value of the color of each point in the color space of the left image, and weighting the gray value of each point in the left image and the wavelength value of the color of the corresponding point to obtain the left gray image with color characteristics;
the right gray image acquisition module is used for converting the right image into a right gray image, acquiring the wavelength value of the color of each point in the color space of the right image, and weighting the gray value of each point in the right image and the wavelength value of the color of the corresponding point to obtain the right gray image with color characteristics;
the matching module is used for performing pixel-level point matching on the left gray image with the color features and the corresponding right gray image with the color features to obtain matching points of the left gray image with the color features and the right gray image with the color features;
the depth image generation module is used for generating a depth image by using the matching points;
the stereoscopic feature set establishing module is used for establishing a stereoscopic feature set according to the depth image;
the texture feature set establishing module is used for extracting the texture features of the image acquired by any camera and establishing a texture feature set;
the color feature set establishing module is used for extracting the color features of the image acquired by any camera and establishing a color feature set;
and the classification module is used for inputting the three-dimensional feature set, the texture feature set and the color feature set into a classifier for classification, and identifying and obtaining the target obstacle in the environment object.
Optionally, the method further includes: the device comprises an internal parameter and external parameter acquisition module and a correction module, wherein the internal parameter and external parameter acquisition module is used for acquiring internal parameters and external parameters of the camera, and the correction module is used for correcting images acquired by the camera by using the internal parameters and the external parameters of the camera;
the internal parameter and external parameter obtaining module specifically includes:
the chessboard pattern image acquisition unit is used for acquiring an image of a chessboard pattern by using a camera, wherein the chessboard pattern is composed of a plurality of black and white alternating squares and a plurality of black hollow circles;
the binarization unit is used for binarizing the image acquired by the camera and identifying black hollow circles and common angular points in the checkerboard image by using the boundary after binarization, wherein the black hollow circles are positioned at four corners of the checkerboard image, the black hollow circles are used for searching and positioning the checkerboard, and the common angular points are common points of two black squares and two white squares;
a projection matrix establishing unit, for establishing a projection matrix by using the points on the chessboard plane and the corresponding points on the image acquired by the camera, according to the black hollow circles and the common angular points:
s·[u, v, 1]^T = H·[X_W, Y_W, 1]^T
wherein s is a scale factor, u is the abscissa and v the ordinate of a point on the image acquired by the camera, H is the projection matrix,
H = [h11 h12 h13; h21 h22 h23; h31 h32 h33]
and X_W is the abscissa and Y_W the ordinate of the corresponding point on the chessboard plane;
the projection matrix solving unit is used for solving the projection matrix H by using a least square method;
the internal parameter acquisition unit is used for acquiring internal parameters of the camera according to the projection matrix H;
and the external parameter acquisition unit is used for acquiring the external parameters of the camera according to the projection matrix and the internal parameters of the camera.
Optionally, the method further includes:
an omnidirectional operator calculating module, configured to calculate omnidirectional operators of the left grayscale image with the color feature and the right grayscale image with the color feature respectively, where the omnidirectional operators are
Sobel(x,y)=2*[P(x+1,y)-P(x-1,y)]+P(x+1,y-1)-P(x-1,y-1)+P(x+1,y+1)-P(x-1,y+1)+2*[P(x,y+1)-P(x,y-1)]+P(x-1,y+1)-P(x-1,y-1)+P(x+1,y+1)-P(x+1,y-1)
Wherein Sobel (x, y) represents an omnidirectional operator of a point (x, y) on the image, P represents a pixel of the image, and P (x, y) represents a pixel of the point (x, y) on the image;
a mapping module for mapping the left gray image with color features and the right gray image with color features by using a mapping function, wherein the pixels of the mapped images are
[equation image]
Wherein, the PreFilterCap is a constant parameter with the value of 15;
the matching module specifically comprises:
and the matching point acquisition unit is used for matching by utilizing an SAD algorithm according to the mapped pixels of the left gray image with the color characteristics and the mapped pixels of the right gray image with the color characteristics to acquire the matching points of the mapped left gray image with the color characteristics and the mapped right gray image with the color characteristics.
Optionally, the depth image generating module specifically includes:
the measured point coordinate acquisition unit is used for calculating the real coordinates of each pair of matching points and obtaining the three-dimensional space coordinates of the measured point corresponding to each pair of matching points by utilizing the parallax principle, wherein the real coordinates are the coordinates of the matching points in the real space relative to a three-dimensional coordinate system taking the centers of the two cameras as the origin;
and the depth image generating unit is used for generating a depth image by using the three-dimensional space coordinates of all the measured points corresponding to the matching points.
Optionally, the stereoscopic feature set establishing module specifically includes:
a boundary searching unit, configured to search for a boundary of the depth image;
the partitioning and calculating unit is used for partitioning the depth images according to the boundaries and calculating the size and the shape of each depth image and the distance between each depth image and the left camera and the right camera;
the stereoscopic feature set establishing unit is used for establishing a stereoscopic feature set according to the size and the shape of each depth image and the distance between each depth image and the left camera and the right camera;
the texture feature set establishing module specifically includes:
the texture feature set establishing unit is used for converting a left image into a left gray image, extracting Haar features of the left gray image and establishing a texture feature set, wherein the Haar features comprise an original moment feature, an edge feature, a linear feature and a center surrounding feature;
the color feature set establishing module specifically comprises:
a wavelength value acquisition unit for converting the left image into patch images and acquiring a wavelength value of a color of each point in a color space of each patch image;
and the color feature set establishing unit is used for establishing a color feature set by using the wavelength value.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a method and a system for identifying stereoscopic vision of double cameras, which are characterized in that the gray value of each point in an image acquired by a camera is weighted with the wavelength value of the color of the corresponding point to form a gray image with color characteristics, and the left and right gray images with the color characteristics are subjected to pixel-level point matching, so that the matching precision can be improved; generating a depth image by using three-dimensional space coordinates of all measured points corresponding to matching points, establishing a three-dimensional characteristic set according to the depth image, respectively extracting texture characteristics and color characteristics of an image acquired by any camera, establishing a texture characteristic set and a color characteristic set, inputting the three-dimensional characteristic set, the texture characteristic set and the color characteristic set into a classifier for classification, identifying to obtain a target object, establishing the three-dimensional characteristic set and the color characteristic set on the basis of the texture characteristic set, and inputting the three characteristic sets into the classifier for classification, so that the identification accuracy can be greatly improved; and the chessboard diagrams are used for calibrating the left camera and the right camera, and before the two images are matched, the images acquired by the cameras are corrected according to calibration data, so that the matching precision can be further improved. The method or the system can improve the number of the matching points, improve the matching precision and further improve the identification accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without inventive exercise.
FIG. 1 is a flowchart of a stereoscopic vision recognition method with two cameras according to an embodiment of the present invention;
FIG. 2 is a chessboard diagram in the stereoscopic vision recognition method of two cameras according to the embodiment of the invention;
FIG. 3 is a schematic diagram of calculating three-dimensional space coordinates of a measured point in the dual-camera stereo vision recognition method according to the embodiment;
fig. 4 is a flowchart of a stereoscopic vision recognition system with two cameras according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Fig. 1 is a flowchart of a stereoscopic vision recognition method with two cameras according to an embodiment of the present invention.
Referring to fig. 1, the dual-camera stereoscopic vision recognition method of the embodiment includes:
step S1: and acquiring internal parameters and external parameters of the camera.
The method specifically comprises the following steps:
step S101: and acquiring the image of the chessboard by using the camera. Fig. 2 is a checkerboard diagram in the dual-camera stereoscopic vision recognition method according to the embodiment of the present invention, and referring to fig. 2, the checkerboard diagram is composed of a plurality of black and white alternating squares and a plurality of black hollow circles, specifically, 10 rows by 14 columns of the black and white alternating squares are provided, the size of the squares is 20mm by 20mm, the black hollow circles are located at four corners of the checkerboard pattern, the diameter of the black hollow circle is 20mm, the diameter of the hollow part is 6.5mm, and the black hollow circles are used for finding and positioning the checkerboard.
Step S102: the image acquired by the camera is binarized, and a black hollow circle and a common angular point in a chessboard pattern shown in fig. 2 are identified by using a boundary after binarization, wherein the common angular point is a common point of two black squares and two white squares.
Step S103: establishing a projection matrix by using points on the chessboard plane and the corresponding points on the image acquired by the camera, according to the black hollow circles and the common angular points:
s·[u, v, 1]^T = H·[X_W, Y_W, 1]^T
wherein s is a scale factor, u is the abscissa and v the ordinate of a point on the image acquired by the camera, H is the projection matrix,
H = [h11 h12 h13; h21 h22 h23; h31 h32 h33]
and X_W is the abscissa and Y_W the ordinate of the corresponding point on the chessboard plane.
Step S104: and solving the projection matrix H by using a least square method.
The method specifically comprises the following steps:
From the projection matrix in step S103,
s·[u, v, 1]^T = H·[X_W, Y_W, 1]^T,
it can be obtained that
u = (h11·X_W + h12·Y_W + h13) / (h31·X_W + h32·Y_W + h33)
v = (h21·X_W + h22·Y_W + h23) / (h31·X_W + h32·Y_W + h33).
Since H is defined only up to a scale factor, h33 can be normalized to 1, and by further derivation it can be obtained that
u = (h11·X_W + h12·Y_W + h13) / (h31·X_W + h32·Y_W + 1)
v = (h21·X_W + h22·Y_W + h23) / (h31·X_W + h32·Y_W + 1).
Multiplying the denominator to the left of each equation gives
u·(h31·X_W + h32·Y_W + 1) = h11·X_W + h12·Y_W + h13
v·(h31·X_W + h32·Y_W + 1) = h21·X_W + h22·Y_W + h23.
Let h = [h11 h12 h13 h21 h22 h23 h31 h32]^T; then, for one pair of corresponding points,
[X_W Y_W 1 0 0 0 -u·X_W -u·Y_W; 0 0 0 X_W Y_W 1 -v·X_W -v·Y_W]·h = [u, v]^T.
Stacking the equations of multiple corresponding points, let S denote the stacked coefficient matrix and e the stacked vector of image coordinates, so that S·h = e. Solving S·h = e by the least square method gives h = (S^T·S)^(-1)·S^T·e, from which the projection matrix H is finally obtained.
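A small numerical sketch of this least-squares step, stacking S and e exactly as above and checking that a known homography is recovered; the synthetic H_true and board points are made up only for the check.

```python
import numpy as np

def solve_homography(board_pts, img_pts):
    """Estimate H from board-plane points (X_W, Y_W) and image points (u, v)
    by stacking S h = e and solving h = (S^T S)^-1 S^T e, with h33 fixed to 1."""
    S, e = [], []
    for (Xw, Yw), (u, v) in zip(board_pts, img_pts):
        S.append([Xw, Yw, 1, 0, 0, 0, -u * Xw, -u * Yw])
        S.append([0, 0, 0, Xw, Yw, 1, -v * Xw, -v * Yw])
        e += [u, v]
    S, e = np.asarray(S, float), np.asarray(e, float)
    h, *_ = np.linalg.lstsq(S, e, rcond=None)   # same solution as (S^T S)^-1 S^T e
    return np.append(h, 1.0).reshape(3, 3)      # rebuild H with h33 = 1

# tiny self-check: points projected by a known H are recovered
H_true = np.array([[800., 5., 320.], [0., 780., 240.], [0.01, 0.02, 1.]])
board = [(x, y) for x in range(5) for y in range(5)]
img = [(H_true @ [x, y, 1])[:2] / (H_true @ [x, y, 1])[2] for x, y in board]
print(np.abs(solve_homography(board, img) - H_true).max())  # close to zero: H recovered
```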
Step S105: and acquiring the internal parameters of the camera according to the projection matrix H.
The method specifically comprises the following steps:
let hiRepresenting projection matrix HEach column vector, riEach column vector representing the rotation matrix R, which is different from the projection matrix H obtained in step S104 by a scaling factor λ, can be written as:
[h1h2h3]=λA[r1r2t]
wherein h is1、h2And h3Respectively representing the 1 st, 2 nd and 3 rd columns, r, of the projection matrix H1And r2Respectively representing the 1 st and 2 nd columns of the rotation matrix R, t is the translation matrix, A is the internal parameter matrix, and R is1And r2Is a unit orthogonal vector, so there are
Figure BDA0001420739330000106
Figure BDA0001420739330000107
This provides two constraint equations for the solution of the internal parameters.
Next, let
Figure BDA0001420739330000111
Wherein α is focal length in x direction, β is focal length in y direction, γ is tilt parameter of coordinate axis, and u is0Is the abscissa of the principal point, v0Is the ordinate of the principal point, where B is a symmetric matrix, which can be defined by a 6-dimensional vector, i.e.
b=[B11B12B22B13B23B33]T
Let the ith column vector of H be Hi=[hi1,hi2,hi3]TThus there are
Figure BDA0001420739330000112
Wherein Vij=[hi1hj1,hi1hj2+hi2hj1,hi2hj2,hi3hj1+hi1hj3,hi3hj2+hi2hj3,hi3hj3]TThen, two constraints on the internal parameters can be written as equations for b
Figure BDA0001420739330000113
If there are n images, their equations are added to obtain
Vb=0
Where V is a matrix of 2n × 6, when n ≧ 3, B can be uniquely determined in the sense of a scale factor, and when n is 2, where the number of equations is less than the number of unknowns, we can add an additional constraint γ of 0, i.e., B120, therefore, [ 010000 ] can be used]b is 0 as an additional equation for 0, and the least squares solution for 0 is VTNormalizing the characteristic vector corresponding to the minimum characteristic value of V to obtain the required B, and further obtain B; when n is 1, the two equations can only solve two unknowns, and we can assume that the optical center projects at the center of the image, and thus find the maximum multiple of the camera in the horizontal and vertical directions.
Once b is found, the internal parameter matrix A of the camera can be calculated in either of the following two ways:
(1) B is constructed from b, A^(-1) is solved by the Cholesky matrix decomposition algorithm, and A is then obtained by inversion.
(2) B is constructed from b; since B differs from A^(-T)·A^(-1) only by a scale factor, i.e. B = λ·A^(-T)·A^(-1), the internal parameters of the camera are easily found from the properties of the absolute conic:
v0 = (B12·B13 - B11·B23) / (B11·B22 - B12^2)
λ = B33 - [B13^2 + v0·(B12·B13 - B11·B23)] / B11
α = sqrt(λ / B11)
β = sqrt(λ·B11 / (B11·B22 - B12^2))
γ = -B12·α^2·β / λ
u0 = γ·v0 / α - B13·α^2 / λ
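The closed-form expressions above, transcribed into a short helper; the formulas follow the standard Zhang-style solution assumed to match the equation images of the original.

```python
import numpy as np

def intrinsics_from_b(b):
    """b = [B11, B12, B22, B13, B23, B33]  ->  internal parameter matrix A."""
    B11, B12, B22, B13, B23, B33 = b
    v0 = (B12 * B13 - B11 * B23) / (B11 * B22 - B12 ** 2)
    lam = B33 - (B13 ** 2 + v0 * (B12 * B13 - B11 * B23)) / B11
    alpha = np.sqrt(lam / B11)
    beta = np.sqrt(lam * B11 / (B11 * B22 - B12 ** 2))
    gamma = -B12 * alpha ** 2 * beta / lam
    u0 = gamma * v0 / alpha - B13 * alpha ** 2 / lam
    return np.array([[alpha, gamma, u0],
                     [0.0,   beta,  v0],
                     [0.0,   0.0,  1.0]])
```

Reconstructing B from the returned A and verifying that B is proportional to A^(-T)·A^(-1) is a convenient sanity check.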
step S106: and acquiring external parameters of the camera according to the projection matrix and the internal parameters of the camera.
The method specifically comprises the following steps:
From [h1 h2 h3] = λ·A·[r1 r2 t] in step S105, it is obtained that r1 = λ·A^(-1)·h1, r2 = λ·A^(-1)·h2, t = λ·A^(-1)·h3, λ = 1/||A^(-1)·h1|| = 1/||A^(-1)·h2|| and r3 = r1 × r2. Substituting the projection matrix H and the internal parameter matrix A obtained in steps S104 and S105 gives the rotation matrix R = [r1, r2, r3]; the rotation matrix R and the translation vector t form the external parameters of the camera.
Step S107: and optimizing the internal parameter matrix A and the rotation matrix R.
The obtained internal parameter matrix A of the camera and the rotation matrix R corresponding to each image are only a rough solution and have no specific physical significance, so that all parameters are subjected to nonlinear optimization through maximum likelihood estimation and further refined.
Specifically, assuming that there are n images of the template plane, each with m index points, the merit function can be established:
Σ_{i=1..n} Σ_{j=1..m} || m_ij - m(A, R_i, t_i, M_j) ||^2
wherein m_ij is the j-th image point in the i-th image, R_i is the rotation matrix of the i-th image coordinate system, t_i is the translation vector of the i-th image coordinate system, M_j is the spatial coordinate of the j-th point, and m(A, R_i, t_i, M_j) is the image-point coordinate determined from these known quantities.
Since the rotation matrix has 9 parameters but only three degrees of freedom, it can be represented by vectors of three parameters, i.e. a rotation can be represented by a three-dimensional vector, i.e. a rotation vector, whose direction is the direction of the rotation axis and whose modulus is equal to the rotation angle.
Thus r = (r1, r2, r3)^T is the Rodrigues representation of the rotation matrix, and the relationship between R and r is given by the formula:
R = cos(θ)·I + (1 - cos(θ))·n·n^T + sin(θ)·S(n)
wherein n = r/θ is the unit rotation axis and S(·) denotes the antisymmetric matrix formed by a vector; for the rotation vector r = (x, y, z)^T,
S(r) = [ 0  -z  y ;  z  0  -x ;  -y  x  0 ]
and
θ = ||r|| = sqrt(x^2 + y^2 + z^2)
is the angle of rotation.
The merit function is minimized over A, R_i, t_i and M_j; the minimum value of the merit function is obtained by adopting the Levenberg-Marquardt algorithm.
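A minimal numpy version of the rotation-vector relationship used in this refinement step; OpenCV's cv2.Rodrigues performs the same conversion.

```python
import numpy as np

def rodrigues(r):
    """Rotation vector r = (x, y, z) -> rotation matrix R.
    The direction of r is the rotation axis and theta = ||r|| is the angle."""
    r = np.asarray(r, float)
    theta = np.linalg.norm(r)
    if theta < 1e-12:
        return np.eye(3)
    n = r / theta                                # unit rotation axis
    K = np.array([[0, -n[2], n[1]],
                  [n[2], 0, -n[0]],
                  [-n[1], n[0], 0]])             # antisymmetric matrix S(n)
    return (np.cos(theta) * np.eye(3)
            + (1 - np.cos(theta)) * np.outer(n, n)
            + np.sin(theta) * K)
```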
Step S2: and respectively acquiring a left image and a right image containing the environmental object by using the left camera and the right camera.
Step S3: and respectively correcting the left image and the right image containing the environmental object.
Specifically, the left image including the environmental object acquired by the left camera in step S2 is corrected using the internal parameters and the external parameters of the left camera acquired in step S1, and the right image including the environmental object acquired by the right camera in step S2 is corrected using the internal parameters and the external parameters of the right camera acquired in step S1.
Step S4: and acquiring a left gray image with a color characteristic and a right gray image with the color characteristic.
Specifically, the left image is converted into a left gray image, the wavelength value of the color of each point in the color space of the left image is obtained, and the gray value of each point in the left image and the wavelength value of the color of the corresponding point are weighted to obtain a left gray image with color characteristics; and converting the right image into a right gray image, acquiring the wavelength value of the color of each point in the color space of the right image, and weighting the gray value of each point in the right image and the wavelength value of the color of the corresponding point to obtain the right gray image with color characteristics.
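A sketch of the weighting idea in this step: the hue channel is mapped to an approximate dominant wavelength and blended with the gray value. The hue-to-wavelength mapping and the weighting coefficients w_gray and w_wave are assumptions for illustration; the embodiment does not fix them here.

```python
import cv2
import numpy as np

def gray_with_color_feature(bgr, w_gray=0.7, w_wave=0.3):
    """Weight each pixel's gray value with the wavelength of its color."""
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
    hue = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)[:, :, 0].astype(np.float32)  # 0..179 in OpenCV

    # crude linear map (assumed): hue 0 (red) -> ~620 nm, hue 120 (blue) -> ~450 nm
    wavelength = 620.0 - (np.clip(hue, 0, 120) / 120.0) * (620.0 - 450.0)

    # normalise the wavelength to the 0..255 gray range before weighting
    wave_norm = (wavelength - 450.0) / (620.0 - 450.0) * 255.0
    out = w_gray * gray + w_wave * wave_norm
    return np.clip(out, 0, 255).astype(np.uint8)
```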
Step S5: performing pixel-level point matching on the left grayscale image with the color feature and the corresponding right grayscale image with the color feature to obtain matching points of the left grayscale image with the color feature and the right grayscale image with the color feature, specifically including:
step S501: the image acquired in step S4 is mapped to acquire pixels of the mapped image.
Specifically, the omnidirectional operators of the left gray level image with the color feature and the right gray level image with the color feature are respectively calculated, and the omnidirectional operators are
Sobel(x,y)=2*[P(x+1,y)-P(x-1,y)]+P(x+1,y-1)-P(x-1,y-1)+P(x+1,y+1)-P(x-1,y+1)+2*[P(x,y+1)-P(x,y-1)]+P(x-1,y+1)-P(x-1,y-1)+P(x+1,y+1)-P(x+1,y-1)
Wherein Sobel (x, y) represents an omnidirectional operator of a point (x, y) on the image, P represents a pixel of the image, and P (x, y) represents a pixel of the point (x, y) on the image;
mapping the left gray level image with the color characteristic and the right gray level image with the color characteristic by using a mapping function; the pixel value of the mapped image is
[equation image]
wherein PreFilterCap is a constant parameter with the value of 15.
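The exact mapping function is given as an equation image above; a common choice, used for example by OpenCV's block-matching pre-filter and assumed here only for illustration, is to clamp the operator response into [0, 2·PreFilterCap].

```python
import numpy as np

PRE_FILTER_CAP = 15  # constant parameter from the description above

def omni_operator(P):
    """Plain implementation of the omnidirectional operator defined above.
    P is indexed as P[y, x]; the border pixels are left at zero."""
    P = np.asarray(P, dtype=np.int32)
    S = np.zeros_like(P)
    S[1:-1, 1:-1] = (
        2 * (P[1:-1, 2:] - P[1:-1, :-2])     # 2*[P(x+1,y) - P(x-1,y)]
        + P[:-2, 2:] - P[:-2, :-2]           # P(x+1,y-1) - P(x-1,y-1)
        + P[2:, 2:] - P[2:, :-2]             # P(x+1,y+1) - P(x-1,y+1)
        + 2 * (P[2:, 1:-1] - P[:-2, 1:-1])   # 2*[P(x,y+1) - P(x,y-1)]
        + P[2:, :-2] - P[:-2, :-2]           # P(x-1,y+1) - P(x-1,y-1)
        + P[2:, 2:] - P[:-2, 2:]             # P(x+1,y+1) - P(x+1,y-1)
    )
    return S

def prefilter(P, cap=PRE_FILTER_CAP):
    """Assumed mapping: clamp the operator response into [0, 2*cap]."""
    return np.clip(omni_operator(P) + cap, 0, 2 * cap).astype(np.uint8)
```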
Step S502: and matching by using an SAD algorithm according to the obtained pixels of the mapped left gray image with the color features and the pixels of the mapped right gray image with the color features to obtain matching points of the mapped left gray image with the color features and the mapped right gray image with the color features.
Specifically, the SAD algorithm is used to calculate the matching cost; the cost calculation formula is
C(x, y, d) = Σ_{i=-n..n} Σ_{j=-n..n} | L(x+i, y+j) - I(x+i-d, y+j) |
where n determines the size of the calculation window taken around the pixel point (x, y), d is the disparity, L is the pixel value of the left image, and I is the pixel value of the right image.
In order to make the result more accurate, besides the SAD method is adopted to calculate the image gradient obtained in the last step, the original image is further sampled and calculated, wherein the sampling proportion is 50%, namely half of pixels of the original image are sampled on average and substituted into a cost calculation formula for calculation.
After all cost results are obtained, each point is matched by means of dynamic programming, specifically
L_r(p, d) = C(p, d) + min( L_r(p-r, d), L_r(p-r, d-1) + P1, L_r(p-r, d+1) + P1, min_k L_r(p-r, k) + P2 ) - min_k L_r(p-r, k)
wherein L_r(p, d) represents the minimum cost value along the current direction (i.e. from left to right) when the disparity value of the current pixel p is d; r denotes the direction pointing to the current pixel p, here understood to be the adjacent pixel to the left of pixel p; and k indexes the candidate disparities of the pixel p - r;
as can be seen from the above, there are four paths, among which:
P1=8*cn*SADWindowSize*SADWindowSize
P2=32*cn*SADWindowSize*SADWindowSize
cn is the number of image channels, SADWindowSize is the size of the SAD window, that is, the size of the generated SAD image, and SADWindowSize is an odd number;
the method comprises the following steps that due to the inevitable phenomena of noise, large area and no contour and the like in an image, the uniqueness of a matching point needs to be detected, when the lowest cost in a parallax window range is (1+ uniquenessRatio/100) times of the next lowest cost, the parallax value corresponding to the lowest cost is the parallax of the pixel point, otherwise, the parallax of the pixel point is 0, wherein uniquenessRatio is a constant parameter, and the calculation of the parallax d is as follows:
[equation image]
wherein
[equation image]
Sp[d] is an array indexed by the disparity d, and DIS_SCALE represents binary 10000 (i.e. decimal 16);
after obtaining the parallax d, in order to reduce errors, detecting the consistency of the parallax d specifically includes:
Let the disparity of the left image be dispL[X] and the disparity of the right image be dispR[X]. If dispL[X] = d is known, then dispR[X - d] = d. If several points of the left image correspond to one point in the right image, i.e. dispL[X] = d and dispL[X + n] = d + n, then dispR[X - d] = dispR[(X + n) - (d + n)], so the unqualified points are eliminated.
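A compact sketch of the matching step along the lines described above: an SAD cost over a square window, a uniqueness test (implemented in the usual sense that the best cost must beat the runner-up by the given margin), and a left-right consistency check. The window size, disparity range and uniquenessRatio value are illustrative assumptions.

```python
import numpy as np

def sad_cost(left, right, x, y, d, n=3):
    """SAD cost C(x, y, d) over a (2n+1)^2 window; left/right are 2-D arrays indexed [y, x]."""
    lw = left[y - n:y + n + 1, x - n:x + n + 1].astype(np.int32)
    rw = right[y - n:y + n + 1, x - d - n:x - d + n + 1].astype(np.int32)
    return int(np.abs(lw - rw).sum())

def disparity_at(left, right, x, y, max_disp=64, n=3, uniqueness_ratio=10):
    """Winner-take-all disparity with a uniqueness test; (x, y) must lie far
    enough from the image border for the window and disparity range to fit."""
    costs = [sad_cost(left, right, x, y, d, n) for d in range(min(max_disp, x - n))]
    if len(costs) < 2:
        return 0
    order = np.argsort(costs)
    best, second = order[0], order[1]
    # keep the match only if the best cost clearly wins over the runner-up
    if costs[second] >= costs[best] * (1 + uniqueness_ratio / 100.0):
        return int(best)
    return 0

def lr_consistent(disp_left, disp_right, x, y, tol=1):
    """Left-right consistency: dispL[x] == d should imply dispR[x - d] == d."""
    d = int(disp_left[y, x])
    xr = x - d
    return 0 <= xr < disp_right.shape[1] and abs(int(disp_right[y, xr]) - d) <= tol
```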
Step S6: and generating a depth image by using the matching points.
The method specifically comprises the following steps:
step S601: calculating the real coordinates of each pair of matching points, and obtaining the three-dimensional space coordinates of the measured point corresponding to each pair of matching points by using a parallax principle, wherein the real coordinates are coordinates of the matching points in the real space relative to a three-dimensional coordinate system with the centers of the two cameras as the origin, and the calculation principle of the three-dimensional space coordinates of the measured point is as follows:
the principle of the double-camera distance measurement is the same as the binocular stereo vision of human eyes, and the principle is obtained according to the difference of the imaging positions of the two cameras (binocular) at the same point of the obtained image.
Fig. 3 is a schematic diagram of calculating three-dimensional space coordinates of a measured point in the dual-camera stereo vision recognition method according to the embodiment.
Referring to fig. 3, the projection center of the left camera is o and the projection center of the right camera is o_r; the distance between the projection centers of the two cameras is the baseline distance. The left camera coordinate system is set to o-xyz, coincides with the world coordinate system and undergoes no rotation; its image coordinate system is O1-X1Y1 and its effective focal length is f1. The right camera coordinate system is o_r-x_r y_r z_r, its image coordinate system is O_r-X_r Y_r and its effective focal length is f_r. When the two cameras observe the same feature point P of a spatial object at the same time, images of the point P are obtained in the left and right views respectively. Let the coordinates of the point P be (x, y, z); then the following relation is obtained from the projection model of the left camera:
X1 = f1·x / z,  Y1 = f1·y / z
According to the positional relationship between the o-xyz coordinate system and the o_r-x_r y_r z_r coordinate system, it can be found that:
[x_r, y_r, z_r]^T = M_lr·[x, y, z, 1]^T,  M_lr = [R | T]
wherein M_lr is the spatial transformation matrix, R = [r1 r2 r3; r4 r5 r6; r7 r8 r9] represents the rotation between the coordinate systems o-xyz and o_r-x_r y_r z_r, and T = [t_x, t_y, t_z]^T represents the translation transformation vector between the origin o and the origin o_r;
similarly, for a spatial point in the o-xyz coordinate system, the correspondence between the two camera image points can be expressed as:
X_r = f_r·(r1·x + r2·y + r3·z + t_x) / (r7·x + r8·y + r9·z + t_z)
Y_r = f_r·(r4·x + r5·y + r6·z + t_y) / (r7·x + r8·y + r9·z + t_z)
Thus, the three-dimensional coordinates of the spatial point can be expressed as:
x = z·X1 / f1
y = z·Y1 / f1
z = f1·(f_r·t_x - X_r·t_z) / (X_r·(r7·X1 + r8·Y1 + r9·f1) - f_r·(r1·X1 + r2·Y1 + r3·f1))
the three-dimensional space coordinates of the measured point can be reconstructed by substituting the internal parameters of the left camera and the right camera, the effective focal lengths of the left camera and the right camera and the image coordinates of the space points in the left camera and the right camera into the formula.
Step S602: and generating a depth image by using the three-dimensional space coordinates of all the measured points corresponding to the matching points.
Specifically, the three-dimensional space coordinates of all the measured points are stored in an array form in other channels of the image, so that the image with coordinate information, namely the depth image, can be formed.
Step S7: and establishing a stereo feature set according to the depth image.
The method specifically comprises the following steps:
step S701: and searching the boundary of the depth image.
Step S702: and partitioning the depth image according to the boundary, and calculating the size and the shape of each depth image and the distance between each depth image and the left camera and the right camera.
Step S703: and establishing a stereo feature set according to the size and the shape of each depth image and the distance between each depth image and the left camera and the right camera.
Step S8: and extracting the texture features of the image acquired by any camera to establish a texture feature set.
Specifically, the texture features of the left image acquired by the left camera may be extracted to establish a left image texture feature set, the texture features of the right image acquired by the right camera may also be extracted to establish a right image texture feature set, for convenience of calculation, in this embodiment, the left image is converted into a left gray image, the Haar features of the left gray image are extracted, and a left image texture feature set is established, where the Haar features include an original moment feature, an edge feature, a linear feature, and a center surrounding feature.
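A minimal integral-image sketch of a two-rectangle (edge-type) Haar feature; the original-moment, linear and center-surround templates follow the same pattern of rectangle sums.

```python
import numpy as np

def integral_image(gray):
    """Integral image with a zero first row/column, so rectangle sums need no border checks."""
    ii = np.cumsum(np.cumsum(gray.astype(np.int64), axis=0), axis=1)
    return np.pad(ii, ((1, 0), (1, 0)))

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle with top-left corner (x, y) and size w x h."""
    return int(ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x])

def haar_edge_feature(ii, x, y, w, h):
    """Two-rectangle edge feature: left half minus right half."""
    return rect_sum(ii, x, y, w // 2, h) - rect_sum(ii, x + w // 2, y, w // 2, h)
```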
Step S9: and extracting the color characteristics of the image acquired by any camera to establish a color characteristic set.
Specifically, the color feature of the left image acquired by the left camera may be extracted to establish a left image color feature set, or the color feature of the right image acquired by the right camera may be extracted to establish a right image color feature set, for convenience of calculation, the left image is converted into color block images in this embodiment, and the wavelength value of the color of each point in each color block image color space is acquired; and establishing a left image color feature set by using the wavelength value.
Step S10: and inputting the three-dimensional feature set, the texture feature set and the color feature set into a classifier for classification, and identifying to obtain the target obstacle in the environment object.
In this embodiment, the stereo feature set, the left image texture feature set and the left image color feature set are input into a trained classifier for classification, and the target obstacle in the environmental object is identified. The trained classifier includes a multi-layer two-dimensional feature classifier established from the texture features extracted from known camera images, and a multi-layer stereo-color feature classifier established from the stereo features extracted from known depth images and the color feature sets extracted from known camera images. The multi-layer two-dimensional feature classifier performs pre-classification, and the multi-layer stereo-color feature classifier further filters and classifies the pre-classification results by using the three-dimensional coordinate information and the color information.
In this embodiment, a novel Cascade Classifier, i.e. a classification and recognition method in which a plurality of feature classifiers are connected in series, is adopted, so that a high classification accuracy can be achieved with few feature dimensions and in a short time. For example, suppose a general classifier needs 200-dimensional features to achieve 99% accuracy with a 1% false detection rate, whereas with only 10-dimensional features a single stage achieves 99.9% accuracy with a 50% false detection rate. By cascading, assuming a 10-level cascade, the finally obtained accuracy and false detection rate are respectively
TPR = (0.999)^10 ≈ 99.0%
FPR = (0.5)^10 ≈ 0.1%
The above analysis shows that the cascade classifier can classify faster and better with fewer features and simpler classifiers. In addition, during detection, because the TPR of each stage is high, detection of a region can be stopped as soon as one stage decides it is not a target; since non-target regions occupy most of the image in most detection applications, most detection windows can be rejected quickly, which greatly improves the classification speed.
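The cascade arithmetic above, spelled out:

```python
stages, stage_tpr, stage_fpr = 10, 0.999, 0.5
print(f"TPR = {stage_tpr ** stages:.3f}")   # ~0.990, i.e. 99.0%
print(f"FPR = {stage_fpr ** stages:.4f}")   # ~0.0010, i.e. 0.1%
```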
In this embodiment, a multi-layer two-dimensional Haar feature classifier and a multi-layer stereo-color feature classifier are cascaded to quickly classify the target obstacles contained in the environmental objects. The multi-layer two-dimensional Haar classifier performs multi-layer analysis on 20 pixel features of a target to preliminarily identify a number of possible targets, and the multi-layer stereo-color feature classifier then further filters and classifies these possible targets. Specifically:
the multi-layer two-dimensional Haar feature classifier converts the image into Haar feature vectors, and calculates Haar feature values of the regions according to the following formula:
featureValueHaar(x)=weightall×∑Pixel∈allPixel+weightblack×∑Pixel∈ blackPixel
comparing the obtained calculation result with the characteristic value in the xml configuration file to obtain a plurality of intermediate results;
the multilayer stereo-color feature classifier maps coordinate information into a three-dimensional coordinate system, and a fitting curved surface is formed after distance vector extreme values of suspicious target regions given by the multilayer two-dimensional Haar feature classifier are removed:
Figure BDA0001420739330000191
lny thereini=zi
Figure BDA0001420739330000192
Then, taking the coordinate center of the curved surface as the origin and the tangential direction of the whole curved surface as the y axis, coordinate conversion is carried out, the vector set of each point is extracted, and the final feature value is obtained through the following calculation:
featureValue(x) = weight_x × Σ_{Pixel∈x} Pixel + weight_y × Σ_{Pixel∈y} Pixel + weight_z × Σ_{Pixel∈z} Pixel + weight_color × Σ_{Pixel∈color} Pixel
wherein weight_x, weight_y and weight_z are respectively the weight values of the pixel points in the x, y and z directions, weight_color is the weight value of the wavelength corresponding to the color, and Σ_{Pixel∈x} Pixel, Σ_{Pixel∈y} Pixel and Σ_{Pixel∈z} Pixel are respectively the sums of the coordinates of the pixel points meeting the conditions in the x, y and z directions, with weight_x = 0.1, weight_y = 0.1, weight_z = 0.53 and weight_color = 0.27;
The result obtained through the above process is the final recognition result.
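A direct transcription of the weighted sum above; which pixels "meet the conditions" in each direction is left as assumed boolean masks.

```python
import numpy as np

WEIGHTS = {'x': 0.1, 'y': 0.1, 'z': 0.53, 'color': 0.27}  # weights given above

def feature_value(coords, wavelengths, masks):
    """coords: (N, 3) pixel coordinates in the fitted-surface frame,
    wavelengths: (N,) color wavelengths, masks: dict of boolean (N,) selections."""
    return (WEIGHTS['x'] * coords[masks['x'], 0].sum()
            + WEIGHTS['y'] * coords[masks['y'], 1].sum()
            + WEIGHTS['z'] * coords[masks['z'], 2].sum()
            + WEIGHTS['color'] * wavelengths[masks['color']].sum())
```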
The dual-camera stereoscopic vision identification method of this embodiment first corrects the two images acquired by the cameras, weights the gray value of each point in the two images with the wavelength value of the color of the corresponding point to obtain two corresponding gray images with color characteristics, and then matches the two images, so that the number of matching points can be increased and the matching precision and the identification accuracy are improved. The establishment of the multi-layer stereo-color feature classifier realizes stereo position detection at the level of a single pixel point and classification detection of stereo position features; it can improve the imaging quality and depth detection accuracy of the double cameras, reduce the distortion rate of depth detection while the cameras are in motion, and add classification detection of stereo features, making the method more suitable for robots.
The double-camera stereoscopic vision identification method in the embodiment solves the problems that double-camera imaging points are sparse, distance is distorted when the cameras move, the requirement on the quality of the cameras is high, a classifier cannot detect stereoscopic features and the like, meets the requirements of a robot on visual identification and three-dimensional imaging in a motion state, realizes continuous and reliable identification of objects such as human faces and the like, and has certain capability of distinguishing real objects, photos and images.
The invention also provides a double-camera stereoscopic vision identification system, and the double-camera stereoscopic vision identification method is applied to the system.
Fig. 4 is a flowchart of a stereoscopic vision recognition system with two cameras according to an embodiment of the present invention.
Referring to fig. 4, the dual-camera stereo vision recognition system 40 of the embodiment includes:
an internal parameter and external parameter obtaining module 401, configured to obtain internal parameters and external parameters of the camera.
The internal parameter and external parameter obtaining module 401 specifically includes:
the chessboard pattern image acquisition unit is used for acquiring an image of a chessboard pattern by using a camera, and the chessboard pattern is composed of a plurality of black and white alternating squares and a plurality of black hollow circles.
And the binarization unit is used for binarizing the image acquired by the camera and identifying black hollow circles and common angular points in the checkerboard by using the boundary after binarization, the black hollow circles are positioned at four corners of the checkerboard, the black hollow circles are used for searching and positioning the checkerboard, and the common angular points are common points of two black squares and two white squares.
A projection matrix establishing unit, for establishing a projection matrix by using the points on the chessboard plane and the corresponding points on the image acquired by the camera, according to the black hollow circles and the common angular points:
s·[u, v, 1]^T = H·[X_W, Y_W, 1]^T
wherein s is a scale factor, u is the abscissa and v the ordinate of a point on the image acquired by the camera, H is the projection matrix,
H = [h11 h12 h13; h21 h22 h23; h31 h32 h33]
and X_W is the abscissa and Y_W the ordinate of the corresponding point on the chessboard plane.
And the projection matrix solving unit is used for solving the projection matrix H by using a least square method.
And the internal parameter acquisition unit is used for acquiring the internal parameters of the camera according to the projection matrix H.
And the external parameter acquisition unit is used for acquiring the external parameters of the camera according to the projection matrix and the internal parameters of the camera.
An image obtaining module 402, configured to obtain a left image and a right image containing an environmental object by using the left and right cameras, respectively.
And a rectification module 403, configured to rectify the image acquired by the camera with internal parameters and external parameters of the camera.
A left grayscale image obtaining module 404, configured to convert the left image into a left grayscale image, obtain a wavelength value of a color of each point in a left image color space, and weight the grayscale value of each point in the left image and the wavelength value of the color of the corresponding point to obtain a left grayscale image with color characteristics.
A right grayscale image obtaining module 405, configured to convert the right image into a right grayscale image, obtain a wavelength value of a color of each point in a right image color space, and weight the grayscale value of each point in the right image and the wavelength value of the color of the corresponding point to obtain the right grayscale image with color characteristics.
An omnidirectional operator calculating module 406, configured to calculate omnidirectional operators of the left grayscale image with the color feature and the right grayscale image with the color feature respectively, where the omnidirectional operators are
Sobel(x,y)=2*[P(x+1,y)-P(x-1,y)]+P(x+1,y-1)-P(x-1,y-1)+P(x+1,y+1)-P(x-1,y+1)+2*[P(x,y+1)-P(x,y-1)]+P(x-1,y+1)-P(x-1,y-1)+P(x+1,y+1)-P(x+1,y-1)
Wherein Sobel (x, y) represents an omnidirectional operator of a point (x, y) on the image, P represents a pixel of the image, and P (x, y) represents a pixel of the point (x, y) on the image.
A mapping module 407, configured to map the left grayscale image with color features and the right grayscale image with color features by using a mapping function, where a pixel of the mapped image is
[equation image]
wherein PreFilterCap is a constant parameter with the value of 15.
The matching module 408 is configured to perform pixel-level point matching on the left grayscale image with the color feature and the corresponding right grayscale image with the color feature, and obtain a matching point of the left grayscale image with the color feature and the right grayscale image with the color feature.
The matching module 408 specifically includes:
and the matching point acquisition unit is used for matching by utilizing an SAD algorithm according to the mapped pixels of the left gray image with the color characteristics and the mapped pixels of the right gray image with the color characteristics to acquire the matching points of the mapped left gray image with the color characteristics and the mapped right gray image with the color characteristics.
And a depth image generating module 409, configured to generate a depth image using the matching points.
The depth image generation module 409 specifically includes:
and the measured point coordinate acquisition unit is used for calculating the real coordinates of each pair of matching points and obtaining the three-dimensional space coordinates of the measured point corresponding to each pair of matching points by utilizing the parallax principle, wherein the real coordinates are the coordinates of the matching points in the real space relative to a three-dimensional coordinate system taking the centers of the two cameras as the origin.
And the depth image generating unit is used for generating a depth image by using the three-dimensional space coordinates of all the measured points corresponding to the matching points.
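A sketch of the parallax principle used by the measured point coordinate acquisition unit: the depth of a point is Z = f*B/d, where f is the focal length in pixels, B the baseline and d the horizontal disparity. The focal length, baseline, principal point and the shift to the midpoint of the baseline below are example assumptions.

def triangulate(xl, xr, y, focal_px, baseline_m, cx, cy):
    # Parallax principle: disparity d = xl - xr (pixels), depth Z = f*B/d.
    # Coordinates are expressed relative to a frame centred between the two
    # cameras; parameter names and values are illustrative.
    d = xl - xr
    if d <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    Z = focal_px * baseline_m / d
    X = (xl - cx) * Z / focal_px - baseline_m / 2.0   # shift to the baseline midpoint
    Y = (y - cy) * Z / focal_px
    return X, Y, Z

print(triangulate(xl=340, xr=320, y=250, focal_px=700.0, baseline_m=0.12, cx=320, cy=240))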
A stereo feature set establishing module 410, configured to establish a stereo feature set according to the depth image.
The stereo feature set establishing module 410 specifically includes:
a boundary searching unit, configured to search for a boundary of the depth image;
the partitioning and calculating unit is used for partitioning the depth images according to the boundaries and calculating the size and the shape of each depth image and the distance between each depth image and the left camera and the right camera;
the stereoscopic feature set establishing unit is used for establishing a stereoscopic feature set according to the size and the shape of each depth image and the distance between each depth image and the left camera and the right camera;
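One possible way to realise the boundary search and partitioning is sketched below with OpenCV contours; the specific descriptors (area, bounding-box aspect ratio, mean distance) and the thresholds are illustrative assumptions rather than the patent's exact features.

import numpy as np
import cv2

def stereo_features(depth, max_range=5.0):
    # Segment the depth image into regions bounded by the detected contours and
    # describe each region by size, shape (aspect ratio) and mean distance.
    valid = ((depth > 0) & (depth < max_range)).astype(np.uint8) * 255
    contours, _ = cv2.findContours(valid, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    feats = []
    for c in contours:
        area = cv2.contourArea(c)
        if area < 25:                       # ignore tiny speckles
            continue
        x, y, w, h = cv2.boundingRect(c)
        mask = np.zeros_like(valid)
        cv2.drawContours(mask, [c], -1, 255, thickness=-1)
        mean_dist = float(depth[mask > 0].mean())
        feats.append({"area": area, "aspect": w / h, "distance": mean_dist})
    return feats

depth = np.zeros((60, 80), np.float32)
depth[20:40, 30:50] = 2.0        # one box-shaped obstacle 2 m away
print(stereo_features(depth))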
the texture feature set creating module 411 is configured to extract texture features of an image acquired by any one of the cameras, and create a texture feature set.
The texture feature set creating module 411 specifically includes:
the texture feature set establishing unit is used for converting a left image into a left gray image, extracting Haar features of the left gray image and establishing a texture feature set, wherein the Haar features comprise original moment features, edge features, linear features and center surrounding features.
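A minimal integral-image sketch of two of the Haar feature types mentioned above (edge and line features); the window positions and sizes are arbitrary examples, and the original moment and centre-surround features are omitted for brevity.

import numpy as np

def integral(img):
    # Summed-area table with a zero top row / left column for easy slicing.
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def box_sum(ii, r, c, h, w):
    # Sum of pixels in the h x w box whose top-left corner is (r, c).
    return ii[r + h, c + w] - ii[r, c + w] - ii[r + h, c] + ii[r, c]

def haar_edge_feature(ii, r, c, h, w):
    # Two-rectangle (edge) Haar feature: left half minus right half.
    return box_sum(ii, r, c, h, w // 2) - box_sum(ii, r, c + w // 2, h, w // 2)

def haar_line_feature(ii, r, c, h, w):
    # Three-rectangle (line) Haar feature: middle third minus the outer thirds.
    third = w // 3
    return (box_sum(ii, r, c + third, h, third)
            - box_sum(ii, r, c, h, third)
            - box_sum(ii, r, c + 2 * third, h, third))

gray = (np.random.rand(24, 24) * 255).astype(np.uint8)
ii = integral(gray)
print(haar_edge_feature(ii, 4, 4, 8, 8), haar_line_feature(ii, 4, 3, 8, 9))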
The color feature set establishing module 412 is configured to extract color features of an image acquired by any one of the cameras, and establish a color feature set.
The color feature set establishing module 412 specifically includes:
and the wavelength value acquisition unit is used for converting the left image into color block images and acquiring the wavelength value of the color of each point in the color space of each color block image.
And the color feature set establishing unit is used for establishing a color feature set by using the wavelength value.
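A sketch of building the color feature set from patch images: the image is divided into blocks ("color patches") and the mean hue of each block is mapped to an approximate wavelength. The grid size and the hue-to-wavelength mapping are the same illustrative assumptions used earlier, not values disclosed in the patent.

import numpy as np
import cv2

def color_feature_set(bgr, grid=(4, 4)):
    # Divide the image into grid blocks and collect one approximate wavelength
    # value (from the mean hue) per block.
    hue = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)[..., 0].astype(np.float64)
    h, w = hue.shape
    bh, bw = h // grid[0], w // grid[1]
    feats = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            block = hue[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            feats.append(620.0 - block.mean() / 179.0 * (620.0 - 380.0))
    return np.array(feats)

demo = (np.random.rand(64, 64, 3) * 255).astype(np.uint8)
print(color_feature_set(demo).shape)      # 16 wavelength values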
And the classification module 413 is configured to input the stereo feature set, the texture feature set, and the color feature set into a classifier for classification, and identify a target obstacle in the environmental object.
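The patent does not name a particular classifier. Purely for illustration, the sketch below concatenates the three feature sets into one vector per candidate region and feeds it to a scikit-learn SVM trained on made-up data; the feature dimensions, labels and library choice are assumptions.

import numpy as np
from sklearn.svm import SVC

# Hypothetical training data: each row concatenates the stereo, texture and
# color feature vectors of one candidate region; label 1 marks a target obstacle.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 3 + 2 + 16))     # e.g. 3 stereo + 2 texture + 16 color features
y_train = rng.integers(0, 2, size=40)

clf = SVC(kernel="rbf").fit(X_train, y_train)
candidate = rng.normal(size=(1, 21))
print("target obstacle" if clf.predict(candidate)[0] == 1 else "background")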
The double-camera stereoscopic vision identification system in this embodiment can increase the number of matching points and improve the matching precision, thereby improving the identification accuracy; it can also improve the imaging quality and depth detection accuracy of the two cameras and reduce the distortion rate of depth detection while the cameras are in motion, making the system better suited to robots.
Since the system disclosed in this embodiment corresponds to the method disclosed above, its description is relatively brief; for the relevant points, reference may be made to the description of the method.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the invention. Meanwhile, a person skilled in the art may, according to the idea of the present invention, make changes to the specific embodiments and the scope of application. In view of the above, the contents of this specification should not be construed as limiting the invention.

Claims (10)

1. A method for identifying stereoscopic vision of double cameras is characterized by comprising the following steps:
respectively acquiring a left image and a right image containing an environmental object by using a left camera and a right camera;
converting the left image into a left gray image, acquiring the wavelength value of the color of each point in the color space of the left image, and weighting the gray value of each point in the left image and the wavelength value of the color of the corresponding point to obtain the left gray image with color characteristics;
converting the right image into a right gray image, acquiring the wavelength value of the color of each point in the color space of the right image, and weighting the gray value of each point in the right image and the wavelength value of the color of the corresponding point to obtain the right gray image with color characteristics;
performing pixel-level point matching on the left gray image with the color feature and the corresponding right gray image with the color feature to obtain matching points of the left gray image with the color feature and the right gray image with the color feature;
generating a depth image by using the matching points;
establishing a stereo feature set according to the depth image; the establishing of the stereo feature set according to the depth image specifically includes: searching the boundary of the depth image; partitioning the depth images according to the boundaries, and calculating the size and shape of each depth image and the distance between each depth image and the left camera and the right camera; establishing a stereo feature set according to the size and shape of each depth image and the distance between each depth image and the left camera and the right camera;
extracting texture features of an image acquired by any camera, and establishing a texture feature set;
extracting color features of an image acquired by any camera, and establishing a color feature set; the extracting of the color feature of the image obtained by any camera and the establishment of the color feature set specifically include: converting the left image into color block images, and acquiring the wavelength value of the color of each point in the color space of each color block image; establishing a color feature set by using the wavelength values;
and inputting the three-dimensional feature set, the texture feature set and the color feature set into a classifier for classification, and identifying to obtain the target obstacle in the environment object.
2. The method for stereoscopic vision identification of two cameras according to claim 1, further comprising: before a left image and a right image containing an environmental object are respectively obtained by using a left camera and a right camera, internal parameters and external parameters of the cameras are obtained; after a left image and a right image containing an environmental object are respectively obtained by a left camera and a right camera, correcting the images obtained by the cameras by using internal parameters and external parameters of the cameras;
the internal parameter and the external parameter of the camera are obtained, and the method specifically comprises the following steps:
acquiring an image of a chessboard graph by using a camera, wherein the chessboard graph is composed of a plurality of black and white alternating squares and a plurality of black hollow circles;
carrying out binarization on an image acquired by a camera, and identifying black hollow circles and common angular points in a chessboard graph by using a boundary after binarization, wherein the black hollow circles are positioned at four corners of the chessboard graph, the black hollow circles are used for searching and positioning the chessboard grids, and the common angular points are common points of two black squares and two white squares;
establishing a projection matrix by using points on a chessboard drawing plane and corresponding points on an image acquired by a camera according to the black hollow circles and the common angular points
s*[u, v, 1]^T = H*[X_W, Y_W, 1]^T
Wherein s is a scale factor, u is an abscissa of a point on the image acquired by the camera, v is an ordinate of a point on the image acquired by the camera, H is a projection matrix,
Figure FDA0002314517890000022
X_W is the abscissa of a point on the checkerboard plane, and Y_W is the ordinate of a point on the checkerboard plane;
solving the projection matrix H by using a least square method;
acquiring internal parameters of the camera according to the projection matrix H;
and acquiring external parameters of the camera according to the projection matrix and the internal parameters of the camera.
3. The method of claim 1, wherein pixel-level point matching is performed on the left grayscale image with color features and the right grayscale image with color features, and before the matching points of the left grayscale image with color features and the right grayscale image with color features are obtained, the method further comprises:
respectively calculating the omnidirectional operators of the left gray level image with the color characteristic and the right gray level image with the color characteristic, wherein the omnidirectional operators are
Sobel(x,y)=2*[P(x+1,y)-P(x-1,y)]+P(x+1,y-1)-P(x-1,y-1)+P(x+1,y+1)-P(x-1,y+1)+2*[P(x,y+1)-P(x,y-1)]+P(x-1,y+1)-P(x-1,y-1)+P(x+1,y+1)-P(x+1,y-1)
Wherein Sobel (x, y) represents an omnidirectional operator of a point (x, y) on the image, P represents a pixel of the image, and P (x, y) represents a pixel of the point (x, y) on the image;
mapping the left gray level image with the color characteristic and the right gray level image with the color characteristic by using a mapping function, wherein the pixel of the mapped image is
Figure FDA0002314517890000031
Wherein prefilterCap is a constant parameter with a value of 15;
the pixel-level point matching is performed on the left grayscale image with the color feature and the right grayscale image with the color feature, so as to obtain a matching point between the left grayscale image with the color feature and the right grayscale image with the color feature, and the method specifically includes:
and matching by using an SAD algorithm according to the mapped pixels of the left gray image with the color features and the mapped pixels of the right gray image with the color features to obtain matching points of the mapped left gray image with the color features and the mapped right gray image with the color features.
4. The method for stereoscopic vision identification with two cameras according to claim 1, wherein the generating a depth image by using the matching points specifically includes:
calculating the real coordinates of each pair of matching points, and obtaining the three-dimensional space coordinates of the measured point corresponding to each pair of matching points by using a parallax principle, wherein the real coordinates are the coordinates of the matching points in the real space relative to a three-dimensional coordinate system taking the centers of the two cameras as the origin;
and generating a depth image by using the three-dimensional space coordinates of all the measured points corresponding to the matching points.
5. The method for stereoscopic vision identification with two cameras according to claim 1, wherein the extracting texture features of the image obtained by any one camera and establishing a texture feature set specifically comprises:
converting a left image into a left gray image, extracting Haar features of the left gray image, and establishing a texture feature set, wherein the Haar features comprise original moment features, edge features, linear features and center surrounding features.
6. A dual-camera stereo vision recognition system, comprising:
the image acquisition module is used for respectively acquiring a left image and a right image containing an environmental object by utilizing the left camera and the right camera;
the left gray image acquisition module is used for converting the left image into a left gray image, acquiring the wavelength value of the color of each point in the color space of the left image, and weighting the gray value of each point in the left image and the wavelength value of the color of the corresponding point to obtain the left gray image with color characteristics;
the right gray image acquisition module is used for converting the right image into a right gray image, acquiring the wavelength value of the color of each point in the color space of the right image, and weighting the gray value of each point in the right image and the wavelength value of the color of the corresponding point to obtain the right gray image with color characteristics;
the matching module is used for performing pixel-level point matching on the left gray image with the color features and the corresponding right gray image with the color features to obtain matching points of the left gray image with the color features and the right gray image with the color features;
the depth image generation module is used for generating a depth image by using the matching points;
the stereoscopic feature set establishing module is used for establishing a stereoscopic feature set according to the depth image;
the texture feature set establishing module is used for extracting the texture features of the image acquired by any camera and establishing a texture feature set;
the color feature set establishing module is used for extracting the color features of the image acquired by any camera and establishing a color feature set;
the classification module is used for inputting the three-dimensional feature set, the texture feature set and the color feature set into a classifier for classification, and identifying and obtaining a target obstacle in the environment object;
the stereoscopic feature set establishing module specifically includes:
a boundary searching unit, configured to search for a boundary of the depth image;
the partitioning and calculating unit is used for partitioning the depth images according to the boundaries and calculating the size and the shape of each depth image and the distance between each depth image and the left camera and the right camera;
the stereoscopic feature set establishing unit is used for establishing a stereoscopic feature set according to the size and the shape of each depth image and the distance between each depth image and the left camera and the right camera;
the color feature set establishing module specifically includes:
a wavelength value acquisition unit for converting the left image into patch images and acquiring a wavelength value of a color of each point in a color space of each patch image;
and the color feature set establishing unit is used for establishing a color feature set by using the wavelength value.
7. The dual-camera stereo vision recognition system of claim 6, further comprising: the device comprises an internal parameter and external parameter acquisition module and a correction module, wherein the internal parameter and external parameter acquisition module is used for acquiring internal parameters and external parameters of the camera, and the correction module is used for correcting images acquired by the camera by using the internal parameters and the external parameters of the camera;
the internal parameter and external parameter obtaining module specifically includes:
the chessboard pattern image acquisition unit is used for acquiring an image of a chessboard pattern by using a camera, wherein the chessboard pattern is composed of a plurality of black and white alternating squares and a plurality of black hollow circles;
the binarization unit is used for binarizing the image acquired by the camera and identifying black hollow circles and common angular points in the checkerboard image by using the boundary after binarization, wherein the black hollow circles are positioned at four corners of the checkerboard image, the black hollow circles are used for searching and positioning the checkerboard, and the common angular points are common points of two black squares and two white squares;
a projection matrix establishing unit for establishing a projection matrix by using the points on the chessboard drawing plane and the corresponding points on the image acquired by the camera according to the black hollow circle and the common angular point
s*[u, v, 1]^T = H*[X_W, Y_W, 1]^T
Wherein s is a scale factor, u is an abscissa of a point on the image acquired by the camera, v is an ordinate of a point on the image acquired by the camera, H is a projection matrix,
Figure FDA0002314517890000052
X_W is the abscissa of a point on the checkerboard plane, and Y_W is the ordinate of a point on the checkerboard plane;
the projection matrix solving unit is used for solving the projection matrix H by using a least square method;
the internal parameter acquisition unit is used for acquiring internal parameters of the camera according to the projection matrix H;
and the external parameter acquisition unit is used for acquiring the external parameters of the camera according to the projection matrix and the internal parameters of the camera.
8. The dual-camera stereo vision recognition system of claim 6, further comprising:
an omnidirectional operator calculating module, configured to calculate omnidirectional operators of the left grayscale image with the color feature and the right grayscale image with the color feature respectively, where the omnidirectional operators are
Sobel(x,y)=2*[P(x+1,y)-P(x-1,y)]+P(x+1,y-1)-P(x-1,y-1)+P(x+1,y+1)-P(x-1,y+1)+2*[P(x,y+1)-P(x,y-1)]+P(x-1,y+1)-P(x-1,y-1)+P(x+1,y+1)-P(x+1,y-1)
Wherein Sobel (x, y) represents an omnidirectional operator of a point (x, y) on the image, P represents a pixel of the image, and P (x, y) represents a pixel of the point (x, y) on the image;
a mapping module for mapping the left gray image with color features and the right gray image with color features by using a mapping function, wherein the pixels of the mapped images are
Figure FDA0002314517890000053
Wherein prefilterCap is a constant parameter with a value of 15;
the matching module specifically comprises:
and the matching point acquisition unit is used for matching by utilizing an SAD algorithm according to the mapped pixels of the left gray image with the color characteristics and the mapped pixels of the right gray image with the color characteristics to acquire the matching points of the mapped left gray image with the color characteristics and the mapped right gray image with the color characteristics.
9. The dual-camera stereoscopic vision recognition system of claim 6, wherein the depth image generation module specifically comprises:
the measured point coordinate acquisition unit is used for calculating the real coordinates of each pair of matching points and obtaining the three-dimensional space coordinates of the measured point corresponding to each pair of matching points by utilizing the parallax principle, wherein the real coordinates are the coordinates of the matching points in the real space relative to a three-dimensional coordinate system taking the centers of the two cameras as the origin;
and the depth image generating unit is used for generating a depth image by using the three-dimensional space coordinates of all the measured points corresponding to the matching points.
10. The dual-camera stereo vision recognition system of claim 6, wherein the texture feature set establishing module specifically comprises:
the texture feature set establishing unit is used for converting a left image into a left gray image, extracting Haar features of the left gray image and establishing a texture feature set, wherein the Haar features comprise original moment features, edge features, linear features and center surrounding features.
CN201710888927.7A 2017-09-27 2017-09-27 Double-camera stereoscopic vision identification method and system Active CN107679542B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710888927.7A CN107679542B (en) 2017-09-27 2017-09-27 Double-camera stereoscopic vision identification method and system

Publications (2)

Publication Number Publication Date
CN107679542A CN107679542A (en) 2018-02-09
CN107679542B true CN107679542B (en) 2020-08-11

Family

ID=61136833

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710888927.7A Active CN107679542B (en) 2017-09-27 2017-09-27 Double-camera stereoscopic vision identification method and system

Country Status (1)

Country Link
CN (1) CN107679542B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109001230A (en) * 2018-05-28 2018-12-14 中兵国铁(广东)科技有限公司 Welding point defect detection method based on machine vision
CN108765482B (en) * 2018-05-31 2021-07-13 长春博立电子科技有限公司 Low-power-consumption real-time binocular camera based on hardware acceleration and use method
CN109727277B (en) * 2018-12-28 2022-10-28 江苏瑞尔医疗科技有限公司 Body surface positioning tracking method for multi-eye stereo vision
CN110716212B (en) * 2019-11-14 2021-07-27 吉林大学 Method and system for detecting road surface obstacle

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1431626A (en) * 2002-12-27 2003-07-23 南京师范大学 Image information extraction of thematic mapping plotters and generation of color picture from black and white pictures
CN102722080A (en) * 2012-06-27 2012-10-10 绍兴南加大多媒体通信技术研发有限公司 Multifunctional three-dimensional shooting method based on multiple-lens shooting
CN102761768A (en) * 2012-06-28 2012-10-31 中兴通讯股份有限公司 Method and device for realizing three-dimensional imaging
CN105430295A (en) * 2015-10-30 2016-03-23 努比亚技术有限公司 Device and method for image processing
CN105812649A (en) * 2014-12-31 2016-07-27 联想(北京)有限公司 Photographing method and device
CN106937049A (en) * 2017-03-09 2017-07-07 广东欧珀移动通信有限公司 The processing method of the portrait color based on the depth of field, processing unit and electronic installation

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A Data Fusion Model based on Blogosphere Evaluation; Yu Weng; 2013 6th International Conference on Intelligent Networks and Intelligent Systems; 2013-11-03; pp. 268-270 *
A new obstacle detection method based on depth images; Wang Teng et al.; Science of Surveying and Mapping; 2016-11-09; Vol. 42, No. 6; pp. 104-111 *
A computer stereo vision development platform based on dual cameras; Wang Lichao et al.; Research and Exploration in Laboratory; 2007-03-15; Vol. 26, No. 3; pp. 58-60 *

Also Published As

Publication number Publication date
CN107679542A (en) 2018-02-09

Similar Documents

Publication Publication Date Title
CN107679542B (en) Double-camera stereoscopic vision identification method and system
Kurka et al. Applications of image processing in robotics and instrumentation
CN101443817B (en) Method and device for determining correspondence, preferably for the three-dimensional reconstruction of a scene
CN110569704A (en) Multi-strategy self-adaptive lane line detection method based on stereoscopic vision
CN109493384B (en) Camera pose estimation method, system, device and storage medium
CN116129037B (en) Visual touch sensor, three-dimensional reconstruction method, system, equipment and storage medium thereof
CN108182705A (en) A kind of three-dimensional coordinate localization method based on machine vision
Wang et al. Corners positioning for binocular ultra-wide angle long-wave infrared camera calibration
Yong-guo et al. The navigation of mobile robot based on stereo vision
Hamzah et al. A pixel to pixel correspondence and region of interest in stereo vision application
Su et al. Obtaining obstacle information by an omnidirectional stereo vision system
CN114092388A (en) Obstacle detection method based on monocular camera and odometer
Zhu et al. A filtering strategy for interest point detecting to improve repeatability and information content
Zhao et al. The obstacle avoidance and navigation based on stereo vision for mobile robot
Howells et al. Depth maps comparisons from monocular images by MiDaS convolutional neural networks and dense prediction transformers
Su Vanishing points in road recognition: A review
Karaca et al. Ground-based panoramic stereo hyperspectral imaging system with multiband stereo matching
Alhwarin Fast and robust image feature matching methods for computer vision applications
Iida et al. High-accuracy Range Image Generation by Fusing Binocular and Motion Stereo Using Fisheye Stereo Camera
Yun et al. 3D scene reconstruction system with hand-held stereo cameras
Xie et al. Real-time reconstruction of unstructured scenes based on binocular vision depth
Pirahansiah et al. Camera calibration for multi-modal robot vision based on image quality assessment
SrirangamSridharan et al. Object localization and size estimation from RGB-D images
Wang et al. 3-D Dimension Measurement of Workpiece Based on Binocular Vision
Pramote et al. Improve accuracy of disparity map for stereo images using SIFT and weighted color model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant