CN113792645A - AI eyeball fusing image and laser radar - Google Patents

AI eyeball fusing image and laser radar

Info

Publication number
CN113792645A
Authority
CN
China
Prior art keywords
target
camera
laser radar
image
coordinate system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111064566.7A
Other languages
Chinese (zh)
Inventor
黄明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN202111064566.7A
Publication of CN113792645A
Legal status: Pending

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S17/00 Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S17/86 Combinations of lidar systems with systems other than lidar, radar or sonar, e.g. with direction finders
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S17/00 Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S17/88 Lidar systems specially adapted for specific applications
    • G01S17/89 Lidar systems specially adapted for specific applications for mapping or imaging
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/50 Constructional details
    • H04N23/54 Mounting of pick-up tubes, electronic image sensors, deviation or focusing coils
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/80 Camera processing pipelines; Components thereof

Abstract

The invention relates to an AI eyeball fusing an image and a laser radar to realize intelligent detection and tracking of a target. The method comprises the following steps: step 1) establishing imaging models of the laser radar and the camera; step 2) jointly calibrating the laser radar and the camera and determining the conversion relationship between the laser radar coordinate system and the camera coordinate system, namely a rotation matrix R and a translation matrix t; step 3) collecting a video image in front of the camera lens and performing target detection with a convolutional neural network model to generate target candidate frames; step 4) extracting the position information of the feature points of each candidate frame and calculating their direction vectors in the camera coordinate system; step 5) calculating the corresponding direction vectors and azimuth angles in the laser radar coordinate system according to the projection transformation matrices R and t; step 6) having the laser radar scan the target in a targeted manner according to the azimuth angles and computing the target point cloud; and step 7) combining the image and the point cloud to output a video image carrying the target information. The scheme has a small demand on computing power, the system is stable and reliable, and it can be used for perceiving the surrounding environment and for target detection and tracking.

Description

AI eyeball fusing image and laser radar
Technical Field
The invention relates to a method, in particular to an AI eyeball fusing an image and a laser radar, and belongs to the technical field of visual perception.
Background
Road testing of driverless vehicles has been opened up in China, the intelligent-vehicle industry is being developed vigorously, the development of driverless technology is accelerating, and the commercial deployment of unmanned intelligent equipment is being promoted. An unmanned intelligent equipment system comprises perception, decision-making, execution and other modules. Environment perception is the foundation of unmanned intelligent equipment, and improving the environment perception capability, that is, how to perceive information such as the direction and relative speed of the surrounding environment with respect to the equipment, is an unavoidable task. Most existing perception methods are based on image data or point cloud data alone, i.e. environment perception by single 2D or 3D vision. Single vision requires a large field of view, ultra-high resolution, high processing capacity and so on, is easily affected by the external environment, reduces system reliability and has poor robustness. Multi-sensor fusion is a perception scheme proposed by some scholars at present: the respective advantages of the laser radar and the image vision sensor are combined to realize target detection, so that the system obtains richer and more effective target information; compared with a single sensor, the target information obtained by multi-sensor fusion is more intuitive and effective, with a larger field of view and higher robustness. However, current methods that combine laser radar and image data to detect and track a target usually perform obstacle identification only with the 3D point cloud data of the laser radar, map it onto the 2D information of a visible-light or infrared image, and then perform target detection and tracking on the mapped image data, which has the following problems:
1. The laser radar first performs a global scan of the surrounding environment to obtain point cloud data of the whole scene, and the region of interest is extracted only after interference, background and other information have been filtered out; since not all of the point cloud data is actually needed, this causes great waste of data and a heavy burden on computing power;
2. Because obstacle identification relies on the point cloud information of the laser radar, and image target detection and tracking are then performed on the identified point cloud obstacles, the whole system cannot work effectively when the point cloud resolution of a target is too low for identification, so the stability and reliability of the system are low. Therefore, a new solution to the above technical problems is urgently needed.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides an AI eyeball fusing an image and a laser radar: a target detection and tracking method for perceiving the surrounding environment that has a small demand on computing power and a stable, reliable system.
In order to achieve the above object, according to the technical solution of the present invention, an AI eyeball fusing an image and a laser radar includes:
step 1) establishing imaging models of a laser radar and a camera, and determining a linear relation between known parameters and unknown parameters;
step 2) jointly calibrating the laser radar and the camera, and determining the conversion relation between the laser radar coordinate system and the camera coordinate system, namely the projection transformation matrices R and t;
step 3) acquiring a video image in the front environment of a camera shooting lens, and performing target detection, target classification and target positioning through a convolutional neural network model to obtain a target candidate area and a target candidate frame;
step 4), extracting the position information of the feature points of the candidate frame, and calculating the direction vector of each feature point in a camera coordinate system;
step 5) calculating an orientation vector of each characteristic point under a laser radar coordinate system and an origin azimuth corresponding to the orientation vector according to the projection transformation matrix and the direction vector of the characteristic point in the step 4);
step 6), the operation and control unit controls the emergent light beam of the laser radar to rotate to a corresponding angle according to the azimuth of the origin in the step 5) and scans according to a certain rule to obtain point cloud data;
and 7) adding the target category information obtained in the step 3) and the target effective information obtained in the step 6), including target depth, position, speed and acceleration information, into the target candidate frame, and finally outputting the video image with the target information.
As an improvement of the present invention, step 1) establishes the imaging models of the laser radar and the camera as follows. Assuming that the coordinates of a spatial point P in the laser radar coordinate system and the image pixel coordinate system are (Xlp, Ylp, Zlp) and (μ, ν), respectively, then, according to the pinhole imaging relationship, the image-pixel relationship and the projective transformation relationship, the relationship between the two coordinate systems can be expressed as:
$$ Z_{sp}\begin{bmatrix}\mu \\ \nu \\ 1\end{bmatrix}=\begin{bmatrix}f/dx & 0 & \mu_{0} & 0 \\ 0 & f/dy & \nu_{0} & 0 \\ 0 & 0 & 1 & 0\end{bmatrix}\begin{bmatrix}R & t \\ 0^{T} & 1\end{bmatrix}\begin{bmatrix}X_{lp} \\ Y_{lp} \\ Z_{lp} \\ 1\end{bmatrix}\qquad(1) $$
where Zsp is a scale factor; f, dx, dy, μ0 and ν0 are the camera intrinsic parameters: f is the focal length of the camera, dx and dy denote the physical dimensions of each pixel in the image coordinate system, and (μ0, ν0) are the pixel coordinates of the intersection of the optical axis and the image plane; R and t form the projective transformation between the camera and the laser radar coordinate systems.
As an improvement of the invention, in step 2) the laser radar and the camera are jointly calibrated, and the conversion relation between the laser radar coordinate system and the camera coordinate system, namely the projection transformation matrix, is determined. The specific steps are as follows:
according to the relationship shown in equation (1), let:
$$ T=\begin{bmatrix}T_{11} & T_{12} & T_{13} & T_{14} \\ T_{21} & T_{22} & T_{23} & T_{24} \\ T_{31} & T_{32} & T_{33} & T_{34}\end{bmatrix}=\begin{bmatrix}f/dx & 0 & \mu_{0} & 0 \\ 0 & f/dy & \nu_{0} & 0 \\ 0 & 0 & 1 & 0\end{bmatrix}\begin{bmatrix}R & t \\ 0^{T} & 1\end{bmatrix}\qquad(2) $$
Substituting equation (2) into equation (1) gives:
$$ Z_{sp}\begin{bmatrix}\mu \\ \nu \\ 1\end{bmatrix}=T\begin{bmatrix}X_{lp} \\ Y_{lp} \\ Z_{lp} \\ 1\end{bmatrix}\qquad(3) $$
elimination of Zsp with the last row yields:
$$ \begin{cases}T_{11}X_{lp}+T_{12}Y_{lp}+T_{13}Z_{lp}+T_{14}-\mu\,(T_{31}X_{lp}+T_{32}Y_{lp}+T_{33}Z_{lp}+T_{34})=0 \\ T_{21}X_{lp}+T_{22}Y_{lp}+T_{23}Z_{lp}+T_{24}-\nu\,(T_{31}X_{lp}+T_{32}Y_{lp}+T_{33}Z_{lp}+T_{34})=0\end{cases}\qquad(4) $$
As can be seen from equation (4), each pair of corresponding points between the image captured by the camera and the point cloud of the laser radar yields two equations; that is, with n corresponding points the number of equations is 2n. Therefore, once corresponding points in the two coordinate systems are found, the projection transformation matrix T can be solved. Let the number of corresponding points be n (n ≥ 6) and set:
$$ A=\begin{bmatrix} X_{lp1}&Y_{lp1}&Z_{lp1}&1&0&0&0&0&-\mu_{1}X_{lp1}&-\mu_{1}Y_{lp1}&-\mu_{1}Z_{lp1}&-\mu_{1}\\ 0&0&0&0&X_{lp1}&Y_{lp1}&Z_{lp1}&1&-\nu_{1}X_{lp1}&-\nu_{1}Y_{lp1}&-\nu_{1}Z_{lp1}&-\nu_{1}\\ \vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots\\ X_{lpn}&Y_{lpn}&Z_{lpn}&1&0&0&0&0&-\mu_{n}X_{lpn}&-\mu_{n}Y_{lpn}&-\mu_{n}Z_{lpn}&-\mu_{n}\\ 0&0&0&0&X_{lpn}&Y_{lpn}&Z_{lpn}&1&-\nu_{n}X_{lpn}&-\nu_{n}Y_{lpn}&-\nu_{n}Z_{lpn}&-\nu_{n}\end{bmatrix}\qquad(5) $$
L = [T11 T12 T13 T14 T21 T22 T23 T24 T31 T32 T33 T34]^T    (6)
Equation (4) can thus be simplified to AL = 0. Applying singular value decomposition to the matrix A:
[U∑V]=SVD(A) (7)
u is made of AATIs formed by aTAnd the characteristic vector of the A is formed, so that a singular vector in the V corresponding to the minimum singular value in the singular value matrix sigma is the least square solution of the L, namely the matrix T, and then R and T are solved according to the formula (2) to determine the coordinate conversion relation between the camera and the laser radar sensor.
As an improvement of the present invention, in this embodiment a calibration board as shown in FIG. 3 is used to determine the corresponding points of the camera and the laser radar. Standard patterns, such as circular holes or regular quadrangles, are formed on the calibration board in advance; in this embodiment circular holes are taken as the example, the number of holes is greater than six, and the hole center positions are taken as the feature points.
In this embodiment, FIG. 4 shows the flow of solving the projection transformation matrix of the two sensors using the calibration board. First, the laser radar scans the calibration board to generate calibration-board point cloud data, and n (n ≥ 6) circular holes on the board are fitted from the point cloud to obtain the spatial coordinates (Xlpi, Ylpi, Zlpi), i = 1, 2, …, n, of the hole centers in the laser radar coordinate system. The camera then captures an image of the calibration board, the corresponding circles are identified in the image based on the Hough transform, and the coordinates (μi, νi), i = 1, 2, …, n, of the circle centers in the pixel coordinate system are obtained. Substituting the coordinates of the corresponding point pairs into equations (5) and (7) solves the projection transformation matrix T.
As an improvement of the present invention, in step 3) the camera captures a video image in front of the lens and performs target detection, target classification and target localization through a convolutional neural network, obtaining a target candidate region and a target candidate frame; FIG. 5 shows the algorithm flow of video-image target detection.
As an improvement of the present invention, step 4) extracts the feature point position information of the candidate frame and calculates the direction vector of each feature point in the camera coordinate system, specifically as follows:
taking a rectangular candidate frame, the rectangle ABCD, as an example of the generated target candidate frame, it follows from the pinhole imaging model that the direction vectors of the four corner points of the rectangle ABCD in the camera coordinate system are:
Oi = (Xi, Yi, Zi) = ((μi - μ0)dx/f, (νi - ν0)dy/f, 1), i = A, B, C, D    (8)
In equation (8), f, dx, dy, μ0 and ν0 are the camera intrinsic parameters: f is the focal length of the camera, dx and dy denote the physical dimensions of each pixel in the image coordinate system, and (μ0, ν0) are the pixel coordinates of the intersection of the optical axis and the image plane.
As an improvement of the invention, step 5) calculates the direction vector of each feature point in the laser radar coordinate system and the origin azimuth angle corresponding to that vector, according to the projection transformation matrix and the direction vectors of the feature points from step 4). The specific steps are as follows:
let Oi′ be the direction vector of the feature point in the laser radar coordinate system; then, according to the projection transformation matrix:
Oi′ = R·Oi + t = (Xi′, Yi′, Zi′), i = A, B, C, D    (9)
Let α and β be the azimuth angles of the origin corresponding to Oi′, as shown in FIG. 7; equation (10) then gives α and β in terms of the components Xi′, Yi′ and Zi′ of Oi′.
as an improvement of the invention, the operation and control unit in step 6) controls the emergent beam of the laser radar to rotate to a corresponding angle and scans according to a certain rule to obtain point cloud data according to the azimuth of the origin in step 5), and the laser radar requires that the emergent beam can reach any angle and position in the visual field through the control unit; the point cloud scanning rule is that effective information is acquired through scanning. The point cloud scanning rule is to acquire effective information through scanning, and an optional rule is as follows:
let n candidate frame feature points (n ≧ 4) be P1,P2,P3,…,PnIs provided with Pi∈(P1,P2,…,Pn/2) Then P isi+n/2∈(Pn/2,P(n/2)+1,…,Pn) Is connected to PiPi+n/2,PiPi+n/2The method comprises the steps of obtaining target information, namely a path to be passed by a laser beam, generating point cloud under the path and obtaining the target information; when n is a base number, calculation is optionally performed using (n-1) or (n + 1).
As an improvement of the present invention, step 7) adds the target category information obtained in step 3) and the effective target information obtained in step 6), including the target depth, position, velocity and acceleration, to the target candidate frame, and finally outputs the video image with the target information, which includes but is not limited to the target category, depth, position, velocity and acceleration.
Compared with the prior art, the invention has the following advantages:
1. according to the technical scheme, the laser radar is not used for carrying out global scanning on the surrounding environment, a large amount of point cloud data is not required to be processed, the data waste is avoided, and meanwhile the system has low demand on computing power;
2. The technical scheme of the invention adopts a camera-first, laser-radar-second mode; since the resolution of the camera is far higher than that of the laser radar, the whole system is guaranteed to work effectively in real time, and the system is stable and reliable;
3. the output result of the technical scheme of the invention is a video image with target information, which can be used for sensing the surrounding environment, detecting and tracking the target and providing a stable, reliable and effective signal for the upper-level system.
Drawings
FIG. 1 is a schematic flow chart of the present invention;
FIG. 2 is a schematic diagram of the relationship between the camera and the lidar coordinate system;
FIG. 3 is a schematic diagram of the calibration board described in the example, used to determine the corresponding feature points of the camera and the lidar;
FIG. 4 is a schematic flow chart of solving projective transformation matrices of two sensors;
FIG. 5 is a flow chart of an algorithm for video image target detection;
FIG. 6 is a schematic diagram of a rectangular candidate block;
FIG. 7 is a schematic view of the azimuth angles.
Detailed Description
For the purpose of enhancing the understanding of the present invention, the embodiment will be described in detail below with reference to the accompanying drawings.
Example 1: referring to FIG. 1 to FIG. 7, an AI eyeball fusing an image and a laser radar comprises the following steps:
step 1) establishing imaging models of a laser radar and a camera, and determining a linear relation between known parameters and unknown parameters;
step 2) jointly calibrating the laser radar and the camera, and determining the conversion relation between the laser radar coordinate system and the camera coordinate system, namely the projection transformation matrices R and t;
step 3) acquiring a video image in the front environment of a camera shooting lens, and performing target detection, target classification and target positioning through a convolutional neural network model to obtain a target candidate area and a target candidate frame;
step 4), extracting the position information of the feature points of the candidate frame, and calculating the direction vector of each feature point in a camera coordinate system;
step 5) calculating an orientation vector of each characteristic point under a laser radar coordinate system and an origin azimuth corresponding to the orientation vector according to the projection transformation matrix and the direction vector of the characteristic point in the step 4);
step 6), the operation and control unit controls the emergent light beam of the laser radar to rotate to a corresponding angle according to the azimuth of the origin in the step 5) and scans according to a certain rule to obtain point cloud data;
and 7) adding the target category information obtained in the step 3) and the target effective information obtained in the step 6), including target depth, position, speed and acceleration information, into the target candidate frame, and finally outputting the video image with the target information.
Step 1) establishes the imaging models of the laser radar and the camera, specifically as follows. FIG. 2 is a schematic diagram showing the relationship between the camera and the lidar coordinate systems in this embodiment. Assuming that the coordinates of a spatial point P in the lidar coordinate system and the image pixel coordinate system are (Xlp, Ylp, Zlp) and (μ, ν), respectively, then, according to the pinhole imaging relationship, the image-pixel relationship and the projective transformation relationship, the relationship between the two coordinate systems can be represented as:
$$ Z_{sp}\begin{bmatrix}\mu \\ \nu \\ 1\end{bmatrix}=\begin{bmatrix}f/dx & 0 & \mu_{0} & 0 \\ 0 & f/dy & \nu_{0} & 0 \\ 0 & 0 & 1 & 0\end{bmatrix}\begin{bmatrix}R & t \\ 0^{T} & 1\end{bmatrix}\begin{bmatrix}X_{lp} \\ Y_{lp} \\ Z_{lp} \\ 1\end{bmatrix}\qquad(1) $$
where Zsp is a scale factor; f, dx, dy, μ0 and ν0 are the camera intrinsic parameters: f is the focal length of the camera, dx and dy denote the physical dimensions of each pixel in the image coordinate system, and (μ0, ν0) are the pixel coordinates of the intersection of the optical axis and the image plane; R and t form the projective transformation between the camera and the laser radar coordinate systems.
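To make the model of equation (1) concrete, the following Python sketch (not part of the original disclosure; NumPy assumed, all names and numerical values illustrative) projects a point given in the lidar frame into pixel coordinates:

```python
import numpy as np

def project_lidar_point(P_l, f, dx, dy, mu0, nu0, R, t):
    """Project a lidar-frame point (Xlp, Ylp, Zlp) to pixel coordinates (mu, nu)
    following the pinhole model of equation (1)."""
    K = np.array([[f / dx, 0.0,    mu0],
                  [0.0,    f / dy, nu0],
                  [0.0,    0.0,    1.0]])
    P_c = R @ np.asarray(P_l, dtype=float) + t   # lidar frame -> camera frame
    uvw = K @ P_c                                # homogeneous pixel coordinates; uvw[2] is Zsp
    return uvw[:2] / uvw[2]

# Illustrative values only (identity extrinsics, 2 um pixels, 4 mm focal length)
print(project_lidar_point([1.0, 0.5, 10.0], f=4e-3, dx=2e-6, dy=2e-6,
                          mu0=960.0, nu0=540.0, R=np.eye(3), t=np.zeros(3)))
```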
Step 2) jointly calibrates the laser radar and the camera and determines the conversion relation between the laser radar coordinate system and the camera coordinate system, namely the projection transformation matrix. The specific steps are as follows:
according to the relationship shown in equation (1), let:
$$ T=\begin{bmatrix}T_{11} & T_{12} & T_{13} & T_{14} \\ T_{21} & T_{22} & T_{23} & T_{24} \\ T_{31} & T_{32} & T_{33} & T_{34}\end{bmatrix}=\begin{bmatrix}f/dx & 0 & \mu_{0} & 0 \\ 0 & f/dy & \nu_{0} & 0 \\ 0 & 0 & 1 & 0\end{bmatrix}\begin{bmatrix}R & t \\ 0^{T} & 1\end{bmatrix}\qquad(2) $$
Substituting equation (2) into equation (1) gives:
$$ Z_{sp}\begin{bmatrix}\mu \\ \nu \\ 1\end{bmatrix}=T\begin{bmatrix}X_{lp} \\ Y_{lp} \\ Z_{lp} \\ 1\end{bmatrix}\qquad(3) $$
elimination of Zsp with the last row yields:
$$ \begin{cases}T_{11}X_{lp}+T_{12}Y_{lp}+T_{13}Z_{lp}+T_{14}-\mu\,(T_{31}X_{lp}+T_{32}Y_{lp}+T_{33}Z_{lp}+T_{34})=0 \\ T_{21}X_{lp}+T_{22}Y_{lp}+T_{23}Z_{lp}+T_{24}-\nu\,(T_{31}X_{lp}+T_{32}Y_{lp}+T_{33}Z_{lp}+T_{34})=0\end{cases}\qquad(4) $$
As can be seen from equation (4), each pair of corresponding points between the image captured by the camera and the point cloud of the laser radar yields two equations; that is, with n corresponding points the number of equations is 2n. Therefore, once corresponding points in the two coordinate systems are found, the projection transformation matrix T can be solved. Let the number of corresponding points be n (n ≥ 6) and set:
$$ A=\begin{bmatrix} X_{lp1}&Y_{lp1}&Z_{lp1}&1&0&0&0&0&-\mu_{1}X_{lp1}&-\mu_{1}Y_{lp1}&-\mu_{1}Z_{lp1}&-\mu_{1}\\ 0&0&0&0&X_{lp1}&Y_{lp1}&Z_{lp1}&1&-\nu_{1}X_{lp1}&-\nu_{1}Y_{lp1}&-\nu_{1}Z_{lp1}&-\nu_{1}\\ \vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots\\ X_{lpn}&Y_{lpn}&Z_{lpn}&1&0&0&0&0&-\mu_{n}X_{lpn}&-\mu_{n}Y_{lpn}&-\mu_{n}Z_{lpn}&-\mu_{n}\\ 0&0&0&0&X_{lpn}&Y_{lpn}&Z_{lpn}&1&-\nu_{n}X_{lpn}&-\nu_{n}Y_{lpn}&-\nu_{n}Z_{lpn}&-\nu_{n}\end{bmatrix}\qquad(5) $$
L = [T11 T12 T13 T14 T21 T22 T23 T24 T31 T32 T33 T34]^T    (6)
Equation (4) can thus be simplified to AL = 0. Applying singular value decomposition to the matrix A:
[U∑V]=SVD(A) (7)
u is made of AATIs formed by aTA, therefore, the singular vector in V corresponding to the minimum singular value in the singular value matrix sigma is the least square solution of L, namely the matrix T, then R and T are solved according to the formula (2), and two sensors of the camera and the laser radar are determinedCoordinate conversion relationship between them.
In the embodiment, the calibration board shown in fig. 3 is used to determine the corresponding points of the camera and the laser radar, and standard patterns, such as round holes, a regular quadrangle, and the like, are formed on the calibration board in advance.
In this embodiment, FIG. 4 shows the flow of solving the projection transformation matrix of the two sensors using the calibration board. First, the laser radar scans the calibration board to generate calibration-board point cloud data, and n (n ≥ 6) circular holes on the board are fitted from the point cloud to obtain the spatial coordinates (Xlpi, Ylpi, Zlpi), i = 1, 2, …, n, of the hole centers in the laser radar coordinate system. The camera then captures an image of the calibration board, the corresponding circles are identified in the image based on the Hough transform, and the coordinates (μi, νi), i = 1, 2, …, n, of the circle centers in the pixel coordinate system are obtained. Substituting the coordinates of the corresponding point pairs into equations (5) and (7) solves the projection transformation matrix T.
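One possible way to obtain the circle-center pixel coordinates from the calibration-board image is the OpenCV Hough circle transform; the sketch below is illustrative only, and all parameter values would need tuning for an actual board.

```python
import cv2
import numpy as np

def detect_hole_centers(image_path, max_holes=6):
    """Return the circle-center pixel coordinates (mu_i, nu_i) found in the
    calibration-board image with the Hough circle transform."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(image_path)
    gray = cv2.medianBlur(gray, 5)                      # suppress sensor noise
    circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1.2, minDist=40,
                               param1=100, param2=30, minRadius=10, maxRadius=80)
    if circles is None:
        return np.empty((0, 2))
    return circles[0, :max_holes, :2]                   # (x, y) centers of the detected circles
```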
In step 3), the camera acquires a video image of the environment in front of the lens and performs target detection, target classification and target localization through a convolutional neural network to obtain a target candidate region and a target candidate frame. Specifically, as shown in FIG. 5, features are extracted from the video image acquired by the camera by a model such as a convolutional neural network; the category and coordinates of each object are then predicted from the extracted features by a series of small convolution modules; the extracted background information is deleted; post-processing such as non-maximum suppression is performed to screen out the regions with the highest confidence that contain targets; and finally the target detection image and the target candidate frames are output.
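The patent does not name a specific convolutional network, so the sketch below uses an off-the-shelf torchvision detector (torchvision 0.13 or later assumed) purely as a stand-in for the detection, classification and confidence-screening stages described above; frame_rgb is assumed to be an H x W x 3 uint8 NumPy array.

```python
import torch
import torchvision

def detect_targets(frame_rgb, score_thresh=0.5):
    """Return candidate boxes, class labels and scores for one video frame."""
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    model.eval()                                   # inference mode; weights are downloaded once
    img = torch.from_numpy(frame_rgb).permute(2, 0, 1).float() / 255.0
    with torch.no_grad():
        out = model([img])[0]                      # feature extraction + per-object prediction
    keep = out["scores"] > score_thresh            # confidence screening (cf. NMS post-processing)
    return out["boxes"][keep], out["labels"][keep], out["scores"][keep]
```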
Step 4) extracts the position information of the feature points of the candidate frame and calculates the direction vector of each feature point in the camera coordinate system. The specific steps are as follows:
taking a rectangular candidate frame, the rectangle ABCD, as an example of the generated target candidate frame, it follows from the pinhole imaging model that the direction vectors of the four corner points of the rectangle ABCD in the camera coordinate system are:
Oi = (Xi, Yi, Zi) = ((μi - μ0)dx/f, (νi - ν0)dy/f, 1), i = A, B, C, D    (8)
In equation (8), f, dx, dy, μ0 and ν0 are the camera intrinsic parameters: f is the focal length of the camera, dx and dy denote the physical dimensions of each pixel in the image coordinate system, and (μ0, ν0) are the pixel coordinates of the intersection of the optical axis and the image plane.
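Equation (8) applied to the four corners of a rectangular candidate box can be written as the short helper below; this is a sketch, and the box format (u_min, v_min, u_max, v_max) and the function name are assumptions.

```python
import numpy as np

def corner_direction_vectors(box, f, dx, dy, mu0, nu0):
    """Direction vectors of the corners A, B, C, D of a rectangular candidate
    box in the camera coordinate system, per equation (8)."""
    u1, v1, u2, v2 = box
    corners = [(u1, v1), (u2, v1), (u2, v2), (u1, v2)]   # A, B, C, D
    return np.array([[(u - mu0) * dx / f, (v - nu0) * dy / f, 1.0]
                     for u, v in corners])
```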
Step 5) calculates the direction vector of each feature point in the laser radar coordinate system and the origin azimuth corresponding to that vector, according to the projection transformation matrix and the direction vectors of the feature points from step 4). The specific steps are as follows:
let Oi′ be the direction vector of the feature point in the laser radar coordinate system; then, according to the projection transformation matrix:
Oi′ = R·Oi + t = (Xi′, Yi′, Zi′), i = A, B, C, D    (9)
Let α and β be the azimuth angles of the origin corresponding to Oi′, as shown in FIG. 7 (α and β are only one way of representing the azimuth, and the invention is not limited thereto); equation (10) then gives α and β in terms of the components Xi′, Yi′ and Zi′ of Oi′.
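Equations (9) and (10) can be sketched as follows; since the exact angle convention of FIG. 7 is not recoverable from the text, arctangents of the transformed components are used here as one plausible convention (an assumption, not the patented definition).

```python
import numpy as np

def lidar_azimuths(O_cam, R, t):
    """Map corner direction vectors into the lidar frame (Oi' = R*Oi + t) and
    derive pointing angles for beam steering."""
    O_lidar = (R @ O_cam.T).T + t                     # equation (9), applied row-wise
    alpha = np.arctan2(O_lidar[:, 0], O_lidar[:, 2])  # horizontal angle (assumed convention)
    beta = np.arctan2(O_lidar[:, 1], O_lidar[:, 2])   # vertical angle (assumed convention)
    return O_lidar, alpha, beta
```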
step 6) the operation and control unit controls the emergent beam of the laser radar to rotate to a corresponding angle and scans according to a certain rule to obtain point cloud data according to the azimuth of the origin in the step 5), wherein the laser radar requires that the emergent beam can reach any angle and position in a visual field through the control unit, for example, the laser radar in the patent of an orthogonal double-shaft two-dimensional closed-loop continuous rotation scanning device and the laser radar including the same (patent application number 202121167450.1); the point cloud scanning rule is to acquire effective information through scanning, and an optional rule is as follows:
let n candidate frame feature points (n ≧ 4) be P1,P2,P3,…,PnIs provided with Pi∈(P1,P2,…,Pn/2) Then P isi+n/2∈(Pn/2,P(n/2)+1,…,Pn) Is connected to PiPi+n/2,PiPi+n/2That is, the path to be traveled by the laser beam, and generating a point cloud under the path to obtain target information, as shown in fig. 6, when n is 4, AC and BD are scanning paths of the laser beam; when n is a base number, calculation is optionally performed using (n-1) or (n + 1).
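The pairing rule of step 6) can be expressed as a small helper; for four corner points ordered A, B, C, D it reproduces the diagonal scan paths AC and BD of FIG. 6 (a sketch with assumed names, not the original implementation).

```python
def scan_paths(points):
    """Pair feature point P_i with P_(i+n/2); each pair is one straight path
    for the steered laser beam. If n is odd, one point is dropped so that n
    becomes even, as the text allows."""
    n = len(points)
    if n % 2:
        points, n = points[:-1], n - 1
    half = n // 2
    return [(points[i], points[i + half]) for i in range(half)]

# For corners [A, B, C, D] this returns [(A, C), (B, D)], i.e. the diagonals.
```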
In step 7), the target category information obtained in step 3) and the effective target information obtained in step 6), including the target depth, position, velocity and acceleration, are added to the target candidate frame, and the video image with the target information is finally output; the target information includes but is not limited to the target category, depth, position, velocity and acceleration.
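A minimal sketch of the overlay in step 7), assuming OpenCV; the drawn fields (class, depth, velocity, acceleration) follow the list above and the formatting is purely illustrative.

```python
import cv2

def annotate_frame(frame_bgr, box, label, depth, velocity, accel):
    """Draw the candidate box and the lidar-derived target information onto
    one video frame before it is written to the output stream."""
    x1, y1, x2, y2 = map(int, box)
    cv2.rectangle(frame_bgr, (x1, y1), (x2, y2), (0, 255, 0), 2)
    text = f"{label} d={depth:.1f}m v={velocity:.1f}m/s a={accel:.1f}m/s^2"
    cv2.putText(frame_bgr, text, (x1, max(0, y1 - 8)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
    return frame_bgr
```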
It should be noted that the above-mentioned embodiments are not intended to limit the scope of the present invention, and all equivalent modifications and substitutions based on the above-mentioned technical solutions are within the scope of the present invention as defined in the claims.

Claims (8)

1. An AI eyeball fusing an image and a lidar, characterized in that the method comprises the following steps:
step 1) establishing imaging models of a laser radar and a camera, and determining a linear relation between known parameters and unknown parameters;
step 2) jointly calibrating the laser radar and the camera, and determining the conversion relation between the laser radar coordinate system and the camera coordinate system, namely the projection transformation matrices R and t;
step 3) acquiring a video image in the front environment of a camera shooting lens, and performing target detection, target classification and target positioning through a convolutional neural network model to obtain a target candidate area and a target candidate frame;
step 4), extracting the position information of the feature points of the candidate frame, and calculating the direction vector of each feature point in a camera coordinate system;
step 5) calculating an orientation vector of each characteristic point under a laser radar coordinate system and an origin azimuth corresponding to the orientation vector according to the projection transformation matrix and the direction vector of the characteristic point in the step 4);
step 6), the operation and control unit controls the emergent light beam of the laser radar to rotate to a corresponding angle according to the azimuth of the origin in the step 5) and scans according to a certain rule to obtain point cloud data;
and 7) adding the target category information obtained in the step 3) and the target effective information obtained in the step 6), including target depth, position, speed and acceleration information, into the target candidate frame, and finally outputting the video image with the target information.
2. The AI eyeball for fusing image and lidar according to claim 1, wherein step 1) is to establish an imaging model of the lidar and the camera as follows: assuming that the coordinates of the space point P in the lidar coordinate system and the image pixel coordinate system are (Xlp, Ylp, Zlp) and (μ, v), respectively, the relationship between the two coordinate systems is expressed as follows according to the pinhole imaging relationship, the image-to-pixel relationship and the projective transformation relationship:
$$ Z_{sp}\begin{bmatrix}\mu \\ \nu \\ 1\end{bmatrix}=\begin{bmatrix}f/dx & 0 & \mu_{0} & 0 \\ 0 & f/dy & \nu_{0} & 0 \\ 0 & 0 & 1 & 0\end{bmatrix}\begin{bmatrix}R & t \\ 0^{T} & 1\end{bmatrix}\begin{bmatrix}X_{lp} \\ Y_{lp} \\ Z_{lp} \\ 1\end{bmatrix}\qquad(1) $$
where Zsp is a scale factor; f, dx, dy, μ0 and ν0 are the camera intrinsic parameters: f is the focal length of the camera, dx and dy denote the physical dimensions of each pixel in the image coordinate system, and (μ0, ν0) are the pixel coordinates of the intersection of the optical axis and the image plane; R and t form the projective transformation between the camera and the lidar coordinate systems.
3. The AI eyeball fusing image and lidar according to claim 2, wherein in step 2) the lidar and the camera are jointly calibrated to determine the transformation relationship between the lidar coordinate system and the camera coordinate system, i.e. the projective transformation matrix; the specific steps are as follows:
according to the relationship shown in equation (1), let:
$$ T=\begin{bmatrix}T_{11} & T_{12} & T_{13} & T_{14} \\ T_{21} & T_{22} & T_{23} & T_{24} \\ T_{31} & T_{32} & T_{33} & T_{34}\end{bmatrix}=\begin{bmatrix}f/dx & 0 & \mu_{0} & 0 \\ 0 & f/dy & \nu_{0} & 0 \\ 0 & 0 & 1 & 0\end{bmatrix}\begin{bmatrix}R & t \\ 0^{T} & 1\end{bmatrix}\qquad(2) $$
Substituting equation (2) into equation (1) gives:
$$ Z_{sp}\begin{bmatrix}\mu \\ \nu \\ 1\end{bmatrix}=T\begin{bmatrix}X_{lp} \\ Y_{lp} \\ Z_{lp} \\ 1\end{bmatrix}\qquad(3) $$
elimination of Zsp with the last row yields:
$$ \begin{cases}T_{11}X_{lp}+T_{12}Y_{lp}+T_{13}Z_{lp}+T_{14}-\mu\,(T_{31}X_{lp}+T_{32}Y_{lp}+T_{33}Z_{lp}+T_{34})=0 \\ T_{21}X_{lp}+T_{22}Y_{lp}+T_{23}Z_{lp}+T_{24}-\nu\,(T_{31}X_{lp}+T_{32}Y_{lp}+T_{33}Z_{lp}+T_{34})=0\end{cases}\qquad(4) $$
As can be seen from equation (4), each pair of corresponding points between the image captured by the camera and the point cloud of the laser radar yields two equations; that is, with n corresponding points the number of equations is 2n. Therefore, once corresponding points in the two coordinate systems are found, the projection transformation matrix T can be solved. Let the number of corresponding points be n (n ≥ 6) and set:
$$ A=\begin{bmatrix} X_{lp1}&Y_{lp1}&Z_{lp1}&1&0&0&0&0&-\mu_{1}X_{lp1}&-\mu_{1}Y_{lp1}&-\mu_{1}Z_{lp1}&-\mu_{1}\\ 0&0&0&0&X_{lp1}&Y_{lp1}&Z_{lp1}&1&-\nu_{1}X_{lp1}&-\nu_{1}Y_{lp1}&-\nu_{1}Z_{lp1}&-\nu_{1}\\ \vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots\\ X_{lpn}&Y_{lpn}&Z_{lpn}&1&0&0&0&0&-\mu_{n}X_{lpn}&-\mu_{n}Y_{lpn}&-\mu_{n}Z_{lpn}&-\mu_{n}\\ 0&0&0&0&X_{lpn}&Y_{lpn}&Z_{lpn}&1&-\nu_{n}X_{lpn}&-\nu_{n}Y_{lpn}&-\nu_{n}Z_{lpn}&-\nu_{n}\end{bmatrix}\qquad(5) $$
L = [T11 T12 T13 T14 T21 T22 T23 T24 T31 T32 T33 T34]^T    (6)
Equation (4) is thus reduced to AL = 0. Applying singular value decomposition to the matrix A:
[U∑V]=SVD(A) (7)
u is made of AATIs formed by aTAnd the characteristic vector of the A is formed, so that a singular vector in the V corresponding to the minimum singular value in the singular value matrix sigma is the least square solution of the L, namely the matrix T, and then R and T are solved according to the formula (2) to determine the coordinate conversion relation between the camera and the laser radar sensor.
4. The image-and-lidar-fused AI eyeball of claim 3,
and 3) acquiring a video image in front of a camera shooting lens by a camera, performing target detection, target classification and target positioning through a convolutional neural network to obtain a target candidate region and a target candidate frame.
5. The image-and-lidar-fused AI eyeball according to claim 3 or 4,
step 4), extracting the position information of the feature points of the candidate frame, and calculating the direction vector of each feature point in the camera coordinate system, wherein the method specifically comprises the following steps: according to the generated target candidate frame and the rectangular ABCD, the direction vectors of four corner points of the rectangular ABCD under a camera coordinate system are known as follows according to the pinhole imaging model:
Oi = (Xi, Yi, Zi) = ((μi - μ0)dx/f, (νi - ν0)dy/f, 1), i = A, B, C, D    (8)
In equation (8), f, dx, dy, μ0 and ν0 are the camera intrinsic parameters: f is the focal length of the camera, dx and dy denote the physical dimensions of each pixel in the image coordinate system, and (μ0, ν0) are the pixel coordinates of the intersection of the optical axis and the image plane.
6. The AI eyeball for fusing image and lidar according to claim 5, wherein step 5) calculates the direction vector of each feature point in the lidar coordinate system and the origin azimuth corresponding to that vector according to the projection transformation matrix and the direction vectors of the feature points from step 4); the specific steps are as follows:
let Oi′ be the direction vector of the feature point in the lidar coordinate system; then, according to the projection transformation matrix:
Oi′ = R·Oi + t = (Xi′, Yi′, Zi′), i = A, B, C, D    (9)
with α and β being the azimuth angles of the origin corresponding to Oi′, equation (10) then gives α and β in terms of the components Xi′, Yi′ and Zi′ of Oi′.
7. the AI eyeball fusing an image and a laser radar according to claim 6, wherein the operation and control unit of step 6) controls the emergent beam of the laser radar to rotate to a corresponding angle and scan according to a certain rule to obtain point cloud data according to the azimuth of the origin in step 5), and the laser radar requires that the emergent beam can reach any angle and position in the visual field through the control unit; the point cloud scanning rule is to realize the acquisition of effective information through scanning, and the rule is as follows:
let n candidate frame feature points (n ≧ 4) be P1,P2,P3,…,PnIs provided with Pi∈(P1,P2,…,Ph/2) Then P isi+n/2∈(Pn/2,P(n/2)+1,…,Pn) Is connected to PiPi+n/2,PiPi+n/2The method comprises the steps of obtaining target information, namely a path to be passed by a laser beam, generating point cloud under the path and obtaining the target information; when n is a base number, the calculation is performed using (n-1) or (n + 1).
8. The AI eyeball for fusing image and lidar according to claim 6, wherein step 7) adds the target category information obtained in step 3) and the effective target information obtained in step 6), namely the target depth, position, velocity and acceleration, to the target candidate frame, and finally outputs a video image with the target information.
CN202111064566.7A 2021-09-10 2021-09-10 AI eyeball fusing image and laser radar Pending CN113792645A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111064566.7A CN113792645A (en) 2021-09-10 2021-09-10 AI eyeball fusing image and laser radar

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111064566.7A CN113792645A (en) 2021-09-10 2021-09-10 AI eyeball fusing image and laser radar

Publications (1)

Publication Number Publication Date
CN113792645A true CN113792645A (en) 2021-12-14

Family

ID=78879992

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111064566.7A Pending CN113792645A (en) 2021-09-10 2021-09-10 AI eyeball fusing image and laser radar

Country Status (1)

Country Link
CN (1) CN113792645A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117095002A (en) * 2023-10-19 2023-11-21 深圳市信润富联数字科技有限公司 Hub defect detection method and device and storage medium
CN117095002B (en) * 2023-10-19 2024-02-06 深圳市信润富联数字科技有限公司 Hub defect detection method and device and storage medium

Similar Documents

Publication Publication Date Title
CN110415342B (en) Three-dimensional point cloud reconstruction device and method based on multi-fusion sensor
CN108868268B (en) Unmanned parking space posture estimation method based on point-to-surface distance and cross-correlation entropy registration
US10909395B2 (en) Object detection apparatus
CN108731587A (en) A kind of the unmanned plane dynamic target tracking and localization method of view-based access control model
CN111045000A (en) Monitoring system and method
CN111862180B (en) Camera set pose acquisition method and device, storage medium and electronic equipment
CN113160327A (en) Method and system for realizing point cloud completion
CN111179329A (en) Three-dimensional target detection method and device and electronic equipment
WO2022135594A1 (en) Method and apparatus for detecting target object, fusion processing unit, and medium
CN113050074B (en) Camera and laser radar calibration system and calibration method in unmanned environment perception
WO2022217988A1 (en) Sensor configuration scheme determination method and apparatus, computer device, storage medium, and program
CN116310679A (en) Multi-sensor fusion target detection method, system, medium, equipment and terminal
Yan et al. Joint camera intrinsic and lidar-camera extrinsic calibration
CN113792645A (en) AI eyeball fusing image and laser radar
Deng et al. Joint calibration of dual lidars and camera using a circular chessboard
Ibisch et al. Arbitrary object localization and tracking via multiple-camera surveillance system embedded in a parking garage
CN112488022A (en) Panoramic monitoring method, device and system
CN116381649A (en) Combined calibration method, device and storage medium
Nguyen et al. Calibbd: Extrinsic calibration of the lidar and camera using a bidirectional neural network
CN113269857A (en) Coordinate system relation obtaining method and device
Tu et al. Method of Using RealSense Camera to Estimate the Depth Map of Any Monocular Camera
CN114556449A (en) Obstacle detection and re-identification method and device, movable platform and storage medium
CN113406604A (en) Device and method for calibrating positions of laser radar and camera
Su Vanishing points in road recognition: A review
CN115170663B (en) Cross-space-time authenticity target multi-mode associated ultra-long-range passive ranging method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20211214