WO2021248636A1

WO2021248636A1 - System and method for detecting and positioning autonomous driving object

Info

Publication number: WO2021248636A1
Application number: PCT/CN2020/102996
Authority: WO
Inventors: 王峰; 潘观潮; 刘进辉; 王宏武; 王晓洒
Original assignee: 东莞市普灵思智能电子有限公司
Priority date: 2020-06-12
Filing date: 2020-07-20
Publication date: 2021-12-16
Also published as: CN111929718A

Abstract

A system and method for detecting and positioning an autonomous driving object. The method comprises the following steps: obtaining the measurement data of an inertial navigation module of an autonomous driving vehicle, the image data of a stereoscopic vision module, satellite navigation original measurement data, and a deep neural network module detection and determination software module; tightly coupling the image data of the stereoscopic vision module, the satellite navigation original measurement data, and the measurement data of the inertial navigation module, so as to limit the increase of a drift error of the inertial navigation module, and ensure the precision of positioning; and establishing three-dimensional environment information by using a satellite-inertial navigation assisted stereoscopic vision method, and then using a deep neural network module to determine and classify targets and obstacles in an environment. The method has the advantages of maximally improving the positioning precision of an autonomous driving sensing system, and improving the calculation efficiency and reliability.

Description

Automatic driving object detection and positioning system and method

Technical field

The present invention relates to the field of computer application technology, in particular to an automatic driving object detection and positioning system and method.

Background technique

In recent years, as people’s awareness of car safety has increased and the development of information technology has made the field of autonomous driving more and more concerned, many companies and scientific research institutions in the world have begun to invest in research and development of autonomous driving related products. It is expected that autonomous vehicles will be driven in 2021. Will enter the market and bring huge changes to the automotive industry. Relevant research shows that the development of autonomous driving technology will bring disruptive development in many fields. For example, its development can enhance road traffic safety, alleviate traffic congestion and reduce environmental pollution. At the same time, autonomous driving technology is also a measure of a country. An important symbol of scientific research strength and industrial level, it has broad application prospects in the fields of national defense and national economy.

Autonomous driving means that the car perceives the road environment through the on-board sensor system, and controls the steering and speed of the vehicle based on the road, vehicle position and obstacle information obtained by the perception, and then automatically plans the driving route and controls the vehicle to reach the predetermined target Technology.

Nowadays, in terms of autonomous driving, major companies have their own technical direction. The existing technology has a combination system of binocular direct method vision system and inertial navigation module, but the error generated by the vision system and inertial navigation module in the combined system cannot be Effective restriction, the error of the combined system will increase indefinitely when there is no image gradient for a long time, resulting in the failure of the combined system's perception.

The prior art also has monocular feature point method vision systems, inertial navigation modules, and satellite navigation close-coupled automatic driving perception systems, but monocular cameras cannot detect uncharacteristic obstacles, such as isolated guardrails on highways, bicycles or animals, etc. . The existing vision system also uses a binocular stereo vision system for coupling, but still uses the feature point method, which requires a large amount of calculation and high hardware performance requirements. At present, the most advanced binocular stereo vision environment detection method only uses the visual difference information in binocular vision, and does not use the images collected by the camera at different times and positions to realize the three-dimensional modeling of the environment.

Summary of the invention

The purpose of the present invention is to provide an automatic driving object detection and positioning system and method, which has the advantages of maximizing the positioning accuracy of the automatic driving perception system, and improving the calculation efficiency and reliability, so as to solve the problems raised in the background art.

In order to achieve the above objectives, the present invention provides the following technical solutions:

An automatic driving object detection and positioning system, including a stereo vision image processing module, a satellite navigation module, an inertial navigation module and a system tight coupling module, in which:

Stereo vision image processing module, using or multi-eye camera to obtain the image data of the stereo vision module;

The satellite navigation module is used to obtain the original measurement data of satellite navigation through the receiver;

The inertial navigation module is used to obtain the measurement data of the inertial navigation module by using inertial sensors;

The system tight coupling module is used to tightly couple the measurement data of the inertial navigation module, the image data of the stereo vision module and the original measurement data of the satellite navigation, and establish three-dimensional environment information, and finally use the deep neural network module to detect Targets and obstacles in the environment;

The deep neural network is composed of a three-dimensional sparse convolutional neural network, a dot network neural network, and a combination of the two;

The stereo vision image processing module, the satellite navigation module and the inertial navigation module are all connected to the system tight coupling module.

Further, the stereo vision image processing module includes a binocular or multi-eye camera.

Another technical solution to be solved by the present invention: a method for automatic driving object detection and positioning, including the following steps:

S1: Obtain the image data of the stereo vision module, the measurement data of the inertial navigation module and the original measurement data of the satellite navigation;

S2: tightly couple the image data of the stereo vision module, the original measurement data of the satellite navigation and the measurement data of the inertial navigation module to correct the drift error of the inertial navigation module;

S3: Tightly couple the image data of the stereo vision module, the original measurement data of the satellite navigation and the measurement data of the inertial navigation module to establish three-dimensional environment information, and then use the deep neural network module to judge and classify the targets and obstacles in the environment;

S4: The deep neural network uses a three-dimensional sparse convolutional neural network, a dot network neural network, or a combination of the two.

Further, the stereo vision module adopts multi-camera, including binocular camera direct method for processing, and the image data includes the visual difference between multi-camera or binocular stereo vision cameras at the same time and each camera at different time, The image information captured by the location is used to establish the three-dimensional information of the environment.

Further, the tight coupling consists of a weighted reprojection error of stereo vision, satellite navigation error, and state error from inertial navigation to form a cost function.

Further, the S1 specifically includes the following steps:

S11: Use multi-eye or binocular cameras to obtain image data of the stereo vision module. The image data includes the visual difference between the multi-eye or binocular stereo vision cameras at the same time and the images captured by each camera at different times and positions information;

S12: Use inertial sensors to obtain measurement data of the inertial navigation module;

S13: Obtain the original measurement data of satellite navigation through the receiver;

S14: Perform tight coupling processing on the measurement data of the inertial navigation module, the image data, and the original measurement data of the satellite navigation.

Further, the S11 specifically includes the following steps:

S111: Use a multi-eye or binocular camera to collect environmental image signals;

S112: Combine the visual difference captured by multiple different cameras at the same time and the images captured by each camera at different times and positions to form a direct method of stereo vision observation;

S113: Combine the measurement data of inertial navigation and satellite navigation to establish three-dimensional environment information, and finally use the deep neural network module to classify and judge the targets and obstacles in the environment.

Further, the S12 specifically includes the following steps:

S121: Use inertial sensors to measure the 3-axis acceleration and 3-axis angular velocity of the autonomous vehicle in a fixed coordinate system;

S122: Rotate the acceleration and the angular velocity to the navigation coordinate system, solve the inertial navigation mechanical arrangement equation, and calculate the position and attitude angle of the autonomous vehicle;

S123: Combine image data, inertial navigation data and satellite navigation data to establish three-dimensional environment information, and then use deep neural network modules to classify and judge objects and obstacles in the environment.

Further, the S14 specifically includes: correcting the drift error of the inertial navigation module by combining the original measurement data of the satellite navigation with the image data of the stereo vision module.

Compared with the prior art, the beneficial effect of the present invention is: the present invention tightly couples the measurement data of the inertial navigation module, the image data of the stereo vision module, and the original measurement data of the satellite navigation, and measures the error of the measurement data of the inertial navigation module. Correction, thereby improving the positioning accuracy, and then using the three-dimensional environment modeling data and deep neural network realized by the direct method of stereo vision to identify and judge objects in the environment. The system uses high-precision three-dimensional sparse convolutional neural networks, dot-net neural networks, or their combination to improve the accuracy of obstacle judgment, and no longer resorts to expensive laser scanning radar, thereby reducing the cost of autonomous vehicles.

Description of the drawings

FIG. 1 is a flowchart of an automatic driving positioning and object detection method according to an embodiment of the present invention;

2 is a structural diagram of a three-dimensional sparse convolutional neural network of an automatic driving object detection and positioning system according to an embodiment of the present invention;

FIG. 3 is a structural diagram of a point network neural network of an automatic driving object detection and positioning system according to an embodiment of the present invention;

[Corrected according to Rule 91 27.10.2020]
Fig. 4 is a schematic structural diagram of an automatic driving object detection and positioning system according to an embodiment of the present invention.
Fig. 5 is an architecture diagram of the automatic driving object detection and positioning system of the present invention.

detailed description

The technical solutions in the embodiments of the present invention will be clearly and completely described below in conjunction with the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, rather than all the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.

An automatic driving object detection and positioning system and method based on the tight coupling of a stereo vision module, an inertial navigation module and a satellite navigation module. Inertial navigation can continuously provide information with high accuracy in a short time, but positioning errors will accumulate over time; satellite navigation will be long-term Stability is good, but it is susceptible to interference, and the data update frequency is low; the usual stereo vision uses the feature point method, first select multiple feature points in the image, and then use the visual difference of these feature points captured by the left and right eyes to match. Finally, the triangulation method is used to determine the distance information of these feature points to the camera, so as to firstly detect and locate the three-dimensional environment. The stereo vision method based on feature points needs to select many feature points in each frame of image, which consumes a lot of computing resources, and in many cases, there are no suitable feature points in the image. For example, when the image intensity has only one fixed gradient, the feature point method will fail completely. The stereo vision method based on feature points does not use the images captured by the camera at different positions to achieve distance measurement, which wastes valuable measurement data and leads to environmental modeling accuracy errors and positioning errors.

A better algorithm is to use the direct method of stereo vision, it does not need to select feature points, saving computing resources, and as long as there are gradient images, environment modeling and positioning can be achieved. In addition, it not only uses the image data captured by different cameras at the same time, but also uses the image information captured by the cameras at different times and positions for three-dimensional modeling and positioning. The stereo vision calculates the obstacles to the obstacles according to the parallax of different cameras. The distance of the camera, but the visual system cannot effectively locate and detect the distance of obstacles in an environment where the image lacks gradient. The stereo vision direct method, inertial navigation and satellite navigation form a combined navigation system, which can assist each other in perceiving the state of the vehicle and the environment, complement each other in different environments, improve the reliability and navigation accuracy, can accurately extract the distance, and can replace the current situation. Some laser scanning radars reduce costs.

Fig. 1 is a flowchart of a tightly coupled automatic driving perception method according to an embodiment of the present invention. As shown in Fig. 1, the tightly coupled automatic driving perception method specifically includes:

S1. Obtain the measurement data of the inertial navigation module of the autonomous vehicle, the image data of the stereo vision module and the original measurement data of the satellite navigation; S2, in the tight coupling process, combine the image data of the stereo vision module and the original measurement data of the satellite navigation with the inertial navigation The module measurement data is tightly coupled to correct the error of the inertial navigation module measurement data, so as to perceive the state of the vehicle and the state of the environment. The process of the specific autonomous driving perception method is described in detail below:

(1) The image data of the stereo vision module is obtained by the stereo vision image processing module of the binocular direct method. The detailed process of using the binocular camera to obtain the image data of the stereo vision module is described below:

The direct method is based on the assumption that the gray level is invariable. The pixel gray level of the same spatial point is fixed in each image. The direct method does not need to extract features. At the same time, it lacks corners and edges or the light changes are not obvious. The environment also has better results; and the direct method requires less data to be processed, which can achieve high computational efficiency.

The specific process of using the binocular direct method to process the images captured by the camera stereo vision is as follows:

1. Use the binocular camera to obtain the grayscale image of two or more images of the measured object from different positions (the color image can use the intensity of red, green, and blue, respectively use this method), and record the binocular camera at t The images obtained at time and t+1 are I _i and I _j .

2. Carry out camera calibration through matlab or opencv to obtain the internal parameters of the camera.

3. Perform distortion correction processing on the acquired image according to the camera's internal parameters.

4. Input the above-mentioned image data into the tightly coupled module of the system, calculate the photometric measurement energy according to the captured image, and then calculate the error energy function. By minimizing the error function, the positioning of the camera body and the three-dimensional perception of the environment are realized. Specifically, the spatial point Pi in the image Ii appears in another frame of image Ij, where Ii and Ij are images captured by the same camera at different positions and attitudes, and a spatial point Pi appearing in the images Ii and Ij is selected. , According to the principle of the assumption that the gray values of the three-dimensional points in the same space are not changed under various viewing angles, the projection p'of the point p in the image Ii on the image Ij is calculated by the following formula:

Where Π _k and

Is the projection and back projection function of the camera image frame point, d _p is the inverse depth of p point, and T _ji is the conversion relationship between image frames:

Among them, R _ji is a 3*3 rotation matrix, and t is the translation vector.

The luminosity error energy function is:

in,

It is the Huber norm to prevent the error energy function from increasing too fast with the luminosity error; ω _p is the weight value, in order to reduce the weight corresponding to the pixels with large gradients in the image, so that the final energy function can reflect most of the pixels The luminosity error is not the error of individual pixels with large gradients; I _i [p] and I _j [p'] are respectively _{the gray value of Pi} at the corresponding point of the image.

The camera position and attitude angle are obtained by optimization by minimizing the photometric error energy. When the photometric error energy is at the minimum, the image data of the camera pose and environmental information of the stereo vision module can be obtained. The objective function is:

The above formulas (1-3, 1-4) are the photometric error energy function of the monocular camera. When the camera is a binocular or multi-camera, the direct method is a stereo vision module. The combined image data includes the visual static difference data collected by the binocular cameras at the same time and the dynamic time series data collected by each camera at different times and positions. The binocular direct method balances the relative weight of each image frame of the camera and the static stereo vision by introducing a coupling factor λ, and its error energy function is:

Among them, obs(p) represents all image frames where p points can be observed,

Is the error energy function in static stereo vision, i belongs to different image frames, p belongs to multiple points in the same frame, and j is all frames where p points can be observed. According to formula (1-4) and formula (1-5), the objective function of the new binocular camera can be obtained. _{R ji} and t appearing in the conversion between the above two image frames are the pose of the camera. By using the gradient descent method or Gauss Newton method to solve the objective function, the pose of the camera and the three-dimensional space point Pi are obtained. coordinate. Furthermore, the camera pose is used to correct the errors in the measurement data of the inertial navigation module of the autonomous vehicle in the inertial navigation. The camera pose acquired by the binocular direct method does not require feature extraction, and the ability to respond to road and environmental changes is significantly improved. The collection of the three-dimensional coordinates of many spatial points Pi becomes the three-dimensional point cloud of the environment.

(2) The measurement data of the inertial navigation module is obtained through the inertial measurement unit. The measurement data of the inertial navigation module includes the acceleration vector and the rotation angular rate vector of the autonomous vehicle. The detailed process of obtaining the measurement data of the inertial navigation module is described below:

First, inertial sensors are used to measure the 3-axis acceleration and 3-axis angular velocity of the autonomous vehicle in a fixed coordinate system. The inertial measurement unit installed on the vehicle is an accelerometer and a gyroscope. The accelerometer is used to measure the acceleration of the vehicle, and the gyroscope is used to measure the angular velocity of the vehicle. The measurement sensor has measurement errors such as zero drift. These measurement errors cause the positioning error to increase in the square of time and the attitude angle error to increase in proportion to time. If it is not restricted, the navigation system will quickly lose its ability. Then, error compensation is carried out through modeling. Can reduce deterministic error and random drift error.

Rotate the acceleration and angular velocity to the navigation coordinate system, solve the inertial navigation mechanical arrangement equation and calculate the position and attitude angle of the autonomous vehicle. Specifically, the navigation system selected for the update of the pose value and the error compensation is the n system, which is usually the northeast sky.

In the northeast sky coordinate system, the vehicle's position update formula is as follows:

In the above formula,

Is the projection of the carrier velocity in the n system; λ, L and h represent the longitude, latitude and height of the carrier respectively; a represents the basic geodetic parameter under the WGS-84 coordinate system-the length of the semi-major axis of the ellipsoid, and e is the length of the ellipsoid Eccentricity. Finally obtained differential equation λ, L, h, and can be solved in accordance with λ, L value, h, which can calculate the position of the automatic driving of the vehicle, R _N is prime vertical curvature, R _M is the radius of curvature of the meridian.

In the northeast sky coordinate system, the vehicle speed update formula is as follows:

In the above formula,

Is the quaternion direction cosine matrix

The transpose of; v ⁿ is the projection of the velocity of the carrier in the n system;

Is the displacement angular rate calculated from the relative velocity of the carrier;

It represents the projection of the earth's rotation angular rate under n; g ⁿ is the projection of the local gravitational acceleration on n; f ^b is the measured output value of the accelerometer. The symbol "×" represents a vector cross product. Finally get about v ⁿ 、

The differential equation of, according to the solution, ^{v n,}

Therefore, the speed and angular rate of displacement of the autonomous vehicle can be calculated.

Furthermore, combining the inertial navigation kinematics with a simple dynamic deviation model, the following equations are obtained:

in,

The elements are each uncorrelated zero-mean Gaussian white noise process.

Is the accelerometer measurement, g _W is the earth’s gravitational acceleration vector,

Is the three-dimensional position of the autonomous vehicle, q _ws is the 4 elements representing the attitude angle of the vehicle,

Is the speed vector of the vehicle, b _a and b _g are the acceleration and angular velocity deviation vectors respectively. Compared with the gyroscope bias modeled as a random walk, the time constant τ>0 is used to model the accelerometer bias as a bounded random walk. The matrix Ω is estimated by the measurement accompanied by the gyroscope

Angular rate

composition:

_{Finally, the prediction error e s} of the inertial navigation part of the integrated navigation is obtained, and e _s is the state error term of the inertial navigation:

Among them, the middle three items on the right are the 1, 2, and 3 components of the quaternion.

It is the velocity vector of the inertial navigation at time k, the deviation vector of the acceleration and angular velocity of the inertial measurement unit.

(3) The usual satellite navigation module outputs the three-dimensional position, speed and local time of the satellite receiving antenna installed on the vehicle. The tight coupling method of the present invention requires the original measurement data of satellite navigation, including pseudorange measurement of visible satellites, Doppler frequency drift measurement and carrier phase measurement, and uses the satellite navigation data stream to assist the inertial measurement and stereo vision system of the present invention, Comprehensively determine the position, attitude angle and environmental three-dimensional information of the autonomous vehicle. The detailed process of obtaining the original measurement data of satellite navigation is described below:

The Global Navigation Satellite System (GNSS) uses wireless broadcast signals to provide navigation positioning and time synchronization services for the public. Get the satellite antenna receiver on the autonomous vehicle to receive the signal of the navigation satellite; analyze the ephemeris information of each satellite according to the received satellite signal, and calculate the satellite position and satellite of each satellite based on the ephemeris information speed. At the same time, the satellite shuttle plane uses the radio signal sent by the satellite to calculate the pseudo-range, Doppler frequency drift and carrier phase of the satellite to the local receiver.

Specifically, the satellite navigation receiver uses a single-point positioning method to determine the pseudo-distance. The pseudo-distance is the flight time between a certain satellite and the user's antenna, multiplied by the speed of light. In satellite navigation, the pseudorange ρ ⁽ⁿ⁾ (t) is calculated by using the time difference between the time when the _{n satellite signal is received t u} (t) and the transmission time t _s ^{(n) (t-τ) and the vacuum} The speed c of the radio wave is multiplied, and the expression is as follows:

Among them, the symbol τ represents the actual time interval from when the GNSS signal is transmitted to when it is received by the user. However, since the clocks of GNSS satellites and user receivers are usually not synchronized with GNSS time t, the satellite time is ahead of GNSS time.

Indicates that the receiver time is ahead of the GNSS time by δt _u (t), namely:

t _u (t)=t+δt _u (t) (3-3)

Structures such as the ionosphere and troposphere will delay the propagation of electromagnetic waves to a certain extent, so τ needs to be subtracted from the ionospheric delay I ⁽ⁿ⁾ (t) and the tropospheric delay T ⁽ⁿ⁾ (t) to be the satellite signal The propagation time ^{of the geometric distance r (n)} from the position of the satellite to the position of the receiver, namely:

d _trop = cT ⁽ⁿ⁾ (t)

d _iono = cI ⁽ⁿ⁾ (t)

in,

Is the satellite clock error, _iono ionospheric delay and d _trop tropospheric delay are all known quantities, ε ⁽ⁿ⁾ represents the unknown pseudorange measurement noise, r ⁽ⁿ⁾ is the receiver in physical space (x, y, z) The geometric distance to the nth satellite (x ⁽ⁿ⁾ , y ⁽ⁿ⁾ , z ⁽ⁿ⁾ ), the coordinates of each satellite position (x ⁽ⁿ⁾ , y ⁽ⁿ⁾ , z ⁽ⁿ⁾ ) can be It is calculated from the ephemeris broadcast by each satellite.

Similarly, the observation equation of carrier phase in satellite navigation is as follows:

φλ=ρ(t _s ,t _r )+c(dt _r -dt _s )+d _trop -d _iono +d _rel +d _SA +d _multi +Nλ+ε (3-5)

in,

Is the observed value of the carrier phase, λ is the carrier wavelength, N is the ambiguity of the whole cycle, t _s is the time when the satellite transmits the signal, t _r is the time when the receiver receives the signal, dt _s and dt _r are the clock difference between the satellite and the receiver, respectively, c is the speed of light, d _iono ionospheric delay, d _trop delay of the troposphere, d _SA is the SA influence, d _multi multi-path effect, ε is the carrier measurement _{_{noise, ρ (t s, t r}} ) is t _s time satellite and The geometric distance between the receiver antennas at the time t _r , which includes the station coordinates, satellite orbit and earth rotation parameters, etc.

Since the Doppler frequency shift observation is an instantaneous observation of the carrier phase rate, the changes in the ionospheric and tropospheric delays with respect to time are ignored. Differentiate the carrier phase observation equation:

in,

Then, the Doppler frequency shift measurement equation is:

Among them, λ is the wavelength corresponding to carrier L1 (f1=1575.42MHz),

Is the Doppler shift of user u relative to satellite i, v ⁽ⁱ⁾ is the moving speed of satellite i,

Is the moving speed of user u,

Is the unit vector of user u pointing to satellite i, c is the speed of light in vacuum, δf _α is the clock drift of user u, δf ⁽ⁱ⁾ is the clock drift of satellite i,

It is the Doppler frequency measurement noise of user u relative to satellite i. It can be seen from equations 3-4, 3-5 and 3-7 that each navigation satellite can provide 2 independent measurement results (pseudo-range and carrier phase). These measurement equations can be used to calibrate the positioning deviation of inertial navigation and stereo vision. .

(4) The data input of the three parts of the stereo vision image processing module, the inertial navigation module and the satellite navigation module are tightly coupled. The data includes the measurement data of the inertial navigation module, the image data of the stereo vision module and the original measurement data of the satellite navigation. The original measurement data is combined with the image data of the stereo vision module to correct the drift error of the inertial navigation module, which can assist the inertial navigation module to limit the drift error. Figure 2 is a specific flow chart of the tightly coupled module of the system. In this figure, the inertial sensor is used to obtain the 3-axis acceleration and 3-axis angular velocity of the vehicle in a fixed coordinate system and input to the inertial navigation module; the binocular camera collects pictures for direct method input; the GPS receiver is used to obtain satellites Raw measurement data. Combine the reprojection error of the direct method of the binocular camera, the satellite navigation error obtained according to the original satellite measurement data and the time error of the inertial navigation, and perform the overall optimization estimation to correct the drift error of the inertial navigation module, and finally output the optimal position posture.

When performing tight coupling, tight coupling is expressed by the cost function of weighted reprojection error of stereo vision, satellite navigation error, and time error from inertial navigation. The expression is as follows:

Among them, e _r is the weighted reprojection error of stereo vision, e _g is the satellite navigation error, e _s is the time error term of the inertial navigation, i is the camera index, k is the camera frame index, and j is the landmark index. The landmark labels visible in the k-th frame and the i-th camera are written as the set J(i,k). also,

Represents the information matrix measured by the corresponding subscript, and

Indicates the error information of the inertial navigation corresponding to the k-th frame picture,

Represents the error information matrix corresponding to the satellite corresponding to the sth satellite at time t.

Furthermore, e _r is the weighted re-projection error of stereo vision, and the manifestation of the re-projection error of stereo vision is:

Wherein, h _i denotes a projection camera ^model, z ^{i, j, k} represents the coordinates of the image features.

Indicates the attitude of the optimized system,

Represents the external parameters of inertial navigation and camera,

Represents feature coordinates.

e _g is the satellite navigation error, the error of each navigation satellite s at each time t contains three parts of error, and the expression is as follows:

Wherein, e _p represents a pseudorange error, ed denotes the Doppler error, e _c represents the carrier phase error.

Furthermore, the form of the corresponding error information matrix W _g becomes the following form:

The value of the cost function J(x) is minimized by linear or non-linear optimization methods, thereby completing the tight coupling between the stereo vision image processing module, the inertial navigation module and the satellite navigation module, and combining the original measurement data of the satellite navigation The image data of the stereo vision module corrects the drift error of the inertial navigation module. When the number of satellites is less than the set number, the receiver cannot be positioned. The system error of the inertial navigation module is corrected through the original measurement data of satellite navigation combined with the image data of the stereo vision module, thereby improving the accuracy of navigation and greatly enhancing the navigation Robustness of the system.

While positioning the vehicle, the above steps also use the direct method of stereo vision to realize the description of the three-dimensional point cloud of the environment without the need for expensive laser scanning radar. The environment perception system then judges the targets and obstacles in the environment through the deep learning neural network. The central problem of object detection is to segment different objects from the environmental three-dimensional point cloud information, and to determine the nature of the object. This question is handed to semantic segmentation. The three-dimensional environmental information generated by this system is a sparse point cloud composed of points with varying intensity in the image. And under the color camera, each point also carries the intensity information of yellow, green and blue. The deep neural network module that can process three-dimensional point clouds, realize semantic segmentation, and target detection is usually divided into three-dimensional sparse convolutional neural network (3D CNN), point network neural network (PointNet) or a combination of the two. The three-dimensional sparse convolutional neural network can efficiently process the three-dimensional point cloud under the grid point approximation. Like the usual two-dimensional convolutional neural network, it is engineered by multiple convolutional layers and pooling layers. Figure 2 is the convolutional layer of the three-dimensional convolutional neural network, which consists of a 3x3x3 convolution kernel with a step size of 1 to perform convolution operations on the three-dimensional data. Figure 2 shows that for a 3x3x3 three-dimensional convolution kernel, when its input is 9x9x9, it outputs a new 7x7x7 three-dimensional data. In more cases, filling the input data edge with 0 can make the input and output have the same scale.

A complete three-dimensional convolutional neural network is composed of multiple three-dimensional convolutional layers and multiple three-dimensional pooling layers (Figure 3). It has multiple convolution kernels, which produce different features. Then the maximum pooling layer takes the maximum value of the pooling unit as the output of the unit. If 2x2x2 is used as the size of the pooling unit and the step size is 2, the three-dimensional data after pooling will be reduced by half in each direction. After the multi-level convolutional layer and the pooling layer, the data passes through the multi-layer fully connected layer, and finally the result of the classification judgment is output from the output layer, usually a normalized exponential function (Softmax).

Although the three-dimensional convolutional neural network can efficiently process the three-dimensional point cloud data after gridding, the gridding loses part of the subtle position information, causing errors. The PointNet method does not require gridded 3D data. It can directly use 3D point clouds as input to achieve semantic segmentation of 3D environmental point clouds.

The basic unit of the dot network is called Set Abstraction. It is composed of sampling combined layer and dot network layer. The sampling combination layer uses the iterative farthest point sampling method to divide the points in the three-dimensional space into a combination of points near the centroid region, and then use the multi-layer perceptron network to extract the point cloud features. The dot network is generally composed of multiple layers of combined abstract units to form a multi-level combined abstract function, and finally a fully connected layer and an output layer are used. The output layer uses the normalized index (Softmax) fully connected to output the classification results (Figure 4).

The tightly coupled autonomous driving perception system can help provide high-precision measurements of the environment and build maps without resorting to expensive laser scanning radar, reducing the cost of autonomous vehicles.

Based on the aforementioned tightly coupled automatic driving perception method, the present invention also provides an automatic driving object detection and positioning system 100. FIG. 3 is a schematic structural diagram of the automatic driving object detection and positioning system 100 according to an embodiment of the present invention, as shown in FIG. 5. As shown, the automatic driving object detection and positioning system 100 includes the stereo vision image processing module 10 of the binocular direct method, the satellite navigation module 20, the inertial navigation module 30, the system tight coupling module 40 and the deep neural network module 50, which can judge the environment in the environment. Goals and obstacles.

Specifically, the inertial navigation module 30 is used to obtain the measurement data of the inertial navigation module by using an inertial sensor; the details are as above. The stereo vision image processing module 10 adopts the binocular direct method to obtain the image data of the stereo vision module; the details are as above. The satellite navigation module 20 is used to obtain the original satellite navigation measurement data of the navigation satellite through the receiver; the details are as above. The system tight coupling module 40 is used to perform tight coupling processing on the measurement data of the inertial navigation module, the image data of the stereo vision module, and the original measurement data of the satellite navigation; the details are as above. The connection relationship between the various modules is: the stereo vision image processing module 10, the satellite navigation module 20, and the inertial navigation module 30 are all connected to the system tight coupling module 40.

Among them, the stereo vision image processing module 10 includes a binocular camera, and the binocular camera is installed on an autonomous vehicle; the details are as described above. Furthermore, the inertial navigation module 30 includes inertial sensors, and the inertial sensors are fixed on the autonomous vehicle; the details are as described above.

Among them, the deep neural network module uses a three-dimensional convolutional neural network or a three-dimensional dot network to realize object detection and semantic segmentation. The details are as described above.

The invention tightly couples the measurement data of the inertial navigation module, the image data of the stereo vision module, and the original measurement data of the satellite navigation to correct the error of the measurement data of the inertial navigation module, thereby improving the positioning accuracy without resorting to expensive Laser scanning radar, thereby reducing the cost of self-driving cars.

The above are only the preferred specific embodiments of the present invention, but the scope of protection of the present invention is not limited to this. Anyone familiar with the technical field within the technical scope disclosed in the present invention, according to the technical solution of the present invention Equivalent replacements or changes to its inventive concept should all fall within the protection scope of the present invention.

Claims

An automatic driving object detection and positioning system, which is characterized by comprising a stereo vision image processing module, a satellite navigation module, an inertial navigation module and a system tight coupling module, wherein:

Stereo vision image processing module, using binocular or multi-eye cameras to obtain image data of the stereo vision module;

The satellite navigation module is used to obtain the original measurement data of satellite navigation through the receiver;

The inertial navigation module is used to obtain the measurement data of the inertial navigation module by using inertial sensors;

The system tight coupling module is used to tightly couple the measurement data of the inertial navigation module, the image data of the stereo vision module and the original measurement data of the satellite navigation, and establish three-dimensional environment information, and finally use the deep neural network module to detect Targets and obstacles in the environment;

The deep neural network is composed of a three-dimensional sparse convolutional neural network, a dot network neural network, and a combination of the two;

The stereo vision image processing module, the satellite navigation module and the inertial navigation module are all connected to the system tight coupling module.
The automatic driving object detection and positioning system according to claim 1, wherein the stereo vision image processing module comprises a binocular or multi-eye camera.
A method for detecting and locating an autonomous driving object according to claim 1, characterized in that it comprises the following steps:

S1: Obtain the image data of the stereo vision module, the measurement data of the inertial navigation module and the original measurement data of the satellite navigation;

S2: tightly couple the image data of the stereo vision module, the original measurement data of the satellite navigation and the measurement data of the inertial navigation module to correct the drift error of the inertial navigation module;

S3: Tightly couple the image data of the stereo vision module, the original measurement data of the satellite navigation and the measurement data of the inertial navigation module to establish three-dimensional environment information, and then use the deep neural network module to judge and classify the targets and obstacles in the environment;

S4: The deep neural network uses a three-dimensional sparse convolutional neural network, a dot network neural network, or a combination of the two.
The method for detecting and positioning an automatic driving object according to claim 3, wherein the stereo vision module adopts a multi-eye, including binocular camera direct method for processing, and the image data includes multi-eye or dual-eye at the same time. The visual difference between the three-dimensional vision cameras and the image information captured by each camera at different times and positions are used to establish the three-dimensional information of the environment.
The method for detecting and positioning an automatic driving object according to claim 3, wherein the tight coupling is composed of a weighted reprojection error of stereo vision, satellite navigation error, and state error from inertial navigation to form a cost function.
The method for detecting and locating an autonomous driving object according to claim 3, wherein the S1 specifically includes the following steps:

S11: Use multi-eye or binocular cameras to obtain image data of the stereo vision module. The image data includes the visual difference between the multi-eye or binocular stereo vision cameras at the same time and the images captured by each camera at different times and positions information;

S12: Use inertial sensors to obtain measurement data of the inertial navigation module;

S13: Obtain the original measurement data of satellite navigation through the receiver;

S14: Perform tight coupling processing on the measurement data of the inertial navigation module, the image data, and the original measurement data of the satellite navigation.
The method for detecting and locating an autonomous driving object according to claim 6, wherein the S11 specifically includes the following steps:

S111: Use a multi-eye or binocular camera to collect environmental image signals;

S112: Combine the visual difference captured by multiple different cameras at the same time and the images captured by each camera at different times and positions to form a direct method of stereo vision observation;

S113: Combine the measurement data of inertial navigation and satellite navigation to establish three-dimensional environment information, and finally use the deep neural network module to classify and judge the targets and obstacles in the environment.
The method for detecting and positioning an autonomous driving object according to claim 6, wherein the S12 specifically includes the following steps:

S121: Use inertial sensors to measure the 3-axis acceleration and 3-axis angular velocity of the autonomous vehicle in a fixed coordinate system;

S122: Rotate the acceleration and the angular velocity to the navigation coordinate system, solve the inertial navigation mechanical arrangement equation, and calculate the position and attitude angle of the autonomous vehicle;

S123: Combine image data, inertial navigation data and satellite navigation data to establish three-dimensional environment information, and then use deep neural network modules to classify and judge objects and obstacles in the environment.
The method for detecting and positioning an automatic driving object according to claim 6, wherein said S14 specifically comprises: comparing the original measurement data of the satellite navigation with the image data of the stereo vision module to the inertial navigation module. The drift error is corrected.