CN114199259B - Multi-source fusion navigation positioning method based on motion state and environment perception


Info

Publication number
CN114199259B
CN114199259B (application CN202210156085.7A)
Authority
CN
China
Prior art keywords
vehicle
imu
image
sliding window
visual
Prior art date
Legal status
Active
Application number
CN202210156085.7A
Other languages
Chinese (zh)
Other versions
CN114199259A (en)
Inventor
孙蕊
戴晔莹
朱永文
王均晖
王长春
林挺
孔洁
Current Assignee
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202210156085.7A
Publication of CN114199259A
Application granted
Publication of CN114199259B
Legal status: Active

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/26 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
    • G01C21/28 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network with correlation of data from several navigational instruments
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S19/00 Satellite radio beacon positioning systems; Determining position, velocity or attitude using signals transmitted by such systems
    • G01S19/38 Determining a navigation solution using signals transmitted by a satellite radio beacon positioning system
    • G01S19/39 Determining a navigation solution using signals transmitted by a satellite radio beacon positioning system the satellite radio beacon positioning system transmitting time-stamped messages, e.g. GPS [Global Positioning System], GLONASS [Global Orbiting Navigation Satellite System] or GALILEO
    • G01S19/42 Determining position
    • G01S19/45 Determining position by combining measurements of signals from the satellite radio beacon positioning system with a supplementary measurement
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S19/00 Satellite radio beacon positioning systems; Determining position, velocity or attitude using signals transmitted by such systems
    • G01S19/38 Determining a navigation solution using signals transmitted by a satellite radio beacon positioning system
    • G01S19/39 Determining a navigation solution using signals transmitted by a satellite radio beacon positioning system the satellite radio beacon positioning system transmitting time-stamped messages, e.g. GPS [Global Positioning System], GLONASS [Global Orbiting Navigation Satellite System] or GALILEO
    • G01S19/42 Determining position
    • G01S19/45 Determining position by combining measurements of signals from the satellite radio beacon positioning system with a supplementary measurement
    • G01S19/47 Determining position by combining measurements of signals from the satellite radio beacon positioning system with a supplementary measurement the supplementary measurement being an inertial measurement, e.g. tightly coupled inertial
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology

Landscapes

  • Engineering & Computer Science (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Automation & Control Theory (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)
  • Navigation (AREA)

Abstract

The invention provides a multi-source fusion navigation positioning method based on motion state and environment perception, which comprises the following steps: step 1, collecting image frames with a camera, eliminating dynamic points in the image frames, and obtaining the static feature points of each image frame and their positions; step 2, obtaining the local camera pose based on the feature point positions of each image frame and a tightly coupled vision/IMU fusion positioning algorithm; step 3, obtaining the motion state of the vehicle from the IMU measurement data and switching the adaptive algorithm according to the motion state of the vehicle; step 4, obtaining the final positioning result based on the local camera pose, the switching result of the adaptive algorithm, Kalman filtering, and a loosely coupled GNSS/IMU/vision fusion positioning algorithm. The method effectively exploits the role of vision in integrated navigation and improves the accuracy and robustness of multi-source fusion positioning.

Description

Multi-source fusion navigation positioning method based on motion state and environment perception
Technical Field
The invention belongs to the technical field of satellite positioning and navigation, and particularly relates to a multi-source fusion navigation positioning method based on motion state and environment perception.
Background
Intelligent transportation is developing rapidly in urban environments, and automatic driving technology has brought great convenience to human life. Navigation and positioning technology is the key to high-accuracy, continuous and reliable vehicle positioning, and is also an important precondition for path planning and vehicle control. GNSS is an important sensor for navigation and positioning and can provide high-accuracy spatio-temporal information in open environments, but in complex urban environments such as urban canyons, viaducts and tunnels, blockage and shielding of GNSS signals reduce positioning accuracy or cause it to be lost entirely. It is difficult for a single sensor to maintain good positioning accuracy in complex and changing environments, so the respective advantages of multiple sensors must be used to make up for each other's shortcomings. At present, multi-sensor integrated navigation is the main research direction for solving the degradation of GNSS performance in urban environments. GNSS/INS integrated navigation is the most widely used integrated navigation system, but when the GNSS remains shielded for a long time and only the inertial navigation system is relied on, large errors are still produced, so other sensors are needed to reduce the short-term drift of inertial navigation. A camera can perform environment perception and visual information acquisition by capturing images of the surrounding environment, is resistant to interference and inexpensive, and is therefore an ideal vehicle-mounted sensor. Zhang Yiran et al. proposed a method that uses the geographical positions of landmarks obtained by binocular vision to establish a landmark database and locally improve GNSS positioning accuracy, but the camera only assists the GNSS and is not used as a main sensor of the integrated navigation; Scheiwarong et al. proposed a vision-assisted INS/GNSS navigation system based on a federated filter, in which the INS is loosely coupled with the GNSS and with vision separately, but when the quality of the observation information and data is poor the performance of the integrated navigation is affected. Therefore, research on a reliable and robust GNSS/IMU/vision integrated navigation system has great development prospects.
Interpretation of terms:
GNSS: Global Navigation Satellite System
IMU: Inertial Measurement Unit
INS: Inertial Navigation System
VIO: Visual-Inertial Odometry
SLAM: Simultaneous Localization and Mapping
VO: Visual Odometry
BN: Batch Normalization
ECEF: Earth-Centered, Earth-Fixed
KLT: Kanade-Lucas-Tomasi sparse optical flow tracking method
EKF: Extended Kalman Filter.
Disclosure of Invention
Purpose of the invention: the invention aims to solve the technical problems of the prior art and provides a multi-source fusion navigation positioning method based on motion state and environment perception.
In order to solve the technical problem, the invention discloses a multi-source fusion navigation positioning method based on motion state and environment perception, which comprises the following steps.
Step 1, a camera is used to collect image frames, dynamic points in the image frames are removed, and the static feature points of each image frame and their positions are obtained.
Step 2, the local camera pose is obtained based on the feature point positions of each image frame and a tightly coupled vision/IMU fusion positioning algorithm.
Step 3, the motion state of the vehicle is obtained based on the IMU measurement data, and the adaptive algorithm is switched according to the motion state of the vehicle.
Step 4, the final positioning result is obtained based on the local camera pose, the switching result of the adaptive algorithm, Kalman filtering, and a loosely coupled GNSS/IMU/vision fusion positioning algorithm.
Further, step 1 includes a step 1.1 of semantically segmenting the pedestrians and vehicles of each image frame: for the image frame set OPic collected by the camera, the feature points of each image frame are extracted and tracked with the KLT sparse optical flow method to obtain the feature point positions, namely the pixel coordinates of the feature points, and semantic segmentation of vehicles and pedestrians is performed on the basis of a convolutional neural network to obtain the feature points of each image frame that belong to the semantic categories "vehicle" or "pedestrian".
The KLT sparse optical flow method extracts feature points and tracks the position change of the same pixel point in pixel coordinates, which is the basis of the subsequent pose estimation. Through semantic segmentation and dynamic point elimination, the dynamic feature points of people and vehicles can be removed, which reduces the negative influence of dynamic points on the final pose estimation, reduces the number of points tracked by the sparse optical flow method, and improves the operating efficiency.
Step 1.2, the IMU measurement data are pre-integrated to obtain the IMU pre-integration result, namely the relative pose relationship between two adjacent image frames, including their relative position, velocity and attitude changes; the measurement data comprise the measurement values output by the accelerometer and the gyroscope.
Step 1.3, dynamic points are removed: based on the IMU pre-integration result, the feature points of each image frame that belong to the semantic categories "vehicle" or "pedestrian" are projected into the next image frame as their predicted positions in that frame; the pixel points whose tracked actual positions in the next frame are inconsistent with the predicted positions are the dynamic points and are eliminated, while the pixel points whose actual positions are consistent with the predicted positions are taken as static points, thereby obtaining the static feature points of the next image frame and their positions.
Step 1.1 segments the feature points of all vehicles and pedestrians, but not all vehicles and pedestrians are actually moving, so the moving portion of the vehicle and pedestrian feature points obtained in step 1.1 must be found through the IMU pre-integration, as illustrated by the sketch below. Segmenting first means that the pre-integration projection does not need to be performed on all feature points: objects such as trees and street lamps are static, the potentially movable vehicles and pedestrians are found first, and only that portion is projected, which reduces the amount of computation.
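Purely as an illustration of this step, the following Python sketch projects the semantically segmented "vehicle"/"pedestrian" feature points into the next frame using the IMU-predicted relative pose and rejects those whose KLT-tracked position disagrees with the prediction. The function name, argument layout and the pixel threshold are assumptions for the sketch, not values prescribed by the patent.

```python
import numpy as np

def reject_dynamic_points(pts_prev, pts_tracked, depths, K, R_rel, t_rel, thresh_px=3.0):
    """Keep only semantic ('vehicle'/'pedestrian') points that move consistently
    with the IMU-predicted camera motion; the rest are treated as dynamic.

    pts_prev    : (N, 2) pixel coordinates of segmented points in the previous frame
    pts_tracked : (N, 2) pixel coordinates of the same points tracked by KLT in the next frame
    depths      : (N,)   estimated depths of the points in the previous frame
    K           : (3, 3) camera intrinsic matrix
    R_rel, t_rel: IMU pre-integration result expressed as the relative camera pose
    thresh_px   : consistency threshold in pixels (assumed value)
    """
    # Back-project the previous-frame pixels to 3-D points in the camera frame
    pts_h = np.hstack([pts_prev, np.ones((len(pts_prev), 1))])   # (N, 3)
    p_cam = (np.linalg.inv(K) @ pts_h.T) * depths                # (3, N)
    # Predict where each point should appear in the next frame under the IMU motion
    p_pred = R_rel @ p_cam + t_rel.reshape(3, 1)
    uv_pred = K @ p_pred
    uv_pred = (uv_pred[:2] / uv_pred[2]).T                       # (N, 2)
    # A point is static if its tracked position matches the IMU-based prediction
    err = np.linalg.norm(uv_pred - pts_tracked, axis=1)
    static_mask = err < thresh_px
    return pts_tracked[static_mask], static_mask
```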
Further, the convolutional neural network in step 1.1 adopts an improved U-net convolutional neural network in which the convolutional layers in the encoder and the decoder are replaced with Ghost modules. A Ghost module has the same function as a conventional convolution module but is a lightweight operation: the linear transformation operations it uses cost less and require less computation while maintaining accuracy.
Semantic segmentation of vehicles and pedestrians based on the convolutional neural network includes step 1.1.1, training the improved U-net convolutional neural network model with a data set; the data set is built by acquiring road images in different scenes in advance with a vehicle-mounted camera to establish a vehicle and pedestrian data set, or by directly downloading an existing public road data set. The improved U-net convolutional neural network model is trained to obtain the trained improved U-net convolutional neural network.
Step 1.1.2, inputting the image frame set OPic collected by the camera into a trained encoder of the improved U-net convolutional neural network to obtain a high-level feature vector of the image frame set OPic collected by the camera.
And 1.1.3, inputting the high-level feature vector of the image frame set OPic collected by the camera into a decoder of the trained improved U-net convolutional neural network to obtain a feature image for semantic category segmentation.
Step 1.1.4, inputting the characteristic image for semantic category segmentation into a softmax layer, and obtaining the characteristic point of each frame of image belonging to the semantic category of the vehicle or the pedestrian in an image frame set OPic collected by a camera.
Further, step 2.1, VIO initialization is performed based on the feature point positions of each image frame and the IMU pre-integration result, obtaining initial values of the extrinsic parameters between the camera and the IMU, the feature point depths, and the tightly coupled position, velocity and attitude.
Step 2.2, based on a nonlinear optimization approach, the tightly coupled vision/IMU nonlinear optimization over the sliding window is performed starting from the initial values of the camera-IMU extrinsic parameters, the feature point depths, and the tightly coupled position, velocity and attitude, and the optimal camera pose within the sliding window, namely the local camera pose, is solved.
Further, in step 2.1, the pose transformation of the image frames is recovered through the structure-from-motion technique based on the feature point positions of each image frame; the pose transformation of the image frames is aligned with the IMU pre-integration result, the extrinsic parameters between the camera and the IMU are recovered, and the transformation relation between the IMU coordinate system and the world coordinate system is obtained.
Step 2.2 comprises: the state vector X of the tightly coupled vision/IMU sliding window is obtained, the tightly coupled vision/IMU nonlinear optimization problem is formulated with the minimum reprojection error algorithm, and the nonlinear optimization problem is solved with the Ceres Solver nonlinear optimization library, thereby solving the optimal camera pose within the sliding window.
The error function to be optimized in the tightly coupled vision/IMU nonlinear optimization problem is:

$$\min_{X}\left\{\left\|\gamma_{marg}\right\|^{2}+\sum_{k\in I}\left\|\gamma_{I}\left(\hat{z}_{b_{k}b_{k+1}},X\right)\right\|_{P_{b_{k}b_{k+1}}}^{2}+\sum_{(l,\theta)\in L}\left\|\gamma_{L}\left(\hat{z}_{l}^{c_{\theta}},X\right)\right\|_{P_{l}^{c_{\theta}}}^{2}\right\}$$

where $\gamma_{marg}$ is the prior information from marginalization, i.e. the oldest frame in the sliding window and its corresponding state quantities are marginalized as prior information; $\gamma_{I}$ and $\gamma_{L}$ are the IMU residual term and the visual residual term, respectively; $I$ is the set of all IMU measurements, and $P_{b_{k}b_{k+1}}$ is the covariance matrix of the IMU pre-integration noise term between the k-th and (k+1)-th image frames; $L$ is the set of feature points observed at least twice in the current sliding window, and $P_{l}^{c_{\theta}}$ is the covariance matrix of the visual observation noise of the l-th pixel coordinate in the θ-th image frame in the set L.
Further, in step 3, switching the adaptive algorithm means constraining and correcting the loosely coupled GNSS/IMU/vision fusion positioning algorithm and the tightly coupled vision/IMU fusion positioning algorithm with zero-velocity correction, a robust kernel function, and nonholonomic constraints, respectively, according to the motion state of the vehicle.
Step 3.1, based on the measurement values output by the accelerometer, a sliding window of size N is designed to compute an acceleration standard deviation factor, and whether the vehicle is in a stationary state is judged from this factor; if the vehicle is stationary, static zero-velocity correction is performed in the loosely coupled GNSS/IMU/vision fusion positioning algorithm.
Step 3.2, if the vehicle is detected to be in a non-stationary state, it is either turning or travelling straight; if the vehicle is in a turning state, a robust kernel function is added to the tightly coupled vision/IMU fusion positioning algorithm.
Step 3.3, if the vehicle is travelling straight, a nonholonomic constraint is added to the loosely coupled GNSS/IMU/vision fusion positioning algorithm to correct the accumulated error of inertial navigation.
The motion state of the vehicle has a large influence on positioning accuracy, and by distinguishing the motion states, different constraints and corrections can be added to the algorithm, thereby improving the positioning accuracy. The invention distinguishes three motion states (see the sketch below). When the vehicle is stationary, the IMU and the camera still produce non-zero outputs because of their own errors, which introduces error, so the vehicle is accurately judged to be stationary and the error is corrected by static zero-velocity correction. When the vehicle travels straight, lateral motion and vertical motion are impossible, which satisfies the conditions for using the nonholonomic constraint. When the vehicle turns, the images captured by the camera are blurred, feature point matching produces mismatches, and the accuracy of the camera drops sharply, so a robust kernel function is added to reduce the influence of these errors.
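The switching logic can be summarized by the following Python sketch. The threshold names, the returned strings and the function signature are illustrative assumptions; th_acc_std corresponds to the adaptive stationarity threshold and th_omega to the angular-rate threshold described below.

```python
import numpy as np

def select_motion_state(acc_window, gyro_z_window, th_acc_std, th_omega):
    """Classify the vehicle motion state from IMU data in a sliding window and
    return which correction the fusion algorithm should apply."""
    acc_norm = np.linalg.norm(acc_window, axis=1)        # |a_i| over the window
    acc_std = np.std(acc_norm, ddof=1)                   # acceleration standard deviation factor
    mean_omega = np.mean(np.abs(gyro_z_window))          # mean z-axis angular rate

    if acc_std <= th_acc_std:
        return "stationary", "zero-velocity correction in the loose GNSS/IMU/vision filter"
    if mean_omega > th_omega:
        return "turning", "Huber robust kernel on the visual residuals of the tight VIO"
    return "straight", "nonholonomic constraint in the loose GNSS/IMU/vision filter"
```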
Further, step 4 comprises a step 4.1 of solving for the GNSS position information.
Step 4.2, the VIO and the GNSS are initialized, and the local camera pose obtained in step 2 and the GNSS position information solved in step 4.1 are converted into the corresponding information in the Earth-centered, Earth-fixed coordinate system.
Step 4.3, the global estimated pose, namely the final positioning result, is obtained based on Kalman filtering and the loosely coupled GNSS/IMU/vision fusion positioning algorithm.
Further, step 3.1 includes the following. The acceleration standard deviation over a sliding window of size N is expressed as

$$\sigma_{a}(t)=\sqrt{\frac{1}{N-1}\sum_{i=t-N+1}^{t}\left(a_{i}-\mu\right)^{2}},\qquad a_{i}=\sqrt{a_{x,i}^{2}+a_{y,i}^{2}+a_{z,i}^{2}},\qquad \mu=\frac{1}{N}\sum_{i=t-N+1}^{t}a_{i}$$

where $a_{x,i}$, $a_{y,i}$, $a_{z,i}$ are the outputs of the three accelerometer axes x, y and z; i denotes the time index of the acceleration data within the sliding window, $i\in[t-N+1,t]$; and $\mu$ is the average of all accelerations within the sliding window, used as the mathematical expectation, an unbiased estimate being adopted here.

During the movement of the vehicle, a vehicle stationarity threshold $TH_{a}(t)$ is set to judge whether the vehicle is stationary. When the acceleration standard deviation of the vehicle satisfies $\sigma_{a}(t)\le TH_{a}(t)$, the vehicle is judged to be in a zero-velocity stationary state, and zero-velocity correction is applied to it: in the Kalman filtering process of step 4.3, the velocity solved by inertial navigation is used as the observation of the system velocity error for Kalman filter estimation, and the updated state estimate is fed back to correct the navigation parameter results in the INS subsystem, thereby completing the static zero-velocity correction.

The vehicle stationarity threshold $TH_{a}(t)$ is not a fixed value but a dynamic threshold: a sliding window of size M is set, and the threshold changes adaptively with the sliding of the window M according to the acceleration standard deviations at the different times within it, so that the zero-velocity state of the vehicle can be detected more flexibly:

$$TH_{a}(t)=\sum_{j=t-M+1}^{t}\eta_{j}\,\sigma_{a}(j)$$

where j denotes the time index of the acceleration standard deviation $\sigma_{a}(j)$ within the sliding window M, $j\in[t-M+1,t]$, and $\eta_{j}$ are the weights of the respective acceleration standard deviations: the farther an acceleration standard deviation is from the current time t, the smaller its weight, and the closer it is, the larger its weight.

The upper limit of the size of the sliding window M is defined as $M_{max}$ and the lower limit as $M_{min}$. First $M_{min}$ is taken as the sliding window length and the mean $\sigma$ of the current group of $\sigma_{a}(j)$ values is computed. When $\sigma$ is less than or equal to the acceleration stationarity threshold $\varepsilon$, the acceleration values over this period are relatively stable and a larger sliding window length M is selected; when $\sigma$ is greater than the acceleration stationarity threshold $\varepsilon$, the acceleration values are changing in stages and a smaller sliding window length M is selected to improve sensitivity. Here the window-length adjustment $M'\in[0,M_{max}-M_{min}]$ satisfies $M=M_{min}+M'$, and $\sigma$ is the mean of the group of $\sigma_{a}(j)$ values when $M_{min}$ is taken as the window length. The accelerations output by the IMUs of multiple groups of vehicles in different motion states are collected to compute multiple groups of $\sigma$, and the critical value at which the acceleration values cease to be stable is selected as the acceleration stationarity threshold $\varepsilon$.
When the vehicle is stationary, the accelerometer output is stable, whereas when it is not stationary the data fluctuate considerably, and the standard deviation reflects the stability of a group of data. In different stationary scenes the acceleration stationarity threshold should differ because of external factors such as vibration of the vehicle body, so the standard deviations of nearby time periods are a useful reference for determining the threshold, and a dynamic acceleration stationarity threshold is therefore set, as in the sketch below.
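The following sketch illustrates the adaptive stationarity threshold described above. The exponentially increasing weights toward the current epoch and the two-level window-length rule are assumptions made for the sketch; the patent only requires that closer standard deviations weigh more and that M adapt to the stationarity of the data. The numeric values 20, 50 and 0.05 follow the embodiment described later in the detailed description.

```python
import numpy as np

def acc_std(acc_xyz):
    """Unbiased standard deviation of the acceleration magnitude over a window of size N."""
    mag = np.linalg.norm(acc_xyz, axis=1)
    return np.std(mag, ddof=1)

def adaptive_threshold(std_history, m_min=20, m_max=50, eps=0.05):
    """Weighted threshold TH_a(t) built from recent standard deviations sigma_a(j)."""
    sigma = np.mean(std_history[-m_min:])                  # stationarity of the recent data
    M = m_max if sigma <= eps else m_min                   # larger window when data are stable
    recent = np.asarray(std_history[-M:])
    weights = np.exp(np.linspace(-1.0, 0.0, len(recent)))  # closer epochs weigh more (assumed form)
    weights /= weights.sum()
    return float(np.dot(weights, recent))

def is_zero_velocity(acc_window, std_history):
    """Zero-velocity decision: current std compared against the adaptive threshold."""
    return acc_std(acc_window) <= adaptive_threshold(std_history)
```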
Further, step 3.2 comprises: based on the measured angular velocity, a turn detection factor based on the mean angular velocity is designed, and the mean angular velocity $|\bar{\omega}|$ within the sliding window N is used as the basis for judging whether the vehicle is turning:

$$|\bar{\omega}|=\frac{1}{N}\sum_{i=t-N+1}^{t}\left|\omega_{z,i}\right|>TH_{\omega}$$

If this condition is satisfied, the vehicle is judged to be in a turning state; otherwise, it is judged to be travelling straight. Here $\omega_{z}$ is the output of the gyroscope in the z-axis direction and $TH_{\omega}$ is a set threshold on the mean angular velocity.
If the vehicle is in a turning state, a robust kernel function is added to the tightly coupled vision/IMU fusion positioning algorithm: a Huber kernel $\rho(\cdot)$ is applied to the visual measurement residual term in the error function to be optimized of the tightly coupled vision/IMU nonlinear optimization problem,

$$\rho(\gamma)=\begin{cases}\dfrac{1}{2}\gamma^{2}, & |\gamma|\le\delta\\[4pt] \delta\left(|\gamma|-\dfrac{1}{2}\delta\right), & |\gamma|>\delta\end{cases}$$

where $\delta$ is the threshold for judging whether the visual residual term is too large. Data with good visual quality while the vehicle travels straight and data with blurred images and mismatches while it turns are collected separately, and the resulting visual residual terms are compared to determine the visual residual threshold $\delta$ that distinguishes good visual data from poor visual data.
In step 3.3, if it is judged that the vehicle is travelling straight, the vehicle does not slip laterally or move in the direction perpendicular to the road surface during driving, and a nonholonomic constraint is added in the Kalman filtering process of step 4.3 to correct the accumulated error of inertial navigation, that is, the lateral and vertical velocities are set to zero.
When the vehicle travels straight, lateral motion and vertical motion are impossible, which satisfies the conditions for using the nonholonomic constraint. When the vehicle turns, the images captured by the camera are blurred, feature point matching produces mismatches, and the accuracy of the camera drops sharply, so a robust kernel function is used to reduce the influence of these errors.
Further, step 4.1 comprises: the satellites whose elevation angles are below a satellite elevation threshold are removed to obtain the screened satellite observation data, and the GNSS position information is solved from the screened satellite observation data with a pseudo-range single-point positioning algorithm.
Step 4.3 comprises: the corresponding information of the local camera pose and the GNSS position information in the Earth-centered, Earth-fixed coordinate system obtained in step 4.2, together with the switching result of the adaptive algorithm from step 3, are input into a Kalman filter to obtain error correction information; the error correction information is input into the sliding-window-based tightly coupled nonlinear optimization of step 2.2 to apply feedback correction to the navigation parameters; and the corrected pose information is finally output as the global estimated pose, namely the final positioning result.
If the GNSS positioning itself is problematic, its combination with the VIO is greatly affected. The signals of satellites with elevation angles below the satellite elevation threshold are easily reflected by obstacles such as buildings, which introduces large errors into the measurement information, so these low-elevation satellites must be rejected.
The GNSS can provide the absolute position of the vehicle; under good signal conditions GNSS positioning accuracy is very high and has no accumulated error, so combining the GNSS with the VIO for navigation yields a good global pose result, as sketched below.
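A conceptual Python sketch of the loose coupling, simplified to a position-and-velocity state (the actual filter in step 4.3 also carries attitude and sensor error states; all matrices and noise values below are illustrative assumptions):

```python
import numpy as np

class LooseFusionKF:
    """Toy loosely coupled filter: state x = [position(3), velocity(3)]."""
    def __init__(self):
        self.x = np.zeros(6)
        self.P = np.eye(6)

    def predict(self, F, Q):
        self.x = F @ self.x
        self.P = F @ self.P @ F.T + Q

    def update(self, z, H, R):
        S = H @ self.P @ H.T + R
        K = self.P @ H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - H @ self.x)
        self.P = (np.eye(len(self.x)) - K @ H) @ self.P

# GNSS position and VIO position (both converted to ECEF) observe x[0:3];
# zero-velocity correction and the nonholonomic constraint observe (part of) x[3:6].
H_pos = np.hstack([np.eye(3), np.zeros((3, 3))])
H_vel = np.hstack([np.zeros((3, 3)), np.eye(3)])

kf = LooseFusionKF()
dt = 0.1
F = np.eye(6); F[:3, 3:] = dt * np.eye(3)
kf.predict(F, Q=1e-3 * np.eye(6))
kf.update(z=np.array([10.0, 5.0, 1.0]), H=H_pos, R=4.0 * np.eye(3))   # GNSS position fix
kf.update(z=np.zeros(3), H=H_vel, R=1e-2 * np.eye(3))                 # zero-velocity correction
```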
Beneficial effects: 1. Aiming at the problem that GNSS/IMU integrated navigation drifts when the GNSS fails for a long time, the camera is combined with the GNSS/IMU for integrated navigation: the visual information and the IMU are first tightly coupled based on sliding-window optimization, and the output local pose and the GNSS positioning result are then input into a Kalman filter for loose coupling, which effectively exploits the role of vision in integrated navigation and improves the accuracy and robustness of multi-source fusion positioning.
2. Aiming at the problem of dynamic point elimination in the data preprocessing of integrated navigation, an improved U-net convolutional neural network is adopted, in which a Ghost module replaces the original conventional convolutional layers; the camera images are semantically segmented, the pedestrian and vehicle feature points are extracted, this portion of the feature points is projected into the next image frame based on the IMU pre-integration result, and the points whose positions differ are identified as dynamic points, which achieves efficient dynamic point elimination and optimizes computing efficiency.
3. Aiming at the problem of reduced accuracy caused by changes in the motion state of the vehicle, the invention constructs detection factors based on the IMU acceleration and angular velocity, detects whether the motion state of the vehicle is stationary, turning or straight, and for these three states respectively uses zero-velocity correction, a robust kernel function and nonholonomic constraints to constrain and correct the filtering process and the nonlinear optimization.
Drawings
The foregoing and/or other advantages of the invention will become further apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings.
Fig. 1 is a schematic flow chart of a multi-source fusion navigation positioning method based on motion state and environmental awareness according to an embodiment of the present application.
Fig. 2 is a schematic diagram of the Ghost module.
Fig. 3 is a schematic diagram of a network structure of an improved U-net convolutional neural network.
Fig. 4 is an image frame acquired by a camera.
Fig. 5 is a diagram illustrating a dynamic point culling effect of fig. 4 according to the prior art.
Fig. 6 is a schematic diagram illustrating a dynamic point elimination effect in a multi-source fusion navigation positioning method combining a vehicle motion state and environment perception provided by an embodiment of the present application.
Fig. 7 is a schematic diagram of comparing three-dimensional position errors in a northeast coordinate system of navigation and positioning by a multi-source fusion navigation and positioning method combining vehicle motion state and environmental perception provided by the prior art and the embodiment of the present application.
Fig. 8 is a schematic diagram illustrating a comparison of attitude errors in navigation and positioning by a multi-source fusion navigation and positioning method combining vehicle motion state and environmental awareness according to the prior art and the embodiments of the present application.
Detailed Description
Embodiments of the present invention will be described below with reference to the accompanying drawings.
The prior art has the following problems.
1. The currently most widely used combination is the GPS/INS integrated navigation system; when the GNSS fails for a long time, the IMU accumulates errors and drifts, and a high-grade INS is expensive to manufacture and has a long production cycle. Meanwhile, most existing vision-based integrated navigation methods are VO or vision-assisted GNSS positioning, which do not fully exploit the positioning capability of vision.
2. Existing GNSS/IMU/vision integrated navigation lacks adaptation to dynamic environments: dynamic points are not removed accurately and efficiently in the visual data preprocessing, and both processing all feature points and removing the pedestrian and vehicle feature points in all states bring a large amount of computation.
3. When the motion state of the vehicle changes frequently, the positioning accuracy of existing GNSS/IMU/vision integrated navigation decreases, and the performance and reliability are insufficient.
In order to meet the requirements of high-accuracy, continuous and reliable navigation and positioning of vehicles under all-weather and all-road conditions, and to improve the adaptability of vehicles to complex environments and changing motion states, an embodiment of the invention provides a multi-source fusion navigation positioning method combining vehicle motion state and environment perception. As shown in fig. 1, the method comprises the following steps. Step 1, image frames are collected with a camera, dynamic points in the image frames are eliminated, and the static feature points of each image frame and their positions are obtained.
Step 1.1, semantic segmentation of pedestrians and vehicles based on an improved convolutional neural network.
For the image frame set OPic collected by the camera, the feature points of each image frame are first extracted and tracked with the KLT sparse optical flow method to obtain the feature point positions, namely the pixel coordinates of the feature points; the image frame set OPic collected by the camera is then input into the convolutional neural network for semantic segmentation of "vehicles" and "pedestrians", and dynamic point elimination is finally performed with the IMU pre-integration result.
For details of extracting and tracking the feature points of each image frame with the KLT sparse optical flow method, refer to: Peri-epi, Marie. ORB feature matching optimization based on sparse optical flow methods [J]. Applied Optics, 2019, 40(04): 583-.
In the step 1.1, the convolutional neural network adopts an improved U-net convolutional neural network, as shown in fig. 3, a conventional convolutional layer is replaced by a Ghost module for feature extraction, and then the image is subjected to semantic segmentation. The improved U-net convolutional neural network is of an encoder-decoder structure, a Ghost module is used for replacing a traditional convolutional layer in an encoder part to extract features, an up-sampling and convolution combination method is adopted in a decoder part to achieve deconvolution, and the Ghost module is used for replacing the traditional convolutional layer.
Step 1.1.1, training the improved U-net convolutional neural network model by using a data set. The data set acquires road images under different scenes in a mode of vehicle-mounted camera acquisition, establishes a vehicle and pedestrian data set, or directly downloads the existing public road data set, such as a KITTI data set. And training the improved U-net convolutional neural network model to obtain the trained improved U-net convolutional neural network.
Step 1.1.2, the image frame set OPic collected by the camera is input into the encoder of the trained improved U-net convolutional neural network to obtain the high-level feature vectors of the image frame set OPic collected by the camera.
The Ghost module increases the output of feature maps and is divided into two parts, as shown in fig. 2. The first part is a conventional convolutional layer that outputs a small number of feature maps, as shown in equation (1).
Y' = OPic * f'   (1)
where Y' is the set of output feature maps, f' is the convolution kernel of the convolution operation, OPic is the set of image frames captured by the camera, and * denotes the conventional convolution. The number of channels of the output feature maps of this convolution operation is smaller than that of a conventional convolutional layer.
The second part is a lightweight linear transformation layer used to generate the redundant feature maps, as shown in equation (2).

y_{ij} = Φ_{i,j}(y'_i),  i = 1, …, m,  j = 1, …, s   (2)

where y'_i is the i-th of the m feature maps in Y', and each feature map y'_i of Y' is processed by the lightweight linear operations Φ_{i,j} to obtain s feature images; here Φ_{i,j} may be a 3×3 or 5×5 convolution operation. Therefore, the m feature maps collectively yield m×s feature images through this linear operation. The number of Ghost modules is set to 5, and after each Ghost module the picture is down-sampled by max pooling to reduce the feature dimension. The high-level feature vectors of the image output after the 5 Ghost modules are passed to the decoder.
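Purely for illustration, the following PyTorch sketch implements a Ghost-style module consistent with equations (1) and (2): a small primary convolution produces the intrinsic feature maps Y', and a cheap depthwise convolution plays the role of the lightweight linear operations Φ. The channel ratio, kernel sizes and the use of BN/ReLU are assumptions, not values fixed by the patent, and out_ch is assumed to be even.

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """Primary convolution + cheap linear operation, concatenated channel-wise."""
    def __init__(self, in_ch, out_ch, ratio=2, dw_kernel=3):
        super().__init__()
        init_ch = out_ch // ratio                      # m intrinsic feature maps
        cheap_ch = out_ch - init_ch                    # remaining "ghost" feature maps
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, init_ch, 1, bias=False),  # Y' = OPic * f'   (equation (1))
            nn.BatchNorm2d(init_ch), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(
            nn.Conv2d(init_ch, cheap_ch, dw_kernel, padding=dw_kernel // 2,
                      groups=init_ch, bias=False),     # lightweight linear operation Phi (equation (2))
            nn.BatchNorm2d(cheap_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        y_prime = self.primary(x)                      # intrinsic maps
        ghosts = self.cheap(y_prime)                   # redundant maps generated cheaply
        return torch.cat([y_prime, ghosts], dim=1)
```

In the encoder, five such modules would be stacked, each followed by max pooling, mirroring the description above.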
Step 1.1.3, decoding. The extracted features are decoded with 4 decoding modules, each consisting of a deconvolution module followed by a Ghost module, so that the convolution modules of the decoding part mirror the encoding modules of the same level. The output of the previous level and the feature map output by the corresponding encoder are taken together as the input of the decoding module, the lower-level features output by the encoder are fused in, and all features are fully used to obtain the feature image for semantic category segmentation.
Step 1.1.4, the feature image for semantic category segmentation is input into a softmax layer, so that the probability that each pixel of each image frame in the image frame set OPic collected by the camera belongs to the semantic category "vehicle" or "pedestrian" is obtained, as shown in equation (3).

Q_d(x) = exp(s_d(x)) / Σ_{d'=1}^{D} exp(s_{d'}(x))   (3)

where s_d(x) is the score of pixel x for semantic category d; D is the number of semantic categories, here the two categories "pedestrian" and "vehicle", so D = 2; and Q_d(x) is the classification result of semantic category d for pixel x, which maximizes the most likely result while suppressing the probabilities of the other categories. According to repeated tests in which multiple groups of pedestrian and vehicle images were collected, multiple groups of semantic-category probabilities are obtained through the improved U-net network; for example, if the probability of the pedestrian category obtained for pedestrian feature points in the images and the probability of the vehicle category obtained for vehicle feature points are both greater than 0.8, the empirical probability threshold Q_thr is set to 0.8. For a given feature point, two probabilities are found: the probability Q_1(x) that it belongs to a pedestrian and the probability Q_2(x) that it belongs to a vehicle. The obtained probabilities are compared with the determined empirical probability threshold Q_thr: if Q_1(x) is greater than Q_thr the pixel is judged to belong to the "pedestrian" category, and if Q_2(x) is greater than Q_thr it is judged to belong to the "vehicle" category.
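A small numpy sketch of equation (3) together with the Q_thr = 0.8 thresholding rule; the 0/1/2 label encoding is an assumed convention for the sketch.

```python
import numpy as np

def classify_pixels(scores, q_thr=0.8):
    """scores: (H, W, 2) network scores s_d(x) for the classes (pedestrian, vehicle).
    Returns an (H, W) label map: 0 = unassigned, 1 = pedestrian, 2 = vehicle."""
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))   # numerically stable softmax
    q = e / e.sum(axis=-1, keepdims=True)                      # Q_d(x), equation (3)
    labels = np.zeros(scores.shape[:2], dtype=np.uint8)
    labels[q[..., 0] > q_thr] = 1                               # pedestrian
    labels[q[..., 1] > q_thr] = 2                               # vehicle
    return labels
```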
Step 1.2, the IMU measurement data are pre-integrated to obtain the IMU pre-integration result, namely the relative pose relationship between two adjacent image frames, including their relative position, velocity and attitude changes; the measurement data comprise the measurement values output by the accelerometer and the gyroscope.
The measurement data of the accelerometer and gyroscope are shown in equation (4) and equation (5).
$$\hat{a}_{t}=a_{t}+b_{a}+R_{w}^{b}g^{w}+n_{a}\qquad(4)$$

$$\hat{\omega}_{t}=\omega_{t}+b_{g}+n_{g}\qquad(5)$$

where $\hat{a}_{t}$ is the accelerometer measurement output by the IMU at time t; $a_{t}$ is the true value of the acceleration; $b_{a}$ is the accelerometer bias; $n_{a}$ is the accelerometer noise term; w denotes the world coordinate system and $g^{w}$ is the gravitational acceleration in the world coordinate system; $R_{w}^{b}$ is the rotation matrix from the world coordinate system to the carrier coordinate system, which can be obtained through the inertial navigation mechanization; $\hat{\omega}_{t}$ is the gyroscope measurement; $\omega_{t}$ is the true value of the angular velocity; $b_{g}$ is the gyroscope bias; and $n_{g}$ is the gyroscope noise term.
For image frames k and k+1, the above two equations are integrated to obtain the relative position, velocity and attitude changes between times $t_{k}$ and $t_{k+1}$. The pre-integration equations are shown in equations (6) to (11).

$$p_{b_{k+1}}^{w}=p_{b_{k}}^{w}+v_{b_{k}}^{w}\Delta t_{k}+\iint_{t\in[t_{k},t_{k+1}]}\left(R_{t}^{w}\left(\hat{a}_{t}-b_{a}-n_{a}\right)-g^{w}\right)dt^{2}\qquad(6)$$

$$v_{b_{k+1}}^{w}=v_{b_{k}}^{w}+\int_{t_{k}}^{t_{k+1}}\left(R_{t}^{w}\left(\hat{a}_{t}-b_{a}-n_{a}\right)-g^{w}\right)dt\qquad(7)$$

$$q_{b_{k+1}}^{w}=q_{b_{k}}^{w}\otimes\int_{t_{k}}^{t_{k+1}}\frac{1}{2}\Omega\left(\hat{\omega}_{t}-b_{g}-n_{g}\right)q_{t}^{b_{k}}\,dt\qquad(8)$$

$$\Omega(\omega)=\begin{bmatrix}-\lfloor\omega\rfloor_{\times}&\omega\\-\omega^{T}&0\end{bmatrix}\qquad(9)$$

$$\lfloor\omega\rfloor_{\times}=\begin{bmatrix}0&-\omega_{z}&\omega_{y}\\ \omega_{z}&0&-\omega_{x}\\-\omega_{y}&\omega_{x}&0\end{bmatrix}\qquad(10)$$

$$\omega_{t}=\begin{bmatrix}\omega_{x}&\omega_{y}&\omega_{z}\end{bmatrix}^{T}\qquad(11)$$

where $\Delta t_{k}$ is the time interval between $t_{k}$ and $t_{k+1}$; B denotes the carrier coordinate system; $R_{t}^{w}$ is the rotation matrix from the carrier coordinate system to the world coordinate system at time t; $p_{b_{k}}^{w}$, $v_{b_{k}}^{w}$, $q_{b_{k}}^{w}$ and $p_{b_{k+1}}^{w}$, $v_{b_{k+1}}^{w}$, $q_{b_{k+1}}^{w}$ are the position, velocity and attitude of image frames k and k+1 in the world coordinate system, respectively; $q_{t}^{b_{k}}$ represents the attitude variation at time t relative to frame k; $\Omega$ is the matrix operator used for the quaternion attitude update; and $\omega_{x}$, $\omega_{y}$, $\omega_{z}$ are the components of the true angular velocity $\omega_{t}$ at time t on the x, y and z axes of the carrier coordinate system, i.e. $\omega_{t}=[\omega_{x},\omega_{y},\omega_{z}]^{T}$.
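For illustration, a simplified discrete (Euler) version of the propagation in equations (6) to (8), applied between consecutive IMU samples. Quaternions are in [w, x, y, z] order, g^w = [0, 0, 9.81] assumes a world frame whose z-axis points up, and the actual pre-integration additionally re-expresses these quantities relative to frame b_k; none of this is mandated by the patent.

```python
import numpy as np

def quat_mult(q, r):
    """Hamilton product of two quaternions [w, x, y, z]."""
    w1, x1, y1, z1 = q; w2, x2, y2, z2 = r
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def quat_to_rot(q):
    """Rotation matrix R_t^w (carrier -> world) for quaternion q."""
    w, x, y, z = q
    return np.array([[1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
                     [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
                     [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)]])

def propagate(p, v, q, acc_meas, gyro_meas, b_a, b_g, dt,
              g_w=np.array([0.0, 0.0, 9.81])):
    """One integration step inside the interval [t_k, t_k+1]."""
    R_wb = quat_to_rot(q)
    acc_w = R_wb @ (acc_meas - b_a) - g_w          # R_t^w(a_hat - b_a) - g^w, as in (6)/(7)
    p = p + v * dt + 0.5 * acc_w * dt**2           # equation (6), discretized
    v = v + acc_w * dt                             # equation (7)
    w = gyro_meas - b_g
    dq = np.concatenate([[1.0], 0.5 * w * dt])     # small-angle quaternion increment, cf. (8)
    q = quat_mult(q, dq)
    return p, v, q / np.linalg.norm(q)
```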
Step 1.3, dynamic points are removed. Based on the IMU pre-integration result, i.e. the relative pose relationship between two consecutive frames including their relative position, velocity and attitude changes, the feature points whose semantic categories are "vehicle" or "pedestrian" are projected into the next image frame according to this pose change, giving their predicted positions in the next frame. The pixel points whose tracked actual positions in the next frame are inconsistent with the predicted positions are the dynamic points and are eliminated, while the pixel points whose actual positions are consistent with the predicted positions are taken as static points, thereby obtaining the static feature points of the next image frame and their positions.
Fig. 4 is an image frame acquired by the camera, fig. 5 shows the dynamic point removal effect of the prior art on fig. 4, and fig. 6 shows the dynamic point removal effect of the embodiment of the present application. As can be seen from fig. 6, the dynamic point removal of the embodiment of the present application completely segments the pixels of the moving "person" and completely removes those pixel points, whereas the prior art cannot remove all dynamic points, and some dynamic feature points remain on the moving "person" in fig. 5.
Step 2, the local camera pose is obtained based on the feature point positions of each image frame and the tightly coupled vision/IMU fusion positioning algorithm.
Step 2.1, VIO initialization.
Using the feature point matching results after dynamic point removal, the pose transformation of the image frames is recovered through the structure-from-motion technique, and the initial values of the camera extrinsic parameters, the feature point depths, and the position, velocity and attitude are recovered; for details refer to the reference by Yang Z. and Shen S. The initialization process is then complete and all metric values are passed to the next step.
And 2.2, solving the optimal camera pose in the sliding window, namely the local camera pose, based on the nonlinear optimization of the sliding window.
The state vector of the tightly coupled sliding window is shown in equations (12) to (14).

$$X=\left[\chi_{0},\chi_{1},\cdots,\chi_{kf},\;\chi_{C}^{B},\;\lambda_{0},\lambda_{1},\cdots,\lambda_{fp}\right]\qquad(12)$$

$$\chi_{h}=\left[p_{b_{h}}^{w},\;v_{b_{h}}^{w},\;q_{b_{h}}^{w}\right],\qquad h\in[0,kf]\qquad(13)$$

$$\chi_{C}^{B}=\left[p_{C}^{B},\;q_{C}^{B}\right]\qquad(14)$$

where X is the state vector; $\chi_{h}$ is the state of the IMU at the h-th moment in the sliding window, comprising the position $p_{b_{h}}^{w}$, velocity $v_{b_{h}}^{w}$ and attitude $q_{b_{h}}^{w}$ of the IMU in the world coordinate system at the h-th image frame, where $h\in[0,kf]$ and kf is the total number of key frames extracted from the image frames acquired by the camera in step 1.1; fp is the total number of features in the sliding window; $\lambda_{l}$ is the inverse depth of the l-th feature point in the camera coordinate system when it is first observed, i.e. the reciprocal of the depth obtained in the initialization process, where $l\in[0,fp]$; $\chi_{C}^{B}$ is the extrinsic parameter of the camera, C is the camera coordinate system, and $p_{C}^{B}$ and $q_{C}^{B}$ are the origin translation vector and the rotation attitude vector between the camera coordinate system and the carrier coordinate system, respectively.
The optimal camera pose is solved with the minimum reprojection error method, and the error function to be optimized of the nonlinear optimization problem is shown in equation (15).

$$\min_{X}\left\{\left\|\gamma_{marg}\right\|^{2}+\sum_{k\in I}\left\|\gamma_{I}\left(\hat{z}_{b_{k}b_{k+1}},X\right)\right\|_{P_{b_{k}b_{k+1}}}^{2}+\sum_{(l,\theta)\in L}\left\|\gamma_{L}\left(\hat{z}_{l}^{c_{\theta}},X\right)\right\|_{P_{l}^{c_{\theta}}}^{2}\right\}\qquad(15)$$

where $\gamma_{marg}$ is the prior information from marginalization, i.e. the oldest frame in the sliding window and its corresponding state quantities are marginalized as prior information; $\gamma_{I}$ and $\gamma_{L}$ are the IMU residual term and the visual residual term, respectively; $I$ is the set of all IMU measurements, and $P_{b_{k}b_{k+1}}$ is the covariance matrix of the IMU pre-integration noise term between the k-th and (k+1)-th image frames; $L$ is the set of feature points observed at least twice in the current sliding window, and $P_{l}^{c_{\theta}}$ is the covariance matrix of the visual observation noise of the l-th pixel coordinate in the θ-th image frame in the set L.
The IMU residual term is shown in equation (16).

$$\gamma_{I}\left(\hat{z}_{b_{k}b_{k+1}},X\right)=\left[\delta\alpha_{b_{k+1}}^{b_{k}},\;\delta\beta_{b_{k+1}}^{b_{k}},\;\delta\theta_{b_{k+1}}^{b_{k}},\;\Delta b_{a},\;\Delta b_{g}\right]^{T}\qquad(16)$$

where $\delta\alpha_{b_{k+1}}^{b_{k}}$ and $\delta\beta_{b_{k+1}}^{b_{k}}$ are the three-dimensional error-state representations of the position and the velocity between the k-th and (k+1)-th image frames, respectively; $\delta\theta_{b_{k+1}}^{b_{k}}$ is the three-dimensional error-state representation of the attitude quaternion from the k-th image frame to the (k+1)-th image frame; and $\Delta b_{a}$ and $\Delta b_{g}$ are the error states of the accelerometer and gyroscope biases, respectively.
Assuming that the l-th feature point is first observed in the υ-th image frame, the visual residual term in the θ-th image frame (θ > υ) is as shown in equations (17) to (19).

$$\gamma_{L}\left(\hat{z}_{l}^{c_{\theta}},X\right)=\begin{bmatrix}b_{1}&b_{2}\end{bmatrix}^{T}\cdot\left(\bar{\mathcal{P}}_{l}^{c_{\theta}}-\frac{\mathcal{P}_{l}^{c_{\theta}}}{\left\|\mathcal{P}_{l}^{c_{\theta}}\right\|}\right)\qquad(17)$$

$$\bar{\mathcal{P}}_{l}^{c_{\theta}}=\pi_{c}^{-1}\left(\begin{bmatrix}\hat{u}_{l}^{c_{\theta}}\\\hat{v}_{l}^{c_{\theta}}\end{bmatrix}\right)\qquad(18)$$

$$\mathcal{P}_{l}^{c_{\theta}}=R_{b}^{c}\left(R_{w}^{b_{\theta}}\left(R_{b_{\upsilon}}^{w}\left(R_{c}^{b}\,\frac{1}{\lambda_{l}}\,\pi_{c}^{-1}\left(\begin{bmatrix}\hat{u}_{l}^{c_{\upsilon}}\\\hat{v}_{l}^{c_{\upsilon}}\end{bmatrix}\right)+p_{c}^{b}\right)+p_{b_{\upsilon}}^{w}-p_{b_{\theta}}^{w}\right)-p_{c}^{b}\right)\qquad(19)$$

where $\left[\hat{u}_{l}^{c_{\upsilon}},\hat{v}_{l}^{c_{\upsilon}}\right]^{T}$ are the coordinates of the l-th feature pixel point when it first appears in the υ-th image frame; $\left[\hat{u}_{l}^{c_{\theta}},\hat{v}_{l}^{c_{\theta}}\right]^{T}$ are the pixel coordinates of the observation of the same feature point in the θ-th image frame; $\mathcal{P}_{l}^{c_{\theta}}$ are the coordinates before back-projection; $\pi_{c}^{-1}$ is the back-projection function, which projects the residual vector onto a unit sphere, divides by the inverse depth to obtain the coordinates of the pixel point in the camera coordinate system, and projects the coordinates onto a normalized tangent plane; $b_{1}$ and $b_{2}$ are two arbitrarily chosen orthogonal bases in the tangent plane; $R_{c}^{b}$ is the rotation matrix from the camera coordinate system to the carrier coordinate system, known from the camera extrinsic parameters; $R_{b_{\upsilon}}^{w}$ and $p_{b_{\upsilon}}^{w}$ are the rotation matrix and origin translation vector from the carrier coordinate system to the world coordinate system at the υ-th image frame; and $R_{w}^{b_{\theta}}$ and $p_{b_{\theta}}^{w}$ are the rotation matrix from the world coordinate system to the carrier coordinate system and the origin translation vector of the carrier coordinate system at the θ-th image frame, obtained from the camera attitude at that moment.
The nonlinear optimization problem is solved with the Ceres Solver nonlinear optimization library so that the error function (15) to be optimized reaches its minimum value, giving the optimal carrier pose estimate p_vio and q_vio, which is input to step 4 to be loosely coupled with the GNSS.
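The patent solves equation (15) with the Ceres Solver (a C++ library). Purely as a conceptual analogue, the following Python toy uses scipy's robust least squares on a miniature window with simplified 2-D "poses", stacked IMU-like and vision-like residuals, and a Huber loss; the state layout, measurements and scale values are all invented for the illustration.

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(x, imu_rel, vis_pos):
    poses = x.reshape(-1, 2)                       # window of 2-D "poses"
    r_imu = (poses[1:] - poses[:-1]) - imu_rel     # IMU residual term (relative motion)
    r_vis = poses - vis_pos                        # visual residual term (absolute position)
    return np.concatenate([r_imu.ravel(), r_vis.ravel()])

imu_rel = np.array([[1.0, 0.0]] * 4)               # hypothetical pre-integrated relative motions
vis_pos = np.cumsum(np.vstack([[0, 0], imu_rel]), axis=0) + 0.05 * np.random.randn(5, 2)
x0 = np.zeros(10)
sol = least_squares(residuals, x0, args=(imu_rel, vis_pos), loss="huber", f_scale=0.5)
window_poses = sol.x.reshape(-1, 2)                # local (VIO) poses passed on to step 4
```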
Step 3, the motion state of the vehicle is obtained based on the IMU measurement data, and the adaptive algorithm is switched according to the motion state of the vehicle.
The acceleration and angular velocity measured by the IMU are input into step 3 to perform the switching of the adaptive algorithm.
Step 3.1, based on the measured acceleration, an acceleration standard deviation factor $\sigma_{a}(t)$ over a sliding window of size N is designed, and whether the vehicle is in a stationary state is judged from this factor; if the vehicle is stationary, static zero-velocity correction is performed in the loosely coupled GNSS/IMU/vision fusion positioning algorithm.

The acceleration standard deviation over the sliding window of size N is shown in equations (20) and (21).

$$\sigma_{a}(t)=\sqrt{\frac{1}{N-1}\sum_{i=t-N+1}^{t}\left(a_{i}-\mu\right)^{2}}\qquad(20)$$

$$\mu=\frac{1}{N}\sum_{i=t-N+1}^{t}a_{i},\qquad a_{i}=\sqrt{a_{x,i}^{2}+a_{y,i}^{2}+a_{z,i}^{2}}\qquad(21)$$

where $a_{x,i}$, $a_{y,i}$, $a_{z,i}$ are the outputs of the three accelerometer axes x, y and z; N is the length of the sliding window, here set to 100 data points, i.e. N = 100; i denotes the time index of the acceleration data within the sliding window, $i\in[t-N+1,t]$; $\mu$ is the average of all accelerations within the sliding window, used as the mathematical expectation, an unbiased estimate being adopted here; and $\sigma_{a}(t)$ is the acceleration standard deviation calculated over the sliding window at time t.
During the movement of the vehicle, a vehicle stationarity threshold $TH_{a}(t)$ can be set to determine whether the vehicle is stationary:

$$\sigma_{a}(t)\le TH_{a}(t)\qquad(22)$$
The vehicle stationarity threshold $TH_{a}(t)$ is not set as a fixed value; instead, a window of size M is set, and the threshold adapts to the acceleration standard deviations at the different times within the window as the window slides, so that the zero-velocity state of the vehicle can be detected more flexibly.

$$TH_{a}(t)=\sum_{j=t-M+1}^{t}\eta_{j}\,\sigma_{a}(j)\qquad(23)$$

$$\sum_{j=t-M+1}^{t}\eta_{j}=1\qquad(24)$$

where j denotes the time index of the acceleration standard deviation within the sliding window M, $j\in[t-M+1,t]$, and $\eta_{j}$ are the weights of the respective standard deviations: the farther a standard deviation is from the current time, the smaller its weight, and the closer it is, the larger its weight.
It is important to select a suitable size M, so a window-size adaptation method is designed to balance the sensitivity and the accuracy of the sliding window. The upper limit of the sliding window size is defined as $M_{max}$ and the lower limit as $M_{min}$. First, $M_{min}$ is taken as the window length and the mean $\sigma$ of the current group of $\sigma_{a}(j)$ values is computed, as in equation (25). When $\sigma$ is less than or equal to the acceleration stationarity threshold $\varepsilon$, the acceleration values over this period are relatively stable and a larger M value is selected; when $\sigma$ is greater than the threshold, the acceleration values are changing in stages and a smaller window length is selected to improve sensitivity, as in equation (26).

$$\sigma=\frac{1}{M_{min}}\sum_{j=t-M_{min}+1}^{t}\sigma_{a}(j)\qquad(25)$$

$$M=M_{min}+M',\qquad M'=\begin{cases}M_{max}-M_{min}, & \sigma\le\varepsilon\\ 0, & \sigma>\varepsilon\end{cases}\qquad(26)$$

where $M'\in[0,M_{max}-M_{min}]$ and $\sigma$ is the mean of the group of $\sigma_{a}(j)$ values when $M_{min}$ is taken as the window length. The accelerations output by the IMUs of multiple groups of vehicles in different motion states are collected to compute multiple groups of $\sigma$, and the critical value at which the acceleration values cease to be stable is selected as the acceleration stationarity threshold $\varepsilon$. Test experiments show little difference in the results for window lengths between 20 and 50, so $M_{max}$ is taken as 50 and $M_{min}$ as 20; the acceleration stationarity threshold $\varepsilon$ is set to 0.05.
When the acceleration of the vehicle satisfies equation (22), the vehicle is judged to be in a zero-velocity stationary state, and zero-velocity correction is applied to it: in the Kalman filtering process of step 4.3, the velocity solved by inertial navigation is used as the observation of the system velocity error for Kalman filter estimation, and the updated state estimate is fed back to correct the navigation parameter results in the INS subsystem, completing the static zero-velocity correction.
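A minimal sketch of this zero-velocity correction as a Kalman measurement update, assuming an error-state vector in which the velocity error occupies indices 3 to 5 and an illustrative measurement noise; the actual state arrangement and tuning of step 4.3 are not fixed here.

```python
import numpy as np

def zero_velocity_update(x_err, P, v_ins, R_zupt=1e-3 * np.eye(3)):
    """x_err : error state (length >= 6), velocity error assumed at indices 3:6.
    v_ins : velocity currently solved by inertial navigation; when the vehicle is
            detected as stationary, v_ins itself is the observation of the velocity error."""
    n = len(x_err)
    H = np.zeros((3, n)); H[:, 3:6] = np.eye(3)
    z = v_ins                                    # observed velocity error (true velocity is zero)
    S = H @ P @ H.T + R_zupt
    K = P @ H.T @ np.linalg.inv(S)
    x_err = x_err + K @ (z - H @ x_err)
    P = (np.eye(n) - K @ H) @ P
    return x_err, P                              # fed back to correct the INS navigation solution
```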
Step 3.2, if the vehicle is detected to be in a non-stationary state, the linear motion detection part is entered: based on the measured angular velocity, a turn detection factor based on the mean angular velocity is designed, and the mean angular velocity |ω̄| within the sliding window N is used as the basis for judging whether the vehicle is turning.
$$|\bar{\omega}|=\frac{1}{N}\sum_{i=t-N+1}^{t}\left|\omega_{z,i}\right|>TH_{\omega}\qquad(27)$$

where $|\bar{\omega}|$ is the mean angular velocity and N is the amount of data in the sliding window; $\omega_{z}$ is the output of the gyroscope in the z-axis direction; and $TH_{\omega}$ is the set threshold on the mean angular rate, here set to 2°/s. When the z-axis angular rate exceeds $TH_{\omega}$, i.e. when equation (27) is satisfied, the vehicle is judged to be in a turning state; otherwise it is judged to be travelling straight.
When a vehicle turns, vision produces larger errors because the images are blurred and feature point matching is error-prone, so a robust kernel function is added in the tightly coupled vision/IMU process to improve the tolerance of visual navigation: a Huber kernel is applied to the visual residual term in equation (15), and when the residual term is large the original quadratic function becomes a linear function, reducing the influence of erroneous data.
[Equations (28)–(29): the Huber kernel ρ(·) applied to the visual residual term of equation (15); the cost is quadratic for residuals not exceeding the threshold δ and grows linearly beyond it.]
where δ is the threshold for judging whether the visual residual term is too large. Data collected while the vehicle drives straight with good image quality and data collected during turns with image blur and mismatches are compared, and the resulting visual residual values are used to determine the threshold δ that separates good visual data from poor. For example, if the residual values obtained from good-quality data are all below 0.5 and the residual values under image blur are all above 0.5, δ is set to 0.5.
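For illustration only, the sketch below applies one common parameterization of the Huber kernel with the threshold δ = 0.5 from the text; the constant factors and the exact form used in equations (28)–(29) are assumptions.

DELTA = 0.5   # visual-residual threshold delta from the text

def huber(residual, delta=DELTA):
    """Quadratic cost for small residuals, linear growth for large ones (assumed form)."""
    r = abs(residual)
    if r <= delta:
        return 0.5 * r * r                 # ordinary least-squares region
    return delta * (r - 0.5 * delta)       # linear region limits the influence of outliers

# A blurred-image mismatch with residual 3.0 is down-weighted relative to plain least squares:
print(huber(0.2), 0.5 * 0.2 ** 2)   # identical below the threshold
print(huber(3.0), 0.5 * 3.0 ** 2)   # 1.375 vs 4.5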
Step 3.3, if the algorithm judges that the vehicle is in a straight-ahead state, the vehicle does not slip laterally or move perpendicular to the road surface while driving, so a non-holonomic constraint is added in the Kalman filtering process of step 4.3 to correct the accumulated error of inertial navigation, i.e. the lateral and vertical velocities are set to zero.
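The non-holonomic constraint of step 3.3 can be illustrated as a pseudo-measurement in which the lateral and vertical body-frame velocities are observed as zero; the frame conventions, matrix shapes and function name below are assumptions.

import numpy as np

def nhc_measurement(v_nav, R_nav_to_body):
    """Build the zero-lateral / zero-vertical velocity pseudo-measurement.

    v_nav         : (3,) velocity from inertial navigation in the navigation frame
    R_nav_to_body : (3,3) rotation from the navigation frame to the carrier (body) frame
    Returns the residual the filter should drive to zero, and the selection matrix H.
    """
    v_body = R_nav_to_body @ v_nav
    # When driving straight, the y (lateral) and z (vertical) body-frame velocities are taken as zero.
    H = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
    z = np.zeros(2)                    # pseudo-observation: no sideslip, no vertical motion
    residual = z - H @ v_body
    return residual, H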
Step 4, obtaining the final positioning result based on the local camera pose, the switching result of the adaptive algorithm, Kalman filtering, and the GNSS, IMU and vision loose-combination fusion positioning algorithm.
Step 4.1, GNSS position information solution
Satellites with elevation angles below the elevation threshold are removed. The elevation threshold is empirically set to α_thre = 15°. The elevation angles α_sat (sat = 1, 2, …, num_vis), where num_vis is the number of visible satellites in the current epoch, are traversed; if α_sat ≥ α_thre the satellite is retained, otherwise the observation data of satellite sat are deleted. The screened observations are then solved by a pseudorange single-point positioning algorithm to obtain the GNSS position information.
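A sketch of the elevation-mask screening in step 4.1; the per-satellite record layout (a dictionary with an elev_deg field) is an assumption, and the pseudorange single-point solution that follows is not shown.

ALPHA_THRE = 15.0   # satellite elevation threshold in degrees, as in the text

def screen_satellites(observations):
    """Keep only satellites whose elevation angle is at or above the threshold.

    observations: iterable of dicts such as {"prn": 12, "elev_deg": 8.4, "pseudorange": ...}
    """
    return [obs for obs in observations if obs["elev_deg"] >= ALPHA_THRE]

# Example: two of three visible satellites survive the 15-degree mask.
sats = [{"prn": 1, "elev_deg": 42.0}, {"prn": 7, "elev_deg": 9.5}, {"prn": 30, "elev_deg": 16.2}]
print([s["prn"] for s in screen_satellites(sats)])   # [1, 30]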
Step 4.2, VIO/GNSS initialization: the local camera pose obtained in step 2 and the GNSS position information solved in step 4.1 are converted into the corresponding information in the Earth-centred Earth-fixed (ECEF) coordinate system.
The pose estimate of the VIO is expressed in the local coordinate system, so it must be converted to the Earth-centred Earth-fixed (ECEF) coordinate system. The GNSS positions from the first minute at the start point in step 4.1 are recorded, giving the position p_0^e, in the ECEF frame, of the common origin of the east-north-up (ENU) coordinate system and the carrier coordinate system (the two origins coincide).
The heading-angle offset ψ between the carrier coordinate system and the ENU coordinate system is corrected using low-noise Doppler measurements. The rotation matrix from the ENU frame to the ECEF frame is given in equation (30).
\[
R_n^e=\begin{bmatrix}
-\sin\beta & -\sin\alpha\cos\beta & \cos\alpha\cos\beta\\
\cos\beta & -\sin\alpha\sin\beta & \cos\alpha\sin\beta\\
0 & \cos\alpha & \sin\alpha
\end{bmatrix}\tag{30}
\]
where n denotes the ENU coordinate system, e denotes the ECEF coordinate system, and α and β are respectively the latitude and longitude of the origin of the ENU coordinate system. The rotation matrix R_b^n from the carrier coordinate system to the ENU coordinate system is given in equation (31):

\[
R_b^n=\begin{bmatrix}
\cos\psi & -\sin\psi & 0\\
\sin\psi & \cos\psi & 0\\
0 & 0 & 1
\end{bmatrix}\tag{31}
\]
Therefore, the transformation from the carrier coordinate system to ECEF is given in equation (32):

\[
p_{GNSS}^{e}=p_{0}^{e}+R_n^e\,R_b^n\,\Delta p^{b}\tag{32}
\]
where p_GNSS^e denotes the coordinates of the receiver (GNSS) in the ECEF frame, and Δp^b is the difference between the receiver position and the origin of the carrier coordinate system, expressed in the carrier frame. In this way the GNSS position information p_GNSS^e and the navigation parameters p_vio and q_vio output by the VIO are unified in the ECEF coordinate system.
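The frame alignment of step 4.2 can be sketched with the standard ENU-to-ECEF rotation built from the origin latitude α and longitude β and a heading rotation by ψ, as in equations (30)–(32); the function names and the lever-arm argument are illustrative.

import numpy as np

def R_enu_to_ecef(lat_rad, lon_rad):
    """Rotation matrix from the local east-north-up frame to ECEF (equation (30))."""
    sa, ca = np.sin(lat_rad), np.cos(lat_rad)
    sb, cb = np.sin(lon_rad), np.cos(lon_rad)
    return np.array([[-sb, -sa * cb, ca * cb],
                     [ cb, -sa * sb, ca * sb],
                     [0.0,       ca,      sa]])

def R_body_to_enu(psi_rad):
    """Heading rotation by the offset psi between the carrier and ENU frames (equation (31))."""
    c, s = np.cos(psi_rad), np.sin(psi_rad)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def body_to_ecef(p_origin_ecef, lat_rad, lon_rad, psi_rad, dp_body):
    """Express a point given in the carrier frame in ECEF coordinates (equation (32))."""
    return p_origin_ecef + R_enu_to_ecef(lat_rad, lon_rad) @ R_body_to_enu(psi_rad) @ dp_body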
Step 4.3, GNSS/IMU/visual loose integrated navigation based on Kalman filtering
The local camera pose and the GNSS position information expressed in the ECEF coordinate system obtained in step 4.2, together with the switching result of the adaptive algorithm from step 3, are input to a Kalman filter, which outputs the corresponding error-correction information. This error-correction information is fed into the sliding-window tightly coupled nonlinear optimization of the IMU/vision tight-coupling process in step 2.2 to apply feedback correction to the navigation parameters, and the corrected pose information is output as the final positioning result.
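As a rough, non-authoritative illustration of the loosely coupled fusion in step 4.3, the sketch below performs a single Kalman measurement update in which the GNSS position in ECEF observes the position-error block of an error state; the state layout, noise handling and feedback interface are assumptions.

import numpy as np

def loose_kf_update(x, P, p_vio_ecef, p_gnss_ecef, R_gnss):
    """One measurement update of a loosely coupled error-state filter.

    x : (n,) error state whose first three entries are the position error (assumed layout)
    P : (n,n) error covariance
    p_vio_ecef, p_gnss_ecef : (3,) positions from the VIO and the GNSS solution in ECEF
    R_gnss : (3,3) GNSS position measurement noise covariance
    """
    n = x.size
    H = np.zeros((3, n))
    H[:, :3] = np.eye(3)                     # GNSS observes the position-error block
    z = p_gnss_ecef - p_vio_ecef             # innovation: GNSS minus VIO position
    S = H @ P @ H.T + R_gnss
    K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(n) - K @ H) @ P
    return x, P                              # x[:3] would be fed back to correct the VIO pose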
Fig. 7, Fig. 8, Table 1 and Table 2 compare, for the multi-source fusion navigation and positioning method combining vehicle motion state and environment perception provided by the embodiments of the present invention and for the prior art, the three-dimensional position errors and attitude errors in the ENU coordinate system and the corresponding root-mean-square errors.
Table 1: Three-dimensional position root-mean-square errors of the present application and the prior art
Table 2: Attitude root-mean-square errors of the present application and the prior art
The invention provides a multi-source fusion navigation and positioning method based on motion state and environment perception. There are many ways to implement this technical scheme, and the above description is only one specific embodiment of the invention. It should be noted that a person skilled in the art can make various improvements and refinements without departing from the principle of the invention, and such improvements and refinements also fall within the scope of protection of the invention. Any component not specified in this embodiment can be realized with the prior art.

Claims (7)

1. A multi-source fusion navigation positioning method based on motion state and environment perception is characterized by comprising the following steps:
step 1, collecting image frames by using a camera, eliminating dynamic points in the image frames, and obtaining static characteristic points and positions of the static characteristic points of each image frame;
step 2, obtaining the local camera pose based on the feature point positions of each frame of image and a vision and IMU tight combination fusion positioning algorithm;
step 3, obtaining the motion state of the vehicle based on the IMU measurement data; switching the self-adaptive algorithm according to the motion state of the vehicle;
step 4, obtaining a final positioning result based on the local camera pose, the switching result of the self-adaptive algorithm, Kalman filtering and a GNSS, IMU and vision loose combination fusion positioning algorithm;
the step 1 comprises the following steps:
step 1.1, semantically segmenting pedestrians and vehicles of each frame of image in an image frame, for an image frame set OPic collected by a camera, respectively extracting and tracking feature points of each frame of image by using a KLT sparse optical flow method to obtain feature point positions, and semantically segmenting the vehicles and the pedestrians on the basis of a convolutional neural network to obtain feature points of each frame of image belonging to semantic categories of the vehicles or the pedestrians;
step 1.2, performing pre-integration on IMU measurement data to obtain an IMU pre-integration result, namely a relative pose relationship of two adjacent frames of images, including relative position, speed and attitude change relationships of the two adjacent frames of images, wherein the measurement data comprises measurement values output by an accelerometer and a gyroscope;
step 1.3, removing dynamic points: based on the IMU pre-integration result, projecting the feature points of each frame of image belonging to the vehicle or pedestrian semantic categories into the next frame of image as the predicted positions of those feature points in the next frame; pixel points whose tracked actual positions in the next frame are inconsistent with their predicted positions are dynamic points and are removed, while pixel points whose actual positions are consistent with their predicted positions are static points, thereby obtaining the static feature points of the next frame of image and their positions;
in the step 1.1, the convolutional neural network adopts an improved U-net convolutional neural network whose encoder and decoder include Ghost modules, and performing semantic segmentation of vehicles and pedestrians based on the convolutional neural network comprises:
step 1.1.1, training an improved U-net convolutional neural network model by using a data set; the data set acquires road images under different scenes in advance in a mode of vehicle-mounted camera acquisition, establishes a vehicle and pedestrian data set, or directly downloads the existing public road data set; training the improved U-net convolutional neural network model to obtain a trained improved U-net convolutional neural network;
step 1.1.2, inputting an image frame set OPic collected by a camera into a trained encoder of an improved U-net convolutional neural network to obtain a high-level feature vector of the image frame set OPic collected by the camera;
step 1.1.3, inputting high-level feature vectors of an image frame set OPic collected by the camera into a decoder of a trained improved U-net convolution neural network to obtain feature images for semantic category segmentation;
step 1.1.4, inputting a characteristic image for semantic category segmentation into a softmax layer to obtain a characteristic point of each frame of image belonging to a vehicle or pedestrian semantic category in an image frame set OPic acquired by a camera;
in step 3, the switching of the self-adaptive algorithm refers to constraining and correcting the GNSS, IMU and vision loose combination fusion positioning algorithm and the vision and IMU tight combination fusion positioning algorithm respectively by using zero-velocity correction, a robust kernel function and a non-holonomic constraint according to the motion state of the vehicle;
the step 3 comprises the following steps:
step 3.1, designing a sliding window with the size of N to solve an acceleration standard deviation factor based on a measured value output by an accelerometer, judging whether the vehicle state is in a static state or not according to the acceleration standard deviation factor, and if the vehicle state is in the static state, performing static zero-speed correction in a GNSS, IMU and visual loose combined fusion positioning algorithm;
step 3.2, if the vehicle is detected to be in a non-static state, the vehicle is in a turning state or a straight-going state; if the vehicle is in a turning state, adding a robust kernel function into a visual and IMU (inertial measurement unit) close-combination fusion positioning algorithm;
and step 3.3, if the vehicle is judged to be in a straight-going state, adding a non-holonomic constraint in the GNSS, IMU and vision loose combination fusion positioning algorithm to correct the accumulated error of inertial navigation.
2. The multi-source fusion navigation positioning method based on motion state and environment perception according to claim 1, wherein the step 2 comprises:
step 2.1, performing VIO initialization based on the feature point position of each frame of image and the IMU pre-integration result to obtain initial values of external parameters, the depth of the feature point, the tightly coupled position, the speed and the posture between the camera and the IMU;
and 2.2, performing the vision and IMU tight combination nonlinear optimization of the sliding window according to initial values of external parameters, depths of characteristic points, tight coupling positions, speeds and postures between the camera and the IMU based on a nonlinear optimization mode, and solving the optimal camera pose, namely the local camera pose, in the sliding window.
3. The multi-source fusion navigation positioning method based on motion state and environmental perception according to claim 2, characterized in that in step 2.1, based on the feature point position of each frame of image, the pose transformation of the image frame is restored through a motion restoration structure technology, and then the pose transformation of the image frame is aligned with the IMU pre-integration result, so as to restore the external reference between the camera and the IMU and obtain the transformation relation between the IMU coordinate system and the world coordinate system;
the step 2.2 comprises: acquiring a state vector X of a sliding window with a combination of vision and IMU, solving a nonlinear optimization problem with the combination of vision and IMU by using a minimum reprojection error algorithm, and solving the nonlinear optimization problem by using a nonlinear optimization library Ceres solution library so as to solve the optimal camera pose in the sliding window;
the error function to be optimized in the vision and IMU tightly coupled nonlinear optimization problem is:

\[
\min_{X}\left\{\gamma_{marg}+\sum_{k\in I}\left\|\gamma_{I}^{k}\right\|_{P_{k}}^{2}+\sum_{(i,\theta)\in L}\left\|\gamma_{L}^{i,\theta}\right\|_{P_{i,\theta}}^{2}\right\}
\]

in the formula, γ_marg is the prior information from marginalization, i.e. the oldest frame in the sliding window and its corresponding state quantities are marginalized into prior information; γ_I and γ_L are respectively the IMU residual term and the visual residual term; I is the set of all IMU measurements, and P_k is the covariance matrix of the IMU pre-integration noise term between the k-th frame image and the (k+1)-th frame image; L is the set of feature points observed at least twice in the current sliding window, and P_{i,θ} is the covariance matrix of the visual observation noise of the i-th pixel coordinate in the θ-th frame image in the set L.
4. The multi-source fusion navigation positioning method based on motion state and environment perception according to claim 3, wherein the step 4 includes:
step 4.1, resolving GNSS position information;
step 4.2, initializing the VIO and the GNSS, and converting the local camera pose obtained in the step 2 and the GNSS position information solved in the step 4.1 into corresponding information under the Earth-centred Earth-fixed (ECEF) coordinate system;
and step 4.3, obtaining a global estimated pose, namely the final positioning result, based on Kalman filtering and the GNSS, IMU and vision loose combination fusion positioning algorithm.
5. The multi-source fusion navigation positioning method based on motion state and environment perception according to claim 4, wherein the step 3.1 comprises:
the acceleration standard deviation σ_a over a sliding window of size N is expressed as the following equations:

\[
a_{i}=\sqrt{a_{x,i}^{2}+a_{y,i}^{2}+a_{z,i}^{2}}
\]

\[
\sigma_{a}=\sqrt{\frac{1}{N-1}\sum_{i=t-N+1}^{t}\left(a_{i}-\mu\right)^{2}}
\]

wherein a_{x,t}, a_{y,t} and a_{z,t} are respectively the three-axis outputs of the accelerometer along x, y and z at time t; i denotes the time index of the acceleration data within the sliding window, i ∈ [t−N+1, t]; μ is the mean of all accelerations within the sliding window, taken as the mathematical expectation, where an unbiased estimate is used;
a vehicle stationarity threshold TH_a is set for the vehicle during its motion, and the vehicle is judged to be stationary when its acceleration standard deviation satisfies σ_a ≤ TH_a; when this condition holds, the vehicle is determined to be in a zero-velocity stationary state and a zero-velocity update is applied to the vehicle, namely, in the Kalman filtering process of step 4.3, the velocity solved by inertial navigation is used as the observation of the system velocity error for Kalman filter estimation, and the updated state estimate is fed back to correct the navigation parameters inside the INS subsystem, completing the static zero-velocity correction;

wherein the vehicle stationarity threshold TH_a is set as a dynamic threshold: a sliding window of size M is set, and the threshold is formed from the acceleration standard deviations σ_a,j at different times within the sliding window, changing adaptively as the sliding window M slides so that the zero-velocity state of the vehicle can be detected more flexibly:

\[
TH_{a}=\sum_{j=t-M+1}^{t}\eta_{j}\,\sigma_{a,j}
\]

wherein j denotes the time index of the acceleration standard deviation σ_a,j within the sliding window M, j ∈ [t−M+1, t]; η_j is the weight of each acceleration standard deviation: the farther an acceleration standard deviation lies from the current time t, the smaller its weight, and the closer it lies, the larger its weight;
the upper limit of the size of the sliding window M is defined as M_max and the lower limit as M_min; first, M_min is taken as the length of the sliding window, and the current set of acceleration standard deviations {σ_a,j}, j ∈ [t−M_min+1, t], is formed; when the mean value σ of this set is less than or equal to the acceleration stationarity threshold ε, the acceleration over this period is relatively stable and the sliding-window length M is increased; when σ is greater than the acceleration stationarity threshold ε, the acceleration is changing, and the sliding-window length M is reduced, by an amount M′, to improve sensitivity; wherein M′ ∈ [0, M_max − M_min] and σ is the mean of the set {σ_a,j} computed with M_min as the window length; accelerations output by the IMUs of several groups of vehicles in different motion states are collected to compute multiple values of σ, and the critical value separating stable from unstable acceleration is chosen as the acceleration stationarity threshold ε.
6. The multi-source fusion navigation positioning method based on motion state and environment perception according to claim 5, wherein the step 3.2 includes:
designing a turning detection factor based on the angular-rate mean from the measured angular rate, and using the mean angular rate |ω̄| over the sliding window of N samples as the criterion for vehicle turning: when

\[
|\bar{\omega}|=\frac{1}{N}\sum_{i=t-N+1}^{t}\left|\omega_{z,i}\right|\ge TH_{\omega}
\]

is satisfied, the vehicle is judged to be in a turning state; otherwise the vehicle is judged to be in a straight-going state;

in the formula, ω_z is the output of the gyroscope in the z-axis direction, and TH_ω is the preset angular-rate mean threshold;
if the vehicle is judged to be in a turning state, a robust kernel function is added to the vision and IMU tight combination fusion positioning algorithm: a Huber kernel ρ(·) is applied to the visual-measurement residual term in the error function to be optimized of the vision and IMU tightly coupled nonlinear optimization problem, so that the cost is quadratic when the visual residual does not exceed the threshold δ and grows linearly when it does;

wherein δ is the threshold for judging whether the visual residual term is too large; data collected while the vehicle drives straight with good image quality and data with image blur and mismatches collected during turns are respectively acquired, and the resulting visual residual terms are compared, thereby determining the visual-residual threshold δ that distinguishes good visual data from poor;
in the step 3.3, if the vehicle is judged to be in a straight-ahead state, the vehicle does not slip laterally or move perpendicular to the road surface while driving, and a non-holonomic constraint is added in the Kalman filtering process of the step 4.3 to correct the accumulated error of inertial navigation, namely, the lateral and vertical velocities are set to zero.
7. The multi-source fusion navigation positioning method based on motion state and environment perception according to claim 6, wherein the step 4.1 includes: removing satellites with satellite elevation angles lower than a satellite elevation angle threshold value to obtain satellite observation data after screening; resolving to obtain GNSS position information through a pseudo-range single-point positioning algorithm according to the screened satellite observation data;
step 4.3 comprises: inputting the corresponding information of the local camera pose and the GNSS position information obtained in the step 4.2 in the Earth-centred Earth-fixed coordinate system, together with the switching result of the adaptive algorithm in the step 3, into a Kalman filter to obtain error-correction information; inputting the error-correction information into the sliding-window-based tight-combination nonlinear optimization of the step 2.2 to apply feedback correction to the navigation parameters; and finally outputting the corrected pose information as the global estimated pose, namely the final positioning result.
CN202210156085.7A 2022-02-21 2022-02-21 Multi-source fusion navigation positioning method based on motion state and environment perception Active CN114199259B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210156085.7A CN114199259B (en) 2022-02-21 2022-02-21 Multi-source fusion navigation positioning method based on motion state and environment perception

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210156085.7A CN114199259B (en) 2022-02-21 2022-02-21 Multi-source fusion navigation positioning method based on motion state and environment perception

Publications (2)

Publication Number Publication Date
CN114199259A CN114199259A (en) 2022-03-18
CN114199259B true CN114199259B (en) 2022-06-17

Family

ID=80645739

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210156085.7A Active CN114199259B (en) 2022-02-21 2022-02-21 Multi-source fusion navigation positioning method based on motion state and environment perception

Country Status (1)

Country Link
CN (1) CN114199259B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114777797B (en) * 2022-06-13 2022-09-30 长沙金维信息技术有限公司 High-precision map visual positioning method for automatic driving and automatic driving method
CN115128655B (en) * 2022-08-31 2022-12-02 智道网联科技(北京)有限公司 Positioning method and device for automatic driving vehicle, electronic equipment and storage medium
CN115847446B (en) * 2023-01-16 2023-05-05 泉州通维科技有限责任公司 Inspection robot in bridge compartment beam
CN116147618B (en) * 2023-01-17 2023-10-13 中国科学院国家空间科学中心 Real-time state sensing method and system suitable for dynamic environment
CN116026316B (en) * 2023-03-30 2023-08-29 山东科技大学 Unmanned ship dead reckoning method coupling visual inertial odometer and GNSS
CN116442248A (en) * 2023-06-19 2023-07-18 山东工程职业技术大学 Robot vision positioning module and method based on target detection
CN117310773B (en) * 2023-11-30 2024-02-02 山东省科学院海洋仪器仪表研究所 Autonomous positioning method and system for underwater robot based on binocular stereoscopic vision
CN117473455B (en) * 2023-12-27 2024-03-29 合众新能源汽车股份有限公司 Fusion method and device of multi-source positioning data and electronic equipment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109405824A (en) * 2018-09-05 2019-03-01 武汉契友科技股份有限公司 A kind of multi-source perceptual positioning system suitable for intelligent network connection automobile
CN109900265A (en) * 2019-03-15 2019-06-18 武汉大学 A kind of robot localization algorithm of camera/mems auxiliary Beidou
CN109993113B (en) * 2019-03-29 2023-05-02 东北大学 Pose estimation method based on RGB-D and IMU information fusion
CN110412635B (en) * 2019-07-22 2023-11-24 武汉大学 GNSS/SINS/visual tight combination method under environment beacon support

Also Published As

Publication number Publication date
CN114199259A (en) 2022-03-18

Similar Documents

Publication Publication Date Title
CN114199259B (en) Multi-source fusion navigation positioning method based on motion state and environment perception
Wen et al. Tightly coupled GNSS/INS integration via factor graph and aided by fish-eye camera
CN110243358B (en) Multi-source fusion unmanned vehicle indoor and outdoor positioning method and system
CN110412635B (en) GNSS/SINS/visual tight combination method under environment beacon support
CN110411462B (en) GNSS/inertial navigation/lane line constraint/milemeter multi-source fusion method
CN111024066A (en) Unmanned aerial vehicle vision-inertia fusion indoor positioning method
CN111880207B (en) Visual inertial satellite tight coupling positioning method based on wavelet neural network
CN107796391A (en) A kind of strapdown inertial navigation system/visual odometry Combinated navigation method
CN111426320B (en) Vehicle autonomous navigation method based on image matching/inertial navigation/milemeter
CN112556719B (en) Visual inertial odometer implementation method based on CNN-EKF
CN114693754B (en) Unmanned aerial vehicle autonomous positioning method and system based on monocular vision inertial navigation fusion
CN114526745A (en) Drawing establishing method and system for tightly-coupled laser radar and inertial odometer
CN114019552A (en) Bayesian multi-sensor error constraint-based location reliability optimization method
CN115479602A (en) Visual inertial odometer method fusing event and distance
CN115407357A (en) Low-beam laser radar-IMU-RTK positioning mapping algorithm based on large scene
CN112781582A (en) Multi-sensor fusion high-precision pose estimation algorithm under satellite weak observation condition
CN115435779A (en) Intelligent body pose estimation method based on GNSS/IMU/optical flow information fusion
CN114690229A (en) GPS-fused mobile robot visual inertial navigation method
CN109387198A (en) A kind of inertia based on sequential detection/visual odometry Combinated navigation method
CN112945233B (en) Global drift-free autonomous robot simultaneous positioning and map construction method
CN114234967A (en) Hexapod robot positioning method based on multi-sensor fusion
Gao et al. Vido: A robust and consistent monocular visual-inertial-depth odometry
CN112923934A (en) Laser SLAM technology suitable for combining inertial navigation in unstructured scene
CN116182855A (en) Combined navigation method of compound eye-simulated polarized vision unmanned aerial vehicle under weak light and strong environment
CN112837374B (en) Space positioning method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant