CN114199259A - Multi-source fusion navigation positioning method based on motion state and environment perception - Google Patents
- Publication number
- CN114199259A (publication number); application CN202210156085.7A
- Authority
- CN
- China
- Prior art keywords
- imu
- vehicle
- image
- visual
- state
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/26—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
- G01C21/28—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network with correlation of data from several navigational instruments
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S19/00—Satellite radio beacon positioning systems; Determining position, velocity or attitude using signals transmitted by such systems
- G01S19/38—Determining a navigation solution using signals transmitted by a satellite radio beacon positioning system
- G01S19/39—Determining a navigation solution using signals transmitted by a satellite radio beacon positioning system the satellite radio beacon positioning system transmitting time-stamped messages, e.g. GPS [Global Positioning System], GLONASS [Global Orbiting Navigation Satellite System] or GALILEO
- G01S19/42—Determining position
- G01S19/45—Determining position by combining measurements of signals from the satellite radio beacon positioning system with a supplementary measurement
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S19/00—Satellite radio beacon positioning systems; Determining position, velocity or attitude using signals transmitted by such systems
- G01S19/38—Determining a navigation solution using signals transmitted by a satellite radio beacon positioning system
- G01S19/39—Determining a navigation solution using signals transmitted by a satellite radio beacon positioning system the satellite radio beacon positioning system transmitting time-stamped messages, e.g. GPS [Global Positioning System], GLONASS [Global Orbiting Navigation Satellite System] or GALILEO
- G01S19/42—Determining position
- G01S19/45—Determining position by combining measurements of signals from the satellite radio beacon positioning system with a supplementary measurement
- G01S19/47—Determining position by combining measurements of signals from the satellite radio beacon positioning system with a supplementary measurement the supplementary measurement being an inertial measurement, e.g. tightly coupled inertial
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
Abstract
The invention provides a multi-source fusion navigation positioning method based on motion state and environment perception, comprising: step 1, collecting image frames with a camera, eliminating dynamic points in the image frames, and obtaining the static feature points of each image frame and their positions; step 2, obtaining the local camera pose based on the feature point positions of each image frame and a visual/IMU tight-combination fusion positioning algorithm; step 3, obtaining the motion state of the vehicle from the IMU measurement data, and switching the adaptive algorithm according to that motion state; step 4, obtaining the final positioning result based on the local camera pose, the switching result of the adaptive algorithm, Kalman filtering, and a GNSS/IMU/visual loose-combination fusion positioning algorithm. The method effectively exploits vision in integrated navigation and improves the accuracy and robustness of multi-source fusion positioning.
Description
Technical Field
The invention belongs to the technical field of satellite positioning navigation, and particularly relates to a multi-source fusion navigation positioning method based on motion state and environment perception.
Background
Intelligent transportation is developing rapidly in urban environments, and automatic driving technology has brought great convenience to daily life. Navigation and positioning technology is the key to high-accuracy, continuous and reliable vehicle positioning, and an important precondition for path planning and vehicle control. GNSS is an important sensor for navigation and positioning and can provide high-accuracy spatio-temporal information in open environments, but in complex urban environments such as urban canyons, viaducts and tunnels, GNSS signal blockage and occlusion reduce positioning accuracy or cause loss of positioning altogether. A single sensor can hardly maintain good positioning accuracy in complex and changing environments, so the respective advantages of multiple sensors must be used to make up for each other's shortcomings. At present, multi-sensor integrated navigation is the main research direction for addressing degraded GNSS performance in urban environments. GNSS/INS integrated navigation is the most widely used integrated navigation system, but when GNSS is occluded for a long time, relying on the inertial navigation system alone still produces large errors, so other sensors are needed to reduce inertial drift in the short term. A camera can perform environment perception and visual information acquisition from images of the surroundings, is resistant to interference and inexpensive, and is therefore an ideal vehicle-mounted sensor.
Zhang Yiran et al. proposed locally improving GNSS positioning accuracy by building a landmark database from the geographical positions of landmarks observed with binocular vision, but the camera only assists GNSS and is not used as a main sensor of the integrated navigation. Scheiwarong et al. proposed a vision-assisted INS/GNSS navigation system based on a federated filter, in which the INS is loosely combined with GNSS and with vision separately; however, when the quality of the observation information and data is poor, the performance of the integrated navigation suffers. Research on a reliable and robust GNSS/IMU/visual combined navigation system therefore has great development prospects.
Interpretation of terms:
GNSS: Global Navigation Satellite System
IMU: Inertial Measurement Unit
INS: Inertial Navigation System
VIO: Visual-Inertial Odometry
SLAM: Simultaneous Localization and Mapping
VO: Visual Odometry
BN: Batch Normalization
ECEF: Earth-Centered, Earth-Fixed coordinate system
KLT: Kanade-Lucas-Tomasi sparse optical flow tracking method
EKF: Extended Kalman Filter.
Disclosure of Invention
Purpose of the invention: to address the deficiencies of the prior art, the invention solves the technical problem of providing a multi-source fusion navigation positioning method based on motion state and environment perception.
To solve this technical problem, the invention discloses a multi-source fusion navigation positioning method based on motion state and environment perception, comprising the following steps.
Step 1, collecting image frames with a camera, eliminating dynamic points in the image frames, and obtaining the static feature points of each image frame and their positions.
Step 2, obtaining the local camera pose based on the feature point positions of each image frame and a visual/IMU tight-combination fusion positioning algorithm.
Step 3, obtaining the motion state of the vehicle from the IMU measurement data, and switching the adaptive algorithm according to the motion state of the vehicle.
Step 4, obtaining the final positioning result based on the local camera pose, the switching result of the adaptive algorithm, Kalman filtering, and a GNSS/IMU/visual loose-combination fusion positioning algorithm.
Further, step 1 includes step 1.1: performing semantic segmentation of pedestrians and vehicles on each image frame. For the set of image frames O_Pic collected by the camera, first extract and track feature points in each frame with the KLT sparse optical flow method to obtain the feature point positions, i.e. the pixel coordinates of the feature points; then perform semantic segmentation of vehicles and pedestrians with a convolutional neural network to obtain, for each frame, the feature points belonging to the vehicle or pedestrian semantic categories.
The KLT sparse optical flow method extracts feature points, tracks the position change of the same pixel point in a pixel coordinate, and is the basis of subsequent pose estimation. And through semantic segmentation and dynamic point elimination, dynamic characteristic points of people and vehicles can be removed, so that the negative influence of the dynamic points on final pose estimation is reduced, the number of points tracked by a sparse optical flow method can be reduced, and the operation efficiency is improved.
Step 1.2, pre-integrating IMU measurement data to obtain an IMU pre-integration result, namely a relative pose relationship of two adjacent frames of images, including relative position, speed and attitude change relationships of the two adjacent frames of images, wherein the measurement data comprises measurement values output by an accelerometer and a gyroscope.
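The pre-integration of step 1.2 can be sketched as follows, under simplifying assumptions not spelled out in the text: accelerometer and gyroscope biases are taken as zero, and the incremental rotation uses a first-order exponential-map approximation. The function and variable names are illustrative, not the patent's.

```python
import numpy as np

def skew(w):
    """Skew-symmetric matrix of a 3-vector, so that skew(w) @ v = w x v."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def preintegrate(accels, gyros, dt):
    """Accumulate the relative position, velocity and rotation between two
    adjacent camera frames from the raw IMU samples in between
    (biases assumed zero; first-order rotation update)."""
    dp = np.zeros(3)   # relative position change
    dv = np.zeros(3)   # relative velocity change
    dR = np.eye(3)     # relative rotation
    for a, w in zip(accels, gyros):
        dp = dp + dv * dt + 0.5 * dR @ a * dt ** 2
        dv = dv + dR @ a * dt
        dR = dR @ (np.eye(3) + skew(w * dt))  # exp(skew(w dt)) to first order
    return dp, dv, dR
```

With constant acceleration of 1 m/s² along x for one second and no rotation, this yields dv ≈ (1, 0, 0) m/s and dp ≈ 0.5 m along x, matching the continuous-time kinematics.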
Step 1.3, removing dynamic points: based on the IMU pre-integration result, project the feature points of each frame belonging to the vehicle or pedestrian semantic categories into the next frame as their predicted positions in that frame. Pixel points whose tracked actual position in the next frame is inconsistent with the predicted position are dynamic points and are eliminated; pixel points whose actual and predicted positions agree are static points. This yields the static feature points of the next frame and their positions.
Step 1.1 segments the feature points of all vehicles and pedestrians, but not all vehicles and pedestrians are actually moving, so the moving subset of the vehicle and pedestrian feature points from step 1.1 must be found through IMU pre-integration. Segmentation is done first so that the pre-integration projection need not be applied to all feature points: objects such as trees and street lamps are static, so the potentially movable vehicles and pedestrians are found first and only those points are projected, which reduces the amount of computation.
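The consistency test of step 1.3 reduces to comparing the IMU-predicted pixel position with the KLT-tracked position per feature. A minimal numpy sketch, with an assumed pixel tolerance `tol_px` (the patent does not specify the threshold value):

```python
import numpy as np

def reject_dynamic_points(predicted, tracked, tol_px=2.0):
    """Keep features whose KLT-tracked position in the next frame agrees
    with the IMU-predicted position; large disagreement marks a dynamic
    point. predicted, tracked: (n, 2) pixel coordinates. Returns a boolean
    mask that is True for static points."""
    err = np.linalg.norm(predicted - tracked, axis=1)
    return err <= tol_px
```

For example, a feature that moved 8 px away from its prediction is flagged dynamic while one within 0.5 px is kept static.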
Further, the convolutional neural network in step 1.1 is an improved U-net convolutional neural network in which the convolutional layers in the encoder and decoder are replaced by a Ghost module. The Ghost module performs the same function as a conventional convolutional module but is a lightweight operation: the cheap linear transform operations it uses cost less and require less computation while maintaining accuracy.
Semantic segmentation of vehicles and pedestrians based on a convolutional neural network comprises: step 1.1.1, training the improved U-net convolutional neural network model with a data set. The data set is built by collecting road images in different scenes in advance with a vehicle-mounted camera to establish a vehicle and pedestrian data set, or by directly downloading an existing public road data set. The improved U-net model is trained on it to obtain the trained improved U-net convolutional neural network.
Step 1.1.2, inputting the set of image frames O_Pic collected by the camera into the encoder of the trained improved U-net network to obtain the high-level feature vectors of O_Pic.
Step 1.1.3, inputting the high-level feature vectors of O_Pic into the decoder of the trained improved U-net network to obtain the feature image for semantic category segmentation.
Step 1.1.4, inputting the feature image for semantic category segmentation into a softmax layer to obtain, for each frame in O_Pic, the feature points belonging to the vehicle or pedestrian semantic categories.
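The idea behind the Ghost module can be sketched with numpy: an ordinary convolution produces a few "primary" feature maps, and cheap per-map linear transforms generate the remaining "ghost" maps, roughly halving the convolution cost. The real Ghost module uses depthwise convolutions as the cheap operation; here a per-map scaling stands in for it, and all names and weights are illustrative.

```python
import numpy as np

def ghost_module(x, w_primary, w_cheap):
    """Ghost module sketch.
    x: input features, shape (C_in, H, W).
    w_primary: (C_p, C_in) weights of a 1x1 'ordinary' convolution.
    w_cheap: (C_p,) per-map scaling standing in for the cheap depthwise op.
    Output has C_out = 2 * C_p channels: primary maps plus ghost maps."""
    primary = np.einsum('oc,chw->ohw', w_primary, x)  # ordinary 1x1 conv
    ghost = w_cheap[:, None, None] * primary          # cheap linear transform
    return np.concatenate([primary, ghost], axis=0)
```

Only the primary maps incur a full convolution, which is why replacing the U-net encoder/decoder convolutions with Ghost modules lowers the computation while keeping the output channel count.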
Further, step 2.1: perform VIO initialization based on the feature point positions of each image frame and the IMU pre-integration result, obtaining initial values of the camera-IMU extrinsic parameters, the feature point depths, and the tightly coupled position, velocity and attitude.
Step 2.2: based on nonlinear optimization, perform the sliding-window visual/IMU tight-combination nonlinear optimization from the initial values of the camera-IMU extrinsic parameters, feature point depths, and tightly coupled position, velocity and attitude, and solve for the optimal camera pose in the sliding window, i.e. the local camera pose.
Further, in step 2.1, the pose transformation of the image frames is recovered from the feature point positions of each frame by the structure-from-motion technique, aligned with the IMU pre-integration result to recover the camera-IMU extrinsic parameters, and the transformation between the IMU coordinate system and the world coordinate system is obtained.
Step 2.2 comprises: forming the state vector X of the visual/IMU tight-combination sliding window, formulating the visual/IMU combined nonlinear optimization problem via minimum reprojection error, and solving it with the nonlinear optimization library Ceres Solver, thereby obtaining the optimal camera pose in the sliding window.
The error function to be optimized in the visual/IMU tight-combination nonlinear optimization problem is:

$$\min_{\mathcal{X}}\left\{\left\|\gamma_{marg}-H_{marg}\mathcal{X}\right\|^{2}+\sum_{k\in I}\left\|\gamma_{I}\right\|_{P^{b_{k}}_{b_{k+1}}}^{2}+\sum_{(\theta,l)\in L}\left\|\gamma_{L}\right\|_{P^{c_{\theta}}_{l}}^{2}\right\}$$

where γ_marg is the prior information from marginalization, i.e. the oldest frame in the sliding window and its corresponding state quantities are marginalized into prior information; γ_I and γ_L are the IMU residual term and the visual residual term respectively; I is the set of all IMU measurements, and P^{b_k}_{b_{k+1}} is the covariance matrix of the IMU pre-integration noise term between the k-th and (k+1)-th frame images; L is the set of feature points observed at least twice in the current sliding window, and P^{c_θ}_l is the covariance matrix of the visual observation noise at the pixel coordinates of the l-th feature point in the θ-th frame image.
Further, in step 3, switching the adaptive algorithm means constraining and correcting the GNSS/IMU/visual loose-combination fusion positioning algorithm and the visual/IMU tight-combination fusion positioning algorithm with zero-velocity correction, a robust kernel function, and the nonholonomic constraint respectively, according to the motion state of the vehicle.
Step 3 comprises step 3.1: design a sliding window of size N over the accelerometer output and compute an acceleration standard-deviation factor; judge from this factor whether the vehicle is stationary, and if so, apply static zero-velocity correction in the GNSS/IMU/visual loose-combination fusion positioning algorithm.
Step 3.2: if the vehicle is detected to be non-stationary, it is either turning or going straight; if it is turning, add a robust kernel function to the visual/IMU tight-combination fusion positioning algorithm.
Step 3.3: if the vehicle is going straight, add the nonholonomic constraint to the GNSS/IMU/visual loose-combination fusion positioning algorithm to correct the accumulated error of inertial navigation.
The motion state of the vehicle strongly affects positioning accuracy; by distinguishing motion states, different constraints and corrections can be added to the algorithm to improve accuracy. The invention distinguishes three motion states. When the vehicle is stationary, the IMU and camera outputs are not exactly static because of sensor errors, which introduces error; the vehicle is therefore reliably judged to be stationary and the error is corrected by static zero-velocity correction. When the vehicle is going straight, lateral and vertical motion are impossible, satisfying the conditions for the nonholonomic constraint. When the vehicle is turning, the images captured by the camera blur and feature matching produces mismatches, greatly reducing the camera's accuracy; a robust kernel function is therefore added to reduce the influence of these errors.
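The three-way switching described above can be sketched as a simple decision function. The threshold names and the use of accelerometer-norm standard deviation and mean |ω_z| follow steps 3.1-3.3; the return labels are illustrative.

```python
import numpy as np

def select_constraint(acc_window, gyro_z_window, th_acc, th_omega):
    """Adaptive algorithm switching sketch:
    stationary -> zero-velocity update in the loose combination;
    turning    -> robust kernel in the tight combination;
    straight   -> nonholonomic constraint in the loose combination.
    acc_window: (N, 3) accelerometer samples; gyro_z_window: (N,) z-rates."""
    acc_std = np.std(np.linalg.norm(acc_window, axis=1), ddof=1)
    if acc_std < th_acc:
        return 'zero_velocity_update'
    if np.mean(np.abs(gyro_z_window)) > th_omega:
        return 'robust_kernel'
    return 'nonholonomic_constraint'
```

A stationary window (constant gravity reading, no rotation) selects the zero-velocity update; a non-stationary window with high mean yaw rate selects the robust kernel; otherwise the nonholonomic constraint applies.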
Further, step 4 comprises step 4.1: resolving the GNSS position information.
Step 4.2: initializing the VIO and the GNSS, and converting the local camera pose obtained in step 2 and the GNSS position information resolved in step 4.1 into the Earth-centered, Earth-fixed (ECEF) coordinate system.
Step 4.3: obtaining the global estimated pose, i.e. the final positioning result, based on Kalman filtering and the GNSS/IMU/visual loose-combination fusion positioning algorithm.
Further, step 3.1 includes: the acceleration standard deviation σ_a over a sliding window of size N is expressed as

$$\sigma_{a}=\sqrt{\frac{1}{N-1}\sum_{i=t-N+1}^{t}\left(\left\|\boldsymbol{a}_{i}\right\|-\mu\right)^{2}},\qquad\left\|\boldsymbol{a}_{i}\right\|=\sqrt{a_{x,i}^{2}+a_{y,i}^{2}+a_{z,i}^{2}}$$

where a_{x,i}, a_{y,i}, a_{z,i} are the outputs of the three accelerometer axes x, y, z at time i; i is the time index of the acceleration data within the sliding window, i ∈ [t−N+1, t]; μ is the mean of all acceleration values within the sliding window, with the unbiased estimate (divisor N−1) used for the mathematical expectation.
A stationary threshold TH_a is set to judge whether the vehicle is stationary during motion: when the vehicle's acceleration standard deviation satisfies σ_a < TH_a, the vehicle is judged to be in a zero-velocity stationary state and zero-velocity correction is added. That is, in the Kalman filtering of step 4.3, the velocity solved by inertial navigation is used as the observation of the system velocity error for Kalman filter estimation, and the updated state estimate is fed back to correct the navigation parameters of the INS subsystem, completing static zero-velocity correction.
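The stationarity factor of step 3.1 can be computed directly from a window of accelerometer samples. A minimal sketch, using the norm of each sample and the unbiased standard deviation; the function names are illustrative.

```python
import numpy as np

def accel_std_factor(acc_window):
    """Unbiased standard deviation of the accelerometer output norms over
    a sliding window of N samples -- the stationarity factor of step 3.1.
    acc_window: (N, 3) array of accelerometer samples."""
    norms = np.linalg.norm(acc_window, axis=1)
    return float(norms.std(ddof=1))

def zero_velocity_detected(acc_window, threshold):
    """Vehicle judged stationary when the factor is below the threshold."""
    return accel_std_factor(acc_window) < threshold
```

A window of identical gravity-only readings gives a factor of zero (stationary); a window whose readings fluctuate gives a large factor and fails the test.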
The stationary threshold TH_a is set dynamically: a sliding window of size M holds the acceleration standard deviations σ_j at different times, and

$$TH_{a}=\sum_{j=t-M+1}^{t}\eta_{j}\,\sigma_{j}$$

so that the threshold adapts as the window M slides, allowing the zero-velocity state of the vehicle to be detected more flexibly. Here j is the time index of the acceleration standard deviations within window M, j ∈ [t−M+1, t], and η_j is the weight of each standard deviation: the further from the current time t, the smaller the weight; the nearer, the larger.
The window size M has an upper limit M_max and a lower limit M_min. M_min is first taken as the window length and the mean σ̄ of the current group of standard deviations is computed. If σ̄ is no greater than the acceleration-stability threshold ε, the acceleration values at that time are stable and a larger window length M is used; if σ̄ exceeds ε, the acceleration values are changing in stages and a smaller window length M should be chosen to improve sensitivity. Here σ̄ is the mean of the set of standard deviations taken with M_min as the window length. The threshold ε is chosen by collecting multiple groups of IMU acceleration output from vehicles in different motion states, computing σ̄ for each group, and selecting the value that separates stable from unstable acceleration.
When the vehicle is stationary the accelerometer output is smooth; when it is not, the data fluctuate more strongly, and the standard deviation reflects the smoothness of a group of data. In different stationary scenes the stability threshold of the acceleration should differ because of external factors such as body vibration, so the standard deviations of nearby time periods have reference value for determining it; a dynamic acceleration-stability threshold is therefore set.
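The weighted threshold of the formula above can be sketched as follows. The patent only requires that weights sum over the window and favor recent samples; the linear weighting here is an illustrative choice, not the patent's specified η_j.

```python
import numpy as np

def adaptive_zupt_threshold(recent_stds):
    """Weighted sum of the last M acceleration standard deviations;
    weights grow linearly toward the present (illustrative choice) and
    are normalized to sum to 1, so the threshold tracks the recent
    vibration level of the vehicle body."""
    m = len(recent_stds)
    eta = np.arange(1, m + 1, dtype=float)  # oldest gets 1, newest gets m
    eta /= eta.sum()                        # normalize weights
    return float(np.dot(eta, recent_stds))
```

With a constant history the threshold equals that constant; when only the most recent window was noisy, the threshold is pulled toward the recent value more than a plain average would be.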
Further, step 3.2 includes: based on the angular rate measurements, a turn detection factor is designed from the mean angular rate. The mean absolute z-axis angular rate over a sliding window of size N,

$$\bar{\omega}=\frac{1}{N}\sum_{i=t-N+1}^{t}\left|\omega_{z,i}\right|$$

is used as the criterion for vehicle turning: if ω̄ exceeds the threshold, the vehicle is judged to be turning; otherwise it is judged to be going straight. Here ω_z is the output of the gyroscope about the z axis, and TH_ω is the set angular-rate mean threshold.
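The turn test of step 3.2 is a one-liner over the gyroscope window; a sketch with an illustrative threshold value (the patent does not give TH_ω numerically):

```python
import numpy as np

def is_turning(gyro_z_window, th_omega):
    """Mean absolute z-axis angular rate over the window; above the
    threshold TH_omega the vehicle is judged to be turning, otherwise
    going straight."""
    return float(np.mean(np.abs(gyro_z_window))) > th_omega
```

For example, a sustained yaw rate of 0.3 rad/s against a 0.05 rad/s threshold is classified as turning, while a zero-rate window is classified as straight driving.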
If the vehicle is turning, a robust kernel function is added to the visual/IMU tight-combination fusion positioning algorithm: the Huber kernel is applied to the visual measurement residual term of the error function in the tight-combination nonlinear optimization,

$$\rho(e)=\begin{cases}\dfrac{1}{2}e^{2}, & |e|\le\delta\\[4pt] \delta\left(|e|-\dfrac{1}{2}\delta\right), & |e|>\delta\end{cases}$$

where δ is the threshold for judging whether a visual residual term is excessively large. It is determined by separately collecting data of good quality from straight driving and data with blurred, mismatched images from turning, and comparing the resulting visual residual terms, thereby fixing the threshold δ that distinguishes good from poor visual data quality.
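The Huber kernel above is quadratic for small residuals and linear beyond δ, so gross mismatches from blurred turning images cannot dominate the optimization. A direct sketch (in practice a solver such as Ceres applies this internally as a loss function):

```python
def huber(e, delta):
    """Huber robust kernel applied to a residual e: quadratic for
    |e| <= delta, linear for |e| > delta, bounding the influence of
    outlier visual matches."""
    a = abs(e)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)
```

With δ = 2, a residual of 1 costs 0.5 (pure quadratic), while a residual of 5 costs 8 instead of the 12.5 a quadratic loss would give, damping the outlier's pull on the pose estimate.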
In step 3.3, if the vehicle is judged to be going straight, it produces no lateral sideslip and no motion perpendicular to the road surface while driving, so the nonholonomic constraint is added in the Kalman filtering of step 4.3 to correct the accumulated error of inertial navigation, i.e. the lateral and vertical velocities are set to zero.
Further, step 4.1 comprises: removing satellites with satellite elevation angles lower than a satellite elevation angle threshold value to obtain satellite observation data after screening; and resolving to obtain GNSS position information through a pseudo-range single-point positioning algorithm according to the screened satellite observation data.
Step 4.3 comprises: inputting corresponding information of the local camera pose and the GNSS position information obtained in the step 4.2 in a geocentric geocoordinate system and a switching result of the adaptive algorithm in the step 3 into a Kalman filter to obtain error correction information; inputting the error correction information into the sliding window-based close-combination nonlinear optimization of the step 2.2, and carrying out feedback correction on the navigation parameters; and finally outputting the corrected pose information as a global estimation pose, namely a final positioning result.
If the positioning of the GNSS itself is problematic, the combined effect of it and the VIO is greatly affected. The signals of the satellites with elevation angles lower than the elevation angle threshold of the satellites are easy to reflect by obstacles such as buildings, so that the measurement information has large errors, and therefore the satellites with the low elevation angles need to be rejected.
The GNSS can provide an absolute positioning position of the vehicle, and under the condition of good signals, the GNSS positioning precision is very high, and no accumulated error exists, so that the GNSS and the VIO are combined for navigation to obtain a good global pose result.
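The loose combination of step 4.3 rests on the standard Kalman measurement update, in which the GNSS position (or the VIO/GNSS difference) enters as the observation of the error state. A generic sketch of that single update; the state layout and dimensions are illustrative, not the patent's specific error-state model.

```python
import numpy as np

def kf_update(x, P, z, H, R):
    """One Kalman measurement update.
    x: (n,) state estimate;  P: (n, n) state covariance;
    z: (m,) observation;     H: (m, n) observation matrix;
    R: (m, m) observation noise covariance."""
    S = H @ P @ H.T + R                   # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)        # Kalman gain
    x_new = x + K @ (z - H @ x)           # corrected state
    P_new = (np.eye(len(x)) - K @ H) @ P  # corrected covariance
    return x_new, P_new
```

In the scalar case with equal prior and measurement variance, the update lands halfway between prediction and observation and halves the variance, which is the expected behavior of the fusion.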
Beneficial effects: 1. To address the drift of GNSS/IMU integrated navigation when GNSS fails for a long time, the camera is combined with GNSS/IMU: the visual information and the IMU are first tightly combined via sliding-window optimization, and the output local pose and the GNSS positioning result are then input into Kalman filtering for the loose combination. This effectively exploits vision in integrated navigation and improves the accuracy and robustness of multi-source fusion positioning.
2. To address dynamic-point rejection in integrated-navigation data preprocessing, an improved U-net convolutional neural network is adopted in which a Ghost module replaces the original traditional convolutional layers. Semantic segmentation is performed on the camera images to extract pedestrian and vehicle feature points; these feature points are projected into the next image frame using the IMU pre-integration result, and points whose positions disagree are identified as dynamic points. This achieves efficient dynamic-point rejection and improves computational efficiency.
3. To address the accuracy degradation caused by changes in the vehicle's motion state, the invention constructs detection factors from the IMU acceleration and angular velocity to detect whether the vehicle is stationary, turning, or going straight, and applies zero-velocity correction, a robust kernel function, and nonholonomic constraints, respectively, to constrain and correct the filtering process and the nonlinear optimization for these three states.
Drawings
The foregoing and/or other advantages of the invention will become further apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings.
Fig. 1 is a schematic flow chart of a multi-source fusion navigation positioning method based on motion state and environmental awareness according to an embodiment of the present application.
Fig. 2 is a schematic diagram of the Ghost module.
Fig. 3 is a schematic diagram of a network structure of an improved U-net convolutional neural network.
Fig. 4 is an image frame acquired by a camera.
FIG. 5 is a diagram illustrating the dynamic-point rejection effect of the prior art applied to Fig. 4.
Fig. 6 is a schematic diagram illustrating a dynamic point elimination effect in a multi-source fusion navigation positioning method combining a vehicle motion state and environment perception provided by an embodiment of the present application.
Fig. 7 is a schematic diagram comparing the three-dimensional position errors, in the northeast coordinate system, of navigation and positioning by the prior art and by the multi-source fusion navigation and positioning method combining vehicle motion state and environmental perception provided by an embodiment of the present application.
Fig. 8 is a schematic diagram comparing the attitude errors of navigation and positioning by the prior art and by the multi-source fusion navigation and positioning method combining vehicle motion state and environmental perception provided by an embodiment of the present application.
Detailed Description
Embodiments of the present invention will be described below with reference to the accompanying drawings.
The prior art has the following problems.
1. The most widely used system at present is GPS/INS integrated navigation; during long GNSS outages the IMU accumulates error and drifts, and a high-grade INS is costly to manufacture with a long production cycle. Meanwhile, most existing vision-aided integrated navigation methods are VO or vision-assisted GNSS positioning, which do not fully exploit the visual positioning capability.
2. Existing GNSS/IMU/vision integrated navigation lacks dynamic-environment adaptivity: dynamic points are not rejected accurately and efficiently in the visual data preprocessing. Processing all feature points incurs heavy computation, while rejecting pedestrian and vehicle feature points in all states, including static ones, discards useful information.
3. When the vehicle's motion state changes frequently, the positioning accuracy of existing GNSS/IMU/vision integrated navigation decreases, with degraded performance and insufficient reliability.
To meet the requirements for high-precision, continuous, and reliable vehicle navigation and positioning under all weather and road conditions, and to improve the vehicle's adaptivity to complex environments and motion-state changes, an embodiment of the invention provides a multi-source fusion navigation and positioning method combining vehicle motion state and environment perception. As shown in Fig. 1, the method comprises the following steps. Step 1: acquire image frames with a camera, reject the dynamic points in the image frames, and obtain the static feature points of each image frame and their positions.
Step 1.1: semantic segmentation of pedestrians and vehicles based on an improved convolutional neural network
For the image frame set OPic acquired by the camera, feature points are first extracted and tracked for each frame of image using the KLT sparse optical flow method, giving the feature-point positions, i.e., the pixel coordinates of the feature points. The image frame set OPic acquired by the camera is then input to the convolutional neural network for semantic segmentation of vehicles and pedestrians, and finally dynamic points are rejected using the IMU pre-integration result.
For the extraction and tracking of feature points in each image frame by the KLT sparse optical flow method, refer to the reference: Peri-epi, Marie. ORB feature matching optimization based on sparse optical flow methods [J]. Applied Optics, 2019, 40(04): 583-.
In step 1.1 the convolutional neural network adopts an improved U-net convolutional neural network: as shown in Fig. 3, the traditional convolutional layers are replaced by Ghost modules for feature extraction, after which the image is semantically segmented. The improved U-net convolutional neural network has an encoder-decoder structure: in the encoder, Ghost modules replace the traditional convolutional layers to extract features; in the decoder, deconvolution is realized by a combination of up-sampling and convolution, with Ghost modules again replacing the traditional convolutional layers.
Step 1.1.1, train the improved U-net convolutional neural network model with a data set. The data set is built by collecting road images in different scenes with a vehicle-mounted camera to establish a vehicle and pedestrian data set, or by directly downloading an existing public road data set such as the KITTI data set. Training the improved U-net convolutional neural network model yields the trained improved U-net convolutional neural network.
Step 1.1.2, input the image frame set OPic collected by the camera into the encoder of the trained improved U-net convolutional neural network to obtain the high-level feature vectors of the image frame set OPic.
The Ghost module increases the feature-map output at low cost and is divided into two parts, as shown in Fig. 2. The first part is a traditional convolutional layer that outputs a small number of feature maps, as shown in equation (1).
Y’ = OPic * f’ (1)
where Y' is the set of output feature maps, f' is the convolution kernel, OPic is the set of image frames acquired by the camera, and * denotes the traditional convolution operation. The number of channels of the output feature maps of this convolution is smaller than that of a traditional convolutional layer. The second part is a lightweight linear-transform layer that generates the redundant feature maps, as shown in equation (2).
where y'_i is the i-th of the m intrinsic feature maps composing Y', and each feature map y'_i of Y' is operated on by lightweight linear operations Φ to obtain s feature images; here Φ may be a 3×3 or 5×5 linear kernel. The m feature maps thus yield m×s feature images through the linear operations. The number of Ghost modules is set to 5, and after each Ghost module the picture is down-sampled by maximum pooling to reduce the feature dimensionality. After passing through the 5 Ghost modules, the high-level feature vectors of the image are output and passed into the decoder.
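As a rough illustration of how a Ghost module expands m intrinsic feature maps into m×s maps per equations (1) and (2), a minimal NumPy sketch is given below; the 1×1 primary convolution and the single 3×3 cheap kernel are illustrative assumptions, not the patented network's actual filters:

```python
import numpy as np

def conv1x1(x, w):
    """Primary convolution of eq. (1): x (C,H,W) filtered by w (m,C) -> m intrinsic maps."""
    return np.tensordot(w, x, axes=([1], [0]))

def cheap_op(fmap, kernel):
    """Lightweight 3x3 linear operation of eq. (2), with 'same' zero padding."""
    padded = np.pad(fmap, 1)
    out = np.zeros_like(fmap)
    for i in range(fmap.shape[0]):
        for j in range(fmap.shape[1]):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * kernel)
    return out

def ghost_module(x, w_primary, cheap_kernels):
    """m intrinsic maps plus their ghosts: counting the identity among the s
    linear ops, the module outputs m*s feature maps from one cheap primary conv."""
    intrinsic = conv1x1(x, w_primary)
    maps = list(intrinsic)                 # identity mapping keeps the intrinsic maps
    for fmap in intrinsic:
        for k in cheap_kernels:            # (s-1) cheap linear ops per intrinsic map
            maps.append(cheap_op(fmap, k))
    return np.stack(maps)
```

With m = 3 intrinsic maps and s = 2 (identity plus one cheap kernel), a 2-channel input yields 6 output maps, matching the m×s expansion described above.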
Step 1.1.3, decoding: the extracted features are decoded by 4 decoding modules, each consisting of a deconvolution module followed by a Ghost module; the convolution modules of the decoding part mirror the encoder at the same level. The output of the previous level and the feature map output by the corresponding encoder serve as the inputs of each decoding module, fusing the lower-level features output by the encoder so that all features are fully utilized, and producing the feature image for semantic category segmentation.
Step 1.1.4, input the feature image for semantic category segmentation into a softmax layer to obtain, for each pixel of each frame of the image frame set OPic acquired by the camera, the probability that it belongs to the semantic category "vehicle" or "pedestrian", as shown in equation (3).

Q_d(x) = exp(s_d(x)) / Σ_{c=1}^{D} exp(s_c(x))    (3)

where s_d(x) is the score of pixel x for semantic category d, and D is the number of semantic categories — here the two categories "pedestrian" and "vehicle", D = 2. Q_d(x) is the probability that pixel x belongs to semantic category d; the softmax maximizes the most likely outcome while suppressing the probabilities of the other classes. In repeated tests, multiple groups of pedestrian and vehicle images were collected and their semantic-category probabilities obtained with the improved U-net network; for example, the "pedestrian" probabilities obtained for pedestrian feature points and the "vehicle" probabilities obtained for vehicle feature points were all greater than 0.8, so the empirical probability threshold Q_thr is set to 0.8. For a given feature point, two probabilities are computed: Q_1(x), the probability that it belongs to a pedestrian, and Q_2(x), the probability that it belongs to a vehicle. Each is compared with the empirical threshold Q_thr: if Q_1(x) is greater than Q_thr, the pixel is judged to belong to the "pedestrian" category; if Q_2(x) is greater than Q_thr, it is judged to belong to the "vehicle" category.
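The softmax scoring and thresholded two-class decision described above can be sketched as follows; the function names and the raw score inputs are illustrative assumptions:

```python
import numpy as np

def softmax_probs(scores):
    """Softmax over per-pixel semantic-category scores, as in eq. (3)."""
    e = np.exp(scores - np.max(scores))  # subtract max for numerical stability
    return e / e.sum()

def classify_pixel(scores, q_thr=0.8):
    """Return 'pedestrian', 'vehicle', or None for one pixel's two class scores,
    applying the empirical probability threshold Q_thr = 0.8."""
    q = softmax_probs(np.asarray(scores, dtype=float))
    labels = ["pedestrian", "vehicle"]
    d = int(np.argmax(q))
    return labels[d] if q[d] > q_thr else None
```

A pixel whose winning class probability does not exceed Q_thr is left unlabeled, so only confidently segmented "pedestrian"/"vehicle" feature points feed the later dynamic-point rejection.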
Step 1.2, pre-integrating IMU measurement data to obtain an IMU pre-integration result, namely a relative pose relationship of two adjacent frames of images, including relative position, speed and attitude change relationships of the two adjacent frames of images, wherein the measurement data comprises measurement values output by an accelerometer and a gyroscope.
The measurement models of the accelerometer and gyroscope are shown in equations (4) and (5):

â_t = a_t + b_a + R^b_w g^w + n_a    (4)
ω̂_t = ω_t + b_g + n_g    (5)

where â_t is the accelerometer measurement output by the IMU at time t; a_t is the true acceleration; b_a is the accelerometer bias; n_a is the accelerometer noise term; w denotes the world coordinate system, and g^w is the gravitational acceleration in the world coordinate system; R^b_w is the rotation matrix from the world coordinate system to the carrier coordinate system, obtainable from the inertial navigation mechanization; ω̂_t is the gyroscope measurement; ω_t is the true angular velocity; b_g is the gyroscope bias; n_g is the gyroscope noise term.
For image frames k and k+1, integrating the two equations above yields the relative position, velocity, and attitude changes between times t_k and t_{k+1}. The pre-integration equations are shown in equations (6) to (11).
where Δt_k is the time interval between t_k and t_{k+1}; B denotes the carrier coordinate system, and R^w_B is the rotation matrix from the carrier coordinate system to the world coordinate system; p^w_k, v^w_k, q^w_k and p^w_{k+1}, v^w_{k+1}, q^w_{k+1} are respectively the position, velocity, and attitude of image frame k and image frame k+1 in the world coordinate system; R^{b_k}_t represents the attitude change of the camera at time t relative to frame k; Ω is the quaternion matrix of the angular velocity used in the attitude propagation; ω_x, ω_y, ω_z respectively denote the components of the true angular velocity ω_t at time t on the three axes x, y, z of the carrier coordinate system, i.e., ω_t = [ω_x, ω_y, ω_z]^T.
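The world-frame kinematics that equations (6) to (11) pre-integrate can be sketched as a single first-order propagation step. The NumPy fragment below is an illustrative simplification (Euler integration with a small-angle rotation update), not the patent's full quaternion pre-integration:

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix of a 3-vector (cross-product matrix)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def propagate(p, v, R, acc_meas, gyr_meas, b_a, b_g, g_w, dt):
    """One Euler step of the IMU kinematics: the bias-corrected body
    acceleration is rotated to the world frame, gravity is removed, and the
    attitude R (body->world) is advanced by the bias-corrected angular rate."""
    a_w = R @ (acc_meas - b_a) - g_w            # world-frame acceleration
    p_next = p + v * dt + 0.5 * a_w * dt ** 2
    v_next = v + a_w * dt
    dR = np.eye(3) + skew((gyr_meas - b_g) * dt)  # first-order rotation update
    return p_next, v_next, R @ dR
```

At rest with zero biases, the accelerometer measures the upward specific force (0, 0, 9.81), so this step correctly leaves position, velocity, and attitude unchanged.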
Step 1.3, dynamic-point rejection: based on the IMU pre-integration result — the relative pose relationship between two consecutive frames, comprising the relative position, velocity, and attitude changes of two adjacent image frames — the feature points whose semantic category is "vehicle" or "pedestrian" are projected into the next image frame according to the pose change, giving their predicted positions in the next frame. Pixel points whose tracked actual positions in the next frame are inconsistent with their predicted positions are dynamic points and are rejected; pixel points whose actual and predicted positions are consistent are taken as static points, thereby obtaining the static feature points of the next image frame and their positions.
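The predict-and-compare rejection described above can be sketched as follows; the pinhole intrinsics K, the pixel-distance threshold, and the per-point depths are illustrative assumptions:

```python
import numpy as np

def reject_dynamic_points(pts_prev, pts_actual, R_rel, t_rel, depths, K, thr_px=3.0):
    """Keep only static 'vehicle'/'pedestrian' features: each feature is
    back-projected from the previous frame, transformed by the IMU-predicted
    relative pose (R_rel, t_rel), re-projected, and compared with its tracked
    position in the next frame; a large deviation marks a dynamic point."""
    K_inv = np.linalg.inv(K)
    static = []
    for (u, v), (ua, va), z in zip(pts_prev, pts_actual, depths):
        P_prev = z * (K_inv @ np.array([u, v, 1.0]))  # back-project to 3D
        P_next = R_rel @ P_prev + t_rel               # predicted 3D in next frame
        uvw = K @ P_next
        pred = uvw[:2] / uvw[2]                       # predicted pixel position
        if np.hypot(pred[0] - ua, pred[1] - va) <= thr_px:
            static.append((ua, va))                   # consistent -> static point
    return static
```

A feature that truly moved between frames (e.g., on a walking pedestrian) lands far from its IMU-predicted pixel and is dropped, while static scene points survive.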
Fig. 4 is an image frame acquired by the camera, Fig. 5 shows the dynamic-point rejection effect of the prior art on Fig. 4, and Fig. 6 shows the dynamic-point rejection effect of the embodiment of the present application. As seen from Fig. 6, the dynamic-point rejection of the embodiment completely segments the pixels of the "person" and removes all pixel points of the moving person, whereas the prior art cannot remove all dynamic points: the moving "person" in Fig. 5 still retains some dynamic feature points.
Step 2, obtaining the local camera pose based on the feature-point positions of each image frame and a vision/IMU tight-combination fusion positioning algorithm.
Step 2.1, VIO initialization.
From the feature-point matching results after dynamic-point removal, the pose transformations of the image frames are recovered by the structure-from-motion technique, and initial values of the camera-IMU extrinsic parameters, the feature-point depths, and the position, velocity, and attitude are recovered; specifically refer to the reference: Yang Z, Shen S. Monocular Visual-Inertial State Estimation With Online Initialization and Camera-IMU Extrinsic Calibration [J]. IEEE Transactions on Automation Science and Engineering, 2017, 14(1): 1-13. The initialization process is then complete, and all metric values are passed to the next step.
And 2.2, solving the optimal camera pose in the sliding window, namely the local camera pose, based on the nonlinear optimization of the sliding window.
The state vectors of the tight combination sliding window are shown in equations (12) to (14).
where X is the state vector; χ_h is the IMU state at the h-th moment in the sliding window, comprising the position p^w_h, velocity v^w_h, and attitude q^w_h of the IMU in the world coordinate system at the h-th image frame, with h ∈ [0, kf], kf being the total number of key frames extracted from the image frames acquired by the camera in step 1.1; fp is the total number of features in the sliding window; λ_l is the inverse depth of the l-th feature point in the camera coordinate system at its first observation, i.e., the reciprocal of the depth obtained by the initialization process, with l ∈ [0, fp]; x^B_C is the camera extrinsic parameter, C is the camera coordinate system, and p^B_C and q^B_C are respectively the origin translation vector and the rotation attitude vector between the camera coordinate system and the carrier coordinate system.
And solving the optimal camera pose by using a minimum reprojection error method, wherein the to-be-optimized error function of the nonlinear optimization problem is shown as a formula (15).
where γ_marg is the prior information from marginalization, i.e., the oldest frame in the sliding window and its corresponding state quantities are marginalized into prior information; γ_I and γ_L are respectively the IMU residual term and the visual residual term; I is the set of all IMU measurements, and P^{b_k}_{b_{k+1}} is the covariance matrix of the IMU pre-integration noise term between the k-th and (k+1)-th image frames; L is the set of feature points observed at least twice in the current sliding window, and P^{c_θ}_l is the covariance matrix of the visual observation noise at the pixel coordinates of the l-th feature in the θ-th image frame of the set L.
The IMU residual term is shown in equation (16).
where δα^{b_k}_{b_{k+1}} and δβ^{b_k}_{b_{k+1}} are respectively the three-dimensional error-state representations of position and velocity between the k-th and (k+1)-th image frames; δθ^{b_k}_{b_{k+1}} is the three-dimensional error-state representation of the attitude quaternion between the k-th and (k+1)-th image frames; Δb_a and Δb_g are the error states of the accelerometer and gyroscope biases, respectively.
Suppose the l-th feature point is first observed in the υ-th image frame; its visual residual term in the θ-th image frame (θ > υ) is shown in equation (17).
where û^{c_υ}_l is the pixel coordinate of the l-th feature point at its first observation in the υ-th image frame; û^{c_θ}_l is the observed pixel coordinate of the same feature point in the θ-th image frame, i.e., the pixel coordinate before back-projection; π^{-1} is the back-projection function, which lifts pixel coordinates onto the unit sphere; dividing by the inverse depth gives the coordinates of the feature point in the camera coordinate system, and the residual is projected onto a normalized tangent plane, where φ_1 and φ_2 are two arbitrarily chosen orthogonal bases spanning that tangent plane; R^B_C is the rotation matrix from the camera coordinate system to the carrier coordinate system, known from the camera extrinsic parameters; R^w_{b_υ} and p^w_{b_υ} are the rotation matrix and origin translation vector from the carrier coordinate system to the world coordinate system at the υ-th image frame, and R^{b_θ}_w and p^{b_θ}_w are the rotation matrix and origin translation vector from the world coordinate system to the carrier coordinate system at the θ-th image frame, obtainable from the camera attitude at that moment.
The nonlinear optimization problem is solved with the Ceres Solver nonlinear-optimization library so that the error function (15) attains its minimum, yielding the optimal carrier pose estimates p_vio and q_vio, which are input to step 4 to be loosely combined with the GNSS.
Step 3, obtaining the motion state of the vehicle based on the IMU measurement data; and switching the self-adaptive algorithm according to the motion state of the vehicle.
The acceleration and angular velocity measured by the IMU are input to step 3 to perform the adaptive-algorithm switching.
Step 3.1, based on the acceleration measurements, design an acceleration standard-deviation factor over a sliding window of size N, and judge from this factor whether the vehicle is in a static state; if it is, apply static zero-velocity correction in the GNSS/IMU/vision loose-combination fusion positioning algorithm.
where a_{x,t}, a_{y,t}, a_{z,t} are respectively the outputs of the accelerometer's three axes x, y, z at time t; N is the sliding-window length, here set to 100 data points, N = 100; i is the time index of the acceleration data within the sliding window, i ∈ [t−N+1, t]; μ is the mean of all accelerations within the sliding window, taken as the mathematical expectation, an unbiased estimate being used here; σ_t is the acceleration standard deviation computed from the sliding window at time t.
During vehicle motion, a vehicle static threshold σ_thr can be set to determine whether the vehicle is stationary.
Here the vehicle static threshold σ_thr is not set to a fixed value; instead a window of size M is set, and the threshold adapts to the acceleration standard deviations at different times within the window as it slides, so that the zero-velocity state of the vehicle is detected more flexibly.
where j is the time index of the acceleration standard deviations σ_j within the sliding window M, j ∈ [t−M+1, t]; η_j are the weights of the respective standard deviations, with standard deviations farther from the current time receiving smaller weights and those closer to the current time receiving larger weights.
Choosing a suitable size M is very important, so a window-size adaptive method is designed to satisfy both the sensitivity and the precision requirements of the sliding window. The upper limit of the sliding-window size is defined as M_max and the lower limit as M_min; first M_min is taken as the window length. If the mean σ̄ of the current set of standard deviations is less than or equal to the acceleration stationarity threshold ε, the acceleration values at that time are stable and the larger value M_max is selected; when greater than the threshold, the acceleration values are changing step-wise, and the smaller window length should be selected to increase sensitivity.
where σ̄ is the mean of the set of standard deviations computed with M_min as the window length. Multiple sets of IMU acceleration outputs are acquired for vehicles in different motion states to compute σ̄, and the threshold for judging whether the acceleration values are stable is selected as the acceleration stationarity threshold ε. Test experiments show little difference in the results for window lengths between 20 and 50, so M_max is taken as 50 and M_min as 20; the acceleration stationarity threshold ε is set to 0.05.
When the vehicle's acceleration satisfies equation (22), the vehicle is judged to be in a zero-velocity static state and zero-velocity correction is added: in the Kalman filtering process of step 4.3, the velocity solved by inertial navigation is used as the observation of the system velocity error, Kalman filter estimation is performed, and the updated state estimate feedback-corrects the navigation parameters in the INS subsystem, completing the static zero-velocity correction.
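The static detection and window-size adaptation described above can be sketched as follows; the function names are hypothetical and the fixed parameter values simply mirror the text (ε = 0.05, M_min = 20, M_max = 50):

```python
import numpy as np

def is_stationary(acc_window, sigma_thr):
    """Zero-velocity detection: the vehicle is judged static when the unbiased
    standard deviation of the acceleration magnitude over the sliding window
    falls below the (possibly adaptive) threshold."""
    norms = np.linalg.norm(np.asarray(acc_window, dtype=float), axis=1)
    return bool(np.std(norms, ddof=1) < sigma_thr)

def choose_window(recent_stds, eps=0.05, m_min=20, m_max=50):
    """Window-size adaptation: stable acceleration (mean std <= eps) permits
    the large window M_max; otherwise M_min is used for better sensitivity."""
    return m_max if float(np.mean(recent_stds)) <= eps else m_min
```

A vehicle at rest produces acceleration magnitudes clustered around gravity, giving a near-zero standard deviation, whereas any sustained motion inflates it above the threshold.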
Step 3.2, if the vehicle is detected to be in a non-static state, the linear-motion detection part is entered. A turning detection factor based on the angular-velocity mean is designed from the angular-velocity measurements: the mean angular rate |ω̄| over a sliding window of N samples serves as the criterion for vehicle turning.
In the formula, |ω̄| is the mean of the angular velocity and N is the number of data points in the sliding window; ω_z is the gyroscope output in the z-axis direction; TH_ω is the set angular-rate mean threshold, here 2°/s. When the z-axis angular rate exceeds TH_ω, i.e., equation (27) is satisfied, the vehicle is judged to be in a turning state; otherwise it is judged to be going straight.
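The turn criterion reduces to a one-line test; the sketch below assumes the gyroscope z-axis samples are already in degrees per second:

```python
import numpy as np

def is_turning(gyro_z_window, th_omega_dps=2.0):
    """Turn detection: the mean absolute z-axis angular rate over the sliding
    window exceeding the threshold TH_omega (2 deg/s here) indicates a turn."""
    return bool(np.mean(np.abs(gyro_z_window)) > th_omega_dps)
```

When this returns True, the robust kernel of step 3.2 is engaged; when False, the straight-driving nonholonomic constraints of step 3.3 apply instead.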
When the vehicle turns, image blur causes large visual errors and feature-point matching is prone to mistakes, so a robust kernel function is added to the vision/IMU tight-combination process to improve the tolerance of visual navigation: a Huber robust kernel is applied to the visual residual term in equation (15). When the residual term is large, the original quadratic function becomes linear, reducing the influence of erroneous data.
where δ is the threshold on the visual residual term for judging the quality of the visual data. Data of good quality from straight driving and data with blurred, mismatched turning images are collected separately, and the resulting visual residual terms are compared to determine δ. For example, if the residual values obtained from good-quality data are all less than 0.5 while the residual values under image blur are all greater than 0.5, δ is set to 0.5.
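The Huber kernel's quadratic-to-linear transition can be written directly; this is the standard Huber loss with the document's δ = 0.5 as default, not the patent's exact weighting code:

```python
def huber(r, delta=0.5):
    """Huber robust kernel: quadratic for |r| <= delta, linear beyond, so
    large (likely erroneous) visual residuals during turns are down-weighted."""
    a = abs(r)
    return 0.5 * r * r if a <= delta else delta * (a - 0.5 * delta)
```

Within ±δ the cost matches the ordinary least-squares term; beyond it the growth is only linear, which caps the influence of blurred or mismatched features on the optimization.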
Step 3.3, if the algorithm judges the vehicle to be in a straight-driving state, the vehicle does not sideslip laterally or move perpendicular to the road surface during travel, so nonholonomic constraints are added to the Kalman filtering process of step 4.3 to correct the accumulated error of inertial navigation, i.e., the lateral and vertical velocities are set to zero.
Step 4, obtaining the final positioning result based on the local camera pose, the adaptive-algorithm switching result, Kalman filtering, and a GNSS/IMU/vision loose-combination fusion positioning algorithm.
Step 4.1, GNSS position information resolving
Satellites whose elevation angles are below the satellite elevation-angle threshold are removed. The satellite elevation threshold is empirically set to α_thre = 15°. The satellite elevation angles α_sat (sat = 1, 2, …, num_vis) are traversed, where num_vis is the number of visible satellites in the current epoch. If α_sat > α_thre is satisfied, the observation data of satellite sat are retained; otherwise the observation data of satellite sat are deleted. The screened observation information is then solved by a pseudo-range single-point positioning algorithm to obtain the GNSS position information.
And 4.2, initializing the VIO/GNSS, and converting the local camera pose obtained in the step 2 and the GNSS position information obtained by resolving in the step 4.1 into corresponding information in a geocentric and geostationary coordinate system.
The pose estimate of the VIO is in a local coordinate system, so it must be converted to the Earth-centered Earth-fixed (ECEF) coordinate system. The GNSS position information of the first minute while stationary at the start point is recorded in step 4.1, i.e., the position in the ECEF coordinate system of the origin of the northeast coordinate system and of the carrier coordinate system (which share the same origin). The heading-angle bias ψ between the carrier coordinate system and the northeast coordinate system is corrected using low-noise Doppler measurements. The rotation matrix from the northeast coordinate system to the ECEF coordinate system is shown in equation (30).
where n is the northeast coordinate system; e is the ECEF coordinate system; α and β are respectively the latitude and longitude of the origin of the northeast coordinate system. The rotation matrix R^n_b from the carrier coordinate system to the northeast coordinate system is shown in equation (31).
Therefore, the transformation relationship from the carrier coordinate system to ECEF is shown in equation (32).
where the first term on the right is the coordinates of the receiver (GNSS) in ECEF, and the second is the offset between the receiver position and the origin of the carrier coordinate system, expressed in the carrier coordinate system. At this point the GNSS position information and the navigation parameters p_vio and q_vio output by the VIO are unified in the ECEF coordinate system.
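The local-to-ECEF rotation of equation (30) follows the standard east-north-up (ENU) convention; the sketch below is that textbook matrix, which may differ in sign convention from the patent's exact formula:

```python
import numpy as np

def R_enu_to_ecef(lat_deg, lon_deg):
    """Rotation matrix from the local ENU (east-north-up) frame at latitude
    alpha and longitude beta to the ECEF frame; its columns are the ENU basis
    vectors expressed in ECEF coordinates."""
    a, b = np.radians(lat_deg), np.radians(lon_deg)
    return np.array([
        [-np.sin(b), -np.sin(a) * np.cos(b), np.cos(a) * np.cos(b)],
        [ np.cos(b), -np.sin(a) * np.sin(b), np.cos(a) * np.sin(b)],
        [ 0.0,        np.cos(a),             np.sin(a)],
    ])
```

At latitude 0, longitude 0 the east axis maps to ECEF +y and the up axis to ECEF +x, and the matrix is orthonormal, as expected of a rotation.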
Step 4.3, GNSS/IMU/visual loose integrated navigation based on Kalman filtering
The local camera pose and the GNSS position information obtained in step 4.2, expressed in the geocentric geofixed coordinate system, together with the adaptive-algorithm switching result of step 3, are input into the Kalman filter, which outputs the corresponding error correction information. The output error correction information is fed into the sliding-window-based tight-combination nonlinear optimization of the IMU/vision tight-combination process of step 2.2 to feedback-correct the navigation parameters, and the corrected pose information is finally output as the final positioning result.
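A single loose-coupling measurement update of the Kalman filter can be sketched as below; the state layout (position-only observation, generic H and R) is an illustrative assumption, not the patent's full error-state model:

```python
import numpy as np

def kf_position_update(x, P, z_gnss, H, R):
    """One loose-coupling Kalman update: the GNSS position (already converted
    to ECEF) observes the VIO position states, and the corrected state is fed
    back to the navigation solution."""
    y = z_gnss - H @ x                      # innovation
    S = H @ P @ H.T + R                     # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)          # Kalman gain
    x_new = x + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new
```

With equal prior and measurement covariances, the update splits the difference between the VIO prediction and the GNSS fix, illustrating the loose combination's blending behavior.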
Fig. 7, Fig. 8, Table 1, and Table 2 compare the three-dimensional position errors in the northeast coordinate system and the attitude errors, together with their root-mean-square errors, of navigation and positioning by the prior art and by the multi-source fusion navigation and positioning method combining vehicle motion state and environmental perception provided by the embodiment of the present invention.
Table 1 three-dimensional position root mean square error table of the present application and prior art
Table 2 root mean square error table of attitude of the present application and prior art
The invention provides a multi-source fusion navigation positioning method based on motion state and environment perception, and there are many methods and ways to implement the technical scheme. The above description is only a specific embodiment of the invention; it should be noted that a person skilled in the art can make several improvements and refinements without departing from the principle of the invention, and these should also be regarded as falling within the protection scope of the invention. All components not specified in the present embodiment can be realized by the prior art.
Claims (10)
1. A multi-source fusion navigation positioning method based on motion state and environment perception is characterized by comprising the following steps:
step 1, collecting image frames by using a camera, eliminating dynamic points in the image frames, and obtaining static characteristic points and positions of the static characteristic points of each image frame;
step 2, obtaining the position of a local camera based on the feature point position of each frame of image and a visual and IMU tight combination fusion positioning algorithm;
step 3, obtaining the motion state of the vehicle based on the IMU measurement data; switching the self-adaptive algorithm according to the motion state of the vehicle;
and 4, obtaining a final positioning result based on the local camera pose, the switching result of the adaptive algorithm, Kalman filtering, and a GNSS/IMU/vision loose-combination fusion positioning algorithm.
2. The multi-source fusion navigation positioning method based on motion state and environment perception according to claim 1, wherein the step 1 comprises:
step 1.1, semantically segmenting pedestrians and vehicles in each image frame: for the image frame set OPic collected by the camera, feature points of each frame of image are first extracted and tracked with the KLT sparse optical flow method to obtain the feature-point positions, and semantic segmentation of vehicles and pedestrians is performed based on a convolutional neural network to obtain the feature points of each frame of image belonging to the semantic category "vehicle" or "pedestrian";
step 1.2, performing pre-integration on IMU measurement data to obtain an IMU pre-integration result, namely a relative pose relationship of two adjacent frames of images, including relative position, speed and attitude change relationships of the two adjacent frames of images, wherein the measurement data comprises measurement values output by an accelerometer and a gyroscope;
step 1.3, removing dynamic points: based on the IMU pre-integration result, the feature points of each frame of image belonging to the semantic category "vehicle" or "pedestrian" are projected into the next frame of image as their predicted positions in the next frame; pixel points whose tracked actual positions in the next frame are inconsistent with the predicted positions, i.e., dynamic points, are rejected, and pixel points whose actual positions are consistent with the predicted positions are taken as static points, thereby obtaining the static feature points of the next frame of image and their positions.
3. The multi-source fusion navigation positioning method based on motion state and environment perception according to claim 2, wherein the convolutional neural network in step 1.1 is an improved U-net convolutional neural network whose encoder and decoder include a Ghost module, and the semantic segmentation of vehicles and pedestrians based on the convolutional neural network comprises:
step 1.1.1, training the improved U-net convolutional neural network model with a data set; the data set is obtained either by acquiring road images of different scenes in advance with a vehicle-mounted camera to establish a vehicle and pedestrian data set, or by directly downloading an existing public road data set; the improved U-net convolutional neural network model is trained to obtain a trained improved U-net convolutional neural network;
step 1.1.2, inputting the image frame set OPic collected by the camera into the encoder of the trained improved U-net convolutional neural network to obtain high-level feature vectors of the image frame set OPic;
step 1.1.3, inputting the high-level feature vectors of the image frame set OPic into the decoder of the trained improved U-net convolutional neural network to obtain a feature image for semantic category segmentation;
step 1.1.4, inputting the feature image for semantic category segmentation into a softmax layer to obtain, for each frame of image in the image frame set OPic, the feature points belonging to the vehicle or pedestrian semantic categories.
4. The multi-source fusion navigation positioning method based on motion state and environment perception according to claim 3, wherein the step 2 comprises:
step 2.1, performing VIO initialization based on the feature point positions of each frame of image and the IMU pre-integration result to obtain the extrinsic parameters between the camera and the IMU, the depths of the feature points, and initial values of the tightly coupled position, velocity and attitude;
and step 2.2, performing the sliding-window tightly coupled visual-IMU nonlinear optimization, in a nonlinear optimization manner, from the extrinsic parameters between the camera and the IMU, the depths of the feature points, and the initial values of the tightly coupled position, velocity and attitude, and solving for the optimal camera pose within the sliding window, namely the local camera pose.
5. The multi-source fusion navigation positioning method based on motion state and environment perception according to claim 4, wherein in step 2.1, based on the feature point positions of each frame of image, the pose transformation of the image frames is recovered through the structure-from-motion technique, and this pose transformation is then aligned with the IMU pre-integration result, so as to recover the extrinsic parameters between the camera and the IMU and obtain the transformation relation between the IMU coordinate system and the world coordinate system;
the step 2.2 comprises: obtaining the state vector X of the tightly coupled visual-IMU sliding window, formulating the tightly coupled visual-IMU nonlinear optimization problem as a minimum reprojection error problem, and solving it with the nonlinear optimization library Ceres Solver, so as to obtain the optimal camera pose within the sliding window;
the error function to be optimized in the tightly coupled visual-IMU nonlinear optimization problem is:
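The equation itself did not survive extraction; given the symbols defined in the following paragraph, it is presumably the standard tightly coupled visual-inertial cost (a hedged reconstruction in the style of VINS-type estimators, not the patent's verbatim formula):

```latex
\min_{X}\ \Bigl\{\, \bigl\lVert \gamma_{marg} \bigr\rVert^{2}
 + \sum_{k \in I} \bigl\lVert \gamma_{I}\bigl(\hat{z}_{k}^{k+1}, X\bigr) \bigr\rVert^{2}_{\Sigma_{k}^{k+1}}
 + \sum_{(l,\theta) \in L} \bigl\lVert \gamma_{L}\bigl(\hat{z}_{l}^{\theta}, X\bigr) \bigr\rVert^{2}_{\Sigma_{l}^{\theta}} \,\Bigr\}
```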
in the formula, γmarg is the prior information from marginalization, i.e., the oldest frame in the sliding window and its corresponding state quantities are marginalized out and retained as prior information; γI and γL are the IMU residual term and the visual residual term, respectively; I is the set of all IMU measurements, and Σ_k^(k+1) is the covariance matrix of the IMU pre-integration noise term between the k-th frame image and the (k+1)-th frame image; L is the set of feature points observed at least twice in the current sliding window, and Σ_l^θ is the covariance matrix of the visual observation noise at the pixel coordinate of the l-th feature point in the θ-th frame image.
6. The multi-source fusion navigation positioning method based on motion state and environment perception according to claim 5, wherein in step 3, the switching of the adaptive algorithm means constraining and correcting, according to the motion state of the vehicle, the loosely coupled GNSS, IMU and visual fusion positioning algorithm and the tightly coupled visual-IMU fusion positioning algorithm with zero-velocity correction, a robust kernel function and a nonholonomic constraint, respectively;
the step 3 comprises the following steps:
step 3.1, designing a sliding window of size N over the measured values output by the accelerometer and computing an acceleration standard-deviation factor over it; judging from this factor whether the vehicle is in a static state, and if so, performing static zero-velocity correction in the loosely coupled GNSS, IMU and visual fusion positioning algorithm;
step 3.2, if the vehicle is detected to be in a non-static state, the vehicle is either turning or going straight; if the vehicle is in a turning state, adding a robust kernel function to the tightly coupled visual-IMU fusion positioning algorithm;
and step 3.3, if the vehicle is in a straight-going state, adding a nonholonomic constraint to the loosely coupled GNSS, IMU and visual fusion positioning algorithm to correct the accumulated error of inertial navigation.
7. The multi-source fusion navigation positioning method based on motion state and environment perception according to claim 6, wherein the step 4 comprises:
step 4.1, resolving GNSS position information;
step 4.2, initializing the VIO and the GNSS, and converting the local camera pose obtained in step 2 and the GNSS position information resolved in step 4.1 into corresponding information in the Earth-centered Earth-fixed (ECEF) coordinate system;
and step 4.3, obtaining a global estimated pose, namely the final positioning result, based on Kalman filtering and the loosely coupled GNSS, IMU and visual fusion positioning algorithm.
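The conversion into the ECEF frame used in step 4.2 follows the standard WGS-84 geodetic-to-ECEF formulas. The sketch below is illustrative only; the function name and interface are assumptions, not from the patent.

```python
import math

# WGS-84 ellipsoid constants
A = 6378137.0                 # semi-major axis (m)
F = 1.0 / 298.257223563       # flattening
E2 = F * (2.0 - F)            # first eccentricity squared

def geodetic_to_ecef(lat_deg, lon_deg, h):
    """Convert GNSS latitude/longitude (degrees) and height (m)
    to Earth-centered Earth-fixed x, y, z coordinates (m)."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    # prime-vertical radius of curvature at this latitude
    n = A / math.sqrt(1.0 - E2 * math.sin(lat) ** 2)
    x = (n + h) * math.cos(lat) * math.cos(lon)
    y = (n + h) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - E2) + h) * math.sin(lat)
    return x, y, z
```

At the equator and prime meridian this returns the semi-major axis on the x axis; at the pole, z equals the semi-minor axis.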
8. The multi-source fusion navigation positioning method based on motion state and environment perception according to claim 7, wherein the step 3.1 includes:
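The equation for the acceleration standard-deviation factor of step 3.1 was lost in extraction; given the symbols defined in the following paragraph, it is presumably the windowed sample standard deviation of the acceleration magnitude with the unbiased (N−1) estimate (a hedged reconstruction):

```latex
\sigma_{t}=\sqrt{\frac{1}{N-1}\sum_{i=t-N+1}^{t}\bigl(\lVert a_{i}\rVert-\mu\bigr)^{2}},
\qquad
\lVert a_{i}\rVert=\sqrt{(a_{x}^{i})^{2}+(a_{y}^{i})^{2}+(a_{z}^{i})^{2}},
\qquad
\mu=\frac{1}{N}\sum_{i=t-N+1}^{t}\lVert a_{i}\rVert
```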
wherein a_x^t, a_y^t and a_z^t are, respectively, the outputs of the accelerometer's x, y and z axes at time t; i is the time index of the acceleration data within the sliding window, i ∈ [t−N+1, t]; μ is the average of all accelerations within the sliding window, taken as the mathematical expectation, where an unbiased estimate is used;
during the movement of the vehicle, a vehicle standstill threshold is set to judge whether the vehicle is stationary; when the acceleration standard deviation of the vehicle falls below this threshold, the vehicle is judged to be in a zero-velocity static state and zero-velocity correction is applied, namely, in the Kalman filtering process of step 4.3, the velocity solved by inertial navigation is used as the observation of the system velocity error for Kalman filtering estimation, and the updated state estimate is fed back to correct the navigation parameter results in the INS subsystem, thereby completing the static zero-velocity correction;
wherein the vehicle standstill threshold is made dynamic: a sliding window of size M is maintained over the acceleration standard deviations at different moments, so that the standstill threshold adapts as window M slides, allowing the zero-velocity state of the vehicle to be detected more flexibly:
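The dynamic-threshold equation did not survive extraction; from the weights described in the next paragraph, it is presumably a normalized weighted sum of the recent standard deviations (a hedged reconstruction):

```latex
TH_{\sigma}(t)=\sum_{j=t-M+1}^{t}\eta_{j}\,\sigma_{j},
\qquad
\sum_{j=t-M+1}^{t}\eta_{j}=1
```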
wherein j is the time index of the acceleration standard deviations within the sliding window M, j ∈ [t−M+1, t], and η_j is the weight of each acceleration standard deviation: standard deviations farther from the current time t are weighted less, and more recent ones are weighted more;
defining the upper limit of the size of sliding window M as Mmax and the lower limit as Mmin; Mmin is first taken as the length of the sliding window; when the mean value σ̄ of the current group is less than or equal to the acceleration smoothness threshold ε, the acceleration values over this period are relatively stable and the sliding-window length M is increased; when it is greater than the smoothness threshold ε, the acceleration values are changing in stages, and the sliding-window length M is reduced to improve sensitivity;
wherein σ̄ is the mean value of the set of acceleration standard deviations computed with Mmin as the window length; multiple groups of acceleration outputs of the IMU are collected for vehicles in different motion states, the corresponding values of σ̄ are computed, and from them the acceleration smoothness threshold ε is selected.
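The zero-velocity detection of claim 8 can be sketched as follows. This is a minimal illustration under stated assumptions: the linearly increasing weights, the one-step window growth/shrink rule and all function names are choices of this sketch, not from the patent.

```python
import numpy as np

def accel_std(acc_window):
    """Unbiased (N-1) standard deviation of acceleration magnitudes
    within a sliding window of N three-axis accelerometer samples."""
    mags = np.linalg.norm(acc_window, axis=1)
    return mags.std(ddof=1)

def adaptive_threshold(sigmas, weights=None):
    """Weighted mean of recent std-deviation factors; more recent
    values receive larger weights (linear ramp, normalized to 1)."""
    sigmas = np.asarray(sigmas, dtype=float)
    if weights is None:
        w = np.arange(1, len(sigmas) + 1, dtype=float)  # newer -> larger
        weights = w / w.sum()
    return float(np.dot(weights, sigmas))

def is_zero_velocity(acc_window, recent_sigmas):
    """Vehicle is static when the current std factor falls below
    the adaptive threshold built from recent std factors."""
    sigma = accel_std(acc_window)
    return sigma < adaptive_threshold(recent_sigmas), sigma

def adapt_window_length(m, sigma_mean, eps, m_min, m_max):
    """Grow M while the acceleration is smooth (mean sigma <= eps),
    shrink it when the acceleration fluctuates, within [m_min, m_max]."""
    return min(m + 1, m_max) if sigma_mean <= eps else max(m - 1, m_min)
```

A stationary vehicle (constant gravity reading) yields a near-zero std factor and is detected as zero-velocity; a vehicle with varying acceleration is not.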
9. The multi-source fusion navigation positioning method based on motion state and environment perception according to claim 8, wherein the step 3.2 includes:
designing, based on the measured angular velocity, a turning detection factor based on the mean angular velocity: the mean angular rate over the sliding window N is used as the criterion for vehicle turning; if the mean angular rate exceeds the threshold, the vehicle is judged to be in a turning state, otherwise it is judged to be in a straight-going state;
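The turning-detection equation was lost in extraction; from the symbols defined in the next line, it is presumably the windowed mean of the absolute z-axis gyroscope output (a hedged reconstruction):

```latex
\bar{\omega}=\frac{1}{N}\sum_{i=t-N+1}^{t}\bigl\lvert \omega_{z}^{i}\bigr\rvert,
\qquad
\bar{\omega}>TH_{\omega}\ \Rightarrow\ \text{turning}
```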
in the formula, ω_z is the output of the gyroscope in the z-axis direction, and TH_ω is the set threshold of the angular velocity mean;
if the vehicle is judged to be in a turning state, a robust kernel function is added to the tightly coupled visual-IMU fusion positioning algorithm, namely a Huber robust kernel function is added to the visual measurement residual term of the error function to be optimized in the tightly coupled visual-IMU nonlinear optimization problem:
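The kernel's equation did not survive extraction; the standard Huber loss applied to the squared visual residual s (in the convention used by, e.g., Ceres Solver) is presumably intended:

```latex
\rho(s)=
\begin{cases}
s, & s \le \delta^{2},\\[2pt]
2\delta\sqrt{s}-\delta^{2}, & s > \delta^{2},
\end{cases}
```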
wherein δ is the threshold for judging whether a visual residual term is excessively large; data of good quality collected while the vehicle drives straight and data with blurred, mismatched turning images are collected separately, and the resulting visual residual terms are compared, thereby determining the threshold δ by which the quality of the visual data is judged;
in step 3.3, if the vehicle is judged to be in a straight-going state, it does not undergo lateral sideslip or motion perpendicular to the road surface while driving, and a nonholonomic constraint is added in the Kalman filtering process of step 4.3 to correct the accumulated error of inertial navigation, namely the lateral and vertical velocities are set to zero.
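The three-way switching of claims 6, 8 and 9 (static / turning / straight, each enabling a different constraint) can be sketched in a few lines. The threshold value, function name and return labels are assumptions of this sketch.

```python
import numpy as np

TH_OMEGA = 0.1  # rad/s; assumed mean-angular-rate threshold, not from the patent

def motion_state(acc_window, gyro_z_window, sigma_threshold):
    """Classify the vehicle state used to switch the fusion constraints:
    'static'   -> apply static zero-velocity correction (claim 8)
    'turning'  -> enable the Huber robust kernel on visual residuals (claim 9)
    'straight' -> apply the nonholonomic constraint (lateral/vertical v = 0)
    """
    mags = np.linalg.norm(acc_window, axis=1)
    if mags.std(ddof=1) < sigma_threshold:      # acceleration std factor
        return "static"
    if np.mean(np.abs(gyro_z_window)) > TH_OMEGA:  # mean |gyro z| factor
        return "turning"
    return "straight"
```

The three branches mirror steps 3.1-3.3: the std-deviation test fires first, and only non-static windows are classified by the angular-rate mean.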
10. The multi-source fusion navigation positioning method based on motion state and environment perception according to claim 9, wherein the step 4.1 includes: removing satellites with satellite elevation angles lower than a satellite elevation angle threshold value to obtain satellite observation data after screening; resolving to obtain GNSS position information through a pseudo-range single-point positioning algorithm according to the screened satellite observation data;
step 4.3 comprises: inputting the local camera pose and GNSS position information in the Earth-centered Earth-fixed coordinate system obtained in step 4.2, together with the switching result of the adaptive algorithm of step 3, into a Kalman filter to obtain error correction information; feeding the error correction information back into the sliding-window tightly coupled nonlinear optimization of step 2.2 to correct the navigation parameters; and finally outputting the corrected pose information as the global estimated pose, namely the final positioning result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210156085.7A CN114199259B (en) | 2022-02-21 | 2022-02-21 | Multi-source fusion navigation positioning method based on motion state and environment perception |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114199259A true CN114199259A (en) | 2022-03-18 |
CN114199259B CN114199259B (en) | 2022-06-17 |
Family
ID=80645739
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210156085.7A Active CN114199259B (en) | 2022-02-21 | 2022-02-21 | Multi-source fusion navigation positioning method based on motion state and environment perception |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114199259B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109405824A (en) * | 2018-09-05 | 2019-03-01 | 武汉契友科技股份有限公司 | A kind of multi-source perceptual positioning system suitable for intelligent network connection automobile |
CN109900265A (en) * | 2019-03-15 | 2019-06-18 | 武汉大学 | A kind of robot localization algorithm of camera/mems auxiliary Beidou |
CN109993113A (en) * | 2019-03-29 | 2019-07-09 | 东北大学 | A kind of position and orientation estimation method based on the fusion of RGB-D and IMU information |
CN110412635A (en) * | 2019-07-22 | 2019-11-05 | 武汉大学 | A kind of environment beacon support under GNSS/SINS/ vision tight integration method |
Non-Patent Citations (4)
Title |
---|
张文宇 et al.: "GNSS/INS integrated navigation positioning and integrity monitoring algorithm for refined UAV management and control", Proceedings of the 11th China Satellite Navigation Conference * |
林灿林: "Research on visual perception algorithms for mobile robots based on semantic ORB-SLAM2", China Master's Theses Full-text Database * |
肖婷婷: "Research on visual odometry/IMU-aided GPS fusion positioning algorithms", China Master's Theses Full-text Database * |
苏景岚: "Research on vehicle-mounted vision/INS/GNSS multi-sensor fusion positioning and attitude determination algorithms", China Master's Theses Full-text Database * |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114821500A (en) * | 2022-04-26 | 2022-07-29 | 清华大学 | Point cloud-based multi-source feature fusion repositioning method and device |
CN114777797A (en) * | 2022-06-13 | 2022-07-22 | 长沙金维信息技术有限公司 | High-precision map visual positioning method for automatic driving and automatic driving method |
CN115128655A (en) * | 2022-08-31 | 2022-09-30 | 智道网联科技(北京)有限公司 | Positioning method and device for automatic driving vehicle, electronic equipment and storage medium |
CN115847446A (en) * | 2023-01-16 | 2023-03-28 | 泉州通维科技有限责任公司 | Inspection robot in bridge compartment beam |
CN116147618B (en) * | 2023-01-17 | 2023-10-13 | 中国科学院国家空间科学中心 | Real-time state sensing method and system suitable for dynamic environment |
CN116147618A (en) * | 2023-01-17 | 2023-05-23 | 中国科学院国家空间科学中心 | Real-time state sensing method and system suitable for dynamic environment |
CN116026316A (en) * | 2023-03-30 | 2023-04-28 | 山东科技大学 | Unmanned ship dead reckoning method coupling visual inertial odometer and GNSS |
CN116026316B (en) * | 2023-03-30 | 2023-08-29 | 山东科技大学 | Unmanned ship dead reckoning method coupling visual inertial odometer and GNSS |
CN116442248A (en) * | 2023-06-19 | 2023-07-18 | 山东工程职业技术大学 | Robot vision positioning module and method based on target detection |
CN117310773A (en) * | 2023-11-30 | 2023-12-29 | 山东省科学院海洋仪器仪表研究所 | Autonomous positioning method and system for underwater robot based on binocular stereoscopic vision |
CN117310773B (en) * | 2023-11-30 | 2024-02-02 | 山东省科学院海洋仪器仪表研究所 | Autonomous positioning method and system for underwater robot based on binocular stereoscopic vision |
CN117473455A (en) * | 2023-12-27 | 2024-01-30 | 合众新能源汽车股份有限公司 | Fusion method and device of multi-source positioning data and electronic equipment |
CN117473455B (en) * | 2023-12-27 | 2024-03-29 | 合众新能源汽车股份有限公司 | Fusion method and device of multi-source positioning data and electronic equipment |
CN118236062A (en) * | 2024-04-02 | 2024-06-25 | 江苏师范大学 | Human body state monitoring method and system based on intelligent wearable equipment |
CN118031914A (en) * | 2024-04-11 | 2024-05-14 | 武汉追月信息技术有限公司 | Urban engineering mapping method based on unmanned aerial vehicle remote sensing technology |
Also Published As
Publication number | Publication date |
---|---|
CN114199259B (en) | 2022-06-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114199259B (en) | Multi-source fusion navigation positioning method based on motion state and environment perception | |
Wen et al. | Tightly coupled GNSS/INS integration via factor graph and aided by fish-eye camera | |
CN110243358B (en) | Multi-source fusion unmanned vehicle indoor and outdoor positioning method and system | |
CN110412635B (en) | GNSS/SINS/visual tight combination method under environment beacon support | |
CN110411462B (en) | GNSS/inertial navigation/lane line constraint/milemeter multi-source fusion method | |
CN110595466B (en) | Lightweight inertial-assisted visual odometer implementation method based on deep learning | |
CN112556719B (en) | Visual inertial odometer implementation method based on CNN-EKF | |
CN111024066A (en) | Unmanned aerial vehicle vision-inertia fusion indoor positioning method | |
CN111880207B (en) | Visual inertial satellite tight coupling positioning method based on wavelet neural network | |
CN107796391A (en) | A kind of strapdown inertial navigation system/visual odometry Combinated navigation method | |
CN110187375A (en) | A kind of method and device improving positioning accuracy based on SLAM positioning result | |
CN113406682A (en) | Positioning method, positioning device, electronic equipment and storage medium | |
CN115574816B (en) | Bionic vision multi-source information intelligent perception unmanned platform | |
CN114964276B (en) | Dynamic vision SLAM method integrating inertial navigation | |
CN108613675A (en) | Low-cost unmanned aircraft traverse measurement method and system | |
CN114693754A (en) | Unmanned aerial vehicle autonomous positioning method and system based on monocular vision inertial navigation fusion | |
CN116168171A (en) | Real-time dense reconstruction method for clustered unmanned aerial vehicle | |
CN115355904A (en) | Slam method for Lidar-IMU fusion of ground mobile robot | |
CN112945233B (en) | Global drift-free autonomous robot simultaneous positioning and map construction method | |
Gao et al. | Vido: A robust and consistent monocular visual-inertial-depth odometry | |
CN117687059A (en) | Vehicle positioning method and device, electronic equipment and storage medium | |
CN116182855B (en) | Combined navigation method of compound eye-simulated polarized vision unmanned aerial vehicle under weak light and strong environment | |
CN112837374B (en) | Space positioning method and system | |
CN116625359A (en) | Visual inertial positioning method and device for self-adaptive fusion of single-frequency RTK | |
CN112923934A (en) | Laser SLAM technology suitable for combining inertial navigation in unstructured scene |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||