CN111928857A - Method and related device for realizing SLAM positioning in a dynamic environment
- Publication number: CN111928857A (application CN202011098360.1A)
- Authority: CN (China)
- Prior art keywords: picture, pictures, point set, frame, next frame
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G01C21/32—Structuring or formatting of map data (navigation in a road network; map- or contour-matching)
- G01C11/00—Photogrammetry or videogrammetry, e.g. stereogrammetry; Photographic surveying
- G01C21/20—Instruments for performing navigational calculations
- G06V10/757—Matching configurations of points or features (image or video recognition using pattern recognition or machine learning)
Abstract
The application discloses a method and a related device for realizing SLAM positioning in a dynamic environment. The method comprises the following steps: acquiring at least two frames of pictures collected by a monocular automobile data recorder while the vehicle is driving; identifying and calibrating the target object in each frame of picture; acquiring the feature points located outside the target object calibration range on each frame of picture; matching the feature points of the frames of pictures to obtain a successfully matched first feature point set; constructing three-dimensional space coordinates of the first feature point set; acquiring a next frame of picture, identifying and calibrating its target object, and acquiring its feature points; determining the pose of the monocular automobile data recorder when the next frame of picture was shot, according to the feature points of the next frame of picture and the three-dimensional space coordinates of the first feature point set; and determining the position of the vehicle according to that pose. The method and the device can remove the interference of dynamic objects on three-dimensional space construction under monocular vision so as to realize instant positioning of the vehicle, and the moving track of the vehicle is obtained through continuously updated positioning.
Description
Technical Field
The present application relates to the field of navigation technologies, and in particular, to a method and a related apparatus for implementing SLAM positioning in a dynamic environment.
Background
SLAM (Simultaneous Localization And Mapping) mainly addresses the problem of performing positioning, navigation and map construction while a mobile device runs in an unknown environment. Positioning and mapping first require data acquisition, and most prior art uses a binocular camera or a laser sensor for this purpose. However, a device with only a monocular camera, such as a monocular automobile data recorder, cannot use these conventional methods when instant positioning, navigation and map construction are required.
In addition, the video images acquired by a driving recorder often contain other moving objects (such as vehicles or pedestrians) whose positions and states generally change in real time; such objects are unstable and are not conducive to map construction. Therefore, how to effectively realize instant positioning and map construction using the video images acquired by a monocular automobile data recorder in a dynamic environment is a technical problem well worth studying.
Disclosure of Invention
The application provides a method and a related device for realizing SLAM positioning in a dynamic environment, which can remove the interference of a dynamic object on the construction of a three-dimensional space in monocular vision so as to realize the instant positioning of a vehicle, and can obtain the moving track of the vehicle by continuously updating the positioning.
A first aspect of the present application provides a method for implementing SLAM positioning in a dynamic environment, including:
acquiring at least two frames of pictures acquired by a monocular automobile data recorder in the driving process of a vehicle;
identifying the target object in the at least two frames of pictures, and calibrating the target object in each frame of picture;
acquiring feature points located outside the calibration range of the target object on each of the at least two frames of pictures;
matching the obtained feature points of the at least two frames of pictures to obtain a first feature point set successfully matched in the at least two frames of pictures;
constructing three-dimensional space coordinates of the first feature point set;
acquiring a next frame of picture acquired by the monocular automobile data recorder, identifying a target object in the next frame of picture, and acquiring feature points of the next frame of picture;
determining the pose of the monocular automobile data recorder when the next frame of picture is shot according to the feature point of the next frame of picture and the three-dimensional space coordinate of the first feature point set;
and determining the position of the vehicle according to the pose of the monocular automobile data recorder when the next frame of picture is shot.
As an optional implementation manner, in the first aspect of the present application, the identifying a target object in the at least two frames of pictures includes:
and identifying the target object in the at least two frames of pictures by using a YOLO network.
As an optional implementation manner, in the first aspect of the present application, the constructing three-dimensional space coordinates of the first feature point set includes:
calculating a rotation matrix and a translation matrix between the at least two frames of pictures by using the first feature point set and adopting epipolar constraint;
and generating the three-dimensional space coordinates of the first feature point set according to the rotation matrix and the translation matrix between the at least two frames of pictures.
As an optional implementation manner, in the first aspect of the present application, the method further includes:
carrying out iterative processing on each frame of picture subsequently acquired by the monocular automobile data recorder to obtain the pose of the monocular automobile data recorder when each frame of picture is shot;
and determining the moving track of the vehicle according to the poses of the monocular automobile data recorder when the respective frames of pictures are shot.
As an optional implementation manner, in the first aspect of the present application, the determining, according to the feature points of the next frame of picture and the three-dimensional space coordinates of the first feature point set, the pose of the monocular automobile data recorder when the next frame of picture is taken includes:
matching the next frame of picture with each of the at least two frames of pictures to respectively obtain a feature point set of the next frame of picture successfully matched with each frame of picture;
according to the feature point sets successfully matched between the next frame of picture and each frame of picture, determining, as a second feature point set, the feature points in the next frame of picture that are simultaneously successfully matched with at least a preset number of the at least two frames of pictures;
determining the three-dimensional space coordinates of the second feature point set according to the three-dimensional space coordinates of the first feature point set;
and determining the pose of the monocular automobile data recorder when the next frame of picture is shot by using the three-dimensional space coordinates of the second feature point set and the positions of the feature points of the second feature point set on the next frame of picture.
As an optional implementation manner, in the first aspect of the present application, the method further includes:
using the remaining feature point set obtained after the second feature point set is removed from the feature point sets successfully matched between the next frame of picture and each frame of picture, calculating the three-dimensional space coordinates of the remaining feature point set by triangulation;
and adjusting the three-dimensional space coordinates of the first feature point set and the three-dimensional space coordinates of the second feature point set by using the three-dimensional space coordinates of the remaining feature point set.
A second aspect of the present application provides an apparatus for implementing SLAM positioning in a dynamic environment, including:
the first acquisition unit is used for acquiring at least two frames of pictures acquired by the monocular automobile data recorder in the driving process of the vehicle;
the recognition unit is used for recognizing the target object in the at least two frames of pictures and calibrating the target object in each frame of picture;
the second obtaining unit is used for obtaining feature points located outside the target object calibration range on each of the at least two frames of pictures;
the matching unit is used for matching the acquired feature points of the at least two frames of pictures to obtain a first feature point set which is successfully matched in the at least two frames of pictures;
the construction unit is used for constructing three-dimensional space coordinates of the first feature point set;
the first obtaining unit is further configured to obtain a next frame of picture acquired by the monocular automobile data recorder;
the identification unit is further used for identifying and calibrating the target object in the next frame of picture;
the second obtaining unit is further configured to obtain a feature point in the next frame of picture, where the feature point is located outside the target object calibration range;
the determining unit is used for determining the pose of the monocular automobile data recorder when the next frame of picture is shot according to the feature point of the next frame of picture and the three-dimensional space coordinate of the first feature point set;
the determining unit is further configured to determine the position of the vehicle according to the pose of the monocular automobile data recorder when the next frame of picture is taken.
As an optional implementation manner, in the second aspect of the present application, the manner of identifying the target object in the at least two frames of pictures by the identifying unit is specifically that:
and identifying the target object in the at least two frames of pictures by using a YOLO network.
A third aspect of the present application provides an apparatus for implementing SLAM positioning in a dynamic environment, including:
a processor; and
a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the method as described above.
A fourth aspect of the present application provides a non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform a method as described above.
According to the technical scheme, two or more frames of pictures collected in sequence by a monocular automobile data recorder during vehicle running are obtained; target object identification is carried out on each frame of picture and the identified target objects are calibrated; the feature points of each frame of picture located outside the target object calibration range are obtained; the feature points of the frames of pictures are matched to obtain a successfully matched first feature point set; and three-dimensional space coordinates are constructed using the first feature point set. Further, the next frame of picture collected by the monocular automobile data recorder can be acquired; after the target object in the next frame of picture is identified and calibrated, the feature points of the next frame of picture are acquired; the pose of the monocular automobile data recorder when the next frame of picture was shot is determined from those feature points and the three-dimensional space coordinates of the first feature point set; and the position of the vehicle when the next frame of picture was shot is then obtained. With this technical scheme, dynamic object recognition can be performed on pictures collected under monocular vision and the interference of dynamic objects on three-dimensional space construction removed, realizing instant positioning of the vehicle, so that the moving track of the vehicle can be obtained by continuously updating the positioning with subsequently collected pictures.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The foregoing and other objects, features and advantages of the application will be apparent from the following more particular descriptions of exemplary embodiments of the application, as illustrated in the accompanying drawings wherein like reference numbers generally represent like parts throughout the exemplary embodiments of the application.
Fig. 1 is a flowchart illustrating a method for implementing SLAM positioning in a dynamic environment according to an embodiment of the present application;
FIG. 2 is a schematic diagram illustrating identification and calibration of a target object on a picture according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a translation matrix and rotation matrix algorithm shown in an embodiment of the present application;
fig. 4 is a schematic diagram of a vehicle movement track obtained by implementing SLAM positioning in a dynamic environment according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of an apparatus for implementing SLAM positioning in a dynamic environment according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of another apparatus for implementing SLAM positioning in a dynamic environment according to an embodiment of the present application.
Detailed Description
Preferred embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms "first," "second," "third," etc. may be used herein to describe various information, these information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present application, "a plurality" means two or more unless specifically limited otherwise.
The technical solutions of the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Referring to fig. 1, an embodiment of the present application provides a method for implementing SLAM positioning in a dynamic environment. As shown in fig. 1, the method may comprise at least the following steps:
110. and acquiring at least two frames of pictures acquired by the monocular automobile data recorder in the driving process of the vehicle.
In this embodiment of the application, the monocular automobile data recorder may be mounted at the front windshield of the vehicle. While the vehicle is driving, the monocular automobile data recorder collects video data of the scene ahead of the vehicle. To obtain pictures, frames need to be extracted from the captured video. The frame rate of the video is typically 30 frames per second, and frames can be extracted according to a preset rule to obtain the pictures. The at least two frames of pictures can be two or more consecutive frames collected by the monocular automobile data recorder in time sequence. Specifically, they may be real-time pictures obtained by extracting frames from the real-time video captured while the vehicle is driving, or several frames in a picture sequence obtained by extracting frames from the whole video captured over the whole driving process, which is not limited here.
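As an illustration of this frame-extraction step, the following is a minimal sketch in Python with OpenCV; the `step` value is an assumed decimation rule (e.g. keeping one frame out of every ten of a 30-frames-per-second video), not a value prescribed by the application:

```python
import cv2

def extract_frames(video_path: str, step: int = 10):
    """Yield every `step`-th frame of the recorder video as an image."""
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:           # end of video
            break
        if idx % step == 0:  # preset rule: keep one frame every `step`
            yield frame
        idx += 1
    cap.release()
```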
It can be understood that, in the embodiment of the present application, a monocular automobile data recorder on a vehicle is taken as an example for description, and the monocular automobile data recorder may also be other monocular devices on the vehicle, such as a monocular camera, a mobile phone, and other devices capable of acquiring a monocular video. In addition, the monocular device may be disposed at the head of the vehicle to capture the video in front of the vehicle, or may be disposed at the tail of the vehicle to capture the video behind the vehicle, which is not limited herein.
120. And identifying the target object in the at least two frames of pictures, and calibrating the target object in each frame of picture.
In the embodiment of the application, when the driving recorder collects the video images in front of or behind the vehicle, some dynamic objects such as other driving vehicles, people and animals walking on sidewalks or two sides of roads, or moving objects such as airplanes or kites flying on the sky are often collected. The target object can be regarded as one or more kinds of preset dynamic objects. Since the position and the posture of the dynamic object may change in real time, if a three-dimensional space is constructed using the characteristics of the dynamic object, the accuracy of the constructed three-dimensional space is poor. Therefore, in order to ensure the accuracy of the three-dimensional space construction, it is necessary to eliminate the influence of the dynamic object on the three-dimensional space construction, so that all the dynamic objects on the picture need to be found first.
Specifically, all the dynamic objects in a picture can be identified, and the identified dynamic objects are calibrated. Taking the frame of picture shown in fig. 2, acquired by the monocular automobile data recorder while the vehicle was driving, as an example: dynamic object identification is performed on the picture, all vehicles and pedestrians in it are identified, and each identified vehicle and pedestrian is calibrated with a rectangular frame so as to calibrate its position and size. The position and size of a dynamic object may be determined from the coordinate positions of the four vertices of the rectangular frame, or the range of the rectangular frame may be determined by a vector with one of the vertices as the origin, and so on. It is understood that the dynamic objects may also be calibrated with a circle, an ellipse, or other regular or irregular shapes, which is not limited here.
In an optional embodiment, a specific implementation of the step 120 of identifying the target object in the at least two pictures may include the following steps:
11) identifying the target object in the at least two frames of pictures by using a YOLO network.
YOLO is an object recognition and positioning algorithm based on a deep neural network. Before the identification operation, training samples can be constructed in advance: a plurality of sample pictures are collected, the target objects in the sample pictures are calibrated, and the calibrated sample pictures are used for training to obtain a sample model; the at least two frames of pictures are then input into the sample model so as to identify all the target objects on each picture. It is understood that a convolutional neural network (CNN), the R-CNN algorithm, or other algorithms may also be used to identify the target object in the picture, which is not limited here.
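For illustration only, a detection step of this kind could be sketched as follows with OpenCV's DNN module; the configuration and weight file names, the input size, and the thresholds are assumptions, not part of the application:

```python
import cv2
import numpy as np

# Assumed pretrained Darknet files; any YOLO variant loadable by OpenCV would do.
net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")

def detect_dynamic_objects(img, conf_thresh=0.5, nms_thresh=0.4):
    """Return rectangular calibration boxes [x, y, w, h] for dynamic objects."""
    h, w = img.shape[:2]
    blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    boxes, scores = [], []
    for out in net.forward(net.getUnconnectedOutLayersNames()):
        for det in out:                      # det = [cx, cy, bw, bh, obj, class scores...]
            conf = float(det[4] * det[5:].max())
            if conf > conf_thresh:
                cx, cy, bw, bh = det[:4] * np.array([w, h, w, h])
                boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
                scores.append(conf)
    keep = cv2.dnn.NMSBoxes(boxes, scores, conf_thresh, nms_thresh)
    return [boxes[i] for i in np.array(keep).flatten()]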
130. And acquiring the characteristic points outside the calibration range of the target object on each of the at least two pictures.
In the embodiment of the present application, the feature points on the picture may be used to identify some marker objects on the picture, and generally, a point where the gray value on the picture changes drastically or a point with a large curvature on the edge of the picture (e.g., an intersection of two edges) is regarded as a feature point of the picture. For better subsequent picture matching, stable points in the picture that do not change with the movement, rotation or illumination change of the camera can be generally selected as feature points. Therefore, points on other areas except the target object calibration range on each frame of picture can be selected as the feature points. For example, in fig. 2, feature points in a fixed building (such as a roadside house), a fixed tree, a billboard, or the like may be selected, and feature points on a target object such as a vehicle or a pedestrian, the sky, or the ground may not be selected.
140. And matching the acquired feature points of the at least two frames of pictures to obtain a first feature point set successfully matched in the at least two frames of pictures.
In the embodiment of the present application, the at least two frames of pictures may contain the same objects (such as a building, a billboard, or a guideboard) seen from different viewing angles. By matching the feature points on the pictures, feature points of the same object on different pictures can be successfully matched. The first feature point set is the set of feature points successfully matched on every one of the at least two frames of pictures. For example, when the at least two frames of pictures include only two pictures (say the A and B frame pictures), the first feature point set consists of the feature points successfully matched between the A and B pictures; when the at least two frames of pictures include the A, B and C frame pictures, the first feature point set consists of the feature points that are successfully matched across all three pictures simultaneously; that is, a successfully matched feature point must appear on all of the A, B and C pictures, and cannot appear on only one or two of them.
In an optional implementation manner, the specific implementation manner of obtaining the feature point located outside the calibration range of the target object on each of the at least two pictures in step 130 may include the following steps:
12) extracting the feature points located outside the target object calibration range on each of the at least two frames of pictures by using the BRISK operator, and describing the feature points of each picture; the described feature points serve as the feature points of that picture.
The specific implementation manner of the step 140 of matching the obtained feature points of the at least two frames of pictures to obtain the first feature point set successfully matched in the at least two frames of pictures may include the following steps:
13) and matching the feature points described by the at least two frames of pictures, and determining the feature points with the matching distance smaller than a preset value as a first feature point set which is successfully matched.
Specifically, the BRISK algorithm performs well in image registration because of its rotation invariance, scale invariance, and robustness. A feature point of a picture consists of two parts: a key point and a descriptor. The BRISK algorithm mainly uses FAST9-16 to detect feature points and takes the points with higher scores as the feature points (i.e., key points), which completes feature point extraction. Key-point information alone is not sufficient for reliable matching, so more detailed information is needed to distinguish features; therefore, each feature point is described to obtain a feature descriptor. The feature descriptor eliminates the changes of scale and direction caused by viewing-angle changes, so the pictures can be matched better. Each feature descriptor on a picture is meant to be unique, with the similarity between different descriptors reduced as much as possible. A BRISK feature descriptor is represented by a binary string, such as a 256-bit or 512-bit binary number.
Matching the feature descriptors of the frames of pictures specifically means matching a given feature descriptor on one frame of picture against all feature descriptors on the other frames of pictures, computing the matching distances (such as Hamming distances), and taking as the matching point the feature point on the other frame whose matching distance is the smallest and below a preset value. In this way, all the feature points on each frame of picture can be matched one by one, and the successfully matched feature points are found. It is understood that, after the matching distance is obtained, the uv coordinates of the feature points on the pictures may also be taken into account: for example, a pair is determined as matching feature points only when the matching distance is smaller than the preset value and the difference between their uv coordinates is within an allowable range; otherwise the points are not matched.
When a feature point on one frame of picture matches feature points on some of the other frames of pictures but finds no match on the remaining frame or frames, it can be regarded as an invalid feature point and discarded. When a feature point on one frame of picture can find a matched feature point on every other frame of picture, it can be regarded as a valid feature point. All the valid feature points collected together can be regarded as the first feature point set.
For example, when the at least two frames of pictures only include the successively collected A and B frame pictures, suppose that the BRISK algorithm extracts 100 feature points from the A frame picture and 200 feature points from the B frame picture. The feature points in the two pictures are described to obtain the corresponding feature descriptors; after all the feature descriptors on the two pictures are matched one by one, 50 successfully matched feature points are obtained, that is, 50 feature points on the A frame picture match 50 feature points on the B frame picture one by one. The first feature point set then includes the successfully matched 50 feature points on the A frame picture and 50 feature points on the B frame picture, i.e., the first feature point set can be regarded as 50 pairs of feature points.
For another example, when the at least two frames of pictures include the successively collected A, B and C frame pictures, suppose that the BRISK algorithm extracts 100 feature points from the A frame picture, 150 from the B frame picture, and 120 from the C frame picture. The feature points in the three pictures are described to obtain the corresponding feature descriptors; after all the feature descriptors on the three pictures are matched one by one, 50 feature points are obtained that are successfully matched across the A, B and C frame pictures simultaneously. The first feature point set then includes the successfully matched 50 feature points on each of the three pictures, i.e., the first feature point set can be regarded as 50 groups of feature points.
It is understood that other algorithms (such as ORB, SURF, or SIFT) may also be used to extract and describe the picture feature points; different algorithms may yield different registration results.
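As a rough sketch of steps 130-140 under these choices (BRISK features, Hamming-distance matching), assuming `boxes` comes from a detector like the one sketched above; the maximum-distance threshold is illustrative, and the cross-check plays the role of the mutual best-match test described above:

```python
import cv2

brisk = cv2.BRISK_create()
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def features_outside_boxes(img, boxes):
    """BRISK key points and descriptors lying outside all calibration boxes."""
    kps, descs = brisk.detectAndCompute(img, None)
    keep = [i for i, kp in enumerate(kps)
            if not any(x <= kp.pt[0] <= x + bw and y <= kp.pt[1] <= y + bh
                       for (x, y, bw, bh) in boxes)]
    return [kps[i] for i in keep], descs[keep]

def match_features(descs_a, descs_b, max_dist=60):   # assumed distance threshold
    """Cross-checked Hamming matches whose distance is below `max_dist`."""
    return [m for m in matcher.match(descs_a, descs_b) if m.distance < max_dist]
```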
150. And constructing three-dimensional space coordinates of the first feature point set.
In the embodiment of the application, based on the successfully matched first feature point set, the pose change, namely the translation amount and the rotation amount, of the monocular automobile data recorder during acquisition of each frame of picture can be calculated by utilizing the epipolar geometry. And then, the three-dimensional space coordinates of the first characteristic point set can be calculated by using the translation amount and the rotation amount among the frames of pictures.
Specifically, in an alternative embodiment, the specific implementation of constructing the three-dimensional space coordinates of the first feature point set in step 150 may include the following steps:
14) calculating a rotation matrix and a translation matrix between the at least two frames of pictures by using the first feature point set and by adopting epipolar constraint;
15) and generating the three-dimensional space coordinates of the first feature point set according to the rotation matrix and the translation matrix between the at least two frames of pictures.
For example, when the at least two frames of pictures include only the successively collected A and B frame pictures, the feature points of the two pictures are matched to obtain 8 matching points, that is, the first feature point set includes 8 point pairs. From the 8 point pairs, the rotation matrix and translation matrix of the B frame picture with respect to the A frame picture can be calculated.
Specifically, as shown in fig. 3, when two frames of pictures of the same target object are taken at different positions, the pixel points corresponding to the same object in the two pictures satisfy an epipolar constraint relationship. Here $P$ is a real object point in the world coordinate system, such as a point on a building; $O_1$ and $O_2$ are the optical center positions of the monocular automobile data recorder when the A frame picture and the B frame picture were shot, respectively; $I_1$ and $I_2$ denote the A frame picture and the B frame picture; and $p_1$, $p_2$ are the projections of the point $P$ in the A frame picture and the B frame picture, that is, a pair of successfully matched points in the A and B frames. The projection of $O_1P$ on the B frame picture is the line $e_2 p_2$, denoted $l_2$; the projection of $O_2P$ on the A frame picture is the line $e_1 p_1$, denoted $l_1$. Here $l_1$, $l_2$ are called epipolar lines and $e_1$, $e_2$ are called epipoles. According to the epipolar constraint, with $x_1$, $x_2$ the normalized image coordinates of $p_1$, $p_2$:

$$x_2^{T}\, t^{\wedge} R\, x_1 = 0$$

obtaining:

$$x_2^{T} E\, x_1 = 0$$

wherein:

$$E = t^{\wedge} R$$

$E$ is the essential matrix, $t$ is the translation matrix, $R$ is the rotation matrix, and $t^{\wedge}$ is the antisymmetric (cross-product) matrix of $t$.

$E$ is obtained by the 8-point method. Writing $E$ row-wise as a vector $e = (e_1, e_2, \dots, e_9)^{T}$ and expanding the constraint for one matched pair gives:

$$(u_2 u_1,\; u_2 v_1,\; u_2,\; v_2 u_1,\; v_2 v_1,\; v_2,\; u_1,\; v_1,\; 1)\, e = 0$$

wherein $(u_1, v_1)$ are the image coordinates of $p_1$ and $(u_2, v_2)$ are the image coordinates of $p_2$.

The same representation is used for the other point pairs, so that putting all the equations together yields a linear system, with $(u_1^{i}, v_1^{i})$ and $(u_2^{i}, v_2^{i})$ denoting the coordinates of the $i$-th matched point pair:

$$\begin{pmatrix} u_2^{1} u_1^{1} & u_2^{1} v_1^{1} & u_2^{1} & v_2^{1} u_1^{1} & v_2^{1} v_1^{1} & v_2^{1} & u_1^{1} & v_1^{1} & 1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ u_2^{8} u_1^{8} & u_2^{8} v_1^{8} & u_2^{8} & v_2^{8} u_1^{8} & v_2^{8} v_1^{8} & v_2^{8} & u_1^{8} & v_1^{8} & 1 \end{pmatrix} e = 0$$

The essential matrix $E$ is obtained by solving this linear system of equations.

Decomposing $E$ by singular value decomposition, $E = U \Sigma V^{T}$, yields 4 groups of $t$ and $R$ values:

$$t^{\wedge} = \pm\, U R_Z\!\left(\tfrac{\pi}{2}\right) \Sigma\, U^{T}, \qquad R = U R_Z^{T}\!\left(\pm\tfrac{\pi}{2}\right) V^{T}$$

where the two sign choices are independent, giving the 4 combinations. Only one of the 4 combinations reconstructs the points with positive depth in front of both cameras; the combination of $t$ and $R$ whose depths are positive is the translation matrix and rotation matrix of the B frame picture relative to the A frame picture.
It is understood that the above process is illustrated by an eight-point method, but is not limited thereto. When there are more than eight pairs of matched feature points on A, B two frames of pictures, a least square method can be constructed by using epipolar constraint to find a translation matrix and a rotation matrix between the two frames, wherein the least square method is a mature prior art, and a specific implementation process thereof will not be described here.
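In practice this estimation and disambiguation is commonly done in one call. A sketch in OpenCV, with `K` the assumed intrinsic matrix of the recorder: `findEssentialMat` estimates the essential matrix from the matched pairs under the epipolar constraint (RANSAC handles outliers), and `recoverPose` performs the decomposition and the positive-depth check that selects the single valid (R, t):

```python
import cv2

def relative_pose(pts_a, pts_b, K):
    """pts_a, pts_b: Nx2 arrays of matched pixel coordinates (first point set)."""
    E, inliers = cv2.findEssentialMat(pts_a, pts_b, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts_a, pts_b, K, mask=inliers)
    return R, t   # rotation matrix and translation (up to scale, monocular)
```

Note that with a single camera the translation is recovered only up to an unknown scale factor.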
In addition, after the rotation matrix R and the translation matrix t between the respective frames of pictures are obtained using the first feature point set, the three-dimensional space coordinates of the respective feature points in the first feature point set (that is, the 3D positions of the feature points) can be calculated by triangulation.
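A matching sketch of the triangulation, taking the A frame as the reference pose (identity) and (R, t) as the relative pose recovered above; `K` is again the assumed intrinsic matrix:

```python
import cv2
import numpy as np

def triangulate(pts_a, pts_b, K, R, t):
    """3D coordinates of matched points, expressed in the A-frame camera frame."""
    P_a = K @ np.hstack([np.eye(3), np.zeros((3, 1))])  # A frame: identity pose
    P_b = K @ np.hstack([R, t.reshape(3, 1)])           # B frame: relative pose
    pts4d = cv2.triangulatePoints(P_a, P_b, pts_a.T, pts_b.T)
    return (pts4d[:3] / pts4d[3]).T                     # homogeneous to Nx3
```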
160. And acquiring a next frame of picture acquired by the monocular automobile data recorder, identifying a target object in the next frame of picture, and acquiring the feature point of the next frame of picture.
In this embodiment of the application, after the three-dimensional space coordinates are constructed from the at least two frames of pictures, the next frame of picture collected by the monocular automobile data recorder may be acquired in real time, or the next frame of picture following the at least two frames of pictures may be acquired from a picture sequence, which is not limited here. The YOLO algorithm can be used to identify the target object on the next frame of picture and calibrate the identified target object. Further, the feature points of the next frame of picture located outside the target object calibration range may be extracted with the BRISK algorithm, and the extracted feature points are described to obtain feature descriptors.
170. And determining the pose of the monocular automobile data recorder when the next frame of picture is shot according to the feature point of the next frame of picture and the three-dimensional space coordinate of the first feature point set.
In an alternative embodiment, the specific implementation of the step 170 of determining the pose of the monocular automobile data recorder when the next frame of picture is taken according to the feature points of the next frame of picture and the three-dimensional space coordinates of the first feature point set may include the following steps:
16) matching the next frame of picture with each of the at least two frames of pictures to respectively obtain a feature point set of the next frame of picture successfully matched with each frame of picture;
17) according to the feature point sets successfully matched between the next frame of picture and each frame of picture, determining, as a second feature point set, the feature points in the next frame of picture that are simultaneously successfully matched with at least a preset number of the frames of pictures;
18) determining the three-dimensional space coordinates of the second feature point set according to the three-dimensional space coordinates of the first feature point set;
19) and determining the pose of the monocular automobile data recorder when the next frame of picture is shot by using the three-dimensional space coordinates of the second feature point set and the positions of the feature points of the second feature point set on the next frame of picture.
For example, take two frames of pictures as the window, and assume that the at least two frames of pictures include the A and B frame pictures, the next frame of picture is the C frame picture, the A frame picture has 100 feature points, the B frame picture has 200 feature points, and 50 feature points of the A and B frame pictures are successfully matched, i.e., the first feature point set includes 50 pairs of points. Suppose 200 feature points are extracted from the C frame picture, of which 70 are successfully matched with feature points in the A frame picture and 60 with feature points in the B frame picture; the feature points in the C frame picture that are successfully matched with both the A frame picture and the B frame picture are classified into the second feature point set. For example, if the feature point numbered C1 in the C frame picture matches the feature point numbered A3 in the A frame picture and the feature point numbered B2 in the B frame picture, then C1 is a valid feature point, and the group (A3, B2, C1) is one of the feature point groups in the second feature point set. When the feature point numbered C1 matches only the feature point numbered A3 in the A frame picture and no matching feature point is found in the B frame picture, C1 is an invalid feature point (or noise point) and is not included in the second feature point set. Matching in this way, the feature points matched across all three frames of pictures can be found to form the second feature point set.
Suppose that, among the 70 feature points matched between the C and A frame pictures and the 60 matched between the C and B frame pictures, 30 feature points appear in all three frames and are contained in the 50 feature points successfully matched between the A and B pictures; the three-dimensional space coordinates of these 30 feature points can then be extracted from the three-dimensional space coordinates of the 50 feature points. Of course, the three-dimensional space coordinates of the 30 feature points may instead be recalculated directly by triangulation, without limitation here. Further, the pose of the monocular automobile data recorder when the C frame picture was shot can be calculated with a PnP optimization method from the three-dimensional space coordinates of the 30 feature points and their positions (i.e., uv coordinates) on the C frame picture.
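The PnP step itself might look like the following sketch, where `pts3d` are the three-dimensional coordinates of the second feature point set and `pts2d` their uv positions on the new picture; the RANSAC variant is used here purely for robustness and is not prescribed by the application:

```python
import cv2

def pose_from_pnp(pts3d, pts2d, K):
    """Recorder pose when the next frame was shot, from 3D-2D correspondences."""
    ok, rvec, tvec, _ = cv2.solvePnPRansac(pts3d, pts2d, K, None)
    if not ok:
        raise RuntimeError("PnP failed: too few reliable correspondences")
    R, _ = cv2.Rodrigues(rvec)   # rotation vector to rotation matrix
    return R, tvec
```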
For another example, take three frames of pictures as the window, and assume that the at least two frames of pictures include the A, B and C frame pictures, the next frame of picture is the D frame picture, the A frame picture has 100 feature points, the B frame picture has 200, the C frame picture has 150, and 50 feature points of the A, B and C frame pictures are successfully matched, i.e., the first feature point set includes 50 groups of points. Suppose 200 feature points are extracted from the D frame picture, of which 70 are successfully matched with the A frame picture, 60 with the B frame picture, and 65 with the C frame picture. The feature points in the D frame picture that are simultaneously successfully matched with at least two of the A, B and C frame pictures can be classified into the second feature point set: if a feature point in the D frame picture finds matched feature points in all three of the A, B and C frame pictures, or in two of them, it can be considered a valid feature point and is combined with the feature points it matches on the other pictures as one group of feature points in the second feature point set. When a feature point in the D frame picture finds a matching feature point on only one of the A, B and C frame pictures, it can be considered an invalid feature point (or noise point) and is not included in the second feature point set. Matching one by one in this way, the qualified matched feature points are found to form the second feature point set. Further, the pose of the monocular automobile data recorder when the D frame picture was shot is calculated with a PnP (Perspective-n-Point) optimization method from the three-dimensional space coordinates of the second feature point set and the positions of the second feature point set on the D frame picture.
In practical applications, other numbers of frames of pictures can be used as the reference window, such as 4 frames, 5 frames, 6 frames, or other values. When the window size differs, the preset number in step 17) changes accordingly: for example, when the window takes 4 frames of pictures, the preset number can be set to 2, 3 or 4; when the window takes 5 frames of pictures, the preset number may be set to 3, 4 or 5.
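One hypothetical way to organize this window-wide test is sketched below: a feature of the next frame is admitted to the second feature point set only if it is matched in at least `min_views` window pictures (the preset number above); the `matches_per_frame` structure is an assumption for illustration:

```python
from collections import defaultdict

def second_feature_point_set(matches_per_frame, min_views=2):
    """matches_per_frame: {window frame index: [(window kp idx, next-frame kp idx)]}."""
    votes = defaultdict(dict)                 # next-frame kp -> {frame: window kp}
    for frame_idx, pairs in matches_per_frame.items():
        for window_idx, next_idx in pairs:
            votes[next_idx][frame_idx] = window_idx
    # keep feature points matched in at least `min_views` window pictures
    return {n: hits for n, hits in votes.items() if len(hits) >= min_views}
```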
180. And determining the position of the vehicle according to the pose of the monocular automobile data recorder when the next frame of picture is shot.
In the embodiment of the application, the monocular automobile data recorder is arranged on the vehicle, so that the pose of the monocular automobile data recorder when a certain frame of picture is shot can be regarded as the pose of the vehicle at that time, the position of the vehicle can be obtained, and the positioning of the vehicle is realized. Of course, a position relationship may also be preset between the monocular automobile data recorder and the vehicle, and the position of the monocular automobile data recorder may be converted according to the position relationship, so as to obtain the position of the vehicle.
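For example, under the simple assumption that the preset position relationship is a fixed mounting offset of the vehicle reference point expressed in the recorder's camera frame, the conversion is one line of rigid-body algebra (a sketch; a zero offset recovers the recorder position itself):

```python
import numpy as np

def vehicle_position(R_cam, t_cam, offset_cam=np.zeros(3)):
    """Vehicle position in the map frame from the recorder pose (R_cam, t_cam)."""
    centre = (-R_cam.T @ t_cam).ravel()   # recorder optical centre in the map frame
    return centre + R_cam.T @ offset_cam  # shift by the preset mounting offset
```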
In an alternative embodiment, the method depicted in fig. 1 may further include the steps of:
20) using the remaining feature point set obtained after the second feature point set is removed from the feature point sets successfully matched between the next frame of picture and each frame of picture, calculating the three-dimensional space coordinates of the remaining feature point set by triangulation;
21) and adjusting the three-dimensional space coordinates of the first feature point set and the three-dimensional space coordinates of the second feature point set by using the three-dimensional space coordinates of the remaining feature point set.
Still taking two frames of pictures as the window as an example, the number of remaining feature points between the C and A frame pictures is 70-30=40, and between the C and B frame pictures is 60-30=30. The three-dimensional space coordinates of these 40 and 30 remaining feature points are calculated by triangulation, respectively, and are then used to adjust the three-dimensional space coordinates of the first feature point set and of the second feature point set. In this way, the three-dimensional space ranges corresponding to the first and second feature point sets can be expanded and a three-dimensional map containing more information is constructed, which facilitates subsequent picture registration and improves registration accuracy.
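A minimal sketch of this expansion, reusing the hypothetical `triangulate` helper above: the remaining matches are triangulated with the already-estimated relative pose and appended to the map. The "adjustment" of existing coordinates suggests a joint refinement (e.g. bundle adjustment), which is not shown here:

```python
import numpy as np

def expand_map(map_pts, rem_pts_ref, rem_pts_next, K, R_rel, t_rel):
    """Append 3D coordinates of the remaining feature point set to the map."""
    new_pts = triangulate(rem_pts_ref, rem_pts_next, K, R_rel, t_rel)
    return np.vstack([map_pts, new_pts])
```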
In an alternative embodiment, the method depicted in fig. 1 may further include the steps of:
22) carrying out iterative processing on each frame of picture subsequently acquired by the monocular automobile data recorder to obtain the pose of the monocular automobile data recorder when each frame of picture is shot;
23) and determining the moving track of the vehicle according to the pose of the monocular automobile data recorder when each frame of picture is shot.
Take two frames of pictures as the window as an example. The iterative processing may proceed as follows: when the next frame (the D frame) picture is to be registered, the B and C frame pictures are selected as the reference window, the three-dimensional space coordinates of the first feature point set of the B and C frame pictures are constructed, and the feature points of the D frame are matched with the feature points of the B and C frame pictures respectively to obtain the successfully matched second feature point set. The pose of the monocular automobile data recorder when the D frame picture was shot is then determined from the three-dimensional space coordinates of the second feature point set and its positions on the D frame picture, so as to obtain the position of the vehicle at that moment. When the next frame (the E frame) picture is to be registered, the C and D frame pictures are selected as the reference window, the three-dimensional space coordinates of the first feature point set of the C and D frame pictures are constructed, and the feature points of the E frame are matched with the feature points of the C and D frame pictures respectively to obtain the successfully matched second feature point set; the pose when the E frame picture was shot is determined likewise, giving the position of the vehicle at that moment. The iteration continues in this way until the last frame of picture, so as to obtain the position of the vehicle when the last frame of picture was shot.
Take three frames of pictures as the window as another example. The iterative processing may proceed as follows: when the next frame (the E frame) picture is to be registered, the B, C and D frame pictures are selected as the reference window, the three-dimensional space coordinates of the first feature point set of the B, C and D frame pictures are constructed, and the feature points of the E frame are matched with the feature points of the B, C and D frame pictures respectively to obtain the successfully matched second feature point set; the pose of the monocular automobile data recorder when the E frame picture was shot is determined from the three-dimensional space coordinates of the second feature point set and its positions on the E frame picture, giving the position of the vehicle at that moment. When the next frame (the F frame) picture is to be registered, the C, D and E frame pictures are selected as the reference window and the process repeats. The iteration continues in this way until the last frame of picture, so as to obtain the position of the vehicle when the last frame of picture was shot.
As shown in fig. 4, the relative movement track of the vehicle can be determined from the positions of the vehicle when the frames of pictures were taken. If the initial position of the vehicle when the first few frames were shot is known (it can be obtained by a GPS positioning module, a BeiDou positioning device, or an IMU on the vehicle), the actual movement track of the vehicle can be determined from that known initial position. In this embodiment of the application, the GPS or IMU is needed only for positioning at the beginning; subsequently the vehicle position is no longer taken from these sensors but is estimated from the pose changes of the monocular automobile data recorder between the collected pictures.
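Pulling the hypothetical helpers above together, an end-to-end two-frame-window iteration could be sketched as follows; it ignores per-window scale consistency and map adjustment, which a full implementation would have to handle:

```python
import numpy as np

def estimate_trajectory(frames, K):
    """Vehicle positions, one per localized frame, for a two-picture window."""
    positions = []
    for i in range(2, len(frames)):
        imgs = {n: frames[i - 2 + j] for j, n in enumerate("abc")}
        kps, descs = {}, {}
        for n, img in imgs.items():
            kps[n], descs[n] = features_outside_boxes(img, detect_dynamic_objects(img))
        m_ab = match_features(descs["a"], descs["b"])      # first feature point set
        pts_a = np.float32([kps["a"][m.queryIdx].pt for m in m_ab])
        pts_b = np.float32([kps["b"][m.trainIdx].pt for m in m_ab])
        R, t = relative_pose(pts_a, pts_b, K)
        pts3d = triangulate(pts_a, pts_b, K, R, t)
        # second feature point set: window matches that also appear in frame c
        m_bc = {m.queryIdx: m.trainIdx for m in match_features(descs["b"], descs["c"])}
        sel = [(j, m_bc[m.trainIdx]) for j, m in enumerate(m_ab) if m.trainIdx in m_bc]
        obj = np.float32([pts3d[j] for j, _ in sel])
        uv = np.float32([kps["c"][c].pt for _, c in sel])
        R_c, t_c = pose_from_pnp(obj, uv, K)
        positions.append(vehicle_position(R_c, t_c))
    return positions
```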
Therefore, in the embodiment of the application, two or more frames of pictures collected in sequence by a monocular automobile data recorder during vehicle running are obtained; target object identification is performed on each frame of picture and the identified target objects are calibrated; the feature points of each frame of picture located outside the target object calibration range are obtained; the feature points of the frames of pictures are matched to obtain a successfully matched first feature point set; and three-dimensional space coordinates are constructed using the first feature point set. Further, the next frame of picture collected by the monocular automobile data recorder can be acquired; after the target object in the next frame of picture is identified and calibrated, the feature points of the next frame of picture are acquired; the pose of the monocular automobile data recorder when the next frame of picture was shot is determined from those feature points and the three-dimensional space coordinates of the first feature point set; and the position of the vehicle when the next frame of picture was shot is then obtained. With this technical scheme, dynamic object recognition can be performed on pictures collected under monocular vision and the interference of dynamic objects on three-dimensional space construction removed, realizing instant positioning of the vehicle, so that the moving track of the vehicle can be obtained by continuously updating the positioning with subsequently collected pictures.
Referring to fig. 5, an embodiment of the present application provides an apparatus for implementing SLAM positioning in a dynamic environment. The device can be used for executing the method for realizing SLAM positioning in the dynamic environment provided by the embodiment. Specifically, as shown in fig. 5, the apparatus may include:
the first acquiring unit 51 is used for acquiring at least two frames of pictures acquired by the monocular automobile data recorder in the driving process of the vehicle;
the recognition unit 52 is configured to recognize the target object in the at least two frames of pictures and calibrate the target object in each frame of picture;
a second obtaining unit 53, configured to obtain a feature point located outside the calibration range of the target object on each of the at least two frames of pictures;
a matching unit 54, configured to match the obtained feature points of the at least two frames of pictures to obtain a first feature point set successfully matched in the at least two frames of pictures;
a construction unit 55, configured to construct three-dimensional space coordinates of the first feature point set;
the first obtaining unit 51 is further configured to obtain a next frame of picture acquired by the monocular automobile data recorder;
the identifying unit 52 is further configured to identify and calibrate the target object in the next frame of picture;
the second obtaining unit 53 is further configured to obtain a feature point located outside the target object calibration range in the next frame of picture;
the determining unit 56 is configured to determine the pose of the monocular automobile data recorder when the next frame of picture is shot according to the feature point of the next frame of picture and the three-dimensional space coordinate of the first feature point set;
the determining unit 56 is further configured to determine the position of the vehicle according to the pose of the monocular automobile data recorder when the next frame of picture is taken.
Optionally, the specific implementation of the identifying unit 52 identifying the target object in the at least two frames of pictures may be:
and identifying the target object in the at least two frames of pictures by using a YOLO network.
Optionally, the constructing unit 55 may be specifically configured to calculate, by using the first feature point set and epipolar constraint, a rotation matrix and a translation matrix between the at least two frames of pictures, and to generate the three-dimensional space coordinates of the first feature point set according to the rotation matrix and the translation matrix between the at least two frames of pictures.
Optionally, the apparatus shown in fig. 5 may further include:
the iteration unit is used for carrying out iteration processing on each frame of picture subsequently acquired by the monocular automobile data recorder to obtain the pose of the monocular automobile data recorder when each frame of picture is shot;
the determining unit 56 may further be configured to determine the moving track of the vehicle according to the pose of the monocular automobile data recorder when each frame of picture is taken.
Optionally, the specific implementation manner in which the determining unit 56 determines the pose of the monocular automobile data recorder when the next frame of picture is shot, according to the feature points of the next frame of picture and the three-dimensional space coordinates of the first feature point set, may be as follows (an illustrative sketch follows these steps):
matching the next frame of picture with each of the at least two frames of pictures to respectively obtain a feature point set of the next frame of picture successfully matched with each frame of picture;
determining, from the feature point sets successfully matched between the next frame of picture and each frame of picture, the feature points in the next frame of picture that are successfully matched with at least a preset number of the frames as a second feature point set;
determining the three-dimensional space coordinates of the second feature point set according to the three-dimensional space coordinates of the first feature point set;
and determining the pose of the monocular automobile data recorder when the next frame of picture is shot by using the three-dimensional space coordinates of the second feature point set and the positions of the feature points of the second feature point set on the next frame of picture.
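The last step above amounts to a classical Perspective-n-Point (PnP) problem. Below is a minimal sketch with OpenCV, assuming the second feature point set has already been paired into 3-D coordinates and 2-D pixel positions and that the intrinsic matrix K is known:

```python
import cv2
import numpy as np

def pose_from_map(points_3d, points_2d, K):
    """Solve PnP (with RANSAC) for the recorder pose at the next frame.

    points_3d: Nx3 coordinates of the second feature point set
    points_2d: Nx2 positions of the same points on the next frame of picture
    """
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(points_3d, dtype=np.float64),
        np.asarray(points_2d, dtype=np.float64),
        K, distCoeffs=None)
    R, _ = cv2.Rodrigues(rvec)   # rotation vector -> rotation matrix
    return R, tvec
```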
Optionally, the apparatus shown in fig. 5 may further include:
the calculating unit is used for taking the residual feature point set obtained by removing the second feature point set from the feature point sets successfully matched between the next frame of picture and each frame of picture, and calculating the three-dimensional space coordinates of the residual feature point set by triangulation (see the sketch after this list);
and the adjusting unit is used for adjusting the three-dimensional space coordinates of the first feature point set and of the second feature point set by using the three-dimensional space coordinates of the residual feature point set.
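For illustration, the triangulation of the residual feature point set can be sketched as follows; P1 and P2 are assumed to be the 3x4 projection matrices K[R|t] of the two recovered camera poses, and the subsequent coordinate adjustment (a bundle-adjustment-style refinement) is not covered by the sketch:

```python
import cv2
import numpy as np

def triangulate(P1, P2, pts1, pts2):
    """Triangulate matched 2-D points (each Nx2) from two views into 3-D.

    P1, P2: 3x4 projection matrices K @ [R | t], assumed recovered beforehand.
    """
    pts4d = cv2.triangulatePoints(P1, P2,
                                  pts1.T.astype(np.float64),
                                  pts2.T.astype(np.float64))
    return (pts4d[:3] / pts4d[3]).T   # dehomogenize -> Nx3 coordinates
```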
By implementing the device shown in fig. 5, dynamic object recognition can be performed on the pictures acquired under monocular vision and the interference of dynamic objects with the three-dimensional space construction removed, so that instant positioning of the vehicle is realized and the moving track of the vehicle can be obtained by continuously updating the positioning with subsequently acquired pictures.
With regard to the apparatus in the above-described embodiment, the specific manner in which each unit performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated upon here.
Referring to fig. 6, an embodiment of the present application further provides another apparatus for implementing SLAM positioning in a dynamic environment. The apparatus can be used to execute the method for implementing SLAM positioning in a dynamic environment provided by the foregoing embodiments, and may be any device having computing capability, such as a computer, a server, a handheld device (e.g., a smart phone or a tablet computer), or an automobile data recorder; the embodiments of the present application are not limited in this respect. Specifically, as shown in fig. 6, the apparatus 600 may include: at least one processor 601, a memory 602, at least one communication interface 603, and the like, which may be communicatively coupled via one or more communication buses 604. Those skilled in the art will appreciate that the configuration of the apparatus 600 shown in fig. 6 does not limit the embodiments of the present application: the components may be connected in a bus or star topology, and the apparatus may include more or fewer components than shown, combine some components, or arrange components differently. Wherein:
the Processor 601 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 602 may include various types of storage units, such as system memory, read-only memory (ROM), and permanent storage. The ROM may store static data or instructions needed by the processor 601 or other modules. The permanent storage may be a readable and writable, non-volatile storage device that does not lose stored instructions and data even after the computer is powered off; in some embodiments it is a mass storage device (e.g., a magnetic or optical disk, or flash memory), while in other embodiments it is a removable storage device (e.g., a floppy disk or an optical drive). The system memory may be a volatile readable and writable memory device, such as a dynamic random access memory, and may store instructions and data that some or all of the processors require at runtime. In addition, the memory 602 may include any combination of computer-readable storage media, including various types of semiconductor memory chips (DRAM, SRAM, SDRAM, flash memory, programmable read-only memory) and magnetic and/or optical disks. In some embodiments, the memory 602 may include a readable and/or writable removable storage device, such as a compact disc (CD), a read-only digital versatile disc (e.g., DVD-ROM or dual-layer DVD-ROM), a read-only Blu-ray disc, an ultra-density optical disc, a flash memory card (e.g., an SD card, a mini SD card, or a Micro-SD card), or a magnetic floppy disk. Computer-readable storage media do not contain carrier waves or transitory electronic signals transmitted by wireless or wired means.
The communication interface 603 may include a wired communication interface, a wireless communication interface, and the like, and may be used for performing communication interaction with the automobile data recorder, such as acquiring a video image captured by the automobile data recorder.
The memory 602 has stored thereon executable code which, when processed by the processor 601, causes the processor 601 to perform some or all of the steps of the methods described above.
In particular, processor 601 may be configured to invoke one or more executable codes stored in memory 602 to perform the following operations:
acquiring at least two frames of pictures acquired by a monocular automobile data recorder in the driving process of a vehicle;
identifying the target object in the at least two frames of pictures, and calibrating the target object in each frame of picture;
acquiring the feature points outside the calibration range of the target object on each of the at least two frames of pictures;
matching the obtained feature points of the at least two frames of pictures to obtain a first feature point set successfully matched in the at least two frames of pictures;
constructing three-dimensional space coordinates of the first feature point set;
acquiring a next frame of picture acquired by the monocular automobile data recorder, identifying a target object in the next frame of picture, and acquiring feature points of the next frame of picture;
determining the pose of the monocular automobile data recorder when the next frame of picture is shot according to the feature point of the next frame of picture and the three-dimensional space coordinate of the first feature point set;
and determining the position of the vehicle according to the pose of the monocular automobile data recorder when the next frame of picture is shot.
Optionally, a specific implementation of the processor 601 identifying the target object in the at least two frames of pictures may be:
identifying the target object in the at least two frames of pictures by using a YOLO network.
Optionally, a specific implementation of the processor 601 for constructing the three-dimensional space coordinates of the first feature point set may be:
calculating, from the first feature point set and under the epipolar constraint, a rotation matrix and a translation matrix between the at least two frames of pictures;
and generating the three-dimensional space coordinates of the first feature point set according to the rotation matrix and the translation matrix between the at least two frames of pictures.
Optionally, the processor 601 may also call one or more executable codes stored in the memory 602 to perform the following operations:
carrying out iterative processing on each frame of picture subsequently acquired by the monocular automobile data recorder to obtain the pose of the monocular automobile data recorder when each frame of picture is shot;
and determining the moving track of the vehicle according to the pose of the monocular automobile data recorder when each frame of picture is shot.
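As a hedged illustration of how the per-frame poses can be chained into the moving track: each pose (R, t) produced by PnP maps world coordinates into the camera frame, so the camera centre, and hence the vehicle position, in world coordinates is C = -R^T t:

```python
import numpy as np

def vehicle_track(poses):
    """Convert per-frame recorder poses into vehicle positions.

    poses: list of (R, t) with x_cam = R @ x_world + t; the camera centre in
    world coordinates is C = -R.T @ t, which traces the moving track.
    """
    return [(-R.T @ t).ravel() for R, t in poses]
```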
Optionally, the specific implementation manner in which the processor 601 determines the pose of the monocular automobile data recorder when the next frame of picture is shot, according to the feature points of the next frame of picture and the three-dimensional space coordinates of the first feature point set, may be:
matching the next frame of picture with each of the at least two frames of pictures to respectively obtain a feature point set of the next frame of picture successfully matched with each frame of picture;
determining, from the feature point sets successfully matched between the next frame of picture and each frame of picture, the feature points in the next frame of picture that are successfully matched with at least a preset number of the frames as a second feature point set;
determining the three-dimensional space coordinates of the second feature point set according to the three-dimensional space coordinates of the first feature point set;
and determining the pose of the monocular automobile data recorder when the next frame of picture is shot by using the three-dimensional space coordinates of the second feature point set and the positions of the feature points of the second feature point set on the next frame of picture.
Optionally, the processor 601 may also call one or more executable codes stored in the memory 602 to perform the following operations:
utilizing the residual feature point set obtained after the second feature point set is removed from the feature point sets successfully matched between the next frame of picture and each frame of picture, and calculating the three-dimensional space coordinates of the residual feature point set by triangulation;
and adjusting the three-dimensional space coordinates of the first feature point set and of the second feature point set by using the three-dimensional space coordinates of the residual feature point set.
The aspects of the present application have been described in detail hereinabove with reference to the accompanying drawings. In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments. Those skilled in the art should also appreciate that the acts and modules referred to in the specification are not necessarily required in the present application. In addition, it can be understood that the steps in the method of the embodiment of the present application may be sequentially adjusted, combined, and deleted according to actual needs, and the modules in the device of the embodiment of the present application may be combined, divided, and deleted according to actual needs.
Furthermore, the method according to the present application may also be implemented as a computer program or computer program product comprising computer program code instructions for performing some or all of the steps of the above-described method of the present application.
Alternatively, the present application may also be embodied as a non-transitory machine-readable storage medium (or computer-readable storage medium, or machine-readable storage medium) having stored thereon executable code (or a computer program, or computer instruction code) which, when executed by a processor of an electronic device, causes the processor to perform part or all of the steps of the above-described method according to the present application.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems and methods according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Having described embodiments of the present application, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or improvements over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
Claims (10)
1. A method for realizing SLAM positioning in a dynamic environment is characterized by comprising the following steps:
acquiring at least two frames of pictures acquired by a monocular automobile data recorder in the driving process of a vehicle;
identifying the target object in the at least two frames of pictures, and calibrating the target object in each frame of picture;
acquiring feature points which are positioned outside the calibration range of the target object on each of the at least two frames of pictures;
matching the obtained feature points of the at least two frames of pictures to obtain a first feature point set successfully matched in the at least two frames of pictures;
constructing three-dimensional space coordinates of the first feature point set;
acquiring a next frame of picture acquired by the monocular automobile data recorder, identifying a target object in the next frame of picture, and acquiring feature points of the next frame of picture;
determining the pose of the monocular automobile data recorder when the next frame of picture is shot according to the feature point of the next frame of picture and the three-dimensional space coordinate of the first feature point set;
and determining the position of the vehicle according to the pose of the monocular automobile data recorder when the next frame of picture is shot.
2. The method of claim 1, wherein the identifying the target object in the at least two frames of pictures comprises:
identifying the target object in the at least two frames of pictures by using a YOLO network.
3. The method of claim 1, wherein constructing three-dimensional spatial coordinates of the first set of feature points comprises:
calculating a rotation matrix and a translation matrix between the at least two frames of pictures by using the first feature point set and adopting epipolar constraint;
and generating the three-dimensional space coordinates of the first feature point set according to the rotation matrix and the translation matrix between the at least two frames of pictures.
4. The method of implementing SLAM positioning in a dynamic environment as recited in claim 1, further comprising:
carrying out iterative processing on each frame of picture subsequently acquired by the monocular automobile data recorder to obtain the pose of the monocular automobile data recorder when each frame of picture is shot;
and determining the moving track of the vehicle according to the pose of the monocular automobile data recorder when each frame of picture is shot.
5. The method for realizing SLAM positioning in a dynamic environment according to any one of claims 1 to 4, wherein the determining the pose of the monocular automobile data recorder when taking the next frame of picture according to the feature point of the next frame of picture and the three-dimensional space coordinates of the first feature point set comprises:
matching the next frame of picture with each of the at least two frames of pictures to respectively obtain a feature point set of the next frame of picture successfully matched with each frame of picture;
according to the feature point set successfully matched with the next frame of picture and each frame of picture, determining feature points in the next frame of picture successfully matched with at least a preset number of frames of pictures in the at least two frames of pictures at the same time as a second feature point set;
determining the three-dimensional space coordinates of the second feature point set according to the three-dimensional space coordinates of the first feature point set;
and determining the pose of the monocular automobile data recorder when the next frame of picture is shot by using the three-dimensional space coordinates of the second feature point set and the positions of the feature points of the second feature point set on the next frame of picture.
6. The method of claim 5, wherein the method further comprises:
utilizing the residual feature point set obtained after the second feature point set is removed from the feature point sets successfully matched between the next frame of picture and each frame of picture, and calculating the three-dimensional space coordinates of the residual feature point set by adopting triangulation;
and adjusting the three-dimensional space coordinates of the first feature point set and of the second feature point set by using the three-dimensional space coordinates of the residual feature point set.
7. An apparatus for implementing SLAM positioning in a dynamic environment, comprising:
the first acquisition unit is used for acquiring at least two frames of pictures acquired by the monocular automobile data recorder in the driving process of the vehicle;
the recognition unit is used for recognizing the target object in the at least two frames of pictures and calibrating the target object in each frame of picture;
the second acquisition unit is used for acquiring the feature points which are positioned outside the calibration range of the target object on each picture in the at least two frames of pictures;
the matching unit is used for matching the acquired feature points of the at least two frames of pictures to obtain a first feature point set which is successfully matched in the at least two frames of pictures;
the construction unit is used for constructing three-dimensional space coordinates of the first feature point set;
the first obtaining unit is further configured to obtain a next frame of picture acquired by the monocular automobile data recorder;
the identification unit is further used for identifying and calibrating the target object in the next frame of picture;
the second obtaining unit is further configured to obtain a feature point in the next frame of picture, where the feature point is located outside the target object calibration range;
the determining unit is used for determining the pose of the monocular automobile data recorder when the next frame of picture is shot according to the feature point of the next frame of picture and the three-dimensional space coordinate of the first feature point set;
the determining unit is further configured to determine the position of the vehicle according to the pose of the monocular automobile data recorder when the next frame of picture is taken.
8. The apparatus as claimed in claim 7, wherein the means for identifying the target object in the at least two pictures is specifically configured to:
identifying the target object in the at least two frames of pictures by using a YOLO network.
9. An apparatus for implementing SLAM positioning in a dynamic environment, comprising:
a processor; and
a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the method of any one of claims 1-6.
10. A non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform the method of any one of claims 1-6.