CN111340870B - Topological map generation method based on vision - Google Patents

Info

Publication number: CN111340870B
Application number: CN202010041500.5A
Authority: CN (China)
Other versions: CN111340870A (Chinese-language application publication)
Inventors: Ren Pengju (任鹏举), Cao Linge (曹琳阁), Ding Yan (丁焱), Li Chuang (李创), Xiong Fan (熊帆), Li Lingge (李凌阁), Zheng Nanning (郑南宁)
Assignee (original and current): Xi'an Jiaotong University
Events: application filed by Xi'an Jiaotong University with priority to CN202010041500.5A; published as CN111340870A; granted and published as CN111340870B
Legal status: Active

Classifications

    • G: Physics
    • G06: Computing; calculating or counting
    • G06T: Image data processing or generation, in general
    • G06T 7/00: Image analysis
    • G06T 7/50: Depth or shape recovery
    • G06T 7/55: Depth or shape recovery from multiple images
    • G06T 7/579: Depth or shape recovery from multiple images from motion
    • G06F: Electric digital data processing
    • G06F 16/00: Information retrieval; database structures therefor; file system structures therefor
    • G06F 16/20: Information retrieval of structured data, e.g. relational data
    • G06F 16/29: Geographical information databases

Abstract

The invention discloses a vision-based topological map generation method comprising the following steps: acquiring image information and extracting image frames from it at a predetermined frequency; preprocessing the image frames to obtain key frames and judging whether a loop closure is currently generated, and if so, correcting the existing topological map; if a key frame is the first frame image of the current mapping run, setting a topological node at the position of that key frame. Further topological nodes are discriminated by two criteria. First, the image information is compared with the image of the most recently added topological node in the existing topological map; if the similarity of the two images' descriptors is below a first threshold, the position of the key frame image is added to the topological map as a new topological node and the current frame is stored. Second, if the angle information of several consecutive frames relative to the last topological node exceeds a second threshold, the position of the key frame image is likewise added to the topological map as a new topological node and the current frame is stored.

Description

Topological map generation method based on vision
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a topological map generation method based on vision.
Background
Unmanned systems such as autonomous vehicles and unmanned aerial vehicles use their built-in sensors to sense and describe the system's driving state, and complete autonomous navigation tasks with the acquired information. Typical sensors used in unmanned systems include lidar, radar, cameras, IMUs, and the like. Because lidar and radar are expensive while cameras are relatively cheap, navigation techniques based on computer vision have been widely researched and applied.
Simultaneous Localization and Mapping (SLAM) refers to a system equipped with specific sensors that estimates its own state from motion and, during that motion, builds a map containing information about the surrounding environment. When a camera is the main input sensor, the system is called a visual SLAM system. For example, the ORB-SLAM system is a complete monocular, stereo, and RGB-D scheme that realizes map reuse, loop detection, and relocalization. However, current visual SLAM methods suffer from heavy computation, large memory occupation, and sensitivity to external conditions such as illumination, which degrade the overall performance of the SLAM system. In particular, when the system is mounted on a mobile agent, building a map with conventional visual SLAM requires storing a large amount of information, such as numerous feature points, and as working time increases the system's computational load and burden grow ever larger. Existing visual SLAM methods therefore fail to meet the mobile terminal's constraints of limited computing capability and low memory occupation.
To complete the navigation task of an unmanned system, a navigable map must first be established. Current representations of the environment space can be roughly divided into metric maps and topological maps. Metric maps are either sparse or dense, but both consume a large amount of storage space. Topological maps emphasize the relationships between map elements more than metric maps do: a topological map consists of edges and nodes, ignores a large amount of detailed and tedious information, considers only the connectivity of nodes, gives a more compact and concise representation, and supports navigation between nodes well. In large-scale navigation problems the system's demand for positional precision is low, and in such cases the topological map saves computation and space compared with a metric map. However, in existing methods the generation of topological maps lacks effective logic, and the spatial representation efficiency of the generated nodes is low.
The above information disclosed in this background section is only for enhancement of understanding of the background of the invention and therefore it may contain information that does not form the prior art that is already known in this country to a person of ordinary skill in the art.
Disclosure of Invention
Aiming at the problems of large calculation amount, large occupied storage space and the like in the prior art, the invention provides a topological map generation method based on vision, which utilizes visual information to establish a simple and efficient navigable map, so that the topological map has high representation efficiency, high system operation speed and small occupied memory space.
The above object is achieved through the following technical scheme: a vision-based topological map generation method, comprising the following steps:
in the first step, image information is obtained, and image frames are extracted at a preset frequency based on the image information;
in the second step, a key frame is obtained by preprocessing the image frames, and whether a loop closure is currently generated is judged; if a loop is generated, the existing topological map is corrected, and if the key frame is the first frame image of the current mapping run, a topological node is set at the position of the key frame;
in the third step, comparing the image information with the image of the latest added topology node in the existing topology map, judging whether the similarity of the descriptors of the two images is lower than a first threshold value, if so, adding the position of the image of the key frame as a new topology node into the topology map, storing the current frame, and entering the sixth step, otherwise, entering the fourth step;
in the fourth step, a horizontal rotation angle is extracted from the key frame; if, over several consecutive recent frames, the orientation compared with that of the last topological node shows a rotation angle larger than a second threshold, the current state is judged to be a turn, the position of the current key frame is added to the topological map, and the method enters the sixth step; otherwise it enters the fifth step;
in the fifth step, when a predetermined maximum spacing distance has been exceeded without a new topological node being added, the current position is added to the topological map, the corresponding information is retained, and the method enters the sixth step; otherwise processing of the next image frame begins;
and in the sixth step, updating the current latest topological node information, and entering the processing of the next frame of image.
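The six-step decision logic above can be sketched as follows. This is an illustrative, self-contained Python sketch, not the claimed implementation: the names, the 25° and 10-unit values are assumptions (the patent only fixes the first threshold at 0 and suggests 20-30° for the second), and `turn_flags` stands in for the three turn flag bits the method maintains.

```python
import math
from dataclasses import dataclass

FIRST_THRESHOLD = 0.0    # step 3: descriptor-similarity threshold (the patent fixes this at 0)
SECOND_THRESHOLD = 25.0  # step 4: turn angle in degrees (patent suggests 20-30; value assumed)
MAX_GAP = 10.0           # step 5: maximum spacing distance (illustrative units)

@dataclass
class KeyFrame:
    position: tuple                 # (x, y) position of the key frame
    descriptor: list                # image descriptor vector
    turn_flags: tuple = (0, 0, 0)   # turn flag bits at t-2, t-1, t

def similarity(a, b):
    """Cosine similarity between two descriptor vectors, range [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def node_rule(frame, last):
    """Return the step number (2-5) whose criterion adds a node, else None."""
    if last is None:                                    # step 2: first frame of the run
        return 2
    if similarity(frame.descriptor, last.descriptor) < FIRST_THRESHOLD:
        return 3                                        # step 3: scene changed markedly
    if all(f == 1 for f in frame.turn_flags):
        return 4                                        # step 4: three consecutive turn flags
    if math.dist(frame.position, last.position) > MAX_GAP:
        return 5                                        # step 5: too far since the last node
    return None                                         # no new node; process the next frame
```

When none of the three criteria fires, the method falls through to the sixth step (updating the latest node information) and moves on to the next frame.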
In the second step, the image frame is input into the simultaneous localization and mapping (SLAM) system for preprocessing to obtain a key frame; the SLAM system detects whether a loop is currently generated, and the image frame is input into a convolutional neural network to generate the descriptor of the current image.
In the method, the descriptor normalization formula is:

f_i' = (f_i - μ_s) / σ_s

where f_i is the currently input descriptor, μ_s and σ_s are the accumulated mean and standard deviation, and f_i' is the descriptor that is finally stored.
In the method, in the third step, the descriptor similarity of the two images is computed as the cosine similarity between their descriptors; the smaller the cosine distance, the greater the similarity:

sim(i, j) = (f_i · f_j) / (‖f_i‖ ‖f_j‖)

where f_i and f_j are the descriptors of image i and image j respectively; the value of the above formula ranges over [-1, 1].
In the third step, the first threshold is 0; when the similarity between the descriptors of the current frame image and the latest node is less than 0, the current frame is taken as a new node, and the related information of the current frame is stored in the new topological node.
In the method, the predicted pose during SLAM preprocessing takes the form pose: {{position: x, y, z}, {orientation: x, y, z, w}}, where orientation.w represents the angle difference between the agent's current orientation and the orientation at the previous node position. When this difference is greater than the second threshold, the current system state is considered to be turning and a turn flag bit is set to 1. With the current frame at time t, three flag bits represent the turning states at times t-2, t-1 and t; each is 0 when no turning occurs and 1 otherwise. When all three turn flag bits are 1, the position of the system at the current time t is added to the topological map as a new topological node and the current frame is stored.
In the method, in the sixth step, when the current latest topological node information is updated, a topological node node_n is added at the corresponding coordinate in the map; when n > 1, it is connected with the previous latest topological node node_{n-1}, with the edge direction pointing to node_n.
In the method, in a first step, image information is acquired from a camera sensor, and the acquired image frame is subjected to correction processing.
In the fourth step, the second threshold value is set to be 20-30 degrees.
In the method, in the fifth step, the maximum distance is preset to ensure that the navigation drift between the topological points does not exceed 5% of the path length between the navigation points.
Advantageous effects
The invention establishes a topological map by using a neural network to obtain image descriptors from visual information. The similarity between descriptors reflects the similarity between images well, providing a criterion for adding nodes to the topological map. Obtaining image descriptors with a neural network is fast, and the resulting descriptors occupy little memory while remaining accurate, which suits mobile agents especially well. The descriptors extracted in this way not only represent the images well but also occupy little memory; for similar input images, the descriptors obtained by the neural network have high similarity, i.e. a short distance between the descriptor vectors of the two images. The invention also realizes loop detection for the topological map: by combining the SLAM algorithm, a SLAM map is built simultaneously while the topological map is built, SLAM helps correct the drift of the topological map, loop detection of the topological map is achieved, and a more accurate topological map is established. During navigation, the navigation function of the topological map can be realized without running SLAM at the same time, greatly reducing memory consumption.
Drawings
Various other advantages and benefits of the present invention will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings in the specification are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. It is obvious that the drawings described below are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be obtained from them without inventive effort. Also, like parts are designated by like reference numerals throughout the drawings.
In the drawings:
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a flow chart illustrating a typical SLAM algorithm, taking ORB-SLAM as an example;
FIG. 3 is a topology node decision flow diagram;
FIG. 4 is a schematic diagram of the experimental results.
The invention is further explained below with reference to the figures and examples.
Detailed Description
Specific embodiments of the present invention will be described in more detail below with reference to fig. 1 to 4. While specific embodiments of the invention are shown in the drawings, it will be understood that the invention may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
It should be noted that certain terms are used throughout the description and claims to refer to particular components. As one skilled in the art will appreciate, various names may be used to refer to a component. The present specification and claims do not distinguish between components by way of noun differences, but rather differentiate between components in function. In the following description and in the claims, the terms "include" and "comprise" are used in an open-ended fashion, and thus should be interpreted to mean "include, but not limited to. The description which follows is a preferred embodiment of the invention, but is made for the purpose of illustrating the general principles of the invention and not for the purpose of limiting the scope of the invention. The scope of the invention is to be determined by the claims appended hereto.
For the purpose of facilitating an understanding of the embodiments of the present invention, the following detailed description will be given by way of example with reference to the accompanying drawings, and the drawings are not intended to limit the embodiments of the present invention.
For better understanding, FIG. 1 shows a flowchart of the vision-based topological map generation method; as shown in FIG. 1, the method includes the following steps:
in a first step S1, image information is acquired, and image frames are decimated at a predetermined frequency based on the image information;
in a second step S2, a key frame is obtained by preprocessing the image frames, whether a loop is currently generated is determined, and if a loop is generated the existing topological map is corrected; if the key frame is the first frame image of the current mapping run, a topological node is set at the position of the key frame. Key-frame selection is based on whether sufficient parallax exists between the current frame and the previous key frame, and loop detection is a comprehensive judgment based on image-descriptor similarity and feature-point matching;
in a third step S3, comparing the image information with the image of the newly added topological node in the existing topological map, determining whether the similarity of the descriptors of the two images is lower than a first threshold, if so, adding the position of the image of the key frame as a new topological node into the topological map, storing the current frame, and entering a sixth step, otherwise, entering the fourth step; the key frame in S2 is also the key frame currently being processed, and if the similarity between the currently processed key frame image and the image of the node created last time is low, we consider that the scene has changed greatly and need to create a new node.
In a fourth step S4, a horizontal rotation angle is extracted from the key frames; if, over several consecutive recent frames, the orientation compared with that of the last topological node shows a rotation angle greater than a second threshold, the current state is determined to be a turn, the position of the current key frame is added to the topological map, and the method proceeds to the sixth step; otherwise it proceeds to the fifth step. The second threshold is determined by the angular velocity and directional stability of the robot; further, "several frames" here means three frames.
In a fifth step S5, when a predetermined maximum spacing distance has been exceeded without a new topological node being added, the current position is added to the topological map, the corresponding information is retained, and the method proceeds to the sixth step; otherwise processing of the next image frame begins;
in the sixth step S6, the current latest topology node information is updated, and the process proceeds to the next frame image.
In a preferred embodiment of the method, in the second step S2, the image frame is input into the simultaneous localization and mapping (SLAM) system for preprocessing to obtain a key frame; the SLAM system detects whether a loop is currently generated, and the image frame is input into the convolutional neural network to generate the descriptor of the current image.
In a preferred embodiment of the method, the descriptor normalization formula is:

f_i' = (f_i - μ_s) / σ_s

where f_i is the currently input descriptor, μ_s and σ_s are the accumulated mean and standard deviation, and f_i' is the descriptor that is finally stored.
In a preferred embodiment of the method, in the third step S3, the descriptor similarity of the two images is computed as the cosine similarity between their descriptors; the smaller the cosine distance, the greater the similarity:

sim(i, j) = (f_i · f_j) / (‖f_i‖ ‖f_j‖)

where f_i and f_j are the descriptors of image i and image j respectively; the value of the above formula ranges over [-1, 1].
In a preferred embodiment of the method, in the third step S3, the first threshold is 0; when the similarity between the descriptors of the current frame image and the latest node is less than 0, the current frame is taken as a new node, and the related information of the current frame is stored in the new topological node.
In the preferred embodiment of the method, the predicted pose during SLAM preprocessing takes the form pose: {{position: x, y, z}, {orientation: x, y, z, w}}, where orientation.w represents the angle difference between the agent's current orientation and the orientation at the previous node position. When this difference is greater than the second threshold, the current system state is considered to be turning and a turn flag bit is set to 1. With the current frame at time t, three flag bits represent the turning states at times t-2, t-1 and t; each is 0 when no turning occurs and 1 otherwise. When all three turn flag bits are 1, the position of the system at the current time t is added to the topological map as a new topological node and the current frame is stored.
In a preferred embodiment of the method, in the sixth step S6, when the current latest topological node information is updated, a topological node node_n is added at the corresponding coordinate in the map; when n > 1, it is connected with the previous latest topological node node_{n-1}, with the edge direction pointing to node_n.
In a preferred embodiment of the method, in the first step S1, image information is acquired from a camera sensor, and a correction process is performed on an acquired image frame.
In a preferred embodiment of the method, in the fourth step S4, the second threshold is set to 20 to 30 °.
In a preferred embodiment of the method, in a fifth step S5, the maximum separation distance is predetermined to be such that the navigation drift between the topological points does not exceed 5% of the path length between the navigation points.
To further understand the present invention, in one embodiment, the following steps are included:
Step 1: image information is acquired from a camera sensor, image frames are extracted at a predetermined frequency, corrected, and fed into the system;
Step 2: the corrected image is input into the system and preprocessed; whether a loop closure is currently generated is judged, and if so the existing topological map is corrected. If the image is the first frame of the current mapping run, a topological node is automatically set at its position;
Step 3: the image information extracted in the previous step is compared with the image of the most recently added node in the existing topological map, and whether the descriptor similarity of the two images is below the threshold is judged. If it is, the position of the key frame image is added to the topological map as a new topological node, the related information of the current frame is stored, and the method proceeds to step 6; otherwise it proceeds to step 4;
Step 4: data representing the horizontal rotation angle are extracted from the key frame information obtained in step 2. If, over the most recent key frames, the orientation continuously differs from that of the last topological node by a rotation angle greater than the threshold, the current state is judged to be a turn, the position of the current key frame is added to the topological map, and the method proceeds to step 6; otherwise it proceeds to step 5. The threshold is set to 20-30° to avoid creating excessive invalid topological points due to gradual curve changes and ORB algorithm errors;
Step 5: finally, a maximum spacing distance is set; when this distance is exceeded without a new topological node being added, the current position is added to the topological map, the corresponding information is retained, and the method proceeds to step 6. Otherwise processing of the next image frame begins. The maximum spacing distance is mainly determined by the precision of the inertial sensor, with the requirement that navigation drift between topological points not exceed 5% of the path length between navigation points.
Step 6: the current latest topological node information is updated, and processing of the next image frame begins.
Further, the implementation method of step 2:
Step 2.1: the image is input into the SLAM system; a SLAM flowchart, taking ORB-SLAM as an example, is shown in FIG. 2. ORB-SLAM processes the current frame image, extracts feature points and key frames, and constructs a map.
At the same time, SLAM performs loop detection. When the SLAM system detects a loop, the local map is optimized, and the topological map must be updated accordingly: the system traverses all topological nodes in the established topological map, updates each of them according to the corresponding optimized SLAM map, and corrects the drift error.
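A minimal sketch of this drift correction, under the assumption that the loop-closure optimization can be summarized as a 2-D rigid transform (rotation plus translation) applied to every stored node position; the patent itself only states that all nodes are updated from the optimized SLAM map, so both the function name and the rigid-transform form are illustrative.

```python
import numpy as np

def correct_topology(node_positions, rotation, translation):
    """Apply a loop-closure correction to every stored topological node
    position so that the topological map follows the optimized SLAM map."""
    R = np.asarray(rotation, dtype=float)
    t = np.asarray(translation, dtype=float)
    return [R @ np.asarray(p, dtype=float) + t for p in node_positions]
```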
Step 2.2: the image is input into a specific convolutional neural network to generate the descriptor of the current image. Taking AlexNet trained on the Places365 data set as an example, the feature map of the network's third convolutional layer is taken and reduced in dimension through Gaussian random projection, yielding a vector of length 4096 as the descriptor of the current image.
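The dimensionality-reduction step can be sketched as a Gaussian random projection of the flattened feature map. The fixed seed and the 1/sqrt(out_dim) scaling below are assumptions; the patent only specifies a Gaussian random mapping to a length-4096 vector (a smaller `out_dim` works the same way and is cheaper to demonstrate).

```python
import numpy as np

def project_descriptor(feature_map, out_dim=4096, seed=0):
    """Flatten a convolutional feature map and reduce it to a fixed-length
    descriptor vector via Gaussian random projection."""
    flat = np.asarray(feature_map, dtype=float).ravel()
    rng = np.random.default_rng(seed)   # fixed seed: the same projection for every frame
    projection = rng.normal(size=(out_dim, flat.size)) / np.sqrt(out_dim)
    return projection @ flat
```

The projection matrix must be identical for all frames (hence the fixed seed), otherwise descriptors of different images would not be comparable.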
In the process of computing the descriptor, the descriptor must be normalized; the formula is:

f_i' = (f_i - μ_s) / σ_s

where f_i is the currently input descriptor, μ_s and σ_s are the accumulated mean and standard deviation, and f_i' is the descriptor that is finally stored.
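The accumulated statistics can be maintained online as descriptors arrive; Welford's algorithm below is an assumed way to do this accumulation, since the patent does not specify how the mean and variance are accumulated.

```python
import numpy as np

class RunningNormalizer:
    """Produces f_i' = (f_i - mu_s) / sigma_s, where mu_s and sigma_s are
    accumulated online over all descriptors seen so far (Welford's method)."""
    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        self.m2 = np.zeros(dim)         # running sum of squared deviations

    def normalize(self, f):
        f = np.asarray(f, dtype=float)
        self.n += 1
        delta = f - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (f - self.mean)
        if self.n < 2:
            return f - self.mean        # first descriptor: all zeros
        sigma = np.sqrt(np.maximum(self.m2 / self.n, 1e-12))  # keep sigma > 0
        return (f - self.mean) / sigma
```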
Further, the implementation of step 3: the similarity between the current image descriptor obtained in step 2 and the descriptor of the latest topological node is computed. The similarity is derived from the cosine distance between the two image descriptors; the smaller the cosine distance, the greater the similarity, so the similarity equals one minus the cosine distance:

sim(i, j) = (f_i · f_j) / (‖f_i‖ ‖f_j‖)

where f_i and f_j are the descriptors of image i and image j respectively. The value of the above formula ranges over [-1, 1]; according to the experimental results, 0 is taken as the threshold. When the similarity between the current frame image and the latest node is less than 0, the current frame is taken as a new node, and the related information of the current frame is stored in the new topological node.
Further, the implementation method of step 4:
In step 2, ORB-SLAM provides a predicted pose of the form:

pose: {{position: x, y, z}, {orientation: x, y, z, w}}

where orientation.w represents the angle difference between the agent's current orientation and the orientation at the last node position. When this difference is greater than a certain threshold, the current system state is considered to be turning, and the turn flag bit is set to 1. With the current frame at time t, three flag bits represent the turning states at times t-2, t-1 and t; each is 0 when no turning occurs and 1 otherwise. When all three turn flags are 1, the position of the system at the current time t is added to the topological map as a new topological node, and the related information of the current frame is stored.
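The three-flag criterion can be sketched as follows; the class name and the 25° value are illustrative assumptions (the patent suggests 20-30°), and the input is taken directly as the absolute orientation difference to the last node.

```python
from collections import deque

SECOND_THRESHOLD = 25.0  # degrees; the patent suggests 20-30 (exact value assumed)

class TurnDetector:
    """Keeps the turn flag bits for times t-2, t-1 and t; a turn is
    confirmed only when all three consecutive flags are 1."""
    def __init__(self):
        self.flags = deque([0, 0, 0], maxlen=3)

    def update(self, angle_diff_deg):
        """angle_diff_deg: |orientation difference| to the last node, in degrees."""
        self.flags.append(1 if abs(angle_diff_deg) > SECOND_THRESHOLD else 0)
        return all(f == 1 for f in self.flags)
```

Requiring three consecutive flags filters out single-frame orientation noise, matching the patent's stated goal of avoiding invalid topological points from gradual curves and ORB errors.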
Further, the implementation of step 6: when the system confirms that a new topological node must be added, the work to complete includes adding a topological node node_n at the corresponding coordinate in the map and, when n > 1, connecting it with the previous latest topological node node_{n-1}, with the edge direction pointing to node_n. This part requires a database, such as MongoDB, to store the topological map.
The specific method for adding the nodes is to add the related information of the current topological nodes into the database. The information of a topological node can be represented as:
{
    Node,
    Enable node: {node_1, node_2, ..., node_n},
    Pose,
    Keypoints: {kp_1, kp_2, ..., kp_n},
    Descriptors
}
Node: the name of the topological node;
Enable node: the names of the other nodes with which the current topological node can communicate;
Pose: the pose information of the topological node, including world-frame coordinates and direction;
Keypoints: the feature information of all feature points in the current image;
Descriptors: the visual descriptor of the current image.
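The patent stores these records in a database such as MongoDB; a plain Python dict keyed by node name is used below purely to show the record structure and the directed connection node_{n-1} to node_n. All function names are illustrative.

```python
def make_node(name, pose, keypoints, descriptor):
    """Build one topological-node record with the five fields listed above."""
    return {
        "Node": name,               # node name
        "Enable node": [],          # names of nodes reachable from this one
        "Pose": pose,               # world-frame coordinates and direction
        "Keypoints": keypoints,     # feature points of the current image
        "Descriptors": descriptor,  # visual descriptor of the current image
    }

def add_node(topo_map, node, last_name=None):
    """Insert node_n; when a previous node exists (n > 1), connect
    node_{n-1} to it, with the edge direction pointing at node_n."""
    topo_map[node["Node"]] = node
    if last_name is not None:
        topo_map[last_name]["Enable node"].append(node["Node"])
    return node["Node"]
```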
Furthermore, an offset tolerance must be specified for each node pose, i.e., the range of pose difference within which the system is considered to have returned to that node.
The obtained topological map occupies relatively less storage space, can meet the lightweight requirement of a mobile intelligent body, can represent enough information at the same time, and can be used for navigation. The experimental effect is shown in fig. 4.
Although the embodiments of the present invention have been described above with reference to the accompanying drawings, the present invention is not limited to the above-described embodiments and application fields, and the above-described embodiments are illustrative, instructive, and not restrictive. Those skilled in the art, having the benefit of this disclosure, may effect numerous modifications thereto without departing from the scope of the invention as defined by the appended claims.

Claims (10)

1. A vision-based topological map generation method, the method comprising the steps of:
in a first step (S1), image information is acquired, and image frames are decimated at a predetermined frequency based on the image information;
in the second step (S2), a key frame is obtained by preprocessing the image frames, and whether a loop is currently generated is judged; if a loop is generated, the existing topological map is corrected, and if the key frame is the first frame image of the current mapping run, a topological node is set at the position of the key frame;
in the third step (S3), comparing the image information with the image of the newly added topological node in the existing topological map, determining whether the descriptor similarity of the two images is lower than a first threshold, if so, adding the position of the image of the key frame as the new topological node into the topological map, storing the current frame, and entering the sixth step, otherwise, entering the fourth step;
in the fourth step (S4), a horizontal rotation angle is extracted from the key frame; if, over several consecutive recent frames, the orientation differs from that of the last topological node by a rotation angle greater than a second threshold, the current state is determined to be a turn, the position of the current key frame is added to the topological map, and the process enters the sixth step; otherwise, the process enters the fifth step;
in the fifth step (S5), when the preset maximum separation distance has been exceeded without a topological node being added, the current position is added to the topological map, the corresponding information is kept, and the process enters the sixth step; otherwise, the process enters the processing of the next image frame;
in the sixth step (S6), the information of the current latest topological node is updated, and the process enters the processing of the next image frame.
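The node-adding decisions of steps S3 to S5 can be sketched as one update function; the data layout, helper names, and threshold values below are illustrative assumptions (the similarity threshold 0 and a turn angle in the 20-30° range follow claims 5 and 9):

```python
import math

FIRST_THRESHOLD = 0.0      # descriptor-similarity threshold (claim 5)
SECOND_THRESHOLD = 25.0    # turn angle in degrees (claim 9 suggests 20-30)
MAX_GAP = 5.0              # assumed maximum distance between nodes, in metres

def cosine_similarity(f_i, f_j):
    dot = sum(a * b for a, b in zip(f_i, f_j))
    ni = math.sqrt(sum(a * a for a in f_i))
    nj = math.sqrt(sum(b * b for b in f_j))
    return dot / (ni * nj)

def update_topology(nodes, frame):
    """Apply steps S3-S5 to one key frame; `nodes` is the list of
    topological nodes, `frame` a dict with descriptor/position/turn angle."""
    if not nodes:                      # first frame always becomes a node (S2)
        nodes.append(frame)
        return "first"
    last = nodes[-1]
    # S3: add a node when the scene no longer resembles the newest node
    if cosine_similarity(frame["desc"], last["desc"]) < FIRST_THRESHOLD:
        nodes.append(frame)
        return "appearance"
    # S4: add a node on a sustained turn relative to the last node
    if frame["turn_deg"] > SECOND_THRESHOLD:
        nodes.append(frame)
        return "turn"
    # S5: distance fallback so nodes never drift too far apart
    dx = frame["pos"][0] - last["pos"][0]
    dy = frame["pos"][1] - last["pos"][1]
    if math.hypot(dx, dy) > MAX_GAP:
        nodes.append(frame)
        return "distance"
    return "skip"
```

Each key frame thus triggers at most one of the three node-adding conditions, in the priority order given by the claim.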
2. The method as claimed in claim 1, wherein in the second step (S2), the image frame is input into a simultaneous localization and mapping (SLAM) system for preprocessing to obtain a key frame, the SLAM system detects whether a loop closure is currently generated, and the image frame is input into a convolutional neural network to generate the descriptor of the current image.
3. The method of claim 2, wherein the descriptor normalization formula is:
f_i' = (f_i - μ_s) / σ_s
wherein f_i is the currently input descriptor, μ_s and σ_s are the accumulated mean and variance, and f_i' is the descriptor that is finally stored.
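One way to maintain the accumulated mean and variance used in this normalization is Welford's online algorithm; the class below is an illustrative reading of claim 3, not the patent's own implementation:

```python
import math

class DescriptorNormalizer:
    """Running normalization f' = (f - mu_s) / sigma_s over all descriptors
    seen so far (Welford's online mean/variance; an assumed realization of
    claim 3's 'accumulated mean and variance')."""
    def __init__(self, dim):
        self.n = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim          # sum of squared deviations per dimension

    def update(self, f):
        # fold one descriptor into the running statistics
        self.n += 1
        for k, x in enumerate(f):
            d = x - self.mean[k]
            self.mean[k] += d / self.n
            self.m2[k] += d * (x - self.mean[k])

    def normalize(self, f):
        out = []
        for k, x in enumerate(f):
            var = self.m2[k] / self.n if self.n > 1 else 1.0
            sigma = math.sqrt(var) or 1.0   # guard against zero variance
            out.append((x - self.mean[k]) / sigma)
        return out
```

Normalizing against accumulated statistics keeps stored descriptors comparable across lighting and scene changes without a second pass over the data.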
4. The method according to claim 3, wherein in the third step (S3), the descriptor similarity of the two images is calculated as the cosine distance between the descriptors of the two images, and the smaller the cosine distance, the greater the similarity, and the descriptor similarity formula is:
s(f_i, f_j) = (f_i · f_j) / (‖f_i‖ ‖f_j‖)
wherein f_i and f_j respectively represent the descriptors of image i and image j, and the value range of the above formula is [-1, 1].
5. The method according to claim 1, wherein in the third step (S3), the first threshold is 0; when the similarity between the descriptor of the current frame image and that of the latest node is less than 0, the current frame is taken as a new node, and the related information of the current frame is stored in the new topological node.
6. The method of claim 2, wherein the pose predicted during preprocessing by the simultaneous localization and mapping system SLAM is: pose: { { position: x, y, z }, { orientation: x, y, z, w } }, wherein position.orientation.w represents the angle difference between the current orientation of the agent and its orientation at the previous node position; when the difference is greater than the second threshold, the current system state is considered to be turning and a turning flag bit is set to 1; with the current frame taken as time t, three flag bits respectively represent the turning states at times t-2, t-1 and t, taking 0 when no turning occurs and 1 otherwise; when all three turning flag bits are 1, the position of the system at the current time t is added to the topological map as a new topological node and the current frame is stored.
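The three-flag turn test of claim 6 can be sketched as a small state holder; the class name is hypothetical, and reading the orientation field as a heading difference in degrees is an assumption:

```python
from collections import deque

class TurnDetector:
    """Sketch of claim 6: keep turning flags for times t-2, t-1, t and
    declare a new topological node only when all three flags are 1."""
    def __init__(self, second_threshold_deg=25.0):
        self.thresh = second_threshold_deg
        self.flags = deque([0, 0, 0], maxlen=3)   # states at t-2, t-1, t

    def step(self, yaw_diff_deg):
        # flag is 1 when the heading difference exceeds the second threshold
        self.flags.append(1 if abs(yaw_diff_deg) > self.thresh else 0)
        # True only after three consecutive turning frames
        return all(self.flags)
```

Requiring three consecutive flags suppresses single-frame orientation noise so that only sustained turns create nodes.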
7. The method according to claim 1, wherein in the sixth step (S6), when the current latest topological node information is updated, a topological node node_n is added at the corresponding coordinates in the map; when n > 1, node_n is connected with the previous latest topological node node_{n-1}, with the connection direction pointing to node_n.
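The linking rule of claim 7 amounts to appending a directed edge from the previous newest node to the new one; the minimal container below is an illustrative sketch with assumed names:

```python
class TopoMap:
    """Minimal topological map: node payloads plus directed edges, following
    claim 7's rule that each new node_n (n > 1) is linked from the previous
    newest node node_{n-1}, with the edge pointing at node_n."""
    def __init__(self):
        self.nodes = []            # node payloads (pose, descriptor, ...)
        self.edges = []            # (from_index, to_index) directed pairs

    def add_node(self, payload):
        self.nodes.append(payload)
        n = len(self.nodes)
        if n > 1:                  # connect node_{n-1} -> node_n
            self.edges.append((n - 2, n - 1))
        return n - 1               # index of the newly added node
```

Because edges always point from the older node to the newer one, the stored graph records the direction of travel along the explored route.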
8. The method according to claim 1, wherein in a first step (S1), image information is acquired from a camera sensor, and the acquired image frames are subjected to a correction process.
9. The method according to claim 1, wherein in the fourth step (S4), the second threshold value is set to 20-30 °.
10. The method according to claim 1, wherein in a fifth step (S5), the predetermined maximum separation distance is dimensioned such that the navigation drift between topological points does not exceed 5% of the path length between navigation points.
CN202010041500.5A 2020-01-15 2020-01-15 Topological map generation method based on vision Active CN111340870B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010041500.5A CN111340870B (en) 2020-01-15 2020-01-15 Topological map generation method based on vision


Publications (2)

Publication Number Publication Date
CN111340870A CN111340870A (en) 2020-06-26
CN111340870B true CN111340870B (en) 2022-04-01

Family

ID=71185163

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010041500.5A Active CN111340870B (en) 2020-01-15 2020-01-15 Topological map generation method based on vision

Country Status (1)

Country Link
CN (1) CN111340870B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116518980B (en) * 2023-06-29 2023-09-12 亚信科技(南京)有限公司 Navigation method, navigation device, electronic equipment and computer readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105856230A (en) * 2016-05-06 2016-08-17 简燕梅 ORB key frame closed-loop detection SLAM method capable of improving consistency of position and pose of robot
CN107885871A (en) * 2017-11-24 2018-04-06 南京华捷艾米软件科技有限公司 Synchronous superposition method, system, interactive system based on cloud computing
CN109784232A (en) * 2018-12-29 2019-05-21 佛山科学技术学院 A kind of vision SLAM winding detection method and device merging depth information
CN110044354A (en) * 2019-03-28 2019-07-23 东南大学 A kind of binocular vision indoor positioning and build drawing method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140323148A1 (en) * 2013-04-30 2014-10-30 Qualcomm Incorporated Wide area localization from slam maps
US10962647B2 (en) * 2016-11-30 2021-03-30 Yujin Robot Co., Ltd. Lidar apparatus based on time of flight and moving object


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Visual Loop Closure Detection with a Compact Image Descriptor; Yang Liu et al.; 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems; 2012-12-24; pp. 1-6 *
Semantic Simultaneous Localization and Mapping Method Based on Convolutional Neural Network; Liu Zhijie et al.; Science Technology and Engineering; 2019-03-31; Vol. 19, No. 9; pp. 148-153 *
Improved Loop Closure Detection Algorithm Based on Feature Maps; Xu Binbin et al.; Ordnance Industry Automation; 2019-09-30; Vol. 38, No. 9; pp. 82-86 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant