CN111735446B - Laser and visual positioning fusion method and device - Google Patents

Laser and visual positioning fusion method and device

Info

Publication number
CN111735446B
CN111735446B (application CN202010656372.5A)
Authority
CN
China
Prior art keywords
visual
positioning
mapping
map
laser
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010656372.5A
Other languages
Chinese (zh)
Other versions
CN111735446A (en)
Inventor
王小挺
白静
程伟
谷桐
张晓凤
陈士凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Slamtec Co Ltd
Original Assignee
Shanghai Slamtec Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Slamtec Co Ltd filed Critical Shanghai Slamtec Co Ltd
Priority to CN202010656372.5A priority Critical patent/CN111735446B/en
Publication of CN111735446A publication Critical patent/CN111735446A/en
Application granted granted Critical
Publication of CN111735446B publication Critical patent/CN111735446B/en
Priority to PCT/CN2021/072510 priority patent/WO2022007385A1/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/005 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 with correlation of navigation data from several sources, e.g. map or contour matching
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20 Instruments for performing navigational calculations

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

The method comprises: obtaining a first map built by a laser mapping engine and a second map built by a visual mapping engine, wherein the second map comprises a plurality of visual label subgraphs; calculating the three-dimensional transformation relation between each visual label subgraph and the corresponding first map; when the visual mapping engine is in positioning mode, screening all visual label subgraphs and determining the positioning quality according to the screened subgraphs and the three-dimensional transformation relation; and determining the visual positioning observation input according to the positioning quality and fusing it with the laser positioning information and the odometer information. Through the cooperative work of the laser mapping engine and the visual mapping engine, positioning stability is maintained while the range of applicable scenes is expanded.

Description

Laser and visual positioning fusion method and device
Technical Field
The application relates to the technical field of computers, in particular to a method and equipment for fusing laser and visual positioning.
Background
Laser-based simultaneous localization and mapping (SLAM) emerged earlier and is relatively mature in theory, in technology and in productization; it is currently the most stable and mature positioning and navigation method. The grid map produced by laser SLAM mapping can be used directly for path planning and navigation. However, when the map differs greatly from the actual environment, or when there are many glass walls, laser SLAM has difficulty maintaining positioning stability.
Visual SLAM (VSLAM) has the advantages of low deployment cost, a large amount of information and a wide application range, and is a mainstream direction for future development; its drawbacks are that it is strongly affected by illumination and that the constructed map cannot be used directly for path planning. Visual tag SLAM (TagSLAM) follows essentially the same principle as conventional VSLAM, but replaces image feature points with decodable visual tags and performs accurate positioning by exploiting the geometric constraints of the tags and the uniqueness of their coded values. Because the tags can be attached to areas that are not easily changed, such as a ceiling, TagSLAM can maintain positioning stability even when the environment changes greatly. Its disadvantages are that it requires intrusive deployment that modifies the environment, cannot be applied to all scenes, and, even in places where tags can be deployed, it is difficult to cover the whole environment with tags.
Disclosure of Invention
An object of the present application is to provide a method and a device for fusing laser and visual positioning, which address the problems that existing positioning methods have difficulty maintaining positioning stability and cannot be applied to all scenes.
According to one aspect of the present application, there is provided a method of laser and visual positioning fusion, the method comprising:
acquiring a first map established by a laser mapping engine and a second map established by a visual mapping engine, wherein the second map comprises a plurality of visual label subgraphs;
calculating the three-dimensional transformation relation between each visual label subgraph and the corresponding first map;
when the visual mapping engine is in a positioning mode, screening all visual label subgraphs, and determining the positioning quality according to the screened visual label subgraphs and the three-dimensional transformation relation;
and determining visual positioning observation input information according to the positioning quality, and fusing the visual positioning observation input information with laser positioning information and odometer information.
Further, calculating a three-dimensional transformation relationship of each visual label subgraph and the corresponding first map, comprising:
recording the corresponding relation between the visual mapping key frame in the second map and the laser mapping key frame in the first map according to the time stamp;
and calculating the three-dimensional transformation relation between each visual label subgraph and the corresponding first map according to the corresponding relation.
Further, when the visual mapping engine is in a positioning mode, screening all visual tag subgraphs, including:
acquiring all visual labels observed when the visual mapping engine is in a positioning mode;
decoding all the visual labels to obtain valid coded values, judging whether any one of the visual label subgraphs contains such a coded value, and if so, taking the visual label subgraph containing the coded value as the screened visual label subgraph.
Further, calculating the three-dimensional transformation relation between each visual label subgraph and the corresponding first map according to the corresponding relation comprises:
calculating the position of each visual label subgraph according to the corresponding relation;
and calculating the three-dimensional transformation from the position of each visual label subgraph to the corresponding first map to obtain a three-dimensional transformation relation.
Further, determining the positioning quality according to the screened visual label subgraphs and the three-dimensional transformation relation comprises:
determining the average distance between corresponding transformed key frames according to the screened visual label subgraphs and the three-dimensional transformation relation;
normalizing the average distance to obtain the mapping quality of the visual mapping;
and determining the positioning quality according to the mapping quality of the visual mapping.
Further, determining the positioning quality according to the mapping quality of the visual mapping, comprising:
and determining the positioning quality according to the mapping quality of the visual mapping, the number of the visual labels and the threshold value of the number of the visual labels.
Further, determining an average distance between corresponding transformed key frames according to the screened visual label subgraphs and the three-dimensional transformation relation, comprising:
acquiring the pose of a visual mapping key frame in the screened visual label subgraphs and the pose of a laser mapping key frame in the first map corresponding to the screened visual label subgraphs;
and calculating the average distance between the transformed visual mapping key frame and the corresponding laser mapping key frame according to the pose of the visual mapping key frame, the pose of the laser mapping key frame and the three-dimensional transformation relation.
Further, determining visual positioning observation input information according to the positioning quality comprises:
determining the current pose of the equipment where the vision mapping engine is located;
with the laser mapping as the main system, polling the current pose of the device where the visual mapping engine is located through inter-process communication, and judging whether the current pose is a valid pose;
and when the current pose is a valid pose, preprocessing the current pose according to the positioning quality, and determining the visual positioning observation input information according to the preprocessing result.
According to yet another aspect of the present application, there is also provided a laser, visual positioning fusion apparatus, comprising:
the acquisition device is used for acquiring a first map established by a laser mapping engine and a second map established by a visual mapping engine, wherein the second map comprises a plurality of visual label subgraphs;
the computing device is used for computing the three-dimensional transformation relation between each visual label subgraph and the corresponding first map;
the determining device is used for screening all the visual label subgraphs when the visual mapping engine is in a positioning mode, and determining the positioning quality according to the screened visual label subgraphs and the three-dimensional transformation relation;
and the fusion device is used for determining visual positioning observation input information according to the positioning quality and fusing the visual positioning observation input information with the laser positioning information and the odometer information.
According to yet another aspect of the present application, there is also provided a computer readable medium having computer readable instructions stored thereon, the computer readable instructions being executable by a processor to implement the method as described above.
Compared with the prior art, the present application acquires a first map established by a laser mapping engine and a second map established by a visual mapping engine, wherein the second map comprises a plurality of visual label subgraphs; calculates the three-dimensional transformation relation between each visual label subgraph and the corresponding first map; when the visual mapping engine is in a positioning mode, screens all visual label subgraphs and determines the positioning quality according to the screened visual label subgraphs and the three-dimensional transformation relation; and determines the visual positioning observation input information according to the positioning quality and fuses it with the laser positioning information and the odometer information. In this way, through the cooperative work of the laser mapping engine and the visual mapping engine, positioning stability can be maintained while the range of applicable scenes is expanded.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 illustrates a schematic flow diagram of a method of laser, visual-location fusion provided in accordance with an aspect of the present application;
FIG. 2 is a schematic flow chart diagram illustrating the construction of a specific embodiment of the present application;
FIG. 3 is a schematic flow chart illustrating a method for laser and visual positioning fusion based on an extended Kalman filter according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a laser and visual positioning fusion device provided in another aspect of the present application.
The same or similar reference numbers in the drawings identify the same or similar elements.
Detailed Description
The present application is described in further detail below with reference to the attached figures.
In a typical configuration of the present application, the terminal, the device serving the network, and the trusted party each include one or more processors (e.g., Central Processing Units (CPUs)), input/output interfaces, network interfaces, and memory.
The Memory may include volatile Memory in a computer readable medium, Random Access Memory (RAM), and/or nonvolatile Memory such as Read Only Memory (ROM) or flash Memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change RAM (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media) such as modulated data signals and carrier waves.
Fig. 1 shows a schematic flow chart of a method of laser and visual positioning fusion provided according to an aspect of the present application, the method comprising: step S11 to step S14,
In step S11, a first map created by a laser mapping engine and a second map created by a visual mapping engine are acquired, wherein the second map includes a plurality of visual label subgraphs. The laser mapping engine performs laser SLAM mapping, and the map it builds is the first map, a grid map; the visual mapping engine performs TagSLAM mapping, and the map it builds is the second map. When the second map is built, mapping is carried out only in areas where tags are deployed; each such area forms a subgraph, so that after the overall mapping is completed a plurality of visual label subgraphs are obtained. Using subgraphs in this way greatly expands the application range of visual tag SLAM.
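For illustration only, the two maps described above might be modelled as follows; all class and field names are assumptions introduced for this sketch and are not structures defined by the present application (Python with NumPy is used for all sketches in this description):

from dataclasses import dataclass, field
from typing import List

import numpy as np


@dataclass
class Keyframe:
    timestamp: float
    pose: np.ndarray                 # 4x4 homogeneous pose in the map frame


@dataclass
class TagSubgraph:
    # A local map built only around an area where visual labels are deployed.
    tag_codes: set                   # decoded code values of the labels in this subgraph
    keyframes: List[Keyframe] = field(default_factory=list)
    # Rigid transform (r, t) from this subgraph to the laser grid map,
    # filled in once the subgraph is finished (see step S12).
    r: np.ndarray = None             # 3x3 rotation
    t: np.ndarray = None             # 3x1 translation


@dataclass
class FirstMap:                      # laser SLAM grid map
    occupancy_grid: np.ndarray
    keyframes: List[Keyframe] = field(default_factory=list)


@dataclass
class SecondMap:                     # TagSLAM map: a set of independent subgraphs
    subgraphs: List[TagSubgraph] = field(default_factory=list)

Each subgraph is built independently and is only later tied to the laser grid map through the rigid transform (r, t) computed in step S12.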
In step S12, the three-dimensional transformation relation between each visual label subgraph and the corresponding first map is calculated. Since the TagSLAM map is built in the form of subgraphs, after each visual label subgraph is completed, the three-dimensional transformation relation between that subgraph and the corresponding first map is calculated; the transformation consists of a rotation matrix r and a translation vector t.
In step S13, when the visual mapping engine is in positioning mode, all visual label subgraphs are screened, and the positioning quality is determined according to the screened visual label subgraphs and the three-dimensional transformation relation. If a valid visual label is observed, the positioning quality is determined according to the subgraph in which that label lies and the calculated three-dimensional transformation relation. The positioning quality describes the reliability of the positioning: the visual positioning quality lies between 0 and 1, and the larger the value, the more reliable the positioning.
In step S14, the visual positioning observation input information is determined according to the positioning quality, and is fused with the laser positioning information and the odometer information. According to the positioning quality, it is decided whether the visual positioning observation input is the current pose, a new pose that must be obtained by repositioning, or whether the current pose should not be fused at all. The determined visual positioning observation input is then fused with the laser positioning and the odometer; an extended Kalman filter (EKF) can be used for the fusion, and other positioning sources such as GPS or a third-party VSLAM can also be integrated. Because the EKF can handle nonlinear systems, it fuses positioning information from these different sources well.
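As a minimal sketch of such a fusion, assuming a planar pose state [x, y, theta], odometry increments for prediction and laser or visual poses fed in as direct observations; the class name, noise values and the 2D simplification are assumptions for illustration, not the filter actually used by the application:

import numpy as np


def wrap_angle(a):
    # Wrap an angle into [-pi, pi).
    return (a + np.pi) % (2.0 * np.pi) - np.pi


class PoseFusionEKF:
    # Toy planar EKF: state = [x, y, theta]; odometry predicts, pose fixes correct.

    def __init__(self):
        self.x = np.zeros(3)             # fused pose estimate
        self.P = np.eye(3) * 1e-2        # state covariance

    def predict(self, d_forward, d_theta, q=1e-3):
        # Propagate the state with an odometry increment given in the robot frame.
        th = self.x[2]
        self.x = self.x + np.array([d_forward * np.cos(th),
                                    d_forward * np.sin(th),
                                    d_theta])
        self.x[2] = wrap_angle(self.x[2])
        F = np.array([[1.0, 0.0, -d_forward * np.sin(th)],
                      [0.0, 1.0,  d_forward * np.cos(th)],
                      [0.0, 0.0,  1.0]])
        self.P = F @ self.P @ F.T + np.eye(3) * q

    def update_pose(self, z, sigma):
        # Correct with a full pose observation (laser or visual); sigma grows
        # as the reported positioning quality drops, down-weighting the fix.
        H = np.eye(3)
        R = np.eye(3) * sigma ** 2
        y = z - self.x
        y[2] = wrap_angle(y[2])
        S = H @ self.P @ H.T + R
        K = self.P @ H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.x[2] = wrap_angle(self.x[2])
        self.P = (np.eye(3) - K @ H) @ self.P

A visual fix with positioning quality q could then be applied as ekf.update_pose(tag_pose, sigma=0.05 / max(q, 1e-3)), so that low-quality observations are weighted down rather than trusted blindly.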
In an embodiment of the present application, in step S12, the correspondence between the visual mapping key frames in the second map and the laser mapping key frames in the first map is recorded according to their time stamps, and the three-dimensional transformation relation between each visual label subgraph and the corresponding first map is calculated from this correspondence. The laser SLAM and the TagSLAM build their maps independently; during mapping, the laser SLAM pushes its key frames to the TagSLAM in real time, and the TagSLAM records the one-to-one correspondence between the key frames of the two maps according to the time stamps. Whenever the laser SLAM generates a key frame, the first visual frame after it is taken as a visual key frame. In addition, if the observed position of a label has moved by more than a preset number of pixels (for example, 18 pixels) compared with the previous frame, that frame is also selected as a key frame and participates in mapping, but it has no corresponding laser key frame and does not take part in the subsequent computation of the subgraph's three-dimensional transformation. When mapping is finished, the three-dimensional transformation from the TagSLAM map to the laser SLAM map is calculated with an ICP (Iterative Closest Point) algorithm from the matched key-frame coordinates. In this embodiment, the visual label subgraphs are local maps established by the visual mapping engine; the subgraphs in the second map are independent of one another, and the pose of each subgraph in the world coordinate system is obtained by calculating the three-dimensional transformation relation between that subgraph and the corresponding first map.
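A sketch of the timestamp-based pairing described above, assuming key frames shaped like the Keyframe objects of the earlier sketch; the tolerance value and function name are assumptions:

def pair_keyframes_by_timestamp(visual_kfs, laser_kfs, tol=0.05):
    # Match each visual key frame to the laser key frame closest in time.
    # Visual key frames with no laser key frame within `tol` seconds (for
    # example the extra frames selected by the pixel-displacement rule) are
    # left unmatched and excluded from the subgraph transform computation.
    pairs = []
    for vkf in visual_kfs:
        best = min(laser_kfs, key=lambda lkf: abs(lkf.timestamp - vkf.timestamp))
        if abs(best.timestamp - vkf.timestamp) <= tol:
            pairs.append((vkf, best))
    return pairs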
Specifically, the position of each visual label subgraph is calculated according to the corresponding relation, and the three-dimensional transformation from the position of each subgraph to the corresponding first map is computed to obtain the three-dimensional transformation relation; that is, the transformation from each visual label subgraph to the corresponding first map is obtained with a point-cloud matching algorithm from the one-to-one correspondence of the key frames.
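Because the key-frame correspondences are already known, the subgraph-to-grid-map transform can be estimated in closed form; the SVD-based (Kabsch-style) solution below is a simplified stand-in for the ICP step named in the text, operating directly on corresponding key-frame positions:

import numpy as np


def rigid_transform_3d(src_pts, dst_pts):
    # Least-squares rotation r and translation t such that dst ~ r @ src + t.
    # src_pts, dst_pts: (N, 3) corresponding key-frame positions, expressed in
    # the TagSLAM subgraph frame and in the laser map frame respectively.
    src_c = src_pts.mean(axis=0)
    dst_c = dst_pts.mean(axis=0)
    H = (src_pts - src_c).T @ (dst_pts - dst_c)
    U, _, Vt = np.linalg.svd(H)
    r = Vt.T @ U.T
    if np.linalg.det(r) < 0:          # guard against a reflection solution
        Vt[-1, :] *= -1
        r = Vt.T @ U.T
    t = dst_c - r @ src_c
    return r, t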
In an embodiment of the present application, in step S13, all visual labels observed while the visual mapping engine is in positioning mode are acquired; the visual labels are decoded to obtain valid coded values, and it is judged whether any one of the visual label subgraphs contains such a coded value; if so, the subgraph containing the coded value is taken as the screened visual label subgraph. A visual label belongs to exactly one subgraph, while one subgraph contains several visual labels. When the TagSLAM is in positioning mode, all observed visual labels are screened to find the valid ones; if a valid visual label is observed, the current pose is calculated from the corresponding visual label subgraph and its three-dimensional transformation relation. A visual label is judged valid as follows: if decoding the label yields a valid coded value and that coded value belongs to some visual label subgraph, the label is a valid visual label, it participates in positioning, and positioning is performed according to the subgraph in which it lies and the three-dimensional transformation relation.
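A sketch of this screening rule, reusing the TagSubgraph sketch above; the decoded results are assumed to arrive as a collection of code values, and all names are illustrative:

def screen_valid_tags(decoded_codes, subgraphs):
    # Return (valid_codes, their_subgraph) for the current camera frame.
    # decoded_codes: code values decoded from the labels seen in the image.
    # subgraphs: iterable of TagSubgraph, each holding a tag_codes set; every
    # code value belongs to at most one subgraph.
    for sg in subgraphs:
        valid = [code for code in decoded_codes if code in sg.tag_codes]
        if valid:
            return valid, sg          # this subgraph is the screened subgraph
    return [], None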
In an embodiment of the present application, in step S13, the average distance between the corresponding transformed key frames is determined according to the screened visual label subgraphs and the three-dimensional transformation relation; the average distance is normalized to obtain the mapping quality of the visual map; and the positioning quality is determined according to that mapping quality. Specifically, the poses of the visual mapping key frames in the valid visual label subgraph and the poses of the corresponding laser mapping key frames in the first map are acquired, and the average distance between the transformed visual mapping key frames and the corresponding laser mapping key frames is calculated from these poses and the three-dimensional transformation relation. Here, the mean absolute error (mae) between the transformed corresponding key frames is calculated as
mae = (1/n) * Σ_{i=1..n} || r · TK_i + t - LK_i ||
where TK_i is the pose of the i-th TagSLAM key frame, LK_i is the pose of the i-th laser SLAM key frame, and (r, t) is the three-dimensional transformation from the subgraph to the first map. The mae is then normalized to obtain the mapping quality q, where the closer q is to 1, the smaller the error.
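A sketch of the quality computation described above; the specific normalization q = 1/(1 + mae) is an assumption chosen only so that q approaches 1 as the error shrinks, and is not necessarily the normalization used by the application:

import numpy as np


def mapping_quality(visual_positions, laser_positions, r, t):
    # Mean absolute error between the transformed TagSLAM key frames and their
    # laser SLAM counterparts, normalized into a quality value in (0, 1].
    # visual_positions, laser_positions: (N, 3) corresponding key-frame positions.
    # r, t: subgraph-to-grid-map rotation and translation from step S12.
    transformed = visual_positions @ r.T + t
    mae = np.mean(np.linalg.norm(transformed - laser_positions, axis=1))
    q = 1.0 / (1.0 + mae)       # assumed normalization: q tends to 1 as mae tends to 0
    return mae, q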
Specifically, the positioning quality can be determined from the mapping quality of the visual map, the number of observed visual labels and a visual-label-count threshold. The positioning quality describes the reliability of the positioning; it lies between 0 and 1, and the larger the value, the more reliable the visual positioning. It is calculated as: positioning quality = mapping quality x number of labels observed / label-count threshold, where the label-count threshold is configurable.
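The stated formula, clamped to the [0, 1] range the text describes (the clamping itself is an assumption; the text only states that the quality lies between 0 and 1):

def positioning_quality(mapping_q, n_labels_observed, label_count_threshold):
    # positioning quality = mapping quality * labels observed / label-count threshold
    q = mapping_q * n_labels_observed / float(label_count_threshold)
    return min(max(q, 0.0), 1.0)    # keep the value inside [0, 1]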
In an embodiment of the present application, in step S14, the current pose of the device where the visual mapping engine is located is determined; with the laser mapping as the main system, the current pose is polled through inter-process communication and it is judged whether this pose is a valid pose; when it is valid, the current pose is preprocessed according to the positioning quality and the visual positioning observation input is determined from the preprocessing result. The laser SLAM localizes with its own algorithm, and both its positioning result and the odometer value are fed into the extended Kalman filter as observations. The TagSLAM localizes independently from its own observations: several visual labels are decoded from the image captured by the camera, the corresponding visual label subgraph (the valid subgraph) is found from the decoded labels, the poses of the labels in the world coordinate system are obtained, the current camera pose is computed with a PnP algorithm, and the current machine pose, which is the valid pose, is obtained from the camera extrinsics and the subgraph pose. The laser SLAM, as the main system, polls the TagSLAM positioning result through inter-process communication. If the result indicates failure, it is ignored, the remaining observations are fused by the extended Kalman filter algorithm, and the final positioning is output; if the result is successful, the TagSLAM positioning information is preprocessed and fed into the extended Kalman filter as an observation.
Specifically, when both the TagSLAM positioning quality and the laser SLAM positioning quality are lower than the quality threshold, neither of them can provide credible positioning information, so the user is prompted that the environment has changed greatly and the positioning quality is low. When the laser SLAM pose is far from the current visual pose, the current pose is corrected directly using the TagSLAM visual positioning as reference, so as to obtain the final positioning. The default threshold for the visual positioning quality can be 0.6: when the quality is greater than 0.6 it is considered high and the visual fix participates in the fusion performed by the extended Kalman filter.
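Putting the rules of the last two paragraphs together, a hedged sketch of one fusion cycle with the laser SLAM as the main system, reusing the PoseFusionEKF sketch above; the 0.6 quality threshold comes from the text, while the distance threshold, noise values and helper names are assumptions:

import numpy as np

QUALITY_THRESHOLD = 0.6     # default visual positioning-quality threshold from the text


def fuse_cycle(ekf, laser_pose, laser_quality, odom_delta,
               tag_pose, tag_quality, dist_threshold=0.5):
    # One fusion cycle with the laser SLAM acting as the main system.
    # tag_pose is None when the polled TagSLAM result is not a valid positioning.
    # dist_threshold (metres) is an assumed value for "the distance is large".
    ekf.predict(*odom_delta)                       # odometry drives the prediction
    ekf.update_pose(laser_pose, sigma=0.05)        # laser pose as an observation

    if tag_pose is None:                           # invalid result: ignore, laser-only fusion
        return ekf.x

    if tag_quality < QUALITY_THRESHOLD:            # low quality: warn, do not fuse the visual fix
        print("warning: environment has changed greatly, positioning quality is low")
        return ekf.x

    far_apart = np.linalg.norm(tag_pose[:2] - laser_pose[:2]) > dist_threshold
    if far_apart and laser_quality < QUALITY_THRESHOLD:
        ekf.x = tag_pose.copy()                    # correct the pose directly from TagSLAM
        return ekf.x

    ekf.update_pose(tag_pose, sigma=0.05 / tag_quality)   # normal case: fuse in the EKF
    return ekf.x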
In a specific embodiment of the present application, as shown in the mapping flow diagram of fig. 2, TagSLAM mapping and laser SLAM mapping run together: the laser SLAM pushes the laser key frames of the map it is building into the TagSLAM, and the TagSLAM key frames are put in one-to-one correspondence with the laser key frames according to their time stamps. When a subgraph is finished or mapping is suspended, a global bundle adjustment is performed. In robot navigation, when 2D image feature points are re-projected back into three-dimensional space, the positions deviate from the true 3D points; bundle adjustment minimizes this deviation with a method such as least squares, so as to obtain an accurate value of the robot pose. The global bundle adjustment optimizes a visual label subgraph as a whole, while a local bundle adjustment over a sliding window is performed during mapping. After bundle adjustment of a subgraph, the three-dimensional transformation relation from the visual label subgraph to the laser SLAM map is calculated according to the key-frame correspondence.
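For concreteness, a minimal sketch of the kind of reprojection residual that such a bundle adjustment minimizes; this is a pose-only refinement (full bundle adjustment also optimizes the 3D structure), and the pinhole model, small-angle rotation and SciPy-based solver are assumptions for illustration:

import numpy as np
from scipy.optimize import least_squares


def reprojection_residuals(params, points_3d, points_2d, fx, fy, cx, cy):
    # Residuals between observed 2D tag-corner positions and the projection of
    # the corresponding 3D corners under the pose encoded in params.
    # params: [rx, ry, rz, tx, ty, tz]; the rotation uses a small-angle
    # approximation to keep the sketch short.
    rx, ry, rz, tx, ty, tz = params
    R = np.array([[1.0, -rz,  ry],
                  [ rz, 1.0, -rx],
                  [-ry,  rx, 1.0]])
    cam = points_3d @ R.T + np.array([tx, ty, tz])
    u = fx * cam[:, 0] / cam[:, 2] + cx
    v = fy * cam[:, 1] / cam[:, 2] + cy
    return np.concatenate([u - points_2d[:, 0], v - points_2d[:, 1]])


# Usage sketch: refine a camera pose against known tag-corner positions P3 (Nx3)
# and their detections P2 (Nx2), starting from an initial guess of zero:
# result = least_squares(reprojection_residuals, x0=np.zeros(6),
#                        args=(P3, P2, 600.0, 600.0, 320.0, 240.0))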
In a specific embodiment of the present application, fig. 3 shows a schematic flow diagram of the laser and visual positioning fusion method based on the extended Kalman filter. The TagSLAM acquires an image and judges whether a valid label is observed; if not, the pose is cleared; if so, the current pose is calculated and its time stamp is stored. The laser SLAM polls the TagSLAM pose through inter-process communication and judges whether the polled pose is a valid positioning; if not, it is ignored and no fusion is performed. If it is valid, it is judged whether the positioning quality is higher than the threshold; if not, a low-positioning-quality alarm is generated and the current pose is not fused. If the quality is high enough, it is judged whether the laser SLAM pose is far from the current pose while the laser positioning quality is low, in order to decide whether the final positioning must be corrected directly; if not, the positioning result obtained by SLAM positioning and the odometer value are fused in the extended Kalman filter to obtain the final positioning result.
In addition, the embodiment of the present application further provides a computer readable medium, on which computer readable instructions are stored, the computer readable instructions being executable by a processor to implement the aforementioned method for laser and visual positioning fusion.
Corresponding to the method described above, the present application also provides a terminal, which includes modules or units capable of executing the method steps described in fig. 1, fig. 2, fig. 3, or various embodiments, and these modules or units may be implemented by hardware, software, or a combination of hardware and software, and this application is not limited thereto. For example, in an embodiment of the present application, there is also provided a laser, visual positioning fusion apparatus, the apparatus including:
one or more processors; and
a memory storing computer readable instructions that, when executed, cause the processor to perform the operations of the method as previously described.
For example, the computer readable instructions, when executed, cause the one or more processors to:
a method of laser and visual positioning fusion, the method comprising:
acquiring a first map established by a laser mapping engine and a second map established by a visual mapping engine, wherein the second map comprises a plurality of visual label subgraphs;
calculating the three-dimensional transformation relation between each visual label subgraph and the corresponding first map;
when the visual mapping engine is in a positioning mode, screening all visual label subgraphs, and determining the positioning quality according to the screened visual label subgraphs and the three-dimensional transformation relation;
and determining visual positioning observation input information according to the positioning quality, and fusing the visual positioning observation input information with laser positioning information and odometer information.
Fig. 4 is a schematic structural diagram of a laser and visual positioning fusion device provided in another aspect of the present application. The device comprises an acquisition device 11, a computing device 12, a determining device 13 and a fusion device 14, wherein the acquisition device 11 is used for acquiring a first map established by a laser mapping engine and a second map established by a visual mapping engine, the second map comprising a plurality of visual label subgraphs; the computing device 12 is used for computing the three-dimensional transformation relation between each visual label subgraph and the corresponding first map; the determining device 13 is used for screening all the visual label subgraphs when the visual mapping engine is in a positioning mode and determining the positioning quality according to the screened visual label subgraphs and the three-dimensional transformation relation; and the fusion device 14 is used for determining the visual positioning observation input information according to the positioning quality and fusing it with the laser positioning information and the odometer information.
It should be noted that the content executed by the obtaining device 11, the calculating device 12, the determining device 13 and the fusing device 14 is the same as or corresponding to the content in the above steps S11, S12, S13 and S14, respectively, and for the sake of brevity, the description thereof is omitted.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.
It should be noted that the present application may be implemented in software and/or a combination of software and hardware, for example, implemented using Application Specific Integrated Circuits (ASICs), general purpose computers or any other similar hardware devices. In one embodiment, the software programs of the present application may be executed by a processor to implement the steps or functions described above. Likewise, the software programs (including associated data structures) of the present application may be stored in a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. Additionally, some of the steps or functions of the present application may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.
In addition, some of the present application may be implemented as a computer program product, such as computer program instructions, which when executed by a computer, may invoke or provide methods and/or techniques in accordance with the present application through the operation of the computer. Program instructions which invoke the methods of the present application may be stored on a fixed or removable recording medium and/or transmitted via a data stream on a broadcast or other signal-bearing medium and/or stored within a working memory of a computer device operating in accordance with the program instructions. An embodiment according to the present application comprises an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, trigger the apparatus to perform a method and/or a solution according to the aforementioned embodiments of the present application.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. The terms first, second, etc. are used to denote names, but not any particular order.

Claims (9)

1. A method of laser and visual positioning fusion, the method comprising:
acquiring a first map established by a laser mapping engine and a second map established by a visual mapping engine, wherein the second map comprises a plurality of visual label subgraphs;
calculating the three-dimensional transformation relation between each visual label subgraph and the corresponding first map;
when the visual mapping engine is in a positioning mode, screening all visual label subgraphs, and determining the positioning quality according to the screened visual label subgraphs and the three-dimensional transformation relation;
determining visual positioning observation input information according to the positioning quality, and fusing the visual positioning observation input information with laser positioning information and odometer information;
wherein, when the visual mapping engine is in a positioning mode, screening all visual label subgraphs comprises:
acquiring all visual labels observed when the visual mapping engine is in a positioning mode;
decoding all the visual labels to obtain valid coded values, judging whether any one of the visual label subgraphs contains such a coded value, and if so, taking the visual label subgraph containing the coded value as the screened visual label subgraph.
2. The method of claim 1, wherein computing a three-dimensional transformation relationship of each visual tag sub-graph to a corresponding first map comprises:
recording the corresponding relation between the visual mapping key frame in the second map and the laser mapping key frame in the first map according to the time stamp;
and calculating the three-dimensional transformation relation between each visual label subgraph and the corresponding first map according to the corresponding relation.
3. The method of claim 2, wherein computing a three-dimensional transformation relationship between each visual label subgraph and the corresponding first map according to the correspondence comprises:
calculating the position of each visual label subgraph according to the corresponding relation;
and calculating the three-dimensional transformation from the position of each visual label subgraph to the corresponding first map to obtain a three-dimensional transformation relation.
4. The method of claim 1, wherein determining the localization quality according to the filtered visual label subgraphs and the three-dimensional transformation relationship comprises:
determining the average distance between corresponding transformed key frames according to the screened visual label subgraphs and the three-dimensional transformation relation;
normalizing the average distance to obtain the mapping quality of the visual mapping;
and determining the positioning quality according to the mapping quality of the visual mapping.
5. The method of claim 4, wherein determining the localization quality from the mapping quality of the visual mapping comprises:
and determining the positioning quality according to the mapping quality of the visual mapping, the number of the visual labels and the threshold value of the number of the visual labels.
6. The method of claim 4, wherein determining an average distance between corresponding transformed key frames according to the screened visual label subgraphs and the three-dimensional transformation relation comprises:
acquiring the pose of a visual mapping key frame in the screened visual label subgraphs and the pose of a laser mapping key frame in the first map corresponding to the screened visual label subgraphs;
and calculating the average distance between the transformed visual mapping key frame and the corresponding laser mapping key frame according to the pose of the visual mapping key frame, the pose of the laser mapping key frame and the three-dimensional transformation relation.
7. The method of claim 1, wherein determining visual positioning observation input information based on the positioning quality comprises:
determining the current pose of the equipment where the vision mapping engine is located;
with the laser mapping as the main system, polling the current pose of the device where the visual mapping engine is located through inter-process communication, and judging whether the current pose is a valid pose;
and when the current pose is a valid pose, preprocessing the current pose according to the positioning quality, and determining visual positioning observation input information according to the preprocessing result.
8. A laser, visual positioning fusion apparatus, comprising:
the acquisition device is used for acquiring a first map established by a laser mapping engine and a second map established by a visual mapping engine, wherein the second map comprises a plurality of visual label subgraphs;
the computing device is used for computing the three-dimensional transformation relation between each visual label subgraph and the corresponding first map;
the determining device is used for screening all the visual label subgraphs when the visual mapping engine is in a positioning mode, and determining the positioning quality according to the screened visual label subgraphs and the three-dimensional transformation relation;
the fusion device is used for determining visual positioning observation input information according to the positioning quality and fusing the visual positioning observation input information with laser positioning information and odometer information;
wherein the determining means is for:
acquiring all visual labels observed when the visual mapping engine is in a positioning mode;
decoding all the visual labels to obtain valid coded values, judging whether any one of the visual label subgraphs contains such a coded value, and if so, taking the visual label subgraph containing the coded value as the screened visual label subgraph.
9. A computer readable medium having computer readable instructions stored thereon which are executable by a processor to implement the method of any one of claims 1 to 7.
CN202010656372.5A 2020-07-09 2020-07-09 Laser and visual positioning fusion method and device Active CN111735446B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010656372.5A CN111735446B (en) 2020-07-09 2020-07-09 Laser and visual positioning fusion method and device
PCT/CN2021/072510 WO2022007385A1 (en) 2020-07-09 2021-01-18 Laser and visual positioning fusion method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010656372.5A CN111735446B (en) 2020-07-09 2020-07-09 Laser and visual positioning fusion method and device

Publications (2)

Publication Number Publication Date
CN111735446A CN111735446A (en) 2020-10-02
CN111735446B true CN111735446B (en) 2020-11-13

Family

ID=72655844

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010656372.5A Active CN111735446B (en) 2020-07-09 2020-07-09 Laser and visual positioning fusion method and device

Country Status (2)

Country Link
CN (1) CN111735446B (en)
WO (1) WO2022007385A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111735446B (en) * 2020-07-09 2020-11-13 上海思岚科技有限公司 Laser and visual positioning fusion method and device
CN112596070B (en) * 2020-12-29 2024-04-19 四叶草(苏州)智能科技有限公司 Robot positioning method based on laser and vision fusion
CN113777615B (en) * 2021-07-19 2024-03-29 派特纳(上海)机器人科技有限公司 Positioning method and system of indoor robot and cleaning robot
CN113932814B (en) * 2021-09-30 2024-04-02 杭州电子科技大学 Collaborative positioning method based on multi-mode map
CN114279434A (en) * 2021-12-27 2022-04-05 驭势科技(北京)有限公司 Picture construction method and device, electronic equipment and storage medium
CN115267796B (en) * 2022-08-17 2024-04-09 深圳市普渡科技有限公司 Positioning method, positioning device, robot and storage medium
CN115546348B (en) * 2022-11-24 2023-03-24 上海擎朗智能科技有限公司 Robot mapping method and device, robot and storage medium
CN117044478B (en) * 2023-08-31 2024-03-19 未岚大陆(北京)科技有限公司 Mower control method and device, mower, electronic equipment and storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108319976A (en) * 2018-01-25 2018-07-24 北京三快在线科技有限公司 Build drawing method and device
CN108919811A (en) * 2018-07-27 2018-11-30 东北大学 A kind of indoor mobile robot SLAM method based on tag label
CN109186606A (en) * 2018-09-07 2019-01-11 南京理工大学 A kind of robot composition and air navigation aid based on SLAM and image information
CN109345588A (en) * 2018-09-20 2019-02-15 浙江工业大学 A kind of six-degree-of-freedom posture estimation method based on Tag
CN110187375A (en) * 2019-06-27 2019-08-30 武汉中海庭数据技术有限公司 A kind of method and device improving positioning accuracy based on SLAM positioning result
CN110458863A (en) * 2019-06-25 2019-11-15 广东工业大学 A kind of dynamic SLAM system merged based on RGBD with encoder
CN110726409A (en) * 2019-09-09 2020-01-24 杭州电子科技大学 Map fusion method based on laser SLAM and visual SLAM
CN111045017A (en) * 2019-12-20 2020-04-21 成都理工大学 Method for constructing transformer substation map of inspection robot by fusing laser and vision
CN111242996A (en) * 2020-01-08 2020-06-05 郭轩 SLAM method based on Apriltag and factor graph
CN111380535A (en) * 2020-05-13 2020-07-07 广东星舆科技有限公司 Navigation method and device based on visual label, mobile machine and readable medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110098924A1 (en) * 2009-10-28 2011-04-28 Callaway Golf Company Method and device for determining a distance
CN106352870B (en) * 2016-08-26 2019-06-28 深圳微服机器人科技有限公司 A kind of localization method and device of target
CN107478214A (en) * 2017-07-24 2017-12-15 杨华军 A kind of indoor orientation method and system based on Multi-sensor Fusion
CN111989631A (en) * 2018-04-20 2020-11-24 本田技研工业株式会社 Self-position estimation method
CN110243358B (en) * 2019-04-29 2023-01-03 武汉理工大学 Multi-source fusion unmanned vehicle indoor and outdoor positioning method and system
CN111044069B (en) * 2019-12-16 2022-04-29 驭势科技(北京)有限公司 Vehicle positioning method, vehicle-mounted equipment and storage medium
CN111735446B (en) * 2020-07-09 2020-11-13 上海思岚科技有限公司 Laser and visual positioning fusion method and device

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108319976A (en) * 2018-01-25 2018-07-24 北京三快在线科技有限公司 Build drawing method and device
CN108919811A (en) * 2018-07-27 2018-11-30 东北大学 A kind of indoor mobile robot SLAM method based on tag label
CN109186606A (en) * 2018-09-07 2019-01-11 南京理工大学 A kind of robot composition and air navigation aid based on SLAM and image information
CN109345588A (en) * 2018-09-20 2019-02-15 浙江工业大学 A kind of six-degree-of-freedom posture estimation method based on Tag
CN110458863A (en) * 2019-06-25 2019-11-15 广东工业大学 A kind of dynamic SLAM system merged based on RGBD with encoder
CN110187375A (en) * 2019-06-27 2019-08-30 武汉中海庭数据技术有限公司 A kind of method and device improving positioning accuracy based on SLAM positioning result
CN110726409A (en) * 2019-09-09 2020-01-24 杭州电子科技大学 Map fusion method based on laser SLAM and visual SLAM
CN111045017A (en) * 2019-12-20 2020-04-21 成都理工大学 Method for constructing transformer substation map of inspection robot by fusing laser and vision
CN111242996A (en) * 2020-01-08 2020-06-05 郭轩 SLAM method based on Apriltag and factor graph
CN111380535A (en) * 2020-05-13 2020-07-07 广东星舆科技有限公司 Navigation method and device based on visual label, mobile machine and readable medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A Simultaneous Localization and Mapping (SLAM) Framework for 2.5D Map Building Based on Low-Cost LiDAR and Vision Fusion;Guolai Jiang 等;《APPLIED SCIENCES》;20190522;第1-17页 *
TagSLAM: Robust SLAM with Fiducial Markers;Bernd Pfrommer 等;《arXiv preprint arXiv:1910.00679》;20181231;第1-6页 *

Also Published As

Publication number Publication date
CN111735446A (en) 2020-10-02
WO2022007385A1 (en) 2022-01-13

Similar Documents

Publication Publication Date Title
CN111735446B (en) Laser and visual positioning fusion method and device
CN109285220B (en) Three-dimensional scene map generation method, device, equipment and storage medium
Wang et al. Vision-based framework for automatic progress monitoring of precast walls by using surveillance videos during the construction phase
CN107677279B (en) Method and system for positioning and establishing image
CN110276826A (en) A kind of construction method and system of electric network operation environmental map
Wang et al. Automated road sign inventory system based on stereo vision and tracking
CN102959946A (en) Augmenting image data based on related 3d point cloud data
US10235800B2 (en) Smoothing 3D models of objects to mitigate artifacts
KR20200110120A (en) A system implementing management solution of road facility based on 3D-VR multi-sensor system and a method thereof
CN115376109B (en) Obstacle detection method, obstacle detection device, and storage medium
CN105809658A (en) Method and apparatus for setting region of interest
CN112383746A (en) Video monitoring method and device in three-dimensional scene, electronic equipment and storage medium
CN111383204A (en) Video image fusion method, fusion device, panoramic monitoring system and storage medium
CN114972490B (en) Automatic data labeling method, device, equipment and storage medium
CN114445565A (en) Data processing method and device, electronic equipment and computer readable medium
CN111105695A (en) Map making method and device, electronic equipment and computer readable storage medium
CN110136174A (en) A kind of target object tracking and device
CN115638787A (en) Digital map generation method, computer readable storage medium and electronic device
CN113378605B (en) Multi-source information fusion method and device, electronic equipment and storage medium
WO2020135326A1 (en) Picture-based direction labeling method and apparatus
CN115375870B (en) Loop detection optimization method, electronic equipment and computer readable storage device
CN110148205B (en) Three-dimensional reconstruction method and device based on crowdsourcing image
CN114111817B (en) Vehicle positioning method and system based on SLAM map and high-precision map matching
CN115014324A (en) Positioning method, device, medium, equipment and vehicle
CN116642511A (en) AR navigation image rendering method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant