CN117392347B - Map construction method, device, computer equipment and readable storage medium

Info

Publication number: CN117392347B
Application number: CN202311330820.2A
Authority: CN
Inventors: 李天威; 高继扬
Original assignee: Suzhou Yanhaitu Technology Co ltd
Current assignee: Suzhou Yanhaitu Technology Co ltd
Priority date: 2023-10-13
Filing date: 2023-10-13
Publication date: 2024-04-30
Anticipated expiration: 2043-10-13
Also published as: CN117392347A

Abstract

The embodiments of the application disclose a map construction method, a map construction device, computer equipment and a readable storage medium. In the method, after sensor signals are acquired, a pre-trained network model produces the three-dimensional semantic information corresponding to the sensor signals, namely the position information and semantic category of each semantic grid in three-dimensional space. The map is then constructed from this three-dimensional semantic information and the relative pose between the sensor signals. Because each target in the map is represented as three-dimensional semantic grids, the grids do not change under view-angle changes and similar conditions, so the scheme resists view-angle changes and improves the robustness of map construction. In addition, when the map is built, a semantic grid is newly established only if it has not been established before; if it has already been established, it is only checked. This avoids repeated establishment and improves the accuracy of map construction.

Description

Map construction method, device, computer equipment and readable storage medium

Technical Field

The present application relates to the field of map construction technologies, and in particular, to a map construction method, a map construction device, a computer device, and a readable storage medium.

Background

In various applications, such as localization, simulation, environmental measurement and environmental monitoring, it is necessary to rely on a constructed map; constructing such a map is also referred to as mapping. Mapping means reconstructing the three-dimensional structure and semantic information of an environment.

Known mapping methods usually perform three-dimensional reconstruction based on feature point descriptors, such as SIFT (Scale-Invariant Feature Transform). However, in such methods the extracted feature point descriptors depend entirely on the captured image. When the angle of an object in the captured image changes, for example under a change of view angle, the extracted feature point descriptors change as well, which affects the map construction result. That is, such methods fail easily when the image changes, and the robustness of map construction is low. Therefore, how to improve the robustness of map construction has become a technical problem to be solved.

Disclosure of Invention

The application provides a map construction method, a map construction device, computer equipment and a readable storage medium, so as to improve the robustness of map construction. The specific technical scheme is as follows.

In a first aspect, an embodiment of the present application provides a map construction method, where the method includes:

Acquiring a multi-sensor signal comprising at least: wheel sensor signals and visual signals; the time synchronization among the multiple sensor signals is completed;

Based on at least the wheel sensor signals, estimating the pose of the carrier to obtain the relative pose between every two adjacent frames of sensor signals;

Inputting the visual signals into a pre-trained network model to obtain three-dimensional semantic information corresponding to each frame of the visual signals; the three-dimensional semantic information at least comprises position information and semantic category of each semantic grid in the three-dimensional space; the network model is obtained by training in advance according to a sample visual signal and labeling three-dimensional semantic information corresponding to each frame of the sample visual signal;

Determining each key frame in the multi-sensor signal, determining an interesting semantic grid according to the semantic category of each semantic grid in the key frame for each key frame, and determining whether the interesting semantic grid is established or not according to the position information of the interesting semantic grid and the relative pose between the key frame and the last key frame for each interesting semantic grid;

When the interesting semantic grid is established, determining whether the semantic category of the interesting semantic grid is consistent with the semantic category of the established interesting semantic grid, if so, determining the accumulated observation times of the interesting semantic grid in a key frame, and identifying the interesting semantic grid as mature when the accumulated observation times reach a preset value;

When the interesting semantic grid is not established, establishing the interesting semantic grid in the current subgraph, and identifying the interesting semantic grid as immature;

And judging whether the current subgraph is constructed completely, and if so, adding all mature interesting semantic grids in the current subgraph into a three-dimensional map.

In the embodiment of the application, after the sensor signals are acquired, the three-dimensional semantic information corresponding to the sensor signals, namely the position information and semantic category of each semantic grid in three-dimensional space, can be obtained from the pre-trained network model, and the map is then constructed from the three-dimensional semantic information and the relative pose between the sensor signals. Because each target in the map is represented as three-dimensional semantic grids, the grids do not change even under view-angle changes and similar conditions, so the scheme resists view-angle changes and improves the robustness of map construction. When the map is built for each semantic grid in each key frame, it is first determined whether the semantic grid has been established; the semantic grid is newly established only when it has not been, and when it has, it is only checked again. This avoids repeated establishment and improves the accuracy of map construction. When construction of the current subgraph is completed, only the mature semantic grids in the current subgraph are added to the three-dimensional map, that is, only targets that have been observed enough times. Targets that were observed only by chance or appeared suddenly in the field of view are thereby kept out of the map, which further improves the accuracy of map construction.

Optionally, the method further comprises:

Determining whether to trigger a loop closure based on the key frames or the current subgraph;

when a loop is triggered, determining a first sub-graph corresponding to the first arrival of the carrier at the target position and a second sub-graph corresponding to the second arrival of the carrier at the target position, calculating the offset between the first sub-graph and the second sub-graph, and correcting the three-dimensional map according to the offset.

Optionally, the step of determining whether to trigger a loop closure based on the key frames or the current sub-graph includes:

for each key frame, calculating a current description vector corresponding to the key frame, searching the reference description vectors of the key frames of the three-dimensional map for a target description vector whose similarity to the current description vector is greater than a first preset threshold, and if one exists, determining that a loop closure is triggered; or

detecting whether the current sub-graph overlaps any historical sub-graph, or whether the distance between a target in the current sub-graph and a target in any historical sub-graph is smaller than a second preset threshold, and if so, determining that a loop closure is triggered.

Optionally, the step of calculating an offset between the first sub-graph and the second sub-graph and correcting the three-dimensional map according to the offset includes:

acquiring the pose x of each key frame and the relative pose Δx between each key frame and the previous key frame; x comprises a position with 3 degrees of freedom and a rotation with 3 degrees of freedom;

calculating an offset Δx_t-α between the first sub-graph and the second sub-graph; wherein α is the time at which the carrier arrives at the target position for the first time, and t is the time at which the carrier arrives at the target position for the second time;

constructing the following nonlinear optimization problem to obtain the correction for each x that minimizes the error e_ij:

Optionally, the three-dimensional semantic information further includes attribute information of each semantic grid, the attribute information including at least one of: the material, the hardness, and whether or not the semantic grid has an effect on the carrier.

Optionally, the training process of the network model includes:

Constructing an initial network;

acquiring sample visual signals and labeling three-dimensional semantic information corresponding to each sample visual signal; the labeling three-dimensional semantic information is obtained based on a preset labeling device or manual labeling;

Inputting the sample visual signals and the corresponding marked three-dimensional semantic information into the initial network to obtain predicted three-dimensional semantic information corresponding to each sample visual signal, and carrying out parameter adjustment on the initial network according to the predicted three-dimensional semantic information and the marked three-dimensional semantic information to obtain an initial network with identification accuracy meeting the condition as the network model.

Optionally, the method further comprises:

And after the current subgraph is constructed, extracting the description information of each target included in the current subgraph, and inserting the description information into a map manager in an octree mode.

In a second aspect, an embodiment of the present application provides a map construction apparatus, including:

the signal acquisition module is used for acquiring a multi-sensor signal, and the multi-sensor signal at least comprises: wheel sensor signals and visual signals; the time synchronization among the multiple sensor signals is completed;

The pose estimation module is used for estimating the pose of the carrier based on at least the wheel sensor signals to obtain the relative pose between every two adjacent frames of sensor signals;

The semantic determining module is used for inputting the visual signals into a pre-trained network model to obtain three-dimensional semantic information corresponding to each frame of the visual signals; the three-dimensional semantic information at least comprises position information and semantic category of each semantic grid in the three-dimensional space; the network model is obtained by training in advance according to a sample visual signal and labeling three-dimensional semantic information corresponding to each frame of the sample visual signal;

The grid judging module is used for determining each key frame in the multi-sensor signal, determining interesting semantic grids according to the semantic category of each semantic grid in the key frame for each key frame, and determining whether the interesting semantic grid is established or not according to the position information of the interesting semantic grid and the relative pose between the key frame and the last key frame for each interesting semantic grid;

The grid observation module is used for determining whether the semantic category of the semantic grid of interest is consistent with the semantic category of the established semantic grid of interest when the grid judgment module determines that the semantic grid of interest is established, if so, determining the accumulated observation times of the semantic grid of interest in a key frame, and marking the semantic grid of interest as mature when the accumulated observation times reach a preset value;

The grid establishing module is used for establishing the interesting semantic grid in the current subgraph and marking the interesting semantic grid as immature when the grid judging module determines that the interesting semantic grid is not established;

and the map construction module is used for judging whether the current subgraph is constructed, and if so, adding all mature interesting semantic grids in the current subgraph into a three-dimensional map.

Optionally, the apparatus further includes:

the loop detection module is used for determining whether to trigger a loop closure based on the key frames or the current subgraph;

And the map correction module is used for determining a first sub-graph corresponding to the first arrival of the carrier at the target position and a second sub-graph corresponding to the second arrival of the carrier at the target position when the loop is triggered, calculating the offset between the first sub-graph and the second sub-graph, and correcting the three-dimensional map according to the offset.

Optionally, the loop detection module is specifically configured to:

Optionally, the map correction module is specifically configured to:

Optionally, the apparatus further includes: a model training module; the model training module is specifically configured to:

Constructing an initial network;

Optionally, the apparatus further includes:

And the information extraction module is used for extracting the description information of each target included in the current subgraph after the current subgraph is constructed, and inserting the description information into the map manager in an octree mode.

In a third aspect, an embodiment of the present application provides a computer apparatus, including: a memory and a processor, the memory and the processor coupled;

The memory is used for storing one or more computer instructions;

the processor is configured to execute the one or more computer instructions to implement the mapping method as described in the first aspect.

In a fourth aspect, embodiments of the present application provide a computer readable storage medium having stored thereon one or more computer instructions executable by a processor to implement a map construction method as described in the first aspect above.

In a fifth aspect, the present application provides a computer program product comprising a computer program which, when executed by a processor, implements the map construction method of the first aspect.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is apparent that the drawings in the following description are only some embodiments of the application. Other figures may be derived from these figures without inventive effort for a person of ordinary skill in the art.

Fig. 1 is a schematic flow chart of a map construction method according to an embodiment of the present application;

FIG. 2 is a schematic diagram of an original image and corresponding three-dimensional semantic information in an embodiment of the present application;

FIG. 3 is a schematic diagram of any mapping result according to an embodiment of the present application;

FIG. 4 is a block diagram of a mapping system according to an embodiment of the present application;

fig. 5 is a schematic structural diagram of a map building device according to an embodiment of the present application;

fig. 6 is a schematic structural diagram of a computer device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without any inventive effort, are intended to be within the scope of the application.

It should be noted that the terms "comprising" and "having" and any variations thereof in the embodiments of the present application and the accompanying drawings are intended to cover non-exclusive inclusions. A process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed but may alternatively include other steps or elements not listed or inherent to such process, method, article, or apparatus.

The embodiment of the application provides a map construction method, a map construction device, computer equipment and a readable storage medium, which can improve the robustness of map construction.

Fig. 1 is a flow chart of a map construction method according to an embodiment of the present application, where the method may be applied to an electronic device, and the method includes the following steps:

S110: acquiring a multi-sensor signal, the multi-sensor signal comprising at least: wheel sensor signals and visual signals; time synchronization has been completed between the multiple sensor signals.

In the embodiment of the application, the data required for map construction can be acquired by driving a carrier through the area for which the map is to be constructed. The carrier may be, for example, an autonomous vehicle or a robot; the embodiment of the present application does not limit the specific type of the carrier. Various types of sensors may be mounted on the carrier, including at least a wheel sensor and a vision sensor, so that wheel sensor signals and visual signals can be acquired. The wheel sensor may include a wheel speed sensor and a wheel angle sensor, and the vision sensor may include a camera.

Optionally, the sensors may further include an IMU (inertial measurement unit), a lidar, a GPS (Global Positioning System) receiver, etc. The number of each of the above sensors mounted on the carrier may be one or more. For example, the vision sensor may be a single camera, or a group of cameras mounted at different positions on the carrier so as to capture environmental information in different directions while the carrier is moving.

In the running process of the carrier, the sensor signals acquired by the sensors can be stored in a fixed storage space, and when the map is constructed, the electronic equipment can acquire the multi-sensor signals from the corresponding storage space. It will be appreciated that the sensors mounted in the carrier may include a plurality of types, and the types of sensor signals acquired by the sensors also correspond to a plurality of types, and the electronic device may acquire all or part of the types of sensor signals as required when constructing the map.

When the carrier acquires the various sensor signals, the sensors first need to be time-synchronized so that the acquired sensor signals are synchronized in time. For example, to guarantee the accuracy of map construction, the sensor signals may be synchronized to millisecond-level precision.
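As an illustration of what millisecond-level alignment between two signal streams might look like, the following minimal sketch pairs each camera frame with the nearest wheel-sensor sample by timestamp. The function name `match_nearest`, the tolerance `max_gap_ms` and the data layout are assumptions made for illustration and are not specified by the application.

```python
from bisect import bisect_left


def match_nearest(frame_ts_ms, wheel_ts_ms, max_gap_ms=5):
    """Pair each frame timestamp with the nearest wheel timestamp.

    Both lists hold timestamps in milliseconds, sorted ascending.
    Frames with no wheel sample within max_gap_ms (an assumed tolerance)
    are dropped.
    """
    pairs = []
    for t in frame_ts_ms:
        i = bisect_left(wheel_ts_ms, t)
        # Only the samples just before and just after t can be nearest.
        candidates = wheel_ts_ms[max(i - 1, 0):i + 1]
        if not candidates:
            continue
        best = min(candidates, key=lambda w: abs(w - t))
        if abs(best - t) <= max_gap_ms:
            pairs.append((t, best))
    return pairs
```

With the assumed 5 ms tolerance, a frame whose closest wheel sample is tens of milliseconds away is discarded rather than paired with stale data.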

S120: estimating the pose of the carrier based on at least the wheel sensor signals to obtain the relative pose between every two adjacent frames of sensor signals.

After the multi-sensor signals are obtained, the electronic equipment can estimate the pose of the carrier from them and obtain the relative pose between frames. For example, the electronic device may calculate the relative pose between every two adjacent frames of sensor signals based solely on the wheel sensor signals, or it may additionally use other signals, such as images or lidar point clouds, and estimate the pose of the carrier from the combined sensor signals. Estimating the pose of the carrier from several kinds of sensor signals can improve the accuracy of the pose estimate.

In one implementation, the pose of the carrier may be estimated by a multi-sensor odometer whose input is the time-synchronized multi-sensor signals and whose output is the relative pose between frames, where the relative pose may include a position with 3 degrees of freedom and a rotation with 3 degrees of freedom.
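The wheel-only case can be sketched as follows. This is a simplified planar example (x, y, yaw) rather than the full 6-degree-of-freedom pose described above, and it assumes a kinematic bicycle model; the function names, the `wheelbase_m` default and the constant-speed assumption are illustrative choices, not details taken from the application.

```python
import math


def wheel_odometry_step(speed_mps, steer_rad, dt_s, wheelbase_m=2.7):
    """Relative planar pose (dx, dy, dyaw) over one frame interval.

    The result is expressed in the carrier frame at the start of the
    interval. Assumes a kinematic bicycle model with constant speed and
    steering angle during the interval.
    """
    dyaw = speed_mps * math.tan(steer_rad) / wheelbase_m * dt_s
    dist = speed_mps * dt_s
    if abs(dyaw) < 1e-9:
        # Straight-line motion: avoid dividing by a near-zero yaw change.
        return dist, 0.0, 0.0
    radius = dist / dyaw
    return radius * math.sin(dyaw), radius * (1.0 - math.cos(dyaw)), dyaw


def compose(pose, delta):
    """Apply a relative pose (given in the local frame) to a planar pose."""
    x, y, yaw = pose
    dx, dy, dyaw = delta
    return (x + dx * math.cos(yaw) - dy * math.sin(yaw),
            y + dx * math.sin(yaw) + dy * math.cos(yaw),
            yaw + dyaw)
```

Chaining `compose` over successive outputs of `wheel_odometry_step` accumulates the relative poses into a trajectory, which is also where odometry drift accumulates.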

S130: inputting the visual signals into a pre-trained network model to obtain three-dimensional semantic information corresponding to each frame of visual signals; the three-dimensional semantic information at least comprises position information and semantic category of each semantic grid in the three-dimensional space; the network model is obtained by training according to the sample visual signals and the labeling three-dimensional semantic information corresponding to each frame of sample visual signals in advance.

In the embodiment of the application, in order to improve the robustness of map construction and resist view-angle changes, the map can be constructed based on semantic grids (also called semantic occupancy grids). A semantic occupancy grid is a local space representation in which a space of size M x N x H is divided into M x N x H regularly arranged cells, each of which carries corresponding semantic information.

Specifically, a network model may be trained in advance. The input of the network model may be a visual signal, such as an image and/or a lidar point cloud, and the output is the three-dimensional semantic information corresponding to the visual signal, that is, the position information and semantic category of each semantic grid in the three-dimensional space corresponding to the visual signal. The semantic category may be one of the known object categories (of which there may be hundreds) or an unknown category. Known categories may include, for example, street lamp poles, parking space lines, sidewalks and buildings. Further, the network model may also output attribute information for each semantic grid, including but not limited to its material, its hardness, and whether or not it has an effect on the carrier.
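A semantic occupancy grid of this kind can be sketched as a small data structure. The class name, the cubic `cell_size` and the sparse dictionary storage are assumptions made for illustration; the application does not prescribe a storage layout.

```python
import math
from dataclasses import dataclass, field


@dataclass
class SemanticOccupancyGrid:
    m: int
    n: int
    h: int
    cell_size: float  # edge length of one cell in metres (assumed cubic)
    cells: dict = field(default_factory=dict)  # (i, j, k) -> semantic category

    def index_of(self, x, y, z):
        """Return the index of the cell containing (x, y, z), or None if outside."""
        idx = tuple(math.floor(c / self.cell_size) for c in (x, y, z))
        inside = all(0 <= a < b for a, b in zip(idx, (self.m, self.n, self.h)))
        return idx if inside else None

    def set_category(self, x, y, z, category):
        """Store a semantic category for the cell containing (x, y, z)."""
        idx = self.index_of(x, y, z)
        if idx is not None:
            self.cells[idx] = category
        return idx
```

Storing only occupied cells keeps memory proportional to the number of observed targets rather than to M x N x H.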

The training process of the network model may include: constructing an initial network; acquiring sample visual signals and labeling three-dimensional semantic information corresponding to each sample visual signal; labeling three-dimensional semantic information is obtained based on a preset labeling device or manual labeling; inputting the sample visual signals and the corresponding marked three-dimensional semantic information into an initial network to obtain predicted three-dimensional semantic information corresponding to each sample visual signal, and carrying out parameter adjustment on the initial network according to the predicted three-dimensional semantic information and the marked three-dimensional semantic information to obtain an initial network with recognition accuracy meeting the conditions, wherein the initial network is used as a network model.

The structure of the initial network can be any neural network structure as long as feature extraction and classification can be completed. The sample visual signal may be, for example, an image, and/or a lidar point cloud. It can be appreciated that the sample visual signal needs to be consistent with the visual signal used in the mapping process; for example, when the sample visual signal is an image, in the mapping process, the visual signal input into the network model may be only an image; when the sample visual signal is a laser radar point cloud, in the mapping process, the visual signal input into the network model can be only the laser radar point cloud, and other conditions are similar and are not listed here.

The labeling three-dimensional semantic information can be obtained based on a preset labeler or manual labeling. Specifically, when the visual signal is an image, the visual signal can be marked on the image directly by a person, and semantic information corresponding to each three-dimensional semantic grid is marked; or the image can be input into the annotator, and the annotator can output the three-dimensional semantic information corresponding to the image.

Fig. 2 shows a schematic representation of an original image and corresponding three-dimensional semantic information. As shown in fig. 2, in the three-dimensional semantic information corresponding to the original image, different pixel values may be used to represent different semantic categories, and the arrow in the figure shows the correspondence between a part of the target in the image and the target position in the three-dimensional semantic information.

S140: determining each key frame in the multi-sensor signal, determining a semantic grid of interest according to the semantic category of each semantic grid in each key frame for each key frame, and determining whether the semantic grid of interest is established or not according to the position information of the semantic grid of interest and the relative pose between the key frame and the last key frame for each semantic grid of interest, if so, executing step S150, and if not, executing step S160.

After the relative pose between frames and the three-dimensional semantic information of each sensor signal are obtained, the interested semantic grid can be subjected to three-dimensional reconstruction.

It will be appreciated that the period in which the sensor acquires the sensor signal is typically relatively short, typically in the order of milliseconds, in which case the difference between adjacent sensor signals will be relatively small. In the embodiment of the application, in order to improve the efficiency of map construction and improve the execution speed of an algorithm, some key frames can be firstly determined in all sensor signals, and then map construction can be carried out only according to the key frames. For example, a key frame may be determined according to a set selection rule, such as every 5 frames, or may be determined according to other selection rules, as well.
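The fixed-interval rule mentioned above can be sketched in a few lines. The function name and the default stride of 5 are illustrative; other selection rules would replace the slicing step.

```python
def select_keyframes(frames, stride=5):
    """Keep every `stride`-th frame, starting with the first one."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    return frames[::stride]
```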

Also, in the mapping process, not all objects appearing in the three-dimensional space need to be built into the map. In general, attention is focused on the inherent objects in the environment, such as buildings and lampposts. Attribute information such as the position and shape of these objects does not usually change over time, and these objects are the main points of attention when the map is used later. Moving targets in the environment, such as vehicles and pedestrians, appear only incidentally, and their positions and states are not fixed, so they are not targets that need to be captured when the map is constructed.

Therefore, in the embodiment of the application, when the map is constructed, the semantic grids of interest, namely the semantic grids of certain fixed categories, are first determined for each key frame according to the semantic category of each semantic grid in the key frame. A semantic grid of interest corresponds to an inherent target in the environment. Which semantic categories are treated as semantic grids of interest can be preset, and the embodiment of the application does not limit this.

After the semantic grids of interest are determined, they may be built into the map. As described above, the sensor signals acquired by the carrier partially overlap, and thus, when the current key frame is mapped, some of its semantic grids of interest may already have been established. In the embodiment of the application, for each semantic grid of interest, whether it has been established can be determined according to the position information of the semantic grid of interest and the relative pose between the key frame and the previous key frame. That is, based on this position information and relative pose, it may be determined whether the corresponding position in the current subgraph is already occupied; if so, the semantic grid of interest has been established.
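Selecting the semantic grids of interest amounts to filtering by a preset category set. In the sketch below, the category names and the `(position, category)` tuple layout are assumptions for illustration only.

```python
# Assumed preset of static categories; the application leaves this configurable.
INTERESTING = frozenset(
    {"building", "lamp_post", "lane_line", "parking_line", "sidewalk"}
)


def grids_of_interest(grids, categories=INTERESTING):
    """Keep only the grids whose semantic category is in the preset set.

    `grids` is an iterable of (position, category) pairs.
    """
    return [(pos, cat) for pos, cat in grids if cat in categories]
```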
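One way to perform this check is to transform the grid position from the key frame's local frame into the subgraph frame and look up the resulting cell. The sketch below assumes a simplified pose `(x, y, z, yaw)` accumulated from the relative poses between key frames, a cubic `cell_size`, and a subgraph stored as a dictionary keyed by cell index; none of these details are specified by the application.

```python
import math


def to_submap_cell(local_xyz, keyframe_pose, cell_size=0.5):
    """Map a grid centre from the key frame's local frame to a subgraph cell.

    keyframe_pose is (x, y, z, yaw) in the subgraph frame (a simplification
    of the 6-degree-of-freedom pose, assumed here for brevity).
    """
    lx, ly, lz = local_xyz
    px, py, pz, yaw = keyframe_pose
    gx = px + lx * math.cos(yaw) - ly * math.sin(yaw)
    gy = py + lx * math.sin(yaw) + ly * math.cos(yaw)
    gz = pz + lz
    return tuple(math.floor(c / cell_size) for c in (gx, gy, gz))


def is_established(submap, cell):
    """A grid counts as established if its cell is already occupied."""
    return cell in submap
```

The same physical target seen from two key frames lands in the same cell once each observation is transformed by its own key frame pose, which is what makes the occupancy lookup meaningful.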

S150: determining whether the semantic category of the interesting semantic grid is consistent with the semantic category of the established interesting semantic grid, if so, determining the accumulated observation times of the interesting semantic grid in the key frame, and identifying the interesting semantic grid as mature when the accumulated observation times reach a preset value.

When it is determined that the semantic grid of interest has been established, in order to avoid generating duplicate semantic grids, the semantic grid of interest is no longer established, and the established semantic grid is checked based only on the current key frame to confirm whether it is accurate. Specifically, it may be determined whether the semantic category of the semantic grid of interest is consistent with the semantic category of the established semantic grid of interest, and if so, it indicates that the two observations are consistent, in which case the cumulative number of observations of the semantic grid of interest in the keyframe, that is, the number of observations of the semantic grid of interest, may be further determined. When the cumulative number of observations reaches a preset value (e.g., 5, 6, 8, etc.), the semantic grid of interest is identified as mature.

If the same semantic grid of interest appears in multiple key frames and the observations of its location and category are consistent, the semantic grid can be added to the map as a mature semantic grid. When the observations are confirmed to be consistent but the cumulative number of observations has not yet reached the preset value, the cumulative number of observations is recorded and the semantic grid of interest remains identified as immature.

S160: the semantic grid of interest is established in the current subgraph and identified as immature.

When it is determined that the semantic grid of interest is not established, the semantic grid of interest may be established in the current subgraph and identified as immature. That is, the semantic grid of interest is observed only once, and cannot be added to the map as a mature target, requiring multiple subsequent checks.
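Steps S150 and S160 together can be sketched as a single update function. The record layout, the function name `observe` and the default preset value of 5 are assumptions; the application also does not state what happens when the categories disagree, so this sketch simply leaves the record unchanged in that case.

```python
from dataclasses import dataclass

MATURE_AFTER = 5  # assumed preset value for the cumulative observation count


@dataclass
class GridRecord:
    category: str
    observations: int = 1
    mature: bool = False


def observe(submap, cell, category, mature_after=MATURE_AFTER):
    """Update `submap` (dict: cell index -> GridRecord) with one observation."""
    record = submap.get(cell)
    if record is None:
        # Not yet established: create it and identify it as immature (S160).
        submap[cell] = GridRecord(category)
        return submap[cell]
    if record.category == category:
        # Consistent re-observation: count it and check maturity (S150).
        record.observations += 1
        if record.observations >= mature_after:
            record.mature = True
    # On a category mismatch the record is left unchanged (assumption).
    return record
```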

S170: and judging whether the current subgraph is constructed completely, and if so, adding all mature interesting semantic grids in the current subgraph into the three-dimensional map.

For example, whether the construction is completed can be determined according to the area size of the current sub-image, the number of included targets, the integrity of constructed targets and the like, when the construction of the current sub-image is completed, only all mature interesting semantic grids in the current sub-image are added to the three-dimensional map, and immature interesting semantic grids are not added to the three-dimensional map.
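Committing a finished subgraph can be sketched as a filter over its records. The sketch assumes that the subgraph and the three-dimensional map share one coordinate frame and that each subgraph entry is a `(category, mature)` pair; both are simplifications made for illustration.

```python
def commit_submap(submap, global_map):
    """Copy only the mature grids of a finished subgraph into the map.

    `submap` maps cell index -> (category, mature);
    `global_map` maps cell index -> category and is updated in place.
    """
    for cell, (category, mature) in submap.items():
        if mature:
            global_map[cell] = category
    return global_map
```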

Fig. 3 shows a schematic diagram of an example mapping result. As can be seen, the constructed targets include lane lines, marking arrows, parking space lines, posts, sidewalks, and other fixed objects.

In the embodiment of the application, odometer errors are cumulative: when global observations are absent or have poor accuracy, the error grows gradually as the system runs. Therefore, to improve the accuracy of map construction, loop detection and global optimization can be performed. Specifically, when the carrier returns to the same place, the system recognizes that the area has been visited before, establishes constraints between the corresponding data, and then makes the map more accurate through global optimization.

In one implementation, the loop detection and global optimization process may include the steps of:

Step one: based on each key frame or the current sub-graph, it is determined whether to trigger a loop.

That is, it may be determined from each key frame or from the current sub-graph whether the carrier has returned to the same location. For example, for each key frame, a current description vector corresponding to the key frame may be calculated, and the reference description vectors of the key frames in the three-dimensional map may be searched for a target description vector whose similarity to the current description vector is greater than a first preset threshold; if one exists, it is determined that a loop is triggered. Alternatively, it may be detected whether the current sub-graph overlaps any historical sub-graph, or whether the distance between a target in the current sub-graph and a target in any historical sub-graph is smaller than a second preset threshold; if so, it is determined that a loop is triggered.

If the description vectors of two key frames are highly similar, the current sub-graph overlaps a historical sub-graph, or the distance between a target in the current sub-graph and a target in a historical sub-graph is small, the carrier may have returned to the same place. In that case, semantic grids of the same target may appear in multiple key frames, and without global optimization, artifacts such as ghosting (double images) may appear and degrade the accuracy of map construction.

The first preset threshold may be set as required, for example to 80%, 90%, or 95%; the embodiment of the present application does not limit its specific value. The second preset threshold may also be set as required, for example as the distance between the two sub-graphs multiplied by a divergence rate. The divergence rate corresponds to an estimate of the accuracy of the odometer and may be preset to, for example, 0.5%, 1%, or 1.5%.
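The two trigger conditions can be sketched in Python as follows. This is an illustration under stated assumptions: cosine similarity is used as the similarity measure (the text does not name one), and the function and parameter names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def loop_by_descriptor(current, references, first_threshold=0.9):
    """Trigger a loop if any reference description vector is similar enough."""
    return any(cosine_similarity(current, ref) > first_threshold
               for ref in references)


def loop_by_distance(target_distance, subgraph_distance, divergence_rate=0.01):
    """Trigger a loop if two targets are closer than the second threshold.

    The second threshold is the distance between the two sub-graphs
    multiplied by the divergence rate, as described in the text.
    """
    second_threshold = subgraph_distance * divergence_rate
    return target_distance < second_threshold
```

For example, with a divergence rate of 1% and a sub-graph distance of 100 m, the second threshold is 1 m, so two targets 0.5 m apart would trigger a loop.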

Step two: when the loop is triggered, determining a first sub-graph corresponding to the first arrival of the carrier at the target position and a second sub-graph corresponding to the second arrival of the carrier at the target position, calculating the offset between the first sub-graph and the second sub-graph, and correcting the three-dimensional map according to the offset.

Specifically, the pose x of each key frame and the relative pose Δx between each key frame and the previous key frame may be obtained first, where x comprises a position with 3 degrees of freedom and a rotation with 3 degrees of freedom. The offset Δx_{t-α} between the first sub-graph and the second sub-graph is then calculated, where α is the moment when the carrier first arrives at the target position and t is the moment when the carrier arrives at the target position for the second time. Finally, the following nonlinear optimizer is constructed to obtain the correction of each x when the error e_{ij} is minimized:

After the correction amount of each x is obtained, the position of each target in the map can be corrected according to the correction amount, and the accuracy of map construction is improved.
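As a simplified illustration of this kind of correction, the Python sketch below handles translations only, so that the least-squares problem becomes linear and has a closed-form solution. It is not the nonlinear 6-degree-of-freedom optimizer of the embodiment: rotations are omitted, all constraints are assumed to have equal weight, and the function name `correct_chain` is hypothetical. Under those assumptions, each of the n odometry steps and the loop constraint absorb an equal 1/(n+1) share of the loop error.

```python
def correct_chain(odometry, loop_offset):
    """Equal-weight least-squares correction of a chain of key frame positions.

    odometry:    list of relative translations (dx, dy, dz) between
                 consecutive key frames.
    loop_offset: measured translation from the first to the last key frame,
                 e.g. obtained by matching the two sub-graphs.
    Returns the corrected positions, with the first key frame at the origin.
    """
    n = len(odometry)
    dims = len(loop_offset)
    # Displacement accumulated by odometry along the whole chain.
    total = [sum(step[k] for step in odometry) for k in range(dims)]
    # Loop error, split evenly over n odometry steps plus the loop constraint.
    share = [(total[k] - loop_offset[k]) / (n + 1) for k in range(dims)]
    positions = [tuple(0.0 for _ in range(dims))]
    for step in odometry:
        prev = positions[-1]
        positions.append(tuple(prev[k] + step[k] - share[k] for k in range(dims)))
    return positions
```

For a square loop whose last side is measured 0.1 m short, `correct_chain([(1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -0.9, 0)], (0, 0, 0))` spreads the 0.1 m error over the four steps and the loop constraint, leaving a residual of about 0.02 m at the last key frame.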

As an implementation manner of the embodiment of the present application, after the current sub-graph is constructed, description information of each target included in the current sub-graph may be extracted, and the description information may be inserted into the map manager in an octree manner. Wherein, the above description information may include: information on the target class, shape, etc.

The map manager has two functions. First, it fixes the density of the map elements output by the local mapping module and supports efficient addition, deletion, and modification of map elements. For example, if the voxel size is set to 2 cm × 2 cm × 2 cm, there is at most one map element within each cube of side length 2 cm. Second, during loop detection and optimization, after a detected loop has been used to optimize the key frame poses, the map manager updates the positions of all corresponding occupancy grids and, using the octree, filters out the redundant occupancy grids that have merged.
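The one-element-per-voxel behavior and the merge after a pose update can be sketched as follows. Note the substitution: the sketch uses a hash map keyed by voxel index as a simple stand-in for the octree named in the text, and the class and method names (`VoxelMap`, `insert`, `update_positions`) are hypothetical.

```python
import math


class VoxelMap:
    """Keeps at most one map element per voxel (hash map stand-in for an octree)."""

    def __init__(self, voxel_size=0.02):  # 2 cm voxels, as in the example
        self.voxel_size = voxel_size
        self.elements = {}  # voxel index -> (position, category)

    def key(self, position):
        return tuple(math.floor(c / self.voxel_size) for c in position)

    def insert(self, position, category):
        """Insert an element; return False if its voxel is already occupied."""
        k = self.key(position)
        if k in self.elements:
            return False
        self.elements[k] = (position, category)
        return True

    def update_positions(self, correct):
        """Apply a position correction and drop elements that merge.

        correct: callable mapping an old position to its corrected position,
                 e.g. derived from the optimized key frame poses.
        """
        old = list(self.elements.values())
        self.elements = {}
        for position, category in old:
            self.insert(correct(position), category)
```

After `update_positions`, two elements whose corrected positions fall into the same voxel are reduced to one, which mirrors the filtering of redundant occupancy grids described above.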

Fig. 4 is a schematic diagram of a mapping system according to an embodiment of the present application. The mapping scheme provided by the embodiment is described in detail below with reference to Fig. 4.

The mapping scheme is built on SLAM (simultaneous localization and mapping) technology. Building on SLAM allows the map construction system to construct a map without global position information such as GPS.

As shown in Fig. 4, the system framework of the present solution is divided into four blocks: a multi-sensor odometer, a semantic occupancy grid sensing module, a local mapping module, and a loop detection and global optimization module. The final mapping result is collected in a map manager.

The input of the multi-sensor odometer is a set of multi-sensor signals synchronized to millisecond precision. The main sensor is the wheel sensor; other sensors, such as single-channel or multi-channel image data, lidar, and inertial navigation, may be added. Its main function is to estimate the pose through multi-sensor fusion and then output the relative pose between frames.
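What "relative pose between frames" means can be shown with a small Python sketch. It assumes a planar pose (x, y, yaw) for brevity, whereas the poses in the embodiment have 6 degrees of freedom, and the function name `relative_pose` is hypothetical.

```python
import math


def relative_pose(prev, curr):
    """Pose of the current frame expressed in the previous frame.

    prev, curr: planar poses (x, y, yaw) in a common reference frame,
                with yaw in radians.
    """
    dx = curr[0] - prev[0]
    dy = curr[1] - prev[1]
    c, s = math.cos(prev[2]), math.sin(prev[2])
    # Rotate the displacement into the previous frame's axes.
    rx = c * dx + s * dy
    ry = -s * dx + c * dy
    # Wrap the heading change into (-pi, pi].
    diff = curr[2] - prev[2]
    dyaw = math.atan2(math.sin(diff), math.cos(diff))
    return rx, ry, dyaw
```

For a carrier heading along +y that advances 1 m, the relative pose is approximately (1, 0, 0): one meter forward in its own frame with no change of heading.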

The semantic occupancy grid sensing module takes the multi-sensor signal as input and outputs the semantic occupancy grid.

The input may be multi-frame camera images and/or a lidar point cloud. Through neural network computation, the module gives the 3D position of each grid and its corresponding semantic information at the current time. Specifically, the semantics of each spatial grid may be one of the known object categories (of which there are hundreds) or an unknown category. The module may also calculate attributes of each spatial grid, including but not limited to material, hardness, and whether the grid has an effect on the robot.

Given the semantic grids at each moment and the accurate pose at the corresponding moment, the local mapping module performs three-dimensional reconstruction of the semantic grids it is interested in. An underground parking garage is used as the example scenario below; the scheme can work in any scenario where the semantic occupancy grid sensing module works, including indoor areas, outdoor areas, underground garages, and public roads.

When a frame is determined to be a key frame, the algorithm determines, for each semantic occupancy grid, whether it has already been established. If not, the system generates a new occupancy grid at the corresponding location in the map. This check is made each time an occupancy grid is generated from a new key frame, which avoids generating duplicate occupancy grids at the same place. For each occupancy grid, the category and maturity are calculated from observations in successive frames. For example, assume that an occupancy grid is determined to belong to the parking space line category in a key frame and that its corresponding occupancy grid has been established. The occupancy grid then checks whether the observations at the corresponding position in other nearby frames are consistent. Consistency has two parts: first, whether the categories agree; second, whether the relative position of the corresponding occupancy grid observed in the other frames is within a certain range. If both are satisfied, the occupancy grid is determined to be mature and added to the map.
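The two-part consistency test can be written as a short Python sketch. The function name `is_consistent` and the 5 cm tolerance `max_offset` are assumptions for illustration; the text only says the position must be "within a certain range".

```python
import math


def is_consistent(grid_category, grid_position,
                  obs_category, obs_position, max_offset=0.05):
    """Check a new observation against an established occupancy grid.

    Both conditions must hold: the categories agree, and the observed
    position lies within max_offset (meters, assumed) of the grid position.
    """
    same_category = grid_category == obs_category
    close_enough = math.dist(grid_position, obs_position) <= max_offset
    return same_category and close_enough
```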

Because odometer errors are cumulative, when global observations are absent or have poor accuracy, the error grows gradually as the system runs. With the loop detection module, when the carrier returns to the same place, the system recognizes that the area has been visited before, establishes constraints between the corresponding data, and then makes the map more accurate through global optimization.

Loop detection may take the form of visual loop detection, which converts each frame's image into a bag-of-words description vector based on visual features and, when a new key frame is inserted, searches for similar description vectors. Alternatively, when the algorithm detects that the current sub-graph overlaps other historical sub-graphs, or that the distance between a map object in the current sub-graph and a map object in another sub-graph is less than a threshold, the system considers that a loop may have occurred and matches the current sub-graph against all sub-graphs that may overlap it.
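A bag-of-words description vector and the search for a similar key frame can be sketched in Python as below. The sketch assumes each image has already been reduced to a list of visual word indices by some feature extractor and vocabulary, which are outside its scope; `bow_vector` and `best_match` are hypothetical names.

```python
import math
from collections import Counter


def bow_vector(word_ids, vocabulary_size):
    """Unit-length histogram of visual word occurrences for one frame."""
    counts = Counter(word_ids)
    vec = [counts.get(i, 0) for i in range(vocabulary_size)]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def best_match(query, database):
    """Return (frame_id, score) of the most similar stored vector.

    database: dict mapping key frame id -> unit-length description vector.
    The score is the dot product, i.e. cosine similarity for unit vectors.
    """
    best_id, best_score = None, 0.0
    for frame_id, vec in database.items():
        score = sum(q * v for q, v in zip(query, vec))
        if score > best_score:
            best_id, best_score = frame_id, score
    return best_id, best_score
```

The returned score can then be compared with the similarity threshold to decide whether the new key frame triggers a loop.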

Either method can complete loop detection. After the new and old sub-graphs have been matched, the algorithm calculates the offset between the two sub-graphs and performs global optimization to correct the related trajectory and map elements in the map.

The map manager manages all map elements in a hierarchical form. During the mapping process, the odometer module inserts key frames into the local mapping module, and each key frame is used to generate occupancy grids, which are inserted into the subgraph. When a subgraph is judged to be complete, object-level description information, such as map targets like lane lines, arrows, and sidewalks, can be extracted from the subgraph and inserted into the map manager.

Fig. 5 shows a schematic structural diagram of a map building apparatus according to an embodiment of the present application, where the apparatus includes:

a signal acquisition module 510, configured to acquire a multi-sensor signal, where the multi-sensor signal includes at least: wheel sensor signals and visual signals; the time synchronization among the multiple sensor signals is completed;

the pose estimation module 520 is configured to perform pose estimation on the carrier based at least on the wheel sensor signals, to obtain a relative pose between every two adjacent frames of sensor signals;

The semantic determining module 530 is configured to input the visual signal into a pre-trained network model, so as to obtain three-dimensional semantic information corresponding to the visual signal of each frame; the three-dimensional semantic information at least comprises position information and semantic category of each semantic grid in the three-dimensional space; the network model is obtained by training in advance according to a sample visual signal and labeling three-dimensional semantic information corresponding to each frame of the sample visual signal;

A grid judging module 540, configured to determine each key frame in the multi-sensor signal, determine, for each key frame, a semantic grid of interest according to a semantic category of each semantic grid in the key frame, and determine, for each semantic grid of interest, whether the semantic grid of interest has been established according to position information of the semantic grid of interest and a relative pose between the key frame and a previous key frame;

A grid observation module 550, configured to determine, when the grid determination module 540 determines that the semantic grid of interest has been established, whether the semantic category of the semantic grid of interest is consistent with the semantic category of the established semantic grid of interest, and if so, determine a cumulative number of observations of the semantic grid of interest in a keyframe, and identify the semantic grid of interest as mature when the cumulative number of observations reaches a preset value;

A grid establishment module 560 for establishing the semantic grid of interest in the current subgraph and identifying the semantic grid of interest as immature when the grid determination module 540 determines that the semantic grid of interest is not established;

The map construction module 570 is configured to determine whether the current sub-graph is constructed, and if so, add all mature semantic grids of interest in the current sub-graph to the three-dimensional map.

Optionally, the apparatus further includes:

Optionally, the loop detection module is specifically configured to:

Optionally, the map modification module is specifically configured to:

Constructing an initial network;

Optionally, the apparatus further includes:

The apparatus embodiment corresponds to the method embodiment and has the same technical effects. It is based on the method embodiment; for specific descriptions, refer to the method embodiment section, which is not repeated herein.

Next, referring to fig. 6, fig. 6 is a schematic structural diagram of a computer device according to an embodiment of the present application, where the computer device includes:

One or more processors 40;

the processor 40 is coupled to a storage means 41, which storage means 41 is adapted to store one or more programs,

When the one or more programs are executed by the one or more processors 40, the computer device is caused to implement the map construction method described with reference to Figs. 1 to 4.

The embodiment of the application also provides a computer readable storage medium, on which a computer program is stored, which when being executed by a processor, implements the technical scheme of a map construction method as described in fig. 1 to 4.

The present application provides a computer program product comprising a computer program which, when executed by a processor, implements the technical solution of a map construction method as described in fig. 1 to 4.

Those of ordinary skill in the art will appreciate that: the drawing is a schematic diagram of one embodiment and the modules or flows in the drawing are not necessarily required to practice the application.

Those of ordinary skill in the art will appreciate that: the modules in the apparatus of the embodiments may be distributed in the apparatus of the embodiments according to the description of the embodiments, or may be located in one or more apparatuses different from the present embodiments with corresponding changes. The modules of the above embodiments may be combined into one module, or may be further split into a plurality of sub-modules.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims

1. A method of map construction, the method comprising:

2. The method according to claim 1, wherein the method further comprises:

3. The method of claim 2, wherein the step of determining whether to trigger a loop based on the key frames or the current sub-graph comprises:

4. The method of claim 2, wherein the step of calculating an offset between the first sub-graph and the second sub-graph and correcting the three-dimensional map based on the offset comprises:

5. The method of any of claims 1-4, wherein the three-dimensional semantic information further comprises: attribute information of each semantic grid, the attribute information including at least one of: material, hardness, and whether the semantic grid has an effect on the carrier.

6. The method of any of claims 1-4, wherein the training process of the network model comprises:

Constructing an initial network;

7. The method according to any one of claims 1-4, further comprising:

8. A map construction apparatus, characterized in that the apparatus comprises:

9. The apparatus of claim 8, wherein the apparatus further comprises:

10. The device according to claim 9, wherein the loop detection module is specifically configured to:

11. The apparatus of claim 9, wherein the map modification module is specifically configured to:

12. The apparatus of any of claims 8-11, wherein the three-dimensional semantic information further comprises: attribute information of each semantic grid, the attribute information including at least one of: material, hardness, and whether the semantic grid has an effect on the carrier.

13. The apparatus according to any one of claims 8-11, wherein the apparatus further comprises: a model training module; the model training module is specifically configured to:

Constructing an initial network;

14. The apparatus according to any one of claims 8-11, wherein the apparatus further comprises:

15. A computer device, comprising: a memory and a processor, the memory and the processor coupled;

The memory is used for storing one or more computer instructions;

The processor is configured to execute the one or more computer instructions to implement the mapping method of any of claims 1 to 7.

16. A readable storage medium having stored thereon one or more computer instructions, the instructions being executable by a processor to implement the mapping method of any of claims 1 to 7.