CN110853085B - Semantic SLAM-based mapping method and device and electronic equipment - Google Patents


Info

Publication number
CN110853085B
Authority
CN
China
Prior art keywords
current frame
frame
point
semantic
camera pose
Prior art date
Legal status
Active
Application number
CN201810955924.5A
Other languages
Chinese (zh)
Other versions
CN110853085A (en)
Inventor
杨帅
Current Assignee
Shenzhen Horizon Robotics Science and Technology Co Ltd
Original Assignee
Shenzhen Horizon Robotics Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Horizon Robotics Science and Technology Co Ltd
Priority to CN201810955924.5A
Publication of CN110853085A
Application granted
Publication of CN110853085B
Legal status: Active
Anticipated expiration


Classifications

    • G06T7/55: Depth or shape recovery from multiple images (G06T7/00 Image analysis; G06T7/50 Depth or shape recovery)
    • G06T2207/10016: Video; Image sequence
    • G06T2207/10028: Range image; Depth image; 3D point clouds
    • G06T2207/30252: Vehicle exterior; Vicinity of vehicle

Abstract

A semantic SLAM-based mapping method and apparatus and an electronic device are disclosed. According to one embodiment, the semantic SLAM-based mapping method may include: obtaining a current frame image through a camera; processing the current frame image with a semantic segmentation algorithm to obtain semantic information of the current frame image; optimizing a camera pose corresponding to the current frame by using a reference frame in a manner that minimizes a reconstructed semantic error; and determining, based on the optimized camera pose, depth information for immature points on the reference frame that lack depth information and for the corresponding points on the current frame, so as to form mature points with depth information. In this way, the camera pose corresponding to an image frame can be optimized based on semantic information, and an accurate map point cloud can be obtained more stably.

Description

Semantic SLAM-based mapping method and device and electronic equipment
Technical Field
The present application relates to the field of image processing, and more particularly, to a semantic SLAM-based mapping method and apparatus, an electronic device, and a mobile apparatus including the electronic device.
Background
High-precision maps are a key component of autonomous driving technology and are critical to vehicle navigation, positioning, control, and safety. A low-cost, easily updated method of constructing high-precision maps is therefore needed.
In recent years, with improvements in algorithm accuracy and hardware computing power, SLAM (Simultaneous Localization and Mapping) has been applied on a larger scale to the construction of high-precision maps, in order to obtain dense or semi-dense maps. Broadly, SLAM mapping methods can be classified into the feature point method and the direct method, according to whether feature points are used. The feature point method optimizes the camera pose and feature point coordinates through feature point matching, minimizing reprojection errors. The direct method is based on the photometric-constancy assumption, namely that pixels corresponding to the same spatial point have the same photometric value across consecutive frame images, and optimizes the camera pose and point coordinates by minimizing photometric errors. Because the direct method does not need to detect and match feature points and can yield dense or semi-dense maps, it is more widely used in the construction of high-precision maps. However, the photometric-constancy assumption does not always hold, owing to factors such as uneven light sources, reflections from objects, and variations in camera exposure time.
Therefore, there is still a need for an improved SLAM-based mapping scheme that enables further improved mapping accuracy.
Disclosure of Invention
The present application is proposed to solve at least one of the above and other technical problems. Embodiments of the present application provide a semantic SLAM-based mapping method and apparatus, an electronic device, and a movable apparatus including the electronic device. In these embodiments, the camera pose is optimized based on semantic information, so that an accurate map point cloud can be obtained more stably.
According to one aspect of the application, a mapping method based on semantic SLAM is provided, which comprises the following steps: obtaining a current frame image through a camera; processing the current frame image by using a semantic segmentation algorithm to obtain semantic information of the current frame image; optimizing a camera pose corresponding to the current frame by utilizing a reference frame in a mode of minimizing a reconstructed semantic error; and determining, based on the optimized camera pose, depth information for an immature point on the reference frame that does not have depth information and a corresponding point on the current frame to form a mature point with depth information.
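The steps above can be sketched as a single per-frame routine. The following is a minimal Python sketch; all function and variable names are illustrative assumptions, not from the patent, and the callables stand in for the real components:

```python
# Minimal sketch of one iteration of the semantic-SLAM mapping loop
# described above. The callables are stand-ins for the real components.

def process_frame(image, reference_frame, segment, optimize_pose, filter_depth):
    """Segment the frame, optimize the camera pose against the reference
    frame by minimizing the reconstructed semantic error, then recover
    depth for immature points (maturation)."""
    semantics = segment(image)                         # per-pixel class probabilities
    pose = optimize_pose(reference_frame, semantics)   # minimize semantic error
    mature_points = filter_depth(reference_frame, semantics, pose)
    return pose, mature_points
```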
In the above semantic SLAM-based mapping method, the method further includes: determining whether the current frame is a key frame; discarding the current frame if it is not a key frame; and if the current frame is a key frame, saving the current frame in the constructed map, wherein the reference frame is the previous key frame.
In the above mapping method based on semantic SLAM, determining whether the current frame is a key frame includes: determining whether the current frame is a key frame based on semantic differences between the current frame and the reference frame; alternatively, it is determined whether the current frame is a key frame based on the pose change of the camera.
In the mapping method based on the semantic SLAM, the method further comprises the following steps: selecting a plurality of consecutive key frames using a sliding window; and optimizing the plurality of consecutive keyframes using a non-linear optimization method to determine more accurate camera pose and depth information.
In the mapping method based on the semantic SLAM, the method further comprises the following steps: detecting and generating a new immature point on the current frame image.
In the above semantic SLAM-based mapping method, optimizing the camera pose corresponding to the current frame by using the reference frame in a manner that minimizes the reconstructed semantic error includes: determining a prior camera pose corresponding to the current frame; calculating a reprojected point, on the current frame, of a point on the reference frame, based on the prior camera pose; and adjusting the prior camera pose to minimize the difference between the semantics of the point on the current frame corresponding to the point on the reference frame and the semantics of the reprojected point on the current frame.
In the mapping method based on the semantic SLAM, the determining the prior camera pose corresponding to the current frame includes: determining a camera pose corresponding to the current frame as the a priori camera pose based on the camera pose corresponding to the reference frame and camera motion information provided by an in-vehicle sensor.
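Forming the prior from the reference-frame pose and sensor-reported motion amounts to a pose composition. The sketch below uses SE(2) (planar motion) for brevity, whereas the vehicle case in the text would use a full 6-DoF pose; all names are assumptions:

```python
import math

def prior_pose(pose_ref, delta):
    """Compose the reference frame's pose (x, y, heading) with the
    sensor-reported relative motion (dx, dy, dtheta), expressed in the
    reference frame, to obtain the prior pose of the current frame."""
    x, y, th = pose_ref
    dx, dy, dth = delta
    return (x + dx * math.cos(th) - dy * math.sin(th),
            y + dx * math.sin(th) + dy * math.cos(th),
            th + dth)
```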
In the above semantic SLAM-based mapping method, the semantic information includes probability values of the pixel points belonging to various categories.
According to another aspect of the present application, there is provided a mapping apparatus based on semantic SLAM, including: the semantic segmentation unit is used for processing the current frame image to obtain semantic information of the current frame image; the camera pose optimization unit is used for optimizing the camera pose corresponding to the current frame by utilizing the reference frame in a mode of minimizing a reconstructed semantic error; and a maturing unit for determining an immature point on the reference frame having no depth information and depth information of a corresponding point on the current frame based on the optimized camera pose to form a mature point having depth information.
In the mapping apparatus based on semantic SLAM, the mapping apparatus further includes: a key frame identification unit for determining whether the current frame is a key frame and saving the current frame in the constructed map in response to the current frame being a key frame.
In the above semantic SLAM-based mapping apparatus, the determining, by the key frame identification unit, whether the current frame is a key frame includes: determining whether the current frame is a key frame based on semantic differences between the current frame and the reference frame; alternatively, it is determined whether the current frame is a key frame based on the pose change of the camera.
In the mapping apparatus based on semantic SLAM, the mapping apparatus further includes: a sliding window optimization unit to select a plurality of consecutive keyframes using a sliding window and optimize the plurality of consecutive keyframes using a non-linear optimization method to determine more accurate camera pose and depth information and to detect and generate new immature points on the current frame image.
In the mapping apparatus based on semantic SLAM, the camera pose optimization unit is configured to: determining a prior camera pose corresponding to the current frame; calculating a reprojection point of a point on the reference frame on the current frame based on the a priori camera pose; and adjusting the prior camera pose to minimize a difference between semantics of a point on the current frame corresponding to the point on the reference frame and semantics of a re-projected point on the current frame.
In the semantic SLAM-based mapping apparatus, the determining, by the camera pose optimization unit, the prior camera pose corresponding to the current frame includes: and determining the camera pose corresponding to the current frame as the prior camera pose based on the camera pose corresponding to the reference frame and the camera motion information provided by the vehicle-mounted sensor.
In the mapping device based on the semantic SLAM, the semantic information includes probability values of the pixel points belonging to various categories.
According to still another aspect of the present application, there is provided an electronic apparatus including: a processor; and a memory having stored therein computer program instructions that, when executed by the processor, cause the processor to perform the semantic SLAM-based mapping method as described above.
According to a further aspect of the application, there is provided a mobile apparatus comprising an electronic device as described above.
According to yet another aspect of the present application, there is provided a computer readable medium having stored thereon computer program instructions which, when executed by a processor, cause the processor to perform the semantic SLAM-based mapping method as described above.
Compared with the prior art, by adopting the semantic SLAM-based mapping method and device, the electronic equipment and the movable device comprising the electronic equipment, the current frame image can be obtained through the camera; processing the current frame image by using a semantic segmentation algorithm to obtain semantic information of the current frame image; optimizing a camera pose corresponding to the current frame by utilizing a reference frame in a mode of minimizing a reconstructed semantic error; and determining, based on the optimized camera pose, depth information for an immature point on the reference frame that does not have depth information and a corresponding point on the current frame to form a mature point having depth information. Therefore, the camera pose corresponding to the image frame can be optimized based on the semantic information, errors caused by factors such as luminosity change and the like are avoided, and accurate map point clouds can be obtained more stably.
Drawings
The above and other objects, features and advantages of the present application will become more apparent by describing in more detail embodiments of the present application with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the principles of the application. In the drawings, like reference numbers generally represent like parts or steps.
FIG. 1 illustrates a flow chart of a semantic SLAM based mapping method according to an embodiment of the present application;
FIG. 2 illustrates a schematic diagram of optimizing camera pose by minimizing reconstructed semantic errors according to an embodiment of the application;
FIG. 3 is a schematic diagram illustrating an implementation of a semantic SLAM-based mapping method by an in-vehicle system according to an embodiment of the present application;
FIG. 4 illustrates a block diagram of a semantic SLAM-based mapping apparatus according to an embodiment of the present application;
FIG. 5 illustrates a block diagram of an electronic device in accordance with an embodiment of the application.
Detailed Description
Hereinafter, example embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be understood that the described embodiments are only some embodiments of the present application and not all embodiments of the present application, and that the present application is not limited by the example embodiments described herein.
Summary of the application
As described above, SLAM plays an increasingly large role in the construction of high-precision maps. Conventionally, SLAM methods can be classified into the feature point method and the direct method, according to whether feature points are used. The former optimizes the camera pose and feature point coordinates through feature point matching, minimizing reprojection errors. The latter is based on the photometric-constancy assumption, that is, that pixels corresponding to the same spatial point have the same photometric value across consecutive frame images, and optimizes the camera pose and feature point coordinates by minimizing photometric errors. Because the direct method does not need to detect and match feature points and can yield dense or semi-dense maps, it is more widely used in the construction of high-precision maps. However, the photometric-constancy assumption does not always hold, owing to the influence of uneven light sources, reflections from objects, variations in camera exposure time, and the like.
To address this technical problem, the basic concept of the present application is to use semantic information to optimize the camera pose and map point coordinates by minimizing the semantic error, that is, the difference between the semantic categories of the pixels onto which a spatial point projects in different images, so as to construct a high-precision map with semantic information.
In recent years, with the development of deep learning algorithms, semantic perception has improved greatly. Compared with the photometric values of the original image, a semantic perception result contains higher-level information and is more robust to external disturbances. Therefore, replacing the traditional photometric measure with a higher-level semantic measure can further improve the system's robustness to disturbances.
Specifically, the semantic SLAM-based mapping method and apparatus provided by the embodiments of the present application can obtain a current frame image through a camera; process the current frame image with a semantic segmentation algorithm to obtain its semantic information; optimize the camera pose corresponding to the current frame by using a reference frame in a manner that minimizes the reconstructed semantic error; and determine, based on the optimized camera pose, depth information for the immature points on the reference frame without depth information and the corresponding points on the current frame, so as to form mature points with depth information. In this way, the camera pose corresponding to an image frame can be optimized based on semantic information, and by using this higher-level semantic information, the acquired semi-dense map point cloud can be more accurate and more robust to disturbances. Through post-processing (such as point cloud cleaning, stitching, and vectorization), a final high-precision map with semantics can be formed.
Here, the semantic SLAM-based mapping method and apparatus and the electronic device according to the embodiments of the present application may be applied to various mobile apparatuses that require a SLAM function, such as, but not limited to, a vehicle, various industrial or household mobile robots, and the like. Examples of mobile robots include sweeping robots, weeding robots, logistic transfer robots, and the like.
Having described the basic principles of the present application, various non-limiting embodiments of the present application will now be described with reference to the accompanying drawings.
Exemplary method
Fig. 1 illustrates a flowchart of a semantic SLAM-based mapping method according to an embodiment of the present application.
As shown in fig. 1, the mapping method based on semantic SLAM according to the embodiment of the present application includes: s110, obtaining a current frame image through a camera; s120, processing the current frame image by using a semantic segmentation algorithm to obtain semantic information of the current frame image; s130, optimizing the camera pose corresponding to the current frame by using the reference frame in a mode of minimizing a reconstructed semantic error; and S140, determining the depth information of the immature point without depth information on the reference frame and the corresponding point on the current frame based on the optimized camera pose to form a mature point with depth information.
In step S110, a current frame image is obtained by the camera. Here, in the embodiments of the present application, the camera is not limited to an image-capturing device in the narrow sense; it may also include a lidar, an ultrasonic radar, or any other sensor that can be used to acquire images. Through these imaging elements, consecutive video images, including the current frame image, can be obtained while the movable apparatus carrying them moves.
In step S120, the current frame image is processed using a semantic segmentation algorithm to obtain its semantic information. Here, examples of semantic segmentation algorithms include non-deep-learning algorithms such as thresholding, clustering, and graph partitioning, as well as deep-learning-based algorithms such as fully convolutional networks, dilated convolution, and Conditional Random Field (CRF) post-processing. Embodiments of the present application are not limited to these known algorithms; semantic segmentation algorithms developed in the future may also be employed.
For example, a deep-learning-based algorithm can produce, from one frame of the original image, a semantic segmentation result, that is, the probability that each pixel in the original image belongs to each semantic category. For example, the probabilities that a pixel belongs to a building, a lane line, a sign board, and a street lamp might be 0.1, 0.05, 0.05, and 0.8, respectively; these probability values constitute the semantic segmentation image used by the system.
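The per-pixel probability vectors just described can be represented and queried as follows. This is a hypothetical sketch, not code from the patent; the class list and values mirror the example above:

```python
# One pixel's semantic segmentation result: a probability per class.
CLASSES = ("building", "lane line", "sign board", "street lamp")

def best_class(probs):
    """Return the most likely class and its probability for one pixel."""
    i = max(range(len(probs)), key=lambda k: probs[k])
    return CLASSES[i], probs[i]
```

For the example probabilities above, `best_class((0.1, 0.05, 0.05, 0.8))` picks `street lamp`.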
In step S130, the camera pose corresponding to the current frame is optimized by using the reference frame in a manner that minimizes the reconstructed semantic error. Here, as described in detail below, the reference frame may be the last key frame, which includes a number of pixel points with corresponding semantic information and a determined camera pose. Using the reference frame, the camera pose corresponding to the current frame can be determined by minimizing the reconstructed semantic error. Meanwhile, as those skilled in the art will understand, while the camera pose corresponding to the current frame is being optimized, the corresponding points on the current frame may also be optimized, that is, it is determined which points on the current frame correspond to which points on the reference frame. This is similar to the reconstruction-error-minimization scheme of direct-method SLAM, with the main difference that photometric information is replaced by semantic information.
Next, step S130 will be specifically explained. Fig. 2 illustrates a schematic diagram of optimizing camera pose by minimizing reconstructed semantic errors according to an embodiment of the application. First, a priori camera pose corresponding to the current frame is determined. For example, as shown in FIG. 2, the reference frame 101 has a pixel P1, and the current frame 102 has a pixel P2. It should be understood that although fig. 2 only shows one pixel P1 and one pixel P2, there are actually several such pixels P1 and P2 on the reference frame 101 and the current frame 102. It can be determined through semantic matching that pixel point P1 on the reference frame 101 and pixel point P2 on the current frame 102 correspond to the same 3D spatial point P. The camera moves from the position where the reference frame 101 is located to the position where the current frame 102 is located, and since the reference frame 101 has its corresponding camera pose, the pose change of the mobile device can be determined based on the relevant motion sensor, e.g. a pose sensor on the mobile device, such as an Inertial Measurement Unit (IMU) commonly used on vehicles, resulting in a camera pose corresponding to the current frame 102, which is an a priori estimate of the current pose of the camera. Then, by using the prior pose of the camera, the 3D spatial point P corresponding to the pixel point P1 on the reference frame 101 can be projected onto the current frame 102, so as to obtain a corresponding projected point P2'. There is a certain semantic error between the reprojected point P2 'and the matched pixel point P2, which may be referred to as reprojected semantic error, and this error may be represented by, for example, the mean square error of the semantic values of the pixel point P2 and the reprojected point P2'. 
By adjusting the a priori camera pose such that the reprojection semantic error is minimized, the camera pose corresponding to the minimized reprojection semantic error can be determined to be the camera pose corresponding to the current frame 102.
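The optimization just described can be illustrated with a deliberately tiny 1-D example: a point with known depth in the reference frame is reprojected into the current frame under a candidate pose, and the pose is adjusted until the semantics sampled at the reprojection (P2') match the semantics of the matched point (P2). The intrinsics, the grid search standing in for a real nonlinear solver, and all names are assumptions for illustration:

```python
F, C = 100.0, 64.0   # assumed focal length and principal point

def reproject(point, tx):
    """Project a reference-frame 3-D point (x, z) into a current frame
    whose pose is a pure x-translation tx of the camera."""
    x, z = point
    return F * (x - tx) / z + C

def semantic_error(row, point, tx, observed):
    """Squared difference between the semantic value sampled at the
    reprojected pixel (P2') and that of the matched pixel (P2)."""
    u = int(round(reproject(point, tx)))
    u = max(0, min(len(row) - 1, u))
    return (row[u] - observed) ** 2

def optimize_tx(row, point, prior_tx, observed, step=0.05, n=40):
    """Adjust the prior pose over a small search window so the semantic
    error is minimal (grid search in place of Gauss-Newton)."""
    cands = [prior_tx + step * (k - n // 2) for k in range(n + 1)]
    return min(cands, key=lambda t: semantic_error(row, point, t, observed))
```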
With continued reference to fig. 1, in step S140, the immature point on the reference frame without depth information and the depth information of the corresponding point on the current frame are determined based on the determined camera pose to form a mature point with depth information, which may also be referred to as immature point depth filtering. Here, the immature points refer to pixels whose depth information in the image space is unknown, i.e., 3D spatial information of these pixels is not obtained from the image space information, and thus these points are referred to as immature points. Depth information for these points can be obtained by immature point depth filtering. Specifically, the immature point depth filtering may determine the depth information of the corresponding pixel point in a multi-view geometric manner, that is, based on the pose difference between adjacent frames and the coordinate difference of the corresponding pixel point, which is the same as the principle of determining the depth information by a binocular camera, and therefore, is not described in detail here. Through the depth filtering of the immature points, the depth uncertainty of the immature points on the image frame can be continuously reduced, and thus, through multiple times of depth filtering, the depth information of most of the immature points can be determined.
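The multi-view depth recovery mentioned above can be illustrated with its simplest special case: a stereo-like pair of frames separated by a known sideways translation. The real system would instead fuse many such observations to shrink each immature point's depth uncertainty; names and geometry here are illustrative assumptions:

```python
def depth_from_disparity(focal, baseline, u_ref, u_cur):
    """Triangulate the depth of a point observed at pixel u_ref in the
    reference frame and u_cur in the current frame, given the pose
    difference (here a pure baseline translation). Returns None when the
    disparity carries no depth information yet (the point stays immature)."""
    disparity = u_ref - u_cur
    if disparity <= 0:
        return None
    return focal * baseline / disparity
```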
Through the above steps, the depth information of pixel points on an image frame can be determined, and the image frame can be stored in the constructed map. However, in some embodiments, it may not be necessary to incorporate every frame into the map, as doing so would make the constructed map unnecessarily large. Thus, in some embodiments, a step of determining whether the current frame is a key frame may also be included. If the current frame is not a key frame, it may be discarded after being used to depth-filter immature points on the reference frame (i.e., the previous key frame). Conversely, if the current frame is determined to be a key frame, it is saved in the constructed map.
Whether the current frame is a key frame may be determined based on a variety of criteria. In an embodiment, whether the current frame is a key frame may be determined according to semantic differences between the current frame and the reference frame. For example, if the semantic difference between the current frame and the reference frame is large, indicating that the scene change is large, the current frame may be determined as a key frame and stored into the constructed map. On the contrary, if the difference is not large, the current frame is similar to the scene of the reference frame, so that the current frame can be discarded instead of being taken as the key frame.
In another embodiment, it may also be determined whether the current frame is a key frame based on a pose change of the camera. For example, a current frame is selected as a key frame every time the camera moves a predetermined distance, such as 1 meter, 2 meters, 5 meters, or more or less. Of course, other criteria may also be utilized to determine whether the current frame is a key frame.
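The two criteria above can be combined in a single predicate; the thresholds below are illustrative assumptions, not values from the patent:

```python
def is_key_frame(semantic_diff, pose_distance,
                 semantic_threshold=0.3, distance_threshold=2.0):
    """Declare the current frame a key frame when its semantic difference
    from the reference frame is large (the scene changed) or the camera
    has moved far enough since the last key frame."""
    return semantic_diff > semantic_threshold or pose_distance > distance_threshold
```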
After the current frame is determined to be a key frame and saved into the constructed map, the sliding window can be further used to perform further optimization on the key frame in the map. The sliding window may comprise a dynamic sequence of a plurality of consecutive key frames, and after the current frame is determined to be a key frame, the current frame may be added to the sliding window and the oldest key frame in the sliding window may be deleted to ensure that the window size is a constant. For a plurality of key frames in the sliding window, a nonlinear optimization method such as Gauss-Newton (Gauss-Newton) method, Levenberg-marquardt (Levenberg-marquardt) method, or other optimization algorithms can be used to optimize the camera pose and the depth information of the pixel points corresponding to each key frame, so that the constructed map is more accurate. The use of the sliding window optimization method can effectively solve the drifting problem of the SLAM and ensure the real-time performance of the whole system.
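The constant-size window described above can be kept with a bounded deque: adding the new key frame automatically evicts the oldest, and the surviving frames are what the nonlinear optimizer (e.g., Gauss-Newton, not shown) operates on. A minimal sketch with an assumed window size:

```python
from collections import deque

class SlidingWindow:
    """Fixed-size window of consecutive key frames for back-end optimization."""
    def __init__(self, size=7):            # window size is an assumed parameter
        self.frames = deque(maxlen=size)   # deque(maxlen=...) evicts the oldest

    def add_key_frame(self, frame):
        """Insert the newest key frame and return the frames to optimize."""
        self.frames.append(frame)
        return list(self.frames)
```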
In the above, after the current frame is determined as a key frame and the depth information of some points (for example, pixel points corresponding to pixel points on the reference frame) on the current frame is determined by performing the depth filtering of the immature points, it can be understood that a plurality of immature points with unknown depth information are generally included on the current frame. Thus, in some embodiments, some new immature points may also be detected and generated on the semantic image of the current frame, which may be processed in a depth filtering operation associated with a subsequent image frame to determine its depth information.
Fig. 3 is a schematic diagram illustrating a mapping method based on semantic SLAM according to an embodiment of the present application, which is implemented in a vehicle-mounted system based on the above principle. As shown in fig. 3, an original image acquired by the vehicle-mounted camera, that is, a current frame image, enters the semantic module 210, and the semantic module 210 may perform semantic segmentation on the image to obtain a semantic image including a semantic perception result, and the semantic image may enter the front-end optimization module 220. The vehicle chassis 230 may include several sensors, such as IMU units, which may provide an a priori estimate of the pose in six degrees of freedom of the vehicle, i.e., the pose a priori estimate of the onboard camera. The front-end optimization module 220 uses the previous key frame as a reference frame and the prior estimation of the camera pose to complete the optimization of the camera pose corresponding to the current frame, and may further perform the depth filtering of the immature points to determine the depth information of at least a part of the immature points.
Next, in decision flow 240, it may be determined whether the current frame is a key frame. If the current frame is not a key frame, then in flow block 250, the current frame may be discarded. If it is a key frame, the current frame can be saved to the constructed map and a sliding window is used to perform further optimization processing in the back-end optimization module 260. For example, the current frame may be added to the sliding window, and the earliest key frame in the sliding window may be deleted, and for a plurality of key frames in the sliding window, the camera pose and the depth information of the pixel points corresponding to each key frame may be optimized by using, for example, a nonlinear optimization method or other optimization algorithms, so that the constructed map is more accurate. Finally, in block 270, immature points on the current frame for which depth information is not known may also be detected, so as to facilitate depth filtering during processing of subsequent image frames to determine depth information thereof, and finally completing construction of the map.
Since the steps of the method shown in fig. 3 have been described in detail above with respect to fig. 1 and 2, they are only briefly described here, and a repeated detailed description thereof is omitted.
Exemplary devices
Fig. 4 illustrates a functional block diagram of a semantic SLAM-based mapping apparatus according to an embodiment of the present application. As shown in fig. 4, the mapping apparatus 300 based on semantic SLAM according to the embodiment of the present application includes: a semantic segmentation unit 310, configured to process the current frame image to obtain its semantic information, where the semantic information may include the probability values of each pixel point belonging to the various categories; a camera pose optimization unit 320, configured to optimize the camera pose corresponding to the current frame by using the reference frame in a manner that minimizes a reprojection semantic error; and a maturing unit 330, configured to determine, based on the optimized camera pose, the depth information of an immature point on the reference frame (a point having no depth information) and of its corresponding point on the current frame, thereby forming a mature point having depth information.
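The maturing unit 330 turns an immature point into a mature one by recovering its depth from the optimized relative pose between the reference and current frames. The patent does not give the recovery procedure, so the sketch below uses standard two-view linear triangulation as one plausible realization (a real system would additionally filter the depth over several observations):

```python
import numpy as np

def triangulate_depth(x_ref, x_cur, R, t):
    """Recover the depth of a point along the reference-frame ray, given
    normalized image coordinates x_ref, x_cur in both frames and the
    optimized relative pose (R, t) mapping reference-frame coordinates
    into the current frame. Linear least-squares triangulation; an
    assumption of this sketch, not taken from the source text."""
    f_ref = np.array([x_ref[0], x_ref[1], 1.0])  # ray in the reference frame
    f_cur = np.array([x_cur[0], x_cur[1], 1.0])  # ray in the current frame
    # Geometry: d_ref * (R @ f_ref) + t = d_cur * f_cur
    # => [R f_ref, -f_cur] @ [d_ref, d_cur]^T = -t, solved in least squares.
    A = np.stack([R @ f_ref, -f_cur], axis=1)
    d, _, _, _ = np.linalg.lstsq(A, -t, rcond=None)
    return d[0]  # depth along the reference ray
```

For example, with identity rotation, a baseline of one unit along x, and a point observed at the image center of the reference frame, the recovered depth matches the true depth of the point.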
In an example, in the mapping apparatus 300 based on the semantic SLAM, the mapping apparatus further includes: a key frame identifying unit 340, configured to determine whether the current frame is a key frame. If the current frame is a key frame, the key frame identification unit 340 may further store the current frame in the constructed map; if not, the key frame identification unit 340 may discard the current frame.
In one example, in the semantic SLAM-based mapping apparatus 300 described above, the key frame identifying unit 340 may determine whether the current frame is a key frame based on, for example, the following criteria: the semantic difference between the current frame and the reference frame (if the semantic difference is large, the current frame is a key frame; otherwise it is not), or the change in the camera pose (if the camera pose has changed significantly, the current frame is a key frame; otherwise it is not).
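The two criteria above can be combined into a simple decision function. All threshold values below are illustrative assumptions, since the patent specifies the criteria but not concrete thresholds:

```python
import numpy as np

def is_keyframe(sem_diff, pose_delta,
                sem_thresh=0.3, trans_thresh=0.5, rot_thresh=0.2):
    """Keyframe test combining the two criteria of unit 340:
    a large semantic difference to the reference frame, or a large
    camera pose change. Thresholds are hypothetical values."""
    translation, rotation = pose_delta
    if sem_diff > sem_thresh:                        # large semantic change
        return True
    if np.linalg.norm(translation) > trans_thresh:   # large camera translation
        return True
    if abs(rotation) > rot_thresh:                   # large rotation (radians)
        return True
    return False
```

A frame failing both tests would be discarded by the flow of fig. 3; a frame passing either test would be saved to the map and passed to the sliding-window back end.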
In an example, in the mapping apparatus 300 based on the semantic SLAM, the mapping apparatus may further include: a sliding window optimization unit 350 to select a plurality of consecutive keyframes using a sliding window and optimize the plurality of consecutive keyframes using a non-linear optimization method to determine more accurate camera pose and depth information. In an example, the sliding window optimization unit 350 may also detect and generate new immature points on the current frame image.
In one example, in the semantic SLAM-based mapping apparatus 300 described above, the camera pose optimization unit 320 may be configured to implement the camera pose optimization by: determining a prior camera pose corresponding to the current frame; calculating, based on the prior camera pose, the reprojection point on the current frame of each point on the reference frame; and adjusting the prior camera pose to minimize the difference between the semantics of the point on the current frame that corresponds to the point on the reference frame and the semantics of the reprojected point on the current frame. The camera pose that minimizes this reprojection semantic error is taken as the camera pose corresponding to the current frame.
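The minimization above can be illustrated with a deliberately simplified toy: the camera "pose" is reduced to an integer pixel shift, the semantic images are per-pixel probability maps, and the optimizer is a brute-force search around the prior. A real implementation would instead warp points via their depth and a 6-DoF pose and use a gradient-based solver; everything below is a sketch under those simplifying assumptions:

```python
import numpy as np

def semantic_error(sem_ref, sem_cur, points, shift):
    """Sum of per-point semantic differences when reference-frame points
    are 'reprojected' into the current frame under a candidate pose,
    here reduced to an integer pixel shift for illustration."""
    err, (h, w) = 0.0, sem_cur.shape[:2]
    for (r, c) in points:
        rr, cc = r + shift[0], c + shift[1]
        if 0 <= rr < h and 0 <= cc < w:
            # L1 difference between class-probability vectors
            err += np.abs(sem_ref[r, c] - sem_cur[rr, cc]).sum()
        else:
            err += 2.0  # penalty for projecting outside the image
    return err

def optimize_shift(sem_ref, sem_cur, points, search=2):
    """Brute-force minimization of the reprojection semantic error over
    a small neighbourhood of candidate 'poses' (shifts) around the prior."""
    best, best_err = (0, 0), float("inf")
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            e = semantic_error(sem_ref, sem_cur, points, (dr, dc))
            if e < best_err:
                best, best_err = (dr, dc), e
    return best
```

The shift that minimizes the semantic error plays the role of the optimized camera pose corresponding to the current frame.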
In one example, in the semantic SLAM-based mapping apparatus 300 described above, the camera pose optimization unit 320 may determine the camera pose corresponding to the current frame as the a priori camera pose based on the camera pose corresponding to the reference frame and the camera motion information provided by the in-vehicle sensor.
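Determining the prior pose amounts to composing the reference-frame pose with the inter-frame motion reported by the on-board sensors. A minimal sketch with 4x4 homogeneous transforms (the sensor interface providing `delta_T` is an assumption of this sketch):

```python
import numpy as np

def make_pose(R, t):
    """Assemble a 4x4 homogeneous transform from a rotation matrix and
    a translation vector."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def prior_pose(T_ref, delta_T):
    """Prior camera pose of the current frame: the pose of the reference
    keyframe composed with the incremental motion reported by on-board
    sensors (e.g. IMU or wheel odometry)."""
    return T_ref @ delta_T
```

This prior then seeds the reprojection-based semantic-error minimization, which refines it into the final pose of the current frame.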
Here, it can be understood by those skilled in the art that the specific functions and operations of the respective units and modules in the semantic SLAM-based mapping apparatus 300 described above have been described in detail in the method described above with reference to fig. 1 to 3, and thus, a repetitive description thereof will be omitted.
As described above, the semantic SLAM-based mapping apparatus 300 according to the embodiment of the present application may be implemented in various terminal devices, for example, in-vehicle navigation devices of vehicles, mobile robots, and the like. In one example, the semantic SLAM-based mapping apparatus 300 according to the embodiment of the present application may be integrated into the terminal device as a software module and/or a hardware module. For example, the apparatus 300 may be a software module in an operating system of the terminal device, or may be an application developed for the terminal device, which runs on a CPU (central processing unit) and/or a GPU (graphics processing unit), or on a dedicated hardware acceleration chip, such as a dedicated chip adapted to run a deep neural network. Of course, the apparatus 300 may also be one of many hardware modules of the terminal device.
Alternatively, in another example, the semantic SLAM-based mapping apparatus 300 and the terminal device may be separate devices, and the apparatus 300 may be connected to the terminal device through a wired and/or wireless network and transmit the interaction information according to an agreed data format.
Exemplary electronic device
FIG. 5 illustrates a block diagram of an electronic device in accordance with an embodiment of the application. As shown in fig. 5, the electronic device 10 includes one or more processors 11 and memory 12.
The processor 11 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 10 to perform desired functions.
Memory 12 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by processor 11 to implement the semantic SLAM-based mapping method of the various embodiments of the present application described above and/or other desired functions. Various contents such as camera parameters, image information, map information, etc. may also be stored in the computer-readable storage medium.
In one example, the electronic device 10 may further include: an input device 13 and an output device 14, which are interconnected by a bus system and/or other form of connection mechanism (not shown).
The input device 13 may be used to receive image information from, for example, a camera or its associated image processing device; this image information may then be used in the mapping process described above. The output device 14 can output various information, including the constructed map, to the outside. For example, the output device 14 may output the constructed map to an automated driving system, where it can be used to plan paths, make driving decisions, and the like.
Of course, for simplicity, only some of the components of the electronic device 10 relevant to the present application are shown in fig. 5, and components such as buses, input/output interfaces, and the like are omitted. In addition, the electronic device 10 may include any other suitable components depending on the particular application.
Exemplary computer program product and computer-readable storage Medium
In addition to the above-described methods and apparatus, embodiments of the present application may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the semantic SLAM-based mapping method according to various embodiments of the present application described in the "exemplary methods" section above of this specification.
The computer program product may be written with program code for performing the operations of embodiments of the present application in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present application may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the steps in the semantic SLAM-based mapping method according to various embodiments of the present application described in the "exemplary methods" section above in this specification.
The computer readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present application in conjunction with specific embodiments, however, it is noted that the advantages, effects, etc. mentioned in the present application are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present application. Furthermore, the foregoing disclosure of specific details is provided for purposes of illustration and understanding only, and is not intended to limit the application to the details which are set forth in order to provide a thorough understanding of the present application.
The block diagrams of devices, apparatuses, and systems referred to in this application are given only as illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. These devices, apparatuses, and systems may be connected, arranged, or configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," and "having" are open-ended words that mean "including, but not limited to," and are used interchangeably herein. The word "or" as used herein means, and is used interchangeably with, the term "and/or," unless the context clearly dictates otherwise. The phrase "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to."
It should also be noted that in the devices, apparatuses, and methods of the present application, the components or steps may be decomposed and/or recombined. These decompositions and/or recombinations are to be considered as equivalents of the present application.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, the description is not intended to limit embodiments of the application to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (13)

1. A mapping method based on semantic SLAM, comprising the following steps:
obtaining a current frame image through a camera;
processing the current frame image by using a semantic segmentation algorithm to obtain semantic information of the current frame image;
optimizing a camera pose corresponding to the current frame by utilizing a reference frame in a mode of minimizing a reconstructed semantic error; and
for an immature point on the reference frame without depth information, determining depth information of the immature point on the reference frame without depth information and depth information of a point on the current frame corresponding to the immature point based on the optimized camera pose, forming a mature point on the reference frame with depth information,
wherein optimizing the camera pose corresponding to the current frame in a manner that minimizes a reconstructed semantic error using the reference frame comprises:
determining a prior camera pose corresponding to the current frame;
calculating a reprojection point of a point on the reference frame on the current frame based on the a priori camera pose; and
adjusting the a priori camera pose to minimize a difference between semantics of a point on the current frame corresponding to a point on the reference frame and semantics of a reprojected point on the current frame.
2. The method of claim 1, further comprising:
determining whether the current frame is a key frame;
if the current frame is not a key frame, discarding the current frame; and
if the current frame is a key frame, saving the current frame in the constructed map,
wherein the reference frame is a previous key frame.
3. The method of claim 2, wherein determining whether the current frame is a key frame comprises:
determining whether the current frame is a key frame based on semantic differences between the current frame and the reference frame; or
Determining whether the current frame is a key frame based on a pose change of the camera.
4. The method of claim 2, further comprising:
selecting a plurality of consecutive key frames using a sliding window; and
the plurality of consecutive keyframes are optimized using a non-linear optimization method to determine more accurate camera pose and depth information.
5. The method of claim 4, further comprising:
new immature points are detected and generated on the current frame image.
6. The method of claim 1, wherein determining an a priori camera pose corresponding to the current frame comprises:
determining a camera pose corresponding to the current frame as the a priori camera pose based on the camera pose corresponding to the reference frame and camera motion information provided by an in-vehicle sensor.
7. The method of claim 1, wherein the semantic information includes a probability value that a pixel belongs to each category.
8. A mapping apparatus based on semantic SLAM, comprising:
the semantic segmentation unit is used for processing the current frame image to obtain semantic information of the current frame image;
the camera pose optimization unit is used for optimizing the camera pose corresponding to the current frame by utilizing the reference frame in a mode of minimizing a reconstructed semantic error; and
a maturing unit for determining depth information of an immature point without depth information on the reference frame and depth information of a point corresponding to the immature point on the current frame based on the optimized camera pose for the immature point without depth information on the reference frame, forming a mature point with depth information on the reference frame,
wherein the camera pose optimization unit is configured to:
determining a prior camera pose corresponding to the current frame;
calculating a reprojected point of a point on the reference frame on the current frame based on the a priori camera poses; and
adjusting the a priori camera pose to minimize a difference between semantics of a point on the current frame corresponding to a point on the reference frame and semantics of a reprojected point on the current frame.
9. The apparatus of claim 8, further comprising:
a key frame identification unit for determining whether the current frame is a key frame and saving the current frame in the constructed map in response to the current frame being a key frame.
10. The apparatus of claim 9, further comprising:
a sliding window optimization unit for selecting a plurality of consecutive key frames using a sliding window and optimizing the plurality of consecutive key frames using a non-linear optimization method to determine more accurate camera pose and depth information, and detecting and generating new immature points on the current frame image.
11. An electronic device, comprising:
a processor; and
a memory having stored therein computer program instructions which, when executed by the processor, cause the processor to perform the method of any of claims 1-7.
12. A mobile device comprising the electronic apparatus of claim 11.
13. A computer readable medium having stored thereon computer program instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 1-7.
CN201810955924.5A 2018-08-21 2018-08-21 Semantic SLAM-based mapping method and device and electronic equipment Active CN110853085B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810955924.5A CN110853085B (en) 2018-08-21 2018-08-21 Semantic SLAM-based mapping method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN110853085A CN110853085A (en) 2020-02-28
CN110853085B (en) 2022-08-19

Family

ID=69595287


Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111693047B (en) * 2020-05-08 2022-07-05 中国航空工业集团公司西安航空计算技术研究所 Visual navigation method for micro unmanned aerial vehicle in high-dynamic scene
CN112068555A (en) * 2020-08-27 2020-12-11 江南大学 Voice control type mobile robot based on semantic SLAM method
CN112381828A (en) * 2020-11-09 2021-02-19 Oppo广东移动通信有限公司 Positioning method, device, medium and equipment based on semantic and depth information
CN113034582A (en) * 2021-03-25 2021-06-25 浙江商汤科技开发有限公司 Pose optimization device and method, electronic device and computer readable storage medium
CN113793378B (en) * 2021-06-21 2023-08-11 紫东信息科技(苏州)有限公司 Semantic SLAM object association and pose updating method and system based on hierarchical grouping

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104596533A (en) * 2015-01-07 2015-05-06 上海交通大学 Automatic guided vehicle based on map matching and guide method of automatic guided vehicle
CN107145578A (en) * 2017-05-08 2017-09-08 深圳地平线机器人科技有限公司 Map constructing method, device, equipment and system
CN107610175A (en) * 2017-08-04 2018-01-19 华南理工大学 The monocular vision SLAM algorithms optimized based on semi-direct method and sliding window
CN107909612A (en) * 2017-12-01 2018-04-13 驭势科技(北京)有限公司 A kind of method and system of vision based on 3D point cloud positioning immediately with building figure
CN108229416A (en) * 2018-01-17 2018-06-29 苏州科技大学 Robot SLAM methods based on semantic segmentation technology
CN108230337A (en) * 2017-12-31 2018-06-29 厦门大学 A kind of method that semantic SLAM systems based on mobile terminal are realized


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Pop-up SLAM: Semantic Monocular Plane SLAM for Low-texture Environments; Shichao Yang et al.; arXiv:1703.07334v1; 2017-03-21; pp. 1-8 *
A point cloud map generation method for robot indoor navigation based on a depth camera; Ma Yuelong et al.; Engineering of Surveying and Mapping; 2018-03-25 (No. 03); pp. 6-10 *
A dense 3D reconstruction system for large scenes based on semi-direct SLAM; Xu Haonan et al.; Pattern Recognition and Artificial Intelligence; May 2018; Vol. 31, No. 5; pp. 477-484 *
Mobile robot SLAM based on the sparse direct method and graph optimization; Wu Yuxiang et al.; Chinese Journal of Scientific Instrument; 2018-04-15 (No. 04); pp. 257-263 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant