CN117830625A - Map repositioning method and device combined with semantic information


Info

Publication number
CN117830625A
Authority
CN
China
Prior art keywords
feature
semantic
image
input image
matching
Legal status
Pending
Application number
CN202211185542.1A
Other languages
Chinese (zh)
Inventor
陈晨
王欣
Current Assignee
Beijing Zitiao Network Technology Co Ltd
Original Assignee
Beijing Zitiao Network Technology Co Ltd
Application filed by Beijing Zitiao Network Technology Co Ltd
Priority to CN202211185542.1A
Publication of CN117830625A


Abstract

The embodiment of the application provides a map repositioning method and device combined with semantic information. Feature points are extracted from an input image and descriptors of the feature points are calculated; semantic segmentation is performed on the input image to obtain a semantically segmented image; the feature points are projected onto the semantically segmented image and semantic features of the feature points are extracted. During feature point matching, the descriptors and the semantic features of the feature points of the input image and of the map image are matched jointly, and pose calculation is performed on the feature point matching result to obtain the relative pose change between the world coordinate system and the map coordinate system. By introducing the semantic features of the image into feature matching and pose calculation, the accuracy of feature matching and pose calculation is improved, thereby improving the success rate and accuracy of repositioning.

Description

Map repositioning method and device combined with semantic information
Technical Field
The embodiment of the application relates to the technical field of computer vision, in particular to a map repositioning method and device combined with semantic information.
Background
Simultaneous Localization and Mapping (SLAM) technology is widely applied in virtual reality head-mounted display devices (hereinafter referred to as head-mounted devices), which perform positioning and tracking, safe zone identification and the like through a map repositioning method.
Existing map repositioning methods mainly perform descriptor feature matching between the visual feature points of the current input image and the feature points in the map, and then calculate the relative pose change between the current device coordinate system and the map coordinate system by using a Perspective-n-Point (PnP) algorithm.
During feature matching, external environmental factors such as changes in device position, viewing angle, ambient illumination, or object movement make the matching inaccurate, causing problems such as repositioning failure and low positioning accuracy.
Disclosure of Invention
The embodiment of the application provides a map repositioning method and device combining semantic information, which improves the precision of feature matching and pose resolving by introducing semantic features of images in feature matching and pose resolving, thereby improving the success rate and precision of repositioning.
In a first aspect, an embodiment of the present application provides a map repositioning method combined with semantic information, where the method includes:
extracting feature points of an input image, and calculating descriptors of the feature points;
performing semantic segmentation on the input image to obtain a semantic segmentation image;
projecting the feature points onto the semantic segmentation image, and extracting semantic features of the feature points;
Matching the descriptors and semantic features of the feature points of the input image with the descriptors and semantic features of the feature points of the stored map image;
and carrying out pose calculation according to the feature point matching result to obtain the relative pose change of the world coordinate system and the map coordinate system.
In some embodiments, the semantic segmentation result is a probability that each pixel point of the input image belongs to a respective category, the projecting the feature point onto the semantic segmented image, extracting the semantic feature of the feature point includes:
and projecting the characteristic points to the corresponding positions of the semantic segmentation image, and determining the probability distribution of the projection points of the characteristic points as the semantic characteristics of the characteristic points.
In some embodiments, matching the descriptors and semantic features of the feature points of the input image with descriptors and semantic features of feature points of a map image includes:
calculating a descriptor feature distance between descriptors of feature points of the input image and the map image;
calculating semantic feature distances between semantic features of feature points of the input image and the map image;
carrying out weighted calculation on the descriptor feature distance and the semantic feature distance of the feature points of the input image and the map image to obtain a joint feature distance;
and matching the characteristic points of the input image and the map image by using the joint characteristic distance.
In some embodiments, the performing pose calculation according to the feature point matching result to obtain the relative pose change of the world coordinate system and the map coordinate system includes:
and calculating a matching result of the feature points by using a PnP algorithm to obtain the relative pose change of the world coordinate system and the map coordinate system.
In some embodiments, the calculating the matching result of the feature points by using PnP algorithm to obtain the relative pose changes of the world coordinate system and the map coordinate system includes:
projecting feature points in the map image into a plane in which the input image is located;
randomly selecting a first matching point pair from the matching point pairs under a random sample consensus (RANSAC) framework to carry out PnP calculation to obtain the relative pose change of a world coordinate system and a map coordinate system;
calculating the pixel distance between a projection point of a second matching point pair in the plane of the input image and a characteristic point in the map image, wherein the second matching point pair is the point pair which is remained outside the first matching point pair in the matching point pair;
Acquiring semantic features of projection points of the second matching point pairs on a plane where the input image is located;
calculating the semantic feature distance between the projection point of the second matching point on the plane where the input image is and the feature point in the map image;
according to the pixel distance and the semantic feature distance of the second matching point pair, checking whether the relative pose change obtained by calculation is correct or not;
and determining whether to perform PnP calculation next time according to the verification result.
In some embodiments, the obtaining the semantic feature of the second matching point pair at the projection point of the plane of the input image includes:
and acquiring probability distribution of the second matching point pair on projection points of the input image from the semantic segmentation image of the input image as semantic features.
In some embodiments, the descriptor feature distance is a euclidean distance, a manhattan distance, or a hamming distance.
In some embodiments, the semantic feature distance is KL divergence.
In some embodiments, the semantic feature distance is a KL divergence and the pixel distance is a pixel euclidean distance or a manhattan distance.
In some embodiments, the semantically segmenting the input image to obtain a semantically segmented image includes:
And carrying out semantic segmentation on the input image by using a neural network to obtain the semantic segmentation image.
In some embodiments, the descriptors are used to describe geometric features around feature points.
In another aspect, an embodiment of the present application provides a map repositioning device that combines semantic information, the device including:
the characteristic point extraction module is used for extracting characteristic points of the input image;
the descriptor calculation module is used for calculating descriptors of the feature points;
the semantic segmentation module is used for carrying out semantic segmentation on the input image to obtain a semantic segmentation image;
the semantic feature extraction module is used for projecting the feature points onto the semantic segmentation image and extracting semantic features of the feature points;
the feature point matching module is used for matching the descriptors and semantic features of the feature points of the input image with the descriptors and semantic features of the feature points of the stored map image;
and the pose resolving module is used for resolving the pose according to the feature point matching result to obtain the relative pose change of the world coordinate system and the map coordinate system.
In another aspect, an embodiment of the present application provides an electronic device, including: a processor and a memory for storing a computer program, the processor being for invoking and running the computer program stored in the memory to perform the method as described in any of the above.
In another aspect, embodiments of the present application provide a computer-readable storage medium for storing a computer program that causes a computer to perform a method as set forth in any one of the preceding claims.
In another aspect, embodiments of the present application provide a computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements a method as described in any of the above.
According to the map repositioning method and device combined with semantic information provided by the embodiments of the application, feature points are extracted from an input image, descriptors of the feature points are calculated, semantic segmentation is performed on the input image to obtain a semantically segmented image, the feature points are projected onto the semantically segmented image, and semantic features of the feature points are extracted. During feature point matching, the descriptors and the semantic features of the feature points of the input image and of the map image are matched jointly, and pose calculation is performed on the feature point matching result to obtain the relative pose change between the world coordinate system and the map coordinate system. By introducing the semantic features of the image into feature matching and pose calculation, the accuracy of feature matching and pose calculation is improved, thereby improving the success rate and accuracy of repositioning.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a map repositioning method combined with semantic information according to an embodiment of the present application;
FIG. 2 is a flow chart of a feature matching method according to a second embodiment of the present application;
FIG. 3 is a flowchart of a method for resolving a pose using a PnP algorithm according to a third embodiment of the present application;
fig. 4 is a schematic structural diagram of a map repositioning device with semantic information provided in a fourth embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present application.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiment of the application provides a map repositioning method combined with semantic information, which can be applied to Extended Reality (XR). XR refers to combining the real and the virtual through a computer to create a virtual environment capable of human-computer interaction, and is a collective term for technologies such as Virtual Reality (VR), Augmented Reality (AR) and Mixed Reality (MR). By integrating the visual interaction technologies of the three, XR brings the experiencer a sense of immersion with seamless transition between the virtual world and the real world.
VR: a technology of creating and experiencing a virtual world, which computes and generates a virtual environment. It is a simulation based on the fusion of multi-source information (the virtual reality mentioned herein includes at least visual perception, auditory perception, tactile perception and motion perception, and may further include gustatory perception, olfactory perception and the like), realizing interactive three-dimensional dynamic views and the simulation of entity behaviors of the virtual environment, immersing the user in the simulated virtual reality environment, and enabling applications in various virtual environments such as maps, games, videos, education, medical treatment, simulation, collaborative training, sales, assistance in manufacturing, maintenance and repair, and the like.
A VR device is a terminal for realizing the virtual reality effect, and may generally be provided in the form of glasses, a head mounted display (Head Mount Display, HMD), or contact lenses for realizing visual perception and other forms of perception; of course, the form of the virtual reality device is not limited thereto, and it can be further miniaturized or enlarged as needed.
AR: an AR scenery refers to a simulated scenery in which at least one virtual object is superimposed over a physical scenery or a representation thereof. For example, an electronic system may have an opaque display and at least one imaging sensor for capturing images or video of the physical scenery, which are representations of the physical scenery. The system combines the images or video with the virtual objects and displays the combination on the opaque display. An individual uses the system to view the physical scenery indirectly via the images or video of the physical scenery and to observe the virtual objects superimposed over the physical scenery. When the system captures images of the physical scenery using one or more image sensors and presents the AR scenery on the opaque display using those images, the displayed images are referred to as video pass-through. Alternatively, the electronic system for displaying the AR scenery may have a transparent or translucent display through which the individual can directly view the physical scenery. The system may display the virtual objects on the transparent or translucent display so that the individual uses the system to view the virtual objects superimposed over the physical scenery. As another example, the system may include a projection system that projects the virtual objects into the physical scenery. The virtual objects may be projected, for example, on a physical surface or as a hologram, so that the individual uses the system to view the virtual objects superimposed over the physical scenery. More specifically, AR is a technique that, while a camera acquires images, calculates in real time the camera pose parameters of the camera in the real world (or three-dimensional world, the actual world) and adds virtual elements to the images acquired by the camera according to those camera pose parameters. Virtual elements include, but are not limited to: images, videos and three-dimensional models. The goal of AR technology is to overlay the virtual world on the real world on the screen and enable interaction.
MR: by presenting virtual scene information in a real scene, an interactive feedback information loop is built among the real world, the virtual world and the user, so as to enhance the realism of the user experience. For example, computer-created sensory input (e.g., virtual objects) is integrated with sensory input from the physical scenery or a representation thereof in a simulated scenery; in some MR sceneries, the computer-created sensory input may adapt to changes in the sensory input from the physical scenery. In addition, some electronic systems for presenting MR sceneries may monitor orientation and/or position relative to the physical scenery to enable virtual objects to interact with real objects (i.e., physical elements from the physical scenery or representations thereof). For example, the system may monitor movement so that a virtual plant appears stationary relative to a physical building.
Of course, map repositioning is applied not only in XR devices but also in fields such as unmanned aerial vehicles. Taking VR as an example, SLAM observes environmental features in real time during the motion of the device and then performs incremental map construction according to its own position, so as to finely construct the position and posture of each object in the three-dimensional space of the environment.
SLAM repositioning in VR devices is often applied in tracking-loss recovery and safe zone identification. Tracking-loss recovery means that when tracking is lost, the repositioning algorithm can match the newly input image information against the stored map information, recover the relative pose, and continue positioning and tracking.
Safe zone identification is triggered when the device is restarted. The safe zone (also called a safety area or safety boundary) of a VR device refers to the area in which the VR device can be used, so that a user wearing the VR device can avoid colliding with objects in the space when moving within the safe zone in the real scene. When the VR device is used, the user wears a wristband or a controller on the hand; when the user's movement is large and the wristband or controller is about to touch the safe zone boundary, the wristband vibrates or a voice prompt informs the user that the boundary is about to be touched or has been touched.
During safe zone identification, the stored map image of the safe zone is loaded, an environment image of the current environment is collected, the feature points of the current environment image are matched with the feature points of the stored map image of the safe zone, and when the matching succeeds, the stored configuration information of the safe zone is loaded.
Fig. 1 is a flowchart of a map repositioning method combined with semantic information according to an embodiment of the present application, where the method of the present embodiment may be performed by an electronic device that performs repositioning, and in particular may be performed by a SLAM positioning module of the device, as shown in fig. 1, and the map repositioning method combined with semantic information includes the following steps:
s101, extracting feature points of an input image, and calculating descriptors of the feature points.
The input image may be an image acquired in real time by a camera of the device. A feature point is a point where the gray value of the image changes drastically or a point with large curvature on an image edge (i.e., the intersection of two edges). Image feature points play a very important role in feature-based image matching algorithms: they reflect essential characteristics of the image and can identify target objects in the image, and image matching can be completed by matching the feature points.
Feature extraction may be performed on the input image using a feature extractor, which may be a function that performs feature extraction on the image, the extracted features being represented as feature vectors. The feature extractor may also be a neural network, including but not limited to: convolutional neural networks (Convolutional Neural Networks, CNN), multi-layer perceptron neural networks (Multi-Layer Perceptron, MLP), Transformer-structured neural networks, and the like.
For example, the locations of feature points may be detected using the FAST feature point detection algorithm, the Harris corner detection algorithm, the Scale-Invariant Feature Transform (SIFT), Speeded Up Robust Features (SURF), or the like.
After the position of a feature point is detected, a descriptor of the feature point is calculated. The descriptor is used to describe the detected feature point and may be a binary-coded descriptor. Descriptors can be used to describe information around feature points, such as the geometric features around feature points; a common descriptor is the Binary Robust Independent Elementary Features (BRIEF) descriptor.
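By way of illustration only, the feature point extraction and descriptor calculation of step S101 might look like the following sketch using the OpenCV library; the choice of ORB (FAST keypoints with rotated BRIEF binary descriptors) and the parameter values are assumptions made for the example, not requirements of this embodiment.

```python
import cv2

def extract_features(gray_image, max_points=1000):
    """Detect feature points and compute binary descriptors (one possible choice: ORB)."""
    # ORB combines FAST keypoint detection with rotated BRIEF binary descriptors
    orb = cv2.ORB_create(nfeatures=max_points)
    keypoints, descriptors = orb.detectAndCompute(gray_image, None)
    # keypoints[i].pt is the (x, y) pixel position of the i-th feature point;
    # descriptors[i] is its 256-bit binary descriptor packed into 32 uint8 values
    return keypoints, descriptors
```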
S102, carrying out semantic segmentation on the input image to obtain a semantic segmentation image.
Semantic segmentation classifies each pixel in an image. A picture is made up of a large number of pixels, and semantic segmentation uses a segmentation algorithm to group pixels belonging to the same object into one class. For example, if a picture contains people, trees, houses, roads and sky, semantic segmentation classifies the pixels in the picture, i.e., determines which pixels belong to people, which belong to trees and which belong to houses, thereby completing the segmentation of each object in the picture.
The input image may be semantically segmented by using a neural network (which may be referred to as a semantic segmentation network). Assuming that the size of the image before segmentation is h×w and the number of classes that the semantic segmentation network can predict is n, the size of the semantically segmented image obtained after segmentation is h×w×n, and each pixel point may be represented as (h_i, w_j, n), where (h_i, w_j) represents the position of the pixel point and n is an n-dimensional vector representing the probabilities that the pixel point (h_i, w_j) belongs to the various categories (e.g., tables, beds, floors, walls, etc.).
Each pixel in the input image has a probability for each class, and the sum of the probabilities that the pixel belongs to all classes is equal to 1. Assuming that n has a value of 5, the vector n represents the probability that the pixel belongs to 5 categories, for example, the probability that the pixel a belongs to category 1 is 0.02, the probability that the pixel belongs to category 2 is 0.1, the probability that the pixel belongs to category 3 is 0.03, the probability that the pixel belongs to category 4 is 0.82, and the probability that the pixel belongs to category 5 is 0.03.
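As a minimal, non-limiting sketch of step S102, a segmentation network could be run as follows to produce the h×w×n per-pixel probability map described above; seg_net stands for any pre-trained per-pixel classification network (its architecture is not specified by this embodiment), and the softmax over the class dimension makes each pixel's probabilities sum to 1.

```python
import torch
import torch.nn.functional as F

def semantic_segmentation(seg_net, image_tensor):
    """Run a (hypothetical) segmentation network and return per-pixel class probabilities.

    image_tensor: float tensor of shape (1, 3, H, W)
    returns:      tensor of shape (H, W, N); each pixel holds an N-dimensional
                  probability vector over the predictable categories, summing to 1
    """
    with torch.no_grad():
        logits = seg_net(image_tensor)             # (1, N, H, W) raw class scores
        probs = F.softmax(logits, dim=1)           # normalize over the class dimension
    return probs[0].permute(1, 2, 0).contiguous()  # reorder to (H, W, N)
```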
S103, projecting the feature points onto the semantic segmentation image, and extracting semantic features of the feature points.
Illustratively, the extracted feature points are projected onto the corresponding positions of the semantically segmented image, and the probability distribution of the projection point of each feature point is determined as the semantic feature of that feature point. Each feature point has a unique projection point on the semantically segmented image; the probability distribution of the projection point is the semantic feature of the feature point, and the length of the semantic feature equals the number of categories that the semantic segmentation network can predict.
It will be appreciated that feature point extraction and semantic segmentation may be performed in parallel; after the feature points are extracted, the descriptors are calculated from the feature points, and the feature points are projected onto the semantically segmented image.
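Because the feature points come from the same input image, the "projection" in step S103 amounts to reading the probability vector at each feature point's pixel position in the h×w×n segmentation result. The sketch below assumes the outputs of the two example functions above and is purely illustrative.

```python
import numpy as np

def extract_semantic_features(keypoints, seg_probs):
    """Look up, for each feature point, the class-probability distribution at its projection point.

    keypoints: list of cv2.KeyPoint detected on the input image
    seg_probs: numpy array of shape (H, W, N) with per-pixel class probabilities
    returns:   numpy array of shape (num_points, N); row i is the semantic feature of
               feature point i, whose length equals the number of predictable categories N
    """
    h, w, _ = seg_probs.shape
    features = []
    for kp in keypoints:
        x, y = kp.pt
        # project the feature point to its corresponding (integer) pixel on the segmented image
        col = min(max(int(round(x)), 0), w - 1)
        row = min(max(int(round(y)), 0), h - 1)
        features.append(seg_probs[row, col])
    return np.asarray(features)
```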
S104, matching the descriptors and semantic features of the feature points of the input image with the descriptors and semantic features of the feature points of the stored map image.
Before repositioning starts, the device loads the stored map data, which may be map data automatically stored when tracking last failed, or may be feature points of the environment data of the safe zone stored when the safe zone was set.
The map data comprises key frame information and incremental data, the key frame information comprises characteristic points of the key frames, and when the map is reconstructed, the characteristic points of the key frames are used for characteristic matching.
Unlike the prior art, in the embodiment of the present application the feature information of the key-frame feature points stored in the device includes two types: the descriptors of the feature points and the semantic features, where the semantic features are newly added feature information.
Correspondingly, when feature matching is performed, the descriptors of the feature points and the semantic features are used together. Compared with feature matching using descriptors alone, jointly matching with the descriptors and the semantic features can avoid feature-matching failure or low matching accuracy caused by changes in device position, viewing angle, ambient illumination, object movement and the like, improving the matching accuracy of the feature points.
And S105, performing pose calculation according to the feature point matching result to obtain the relative pose change of the world coordinate system and the map coordinate system.
Any existing pose calculation method may be used to estimate the relative pose change of the world coordinate system and the map coordinate system, which is not limited in this embodiment.
For example, the matching result may be solved using the PnP algorithm, which is a method of solving 3D-to-2D point-pair motion in order to obtain the pose of the camera coordinate system (i.e., the map coordinate system) with respect to the world coordinate system. It describes how the pose of the camera is estimated (i.e., the rotation matrix and translation vector from the world coordinate system to the camera coordinate system are solved) given the coordinates of n 3D points (relative to the world coordinate system) and the pixel coordinates of these points.
Alternatively, the PnP algorithm may be combined with random sample consensus (RANdom SAmple Consensus, RANSAC) for the solution. RANSAC iteratively estimates the parameters of a mathematical model from a set of observed data containing "outliers". It is a non-deterministic algorithm: it produces a reasonable result only with a certain probability, and the number of iterations must be increased to raise this probability.
The basic assumptions of RANSAC are:
(1) The data consists of "inliers", i.e., data whose distribution can be explained by some model parameters;
(2) "Outliers" are data that cannot fit the model;
(3) The remaining data are noise.
The causes of outliers are: extreme noise values; erroneous measurement methods; false assumptions about the data.
RANSAC also makes the following assumption: given a (usually small) set of inliers, there exists a procedure by which the model parameters can be estimated, and the model can explain or apply to the inliers.
A plurality of matching point pairs are obtained through the feature point matching of S104, each matching point pair consisting of two matched feature points. When the RANSAC algorithm is adopted, some of the matching point pairs are selected for PnP solution and the remaining matching point pairs are used to verify the PnP result; if the verification fails, some matching point pairs are selected again to continue the PnP solution, and this iterative process is repeated until the iteration end condition is met.
Unlike the prior art, in the pose calculation of this embodiment, besides calculating the pixel error of the projected pixels, a semantic projection error is introduced to assist verification. After the semantic projection error is introduced, the consistency of the pose calculated by PnP RANSAC in both the geometric spatial structure and the semantic information distribution can be ensured, improving the accuracy and robustness of repositioning.
After repositioning ends, if the device performs tracking-loss recovery, normal positioning and tracking can resume after repositioning succeeds; if safe zone identification is performed, after repositioning succeeds, the position of the safe zone in the stored map coordinate system is coordinate-transformed according to the calculated relative pose and projected into the current world coordinate system, thereby completing the safe zone identification.
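As an illustrative sketch only, projecting the stored safe zone into the current world coordinate system could be done as below once the relative pose (here a rotation vector and a translation vector) has been recovered; whether the pose maps map coordinates to world coordinates or the reverse depends on how it was solved, so the direction used here is an assumption.

```python
import numpy as np
import cv2

def transform_safe_area(safe_area_points_map, rvec, tvec):
    """Convert safe-zone points from the stored map coordinate system to the current
    world coordinate system using the relative pose obtained from repositioning.
    Assumes the pose (rvec, tvec) maps map coordinates into world coordinates."""
    R, _ = cv2.Rodrigues(rvec)                          # rotation vector -> 3x3 rotation matrix
    pts = np.asarray(safe_area_points_map, dtype=float).reshape(-1, 3)
    return (R @ pts.T).T + np.asarray(tvec, dtype=float).reshape(1, 3)
```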
In this embodiment, feature points are extracted from the input image, descriptors of the feature points are calculated, semantic segmentation is performed on the input image to obtain a semantically segmented image, the feature points are projected onto the semantically segmented image, and semantic features of the feature points are extracted. During feature point matching, the descriptors and the semantic features of the feature points of the input image and of the map image are matched jointly, and pose calculation is performed on the feature point matching result to obtain the relative pose change between the world coordinate system and the map coordinate system. By introducing the semantic features of the image into feature matching and pose calculation, the accuracy of feature matching and pose calculation is improved, thereby improving the success rate and accuracy of repositioning.
On the basis of the first embodiment, a feature matching method is provided in the second embodiment of the present application, which is used for describing step S104 in the first embodiment in detail, fig. 2 is a flowchart of the feature matching method provided in the second embodiment of the present application, and as shown in fig. 2, the method in the present embodiment includes the following steps.
S201, calculating descriptor feature distances between descriptors of feature points of the input image and the map image.
The descriptor feature distance is used for evaluating the similarity of descriptors of two feature points, wherein the smaller the distance is, the larger the similarity is.
Illustratively, the descriptor feature distance is an L1 distance (manhattan distance), an L2 distance (euclidean distance), or a hamming distance. Different algorithms can be selected according to the type of descriptor: for floating-point descriptors such as SIFT, the L2 distance can be used to calculate the descriptor feature distance; for binary descriptors, such as Oriented FAST and Rotated BRIEF (ORB), the hamming distance may be employed.
S202, calculating semantic feature distances between semantic features of feature points of the input image and the map image.
The semantic feature distance is used for evaluating the semantic similarity between two feature points, and the smaller the distance is, the higher the similarity degree is.
By way of example, the semantic feature distance between two feature points may be calculated using the KL divergence, which is calculated as follows:

d_sem = Σ_k f_1(k) · log( f_1(k) / f_2(k) )

wherein d_sem is the semantic feature distance, f_1 and f_2 are the semantic features of the two feature points (i.e., the semantic feature of the feature point of the input image and the semantic feature of the feature point of the map image, respectively), and k denotes the feature dimension.
Alternatively, the euclidean distance or manhattan distance may be used to calculate the semantic feature distance between two feature points, which is not limited in the embodiments of the present application; among these, the KL divergence measures the semantic feature distance more accurately.
S203, carrying out weighted calculation on the descriptor feature distance and the semantic feature distance of the feature points of the input image and the map image to obtain the joint feature distance.
After the semantic feature distance d_sem and the descriptor feature distance d_geo are calculated, the two distances are weighted to obtain the joint feature distance d_union:

d_union = d_geo + λ·d_sem

wherein λ is a number greater than 0, which can be set according to the use scene.
S204, matching the feature points of the input image and the map image using the joint feature distance.
After the joint feature distance of the two feature points is obtained, judging whether the joint feature distance of the two feature points is smaller than a distance threshold, if the joint feature distance of the two feature points is smaller than the distance threshold, determining that the two feature points are matched, and if the joint feature distance of the two feature points is larger than or equal to the distance threshold, indicating that the two feature points are not matched.
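A compact sketch combining steps S201 to S204 is given below; it assumes binary (e.g., ORB/BRIEF) descriptors compared by hamming distance and semantic features compared by KL divergence, and the weight λ and the distance threshold are illustrative free parameters to be set for the use scene.

```python
import numpy as np

def kl_divergence(f1, f2, eps=1e-8):
    """Semantic feature distance d_sem between two class-probability distributions."""
    f1 = np.clip(np.asarray(f1, dtype=float), eps, 1.0)   # clip to avoid log(0)
    f2 = np.clip(np.asarray(f2, dtype=float), eps, 1.0)
    return float(np.sum(f1 * np.log(f1 / f2)))

def hamming_distance(d1, d2):
    """Descriptor feature distance d_geo for packed binary descriptors (uint8 arrays)."""
    return int(np.unpackbits(np.bitwise_xor(d1, d2)).sum())

def match_feature_points(desc_in, sem_in, desc_map, sem_map, lam=1.0, dist_thresh=80.0):
    """Match input-image feature points against map feature points by joint feature distance."""
    matches = []
    for i in range(len(desc_in)):
        best_j, best_d = -1, float("inf")
        for j in range(len(desc_map)):
            d_geo = hamming_distance(desc_in[i], desc_map[j])   # S201
            d_sem = kl_divergence(sem_in[i], sem_map[j])        # S202
            d_union = d_geo + lam * d_sem                       # S203: joint feature distance
            if d_union < best_d:
                best_d, best_j = d_union, j
        if best_d < dist_thresh:                                # S204: keep pairs below threshold
            matches.append((i, best_j))
    return matches
```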
In this embodiment, when matching the feature points of the input image and the map image, the semantic feature distance and the descriptor feature distance of the feature points are calculated respectively and used jointly for matching. Joint feature matching can, to a certain extent, avoid mismatches caused by similar appearance among objects of different types, improving the accuracy of feature point matching and preventing errors from being introduced into the subsequent pose calculation.
On the basis of the first embodiment and the second embodiment, a method for resolving a pose using a PnP algorithm is provided in the third embodiment of the present application, which is used for describing step S105 in the first embodiment in detail, and fig. 3 is a flowchart of the method for resolving a pose using a PnP algorithm provided in the third embodiment of the present application, as shown in fig. 3, where the method in the third embodiment includes the following steps.
S301, projecting characteristic points in the map image into a plane where the input image is located.
The feature points in the map image are 3D feature points, and the 3D feature points of the map image are projected into the plane in which the input image lies, i.e. the 3D feature points are projected into the 2D plane.
S302, randomly selecting a first matching point pair from the matching point pairs under the RANSAC frame to carry out PnP calculation, and obtaining the relative pose change of the world coordinate system and the map coordinate system.
All matching point pairs of the input image and the map image can be obtained through feature matching, and PnP solution is performed using some of the feature point pairs among the matching point pairs; for the specific implementation of the PnP solution, reference may be made to the prior art, which is not repeated here.
Under the PnP RANSAC architecture, a preset number of first matching point pairs can be randomly selected to perform PnP calculation.
S303, calculating the pixel distance between the projection point of the second matching point pair in the plane of the input image and the characteristic point in the map image, wherein the second matching point pair is the point pair which is remained outside the first matching point pair in the matching point pair.
The pixel distance, which is also called a pixel error, is used to evaluate whether the pixels of the two feature points are similar, and may be the euclidean distance or the manhattan distance of the two feature points. The smaller the pixel distance of the two feature points, the greater the similarity of the two feature points.
S304, acquiring semantic features of projection points of the second matching point pairs on the plane where the input image is.
Illustratively, the probability distribution of the projection points of the second matching point pair on the plane of the input image is read from the semantically segmented image of the input image, and the probability distribution of the projection points of the second matching point pair on the plane of the input image is taken as a semantic feature.
S305, calculating semantic feature distances between projection points of the second matching point pairs on the plane of the input image and feature points in the map image.
The semantic feature distance may be referred to as a semantic projection error and may be calculated using a KL divergence, euclidean distance, or Manhattan distance.
S306, checking whether the calculated relative pose change is correct or not according to the pixel distance and the semantic feature distance of the second matching point pair.
When the method of this embodiment verifies the PnP result, not only the pixel error but also the semantic projection error is used. For each second matching point pair, the pixel distance is compared with a pixel distance threshold and the semantic feature distance is compared with a semantic feature distance threshold. When, for all second matching point pairs or for a proportion of them exceeding a proportion threshold, the pixel distance is smaller than the pixel distance threshold and the semantic feature distance is smaller than the semantic feature distance threshold, the verification is determined to be correct, i.e., the PnP solution result is correct. The proportion threshold may be set, for example, to 90%, 95% or 80%.
Correspondingly, when the proportion of second matching point pairs whose pixel distance is smaller than the pixel distance threshold and whose semantic feature distance is smaller than the semantic feature distance threshold does not reach the proportion threshold, the verification is determined to have failed, that is, the PnP solution result is wrong, and the next PnP solution may be performed by repeating the solution flow of steps S303-S306.
Optionally, when the verification fails, judging whether an iteration end condition is met, ending the PnP calculation when the iteration end condition is met, and if the iteration end condition is not met, continuing to carry out the PnP calculation next time. The iteration end condition may be that the number of iterations reaches a threshold.
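The following is a simplified, non-authoritative sketch of the verification loop of steps S301 to S306. OpenCV's solvePnP/projectPoints are assumed for the pose solution and projection; the pixel, semantic, proportion and iteration thresholds are placeholders, and the pixel error is computed here between the projected map point and the matched feature point of the input image, which is one reading of the description above.

```python
import numpy as np
import cv2

def kl_divergence(f1, f2, eps=1e-8):
    f1 = np.clip(np.asarray(f1, dtype=float), eps, 1.0)
    f2 = np.clip(np.asarray(f2, dtype=float), eps, 1.0)
    return float(np.sum(f1 * np.log(f1 / f2)))

def pnp_ransac_with_semantics(pts3d_map, pts2d_in, sem_map, seg_probs, K,
                              n_sample=6, max_iters=100,
                              pix_thresh=3.0, sem_thresh=0.5, ratio_thresh=0.9):
    """Estimate the relative pose and verify it with pixel and semantic projection errors.

    pts3d_map: (M, 3) 3D feature points of the map image (one per matching point pair)
    pts2d_in:  (M, 2) matched 2D feature points in the input image
    sem_map:   (M, N) semantic features of the map feature points
    seg_probs: (H, W, N) semantic segmentation result of the input image
    K:         (3, 3) camera intrinsic matrix
    """
    h, w, _ = seg_probs.shape
    m = len(pts3d_map)
    for _ in range(max_iters):                      # iterate until verified or limit reached
        # S302: randomly select the first matching point pairs and solve PnP
        first = np.random.choice(m, n_sample, replace=False)
        ok, rvec, tvec = cv2.solvePnP(pts3d_map[first], pts2d_in[first], K, None)
        if not ok:
            continue
        # S301/S303: project the remaining (second) map points into the input image plane
        second = np.setdiff1d(np.arange(m), first)
        proj, _ = cv2.projectPoints(pts3d_map[second], rvec, tvec, K, None)
        proj = proj.reshape(-1, 2)
        good = 0
        for idx, p in zip(second, proj):
            pix_err = np.linalg.norm(p - pts2d_in[idx])          # pixel distance
            # S304/S305: semantic feature at the projection point, KL distance to the map point
            col = int(np.clip(round(p[0]), 0, w - 1))
            row = int(np.clip(round(p[1]), 0, h - 1))
            sem_err = kl_divergence(seg_probs[row, col], sem_map[idx])
            if pix_err < pix_thresh and sem_err < sem_thresh:
                good += 1
        # S306: pass when enough second pairs satisfy both error thresholds
        if good >= ratio_thresh * len(second):
            return rvec, tvec                       # relative pose (rotation vector, translation)
    return None, None
```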
In this embodiment, the relative pose of the world coordinate system and the map coordinate system is recovered using the PnP RANSAC algorithm. When the PnP solution result is verified, not only the traditional pixel error but also a semantic projection error is used to check the result; after the semantic projection error is introduced, the consistency of the pose calculated by PnP RANSAC in both the geometric spatial structure and the semantic information distribution can be ensured, improving the accuracy and robustness of repositioning.
In order to facilitate better implementation of the map repositioning method combining semantic information according to the embodiment of the application, the embodiment of the application also provides a map repositioning device combining semantic information. Fig. 4 is a schematic structural diagram of a map repositioning device with semantic information according to a fourth embodiment of the present application, and as shown in fig. 4, the map repositioning device 100 with semantic information may include:
a feature point extraction module 11, configured to extract feature points from an input image;
a descriptor calculation module 12, configured to calculate descriptors of the feature points;
the semantic segmentation module 13 is used for carrying out semantic segmentation on the input image to obtain a semantic segmentation image;
a semantic feature extraction module 14, configured to project the feature points onto the semantic segmentation image, and extract semantic features of the feature points;
a feature point matching module 15, configured to match a descriptor and a semantic feature of the feature point of the input image with a descriptor and a semantic feature of a feature point of a stored map image;
and the pose resolving module 16 is configured to perform pose resolving according to the feature point matching result, so as to obtain a relative pose change of the world coordinate system and the map coordinate system.
In some embodiments, the semantic segmentation result is a probability that each pixel of the input image belongs to a respective class, and the semantic feature extraction module 14 is specifically configured to:
and projecting the characteristic points to the corresponding positions of the semantic segmentation image, and determining the probability distribution of the projection points of the characteristic points as the semantic characteristics of the characteristic points.
In some embodiments, the feature point matching module 15 is specifically configured to:
calculating a descriptor feature distance between descriptors of feature points of the input image and the map image;
calculating semantic feature distances between semantic features of feature points of the input image and the map image;
carrying out weighted calculation on the descriptor feature distance and the semantic feature distance of the feature points of the input image and the map image to obtain a joint feature distance;
and matching the characteristic points of the input image and the map image by using the joint characteristic distance.
In some embodiments, the pose resolving module 16 is specifically configured to:
and calculating a matching result of the feature points by using a PnP algorithm to obtain the relative pose change of the world coordinate system and the map coordinate system.
In some embodiments, the pose resolving module 16 is specifically configured to:
projecting feature points in the map image into a plane in which the input image is located;
randomly selecting a first matching point pair from the matching point pairs under a random sample consensus (RANSAC) framework to carry out PnP calculation to obtain the relative pose change of a world coordinate system and a map coordinate system;
calculating the pixel distance between a projection point of a second matching point pair in the plane of the input image and a characteristic point in the map image, wherein the second matching point pair is the point pair which is remained outside the first matching point pair in the matching point pair;
acquiring semantic features of projection points of the second matching point pairs on a plane where the input image is located;
calculating the semantic feature distance between the projection point of the second matching point on the plane where the input image is and the feature point in the map image;
according to the pixel distance and the semantic feature distance of the second matching point pair, checking whether the relative pose change obtained by calculation is correct or not;
and determining whether to perform PnP calculation next time according to the verification result.
In some embodiments, the acquiring semantic features of the second matching point pair at a projection point of a plane where the input image is located specifically includes:
And acquiring probability distribution of the second matching point pair on projection points of the input image from the semantic segmentation image of the input image as semantic features.
In some embodiments, the descriptor feature distance is a euclidean distance, a manhattan distance, or a hamming distance.
In some embodiments, the semantic feature distance is KL divergence.
In some embodiments, the semantic feature distance is a KL divergence and the pixel distance is a pixel euclidean distance or a manhattan distance.
In some embodiments, the semantic segmentation module 13 is specifically configured to:
and carrying out semantic segmentation on the input image by using a neural network to obtain the semantic segmentation image.
In some embodiments, the descriptors are used to describe geometric features around feature points.
It should be understood that apparatus embodiments and method embodiments may correspond with each other and that similar descriptions may refer to the method embodiments. To avoid repetition, no further description is provided here.
The apparatus 100 of the embodiments of the present application is described above from the perspective of functional modules in connection with the accompanying drawings. It should be understood that the functional module may be implemented in hardware, or may be implemented by instructions in software, or may be implemented by a combination of hardware and software modules. Specifically, each step of the method embodiments in the embodiments of the present application may be implemented by an integrated logic circuit of hardware in a processor and/or an instruction in software form, and the steps of the method disclosed in connection with the embodiments of the present application may be directly implemented as a hardware decoding processor or implemented by a combination of hardware and software modules in the decoding processor. Alternatively, the software modules may be located in a well-established storage medium in the art such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, and the like. The storage medium is located in a memory, and the processor reads information in the memory, and in combination with hardware, performs the steps in the above method embodiments.
The embodiment of the application also provides electronic equipment. Fig. 5 is a schematic structural diagram of an electronic device provided in a fifth embodiment of the present application, as shown in fig. 5, the electronic device 200 may include:
a memory 21 and a processor 22, the memory 21 being adapted to store a computer program and to transfer the program code to the processor 22. In other words, the processor 22 may call and run a computer program from the memory 21 to implement the methods in the embodiments of the present application.
For example, the processor 22 may be configured to perform the above-described method embodiments according to instructions in the computer program.
In some embodiments of the present application, the processor 22 may include, but is not limited to:
a general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like.
In some embodiments of the present application, the memory 21 includes, but is not limited to:
volatile memory and/or nonvolatile memory. The nonvolatile Memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable EPROM (EEPROM), or a flash Memory. The volatile memory may be random access memory (Random Access Memory, RAM) which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as Static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double Data Rate SDRAM (Double Data Rate SDRAM), enhanced SDRAM (ESDRAM), synchronous Link DRAM (SLDRAM), and Direct memory bus RAM (DR RAM).
In some embodiments of the present application, the computer program may be partitioned into one or more modules that are stored in the memory 21 and executed by the processor 22 to perform the methods provided herein. The one or more modules may be a series of computer program instruction segments capable of performing the specified functions, which are used to describe the execution of the computer program in the electronic device.
As shown in fig. 5, the electronic device 200 may further include: a transceiver 23, the transceiver 23 being connectable to the processor 22 or the memory 21.
The processor 22 may control the transceiver 23 to communicate with other devices, and in particular, may send information or data to other devices or receive information or data sent by other devices. The transceiver 23 may include a transmitter and a receiver. The transceiver 23 may further include antennas, the number of which may be one or more.
It will be appreciated that, although not shown in fig. 5, the electronic device 200 may further include a camera module, a WIFI module, a positioning module, a bluetooth module, a display, a controller, etc., which are not described herein.
It will be appreciated that the various components in the electronic device are connected by a bus system that includes, in addition to a data bus, a power bus, a control bus, and a status signal bus.
The present application also provides a computer storage medium having stored thereon a computer program which, when executed by a computer, enables the computer to perform the method of the above-described method embodiments. Alternatively, embodiments of the present application also provide a computer program product comprising instructions which, when executed by a computer, cause the computer to perform the method of the method embodiments described above.
The present application also provides a computer program product comprising a computer program stored in a computer readable storage medium. The processor of the electronic device reads the computer program from the computer readable storage medium, and the processor executes the computer program, so that the electronic device executes a corresponding flow in the map repositioning method combined with the semantic information in the embodiment of the present application, which is not described herein for brevity.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be additional divisions when actually implemented, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or modules, which may be in electrical, mechanical, or other forms.
The modules illustrated as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. For example, functional modules in the embodiments of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.
The foregoing is merely a specific embodiment of the present application, but the protection scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered in the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (15)

1. A map repositioning method in combination with semantic information, comprising:
extracting feature points of an input image, and calculating descriptors of the feature points;
Performing semantic segmentation on the input image to obtain a semantic segmentation image;
projecting the feature points onto the semantic segmentation image, and extracting semantic features of the feature points;
matching the descriptors and semantic features of the feature points of the input image with the descriptors and semantic features of the feature points of the stored map image;
and carrying out pose calculation according to the feature point matching result to obtain the relative pose change of the world coordinate system and the map coordinate system.
2. The method according to claim 1, wherein the semantic segmentation result is a probability that each pixel point of the input image belongs to a respective class, the projecting the feature point onto the semantic segmented image, extracting a semantic feature of the feature point, includes:
and projecting the characteristic points to the corresponding positions of the semantic segmentation image, and determining the probability distribution of the projection points of the characteristic points as the semantic characteristics of the characteristic points.
3. The method of claim 2, wherein matching the descriptors and semantic features of the feature points of the input image with the descriptors and semantic features of the feature points of the map image comprises:
calculating a descriptor feature distance between descriptors of the feature points of the input image and the map image;
calculating a semantic feature distance between semantic features of the feature points of the input image and the map image;
performing a weighted calculation on the descriptor feature distance and the semantic feature distance of the feature points of the input image and the map image to obtain a joint feature distance;
matching the feature points of the input image and the map image by using the joint feature distance.
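A minimal sketch of the joint feature distance of claim 3, assuming a Euclidean descriptor distance, KL divergence as the semantic feature distance, and a scalar weight alpha; these three choices, like the function names, are illustrative assumptions rather than parameters fixed by the claim.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-9):
    # D_KL(p || q) between two discrete class-probability vectors.
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def joint_distance(desc_a, desc_b, sem_a, sem_b, alpha=0.7):
    # Weighted combination of descriptor distance and semantic feature distance.
    d_desc = float(np.linalg.norm(desc_a - desc_b))  # Euclidean descriptor distance
    d_sem = kl_divergence(sem_a, sem_b)              # semantic feature distance
    return alpha * d_desc + (1.0 - alpha) * d_sem

def match_features(descs_in, sems_in, descs_map, sems_map, alpha=0.7):
    """Nearest-neighbour matching of input-image feature points against map
    feature points under the joint feature distance."""
    matches = []
    for i, (d, s) in enumerate(zip(descs_in, sems_in)):
        dists = [joint_distance(d, dm, s, sm, alpha)
                 for dm, sm in zip(descs_map, sems_map)]
        matches.append((i, int(np.argmin(dists))))
    return matches
```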
4. The method according to any one of claims 1 to 3, wherein performing pose calculation according to the feature point matching result to obtain the relative pose change of the world coordinate system and the map coordinate system comprises:
performing calculation on the feature point matching result by using a PnP algorithm to obtain the relative pose change of the world coordinate system and the map coordinate system.
5. The method of claim 4, wherein performing calculation on the feature point matching result by using the PnP algorithm to obtain the relative pose change of the world coordinate system and the map coordinate system comprises:
projecting feature points in the map image into the plane in which the input image is located;
randomly selecting a first matching point pair from among the matching point pairs under a random sample consensus (RANSAC) framework and performing PnP calculation to obtain the relative pose change of the world coordinate system and the map coordinate system;
calculating the pixel distance between the projection point, in the plane of the input image, of a second matching point pair and the feature point in the map image, wherein the second matching point pair is a matching point pair remaining outside the first matching point pair;
acquiring the semantic feature of the projection point of the second matching point pair in the plane where the input image is located;
calculating the semantic feature distance between the projection point of the second matching point pair in the plane where the input image is located and the feature point in the map image;
checking, according to the pixel distance and the semantic feature distance of the second matching point pair, whether the calculated relative pose change is correct;
and determining, according to the verification result, whether to perform a next PnP calculation.
6. The method of claim 5, wherein the acquiring the semantic feature of the projection point of the second matching point pair in the plane of the input image comprises:
acquiring, from the semantic segmentation image of the input image, the probability distribution at the projection point of the second matching point pair as the semantic feature.
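The verification loop of claims 5 and 6 might look roughly like the following, sketched here with OpenCV's solvePnP and projectPoints; the minimal-set size, the EPnP flag, the thresholds, and all identifiers are assumptions made for the example, and the actual verification rule could combine the two distances differently.

```python
import numpy as np
import cv2

def _kl(p, q, eps=1e-9):
    p, q = np.clip(p, eps, 1.0), np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def ransac_pnp_semantic(pts3d_map, pts2d_in, sems_map, prob_map, K,
                        n_iters=100, px_thresh=3.0, kl_thresh=0.5):
    """pts3d_map: (N, 3) map points; pts2d_in: (N, 2) matched input-image points;
    sems_map: (N, C) semantic features of the map points;
    prob_map: (H, W, C) segmentation probabilities of the input image;
    K: (3, 3) camera intrinsics; all arrays float64."""
    n = len(pts3d_map)
    h, w, _ = prob_map.shape
    best_pose, best_inliers = None, -1
    for _ in range(n_iters):
        # PnP on a random minimal subset (the "first matching point pairs").
        idx = np.random.choice(n, 4, replace=False)
        ok, rvec, tvec = cv2.solvePnP(pts3d_map[idx], pts2d_in[idx], K, None,
                                      flags=cv2.SOLVEPNP_EPNP)
        if not ok:
            continue
        # Project the remaining map points (the "second matching point pairs").
        rest = np.setdiff1d(np.arange(n), idx)
        proj, _ = cv2.projectPoints(pts3d_map[rest], rvec, tvec, K, None)
        proj = proj.reshape(-1, 2)
        inliers = 0
        for j, (u, v) in zip(rest, proj):
            if not (0 <= u < w and 0 <= v < h):
                continue
            # Pixel reprojection error against the matched input-image point.
            px_err = float(np.linalg.norm(np.array([u, v]) - pts2d_in[j]))
            # Semantic distance between the map point's feature and the
            # probability distribution sampled at the projection.
            sem_err = _kl(sems_map[j], prob_map[int(round(v)), int(round(u))])
            if px_err < px_thresh and sem_err < kl_thresh:
                inliers += 1
        if inliers > best_inliers:
            best_pose, best_inliers = (rvec, tvec), inliers
    return best_pose, best_inliers
```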
7. The method according to claim 3, wherein the descriptor feature distance is a Euclidean distance, a Manhattan distance, or a Hamming distance.
8. The method according to claim 3, wherein the semantic feature distance is a Kullback-Leibler (KL) divergence.
9. The method of claim 5, wherein the semantic feature distance is a KL divergence and the pixel distance is a pixel Euclidean distance or a Manhattan distance.
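For reference, the standard definitions behind the distances named in claims 7 to 9 are shown below; which direction of the KL divergence is used (input-image distribution versus map distribution as P) is not fixed by the claims and is left open here.

```latex
% Descriptor feature distances between descriptors a, b \in \mathbb{R}^d (or \{0,1\}^d):
d_{\mathrm{Euclidean}}(a,b) = \sqrt{\sum_{i=1}^{d} (a_i - b_i)^2}, \quad
d_{\mathrm{Manhattan}}(a,b) = \sum_{i=1}^{d} |a_i - b_i|, \quad
d_{\mathrm{Hamming}}(a,b)   = \sum_{i=1}^{d} \mathbf{1}[a_i \neq b_i]

% Semantic feature distance between class-probability distributions P and Q:
D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{c} P(c) \log \frac{P(c)}{Q(c)}
```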
10. The method according to any one of claims 1 to 3, wherein the performing semantic segmentation on the input image to obtain the semantic segmentation image comprises:
performing semantic segmentation on the input image by using a neural network to obtain the semantic segmentation image.
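Claim 10 leaves the network unspecified; purely as an example, a per-pixel probability map could be obtained from an off-the-shelf segmentation model such as torchvision's DeepLabV3 (a hypothetical choice, not the model used in this application).

```python
import torch
import torchvision

# Hypothetical backbone; any network producing per-pixel class scores would do.
model = torchvision.models.segmentation.deeplabv3_resnet50(weights="DEFAULT").eval()

def segment(image_tensor):
    """image_tensor: (1, 3, H, W) float tensor, normalised as the model expects.
    Returns an (H, W, C) numpy array of per-pixel class probabilities."""
    with torch.no_grad():
        logits = model(image_tensor)["out"]        # (1, C, H, W) raw scores
        probs = torch.softmax(logits, dim=1)[0]    # (C, H, W) probabilities
    return probs.permute(1, 2, 0).cpu().numpy()    # (H, W, C) probability map
```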
11. The method according to any one of claims 1 to 3, wherein the descriptors are used to describe geometric features around the feature points.
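Claim 11 only requires that the descriptor encode the local geometry around a feature point; one common, and here purely illustrative, realization is an ORB detector and descriptor, whose binary descriptors pair naturally with the Hamming distance of claim 7.

```python
import numpy as np
import cv2

orb = cv2.ORB_create(nfeatures=1000)  # illustrative feature count

def extract_features(gray_image):
    """gray_image: single-channel uint8 image.
    Returns (N, 2) keypoint pixel coordinates and (N, 32) binary descriptors."""
    keypoints, descriptors = orb.detectAndCompute(gray_image, None)
    pts = np.array([kp.pt for kp in keypoints], dtype=np.float32)
    return pts, descriptors
```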
12. A map repositioning apparatus incorporating semantic information, comprising:
the feature point extraction module is used for extracting feature points of the input image;
the descriptor calculation module is used for calculating descriptors of the feature points;
the semantic segmentation module is used for carrying out semantic segmentation on the input image to obtain a semantic segmentation image;
The semantic feature extraction module is used for projecting the feature points onto the semantic segmentation image and extracting semantic features of the feature points;
the feature point matching module is used for matching the descriptors and semantic features of the feature points of the input image with the descriptors and semantic features of the feature points of the stored map image;
and the pose solving module is used for performing pose calculation according to the feature point matching result to obtain the relative pose change of the world coordinate system and the map coordinate system.
13. An electronic device, comprising:
a processor and a memory for storing a computer program, the processor being configured to invoke and run the computer program stored in the memory to perform the method of any one of claims 1 to 11.
14. A computer readable storage medium storing a computer program for causing a computer to perform the method of any one of claims 1 to 11.
15. A computer program product comprising a computer program which, when executed by a processor, implements the method of any one of claims 1 to 11.

Priority Applications (1)

Application Number: CN202211185542.1A
Priority Date: 2022-09-27
Filing Date: 2022-09-27
Title: Map repositioning method and device combined with semantic information

Publications (1)

Publication Number: CN117830625A
Publication Date: 2024-04-05

Family

ID: 90510174

Family Applications (1)

Application Number: CN202211185542.1A (Pending)
Title: Map repositioning method and device combined with semantic information
Priority Date / Filing Date: 2022-09-27

Country Status (1)

Country: CN; Publication: CN117830625A


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination