CN110298320B - Visual positioning method, device and storage medium - Google Patents

Visual positioning method, device and storage medium

Info

Publication number
CN110298320B
CN110298320B (application CN201910586511.9A)
Authority
CN
China
Prior art keywords
positioning
semantic
image
classification
target object
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910586511.9A
Other languages
Chinese (zh)
Other versions
CN110298320A (en)
Inventor
李照虎
张永杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201910586511.9A
Publication of CN110298320A
Application granted
Publication of CN110298320B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/35 Categorising the entire scene, e.g. birthday party or wedding scene

Abstract

Embodiments of the invention provide a visual positioning method, apparatus and storage medium. The method comprises: collecting panoramic data; inputting the panoramic data into a classification model as training samples for classification to obtain a classification result; obtaining a positioning map based on semantic features according to the classification result; and inputting at least one image to be processed, collected by the current target object, into the classification model and combining the output with the positioning map to locate the direction of the target object. With these embodiments, accurate direction positioning can be achieved with an existing magnetometer, which avoids the hardware cost of upgrading the magnetometer.

Description

Visual positioning method, device and storage medium
Technical Field
The present invention relates to the field of computer vision technologies, and in particular, to a method and an apparatus for visual positioning, and a storage medium.
Background
One application scenario for visual positioning is the following: when the same target object (such as a building, a vehicle, a mobile phone terminal, or a street lamp in the surrounding environment) is viewed from different perspectives (such as a left-to-right perspective, a right-to-left perspective, or a top-down perspective), and the viewing results are similar, it is difficult to determine the direction (or orientation) of the target object, so the target object needs to be located. For example, at an intersection, the orientation of a target object can currently be determined with a magnetometer. However, an intersection contains many vehicles, traffic lights and other facilities, which cause strong electromagnetic interference; under such interference, the magnetometer used to detect the direction of the target object produces large errors and cannot determine the direction accurately. At present, accurate direction determination in this scenario requires a more advanced magnetometer, which increases cost. This problem has not been effectively solved.
Disclosure of Invention
Embodiments of the present invention provide a visual positioning method, apparatus and storage medium to solve one or more technical problems in the prior art.
In a first aspect, an embodiment of the present invention provides a visual positioning method, where the method includes:
collecting panoramic data;
inputting the panoramic data serving as a training sample into a classification model for classification to obtain a classification result;
obtaining a positioning map based on semantic features according to the classification result;
and inputting at least one image data to be processed acquired by the current target object into the classification model, and positioning to obtain the direction of the target object by combining the positioning map.
In an embodiment, the inputting the panoramic data into a classification model as a training sample for classification to obtain a classification result includes:
in the classification model, performing image preprocessing on at least one image data in the panoramic data according to a semantic segmentation strategy to obtain a preprocessing result, wherein the preprocessing result is a partial image area in the at least one image data;
classifying the preprocessing result to obtain semantic features corresponding to the partial image areas and coordinate information corresponding to the partial image areas;
and determining the semantic features and the coordinate information as the classification result.
In an embodiment, the image preprocessing at least one image data in the panoramic data according to a semantic segmentation policy to obtain a preprocessing result includes:
identifying an object that is stationary for a specified period of time from the at least one image data;
taking an image area corresponding to the object as static information;
and taking the static information as the preprocessing result.
In one embodiment, the obtaining a positioning map based on semantic features according to the classification result includes:
acquiring the semantic features and the coordinate information;
according to the semantic features and the coordinate information, semantic block areas in the map are described correspondingly;
according to the coordinate information, configuring an observation visual angle aiming at the semantic block area;
and obtaining the positioning map consisting of a plurality of semantic block areas according to the semantic features, the coordinate information and the observation visual angle.
In one embodiment, the configuring the viewing perspective for the semantic block region includes:
configuring different observation visual angles according to different positioning accuracies corresponding to different object observation directions in the panoramic data;
the viewing perspective includes at least: viewing angles in at least two of east, south, west, and north directions.
In an embodiment, the method further comprises: dividing the viewing direction in a horizontal direction for the panoramic data; or, alternatively,
dividing the viewing direction in a pitch direction for the panoramic data.
In an embodiment, the inputting at least one image data to be processed acquired by a current target object into the classification model, and obtaining a direction of the target object by positioning in combination with the positioning map includes:
in the classification model, performing image preprocessing on at least one image data to be processed according to a semantic segmentation strategy, and reserving static information in the at least one image data to be processed;
and positioning through the positioning map to obtain the direction of the target object according to the semantic features, the coordinate information and the observation visual angle corresponding to the static information.
In an embodiment, the obtaining, according to the semantic features, the coordinate information, and the observation angle corresponding to the static information, the direction of the target object by the positioning of the positioning map includes:
performing image matching on the static information and semantic block areas in the positioning map to obtain at least one target semantic block area with image similarity with the static information, wherein the at least one target semantic block area corresponds to the same coordinate information;
when the at least one target semantic block area has the superposition of a plurality of observation visual angles, obtaining a positioning parameter according to a multi-visual-angle superposition area;
and positioning to obtain the direction of the target object according to the positioning parameters.
In a second aspect, an embodiment of the present invention provides a visual positioning apparatus, including:
the acquisition unit is used for acquiring panoramic data;
the classification unit is used for inputting the panoramic data serving as a training sample into a classification model for classification to obtain a classification result;
the map generation unit is used for obtaining a positioning map based on semantic features according to the classification result;
and the positioning unit is used for inputting at least one piece of image data to be processed acquired by the current target object into the classification model and positioning to obtain the direction of the target object by combining the positioning map.
In one embodiment, the classification unit further includes:
a preprocessing subunit, configured to perform, in the classification model, image preprocessing on at least one image data in the panoramic data according to a semantic segmentation policy to obtain a preprocessing result, where the preprocessing result is a partial image area in the at least one image data;
a classification subunit, configured to classify the preprocessing result to obtain semantic features corresponding to the partial image regions and coordinate information corresponding to the partial image regions;
and determining the semantic features and the coordinate information as the classification result.
In one embodiment, the preprocessing subunit is further configured to:
identifying an object that is stationary for a specified period of time from the at least one image data;
taking an image area corresponding to the object as static information;
and taking the static information as the preprocessing result.
In one embodiment, the map generation unit further includes:
the information acquisition subunit is used for acquiring the semantic features and the coordinate information;
the area description subunit is used for correspondingly describing the semantic block area in the map according to the semantic features and the coordinate information;
the view angle configuration subunit is used for configuring an observation view angle aiming at the semantic block area according to the coordinate information;
and the map generation subunit is used for obtaining the positioning map formed by a plurality of semantic block areas according to the semantic features, the coordinate information and the observation visual angle.
In one embodiment, the view angle configuring subunit is further configured to:
configuring different observation visual angles according to different positioning accuracies corresponding to different object observation directions in the panoramic data;
the viewing perspective includes at least: viewing angles in at least two of east, south, west, and north directions.
In one embodiment, the apparatus further includes a direction dividing unit configured to:
dividing the viewing direction in a horizontal direction for the panoramic data; or, alternatively,
dividing the viewing direction in a pitch direction for the panoramic data.
In one embodiment, the positioning unit further includes:
the image preprocessing subunit is used for performing image preprocessing on at least one image data to be processed according to a semantic segmentation strategy in the classification model and reserving static information in the at least one image data to be processed;
and the object positioning subunit is used for positioning through the positioning map to obtain the direction of the target object according to the semantic features, the coordinate information and the observation visual angle corresponding to the static information.
In one embodiment, the object positioning subunit is configured to:
performing image matching on the static information and semantic block areas in the positioning map to obtain at least one target semantic block area with image similarity with the static information, wherein the at least one target semantic block area corresponds to the same coordinate information;
when the at least one target semantic block area has the superposition of a plurality of observation visual angles, obtaining a positioning parameter according to a multi-visual-angle superposition area;
and positioning to obtain the direction of the target object according to the positioning parameters.
In a third aspect, an embodiment of the present invention provides a visual positioning apparatus, where functions of the apparatus may be implemented by hardware, or may be implemented by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above-described functions.
In one possible design, the apparatus includes a processor and a memory, where the memory stores a program supporting the apparatus in executing any one of the above visual positioning methods, and the processor is configured to execute the program stored in the memory. The apparatus may also include a communication interface for communicating with other devices or a communication network.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium for storing computer software instructions for an information processing apparatus, which includes a program for executing any one of the above-mentioned visual positioning methods.
One of the above technical solutions has the following advantages or beneficial effects:
in the embodiments of the invention, panoramic data is collected; the panoramic data is input into a classification model as training samples for classification to obtain a classification result; a positioning map based on semantic features is obtained according to the classification result; and at least one image to be processed, acquired by the current target object, is input into the classification model and combined with the positioning map to locate the direction of the target object. For the case where the direction (or orientation) of the target object is difficult to determine, the collected panoramic data is input into the classification model as training samples for classification, a positioning map based on semantic features is obtained from the classification result, and the direction of the current target object can then be located by applying the classification model together with the positioning map. Because the direction of the target object can be located by means of the classification model and the positioning map without changing the current hardware, accurate direction positioning can be achieved with the existing magnetometer while avoiding the hardware cost of upgrading it.
The foregoing summary is provided for the purpose of description only and is not intended to be limiting in any way. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features of the present invention will be readily apparent by reference to the drawings and following detailed description.
Drawings
In the drawings, like reference numerals refer to the same or similar parts or elements throughout the several views unless otherwise specified. The figures are not necessarily to scale. It is appreciated that these drawings depict only some embodiments in accordance with the disclosure and are therefore not to be considered limiting of its scope.
Fig. 1 shows a flow chart of a visual positioning method according to an embodiment of the invention.
Fig. 2 shows a flow chart of a visual positioning method according to an embodiment of the invention.
FIG. 3 shows a schematic view of a visual positioning scene according to an embodiment of the invention.
Fig. 4 shows a block diagram of a visual positioning apparatus according to an embodiment of the present invention.
Fig. 5 shows a block diagram of a visual positioning apparatus according to an embodiment of the present invention.
Detailed Description
In the following, only certain exemplary embodiments are briefly described. As those skilled in the art will recognize, the described embodiments may be modified in various different ways, all without departing from the spirit or scope of the present invention. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.
In the related art, in one application scenario, the same target object (e.g., a building, a vehicle, a mobile phone terminal, a tree in the surrounding environment, or a street lamp on the street) is viewed from different perspectives (e.g., a left-to-right perspective, a right-to-left perspective, a top-down perspective, etc.); if the viewing results are similar, it is difficult to determine the direction (or orientation) of the target object, and the target object needs to be located. The orientation is typically determined using a magnetometer. However, in a scene such as an intersection, various sources of interference exist, for example large numbers of passing vehicles (whose metal housings affect the magnetometer), telegraph poles and railings (whose metal material affects the magnetometer). These cause strong electromagnetic interference, so a magnetometer used to detect the direction of the target object produces large errors, and the direction cannot be determined accurately.
A magnetometer, also known as a geomagnetic sensor or magnetic sensor, measures magnetic field strength and direction and can be used to determine the orientation of a target object (e.g., the current device). Its principle is similar to that of a compass: it measures the angles between the target object and the east, west, south and north directions. In a typical sensor suite, the gyroscope senses how the target object rotates, the accelerometer senses how many meters it has moved forward, and the magnetometer indicates, for example, that the target object is heading west. In practical applications, to correct and compensate for errors, the gyroscope and the accelerometer can be combined with the magnetometer for positioning; exploiting the characteristics of each sensor makes the final positioning result more accurate, for example by combining the magnetic field direction with the motion in that direction. The heading computation is sketched below.
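To make the compass principle concrete, the sketch below derives a heading from the horizontal magnetometer components. It is a minimal illustration under our own assumptions (a level device whose x axis points north, no tilt compensation), not part of the patent.

```python
import math

def magnetometer_heading(mx: float, my: float) -> float:
    """Compass heading in degrees (0 = north, 90 = east) from the
    horizontal magnetic-field components, assuming the device is held
    level and its x axis points north. No tilt compensation."""
    return math.degrees(math.atan2(my, mx)) % 360.0

# A reading dominated by the x component points roughly north:
print(magnetometer_heading(30.0, 2.0))  # ~3.8 degrees
```

Electromagnetic interference at an intersection corrupts mx and my directly, which is exactly why the heading computed this way becomes unreliable there.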
However, due to electromagnetic interference, if the direction of the target object is to be accurately determined in the above-described scenario, only a more advanced magnetometer can be used, which tends to increase hardware costs. Therefore, the hardware cost needs to be reduced while accurate positioning is realized, and particularly, the visual positioning processing of the embodiment of the invention is provided.
Fig. 1 shows a flow chart of a visual positioning method according to an embodiment of the invention. As shown in fig. 1, the process includes:
step 101, collecting panoramic data.
And 102, inputting the panoramic data serving as a training sample into a classification model for classification to obtain a classification result.
In one example, the panoramic data may be image data collected at an intersection, including pedestrians walking at the intersection, vehicles, buildings, a mobile phone terminal, and street lamps or trees in the surrounding environment. When the panoramic data is collected, the same target object (such as a vehicle, a pedestrian, a building, a mobile phone terminal, or a tree or street lamp in the surrounding environment) can be captured from different perspectives (such as a left-to-right perspective, a right-to-left perspective, a top-down perspective, etc.), and different target objects can likewise be captured from these different perspectives.
The panoramic data obtained in this way is used as training samples, which are input into a classification model for classification to obtain a classification result. The classification result may be the image area in which each target object lies in any image of the panoramic data, with each image area carrying semantic features and corresponding coordinate information. For example, semantic classification reveals which area of the image is a vehicle, which is a pedestrian, which is a building, which is a mobile phone terminal, and which is a tree or a street lamp in the surrounding environment. A sketch of this extraction follows.
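As a rough illustration of this step, the sketch below turns a per-pixel semantic segmentation mask into the kind of classification result described here: image areas, each with a semantic feature and coordinate information. The class table and function names are our own assumptions; the patent does not fix a label set or an implementation.

```python
import numpy as np
from scipy import ndimage

# Illustrative label set; the patent does not prescribe one.
CLASS_NAMES = {1: "vehicle", 2: "pedestrian", 3: "building",
               4: "plaque", 5: "street_lamp", 6: "tree"}

def extract_semantic_blocks(mask: np.ndarray) -> list:
    """Convert an H x W mask of class ids into semantic block areas,
    each carrying a semantic feature and pixel-coordinate information."""
    blocks = []
    for class_id, name in CLASS_NAMES.items():
        # Label connected regions of this class separately.
        labeled, _ = ndimage.label(mask == class_id)
        for ys, xs in ndimage.find_objects(labeled):
            blocks.append({"semantic": name,
                           "bbox": (xs.start, ys.start, xs.stop, ys.stop)})
    return blocks
```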
And 103, obtaining a positioning map based on semantic features according to the classification result.
In one example, the classification result includes the image areas that distinguish the target objects in any image of the panoramic data, each with semantic features and corresponding coordinate information. Observation perspectives are assigned to each image area according to its coordinate information, and the same target object may correspond to at least two observation perspectives. Since a target object occupies three-dimensional space (it can be pictured as a hexahedron), the observation perspectives may be divided in three-dimensional space; however, this is not limiting, and they may also be divided in two-dimensional space, or divided in two-dimensional space and then mapped into three-dimensional space, and so on. Examples of observation perspectives are a left-to-right perspective, a right-to-left perspective, a top-down perspective, and the like. One possible record layout for such a map entry is sketched below.
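One way to picture an entry of this semantic-feature positioning map is as a record holding the semantic feature, the coordinate information, and one or more observation perspectives. The field names below are illustrative assumptions, not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class SemanticBlock:
    """A semantic block area of the positioning map."""
    semantic: str                       # semantic feature, e.g. "building"
    xy: Tuple[float, float]             # coordinate information on the map
    # Each sector is an (azimuth_from, azimuth_to) observation perspective.
    view_sectors: List[Tuple[float, float]] = field(default_factory=list)

# The same target object may carry at least two observation perspectives:
block = SemanticBlock("building", (116.31, 39.99),
                      view_sectors=[(0.0, 90.0), (90.0, 180.0)])
```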
And 104, inputting at least one piece of image data to be processed acquired by the current target object into the classification model, and positioning to obtain the direction of the target object by combining the positioning map.
In one example, the target object may include: a vehicle, a pedestrian, a building, a mobile phone terminal, a tree or street lamp in the surrounding environment, and the like. For example, if the target object is a mobile phone terminal, a user takes scene images at the current position with the terminal (possibly captured from different viewing angles); these scene images are the image data to be processed. Through steps 101-104, the classification model is trained on the training samples composed of the input panoramic data and can produce the corresponding classification results. In practical application, the image data to be processed is input into this existing classification model and, combined with the obtained positioning map, the same processing logic as in steps 101-104 directly locates the direction of the mobile phone terminal. The same processing logic can of course also locate the pose of the person holding the terminal, or the pose can be deduced by a relative position transformation after the direction of the terminal has been located.
In the embodiments of the invention, the processing logic may reside on the terminal (acquisition) side or on a background server side, that is: the processing logic performs direction-positioning optimization either at the front end (a target object such as a mobile phone terminal or a vehicle terminal) or at a background server cluster. For the case where the direction (or orientation) of the target object is difficult to determine, the embodiments of the present invention obtain a positioning map based on semantic features by inputting the panoramic data as training samples into a classification model for classification, and then locate the direction (or orientation) of the current target object by applying the classification model and the positioning map. Because the direction of the target object can be located by means of the classification model and the positioning map without changing the current hardware, accurate direction positioning can be achieved with the existing magnetometer while avoiding the hardware cost of upgrading it.
Fig. 2 shows a flow chart of a visual positioning method according to an embodiment of the invention. As shown in fig. 2, the process includes:
step 201, collecting panoramic data.
Step 202, inputting the panoramic data into a classification model as a training sample, and performing image preprocessing on at least one image data in the panoramic data in the classification model according to a semantic segmentation strategy to obtain a preprocessing result, wherein the preprocessing result is a partial image area in the at least one image data.
And 203, classifying the preprocessing result to obtain semantic features corresponding to the partial image regions and coordinate information corresponding to the partial image regions, and determining the semantic features and the coordinate information as the classification result.
Through steps 202-203 above, the panoramic data is input into the classification model as training samples, and the resulting classification result includes the semantic features and corresponding coordinate information of partial image areas in at least one image. A partial image area may be an image area that does not change over a long period, such as a building or a plaque, extracted by semantic segmentation; such static information offers data stability and operational reliability for classification. Extracting and classifying the static information therefore yields an accurate classification result, and an accurate direction-positioning effect once the positioning map is subsequently built from it. A partial image area may correspond to a semantic block area in the positioning map. A sketch of such a static filter follows.
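A minimal sketch of the static-information filter implied here, keeping only classes that do not change over long periods. The particular static/dynamic split is our assumption for illustration, and `extract_semantic_blocks` is the hypothetical helper from the earlier sketch.

```python
# Assumed split between long-term-stable and transient classes.
STATIC_CLASSES = {"building", "plaque", "street_lamp"}

def keep_static_blocks(blocks: list) -> list:
    """Retain only the stable image areas (the 'static information'),
    dropping transient objects such as vehicles and pedestrians."""
    return [b for b in blocks if b["semantic"] in STATIC_CLASSES]
```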
In one example, the panoramic data may be image data collected at an intersection, including pedestrians walking at the intersection, vehicles, buildings, a mobile phone terminal, and street lamps or trees in the surrounding environment. When the panoramic data is collected, the same target object (such as a vehicle, a pedestrian, a building, a mobile phone terminal, or a tree or street lamp in the surrounding environment) can be captured from different perspectives (such as a left-to-right perspective, a right-to-left perspective, a top-down perspective, etc.), and different target objects can likewise be captured from these different perspectives.
The panoramic data obtained in this way is used as training samples, which are input into a classification model for classification to obtain a classification result. The classification result may be the image area in which each target object lies in any image of the panoramic data, with each image area carrying semantic features and corresponding coordinate information. For example, semantic classification reveals which area of the image is a vehicle, which is a pedestrian, which is a building, which is a mobile phone terminal, and which is a tree or a street lamp in the surrounding environment.
And 204, obtaining a positioning map based on semantic features according to the classification result.
In one example, the classification result includes the image areas that distinguish the target objects in any image of the panoramic data, each with semantic features and corresponding coordinate information. Observation perspectives are assigned to each image area according to its coordinate information, and the same target object may correspond to at least two observation perspectives. Since a target object occupies three-dimensional space (it can be pictured as a hexahedron), the observation perspectives may be divided in three-dimensional space; however, this is not limiting, and they may also be divided in two-dimensional space, or divided in two-dimensional space and then mapped into three-dimensional space, and so on. Examples of observation perspectives are a left-to-right perspective, a right-to-left perspective, a top-down perspective, and the like.
Step 205, inputting at least one image data to be processed collected by the current target object into the classification model, and positioning to obtain the direction of the target object by combining the positioning map.
In one example, the target object may include: a vehicle, a pedestrian, a building, a mobile phone terminal, a tree or street lamp in the surrounding environment, and the like. For example, if the target object is a mobile phone terminal, a user takes scene images at the current position with the terminal (possibly captured from different viewing angles); these scene images are the image data to be processed. Through steps 201-205, the classification model is trained on the training samples composed of the input panoramic data and can produce the corresponding classification results. In practical application, the image data to be processed is input into this existing classification model and, combined with the obtained positioning map, the same processing logic as in steps 201-205 directly locates the direction of the mobile phone terminal. The same processing logic can of course also locate the pose of the person holding the terminal, or the pose can be deduced by a relative position transformation after the direction of the terminal has been located.
In the embodiments of the invention, the processing logic may reside on the terminal (acquisition) side or on a background server side, that is: the processing logic performs direction-positioning optimization either at the front end (a target object such as a mobile phone terminal or a vehicle terminal) or at a background server cluster. For the case where the direction (or orientation) of the target object is difficult to determine, the embodiments of the present invention obtain a positioning map based on semantic features by inputting the panoramic data as training samples into a classification model for classification, and then locate the direction (or orientation) of the current target object by applying the classification model and the positioning map. Because the direction of the target object can be located by means of the classification model and the positioning map without changing the current hardware, accurate direction positioning can be achieved with the existing magnetometer while avoiding the hardware cost of upgrading it.
In one embodiment, the image preprocessing of at least one image in the panoramic data according to a semantic segmentation strategy to obtain a preprocessing result includes: identifying, from the at least one image, an object that remains static within a specified period (such as a building or a plaque, i.e., an object that does not move for a long time), taking the image area corresponding to the object as static information, and taking the static information as the preprocessing result.
In one embodiment, the obtaining a positioning map based on semantic features according to the classification result includes: acquiring the semantic features and the coordinate information; according to the semantic features and the coordinate information, semantic block areas in the map are described correspondingly; according to the coordinate information, configuring an observation visual angle aiming at the semantic block area; and obtaining the positioning map consisting of a plurality of semantic block areas according to the semantic features, the coordinate information and the observation visual angle.
In one example, a map carrying semantic features and Inertial Measurement Unit (IMU) information is constructed using the collected panoramic data; that is, a positioning map based on semantic features is built from the panoramic data, which solves the problem of directionally positioning a target object. The following describes how to construct the positioning map based on semantic features and how to perform visual positioning from images uploaded by the target object.
For constructing the semantic map, the following details are described:
1. Panoramic data collected for general-purpose maps carries relatively accurate position and direction information, for example from GPS, an IMU and a magnetometer, so the coordinates and direction of the shooting position of each panorama (and the shooting direction of different parts of the panoramic image) are available and relatively accurate.
2. Static information in the image (image areas that do not change over time, such as buildings, plaques, etc.) is extracted and classified using semantic segmentation. Thus each region of the image has semantic information and coordinate information.
3. Each semantic block area in a database (e.g., a map database) is assigned an observation perspective or observation direction (for example, east, south, west or north, or a specific observation angle); finally the entries are stored in the library, building the positioning map based on semantic features (the semantic map for short). This map comprises a number of semantic areas and, for each semantic area, its semantic features, coordinate information and direction information for positioning.
In one embodiment, configuring an observation perspective for the semantic block area includes: configuring different observation perspectives according to the different positioning accuracies corresponding to different object observation directions in the panoramic data. The observation perspective includes at least viewing angles in at least two of the east, south, west and north directions. The observation direction may be divided in a horizontal direction for the panoramic data, or alternatively in a pitch direction. For example, if only low-precision orientation is required (e.g., distinguishing the four directions east, south, west and north), the observation directions may be divided at coarse granularity, for example splitting the 360 degrees of panoramic data into four viewing sectors (east, south, west, north) and assigning the corresponding sector to the semantic block areas in each direction. To achieve higher positioning accuracy, the panoramic data may of course be divided into more viewing sectors. Further, the panoramic data may be divided not only in the horizontal direction but also in the pitch direction. A sketch of this binning follows.
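The division into viewing sectors can be a simple binning of azimuth, where the sector count sets the positioning granularity. A sketch under our own angle convention (0 degrees = north, clockwise), which the patent does not prescribe:

```python
def azimuth_to_sector(azimuth_deg: float, num_sectors: int = 4) -> int:
    """Map an azimuth to a viewing-sector index. With num_sectors=4 the
    sectors are centred on north, east, south and west; raising
    num_sectors gives finer direction granularity."""
    width = 360.0 / num_sectors
    # Shift by half a sector so sector 0 is centred on north.
    return int(((azimuth_deg + width / 2.0) % 360.0) // width)

assert azimuth_to_sector(355.0) == 0   # north
assert azimuth_to_sector(92.0) == 1    # east
assert azimuth_to_sector(184.0) == 2   # south
```

Division in the pitch direction would be the same binning applied to the elevation angle instead of the azimuth.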
In an embodiment, the inputting at least one image data to be processed acquired by a current target object into the classification model, and obtaining a direction of the target object by positioning in combination with the positioning map includes: in the classification model, performing image preprocessing on at least one image data to be processed according to a semantic segmentation strategy, and reserving static information in the at least one image data to be processed; and positioning through the positioning map to obtain the direction of the target object according to the semantic features, the coordinate information and the observation visual angle corresponding to the static information.
In an embodiment, the classification model and the positioning map are used to locate the direction of the target object, which can be understood simply as looking for overlapping sector areas. The sector is merely an example of the shape of a positioning area within a semantic area in the embodiments of the invention; the specific shape is not limited. Obtaining the direction of the target object through the positioning map, according to the semantic features, coordinate information and observation perspective corresponding to the static information, includes: matching the static information against the semantic block areas in the positioning map to obtain at least one target semantic block area with image similarity to the static information, the at least one target semantic block area corresponding to the same coordinate information; when the at least one target semantic block area has overlapping observation perspectives, obtaining a positioning parameter from the multi-perspective overlap area; and locating the direction of the target object according to the positioning parameter. A sketch of the overlap voting follows.
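A coarse sketch of the multi-perspective overlap step: each matched semantic block votes for the map cells lying inside its observation sector, and the densest cell approximates the positioning parameter. The grid, the distance cut-off and the non-wrapping sector test are all simplifying assumptions of ours, not details from the patent.

```python
import math
import numpy as np

def sector_overlap_vote(matches, grid=64, extent=100.0, max_range=50.0):
    """matches: list of ((block_x, block_y), (azimuth_lo, azimuth_hi)).
    Votes on a 2D grid covering [0, extent]^2 and returns the cell
    where the observation sectors overlap most densely."""
    votes = np.zeros((grid, grid))
    coords = np.linspace(0.0, extent, grid)
    for (bx, by), (lo, hi) in matches:          # assumes lo < hi (no wrap)
        for i, y in enumerate(coords):
            for j, x in enumerate(coords):
                az = math.degrees(math.atan2(x - bx, y - by)) % 360.0
                if lo <= az <= hi and math.hypot(x - bx, y - by) <= max_range:
                    votes[i, j] += 1.0
    i, j = np.unravel_index(np.argmax(votes), votes.shape)
    return coords[j], coords[i]   # (x, y) of the densest overlap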
In one example, a map carrying semantic features and Inertial Measurement Unit (IMU) information is constructed using panoramic data acquired from the map, that is, a positioning map based on the semantic features is constructed by using the panoramic data, so that the problem of directional positioning of a target object can be solved. The method comprises the following contents of how to construct the positioning map based on the semantic features and carrying out visual positioning according to images uploaded by target objects.
How to construct the positioning map based on semantic features has been described previously, and how to perform visual positioning according to an image uploaded by a target object, that is, how to apply the classification model and the positioning map to realize positioning of the direction of the target object, the following is detailed:
1. and performing semantic segmentation on the uploaded to-be-processed acquired image, and reserving static information in the image.
2. And matching each semantic area in the acquired image with the image semantic area in the map database, so that a plurality of semantic level matches are obtained. The semantic region may also be referred to as a semantic block.
3. Because each semantic area was given an observation range (e.g., one or more observation perspectives) when the positioning map was constructed, each match generates a sector area on the 2D plane; the area where these sectors intersect most densely yields a rough direction or pose (position and orientation) and observation perspective for the collected image. If the observation perspective also includes a pitch angle, the sectors lie in 3D space rather than on a 2D plane.
4. If a more precise direction or pose is desired, the positioning map needs to be built into point cloud data using SFM (structure from motion) techniques, and the matching above then continues with a 2D-to-3D mapping.
An SFM pipeline comprises at least: a feature extraction step (generally using the SIFT operator, which is scale- and rotation-invariant); a step of matching and establishing tracks (e.g., a track list), such as matching pairs of images by Euclidean distance; a step of initializing an image pair by finding the pair with the largest baseline relative to the acquisition device (e.g., a camera); a step of computing the relative orientation of the initial image pair; a sparse SFM reconstruction step; and so on. The first two stages are sketched below. It should also be noted that, in the positioning process above, besides obtaining the direction information of the target object, the overlap of multiple observation perspectives allows the position of the observer (user) to be inferred, so the pose of the user when the image was captured can also be obtained, not just the direction information of the target object (the current device). Furthermore, the panoramic data in the embodiments of the invention may also be a point cloud map, from which more accurate position information and pose can be obtained. With the embodiments of the invention, the orientation, and even the specific position, of the current device (such as a mobile phone terminal) can be determined in a scene where the direction is hard to determine, such as an intersection, without resorting to a magnetometer with higher hardware cost.
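The first two SFM stages listed above (SIFT extraction and pairwise matching by Euclidean distance) can be sketched with OpenCV. This is only a fragment of an SFM pipeline, and the ratio-test threshold is a conventional choice of ours, not taken from the patent.

```python
import cv2

def sift_two_view_matches(path_a: str, path_b: str, ratio: float = 0.75):
    """SIFT features (scale- and rotation-invariant) matched between
    two images by Euclidean descriptor distance with a ratio test."""
    sift = cv2.SIFT_create()
    img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)
    _, des_a = sift.detectAndCompute(img_a, None)
    _, des_b = sift.detectAndCompute(img_b, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)          # Euclidean distance
    pairs = matcher.knnMatch(des_a, des_b, k=2)
    # Keep matches whose best distance clearly beats the second best.
    return [m for m, n in pairs if m.distance < ratio * n.distance]
```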
Application example:
fig. 3 shows a schematic view of a visual positioning scene according to an embodiment of the present invention. As shown in fig. 3, panoramic data (collected images) are captured and input into a classification model; the images can be digitized before input. While the collected images are being input into the classification model as training samples for classification, they are preprocessed so that only static information (buildings, plaques and the like) is retained; the classification model then classifies them based on semantic features, obtaining the semantic features and coordinate information of the image areas where the static information lies, which are output as the classification result. The collected images may be image data captured at an intersection, including pedestrians walking at the intersection, vehicles, buildings, a mobile phone terminal, and trees or street lamps in the surrounding environment. The same target object (such as a vehicle, a pedestrian, a building, a mobile phone terminal, or a tree or street lamp in the surrounding environment) may be captured from different perspectives (such as a left-to-right perspective, a right-to-left perspective, a top-down perspective, etc.), and different target objects may likewise be captured from different perspectives. Semantic classification reveals which area of the image data is a vehicle, which is a pedestrian, which is a building, which is a mobile phone terminal, and which is a tree or street lamp in the surrounding environment. A positioning map based on the semantic features is then obtained from the semantic features, the coordinate information and at least one assigned observation perspective. Observation perspectives are assigned to the image areas according to their coordinate information, and the same target object may correspond to at least two observation perspectives; as before, the perspectives may be divided in three-dimensional space, in two-dimensional space, or in two dimensions and then mapped into three. Finally, the direction of the target object is located based on the semantic-feature positioning map and the classification model, as the sketch below summarizes.
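Stitching the earlier sketches together, the online half of this example looks roughly as follows. It assumes the map is a list of the hypothetical `SemanticBlock` records defined above, and all helper names (`extract_semantic_blocks`, `keep_static_blocks`, `sector_overlap_vote`) are the illustrative ones from the earlier sketches, not the patent's.

```python
def localize(query_mask, semantic_map):
    """Online flow: segment the query image, keep the static
    information, match it against the map at the semantic level, then
    vote over the overlapping observation sectors for a fix.
    semantic_map is a list of SemanticBlock records."""
    query_blocks = keep_static_blocks(extract_semantic_blocks(query_mask))
    wanted = {b["semantic"] for b in query_blocks}
    matches = [(mb.xy, sector)
               for mb in semantic_map if mb.semantic in wanted
               for sector in mb.view_sectors]
    return sector_overlap_vote(matches)
```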
Fig. 4 shows a block diagram of a visual positioning apparatus according to an embodiment of the present invention, the apparatus includes: an acquisition unit 31 for acquiring panoramic data; the classification unit 32 is configured to input the panoramic data as a training sample into a classification model for classification, so as to obtain a classification result; the map generation unit 33 is used for obtaining a positioning map based on semantic features according to the classification result; and the positioning unit 34 is configured to input at least one to-be-processed image data acquired by a current target object into the classification model, and obtain a direction of the target object by positioning in combination with the positioning map.
In one embodiment, the classification unit further includes: a preprocessing subunit, configured to perform, in the classification model, image preprocessing on at least one image data in the panoramic data according to a semantic segmentation policy to obtain a preprocessing result, where the preprocessing result is a partial image area in the at least one image data;
a classification subunit, configured to classify the preprocessing result to obtain semantic features corresponding to the partial image regions and coordinate information corresponding to the partial image regions; and determining the semantic features and the coordinate information as the classification result.
In one embodiment, the preprocessing subunit is further configured to: identifying an object that is stationary for a specified period of time from the at least one image data; taking an image area corresponding to the object as static information; and taking the static information as the preprocessing result.
In one embodiment, the map generation unit further includes: the information acquisition subunit is used for acquiring the semantic features and the coordinate information; the area description subunit is used for correspondingly describing the semantic block area in the map according to the semantic features and the coordinate information; the view angle configuration subunit is used for configuring an observation view angle aiming at the semantic block area according to the coordinate information; and the map generation subunit is used for obtaining the positioning map formed by a plurality of semantic block areas according to the semantic features, the coordinate information and the observation visual angle.
In one embodiment, the view angle configuring subunit is further configured to: configuring different observation visual angles according to different positioning accuracies corresponding to different object observation directions in the panoramic data; the viewing perspective includes at least: viewing angles in at least two of east, south, west, and north directions.
In one embodiment, the apparatus further includes a direction dividing unit configured to: dividing the viewing direction in a horizontal direction for the panoramic data; alternatively, the observation direction is divided in a pitch direction with respect to the panoramic data.
In one embodiment, the positioning unit further includes: the image preprocessing subunit is used for performing image preprocessing on at least one image data to be processed according to a semantic segmentation strategy in the classification model and reserving static information in the at least one image data to be processed; and the object positioning subunit is used for positioning through the positioning map to obtain the direction of the target object according to the semantic features, the coordinate information and the observation visual angle corresponding to the static information.
In one embodiment, the object positioning subunit is configured to: performing image matching on the static information and semantic block areas in the positioning map to obtain at least one target semantic block area with image similarity with the static information, wherein the at least one target semantic block area corresponds to the same coordinate information; when the at least one target semantic block area has the superposition of a plurality of observation visual angles, obtaining a positioning parameter according to a multi-visual-angle superposition area; and positioning to obtain the direction of the target object according to the positioning parameters.
The functions of each module in each apparatus in the embodiments of the present invention may refer to the corresponding description in the above method, and are not described herein again.
Fig. 5 shows a block diagram of the structure of an information processing apparatus according to an embodiment of the present invention. As shown in fig. 5, the apparatus includes: a memory 910 and a processor 920, the memory 910 storing a computer program operable on the processor 920. The processor 920 implements the visual positioning method of the above embodiments when executing the computer program. There may be one or more memories 910 and processors 920.
The device also includes: and a communication interface 930 for communicating with an external device to perform data interactive transmission.
The memory 910 may include high-speed RAM and may also include non-volatile memory, such as at least one disk memory.
If the memory 910, the processor 920 and the communication interface 930 are implemented independently, the memory 910, the processor 920 and the communication interface 930 may be connected to each other through a bus and perform communication with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 5, but this is not intended to represent only one bus or type of bus.
Optionally, in an implementation, if the memory 910, the processor 920 and the communication interface 930 are integrated on a chip, the memory 910, the processor 920 and the communication interface 930 may complete communication with each other through an internal interface.
An embodiment of the present invention provides a computer-readable storage medium, which stores a computer program, and the computer program is used for implementing the method of any one of the above embodiments when being executed by a processor.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be implemented by program instructions directing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one of, or a combination of, the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, each unit may exist alone physically, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may also be stored in a computer-readable storage medium. The storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above description covers only specific embodiments of the present invention, but the scope of the present invention is not limited thereto; any changes or substitutions that a person skilled in the art could readily conceive within the technical scope disclosed herein shall be covered by the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (16)

1. A visual positioning method, characterized in that the method comprises:
collecting panoramic data;
inputting the panoramic data serving as a training sample into a classification model for classification to obtain a classification result, wherein the classification result comprises image areas for distinguishing each target object in any image data of the panoramic data, each image area having semantic features and corresponding coordinate information;
obtaining a positioning map based on semantic features according to the classification result;
inputting at least one piece of to-be-processed image data acquired by a current target object into the classification model, and obtaining the direction of the target object by positioning in combination with the positioning map;
wherein obtaining a positioning map based on semantic features according to the classification result comprises: acquiring the semantic features and the coordinate information; correspondingly delineating semantic block areas in the map according to the semantic features and the coordinate information; configuring an observation viewing angle for each semantic block area according to the coordinate information; and obtaining the positioning map composed of a plurality of semantic block areas according to the semantic features, the coordinate information, and the observation viewing angles.
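By way of non-authoritative illustration only, the following minimal Python sketch models the positioning map of claim 1 as a collection of semantic block areas, each carrying a semantic feature, coordinate information, and configured observation viewing angles. All names (SemanticBlock, PositioningMap, add_block) and the field layout are assumptions made for readability, not identifiers from the patent.

```python
# Illustrative sketch only; names and structure are assumptions,
# not the patent's actual implementation.
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SemanticBlock:
    semantic_feature: str                # e.g. "building", "traffic_sign"
    coordinates: Tuple[float, float]     # coordinate information of the block
    viewing_angles: List[str]            # configured observation viewing angles


@dataclass
class PositioningMap:
    blocks: List[SemanticBlock] = field(default_factory=list)

    def add_block(self, feature: str, coords: Tuple[float, float],
                  angles: List[str]) -> None:
        # One semantic block area per classified image region.
        self.blocks.append(SemanticBlock(feature, coords, angles))


# Usage: a map built from two classified regions of the panoramic data.
pmap = PositioningMap()
pmap.add_block("building", (120.5, 31.2), ["east", "south"])
pmap.add_block("traffic_sign", (120.6, 31.3), ["north"])
print(len(pmap.blocks))  # -> 2
```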
2. The method of claim 1, wherein inputting the panoramic data as a training sample into a classification model for classification to obtain a classification result comprises:
in the classification model, performing image preprocessing on at least one piece of image data in the panoramic data according to a semantic segmentation strategy to obtain a preprocessing result, wherein the preprocessing result is a partial image area in the at least one piece of image data;
classifying the preprocessing result to obtain semantic features corresponding to the partial image areas and coordinate information corresponding to the partial image areas;
and determining the semantic features and the coordinate information as the classification result.
3. The method of claim 2, wherein performing image preprocessing on at least one piece of image data of the panoramic data according to a semantic segmentation strategy to obtain a preprocessing result comprises:
identifying an object that is stationary for a specified period of time from the at least one piece of image data;
taking an image area corresponding to the object as static information;
and taking the static information as the preprocessing result.
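As one hedged reading of the static-information preprocessing in claims 2 and 3, the sketch below keeps only pixels whose semantic label never changes across the frames of a specified period. The per-pixel comparison and the function name static_mask are illustrative assumptions; the patent does not specify this exact procedure.

```python
# Illustrative sketch only; the stationarity test is an assumption.
import numpy as np


def static_mask(label_maps):
    """label_maps: list of HxW integer segmentation maps over a period."""
    stack = np.stack(label_maps)              # shape: T x H x W
    # A pixel counts as static if its semantic label is identical
    # in every frame of the specified period.
    return np.all(stack == stack[0], axis=0)  # HxW boolean mask


# Usage: three 2x4 frames; the pixel overwritten in frame 2 is dropped.
frames = [np.array([[1, 1, 2, 2], [1, 1, 2, 2]]) for _ in range(3)]
frames[2][0, 0] = 3                           # a transient object appears
print(static_mask(frames))
```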
4. The method of claim 1, wherein configuring the observation viewing angle for the semantic block area comprises:
configuring different observation viewing angles according to different positioning accuracies corresponding to different object observation directions in the panoramic data;
wherein the observation viewing angles include at least viewing angles in at least two of the east, south, west, and north directions.
5. The method of claim 4, further comprising: dividing the observation direction in a horizontal direction for the panoramic data; or,
dividing the observation direction in a pitch direction for the panoramic data.
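The division of observation directions in claims 4 and 5 can be pictured with the short sketch below, which bins a horizontal compass heading into the four cardinal viewing angles and divides pitch into fixed bands. The 90-degree bins and the 30-degree pitch step are illustrative assumptions; the patent leaves the granularity open.

```python
# Illustrative sketch only; bin edges are assumed, not specified.
def horizontal_viewing_angle(heading_deg: float) -> str:
    """Map a compass heading in degrees to a cardinal viewing angle."""
    heading = heading_deg % 360.0
    if heading < 45.0 or heading >= 315.0:
        return "north"
    if heading < 135.0:
        return "east"
    if heading < 225.0:
        return "south"
    return "west"


def pitch_band(pitch_deg: float, step: float = 30.0) -> int:
    """Divide the pitch direction into fixed-width bands."""
    return int(pitch_deg // step)


print(horizontal_viewing_angle(100.0))  # -> east
print(pitch_band(42.0))                 # -> 1
```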
6. The method according to any one of claims 1-5, wherein inputting at least one piece of to-be-processed image data acquired by a current target object into the classification model and obtaining the direction of the target object by positioning in combination with the positioning map comprises:
in the classification model, performing image preprocessing on the at least one piece of to-be-processed image data according to a semantic segmentation strategy, and retaining static information in the at least one piece of to-be-processed image data;
and obtaining the direction of the target object by positioning through the positioning map according to the semantic features, the coordinate information, and the observation viewing angle corresponding to the static information.
7. The method according to claim 6, wherein obtaining the direction of the target object by positioning through the positioning map according to the semantic features, the coordinate information, and the observation viewing angle corresponding to the static information comprises:
performing image matching between the static information and the semantic block areas in the positioning map to obtain at least one target semantic block area having image similarity to the static information, wherein the at least one target semantic block area corresponds to the same coordinate information;
when a plurality of observation viewing angles overlap in the at least one target semantic block area, obtaining a positioning parameter according to the multi-view overlap area;
and positioning to obtain the direction of the target object according to the positioning parameters.
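To make the flow of claims 6 and 7 concrete, here is a deliberately simplified sketch: the static region of the query image is matched against the semantic block areas, and where several observation viewing angles of a matched block overlap with the query, the overlap yields the positioning parameter that fixes the target's direction. Equality of feature labels stands in for real image matching, and locate_direction is an invented name.

```python
# Illustrative sketch only; feature equality is a stand-in for image
# matching, and the overlap rule is an assumed simplification.
def locate_direction(query_feature, query_angles, blocks):
    """blocks: iterable of (feature, coords, viewing_angles) tuples."""
    for feature, coords, angles in blocks:
        if feature == query_feature:          # stand-in for image matching
            overlap = set(angles) & set(query_angles)
            if overlap:
                # The multi-view overlap area yields the positioning
                # parameter used to fix the target object's direction.
                return coords, sorted(overlap)
    return None


blocks = [("building", (120.5, 31.2), ["east", "south"]),
          ("traffic_sign", (120.6, 31.3), ["north"])]
print(locate_direction("building", ["south", "west"], blocks))
# -> ((120.5, 31.2), ['south'])
```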
8. A visual positioning device, the device comprising:
the acquisition unit is used for acquiring panoramic data;
the classification unit is used for inputting the panoramic data serving as a training sample into a classification model for classification to obtain a classification result, wherein the classification result comprises image areas for distinguishing each target object in any image data of the panoramic data, each image area having semantic features and corresponding coordinate information;
the map generation unit is used for obtaining a positioning map based on semantic features according to the classification result;
the positioning unit is used for inputting at least one piece of to-be-processed image data acquired by a current target object into the classification model, and obtaining the direction of the target object by positioning in combination with the positioning map;
the map generation unit further includes:
the information acquisition subunit is used for acquiring the semantic features and the coordinate information;
the area description subunit is used for correspondingly delineating semantic block areas in the map according to the semantic features and the coordinate information;
the viewing angle configuration subunit is used for configuring an observation viewing angle for each semantic block area according to the coordinate information;
and the map generation subunit is used for obtaining the positioning map composed of a plurality of semantic block areas according to the semantic features, the coordinate information, and the observation viewing angles.
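Read as software architecture, the unit structure of claim 8 amounts to composing four callables, as in the hedged sketch below; the class name and method signatures are assumptions made only to show the data flow between units.

```python
# Illustrative sketch only; names and signatures are assumptions.
class VisualPositioningDevice:
    def __init__(self, acquire, classify, build_map, locate):
        self.acquire = acquire      # acquisition unit: collects panoramic data
        self.classify = classify    # classification unit: classification model
        self.build_map = build_map  # map generation unit: semantic positioning map
        self.locate = locate        # positioning unit: map + model -> direction

    def run(self, query_images):
        panorama = self.acquire()
        classification = self.classify(panorama)
        positioning_map = self.build_map(classification)
        return self.locate(query_images, positioning_map)
```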
9. The apparatus of claim 8, wherein the classification unit further comprises:
a preprocessing subunit, configured to perform, in the classification model, image preprocessing on at least one piece of image data in the panoramic data according to a semantic segmentation strategy to obtain a preprocessing result, wherein the preprocessing result is a partial image area in the at least one piece of image data;
a classification subunit, configured to classify the preprocessing result to obtain semantic features corresponding to the partial image areas and coordinate information corresponding to the partial image areas;
and to determine the semantic features and the coordinate information as the classification result.
10. The apparatus of claim 9, wherein the pre-processing subunit is further configured to:
identify an object that is stationary for a specified period of time from the at least one piece of image data;
take an image area corresponding to the object as static information;
and take the static information as the preprocessing result.
11. The apparatus of claim 8, wherein the viewing angle configuration subunit is further configured to:
configure different observation viewing angles according to different positioning accuracies corresponding to different object observation directions in the panoramic data;
wherein the observation viewing angles include at least viewing angles in at least two of the east, south, west, and north directions.
12. The apparatus of claim 11, further comprising a direction dividing unit configured to:
divide the observation direction in a horizontal direction for the panoramic data; or,
divide the observation direction in a pitch direction for the panoramic data.
13. The apparatus according to any one of claims 8-12, wherein the positioning unit further comprises:
the image preprocessing subunit is used for performing, in the classification model, image preprocessing on at least one piece of to-be-processed image data according to a semantic segmentation strategy, and retaining static information in the at least one piece of to-be-processed image data;
and the object positioning subunit is used for obtaining the direction of the target object by positioning through the positioning map according to the semantic features, the coordinate information, and the observation viewing angle corresponding to the static information.
14. The apparatus of claim 13, wherein the object positioning subunit is configured to:
perform image matching between the static information and the semantic block areas in the positioning map to obtain at least one target semantic block area having image similarity to the static information, wherein the at least one target semantic block area corresponds to the same coordinate information;
when a plurality of observation viewing angles overlap in the at least one target semantic block area, obtain a positioning parameter according to the multi-view overlap area;
and obtain the direction of the target object by positioning according to the positioning parameters.
15. A visual positioning device, the device comprising:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-7.
16. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
CN201910586511.9A 2019-07-01 2019-07-01 Visual positioning method, device and storage medium Active CN110298320B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910586511.9A CN110298320B (en) 2019-07-01 2019-07-01 Visual positioning method, device and storage medium

Publications (2)

Publication Number Publication Date
CN110298320A (en) 2019-10-01
CN110298320B (en) 2021-06-22

Family

ID=68029799

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910586511.9A Active CN110298320B (en) 2019-07-01 2019-07-01 Visual positioning method, device and storage medium

Country Status (1)

Country Link
CN (1) CN110298320B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7446643B2 (en) 2020-05-26 2024-03-11 マプサス テクノロジー ホールディング リミテッド Visual positioning methods, devices, equipment and readable storage media
CN113744352B (en) * 2021-09-14 2022-07-29 北京观海科技发展有限责任公司 Visual space calibration method, device and storage medium

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN206460481U * 2017-01-12 2017-09-01 刘曼 Multi-angle-based video system for recognizing abnormal behaviour
CN106997466B (en) * 2017-04-12 2021-05-04 百度在线网络技术(北京)有限公司 Method and device for detecting road
US10558864B2 (en) * 2017-05-18 2020-02-11 TuSimple System and method for image localization based on semantic segmentation
US10410350B2 (en) * 2017-10-30 2019-09-10 Rakuten, Inc. Skip architecture neural network machine and method for improved semantic segmentation
US10685446B2 (en) * 2018-01-12 2020-06-16 Intel Corporation Method and system of recurrent semantic segmentation for image processing
CN108596974B (en) * 2018-04-04 2020-08-04 清华大学 Dynamic scene robot positioning and mapping system and method
CN109061703B (en) * 2018-06-11 2021-12-28 阿波罗智能技术(北京)有限公司 Method, apparatus, device and computer-readable storage medium for positioning
CN109272554A * 2018-09-18 2019-01-25 北京云迹科技有限公司 Method and system for coordinate-system positioning of identified targets and semantic map construction
CN109357679B * 2018-11-16 2022-04-19 山东浪潮科学研究院有限公司 Indoor positioning method based on saliency feature recognition

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107833236A * 2017-10-31 2018-03-23 中国科学院电子学研究所 Combined semantic-visual positioning system and method in a dynamic environment
CN109117718A * 2018-07-02 2019-01-01 东南大学 Three-dimensional semantic map construction and storage method for road scenes
CN109186586A * 2018-08-23 2019-01-11 北京理工大学 Simultaneous localization and hybrid map construction method for dynamic parking environments
CN109461211A * 2018-11-12 2019-03-12 南京人工智能高等研究院有限公司 Semantic vector map construction method and device based on visual point clouds, and electronic equipment
CN109584302A (en) * 2018-11-27 2019-04-05 北京旷视科技有限公司 Camera pose optimization method, device, electronic equipment and computer-readable medium
CN109637177A (en) * 2018-12-19 2019-04-16 斑马网络技术有限公司 Vehicle positioning method, device, equipment and storage medium
CN109724603A * 2019-01-08 2019-05-07 北京航空航天大学 Indoor robot navigation method based on environmental feature detection
CN109920055A * 2019-03-08 2019-06-21 视辰信息科技(上海)有限公司 Construction method and device of a 3D visual map, and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Semantic simultaneous localization and mapping method based on convolutional neural networks; Liu Zhijie et al.; Science Technology and Engineering; 2019-03-31; Vol. 19, No. 9; pp. 148-153 *

Also Published As

Publication number Publication date
CN110298320A (en) 2019-10-01

Similar Documents

Publication Publication Date Title
JP6918885B2 Relative position/orientation determination method and device, equipment and medium
CN109461211B (en) Semantic vector map construction method and device based on visual point cloud and electronic equipment
EP3407294B1 (en) Information processing method, device, and terminal
CN108362295B (en) Vehicle path guiding apparatus and method
CN108694882B (en) Method, device and equipment for labeling map
US11417017B2 (en) Camera-only-localization in sparse 3D mapped environments
CN109270545B (en) Positioning true value verification method, device, equipment and storage medium
CN108279670B (en) Method, apparatus and computer readable medium for adjusting point cloud data acquisition trajectory
CN108022264B (en) Method and equipment for determining camera pose
CN111261016B (en) Road map construction method and device and electronic equipment
CN110033489A Evaluation method, device and equipment for vehicle positioning accuracy
CN104180814A (en) Navigation method in live-action function on mobile terminal, and electronic map client
CN107044853B (en) Method and device for determining landmarks and method and device for positioning
CN109300143A Determination method, apparatus, equipment and storage medium for a motion vector field, and vehicle
CN110298320B (en) Visual positioning method, device and storage medium
CN112836698A (en) Positioning method, positioning device, storage medium and electronic equipment
KR20160070874A (en) Location-based Facility Management System Using Mobile Device
CN111105695A (en) Map making method and device, electronic equipment and computer readable storage medium
CN115619871A (en) Vehicle positioning method, device, equipment and storage medium
CN112991441A (en) Camera positioning method and device, electronic equipment and storage medium
CN110264438A (en) Image processing apparatus, image processing method and driving assistance system
CN114743395A (en) Signal lamp detection method, device, equipment and medium
CN112767477A (en) Positioning method, positioning device, storage medium and electronic equipment
CN112818866A (en) Vehicle positioning method and device and electronic equipment
WO2021109033A1 (en) Apparatus and method for collecting and auto-labelling measurement data in traffic scenario

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant