CN116664812A - Visual positioning method, visual positioning system and electronic equipment - Google Patents


Info

Publication number: CN116664812A
Application number: CN202211521750.4A
Authority: CN (China)
Other languages: Chinese (zh)
Inventors: 李毅超, 陈敬濠
Current Assignee: Honor Device Co Ltd
Original Assignee: Honor Device Co Ltd
Legal status: Pending
Prior art keywords: equipment, point cloud, cloud data, image, visual positioning
Application filed by Honor Device Co Ltd
Priority to CN202211521750.4A
Publication of CN116664812A

Classifications

    • G06V 10/22: Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide the detection or recognition
    • G06F 16/532: Information retrieval of still image data; querying; query formulation, e.g. graphical querying
    • G06V 10/751: Image or video pattern matching; comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G06V 10/762: Image or video recognition or understanding using pattern recognition or machine learning; using clustering, e.g. of similar faces in social networks

Abstract

Embodiments of the application disclose a visual positioning method, a visual positioning system and an electronic device, applicable to the technical field of visual positioning. The method includes: acquiring a query image of the location point where a device to be positioned is located, and device point cloud data of the device; determining, from the query image and the device point cloud data, the target range in which the device to be positioned is located; performing image matching on the sample images associated with the target range based on the query image, determining a target image that matches the query image, and determining the position information of the device to be positioned based on the target image. Embodiments of the application can improve the positioning accuracy of visual positioning in similar scenes.

Description

Visual positioning method, visual positioning system and electronic equipment
Technical Field
The present application relates to the field of visual positioning, and in particular, to a visual positioning method, a visual positioning system, and an electronic device.
Background
Visual positioning is a technique that uses images to achieve localization. It offers advantages such as low cost and high positioning accuracy, and is widely applied in various indoor and outdoor positioning scenarios.
In practice, it is found that in scenes containing similar or repetitive objects (hereinafter referred to as similar scenes), the positioning accuracy of visual positioning may drop significantly, and positioning may even fail entirely. For example, in scenes containing doors and windows of identical appearance, rooms with repeated structures, or underground parking garages with many identical columns, visual positioning may be degraded or even impossible.
Therefore, how to improve the positioning accuracy of visual positioning in similar scenes is a problem to be solved in practical applications.
Disclosure of Invention
In view of this, the embodiment of the application provides a visual positioning method, a visual positioning system and an electronic device, which can improve the positioning accuracy of visual positioning in similar scenes.
A first aspect of an embodiment of the present application provides a visual positioning method, applied to a visual positioning device, where the method includes:
First, a query image of the location point where the device to be positioned is located and device point cloud data of the device to be positioned are acquired, where the device point cloud data includes three-dimensional point cloud data of the location point where the device to be positioned is located. Next, the target range in which the device to be positioned is located is determined from the query image and the device point cloud data. Finally, image matching is performed on the sample images associated with the target range based on the query image, a target image matching the query image is determined, and the position information of the device to be positioned is determined from the position information associated with the target image.
In embodiments of the application, because a similar scene may contain a large number of similar sample images, coarse matching is performed first, and the candidate location ranges of the device to be positioned are screened using the device point cloud data provided by the device, so as to determine the target range in which the device is located. Compared with the query image, the device point cloud data on the one hand covers a larger field of view, and on the other hand also contains depth information for a large number of spatial points in the surrounding environment. Richer spatial information around the device to be positioned can therefore be obtained from the device point cloud data, and similar objects can be distinguished more reliably. On this basis, the range in which the device is actually located can be determined accurately. Finally, precise image matching is performed on the sample images within that range according to the query image, so that the device to be positioned can be positioned accurately even in similar scenes, improving positioning accuracy in such scenes.
As an optional embodiment of the application, after the target image matching the query image is determined, pose information of the device to be positioned can be determined from the query image and the target image.
In the embodiment of the application, whether to analyze the pose information of the equipment to be positioned can be selected according to actual requirements.
As an alternative embodiment of the application, a feature map is provided in the visual positioning device. The feature map comprises a plurality of position points in the space area, sample images respectively associated with the position points, and three-dimensional point cloud data respectively associated with the position points.
In a first possible implementation manner of the first aspect, determining, according to the query image and the device point cloud data, a target range in which the device to be located is located includes:
first determining a number of candidate location ranges that match the query image, then performing point cloud matching on the three-dimensional point cloud data of each location range based on the device point cloud data, and determining the target range that matches the device point cloud data.
In embodiments of the application, rough matching is carried out first to determine one or more location ranges in which the device to be positioned may be located, achieving coarse positioning of the device. The candidate location ranges are then screened using the device point cloud data provided by the device. Because richer spatial information around the device can be obtained from the device point cloud data, similar objects can be distinguished more reliably. Screening the location ranges on this basis therefore allows the range in which the device is actually located to be determined accurately.
As an embodiment of the present application, each location range includes one or more location points.
In a second possible implementation manner of the first aspect, determining a number of matched location ranges matched with the query image includes:
first, a number of similar images resembling the query image are retrieved from preset sample images; then image clustering is performed on all retrieved similar images to obtain a number of clustering results, and the location range corresponding to each clustering result is determined.
Embodiments of the application provide a scheme for dividing location ranges based on image clustering. The size and shape of each location range are determined by the actual similar images retrieved, i.e. the division of location ranges adapts to the actual retrieval results. Embodiments of the application therefore adapt well to the possible location ranges: the division is more accurate and flexible, candidate location points are less likely to be missed, and the positioning accuracy is correspondingly higher.
In a third possible implementation manner of the first aspect, searching a plurality of similar images similar to the query image from preset sample images includes:
image retrieval is performed on the preset sample images based on the query image to obtain a number of similar images.
Finding similar images by image retrieval improves search efficiency. The top N sample images with the highest matching degree in the retrieval results can be used as the similar images.
In a fourth possible implementation manner of the first aspect, acquiring device point cloud data of a device to be located includes:
receiving the device point cloud data sent by the device to be positioned; or alternatively,
receiving video data and inertial measurement unit data sent by the device to be positioned, and processing the video data and the inertial measurement unit data to obtain the device point cloud data.
In embodiments of the application, the device to be positioned may itself compute the device point cloud data and send it to the visual positioning device, or it may collect the relevant video data and inertial measurement unit data and upload them, with the visual positioning device processing them to obtain the device point cloud data. The method can therefore be adapted to the actual requirements of various application scenarios.
In a fifth possible implementation of the first aspect, acquiring the query image of the location point where the device to be positioned is located and acquiring the device point cloud data of the device to be positioned include:
And receiving video data and inertial measurement unit data sent by the equipment to be positioned, and processing the video data and the inertial measurement unit data to obtain equipment point cloud data.
The latest image frame in the video data is extracted as a query image.
In embodiments of the application, the latest image frame in the video data can be used as the query image. When video data is already available, this reduces the user's workload in acquiring a query image and thereby improves positioning efficiency.
In a sixth possible implementation manner of the first aspect, the device point cloud data includes: three-dimensional point cloud data acquired by the device to be positioned within the latest preset time length.
In embodiments of the application, the content actually contained in the acquired device point cloud data can be controlled by setting the acquisition time window.
A second aspect of an embodiment of the present application provides a visual positioning system comprising: a visual positioning device and a device to be positioned in the method of any of the above first aspects.
Before the query image of the location point where the device to be positioned is located and the device point cloud data of the device to be positioned are acquired, the method further includes:
the device to be positioned acquires a query image of its current location point and the three-dimensional point cloud data collected within the latest preset time length, and uses that three-dimensional point cloud data as the device point cloud data.
And the device to be positioned sends the query image and the device point cloud data to the visual positioning device.
Acquiring the query image of the location point where the device to be positioned is located and acquiring the device point cloud data of the device to be positioned include:
the visual positioning device receives the query image and the device point cloud data.
A third aspect of an embodiment of the present application provides a visual positioning apparatus, including:
and the image acquisition module is used for acquiring a query image of the position point of the equipment to be positioned.
The device comprises a point cloud acquisition module, a positioning module and a positioning module, wherein the point cloud acquisition module is used for acquiring device point cloud data of a device to be positioned, and the device point cloud data comprises three-dimensional point cloud data of a position point of the device to be positioned.
And the range determining module is used for determining the target range of the equipment to be positioned according to the query image and the equipment point cloud data.
The position determining module is used for carrying out image matching on the sample image associated with the target range based on the query image, determining a target image matched with the query image, and determining the position information of the equipment to be positioned based on the position information associated with the target image.
As an embodiment of the application, the visual positioning device may also be used to implement the method of any of the above-mentioned first aspects.
In a fourth aspect, an embodiment of the application provides an electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing a method according to any of the first aspects described above when executing the computer program.
In a fifth aspect, embodiments of the present application provide a computer readable storage medium storing a computer program which when executed by a processor performs a method as in any of the first aspects above.
In a sixth aspect, an embodiment of the present application provides a chip system, including a processor, the processor being coupled to a memory, the processor executing a computer program stored in the memory to implement a method according to any one of the first aspects. The chip system can be a single chip or a chip module composed of a plurality of chips.
In a seventh aspect, embodiments of the present application provide a computer program product for, when run on an electronic device, causing the electronic device to perform the method of any one of the first aspects above.
It will be appreciated that the advantages of the second to seventh aspects may be found in the relevant description of the first aspect, and are not described here again.
Drawings
FIG. 1 is a schematic diagram of an implementation principle of visual positioning according to an embodiment of the present application;
FIG. 2 is a point cloud diagram of a spatial region according to an embodiment of the present application;
fig. 3 is a schematic flow chart of a visual positioning method according to an embodiment of the present application;
fig. 4 is a schematic flow chart of a coarse positioning implementation method according to an embodiment of the present application;
fig. 5 is a schematic view of a scene of a similar image of an underground parking garage according to an embodiment of the present application;
fig. 6 is a schematic view of a scene of image clustering based on an Ncut algorithm according to an embodiment of the present application;
FIG. 7 is a schematic view of a scene of a position range according to an embodiment of the present application;
fig. 8 is a schematic diagram of a scene for implementing point cloud matching based on NDT algorithm according to an embodiment of the present application;
fig. 9 is an overall flow chart of a visual positioning method according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a visual positioning device according to an embodiment of the present application;
fig. 11 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
Concepts or nouns that may be involved in embodiments of the application are described below:
a plurality of: in the embodiment of the present application, a plurality means two or more.
Visual positioning system, device to be positioned, and visual positioning device: the device to be positioned refers to an electronic device that needs to obtain its own position information. The visual positioning device is a general term for the electronic equipment that positions the device to be positioned and determines its position information; it is the entity that executes the visual positioning method provided by embodiments of the application. A visual positioning system refers to a system comprising a device to be positioned and a visual positioning device. The device to be positioned and the visual positioning device may be the same electronic device or different electronic devices. When they are the same electronic device, the device to be positioned can itself execute the steps of the visual positioning method in embodiments of the application to achieve its own visual positioning.
Meanwhile, the embodiments of the application do not unduly limit the specific device types of the device to be positioned and the visual positioning device. For example, in some alternative embodiments, the device to be positioned may be a virtual reality device (such as VR glasses), an augmented reality device (such as AR glasses), a personal computer, a mobile phone, a tablet computer, or a terminal-type intelligent electronic device such as a wearable device, and the visual positioning device may be an electronic device with relatively high data processing capability, such as a server, an ultra-mobile personal computer, or a personal computer. For example, in an application scenario where a user sends a positioning request to a server with a mobile phone in order to locate the user's position, the server is the visual positioning device and the mobile phone is the device to be positioned.
Spatial region: the embodiment of the application does not excessively limit the size, shape, geographical position and the like of the space area, and can be determined by a technician according to requirements. For example, in some alternative implementations, the spatial region may be some open area outdoors, such as a square, a road, or an area within a city, etc. In other alternative embodiments, the spatial region may be some indoor region, such as a building or a particular region within a building, such as one or more floors within a building, or one or more rooms within a building, etc. And can be, for example, an indoor scene such as a mall, a garage, a hotel, an industrial park or a train station. In still other alternative embodiments, the selection or setting of the spatial regions may be more flexible, for example, some boundary determining rules of the spatial regions may be set as required, and the spatial regions may be divided by the determined boundaries. In this case, the space region may not be divided in the indoor or outdoor dimension. For example, a fixed point in the space may be set as the center of the circle, and an area included in a range within a preset radius of the center of the circle may be set as the space area in the embodiment of the present application. The center of the circle at this time may be a spatial point at any position in the real world, for example, a point on an object in the room or a point on a sculpture outdoors.
Visual localization is a technique that uses images to achieve localization, such as the widely applied Visual Positioning System/Service (VPS) technology. Reference may be made to fig. 1, which is a schematic diagram of the implementation principle of visual positioning according to an embodiment of the present application. For visual positioning, a technician first uses techniques such as Simultaneous Localization and Mapping (SLAM) to construct feature maps for certain spatial regions and stores them in a visual positioning device in the cloud. A feature map includes three-dimensional point cloud data (hereinafter simply referred to as point cloud data) and sample images corresponding to different location points in the spatial region. On this basis, a user can take a picture of where they are located using the device to be positioned and send it as a query image to the visual positioning device. After receiving the query image, the visual positioning device performs image retrieval over the sample images in the feature map and matches the most similar sample image. Since the location at which each sample image was captured in the spatial region is known (recorded in the feature map), the location in the spatial region that the query image actually corresponds to can be ascertained from the matched sample image. This completes the determination of position information based on visual localization.
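To make the retrieval step concrete, the following is a minimal sketch of ranking sample images against a query image by cosine similarity of global image descriptors. The descriptor extraction and the specific retrieval technique are not specified by this application, so the function below is only an illustrative stand-in that assumes the descriptors have already been computed:

```python
import numpy as np

def retrieve_top_n(query_desc: np.ndarray, sample_descs: np.ndarray, n: int = 50):
    """Rank sample images by cosine similarity to the query image.

    query_desc:   (D,) global descriptor of the query image (assumed precomputed).
    sample_descs: (M, D) global descriptors of the sample images in the feature map.
    Returns the indices of the top-n most similar sample images and their scores.
    """
    q = query_desc / (np.linalg.norm(query_desc) + 1e-12)
    s = sample_descs / (np.linalg.norm(sample_descs, axis=1, keepdims=True) + 1e-12)
    similarity = s @ q                   # cosine similarity against every sample image
    order = np.argsort(-similarity)[:n]  # highest similarity first
    return order, similarity[order]
```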
On the basis of the determined position information of the device to be positioned, visual positioning can further determine the pose information of the device according to the requirements of the practical application. With continued reference to fig. 1, the visual positioning device may at this point extract the 2-dimensional feature points of the query image. Meanwhile, based on the association between the 2-dimensional feature points of the sample images in the cloud feature map and the 3-dimensional feature points in the three-dimensional point cloud, the 3-dimensional feature point data associated with the most similar sample image is determined. The 2-dimensional feature points of the query image are then matched against the determined 3-dimensional feature point data, and the actual pose information of the device to be positioned in the spatial region is calculated. Confidence data for the pose information can also be determined as required.
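The application does not prescribe a particular pose solver; one common way to compute a camera pose from 2-dimensional/3-dimensional feature point correspondences is PnP with RANSAC, sketched below with OpenCV under the assumption that the correspondences and the camera intrinsics are already available:

```python
import numpy as np
import cv2

def estimate_pose(pts_2d: np.ndarray, pts_3d: np.ndarray, camera_matrix: np.ndarray):
    """Estimate pose from matched 2-D query-image points and 3-D map points.

    pts_2d: (N, 2) pixel coordinates of feature points in the query image.
    pts_3d: (N, 3) corresponding 3-D feature points from the feature map.
    camera_matrix: (3, 3) intrinsics of the device camera (assumed known).
    """
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts_3d.astype(np.float64), pts_2d.astype(np.float64),
        camera_matrix.astype(np.float64), None)  # None: no lens distortion assumed
    if not ok:
        return None
    rotation, _ = cv2.Rodrigues(rvec)  # rotation part of the world-to-camera transform
    # The RANSAC inlier ratio can serve as a simple confidence value for the pose.
    confidence = 0.0 if inliers is None else len(inliers) / len(pts_2d)
    return rotation, tvec, confidence
```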
Since the position at which each sample image was captured is known data recorded when constructing the feature map of the spatial region, it is very accurate. Visual positioning performed by image matching can therefore, in theory, achieve high-accuracy localization, which is why it can be widely applied in various indoor and outdoor positioning scenarios. At the same time, visual positioning can also provide pose information in addition to position information, making it well suited to fields such as virtual reality, augmented reality and the metaverse. Through visual localization technology, users can obtain an excellent immersive experience in these fields.
As the above description of visual positioning shows, the positioning accuracy of visual positioning depends critically on the matching accuracy of the query image. When the matching accuracy of the query image is low, the reliability of the sample image matched by the visual positioning device decreases, and the accuracy of the position information derived from a low-reliability sample image decreases accordingly. In practical applications, when the spatial region contains a similar scene with similar or repetitive objects, the feature map of that region also contains many sample images of those similar objects. If the user is then in such a scene and the uploaded query image contains the similar or repeated objects, the visual positioning device will have difficulty distinguishing them, which increases the difficulty of matching the query image and reduces matching accuracy. As a result, the positioning accuracy of visual positioning drops noticeably, and positioning may even fail.
For example, assume that the spatial region is building A, and that building A contains an underground parking garage with a large number of identical columns (the underground parking garage is then a similar scene). When the feature map is constructed, many of the sample images captured in the underground parking garage contain these identical or similar columns, so the final feature map contains a large number of similar sample images of columns. If the user is in the underground parking garage, takes a picture containing a column with a mobile phone (i.e. the device to be positioned) and uploads it to the server (i.e. the visual positioning device) as a query image, the large number of column-containing sample images in the feature map makes the server's matching of the query image harder and less accurate. The sample image matched in this case may well be an image of a column taken at another location in the underground parking garage. The positioning of the user's true position therefore becomes inaccurate and its accuracy decreases.
In order to improve the positioning accuracy of visual positioning in similar scenes, the visual positioning device in the embodiment of the application can acquire a query image of the device to be positioned and device point cloud data containing three-dimensional point cloud data of the point where the device to be positioned is currently located. On the basis, the visual positioning device firstly determines the position range where the device to be positioned is actually located based on the query image and the device point cloud data. And finally, carrying out image matching on the sample image of the feature map in the position range based on the query image, so as to accurately determine a matched target image. And at the moment, according to the position information of the target image, determining the position information of the equipment to be positioned to realize the positioning of the equipment to be positioned.
In embodiments of the application, because a similar scene may contain a large number of similar sample images, coarse matching is performed first, and the candidate location ranges of the device to be positioned are screened using the device point cloud data provided by the device, so as to determine the range in which the device is located. Compared with the query image, the device point cloud data on the one hand covers a larger field of view, and on the other hand also contains depth information for a large number of spatial points in the surrounding environment. Richer spatial information around the device to be positioned can therefore be obtained from the device point cloud data, and similar objects can be distinguished more reliably. Screening the location ranges on this basis allows the range in which the device is actually located to be determined accurately. Finally, precise image matching is performed on the sample images within that range according to the query image, so that the device to be positioned can be positioned accurately even in similar scenes, improving positioning accuracy in such scenes.
To illustrate the technical solution of the application, specific embodiments are described below in which the device to be positioned is a terminal device and the visual positioning device is a cloud server (hereinafter referred to as the cloud). According to the operating sequence of visual positioning in practical applications, the process can be divided into two stages: map construction and visual positioning, detailed as follows:
Stage one, map construction:
before visual localization can be performed, a technician first needs to construct or acquire a feature map (also referred to as a visual map) of the spatial region. Taking the construction of the feature map as an example, the embodiment of the application does not limit the specific construction method of the feature map too much, and can be selected or set by a technician. For example, in some alternative embodiments, a feature map may be constructed using SLAM. At this time, a technician or an electronic device (such as a mobile robot) responsible for constructing a map may perform measurements at a plurality of location points in the spatial region, that is, take sample images (the sample images may be taken at this time or may be image frames in a taken video), and collect point cloud data around the location points. And thus, a characteristic map comprising the space region point cloud image and the sample image is obtained.
As can be seen from the above description, the feature map in the embodiment of the present application at least includes a plurality of location points in the spatial region, sample images associated with the location points, and point cloud data associated with the location points. When the SLAM is used to construct the feature map, the three-dimensional point cloud data in the feature map may also be referred to as SLAM point cloud data.
Consider the situation that the shooting angle, light, weather, etc. during shooting in practical application may affect the finally shot image. Thus, for each location point, multiple sample images may be selected to be taken to enhance the effectiveness of subsequent image queries and matches. At this time, for a single location point, a plurality of sample images may be associated. Meanwhile, after the point cloud data of each position point are acquired, a point cloud image of a space region formed by the point cloud data can be obtained. Thus, the feature map can also be described as: the method comprises the step of including a point cloud image of a space region and a sample image respectively associated with a plurality of position points in the space region. Reference may be made to fig. 2, which is a schematic diagram of a point cloud of an exemplary spatial region provided by an embodiment of the present application.
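The feature map described here can be pictured as a collection of location points, each carrying its known position, its sample images and its point cloud. The sketch below shows only one possible in-memory layout; the field names are illustrative and not taken from this application:

```python
from dataclasses import dataclass, field
from typing import List
import numpy as np

@dataclass
class LocationPoint:
    """One measured location point in the spatial region (illustrative layout)."""
    point_id: int
    position: np.ndarray                  # known (x, y, z) of the location point
    sample_images: List[np.ndarray] = field(default_factory=list)  # one or more images
    point_cloud: np.ndarray = field(default_factory=lambda: np.empty((0, 3)))  # N x 3

@dataclass
class FeatureMap:
    """Feature map: location points plus their associated images and point clouds."""
    location_points: List[LocationPoint] = field(default_factory=list)

    def region_point_cloud(self) -> np.ndarray:
        # Concatenating the per-point clouds yields the point cloud of the whole
        # spatial region (cf. the point cloud image of fig. 2).
        clouds = [p.point_cloud for p in self.location_points] or [np.empty((0, 3))]
        return np.vstack(clouds)
```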
After the feature map is built, the feature map can be stored in the cloud for use.
It should be understood that:
1. In embodiments of the application, the position information of each location point in the spatial region is known and accurate, so the position information of the sample images taken at each location point is also known data. The way this position information is stored is not limited here: it may be stored as part of the feature map, or stored in association with the feature map, for example as annotation information of the feature map.
2. The embodiment of the application does not limit the number of the position points which are specifically measured in the space area too much, and can be set by a technician. Wherein, the more location points are selected, the denser the location points are, and the higher the highest accuracy of locating based on the characteristic map is theoretically. Conversely, the fewer the location points are selected, the more sparse the location points, and the lower the highest accuracy of the positioning based on the feature map.
3. In practical applications, after the initial construction of the feature map is completed, technicians can update the feature map according to actual requirements, so as to refine or extend the feature map of the spatial region. The update operation is the same as the construction operation and is therefore not described in detail. Once a feature map exists, updates to it may take place before stage two (visual positioning) or concurrently with stage two; this is not limited here.
Stage two, visual positioning:
fig. 3 shows a flowchart of an implementation of the visual positioning method in the stage two according to the first embodiment of the present application, which is described in detail below:
s201, the terminal equipment acquires equipment point cloud data and images of current position points, takes the acquired images as query images, and sends the query images and the equipment point cloud data to a cloud. The device point cloud data comprises three-dimensional point cloud data of the current position point of the terminal device.
Given that a feature map of the spatial region already exists in the cloud, if a user needs to be positioned while using the terminal device, an image of the current location point can first be acquired and sent to the cloud as the query image. How the query image is acquired can be determined by the actual application. For example, in some embodiments, the user may take a photograph in any direction at the current location point using the terminal device and use the photograph as the query image; the terminal device then acquires the query image by taking a photograph. In other embodiments, an electronic device or image-capturing hardware other than the terminal device may take a photograph in any direction at the terminal device's current location point; the terminal device then acquires the query image by receiving the image. In still other embodiments, the terminal device may select a locally stored image as the query image, acquiring it by reading the local image.
On the other hand, in order to achieve positioning, the terminal device may further obtain device point cloud data including point cloud data of the current location point, and send the device point cloud data to the cloud. The device point cloud data are used for matching the point cloud data by the cloud to distinguish different position ranges containing the same or similar objects in similar scenes.
It should be noted that, the device point cloud data may also include point cloud data of other location points except the current location point of the terminal device. The specific content contained in the device point cloud data can be set by a technician according to actual requirements, and is not excessively limited. For example, in some optional embodiments, the device point cloud data may be set to include only the point cloud data of the current location point of the terminal device, where the terminal device may acquire the point cloud data of the current location point. In other alternative embodiments, the device point cloud data may be set to include the point cloud data of the n recently acquired position points, or set to include all the point cloud data within a preset duration range nearest to the current time. Wherein n is any integer greater than 1.
In addition, the embodiment of the application does not limit the acquisition mode of the equipment point cloud data too much, and can be determined according to actual application conditions. For example, in some embodiments, the terminal device may collect the device point cloud data from the environment. In other embodiments, other electronic devices or point cloud collecting hardware besides the terminal device may also be used to collect the device point cloud data and send the data to the terminal device. In still other embodiments, the terminal device may also choose to use locally stored device point cloud data.
As an optional embodiment of the present application, the operation of obtaining the device point cloud data in S201 may be replaced by: the terminal equipment acquires three-dimensional point cloud data acquired in a preset time period nearest to the current moment, and takes the acquired three-dimensional point cloud data as equipment point cloud data.
The three-dimensional point cloud data acquired at this time contains the three-dimensional point cloud data of the current position point acquired at the current moment, so the three-dimensional point cloud data can be used as equipment point cloud data. The specific value of the preset time period is not limited herein, and may be set by a technician according to actual requirements. For example, the value may be set to any value within 10 seconds to 60 seconds, for example, 30 seconds, and the terminal device may use the three-dimensional point cloud data acquired within approximately 30 seconds as the device point cloud data.
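A minimal sketch of selecting the device point cloud data by acquisition time is given below, assuming the terminal device keeps a per-point timestamp; the array names are illustrative and not taken from this application:

```python
import numpy as np

def recent_device_point_cloud(points: np.ndarray, timestamps: np.ndarray,
                              now: float, window_s: float = 30.0) -> np.ndarray:
    """Keep only the 3-D points collected within the last window_s seconds.

    points:     (N, 3) point cloud accumulated by the terminal device.
    timestamps: (N,) acquisition time of each point, in seconds.
    """
    mask = timestamps >= (now - window_s)
    return points[mask]
```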
It should be appreciated that, when setting the specific content of the device point cloud data, the more content it contains (e.g. point cloud data of more location points, or point cloud data collected over a longer period), the more environmental information it carries. On the one hand, this raises the requirements on the terminal device for collecting and processing the device point cloud data, and on the cloud for processing it. On the other hand, because more environmental information is contained, the matching accuracy is higher when the device point cloud data is subsequently used to distinguish different location ranges. Conversely, when the device point cloud data is set to contain less content, it carries less environmental information, and the requirements on the terminal device and the cloud are reduced. In practical applications, the content of the device point cloud data can therefore be set according to the actual requirements of the terminal device and the cloud, and the accuracy required for the subsequent point cloud matching.
S202, the cloud acquires query images and device point cloud data of the terminal device, and performs position matching on the feature map based on the query images to obtain a plurality of matched position ranges.
After receiving the query image and the device point cloud data sent by the terminal device, the cloud device can firstly perform coarse positioning on the terminal device based on the query image, namely, determine one or more position ranges where the terminal device is likely to be located in the space region according to the query image. It should be noted that, in the embodiment of the present application, the location range refers to a range formed by one or more location points in a spatial area. That is, a single position range may include only one position point, or may include a plurality of position points at the same time.
The embodiments of the application do not unduly limit the specific method of coarse positioning, which can be set by a technician. For example, in some alternative embodiments, the query image may be used to perform image retrieval over the sample images in the feature map, so as to determine the N matching similar images (i.e. the top N sample images with the highest matching or similarity degree). The location points corresponding to the N similar images are then taken as the corresponding location ranges, yielding M matched location ranges, where N is any positive integer. Because several of the N similar images may belong to the same location point, some of the corresponding location points may coincide, so in practice M is a positive integer less than or equal to N.
As an alternative embodiment of the present application, referring to fig. 4, a schematic flow chart of a coarse positioning implementation method provided by an embodiment of the present application is shown. In the embodiment of the present application, the "performing location matching on the feature map based on the query image" in S202 to obtain a plurality of matched location ranges "may be replaced by: s301 and S302.
S301, the cloud end performs image retrieval on sample images in the feature map based on the query image to obtain a plurality of similar images corresponding to the query image.
In order to implement coarse positioning of the terminal device, in the embodiment of the present application, image retrieval may be performed according to the query image first, so as to find out an image similar to the query image (i.e., a similar image in the embodiment of the present application) from the sample image in the feature map. Considering that the number of sample images contained in the feature map is generally large, the embodiment of the application adopts an image retrieval technology to realize the search of similar images so as to improve the search efficiency of the similar images. In theory, other technologies or algorithms besides the image retrieval technology can be adopted to find similar images, so that the method for finding similar images can be set according to actual requirements in practical application. The embodiment of the application does not excessively limit the specific number N of the similar images which are inquired, and can be set according to actual requirements. For example, in some alternative embodiments, a particular value may be set as the value of N, e.g., may be any value between 20 and 80, such as N may be 50. In other alternative embodiments, a matching degree threshold may be set, and all sample images with matching degrees higher than the matching degree threshold are used as similar images. The value of N may be a non-fixed value that varies with each particular implementation.
S302, the cloud performs image clustering on the retrieved similar images to obtain a plurality of clustering results, and determines the position range corresponding to each clustering result.
After the N similar images are obtained, the location point corresponding to each similar image is, in theory, a possible location point of the terminal device. However, on the one hand, more than one sample image is usually captured at the same location point when the feature map is constructed, and the similarity between sample images of the same location point is generally high; the similarity between sample images of nearby location points is also generally high. On the other hand, because similar scenes contain similar or repetitive objects, sample images of those objects taken at different location points also have high content similarity. In practice, therefore, the number of distinct location points corresponding to the N similar images is generally smaller than N, and many of them are very close together. If location ranges were determined using individual similar images as the unit, the workload and difficulty of the subsequent range screening could increase, affecting the accuracy of the final positioning.
Based on the above, in embodiments of the application, after obtaining the N similar images, the cloud further performs image clustering on them so as to group similar images whose associated actual location points are the same or close. The location range covered in the feature map by the sample images of each clustering result is then a location range matched in S202. For example, a clustering result contains one or more sample images, and the set of location points corresponding to those sample images in the spatial region can be treated as a location range. When a clustering result contains multiple sample images, the edge-most of the corresponding location points can be used as the boundary of the location range, delimiting a range that contains all location points under that clustering result; see the sketch after this paragraph. In this case the location range may contain more location points than are associated with the clustering result. For example, if a clustering result contains 10 sample images corresponding to 8 location points, and the range is delimited using the edge-most of these 8 points as the boundary, the resulting range may also contain location points other than those 8. In embodiments of the application, each clustering result therefore corresponds to a location range. The embodiments of the application do not unduly limit the specific image clustering algorithm used, which can be set by a technician as required. For example, hierarchical clustering methods or density-based clustering (Density-Based Spatial Clustering of Applications with Noise, DBSCAN) may be used, as may graph-cut clustering algorithms such as normalized cuts (Ncut), which can take the co-visibility information between images into account.
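As an illustration of deriving a location range from a clustering result, the sketch below takes the axis-aligned bounding box spanned by the edge-most location points of the cluster. This is only one possible way of delimiting the range and is not mandated by this application:

```python
import numpy as np

def range_from_cluster(cluster_location_points: np.ndarray) -> dict:
    """Delimit a location range from the location points of one clustering result.

    cluster_location_points: (K, 2) or (K, 3) coordinates of the location points
    associated with the similar images in the cluster. The range is the bounding
    box spanned by the edge-most points, so it may also contain location points
    that were not themselves retrieved.
    """
    return {"min_corner": cluster_location_points.min(axis=0),
            "max_corner": cluster_location_points.max(axis=0)}
```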
As one embodiment of the present application, it is assumed that a similar scene is an underground parking garage, in which a plurality of identical columns are included, and numbers such as 1, 2, 3, and 4 are drawn on a part of the columns for distinction. Due to different times, angles and the like of shooting sample images, certain differences exist in the content of the finally obtained sample images at the same position. Assume that after the image search is performed on the underground parking garage in S301, 20 similar images are obtained, and reference may be made to fig. 5 at this time, which is a schematic view of a scene of the similar image of the underground parking garage provided by the embodiment of the present application. On the basis of fig. 5, as a specific embodiment of image clustering in the present application, the cloud may perform image clustering on similar images (a), (b), (c), (d), (e), (f), (g), (h), (i), (j), (k), (l), (m), (n), (o), (p), (q), (r),(s) and (t) in fig. 5. At this time, the image clustering can be obtained:
the first clustering result corresponding to the column 1 comprises: similar images (a), (c), (e), (h), (j), (k), (m), (q) and (t).
The second clustering result corresponding to the column 2 comprises: similar images (b), (f), (n) and(s).
The third clustering result corresponding to the column 3 comprises: similar images (d), (g), (o) and (r).
The fourth clustering result corresponding to the column 4 comprises: similar images (i), (l) and (p).
As an alternative embodiment of the application, Ncut treats each of the similar images as a node V of a graph and quantifies the similarity between two nodes (i.e. the number of 3-dimensional point cloud points co-visible in the two images) as the weight of the connecting edge E between the corresponding vertices, thereby obtaining a similarity-based undirected weighted graph, which may be written G(V, E); the clustering problem is thus converted into a graph partitioning problem. The graph-theoretic criterion for an optimal partition is to maximize the similarity within each sub-graph while minimizing the similarity between sub-graphs. Therefore, when partitioning the similar images, whether any two nodes in a partitioned set are directly connected can be used as the condition for deciding whether to continue partitioning: if some pair of nodes in a set has no direct connection, the similar images in that set can be partitioned further. Referring to fig. 6, a schematic view of a scenario of image clustering based on the Ncut algorithm according to an embodiment of the present application is shown. In this example it is assumed that there are 8 nodes in total (i.e. 8 similar images): node 1, node 2, node 3, node 4, node 5, node 6, node 7 and node 8. When the similar images are partitioned according to the rule that every two nodes within a single set must be directly connected, the result is: set 1, set 2 and set 3 (i.e. three clustering results), where set 1 contains nodes 1, 2 and 3, set 2 contains nodes 4 and 5, and set 3 contains nodes 6, 7 and 8.
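The co-visibility-weighted graph described above can also be clustered with spectral methods, which are closely related to Ncut. The sketch below uses scikit-learn's SpectralClustering on a precomputed affinity matrix purely as a stand-in; note that it requires the number of clusters in advance, whereas the recursive splitting criterion described above does not:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_similar_images(covisibility: np.ndarray, n_clusters: int) -> np.ndarray:
    """Group similar images using a co-visibility affinity matrix.

    covisibility: (N, N) symmetric matrix; entry (i, j) counts the 3-D map points
    observed in common by similar images i and j (edge weight of graph G(V, E)).
    Returns an array of N cluster labels, one per similar image.
    """
    sc = SpectralClustering(n_clusters=n_clusters, affinity="precomputed",
                            assign_labels="discretize", random_state=0)
    return sc.fit_predict(covisibility)
```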
S303, the cloud acquires three-dimensional point cloud data corresponding to each clustering result from the feature map.
Based on the completion of image clustering to determine one or more corresponding position ranges, the cloud end can extract three-dimensional point cloud data contained in each position range from the feature map. In order to distinguish from the device point cloud, in the embodiment of the present application, the three-dimensional point cloud data corresponding to each position range may be referred to as local point cloud data or classified point cloud data.
It should be understood that in the embodiment of the present application, S303 is an optional step, and in other alternative embodiments, it may be selected not to extract the local point cloud data. For example, in other alternative embodiments, local point cloud data corresponding to each location range may also be drawn in the feature map.
In the embodiment of the application, the cloud firstly searches a plurality of similar images from all the sample images, thereby realizing the confirmation of all the possibly matched position points (namely, the position points corresponding to the similar images) of the terminal equipment. And the cloud end clusters the similar images to distinguish each position range corresponding to the similar images in the similar scene, so as to determine the local position range where the terminal equipment is possibly located in the space region. Therefore, the embodiment of the application can realize accurate, effective and quick coarse positioning of the terminal equipment.
Referring to fig. 7, a schematic view of the matched location ranges according to an embodiment of the present application, based on the scene of fig. 2. In this example it is assumed that 4 location ranges are matched: location range 1, location range 2, location range 3 and location range 4. The number of location points contained in each matched range differs, so in practice the sizes and shapes of the ranges also differ.
It should be noted that, in embodiments of the application, both the query image and the device point cloud data serve to help the cloud position the terminal device accurately. On this basis, the embodiments of the application do not unduly limit how the terminal device sends the query image and the device point cloud data to the cloud. In some alternative embodiments, the terminal device may send them in the manner of S201 above. In other alternative embodiments, for the device point cloud data, the terminal device may send other data containing the three-dimensional point cloud data to the cloud, and the cloud itself extracts the required three-dimensional point cloud data from the received data. For example, the terminal device may send video data containing depth information acquired within a preset period, together with the associated Inertial Measurement Unit (IMU) data, to the cloud, and the cloud extracts the required three-dimensional point cloud data from the video data and the inertial measurement unit data. Similarly, for the query image, the terminal device may send other data containing the query image to the cloud, and the cloud itself extracts the required query image. For example, when the cloud receives video data, it may use the latest image frame in the video data as the query image, so the terminal device need not send a query image separately. Following this principle, in the operation of S202 the cloud may obtain the required query image and device point cloud data by extracting them from the related data sent by the terminal device, instead of receiving the query image and device point cloud data directly.
S203, the cloud performs point cloud matching on the three-dimensional point cloud data of each position range based on the device point cloud data, and screens out the target range with the highest matching degree.
In the embodiment of the present application, after determining each position range in which the terminal device may be located, the cloud can further perform point cloud matching between the device point cloud data and the local point cloud data corresponding to each position range. Compared with the query image, the device point cloud data has a larger field of view and can contain more information outside the field of view of the query image; it also contains depth information of a large number of spatial points in the surrounding environment of the terminal device. The device point cloud data therefore contains more spatial information than the query image. By matching each set of local point cloud data against the device point cloud data, local areas (i.e. position ranges) containing similar or repeated objects in similar scenes can be effectively distinguished, so that the position ranges obtained by coarse positioning are screened and the range is narrowed.
The embodiment of the present application does not unduly limit the point cloud matching method that is specifically used, which may be selected or set by a technician. For example, in some optional embodiments, point cloud matching may be achieved using a normal distributions transform (Normal Distributions Transform, NDT) algorithm, an iterative closest point (Iterative Closest Point, ICP) algorithm, or a normal iterative closest point (Normal Iterative Closest Point, NICP) algorithm, among others.
As an optional embodiment of the present application, reference may be made to fig. 8, which is a schematic view of a scenario in which point cloud matching is implemented based on the NDT algorithm, provided by an embodiment of the present application. In this embodiment, the cloud uses the NDT algorithm to perform point cloud matching between the device point cloud data and each set of local point cloud data, and calculates the median error between the device point cloud data and each set of local point cloud data so as to evaluate their consistency. The local point cloud data with the smallest median error is then selected as the local point cloud data with the highest matching degree, and the position range corresponding to this local point cloud data is the target range.
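As a simplified, non-limiting sketch of the screening step, each local point cloud can be scored by the median nearest-neighbour distance of the device point cloud against it, and the range with the smallest median error kept. The NDT (or ICP) registration itself is assumed here to have been performed beforehand by a separate routine so that both clouds are in a common coordinate frame; only the median-error evaluation and the selection are shown.

import numpy as np
from scipy.spatial import cKDTree

def median_error(device_points, local_points):
    """device_points, local_points: (N, 3) arrays in a common coordinate frame."""
    distances, _ = cKDTree(local_points).query(device_points)
    return float(np.median(distances))

def pick_target_range(device_points, local_clouds):
    """local_clouds: dict {range_id: (N_i, 3) array}. Returns (best_range_id, error)."""
    scores = {rid: median_error(device_points, pts) for rid, pts in local_clouds.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]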
S204, the cloud terminal performs image matching on the query image and the sample image related to the target range, and determines a matched target image.
After the position range (the target range) of the terminal device is determined, accurate positioning within that range can begin. At this point, the cloud can perform image matching between the query image and the sample images associated with the target range one by one, and screen out the sample image with the highest matching degree as the target image. The sample images used for image matching may be all similar images associated with the target range, or all sample images associated with the target range (which include all similar images associated with the target range); this is not limited herein.
The embodiment of the present application does not unduly limit the image matching method that is specifically used, which may be set by a technician. For example, in some optional embodiments, image matching may be achieved using an image matching algorithm based on 2-dimensional feature points. In the embodiment of the present application, for the sample images in the feature map, 2-dimensional feature point extraction and storage may be performed on each sample image while the feature map is constructed; in this case, the 2-dimensional feature point data of the sample images associated with the target range may be read from the feature map in S204. Alternatively, for the sample images in the feature map, the 2-dimensional feature point data of the sample images associated with the target range may be extracted again when S204 is performed. For the query image, the cloud may extract its 2-dimensional feature point data. After obtaining the 2-dimensional feature point data of the query image and of the sample images associated with the target range, the cloud may match the 2-dimensional feature point data between the images, and determine the one sample image whose 2-dimensional feature point data has the highest matching degree as the target image. The 2-dimensional feature point data refers to computer-vision feature point data of an image, and the content it contains may be determined according to the actual application. For example, the 2-dimensional feature points may be feature points manually labeled by a technician, image corner points, feature points labeled by artificial intelligence (Artificial Intelligence, AI), and so on.
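As one concrete and merely illustrative instance of 2-dimensional feature point matching, the sketch below uses OpenCV ORB corner features with brute-force Hamming matching and treats the number of cross-checked matches as the matching degree; the application itself leaves the feature type and matching algorithm open.

import cv2

def match_score(query_image, sample_image):
    """Return the number of cross-checked ORB matches between two images."""
    orb = cv2.ORB_create()
    _, query_desc = orb.detectAndCompute(query_image, None)
    _, sample_desc = orb.detectAndCompute(sample_image, None)
    if query_desc is None or sample_desc is None:
        return 0
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    return len(matcher.match(query_desc, sample_desc))

def pick_target_image(query_image, sample_images):
    """sample_images: dict {image_id: image}. Returns the id with the highest score."""
    return max(sample_images, key=lambda k: match_score(query_image, sample_images[k]))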
S205, the cloud end determines the position information and the pose information corresponding to the query image according to the position information of the target image, and sends the position information and the pose information to the terminal device.
After the target image is determined, the position information corresponding to the query image (namely, the position information corresponding to the terminal device) can be determined according to the position information corresponding to the target image. It should be noted that, in the embodiment of the present application, the position information may be position information in a world coordinate system, or position information in another custom coordinate system. For example, a three-dimensional space coordinate system may be constructed by taking a point in the spatial region as the origin, and the position information of any point in the spatial region may be represented by its coordinates in that three-dimensional space coordinate system. Meanwhile, the position information may be recorded in the form of coordinates, or in forms other than coordinates. For example, the position information may be recorded as the positional relationship between a certain object and other objects, such as "position point C is at the middle position between object A and object B". This is not limited herein.
The cloud can calculate the pose information of the terminal device while determining the position information. The embodiment of the present application does not unduly limit the method used to calculate the pose information, which may be set by a technician. For example, in some embodiments, the pose information may be calculated by a Perspective-n-Point (PnP) method.
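A minimal sketch of PnP-based pose computation using OpenCV's solvePnP is given below. The 2-D/3-D correspondences and the camera intrinsics are assumed to be available from the earlier matching steps; their exact source, and the use of an undistorted camera model, are assumptions of this sketch.

import cv2
import numpy as np

def solve_pose(object_points, image_points, camera_matrix):
    """object_points: (N, 3) map points; image_points: (N, 2) pixels; N >= 4 (more is better)."""
    dist_coeffs = np.zeros(5)  # assume an undistorted image for simplicity
    ok, rvec, tvec = cv2.solvePnP(
        np.asarray(object_points, dtype=np.float64),
        np.asarray(image_points, dtype=np.float64),
        np.asarray(camera_matrix, dtype=np.float64),
        dist_coeffs,
    )
    if not ok:
        return None
    rotation, _ = cv2.Rodrigues(rvec)   # rotation matrix of the camera pose
    return rotation, tvec               # together they express the pose information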
After the position information and the pose information are obtained, the cloud can send the two kinds of information to the terminal device. It should be noted that, in practical application, the information actually required by each query may be determined according to the specific requirement. For example, the position information and the pose information may be queried at the same time; or the pose calculation may be skipped, that is, when the pose information is not required, there is no need to determine or issue the corresponding pose information.
S206, the terminal equipment receives the position information and the pose information sent by the cloud, and the positioning is completed.
After the terminal device receives the position information and the pose information issued by the cloud (or only the position information, if the pose information is not required), the visual positioning is completed.
As a specific embodiment of the present application, reference may be made to fig. 9, which is an overall flow diagram of a visual positioning method provided by an embodiment of the present application. In this embodiment, a camera and an IMU are arranged in the terminal device, and the device point cloud data is acquired based on SLAM technology by using the camera and the IMU; that is, the acquired device point cloud data belongs to SLAM point cloud data. Meanwhile, the terminal device uses the three-dimensional point cloud data acquired within the 30 seconds closest to the current moment as the device point cloud data, and a feature map corresponding to the spatial region is preset in the cloud. The details are as follows:
The terminal device sends the query image and the device point cloud data of the latest 30 seconds to the cloud.
The cloud firstly performs image retrieval on the query image based on the feature map to obtain N similar images.
The cloud performs image clustering on the N similar images to obtain M clustering results, wherein N and M are positive integers and N is greater than or equal to M.
The cloud extracts the local point cloud data corresponding to each clustering result from the feature map.
The cloud performs point cloud matching between the device point cloud data and each set of local point cloud data, and takes the local point cloud data with the highest matching degree as the target point cloud data (namely, the local point cloud data corresponding to the target range).
The cloud extracts the 2-dimensional feature point data of the query image, matches it against the 2-dimensional feature point data of each sample image associated with the target point cloud data, and thereby determines the target image with the highest matching degree.
The cloud end determines the position information of the terminal equipment based on the target image, calculates the pose to determine the corresponding pose information, and then transmits the position information and the pose information to the terminal equipment.
The implementation details, principles, effects, etc. of each step in the embodiments of the present application may refer to the descriptions of the related embodiments of fig. 3 to 8, etc., and are not repeated here.
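To make the flow of fig. 9 concrete, the following sketch strings together the helper functions sketched earlier in this description (coarse_position_ranges, median_error, pick_target_image, solve_pose), which are therefore assumed to be in scope; the image retrieval result and the feature-map lookups are supplied by the caller as ready-made inputs and callables, since their concrete form is not prescribed by this application.

import numpy as np

def visual_positioning(query_image, device_points, similar_images, image_positions,
                       extract_local_cloud, correspondences, camera_matrix):
    """similar_images: {image_id: image} returned by image retrieval (the N similar images).
    image_positions: {image_id: (x, y, z)}; extract_local_cloud(ids) -> (M, 3) points;
    correspondences(query, target_id) -> (object_points, image_points) for PnP."""
    ids = list(similar_images)
    # image clustering: group the N similar images into M position ranges (M <= N)
    ranges = coarse_position_ranges(np.array([image_positions[i] for i in ids]))
    # recover cluster membership by position (sufficient for this sketch)
    members = {rid: [i for i in ids
                     if any(np.allclose(image_positions[i], p) for p in pts)]
               for rid, pts in ranges.items()}
    # point cloud matching: keep the range whose local cloud best matches the device cloud
    best_rid = min(ranges, key=lambda rid: median_error(
        device_points, extract_local_cloud(members[rid])))
    # 2-D feature matching restricted to the target range only
    target_image = pick_target_image(
        query_image, {i: similar_images[i] for i in members[best_rid]})
    # position from the target image, pose from PnP
    object_points, image_points = correspondences(query_image, target_image)
    pose = solve_pose(object_points, image_points, camera_matrix)
    return image_positions[target_image], pose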
In the embodiment of the present application, considering that there are a large number of similar sample images in similar scenes, coarse matching is first performed to determine one or more position ranges in which the device to be positioned may be located, thereby realizing coarse positioning of the device to be positioned. The possible position ranges are then screened using the device point cloud data provided by the device to be positioned. Compared with the query image, the device point cloud data has a larger field of view and also contains depth information of a large number of spatial points in the surrounding environment. Therefore, richer spatial information around the device to be positioned can be obtained based on the device point cloud data, and similar objects can be better distinguished. Screening the position ranges on this basis makes it possible to accurately determine the position range in which the device to be positioned actually is. Finally, accurate matching against the sample images within that position range is performed according to the query image, so that the device to be positioned can be accurately positioned in the similar scene, and positioning accuracy in similar scenes is improved.
In addition, in the embodiment of the present application, the determination of the position ranges can be realized by image clustering. The size, shape and the like of the position ranges determined in this way are decided by the actual distribution of the similar images; in other words, the position ranges are divided adaptively according to the actual image retrieval result. Therefore, the embodiment of the present application has a stronger adaptive capability with respect to the possible position ranges, the division of the position ranges is more accurate and flexible, and possible position points are less likely to be missed, so the positioning accuracy of the embodiment of the present application is higher.
Meanwhile, in the embodiment of the present application, one or more position ranges are obtained by coarse positioning, the single best-matching target range is screened out, and accurate positioning is then performed within that target range. In this way, the range of sample images that actually need to undergo feature point matching against the query image can be limited to the final target range; that is, a large number of similar images are actively excluded from the retrieved similar images. Compared with determining the final target image by performing feature point matching many times over all the similar images, the embodiment of the present application can exclude a large number of irrelevant similar images, thereby reducing the work of matching a large number of image feature points. Therefore, the embodiment of the present application can effectively reduce the time consumed by visual positioning and improve the visual positioning speed.
Corresponding to the visual positioning method described in the above embodiments, fig. 10 shows a schematic structural diagram of the visual positioning device provided in the embodiment of the present application, and for convenience of explanation, only the portions related to the embodiment of the present application are shown.
Referring to fig. 10, the visual positioning apparatus includes:
the image acquisition module 1001 is configured to acquire a query image of a location point where the device to be located is located.
The point cloud obtaining module 1002 is configured to obtain device point cloud data of a device to be located, where the device point cloud data includes three-dimensional point cloud data of a point where the device to be located is located.
The range determining module 1003 is configured to determine, according to the query image and the device point cloud data, a target range in which the device to be located is located.
The location determining module 1004 is configured to perform image matching on a sample image associated with a target range based on the query image, determine a target image matched with the query image, and determine location information of a device to be located based on location information associated with the target image.
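For illustration only, the module division of fig. 10 might be expressed as the following Python skeleton; the class name, method names and the request dictionary are hypothetical and do not correspond to any interface defined by this application.

class VisualPositioningApparatus:
    """Skeleton mirroring modules 1001-1004 of fig. 10 (names are illustrative)."""

    def acquire_query_image(self, request):
        # module 1001: obtain the query image of the position point of the device to be positioned
        return request["query_image"]

    def acquire_device_point_cloud(self, request):
        # module 1002: obtain the device point cloud data (3-D points around the device)
        return request["device_point_cloud"]

    def determine_target_range(self, query_image, device_point_cloud):
        # module 1003: coarse matching plus point cloud screening (see S202-S203)
        raise NotImplementedError

    def determine_position(self, query_image, target_range):
        # module 1004: image matching within the target range, then position lookup (see S204-S205)
        raise NotImplementedError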
As an embodiment of the present application, the visual positioning device may further implement the steps for which the visual positioning device is responsible in any of the above method embodiments. For the process by which each module in the visual positioning device provided in the embodiment of the present application implements its respective function, reference may be made to the foregoing description of the embodiment shown in fig. 3 and other related method embodiments, which is not repeated herein.
It should be noted that, because the content of information interaction and execution process between the above devices/units is based on the same concept as the method embodiment of the present application, specific functions and technical effects thereof may be referred to in the method embodiment section, and will not be described herein.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in the present description and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "upon", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted, depending on the context, to mean "upon determining", "in response to determining", "upon detecting [ the described condition or event ]" or "in response to detecting [ the described condition or event ]".
Furthermore, the terms "first," "second," "third," and the like in the description of the present specification and in the appended claims, are used for distinguishing between descriptions and not necessarily for indicating or implying a relative importance. It will also be understood that, although the terms "first," "second," etc. may be used herein in some embodiments of the application to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another element. For example, a first table may be named a second table, and similarly, a second table may be named a first table without departing from the scope of the various described embodiments. The first table and the second table are both tables, but they are not the same table.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
The visual positioning method provided by the embodiment of the application can be applied to electronic equipment such as mobile phones, tablet computers, wearable equipment, vehicle-mounted equipment, augmented reality (augmented reality, AR)/Virtual Reality (VR) equipment, notebook computers, ultra-mobile personal computer (UMPC), netbooks, personal digital assistants (personal digital assistant, PDA) and the like, and the embodiment of the application does not limit the specific types of the electronic equipment.
For example, the electronic device may be a Station (ST) in a WLAN, a cellular telephone, a cordless telephone, a session initiation protocol (Session Initiation Protocol, SIP) telephone, a wireless local loop (Wireless Local Loop, WLL) station, a personal digital assistant (Personal Digital Assistant, PDA) device, a handheld device with wireless communication capabilities, a computing device or other processing device connected to a wireless modem, an in-vehicle device, a car networking terminal, a computer, a laptop computer, a handheld communication device, a handheld computing device, a satellite radio, a wireless modem card, a television set top box (STB), a customer premise equipment (customer premise equipment, CPE) and/or other devices for communicating over a wireless system as well as next generation communication systems, such as electronic devices in a 5G network or electronic devices in a future evolved public land mobile network (Public Land Mobile Network, PLMN), etc.
By way of example and not limitation, when the electronic device is a wearable device, the wearable device may be a general term for devices developed by applying wearable technology to the intelligent design of daily wear, such as glasses, gloves, watches, clothing and shoes. A wearable device is a portable device that is worn directly on the body or integrated into the clothes or accessories of the user. A wearable device is not merely a hardware device; it can also realize powerful functions through software support, data interaction and cloud interaction. Broadly speaking, wearable intelligent devices include full-featured, large-sized devices that can realize complete or partial functions without relying on a smart phone, such as smart watches or smart glasses, as well as devices that focus only on a certain type of application function and need to be used together with other devices such as a smart phone, for example various smart bracelets and smart jewelry for physical sign monitoring.
The following description takes a mobile phone as an example of the electronic device.
The handset may include a processor, an external memory interface, an internal memory, a universal serial bus (universal serial bus, USB) interface, a charge management module, a power management module, a battery, an antenna, a mobile communication module, a wireless communication module, an audio module, a speaker, a receiver, a microphone, an earphone interface, a sensor module, keys, a motor, an indicator, a camera, a display screen, a SIM card interface, and the like. The sensor module may include a gyroscope sensor, an acceleration sensor, a barometric sensor, a magnetic sensor, an ambient light sensor, a distance sensor, a proximity light sensor, a fingerprint sensor, a temperature sensor, a touch sensor (of course, the mobile phone may also include other sensors such as a temperature sensor, a pressure sensor, a barometric sensor, a bone conduction sensor, etc.).
It will be appreciated that the structure illustrated in the embodiments of the present application does not constitute a specific limitation on the mobile phone. In other embodiments of the application, the handset may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or the components may be arranged differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 11, the electronic device 11 of this embodiment includes: at least one processor 110 (only one is shown in fig. 11), a memory 111, said memory 111 having stored therein a computer program 112 executable on said processor 110. The processor 110, when executing the computer program 112, implements the steps of the various visual positioning method embodiments described above, such as steps 201 through 206 shown in fig. 3. Alternatively, the processor 110 may perform the functions of the modules/units of the apparatus embodiments described above, such as the functions of the modules 1001 to 1004 shown in fig. 10, when executing the computer program 112.
The electronic device 11 may be a computing device such as a desktop computer, a notebook computer, a palm computer, a cloud server, etc. The electronic device may include, but is not limited to, a processor 110 and a memory 111. It will be appreciated by those skilled in the art that fig. 11 is merely an example of the electronic device 11 and does not constitute a limitation on the electronic device 11, which may include more or fewer components than shown, or combine certain components, or different components; for example, the electronic device may also include input and output devices, a network access device, a bus, etc.
The processor 110 may be a central processing unit (Central Processing Unit, CPU), but may also be another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 111 may in some embodiments be an internal storage unit of the electronic device 11, such as a hard disk or a memory of the electronic device 11. The memory 111 may be an external storage device of the electronic device 11, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the electronic device 11. Further, the memory 111 may also include both an internal storage unit and an external storage device of the electronic device 11. The memory 111 is used to store an operating system, application programs, boot loader (BootLoader), data, and other programs, etc., such as program codes of the computer program. The memory 111 may also be used to temporarily store data that has been transmitted or is to be transmitted.
In addition, it will be clearly understood by those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional allocation may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The embodiment of the application also provides an electronic device, which comprises at least one memory, at least one processor and a computer program stored in the at least one memory and capable of running on the at least one processor, wherein the processor executes the computer program to enable the electronic device to realize the steps in any of the method embodiments.
Embodiments of the present application also provide a computer readable storage medium storing a computer program which, when executed by a processor, implements steps for implementing the various method embodiments described above.
Embodiments of the present application provide a computer program product which, when run on an electronic device, causes the electronic device to perform the steps of the method embodiments described above.
The embodiment of the application also provides a chip system, which comprises a processor, wherein the processor is coupled with a memory, and the processor executes a computer program stored in the memory to realize the steps in the embodiments of the method.
The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present application may implement all or part of the flow of the method of the above embodiment, or may be implemented by a computer program to instruct related hardware, where the computer program may be stored in a computer readable storage medium, and when the computer program is executed by a processor, the computer program may implement the steps of each of the method embodiments described above. Wherein the computer program comprises computer program code which may be in source code form, object code form, executable file or some intermediate form etc. The computer readable storage medium may include: any entity or device capable of carrying the computer program code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer Memory, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts that are not described or illustrated in detail in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (10)

1. A method of visual positioning, for application to a visual positioning device, the method comprising:
acquiring a query image of a position point of equipment to be positioned;
acquiring equipment point cloud data of the equipment to be positioned, wherein the equipment point cloud data comprises three-dimensional point cloud data of a position point where the equipment to be positioned is located;
determining a target range of the equipment to be positioned according to the query image and the equipment point cloud data;
and carrying out image matching on the sample image associated with the target range based on the query image, determining a target image matched with the query image, and determining the position information of the equipment to be positioned based on the position information associated with the target image.
2. The visual positioning method according to claim 1, wherein the determining, according to the query image and the device point cloud data, a target range in which the device to be positioned is located includes:
determining a plurality of matched position ranges matched with the query image;
and carrying out point cloud matching on the three-dimensional point cloud data of each position range based on the equipment point cloud data, and determining the target range matched with the equipment point cloud data.
3. The visual positioning method of claim 2, wherein said determining a number of matching location ranges matching the query image comprises:
searching a plurality of similar images similar to the query image from a preset sample image;
and carrying out image clustering on all the searched similar images to obtain a plurality of clustering results, and determining the position range corresponding to each clustering result.
4. A visual positioning method according to claim 3, wherein the searching for a plurality of similar images similar to the query image from the preset sample image comprises:
and carrying out image retrieval on the preset sample image based on the query image to obtain a plurality of similar images.
5. The visual positioning method according to any one of claims 1 to 4, wherein the obtaining device point cloud data of the device to be positioned includes:
receiving the equipment point cloud data sent by the equipment to be positioned; or,
and receiving video data and inertial measurement unit data sent by the equipment to be positioned, and processing the video data and the inertial measurement unit data to obtain the equipment point cloud data.
6. The visual positioning method according to claim 5, wherein the acquiring a query image of a position point of the device to be positioned and the acquiring device point cloud data of the device to be positioned comprise:
receiving video data and inertial measurement unit data sent by the equipment to be positioned, and processing the video data and the inertial measurement unit data to obtain equipment point cloud data;
and extracting the latest image frame in the video data as the query image.
7. The visual positioning method of claim 6, wherein the device point cloud data comprises: and three-dimensional point cloud data acquired by the equipment to be positioned within the latest preset time length.
8. A visual positioning system, comprising: visual positioning equipment and equipment to be positioned; the visual positioning device is used for executing the visual positioning method according to any one of claims 1 to 7, and the device to be positioned is the device to be positioned according to any one of claims 1 to 7;
before the acquiring a query image of a position point of the device to be positioned and the acquiring device point cloud data of the device to be positioned, the method further comprises:
the device to be positioned acquires a query image of the position point where it is located and three-dimensional point cloud data acquired within the latest preset time length, and takes the three-dimensional point cloud data acquired within the latest preset time length as the device point cloud data;
the device to be positioned sends the query image and the device point cloud data to the visual positioning device;
and the acquiring a query image of a position point of the device to be positioned and the acquiring device point cloud data of the device to be positioned comprise:
the visual positioning device receives the query image and the device point cloud data.
9. An electronic device comprising a memory, a processor, the memory having stored thereon a computer program executable on the processor, the processor implementing the visual localization method of any one of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the visual positioning method according to any one of claims 1 to 7.
Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination